Why Data About Data Matters
About the author: Molly O’Neill, EPA’s Assistant Administrator for the Office of Environmental Information and Chief Information Officer.
In my first job after college, I was an environmental biologist/analyst. I spent some of that time taking surface water, sediment, groundwater, soil, and biological samples in the field. Of course, I followed the EPA standard sampling procedures and, believe me, they are quite extensive – 14-hour days were common, and much of that time went to ensuring that the quality of the sample was not compromised. A lot of documentation goes along with each sample taken. After those long days in the field, I used to wonder: does all this documentation really make a difference?
Last week I participated in a listening session with a stakeholder group as part of the National Dialogue for Access to Environmental Information. One of the important themes that kept coming up during the discussion was the need for access to quality data. This means that the data sample and results are not compromised and that the information about the data sample is not lost or forgotten along the way. For example, a community may take water samples at a local beach at a specific place and time, and then post the results to a website. These results are then picked up by other interested parties and made available to the public in a variety of ways. The data about the data, or “metadata”, doesn’t always travel with the data set, and therefore secondary users of the data may draw the wrong conclusions. In this case, if the time and place of the sample are missing, someone may assume that the local beach is contaminated right now, when the result only describes conditions where and when the sample was taken.
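To make the beach example concrete, here is a minimal sketch in Python of a result that carries its metadata with it. Everything in it is an assumption made for illustration: the class names, the fields, the one-day freshness window, and the sample values are hypothetical and are not an EPA schema or procedure.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class SampleMetadata:
    # Hypothetical fields chosen for illustration only.
    location: str
    collected_at: datetime
    method: str


@dataclass(frozen=True)
class WaterSampleResult:
    analyte: str
    value: float
    unit: str
    # None models a result whose metadata was lost when it was republished.
    metadata: Optional[SampleMetadata]


def describes_current_conditions(
    result: WaterSampleResult,
    now: datetime,
    max_age: timedelta = timedelta(days=1),  # assumed window, not a standard
) -> bool:
    """Return True only if the result is recent enough to speak for `now`."""
    if result.metadata is None:
        # Without time and place we cannot tell, so we refuse to assume.
        return False
    age = now - result.metadata.collected_at
    return timedelta(0) <= age <= max_age


# Made-up values, purely to exercise the check.
now = datetime(2008, 7, 2, 12, 0, tzinfo=timezone.utc)
with_metadata = WaterSampleResult(
    analyte="example analyte",
    value=120.0,
    unit="example unit",
    metadata=SampleMetadata("Hypothetical Beach", now - timedelta(days=30), "grab sample"),
)
without_metadata = WaterSampleResult("example analyte", 120.0, "example unit", None)

for r in (with_metadata, without_metadata):
    print(r.metadata is not None, describes_current_conditions(r, now))
```

The point of the sketch is the refusal: when the metadata is missing, the honest answer is "unknown", not "contaminated".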
Along that same theme, there was concern that while new mapping tools allow almost anyone to grab data sets (including some of EPA’s) and plot them on a map, combining data sets doesn’t always make sense. Data Set A + Data Set B doesn’t necessarily = Conclusion C. These are good cautions, and the takeaway for me was that while providing access to data is good, providing access to the metadata is equally important. We also need to invest in describing each data set and why it was collected.
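The "A + B doesn’t necessarily = C" caution can also be sketched in Python. This is only an illustration under stated assumptions: the `DatasetDescription` fields and the three checks are hypothetical, not an EPA metadata standard, and passing them would not by itself prove that two data sets belong together.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DatasetDescription:
    # Hypothetical descriptive fields, for illustration only.
    name: str
    measured_quantity: str
    unit: str
    purpose: str  # why the data set was collected


def combination_warnings(a: DatasetDescription, b: DatasetDescription) -> List[str]:
    """List reasons to be cautious before plotting or merging a and b together."""
    warnings = []
    if a.measured_quantity != b.measured_quantity:
        warnings.append(f"{a.name} and {b.name} measure different things")
    if a.unit != b.unit:
        warnings.append(f"{a.name} and {b.name} use different units")
    if a.purpose != b.purpose:
        warnings.append(f"{a.name} and {b.name} were collected for different purposes")
    return warnings
```

An empty list means only that none of these particular checks fired; a person who understands both data sets still has to decide whether the combination supports the conclusion.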
Getting back to my first job and the question of whether the documentation with a sample is important: you bet the answer is yes! If you have comments on how we might enhance access to environmental information, please check out our National Dialogue web site.
The views expressed here are intended to explain EPA policy. They do not change anyone's rights or obligations. You may share this post. However, please do not change the title or the content, or remove EPA’s identity as the author. If you do make substantive changes, please do not attribute the edited title or content to EPA or the author.