Why Data About Data Matters
About the author: Molly O’Neill, EPA’s Assistant Administrator for the Office of Environmental Information and Chief Information Officer.
In my first job after college, I was an environmental biologist/analyst. I spent some of that time in the field taking surface water, sediment, groundwater, soil, and biological samples. Of course, I followed the EPA standard sampling procedures, and believe me, they are quite extensive: 14-hour days were common, and much of that time went to ensuring that the quality of the sample was not compromised. A lot of documentation goes along with each sample taken. After those long days in the field, I used to wonder: does all this documentation really make a difference?
Last week I participated in a listening session with a stakeholder group as part of the National Dialogue for Access to Environmental Information. One of the important themes that kept coming up during the discussion was the need for access to quality data. This means that the data sample and results are not compromised and that the information about the data sample is not lost or forgotten along the way. For example, a community may take water samples at a local beach at a specific place and time, and then post the results to a website. These results are then consumed by other interested parties and made available to the public in a variety of ways. The data about the data, or “metadata”, doesn’t always travel with the data set, and so secondary users of this data may draw the wrong conclusions. In this case, someone who sees the result without the time and place of the sample may assume that a local beach is currently contaminated, and that assumption may not be accurate.
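To make the beach example concrete, here is a minimal sketch in Python of a sample record that keeps its metadata attached to the result. Everything in it is an assumption made for illustration: the `WaterSample` fields, the `describe` helper, the two-day freshness window, and the beach name and values are invented, and none of it reflects an actual EPA data format or standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class WaterSample:
    # The measurement itself (hypothetical field name and units).
    bacteria_count_per_100ml: float
    # Metadata: where and when the sample was taken.
    location: str
    collected_at: datetime


def describe(sample: WaterSample, now: datetime, max_age: timedelta) -> str:
    """Summarize a result without overstating what it says about today."""
    age = now - sample.collected_at
    status = "current" if age <= max_age else "outdated"
    return (
        f"{sample.location}: {sample.bacteria_count_per_100ml} per 100 mL, "
        f"collected {sample.collected_at:%Y-%m-%d} ({status})"
    )


# Invented values, for illustration only.
sample = WaterSample(
    bacteria_count_per_100ml=250.0,
    location="Example Beach, north end",
    collected_at=datetime(2008, 6, 1, 9, 30),
)
summary = describe(sample, now=datetime(2008, 7, 15), max_age=timedelta(days=2))
print(summary)
```

Because the place and time stay in the same record as the result, a secondary user who republishes the number weeks later can see that it describes one spot on one day, not the beach as it is now.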
Along that same theme, there was concern that while new mapping tools allow almost anyone to grab data sets (including some of EPA’s) and plot them on a map, combining data sets doesn’t always make sense. Data Set A + Data Set B doesn’t necessarily = Conclusion C. These are good cautions, and the takeaway for me was that while providing access to data is good, providing access to the metadata is equally important. We also need to invest in describing each data set and why it was collected.
Getting back to my first job and the question of whether the documentation with a sample is important: you bet the answer is yes! If you have comments on how we might enhance access to environmental information, please check out our National Dialogue web site.
Editor's Note: The opinions expressed in Greenversations are those of the author. They do not reflect EPA policy, endorsement, or action, and EPA does not verify the accuracy or science of the contents of the blog.