Filling the Gaps in Environmental Science with Big Data

By Christina Burchette

There’s no doubt about it: we’re living in a data-driven age. Organizations of all kinds depend on data for things like decision making and problem solving, analyzing trends, understanding their customers, and doing research. EPA is certainly no exception.

At EPA, we have a large computational science effort that focuses on predicting exposure and toxicity for the thousands of chemicals present in the environment. By combining high-throughput testing methods with computational approaches, these “big data” projects aim to improve our understanding of what these exposures mean for public health and the environment.

By continually exploring new datasets and combining the datasets we already have, we can increase our ability to predict impacts in areas where we have little information or data. Doing so also helps scientists identify vulnerabilities and data gaps that could benefit from additional attention or protection, and it allows EPA to be more predictive of, and more responsive to, environmental and health impacts.

That’s why we’re pleased to announce that EPA has joined the National Consortium for Data Science (NCDS), a collaboration of leaders in various fields who work together to encourage data science research and identify data science challenges. As a member of the NCDS, EPA will have the opportunity to collaborate with other leaders in toxicity research and to incorporate cutting-edge approaches in data science that build upon what we already know.

EPA already has several research projects under development that both generate and use “big data”:

  • The Stream-Catchment (StreamCat) dataset is an extensive collection of landscape metrics for 2.6 million streams and associated catchments and watersheds within the continental United States. EPA is using these data to model reference conditions, against which future assessments will be compared to determine important changes in stream and watershed condition.
  • EnviroAtlas provides interactive tools and large datasets for exploring the benefits people receive from nature, or “ecosystem goods and services.” The available data cover the continental United States, and fine-scale data are available for selected communities.
  • The Web-based Interspecies Correlation Estimation (Web-ICE) is a user-friendly internet platform that uses several datasets to let investigators estimate the acute toxicity of a chemical to a species based on the known toxicity of that chemical to a surrogate species. This matters because toxicity data are often unavailable or limited for the majority of species within an ecosystem, yet information on acute toxicity to multiple species is important for assessing risks to individuals, populations, and communities.
  • The Environmental Quality Index (EQI) is a county-level dataset covering all 50 states. It provides an index of environmental quality based on criteria in five domains: air, water, land, built environment, and sociodemographic. The EQI allows investigators to conduct association studies between environmental quality and specific health outcomes, such as the rate of preterm birth. These results can help communities make decisions about effective public health interventions and can direct further research to specific areas of concern.
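
To make the surrogate-species idea behind a tool like Web-ICE a little more concrete, here is a minimal sketch in Python. It is not EPA's implementation: it simply assumes that the toxicity values of two species are related by a straight line on a log-log scale, fits that line by least squares, and uses it to estimate an unmeasured value. The function names and every number in it are hypothetical and chosen only for illustration.

```python
import math


def fit_log_linear(surrogate_values, target_values):
    """Least-squares fit of log10(target) = intercept + slope * log10(surrogate)."""
    xs = [math.log10(v) for v in surrogate_values]
    ys = [math.log10(v) for v in target_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def estimate_toxicity(intercept, slope, surrogate_value):
    """Estimate the target species' toxicity value from a surrogate measurement."""
    return 10 ** (intercept + slope * math.log10(surrogate_value))


# Hypothetical paired acute toxicity values (same units for both species),
# one pair per chemical that has been tested on both species.
surrogate = [1.0, 10.0, 100.0, 1000.0]
target = [2.0, 20.0, 200.0, 2000.0]

intercept, slope = fit_log_linear(surrogate, target)

# A new chemical tested only on the surrogate species (hypothetical value).
estimate = estimate_toxicity(intercept, slope, 50.0)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, estimate={estimate:.1f}")
```

In this made-up example the target species is always half as sensitive as the surrogate, so the fitted line recovers that relationship. Real paired data would be noisier, and any estimate should be read together with a measure of its uncertainty.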

Our partnership with NCDS is an opportunity to take on data science with the best minds and best technology possible so that we can continue to fill in data gaps. The more we know, the closer we will be to solving key problems related to air and water quality, human health, ecosystem sustainability, and more.

About the Author: Christina Burchette is an Oak Ridge Associated Universities contractor and writer for the science communication team in EPA’s Office of Research and Development.

Editor's Note: The views expressed here are intended to explain EPA policy. They do not change anyone's rights or obligations. You may share this post. However, please do not change the title or the content, or remove EPA’s identity as the author. If you do make substantive changes, please do not attribute the edited title or content to EPA or the author.
