Suggested Data Sources
2009 June 14
Tell us what data sources should be in Data Finder, or tell us what kinds of information you need to find. We will review your suggestions and revise Data Finder accordingly.
Editor's Note: The opinions expressed in Greenversations are those of the author. They do not reflect EPA policy, endorsement, or action, and EPA does not verify the accuracy or science of the contents of the blog.
31 Responses

For numerous suggested data sources, take a look at the extensive appendix (Table 2) in this document:
Georgopoulos P. (2008). A multiscale approach for assessing the interactions of environmental and biological systems in a holistic health risk assessment framework. Water, Air, and Soil Pollution: Focus 8(1): 3-21. DOI: 10.1007/s11267-007-9137-7
http://dx.doi.org/10.1007/s11267-007-9137-7
http://www.springerlink.com/content/5r8100723p704h16/fulltext.pdf
Additional suggestions for links in Data Finder:
Here are some things NEIC uses that might be of interest:
Pacific Northwest National Laboratory Infrared Spectral Library
IRIS database (toxicological information)
NIST Chemistry WebBook
Envirofacts (TRI)
RCRA Online (policy letters)
SW846 online (hazardous waste methods)
OW methods databases
Henry’s Law constant database
They also use EPA models such as MINTEQA2, SCREEN3, ISC, WATER9, IWAIR, and AERMOD, but these may not fit within the database framework of the Data Finder site.
There are some good data available through the site. The public might find data more easily if searches on terms such as “marine,” “ocean,” and “vessel discharges” returned results.
Have you tapped the information obtained during the National Dialog? Much of our input was internal to EPA, especially during the Jam sessions. I recall specific data systems being mentioned.
I think the Nutrients Database would be a useful addition to this meta-database.
As a future addition, at least for nutrients, the ability to coordinate with universities and other research facilities would be of immense value.
Add and report out metrics:
We are collecting data in real time. Shouldn’t we also be able to analyze the data in real or near-real time?
Examples: daily frequency of comments, where comments are coming from, how many new data sets are suggested, how many suggestions are incorporated / pending / turned down, and how many bugs are identified. Later we will also want to know how many data sets are downloaded, how many are accessed, and (generally) by whom. We have the technology. How will we measure success or failure?
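As a rough illustration of the near-real-time metrics suggested above, here is a minimal Python sketch that tallies comments per day, comments by origin, and suggestion outcomes. The `Comment` record, its fields, and the status values are hypothetical; they do not reflect how Data Finder or this blog actually stores comments.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Comment:
    # Hypothetical comment record; field names are illustrative only.
    posted: date
    origin: str  # e.g., "HQ" or "Region 5" (assumed labels)
    suggestion_status: Optional[str] = None  # "incorporated", "pending", "turned down", or None


def summarize(comments):
    """Tally daily comment counts, comment origins, and suggestion outcomes."""
    per_day = Counter(c.posted for c in comments)
    per_origin = Counter(c.origin for c in comments)
    # Only comments that carry a suggestion status count as suggestions.
    statuses = Counter(c.suggestion_status for c in comments if c.suggestion_status)
    return {
        "comments_per_day": dict(per_day),
        "comments_by_origin": dict(per_origin),
        "suggestions_total": sum(statuses.values()),
        "suggestions_by_status": dict(statuses),
    }


if __name__ == "__main__":
    sample = [
        Comment(date(2009, 6, 15), "HQ", "pending"),
        Comment(date(2009, 6, 15), "Region 5", "incorporated"),
        Comment(date(2009, 6, 16), "HQ"),
    ]
    print(summarize(sample))
```

Download and access counts could be added the same way once such events are logged with a date and a (general) user category.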
Surface Water – link to GLENDA (I see it’s under Monitoring as well)
Hazardous Waste – link to RCRAInfo rather than (or in addition to) the RCRAOnline document database. Your definition of data stresses websites where numerical data can be downloaded. I would think RCRAInfo would be the better link for Data Finder to highlight.
http://www.epa.gov/enviro/html/rcris/rcris_query_java.html
I searched for PCBs and no results were found. What about linking to http://www.epa.gov/epawaste/hazard/tsd/pcbs/pubs/data.htm?
For the Climate Change topic, you might want to include a link to this page on Renewable Energy.
http://www.epa.gov/renewableenergyland/
The data is stored in an Excel file and is also posted as a Google Earth file.
Nice website in general.
CHAD, which I developed, is a database of human activities and as such best fits under Health Risks / Exposure. It contains almost nothing about Health Effects, but it is used for exposure assessments. It should be located where HEDS is; HEDS was also developed by our Division in NERL. By the way, CHAD will soon be part of HEDS, which will be used as a portal.
TITLE: ACToR
DESCRIPTION: ACToR (Aggregated Computational Toxicology Resource) is a collection of databases collated or developed by the US EPA National Center for Computational Toxicology (NCCT). More than 200 sources of publicly available data on environmental chemicals have been brought together and made searchable by chemical name and other identifiers, and by chemical structure. Data includes chemical structure, physico-chemical values, in vitro assay data and in vivo toxicology data. Chemicals include, but are not limited to, high and medium production volume industrial chemicals, pesticides (active and inert ingredients), and potential ground and drinking water contaminants.
URL: http://actor.epa.gov/actor/faces/ACToRHome.jsp
ORGANIZATION: ORD
GEOGRAPHIC_SCALE: National
REGISTRY_SYSTEM: No
REGISTRATION: No
SERVICE: None
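Several suggestions in this thread use the same FIELD: value template (TITLE, DESCRIPTION, URL, ORGANIZATION, GEOGRAPHIC_SCALE, REGISTRY_SYSTEM, REGISTRATION, SERVICE, and sometimes COMMENT). If these submissions were collected for review, a minimal Python sketch like the one below could turn each one into a dictionary. It assumes that every record begins with a TITLE: line and that any line not starting with a known field name continues the previous field (as in the multi-line Cleanups in My Community description); free-form text following a record would therefore be appended to that record's last field. The function name is hypothetical.

```python
import re

# Field names taken from the submission template used in this thread.
FIELDS = ("TITLE", "DESCRIPTION", "URL", "ORGANIZATION", "GEOGRAPHIC_SCALE",
          "REGISTRY_SYSTEM", "REGISTRATION", "SERVICE", "COMMENT")
FIELD_RE = re.compile(r"^(%s):\s*(.*)$" % "|".join(FIELDS))


def parse_records(text):
    """Split template text into one dict per record; a TITLE: line starts a new record."""
    records = []
    current = None
    last_field = None
    for line in text.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            name, value = match.groups()
            if name == "TITLE":
                current = {}
                records.append(current)
            if current is not None:  # ignore fields that appear before any TITLE
                current[name] = value.strip()
                last_field = name
        elif current is not None and last_field and line.strip():
            # Continuation line: append it to the most recent field.
            current[last_field] = (current[last_field] + " " + line.strip()).strip()
    return records
```

For example, parsing the Beach Advisory Data entry would yield a single dictionary whose "ORGANIZATION" value is "Office of Water" and whose "COMMENT" value is "Contact is Bill Kramer".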
TITLE: environmental enforcement results
DESCRIPTION:
URL:
ORGANIZATION: Region 5’s Office of Regional Counsel
GEOGRAPHIC_SCALE: Regional
REGISTRY_SYSTEM: No
REGISTRATION: No
SERVICE: None
COMMENT: I would like to see this tool used to retrieve environmental enforcement data (e.g., injunctive relief, penalties, supplemental environmental projects).
TITLE: Pollution Abatement Costs and Expenditures (PACE)
DESCRIPTION: The Pollution Abatement Costs and Expenditures (PACE) survey is the most comprehensive national source of pollution abatement costs and expenditures related to environmental protection for the manufacturing sector of the United States. The PACE survey collects facility-level data on pollution abatement capital expenditures and operating costs associated with compliance with local, state, and federal regulations and with voluntary or market-driven pollution abatement activities. Because the facility-level data contains confidential information, user access to the data at this level of detail is managed by the Census Bureau. More aggregate data and summary statistics have been published and are available from Census and the EPA.
URL: http://yosemite.epa.gov/ee/epa/eed.nsf/pages/pace2005.html
ORGANIZATION: AO, OPEI, NCEE
GEOGRAPHIC_SCALE: National
REGISTRY_SYSTEM: No
REGISTRATION: No
SERVICE: None
COMMENT: This data is collected by the Census Bureau working under an IAG with the EPA. The original data series began in the early 1970s and was collected mostly on an annual basis until the mid-1990s. At that point, largely due to budget issues (Census had been paying for the survey), Census discontinued the survey. After some time passed, EPA agreed to begin paying for the data collection, and a redesigned and renewed survey was implemented to collect 1999 data. Again, there was a lapse in the survey due to budget constraints and efforts to reassess and redesign it, so the next survey collected data for 2005 (the year both the EPA and Census websites focus on). The Census Bureau’s website on the PACE survey is http://www.census.gov/mcd/pace.htm
The EPA website is: http://yosemite.epa.gov/ee/epa/eed.nsf/pages/pace2005.html
The Census Bureau is responsible for securing the plant-level data and survey responses, but can provide access to the micro-level data to EPA and other organizations or researchers for a fee. Historically, researchers have made considerable use of the micro-level data, including linking the plant-level data to other economic and environmental data for research purposes.
Possible keywords: pollution abatement, economic cost
Contacts: Brett Snyder, Cynthia Morgan, and Ron Shadbegian
TITLE: Beach Advisory Data
DESCRIPTION: Database of beach advisories at coastal and Great Lakes beaches that receive EPA Beach Act grants.
URL: http://iaspub.epa.gov/waters10/beacon_national_page.main
ORGANIZATION: Office of Water
GEOGRAPHIC_SCALE: National
REGISTRY_SYSTEM: No
REGISTRATION: No
SERVICE: None
COMMENT: Contact is Bill Kramer
TITLE: National Listing of Fish Advisories
DESCRIPTION: Database of state and tribal advisories on consumption of fish
URL: http://134.67.99.49/scripts/esrimap.dll?name=Listing&Cmd=Map
ORGANIZATION:
GEOGRAPHIC_SCALE: National
REGISTRY_SYSTEM: No
REGISTRATION: No
SERVICE: None
COMMENT: Contact is Jeff Bigler
Lots of services are published on geodata.epa.gov. I think it would be helpful to list the services for each data system, as http://www.data.gov does.
I second BASINS. There’s a ton of data there.
Also consider PRAWNS. It basically outputs to BEACON, which you can see at http://iaspub.epa.gov/waters10/beacon_national_page.main and whose raw data can be downloaded for each state from http://www.epa.gov/waterscience/beaches/seasons/
Michael, Thanks for suggesting data sources for Data Finder. I looked at BEACHES and BEACON and it appears that they provide data about beach condition and closure. You can find data about a beach of interest by clicking progressively deeper in the site.
For this version of Data Finder we’ve defined data sources as sites where you can download numerical data. I’m not sure whether these sites fit that definition because you have to click a few levels deep in order to get to the data. People told us that they needed a place to find numerical data, so we’ve tried not to include databases of non-numerical information, like lists of Superfund sites. Based on user feedback we may change the site to include sites that are more query oriented, help people find specific data sets, or find tools. Do you have any suggestions for what sites should be included or excluded?
Understood, Ethan. Thanks.
You can download the raw BEACON data by visiting the seasons pages and clicking on a state (example: http://www.epa.gov/ost/beaches/seasons/2007/xl/ca.xls ). OW doesn’t yet provide a one-file download of all the data, but that, I’m sure, can be done. (I know that xls isn’t the best format, but it’s still the raw data. Also, 2008 data will go up this summer.)
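Until a one-file download exists, a user could fetch several states at once with a short script. The minimal Python sketch below assumes the per-state files follow the pattern of the single example link above (2007, California); that pattern is inferred, not documented, so other seasons or states may be laid out differently. The helper names are hypothetical.

```python
import os
import urllib.request

# Assumption: URL pattern inferred from the one example given
# (http://www.epa.gov/ost/beaches/seasons/2007/xl/ca.xls).
BASE_URL = "http://www.epa.gov/ost/beaches/seasons/{year}/xl/{state}.xls"


def state_file_url(year, state):
    """Build the (assumed) URL of one state's BEACON spreadsheet for a season."""
    return BASE_URL.format(year=year, state=state.lower())


def download_states(year, states, dest_dir="."):
    """Download each state's spreadsheet into dest_dir and return the local file paths."""
    paths = []
    for state in states:
        url = state_file_url(year, state)
        path = os.path.join(dest_dir, f"{year}_{state.lower()}.xls")
        urllib.request.urlretrieve(url, path)  # raises an error if the file is not found
        paths.append(path)
    return paths
```

For example, `download_states(2007, ["CA", "WA"])` would save `2007_ca.xls` and `2007_wa.xls` in the current directory, provided files exist at the guessed URLs; combining them into a single file would be a separate step.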
The Nutrients DB, which you do include in Data Finder, is the same way. You have to drill down a few pages to get to the data.
TITLE: GIS Data
DESCRIPTION: EPA’s Geospatial Data Catalog
URL: http://www.epa.gov/geospatial/data.html#catalog
ORGANIZATION: US EPA
GEOGRAPHIC_SCALE: National
REGISTRY_SYSTEM: No
REGISTRATION: No
SERVICE: Data Connection
COMMENT: Has data services and GIS data downloads
I’ve sent you comments as sticky notes on the PDF files, but here are most of them for the blog:
Link to ACRES in CIMC or Envirofacts for Brownfields data.
Link to RCRAInfo query in Envirofacts or to CIMC for RCRA and RCRA Corrective Action sites – not to RCRA Online which is documents, not sites.
SuperCPAD is a better link for Superfund sites than what you have; it takes people directly to the data.
New ideas:
What about the geospatial download? It is part of Envirofacts, but it really works with Google Earth and other mashups.
What about the system Jerry Johnston has set up to create data sets for mashups?
TITLE: BSAF Data Set
DESCRIPTION: EPA MED researchers developed a data set of approximately 20,000 biota-sediment accumulation factors (BSAFs) from 20 locations (mostly Superfund sites) for nonionic organic chemicals, e.g., PCBs, PCDDs, PCDFs, DDTs, PAHs, and pesticides. Fresh, tidal, and marine ecosystems are included in the data set, and species in the data set include fish and benthic species (e.g., lobster, crayfish, and benthic invertebrates). The purpose of the data set is fivefold: it i) provides tools for evaluating the reasonableness of BSAFs from other locations, ii) provides a tool for building a BSAF data set for locations of interest, iii) provides data for performing bounding assessments of risks for locations where limited or no bioaccumulation data are available, iv) permits inquiry into underlying relationships and dependencies of BSAFs on ecosystem conditions and parameters, and v) allows comparison of PCB, PCDD, and PCDF residues to residue-effects data downloaded from PCBRes (see http://www.epa.gov/med/prods_pubs.htm).
URL: http://www.epa.gov/med/Prods_Pubs/bsaf.htm
ORGANIZATION: ORD, NHEERL, MED
GEOGRAPHIC_SCALE: National
REGISTRY_SYSTEM: No
REGISTRATION: No
TITLE: PCBRes Database
DESCRIPTION: The PCBRes database is used by scientists and risk assessors to correlate residues of PCBs and dioxin-like compounds with toxic effects. The purpose is to develop PCB critical residue values for fish, mammals, and birds, especially as these relate to aquatic and aquatic-dependent species. Because PCBs occur as complex mixtures, the database expresses critical residue values both on the basis of PCB Aroclors and as total PCBs measured by congener-specific methods. Because PCB toxicity occurs via the aryl hydrocarbon receptor (AhR), PCB toxicity has also been expressed as the sum of the dioxin-like PCBs after adjustment using toxic equivalency factors (TEFs). A limited number of dioxin and furan compounds from single-compound and mixture studies are also included.
URL: http://www.epa.gov/med/Prods_Pubs/pcbres.htm
ORGANIZATION: ORD, NHEERL, MED
GEOGRAPHIC_SCALE: National
REGISTRY_SYSTEM: No
REGISTRATION: No
SERVICE: None
COMMENT: As I mentioned in an email, I think a general category on environmental toxicology would be useful.
TITLE: Enforcement and Compliance History Online (ECHO)
DESCRIPTION: Enforcement data
URL: http://www.epa-echo.gov/echo/index.html
ORGANIZATION:
GEOGRAPHIC_SCALE: National
REGISTRY_SYSTEM: No
REGISTRATION: No
SERVICE: None
TITLE: Envirofacts
DESCRIPTION: Various data sets, which could be referenced individually in Data Finder. A separate Envirofacts entry isn’t necessarily needed.
URL: http://www.epa.gov/enviro/index.html
ORGANIZATION: Various
GEOGRAPHIC_SCALE: National
REGISTRY_SYSTEM: No
REGISTRATION: No
SERVICE: None
TITLE: Cleanups in My Community
DESCRIPTION: Cleanups in My Community is a mapping and listing tool that shows sites where pollution is being or has been cleaned up throughout the United States. It maps, lists and provides cleanup progress profiles for:
Sites, facilities and properties that have been contaminated by hazardous materials and are being, or have been, cleaned up under EPA’s Superfund, RCRA and/or Brownfields cleanup programs.
Federal facilities that have been contaminated by hazardous materials and are being, or have been, cleaned up under EPA’s Superfund and/or RCRA cleanup programs.
URL: http://iaspub.epa.gov/Cleanups/
ORGANIZATION: OSWER
GEOGRAPHIC_SCALE: National
REGISTRY_SYSTEM: No
REGISTRATION: No
SERVICE: None
TITLE: Geo Spatial Data
DESCRIPTION: Geospatial data for various EPA data sets, available at http://www.epa.gov/enviro/geo_data.html
URL: http://www.epa.gov/enviro/geo_data.html
ORGANIZATION: various
GEOGRAPHIC_SCALE: National
REGISTRY_SYSTEM: No
REGISTRATION: No
SERVICE: None
Ecosystems is not on your list, and Climate Change is on it twice. It doesn’t look like you are following the EPA Web Standards; for example, I think the pictures should have the grey border, not green, and the page should have the Quick Finder. Overall, I think it is a great idea.
- Debbie Westerman
The Clean Air Markets Division (OAR) has a few data sources that I don’t see listed. We have “Data and Maps” at http://camddataandmaps.epa.gov/gdm/ and CASTNet (deposition data) at http://www.epa.gov/castnet/
- Cindy Walke
Consider including the ACE (America’s Children and the Environment) Summary List of Measures. While the measures are data summaries rather than raw data, they are useful for seeing how EPA data are combined with other data sets. The page for each measure describes how EPA data sets are combined with, for example, CDC and Census data.
For those of us who work on environmental health issues, EPA data are great to have, but not much help unless they are combined with health data. These combinations are important and can add power to the environmental data.
While the EPA links may duplicate what you already have, the combinations with other federal agencies’ data sets are not duplicates, and the measures provide links to those sources. Here’s a link to an example:
http://www.epa.gov/economics/children/contaminants/e1-sources.htm
Thanks again for sharing. I’m glad you are doing this.
- Maureen O’Neill
Two of my favorites that I could not find are BASINS for water and NATA for air.
Zenny Sadlon
Thanks for these ideas. We’re looking for “data sources,” sites where you can download numerical data from EPA.
I see that AFS is owned by OECA and we’ll note it as such. I think you’re right about AFS and its focus on facilities rather than data about air. We’ll look into which sites are data sources (AFS, AQS, NEI, AirData).
Additional sites for air pollution / air quality data:
Air Compare – http://www.epa.gov/aircompare/
Air Data – http://www.epa.gov/air/data/
Air Emission Sources – http://www.epa.gov/air/emissions/
Air Explorer – http://www.epa.gov/airexplorer/
Clean Air Markets Data and Maps – http://camddataandmaps.epa.gov/gdm/
National-Scale Air Toxics Assessment – http://www.epa.gov/ttn/atw/natamain/
(See http://www.epa.gov/air/airpolldata.html for thumbnail descriptions of these sites.)
Correction/suggestion:
The AIRS/AFS database (listed in Data Finder) is mainly regulatory compliance information, and does not have much data about air releases. Metadata should say AIRS/AFS is owned by the Office of Compliance, rather than OAR.