PLOS One

Open Science and Cyanobacterial Research at EPA

By: Jeff Hollister, Betty Kreakie, and Bryan Milstead

Algal bloom containing cyanobacteria.

It wasn’t long ago that science always occurred along a well-worn path. Observations led to hypotheses; hypotheses led to data collection; data led to analyses; and analyses led to publications. And along this path, data, hypotheses, and analyses were held close and, more often than not, the only public-facing view of the research was the final publication.

Science has come a long way with this model. However, it was conceived when print was the main medium and most scientific questions could be investigated by a few scientists over a short period of time.

Then came computers. Then came the internet.

As in every other aspect of modern life, these advances are greatly affecting science. They have changed who conducts our science, how we share it, and how others interact with scientific information. All of these changes are playing out through the increasing openness of all parts of the scientific process.

This broad area has been defined as having several components. These components suggest that “open science”:

  • is transparent (and, of course, open)
  • includes all parts of research (data, code, etc.)
  • allows others to repeat the work
  • should be posted on an open and accessible website (while protecting Personally Identifiable Information, etc.)
  • occurs along a gradient (i.e. not just a binary open vs. not open)

At EPA, we are learning how to make our research on cyanobacteria and human health (for more information, join our webinar) meet those criteria. We are implementing open science in three ways: (1) making our work available via open access publishing; (2) providing access to the code used in our analysis; and (3) making our data openly available.

Several members of our research group have embraced open access options for publishing their research. For instance, our colleague Elizabeth Hilborn and her co-authors published the results of their study, which examined a group of dialysis patients following exposure to the cyanobacterial toxin microcystin, in one of the pioneering open access journals, PLoS ONE. Also in PLoS ONE, EPA scientist Bryan Milstead and his collaborators published a modeling method that combines the U.S. Geological Survey’s SPARROW model (a modeling tool for interpreting regional water-quality monitoring data), lake depth, lake volume, and EPA National Lakes Assessment data to estimate nutrient concentrations.

As our work progresses, we will continue to choose open access journals. In our experience, this has allowed our research to reach a larger audience and we can more easily track the impact through readership levels using available tools such as PLoS Article Level Metrics.

We are also sharing our data. Currently, this is accomplished through supplements added to publications and through sites such as the EPA’s Environmental Dataset Gateway. We plan to expand these efforts via data publications, version-controlled repositories, and through the development of Application Programming Interfaces (APIs) that provide access to data for developers and other scientists.

The goal of these efforts, and more (stay tuned for a future post on how coding fits into open science), is to increase the reproducibility of our work (but challenges remain), reach broader audiences, and eventually have a greater impact on our understanding and management of harmful algal blooms.

About the Authors: EPA ecologists Jeff Hollister, Betty Kreakie, and Bryan Milstead study greenwater for a living. If you have questions for them, join the webinar on June 25th or follow the Twitter chat on June 26th using #greenwater.

Editor's Note: The opinions expressed here are those of the author. They do not reflect EPA policy, endorsement, or action.

Please share this post. However, please don't change the title or the content. If you do make changes, don't attribute the edited title or content to EPA or the author.

SPARROWs, Lakes, and Nutrients?

By Jeff Hollister

Based on the title above, you probably think I don’t know what I am talking about. I mean really, what do sparrows, lakes, and nutrients have in common? In this case, a lot. So much so that an inter-agency team of EPA researchers in Narragansett, RI, and a colleague from the U.S. Geological Survey (USGS) in New Hampshire have been working together to better understand how these three seemingly disparate concepts can be linked. Some of the results of this work are outlined in a recent publication in the open access journal PLoS ONE.

The sparrow I am referring to isn’t small and feathered; it is a model developed and refined by the USGS. Since the late 1990s, USGS has been developing SPARROW models, which have been widely used to understand and predict the total amount of nutrients (among other materials) that streams are exposed to over the long term. This is known as “nutrient load.” The models are important because they show, over a very large geographic extent, where nutrients might be relatively high.

However, when it comes to lakes, SPARROW doesn’t directly provide the information we need. For our research on lakes, we need reasonable estimates of the quantity of nutrients in a given volume of water (i.e., nitrogen and phosphorus concentration), not the long-term nutrient load for the year. This is important because the higher the nutrient concentrations at any given time, the greater the chance of triggering algal blooms, and more blooms mean a greater probability of toxins released by algae reaching unhealthy levels.

To better estimate nutrient concentrations, we needed to use the SPARROW model for total load but also account for the differences between load and concentration. Our solution: combining field data, data on lake volume, and the SPARROW model.
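To make the difference between load and concentration concrete, here is a minimal sketch in Python. It assumes a well-mixed lake at steady state and uses a classic Vollenweider-type retention term based on hydraulic residence time. It is an illustration only, not necessarily the model described in the paper; the function name and the two example lakes are hypothetical.

```python
import math


def in_lake_concentration(load_kg_per_yr, outflow_m3_per_yr, volume_m3):
    """Estimate a steady-state in-lake concentration (mg/L) from an annual load.

    Illustrative assumption: a Vollenweider-type model for a well-mixed lake.
    """
    # Inflow concentration: kg/yr divided by m3/yr gives kg/m3,
    # and 1 kg/m3 equals 1000 mg/L.
    inflow_mg_per_l = load_kg_per_yr / outflow_m3_per_yr * 1000.0
    # Hydraulic residence time (years): how long water stays in the lake.
    residence_time_yr = volume_m3 / outflow_m3_per_yr
    # Longer residence time means more of the nutrient settles out.
    return inflow_mg_per_l / (1.0 + math.sqrt(residence_time_yr))


# Two hypothetical lakes with the same load and outflow but different volumes.
small = in_lake_concentration(500.0, 10_000_000.0, 2_500_000.0)
large = in_lake_concentration(500.0, 10_000_000.0, 40_000_000.0)
print(f"small lake: {small:.4f} mg/L, large lake: {large:.4f} mg/L")
```

Under these assumptions, the same annual load produces a lower concentration in the larger lake, which is why lake volume matters when translating a load estimate into a concentration.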

In our paper “Estimating Summer Nutrient Concentrations in Northeastern Lakes from SPARROW Load Predictions and Modeled Lake Depth and Volume,” recently published in PLoS ONE, we describe how we combined modeling information from SPARROW, summertime nutrient concentrations collected during EPA’s 2007 National Lakes Assessment, and estimated lake volume (see this and this for more).

The end result of this effort is better predictions of nutrient concentrations: they improved by an average of 18.7% for nitrogen and 19.0% for phosphorus.

What does this mean for our environment and, importantly, for potential human health impacts? If we are able to better predict nutrient concentrations, we will hopefully also improve our ability to know where and when we might expect to see harmful algal blooms, specifically harmful cyanobacterial blooms. Cyanobacteria have been associated with many human health issues, from gastrointestinal problems to skin rash, and even a hypothesized association with Lou Gehrig’s Disease (for example, see this). In short, better predictions of nutrients will, in the long run, improve our understanding of cyanobacteria and hopefully reduce the public’s exposure to a potential health threat.

About the author: Jeff Hollister, a co-author on the study outlined in this blog post, is a research ecologist with an interest in landscape ecology, Geographic Information Systems (GIS), the statistical language R, and open science. The focus of Jeff’s work is to develop computational and statistical tools to help with the cyanobacteria group’s research efforts. Jeff is also an outspoken advocate for open science and open access among his colleagues.
