This tutorial introduces the WaterML R package. It shows by example how to retrieve data from the CUAHSI Hydrologic Information System (HIS) and run a statistical analysis on it in R.
#import required libraries
library(WaterML)

#get the list of supported CUAHSI HIS services
services <- GetServices()
View(services)

This example uses the service at http://hydroportal.cuahsi.org/ipswich/cuahsi_1_1.asmx?WSDL, hosted by the Ipswich River Watershed Association, which enlists volunteers to collect data on the health of the Ipswich River and its tributaries in Massachusetts, USA. We can use the GetVariables() and GetSites() functions to get the tables of variables and sites on the server.

#point to a CUAHSI HIS service and get a list of the variables and sites
server <- "http://hydroportal.cuahsi.org/ipswich/cuahsi_1_1.asmx?WSDL"
variables <- GetVariables(server)
sites <- GetSites(server)

#get full site info for one site using the GetSiteInfo function
siteinfo <- GetSiteInfo(server, "IRWA:FB-BV")
View(siteinfo)

Next, we use the GetValues() function to retrieve the observations of water temperature (full variable code IRWA:Temp) and dissolved oxygen (full variable code IRWA:DO) at this site. In this example we get the values for all available days; the startDate and endDate parameters can be used to restrict the time period of interest. For help on the GetValues function, type ?GetValues in the R console. For this particular site there are 21 temperature and 22 dissolved oxygen observations.

#get the temperature and dissolved oxygen values for the site
Temp <- GetValues(server,siteCode="IRWA:FB-BV",variableCode="IRWA:Temp")
DO <- GetValues(server, siteCode="IRWA:FB-BV",variableCode="IRWA:DO")

We can plot the temperature values with the plot() function and then use the points() function to add the dissolved oxygen data points to the existing plot.

plot(DataValue~time, data=Temp, col="red")
points(DataValue~time, data=DO, col="blue")

Note that the “time” column holds the local time and the “DateTimeUTC” column holds the UTC time. Both columns are in POSIXct format, a special format in R for storing date and time. POSIXct represents the number of seconds since the beginning of 1970 (UTC). You can use the strftime function to get the year, month, day, hour, minute and second corresponding to each time as shown below:
years <- strftime(DO$time, "%Y")
months <- strftime(DO$time, "%m")
days <- strftime(DO$time, "%d")
hours <- strftime(DO$time, "%H")
minutes <- strftime(DO$time, "%M")
seconds <- strftime(DO$time, "%S")

#merge our two tables based on the time column
data <- merge(DO, Temp, by="time")
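
To illustrate what merge() does in this step, here is a minimal sketch using two small made-up tables. The values, timestamps, and the names do_toy, temp_toy, and merged_toy are invented for illustration and do not come from the Ipswich service. Because both tables have a DataValue column, merge() appends the suffixes .x and .y to tell them apart. By default it keeps only rows whose time appears in both tables.

```r
# Toy tables standing in for the DO and Temp results (values are made up)
do_toy <- data.frame(
  time = as.POSIXct(c("2020-06-01 10:00:00", "2020-06-08 10:00:00",
                      "2020-06-15 10:00:00"), tz = "UTC"),
  DataValue = c(8.1, 7.4, 6.9)
)
temp_toy <- data.frame(
  time = as.POSIXct(c("2020-06-01 10:00:00", "2020-06-15 10:00:00"),
                    tz = "UTC"),
  DataValue = c(14.2, 19.5)
)

# Inner join on time: the 2020-06-08 row has no temperature match and is dropped
merged_toy <- merge(do_toy, temp_toy, by = "time")

# The shared column name is disambiguated with .x (first table) and .y (second)
print(names(merged_toy))
print(nrow(merged_toy))
```

The same behavior would be consistent with the regression output later in this tutorial, which reports 17 residual degrees of freedom (19 paired rows) although the site has 21 temperature and 22 dissolved oxygen observations.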
#rename the column DataValue.x in the merged table to "DO"
names(data)[names(data)=="DataValue.x"] <- "DO"
#rename the column DataValue.y in the merged table to "Temp"
names(data)[names(data)=="DataValue.y"] <- "Temp"

plot(DO~Temp, data=data)

# Perform a linear regression on the dissolved oxygen vs. temperature values
model <- lm(DO~Temp, data=data)
summary(model)
abline(model)

The code creates two outputs when run in RStudio. First, it creates a scatter plot of dissolved oxygen concentration versus water temperature with the linear regression line.
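Second, it outputs the results from the regression analysis. From these results, there appears to be a significant negative linear relationship between water temperature and dissolved oxygen at this site (slope of about -0.17, p = 0.0026), with temperature explaining roughly 42% of the variance in dissolved oxygen.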
#> 
#> Call:
#> lm(formula = DO ~ Temp, data = data)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -2.1075 -1.2097 -0.5861  1.1340  3.1318 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  8.90404    0.75302  11.824 1.26e-09 ***
#> Temp        -0.16965    0.04813  -3.525   0.0026 ** 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.584 on 17 degrees of freedom
#> Multiple R-squared:  0.4223, Adjusted R-squared:  0.3883 
#> F-statistic: 12.43 on 1 and 17 DF,  p-value: 0.002599

This tutorial shows how you can use the WaterML package to access data from a CUAHSI HIS web service directly within R, without first downloading the data to your local computer. While this was demonstrated for a data service hosted by the Ipswich River Watershed Association, the WaterML R package can be used to access data from any compliant CUAHSI HIS web service, including the 100+ data services listed on the HIS Central website.
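
The quantities discussed in the regression output above can also be read from the fitted model object rather than from the printed summary. The sketch below assumes a small made-up data set (not the Ipswich observations) and uses only base R functions; the names toy, toy_model, and coefs are illustrative.

```r
# Made-up temperature and dissolved oxygen values, for illustration only
toy <- data.frame(
  Temp = c(5, 10, 15, 20, 25),
  DO   = c(11.8, 10.1, 9.2, 7.9, 7.1)
)

toy_model <- lm(DO ~ Temp, data = toy)

# Coefficient table as a matrix: Estimate, Std. Error, t value, Pr(>|t|)
coefs <- coef(summary(toy_model))

slope     <- coefs["Temp", "Estimate"]
p_value   <- coefs["Temp", "Pr(>|t|)"]
r_squared <- summary(toy_model)$r.squared

# A negative slope means dissolved oxygen decreases as temperature rises
print(slope < 0)
print(p_value < 0.05)
```

Applied to the model object from this tutorial, the same calls should return the Temp estimate, its p-value, and the multiple R-squared shown in the printed summary.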
For additional information on the tutorial and the WaterML R Package, please refer to:
Jiri Kadlec, Bryn StClair, Daniel P. Ames, Richard A. Gill (2015). WaterML R package for managing ecological experiment data on a CUAHSI HydroServer. Ecological Informatics, 28, 19-28. http://www.sciencedirect.com/science/article/pii/S1574954115000801