Introduction

EnviroData can take your custom R scripts and turn them into reports that users access through the EnviroData UI. We call these microservices. This article explains how to turn an R script into an EnviroData microservice. In this example, we’ll assume we’re using EnviroData R to collect the necessary data from EnviroData.

Specifically, we’ll cover the following:

  • Describing an example scenario of turning a script into a service
  • Basics of HTTP requests and JSON format
  • Setting up your script to accept JSON input

Example: A Report for Generating Stats

As a simple example, let’s imagine we have an R package where a user can generate statistics for a specific time series parameter across a range of dates. The main function in this script would probably look something like this:

generate_report <- function(timeseries_name, start_date, end_date){
  # Use EnviroData R to collect data
  EnviroDataR::authenticate_envirodata(username, password)
  data <- EnviroDataR::get_aquarius_report_data(start_date, end_date, timeseries_name)
  
  # Perform some analysis on the raw data
  analyzed_data <- function_to_analyze_data(data)
  
  # Output the analyzed data to an excel file and then download
  function_to_create_excel_file(analyzed_data)
}

As you can see, our function takes three arguments:

  1. timeseries_name
  2. start_date
  3. end_date

We then set up an access token with EnviroData using authenticate_envirodata(). (We assume we’ve already set username and password constants somewhere else in our package.) After that, we fetch whatever data we’re interested in.

Note that I’ve not included any information on the function_to_analyze_data() or function_to_create_excel_file() functions as this is up to you!
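Purely as an illustrative placeholder, here is one shape function_to_analyze_data() could take. It assumes, hypothetically, that the data returned by EnviroData R is a data frame with a numeric column called value. The real structure may well differ, so treat this as a sketch rather than a description of what the package returns.

```r
# Hypothetical stand-in for function_to_analyze_data().
# Assumption: `data` is a data frame with a numeric `value` column.
function_to_analyze_data <- function(data) {
  data.frame(
    n_observations = sum(!is.na(data$value)),
    mean_value     = mean(data$value, na.rm = TRUE),
    min_value      = min(data$value, na.rm = TRUE),
    max_value      = max(data$value, na.rm = TRUE)
  )
}

# Made-up sample data, only to show the function running
sample_data <- data.frame(value = c(1.2, 3.4, NA, 2.6))
analyzed_data <- function_to_analyze_data(sample_data)
analyzed_data
```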

When we use this package on our computers, we simply call the generate_report() function and pass it the arguments:

generate_report(timeseries_name = "My_cool_timeseries", start_date = "2020-01-01", end_date = "2021-01-01")

But our goal now is to turn this into an EnviroData microservice. That means we want someone to be able to go to EnviroData, select a timeseries and a date range, then hit a big red button and have it run our generate_report function with the selected timeseries and date range. Obviously this is different than calling the function from an R console as shown above. This means we’ll need to change our package slightly. Before we know WHAT we need to change, we first need to understand what happens when we hit that big red button.

HTTP Requests and JSON in 1 Minute or Less

Whenever a user presses that big red button, EnviroData will send an HTTP request to wherever our R script is hosted (this is a simplification, but for our purposes it’s all we need to know). In simple terms, an HTTP request is some amount of textual data sent from EnviroData to our R package. The format of this data is JSON, which stands for JavaScript Object Notation.

Remember, we’re going to be sending three pieces of data when that big red button is hit: timeseries name, start date, and end date. Here’s what it will look like in JSON format. (Strictly speaking, this is a JSON string stored in an R variable, since R Markdown can’t parse raw JSON.)

http_message = '{
   "timeseries_name" : "My_cool_timeseries",
   "start_date" : "2020-01-01",
   "end_date" : "2021-01-01"
}'

This is what EnviroData sends our package. That means our function is going to be called with this ONE variable as the argument. If it were run from an R console, this is probably what the call would look like:

generate_report(timeseries_name = http_message, start_date = NULL, end_date = NULL)

Hopefully it’s clear why this won’t work. Since EnviroData packaged all three of our arguments as one JSON object, our script was called with only one argument (http_message)! As a result, the other two arguments, start_date and end_date, were left blank.

What we need is a way of making our script understand how to read this HTTP message correctly.
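Before moving on, it can help to confirm that the message really is a single string, and that it is well-formed JSON. The sketch below uses the jsonlite package for this. The test_message variable is a hypothetical name introduced here, showing how you might build a payload of the same shape yourself when testing locally.

```r
library(jsonlite)

http_message <- '{
   "timeseries_name" : "My_cool_timeseries",
   "start_date" : "2020-01-01",
   "end_date" : "2021-01-01"
}'

# The whole request body is one character string, not three values
n_strings <- length(http_message)

# validate() reports whether a string is well-formed JSON
is_valid_json <- as.logical(validate(http_message))

# A payload of the same shape can be built from an R list for local testing.
# auto_unbox = TRUE writes each length-one value as a JSON scalar, not an array.
test_message <- as.character(toJSON(
  list(
    timeseries_name = "My_cool_timeseries",
    start_date = "2020-01-01",
    end_date = "2021-01-01"
  ),
  auto_unbox = TRUE
))
test_is_valid <- as.logical(validate(test_message))
```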

Setting up your script to parse HTTP requests

To call our function correctly, we need a way of taking the HTTP request and extracting the three arguments inside it. We’ll do this by creating a second function that accepts the HTTP request, parses it, saves all three arguments as variables, and then calls our original generate_report() function with those variables. We’ll call this the parse_http_request() function. (Behind the scenes, the devs at Hatfield will tell EnviroData to send the HTTP request to this function instead of our original generate_report() function.)

parse_http_request <- function(received_http_message){
  # Extract a list of arguments from the JSON received_http_message
  extracted_list <- jsonlite::fromJSON(received_http_message, simplifyDataFrame = FALSE)
  
  # Check to make sure that any blank values are actually set to NULL
  arguments_list <- purrr::map(extracted_list, function(x) {
    # || short-circuits, so is.null() is checked before the string comparisons
    if (is.null(x) || all(x == "NULL") || all(x == "")) {
      NULL
    } else {
      x
    }
  })
  
  # Ensure we select only the items we need
  final_arguments_list <- arguments_list[c("timeseries_name", "start_date", "end_date")]

  # Turn each item in final_arguments_list into a variable in our environment
  list2env(final_arguments_list, envir = .GlobalEnv)

  # Call our original function!
  generate_report(timeseries_name = timeseries_name, start_date = start_date, end_date = end_date)
}

We’ll go through this step by step. (Note: typically you’d abstract this into 2 or more functions, but for this tutorial we’ve kept it as one function.)

First, our function takes the received_http_message (which is a JSON string) and turns it into a named list of values. Here’s what the result of this step will look like:

extracted_list <- jsonlite::fromJSON(received_http_message, simplifyDataFrame = FALSE)
extracted_list
## $timeseries_name
## [1] "My_cool_timeseries"
## 
## $start_date
## [1] "2020-01-01"
## 
## $end_date
## [1] "2021-01-01"

Next, our function runs a simple check to make sure that blank values that should be NULL are actually set to NULL. It looks way more complicated than it actually is. Basically, we loop through every item in our list and check whether it’s equal to an empty string "", the string "NULL", or the value NULL. If it’s equal to one of those, we set the value to NULL. If not, we just leave it as is.
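To see this check in isolation, here is a minimal sketch that runs it on a made-up list in which two values arrive blank. It uses base R’s lapply() rather than purrr::map() so that it runs without extra packages, and blank_to_null is a hypothetical helper name introduced only for this example.

```r
# Hypothetical helper: return NULL for blank values, otherwise the value itself
blank_to_null <- function(x) {
  if (is.null(x) || all(x == "NULL") || all(x == "")) NULL else x
}

# Made-up input in which two of the three values arrive blank
extracted_list <- list(
  timeseries_name = "My_cool_timeseries",
  start_date = "",
  end_date = "NULL"
)

# lapply() keeps every name, even when the function returns NULL
arguments_list <- lapply(extracted_list, blank_to_null)

names(arguments_list)
is.null(arguments_list[["start_date"]])
is.null(arguments_list[["end_date"]])
```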

After that, we make sure that our list only consists of the items we need. Ideally the HTTP request will only send the information you need, but that won’t always be the case. The trimmed list is aptly named final_arguments_list.
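The selection step can be tried on its own. In the sketch below, the requested_by field is a hypothetical extra value invented for illustration, and the check for missing names is an optional addition that the tutorial’s function does not include.

```r
required_names <- c("timeseries_name", "start_date", "end_date")

# Made-up request that carries one extra field we do not need
arguments_list <- list(
  timeseries_name = "My_cool_timeseries",
  start_date = "2020-01-01",
  end_date = "2021-01-01",
  requested_by = "some_user"   # hypothetical extra field
)

# Optional: fail early, with a readable message, if an expected name is absent
missing_names <- setdiff(required_names, names(arguments_list))
if (length(missing_names) > 0) {
  stop("Missing from request: ", paste(missing_names, collapse = ", "))
}

# Subsetting by name keeps only the three items, in the order we list them
final_arguments_list <- arguments_list[required_names]
names(final_arguments_list)
```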

The final step is to save each item in our list as a variable in our environment using the list2env() function. This function takes the three items in final_arguments_list and creates three variables. The name of each variable is the name of the item in the list, and its value is that item’s value. So we’ll wind up with three variables: timeseries_name, start_date, and end_date.
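If list2env() is new to you, the following sketch shows what it does. It writes into a throwaway environment (demo_env, a name made up for this example) instead of the global environment, so you can run it without affecting your session.

```r
final_arguments_list <- list(
  timeseries_name = "My_cool_timeseries",
  start_date = "2020-01-01",
  end_date = "2021-01-01"
)

# A fresh, empty environment to receive the variables
demo_env <- new.env()
list2env(final_arguments_list, envir = demo_env)

# One variable now exists per list item, named after that item
created_variables <- ls(demo_env)
start_value <- get("start_date", envir = demo_env)
```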

All that’s left to do now is call our original function with these variables! We do just that in the final line of the code chunk above.
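Before handing the package over, you may want to try the whole flow locally. The sketch below is one way to do that, with two deliberate differences from the function above. First, generate_report() is replaced by a stub that only echoes its inputs, so no EnviroData connection is needed. Second, it passes the list straight to the function with do.call() rather than list2env(), which avoids creating global variables. This is an alternative for experimentation, not the approach this tutorial uses.

```r
library(jsonlite)

# Stub standing in for the real generate_report(); it only echoes its inputs
generate_report <- function(timeseries_name, start_date, end_date) {
  paste(timeseries_name, start_date, end_date, sep = " | ")
}

# Condensed variant of parse_http_request() for local experimentation
parse_http_request <- function(received_http_message) {
  extracted_list <- fromJSON(received_http_message, simplifyDataFrame = FALSE)

  # Same blank-value check as in the tutorial, written with base R's lapply()
  arguments_list <- lapply(extracted_list, function(x) {
    if (is.null(x) || all(x == "NULL") || all(x == "")) NULL else x
  })

  final_arguments_list <- arguments_list[c("timeseries_name", "start_date", "end_date")]

  # do.call() matches list names to argument names
  do.call(generate_report, final_arguments_list)
}

http_message <- '{
   "timeseries_name" : "My_cool_timeseries",
   "start_date" : "2020-01-01",
   "end_date" : "2021-01-01"
}'

result <- parse_http_request(http_message)
result
```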

Conclusion

By adding one function, we’ve successfully changed our package to accept HTTP requests from EnviroData. Note that you’ll also want to be in contact with the EnviroData dev team throughout this process, as they’ll need to set up a few things on their end. I’ve outlined the entire process below, including what you’ll want to tell the developers as well as some steps that are outside this tutorial’s scope.

  1. Build an analysis script

  2. Turn that script into an R package. See the Data Analytics Team’s SOP on Developing R Packages for more info.

  3. Identify the arguments that the main function in your package accepts.

  4. Imagine what you want the EnviroData UI to look like. (For example, do you want a big red button? Will users select start and end date using a calendar or a dropdown menu?)

  5. Speak to a Hatfield EnviroData developer and tell them you’d like to turn your R package into an EnviroData service. You’ll want to give them a brief overview of your package’s goal (i.e., what it does), a list of the arguments it accepts, and how you want the EnviroData UI to look.

  6. Alter your package as outlined in this tutorial. Specifically, you’ll need a function that can parse the incoming JSON formatted HTTP request.

  7. Wait for the dev team to link up your new service.
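For step 3, R can list a function’s arguments for you, which is handy when preparing the list you give the developers. The sketch below uses a stub with the same signature as this tutorial’s generate_report(); in practice you would call formals() on your package’s real function.

```r
# Stub with the same signature as the tutorial's generate_report()
generate_report <- function(timeseries_name, start_date, end_date) {
  NULL
}

# formals() returns the function's arguments; names() gives just their names
argument_names <- names(formals(generate_report))
argument_names
```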

Now that you know how to set up your package as an EnviroData service, you can spend more time focusing on writing powerful data analysis packages!