While TensorFlow models are typically defined and trained using R or Python code, it is possible to deploy TensorFlow models in a wide variety of environments without any runtime dependency on R or Python:
TensorFlow Serving is an open-source software library for serving TensorFlow models using a gRPC interface.
CloudML is a managed cloud service that serves TensorFlow models using a REST interface.
RStudio Connect provides support for serving models using the same REST API as CloudML, but on a server within your own organization.
TensorFlow models can also be deployed to mobile and embedded devices including iOS and Android mobile phones and Raspberry Pi computers.
The R interface to TensorFlow includes a variety of tools designed to make exporting and serving TensorFlow models straightforward. The basic process for deploying TensorFlow models from R is as follows:
Train a model using the keras, tfestimators, or tensorflow R packages.
Call the export_savedmodel()
function on your trained model to write it to disk as a TensorFlow SavedModel.
Use the serve_savedmodel()
function from the tfdeploy package to run a local test server that supports the same REST API as CloudML and RStudio Connect.
Deploy your model using TensorFlow Serving, CloudML, or RStudio Connect.
Begin by installing the tfdeploy package from CRAN as follows:
To demonstrate the basics, we’ll walk through an end-to-end example that trains a Keras model with the MNIST dataset, exports the saved model, and then serves the exported model locally for predictions with a REST API. After that we’ll describe in more depth the specific requirements and various options associated with exporting models. Finally, we’ll cover the various deployment options and provide links to additional documentation.
We’ll use a Keras model that recognizes handwritten digits from the MNIST dataset as an example. MNIST consists of 28 x 28 grayscale images of handwritten digits like these:
The dataset also includes labels for each image. For example, the labels for the above images are 5, 0, 4, and 1.
Here’s the complete source code for the model:
library(keras)
# load data
c(c(x_train, y_train), c(x_test, y_test)) %<-% dataset_mnist()
# reshape and rescale
x_train <- array_reshape(x_train, dim = c(nrow(x_train), 784)) / 255
x_test <- array_reshape(x_test, dim = c(nrow(x_test), 784)) / 255
# one-hot encode response
y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)
# define and compile model
model <- keras_model_sequential()
model %>%
layer_dense(units = 256, activation = 'relu', input_shape = c(784),
name = "image") %>%
layer_dense(units = 128, activation = 'relu') %>%
layer_dense(units = 10, activation = 'softmax',
name = "prediction") %>%
compile(
loss = 'categorical_crossentropy',
optimizer = optimizer_rmsprop(),
metrics = c('accuracy')
)
# train model
history <- model %>% fit(
x_train, y_train,
epochs = 35, batch_size = 128,
validation_split = 0.2
)
In R, it is easy to make predictions using the the trained model and R’s predict
function:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0 0 0 0 0 0 0 1 0 0
[2,] 0 0 1 0 0 0 0 0 0 0
[3,] 0 1 0 0 0 0 0 0 0 0
[4,] 1 0 0 0 0 0 0 0 0 0
Each row represents an image, each column represents a digit from 0-9, and the values represent the model’s prediction. For example, the first image is predicted to be a 7.
What if we want to deploy the model in an environment where R isn’t available? The following sections cover exporting and deploying the model with the tfdeploy package.
After training, the next step is to export the model as a TensorFlow SavedModel using the export_savedmodel()
function:
This will create a “savedmodel” directory that contains a saved version of your MNIST model. You can view the graph of your model using TensorBoard with the view_savedmodel()
function:
To test the exported model locally, use the serve_savedmodel()
function.
The REST API for the model is served under localhost with port 8989. Because we specified the browse = TRUE
parameter, a webpage that describes the REST interface to the model is also displayed. The REST interface is based on the CloudML predict request API.
The model can be used for prediction by making HTTP POST requests. The body of the request should contain instances of data to generate predictions for. The HTTP response will provide the model’s predictions. The data in the request body should be pre-processed and formatted in the same way as the original training data (e.g. feature scaling and normalization, pixel transformations for images, etc.).
For MNIST, the request body could be a JSON file containing one or more pre-processed images:
new_image.json
{
"instances": [
{
"image_input": [0.12,0,0.79,...,0,0]
}
]
}
The HTTP POST request would be:
Similar to R’s predict function, the response includes an array representing the digits 0-9. The image in new_image.json
is predicted to be a 7 (since that’s the column which has a 1
, whereas the other columns have values approximating zero).
{
"predictions": [
{
"prediction": [
1.3306e-24,
4.9968e-26,
1.8917e-23,
1.7047e-21,
0,
8.963e-33,
0,
1,
2.3306e-32,
2.0314e-22
]
}
]
}
Once you are satisifed with local testing, the next step is to deploy the model so others can use it. There are a number of available options for this including TensorFlow Serving, CloudML, and RStudio Connect. For example, to deploy the saved model to CloudML we could use the cloudml package:
The same HTTP POST request we used to test the model locally can be used to generate predictions on CloudML, provided the proper access to the CloudML API.
Now that we’ve deployed a simple end-to-end example, we’ll describe the process of Model Export and Model Deployment in more detail.
TensorFlow SavedModel defines a language-neutral format to save machine-learned models that is recoverable and hermetic. It enables higher-level systems and tools to produce, consume and transform TensorFlow models.
The export_savedmodel()
function creates a SavedModel from a model trained using the keras, tfestimators, or tensorflow R packages. There are subtle differences in how this works in practice depending on the package you are using.
The Keras Example above includes complete example code for creating and using SavedModel instances from Keras so we won’t repeat all of those details here.
To export a TensorFlow SavedModel from a Keras model, simply call the export_savedmodel()
function on any Keras model:
Keras learning phase set to 0 for export (restart R session before doing additional training)
Note the message that is printed: exporting a Keras model requires setting the Keras “learning phase” to 0. In practice, this means that after calling export_savedmodel
you can not continue to train models in the same R session.
It is important to assign reasonable names to the the first and last layers. For example, in the model code above we named the first layer “image” and the last layer “prediction”.
model %>%
layer_dense(units = 256, activation = 'relu', input_shape = c(784),
name = "image") %>%
layer_dense(units = 128, activation = 'relu') %>%
layer_dense(units = 10, activation = 'softmax',
name = "prediction")
The layer names are reflected in the structure of REST requests and responses to and from the deployed model.
Exporting a TensorFlow SavedModel from a TF Estimators model works exactly the same way as exporting a keras model, simply call export_savedmodel()
on the estimator. Here is a complete example:
library(tfestimators)
mtcars_input_fn <- function(data, num_epochs = 1) {
input_fn(data,
features = c("disp", "cyl"),
response = "mpg",
batch_size = 32,
num_epochs = num_epochs)
}
cols <- feature_columns(column_numeric("disp"), column_numeric("cyl"))
model <- linear_regressor(feature_columns = cols)
indices <- sample(1:nrow(mtcars), size = 0.80 * nrow(mtcars))
train <- mtcars[indices, ]
test <- mtcars[-indices, ]
model %>% train(mtcars_input_fn(train, num_epochs = 10))
export_savedmodel(model, "savedmodel")
Generating predictions is done in the same way as with exported Keras models. First, use serve_savedmodel()
to host the model locally. Once running, an HTTP POST request can be made:
curl -X POST "http://127.0.0.1:8089/predict/predict/" \
-H "accept: application/json" \
-H "Content-Type: application/json" \
-d "{ \"instances\": [ { \"disp\": [ 160 ], \"cyl\": [ 4 ] } ]}"
Each instance of new data should be formatted as a json array, and each element in the array should be a named array corresponding to the feature columns. This structure is similar to a named list in R.
The response is the predicted MPG:
{
"predictions": [
{
"predictions": [
8.4974
]
}
]
}
The tensorflow package provides a lower-level interface to the TensorFlow API. You can also use the export_savedmodel()
function to export models created with this API, however you need to provide some additional parmaeters indicating which tensors represent the inputs and outputs for your model.
For example, here’s an MNIST model using the core TensorFlow API along with the requisite call to export_savedmodel()
:
library(tensorflow)
sess <- tf$Session()
datasets <- tf$contrib$learn$datasets
mnist <- datasets$mnist$read_data_sets("MNIST-data", one_hot = TRUE)
# Note that we define x as the input tensor
# and y as the output tensor that will contain
# the scores. These are referenced in export_savedmodel
x <- tf$placeholder(tf$float32, shape(NULL, 784L))
W <- tf$Variable(tf$zeros(shape(784L, 10L)))
b <- tf$Variable(tf$zeros(shape(10L)))
y <- tf$nn$softmax(tf$matmul(x, W) + b)
y_ <- tf$placeholder(tf$float32, shape(NULL, 10L))
cross_entropy <- tf$reduce_mean(
-tf$reduce_sum(y_ * tf$log(y), reduction_indices=1L)
)
optimizer <- tf$train$GradientDescentOptimizer(0.5)
train_step <- optimizer$minimize(cross_entropy)
init <- tf$global_variables_initializer()
sess$run(init)
for (i in 1:1000) {
batches <- mnist$train$next_batch(100L)
batch_xs <- batches[[1]]
batch_ys <- batches[[2]]
sess$run(train_step,
feed_dict = dict(x = batch_xs, y_ = batch_ys))
}
export_savedmodel(
sess,
"savedmodel",
inputs = list(image_input = x),
outputs = list(scores = y))
Once the model is exported, the same process of using serve_savedmodel
can be used, and the same HTTP requests demonstrated in the Keras example can be used against the tensorflow model.
There are a variety of ways to deploy a TensorFlow SavedModel, each of which are described below. Of the 3 methods described, 2 of them (CloudML and RStudio Connect) share the same REST interface that we have been using with serve_savedmodel
to test locally. The REST interface is described in detail here: https://cloud.google.com/ml-engine/docs/v1/predict-request.
You can deploy TensorFlow SavedModels to Google’s CloudML service using functions from the cloudml package. For example:
Once deployed to CloudML, predictions can be made using the same REST interace we previously used locally. The HTTP POST requests will be similar to the sample requests, but CloudML additiobnally requires proper authorization.
See the Deploying Models article on the CloudML package website for additional details.
RStudio Connect is a publishing platform for applications, reports, and APIs created with R. Connect runs on-premise or in your own cloud infrastructure, giving you full control of the deployment environment. Connect can also integrate with your own security services and user management tools.
An upcoming version of RStudio Connect will include support for hosting TensorFlow SavedModels, using the same REST interface as is supported by the local server and CloudML.
Exported models will be published to Connect using the rsconnect
package, for example:
library(rsconnect)
deployTFModel('savedmodel', account = <username>, server = <internal_connect_server>)
If you would like to preview the feature, or get more information, contact sales@rstudio.com.
TensorFlow Serving is an open-source library and server implementation that allows you to serve TensorFlow SavedModels using a gRPC interface as opposed to the REST interface offered by the previous deployment tools.
Once you have exported a TensorFlow model using export_savedmodel()
it’s straightforward to deploy it using TensorFlow Serving. See the documentation at https://www.tensorflow.org/serving for additional details.