Image Recognition with Tensorflow classification on OpenWhisk

The big picture

As described in a previous article, we (Niklas and I) are going to use Tensorflow to classify images into pre-trained categories. The previous article was about how to train a model with Tensorflow on Kubernetes. This article describes how to use the pre-trained model, which is stored on Object Storage. As for the training, we use docker to host our program, but this time OpenWhisk is the platform.

As in the first part, I use the Google training Tensorflow for Poets. This time I do not use their code as it is; instead I copied the important classification parts from their script into my python file.

OpenWhisk with Docker

OpenWhisk is the open source implementation of a so-called serverless computing platform. It is hosted by Apache and maintained by many companies. IBM offers OpenWhisk on IBM Cloud, and for testing and playing around with it the use is free. Besides python and javascript, OpenWhisk also offers the possibility to run docker containers. Internally, all python and javascript code is executed in docker containers anyway. So we will use the same official Tensorflow docker container we used to build our training docker container.

Internally, OpenWhisk has three stages for docker containers. When we register a new method, only the execution instruction is stored in a database. As soon as the first call reaches OpenWhisk, the docker container is pulled from the repository, then initialised by a REST call to '/init' and then executed by calling the REST interface '/run'. The docker container stays active, and each time the method is called only the '/run' part is executed. After some time of inactivity the container is destroyed and needs to be initialised with '/init' again. After even more time of inactivity the image itself is removed and needs to be pulled again.
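The init/run lifecycle can be sketched without any web framework. This is only an illustration, not the code from the repository: the class Action and its method handle are made-up names, and the "model" is a placeholder dictionary. Only the two routes and the 'value' key of the request body come from the article.

```python
import json


class Action:
    """Toy stand-in for a container that OpenWhisk drives via /init and /run."""

    def __init__(self):
        self.model = None       # expensive state, loaded once in /init
        self.init_calls = 0

    def handle(self, path, body):
        """Dispatch one POST request; returns (status_code, response_dict)."""
        if path == "/init":
            # Called once after the container has been started.
            self.model = {"labels": ["daisy", "tulips"]}   # placeholder for the real graph
            self.init_calls += 1
            return 200, {"ok": True}
        if path == "/run":
            # Called for every invocation while the container stays warm.
            if self.model is None:
                return 502, {"error": "container was not initialised"}
            args = json.loads(body).get("value", {})
            return 200, {"received": sorted(args), "labels": self.model["labels"]}
        return 404, {"error": "unknown route"}


action = Action()
print(action.handle("/run", '{"value": {}}'))                  # cold container, /init missing
print(action.handle("/init", "{}"))                            # first call triggers /init
print(action.handle("/run", '{"value": {"payload": "..."}}'))  # warm call, only /run
print(action.handle("/run", '{"value": {"payload": "..."}}'))  # warm call again
print("init calls:", action.init_calls)
```

However often /run is called on a warm container, the loading work in /init happens only once.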

The setup

The code itself is stored on github. Let's first have a look at how we build the docker container:


FROM tensorflow/tensorflow:1.4.0-py3

WORKDIR /tensorflow
COPY requirements.txt requirements.txt
RUN  pip install -r   requirements.txt


CMD python -u

As you can see, this Dockerfile is now really simple. It basically installs the python requirements to access the SWIFT Object Store and starts the python program. The python program keeps running until the OpenWhisk system decides to stop the container.

We make heavy use of the idea of having an init and a run part in the executed code, so the python program has two main parts. The first one is init and the second one is run. Let's have a look at the init part first, which basically sets the stage for the classification itself.


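# Excerpt: assumes "import flask", "import tensorflow as tf" and
# "from swiftclient.client import Connection" at the top of the file, and that
# app (the Flask app), graph (a tf.Graph) and labels (a list) are module-level globals.
@app.route('/init', methods=['POST'])
def init():

    message = flask.request.get_json(force=True, silent=True)

    if message and not isinstance(message, dict):
        flask.abort(404)

    try:
        # Connect to the Object Store; all 'xxxxx' values are placeholders
        conn = Connection(key='xxxxx',
                          authurl='xxxxx',
                          auth_version='3',
                          os_options={"project_id": 'xxxxxx',
                                      "user_id": 'xxxxxx',
                                      "region_name": 'dallas'})

        # Read the pre-trained graph directly into memory
        obj       = conn.get_object("tensorflow", "retrained_graph.pb")
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(obj[1])
        with graph.as_default():
            tf.import_graph_def(graph_def)

        # Read the labels, which are separated by line breaks
        obj    = conn.get_object("tensorflow", "retrained_labels.txt")
        for i in obj[1].decode("utf-8").split():
            labels.append(i)

    except Exception as e:
        print("Error in downloading content")
        response = flask.jsonify({'error downloading models': str(e)})
        response.status_code = 512
        return response

    return ('OK', 200)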

Unfortunately it is not so easy to configure the init part dynamically with parameters from outside. So for this demo we need to build the Object Store credentials into our source code. It doesn't feel right, but for a demo it is ok. In a later article I will describe how to change the flow and inject the parameters dynamically. So what are we doing here?

  1. The Connection(...) call sets up a connection to the Object Store as described here.
  2. The block starting with the first conn.get_object call reads the pre-trained Tensorflow graph directly into memory. The graph is kept in a global variable so the run part can use it later.
  3. The second conn.get_object call reads the labels, which are basically a string of names separated by line breaks. The labels are in the same order as the categories in the graph.

By doing all this in the init part we only need to do it once, and the run part can concentrate on classifying the images without any time-consuming loading.
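The label handling can be tried out without an Object Store. The sketch below is a variation on the init code, not a copy of it: the helper name parse_labels is made up, and it uses splitlines() instead of split(), on the assumption that a label may contain a space. The flower names are the ones that appear in the sample result later in this article.

```python
def parse_labels(raw_bytes):
    """Turn the bytes of a labels file into a list of label names, one per line."""
    text = raw_bytes.decode("utf-8")
    # splitlines() keeps labels that contain spaces intact, strip() drops stray whitespace
    return [line.strip() for line in text.splitlines() if line.strip()]


# Example content of a labels file as it would arrive from conn.get_object(...)[1]
raw = b"daisy\ndandelion\nroses\nsunflowers\ntulips\n"
labels = parse_labels(raw)
print(labels)
```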

Tensorflow image manipulation and classification

# Excerpt: additionally assumes "import base64" and "import numpy as np" at the top of the file.
@app.route('/run', methods=['POST'])
def run():

    def error():
        response = flask.jsonify({'error': 'The action did not receive a dictionary as an argument.'})
        response.status_code = 404
        return response

    message = flask.request.get_json(force=True, silent=True)

    if message and not isinstance(message, dict):
        return error()
    else:
        args = message.get('value', {}) if message else {}

        if not isinstance(args, dict):
            return error()

        if "payload" not in args:
            return error()

        # Decode the base64 payload and write the image to a file
        with open("/test.jpg", "wb") as f:
            f.write(base64.b64decode(args['payload']))

        # Read the image back and bring it into the shape the graph expects
        file_reader      = tf.read_file("/test.jpg", "file_reader")
        #file_reader      = tf.decode_base64(args['payload'])
        image_reader     = tf.image.decode_jpeg(file_reader, channels=3, name='jpeg_reader')
        float_caster     = tf.cast(image_reader, tf.float32)
        dims_expander    = tf.expand_dims(float_caster, 0)
        resized          = tf.image.resize_bilinear(dims_expander, [224, 224])
        normalized       = tf.divide(tf.subtract(resized, [128]), [128])
        input_operation  = graph.get_operation_by_name("import/input")
        output_operation = graph.get_operation_by_name("import/final_result")
        tf_picture       = tf.Session().run(normalized)

        # Run the retrained graph on the prepared image
        with tf.Session(graph=graph) as sess:
            results = np.squeeze(sess.run(output_operation.outputs[0], {input_operation.outputs[0]: tf_picture}))
            index   = results.argsort()
            answer  = {}

            # Map each probability to its label name
            for i in index:
                answer[labels[i]] = float(results[i])

            response = flask.jsonify(answer)
            response.status_code = 200

    return response

How to get the image

The image is transferred base64 encoded as part of the request, under the dictionary key payload. I chose this name because Node-RED uses the same name for its most important key. Tensorflow has a function to consume base64 encoded data as well, but I could not get it to run with the image encoding I use. So I took a little extra step here: the image is written to a file (the with open("/test.jpg", "wb") block) and read back later. By consuming it directly I think we could save some milliseconds of processing time.
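Decoding the payload can be checked in isolation. The sketch below is an assumption about how one might guard this step, not part of the original action: decode_payload is a made-up helper, and the JPEG check only looks at the first two bytes of the file.

```python
import base64
import binascii


def decode_payload(payload):
    """Decode a base64 string and make sure the result looks like a JPEG file."""
    compact = "".join(payload.split())      # drop line breaks added by some encoders
    try:
        data = base64.b64decode(compact, validate=True)
    except (binascii.Error, ValueError) as exc:
        raise ValueError("payload is not valid base64") from exc
    # Every JPEG file starts with the two bytes FF D8
    if not data.startswith(b"\xff\xd8"):
        raise ValueError("payload is not a JPEG image")
    return data


# Round trip with a few made-up bytes that merely carry a JPEG header
fake_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 16
payload = base64.b64encode(fake_jpeg).decode("ascii")
image_bytes = decode_payload(payload)
print(len(image_bytes), "bytes decoded")
```

Rejecting a broken payload at this point gives the caller a clear error instead of a failure inside the Tensorflow jpeg decoder.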

Transfer the image

  • file_reader reads the image back from file
  • image_reader decodes the jpeg into an internal representation format
  • float_caster casts the values to a float32 array
  • dims_expander adds a new dimension at the beginning of the array
  • resized resizes the image to 224 x 224 to have a similar size as the training data
  • normalized normalizes the image values
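To see what these steps do to the numbers, here is the same arithmetic in plain python on a made-up 2 x 2 image. It only illustrates the cast, the extra dimension and the normalization; decoding and resizing are left out, and the helper name normalize is an assumption.

```python
def normalize(pixel, mean=128, std=128):
    """Same arithmetic as tf.divide(tf.subtract(x, [128]), [128]) for one value."""
    return (pixel - mean) / std


# A tiny 2x2 RGB "image" as nested lists: height x width x channels
image = [[[0, 128, 255], [64, 64, 64]],
         [[255, 255, 255], [0, 0, 0]]]

# Cast to float and normalize every channel value
normalized = [[[normalize(float(c)) for c in px] for px in row] for row in image]

# Adding a leading dimension turns one image into a batch of one image
batch = [normalized]

print(normalized[0][0])   # first pixel after normalization
print(len(batch), len(batch[0]), len(batch[0][0]), len(batch[0][0][0]))   # batch shape
```

Pixel values from 0 to 255 end up between -1.0 and just below 1.0, and the array shape changes from (height, width, 3) to (1, height, width, 3).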

Classify the image

  • input_operation and output_operation get the input and output layer and store them in variables
  • tf_picture loads the image into Tensorflow
  • The results line inside the with tf.Session(graph=graph) block is where the magic happens. Tensorflow processes the CNN with the input and output layer connected and consumes the Tensorflow image. Furthermore, numpy squeezes all array nesting out into a single array, so results holds one probability per category.
  • index = results.argsort() orders the categories by their probability.

Map the result to labels

The missing last step is to map the label names to the results, which is done in the for loop at the end.
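The mapping step in isolation, using the probabilities from the sample result shown later in this article. The label order is an assumption (alphabetical, matching that output), and the ranking at the end is an optional extra that the action itself does not perform.

```python
labels = ["daisy", "dandelion", "roses", "sunflowers", "tulips"]
results = [0.9998985528945923, 0.00007187054143287241, 4.515387388437375e-07,
           0.000029122467822162434, 4.63972159303605e-11]

# Pair every label with its probability, as the loop in run() does
answer = {label: float(score) for label, score in zip(labels, results)}

# Optional extra step: rank the categories, most likely first
ranking = sorted(answer.items(), key=lambda item: item[1], reverse=True)
best_label, best_score = ranking[0]
print(best_label, round(best_score, 4))
```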

Build and deploy it in OpenWhisk

The docker container can be built with

docker build -t <namespace>/tensorflow-openwhisk-classify:latest .

and pushed with

docker push <namespace>/tensorflow-openwhisk-classify:latest

Run it in OpenWhisk

After configuring the command line tool wsk the action itself can be created with

wsk action create tensorflow-classify --docker <namespace>/tensorflow-openwhisk-classify:latest

For testing we need a base64 encoded image as a file on our local hard disk. Then we can invoke the action with

wsk action invoke --result tensorflow-classify --param payload `cat test.base64`

The first execution will take up to 15 seconds because the docker container has to be pulled from docker hub and the graph has to be loaded from the Object Store. Later calls should take around 150 milliseconds of processing time. The parameter --result forces OpenWhisk to wait for the function to end and shows the result on your command line.

{
    "daisy": 0.9998985528945923,
    "dandelion": 0.00007187054143287241,
    "roses": 4.515387388437375E-7,
    "sunflowers": 0.000029122467822162434,
    "tulips": 4.63972159303605E-11
}

If you want to get the log file and also an exact execution time try this command:

wsk activation get `wsk activation list | grep tensorflow-classify | cut -f 1 -d " " |head -n 1`
  • First call results in "duration": 3805. The call itself took way longer the first time, because 3805 is only the execution of the docker container (including init), not the time it took OpenWhisk to pull the docker container from docker hub.
  • Second call results in "duration": 156.
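The wsk command talks to the OpenWhisk REST API, so the same invocation can be issued from any program, for example from a web UI backend. The following is a hedged sketch: host name, namespace and auth key are placeholders, build_invoke_request is a made-up helper name, and the request is only built, not sent.

```python
import base64
import json
import urllib.request


def build_invoke_request(api_host, namespace, action, auth_key, payload_b64):
    """Build (but do not send) a blocking invoke request for an OpenWhisk action."""
    url = ("https://{}/api/v1/namespaces/{}/actions/{}?blocking=true&result=true"
           .format(api_host, namespace, action))
    body = json.dumps({"payload": payload_b64}).encode("utf-8")
    # The wsk auth key has the form "user:password" and is sent as HTTP basic auth
    token = base64.b64encode(auth_key.encode("utf-8")).decode("ascii")
    headers = {"Content-Type": "application/json", "Authorization": "Basic " + token}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# All values below are placeholders, not real endpoints or credentials
request = build_invoke_request("openwhisk.example.com", "_", "tensorflow-classify",
                               "user:password", "aGVsbG8=")
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it and return the JSON result
```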

Build a web UI

Well, UI is nothing I can talk about. But have a look at Niklas' blog post on how to build a web UI. A test installation can be found here.

Image Recognition with Tensorflow training on Kubernetes

The big picture

Modern Visual Recognition is done with deep neural networks (DNN). One framework (and I would say the most famous one) to build this kind of network is Tensorflow from Google. Being open source and especially awesome, it is perfect to play around with and build your own Visual Recognition System. As compute power and especially RAM keep growing, much more complicated networks are now possible compared to the 90s, when there were only one or two hidden layers.

One architecture is the Convolutional Neural Network (CNN). The idea is very close to the structure of the brain: train a network intensively on gazillions of images and let it learn features inside the many hidden layers. Only the last layer connects features to real categories. Similar to our brain, the network learns concepts and patterns rather than the picture groups themselves.

After spending a lot of compute power to train these networks, they can easily be reused for new images by replacing only the last layer with a new one representing the categories to be trained. Training this network then means training only the connection between the new last layer and the rest of the network. This training is extremely fast (only minutes) compared to months for the complete network. The charming effect is that only the “mapping” from features to categories is trained. This is what we are doing now.
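A toy version of this idea in plain python: the feature vectors below stand in for what the frozen layers would produce, and only one small output unit is trained on top of them. This is not the retrain script from Tensorflow for Poets; the data, the learning rate and the single sigmoid unit are all made up for illustration.

```python
import math

# Pretend these 2-D vectors came out of the frozen layers ("bottleneck" features)
features = [(2.0, 0.5), (1.5, 1.0), (2.5, 0.2), (0.3, 2.0), (0.8, 1.8), (0.2, 2.5)]
targets = [0, 0, 0, 1, 1, 1]          # category index per image


def predict(weights, bias, x):
    """Probability of category 1 from a single sigmoid output unit."""
    z = weights[0] * x[0] + weights[1] * x[1] + bias
    return 1.0 / (1.0 + math.exp(-z))


# Only these three numbers are trained; the features themselves never change
weights, bias, rate = [0.0, 0.0], 0.0, 0.5
for step in range(200):
    for x, t in zip(features, targets):
        error = predict(weights, bias, x) - t      # gradient of the log loss w.r.t. z
        weights[0] -= rate * error * x[0]
        weights[1] -= rate * error * x[1]
        bias -= rate * error

predictions = [round(predict(weights, bias, x)) for x in features]
print(predictions)
```

Because the expensive feature extraction is fixed, each training step only touches a handful of parameters, which is why retraining takes minutes instead of months.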

Basically the development of such a system can be divided into two parts. The first part (training) is described here. For the “use”, aka classification, have a look at the second part on my blog. I developed this system together with a good friend and colleague of mine, Niklas Heidloff; check out his blog and twitter account. The described system has three main parts: two docker containers described in this blog and one epic frontend described in Niklas' blog. The source code can be found on github.


If you want to train a neural network (supervised learning) you need a lot of images in categories. Not ten or a hundred, but rather hundreds of thousands or even 15 million pictures. A wonderful source for this is Imagenet, with >14 million pictures organized in >20k categories, so a perfect source to train this kind of network. Google has done the same and participated in the Large Scale Visual Recognition Challenge (ILSVRC). Not only Google but many other research institutes build networks on top of Tensorflow in order to get better image recognition. The outcome is pre-trained models which can be used for systems like the one we want to build.

Tensorflow for poets

As always, it is best to stand on the shoulders of giants. So in our case we use the python code developed by Google at the codelabs. In this fascinating and content-rich online training on Tensorflow, Google developed python code to retrain the CNN and also to use the newly trained model to classify images. Well, actually our training part just uses the original code, wraps it into a docker container and connects this container to an Object Store. So not much new work there, but a nice and handy way to use this code for your own project. I highly recommend taking the 15 minutes for the online training to learn how to use Tensorflow and Python.

MobileNet vs. Inception

As discussed, there are many trained networks available; the most famous ones are Inception and MobileNet. Inception has a much higher classification rate but also needs more compute power, both for training and for classification. As we use kubernetes on “the cloud”, the training is not a big problem. But because we want to use the classifier later on OpenWhisk, we need to take care of the RAM usage (512MB). The docker container can be configured to train each model, but for OpenWhisk we are limited to MobileNet.

Build your own classifier

The architecture needs two containers. The first one loads the training images and the categories from an Object Store, trains the neural network and uploads the trained net back to the Object Store. This container can run on your laptop or somewhere in “the cloud”. As I developed a new passion for Kubernetes, I added a minimal yaml file to start the docker container on a Kubernetes cluster. Well, not really with multiple instances, as the python code only uses one container, but see it as some kind of “offloading” of the workload.

The second container (described in the next article) runs on OpenWhisk and uses the pre-trained network downloaded from the Object Store.

Use docker / kubernetes to train your model

We use the official Tensorflow docker container with python support as published by Google and the training script from Tensorflow for poets.


FROM tensorflow/tensorflow:1.4.0-py3

# Update repository and install git and zip
RUN apt-get update && \
    apt-get install -y git zip

# Install python requirements for swift
COPY requirements.txt requirements.txt
RUN  pip install -r   requirements.txt

# Get the tensorflow training scripts
RUN     git clone
WORKDIR /tensorflow-for-poets-2

# Copy the runtime script
RUN  chmod 700

CMD /tensorflow-for-poets-2/

The Dockerfile is straightforward. We use the Tensorflow docker image as base and install the git and zip packages (zip for unpacking the training data). Then we install all necessary python requirements. As all the Tensorflow related packages for Python are already installed, these packages are only for accessing the Object Store (see my blog article). Then we clone the official github tensorflow-for-poets repository, add our execution shell script and finish with the CMD to call this script.

Execution Script

#!/usr/bin/env bash

echo ${TF_MODEL}

export OS_AUTH_URL=

swift auth
swift download ${OS_BUCKET_NAME} ${OS_FILE_NAME}

unzip ${OS_FILE_NAME} -d tf_files/photos

python -m scripts.retrain                            \
       --bottleneck_dir=tf_files/bottlenecks         \
       --how_many_training_steps=5000                \
       --model_dir=tf_files/models/                  \
       --summaries_dir=tf_files/training_summaries   \
       --output_graph=tf_files/retrained_graph.pb    \
       --output_labels=tf_files/retrained_labels.txt \
       --architecture=${TF_MODEL}                    \

cd tf_files

swift upload tensorflow retrained_graph.pb
swift upload tensorflow retrained_labels.txt

All important and sensitive parameters are configured via environment variables passed in by the docker container call. The basic parameters that are always the same are set here: where to do the keystone authentication and which protocol version to use for the Object Store. The swift command downloads a zip file containing all training images in subfolders for each category. So you need to build a folder structure like this one:

.
|-- <category 1>
|   |-- <image 1>.jpg
|   |-- <image 2>.jpg
|-- <category 2>
|   |-- <image 1>.jpg
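Such a zip file can also be produced with a few lines of python. This is a sketch under assumptions: the category folders sit at the top level of the archive, the helper name zip_training_data is made up, and the demo uses empty placeholder files instead of real photos.

```python
import os
import tempfile
import zipfile


def zip_training_data(photo_dir, zip_path):
    """Pack <photo_dir>/<category>/<image> into a zip, keeping the category folders."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for category in sorted(os.listdir(photo_dir)):
            category_dir = os.path.join(photo_dir, category)
            if not os.path.isdir(category_dir):
                continue                      # ignore stray files next to the folders
            for name in sorted(os.listdir(category_dir)):
                archive.write(os.path.join(category_dir, name),
                              arcname=category + "/" + name)


# Demo with two made-up categories and empty placeholder files
work_dir = tempfile.mkdtemp()
photo_dir = os.path.join(work_dir, "photos")
for category, name in [("daisy", "1.jpg"), ("daisy", "2.jpg"), ("roses", "1.jpg")]:
    os.makedirs(os.path.join(photo_dir, category), exist_ok=True)
    open(os.path.join(photo_dir, category, name), "wb").close()

zip_path = os.path.join(work_dir, "photos.zip")
zip_training_data(photo_dir, zip_path)
with zipfile.ZipFile(zip_path) as archive:
    names = archive.namelist()
print(names)
```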

The execution script unpacks the training data and calls the retrain script from Tensorflow-for-poets. Important parameters are how_many_training_steps (can be reduced to speed things up for testing) and the architecture. As the architecture can be changed depending on how accurate the classifier has to be and how much memory is available for the classifier, it is passed in from outside via the TF_MODEL variable and handed to the script as a command line parameter.
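The architecture value is a small naming convention of its own. The helper below is a hypothetical sketch that takes such a value apart; reading the middle part as the relative size of the network and the last part as the input image size is my interpretation of the names, and the function does not check whether a given combination is actually supported by the retrain script.

```python
def parse_architecture(name):
    """Split a TF_MODEL value into (family, relative_size, image_size)."""
    if name == "inception_v3":
        return ("inception_v3", None, None)
    parts = name.split("_")
    if len(parts) != 3 or parts[0] != "mobilenet":
        raise ValueError("unknown architecture: " + name)
    return ("mobilenet", float(parts[1]), int(parts[2]))


print(parse_architecture("mobilenet_0.50_224"))
print(parse_architecture("inception_v3"))
```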

The image can be built with:

docker build -t <namespace>/tensorflow-openwhisk-trainer:latest .

and pushed with:

docker push <namespace>/tensorflow-openwhisk-trainer:latest


apiVersion: v1
kind: Pod
metadata:
  name: tensorflow-openwhisk-trainer
spec:
  restartPolicy: Never
  containers:
    - name: tensorflow-openwhisk-trainer
      image: ansi/tensorflow-openwhisk-trainer:latest
      imagePullPolicy: Always
      env:
      - name: OS_USER_ID
        value: xxxx
      - name: OS_PASSWORD
        value: xxxx
      - name: OS_PROJECT_ID
        value: xxxxxxx
      - name: OS_REGION_NAME
        value: dallas
      - name: OS_BUCKET_NAME
        value: tensorflow
      - name: OS_FILE_NAME
      - name: TF_MODEL
        value: mobilenet_0.50_224   # inception_v3, mobilenet_0.50_224, mobilenet_0.50_128, mobilenet_0.50_16

After building the docker container and pushing it to docker hub, this yaml file triggers Kubernetes to run the container with the given parameters, many of them taken from your Object Store credential file:

VCAP = {
  "auth_url": "",
  "project": "object_storage_07xxxxxx_xxxx_xxxx_xxxx_6d007e3f9118",
  "projectId": "512bfxxxxxxxxxxxxxxxxxxxxxxfe4e1",
  "region": "dallas",
  "userId": "4de3dxxxxxxxxxxxxxxxxxxxxxxx723b",
  "username": "member_caeae76axxxxxxxxxxxxxxxxxxxxxxxxxxxxxx7d",
  "password": "lfZxxxxxxxxxxxx.p",
  "domainId": "151fxxxxxxxxxxxxxxxxxxxxxxde602a",
  "domainName": "773073",
  "role": "member"
}

  • OS_USER_ID -> VCAP['userId']
  • OS_PASSWORD -> VCAP['password']
  • OS_PROJECT_ID -> VCAP['projectId']
  • OS_REGION_NAME -> VCAP['region']
  • OS_BUCKET_NAME -> Up to you, however you called it
  • OS_FILE_NAME -> Up to you, however you called it
  • TF_MODEL -> 'mobilenet_0.50_{imagesize}' or 'inception_v3'
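The mapping above can be written down as a small helper, which is handy when the yaml file is generated instead of edited by hand. The function name vcap_to_env and the file name photos.zip are made up for this sketch, and the credentials are placeholders.

```python
def vcap_to_env(vcap, bucket_name, file_name, model):
    """Translate Object Store credentials into the environment variables of the pod."""
    return {
        "OS_USER_ID": vcap["userId"],
        "OS_PASSWORD": vcap["password"],
        "OS_PROJECT_ID": vcap["projectId"],
        "OS_REGION_NAME": vcap["region"],
        "OS_BUCKET_NAME": bucket_name,
        "OS_FILE_NAME": file_name,
        "TF_MODEL": model,
    }


# Placeholder credentials; "photos.zip" is a made-up file name
vcap = {"userId": "xxxx", "password": "xxxx", "projectId": "xxxxxxx", "region": "dallas"}
env = vcap_to_env(vcap, "tensorflow", "photos.zip", "mobilenet_0.50_224")
for name, value in env.items():
    print("      - name: {}\n        value: {}".format(name, value))
```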

Use Object Store to store your trained class for later use

We decided to use Object Store to store our training data and also the re-trained network. This can be any other place as well, for example S3 on AWS or your local HDD. Just change the Dockerfile and exec file to download and upload your data correspondingly. More details on how to use the Object Store can be found in my blog article.


Accessing IBM Object Store from Python

IBM Object Store

IBM offers an S3 compatible Object Store as file storage. Besides S3, the storage can also be accessed via the SWIFT protocol by selecting a different deployment model. As the cost for this storage is extremely low compared to database storage, it is perfect for storing sensor data or other kinds of data for machine learning.

I use the storage, for example, to host my training data or trained models for Tensorflow. Access and payment for the Object Store are managed via IBM Cloud aka Bluemix. And as this offering is included in the Lite offering, the first 25GB are free. 🙂

As there is currently a problem getting the S3 credentials, I use the SWIFT access model. When you request the Object Store service, please make sure to select the SWIFT access model.

Python libs

As the SWIFT protocol is part of openstack, the python access client comes from the openstack project. Depending on the security access model you also need the openstack Identity API (Keystone). Both libs are on github (swiftclient and keystone) and also available via pip.

pip install python-swiftclient
pip install python-keystoneclient

Access storage

Inside the IBM Cloud web interface you can create or read existing credentials. If your program runs on IBM Cloud (Cloudfoundry or Kubernetes) the credentials are also available via the VCAP environment variable. In both cases they look like mine here:

{
  "auth_url": "",
  "project": "object_storage_xxxxxxxx_xxxx_xxxx_b35a_6d007e3f9118",
  "projectId": "512xxxxxxxxxxxxxxxxxxxxxe00fe4e1",
  "region": "dallas",
  "userId": "e8c19efxxxxxxxxxxxxxxxxxxx91d53e",
  "username": "admin_1xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxa66",
  "password": "fTxxxxxxxxxxw8}l",
  "domainId": "15xxxxxxxxxxxxxxxxxxxxxxxxxxxx2a",
  "domainName": "77xxx3",
  "role": "admin"
}

Important pieces of information are the projectId, region, userId and password. Access with keystone via the swift python client looks like this:

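from swiftclient.client import Connection
conn = Connection(key=VCAP['password'], authurl=VCAP['auth_url'] + '/v3', auth_version='3',
                  os_options={"project_id": VCAP['projectId'], "user_id": VCAP['userId'],
                              "region_name": VCAP['region']})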

The version information is important, and it is also part of the authurl.
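When the program runs on Cloud Foundry, the VCAP dictionary does not have to be typed in by hand. The sketch below assumes the credentials arrive in the VCAP_SERVICES environment variable; the service label "Object-Storage" in the simulated environment is an assumption, which is why the made-up helper find_object_store_credentials searches all services for a credentials block with a projectId instead of relying on the label.

```python
import json
import os


def find_object_store_credentials(environ=os.environ):
    """Return the first credentials block in VCAP_SERVICES that has a projectId."""
    services = json.loads(environ.get("VCAP_SERVICES", "{}"))
    for instances in services.values():
        for instance in instances:
            credentials = instance.get("credentials", {})
            if "projectId" in credentials:
                return credentials
    return None


# Simulated environment with placeholder values
fake_environ = {"VCAP_SERVICES": json.dumps({
    "Object-Storage": [{"credentials": {"projectId": "512x", "region": "dallas",
                                        "userId": "e8c1", "password": "secret"}}]})}
VCAP = find_object_store_credentials(fake_environ)
print(VCAP["region"])
```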

Accessing data

Objects can be read and written, containers (aka buckets) can be read and modified as described in the documentation. For example:

resp_headers, containers = conn.get_account()  # List containers
conn.put_container('containerName')  # Create new container
conn.put_object('container', 'file.txt', contents='some content')  # Write an object
resp_headers, obj_contents = conn.get_object('container', 'file.txt')  # Read an object
conn.delete_object('container', 'file.txt')  # Delete an object
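Code that uses these calls can be tried out without network access by passing in a stand-in object. The class InMemoryConnection below is such a made-up stand-in; it only mimics the call shapes of the four methods it implements and is not part of swiftclient. The helper copy_object is hypothetical as well.

```python
class InMemoryConnection:
    """Stand-in with the same call shapes as the swiftclient methods used above."""

    def __init__(self):
        self.containers = {}

    def put_container(self, container):
        self.containers.setdefault(container, {})

    def put_object(self, container, obj, contents):
        self.containers[container][obj] = contents

    def get_object(self, container, obj):
        return {}, self.containers[container][obj]     # (headers, contents)

    def delete_object(self, container, obj):
        del self.containers[container][obj]


def copy_object(conn, container, source, target):
    """Read one object and write its contents back under a new name."""
    headers, contents = conn.get_object(container, source)
    conn.put_object(container, target, contents=contents)


conn = InMemoryConnection()          # swap in a real swiftclient Connection here
conn.put_container("container")
conn.put_object("container", "file.txt", contents=b"some content")
copy_object(conn, "container", "file.txt", "backup.txt")
conn.delete_object("container", "file.txt")
print(sorted(conn.containers["container"]))
```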