Run your own docker registry with token-based authentication behind nginx


How to build a controlled environment to distribute docker images based on user accounts

Docker itself, AWS (just to name the biggest docker hosts right now) and many more public / private repository servers are on the market. But sometimes there is a need to host your own registry for docker images. One reason can be because we can; another is, for example, to give individual pull / push rights for different images to different users and to control access based on expiration dates as well.

Components and the big picture

For this setup we need several software components working together in an orchestrated way: the firewall to block all ports except 443 for HTTPS, the nginx reverse proxy to terminate the SSL connection, protect the underlying services against direct access and allow for load balancing, the docker registry to host the images and, last but not least, the docker token authenticator to identify users and grant access to images (push and/or pull) based on their rights.

With the second version of the registry protocol Docker introduced the “Docker registry authentication scheme“. It basically delegates access control for images to an outside system and uses the bearer token mechanism to communicate. The flow to access a docker image is (a curl sketch of the same flow follows the list):

  1. Docker daemon accesses the docker registry server as usual and gets a 401 Unauthorized in return with a “WWW-Authenticate” header pointing to the authentication server the registry server trusts.
  2. Docker daemon contacts the authentication server with the given URL and the user authenticates against the server.
  3. The authentication server checks the access rights based on username, password, image name and access type (pull/push) and returns a bearer token signed with the private key.
  4. Docker daemon accesses the docker registry again with the bearer token and the docker image request.
  5. Docker registry server checks the bearer token based on the authentication server public key and grants access or doesn’t.
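
The same flow can be traced by hand with curl, which helps a lot when debugging the setup described below. A rough sketch using the hostnames from this article and a user named waldi:

# 1. unauthenticated request: expect a 401 plus a WWW-Authenticate header pointing to the auth server
curl -i https://registry.23-5.eu/v2/

# 2./3. ask the auth server for a bearer token (service and scope as announced in that header)
curl -s -u waldi "https://auth.23-5.eu/auth?service=Docker%20registry&scope=repository:test:pull"

# 4./5. repeat the registry call with the returned token
curl -i -H "Authorization: Bearer <token from the previous call>" https://registry.23-5.eu/v2/test/tags/list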

Firewall

Ubuntu ships with a very simple firewall control script called “Uncomplicated Firewall“ (ufw). The script manages the iptables configuration and lets the user open ports with a single line. If you access the server via SSH, make sure you allow ssh before you activate the firewall. I also recommend installing fail2ban to ban brute-force login attempts.

sudo apt update
sudo apt install -y ufw fail2ban 
ufw allow ssh #only necessary when you need remote access
ufw allow https
ufw allow http
ufw enable 
ufw status

Nginx reverse proxy

We install Nginx as a docker service as well because the update cycle is way faster compared to the distribution's software repository. The official Nginx docker container is ready to be used and only needs the settings for http and https. Everything is handled via the https port, but we also keep http (port 80) open to redirect everything to https with a 301 (moved permanently) return code.

FROM docker.io/nginx:latest

COPY   default.conf /etc/nginx/conf.d/default.conf
COPY   ssl.conf     /etc/nginx/conf.d/ssl.conf
COPY   cert /cert 

EXPOSE 80
EXPOSE 443

This is a very simple Dockerfile that adds the ssl certificates and the http/https configuration. We could also mount the ssl files and the configuration in the docker-compose file and leave the image plain as it is. Both options are valid and just a matter of taste.

server {
    listen      80;
    listen [::]:80;
    server_name registry.23-5.eu auth.23-5.eu;
    return 301 https://$host$request_uri;
}

This is the http configuration for nginx: accept everything on http and return a 301 (moved permanently) to the same host and path, just with https.

SSL configuration

The SSL configuration is a little bit more complicated as we also specify the ciphers and parameters for the encryption. As this topic is endless and very easy to screw up, I personally rely on https://cipherli.st as a configuration source.

openssl dhparam -out dhparams.pem 4096

The recommendation is to generate your own Diffie–Hellman parameters with more than 2048 bit. This process can take a very long time. We add the resulting file together with our keys to the cert folder.

ssl_protocols              TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers  on;
ssl_dhparam                /cert/dhparams.pem;
ssl_ciphers                "ECDHE-RSA-AES256-GCM-SHA512:DHE-RSA-AES256-GCM-SHA512:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-SHA384";
ssl_ecdh_curve             secp384r1; 
ssl_session_cache          shared:SSL:10m;
ssl_session_timeout        10m;
ssl_session_tickets        off; 
ssl_stapling               on; 
ssl_stapling_verify        on; 
resolver                   8.8.8.8 9.9.9.9 valid=300s;
resolver_timeout           5s;
add_header                 Strict-Transport-Security "max-age=63072000; includeSubDomains; preload";
add_header                 X-Frame-Options DENY;
add_header                 X-Content-Type-Options nosniff;
add_header                 X-XSS-Protection "1; mode=block";

This configuration is based on the recommendation from cipherli.st. Be aware that one part of this setup is the Strict-Transport-Security header, which can cause a lot of long-term trouble if you mess it up. This completes the basic SSL setup.

map $upstream_http_docker_distribution_api_version $docker_distribution_api_version {
  '' 'registry/2.0';
}

This mapping helps to set the right header even when Nginx has removed it because of the authentication handling. The docker registry needs this information in the http header.

server {
    listen      443 ssl http2;
    listen [::]:443 ssl http2;

    server_name auth.23-5.eu;

    ssl_certificate         /cert/auth/fullchain.pem;
    ssl_certificate_key     /cert/auth/privkey.pem;
    ssl_trusted_certificate /cert/auth/chain.pem;

    location /auth {

        proxy_read_timeout    90;
        proxy_connect_timeout 90;
        proxy_redirect        off;

        proxy_set_header X-Real-IP         $remote_addr;
        proxy_set_header X-Forwarded-For   $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto https;
        proxy_set_header X-Forwarded-Port  443;
        proxy_set_header Host              $http_host;

        proxy_pass http://dockerauth:5001/auth;
    }
}

In this case we are running the registry and the auth server on the same virtual machine, therefore both configurations are in the ssl.conf file. This one is for the auth server.

server {
    listen      443 ssl http2;
    listen [::]:443 ssl http2;

    server_name  registry.23-5.eu;

    ssl_certificate         /cert/registry/fullchain.pem;
    ssl_certificate_key     /cert/registry/privkey.pem;
    ssl_trusted_certificate /cert/registry/chain.pem;

    client_max_body_size 0;
    chunked_transfer_encoding on;

    location /v2/ {

        if ($http_user_agent ~ "^(docker\/1\.(3|4|5(?!\.[0-9]-dev))|Go ).*$" ) {
          return 404;
        }

        add_header 'Docker-Distribution-Api-Version' $docker_distribution_api_version always;

        proxy_pass http://registry:5000;
        proxy_set_header  Host              $http_host;   # required for docker client's sake
        proxy_set_header  X-Real-IP         $remote_addr; # pass on real client's IP
        proxy_set_header  X-Forwarded-For   $proxy_add_x_forwarded_for;
        proxy_set_header  X-Forwarded-Proto $scheme;
        proxy_read_timeout                  900;
    } 
}

And this is the configuration for the registry server itself. Important here is the client_max_body_size parameter to make sure even bigger docker images get through. Older docker client versions get a 404 because they cannot be handled by the docker registry.
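
Once nginx is up, a quick smoke test from any machine shows whether the header mapping and the user-agent filter behave as intended; a small sketch:

# the API version header should be present even on the 401 response
curl -si https://registry.23-5.eu/v2/ | grep -i docker-distribution-api-version

# a legacy client (user agent starting with "Go ") should be rejected with 404
curl -si -A "Go 1.1 package http" https://registry.23-5.eu/v2/ | head -n 1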

Let's Encrypt

The easiest way to get a certificate is by using Let's Encrypt. There are different ways to receive a certificate; we just use a very simple one here with the standalone call. The certbot opens a mini web server on port 80 to handle the authentication request on its own, therefore make sure the Nginx docker is not running.

certbot certonly -d registry.23-5.eu --standalone
certbot certonly -d auth.23-5.eu     --standalone

for i in registry auth client
do
 cp /etc/letsencrypt/live/${i}.23-5.eu/chain.pem     /root/nginx/cert/${i}/
 cp /etc/letsencrypt/live/${i}.23-5.eu/fullchain.pem /root/nginx/cert/${i}/
 cp /etc/letsencrypt/live/${i}.23-5.eu/privkey.pem   /root/nginx/cert/${i}/
done

Do the certificate request call for the auth and the registry certificate and copy the certificates and private keys to your cert folder for the docker build to pick them up. Don't forget the dhparams.pem file.
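
Let's Encrypt certificates expire after 90 days, so the same dance has to be repeated regularly. A rough sketch of a renewal run, assuming the docker-compose setup described further down and the paths used above:

docker-compose stop nginx
certbot renew
for i in registry auth
do
 cp /etc/letsencrypt/live/${i}.23-5.eu/{chain,fullchain,privkey}.pem /root/nginx/cert/${i}/
done
docker-compose build nginx
docker-compose up -d nginx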

Docker registry

Now that the server is configured and more or less secured, let's configure the docker registry server and the auth server. Docker Inc. offers a registry docker container which is relatively easy to handle and to configure.

      - REGISTRY_AUTH=token
      - REGISTRY_AUTH_TOKEN_REALM=https://auth.23-5.eu/auth
      - REGISTRY_AUTH_TOKEN_SERVICE="Docker registry"
      - REGISTRY_AUTH_TOKEN_ISSUER="Acme auth server"
      - REGISTRY_AUTH_TOKEN_ROOTCERTBUNDLE=/ssl/domain.crt

The configuration is done in the docker-compose file itself. The important settings are the REALM, so the docker registry can redirect the client to the auth server, the issuer, and the cert bundle from the referenced auth server used to verify the bearer token later.

Docker Token Authenticator

Docker Inc. does not provide an auth server out of the box as it does with the registry itself. This is basically left to the registry provider to build their own. Luckily Cesanta stepped up and built a nice configurable auth server (docker_auth) to be used with the registry server. docker_auth supports different ways of storing information about the users:

  • Static list of users
  • Google Sign-In
  • Github Sign-In
  • LDAP bind
  • MongoDB user collection
  • External Program (gets login parameters and returns 0 or 1)

In our case the way to go is the MongoDB user collection, as we can control for each user individually who has access to which image and easily change it on the fly by modifying the user data in the DB itself.

server:  # Server settings.
  # Address to listen on.
  addr: ":5001"

token:
  issuer: "Acme auth server" # Must match issuer in the Registry config.
  expiration: 900
  certificate: "/ssl/domain.crt"
  key: "/ssl/domain.key"

mongo_auth:
  dial_info:
    addrs: ["authdb"]
    timeout: "10s"
    database: "23-5"
    username: "ansi"
    password_file: "/config/mongopass.txt"
    enabled_tls: false
  collection: "users"

acl_mongo:
  dial_info:
    addrs: ["authdb"]
    timeout: "10s"
    database: "23-5"
    username: "ansi"
    password_file: "/config/mongopass.txt"
    enabled_tls: false
  collection: "acl"
  cache_ttl: "10s"

This is the configuration file for the auth server. It has mainly four parts:

  • Server
    • Which port to listen on
    • Nginx handles the TLS termination, therefore, this server has no TLS handling.
  • Token
    • Use the same issuer as configured in the registry server itself and provide the certificate files for signing the bearer token.
  • Mongo_auth
    • Where the user information is stored, the password is saved in a simple ASCII file (see the sketch after this list), and how to access the MongoDB. In our case, as we are behind a firewall inside a docker network, we don't use TLS to access the MongoDB.
  • ACL_Mongo
    • Besides the user information, the AccessControlList (ACL) can also be stored in MongoDB. Same configuration as mongo_auth, plus a cache TTL, as the ACL is kept in memory and refreshed every 10 seconds.
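
The password_file entries above point to a plain text file containing only the MongoDB password of the user "ansi". A sketch of how to create it, assuming the /root/auth_server/config directory that is mounted into the container later (echo -n avoids a trailing newline, just to be safe):

mkdir -p /root/auth_server/config
echo -n 'test' > /root/auth_server/config/mongopass.txt
chmod 600 /root/auth_server/config/mongopass.txt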

MongoDB

mongo --host localhost --username root --password example --authenticationDatabase admin

use 23-5

db.createUser({user: "ansi", pwd: "test", roles: ["readWrite"], mechanisms: ["SCRAM-SHA-1"]})

mongo --host localhost --username ansi --password test --authenticationDatabase 23-5

db.users.insert({
    "username" : "waldi",
    "password" : "$2y$05$hxH........Ii33Csix8hC",
    "labels" : {"full-access":["test/*"],
                "read-only-access":["prod/*"]
               }
})

db.acl.insert([
  { "seq": 10,
    "match": {"name": "${labels:full-access}"},
    "actions": ["*"],
    "comment": "full access"
  },
  { "seq": 20,
    "match": {"name": "${labels:read-only-access}"},
    "actions": ["pull"],
    "comment": "pull access"
  }
])

The MongoDB was initialized by the docker-compose file with an admin user “root” and password “example”. We use this account to create a new database called “23-5” and a new user there with username “ansi” and password “test”. This database stores all users and ACLs. The passwords of the docker registry users themselves are stored bcrypt-hashed, together with some labels. Create a bcrypt hash for a password with:

sudo apt install apache2-utils
htpasswd -nB USERNAME

Besides username and password, we can also store labels of any kind for a given user. This allows us to use these labels in the ACLs again. In our case, the user “waldi” has full access to all docker images under “test/*” and only read access to everything under “prod/*”, but nothing else. ACLs have a seq number defining the order in which they are processed; the first matching ACL will be used.

Labels can also be combined, for example:

ACL:
{
  "match": { "name": "${labels:project}/${labels:group}-${labels:tier}" },
  "actions": [ "push", "pull" ],
  "comment": "Contrived multiple label match rule"
}
USER:
{
    "username" : "busy-guy",
    "password" : "$2y$05$B.x.......CbCGtjFl7S33aCUHNBxbq",
    "labels" : {
        "group" : [
            "web",
            "webdev"
        ],
        "project" : [
            "website",
            "api"
        ],
        "tier" : [
            "frontend",
            "backend"
        ]
    }
}

Would give push and pull access to the docker image

website/webdev-backend

These variables can be checked for the ACL:

  • ${account} the account name aka username
  • ${name} the repository name; “*” can be used, so for example “prod/*” gives access to “prod/server”

Generating bearer SSL key

In order to sign a bearer token we need a key pair. This can be a self-signed certificate created with openssl:

openssl req \
       -newkey rsa:4096 \
       -days 365 \
       -nodes -keyout domain.key \
       -out domain.csr \
       -subj "/C=EU/ST=Germany/L=Berlin/O=23-5/CN=auth.23-5.eu"

openssl x509 \
       -signkey domain.key \
       -in domain.csr \
       -req -days 365 -out domain.crt

openssl req \
        -x509 \
        -nodes \
        -days 365 \
        -newkey rsa:2048 \
        -keyout server.key \
        -out server.pem
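
A quick way to double-check what was generated before wiring it into both containers (the registry reads domain.crt as ROOTCERTBUNDLE, the auth server signs with domain.key) is:

openssl x509 -in domain.crt -noout -subject -dates
openssl rsa  -in domain.key -noout -check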

Docker-compose

We can configure and start the auth and registry server and nginx with one docker-compose file:

version: '3'

services:

  nginx:
    restart: always
    build:
      context: nginx
    ports:
      - 80:80
      - 443:443

  mongoclient:
    image: docker.io/mongoclient/mongoclient:latest
    restart: always
    depends_on:
      - authdb
    ports:
      - 3000:3000
    environment:
      - TZ=Europe/Berlin
      - STARTUP_DELAY=1

  authdb:
    image: docker.io/mongo:4.1
    restart: always
    volumes:
      - /root/auth_db:/data/db
    environment:
      - TZ=Europe/Berlin
      - MONGO_INITDB_ROOT_USERNAME=root
      - MONGO_INITDB_ROOT_PASSWORD=example
    ports:
      - 27017:27017
    command: --bind_ip 0.0.0.0

  dockerauth:
    image: docker.io/cesanta/docker_auth:1
    volumes:
      - /root/auth_server/config:/config:ro
      - /root/auth_server/ssl:/ssl:ro
    command: --v=2 --alsologtostderr /config/auth_config.yml
    restart: always
    environment:
      - TZ=Europe/Berlin

  registry:
    image: docker.io/registry:2
    volumes:
      - /root/auth_server/ssl:/ssl:ro
      - /root/docker_registry/data:/var/lib/registry
    restart: always
    environment:
      - TZ=Europe/Berlin
      - REGISTRY_AUTH=token
      - REGISTRY_AUTH_TOKEN_REALM=https://auth.23-5.eu/auth
      - REGISTRY_AUTH_TOKEN_SERVICE="Docker registry"
      - REGISTRY_AUTH_TOKEN_ISSUER="Acme auth server"
      - REGISTRY_AUTH_TOKEN_ROOTCERTBUNDLE=/ssl/domain.crt

I also added a mongoclient docker container to have easy access to the MongoDB server. Please be aware that this one is not secured by the nginx reverse proxy and is only meant for testing. You can also access the MongoDB from the command line:

docker exec -it root_authdb_1 mongo --host localhost --username root --password example --authenticationDatabase admin

The MongoDB docker container is also started with a different command (--bind_ip 0.0.0.0) to allow access from outside of localhost.

Testing

docker-compose build 
docker-compose up -d

This starts the setup. We have a docker registry user “waldi” with this configuration:

[{"username": "waldi",
  "password": "$2......dKOIrAn.KxCfeEn7HhePFIO",
  "labels": {"full-access": ["test", "socke*"]}
  }
]

[{"seq": 10,
  "match":{"name": "${labels:full-access}"},
  "actions":["*"],
  "comment": "full access"
 },{
  "seq": 20,
  "match":{"name": "${labels:read-only-access}"},
  "actions":["pull"],
  "comment": "pull access"
  }
]

So user “waldi” can write and read all repositories named either “test” or anything starting with “socke”. Let’s try it.

$ docker login registry.23-5.eu
Authenticating with existing credentials...
Login Succeeded

$ docker pull nginx
Using default tag: latest
latest: Pulling from library/nginx
Status: Image is up to date for nginx:latest

$ docker tag nginx:latest registry.23-5.eu/test:latest

$ docker push registry.23-5.eu/test:latest
The push refers to repository [registry.23-5.eu/test]
fc4c9f8e7dac: Pushed 
912ed487215b: Pushed 
latest: digest: sha256:c10f4146f30fda9f40946bc114afeb1f4e867877c49283207a08ddbcf1778790 size: 948

$ docker tag nginx:latest registry.23-5.eu/socken-test:latest

$ docker push registry.23-5.eu/socken-test:latest            
The push refers to repository [registry.23-5.eu/socken-test]
fc4c9f8e7dac: Mounted from test 
912ed487215b: Mounted from test 
5dacd731af1b: Mounted from test 
latest: digest: sha256:c10f4146f30fda9f40946bc114afeb1f4e867877c49283207a08ddbcf1778790 size: 948

It works. Now let’s test the negative part and check that the push gets refused:

$ docker tag nginx:latest registry.23-5.eu/test-socke:latest 

$ docker push registry.23-5.eu/test-socke:latest     
The push refers to repository [registry.23-5.eu/test-socke]
fc4c9f8e7dac: Preparing 
912ed487215b: Preparing 
5dacd731af1b: Preparing 
denied: requested access to the resource is denied

It works! Users can be modified on the fly in the MongoDB to grant or revoke rights. There is one final test to check if the Nginx setup is secure: https://www.ssllabs.com/ssltest/index.html.

Fixing the caching_sha2 problem with WordPress and MySQL version 8

The problem

I am using WordPress with MySQL, both in a docker installation. The procedure for my setup is described here. With version 8 MySQL introduced caching_sha2 as the default password algorithm. When you use the auto update mechanism in WordPress everything is fine and WordPress still works with the native password plugin configured for the WordPress user. But if you run WordPress in a docker container and pull wordpress:latest, there is a problem since WordPress 4.9.7 with accessing the MySQL database: (Never thought I could use the word WordPress so many times in one sentence!)

Warning: mysqli::__construct(): Unexpected server respose while doing caching_sha2 auth: 109 in Standard input code on line 22
Warning: mysqli::__construct(): MySQL server has gone away in Standard input code on line 22
Warning: mysqli::__construct(): (HY000/2006): MySQL server has gone away in Standard input code on line 22
MySQL Connection Error: (2006) MySQL server has gone away

The solution

The solution is relatively easy. You need to change the WordPress database user manually from “mysql_native_password” to “caching_sha2_password“. This can be done with a simple SQL call. First stop your WordPress docker container and keep the MySQL docker container running, then execute these commands.

docker exec -it blog_wordpress_db_1 bash
mysql -u root -pREALLYEPICSECURE
ALTER USER wordpressuser IDENTIFIED WITH caching_sha2_password BY 'REALLYEPICSECURE';
exit
exit

Replace blog_wordpress_db_1 with your MySQL docker instance name (see “docker ps”), “REALLYEPICSECURE” with the root password (for the mysql login) respectively the WordPress user's database password (in the ALTER USER statement), and “wordpressuser” with your WordPress database username.
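
The same change can be applied non-interactively, which is handy for scripting; a sketch with the placeholder names from above:

docker exec -i blog_wordpress_db_1 \
  mysql -u root -pROOTPASSWORD \
  -e "ALTER USER wordpressuser IDENTIFIED WITH caching_sha2_password BY 'WORDPRESSPASSWORD';"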

That is basically all. Now you can start your wordpress:latest docker container again and it should work.

 

Serious weather conditions in your calendar

The need

Calendar

I really like to plan my day in my calendar. Therefore I added a lot of external ical feeds like meetup, open-air cinema and of course launchlibrary. In order to decide on transportation I always had the Weather Underground page open in a separate browser tab. This is very inconvenient, therefore I wrote a small script that gets weather predictions via an API call from Wunderground, exports an ical feed and updates my google calendar with weather conditions.

Wunderground

Weather Underground is (or at least was for many years) the coolest weather page on the internet. Really great UI and a wonderful API to get current weather conditions and weather predictions for the next 10 days. Furthermore (and that is why I really really like it) users can send their own weather sensor data to the site to enhance the sensor mesh network and get a nice visualization. Unfortunately the service is losing features on a monthly basis and the page itself is also down for several hours every now and then. Very sad, but I still love it.

As I said, they have a nice API to get the weather forecast for the next 10 days on an hourly basis. OK, we can all discuss how dependable a weather prediction for a certain hour in 8 days is, but at least for the next days it is really helpful. I am using the forecast10day and the hourly10day API endpoints to get a nicely formatted JSON document from Wunderground. If you want to run this script for your own area you need an account and an API key as the calls are restricted (but free).
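
Both endpoints can be explored from the command line before writing any code; a quick sketch (replace the API key, the station id is the one used below):

curl -s "http://api.wunderground.com/api/YOUR-API-KEY/forecast10day/q/pws:IBERLIN1705.json" | python -m json.tool | head -n 40
curl -s "http://api.wunderground.com/api/YOUR-API-KEY/hourly10day/q/pws:IBERLIN1705.json"   | python -m json.tool | head -n 40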

PWS

My favorite makerspace (Motionlab.berlin) has an epic weather phalanx (as I love to call it) and sends its local weather conditions to Wunderground. Therefore, besides asking for weather conditions in a city, I can ask for weather conditions based on a certain weather reporting station. In our case it's the IBERLIN1705 station. Check out the current conditions here.

Forecast10day

The API call to http://api.wunderground.com/api/YOUR-API-KEY-HERE/forecast10day/q/pws:IBERLIN1705.json returns, for each day of the next 10 days, information about humidity, temperature (min/max), snow, rain, wind and much more. I take these data and create one calendar entry each morning at 06:00-06:15 with summary information for the day. Especially for days beyond the 4-day boundary this summary is more accurate than the hourly information. Getting this information in python is very easy:

import json
import requests

# simplified excerpt of the exporter's forecast handling, wrapped in a function for clarity
def forecast():
    try:
        data = json.loads(requests.get("http://api.wunderground.com/api/YOUR-API-HERE/forecast10day/q/pws:IBERLIN1705.json").content)
    except:
        print("Error in Forecast")
        return False

    for e in data['forecast']['simpleforecast']['forecastday']:
        day        = e['date']['day']
        month      = e['date']['month']
        year       = e['date']['year']
        conditions = e['conditions']
        humidity   = e['avehumidity']
        high       = e['high']['celsius']
        low        = e['low']['celsius']
        snow       = e['snow_allday']['cm']
        rain       = e['qpf_allday']['mm']

I am using requests to make the REST call and parse the “content” value with json.loads. Easy as it looks. The data variable contains the dictionary with all weather information on a silver platter (if the API is not down, which happens way too often).

Hourly10day

http://api.wunderground.com/api/YOUR-API-KEY/hourly10day/q/pws:IBERLIN1705.json contains the weather information on an hourly basis for the next 10 days, so the parsing is very similar to the forecast API call. I am especially interested here in rain, snow, temperature, wind, dew point and UV index, as these are the values I want to monitor and add calendar entries for when they are outside a certain range:

  • Wind > 23 km/h
  • Temperature > 30 or < -10 C
  • UV-Index > 4 (6 is max)
  • Rain and Snow in general
  • (Temperature – Dew point) < 3

Humidity in general is not so important and highly dependent on the current temperature. But the dew point (“the atmospheric temperature (varying according to pressure and humidity) below which water droplets begin to condense and dew can form”) is very interesting when you want to know if it is getting muggy. Even when it is 10 C, a very low difference between temperature and dew point means you really feel the cold crawling into your bones. 🙂

Ical

To create an ical feed I use the icalendar library in python. Very handy to create events and export them as an ical feed.

from datetime  import datetime
from pytz      import timezone
from icalendar import Calendar, Event

newcal = Calendar()

event = Event()
event.add('summary', "%s-%sC %s%% Rain:%s Snow:%s %s" % (low, high, humidity, rain, snow, conditions))
event.add('dtstart', datetime(year, month, day, 6,  0, 0, 0, timezone('Europe/Berlin')))
event.add('dtend',   datetime(year, month, day, 6, 15, 0, 0, timezone("Europe/Berlin")))
event.add('description', DESC)

newcal.add_component(event)
return newcal.to_ical()

The summary will be the text your calendar program shows in the calendar view, while the description will be displayed when showing the calendar entry details. “dtstart” and “dtend” mark the time range. For the timezone I use the pytz library, and “to_ical()” renders the whole calendar as an ical feed. That's basically all you need to create an ical feed.

Google

Google Calendar can import and subscribe to calendars. While import adds the calendar entries to an existing calendar once (great for concerts or public transport bookings), subscribe creates a new calendar and updates the feed only every 24 hours or more. This is great for long-lasting events like meetups or rocket starts, but weather predictions change several times per hour. Therefore I added a small feature to the script to actively delete and create calendar entries. So I can do it every 3 hours and keep the calendar up to date.

As always, Google offers nice and very handy API endpoints to manipulate the data. Besides calling the REST endpoints by hand there are libraries for different languages. I use “googleapiclient” and “oauth2client” to access my calendar. The first step is to create a new calendar in Google, then activate the Calendar API in the developer console and create an API key for your app. The googleapiclient takes care of the OAuth dance and stores the credentials in a local file.

from httplib2                  import Http
from oauth2client              import file, client, tools
from googleapiclient.discovery import build

store = file.Storage('token.json')
creds = store.get()

if not creds or creds.invalid:
  flow  = client.flow_from_clientsecrets('credentials.json', SCOPES)
  creds = tools.run_flow(flow, store)

return build('calendar', 'v3', http=creds.authorize(Http()))

If you call this function for the very first time it requires the OAuth dance: basically open a web page and give access to your Google calendar. The secrets are stored in the token.json file and reloaded on every call.

Deleting old events

service       = getService()
events_result = service.events().list(calendarId=CALENDAR_ID, maxResults=100, singleEvents=True, orderBy='startTime').execute()
events        = events_result.get('items', [])
        
for e in events:
  service.events().delete(calendarId=CALENDAR_ID, eventId=e['id']).execute()

“getService” calls the function above to get an access object. “events().list().execute()” requests a list of the first 100 calendar entries, “events_result.get()” returns an array with all calendar entries and their details, and “service.events().delete().execute()” removes these entries.

Creating new events

ge = {
       'summary'    : '',
       'description': DESC,
       'start': {
                 'dateTime' : '',
                 'timeZone' : 'Europe/Berlin',
                },
       'end':   {
                 'dateTime' : '',
                 'timeZone' : 'Europe/Berlin',
                }
     }

ge['summary']           = "%s-%sC %s%% Rain:%s Snow:%s %s" % (low, high, humidity, rain, snow, conditions)
ge['start']['dateTime'] = '%s-%s-%sT06:00:00' % (year, month, day)
ge['end'  ]['dateTime'] = '%s-%s-%sT06:15:00' % (year, month, day)

service = getService()
service.events().insert(calendarId=CALENDAR_ID, body=ge).execute()

Very similar to the delete calls, the add call gets the credentials and calls “events().insert().execute()” with a dictionary containing the detailed information.

Docker container

The docker container is very simple.

FROM python:latest

RUN pip install icalendar requests Flask oauth2client google-api-python-client iso8601

ADD Exporter.py      Exporter.py
ADD credentials.json credentials.json
ADD token.json       token.json

EXPOSE 80

CMD python /Exporter.py

I am using the latest python docker container, installing some libraries with pip and copying the python file as well as the credentials and token json files.
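
Building and running the exporter is then a matter of two commands; a sketch with an image name and port mapping of my own choosing (not from the original setup):

docker build -t weather-exporter .
docker run -d --restart=always -p 8080:80 weather-exporter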

The repo

The complete source code can be found in my github repository.

The calendar for Berlin weather conditions can be found and added here.

 

Workload container for autoscaling test with kubernetes

Workload

The Idea

Every now and then you want to test your installation, your server or your setup, especially when you want to test auto scaling functionality. Kubernetes has an out-of-the-box auto scaler and the official documentation recommends a test docker container with an apache and php installation. This is really great for testing a web application where you have some workload for a relatively short time frame. But I would also like to test a scenario where the workload runs for a longer time in the kubernetes setup and generates way more cpu load than a web application. Therefore I hacked a nice docker container based on a load generator written in C.

The docker container

The docker container is basically a very simple Flask server with only one entry point “/”. The workload itself can be configured via two parameters:

  • percentage: how much cpu load will be generated
  • seconds: how long the workload will be active

The docker container itself uses nearly no CPU cycles, as Flask is the only active python process, and waits for calls before it starts consuming CPU cycles.

lookbusy

I use a very nice open source tool called lookbusy from Devin Carraway which consumes memory and cpu cycles based on command line parameters. Unfortunately the program has no parameter to configure the time span it should run. Therefore I call it with the unix command timeout to terminate its execution after the given amount of seconds.
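
The combination the wrapper uses boils down to a single shell line, which is handy for a quick local test (assuming lookbusy is installed, e.g. after make install as in the Dockerfile below):

# burn roughly 75% of each cpu for 30 seconds, then stop
timeout 30 lookbusy -c 75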

The Flask python wrapper

import subprocess
from   threading import Thread
from   flask     import Flask, request

app = Flask(__name__)

def worker(percentage, seconds):
    subprocess.run(['timeout', str(seconds), '/usr/local/bin/lookbusy', '-c', str(percentage)])

@app.route('/')
def load(): 
    percentage = request.args.get('percentage') if "percentage" in request.args else 50
    seconds    = request.args.get('seconds')    if "seconds"    in request.args else 10
    Thread(target=worker, args=(percentage, seconds)).start()
    return "started"

if __name__ == "__main__":
    app.run(host='0.0.0.0', port=80, processes=10)

The only program is a python Flask one, very short. It takes the GET call to its root path, checks for the two parameters and starts a thread with the subprocess. The GET call returns immediately, as it also has to support long-running workload simulations.

The Dockerfile

FROM   python:latest
RUN    curl http://www.devin.com/lookbusy/download/lookbusy-1.4.tar.gz | tar xvz && \
       cd lookbusy-1.4 && ./configure && \
       make && make install && cd .. && rm -rf lookbusy-1.4
RUN    pip install Flask
ADD    server.py server.py
EXPOSE 80
CMD    python -u server.py

The docker container is based on python latest (at this time 3.6.4). I put all the curl, make, install and rm calls into a single line in order to have a minimal footprint for the docker layer, as we do not need the source code any more. As Flask is the only requirement I also install it directly without a requirements.txt file. The “-u” parameter for the python call is necessary to prevent python from buffering the output; otherwise it can be quite disturbing when trying to read the debug log.

Building and pushing the docker container

docker build -t ansi/lookbusy .
docker push     ansi/lookbusy

Building and pushing it to hub.docker.com is straightforward and nothing special.

Testing it on a kubernetes cluster

I have chosen the IBM cloud to test my docker container.

Requesting a kubernetes cluster

Requesting a kubernetes cluster can be done after login with

bx cs cluster-create --name ansi-blogtest --location dal10 --workers 3 --kube-version 1.8.6 --private-vlan 1788637 --public-vlan 1788635 --machine-type b2c.4x16

This command uses the bluemix CLI with the cluster plugin to control and configure kubernetes on the IBM infrastructure. The parameters are

  • --name to give your cluster a name (will be very important later on)
  • --location which datacenter to use (in this case Dallas). Use “bx cs locations” to get the possible locations for the chosen region
  • --workers how many worker nodes are requested
  • --kube-version which kubernetes version should be used. Use “bx cs kube-versions” to get the available versions. “(default)” is not part of the parameter call.
  • --private-vlan which vlan for the private network should be used. Use “bx cs vlans <location>” to get the available public and private vlans
  • --public-vlan see private vlan
  • --machine-type which kind of underlying configuration you want to use for your worker node. Use “bx cs machine-types <location>” to get the available machine types. The first number after the “.” is the amount of cores and the one after “x” the amount of RAM in GB.

This command takes some time (~1h) to create the kubernetes cluster. BTW, my bluemix cli docker container has all the necessary tools and also a nice script called “start_cluster.sh” to query all parameters and start a new cluster. After the cluster is up and running we can get the kubernetes configuration with

bx cs cluster-config ansi-blog
OK
The configuration for ansi-blogtest was downloaded successfully. Export environment variables to start using Kubernetes.

export KUBECONFIG=/root/.bluemix/plugins/container-service/clusters/ansi-blog/kube-config-dal10-ansi-blog.yml
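
To actually use this configuration in the current shell, export the variable exactly as printed and verify that the workers are visible; a short sketch:

export KUBECONFIG=/root/.bluemix/plugins/container-service/clusters/ansi-blog/kube-config-dal10-ansi-blog.yml
kubectl get nodes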

Starting a pod and replica set

kubectl run loadtest --image=ansi/lookbusy --requests=cpu=200m

We start the pod and replica set without a yaml file because the request is very straightforward. Important here is the parameter "--requests". Without it the autoscaler cannot measure the cpu load and will never trigger.

Exposing the http port

kubectl expose deployment loadtest --type=LoadBalancer --name=loadtest --port=80

Again, because the call is so simple, we directly call kubectl without a yaml file to expose port 80. We can check for the public IP with

kubectl get svc
NAME     TYPE         CLUSTER-IP   EXTERNAL-IP PORT(S)      AGE
loadtest LoadBalancer 172.21.3.160 <pending>   80:31277/TCP 23m

In case the cloud runs out of public IP addresses and the “EXTERNAL-IP” is still pending after several minutes, we can use one of the workers' public IP addresses and the dynamically assigned port. The port is visible with “kubectl get svc” in the “PORT(S)” section; the syntax is serviceport:nodeport. The workers' public IPs can be checked with

bx cs workers ansi-blog
ID                                               Public IP     Private IP     Machine Type       State  Status Version
kube-dal10-cr1dd768315d654d4bb4340ee8159faa17-w1 169.47.252.96 10.177.184.212 b2c.4x16.encrypted normal Ready  1.8.6_1506

So instead of calling our service with a official public ip address on port 80 we can use

http://169.47.252.96:31277

Autoscaler

Kubernetes has a built-in horizontal pod autoscaler which can be started with

kubectl autoscale deployment loadtest --cpu-percent=50 --min=1 --max=10

In this case it measures the cpu load and starts new pods when the load is over 50%. The autoscaler in this configuration never starts more than 10 and never runs less than 1 pod. The current measurements and parameters can be checked with

kubectl get hpa
NAME      REFERENCE           TARGETS  MINPODS MAXPODS REPLICAS AGE
loadtest  Deployment/loadtest 0% / 50% 1       10      1        23m

So right now the cpu load is 0 and only one replica is running.

Loadtest

Time to call our container and start the load test. Depending on the URL we can use curl to start the test with

curl "http://169.47.252.96:31277/?seconds=1000&percentage=80"

and check the result after some time with

kubectl get hpa
NAME      REFERENCE           TARGETS  MINPODS MAXPODS REPLICAS AGE
loadtest  Deployment/loadtest 60%/50%  1       10      6        23m

As we can see, the load increases and the autoscaler kicks in. More details can be obtained with the “kubectl proxy” command.
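
To watch the scale-up happen live instead of polling, the watch flag is handy; a small sketch (the run=loadtest label is the one kubectl run assigns by default):

kubectl get hpa --watch
kubectl get pods -l run=loadtest --watch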

Deleting the kubernetes cluster

To clean up we could delete all pods, replica sets and services, but we can also delete the complete cluster with

bx cs cluster-rm ansi-blog

 

Image Recognition with Tensorflow classification on OpenWhisk

The big picture

Image classification

As described in a previous article, we (Niklas and I) are going to use Tensorflow to classify images into pre-trained categories. The previous article was about how to train a model with Tensorflow on Kubernetes. This article now describes how to use the pre-trained model, which is stored on Object Storage. Similar to the training, we will also use docker to host our program, but this time we will use OpenWhisk as the platform.

Like in the first part, I also use the Google training Tensorflow for Poets. This time not the code itself; instead I copied the important classification parts from their script into my python file.

OpenWhisk with Docker

OpenWhisk is the open source implementation of a so-called serverless computing platform. It is hosted by Apache and maintained by many companies. IBM offers OpenWhisk on their IBM Cloud, and for testing and even playing around with it the use is free. Besides python and javascript, OpenWhisk also offers the possibility to run docker containers. Internally all python and javascript code is executed on docker containers anyhow. So we will use the same official Tensorflow docker container we used to build our training docker container.

Internally OpenWhisk has three stages for docker containers. When we register a new method, the execution instruction is only stored in a database. As soon as the first call reaches OpenWhisk, the docker container is pulled from the repository, then initialised by a REST call to ‘/init‘ and then executed by calling the REST interface ‘/run‘. The docker container stays active and each time the method is called only the ‘/run‘ part is executed. After some time of inactivity the container is destroyed and needs to be initialised with ‘/init‘ again. After even more time of inactivity even the image is removed and needs to be pulled again.
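
Before deploying, the same lifecycle can be simulated locally against the container. A rough sketch, assuming the Flask app inside the image listens on port 8080 (adjust the port to whatever classifier.py actually binds to) and that test.base64 contains a base64-encoded image:

docker run -d -p 8080:8080 <namespace>/tensorflow-openwhisk-classify:latest

# what OpenWhisk does once per cold start
curl -X POST -H "Content-Type: application/json" -d '{}' http://localhost:8080/init

# what OpenWhisk does on every invocation
curl -X POST -H "Content-Type: application/json" \
     -d "{\"value\": {\"payload\": \"$(cat test.base64)\"}}" \
     http://localhost:8080/run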

The setup

The code itself is stored on github. Let’s have a look first on how we build the Docker container:

Dockerfile

FROM tensorflow/tensorflow:1.4.0-py3

WORKDIR /tensorflow
COPY requirements.txt requirements.txt
RUN  pip install -r   requirements.txt

COPY classifier.py classifier.py

CMD python -u classifier.py

As you can see, this Dockerfile is now really simple. It basically installs the python requirements to access the SWIFT Object Store and starts the python program. The python program keeps running until the OpenWhisk system decides to stop the container.

We make heavy use of the idea of having an init and a run part in the executed code. So the python program has two main parts: the first one is init and the second one is run. Let's have a look at the init part first, which is basically setting up the stage for the classification itself.

/init

@app.route('/init', methods=['POST'])
def init():
    try:

        message = flask.request.get_json(force=True, silent=True)

        if message and not isinstance(message, dict):
            flask.abort(404)

        conn = Connection(key='xxxxx',
                          authurl='https://identity.open.softlayer.com/v3',
                          auth_version='3',
                          os_options={"project_id": 'xxxxxx',
                                      "user_id": 'xxxxxx',
                                      "region_name": 'dallas'}
                          )

        obj       = conn.get_object("tensorflow", "retrained_graph.pb")
        graph_def = tf.GraphDef()
        graph_def.ParseFromString(obj[1])
        with graph.as_default():
            tf.import_graph_def(graph_def)

        obj    = conn.get_object("tensorflow", "retrained_labels.txt")
        for i in obj[1].decode("utf-8").split():
            labels.append(i)

    except Exception as e:
        print("Error in downloading content")
        print(e)
        response = flask.jsonify({'error downloading models': e})
        response.status_code = 512

    return ('OK', 200)

Unfortunately it is not so easy to configure the init part in a dynamic way with parameters from outside. So for this demo we need to hard-code the Object Store credentials in our source code. Doesn't feel right, but for a demo it is ok. In a later article I will describe how to change the flow and inject the parameters in a dynamic way. So what are we doing here?

  1. The Connection(...) block sets up a connection to the Object Store as described here.
  2. The first get_object call reads the pre-trained Tensorflow graph ("retrained_graph.pb") directly into memory and imports it into the global graph.
  3. The second get_object call reads the labels ("retrained_labels.txt"), which are basically a string of names separated by line breaks. The labels are in the same order as the categories in the graph.

By doing all this in the init part we only need to do it once, and the run part can concentrate on classifying the images without doing any time-consuming loading anymore.

Tensorflow image manipulation and classification

# the run endpoint is called by OpenWhisk for every invocation (route added here for completeness, matching /init above)
@app.route('/run', methods=['POST'])
def run():

    def error():
        response = flask.jsonify({'error': 'The action did not receive a dictionary as an argument.'})
        response.status_code = 404
        return response

    message = flask.request.get_json(force=True, silent=True)

    if message and not isinstance(message, dict):
        return error()
    else:
        args = message.get('value', {}) if message else {}

        if not isinstance(args, dict):
            return error()

        print(args)

        if "payload" not in args:
            return error()

        print("=====================================")
        with open("/test.jpg", "wb") as f:
            f.write(base64.b64decode(args['payload']))

        file_reader      = tf.read_file("/test.jpg", "file_reader")
        #file_reader      = tf.decode_base64(args['payload'])
        image_reader     = tf.image.decode_jpeg(file_reader, channels=3, name='jpeg_reader')
        float_caster     = tf.cast(image_reader, tf.float32)
        dims_expander    = tf.expand_dims(float_caster, 0)
        resized          = tf.image.resize_bilinear(dims_expander, [224, 224])
        normalized       = tf.divide(tf.subtract(resized, [128]), [128])
        input_operation  = graph.get_operation_by_name("import/input")
        output_operation = graph.get_operation_by_name("import/final_result")
        tf_picture       = tf.Session().run(normalized)

        with tf.Session(graph=graph) as sess:
            results = np.squeeze(sess.run(output_operation.outputs[0], {input_operation.outputs[0]: tf_picture}))
            index   = results.argsort()
            answer  = {}

            for i in index:
                answer[labels[i]] = float(results[i])

            response = flask.jsonify(answer)
            response.status_code = 200

    return response

How to get the image

The image is transferred base64 encoded as part of the request. Part of the dictionary is the key payload. I chose this name because Node-RED uses the same name for its most important key. Tensorflow has a function to consume base64 encoded data as well, but I could not get it to run with the image encoding I use. So I took the little extra step here, write the image to a file and read it back later. By consuming it directly I think we could save some milliseconds of processing time.

Transfer the image

  • tf.read_file reads the image back from the file
  • tf.image.decode_jpeg decodes the jpeg into an internal representation format
  • tf.cast casts the values to a float32 array
  • tf.expand_dims adds a new dimension at the beginning of the array
  • tf.image.resize_bilinear resizes the image to 224x224 to match the size of the training data
  • tf.divide / tf.subtract normalize the image values

Classify the image

  • graph.get_operation_by_name fetches the input and output layers and stores them in variables
  • the first tf.Session().run(normalized) call loads the prepared image into Tensorflow
  • sess.run(output_operation...) is where the magic happens: Tensorflow processes the CNN with the input and output layer connected and consumes the Tensorflow image. Furthermore numpy squeezes all array nesting down to a single array.
  • results then holds an array with the probability for each category.

Map the result to labels

The missing last step is to map the label names to the results, which is done in the final loop that fills the answer dictionary.

Build and deploy it in OpenWhisk

The docker container can be built with

docker build -t <namespace>/tensorflow-openwhisk-classify:latest .

and pushed with

docker push <namespace>/tensorflow-openwhisk-classify:latest

Run it in OpenWhisk

After configuring the command line tool wsk the action itself can be created with

wsk action create tensorflow-classify --docker <namespace>/tensorflow-openwhisk-classify:latest

For testing we need an image base64 encoded as a file on our local hard disk. Then we can invoke the call with

wsk action invoke --result tensorflow-classify --param payload `cat test.base64`

The first execution will take up to 15 seconds because the docker container will be pulled from docker hub and the graph will be loaded from the Object Store. Later calls should take around 150 milliseconds of processing time. The parameter --result forces OpenWhisk to wait for the function to finish and also shows you the result on your command line.

{
    "daisy": 0.9998985528945923,
    "dandelion": 0.00007187054143287241,
    "roses": 4.515387388437375E-7,
    "sunflowers": 0.000029122467822162434,
    "tulips": 4.63972159303605E-11
}
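
In case you wonder how to produce the test.base64 file: something like this should work (the -w0 flag that disables line wrapping is specific to GNU coreutils; on macOS the default output is already unwrapped):

base64 -w0 test.jpg > test.base64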

If you want to get the log file and also an exact execution time try this command:

wsk activation get `wsk activation list | grep tensorflow-classify | cut -f 1 -d " " |head -n 1`

  • The first call results in “duration”: 3805. Your call itself took way longer because 3805 is only the execution time of the docker container (including init), not the time it took OpenWhisk to pull the docker container from docker hub.
  • The second call results in “duration”: 156.

Build a web UI

Well, UI is nothing I can talk about. But have a look at Niklas' blog post on how to build a web UI. A test installation can be found here: https://visual-recognition-tensorflow.mybluemix.net

Image Recognition with Tensorflow training on Kubernetes

The big picture

Modern visual recognition is done with deep neural networks (DNN). One framework (and I would say the most famous one) to build this kind of network is Tensorflow from Google. Being open source and especially awesome, it is perfect to play around with and build your own visual recognition system. As compute power and especially RAM increase, there is now a chance of having much more complicated networks compared to the 90s, where there were only one or two hidden layers.

One architecture is the Convolutional Neural Network (CNN). The idea is very close to the brain's structure. The basic idea is to intensively train a network on gazillions of images and let it learn features inside the many hidden layers. Only the last layer connects features to real categories. Similar to our brain, the network learns concepts and patterns, not really the picture groups themselves.

After spending a lot of compute power to train these networks, they can easily be reused to train on new images by replacing only the last layer with a new one representing the to-be-trained categories. Training this network means training only the last connection between the last layer and the rest of the network. This training is extremely fast (only minutes) compared to months for the complete network. The charming effect is to train only the “mapping” from features to categories. This is what we are going to do now.

Basically the development of such a system can be divided into two parts. The first part (training) is described here. For the “use” aka classification have a look at the second part on my blog. I developed this system together with a good friend and colleague of mine, Niklas Heidloff; check out his blog and twitter account. The described system has mainly three parts: two docker containers described in this blog and one epic frontend described in Niklas' blog. The source code can be found on github.

ImageNet

If you want to train a neural network (supervised learning) you need a lot of images in categories. Not ten or a hundred, but better hundreds of thousands or even 15 million pictures. A wonderful source for this is ImageNet: more than 14 million pictures organized in more than 20k categories. So a perfect source to train this kind of network. Google has done the same and participated in the Large Scale Visual Recognition Challenge (ILSVRC). Not only Google but many other research institutes built networks on top of Tensorflow in order to get better image recognition. The outcome is pre-trained models which can be used for systems like the one we want to build.

Tensorflow for poets

As always, it is best to stand on the shoulders of giants. So in our case we use the python code developed by Google at the codelabs. In this very fascinating and content-rich online training on Tensorflow, Google developed python code to retrain the CNN and also to use the newly trained model to classify images. Well, actually the training part just uses the original code, wraps it into a docker container and connects this container to an Object Store. So not much new work there, but a nice and handy way to use this code for an own project. I highly recommend taking the 15 minutes and doing the online training to learn how to use Tensorflow and Python.

MobileNet vs. Inception

As discussed, there are many trained networks available; the most famous ones are Inception and MobileNet. Inception has a much higher classification rate but also needs more compute power, both for training and for classification. As we use kubernetes in “the cloud”, the training is not a big problem. But since we want to use the classifier later on OpenWhisk, we need to take care of the RAM usage (512MB). The docker container can be configured to train either model, but for OpenWhisk we are limited to MobileNet.

Build your own classifier

Visual Recognition Architecture

As you can see in the picture, we need to build two containers. The left one loads the training images and the categories from an Object Store, trains the neural network and uploads the trained net back to the Object Store. This container can run on your laptop or somewhere in “the cloud”. As I developed a new passion for Kubernetes, I added a small minimal yaml file to start the docker container on a Kubernetes cluster. Well, not really with multiple instances, as the python code only uses one container, but see it as some kind of “offloading” the workload.

The second container (described in the next article) runs on OpenWhisk and uses the pre-trained network downloaded from the Object Store.

Use docker / kubernetes to train your model

We use the official Tensorflow docker container with python support as published from Google and the training script from Tensorflow for poets.

Dockerfile

FROM tensorflow/tensorflow:1.4.0-py3

# Update repository and install git and zip
RUN apt-get update && \
    apt-get install -y git zip

# Install python requirements for swift
COPY requirements.txt requirements.txt
RUN  pip install -r   requirements.txt

# Get the tensorflow training scripts
WORKDIR /
RUN     git clone https://github.com/googlecodelabs/tensorflow-for-poets-2
WORKDIR /tensorflow-for-poets-2

# Copy the runtime script
COPY execscript.sh execscript.sh
RUN  chmod 700 execscript.sh

CMD /tensorflow-for-poets-2/execscript.sh

The Dockerfile is straightforward. We use the Tensorflow docker image as base and install the git and zip (for unpacking the training data) packages. Then we install all necessary python requirements. As all Tensorflow related packages for Python are already installed, these packages are only needed for accessing the Object Store (see my blog article). Then we clone the official tensorflow-for-poets-2 github repository, add our execution shell script and finish with the CMD to call this script.

Execution Script

#!/usr/bin/env bash

echo ${TF_MODEL}

export OS_AUTH_URL=https://identity.open.softlayer.com/v3
export OS_IDENTITY_API_VERSION=3
export OS_AUTH_VERSION=3

swift auth
swift download ${OS_BUCKET_NAME} ${OS_FILE_NAME}

unzip ${OS_FILE_NAME} -d tf_files/photos

python -m scripts.retrain                            \
       --bottleneck_dir=tf_files/bottlenecks         \
       --how_many_training_steps=5000                \
       --model_dir=tf_files/models/                  \
       --summaries_dir=tf_files/training_summaries   \
       --output_graph=tf_files/retrained_graph.pb    \
       --output_labels=tf_files/retrained_labels.txt \
       --architecture=${TF_MODEL}                    \
       --image_dir=tf_files/photos

cd tf_files

swift upload tensorflow retrained_graph.pb
swift upload tensorflow retrained_labels.txt

All important and sensitive parameters are configured via environment variables passed in by the docker container call. The basic parameters that are always the same are set here: where to do the keystone authentication and which protocol version to use for the Object Store. The swift command downloads a zip file containing all training images, with one subfolder per category. So you need to build a folder structure like this one:

.
|- Category-A/
|- Category-B/
|- Category-C/
|- NegativeExamples/

The execution script unpacks the training data and calls the retrain script from Tensorflow-for-poets. Important parameters are how_many_training_steps (can be reduced to speed up testing) and the architecture. As the latter can be changed depending on how accurate the classifier has to be and how much memory is available for the classifier, this parameter is passed in via the TF_MODEL environment variable.
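
The zip file the script downloads has to be uploaded to the Object Store first. A sketch of how to package the folder structure above and push it with the swift client (the container name "tensorflow" and the file name have to match OS_BUCKET_NAME and OS_FILE_NAME):

zip -r flower_photos.zip Category-A Category-B Category-C NegativeExamples
swift upload tensorflow flower_photos.zip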

The image can be built with:

docker build -t <namespace>/tensorflow-openwhisk-trainer:latest .

and pushed with:

docker push <namespace>/tensorflow-openwhisk-trainer:latest

train.yml

apiVersion: v1
kind: Pod
metadata:
  name: tensorflow-openwhisk-trainer
spec:
  restartPolicy: Never
  containers:
    - name: tensorflow-openwhisk-trainer
      image: ansi/tensorflow-openwhisk-trainer:latest
      imagePullPolicy: Always
      env:
      - name: OS_USER_ID
        value: xxxx 
      - name: OS_PASSWORD
        value: xxxx
      - name: OS_PROJECT_ID
        value: xxxxxxx 
      - name: OS_REGION_NAME
        value: dallas
      - name: OS_BUCKET_NAME
        value: tensorflow
      - name: OS_FILE_NAME
        value: flower_photos.zip
      - name: TF_MODEL
        value: mobilenet_0.50_224   # inception_v3, mobilenet_0.50_224, mobilenet_0.50_128, mobilenet_0.50_16

After building the docker container and pushing it to docker hub, this yaml file triggers Kubernetes to run the container with the given parameters, many taken from your Object Store credential file (how to start it on the cluster is sketched after the parameter list below):

VCAP = {
  "auth_url": "https://identity.open.softlayer.com",
  "project": "object_storage_07xxxxxx_xxxx_xxxx_xxxx_6d007e3f9118",
  "projectId": "512bfxxxxxxxxxxxxxxxxxxxxxxfe4e1",
  "region": "dallas",
  "userId": "4de3dxxxxxxxxxxxxxxxxxxxxxxx723b",
  "username": "member_caeae76axxxxxxxxxxxxxxxxxxxxxxxxxxxxxx7d",
  "password": "lfZxxxxxxxxxxxx.p",
  "domainId": "151fxxxxxxxxxxxxxxxxxxxxxxde602a",
  "domainName": "773073",
  "role": "member"
}
  • OS_USER_ID -> VCAP['userId']
  • OS_PASSWORD -> VCAP['password']
  • OS_PROJECT_ID -> VCAP['projectId']
  • OS_REGION_NAME -> VCAP['region']
  • OS_BUCKET_NAME -> up to you, whatever you named the Object Store container
  • OS_FILE_NAME -> up to you, whatever you named the zip file with the training images
  • TF_MODEL -> 'mobilenet_0.50_{imagesize}' or 'inception_v3'
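Once the values are filled in, the pod can be submitted and watched with the standard kubectl commands; a minimal sketch, assuming kubectl is already configured for your cluster:

kubectl apply -f train.yml
kubectl get pods                              # the pod runs once because of restartPolicy: Never
kubectl logs -f tensorflow-openwhisk-trainer  # follow the training output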

Use Object Store to store your trained classifier for later use

We decided to use Object Store to store our training data and also the re-trained network. This could be any other storage as well, for example S3 on AWS or your local HDD. Just change the Dockerfile and the execution script to download and upload your data accordingly. More details on how to use the Object Store can be found in my blog article.
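To fetch the results later, the same swift CLI can be used. A minimal example, assuming the environment variables from above are exported and the retrained files were uploaded to the "tensorflow" container:

swift download tensorflow retrained_graph.pb retrained_labels.txt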

 

Bash script for automatic picture enhancement and upload to Watson Visual Recognition classifier

Visual Recognition Tool

pushVisualRecognition.sh

I hacked together a little script for the Watson Visual Recognition service. There is already a very helpful web page available here, but many people (including me) prefer command line tools or scripts to automate processes. The script applies the following processing steps to each picture (roughly the ImageMagick operations sketched after the list):

  1. Resize to max 500×500 pixels. Watson internally uses only around 250 pixels, so this saves a lot of upload time.
  2. Enhance the image (normalisation) for better results.
  3. Auto-rotate the images based on the EXIF data from your camera, because Watson ignores EXIF data.
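In ImageMagick terms the three steps boil down to something like the following one-liner; this is a sketch of the idea, not the exact command from the script:

mogrify -resize '500x500>' -normalize -auto-orient *.jpg   # shrink, normalize and auto-rotate in place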

The tool expects this directory structure and reads all necessary information from it:

  • Classifiername
    • Classname
      • <more than 10 files>.jpg

The Visual Recognition key is read from the “VISUAL_KEY” environment variable.

How to install it

The pushVisualRecognition.sh script is part of the bluemixcli docker container as described here. It basically only needs imagemagick and zip installed, so you can also run it without the docker container and download the script directly from github via this link. If you want to run it with docker, the command is:

docker run --rm -it -v ${PWD}:/root/host -e VISUAL_KEY=<add your key here> ansi/bluemixcli /bin/bash

How to run it

Simply call pushVisualRecognition.sh in your directory; all necessary information will be retrieved from the directory structure and the environment variable.

Example

Create a directory structure like this one:

  myclassifier
   classa
     image01.jpg
     image02.jpg
     ...
     image23.jpg
   classb
     image01.jpg
     image02.jpg
     ...
     image23.jpg
   negative
     image01.jpg
     image02.jpg
     ...
     image23.jpg

Calling pushVisualRecognition.sh will result in:

root@9a874a1af6e6:~/host/Dropbox/Apps# pushVisualRecognition.sh
Work on classifier: myclassifier/
Work on class: classa/
Work on class: classb/
Work on class: negative/
{
 "classifier_id": "myclassifier_2110375920",
 "name": "myclassifier",
 "owner": "af63a091-ea7c-4d85-bcc6-1b62762f7dcb",
 "status": "training",
 "created": "2017-09-20T17:58:06.417Z",
 "classes": [
 {"class": "classa"},
 {"class": "classb"}
 ]
}root@9a874a1af6e6:~/host/Dropbox/Apps#

Docker container with Bluemix CLI tools

BluemixCLI on docker hub

Being a developer advocate means always playing with the latest version of the tools and staying on the edge. But installed programs get out of date, and so I always end up with old versions of CLI tools on my machine. One reason I love cloud computing (aka other people's computers) so much is that I don't need to update the software; that is done by professionals. In order to always have the latest version of my Bluemix CLI tools at hand, already authenticated, I compiled a little docker container with my favourite command line tools: cf, bx, docker and wsk.

Getting the docker container

I published the docker container on the official docker hub, so getting it is very easy once the docker tools are installed. The following command downloads the latest version of the container and therefore the latest version of the installed cli tools. Run it from time to time to make sure the latest version is available on your computer.

docker pull ansi/bluemixcli

Get the necessary parameters

For all command line tools we need usernames, passwords and IDs. Obviously we cannot hardcode them into the docker container, therefore we pass them along as parameters when starting the docker container.

  • Username (the same as we use to login to Bluemix)
  • Password (the same as we use to login to Bluemix)
  • Org (The Organisation we want to work in, must already be existing)
  • Space (The Space we want to work in, must already be created)
  • AccountID (this can be grabbed from the URL when we open "Manage Organisation" and click on the account)
  • OpenwhiskID (individual per org and space; it can be grabbed here: https://console.bluemix.net/openwhisk/learn/cli)

Run the container

The container can be started with docker run, passing all parameters in with -e:

docker run -it --rm                      \
-e BX_USERNAME=<Bluemix Username>        \
-e BX_PASSWORD=<Bluemix Password>        \
-e BX_ORG=<Bluemix Organisation>         \
-e BX_SPACE=<Bluemix Space>              \
-e BX_ACCOUNT_ID=<Bluemix Account ID>    \
-e WSK_AUTH=<Openwhisk Authentication>  \
-v ${PWD}:/root/host                     \
ansi/bluemixcli /bin/bash

Line 8 mounts the local directory inside the docker container under /root/host. So we can fire up the container and have a bash shell with the latest tools and our source code available.

Use the tools

Before we can use the tools we need to configure them and authenticate against Bluemix. The script "init.sh", which is located in "/root/" (our working directory), takes care of all logins and authentications.
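I won't reproduce init.sh here, but conceptually it boils down to something like the following sketch. The endpoints and the container plugin command are assumptions on my side; the real script may differ:

#!/usr/bin/env bash
# Log in to Bluemix and Cloud Foundry with the credentials passed in via -e
bx login -a https://api.ng.bluemix.net -u ${BX_USERNAME} -p ${BX_PASSWORD} -o ${BX_ORG} -s ${BX_SPACE}
cf login -a https://api.ng.bluemix.net -u ${BX_USERNAME} -p ${BX_PASSWORD} -o ${BX_ORG} -s ${BX_SPACE}
# Initialize the container plugin so the local docker client can talk to Bluemix (assumed command)
bx ic init
# Point the OpenWhisk client at Bluemix and store the auth key
wsk property set --apihost openwhisk.ng.bluemix.net --auth ${WSK_AUTH}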

cf

The Cloudfoundry command line tool for starting, stopping apps and connecting services.

bx

The Bluemix version of the Cloud Foundry command line tool, including the plugin for container maintenance. By initializing this plugin we also get the credentials and settings for the docker client to use Bluemix as a docker daemon.

docker

The normal docker client with Bluemix as daemon configured.

wsk

The OpenWhisk client already authenticated.

We can configure an alias in our .bashrc so that just typing "bxdev" gives us a bash shell with the latest cli tools available.
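Such an alias could look like the following line, shortened here for readability; in practice it carries the same -e parameters as the full docker run command above:

alias bxdev='docker run -it --rm -e BX_USERNAME=... -e BX_PASSWORD=... -v ${PWD}:/root/host ansi/bluemixcli /bin/bash'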

Optimize pictures for visual recognition with openCV and gimp


Watson result

Computer Vision or Visual Recognition is part of cognitive computing (CC), aka Artificial Intelligence. One of the main concepts is to extract information out of unstructured data. For example, you have a webcam pointing at a highway. As a human you see whether there is a traffic jam or not. For a computer it is only 640x480x3x8 (7,372,800) bits. Visual Recognition helps you extract information out of this data, for example "this is a highway". Out-of-the-box systems like Watson are able to tell you what you see in the picture. You can try it here: https://visual-recognition-demo.mybluemix.net. The result can be seen in the picture on the left. So Watson knows it is a highway, and even that it is a divided highway, but it does not tell you whether there is a traffic jam or a blocked road. Fortunately Watson is always eager to learn, so let us see how we can teach him what a traffic jam is. This article focuses only on the picture preparation, not on training Watson; see the next postings for that part.

Get pictures

There are many traffic cameras all around, but I am not sure about the licences, so it is hard to use them here as a demo. Let us assume we can take pictures like this one from Wikimedia: Cars in I-70. If you live in southern Germany there are nice traffic cameras from the Strassenverkehrszentrale BaWue; unfortunately they don't offer the pictures with the right licence for my blog. If you know a great source of traffic cameras with the right licence, please let me know.

Prepare pictures for training

Visual Recognition works a little bit like magic. You give Watson 100 pictures of a traffic jam and 100 without a traffic jam, and he learns the difference. But how do we make sure he really learns the traffic jam and not the weather or the light conditions? And how do we restrict him to one lane in case the camera shows both lanes? First, we need to make sure we find enough different pictures of the road with a traffic jam under different weather and light conditions. The second part can be done with OpenCV. OpenCV stands for Open Computer Vision and helps you to manipulate images. The idea is to mask out the parts we don't want Watson to learn, in our case the second lane and the sky. We can use GIMP to create a mask that openCV then applies automatically to each picture.

Gimp

GIMP-Layers

The first step is obviously to load the image in GIMP. Then open the layers dialog; it is located under Windows/Dockable Dialogs/Layers (or cmd-L). Here we add a new layer and select it to paint on. Then we choose the Paintbrush Tool from the tools menu and simply paint the parts black that we don't want Watson to learn.

Blanked-image

Then we hide the original image by pressing the eye symbol in the layer dialog. This should leave us with only the black painting we did before. This will be our mask, which openCV applies to all pictures. Under File/Export you can save it as mask.jpg. Make sure it contains only the black mask and not the picture with the black painting on top.

Use openCV in docker

As openCV is quite a lot to install, we can simply use it from within docker to work with our pictures. We can mount host directories inside a docker container, in this case our directory with the pictures:

docker run --name opencv --rm -it -v $(pwd):/host victorhcm/opencv /bin/bash

This brings up the openCV docker container from victorhcm and opens a shell with our current directory mounted under /host. As soon as you exit the container it will be removed because of the "--rm" parameter. Don't worry, only the docker container is deleted; everything under /host is mounted from the host system and will remain. Anything you save in other directories, however, will be lost.

How to mask out part of the picture

The python program that uses openCV to mask all pictures in a directory is then really simple:

import cv2
import glob

mask = cv2.imread("mask.jpg", cv2.IMREAD_GRAYSCALE)

for fullname in glob.glob("pics/*.jpg"):
    filename  = fullname.split('/')[-1]
    image     = cv2.imread(fullname, cv2.IMREAD_COLOR)
    dst       = cv2.bitwise_and(image, image, mask=mask)
    cv2.imwrite("masked/" + filename, dst)
    print(filename)

Basically the program iterates over all "jpg" pictures in the subfolder "pics" and saves the masked pictures under the same name in the "masked" folder. Both directories have to exist before you start the script; to keep the script reduced to the important parts I left the directory creation and checks out of it.

Line 4

Loads the mask image as a grayscale image.

Line 8

Loads the image to work on as a colour image.

Line 9

Here the real work is done: the mask is applied with a bitwise AND over all pixels. Where the mask is black the result becomes black; where the mask is white the normal picture gets through.

Line 10

Saves the new masked picture in the "masked" folder.
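For completeness, the directory handling I left out can be as simple as creating both folders before running the script, for example:

mkdir -p pics masked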

Preselect pictures

For the learning process we need to sort the pictures by hand: one bucket with traffic jam pictures and one with free-flowing traffic.

WordPress docker setup with nginx proxy

Setup a wordpress blog on docker with nginx as reverse proxy

Docker setup with wordpress, nginx and mysql containers

My friend and very experienced colleague Niklas Heidloff convinced me to also start a blog with all the geeky things I am doing all day long. To make it more interesting for me, I decided to host WordPress on my own instead of using wordpress as a service (WaaS?). Sylwester from Fablab.berlin pointed me to Vultr for hosting, and so here we start. The idea is a simple docker setup with 3 containers: one for the Nginx webserver facing the evil internet and proxying the wordpress container, which lives in a local private docker network. This way I can also redirect to other web services (containers) later. Luckily there are already maintained docker containers available for all 3 parts, so I only need to customize the nginx container with a Dockerfile and can use the other two right as they are.

Install Docker on Ubuntu

Either you go with a docker provider like Bluemix, or you get a virtual machine from Softlayer or any other provider. In my case I chose a virtual server, so I had to install docker on Ubuntu LTS, which is really easy. Basically you add a new repository entry to your apt sources and install the latest stable docker packages. There is also a script available on get.docker.com, but I don't feel comfortable executing a shell script straight from the net with root access. It's up to you; the one-liner is:

wget -qO- https://get.docker.com/ | sh
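The repository route mentioned above roughly looks like the following (package names and URLs as documented by Docker for Ubuntu at the time of writing, so double-check against the current docs):

sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install -y docker-ce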

Docker on Linux does not include docker-compose, in contrast to the docker installation on, for example, the Mac. Installing docker-compose is straightforward; the binary can be downloaded from github here: https://github.com/docker/compose/releases.

curl -L https://github.com/docker/compose/releases/download/1.14.0/docker-compose-`uname -s`-`uname -m` > /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose

Docker-compose

Docker-compose takes care of a docker setup containing more than one docker container, including networking and basic monitoring. The following file builds and starts all containers with nginx, mysql and wordpress. It also maps the volumes onto the host file system for easy backup and for persistence across container rebuilds, and it restarts the containers if they go down.

version: '3'

services:
   db:
     image: mysql:latest
     volumes:
       - ./db:/var/lib/mysql
     restart: always
     environment:
       MYSQL_ROOT_PASSWORD: easytoguess
       MYSQL_DATABASE: wordpress
       MYSQL_USER: wordpress
       MYSQL_PASSWORD: eveneasier

   wordpress:
     depends_on:
       - db
     image: wordpress:latest
     restart: always
     volumes:
       - ./wordpress:/var/www/html/wp-content
     environment:
       WORDPRESS_DB_HOST: db:3306
       WORDPRESS_DB_USER: wordpress
       WORDPRESS_DB_PASSWORD: eveneasier
       WORDPRESS_DB_NAME: wordpress

   nginx:
     depends_on:
       - wordpress
     restart: always
     build:
       context: .
       dockerfile: Dockerfile-nginx
     ports:
       - "80:80"

Mysql is the first container we bring up, with environment variables for the database such as username, password and database name. Line 7 makes sure the database files are stored outside the docker container, so you can delete the container, start a new one and still have the same database up and running. Point this wherever you want to have it; in this case it goes to "db" under the same directory. Also make sure you come up with decent passwords.

The second container is wordpress. Same here with the host folder on line 21. Furthermore make sure you have the same user, password and db name configured as in the mysql container configuration.

The last one is nginx as the internet-facing container. Port 80 is exposed here. While you just reference an image for the other two, for this one you configure a Dockerfile and a build context to customize nginx for the network setup. If you only wanted to host static files you could add them via volume mounts, but in our case we need to configure nginx itself, so we need a customized Dockerfile as described below.

Dockerfile for nginx setup

FROM nginx:latest
COPY   default.conf /etc/nginx/conf.d/default.conf
VOLUME /var/log/nginx/log/
EXPOSE 80

This Dockerfile inherits everything from the latest nginx image and copies the default.conf file into it. See the next chapter for how to set up the config file.

Nginx config file

server {
    listen            80;
    listen       [::]:80;
    server_name  www.23-5.eu ansi.23-5.eu;
    access_log  /var/log/nginx/log/unsecure.access.log  main;

    location / {
        proxy_read_timeout    90;
        proxy_connect_timeout 90;
        proxy_redirect        off;
        proxy_pass http://wordpress;

        proxy_set_header      X-Real-IP $remote_addr;
        proxy_set_header      X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header      Host $host;
    }
}

Lines 2 and 3 configure the ports we want to listen on; we need one for IPv4 and one for IPv6. Important is the proxy configuration in lines 8 to 15. Line 11 forwards all calls under "/" (so effectively every request) to the server wordpress; as we use docker-compose, docker takes care of making that name resolvable via the internal DNS server. Lines 13-15 rewrite the http headers so the proxied requests carry the original host and client address; otherwise wordpress would generate links pointing to http://wordpress.
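Since nginx is the single entry point, more containers can be published later by adding further location or server blocks. A hypothetical example, assuming a second compose service named "ghost" listening on port 2368:

location /blog2/ {
    proxy_pass http://ghost:2368/;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header Host $host;
}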

Start the System

If everything is configured and the docker-compose.yml, default.conf, Dockerfile-nginx and the folders db and wordpress are in the same folder, we can start everything from within this folder with:

docker-compose up --build -d

The parameter "-d" starts the setup in the background (detached mode). For the very first run I would recommend running without the "-d" parameter to see all debug messages.
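Once it is running, the state of the three containers and their logs can be checked with the usual docker-compose commands, for example:

docker-compose ps
docker-compose logs -f nginx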