This tutorial describes how to run benchmarking workloads for TensorFlow*, PyTorch*, and Kubeflow in Clear Linux* OS using the Deep Learning Reference Stack.

Overview

We created the Deep Learning Reference Stack to help AI developers deliver the best experience on Intel® Architecture. This stack reduces the complexity common to deep learning software components, provides flexibility for customized solutions, and enables you to quickly prototype and deploy deep learning workloads. Use this tutorial to run benchmarking workloads on your solution.

The Deep Learning Reference Stack is available in the following versions:

  • Intel MKL-DNN-VNNI, which is optimized using Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) primitives and introduces support for Intel® AVX-512 Vector Neural Network Instructions (VNNI).
  • Intel MKL-DNN, which includes the TensorFlow framework optimized using Intel® Math Kernel Library for Deep Neural Networks (Intel® MKL-DNN) primitives.
  • Eigen, which includes TensorFlow optimized for Intel® architecture.
  • PyTorch with OpenBLAS, which includes PyTorch built against the OpenBLAS linear algebra library.
  • PyTorch with Intel MKL-DNN, which includes PyTorch optimized using Intel® Math Kernel Library (Intel® MKL) and Intel MKL-DNN.

Note

To take advantage of the Intel® AVX-512 and VNNI functionality with the Deep Learning Reference Stack, you must use the following hardware:

  • Intel® AVX-512 images require an Intel® Xeon® Scalable Platform
  • VNNI requires a 2nd generation Intel® Xeon® Scalable Platform

Stack features

Note

Performance test results for the Deep Learning Reference Stack were obtained using runc as the runtime.

Prerequisites

  • Install Clear Linux OS on your host system.
  • containers-basic bundle
  • cloud-native-basic bundle

In Clear Linux OS, containers-basic includes Docker*, which is required for TensorFlow and PyTorch benchmarking. Use the swupd utility to check if containers-basic and cloud-native-basic are present:

sudo swupd bundle-list
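
To check only for these two bundles, you can filter the output with a standard grep:

sudo swupd bundle-list | grep -E 'containers-basic|cloud-native-basic'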

To install the containers-basic or cloud-native-basic bundles, enter:

sudo swupd bundle-add containers-basic cloud-native-basic

Docker is not started upon installation of the containers-basic bundle. To start Docker, enter:

sudo systemctl start docker
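
To confirm that the Docker daemon is active, you can use standard systemd and Docker commands:

sudo systemctl is-active docker
sudo docker run --rm hello-world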

To ensure that Kubernetes is correctly installed and configured, follow the instructions in Run Kubernetes*.
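
Once the cluster is up, a quick sanity check with standard kubectl commands confirms that your nodes are registered and ready:

kubectl get nodes
kubectl get pods --all-namespaces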

Version compatibility

We validated these steps against the following software package versions:

  • Clear Linux OS 26240 (lower versions are not supported)
  • Docker 18.06.1
  • Kubernetes 1.11.3
  • Go 1.11.12

TensorFlow single and multi-node benchmarks

This section describes how to run the TensorFlow benchmarks on a single node. For multi-node testing, replicate these steps on each node. These steps also provide a template for running other benchmarks, provided they can invoke TensorFlow.

  1. Download either the Eigen or the Intel MKL-DNN Docker image from Docker Hub.

  2. Run the image with Docker:

    docker run --name <container name> --rm -i -t <clearlinux/stacks-dlrs-TYPE> bash
    

    Note

    Launching the Docker image with the -i argument starts an interactive session within the container. Enter the following commands in the running container.

  3. Clone the benchmark repository in the container:

    git clone https://github.com/tensorflow/benchmarks -b cnn_tf_v1.12_compatible
    
  4. Execute the benchmark script:

    python benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --device=cpu --model=resnet50 --data_format=NHWC
    

Note

You can replace the model with any model supported by the TensorFlow benchmarks.

If you are using an FP32-based model, it can be converted to an int8 model using Intel® quantization tools.
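
For example, to benchmark InceptionV3 instead of ResNet-50 (inception3 is among the models the benchmark script supports; flags such as --batch_size and --num_batches further control the run):

python benchmarks/scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --device=cpu --model=inception3 --data_format=NHWC --num_batches=100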

PyTorch single and multi-node benchmarks

This section describes how to run the PyTorch benchmarks for Caffe2 on a single node.

  1. Download either the PyTorch with OpenBLAS or the PyTorch with Intel MKL-DNN Docker image from Docker Hub.

  2. Run the image with Docker:

    docker run --name <container name> --rm -i -t <clearlinux/stacks-dlrs-TYPE> bash
    

    Note

    Launching the Docker image with the -i argument starts an interactive session within the container. Enter the following commands in the running container.

  3. Clone the benchmark repository:

    git clone https://github.com/pytorch/pytorch.git
    
  4. Execute the benchmark script:

    cd pytorch/caffe2/python
    python convnet_benchmarks.py --batch_size 32 \
                          --cpu \
                          --model AlexNet
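
The same script supports other classic topologies; at the time of writing, for example, Inception can be selected via --model (run the script with --help for the full list):

python convnet_benchmarks.py --batch_size 32 --cpu --model Inception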
    

Kubeflow multi-node benchmarks

The benchmark workload runs in a Kubernetes cluster. The tutorial uses Kubeflow for the Machine Learning workload deployment on three nodes.

Kubernetes setup

Follow the instructions in the Run Kubernetes* tutorial to get set up on Clear Linux OS. The Kubernetes community also has instructions for creating a cluster.

Kubernetes networking

We used flannel as the network provider for these tests. If you prefer a different network layer, refer to the Kubernetes networking documentation for setup.

Images

You must add launcher.py to a Docker image that includes the Deep Learning Reference Stack, and place the benchmarks repository in the expected location. From inside the Docker image, run the following:

mkdir -p /opt
git clone https://github.com/tensorflow/benchmarks.git /opt/tf-benchmarks
cp launcher.py /opt
chmod u+x /opt/*

Your entry point becomes: /opt/launcher.py

This builds an image that can be consumed directly by TFJob from Kubeflow.
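
If you prefer a reproducible build, the same steps can be captured in a small Dockerfile. This is a minimal sketch, assuming clearlinux/stacks-dlrs-mkl as the base image and a launcher.py in your build context; adjust both to your setup. The tag you build here is the image name you export in a later step.

cat > Dockerfile <<'EOF'
# Assumption: an MKL-DNN variant of the Deep Learning Reference Stack as the base
FROM clearlinux/stacks-dlrs-mkl
# Place the benchmarks repository where the launcher expects it
RUN mkdir -p /opt && git clone https://github.com/tensorflow/benchmarks.git /opt/tf-benchmarks
COPY launcher.py /opt/launcher.py
RUN chmod u+x /opt/launcher.py
# The entry point becomes /opt/launcher.py, as described above
ENTRYPOINT ["/opt/launcher.py"]
EOF
docker build -t <docker_name> .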

ksonnet*

Kubeflow uses ksonnet* to manage deployments, so you must install it before setting up Kubeflow.

ksonnet was added to the cloud-native-basic bundle in Clear Linux OS version 27550. If you are using an older Clear Linux OS version (not recommended), you must manually install ksonnet as described below.

On Clear Linux OS, follow these steps:

sudo swupd bundle-add go-basic-dev
export GOPATH=$HOME/go
export PATH=$PATH:$GOPATH/bin
go get github.com/ksonnet/ksonnet
cd $GOPATH/src/github.com/ksonnet/ksonnet
make install

After the ksonnet installation is complete, ensure that the ks binary is accessible in your PATH.
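
A quick check that the installation succeeded is to print the version:

ks version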

Kubeflow

Once you have Kubernetes running on your nodes, set up Kubeflow by following these instructions from the quick start guide.

export KUBEFLOW_SRC=$HOME/kflow
export KUBEFLOW_TAG="v0.4.1"
export KFAPP="kflow_app"
export K8S_NAMESPACE="kubeflow"

mkdir ${KUBEFLOW_SRC}
cd ${KUBEFLOW_SRC}
ks init ${KFAPP}
cd ${KFAPP}
ks registry add kubeflow github.com/kubeflow/kubeflow/tree/${KUBEFLOW_TAG}/kubeflow
ks pkg install kubeflow/common
ks pkg install kubeflow/tf-training

Next, deploy the primary package for our purposes: tf-job-operator.

ks env rm default
kubectl create namespace ${K8S_NAMESPACE}
ks env add default --namespace "${K8S_NAMESPACE}"
ks generate tf-job-operator tf-job-operator
ks apply default -c tf-job-operator

This creates the CustomResourceDefinition (CRD) endpoint to launch a TFJob.
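
You can confirm that the operator is running and the CRD is registered with standard kubectl queries:

kubectl get crd | grep tfjobs
kubectl get pods -n ${K8S_NAMESPACE}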

Run a TFJob

  1. We provide ksonnet registries for deploying TFJobs in the clearlinux/dockerfiles repository, used in the next step.

  2. Install the TFJob components as follows:

    ks registry add dlrs-tfjob github.com/clearlinux/dockerfiles/tree/master/stacks/dlrs/kubeflow/dlrs-tfjob
    
    ks pkg install dlrs-tfjob/dlrs-bench
    
  3. Export the image name to use for the deployment:

    export DLRS_IMAGE=<docker_name>
    

    Note

    Replace <docker_name> with the image name you specified in previous steps.

  4. Generate Kubernetes manifests for the workloads and apply them using these commands:

    ks generate dlrs-resnet50 dlrsresnet50 --name=dlrsresnet50 --image=${DLRS_IMAGE}
    ks generate dlrs-alexnet dlrsalexnet --name=dlrsalexnet --image=${DLRS_IMAGE}
    ks apply default -c dlrsresnet50
    ks apply default -c dlrsalexnet
    

This replicates and deploys three test setups in your Kubernetes cluster.
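
To watch the jobs progress, you can query the TFJob resources and their pods with standard kubectl commands:

kubectl get tfjobs -n ${K8S_NAMESPACE}
kubectl get pods -n ${K8S_NAMESPACE} -o wide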

Results of running this tutorial

You must parse the logs of the Kubernetes pods to retrieve performance data. The pods persist after the benchmarks complete and remain in the 'Completed' state, so you can retrieve the logs from any of them to inspect the benchmark results. More information about Kubernetes logging is available from the Kubernetes community.
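
For example, to list the pods and save one pod's log for parsing (the pod name below is a placeholder; use a name from your own cluster):

kubectl get pods -n ${K8S_NAMESPACE}
kubectl logs <pod name> -n ${K8S_NAMESPACE} > benchmark.log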

Use Jupyter Notebook

This example uses the PyTorch with OpenBLAS container image. After it is downloaded, run the Docker image with -p to specify the shared port between the container and the host. This example uses port 8888.

docker run --name pytorchtest --rm -i -t -p 8888:8888 clearlinux/stacks-pytorch-oss bash

After you start the container, launch the Jupyter Notebook server. This command is executed inside the running container.

jupyter notebook --ip 0.0.0.0 --no-browser --allow-root

After the notebook has loaded, you will see output similar to the following:

To access the notebook, open this file in a browser:
    file:///.local/share/jupyter/runtime/nbserver-16-open.html
Or copy and paste one of these URLs:
    http://(846e526765e3 or 127.0.0.1):8888/?token=6357dbd072bea7287c5f0b85d31d70df344f5d8843fbfa09

From your host system, or any system that can access the host's IP address, start a web browser and open the following URL. If you are not running the browser on the host system, replace 127.0.0.1 with the IP address of the host.

http://127.0.0.1:8888/?token=6357dbd072bea7287c5f0b85d31d70df344f5d8843fbfa09

Your browser displays the following:

Figure 1: Jupyter Notebook

To create a new notebook, click New and select Python 3.

Figure 2: Create a new notebook

A new, blank notebook is displayed, with a cell ready for input.


To verify that PyTorch is working, copy the following snippet into the blank cell, and run the cell.

from __future__ import print_function
import torch
x = torch.rand(5, 3)
print(x)

When you run the cell, your output will look something like this:

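The exact numbers will differ on every run because torch.rand draws random values; a representative result looks like:

tensor([[0.5307, 0.5120, 0.0798],
        [0.3762, 0.2933, 0.8314],
        [0.7066, 0.1033, 0.2757],
        [0.0777, 0.5919, 0.0441],
        [0.9671, 0.5744, 0.0619]])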

You can continue working in this notebook, or you can download existing notebooks to take advantage of the Deep Learning Reference Stack’s optimized deep learning frameworks. Refer to Jupyter Notebook for details.