This tutorial shows you how to use the Data Analytics Reference Stack (DARS) and, optionally, how to build your own images from the baseline Dockerfiles provided in the DARS repository. We assume that Clear Linux* OS is the host; however, you can follow these steps on any system that supports Docker* containers.

The Data Analytics Reference Stack release

The Data Analytics Reference Stack (DARS) provides developers and enterprises a straightforward, highly optimized software stack for storing and processing large amounts of data. More detail is available on the DARS architecture and performance benchmarks.

The Data Analytics Reference Stack provides two pre-built Docker images on Docker Hub: one built with OpenBLAS and one built with MKL.

We recommend you view the latest component versions for each image in the README found in the DARS repository. Because Clear Linux OS is a rolling distribution, the package version numbers in the Clear Linux OS-based containers may not be the latest released by Clear Linux OS.

Note

The Data Analytics Reference Stack is a collective work, and each piece of software within the work has its own license. Please see the terms of use for more details about licensing and usage of the Data Analytics Reference Stack.

Using the Docker Images

To start using the latest stable DARS images immediately, pull them directly from Docker Hub. This tutorial uses the DARS with MKL version of the stack.
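
The pull step might look like the sketch below. The image name clearlinux/stacks-dars-mkl is an assumption, since this tutorial does not spell it out; confirm it against the README in the DARS repository. The run helper and the DRY_RUN and DARS_IMAGE variables are hypothetical conveniences: by default the script only prints the command, and it executes the command when DRY_RUN=0 is set.

```shell
#!/bin/sh
# ASSUMPTION: the image name below is not given in this tutorial;
# check the README in the DARS repository for the published name.
DARS_IMAGE="${DARS_IMAGE:-clearlinux/stacks-dars-mkl}"

# Hypothetical helper: print the command, and run it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

run docker pull "$DARS_IMAGE"
```

To perform the pull rather than preview it, set DRY_RUN=0 in the environment before running the script.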

Once you have downloaded the image, you can run it with:

docker run -it --ulimit nofile=1000000:1000000 --name mkl <name of image>

This launches the image and drops you into a bash shell inside the container. Running spark-shell from that shell produces output similar to the following:

root@fd5155b89857 /root # spark-shell
spark-shell
Config directory: /usr/share/defaults/spark/
Welcome to
  ____              __
 / __/__  ___ _____/ /__
 _\ \/ _ \/ _ `/ __/  '_/
/___/ .__/\_,_/_/ /_/\_\   version 2.4.0
   /_/

Using Scala version 2.12.7 (OpenJDK 64-Bit Server VM, Java 1.8.0-internal)
Type in expressions to have them evaluated.
Type :help for more information.

scala>

The --ulimit nofile parameter is currently required to raise the limit on open files, because the Spark engine opens a large number of files at certain points.
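
To confirm that the limit took effect, you can query the soft open-file limit from the shell inside the container. The sketch below assumes a shell whose ulimit builtin accepts -n (bash and dash both do); nofile_at_least is a hypothetical helper name, and 1000000 matches the value passed to docker run above.

```shell
#!/bin/sh
# Return success when the soft open-file limit is at least the given value.
nofile_at_least() {
    want="$1"
    have="$(ulimit -n)"
    # A limit reported as "unlimited" satisfies any requested value.
    [ "$have" = "unlimited" ] || [ "$have" -ge "$want" ]
}

if nofile_at_least 1000000; then
    echo "open-file limit is sufficient: $(ulimit -n)"
else
    echo "open-file limit is too low: $(ulimit -n)"
fi
```

If the check reports a lower value, the container was most likely started without the --ulimit nofile option.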

Building DARS Images

If you choose to build your own DARS container images, you can customize them as needed. Use the provided Dockerfile as a baseline. To construct images with Clear Linux OS, start with a Clear Linux OS development platform that has the containers-basic-dev bundle installed. Learn more about bundles and installing them by using swupd.
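
Installing the bundle could be sketched as follows. This assumes swupd bundle-add is run with root privileges through sudo; ensure_bundle is a hypothetical helper name. On a host without swupd the function reports the problem and does nothing else.

```shell
#!/bin/sh
# Install a Clear Linux OS bundle with swupd; fail on hosts without swupd.
ensure_bundle() {
    bundle="$1"
    if ! command -v swupd >/dev/null 2>&1; then
        echo "swupd not found: this host does not appear to run Clear Linux OS" >&2
        return 1
    fi
    # bundle-add needs root privileges; sudo is assumed to be available.
    sudo swupd bundle-add "$bundle"
}

if ensure_bundle containers-basic-dev; then
    echo "swupd bundle-add succeeded"
else
    echo "swupd bundle-add did not succeed; see the messages above"
fi
```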

First, clone the DARS repository from GitHub.

git clone -b master https://github.com/clearlinux/dockerfiles.git

Then, inside the stacks/dars directory of the cloned repository, run make to build the OpenBLAS and MKL images, and run make baseline to build the baseline CentOS image. Depending on the system, the build may take a while. Once it completes, check the resulting images with Docker:

docker images | grep dars
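
The clone, build, and check steps can be collected into one script, sketched below. The repository root URL and the stacks/dars path are assumptions inferred from the GitHub URL in this tutorial, and the run helper and DRY_RUN variable are hypothetical: by default the script only prints each command, and it executes them when DRY_RUN=0 is set.

```shell
#!/bin/sh
# ASSUMPTION: the repository root and the stacks/dars path are inferred
# from the GitHub URL in this tutorial.
REPO_URL="https://github.com/clearlinux/dockerfiles.git"
DARS_DIR="dockerfiles/stacks/dars"

# Hypothetical helper: print the command, and run it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

run git clone -b master "$REPO_URL" || exit 1
run cd "$DARS_DIR" || exit 1
run make || exit 1            # OpenBLAS and MKL images
run make baseline || exit 1   # baseline CentOS image
run sh -c 'docker images | grep dars'
```

Previewing the commands first is a deliberate choice here: the build can take a long time, so it helps to confirm the paths before starting it.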

You can use any of the resulting images to launch fully functional containers. If you need to customize the containers, you can edit the provided Dockerfile.