This tutorial shows you how to use the Data Analytics Reference Stack (DARS) and, optionally, how to build your own images from the baseline Dockerfiles provided in the DARS repository. We assume that Clear Linux* OS is the host; however, you can follow these steps on any system that supports Docker* containers.
The Data Analytics Reference Stack (DARS) provides developers and enterprises with a straightforward, highly optimized software stack for storing and processing large amounts of data. Further details on the DARS architecture and performance benchmarks are also available.
The Data Analytics Reference Stack provides two pre-built Docker images, available on Docker Hub:
- A Clear Linux OS-derived DARS image with a stack optimized for OpenBLAS
- A Clear Linux OS-derived DARS image with a stack optimized for MKL
We recommend checking the README in the DARS repository for the latest component versions in each image. Because Clear Linux OS is a rolling-release distribution, the package versions in the Clear Linux OS-based containers may lag behind the latest versions released by Clear Linux OS.
Once you have downloaded an image, run it with:
docker run -it --ulimit nofile=1000000:1000000 --name mkl <name of image>
This command launches the container and drops you into a bash shell inside it. From that shell, start the Spark shell with spark-shell; you should see output similar to the following:
root@fd5155b89857 /root # spark-shell
spark-shell
Config directory: /usr/share/defaults/spark/
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.4.0
      /_/

Using Scala version 2.12.7 (OpenJDK 64-Bit Server VM, Java 1.8.0-internal)
Type in expressions to have them evaluated.
Type :help for more information.

scala>
The --ulimit nofile parameter is currently required to raise the limit on open files, because the Spark engine opens a large number of files at certain points.
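To confirm that the raised limit actually reached the container, you can check the soft open-file limit from the container's bash shell before starting Spark. The following is a minimal sketch, not part of DARS: the helper name check_nofile is hypothetical, and the threshold of 1000000 in the usage example simply mirrors the --ulimit value in the docker run command above.

```shell
#!/usr/bin/env bash
# Minimal sketch: check_nofile is a hypothetical helper, not part of DARS.
# It compares the current soft open-file limit with a required minimum.

# Return 0 if the soft limit on open files is at least $1, otherwise 1.
check_nofile() {
  local required="$1"
  local current
  current="$(ulimit -Sn)"

  # An "unlimited" soft limit always satisfies the requirement.
  if [[ "$current" == "unlimited" ]] || (( current >= required )); then
    echo "open-file limit OK: ${current}"
    return 0
  fi

  echo "open-file limit too low: ${current} < ${required}" >&2
  return 1
}
```

For example, check_nofile 1000000 && spark-shell starts the Spark shell only when the container was launched with the higher limit.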
If you choose to build your own DARS container images, you can customize them as needed, using the provided Dockerfile as a baseline. To build images on Clear Linux OS, start with a Clear Linux OS development platform that has the containers-basic-dev bundle installed. Learn more about bundles and how to install them with swupd.
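On a Clear Linux OS host, a bundle is installed with swupd bundle-add. The sketch below wraps that command in a hypothetical helper, install_container_bundle, which first checks that swupd is present so that it fails with a clear message on other distributions. It assumes you have sudo rights on the host.

```shell
#!/usr/bin/env bash
# Minimal sketch: install_container_bundle is a hypothetical helper.
# It installs the containers-basic-dev bundle with swupd on Clear Linux OS.

install_container_bundle() {
  # swupd only exists on Clear Linux OS; bail out elsewhere.
  if ! command -v swupd >/dev/null 2>&1; then
    echo "swupd not found: this step applies only to Clear Linux OS hosts" >&2
    return 1
  fi
  # bundle-add is the swupd subcommand that installs a bundle.
  sudo swupd bundle-add containers-basic-dev
}
```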
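First, clone the Clear Linux dockerfiles repository from GitHub and change into its stacks/dars directory, which holds the DARS Dockerfiles:
git clone -b master https://github.com/clearlinux/dockerfiles.git
cd dockerfiles/stacks/dars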
Then, inside the DARS directory, run make to build the OpenBLAS and MKL images, and run make baseline to build the baseline CentOS image. Depending on your system, the build may take a while. When it completes, check the resulting images with Docker:
docker images | grep dars
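The build steps can be collected into a single script. The sketch below is an illustrative example rather than an official build script: build_dars and the DRY_RUN switch are hypothetical names, it assumes git, make, and docker are installed, and it assumes the repository layout described above (stacks/dars inside clearlinux/dockerfiles). With DRY_RUN=1 it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Minimal sketch: build_dars and DRY_RUN are hypothetical, not part of DARS.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Clone the repository into $1 (default: current directory) and build all images.
build_dars() {
  local workdir="${1:-$PWD}"
  local repo="$workdir/dockerfiles"

  run git clone -b master https://github.com/clearlinux/dockerfiles.git "$repo" || return 1
  # make -C runs make inside stacks/dars without changing the caller's directory.
  run make -C "$repo/stacks/dars" || return 1            # OpenBLAS and MKL images
  run make -C "$repo/stacks/dars" baseline || return 1   # baseline CentOS image

  # List the resulting images.
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "docker images | grep dars"
  else
    docker images | grep dars
  fi
}
```

For example, DRY_RUN=1 build_dars /tmp/dars-build prints the four commands without running them; dropping DRY_RUN performs the actual build. Using make -C keeps the caller's working directory unchanged, which is convenient when the function is sourced into an interactive shell.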
You can use any of the resulting images to launch fully functional containers. If you need to customize the containers, you can edit the provided Dockerfile.