At its Build conference in May 2020, Microsoft, responding to numerous user requests, introduced a new feature of the Windows Subsystem for Linux 2 (WSL 2): support for video accelerators. This allows specialized computing applications to run in WSL 2. GPU support opens the way for professional tools and makes it possible to solve, on Windows, tasks that until now could only be solved on Linux, using the capabilities of the GPU.

It is especially important that WSL now ships with support for NVIDIA CUDA, NVIDIA's parallel computing platform.

The material, the translation of which we are publishing, was prepared by NVIDIA specialists. Here we’ll talk about what you can expect from CUDA in the WSL 2 Public Preview.


Launching the AI frameworks used on Linux in WSL 2 containers

What is WSL?


WSL is a feature of Windows 10 that allows you to use Linux command-line tools directly on Windows, without the complexity of a dual-boot configuration. WSL is a containerized environment that is tightly integrated with Microsoft Windows, which lets you run Linux applications alongside traditional Windows applications and modern applications distributed through the Microsoft Store.

WSL is primarily a tool for developers. If you work on certain projects in Linux containers, you can do the same things locally, on a Windows computer, using familiar Linux tools. Until now, running such applications on Windows meant spending a lot of time configuring the system and installing third-party frameworks and libraries. With the release of WSL 2, everything has changed: it brings full Linux kernel support to the Windows world.

WSL 2 and GPU Paravirtualization Technology (GPU-PV) enabled Microsoft to take Linux support on Windows to the next level, making it possible to run GPU-based computing workloads. Below we'll talk more about what GPU usage looks like in WSL 2.

If you are interested in the topic of support for video accelerators in WSL 2, take a look at this material and this repository.

CUDA in WSL


In order to take advantage of the GPU features in WSL 2, you need a video driver installed on your computer that supports Microsoft WDDM. Such drivers are created by video card manufacturers such as NVIDIA.

CUDA is the technology for developing programs for NVIDIA video accelerators, and it has been supported by WDDM on Windows for many years. Microsoft's new WSL 2 container provides GPU-accelerated computing that CUDA can take advantage of, so CUDA-based programs can run in the WSL environment. See the user guide for working with CUDA in WSL for more details.

WSL CUDA support is included in the NVIDIA drivers for WDDM 2.9, which install on Windows like any other driver. The CUDA user-mode driver (libcuda.so) automatically becomes available inside the container, where it can be detected by the loader.
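As a quick sanity check inside the WSL container, you can ask the dynamic loader whether it can see the CUDA user-mode driver. This is a minimal sketch, not an official NVIDIA tool; it only probes library visibility:

```python
import ctypes.util


def cuda_driver_visible() -> bool:
    """Return True if the dynamic loader can locate the CUDA user-mode driver (libcuda)."""
    return ctypes.util.find_library("cuda") is not None


if __name__ == "__main__":
    if cuda_driver_visible():
        print("libcuda.so found: the WSL CUDA driver is visible to the loader")
    else:
        print("libcuda.so not found: check the driver installation on the Windows host")
```

On a correctly configured system the first branch is taken; the check degrades gracefully (returns False) everywhere else.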

The NVIDIA driver development team added WDDM and GPU-PV support to the CUDA driver so that it could work in a Linux environment running on Windows. These drivers are still in Preview status; their official release will happen only when WSL with GPU support is officially released. Driver release details can be found here.

The following figure shows the connection of the CUDA driver to WDDM inside the Linux guest system.


CUDA-enabled WDDM user mode driver running on Linux guest system

Suppose you are a developer who installed a WSL distribution on the latest Windows build from the Fast Ring (build 20149 or later) of the Microsoft Windows Insider Program (WIP). If you have switched to WSL 2 and have an NVIDIA GPU, you can test the driver and run your GPU computing code in WSL 2. To do this, just install the driver on the Windows host system and open a WSL container: applications that use CUDA will work there without additional effort. The following figure shows a TensorFlow application using CUDA features running in a WSL 2 container.


TensorFlow container running in WSL 2

The fact that CUDA technology is now available in WSL allows you to run applications in WSL that previously could only be run in a regular Linux environment.

NVIDIA is still actively working on this project and making improvements to it. Among other things, we are working on adding WDDM APIs that were previously designed exclusively for Linux. As a result, more and more applications will work in WSL without additional effort from the user.

Another issue that interests us is performance. As mentioned, GPU support in WSL 2 relies heavily on GPU-PV. This can adversely affect the speed of small GPU tasks in situations where pipelining is not used. Right now we are working to reduce such effects as much as possible.

NVML


NVML is not yet included in the initial driver package; we are striving to fix this and plan to add NVML support, along with support for other libraries, to WSL.

We started with the main CUDA driver, which allows users to run most existing CUDA applications even at this early stage of CUDA support in WSL. But as it turned out, some containers and applications use NVML to get GPU information even before loading CUDA. This is why adding NVML support to WSL is one of our top priorities. It is entirely possible that soon we will be able to share good news regarding the solution of this problem.

GPU containers in WSL


In addition to DirectX and CUDA support in WSL 2, NVIDIA is working to add support for the NVIDIA Container Toolkit (formerly called nvidia-docker2) to WSL 2. Containerized GPU applications that data scientists create to run on a local or cloud Linux environment can now run unchanged in WSL 2 on computers running Windows.

No special WSL packages are required for this. The NVIDIA runtime library (libnvidia-container) can dynamically detect the libdxcore library and use it when the code runs in a WSL 2 environment with GPU acceleration. This happens automatically after installing the Docker and NVIDIA Container Toolkit packages, just as on Linux. It allows you to run containers that use GPU features in WSL 2 without any extra effort.

We strongly recommend that anyone who wants to use the --gpus option install the latest Docker tools (19.03 or later). To enable WSL 2 support, follow the instructions for your Linux distribution and install the latest version of nvidia-docker2 available.

How does it work? All WSL 2-specific tasks are handled by the libnvidia-container library. This library can now detect the presence of libdxcore.so at runtime and use it to discover all the GPUs visible to that interface.
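The detection step described above can be approximated from user code. The sketch below is illustrative only (the real logic lives inside libnvidia-container): it checks whether the kernel identifies itself as WSL and whether the loader can find libdxcore:

```python
import ctypes.util
from pathlib import Path


def running_under_wsl() -> bool:
    """Heuristic: the WSL 2 kernel reports 'microsoft' in /proc/version."""
    try:
        return "microsoft" in Path("/proc/version").read_text().lower()
    except OSError:
        return False  # not a Linux system, or /proc is unavailable


def dxcore_available() -> bool:
    """Ask the dynamic loader whether libdxcore.so can be located."""
    return ctypes.util.find_library("dxcore") is not None


if __name__ == "__main__":
    print("WSL detected:", running_under_wsl())
    print("libdxcore visible:", dxcore_available())
```

Both probes fail safely (return False) outside a GPU-enabled WSL 2 environment.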

If these GPUs need to be used in the container, libdxcore.so is used to access the driver store, the folder that contains all the driver libraries for both the Windows host system and WSL 2. The libnvidia-container.so library is responsible for configuring the container so that the driver store can be accessed correctly; the same library also sets up the core libraries supported by WSL 2. A diagram of this is shown in the following figure.


How libnvidia-container.so discovers and maps the driver store in WSL 2
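To make the idea concrete, here is a rough sketch of driver-store enumeration. The path /usr/lib/wsl/drivers is an assumption for illustration; in reality libdxcore.so resolves the location at runtime:

```python
from pathlib import Path

# Assumed driver-store location as seen from a WSL 2 distribution;
# the real path is resolved at runtime through libdxcore.so.
DRIVER_STORE = Path("/usr/lib/wsl/drivers")


def driver_store_entries(store: Path = DRIVER_STORE) -> list[str]:
    """List per-driver directories that a container setup step could map in (sketch only)."""
    if not store.is_dir():
        return []  # not running in a GPU-enabled WSL 2 environment
    return sorted(str(p) for p in store.iterdir() if p.is_dir())


if __name__ == "__main__":
    print(driver_store_entries())
```

On a non-WSL system the function simply returns an empty list.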

Note that this differs from the logic used outside of WSL. The process is completely abstracted by libnvidia-container.so and should be as transparent as possible to the end user. One limitation of this early version is that it is impossible to select a specific GPU in environments with multiple GPUs: all GPUs are always visible in the container.

In the WSL container, you can run any NVIDIA Linux containers you are already familiar with. NVIDIA supports the most interesting Linux tools and workflows used by professionals. Download the container you are interested in from NVIDIA NGC and try it out.

Now we’ll talk about how to run TensorFlow and N-body containers in WSL 2 that are designed to use NVIDIA GPUs to speed up computing.

Launch N-body container


Install Docker using the installation script:

user@PCName:/mnt/c$ curl https://get.docker.com | sh 

Install the NVIDIA Container Toolkit. WSL 2 support is available starting with nvidia-docker2 v2.3 and the libnvidia-container 1.2.0-rc.1 runtime library.

Configure the nvidia-docker and libnvidia-container repositories and the GPG key. The WSL 2 runtime code changes are available in the experimental repository.

user@PCName:/mnt/c$ distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
user@PCName:/mnt/c$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
user@PCName:/mnt/c$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
user@PCName:/mnt/c$ curl -s -L https://nvidia.github.io/libnvidia-container/experimental/$distribution/libnvidia-container-experimental.list | sudo tee /etc/apt/sources.list.d/libnvidia-container-experimental.list

Install the NVIDIA runtime packages and their dependencies:

user@PCName:/mnt/c$ sudo apt-get update
user@PCName:/mnt/c$ sudo apt-get install -y nvidia-docker2

Open the WSL container and start the Docker daemon in it. If everything is done correctly, you will see dockerd service messages.

user@PCName:/mnt/c$ sudo dockerd 


Launching the Docker daemon

In another WSL window, load and run the N-body simulation container. The user performing this task must have sufficient permissions to pull the container, so the following commands may need to be run with sudo. GPU details appear in the output.

user@PCName:/mnt/c$ docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark 


Launch N-body container

Launch TensorFlow container


We will test another popular container, TensorFlow, in Docker in the WSL 2 environment.

Download the TensorFlow Docker image. To avoid problems connecting to Docker, run the following command with sudo:

user@PCName:/mnt/c$ docker pull tensorflow/tensorflow:latest-gpu-py3 

Save a slightly modified version of the code from lesson 15 of the TensorFlow GPU tutorial to the C drive of the host system. By default, this drive is mounted in the WSL 2 container as /mnt/c.

user@PCName:/mnt/c$ vi ./matmul.py

import sys
import numpy as np
import tensorflow as tf
from datetime import datetime

device_name = sys.argv[1]  # Choose device from cmd line. Options: gpu or cpu
shape = (int(sys.argv[2]), int(sys.argv[2]))
if device_name == "gpu":
    device_name = "/gpu:0"
else:
    device_name = "/cpu:0"

tf.compat.v1.disable_eager_execution()
with tf.device(device_name):
    random_matrix = tf.random.uniform(shape=shape, minval=0, maxval=1)
    dot_operation = tf.matmul(random_matrix, tf.transpose(random_matrix))
    sum_operation = tf.reduce_sum(dot_operation)

startTime = datetime.now()
with tf.compat.v1.Session(config=tf.compat.v1.ConfigProto(log_device_placement=True)) as session:
    result = session.run(sum_operation)
    print(result)

# Print results
print("Shape:", shape, "Device:", device_name)
print("Time taken:", datetime.now() - startTime)

The following shows the results of executing this script, launched from the C drive mounted in the container. The script was run first on the GPU and then on the CPU. For convenience, the output presented here has been shortened.

user@PCName:/mnt/c$ docker run --runtime=nvidia --rm -ti -v "${PWD}:/mnt/c" tensorflow/tensorflow:latest-gpu-py3 python /mnt/c/matmul.py gpu 20000


Matmul.py script execution results

When using a GPU in a WSL 2 container, code execution is significantly faster than it is on a CPU.

Let's run another experiment to study GPU computing performance, this time using code from a Jupyter notebook tutorial. After starting the container, you should see a link to the Jupyter Notebook server.

user@PCName:/mnt/c$ docker run -it --gpus all -p 8888:8888 tensorflow/tensorflow:latest-gpu-py3-jupyter 


Launch Jupyter Notebook

You should now be able to run demos in the Jupyter Notebook environment. Note that to connect to Jupyter Notebook from the Microsoft Edge browser, you must use localhost instead of 127.0.0.1.
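If you script the connection step, the URL rewrite for Edge is a simple substitution. The helper name edge_friendly below is hypothetical, purely for illustration:

```python
def edge_friendly(url: str) -> str:
    """Rewrite a Jupyter URL so it can be opened from Microsoft Edge on the Windows host."""
    return url.replace("127.0.0.1", "localhost", 1)


# Example with a made-up token:
print(edge_friendly("http://127.0.0.1:8888/?token=abc123"))
# → http://localhost:8888/?token=abc123
```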

Go to the tensorflow-tutorials directory and start the classification.ipynb notebook.

To see the results of GPU acceleration, open the Cell menu, select Run All, and look at the Jupyter Notebook log in the WSL 2 container.


Jupyter Notebook log

This demo, and some others in this container, make visible the virtualization-layer problem discussed above: unreasonably high overhead when solving small problems. Because the training models run here are very small, their execution time on the GPU is less than the time required for synchronization. When solving such "toy" problems in WSL 2, the CPU can be more efficient than the GPU. We are working to solve this problem, striving to limit its manifestation to only very small workloads to which pipelining is not applied.
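A back-of-the-envelope model makes the effect clear. Assume each GPU launch pays a fixed synchronization overhead (inflated by the paravirtualization layer) plus time proportional to the actual work; all numbers below are invented purely for illustration:

```python
def total_time_ms(work_ms: float, launches: int, overhead_ms: float) -> float:
    """Fixed per-launch overhead plus the actual compute time (toy model)."""
    return launches * overhead_ms + work_ms


# Invented numbers: the GPU computes 100x faster than the CPU,
# but each GPU launch pays 0.5 ms of virtualization overhead.
cpu_big = total_time_ms(work_ms=10.0, launches=1, overhead_ms=0.0)      # 10.0 ms
gpu_big = total_time_ms(work_ms=0.1, launches=1, overhead_ms=0.5)       # 0.6 ms  -> GPU wins

# For a tiny workload, the fixed overhead dominates and the CPU wins.
cpu_tiny = total_time_ms(work_ms=0.2, launches=1, overhead_ms=0.0)      # 0.2 ms
gpu_tiny = total_time_ms(work_ms=0.002, launches=1, overhead_ms=0.5)    # 0.502 ms -> CPU wins
```

The crossover point shifts as the overhead shrinks, which is exactly what the reduction work described above targets.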

WSL overview


In order to understand how GPU support was added to WSL 2, let's now talk about what launching Linux on Windows actually is and how containers see the hardware.

Microsoft introduced WSL technology at Build in 2016. It quickly found wide application and became popular among Linux developers who needed to run Windows applications, such as Office, alongside Linux development tools and related programs.

WSL 1 allowed you to run unmodified Linux executables by using a Linux kernel emulation layer implemented as an NT kernel subsystem. This subsystem handled calls from Linux applications and redirected them to the appropriate Windows 10 mechanisms.

WSL 1 was a useful tool, but it was not compatible with all Linux applications, since it needed to emulate absolutely all Linux system calls. In addition, file system operations were slow, which led to unacceptably low performance of some applications.

Given this, Microsoft decided to take a different path and released WSL 2, a new version of WSL. WSL 2 containers run full Linux distributions in a virtualized environment while taking advantage of the new Windows 10 containerization system.

While WSL 2 uses Windows 10 Hyper-V services, it is not a traditional virtual machine but rather a lightweight utility virtual machine. This mechanism manages virtual memory backed by physical memory, allowing WSL 2 containers to allocate memory dynamically by requesting it from the Windows host.

Among the main goals of WSL 2 were improving file system performance and ensuring compatibility with all system calls. In addition, WSL 2 was designed to deepen the integration between WSL and Windows: it is convenient to work with a Linux system running in a container using Windows command-line tools, and the host file system is automatically mounted in selected directories of the container file system.

WSL 2 was introduced in the Windows Insider Program as a Preview feature and is released in the latest Windows 10 update, version 2004.

The WSL 2 in the latest version of Windows has received many further improvements, affecting everything from network stacks to the underlying VHD storage mechanisms. A description of all the new features of WSL 2 is beyond the scope of this article; see this page for a comparison of WSL 2 and WSL 1.

The WSL 2 Linux kernel


The Linux kernel used in WSL 2 was compiled by Microsoft based on the latest stable branch, using the source code available at kernel.org. This kernel has been specially tuned for WSL 2, optimized in terms of size and performance to ensure that Linux runs on Windows. The kernel is supported through the Windows Update mechanism. This means that the user does not have to worry about downloading the latest security updates and kernel improvements. All this is done automatically.

Microsoft supports several Linux distributions in WSL. Following the rules of the open source community, the company published the WSL 2 kernel source code, with the modifications needed to integrate with Windows 10, in the WSL2-Linux-Kernel GitHub repository.

GPU support in WSL


Microsoft developers added real GPU support to WSL 2 containers using GPU-PV technology: the operating system's graphics kernel (dxgkrnl) marshals calls from user-mode components running in the guest virtual machine to the kernel-mode driver on the host.

Microsoft developed this technology as a WDDM feature, and it has matured over several Windows releases. The work was carried out with the involvement of independent hardware vendors (IHVs). NVIDIA graphics drivers have supported GPU-PV since the early days of the technology, in preview versions of products available through the Windows Insider Program. All currently supported NVIDIA GPUs can be accessed by a guest OS in a Hyper-V virtual machine.

In order for GPU-PV to be usable in WSL 2, Microsoft had to port the foundation of its graphics framework to the Linux guest: WDDM with support for the GPU-PV protocol. The new Microsoft driver stands behind dxgkrnl, the system responsible for WDDM support on Linux. The driver code can be found in the WSL2-Linux-Kernel repository.

Dxgkrnl is expected to provide GPU acceleration support in WSL 2 containers with WDDM 2.9. Microsoft says that dxgkrnl is a Linux GPU driver based on the GPU-PV protocol and that it has nothing in common with the Windows driver of a similar name.

You can currently download the preview version of the NVIDIA WDDM 2.9 driver. In the next few months, this driver will be distributed via Windows Update in the WIP version of Windows, making manual download and installation unnecessary.

GPU-PV basics


The dxgkrnl driver exposes a new device, /dev/dxg, to user mode in the Linux guest. The D3DKMT kernel service layer, available on Windows, has also been ported to Linux as part of the dxcore library. It communicates with dxgkrnl using a set of private IOCTL calls.

The Linux guest version of dxgkrnl connects to the dxg kernel on the Windows host over multiple VM bus channels. The dxg kernel on the host processes requests from the Linux process just as it does requests from regular Windows applications using WDDM: it forwards them to the KMD (Kernel Mode Driver, unique to each IHV), which prepares them for the hardware graphics accelerator. The following figure shows a simplified diagram of the interaction between the Linux /dev/dxg device and the KMD.


A simplified diagram illustrating how the components of a Windows host provide a dxg device in a Linux guest system
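One observable consequence of this design is the /dev/dxg device node itself. Here is a minimal check for its presence, a sketch rather than an official diagnostic:

```python
from pathlib import Path


def dxg_device_present() -> bool:
    """Return True if the paravirtualized GPU device exposed by dxgkrnl exists."""
    return Path("/dev/dxg").exists()


if __name__ == "__main__":
    if dxg_device_present():
        print("/dev/dxg found: GPU-PV plumbing is in place")
    else:
        print("/dev/dxg not found: this is not a GPU-enabled WSL 2 guest")
```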

As for providing a similar scheme in Windows guest systems, NVIDIA drivers have supported GPU-PV on Windows 10 for quite some time. NVIDIA GPUs can be used to accelerate computing and graphics in all Windows 10 applications that use the Microsoft virtualization layer. GPU-PV also makes it possible to work with a vGPU; Windows Sandbox is one example of such an application.


Here's what it looks like to launch a DirectX application in a Windows Sandbox container using an NVIDIA GeForce GTX 1070 video accelerator.


In the Windows Sandbox container, graphics are accelerated using NVIDIA GeForce GTX 1070

User mode support


In order to add support for graphics output to WSL, the corresponding Microsoft development team also ported the dxcore user-mode component to Linux.

The dxcore library provides an API for enumerating the WDDM-compatible graphics adapters available on the system. It was conceived as a cross-platform, low-level replacement for the DXGI adapter enumeration tools on Windows and Linux. The library also abstracts access to dxgkrnl services (IOCTL calls on Linux and GDI calls on Windows) through the D3DKMT API layer, which is used by CUDA and other user-mode components that rely on WDDM support in WSL.

According to Microsoft, the dxcore library (libdxcore.so) will be available on both Windows and Linux. NVIDIA plans to add DirectX 12 support and CUDA APIs to the driver. These add-ons target the new WSL features available with WDDM 2.9. Both API libraries will be connected to dxcore so that they can give dxg instructions about how to marshal their KMD requests on the host system.

Try the new WSL 2 features


Do you want to use your Windows computer to solve real machine learning and artificial intelligence problems while enjoying all the conveniences of a Linux environment? CUDA support in WSL gives you a great opportunity to do this. WSL is where CUDA Docker containers, which have proven to be the most popular computing environment among data scientists, can now run on Windows.


You can learn more about using CUDA technology in WSL here. On the forum dedicated to CUDA and WSL, you can share your impressions, observations, and ideas about these technologies.

Have you tried CUDA in WSL 2 yet?


Source