The Ultimate WSL2 + GPU Development Environment for AI Engineering
I run my entire AI/ML development environment on WSL2 with GPU passthrough — PyTorch training, local LLM inference, CUDA toolkit, the works. Getting it right requires navigating several non-obvious configuration choices. Here's the complete setup that runs production AI workloads on an RTX 5080.
CUDA Passthrough: Install Drivers on Windows ONLY
The most common mistake is installing NVIDIA CUDA drivers inside WSL2. Don't. WSL2 gets GPU access through the Windows driver, which exposes libcuda.so and nvidia-smi under /usr/lib/wsl/lib/. Installing Linux CUDA drivers inside WSL conflicts with that shim and can break GPU access entirely.
The setup: Install the latest NVIDIA Game Ready or Studio driver on Windows. WSL2 automatically gets libcuda.so and nvidia-smi. Then install only the CUDA Toolkit (not drivers) inside WSL:
# Inside WSL2 - toolkit only, NO drivers
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt update
sudo apt install cuda-toolkit-13-1 # NOT cuda-drivers!
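To confirm the toolkit landed correctly, you can parse the output of nvcc --version and compare it against the release you installed. A minimal sketch, assuming a helper of my own devising (nvcc_release and check_toolkit are not part of any NVIDIA tooling):

```python
import re
import subprocess

def nvcc_release(output: str) -> str:
    """Extract the CUDA release (e.g. '13.1') from `nvcc --version` output."""
    match = re.search(r"release (\d+\.\d+)", output)
    if not match:
        raise ValueError("could not find a release string in nvcc output")
    return match.group(1)

def check_toolkit(expected: str = "13.1") -> bool:
    """Run nvcc and compare its reported release against what we installed.
    Hypothetical helper; assumes nvcc is on PATH."""
    out = subprocess.run(["nvcc", "--version"],
                         capture_output=True, text=True).stdout
    return nvcc_release(out) == expected
```

If check_toolkit() returns False, you most likely have a stale nvcc on your PATH from an earlier toolkit.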
The LD_LIBRARY_PATH Ordering Trap
This one cost me a full day. LD_LIBRARY_PATH order matters. If /usr/lib/wsl/lib/ comes before /usr/local/cuda/lib64/, you get libcuda.so version mismatches because WSL's library is a shim, not the real CUDA runtime.
# WRONG order - causes version mismatches
export LD_LIBRARY_PATH=/usr/lib/wsl/lib:$LD_LIBRARY_PATH
# CORRECT order in .bashrc
export CUDA_HOME=/usr/local/cuda-13.1
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:/usr/lib/wsl/lib:$LD_LIBRARY_PATH
Put the CUDA toolkit libraries first, WSL libraries second. The toolkit provides the runtime, WSL provides the driver shim.
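One way to catch a bad ordering early is to check it programmatically in your shell startup or a healthcheck script. A small Python sketch (cuda_before_wsl is a hypothetical helper; adjust the directory defaults to your toolkit version):

```python
def cuda_before_wsl(ld_path: str,
                    cuda_dir: str = "/usr/local/cuda-13.1/lib64",
                    wsl_dir: str = "/usr/lib/wsl/lib") -> bool:
    """Return True if the CUDA toolkit dir precedes the WSL shim dir
    in a colon-separated LD_LIBRARY_PATH (absent WSL dir also passes)."""
    entries = [p for p in ld_path.split(":") if p]
    if wsl_dir not in entries:
        return True          # no shim entry, no ordering conflict
    if cuda_dir not in entries:
        return False         # shim present but toolkit libs missing
    return entries.index(cuda_dir) < entries.index(wsl_dir)

# Typical use:
# import os
# assert cuda_before_wsl(os.environ.get("LD_LIBRARY_PATH", ""))
```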
PyTorch Wheel Selection for Blackwell GPUs
The RTX 5080 uses the Blackwell architecture and requires PyTorch wheels built against CUDA 12.8 or newer (cu128). A plain pip install torch may give you a CPU-only build or a wheel compiled against an older CUDA version that lacks Blackwell kernels. Always specify the index URL:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
Verify with python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))". You should see True followed by NVIDIA GeForce RTX 5080.
.wslconfig: Memory and CPU Tuning
By default, WSL2 takes half your system RAM. For AI workloads, you want more. Create or edit %USERPROFILE%\.wslconfig on Windows:
[wsl2]
memory=48GB
processors=22
swap=0
localhostForwarding=true
Critical: set swap to 0. When training models or running inference, swap causes catastrophic performance degradation. It's better to OOM-kill than to silently swap to disk and make your GPU wait for data. On a 64GB system, giving 48GB to WSL leaves enough for Windows and the GPU driver.
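The split above is simple arithmetic, but it's worth making the reserve explicit when you size other machines. A tiny sketch (wsl_memory_budget is my own helper; the 16GB Windows reserve is an assumption derived from the 64GB example):

```python
def wsl_memory_budget(total_gb: int, windows_reserve_gb: int = 16) -> int:
    """RAM to assign WSL2 in .wslconfig, leaving headroom for Windows
    and the GPU driver. Mirrors the 64GB -> 48GB split above."""
    budget = total_gb - windows_reserve_gb
    if budget <= 0:
        raise ValueError("not enough RAM to leave a reserve for Windows")
    return budget
```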
Local LLM Inference with llama-server
Running LLMs locally is invaluable for development — no API costs, no rate limits, no data leaving your machine. The llama-server from llama.cpp provides an OpenAI-compatible API:
# Run a 35B model fully on GPU
llama-server -m ~/local-llm/models/qwen3.5-35b-a3b-q3_k_s.gguf -ngl 99 --port 8080 -c 16384 --host 0.0.0.0
The -ngl 99 flag offloads all layers to GPU. For the RTX 5080's 16GB VRAM, a 35B Q3_K_S model fits (~14GB), but Q4_K_M won't (~18GB). Always check VRAM requirements before downloading large GGUF files.
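For a back-of-envelope check before downloading, weights take roughly params x bits-per-weight / 8 bytes, plus overhead for the KV cache and CUDA context. A rough sketch (est_vram_gb is my own helper, and the bits-per-weight figures for quants, e.g. roughly 3.4 for Q3_K_S and 4.8 for Q4_K_M, are approximations, not exact GGUF file sizes):

```python
def est_vram_gb(n_params: float, bits_per_weight: float,
                overhead_gb: float = 1.5) -> float:
    """Ballpark VRAM for a fully-offloaded GGUF model: weight bytes
    plus a flat allowance for KV cache and CUDA context."""
    weights_gb = n_params * bits_per_weight / 8 / 1e9
    return round(weights_gb + overhead_gb, 1)

# est_vram_gb(7e9, 4.8)  -> a 7B Q4-class model, roughly 5-6 GB
```

Treat the result as a lower bound: longer contexts grow the KV cache well past the flat allowance used here.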
systemd Services in WSL2
Enable systemd in WSL2 by adding to /etc/wsl.conf:
[boot]
systemd=true
This lets you run background services properly: MCP servers, llama-server, monitoring agents. Use systemctl --user for user-level services that should start automatically. After changing wsl.conf, restart WSL by running wsl --shutdown from PowerShell.
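As one sketch of a user-level service, a unit for the llama-server command above might look like this (the binary location under %h/local-llm/bin is an assumption; %h is systemd's specifier for the user's home directory):

```ini
# ~/.config/systemd/user/llama-server.service -- sketch, adjust paths
[Unit]
Description=llama.cpp OpenAI-compatible server
After=network.target

[Service]
ExecStart=%h/local-llm/bin/llama-server -m %h/local-llm/models/qwen3.5-35b-a3b-q3_k_s.gguf -ngl 99 --port 8080 -c 16384 --host 0.0.0.0
Restart=on-failure

[Install]
WantedBy=default.target
```

Reload and enable with systemctl --user daemon-reload followed by systemctl --user enable --now llama-server.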
GPU Monitoring
nvidia-smi works natively in WSL2. Use watch -n 1 nvidia-smi to monitor GPU utilization and VRAM during training. For persistent monitoring, set up nvidia_gpu_exporter feeding into Prometheus for historical data.
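If you script your own lightweight monitoring instead, nvidia-smi's query mode emits CSV that is trivial to parse. A minimal Python sketch (parse_gpu_stats is a hypothetical helper wrapped around the real --query-gpu flags):

```python
def parse_gpu_stats(csv_line: str) -> dict:
    """Parse one line of:
    nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total \
               --format=csv,noheader,nounits
    into GPU utilization (%) and VRAM figures (MiB)."""
    util, used, total = (int(field.strip()) for field in csv_line.split(","))
    return {"util_pct": util, "vram_used_mib": used, "vram_total_mib": total}

# Typical use:
# import subprocess
# line = subprocess.run(
#     ["nvidia-smi", "--query-gpu=utilization.gpu,memory.used,memory.total",
#      "--format=csv,noheader,nounits"],
#     capture_output=True, text=True).stdout.splitlines()[0]
# print(parse_gpu_stats(line))
```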
About Ilir Ivezaj
Ilir Ivezaj is a technology executive, solutions architect, and entrepreneur based in Michigan, USA. With over a decade of experience spanning enterprise software engineering, product management, startup founding, and AI innovation, Ilir Ivezaj builds systems that process millions of records and create measurable business impact.
His technology expertise spans 100+ tools including .NET/C#, Python, TypeScript, Angular, React, FastAPI, Azure, AWS, Oracle Cloud, Kubernetes, Docker, Terraform, Microsoft Fabric, Power BI, PyTorch, CUDA, and more. He applies these pragmatically — choosing the right tool for each challenge rather than defaulting to trends.
Ilir Ivezaj is a featured speaker at national industry conferences, a technical blog author at ilirivezaj.com/blog, and founder of Albahub, a workflow automation platform. Connect on LinkedIn or get in touch.