GPU?
👉 The corresponding versions between TF and CUDA.
# check if a GPU is available
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))
# prevent TF from using the GPU
# add the lines below before any tensorflow import
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
Installation with docker
👉 Official guide.
👉 Note: Docker & GPU.
The advantage of this method is that you only have to install the GPU driver on the host machine; CUDA and cuDNN already come inside the TensorFlow GPU images.
Check the Docker version with docker --version:
- < 19.03: requires nvidia-docker2 (check with nvidia-docker version) and --runtime=nvidia.
- >= 19.03: requires nvidia-container-toolkit (check with which nvidia-container-toolkit) and --gpus all.
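The version rule above can be turned into a small helper that picks the right docker run flags. This is a minimal sketch: `gpu_flags` is a hypothetical name, and it assumes the input is the text printed by docker --version (for example "Docker version 20.10.7, build f0df350"), of which it only reads the major and minor numbers.

```python
import re
from typing import List


def gpu_flags(version_output: str) -> List[str]:
    """Return the docker run GPU flags matching the Docker version (hypothetical helper)."""
    match = re.search(r"version (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError(f"cannot parse Docker version: {version_output!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    if (major, minor) < (19, 3):
        # older Docker: GPU support comes from nvidia-docker2
        return ["--runtime=nvidia"]
    # Docker >= 19.03: native --gpus flag, backed by nvidia-container-toolkit
    return ["--gpus", "all"]
```

The returned list can be spliced into an argument list for subprocess.run, so the same script works on hosts with old and new Docker versions.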
Without docker-compose
👉 Different types of images for tensorflow.
# pull the image
docker pull tensorflow/tensorflow:latest-gpu-jupyter
# run a container
mkdir -p ~/Downloads/test/notebooks
# --gpus all needs Docker >= 19.03 (use --runtime=nvidia on older versions)
docker run --name docker_thi_test -it --rm --gpus all -v $(realpath ~/Downloads/test/notebooks):/tf/notebooks -p 8888:8888 tensorflow/tensorflow:latest-gpu-jupyter
# check if the GPU is available
nvidia-smi
# check if TF2 works
docker exec -it docker_thi_test bash
python
import tensorflow as tf
tf.config.list_physical_devices('GPU')
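Besides reading the nvidia-smi table by eye, the same check can be scripted. This is a minimal sketch, assuming nvidia-smi is on the PATH (on the host, or inside the container); it relies on nvidia-smi's --query-gpu and --format=csv,noheader,nounits options, and the helper names are hypothetical.

```python
import subprocess
from typing import Dict, List

QUERY = "name,driver_version,memory.total"


def parse_gpu_query(csv_text: str) -> List[Dict[str, str]]:
    """Parse `nvidia-smi --query-gpu=... --format=csv,noheader,nounits` output."""
    fields = QUERY.split(",")
    gpus = []
    for line in csv_text.strip().splitlines():
        values = [v.strip() for v in line.split(",")]
        gpus.append(dict(zip(fields, values)))
    return gpus


def list_gpus() -> List[Dict[str, str]]:
    """Run nvidia-smi and return one dict per GPU (empty list if the call fails)."""
    try:
        out = subprocess.run(
            ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        # no driver / no nvidia-smi: treat as "no GPU visible"
        return []
    return parse_gpu_query(out)
```

An empty list from list_gpus() inside the container usually means the GPU flags were not passed to docker run.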
With docker-compose?
👉 Read this note instead.
On Windows WSL2
Install directly on Linux (without docker)
On my computer, Dell XPS 15 7590 - NVIDIA® GeForce® GTX 1650 Mobile.
This section is incomplete; the guide does not work yet!
Installation
This guide is specific to:
pip show tensorflow # 2.3.1
pip show tensorflow-gpu # 2.3.1
nvidia-smi # NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0
👉 Note: PyTorch.
👉 Note: Ubuntu.
👉 Note: Linux.
CUDA Toolkit:
- If you get the error "Existing package manager installation of the driver found", try this method to remove the already-installed packages before continuing.
- Or download the CUDA Toolkit .run file and then run:
sudo sh cuda_11.1.0_*.run --toolkit --silent --override
Errors?
# Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
You need to install the new CUDA & cuDNN libraries and TensorFlow. (This note is for tensorflow==2.3.1 and CUDA 11.1.) [ref]
# update path
export PATH=/usr/local/cuda-11.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
# quickly test cuda version
nvcc --version
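After exporting the paths, you can check whether the dynamic loader now finds the library TensorFlow complained about, without starting a full TensorFlow session. This is a minimal sketch: `can_load` is a hypothetical helper built on ctypes.CDLL, which calls dlopen and raises OSError on failure. The loader reads LD_LIBRARY_PATH when a process starts, so run it in a new Python process launched from the shell where you ran the export.

```python
import ctypes


def can_load(library: str) -> bool:
    """Return True if the dynamic loader can open `library` (hypothetical helper)."""
    try:
        ctypes.CDLL(library)
    except OSError:
        # dlopen failed, e.g. "cannot open shared object file: No such file or directory"
        return False
    return True


# use the library name from your own error message
print("libcudart.so.10.1 found:", can_load("libcudart.so.10.1"))
```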
# WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 2000 batches). You may need to use the repeat() function when building your dataset.
The problem: you don't have enough images for the number of steps you asked for!
train_generator = train_datagen.flow_from_directory(batch_size = 20)
validation_generator = test_datagen.flow_from_directory(batch_size = 20)
# Found 1027 images belonging to 2 classes.
# Found 256 images belonging to 2 classes.
model.fit(
train_generator,
validation_data = validation_generator,
steps_per_epoch = 100,
epochs = 20,
validation_steps = 50,
verbose = 2)
We must have steps_per_epoch * batch_size <= number of images; in this case 100*20 = 2000 > 1027. The same holds for validation: validation_steps * batch_size = 50*20 = 1000 > 256. Check this answer for more information.
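The rule can be checked with plain arithmetic before calling model.fit. This is a minimal sketch; max_steps and check_steps are hypothetical helper names, and they use floor division, so a final partial batch is not counted (the corrected values in this note follow the same convention).

```python
def max_steps(num_images: int, batch_size: int) -> int:
    """Largest number of full batches available in one pass over the images."""
    return num_images // batch_size


def check_steps(steps: int, num_images: int, batch_size: int) -> bool:
    """True if `steps` batches of `batch_size` need no more than `num_images` images."""
    return steps * batch_size <= num_images


# numbers from the example above (batch_size = 20)
print(max_steps(1027, 20))         # 51 (training)
print(max_steps(256, 20))          # 12 (validation)
print(check_steps(100, 1027, 20))  # False: 2000 > 1027
print(check_steps(50, 1027, 20))   # True: 1000 <= 1027
```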
# correct
model.fit(
...
steps_per_epoch = 50, # batch size is 20, so 1027 images give 1027//20 = 51 full batches; 50 <= 51
...
validation_steps = 12, # batch size is 20, so 256 images give 256//20 = 12 full batches
...)
# Not found: No algorithm worked!
# OR
# This is probably because cuDNN failed to initialize
nvidia-smi
# find and kill the processes that use a lot of GPU memory
# restart the task
# OR: add the following to your code
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession
config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)
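In TensorFlow 2.x, the same "allocate GPU memory on demand" behavior can also be requested without the compat.v1 session, through tf.config.experimental.set_memory_growth. This is a minimal sketch: `enable_memory_growth` is a hypothetical helper, and it must be called at program start, before any op runs on the GPU, because TensorFlow raises a RuntimeError if the setting is changed after the GPUs are initialized.

```python
def enable_memory_growth() -> int:
    """Turn on memory growth for every visible GPU; return how many were configured.

    Call this before anything touches the GPU, otherwise TensorFlow
    raises RuntimeError (devices already initialized).
    """
    import tensorflow as tf  # lazy import: keeps this module importable without TF

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)
```

Typical use is to call enable_memory_growth() as the first statement of the training script, then build and fit the model as usual.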