TensorFlow discrete note

This post was last modified around 1 year ago; some information may be outdated!

This is a draft: the content is incomplete and of poor quality!

GPU?

👉 Matching versions of TF and CUDA.

# check if GPU available?
import tensorflow as tf
tf.config.list_physical_devices('GPU')

# prevent TF from using the GPU
# add the lines below before any TF import
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
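
The "before any tf import" requirement is easy to get wrong in a larger script. Below is a minimal sketch of a guard that hides the GPUs and reports whether TensorFlow was already imported. The helper name `hide_gpus` is hypothetical (it is not part of TensorFlow), and the check only looks at `sys.modules`, so treat it as a hint rather than a guarantee.

```python
import os
import sys


def hide_gpus():
    """Hide all CUDA devices from libraries that are imported afterwards.

    Returns True if TensorFlow had not been imported yet (the setting is
    in time), False otherwise. `hide_gpus` is a hypothetical helper name.
    """
    already_imported = "tensorflow" in sys.modules
    os.environ["CUDA_VISIBLE_DEVICES"] = "-1"
    return not already_imported


in_time = hide_gpus()
if not in_time:
    print("warning: tensorflow was already imported; the setting may be ignored")
```

Call it at the very top of the entry-point script, before any module that imports TensorFlow.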

Installation with docker

👉 Official guide.
👉 Note: Docker & GPU.

The advantage of this method is that you only have to install the GPU driver on the host machine.

Note about docker version

Check the Docker version with docker --version:

<19.03: requires nvidia-docker2 (check by nvidia-docker version) and --runtime=nvidia.
>=19.03: requires nvidia-container-toolkit (check by which nvidia-container-toolkit) and --gpus all.
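
The version rule above can be sketched as a small function. This is only an illustration: the helper name `gpu_flag` is hypothetical, and it assumes the output of `docker --version` contains a `major.minor` number such as `19.03`.

```python
import re


def gpu_flag(docker_version_output):
    """Pick the GPU flag for `docker run` from the text printed by `docker --version`.

    Hypothetical helper: returns "--gpus all" for Docker >= 19.03 and
    "--runtime=nvidia" for older versions.
    """
    match = re.search(r"(\d+)\.(\d+)", docker_version_output)
    if match is None:
        raise ValueError("no version number found in: %r" % docker_version_output)
    major, minor = int(match.group(1)), int(match.group(2))
    return "--gpus all" if (major, minor) >= (19, 3) else "--runtime=nvidia"


print(gpu_flag("Docker version 19.03.8, build afacb8b"))
```

The flag only works if the matching package (nvidia-container-toolkit or nvidia-docker2) is installed on the host.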

Without docker-compose

👉 Different types of images for tensorflow.

# pull the image
docker pull tensorflow/tensorflow:latest-gpu-jupyter

# run a container (Docker >= 19.03: --gpus all; older versions: --runtime=nvidia)
mkdir -p ~/Downloads/test/notebooks
docker run --name docker_thi_test -it --rm --gpus all -v $(realpath ~/Downloads/test/notebooks):/tf/notebooks -p 8888:8888 tensorflow/tensorflow:latest-gpu-jupyter

# check if gpu available?
nvidia-smi

# check that TF2 works: open a shell in the container and start Python
docker exec -it docker_thi_test bash
python

# then, at the Python prompt
import tensorflow as tf
tf.config.list_physical_devices('GPU')

With docker-compose?

👉 Read this note instead.

On Windows WSL2

Install directly on Linux (without docker)

On my computer: Dell XPS 15 7590 with an NVIDIA® GeForce® GTX 1650 Mobile.

This section is incomplete; the guide still does not work!

Installation

👉 GPU support : TensorFlow

This guide is specific to:

pip show tensorflow # 2.3.1
pip show tensorflow-gpu # 2.3.1
nvidia-smi # NVIDIA-SMI 450.80.02 Driver Version: 450.80.02 CUDA Version: 11.0

👉 Note: PyTorch.
👉 Note: Ubuntu.
👉 Note: Linux.

CUDA Toolkit:

If you get the error "Existing package manager installation of the driver found", try this method to remove the already-installed packages before continuing.
Alternatively, download the CUDA Toolkit .run installer and run:

sudo sh cuda_11.1.0_*.run --toolkit --silent --override

Errors?

# Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory

You need to install matching versions of the CUDA and cuDNN libraries and of TensorFlow. (This note is for tensorflow==2.3.1 and CUDA 11.1.) ^[ref]

# update path (use lib64 on a 64-bit system, lib on a 32-bit one)
export PATH=/usr/local/cuda-11.1/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-11.1/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}

# quickly test cuda version
nvcc --version
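
To see which CUDA runtime libraries the updated path actually exposes, a quick check like the one below can help. It is a minimal sketch: `find_cudart` is a hypothetical helper, and it only scans the directories in `LD_LIBRARY_PATH` (the dynamic loader also looks in other places, such as the ldconfig cache).

```python
import glob
import os


def find_cudart(search_path=None):
    """Return all libcudart.so* files in a colon-separated list of directories.

    Hypothetical helper: defaults to the current LD_LIBRARY_PATH.
    """
    if search_path is None:
        search_path = os.environ.get("LD_LIBRARY_PATH", "")
    found = []
    for directory in search_path.split(":"):
        if directory:  # skip empty entries such as a trailing ':'
            found.extend(sorted(glob.glob(os.path.join(directory, "libcudart.so*"))))
    return found


for lib in find_cudart():
    print(lib)
```

If the version in the file names differs from the one in the "Could not load dynamic library" message, TensorFlow was built against a different CUDA version than the one installed.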

# WARNING:tensorflow:Your input ran out of data; interrupting training. Make sure that your dataset or generator can generate at least `steps_per_epoch * epochs` batches (in this case, 2000 batches). You may need to use the repeat() function when building your dataset.

The problem is that you don't have enough images!

# directory and other arguments are elided
train_generator = train_datagen.flow_from_directory(..., batch_size=20)
validation_generator = test_datagen.flow_from_directory(..., batch_size=20)

# Found 1027 images belonging to 2 classes.
# Found 256 images belonging to 2 classes.

# wrong: asks for more batches than the generators can supply
model.fit(
    train_generator,
    validation_data = validation_generator,
    steps_per_epoch = 100,
    epochs = 20,
    validation_steps = 50,
    verbose = 2)

We must have steps_per_epoch * batch_size <= number of images; in this case, 100 * 20 = 2000 > 1027. The same holds for validation: 50 * 20 = 1000 > 256. Check this answer for more information.

# correct
model.fit(
    ...
    steps_per_epoch = 50, # batch size is 20, so at most 1027 // 20 = 51 full batches are available
    ...
    validation_steps = 12, # batch size is 20, so at most 256 // 20 = 12 full batches are available
    ...)
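
The step counts can also be computed instead of hard-coded. A minimal sketch, assuming the image counts reported by flow_from_directory above; the helper name `max_full_steps` is hypothetical.

```python
def max_full_steps(num_images, batch_size):
    """Largest number of full batches that one pass over the data can supply."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return num_images // batch_size


# image counts taken from the "Found ... images" messages above
steps_per_epoch = max_full_steps(1027, 20)
validation_steps = max_full_steps(256, 20)
print(steps_per_epoch, validation_steps)  # 51 12
```

Any value up to these limits satisfies steps_per_epoch * batch_size <= number of images.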

# Not found: No algorithm worked!

# OR
# This is probably because cuDNN failed to initialize

nvidia-smi
# find and kill the process that is using most of the GPU memory
# then restart the task

# OR: add the following to your code
from tensorflow.compat.v1 import ConfigProto
from tensorflow.compat.v1 import InteractiveSession

config = ConfigProto()
config.gpu_options.allow_growth = True
session = InteractiveSession(config=config)
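
The snippet above uses the TF1 compatibility API. In TF2 the same "allocate GPU memory on demand" behaviour can be requested with `tf.config.experimental.set_memory_growth`. The sketch below wraps it in a hypothetical helper, `enable_memory_growth`, which returns 0 when TensorFlow is not installed. It must run before TensorFlow initializes the GPUs, otherwise TensorFlow raises a RuntimeError.

```python
def enable_memory_growth():
    """Ask TensorFlow to allocate GPU memory on demand.

    Hypothetical helper: returns the number of GPUs configured, or 0 when
    TensorFlow is not installed or no GPU is visible.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return 0  # TensorFlow is not installed: nothing to configure
    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # must be called before the GPU has been initialized
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)


configured = enable_memory_growth()
print("memory growth enabled on %d GPU(s)" % configured)
```

Whether this fixes the "cuDNN failed to initialize" error depends on the cause; it only helps when the failure comes from TensorFlow reserving all GPU memory up front.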
