Finally, support for TensorFlow on AMD GPUs

Here’s a quote from the developers:

ROCm TensorFlow 1.8 Release

We are excited to announce the release of ROCm-enabled TensorFlow v1.8 for AMD GPUs. This post demonstrates the steps to install and use TensorFlow on AMD GPUs.

Installation

First, you’ll need to install the open-source ROCm stack. Details can be found here: https://rocm.github.io/ROCmInstall.html

Then, install these other relevant ROCm packages:

sudo apt update
sudo apt install rocm-libs miopen-hip cxlactivitylogger
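
At this point (assuming the ROCm stack installed correctly) the GPU should already be visible to the driver. A quick way to check is the rocm-smi utility that ships with the ROCm stack:

/opt/rocm/bin/rocm-smi

The card should be listed along with its temperature, clocks, and utilization.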

And finally, install TensorFlow itself (via our pre-built whl package):

sudo apt install wget python3-pip 
wget http://repo.radeon.com/rocm/misc/tensorflow/tensorflow-1.8.0-cp35-cp35m-manylinux1_x86_64.whl
pip3 install ./tensorflow-1.8.0-cp35-cp35m-manylinux1_x86_64.whl
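
Before running anything, it's worth confirming that TensorFlow can actually see the GPU. A minimal sanity check (assuming the wheel installed cleanly) is to print the local device list:

python3 -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"

The ROCm-backed GPU should show up alongside the CPU in the output.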

Now that TensorFlow is installed, let’s run a few workloads.

Image recognition

We’ll use one of TensorFlow’s tutorials as a quick and easy Inception-v3 image recognition workload: https://www.tensorflow.org/tutorials/image_recognition

Here’s how to run it:

cd ~ && git clone https://github.com/tensorflow/models.git
cd ~/models/tutorials/image/imagenet
python3 classify_image.py

Afterwards, you should see a list of labels with associated scores. Since the script classifies a supplied image of a panda by default, the top result reflects that:

giant panda, panda, panda bear, coon bear, Ailuropoda melanoleuca (score = 0.89103)
indri, indris, Indri indri, Indri brevicaudatus (score = 0.00810)
lesser panda, red panda, panda, bear cat, cat bear, Ailurus fulgens (score = 0.00258)
custard apple (score = 0.00149)
earthstar (score = 0.00141)
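
The tutorial script can also classify a picture of your own. A minimal example, assuming the script's --image_file flag and using /path/to/your_image.jpg as a placeholder path:

python3 classify_image.py --image_file /path/to/your_image.jpg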

 

For more info, have a look at the following links: 1, 2

Distributed computing for TensorFlow, Keras, and PyTorch

Meet Horovod, the new distributed training framework for TensorFlow, Keras, and PyTorch. The goal of Horovod is to make distributed deep learning fast and easy to use.

Example:

import tensorflow as tf
import horovod.tensorflow as hvd


# Initialize Horovod
hvd.init()

# Pin GPU to be used to process local rank (one GPU per process)
config = tf.ConfigProto()
config.gpu_options.visible_device_list = str(hvd.local_rank())

# Build model...
loss = ...
opt = tf.train.AdagradOptimizer(0.01 * hvd.size())

# Add Horovod Distributed Optimizer
opt = hvd.DistributedOptimizer(opt)

# Add hook to broadcast variables from rank 0 to all other processes during
# initialization.
hooks = [hvd.BroadcastGlobalVariablesHook(0)]

# Make training operation
train_op = opt.minimize(loss)

# Save checkpoints only on worker 0 to prevent other workers from corrupting them.
checkpoint_dir = '/tmp/train_logs' if hvd.rank() == 0 else None

# The MonitoredTrainingSession takes care of session initialization,
# restoring from a checkpoint, saving to a checkpoint, and closing when done
# or an error occurs.
with tf.train.MonitoredTrainingSession(checkpoint_dir=checkpoint_dir,
                                       config=config,
                                       hooks=hooks) as mon_sess:
  while not mon_sess.should_stop():
    # Perform synchronous training.
    mon_sess.run(train_op)
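
The script above only defines the training side; to actually run it in parallel you launch one process per GPU with MPI. A typical single-node launch, assuming Open MPI is installed and the code is saved as train.py (a placeholder filename):

mpirun -np 4 python3 train.py

Each of the four processes pins itself to a different GPU via hvd.local_rank(), and the DistributedOptimizer averages the gradients across them at every step.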