Caffe to TensorFlow

In this section, we will show you how to take advantage of the many pre-trained models in the Caffe Model Zoo (https://github.com/BVLC/caffe/wiki/Model-Zoo). There are lots of Caffe models for different tasks with all kinds of architectures. After converting these models to TensorFlow, you can use them as parts of your own architectures or fine-tune them for different tasks. Using these pre-trained models as initial weights is an effective training approach compared to training from scratch. We will show you how to use the caffe-tensorflow project from Saumitro Dasgupta at https://github.com/ethereon/caffe-tensorflow.

However, there are lots of differences between Caffe and TensorFlow, and this technique only supports a subset of Caffe's layer types. That said, some Caffe architectures, such as ResNet, VGG, and GoogLeNet, have been verified by the author of this project.

First, we need to clone the caffe-tensorflow repository using the git clone command:

ubuntu@ubuntu-PC:~/github$ git clone https://github.com/ethereon/caffe-tensorflow
Cloning into 'caffe-tensorflow'...
remote: Counting objects: 479, done.
remote: Total 479 (delta 0), reused 0 (delta 0), pack-reused 479
Receiving objects: 100% (510/510), 1.71 MiB | 380.00 KiB/s, done.
Resolving deltas: 100% (275/275), done.
Checking connectivity... done.

Then, we need to change into the caffe-tensorflow directory and run the convert.py script to see some help messages:

cd caffe-tensorflow
python convert.py -h
The resulting console output will look like this:

usage: convert.py [-h] [--caffemodel CAFFEMODEL]
                  [--data-output-path DATA_OUTPUT_PATH]
                  [--code-output-path CODE_OUTPUT_PATH] [-p PHASE]
                  def_path
    
positional arguments:
def_path              Model definition (.prototxt) path
    
optional arguments:
  -h, --help            show this help message and exit
  --caffemodel CAFFEMODEL
                        Model data (.caffemodel) path
  --data-output-path DATA_OUTPUT_PATH
                        Converted data output path
  --code-output-path CODE_OUTPUT_PATH
                        Save generated source to this path
  -p PHASE, --phase PHASE
                        The phase to convert: test (default) or train

This help message tells us the parameters of the convert.py script. In summary, we will use convert.py to create the network architecture in TensorFlow with the --code-output-path flag and to convert the pre-trained weights with the --data-output-path flag.

Before we start converting the models, we need to pull in some changes from contributors to this project. The current master branch has issues that prevent us from using the latest TensorFlow (version 1.3 at the time of writing) and python-protobuf (version 3.4.0 at the time of writing). Therefore, we will get the code from the following pull requests:

https://github.com/ethereon/caffe-tensorflow/pull/105

https://github.com/ethereon/caffe-tensorflow/pull/133

You need to open the preceding links to check whether the pull requests have been merged. If a pull request is still open, you will need to follow the corresponding steps below; otherwise, you can skip the pull requests that have already been merged.

First, we will get the code from pull request 105:

ubuntu@ubuntu-PC:~/github$ git pull origin pull/105/head
remote: Counting objects: 33, done.
remote: Total 33 (delta 8), reused 8 (delta 8), pack-reused 25
Unpacking objects: 100% (33/33), done.
From https://github.com/ethereon/caffe-tensorflow
* branch            refs/pull/105/head -> FETCH_HEAD
Updating d870c51..ccd1a52
Fast-forward
.gitignore                               |  5 +++++
convert.py                               |  8 ++++++++
examples/save_model/.gitignore           | 11 ++++++++++
examples/save_model/READMD.md            | 17 ++++++++++++++++
examples/save_model/__init__.py          |  0
examples/save_model/save_model.py        | 51 ++++++++++++++++++++++++++++++++++++++++++++++
kaffe/caffe/{caffepb.py => caffe_pb2.py} |  0
kaffe/caffe/resolver.py                  |  4 ++--
kaffe/tensorflow/network.py              |  8 ++++----
9 files changed, 98 insertions(+), 6 deletions(-)
create mode 100644 examples/save_model/.gitignore
create mode 100644 examples/save_model/READMD.md
create mode 100644 examples/save_model/__init__.py
create mode 100755 examples/save_model/save_model.py
rename kaffe/caffe/{caffepb.py => caffe_pb2.py} (100%)

Then, from pull request 133:

ubuntu@ubuntu-PC:~/github$ git pull origin pull/133/head
remote: Counting objects: 31, done.
remote: Total 31 (delta 20), reused 20 (delta 20), pack-reused 11
Unpacking objects: 100% (31/31), done.
From https://github.com/ethereon/caffe-tensorflow
* branch            refs/pull/133/head -> FETCH_HEAD
Auto-merging kaffe/tensorflow/network.py
CONFLICT (content): Merge conflict in kaffe/tensorflow/network.py
Auto-merging .gitignore
CONFLICT (content): Merge conflict in .gitignore
Automatic merge failed; fix conflicts and then commit the result.

As you can see, there are conflicts in the .gitignore and kaffe/tensorflow/network.py files. We will show you how to resolve these conflicts, as follows.

First, we will resolve the conflict at line 137 of kaffe/tensorflow/network.py by removing the HEAD part, from line 137 to line 140, and keeping the incoming change.

Next, we will resolve the conflict at line 185 in the same way, removing the HEAD part from line 185 to line 187.

After resolving the conflicts (including the one in .gitignore), commit the result to complete the merge.
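In general, a Git conflict region looks like the following (an illustrative sketch, not the actual file contents). Removing the HEAD part means deleting everything from the <<<<<<< HEAD marker through the ======= separator, plus the closing >>>>>>> marker, so that only the incoming pull-request version remains:

```
<<<<<<< HEAD
... current branch version (remove this part) ...
=======
... incoming pull-request version (keep this part) ...
>>>>>>> refs/pull/133/head
```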

In the caffe-to-tensorflow directory, there is a directory named examples that contains the code and data for the MNIST and ImageNet challenge. We will show you how to work with the MNIST model. The ImageNet challenge is not much different.

First, we will convert the MNIST architecture from Caffe to TensorFlow using the following command:

    ubuntu@ubuntu-PC:~/github$ python ./convert.py examples/mnist/lenet.prototxt --code-output-path=./mynet.py

The result will look like this:

    ------------------------------------------------------------
        WARNING: PyCaffe not found!
        Falling back to a pure protocol buffer implementation.
        * Conversions will be drastically slower.
        * This backend is UNTESTED!
    ------------------------------------------------------------
    
    Type                 Name                                          Param               Output
    ----------------------------------------------------------------------------------------------
    Input                data                                             --      (64, 1, 28, 28)
    Convolution          conv1                                            --     (64, 20, 24, 24)
    Pooling              pool1                                            --     (64, 20, 12, 12)
    Convolution          conv2                                            --       (64, 50, 8, 8)
    Pooling              pool2                                            --       (64, 50, 4, 4)
    InnerProduct         ip1                                              --      (64, 500, 1, 1)
    InnerProduct         ip2                                              --       (64, 10, 1, 1)
    Softmax              prob                                             --       (64, 10, 1, 1)
    Converting data...
    Saving source...
    Done.
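The output shapes in this table follow standard "valid" (no padding) convolution and pooling arithmetic. As a quick sanity check, the spatial sizes can be reproduced with a few lines of Python (layer names and kernel sizes are taken from the LeNet prototxt):

```python
def conv_out(size, kernel, stride=1):
    # output size of a 'valid' (no-padding) convolution, as in Caffe's LeNet
    return (size - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    # output size of the 2x2, stride-2 max pooling LeNet uses
    return (size - kernel) // stride + 1

side = 28                  # MNIST input is 28x28
side = conv_out(side, 5)   # conv1: 5x5 kernel -> 24
side = pool_out(side)      # pool1 -> 12
side = conv_out(side, 5)   # conv2 -> 8
side = pool_out(side)      # pool2 -> 4
print(side)                # 4; ip1 therefore sees 50 * 4 * 4 = 800 inputs
```

This matches the table: conv1 maps 28x28 down to 24x24, and after pool2 each of the 50 feature maps is 4x4, which is why ip1 has 800 inputs.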

Then, we will convert the MNIST pre-trained Caffe model at examples/mnist/lenet_iter_10000.caffemodel using the following command:

    ubuntu@ubuntu-PC:~/github$ python ./convert.py examples/mnist/lenet.prototxt --caffemodel examples/mnist/lenet_iter_10000.caffemodel --data-output-path=./mynet.npy

The result will look like this:

    ------------------------------------------------------------
        WARNING: PyCaffe not found!
        Falling back to a pure protocol buffer implementation.
        * Conversions will be drastically slower.
        * This backend is UNTESTED!
    ------------------------------------------------------------
    
    Type                 Name                                          Param               Output
    ----------------------------------------------------------------------------------------------
    Input                data                                             --      (64, 1, 28, 28)
    Convolution          conv1                                 (20, 1, 5, 5)     (64, 20, 24, 24)
    Pooling              pool1                                            --     (64, 20, 12, 12)
    Convolution          conv2                                (50, 20, 5, 5)       (64, 50, 8, 8)
    Pooling              pool2                                            --       (64, 50, 4, 4)
    InnerProduct         ip1                                      (500, 800)      (64, 500, 1, 1)
    InnerProduct         ip2                                       (10, 500)       (64, 10, 1, 1)
    Softmax              prob                                             --       (64, 10, 1, 1)
    Converting data...
    Saving data...
    Done.
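The Param column shows the weight shapes in Caffe's (out_channels, in_channels, height, width) layout. TensorFlow orders convolution weights as (height, width, in_channels, out_channels), so this reordering is one of the transformations a Caffe-to-TensorFlow conversion must perform. A minimal numpy sketch, using conv1's (20, 1, 5, 5) blob from the table as an example:

```python
import numpy as np

# Caffe stores convolution weights as (out_channels, in_channels, height, width);
# TensorFlow expects (height, width, in_channels, out_channels).
caffe_conv1 = np.zeros((20, 1, 5, 5), dtype=np.float32)  # conv1 from the table
tf_conv1 = caffe_conv1.transpose((2, 3, 1, 0))
print(tf_conv1.shape)  # (5, 5, 1, 20)
```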

As you can see, these commands create a Python file named mynet.py and a numpy file named mynet.npy in the current directory. We also need to add the current directory to the PYTHONPATH so that the following code can import mynet.py:

ubuntu@ubuntu-PC:~/github$ export PYTHONPATH=$PYTHONPATH:.
ubuntu@ubuntu-PC:~/github$ python examples/mnist/finetune_mnist.py
....
('Iteration: ', 900, 0.0087626642, 1.0)
('Iteration: ', 910, 0.018495116, 1.0)
('Iteration: ', 920, 0.0029206357, 1.0)
('Iteration: ', 930, 0.0010091728, 1.0)
('Iteration: ', 940, 0.071255416, 1.0)
('Iteration: ', 950, 0.045163739, 1.0)
('Iteration: ', 960, 0.005758767, 1.0)
('Iteration: ', 970, 0.012100354, 1.0)
('Iteration: ', 980, 0.12018739, 1.0)
('Iteration: ', 990, 0.079262167, 1.0)

The last two numbers in each line are the loss and accuracy of the fine-tuning process. You can see that fine-tuning easily achieves 100% accuracy with the pre-trained weights from the Caffe model.

Now, we will take a look at the finetune_mnist.py file to see how the pre-trained weights are used.

First, they import the mynet module with the following code:

    from mynet import LeNet as MyNet  

Then, they create some placeholders for images and labels and compute the loss using the ip2 layer, as follows:

 images = tf.placeholder(tf.float32, [None, 28, 28, 1]) 
 labels = tf.placeholder(tf.float32, [None, 10]) 
 net = MyNet({'data': images}) 
 
 ip2 = net.layers['ip2'] 
 pred = net.layers['prob'] 
 
 loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=ip2, labels=labels), 0)

Finally, they load the numpy file into the graph, using the load method of the network class:

 with tf.Session() as sess:
     # Load the data
     sess.run(tf.global_variables_initializer())
     net.load('mynet.npy', sess)
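For intuition, the per-example values computed by tf.nn.softmax_cross_entropy_with_logits can be reproduced in plain numpy (a stand-alone sketch with made-up logits, independent of the model above):

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    # numerically stable log-softmax followed by cross-entropy,
    # mirroring the per-example values of tf.nn.softmax_cross_entropy_with_logits
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -(labels * log_probs).sum(axis=1)

logits = np.array([[2.0, 1.0, 0.1]])
labels = np.array([[1.0, 0.0, 0.0]])  # one-hot true class
print(softmax_cross_entropy(logits, labels))  # about 0.417
```

Averaging these per-example values, as tf.reduce_mean does above, gives the scalar loss that the fine-tuning loop minimizes.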

After that, the fine-tune process is independent from the Caffe framework.
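This independence is possible because everything that carries over from Caffe is the weight dictionary in mynet.npy, a pickled numpy object mapping layer names to parameter arrays. A minimal sketch of inspecting such a file, using a stand-in dictionary since the exact keys can vary between converter versions:

```python
import numpy as np

# Stand-in for a converted model: a dict of {layer: {param_name: array}},
# the general shape of caffe-tensorflow's .npy output (keys are illustrative).
weights = {
    'conv1': {'weights': np.zeros((5, 5, 1, 20), np.float32),
              'biases': np.zeros((20,), np.float32)},
    'ip2': {'weights': np.zeros((500, 10), np.float32),
            'biases': np.zeros((10,), np.float32)},
}
np.save('demo_net.npy', weights)

# .npy files holding Python objects must be loaded with allow_pickle=True
loaded = np.load('demo_net.npy', allow_pickle=True).item()
for layer in sorted(loaded):
    for name in sorted(loaded[layer]):
        print(layer, name, loaded[layer][name].shape)
```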
