
Commit 9478203

Update README.md
1 parent 1f4eec5 commit 9478203


README.md

Lines changed: 16 additions & 4 deletions
@@ -7,25 +7,37 @@ The task of image captioning is the task of generating a sequence of words to des
It is done through the well-known sequence-to-sequence architecture.
An encoder (usually a pretrained CNN model) encodes the image, then an RNN decoder outputs a word at each of its steps.

I found this Google documentation quite useful when coding both variants:

[Google doc link](https://www.tensorflow.org/tutorials/text/image_captioning)
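
To make the encoder/decoder split concrete, below is a minimal sketch of such an architecture in TensorFlow. It is illustrative only: the sizes, the frozen backbone, and the `greedy_caption` helper with its `start_id`/`end_id` tokens are assumptions for this sketch, not this repo's exact code.

```python
import tensorflow as tf

# Illustrative sizes only; the real values live in the notebooks'
# configuration cell (LSTM_size, encoding_size, ...).
VOCAB_SIZE, ENCODING_SIZE, LSTM_SIZE = 5000, 256, 512

# Encoder: a frozen pretrained CNN backbone plus a trainable projection.
# InceptionV3 with pooling="avg" maps a preprocessed (299, 299, 3) image
# to a single 2048-d feature vector.
backbone = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet", pooling="avg")
backbone.trainable = False
encoder = tf.keras.Sequential(
    [backbone, tf.keras.layers.Dense(LSTM_SIZE, activation="relu")])

# Decoder: an embedding plus an LSTM cell that emits one word per step.
embedding = tf.keras.layers.Embedding(VOCAB_SIZE, ENCODING_SIZE)
lstm_cell = tf.keras.layers.LSTMCell(LSTM_SIZE)
output_layer = tf.keras.layers.Dense(VOCAB_SIZE)

def greedy_caption(image, start_id, end_id, max_len=30):
    """Decode one word at a time, feeding each prediction back in."""
    # Seed the LSTM hidden state with the image encoding.
    state = [encoder(image[tf.newaxis]), tf.zeros((1, LSTM_SIZE))]
    token, words = tf.constant([start_id]), []
    for _ in range(max_len):
        out, state = lstm_cell(embedding(token), state)
        token = tf.argmax(output_layer(out), axis=-1)
        if int(token[0]) == end_id:
            break
        words.append(int(token[0]))
    return words
```
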
# Usage
You can either run image_captioning_train.ipynb to train the whole thing from scratch, with the option of changing the hyperparameters, or image_captioing.ipynb to caption any image by giving its path.
In case you use the latter, make sure you have 'Encoder.hdf5', 'Decoder.hdf5' and 'tokenizer.pickle' downloaded.
The encoder and decoder can be found in Releases.
For the model with attention, get the files tagged v2.0; otherwise, get the ones tagged v1.0.
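
If it helps, loading those three files typically looks like the following sketch. It assumes the .hdf5 files were saved as full Keras models and sit in the working directory; the attention variant may additionally need `custom_objects` for any custom layers.

```python
import pickle
import tensorflow as tf

# Hypothetical loading code; adjust paths to wherever you downloaded
# the release files (v1.0 or v2.0).
encoder = tf.keras.models.load_model("Encoder.hdf5")
decoder = tf.keras.models.load_model("Decoder.hdf5")

with open("tokenizer.pickle", "rb") as f:
    tokenizer = pickle.load(f)  # maps words <-> integer ids for the decoder
```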

# Architecture
There are two variants of the model; both use the seq2seq architecture with teacher forcing.
The difference is that one uses an attention mechanism while the other doesn't.
You should go for the variant with attention in almost all use cases, as it is faster, smaller and more accurate.
The other variant can be treated as a tutorial for those of you seeking a basic understanding of how an image captioning model is implemented. This simpler architecture is then built upon by the variant with attention.
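
For readers unfamiliar with teacher forcing: during training the decoder is fed the ground-truth previous word at every step instead of its own prediction, and the loss compares its output against the next word. A rough sketch of such a training step, reusing the hypothetical layers from the first snippet above (again, not the notebooks' exact code):

```python
import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam()

def train_step(images, captions):
    """One teacher-forced step; captions is a padded matrix of token ids."""
    with tf.GradientTape() as tape:
        features = encoder(images)
        state = [features, tf.zeros_like(features)]
        loss = 0.0
        # Feed the true word at step t; the target is the word at t + 1.
        for t in range(captions.shape[1] - 1):
            out, state = lstm_cell(embedding(captions[:, t]), state)
            loss += loss_fn(captions[:, t + 1], output_layer(out))
    variables = (encoder.trainable_variables + embedding.trainable_variables
                 + lstm_cell.trainable_variables + output_layer.trainable_variables)
    optimizer.apply_gradients(zip(tape.gradient(loss, variables), variables))
    return loss
```

A real implementation would also mask the loss on padding tokens so the model isn't trained to predict padding.
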
Encoding is done using inception_v3.
Feel free to change any of the parameters found in the configuration cell, like LSTM_size, encoding_size, or even the number of samples the network trains on.
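
The attention variant differs mainly in the decoder: instead of a single pooled image vector, it attends over inception_v3's spatial feature grid at every step, so each word can "look at" a different image region. The linked tutorial uses Bahdanau-style (additive) attention; a hypothetical sketch of that layer:

```python
import tensorflow as tf

class BahdanauAttention(tf.keras.layers.Layer):
    """Additive attention over the CNN's spatial features (illustrative)."""

    def __init__(self, units):
        super().__init__()
        self.W1 = tf.keras.layers.Dense(units)
        self.W2 = tf.keras.layers.Dense(units)
        self.V = tf.keras.layers.Dense(1)

    def call(self, features, hidden):
        # features: (batch, regions, encoding_size) grid of image regions
        # hidden:   (batch, lstm_size) current decoder state
        score = self.V(tf.nn.tanh(
            self.W1(features) + self.W2(hidden)[:, tf.newaxis, :]))
        weights = tf.nn.softmax(score, axis=1)            # where to look now
        context = tf.reduce_sum(weights * features, axis=1)
        return context, weights
```

At each decoding step the returned context vector is typically concatenated with the current word embedding before the LSTM update.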

# Google Colab links
You can get the notebooks uploaded here via Google Colab through these links:

[Without attention training notebook](https://colab.research.google.com/drive/1VRQO7_r18a5rK68huvwKgNQOEGlzz3lm?usp=sharing)

[Without attention prediction notebook](https://colab.research.google.com/drive/1OQEMKfT5BrTJpOwu__2u_tmNs8ADeQ6v?usp=sharing)

[With attention training notebook](https://colab.research.google.com/drive/1Lz1vUFu2sx2AYWHJLSMwiG0V_ztdqEV2?usp=sharing)

[With attention prediction notebook](https://colab.research.google.com/drive/1KcpU8vEw7Sozu-SBGAgULaNgAFZYDzkh?usp=sharing)

# Example predictions
![Prediction1](https://scontent.fcai1-2.fna.fbcdn.net/v/t1.15752-9/96286228_934101440372416_1710498101753544704_n.png?_nc_cat=100&_nc_sid=b96e70&_nc_ohc=v68QTdYdXxQAX_N0Mrt&_nc_ht=scontent.fcai1-2.fna&oh=d10c6903515d3e2539f903b2b1d4422d&oe=5EDD55AE)
