
OCR: clarification about input and output #20

@mrgloom

Description


I'm trying to solve an OCR task based on this code.

What shape should the input to the LSTM have? Suppose we have images of shape [batch_size, height, width, channels]; how should they be reshaped to be used as input? Like [batch_size, width, height*channels], so that width acts as the time dimension?
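For concreteness, here is a minimal sketch of the reshape I have in mind (NumPy, with made-up sizes):

```python
import numpy as np

# Made-up sizes: 8 grayscale images of height 32 and width 100.
batch_size, height, width, channels = 8, 32, 100, 1
images = np.zeros((batch_size, height, width, channels), dtype=np.float32)

# [batch, H, W, C] -> [batch, W, H, C] -> [batch, W, H*C]:
# width becomes the time axis, height*channels the per-step feature vector.
sequences = images.transpose(0, 2, 1, 3).reshape(batch_size, width, height * channels)
print(sequences.shape)  # (8, 100, 32)
```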

What if I want to have variable width? As I understand it, all sequences in a batch must have the same length (is the common trick just to pad each sequence with zeros at the end, or should batch_size be 1?).
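A minimal sketch of the zero-padding I mean (NumPy; the pad_batch helper and all sizes are made up):

```python
import numpy as np

def pad_batch(sequences):
    # Zero-pad a list of [width_i, num_features] arrays to the max width.
    # Returns the padded batch [batch, max_width, num_features] plus the
    # original lengths, which would be fed as sequence_length to the LSTM/CTC.
    num_features = sequences[0].shape[1]
    lengths = np.array([s.shape[0] for s in sequences], dtype=np.int32)
    padded = np.zeros((len(sequences), lengths.max(), num_features), dtype=np.float32)
    for i, s in enumerate(sequences):
        padded[i, :s.shape[0], :] = s
    return padded, lengths

# Example: three "images" of widths 50, 80, 100 with 32 features each.
batch, lengths = pad_batch([np.ones((w, 32), np.float32) for w in (50, 80, 100)])
print(batch.shape, lengths)  # (3, 100, 32) [ 50  80 100]
```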

What if I want to have variable width and height? As I understand it, I would need convolutional + global average pooling / spatial pyramid pooling layers before the LSTM, so the output blob would be [batch_size, feature_map_height, feature_map_width, feature_map_channels]. How should that blob be reshaped to be used as input to the LSTM? Like [batch_size, feature_map_width, feature_map_height*feature_map_channels]? Or can we reshape it into a single row like [batch_size, feature_map_width*feature_map_height*feature_map_channels], so it becomes a sequence of pixels? We lose some spatial information that way; will it still work?
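To illustrate the two reshapes I'm asking about (NumPy, with a hypothetical feature-map size):

```python
import numpy as np

# Hypothetical conv feature map: [batch, fh, fw, fc].
batch_size, fh, fw, fc = 8, 4, 25, 64
fmap = np.zeros((batch_size, fh, fw, fc), dtype=np.float32)

# Option A: keep width as time, flatten height*channels into the features.
seq_a = fmap.transpose(0, 2, 1, 3).reshape(batch_size, fw, fh * fc)
print(seq_a.shape)  # (8, 25, 256)

# Option B: flatten everything into one long "sequence of pixels"
# (one scalar per time step); the 2-D layout within each column is lost.
seq_b = fmap.reshape(batch_size, fh * fw * fc, 1)
print(seq_b.shape)  # (8, 6400, 1)
```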

Here is the definition of the input, but I'm not sure what [batch_size, max_stepsize, num_features] means in your case:
https://github.com/igormq/ctc_tensorflow_example/blob/master/ctc_tensorflow_example.py#L90
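My current reading of that line is something like the following (a sketch, not copied from the repo; num_features would be height*channels of the input image):

```python
import tensorflow as tf

num_features = 32  # hypothetical: image height * channels

# batch_size and max_stepsize are both dynamic (None); only the
# per-timestep feature size is fixed.
inputs = tf.placeholder(tf.float32, [None, None, num_features])
seq_len = tf.placeholder(tf.int32, [None])  # true (unpadded) length of each example
```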

And how does the output of the LSTM depend on the input size and the max sequence length?
https://github.com/igormq/ctc_tensorflow_example/blob/master/ctc_tensorflow_example.py#L110
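Here is how I currently picture the output shapes (a TF 1.x sketch with made-up sizes, not code from the repo):

```python
import tensorflow as tf

num_features, num_hidden, num_classes = 32, 128, 28  # hypothetical sizes

inputs = tf.placeholder(tf.float32, [None, None, num_features])
seq_len = tf.placeholder(tf.int32, [None])

cell = tf.nn.rnn_cell.LSTMCell(num_hidden)
# outputs: [batch_size, max_stepsize, num_hidden] -- the output keeps the
# (padded) time dimension of the input; seq_len only masks the padding.
outputs, _ = tf.nn.dynamic_rnn(cell, inputs, sequence_length=seq_len, dtype=tf.float32)

# Per-timestep projection to num_classes, as a CTC loss would expect:
logits = tf.layers.dense(outputs, num_classes)  # [batch, max_stepsize, num_classes]
```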

BTW: here are some examples using 'standard' approaches in Keras+TensorFlow, which I want to complement with RNN examples.
https://github.com/mrgloom/Char-sequence-recognition
