# Design Doc: Block and Scope

## The Representation of Computation

Both deep learning systems and programming languages help users describe computation procedures. These systems use various representations of computation:

- Caffe, Torch, and Paddle: sequences of layers.
- TensorFlow, Caffe2, MXNet: graphs of operators.
- PaddlePaddle: nested blocks, like C++ and Java programs.

## Block in Programming Languages and Deep Learning

In programming languages, a block is a pair of curly braces that encloses local variable definitions and a sequence of instructions or operators.

Blocks work with control flow structures like `if`, `else`, and `for`, which have equivalents in deep learning:

| programming languages | PaddlePaddle         |
|-----------------------|----------------------|
| for, while loop       | RNN, WhileOp         |
| if, if-else, switch   | IfElseOp, SwitchOp   |
| sequential execution  | a sequence of layers |

A key difference is that a C++ program describes a one-pass computation, whereas a deep learning program describes both the forward and backward passes.
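
To make this concrete, the following minimal C++ sketch (illustrative only, not PaddlePaddle code) shows a single multiply operator described in both passes: the forward pass computes the output, and the backward pass computes the gradients that training needs.

```c++
#include <cstdio>

// Forward pass:  y = w * x.
// Backward pass: given dL/dy, compute dL/dw and dL/dx.
struct MulOp {
  static float Forward(float w, float x) { return w * x; }
  static void Backward(float w, float x, float dy, float* dw, float* dx) {
    *dw = dy * x;  // dL/dw = dL/dy * x
    *dx = dy * w;  // dL/dx = dL/dy * w
  }
};

int main() {
  float dw, dx;
  float y = MulOp::Forward(3.0f, 2.0f);         // forward: y = 6
  MulOp::Backward(3.0f, 2.0f, 1.0f, &dw, &dx);  // backward: dw = 2, dx = 3
  std::printf("y=%.1f dw=%.1f dx=%.1f\n", y, dw, dx);
  return 0;
}
```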

## Stack Frames and the Scope Hierarchy

The existence of the backward pass makes the execution of blocks different between traditional programs and PaddlePaddle:

| programming languages   | PaddlePaddle                         |
|-------------------------|--------------------------------------|
| stack                   | scope hierarchy                      |
| stack frame             | scope                                |
| push at entering block  | push at entering block               |
| pop at leaving block    | destroy when the minibatch completes |

1. In traditional programs:

   - When the execution enters the left curly brace of a block, the runtime pushes a frame onto the stack, where it realizes local variables.
   - After the execution leaves the right curly brace, the runtime pops the frame.
   - The maximum number of frames on the stack is the maximum depth of nested blocks.

1. In PaddlePaddle (see the sketch after this list):

   - When the execution enters a block, PaddlePaddle adds a new scope, where it realizes variables.
   - PaddlePaddle doesn't pop a scope after the execution of the block, because the variables therein are needed by the backward pass. So it has a stack forest known as a *scope hierarchy*.
   - The height of the highest tree is the maximum depth of nested blocks.
   - After processing a minibatch, PaddlePaddle destroys the scope hierarchy.
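
The following is a minimal C++ sketch of such a scope, assuming a simple design where each scope owns its local variables and keeps a pointer to its parent; the names `Scope`, `NewVar`, and `FindVar` follow this doc's conventions, but the code is a sketch, not PaddlePaddle's actual implementation.

```c++
#include <map>
#include <memory>
#include <string>

struct Variable {};  // placeholder for a tensor-holding variable

// A scope is like a stack frame that is kept alive instead of popped:
// it owns its local variables and falls back to the enclosing scope on lookup.
class Scope {
 public:
  explicit Scope(const Scope* parent = nullptr) : parent_(parent) {}

  // Create (realize) a variable local to this scope.
  Variable* NewVar(const std::string& name) {
    auto& slot = vars_[name];
    if (!slot) slot.reset(new Variable);
    return slot.get();
  }

  // Find a variable here or, recursively, in the enclosing scopes.
  const Variable* FindVar(const std::string& name) const {
    auto it = vars_.find(name);
    if (it != vars_.end()) return it->second.get();
    return parent_ ? parent_->FindVar(name) : nullptr;
  }

 private:
  const Scope* parent_;  // nullptr for the root scope of a minibatch
  std::map<std::string, std::unique_ptr<Variable>> vars_;
};
```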

## Use Blocks in C++ and PaddlePaddle Programs

Let us consolidate the discussion by presenting some examples.

### Blocks with `if-else` and `IfElseOp`

The following C++ program shows how blocks are used with the `if-else` structure:

```c++
int x = 10;
int y = 20;
int out;
bool cond = false;
if (cond) {
  int z = x + y;
  out = softmax(z);
} else {
  int z = fc(x);
  out = z;
}
```

An equivalent PaddlePaddle program from the design doc of the [IfElseOp operator](./if_else_op.md) is as follows:

```python
import paddle as pd

x = var(10)
y = var(20)
cond = var(false)
ie = pd.create_ifelseop(inputs=[x], output_num=1)
with ie.true_block():
    x = ie.inputs(true, 0)
    z = operator.add(x, y)
    ie.set_output(true, 0, operator.softmax(z))
with ie.false_block():
    x = ie.inputs(false, 0)
    z = layer.fc(x)
    ie.set_output(false, 0, z)
out = ie(cond)
```

In both examples, the true branch computes `softmax(x + y)` and the false branch computes `fc(x)`.

A difference is that variables in the C++ program contain scalar values, whereas those in the PaddlePaddle program are mini-batches of instances. The `ie.inputs(true, 0)` invocation returns the instances in the 0-th input, `x`, that correspond to true values in `cond`, as the local variable `x`, whereas `ie.inputs(false, 0)` returns the instances corresponding to false values.
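
To illustrate this partitioning, here is a small self-contained C++ sketch of the gather step described above; `Gather` is a hypothetical helper, not PaddlePaddle's implementation.

```c++
#include <cstdio>
#include <vector>

// Select the instances of a minibatch whose cond entry equals `take`.
// This mimics what ie.inputs(true, 0) / ie.inputs(false, 0) are described
// to do for the true and false branches, respectively.
std::vector<int> Gather(const std::vector<int>& batch,
                        const std::vector<bool>& cond, bool take) {
  std::vector<int> out;
  for (size_t i = 0; i < batch.size(); ++i) {
    if (cond[i] == take) out.push_back(batch[i]);
  }
  return out;
}

int main() {
  std::vector<int> x = {10, 20, 30, 40};
  std::vector<bool> cond = {true, false, true, false};
  for (int v : Gather(x, cond, true)) std::printf("%d ", v);   // prints: 10 30
  std::printf("\n");
  for (int v : Gather(x, cond, false)) std::printf("%d ", v);  // prints: 20 40
  std::printf("\n");
  return 0;
}
```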

### Blocks with `for` and `RNNOp`

The following RNN model from the [RNN design doc](./rnn.md)

```python
x = sequence([10, 20, 30])
m = var(0)
W = tensor()
U = tensor()

rnn = create_rnn(inputs=[x])
with rnn.stepnet() as net:
    x = net.set_inputs(0)
    h = net.add_memory(init=m)
    fc_out = pd.matmul(W, x)
    hidden_out = pd.matmul(U, h.pre(n=1))
    sum = pd.add_two(fc_out, hidden_out)
    act = pd.sigmoid(sum)
    h.update(act)  # update memory with act
    net.set_outputs(0, act, hidden_out)  # two outputs

o1, o2 = rnn()
print o1, o2
```

has its equivalent C++ program as follows:

```c++
int x[] = {10, 20, 30};
int m = 0;
int W = some_value();
int U = some_other_value();

int mem[sizeof(x) / sizeof(x[0]) + 1];
int o1[sizeof(x) / sizeof(x[0]) + 1];
int o2[sizeof(x) / sizeof(x[0]) + 1];
for (int i = 1; i <= sizeof(x) / sizeof(x[0]); ++i) {
  int x_i = x[i - 1];  // the i-th step's input
  if (i == 1) mem[0] = m;
  int fc_out = W * x_i;
  int hidden_out = U * mem[i - 1];
  int sum = fc_out + hidden_out;
  int act = sigmoid(sum);
  mem[i] = act;  // update memory with act
  o1[i] = act;
  o2[i] = hidden_out;
}

print_array(o1);
print_array(o2);
```

## Compilation and Execution

Like TensorFlow programs, a PaddlePaddle program is written in Python. The first part describes a neural network as a protobuf message, and the second part executes this message for training or inference.

The generation of this protobuf message is like a compiler generating a binary executable file. The execution of the message is like the OS executing the binary file.

## The "Binary Executable File Format"

The definition of the protobuf message is as follows:

```protobuf
message BlockDesc {
  repeated VarDesc vars = 1;
  repeated OpDesc ops = 2;
}
```

The step net in the above RNN example would look like:

```
BlockDesc {
  vars = {
    VarDesc {...} // x
    VarDesc {...} // h
    VarDesc {...} // fc_out
    VarDesc {...} // hidden_out
    VarDesc {...} // sum
    VarDesc {...} // act
  }
  ops = {
    OpDesc {...} // matmul
    OpDesc {...} // add_two
    OpDesc {...} // sigmoid
  }
};
```

Also, the RNN operator in the above example is serialized into a protobuf message of type `OpDesc` and would look like:

```
OpDesc {
  inputs = {0} // the index of x
  outputs = {5, 3} // indices of act and hidden_out
  attrs {
    "memories" : {1} // the index of h
    "step_net" : <above step net>
  }
};
```

This `OpDesc` value is in the `ops` field of the `BlockDesc` value representing the global block.


## The Compilation of Blocks

During the generation of the protobuf message, the Block should store VarDesc (the protobuf message that describes a variable) and OpDesc (the protobuf message that describes an operator).

Each block has its own name scope, so that local variables do not affect the parent block's name scope. A child block's name scope inherits from its parent's, so that an OpDesc in a child block can reference a VarDesc stored in a parent block. For example:

```python
a = pd.Variable(shape=[20, 20])
b = pd.fc(a, params=["fc.w", "fc.b"])

rnn = pd.create_rnn()
with rnn.stepnet() as net:
    x = net.set_inputs(a)
    # reuse fc's parameter
    fc_without_b = pd.get_variable("fc.w")
    net.set_outputs(fc_without_b)

out = rnn()
```

The method `pd.get_variable` retrieves a Variable by name. A Variable may be stored in a parent block but retrieved in a child block, so blocks need a variable scope that supports inheritance.

In compiler design, the symbol table is a data structure created and maintained by compilers to store information about the occurrence of various entities such as variable names, function names, classes, etc.

To store the definitions of variables and operators, we define a C++ class `SymbolTable`, like the one used in compilers.

`SymbolTable` can do the following:

- store the definitions (names and attributes) of variables and operators,
- verify whether a variable was declared,
- make it possible to implement type checking (by offering protobuf message pointers to `InferShape` handlers).


```c++
// The information in a SymbolTable is enough to trace the dependency graph,
// so maybe it is enough for the Eval() interface to take a SymbolTable.
class SymbolTable {
 public:
  explicit SymbolTable(SymbolTable* parent) : parent_(parent) {}

  OpDesc* NewOp(const string& name = "");

  // TODO: determine whether the name is generated by Python or C++.
  // Currently we assume that a unique name is generated by C++ if the
  // name argument is left default.
  VarDesc* NewVar(const string& name = "");

  // Find a VarDesc by name; if recursive is true, search the parent's
  // SymbolTable recursively.
  // This interface is introduced to support InferShape: find the protobuf
  // messages of variables and operators, and pass pointers into InferShape.
  //
  // NOTE: maybe some C++ classes such as VarDescBuilder and OpDescBuilder
  // should be proposed and embedded into pybind so that Python can operate
  // on C++ pointers.
  VarDesc* FindVar(const string& name, bool recursive = true);

  OpDesc* FindOp(const string& name);

  BlockDesc Compile() const;

 private:
  SymbolTable* parent_;

  map<string, OpDesc> ops_;
  map<string, VarDesc> vars_;
};
```
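
As a usage note, the intended lookup behavior might look like the following hypothetical snippet, which assumes the methods above are implemented; it mirrors the `pd.get_variable` example, with a child table for an RNN step net resolving a name declared in the parent.

```c++
// Hypothetical usage of the SymbolTable sketched above.
SymbolTable global(nullptr);
VarDesc* w = global.NewVar("fc.w");     // declared in the global block
SymbolTable step(&global);              // child table for the step net
VarDesc* found = step.FindVar("fc.w");  // resolved via parent_: found == w
BlockDesc step_block = step.Compile();  // serialize the step net's block
```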

After all the descriptions of variables and operators are added into the SymbolTable, the block has enough information to run.

The `Block` class takes a `BlockDesc` as input and provides `Run` and `InferShape` functions.

```c++
class Block : public OperatorBase {
 public:
  explicit Block(const BlockDesc& desc) : desc_(desc) {}

  void InferShape(const framework::Scope& scope) const override {
    if (!symbols_ready_) {
      CreateVariables(scope);
      CreateOperators();
      symbols_ready_ = true;
    }
    // should run InferShape first.
    for (auto& op : runtime_table_.ops()) {
      op->InferShape(scope);
    }
  }

  void Run(const framework::Scope& scope,
           const platform::DeviceContext& dev_ctx) const override {
    PADDLE_ENFORCE(symbols_ready_,
                   "operators and variables should be created first.");
    for (auto& op : runtime_table_.ops()) {
      op->Run(scope, dev_ctx);
    }
  }

  void CreateVariables(const framework::Scope& scope) const;
  void CreateOperators() const;

  // some other necessary interfaces of NetOp are listed below
  // ...

 private:
  BlockDesc desc_;
  mutable bool symbols_ready_{false};
  // runtime_table_ caches the operators created from desc_; its exact type
  // is omitted in this design sketch.
};
```
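
A hypothetical call sequence for this class, assuming a `BlockDesc desc`, a `Scope scope`, and a `DeviceContext dev_ctx` are already available, might be:

```c++
// Sketch only: construct the block, create its variables and operators via
// the first InferShape call, then execute the operators sequentially.
Block block(desc);
block.InferShape(scope);    // first call also creates variables and operators
block.Run(scope, dev_ctx);  // requires symbols_ready_, i.e., InferShape ran
```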

## The Execution of Blocks

Block inherits from OperatorBase, which has a Run method.
Block's Run method runs its operators sequentially.

There is another important interface called `Eval`, which takes some arguments called targets and generates a minimal graph that treats the targets as end points. `Eval` creates a new Block from this minimal graph, runs it, and then fetches and returns the latest values of the targets.

The definition of Eval is as follows:

```c++
// Clean a block description by targets using the corresponding dependency graph.
// Return a new BlockDesc with a minimal number of operators.
// NOTE: we return the block's description rather than a Block so that it can be
// distributed to a cluster.
BlockDesc Prune(const BlockDesc& desc, vector<string> targets);

void Block::Eval(const vector<string>& targets,
                 const framework::Scope& scope,
                 const platform::DeviceContext& dev_ctx) {
  BlockDesc min_desc = Prune(desc_, targets);
  Block min_block(min_desc);
  min_block.Run(scope, dev_ctx);
}
```
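
The doc leaves `Prune` undeclared beyond its signature; one plausible way to implement the dependency-based pruning it describes is a backward sweep over the operators, sketched below with simplified stand-ins for the protobuf messages (illustrative only, not the actual implementation):

```c++
#include <set>
#include <string>
#include <vector>

// Simplified stand-ins for the protobuf messages, for illustration only.
struct OpDesc {
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};
struct BlockDesc {
  std::vector<OpDesc> ops;
};

// Scan the ops in reverse: keep an op if it produces a needed name, and add
// its inputs to the needed set, so transitive producers are kept as well.
BlockDesc Prune(const BlockDesc& desc, const std::vector<std::string>& targets) {
  std::set<std::string> needed(targets.begin(), targets.end());
  std::vector<OpDesc> kept;
  for (auto it = desc.ops.rbegin(); it != desc.ops.rend(); ++it) {
    bool produces_needed = false;
    for (const auto& out : it->outputs) {
      if (needed.count(out)) produces_needed = true;
    }
    if (!produces_needed) continue;
    needed.insert(it->inputs.begin(), it->inputs.end());
    kept.push_back(*it);
  }
  BlockDesc result;
  result.ops.assign(kept.rbegin(), kept.rend());  // restore forward order
  return result;
}
```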