
I don't understand why execution of a model with the same layers and weights would be different between PyTorch and TensorFlow.

Is it a problem of accumulation of floating-point errors in operations that are done in a different order and with different kinds of arithmetic optimisations (so that they would be identical if they used un-optimised symbolic operations), or is there something else in the implementation of a neural network that I'm missing?
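
For concreteness, here's a minimal numpy sketch of the kind of order-dependent accumulation I mean (the data and sizes are made up):

    import numpy as np

    # Floating-point addition is not associative, so summing the same
    # values in a different order typically gives a (slightly) different
    # result. Frameworks that reduce or fuse ops in different orders
    # accumulate different rounding errors the same way.
    rng = np.random.default_rng(0)
    x = rng.standard_normal(1_000_000).astype(np.float32)

    print(np.sum(x))        # one reduction order
    print(np.sum(x[::-1]))  # same values, reversed; last bits usually differ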



In principle you can directly replace function calls with their equivalents between frameworks; this works fine for common layers. I've done this for models that were trained in PyTorch that we needed to run on an EdgeTPU. Rewriting in Keras and writing a weight loader was much easier than PyTorch > ONNX > TF > TFLite.
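
The weight loader part is mostly bookkeeping. A minimal sketch for a single conv layer (the layer sizes here are made up; the real work is getting the axis orders right):

    import torch
    import tensorflow as tf

    # Hypothetical pair of equivalent layers.
    pt_conv = torch.nn.Conv2d(3, 16, kernel_size=3, padding=1)
    tf_conv = tf.keras.layers.Conv2D(16, 3, padding="same")
    tf_conv.build((None, 32, 32, 3))  # create the Keras variables

    # PyTorch stores conv kernels as (out, in, kH, kW); Keras expects
    # (kH, kW, in, out), so transpose before copying.
    w = pt_conv.weight.detach().numpy().transpose(2, 3, 1, 0)
    b = pt_conv.bias.detach().numpy()
    tf_conv.set_weights([w, b])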

Arithmetic differences do happen between equivalent ops, but I've not found that to be a significant issue. I was converting a UNet and the difference in outputs for a random input was at most O(1e-4), which was fine for what we were doing. It's more tedious than anything else. Occasionally you'll run into something that seems like it should be a find+replace, but it doesn't work because some operation doesn't exist, or doesn't work quite the same way.
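
Continuing the sketch above, checking parity is just pushing the same random input through both sides and looking at the max difference:

    import numpy as np

    x = np.random.randn(1, 3, 32, 32).astype(np.float32)  # NCHW
    y_pt = pt_conv(torch.from_numpy(x)).detach().numpy()
    # Keras wants NHWC, so transpose on the way in and back out.
    y_tf = tf_conv(tf.constant(x.transpose(0, 2, 3, 1))).numpy().transpose(0, 3, 1, 2)
    print(np.abs(y_pt - y_tf).max())  # small for a single layer; grows with depth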


It's just that expressing those "layers and weights" in code is different in TensorFlow and PyTorch. I think a good parallel would be expressing some algorithm in two programming languages. The algorithm might be identical, but JS uses `list.map(f)` and Python uses `map(f, list)`, JS doesn't have priority queues in the "standard lib" while Python does, etc. Similarly, the low-level ops and higher-level abstractions are (slightly) different in PyTorch and TensorFlow.

I'm not too familiar with TensorFlow, so I can't give an example there, but a similar issue I recently faced when converting a model from PyTorch to ONNX is that PyTorch has a built-in discrete Fourier transform (DFT) operation, while ONNX doesn't (yet; they're adding it). So I had to express the DFT in terms of other ONNX primitives, which took time.
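
The standard trick is to materialise the DFT matrix and lower the transform to plain MatMuls. Roughly, as a sketch for a real-valued 1-D signal (sizes made up):

    import numpy as np

    # X[k] = sum_n x[n] * exp(-2j*pi*k*n/N). Splitting the DFT matrix
    # into real and imaginary parts keeps everything in real-valued
    # ops (MatMul) that ONNX already has.
    def dft_matrices(n):
        k = np.arange(n)
        w = np.exp(-2j * np.pi * np.outer(k, k) / n)
        return w.real.astype(np.float32), w.imag.astype(np.float32)

    w_re, w_im = dft_matrices(8)
    x = np.random.randn(8).astype(np.float32)

    y_re, y_im = w_re @ x, w_im @ x  # two MatMuls instead of one FFT op

    ref = np.fft.fft(x)  # sanity check against numpy
    assert np.allclose(y_re, ref.real, atol=1e-4)
    assert np.allclose(y_im, ref.imag, atol=1e-4)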


In principle all operations can be translated between frameworks, even if some ops aren't implemented in one or the other. This, however, depends on whether the translation software supports graph rewriting for such nodes.

Lambdas and other custom code are also problematic, as their code isn't necessarily stored within the graph.
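
For example, the body of a Keras Lambda layer lives only in the process that defined it (a sketch; the activation is an arbitrary example):

    import tensorflow as tf

    inp = tf.keras.Input(shape=(4,))
    # The lambda body is arbitrary Python; it isn't stored in the graph
    # as framework ops, so a converter can't mechanically rewrite it
    # into another framework without special-casing it.
    out = tf.keras.layers.Lambda(lambda x: x * tf.math.sigmoid(x))(inp)
    model = tf.keras.Model(inp, out)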



