
I don't understand why execution of a model with the same layers and weights would be different between PyTorch and TensorFlow.

Is it a problem of accumulation of floating-point errors in operations that are done in a different order and with different kinds of arithmetic optimisations (so that they would be identical if they used un-optimised symbolic operations), or is there something else in the implementation of a neural network that I'm missing?
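
For concreteness, here's a minimal numpy sketch of the kind of order-dependent accumulation I mean (the data and sizes are made up):

    import numpy as np

    # Floating-point addition is not associative, so summing the same
    # values in a different order typically gives a (slightly) different
    # result. Frameworks that reduce or fuse ops in different orders
    # accumulate different rounding errors the same way.
    rng = np.random.default_rng(0)
    x = rng.standard_normal(1_000_000).astype(np.float32)

    print(np.sum(x))        # one reduction order
    print(np.sum(x[::-1]))  # same values, reversed; last bits usually differ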



In principle you can directly replace function calls with their equivalents between frameworks; this works fine for common layers. I've done this for models that were trained in PyTorch that we needed to run on an EdgeTPU. Rewriting in Keras and writing a weight loader was much easier than PyTorch > ONNX > TF > TFLite.
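
The weight loader part is mostly bookkeeping. A minimal sketch for a single conv layer (the layer sizes here are made up; the real work is getting the axis orders right):

    import torch
    import tensorflow as tf

    # Hypothetical pair of equivalent layers.
    pt_conv = torch.nn.Conv2d(3, 16, kernel_size=3, padding=1)
    tf_conv = tf.keras.layers.Conv2D(16, 3, padding="same")
    tf_conv.build((None, 32, 32, 3))  # create the Keras variables

    # PyTorch stores conv kernels as (out, in, kH, kW); Keras expects
    # (kH, kW, in, out), so transpose before copying.
    w = pt_conv.weight.detach().numpy().transpose(2, 3, 1, 0)
    b = pt_conv.bias.detach().numpy()
    tf_conv.set_weights([w, b])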

Arithmetic differences do happen between equivalent ops, but I've not found that to be a significant issue. I was converting a UNet and the difference in outputs for a random input was at most O(1e-4), which was fine for what we were doing. It's more tedious than anything else. Occasionally you'll run into something that seems like it should be a find+replace, but it doesn't work because some operation doesn't exist, or doesn't work quite the same way.
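
Continuing the sketch above, checking parity is just pushing the same random input through both sides and looking at the max difference:

    import numpy as np

    x = np.random.randn(1, 3, 32, 32).astype(np.float32)  # NCHW
    y_pt = pt_conv(torch.from_numpy(x)).detach().numpy()
    # Keras wants NHWC, so transpose on the way in and back out.
    y_tf = tf_conv(tf.constant(x.transpose(0, 2, 3, 1))).numpy().transpose(0, 3, 1, 2)
    print(np.abs(y_pt - y_tf).max())  # small for a single layer; grows with depth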


It's just that expressing those "layers and weights" in code is different in TensorFlow and PyTorch. I think a good parallel would be expressing some algorithm in two programming languages. The algorithm might be identical, but JS uses `list.map(f)` and Python uses `map(f, list)`, JS doesn't have priority queues in the "standard lib" while Python does, etc. Similarly, the low-level ops and higher-level abstractions are (slightly) different in PyTorch and TensorFlow.

I'm not too familiar with TensorFlow, so I can't give an example there, but a similar issue I recently faced when converting a model from PyTorch to ONNX is that PyTorch has a built-in discrete Fourier transform (DFT) operation, while ONNX doesn't (yet; they're adding it). So I had to express the DFT in terms of other ONNX primitives, which took time.
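
The standard trick is to materialise the DFT matrix and lower the transform to plain MatMuls. Roughly, as a sketch for a real-valued 1-D signal (sizes made up):

    import numpy as np

    # X[k] = sum_n x[n] * exp(-2j*pi*k*n/N). Splitting the DFT matrix
    # into real and imaginary parts keeps everything in real-valued
    # ops (MatMul) that ONNX already has.
    def dft_matrices(n):
        k = np.arange(n)
        w = np.exp(-2j * np.pi * np.outer(k, k) / n)
        return w.real.astype(np.float32), w.imag.astype(np.float32)

    w_re, w_im = dft_matrices(8)
    x = np.random.randn(8).astype(np.float32)

    y_re, y_im = w_re @ x, w_im @ x  # two MatMuls instead of one FFT op

    ref = np.fft.fft(x)  # sanity check against numpy
    assert np.allclose(y_re, ref.real, atol=1e-4)
    assert np.allclose(y_im, ref.imag, atol=1e-4)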


In principle all operations can be translated between frameworks, even if some ops aren't implemented in one or the other. This, however, depends on whether the translation software supports graph rewriting for such nodes.

Lambdas and other custom code are also problematic, as their code isn't necessarily stored within the graph.
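
For example, the body of a Keras Lambda layer lives only in the process that defined it (a sketch; the activation is an arbitrary example):

    import tensorflow as tf

    inp = tf.keras.Input(shape=(4,))
    # The lambda body is arbitrary Python; it isn't stored in the graph
    # as framework ops, so a converter can't mechanically rewrite it
    # into another framework without special-casing it.
    out = tf.keras.layers.Lambda(lambda x: x * tf.math.sigmoid(x))(inp)
    model = tf.keras.Model(inp, out)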



