Popular chat-bot LLMs weren't trained to do math; they were trained to predict the next token in a conversational setting. The fact that they can do any math at all is a miracle considering that they have absolutely no thought process; they're just predicting the next token, one after another, until they emit a stop token or hit a length limit.
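For anyone who hasn't seen it spelled out, that generation loop is roughly the following (a toy sketch: `next_token_distribution`, the `<eos>` token, and the length cap are made-up stand-ins, not any real model's API):

```python
import random

EOS = "<eos>"     # hypothetical stop token
MAX_TOKENS = 50   # hard length limit

def next_token_distribution(context: list[str]) -> dict[str, float]:
    """Stand-in for the model: returns P(next token | context).
    A real LLM computes this with a forward pass; here it's faked."""
    return {"tok": 0.9, EOS: 0.1}

def generate(prompt: list[str]) -> list[str]:
    tokens = list(prompt)
    while len(tokens) < MAX_TOKENS:
        dist = next_token_distribution(tokens)
        # Sample one token from the predicted distribution.
        tok = random.choices(list(dist), weights=list(dist.values()))[0]
        if tok == EOS:        # the model "decides" it's done
            break
        tokens.append(tok)    # feed its own output back in and repeat
    return tokens

print(generate(["Hello"]))
```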
Framed another way, it's sort of like asking your calculator to have a conversation with you and finding that it had a fairly decent go of it. Sure, it wasn't grammatically correct a lot of the time and it struggled quite a bit, but it was never designed to speak conversationally, so the fact that it could respond at all is rather impressive.
They can write proofs, so they can do math. If you mean they're bad at arithmetic, a big part of that is improper tokenization: most tokenizers chunk numbers into multi-digit tokens whose boundaries depend on corpus frequency, so the model never sees a consistent place-value structure. If you use digit-level tokenization, even small transformers can learn arithmetic and generalize to longer inputs.
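To make the tokenization point concrete, here's a toy comparison (the vocabulary and the BPE-style splits shown are illustrative assumptions, not taken from any particular tokenizer):

```python
# A BPE-style tokenizer might split "12345+678" on corpus-frequency
# boundaries, e.g. ["123", "45", "+", "678"], so the same digit lands
# in different tokens depending on its neighbors.

# Digit-level tokenization: every character is its own token.
VOCAB = {ch: i for i, ch in enumerate("0123456789+-*= ")}

def digit_tokenize(expr: str) -> list[int]:
    """Map each digit/operator character to its own token ID."""
    return [VOCAB[ch] for ch in expr]

print(digit_tokenize("12345+678"))
# -> [1, 2, 3, 4, 5, 10, 6, 7, 8]
```

With one stable token per digit, the model can learn a column-wise procedure for addition instead of memorizing chunk-level patterns, which is the usual explanation for why this generalizes to longer numbers.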
They can do maths: they can write code to be executed. That's not much different from you or me doing any maths more complex than reciting the times table.
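That pattern is just: the model emits a code block, a harness executes it and returns the output. A minimal sketch, assuming a fenced-`python` convention in the reply (the regex and the example reply are made up, not any product's implementation):

```python
import re
import subprocess
import sys

FENCE = "```"  # built here so this snippet itself stays fence-free

def run_model_math(model_reply: str) -> str:
    """Extract a fenced python block from the model's reply, execute it,
    and return its stdout. This is the 'calculator' the LLM reaches for."""
    match = re.search(FENCE + r"python\n(.*?)" + FENCE, model_reply, re.DOTALL)
    if match is None:
        return model_reply  # no code present; treat the reply as the answer
    result = subprocess.run(
        [sys.executable, "-c", match.group(1)],
        capture_output=True, text=True, timeout=5,
    )
    return result.stdout.strip()

reply = f"Let me compute that:\n{FENCE}python\nprint(123456789 * 987654321)\n{FENCE}"
print(run_model_math(reply))  # 121932631112635269
```

In a real deployment you'd sandbox the execution rather than running model output directly, but the shape of the loop is the same.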