The Zilog Z80 was a very significant improvement over the Intel 8080. In many respects it can be considered midway between the Intel 8080/8085 and the Intel 8086/8088.
Besides increasing the number of registers, the Z80 added many features that were standard in any general-purpose computer (e.g. signed integer support and indexed addressing) but were missing in the Datapoint 2200/Intel 8008/Intel 8080. They were missing simply because the Datapoint 2200 had not been designed for use in general-purpose computers, but only for implementing the logic required inside a serial terminal.
However, many programs for the Z80 did not make good use of its additional functions, because they were intended to remain compatible with legacy Intel 8080 systems.
The Intel 8086 did not have this problem: it was only source-code compatible with the 8080, not binary compatible, so any application had to be recompiled for it anyway, and making full use of the new ISA carried no additional disadvantage.
Unlike the 6502, the Intel 8080 and the Z80 had a few 16-bit operations. These were intended for address computations, while data operations were expected to be handled using the 8-bit accumulator.
Despite the intended use and the limited set of 16-bit operations, complicated arithmetic, e.g. floating-point arithmetic, was still faster when the 16-bit address registers and operations were used for handling data. With properly optimized programs, the Z80 and even the 8080 could be much faster than the 6502 for number crunching. (Though "faster" is only relative, because FP64 floating-point operations took many milliseconds per operation on any 8-bit microprocessor, many billions of times slower than on a modern laptop or desktop CPU.)
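To illustrate why wider operations help with multi-byte arithmetic, here is a rough Python sketch (not real 8080/Z80 code) that adds two 64-bit values limb by limb with carry propagation, once with 8-bit limbs and once with 16-bit limbs. The function name add_limbs and the test values are made up for this example, and the step count is only a count of add-with-carry operations, not a cycle count. On real hardware the picture is messier: the Z80 has a 16-bit add with carry (ADC HL,ss), while the 8080's 16-bit add (DAD) has no carry input.

```python
def add_limbs(a, b, limb_bits, total_bits=64):
    """Add two unsigned integers limb by limb, propagating the carry.

    Returns (sum modulo 2**total_bits, number of add-with-carry steps).
    The step count is an illustration only, not a cycle count.
    """
    mask = (1 << limb_bits) - 1
    carry = 0
    result = 0
    steps = 0
    for shift in range(0, total_bits, limb_bits):
        # One add-with-carry on a single limb of each operand.
        s = ((a >> shift) & mask) + ((b >> shift) & mask) + carry
        result |= (s & mask) << shift
        carry = s >> limb_bits
        steps += 1
    return result, steps


# Hypothetical operands, e.g. two 64-bit mantissas.
a = 0x0123456789ABCDEF
b = 0x0FEDCBA987654321

sum8, steps8 = add_limbs(a, b, 8)     # accumulator-style, one byte at a time
sum16, steps16 = add_limbs(a, b, 16)  # register-pair-style, one word at a time

print(hex(sum8), steps8)
print(hex(sum16), steps16)
```

Both calls produce the same sum, but the 16-bit version needs half as many add steps, which is the basic reason the register pairs were worth using for data even though they were designed for addresses.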
I think this has much to do with another major difference in the history of the two designs: although originally conceived for a different architecture, the DP2200 processor / Intel 8008 was meant from the beginning to be the CPU of a small computer system. The 6800, and in consequence even more so the 6502, was more of a microcontroller for implementing in software what wasn't economically viable to put in silicon. Notably, it was not meant to be a computer CPU. Thus, the 6502 falls short on many things, like support for large stacks, as required for higher-level languages, or efficient 16-bit operations. (Its philosophy may be better described as, "if it runs, it's good enough, and even better if it runs cost-effectively.")
PS: regarding the DP2200 not being meant as a small system, I'm not so sure about this based on my own reading. But, certainly, it wasn't marketed as such.
And, regarding the educational merits of the 6502, it may be a good second language, as it requires you to think about your implementation. (Personally, I'm more for the PDP-1, which Ed Fredkin – "world's best programmer", no less – once claimed to have inspired IBM's RISC architecture. ;-) )