These results are promising and hopefully carry over to the upcoming Strix Halo, which I'm eagerly awaiting. With a rumoured 40 compute units and performance on par with a low-power (<95W) mobile RTX 4070, it would make an exciting small-form-factor gaming box.
I've been super excited for Strix Halo, but I'm also nervous. It's a multi-chip design, and I'm not sure AMD can pull that off in a mobile form factor while still delivering a good mobile chip.
Strix Point can be brought down to 15W and still do great, and go up to 55W+ and be fine. Nice idles. But it's monolithic, and I'm not sure AMD & TSMC have brought the power penalty of multi-chip designs down enough.
Very valid concerns! AMD's current die-to-die interconnects have pretty abysmal energy-per-bit figures. I really hope they can pull off something similar to Intel's EMIB.
The rumors saying Strix Halo will be a multi-chip product claim it reuses the existing Zen 5 CPU chiplets from the desktop and server parts and just replaces the IO die with one that has a vastly larger iGPU and double the DRAM controllers. So they might save a bit of power with more advanced packaging that puts the chiplets right next to each other, but it'll still use the same Infinity Fabric links that are power-hungry on the desktop parts.
Me too. There's at least one manufacturer who makes a pretty sweet mini-ITX motherboard with the R9 7945HX; I hope they follow up with Strix Halo once it's released.
I considered Nextflow before begrudgingly settling on snakemake for my current project. I didn't record why... possibly because snakemake was already a known quantity and I was under time pressure, or because I felt the task DAG would be difficult to specify in WDL. It's certainly the most mature of the bunch.
Nobody wants to write or debug Groovy, especially scientists who are used to Python. It also causes havoc on a busy SLURM scheduler with its lack of array jobs (I've heard this is being fixed soon).
If your project depends heavily on general-purpose GPU programming, you might start it in C++.
This was the case for a project I'm working on that was started in the last year. The interop features in Rust (and other languages) are simply not as reliable as writing kernels directly in CUDA, HIP, or even DPC++. You _can_ attempt to write the GPU code in C++ and call into it from $LANG via FFI, but if you want data structures and methods that work on both the host and the device, it's still easier to write everything once in C++.
I concur re not being able to share data structures. I've been using `cudarc` for FFI (Shaders in C++; CPU code in rust), but that would be a nice perk.
I’m really unsure why this is front page. For a hacker news audience that has little knowledge of Aotearoa New Zealand history, this is an odd first introduction that has historically been used to vilify Maori and in turn justify colonisation. If this is your first exposure to the history of Maori, please know this emphasis carries its own agenda.
I think if someone had posted a link to an uncontroversial part of NZ history it would languish on the new page with one or two points. Things that reinforce discourses of racial tension seem to constantly get upvoted... somehow.
You should read a little more closely before such strong condemnations.
The Julia macros @btime and the more verbose @benchmark are specifically designed for benchmarking code. They perform warm-up iterations, then run hundreds of samples (ensuring there is no inlining) and output the mean, median, and standard deviation.
This is all in evidence if you scroll down a bit, though I’m not sure what has been used to benchmark the Mojo code.
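For anyone unfamiliar with these BenchmarkTools.jl macros, a minimal sketch of how they are used (the function being timed is my own toy example, not the benchmark from the article):

```julia
using BenchmarkTools

# Toy function to benchmark.
sumsq(x) = sum(abs2, x)

x = rand(1000)

# One-line timing; `$` interpolates `x` so it isn't treated as an
# untyped global, which would skew the measurement.
@btime sumsq($x)

# Full report: many samples, with min/median/mean/max and allocation counts.
@benchmark sumsq($x)
```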
Whilst the Julia version currently beats Mojo, I fully expect both to approach basically the same performance with enough tinkering, and for that performance to be on par with C or Fortran.
A more interesting question is which version is more elegant, ‘obvious’ and maintainable. (Deeply familiar with both, but money is on Julia).
> A more interesting question is which version is more elegant, ‘obvious’ and maintainable. (Deeply familiar with both, but money is on Julia).
Yes, more than raw speed, what impresses me is that the version of the code in [1] is already a few times faster than the Mojo code, because that's pretty basic Julia code that anyone with a little Julia experience could write and maintain easily.
The later versions with LoopVectorization require more specialized knowledge, and get into "how can we tune this particular benchmark" territory for me (I don't know yet how to evaluate the Mojo code in this regard, i.e. how 'obvious' it would be to an everyday Mojo developer). So [1] is a more impressive demonstration of how an average developer can write very performant code in Julia.
User lmiq articulates the sentiment well in a later comment in the OP thread:
> what I find more interesting here is that the reasoning and code type applies to any composite type of a similar structure, thus we can use that to optimize other code completely unrelated to calculations with complex numbers.
It matters less whether super-optimized code can be written in Julia for this particular case (though there's some value in that too), the more important and telling part to me is that the language has features and tools that can easily be adopted to a general class of problems like this.
An even more interesting question is: which version will actually entice millions of independent and variably motivated actors from all walks of life to commit to and invest in a particular ecosystem? Technical and usability aspects play only a minor role in technology adoption. In particular, the best technology doesn't always win.
My humble two pennies is that Julia is missing the influencer factor: being endorsed by widely known entities that will attract the attention of both corporate eyes and the hordes of developers constantly looking for the next big thing.
Your money might be on Julia, but $100M was just placed on the Mojo/Modular bet...
I've tried Julia a handful of times. IMO, the thing slowing adoption is that the use cases where Julia feels like the most powerful, optimal choice are too limited. For example:
- Slow startup times (e.g., time-to-first-plot) kill its appeal for scripting. For a long time, one got told that the "correct" way to use Julia was in a notebook. Outside of that, nobody wanted to hear your complaints.
- Garbage collection kills its appeal for realtime applications.
- The potential for new code paths to trigger JIT compilation presents similar issues for domains that care about latency. Yes, I know there is supposedly static compilation for Julia, but as you can read in other comments here, that's still a half-baked, brittle feature.
The second two points mean I still have the same two-language problem I had with C++ and Python. I'm still going to write my robotics algorithms in C++, so Julia just becomes a glue language; but there's nothing that makes it more compelling than Python for that use. This is especially true when you consider the sub-par tooling. For example, the LSP is written in Julia itself, so it suffers the same usability problems as TTFP: you won't start getting autocompletions for several minutes after opening a file. It is also insanely memory-hungry, to the extent that it's basically unusable on a laptop with 8 GB of RAM (on the other hand, I have no problem with clangd). Similarly, auto-formatting a 40-line file takes 5 seconds. The debugging and stacktrace story is similarly frustrating.
When you take all of this together, Julia just doesn't seem worth it outside of very specific uses, e.g., long-running, large-scale simulations where startup time is amortized away and aggregate throughput is more important than P99 latency.
Some of what you have written seems pre-1.0, and some pre-1.9. I have never seen anybody in the community say the correct way to use Julia is in a notebook. As far as I have seen, some people use a simple editor and have the REPL open, and most just use it in vscode.
You can do real-time applications just fine in Julia: just preallocate anything you need and avoid allocations in the hot loop; I am doing real-time stuff in Julia myself. There are some annoyances with the GC, but nothing that stops you from doing real-time work. There are robotics packages in Julia, and they are old; there is a talk comparing them with C++ (spoiler: developing in Julia was both faster and easier, and the results were faster).
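A minimal sketch of the preallocation pattern being described (the function and numbers here are illustrative, not from any particular package):

```julia
# Allocate all buffers once, before the hot loop starts.
state = zeros(Float64, 1024)
out   = similar(state)

# In-place update: `@.` broadcasts into the existing `out` array, so no
# new arrays are allocated and the GC has nothing to collect in the loop.
function step!(out, state)
    @. out = 0.9 * state + 0.1
    return out
end

for _ in 1:1_000_000         # the real-time hot loop
    step!(out, state)
    state, out = out, state  # swap buffers instead of allocating
end
```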
I have been using two Julia sessions on an 8 GB laptop constantly while developing, no problem. The LSP loads fine and fast in vscode, no problem there either.
The debugger in vscode is slow and most don't use it; there is a package for that. The big binaries are a problem, and the focus is shifting towards solving that. Stacktraces will become much better in 1.10 but still need better hints (there are plans for 1.11). In general, we need better onboarding documentation for newcomers to make their experience as smooth as possible.
The recommended solution to slow startup times has always been to keep a REPL open. That's basically the same workflow as a notebook in my mind. Like I said, this means there is a large class of tasks Julia doesn't make sense for, because paying the startup cost is too expensive compared to Python or Go.
I just timed vscode with the LSP. From the point I open a 40-line file of the Lorenz attractor example, it takes 45 seconds until navigation within that same file works, and the LSP hogs 1 GB of memory. That's 5x the memory of clangd and 20x worse performance; hardly what I would consider a snappy experience.
I have no doubt that Julia can be shoe-horned into realtime applications. But when I read threads like this [1], it's pretty clear that doing so amounts to a hack (e.g., people recommending that you somehow call all your functions to get them JIT-compiled before the main loop actually starts). Even the mitigations you propose, i.e., pre-allocating everything, don't exploit any guarantees made by the language, so you're basically in cross-your-fingers-and-pray territory. I would never feel comfortable advocating for this in a commercial setting.
I don't know man, I just tested vscode and it's almost instant, loads every function from multiple files in less than 5 seconds. I'm on a 13-inch intel Mac and Julia 1.11 master (1.9 and 1.10 should be the same).
Having a REPL open is not the same thing as a notebook, if you feel like that, cool I guess.
That thread is old, and Julia can cache compiled code from 1.9 onward. However, it cannot distribute the cached code (yet).
Writing the fastest possible real-time application in C/C++ follows the same principles as in Julia. It's not as shoe-horned as you might believe.
When developing Julia, the developers made some design decisions that affect the workflow of using the language. If it doesn't fit your needs, that's cool, don't use it. If you are frustrated and would like to try the language, come to Discourse; people are friendly.
>I don't know man, I just tested vscode and it's almost instant, loads every function from multiple files in less than 5 seconds. I'm on a 13-inch intel Mac and Julia 1.11 master (1.9 and 1.10 should be the same).
I know, I'm always "holding it wrong". And that's the problem with Julia.
> Having a REPL open is not the same thing as a notebook, if you feel like that, cool I guess.
Both workflows amortize the JIT times away by keeping an in-memory cache of compiled code. This makes a lot of smaller scripting tasks untenable in Julia, so people choose Python instead. That means Julia needs a massive advantage elsewhere if people are going to incorporate both languages into their project.
> When developing Julia, the developers made some design decisions that affect the workflow of using the language. If it doesn't fit your needs, that's cool, don't use it. If you are frustrated and would like to try the language, come to Discourse; people are friendly.
This thread was about why Julia hasn't seen wider adoption. It's my contention that the original design decisions are one of the root causes of that.
I just tried it from the Windows command line, and this benchmark with the plots ran in what seemed like an instant; some simple timing showed it was under 2 seconds with a fresh Julia v1.10 beta installation. That seems to line up with what amj7e is saying, and I don't think anyone would call the Windows command line the pinnacle of performance. That's not to say Julia's startup is fast, but it has improved pretty significantly for workflows like this due to the package caching. It needs to keep improving, and the work to pull OpenBLAS safely out of the default system image will be a major step in that direction, but it's already almost an order of magnitude better than last year in most of the benchmarks that I run.
I think the person you are replying to was using "notebook" as shorthand for "interactively". You don't write scripts that you call; you have to have a REPL open to interactively feed code to.
Autocompletion in Julia is also just terrible, and the tooling really is lacking compared to better-funded languages. No harm in admitting that. (When Julia had no working debugger, some people seriously argued that you don't need one: just think harder about your code! Let's please bury that attitude...)
This certainly has not been my experience with Julia people. Sure, there are opinionated people in every community, but most of the pain points are acknowledged and known.
I can confirm that there were multiple (heated) arguments on Discourse, where some posters completely dismissed the need for debuggers in general. I remember it quite well.
It was very strange, but I don't think it says anything about the community, except that people have different opinions and preferences, like in any community.
>> For a long time, one got told that the "correct" way to use julia was in a notebook. Outside of that, nobody wanted to hear your complaints.
> I have never seen anybody in the community say the correct way to use Julia is in a notebook.
patrick's comment is fully in the past tense for this part, and that was indeed a pretty common thing for a long while in the past. Especially pre-1.0, before Revise became mature and popular, an oft-recommended workflow was to use a notebook instead of an editor. Or it would come in the form of a reply to criticism about the slow startup or compilation latency - "the scientists who this language is made for use notebooks anyway, so it doesn't matter" was a common response, indirectly implying that that was the intended way to use the language.
You must mean REPL, not notebook. I've been following the community since before the move to Discourse, and "use the REPL" surely outnumbers "use a notebook" by orders of magnitude.
Here on HN (in threads about Julia) the focus was generally on notebooks as I remember it. That's the context I assumed, but if it's about Julia fora in general I agree, the REPL had/has been talked about much more often than notebooks.
About the slow startup times: they have been worked on massively in the latest versions of Julia, mainly in version 1.9, which is the first version of Julia that saves native compiled code. You can read more about that in the release blog [1].
On garbage collection and real-time applications, there is this talk [2] where ASML (the manufacturer of photolithography machines for TSMC) uses Julia for it. Basically, it preallocates all the memory needed beforehand and turns off the garbage collector.
More on real-time: if your call stack is all type-stable [3], then you can be sure that after the first call (and its compilation), the JAOT compiler won't be triggered again.
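To illustrate type stability with a toy example of my own: a function is type-stable when its return type is determined by its argument types alone, so a single compiled specialization suffices and the compiler is never re-triggered at run time.

```julia
# Type-unstable: with an Int argument this returns either an Int or a
# Float64 depending on a runtime value, so inference gives a Union type.
unstable(x::Int) = x > 0 ? x : 0.0

# Type-stable: always returns the same type as its input.
stable(x) = x > 0 ? x : zero(x)

# `@code_warntype` highlights unstable return types (shown in red):
@code_warntype unstable(1)   # Body::Union{Float64, Int64}
@code_warntype stable(1)     # Body::Int64
```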
About static compilation, there are two different approaches:
* PackageCompiler.jl [4], which is really stable and used in production today. It has the downside of generating huge executables, though you can do some work to trim them; there is still work to do on their size.
* StaticCompiler.jl [5], which is still in an experimental phase, but far from completely brittle. It does put several restrictions on the code you can write and compile with it, basically turning Julia into a statically typed language. But it has been successfully used to compile linkable libraries and executables.
Some of the usability concerns in your third paragraph have been worked on in the 1.9 and 1.10 (upcoming) releases. The LSP experience is better thanks to native code caching; maybe you can try it again (if you have time). The debugging experience I honestly think is top-notch if you're using Debugger.jl + Revise.jl [6] [7], though I know there are some caveats. About stack traces, there is also more work being done to make them better and more readable; you can read about the work done in these PRs [8] [9] [10] [11, for the State of Julia talk].
Still, I can understand that Julia might not be able (yet) to cover all the usecases or workflows of different people.
IMO the reason Julia gets to be this fast is LLVM, and the guy who created LLVM is also the creator of Mojo, so there is something to be said about that.
My understanding is that Julia gets to be this fast because the language design was optimized for performance from the beginning, by clever use of its type hierarchy and multiple dispatch for compilation into very specific and efficient machine code (plus a thousand other little optimizations like constant propagation, auto-vectorization, etc.)
LLVM helps with its own optimizations, but more and more of those optimizations are being moved to the Julia side nowadays (since the language has more semantic understanding of the code and can do better optimizations). I believe the main thing LLVM helps with is portability across many platforms, without having to write individual backends for each one.
Not necessarily. Julia will always be 11 years older than Mojo, no matter how old both of them get, and that advantage won't shrink. Not to mention, Mojo is a superset of a 40-year-old language with billions of dollars of development poured into it, plus an extra hundred million poured directly into Mojo itself. If we go by resources spent on each, Mojo has gotten about 5x more investment than Julia.
Yeah, pretty much any strongly performance-oriented modern language should be able to be massaged into emitting whatever instructions give close-to-optimal performance here.
It's always fun though when one language does better naïvely in a benchmark to delve in and see how to match or surpass them, and see if it was worth the trouble.
Microbenchmark performance for languages in this class definitely shouldn't be seen as a strongly deciding factor though.
I disagree, actually. I have found microbenchmarks to be very informative to understand why a language is fast in some cases and slow in others.
It's not only the actual benchmark numbers, though. It's understanding the code that reaches those numbers: Can Julia do explicit SIMD? How awkward is that in one language or the other? Are there idiosyncratic bottlenecks? Bad design decisions that need to be worked around in one language but not the other? And so on.
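For instance, explicit SIMD in Julia can be as light as an annotation. A sketch using LoopVectorization.jl's @turbo (assuming the package is installed; Base's @simd is similar but less aggressive):

```julia
using LoopVectorization  # provides @turbo (assumed installed)

function dot_turbo(a, b)
    s = zero(eltype(a))
    # @turbo vectorizes the loop body and elides bounds checks.
    @turbo for i in eachindex(a)
        s += a[i] * b[i]
    end
    return s
end

a, b = rand(10_000), rand(10_000)
dot_turbo(a, b) ≈ sum(a .* b)  # sanity check against the naive version
```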
“The Linux kernel's stateless video decoder interface is used for video decoding where no state needs to be kept between processed video frames and allows for independently decoding each video frame.”
That's confusing and, if I understand correctly, misleading. Here's an alternate summary that I think is more meaningful. tl;dr: the state lives in the userspace client and is passed to every operation:
> A stateless decoder is a decoder that works without retaining any kind of state between processed frames. This means that each frame is decoded independently of any previous and future frames, and that the client is responsible for maintaining the decoding state and providing it to the decoder with each decoding request.