I’ve written a library as well as a toy compiler in Rhombus. I occasionally work with Matthew Flatt. I’m happy to try to answer any questions! :)
One of the biggest things that makes Rhombus an interesting language is its “shrubbery” notation. In languages with macros, you typically face a trade-off: if you want to operate on an AST, then the syntax going into the macro needs to be parsable by the host language, thus limiting what kind of syntax you can have in a macro. On the other hand, you can pass a token stream, which gives you a lot more flexibility; the problem now is that you need some parser to recover binding syntax when you want it, which complicates things.
Rhombus does a little bit of parsing, but leaves subterms unparsed until later. This gives you flexibility in your macro syntax without the complications of using a raw token stream.
There’s a lot of other interesting stuff like its ability to bind macros in different spaces (a macro can have one expansion when used on the left of an assignment and a different one in expression position), the ability to attach static information to identifiers at compile time and have that propagate to use sites, etc.
fun
| is_sorted([] || [_]): #true
| is_sorted([head, next, tail, ...]):
    head .<= next && is_sorted([next, tail, ...])

is_sorted([1, 2, 3, 4, 5])
is_sorted([1, 2, 30, 4, 5])
There are a bunch of ways to express blocks, pattern matching, and macros. The class facilities also look very nice. Other than some indentation it doesn't feel Python-y at all (which for me is a good thing!), more like an Elixir feel (cute syntax for some great concepts behind it). See https://github.com/racket/rhombus/blob/master/demo.rhm
The above is a super concise syntax example that showcases multiple things at the same time. It is a function definition mixed with pattern matching, including some advanced pattern-matching features like the '...' and '||' symbols. You can rewrite the example into the code below if you find that more readable.
fun is_sorted(list):
  match list:
  | []: #true
  | [_]: #true
  | [head, next, tail, ...]:
      head .<= next && is_sorted([next, tail, ...])
There's also a dot in `head .<= next`, because the syntax is sugar for `head.<=(next)`. Not sure why this is needed, but a quick read of the docs suggests it's done to enable static dispatch for some calls.
Not sure that can be called sugar when it has just as many characters (two spaces rather than two brackets, with the remaining lexemes in the same order), and the cognitive load is at least as high, since it exposes dot notation inside a three-character infix operator.
The overall project might be nice, but in this particular case it kind of opts for the worst of all pre-existing conventions while trying to please everyone. The challenge is tough, so it's no wonder it has a hard time meeting the goal.
~ $ racket -I rhombus
Welcome to Racket v8.13 [bc].
> fun is_sorted(list):
    match list:
    | []: #true
    | [_]: #true
    | [head, next, ...]:
        head .<= next && is_sorted([next, ...])
; readline-input:6:19: next: cannot use repetition binding as an expression
; in: next
; [,bt for context]
From what I can understand, the "..." isn't independent of the previous expression in the list (as I would have expected from e.g. Prolog or Haskell). Instead you are defining a kind of pattern to reuse later, so tail gives a name to the rest of the list (so that it doesn't get associated with next).
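Concretely, that's why the rewrite earlier in the thread names the rest of the list:

fun is_sorted(list):
  match list:
  | []: #true
  | [_]: #true
  | [head, next, tail, ...]:
      // the `...` repeats `tail`, so `next` stays a single element
      head .<= next && is_sorted([next, tail, ...])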
Ah, now that makes sense: the three dots need a variable to attach to. It's just that the way they do it is completely new to me. Thank you for finding out!
I'm not going to file a bug report, that's more time than I'm willing to spend on a project I'm not involved in. I will say I'm using Termux 0.118 and Racket 8.13 on kernel 5.10 and Android 14 and am willing to answer questions.
I just installed the package called `racket` from Termux's upstream, and it seems that they're using racket-minimal for that. Bit of a gotcha, but at least it doesn't seem like there's a bug. Thanks for the tip.
They are also using : ; , and a new line. Also ' and « » that I have no idea how to type.
They are using both :~ and :: for type hints.
In none of the languages I've seen was that necessary or useful.
I think the problem they were solving was: we really want to be able to cram everything into one line, so even if you split things across lines you might as well still use the same characters, despite them being unnecessary then.
The syntax `expr :~ annot` is used to annotate the expression with static information. This is different from type annotations.
It's a general mechanism to associate an expression with "extra information" that can be used elsewhere (at compile time).
One can, for example, use static information to implement dot notation for an object system: with static information, name resolution can be pushed from run time to compile time.
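A rough sketch of what that looks like (the class and names here are just for illustration, not from the thread): `:~ Posn` attaches static information to `p` without adding a run-time check, so under `#lang rhombus/static` the accesses `p.x` and `p.y` can be resolved to Posn's fields at compile time.

#lang rhombus/static

class Posn(x, y)

// `p :~ Posn` means: no run-time check, but the compiler may use
// Posn's static information, so the dots below resolve statically
fun flip(p :~ Posn):
  Posn(p.y, p.x)

flip(Posn(1, 2))  // Posn(2, 1)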
The important part is that users of Rhombus can use the mechanism to track static information specific to their own programs.
It will be exciting to see the creative uses of this mechanism in the future.
> The syntax `expr :~ annot` is used to annotate the expression with static information.
The compiler can tell whether a thing is static or dynamic and apply the correct behavior. Why would I ever want to check a static thing only dynamically, and why would I ever want to statically check a dynamic type, if not by mistake?
If the programmer doesn't really have a real choice, why make them choose which buttons to press?
The idea behind macros is, so to speak, to allow the programmer to extend the compiler.
In a language like MetaPost I can use equations between variables:
x + y = 10
x - y = 6
A reference to a variable x will use the current set of equations and solve for x. The solution (if it exists) will become the value of the variable reference.
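(Here, adding the two equations gives 2x = 16, so a reference to x would resolve to 8 and a reference to y to 2.)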
Let's say I want to extend Rhombus with this new kind of variable
(let's call them linear variables).
Each linear variable needs to have some static information attached.
Here the relevant information is the set of equations to consider.
A reference to a linear variable will then reduce (at compile time) the equations associated with the variable. The reduced set of equations is then used to generate runtime code that finds the value.
In a sense "static information attached to an expression" is just a convenient
way of working with compile time information (in the sense currently used by
macros in Racket).
I'm a bit confused: Can this static information system be used for run-of-the-mill static type checking as well or not? And if so, does a static type checker for this language exist or is one in the works?
Yes, I also got Python vibes. That being said, I'm digging infix less and less over time, and I'm actually starting to crave Forth syntax, just for the complete lack of punctuation...
But I actually really like what they have going here. Seems nice and minimalist, while staying consistent and clean.
for anyone interested, I recommend spending a few minutes with this to get a feel for the language (and if something is undecipherable then searching the docs for that thing, because the docs are super verbose and hard to navigate, and too dry, IMHO):
I stopped at 1.1 Notation. Full of arbitrary-looking decisions on which characters can be used for what.
It's 2024, and we still don't have a string notation that doesn't use the same character for opening and closing delimiter. If I start parsing from an arbitrary offset in the code, I can't say whether a double quote I read is the beginning of a string, or the end of one. I have to either resort to heuristics, or parse from the beginning of the file (at least once; and then cache offsets known to be outside a string). Something like "() would be nice. Still the familiar double quote, but the grouping is defined in a grammatically-superior way.
Also, still no identifiers that can start with a digit. Most of the mainstream languages have such complex grammars, probably requiring hand-coded parsers, but I can't have a "52cards" identifier. Is this really that hard compared to everything else?
Now, I'm self-taught and all. Maybe I'm missing something and the professors are right.
Using parens is still problematic because they have semantic meaning in most programming languages, so you'd still have to backtrack to the opening paren to decide whether a later paren is closing an expression or a string.
> Is this really that hard compared to everything else?
In the languages I've seen that don't allow identifiers to start with a number, it's because doing so makes other expressions ambiguous.
E.g. in Python (et al), things that start with 0x are treated as a hex literal. 0x9 would be ambiguous because it could either be an identifier named 0x9 or a literal for 9 in hex.
It also makes integer literals ambiguous, because 54 would be both a valid identifier and a valid literal.
You could disambiguate that with more rules (identifiers can include numbers but can't start with 0x, identifiers must include at least one non-numeric character), but the gain for doing so is so low it feels a little Quixotic.
Forth almost has you covered on both points: `52cards`, `1+`, and `0` would all first be looked up in the dictionary, and only if they had not been defined would an integer conversion be attempted, and `s" ` is in principle distinct from the trailing `"`.
[unicode does have various quotation-mark pairs, but note which is the opener can be natural-language dependent: eg « french » vs »magazine german«]
There is a classic English-language solution in the so called ⁶⁶round quotes⁹⁹, like the accursed SmartQuotes MS Word feature inflicts. There is more than one pair, and they look visually indistinct in some situations so I used superscript numbers above, rather than “this”.
I like the «guillemet» (French) solution, and also the similar ‹chevrons›. Like all similar holy wars there are issues with familiarity and input methods etc., and I don't think it will be solved through mere elegance.
Honorable mention to the classic ``unix quotes'' that are still seen in typesetting software.
> If I start parsing from an arbitrary offset in the code, ...
Why would you ever do that? What's the point?
There are many other examples, but in C and C++, if you don't start parsing at the beginning, you're definitely going to get many things wrong. What if you start parsing in the middle of an identifier? How can you possibly expect to get something useful from that?
That... does not work in most programming languages. Especially so for showing indent levels correctly. Generally speaking, the language server (or whatever) is parsing the entire file.
Performance, mostly. If you have to re-parse part of the code many times a second, in a text editor. For [pseudo]structural editing or syntax highlighting, for example.
Optimizing for your edge case would require everyone writing and reading code to conform to this extra thing, which seems completely unnecessary. Machines are pretty fast.
I feel like a link to a 'pitch' or description of the Rhombus language would've been more insightful for most of us. Even being familiar with Racket, I don't know what Rhombus is.
Abstract:
Rhombus is a new language that is built on Racket. It offers the same kind of language extensibility as Racket itself, but using conventional (infix) notation. Although Rhombus is far from the first language to support Lisp-style macros without Lisp-style parentheses, Rhombus offers a novel synthesis of macro technology that is practical and expressive. A key element is the use of multiple binding spaces for context-specific sublanguages. For example, expressions and pattern-matching forms can use the same operators with different meanings and without creating conflicts. Context-sensitive bindings, in turn, facilitate a language design that reduces the notational distance between the core language and macro facilities. For example, repetitions can be defined and used in binding and expression contexts generally, which enables a smoother transition from programming to metaprogramming. Finally, since handling static information (such as types) is also a necessary part of growing macros beyond Lisp, Rhombus includes support in its expansion protocol for communicating static information among bindings and expressions. The Rhombus implementation demonstrates that all of these pieces can work together in a coherent and user-friendly language.
You are being down-voted for the snark, but there is a point there. This seems to be "people use Python because of the whitespace, so let's remove parentheses." Which, if true, (1) vastly misses the point, and (2) means this project isn't really exploring any fundamentally new areas of language design.
I was kinda hoping to see this was the next generation of Lisp language design, but it appears to be just a different syntax.
I am not sure which Lisp you are thinking of, but compared to a "standard" Lisp you might be interested in how the module system interacts with the macro system.
In particular the concept of a tower of phases.
I'll have to read about why they made the choices they made, but it reminds me of going to a microbrewery: you tell them you like simple (American) macrobrews, and they say they have just the thing. What arrives is a light lager that tastes awful because they think macrobrew beers are awful. So they made an awful beer that doesn't taste like a macrobrew.
I don't think Rhombus is going to appeal to people that are comfortable with Python, C, Java, Go, JavaScript, etc... It's got infix, but it's cryptic. It seems like a step back from StandardML or Ocaml.
Maybe there's a justification in terms of how the macros work. Maybe they're poisoning the well to make Lisp syntax more appealing. I'll try to reserve judgment, but so far I'd rather program in Scheme/Racket than this.
I don't think the Racket/Rhombus developers are really trying for adoption. They're trying to push the field forward via their research. Creating these useful programming languages is how they validate their research, but the end goal is not to grab a large share of working developers but to grab mindshare of the few developers who create the future of programming. In this they have been quite successful.
I agree they probably aren't going for world domination, but here is a nice video from Mathew Flatt where he says, "The point of Rhombus is to make Racket macro technology more accessible" and "removing an obstacle for most people":
Having traditional infix operators, function calls, array subscript, and field access is a great start, but after that it doesn't look very traditional or familiar at all.
I hope they succeed, and I'll keep following their progress, but when I look at their if-statements, it's pretty non-traditional:
if is_rotten(apple)
| get_another()
| take_bite()
  be_happy()
It looks like it has indentation-based grouping with vertical bars, white space, and colons implying different precedence, and that makes me have a lot more questions. I think my American macrobrew analogy above holds.
(I'm not very involved with the Rhombus project. I just post there a comment from time to time.)
I think that the macro system can handle the traditional "if X ... else ..." structure, but there is a design decision to avoid magic keywords like "else" as much as possible, like "in" in "fox X in Y: ..." Why should each construction have a different magic keyword? Can it be solved with only one thing that is shared in all construction like "|"? If you see the examples, all the other constructions with branches like "cond" that replaces "if" with "elseif" or "elif" ir whatever use "|". "match" also uses "|". This is better if you want to write macros that extend the language and have a similar structure.
(For what it's worth, I insisted that "if" use two "|"s, one for each branch, instead of only one for the "else" part.)
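For instance, a small sketch (the stock fib example, not code from this thread) of how cond and match share the same "|" shape:

fun fib_cond(n):
  cond
  | n == 0: 1
  | n == 1: 1
  | ~else: fib_cond(n - 1) + fib_cond(n - 2)

// same branching shape, just with patterns instead of tests
fun fib_match(n):
  match n
  | 0: 1
  | 1: 1
  | ~else: fib_match(n - 1) + fib_match(n - 2)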
Yeah, the if statements are definitely a little weird-looking to me. It seems like the goal is to allow for more traditional infix syntax while still defining most of the language using the macro system; it seems unfortunate if their macro system can't handle more traditional-looking "if (x) block; else block" conditional statements.
As someone who gets the shakes from looking at too many parentheses, I think it's a great step in the right direction.
A few weeks ago I went through the "GUI demo" source, and it's not bad. Of course, just reading it doesn't tell you much about the IDE support, or how easy it is to figure out the arguments/types. (But it's rhombus/static, which is encouraging!)
A lot of folks (including me) find Python limiting for non-trivial use cases. The One Python Way was a great selling point in 2004, but the thing is, it's still basically the same Way in 2024. So, I hope you like nerfed lambdas and inheritance
I totally agree but always find myself in a very small minority when expressing this. People love Python and I don’t understand it.
For me the issue is that one cannot write in a “light functional programming” style in Python. The language lacks many simple things, chief among them multi-line lambdas! In 2024 I’m very surprised that people aren’t clamouring for this.
Ignoring the social benefits (which is easily the biggest draw of the language), Python feels really optimal for quick, simple tasks. The language lends itself to not overthinking it and building simple solutions.
A lot of folks (including me) find Python limiting for non-trivial use cases. The One Python Way was a great selling point in 2004, but the thing is, it's still basically the same Way in 2024, and the fact that the language has been designed so exhaustively to ensure that there is only one correct way to do stuff, there has never been room for evolution. (Insufficient entropy!)
I'm going to compare it to JavaScript/TypeScript, because it's what I know best in 2024, and because it's an engaging contrast; yet the take-home message is also applicable to other languages, such as Rhombus (which looks cool!)
Python feels timeless to me, like Roman majuscules. It was, in its day, brilliant: cleaner than Java, saner than Perl, and just so, so hackable. The strong Pythonic cultural rejection of Perl's 'more than one way to do it' dictum was powerfully clarificatory; we didn't have StackOverflow, and the _good_ technical resources were still all in physical books, so being able to learn one pretty-good way of expressing a concept or pattern was magical.
But, like Roman majuscules, Python didn't evolve, because it didn't have to. The marginal cost of change threatened the original value proposition, so it just didn't really ever happen.
By contrast, while e.g. JavaScript had to evolve, because it was gobsmackingly bad, the necessity of that evolution has made JavaScript (as a community and language) open to variation, change, competing paradigms, and imports from academe and research. Evolution loves a mess.
TypeScript, for example, happened nearly overnight, and as a result of it and other innovations, I can spend my day working blissfully in algebraic types, doing frigging set theory to do precise type hinting, and passing around pure functions and immutable structures. My code runs everywhere, and my coding style is always changing (improving!), and the only real price I've had to pay is learning some regrettable nonsense about an extra '=' in my comparison operators, and maybe the idiocy of having both `undefined` and `null` types.
Whereas, when I peep the pythonista channels at my work, I notice they are still having essentially the same conversation about eliminating the GIL that I remember them having in 2007 (yes I am old.)
Which is not to say that Python is _bad_, per se; there are obvious advantages to having an imperfect but opinionated lingua franca, and I'd sure rather be thrown into an unfamiliar Python codebase from 10 years ago than an unfamiliar JavaScript codebase of an equivalent age.
Yet I'll warrant that Python's long summer of success, combined with its one-way-to-do-it culture, has closed the mind and the imagination, and will eventually make it less fit for purpose than its competition. It will remain in use, and it will even find new contexts (machine learning, say), but 'the code part of the codebase' will be done in other languages.
I suspect Python will, thanks to its exceptional readability and regularity, become a configuration language --- a part of the UI, essentially, a sort of shell. It will also continue to be a language used to teach programming. Hanging on here and there, sort of like how Latin hangs around in biology and medicine. But legacy Python codebases, thanks to that very readability, will probably be rewritten sooner rather than later.
Standards (Latin, Python) are _useful_, and _timeless_ standards are some of the most valuable artifacts humans have ever produced.
Hm, I don't think the "one way of doing things" koan holds up in practice, other than being a nice narrative.
Nearly all other languages strive for doing things in one way; it is not something that makes Python unique. In fact, Python typically offers a complete mess of ways in which to solve something. Classes are sometimes good, sometimes they're not. Lists or Numpy arrays or Torch tensors: the choice depends mostly on performance, not on style.
And Python is evolving. There is optional type checking for instance.
Just so. Code written in Python has always had the virtue of being _incredibly boring_, which is a virtue that, at the time of its inception, was criminally undervalued; this was, after all, the heyday of C++, and if you weren't bringing operator overloading and multiple inheritance and generics to the table, the hipsters sniffed.
For example, no one complained that Python had multiple inheritance; instead, we thought this was _a point in its favour_, over and against Java. (I imagine Guido added it grudgingly, as a concession to the vox populi.)
Thus, the Pythonic mindset emerged as a sort of 'refusal of the call', sort of like Indiana Jones shooting the sword guy (https://www.youtube.com/watch?v=kQKrmDLvijo). You could be against the hermetic complexity of Perl, but do it better than Java! Neat!
These days, however, I suspect that Python, while still boring, is boring in the _wrong_ way, leaving opportunities for concision, clarity and performance on the table -- now-basic stuff like immutable datatypes, monads, tail recursion, concatenative programming, and so on.
As a longtime python programmer I disagree. Python is so stuck in its old ways that several useful and interesting PEPs just get rejected. In 2024, the thought of not having macros, a decent version manager, not being able to modify running code despite being an interpreted language, not having multi line lambdas, not having several of core language features in lambdas (e.g., no try/except), the pain of creating thunks, the overhead of closures, not being able to read module files easily (if they’re in another directory), etc. make Python one of the most frustrating languages.
Python arguably has better support for working with sum-types (algebraic types) than TypeScript does, because the language actually has a `match` statement (since 2021). Define a sum-type as a Union of dataclasses and the static type-checker (Pyright) can even tell you when your pattern-matching statements are non-exhaustive.
Do you track such developments, or spend the time dreaming up these elaborate theories? :)
Not particularly (point taken), but I _do_ note that describing a napkin-sketch as an 'elaborate theory' is perhaps more flattering than you mean it to be ;)
I can't tell what you mean concretely, because the two examples you give -- gradual typing and lack of concurrently executing threads -- are common to both languages. Python has support for gradual typing, and JavaScript is single-threaded (which is an even stronger property than having a GIL).
I don't think Rhombus is Python-style. And this won't convert anyone anyway; it's experimenting with new language constructs and such, which is what Racket is used for by most who use it. Many years from now, the lessons learned from these experiments might end up in a new language that gets converts and even rivals Python/JS (I hope so; I find both terrible).
Best-in-class macros! Rhombus makes metaprogramming safe and fun.
Rhombus uses RRB trees for its native list structure: immutable data type with O(log32(n)) random access and functional update. Really amazing data structure.
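A small sketch of what that means in practice: "update" operations return a new list and leave the original untouched, and indexing is cheap even for large lists.

def lst = [1, 2, 3]
lst[1]          // 2 (random access)
lst ++ [4, 5]   // [1, 2, 3, 4, 5], a new list
lst             // still [1, 2, 3]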
Also, the Rhombus compiler isn’t the quickest right now (still in development), but once compiled, code generally runs much faster than Python. (Though that’s not hard.)
Here’s my compiler: https://codeberg.org/ashton314/rhombus-compiler
Here’s the library: https://github.com/ashton314/rhombus_dyn
Motivation for building the library: https://lambdaland.org/posts/2024-07-15_type_tailoring/