Hacker News new | past | comments | ask | show | jobs | submit login

I like the flow direction compared to standard SQL. SQL is supposed to read like a sentence I suppose but I have many times looked at it and really wanted things to be in a more logical order.

My main suggestion would be to be a bit less terse and introduce a bit more firm formatting. I'm not a huge fan of the term "split" and feel like jazzing that up to "split over" or even just reviving "group by" would improve readability. Additionally the aliasing could use work, I'd suggest reversing the assignment to be something closer to `use salary + payroll_tax as gross salary`. In terms of firm formatting, unless I'm missing something there isn't any reason to allow a filter statement before any aliases - so you can force two fixed positions for filter clauses which would make it always legal to reference aliases in filters.

On the brief topic of form vs. flexibility. SQL is a thing that, when complex, is written by many people over the course of its lifetime - removing the ability to make bad decisions is better than enabling the ability to write simple things even simpler - those silly do nothing queries like "SELECT * FROM customer WHERE deleted='f'` are written once[1] in a moments time and never inspected again. The complex queries are what you want to optimize for.

1. If they even are - with ORMs available a lot of those dead simple queries just end up being done through an ORM.




> On the brief topic of form vs. flexibility. SQL is a thing that, when complex, is written by many people over the course of its lifetime - removing the ability to make bad decisions is better than enabling the ability to write simple things even simpler

Hallelujah! But, to your footnote, this is a major reason why I despise ORMs. In my mind they make writing simple code slightly easier, but they make complicated SQL statements, especially when you get some weird issue under load and you're trying to debug why your DB is falling over, a ton more difficult and you spend so much time just battling your ORM.


On ORMs, the best use I see of them is for “transparent” queries that you don’t define.

Like fetching a record by id, or a single record and all of its related properties. Or a list of all the record in a table matching a simple filter.

That’s 98% of what we do against the DB, and I’m all for having it basically invisible.

Then let’s just bypass the ORM altogether the minute we think about joining or grouping things together. There are libs in most language that help just sanitize queries, so it’s no difficult really.


With a middle ground like a micro ORM those transparent queries are barely visible anyway, literally a line or two lines of embededed sql strings. Especially micro ORMs that can handle dynamic filters. They're generally write once and only get looked at again when modifications are necessary, so they're not worth "optimizing" by adding the complexity of an ORM.

A common pattern seems to be over engineering these simple scenarios though. Someone decides that embedded sql is evil and needs to be extracted out of normal code, often to stored procs. Then these simple queries have enough friction that an ORM starts to look good, then you end up with an ORM generating simple queries dynamically in the same place that used to have a simple embedded string.


The fundamental problem with an ORM is that you're using a lower level language to compile to a higher level language. This is completely backwards. It's like having a framework in your assembly to generate Java code for you, so you don't have to bother with all that "weird" Java, and can just stay in your comfort zone.


Isn’t it more important that the query you write with the ORM is readable than the underlying SQL it spits out? Using an ORM I can get reusable parts of a query, while writing complex joins, I’m not sure why skipping that part is good?


In my experience, a lot of very semantically reasonable and readable code end up with very penalizing SQL at the end, and it's a real challenge to then rewrite the whole into decent queries.

There can be part of an app where a very bad query here and there is not important, but more often than not it creeps up in key parts of the user experience, and it becomes very hard to untangle when it becomes something important enough to thoroughly optimize, but also complex enough that the existent tests only cover a tiny portion of the important use cases (if you're reusing a bunch of query bits, you're probably working with a wide combination of input/outputs). I've seen literally weeks spent on trying to optimize ORM chained subqueries.


My experience is that the ORMs I've used most (LINQ and Ruby's Sequel) can produce far more efficient SQL than a human can, and if not, you change the code, just as you would have to if you wrote a slow SQL query.

Of course, I've not seen every query in existence so it's more than possible you've seen bad SQL from an ORM, but the untangling part would again fall to those skilled in the language of the ORM - unless the ORM can't produce efficient SQL in a particular case. And just as it would if the query was originally written in SQL, you'd need someone skilled in SQL to untangle that.

What would that case (where an ORM cannot produce efficient SQL) look like?


I have some serious doubts that an ORM can tune queries as well as a human due to the fact that the ORM lacks one key piece of information that both me (and the database planning a query) can leverage - table statistics. An ORM can produce a query that will behave well in the best general circumstances, but as soon as you get into topics like subquery performance fencing ORMs simply have no ability to compute optimality on the fly. One specific example I've seen is where multiple paths exist through the database to transit from one fact to another with one path being more strictly optimal and the other path having more associations that may be needed - this can effect the join strategies you want to use so if your ORM is anything more than "I'll essentially tell you the SQL but in a weird syntax" then there's a good chance it'll chose the wrong path.


But you write the code via the ORM, so I don't see the difference between you writing the SQL with knowledge of the table statistics and you writing the code via the ORM with knowledge of the table statistics.


I haven't got the chance yo try Sequel, on the ruby side I played more with plain ActiveRecord and querying layers like ransack (my predecessor on the job loved abstraction layers)

In general ORM queries become ugly at three to four levels of joins and/or excluding under non trivial conditions (e.g. finding users that have not participated to a specific set of events). They will spit out something that works, but will take a few orders of magnitude more than an optimized query.

As you say, there is the option to play jenga with the ORM code to hit the right combination that produces a better output. But that feels like teaching a toddler to solve a puzzle that you already solved and are keeping the cheat sheet in your pocket. I personally don't see the beauty of it and would prefer to directly use the right SQL and call it a day.

On people skilled in SQL, you should have a few onboard anyway if you're doing more than basic CRUD on the DB, and it's easier to find than ORM gurus IMHO.


> I personally don't see the beauty of it and would prefer to directly use the right SQL and call it a day.

I think that's fair enough, there are enough ways to do things now that it should be possible to accommodate both.

> On people skilled in SQL, you should have a few onboard anyway if you're doing more than basic CRUD on the DB, and it's easier to find than ORM gurus IMHO.

I agree but I'm not sure there are more SQL gurus than those used to ORMs nowadays. Lately I've favoured using SQL but even 15 years ago most devs I knew couldn't use it well, I can't see devs used to Rails et al having the chops for it, sadly. What was once convenient easily becomes one's master.

> I haven't got the chance yo try Sequel

If you get the chance, I think it's worth it. It's easy to drop into plain SQL without dumping the ORM, and I've never had a problem with the stuff it generates. It's a pity ActiveRecord gets all the love instead.


Im not so sure it's always best to optimize for absolute performance, how you should code a solution to a specific problem is always dependant on it's context IMO.

I work on a lot of smaller IT projects for SME's, internal tools and platforms that are made on a small budget and, thus, end up having a tight deadline in order to not go over budget.

The vast majority of these projects are versions of CRUD apps for this and that, we tend to value readable code over performant hard-to-read code as it makes code review faster.


I think it’s all nice and fine if performance has no impact (which can be the case). For instance if your client can wait 30s for a report, no big deal.

Things go down when a query that used to take 4s now takes 30s as the product has taken off, handles 10x more data, and it’s not one user but a few hundreds having their request queued.

You wont have the luxury to not re-write that part in (probably) hard to read performant code. It can be a huge enough effort to blow away your deadlines and budget and sour pretty hard the relationship if your team struggles on something they aren’t used to do at all.


In my experience neither the output or the input is readable when using an ORM to generate complex queries. For simple queries they're great. Writing raw SQL also makes it much easier to jump out into SQL-specific tooling to debug a query, then copy it straight back into the code once you're done.


  > Isn’t it more important that the query you write with the ORM is readable than the underlying SQL it spits out?
I would look at the issue from a slightly different angle. Performance issues aside, I personally prefer either the ORM or the SQL depending on which is easier for the guy maintaining it to understand. Getting a row from the database and transforming it into an object? ORM. Generating a report on historical data across half a dozen tables? SQL.


I like the flow direction specifically for intellisense/autocomplete. I'm sure it would be easier to provide hints when the table name is known immediately.


This is exactly why LINQ uses a similar ordering.


I'd love for the next release of SQL to have optional alternative ordering of clauses


This is great feedback, and I agree with you re de-prioritizing terseness.

And I agree with you on both the assignments and `split` being a bit awkward. Kusto just uses `by`, WDYT?


Not the original commenter, but just using `by` makes total sense to me.



FWIW the separate `group_by()` is one of my greatest design regrets with dplyr — I wish I had made `by` a parameter of `summarise()`, `mutate()`, `filter()` etc.


Thanks for commenting! I hope you can see the influence of dplyr & tidyverse here. Please let me know if you have any other feedback!


Is it too deeply entrenched to change? The number of times I have had a data.frame grouped when it wasn't supposed to be, I can count on my fingers. But the hours that I spent trying to figure it out must amount to a paycheck or two.


There's a lot of dplyr code out there, and a lot of people who know most every part of the tidyverse by heart, making breaking changes like this so far into a frameworks life would cause a lot of unnecessary work in re-coding old code as well as requring people to re-learn syntax.

IMO for such a small adjustment the benefits don't outweight the costs.


Yeah, that's my current position. It's possible that we might be able to add it in optionally (by adding a new `.by` argument to summarise and friends), but just the analysis to determine how it would affect existing code is a lot of work.


<3


Group by seems good enough and not changing terms where there isn’t good reason seems like a good goal. Is PRQL’s split the same as SQL’s group by?


Yes Kusto's `by` is excellent!


By actually sounds great to me to, yea. In this case it's short but it's extremely communicative!




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: