Contrary to many other "expose a RDBMS schema as an API" solutions, this one is interesting due to its very close tie-in with postgres. It even uses postgres users for authorization and it relies on the postgres stats collector for caching headers.
I also very much liked the idea of using `Range` headers for pagination (which should be out-of-band but rarely is).
I'm not convinced that this is the future of web development, but it's a nice refreshing view that contains a few very practical ideas.
Even if you don't care about this at all, spend the 12 minutes to watch the introductory presentation.
Coming from the old world of business IT where "integrate everything with the relational database" is standard procedure, it's certainly not unusual for the database to be a place where abstractions are defined(1), so what you expose via PostgREST may not in fact be a close match for the underlying data model.
(1 - In fact with me it's pretty much a matter of policy: any external system should access data via views named for the external system. Then, as inconvenient as it sometimes is, Postgres' dependency mechanism will keep me right about which fields in which tables are depended upon by which external systems.)
Agreed, even though I don't care for exposing the database quite this directly, it is a very interesting approach...
I'm also unsure of using the DB's authentication system. While approachable, I'm at a place where imho, for most systems there is a conflation of users, accounts, and logins. IMHO, a user may only have a single account, but a modern system should support multiple logins... Supporting social/jwt/oauth style logins in addition to local database backed logins is generally the best option for public facing sites/applications.. and even then jwt/oauth will allow for more seamless integration with internal SSO options.
Then again, this type of approach may work well if you want to be able to use it as an additional abstraction of your database access from a UI backing API.
> Agreed, even though I don't care for exposing the database quite this directly, it is a very interesting approach...
It is up to the developer to use this responsibly. I don't think the creator is suggesting to drop PostgreSQL instances directly onto the net. This would be ideal in a layered backend, and in fact, makes systems way more composable.
This is good work and if I ever did web development, it would be like this. Why people in the web world don't use stored procedures and constraints is a mystery to me. That this approach is seen as novel is in itself fascinating.
It's like all those web framework inventors didn't read past chapter 2 of their database manuals. So they wrote a whole pile of code that forces you to add semantics in another language elsewhere in your code in a language that makes impedance stark. PostgreSQL is advanced technology. Whatever you might consider doing in your CRUD software, PostgreSQL has a neat solution. You can extend SQL, add new types, use PL/SQL in a bunch of different languages, background workers, triggers, constraints, permissions. Obviously there are limits but you don't reinvent web servers because Apache doesn't transcode video on the fly. Well, you do if you're whoever makes Rubby on Rails.
The argument that you don't want to write any code that locks you to a database is some stunning lack of awareness, as you decide to lock yourself into the tsunami of unpredictability that is web frameworks to ward off the evil of being locked into a 20 year database product built on some pretty sound theoretical foundations.
Web developers really took the whole "let's make more work for ourselves" idea and ran with it all the way to the bank.
You'd have to pay me a million dollars a year to do web development.
You are speaking from ignorance with the voice of authority.
I worked on a rails app that handled a billion requests per day. The problem isn't performance of the web framework, those are easy to load balance and split into C or cache when you need it. The problem is scaling your database, keeping your data secure, and iterating to meet business goals with a growing codebase and infrastructure. A mess of stored procedures would restrain you from doing all three.
And I know, I worked on a codebase in 1999 that did this because of the "performance gains". It ended up bricking the project due to inability to iterate.
> The problem is scaling your database, keeping your data secure, and iterating to meet business goals with a growing codebase and infrastructure. A mess of stored procedures would restrain you from doing all three.
Your argument has a non sequitur right here. A mess of [foo] is a mess; the layer it is in does not matter; the language it is in does not matter. A mess of application layer code is equally effective in preventing scale, security and effectiveness.
The original post is right. Web developers treat their databases poorly[1]. A database is an interface to your data that maintains integrity. Maintaining integrity almost always means stored procedures, as some validation is not expressible as relational integrity and basic type validation.
Now, if you are at the point where your database fully guarantees integrity of data going in and coming out, a REST interface is a small step away. This project is very welcome.
[1] The typical web developer treats a database as a data store. It is also a data store, but a well designed database is much more than than.
A mess is a mess, true, but some are easier to clean up than others.
GP is correct. Methods for scaling/optimizing the application layer are clear and well-known. Scaling the data layer is a huge challenge. This is why the market is filled with snake oil databases promising linear scalability and perfect consistency/reliability, etc.
Scaling the data layer is a huge challenge. No doubt. But calling databases that are designed for solving these problems "snake oil" undermines the huge amount of work that serious engineers have invested in this. No one has ever promised linear scalability and perfect consistency/reliability. No one.
Cassandra, HBase, CouchDB etc even MongoDB have built in scalability as a first order priority from day one and have been largely successful at it e.g. iCloud, EA Online, PSN. Databases like this are a nightmare to work with for smaller datasets but work incredibly well with larger ones.
It's always a shame to see HN act like you scale vertically and magically every problem is solved.
> It's always a shame to see HN act like you scale vertically and magically every problem is solved.
When this is seen (and IME it's a pretty minority opinion) I think it's there as a reaction to the massive overuse and hype regarding a lot of newer-gen DBs. There's absolutely no doubt that there are good uses for them, but those cases are pretty niche compared to the level of their uptake.
MongoDB heavily implied the linear scalability, consistency, reliability bit in its early material [particularly in their marketing].
Its only in the past couple years they really started mentioning the fact it was "tunable consistency" blatantly rather rather than burying it in a couple places in the manual.
I purposely chose a non sequitur in the interests of speeding up the prose. It is an acceptable method frequently utilized in language. It roughly translates to:
"Than (what will certainly be given the expressiveness and level of abstraction they provide) a mess of stored procedures."
"The problem is scaling your database, keeping your data secure, and iterating to meet business goals with a growing codebase and infrastructure. A mess of stored procedures would restrain you from doing all three."
I'm always a little confused that people seem desperate to use the wrong tool, and then blame the tool. If you need to store normalized data and maintain integrety -- you'll end up with the equivalent of an SQL datastore (or, more likely a system that is faster, but subtly broken).
Sure, it's difficult to scale ACID. But if what you need is a way to serialize objects, you'll probably be better off with something like Gemstone/GLASS, a document store or some other kind of object database?
If your problem domain actually fits working with structured data, then using an SQL system makes a lot of sense. The obvious example for "web scale" here is Stackoverflow. Sure their architecture has grown a little since it was 2xIIS+2xSQL Server -- but they got pretty far on just that.
The bigger issue is this idea that everything needs to live in one place. For the bulk of an application handling a billion requests / day I'd wager that most of that traffic is isolated to certain types of data.
I'd wager that because in almost every case I've ever seen it's true. You just don't tend to see every table in a normalized dataset bearing the traffic load.
If that is the case, rolling that particular piece of data out to a more easily scalable store will largely fix the problem, if caching, async writes and buffered writes didn't already.
Everything else can very easily sit in PostgreSQL, avoid race conditions, maintain data integrity, have permissions controlled and be accessed from multiple languages directly without requiring an API layer. Then you can use a foreign data wrapper to let PG query that other data source (mongo, couchbase, redis, whatever) and join the results with the other data in the database just like it's all one bit happy dataset.
As another poster said, a mess is a mess and honestly I don't know why he takes a shot at Rails since Rails has some of the best first class support for leveraging PostgreSQL features these days.
There is only one database for everything in the business? Of course it doesn't scale. The problem you describe stems from solving every business request by adding yet another table to 'the' database.
It's a monolithic solution. It doesn't matter if you use database features or not. There is no difference between a mess of stored procedures and a mess of business logic classes. It's still a mess.
Web servers usually scale better than (traditional) databases, so it makes sense to not offload computation to the database, even if it means that there's an overhead.
That's very situational. Read scaling a database is easy. Write scaling a database is harder and doing computational logic while write scaling a database is harder still. Computational is still a very broad word though and the intensity of those computations is a huge defining factor.
The problem boils down to the "the database" idea described earlier. There are very, very few normalized datasets that I've ever seen that have write scaling concerns on more than 1 or two tables.
Move those to a separate datastore that is built for it and you've largely solved your problem. Postgres can even connect to outside datastores to run queries against them for sake of reporting.
Once you get rid of your N+1s the bottleneck in my experience (working with Rails now since v1.2) has always be Rails / Ruby itself. It is so incredibly slow, even using just Metal (even Sinatra for that matter). The slowdown at the view level is significant.
I always have a caching strategy (usually varnish in front of nginx) with Rails unless it's literally only supporting a handful of users, and anytime I need to support non-cacheable hits like writes to more than 50 or so concurrents I consider swapping in Node or Go or something reasonably performant to handle just those writes.
Lately I've been looking into Elixir as a Rails alternative for APIs for performance and scalability. I am very intrigued by a PostgreSQL based REST API.
The point is that when Rails gets too slow it is very easy to switch to something like cacheing or C (Or Go, or whatever). Even if you just split it off at nginx or use a worker pool in a faster language. Or if you need lots of concurrency use Go. Or even replace the Ruby code with one fairly nasty SQL statement or a single stored procedure.
The other 95% of your code can be slow Rails. You know those pages where a user adds another email address, or where they report a comment as being hateful, or where they select what language they want the app to be in, or where you have your teams page, or your business partners page, or your API docs and key registration / invalidation.
The database doesn't scale without pain though. You have joins, you're going to need to get rid of them. You have one table on a machine, you're going to need to split it. You have speedy reliable writes, you are going to have to either make due with inconsistency and possibly have a whole strategy to clean up the data after the fact or lose the speediness.
I'm intrigued about shuffling the serialization of JSON to Postgres, but that is different than what the OP was talking about.
By the same logic though, at the point that heavy write load becomes a reality it's just as feasible to move the heavy write table to an isolated datastore and leave 95% of your data (structurally) in the PG. Even use a PG foreign data wrapper to connect to that new datastore to allow PG to continue any necessary queries against it.
I'm not ever going to argue for heavy stored procedure usage but there are definitely times when it makes sense and more times still when using the features in your database instead of setting up multiple different standalone systems for pubsub, search, json data, etc when your database can do it all makes sense.
It's very similar to the "you can always switch the slow parts" point with Rails to move a part to Go. You can do it all in PostgreSQL and then when you actually reach a point where you've grown it into a bottleneck, move it out.
Postgres isn't SQL Server and it isn't Oracle and it isn't MySQL. It's Postgres. It's a tool that you choose because of it fits your needs, not because somebody told you it was a good database. You choose it as part of your stack. If you are using PostgreSQL because you wanted a dumb datastore then you chose the wrong database and should probably reavaluate your options. That's like getting a Lamborghini to make grocery runs.
I am a postgresql novice, but I've used the JSON serialization and it is indeed fast. But, here's my question:
When you do a 1-to-many join and return the same fields very many times, do the binary drivers optimize that or is it return many times? With JSON serialization (or serializing to arrays), you only get the one row.
To say that Facebook uses PHP and MySQL is to leave out the truth, honestly. They are a part of the stack, yes, but they aren't what makes the application scale to billions of requests. It would be like saying the local coffee shops website using Wordpress with a MySQL backend is using the same tech as Facebook. It's laughable.
To say that PHP/MySQL is just a "part of Facebook's stack" is laughable.
They are the core components of Facebook. Normal people understand that the characteristics of Facebook's architecture is unique to just Facebook. They can get away with sharding/colocating data that nobody else can. The rest of us have a tonne of integrated data that requires complex joins (whether at the application or database layer).
They are edge components of Facebook. Just from a brief interaction with FB recruiters, I learned they use a lot of Vertica in their back-office. Please don't propose that they are using MySQL for their main business when it's only powering app nodes which are just POPs fed by their real (internal) services. Approximately speaking.
I never mentioned performance or scaling as reasons for using a database's features- though they might be worth considering. The fact I never said those words and it fired you up says more about your experience than mine I imagine.
We use stored procedures. Not by choice; this is a legacy we're stuck with. It is nothing but an unadulterated disaster of a technology regardless of what you use it for. I'm talking 45,000 stored procedures here. 2000 tables. TiB of data. 50,000 requests/second across SOAP/web/desktop etc. It's hell.
Problems with stored procedures:
1) Performance. More code running in the hard to scale-out black box. You're just hanging yourself with hardware and/or license costs in the long run.
2) Maintenance. All database objects are stateful i.e. they have to be loaded to work. The sheer complexity of managing that on top of the table state results in massive cost. Add to that tooling, version control costs as well. Have you tried merging a stored procedure in a language which has no compiler and very loose verification?
3) Orthoganality. Nothing inside the relational model matches the logical concepts in your application. Think of transactions, domain modelling etc.
4) Duplication. You still have to write something in the application to talk to every single one of those stored procedures and map it to and from parameters and back to collections.
5) Transaction scoping. Do you know how expensive it is to introduce distributed transactions? Well it's a ton more cash when EVERYTHING is inside that black box.
6) Lock in. Your stored procedures aren't portable. Good luck trying to shift vendor in the future when you hit a brick wall.
Now I know it's popular to bash on Rails and I wouldn't use it personally but there are people using the same model on top of other platforms, like us.
Sorry but databases are just a hole to put your shit in when you want it out of memory. If you start investing too much in all of the specific functionality you're hanging yourself.
Sorry but databases are just a hole to put your shit in when you want it out of memory.
You've got to be joking, right?
Data is an enterprise's single biggest asset. A robust, consistent and performant store is vital. SPs can be written as garbage like any other logic, but in the right hands they are a perfectly valid tool for providing useful access to complex data.
Interestingly some very enterprise products don't even try and use database features like foreign key relationships - it can be a bit of a shock to open a database with many thousands of tables and realise that there is no obvious way to work out how they relate without looking at application level structures.
I'm sorry to hear your anecdote. I would have to come see your particular situation to see exactly what you mean by 1, 2, 5, because those don't seem unsolvable, but in general:
> 3) Orthoganality
You've introduced that by treating a relational store as a "hole to put your shit in". It's not fair to blame the database for that.
> 4) Duplication
Not with the project that is the topic of this thread you don't.
> 6) Lock in
As I mentioned in my original comment, you can be locked to your database or you can be locked to your ORM/DAO/ActiveRecord/DB client library or whatever it is you're using.
Not using database features isn't the key to heaven anymore than using them is the key to hell. I just meant to point out that in my experience they are underused massively.
Certainly not an anedote. I was an Oracle and SQL Server DBA for a number of years amongst other hats on stupid big datasets and loads. Add to that 25 years' experience getting companies out of deep trouble that everyone else has given up on. I know my shit.
Orthogonality: I haven't introduced anything here. Very rarely does any conceptual model of reality fit into the relational model. It's more imperative than that. Everything is usually crudely shoehorned into it because it's a compromise that people are barely willing to make or because they don't understand how to model a system properly.
Duplication: there is duplication in there. The versioning is very inadequate and API stability is the key to success on this. Plus also, this is a minor part of the application to consider. It's no different to issuing SQL. The protocol is different, that is all.
Lock in: There is no ORM lock-in past the platform. If you isolate everything properly i.e. use command-query-separation then this is a non issue. It's trivial to replace the ORM. You can even do it piecemeal. We've done it. I yanked out bastardisd ADO and EF out and stuck Nhibernate in. If you couple all your logic into the database, no banana. That luxury goes out of the window.
They may be underused, but when your vendor pulls a 26% price hike on half a million quid's worth of kit, can you afford to bend over and take it?
It's a tradeoff, but not one I'm willing to make on medium to large scale systems where there is a capital risk.
Interesting, though as this is a way to interface with Postgres I'm not sure your warnings about licensing costs are useful. Still, there are the other reasons you mentioned.
Some organisations won't allow use of a product unless there's a support option. I can understand this after a complete system failure a few years back on an open source unsupported product. I was hired to fix it! :) Mostly though in comes EnterpriseDB and support costs then. Problem is always staff availability here in the UK though so we always end up with SQL Server and Orscle bodies.
How can you compare being locked into a DB versus locked into an ORM ?
One of the main features of an ORM has been abstraction from the intrinsic properties of that database. ORM was a concept that was popularised by the original Obj-C/EOF/WebObjects back in the day which supported retrieving data from any database you pointed it at. And it fully supported you enhancing it's access layer with database specific features.
This is a much longer topic but it boils down to 2 things:
1. Switching your database is not easy with or without stored procedures because it will involve down time for the application while the data is migrated, then verifying that it works as expected in the new database with that ORM. You hope for the best, but it's always more complicated to switch a database.
2. The ORM tends to lock you into the application stack. Switching a part of your application from something like Rails to Go when you need to performance tune is significantly easier and more common than switching the entire database backing the whole system.
Beyond those two are the harsh realities of working with large datasets. As soon as a dataset it non-trivially small relying on the application to do core work on it becomes self destructive by adding network latency and in many cases object creation (check some Rails benchmarks on object creation costs). It becomes a big deal.
This is not to say that doing the bulk of work in the ORM is bad or that everything should be done in the database, it's a matter of balance. The only dangerous opinions on the matter are the "purist to the detriment of all else."
Verifying uniqueness, exclusion and maintaining data integrity should be the job of the database in most cases. That is what it's good at. Performing actual business logic on that data should not unless there is a significant performance based reason for it in most cases.
In Postgres the "stored procedure" thing is a little bit different because they're significantly more valuable thanks to the volume of functionality built into PG. Everything is basically a function in PG.
In PG, you can use functions to create indexes and when the function is used in a where clause that index will be used. You can use functions to create constraints, unique indexes and even notify outside process that are listening of changes in the database with pubsub.
PG is a heck of a lot more than just a "datastore" and that's why these discussions are important. If you want a generic dumb datastore...there are databases built for that. PG is built for a whole lot more than that.
1) Makes no sense whatsoever; stored procedures are typically going to be much faster than the equivalent mix of application-level code written in a scripting language that needs to communicate with a database and is likely vastly slower than PL/pgSQL or PL/SQL. The hardware licensing costs having nothing to do with stored procedures and apply just as much to anything else (there's free as an beer options and costly proprietary ones in either case).
2) This is not a reasonable objection, I could replace "loaded" with "compiled" and your non-argument would make just as much sense. The alternative does not make the "complexity" go away, it just distributes across multiple languages in your application and database.
3) No.
4) Another non-argument against stored procedures. For example, suppose I have a table "time_series(series_id INT, tstamp TIMESTAMP, val NUMERIC)". A common need would be to accumulate all points (tstamp, val) associated with a series_id. Following your logic, you either end up with tons of con the application side sending similar variations of a query that looks like "SELECT tstamp, val FROM time_series WHERE series_id = $x ORDER BY tstamp" or you create one application-level module that acts as an abstraction around a query like that. In the first case, you're doing massive duplication. In the second case, you've essentially made a stored procedure that is distributed across your database and application and all the issues you raised of having to write something to talk to it apply just the same.
5) Again, no. DBs are precisely to place to deal with issues like this as they have means for dealing with things like foreign tables. The application-level alternative just means re-inventing it all yourself and you're probably going to make a lot more mistakes and write a lot more code that way.
6) Non-argument (applies just as well to Rails, Python, Linux, etc.)
You had me until your last sentence. Database are very important to (and very good at) store your data. And data is important (duh). All the issues you described above are related to working with the data, which should not happen in the database.
That's a poor argument because we're not discussing the arbitraty boundary of storing versus working with data as the lines between those are very blurred. This is even more the case when you use stored procedures which work with data close to the storage.
While we're on this subject, RDBMS are no better at storing data than any other technology out there[1]. In fact when you start thinking abstractly like this, other tech such as Riak makes sense for a lot of workloads.
The only real benefits of RDBMS' are fixed schema, fungibility of staff, the ability to issue completely random queries and get a result in a reasonable amount of time and the proliferation of ORMs.
[1] Caveated on insane design decisions like MyISAM storage engine and MongoDB as a whole.
Riak makes sense for some workloads. For the vast majority of workloads out there, you will save time and money with an ACID system. Good ACID systems are mostly RDBMSs, so I would say, for now at least, that typical RDBMSs do have a leg up on other technologies out there.
It seems to me that least in a part of you argument, you seem to be confusing the deficiencies of particular implementations or server-run code with the idea itself. Stored procedures are bad because your RDBMS has a crappy compiler? Hmm...
(One might argue, of course, that SQL in itself turned out to be a lousy interface protocol for relational data processing, and that it caused a lot of pain to begin with. But that's a different topic.)
I'm actually arguing that my RDBMS is no place for a compiler. I want to compile everything externally, test it and then deploy it. Not deploy it, compile it, cross fingers.
Some counter-points I've heard made:
1) Performance. Stored procedures are fast, meaning it will be longer before you need to scale out.
2) Security. If you only use stored procs, you're a lot less exposed to SQL injections etc.
I don't really have a firm opinion either way, but it's not as clear cut as you are making out.
1) Stored procedures aren't that much faster than issuing plain SQL over the connection. The main performance bottlenecks in a RDBMS are cache and IOPS. Regardless of where you execute those, they are all inside that black box after the query is parsed. You also get the added pain of cached query plans which fuck up performance when the query optimiser makes assumptions about table statistics and stores them with a stored procedure. (SQL Server and Oracle at least are guilty of this).
2) The only place I've had SQL injection attacks in the last few years is where people have used dynamic SQL inside stored procedures (sp_executesql) and not escaped them properly. Outside of that, both ORM and proper parameter binding make this entirely irrelevant.
Stored procedures may be all of those things, but they don't have to - it's just that most of the time, developers don't really care, so they have fancy versioning, deployment and continuous integration for all their code, except for stored procedures.
Also, DB procedures are not easy to "debug" in the traditional way, but SQL client is basically the first REPL every programmer becomes familiar with. You can easily step stored procedure by running it's commands one by one, unless it's fancy Oracle forall loop with cursor or something (and the cursor select can still be selected as normal).
Also, databases tend to have more strong data types than programming languages in general so putting constraints in DB means, that bad data are not savable in the system.
> so they have fancy versioning, deployment and continuous integration for all their code, except for stored procedures.
Exactly, because those things are extremely important to how code gets shipped and delivers value to the business. Stored procedures become a huge risk to future development which ultimately means it's a risk to the businesses ability to deliver value. What happens when you need to change DB vendors because the business has been so successful that you've outgrown a relational database? You have to rewrite the ENTIRE MC portion of your MVC application. Why would someone ever do this?
Avoid state at all costs. Stored procedures are stateful. Schema and migrations is pain enough already.
Write me a check constraint that validates an email address being put in a varchar column and reports back a sensible message which can be bound to an entry field with metadata about the error.
Write me a constraint and key arrangement which is unique across two and three columns in separate groups.
> Avoid state at all costs. Stored procedures are stateful. Schema and migrations is pain enough already.
What do you mean by that? How is having a bunch of queries in a stored procedure more "stateful" than having the same queries in the application?
> Write me a check constraint that validates an email address being put in a varchar column and reports back a sensible message which can be bound to an entry field with metadata about the error.
Postgres gives you metadata about the error, though the error message will still be a generic "CHECK constraint violated" or some such.
> Write me a constraint and key arrangement which is unique across two and three columns in separate groups.
I'm not sure what you want to see based on that description, but surely you're not advocating enforcing unique constraints in the application?
If I have to load the stored procedure into the persistence engine then that step is required. This is no more stateful than queries in the application but it means that the relevant state in both the application and the database engine needs to be reloaded and constantly sychronised. Ergo, two times the work.
CHECK constraint violated is no good for humans. Prevention is better than cure here.
Why shouldn't I enforce unique constraints in the application?
1. Open a transaction
2. Get a user by name from the ORM.
3. Exists? Tell user that the username is already registered.
4. Doesn't exist? Save new User instance.
5. Commit transaction.
Steps 2 and 3 can be as arbitrarily complicated as you need them to be, are fully testable and cheap with anything that uses MVCC.
"Why shouldn't I enforce unique constraints in the application?"
You should to both. For all the reasons you mention, it's often cleaner to just do it in the application especially when you can use a framework with a simple "validate_uniqueness" flag.
But, what you're describing is also the very definition of a race condition. It's the same reason you don't increment counters by retrieving them, adding 1 to it and then saving the number back to the database and instead pass in an increment command.
Check it in the application but let the database make sure it doesn't get violated in a race condition. There's a significant amount of either/or in this entire conversation (not just you, the whole thread) when the database absolutely can and should be leveraged for certain things.
It's extremism and purism where the problems get introduced (in both directions).
I do most what you say and still think both of you are right.
I mean stored procedures have some use cases but i've seen people using it EVERYWHERE and I've seen people (including myself) NEVER use it.
I mean currently my dataset is so small I don't need stored procedures, I barely do anything more than CRUD. Okay I have a bigger GROUP BY query but that is all, and at one point I load a HUGE dataset into my application memory (1000 rows) but that works REALLY REALLY fast in scala and I tried to create a stored procedure around it, but I failed, and the application code uses the dataset to generate a big calculation. Currently I just have a Map<String, Map<String, List<Row>> which is easy accessible and usable for my calculation. I mean I could've done similar with stored procedures but the performance gains are really low.
For what you're describing it doesn't sound like stored procs are worth it. Avoid introducing them unless you find that they are necessary or beneficial, but don't avoid them entirely on principle.
Preserving data integrity tends to be a much more worthy use case for database logic than retrieval display.
There is a good chance that your proposed algorithm to enforce a uniqueness constraint in the application won't work. As in, you've left out enough details that would be critical for getting it right, and in my experience, a lot of programmers would only get this right by accident if they get it right at all.
First problem is that the SQL standard provides no way to make this work portably on any standards-compliant database. So right there you are going to have to code to the database to one degree or another.
So, let's say you want to make this work in Postgres. Now, you'll need to be using Postgres 9.0 at least; otherwise your uniqueness constraint won't be a uniqueness constraint.
Try this, in any version of Postgres. Open up two psql sessions. In one, run a `create table unique (x text);`. Then run `begin isolation level repeatable read; select * from unique where x = 'foo';` in one of the sessions. Repeat those two commands in the other sessions.
Neither session sees 'foo'. So now both can go ahead and run `insert into unique values ('foo'); commit;`. Both transactions will succeed, and you can confirm that there are now two instances of 'foo' in the table.
In fact, `begin isolation level serializable` in PostgreSQL 9.0 or later is the minimum isolation level to make this work. And, you will need retry logic around the transaction in case of a serialization failure. (Perhaps your DB access layer or language would hide this latter detail from you, or perhaps not.)
In PostgreSQL 8.4 and before, serializable and repeatable read were equivalent, and both were still SQL standards compliant. In PostgreSQL 9.0, the repeatable read isolation level stayed the same, while the serializable isolation level was strengthened.
Unless you can accept a certain level of degraded accuracy by using a probabilistic construct such as a Bloom filter, by far the biggest cost of maintaining uniqueness is the index. And you'll need that index whether you use the database or the application to enforce uniqueness.
And, judiciously pushing computation onto a database can actually be cheaper for the database as well as its clients. This scenario is likely to be one of those situations.
> CHECK constraint violated is no good for humans.
Well, sure, an application should respond to DB errors by presenting appropriate messages on the UI, just like any other errors it encounters. You should only see "CHECK constraint violated" if you are bypassing the app and using the DB. Otherwise, you should see something nice provided by the app.
> Why shouldn't I enforce unique constraints in the application?
Because you should do it in the database whether or not you do it in the application, and then once you have, well, DRY.
> Avoid state at all costs. Stored procedures are stateful.
Stored procedures are no more state than application code is.
> Write me a constraint and key arrangement which is unique across two and three columns in separate groups.
What does "unique across two and three columns in separate groups" mean? I get that its something more complex than a simple multicolumn uniqueness constraint, but not what it is supposed to do.
I suspect that whatever it is can be done with PostgreSQL -- possibly using the (relatively) new exclusion constraints -- but I can't quite be sure without more clarity on what you mean.
Well, you cannot isolate the code, you cannot unit test it, you cannot use a debugger, cannot set breakpoint, have stack traces etc. You are tied to the database at all times.
In my experience, every stored procedure that is larger than 2-3 lines is a headache.
Yeah. I suspect that lot of hate of logic in DB is because of bad Oracle setups many years ago, kind of like lot of people think SQL is useless because MySQL is.
I'm not a huge fan of stored procedures - but I'm pretty sure you can debug stored procedures in SQL Server pretty easily from Visual Studio - I think you can "step into" the call to SPs while debugging client code.
I find it strange when people adhere to these extremely dogmatic ideas about stored procedures. They appear to be either on one end of the scale or the other, ie. they either put all their logic in to Stored Procedures, or refuse to use them at all.
Of course, the reality is that people who use them reasonably do exist and are probably in the majority. You just rarely hear them talk about it because I suspect that they hold the same views about Stored Procedures, Constraints and any other DBMS feature as they do with any other software development tool ie. Use the right one for the job.
Yeah. Lot of the time, for a CRUD app, the who layer between rendering and data storage doesn't really do much besides validations, and those can be in database, so whole middle layer can be unnecessary.
Sometimes, the logic has to be in database, because it is single point of truth and because many application servers are hard to synchronize with regards to "you get max 3 attempts at login" or "you have to have enough balance to do bank transfer".
Sometimes, lot of your logic is in database, because database can do a lot of things in really fast and practical way, like aggregation and reporting and various data exploration tasks.
Sometimes, the database is just dumb store of object oriented data.
> Why people in the web world don't use stored procedures and constraints is a mystery to me.
You can blame MySQL 4.1 for that :(
Most people who call themselves "web developers" haven't even heard of PostgreSQL, or even if they've heard of it, have no use for it because their usual clients are stuck with MySQL-only web hosts who have only just managed to upgrade to PHP 5.3.
Yes. HN is a bubble. There are ~700 PHP questions on SO a day and ~150 node.js. This is just one pair of numbers, you can mine your own whatever you like but you'll realize there are massive amounts of "web developers" with a ... low amount knowledge.
The latest version of WordPress is still compatible with PHP 5.2.4 and above, so anyone who builds a WordPress site is effectively shipping a PHP 5.2 app.
No True Scotsman puts sugar on his porridge, and No True Developer just installs WordPress.
On the other hand, even Rails and Django encourage you to use the ORM whenever possible, so even a "developer" who builds apps on a modern framework is unlikely to be familiar with advanced SQL features.
For me, if you had to write even a single line of a Turing complete language to a file (so shell scripts yes, but one-time shell commands no) to install WP, that would count as development. Otherwise, it's just installation. Note: I have never installed WP.
Do people really consider ./configure && make && make install and its equivalents to be development now?
I'm just stating what I believe to be a fact: that the majority of web developers in this world never think of PostgreSQL as an option. I don't care whether that's a logical thing for them to think. It's just a fact, whether I like it or not.
If you think I'm wrong about the facts, please feel free to open a phone book in any part of the world other than the Bay Area, call up a decent sample of people who self-identify as web developers, and find out what percentage of them have ever heard of, let alone used, PostgreSQL.
Sorry for the loose use of language. Everything that follows the colon after "I believe to be a fact", until the end of that paragraph, is the content of what I believe, including the statememt "It's just a fact." I believe that it's a fact.
Anecdotal evidence: I've interacted with dozens of other people who call themselves web developers over the years, and most of them (outside of Silicon Valley) have never used PostgreSQL, nor any advanced features of SQL in any other RDBMS.
Objective evidence: the large market share of WordPress, Drupal, and other content management systems that don't use any advanced database features; as well as the large market share of frameworks such as Rails, Django, and Laravel that encourage developers to stick with the ORM and not care about advanced database features.
I don't think using stored procedures should be the focus of the project.
PostgREST is great because it lets you kickstart a CRUD application with ease.
I'm mainly a node.js developer nowadays and I'm using some frameworks to kickstart APIs for my clients - and then I jump in and add features.
What I really want is a solution to build a API server which deals with authentication, exposing my models through REST and other boring and repetitive stuff.
In this way I don't have to focus on everything, but just on the specific problem I'm solving.
I don't think there is a valid solution out there right now.
That's why I'm contributing to PostgREST and I hope to see even more features coming out of it (eg: better authentication, maybe with 3rd party logins).
Around 2005 I worked for a fairly large company that did exactly what you're suggesting with Postgresql and it was a complete disaster. Have you ever tried implementing sharding with all of your business logic in stored procedures? Have you ever tried hiring people who understand pl/sql and WANT to work with it? I have done both and it is a nightmare, once you get to the point of having to shard data you end up in one of two places:
1.) The sprocs become insanely complex because they have to be shard aware.
2.) You slowly start moving more of your code that was in sprocs to your application so now you've got two problems.
As for hiring, put out an ad for an engineer with pl/sql knowledge better yet put out an ad for someone who wants to learn and use pl/sql. Good luck finding enough of those people to get any significant work done.
> This is good work and if I ever did web development, it would be like this. Why people in the web world don't use stored procedures and constraints is a mystery to me. That this approach is seen as novel is in itself fascinating.
Regardless of the other points people brought up...
Sharding a database with stored procedures and constraints as you advise is a nightmare because you now have a completely separate deployment process [deploying stored procedures, if you think this doesn't require a deployment process across a sharded infrastructure...I have no words].
Using an internal web framework is much, much easier than maintaining two separate deployment processes. Especially when one of those processes has to take down nodes to avoid some shards having different stored procedures than other shards.
...and it's not just databases. OS capabilities (eg: vm tuning, bumping up the default sysctl limits, etc) are ignored and the problems arising of such disregard are then dealt with by adding layers to the application like distributed caches and other ^scaling^ solutions.
Because databases are a very poor fit for APIs? This is one of my biggest problems with high holy REST: it generally just means reimplementing your SQL API in HTTP semantics.
APIs should be about encapsulating business logic. Databases should be about storing data in a reliable, predictable way.
> Why people in the web world don't use stored procedures and constraints is a mystery to me.
We do. At least some of us, and honestly, it's not something I think about as being exceptional. I don't always use them, but I much prefer having a nice API of SPs to use rather than having to have custom SQL all over the place. DRY applies to writing queries just as much.
I'm sorry but why would I go through HTTP to query data? Why can't I just hit the database directly without the overhead of HTTP? Does a cleaner and being more standards-compliant worth the overhead of passing through HTTP?
And what happens when you start applying complex business rules that needs to scale? So many questions about this approach...
Just don't forget to secure it. Reminds me of when people thought it was a good idea to expose the REST API of ElasticSearch directly as their API and then got p0wned. Using "REST as an abstraction layer" isn't the same as taking someone else's OTC API, calling it your own, and exposing it to the public.
Yeah, this seems great if you want to completely eliminate the middle tier and have a client application talk directly w/ the database. I'm curious why there's an entire query API via query string parameters. Why not just expose a single POST /query endpoint where you can send some SQL?
Squeezing the middle tier as flat as possible is my actual strategy and goal in many distributed environments. That's totally feasible with modern tools.
But sending SQL from the client is dangerous, and in this forum you'll get some interesting looks suggesting that.
Amen. Whatever happened to the binary driver? Calling a Prepared Statement over a binary driver to a localhost or in rack DB and we're talking microsec level query times possible. Always annoyed when I spin up Hello World examples of all the new Rest Hotness db's to discovered that a PK query on a fresh install with only 10 rows still takes 10ms!
Quite unclear why you'd layer all the HTTP and marshalling overhead on backend service calls, other than to ensure you always have slow requests.
What is the use case of wrapping Postgres with REST? I can't think of many apps that don't require custom logic between receiving an API request and persisting something to the database. Is PostgREST trying to replace ORM by wrapping Postgres in REST? Or am I missing something. When would one use this tool. My naive perspective needs some enlightening.
> I can't think of many apps that don't require custom logic between receiving an API request and persisting something to the database.
I do. Most apps we write (government stuff) are CRUD apps that mostly handle data from the user, and then apply some logic to that data.
You can write custom logic in stored procs no problem. It's even a lot faster than doing it in java/php/ruby/.net or whatever because there are a lot less layers between you and the data, and the procedural language you use inside the the db is explicitly made and optimized for this purpose.
"The argument that you don't want to write any code that locks you to a database is some stunning lack of awareness, as you decide to lock yourself into the tsunami of unpredictability that is web frameworks to ward off the evil of being locked into a 20 year database product built on some pretty sound theoretical foundations."
Is it a common circumstance to completely switch your data layer without having a massive rewrite of your application? Do people frequently flip from pg to Oracle, or from SQL Server to MongoDB, and it was super-easy because that layer was abstracted?
Not frequently, but yes. Was forced to move from Oracle to Postgres, not that I mind Postgres. Very little software changes were required to get up and running. If everything was in views or stored procedures the change would have taken months instead of weeks.
On the other hand, I had to write for some software that supported both Oracle and Postgres (many deployments, new using Postgres and old migrating over time to Postgres) and it was a chore. All tests had to be run in both, SQL was 'same but different' with different types, and for perf reasons there was tonnes of hinting, which obviously pg ignores so overall performance was very different.
I have presided over a couple of to-PostgreSQL migrations but I've only heard of one away-from-PostgreSQL migration, and it was Skype when MS bought it.
One use case would be with Google App Engine. Other than it's Datastore [0] or CloudSQL [1], you don't have access to other databases. So this would be a great way to have Postgresql as a backend to your app.
I had this initial response, but you can have another service running tasks on and in to the database, or more complicated views for interacting with more complex models. PostgREST is just a service for interacting with your data, logic has to be done client side/in another service.
In the old days, DB vendor independence was considered important, because the Free databases were not considered good enough for Enterprise use. It had to be Oracle or IBM. Even if you were starting out on MySQL, you wanted to retain portability for when you moved to Oracle. Hence building ORMs as an abstraction layer allowing vendor independence and more Enterprise-y Java.
There's also the "demarcation" issue. Traditionally databases were administered by change-averse DBAs who didn't want you stealing their CPU cycles to run your stored procedures, and were generally extremely averse to change.
This is probably just as significant as vendor independence or technical issues. But I remember it a little differently.
In the old days you'd have programmers who did the code and the DBAs who were tasked with making sure the data wasn't broken. They would write SPs, use constraints and triggers etc. Depending on the team programmers might do these as well or instead, but it was the DBAs domain ultimately.
Then programmers wrote ORMs and started doing all that stuff in code and somehow managed to seize control of data validation and semantics from the DBAs.
In my experience (and I'm older these days...) databases are a much rarer migration than programming languages. I deal with stuff that's still in (heaven help us) VSAM files, accessed a mix of assembler, COBOL, C, Java, TCL, C#, C++, Pascal, and so on and so forth.
Integrating at the database is a powerful pattern. I see some value for larger organizations to go a step further and integrate at a higher service layer, because one database could never support the entire enterprise. But for orgs that fit in one database, it's great, IMO.
Agree, the received wisdom has always been database vendor choice as a ten-year commitment, as opposed to almost anything else. It's a reason to avoid databases :)
Strongly disagreed that it's a good reason to avoid databases. If you are going to collect data of any value over a long period of time, pick a horse and stay on it. Over time, technology and business needs might force a re-evaluation of that position, but you don't want to build around unmeasured plans to maybe switch databases at random some day.
You can do that using SQL views. That means you can provide consistent api of your choosing no matter how the original tables look like. Postgrest docs also describe how you should use this feature to version you API.
Implementing SQL views is not exactly trivial if you are operating under an existing Ruby on Rails app. I suspect most RoR developers don't use SQL views unless they absolutely need to for a certain query that's heavy on performance.
As a result, the coupling to the schema does seem to be concerning as moving everything to SQL views appears to be a high amount of overhead if you're already successfully relying on an ORM.
> Implementing SQL views is not exactly trivial if you are operating under an existing Ruby on Rails app.
From the applications side, it should be completely transparent to replace base tables with views.
From the database side, assuming you keep the views and the base tables in the same schema, for the initial transition, you just need to rename the base tables, and then create views with the names of the original tables referencing them; they'll be simple views, and so, in postgres, automatically updatable, so you won't even need to explicitly define update logic.
You're right. It kind of depends on whether you look on the DB as a datastore, that is the DB and the Rails app are the same "system" with regards to other parts of the infrastructure, or of the database is system by itself, that provides services to other systems.
It depends on what the DB does, really - eshop app + db might be the first one, while operational data warehouse would be the second. Usually you end up with something in the middle - like the eshop app, that provides views with summarized data to be ETLed to reporting.
You could always write a custom API using any one of the many libraries/frameworks fit to do so.
If you want to use Haskell (easy binary deploys, fast and strict type safety for cheaper maintenance, less bugs, easy refactoring) you could have a look at servant[1].
Looks really cool. I was first thinking it saves the JSON with the new Postgres JSON support, but saving it as relational data is even more impressive!
I'd say if the OPTIONS would return a JSON Schema (+ RAML/Swagger) instead of the json-fied DDL, it would be even more awesome. With a bit of code generation this would be super-quick to integrate in the frontend then.
"It provides a cleaner, more standards-compliant, faster API than you are likely to write from scratch."
If you are using this as a web server persistence backend, I would agree with the first, more or less accept the second and reject the third. HTTP + JSON serialisation are way slower for that kind of job.
If you are just exposing the database using only the Postgres, in that case is interesting, however, I have concerns about how more complex business logics would work with such a CRUD view.
APIs require more than database access, security, and nice routes. Those are all necessary but a good API also includes flows linking things together so you can progress through higher order processes and workflows. You need to make sure that you're actually providing user value.
CRUD over HTTP (or an "access API") should be a first step, not your end goal.
I would not put a direct DB to HTTP REST API front-facing to the public, but it has its use-cases, I can imagine using it server-to-server for instance.
I'd be interested to see a benchmark. PostgREST is fast! And it is also a piece of software that tries to "do one thing really well". It deploys as a binary, which is also a big plus compared to these "first install this list of dependencies at these ranges of versions before using our product".
Last thing: is Denodo open source? It is not listed at "why use Denodo", so I guess not...
Between this (yes, I know it's 3rd party) and the support for JSON, PostgreSQL seems to be eating into the market of the NoSQL databases every day. I like that. I like that because the fewer new things I must learn, the more time I can spend on the things I find interesting.
Wow, there is a lot of contention in this thread. So first off I want to say congratulations to the author of PostgREST. Getting 2k req/s out of a Heroku free tier is just awesome ontop of all the overhead convenience you provide. Great job, great documentation, all around looking fantastic. You deserve to be on HN homepage.
Second, I'm an author of a distributed database (VC backed, open-source), so I'd like to respond to some of opinions on databases voiced in this thread - particularly in the branched discussions. If you aren't interested in those responses, you can ignore the rest of my comment.
- "You'd have to pay me a million dollars a year to do web development." Don't worry, most webdev jobs are about a tenth of that. If inflation goes up even a little bit...
- "The problem is scaling your database", I can confirm that this is my experience as well. But there is a very specific reason for that. Most databases are designed to be Strongly Consistent (of the CAP Theorem) and thus use Master-Slave architecture. This ultimately requires having a centralized server to handle all your writes, and this becomes extraordinarily prone to failure. To solve this, I looked into Master-Master (or Peer-to-Peer / Decentralized) algorithms for my http://gunDB.io/ database. Point being, I'm siding with @3pt14159 in this thread.
- "Sorry but databases are just a hole to put your shit in when you want it out of memory", I write a database and... uh, I unfortunately kind of have to agree, probably at the cost of making fun of my own product. You see, the reason why is because most databases now a days are doing the same thing - they keep the active data set in memory and then have some fancy flush mechanism to a journal on disk and then do some cleanup/compression/reorganizing of the disk snapshot with some cool Fractal Tree or whatever. But it does not matter how well you optimize your Big O queries... if the data isn't in memory, it is going to be slow (to see why, zoom in on this photo http://i.imgur.com/X1Hi1.gif ). You just can't get the performance (or scale) without preloading things into RAM, so if your database doesn't do that... well what @batou said.
Overall, I urge you to listen to @3pt14159 and @batou. PostgreSQL is undeniably awesome, but please don't fanboy yourself into ignorance. Machines and systems have their limitations, and you can't get around them by throwing more black boxes at it - your app will still break and so will your fanboyness.
Our Restya stack (open source) is similar to this with tech agnostic approach. We used it to build Restyaboard http://restya.com/board/ (open source trello alternative/clone)
I see currently only "flat" urls are supported. are there any plans (and is it even possible in postgresql) to add dynamic views? so that `/users/1/projects` is a dynamic view, dependent on the $user_id ? . That'd be rad
The comments are unbelievably negative considering the quality and the range of features this offers. This is extremely useful because I won't have to spend time writing out REST api in order to expose the Postgre data. Often a client just wants to access the data with REST api and to write an entire stack just to serve a few doesn't make sense. There's no expectation that this is going to serve a gazillion requests per minute out of the box, and that's totally fine with me since you shouldn't rely on off the shelf solutions anyways if you were building an architecture of that size, but really question if you are going to have that many requests per second. It reminds me of the customer who claims 'I need this done in node.js to support 10,000 concurrent users' and when asked how many users he has now he replies 'none, but I hope I can reach the number', solving problems he doesn't have yet and complaining that 'php is too slow'.
Some of the best ideas and tools on HN are met with so much negativity it reminds me of Reddit, where the small percentage of people who get off on putting others down so they can feel good about themselves dominate the comments.
Good on you cdjk, this is exactly what I was looking for. Thank you!
It means do only what is needed. If you do anything extra, you could be wasting your effort and/or causing problems. Doing just what is needed is sometimes difficult for developers.
I have made the mistake of spending too much time slavishly implementing some pattern only to figure out later that it was just serving my need to implement the pattern, versus just getting the job done with a simple procedural script.
I imagine some of the negative comments are due to the passive aggressiveness of the README.
"It provides a cleaner, more standards-compliant, faster API than you are likely to write from scratch."
"If you're used to servers written in interpreted languages (or named after precious gems), prepare to be pleasantly surprised by PostgREST performance."
A lot of the optimizations are for responsiveness, not throughput. It's terrible to work with a site that has a one second response time. Cut that down to 15ms and it's an entirely different site, something that you might want to use. And, the side benefit is that it's more likely to support 10,000 users/second because you just optimized the system.
I think that would a good separation of concerns, I didn't know Kong, but it seems that is more specialised tool supporting Oauth2, Rate limit, Ip filtering ..etc via plugins.
I would like to see both tools running in Docker and working together.
I started this public gist to explore this solution
https://gist.github.com/alfonsodev/6a6c66b4074248ed9702
feel free to comment and collaborate there.
You should be aware that this is a _bad_ pattern for anything more serious than a university homework. Instead of exposing functionality that you can guarantee and that's required by the clients, you expose your database schema, essentially tightly coupling the DB with the clients.
I know it's tempting to do that, but spend some time thinking of your data and what do you want to expose.
Aren't a lot of endpoints essentially bound to the database anyway? If you were to do any sort of major schema change, chances are you would have to create a new endpoint (i.e. /api/v2/) to handle the new schema changes. Also this handles versioning.
I share this sentiment, but this is mostly a question of organization, and not so much about whether the code is inside or outside the DB.
I personally used some prefix for that, such as "service_" or "public_". All stored procedures (in PostgreSQL speak: "user-defined functions") that have this prefix are accessed from the client. Everything else is internal. Of course, it would be even nicer if the REST framework would enforce that convention.
This is especially nice with JSON aggregation and SUB SELECTs, where you can directly aggregate your objects and lists of sub objects within the DB query, and generate the whole JSON result directly in the DB.
>Should add some header to say that it's JSON, or add a .json file extension for the main page data.
The server sends `Content-Type: application/json` and provides no header related to caching. Browsers that do anything but fetching the resource again are not spec compliant.
Also, the only browser to ever look at the extension of a file in the URL was IE (https://msdn.microsoft.com/en-us/library/ms775147(v=vs.85).a...) and they have long since stopped doing that as all it was doing was cause security issues and screw with web developers.
Correct answer. The demo is doing everything right, parent seems confused. Browsers also don't have any special regard for '.json' in the path. The path can be anything; the path doesn't suggest anything about content-type or caching.
Yes, but if / returns a html file, and /.json returns the json, it's impossible for a browser to display the json by accident (no matter what the header is).
And yes, the header seems correct, Checking, that's the API demo, the GUI is separate. Thus the confusion.
Yeah, headers are the right way to do it, but a different path is also the right way to do it, and extensions like .json can make that simpler in some cases. Or /api/ prefixes.
and then it will send HTML ahead of xml, ahead of anything else, where it can.
If a RESTful client can only accept JSON, it should send
Accept: application/json
Headers are the right way to do it with a RESTful API. The reason why is that the path should indicate the resource you're trying to access and the resource is independent of the data format.
Adding '.json', path specifiers etc is a kludge; it requires web server support to actually work properly (it requires the webserver to send the right MIMEtype in the header). The browsers ignore it (I think).
For example, occasionally you'll click on an image (I see it about once a year) and you get back a stream of weird unicode. This is because the webserver is returning the .jpg as text/html or whatever, and the browser is rendering it as such.
I'd forgotten about the days when we used to add something like &ft=.pdf to the end of querystrings so IE would recognise the file as PDF. I can't remember the other tricks, but there was a whole raft of things you'd do to force downloading of content, or not.
Are you sure about that? Cache-Control, Expires? If you don't change the URL IE will cache the response whether you like it or not. 2 ways to handle this are to generate a random number to append as a parameter to change the URL. The other way is to have your web service to tell the browser not to cache with response headers. I have had IE do this to me and made all my web services send back "Cache-Control: no-cache" to prevent IE caching.
I also very much liked the idea of using `Range` headers for pagination (which should be out-of-band but rarely is).
I'm not convinced that this is the future of web development, but it's a nice refreshing view that contains a few very practical ideas.
Even if you don't care about this at all, spend the 12 minutes to watch the introductory presentation.