Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

What I had in mind was something along these lines:

    allowedColumns: Map[UserInput, SQL] = {
      "name": "name",
      "age": "age",
      ...
    }

    strict: SQL = strictComparison? "=" : "LIKE"
    
    query: Maybe[SQL] = allowedColumns
      .get(userCol)
      .map(col => "SELECT " + col + " FROM tbl WHERE tbl.foo " + strict + addParameter(":?", userFoo))
In particular:

- We can write literals which look like strings but have type SQL

- We can append fragments of SQL together (potentially making it invalid; oh well)

- We can include user input via parameterised queries, in locations where arbitrary strings/ints/etc. are allowed

- Anything 'structural', like identifiers, choice of comparison operations, building up sub-expressions, etc. must be done programatically, using the above features. In this case we select the column name (written as a static, literal SQL value) by looking up the user's input in a map. We also allow choosing between the type of comparison to use (again, both are SQL literals).

It seems to me that requirements like 'user chooses which columns they want to see, which columns they want to filter by etc.' is fundamentally incompatible with a safe, string-like representation. Instead, the options are:

- Fully representing the structure of the language. This results in an AST approach, which allows safe dynamic queries. I think any alternative, like tracking the offset of each delimiter in a string, etc. will turn out to be equivalent to maintaining an AST.

- A "flat", string-like representation, whose dynamism is limited to choosing between some combination of pre-supplied fragments. This is what I've shown above. This is safe, but the 'dynamism' is inherently limited up-front (i.e. it's overly conservative).

- A "flat", string-like representation, which has unrestricted dynamism, but hence is also inherently unsafe (i.e. it's overly liberal).



> - We can write literals which look like strings but have type SQL

What I'm not clear is: what prevents me from accidentally/stupidly doing:

  filter : SQL = "WHERE " + userInputCol + " = ?"
Is this special string handling some compiler magic that distinguishes literal strings from string variables? If so then I think that in a language that supports something like this you can indeed make a safe library. The main downside is that you need to work entirely with compile-time constructs - e.g. you can't use something like printf to take a compile-time format string and turn it runtime into a query; and you can't take queries from a separate file, they must be in source code. But these may be acceptable trade-offs.

Do you know of any library that implements this?


> Is this special string handling some compiler magic that distinguishes literal strings from string variables?

Ah, maybe I should have made it clearer that I was overloading the double-quote syntax, so we can write:

    "foo": String
    "bar": SQL
    "baz": UserInput
    "quux": Shell
    etc.
I was also relying on type inference to figure out which is which, and on '+' returning the same type as both arguments, e.g.

    +: String  -> String  -> String
    +: SQL     -> SQL     -> SQL
    +: Int     -> Int     -> Int
    +: Float   -> Float   -> Float
    +: List[T] -> List[T] -> List[T]
    etc.
This way, we see how your example fails to typecheck:

    // Code as written
    filter : SQL = "WHERE " + userInputCol + " = ?"

    // Right-hand-side must have type SQL, to match left-hand-side
    "WHERE " + userInputCol + " = ?": SQL

    // Resolving order-of-operations of the two '+' operations
    ("WHERE " + userInputCol) + " = ?" : SQL
  
    // Arguments to outer '+' have same type as return value, which is SQL
    "WHERE " + userInputCol : SQL
    " = ?" : SQL

    // Arguments to inner '+' have same type as return value, which is SQL
    "WHERE " : SQL
    userInputCol : SQL
We've inferred that userInputCol must have type SQL, so it will fail for String/UserInput/whatever.

> The main downside is that you need to work entirely with compile-time constructs - e.g. you can't use something like printf to take a compile-time format string and turn it runtime into a query; and you can't take queries from a separate file, they must be in source code.

Yep, although macros could help with that sort of thing, e.g. Haskell's quasiquotation https://wiki.haskell.org/Quasiquotation

A couple of Google hits for 'haskell quasiquote sql':

https://hackage.haskell.org/package/postgresql-simple-0.6.2/...

https://hackage.haskell.org/package/postgresql-query

> Do you know of any library that implements this?

Not completely. De-coupling double-quoted literal syntax from a single String type can be done with Haskell's OverloadedStrings feature, but that relies on a function 'fromString : String -> t', which is what we're trying to avoid https://hackage.haskell.org/package/base-4.6.0.1/docs/Data-S...

Scala's custom interpolators are similar, but they rely on a function from 'StringContext -> t', and StringContext is easily created from a String ( https://www.scala-lang.org/api/current/scala/StringContext.h... )

To ensure safety, we would need some way to encapsulate the underlying fromString/StringContext implementation to prevent it being called by anything other than the literal-expansion at compile-time.

Of course, if we're willing to use macros then it's pretty easy, like those quasiquote examples above.

Haskell's module system is famously mediocre, so it might be possible to do this overloading + encapsulation with Idris https://idris2.readthedocs.io/en/latest/reference/overloaded...

(Of course, Idris also has an incredibly expressive type system, and a very powerful macro system, AKA "elaborator reflection", so it can definitely be done; but I haven't figured out the cleanest way)




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: