- We can write literals which look like strings but have type SQL
- We can append fragments of SQL together (potentially making it invalid; oh well)
- We can include user input via parameterised queries, in locations where arbitrary strings/ints/etc. are allowed
- Anything 'structural', like identifiers, choice of comparison operations, building up sub-expressions, etc. must be done programatically, using the above features. In this case we select the column name (written as a static, literal SQL value) by looking up the user's input in a map. We also allow choosing between the type of comparison to use (again, both are SQL literals).
It seems to me that requirements like 'user chooses which columns they want to see, which columns they want to filter by etc.' is fundamentally incompatible with a safe, string-like representation. Instead, the options are:
- Fully representing the structure of the language. This results in an AST approach, which allows safe dynamic queries. I think any alternative, like tracking the offset of each delimiter in a string, etc. will turn out to be equivalent to maintaining an AST.
- A "flat", string-like representation, whose dynamism is limited to choosing between some combination of pre-supplied fragments. This is what I've shown above. This is safe, but the 'dynamism' is inherently limited up-front (i.e. it's overly conservative).
- A "flat", string-like representation, which has unrestricted dynamism, but hence is also inherently unsafe (i.e. it's overly liberal).
> - We can write literals which look like strings but have type SQL
What I'm not clear is: what prevents me from accidentally/stupidly doing:
filter : SQL = "WHERE " + userInputCol + " = ?"
Is this special string handling some compiler magic that distinguishes literal strings from string variables? If so then I think that in a language that supports something like this you can indeed make a safe library. The main downside is that you need to work entirely with compile-time constructs - e.g. you can't use something like printf to take a compile-time format string and turn it runtime into a query; and you can't take queries from a separate file, they must be in source code. But these may be acceptable trade-offs.
> Is this special string handling some compiler magic that distinguishes literal strings from string variables?
Ah, maybe I should have made it clearer that I was overloading the double-quote syntax, so we can write:
"foo": String
"bar": SQL
"baz": UserInput
"quux": Shell
etc.
I was also relying on type inference to figure out which is which, and on '+' returning the same type as both arguments, e.g.
+: String -> String -> String
+: SQL -> SQL -> SQL
+: Int -> Int -> Int
+: Float -> Float -> Float
+: List[T] -> List[T] -> List[T]
etc.
This way, we see how your example fails to typecheck:
// Code as written
filter : SQL = "WHERE " + userInputCol + " = ?"
// Right-hand-side must have type SQL, to match left-hand-side
"WHERE " + userInputCol + " = ?": SQL
// Resolving order-of-operations of the two '+' operations
("WHERE " + userInputCol) + " = ?" : SQL
// Arguments to outer '+' have same type as return value, which is SQL
"WHERE " + userInputCol : SQL
" = ?" : SQL
// Arguments to inner '+' have same type as return value, which is SQL
"WHERE " : SQL
userInputCol : SQL
We've inferred that userInputCol must have type SQL, so it will fail for String/UserInput/whatever.
> The main downside is that you need to work entirely with compile-time constructs - e.g. you can't use something like printf to take a compile-time format string and turn it runtime into a query; and you can't take queries from a separate file, they must be in source code.
> Do you know of any library that implements this?
Not completely. De-coupling double-quoted literal syntax from a single String type can be done with Haskell's OverloadedStrings feature, but that relies on a function 'fromString : String -> t', which is what we're trying to avoid https://hackage.haskell.org/package/base-4.6.0.1/docs/Data-S...
To ensure safety, we would need some way to encapsulate the underlying fromString/StringContext implementation to prevent it being called by anything other than the literal-expansion at compile-time.
Of course, if we're willing to use macros then it's pretty easy, like those quasiquote examples above.
(Of course, Idris also has an incredibly expressive type system, and a very powerful macro system, AKA "elaborator reflection", so it can definitely be done; but I haven't figured out the cleanest way)
- We can write literals which look like strings but have type SQL
- We can append fragments of SQL together (potentially making it invalid; oh well)
- We can include user input via parameterised queries, in locations where arbitrary strings/ints/etc. are allowed
- Anything 'structural', like identifiers, choice of comparison operations, building up sub-expressions, etc. must be done programatically, using the above features. In this case we select the column name (written as a static, literal SQL value) by looking up the user's input in a map. We also allow choosing between the type of comparison to use (again, both are SQL literals).
It seems to me that requirements like 'user chooses which columns they want to see, which columns they want to filter by etc.' is fundamentally incompatible with a safe, string-like representation. Instead, the options are:
- Fully representing the structure of the language. This results in an AST approach, which allows safe dynamic queries. I think any alternative, like tracking the offset of each delimiter in a string, etc. will turn out to be equivalent to maintaining an AST.
- A "flat", string-like representation, whose dynamism is limited to choosing between some combination of pre-supplied fragments. This is what I've shown above. This is safe, but the 'dynamism' is inherently limited up-front (i.e. it's overly conservative).
- A "flat", string-like representation, which has unrestricted dynamism, but hence is also inherently unsafe (i.e. it's overly liberal).