ClojureQL - Revolutions

2010-11-03 05:19:18

With almost 2.5 years in the making, ClojureQL has taken its time coming into maturity. Recently Ive taken some drastic measures and trimmed the scope in order to get to the elusive 1.0.

 

Preface

Two and half years ago I approached fellow Clojurian Meikel Brandmeyer to learn whether or not he would be interested in developing an advanced Clojure version of SchemeQL. We had 2 big dreams:

  1. Being able to construct SQL statements using only pure Lisp
  2. ...and having those statements be valid on any backend (MySQL, Postgres, Oracle, you name it)

Both of us were just learning Clojure so the natural first step was to hammer out the fatal word "defmacro". After a while of stepping on each others toes we decided to modularize the project, so that one of us could concentrate on the 'frontend' (user input) and the other on the backend (compiler output). Our first cut on the frontend received some insightful critique from Zef here, which we took to heart and started improving upon - The problem was of course, the macros. Meikel volunteered to re-write the frontend to a system which relies on a Monad type of queries, but he quickly realized how much work this entailed and simply gassed out, so ClojureQL was at a stand still.

The backend was quite advanced and even from its early days it could compile our own AST format into valid SQL for MySql, Sqlite, Derby and PostgresSQL. There were however two problems with this model.

  1. It takes more work than any of us have time for, to maintain a myriad of backends
  2. We started out too young.

Then what happend?

 

Christophe happend

Christophe Grand is somewhat of an expert when it comes to designing DSLs. He's designed Enlive and Moustache, trained professionals in DSL design across Europe and even educated the Americans on DSLs - So lets just say, he knows what he's talking about.

Besides being a borderline genius, Christophe is also my co-instructor at the Conj Labs events and recently we just wrapped up such an event in Frankfurt. After 4 fast paced days, covering oceans of material, speaking, eating and drinking Clojure non-stop we finally ended up at a small round table in Frankfurt International Airport. Here we had about 4 hours just to chit/chat and since its impossible to tire of chatting about Clojure I wanted Christophes take on the direction ClojureQL was going in.

Christophe looked down for a moment while he pondered ClojureQLs architecture and finally he looked up and said "I don't think I would have begun the way you did, I wouldnt go in that direction". In my mind that basically translated into "rm -rf clojureql/", so he went on using yet another of his 70s shows analogies

"Do you remember that 70s show about a Robot that acted like a Human? And how it was like 80% or 90% perfect human, but all anybody ever saw was the 10% that it was lacking? That's how it's going to be with ClojureQL. You might be able to make the backend stuff 90% transparent to the users, but they will see the 10% that's missing"

So to me that literally meant "rm -rf clojureql/", so thats what I did.

Then what happend?

 

A simpler design

Deleting ClojureQL and picking it right back up again has been pleasant. Thankfully I've learned a lot about Clojure and DSL design since we put in the first lines of code 2.5 years ago, so its been a very rapid process of prototyping ideas and it's ongoing! Ultimately here's what I want to end up with:

It's not done and here's what's driving the effort:

 

Current Status

So with the fancy backend compiler that targets multiple DBMs out of the way, I am free to concentrate about the interface. Here's what I got jotted down so far:

Tables

Tables are at the heart of any SQL abstraction layer, so in ClojureQL they're first class citizens. Once you have defined a link to a table, you don't need to worry about specifying colums for every query, database connections, etc. Simply grab the name and act on it:

user> (def users (table db-con :users [:id :name :title]))
#'user/users
A link to a table is like a reference type, it isn't its value but it just holds it. Dereferencing it at any point in time peeks at that value:
user> @users
({:id 1, :name "Lau Jensen", :title "Dev"} {:id 2, :name "Christophe", :title "Design Guru"} {:id 3, :name "sthuebner", :title "Mr. Macros"})

Typically in Clojure you can have two variations on a function. Take for example 'conj' which conjoins an item unto a copy of the original collection and compare it to 'conj!' which mutates a transient. In the same way ClojureQL provides conj! and disj! which both 'mutate' the table:

user> @(disj! users {:id 3})
({:id 1, :name "Lau Jensen", :title "Dev"} {:id 2, :name "Christophe", :title "Design Guru"})"Mr. Macros"})

user> @(conj! users {:name "Frank" :title "Dev"})
({:id 1, :name "Lau Jensen", :title "Dev"} {:id 2, :name "Christophe", :title "Design Guru"} {:id 4 :name "Frank" :title "Dev"})

Notice how the returns from both calls have to be dereferenced to be read. This makes it easier to chain calls with the -> macro. In addition there are 2 less interesting helpers which I'll just mention as well:

user> (take users 1)
({:id 1, :name "Lau Jensen", :title "Dev"})

user> (-> (disj! users (either (= {:id 3}) (= {:id 4})))
          (sort :desc))
({:id 2, :name "Christophe", :title "Design Guru"} {:id 1, :name "Lau Jensen", :title "Dev"})

The first example is a simple LIMIT statement, but the second shows off a chained call and also the special predicate compiler which returns a statement that matches either id=3 or id=4.

Finally, you have some freedom to do selects, here I'll demo both the escape and compiled version:

user> (select users (where "id=%1 OR id>=%2" 1 10))
({:id 1, :name "Lau Jensen", :title "Dev"})

user> (select users (where (either (= {:id 1}) (>= {:id 10}))))
({:id 1, :name "Lau Jensen", :title "Dev"})

Use version1 if you have some very special syntax which the compiler cant generate and use version2 if you prefer to have a cleaner expression of your query. Finally, I provide a way of expressing joins which turns out to be quite simple since the table objects know about all the colums:

user> (join users salary #{:users.id :salary.id})
({:wage 100, :title "Dev", :name "Lau Jensen", :id 1} {:wage 200, :title "Design Guru", :name "Christophe", :id 2} {:wage 300, :title "Mr. Macros", :name "sthuebner", :id 3})

Everything you see here and (especially) below is still in flux. It might change as this is a design phase, but I'm working quite quickly so I hope to cement a good design soon - But again, input is more than welcome!

 

Predicates

Building predicates is the tricky part of SQL abstraction and also where we got burned the most in the early versions by relying heavily on macros - This time around I dare you to find a defmacro in the abstraction code! :)

Basically what I have so far is an expression compiler and some helper functions to make that compiler more accessible. Here's how it works:

user> (compile-expr [:or [:eq {:a 5}] [:gt {:b 5}]])
"(a = 5 OR b > 5)"

(thanks to Chousuke from #clojure for inspiration on the data-design)

The compiler simply walks a vector of vectors looking for special keywords denoting actions and colums. Colums have a little extra syntax in that they can be aggregates denoted by a period, like so:

user> (compile-expr [:or [:eq {:a 5}] [:gt {:avg.b 5}]])
"(a = 5 OR avg(b) > 5)"

But of course, any sane person would rather write SQL directly than use such a syntax so here's the same code with the added layer of sugar:

user> (compile-expr (either (= {:a 5}) (> {:avg.b 5})))
"(a = 5 OR avg(b) > 5)"

As a user you should never call compile-expr yourself, I only show it here for demonstration purposes. Typically you'll use some function that requires matching and it will call compile-expr for you. Two obvious example are disj! and where.

user> (where (both (> {:a 5}) (<= {:b 10})))
"WHERE (a > 5 AND b<=10)"

user> (where (both (> {:a 5}) (either (<= {:b 10}) (> {:avg.y 100.0}))))
"WHERE (a > 5 AND (b<=10 OR avg(y) > 100.0))"

Both of these can be passed to a (select user ...) call, or the expression can be fed to disj! without the (where) call.

 

Onwards!

So, if you're very sad and disgruntled that I deleted ClojureQL 0.9.7 Frontend 2.0 and you really wanted to see that make it all the way to version 1.0 don't dispair! The repo is still live on Gitorious with both the master and frontend-2.0 branches so if you have mountain moving power, by all means, move it!

If you're psyched about this type of SQL abstraction and want to contribute there's a number of ways you can do it:

If you're content just experimenting I recommend cloning the repo (which is now back on Github), adjusting the 'db' var in core.clj to match your local MySQL installation and then load the file core.clj and run '(test-suite)'.

Hope you enjoyed the read.

 


About the author:

Lau Jensen is the founder and owner of Best In Class a danish consultancy company which specializes in Clojure development.

Lau is also one of the instructors driving the Conj Labs initiative. If you would like to be notified once new blogposts are published, you can follow Lau on Twitter.

Michael Ossareh
2010-11-16 01:46:02
I think this is awesome, at a high level the language looks very reasonable.

A few questions though:

a) does (def x (table db-con :x [:cola :colb])) create the table each time? if so with what syntax?

b) @x - does that cause a select to take place or are you really storing all of the rows in memory? What about multiple calls to @x ?
Lau
2010-11-16 02:55:53
Hi Michael,

The 1.0 release is due within the next few days and at that point I'll make a thorough introduction to ClojureQL in its final design.

To answer your questions

a) the (table ...) fn is a constructor for a relational object. It does not create a table, but references one.

b) @table causes a SELECT SQL statement to be executed every time. Nothing is stored in memory unless you bind it yourself, ie (let [data @table]...