Google AppEngine, BigTable and why RDBMS mentality is harmful
Tuesday, April 15th, 2008“or How do I write the correct GQL to join these tables together to give me the correct output”
Yet again people are asking on the AppEngine mailing list how to write the correct sorts of joins to join some tables together.
Looking at the design, it was clearly a great database design, a product table, a customer table, and a many to many join table with an attribute to qualify the join. Exactly the sort of tables that you were taught to write in your CompSci degree or courses, or learnt from experience over years of working with databases.
The problem is that BigTable is not a relational database. The Google team has tried as hard as possible to emphasise this, but they didn’t really provide enough examples that people really got it clear.
BigTable is not a relational database, now go write that out a hundred times, and when your done come back and finish reading.
When it comes to BigTable you want to design your database not around “What bits of data should be grouped together”, but around the principle, “What bits of data will I need to read concurrently”.
In this case, what you want to get out is
Purchase information (Order, Date), Customer information (Name, Country, …), Product information (Name, Code, …).
Therefore your table should be.
Purchase:
Customer Name
Customer Country
Product Code
Product Name
Purchase Order Number
Date Of Order
Now your query becomes
Purchase.all().filter("customer_name =",customer).filter("product_name =", product).fetch(100)
Easy huh? It’s all one big table.
Say you want to show a customer what items they ordered for a given purchase order…
Purchase.all().filter("purchase_order =",order_num).fetch(100)
Updating becomes a little harder, but is still pretty easy.
The question then comes, how do I update a product name, if I change my widgets from Acme Widget to “Acme Superwidget”?
Well firstly, really, how often do you do that? I bet it’s a number of orders of magnitude less often than you execute the queries above.
If you must, it is something like:
items = Purchase.all().filter("product_name =",product).fetch(1000)
for item in items:
item.product_name = new_name
item.put()
Yes that last part is a bit nasty, if GQL supported updates it would be nice, but writing the updates in python is hardly taxing.
I hope that helps someone