<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Google AppEngine, BigTable and why RDBMS mentality is harmful</title>
	<atom:link href="http://www.mibgames.co.uk/2008/04/15/google-appengine-bigtable-and-why-rdbms-mentality-is-harmful/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.mibgames.co.uk/2008/04/15/google-appengine-bigtable-and-why-rdbms-mentality-is-harmful/</link>
	<description>Journalling the creation of a games development company</description>
	<pubDate>Mon, 06 Oct 2008 20:50:39 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6.2</generator>
		<item>
		<title>By: Ben Bangert</title>
		<link>http://www.mibgames.co.uk/2008/04/15/google-appengine-bigtable-and-why-rdbms-mentality-is-harmful/#comment-3697</link>
		<dc:creator>Ben Bangert</dc:creator>
		<pubDate>Thu, 17 Apr 2008 00:10:28 +0000</pubDate>
		<guid isPermaLink="false">http://www.mibgames.co.uk/?p=81#comment-3697</guid>
		<description>You're still missing a big part of the picture.

First, App Engine still needs a way to do mass updates. You can only fetch 1000 entities at a time, and doing that many updates in that manner is *really slow*: you will run out of time and Google will kill your request. In the meantime, the best you can do is some sort of AJAX page that triggers a series of further requests, each loading a subset of the products and updating their names. Either way, it's still really messy right now.
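
A rough sketch of that batching pattern follows. It is plain Python with no App Engine SDK involved. The 1000-entity limit comes from the comment. The helpers `fetch_page` and `rename_products` and the in-memory product list are hypothetical stand-ins for the datastore. Each call handles one page and returns the index to resume from. That index is the state an AJAX page would pass along with its next request.

```python
# Plain-Python sketch: an in-memory list stands in for the datastore (hypothetical).
BATCH_LIMIT = 1000  # per-fetch limit mentioned in the comment


def fetch_page(products, start, limit=BATCH_LIMIT):
    """Return at most `limit` products, beginning at index `start`."""
    return products[start:start + limit]


def rename_products(products, start, old, new):
    """Handle one batch, as one short request would.

    Returns the index to resume from, or None once every product is done.
    """
    page = fetch_page(products, start)
    for product in page:
        if product["name"] == old:
            product["name"] = new
    next_start = start + len(page)
    return None if next_start >= len(products) else next_start


# The client-side page keeps firing requests until the cursor runs out.
products = [{"name": "Widget"} for _ in range(2500)]
cursor, requests = 0, 0
while cursor is not None:
    cursor = rename_products(products, cursor, "Widget", "Gadget")
    requests += 1
print(requests)  # 3
```

Keeping each request to a single page is what stops any one request from hitting the time limit.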

Second, with regard to storing a customer's data: you're missing the important bit about Entity Groups and ancestor keys. If you have a batch of customer data that you need to work with at one time, you should ensure that the entities are keyed to the same ancestor.

This allows you to:
a) ensure that all the data you want to get to for a specific customer is stored in the same portion of the distributed database.
b) do transactions that wrap the whole group of entities
c) update multiple entities in a single put() statement
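
This toy model shows why a shared ancestor matters. It is plain Python, not the real datastore API. Here a key is a hypothetical tuple of (kind, id) pairs with the root first, and every key sharing that root belongs to one entity group. `entity_group` and `check_single_group` are made up for illustration. In the real `google.appengine.ext.db` module, you would instead pass the customer entity as the `parent` argument when constructing each child entity.

```python
# Illustrative model only: a key is a tuple of (kind, id) pairs, root first.
def entity_group(key):
    """The root (kind, id) pair identifies the entity group."""
    return key[0]


def check_single_group(keys):
    """Mimic the rule that one transaction may only touch one entity group."""
    groups = {entity_group(k) for k in keys}
    if len(groups) != 1:
        raise ValueError("keys span %d entity groups" % len(groups))
    return groups.pop()


customer = (("Customer", 5),)
order = customer + (("Order", 1),)      # child of customer 5
address = customer + (("Address", 1),)  # also a child of customer 5
other_order = (("Customer", 6), ("Order", 1))

print(check_single_group([customer, order, address]))  # ('Customer', 5)
```

Mixing `order` and `other_order` in one call raises `ValueError`. The datastore behaves analogously when a transaction touches two groups.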

There's definitely a lot to consider with BigTable.</description>
		<content:encoded><![CDATA[<p>You&#8217;re still missing a big part of the picture.</p>
<p>First, App Engine still needs a way to do mass updates. You can only fetch 1000 entities at a time, and doing that many updates in that manner is *really slow*: you will run out of time and Google will kill your request. In the meantime, the best you can do is some sort of AJAX page that triggers a series of further requests, each loading a subset of the products and updating their names. Either way, it&#8217;s still really messy right now.</p>
<p>Second, with regard to storing a customer&#8217;s data: you&#8217;re missing the important bit about Entity Groups and ancestor keys. If you have a batch of customer data that you need to work with at one time, you should ensure that the entities are keyed to the same ancestor.</p>
<p>This allows you to:<br />
a) ensure that all the data you want to get to for a specific customer is stored in the same portion of the distributed database.<br />
b) do transactions that wrap the whole group of entities<br />
c) update multiple entities in a single put() statement</p>
<p>There&#8217;s definitely a lot to consider with BigTable.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: mib</title>
		<link>http://www.mibgames.co.uk/2008/04/15/google-appengine-bigtable-and-why-rdbms-mentality-is-harmful/#comment-3696</link>
		<dc:creator>mib</dc:creator>
		<pubDate>Wed, 16 Apr 2008 21:22:02 +0000</pubDate>
		<guid isPermaLink="false">http://www.mibgames.co.uk/?p=81#comment-3696</guid>
		<description>Whoops, I assumed that everyone knows what normalisation and denormalisation are, and of course not everybody does. I was going to write an overview of normalisation in a database, but that's actually quite complicated to do.

A simple explanation is that normalisation attempts to remove all duplication of data between tables.
That's what makes you want to put customer information in a customer table and order information in an order table, and then create relationships between them.

This is what we've done for nearly 30 years now, so it's become second nature to us.

However, sometimes we realise that doing the join on the two tables at query time will cost more than we save by separating the data.
We recently had an example of that at my work. To select about 10 items from our database of nearly a million items, the database had to join all those million items across about 5 tables, because the WHERE clause relied on columns from multiple tables.

The solution then is to go the opposite way to normalisation: denormalise. This means copying data from our end tables, often into the join table, so that the database can be more efficient by filtering the join table first and then performing the join on far fewer rows.
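
Here is a small sketch of the trade-off. It uses hypothetical in-memory tables rather than a real database. The normalised query has to look up each order's customer before it can filter on country. The denormalised copy carries `country` on every order, so the filter runs on one table. The last function shows the price: an update to the customer must also be written to every copy.

```python
# Hypothetical tables; orders refer to customers by id (normalised).
customers = {5: {"name": "Alice", "country": "UK"},
             6: {"name": "Bob", "country": "US"}}
orders = [{"id": 1, "customer_id": 5, "total": 20},
          {"id": 2, "customer_id": 6, "total": 35},
          {"id": 3, "customer_id": 5, "total": 15}]


def uk_orders_normalised():
    # Every order is joined to its customer before the filter can run.
    return [o["id"] for o in orders
            if customers[o["customer_id"]]["country"] == "UK"]


# Denormalised: the filter column is copied onto each order row.
orders_denorm = [dict(o, country=customers[o["customer_id"]]["country"])
                 for o in orders]


def uk_orders_denormalised():
    # The filter runs on one table; no join is needed to narrow the rows.
    return [o["id"] for o in orders_denorm if o["country"] == "UK"]


def move_customer(customer_id, country):
    # The cost of denormalising: every copy must be updated as well.
    customers[customer_id]["country"] = country
    for o in orders_denorm:
        if o["customer_id"] == customer_id:
            o["country"] = country


print(uk_orders_normalised(), uk_orders_denormalised())  # [1, 3] [1, 3]
```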

Finally, Google AppEngine does have foreign keys, at least in the Django model anyway, and they're called ReferenceProperties.

Hope that helps.</description>
		<content:encoded><![CDATA[<p>Whoops, I assumed that everyone knows what normalisation and denormalisation are, and of course not everybody does. I was going to write an overview of normalisation in a database, but that&#8217;s actually quite complicated to do.</p>
<p>A simple explanation is that normalisation attempts to remove all duplication of data between tables.<br />
That&#8217;s what makes you want to put customer information in a customer table and order information in an order table, and then create relationships between them.</p>
<p>This is what we&#8217;ve done for nearly 30 years now, so it&#8217;s become second nature to us.</p>
<p>However, sometimes we realise that doing the join on the two tables at query time will cost more than we save by separating the data.<br />
We recently had an example of that at my work. To select about 10 items from our database of nearly a million items, the database had to join all those million items across about 5 tables, because the WHERE clause relied on columns from multiple tables.</p>
<p>The solution then is to go the opposite way to normalisation: denormalise. This means copying data from our end tables, often into the join table, so that the database can be more efficient by filtering the join table first and then performing the join on far fewer rows.</p>
<p>Finally, Google AppEngine does have foreign keys, at least in the Django model anyway, and they&#8217;re called ReferenceProperties.</p>
<p>Hope that helps.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: G-man</title>
		<link>http://www.mibgames.co.uk/2008/04/15/google-appengine-bigtable-and-why-rdbms-mentality-is-harmful/#comment-3695</link>
		<dc:creator>G-man</dc:creator>
		<pubDate>Wed, 16 Apr 2008 19:47:36 +0000</pubDate>
		<guid isPermaLink="false">http://www.mibgames.co.uk/?p=81#comment-3695</guid>
		<description>As the Zen Master said, 'When you have forgotten all you know, then you will achieve Enlightenment...'

OK, I'm doing my best to 'empty my mind', and here are my realizations:

1. So, there are really no 'foreign keys', in the sense of Customer = Customer_id(5)?

2. Then, my customer_name will be written all over the place in different tables - that's denormalization?

3. If I want to track which persons work on which projects, I have to make a list property with the projects ['house', 'office', 'street'], instead of a join table.

4. And I really have to rethink my database design with the business logic of queries governing all.
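
Point 3 can be sketched without the SDK. In this hypothetical data, each person record holds its own list of projects, and asking who works on a project is a membership test on that list. An equality filter on a datastore list property behaves roughly the same way: it matches an entity if any element of the list is equal. The names and data here are invented for illustration.

```python
# Hypothetical records: each person stores the projects it belongs to,
# so no separate join table is needed.
people = [
    {"name": "Ann", "projects": ["house", "office"]},
    {"name": "Ben", "projects": ["street"]},
    {"name": "Cat", "projects": ["office", "street"]},
]


def people_on(project):
    # Matches a record if any element of its list equals `project`.
    return [p["name"] for p in people if project in p["projects"]]


def projects_of(name):
    # The reverse direction is a plain read of one record.
    return next(p["projects"] for p in people if p["name"] == name)


print(people_on("office"))  # ['Ann', 'Cat']
print(people_on("street"))  # ['Ben', 'Cat']
```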

It's really a return to a much more naive state of how databases were done 'back in the day'.

Thanks!</description>
		<content:encoded><![CDATA[<p>As the Zen Master said, &#8216;When you have forgotten all you know, then you will achieve Enlightenment&#8230;&#8217;</p>
<p>OK, I&#8217;m doing my best to &#8216;empty my mind&#8217;, and here are my realizations:</p>
<p>1. So, there are really no &#8216;foreign keys&#8217;, in the sense of Customer = Customer_id(5)?</p>
<p>2. Then, my customer_name will be written all over the place in different tables - that&#8217;s denormalization?</p>
<p>3. If I want to track which persons work on which projects, I have to make a list property with the projects ['house', 'office', 'street'], instead of a join table.</p>
<p>4. And I really have to rethink my database design with the business logic of queries governing all.</p>
<p>It&#8217;s really a return to a much more naive state of how databases were done &#8216;back in the day&#8217;.</p>
<p>Thanks!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brett Morgan</title>
		<link>http://www.mibgames.co.uk/2008/04/15/google-appengine-bigtable-and-why-rdbms-mentality-is-harmful/#comment-3684</link>
		<dc:creator>Brett Morgan</dc:creator>
		<pubDate>Tue, 15 Apr 2008 13:09:41 +0000</pubDate>
		<guid isPermaLink="false">http://www.mibgames.co.uk/?p=81#comment-3684</guid>
		<description>Best write up of the mindset I've seen so far. Danke. And sub'd. =)</description>
		<content:encoded><![CDATA[<p>Best write up of the mindset I&#8217;ve seen so far. Danke. And sub&#8217;d. =)</p>
]]></content:encoded>
	</item>
</channel>
</rss>
