
Access Amazon SimpleDB from within CFQUERY

As you may already be aware, Amazon does a little more than just sell books. They have been quietly and slowly changing the way we think of web services and cloud computing. With their on-demand servers (EC2), unlimited file storage (S3), messaging (SQS) and databases (SimpleDB), they are truly making us all rethink how we architect tomorrow's systems.

Amazon's SimpleDB service is a system for storing and querying data without having to think about scaling or storage. You just use it. As with all their other services, you pay for what you use: the more you use, the more you pay.

Like the rest of Amazon's web services, this is another powerful tool in their arsenal, and now you as a CFML developer can easily get at it through OpenBD's CFQUERY extension.

Amazon SimpleDB Basics

Amazon SimpleDB is not a true relational database. Instead, you can think of it as a series of hashtables stored in a domain. In Amazon-speak, a domain is not dissimilar to a table in SQL, and an Item is close to a row. In a traditional database you have a fixed number of columns per table, but in SimpleDB that's not the case: you can have up to 255 attributes (or columns) in a row, and they do not all have to be defined.

There is also no concept of a table/domain definition. You just add or delete data, provided you supply a unique identifier for each row, or Item. Another small gotcha is that all your data is stored as literal strings, so 10 is stored as "10". The only time you need to worry about this is in less-than/greater-than queries, because those comparisons are performed at a lexical level: "9", for example, sorts after "10".
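To make the Domain/Item/attribute layout concrete, here is a minimal Python sketch that models a domain as a dictionary of per-item hashtables. It is purely illustrative: `put_item` and the `Domain` alias are hypothetical names, not part of OpenBD or Amazon's API, and the overwrite-on-reinsert behaviour simply follows the description of INSERT later in this post.

```python
from typing import Dict

# Hypothetical model: a domain maps each unique ItemName to its own
# hashtable of attribute name -> string value.
Domain = Dict[str, Dict[str, str]]


def put_item(domain: Domain, item_name: str, **attributes: object) -> None:
    """Store attributes for an item, converting every value to a string.

    There is no schema, so each item may carry a different set of
    attributes. An attribute that already exists is overwritten.
    """
    row = domain.setdefault(item_name, {})
    for name, value in attributes.items():
        row[name] = str(value)  # everything is kept as a literal string


people: Domain = {}
put_item(people, "alice", name="Alice", age=10)
put_item(people, "bob", name="Bob", city="Leeds")  # different attributes, no schema change
put_item(people, "alice", age=11)                  # re-insert overwrites "age" only

print(repr(people["alice"]["age"]))  # '11' (a string, not a number)
print(people["bob"])                 # {'name': 'Bob', 'city': 'Leeds'}
```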

  • Amazon stores data in Domains then Items, with each item having any number of attributes
  • Maximum of 100 domains per user account
  • Maximum of 10GB per domain
  • All data is stored as strings; be careful with comparisons such as ItemA > ItemB, as they are lexical, not numeric (i.e. pad out numbers)
  • No individual item can be over 1k in size
  • Maximum attributes per row/item is 255
  • Each row has a unique ID (ItemName)
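The string-comparison gotcha in the list above is easy to reproduce. For plain ASCII digits, Python orders strings the same lexical way SimpleDB does, so this short sketch (an illustration only, not SimpleDB code) shows what goes wrong with unpadded numbers.

```python
values = ["9", "10", "100", "2"]

lexical = sorted(values)            # character-by-character string order
numeric = sorted(values, key=int)   # the order we actually meant

print(lexical)     # ['10', '100', '2', '9']
print(numeric)     # ['2', '9', '10', '100']
print("9" > "10")  # True, because '9' sorts after '1'
```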

Full pricing for Amazon SimpleDB is published by Amazon, but in short they charge per GB for data going in and out, and for the amount of CPU time your queries take.

CFML Integration

Getting access to this functionality is now very easy with OpenBD. We've added SimpleDB functionality to our official plugin, which is available for use now. When looking at how to provide access to this service, we debated whether it should be a set of functions, a new set of tags, or something else. The answer was staring us all in the face: CFQUERY.

CFQUERY is, of course, the CFML window into data storage, and the original creators of this tag had already built in future extensibility through the dbtype="" attribute, which historically was really only used to differentiate between a SQL query and a query-of-queries. So we added a new dbtype: amazon.

This lets you build INSERT / DELETE / SELECT statements for accessing data sitting inside of Amazon SimpleDB.

If that isn't cool and easy enough, the real bonus of using CFQUERY as your Amazon SimpleDB API is that it literally saves you money. Every request you make to Amazon SimpleDB costs money, but by utilising the built-in query caching of CFQUERY (including the OpenBD caching enhancements) you don't need to query Amazon nearly as often as you normally would.
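The saving comes from not repeating identical requests. The Python sketch below illustrates the idea with a hypothetical `CachedQuery` class: a query string is only sent to a stand-in backend when no cached result younger than the time-to-live exists, similar in spirit to CFQUERY's cachedwithin attribute. It is not how OpenBD implements its cache, and `fake_simpledb` is a made-up stand-in for the remote service.

```python
import time
from typing import Callable, Dict, List, Tuple


class CachedQuery:
    """Hypothetical time-based cache in front of a paid query service."""

    def __init__(self, backend: Callable[[str], List[str]], ttl_seconds: float) -> None:
        self.backend = backend
        self.ttl = ttl_seconds
        self._cache: Dict[str, Tuple[float, List[str]]] = {}
        self.billable_requests = 0

    def run(self, sql: str) -> List[str]:
        now = time.monotonic()
        hit = self._cache.get(sql)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]                # served locally: no request sent
        self.billable_requests += 1      # every real call costs money
        result = self.backend(sql)
        self._cache[sql] = (now, result)
        return result


def fake_simpledb(sql: str) -> List[str]:
    # Stand-in for a remote SimpleDB call (hypothetical).
    return ["alice", "bob"]


q = CachedQuery(fake_simpledb, ttl_seconds=60)
for _ in range(10):
    q.run("select ItemName from mydomain")
print(q.billable_requests)  # 1
```

Ten identical queries inside the cache window cost one request; a different query string would be a cache miss and cost another.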

Let's look at some sample code, and at the other ways OpenBD helps you interact with Amazon SimpleDB.

Sample Code

First of all, we couldn't get away without implementing a few functions. These exist purely to help create the Amazon datasource and to manage domains at a high level.

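<!--- Set up the Amazon datasource once, using your AWS access ID and secret key --->
<cfset amazonDS = AmazonSimpleDB( "MyIdentifier", awsAccessId, awsSecretKey )>

<!--- Create a domain, delete it, and list all current domains as a query --->
<cfset CreateSBDomain( amazonDS, "mydomain" )>
<cfset DeleteSBDomain( amazonDS, "mydomain" )>
<cfset qry = ListSBDomains( amazonDS )>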

The first function, AmazonSimpleDB, sets up the Amazon datasource, and once that's done you won't need to do it again. You don't even need to hold on to what it returns, because all that comes back is a String that acts as your reference to the datasource. This call takes your two Amazon AWS access codes, which open up the world of Amazon to you.

To create a new domain you simply call CreateSBDomain, passing in the Amazon datasource and the name you want for your domain. Similarly, deleting a domain is performed with the DeleteSBDomain function. You can get back a CFML query of all your current domains by calling ListSBDomains. That is it for the functions. From here on in, it's CFQUERY.

Inserting data

Let us start by inserting data into our domain. We are all familiar with the INSERT syntax for SQL, so you'll be able to dump data straight into your Amazon SimpleDB very quickly.

<cfquery dbtype="amazon" datasource="#amazonDS#">
  insert into mydomainname (ItemName, "name", "age") values (
  'MyUniqueName', 
  <cfqueryparam value="#session.name#">, 
  <cfqueryparam value="#session.age#">)
</cfquery>

As you can see, it is a standard INSERT statement, complete with CFQUERYPARAM tags to help you format your data. Note, though, that when inserting you must include ItemName in the column list. This is the unique identifier, or index, for your row. If the row already exists, its attribute columns are overwritten.

Deleting data

Deleting data is equally painless, except that there are two types of delete: you can delete an attribute from a given row, or you can delete the complete row. Remember, Amazon charges you for the data you store, whether you use it or not, so the ability to delete a single column in a given row is very powerful (and cost-effective!).

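<!--- Delete the whole row --->
<cfquery dbtype="amazon" datasource="#amazonDS#">
  delete from mydomainname where ItemName='myrowid'
</cfquery>

<!--- Delete just one attribute from the row --->
<cfquery dbtype="amazon" datasource="#amazonDS#">
  delete from mydomainname where ItemName='myrowid' AND ItemAttribute='myattribute'
</cfquery>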

So you either delete the whole row by specifying its unique ID in the ItemName column, or you delete a specific attribute by adding the ItemAttribute condition.
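The two DELETE forms can be modelled in a few lines of Python. This sketch treats a domain as a dictionary mapping each ItemName to its attributes; the `delete` helper is hypothetical and only mirrors the semantics described above, not the plugin's implementation.

```python
from typing import Dict, Optional

Domain = Dict[str, Dict[str, str]]


def delete(domain: Domain, item_name: str, attribute: Optional[str] = None) -> None:
    """Hypothetical model of the two DELETE forms."""
    if attribute is None:
        # where ItemName='...' : remove the whole row
        domain.pop(item_name, None)
    else:
        # ... AND ItemAttribute='...' : remove one attribute, keep the row
        domain.get(item_name, {}).pop(attribute, None)


people: Domain = {"myrowid": {"name": "Alice", "age": "25"}, "other": {"name": "Bob"}}
delete(people, "myrowid", "age")
print(people["myrowid"])  # {'name': 'Alice'}
delete(people, "other")
print(sorted(people))     # ['myrowid']
```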

Note that Amazon does warn that, due to the way their system operates and synchronizes, data you add or delete may not be reflected if you query for it immediately afterwards. In practice, though, we haven't noticed this.
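If you do hit this delay, one defensive pattern is to retry the read with exponential backoff. The Python sketch below is a generic illustration: `read_after_write` and the fake store are hypothetical, and the attempt count and delays are arbitrary choices, not values recommended by Amazon.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def read_after_write(read: Callable[[], Optional[T]], attempts: int = 5,
                     initial_delay: float = 0.05) -> Optional[T]:
    """Retry a read with exponential backoff until it returns a value.

    Returns None if the value never appears within `attempts` tries.
    """
    delay = initial_delay
    for attempt in range(attempts):
        value = read()
        if value is not None:
            return value
        if attempt < attempts - 1:
            time.sleep(delay)
            delay *= 2  # wait twice as long before the next try
    return None


# Fake store that only "sees" the write on the third read (hypothetical).
calls = {"n": 0}


def fake_read() -> Optional[str]:
    calls["n"] += 1
    return "Alice" if calls["n"] >= 3 else None


print(read_after_write(fake_read, initial_delay=0.01))  # Alice
print(calls["n"])                                       # 3
```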

Selecting data

So now that you have data sitting within Amazon SimpleDB, you will no doubt want to pull it back out. This is done with the SELECT statement, within a CFQUERY tag. Remember, you can utilise the caching techniques of CFQUERY to increase performance.

<cfquery dbtype="amazon" datasource="#amazonDS#" name="qry">
  select * from ItemAttribute
  where domain='mydomain' and ItemName='myrowid'
</cfquery>

Here we are pulling back all the attributes for a given item, or row. This will be a single-row query. This may seem a little strange, but it maps onto how Amazon manages its data.

So the question becomes: which ItemNames do I need to pull back for a given criterion or query? You can easily determine this with the following SELECT statement.

<cfquery dbtype="amazon" datasource="#amazonDS#" name="qry">
  select ItemName,NextToken from mydomain
  where [Amazon Query]
  limit [nexttoken,],5
</cfquery>

This one needs a little explanation, as it's more of an Amazon issue. Amazon has no real notion of paging results. You can get a maximum of 250 items back in one go, and to get the next set you must pass back a special token that allows Amazon to fetch it for you. This token is taken from the previous query's result set.
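Fetching every matching ItemName therefore means looping until Amazon stops returning a NextToken. The sketch below shows that loop in Python against a fake paged backend; `fetch_all_item_names` and `fake_fetch` are hypothetical, and the fake's numeric tokens merely stand in for Amazon's opaque ones.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical page fetcher: given the previous NextToken (None for the first
# page), return one page of ItemNames plus the token for the next page, or
# None when there are no more results.
PageFetcher = Callable[[Optional[str]], Tuple[List[str], Optional[str]]]


def fetch_all_item_names(fetch_page: PageFetcher) -> List[str]:
    names: List[str] = []
    token: Optional[str] = None
    while True:
        page, token = fetch_page(token)
        names.extend(page)
        if token is None:  # no NextToken returned: this was the last page
            return names


# Fake backend holding 12 items, served 5 at a time (like "limit 5").
ALL_ITEMS = [f"item{i:02d}" for i in range(12)]


def fake_fetch(token: Optional[str]) -> Tuple[List[str], Optional[str]]:
    start = int(token) if token else 0
    end = start + 5
    next_token = str(end) if end < len(ALL_ITEMS) else None
    return ALL_ITEMS[start:end], next_token


print(len(fetch_all_item_names(fake_fetch)))  # 12
```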

To query the data, you use Amazon's own query language, which isn't dissimilar to how some SQL databases format their commands.

For example, to query for all the people in our domain who are in their 20s, we would write the following.

<cfquery dbtype="amazon" datasource="#amazonDS#" name="qry">
  select ItemName,NextToken from mydomain
  where ['age'>'19' AND 'age'<'30']
  limit 100
</cfquery>

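Recall that all data is stored as plain strings. This means your queries may look a little strange at first, but you soon get used to it. Because the comparison is lexical, an unpadded age of "2" would also satisfy the query above, since "2" sorts after "19" and before "30".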

You may be wondering, then, how to manage numbers of unequal length. We've added a new attribute to CFQUERYPARAM, called PADDING="", which lets you specify the number of leading zeros to add to the value passed in, if that value is a number.
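The sketch below shows why padding fixes the 20s query from earlier. `pad` and `in_range` are hypothetical helpers; we assume padding to a fixed total width, which may not match exactly how the PADDING attribute counts its zeros. Negative numbers would need a different encoding, so the helper rejects them.

```python
def pad(value: int, width: int) -> str:
    """Zero-pad a non-negative integer to a fixed width so that lexical
    order matches numeric order (hypothetical helper, not OpenBD's code)."""
    if value < 0:
        raise ValueError("negative numbers need a different encoding")
    return str(value).zfill(width)


def in_range(value: str, low: str, high: str) -> bool:
    # Mirrors the string comparison 'age' > low AND 'age' < high.
    return low < value < high


ages = [2, 19, 25, 30, 100]

unpadded = [a for a in ages if in_range(str(a), "19", "30")]
padded = [a for a in ages if in_range(pad(a, 3), pad(19, 3), pad(30, 3))]

print(unpadded)  # [2, 25]  (2 slips through because "2" > "19")
print(padded)    # [25]
```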

If you use CFQUERYPARAM when inserting your data, OpenBD will figure out the best way to represent it within Amazon SimpleDB, so that querying for it doesn't cause any bizarre side effects. Date objects, for example, can cause problems if you're not careful, so it's best to stick to CFQUERYPARAM.
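Dates suffer from the same lexical-ordering problem. One common approach, assumed here purely for illustration (we are not claiming this is the format OpenBD chooses), is fixed-width ISO 8601 text, whose lexical order matches chronological order; a US-style month/day/year string does not.

```python
from datetime import date

dates = [date(2008, 6, 20), date(2007, 12, 1), date(2008, 1, 15)]

# Month-first strings sort by month, so the 2007 date ends up last.
us_style = sorted(d.strftime("%m/%d/%Y") for d in dates)

# Fixed-width YYYY-MM-DD strings sort chronologically.
iso_style = sorted(d.isoformat() for d in dates)

print(us_style)   # ['01/15/2008', '06/20/2008', '12/01/2007']
print(iso_style)  # ['2007-12-01', '2008-01-15', '2008-06-20']
```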

Summary

Amazon has done it again and delivered a truly scalable platform on which to build web applications without worrying about load and logistics. OpenBD brings this power to the CFML developer through the familiar CFQUERY tag. By utilising the power and knowledge of CFQUERY, your CFML applications can not only use Amazon SimpleDB but also save you money through its caching layer.

Get started with it today by downloading OpenBD and then applying the official plugin to it. Let us know what you think and how we can make it better.

Friday, 20 June 2008

Open source in film; Python history

While other sectors of the software world are well versed in open source, the CFML community is still coming to terms with it. Confusion over the different types of licenses and misunderstandings aside, the speed of uptake of an open source project can vary greatly, but by and large it is a very slow, long burn.

Michael Ogawa, a student at UC Davis, has conducted a rather wonderful research project into the evolution of some of the major software projects. He has taken the history of each project and mapped it visually over a timeline to observe how the people and the project evolve. The Python history is shown in the video below:


(Video: code_swarm - Python, by Michael Ogawa, on Vimeo)

As you can see, the initial success of Python can be largely attributed to one man, Guido van Rossum. His dedication allowed the Python project to continue until it reached that crucial tipping point where the project's success went beyond the input of a single person.

OpenBD is fortunate in that it gets a kick start from an established, well-known product that is nearly 10 years old. But we can still learn a lot from this project and many others. The key to the success of many of the larger projects is making them more than a single person's dream. Committees and councils, while formal-sounding, are merely structures to ensure no single person dominates an established project. Apache, Red Hat, MySQL, Eclipse, OpenJDK, the JCP, and every other large, successful open project hold to this group-thinking mentality.

Very soon you will start seeing the results of this collective thinking as we roll out some of our innovations and features and move CFML into the wider software pool. Be sure to check out the OpenBD sessions at CFUnited this week for more details.

Wednesday, 18 June 2008