This tutorial will walk you through basic use of the ODB storage layer and
the higher-level model module. It assumes a UNIXish system and basic
proficiency with the UNIX environment.
Basic Storage
Let's start by creating a simple program to manage our e-mail groups.
    #!/usr/bin/python
    # store this as "maillist", and "chmod +x maillist"

    import sys
    import odb

    # get the database
    store = odb.Store('maildb')

    # get a "map" table
    groups = store.getMap('Groups')

    # get the command line arguments, decide what to do based on the first
    # command.
    args = sys.argv[1:]
    cmd = args.pop(0)
    if cmd == 'putgroup':
        # get the group and all of the members, store them in the Groups table
        group = args.pop(0)
        members = args
        groups.put(group, members)
    elif cmd == 'rmgroup':
        # delete the group
        group = args.pop(0)
        try:
            groups.delete(group)
        except KeyError:
            print 'Group %s does not exist' % group
    elif cmd == 'lsgroup':
        # list the members of the group
        group = args.pop(0)
        members = groups.get(group)
        if members is None:
            print 'Group %s does not exist' % group
        else:
            for member in members:
                print member
The first step is to identify the database:
    # get the database
    store = odb.Store('maildb')
The "Store" class is an ODB database. The string passed into it is the filesystem path where the database stores its files. This directory will be created if it doesn't exist, so there is no separate construction process for it.
If you run:
    $ maillist putgroup friends joey@yahoo.com frodo@middleearth.org
You should now have a "maildb" subdirectory in your current working directory.
It's probably a bad idea to store your databases on network filesystems: ODB only does fcntl-style locking at this point, which isn't universally supported by network filesystems. If multiple clients use such a database simultaneously, data corruption is possible.
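To give a feel for what "fcntl-style locking" means, here is a minimal sketch using Python's standard fcntl module. This is not ODB's locking code; the helper name with_exclusive_lock and the lock file path are made up for illustration. The lock is advisory: it only protects against other processes that take the same lock, and it relies on the filesystem honoring it.

```python
import fcntl
import os
import tempfile

def with_exclusive_lock(path, action):
    """Run action() while holding an exclusive fcntl lock on path.

    Hypothetical helper for illustration only, not part of ODB.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # Blocks until the lock is granted. The lock is advisory, so every
        # cooperating process has to take it before touching the data.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        try:
            return action()
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)

# a throwaway lock file in a temporary directory
lock_path = os.path.join(tempfile.mkdtemp(), 'demo.lock')
result = with_exclusive_lock(lock_path, lambda: 'critical section ran')
```

On a filesystem that does not implement these locks properly, two clients could both believe they hold the lock, which is the corruption risk described above.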
We can list and delete elements in our database:
    $ maillist lsgroup friends
    joey@yahoo.com
    frodo@middleearth.org
    $ maillist rmgroup friends
We often want to perform a set of database actions atomically, so that all of the actions either succeed or fail together.
As an example, let's say we wanted to keep a separate table of which groups every e-mail address was a member of. When adding a group, we could just do a series of puts:
    # our "groups per member" table
    groupsPerMember = store.getMap('groupsPerMember')

    # ... portions omitted ...

    # new code to store a member
    groups.put(group, members)
    for member in members:
        # see if the member is already in a group
        memberGroups = groupsPerMember.get(member, [])
        if memberGroups is None:
            memberGroups = []

        # add the group to the list of groups for the member
        memberGroups.append(group)
        groupsPerMember.put(member, memberGroups)
But if we get an error after writing the group, but before writing all of the members, our database is in an inconsistent state: some of the members will have an incorrect list of groups. We just can't have that.
The way to avoid this is to enclose the entire update in a transaction:
    txn = store.startTxn()
    try:
        # store the group
        groups.put(group, members)
        for member in members:
            # see if the member is already in a group
            memberGroups = groupsPerMember.get(member, [])
            if memberGroups is None:
                memberGroups = []

            # add the group to the list of groups for the member
            memberGroups.append(group)
            groupsPerMember.put(member, memberGroups)

        # commit the transaction and clear it so that we don't abort it.
        txn.commit()
        txn = None
    finally:
        # if the transaction wasn't fully committed (and set to None) abort
        # it, rolling back all changes.
        if txn:
            txn.abort()
We can do something similar for rmgroup: this is left as an exercise for the reader.
We've seen that the transaction pattern looks like this:
    txn = store.startTxn()
    try:
        ... do something ...
        txn.commit()
        txn = None
    finally:
        if txn:
            txn.abort()
This is a lot of syntax for something as common as defining a transaction. To make this a little less work, ODB provides a transaction function decorator that makes any function run in its own transaction:
    from odb import txnFunc

    @txnFunc
    def storeInTwoTables(obj):
        byName.put(obj.name, obj)
        byId.put(obj.id, obj)
        return obj.name

    # store the object in both tables in a single transaction
    name = storeInTwoTables(obj)
The code above is equivalent to the whole "try ... finally" wrapper above, but is much less verbose. Note that arguments and return values are respected.
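If you're curious how a decorator can hide the "try ... finally" wrapper, here is a small self-contained sketch. It does not use ODB and is not how txnFunc is actually implemented: ToyTxn, toy_txn_func and the in-memory table dict are hypothetical stand-ins. Unlike the real decorator, this toy version passes the transaction to the function as its first argument.

```python
import functools

class ToyTxn(object):
    """Hypothetical stand-in for a transaction: buffers writes until commit."""

    def __init__(self, table):
        self.table = table
        self.pending = {}

    def put(self, key, value):
        self.pending[key] = value

    def commit(self):
        self.table.update(self.pending)

    def abort(self):
        self.pending.clear()

table = {}  # plays the role of a committed map table

def toy_txn_func(func):
    """Run func inside its own ToyTxn, committing only if it returns."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        txn = ToyTxn(table)
        try:
            result = func(txn, *args, **kwargs)
            txn.commit()
            txn = None
            return result
        finally:
            # reached with txn still set only if func or commit raised
            if txn:
                txn.abort()
    return wrapper

@toy_txn_func
def store_pair(txn, name, ident):
    txn.put(name, ident)
    txn.put(ident, name)
    return name

@toy_txn_func
def store_then_fail(txn, name):
    txn.put(name, 'never committed')
    raise ValueError('simulated failure')

stored = store_pair('bozo', 42)
try:
    store_then_fail('guffaw')
except ValueError:
    pass
```

After this runs, table holds both entries written by store_pair and nothing from store_then_fail, and the return value of the wrapped function comes through unchanged.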
If you need to access the transaction from within the function, you can use the getTxn() method:
    @txnFunc
    def doSomething():
        # get the current transaction
        txn = store.getTxn()
        ...
You may want to store additional information in a transaction - like a timestamp, or the id of the user that committed the transaction. Transaction annotations allow you to do this:
    txn = store.startTxn()
    try:
        groups.put('clowns', ['bozo@circus.com', 'guffaw@barnumbailey.com'])
        txn.annotations['user'] = 'mmuller'
        txn.annotations['comment'] = 'adding group "clowns"'
        txn.commit()
        txn = None
    finally:
        if txn:
            txn.abort()
We can do the same thing from within a decorated transaction function as follows:
    @txnFunc
    def storeGroup(group, members):
        groups.put(group, members)
        txn = store.getTxn()
        txn.annotations['user'] = 'mmuller'
        txn.annotations['comment'] = 'adding group %s' % repr(group)

    storeGroup('clowns', ['bozo@circus.com', 'guffaw@barnumbailey.com'])
If we dump our transaction log using the dbDump utility, we can now see:
    $ dbDump maildb/log.000000001
    Txn {
      Annotations {
        'comment': 'adding group "clowns"'
        'user': 'mmuller'
      }
      _ReplaceAction {
        key = 'clowns'
        oldVal = None
        name = 'Groups'
        val = ['bozo@circus.com', 'guffaw@barnumbailey.com']
        gotOldVal = False
      }
    }
You'll currently have to do some digging into ODB's internals if you want to
access the transaction logs programmatically. Hopefully, better support for
this sort of thing will some day be added to the API.
Inspecting our Database
ODB provides some tools to allow us to look into the database without writing Python code. In particular, "odbq" lets us perform arbitrary queries on the database:
    $ odbq -d maildb groupsPerMember/*
    frodo@middleearth.org ['friends']
    joey@yahoo.com ['friends', 'comrades']
    lester@nester.com ['comrades']
As you can see, while you were working on the rmgroup code, I added a "comrades" list along with my "friends" list :-). The query I used above was "groupsPerMember/*", which selects all keys in the "groupsPerMember" table.
We'll discuss odbq further later on when talking about the higher-level "model" features.
Changes to the database are stored in transaction log files. "dbDump" can be used to view the contents of the transaction log:
    $ dbDump maildb/log.000000001
    [big transaction log dump omitted]
So far, we've only used "Map" tables - these are the most commonly used table types. But occasionally, you want to store data sequentially. For example, let's say we wanted to implement a persistent message queue:
    class Queue:

        def __init__(self):
            self.q = store.getSequence('queue')

        def add(self, message):
            # add the message to the end of the table
            self.q.append(message)

        def get(self):
            # pop the first message off the queue.
            return self.q.pop(0)
If we want to make sure that we were able to successfully process the message before removing it from the queue, we could wrap the processing in a transaction:
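    q = Queue()
    txn = store.startTxn()
    try:
        msg = q.get()
        # simulate a failure while processing the message: the commit below
        # is never reached, so the transaction is aborted
        raise Exception('error processing the message!')
        txn.commit()
        txn = None
    finally:
        if txn:
            txn.abort()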
Sequence tables are implemented using a special form of B-tree which stores
child counts instead of keys. You can expect O(log n) insertions and lookups.
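To see why child counts are enough to find an element by position, here is a rough, self-contained sketch. It is not ODB's implementation: it builds a static tree once (no insertions or rebalancing), and the names build, lookup and fanout are invented for the example. Each internal node records how many items live under each child, and a lookup subtracts those counts on the way down.

```python
def build(items, fanout=4):
    """Build a tree whose internal nodes record how many items each child holds."""
    # leaves are plain lists of at most `fanout` items
    nodes = [items[i:i + fanout] for i in range(0, len(items), fanout)] or [[]]
    counts = [len(node) for node in nodes]
    while len(nodes) > 1:
        parents, parent_counts = [], []
        for i in range(0, len(nodes), fanout):
            child_counts = counts[i:i + fanout]
            parents.append({'children': nodes[i:i + fanout],
                            'counts': child_counts})
            parent_counts.append(sum(child_counts))
        nodes, counts = parents, parent_counts
    return nodes[0]

def lookup(node, index):
    """Find the item at `index` by subtracting child counts on the way down."""
    while isinstance(node, dict):
        for child, count in zip(node['children'], node['counts']):
            if index < count:
                node = child
                break
            # the item is not under this child: skip over all of its items
            index -= count
        else:
            raise IndexError('index out of range')
    return node[index]

messages = ['msg%d' % i for i in range(100)]
tree = build(messages)
tenth = lookup(tree, 10)
```

Each step down the tree only looks at one node's counts, so the work grows with the height of the tree rather than with the number of elements.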
Cursors
ODB lets you iterate over ranges of values in both map and sequence tables using cursors.
To list all of the groups in our Groups table, we could do this:
    for name, members in groups.cursor():
        print '%s: %s' % (name, members)
The cursor() method returns an iterator over its table. For a Map table, the iterator yields key/value pairs. For a Sequence table, it merely yields elements:
    queue = store.getSequence('queue')
    for elem in queue.cursor():
        print elem
Cursors can be positioned using setToFirst(), setToLast() and setToKey(). To print out all groups whose names start with the letter "f":
    cur = groups.cursor()
    cur.setToKey('f')
    for name, members in cur:
        if not name.startswith('f'):
            break
        print name, members
Note that map table keys are sorted lexically.
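The loop above works because lexical ordering puts every key with a given prefix in one contiguous run. Here is a self-contained sketch of the same idea using a sorted Python list and the standard bisect module in place of an ODB table and cursor. The prefix_scan helper is hypothetical, and whether ODB compares keys exactly the way Python compares strings is an assumption.

```python
import bisect

def prefix_scan(sorted_keys, prefix):
    """Yield the keys that start with prefix from a lexically sorted list."""
    # seek to the first key >= prefix, much like a partial-match seek
    pos = bisect.bisect_left(sorted_keys, prefix)
    for key in sorted_keys[pos:]:
        if not key.startswith(prefix):
            # keys are sorted, so nothing after this point can match
            break
        yield key

keys = sorted(['comrades', 'family', 'friends', 'clowns', 'golfers', 'Friends'])
matches = list(prefix_scan(keys, 'f'))
```

Note that in plain Python string ordering, 'Friends' sorts before all of the lowercase names, so it is not part of the "f" range.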
setToKey() defaults to a partial match - it finds the first key beginning with the specified substring. You can also find an exact match:
    cur.setToKey('foo', exact = True)
The setToKey() method also works on sequence tables; in this case the key is the index:
    # start iteration at position 10
    cur.setToKey(10)
    for elem in cur:
        print elem
It is often useful to traverse a table in reverse, so the cursor supports a reverse() method which returns a reverse cursor at the same position:
    # go backwards through our group list
    cur = groups.cursor()
    cur.setToLast()
    cur = cur.reverse()
    for group, members in cur:
        print group, members
The semantics of cursors are conceptually the same as those of sequence indices in Python: a cursor can be thought of as pointing to the space between two elements. So, for example, for any non-empty table:
    cur = table.cursor()
    first = cur.next()
    print first == cur.reverse().next()  # always prints "True" (assuming
                                         # comparison works as expected)
This could yield unexpected results when dealing with reverse iterators:
    cur = groups.cursor().reverse()
    cur.setToKey('f')
    print cur.next()  # prints the (group, members) _before_ the first group
                      # starting with "f"
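The "space between two elements" idea is easier to see with a toy model. The sketch below is not ODB code: ToyCursor is a hypothetical cursor over a plain sorted list, and set_to_key only mimics the partial-match positioning described above. The position is a gap number: 0 is before the first element and len(items) is after the last.

```python
class ToyCursor(object):
    """Hypothetical cursor over a list; pos is a gap between elements."""

    def __init__(self, items, pos=0, forward=True):
        self.items = items
        self.pos = pos
        self.forward = forward

    def next(self):
        if self.forward:
            if self.pos >= len(self.items):
                raise StopIteration
            # return the element after the gap, then step over it
            item = self.items[self.pos]
            self.pos += 1
        else:
            if self.pos <= 0:
                raise StopIteration
            # step back over the element before the gap and return it
            self.pos -= 1
            item = self.items[self.pos]
        return item

    def reverse(self):
        # same gap, opposite direction
        return ToyCursor(self.items, self.pos, not self.forward)

    def set_to_key(self, prefix):
        # park in the gap just before the first item >= prefix
        self.pos = 0
        while self.pos < len(self.items) and self.items[self.pos] < prefix:
            self.pos += 1

names = ['comrades', 'family', 'friends', 'golfers']

cur = ToyCursor(names)
first = cur.next()
same = cur.reverse().next()

rev = ToyCursor(names).reverse()
rev.set_to_key('f')
before_f = rev.next()
```

Here first and same are both 'comrades', and before_f is also 'comrades': the reverse cursor sits in the gap before 'family', so the next element in its direction is the one before the "f" range.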
Everything up to this point has been focused on the storage API, which lets you store and retrieve objects from tables. This is all well and good, but in most applications there is a need for some higher-level features. We typically want to be able to do things like compose keys from attribute values, or define indices on tables. This is where the "model" module comes in.
Let's say we wanted to improve upon our mailing list example so that groups could contain hundreds of users. We might want to define a few objects like this:
    class Group:
        "Groups have an id and a description."

        def __init__(self, id, desc):
            self.id = id
            self.desc = desc

    class Member:
        "Members have a group and an e-mail address."

        def __init__(self, group, email):
            self.group = group
            self.email = email
The kinds of things that we want to do are the same as in the previous example:
Look up all members of a group
Look up all groups that an address is a member of.
We can do this by making our objects Model objects and defining schemas for them. Schemas define mappings between objects and their tables and indices. They allow you to specify a list of object attributes to be used to define keys for the tables and indices.
To make use of these features, we would rewrite our classes as follows:
    from odb.model import Model, Schema, WILD

    class Group(Model):
        "Groups have an id and a description."

        # groups are in table "Group", the key is the group id.
        _schema = Schema('Group', ('id',))

        def __init__(self, id, desc):
            self.id = id
            self.desc = desc

        def iterAllMembers(self):
            # iterate over the list of Member objects whose key starts with the
            # group id
            for key, member in Member.select(self.id, WILD):
                yield member

        def addMember(self, email):
            Member(self.id, email).put()

        def removeMember(self, email):
            Member.get(self.id, email).delete()

    class Member(Model):
        "Members have a group and an e-mail address."

        # members are in table "Member" (where the key is the group and e-mail)
        # and are also indexed by email and group.
        _schema = Schema('Member', ('group', 'email'),
                         indeces = {'Email': ('email', 'group')}
                         )

        def __init__(self, group, email):
            self.group = group
            self.email = email
Index and table keys are defined by a tuple of attribute names that are used to compose the key values. These are sorted lexically in map tables, so if we want to be able to do something like iterating over all of the members of the group, we want the key to begin with the group id so that all of the members of the group can be found in a contiguous range within the table or index.
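To see why the order of attributes in a key matters, here is a self-contained sketch that uses sorted Python tuples as stand-ins for composed keys. How ODB actually encodes composed keys is not described here, so the tuples and the key_range helper are assumptions for illustration. The primary keys are (group, email) and the index keys are (email, group), mirroring the schema above.

```python
import bisect

# (group, email) pairs, in no particular order
members = [('friends', 'joey@yahoo.com'),
           ('comrades', 'lester@nester.com'),
           ('friends', 'frodo@middleearth.org'),
           ('comrades', 'joey@yahoo.com')]

# primary keys are (group, email); the index swaps the order
primary = sorted(members)
email_index = sorted((email, group) for group, email in members)

def key_range(sorted_keys, first):
    """Return the contiguous run of keys whose first component is `first`."""
    # (first,) sorts immediately before every (first, anything) tuple
    lo = bisect.bisect_left(sorted_keys, (first,))
    hi = lo
    while hi < len(sorted_keys) and sorted_keys[hi][0] == first:
        hi += 1
    return sorted_keys[lo:hi]

friends = [email for group, email in key_range(primary, 'friends')]
joeys_groups = [group for email, group in key_range(email_index, 'joey@yahoo.com')]
```

Because the primary key starts with the group, all members of 'friends' sit next to each other; because the index key starts with the e-mail address, all of joey's groups sit next to each other there.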
Keys must be unique. If you attempt to store an object with a key that is identical to that of another object, either in the main table or in any of the indices, you will get a KeyCollisionError.
The Model.select() and Model.get() methods allow you to retrieve objects by key. Model.get() is used to retrieve a specific object given its key. Model.select() is a generator that allows you to iterate over a range of key/value pairs.
Both functions take similar positional and keyword arguments: the positional
arguments are values for each attribute in the key. For example, the primary
key in our main table is ('group', 'email'), so to look up an object using
get we say "Member.get(group, email)".
TODO
storage cursors
odbq queries on model tables
filers and backups