Dropbox Datastore, the iCloud Killer? (updated)

This week was DBX, Dropbox’s first ever developer conference. The big news as far as I’m concerned is their new Datastore API. In a break from their file-oriented past, Dropbox now has an API for syncing structured data between devices. I’ve long been a happy Dropbox user and I’ve lately been a frustrated iCloud developer. So the question is, should I care? Should you?

Some of the hype has suggested that the new API is an “iCloud killer”. As I’ve previously discussed, the term “iCloud” covers a lot of ground. Some of it, like file syncing, Dropbox already does. The new API is being compared to Core Data’s iCloud integration, hence my interest. Here I’m going to run through the Datastore API with an eye toward seeing how it compares to other options for structured data.

As an aside, the term “iCloud killer” doesn’t make a lot of sense. The only iCloud “killer” to date has been iCloud itself.

Update, 2013-07-12: I was contacted by Brian Smith, the iOS developer for Dropbox Datastore, with some clarifications and extra detail. I’ve added this new information inline below. Look for Update: for additions from Brian.

tl;dr

To summarize what follows: The Datastore API may be a good idea if your data model is not highly structured and if your users mostly already have Dropbox accounts. For anything more sophisticated or for non-Dropbox users, probably not. Or at least not without a lot of extra work. The API is by no means a replacement for SQLite or Core Data, but it’s probably enough for many apps.

Datastore API Overview

What Dropbox provides with the new API is fundamentally a fancy key-value store. It’s not a direct replacement for either Core Data or SQL, but it will be enough for many apps. Currently there are SDKs for iOS, Android, and JavaScript. The iOS SDK is not compatible with Mac apps since it uses UIKit.

Update: Brian says that Mac OS X compatibility has been very highly requested and should appear, though not during the beta period.

Data is structured using the following classes:

DBDatastore

This is the core of your structured data. There’s one default data store plus whatever other stores you decide to create. Weirdly the API tutorial says that “each datastore is either cached locally or not”. It doesn’t get any clearer than that. There’s no API to check on or to control caching. I assume this isn’t nondeterministic but it’s not clear what it means. Maybe it’s just not cached until you first download it?

Update: Brian explains that data stores are downloaded and cached when you first open them. If your app has more than one data store in the user’s account, only stores that you’ve actually opened will be cached.

DBTable

As with SQL, a table is a collection of records. But make no mistake, this is no SQL table. There’s no schema, so you can save any record in any table without any kind of consistency checks. A DBTable is really more like a named collection of records, and you can’t infer anything about the records from the fact that they belong to a table. If your code does not suck then a table probably represents a group of records which follow the same model. You’ll need to enforce that yourself though.

One interesting feature on DBTable is that conflict resolution rules can be specified on a per-field basis. The rules include preferring the local value or the remote value, or choosing the min or max value from the conflict set. The docs say that min and max use “type-specific ordering” which probably means alphabetical for strings. There’s also a “sum” option that appears to track changes to numeric values rather than actual values to get a consistent result from multiple conflicting updates.
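To make those rules concrete, here is a rough sketch of how per-field resolution could behave. This is a plain Python model, not the Dropbox SDK: the function name, the rule names as strings, and the "base" argument (the last value both sides agreed on) are all my own assumptions, and the "sum" branch is just my reading of "tracks changes rather than actual values".

```python
def resolve(rule, base, local, remote):
    """Resolve one conflicting field value under a named rule (hypothetical model)."""
    if rule == "local":
        return local
    if rule == "remote":
        return remote
    if rule == "min":
        return min(local, remote)
    if rule == "max":
        return max(local, remote)
    if rule == "sum":
        # Assumption: apply both sides' changes relative to the shared starting value.
        return base + (local - base) + (remote - base)
    raise ValueError(f"unknown rule: {rule}")
```

With a counter that started at 10, went to 13 locally and to 8 remotely, the "sum" rule in this model keeps both changes and lands on 11. Python happens to order strings by code point, which is one plausible meaning of “type-specific ordering”, but the real ordering may differ.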

DBRecord

These are individual table records, though as noted above there’s no guarantee that one record in a table is anything like the others. A record is basically a dictionary and has whatever keys you decide to assign to it. The values can be one of the following types, with the corresponding Foundation types:

  • String (NSString)
  • Boolean (NSNumber)
  • Integer (NSNumber)
  • Floating point (NSNumber)
  • Date (NSDate)
  • Bytes (NSData, up to 100kB)
  • List (a DBList, not an NSArray, more on this below).

For comparison this is more or less the same as in Core Data. Core Data also includes a decimal numeric type, which helps deal with floating point accuracy issues. It also allows more flexibility in choosing the exact size of an integer value. And of course Core Data supports transformable attributes, which are great when you want to save a value that doesn’t fit the standard types.

Because there is no schema it’s on the developer to make sure that the values for a key are actually the expected type. Your table might use a dateCreated field that should store only dates. But the Datastore API doesn’t really care if you assign a string to that field, so there are no compile-time or run-time type checks. Again, if your code does not suck then you’ll probably get the right type. But non-lousy code will of course also check just to make sure, because nobody else is paying attention here.
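The kind of check I mean could look something like the following sketch. It models a record as a plain Python dict rather than a DBRecord, and the schema dictionary, field names, and function name are hypothetical, made up for illustration.

```python
from datetime import datetime

# Hypothetical schema for a "tasks" table: field name -> expected Python type.
TASK_SCHEMA = {"title": str, "done": bool, "dateCreated": datetime}

def schema_errors(record, schema):
    """Return a list of problems; an empty list means the record fits the schema."""
    errors = []
    for field, expected in schema.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            actual = type(record[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    return errors
```

Running a check like this before every write (and ideally after every read) is the price of having no schema: the data store won’t complain, so your code has to.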

Note that the list above does not include relationships. The Datastore API has no direct support for relationships. As the docs note, each record has a unique ID, and the ID is a string, so you could store one record’s ID as a property on another record and treat it like a SQL foreign key. However that doesn’t go very far because the API doesn’t support anything like a SQL table join. You can get a foreign key but you’ll have to write your own code to do anything with it.
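For a sense of what “write your own code” means, here is a minimal sketch of a hand-rolled join. Records are modeled as plain dicts with an "id" key, and the function and field names are hypothetical. Nothing here is Dropbox API.

```python
def join_by_key(children, parents, foreign_key):
    """Pair each child record with the parent whose id it references.

    Children whose key is missing or points at no known parent are paired with None.
    """
    parents_by_id = {p["id"]: p for p in parents}
    return [(c, parents_by_id.get(c.get(foreign_key))) for c in children]
```

Note that this only works if both sets of records are already in memory, and that nothing stops a stored ID from pointing at a record that has since been deleted. Handling that dangling case is your job too.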

The binary value type is limited to 100kB, and above that you need to use Dropbox’s file API for the data and store the filename in the record instead. That’s reasonable. SQLite will accept far larger blobs, but keeping big binary data out of the database is still good practice there too. Core Data at least makes this automatic: assign a binary blob of whatever size you like and Core Data can figure out whether to put it in an external file for you.
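Doing that by hand might look roughly like this. It’s a sketch under assumptions: I’m reading “100kB” as 100,000 bytes, the "Path" field-name suffix is my own convention, and save_file is a hypothetical callback standing in for whatever file API call actually writes the data.

```python
MAX_INLINE_BYTES = 100 * 1000  # assumption: "100kB" read as 100,000 bytes

def blob_field(name, data, save_file):
    """Return the record fields to store for a blob, spilling large data to a file."""
    if len(data) <= MAX_INLINE_BYTES:
        return {name: data}
    # Hypothetical callback: writes the bytes somewhere and returns the file's path.
    path = save_file(name, data)
    return {name + "Path": path}
```

The reading side then has to check for both fields, which is exactly the bookkeeping Core Data does for you.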

Finally, while I expect it’s possible to subclass DBRecord to add custom behaviors, there’s not a lot of point in doing so. You can’t tell the datastore to use your subclass, so you’ll be dealing in plain DBRecord instances anyway.

DBList

A DBList is just an ordered collection of values which can be of any type that’s valid on a DBRecord. It’s basically a simplified mutable array that can be used as a value in a DBRecord. It can also provide values as an NSArray if you need more power than its limited API. As with DBRecord values there’s no requirement that all of the values in a list are the same type.

DBRecord allows some flexibility when setting list values. If you pass an NSArray as a value, it is transparently saved as a DBList.

Update: Brian explained that DBList uses operational transforms to maintain list ordering in the face of conflicting changes, for example rearranging a list on one device and deleting some items on a different device. So while the class API seems pretty limited, there’s more to the class than it seems at first.

Using the Datastore API

Setup

The biggest hurdle when using the API is the initial user setup. Though Dropbox is ubiquitous among the geeky cognoscenti, it’s far less common among the hoi polloi of the internet. If the user doesn’t already have a Dropbox account, there’s no way to ease the process in your app. They’ll have to go deal with Dropbox separately and then (if you’re lucky) return to your app. Assuming that your prospective user has a Dropbox account and wants to use it in your app is probably more limiting than the frequent, hated requirement of a Facebook account. It excludes the opposite half of the market, though.

Sharing

One of Dropbox’s major selling points when compared to iCloud is that files can easily be shared between different users. There is no sign that this kind of sharing exists for the Datastore API. Data stored this way does not show up like files stored in a Dropbox account. Neither the Dropbox web site nor the API has any sign of sharing options. The API provides a single-user data silo just like iCloud.

But there seems to be a weird kind of reverse sharing option. If you’re using the Datastore API you can connect to more than one Dropbox account. Unlike iCloud, you can connect multiple times to different accounts. Users wanting to share data could, it seems, create a special sharing-only Dropbox account that everyone could access. The security implications are nasty since it only works if everyone knows the password, but it looks like it would actually work.

However if you thought initial setup was a bad user experience, this is even worse. Set up two distinct accounts, share the password, sign in to both accounts in your app… Fun stuff, eh?

Threading

The API docs are completely silent on the matter of thread safety, so I assume that there is none. If this matters in your app you’ll need to handle it yourself. Choose your poison (GCD, NSLock, whatever), but keep your data store access synchronized. That’s potentially a real problem, for example with the common pattern of downloading data from a web service while keeping the UI responsive. Of course most of Core Data isn’t thread safe either. But there are well-established ways to work on more than one thread while keeping things in sync and avoiding corruption. With Dropbox Datastore this doesn’t seem to be the case, not yet at least.

Update: According to Brian, “Datastores are completely thread-safe, and every method call and property access is guaranteed to be atomic.” Documentation of this should be forthcoming. That’s really good to hear.

Looking up Data

The API for searching the data store is quite simple, maybe too simple. To find records, pass an NSDictionary to the DBTable and you’ll get any DBRecords with the same key/value pairs. Easy! Except… the results need to match exactly. No getting records where a numeric field is more or less than a reference value, no partial string matching, etc. Also, since there’s no relationship support, your query is limited to values on the target table. No looking up instances of one record type based on the values of a related record type. And there’s no way to specify a sort order. All of this could be handled by looking up records and post-processing the result. But you might end up needing to fetch all records from a table in some cases. And it’s a lot nicer to just tell your data store to sort on a particular key for you. Both Core Data and straight SQL pretty much kick the Datastore API’s ass here.
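Here’s a sketch of what that post-processing ends up looking like. Again this is a plain Python model with records as dicts. The function names are hypothetical, and exact_match only imitates the exact-match behavior described above rather than calling anything in the SDK.

```python
def exact_match(records, criteria):
    """Imitate the exact-match lookup: keep records equal on every criterion."""
    return [
        r for r in records
        if all(k in r and r[k] == v for k, v in criteria.items())
    ]

def query(records, criteria=None, predicate=None, sort_key=None):
    """Exact-match first, then do the filtering and sorting the store won't."""
    matched = exact_match(records, criteria or {})
    if predicate is not None:
        # Range checks, partial string matches, etc. all have to happen here.
        matched = [r for r in matched if predicate(r)]
    if sort_key is not None:
        matched = sorted(matched, key=lambda r: r[sort_key])
    return matched
```

Something like query(tasks, {"done": False}, lambda r: r["priority"] > 2, "title") gets you “unfinished tasks above priority 2, sorted by title”. If the exact-match criteria are empty, though, every record in the table gets pulled in before your filter ever runs.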

DBError

Most of the API seems pretty straightforward. However error status is handled via the DBError class, which subclasses NSError. Both the definition and the use of DBError in the API are a trainwreck of inconsistency and superfluous local variables. Rather than go into detail I’ll just refer you to Jon Wight’s excellent writeup. This won’t stop you using the API, but it may drive you up the wall when you do.

Update: My biggest concern with DBError was the use of DBError * method arguments where DBError ** made more sense. This was apparently a typo, and the latest beta SDK fixes it. Whether subclassing NSError is a good idea is still debatable, but with this change at least you won’t need to allocate objects you might never need. This fixes the inconsistency I noted.

Summary

See tl;dr above. It’s going to be interesting to see how this develops.