SQL or RDF? Thoughts on Tellico’s Next Backend

One of the main goals of Tellico’s development has been to be a simple application. I wanted to be able to keep track of my books without having to configure an SQL database, or create a schema, or worry about system daemons. To that end, while I thought about SQLite at the time (several years ago), I ended up writing Tellico to just store all its data in memory. The images are stored on disk, but all the field values for each entry are maintained in simple object containers (vectors and hashes…). The XML format is used only for serializing the data to save and reload.
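To make the current design concrete, here is a minimal sketch in Python of the "everything in memory, XML only for save and reload" approach. The element and attribute names (collection, entry, field) and the dict layout are assumptions made for illustration, not Tellico's actual file format or classes.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory model: a list of entries, each holding a hash of
# field name -> value. Nothing here touches a database.

def collection_to_xml(entries):
    """Serialize the in-memory entries to an XML string."""
    root = ET.Element("collection")
    for entry in entries:
        node = ET.SubElement(root, "entry", id=str(entry["id"]))
        for name, value in entry["fields"].items():
            ET.SubElement(node, "field", name=name).text = value
    return ET.tostring(root, encoding="unicode")

def collection_from_xml(text):
    """Rebuild the in-memory entries from an XML string."""
    root = ET.fromstring(text)
    return [
        {
            "id": int(node.get("id")),
            "fields": {f.get("name"): f.text or "" for f in node.findall("field")},
        }
        for node in root.findall("entry")
    ]

books = [
    {"id": 1, "fields": {"title": "Dune", "author": "Frank Herbert"}},
    {"id": 2, "fields": {"title": "Emma", "author": "Jane Austen"}},
]
xml_text = collection_to_xml(books)
restored = collection_from_xml(xml_text)
```

Every lookup or filter in this model is a scan over the whole list, which is likely part of why very large collections feel slow.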

In practice, I believe that has worked rather well. While I have received emails from folks who try to store 10,000 books in their database and find the performance lacking, by and large, I’ve seen many reviews note favorably that Tellico is simple and flexible to use and can be useful for the majority of people.

I do want to expand Tellico’s capabilities, however. One large goal is to get away from treating each collection as a flat list of entries. I want to be able to have books and movies in the same database, for example, and I want to be able to track TV episodes and seasons equally well. I want to be able to add information about authors and actors.
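As a rough illustration of what "not a flat list" could look like on the SQL side, here is a sketch using Python's built-in sqlite3 module. The table and column names (entry, person, credit, parent_id) are hypothetical choices for this example, not a proposed Tellico schema.

```python
import sqlite3

SCHEMA = """
CREATE TABLE entry (
    id        INTEGER PRIMARY KEY,
    kind      TEXT NOT NULL,                 -- 'book', 'movie', 'season', 'episode'
    title     TEXT NOT NULL,
    parent_id INTEGER REFERENCES entry(id)   -- e.g. the season an episode belongs to
);
CREATE TABLE person (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE credit (
    entry_id  INTEGER NOT NULL REFERENCES entry(id),
    person_id INTEGER NOT NULL REFERENCES person(id),
    role      TEXT NOT NULL,                 -- 'author', 'actor', 'screenwriter'
    PRIMARY KEY (entry_id, person_id, role)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

# Books, movies, seasons and episodes all live in the same table.
conn.executemany(
    "INSERT INTO entry (id, kind, title, parent_id) VALUES (?, ?, ?, ?)",
    [
        (1, "book", "The Princess Bride", None),
        (2, "movie", "The Princess Bride", None),
        (3, "season", "Season 1", None),
        (4, "episode", "Pilot", 3),
    ],
)
conn.execute("INSERT INTO person (id, name) VALUES (1, 'William Goldman')")
conn.executemany(
    "INSERT INTO credit (entry_id, person_id, role) VALUES (?, ?, ?)",
    [(1, 1, "author"), (2, 1, "screenwriter")],
)

# Everything one person is credited on, regardless of entry kind.
credits = conn.execute(
    """
    SELECT e.kind, e.title, c.role
    FROM entry e
    JOIN credit c ON c.entry_id = e.id
    JOIN person p ON p.id = c.person_id
    WHERE p.name = ?
    ORDER BY e.kind
    """,
    ("William Goldman",),
).fetchall()

# Episodes that belong to a given season.
episodes = [row[0] for row in conn.execute(
    "SELECT title FROM entry WHERE parent_id = ?", (3,)
)]
```

A person becomes a first-class record here, so extra information about authors and actors has somewhere to go.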

To that end, I need to rewrite Tellico’s backend. And in considering how I want to do that, I’ve come to a decision point about SQL vs. RDF.

Many highly-visible KDE applications use SQL, such as Amarok, Digikam, and Akonadi. I just read a blog post about using the same MySQL instance for all three of those applications.

On the other hand, the Nepomuk framework in KDE provides an interface to an RDF database. Bangarang and KMail 2 are both heavily using Nepomuk.

So I’m trying to work up a pro/con list.

Portability

I want to say that SQL wins. Embedding or linking against SQLite means a typical user would never need to worry about database permissions, daemon persistence, or username and port settings. At the same time, for power users, the added work to make MySQL or PostgreSQL an option would be reasonable. Akonadi and Digikam have taken this approach, and up until recent versions, so had Amarok.
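A small sketch of the "no daemon, no credentials" point, again in Python with the standard sqlite3 module. The file name collection.db, the book table, and the open_collection helper are hypothetical.

```python
import os
import sqlite3
import tempfile

def open_collection(path):
    """Open (or create) a collection stored in a single ordinary file."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS book (id INTEGER PRIMARY KEY, title TEXT NOT NULL)"
    )
    return conn

# No server, user name, or port: the whole database is this one file.
path = os.path.join(tempfile.mkdtemp(), "collection.db")

conn = open_collection(path)
with conn:  # commits on success, rolls back on error
    conn.execute("INSERT INTO book (title) VALUES (?)", ("Dune",))
conn.close()

# Reopening the file later gives the same data back.
conn = open_collection(path)
titles = [row[0] for row in conn.execute("SELECT title FROM book")]
conn.close()
```

Supporting MySQL or PostgreSQL as well would mostly mean swapping the connection step, provided the SQL itself stays portable.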

Using Nepomuk, on the other hand, requires the full Soprano and Virtuoso tool chain. Most KDE desktops are running Virtuoso at this point, I guess, but I don’t want to shut out the GNOME users out there. And on my underpowered development box with 1 GB of RAM, I can’t even use Strigi and Nepomuk.

Development Maturity

Here again, I think SQL wins. SQL (and to some extent, SQLite) is used in so many places, I know a significant amount of work has gone into optimizing and improving its efficiency. In other words, if the database access is slow, it’s very likely that the problem is due to my poor programming knowledge rather than a fundamental flaw. I don’t have that reassurance with RDF/SPARQL and Nepomuk. I know Nepomuk is improving, but looking at the bug reports and development fits and starts in the KDE code, it still seems a bit rocky.

SPARQL also has some weird semantics, such as blank nodes, a need for custom Insert/Replace behavior, and a lack of aggregate functions. SPARQL is still rather immature, in that sense.
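For comparison, here is how the SQL side handles two of those points, sketched with SQLite from Python. The book table and the placeholder keys (isbn-1 and so on) are made up for the example. INSERT OR REPLACE is SQLite's spelling; other databases use different syntax for the same idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT NOT NULL, author TEXT NOT NULL)"
)

rows = [
    ("isbn-1", "Emma", "Jane Austen"),
    ("isbn-2", "Persuasion", "Jane Austen"),
    ("isbn-3", "Dune", "Frank Herbert"),
    ("isbn-1", "Emma (annotated)", "Jane Austen"),  # same key: replaces the first row
]
# Insert-or-replace is a single statement keyed on the primary key.
conn.executemany("INSERT OR REPLACE INTO book VALUES (?, ?, ?)", rows)

# Aggregates are part of the query language itself.
counts = conn.execute(
    "SELECT author, COUNT(*) FROM book GROUP BY author ORDER BY author"
).fetchall()

replaced_title = conn.execute(
    "SELECT title FROM book WHERE isbn = ?", ("isbn-1",)
).fetchone()[0]
```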

Interoperability

I feel like I should include this factor. RDF seems to be a bit of a buzzword with the semantic database push lately. A SQL schema would largely be opaque, while the RDF store, assuming the use of common ontologies, would allow for future interoperability with other databases. This is all rather fuzzy, though, and there’s nothing that says I can’t have some sort of RDF export or translation from the SQL.
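To show that an RDF export from SQL need not be much work, here is a minimal sketch that dumps rows as N-Triples using the Dublin Core element vocabulary. The book table, the export_ntriples helper, and the http://example.org/tellico/entry/ base URI are assumptions for illustration. A real export would need a proper ontology mapping and fuller literal escaping.

```python
import sqlite3

DC = "http://purl.org/dc/elements/1.1/"       # Dublin Core elements namespace
BASE = "http://example.org/tellico/entry/"    # hypothetical subject URI prefix

def nt_literal(value):
    """Quote a string as an N-Triples literal, escaping the basic special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'

def export_ntriples(conn):
    """Emit one title triple and one creator triple per row of the book table."""
    lines = []
    query = "SELECT id, title, author FROM book ORDER BY id"
    for entry_id, title, author in conn.execute(query):
        subject = f"<{BASE}{entry_id}>"
        lines.append(f"{subject} <{DC}title> {nt_literal(title)} .")
        lines.append(f"{subject} <{DC}creator> {nt_literal(author)} .")
    return "\n".join(lines)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT, author TEXT)")
conn.execute("INSERT INTO book VALUES (1, 'Dune', 'Frank Herbert')")
ntriples = export_ntriples(conn)
```

The SQL schema stays private to the application, while the exported triples use a shared vocabulary that other tools can read.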

If I did use Nepomuk and RDF, I might even have to try to write some sort of abstraction layer to use Tracker on GNOME.

Developer Interest

I’d call this a tie! I’ve messed around with some limited SQL and RDF/SPARQL both, and I’m interested in learning more about both.

Conclusion

These are mostly just unordered thoughts bouncing around in my head. I’ll all but decide to take a shot at implementing a SQL backend, and then change my mind an hour later. Plus, who’s to say I can even figure out how to do any of this! I only impersonate a programmer on TV! 🙂

2 thoughts on “SQL or RDF? Thoughts on Tellico’s Next Backend”

Anonymous says:

September 18, 2011 at 1:49 am

I did not know about Tellico. I came across it in a tweet when I was looking for something else. Looks interesting.

https://twitter.com/#!/elseinstitute/status/114661110573039616

Jörg Ehrichs says:

September 28, 2011 at 5:36 am

Hi,

I’ve just stumbled across Tellico again on kde-apps.org.

To give you a further hint for the SQL/RDF decision, you might have a look at the bibliographic ontology draft I am currently trying to create [1].

Together with the other ontologies available for Nepomuk, it fulfills most of your needs for representing your collection data.

Nepomuk is getting better and better, and hopefully integrating Nepomuk into Tellico will one day allow you to use your collection data in other KDE programs too.

[1] https://sourceforge.net/apps/trac/oscaf/ticket/124


periapsis.org

The closest thing to a home page for Robby Stephenson