SQL or RDF? Thoughts on Tellico's Next Backend

One of the main goals of Tellico's development has been to keep it a simple application. I wanted to be able to keep track of my books without having to configure an SQL database, create a schema, or worry about system daemons. To that end, although I considered SQLite at the time (several years ago), I ended up writing Tellico to simply store all its data in memory. The images are stored on disk, but all the field values for each entry are kept in simple object containers (vectors and hashes). The XML file format is used only for serializing the data for saving and reloading.
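
For readers curious what "in memory" means here, this is a rough sketch of that storage model in Python (Tellico itself is C++). It assumes a collection is a list of entries (the "vector") and each entry is a dict mapping field name to value (the "hash"). The XML element names are hypothetical and are not Tellico's actual file format.

```python
import xml.etree.ElementTree as ET


def add_entry(collection, **fields):
    """Append one entry (a dict of field name -> string value) to the collection list."""
    entry = dict(fields)
    collection.append(entry)
    return entry


def to_xml(collection):
    """Serialize the in-memory collection to an XML string for saving."""
    root = ET.Element("collection")  # hypothetical element names
    for entry in collection:
        entry_el = ET.SubElement(root, "entry")
        for name, value in entry.items():
            field_el = ET.SubElement(entry_el, "field", name=name)
            field_el.text = value
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Rebuild the in-memory collection from a saved XML string."""
    root = ET.fromstring(text)
    return [
        {f.get("name"): f.text or "" for f in entry_el.findall("field")}
        for entry_el in root.findall("entry")
    ]


books = []
add_entry(books, title="Dune", author="Frank Herbert")
add_entry(books, title="Emma", author="Jane Austen")
assert from_xml(to_xml(books)) == books  # a save/reload round trip is lossless
```

In this sketch every lookup is a scan over the list and every save rewrites the whole file, which is the simplicity trade-off described above.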

In practice, I believe that has worked rather well. I have received emails from folks who try to store 10,000 books in their database and find the performance lacking, but by and large, many reviews have noted favorably that Tellico is simple and flexible to use and is useful for the majority of people.

I do want to expand Tellico's capabilities, however. One large goal is to get away from treating each collection as a flat list of entries. I want to be able to have books and movies in the same database, for example, and I want to be able to track TV episodes and seasons equally well. I want to be able to add information about authors and actors.
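
A minimal sketch of how those goals might look as a relational schema, using Python's built-in sqlite3 module. The table and column names (entry, person, credit, kind, parent_id) are my own hypothetical choices, not an existing Tellico design.

```python
import sqlite3

# Hypothetical schema: one table for every kind of item, one for people,
# and a join table recording who did what on which item.
SCHEMA = """
CREATE TABLE entry (
    id        INTEGER PRIMARY KEY,
    kind      TEXT NOT NULL,                 -- 'book', 'movie', 'show', 'season', 'episode'
    title     TEXT NOT NULL,
    parent_id INTEGER REFERENCES entry(id)   -- episode -> season -> show
);
CREATE TABLE person (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE credit (
    entry_id  INTEGER NOT NULL REFERENCES entry(id),
    person_id INTEGER NOT NULL REFERENCES person(id),
    role      TEXT NOT NULL                  -- 'author', 'actor', 'writer', ...
);
"""


def open_db(path=":memory:"):
    """The whole database is one file (or RAM): no server, user name, or port."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_entry(conn, kind, title, parent_id=None):
    cur = conn.execute(
        "INSERT INTO entry (kind, title, parent_id) VALUES (?, ?, ?)",
        (kind, title, parent_id))
    return cur.lastrowid


def add_credit(conn, entry_id, name, role):
    # Reuse the person row if this name was seen before.
    conn.execute("INSERT OR IGNORE INTO person (name) VALUES (?)", (name,))
    (person_id,) = conn.execute(
        "SELECT id FROM person WHERE name = ?", (name,)).fetchone()
    conn.execute(
        "INSERT INTO credit (entry_id, person_id, role) VALUES (?, ?, ?)",
        (entry_id, person_id, role))


def works_by(conn, name):
    """Every entry, of any kind, that a person is credited on."""
    return conn.execute(
        """SELECT e.kind, e.title, c.role
           FROM credit c
           JOIN person p ON p.id = c.person_id
           JOIN entry e ON e.id = c.entry_id
           WHERE p.name = ?
           ORDER BY e.kind, e.title""", (name,)).fetchall()


conn = open_db()
novel = add_entry(conn, "book", "Jurassic Park")
film = add_entry(conn, "movie", "Jurassic Park")
add_credit(conn, novel, "Michael Crichton", "author")
add_credit(conn, film, "Michael Crichton", "writer")
print(works_by(conn, "Michael Crichton"))
# [('book', 'Jurassic Park', 'author'), ('movie', 'Jurassic Park', 'writer')]
```

Putting every kind of item in one table with a kind column keeps mixed collections simple; the trade-off is that type-specific fields (an ISBN, a running time) would need their own tables or a generic field table.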

To do that, I need to rewrite Tellico's backend. In considering how to go about it, I've come to a decision point: SQL or RDF.

Many highly visible KDE applications use SQL, such as Amarok, digiKam, and Akonadi. I just read a blog post about using the same MySQL instance for all three of those applications.

On the other hand, the Nepomuk framework in KDE provides an interface to an RDF database. Bangarang and KMail 2 both make heavy use of Nepomuk.

So I'm trying to work up a pro/con list.

Portability

I'd say SQL wins. Embedding or linking against SQLite means a typical user would never need to worry about database permissions, daemon persistence, or username and port settings. At the same time, for power users, the added work to make MySQL or PostgreSQL an option would be reasonable. Akonadi and digiKam have taken this approach, and until recent versions, so had Amarok.

Using Nepomuk, on the other hand, requires the full Soprano and Virtuoso toolchain. Most KDE desktops are running Virtuoso at this point, I guess, but I don't want to shut out GNOME users. And on my underpowered development box with 1 GB of RAM, I can't even use Strigi and Nepomuk.

Development Maturity

Here again, I think SQL wins. SQL (and, to some extent, SQLite) is used in so many places that I know a significant amount of work has gone into optimizing it and improving its efficiency. In other words, if database access is slow, the problem is very likely my poor programming knowledge rather than a fundamental flaw. I don't have that reassurance with RDF/SPARQL and Nepomuk. I know Nepomuk is improving, but looking at the bug reports and the fits and starts of development in the KDE code, it still seems a bit rocky.

SPARQL also has some odd semantics, such as blank nodes, the need for custom insert/replace behavior (SPARQL 1.0 defines queries only, not updates), and a lack of aggregate functions. In that sense, SPARQL is still rather immature.
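
To make two of those complaints concrete, here is a toy illustration in plain Python rather than a real RDF store. The subjects (urn:book:1 and so on) and the data are hypothetical; the Dublin Core and FOAF property URIs are real vocabularies. The author is a blank node (_:b0), whose label only has meaning inside this one graph, and because SPARQL 1.0 has no COUNT or GROUP BY, the per-author tally is done in application code.

```python
from collections import Counter

DC = "http://purl.org/dc/elements/1.1/"
FOAF = "http://xmlns.com/foaf/0.1/"

# A tiny list of (subject, predicate, object) triples standing in for an RDF store.
# "_:b0" is a blank node: the author has no URI, only a label local to this graph.
TRIPLES = [
    ("urn:book:1", DC + "title", "Dune"),
    ("urn:book:1", DC + "creator", "_:b0"),
    ("urn:book:2", DC + "title", "Dune Messiah"),
    ("urn:book:2", DC + "creator", "_:b0"),
    ("_:b0", FOAF + "name", "Frank Herbert"),
]


def match(s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in TRIPLES
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def books_per_author():
    """Client-side aggregation, standing in for the missing COUNT/GROUP BY."""
    counts = Counter()
    for _subject, _pred, author in match(p=DC + "creator"):
        name = match(s=author, p=FOAF + "name")[0][2]
        counts[name] += 1
    return dict(counts)


print(books_per_author())  # {'Frank Herbert': 2}
```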

Interoperability

I feel like I should include this factor. RDF seems to be a bit of a buzzword with the semantic database push lately. A SQL schema would be largely opaque, while an RDF store, assuming the use of common ontologies, would allow for future interoperability with other databases. This is all rather fuzzy, though, and nothing says I can't have some sort of RDF export or translation from the SQL.
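
As a rough sketch of that kind of RDF export, the Python below reads rows from SQLite and writes N-Triples using the Dublin Core element set (dc:title, dc:creator). The book table and the http://example.org/... base URI are hypothetical placeholders; a real export would want to choose its ontologies more carefully.

```python
import sqlite3

DC = "http://purl.org/dc/elements/1.1/"


def literal(text):
    """Quote a string as an N-Triples literal, escaping backslashes, quotes and newlines."""
    escaped = (text.replace("\\", "\\\\")
                   .replace('"', '\\"')
                   .replace("\n", "\\n")
                   .replace("\r", "\\r"))
    return f'"{escaped}"'


def export_ntriples(conn, base="http://example.org/tellico/entry/"):
    """Emit one N-Triples line per (entry, property) pair in the hypothetical book table."""
    lines = []
    for entry_id, title, author in conn.execute(
            "SELECT id, title, author FROM book ORDER BY id"):
        subject = f"<{base}{entry_id}>"
        lines.append(f"{subject} <{DC}title> {literal(title)} .")
        if author:
            lines.append(f"{subject} <{DC}creator> {literal(author)} .")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT NOT NULL, author TEXT)")
conn.execute("INSERT INTO book (title, author) VALUES (?, ?)", ("Dune", "Frank Herbert"))
print(export_ntriples(conn))
```

Keeping SQL as the primary store and generating RDF on demand like this is one way to hedge on interoperability without committing the backend itself to RDF.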

If I did use Nepomuk and RDF, I might even have to try to write some sort of abstraction layer to use Tracker on GNOME.

Developer Interest

I'd call this a tie! I've done some limited tinkering with both SQL and RDF/SPARQL, and I'm interested in learning more about each.

Conclusion

These are mostly just unordered thoughts bouncing around in my head. I'll all but decide to take a shot at implementing a SQL backend, and then change my mind an hour later. Plus, who's to say I can even figure out how to do any of this! I only impersonate a programmer on TV! :)

4 Comments

I did not know about Tellico. I came across it in a tweet when I was looking for something else. Looks interesting.

https://twitter.com/#!/elseinstitute/status/114661110573039616

Hi,

I've just stumbled across Tellico again on kde-apps.org.

To give you a further hint for the SQL/RDF decision, you might have a look at the bibliographic ontology draft I'm currently trying to create [1].

Together with the other ontologies available for Nepomuk, it covers most of what you need to represent your collection data.

Nepomuk is getting better and better, and hopefully integrating Nepomuk into Tellico will one day allow your collection data to be used in other KDE programs too.

[1] https://sourceforge.net/apps/trac/oscaf/ticket/124

Hey Robby,

Did you ever end up making your decision? I've been following Trueg's blog and check this out: http://trueg.wordpress.com/2012/02/06/more-fun-with-tv-shows/

How cool would it be if that worked with Tellico? We already have lots of this data in there for our comics, movies, books, etc. I sent him an email about working with you. A few months ago I was able to find your identi.ca name to suggest the Goodreads plugin (thanks for that!), but today, in the web browser I was using at work, your Tellico page looked busted, so I couldn't find your email address to CC you on my email to Trueg. Anyway, I think it'd be pretty awesome to have Tellico work with Nepomuk in that way, even if you used SQL as the backend.

If you have the time, please dent me (djotaku) and let me know if you'll consider this idea.

Thanks!

Yeah, I saw Trueg's post on that; it looks pretty neat. I've worked a bit on having Tellico interact with Nepomuk, but it's frustrating. I ranted about it on Google+ recently (just trying out G+): https://plus.google.com/105856599006558518087/posts/BLChBPTejDS I mean, I can't even run Virtuoso on my home computer with KDE 4.8; it's too slow. And that makes it hard to develop! :/

My plan right now is to try to write a generic backend, to get away from the current object hierarchy I have in Tellico, then write a SQL implementation of it along with perhaps a Soprano/Virtuoso one.
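
A minimal sketch of that plan, in Python for brevity: an abstract Backend interface, one implementation resembling the current in-memory model, and one backed by SQLite. The class and method names are hypothetical. The SQLite side uses a simple entity-attribute-value table (one row per entry and field), which is only one of several possible layouts; a Soprano/Virtuoso implementation would be another subclass and is not shown.

```python
import sqlite3
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical storage interface that the rest of the application would use."""

    @abstractmethod
    def add_entry(self, fields: dict) -> int: ...

    @abstractmethod
    def get_entry(self, entry_id: int) -> dict: ...

    @abstractmethod
    def find(self, field: str, value: str) -> list: ...


class MemoryBackend(Backend):
    """Roughly the current model: entries live in in-memory containers."""

    def __init__(self):
        self._entries = {}
        self._next_id = 1

    def add_entry(self, fields):
        entry_id = self._next_id
        self._next_id += 1
        self._entries[entry_id] = dict(fields)
        return entry_id

    def get_entry(self, entry_id):
        return dict(self._entries[entry_id])  # KeyError if missing

    def find(self, field, value):
        return sorted(i for i, e in self._entries.items() if e.get(field) == value)


class SqliteBackend(Backend):
    """Same interface, stored as one (entry_id, name, value) row per field."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.executescript("""
            CREATE TABLE IF NOT EXISTS entry (id INTEGER PRIMARY KEY);
            CREATE TABLE IF NOT EXISTS field (
                entry_id INTEGER NOT NULL REFERENCES entry(id),
                name     TEXT NOT NULL,
                value    TEXT,
                PRIMARY KEY (entry_id, name));
        """)

    def add_entry(self, fields):
        with self._conn:  # commit both inserts together, or neither
            entry_id = self._conn.execute(
                "INSERT INTO entry DEFAULT VALUES").lastrowid
            self._conn.executemany(
                "INSERT INTO field (entry_id, name, value) VALUES (?, ?, ?)",
                [(entry_id, name, value) for name, value in fields.items()])
        return entry_id

    def get_entry(self, entry_id):
        if self._conn.execute("SELECT 1 FROM entry WHERE id = ?",
                              (entry_id,)).fetchone() is None:
            raise KeyError(entry_id)  # match MemoryBackend's behavior
        rows = self._conn.execute(
            "SELECT name, value FROM field WHERE entry_id = ?", (entry_id,))
        return dict(rows.fetchall())

    def find(self, field, value):
        rows = self._conn.execute(
            "SELECT entry_id FROM field WHERE name = ? AND value = ? "
            "ORDER BY entry_id", (field, value))
        return [row[0] for row in rows]


for backend in (MemoryBackend(), SqliteBackend()):
    dune = backend.add_entry({"title": "Dune", "author": "Frank Herbert"})
    backend.add_entry({"title": "Emma", "author": "Jane Austen"})
    assert backend.get_entry(dune) == {"title": "Dune", "author": "Frank Herbert"}
    assert backend.find("author", "Jane Austen") == [2]
```

Because the UI would only see Backend, swapping storage engines becomes a matter of choosing a subclass rather than rewriting the object hierarchy.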
