
Putting Freebase in a Star Schema

What’s Freebase?

Freebase is an open database of things that exist in the world: things like people, places, songs and television shows. As of the January 2009 dump, Freebase contained about 241 million facts, and it’s growing all the time. You can browse it via the web and even edit it, much like Wikipedia. Freebase also has an API that lets programs add data and make queries using a language called MQL. Freebase is complementary to DBpedia and other sources of information. Although it takes a different approach to the semantic web than systems based on RDF standards, it interoperates with them via linked data.

The January 2009 Freebase dump is about 500 MB in size. Inside the bzip-compressed file, you’ll find something similar in spirit to a Turtle RDF file, but in a simpler format that represents each fact as a collection of four values rather than just three.
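To make the four-value idea concrete, here is a minimal Python sketch of reading such a dump. It assumes one fact per line with up to four tab-separated fields; the field names (source, prop, destination, value) and the helper names are illustrative choices of mine, not something the dump itself defines.

```python
import bz2
from typing import Iterator, NamedTuple


class Quad(NamedTuple):
    """One fact as four values. The field names are assumptions."""
    source: str
    prop: str
    destination: str
    value: str


def parse_quad_line(line: str) -> Quad:
    """Split one tab-separated line into a Quad, padding missing fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) > 4:
        raise ValueError(f"expected at most 4 fields, got {len(fields)}")
    # Short lines are padded with empty strings so every fact has four slots.
    fields += [""] * (4 - len(fields))
    return Quad(*fields)


def read_quads(path: str) -> Iterator[Quad]:
    """Stream facts from a bzip2-compressed dump without unpacking it first."""
    with bz2.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                yield parse_quad_line(line)
```

Streaming the file line by line keeps memory use flat, which matters when the uncompressed data is far larger than the 500 MB archive.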

Your Own Personal Freebase

To start exploring and extracting from Freebase, I wanted to load the database into a star schema in a MySQL database, an architecture similar to that of some RDF stores, such as ARC. The project took about a week on a modern x86 server with 4 cores and 4 GB of RAM and resulted in an 18 GB collection of database files and indexes.
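As a rough illustration of the idea, the sketch below builds a tiny star-style layout: a lookup table that maps each distinct string to an integer id, and a fact table holding four ids per fact. It is not the schema I actually used. The table and function names are hypothetical, and it uses Python's built-in sqlite3 module as a stand-in for MySQL so that it runs without a server.

```python
import sqlite3
from typing import Iterable, List, Sequence, Tuple

# Hypothetical schema: one lookup table for strings, one table of facts.
SCHEMA = """
CREATE TABLE IF NOT EXISTS term (
    id   INTEGER PRIMARY KEY,
    text TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS fact (
    source_id      INTEGER NOT NULL REFERENCES term(id),
    property_id    INTEGER NOT NULL REFERENCES term(id),
    destination_id INTEGER NOT NULL REFERENCES term(id),
    value_id       INTEGER NOT NULL REFERENCES term(id)
);
CREATE INDEX IF NOT EXISTS fact_by_source ON fact (source_id, property_id);
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create the tables if they do not exist yet."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def term_id(conn: sqlite3.Connection, text: str) -> int:
    """Return the integer id for a string, inserting it on first sight."""
    conn.execute("INSERT OR IGNORE INTO term (text) VALUES (?)", (text,))
    row = conn.execute("SELECT id FROM term WHERE text = ?", (text,)).fetchone()
    return row[0]


def load_facts(conn: sqlite3.Connection, quads: Iterable[Sequence[str]]) -> int:
    """Insert four-value facts, replacing each string with its term id."""
    count = 0
    for quad in quads:
        ids = tuple(term_id(conn, field) for field in quad)
        conn.execute("INSERT INTO fact VALUES (?, ?, ?, ?)", ids)
        count += 1
    conn.commit()
    return count


def facts_about(conn: sqlite3.Connection, source: str) -> List[Tuple[str, str, str]]:
    """Join the ids back to strings for every fact with the given source."""
    query = """
        SELECT p.text, d.text, v.text
        FROM fact f
        JOIN term s ON s.id = f.source_id
        JOIN term p ON p.id = f.property_id
        JOIN term d ON d.id = f.destination_id
        JOIN term v ON v.id = f.value_id
        WHERE s.text = ?
        ORDER BY f.rowid
    """
    return conn.execute(query, (source,)).fetchall()
```

Storing integers in the fact table keeps rows small and fixed-width, at the cost of a join whenever the original strings are needed. A real load of hundreds of millions of facts would also need batching and bulk inserts, which this sketch leaves out.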

This is sufficient for my immediate purposes, but future versions of Freebase promise to be much larger. This article examines how parallelism, along with improved data structures and algorithms, could be used to improve performance and scalability.