Archive for February, 2009

Putting Freebase in a Star Schema

What’s Freebase?

cyclopedia
Freebase is a open database of things that exist in the world:  things like people,  places,  songs and television shows.   As of the January 2009 dump,  Freebase contained about 241 million facts,  and it’s growing all the time.  You can browse it via the web and even edit it,  much like Wikipedia.  Freebase also has an API that lets programs add data and make queries using a language called MQL.  Freebase is complementary to DBpedia and other sources of information.  Although it takes a different approach to the semantic web than systems based on RDF standards,  it interoperates with them via  linked data.

The January 2009 Freebase dump is about 500 MB in size.  Inside a bzip-compressed files,  you’ll find something that’s similar in spirit to a Turtle RDF file,  but is in a simpler format and represents facts as a collection of four values rather than just three.

Your Own Personal Freebase

To start exploring and extracting from Freebase,  I wanted to load the database into a star schema in a mysql database — an architecture similar to some RDF stores,  such as ARC.  The project took about a week of time on a modern x86 server with 4 cores and 4 GB of RAM and resulted in a 18 GB collection of database files and indexes.

This is sufficient for my immediate purposes,  but future versions of Freebase promise to be much larger:  this article examines the means that could be used to improve performance and scalability using parallelism as well as improved data structures and algorithms. Continue Reading »

Using Linq To Tell if the Elements of an IEnumerable Are Distinct

The Problem

I’ve got an IEnumerable<T> that contains a list of values:  I want to know if all of the values in that field are distinct.  The function should be easy to use a LINQ extension method and,  for bonus points,  simply expressed in LINQ itself

One Solution

First,  define an extension method

01   public static class IEnumerableExtensions {
02        public static bool AllDistinct<T>(this IEnumerable<T> input) {
03            var count = input.Count();
04            return count == input.Distinct().Count();
05        }
06    }

When you want to test an IEnumerable<T>,  just write

07 var isAPotentialPrimaryKey=CandidateColumn.AllDistinct();

Continue Reading »

Subverting XAML: How To Inherit From Silverlight User Controls

The Problem

Many Silverlighters use XAML to design the visual appearance of their applications.  A UserControl defined with XAML is a DependencyObject that has a complex lifecycle:  there’s typically a .xaml file,  a .xaml.cs file,  and a .xaml.g.cs file that is generated by visual studio. The .xaml.g.cs file is generated by Visual Studio,  and ensures that objects defined in the XAML file correspond to fields in the object (so they are seen in intellisense and available to your c# code.)  The XAML file is re-read at runtime,  and drives a process that instantiates the actual objects defined in the XAML file — a program can compile just fine,  but fail during initialization if the XAML file is invalid or if you break any of the assumptions of the system.

XAML is a pretty neat system because it’s not tied to WPF or WPF/E.  It can be used to initialize any kind of object:  for instance,  it can be used to design workflows in asynchronous server applications based on Windows Workflow Foundation.

One problem with XAML,  however,  is that you cannot write controls that inherit from a UserControl that defined in XAML.  Visual Studio might compile the classes for you,  but they will fail to initialize at at runtime.  This is serious because it makes it impossible to create subclasses that let you make small changes to the appearance or behavior of a control.

Continue Reading »