Closures, JavaScript And The Arrow Of Time

Introduction

In most written media, time progresses as you move down a page: mainstream computing languages are no different. Anonymous closures are a language mechanism that, effectively, lets programmers create new control structures. Although people associate this power with exotic dynamic languages such as Forth, Scheme and Tcl, closures are becoming a feature of mainstream languages such as JavaScript and PHP (and even static languages such as C#).

Although this article talks about issues that you’ll encounter in languages such as C# and Scheme, I’m going to focus on JavaScript written on top of the popular jQuery library. I do that because jQuery is a great toolkit that lets programmers and designers of all skill levels do a lot by writing very little code. Because jQuery smooths away low-level details, it lets us clearly illustrate the little weirdnesses of its programming model. Although things often work “just right” on the small scale, little strange things snowball in larger programs: a careful look at the little problems is a key step towards avoiding and mitigating problems in big RIA projects.

Getting a Good Track From a Garmin eTrex

Introduction

The Garmin eTrex series are popular handheld GPS units. The eTrex H is an inexpensive unit with excellent sensitivity and accuracy, and the eTrex Vista HCx is a favorite of OpenStreetMap contributors because it accepts a microSD card which can hold OSM maps. I intended to use my Vista HCx to contribute to OSM and to georeference photographs, but I was shocked to discover that eTrex units remove time information from saved tracks. This means that saved tracks aren’t useful if you want to georeference photographs with an application like GPicSync. There’s a simple solution to this problem: avoid using saved tracks, and download the “Active Track” instead.

Saved Tracks

The eTrex units “dumb down” saved tracks by (i) reducing the number of points, (ii) removing time information, and (iii) applying spatial filtering. A saved track looks like this in Garmin MapSource:

[Figure: a saved track displayed in Garmin MapSource]

This track could be useful for mapping, but because it lacks time information, it can’t be used to reference events that occur at a particular moment in time. Although the eTrex has a number of menu items for configuring the active track (for instance, to increase the rate at which points are sampled), there are no options that influence the information stored in saved tracks.

Carpictures.cc, My First Web 3.0 Site

I haven’t blogged much in the last month because I’ve been busy. I’ve got a day job, and I’m a parent, and I’ve also been working on a new site: Creative Commons Car Pictures. What you see on the surface of “Car Pictures” might be familiar, but what’s under the surface is something you’ve probably never seen before:

[Figure: the Car Pictures front page]

A collection of car pictures under a Creative Commons license, it was built from a taxonomy that was constructed from DBpedia and Freebase. My editor and I then used a scalable process to clean up the taxonomy, find matching images and enrich image metadata. As you can imagine, this is a process that can be applied to other problem domains: we’ve tested some of the methodology on our site Animal Photos!, but this is the first site designed from start to finish based on the new technology.

Car Pictures was made possible by data sets available on the semantic web, and it will soon export data via the semantic web itself. We’re currently talking with people about exporting our content in a machine-readable linked data format. In our future plans, this will enable a form of semantic search that will revolutionize multimedia search: rather than searching for imprecise keywords, it will become possible to look up pictures, video and documents about named entities found in systems such as Freebase, Wikipedia, WordNet, Cyc and OpenCalais.

In the next few days we’re going to watch carefully how Car Pictures interacts with users and with web crawlers. Then we’ll be adding content, community features, and a linked data interface. In the meantime, we’re planning to build something at least a hundred times bigger.

Quite literally, thousands of people have contributed to Car Pictures, but I’d like to particularly thank Dan Brickley, who’s been helpful in the process of interpreting DBpedia with his ARC2 RDF tools, and Kingsley Idehen, who’s really helped me sharpen my vision of what the semantic web can be.

Anyway, I’d love any feedback that you can give me about Car Pictures and any help I can get in spreading the word about it. If you’re interested in making a similar site about some other subject, please contact me: it’s quite possible that we can help.

First-Class Functions and Logical Negation in C#

Introduction

Languages such as LISP, ML, OCaml, F# and Scala have supported first-class functions for a long time. Functional programming features are gradually diffusing into mainstream languages such as C#, JavaScript and PHP. In particular, lambda expressions, implicit typing, and delegate autoboxing have made C# 3.0 a much more expressive language than its predecessors.

In this article, I develop a simple function that acts on functions: given a boolean function f, F.Not(f) returns a new boolean function which is the logical negation of f. (That is, F.Not(f)(x) == !f(x).) Although the end function is simple to use, I had to learn a bit about the details of C# to get the behavior I wanted; this article records that experience.
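
To make the idea concrete, here is a minimal sketch of what a function like F.Not could look like. It is an illustration written for this summary and not necessarily the implementation the full article arrives at; the generic signature built on Func<T, bool> and the Demo class are assumptions.

```csharp
using System;

public static class F {
    // Sketch (assumed signature): wrap f in a closure that inverts its result.
    public static Func<T, bool> Not<T>(Func<T, bool> f) {
        return x => !f(x);
    }
}

public static class Demo {
    public static void Main() {
        Func<int, bool> isEven = n => n % 2 == 0;
        Func<int, bool> isOdd = F.Not(isEven);

        Console.WriteLine(isOdd(3));   // True
        Console.WriteLine(isOdd(4));   // False
    }
}
```

With this signature, the compiler cannot infer T from a bare lambda, so a direct call needs an explicit type argument, for example F.Not<int>(n => n % 2 == 0).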

Putting Freebase in a Star Schema

What’s Freebase?

Freebase is an open database of things that exist in the world: things like people, places, songs and television shows. As of the January 2009 dump, Freebase contained about 241 million facts, and it’s growing all the time. You can browse it via the web and even edit it, much like Wikipedia. Freebase also has an API that lets programs add data and make queries using a language called MQL. Freebase is complementary to DBpedia and other sources of information. Although it takes a different approach to the semantic web than systems based on RDF standards, it interoperates with them via linked data.

The January 2009 Freebase dump is about 500 MB in size. Inside the bzip-compressed file, you’ll find something that’s similar in spirit to a Turtle RDF file, but is in a simpler format and represents facts as a collection of four values rather than just three.

Your Own Personal Freebase

To start exploring and extracting from Freebase, I wanted to load the database into a star schema in a MySQL database, an architecture similar to some RDF stores, such as ARC. The project took about a week of time on a modern x86 server with 4 cores and 4 GB of RAM and resulted in an 18 GB collection of database files and indexes.

This is sufficient for my immediate purposes, but future versions of Freebase promise to be much larger: this article examines the means that could be used to improve performance and scalability using parallelism as well as improved data structures and algorithms.

Using Linq To Tell if the Elements of an IEnumerable Are Distinct

The Problem

I’ve got an IEnumerable<T> that contains a list of values: I want to know if all of the values in it are distinct. The function should be easy to use as a LINQ extension method and, for bonus points, simply expressed in LINQ itself.

One Solution

First, define an extension method:

    using System.Collections.Generic;
    using System.Linq;

    public static class IEnumerableExtensions {
        // True when no value occurs more than once. Enumerates the input twice.
        public static bool AllDistinct<T>(this IEnumerable<T> input) {
            var count = input.Count();
            return count == input.Distinct().Count();
        }
    }

When you want to test an IEnumerable<T>, just write:

    var isAPotentialPrimaryKey = CandidateColumn.AllDistinct();
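
Because the extension method above calls both Count() and Distinct().Count(), it enumerates its input twice. As a hedged alternative, the sketch below makes a single pass and stops at the first duplicate. The names AllDistinctOnePass and DistinctSketch are hypothetical and do not come from the original post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctSketch {
    // Hypothetical one-pass variant: HashSet<T>.Add returns false when the
    // value is already present, so All() stops at the first duplicate.
    public static bool AllDistinctOnePass<T>(this IEnumerable<T> input) {
        var seen = new HashSet<T>();
        return input.All(x => seen.Add(x));
    }
}

public static class Demo {
    public static void Main() {
        var ids = new[] { 1, 2, 3 };
        var withRepeat = new[] { 1, 2, 2 };

        Console.WriteLine(ids.AllDistinctOnePass());        // True
        Console.WriteLine(withRepeat.AllDistinctOnePass()); // False
    }
}
```

The trade-off is that this version is no longer a pure LINQ expression: it relies on the side effect of seen.Add inside the predicate.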

Subverting XAML: How To Inherit From Silverlight User Controls

The Problem

Many Silverlighters use XAML to design the visual appearance of their applications. A UserControl defined with XAML is a DependencyObject that has a complex lifecycle: there’s typically a .xaml file, a .xaml.cs file, and a .xaml.g.cs file. The .xaml.g.cs file is generated by Visual Studio, and ensures that objects defined in the XAML file correspond to fields in the object (so they are seen in IntelliSense and available to your C# code). The XAML file is re-read at runtime, and drives a process that instantiates the actual objects defined in the XAML file. As a result, a program can compile just fine, but fail during initialization if the XAML file is invalid or if you break any of the assumptions of the system.

XAML is a pretty neat system because it’s not tied to WPF or WPF/E. It can be used to initialize any kind of object: for instance, it can be used to design workflows in asynchronous server applications based on Windows Workflow Foundation.

One problem with XAML, however, is that you cannot write controls that inherit from a UserControl that is defined in XAML. Visual Studio might compile the classes for you, but they will fail to initialize at runtime. This is serious because it makes it impossible to create subclasses that let you make small changes to the appearance or behavior of a control.

Twitter Joins Me

I’ve watched Twitter from a distance for the past year or so, sometimes making fun of it in blog comments, but I never actually joined.

Last week I was looking at my web server log with the good old tail -f, and found that several other bloggers had hotlinked the copy of the Twitter fail whale that was in my old “What do you do if you catch an exception?” post. It turns out that my copy of the whale currently ranks #1 in Google Image Search. It’s not bringing in a vast amount of traffic, but it seems to be really engaging people, because Google blog search is finding a new reference to the image just about every day.

I’ve been spending a lot of time developing sites where that’s the whole idea: to build virtuous circles where people find images, put them on their sites, and link back to my site, which attracts more visitors. Sometimes you can succeed at this without trying, but making a business out of it is a matter of being lucky consistently.

After that I broke down and joined Twitter. My username on Twitter is paul_houle. Right now it seems like a strange and lonely place, but I can see some discipline in expressing oneself in 140 characters. I’ve noticed quite a few characters already: everything from the very corporate people who tweet in calculated sound bites to people who tweet like /dev/random. Perhaps I can’t do anything about the “strange” bit, but perhaps I can about the “lonely” part. If you like the things that I blog about, you’re certainly invited to follow me, and I’m interested in following like-minded people.

Manipulate HTML Forms With Silverlight 2

Remote Control

Introduction

Lately I’ve been working on a web application based on Silverlight 2. The application uses a traditional web login system based on a cryptographically signed cookie. In early development, users logged in on an HTML page, which would load a Silverlight application on successful login. Users who didn’t have Silverlight installed would be asked to install it after logging in, rather than before.

Although it’s (sometimes) possible to determine what plug-ins a user has installed using JavaScript, the methods are dependent on the specific browser and the plug-ins. We went for a simple and effective method: make the login form a Silverlight application, so that users would be prompted to install Silverlight before logging in.

Our solution was to make the Silverlight application a drop-in replacement for the original HTML form. The Silverlight application controls a hidden HTML form: when a user hits the “Log In” button in the Silverlight application, the application inserts the appropriate information into the HTML form and submits it. This article describes the technique in detail.

require(), require_once() and Dynamic Autoloading in PHP

Introduction

I program in PHP a lot, but I’ve avoided using autoloaders, except when I’ve been working in frameworks, such as symfony, that include an autoloader. Last month I started working on a system that’s designed to be part of a software product line. Many scripts, for instance, are going to need to deserialize objects that didn’t exist when the script was written, so autoloading went from a convenience to a necessity.

The majority of autoloaders use a fixed mapping between class names and PHP file names. Although that’s fine if you obey a strict “one class, one file” policy, that’s a policy that I don’t follow 100% of the time. An additional problem is that today’s PHP applications often reuse code from multiple frameworks and libraries that use different naming conventions: often applications end up registering multiple autoloaders. I was looking for an autoloader that “just works” with a minimum of convention and configuration, and I found that in a recent autoloader developed by A.J. Brown.