<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet href="http://feeds.feedburner.com/~d/styles/rss2full.xsl" type="text/xsl" media="screen"?><?xml-stylesheet href="http://feeds.feedburner.com/~d/styles/itemcontent.css" type="text/css" media="screen"?><rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:wfw="http://wellformedweb.org/CommentAPI/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:feedburner="http://rssnamespace.org/feedburner/ext/1.0" version="2.0">

<channel>
	<title>Generation 5</title>
	
	<link>http://gen5.info/q</link>
	<description>Towards Intelligent Systems</description>
	<pubDate>Tue, 04 Nov 2008 17:06:29 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.6</generator>
	<language>en</language>
			<atom10:link xmlns:atom10="http://www.w3.org/2005/Atom" rel="self" href="http://feeds.feedburner.com/Generation5" type="application/rss+xml" /><feedburner:emailServiceId>1885699</feedburner:emailServiceId><feedburner:feedburnerHostname>http://www.feedburner.com</feedburner:feedburnerHostname><item>
		<title>Nested Sets, PHP, Verb Objects and Noun Objects</title>
		<link>http://feeds.feedburner.com/~r/Generation5/~3/442286550/</link>
		<comments>http://gen5.info/q/2008/11/04/nested-sets-php-verb-objects-and-noun-objects/#comments</comments>
		<pubDate>Tue, 04 Nov 2008 17:06:29 +0000</pubDate>
		<dc:creator>Paul Houle</dc:creator>
		
		<category><![CDATA[Asynchronous Communications]]></category>

		<category><![CDATA[Th]]></category>

		<guid isPermaLink="false">http://gen5.info/q/?p=101</guid>
		<description><![CDATA[Introduction
Controversy persists to this day about the relative merits of dynamic languages such as PHP and Python versus static languages such as C# and Java.  We&#8217;re finding more and more that the difference isn&#8217;t so much about static or dynamic typing,  but more about the cultures of different languages.   In this article,  I discuss an [...]]]></description>
			<content:encoded><![CDATA[<h2>Introduction</h2>
<p><a href="http://blog.dhananjaynene.com/2008/09/commentary-on-python-from-a-java-programming-perspective/">Controversy persists to this day</a> about the relative merits of dynamic languages such as PHP and Python versus static languages such as C# and Java.  We&#8217;re finding more and more that the difference isn&#8217;t so much about static or dynamic typing,  but more about the cultures of different languages.   In this article,  I discuss an efficient representation of trees in an SQL database,  an algorithm for creating that representation,  and a PHP implementation.  The PHP implementation uses objects in a way foreign to many developers:  rather than using objects to represent nouns (data),  it uses a class to represent a verb (an algorithm.)  I make the case that programmers shouldn&#8217;t feel compelled to create new classes to represent every data item:  verb objects often provide the right level of abstraction for many tasks.</p>
<h2>The Presenting Problem</h2>
<p>Lately I&#8217;ve been <a href="http://animalphotos.info/a/">collecting pictures of animals</a>,  and decided that incorporating the taxonomic database from <a href="http://www.itis.gov/">ITIS</a> would be a big help.  I&#8217;m interested in asking questions like &#8220;What are all the species underneath the <a href="http://en.wikipedia.org/wiki/Tapir">Tapiridae</a> family?&#8221;  The ITIS database uses the <em>adjacency list</em> representation,  where each row contains a column that references the primary key of a parent row.  Algorithms for the adjacency list are well known,  but are awkward to implement in SQL since it takes multiple SQL statements to traverse a tree.</p>
<p><em>Nested sets</em> are an alternative representation that makes it simple to write fast queries on trees.  Like the parts explosion diagram below,  components of the hierarchy are represented with contiguous numbers (parts 1-3 form one end piece of the steering knuckle.)</p>
<p><a href="http://gen5.info/q/wp-content/uploads/2008/09/steering-knuckle.png"><img class="alignnone size-full wp-image-103" title="steering-knuckle" src="http://gen5.info/q/wp-content/uploads/2008/09/steering-knuckle.png" alt="" width="439" height="416" /></a></p>
<p>This article discusses the adjacency list and nested set models and presents a simple algorithm for converting an adjacency list into nested sets.</p>
<h2>Adjacency lists</h2>
<p>A common representation of a tree in a relational database is like this:</p>
<pre>create table obviousTree (
  id varchar(255),
  parent varchar(255),
  primary key(id),
  foreign key(parent) references obviousTree(id)
)</pre>
<p>The <em>parent</em> column is allowed to be <em>null</em>,  so the root of the tree has a <em>null</em> parent.  A typical tree in this representation might look like</p>
<pre>sql&gt; SELECT * FROM obviousTree;
   +--------------+
   | id  | parent |
   |-----|--------|
   | 'a' |  null  |
   | 'b' |  'a'   |
   | 'c' |  'a'   |
   | 'd' |  'b'   |
   | 'e' |  'b'   |
   | 'f' |  'b'   |
   +--------------+</pre>
<p>It&#8217;s simple in this representation to find out what the parent of a node is,</p>
<pre>SELECT parent FROM obviousTree WHERE id=@Child</pre>
<p>or to find the direct children of a node,</p>
<pre>SELECT id FROM obviousTree WHERE parent=@Parent</pre>
<p>It&#8217;s not possible,  however,  to write a single SQL statement that traverses all of the descendants of a node in MySQL 5.0 or in many other databases.  (SQL:1999 does define recursive queries,  written with <em>WITH RECURSIVE</em>,  but support for them is far from universal.)  You need to either write a stored procedure or write a program in a language like C# or PHP that implements a breadth-first or depth-first traversal of the tree.  This isn&#8217;t conceptually hard,  but it&#8217;s inefficient,  since it takes many queries to walk the tree,  and it doesn&#8217;t take full advantage of the querying power of SQL.</p>
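<p>To make the program-side traversal concrete, here is a minimal sketch in PHP 7 of a breadth-first search. The <em>descendants()</em> function is hypothetical and doesn't come from any library. It assumes the whole adjacency list has already been loaded into an array that maps each parent id to its children, using the sample tree above. Against the database, each expansion step would be another query.</p>

```php
<?php
// Breadth-first search over an adjacency list already loaded into memory.
// $link maps each parent id to an array of its direct children's ids,
// using the sample tree from the obviousTree table above.
function descendants(array $link, string $root): array
{
    $found = [];
    $queue = [$root];
    while ($queue) {
        $id = array_shift($queue);             // oldest node waiting to be expanded
        foreach ($link[$id] ?? [] as $child) { // leaves have no entry in $link
            $found[] = $child;
            $queue[] = $child;                 // expand its children later
        }
    }
    return $found;
}

$link = ['a' => ['b', 'c'], 'b' => ['d', 'e', 'f']];
echo implode(',', descendants($link, 'b')), "\n";   // prints d,e,f
```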
<h2>Nested Sets</h2>
<p>Joe Celko has promoted the Nested Set representation of trees in several of his <a href="http://www.amazon.com/gp/product/1558609202?ie=UTF8&amp;tag=honeymediasys-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=1558609202">books</a>.  In the nested set representation,  we represent the position of each node in the tree with two numbers:  the <em>left</em> value and the <em>right</em> value.  We&#8217;ll call them <em>lft</em> and <em>rgt</em> in our code,  since <em>LEFT</em> and <em>RIGHT</em> are reserved words in SQL.</p>
<p><a href="http://gen5.info/q/wp-content/uploads/2008/09/nestedset.png"><img class="alignnone size-full wp-image-102" title="nestedset" src="http://gen5.info/q/wp-content/uploads/2008/09/nestedset.png" alt="" width="472" height="313" /></a></p>
<p>The left and right values of a parent node enclose the left and right values of its children,  so it&#8217;s easy to ask for the descendants of a node</p>
<pre>SELECT * FROM fastTree WHERE
   lft&gt;@AncestorLft
   AND lft&lt;@AncestorRgt</pre>
<p>to count the descendants (each descendant uses up two of the numbers between <em>lft</em> and <em>rgt</em>)</p>
<pre>SELECT (rgt-lft-1)/2 FROM fastTree WHERE lft=@AncestorLft</pre>
<p>or to find all the ancestors of a node</p>
<pre>SELECT * FROM fastTree WHERE
   lft&lt;@DescendantLft
   AND rgt&gt;@DescendantRgt</pre>
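<p>The three queries are just interval tests, so you can check them by hand. The sketch below is a minimal PHP 7.1+ version that runs the same predicates over an in-memory array. It assumes the sample tree has been numbered by the tally-counter procedure described below, which gives a=1..12, b=2..9, d=3..4, e=5..6, f=7..8 and c=10..11. The function names are hypothetical.</p>

```php
<?php
// Nested-set values for the six-node sample tree: id => [lft, rgt].
$fastTree = [
    'a' => [1, 12], 'b' => [2, 9], 'd' => [3, 4],
    'e' => [5, 6],  'f' => [7, 8], 'c' => [10, 11],
];

// Same test as "lft>@AncestorLft AND lft<@AncestorRgt".
function descendantsOf(array $tree, string $id): array
{
    [$lft, $rgt] = $tree[$id];
    return array_keys(array_filter($tree, function ($n) use ($lft, $rgt) {
        return $n[0] > $lft && $n[0] < $rgt;
    }));
}

// Same test as "lft<@DescendantLft AND rgt>@DescendantRgt".
function ancestorsOf(array $tree, string $id): array
{
    [$lft, $rgt] = $tree[$id];
    return array_keys(array_filter($tree, function ($n) use ($lft, $rgt) {
        return $n[0] < $lft && $n[1] > $rgt;
    }));
}

// Same arithmetic as "(rgt-lft-1)/2": every descendant uses two numbers.
function descendantCount(array $tree, string $id): int
{
    [$lft, $rgt] = $tree[$id];
    return intdiv($rgt - $lft - 1, 2);
}

echo implode(',', descendantsOf($fastTree, 'b')), "\n";  // prints d,e,f
echo implode(',', ancestorsOf($fastTree, 'e')), "\n";    // prints a,b
echo descendantCount($fastTree, 'a'), "\n";              // prints 5
```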
<p>Granted,  update operations are slower and more complex in the nested set representation,  but for my application,  where I&#8217;m accessing a nearly 500,000-node tree that never changes,  nested sets are the clear choice.</p>
<h2>The Conversion Algorithm in PHP</h2>
<p>Looking at the tree above,  you can see a straightforward algorithm for creating the nested set representation.  We traverse the tree depth-first,  keeping a counter,  which we&#8217;ll call <em>cnt</em>,  as we go.  It works a lot like a thumb-operated tally counter:</p>
<p><a href="http://gen5.info/q/wp-content/uploads/2008/09/200handtallycounterchrome.jpg"><img class="alignnone size-full wp-image-104" title="200handtallycounterchrome" src="http://gen5.info/q/wp-content/uploads/2008/09/200handtallycounterchrome.jpg" alt="" width="200" height="229" /></a></p>
<p>When we first encounter a node (going down),  we write <em>cnt</em> into the <em>lft</em> field of that node,  then we increment <em>cnt</em>.  When we encounter it again (going up),  we write <em>cnt</em> into the <em>rgt</em> field of the node and increment <em>cnt</em>.</p>
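<p>The same tally-counter procedure can also be sketched without recursion. Each node goes onto an explicit stack twice: once to be numbered on the way down and once more to be numbered on the way up. This is an alternative to the recursive <em>TreeTransformer</em> presented below, not the implementation used for the import. The <em>numberTree()</em> function is hypothetical and assumes PHP 7.1+ and an in-memory parent-to-children array.</p>

```php
<?php
// Tally-counter numbering with an explicit stack instead of recursion.
// Each stack entry is [id, visitedAlready]: a node is seen once going down
// (assign lft) and once going up (assign rgt).
function numberTree(array $link, string $root): array
{
    $cnt = 1;
    $result = [];                        // id => [lft, rgt]
    $stack = [[$root, false]];
    while ($stack) {
        [$id, $visited] = array_pop($stack);
        if ($visited) {                  // going up: all children are numbered
            $result[$id][1] = $cnt++;
            continue;
        }
        $result[$id] = [$cnt++, null];   // going down
        $stack[] = [$id, true];          // come back later for rgt
        // Push children in reverse so they pop off in their original order.
        foreach (array_reverse($link[$id] ?? []) as $child) {
            $stack[] = [$child, false];
        }
    }
    return $result;
}

$link = ['a' => ['b', 'c'], 'b' => ['d', 'e', 'f']];
foreach (numberTree($link, 'a') as $id => [$lft, $rgt]) {
    echo "$id $lft $rgt\n";              // a 1 12, b 2 9, d 3 4, ...
}
```

<p>Pushing the children in reverse keeps the visiting order the same as a recursive traversal, so both approaches hand out the same numbers. The explicit stack also sidesteps PHP&#8217;s call stack, which matters only for very deep trees.</p>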
<p>Joe Celko makes a good case for implementing mutating operations on nested sets in stored procedures (see his book on <a href="http://www.amazon.com/gp/product/1558609202?ie=UTF8&amp;tag=honeymediasys-20&amp;linkCode=as2&amp;camp=1789&amp;creative=390957&amp;creativeASIN=1558609202">Trees and Hierarchies in SQL</a>.)  I think he&#8217;s particularly right when it comes to adding,  removing and moving nodes,  where the operation requires several statements that should be done in a transaction.  Transactional integrity is less important,  however,  in a one-time import procedure that&#8217;s done in a batch.  Although MySQL 5.0 (which I&#8217;m using) supports stored procedures,  it has less of a culture of stored procedure use than other databases,  so I felt comfortable writing the conversion script in PHP.   This script converted a tree with 477,797 nodes in less than a minute,  so performance was adequate.</p>
<p>I&#8217;m going to store the nested set in a table that looks like</p>
<pre>create table fastTree (
   lft integer not null,
   rgt integer not null,
   id varchar(255),
   primary key(lft),
   unique(rgt),
   unique(id)
)</pre>
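<p>The <em>primary key</em> and <em>unique</em> constraints stop two rows from sharing a <em>lft</em> value or a <em>rgt</em> value. They don&#8217;t guarantee that the numbers form a valid nested set. Here is a minimal sketch of a consistency check you might run after an import. It assumes the rows have been fetched into a PHP array of [lft, rgt, id] triples. <em>checkNestedSet()</em> is hypothetical and needs PHP 7.1+.</p>

```php
<?php
// Returns a list of problems found in a nested-set table; an empty list
// means the invariants hold. $rows is a list of [lft, rgt, id] triples.
function checkNestedSet(array $rows): array
{
    if (!$rows) {
        return [];
    }
    $problems = [];
    $seen = [];
    foreach ($rows as [$lft, $rgt, $id]) {
        if ($lft >= $rgt) {
            $problems[] = "$id: lft is not less than rgt";
        } elseif (($rgt - $lft) % 2 === 0) {
            // rgt-lft-1 counts two numbers per descendant, so rgt-lft is odd
            $problems[] = "$id: rgt-lft is even";
        }
        $seen[] = $lft;
        $seen[] = $rgt;
    }
    // Every number from 1 to 2n must be used exactly once.
    sort($seen);
    if ($seen != range(1, 2 * count($rows))) {
        $problems[] = 'lft/rgt values are not exactly 1..' . (2 * count($rows));
    }
    // Intervals must nest or be disjoint, never partially overlap. Scanning in
    // lft order with a stack of still-open intervals avoids comparing all pairs.
    usort($rows, function ($x, $y) { return $x[0] <=> $y[0]; });
    $open = [];
    foreach ($rows as [$lft, $rgt, $id]) {
        while ($open && end($open)[1] < $lft) {
            array_pop($open);            // that interval closed before this one
        }
        if ($open && end($open)[1] < $rgt) {
            $problems[] = "$id overlaps " . end($open)[2];
        }
        $open[] = [$lft, $rgt, $id];
    }
    return $problems;
}

// Reports b and c as having even widths, and c as overlapping b.
echo implode("\n", checkNestedSet([[1, 6, 'a'], [2, 4, 'b'], [3, 5, 'c']])), "\n";
```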
<p>I include <em>&#8220;_Config.php&#8221;</em>,  which initializes the database connection,  $conn,  using the third-party ADODB library.  After that,  I retrieve the list of parent-child relationships from the database:</p>
<pre>require_once("./_Config.php");
$rawLink=$conn-&gt;Execute("SELECT id,parent FROM obviousTree");

$link=array();
foreach($rawLink as $k=&gt;$row) {
   $parent=$row["parent"];
   $child=$row["id"];
   if (!array_key_exists($parent,$link)) {
      $link[$parent]=array();
   }
   $link[$parent][]=$child;
}</pre>
<p>Note that we&#8217;re building a complete copy of the adjacency list in RAM:  the value of <em>$link[$parent]</em> is an array that contains the ids of all the children of <em>$parent</em>.  I&#8217;m doing this for two reasons:  (i) the adjacency list is small enough to fit in RAM,  and (ii) it minimizes the cost of I/O:  if you expect to access all rows in a table,  it&#8217;s a lot cheaper to do a full table scan than to do hundreds of thousands of index queries.</p>
<p>Next we define a class,  <em>TreeTransformer</em>,  that represents the algorithm.  In particular,  it provides a scope for the <em>$this-&gt;count</em> variable (the <em>cnt</em> of the tally counter),  which needs a lifespan beyond any single call of the recursive function that traverses the tree:</p>
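<p>One detail of that loop is worth knowing. PHP converts a <em>null</em> array key to the empty string, so the root row, whose parent is <em>null</em>, ends up in <em>$link[""]</em>. Below is a minimal sketch that keeps the root separate instead. It assumes the rows arrive as associative arrays, standing in for the ADODB recordset. The <em>buildLinks()</em> name is hypothetical, and the sketch needs PHP 7.1+.</p>

```php
<?php
// Builds the parent => children map from plain rows and collects the
// roots separately instead of letting them land under the key "".
function buildLinks(array $rows): array
{
    $link = [];
    $roots = [];
    foreach ($rows as $row) {
        if ($row['parent'] === null) {
            $roots[] = $row['id'];
            continue;
        }
        // PHP creates $link[$parent] as an array on the first append.
        $link[$row['parent']][] = $row['id'];
    }
    return [$link, $roots];
}

$rows = [
    ['id' => 'a', 'parent' => null], ['id' => 'b', 'parent' => 'a'],
    ['id' => 'c', 'parent' => 'a'],  ['id' => 'd', 'parent' => 'b'],
    ['id' => 'e', 'parent' => 'b'],  ['id' => 'f', 'parent' => 'b'],
];
[$link, $roots] = buildLinks($rows);
echo implode(',', $roots), "\n";   // prints a
```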
<pre>class TreeTransformer {
  protected $count;
  protected $link;

  function __construct($link) {
    $this-&gt;count=1;
    $this-&gt;link=$link;
  }

  function traverse($id) {
    $lft=$this-&gt;count;
    $this-&gt;count++;

    $kid=$this-&gt;getChildren($id);
    if ($kid) {
      foreach($kid as $c) {
        $this-&gt;traverse($c);
      }
    }
    $rgt=$this-&gt;count;
    $this-&gt;count++;
    $this-&gt;write($lft,$rgt,$id);
  }</pre>
<p><em>Traverse()</em> is the core of the algorithm.  It works in three phases:  (i) it assigns the <em>$lft</em> value,   (ii) it loops over all children,  calling itself recursively for each child,  and (iii) it assigns the <em>$rgt</em> value and writes an output record.  The scope of <em>$this-&gt;count</em> is exactly the scope we want for the variable,  and it saves us from passing <em>$count</em> back and forth between different levels of <em>traverse()</em>.  <em>Traverse()</em> calls two functions:</p>
<pre>  function getChildren($id) {
     // leaves have no entry in $this-&gt;link
     return isset($this-&gt;link[$id]) ? $this-&gt;link[$id] : array();
  }

  function write($lft,$rgt,$id) {
    global $conn;
    $conn-&gt;Execute("
       INSERT INTO fastTree
          (lft,rgt,id)
       VALUES
          (?,?,?)
    ",array($lft,$rgt,$id));
  }
}</pre>
<h2>Noun Objects or Verb Objects?</h2>
<p>This script uses a single class:  instead of using classes to represent data,  it uses a class to represent an algorithm,  a verb.  Rather than creating new classes for data structures,  I <em>reuse</em> the data structures that come with PHP.</p>
<p>This is an extensible design.</p>
<p><em>TreeTransformer</em> provides two extension points:  <em>getChildren()</em> and <em>write()</em>.  Most of the objections a person could have to this implementation could be addressed here:  for instance,  <em>getChildren()</em> could be modified to support a different data structure for the adjacency list,  or even to operate in a streaming mode that does an SQL query for each node.  <em>write()</em>,  on the other hand,  could be modified to avoid the limitations of the global <em>$conn</em>,  or to change the output format.  If <em>TreeTransformer</em> were to evolve in the future,  it would make sense to push <em>traverse()</em> up to an <em>AbstractTreeWriter </em>and define <em>getChildren()</em>,  <em>write()</em> and <em>$this-&gt;link</em> in a subclass.</p>
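<p>To make the extension points concrete, here is a sketch of the refactoring suggested above. <em>traverse()</em> moves into an abstract <em>AbstractTreeWriter</em>. A hypothetical <em>ArrayTreeWriter</em> subclass supplies <em>getChildren()</em> and <em>write()</em>, and collects rows in memory instead of writing them through the global <em>$conn</em>. It is a condensed illustration under those assumptions (PHP 7+), not the code used for the ITIS import.</p>

```php
<?php
// traverse() lives in the base class; the two extension points are abstract.
abstract class AbstractTreeWriter
{
    private $count = 1;

    public function traverse($id)
    {
        $lft = $this->count++;
        foreach ($this->getChildren($id) as $c) {
            $this->traverse($c);
        }
        $rgt = $this->count++;
        $this->write($lft, $rgt, $id);
    }

    abstract protected function getChildren($id);
    abstract protected function write($lft, $rgt, $id);
}

// Hypothetical subclass: reads an in-memory adjacency list and collects
// output rows in an array instead of INSERTing them into fastTree.
class ArrayTreeWriter extends AbstractTreeWriter
{
    public $rows = [];
    private $link;

    public function __construct(array $link)
    {
        $this->link = $link;
    }

    protected function getChildren($id)
    {
        return $this->link[$id] ?? [];   // leaves have no entry in $link
    }

    protected function write($lft, $rgt, $id)
    {
        $this->rows[] = [$lft, $rgt, $id];
    }
}

$writer = new ArrayTreeWriter(['a' => ['b', 'c'], 'b' => ['d', 'e', 'f']]);
$writer->traverse('a');
foreach ($writer->rows as [$lft, $rgt, $id]) {
    echo "$id $lft $rgt\n";
}
```

<p>Because <em>write()</em> is called on the way back up, rows come out in post-order: each node is written after all of its descendants.</p>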
<p>Noun objects (that represent things) can be useful,  but a compulsion to create noun objects can lead to an entanglement between algorithms and data structures,   poor code reuse,  and a proliferation of artifacts that makes for a high defect count and expensive maintenance.  Even if I were using noun objects in this program,  it would still make sense to implement this algorithm as a verb object:  by creating interfaces for the adjacency list and for the output,  I could keep the algorithm reusable while keeping the noun objects simple.</p>
<h2>Conclusion: Write Less Code</h2>
<p>Many of the fashionable languages that people claim improve productivity,  such as Lisp,  Python and Ruby,  put powerful data structures at the programmer&#8217;s fingertips:  they encourage you to reuse data structures provided by the language rather than to create a new object in response to every situation.  Functional languages such as CAML and F# show the power of programming by composition,  enabled by the use of standard data structures (and interfaces.)</p>
<p>What&#8217;s really exciting is that these methods are becoming increasingly available and popular in mainstream languages,  such as C# and PHP.</p>
<p>Generic classes and methods,  as available in Java and C#,  let you have both reusable data structures <em>and</em> type safety.  Although there are trade-offs,  you should seriously consider using a <em>List&lt;Thing&gt;</em> rather than creating a <em>CollectionOfThings</em>.   Yes,  a collection of things that you create can present <strong>exactly</strong> the interface you need;  however,  without a lot of care,  you might find that another member of your team wrote an incompatible <em>JohnsCollectionOfThings</em>,  and you end up writing more glue code because you&#8217;re interfacing with a third-party vendor that uses a <em>MagicBoxFullOfThings</em> instead.</p>
<p>Although there are advantages to encapsulation,  and advantages to creating noun objects (sometimes they do provide the right places for extension points),  programmers need to be careful when they create new classes.  Every class you write,  every method,  and every line of code is like a new puppy:  you not only need to write it,  but you&#8217;ll need to maintain it in the future.</p>
<p><small>Image sources: Parts Explosion Diagram of Hummer Steering Knuckle from <a href="http://www.chemaxx.com/hummmer_failure_analysis.htm">CHEMAXX</a>,  Handheld Tally Counter from <a href="http://www.difflearn.com/prodinfo.asp?number=DRT+188">Different Roads To Learning.</a></small></p>
]]></content:encoded>
			<wfw:commentRss>http://gen5.info/q/2008/11/04/nested-sets-php-verb-objects-and-noun-objects/feed/</wfw:commentRss>
		<feedburner:origLink>http://gen5.info/q/2008/11/04/nested-sets-php-verb-objects-and-noun-objects/</feedburner:origLink></item>
		<item>
		<title>What do you do when you’ve caught an exception?</title>
		<link>http://feeds.feedburner.com/~r/Generation5/~3/376264865/</link>
		<comments>http://gen5.info/q/2008/08/27/what-do-you-do-when-youve-caught-an-exception/#comments</comments>
		<pubDate>Wed, 27 Aug 2008 15:19:03 +0000</pubDate>
		<dc:creator>Paul Houle</dc:creator>
		
		<category><![CDATA[Dot Net]]></category>

		<category><![CDATA[Exceptions]]></category>

		<category><![CDATA[Java]]></category>

		<category><![CDATA[PHP]]></category>

		<category><![CDATA[SQL]]></category>

		<guid isPermaLink="false">http://gen5.info/q/?p=80</guid>
		<description><![CDATA[Abort, Retry, Ignore
This article is a follow-up to &#8220;Don&#8217;t Catch Exceptions&#8221;, which advocates that exceptions should (in general) be passed up to a &#8220;unit of work&#8221;, that is, a fairly coarse-grained activity which can reasonably be failed, retried or ignored. A unit of work could be:

an entire program,  for a command-line script,
a single [...]]]></description>
			<content:encoded><![CDATA[<h2>Abort, Retry, Ignore</h2>
<p>This article is a follow-up to &#8220;<a href="http://gen5.info/q/2008/07/31/stop-catching-exceptions/">Don&#8217;t Catch Exceptions</a>&#8221;, which advocates that exceptions should (in general) be passed up to a &#8220;unit of work&#8221;, that is, a fairly coarse-grained activity which can reasonably be failed, retried or ignored. A unit of work could be:</p>
<ul>
<li>an entire program,  for a command-line script,</li>
<li>a single web request in a web application,</li>
<li>the delivery of an e-mail message,</li>
<li>the handling of a single input record in a batch loading application,</li>
<li>rendering a single frame in a media player or a video game,  or</li>
<li>an event handler in a GUI program.</li>
</ul>
<p>The code around the unit of work may look something like</p>
<pre>[01] try {
[02]   DoUnitOfWork();
[03] } catch(Exception e) {
[04]    ... examine exception and decide what to do ...
[05] }</pre>
<p>For the most part,  the code inside <em>DoUnitOfWork()</em> and the functions it calls tries to <a href="http://gen5.info/q/2008/07/31/stop-catching-exceptions/">throw exceptions upward rather than catch them.</a></p>
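<p>Here is a minimal PHP sketch of one of the cases listed above: a batch loader in which each input record is a unit of work. The record format and the <em>processRecord()</em> and <em>loadBatch()</em> functions are hypothetical. <em>processRecord()</em> throws rather than catches. The loop around it decides whether to abort, retry or ignore, and in this sketch it logs the failure and ignores it.</p>

```php
<?php
// Hypothetical record format: "name,count". Throws on anything else.
function processRecord(string $line): array
{
    $fields = explode(',', $line);
    if (count($fields) !== 2 || !ctype_digit($fields[1])) {
        throw new InvalidArgumentException("Malformed record: $line");
    }
    return [$fields[0], (int) $fields[1]];
}

// Each record is a unit of work: one failure doesn't stop the batch.
function loadBatch(array $lines): array
{
    $loaded = [];
    $failed = 0;
    foreach ($lines as $line) {
        try {
            $loaded[] = processRecord($line);   // DoUnitOfWork()
        } catch (Exception $e) {
            // The decision point: here we log the record and skip it.
            error_log($e->getMessage());
            $failed++;
        }
    }
    return [$loaded, $failed];
}

[$loaded, $failed] = loadBatch(['tapir,4', 'bad line', 'okapi,2']);
echo count($loaded), " loaded, $failed failed\n";   // prints 2 loaded, 1 failed
```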
<p>To handle errors correctly,  you need to answer a few questions,  such as</p>
<ul>
<li>Was this error caused by a corrupted application state?</li>
<li>Did this error cause the application state to be corrupted?</li>
<li>Was this error caused by invalid input?</li>
<li>What do we tell the user,  the developers and the system administrator?</li>
<li>Could this operation succeed if it was retried?</li>
<li>Is there something else we could do?</li>
</ul>
<p>Although it&#8217;s good to depend on existing exception hierarchies (at least you won&#8217;t introduce new problems), the way that exceptions are defined and thrown inside the work unit should help the code on line [04] make a decision about what to do &#8212; such practices are the subject of a future article, which subscribers to our <a href="http://feeds.feedburner.com/Generation5">RSS feed</a> will be the first to read.</p>
<p><span id="more-80"></span></p>
<h2>The cause and effect of errors</h2>
<p>There is a certain range of error conditions that are predictable,  where it&#8217;s possible to detect the error and implement the correct response.  As an application becomes more complex,  the number of possible errors explodes,  and it becomes impossible or unacceptably expensive to implement explicit handling of every condition.</p>
<p>What to do about unanticipated errors is a controversial topic.  Two extreme positions are: (i) an unexpected error could be a sign that the application is corrupted, so that the <a href="http://blogs.msdn.com/larryosterman/archive/2008/05/01/resilience-is-not-necessarily-a-good-thing.aspx">application should be shut down</a>, and (ii) systems should <a href="http://blogs.msdn.com/eric_brechner/archive/2008/05/01/crash-dummies-resilience.aspx">bend but not break</a>: we should be optimistic and hope for the best.  Ultimately, there&#8217;s a contradiction between <em>integrity</em> and <em>availability</em>, and different systems make different choices.  The ecosystem around Microsoft Windows,  where people predominantly develop desktop applications,   is inclined to give up the ghost when things go wrong:  better to show a &#8220;blue screen of death&#8221; than to let the unpredictable happen.  In the Unix ecosystem,  more centered around server applications and custom scripts,  the tendency is to soldier on in the face of adversity.</p>
<p>What&#8217;s at stake?</p>
<p>Desktop applications tend to fail when unexpected errors happen:  users learn to save frequently.  Some of the best applications,  such as GNU Emacs and Microsoft Word,  keep a running log of changes to minimize work lost to application and system crashes.  Users accept the situation.</p>
<p>On the other hand,   it&#8217;s unreasonable for a server application that serves hundreds or millions of users to shut down on account of a cosmic ray.  Embedded systems,  in particular,  function in a world where failure is frequent and the effects must be minimized.   As we&#8217;ll see later,  it would be a real bummer if the Engine Control Unit in your car left you stranded at home because your oxygen sensor quit working.</p>
<p>The following diagram illustrates the environment of a work unit in a typical application:  (although this application accesses network resources,  we&#8217;re not thinking of it as a distributed application.  We&#8217;re responsible for the correct behavior of the application running in a single address space,  not about the correct behavior of a process swarm.)</p>
<p><a href="http://gen5.info/q/wp-content/uploads/2008/08/datadomains1.png"><img class="alignnone size-full wp-image-85" title="datadomains1" src="http://gen5.info/q/wp-content/uploads/2008/08/datadomains1.png" alt="" /></a></p>
<p>The Input to the work unit is a potential source of trouble.  The input could be invalid,  or it could trigger a bug in the work unit or elsewhere in the system (the &#8220;system&#8221; encompasses everything in the diagram.)   Even if the input is valid,  it could contain a reference to a corrupted resource elsewhere in the system.  A corrupted resource could be a damaged data structure (such as a colored box in a database),  or an otherwise malfunctioning part of the system (a crashed server or router on the network.)</p>
<p>Data structures in the work unit itself are the least problematic,  for purposes of error handling,  because they don&#8217;t outlive the work unit and don&#8217;t have any impact on future work units.</p>
<p>Static application data,  on the other hand,  persists after the work unit ends,  and this has two possible consequences:</p>
<ol>
<li>The current work unit can fail because a previous work unit caused a resource to be corrupted, and</li>
<li>The current work unit can corrupt a resource,  causing a future work unit to fail</li>
</ol>
<p>Osterman&#8217;s argument that <a href="http://blogs.msdn.com/larryosterman/archive/2008/05/01/resilience-is-not-necessarily-a-good-thing.aspx">applications should crash on errors</a> is based on this reality:  an unanticipated failure is a sign that the application is in an unknown (and possibly bad) state,  and can&#8217;t be trusted to be reliable in the future.  Stopping the application and restarting it clears out the static state,  eliminating resource corruption.</p>
<p>Rebooting the application,  however,  might not free up corrupted resources inside the operating system.  Both desktop and server applications suffer from operating system errors from time to time,  and often can get immediate relief by rebooting the whole computer.</p>
<p>The &#8220;reboot&#8221; strategy runs out of steam when we cross the line from in-RAM state to persistent state,  state that&#8217;s stored on disks,  or stored elsewhere on the network.  Once resources in the persistent world are corrupted,  they need to be (i) lived with,  or repaired by (ii) manual or (iii) automatic action.</p>
<p>In either world,  a corrupted resource can have either a narrow (blue) or wide (orange) effect on the application.  For instance,  the user account record of an individual user could be damaged,  which prevents that user from logging in.  That&#8217;s bad,  but it would hardly be catastrophic for a system that has 100,000 users.   It&#8217;s best to &#8216;ignore&#8217; this error,  because a system-wide &#8216;abort&#8217; would deny service to 99,999 other users;  the problem can be corrected when the user complains,  or when the problem is otherwise detected by the system administrator.</p>
<p>If,  on the other hand,  the cryptographic signing key that controls the authentication process were lost,  <strong>nobody</strong> would be able to log in:  that&#8217;s quite a problem.  It&#8217;s the kind of problem that will be noticed,  however,  so aborting at the work unit level (authenticated request) is enough to protect the integrity of the system while the administrators repair the problem.</p>
<p>Problems can happen at an intermediate scope as well.  For instance,  if the system has damage to a message file for Italian users,  people who use the system in the Italian language could be locked out.  If Italian speakers are 10% of the users,  it&#8217;s best to keep the system running for others while you correct the problem.</p>
<h2>Repair</h2>
<p>There are several tools for dealing with corruption in persistent data stores. In a one-of-a-kind business system, a DBA may need to intervene occasionally to repair corruption. More common events can be handled by running scripts which detect and repair corruption, much like the <em>fsck</em> command in Unix or the <em>chkdsk</em> command in Windows. Corruption in the metadata of a filesystem can, potentially, cause a sequence of events which leads to massive data loss, so UNIX systems have historically run the <em>fsck</em> command on filesystems whenever the filesystem is in a questionable state (such as after a system crash or power failure.) The time to do an fsck has become an increasing burden as disks have gotten larger, so modern UNIX systems use journaling filesystems that protect filesystem metadata with <em>transactional semantics</em>.</p>
<h2>Release and Rollback</h2>
<p>One role of an exception handler for a unit of work is to take steps to prevent corruption. This involves the release of resources, putting data in a safe state, and, when possible, the rollback of transactions.</p>
<p>Although many kinds of persistent store support transactions, and many in-memory data structures can support transactions, the most common transactional store that people use is the relational database. Although transactions don&#8217;t protect the database from all programming errors, they can ensure that neither expected nor unexpected exceptions will cause partially-completed work to remain in the database.</p>
<p>A classic example in pseudo code is the following:</p>
<pre>[06] function TransferMoney(fromAccount,toAccount,amount) {
[07]   try {
[08]      BeginTransaction();
[09]      ChangeBalance(toAccount,amount);
[10]      ... something throws exception here ...
[11]      ChangeBalance(fromAccount,-amount);
[12]      CommitTransaction();
[13]   } catch(Exception e) {
[14]      RollbackTransaction();
[15]   }
[16] }</pre>
<p>In this (simplified) example, we&#8217;re transferring money from one bank account to another. An exception thrown at line [10] could be serious, since it would cause money to appear in <em>toAccount</em> without it being removed from <em>fromAccount</em>. It&#8217;s bad enough if this happens by accident, but a clever cracker who finds a way to cause an exception at line [10] has discovered a way to steal money from the bank.</p>
<p>Fortunately we&#8217;re doing this <span style="text-decoration: underline;">financial</span> transaction inside a <span style="text-decoration: underline;">database</span> transaction.  Everything done after <em>BeginTransaction()</em> is provisional:  it doesn&#8217;t actually appear in the database until <em>CommitTransaction()</em> is called.  When an exception happens,  we call <em>RollbackTransaction()</em>,  which makes it as if the first <em>ChangeBalance()</em> had never been called.</p>
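<p>The same pattern works for in-memory data structures. Below is a minimal PHP sketch that assumes a hypothetical <em>Ledger</em> class, which implements transactions by snapshotting its balances. Unlike the simplified pseudocode above, <em>transferMoney()</em> rethrows after rolling back, so the unit of work still learns that the transfer failed.</p>

```php
<?php
// Hypothetical in-memory transactional store: begin() snapshots the
// balances, rollback() restores the snapshot, commit() discards it.
class Ledger
{
    public $balances;
    private $snapshot = null;

    public function __construct(array $balances) { $this->balances = $balances; }
    public function begin()    { $this->snapshot = $this->balances; }
    public function commit()   { $this->snapshot = null; }
    public function rollback() { $this->balances = $this->snapshot; $this->snapshot = null; }

    public function changeBalance(string $account, int $amount)
    {
        if (!array_key_exists($account, $this->balances)) {
            throw new RuntimeException("No such account: $account");
        }
        $this->balances[$account] += $amount;
    }
}

function transferMoney(Ledger $db, string $from, string $to, int $amount)
{
    $db->begin();
    try {
        $db->changeBalance($to, $amount);
        $db->changeBalance($from, -$amount);   // may throw after $to was credited
        $db->commit();
    } catch (Exception $e) {
        $db->rollback();   // undo the credit to $to
        throw $e;          // let the unit of work decide what happens next
    }
}

$db = new Ledger(['alice' => 100, 'bob' => 0]);
try {
    transferMoney($db, 'mallory', 'bob', 50);   // 'mallory' has no account
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";                // prints No such account: mallory
}
echo $db->balances['bob'], "\n";                // prints 0: the credit was rolled back
```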
<p>As mentioned in the &#8220;<a href="http://gen5.info/q/2008/07/31/stop-catching-exceptions/">Don&#8217;t Catch Exceptions</a>&#8221; article, it often makes sense to do release, rollback and repair operations in a <em>finally</em> clause rather than the unit-of-work <em>catch</em> clause, because it lets an individual subsystem take care of itself; this promotes encapsulation. However, in applications that use databases transactionally, it often makes sense to push transaction management out to the work unit.</p>
<p>Why? Complex database operations are often composed out of simpler database operations that, themselves, should be done transactionally. To take an example, imagine that somebody is opening a new account and funding it from an existing account:</p>
<pre>[17] function OpenAndFundNewAccount(accountInformation,oldAccount,amount) {
[18]    if (amount&lt;MinimumAmount) {
[19]       throw new InvalidInputException(
[20]          "Attempted To Create Account With Balance Below Minimum"
[21]       );
[22]    }
[23]    newAccount=CreateNewAccountRecords(accountInformation);
[24]    TransferMoney(oldAccount,newAccount,amount);
[25] }</pre>
<p>It&#8217;s important that the <em>TransferMoney</em> operation be done transactionally,  but it&#8217;s also important that the whole <em>OpenAndFundNewAccount</em> operation be done transactionally too,  because we don&#8217;t want an account in the system to start with a zero balance.</p>
<p>A straightforward answer to this problem is to always do banking operations inside a unit of work, and to begin, commit and roll back transactions at the work unit level:</p>
<pre>[26] AtmOutput ProcessAtmRequest(AtmInput in) {
[27]    try {
[28]       BeginTransaction();
[29]       BankingOperation op=in.ParseOperation();
[30]       var out=op.Execute();
[31]       var atmOut=AtmOutput.Encode(out);
[32]       CommitTransaction();
[33]       return atmOut;
[34]    }
[35]    catch(Exception e) {
[36]       RollbackTransaction();
[37]       ... Complete Error Handling ...
[38]    }</pre>
<p>In this case, there might be a large number of functions that are used to manipulate the database internally, but these are only accessible to customers and bank tellers through a limited set of BankingOperations that are always executed in a transaction.</p>
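<p>Here is a minimal sketch of transaction management at the work-unit level, using PHP&#8217;s PDO with an in-memory SQLite database. It assumes the <em>pdo_sqlite</em> extension is installed. The <em>account</em> table and all function names are hypothetical. Neither <em>openAndFundNewAccount()</em> nor <em>transferMoney()</em> begins or commits a transaction. <em>processRequest()</em> does, so a failure partway through funding also removes the newly created account.</p>

```php
<?php
const MINIMUM_AMOUNT = 25;   // hypothetical minimum opening balance

// Credits $to, then debits $from; throws if either account is missing.
function transferMoney(PDO $db, int $from, int $to, int $amount)
{
    $stmt = $db->prepare('UPDATE account SET balance = balance + ? WHERE id = ?');
    foreach ([[$amount, $to], [-$amount, $from]] as $args) {
        $stmt->execute($args);
        if ($stmt->rowCount() !== 1) {
            throw new RuntimeException("No such account: $args[1]");
        }
    }
}

function openAndFundNewAccount(PDO $db, string $owner, int $oldAccount, int $amount): int
{
    if ($amount < MINIMUM_AMOUNT) {
        throw new InvalidArgumentException('Attempted to create account with balance below minimum');
    }
    $db->prepare('INSERT INTO account (owner, balance) VALUES (?, 0)')->execute([$owner]);
    $newAccount = (int) $db->lastInsertId();
    transferMoney($db, $oldAccount, $newAccount, $amount);
    return $newAccount;
}

// The unit of work: the only place that begins, commits or rolls back.
function processRequest(PDO $db, callable $operation)
{
    $db->beginTransaction();
    try {
        $result = $operation($db);
        $db->commit();
        return $result;
    } catch (Exception $e) {
        $db->rollBack();
        throw $e;   // the caller still decides what to report
    }
}

$db = new PDO('sqlite::memory:', null, null, [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
$db->exec('CREATE TABLE account (id INTEGER PRIMARY KEY, owner TEXT, balance INTEGER)');
$db->exec("INSERT INTO account (owner, balance) VALUES ('alice', 100)");

try {
    // Account 999 doesn't exist, so the debit fails after the new row was inserted.
    processRequest($db, function (PDO $db) {
        return openAndFundNewAccount($db, 'bob', 999, 50);
    });
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";   // prints No such account: 999
}
echo $db->query('SELECT COUNT(*) FROM account')->fetchColumn(), "\n";   // prints 1
```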
<h2>Notification</h2>
<p>There are several parties that could be notified when something goes wrong with an application,  most commonly:</p>
<ol>
<li>the end user,</li>
<li>the system administrator,  and</li>
<li>the developers.</li>
</ol>
<p>Sometimes, as in the case of a public-facing web application, #2 and #3 may overlap. In desktop applications, #2 might not exist.</p>
<p>Let&#8217;s consider the end user first. The end user really needs to know (i) that something went wrong, and (ii) what they can do about it. Often errors are caused by user input: hopefully these errors are expected, so the system can tell the user specifically what went wrong: for instance,</p>
<pre>try {
   // ... process form information ...

   if (!IsWellFormedSSN(ssn))
      throw new InvalidInputException("You must supply a valid social security number");

   // ... process form some more ...
} catch(InvalidInputException e) {
   DisplayError(e.Message);
}</pre>
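<p>A JavaScript sketch of the same idea: a dedicated exception type for errors the user can fix, caught narrowly so that unexpected errors still propagate. <em>InvalidInputException</em>, <em>isWellFormedSSN</em> and <em>processForm</em> are hypothetical, and &#8220;well-formed&#8221; is assumed to mean only the NNN-NN-NNNN pattern.</p>

```javascript
// Hypothetical exception type for errors the user can fix by changing their input.
class InvalidInputException extends Error {
  constructor(message) {
    super(message);
    this.name = "InvalidInputException";
  }
}

// Assumption: "well-formed" means the NNN-NN-NNNN pattern only; this does not
// check whether the number was ever actually issued.
function isWellFormedSSN(ssn) {
  return /^\d{3}-\d{2}-\d{4}$/.test(ssn);
}

// Returns the message to display, or null if the form was accepted.
function processForm(form) {
  try {
    // ... process form information ...
    if (!isWellFormedSSN(form.ssn))
      throw new InvalidInputException("You must supply a valid social security number");
    // ... process form some more ...
    return null;
  } catch (e) {
    if (e instanceof InvalidInputException) return e.message; // expected: show it
    throw e; // unexpected errors go to a coarser handler
  }
}
```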
<p>Other times, unexpected errors happen. Consider a common (and bad) practice that we see in database applications: programs that build queries without correctly escaping strings:</p>
<pre>dbConn.Execute(
   "INSERT INTO people (first_name,last_name) " +
   "VALUES ('" + firstName + "','" + lastName + "')");</pre>
<p>This code is straightforward but dangerous, because a single quote in the <em>firstName</em> or <em>lastName</em> variable ends the string literal in the VALUES clause and enables an SQL injection attack. (I&#8217;d hope that <em>you</em> know better than to do this, but large projects worked on by large teams inevitably have problems of this order.) This code might even hold up well in testing, failing only in production when a person registers with</p>
<pre>lastName="O'Reilly";</pre>
<p>Now,  the dbConn is going to throw something like a <em>SqlException</em> with the following message:</p>
<pre>SqlException.Message="Invalid SQL Statement:
  INSERT INTO people (first_name,last_name)
     VALUES ('Baba','O'Reilly');"</pre>
<p>We could show that message to the end user, but it&#8217;s worthless to most people. Worse, it&#8217;s harmful if the end user is a cracker who could take advantage of the error: it tells them the name of the affected table, the names of the columns, and the exact SQL code that they can inject something into. You might be better off showing users something like:</p>
<p><a href="http://gen5.info/q/wp-content/uploads/2008/08/twitter-whale.png"><img class="alignnone size-full wp-image-92" title="twitter-whale" src="http://gen5.info/q/wp-content/uploads/2008/08/twitter-whale.png" alt="" width="400" height="300" /></a></p>
<p>and telling them that they&#8217;ve experienced an &#8220;Internal Server Error.&#8221; Even so, the discovery that a single quote can cause an &#8220;Internal Server Error&#8221; can be enough for a good cracker to sniff out the fault and develop an attack in the blind. What can we do? Warn the system administrators. The error handling system for a server application should log exceptions, stack trace and all. It doesn&#8217;t matter if you use the UNIX <em>syslog</em> mechanism, the Windows NT Event Log, or something that&#8217;s built into your server, like Apache&#8217;s <em>error_log</em>. Although logging systems are built into both Java and .Net, many developers find that <a href="http://logging.apache.org/log4j/1.2/index.html">Log4J</a> and <a href="http://logging.apache.org/log4net/index.html">log4net</a> are especially effective.</p>
<p>There really are two ways to use logs:</p>
<ol>
<li>Detailed logging information is useful for debugging problems after the fact. For instance, if a user reports a problem, you can look in the logs to find its origin. This makes rarely occurring problems much easier to debug and can save hours of trying to reproduce exactly what the user experienced.</li>
<li>A second approach to logs is proactive: regularly look at the logs to detect problems before they get reported. In the example above, the <em>SqlException</em> would probably first be thrown by an innocent person who has an apostrophe in his or her name. If the error were detected that day and quickly fixed, the security hole could be closed long before it was exploited. Organizations that investigate every exception thrown by their production web applications tend to run the most secure and reliable applications.</li>
</ol>
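<p>Both uses depend on logging at the boundary of the unit of work. A minimal sketch of such a boundary, which records the full exception for administrators and developers but shows the user only a generic message: <em>handleRequest</em> and the <em>logger</em> object are hypothetical, and in a real server the logger would be syslog, the Windows event log, Log4J, log4net or similar.</p>

```javascript
// Hypothetical boundary handler: `work` does the real job; `logger` is any
// object with an error(text) method (console works).
function handleRequest(work, logger) {
  try {
    return { status: 200, body: work() };
  } catch (e) {
    // Administrators and developers get the details, stack trace included...
    const details = e && e.stack ? e.stack : String(e);
    logger.error(new Date().toISOString() + " unhandled exception: " + details);
    // ...while the end user learns nothing about tables, columns or SQL.
    return { status: 500, body: "Internal Server Error" };
  }
}
```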
<p>In the last decade it&#8217;s become quite common for desktop applications to send stack traces back to the developers after a crash, usually after popping up a dialog box that asks for permission. Although developers of desktop applications can&#8217;t be as proactive as maintainers of server applications, this is a useful tool for discovering errors that escape testing and for learning how commonly they occur in the field.</p>
<h2>Retry I: Do it again!</h2>
<p>Some errors are transient: that is, if you try to do the same operation later, the operation may succeed. Here are a few common cases:</p>
<ul>
<li>An attempt to write to a DVD-R could fail because the disc is missing from the drive</li>
<li>A database transaction could fail when you commit it because of a conflict with another transaction: an attempt to do the transaction again could succeed</li>
<li>An attempt to deliver a mail message could fail because of problems with the network or destination mail server</li>
<li>A web crawler that crawls thousands (or millions) of sites will find that many of them are down at any given time: it needs to deal with this reasonably, rather than drop your site from its index because it happened to be down for a few hours</li>
</ul>
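<p>For cases like the transaction conflict above, where an immediate second attempt has a reasonable chance of success, a bounded retry loop fits naturally. This sketch assumes the caller can tell transient errors from permanent ones; <em>TransientError</em>, <em>withRetry</em> and <em>maxAttempts</em> are hypothetical names.</p>

```javascript
// Hypothetical marker for errors that may go away if the operation is repeated,
// such as a commit that lost a conflict with another transaction.
class TransientError extends Error {
  constructor(message) {
    super(message);
    this.name = "TransientError";
  }
}

// Run `operation` up to `maxAttempts` times. Permanent errors are rethrown at
// once; if every attempt fails transiently, the last error is rethrown.
function withRetry(operation, maxAttempts = 3) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return operation(attempt);
    } catch (e) {
      if (!(e instanceof TransientError)) throw e; // retrying will not help
      lastError = e;
    }
  }
  throw lastError;
}
```

<p>With a real database, each attempt would have to begin a fresh transaction, since the failed one has been rolled back. Retrying in a tight loop suits conflicts that clear in milliseconds; failures of remote servers call for waiting between attempts.</p>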
<p>Transient errors are commonly associated with the internet and with remote servers; errors are frequent because of the complexity of the internet, but they&#8217;re transitory because problems are repaired by both automatic and human intervention. For instance, if a hardware failure causes a remote web or email server to go down, it&#8217;s likely that somebody is going to notice the problem and fix it in a few hours or days.</p>
<p>One strategy for dealing with transient errors is to punt them back to the user: we display an error message telling the user that the problem might clear up if they retry the operation. This is implicit in how web browsers work: sometimes you try to visit a web page, you get an error message, then you hit reload and it&#8217;s all OK. This strategy is particularly effective when the user may be aware of a problem with their internet connection and can do something about it: for instance, they might discover that they&#8217;ve moved their laptop out of Wi-Fi range, or that the DSL connection at their house has gone down for the weekend.</p>
<p>SMTP, the internet protocol for email, is one of the best examples of automated retry. Compliant e-mail servers store outgoing mail in a queue: if an attempt to send mail to a destination server fails, mail will stay in the queue for several days before reporting failure to the user. Section 4.5.4 of <a href="http://www.ietf.org/rfc/rfc2821.txt">RFC 2821</a> states:</p>
<pre>   The sender MUST delay retrying a particular destination after one
   attempt has failed.  In general, the retry interval SHOULD be at
   least 30 minutes; however, more sophisticated and variable strategies
   will be beneficial when the SMTP client can determine the reason for
   non-delivery.

   Retries continue until the message is transmitted or the sender gives
   up; the give-up time generally needs to be at least 4-5 days.  The
   parameters to the retry algorithm MUST be configurable.

   A client SHOULD keep a list of hosts it cannot reach and
   corresponding connection timeouts, rather than just retrying queued
   mail items.

   Experience suggests that failures are typically transient (the target
   system or its connection has crashed), favoring a policy of two
   connection attempts in the first hour the message is in the queue,
   and then backing off to one every two or three hours.</pre>
<p>Practical mail servers use <em>fsync()</em> and other mechanisms to implement transactional semantics on the queue: the needs of reliability make it expensive to run an SMTP-compliant server, so e-mail spammers often use non-compliant servers that don&#8217;t correctly retry (if they&#8217;re going to send you 20 copies of the message anyway, who cares if only 15 get through?) <a href="http://en.wikipedia.org/wiki/Greylisting">Greylisting</a> is a highly effective filtering strategy that tests the compliance of SMTP senders by forcing a retry.</p>
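<p>The RFC 2821 policy quoted above (two attempts in the first hour, then one every two or three hours, giving up after four or five days, with every parameter configurable) can be expressed as a function that lists when delivery would be attempted. This is a hypothetical sketch: the parameter names and default values are assumptions chosen to match the RFC&#8217;s suggestions, not taken from any real mail server.</p>

```javascript
// Hypothetical, configurable retry policy modelled on RFC 2821 section 4.5.4.
const defaultPolicy = {
  earlyIntervalMinutes: 30,        // retry interval SHOULD be at least 30 minutes
  earlyWindowMinutes: 60,          // "two connection attempts in the first hour"
  lateIntervalMinutes: 150,        // "one every two or three hours"
  giveUpAfterMinutes: 5 * 24 * 60, // give-up time of at least 4-5 days
};

// Returns the times (minutes after the message was queued) at which delivery
// would be attempted if every attempt failed.
function retrySchedule(policy = defaultPolicy) {
  const times = [0]; // first attempt as soon as the message is queued
  let t = 0;
  for (;;) {
    // Use the short interval only while the next attempt still falls
    // strictly inside the early window.
    const interval = t + policy.earlyIntervalMinutes < policy.earlyWindowMinutes
      ? policy.earlyIntervalMinutes
      : policy.lateIntervalMinutes;
    t += interval;
    if (t > policy.giveUpAfterMinutes) return times; // then bounce the message
    times.push(t);
  }
}
```

<p>A real queue would store each message&#8217;s next attempt time durably (hence the <em>fsync()</em> mentioned above) and, as the RFC suggests, keep track of unreachable hosts rather than just retrying individual messages.</p>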
<h2>Retry II: If at first you don&#8217;t succeed&#8230;</h2>
<p>An alternate form of retry is to try something different. For instance, many programs in the UNIX environment look for a configuration file in several different places: if the file isn&#8217;t in the first place tried, the program tries the second place, and so forth.</p>
<p>The online e-print server at <a href="http://www.cs.cornell.edu/Courses/cs501/2005sp/concepts/arxiv.html">arXiv.org</a> has a system called <a href="http://www.cs.cornell.edu/Courses/cs501/2005sp/concepts/arxiv.html">AutoTex</a> which automatically converts documents written in several dialects of <a href="http://en.wikipedia.org/wiki/TeX">TeX</a> and <a href="http://en.wikipedia.org/wiki/LaTeX">LaTeX</a> into Postscript and PDF files. AutoTex unpacks the files in a submission into a directory and uses <em>chroot</em> to run the document processing tools in a protected sandbox. It tries about ten different configurations until it finds one that successfully compiles the document.</p>
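<p>The configuration-file search and AutoTex&#8217;s sequence of configurations share one shape: try each alternative in order and stop at the first success. A minimal sketch; <em>tryAlternatives</em>, <em>findConfigFile</em> and the candidate paths are hypothetical, and the file system is replaced by a predicate so the example runs anywhere.</p>

```javascript
// Try each alternative in order; return the first result that doesn't throw.
// If every alternative fails, throw one error that lists all of the causes.
function tryAlternatives(alternatives) {
  const failures = [];
  for (const alt of alternatives) {
    try {
      return alt.attempt();
    } catch (e) {
      failures.push(alt.name + ": " + e.message);
    }
  }
  throw new Error("all alternatives failed\n  " + failures.join("\n  "));
}

// Hypothetical config lookup: `exists` stands in for a real file-system check.
function findConfigFile(candidates, exists) {
  return tryAlternatives(candidates.map(path => ({
    name: path,
    attempt: () => {
      if (!exists(path)) throw new Error("not found");
      return path;
    },
  })));
}
```

<p>Collecting every failure, rather than only the last, gives the eventual error message enough information to explain why none of the alternatives worked.</p>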
<p>In embedded applications,  where availability is important,  it&#8217;s common to fall back to a &#8220;safe mode&#8221; when normal operation is impossible.  The Engine Control Unit in a modern car is a good example:</p>
<p><a href="http://www.vehicle-lab.net/ecu.html"><img class="alignnone size-full wp-image-94" title="ecu" src="http://gen5.info/q/wp-content/uploads/2008/08/ecu.jpg" alt="" width="416" height="267" /></a></p>
<p>Since the 1970s, regulations in the United States have reduced emissions of hydrocarbons and nitrogen oxides from passenger automobiles more than a hundredfold. The technology has many aspects, but the core of the system is an Engine Control Unit that uses a collection of sensors to monitor the state of the engine and uses this information to adjust engine parameters (such as the quantity of fuel injected) to balance performance and fuel economy with environmental compliance.</p>
<p>Because the condition of the engine, driving conditions and composition of fuel change over time, the ECU normally operates in a &#8220;closed-loop&#8221; mode that continually optimizes performance. When part of the system fails (for instance, the oxygen sensor) the ECU switches to an &#8220;open-loop&#8221; mode. Rather than leaving you stranded, it lights the &#8220;check engine&#8221; indicator and operates the engine with conservative assumptions that will get you home and to a repair shop.</p>
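<p>The same fallback can be sketched in software as a controller that normally corrects its output using a sensor reading, but switches to a fixed, conservative setting and raises a warning when the sensor fails. The class, constants and the simple proportional correction are all hypothetical; real engine control is far more involved.</p>

```javascript
// Hypothetical fuel controller illustrating closed-loop operation with an
// open-loop "safe mode" fallback. Units and constants are made up.
class FuelController {
  constructor({ baseFuel = 10, safeFuel = 11, gain = 0.5 } = {}) {
    this.baseFuel = baseFuel; // nominal fuel quantity per injection
    this.safeFuel = safeFuel; // conservative setting used in safe mode
    this.gain = gain;         // strength of the closed-loop correction
    this.checkEngine = false; // warning light; stays lit once set
  }

  // readOxygen() returns the measured deviation from the ideal mixture
  // (positive = too lean) or throws if the sensor has failed.
  nextFuelQuantity(readOxygen) {
    let deviation;
    try {
      deviation = readOxygen();
      if (!Number.isFinite(deviation)) throw new Error("implausible sensor reading");
    } catch (e) {
      this.checkEngine = true; // tell the driver, but keep the engine running
      return this.safeFuel;    // open loop: don't trust the sensor at all
    }
    return this.baseFuel + this.gain * deviation; // closed loop
  }
}
```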
<h2>Ignore?</h2>
<p>One strength of exceptions, compared to the older return-value method of error handling, is that the default behavior of an exception is to abort, not to ignore. In general, that&#8217;s good, but there are a few cases where &#8220;ignore&#8221; is the best option. Ignoring an error makes sense when:</p>
<ol>
<li>Security is not at stake,  and</li>
<li>there&#8217;s no alternative action available,  and</li>
<li>the consequences of an abort are worse than the consequences of ignoring an error</li>
</ol>
<p>The first rule is important,  because crackers will take advantage of system faults to attack a system.  Imagine,  for instance,  a &#8220;smart card&#8221; chip embedded in a payment card.  People have successfully extracted information from smart cards by fault injection:  this could be anything from a power dropout to a bright flash of light on an exposed silicon surface.  If you&#8217;re concerned that a system will be abused,  it&#8217;s probably best to shut down when abnormal conditions are detected.</p>
<p>On the other hand, some operations are peripheral to an application. Imagine, for instance, a dialog box that pops up when an application crashes and offers the user the choice of sending a stack trace to the vendor. If the attempt to send the stack trace fails, it&#8217;s best to ignore the failure: there&#8217;s no point in subjecting the user to an endless series of dialog boxes.</p>
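<p>Ignoring an error is still a deliberate decision, so it helps to make it explicit in the code. In this sketch the crash report is sent on a best-effort basis: <em>sendCrashReport</em>, <em>transport</em> and the <em>stats</em> counter are hypothetical, and any failure to send is swallowed rather than shown to the user.</p>

```javascript
// Hypothetical best-effort crash reporter. `transport(report)` does the actual
// sending and may throw; failures are deliberately ignored.
function sendCrashReport(error, userConsented, transport, stats = { dropped: 0 }) {
  if (!userConsented) return false;
  const report = { message: error.message, stack: String(error.stack) };
  try {
    transport(report);
    return true;
  } catch (ignored) {
    // Deliberately ignored: no retry and no second dialog box. The only trace
    // is a counter that a developer might inspect later.
    stats.dropped++;
    return false;
  }
}
```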
<p>&#8220;Ignoring&#8221; often makes sense in the applications that matter the most and those that matter the least.</p>
<p>For instance, media players and video games operate in a hostile environment where disks, the network, sound and controller hardware are uncooperative. The &#8220;unit of work&#8221; could be the rendering of an individual frame: it&#8217;s appropriate for entertainment devices to soldier on despite hardware defects, unplugged game controllers, network dropouts and corrupted inputs, since the consequences of failure are no worse than shutting the system down.</p>
<p>At the opposite extreme, high-value and high-risk systems should keep functioning no matter what happens. The software for a space probe, for instance, should never give up. Much like an automotive ECU, space probes default to a &#8220;safe mode&#8221; when contact with the earth is lost: frequently this strategy involves one or more reboots, but the goal is always to regain contact with controllers so that the mission has a chance at success.</p>
<h2>Conclusion</h2>
<p>It&#8217;s most practical to catch exceptions at the boundaries of relatively coarse &#8220;units of work.&#8221; Although the handling of errors usually involves some amount of rollback (restoring system state) and notification of affected people, the ultimate choices are still what they were in the days of DOS: abort, retry, or ignore.</p>
<p>Correct handling of an error requires some thought about the cause of an error: was it caused by bad input, corrupted application state, or a transient network failure? It&#8217;s also important to understand the impact the error has on the application state and to try to reduce it using mechanisms such as database transactions.</p>
<p>&#8220;Abort&#8221; is a logical choice when an error is likely to have caused corruption of the application state, or if an error was probably caused by a corrupted state. Applications that depend on network communications sometimes must &#8220;Retry&#8221; operations when they are interrupted by network failures. Another form of &#8220;Retry&#8221; is to try a different approach to an operation when the first approach fails. Finally, &#8220;Ignore&#8221; is appropriate when &#8220;Retry&#8221; isn&#8217;t available and the cost of &#8220;Abort&#8221; is worse than soldiering on.</p>
<p>This article is one of a <a href="http://gen5.info/q/category/exceptions/">series on error handling</a>. The next article in this series will describe practices for defining and throwing exceptions that give exception handlers good information for making decisions. Subscribers to our <a href="http://feeds.feedburner.com/Generation5">RSS Feed</a> will be the first to read it.</p>
]]></content:encoded>
			<wfw:commentRss>http://gen5.info/q/2008/08/27/what-do-you-do-when-youve-caught-an-exception/feed/</wfw:commentRss>
		<feedburner:origLink>http://gen5.info/q/2008/08/27/what-do-you-do-when-youve-caught-an-exception/</feedburner:origLink></item>
		<item>
		<title>Converting A Synchronous Program Into An Asynchronous Program</title>
		<link>http://feeds.feedburner.com/~r/Generation5/~3/364089717/</link>
		<comments>http://gen5.info/q/2008/08/13/converting-a-synchronous-program-into-an-asynchronous-program/#comments</comments>
		<pubDate>Wed, 13 Aug 2008 17:58:48 +0000</pubDate>
		<dc:creator>Paul Houle</dc:creator>
		
		<category><![CDATA[Asynchronous Communications]]></category>

		<category><![CDATA[GWT]]></category>

		<category><![CDATA[Silverlight]]></category>

		<guid isPermaLink="false">http://gen5.info/q/?p=49</guid>
		<description><![CDATA[Introduction
One of the challenges in writing programs in today&#8217;s RIA environments  (Javascript, Flex, Silverlight and GWT)  is expressing the flow of control between multiple asynchronous XHR calls.  A &#8220;one-click-one-XHR&#8221; policy is often best,  but you don&#8217;t always have control over your client-server protocols.  A program that&#8217;s simple to read as a synchronous program can become [...]]]></description>
			<content:encoded><![CDATA[<h2 style="text-align: left;">Introduction</h2>
<p style="text-align: left;">One of the challenges in writing programs in today&#8217;s RIA environments (Javascript, Flex, Silverlight and GWT) is expressing the flow of control between multiple asynchronous XHR calls. A &#8220;<a href="http://gen5.info/q/2008/03/27/managing-concurrency-with-asynchronous-http-requests/">one-click-one-XHR</a>&#8221; policy is often best, but you don&#8217;t always have control over your client-server protocols. A program that&#8217;s simple to read as a synchronous program can become a tangle of subroutines when it&#8217;s broken up into a number of callback functions. One answer is <em>program translation</em>: manually or automatically converting a synchronous program into an asynchronous one. Starting from the theoretical foundation, this article discusses a few ways of doing that.</p>
<p style="text-align: left;"><a href="http://www.thibaudlopez.net/">Thibaud Lopez Schneider</a> sent me a link to an interesting paper he wrote, titled &#8220;<a href="http://www.thibaudlopez.net/xhr/Writing%20effective%20asynchronous%20XmlHttpRequests.pdf">Writing Effective Asynchronous XmlHttpRequests</a>.&#8221; He presents an informal proof that you can take a program that uses synchronous function calls and common control structures such as <em>if-else</em> and <em>do-while</em>, and transform it into a program that calls the functions asynchronously. In simple language, it gives a blueprint for implementing arbitrary control flow in code that uses asynchronous XmlHttpRequests.</p>
<p style="text-align: left;">In this article, I work a simple example from Thibaud&#8217;s paper and talk about four software tools that automate the conversion of conventional control flows to asynchronous programming. One tool, the <a href="http://en.wikipedia.org/wiki/Windows_Workflow_Foundation">Windows Workflow Foundation</a>, lets us compose long-running applications out of a collection of asynchronous <em>Activity</em> objects. Two other tools are <a href="http://chumsley.org/jwacs/">jwacs</a> and <a href="http://www.neilmix.com/narrativejs/doc/">Narrative Javascript</a>, open-source translators that translate pseudo-blocking programs written in a modified dialect of JavaScript into asynchronous programs in ordinary JavaScript that run in your browser.</p>
<p style="text-align: left;"><span id="more-49"></span></p>
<p style="text-align: left;">
<h2 style="text-align: left;">A simple example: Sequential Execution</h2>
<p style="text-align: left;">I&#8217;m going to lift a simple example from <a href="http://www.thibaudlopez.net/xhr/Writing%20effective%20asynchronous%20XmlHttpRequests.pdf">Thibaud&#8217;s paper</a>,  the case of sequential execution.  Imagine that we want to write a function <em>f()</em>,  that follows the following logic</p>
<pre style="text-align: left;">function f() {
   // ... pre-processing ...
   result1=MakeRequest1(argument1);
   // ... think about result1 ...
   result2=MakeRequest2(argument2);
   // ... think about result2 ...
   result3=MakeRequest3(argument3);
   // ... think about result3 ...
   return finalResult;
}</pre>
<p style="text-align: left;">where functions of the form <em>MakeRequestN</em> are ordinary synchronous functions.  If,  however,  we were working in an environment like JavaScript,  GWT,  Flex,  or Silverlight,  server requests are asynchronous,  so we&#8217;ve only got functions like:</p>
<pre>function BeginMakeRequestN(argumentN, callbackFunction);</pre>
<p style="text-align: left;">It&#8217;s no longer possible to express a sequence of related requests as a single function,  instead we need to transform f() into a series of functions,  like so</p>
<pre style="text-align: left;">function f(callbackFunction) {
   // ... pre-processing ...
   BeginMakeRequest1(argument1, f1);

   // f1..f3 are nested inside f() so they share its scope,
   // including callbackFunction.
   function f1(result1) {
      // ... think about result1 ...
      BeginMakeRequest2(argument2, f2);
   }

   function f2(result2) {
      // ... think about result2 ...
      BeginMakeRequest3(argument3, f3);
   }

   function f3(result3) {
      // ... think about result3 ...
      callbackFunction(finalResult);
   }
}</pre>
<p style="text-align: left;">My example differs from the example on page 19 of <a href="http://www.thibaudlopez.net/xhr/Writing%20effective%20asynchronous%20XmlHttpRequests.pdf">Thibaud&#8217;s paper</a> in a few ways. In particular, I&#8217;ve added the <em>callbackFunction</em> that <em>f()</em> uses to &#8220;return&#8221; a result to the program that calls it. Here the <em>callbackFunction</em> lives in a scope that&#8217;s shared by all of the <em>fN</em> functions, so it&#8217;s available in <em>f3</em>. I&#8217;ve found that when you&#8217;re applying Thibaud&#8217;s kind of thinking, it&#8217;s useful for <em>f()</em> to correspond to an object, of which the <em>fN()</em> functions are methods. [<a href="http://gen5.info/q/2008/06/02/keeping-track-of-state-in-asynchronous-callbacks/">1</a>] [<a href="http://gen5.info/q/2008/04/18/asynchronous-functions/">2</a>] [<a href="http://gen5.info/q/2008/04/11/the-asynchronous-command-pattern/">3</a>]</p>
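<p style="text-align: left;">Here is a runnable sketch of the sequential example in that object-oriented style. The three <em>beginMakeRequestN</em> functions are hypothetical stand-ins that answer through <em>setTimeout</em> instead of a real XHR, and the &#8220;thinking&#8221; in each step is reduced to simple arithmetic so the flow of data is visible.</p>

```javascript
// Hypothetical asynchronous "server calls": each answers later via callback.
function beginMakeRequest1(arg, callback) { setTimeout(() => callback(arg + 1), 0); }
function beginMakeRequest2(arg, callback) { setTimeout(() => callback(arg * 2), 0); }
function beginMakeRequest3(arg, callback) { setTimeout(() => callback(arg - 3), 0); }

// f() as an object: fields carry state between steps, and each step method
// starts the next request and names the method that will continue the work.
class SequentialF {
  constructor(input, callbackFunction) {
    this.input = input;
    this.callbackFunction = callbackFunction; // how f() "returns" its result
    this.trace = [];
  }
  start() {
    // ... pre-processing ...
    beginMakeRequest1(this.input, result1 => this.f1(result1));
  }
  f1(result1) {
    this.trace.push(result1); // ... think about result1 ...
    beginMakeRequest2(result1, result2 => this.f2(result2));
  }
  f2(result2) {
    this.trace.push(result2); // ... think about result2 ...
    beginMakeRequest3(result2, result3 => this.f3(result3));
  }
  f3(result3) {
    this.trace.push(result3); // ... think about result3 ...
    this.callbackFunction(result3, this.trace);
  }
}
```

<p style="text-align: left;">Because <em>start()</em> returns before any request completes, the caller has to do its remaining work inside <em>callbackFunction</em>.</p>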
<p style="text-align: left;">Thibaud also works the implementation of <em>if-then-else</em>,  <em>switch</em>,  <em>for</em>,  <em>do-while</em>,  <em>parallel-do</em> and other common patterns &#8212; <a href="http://www.thibaudlopez.net/xhr/Writing%20effective%20asynchronous%20XmlHttpRequests.pdf">read his paper</a>!</p>
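<p style="text-align: left;">As a taste of those patterns, here is one way to express a <em>do-while</em> loop around an asynchronous call: the loop body becomes a function that re-invokes itself from the callback until the condition fails. This is a sketch in the spirit of the paper, not code taken from it; <em>asyncDoWhile</em>, <em>makePagedSource</em> and the paging scenario are hypothetical.</p>

```javascript
// Callback-style equivalent of:  do { result = step(); } while (condition(result));
// `step(callback)` starts one asynchronous operation, `condition(result)`
// decides whether to go round again, and `done(lastResult)` runs at exit.
function asyncDoWhile(step, condition, done) {
  function iterate() {
    step(result => {
      if (condition(result)) iterate(); // next iteration starts from the callback
      else done(result);
    });
  }
  iterate(); // a do-while runs its body at least once
}

// Hypothetical paged service: returns up to `pageSize` items per call.
function makePagedSource(total, pageSize) {
  let next = 0;
  return function beginFetchPage(callback) {
    const page = [];
    for (let i = 0; i < pageSize && next < total; i++) page.push(next++);
    setTimeout(() => callback(page), 0);
  };
}
```

<p style="text-align: left;">As long as <em>step</em> really is asynchronous, each iteration starts on a fresh call stack, so a long loop does not overflow the stack the way naive recursion would.</p>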
<h2 style="text-align: left;">What next?</h2>
<p>There are things missing from Thibaud&#8217;s current draft:  for instance,  he doesn&#8217;t consider how to implement exception handling in asynchronous applications,  although it&#8217;s <a href="http://gen5.info/q/2008/04/18/asynchronous-functions/">quite possible to do</a>.</p>
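<p>One way to do it, sketched here under the assumption that every asynchronous call takes separate success and failure callbacks: wrap each continuation in a try/catch and route every exception, whether thrown locally or reported by the server call, to a single <em>onFailure</em> handler. All names (<em>beginRequest</em>, <em>guard</em>, <em>fetchTwoThings</em>) are hypothetical.</p>

```javascript
// Hypothetical async call: succeeds with arg * 10, or fails for negative args.
function beginRequest(arg, onSuccess, onFailure) {
  setTimeout(() => {
    if (arg < 0) onFailure(new Error("server rejected " + arg));
    else onSuccess(arg * 10);
  }, 0);
}

// Equivalent of:
//   try { a = request(x); if (a > 1000) throw ...; b = request(a - 15); return a + b; }
//   catch (e) { handle(e); }
function fetchTwoThings(x, onSuccess, onFailure) {
  // guard() turns an exception thrown inside a continuation into a call to
  // onFailure; without it the exception would escape to the event loop.
  const guard = fn => value => {
    try { fn(value); } catch (e) { onFailure(e); }
  };
  beginRequest(x, guard(a => {
    if (a > 1000) throw new Error("result too large"); // local error
    beginRequest(a - 15, b => onSuccess(a + b), onFailure);
  }), onFailure);
}
```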
<p>Thinking about things systematically helps you do things by hand, but it really comes into its own when we use systematic thinking to develop tools. I can imagine two kinds of tools based on Thibaud&#8217;s ideas:</p>
<ol>
<li>Specialized languages for expressing asynchronous flows,  and</li>
<li>Compilers that transform synchronous programs to asynchronous programs</li>
</ol>
<h2>Windows Workflow Foundation</h2>
<p>Windows Workflow Foundation is an example of the first approach.</p>
<p>Although it&#8217;s not designed for use in asynchronous RIAs, Microsoft&#8217;s <a href="http://joeon.net/post/2008/02/Windows-Workflow-Foundation-Tutorial-Series.aspx">Windows Workflow Foundation</a> is a new approach to writing reactive programs. Unfortunately, like a lot of enterprise technologies, WWF is surrounded by a lot of hype that obscures a number of worthwhile ideas: the book <a href="http://www.amazon.com/gp/product/0321399838/103-7142849-0321425?ie=UTF8&amp;tag=honeymediasys-20&amp;linkCode=xm2&amp;camp=1789&amp;creativeASIN=0321399838">Essential Windows Workflow Foundation</a> by Shukla and Schmidt is a lucid explanation of the principles behind it. It&#8217;s good reading even if you hate Microsoft and would never use a Microsoft product, because it could inspire you to implement something similar in your favorite environment. (I know someone who&#8217;s writing a webcrawler in PHP based on a similar approach.)</p>
<p>What does it do?</p>
<p>In WWF,  you create an asynchronous program by composing a set of asynchronous <em>Activities</em>.  Ultimately your program is a tree of <em>Activity</em> objects that you can assemble any way you like,  but typically you&#8217;d build them with a XAML (XML) file that might look like</p>
<pre>[31] &lt;Interleave x:Name="i1"&gt;
[32]    &lt;Sequence x:Name="s1"&gt;
[33]       &lt;ReadLine x:Name="r1" /&gt;
[34]       &lt;WriteLine x:Name="w1"
[35]          Text="{wf:ActivityBind r1,path=Text}" /&gt;
[36]       &lt;ReadLine x:Name="r2" /&gt;
[37]       &lt;WriteLine x:Name="w2"
[38]          Text="{wf:ActivityBind r2,path=Text}" /&gt;
[39]    &lt;/Sequence&gt;
[40]    &lt;Sequence x:Name="s2"&gt;
[41]       &lt;ReadLine x:Name="r3" /&gt;
[42]       &lt;WriteLine x:Name="w3"
[43]          Text="{wf:ActivityBind r3,path=Text}" /&gt;
[44]       &lt;ReadLine x:Name="r4" /&gt;
[45]       &lt;WriteLine x:Name="w4"
[46]          Text="{wf:ActivityBind r4,path=Text}" /&gt;
[47]    &lt;/Sequence&gt;
[48] &lt;/Interleave&gt;</pre>
<p>(The above example is based on Listing 3.18 on Page 98 of Shukla and Schmidt,  with some namespace declarations removed for clarity)</p>
<p>This defines a flow of execution that looks like:</p>
<p><a href="http://gen5.info/q/wp-content/uploads/2008/08/wwfdiagram.png"><img class="alignnone size-full wp-image-60" title="wwfdiagram" src="http://gen5.info/q/wp-content/uploads/2008/08/wwfdiagram.png" alt="" width="485" height="244" /></a></p>
<p>The <em>&lt;Interleave&gt;</em> activity causes two <em>&lt;Sequence&gt;</em> activities to run simultaneously. Each <em>&lt;Sequence&gt;</em>, in turn, sequentially executes two alternating pairs of <em>&lt;ReadLine&gt;</em> and <em>&lt;WriteLine&gt;</em> activities. Note that attribute values like <em>{wf:ActivityBind r3,path=Text}</em> wire the output of a <em>&lt;ReadLine&gt;</em> activity to the input of a <em>&lt;WriteLine&gt;</em> activity.</p>
<p>Note that <em>&lt;Interleave&gt;</em>, <em>&lt;Sequence&gt;</em>, <em>&lt;ReadLine&gt;</em> and <em>&lt;WriteLine&gt;</em> are all asynchronous activities defined by classes <em>Interleave</em>, <em>Sequence</em>, <em>ReadLine</em> and <em>WriteLine</em> that all implement <em>Activity</em>. An activity can invoke other activities, so it&#8217;s possible to create new control structures. Activities can wait for things to happen in the outside world (such as a web request or an email message) by listening to a queue. WWF also defines an elaborate model for error handling.</p>
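<p>The idea that control structures are themselves activities is easy to borrow outside WWF. In this sketch an &#8220;activity&#8221; is just a function that takes a completion callback; <em>sequence</em>, <em>interleave</em> and <em>writeLine</em> are hypothetical combinators written in that spirit, not the WWF API, and <em>interleave</em> simply starts all its children and completes when the last one does.</p>

```javascript
// An "activity" here is any function run(done) that calls done() when finished.

// Run child activities one after another.
function sequence(...children) {
  return done => {
    let i = 0;
    const next = () => (i < children.length ? children[i++](next) : done());
    next();
  };
}

// Start every child activity at once; finish when all of them have finished.
function interleave(...children) {
  return done => {
    let remaining = children.length;
    if (remaining === 0) return done();
    for (const child of children) child(() => { if (--remaining === 0) done(); });
  };
}

// Hypothetical leaf activity: asynchronously appends a line to a shared log.
function writeLine(log, text, delayMs = 0) {
  return done => setTimeout(() => { log.push(text); done(); }, delayMs);
}
```

<p>Because composite activities have the same shape as leaf activities, new control structures (a loop, a timeout, a retry) can be added without changing any existing code.</p>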
<p>Although other uses are possible, WWF is intended for server applications that implement workflows. Imagine, for instance, a college admissions system, which must wait for a number of forms from the outside, such as</p>
<ul>
<li>an application,</li>
<li>standardized test scores, and</li>
<li>letters of recommendation</li>
</ul>
<p>and that needs to solicit internal input from</p>
<ul>
<li>an initial screening committee,</li>
<li>the faculty of individual departments,  and</li>
<li>the development office.</li>
</ul>
<p>The state of a workflow can be serialized to a database, so a workflow can take place over a long time, such as weeks or months, and multiple instances of the workflow can exist at the same time.</p>
<p>WWF looks like a fun environment to program for, but I don&#8217;t know if I&#8217;d trust it for a real business application. Why? I&#8217;ve been building this sort of application for years using relational databases, and I know that it&#8217;s possible to handle the maintenance situations that occur in real life with a relational representation: both the little tweaks you need to make to a production system from time to time, and the more major changes required when your process changes. Systems based on object serialization, such as WWF, tend to have trouble when you need to change the definition of objects over time.</p>
<p>I can say,  however,  that the <a href="http://www.amazon.com/gp/product/0321399838/103-7142849-0321425?ie=UTF8&amp;tag=honeymediasys-20&amp;linkCode=xm2&amp;camp=1789&amp;creativeASIN=0321399838">Shukla and Schmidt</a> book is so clear that an ambitious programmer could understand enough of the ideas behind WWF to develop a similar framework that&#8217;s specialized for developing asynchronous RIAs in Javascript,  Java,  or C# pretty quickly.   Read it!</p>
<h2>Transforming Javascript and Other Languages</h2>
<p>Another line of attack on asynchronous programming is the creation of compilers and translators that transform a synchronous program into an asynchronous program. This is particularly popular in Javascript, where open-source tools such as <a href="http://chumsley.org/jwacs/demos.html">jwacs</a> (Javascript With Advanced Continuation Syntax) let you write code like this:</p>
<pre>function main() {
   document.getElementById('contentDiv').innerHTML =
     '&lt;pre&gt;'
     + JwacsLib.fetchData('GET', 'dataRows.txt')
     + '&lt;/pre&gt;';
}</pre>
<p>Jwacs adds four new keywords to the Javascript language:  internally,  it applies transformations like the ones in the Thibaud paper.  Although it looks as if the call to <em>JwacsLib.fetchData</em> blocks,  in reality,  it splits the <em>main()</em> function into two halves,  executing the function by a form of cooperative multitasking.</p>
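<p>Done by hand, the split looks roughly like the following: everything after the &#8220;blocking&#8221; call moves into a callback. This is not the code jwacs actually generates; <em>fetchDataAsync</em> is a hypothetical callback-style stand-in for <em>JwacsLib.fetchData</em>, and the DOM write is passed in as <em>setContent</em> so the sketch runs outside a browser.</p>

```javascript
// Hypothetical callback-style fetch; a browser version would use XMLHttpRequest.
// `method` and `url` are unused because the response is canned for the sketch.
function fetchDataAsync(method, url, callback) {
  setTimeout(() => callback("row 1\nrow 2"), 0);
}

// First half: everything up to the call that would block.
function main(setContent) {
  fetchDataAsync("GET", "dataRows.txt", data => mainContinued(setContent, data));
}

// Second half: the rest of the original statement, run when the data arrives.
// (The original also wraps the data in a pre element before assigning innerHTML.)
function mainContinued(setContent, data) {
  setContent(data);
}
```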
<p><a href="http://neilmix.com/narrativejs/doc/index.html">Narrative Javascript</a> is a similar open-source translator that adds -&gt;,  a &#8220;yielding&#8221; operator to Javascript.  This signals the translator to split the enclosing function,  and works for timer and UI event callbacks as well as XHR.  Therefore,  it&#8217;s possible to write a pseudo-blocking sleep() function like:</p>
<pre>function sleep(millis) {
   var notifier = new EventNotifier();
   setTimeout(notifier, millis);
   notifier.wait-&gt;();
}</pre>
<p>Narrative Javascript doesn&#8217;t remember the special nature of the sleep() function, so you need to call it with the yielding operator too. With it, you can animate an element like so:</p>
<pre>for (var i = 0; i &lt; frameCount - 1; i++) {
   var nextValue = startValue + (jumpSize * i);
   element.style[property] = nextValue + "px";
   sleep-&gt;(frequency);
}</pre>
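<p>Without the yielding operator, the same animation has to be written by hand in continuation-passing style, with the loop counter kept as state that survives between timer callbacks. The following sketch shows roughly what such a translation must produce (it is not Narrative Javascript&#8217;s actual output); <em>animate</em> reuses the names from the loop above, <em>element</em> is any object with a <em>style</em> property, and <em>onDone</em> is a hypothetical completion callback.</p>

```javascript
// Callback-style equivalent of:
//   for (var i = 0; i < frameCount - 1; i++) { ...set style...; sleep->(frequency); }
function animate(element, property, startValue, jumpSize, frameCount, frequency, onDone) {
  let i = 0;
  function step() {
    if (i >= frameCount - 1) {   // loop condition failed: leave the loop
      if (onDone) onDone();
      return;
    }
    const nextValue = startValue + jumpSize * i;
    element.style[property] = nextValue + "px";
    i++;
    setTimeout(step, frequency); // "sleep" by returning to the event loop
  }
  step();
}
```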
<p>You can use the yielding operator to wait on user interface events as well.  If you first define</p>
<pre>function waitForClick(element) {
   var notifier = new EventNotifier();
   element.onclick = notifier;
   notifier.wait-&gt;();
}</pre>
<p>you can call it with the yielding operator to wait for a button press</p>
<pre>theButton.innerHTML = "go right";
waitForClick-&gt;(theButton);
theButton.innerHTML = "--&gt;";
// ... continue animation ...</pre>
<p>The <a href="http://rifers.org/wiki/display/RIFE/Web+continuations">RIFE Continuation Engine</a> implements something quite similar in Java,  but it translates at the bytecode level instead of at the source code level:  it aims to transform the server-side of web applications,  rather than the client,  by allowing the execution of a function to span two separate http requests.</p>
<h2>Conclusion</h2>
<p>It&#8217;s possible to systematically transform a function that&#8217;s written in terms of conventional control structures and synchronous function calls into a collection of functions that performs the same logic using asynchronous function calls.   A paper by <a href="http://www.thibaudlopez.net/">Thibaud Lopez Schneider</a> points the way,  and is immediately useful for RIA programmers that need to convert conventional control structures in their head into asynchronous code.</p>
<p>A longer-term strategy is to develop frameworks and languages that make it easier to express desired control flows for asynchronous programs. The Windows Workflow Foundation from Microsoft is a fascinating attempt to create a specialized language for assembling asynchronous programs from a collection of <em>Activity</em> objects. <a href="http://chumsley.org/jwacs/demos.html">jwacs</a> and <a href="http://neilmix.com/narrativejs/doc/index.html">Narrative Javascript</a> are bold attempts to extend the Javascript language so that people can express asynchronous logic as pseudo-threaded programs. The <a href="http://rifers.org/wiki/display/RIFE/Web+continuations">RIFE Continuation Engine</a> demonstrates that this kind of behavior can be implemented in more static languages such as Java and C#. Although none of these tools are ready for production-quality RIA work, they may lead to something useful in the next few years.</p>
]]></content:encoded>
			<wfw:commentRss>http://gen5.info/q/2008/08/13/converting-a-synchronous-program-into-an-asynchronous-program/feed/</wfw:commentRss>
		<feedburner:origLink>http://gen5.info/q/2008/08/13/converting-a-synchronous-program-into-an-asynchronous-program/</feedburner:origLink></item>
		<item>
		<title>Stop Catching Exceptions!</title>
		<link>http://feeds.feedburner.com/~r/Generation5/~3/352130370/</link>
		<comments>http://gen5.info/q/2008/07/31/stop-catching-exceptions/#comments</comments>
		<pubDate>Fri, 01 Aug 2008 01:45:24 +0000</pubDate>
		<dc:creator>Paul Houle</dc:creator>
		
		<category><![CDATA[Dot Net]]></category>

		<category><![CDATA[Exceptions]]></category>

		<category><![CDATA[Java]]></category>

		<category><![CDATA[PHP]]></category>

		<guid isPermaLink="false">http://gen5.info/q/?p=43</guid>
		<description><![CDATA[Motivation
It&#8217;s clear that a lot of programmers are uncomfortable with exceptions [1] [2];  in the feedback of an article I wrote about casting,  it seemed that many programmers saw the throwing of a NullReferenceException at a cast to be an incredible catastrophe.
In this article,  I&#8217;ll share a philosophy that I hope will [...]]]></description>
			<content:encoded><![CDATA[<h2>Motivation</h2>
<p>It&#8217;s clear that a lot of programmers are uncomfortable with exceptions <a href="http://www.joelonsoftware.com/items/2003/10/13.html">[1]</a> <a href="http://www.ckwop.me.uk/Why-Exceptions-Suck.html">[2]</a>;  in the feedback on an article I wrote about casting,  it seemed that many programmers saw the throwing of a <em>NullReferenceException</em> at a cast as an incredible catastrophe.</p>
<p>In this article,  I&#8217;ll share a philosophy that I hope will help programmers overcome the widespread fear of exceptions.  It&#8217;s motivated by five goals:</p>
<ol>
<li>Do no harm</li>
<li>Write as little error handling code as possible</li>
<li>Think about error handling as little as possible</li>
<li>Handle errors correctly when possible</li>
<li>Otherwise,  handle errors sanely</li>
</ol>
<p>To do that, I</p>
<ol>
<li>Use <em>finally</em> to stabilize program state when exceptions are thrown</li>
<li><em>Catch</em> and handle exceptions locally when the effects of the error are local and completely understood</li>
<li>Wrap independent units of work in <em>try-catch</em> blocks to handle errors that have global impact</li>
</ol>
<p>This isn&#8217;t the last word on error handling,  but it avoids many of the pitfalls that people fall into with exceptions.  By building upon this strategy,  I believe it&#8217;s possible to develop an effective error handling strategy for most applications:  future articles will build on this topic,  so stay tuned by subscribing to the <a href="http://feeds.feedburner.com/Generation5">Generation 5 RSS Feed</a>.</p>
<h2><span id="more-43"></span>The Tragedy of Checked Exceptions</h2>
<p>Java&#8217;s done a lot of good,  but checked exceptions are probably the worst legacy that Java has left us.  Java has influenced Python,  PHP,  Javascript,  C# and many of the popular languages that we use today.  Unfortunately,  checked exceptions taught Java programmers to catch exceptions prematurely,  a habit that Java programmers carried into other languages,  and that has resulted in a code base that sets bad examples.</p>
<p>Most exceptions in Java are checked,  which means that the compiler will give you an error if you write</p>
<pre>[01] public void myMethod() {
[02]    throw new ItDidntWorkException();
[03] }</pre>
<p>unless you either catch the exception inside <em>myMethod</em> or you replace line [01] with</p>
<pre>[04] public void myMethod() throws ItDidntWorkException {</pre>
<p>The compiler is also aware of any checked exceptions that are thrown by methods underneath <em>myMethod</em>,  and forces you to either catch them inside <em>myMethod</em> or to declare them in the throws clause of <em>myMethod</em>.</p>
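<p>Here is a minimal compilable sketch of that rule.  It reuses the article&#8217;s made-up <em>ItDidntWorkException</em> (not a library class);  <em>CheckedDemo</em> and <em>caller()</em> are likewise hypothetical names chosen for illustration:</p>

```java
// The article's example exception: a made-up checked exception, not a library type.
class ItDidntWorkException extends Exception {
    ItDidntWorkException(String message) { super(message); }
}

class CheckedDemo {
    // Without "throws ItDidntWorkException" this method would not compile.
    static void myMethod(boolean fail) throws ItDidntWorkException {
        if (fail) {
            throw new ItDidntWorkException("it didn't work");
        }
    }

    // The caller must also catch or declare the exception; here it declares it,
    // so the obligation moves up to whoever calls caller().
    static void caller(boolean fail) throws ItDidntWorkException {
        myMethod(fail);
    }
}
```

<p>Removing either <em>throws</em> clause turns this into a compile error,  which is the pressure that produces the hastily written handlers below.</p>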
<p>I thought that this was a great idea when I started programming Java in 1995.  With the hindsight of a decade,  we can see that it&#8217;s a disaster.  The trouble is that every time you call a method that throws an exception,  you create an immediate crisis:  you break the build.   Rather than consciously planning an error handling strategy,  programmers do something,  anything,   to make the compiler shut up.  Very often you see people bypass exceptions entirely,  like this:</p>
<pre>[05] public void someMethod() {
[06]    try {
[07]       objectA.anotherMethod();
[08]    } catch(SubsystemAScrewedUp ex) { }
[09] }</pre>
<p>Often you get this instead:</p>
<pre>[10]    try {
[11]       objectA.anotherMethod();
[12]    } catch(SubsystemAScrewedUp ex) {
[13]        // something ill-conceived to keep the compiler happy
[14]    }
[15]    // Meanwhile,  the programmer makes a mistake here because
[16]    // writing an exception handler broke his concentration</pre>
<p>This violates the first principle,  to &#8220;do no harm.&#8221;  It&#8217;s simple,  and often correct,  to pass the exception up to the calling function:</p>
<pre>[17] public void someMethod() throws SubsystemAScrewedUp {</pre>
<p>But,   this still breaks the build,  because now every function that calls <em>someMethod()</em> needs to do something about the exception.  Imagine a program of some complexity that&#8217;s maintained by a few programmers,  in which method A() calls method B() which calls method C() all the way to method F().</p>
<p><img src="http://gen5.info/q/wp-content/uploads/2008/07/dontthrowcalls.png" alt="" /></p>
<p>The programmer who works on method F() can change the signature of that method,  but he may not have the authority to change the signature of the methods above.  Depending on the culture of the shop,  he might be able to do it himself,  he might talk about it with the other programmers over IM,  he might need to get the signatures of three managers,  or he might need to wait until the next group meeting.  If they keep doing this,  however,  A() might end up with a signature like</p>
<pre>[18] public String A() throws SubsystemAScrewedUp, IOException, SubsystemBScrewedUp,
[19]   WhateverExceptionVendorZCreatedToWrapANullPointerException, ...</pre>
<p>This is getting out of hand,  and they realize they can save themselves a lot of suffering by just writing</p>
<pre>[20] public void someMethod() throws Exception {</pre>
<p>all the time,  which is essentially disabling the checked exception mechanism.  Let&#8217;s not dismiss the behavior of the compiler out of hand,  however,  because it&#8217;s teaching us an important lesson:  error handling is a holistic property of a program:  an error handled in method F() can have implications for methods A()&#8230;E().  <em><strong>Errors often break the assumption of encapsulation</strong></em>,  and require a global strategy that&#8217;s applied consistently throughout the code.</p>
<p>PHP,  C# and other post-Java languages tend not to support checked exceptions.  Unfortunately,  checked-exception thinking has warped other languages,  so you&#8217;ll find that catch statements are used neurotically everywhere.</p>
<h2>Exception To The Rule: Handling Exceptions Locally</h2>
<p>Sometimes you can anticipate an exception,  and know what exact action to take.  Consider the case of a compiler,  or a program that processes a web form,  which might find more than one error in user input.  Imagine something like (in C#):</p>
<pre>[21] List&lt;string&gt; errors=new List&lt;string&gt;();
[22] uint quantity=0;
[23] ... other data fields ...
[24]
[25] try {
[26]   quantity=UInt32.Parse(Params["Quantity"]);
[27] } catch(Exception ex) {
[28]   errors.Add("You must enter a valid quantity");
[29] }
[30]
[31] ... other parsers/validators ...
[32]
[33] if (errors.Count == 0) {
[34]    ... update database,  display success page ...
[35] } else {
[36]    ... redraw form with error messages ...
[37] }</pre>
<p>Here it makes sense to catch the exception locally,  because the exceptions that can happen on line [26] are completely handled,  and don&#8217;t have an effect on other parts of the application.  The one complaint you might make is that I should be catching something more specific than <em>Exception</em>.  Well,  that would bulk the code up considerably and violate the DRY (don&#8217;t repeat yourself) principle: <em>UInt32.Parse</em> can throw three different exceptions:  <em>ArgumentNullException</em>,  <em>FormatException</em>,  and <em>OverflowException</em>.  On paper,  the process of looking up the &#8220;Quantity&#8221; key in <em>Params</em> could throw an <em>ArgumentNullException</em> or a <em>KeyNotFoundException</em>.</p>
<p>I don&#8217;t think either <em>ArgumentNullException</em> can really happen,  and I think the <em>KeyNotFoundException</em> would only occur in development,  or if somebody was trying to submit the HTML form with an unauthorized program.   Probably the best thing to do in either case would be to abort the script with a 500 error and log the details,  but the error handling on lines [27] to [29] is <strong>sane</strong> in that it prevents corruption of the database.</p>
<p>The handling of <em>FormatException</em> and <em>OverflowException</em>,  on the other hand,  is fully correct.  The user gets an error message that tells them what they need to do to fix the situation.</p>
<p>This example demonstrates a bit of why error handling is so difficult and why the perfect can be the enemy of the good:  the real cause of an <em>IOException</em> could be a microscopic scratch on the surface of a hard drive,  an operating system error,  or the fact that somebody spilled a Coke on a router in Detroit.  Diagnosing the cause and offering the right solution is,  in general,  an insoluble problem.</p>
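<p>A rough Java analogue of the C# form handler,  offered as a sketch rather than a drop-in replacement:  the <em>QuantityForm</em> class and the plain <em>Map</em> standing in for <em>Params</em> are assumptions,  and <em>Integer.parseUnsignedInt</em> plays the role of <em>UInt32.Parse</em>.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class QuantityForm {
    // Parsed quantity; meaningful only when validate() returned no errors.
    long quantity = 0;

    // Collects user-facing messages; an empty list means the form is valid.
    List<String> validate(Map<String, String> params) {
        List<String> errors = new ArrayList<>();
        try {
            // A missing key yields null, and parseUnsignedInt rejects null,
            // non-numeric, negative and too-large input alike.
            quantity = Integer.toUnsignedLong(Integer.parseUnsignedInt(params.get("Quantity")));
        } catch (NumberFormatException ex) {
            errors.add("You must enter a valid quantity");
        }
        // ... other parsers/validators append to the same list ...
        return errors;
    }
}
```

<p>In Java the DRY objection to a narrow catch mostly disappears:  <em>parseUnsignedInt</em> reports a null,  malformed,  negative,  or out-of-range string with the same <em>NumberFormatException</em>,  and <em>Map.get</em> returns null for a missing key instead of throwing.</p>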
<h2>Fixing it up with <em>finally</em></h2>
<p>The first exception handling construct that should be at your fingertips is <em>finally</em>,  not <em>catch</em>.  Unfortunately,  <em>finally</em> is a bit obscure:  the pattern in most languages is</p>
<pre>[38] try {
[39]    ... do something that might throw an exception ...
[40] } finally {
[41]    ... clean up ...
[42] }</pre>
<p>The code in the <em>finally</em> clause runs whether or not an exception is thrown in the <em>try</em> block.  <em>Finally</em> is a great place to release resources,  roll back transactions,  and otherwise protect the state of the application by enforcing invariants.  Let&#8217;s think back to the chain of methods <em>A()</em> through <em>F()</em>:  with <em>finally</em>,  the maintainer of <em>B()</em> can implement a local solution to a global problem that starts in <em>F()</em>:  no matter what goes wrong downstream,  B() can restore invariants and repair the damage.  For instance,  if B()&#8217;s job is to write something into a transactional data store,  B() can do something like:</p>
<pre>[43] Transaction tx=new Transaction();
[44] try {
[45]    ...
[46]    C();
[47]    ...
[48]    tx.Commit();
[49] } finally {
[50]    if (tx.Open)
[51]        tx.Rollback();
[52] }</pre>
<p>This lets the maintainer of B() act defensively,  and offer the guarantee that the persistent data store won&#8217;t get corrupted because of an exception that was thrown in the <em>try</em> block.  Because B() isn&#8217;t catching the exception,  it can do this <strong>without depriving upstream methods</strong>,  <strong>such as A(),  of the chance to do the same.</strong></p>
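<p>The same defensive shape in Java,  as a sketch:  <em>Transaction</em> here is a hypothetical in-memory stand-in (not a real transaction API),  the transaction is passed in so its final state can be inspected,  and a <em>Runnable</em> stands in for the call to <em>C()</em>.</p>

```java
// Hypothetical in-memory stand-in for a transaction; not a real library API.
class Transaction {
    private boolean open = true;
    private boolean committed = false;
    private boolean rolledBack = false;

    boolean isOpen() { return open; }
    boolean isCommitted() { return committed; }
    boolean isRolledBack() { return rolledBack; }

    void commit() { committed = true; open = false; }
    void rollback() { rolledBack = true; open = false; }
}

class MethodB {
    // Mirrors B(): nothing is caught, but the data store is always left consistent.
    static void b(Transaction tx, Runnable c) {
        try {
            // ... work before the call ...
            c.run();                 // stands in for C(), which may throw
            // ... work after the call ...
            tx.commit();
        } finally {
            if (tx.isOpen()) {
                tx.rollback();       // reached only if commit() never ran
            }
        }
    }
}
```

<p>If <em>C()</em> throws,  the rollback runs and the exception still propagates unchanged,  so A() keeps its chance to respond.</p>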
<p>C# gets extra points because it has syntactic sugar that makes a simple case simple:  the <em>using</em> statement accepts an <em>IDisposable</em> as an argument and wraps the block after it with a finally clause that calls the <em>Dispose()</em> method of the <em>IDisposable</em>.  ASP.NET applications can fail catastrophically if you don&#8217;t <em>Dispose()</em> database connections and result sets,  so</p>
<pre>[53] using (var reader=sqlCommand.ExecuteReader()) {
[54]   ... scroll through result set ...
[55] }</pre>
<p>is a widespread and effective pattern.</p>
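<p>Java 7 and later offer a close equivalent,  try-with-resources,  which calls <em>close()</em> on any <em>AutoCloseable</em> declared in the <em>try</em> header.  In this sketch <em>FakeReader</em> is a hypothetical stand-in for a database reader that merely records when it is opened and closed:</p>

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical resource standing in for a database reader; it records its lifecycle.
class FakeReader implements AutoCloseable {
    static final List<String> events = new ArrayList<>();

    FakeReader() { events.add("open"); }

    String next() { return "row"; }

    // No "throws Exception" clause, so callers need not catch a checked exception.
    @Override
    public void close() { events.add("close"); }
}

class ReaderDemo {
    static void scroll(boolean fail) {
        try (FakeReader reader = new FakeReader()) {
            reader.next();
            if (fail) {
                throw new IllegalStateException("failure while scrolling");
            }
        } // close() runs as this block exits, normally or via an exception
    }
}
```

<p>Declaring <em>close()</em> without a <em>throws</em> clause is a small design choice that keeps every caller from having to catch a checked exception just to release a resource.</p>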
<p>PHP loses points because it doesn&#8217;t support <em>finally</em>.  Granted,  <em>finally</em> isn&#8217;t as important in PHP,  because all resources are released when a PHP script ends.  The absence of <em>finally</em>,  however,   encourages PHP programmers to overuse <em>catch</em>,  which perpetuates exception phobia.   The PHP developers are adding great features to PHP 5.3,  such as late static binding,   so we can hope that they&#8217;ll change their mind and bring us a <em>finally</em> clause.</p>
<h2>Where should you catch exceptions?</h2>
<p>At high levels of your code,  you should wrap <strong>units of work</strong> in a try-catch block.  A unit of work is something that makes sense to either give up on or retry.  Let&#8217;s work out a few simple examples:</p>
<p><strong>Scripty Command line program:</strong> This program is going to be used predominantly by the person who wrote it and close associates,  so it&#8217;s acceptable for the program to print a stack trace if it fails.  The &#8220;unit of work&#8221; is the whole program.</p>
<p><strong>Command line script that processes a million records:</strong> It&#8217;s likely that some records are corrupted or may trigger bugs in the program.  Here it&#8217;s reasonable for the &#8220;unit of work&#8221; to be the processing of a single record.  Records that cause exceptions should be logged,  together with a stack trace of the exception.</p>
<p><strong>Web application: </strong>For a typical web application in PHP,  JSP or ASP.NET,  the &#8220;unit of work&#8221; is the web request.  Ideally the application returns a &#8220;500 Internal Server Error&#8221;,  displays a message to the user (that&#8217;s useful but not overly revealing) and logs the stack trace (and possibly other information) so the problem can be investigated.  If the application is in debugging mode,   it&#8217;s sane to display the stack trace to the web browser.</p>
<p><strong>GUI application: </strong>The &#8220;unit of work&#8221; is most often an event handler that&#8217;s called by the GUI framework.  You push a button,  something goes wrong,  then what?  Unlike server-side web applications,  which tend to assume that exceptions don&#8217;t involve corruption of static memory or of a database,  GUI applications tend to shut down when they experience unexpected exceptions.  [<a href="http://blogs.msdn.com/larryosterman/archive/2008/05/01/resilience-is-not-necessarily-a-good-thing.aspx">3</a>]  As a result,  GUI applications tend to need infrastructure to convert common and predictable exceptions (such as network timeouts) into human readable error messages.</p>
<p><strong>Mail server: </strong>A mail server stores messages in a queue and delivers them over an unreliable network.  Exceptions occur because of full disks (locally or remote),  network failures,  DNS misconfigurations,  remote server failures,  and the occasional cosmic ray.  The &#8220;unit of work&#8221; is the delivery of a single message.  If an exception is thrown during delivery of the message,  it stays in the queue:  the mail server attempts to resend on a schedule,  discarding it if it is unable to deliver after seven days.</p>
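<p>The million-record case can be sketched in Java as follows;  <em>RecordProcessor</em>,  <em>processAll</em>,  and the use of <em>java.util.logging</em> are illustrative choices,  not something the scenario prescribes.  Each loop iteration is one unit of work:</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.logging.Level;
import java.util.logging.Logger;

class RecordProcessor {
    private static final Logger LOG = Logger.getLogger(RecordProcessor.class.getName());

    // Processes each record as an independent unit of work and returns the
    // records that failed; one bad record never stops the rest of the batch.
    static List<String> processAll(List<String> records, Consumer<String> process) {
        List<String> failed = new ArrayList<>();
        for (String record : records) {
            try {
                process.accept(record);
            } catch (RuntimeException ex) {
                // Log the record and the full stack trace for later investigation.
                LOG.log(Level.WARNING, "Failed to process record: " + record, ex);
                failed.add(record);
            }
        }
        return failed;
    }
}
```

<p>The loop catches <em>RuntimeException</em> rather than <em>Throwable</em>,  so errors such as <em>OutOfMemoryError</em> still stop the run;  where to draw that line depends on the application.</p>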
<h2>What should you do when you&#8217;ve caught one?</h2>
<p>That&#8217;s the subject of another article.  <a href="http://feeds.feedburner.com/Generation5">Subscribe to my RSS</a> feed if you want to read it when it&#8217;s ready.  For now,  I&#8217;ll enumerate a few questions to think about:</p>
<ol>
<li>What do you tell the end user?</li>
<li>What do you tell the developer?</li>
<li>What do you tell the sysadmin?</li>
<li>Will the error clear up if we repeat this unit of work?</li>
<li>How long would we need to wait?</li>
<li>Could we do something else instead?</li>
<li>Did the error happen because the state of the application is corrupted?</li>
<li>Did the error cause the state of the application to get corrupted?</li>
</ol>
<h2>Conclusion</h2>
<p>Error handling is tough.  Because errors come from many sources such as software defects,  bad user input,  configuration mistakes,  and both permanent and transient hardware failures,  it&#8217;s impossible for a developer to anticipate and perfectly handle everything that can go wrong.  Exceptions are an excellent method of separating error handling logic from the normal flow of programs,  but many programmers are too eager to catch exceptions:  this either causes errors to be ignored,  or entangles error handling with mainline logic,  complicating both.  The long term impact is that many programmers are afraid of exceptions and turn to return values as error signals,  which is a step backwards.</p>
<p>A strategy that (i) uses <em>finally</em> as the first resort for containing corruption and maintaining invariants,   (ii) uses <em>catch</em> locally when the exceptions thrown in an area are completely understood, and (iii) surrounds independent units of work with <em>try-catch</em> blocks is an effective basis for using exceptions that can be built upon to develop an exception handling policy for a particular application.</p>
<p>Error handling is a topic that I spend entirely too much time thinking about,  so I&#8217;ll be writing about it more.  Subscribe to my <a href="http://feeds.feedburner.com/Generation5">RSS Feed</a> if you think I&#8217;ve got something worthwhile to say.</p>
]]></content:encoded>
			<wfw:commentRss>http://gen5.info/q/2008/07/31/stop-catching-exceptions/feed/</wfw:commentRss>
		<feedburner:origLink>http://gen5.info/q/2008/07/31/stop-catching-exceptions/</feedburner:origLink></item>
		<item>
		<title>The Multiton Design Pattern</title>
		<link>http://feeds.feedburner.com/~r/Generation5/~3/345785835/</link>
		<comments>http://gen5.info/q/2008/07/25/the-multiton-design-pattern/#comments</comments>
		<pubDate>Fri, 25 Jul 2008 16:04:05 +0000</pubDate>
		<dc:creator>Paul Houle</dc:creator>
		
		<category><![CDATA[Asynchronous Communications]]></category>

		<category><![CDATA[GWT]]></category>

		<category><![CDATA[Silverlight]]></category>

		<guid isPermaLink="false">http://gen5.info/q/?p=41</guid>
		<description><![CDATA[Introduction
Many people have independently discovered a new design pattern,  the &#8220;Multiton&#8221;,  which,  like the &#8220;Singleton&#8221;,  is an initialization pattern in the style of the Design Patterns book.  Like the Singleton,  the Multiton provides a method that controls the construction of a class:  instead of maintaining a single copy of [...]]]></description>
			<content:encoded><![CDATA[<h2>Introduction</h2>
<p>Many people have independently discovered a new design pattern,  the &#8220;Multiton&#8221;,  which,  like the &#8220;Singleton&#8221;,  is an initialization pattern in the style of the Design Patterns book.  Like the Singleton,  the Multiton provides a method that controls the construction of a class:  instead of maintaining a single copy of an object in an address space,  the Multiton maintains a Dictionary that maps keys to unique objects.</p>
<p>The Multiton pattern can be used in systems that store persistent data in a back-end store,  such as a relational database.  It can maintain a set of in-memory objects that correspond one-to-one with objects (rows) in the persistent store:  it applies obviously to object-relational mapping systems,  and is also useful in asynchronous RIA&#8217;s,  which need to keep track of user interface elements that are interested in information from the server.</p>
<p><a href="http://gen5.info/q/wp-content/uploads/2008/07/multitondiagram.png"><img title="multitondiagram" src="http://gen5.info/q/wp-content/uploads/2008/07/multitondiagram.png" alt="" /></a></p>
<p>An alternate use case of Multitons,  seen in the &#8220;<a href="http://en.wikipedia.org/wiki/PureMVC">Multicore</a>&#8221; version of the PureMVC framework,  is the extension of the Singleton pattern to support multiple instances of a system in a single address space.</p>
<p>As useful as the Multiton pattern is,  this article explains how Multitons use references in a way that doesn&#8217;t work well with conventional garbage collection.  Multitons are a great choice when the number of Multitons is small,  but they may leak memory unacceptably when more than a few thousand are created.  Future posts will describe patterns,  such as the Captive Multiton,  that provide the same capabilities with more scalable memory management:  subscribe to our <a href="http://feeds.feedburner.com/Generation5">RSS feed</a> to keep informed.</p>
<h2><span id="more-41"></span>Use of a Multiton in An Asynchronous Application</h2>
<p>In our last article on <a href="http://gen5.info/q/2008/07/18/the-role-of-the-model-in-silverlight-gwt-and-javascript/">Model-View Separation in Asynchronous RIA&#8217;s</a>,   we used a Singleton object that represented an entire table in a relational database.  This object maintained a list of listeners that were interested in the contents of a table.  In this case,  the amount of information in the table was small,  and often used in the aggregate,   so retrieving a complete copy of the table was a reasonable level of granularity.  We could imagine a situation,  however,  where the number of records and size of the records is enough that we need to transfer records individually.  (This specific case is an outline of an implementation for Silverlight;  a GWT implementation would be similar,  and details specific to GWT are discussed in <a href="http://gen5.info/q/2008/07/18/the-role-of-the-model-in-silverlight-gwt-and-javascript/">a previous post</a>.)</p>
<p>Imagine,  for instance,  a <em>BlogPosting</em> object,  which represents a post in a blog,  which in turn has an integer primary key.  The BlogPosting object is a multiton,  so you&#8217;d write</p>
<pre>[01] var posting=BlogPosting.GetInstance(postId);</pre>
<p>to get the instance of <em>BlogPosting</em> that corresponds to <em>postId</em>.  Client objects can&#8217;t really write something like</p>
<pre>[02] TitleField.Text=posting.Title</pre>
<p>because the operation of retrieving text from the server is asynchronous,  and won&#8217;t return in time to return a value,  either on line [01] or [02].  More reasonably,  a <em>BlogPostingViewer</em> can register itself against a <em>BlogPosting</em> instance so it will be notified when information is available about the blog posting.</p>
<pre>[03] public class BlogPostingViewer: UserControl,IBlogPostingListener,IDisposable {
[04]     protected int PostId;
[05]
[06]     public BlogPostingViewer(int postId) {
[07]        PostId=postId;
[08]        BlogPosting.GetInstance(postId).AddListener(this);
[09]     }
[10]
[11]     public void Dispose() {
[12]        BlogPosting.GetInstance(PostId).RemoveListener(this);
[13]        // ... release any other resources held by the viewer ...
[14]     }</pre>
<p>This example shows a pattern usable in a Silverlight application,  unlike the GWT style in the <a href="http://gen5.info/q/2008/07/18/the-role-of-the-model-in-silverlight-gwt-and-javascript/">model-view article</a>.  The <em>Dispose()</em> method will need to be called manually when the <em>BlogPostingViewer</em> is no longer needed,  since it will never be garbage collected so long as a reference to it inside the <em>BlogPosting</em> exists.  (This points to a general risk of memory leaks with Multitons that we&#8217;ll talk about later.)  This problem can be addre</p>
<p>The <em>BlogPostingViewer</em> goes on to implement the <em>IBlogPostingListener</em> interface,  updating the visual appearance of the user interface to reflect information from the server:</p>
<pre>[15]     public void UpdatePosting(BlogPostingData d) {
[16]         if (d==null) {
[17]            ClearUserInterface();   // user-defined method blanks out UI
[18]            return;
[19]         }
[20]         TitleField.Text=d.Title;
[21]         ...
[22]     }
}</pre>
<p>We assume that <em>BlogPostingData</em> represents the state of the <em>BlogPosting</em> at a moment in time,  distinct from the <em>BlogPosting</em>,  which represents the <em>BlogPosting</em> as a persistent object.  <em>BlogPostingData</em> might (roughly) correspond to the columns of a relational table and look something like:</p>
<pre>[23] public class BlogPostingData {
[24]    public string Title { get; set;}
[25]    public Contributor Author { get; set; }
[26]    public string Body { get; set;}
[27]    public Category[] AssociatedCategories { get; set;}
[28]    ...
[29] }</pre>
<p>We could then add a <em>BlogPostingViewer</em> to the user interface and schedule its initialization by writing</p>
<pre>[30] var viewer=new BlogPostingViewer(PostId);
[31] OuterControl.Children.Add(viewer);
[32] BlogPosting.GetInstance(PostId).Fetch();</pre>
<p>Note that line [32] tells the <em>BlogPosting</em> instance to retrieve a copy of the posting from the server (an instance of <em>BlogPostingData</em>) and call <em>UpdatePosting()</em> on all of the listeners.  Therefore,  from line [30] until the asynchronous call started on line [32] returns,  the <em>BlogPostingViewer</em> is empty (not yet initialized with <em>BlogPostingData</em>).  The <em>BlogPostingViewer</em> must be designed so that nothing bad happens when it&#8217;s in that state:  it has to show something reasonable to the user and not crash the app if the user clicks a button that isn&#8217;t ready yet.</p>
<p>(In a more developed application,  the <em>BlogPosting</em> could keep a cache of the latest <em>BlogPostingData</em>:  this could improve responsiveness by updating the <em>BlogPostingViewer</em> at the moment it registers,  or by comparing a timestamp or checksum against the server to reduce the bandwidth requirements of a <em>Fetch()</em>;  just watch out for the <a href="http://gen5.info/q/2008/04/21/once-asynchronous-always-asynchronous/">unintended consequences of multiple code paths</a>.)</p>
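<p>For readers working outside Silverlight,  here is a rough Java sketch of the pieces described so far:  a <em>BlogPosting</em> Multiton that keeps listeners and notifies them when a fetch completes.  The static <em>server</em> field is a hypothetical stand-in for whatever asynchronous RPC or HTTP mechanism an application actually uses,  and the listener list assumes callbacks arrive on a single UI thread.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Snapshot of a posting at one moment in time (cf. BlogPostingData above).
class BlogPostingData {
    final String title;
    BlogPostingData(String title) { this.title = title; }
}

interface BlogPostingListener {
    void updatePosting(BlogPostingData data);
}

class BlogPosting {
    private static final Map<Integer, BlogPosting> instances = new HashMap<>();

    // Hypothetical async transport: called with (postId, callback); it must
    // invoke the callback later, once the server has replied.
    static BiConsumer<Integer, Consumer<BlogPostingData>> server;

    private final int postId;
    private final List<BlogPostingListener> listeners = new ArrayList<>();

    private BlogPosting(int postId) { this.postId = postId; }

    // Multiton access point: one BlogPosting per post id.
    static synchronized BlogPosting getInstance(int postId) {
        return instances.computeIfAbsent(postId, BlogPosting::new);
    }

    void addListener(BlogPostingListener l) { listeners.add(l); }

    void removeListener(BlogPostingListener l) { listeners.remove(l); }

    // Starts the request and returns immediately; listeners are notified
    // only when the reply arrives.
    void fetch() {
        server.accept(postId, data -> {
            // Copy so a listener may unregister itself during notification.
            for (BlogPostingListener l : new ArrayList<>(listeners)) {
                l.updatePosting(data);
            }
        });
    }
}
```

<p>Because <em>fetch()</em> returns before any listener is called,  a viewer registered this way sees nothing until the reply arrives,  which is the empty state the viewer must tolerate.</p>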
<h2>Implementing a Multiton</h2>
<p>Here&#8217;s an implementation of a Multiton in C# that&#8217;s not too different from the <a href="http://gen5.info/q/2008/04/21/once-asynchronous-always-asynchronous/">Java implementation from Wikipedia</a>.</p>
<pre>class BlogPosting {
    #region Initialization
    private static readonly Dictionary&lt;int,BlogPosting&gt; _Instances =
       new Dictionary&lt;int,BlogPosting&gt;();

    private BlogPosting(int key) {
        ... construct the object ...
    }

    public static BlogPosting GetInstance(int key) {
        lock(_Instances) {
            BlogPosting instance;
            if (_Instances.TryGetValue(key,out instance)) {
                return instance;
            }

            instance = new BlogPosting(key);
            _Instances.Add(key, instance);
            return instance;
        }
    }
    #endregion

    ... the rest of the class ...

}</pre>
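<p>I&#8217;m pretty sure that a version of this could be created in C# with slightly sweeter syntax that would look like the following (C# has no static indexers,  so <em>Instance</em> would have to be a static property that returns an object with an indexer):</p>
<pre>BlogPosting.Instance[postId]</pre>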
<p>but this doesn&#8217;t address the weak implementation of static inheritance in many popular languages,  which requires us to cut-and-paste roughly 20 lines of code for each Multiton class rather than reusing the logic through inheritance.  The Ruby Application Archive,  on the other hand,  contains a <a href="http://raa.ruby-lang.org/project/multiton/">Multiton</a> class that can be used to bolt Multiton behavior onto a class.  It would be interesting to see what could be accomplished with PHP 5.3&#8217;s <a href="http://www.colder.ch/news/08-24-2007/28/late-static-bindings-expl.html">late static binding</a>.</p>
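<p>One partial workaround,  sketched here in Java with hypothetical names,  is composition instead of static inheritance:  a generic registry owns the map and the create-once logic,  and each Multiton class keeps a private constructor plus a one-line <em>getInstance()</em>.  It does nothing about the memory problem discussed in the next section.</p>

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Reusable "one instance per key" logic, held by composition rather than inherited.
final class MultitonRegistry<K, V> {
    private final Map<K, V> instances = new ConcurrentHashMap<>();
    private final Function<K, V> factory;

    MultitonRegistry(Function<K, V> factory) { this.factory = factory; }

    // computeIfAbsent applies the factory at most once per key, even under contention.
    V get(K key) { return instances.computeIfAbsent(key, factory); }

    int size() { return instances.size(); }
}

// Hypothetical Multiton class: the per-class boilerplate shrinks to a few lines.
class Author {
    private static final MultitonRegistry<String, Author> REGISTRY =
        new MultitonRegistry<>(Author::new);

    final String name;

    private Author(String name) { this.name = name; }

    static Author getInstance(String name) { return REGISTRY.get(name); }
}
```

<p><em>ConcurrentHashMap.computeIfAbsent</em> performs the whole lookup-or-create step atomically,  which takes the place of the explicit <em>lock</em> in the C# version.</p>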
<h2>Multitons And Memory Leaks</h2>
<p>Multitons,  unfortunately,  don&#8217;t interact well with garbage collectors.  Once a Multiton is created,  the static <em>_Instances</em> dictionary will maintain a reference to every Multiton in the system,  so that Multitons won&#8217;t be collected,  even if no other references to them exist.</p>
<p>You might think you could manually remove Multitons from the <em>_Instances</em> dictionary,  but this won&#8217;t be entirely reliable.  In the case above,  each <em>BlogPosting</em> maintains a list of <em>IBlogPostingListeners</em>.  You could,  in principle,  scavenge <em>BlogPostings</em> with an empty set of listeners,  but that doesn&#8217;t stop a class from squirreling away a reference to a <em>BlogPosting</em>;  once that posting is scavenged,  the next call to <em>BlogPosting.GetInstance()</em> with the same key creates a second object that conflicts with the one that was squirreled away.</p>
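<p>A small Java sketch makes both halves of the problem concrete.  <em>Posting</em> and its <em>scavenge()</em> method are hypothetical;  the scavenger removes every entry whose listener list is empty,  as suggested above:</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Posting {
    // The static map is a strong reference to every instance ever created.
    private static final Map<Integer, Posting> instances = new HashMap<>();

    final int key;
    final List<Runnable> listeners = new ArrayList<>();

    private Posting(int key) { this.key = key; }

    static synchronized Posting getInstance(int key) {
        return instances.computeIfAbsent(key, Posting::new);
    }

    static synchronized int count() { return instances.size(); }

    // Hypothetical scavenger: forget every instance that has no listeners.
    // Returns the number of entries removed.
    static synchronized int scavenge() {
        int before = instances.size();
        instances.values().removeIf(p -> p.listeners.isEmpty());
        return before - instances.size();
    }
}
```

<p>Without scavenging,  every instance stays reachable through the map;  with it,  a caller that kept an old reference and a caller that asks <em>getInstance(42)</em> afterwards hold two different objects for the same key,  which breaks the one-object-per-key guarantee the Multiton exists to provide.</p>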
<p><em>WeakReferences</em>,  as available in dot-Net and the full Java platform (as opposed to GWT),  are not an answer to this problem,  because references work backwards in this case:  a <em>BlogPosting</em> is safe to collect only if (i) no references to the <em>BlogPosting</em> exist outside the <em>_Instances</em> dictionary,  and (ii) the <em>BlogPosting</em> doesn&#8217;t hold references to other objects that may need to be updated in the future.  A weak reference tests only condition (i),  so it could let the collector discard a <em>BlogPosting</em> whose listeners are still waiting for updates.</p>
<p>The severity of this issue depends on the number of Multitons created and the size of the Multitons.  If the granularity of Multitons is coarse,  and you&#8217;ll only create five of them,  there&#8217;s no problem.  1000 Multitons that each consume 1 kilobyte will consume about a megabyte of RAM,  which is inconsequential for most applications these days.  However,  this amounts to a scaling issue:  an application that works fine when it creates 50 Multitons could break down when it creates 50,000.</p>
<p>One answer to this problem is to restrict access to Multitons so that (i) arbitrary objects can&#8217;t save references to them and (ii) Multitons are managed with a kind of reversed reference count,  so that they are discarded when they no longer hold useful information.  I call this a <em>Captive Multiton</em>,  and this will be the subject of our next exciting episode:  <a href="http://feeds.feedburner.com/Generation5">subscribe to our RSS feed </a>so you won&#8217;t miss it.</p>
<h2>More Information About Multitons</h2>
<p>So far as I can tell,  Multitons have been independently discovered by many developers in recent years.  I used Multitons (I called them &#8220;Parameterized Singletons&#8221;) in the manner above in a GWT application that I developed in summer 2007.  <a href="http://en.wikipedia.org/wiki/PureMVC">The PureMVC Framework uses Multitons</a> to allow multiple instances of the framework to exist in an address space.   A <a href="http://raa.ruby-lang.org/project/multiton/">reusable Multiton implementation</a> exists in Ruby.</p>
<h2>Conclusion</h2>
<p>The Multiton Pattern is an initialization pattern in the sense defined in the notorious &#8220;Design Patterns&#8221; book.  Multitons are like Singletons in that they use static methods to control access to a private constructor,  but instead of maintaining a single copy of an object in an address space,  a Multiton maintains a mapping from key values to objects.  A number of uses are emerging for Multitons:  (i) Multitons are useful when we want to use something like the Singleton pattern,  but support multiple named instances of a system in an address space and (ii) Multitons can be a useful representation of an object in a persistent store,  such as a relational database.  Multitons,  however,  are not collected properly by conventional garbage collectors:  this is harmless for applications that create a small number of Multitons,  but poses a scaling problem when Multitons are used to represent a large number of objects of fine granularity.  A future posting will introduce a Captive Multiton that solves this problem:  <a href="http://feeds.feedburner.com/Generation5">subscribe to our RSS feed</a> to follow this developing story.</p>
]]></content:encoded>
			<wfw:commentRss>http://gen5.info/q/2008/07/25/the-multiton-design-pattern/feed/</wfw:commentRss>
		<feedburner:origLink>http://gen5.info/q/2008/07/25/the-multiton-design-pattern/</feedburner:origLink></item>
		<item>
		<title>The Role Of The Model in Silverlight, GWT and Javascript</title>
		<link>http://feeds.feedburner.com/~r/Generation5/~3/339248714/</link>
		<comments>http://gen5.info/q/2008/07/18/the-role-of-the-model-in-silverlight-gwt-and-javascript/#comments</comments>
		<pubDate>Fri, 18 Jul 2008 19:00:18 +0000</pubDate>
		<dc:creator>Paul Houle</dc:creator>
		
		<category><![CDATA[Asynchronous Communications]]></category>

		<category><![CDATA[GWT]]></category>

		<category><![CDATA[Silverlight]]></category>

		<guid isPermaLink="false">http://gen5.info/q/?p=35</guid>
		<description><![CDATA[Introduction
When people start developing RIA&#8217;s in environments such as Silverlight,  GWT,  Flex and plain JavaScript,  they often write asynchronous communication callbacks in an unstructured manner,  putting them wherever is convenient &#8212; often in an instance member of a user interface component (Silverlight and GWT) or in a closure or global function [...]]]></description>
			<content:encoded><![CDATA[<h2>Introduction</h2>
<p>When people start developing RIAs in environments such as Silverlight,  GWT,  Flex and plain JavaScript,  they often write asynchronous communication callbacks in an unstructured manner,  putting them wherever is convenient:  often in an instance member of a user interface component (Silverlight and GWT) or in a closure or global function (JavaScript).</p>
<p>As applications become more complex,  several problems almost invariably occur that force the development of an architecture that decouples communication event handlers from the user interface.  A straightforward solution is to create a model layer that&#8217;s responsible for notifying interested user interface components about data updates.</p>
<p>This article uses a simple example application to show how a first-generation approach to data updates breaks down and how introducing a model-view split makes for a reliable and maintainable application.</p>
<p>(This is one of a series of articles on RIA architecture:  subscribe to the <a href="http://feeds.feedburner.com/Generation5">Gen5 RSS feed</a> for future installments.)</p>
<h2>Example Application: Blogging And The Category Dropdown</h2>
<p>Imagine a blogging application that works like the Wordpress blog used on this site.  This application consists of a number of forms,  one of which is used to write a new post:</p>
<p><a href="http://gen5.info/q/wp-content/uploads/2008/07/demoapplication2.png"><img class="alignnone size-full wp-image-40" title="demoapplication2" src="http://gen5.info/q/wp-content/uploads/2008/07/demoapplication2.png" alt="" width="500" height="202" /></a></p>
<p>This form lets you fill out two text fields:  a title and the body of the post.  It also contains a dropdown list of categories,  and gives you the option of adding a new category.  Categories are represented (server-side) in a table in a relational database that looks like:</p>
<pre>[01] CREATE TABLE categoryList (
[02]     id                 integer primary key auto_increment,
[03]     name               varchar(255)
[04] );</pre>
<p>Adding a category to the database requires a call to the server that adds a row to the database and returns the new category list,  which is then used to update the dropdown list.  I&#8217;ll show samples of the app in pseudocode for an imaginary environment that combines the best of Silverlight and GWT.  First we initialize the form and set an event handler that&#8217;s called when somebody clicks on the <em>AddCategoryButton</em>:</p>
<pre>[05] class CreatePostForm {
[06]    protected TextBox Title;
[07]    protected ListBox Category;
[08]    protected TextBox AddCategoryName;
[09]    protected Button  AddCategoryButton;
[10]    protected RichTextArea Body;
[11]    protected Button  Submit;
[12]
[13]    public CreatePostForm() {
[14]        ... initialize and lay out UI elements ...
[15]
[16]        AddCategoryButton.OnClick += AddCategoryButton_OnClick;
[17]
[18]        ... finish construction ...
[19]    }</pre>
<p>Leaving out error handling and other details,  the job of the event handler is to pass the name of the new category to the server.  The event handler is defined as an instance method of <em>CreatePostForm</em>:</p>
<pre>[20]    protected void AddCategoryButton_OnClick(object sender, EventArgs e) {
[21]        Server.Instance.AddCategory(AddCategoryName.Text, AddCategory_Completed);
[22]    }</pre>
<p>The <em>AddCategory</em> RPC call is defined on a Singleton called <em>Server</em>,  and takes two arguments:  (1) a string with the name of the new category,  and (2) a reference to the callback function that gets called when the RPC call is complete.  The callback,  <em>AddCategory_Completed</em>,  is also an instance method:</p>
<pre>[23]     protected void AddCategory_Completed(List&lt;ListBoxItem&gt; items) {
[24]            Category.Items = items;
[25]     }</pre>
<p><em>ListBoxItem</em> is a class that represents a single row in a ListBox;  it has the properties <em>ListBoxItem.Id</em> and <em>ListBoxItem.Name</em>.  This is simple and straightforward code,  and it ought to be maintainable,  right?</p>
<p>Let&#8217;s see.</p>
<h2>The Naive Implementation Adapts</h2>
<p>Well,  when we finish writing the class,  we notice the first,  minor,  problem.  There are two buttons on the form,  so we need two event handlers and two callback functions.  As a UI class gets complicated,  it can accumulate quite a few callback functions,  and it can get tricky to keep track of them all.  Careful naming,  code organization,  and the use of <em>#region</em> in C# can help organize the code,  but it&#8217;s easy to build UI controls that have tens of methods in which we can get lost.</p>
<p>Over time,  we&#8217;ll add more forms to the app,  and pretty soon we&#8217;ll add another form that has a category list:  perhaps this is a form used by administrators to search for posts;  let&#8217;s call it <em>AdminSearchForm</em>.  <em>AdminSearchForm</em> also contains a <em>ListBox</em> called <em>Category</em>.  It&#8217;s a protected field of <em>AdminSearchForm</em>,  but we need to update it when the administrator adds a new category.  It seems reasonable to add a public method to <em>AdminSearchForm</em>:</p>
<pre>[26] public class AdminSearchForm {
[27]     ...
[28]     protected ListBox Category;
[29]     ...
[30]     public void UpdateCategoryList(List&lt;ListBoxItem&gt; items) {
[31]       Category.Items=items;
[32]     }
[33] }</pre>
<p>Now we update the <em>AddCategory_Completed</em> function so it updates the <em>AdminSearchForm</em>:</p>
<pre>[34] public class CreatePostForm {
[35]    ...
[36]    protected void AddCategory_Completed(List&lt;ListBoxItem&gt; items) {
[37]       Category.Items=items;
[38]       App.Instance.MainTabPanel.AdminSearchForm.UpdateCategoryList(items);
[39]    }
[40] }</pre>
<p>Not too bad,  eh?  Just four more lines of transparent code to update <em>AdminSearchForm</em>,  even if line [38] has a rather ugly coupling to the detailed structure of the application.</p>
<h2>The Naive Implementation Breaks Down</h2>
<p>Over the next few weeks,  we add a few more dropdown lists to the application,  we keep doing the same thing,  and it&#8217;s fine for a while.  Then we start running into problems:</p>
<ol>
<li>We can&#8217;t reuse <em>CreatePostForm</em> to make different versions of the application,  because it contains a hard-coded list of all the dropdown lists in the application.</li>
<li>We can&#8217;t update the contents of a category list that is part of a dynamically generated UI element,  such as a dialog box,  a draggable representation of an item,  a search result listing,  or an application plug-in.</li>
<li>We need to consider how all of these dropdown lists get initialized when the application starts (something this code sample doesn&#8217;t show).</li>
<li>At some point we need to add a second way for a user to add a category (for instance,  the &#8220;Manage Categories&#8221; screen in WordPress).  At that point we can (a) duplicate the code in <em>AddCategory_Completed</em> (bad idea!),  (b) have the <em>ManageCategoriesForm</em> class call the <em>AddCategory_Completed</em> method of <em>CreatePostForm</em> (better),  or (c) move the update logic in <em>AddCategory_Completed</em> someplace else (best).</li>
<li>If UI components were responsible for communicating with the server to update themselves,  performance could be destroyed by unnecessary communications,  with no guarantee that UI components would be updated consistently.</li>
</ol>
<p>I&#8217;m sorry to admit that,  when I built my first GWT app,  I ran into all of the above problems,  plus a number of others.  I tried a number of ad hoc solutions until I was forced to sit down and develop an architecture (the one below) that doesn&#8217;t run out of steam.  Today,  you can do better.</p>
<h2>Separating the Model And The View</h2>
<p>OK,  the plan is to create two classes,  <em>CategoryList</em> and <em>CategoryListBox</em>,  that work together to keep every category list box up to date.  <em>CategoryList</em> is a Singleton:  it keeps track of the current state of the category list and maintains a list of clients that need to know when the list is updated.</p>
<p><a href="http://gen5.info/q/wp-content/uploads/2008/07/objects.png"><img class="alignnone size-full wp-image-39" title="objects" src="http://gen5.info/q/wp-content/uploads/2008/07/objects.png" alt="" width="500" height="278" /></a></p>
<p>The code for <em>CategoryList</em> looks like:</p>
<pre>[41] public class CategoryList {
[42]    private static CategoryList _Instance;
[43]    public static CategoryList Instance {
[44]       get {
[45]           if (_Instance==null)
[46]              _Instance=new CategoryList();
[47]
[48]           return _Instance;
[49]       }
[50]    }
[51]
[52]    private List&lt;ListBoxItem&gt; Items {get; set;}
[53]    private List&lt;ICategoryListener&gt; Listeners = new List&lt;ICategoryListener&gt;();
[54]    private CategoryList() { ... construct ... }</pre>
<p>Java programmers might notice a few C#-isms here,  in particular the way the class defines a static property called <em>Instance</em> that other classes use.  We don&#8217;t,  however,  use the C# event mechanism,  because it doesn&#8217;t do exactly what we want to do.</p>
<p>We call <em>UpdateItems</em> when there&#8217;s a change in the category list,  or when we initialize the category list as the application starts.  <em>UpdateItems</em> is an ordinary method,  although a C# stylist might make the <em>Items</em> property public and put the following logic in its setter:</p>
<pre>[55]   public void UpdateItems(List&lt;ListBoxItem&gt; items) {
[56]      Items=items;
[57]      foreach(var l in Listeners) {
[58]         l.UpdateItems(Items);
[59]      }
[60]   }</pre>
<p><em>CategoryListBoxes</em> will register and unregister themselves with the <em>CategoryList</em> with the following methods:</p>
<pre>[61]    public void AddListener(ICategoryListener l) {
[62]       Listeners.Add(l);
[63]       l.UpdateItems(Items);
[64]    }
[65]    public void RemoveListener(ICategoryListener l) {
[66]       Listeners.Remove(l);
[67]    }
[68] }</pre>
<p>Note that we could have built all of this logic into the <em>CategoryListBox</em>,  but by introducing the <em>CategoryList</em> class and the <em>ICategoryListener</em> interface,  we&#8217;ve decoupled the model from the view and given ourselves the option to create new visual representations of the category list.  (WordPress,  for instance,  has a distinct representation of the category list on the &#8220;Manage Categories&#8221; screen and more than one way to show a category list to your viewers.)</p>
<p>An interesting point is that <em>AddListener</em> immediately updates the listener when it registers itself.  This pattern handles asynchronous initialization:  so long as the <em>Items</em> property starts out as something harmless (such as an empty list),  <em>ICategoryListeners</em> created before application initialization is complete will be initialized when the initialization code calls <em>UpdateItems</em>.  If an <em>ICategoryListener</em> is created later,  it gets initialized upon registration.  Either way you&#8217;re covered without having to think about it.</p>
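<p>The registration logic above can be sketched in Python,  with the caveat that this is a translation rather than the C# pseudocode itself:  method names are snake_case,  and <em>RecordingListener</em> is a hypothetical stand-in for a <em>CategoryListBox</em>.  The usage at the end registers one listener before initialization and one after,  and both end up with the same items.</p>

```python
class CategoryList:
    """Model: caches the latest category list and notifies registered listeners."""

    _instance = None

    @classmethod
    def instance(cls):
        # Lazily created Singleton, like CategoryList.Instance.
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def __init__(self):
        self._items = []      # harmless starting value until the server answers
        self._listeners = []

    def update_items(self, items):
        # Called at start-up and whenever the server returns a new list.
        self._items = list(items)
        for listener in list(self._listeners):
            listener.update_items(self._items)

    def add_listener(self, listener):
        self._listeners.append(listener)
        listener.update_items(self._items)  # push the cached value immediately

    def remove_listener(self, listener):
        self._listeners.remove(listener)


class RecordingListener:
    """Hypothetical stand-in for a CategoryListBox."""

    def __init__(self):
        self.items = None

    def update_items(self, items):
        self.items = items


model = CategoryList.instance()
early = RecordingListener()
model.add_listener(early)                  # registered before initialization
model.update_items(["GWT", "Silverlight"])  # initialization completes
late = RecordingListener()
model.add_listener(late)                   # registered after initialization
print(early.items, late.items)  # ['GWT', 'Silverlight'] ['GWT', 'Silverlight']
```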
<p>Let&#8217;s take a look at the CategoryListBox,  which extends ListBox and implements ICategoryListener.</p>
<pre>[70] public class CategoryListBox : ListBox, ICategoryListener {</pre>
<p>It implements <em>ICategoryListener</em> by implementing the <em>UpdateItems</em> method:</p>
<pre>[71]   public void UpdateItems(List&lt;ListBoxItem&gt; items) {
[72]      Items=items;
[73]   }</pre>
<p>We&#8217;re going to implement registration and deregistration GWT style,  because GWT has particularly strict requirements for how we can access UI components.  We&#8217;re only allowed to manipulate UI components that are attached to the underlying HTML document tree;  by registering and deregistering when the component is attached and detached,  components get updated at the proper times:</p>
<pre>[74]   public void OnAttach() {
[75]      super.OnAttach();
[76]      CategoryList.Instance.AddListener(this);
[77]   }
[78]
[79]   public void OnDetach() {
[80]      CategoryList.Instance.RemoveListener(this);
[81]      super.OnDetach();
[82]   }</pre>
<p>The GWT style is particularly nice in that it prevents long-lasting circular references between the view and the model:  once you remove the view from the visual tree,  the model&#8217;s reference to it goes away.  Silverlight is more forgiving about where you can register the control:  you can do it either in the constructor or in the <em>Loaded</em> event handler,  but I don&#8217;t see an equivalent <em>Unloaded</em> event that could be used for automatic deregistration,  so manual deregistration may be necessary to prevent memory leaks.</p>
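<p>A minimal Python sketch of why deregistration matters when the model outlives its views.  <em>CategoryModel</em> and <em>CategoryListBoxView</em> are hypothetical stand-ins for <em>CategoryList</em> and <em>CategoryListBox</em>,  and <em>on_attach</em>/<em>on_detach</em> play the role of the attach and detach hooks.  A view that detaches can be reclaimed;  a view that is discarded without detaching stays reachable from the model&#8217;s listener list,  which is the leak described above.</p>

```python
import gc
import weakref


class CategoryModel:
    """Minimal long-lived model standing in for the CategoryList singleton."""

    def __init__(self):
        self.items = []
        self.listeners = []

    def add_listener(self, listener):
        self.listeners.append(listener)
        listener.update_items(self.items)  # immediate initialization

    def remove_listener(self, listener):
        self.listeners.remove(listener)


class CategoryListBoxView:
    """Hypothetical view that is registered only while attached to the UI tree."""

    def __init__(self, model):
        self.model = model
        self.items = None

    def on_attach(self):
        self.model.add_listener(self)

    def on_detach(self):
        self.model.remove_listener(self)  # the model forgets the view

    def update_items(self, items):
        self.items = items


model = CategoryModel()

detached = CategoryListBoxView(model)
detached.on_attach()
detached_ref = weakref.ref(detached)
detached.on_detach()
del detached

forgotten = CategoryListBoxView(model)
forgotten.on_attach()  # never detached
forgotten_ref = weakref.ref(forgotten)
del forgotten

gc.collect()
print(detached_ref() is None, forgotten_ref() is None)  # True False
```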
<p>So what have we got?</p>
<p>We&#8217;ve got a <em>CategoryListBox </em>control that works together with the <em>CategoryList</em> singleton to keep itself updated.  So long as we call <em>CategoryList.UpdateItems()</em> during the initialization process,  we can just include a <em>CategoryListBox</em> where we want it and never worry about initialization or updating.  We can even create new <em>ICategoryListeners</em> if we want to make other visual controls that display the category list.  This is a path to simple and scalable development.</p>
<h2>What happened to the &#8220;Controller?&#8221;</h2>
<p>The Model-View-Controller paradigm is a perennially popular buzzword in computing.  The phrase was coined in the late 1970s to describe a <a href="http://st-www.cs.uiuc.edu/users/smarch/st-docs/mvc.html">particular implementation in Smalltalk</a>,  one of the first implementations of a modern GUI.  The Controller is a third component that mediates between the Model,  the View and their environment.  Although Controllers are widespread in server-based web applications,  the Controller often withers away in today&#8217;s GUI environments,  because its functions are usually implemented by the event-handling mechanisms that come with the environment.  In this case,  &#8220;Controller&#8221;-like logic is embedded in certain methods of <em>CategoryList</em>.</p>
<p>Note that there are two objects here that could be called a &#8220;Model&#8221;.  I&#8217;m calling <em>CategoryList</em> a model because it has a 1-1 relationship with an object on the server:  the <em>categoryList</em> table.  <em>CategoryList</em> is a relatively persistent object that lasts for the lifetime of the RIA.  There&#8217;s another kind of &#8220;Model&#8221; object,  the <em>List&lt;ListBoxItem&gt;</em> that is stored in the <em>Items</em> property of <em>CategoryList</em> and is passed to an <em>ICategoryListener</em> during initialization or update;  that object represents the state of the <em>categoryList</em> table at a particular instant in time.  The generic <em>List&lt;&gt;</em> is an adequate representation of the state of <em>categoryList</em>,  although there are many cases where we might want to define a new class to represent the momentary state of a server object.</p>
<p>Something else funny about <em>CategoryList</em> is that it doesn&#8217;t export a public <em>Items</em> property.  It certainly could,  but I chose not to because <strong><em>a getter for an asynchronous model object is making an empty promise</em></strong>.</p>
<p>A getter in a synchronous application can always initialize or update itself before returning:  a similar method in an asynchronous object must return to its caller before it can receive information from the server.  An asynchronous model can return a cached value of <em>Items</em> if one is available,  but it can make a much firmer promise to deliver correct updates of <em>Items</em> when they become available.  <em>CategoryList</em> does,  however,  deliver a cached copy of <em>Items</em> to <em>ICategoryListeners</em> upon registration,  as this is an effective and efficient mechanism for initialization.</p>
<p>Would it be possible to define only a temporary &#8216;model&#8217; class and put a single <em>Controller</em> class in charge of updates?  Sure.  I think that would make more sense in a dynamically typed language like JavaScript than it does in Java or C#,  since it would be hard for such a <em>Controller</em> to enforce type-safety.  Could we call <em>CategoryList</em> a <em>Controller</em>?  Perhaps,  but I think <em>CategoryList</em> is a logical place to locate methods that manipulate the <em>categoryList</em>:  it really is a representation of a persistent object.</p>
<h2>What next?</h2>
<p>This is a good start,  but we haven&#8217;t entirely solved the RIA architecture problem.  Let&#8217;s talk about some of the issues we&#8217;d face if we generalized this approach:</p>
<ol>
<li>What if there was more than one type of dropdown list?  We ought to have an inheritance hierarchy from which we can derive multiple types of dropdown lists.  This could include mutable lists such as <em>ContributorTypeListBox</em> as well as immutable lists such as <em>USStateListBox</em>.</li>
<li>There is just one <em>CategoryList</em> in the application:  in some sense it&#8217;s globally scoped.  What if we want to represent a <em>BlogPost</em> or a <em>Contributor? </em>Simple,  use a <a href="http://en.wikipedia.org/wiki/Multiton">Multiton</a> instead of a Singleton.  Rather than writing <em>CategoryList.Instance</em>,  you might write <em>BlogPost.Instance[25]</em>,  where <em>25</em> is the primary key of the blog post.  The logic behind <em>Instance[] </em>is responsible for maintaining one and only one instance of <em>BlogPost</em> per actual blog post.</li>
<li>Isn&#8217;t the updating logic in the <em>CategoryList</em> and <em>CategoryListBox</em> repetitive?  It is.  A mature framework will either push this logic up into superclasses (kind of an embedded controller),  or push it out into a <em>Controller</em>.  The best approach will depend on the characteristics of the environment and the application.</li>
</ol>
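<p>The second and third items can be combined in one sketch:  a base class that owns the listener logic,  and a Multiton subclass keyed by primary key.  This is Python rather than C#,  listeners are plain callables rather than an interface,  and the names <em>ObservableModel</em>,  <em>BlogPost</em> and <em>instance()</em> are hypothetical;  it shows one possible shape,  not a finished framework.</p>

```python
class ObservableModel:
    """Hypothetical base class holding the listener logic shared by all models."""

    def __init__(self, initial=None):
        self._value = initial   # cached latest value
        self._listeners = []    # callables taking the new value

    def update(self, value):
        self._value = value
        for listener in list(self._listeners):
            listener(self._value)

    def add_listener(self, listener):
        self._listeners.append(listener)
        listener(self._value)   # initialize the new listener immediately

    def remove_listener(self, listener):
        self._listeners.remove(listener)


class BlogPost(ObservableModel):
    """Multiton: at most one BlogPost model per primary key."""

    _instances = {}

    def __init__(self, post_id):
        super().__init__(initial=None)
        self.post_id = post_id

    @classmethod
    def instance(cls, post_id):
        if post_id not in cls._instances:
            cls._instances[post_id] = cls(post_id)
        return cls._instances[post_id]


seen = []
BlogPost.instance(25).add_listener(seen.append)
BlogPost.instance(25).update({"title": "Hello"})
print(seen)  # [None, {'title': 'Hello'}]
```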
<p>I&#8217;ll be elaborating on these issues in future postings:  subscribe to my <a href="http://feeds.feedburner.com/Generation5">RSS feed</a> to keep up to date!</p>
<h2>Conclusion</h2>
<p>It&#8217;s simple to initialize and update data in the simplest RIAs,  but asynchronous communication makes this increasingly difficult as applications grow in complexity.  A simple,  reliable and maintainable approach to data updating is to create a set of persistent model classes that maintain:</p>
<ol>
<li>A cache of the latest data value,  and</li>
<li>A list of dependent view objects</li>
</ol>
<p>Model objects are responsible for updating View objects,  which in turn,  are responsible for registering themselves with the Model.  The result of this is that View objects can be used composably in the UI:  View objects can be added to the user interface without explicitly writing code to manage data updates.</p>
<p>Although this pattern can be applied immediately,  we&#8217;ll get the most out of it when it (or a similar pattern) is incorporated into client-side RIA frameworks.  There are only a few client-side frameworks today (for instance,  <a href="http://www.techper.net/2008/06/09/patterns-of-gui-architecture-in-cairngorm-and-puremvc/">Cairngorm and PureMVC</a>),  but I think we&#8217;ll see exciting developments in the next year:  subscribe to the <a href="http://feeds.feedburner.com/Generation5">Gen5 RSS feed</a> to keep up with developments.<br />
</p>
]]></content:encoded>
			<wfw:commentRss>http://gen5.info/q/2008/07/18/the-role-of-the-model-in-silverlight-gwt-and-javascript/feed/</wfw:commentRss>
		<feedburner:origLink>http://gen5.info/q/2008/07/18/the-role-of-the-model-in-silverlight-gwt-and-javascript/</feedburner:origLink></item>
		<item>
		<title>The Semantics of Dictionaries,  Maps and Hashtables</title>
		<link>http://feeds.feedburner.com/~r/Generation5/~3/338195050/</link>
		<comments>http://gen5.info/q/2008/07/17/the-semantics-of-dictionaries-maps-and-hashtables/#comments</comments>
		<pubDate>Thu, 17 Jul 2008 16:41:05 +0000</pubDate>
		<dc:creator>Paul Houle</dc:creator>
		
		<category><![CDATA[Dot Net]]></category>

		<category><![CDATA[Java]]></category>

		<category><![CDATA[PHP]]></category>

		<guid isPermaLink="false">http://gen5.info/q/?p=37</guid>
		<description><![CDATA[Introduction
The first language I used that put dictionaries on my fingertips was Perl,  where the solution to just about any problem involved writing something like
$hashtable{$key}=$value;
Perl called a dictionary a &#8216;hash&#8217;,  a reference to the way Perl implemented dictionaries.  (Dictionaries are commonly implemented with hashtables and b-trees,  but can also be implemented with linked-list and [...]]]></description>
			<content:encoded><![CDATA[<h2>Introduction</h2>
<p>The first language I used that put dictionaries on my fingertips was Perl,  where the solution to just about any problem involved writing something like</p>
<pre>$hashtable{$key}=$value;</pre>
<p>Perl called a dictionary a &#8216;hash&#8217;,  a reference to the way Perl implemented dictionaries.  (Dictionaries are commonly implemented with hashtables and b-trees,  but can also be implemented with linked lists and other structures.)  The syntax of Perl is a bit odd,  as you need to use $, @ or % to reference scalar,  array or hash variables in different contexts,  but dictionaries with similar semantics became widespread in dynamic languages of that and succeeding generations,  such as Python,  PHP and Ruby.  &#8216;Map&#8217; container classes were introduced in Java about a decade ago,  and programmers are using dictionaries increasingly in static languages such as Java and C#.</p>
<p>Dictionaries are a convenient and efficient data structure,  but there are areas in which different implementations behave differently:  for instance,  in what happens if you try to access an undefined key.  I think that cross-training is good for developers,  so this article compares this aspect of the semantics of dictionaries in four popular languages:  PHP,  Python,  Java and C#.</p>
<h2>Use cases</h2>
<p>There are two use cases for dictionaries,  so far as error handling is concerned:</p>
<ol>
<li>When you expect to look up undefined values,  and</li>
<li>When you don&#8217;t</li>
</ol>
<p>Let&#8217;s look at three examples:</p>
<h3>Computing A Histogram</h3>
<p>One common use for a dictionary is for counting items,  or recording that items in a list or stream have been seen.  In C#,  this is typically written something like:</p>
<pre>[01] var count=new Dictionary&lt;int,int&gt;();
[02] foreach(int i in inputList) {
[03]   if (!count.ContainsKey(i))
[04]       count[i]=0;
[05]
[06]   count[i]=count[i]+1;
[07] }</pre>
<p>The Dictionary <em>count</em> now contains the frequency of each item in <em>inputList</em>,  which could be useful for plotting a histogram.  A similar pattern can be used if we wish to make a list of unique items found in <em>inputList</em>.  In either case,  looking up values that aren&#8217;t already in the dictionary is a fundamental part of the algorithm.</p>
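<p>For comparison,  a Python translation of lines [01]-[07] that keeps the explicit existence test;  <em>histogram</em> is a hypothetical function name:</p>

```python
def histogram(input_list):
    counts = {}
    for item in input_list:
        if item not in counts:  # explicit existence test, as in lines [03]-[04]
            counts[item] = 0
        counts[item] = counts[item] + 1
    return counts


print(histogram([3, 1, 3, 2, 3, 1]))  # {3: 3, 1: 2, 2: 1}
```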
<h3>Processing Input</h3>
<p>Sometimes,  we&#8217;re getting input from another subsystem,  and expect that some values might not be defined.  For instance,  suppose a web site has a search feature with a number of optional parameters,  and that queries are made by GET requests like:</p>
<pre>[08] search.php?q=kestrel
[09] search.php?q=admiral&amp;page=5
[10] search.php?q=laurie+anderson&amp;page=3&amp;in_category=music&amp;after_date=1985-02-07</pre>
<p>In this case,  the only required search parameter is &#8220;q&#8221;,  the query string &#8212; the rest are optional.  In PHP (like many other environments),  you can get at GET variables via a hashtable,  specifically,  the $_GET superglobal,  so (depending on how strict the error handling settings in your runtime are) you might write something like</p>
<pre>[11] if (!$_GET["q"]) {
[12]     throw new InvalidInputException("You must specify a query");
[13] }
[14]
[15] if($_GET["after_date"]) {
[16]  ... add another WHERE clause to a SQL query ...
[17] }</pre>
<p>This depends,  quite precisely,  on two bits of sloppiness in PHP and Perl:  (a) dereferencing an undefined key on a hash returns an <em>undefined</em> value,  which is something like a <em>null</em>,  and (b) both languages have a liberal definition of <em>true</em> and <em>false</em> in an if() statement.  As a result,  the code above is a bit quirky.  The test at line 11 treats <em>q</em> as missing if <em>q</em> is undefined or if <em>q</em> is the empty string.  That&#8217;s good.  However,  both the numeric value <em>0</em> and the string <em>&#8220;0&#8221;</em> also evaluate false.  As a result,  this code won&#8217;t allow a user to search for <em>&#8220;0&#8221;</em>,  and will ignore an (invalid) after_date of <em>0</em>,  rather than entering the block at line [16],  which hopefully would validate the date.</p>
<p>Java and C# developers might enjoy a moment of schadenfreude at the above example,  but they&#8217;ve all seen,  written and debugged input handling code that is just as quirky as the above PHP code,  with several times the line count.  To set the record straight,  PHP programmers can use the isset() function to test precisely for the existence of a hash key (isset() also returns false for a key whose value is <em>null</em>;  array_key_exists() distinguishes that case):</p>
<pre>[11] if (!isset($_GET["q"])) {
[12]     throw new InvalidInputException("You must specify a query");
[13] }</pre>
<p>The unusual handling of &#8220;0&#8221; is the kind of fault that can survive for years in production software:  so long as nobody searches for &#8220;0&#8221;,  it&#8217;s quite harmless.  (See what you get if you search for a negative integer on Google.)  The worst threat that this kind of permissive evaluation poses is when it opens the door to a security attack,  but we&#8217;ve also seen that highly complex logic that strives to be &#8220;correct&#8221; in every situation can hide vulnerabilities too.</p>
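<p>For comparison,  a minimal Python sketch of the same input handling,  using the standard library&#8217;s <em>urllib.parse.parse_qs</em>;  the function and exception names are hypothetical.  Python&#8217;s truthiness rules differ from PHP&#8217;s (the string &#8220;0&#8221; is true,  but the empty string and the number 0 are false),  so the sketch avoids relying on them and tests for the presence of each key instead.</p>

```python
from datetime import date
from urllib.parse import parse_qs


class InvalidInputException(Exception):
    """Hypothetical exception type, mirroring the PHP example."""


def read_search_params(query_string):
    # parse_qs maps each name to a list of values and, by default,
    # drops parameters whose value is blank, such as "q=".
    params = parse_qs(query_string)

    # Test for presence explicitly, so a search for "0" is accepted.
    if "q" not in params:
        raise InvalidInputException("You must specify a query")
    result = {"q": params["q"][0]}

    # A present after_date is validated (ValueError if malformed), not ignored.
    if "after_date" in params:
        result["after_date"] = date.fromisoformat(params["after_date"][0])
    return result


print(read_search_params("q=0"))  # {'q': '0'}
```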
<h3>Relatively Rigid Usage</h3>
<p>Let&#8217;s consider a third case:  passing a bundle of context in an asynchronous communications call in a Silverlight application written in C#.  You can do a lot worse than to use the signatures:</p>
<pre>[14] void BeginAsyncCall(InputType input, Dictionary&lt;string,object&gt; context, CallbackDelegate callback);
[15] delegate void CallbackDelegate(ReturnType returnValue, Dictionary&lt;string,object&gt; context);</pre>
<p>The point here is that the callback might need to know something about the context in which the asynchronous function was called in order to do its work.  However,  this information may be idiosyncratic to the particular context in which the async function is called,  and is certainly not the business of the asynchronous function.  You might write something like:</p>
<pre>[16] void Initiator() {
[17]   InputType input=...;
[18]   var context=new Dictionary&lt;string,object&gt;();
[19]   context["ContextItemOne"]= (TypeA) ...;
[20]   context["ContextItemTwo"]= (TypeB) ...;
[21]   context["ContextItemThre"] = (TypeC) ...;
[22]   BeginAsyncCall(input,context,TheCallback);
[23] }
[24]
[25] void TheCallback(ReturnType output,Dictionary&lt;string,object&gt; context) {
[26]   var contextItemOne = (TypeA) context["ContextItemOne"];
[27]   var contextItemTwo = (TypeB) context["ContextItemTwo"];
[28]   var contextItemThree = (TypeC) context["ContextItemThree"];
[29]   ...
[30] }</pre>
<p>This is nice,  isn&#8217;t it?   You can pass any data values you want between <em>Initiator</em> and <em>TheCallback. </em>Sure,  the compiler isn&#8217;t checking the types of your arguments,  but loose coupling is called for in some situations.  Unfortunately it&#8217;s a little too loose in this case,  because we spelled the name of a key incorrectly on line 21.</p>
<p>What happens?</p>
<p>The [] operator on a .NET Dictionary throws a <em>KeyNotFoundException</em> when we try to look up a key that doesn&#8217;t exist.  I&#8217;ve set a global exception handler for my Silverlight application which,  in debugging mode,  displays the stack trace,  so the error gets quickly diagnosed and fixed.</p>
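<p>Python&#8217;s <em>dict[]</em> behaves the same way here:  a misspelled key fails fast with <em>KeyError</em> instead of quietly yielding a null.  A minimal sketch;  <em>begin_async_call</em>,  <em>the_callback</em> and <em>initiator</em> are hypothetical names,  and the &#8220;asynchronous&#8221; call simply invokes the callback synchronously to keep the example self-contained.</p>

```python
def begin_async_call(value, context, callback):
    """Hypothetical stand-in for BeginAsyncCall.

    A real implementation would return at once and invoke the callback later;
    this sketch calls it synchronously.
    """
    callback(value * 2, context)


def the_callback(output, context):
    # dict[] raises KeyError for a missing key, much like the .NET indexer.
    item_one = context["ContextItemOne"]
    item_three = context["ContextItemThree"]
    print(output, item_one, item_three)


def initiator():
    context = {
        "ContextItemOne": "a",
        "ContextItemTwo": "b",
        "ContextItemThre": "c",  # the same misspelling as line 21
    }
    begin_async_call(21, context, the_callback)


try:
    initiator()
except KeyError as error:
    print("missing context key:", error)  # missing context key: 'ContextItemThree'
```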
<h2>Four ways to deal with a missing value</h2>
<p>There are four tools that hashtables give programmers to access values associated with keys and detect missing values:</p>
<ol>
<li>Test if key exists</li>
<li>Throw exception if key doesn&#8217;t exist</li>
<li>Return default value (or null) if key doesn&#8217;t exist</li>
<li>TryGetValue</li>
</ol>
<h3>#1: Test if key exists</h3>
<pre><strong>PHP:</strong>    isset($hashtable[$key])
<strong>Python:</strong> key in hashtable
<strong>C#:</strong>     hashtable.ContainsKey(key)
<strong>Java:</strong>   hashtable.containsKey(key)</pre>
<p>This operator can be used together with the #2 or #3 operator to safely access a hashtable.  Lines [03]-[04] illustrate a common usage pattern.</p>
<p>One strong advantage of the explicit test is that it&#8217;s more clear to developers who spend time working in different language environments &#8212; you don&#8217;t need to remember or look in the manual to know if the language you&#8217;re working in today uses the #2 operator or the #3 operator.</p>
<p>Code that depends on the existence test can be more verbose than the alternatives,  and can be <em>structurally unstable</em>:  future edits can accidentally change the error handling properties of the code.  In multithreaded environments,  there&#8217;s a potential risk that an item can be added or removed between the existence check and an access;  however,  the default collections in most environments are not thread-safe,  so you&#8217;re likely to have worse problems if a collection is being accessed concurrently.</p>
<h3>#2 Throw exception if key doesn&#8217;t exist</h3>
<pre><strong>Python:</strong> hashtable[key]
<strong>C#:</strong>     hashtable[key]</pre>
<p>This is a good choice when the non-existence of a key is really an exceptional event.  In that case,  the error condition is immediately propagated via the exception handling mechanism of the language,  which,  if properly used,  is almost certainly better than anything you&#8217;ll develop.  It&#8217;s awkward,  and probably inefficient,  if you expect missing keys to be common.  Consider the following rewrite of the code in lines [01]-[07]:</p>
<pre>[31] var count=new Dictionary&lt;int,int&gt;();
[32] foreach(int i in inputList) {
[33]   int oldCount;
[34]   try {
[35]       oldCount=count[i];
[36]   } catch (KeyNotFoundException) {
[37]       oldCount=0;
[38]   }
[39]
[40]   count[i]=oldCount+1;
[41] }</pre>
<p>It may be a matter of taste,  but I think that&#8217;s just awful.</p>
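<p>The same structure is less unusual in Python,  whose documentation names this style EAFP (&#8220;easier to ask for forgiveness than permission&#8221;).  A sketch mirroring lines [31]-[41],  with <em>histogram</em> as a hypothetical function name:</p>

```python
def histogram(input_list):
    counts = {}
    for item in input_list:
        try:
            old_count = counts[item]
        except KeyError:  # first occurrence of this item
            old_count = 0
        counts[item] = old_count + 1
    return counts


print(histogram(["a", "b", "a"]))  # {'a': 2, 'b': 1}
```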
<h3>#3 Return a default (often null) value if key doesn&#8217;t exist</h3>
<pre><strong>PHP:</strong>    $hashtable[$key] (well,  almost:  an undefined key also triggers a notice)
<strong>Python:</strong> hashtable.get(key, [default value])
<strong>Java:</strong>   hashtable.get(key)</pre>
<p>This can be a convenient and compact operation.  Python&#8217;s form is particularly attractive because it lets us pick a specific default value.  If we use an extension method to add a Python-style GetValue operation in C#,  the code from [01]-[07] is simplified to</p>
<pre>[42] var count=new Dictionary&lt;int,int&gt;();
[43] foreach(int i in inputList)
[44]   count[i]=count.GetValue(i,0)+1;</pre>
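<p>The Python operation that the hypothetical <em>GetValue</em> extension imitates is the built-in <em>dict.get(key, default)</em>.  A minimal Python version of the histogram,  again with a hypothetical function name:</p>

```python
def histogram(input_list):
    counts = {}
    for item in input_list:
        # dict.get returns the supplied default (0) when the key is missing.
        counts[item] = counts.get(item, 0) + 1
    return counts


print(histogram([1, 1, 2]))  # {1: 2, 2: 1}
print({}.get("missing"))     # None, the default default value
```

<p>Python&#8217;s standard library also packages this exact counting pattern as <em>collections.Counter</em>.</p>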
<p>It&#8217;s reasonable for the default default value to be null (or rather,  the default value of the type),  as it is in Python,  in which case we could use the ??-operator to write</p>
<pre>[42] var count=Dictionary&lt;int