Ken Birman’s textbook Reliable Distributed Systems is an excellent introduction to this brave new world. It focuses on building systems that are reliable, meaning they keep working when something goes wrong. This is critical for rich internet applications (which must work over an unreliable public internet) and for applications that run on large clusters (where there’s a lot of hardware to fail). If you find the text pricey, you’ll appreciate the slides from his Cornell course, available on his home page.
]]>