Jason Watkins

Storage and networking costs break the Internet

In Blog on December 5, 2008 at 12:17 am

If you fetch a piece of data more than once a month, you should store it locally instead.

Despite dramatic reductions in the cost of bandwidth, storage cost has been falling even faster. It’s now far cheaper to store something than to repeatedly transmit it. If you assume Amazon’s prices are a good proxy for the costs of large bandwidth and storage providers, then it’s clear: storing a byte for a month and moving a byte just once cost roughly the same.
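To make the break-even point concrete, here is a rough sketch of the arithmetic. The two prices are placeholder values picked only so that storing a gigabyte for a month and moving it once cost the same; they are assumptions, not quoted rates, and the function names are made up for this illustration.

```python
# Hypothetical prices for illustration only; substitute a provider's real rates.
STORAGE_PER_GB_MONTH = 0.15  # assumed: dollars to keep 1 GB stored for one month
TRANSFER_PER_GB = 0.15       # assumed: dollars to move 1 GB across the network once


def monthly_cost_remote(size_gb: float, fetches_per_month: float) -> float:
    """Cost of re-fetching the data over the network every time it is needed."""
    return size_gb * fetches_per_month * TRANSFER_PER_GB


def monthly_cost_local(size_gb: float) -> float:
    """Cost of keeping a local copy for the month (the initial fetch is ignored)."""
    return size_gb * STORAGE_PER_GB_MONTH


def break_even_fetches_per_month() -> float:
    """Fetch rate above which the local copy is the cheaper option."""
    return STORAGE_PER_GB_MONTH / TRANSFER_PER_GB


if __name__ == "__main__":
    print("break-even fetches per month:", break_even_fetches_per_month())
    print("10 GB fetched 4x per month:", monthly_cost_remote(10, 4))
    print("10 GB stored locally:", monthly_cost_local(10))
```

With equal prices the break-even sits at one fetch per month, which is where the rule of thumb above comes from. If storage keeps getting cheaper relative to transfer, the break-even drops below one.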

The internet was architected with certain assumptions, one of which is that communication would be used to access expensive centralized resources. $10k workstations were connecting to million-dollar mainframes. Now the cost of computation has fallen so far that $1k servers can provide for hundreds if not thousands of <$1k desktops and laptops.

Yet the communication patterns of the internet remain largely unchanged. The amount of duplicate transmission on the internet is staggering. Because there is no widespread asynchronous multicast, content providers end up paying costs that grow linearly with the number of users consuming their content. These economics skew the market toward large publishers or mediators, even for user-created content.

Content creators often hand over their copyrights because of these costs. The result is a market of VC-funded rent seekers who trade bandwidth for ad impressions.

Companies willing to mediate without consuming the copyright are rare (and worthy of our praise).

HTTP attempted to fill the gap by creating content-based networking at layer 7. In practice, it’s assumed that HTML representations are created uniquely for each request. Intermediate caching of other content is rare.

We as web developers share the blame: if we used HTTP caching properly, we would create a larger incentive for ISPs to offer caching services.

Instead, the publisher likely ends up paying a CDN to maintain storage at the edge of the network.
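As a minimal sketch of what using HTTP caching properly can look like on the publisher's side: mark a response as cacheable by shared caches, attach a validator, and answer revalidation requests with an empty 304. The `Cache-Control`, `ETag`, and `If-None-Match` headers are standard HTTP; the function names, the one-day lifetime, and the hash-based validator are assumptions made for this example, and the `If-None-Match` handling is deliberately simplified to a single exact match.

```python
import hashlib


def caching_headers(body: bytes, max_age_seconds: int = 86400) -> dict:
    """Build headers that let browsers and intermediate caches reuse the response."""
    # Validator derived from the content itself (an assumption; any stable token works).
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    return {
        # "public" permits shared caches (e.g. at an ISP) to store the response.
        "Cache-Control": f"public, max-age={max_age_seconds}",
        "ETag": etag,
    }


def respond(request_headers: dict, body: bytes) -> tuple:
    """Return (status, headers, payload), skipping the body when the cache is current."""
    headers = caching_headers(body)
    # Simplified: real requests may carry several validators or weak ones (W/"...").
    if request_headers.get("If-None-Match") == headers["ETag"]:
        return 304, headers, b""
    return 200, headers, body


if __name__ == "__main__":
    status, headers, payload = respond({}, b"<html>hello</html>")
    print(status, headers)
    # A cache revalidating its stored copy sends the ETag back.
    status, _, payload = respond({"If-None-Match": headers["ETag"]}, b"<html>hello</html>")
    print(status, len(payload))
```

Every 304 or cache hit is a transfer the publisher does not pay for again, which is the incentive argued for above.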

This is a problem technology can solve, but the current market provides little incentive for those solutions.

The architecture of the internet is out of sync with storage and networking costs.

PostgreSQL will finally come (replication) batteries included

In Uncategorized on June 3, 2008 at 11:41 pm

Personally I’d say this is about four years overdue. While there are replication options for PostgreSQL, no one seems to particularly care. The lack of simple built-in replication is the second most common reason I see for MySQL adoption over PostgreSQL. The most common is speed, despite results that seem to indicate that the two are roughly at parity.

I think this is a great move that will really boost PostgreSQL adoption.

Way to go ChrisA: AppEngine ported to EC2

In Uncategorized on April 14, 2008 at 1:09 pm

So local Portland Ruby hacker Chris Anderson ported the AppEngine SDK to run on EC2. It’s more of a proof of concept than production-ready at the moment. But prove the concept it does. Not a bad hack for someone who’s never done Python before. Fork it yourself at GitHub.