If you fetch a piece of data more than once a month, you should store it locally instead.
Despite dramatic reductions in the cost of bandwidth, storage cost has been falling even faster. It’s now far cheaper to store something than to repeatedly transmit it. If you assume Amazon’s prices are a good proxy for the costs of large bandwidth and storage providers, then it’s clear: storing a byte for a month and moving a byte just once cost roughly the same.
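To make the once-a-month rule concrete, here is a rough break-even sketch. The prices below are placeholders chosen only to reflect the "roughly the same" claim above, not actual Amazon rates, and the function names are made up for illustration.

```python
# Hypothetical prices, NOT real quotes: both set equal to mirror the
# "storing for a month costs about the same as moving once" assumption.
STORAGE_PER_GB_MONTH = 0.10  # assumed cost to keep 1 GB for one month
TRANSFER_PER_GB = 0.10       # assumed cost to move 1 GB once


def breakeven_fetches_per_month(storage_per_gb_month: float,
                                transfer_per_gb: float) -> float:
    """Fetches per month at which re-fetching costs the same as storing."""
    return storage_per_gb_month / transfer_per_gb


def cheaper_to_store(fetches_per_month: float,
                     storage_per_gb_month: float = STORAGE_PER_GB_MONTH,
                     transfer_per_gb: float = TRANSFER_PER_GB) -> bool:
    """True when a month of re-fetching costs more than a month of storage."""
    return fetches_per_month * transfer_per_gb > storage_per_gb_month


if __name__ == "__main__":
    print(breakeven_fetches_per_month(STORAGE_PER_GB_MONTH, TRANSFER_PER_GB))
    print(cheaper_to_store(2))    # fetched twice a month
    print(cheaper_to_store(0.5))  # fetched once every two months
```

With equal prices the break-even sits at one fetch per month; plug in your own provider's numbers to see where the line falls for you.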
The internet was architected with certain assumptions, one of which is that communication would be used to access expensive centralized resources: $10k workstations connected to million-dollar mainframes. The cost of computation has since fallen so far that $1k servers can provide for hundreds if not thousands of <$1k desktops and laptops.
Yet the communication patterns of the internet remain largely unchanged. The amount of duplicate transmission on the internet is staggering. Because there is no widespread asynchronous multicast, content providers end up paying costs that grow linearly with the number of users consuming their content. These economics skew the market toward large publishers or mediators, even for user-created content.
Content creators often hand over their copyrights because of these costs. The result is a market of VC-funded rent seekers trading bandwidth for ad impressions.
Companies willing to mediate without consuming the copyright are rare (and worthy of our praise).
HTTP attempted to fill the gap by creating content-based networking at layer 7. In practice, though, it’s assumed that HTML representations are created uniquely for each request, and intermediate caching of other content is rare.
We as web developers share the blame: by using HTTP caching properly we could create a larger incentive for ISPs to offer caching services.
Instead, the publisher likely ends up paying a CDN to maintain storage at the edge of the network.
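As a rough idea of what using HTTP caching properly can look like on the publisher side, here is a minimal sketch. `cache_headers` and `respond` are hypothetical helpers, not part of any framework, and the one-day `max_age` is an arbitrary assumption. The `Cache-Control` and `ETag` response headers and the `If-None-Match` request header are standard HTTP.

```python
import hashlib


def cache_headers(body: bytes, max_age: int = 86400) -> dict:
    """Headers that allow shared caches (ISP proxies, CDNs) to store the body."""
    # A content hash as the validator: same bytes, same ETag.
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    return {
        # "public" permits intermediaries, not just the browser, to cache.
        "Cache-Control": f"public, max-age={max_age}",
        "ETag": etag,
    }


def respond(body: bytes, request_headers: dict, max_age: int = 86400):
    """Return (status, headers, payload) for a GET request.

    Simplification: assumes an exact-case "If-None-Match" key holding a
    single ETag; real requests may carry a list of tags or "*".
    """
    headers = cache_headers(body, max_age)
    if request_headers.get("If-None-Match") == headers["ETag"]:
        # The requester's copy is current, so send no body at all.
        return 304, headers, b""
    return 200, headers, body


if __name__ == "__main__":
    status, headers, payload = respond(b"<h1>hello</h1>", {})
    print(status, headers["Cache-Control"])
    revalidate = {"If-None-Match": headers["ETag"]}
    print(respond(b"<h1>hello</h1>", revalidate)[0])
```

The first request pays for the full transfer; any cache along the path can then answer repeat requests itself, or revalidate with a response that carries no body.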
This is a problem technology can solve, but the current market provides little incentive for those solutions.
The architecture of the internet is out of sync with storage and networking costs.