Discussion:
[libtorrent] Disk cache
Arvid Norberg
2006-04-09 14:25:51 UTC
There has been some discussion on the IRC channel about benefits and
possible cache algorithms for a disk cache in libtorrent. I'm posting
this hoping that somebody could comment on some of the techniques or
assumptions.

The way I see it, there are two problems that a cache aims to solve:

1. seeding on a high speed connection will make the hard drive seek a
lot back and forth, just to do short reads.

2. downloading on a high speed connection will make the hard drive
seek a lot back and forth.


For 1, one assumption that can be made is that if block 0 is
requested from a piece, it's very likely that many more blocks from
that piece will be requested soon. So caching the entire piece at
once would probably be a good idea.

This is exactly what a normal OS disk cache does: it reads ahead
into the cache. So I don't really see any benefit in caching this at
the application level. One possible optimization would be to bypass
the OS disk cache so that only one piece was cached (since the OS
cache may cache more than the piece we're reading from). This would
be hard to do, and especially hard to do in a platform-independent
manner.

Is this a reasonable assumption?

I believe some clients also order the requested blocks in increasing
index order, and then read them in that order. The idea is that this
will minimize the seek distance for the hard drive, since it's only
reading forward. Then of course it will jump back to the low index
again when it's time to read the next batch of pieces.

Would this improve the disk performance?
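
For illustration, a rough sketch of how such an ordering could be done
(hypothetical code, not how any client actually implements it):

#include <cstdint>
#include <map>

struct read_request { int piece; int offset; int length; /* peer, ... */ };

// keyed by absolute offset in the torrent's storage, so iterating the map
// visits the requests in ascending (forward-seeking) order
std::multimap<std::int64_t, read_request> pending;

void queue_request(std::int64_t piece_size, read_request const& r)
{
    pending.insert({ std::int64_t(r.piece) * piece_size + r.offset, r });
}

void service_batch()
{
    for (auto const& p : pending)
    {
        // read p.second.length bytes at offset p.first and send to the peer
    }
    pending.clear();
}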


Number 2 is a little bit more interesting though. Since every block
that is downloaded has to be written to disk, and since every block
is only downloaded once, you cannot really optimize away disk writes.
What you can do though, is to optimize the disk seeking.

uTorrent used to implement a write priority queue, into which write
requests were put, ordered by increasing index. Once the write cache
was full, the pieces were written to disk in that order. This was also
done every 30 seconds, if the cache wasn't full yet.
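
For illustration, such a queue might look roughly like this (a sketch of
the idea only, not uTorrent's actual code):

#include <cstddef>
#include <ctime>
#include <map>
#include <vector>

std::map<int, std::vector<char>> write_queue;  // piece index -> data, kept sorted
std::size_t queued_bytes = 0;
std::time_t last_flush = std::time(nullptr);

void flush_to_disk()
{
    // write each queued piece at its offset, in increasing index order
    for (auto const& e : write_queue) { /* write e.second for piece e.first */ }
    write_queue.clear();
    queued_bytes = 0;
    last_flush = std::time(nullptr);
}

void queue_write(int piece, std::vector<char> data, std::size_t cache_limit)
{
    queued_bytes += data.size();
    write_queue[piece] = std::move(data);
    if (queued_bytes >= cache_limit || std::time(nullptr) - last_flush >= 30)
        flush_to_disk();
}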

From version 1.5 of uTorrent, this algorithm changed into one that
caches blocks, and once all the blocks for one piece have been
downloaded, they are written to disk immediately. Ludvig (the author
of uTorrent) claims that the new algorithm works better, and that the
hard drive doesn't make as much noise anymore.

So, what do you think, should the write cache and read cache be
separated? Of course, read operations would still look in the write
queue for up-to-date data, but a write operation wouldn't be able to
evict anything from the read cache.

Should a cache be global or per torrent? I don't know how to minimize
the seeks between torrents. The assumption is that large parts of
files are unfragmented, so that most of the time, seeking forward is
pretty cheap. Different torrents are guaranteed to store their data in
different files, whose position on the media is unknown. It would be
desirable to have a global cache size, though, I guess.

Are there any other better caching algorithms?

Is there any point in having a cache at all?

thanks.
--
Arvid Norberg
Matt Sicker
2006-04-09 14:43:33 UTC
Post by Arvid Norberg
There has been some discussion on the IRC channel about benefits and
possible cache algorithms for a disk cache in libtorrent. I'm posting
this hoping that somebody could comment on some of the techniques or
assumptions.
1. seeding on a high speed connection will make the hard drive seek a
lot back and forth, just to do short reads.
2. downloading on a high speed connection will make the hard drive seek
a lot back and forth.
For 1, one assumption that can be made is that if block 0 is requested
from a piece, it's very likely that many more blocks from that piece
will be requested soon. So caching the entire piece at once would
probably be a good idea.
This is exactly what a normal OS' disk cache does. It reads ahead into
the cache. So I don't really see any benefit if caching this at
application level. One possible optimization would be if the disk cache
could be bypassed so that only one piece was cached (since the disk
cache in the OS may cache more than the piece we're reading from). This
would be hard to do, and especially hard to do in a platform independent
manner.
Is this a reasonable assumption?
I believe some clients also order the requested blocks in increasing
index order, and then reads them in that order. The idea is that it will
minimize the seek distance for the hard drive, since it's only reading
forward. Then of course it will jump back to the low index again when
it's time to read next batch of pieces.
Would this improve the disk performance?
Number 2 is a little bit more interesting though. Since every block that
is downloaded has to be written to disk, and since every block is only
downloaded once, you cannot really optimize away disk writes. What you
can do though, is to optimize the disk seeking.
uTorrent used to implement a write priority queue, where write requests
were put, and ordered in increasing index order. Once the write cache
was full, the pieces were written to disk in that order. This was also
done every 30 seconds, it the cache wasn't full yet.
from version 1.5 of uTorrent, this algorithm changed, into one that
caches blocks, and once all the blocks for one piece has been
downloaded, they are written to disk immediately. Ludvig (the author of
uTorrent) claims that the new algorithm worked better, and that the hard
drive didn't make as much noise anymore.
So, what do you think, should a write cache and read cache be separated?
Of course read operations would still look in the write queue for up to
date data, but a write operation wouldn't be able to remove something
from the read cache.
Should a cache be global or per torrent? I don't know how to minimize
the seeks between torrent. The assumption is that large parts of files
are unfragmented, so that most of the time, a seeking forward is pretty
cheap. In different torrents the storages are guaranteed to be in
different files, whose position on the media is unknown. It would be
desirable to have a global cache size though I guess.
Are there any other better caching algorithms?
Is there any points having a cache at all?
thanks.
--
Arvid Norberg
Would it be possible to try this in a feature branch of libtorrent? If
you know what to do already, it probably shouldn't take long to create
POCs for both methods.

Also, I do know that Azureus generally requests the blocks of a piece in
order, but it usually doesn't receive them in that order. Watching some
sort of debug mode for BT programs should help demonstrate that with
experimental results (rather than just assuming that what the program
wants to do according to the source code is what actually happens).
Arvid Norberg
2006-04-09 21:33:10 UTC
Post by Matt Sicker
Would it be possible to try this in a feature-branch of
libtorrent? If
you know what to do already, it probably shouldn't take long to create
POC's for both methods.
Well. I don't know what I want to do. And I'm not sure how to measure
and compare performance either. Any suggestions are welcome. For
example, having a write cache at piece level would definitely reduce
the number of system calls to read() and write(), but the question
is, would it help performance? Or would it just introduce another
layer on top of the OS' cache?

I can't think of a good way to measure that.

I do intend to gather some statistics once I have something
implemented though.
Post by Matt Sicker
Also, I do know that Azureus generally requests blocks to a piece in
order, but it usually doesn't receive them in that order. Watching some
sort of debug mode for BT programs should help show that for
experimental
results (rather than just assuming what the program wants to do according
to the source code will happen).
libtorrent also requests blocks in order. And receives them out of
order from some clients.


--
Arvid Norberg
Luboš Doležel
2006-04-09 14:56:25 UTC
Post by Arvid Norberg
Is there any points having a cache at all?
My personal opinion is that caching should be (and is) performed by
the operating system and not by the application, because it's more
effective:

When some other app needs more memory and there is no free memory
left, the OS simply decides to free some cache - but when caching is
done directly by the application, swapping may occur, and this can
IMHO result in an even bigger slowdown.

Lubos Dolezel
Arvid Norberg
2006-04-09 15:27:59 UTC
Post by Luboš Doležel
Post by Arvid Norberg
Is there any points having a cache at all?
My personal opinion is that caching should be (and is) performed by
When some more memory is needed by some other app and there is no free
mem left, OS simply decides to free some cache - but when caching is
done directly by applicaton, swapping may occur and this can IMHO result
in an even bigger slowdown.
Any caching would of course be optional.


--
Arvid Norberg
Felipe Magno de Almeida
2006-04-09 15:21:05 UTC
Hi, I've been on this mailing list for a long time but never had much
time to investigate libtorrent's internals, so sorry if I say anything
stupid.

On 4/9/06, Arvid Norberg <***@cs.umu.se> wrote:

[snipped]
Post by Arvid Norberg
1. seeding on a high speed connection will make the hard drive seek a
lot back and forth, just to do short reads.
2. downloading on a high speed connection will make the hard drive
seek a lot back and forth.
For 1, one assumption that can be made is that if block 0 is
requested from a piece, it's very likely that many more blocks from
that piece will be requested soon. So caching the entire piece at
once would probably be a good idea.
This is exactly what a normal OS' disk cache does. It reads ahead
into the cache. So I don't really see any benefit if caching this at
application level. One possible optimization would be if the disk
cache could be bypassed so that only one piece was cached (since the
disk cache in the OS may cache more than the piece we're reading
from). This would be hard to do, and especially hard to do in a
platform independent manner.
Although I believe it isn't worth it just for this, I don't think that
doing it in a platform-independent manner would be very hard. POSIX
and Windows have flags to inhibit OS caching, at least.
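
As a sketch of what those flags look like (assumptions: O_DIRECT on Linux
and some other Unix-like systems, FILE_FLAG_NO_BUFFERING on Windows; both
impose alignment requirements on buffers, offsets and transfer sizes):

#ifdef _WIN32
#include <windows.h>
HANDLE open_uncached(char const* path)
{
    return CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
        OPEN_EXISTING, FILE_FLAG_NO_BUFFERING, nullptr);
}
#else
#include <fcntl.h>
int open_uncached(char const* path)
{
    return open(path, O_RDONLY | O_DIRECT);
}
#endif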

[snipped]
Post by Arvid Norberg
Is there any points having a cache at all?
IMHO, it is only possible to answer this with tests, to see how much
better a cache could make the I/O operations.
Another assumption could be that the rarest blocks should stay in the
cache longer, since they should be requested more often by peers.
FWIW, Boost reviewed and accepted asio, a library for asynchronous
I/O. It supports only socket operations for now, but it will probably
support files as well soon, which could have some impact on ease of
use and CPU usage.

BitComet has this statistics in one torrent I'm downloading:

Disk Read Statistics: Request: 38982 (freq: 2.1/s), Actual Disk Read:
2542 (freq: 0.1/s), Hit Ratio: 93.4%
Disk Write Statistics: Request: 9986 (freq: 1.2/s), Actual Disk Write:
184 (freq: 0.0/s), Hit Ratio: 98.1%
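
(Assuming Hit Ratio here means 1 - actual disk operations / requests,
the numbers roughly check out: 1 - 2542/38982 is about 93.5%, and
1 - 184/9986 is about 98.2%.)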

It seems like pretty good numbers, although we can't really know what
"Disk Write" means here...
Post by Arvid Norberg
thanks.
--
Arvid Norberg
best regards,
--
Felipe Magno de Almeida
Arvid Norberg
2006-04-09 21:41:54 UTC
Post by Felipe Magno de Almeida
Post by Arvid Norberg
Is there any points having a cache at all?
IMHO, it is only possible to answer this with tests and see how better
a cache could perform the I/O operations.
I believe that other assumption could be that rarest blocks could stay
in cache longer too, since they should be requested more often from
peers.
Yes, but usually there isn't a single rarest piece; there's a whole
bunch of them. A read cache should probably prioritize them, though,
assuming all peers implement rarest-first.
Post by Felipe Magno de Almeida
FWIW, Boost reviewed and accepted asio, a library for asynchronous
I/O. It supports only sockets operations for now, but probably soon it
will support files as well, which could have some impact in easy of
use and CPU usage.
Yes, I'm following this progress. I'm already using asio for
networking (not in the main branch yet, but soon).
Post by Felipe Magno de Almeida
2542 (freq: 0.1/s), Hit Ratio: 93.4%
184 (freq: 0.0/s), Hit Ratio: 98.1%
It seems like pretty good numbers. Although we cant really know what
Disk Write means here...
Yeah, I'm a bit interested in how that cache works as well. The stats
for the read cache seem to suggest that either almost the entire
torrent fits in the cache, or the swarm is practically requesting the
same pieces all the time.

--
Arvid Norberg
M***@massaroddel.de
2006-04-09 17:02:29 UTC
Post by Arvid Norberg
There has been some discussion on the IRC channel about benefits and
possible cache algorithms for a disk cache in libtorrent. I'm posting
this hoping that somebody could comment on some of the techniques or
assumptions.
One thing that we did not discuss in the IRC channel is the open file
limit. I can imagine that it has a bad effect on the OS's caching if
files are opened and closed often because the torrent(s) have too many
files. (E.g. 40 isn't much for scanlations.)

MassaRoddel
Arvid Norberg
2006-04-09 21:44:51 UTC
Post by M***@massaroddel.de
Post by Arvid Norberg
There has been some discussion on the IRC channel about benefits and
possible cache algorithms for a disk cache in libtorrent. I'm posting
this hoping that somebody could comment on some of the techniques or
assumptions.
One thing that we did not discuss in irc channel is the open file
limit. I can imagine that it
has a bad effect on the os's caching if the files get often opened and closed because the
torrent(s) have to many files. (i.e. 40 isn't much for scanlations)
I don't think the OS's cache is flushed just because a file is
closed. At least my experience on Windows XP and OS X 10.4 suggests
that there is no measurable speed difference between opening and
closing the file for each piece that is read, and just leaving the
file open. Except when you have (crappy?) anti-virus software that
hooks file close and scans the file's contents.

--
Arvid Norberg
Radu Hociung
2006-04-10 18:53:21 UTC
Hello Arvid, and all,

I think caching at the application level can be far more efficient than
at the OS level, as the application can typically exploit the
characteristics of the dataflow, whereas the OS cannot. Please see below.

I will use "cheap data" and "expensive data" to refer to in-buffer and
on-disk data respectively.

The theory of Nash equilibrium applies very much to torrents, I think.
Downloading torrents is a non-cooperative game, in that each client is
concerned only with ensuring its own success, and the strategy it must
employ is to maximize the chance of success while minimizing the cost
of achieving it.

Please see below for my comments on caching.
Post by Arvid Norberg
There has been some discussion on the IRC channel about benefits and
possible cache algorithms for a disk cache in libtorrent. I'm posting
this hoping that somebody could comment on some of the techniques or
assumptions.
1. seeding on a high speed connection will make the hard drive seek a
lot back and forth, just to do short reads.
2. downloading on a high speed connection will make the hard drive seek
a lot back and forth.
For 1, one assumption that can be made is that if block 0 is requested
from a piece, it's very likely that many more blocks from that piece
will be requested soon. So caching the entire piece at once would
probably be a good idea.
This is exactly what a normal OS' disk cache does. It reads ahead into
the cache. So I don't really see any benefit if caching this at
application level. One possible optimization would be if the disk cache
could be bypassed so that only one piece was cached (since the disk
cache in the OS may cache more than the piece we're reading from). This
would be hard to do, and especially hard to do in a platform
independent manner.
Is this a reasonable assumption?
If the application is able to fill the available outgoing bandwidth with
cheap data, there is no benefit to reading more/other data from the
disk, is there?

I think it would be a good idea for the library to prefer to respond to
requests for cheap data, and postpone responses to requests for
expensive data until the utilization of the available outgoing bandwidth
drops. Clients that request expensive data should be choked until cheap
data can no longer fill the outgoing pipe, and it becomes necessary to
read expensive data to fill the pipe.
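
A rough sketch of that policy (hypothetical code, not libtorrent's request
handling; in_buffer(), upload_link_saturated() and send_block() are
placeholders):

#include <deque>

struct peer_request { int piece; int offset; int length; };

bool in_buffer(peer_request const&) { return false; }  // placeholder: data still in RAM?
bool upload_link_saturated() { return false; }         // placeholder
void send_block(peer_request const&) {}                // placeholder

void service(std::deque<peer_request>& queue)
{
    // cheap data first: answer every request that hits the in-memory buffers
    for (auto it = queue.begin(); it != queue.end();)
    {
        if (in_buffer(*it)) { send_block(*it); it = queue.erase(it); }
        else ++it;                          // postpone expensive (on-disk) data
    }
    // only go to the disk when cheap data could not fill the pipe
    while (!upload_link_saturated() && !queue.empty())
    {
        send_block(queue.front());
        queue.pop_front();
    }
}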
Post by Arvid Norberg
Number 2 is a little bit more interesting though. Since every block
that is downloaded has to be written to disk, and since every block is
only downloaded once, you cannot really optimize away disk writes. What
you can do though, is to optimize the disk seeking.
Absolutely. Also, since the client also uploads while downloading, it
would be more disk-efficient to seek only for disk writes, and never for
disk reads. I.e., the client should prefer to respond to requests for
data it has freshly downloaded, instead of reading other data from disk.
Assuming that the client downloads least-common data first, it's a
reasonable assumption that the data it holds in memory buffers is still
rare to other clients.

In this case, the library should write to disk when downloaded pieces
are complete, but do not free the buffers until other clients have
downloaded them enough to cause the uplink utilization to drop.

I have thought about optimizing the dataflow a lot. I believe there are
at least two good reasons for improving caching of torrents:

1. Fewer seeks == less noise == drives lasting longer.
2. Fewer seeks == the drive is more responsive to other apps that run
concurrently.

I believe the on-drive cache is better than nothing, O/S cache is better
than the on-drive cache, and application cache is better than O/S cache.
The closer the cache is to the data generation/consumption algorithm,
the more information is available for optimizing it.

I think this approach to caching gives an answer to multiple-torrent
caching as well.


One more thing:

I have considered working on a test-bench that is able to run a swarm.
The testbench would be able to report disk-access efficiency by
measuring some metrics, like how far apart the written pieces are on
disk (i.e., seek throw), the total number of accesses, etc. The idea is
that the minimum number of accesses and long seeks needed to fully
download a file, while saturating the output link, can be calculated
empirically. The metrics given by the testbench could be compared to
that ideal, and a measure of client efficiency would thus be obtained.
(Naturally, min-long-seeks = 0, and min-accesses = file-size/buffer-size.)
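
A sketch of the seek-throw metric I have in mind (hypothetical, not the
actual testbench code): given the sequence of (offset, size) accesses a
client makes to a file, sum up the distance the head has to travel.

#include <cstdint>
#include <vector>

struct file_access { std::int64_t offset; std::int64_t size; };

std::int64_t total_seek_throw(std::vector<file_access> const& trace)
{
    std::int64_t distance = 0;
    std::int64_t head = 0;
    for (file_access const& a : trace)
    {
        distance += (a.offset > head) ? a.offset - head : head - a.offset;
        head = a.offset + a.size;   // where a sequential access leaves the head
    }
    return distance;                // 0 for a perfectly sequential trace
}

Efficiency could then be reported as min-accesses / trace.size(), with
min-accesses = file-size / buffer-size as above.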

The test-bench would hook O/S calls in a manner similar to strace, and
would be able to run multiple makes/brands/versions of clients, such
that swarm behaviour can be observed. This would be very useful to study
potential protocol changes and their effect on older-version clients.

I am very interested in your thoughts and comments on caching,

Regards,
Radu.
Arvid Norberg
2006-04-11 10:09:12 UTC
Post by Radu Hociung
Hello Arvid, and all,
I think caching at the application level can be far more efficient than
at the O/S level, as the application typically can manage
characteristics of the dataflow, whereas the O/S cannot. Please see below.
Yes, this is true in general. My concern is that the characteristics
of the dataflow in the bittorrent case are very close to completely
random. Talking about the request pattern, the only exception to
randomness is the rarest-first algorithm. But this algorithm will
also make sure that the distribution among the peers is as uniform as
possible at all times, keeping the request pattern as random as possible.

It is true though, that in the case where the swarm has tendencies of
being "serialized", it is very common to request pieces which were
just made available from a peer. But I'm not sure that is a very
common case. Obviously some data is needed here.
Post by Radu Hociung
I will use "cheap data" and "expensive data" to refer to in-buffer and
on-disk data respectively.
The theory of Nash equilibrium applies very much to torrents, I think.
Downloading torrents is a non-cooperative game, in that each client is
concerned with ensuring only its own success, and the strategy it must
employ is of maximizing the chance for success while minimizing its cost
of achieving success.
true.
Post by Radu Hociung
Please see below for my comments on caching.
Post by Arvid Norberg
There has been some discussion on the IRC channel about benefits and
possible cache algorithms for a disk cache in libtorrent. I'm posting
this hoping that somebody could comment on some of the techniques or
assumptions.
1. seeding on a high speed connection will make the hard drive seek a
lot back and forth, just to do short reads.
2. downloading on a high speed connection will make the hard
drive seek
a lot back and forth.
For 1, one assumption that can be made is that if block 0 is
requested
from a piece, it's very likely that many more blocks from that piece
will be requested soon. So caching the entire piece at once would
probably be a good idea.
This is exactly what a normal OS' disk cache does. It reads ahead
into
the cache. So I don't really see any benefit if caching this at
application level. One possible optimization would be if the disk
cache
could be bypassed so that only one piece was cached (since the disk
cache in the OS may cache more than the piece we're reading
from). This
would be hard to do, and especially hard to do in a platform
independent manner.
Is this a reasonable assumption?
If the application is able to fill the available outgoing bandwidth with
cheap data, there is no benefit to reading more/other data from the
disk, is there?
I think it would be a good idea for the library to prefer to
respond to
requests for cheap data, and postpone responses to requests for
expensive data until the utilization of the available outgoing
bandwidth
drops. Clients that request expensive data should be choked until cheap
data can no longer fill the outgoing pipe, and it becomes necessary to
read expensive data to fill the pipe.
The client isn't completely free to choose who to upload to or whose
requests to respond to. In order to maximize download speed, it has
to upload to those that it can download from. I also think that the
most common case is that the output pipe is very seldom saturated.
So I'm not sure that weighting peers depending on the pieces they
request would work very well; it would affect the behavior of the
client in other ways.

Your distinction of cheap and expensive data doesn't take seek
distance into account. I think that may be the most important thing
to concentrate on, at least when it comes to writing.
Post by Radu Hociung
Post by Arvid Norberg
Number 2 is a little bit more interesting though. Since every block
that is downloaded has to be written to disk, and since every
block is
only downloaded once, you cannot really optimize away disk
writes. What
you can do though, is to optimize the disk seeking.
Absolutely. Also, since while downloading, the client also uploads, it
would be more disk efficient to only seek for disk writes, but
never for
disk reads. Ie, the client should prefer to respond to requests for data
is has freshly downloaded, instead of reading other data from disk.
Assuming that the client downloads least-common data first, it's a
reasonable assumtion that the data it holds in memory buffers is still
rare to other clients.
Yes, at least to a certain degree. One thing to keep in mind though,
is that the number of pieces in a torrent is generally much greater
than the number of peers that a client can see. How rare a piece is is
quantized to the number of peers that have it. This means that the
number of equally rare pieces is usually very high, far higher than a
write cache would hold.
Post by Radu Hociung
In this case, the library should write to disk when downloaded pieces
are complete, but do not free the buffers until other clients have
downloaded them enough to cause the uplink utilization to drop.
I have thought about optimizing the dataflow a lot. I believe there are
1. Fewer seeks == less noise == drives lasting longer.
2. Fewer seeks == drive is more responsive to other apps that run
concurently.
Definitely, I think the disk performance is very important.

I might add that there's a huge difference for me (OS X 10.4) between
downloading in full allocation mode and in compact allocation mode.
Compact allocation mode has the effect that most reads and writes are
kept relatively close to each other (they will spread out over time,
of course). Just downloading with client_test will saturate my
connection at slightly above 1 MB/s if I'm in compact mode. In full
allocation mode though, I will not get above about 600 kB/s, and my
drive really makes a noise and slows the computer down until it is
virtually useless.
Post by Radu Hociung
I believe the on-drive cache is better than nothing, O/S cache is better
than the on-drive cache, and application cache is better than O/S cache.
The closer the cache is to the data generation/consumption algorithm,
the more information is available for optimizing it.
I think this approach to caching gives an answer to multiple-torrent
caching as well.
What do you mean? Do you think there would be a point in having a
shared cache for all torrents?

One benefit would be that if you have 100 torrents but only 2 of them
are active, they will get all the cache for themselves.
Post by Radu Hociung
I have considered working on a test-bench that is able to run a swarm.
The testbench would be able to report disk-access efficiency, by
measuring some metrics, like how far apart the written pieces are in
memory (ie, seek throw), total number of accesses, etc. The idea is that
it can be calculated empirically what the minimum number of accesses,
and long seeks is to fully download a file, while saturating the output
link. The metrics given by the testbench could be compared to the ideal,
and a measure of client efficiency would thus be obtained. (Naturally,
min-long-seeks = 0, and min-accesses = file-size/buffer-size)
The test-bench would hook O/S calls in a manner similar to strace, and
would be able to run multiple makes/brands/versions of clients, such
that swarm behaviour can be observed. This would be very useful to study
potential protocol changes and their effect on older-version clients.
That sounds like it could be very useful!
Post by Radu Hociung
I am very interested in your thoughts and comments on caching,
Ditto :)

So, I think my first attempt will be to cache pieces, and only write
whole pieces to disk instead of blocks, to see if the performance is
improved. I don't know how to measure it though; maybe I could try
saturating my download link in full allocation mode and see if it
increases.
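
Roughly what I have in mind (a sketch, not the final implementation):
keep the 16 kB blocks of each incomplete piece in RAM and issue a single
contiguous write once the whole piece has arrived and passed the hash
check.

#include <algorithm>
#include <map>
#include <vector>

struct piece_buffer
{
    std::vector<char> data;      // piece_size bytes
    int blocks_received = 0;
};

std::map<int, piece_buffer> incomplete;    // piece index -> buffered blocks

void on_block(int piece, int offset, char const* buf, int len,
    int piece_size, int blocks_in_piece)
{
    piece_buffer& p = incomplete[piece];
    if (p.data.empty()) p.data.resize(piece_size);
    std::copy(buf, buf + len, p.data.begin() + offset);
    if (++p.blocks_received == blocks_in_piece)
    {
        // hash-check p.data, then one write() at offset piece * piece_size
        incomplete.erase(piece);
    }
}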

Do you have any other suggestions that are reasonably easy to try out?

--
Arvid Norberg
Radu Hociung
2006-04-11 19:56:33 UTC
Post by Radu Hociung
Hello Arvid, and all,
I think caching at the application level can be far more efficient than
at the O/S level, as the application typically can manage
characteristics of the dataflow, whereas the O/S cannot. Please see
below.
Yes, this is true in general, my concern is that the characteristics of
the dataflow in the bittorrent case is very close to completely random.
Talking about the request pattern, the only exception to randomness is
the rarest first-algorithm. But this algorithm will also make sure that
the distribution among the peers is as uniform as possible at all time,
keeping the request pattern as random as possible.
It is true though, that in the case where the swarm has tendencies of
being "serialized", it is very common to request pieces which were just
made available from a peer. But I'm not sure that is a very common
case. Obviously some data is needed here.
The serialized downloads make sense in at least two scenarios:

a. well-seeded torrents, and/or
b. applications where it is desirable to start playback as soon as
possible after the download starts. Obviously this makes sense mainly
for media-type torrents.

You're right; without more data, I find it hard to agree that randomness
is a good thing. In the case of a well-established torrent (i.e.,
long-lived, with aggregate upload capacity balanced against aggregate
download capacity), the arrival time of new clients is random, and even
though their individual download patterns could be serialized, the
request pattern seen by individual seeds would be random.

But I think this is a topic big enough to deserve its own thread/research.

I will use the good/bad/perfect torrent taxonomy defined at
http://azureus.aelitis.com/wiki/index.php/Good_Torrents and probably
elsewhere too.
Post by Radu Hociung
I will use "cheap data" and "expensive data" to refer to in-buffer and
on-disk data respectively.
The theory of Nash equilibrium applies very much to torrents, I think.
Downloading torrents is a non-cooperative game, in that each client is
concerned with ensuring only its own success, and the strategy it must
employ is of maximizing the chance for success while minimizing its cost
of achieving success.
true.
Post by Radu Hociung
Please see below for my comments on caching.
Post by Arvid Norberg
There has been some discussion on the IRC channel about benefits and
possible cache algorithms for a disk cache in libtorrent. I'm posting
this hoping that somebody could comment on some of the techniques or
assumptions.
1. seeding on a high speed connection will make the hard drive seek a
lot back and forth, just to do short reads.
2. downloading on a high speed connection will make the hard drive
seek
a lot back and forth.
For 1, one assumption that can be made is that if block 0 is requested
from a piece, it's very likely that many more blocks from that piece
will be requested soon. So caching the entire piece at once would
probably be a good idea.
This is exactly what a normal OS' disk cache does. It reads ahead into
the cache. So I don't really see any benefit if caching this at
application level. One possible optimization would be if the disk
cache
could be bypassed so that only one piece was cached (since the disk
cache in the OS may cache more than the piece we're reading from).
This
would be hard to do, and especially hard to do in a platform
independent manner.
Is this a reasonable assumption?
If the application is able to fill the available outgoing bandwidth with
cheap data, there is no benefit to reading more/other data from the
disk, is there?
I think it would be a good idea for the library to prefer to respond to
requests for cheap data, and postpone responses to requests for
expensive data until the utilization of the available outgoing bandwidth
drops. Clients that request expensive data should be choked until cheap
data can no longer fill the outgoing pipe, and it becomes necessary to
read expensive data to fill the pipe.
The client isn't completely free of choosing who to upload to/whose
requests to respond do. In order to maximize download speed, it has to
upload to those that it can download from. I also think that the most
common case is that the output pipe very seldom is saturated. So, I'm
not sure weighting peers depending on the pieces they request would
work very well, it would affect the behavior of the client in other ways.
Your distinction of cheap and expensive data doesn't take seek distance
into account. I think that may be the most important thing to
concentrate on, at least when it comes to writing.
I thought your number 1 aim deals with seeds on high-speed links, hence
no downloading.

Since there are different scenarios with distinct properties, maybe the
caching behaviour could be altered as necessary. The 4 scenarios I'm
thinking of are:

a. seeding with the outgoing pipe saturated
b. seeding with the outgoing pipe underutilized
c. downloading from a torrent rated "good" or better
d. downloading from a torrent rated "acceptable" or worse

The key difference between a-b is that the data being requested is
likely available cheaper from another seed.

The key difference between c-d is that a random dl pattern is
unnecessary in c.
Post by Radu Hociung
Post by Arvid Norberg
Number 2 is a little bit more interesting though. Since every block
that is downloaded has to be written to disk, and since every block is
only downloaded once, you cannot really optimize away disk writes.
What
you can do though, is to optimize the disk seeking.
Absolutely. Also, since while downloading, the client also uploads, it
would be more disk efficient to only seek for disk writes, but never for
disk reads. Ie, the client should prefer to respond to requests for data
is has freshly downloaded, instead of reading other data from disk.
Assuming that the client downloads least-common data first, it's a
reasonable assumtion that the data it holds in memory buffers is still
rare to other clients.
Yes, at least to a certain degree. One thing to keep in mind though, is
that the number of pieces in a torrent is generally much more than the
number of peers that a client can see. How rare a piece is is quantized
to the number of peers that has it. This means that the number of
equally rare pieces is usually very high, far higher than a write cache
would be.
I think you're making the (healthy) assumption that a client would cache
only one, or very few, pieces for upload? This is very good for lowering
the memory footprint of the client.

What I am thinking is that as soon as a client advertises a completed
piece, many swarm members would request it immediately, instead of
requesting other pieces. The just-downloaded piece is cheap data, and
the downloading client would prioritize uploading it.

In an ideal swarm, each piece in a torrent would be read from a disk
exactly once, and written to disk exactly N-1 times (for N clients in
the swarm). A torrent that is well seeded ("good" or better) should be
able to approach this ideal, without risking the availability of any
piece to be so low as to endanger the torrent. For a torrent that is
"acceptable" or worse, the rarest-first approach may be given a higher
preference than the ideal.
Post by Radu Hociung
In this case, the library should write to disk when downloaded pieces
are complete, but do not free the buffers until other clients have
downloaded them enough to cause the uplink utilization to drop.
I have thought about optimizing the dataflow a lot. I believe there are
1. Fewer seeks == less noise == drives lasting longer.
2. Fewer seeks == drive is more responsive to other apps that run
concurently.
Definitely, I think the disk perfomance is very important.
I might add that there's a huge difference for me (OSX 10.4) when
downloading in full allocation mode and compact allocation mode. The
compact allocation mode will have the effect that most reads and writes
are kept relatively close to each other. They will spread out over time
of course though. But just downloading with client_test will saturate
my connection at slightly above 1 MB/s if I'm in compact mode. In full
allocation mode though, I will not get above about 600 kB/s, and my
drive is really making a noise and slowing down the computer to be
virtually useless.
Fascinating! This would seem to support compact/serialized downloading.
Post by Radu Hociung
I believe the on-drive cache is better than nothing, O/S cache is better
than the on-drive cache, and application cache is better than O/S cache.
The closer the cache is to the data generation/consumption algorithm,
the more information is available for optimizing it.
I think this approach to caching gives an answer to multiple-torrent
caching as well.
What do you mean? Do you think there would be a point of having a
shared cache for all torrents?
One benefit would be that if you have 100 torrents but only 2 of them
are active, they will get all the cache for themselves.
This, and more. Each client could quantify the quality of the torrents
it serves, and even choose to (temporarily) not service well-seeded
torrents, while allocating upload bandwidth and cache to the
less-healthy torrents. In this case, a torrent client serving a large
collection of torrents (50+) could do so by dynamically choosing which
torrents to serve now, and which are safe to download. A possible
criterion for serving a torrent would be "always serve only the worst
half of the torrents". It is very likely that the client would have no
trouble maxing its upload pipe on bad torrents (thus healing them the
fastest). This approach somewhat ignores the download needs.

A slightly better approach would be "serve the single worst torrent,
with max cache, plus the worst torrent which allows me to max my
download capacity". I.e., if we have for instance 50 torrents, numbered
in order of their quality, with 1 being seeds-only and 50 being mostly
leeches, the client would pick torrent 50 to dedicate its cache to, and
torrent 21 for maxing its download. (Say that on torrent 50 we get a 1/6
dl/ul speed ratio, while on torrent 21, which has many seeds, we get
4/1, and the two add up such that both the upload and download
bandwidths are maximized.) In this case, there are many disk writes
caused by torrent 21, but no disk reads, and very few disk writes from
torrent 50, and no reads. The heads will rarely seek to the torrent 50
file, but stay on the torrent 21 file most of the time. Given that most
of the cache is allocated to torrent 50 and that it is downloading
serialized, when it needs to write to disk, it writes infrequently and
in big chunks. Seek time is thus minimized as well.
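
A loose sketch of that selection rule (hypothetical, no real API implied):
rank the torrents from least to most healthy, dedicate the cache and
upload to the least healthy one, and download from the least healthy
torrent that can still saturate the download link.

#include <algorithm>
#include <vector>

struct torrent_state { int id; double health; double expected_dl_rate; };

// assumes t is non-empty
void pick_torrents(std::vector<torrent_state> t, double dl_capacity,
    int& serve_id, int& download_id)
{
    std::sort(t.begin(), t.end(),
        [](torrent_state const& a, torrent_state const& b)
        { return a.health < b.health; });

    serve_id = t.front().id;       // worst torrent gets the cache and upload
    download_id = t.front().id;
    for (torrent_state const& x : t)
        if (x.expected_dl_rate >= dl_capacity) { download_id = x.id; break; }
}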
Post by Radu Hociung
I have considered working on a test-bench that is able to run a swarm.
The testbench would be able to report disk-access efficiency, by
measuring some metrics, like how far apart the written pieces are in
memory (ie, seek throw), total number of accesses, etc. The idea is that
it can be calculated empirically what the minimum number of accesses,
and long seeks is to fully download a file, while saturating the output
link. The metrics given by the testbench could be compared to the ideal,
and a measure of client efficiency would thus be obtained. (Naturally,
min-long-seeks = 0, and min-accesses = file-size/buffer-size)
The test-bench would hook O/S calls in a manner similar to strace, and
would be able to run multiple makes/brands/versions of clients, such
that swarm behaviour can be observed. This would be very useful to study
potential protocol changes and their effect on older-version clients.
That sounds like it could be very useful!
I wonder if there would be any volunteers to donate some time to help
with the cause.
Post by Radu Hociung
I am very interested in your thoughts and comments on caching,
Ditto :)
So, I think my first attempt will be to cache pieces, and only write
whole pieces instead of blocks to disk. To see if the performance is
improved. I don't know how to measure it though, maybe I could try
saturating my download link in full allocation mode and see if it
increases.
But would this not increase the memory footprint of the application,
seeing as you need as many piece buffers as there are pieces being
downloaded in parallel, while none of these buffers can be used to
upload from, since their data cannot be checksummed until the piece has
finished downloading?
Do you have any other suggestions that are reasonably easy to try out?
I think the first thing to do in studying the effects of libtorrent
caching would be to disable the O/S cache, in order to make comparisons
between versions relevant.

It would be interesting to find out how much riskier it is for a client
to serialize its downloads on poor torrents. Obviously it is safest to
do rarest-first on such torrents, but I am curious as to the difference
between rarest-first and serialized.

Also, we should not forget of the relationship between memory footprint
and caching effectiveness. It's easy to get more effective by allowing
increased memory use, but we should resist increased memory use when
possible.

How are you able to get 1 MB/s? Do you have torrents running on a test
network? Are the experiments reproducible? Would you mind describing
your test setup?

Greetings,
Radu.
Radu Hociung
2006-04-13 23:53:18 UTC
Greetings all,

I have done some experiments on libtorrent and Transmission, comparing
the effectiveness of their client-side caching and download scheduling
algorithms.

I put the results (with graphs) on Wikipedia at:

http://en.wikipedia.org/wiki/BitTorrent_performance

I hope that you'll find it a worthwhile read.

Regards,
Radu.
Post by Radu Hociung
Hello Arvid, and all,
I think caching at the application level can be far more efficient than
at the O/S level, as the application typically can manage
characteristics of the dataflow, whereas the O/S cannot. Please see
below.
Arvid Norberg
2006-04-14 02:06:11 UTC
Post by Radu Hociung
Greetings all,
I have done some experiments on libtorrent and Transmission, comparing
the effectiveness of their client-side caching and download scheduling
algorithms.
http://en.wikipedia.org/wiki/BitTorrent_performance
I hope that you'll find it a worthwhile read.
Very nice graphs :)

Caching blocks and only writing whole pieces at once would definitely
improve that measurement. But I'm not sure it would improve things in
reality (it probably would, a bit at least), since the OS's caching is
very likely done in the kernel, which means that you don't know how
many of those write calls actually resulted in a write to the disk.

--
Arvid Norberg
Radu Hociung
2006-04-14 04:27:18 UTC
Post by Arvid Norberg
Post by Radu Hociung
Greetings all,
I have done some experiments on libtorrent and Transmission, comparing
the effectiveness of their client-side caching and download scheduling
algorithms.
http://en.wikipedia.org/wiki/BitTorrent_performance
I hope that you'll find it a worthwhile read.
Very nice graphs :)
Caching blocks and only write whole pieces at once would definitely
improve that measurement. But I'm not sure it would improve in reality
(it probably would, a bit at least), since the OS's cache very likely is
done in the kernel, which means that you don't know how many of those
write-calls actually resulting in a write to the disk.
I doubt that writing whole pieces would make much difference. Looking at
libtorrent's current download pattern, and at the strace, it looks like
it writes the blocks of a piece in the proper order. So they probably
get assembled into the respective pieces in the OS's cache. If they were
being downloaded out of order, I would see a case for caching whole pieces.

However, you're right about not knowing when the writes to disk
actually occur:

Say that the kernel starts off with an empty cache and 100MB of free RAM.
When downloading a 5GB file, after the first 100MB are received, the OS
will have to start writing. It's reasonable that it will start flushing
the oldest cache slots first, so it is likely that after the first 100MB
are received, it will start writing the data it received first.

In the case of some advanced filesystems (such as XFS, which buffers
data as long as possible and allocates storage at the last possible
moment, when flushing buffers becomes inevitable), it is possible that
the cache won't always flush the first-received data, but that it will
try to group buffers based on their relative closeness.


But in any case, when the file size is much larger than the available
cache memory, it is inevitable that the disk heads will jump around.

In my test setup, I only had 1 seed and 1 client, so to the client, all
the pieces it didn't already have were equally rare in the 'swarm', and
the random download pattern really did not have any purpose.


I was thinking of another way of scheduling downloads. With 100MB of
cache, the client could "focus" on downloading 100MB chunks of the
torrent, and select the rarest pieces first within that 100MB chunk
before moving on to another chunk. The chunk could be a sliding window
or some such. Before moving the sliding window, the client could do a
'sync' to force the data to be flushed. I would assume that on sync,
the data would be written in the most contiguous way possible.
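
A sketch of that picker (hypothetical, not libtorrent's piece picker):
restrict the rarest-first choice to a window of pieces that fits in the
cache, and sync and slide the window forward once every piece in it is
done.

#include <algorithm>
#include <vector>

// availability[i] = number of peers that have piece i, or -1 once we have it
int pick_piece(std::vector<int> const& availability,
    int window_start, int window_size)
{
    int best = -1;
    int window_end = std::min(window_start + window_size,
        int(availability.size()));
    for (int i = window_start; i < window_end; ++i)
    {
        if (availability[i] < 0) continue;                  // already have it
        if (best == -1 || availability[i] < availability[best]) best = i;
    }
    return best;  // -1: the window is complete, sync() and slide it forward
}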

I plan on experimenting with the scheduling routines of either
libtorrent or transmission to test out the various hypotheses.

The more I play around, the more convinced I become that it's the
scheduling algorithm that can bring the most benefit, not the caching
algorithm. After all, in a case where most pieces are written to the
disk once and never read back, the cache acts as little more than a
simple delayed-write buffer, whether it is an application cache or an
OS cache.

BTW: Why are there 2 duplicate seeks before every write and 3 before
reads? Here's an example:

21561 write(6, /* head at 4161536 effectiveness 61.077 */0x8083e31,
16384) = 16384
21561 _llseek(6, 4177920, [4177920], /* BT efficiency 61.077% */
SEEK_SET) = 0
21561 _llseek(6, 0, [4177920], /* BT efficiency 61.077% */ SEEK_CUR) = 0
21561 write(6, /* head at 4177920 effectiveness 61.111 */0x8083e31,
16384) = 16384
21561 _llseek(6, 3145728, [3145728], /* BT efficiency 61.111% */
SEEK_SET) = 0
21561 _llseek(6, 0, [3145728], /* BT efficiency 61.111% */ SEEK_CUR) = 0
21561 _llseek(6, 0, [3145728], /* BT efficiency 61.111% */ SEEK_CUR) = 0
21561 read(6, /* head at 4194304 effectiveness 60.000 */0x404f4008,
1048576) = 1048576
21561 _llseek(6, 4194304, [4194304], /* BT efficiency 60.000% */
SEEK_SET) = 0
21561 _llseek(6, 0, [4194304], /* BT efficiency 60.000% */ SEEK_CUR) = 0
21561 write(6, /* head at 4194304 effectiveness 61.905 */0x404f4008,
1048576) = 1048576



Regards,
Radu.
Cory Nelson
2006-04-14 08:58:24 UTC
Post by Radu Hociung
Post by Arvid Norberg
Post by Radu Hociung
Greetings all,
I have done some experiments on libtorrent and Transmission, comparing
the effectiveness of their client-side caching and download scheduling
algorithms.
http://en.wikipedia.org/wiki/BitTorrent_performance
I hope that you'll find it a worthwhile read.
Very nice graphs :)
Caching blocks and only write whole pieces at once would definitely
improve that measurement. But I'm not sure it would improve in reality
(it probably would, a bit at least), since the OS's cache very likely is
done in the kernel, which means that you don't know how many of those
write-calls actually resulting in a write to the disk.
I doubt that writing whole pieces would make much difference. Looking at
libtorrent's current download pattern, and at the strace, it looks like
it writes the blocks of a piece in the proper order. So they probably
get assembled into the respective pieces in the OS's cache. If they were
being downloaded out of order, I would see a case for caching whole pieces.
However, you're right, about not knowing when the writes to disk
Say that the kernel starts off with an empty cache and 100MB free RAM.
When downloading a 5GB file, after the 1st 100MB are received, the OS
will have to start writing. It's reasonable that it will start flushing
the oldest cache slots first, so it is likely that after the fist 100MB
received, it will start writing the data it received first.
In the case of some advanced filesystems (such as XFS, which buffers
data as long as possible and allocates storage at the last possible
moment, when flushing buffers becomes innevitable), it is possible that
the cache won't always flush the first-received data, but it may try to
group buffers based on their relative closeness.
But in any case, when the file size is much larger than the available
cache memory, it is inevitable that the disk heads will jump around.
In my test setup, I only had 1 seed and 1 client, so to the client, all
the pieces it didn't already have were equally rare in the 'swarm', and
the random download pattern really did not have any purpose.
I was thinking of another way of scheduling downloads. With 100MB of
cache, the client could "focus" on downloading 100MB chunks of the
torrent, and select the rarest first within that 100MB chunk before
moving on to another chunk. the chunk could be a sliding window or such.
Before moving the sliding window, the client could do a 'sync' to force
data to be flushed. I would assume that on sync, the data would be
written in the most contiguous way possible.
I plan on experimenting with the scheduling routines of either
libtorrent or transmission to test out the various hypotheses.
The more I play around, the more I convince myself that it's the
scheduling algorithm that can bring the most benefits, not the caching
algorithm. After all, in a case where most pieces are written to the
disk once, and never read back, the cache acts as little more than a
simple delayed-write buffer, whether it is an application cache or an OS
cache.
BTW: Why are there 2 duplicate seeks before every write and 3 before
21561 write(6, /* head at 4161536 effectiveness 61.077 */0x8083e31,
16384) = 16384
21561 _llseek(6, 4177920, [4177920], /* BT efficiency 61.077% */
SEEK_SET) = 0
21561 _llseek(6, 0, [4177920], /* BT efficiency 61.077% */ SEEK_CUR) = 0
21561 write(6, /* head at 4177920 effectiveness 61.111 */0x8083e31,
16384) = 16384
21561 _llseek(6, 3145728, [3145728], /* BT efficiency 61.111% */
SEEK_SET) = 0
21561 _llseek(6, 0, [3145728], /* BT efficiency 61.111% */ SEEK_CUR) = 0
21561 _llseek(6, 0, [3145728], /* BT efficiency 61.111% */ SEEK_CUR) = 0
21561 read(6, /* head at 4194304 effectiveness 60.000 */0x404f4008,
1048576) = 1048576
21561 _llseek(6, 4194304, [4194304], /* BT efficiency 60.000% */
SEEK_SET) = 0
21561 _llseek(6, 0, [4194304], /* BT efficiency 60.000% */ SEEK_CUR) = 0
21561 write(6, /* head at 4194304 effectiveness 61.905 */0x404f4008,
1048576) = 1048576
Regards,
Radu.
Windows has a way to control how much the OS should cache. This
amount has a very conservative default on client versions - I'm
curious to see how good an application-level cache would be if the
OS-level cache was bumped up. Having libtorrent use gobs of memory in
a useless attempt to out-cache the kernel is certainly undesirable.

I imagine all modern operating systems optimize outstanding
asynchronous disk I/O calls so that they are completed with the least
amount of seeking - something which libtorrent doesn't use, last I
checked, but could certainly benefit from, even if only for the CPU
usage benefits.
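
For reference, a minimal POSIX aio write, one of the asynchronous I/O
interfaces meant here (Windows would use overlapped I/O instead). This is
a sketch only, not how libtorrent issues its writes today:

#include <aio.h>
#include <sys/types.h>
#include <cstring>

int async_write(int fd, void const* buf, std::size_t len, off_t offset,
    aiocb& cb)
{
    std::memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf = const_cast<void*>(buf);
    cb.aio_nbytes = len;
    cb.aio_offset = offset;
    // returns immediately; the kernel is free to reorder the outstanding
    // requests, e.g. by offset, to reduce seeking
    return aio_write(&cb);
}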

I'm also curious as to how much an app-level cache really benefits the
user. Either you have a low amount of traffic that can be easily
cached by the OS, or you have a high amount of traffic which neither
cache can keep up with without using a lot of memory.

In the worst case an application-level cache might be paged out to the
swap file and cause even more disk thrashing, something that would
never happen with an OS cache.

IMHO this definitely needs more real-world testing!

--
Cory Nelson
http://www.int64.org
Arvid Norberg
2006-04-14 10:32:13 UTC
Post by Radu Hociung
Post by Arvid Norberg
Post by Radu Hociung
Greetings all,
I have done some experiments on libtorrent and Transmission,
comparing
the effectiveness of their client-side caching and download
scheduling
algorithms.
http://en.wikipedia.org/wiki/BitTorrent_performance
I hope that you'll find it a worthwhile read.
Very nice graphs :)
Caching blocks and only write whole pieces at once would definitely
improve that measurement. But I'm not sure it would improve in reality
(it probably would, a bit at least), since the OS's cache very likely is
done in the kernel, which means that you don't know how many of those
write-calls actually resulting in a write to the disk.
I doubt that writing whole pieces would make much difference.
Looking at
libtorrent's current download pattern, and at the strace, it looks like
it writes the blocks of a piece in the proper order. So they probably
get assembled into the respective pieces in the OS's cache. If they were
being downloaded out of order, I would see a case for caching whole pieces.
I thought you downloaded a "wild" torrent, with more than one peer.
If you had done that, you would have noticed that the write operations
were scattered more. Because libtorrent writes each block to disk, and
even though blocks within a piece are downloaded sequentially, there
would still be more seeking.

At high speeds libtorrent prefers to download whole pieces from
single peers, which means block writes to different pieces would be
interleaved.
Post by Radu Hociung
However, you're right, about not knowing when the writes to disk
Say that the kernel starts off with an empty cache and 100MB free RAM.
When downloading a 5GB file, after the 1st 100MB are received, the OS
will have to start writing. It's reasonable that it will start
flushing
the oldest cache slots first, so it is likely that after the fist 100MB
received, it will start writing the data it received first.
In the case of some advanced filesystems (such as XFS, which buffers
data as long as possible and allocates storage at the last possible
moment, when flushing buffers becomes innevitable), it is possible that
the cache won't always flush the first-received data, but it may try to
group buffers based on their relative closeness.
Maybe it's possible to get timestamps in the trace log. Then it would
be possible to estimate when actual writes were done.
Post by Radu Hociung
But in any case, when the file size is much larger than the available
cache memory, it is inevitable that the disk heads will jump around.
In my test setup, I only had 1 seed and 1 client, so to the client, all
the pieces it didn't already have were equally rare in the 'swarm', and
the random download pattern really did not have any purpose.
I was thinking of another way of scheduling downloads. With 100MB of
cache, the client could "focus" on downloading 100MB chunks of the
torrent, and select the rarest first within that 100MB chunk before
moving on to another chunk. the chunk could be a sliding window or such.
Before moving the sliding window, the client could do a 'sync' to force
data to be flushed. I would assume that on sync, the data would be
written in the most contiguous way possible.
I plan on experimenting with the scheduling routines of either
libtorrent or transmission to test out the various hypotheses.
The more I play around, the more I convince myself that it's the
scheduling algorithm that can bring the most benefits, not the caching
I'm a bit reluctant to change the random distribution + rarest-first
algorithm, because it is much more efficient at keeping a torrent
alive than if chunks were downloaded. (By alive I mean that the number
of distributed copies is >= 1.)
Post by Radu Hociung
algorithm. After all, in a case where most pieces are written to the
disk once, and never read back, the cache acts as little more than a
simple delayed-write buffer, whether it is an application cache or an OS
cache.
That is why I haven't made any real effort in implementing a cache in
libtorrent so far.
Post by Radu Hociung
BTW: Why are there 2 duplicate seeks before every write and 3 before
21561 write(6, /* head at 4161536 effectiveness 61.077 */0x8083e31,
16384) = 16384
21561 _llseek(6, 4177920, [4177920], /* BT efficiency 61.077% */
SEEK_SET) = 0
21561 _llseek(6, 0, [4177920], /* BT efficiency 61.077% */
SEEK_CUR) = 0
21561 write(6, /* head at 4177920 effectiveness 61.111 */0x8083e31,
16384) = 16384
21561 _llseek(6, 3145728, [3145728], /* BT efficiency 61.111% */
SEEK_SET) = 0
21561 _llseek(6, 0, [3145728], /* BT efficiency 61.111% */
SEEK_CUR) = 0
21561 _llseek(6, 0, [3145728], /* BT efficiency 61.111% */
SEEK_CUR) = 0
21561 read(6, /* head at 4194304 effectiveness 60.000 */0x404f4008,
1048576) = 1048576
21561 _llseek(6, 4194304, [4194304], /* BT efficiency 60.000% */
SEEK_SET) = 0
21561 _llseek(6, 0, [4194304], /* BT efficiency 60.000% */
SEEK_CUR) = 0
21561 write(6, /* head at 4194304 effectiveness 61.905 */0x404f4008,
1048576) = 1048576
Does this happen only with libtorrent, or with transmission too? If
it's in both, I would guess that it's an implementation detail of the
POSIX layer, that the POSIX calls don't translate directly to system
calls. Since libtorrent has a pool of open files, each read or write
operation has to seek before reading/writing, because there may have
been other such operations since last time, which moved the file
position. This means that even when reading sequentially, libtorrent
will seek to the current position because it doesn't know that the
last read/write operation left the file position at the place where
it wants it for the next operation.

I can't imagine that there's any cost involved in doing that extra
seek; the alternative would be to call tell(), compare, and
then seek. But the common case is that the seek is necessary anyway;
it's just that in your case you only had one peer.
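For reference, here is a minimal sketch of the tell()-and-compare idea, assuming a simple wrapper that caches the last known file offset in user space (hypothetical names; this is not libtorrent's actual storage code):

#include <sys/types.h>
#include <unistd.h>

// Hypothetical wrapper: remembers where the file cursor was left, and only
// seeks when the next operation starts somewhere else. The plain approach
// discussed above would simply lseek(SEEK_SET) before every read/write.
struct tracked_file
{
    explicit tracked_file(int f) : fd(f), pos(0) {}

    ssize_t read_at(off_t offset, void* buf, size_t len)
    {
        if (pos != offset)
        {
            if (lseek(fd, offset, SEEK_SET) < 0) return -1;
            pos = offset;
        }
        ssize_t ret = ::read(fd, buf, len);
        if (ret > 0) pos += ret; // keep the cached cursor in sync
        return ret;
    }

    ssize_t write_at(off_t offset, void const* buf, size_t len)
    {
        if (pos != offset)
        {
            if (lseek(fd, offset, SEEK_SET) < 0) return -1;
            pos = offset;
        }
        ssize_t ret = ::write(fd, buf, len);
        if (ret > 0) pos += ret;
        return ret;
    }

    int fd;
    off_t pos; // last known position of the file cursor
};

Whether skipping the syscall is worth the bookkeeping is exactly the question raised above; with a pool of shared file handles the cached position would also have to be invalidated whenever anything else touches the descriptor.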


--
Arvid Norberg
Radu Hociung
2006-04-14 15:05:25 UTC
Permalink
Post by Radu Hociung
Post by Arvid Norberg
Post by Radu Hociung
Greetings all,
I have done some experiments on libtorrent and Transmission, comparing
the effectiveness of their client-side caching and download scheduling
algorithms.
http://en.wikipedia.org/wiki/BitTorrent_performance
I hope that you'll find it a worthwhile read.
Very nice graphs :)
Caching blocks and only write whole pieces at once would definitely
improve that measurement. But I'm not sure it would improve in reality
(it probably would, a bit at least), since the OS's cache very likely is
done in the kernel, which means that you don't know how many of those
write-calls actually resulting in a write to the disk.
I doubt that writing whole pieces would make much difference. Looking at
libtorrent's current download pattern, and at the strace, it looks like
it writes the blocks of a piece in the proper order. So they probably
get assembled into the respective pieces in the OS's cache. If they were
being downloaded out of order, I would see a case for caching whole pieces.
I thought you downloaded a "wild" torrent, with more than one peer. If
you had done that, you would have noticed that the write operations
would have been scattered more. Because libtorrent is writing each block
to disk, and even though blocks within a piece are downloaded
sequentially, it would still be more seeking.
In high speeds libtorrent prefers to download whole pieces from single
peers, which means block writes to different pieces would be interleaved.
Ah, of course. I should have thought of this.

In order to answer questions regarding the behaviour of OS caches, there
is a library available to study physical disk access, written at the
Renaissance Computing Institute. It requires a patch to the kernel, as
well as some calls in the user-level application to start/stop tracing.

If the user-level calls were put in strace, then physical analysis of an
application would be easier to apply to any client, without modifying
the client itself.

Here's the URL to the (open source) library:

http://www.renci.org/research/cadre/CADRETraceLibraries/CADREPhysicalIOLibrary.htm

The testbench sw I am using is essentially strace, with modifications I
made. I plan to release this testbench so that you can run it yourselves
also, which I think will allow you to easily experiment and draw
conclusions, without requiring guessing.
Post by Radu Hociung
However, you're right, about not knowing when the writes to disk
Say that the kernel starts off with an empty cache and 100MB free RAM.
When downloading a 5GB file, after the 1st 100MB are received, the OS
will have to start writing. It's reasonable that it will start flushing
the oldest cache slots first, so it is likely that after the first 100MB
received, it will start writing the data it received first.
In the case of some advanced filesystems (such as XFS, which buffers
data as long as possible and allocates storage at the last possible
moment, when flushing buffers becomes inevitable), it is possible that
the cache won't always flush the first-received data, but it may try to
group buffers based on their relative closeness.
Maybe it's possible to get timestamps in the trace log. Then it would be
possible to estimate when actual writes were done.
It would be very easy to add timestamps, you can use the same command
line as I show in wikipedia, but add "-tt" (timestamp) and/or -T (call
duration) to the strace options.

You can profile any torrent this way.
Post by Radu Hociung
But in any case, when the file size is much larger than the available
cache memory, it is inevitable that the disk heads will jump around.
In my test setup, I only had 1 seed and 1 client, so to the client, all
the pieces it didn't already have were equally rare in the 'swarm', and
the random download pattern really did not have any purpose.
I was thinking of another way of scheduling downloads. With 100MB of
cache, the client could "focus" on downloading 100MB chunks of the
torrent, and select the rarest first within that 100MB chunk before
moving on to another chunk. the chunk could be a sliding window or such.
Before moving the sliding window, the client could do a 'sync' to force
data to be flushed. I would assume that on sync, the data would be
written in the most contiguous way possible.
I plan on experimenting with the scheduling routines of either
libtorrent or transmission to test out the various hypotheses.
The more I play around, the more I convince myself that it's the
scheduling algorithm that can bring the most benefits, not the caching
I'm a bit reluctant to change the random distribution + rarest first
algorithm, because it is much more efficient in keeping a torrent alive
than if chunks would be downloaded. (with alive I mean that the
distributed copies >= 1).
I understand, but in some cases the random distribution does not do
anything. In the case I show of 1:1, it's no safer to download data out
of order than it is to download sequentially I think.

Also, in the case of a perfect torrent (more seeds than downloaders, say
20:5), say that two pieces have availability of 20 and 24 respectively.
Does it really make a difference if you download the more available
piece first? Is it likely that the 20 seeds will leave the torrent,
rendering it dead? The client should constantly evaluate risk, and
switch to "survival" mode only when necessary, but it's overkill to
always run in "survival" mode.
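To make the idea concrete, a rough sketch of such a mode switch (invented names and thresholds; not a proposal for libtorrent's actual picker):

#include <cstddef>
#include <vector>

enum pick_mode { pick_sequential, pick_rarest_first };

// "Survival" check: if every missing piece is seen on at least
// safe_threshold peers, sequential (disk-friendly) picking is considered
// safe; otherwise fall back to rarest-first. The threshold is invented.
pick_mode choose_mode(std::vector<int> const& availability,
                      std::vector<bool> const& have,
                      int safe_threshold = 3)
{
    for (std::size_t i = 0; i < have.size(); ++i)
        if (!have[i] && availability[i] < safe_threshold)
            return pick_rarest_first;
    return pick_sequential;
}

// Returns the next piece index to request, or -1 if we have everything.
int next_piece(std::vector<int> const& availability,
               std::vector<bool> const& have)
{
    if (choose_mode(availability, have) == pick_sequential)
    {
        for (std::size_t i = 0; i < have.size(); ++i)
            if (!have[i]) return static_cast<int>(i); // lowest index first
        return -1;
    }
    int best = -1;
    for (std::size_t i = 0; i < have.size(); ++i)
    {
        if (have[i]) continue;
        if (best == -1 || availability[i] < availability[best])
            best = static_cast<int>(i);
    }
    return best; // rarest missing piece
}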
Post by Radu Hociung
algorithm. After all, in a case where most pieces are written to the
disk once, and never read back, the cache acts as little more than a
simple delayed-write buffer, whether it is an application cache or an OS
cache.
That is why I haven't made any real effort in implementing a cache in
libtorrent so far.
Post by Radu Hociung
BTW: Why are there 2 duplicate seeks before every write and 3 before
21561 write(6, /* head at 4161536 effectiveness 61.077 */0x8083e31,
16384) = 16384
21561 _llseek(6, 4177920, [4177920], /* BT efficiency 61.077% */
SEEK_SET) = 0
21561 _llseek(6, 0, [4177920], /* BT efficiency 61.077% */ SEEK_CUR) = 0
21561 write(6, /* head at 4177920 effectiveness 61.111 */0x8083e31,
16384) = 16384
21561 _llseek(6, 3145728, [3145728], /* BT efficiency 61.111% */
SEEK_SET) = 0
21561 _llseek(6, 0, [3145728], /* BT efficiency 61.111% */ SEEK_CUR) = 0
21561 _llseek(6, 0, [3145728], /* BT efficiency 61.111% */ SEEK_CUR) = 0
21561 read(6, /* head at 4194304 effectiveness 60.000 */0x404f4008,
1048576) = 1048576
21561 _llseek(6, 4194304, [4194304], /* BT efficiency 60.000% */
SEEK_SET) = 0
21561 _llseek(6, 0, [4194304], /* BT efficiency 60.000% */ SEEK_CUR) = 0
21561 write(6, /* head at 4194304 effectiveness 61.905 */0x404f4008,
1048576) = 1048576
Does this happen only with libtorrent, or with transmission too? If it's
in both, I would guess that it's an implementation detail of the POSIX
layer, that the POSIX calls don't translate directly to system calls.
Since libtorrent has a pool of open files, each read or write operation
has to seek before reading/writing, because there may have been other
such operations since last time, which moved the file position. This
means that even when reading sequentially, libtorrent will seek to the
current position because it doesn't know that the last read/write
operation left the file position at the place where it wants it for the
next operation.
I can't imagine that there's any cost involved in doing that extra seek,
the alternative would be to call tell(), compare, and then seek.
But the common case is that that seek is necessary anyway, it's just
that in your case you only had one peer.
This only happens in libtorrent; transmission seeks exactly once for
both reads and writes.

So if you do the SEEK_CUR in order to ensure the cursor is where you
expect it, it means you don't have an atomic lock between the SEEK_SET
and the read/write? If you don't have such a lock and your app is
multi-threaded, what prevents another thread from seeking between your
SEEK_CUR and the read/write?

Radu.
Arvid Norberg
2006-04-19 18:14:42 UTC
Permalink
Post by Radu Hociung
Post by Arvid Norberg
Post by Radu Hociung
I doubt that writing whole pieces would make much difference. Looking at
libtorrent's current download pattern, and at the strace, it
looks like
it writes the blocks of a piece in the proper order. So they
probably
get assembled into the respective pieces in the OS's cache. If they were
being downloaded out of order, I would see a case for caching whole pieces.
I thought you downloaded a "wild" torrent, with more than one
peer. If
you had done that, you would have noticed that the write operations
would have been scattered more. Because libtorrent is writing each block
to disk, and even though blocks within a piece are downloaded
sequentially, it would still be more seeking.
In high speeds libtorrent prefers to download whole pieces from single
peers, which means block writes to different pieces would be
interleaved.
Ah, of course. I should have thought of this.
In order to answer questions regarding the behaviour of OS caches, there
is a library available to study physical disk access, written at the
Renaissance Computing Institute. It requires a patch to the kernel, as
well as some calls in the user-level application to start/stop
tracing.
If the user-level calls were put in strace, then physical analysis of an
application would be easier to apply to any client, without modifying
the client itself.
http://www.renci.org/research/cadre/CADRETraceLibraries/
CADREPhysicalIOLibrary.htm
The testbench sw I am using is essentially strace, with
modifications I
made. I plan to release this testbench so that you can run it
yourselves
also, which I think will allow you to easily experiment and draw
conclusions, without requiring guessing.
Do you think it would be possible to use it on a clean strace log
(without the modifications?). I'm asking because in that case maybe
it would be possible to use it on a ktrace log (which is used on
darwin).
Post by Radu Hociung
Post by Arvid Norberg
Post by Radu Hociung
However, you're right, about not knowing when the writes to disk
Say that the kernel starts off with an empty cache and 100MB free RAM.
When downloading a 5GB file, after the 1st 100MB are received, the OS
will have to start writing. It's reasonable that it will start flushing
the oldest cache slots first, so it is likely that after the first 100MB
received, it will start writing the data it received first.
In the case of some advanced filesystems (such as XFS, which buffers
data as long as possible and allocates storage at the last possible
moment, when flushing buffers becomes inevitable), it is
possible that
the cache won't always flush the first-received data, but it may try to
group buffers based on their relative closeness.
Maybe it's possible to get timestamps in the trace log. Then it would be
possible to estimate when actual writes were done.
It would be very easy to add timestamps, you can use the same command
line as I show in wikipedia, but add "-tt" (timestamp) and/or -T (call
duration) to the strace options.
You can this way profile any torrent.
Post by Arvid Norberg
Post by Radu Hociung
But in any case, when the file size is much larger than the
available
cache memory, it is inevitable that the disk heads will jump around.
In my test setup, I only had 1 seed and 1 client, so to the
client, all
the pieces it didn't already have were equally rare in the
'swarm', and
the random download pattern really did not have any purpose.
I was thinking of another way of scheduling downloads. With 100MB of
cache, the client could "focus" on downloading 100MB chunks of the
torrent, and select the rarest first within that 100MB chunk before
moving on to another chunk. the chunk could be a sliding window or such.
Before moving the sliding window, the client could do a 'sync' to force
data to be flushed. I would assume that on sync, the data would be
written in the most contiguous way possible.
I plan on experimenting with the scheduling routines of either
libtorrent or transmission to test out the various hypotheses.
The more I play around, the more I convince myself that it's the
scheduling algorithm that can bring the most benefits, not the caching
I'm a bit reluctant to change the random distribution + rarest first
algorithm, because it is much more efficient in keeping a torrent alive
than if chunks would be downloaded. (with alive I mean that the
distributed copies >= 1).
I understand, but in some cases the random distribution does not do
anything. In the case I show of 1:1, it's no safer to download data out
of order than it is to download sequentially I think.
Not in the case where you're guaranteed that the seed will stay
online, but it's very uncommon to have such a guarantee. Usually, the
seed could go offline and a half-finished peer could come online. The
optimal case would be that the two peers together hold 1 distributed copy
and are able to complete the file. And the closest one can get to that
(without having any global knowledge) is to download pieces at random.
Post by Radu Hociung
Also, in the case of a perfect torrent (more seeds than
downloaders, say
20:5), say that two pieces have availability of 20 and 24
respectively.
Does it really make a difference if you download the more available
piece first? Is it likely that the 20 seeds will leave the torrent,
rendering it dead? The client should constantly evaluate risk, and
switch to "survival" mode only when necessary, but it's overkill to
always run in "survival" mode.
This is very true (as mentioned in another branch of this thread). I
don't think the ratio is that important though. As long as a piece
has enough peers, it could be downloaded sequentially.
Post by Radu Hociung
Post by Arvid Norberg
Post by Radu Hociung
algorithm. After all, in a case where most pieces are written to the
disk once, and never read back, the cache acts as little more than a
simple delayed-write buffer, whether it is an application cache or an OS
cache.
That is why I haven't made any real effort in implementing a cache in
libtorrent so far.
Post by Radu Hociung
BTW: Why are there 2 duplicate seeks before every write and 3 before
21561 write(6, /* head at 4161536 effectiveness 61.077 */0x8083e31,
16384) = 16384
21561 _llseek(6, 4177920, [4177920], /* BT efficiency 61.077% */
SEEK_SET) = 0
21561 _llseek(6, 0, [4177920], /* BT efficiency 61.077% */
SEEK_CUR) = 0
21561 write(6, /* head at 4177920 effectiveness 61.111 */0x8083e31,
16384) = 16384
21561 _llseek(6, 3145728, [3145728], /* BT efficiency 61.111% */
SEEK_SET) = 0
21561 _llseek(6, 0, [3145728], /* BT efficiency 61.111% */
SEEK_CUR) = 0
21561 _llseek(6, 0, [3145728], /* BT efficiency 61.111% */
SEEK_CUR) = 0
21561 read(6, /* head at 4194304 effectiveness 60.000 */0x404f4008,
1048576) = 1048576
21561 _llseek(6, 4194304, [4194304], /* BT efficiency 60.000% */
SEEK_SET) = 0
21561 _llseek(6, 0, [4194304], /* BT efficiency 60.000% */
SEEK_CUR) = 0
21561 write(6, /* head at 4194304 effectiveness 61.905 */0x404f4008,
1048576) = 1048576
Does this happen only with libtorrent, or with transmission too? If it's
in both, I would guess that it's an implementation detail of the POSIX
layer, that the POSIX calls don't translate directly to system calls.
Since libtorrent has a pool of open files, each read or write
operation
has to seek before reading/writing, because there may have been other
such operations since last time, which moved the file position. This
means that even when reading sequentially, libtorrent will seek to the
current position because it doesn't know that the last read/write
operation left the file position at the place where it wants it for the
next operation.
I can't imagine that there's any cost involved in doing that extra seek,
the alternative would be to call tell(), compare, and then seek.
But the common case is that that seek is necessary anyway, it's just
that in your case you only had one peer.
This only happens in libtorrent; transmission seeks exactly once for
both reads and writes.
That is very strange, I'll have to investigate this.
Post by Radu Hociung
So if you do the SEEK_CUR in order to ensure the cursor is where you
expect it, it means you don't have an atomic lock between the SEEK_SET
and the read/write? If you don't have such a lock and your app is
multi-threaded, what prevents another thread from seeking between your
SEEK_CUR and the read/write?
No, that's not why I do a seek before every read and write. All IO
operations are done in the same thread. But every read operation is
done without any context. The current file position is ignored. The
storage abstraction basically gives random access to the files. And
the queue for sending blocks may be interleaved with blocks from
different pieces.



--
Arvid Norberg
Radu Hociung
2006-04-19 21:36:16 UTC
Permalink
Post by Arvid Norberg
Post by Radu Hociung
Post by Radu Hociung
I doubt that writing whole pieces would make much difference. Looking at
libtorrent's current download pattern, and at the strace, it looks
like
it writes the blocks of a piece in the proper order. So they probably
get assembled into the respective pieces in the OS's cache. If they
were
being downloaded out of order, I would see a case for caching whole pieces.
I thought you downloaded a "wild" torrent, with more than one peer. If
you had done that, you would have noticed that the write operations
would have been scattered more. Because libtorrent is writing each block
to disk, and even though blocks within a piece are downloaded
sequentially, it would still be more seeking.
In high speeds libtorrent prefers to download whole pieces from single
peers, which means block writes to different pieces would be
interleaved.
Ah, of course. I should have thought of this.
In order to answer questions regarding the behaviour of OS caches, there
is a library available to study physical disk access, written at the
Renaissance Computing Institute. It requires a patch to the kernel, as
well as some calls in the user-level application to start/stop tracing.
If the user-level calls were put in strace, then physical analysis of an
application would be easier to apply to any client, without modifying
the client itself.
http://www.renci.org/research/cadre/CADRETraceLibraries/
CADREPhysicalIOLibrary.htm
The testbench sw I am using is essentially strace, with modifications I
made. I plan to release this testbench so that you can run it yourselves
also, which I think will allow you to easily experiment and draw
conclusions, without requiring guessing.
Do you think it would be possible to use it on a clean strace log
(without the modifications?). I'm asking because in that case maybe it
would be possible to use it on a ktrace log (which is used on darwin).
The CADRE library is not used on a log. Instead, it must be
activated/deactivated somehow. The information it captures when
activated is stored in a separate location, not in the strace log.

Further, in order to use it, a patch must be applied to a linux kernel.
This patch adds instrumentation to the device drivers. It's that
instrumentation that is then started/stopped by making a call from the
user level application.

So, I don't think this library will run on darwin; I am not familiar
with ktrace, but it looks to be an strace-like utility. Does the "-ti"
option not log interactions of the kernel with the hardware? If it
does, then you don't need CADRE in order to monitor the behaviour of the
disk cache.
Post by Arvid Norberg
Post by Radu Hociung
Post by Radu Hociung
However, you're right, about not knowing when the writes to disk
Say that the kernel starts off with an empty cache and 100MB free RAM.
When downloading a 5GB file, after the 1st 100MB are received, the OS
will have to start writing. It's reasonable that it will start flushing
the oldest cache slots first, so it is likely that after the first 100MB
received, it will start writing the data it received first.
In the case of some advanced filesystems (such as XFS, which buffers
data as long as possible and allocates storage at the last possible
moment, when flushing buffers becomes inevitable), it is possible
that
the cache won't always flush the first-received data, but it may try to
group buffers based on their relative closeness.
Maybe it's possible to get timestamps in the trace log. Then it would be
possible to estimate when actual writes were done.
It would be very easy to add timestamps, you can use the same command
line as I show in wikipedia, but add "-tt" (timestamp) and/or -T (call
duration) to the strace options.
You can this way profile any torrent.
Post by Radu Hociung
But in any case, when the file size is much larger than the available
cache memory, it is inevitable that the disk heads will jump around.
In my test setup, I only had 1 seed and 1 client, so to the client,
all
the pieces it didn't already have were equally rare in the 'swarm',
and
the random download pattern really did not have any purpose.
I was thinking of another way of scheduling downloads. With 100MB of
cache, the client could "focus" on downloading 100MB chunks of the
torrent, and select the rarest first within that 100MB chunk before
moving on to another chunk. the chunk could be a sliding window or
such.
Before moving the sliding window, the client could do a 'sync' to force
data to be flushed. I would assume that on sync, the data would be
written in the most contiguous way possible.
I plan on experimenting with the scheduling routines of either
libtorrent or transmission to test out the various hypotheses.
The more I play around, the more I convince myself that it's the
scheduling algorithm that can bring the most benefits, not the caching
I'm a bit reluctant to change the random distribution + rarest first
algorithm, because it is much more efficient in keeping a torrent alive
than if chunks would be downloaded. (with alive I mean that the
distributed copies >= 1).
I understand, but in some cases the random distribution does not do
anything. In the case I show of 1:1, it's no safer to download data out
of order than it is to download sequentially I think.
Not in the case where you're guaranteed that the seed will stay online,
but it's very uncommon to have such a guarantee. Usually, the seed
could go offline and a half-finished peer will come online. The optimal
case would be that both peers would have 1 distributed copy and be able
to complete the file. And closest one can get to that (without having
any global knowledge) is to download pieces in random.
Ah, I see... I had not thought of such a scenario. Very interesting.

So, in a torrent like you described (1 seed, 1 hidden/offline peer, 1
blind peer), assuming downloaders get 1/2 of the file each, they would
be able to finish only if they have complementary halves of the file.

With randomization at the piece level and, say, 1,000 pieces in
total, each of the downloaders needs the right 500 pieces to make the
torrent healthy.

The probability of the blind client downloading the right 500 pieces,
not knowing which 500 the hidden peer has, is about 1/(2^1000), since
each piece has a 1/2 chance of being the right piece, and there are
1000 guesses to be made.

I still think sequential downloading gives the torrent a better chance
for healing:

Say that each client downloads quarters of the file in random order. The
pieces in each quarter are downloaded sequentially. (ie, all the pieces
in quarter 1 are downloaded in sequential order, but the file may be
downloaded as Q2,Q4,Q3,Q1)

In this case, if the torrent behaves like you described above, the
chances for the torrent remaining healthy with the blind and hidden
peers present, but seed absent, are about 1/(2^4), since there are 4
guesses to be made. If each of the clients has 1/2 of the file when the
seed goes away, they have a 1/16 chance of finishing.

So comparing a download strategy of piece-by-piece-randomly (PBPR) or
quarter-by-quarter-randomly (QBQR), the QBQR strategy has a 6.25% chance
of finishing, while PBPR has practically no (1/(2^1000)) chance of finishing.

I think even the 6.25% probability can be improved:

If the file is downloaded in halves instead, there is a 25% chance of
finishing this download. Perhaps each client can decide whether to
download halves, quarters, eighths or whatever, depending on how many
peers it encounters:

1 seed, 0 peers = download halves
1 seed, 1 peer = download quarters
2 seeds, 2 peers = download eighths

What do you think?
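(As an aside, treating the selection combinatorially rather than as independent coin flips, under the same assumptions as above, and assuming the blind peer's choice is uniform and independent of the hidden peer's, gives somewhat higher figures, though the conclusion is the same:

\[
P_{\text{pieces}} = \binom{1000}{500}^{-1} \approx 3.7\times10^{-300},\qquad
P_{\text{quarters}} = \binom{4}{2}^{-1} = \tfrac{1}{6} \approx 16.7\%,\qquad
P_{\text{halves}} = \binom{2}{1}^{-1} = 50\%.
\]

So downloading in larger units still helps enormously in this scenario, even with the corrected numbers.)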
Post by Arvid Norberg
Post by Radu Hociung
Also, in the case of a perfect torrent (more seeds than downloaders, say
20:5), say that two pieces have availability of 20 and 24 respectively.
Does it really make a difference if you download the more available
piece first? Is it likely that the 20 seeds will leave the torrent,
rendering it dead? The client should constantly evaluate risk, and
switch to "survival" mode only when necessary, but it's overkill to
always run in "survival" mode.
This is very true (as mentioned in another branch of this thread). I
don't think the ratio is that important though. As long as a piece has
enough peers, it could be downloaded sequentially.
Post by Radu Hociung
Post by Radu Hociung
algorithm. After all, in a case where most pieces are written to the
disk once, and never read back, the cache acts as little more than a
simple delayed-write buffer, whether it is an application cache or
an OS
cache.
That is why I haven't made any real effort in implementing a cache in
libtorrent so far.
Post by Radu Hociung
BTW: Why are there 2 duplicate seeks before every write and 3 before
21561 write(6, /* head at 4161536 effectiveness 61.077 */0x8083e31,
16384) = 16384
21561 _llseek(6, 4177920, [4177920], /* BT efficiency 61.077% */
SEEK_SET) = 0
21561 _llseek(6, 0, [4177920], /* BT efficiency 61.077% */
SEEK_CUR) = 0
21561 write(6, /* head at 4177920 effectiveness 61.111 */0x8083e31,
16384) = 16384
21561 _llseek(6, 3145728, [3145728], /* BT efficiency 61.111% */
SEEK_SET) = 0
21561 _llseek(6, 0, [3145728], /* BT efficiency 61.111% */
SEEK_CUR) = 0
21561 _llseek(6, 0, [3145728], /* BT efficiency 61.111% */
SEEK_CUR) = 0
21561 read(6, /* head at 4194304 effectiveness 60.000 */0x404f4008,
1048576) = 1048576
21561 _llseek(6, 4194304, [4194304], /* BT efficiency 60.000% */
SEEK_SET) = 0
21561 _llseek(6, 0, [4194304], /* BT efficiency 60.000% */
SEEK_CUR) = 0
21561 write(6, /* head at 4194304 effectiveness 61.905 */0x404f4008,
1048576) = 1048576
Does this happen only with libtorrent, or with transmission too? If
it's
in both, I would guess that it's an implementation detail of the POSIX
layer, that the POSIX calls don't translate directly to system calls.
Since libtorrent has a pool of open files, each read or write operation
has to seek before reading/writing, because there may have been other
such operations since last time, which moved the file position. This
means that even when reading sequentially, libtorrent will seek to the
current position because it doesn't know that the last read/write
operation left the file position at the place where it wants it for the
next operation.
I can't imagine that there's any cost involved in doing that extra seek,
the alternative would be to call tell(), compare, and then seek.
But the common case is that that seek is necessary anyway, it's just
that in your case you only had one peer.
This only happens in libtorrent; transmission seeks exactly once for
both reads and writes.
That is very strange, I'll have to investigate this.
Post by Radu Hociung
So if you do the SEEK_CUR in order to ensure the cursor is where you
expect it, it means you don't have an atomic lock between the SEEK_SET
and the read/write? If you don't have such a lock and your app is
multi-threaded, what prevents another thread from seeking between your
SEEK_CUR and the read/write?
No, that's not why I do a seek before every read and write. All IO
operations are done in the same thread. But every read operation is
done without any context. The current file position is ignored. The
storage abstraction basically gives random access to the files. And the
queue for sending blocks may be interleaved with blocks from different
pieces.
Sounds like the SEEK_CUR is not needed then.

Cheers,
Radu.
Arvid Norberg
2006-04-19 23:23:45 UTC
Permalink
Post by Radu Hociung
Post by Arvid Norberg
Post by Radu Hociung
Post by Radu Hociung
I doubt that writing whole pieces would make much difference. Looking at
libtorrent's current download pattern, and at the strace, it
looks
like
it writes the blocks of a piece in the proper order. So they probably
get assembled into the respective pieces in the OS's cache. If
they
were
being downloaded out of order, I would see a case for caching
whole
pieces.
I thought you downloaded a "wild" torrent, with more than one peer. If
you had done that, you would have noticed that the write operations
would have been scattered more. Because libtorrent is writing each block
to disk, and even though blocks within a piece are downloaded
sequentially, it would still be more seeking.
In high speeds libtorrent prefers to download whole pieces from single
peers, which means block writes to different pieces would be interleaved.
Ah, of course. I should have thought of this.
In order to answer questions regarding the behaviour of OS
caches, there
is a library available to study physical disk access, written at the
Renaissance Computing Institute. It requires a patch to the
kernel, as
well as some calls in the user-level application to start/stop tracing.
If the user-level calls were put in strace, then physical
analysis of an
application would be easier to apply to any client, without
modifying
the client itself.
http://www.renci.org/research/cadre/CADRETraceLibraries/
CADREPhysicalIOLibrary.htm
The testbench sw I am using is essentially strace, with
modifications I
made. I plan to release this testbench so that you can run it yourselves
also, which I think will allow you to easily experiment and draw
conclusions, without requiring guessing.
Do you think it would be possible to use it on a clean strace log
(without the modifications?). I'm asking because in that case
maybe it
would be possible to use it on a ktrace log (which is used on
darwin).
The CADRE library is not used on a log. Instead, it must be
activated/deactivated somehow. The information it captures when
activated is stored in a separate location, not in the strace log.
Further, in order to use it, a patch must be applied to a linux kernel.
This patch adds instrumentation to the device drivers. It's that
instrumentation that is then started/stopped by making a call from the
user level application.
So, I don't think this library will run on darwin; I am not familiar
with ktrace, but it looks to be an strace-like utility. Does the "-ti"
option not log interactions of the kernel with the hardware? If it
does, then you don't need CADRE in order to monitor the behaviour of the
disk cache.
I was primarily thinking about your testbench.
Post by Radu Hociung
Post by Arvid Norberg
Not in the case where you're guaranteed that the seed will stay
online,
but it's very uncommon to have such a guarantee. Usually, the seed
could go offline and a half-finished peer will come online. The
optimal
case would be that both peers would have 1 distributed copy and be able
to complete the file. And closest one can get to that (without having
any global knowledge) is to download pieces in random.
Ah, I see... I had not thought of such a scenario. Very interesting.
So, in a torrent like you described (1 seed, 1 hidden/offline peer, 1
blind peer), assuming downloaders get 1/2 of the file each, they would
be able to finish only if they have complementary halves of the file.
With a randomization at the piece level, with say, 1,000 pieces in
total, each of the downloaders need the right 500 pieces to make the
torrent healthy.
The probability for the blind client downloading the right 500 pieces,
not knowing which 500 the hidden peer has, are about 1/(2^1000), since
each piece has 1/2 chance of being the right piece, and that there are
1000 guesses to be made.
I still think sequential downloading gives the torrent a better chance
Say that each client downloads quarters of the file in random
order. The
pieces in each quarter are downloaded sequentially. (ie, all the pieces
in quarter 1 are downloaded in sequential order, but the file may be
downloaded as Q2,Q4,Q3,Q1)
In this case, if the torrent behaves like you described above, the
chances for the torrent remaining healthy with the blind and hidden
peers present, but seed absent, are about 1/(2^4), since there are 4
guesses to be made. If each of the clients has 1/2 of the file when the
seed goes away, they have a 1/16 chance of finishing.
So comparing a download strategy of piece-by-piece-randomly (PBPR) or
quarter-by-quarter-randomly (QBQR), the QBQR strategy has a 6.25% chance
of finishing, while PBPR has no (1/(2^1000)) chance of finishing.
If the file is downloaded in halves instead, there is a 25% chance of
finishing this download. Perhaps each client can decide whether to
download halves, quarters, eights or whatever, depending on how many
1 seed, 0 peers = download halves
1 seed, 1 peer = download quarters
2 seeds, 2 peers = download eights
What do you think?
I would most respectfully encourage you to think again :)

To be honest, I'm not sure I follow all of your reasoning. But what
you do in practice is that you increase the piece sizes. This has the
effect of increasing overlap in the downloads. The chances of
finishing are not the only measurement; you have to consider the
overlap in the downloads. The goal is that any two peers should have
as high a probability as possible of exchanging data, two-way. The higher
the granularity (number of pieces), the closer one gets to the
optimal case when downloading random pieces.

Also keep in mind that the peer has no global knowledge, so the
number of peers and seeds would have to be counted only among
neighbours.

You can probably get a very good explanation why this is the case in
either #bittorrent on freenode or on the bittorrent mailing list:

http://lists.ibiblio.org/mailman/listinfo/bittorrent
Post by Radu Hociung
Post by Arvid Norberg
No, that's not why I do a seek before every read and write. All IO
operations are done in the same thread. But every read operation is
done without any context. The current file position is ignored. The
storage abstraction basically gives random access to the files. And the
queue for sending blocks may be interleaved with blocks from
different
pieces.
Sounds like the SEEK_CUR is not needed then.
Exactly, the extra seek is not necessary; that's why I will look into
why it's there.


--
Arvid Norberg
Radu Hociung
2006-04-20 16:42:58 UTC
Permalink
Post by Arvid Norberg
Post by Radu Hociung
Post by Arvid Norberg
Post by Radu Hociung
Post by Radu Hociung
I doubt that writing whole pieces would make much difference. Looking at
libtorrent's current download pattern, and at the strace, it looks
like
it writes the blocks of a piece in the proper order. So they probably
get assembled into the respective pieces in the OS's cache. If they
were
being downloaded out of order, I would see a case for caching whole
pieces.
I thought you downloaded a "wild" torrent, with more than one peer. If
you had done that, you would have noticed that the write operations
would have been scattered more. Because libtorrent is writing each block
to disk, and even though blocks within a piece are downloaded
sequentially, it would still be more seeking.
In high speeds libtorrent prefers to download whole pieces from single
peers, which means block writes to different pieces would be interleaved.
Ah, of course. I should have thought of this.
In order to answer questions regarding the behaviour of OS caches,
there
is a library available to study physical disk access, written at the
Renaissance Computing Institute. It requires a patch to the kernel, as
well as some calls in the user-level application to start/stop tracing.
If the user-level calls were put in strace, then physical analysis
of an
application would be easier to apply to any client, without modifying
the client itself.
http://www.renci.org/research/cadre/CADRETraceLibraries/
CADREPhysicalIOLibrary.htm
The testbench sw I am using is essentially strace, with
modifications I
made. I plan to release this testbench so that you can run it yourselves
also, which I think will allow you to easily experiment and draw
conclusions, without requiring guessing.
Do you think it would be possible to use it on a clean strace log
(without the modifications?). I'm asking because in that case maybe it
would be possible to use it on a ktrace log (which is used on darwin).
The CADRE library is not used on a log. Instead, it must be
activated/deactivated somehow. The information it captures when
activated is stored in a separate location, not in the strace log.
Further, in order to use it, a patch must be applied to a linux kernel.
This patch adds instrumentation to the device drivers. It's that
instrumentation that is then started/stopped by making a call from the
user level application.
So, I don't think this library will run on darwin; I am not familiar
with ktrace, but it looks to be an strace-like utility. Does the "-ti"
option not log interractions of the kernel with the hardware? If it
does, then you don't need CADRE in order to monitor the behaviour of the
disk cache.
I was primarily thinking about your testbench.
Perhaps I don't understand what you're asking. What would you like to
use on a clean strace log?

The only thing that my testbench does currently is to add a comment on
every _lseek, read and write line showing cumulative efficiency on that
file descriptor, along with the position of the drive heads at the time
the read/write call was made. These two pieces of information are only
used to plot the fancy graphs I've shown.

You can certainly derive the same plots from a clean strace log, which
contains all the necessary information.
Post by Arvid Norberg
Post by Radu Hociung
Post by Arvid Norberg
Not in the case where you're guaranteed that the seed will stay
online,
but it's very uncommon to have such a guarantee. Usually, the seed
could go offline and a half-finished peer will come online. The
optimal
case would be that both peers would have 1 distributed copy and be
able
to complete the file. And closest one can get to that (without having
any global knowledge) is to download pieces in random.
Ah, I see... I had not thought of such a scenario. Very interesting.
So, in a torrent like you described (1 seed, 1 hidden/offline peer, 1
blind peer), assuming downloaders get 1/2 of the file each, they would
be able to finish only if they have complementary halves of the file.
With a randomization at the piece level, with say, 1,000 pieces in
total, each of the downloaders need the right 500 pieces to make the
torrent healthy.
The probability for the blind client downloading the right 500 pieces,
not knowing which 500 the hidden peer has, are about 1/(2^1000), since
each piece has 1/2 chance of being the right piece, and that there are
1000 guesses to be made.
I still think sequential downloading gives the torrent a better chance
Say that each client downloads quarters of the file in random order. The
pieces in each quarter are downloaded sequentially. (ie, all the pieces
in quarter 1 are downloaded in sequential order, but the file may be
downloaded as Q2,Q4,Q3,Q1)
In this case, if the torrent behaves like you described above, the
chances for the torrent remaining healthy with the blind and hidden
peers present, but seed absent, are about 1/(2^4), since there are 4
guesses to be made. If each of the clients has 1/2 of the file when the
seed goes away, they have a 1/16 chance of finishing.
So comparing a download strategy of piece-by-piece-randomly (PBPR) or
quarter-by-quarter-randomly (QBQR), the QBQR strategy has a 6.25% chance
of finishing, while PBPR has no (1/(2^1000)) chance of finishing.
If the file is downloaded in halves instead, there is a 25% chance of
finishing this download. Perhaps each client can decide whether to
download halves, quarters, eights or whatever, depending on how many
1 seed, 0 peers = download halves
1 seed, 1 peer = download quarters
2 seeds, 2 peers = download eights
What do you think?
I would most respectfully encourage you to think again :)
To be honest, I'm not sure I follow all of your reasoning. But what you
do in practice is that you increase the piece sizes. This has the
effect of increasing overlap in the downloads. the chances of finishing
is not the only measurement, you have to consider the overlap in the
downloads. The goal is that any two peers should have as high
probability as possible to exchange data, two-way. The higher
granularity (number of pieces) the closer one would get to the optimal
case, when downloading random pieces.
Also keep in mind that the peer has no global knowledge, so the number
of peers and seeds would have to be counted only among neighbours.
You can probably get a very good explanation why this is the case in
http://lists.ibiblio.org/mailman/listinfo/bittorrent
I will do more reading, but what I meant was for the "super-pieces" to
not overlap. With a 1000-piece torrent, the first quarter would always
be pieces 0-249, the second would be 250-499, and so on.
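A rough sketch of such a picker (invented names; not libtorrent code): finish the super-piece we are in, sequentially, and pick the next super-piece at random.

#include <algorithm>
#include <cstdlib>
#include <vector>

// Returns the next piece to request, or -1 when all pieces are downloaded.
// have[i] is true once piece i is complete; num_super_pieces would be chosen
// from how many peers/seeds are visible, as suggested above.
int next_piece_qbqr(std::vector<bool> const& have, int num_super_pieces = 4)
{
    int const n = static_cast<int>(have.size());
    int const super_size = (n + num_super_pieces - 1) / num_super_pieces;

    std::vector<int> started;   // super-pieces partially downloaded
    std::vector<int> untouched; // super-pieces not started yet
    for (int s = 0; s < num_super_pieces; ++s)
    {
        int const begin = s * super_size;
        int const end = std::min(n, begin + super_size);
        bool any_have = false, any_missing = false;
        for (int i = begin; i < end; ++i)
        {
            if (have[i]) any_have = true;
            else any_missing = true;
        }
        if (!any_missing) continue; // this super-piece is done
        if (any_have) started.push_back(s);
        else untouched.push_back(s);
    }

    // finish a started super-piece first; otherwise pick a fresh one at random
    std::vector<int> const& pool = started.empty() ? untouched : started;
    if (pool.empty()) return -1; // torrent complete
    int const s = pool[std::rand() % pool.size()];

    // sequential order within the chosen super-piece
    for (int i = s * super_size; i < std::min(n, (s + 1) * super_size); ++i)
        if (!have[i]) return i;
    return -1;
}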

Thanks for the pointers, I'll read the recommended info.

Cheers,
Radu.
Arvid Norberg
2006-04-23 09:27:59 UTC
Permalink
Post by Radu Hociung
Post by Arvid Norberg
I was primarily thinking about your testbench.
Perhaps I don't understand what you're asking. What would you like to
use on a clean strace log?
The only thing that my testbench does currently is to add a comment on
every _lseek, read and write line showing cumulative efficiency on that
file descriptor, along with the position of the drive heads at the time
the read/write call was made. These two pieces of information are only
used to plot the fancy graphs I've shown.
You can certainly derive the same plots from a clean strace log, which
contains all the necessary information.
Well, basically I would like to run the kinds of tests that you did,
but on darwin, and generate the graphs to see if any changes I've
made improve performance.


--
Arvid Norberg
Radu Hociung
2006-04-24 18:53:36 UTC
Permalink
Post by Radu Hociung
Post by Arvid Norberg
I was primarily thinking about your testbench.
Perhaps I don't understand what you're asking. What would you like to
use on a clean strace log?
The only thing that my testbench does currently is to add a comment on
every _lseek, read and write line showing cumulative efficiency on that
file descriptor, along with the position of the drive heads at the time
the read/write call was made. These two pieces of information are only
used to plot the fancy graphs I've shown.
You can certainly derive the same plots from a clean strace log, which
contains all the necessary information.
Well, basically I would like to run the kinds of tests that you did, but
on darwin, and generate the graphs to see if any changes I've made
improve performance.
That's cool.

The test setup I used initially is very simple, as you know (1 seed + 1
downloader).

I think the ktrace log file will be easy to use to plot the behaviour of
the client. You need to log lseeks, reads and writes.

If you plot the lseek location versus time or lseek number, you'll get
the access pattern. I plot write seeks and read seeks in different colours.

In order to plot the effectiveness curve, you need to calculate from the
logs the distance travelled while seeking, and the distance travelled
while read/writing. I calculate the effectiveness of accesses as the
ratio between distance travelled while read/writing and total distance
travelled. The Wikipedia page I published earlier also contains details
about the calculations and measurements.
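For anyone who wants to reproduce the numbers, here is roughly the calculation, reconstructed from the description above (not the actual testbench code):

#include <cstdio>

// Tracks head movement from a sequence of seek and read/write events taken
// from an strace/ktrace log, and reports effectiveness as the ratio of
// distance travelled while reading/writing to total distance travelled.
struct effectiveness_meter
{
    long long head;      // current file offset ("head position")
    long long seek_dist; // distance covered by seeks
    long long rw_dist;   // distance covered by reads/writes

    effectiveness_meter() : head(0), seek_dist(0), rw_dist(0) {}

    void on_seek(long long target)
    {
        seek_dist += target > head ? target - head : head - target;
        head = target;
    }

    void on_read_write(long long length)
    {
        rw_dist += length;
        head += length;
    }

    double effectiveness_percent() const
    {
        long long const total = seek_dist + rw_dist;
        return total == 0 ? 100.0 : 100.0 * rw_dist / total;
    }
};

int main()
{
    effectiveness_meter m;
    // a couple of events in the spirit of the trace excerpts earlier in the thread
    m.on_seek(4177920); m.on_read_write(16384);
    m.on_seek(3145728); m.on_read_write(1048576);
    std::printf("effectiveness: %.3f%%\n", m.effectiveness_percent());
    return 0;
}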

Let me know if I can help more.

Radu.
M***@massaroddel.de
2006-04-20 05:42:20 UTC
Permalink
Post by Radu Hociung
Ah, I see... I had not thought of such a scenario. Very interesting.
So, in a torrent like you described (1 seed, 1 hidden/offline peer, 1
blind peer), assuming downloaders get 1/2 of the file each, they would
be able to finish only if they have complementary halves of the file.
With a randomization at the piece level, with say, 1,000 pieces in
total, each of the downloaders need the right 500 pieces to make the
torrent healthy.
The probability for the blind client downloading the right 500 pieces,
not knowing which 500 the hidden peer has, are about 1/(2^1000), since
each piece has 1/2 chance of being the right piece, and that there are
1000 guesses to be made.
I still think sequential downloading gives the torrent a better chance
Say that each client downloads quarters of the file in random order. The
pieces in each quarter are downloaded sequentially. (ie, all the pieces
in quarter 1 are downloaded in sequential order, but the file may be
downloaded as Q2,Q4,Q3,Q1)
In this case, if the torrent behaves like you described above, the
chances for the torrent remaining healthy with the blind and hidden
peers present, but seed absent, are about 1/(2^4), since there are 4
guesses to be made. If each of the clients has 1/2 of the file when the
seed goes away, they have a 1/16 chance of finishing.
So comparing a download strategy of piece-by-piece-randomly (PBPR) or
quarter-by-quarter-randomly (QBQR), the QBQR strategy has a 6.25% chance
of finishing, while PBPR has no (1/(2^1000)) chance of finishing.
If the file is downloaded in halves instead, there is a 25% chance of
finishing this download. Perhaps each client can decide whether to
download halves, quarters, eights or whatever, depending on how many
1 seed, 0 peers = download halves
1 seed, 1 peer = download quarters
2 seeds, 2 peers = download eights
What do you think?
There is superseeding ... which defeats this approach totally.

MassaRoddel
Radu Hociung
2006-04-20 19:41:17 UTC
Permalink
Post by M***@massaroddel.de
Post by Radu Hociung
Ah, I see... I had not thought of such a scenario. Very interesting.
So, in a torrent like you described (1 seed, 1 hidden/offline peer, 1
blind peer), assuming downloaders get 1/2 of the file each, they would
be able to finish only if they have complementary halves of the file.
With a randomization at the piece level, with say, 1,000 pieces in
total, each of the downloaders need the right 500 pieces to make the
torrent healthy.
The probability for the blind client downloading the right 500 pieces,
not knowing which 500 the hidden peer has, are about 1/(2^1000), since
each piece has 1/2 chance of being the right piece, and that there are
1000 guesses to be made.
I still think sequential downloading gives the torrent a better chance
Say that each client downloads quarters of the file in random
order. The
pieces in each quarter are downloaded sequentially. (ie, all the pieces
in quarter 1 are downloaded in sequential order, but the file may be
downloaded as Q2,Q4,Q3,Q1)
In this case, if the torrent behaves like you described above, the
chances for the torrent remaining healthy with the blind and hidden
peers present, but seed absent, are about 1/(2^4), since there are 4
guesses to be made. If each of the clients has 1/2 of the file when the
seed goes away, they have a 1/16 chance of finishing.
So comparing a download strategy of piece-by-piece-randomly (PBPR) or
quarter-by-quarter-randomly (QBQR), the QBQR strategy has a 6.25% chance
of finishing, while PBPR has no (1/(2^1000)) chance of finishing.
If the file is downloaded in halves instead, there is a 25% chance of
finishing this download. Perhaps each client can decide whether to
download halves, quarters, eights or whatever, depending on how many
1 seed, 0 peers = download halves
1 seed, 1 peer = download quarters
2 seeds, 2 peers = download eights
What do you think?
There is superseeding ... which defeats this approach totally.
MassaRoddel
What does superseeding have to do with improving lifespan of hard drives?

The above strategy seeks to minimize the number of seeks, while not
risking creating a bad torrent. All in all, this thread is about how to
best cache a torrent download without violating other protocol design goals.

Radu
M***@massaroddel.de
2006-04-21 05:39:35 UTC
Permalink
Post by Radu Hociung
Post by M***@massaroddel.de
Post by Radu Hociung
Ah, I see... I had not thought of such a scenario. Very interesting.
So, in a torrent like you described (1 seed, 1 hidden/offline peer, 1
blind peer), assuming downloaders get 1/2 of the file each, they would
be able to finish only if they have complementary halves of the file.
With a randomization at the piece level, with say, 1,000 pieces in
total, each of the downloaders need the right 500 pieces to make the
torrent healthy.
The probability for the blind client downloading the right 500 pieces,
not knowing which 500 the hidden peer has, are about 1/(2^1000), since
each piece has 1/2 chance of being the right piece, and that there are
1000 guesses to be made.
I still think sequential downloading gives the torrent a better chance
Say that each client downloads quarters of the file in random order. The
pieces in each quarter are downloaded sequentially. (ie, all the pieces
in quarter 1 are downloaded in sequential order, but the file may be
downloaded as Q2,Q4,Q3,Q1)
In this case, if the torrent behaves like you described above, the
chances for the torrent remaining healthy with the blind and hidden
peers present, but seed absent, are about 1/(2^4), since there are 4
guesses to be made. If each of the clients has 1/2 of the file when the
seed goes away, they have a 1/16 chance of finishing.
So comparing a download strategy of piece-by-piece-randomly (PBPR) or
quarter-by-quarter-randomly (QBQR), the QBQR strategy has a 6.25% chance
of finishing, while PBPR has no (1/(2^1000)) chance of finishing.
If the file is downloaded in halves instead, there is a 25% chance of
finishing this download. Perhaps each client can decide whether to
download halves, quarters, eights or whatever, depending on how many
1 seed, 0 peers = download halves
1 seed, 1 peer = download quarters
2 seeds, 2 peers = download eights
What do you think?
There is superseeding ... which defeats this approach totally.
MassaRoddel
What does superseeding have to do with improving lifespan of hard drives?
You assume that you can see the seed and then make sequential downloads,
but superseeding was made to prevent that. In that case you are forced by the
peer to download random pieces from it.
Arvid Norberg
2006-04-20 09:40:45 UTC
Permalink
Post by Radu Hociung
If the file is downloaded in halves instead, there is a 25% chance of
finishing this download. Perhaps each client can decide whether to
download halves, quarters, eights or whatever, depending on how many
1 seed, 0 peers = download halves
1 seed, 1 peer = download quarters
2 seeds, 2 peers = download eights
What do you think?
Maybe this paper has a good description.

http://www.eurecom.fr/~michiard/pubs/
bt_experiments_techRepINRIA-00001111_VERSION1_13FEBRUARY2006.pdf


--
Arvid Norberg
M***@massaroddel.de
2006-04-14 08:08:27 UTC
Permalink
Post by Radu Hociung
Greetings all,
I have done some experiments on libtorrent and Transmission, comparing
the effectiveness of their client-side caching and download scheduling
algorithms.
http://en.wikipedia.org/wiki/BitTorrent_performance
I hope that you'll find it a worthwhile read.
Regards,
Radu.
Nice work.

But I think a 1:1 (peer:seed) is the most unrealistic test environment.
I would say that you need at least 100 peers to make such tests
including a few seeds.

The only reason my client switches to full allocation mode is when
a file selection is made (meaning there is more than one file inside the torrent).
And because there are several files which will be written to
and read from, more seeking occurs naturally.

MassaRoddel
Radu Hociung
2006-04-14 15:28:25 UTC
Permalink
Post by M***@massaroddel.de
Post by Radu Hociung
Greetings all,
I have done some experiments on libtorrent and Transmission, comparing
the effectiveness of their client-side caching and download scheduling
algorithms.
http://en.wikipedia.org/wiki/BitTorrent_performance
I hope that you'll find it a worthwhile read.
Regards,
Radu.
Nice work.
Thank you.
Post by M***@massaroddel.de
But I think a 1:1 (peer:seed) is the most unrealistic test enviroment.
I would say that you need at least 100 peers to make such tests
including a few seeds.
The only reason my client switches to full allocation mode is then
a file selection is made (means there is more than one file inside the torrent).
And because there are several files which will be written to
and read from more seeking occurs naturally.
I chose this very simple test case, with two expectations:

1. It should be easiest to compare expected behaviour vs. measured
behaviour. In a torrent with 100 peers, and possibly many disjoint
swarms, it's much more difficult to formulate an expected behaviour.

2. It makes it easy to study the upper-bound of a swarm's efficiency. A
bigger swarm would be less efficient than the 1:1 swarm I am using.

I think it's worthwhile improving the ideal case as much as possible,
while not breaking the protocol, of course. I think allowing the torrent
to be more susceptible to failure is unacceptable. I would aim to at
least maintain the torrent's reliability, while improving the local disk
access efficiency.

If the upper bound were showing very good efficiency, there would be
cause to look into more realistic scenarios, but when the upper bound
shows such poor performance, I don't see what benefit you see to
studying more realistic swarms.

Depending on when you looked at wikipedia, you may have missed a late
addition I made, which is the graph of the unix utility "strings"
running on the same 585MB file. Since "strings" does no seeks, and reads
data sequentially, it is as effective as can be. I included this as a
reference for how I would expect the effectiveness of a torrent client
in a perfect swarm to behave.

There's about 99.5% room for improvement in the ideal 1:1 torrent, but
less than 0.8% in more realistic torrents. Would you not agree that the
biggest bang for the buck is to improve the ideal case first?

I am not familiar with the inner design of the various clients, so maybe
you can help me here: what swarm configuration will allow your clients
to shine (brighter than the 1:1 case :) ) in terms of disk-access
efficiency?

Any comments are welcome.

Radu.
Arvid Norberg
2006-04-19 17:54:54 UTC
Permalink
Post by Radu Hociung
Post by Arvid Norberg
[...]
It is true though, that in the case where the swarm has tendencies of
being "serialized", it is very common to request pieces which
were just
made available from a peer. But I'm not sure that is a very common
case. Obviously some data is needed here.
a. well-seeded torrents, and/or
b. applications where it is desirable to start playback as soon as
possible after the download starts. Obviously this makes sense mainly
for media-type torrents.
With serialized, I meant that the pieces downloaded are distributed
in a chain, rather than in a star. Possibly a poorly chosen word.

I think that in the case where peers join at different times, and
have different download speeds, the rarest-first algorithm might
actually make it less likely that a newly downloaded piece is
requested from a peer, since that piece just moved to a lower
priority level (having one extra peer).
Post by Radu Hociung
[...]
Post by Arvid Norberg
The client isn't completely free of choosing who to upload to/whose
requests to respond do. In order to maximize download speed, it has to
upload to those that it can download from. I also think that the
most
common case is that the output pipe very seldom is saturated. So, I'm
not sure weighting peers depending on the pieces they request would
work very well, it would affect the behavior of the client in
other ways.
Your distinction of cheap and expensive data doesn't take seek
distance
into account. I think that may be the most important thing to
concentrate on, at least when it comes to writing.
I thought your number 1 aim deals with seeds on high-speed links, hence
no downloading.
I think the cases of downloading, uploading and seeding are a bit
mixed up.
Post by Radu Hociung
Since there are different scenarios which have distinct properties,
maybe caching behaviour could be altered as necessary. The 4 scenarios are:
a. seeding with out pipe saturated
b. seeding with out pipe underutilized
c. downloading from torrent "good" or better
d. downloading from torrent "acceptable" or worse
The key difference between a-b is that the data being requested is
likely available cheaper from another seed.
The key difference between c-d is that a random dl pattern is
unnecessary in c.
True. The seeding case seems to be more difficult to optimize, since
it would require picking the "best" piece requests to respond to,
while still saturating the out pipe. It is difficult partly because
the limit of the out pipe is unknown and hard to estimate unless the
user sets it (and I think the likelihood of it being set accurately
is probably not too high), and partly because it involves picking
piece requests that may belong to different files and estimating the
seek distance for the drive's head. The selection would also have to
make sure that requests are answered sooner or later, to avoid
creating deadlocks.
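One way that selection could be sketched (hypothetical types, not a real
libtorrent interface): serve pending reads roughly in increasing byte
offset so the head sweeps forward, but let requests that have waited past
some deadline jump the queue so no peer is starved.

  #include <algorithm>
  #include <chrono>
  #include <cstdint>
  #include <vector>

  struct read_request
  {
      std::int64_t offset;                           // offset within the storage
      std::chrono::steady_clock::time_point queued;  // when the request arrived
  };

  // Overdue requests go first (in arrival order); the rest are served in
  // increasing offset order, approximating one forward sweep of the head.
  std::vector<read_request> order_reads(std::vector<read_request> reqs,
                                        std::chrono::milliseconds max_age)
  {
      auto const now = std::chrono::steady_clock::now();
      auto const overdue_end = std::stable_partition(reqs.begin(), reqs.end(),
          [&](read_request const& r) { return now - r.queued > max_age; });
      std::sort(overdue_end, reqs.end(),
          [](read_request const& a, read_request const& b)
          { return a.offset < b.offset; });
      return reqs;
  }
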
Post by Radu Hociung
Post by Arvid Norberg
Post by Radu Hociung
Post by Arvid Norberg
Number 2 is a little bit more interesting though. Since every block
that is downloaded has to be written to disk, and since every
block is
only downloaded once, you cannot really optimize away disk writes.
What
you can do though, is to optimize the disk seeking.
Absolutely. Also, since while downloading, the client also
uploads, it
would be more disk efficient to only seek for disk writes, but
never for
disk reads. I.e., the client should prefer to respond to requests for data
it has freshly downloaded, instead of reading other data from disk.
Assuming that the client downloads least-common data first, it's a
reasonable assumption that the data it holds in memory buffers is still
rare to other clients.
Yes, at least to a certain degree. One thing to keep in mind though,
is that the number of pieces in a torrent is generally much greater
than the number of peers that a client can see. How rare a piece is
is quantized to the number of peers that have it. This means that the
number of equally rare pieces is usually very high, far higher than a
write cache would hold.
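To put numbers on that quantization (illustrative figures only): with
around 50 visible peers there are at most 51 distinct availability values,
so a torrent of a couple of thousand pieces necessarily has large buckets
of equally rare pieces. A quick histogram makes that visible:

  #include <map>
  #include <vector>

  // availability -> how many pieces have exactly that availability.
  // With 2000 pieces and 50 peers there are at most 51 buckets, so the
  // "rarest" level typically contains dozens of interchangeable pieces.
  std::map<int, int> rarity_histogram(std::vector<int> const& availability)
  {
      std::map<int, int> pieces_per_level;
      for (int a : availability) ++pieces_per_level[a];
      return pieces_per_level;
  }
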
I think you're making the (healthy) assumption that a client would cache
one, or very few pieces for upload? This is very good for lowering
memory footprints of the client.
What I am thinking is that as soon as a client advertises a completed
piece, many swarm members would request it immediately, instead of
requesting other pieces. The just-downloaded piece is cheap data, and
the downloading client would prioritize its upload.
I'm not sure that is very common. Some measurement should be done
here; parsing libtorrent logs would be enough.
Post by Radu Hociung
In an ideal swarm, each piece in a torrent would be read from a disk
exactly once, and written to disk exactly N-1 times (for N clients in
the swarm). A torrent that is well seeded ("good" or better) should be
able to approach this ideal, without risking the availability of any
piece to be so low as to endanger the torrent. For a torrent that is
"acceptable" or worse, the rarest-first approach may be given a higher
preference than the ideal.
Yes, this is probably a very good idea.
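For reference, the ideal's disk traffic is simple to count (the piece size
and swarm size below are made-up illustration figures): every piece is read
from disk once and written N-1 times across the whole swarm.

  #include <cstdint>
  #include <cstdio>

  // Lower bound on disk I/O in the ideal swarm described above.
  int main()
  {
      std::int64_t const num_pieces = 2340; // e.g. a 585 MB torrent with 256 kB pieces
      std::int64_t const peers = 2;         // the 1:1 swarm from the test
      std::printf("ideal total: %lld piece reads, %lld piece writes\n",
                  (long long)num_pieces,
                  (long long)(num_pieces * (peers - 1)));
  }
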
Post by Radu Hociung
Post by Arvid Norberg
Post by Radu Hociung
In this case, the library should write to disk when downloaded pieces
are complete, but do not free the buffers until other clients have
downloaded them enough to cause the uplink utilization to drop.
I have thought about optimizing the dataflow a lot. I believe there are two main benefits:
1. Fewer seeks == less noise == drives lasting longer.
2. Fewer seeks == drive is more responsive to other apps that run
concurrently.
Definitely, I think the disk performance is very important.
I might add that there's a huge difference for me (OSX 10.4) when
downloading in full allocation mode and compact allocation mode. The
compact allocation mode will have the effect that most reads and
writes
are kept relatively close to each other. They will spread out
over time
of course though. But just downloading with client_test will
saturate
my connection at slightly above 1 MB/s if I'm in compact mode. In full
allocation mode though, I will not get above about 600 kB/s, and my
drive is really making a noise and slowing down the computer to be
virtually useless.
Fascinating! This would seem to support compact/serialized
downloading.
[...]
Post by Arvid Norberg
So, I think my first attempt will be to cache pieces, and only write
whole pieces instead of blocks to disk. To see if the performance is
improved. I don't know how to measure it though, maybe I could try
saturating my download link in full allocation mode and see if it
increases.
But would this not increase the memory footprint of the application,
seeing how as many download buffers are needed as there are parallel
downloads, while none of these buffers can be used to upload from, as
their data has not been checksummed until they have completed
downloading?
If the download cache has a fixed size, finished pieces can be kept
in there as long as possible, allowing it to be used as a read cache
as well.
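A minimal sketch of that idea (hypothetical names, not libtorrent's disk
I/O code): a fixed-size cache accumulates completed pieces, flushes each
one to disk in a single contiguous write, and keeps the buffer around to
serve uploads until the space is reclaimed.

  #include <cstddef>
  #include <list>
  #include <map>
  #include <vector>

  struct piece_cache_sketch
  {
      std::size_t max_pieces;                    // fixed cache budget, in pieces
      std::map<int, std::vector<char>> pieces;   // piece index -> full piece buffer
      std::list<int> retire_order;               // finished pieces, oldest first

      explicit piece_cache_sketch(std::size_t n) : max_pieces(n) {}

      // called after all blocks of `piece` have arrived and passed the hash check
      void piece_complete(int piece, std::vector<char> buf)
      {
          if (pieces.size() >= max_pieces && !retire_order.empty())
          {
              pieces.erase(retire_order.front()); // make room: drop the oldest piece
              retire_order.pop_front();
          }
          write_whole_piece(piece, buf);          // one contiguous write, no per-block seeks
          pieces[piece] = std::move(buf);         // keep it in memory as a read cache
          retire_order.push_back(piece);
      }

      // upload path: serve from memory if the piece is still cached
      char const* try_read(int piece) const
      {
          auto it = pieces.find(piece);
          return it == pieces.end() ? nullptr : it->second.data();
      }

      void write_whole_piece(int /*piece*/, std::vector<char> const& /*buf*/)
      {
          // platform file I/O would go here
      }
  };
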
Post by Radu Hociung
Post by Arvid Norberg
Do you have any other suggestions that are reasonably easy to try out?
I think the first thing to do in studying the effects of libtorrent
caching would be to disable the O/S cache, in order to make
comparisons
between versions relevant.
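One plausible way to take the O/S cache out of the measurement on POSIX
systems (a sketch only; the flags are platform-specific, and O_DIRECT
imposes alignment requirements on buffers and file offsets):

  #include <fcntl.h>

  // Sketch only: open a file so that reads/writes bypass the OS cache.
  // On Linux, O_DIRECT skips the page cache (and requires aligned I/O);
  // on Mac OS X, F_NOCACHE asks the kernel not to cache this descriptor.
  int open_uncached(char const* path)
  {
  #if defined(O_DIRECT)
      return open(path, O_RDWR | O_DIRECT);
  #else
      int fd = open(path, O_RDWR);
  #if defined(F_NOCACHE)
      if (fd >= 0) fcntl(fd, F_NOCACHE, 1);
  #endif
      return fd;
  #endif
  }
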
It would be interesting to find out how much riskier it is for a client
to serialize its downloads on poor torrents. Obviously it is safest to
do rarest-first on such torrents, but I am curious as to the
difference
between rarest-first and serialized.
I would say it's a huge difference. This is not too hard to calculate
analytically though, no real need to test it.
Post by Radu Hociung
Also, we should not forget of the relationship between memory
footprint
and caching effectiveness. It's easy to get more effective by allowing
increased memory use, but we should resist increased memory use when
possible.
Yes. It has to be configurable and possible to disable.
Post by Radu Hociung
How are you able to get 1MB/s ? Do you have torrents running on a test
network? Are the experiments reproducible? Would you mind describing
your test setup?
I have a 10 Mb/s symmetric connection, and so do most other people on
this tracker I use.


--
Arvid Norberg
Radu Hociung
2006-04-19 22:26:37 UTC
Permalink
(adding very little, but keeping most old text for context)
Post by Radu Hociung
Post by Arvid Norberg
[...]
It is true though, that in the case where the swarm has tendencies of
being "serialized", it is very common to request pieces which were
just
made available from a peer. But I'm not sure that is a very common
case. Obviously some data is needed here.
a. well-seeded torrents, and/or
b. applications where it is desirable to start playback as soon as
possible after the download starts. Obviously this makes sense mainly
for media-type torrents.
With serialized, I meant that the pieces downloaded are distributed in
a chain, rather than in a star. Possibly a poorly chosen word.
I think that in the case where peers join at different times, and have
different download speeds, the rarest-first algorithm might actually
make it less likely that a newly downloaded piece is requested from a
peer, since that piece just got bumped down to a lower priority level
(having one extra peer).
Post by Radu Hociung
[...]
Post by Arvid Norberg
The client isn't completely free of choosing who to upload to/whose
requests to respond to. In order to maximize download speed, it has to
upload to those that it can download from. I also think that the most
common case is that the output pipe is very seldom saturated. So, I'm
not sure weighting peers depending on the pieces they request would
work very well, it would affect the behavior of the client in other
ways.
Your distinction of cheap and expensive data doesn't take seek
distance
into account. I think that may be the most important thing to
concentrate on, at least when it comes to writing.
I thought your number 1 aim deals with seeds on high-speed links, hence
no downloading.
I think the cases of downloading, uploading and seeding are a bit mixed
up.
Post by Radu Hociung
Since there are different scenarios which have distinct properties,
maybe caching behaviour could be altered as necessary. The 4 scenarios are:
a. seeding with out pipe saturated
b. seeding with out pipe underutilized
c. downloading from torrent "good" or better
d. downloading from torrent "acceptable" or worse
The key difference between a-b is that the data being requested is
likely available cheaper from another seed.
The key difference between c-d is that a random dl pattern is
unnecessary in c.
True. The seeding case seems to be more difficult to optimize, since it
would require picking the "best" piece requests to respond to, while
still saturating the out pipe. It is difficult partly because the
limit of the out pipe is unknown and hard to estimate unless the user
sets it (and I think the likelihood of it being set accurately is
probably not too high), and partly because it involves picking piece
requests that may belong to different files and estimating the seek
distance for the drive's head. The selection would also have to make
sure that requests are answered sooner or later, to avoid creating
deadlocks.
Post by Radu Hociung
Post by Arvid Norberg
Post by Arvid Norberg
Number 2 is a little bit more interesting though. Since every block
that is downloaded has to be written to disk, and since every
block is
only downloaded once, you cannot really optimize away disk writes.
What
you can do though, is to optimize the disk seeking.
Absolutely. Also, since while downloading, the client also uploads, it
would be more disk efficient to only seek for disk writes, but
never for
disk reads. I.e., the client should prefer to respond to requests for data
it has freshly downloaded, instead of reading other data from disk.
Assuming that the client downloads least-common data first, it's a
reasonable assumption that the data it holds in memory buffers is still
rare to other clients.
Yes, at least to a certain degree. One thing to keep in mind though,
is that the number of pieces in a torrent is generally much greater
than the number of peers that a client can see. How rare a piece is
is quantized to the number of peers that have it. This means that the
number of equally rare pieces is usually very high, far higher than a
write cache would hold.
I think you're making the (healthy) assumption that a client would cache
one, or very few pieces for upload? This is very good for lowering
memory footprints of the client.
What I am thinking is that as soon as a client advertises a completed
piece, many swarm members would request it immediately, instead of
requesting other pieces. The just-downloaded piece is cheap data, and
the downloading client would prioritize its upload.
I'm not sure that is very common. Some measurement should be done here;
parsing libtorrent logs would be enough.
Post by Radu Hociung
In an ideal swarm, each piece in a torrent would be read from a disk
exactly once, and written to disk exactly N-1 times (for N clients in
the swarm). A torrent that is well seeded ("good" or better) should be
able to approach this ideal, without risking the availability of any
piece to be so low as to endanger the torrent. For a torrent that is
"acceptable" or worse, the rarest-first approach may be given a higher
preference than the ideal.
Yes, this is probably a very good idea.
Post by Radu Hociung
Post by Arvid Norberg
In this case, the library should write to disk when downloaded pieces
are complete, but do not free the buffers until other clients have
downloaded them enough to cause the uplink utilization to drop.
I have thought about optimizing the dataflow a lot. I believe there are two main benefits:
1. Fewer seeks == less noise == drives lasting longer.
2. Fewer seeks == drive is more responsive to other apps that run
concurrently.
Definitely, I think the disk performance is very important.
I might add that there's a huge difference for me (OSX 10.4) when
downloading in full allocation mode and compact allocation mode. The
compact allocation mode will have the effect that most reads and
writes
are kept relatively close to each other. They will spread out over
time
of course though. But just downloading with client_test will saturate
my connection at slightly above 1 MB/s if I'm in compact mode. In full
allocation mode though, I will not get above about 600 kB/s, and my
drive is really making a noise and slowing down the computer to be
virtually useless.
Fascinating! This would seem to support compact/serialized downloading.
[...]
Post by Arvid Norberg
So, I think my first attempt will be to cache pieces, and only write
whole pieces instead of blocks to disk. To see if the performance is
improved. I don't know how to measure it though, maybe I could try
saturating my download link in full allocation mode and see if it
increases.
But would this not increase the memory footprint of the application,
seeing how as many download buffers are needed as there are parallel
downloads, while none of these buffers can be used to upload from, as
their data has not been checksummed until they have completed
downloading?
If the download cache has a fixed size, finished pieces can be kept in
there as long as possible, allowing it to be used as a read cache as well.
I would just like to suggest a possible definition for "as long as
possible": the algorithm for adding/replacing cached data should be "add
rarest pieces, retire most plentiful pieces, and replace only when the
new piece is rarer than the replaced piece", since a rare piece is more
likely to be requested again.
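A small sketch of that replacement rule (hypothetical names): the cache
tracks each entry's availability, and a new piece only displaces the most
plentiful cached piece when it is strictly rarer.

  #include <cstddef>
  #include <iterator>
  #include <map>

  struct rarity_cache_policy
  {
      std::multimap<int, int> by_availability;  // availability -> cached piece index
      std::size_t capacity;

      explicit rarity_cache_policy(std::size_t cap) : capacity(cap) {}

      // Returns the piece evicted to make room, or -1 if nothing was evicted
      // (either there was free space, or the new piece was not rare enough).
      int admit(int piece, int availability)
      {
          if (by_availability.size() < capacity)
          {
              by_availability.emplace(availability, piece);
              return -1;
          }
          auto most_plentiful = std::prev(by_availability.end());
          if (availability >= most_plentiful->first)
              return -1;                         // new piece is not rarer; don't cache it
          int const evicted = most_plentiful->second;
          by_availability.erase(most_plentiful);
          by_availability.emplace(availability, piece);
          return evicted;
      }
  };
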
Post by Radu Hociung
Post by Arvid Norberg
Do you have any other suggestions that are reasonably easy to try out?
I think the first thing to do in studying the effects of libtorrent
caching would be to disable the O/S cache, in order to make comparisons
between versions relevant.
It would be interesting to find out how much riskier it is for a client
to serialize its downloads on poor torrents. Obviously it is safest to
do rarest-first on such torrents, but I am curious as to the difference
between rarest-first and serialized.
I would say it's a huge difference. This is not too hard to calculate
analytically though, no real need to test it.
Right. I gave it a shot in the other subthread with the
quarter-by-quarter-random vs. piece-by-piece-random strategy comparison.
Is that what you had in mind?
Post by Radu Hociung
Also, we should not forget of the relationship between memory footprint
and caching effectiveness. It's easy to get more effective by allowing
increased memory use, but we should resist increased memory use when
possible.
Yes. It has to be configurable and possible to disable.
Post by Radu Hociung
How are you able to get 1MB/s ? Do you have torrents running on a test
network? Are the experiments reproducible? Would you mind describing
your test setup?
I have a 10 Mb/s symmetric connection, and so do most other people on
this tracker I use.
Wow... I'm feeling envy and an itch to immigrate :)

Cheers,
Radu.
Arvid Norberg
2006-04-25 19:10:02 UTC
Permalink
Post by Radu Hociung
Post by Arvid Norberg
I would say it's a huge difference. This is not too hard to calculate
analytically though, no real need to test it.
Right. I gave it a shot in the other subthread with the
quarter-by-quarter-random vs. piece-by-piece-random strategy
comparison.
Is that what you had in mind?
No. Saying that it would be easy to show analytically was not very
accurate though. Just to formulate how to measure performance is
quite difficult. My best attempt would be something like:

the goal is to have as high a probability as possible that the overlap
between two randomly selected peers' piece sets is as small as possible.

Striving for that would result in peers being able to exchange data
as much as possible.
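One way to put a number on that goal (illustrative figures only): if two
peers each hold k of the P pieces, chosen independently and uniformly at
random, their expected overlap is k*k/P pieces; a picker that meets the
goal above would keep the measured overlap below that baseline, leaving
the pair more to trade.

  #include <cstdio>

  int main()
  {
      double const P = 2340;  // pieces in the torrent (made-up figure)
      double const k = 1170;  // pieces held by each of the two peers (50% done)
      // expected number of pieces both peers already have, under uniform
      // random piece selection
      std::printf("expected overlap: %.0f of %.0f pieces\n", k * k / P, k);
  }
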

Your suggestion is basically just a way to increase the piece sizes,
which will decrease the granularity of the current algorithm.


--
Arvid Norberg
