[libtorrent] memory mapped I/O progress
Steven Siloti
2016-11-03 03:20:43 UTC
I've been working on figuring out how to integrate mmapped file I/O with
libtorrent and before I go further I want to make sure this fits with
arvid's plans for this feature.

The idea is to wrap the file class with a pool_file class which handles the
details of managing the mappings and performing I/O using them. Instances
of pool_file are tied to an instance of file_pool because the two need to
interact to perform LRU eviction of mappings on 32-bit systems. A wrapper
class is also needed because if the file class held strong references to
the mappings it would create a reference cycle.

On 64-bit systems things are relatively simple. We lazily map the entire
file and create a new mapping any time the size of the file is changed.
Even with 64-bit we need to reference count the mappings so that extending
the file size doesn't disturb concurrent I/O.
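
A minimal sketch of the reference-counting idea above, assuming POSIX mmap() and C++11. The names file_mapping, mapped_file and demo_resize_keeps_old_mapping are hypothetical and are not taken from the linked commit or from libtorrent. Each mapping is owned through a std::shared_ptr, so resizing the file only drops the owner's reference while concurrent I/O keeps the old mapping alive:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <mutex>
#include <utility>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// All names here are hypothetical, not libtorrent's API.

// One mapping of a whole file; unmapped when the last reference goes away.
struct file_mapping
{
    file_mapping(int fd, std::size_t size) : m_size(size)
    {
        assert(size > 0); // mmap() rejects zero-length mappings
        m_base = ::mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    }
    ~file_mapping() { if (valid()) ::munmap(m_base, m_size); }
    file_mapping(file_mapping const&) = delete;
    file_mapping& operator=(file_mapping const&) = delete;

    bool valid() const { return m_base != MAP_FAILED; }
    char* data() const { return static_cast<char*>(m_base); }
    std::size_t size() const { return m_size; }

private:
    void* m_base = MAP_FAILED;
    std::size_t m_size;
};

struct mapped_file
{
    explicit mapped_file(int fd) : m_fd(fd) {}

    // Lazily maps the whole file. I/O code keeps the returned shared_ptr
    // alive for as long as it touches the memory.
    std::shared_ptr<file_mapping> mapping()
    {
        std::lock_guard<std::mutex> l(m_mutex);
        if (!m_current)
        {
            struct stat st;
            if (::fstat(m_fd, &st) != 0 || st.st_size <= 0) return nullptr;
            auto m = std::make_shared<file_mapping>(
                m_fd, static_cast<std::size_t>(st.st_size));
            if (!m->valid()) return nullptr;
            m_current = std::move(m);
        }
        return m_current;
    }

    // Changes the file size and drops only our own reference; the old
    // mapping is unmapped once concurrent users release theirs.
    bool resize(std::size_t new_size)
    {
        std::lock_guard<std::mutex> l(m_mutex);
        if (::ftruncate(m_fd, static_cast<off_t>(new_size)) != 0) return false;
        m_current.reset();
        return true;
    }

private:
    int m_fd;
    std::mutex m_mutex;
    std::shared_ptr<file_mapping> m_current;
};

// Usage sketch: a mapping taken before resize() stays usable afterwards.
bool demo_resize_keeps_old_mapping()
{
    char path[] = "/tmp/mapped_file_XXXXXX";
    int const fd = ::mkstemp(path);
    if (fd < 0) return false;
    bool ok = false;
    {
        mapped_file f(fd);
        if (f.resize(4096))
        {
            std::shared_ptr<file_mapping> before = f.mapping();
            if (before && f.resize(8192))
            {
                before->data()[0] = 'x'; // the old mapping is still valid
                std::shared_ptr<file_mapping> after = f.mapping();
                ok = after && before->size() == 4096
                    && after->size() == 8192 && after->data()[0] == 'x';
            }
        }
    }
    ::close(fd);
    ::unlink(path);
    return ok;
}
```

The sketch only covers extending a file. Shrinking a file while an old mapping is still referenced would leave that mapping pointing past the end of the file.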

There is a potential performance gotcha in that munmap() is O(n) with
respect to the number of faulted-in pages being unmapped. We therefore need
to try to avoid the case of a file being repeatedly extended by small
amounts. With full allocation this obviously isn't a problem.

On 32-bit systems mappings are created on-demand and cached up to some
maximum total mapped size. The size of each mapping is capped at some
arbitrary value, say 16MB. When a new mapping is requested, the LRU mappings
will be evicted until there is enough free space to create the new mapping.
There's a lot of complexity here which I haven't fully worked out yet.
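
To make the eviction policy concrete, here is a rough sketch of just the bookkeeping, with no actual mmap() calls. Everything in it is an assumption for illustration: the names lru_window_cache and window_key, the fixed 16 MiB window, and the choice to identify a window by (file index, window index). It also ignores the hard part, which is that a mapping still in use by concurrent I/O cannot simply be evicted:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <list>
#include <map>
#include <utility>
#include <vector>

// Hypothetical: identifies one window of one file as (file index, window index).
using window_key = std::pair<int, std::int64_t>;

struct lru_window_cache
{
    // The "arbitrary" per-mapping cap from the text above.
    static constexpr std::size_t window_size = 16 * 1024 * 1024;

    explicit lru_window_cache(std::size_t budget) : m_budget(budget)
    {
        assert(budget >= window_size); // must be able to hold one window
    }

    // Marks the window covering `offset` in `file` as most recently used.
    // Returns the windows that had to be evicted to stay within the budget;
    // real code would munmap() those and mmap() the new window here.
    std::vector<window_key> acquire(int file, std::int64_t offset)
    {
        window_key const key(file, offset / std::int64_t(window_size));
        std::vector<window_key> evicted;

        auto const it = m_index.find(key);
        if (it != m_index.end())
        {
            // Cache hit: move the entry to the front of the LRU list.
            m_lru.splice(m_lru.begin(), m_lru, it->second);
            return evicted;
        }

        // Cache miss: evict least recently used windows until there is room.
        while (m_mapped + window_size > m_budget)
        {
            evicted.push_back(m_lru.back());
            m_index.erase(m_lru.back());
            m_lru.pop_back();
            m_mapped -= window_size;
        }

        m_lru.push_front(key);
        m_index[key] = m_lru.begin();
        m_mapped += window_size;
        return evicted;
    }

    std::size_t mapped_bytes() const { return m_mapped; }

private:
    std::size_t m_budget;
    std::size_t m_mapped = 0;
    std::list<window_key> m_lru; // front = most recently used
    std::map<window_key, std::list<window_key>::iterator> m_index;
};
```

With a budget of two windows, touching window 0, then window 1, then window 0 again, and finally a window of another file would evict window 1, since window 0 was used more recently.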

Currently I only have 64-bit POSIX support coded up and passing the unit
tests. You can find it here:

https://github.com/ssiloti/libtorrent/commit/1dd5b130c2c4859dc1f895272cd3cce7e5939fd4

It still needs a lot of polish obviously, so excuse any oddities in the
details.
Arvid Norberg
2016-11-03 03:59:21 UTC
Post by Steven Siloti
I've been working on figuring out how to integrate mmapped file I/O with
libtorrent and before I go further I want to make sure this fits with
arvid's plans for this feature.
The idea is to wrap the file class with a pool_file class which handles the
details of managing the mappings and performing I/O using them. Instances
of pool_file are tied to an instance of file_pool because the two need to
interact to perform lru eviction of mappings on 32-bit systems. A wrapper
class is also needed because if the file class held strong references to
the mappings it would create a reference cycle.
On 64-bit systems things are relatively simple. We lazily map the entire
file and create a new mapping any time the size of the file is changed.
Even with 64-bit we need to reference count the mappings so that extending
the file size doesn't disturb concurrent I/O.
It seems simpler to always truncate the files to their final size. Do you
see a problem with that?
I'm pretty sure the "hole" isn't committed or even allocated on disk
up-front, right?

(my understanding is that space on disk is allocated when a page is
committed to a virtual page, and that's when you get a signal if the disk
is full).
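
One way to check the "hole" assumption on a given filesystem is to truncate a fresh file to its final size and compare the logical size with what fstat() reports as allocated. This is only a sketch: the names file_usage, truncate_and_measure and demo_sparse_truncate are hypothetical, and the 512-byte unit for st_blocks is the usual convention (for example on Linux), not something POSIX guarantees:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper types and names, for illustration only.
struct file_usage
{
    std::int64_t logical_size = 0;    // st_size after truncation
    std::int64_t allocated_bytes = 0; // st_blocks * 512 (assumed unit)
};

// Truncates fd to final_size without writing any data, then reports how
// much space the filesystem says it has actually allocated.
bool truncate_and_measure(int fd, std::int64_t final_size, file_usage& out)
{
    assert(final_size >= 0);
    if (::ftruncate(fd, static_cast<off_t>(final_size)) != 0) return false;
    struct stat st;
    if (::fstat(fd, &st) != 0) return false;
    out.logical_size = static_cast<std::int64_t>(st.st_size);
    out.allocated_bytes = static_cast<std::int64_t>(st.st_blocks) * 512;
    return true;
}

// Usage sketch: extend an empty temporary file to 64 MiB.
bool demo_sparse_truncate(file_usage& out)
{
    char path[] = "/tmp/sparse_XXXXXX";
    int const fd = ::mkstemp(path);
    if (fd < 0) return false;
    bool const ok = truncate_and_measure(fd, 64 * 1024 * 1024, out);
    ::close(fd);
    ::unlink(path);
    return ok;
}
```

On a filesystem with sparse-file support, allocated_bytes should come out far smaller than logical_size. A filesystem without holes may allocate everything up front, so the result is worth checking per platform rather than assuming.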
Post by Steven Siloti
There is a potential performance gotcha in that munmap() is O(n) with
respect to the number of faulted-in pages being unmapped. We therefore need
to try to avoid the case of a file being repeatedly extended by small
amounts. With full allocation this obviously isn't a problem.
I would imagine, at least for the first pass, we could avoid this by always
truncating files to their final size. We still need to unmap when closing a
torrent, but that would just end up stalling one thread presumably.
Post by Steven Siloti
On 32-bit systems mappings are created on-demand and cached up to some
maximum total mapped size. The size of each mapping is capped at some
arbitrary value, say 16MB. When a new mapping is requested the lru mappings
will be evicted until there is enough free space to create the new mapping.
There's a lot of complexity here which I haven't fully worked out yet.
To keep things simple for a first pass, my plan was:

1. to ignore 32-bit systems and always map everything
2. always truncate files to their full size on torrent startup (but not
necessarily allocate them)
3. delete the block cache, file and file_pool
4. collapse the default_storage into disk_io_thread

The way I envision getting there is:

1. turn disk_io_thread (or more specifically, disk_interface) into the
customization point, instead of storage interface. This encompasses state
across torrents, the thread pool, job queue and block cache.
2. modify the disk_interface and disk_io_thread to be the one creating
storage_interface objects. But that can be an opaque type now, since using
it is internal to the disk_io_thread class.
3. implement a simple concrete class of disk_interface that has no cache,
no file pool, no threads (or maybe one thread) and just uses POSIX file
descriptor I/O. This would be the lowest common denominator that could be
used on any system that isn't 64-bit or doesn't support mmap.
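
For the third step, the core of such a lowest-common-denominator backend could be little more than positional reads and writes on a file descriptor. A sketch, assuming POSIX pread()/pwrite(); the function names are hypothetical and this is not the disk_interface API, just the I/O primitive it could sit on:

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <unistd.h>

// Hypothetical helpers, not part of libtorrent.

// Reads up to len bytes at offset. Returns the number of bytes read
// (short only at end of file) or -1 on error.
std::int64_t read_all(int fd, char* buf, std::size_t len, std::int64_t offset)
{
    assert(buf != nullptr && offset >= 0);
    std::size_t done = 0;
    while (done < len)
    {
        ssize_t const r = ::pread(fd, buf + done, len - done,
            static_cast<off_t>(offset + std::int64_t(done)));
        if (r < 0)
        {
            if (errno == EINTR) continue; // interrupted, retry
            return -1;
        }
        if (r == 0) break; // end of file
        done += static_cast<std::size_t>(r);
    }
    return std::int64_t(done);
}

// Writes len bytes at offset. Returns the number of bytes written or -1.
std::int64_t write_all(int fd, char const* buf, std::size_t len, std::int64_t offset)
{
    assert(buf != nullptr && offset >= 0);
    std::size_t done = 0;
    while (done < len)
    {
        ssize_t const r = ::pwrite(fd, buf + done, len - done,
            static_cast<off_t>(offset + std::int64_t(done)));
        if (r < 0)
        {
            if (errno == EINTR) continue; // interrupted, retry
            return -1;
        }
        if (r == 0) break; // no progress, avoid spinning
        done += static_cast<std::size_t>(r);
    }
    return std::int64_t(done);
}

// Usage sketch: write a block at an offset and read it back.
bool demo_roundtrip()
{
    char path[] = "/tmp/posix_io_XXXXXX";
    int const fd = ::mkstemp(path);
    if (fd < 0) return false;
    char const msg[] = "block";
    char out[sizeof(msg)] = {};
    bool const ok = write_all(fd, msg, sizeof(msg), 1000) == std::int64_t(sizeof(msg))
        && read_all(fd, out, sizeof(out), 1000) == std::int64_t(sizeof(out))
        && std::memcmp(msg, out, sizeof(msg)) == 0;
    ::close(fd);
    ::unlink(path);
    return ok;
}
```

Here errors come back as ordinary return values with errno set, which is the main practical difference from the mmap path, where they arrive as signals.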

In the last few days I've been cleaning up and simplifying the
disk_interface to help with my experiment of making this change. The
deprecated functions have been removed, and the disk job that just loads a
torrent file was moved out into a separate on-demand worker thread.

I'm starting to think that perhaps it would make sense to cut a release
branch of master before actually starting to push patches towards this.

Anyway, it sounds like we've started at different ends and perhaps we could
meet in the middle. I will do some surgery to disk_io_thread and
default_storage, but I'm hoping to leave the bulk of the code intact.

Post by Steven Siloti
Currently I only have 64-bit POSIX support coded up and passing the unit
tests. You can find it here:
https://github.com/ssiloti/libtorrent/commit/1dd5b130c2c4859dc1f895272cd3cce7e5939fd4
It still needs a lot of polish obviously, so excuse any oddities in the
details.
neat! hopefully I'll have some time to check this out over the weekend.
--
Arvid Norberg
Steven Siloti
2016-11-03 04:26:59 UTC
Post by Arvid Norberg
To keep things simple for a first pass, my plan was:
1. to ignore 32 bit system and always map everything
2. always truncate files to their full size on torrent startup (but not
necessarily allocate them)
3. Delete the block cache, file and file_pool
4. collapse the default_storage into disk_io_thread
The way I envision getting there is:
1. turn disk_io_thread (or more specifically, disk_interface) into the
customization point, instead of storage interface. This encompasses state
across torrents, the thread pool, job queue and block cache.
2. modify the disk_interface and disk_io_thread to be the one creating
storage_interface objects. But that can be an opaque type now, since using
it is internal to the disk_io_thread class.
3. implement a simple concrete class of disk_interface that has no cache,
no file pool, no threads (or maybe one thread) and just uses posix file
descriptor I/O. This would be the lowest common denominator that could be
used on any system that isn't 64 bits or doesn't support mmap
In the last few days I've been cleaning up and simplifying the
disk_interface to help with my experiment of making this change. The
deprecated functions have been removed, the disk job to just load a torrent
file was moved out into a separate on-demand worker thread.
I'm starting to think that perhaps it would make sense to cut a release
branch of master before actually starting to push patches towards this.
Anyway, it sounds like we've started at different ends and perhaps we could
meet in the middle. I will do some surgery to disk_io_thread and
default_storage, but I'm hoping to leave the bulk of the code intact.
Ok, if we're going to only support mmap on 64-bit systems and require
truncation on open() then managing the mappings becomes trivial. Consider
the code I posted as a demonstration of how much nastier the alternative is :)
Arvid Norberg
2016-11-06 18:13:24 UTC
Post by Steven Siloti
Post by Arvid Norberg
[...]
Anyway, it sounds like we've started at different ends and perhaps we could
meet in the middle. I will do some surgery to disk_io_thread and
default_storage, but I'm hoping to leave the bulk of the code intact.
Ok, if we're going to only support mmap on 64-bit systems and require
truncation on open() then managing the mappings becomes trivial. Consider
the code I posted as a demonstration of how much nastier the alternative is :)
I think it probably makes sense to properly support 32-bit mmaps at some
point. I just imagine that these days most 32-bit systems are embedded,
where multiple disk threads may not make sense, and the cost of system
calls for read() and write() may not significantly contribute to the
throughput bottleneck.

However, even on 64-bit systems there's still the challenge of error
handling, especially doing it portably on systems other than Windows and
Linux (which I believe have sufficient support for managing signals).

I envision an RAII class that installs a signal handler and uses
sigsetjmp() to store a jump context in a thread-local variable (or is there
a better way to communicate this via the sigaction structure?).

Ideally the signal handler would only be installed for SIGBUS and SIGSEGV
(the signals used to report errors from mapped files) and only installed
for the current thread. I'm not sure that's possible in a portable way.

Perhaps it would make sense to install the signal handlers once, and in the
handlers check the address that caused the fault. If it happened inside one
of our mapped regions, then longjmp out, otherwise invoke the default
signal handler. Perhaps a simpler way would be to use the existence of the
longjmp context as a way to communicate whether we're currently accessing a
memory map. The destructor of the RAII object could clear it so any
subsequent signals would be treated as actual bugs caught by the debugger.

I will do some research into how these things work, especially what happens
when jumping with longjmp across the initialization of stack variables. We
would probably have to make sure we never do that.
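
A rough sketch of the "existence of the jump context" variant, assuming POSIX sigaction() and sigsetjmp()/siglongjmp(). All names (t_fault_ctx, on_map_fault, fault_scope, guarded_copy) are hypothetical. Two assumptions are worth stating loudly: reading a thread_local from a signal handler is not guaranteed to be async-signal-safe, and sigsetjmp() has to be called in the stack frame that performs the access, so the RAII object can only publish and clear the pointer, not call sigsetjmp() itself. The guard is constructed before sigsetjmp(), so the jump never skips a destructor:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <setjmp.h>
#include <signal.h>
#include <sys/mman.h>
#include <unistd.h>

// Hypothetical names. Non-null only while this thread is inside a guarded
// access to mapped memory.
static thread_local sigjmp_buf* t_fault_ctx = nullptr;

extern "C" void on_map_fault(int sig)
{
    sigjmp_buf* ctx = t_fault_ctx;
    if (ctx != nullptr) siglongjmp(*ctx, 1);
    // Not inside a guarded access: treat it as a real bug by restoring the
    // default action and re-raising the signal.
    signal(sig, SIG_DFL);
    raise(sig);
}

// Installs the handlers once, process-wide.
bool install_map_fault_handlers()
{
    struct sigaction sa = {};
    sa.sa_handler = &on_map_fault;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, nullptr) == 0
        && sigaction(SIGSEGV, &sa, nullptr) == 0;
}

// RAII: publishes the jump context for this thread and restores the
// previous one on scope exit.
struct fault_scope
{
    explicit fault_scope(sigjmp_buf* ctx) : m_prev(t_fault_ctx) { t_fault_ctx = ctx; }
    ~fault_scope() { t_fault_ctx = m_prev; }
    fault_scope(fault_scope const&) = delete;
    fault_scope& operator=(fault_scope const&) = delete;
private:
    sigjmp_buf* m_prev;
};

// Copies from mapped memory; returns false if the access faulted.
bool guarded_copy(void* dst, void const* src, std::size_t len)
{
    assert(dst != nullptr && src != nullptr);
    sigjmp_buf ctx;
    fault_scope guard(&ctx);
    // Saving the signal mask (second argument 1) lets siglongjmp() unblock
    // the signal that was blocked while the handler ran.
    if (sigsetjmp(ctx, 1) != 0) return false; // arrived via siglongjmp()
    std::memcpy(dst, src, len);
    return true;
}

// Usage sketch: map one page, shrink the file underneath the mapping,
// and check that the second read is reported as a failure.
bool demo_fault_recovery()
{
    if (!install_map_fault_handlers()) return false;
    char path[] = "/tmp/map_fault_XXXXXX";
    int const fd = ::mkstemp(path);
    if (fd < 0) return false;
    std::size_t const len = static_cast<std::size_t>(::sysconf(_SC_PAGESIZE));
    bool ok = false;
    if (::ftruncate(fd, static_cast<off_t>(len)) == 0)
    {
        void* p = ::mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p != MAP_FAILED)
        {
            static_cast<char*>(p)[0] = 'x';
            char before[16] = {};
            char after[16] = {};
            bool const ok_before = guarded_copy(before, p, sizeof(before));
            bool const shrunk = ::ftruncate(fd, 0) == 0;
            bool const ok_after = guarded_copy(after, p, sizeof(after));
            ok = ok_before && before[0] == 'x' && shrunk && !ok_after;
            ::munmap(p, len);
        }
    }
    ::close(fd);
    ::unlink(path);
    return ok;
}
```

The other variant mentioned above, checking the faulting address against the known mapped regions, would need SA_SIGINFO and the si_addr field of siginfo_t instead of a plain handler.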

Btw, it's not just ENOSPC that can cause signals. If the filesystem the
file is on is unmounted, or if the file is truncated to a smaller size by
an outside process, we'd get a signal as well (POSIX specifies SIGBUS for
an access past the new end of a truncated file).
--
Arvid Norberg