Go to the first, previous, next, last section, table of contents.
- There are various `todo’ and `fixme’ comments in the code. This is an obvious place to start for improving kernel code quality.
- Allow two de/compression work areas: one for compression, the other for decompression.
- Once the above works, make the compression work area allocated/deallocated according to demand. One way of doing this is just to deallocate the work area as soon as we’re finished (unless there are other processes lining up to use it) and reallocate when we next need it (which could be almost immediately, unfortunately, e.g. a process creating a large file). There are other solutions….

Perhaps the best (of the relatively easy ones) is as follows: have one work area that is always present, just as at present (0.4.1). If we get a request for a read while another process is using the main area for writing (i.e. compressing), then allocate a new work area. In order to avoid allocating/deallocating in quick succession, the area isn’t deallocated until the write is finished. (Alternatively, the new area could become the primary one…. This would be more efficient in some ways (e.g. our contents cache is more useful), but we’d have to use the one allocation method for both areas rather than vmalloc() for the stable one and kmalloc() for the transient one. I don’t know what the advantages of vmalloc() are over kmalloc(), so I don’t know how costly this is.) Note that this `deallocate when compression finishes’ scheme provides some help in the `many decompressions in succession’ case, but not in the `many compressions in succession’ case (e.g. a process creating a large file). So maybe this idea isn’t such a great improvement on that suggested in the previous paragraph after all. Its only advantage is that less memory is used when the system is compressing but not decompressing.

Of course, compression in user space would provide an excellent solution.
- Free preallocated blocks when we fail to decompress a cluster.
- Support the SYNC flag. (SYNC is only partially supported in standard 2.0 kernels, so this isn’t a high immediate priority; on the other hand, it is better-, or maybe even fully-, supported on 2.2.)
- Support compress-on-idle. Have the kernel maintain a queue of inodes to compress. When a file’s data is accessed (read), move it to the end of the queue if it is in the queue. When we raise EXT2_DIRTY_FL (in ext2_write_file(), ext2_truncate(), ext2_ioctl()), insert the inode into the queue if it is not there already. Have ext2_cleanup_compressed_inode() remove the inode from the queue (when appropriate).
- When I (Antoine) was thinking about the ideal compressed filesystem, I imagined we could wait a little longer before we really compress files that have been accessed. Since access to a compressed cluster is slower, we could uncompress them and mark the file dirty, but instead of compressing it again when the inode is put, just link it into a special directory that would hold all dirty files. Files in this directory could be compressed again after a certain amount of time, or when we start to run short of free blocks. This is a feature I liked in tcx. This is no more than a cache where the uncompressed blocks would be stored on disk, and that would persist even after the machine has been stopped.
- Get rid of EXT2_NOCOMPR_FL (i.e. `chattr +X’, the attribute flag that provides access to raw compressed data and prevents more than one process from having the file open) and replace it with GNU’s O_NOTRANS fcntl flag. (The advantage of O_NOTRANS is that (i) it’s more standard (but only on the Hurd, not Linux) and (ii) other processes can still have the file open for normal access (though I think that this should be implemented as a per-file-descriptor flag rather than a per-file-opening flag).) Linux already has mandatory file locking support, so we can use that instead where we need it.
- Better provision for logfiles, where we’d like to compress all but the last (incomplete) cluster. (If the last cluster is compressed then we have to uncompress and recompress on every write; remember that logfiles are usually sync’ed after every line.) This has now been partially provided for with the `none’ algorithm: set the algorithm to `none’, then every once in a while change it to `gzip’ and then immediately back to `none’. Changing the algorithm from `none’ to any other algorithm is a special case: the kernel looks at all clusters that aren’t already compressed (which, for log files, should mean only the last cluster or two) and tries to compress them with the new algorithm.
- Support the EXT2_SECRM_FL (i.e. secure deletion) flag. (This isn’t supported even in the standard 2.2 kernel, so this isn’t a high priority.)
- Add some mount options? Does anything useful come to mind?
- Make an e2compr kernel module? The aim would be that people can insmod e2compr into a kernel even if that kernel already has the ext2 fs (without e2compr) built in. Useful if you don’t have a choice of the base kernel, as may be the case when upgrading some Linux distributions. However, I’m not sure how to implement this. The e2compr patch must modify some core ext2 routines, which, although I (pjm) don’t know much about modules, I think is impossible if the ext2 filesystem is already compiled into the kernel rather than built as a module. (Does anyone know?)
- Support bmap. bmap returns the block number on the filesystem where a given <inode, block> pair is stored, information which is presumably going to be used to access the raw device at that point rather than go through the filesystem (and e2compr).

I don’t think we can implement bmap directly: it would require decompressing all clusters that were requested through bmap calls, which is undesirable at best, particularly on a read-only filesystem.

More sensible might be to look at all the callers of bmap and see if they can be coerced into going through e2compr for files with EXT2_COMPRBLK_FL set.

One might alternatively think about having a `virtual device’ for e2compr clusters. A virtual device would also help with caching uncompressed clusters, btw. Antoine’s objection to creating a virtual device was that there’s no trivial mapping between <inode *, blockno> pairs and a 32-bit block number on the virtual device.

We could probably grab the allocation code from any of the filesystems, though I suggest we could benefit by optimising it for sparse occupation of the 32-bit block address space. I’ve done some preliminary work on this, but we won’t see any real work done on it until the current incarnation of e2compr is working reasonably well.
- Until the bmap problem is addressed, try to make it impossible to compress a file that’s being used as a swapfile.
- Allow modification of the cluster size even for already-compressed files. (Note that this would involve decompressing and then recompressing the whole file.)
- Recompress the whole file when the algorithm is changed? We certainly wouldn’t want this to happen if we change the algorithm to `none’. This functionality is being added to e2compress, but I don’t think that the usual kernel behaviour will change.
Go to the first, previous, next, last section, table of contents.