<$BlogRSDUrl$>

Saturday, January 31, 2004

OOFS


Filesystems Revisited



I still haven't reworked my disk based FS into an object oriented FS. Object Relational Location Filesystem (ORLFS) will work as follows though:

Header Area
|--Superblock
|----Address of Object 0
|------Block containing Object 0
|------Byte Offset in block
|----Object for Beginning of Inode List
|--Journal List
|----Objects that function as Concatonated Journals, in order
Data Area
|--Objects

The idea is to allow objects to freely move around. Objects are referenced by their object id, which is used to locate the block and byte offset in the block at which the object resides. The object itself has its object ID embedded in its header, as well as the object ID of the previous and next physical objects. The access methods required to use this costs disk reads and seeks, but it provides a large amount of flexibility. Filesystems can easily be fscked, resized, defragmented, cyphered, encrypted, and compressed.

The entire disk is used by objects. Freespace is free-marked objects. Directories are directory-marked objects. Files are file-marked objects. Objects are split and redone as necessary, usually this would be free space objects losing pieces and/or becoming multiple objects. Objects do not have to be in order on the disk.

Here is an example:

{0000: Object list}{001A: Free space.................................}{0CF4: / inode directory entry}{0001: /boot/bzImage-2.6.1..................................}{002A: journal chunk............................}{19F7: Free space.............................................................................}


Now we can add a small file:

{0000: Object list}{001A: Free space}{0024: /a.out}{0025: free space}{0CF4: / directory{0001: /boot/bzImage-2.6.1..................................}{002A: journal chunk............................}{19F7: Free space.............................................................................}

Next, we can defrag:

{0000: Object list}{0024: /a.out}{0CF4: / directory{0001: /boot/bzImage-2.6.1..................................}{002A: journal chunk............................}{19F7: Free space.......................................................................................................................................}

A few bytes in the object link change in each of these operations, as well as forward/backward pointers to the next/previous objects. Of course, to move a journal chunk, you need to start a second journal chunk, flush everything from the first, and then free the first.

This will make on-the-fly shrinkage/growth and defragmentation easy, and will leave objects fairly managable. Going to any object is a matter of jumping to the Object List Index of that object (1 seek/read for a good driver); and reading that object off the disk (1 more seek/read for a good driver). This means O(2) to read any object based on knowing its object ID; and O(2d) to read a directory entry, assuming none of the directroy objects along the way are fragmented. So, /usr/src/linux/Makefile (first fragment only) would be read off disk after 10 seek/read pairs (/, /usr, /usr/src, /usr/src/linux, /usr/src/linux/Makefile; 2 for each) 190-210 mS for a 5400 RPM drive (19-21 mS seek time).

Sound plausable?

This page is powered by Blogger. Isn't yours?