File-System: Delayed Allocation, fsync() solution

September 19, 2009 Matteo Bertozzi | Filed Under Unix C

Last week on LWN, Valerie Aurora posted a great article (as always): "POSIX v. reality: A position on O_PONIES" (http://lwn.net/Articles/351422/). Among other things, she explains why fsync() is so expensive:

fsync() is often more expensive than it absolutely needs to be. The easiest way to implement it is to force out every outstanding write to the file system, regardless of whether it is a journaling file system, a COW file system, or a file system with no crash recovery mechanism whatsoever. This is because it is very difficult to map backward from a given file to the dirty file system blocks needing to be written to disk in order to create a consistent file system containing those changes. For example, the block containing the bitmap for newly allocated file data blocks may also have been changed by a later allocation for a different file, which then requires that we also write out the indirect blocks pointing to the data for that second file, which changes another bitmap block… When you solve the problem of tracing specific dependencies of any particular write, you end up with the complexity of soft updates. No surprise then, that most file systems take the brute force approach, with the result that fsync() commonly takes time proportional to all outstanding writes to the file system.
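To make the cost argument concrete, here is a minimal sketch in C of the brute-force approach described above. The dirty list, `mark_dirty()` and `fsync_bruteforce()` are hypothetical names for illustration, not code from any real kernel or from RaleighFS. The point is only that fsync() of one file ends up writing every dirty block in the system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: every dirty block in the file system, tagged with
 * the inode that dirtied it. Shared metadata blocks (bitmaps, indirect
 * blocks) make this tag unreliable as a dependency list. */
#define MAX_DIRTY 64

struct dirty_block {
    unsigned long blocknr;
    unsigned long owner_ino;
    bool dirty;
};

static struct dirty_block dirty_list[MAX_DIRTY];
static size_t ndirty;

static void mark_dirty(unsigned long blocknr, unsigned long ino)
{
    assert(ndirty < MAX_DIRTY);
    dirty_list[ndirty].blocknr = blocknr;
    dirty_list[ndirty].owner_ino = ino;
    dirty_list[ndirty].dirty = true;
    ndirty++;
}

/* Stand-in for the actual disk write. */
static void write_block(struct dirty_block *b)
{
    b->dirty = false;
}

/* Brute-force fsync(): the ino argument is ignored, because working out
 * which blocks this file really depends on is hard. Every outstanding
 * dirty block is written, so the cost is proportional to all pending
 * writes in the file system, not to the size of this file. */
static size_t fsync_bruteforce(unsigned long ino)
{
    size_t written = 0;
    (void)ino;
    for (size_t i = 0; i < ndirty; i++) {
        if (dirty_list[i].dirty) {
            write_block(&dirty_list[i]);
            written++;
        }
    }
    ndirty = 0;   /* everything is clean now */
    return written;
}
```

With four dirty blocks belonging to three different files, `fsync_bruteforce(1)` writes all four, even though only one of them belongs to inode 1.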

After thinking about it for a while… with Delayed Allocation (Allocate-on-Flush) it is not hard to implement an fsync() that flushes just one file. All the new data is held in memory, and the old data stays in its block on disk. So a modification in place means you just have to flush the affected blocks, while an append means you also need to allocate new blocks.
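Here is a minimal sketch of that distinction in C, assuming a toy per-file block cache and a toy free-block bitmap (all names are hypothetical, not RaleighFS code): an overwrite only dirties a block that already has a disk address, while an append leaves the block unallocated until flush time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE   16     /* tiny blocks, for illustration only */
#define MAX_BLOCKS   8      /* per-file cache capacity */
#define DISK_BLOCKS  32
#define UNALLOCATED  (-1L)

/* Hypothetical in-memory file: a list of cached blocks. */
struct cached_block {
    long blocknr;            /* on-disk location, or UNALLOCATED */
    bool dirty;
    char data[BLOCK_SIZE];
};

struct mem_file {
    struct cached_block blocks[MAX_BLOCKS];
    size_t nblocks;
};

static bool disk_used[DISK_BLOCKS];   /* toy free-block bitmap */

static long alloc_block(void)
{
    for (long i = 0; i < DISK_BLOCKS; i++) {
        if (!disk_used[i]) {
            disk_used[i] = true;
            return i;
        }
    }
    return UNALLOCATED;   /* disk full */
}

/* Modification in place: the block already has a home on disk,
 * so we only update and dirty the cached copy. */
static void overwrite_block(struct mem_file *f, size_t idx, const char *buf)
{
    assert(idx < f->nblocks);
    memcpy(f->blocks[idx].data, buf, BLOCK_SIZE);
    f->blocks[idx].dirty = true;
}

/* Append: new data stays in memory and no disk block is chosen yet,
 * so the free-block bitmap is not touched here. */
static void append_block(struct mem_file *f, const char *buf)
{
    assert(f->nblocks < MAX_BLOCKS);
    struct cached_block *b = &f->blocks[f->nblocks++];
    b->blocknr = UNALLOCATED;
    b->dirty = true;
    memcpy(b->data, buf, BLOCK_SIZE);
}

/* Flush: allocation happens only now ("allocate on flush").
 * Returns the number of blocks that had to be allocated. */
static size_t flush_file(struct mem_file *f)
{
    size_t allocated = 0;
    for (size_t i = 0; i < f->nblocks; i++) {
        struct cached_block *b = &f->blocks[i];
        if (!b->dirty)
            continue;
        if (b->blocknr == UNALLOCATED) {
            b->blocknr = alloc_block();
            assert(b->blocknr != UNALLOCATED);
            allocated++;
        }
        /* write_to_disk(b->blocknr, b->data) would go here */
        b->dirty = false;
    }
    return allocated;
}
```

The design choice to note is that `append_block()` never calls `alloc_block()`: until the flush, the on-disk allocation structures are exactly as they were, which is what makes a later per-file flush cheap.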
[Figure: RaleighFS in Memory Structure]
The image above is a little bit old, but it shows the original idea of the RaleighFS in-memory structure. It holds general information such as the super-block, the bad blocks list, the free blocks list, the current cache size, and a few other things. Today, though, I’m focusing on the Block Cache and the Write Items Cache.

When you open a file, its metadata is loaded into memory; then, when you need the file’s content, it is loaded into the Block Cache. Every RaleighFS data block contains the File Key, so you can easily find the blocks with a given key, and you can also reach a file’s blocks by following pointers.
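To illustrate the two lookup paths, here is a hypothetical Block Cache in C in which every cached entry stores its file key and a pointer to the next cached block of the same file. The layout and the function names are assumptions made for the example, not the actual RaleighFS structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS 16

/* Hypothetical Block Cache entry: every cached data block carries the
 * key of the file it belongs to, plus a link to the file's next block. */
struct cache_entry {
    bool     used;
    uint64_t file_key;
    uint64_t blocknr;
    bool     dirty;
    struct cache_entry *next_in_file;   /* per-file chain */
};

static struct cache_entry block_cache[CACHE_SLOTS];

/* Put a block in a free slot and link it after prev (if any). */
static struct cache_entry *cache_insert(uint64_t key, uint64_t blocknr,
                                        struct cache_entry *prev)
{
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        struct cache_entry *e = &block_cache[i];
        if (!e->used) {
            *e = (struct cache_entry){ .used = true, .file_key = key,
                                       .blocknr = blocknr };
            if (prev != NULL)
                prev->next_in_file = e;
            return e;
        }
    }
    return NULL;   /* cache full: a real cache would evict something */
}

/* Lookup by key: scan the whole cache for entries with this File Key. */
static size_t find_by_key(uint64_t key, struct cache_entry **out, size_t max)
{
    size_t n = 0;
    for (size_t i = 0; i < CACHE_SLOTS && n < max; i++) {
        if (block_cache[i].used && block_cache[i].file_key == key)
            out[n++] = &block_cache[i];
    }
    return n;
}

/* Lookup by pointers: start from the file's first cached block
 * (kept with the in-memory metadata) and follow the chain. */
static size_t count_by_chain(const struct cache_entry *head)
{
    size_t n = 0;
    for (const struct cache_entry *e = head; e != NULL; e = e->next_in_file) {
        assert(e->used);
        n++;
    }
    return n;
}
```

Both paths find the same blocks; the key scan needs no per-file state, while the chain avoids touching other files’ entries.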

So, here is why it is easy to fsync() only the specified file with Delayed Allocation:

  • Modification in place requires just a scan of the Block Cache to find which blocks need to be flushed (plus, obviously, the metadata).
  • Append to file: all the new data is in memory, and there are no modifications to the B*Tree(s) or the free blocks list until you flush something.
  • Remove is just a queued command; when fsync() is called, all the pending delete operations on the B*Tree(s) and lists take place.
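Putting the three cases together, here is a minimal sketch in C of a per-file fsync() over a Block Cache with delayed allocation. It assumes a toy free-blocks bitmap standing in for the real free blocks list and B*Tree(s), and a simple queue of remove commands; every name is hypothetical and this is not RaleighFS’s actual implementation. Queued removes for the file are replayed first, then only that file’s dirty blocks are written, with disk blocks allocated at that moment for appended data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS  16
#define MAX_PENDING  8
#define DISK_BLOCKS  64
#define UNALLOCATED  UINT64_MAX

/* Hypothetical cached block: file key, disk address (or UNALLOCATED
 * for appended data that has not been placed yet) and a dirty flag. */
struct cache_entry {
    bool     used;
    uint64_t file_key;
    uint64_t blocknr;
    bool     dirty;
};

/* A remove is only recorded as a command; nothing on disk changes yet. */
struct remove_cmd {
    uint64_t file_key;
    bool     pending;
};

static struct cache_entry cache[CACHE_SLOTS];
static struct remove_cmd  pending[MAX_PENDING];
static bool disk_used[DISK_BLOCKS];      /* toy free-blocks bitmap */

static uint64_t alloc_block(void)
{
    for (uint64_t i = 0; i < DISK_BLOCKS; i++) {
        if (!disk_used[i]) {
            disk_used[i] = true;
            return i;
        }
    }
    return UNALLOCATED;
}

static bool queue_remove(uint64_t key)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!pending[i].pending) {
            pending[i] = (struct remove_cmd){ key, true };
            return true;
        }
    }
    return false;   /* queue full: caller would have to flush first */
}

/* fsync() for one file only. Dirty blocks of other files stay dirty.
 * Returns the number of blocks written. */
static size_t fsync_file(uint64_t key)
{
    size_t written = 0;

    /* Remove: replay queued commands, dropping cached blocks and
     * releasing their disk blocks in the bitmap. */
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!pending[i].pending || pending[i].file_key != key)
            continue;
        for (size_t j = 0; j < CACHE_SLOTS; j++) {
            struct cache_entry *e = &cache[j];
            if (e->used && e->file_key == key) {
                if (e->blocknr != UNALLOCATED)
                    disk_used[e->blocknr] = false;
                e->used = false;
            }
        }
        pending[i].pending = false;
    }

    /* Modification in place and append: scan the cache for this key. */
    for (size_t i = 0; i < CACHE_SLOTS; i++) {
        struct cache_entry *e = &cache[i];
        if (!e->used || e->file_key != key || !e->dirty)
            continue;
        if (e->blocknr == UNALLOCATED) {
            e->blocknr = alloc_block();     /* allocate on flush */
            assert(e->blocknr != UNALLOCATED);
        }
        /* write_to_disk(e->blocknr, ...) would go here */
        e->dirty = false;
        written++;
    }
    return written;
}
```

With file 1 holding one in-place block and one appended block, `fsync_file(1)` writes both and leaves file 2’s dirty block alone; after `queue_remove(1)`, the next `fsync_file(1)` only releases file 1’s blocks in the bitmap and writes nothing for it.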

But remember: syncing just one file is not a good idea. Trust your File-System’s flush policy!
