Unix & Linux: how the file system behind Android stores data

First of all, what exactly is UNIX?

UNIX is a multi-user, multi-tasking operating system initially developed in the late 1960s at AT&T Bell Labs. Most of UNIX is written in the C programming language, which makes it portable across a wide range of computer architectures. Vendors such as SCO, SGI, IBM, Hewlett Packard and Sun each provide their own versions of UNIX to run on their high-end servers.

Linux explained

Linux, sometimes called GNU/Linux, is a free, open source, Unix-like operating system. The GNU project began in 1984 with the aim of creating a free version of UNIX. The project, however, lacked a fully functioning kernel until 1991, when Linus Torvalds released a third-party kernel called Linux. The Linux kernel is normally distributed in combination with various packages from the GNU project and other sources.

Getting to know their file systems

Fast File System, EXT 2, 3 & 4

The EXT 2 file system (also known as the second extended file system) was designed for the Linux platform and released in 1993. It has since been superseded by EXT 3, which added a few new features, the most notable being journaling. EXT 4 is currently the default file system for most of the popular Linux distributions.

Like many other UNIX file systems, the main structure is very similar to that of the original UNIX Fast File System (FFS). The partition is split up into Cylinder Groups (EXT calls them block groups), and originally each of these groups contained a Superblock, Group Table, Data Bitmap, Inode Bitmap, Inodes and finally Data. However, some versions of EXT 2 and EXT 3 have Sparse Cylinder Groups, in which only a subset of the groups keep a backup copy of the Superblock and Group Table.

[Image: Fast File System]

The EXT 2, 3 & 4 file systems have a fixed number of inodes, and these are mapped out on the partition by the Superblocks and Group Tables. Inodes are used to represent both files and directories, and each one contains:

  • File type
  • Access rights
  • Owners
  • Timestamps
  • Size
  • Data block pointers
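
The fields listed above can be pictured as a simple record. The following is only an illustrative sketch in Python, not the real on-disk layout: the field names, the integer types and the example values are assumptions, and the fifteen pointer slots follow the twelve direct plus three indirect pointers used by EXT 2 and EXT 3.

```python
from dataclasses import dataclass, field
from typing import List

DIRECT_POINTERS = 12    # slots that address data blocks directly
INDIRECT_POINTERS = 3   # single, double and triple indirect slots


@dataclass
class Inode:
    """Toy model of the metadata an inode holds (hypothetical field names)."""
    file_type: str       # e.g. "file" or "directory"
    access_rights: int   # permission bits, e.g. 0o644
    owner_uid: int       # owning user
    owner_gid: int       # owning group
    modified_at: int     # one of several timestamps, seconds since the epoch
    size: int            # file size in bytes
    block_pointers: List[int] = field(
        default_factory=lambda: [0] * (DIRECT_POINTERS + INDIRECT_POINTERS)
    )


# A small file whose data sits in two blocks (block numbers are made up).
readme = Inode(file_type="file", access_rights=0o644, owner_uid=1000,
               owner_gid=1000, modified_at=0, size=5000)
readme.block_pointers[0] = 8193
readme.block_pointers[1] = 8194
```

Note that the record has no name field: in these file systems the file name lives in the directory entry that points at the inode, not in the inode itself.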

When the Superblock or Group Table in one Cylinder Group is damaged, it can usually be replaced by the backup copy held in another Cylinder Group, protecting the data.

[Image: EXT 2, 3 & 4 file systems]

As you can see in the image above, in EXT 3 the data block pointers are the part of the inode that addresses the file data on the drive.

The first twelve point directly to the physical blocks containing the data; the last three point to the data blocks indirectly (single, double and triple indirect pointers). The single indirect pointer contains the address of a block of direct pointers, as shown in the diagram; the double indirect pointer points to a block of single indirect pointers; and, logically, the triple indirect pointer points to a block of double indirect pointers. This can be difficult to visualise, but each step of indirection multiplies the amount of data that can be addressed by the number of pointers that fit in one block. It is worth noting that recovery from EXT 3 is very difficult, and results tend not to be as good as those achieved with other file systems.
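
To make the effect of indirection concrete, here is a rough sketch in Python that counts how many data blocks the twelve direct and three indirect pointers can reach. The block size and the 4-byte pointer size are assumptions chosen for illustration, the function names are hypothetical, and the result only reflects the pointer structure; real file systems impose other limits as well.

```python
def addressable_blocks(block_size=4096, pointer_size=4, direct=12):
    """Data blocks reachable via direct, single, double and triple indirection."""
    ptrs = block_size // pointer_size   # pointers that fit in one block
    return direct + ptrs + ptrs ** 2 + ptrs ** 3


def pointer_level(logical_block, block_size=4096, pointer_size=4, direct=12):
    """Name the kind of pointer that leads to the given block of a file."""
    ptrs = block_size // pointer_size
    if logical_block < 0:
        raise ValueError("block index must not be negative")
    if logical_block < direct:
        return "direct"
    logical_block -= direct
    for depth, name in enumerate(("single", "double", "triple"), start=1):
        if logical_block < ptrs ** depth:
            return name + " indirect"
        logical_block -= ptrs ** depth
    raise ValueError("block index is beyond the triple indirect range")


blocks = addressable_blocks()           # 4 KiB blocks, 4-byte pointers
max_bytes = blocks * 4096
print(blocks, max_bytes)
print(pointer_level(5), pointer_level(12), pointer_level(12 + 1024))
```

With these assumed sizes a block holds 1,024 pointers, so the thirteenth block of a file (index 12) is the first one that needs the single indirect pointer, and index 1,036 is the first that needs the double indirect pointer.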

In EXT 4 the data structure remains very similar to EXT 3. The main difference is that Extents, each of which describes a contiguous run of blocks, have replaced the direct and indirect data block pointers, significantly improving large file performance. When data is deleted in EXT 4, recovery tends to be more successful because the Extents themselves aren’t deleted, making it possible to rebuild the lost file.
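
The saving that extents offer over per-block pointers can be sketched as follows. This is a simplified illustration in Python, not the EXT 4 on-disk format: an extent is modelled here as a plain (start block, length) pair, and the function name and block numbers are made up.

```python
def blocks_to_extents(blocks):
    """Collapse a list of block numbers into (start, length) runs."""
    extents = []
    for block in blocks:
        if extents and extents[-1][0] + extents[-1][1] == block:
            # The block continues the previous run, so just lengthen it.
            start, length = extents[-1]
            extents[-1] = (start, length + 1)
        else:
            # The block starts a new run.
            extents.append((block, 1))
    return extents


# Six block pointers shrink to two extents.
file_blocks = [100, 101, 102, 103, 200, 201]
print(blocks_to_extents(file_blocks))   # [(100, 4), (200, 2)]
```

A large, unfragmented file needs only a handful of such pairs, whereas the pointer scheme needs one entry per block plus the indirect blocks that hold them.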

XFS

XFS was originally developed by SGI in 1993 to overcome some of the performance and scaling limitations of FFS. It was first released in 1994 with IRIX v5.3. In 2000 SGI released the code as open source, and it was officially included in the Linux kernel from 2003. At first glance, the XFS structure is very similar to that of FFS. It keeps the cylinder group system of splitting the partition but calls the groups allocation groups; it also has superblocks and uses inodes to hold the file metadata. However, this is where the similarities end.

Unlike FFS, XFS does not have a fixed number of inodes pre-allocated on the drive. Instead, each allocation group monitors its own free space and dynamically allocates inodes as the file system requires them. These inodes are organised in a balanced B+ tree, which makes traversing the directory structure much quicker than the traditional list system implemented in FFS. However, to maintain high performance the B+ tree must remain balanced as more inodes are allocated, and this requires a relatively advanced algorithm. XFS inodes also use extents (run lists) to address data instead of addressing individual data blocks as FFS does, as this normally scales better for large files.

XFS also includes journaling to offer file system recoverability after system crashes and power failures. However, XFS journals only the file system metadata, so while the volume can be repaired and mounted, user data can still be lost.

Another feature of XFS is delayed allocation, a method of postponing the allocation of blocks for file data while the data is cached in memory. The blocks are only allocated, and the data written to the file system, when the cache is flushed by the operating system. The main advantages of this approach are that it can often dramatically reduce fragmentation, especially with files that expand slowly, and it often reduces CPU load.
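
The fragmentation benefit can be seen in a toy simulation. The Python sketch below is an assumption-laden illustration rather than how XFS is implemented: the `Disk` class is a hypothetical allocator that simply hands out the next free blocks, and two files, "a" and "b", grow in small alternating writes.

```python
class Disk:
    """Hypothetical allocator that hands out the next free block numbers."""

    def __init__(self):
        self.next_free = 0

    def allocate(self, count):
        start = self.next_free
        self.next_free += count
        return list(range(start, start + count))


def immediate_allocation(writes):
    """Allocate blocks at the moment of each write."""
    disk, layout = Disk(), {}
    for name, block_count in writes:
        layout.setdefault(name, []).extend(disk.allocate(block_count))
    return layout


def delayed_allocation(writes):
    """Only count cached blocks per file, then allocate once at flush time."""
    disk, pending = Disk(), {}
    for name, block_count in writes:
        pending[name] = pending.get(name, 0) + block_count
    return {name: disk.allocate(count) for name, count in pending.items()}


def fragments(blocks):
    """Number of contiguous runs that a file's blocks form."""
    return 1 + sum(1 for a, b in zip(blocks, blocks[1:]) if b != a + 1)


writes = [("a", 1), ("b", 1), ("a", 1), ("b", 1)]
now = immediate_allocation(writes)     # a -> [0, 2], b -> [1, 3]
later = delayed_allocation(writes)     # a -> [0, 1], b -> [2, 3]
print(fragments(now["a"]), fragments(later["a"]))   # 2 1
```

Because the delayed variant knows the final size of each file before it touches the allocator, each file ends up in a single contiguous run.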

JFS (Journaled File System)

In 1990, IBM first released JFS with AIX version 3.1. In 1999, IBM ported it to OS/2 and released a version of JFS to the open source community, and by 2006 there was a stable version for Linux.

The design philosophy behind JFS is comparable to that of XFS, and the two overcome many of the performance limitations of FFS in similar ways, even though the final implementations are different. They both use metadata journaling to offer file system recoverability, dynamically allocated inodes, extents to address the data area and B+ trees to traverse directories.

LVM (Logical Volume Management)

Logical Volume Management is a method of overcoming some of the limitations of using traditional partition methods to allocate storage space on media. Commonly included features are:

  • File system spanning and software RAIDs (Level 0, 1 & 5)
  • Resizing volume groups & logical volumes
  • Snapshots

Conventionally, the space on hard drives is split up into partitions, to which file systems are written directly. LVM works a bit differently: the disks are still divided into partitions, but LVM treats these as physical volumes. These physical volumes are then pooled together, either as a RAID or through spanning, to form a Volume Group. Space from the Volume Group can then be allocated to form Logical Volumes, on which the file systems actually reside. The image below shows a relatively simple example of how an LVM might be used.

[Image: Logical Volume Management]
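
The pooling idea can also be sketched in a few lines of Python. This is a hypothetical model, not an interface to any real LVM tool: it covers simple spanning only (no RAID), sizes are in GiB, and the class, method and volume names are invented for the example.

```python
class VolumeGroup:
    """Pools physical volumes and carves logical volumes out of the pool."""

    def __init__(self, physical_volumes):
        self.physical_volumes = dict(physical_volumes)   # name -> size in GiB
        self.logical_volumes = {}                        # name -> size in GiB

    @property
    def size_gib(self):
        return sum(self.physical_volumes.values())

    @property
    def free_gib(self):
        return self.size_gib - sum(self.logical_volumes.values())

    def create_lv(self, name, size_gib):
        if name in self.logical_volumes:
            raise ValueError("logical volume already exists: " + name)
        if size_gib > self.free_gib:
            raise ValueError("not enough free space in the volume group")
        self.logical_volumes[name] = size_gib

    def extend_lv(self, name, extra_gib):
        if extra_gib > self.free_gib:
            raise ValueError("not enough free space in the volume group")
        self.logical_volumes[name] += extra_gib


# Two 500 GiB partitions pooled into one group.
vg = VolumeGroup({"disk1-part1": 500, "disk2-part1": 500})
vg.create_lv("home", 700)   # larger than either physical volume on its own
vg.create_lv("var", 200)
vg.extend_lv("var", 50)     # resized later without repartitioning
print(vg.size_gib, vg.free_gib)
```

The "home" volume is bigger than either partition, which is the point of spanning: the file system sees one logical volume and does not need to know which physical volume holds a given block.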

Modern versions of UNIX each have their own variations of LVM and depending on the vendor they have different names and feature sets.

Linux also has an LVM, which was originally based on the Hewlett Packard UNIX version. One notable feature missing from both the HP LVM and the original Linux LVM is parity fault tolerance, hence no software RAID 5 support; later releases of the Linux LVM (LVM2) added RAID 5 logical volumes.

Windows 2000, 2003, XP & Vista have an equivalent system called Logical Disk Manager, which provides similar functionality.

The pros and cons

Though UNIX and Linux are not as popular as other operating systems, they do offer some great advantages for very specific needs.

Linux may not be as user friendly as other systems (we’re looking at you, Windows and Mac), but people with some knowledge of programming can leverage this open source system and adapt it to very specific needs. The system runs very quickly and is well suited to production workloads.

On the other hand, UNIX is more often seen where high-end processing is required. Some examples of this are nuclear machines, power plants and military systems. It’s not the most user friendly system, but it is extremely efficient when the main concern is very high processing power.

Is recovery the same in these systems?

Whether you work with Linux or UNIX, it is important to remember that these systems present their own challenges should you lose data.

Fortunately, in the case of a physical failure, recovery success is just as high as with any other operating system. However, once you experience a logical failure, there are some differences worth being aware of.

Should you erroneously format your volume, recovery will be more challenging. The raw data may still be there, but all the structures and inodes will have disappeared.

In the case of an accidental deletion, it is still possible to get the data back. In the case of EXT 4, for example, the Extents aren’t deleted so it is possible to rebuild the file.

So remember – be very careful before running a remove command. Make sure you’re removing only what you want to, and be extremely careful when choosing to format a drive.
