File systems are an integral part of any operating system with the capacity for long-term storage. A file system has two distinct parts: the mechanism for storing files and the directory structure into which they are organised. In modern operating systems, where several users can access the same files simultaneously, it has also become necessary to implement features such as access control and other forms of file protection.
A file is a collection of binary data. A file could represent a program, a document or, in some cases, part of the file system itself. In modern computing it is quite common for there to be several different storage devices attached to the same computer. A common structure such as a file system allows the computer to access many different storage devices in the same way. For example, when you look at the contents of a hard drive or a CD you view them through the same interface, even though they are completely different media with data mapped onto them in completely different ways. Files can have very different data structures within them but can all be accessed by the same methods built into the file system. The arrangement of data within a file is decided by the program that creates it. The file system also stores a number of attributes for the files within it.
All files have a name by which they can be accessed by the user. In most modern file systems the name consists of three parts: a unique name, a period and an extension. For example, the file ‘bob.jpg’ is uniquely identified by the first word, ‘bob’, and the extension ‘jpg’ indicates that it is a JPEG image file. The file extension allows the operating system to decide what to do with the file if someone tries to open it. The operating system maintains a list of file extension associations. Should a user try to open ‘bob.jpg’, it would most likely be opened in whatever the system's default image viewer is.
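The extension lookup described above can be sketched as a small table. This is only an illustration: the table contents and the function name default_application are assumptions made for the example, not the behaviour of any particular operating system.

```python
import os.path

# Hypothetical association table: extension -> name of the default application.
ASSOCIATIONS = {
    ".jpg": "image viewer",
    ".txt": "text editor",
    ".html": "web browser",
}


def default_application(filename):
    """Return the application associated with the file's extension, or None."""
    _, extension = os.path.splitext(filename)   # "bob.jpg" -> ("bob", ".jpg")
    return ASSOCIATIONS.get(extension.lower())


print(default_application("bob.jpg"))    # image viewer
print(default_application("notes.xyz"))  # None
```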
The system also stores the location of a file. In some file systems a file can only be stored as one contiguous block. This simplifies storage and access, as the system only needs to know where the file begins on the disk and how large it is. It does, however, lead to complications when files are extended or removed: there may not be enough free space directly after a file to fit its larger version, and removed files leave gaps that may be too small to reuse. Most modern file systems overcome this problem by using linked file allocation. This allows the file to be stored in any number of segments. The file system then has to store where every segment of the file is and how large it is. This greatly simplifies space allocation but is slower than contiguous allocation, as the file may be spread out all over the disk. Modern operating systems mitigate this flaw by providing a disk defragmenter, a utility that rearranges the files on the disk so that each one occupies contiguous blocks.
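A minimal sketch of linked allocation, assuming a toy disk in which every block records which block follows it. The class and method names are invented for the illustration; real file systems store this information in their own on-disk formats.

```python
class LinkedDisk:
    """Toy disk: next_block[i] is the block that follows block i in its file."""

    FREE = -2  # marker for an unused block
    END = -1   # marker for the last block of a file

    def __init__(self, block_count):
        self.next_block = [self.FREE] * block_count

    def allocate(self, blocks_needed):
        """Chain together any free blocks and return the first block's index."""
        if blocks_needed < 1:
            raise ValueError("a file needs at least one block in this sketch")
        free = [i for i, state in enumerate(self.next_block) if state == self.FREE]
        if len(free) < blocks_needed:
            raise OSError("not enough free space")
        chosen = free[:blocks_needed]
        for current, following in zip(chosen, chosen[1:]):
            self.next_block[current] = following
        self.next_block[chosen[-1]] = self.END
        return chosen[0]

    def blocks_of(self, first_block):
        """Follow the chain from the first block to the end-of-file marker."""
        chain = []
        block = first_block
        while block != self.END:
            chain.append(block)
            block = self.next_block[block]
        return chain

    def free(self, first_block):
        """Mark every block of the file as unused."""
        for block in self.blocks_of(first_block):
            self.next_block[block] = self.FREE


disk = LinkedDisk(6)
a = disk.allocate(2)       # takes blocks 0 and 1
b = disk.allocate(2)       # takes blocks 2 and 3
disk.free(a)               # leaves a two-block gap at the start of the disk
c = disk.allocate(3)       # does not fit in the gap, so it is split
print(disk.blocks_of(c))   # [0, 1, 4]
```

The last allocation shows both sides of the trade-off: the file fits even though no three adjacent blocks are free, but its blocks are no longer next to each other on the disk.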
Information about a file's protection is also integrated into the file system. Protection can range from the simple scheme implemented in the FAT system of early versions of Windows, where files could be marked as read-only or hidden, to the more secure scheme implemented in NTFS, where the file system administrator can set up separate read and write access rights for different users or user groups. Although file protection adds a great deal of complexity and potential difficulties, it is essential in an environment where many different computers or users can have access to the same drives via a network or a time-shared system such as raptor.
Some file systems also store data about which user created a file and at what time they created it. Although this is not essential to the running of the file system it is useful to the users of the system.
In order for a file system to function properly it needs a number of defined operations for creating, opening and editing a file. Almost all file systems provide the same basic set of methods for manipulating files.
A file system must be able to create a file. To do this there must be enough space left on the drive to fit the file. There must also be no other file with the same name in the directory in which it is to be placed. Once the file is created the system will make a record of all the attributes noted above.
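The two checks and the attribute record can be sketched as follows. This is a toy model under stated assumptions: the class ToyFileSystem, its block counter and the attribute names are all invented for the example.

```python
import time


class ToyFileSystem:
    """Tracks free space and per-directory entries; stores no file data."""

    def __init__(self, free_blocks):
        self.free_blocks = free_blocks   # number of unused blocks on the drive
        self.directories = {"/": {}}     # directory path -> {file name: attributes}

    def create(self, directory, name, blocks_needed, owner):
        entries = self.directories[directory]
        if name in entries:
            # No two files in one directory may share a name.
            raise FileExistsError(name)
        if blocks_needed > self.free_blocks:
            # The drive must have room for the new file.
            raise OSError("not enough free space")
        self.free_blocks -= blocks_needed
        entries[name] = {
            "size_in_blocks": blocks_needed,
            "owner": owner,
            "created": time.time(),
            "read_only": False,
        }
        return entries[name]


fs = ToyFileSystem(free_blocks=10)
fs.create("/", "bob.jpg", blocks_needed=4, owner="sam")
print(fs.free_blocks)  # 6
```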
Once a file has been created we may need to edit it. This may be simply appending some data to the end of it or removing or replacing data already stored within it. When doing this the system keeps a write pointer marking where the next write operation to the file should take place.
In order for a file to be useful it must of course be readable. To read a file, all you need to know is its name and path. From this the file system can ascertain where on the drive the file is stored. While reading a file the system keeps a read pointer, which records which part of the file is to be read next.
In some cases it is not possible to simply read all of the file into memory. File systems therefore also allow you to reposition the read pointer within a file. To perform this operation the system needs to know how far into the file you want the read pointer to jump. An example of where this would be useful is a database system. When a query is made on the database it is inefficient to read the whole file up to the point where the required data is. Instead, the application managing the database would determine where in the file the required data is and jump to it. This operation is often known as a file seek.
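As an illustration of a seek, the sketch below stores fixed-size records and jumps straight to one of them. The record layout (a 4-byte id followed by a 16-byte name) and the helper names are assumptions chosen to keep the example short.

```python
import struct
import tempfile

# Hypothetical record layout: 4-byte unsigned id, then a 16-byte name field.
RECORD = struct.Struct("<I16s")


def write_records(f, rows):
    """Append each (id, name) pair to the file as one fixed-size record."""
    for record_id, name in rows:
        f.write(RECORD.pack(record_id, name.encode("ascii")))


def read_record(f, index):
    """Move the read pointer to record number index and read only that record."""
    f.seek(index * RECORD.size)
    record_id, raw_name = RECORD.unpack(f.read(RECORD.size))
    return record_id, raw_name.rstrip(b"\x00").decode("ascii")


with tempfile.TemporaryFile() as f:
    write_records(f, [(1, "alice"), (2, "bob"), (3, "carol")])
    print(read_record(f, 2))  # (3, 'carol'), without reading the first two records
```

Because every record has the same size, the position of record n is simply n times the record size, which is what makes the jump possible without scanning.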
File systems also allow you to delete files. To do this the system needs to know the name and path of the file. To delete a file the system simply removes its entry from the directory structure and adds all the space it previously occupied to the free space list (or whatever other free space management system it uses).
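The delete operation can be sketched in a few lines, assuming a directory held as a dictionary and a free space list of block numbers. Both structures, and the name delete_file, are simplifications invented for the example.

```python
def delete_file(directory, free_list, name):
    """Drop the directory entry for name and return its blocks to the free list."""
    entry = directory.pop(name)          # raises KeyError if there is no such file
    free_list.extend(entry["blocks"])    # the data itself is not overwritten here
    free_list.sort()
    return entry


# Hypothetical state: one directory and a list of unused block numbers.
directory = {"bob.jpg": {"blocks": [3, 4, 9]}, "notes.txt": {"blocks": [5]}}
free_list = [0, 1, 2, 6, 7, 8]

delete_file(directory, free_list, "bob.jpg")
print(sorted(directory))  # ['notes.txt']
print(free_list)          # [0, 1, 2, 3, 4, 6, 7, 8, 9]
```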
These are the most basic operations required by a file system to function properly. They are present in all modern computer file systems but the way they function may vary. For example, performing the delete operation in a modern file system like NTFS, which has file protection built into it, would be more complicated than the same operation in an older file system like FAT. Both systems would first check whether the file was in use before continuing; NTFS would then have to check whether the user deleting the file has permission to do so. Some file systems also allow multiple people to open the same file simultaneously and have to decide whether users have permission to write a file back to the disk if other users currently have it open. If two users have read and write permission to a file, should one be allowed to overwrite it while the other still has it open? Or if one user has read-write permission and another only has read permission on a file, should the user with write permission be allowed to overwrite it if there's no chance of the other user also trying to do so?
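The two-stage check can be sketched as below. This is not how NTFS or FAT is implemented; the FileEntry structure, the rights table and the check_permissions switch are assumptions used only to contrast a system with protection against one without it.

```python
from dataclasses import dataclass, field


@dataclass
class FileEntry:
    name: str
    # Hypothetical access list: user name -> set of rights such as {"read", "write"}.
    rights: dict = field(default_factory=dict)
    # Users who currently have the file open.
    open_by: set = field(default_factory=set)


def may_delete(entry, user, check_permissions=True):
    """Return True if this sketch would let the user delete the file."""
    if entry.open_by:
        return False   # both kinds of system refuse while the file is in use
    if check_permissions and "write" not in entry.rights.get(user, set()):
        return False   # only a system with file protection performs this step
    return True


photo = FileEntry("bob.jpg", rights={"sam": {"read", "write"}, "guest": {"read"}})
print(may_delete(photo, "sam"))                             # True
print(may_delete(photo, "guest"))                           # False
print(may_delete(photo, "guest", check_permissions=False))  # True
```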
Different file systems also support different access methods. The simplest method of accessing information in a file is sequential access, where the information is accessed from the beginning, one record at a time. To change position, the file can be rewound or forwarded a number of records, or reset to the beginning. This access method is based on file storage systems for tape drives but works as well on sequential access devices (like modern DAT tape drives) as it does on random-access ones (like hard drives). Although this method is very simple in its operation and ideally suited to certain tasks such as playing media, it is very inefficient for more complex tasks such as database management. A more modern approach that better suits reads that are not likely to be sequential is direct access. Direct access allows records to be read or overwritten in any order the application requires. This suits modern hard drives, as they too allow any part of the drive to be read in any order with little reduction in transfer rate. Direct access is better suited to most applications than sequential access, as it is designed around the most common storage medium in use today rather than one that is now rarely used except for large offline backups. It is also possible to build other access methods on top of direct access, such as sequential access, or an index of all the records in a file that speeds up finding data.
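A minimal sketch of an index built on top of direct access, assuming one comma-separated record per line with the key in the first field. The record format and the function names are invented for the illustration.

```python
import io


def build_index(f):
    """Scan the file once sequentially, recording where each record starts."""
    index = {}
    f.seek(0)
    while True:
        offset = f.tell()
        line = f.readline()
        if not line:
            break
        key = line.split(b",", 1)[0].decode("ascii")
        index[key] = offset
    return index


def lookup(f, index, key):
    """Jump directly to the record for key instead of scanning from the start."""
    f.seek(index[key])
    return f.readline().decode("ascii").rstrip("\n")


# An in-memory file stands in for a file on disk.
data = io.BytesIO(b"bob,jpeg image\ncarol,text document\nalice,spreadsheet\n")
index = build_index(data)
print(lookup(data, index, "alice"))  # alice,spreadsheet
```

The index is built with one sequential pass, after which each lookup is a single seek and read, regardless of where the record sits in the file.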
On top of storing and managing files on a drive, the file system also maintains a system of directories in which the files are referenced. Modern hard drives store hundreds of gigabytes. The file system helps organise this data by dividing it up into directories. A directory can contain files or more directories. As with files, there are several basic operations that a file system needs to be able to perform on its directory structure to function properly.
It needs to be able to create a file. This is also covered by the overview of operations on a file, but as well as creating the file itself the system needs to add it to the directory structure.
When a file is deleted the space taken up by the file needs to be marked as free space. The file itself also needs to be removed from the directory structure.
Files may need to be renamed. This requires an alteration to the directory structure but the file itself remains unchanged.
It must be possible to list a directory. In order to use the disk properly the user needs to know what is in all the directories stored on it. On top of this the user needs to be able to browse through the directories on the hard drive.
Since the first directory structures were designed they have gone through several large evolutions. Before directory structures were applied to file systems all files were stored on the same level. This is basically a system with one directory in which all the files are kept. The next advancement, which would be considered the first directory structure, is the two-level directory. In this there is a single list of directories which are all on the same level. The files are then stored in these directories. This allows different users and applications to store their files separately. After this came the first directory structures as we know them today, directory trees. Tree-structured directories improve on two-level directories by allowing directories as well as files to be stored in directories. All modern file systems use tree-structured directories, but many have additional features such as security built on top of them.
Protection can be implemented in many ways. Some file systems allow you to have password-protected directories. In this system the file system won't allow you to access a directory before it is given a username and password for it. Others extend this system by giving different users or groups access permissions. The operating system requires the user to log in before using the computer and then restricts their access to areas they don't have permission for. The system used by the computer science department for storage space and coursework submission on raptor is a good example of this. In a file system like NTFS all types of storage space, network access and use of devices such as printers can be controlled in this way. Other types of access control can also be implemented outside of the file system. For example, applications such as WinZip allow you to password-protect files.
There are many different file systems currently available to us on many different platforms, and depending on the type of application and size of drive, different situations suit different file systems. If you were to design a file system for a tape backup system then a sequential access method would be better suited than a direct access method given the constraints of the hardware. Also, if you had a small hard drive on a home computer then there would be no real advantage in using a more complex file system with features such as protection, as it isn’t likely to be needed. If I were to design a file system for a 10 gigabyte drive I would use linked allocation over contiguous to make the most efficient use of the drive space and limit the time needed to maintain the drive. I would also design a direct access method over a sequential access one to make the most of the strengths of the hardware. The directory structure would be tree based to allow better organisation of information on the drive, and would allow for acyclic-graph directories, in which a file or directory can appear in more than one place, to make it easier for several users to work on the same project. It would also have a file protection system that allowed for different access rights for different groups of users, and password protection on directories and individual files.
Several file systems that already implement the features I’ve described above as ideal for a 10 gigabyte hard drive are currently available. These include NTFS for the Windows NT and XP operating systems and ext2, which is used in Linux.
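A tree of directories can be sketched with nested dictionaries, where a dictionary stands for a directory and any other value stands for a file. The layout and the function name resolve are assumptions made for the example.

```python
def resolve(root, path):
    """Walk the tree one path component at a time and return what is found."""
    node = root
    for part in path.strip("/").split("/"):
        if not part:
            continue                        # handles "/" and doubled slashes
        if not isinstance(node, dict):
            raise NotADirectoryError(path)  # tried to descend into a file
        node = node[part]                   # raises KeyError if the name is missing
    return node


# Hypothetical tree: directories may contain files or further directories.
root = {
    "home": {
        "sam": {"bob.jpg": "image data", "notes": {"todo.txt": "text data"}},
    },
    "tmp": {},
}

print(resolve(root, "/home/sam/bob.jpg"))      # image data
print(sorted(resolve(root, "/home/sam")))      # ['bob.jpg', 'notes']
```

A single-level system would correspond to one flat dictionary, and a two-level system to a dictionary of flat dictionaries; the tree simply removes the limit on nesting depth.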
Sam Harnett MSc mBCS