File Systems

For general-purpose OS, the file system is the most visible part, containing files (with metadata) and directories

We’ve talked about how they’re stored, but how are they organized?

To answer our previous question, we need to understand what we really mean by files and directories

Files

Files are collections of related information, either binary (zips, executables, etc.) or alphanumeric (code, text files, etc.)

These have attributes, which give some information

The basic operations are create, read, write, reposition, delete, truncate (erase contents but not metadata) and open/close (bring in/out of memory), with others including read (update file name in metadata) and append (write to end of file)

Any other operations are just a combo of the basic ones above (ex. copy a file really means opening for reading, creating a new one, opening that one for writing, reading the contents for that file, writing its contents into the new file and closing both files)

The OS keeps track of what files are open with a read/write table, including read/write info, position and the process that opened it

Some OSes also used file locking, either with a shared lock or an exclusive lock, so if there’s conflict we can either mandatory lock (prevent processes from sharing like in Windows) or advisory lock (leave it up to the processes, like in Linux)

Some applications identify files with extensions, with the operating system assisting with default applications

file type usual extension function
executable exe, com, bin or none ready-to-run machine-language program
object obj, o compiled, machine language, not linked
source code c, cc, java, pas, asm, a source code in various languages
batch bat, sh commands to the command interpreter
text txt, doc textual data, documents
word processor wp, tex, rtf, doc various word-processor formats
library lib, a, so, dll libraries of routines for programmers
print or view ps, pdf, jpg ASCII or binary file in a format for printing or viewing
archive arc, zip, tar related files grouped into one file, sometimes compressed, for archiving or storage
multimedia mpeg, mov, rm, mp3, avi binary file containing audio or A/V information

Some files are rigid in structure (simple record or complex record), while others are free-form

Access Methods

Files have to be loaded into memory to be used, either by sequential access, direct access or other methods

Sequential access uses read next, write next, reset and skip forward/backward and is the most common method

Direct access reads and writes record in any order with knowledge of block location, with read block N and write block N being the only methods

Other access methods include indexes for quick lookup (used in databases)

Directories

Directories group files together, containing all the filenames and their identifiers, which we then use to look up other file information

2023-04-18_04-59.png

The basic operations include searching for a file, creating a file, deleting a file, listing directories, renming a file and traversing the system

There are organized logically for efficiency (fast finding), naming (easy finding) and grouping (similar files in one directory)

These directories can either be atomic, two-level, a tree or a graph

With a single-level directory, each directory exists side by side, which is easiest but is the least useful since we can’t name efficiently and we can’t group similar files together

2023-04-18_05-01.png

Two level is slightly better, but we have the same problem with grouping

2023-04-18_05-02.png

Trees get to the meat of the problem, starting with a root dir and then stemming out from there, so we can use a flag 0 for a file and a flag 1 for a directory to differentiate the two

2023-04-18_05-03.png

With an acyclic graph, we relax the structure even more by allowing files to exist in multiple directories with pointers called links or shortcuts

2023-04-18_05-04.png

Directory traversal must account for possible cycles, so we must disallow links to directories and limit the number that can be search

Deletes and link adds also need to account for cycles by keeping a count of incoming references and using garbage collection to remove unreferenced files

Regardless of structure, if the directory isn’t specified, all operations occur in the pwd, so deleting the pwd can remove all files and directories in the subtree

Protection

Files need to be protected from not only physical damage but improper access as well, which we can do with authentication (valid user/pass like sudo), authorization (user perms) and encryption (even if the drive is copied, the files should be protected)

We can think about this in accordance to our basic mtehods

Most OSes employ identity-based protection with an access control list (ACL) mapping users to permissions

These can be quite large, so Linux uses classifications such as owner (the user who created the file), group (a set of users who share the file) and others

More complex permissions can be achieved with some tricks

Implementation

The file system handles file structure, allocating/recovering space, tracking locations and interfacing with the OS

Picking a file system depends on user needs, including performance, reliability and extra features, with new ones being created all the time

These systems are made of multiple layers, which reduces complexity while adding overhead

  1. Application programs
  2. Logical file system
  3. File-organization module
  4. Basic file system
  5. I/O control
  6. Devices

The I/O layer contains drivers and interrupt handlers, which controls I/O devices and are given commands like the following

read drive1, cylinder 72, track 2, sector 10, into memory location 1060

This gives low-level hardware specific commands to the hardware controller

The basic file system uses buffers and cache to store data as it’s retrieved or stored, being given more generic commands such as retrieve block 123

The file organization layer understand files, logical addresses and physical blocks, translating between logical and physical while also freeing and allocating space

The logical file system manages metadata, while translating file names into numbers, handles and locating by maintaining file control blocks (inodes in UNIX)

These together provide structures and operations to manage files and directories using several storage or memory structures as well as operations on these structures

Some common ones include the boot control block (contains boot information for an OS, 1 per block), volume control block (contains volume details such as total blocks, total free blocks, block size and free block pointers, 1 per block), directory structures (1 per file system) and FCBs (1 per file, containing metadata)

For in-memory, we have the mount table (file system mount, mount points and file system types), directory structure cache (recently used directories), system-wide open-file table (FCB of each file and other info) and per-process open-file table (pointers to the system-wide open-file table along with other info)

For operations, things get a bit complicated (and I don’t feel like writing)

E.g. To create a new file, a process calls the logical file system layer to create

The logical file system layer will allocate a new FCB, assign the FCB and the new file to a directory in the directory structure

The file organization module layer will find logical blocks for the file

The basic file system layer will map the logical blocks to physical blocks

The I/O control layer will write the data to disk

E.g. To read the new file, a process calls the logical file system layer to open and read

The logical file system layer will search the system-wide open-file table

Update the per-process open-file table to include read location in the file and access mode

E.g. To close the new file, a process calls the logical file system layer to close

The logical file system layer will remove the per-process open-file table

Decrement the system-wide open-file table entry

2023-04-18_05-26_3.png

Directories

The directory structure impacts efficiency, performance and reliability and depends on what we’re doing

We could do a linear list with pointers to data blocks, which is simple to program but is linear time

2023-04-18_05-28.png

Hash tables are another option, which decrease search time but come at the cost of hash collisions

2023-04-18_05-29.png

Allocations

The file system assigns contiguous sectors on the physical disk to a logical block, with files consisting of one or more blocks and free space being all the blocks that aren’t used

The file system needs to keep track of these blocks efficiently, with three main approaches

Contiguous allocation has each file occupying a contiguous set of blocks, with superior performance and simplicity, but makes finding space harder since we need to know the file size ahead of time

2023-04-18_05-33.png

Linked lists can eliminate external fragmentation and allow for dynamic file sizes, but random access is slow and a small portion of the block needs to be reserved for a pointer to the next block

2023-04-18_05-35.png

Indexed allocation gives each file an index block, with the index pointing to all other blocks

This also eliminates external fragmentation and makes random access fast, but if the number of indices exceeds the size of the index block, we will need multiple layers

2023-04-18_05-37.png

Some OSes will combine direct and indirect blocks, so small files will not use an index block, while large files will use multiple layers

Increasing the size of the pointers will raise the maximum file size, so 32-bit pointers can hold up to 4GB, while must UNIX file systems support 64-bit pointers

2023-04-18_05-39.png

2023-04-18_05-39_1.png

If the workload against a file is known, the allocation type can be chosen at file creation, but this usually isn’t the case

Reducing disk movement (discussed in lesson 8) can help, but for NVMs, we need different methods

Free-Space

We need to keep of list of free blocks, either as a bitmap or as a true list

With a bitmap, we can assign each block to a bit, using 0 for in-use and 1 for free

2023-04-18_05-42.png

This makes finding free blocks very simple and is easy to implement, as well as finding contiguous blocks, while also taking up a lot of space (1TB with 4KB blocks requires a 32MB bitmap in memory)

A linked-list makes it easy to find a free block (just look at the head), but traversing the list is costly (we rarely do a full traversal)

Freeing blocks doesn’t require erasing data, so file systems just mark the block as being free, saving time while hurting security (forensics experts ca manually read the disk)

Efficiency

The biggest bottleneck is the disk itself, so we must consider the following

Recovery

Crashes and bugs can create inconsistencies, so we must correct these issues in the file system itself

In the system, we can scan the metadata to check for inconsistencies

A status bit can be used to indicate whether or not the whole system is up to date, so if the system reboots suddenly, the file system is marked as “in flux”, triggering a full consistency check such as fsck in UNIX

We can alsu take a tip from DBs and use log-based transaction-oriented recovery (aka journaliing), which requires maintaining a running log of unconfirmed transactions that must be replayed when an interruption occurs

This log can be stored in a separate section of the file system or a different device, with the log also allowing the OS to write data asynchronously

Backups are also recommended, typically following a schedule like the following

To restore the file system, we can restore from either a full backup or an incremental backup

Internals

General-purpose computers can have multiple storage devices, including USBs, extra drives and so on

These storage devices can be further divided into partitions, assigned to one or more volumes and each of those being formatted into a file system

Some common ones include

Mounting

Systems need to be mounted before they can be used, with the OS being given a name and a mount point, a location in the file structure (usually empty)

If the system is already mounted, the OS needs to either allow or disallow multiple mounts

If the mount point is non-empty, the OS must also decide whether to prevent the action or obscure existing files and directories until the file system is unmounted

2023-04-18_06-01.png

Partitions

Partitions can either be raw (just blocks, like in databases or RAID) or cooked with a file system (bootable, root, etc.)

Perms

User-oriented OSes must accomodate shared files, with permissions being kept in the FCB

When we mount a file system to another OS, we must take care of perms, like in the case fo multiple user IDs across systems

To make this seemless, we can use virtual file systems, separating generic operations fro implementation details

2023-04-18_06-03.png

Remote Systems

To transfer files, we can use ftp like back in the day, or use a distributed file system (DFS) like we do nowadays

DFS uses a file-server protocol, with the client seeking the mount the remote and the sever sharing its local file system, while providing authentication and authorization

These must specify behaviour when there’s multiple users sharing files, which is discussed in more detail in lesson 6

We can either block requests until a file session (between open() and close()) is complete, or share read-only and just use blocking for writes

Network File System

The NFS is a popular remote system model using a similar client-server model and mounts to a remote NFS share, appearing as if it’s part of the local file system

This is designed to work in heterogeneous environments (different OSes, machines, protocols, etc.)

2023-04-18_06-09.png

With this, the client requests a remote directory and the server identifies client perms on the directory, including read-only and read-write

If permission is granted, the mount is recorded in an export table used to track all mounts, revoke access and notify of downtimes, among other things

We also use a set of remote procedure calls (RPCs) to handle searching, rerading a set of entries, changing links, accessing file attributes and reading/writing

Version 3 of this is stateless while version 4 is stateful, requiring additional overhead but being more secure since we don’t send vital info as parameters all the time

Despte the version, writing data to disk requires a lot of safety mechanisms, which can take minutes