For a general-purpose OS, the file system is the most visible part, containing files (with metadata) and directories
We’ve talked about how they’re stored, but how are they organized?
To answer our previous question, we need to understand what we really mean by files and directories
Files are collections of related information, either binary (zips, executables, etc.) or alphanumeric (code, text files, etc.)
These have attributes (metadata) such as name, identifier, type, location, size, protection and timestamps
The basic operations are create, read, write, reposition, delete, truncate (erase contents but not metadata) and open/close (bring in/out of memory), with others including rename (update file name in metadata) and append (write to end of file)
Any other operations are just a combo of the basic ones above (ex. copying a file really means opening the original for reading, creating a new file, opening it for writing, reading the original’s contents, writing them into the new file and closing both files)
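That composition maps directly onto POSIX calls; a minimal sketch (error handling trimmed for brevity):

```c
#include <fcntl.h>
#include <unistd.h>

/* "Copy" is just open + create + read + write + close composed together */
int copy_file(const char *src, const char *dst) {
    int in  = open(src, O_RDONLY);                            /* open for reading */
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);  /* create + open for writing */
    if (in < 0 || out < 0) return -1;

    char buf[4096];
    ssize_t n;
    while ((n = read(in, buf, sizeof buf)) > 0)  /* read the original's contents */
        write(out, buf, (size_t)n);              /* write them into the new file */

    close(in);                                   /* close both files */
    close(out);
    return 0;
}
```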
The OS keeps track of what files are open with an open-file table, including read/write info, the current position and the process that opened it
Some OSes also use file locking, either with a shared lock (multiple readers) or an exclusive lock (one writer); enforcement can be mandatory (the OS prevents conflicting access, as in Windows) or advisory (it’s left up to the processes, as in Linux)
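A small example of advisory locking on Linux using flock(); the path and the update logic in the middle are placeholders:

```c
#include <sys/file.h>
#include <fcntl.h>
#include <unistd.h>

/* Advisory lock: it only works if every process agrees to check it —
   nothing stops a process that simply ignores the lock. */
int update_shared_file(const char *path) {
    int fd = open(path, O_RDWR);
    if (fd < 0) return -1;

    flock(fd, LOCK_EX);   /* exclusive lock: conflicting lockers block here */
    /* ... read/modify/write the file ... */
    flock(fd, LOCK_UN);   /* release so shared (LOCK_SH) readers can proceed */

    close(fd);
    return 0;
}
```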
Some applications identify files with extensions, with the operating system assisting with default applications
file type | usual extension | function
---|---|---
executable | exe, com, bin or none | ready-to-run machine-language program |
object | obj, o | compiled, machine language, not linked |
source code | c, cc, java, pas, asm, a | source code in various languages |
batch | bat, sh | commands to the command interpreter |
text | txt, doc | textual data, documents |
word processor | wp, tex, rtf, doc | various word-processor formats |
library | lib, a, so, dll | libraries of routines for programmers |
print or view | ps, pdf, jpg | ASCII or binary file in a format for printing or viewing |
archive | arc, zip, tar | related files grouped into one file, sometimes compressed, for archiving or storage |
multimedia | mpeg, mov, rm, mp3, avi | binary file containing audio or A/V information |
Some files are rigid in structure (simple record or complex record), while others are free-form
Files have to be loaded into memory to be used, either by sequential access, direct access or other methods
Sequential access uses read next, write next, reset and skip forward/backward and is the most common method
Direct access reads and writes records in any order given knowledge of the block location, with read block N and write block N being the only methods
Other access methods include indexes for quick lookup (used in databases)
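A sketch of the difference in C, assuming a 4KB logical block size:

```c
#include <fcntl.h>
#include <unistd.h>

#define BLOCK_SIZE 4096  /* assumed logical block size */

/* Sequential access: each read implicitly advances the file position */
ssize_t read_next(int fd, char *buf) {
    return read(fd, buf, BLOCK_SIZE);
}

/* Direct access: jump straight to block N without reading blocks 0..N-1 */
ssize_t read_block(int fd, long n, char *buf) {
    lseek(fd, (off_t)n * BLOCK_SIZE, SEEK_SET);  /* the "reposition" operation */
    return read(fd, buf, BLOCK_SIZE);
}
```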
Directories group files together, containing all the filenames and their identifiers, which we then use to look up other file information
The basic operations include searching for a file, creating a file, deleting a file, listing directories, renaming a file and traversing the system
They are organized logically for efficiency (fast finding), naming (easy finding) and grouping (similar files in one directory)
These directories can be organized as single-level, two-level, a tree or a graph
With a single-level directory, every file exists side by side in one directory, which is easiest but least useful since we can’t name efficiently (every name must be unique) and we can’t group similar files together
Two-level is slightly better (each user gets their own directory), but we have the same problem with grouping
Trees get to the meat of the problem, starting with a root directory and then stemming out from there, using a flag (0 for a file, 1 for a directory) to differentiate the two entry types
With an acyclic graph, we relax the structure even more by allowing files to exist in multiple directories with pointers called links or shortcuts
Directory traversal must account for possible cycles, so we either disallow links to directories or limit the number of directories traversed during a search
Deletes and link additions also need to account for cycles by keeping a count of incoming references and using garbage collection to remove unreferenced files
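A quick illustration of that reference count via hard links, assuming a file a.txt already exists (the names are placeholders):

```c
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Each hard link bumps the inode's link count; the file's blocks are
   reclaimed only when the count drops to zero. */
int main(void) {
    struct stat st;

    link("a.txt", "b.txt");   /* add a second name for the same file    */
    stat("a.txt", &st);
    printf("links: %ld\n", (long)st.st_nlink);  /* now 2                */

    unlink("a.txt");          /* removes one name; data still reachable */
    stat("b.txt", &st);
    printf("links: %ld\n", (long)st.st_nlink);  /* back to 1            */
    return 0;
}
```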
Regardless of structure, if the directory isn’t specified, all operations occur in the pwd, so deleting the pwd can remove all files and directories in the subtree
Files need to be protected from not only physical damage but improper access as well, which we can do with authentication (valid user/pass like sudo), authorization (user perms) and encryption (even if the drive is copied, the files should be protected)
We can think about this in terms of our basic methods
Read – Can the contents of the file be read?
Write – Can data be written or re-written to the file?
Execute – Can the contents of the file be loaded into memory and executed?
Append – Can data be appended to the file?
Delete – Can the file be deleted and space reclaimed?
List – Can the name and attributes be listed?
Attribute change – Can the attributes (metadata) be changed?
Most OSes employ identity-based protection with an access control list (ACL) mapping users to permissions
These can be quite large, so Linux uses classifications such as owner (the user who created the file), group (a set of users who share the file) and others
More complex permissions can be layered on top with additional mechanisms, such as per-user ACL entries beyond the basic owner/group/other classes
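For instance, the owner/group/other classes can be read straight out of the inode with stat(); a small sketch:

```c
#include <stdio.h>
#include <sys/stat.h>

/* st_mode packs three rwx triples (owner/group/other) into its low nine bits */
void print_perms(const char *path) {
    struct stat st;
    if (stat(path, &st) != 0) return;

    printf("owner: %c%c%c  group: %c%c%c  other: %c%c%c\n",
           (st.st_mode & S_IRUSR) ? 'r' : '-',
           (st.st_mode & S_IWUSR) ? 'w' : '-',
           (st.st_mode & S_IXUSR) ? 'x' : '-',
           (st.st_mode & S_IRGRP) ? 'r' : '-',
           (st.st_mode & S_IWGRP) ? 'w' : '-',
           (st.st_mode & S_IXGRP) ? 'x' : '-',
           (st.st_mode & S_IROTH) ? 'r' : '-',
           (st.st_mode & S_IWOTH) ? 'w' : '-',
           (st.st_mode & S_IXOTH) ? 'x' : '-');
}
```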
The file system handles file structure, allocating/recovering space, tracking locations and interfacing with the OS
Picking a file system depends on user needs, including performance, reliability and extra features, with new ones being created all the time
These systems are made of multiple layers, which reduces complexity while adding overhead
The I/O layer contains drivers and interrupt handlers, which control I/O devices and are given commands like the following
read drive1, cylinder 72, track 2, sector 10, into memory location 1060
This gives low-level hardware specific commands to the hardware controller
The basic file system uses buffers and cache to store data as it’s retrieved or stored, being given more generic commands such as retrieve block 123
The file organization layer understands files, logical addresses and physical blocks, translating between logical and physical while also freeing and allocating space
The logical file system manages metadata, translating file names into identifiers/handles and locating files by maintaining file control blocks (FCBs, called inodes in UNIX)
These together provide structures and operations to manage files and directories using several storage or memory structures as well as operations on these structures
Some common ones include the boot control block (contains boot information for an OS, 1 per volume), volume control block (contains volume details such as total blocks, total free blocks, block size and free block pointers, 1 per volume), directory structures (1 per file system) and FCBs (1 per file, containing metadata)
For in-memory, we have the mount table (file system mount, mount points and file system types), directory structure cache (recently used directories), system-wide open-file table (FCB of each file and other info) and per-process open-file table (pointers to the system-wide open-file table along with other info)
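A hypothetical, heavily simplified view of how these structures might relate (real inodes and tables carry far more state):

```c
#include <sys/types.h>

struct fcb {                     /* 1 per file, on disk */
    uid_t  owner;
    mode_t permissions;
    off_t  size;
    long   block_pointers[15];   /* where the data lives */
};

struct syswide_open_file {       /* system-wide open-file table entry */
    struct fcb *fcb;             /* cached copy of the on-disk FCB    */
    int open_count;              /* how many processes have it open   */
};

struct perproc_open_file {       /* per-process open-file table entry */
    struct syswide_open_file *entry;  /* pointer into the system table */
    off_t position;              /* this process's read/write offset   */
    int access_mode;             /* e.g. O_RDONLY / O_RDWR             */
};
```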
For operations, things get a bit more complicated
E.g. To create a new file, a process calls the logical file system layer to create
The logical file system layer will allocate a new FCB and add an entry for the new file to the appropriate directory in the directory structure
The file organization module layer will find logical blocks for the file
The basic file system layer will map the logical blocks to physical blocks
The I/O control layer will write the data to disk
E.g. To read the new file, a process calls the logical file system layer to open and read
The logical file system layer will search the system-wide open-file table
Then it updates the per-process open-file table to include the read location in the file and the access mode
E.g. To close the new file, a process calls the logical file system layer to close
The logical file system layer will remove the per-process open-file table entry
Then it decrements the open count in the system-wide open-file table entry
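From user space, all three walkthroughs are driven by a few syscalls; a sketch of the sequence that triggers them:

```c
#include <fcntl.h>
#include <unistd.h>

int main(void) {
    char buf[64];

    int fd = open("notes.txt", O_CREAT | O_RDWR, 0644); /* create: allocate FCB,
                                                           add directory entry    */
    write(fd, "hello", 5);       /* organization layer picks logical blocks,
                                    basic layer maps them to physical ones        */
    lseek(fd, 0, SEEK_SET);
    read(fd, buf, sizeof buf);   /* uses the per-process table's position         */
    close(fd);                   /* drop per-process entry, decrement the
                                    system-wide open count                        */
    return 0;
}
```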
The directory structure impacts efficiency, performance and reliability and depends on what we’re doing
We could use a linear list of file names with pointers to the data blocks, which is simple to program but takes linear time to search
Hash tables are another option, which decrease search time but come at the cost of hash collisions
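A minimal chained hash table for directory entries, showing where the collision handling and the O(1) expected lookup come from (the table size and hash function are arbitrary choices):

```c
#include <string.h>

#define TABLE_SIZE 128

struct dirent_node {                 /* chained entries handle collisions */
    char name[32];
    long inode_number;               /* the identifier we look up         */
    struct dirent_node *next;
};

static struct dirent_node *table[TABLE_SIZE];

static unsigned hash(const char *s) {
    unsigned h = 5381;               /* djb2-style string hash */
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h % TABLE_SIZE;
}

/* O(1) expected lookup vs. the linear list's O(n) scan */
long lookup(const char *name) {
    for (struct dirent_node *n = table[hash(name)]; n; n = n->next)
        if (strcmp(n->name, name) == 0)
            return n->inode_number;
    return -1;                       /* not found */
}
```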
The file system assigns contiguous sectors on the physical disk to a logical block, with files consisting of one or more blocks and free space being all the blocks that aren’t used
The file system needs to keep track of these blocks efficiently, with three main approaches
Contiguous allocation has each file occupying a contiguous set of blocks, with superior performance and simplicity, but finding space is harder since we need to know the file size ahead of time, and it suffers from external fragmentation
Linked lists can eliminate external fragmentation and allow for dynamic file sizes, but random access is slow and a small portion of the block needs to be reserved for a pointer to the next block
Indexed allocation gives each file an index block, with the index pointing to all other blocks
This also eliminates external fragmentation and makes random access fast, but if the number of indices exceeds the size of the index block, we will need multiple layers
Some OSes will combine direct and indirect blocks, so small files will not use an index block, while large files will use multiple layers
Increasing the size of the pointers raises the maximum file size: 32-bit pointers cap files at 4GB (2^32 bytes), while most UNIX file systems support 64-bit pointers
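A back-of-the-envelope calculation for a combined scheme like the classic UNIX inode (12 direct blocks plus single/double/triple indirect blocks); the 4KB block size and 4-byte pointers are assumptions:

```c
#include <stdio.h>

int main(void) {
    unsigned long long block = 4096;        /* 4KB blocks                         */
    unsigned long long ptrs  = block / 4;   /* 4-byte pointers -> 1024 per block  */

    unsigned long long blocks = 12                /* direct          */
                              + ptrs              /* single indirect */
                              + ptrs * ptrs       /* double indirect */
                              + ptrs * ptrs * ptrs; /* triple indirect */

    printf("max file size: %llu bytes (~%llu GB)\n",
           blocks * block, (blocks * block) >> 30);  /* roughly 4 TB */
    return 0;
}
```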
If the workload against a file is known, the allocation type can be chosen at file creation, but this usually isn’t the case
Reducing disk movement (discussed in lesson 8) can help, but for NVMs, we need different methods
We need to keep a list of free blocks, either as a bitmap or as a true list
With a bitmap, we can assign each block to a bit, using 0 for in-use and 1 for free
This makes finding free blocks (including contiguous runs of them) simple and easy to implement, but it takes up a lot of space (1TB with 4KB blocks requires a 32MB bitmap in memory)
A linked-list makes it easy to find a free block (just look at the head), but traversing the list is costly (we rarely do a full traversal)
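A sketch of the bitmap approach above, using 1 = free as described:

```c
#include <stdint.h>

/* Scan the free-space bitmap for the first free block (bit set to 1).
   Returns the block number, or -1 if the volume is full. */
long find_free_block(const uint8_t *bitmap, long total_blocks) {
    for (long i = 0; i < total_blocks; i++) {
        if (bitmap[i / 8] & (1u << (i % 8)))  /* bit set -> block i is free */
            return i;
    }
    return -1;
}
```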
Freeing blocks doesn’t require erasing data, so file systems just mark the block as being free, saving time while hurting security (forensics experts can manually read the disk)
The biggest bottleneck is the disk itself, so we must consider the following
UNIX spreads inodes across the volume and stores files close to their directory’s inode, reducing seek time
Keeping “last modified” and “last accessed” dates can tell the OS if the file needs to be loaded into memory or to be backed up to tertiary storage
Crashes and bugs can create inconsistencies, so we must correct these issues in the file system itself
In the system, we can scan the metadata to check for inconsistencies
A status bit can be used to indicate whether or not the whole system is up to date, so if the system reboots suddenly, the file system is still marked as “in flux”, triggering a full consistency check such as fsck in UNIX
We can also take a tip from DBs and use log-based transaction-oriented recovery (aka journaling), which maintains a running log of transactions; after an interruption, committed but unapplied transactions are replayed and incomplete ones are discarded
This log can be stored in a separate section of the file system or a different device, with the log also allowing the OS to write data asynchronously
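A hypothetical redo-journaling sketch (the log format and helper are illustrative, not a real file system’s API); it assumes the journal FILE* is open for binary read/write:

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

struct log_entry {
    int committed;            /* the "commit record" for this transaction */
    char description[64];     /* what to redo on replay                   */
};

void journaled_update(FILE *journal, const char *change) {
    struct log_entry e = {0};
    strncpy(e.description, change, sizeof e.description - 1);

    fwrite(&e, sizeof e, 1, journal);   /* 1. log the intended change   */
    fflush(journal);
    fsync(fileno(journal));             /* 2. force the log to disk     */

    e.committed = 1;                    /* 3. write the commit record   */
    fseek(journal, -(long)sizeof e, SEEK_CUR);
    fwrite(&e, sizeof e, 1, journal);
    fflush(journal);
    fsync(fileno(journal));

    /* 4. the actual metadata update can now happen asynchronously;
       a crash before step 3 means the entry is simply discarded */
}
```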
Backups are also recommended, typically following a schedule like the following (repeated e.g. weekly or monthly)
Day 1 – Copy all files to a backup medium. This is a full backup
Day 2 – Compare a file’s last backup date with the last modified date. If the file has changed, back it up. This is an incremental backup
Continue this process up to day N. Then run a full backup again
Watch out for daylight saving time, since the schedule relies on comparing timestamps
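The incremental decision on days 2..N boils down to a timestamp comparison; a sketch (and exactly the spot where clock changes bite):

```c
#include <sys/stat.h>
#include <time.h>

/* Back a file up only if it was modified since the last backup ran */
int needs_backup(const char *path, time_t last_backup) {
    struct stat st;
    if (stat(path, &st) != 0)
        return 0;                      /* can't stat it; skip          */
    return st.st_mtime > last_backup;  /* modified since the last run? */
}
```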
To restore the file system, we start from the most recent full backup and then apply each later incremental backup in order
General-purpose computers can have multiple storage devices, including USBs, extra drives and so on
These storage devices can be further divided into partitions, assigned to one or more volumes and each of those being formatted into a file system
Some common ones include ext4 (Linux), NTFS (Windows), APFS (macOS) and FAT32/exFAT (removable media)
File systems need to be mounted before they can be used, with the OS being given a name and a mount point, a location in the file structure (usually an empty directory)
If the system is already mounted, the OS needs to either allow or disallow multiple mounts
If the mount point is non-empty, the OS must also decide whether to prevent the action or obscure existing files and directories until the file system is unmounted
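On Linux this is the mount(2) syscall (root required); the device, mount point and file system type below are hypothetical:

```c
#include <stdio.h>
#include <sys/mount.h>

int main(void) {
    /* attach the volume's file system at the mount point, read-only */
    if (mount("/dev/sdb1", "/mnt/usb", "ext4", MS_RDONLY, NULL) != 0) {
        perror("mount");
        return 1;
    }
    /* ... the volume's files now appear under /mnt/usb ... */
    umount("/mnt/usb");   /* detach it again */
    return 0;
}
```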
Partitions can either be raw (just blocks, like in databases or RAID) or cooked with a file system (bootable, root, etc.)
User-oriented OSes must accommodate shared files, with permissions being kept in the FCB
When we mount a file system on another OS, we must take care of permissions, as in the case of multiple user IDs across systems
To make this seamless, we can use virtual file systems, separating generic operations from implementation details
To transfer files, we can use ftp like back in the day, or use a distributed file system (DFS) like we do nowadays
DFS uses a file-server protocol, with the client seeking to mount the remote file system and the server sharing its local file system, while providing authentication and authorization
These must specify behaviour when there’s multiple users sharing files, which is discussed in more detail in lesson 6
We can either block requests until a file session (between open() and close()) is complete, or share read-only and just use blocking for writes
NFS is a popular remote file system using a similar client-server model: the client mounts a remote NFS share, which then appears as if it’s part of the local file system
This is designed to work in heterogeneous environments (different OSes, machines, protocols, etc.)
With this, the client requests a remote directory and the server identifies client perms on the directory, including read-only and read-write
If permission is granted, the mount is recorded in an export table used to track all mounts, revoke access and notify of downtimes, among other things
We also use a set of remote procedure calls (RPCs) to handle searching, reading a set of directory entries, changing links, accessing file attributes and reading/writing files
Version 3 of NFS is stateless while version 4 is stateful, requiring additional overhead but being more secure since vital info isn’t sent as parameters on every call
Regardless of the version, writing data to disk requires a lot of safety mechanisms, which can take minutes