File Systems Unraveled

Most of you have probably heard the terms FAT, FAT32, FAT16, NTFS and partition table thrown around quite a bit. They seem like mystical entities, and you never could figure out what exactly they all meant. This article will change that. By the time I am done, you will know precisely what each of these terms means.

What is a File System?

First, let's understand what a file system is. A file system can be thought of as the way your computer manages the files stored on your hard drive. Your computer has thousands upon thousands of files. If there were no organized way of managing them, your system would be unbearably slow, if it worked at all. This is easy to appreciate if you consider how much stuff is piled up in your office, and how much time is wasted finding things buried under a ton of paper. Now take that mess and multiply it by a thousand. That is what your computer would be going through if an efficient file system didn't exist. And just as different people organize their offices differently, there are many file systems out there with varying features. However, there are several key functions that no file system should be without:

- Efficiently use the space available on your hard drive to store the necessary data.
- Catalog all the files on your hard drive so that retrieval is fast and reliable.
- Provide methods for performing basic file operations, such as delete, rename, copy, and move.
- Provide some kind of data structure that allows a computer to boot off the file system.

There are, of course, file systems that go beyond these basic requirements by providing additional functionality, such as compression, encryption, passwords/permissions, filestreams, etc. Later on in this article, I will discuss some of these extra features in relation to Windows NT's NTFS.

FAT In Detail

Note: This section is more technical in nature than the rest of the article. Feel free to skip it if you'd like. But be warned that you'll miss some interesting tidbits about FAT that you probably never knew.

So what is FAT, and how do file systems work? The answer is quite simple, in fact. The space on your hard drive, at its most basic level, is divided into units called sectors. Each sector is 512 bytes. So if your hard drive had 10 kilobytes of total disk space, it would be divided into 20 sectors. But the file system doesn't deal with the hard drive on a sector-by-sector basis. Instead, it groups a bunch of sectors together into a cluster, and deals with the cluster. These clusters are also called allocation units by DOS.

Another way of thinking about this is to suppose that each sector on your hard disk is a person carrying a bag, and that you can store 512 bytes of information in each bag. Instead of numbering each person 1, 2, 3 and so on, the file system first takes several people, puts them into a group, and calls that group 1. So if you had 400 people, and the file system decided to put 4 people in each group, you'd have 100 groups. In other words, on a drive with 400 sectors (roughly 200K of space) and an allocation size of 4 sectors (2K), there would be 100 clusters. When the file system needs to access a particular sector, it first finds the cluster that sector belongs to, and then within that cluster it picks out the sector by its index.
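To make the arithmetic concrete, here is a tiny sketch in Python of the same bookkeeping. The 400 sectors and 4 sectors per cluster are just the numbers from the illustration above, not anything a real driver would hardcode:

    SECTOR_SIZE = 512          # bytes per sector
    SECTORS_PER_CLUSTER = 4    # the "allocation unit" size the file system picked
    TOTAL_SECTORS = 400        # about 200K of raw space

    total_clusters = TOTAL_SECTORS // SECTORS_PER_CLUSTER   # 100 clusters

    def locate(sector_number):
        """Translate an absolute sector number into (cluster, index within cluster)."""
        cluster = sector_number // SECTORS_PER_CLUSTER
        index_in_cluster = sector_number % SECTORS_PER_CLUSTER
        return cluster, index_in_cluster

    print(total_clusters)   # 100
    print(locate(37))       # sector 37 lives in cluster 9, at position 1 within it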
Returning to the people-and-bags analogy: to find a person, say Jon, I would find Jon's group number first, and then go to his group and look for him. All three of the file systems (FAT16, FAT32 and NTFS) work like this.

So what is the difference between FAT16 and FAT32? The major difference lies in how much space each file system can handle and how efficiently it does so. The efficiency problem arises because each cluster on a hard disk can only store one file! That means each group can only be made to handle one item. To illustrate my point, consider the following situation. The file system decides to divide all the people into groups of 8 (we'll get into how this number is chosen later). Each of these 8 people has a bag that can store stuff. Now the file system hands the first group a huge box of pencils and says "store this." The eight people start to put the pencils in their bags, and after one fills up, they move on to the next. The box of pencils fills 7 bags. The file system tries to hand the group another small thing to put into the eighth bag, which is empty. But the group says "sorry, we can only handle one thing. You gave us one already." The file system says "fine, but you are wasting 12.5% of your space (1/8 = 0.125)." The group tells the file system "sorry, we can't help it." The file system moves on. Now the file system gives the next group of 8 only a single pencil to store. The group stores it and refuses to take anything else. The file system informs the group that they are wasting almost 100% of their storage space. But there is nothing they can do.

These stories may seem silly, but they get the point across: as the size of the clusters increases, the amount of space you waste increases. It is true that if you could make all your files exactly the same size as your cluster, you'd have 0% waste. But that is not possible. Most typical files are not very big, and if the cluster size gets huge, the waste can be quite alarming. So now the question becomes: how does my computer figure out the size of each cluster? The answer is simple: take the size of your hard drive and divide it by the number of clusters the file system can handle. In other words:

Cluster Size = Disk Space / Number of Clusters Possible

And since cluster size is directly proportional to wasted space (as the cluster size increases, the wasted space also increases), what we want is a file system that can handle a large number of clusters. This is where FAT16 and FAT32 differ. FAT32 can handle a lot more groups than FAT16 can. But why is that? The simple explanation is that FAT32 can count a lot higher than FAT16. As I said above, each cluster is numbered by the file system. FAT16 uses 16-bit numbers to count the clusters; that is, binary numbers of 16 digits. The consequence is that the highest FAT16 can count to is 2^16 - 1 (yes, it is in fact 2^16 - 1, because there are 2^16 values between 0 and 2^16 - 1, and zero also has to count), or 65535. So there can only be 65535 clusters on a FAT16 disk. What that means for you is that as your hard drive gets huge, your cluster count stays the same, so your cluster size increases. But don't think for a minute that you can just increase the size of each cluster indefinitely. That can't happen. The reason is that every sector inside a group also has to be numbered, and each sector's index number is written inside a single byte. A byte is 8 bits.
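Before we get to why clusters can't simply keep growing, here is a quick back-of-the-envelope sketch of how badly that wasted ("slack") space adds up as clusters get bigger. The three file sizes below are made up purely for illustration:

    def slack(file_size, cluster_size):
        """Bytes wasted when file_size gets rounded up to whole clusters."""
        clusters = -(-file_size // cluster_size)        # ceiling division
        return clusters * cluster_size - file_size

    sizes = [300, 5_000, 70_000]             # three hypothetical files, in bytes
    for cluster_size in (2_048, 32_768):     # 2K clusters vs. 32K clusters
        total = sum(slack(s, cluster_size) for s in sizes)
        print(f"{cluster_size // 1024}K clusters waste {total} bytes on these files")

With 2K clusters these three files waste a few kilobytes between them; with 32K clusters the very same files waste well over 80 kilobytes, even though the data itself hasn't changed.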
Back to that one-byte sector index. Since the index lives in a byte, it has to be less than 2^8 (at most 255, to be exact). And because sector counts per cluster are kept to powers of 2, the biggest usable value is 2^7, or 128 sectors per cluster. So now let's do a little bit of math:

- You have a maximum of 65535 clusters.
- You have a maximum of 128 sectors per cluster.
- You have 512 bytes per sector.

That means your maximum FAT16 size is 65535 * 128 * 512, or roughly 4 GB.

Wait a second, that's not right! I thought the limit was 2GB? And I thought each cluster in FAT16 could only be 32K, not 64K! And you would be right. The problem is that 128 sectors * 512 bytes per sector is 65536 bytes, which is one more than a 16-bit number can handle. So again we drop down, to 64 sectors per cluster, which yields 32K per cluster. And 32K per cluster * 65535 clusters is roughly 2GB.

FAT32 solves this problem by removing the 65535-clusters-per-disk limitation. FAT32 uses 32-bit numbers, that is, binary numbers with 32 digits, which allows it to count much higher. And since it can handle a bigger number of clusters, its cluster size is much smaller than FAT16's on bigger disks. In fact, FAT32's maximum disk size is 2 terabytes. To get this number, you take the total number of sectors addressable (and I do mean sectors), which would be 2^32 - 1, and multiply that by 512 bytes per sector. That's a whopping 2048 gigabytes, or 2 terabytes.

At this point, some of you may be scratching your heads trying to figure out the inconsistencies in my explanation. The first item to address is that even though the file system accesses the sectors by a cluster count first, that still doesn't alleviate the need to number the sectors individually. Even in FAT16, the sectors are numbered. And that leads to the second concern some of you may have: since FAT16 uses 16-bit numbers, doesn't that mean there can be only 2^16 - 1 sectors? Wouldn't that translate into 32 megs? Yes, you are right. But unknown to most is the fact that since DOS 4.0, the underlying sector numbering has been a 32-bit value! The limit placed on the disk size was purely due to the 16-bit numbering of the clusters, and the limit on the numbering of the sectors within each cluster, as discussed above.

OK, so we know what sectors and clusters are. But how does that get translated into files? That is where the File Allocation Table comes in. The FAT is a huge database that contains records of where each file is on the disk. In fact, it would not be too much of a stretch to think of the FAT as a table with several columns, each recording something about the files on the drive. Each record inside the FAT takes up 32 bytes of space. In other words, if I had 100 files on the computer, it would take the system roughly 3200 bytes to record all of that information in the FAT. Just for fun, let's take a look at what is stored in these 32 bytes:

Byte Range   Info Stored
1 to 8       Filename
9 to 11      Extension
12           Attributes (e.g. read-only, archive, hidden)
13 to 22     Reserved bytes for later features
23 to 24     Time written
25 to 26     Starting cluster
29 to 32     File size

Interesting list, isn't it? Some of the entries are self-explanatory, but two of them are rather interesting. The first is the Starting Cluster field. Some of you may have been wondering how the system translates cluster and sector indices into filenames and such. The answer is that for each file, there is a field in the FAT that indicates the first cluster of the file.
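To show how those byte ranges might be picked apart, here is a small Python sketch. It follows the table above (shifted to 0-based offsets, and skipping the two bytes the table doesn't describe); the sample entry, its README.TXT name and its field values are entirely made up, and real FAT implementations differ in small details, so treat this as an illustration of the layout rather than a byte-exact specification:

    import struct

    def parse_entry(entry: bytes):
        """Pull the fields listed in the table out of one 32-byte directory entry."""
        assert len(entry) == 32
        name       = entry[0:8].decode("ascii", "replace").rstrip()   # bytes 1 to 8
        extension  = entry[8:11].decode("ascii", "replace").rstrip()  # bytes 9 to 11
        attributes = entry[11]                                        # byte 12
        # bytes 13 to 22 are the reserved area mentioned in the text
        time_written,  = struct.unpack_from("<H", entry, 22)          # bytes 23 to 24
        start_cluster, = struct.unpack_from("<H", entry, 24)          # bytes 25 to 26
        file_size,     = struct.unpack_from("<I", entry, 28)          # bytes 29 to 32
        return name, extension, attributes, time_written, start_cluster, file_size

    # A fabricated entry for "README.TXT": archive attribute, 1234 bytes, cluster 5.
    fake = (b"README  " + b"TXT" + bytes([0x20]) + bytes(10) +
            struct.pack("<H", 0) + struct.pack("<H", 5) + bytes(2) +
            struct.pack("<I", 1234))
    print(parse_entry(fake))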
The system would read that FAT entry, find the starting cluster, and read the file. Now the question is: how does the system know when to stop reading? And even before that, how does the system know where to read next after this cluster? The answer is that the file allocation table keeps, for every cluster, an entry pointing to the next cluster that contains data from the same file. So the computer reads the current cluster and checks whether any other cluster follows it. If one does, it skips to that cluster, reads it, and checks for the next one. This process repeats until it reaches a cluster marked as the end of the chain. The CS majors reading this will recognize this as a linked list implementation.

The other interesting feature of this table is that each directory entry (record in the FAT) uses 4 bytes to store the size of the file. This may not seem like much at first, but it actually tells you the maximum size possible for any single file. The fact that we use 4 bytes to store a file size tells us that the largest number that can be represented is 32 bits (recall that there are 8 bits per byte). So what is the largest 32-bit number? That would be 2^32 - 1. So a file can have a maximum of 2^32 - 1 bytes, or 4 gigabytes. This calculation is obviously done under the assumption that we are using FAT32.

The last two fields I'd like to take a look at are the filename field and the reserved bytes. The interesting thing about the filename field is that DOS uses it to perform undelete. When you erase a file in DOS, you aren't actually erasing the file. All you are doing is changing the first letter of the filename field into a special character. As far as the file system is concerned, the file isn't there, and the next time data is written to those clusters, the old file really is gone. The way DOS performs an undelete is simply to change that first letter back to something else. That is why when you used undelete in DOS, it always asked for the first letter of the filename before it could restore the file. Mystery solved.

Now let me make a quick mention of the reserved bytes. The reserved bytes didn't do much in FAT16, but they became rather useful in FAT32 and in NTFS. Since FAT32's cluster numbering uses 32-bit numbers instead of the 16-bit numbers used in FAT16, the system needed two extra bytes to accommodate the added digits, and those two bytes were taken out of the reserved area. And in NTFS, compression attributes and some security information were also written into the reserved area.

Before I move on, I'd like to point out a few of the other differences between FAT16 and FAT32. In FAT32, the root directory is unlimited in size. What this means is that you can have as many files and directories in C:\ as you'd like. In the days of FAT16, the root directory had a fixed maximum of 512 directory entries. That means that if you had normal filenames of 8 letters plus a 3-letter extension, you could have at most 512 directories and files in the root. That may seem like more than you'd ever need to put in the root directory, and it probably is, if you had only 8.3 filenames. But in Win95, the system supports long filenames, and the trick is that Win95 combines multiple directory entries to do it. So consider a file named "My English Paper". That is 16 characters long, so it takes at least 2 directory entries. Actually, it takes 3: two for the long filename, and another one for the short 8.3 filename kept for compatibility with DOS and Win3.1.
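Out of curiosity, here is that entry count worked out in code. The 13-characters-per-extra-entry figure is my own addition (it is the usual Win95 VFAT packing, not something stated above), on top of the one entry always kept for the 8.3 alias:

    import math

    def entries_needed(long_name: str) -> int:
        """Directory entries a long filename consumes under the VFAT scheme."""
        lfn_entries = math.ceil(len(long_name) / 13)   # entries holding the long name
        return lfn_entries + 1                         # plus the short 8.3 entry

    print(entries_needed("My English Paper"))   # 16 characters -> 2 + 1 = 3 entries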
As you can see, long filenames can quickly deplete directory entries.

Another nice feature is that FAT32 has better FAT redundancy. Both FAT32 and FAT16 store two copies of the file allocation table on disk, but traditionally the file system only read from one of them. In FAT32, the system can choose to read from either one, which provides a better failsafe against freak accidents involving corrupt file tables.

It is apparent that FAT32 is a superior file system to FAT16. Unfortunately, FAT32 is not supported by every operating system. The original version of Windows 95 couldn't read FAT32; it wasn't until version B (OSR2) that Win95 gained that ability. And all versions of WinNT before 5.0 (renamed Windows 2000, or Win2K for short) could not read FAT32 drives either.

New Technology File System

Now that I've covered FAT16 and FAT32 in excruciating detail, let's turn our attention to NTFS, the proprietary file system for WinNT. While FAT32 was a decent system, it lacked some of the more advanced features that many businesses need to run a network. Chief among them are file-level security, encryption, event logging, error recovery and compression. NTFS (5.0) provides all of these features in a nicely optimized package.

Permissions:

The feature that NT is probably best known for is its file-level security. With NTFS permissions, you can control which users have what kind of access to which files. This is a stark contrast to the "security" in Windows 9x, where the system policy editor affords the only measure of protection. Once a knowledgeable user gets past the policy protections, which are only skin deep (or interface deep), every file on the system is his for the taking. In Windows NT, even if you get past the interface lockouts, you'll still have a hell of a time accessing other people's files, because they are locked at the file level.

Before I discuss how to set file permissions, we need to take a step back and look quickly at how permissions in general work on Windows NT. Windows NT's security model is an entire topic unto itself, so I will not cover it in detail here. However, a general overview will prove beneficial. With Windows NT, you can assign security at two different levels - on a per-user basis, and on a group basis. So, if there is a user called Jane who belongs to the Marketing user group, you can affect Jane's access permissions either by assigning permissions to her account, or to her group. So what happens if Jane's group has Modify access to a document, but Jane is only assigned Read access? Surprisingly enough, the least restrictive of the two permission sets takes precedence - in this case, the Modify access. The one glaring exception is the No Access permission. If a No Access permission is assigned at any level, the user has no access, regardless of any other permissions assigned. So if the Marketing group is assigned No Access, Jane would have no access even if her account is assigned Full Control. (A small sketch just after the compression section below works through this rule in code.)

So there you have it, NT file-level security at a glance. There is much more to it, but as this is an intro article, a more in-depth exploration of NT file security seems more appropriate for a separate article. With that said, let's take a look at some of the other features of NTFS.

Compression:

Another useful feature is compression. It works transparently (like DriveSpace), and can be assigned to individual files (unlike DriveSpace). To turn on compression for a file, right-click on it and choose Properties. In the Properties dialog, you can check the Compressed attribute.
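Stepping back to permissions for a moment, here is the sketch promised above. It is a toy model in plain Python, not any real Windows security API: the least restrictive of the user's own and group permissions wins, except that No Access anywhere shuts the door completely.

    # Toy model of the NT permission-combination rule described earlier.
    RANK = {"No Access": 0, "Read": 1, "Modify": 2, "Full Control": 3}

    def effective_access(user_perm, group_perms):
        granted = [p for p in [user_perm, *group_perms] if p is not None]
        if not granted or "No Access" in granted:
            return "No Access"                  # No Access overrides everything
        return max(granted, key=RANK.get)       # otherwise least restrictive wins

    # Jane: Read on her account, Modify through the Marketing group -> Modify
    print(effective_access("Read", ["Modify"]))
    # Marketing group set to No Access -> No Access, even with Full Control
    print(effective_access("Full Control", ["No Access"]))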
Compression can be applied to an entire directory in the same way.

Encryption:

Even more useful is the encrypting file system (EFS) included in NTFS 5.0. With EFS, you can actually encrypt a file, rather than just protect it via permissions. This is a long overdue feature, since there are other operating systems on the market that will read files on an NTFS volume while bypassing NT security. BeOS is one example, and one I have used. Various flavors of Linux might also provide similar functionality, though I have yet to personally encounter one. However, if a file is encrypted, such dangers are drastically mitigated.

NT5's EFS is a system-level service, which means it runs even when all users are logged off. This also prevents hackers from easily disabling it, as is the case with user-mode encryption programs. Moreover, the encryption works transparently with respect to the user. What that means is that if a user has permission to decrypt the file, then when the file is accessed it will be decrypted seamlessly, without any user intervention. On the other hand, if the user does not have appropriate access, an "Access Denied" error will pop up.

In principle, EFS works on a public/private key system (via CryptoAPI, if you are interested). When a file is encrypted, a file encryption key (FEK) is automatically generated. That randomly generated FEK is used to encrypt the file (or folder). The FEK is then, itself, encrypted using the user's public key, and a list of encrypted FEKs is stored as part of the file. When the user tries to access the file, the system attempts to decrypt the FEK with the user's private key. If it succeeds, the decrypted FEK is then used to decrypt the actual file. However, if a file is copied to a non-NTFS partition, a plaintext version of the file is created.

To activate encryption, simply right-click on a folder, choose Properties from the popup, and check the Encrypt checkbox. By default, Windows Explorer will only allow folders to be encrypted (which is the recommended method). However, the CIPHER command can be used to encrypt on a per-file basis:

To encrypt: CIPHER /e myfile.txt
To decrypt: CIPHER /d myfile.txt

The other nice thing about EFS is that it offers a data recovery mechanism. A data recovery agent is automatically configured; in a Windows 2000 domain, that defaults to the domain admin. The assigned recovery agent can then decrypt any file that is under his scope. It is important to note that when recovery occurs, only the file's FEK is revealed, NOT the user's private key. This prevents the recovery agent from accessing files that are not under his scope. As always, the domain admin has the power to delegate recovery rights to other user groups so as to provide both flexibility and redundancy.

File Auditing:

However, just protecting a file against possible intruders is not enough. There must be a way for an admin to know whether a file hack has been attempted. This is where file auditing (event logging, if you will) comes in handy. With NTFS, you can keep track of who has tried to access what file, and whether they succeeded. To enable file auditing, use the following steps:

1 - First, make sure that file access auditing is turned on via User Manager.
2 - Go into the Security tab of any file you wish to audit and click the Audit button.
3 - Add the users whom you wish to audit for the given file, and click OK.
4 - Now select the events you wish to audit.
Click OK.
5 - To view the audited events, go into Event Viewer and look at the security log.

Data Recovery:

But what good is protecting your data if it simply gets corrupted when the system crashes? Here too, NTFS has a solution. NTFS has superior data recovery capabilities compared to FAT and FAT32. Each I/O operation that modifies a file on an NTFS volume is viewed by the file system as a transaction and can be managed as an atomic unit. When a user updates a file, the Log File Service tracks all redo and undo information for the transaction. If every step of the I/O process succeeds, the changes are committed to disk. Otherwise, NT uses the undo log to roll back the activity and restore the disk to the state it was in before the changes were made.

When Windows NT crashes, NTFS performs three passes upon reboot. First, it performs an analysis pass, in which it determines exactly which clusters must now be updated, per the information in the log file. Then it performs the redo pass, in which it carries out all transaction steps logged since the last checkpoint. Lastly, it performs the undo pass, in which it backs out all incomplete transactions. Together, these steps ensure that data corruption is kept to a minimum.

Yet Another Cool (But Scary) Feature:

At this point, you probably think NTFS is pretty cool. But there is one other feature in NTFS that is documented, yet not very well publicized (for obvious reasons, as you will see). What I am referring to is filestreams (Unix users will be familiar with this feature). To illustrate the concept of filestreams, let's first picture any file (whether it be a document, an exe or a jpeg) as a garden hose. When you access the data in the file, that data flows through the file in a continuous stream, like water flowing through a garden hose. In a typical file, there is only a single data stream, the default stream. All data written to and read from the file goes through that stream. When Explorer (or the command interpreter) reports the size of the file, it is reporting the data stored in that stream.

In FAT and FAT32, this fact was of little concern, since any file could only be given a single stream (the default). However, this all changes in NTFS, which allows any given file to have multiple data streams. This is akin to a garden hose that has within it multiple smaller hoses, each with its own stream of water flowing. In fact, each stream can contain a different type of data. One data stream could be a text document, while another could contain WAV data, another executable code, and yet another jpeg data. You can almost think of a file with multiple data streams as a special kind of folder with multiple files stored within it. To illustrate my point, let's create a text file with multiple filestreams:

1 - Go to Windows NT's command interpreter (type cmd at the Run prompt).
2 - Switch to a partition that is NTFS.
3 - Type the following:
    echo This is what you'll see >> stream.txt [Press Enter]
    echo This is what you won't see >> stream.txt:hiddenStream [Press Enter]
4 - Now open the file in Notepad.

What you'll see is the text "This is what you'll see." The other string of text, "This is what you won't see", is in the file, but it is stored in a separate file stream called hiddenStream. And since most programs do not read data from any stream other than the default stream, that data is hidden from the user. To view the contents of the hidden stream, do the following:

1 - Go to the NT command interpreter.
2 - Type the following:
    more < stream.txt:hiddenStream
3 - And voila! There is your hidden stream.

At this point, you should be getting chills, because filestreams bring up some very disturbing possibilities for writing viruses and such. A virus writer could conceivably write the executable code for his virus into a hidden stream of a text file! That way, normal virus scanners would not find the harmful code. To activate the virus, the malicious programmer need only write a catalyst program that performs a seemingly innocuous read from a text file. The worst part of all this is that hidden streams are difficult to detect, because data written into a hidden stream is NOT counted as part of the file's size. So you could have a text file that contains 20 bytes of text and 2 megs of executable code, and it would show up as 20 bytes. Even worse, any user can create files with hidden streams, even your guest account users (assuming they can write to a directory).

Thankfully, the situation is not hopeless. For one, hidden file streams can be detected through the Windows APIs. Secondly, all hidden streams are lost when the file is copied to a non-NTFS partition. So conceivably, antivirus firms can write scanners that scan for hidden streams. To the best of my knowledge, there haven't been any serious viruses written to take advantage of this particular feature of NTFS. For now, you can rest easy knowing that the end isn't quite here yet. But definitely keep filestreams in mind, for if there is a security weakness, somebody will find it sometime.

Conclusion:

There you have it - the three most common file systems in a nutshell. I hope this article has been at least mildly entertaining for some of you. As always, any comments, suggestions or corrections can be sent to xinli1@uiuc.edu. Until next time, happy computing folks.

http://www.pcnineoneone.com