File Systems Unraveled
Most of you have probably heard the terms FAT, FAT16, FAT32, NTFS and partition table thrown around quite a bit. They may have seemed like mystical entities, and perhaps you never could figure out exactly what they all meant. This article will change that. By the time I am done, you will know precisely what each of these terms means.
What is a File System?
First, let’s understand what a file system is. A file system can be thought of as the way your computer goes about managing the files that get stored on your hard drive. Your computer has thousands upon thousands of files. If there were no organized way of managing them, your system would be unbearably slow, if it worked at all. This is understandable if you just consider how much stuff you have piled up in your office, and how much time is wasted finding stuff that’s buried under a ton of paper. Now take that mess and multiply it by a thousand. That is what your computer would be going through if an efficient file system didn’t exist. And just as there are all kinds of people in the world who organize their offices differently, there are many file systems out there with varying features. However, there are several key functions that no file system should be without:
- Efficiently use the space available on your hard drive to store the necessary data.
- Catalog all the files on your hard drive so that retrieval is fast and reliable.
- Provide methods for performing basic file operations, such as delete, rename, copy, and move.
- Provide some kind of data structure that allows a computer to boot off the file system.
There are of course other file systems that go beyond meeting basic requirements by providing additional functionality, such as compression, encryption, password/permissions, filestreams, etc. Later on in this article, I will discuss some of the extra features in relation to Windows NT’s NTFS.
FAT In Detail
Note: This section is more technical in nature than the rest of the article. Feel free to skip if you’d like. But be warned that you’ll miss some interesting tidbits about the FAT you probably never knew.
So what is FAT, and how do file systems work? The answer is in fact quite simple. The space on your hard drive, at its most basic level, is divided into units called sectors. Each sector is 512 bytes. So if your hard drive had 10 kilobytes of total disk space, it would be divided into 20 sectors. But the file system doesn’t deal with the hard drive directly on a sector-by-sector basis. Instead, it groups a bunch of sectors together into a cluster, and it deals with the cluster. These clusters are also called allocation units by DOS. Another way of thinking about this is to suppose that each sector on your hard disk is a person carrying a bag, and that you can store 512 bytes of information in each bag. Now, instead of numbering each person 1, 2, 3 and so on, the file system first takes several people, puts them into a group, and calls that group 1. So if you had 400 people, and the file system decided to put 4 people in a group, then you’d have 100 groups. In other words, on a drive with 400 sectors (or 200K of space) and an allocation size of 4 sectors (or 2K), there would be 100 clusters. When the file system needs to access a particular sector, it first finds the cluster number of the sector, and then within that cluster it accesses that particular sector by its sector index. This is akin to saying that to find a person, say Jon, I would first find Jon’s group number, then go to his group and look for him.
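The grouping described above comes down to integer division. Here is a minimal Python sketch of it, using the 400-sector, 4-sectors-per-cluster example. The helper name `locate_sector` is made up for illustration, sectors and clusters are counted from 0, and the sketch ignores the sectors a real FAT volume reserves at the start of the disk:

```python
SECTOR_SIZE = 512          # bytes per sector, as in the article
SECTORS_PER_CLUSTER = 4    # the "4 people to a group" example


def locate_sector(sector_number, sectors_per_cluster=SECTORS_PER_CLUSTER):
    """Return (cluster number, index inside that cluster) for a sector.

    Hypothetical helper: both sectors and clusters are counted from 0.
    """
    return divmod(sector_number, sectors_per_cluster)


total_sectors = 400
total_clusters = total_sectors // SECTORS_PER_CLUSTER
cluster_bytes = SECTORS_PER_CLUSTER * SECTOR_SIZE

print(total_clusters)       # 100 clusters
print(cluster_bytes)        # 2048 bytes, i.e. 2K per cluster
print(locate_sector(10))    # (2, 2): cluster 2, sector index 2 within it
```

Finding Jon, in other words, is one division (which group?) and one remainder (which seat in the group?).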
All three of the file systems (FAT16, FAT32 and NTFS) work like this. So what is the difference between FAT16 and FAT32? The major difference lies in how much space each file system can handle and how efficiently it does so. The efficiency problem arises because each cluster on a hard disk can only hold data from one file! That means each group can only be made to handle one item. To illustrate my point, consider the following situation:
The file system decides to divide all the people into groups of 8 (we’ll get into how this number is chosen later). Each of these 8 people has a bag that can store stuff.
Now the file system hands the first group a huge box of pencils and says “store this.” The eight people start to put the pencils in their bags, and after one fills up, they move on to the next. The box of pencils fills 7 bags.
The file system tries to hand the group another small thing to put into the eighth bag, which is empty. But the group says “sorry, we can only handle one thing. You gave us one already.” The file system says “fine, but you are wasting 12.5% of your space (1/8 = 0.125).” The group tells the file system “sorry, we can’t help it.” The file system moves on.
Now the file system gives the next group of 8 only a single pencil to store. The group stores it and refuses to take anything else. The file system informs the group that they are wasting almost 100% of their storage space. But there is nothing they can do.
These stories may seem silly, but they do get the point across, which is that as the size of the clusters increases, the amount of space you waste increases too. It is true that if you could make all your files precisely the same size as your cluster, you’d have 0% waste. But that is not possible. Most typical files are not very big, and if the cluster size gets huge, the waste can be quite alarming.
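The two stories can be put into numbers. The following Python sketch is only an illustration: it assumes 512-byte sectors and that a file always occupies a whole number of clusters, and the function name `wasted_fraction` and the 10-byte "single pencil" file are invented for the example:

```python
SECTOR_SIZE = 512  # bytes per sector, as in the article


def wasted_fraction(file_bytes, sectors_per_cluster):
    """Fraction of the space allocated to one file that stays unused."""
    cluster_bytes = sectors_per_cluster * SECTOR_SIZE
    # Round up to whole clusters; even a tiny file takes one full cluster.
    clusters_used = max(1, (file_bytes + cluster_bytes - 1) // cluster_bytes)
    allocated = clusters_used * cluster_bytes
    return (allocated - file_bytes) / allocated


# The box of pencils: fills 7 of the 8 bags in a group.
print(wasted_fraction(7 * SECTOR_SIZE, 8))   # 0.125
# A single pencil: a hypothetical 10-byte file in the same 4K cluster.
print(wasted_fraction(10, 8))                # about 0.9976
```

Calling the same function with a larger `sectors_per_cluster` for the same small file shows the waste climbing toward 100%.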
So now the question becomes: how does my computer figure out the size of each cluster? The answer is simple: take the size of your hard drive and divide that by the number of clusters involved. So what I am saying is this:
Cluster Size = Size of Hard Drive / Number of Clusters
And since wasted space grows with cluster size (in other words, as the cluster size increases, the wasted space also increases), we can see that what we want is a file system that can handle a large number of clusters. And this is where FAT16 and FAT32 differ. FAT32 can handle a lot more groups than FAT16 can.
But why is that? The simple explanation is that FAT32 can count a lot higher than FAT16. As I said above, each cluster is numbered by the file system. FAT16 uses 16-bit numbers to count the clusters, that is, binary numbers of 16 digits. The consequence is that the highest FAT16 can count to is 2^16 - 1, or 65535 (yes, it is in fact 2^16 - 1, because there are 2^16 numbers from 0 to 2^16 - 1; zero also has to count). So there can be at most 65535 clusters on each FAT16 disk. What that means for you is that as your hard drive gets bigger, your maximum cluster count remains the same, so your cluster size has to increase.
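To see how a fixed cluster count pushes the cluster size up, here is a rough Python sketch. It makes two assumptions: clusters start at one 512-byte sector and double in size, and the limit is the article's simplified 65535 clusters (real FAT16 reserves a few values, so the true limit is slightly lower). The function name `smallest_cluster_size` is invented for the example:

```python
SECTOR_SIZE = 512
MAX_FAT16_CLUSTERS = 2**16 - 1   # 65535, the article's simplified limit


def smallest_cluster_size(drive_bytes, max_clusters=MAX_FAT16_CLUSTERS):
    """Smallest power-of-two cluster size (bytes) that covers the drive."""
    cluster_bytes = SECTOR_SIZE
    # Keep doubling until max_clusters clusters are enough for the drive.
    while cluster_bytes * max_clusters < drive_bytes:
        cluster_bytes *= 2
    return cluster_bytes


MB = 1024 * 1024
for size_mb in (100, 500, 1000, 2000):
    size_k = smallest_cluster_size(size_mb * MB) // 1024
    print(size_mb, "MB drive ->", size_k, "K clusters")
# Cluster sizes come out as 2K, 8K, 16K and 32K respectively.
```

The cluster count never changes in this loop; only the cluster size does, which is exactly why bigger FAT16 drives waste more space per file.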
But don’t think for a minute that you can just keep increasing the size of each cluster indefinitely. That can’t happen. The reason is that the number of sectors in a cluster also has to be recorded, and that number is written inside a single byte. A byte is 8 bits, which means the number has to be less than 2^8 (255 at most, to be exact). And since cluster sizes go by powers of 2, the biggest value you can actually store is 2^7, or 128 sectors. So now let’s do a little bit of math:
You have a max of 65535 clusters.
You have a max of 128 sectors per cluster.
You have 512 bytes per sector.
That means your max FAT16 size is 65535 * 128 * 512 bytes, or roughly 4 GB.
Wait a second, that’s not right! I thought the limit was 2 GB, and I thought each cluster in FAT16 could only be 32K, not 64K! And you would be right. The problem is that 128 sectors * 512 bytes per sector is 65536 bytes, which is one more than a 16-bit number can hold. So again, we step down a power of 2 to 64 sectors per cluster, which gives us 32K per cluster. And 32K per cluster * 65535 clusters is roughly 2 GB.
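The arithmetic above is easy to check for yourself. This Python sketch simply redoes the multiplication for 128 and 64 sectors per cluster, using the article's figures of 65535 clusters and 512-byte sectors; the function name `max_volume_bytes` is made up for illustration:

```python
SECTOR_SIZE = 512
MAX_CLUSTERS = 2**16 - 1    # 65535, as counted in the article
MAX_16_BIT = 2**16 - 1      # largest value a 16-bit number can hold


def max_volume_bytes(sectors_per_cluster):
    """Largest volume, in bytes, for a given cluster size."""
    return MAX_CLUSTERS * sectors_per_cluster * SECTOR_SIZE


GB = 1024**3
for spc in (128, 64):
    cluster_bytes = spc * SECTOR_SIZE
    fits = cluster_bytes <= MAX_16_BIT
    print(spc, "sectors:", cluster_bytes, "bytes per cluster,",
          "fits in 16 bits:", fits, ",",
          round(max_volume_bytes(spc) / GB, 2), "GB max")
```

With 128 sectors the cluster is 65536 bytes and does not fit in 16 bits; with 64 sectors it is 32768 bytes and the maximum volume comes out at just under 2 GB.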
– Xin Li –