
Sparse files

What is a sparse file?

"A sparse file is a file where space has been allocated but not actually filled with data. This space is not written to the file system. Instead, brief information about these empty regions is stored, which takes up much less disk space. These regions are only written to disk at their actual size when data is written to them. The file system transparently converts reads from empty sections into blocks filled with zero bytes at runtime." [1]

In other words: a sparse file occupies less space on disk than its reported size suggests.

Databases use sparse files quite often: for example, the MySQL Cluster REDO log files are created as sparse files, and so are some Oracle tablespace files.

But first let us create such a sparse file:

# dd if=/dev/zero of=sparsefile count=0 obs=1 seek=100G

# ls -lah sparsefile
-rw-r--r-- 1 oli users 100G 2007-10-24 11:18 sparsefile

# df -h .
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda9             5.0G  3.5G  1.2G  75% /home

Odd: how can a 100 Gbyte file fit on a 5 Gbyte device? This already hints at the problem...

Next, let us see how to find the real size of a file, so that we can tell whether it will cause trouble or not:

# du -ks sparsefile
0       sparsefile

In reality this file occupies 0 kbyte on disk.
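
The comparison between reported size and real size can also be scripted. The following is only a sketch, assuming GNU coreutils stat (with the %s, %b and %B format sequences); the function names file_sizes and is_sparse are made up for this example. Note that file systems with compression or inline data can also report fewer allocated bytes than the file size, so a hit is a hint, not proof.

```shell
# Print "<apparent bytes> <allocated bytes>" for one file.
# Assumes GNU stat: %s = apparent size in bytes,
# %b = number of allocated blocks, %B = size of one such block in bytes.
file_sizes() {
    stat -c '%s %b %B' -- "$1" | {
        read -r apparent blocks blocksize
        echo "$apparent $((blocks * blocksize))"
    }
}

# Exit status 0 if the file occupies fewer bytes on disk than its
# apparent size, i.e. it most likely contains holes.
is_sparse() {
    file_sizes "$1" | {
        read -r apparent allocated
        [ "$allocated" -lt "$apparent" ]
    }
}
```

It could be used like this: is_sparse sparsefile && echo "sparsefile is sparse"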

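Or an example from MySQL Cluster. The first listing shows the reported size (16M per file). In the second listing the additional -s option prints the space actually allocated in the first column: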

# ll -h D9/DBLQH/S?.FragLog
-rw-r--r-- 1 mysql dba 16M 2008-01-16 18:02 D9/DBLQH/S0.FragLog
-rw-r--r-- 1 mysql dba 16M 2008-01-16 13:43 D9/DBLQH/S1.FragLog
-rw-r--r-- 1 mysql dba 16M 2008-01-16 13:43 D9/DBLQH/S2.FragLog
-rw-r--r-- 1 mysql dba 16M 2008-01-16 13:43 D9/DBLQH/S3.FragLog
-rw-r--r-- 1 mysql dba 16M 2008-01-16 13:43 D9/DBLQH/S4.FragLog
-rw-r--r-- 1 mysql dba 16M 2008-01-16 13:44 D9/DBLQH/S5.FragLog
-rw-r--r-- 1 mysql dba 16M 2008-01-16 13:44 D9/DBLQH/S6.FragLog
-rw-r--r-- 1 mysql dba 16M 2008-01-16 13:44 D9/DBLQH/S7.FragLog
-rw-r--r-- 1 mysql dba 16M 2008-01-16 13:44 D9/DBLQH/S8.FragLog
-rw-r--r-- 1 mysql dba 16M 2008-01-16 13:44 D9/DBLQH/S9.FragLog

# ll -hs D9/DBLQH/S?.FragLog
612K -rw-r--r-- 1 mysql dba 16M 2008-01-16 18:02 D9/DBLQH/S0.FragLog
548K -rw-r--r-- 1 mysql dba 16M 2008-01-16 13:43 D9/DBLQH/S1.FragLog
548K -rw-r--r-- 1 mysql dba 16M 2008-01-16 13:43 D9/DBLQH/S2.FragLog
548K -rw-r--r-- 1 mysql dba 16M 2008-01-16 13:43 D9/DBLQH/S3.FragLog
548K -rw-r--r-- 1 mysql dba 16M 2008-01-16 13:43 D9/DBLQH/S4.FragLog
548K -rw-r--r-- 1 mysql dba 16M 2008-01-16 13:44 D9/DBLQH/S5.FragLog
548K -rw-r--r-- 1 mysql dba 16M 2008-01-16 13:44 D9/DBLQH/S6.FragLog
548K -rw-r--r-- 1 mysql dba 16M 2008-01-16 13:44 D9/DBLQH/S7.FragLog
548K -rw-r--r-- 1 mysql dba 16M 2008-01-16 13:44 D9/DBLQH/S8.FragLog
548K -rw-r--r-- 1 mysql dba 16M 2008-01-16 13:44 D9/DBLQH/S9.FragLog
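
To find such files below a whole directory without checking each one by hand, something like the following sketch may help. It assumes GNU find, whose -printf directive %S prints the "sparseness" of a file (allocated space divided by file size; values below 1 indicate holes). The function name list_sparse is hypothetical. Compressed or inline-stored files can show up as well.

```shell
# List non-empty regular files below a directory whose allocated space
# is smaller than their apparent size.
# Output columns (tab separated): sparseness, size in bytes, path.
# Assumes GNU find (-printf with %S, %s, %p).
list_sparse() {
    # LC_ALL=C keeps the decimal separator a dot so awk can compare it.
    LC_ALL=C find "$1" -type f -size +0 -printf '%S\t%s\t%p\n' |
        awk -F '\t' '$1 < 1.0'
}
```

For the MySQL Cluster example above it could be called as: list_sparse D9/DBLQH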

Why are sparse files dangerous?

In production environments we want our systems to behave predictably, so we monitor them. With sparse files this becomes trickier: there is free disk space, there is used disk space, and there is disk space that is promised to sparse files and may be used in the near or far future. A file system can therefore fill up while data is written into the holes, although no file appears to grow.

What can we do about it?

Right now: not much, until the software vendor provides a way to avoid sparse files.
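
For monitoring, one possible approach is to add up the space that would be needed if all holes were filled and to compare it with the free space. The following is a rough sketch, assuming GNU find (-printf with %s and %b, the latter counting 512-byte blocks) and GNU df (-B1); the function names potential_growth and check_overcommit are invented for this example, and compression or inline data on some file systems will distort the numbers.

```shell
# Sum of (apparent size - allocated size) in bytes over all regular
# files below a directory, without crossing file system boundaries.
# Assumes GNU find: %s = size in bytes, %b = allocated 512-byte blocks.
potential_growth() {
    LC_ALL=C find "$1" -xdev -type f -printf '%s %b\n' |
        awk '{ alloc = $2 * 512; if ($1 > alloc) sum += $1 - alloc }
             END { printf "%.0f\n", sum }'
}

# Print a warning and return 1 if filling all holes would need more
# space than is currently available on the file system.
check_overcommit() {
    growth=$(potential_growth "$1")
    # GNU df: -P gives one line per file system, -B1 reports bytes.
    avail=$(df -P -B1 -- "$1" | awk 'NR == 2 { print $4 }')
    if [ "$growth" -gt "$avail" ]; then
        echo "WARNING: $1 may need $growth more bytes, only $avail available"
        return 1
    fi
    echo "OK: $1 may need $growth more bytes, $avail available"
}
```

A call such as check_overcommit /home could then be run regularly by the monitoring system.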

Literature

[1] Sparse file, Wikipedia (en)
[2] Sparse file, Wikipedia (de)