Monday, March 08, 2010

Null Characters

On a few occasions (normally when I am running out of disk space), I have seen ^@ symbols appear in my log files. These come from "file holes": regions of a file that were never actually written and that read back as null characters. The null character (or NUL char) has an ASCII code of 0 and appears as ^@ when viewed in 'vi' or 'less'.
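To see how an unwritten region reads back as nulls, here is a rough sketch that seeks past the start of an empty file before writing to it. The 1 MiB offset and the scratch file are my own choices for illustration, and whether the skipped range is really left unallocated on disk depends on the filesystem:

```shell
# Hypothetical scratch file for the demonstration.
f=$(mktemp)

# Skip 1 MiB from the start of the file, then write 4 bytes there.
# The skipped range is never written, so it is a candidate hole.
printf 'end\n' | dd of="$f" bs=1 seek=1048576 2>/dev/null

total=$(wc -c < "$f")                  # 1048576 skipped bytes + 4 written bytes
nuls=$(tr -cd '\000' < "$f" | wc -c)   # keep only the NUL bytes and count them
echo "size: $((total)), NUL bytes: $((nuls))"

rm -f "$f"
```

Every byte in the skipped range should be counted as a NUL, even though nothing was ever written there.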

Create a dummy file containing null characters:
In order to create a file with null characters, simply print \000. For example:

sharfah@starship:~> perl -e \
'print "hello \000world\000\nfoo bar\n";' > file-with-nulls
sharfah@starship:~> less file-with-nulls
hello ^@world^@
foo bar
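If perl is not at hand, the shell's printf should do the same job, since it also understands octal escapes in its format string. A minimal sketch, writing to a scratch file from mktemp rather than to file-with-nulls:

```shell
# Hypothetical scratch file for the demonstration.
f=$(mktemp)

# printf interprets \000 in its format string as the NUL byte.
printf 'hello \000world\000\nfoo bar\n' > "$f"

# The byte count includes the two NULs: 22 bytes in total.
size=$(wc -c < "$f")
echo "bytes: $((size))"

rm -f "$f"
```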
Find lines containing null characters:
Use the following command to print the lines containing null characters:
sharfah@starship:~> perl -ne '/\000/ and print;' file-with-nulls \
| less
hello ^@world^@
You can also perform an octal dump of the file to check if it has null characters:
sharfah@starship:~> od -b file-with-nulls | grep ' 000'
0000000 150 145 154 154 157 040 000 167 157 162 154 144 000 012 146 157
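To get a count rather than a dump, one option is to delete everything except the null characters with tr and count what is left. This is a sketch using a scratch copy of the sample data:

```shell
# Recreate the sample data in a hypothetical scratch file.
f=$(mktemp)
printf 'hello \000world\000\nfoo bar\n' > "$f"

# -c complements the set, so -cd deletes every byte that is NOT a NUL.
nuls=$(tr -cd '\000' < "$f" | wc -c)
echo "NUL bytes: $((nuls))"

rm -f "$f"
```

A result of 0 means the file is clean; the sample data gives 2.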
Delete null characters:
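There are various ways in which this can be done. In the following examples, I have used tr and sed to remove the unwanted characters. Note that the \x escape in the sed example is a GNU sed extension and may not work with other sed implementations.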
sharfah@starship:~> tr -d '\000' < file-with-nulls | less
hello world
foo bar
sharfah@starship:~> sed 's/\x0//g' < file-with-nulls | less
hello world
foo bar
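The commands above only filter the output; the file itself is unchanged. Since tr cannot edit in place, a common approach is to go through a temporary file. The following is a sketch under that assumption, using scratch files from mktemp:

```shell
# Recreate the sample data in a hypothetical scratch file.
f=$(mktemp)
printf 'hello \000world\000\nfoo bar\n' > "$f"

# Write the cleaned data to a temporary file, then copy it back over the
# original. Copying back with cat keeps the original file's permissions.
tmp=$(mktemp)
tr -d '\000' < "$f" > "$tmp" && cat "$tmp" > "$f"
rm -f "$tmp"

# Verify: the file is 2 bytes shorter and has no NULs left.
size=$(wc -c < "$f")
left=$(tr -cd '\000' < "$f" | wc -c)
echo "bytes: $((size)), NUL bytes left: $((left))"

rm -f "$f"
```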
Do not use the strings command for this, because it starts a new line whenever it encounters a null character and drops any run of fewer than 4 printable characters. "The strings utility looks for ASCII strings in a binary file. A string is any sequence of 4 or more printing characters ending with a newline or a null character."
sharfah@starship:~> strings file-with-nulls | less
hello
world
foo bar

Further reading:
ASCII Character Set
Identifying and removing null characters in UNIX [stackoverflow]
File holes