In the previous tutorial on datatypes, we learned that there is a difference between strings and numbers and dates, that "101" is not quite the same as
101 when it comes to SQL queries and functions.
In this brief tutorial, we learn that there's a difference between what we, in non-programmer-terms, think of as nothing – e.g.
0 and the empty string of
"" – and what SQL refers to as
"Yeah, well, sometimes nothin' can be a real cool hand" – Luke
To use a clumsy cinematic metaphor, a handful of nothing is what Paul Newman holds, colloquially, during a poker game in Cool Hand Luke. Whereas
NULL is what you hold when you are not in the game.
For us SQL programmers, there is one immediate syntactical takeaway: know the difference between:
SELECT * FROM fruits WHERE number_sold = 0 AND name = ''
SELECT * FROM fruits WHERE number_sold IS NULL AND name IS NULL
What is NULL
SQL null is a state (unknown) and not a value. – Wikipedia
A tricky concept for non-programmers is the concept of the
NULL state. In other programming languages, it's sometimes referred to as
nil; in Spanish, as nada:
Our nada who art in nada, nada be thy name thy kingdom nada thy will be nada in nada as it is in nada. Give us this nada our daily nada and nada us our nada as we nada our nadas and nada us not into nada but deliver us from nada; pues nada. Hail nothing full of nothing, nothing is with thee.”
NULL has connotations of nothingness, it is not the same as a blank value, i.e. an empty string of
"", and it is most definitely not the same as the number zero.
Think of a database of
people that contains names and yearly income:
The clearest illustration of the difference between
NULL and a value like
"" can be found in the
yearly_income. If you wanted to get the average income of the folks in the
people table, you might want to exclude people who aren't eligible to receive an income, e.g. a child. Technically, such a person "makes" an income equivalent to $0. But how would differentiate between such a person (in the example table above,
Zach Butler Jr.) and someone else who is expected to have an income but still earned $0? (e.g.
Charles R. Thompson III)?
yearly_income field is set to be a numeric-type of column, then we can't just add a
Butler Jr. However, we can set his
NULL, to differentiate his situation from someone who makes
So a query to find all
people who earned
0 income would look like this:
SELECT * FROM people WHERE yearly_income = 0
However, the syntax for finding
NULL states does not use the equals operator. To find all people with a
SELECT * FROM people WHERE yearly_income IS NULL
To find the inverse of that, i.e. people who are eligible to make an income:
SELECT * FROM people WHERE yearly_income IS NOT NULL
In the example
people table, look at the possible values for
To reiterate: just as
0 is not equivalent to
NULL – an empty string, i.e.
"" – is also not equivalent to
So here's where confusion often arises in data: what does
NULL and a blank value mean to the maintainer of the data? In the
yearly_income case, I posited an explanation: a
NULL state means that a person is ineligible to receive an income. However, that's just speculation: it could be that
NULL means, "The collector of this data never got around to finding this value".
NULL state could mean just that: "We haven't asked Nick R. Johnson if he is a Jr. or a Sr." And the blank value for Sara C. Smith could mean, "Sara C. Smith does not have a suffix". Or I could have the meanings switched up here. The point is, from a semantic standpoint, you might not know the difference between
0. And to find the difference, you may have to tab-out of your database GUI and make a phone call or visit to the data's originating office.
If you happen to be a creator of data, being aware of
NULL and blank values may be important, especially if you're starting data collection from a spreadsheet or text file, and then importing into a database. Sometimes the import process may interpret a blank value as
NULL; other times, just as an actual blank value or
0. And spreadsheets typically don't have a native way of designating a
The Wikipedia entry on
NULL in SQL has a nice summary of the controversy among programmers:
Null has been the focus of controversy and a source of debate because of its associated three-valued logic (3VL), special requirements for its use in SQL joins, and the special handling required by aggregate functions and SQL grouping operators…
For people new to the subject, a good way to remember what null means is to remember that in terms of information, "lack of a value" is not the same thing as "a value of zero"; similarly, "lack of an answer" is not the same thing as "an answer of no". For example, consider the question "How many books does Juan own?" The answer may be "zero" (we know that he owns none) or "null" (we do not know how many he owns, or doesn't own).
In a database table, the column reporting this answer would start out with a value of null, and it would not be updated with "zero" until we have ascertained that Juan owns no books.
The main point of this brief lesson was to make you aware of the existence of
NULL – some of the datasets I've imported into MySQL via Sequel Pro have treated blank values as blank. And in SQLite, the import process has treated the values as
At left is a table in MySQL/Sequel Pro that came in from a CSV file. Apparently, Sequel Pro considered blank values to be blank. At right, SQLite Manager treated the blank values as
NULL, which is signified as pink-colored cells:
As a result, the following query will not return the same results across the two datasets:
SELECT first_name, middle_name, last_name FROM members WHERE middle_name != ''
So, on an immediate, practical level, you will want to know the
NULL syntax in order to make valid queries, and you may have to check the table structure yourself to see what's going on. And if you are importing datasets on your own, now you have one more data-integrity issue to be mindful of.