This is the last tutorial on what I would consider "basic" SQL, that is, queries that operate on a single table. The
LIMIT functions aren't hard to figure out; they do what they say they do. Their functionality, for our purposes, is largely cosmetic. So this tutorial will mostly be a continuation of the "Let's see how complicated we can make our queries" game.
Note: For this SQL lesson, I will be using the Sequel Pro GUI for the MySQL database engine and will be querying the SFPD incident reports categorized as
If you want to see exactly the same results I do, you'll want to download and import the__ MySQL database of 2003 to 2013 assault reports from the SFPD__.
I've also created the SQLite version of that database, which should functionally be the same as the MySQL version.
For both MySQL and SQLite, I've also created a database of all the SFPD reports from 2003 to 2013. All the queries should work the same, except you can explore all the different categories of crime. The tradeoff is that the database is much bigger, and so will be slower to download, import, and query. If you're completely new to all this, I would just go with the assaults database, just so any errors you make don't take even longer to figure out.
LIMIT the number of results returned
LIMIT clause takes a number and then limits the number of results to that number. For example, to return just the first row in
SELECT * FROM sfpd_incidents LIMIT 1
LIMIT your results, but want to show, say, the 10th result instead of the first, you can use the
SELECT * FROM sfpd_incidents LIMIT 1 OFFSET 9
Note: I've almost never used
OFFSET outside of building a web application, in which I'm doing something like paginating results (i.e. showing the fifth page of 100 records, 10 records per page). Likely, you won't need
OFFSET for doing data queries.
LIMIT clause is frequently used with
ORDER, because we generally don't care about the very first record in a dataset, but the first record according to a specified order. This is not conceptually different than doing a column sort in a spreadsheet.
ORDER BY clause comes before
LIMIT; and both clauses will typically be the final pieces of a query.
For example, if we aren't sure that a dataset is chronologically ordered, we can specify that results be sorted by
SELECT IncidntNum, Date, Time FROM sfpd_incidents ORDER BY Date LIMIT 5
ORDER BY clause will take a comma-separated list of column names. In the above result set, to get a true chronological sorting, we need to do a secondary sort on the
SELECT IncidntNum, Date, Time FROM sfpd_incidents ORDER BY Date, Time LIMIT 5
+————+————+——-+ | IncidntNum | Date | Time | +————+————+——-+ | 040507759 | 2003-01-01 | 00:01 | | 021623770 | 2003-01-01 | 00:01 | | 030000600 | 2003-01-01 | 00:01 | | 030320997 | 2003-01-01 | 00:01 | | 030470586 | 2003-01-01 | 00:01 | +————+————+——-+
GROUP BY, the order of the columns given to
ORDER BY does matter, with priority given to the columns from left to right. For example, the following query would order results by
Time, and only order by
Date in the event of a tie of
SELECT IncidntNum, Date, Time FROM sfpd_incidents WHERE PdDistrict = 'Southern' ORDER BY Time, Date LIMIT 5
ORDER BY will return the results in ascending order, i.e from
99. If we want to reverse that sort, we provide the
DESC keyword (short for descending) after the column name.
The following query will return the latest (well, in the year 2013) results in our dataset:
SELECT IncidntNum, Date, Time FROM sfpd_incidents ORDER BY Date DESC, Time DESC LIMIT 5
All together now
LIMIT clauses cap off SQL queries as we now know them:
SELECTcolumns, functions, and aliased columns
WHEREsome conditional statements, including
GROUP BY(optional) columns to group by
HAVING(optional) some conditional statements
Here's an actual example:
SELECT PdDistrict, Descript, COUNT(*) AS report_count FROM sfpd_incidents WHERE ((Date BETWEEN '2003-01-01' AND '2006-12-31') OR Date LIKE '2011%') AND Descript LIKE 'AGGRAVATED%' AND PdDistrict IN('CENTRAL', 'TENDERLOIN', 'SOUTHERN', 'NORTHERN') GROUP BY PdDistrict, Descript HAVING report_count > 500 ORDER BY report_count DESC LIMIT 5
Being able to order and limit the result set is handy when we're trying to find a single answer, such as: "In the year 2008, what was the very first incident report for the Northern district?" or "Show the top 3 districts when it comes to making arresting suspected stalkers"
This concludes our coverage of basic SQL functionality. Make sure you have a good grasp of how to properly construct these queries, because while we won't be learning much more syntax, we will be constructing the SQL equivalent of Dickensian-length sentences.
- Find the final
'AGGRAVATED ASSAULT'to have been reported to the
'TENDERLOIN'district in the year 2010.
- Show the top 3 districts when it comes to making arresting suspected stalkers.
SELECT * FROM sfpd_incidents WHERE Descript LIKE 'AGGRAVATED ASSAULT%' AND PdDistrict = 'TENDERLOIN' AND Date LIKE '2010%' ORDER BY Date DESC, Time DESC LIMIT 1
We only have to group by the
PdDistrict field if we filter for
SELECT PdDistrict, COUNT(1) AS c FROM sfpd_incidents WHERE Descript = 'STALKING' AND Resolution LIKE 'ARREST%' GROUP BY PdDistrict ORDER BY c DESC LIMIT 3