SQL is a standard interactive and programming language for querying and modifying data and managing databases. This task-based tutorial and reference guide takes the mystery out of learning and applying SQL. Larry Ullman's books have sold worldwide in more than 20 languages. After 14 years of working for himself, Larry joined Stripe, where he is currently a Technical Writer.
About Larry Ullman: Larry Ullman is a writer, developer, teacher, speaker, and consultant.
A condition, or predicate, is a logical expression that evaluates to true, false, or unknown. Rows for which the condition is true are included in the result; rows for which the condition is false or unknown are excluded. An unknown result, which arises from nulls, is described in the next section. SQL provides operators that express different types of conditions Table 4.
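The rule that unknown results are excluded can be checked with a minimal sketch that runs SQL through Python's built-in sqlite3 module. The titles table and its three rows are made up for illustration, and the helper name select_ids is hypothetical.

```python
import sqlite3

def select_ids(condition):
    """Return the title IDs for which the WHERE condition is true."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE titles (title_id TEXT, price REAL)")
    # Hypothetical sample rows; T03 has a null price.
    conn.executemany(
        "INSERT INTO titles VALUES (?, ?)",
        [("T01", 21.99), ("T02", 19.95), ("T03", None)],
    )
    sql = f"SELECT title_id FROM titles WHERE {condition} ORDER BY title_id"
    ids = [row[0] for row in conn.execute(sql)]
    conn.close()
    return ids

under_20 = select_ids("price < 20")            # NULL < 20 is unknown, so T03 is dropped
not_under_20 = select_ids("NOT (price < 20)")  # NOT unknown is still unknown
null_price = select_ids("price IS NULL")       # only IS NULL finds T03
print(under_20, not_under_20, null_price)
```

The row with the null price appears in neither of the first two results, because neither condition evaluates to true for it.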
Operators are symbols or keywords that specify actions to perform on values or other elements. Datetimes must have the same fields year, month, day, hour, and so on to be compared meaningfully. Compare only identical or similar data types. An expression is any valid combination of column names, literals, functions, and operators that resolves to a single value per row. Chapter 5 covers expressions in more detail Listing 4.
For speed, fold your constants into a minimal number of expressions. Note that in the latter query, the subquery is aliased ta (a table alias).
All DBMSs accept table aliases, but not all require them. Logical operators, or Boolean operators, are operators designed to work with truth values: true, false, and unknown. In two-value logic, the result of a logical expression is either true or false. In three-value logic, the result of a logical expression is true, false, or unknown. If the result of a compound condition is false or unknown, the row is excluded from the result. This type of table is called a truth table.
Any number of conditions can be connected with ANDs. All the conditions must be true for the row to be included in the result. Some compound conditions need parentheses to force the order in which conditions are evaluated. See Listings 4. Table 4. Listing You can combine the three logical operators in a compound condition. You can override this order with parentheses: Everything in parentheses is evaluated first. When parenthesized conditions are nested, the innermost condition is evaluated first.
AND is evaluated before OR, so the query is evaluated as follows: 1. Find all the history titles regardless of price. List both sets of titles in the result Figure 4.
Find all the biography and history titles. List the subset of titles in the result Figure 4. To see the result of each comparison in Listing 4. Table 5. Operators in the same row have equal precedence. Associativity determines the order of evaluation in an expression when adjacent operators have equal precedence. SQL uses left-to-right associativity. You can use parentheses to override precedence and associativity rules Listing 5.
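The two evaluation orders described above (AND before OR, versus parentheses first) can be compared directly. This is a sketch using Python's sqlite3 module; the five sample rows and the helper select_ids are assumptions, not the book's data.

```python
import sqlite3

ROWS = [  # hypothetical sample data
    ("T01", "history", 21.99),
    ("T02", "history", 19.95),
    ("T03", "biography", 12.99),
    ("T04", "biography", 23.95),
    ("T05", "computer", 9.95),
]

def select_ids(condition):
    """Return the title IDs that satisfy the WHERE condition."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE titles (title_id TEXT, type TEXT, price REAL)")
    conn.executemany("INSERT INTO titles VALUES (?, ?, ?)", ROWS)
    sql = f"SELECT title_id FROM titles WHERE {condition} ORDER BY title_id"
    ids = [row[0] for row in conn.execute(sql)]
    conn.close()
    return ids

# AND binds tighter than OR: every history title, plus biographies under 20.
unparenthesized = select_ids(
    "type = 'history' OR type = 'biography' AND price < 20")
# Parentheses force the OR first: history or biography titles under 20.
parenthesized = select_ids(
    "(type = 'history' OR type = 'biography') AND price < 20")
print(unparenthesized, parenthesized)
```

The history title priced at 21.99 appears only in the first result, which is the difference the parentheses make.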
To run Listing 5. The third and fourth columns show how to use parentheses to override associativity rules. See Figure 5.
Concatenating Strings with ||
Use the || operator to combine, or concatenate, strings. Listing 5. Here, I need to convert sales from an integer to a string. Here, I need to convert pubdate from a datetime to a string.
Each operand is a string expression such as a column that contains character strings, a string literal, or the result of an operation or function that returns a string Listings 5.
The efficient way to express the clause is: Title T12 published on Title T06 published on Title T07 published on Figure 5. To run Listings 5. Search your DBMS documentation for concatenation or conversion.
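As a sketch of concatenation with an explicit conversion, the following runs against SQLite through Python's sqlite3 module. The two sample rows are made up, and CAST(... AS TEXT) is SQLite's spelling of the conversion; your DBMS may use a different function or type name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (title_id TEXT, sales INTEGER)")
# Hypothetical sample rows.
conn.executemany("INSERT INTO titles VALUES (?, ?)", [("T01", 566), ("T02", 9566)])

sql = """
    SELECT title_id || ' sold ' || CAST(sales AS TEXT) || ' copies'
      FROM titles
     ORDER BY title_id
"""
# Each operand of || is a string: a column, a literal, or a converted integer.
phrases = [row[0] for row in conn.execute(sql)]
conn.close()
print(phrases)
```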
Listing 5. The alphabetic part of a publisher ID is the first character, and the remaining characters are the numeric part. Figure 5. To extract a substring: Listing 5. Figure 5. Your DBMS might implicitly constrain start and length arguments that are too small or too large to sensible values.
The substring function may silently replace a negative start with 1 or a too-long length with the length of string, for example. Search your DBMS documentation for substring or substr. Digits, punctuation, and whitespace are left unchanged. All the letters in the LIKE pattern must be uppercase for this query to work. Your DBMS might provide other string-casing functions to, say, invert case or convert strings to sentence or title case.
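A short sketch of substring extraction and case conversion, using SQLite's substr, upper, and lower functions through Python's sqlite3 module. The literal values are illustrative, and the function names in your DBMS may differ (SUBSTRING versus substr, for example).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
row = conn.execute(
    """
    SELECT substr('P01', 1, 1),   -- first character: the alphabetic part
           substr('P01', 2),      -- the remaining characters: the numeric part
           upper('Kiss Me 2!'),
           lower('Kiss Me 2!')
    """
).fetchone()
conn.close()

alpha_part, numeric_part, uppercased, lowercased = row
print(alpha_part, numeric_part, uppercased, lowercased)
```

Only the letters change case; the digit, the spaces, and the exclamation mark come back untouched.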
Search your DBMS documentation for character functions or string functions. The characters show the extent of the trimmed strings. Both Figure 5. Your result will be either Figure 5. The CHAR 20 conversion shortens the title to make the result more readable.
Widening conversions always are allowed, but narrowing conversions can cause your DBMS to issue a warning or error. You can use Space number to add spaces to strings and Left string, length to truncate strings.
Search your DBMS documentation for conversion, cast, or formatting functions. CASE makes no changes to the underlying data. The simple CASE expression compares an expression to a set of simple expressions to determine the result. All expressions must be of the same type or must be implicitly convertible to the same type.
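A minimal sketch of a simple CASE expression, run through Python's sqlite3 module. The table contents and the result labels are invented for illustration; the second query confirms that CASE leaves the stored data alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (title_id TEXT, type TEXT)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO titles VALUES (?, ?)",
    [("T01", "history"), ("T02", "psychology"), ("T03", "computer")],
)

sql = """
    SELECT title_id,
           CASE type                      -- the expression being compared
             WHEN 'history'    THEN 'raise 10%'
             WHEN 'psychology' THEN 'raise 20%'
             ELSE 'no change'             -- used when no WHEN value matches
           END
      FROM titles
     ORDER BY title_id
"""
labels = conn.execute(sql).fetchall()
# The underlying column still holds its original values.
stored = [row[0] for row in conn.execute("SELECT type FROM titles ORDER BY title_id")]
conn.close()
print(labels, stored)
```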
First, value1 is compared. To run Listing 7. See Figure 7. That way, you can compare a column in the first instance of the table to a column in the second instance. As with all joins, your DBMS combines and returns rows of the table that satisfy the join condition.
Listing 7. A join condition takes this form: alias1. A common type of self-join compares a column in the first instance of the table to the same column in the second instance.
This join condition lets you compare the values in a column to one another, as shown in the subsequent examples in this section. Oracle 9i and later support JOIN syntax. Using a subquery, Listing 7. Adding a join condition retains only those rows in which the two authors differ Listing 7. The first row states that Sarah Buchman lives in the same state as Christian Kells, and the second row gives the same information.
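The redundant-pair problem described above can be reproduced with a small self-join. This sketch uses Python's sqlite3 module with three made-up author rows. Comparing the IDs with < instead of <> is a common way to keep one row per pair; it is shown here as an assumption, not necessarily the approach the listings take.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (au_id TEXT, au_lname TEXT, state TEXT)")
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO authors VALUES (?, ?, ?)",
    [("A01", "Buchman", "NY"), ("A02", "Heydemark", "CO"), ("A05", "Kells", "NY")],
)

def same_state_pairs(comparison):
    """Pair up authors who share a state, using two aliases of one table."""
    sql = f"""
        SELECT a1.au_lname, a2.au_lname
          FROM authors a1
         INNER JOIN authors a2
            ON a1.state = a2.state
         WHERE a1.au_id {comparison} a2.au_id
         ORDER BY a1.au_id, a2.au_id
    """
    return conn.execute(sql).fetchall()

both_directions = same_state_pairs("<>")  # each pair is listed twice
one_per_pair = same_state_pairs("<")      # each pair is listed once
conn.close()
print(both_directions, one_per_pair)
```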
Listing Listing 7. Subsequent sections explain the types of subqueries and their syntax and semantics. Suppose that you want to list the names of the publishers of biographies. The naive approach is to write two queries: one query to retrieve the IDs of all the biography publishers Listing 8.
Understanding Subqueries A better way is to use an inner join Listing 8. Another alternative is to use a subquery Listing 8. The subquery in Listing 8. A subquery also is called an inner query, and the statement containing a subquery is called an outer query.
In other words, an enclosed subquery is an inner query of an outer query. Remember that a subquery can be nested in another subquery, so inner and outer are relative terms in statements with multiple nested subqueries. Listing 8. See Figure 8.
Note that the inner query in Listing 8. You still must terminate the statement that contains the subquery with a semicolon. A subquery returns an intermediate result that you never see, so sorting a subquery makes no sense. The SQL standard categorizes a subquery by the number of rows and columns it returns Table 8.
In all cases, the subquery also can return an empty table zero rows. This built-in limit typically exceeds the limit of human comprehension. Microsoft SQL Server, for example, allows 32 levels of nesting. Table 8. Many subqueries can be formulated alternatively as joins. In fact, a subquery is a way to relate one table to another without actually doing a join.
Because subqueries can be hard to use and debug, you might prefer to use joins, but you can pose some questions only as subqueries. In cases where you can use subqueries and joins interchangeably, you should test queries on your DBMS to see whether a performance difference exists between a statement that uses a subquery and a semantically equivalent version that uses a join.
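To make the join-versus-subquery comparison concrete, here is a sketch of the biography-publishers question written both ways and run through Python's sqlite3 module. The rows are illustrative stand-ins for the sample database, and no performance claim is implied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE publishers (pub_id TEXT, pub_name TEXT)")
conn.execute("CREATE TABLE titles (title_id TEXT, type TEXT, pub_id TEXT)")
# Hypothetical sample rows.
conn.executemany("INSERT INTO publishers VALUES (?, ?)", [
    ("P01", "Abatis Publishers"),
    ("P02", "Core Dump Books"),
    ("P03", "Schadenfreude Press"),
])
conn.executemany("INSERT INTO titles VALUES (?, ?, ?)", [
    ("T01", "history", "P01"),
    ("T04", "psychology", "P02"),
    ("T06", "biography", "P01"),
    ("T07", "biography", "P03"),
    ("T12", "biography", "P01"),
])

join_sql = """
    SELECT DISTINCT p.pub_name
      FROM publishers p
     INNER JOIN titles t ON p.pub_id = t.pub_id
     WHERE t.type = 'biography'
     ORDER BY p.pub_name
"""
subquery_sql = """
    SELECT pub_name
      FROM publishers
     WHERE pub_id IN (SELECT pub_id FROM titles WHERE type = 'biography')
     ORDER BY pub_name
"""
via_join = [row[0] for row in conn.execute(join_sql)]
via_subquery = [row[0] for row in conn.execute(subquery_sql)]
conn.close()
print(via_join, via_subquery)
```

The join needs DISTINCT because one publisher has two biographies; the IN subquery returns each publisher once without it.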
Joins Listing 8. You always can express an inner join as a subquery, but not vice versa. This asymmetry occurs because inner joins are commutative; you can join tables A to B in either order and get the same answer. Subqueries lack this property. For information about aggregate functions, see Chapter 6. Figure 8. A correlated subquery is used if a statement needs to process a table in the inner query for each row in the outer query. This section gives an example of a simple subquery and a correlated subquery and then describes how a DBMS executes each one.
Subsequent sections in this chapter contain more examples of each type of subquery. Simple subqueries A DBMS evaluates a simple subquery by evaluating the inner query once and substituting its result into the outer query. A simple subquery executes prior to, and independent of, its outer query. Subqueries Listing 8. The inner query a simple subquery returns the cities of all the publishers Listing 8. Correlated subqueries offer a more powerful data-retrieval mechanism than simple subqueries do.
In the context of correlated subqueries, these qualified names are called correlation variables. The correlation variable candidate. This process continues until all the candidate rows have been processed. In Listing 8.
It needs a value for candidate. The column average. The average sales for a book type are calculated in the subquery by using the type of each book from the table in the outer query candidate. The subquery computes the average sales for this type and then compares it with a row in the table candidate. If the sales in the table candidate are greater than or equal to average sales for the type, that book is displayed in the result.
The DBMS repeats this process until every row in the outer table candidate has been tested. The book type in the first row of candidate is used in the subquery to compute average sales. Take the row for book T01, whose type is history, so the value in the column type in the first row of the table candidate is history.
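The row-by-row process described above can be sketched as a correlated subquery run through Python's sqlite3 module. The aliases candidate and average follow the text; the four rows and their sales figures are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE titles (title_id TEXT, type TEXT, sales INTEGER)")
# Hypothetical sample rows: history averages 5066, biography averages 200.
conn.executemany("INSERT INTO titles VALUES (?, ?, ?)", [
    ("T01", "history", 566),
    ("T02", "history", 9566),
    ("T03", "biography", 100),
    ("T04", "biography", 300),
])

sql = """
    SELECT candidate.title_id
      FROM titles candidate
     WHERE candidate.sales >=
           (SELECT AVG(average.sales)
              FROM titles average
             WHERE average.type = candidate.type)  -- re-evaluated per candidate row
     ORDER BY candidate.title_id
"""
at_or_above_average = [row[0] for row in conn.execute(sql)]
conn.close()
print(at_or_above_average)
```

For the T01 row, the inner query effectively becomes "average sales where type = 'history'", and 566 falls short of that average, so T01 is excluded.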
In effect, the subquery becomes: 2. Listings 8. Why do I say that a statement that uses a simple subquery probably will run faster than an equivalent statement that uses a correlated subquery when a correlated subquery clearly requires more work?
MySQL 4. To run Listings 8. This query probably will run slower than Listing 8. In statements that contain subqueries, column names are qualified implicitly by the table referenced in the FROM clause at the same nesting level.
In Listing 8. Abatis Publishers Schadenfreude Press Figure 8. A subquery can hide a comparison to a null. Consider the following two tables, each with one column. The first table is named table1: col Listing 8. This result is an empty table, which is correct logically but not what I expected. Why is the result empty this time? The solution requires some algebra. I can move the NOT outside the subquery condition without changing the meaning of Listing 8.
Refer to the AND truth table Table 4. To fix Listing 8. Recall from Table 8. The aggregate function AVG guarantees that each subquery returns a single value. For a more efficient way to implement this query, see the Tips in this section. See Listing You should qualify every column name explicitly in a subquery that contains a join to make it clear which table is referenced even when qualifiers are unnecessary.
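The hidden comparison to a null can be reproduced with two one-column tables. This sketch uses Python's sqlite3 module; the values are made up, and the two workarounds shown (filtering nulls inside the subquery, or switching to NOT EXISTS) are common fixes offered as assumptions rather than as the book's exact listing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (col INTEGER)")
conn.execute("CREATE TABLE table2 (col INTEGER)")
conn.executemany("INSERT INTO table1 VALUES (?)", [(1,), (2,)])
conn.executemany("INSERT INTO table2 VALUES (?)", [(1,), (2,), (3,)])

def run(sql):
    return [row[0] for row in conn.execute(sql)]

NOT_IN = "SELECT col FROM table2 WHERE col NOT IN (SELECT col FROM table1)"
before_null = run(NOT_IN)

# One null in the subquery result makes every NOT IN test false or unknown.
conn.execute("INSERT INTO table1 VALUES (NULL)")
after_null = run(NOT_IN)

# Workaround 1: keep nulls out of the subquery result.
filtered = run(
    "SELECT col FROM table2 WHERE col NOT IN "
    "(SELECT col FROM table1 WHERE col IS NOT NULL)")
# Workaround 2: NOT EXISTS tests for matching rows, not for values.
not_exists = run(
    "SELECT col FROM table2 t2 WHERE NOT EXISTS "
    "(SELECT * FROM table1 t1 WHERE t1.col = t2.col)")
conn.close()
print(before_null, after_null, filtered, not_exists)
```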
Subqueries introduced with comparison operators often use aggregate functions to return a single value. No publisher named XXX exists, so the subquery returns an empty table zero rows. The comparison evaluates to null, so the final result is empty. Again, the subquery returns a single value the average of all sales. The subquery calculates the highest royalty share for each book being considered for selection in the outer query.
For each possible value of ta1, the DBMS evaluates the subquery and puts the row being considered in the result if the royalty share is less than the calculated maximum. Comparing a Subquery Value Listing 8. For each possible value of t1, the DBMS evaluates the subquery and includes the row in the result if the price value in that row exceeds the calculated average. You also can use a subquery to generate the list. A subquery that returns more than one column will cause an error.
The DBMS evaluates this statement in two steps. First, the inner query returns the IDs of the publishers that have published biographies P01 and P Second, the DBMS substitutes these values into the outer query, which finds the names that go with the IDs in the table publishers. Listing Listing 8. Finally, the outermost query uses the author IDs to find the names of the authors. To determine whether an author is a coauthor or the sole author of a book, examine his or her royalty share for the book.
If the royalty share is less than percent 1. The DBMS considers each row in the outer-query table authors to be a candidate for inclusion in the result.
When the DBMS examines the first candidate row in authors, it sets the correlation variable a. Chapter 8 Listing 8. The inner query returns the author IDs of sole authors, and the outer query compares these IDs with the IDs of the coauthors. You can rewrite Listing 8. To run Listing 8. The subquery returns the same number of columns as there are in the list. The DBMS compares the values in corresponding columns. ALL means greater than every value in the subquery result. ALL means greater than every subquery value—that is, greater than the maximum value.
You might find this result to be counterintuitive. For example, the query Subqueries Listing 8. The ALL condition evaluates to true if all values in subquery satisfy the ALL condition or if the subquery result is empty has zero rows. The inner query finds all the biography prices.
The outer query inspects the lowest price in the list and determines whether each nonbiography is cheaper. The inner query uses a join to find the sales of each book by author A The outer query inspects the highest sales figure in the list and determines whether each book sold more copies.
The inner query is evaluated once for each group defined in the outer query once for each type of book. ANY means greater than at least one value in the subquery result. ANY means greater than at least one subquery value— that is, greater than the minimum value.
If any at least one value in subquery satisfies the ANY condition, the condition evaluates to true. The ANY condition is false if no value in subquery satisfies the condition or if subquery is empty has zero rows or contains all nulls. You can use IN to replicate Listing 8. The outer query inspects the highest price in the list and determines whether each nonbiography is cheaper.
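The empty-subquery behavior of ALL and ANY can be illustrated with a sketch, with one loud caveat: SQLite does not support the ALL and ANY quantifiers, so the code below swaps in NOT EXISTS and EXISTS as stand-ins. These emulations match "price < ALL (subquery)" and "price < ANY (subquery)" only when no nulls are involved. The rows and the helper cheaper_than are hypothetical.

```python
import sqlite3

ROWS = [  # hypothetical sample data
    ("T01", "history", 10.00),
    ("T02", "psychology", 15.00),
    ("T03", "computer", 39.95),
    ("T04", "biography", 12.99),
    ("T05", "biography", 19.95),
]

def cheaper_than(quantifier, book_type):
    """Titles of other types priced below ALL or ANY titles of book_type."""
    if quantifier == "ALL":
        # True when no row of book_type fails the comparison (also when none exist).
        test = ("NOT EXISTS (SELECT * FROM titles s "
                "WHERE s.type = ? AND NOT (t.price < s.price))")
    else:
        # True when at least one row of book_type passes the comparison.
        test = ("EXISTS (SELECT * FROM titles s "
                "WHERE s.type = ? AND t.price < s.price)")
    sql = (f"SELECT t.title_id FROM titles t "
           f"WHERE t.type <> ? AND {test} ORDER BY t.title_id")
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE titles (title_id TEXT, type TEXT, price REAL)")
    conn.executemany("INSERT INTO titles VALUES (?, ?, ?)", ROWS)
    ids = [row[0] for row in conn.execute(sql, (book_type, book_type))]
    conn.close()
    return ids

below_all = cheaper_than("ALL", "biography")   # cheaper than the cheapest biography
below_any = cheaper_than("ANY", "biography")   # cheaper than the priciest biography
empty_all = cheaper_than("ALL", "poetry")      # no poetry rows: ALL is true for every row
empty_any = cheaper_than("ANY", "poetry")      # no poetry rows: ANY is false for every row
print(below_all, below_any, empty_all, empty_any)
```

On a DBMS that supports the quantifiers, the same questions would be written with < ALL and < ANY directly.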
Unlike the ALL comparison in Listing 8. The outer query inspects the lowest sales figure in the list and determines whether each book sold more copies. Again, unlike the ALL comparison in Listing 8. I can replicate Listing 8.
Listing specific column names is unnecessary, because EXISTS simply tests for the existence of rows that satisfy the subquery conditions; the actual values in the rows are irrelevant. See Listing 8. Here, the first publisher is P01 Abatis Publishers. If so, Abatis Publishers is included in the final result. See Listing 9.
This query is equivalent to Listing 8. The existence test in Listing 8. I could argue that the result, Figure 8. Additionally, in Listing 8. Additionally, in Listings 8. For example, change Listing 8. Each of the statements in Listing 8. The first two queries inner joins will run at the same speed as one another.
Of the third through sixth queries which use subqueries , the last one probably is the worst performer. The DBMS will stop processing the other subqueries as soon as it encounters a single matching value.
But the subquery in the last statement has to count all the matching rows before it returns either true or false. Entire careers are devoted to solving these types of optimization problems.
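As a sketch of the contrast drawn above, the same question can be asked with an existence test and with a count. The code runs through Python's sqlite3 module on made-up rows and checks only that the two forms agree; it makes no timing measurement, and how each form is executed depends on your DBMS's optimizer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE publishers (pub_id TEXT, pub_name TEXT)")
conn.execute("CREATE TABLE titles (title_id TEXT, type TEXT, pub_id TEXT)")
# Hypothetical sample rows.
conn.executemany("INSERT INTO publishers VALUES (?, ?)", [
    ("P01", "Abatis Publishers"),
    ("P02", "Core Dump Books"),
    ("P03", "Schadenfreude Press"),
])
conn.executemany("INSERT INTO titles VALUES (?, ?, ?)", [
    ("T04", "psychology", "P02"),
    ("T06", "biography", "P01"),
    ("T07", "biography", "P03"),
    ("T12", "biography", "P01"),
])

exists_sql = """
    SELECT pub_name FROM publishers p
     WHERE EXISTS (SELECT *              -- column list is irrelevant to EXISTS
                     FROM titles t
                    WHERE t.pub_id = p.pub_id AND t.type = 'biography')
     ORDER BY pub_name
"""
count_sql = """
    SELECT pub_name FROM publishers p
     WHERE (SELECT COUNT(*)              -- must count every matching row
              FROM titles t
             WHERE t.pub_id = p.pub_id AND t.type = 'biography') > 0
     ORDER BY pub_name
"""
via_exists = [row[0] for row in conn.execute(exists_sql)]
via_count = [row[0] for row in conn.execute(count_sql)]
conn.close()
print(via_exists, via_count)
```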
Figure 8. DBMSs provide tools to let you measure the efficiency of queries. Tables 8. Performance tuning involves some platform-independent general principles, but the most effective tuning relies on the idiosyncrasies of the specific DBMS. Tuning is beyond the scope of this book, but the internet has plenty of discussion groups and articles; search for tuning or performance or optimization together with the name of your DBMS.
If you look up one of these books on Amazon. But whereas mathematical sets are unchanging, database sets are dynamic—they grow, shrink, and otherwise change over time. This operation differs from a join, which combines columns from two tables. See Figure 9. The sort is applied to the final, combined result. Listing 9. Set Operations Listing 9. The number and the order of the columns must be identical in both statements, and the data types of corresponding columns must be compatible.
Duplicate rows are eliminated from the result unless ALL is specified. The AS clause in the first query names the column in the result. Chapter 9 Listing 9. Figure 9. The results are DBMS dependent. The second statement includes duplicates in the union of table1 and table2 but eliminates duplicates in the subsequent union with table3, so ALL has no effect on the final result of this statement. To run Listings 9. To run Listing 9. Duplicate rows are eliminated from the result.
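A minimal sketch of UNION versus UNION ALL, run through Python's sqlite3 module. The two tables and their rows are made up; ORDER BY 1 sorts the final combined result by its first column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (au_id TEXT, state TEXT)")
conn.execute("CREATE TABLE publishers (pub_id TEXT, state TEXT)")
# Hypothetical sample rows.
conn.executemany("INSERT INTO authors VALUES (?, ?)",
                 [("A01", "NY"), ("A02", "CO"), ("A05", "NY")])
conn.executemany("INSERT INTO publishers VALUES (?, ?)",
                 [("P01", "NY"), ("P02", "CA")])

# UNION removes duplicate rows from the combined result.
distinct_states = [row[0] for row in conn.execute(
    "SELECT state FROM authors UNION SELECT state FROM publishers ORDER BY 1")]
# UNION ALL keeps every row from both queries.
all_states = [row[0] for row in conn.execute(
    "SELECT state FROM authors UNION ALL SELECT state FROM publishers ORDER BY 1")]
conn.close()
print(distinct_states, all_states)
```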
Each of the following statements is equivalent to Listing 9. Unlike SELECT, which only accesses data, these statements change data, so your database administrator might need to grant you permission to run them. This section explains how to use those tools to display table definitions for the current database. The osql and sqlcmd commands display a few pages that speed by. Figure To display table definitions in Oracle: Figure Type describe table; and then press Enter Figure Table To modify table definitions, see Chapter See Figure The number of values must equal the number of columns in table, and the values must be listed in the same sequence as the columns in table.
This statement adds one row to table Listing The number of values must equal the number of columns in the column list, and the values must be listed in the same sequence as the column names. The DBMS inserts each value into a column by using corresponding list positions. An omitted column is assigned its default value or null.
This statement adds one row to table. Chapter 10 You can omit column names if you want to provide values for only some columns explicitly Listing The DBMS inserts nulls into the omitted columns automatically. The number of columns in the subquery result must equal the number of columns in table or in the column list. The first column in the subquery result is used to populate the first column in table or column1, and so on. This statement adds zero or more rows to table.
This statement inserts one row into publishers; see Figure Listing This statement inserts two rows into publishers; see Figure This statement inserts no rows into publishers because no publisher is named XXX; see Figure This statement has no effect on the target table.
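The INSERT forms described above can be sketched as follows, using Python's sqlite3 module. The table definitions, the publisher names, and the default value 'Unknown' are all assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE publishers (
        pub_id   TEXT PRIMARY KEY,
        pub_name TEXT NOT NULL,
        city     TEXT DEFAULT 'Unknown',
        state    TEXT
    )
""")
conn.execute(
    "CREATE TABLE new_publishers (pub_id TEXT, pub_name TEXT, city TEXT, state TEXT)")
# Hypothetical source rows for INSERT ... SELECT.
conn.executemany("INSERT INTO new_publishers VALUES (?, ?, ?, ?)", [
    ("P07", "Pubco Three", "Denver", "CO"),
    ("P08", "Pubco Four", "Austin", "TX"),
])

def count_rows():
    return conn.execute("SELECT COUNT(*) FROM publishers").fetchone()[0]

# Positional values: one value per column, in table order.
conn.execute("INSERT INTO publishers VALUES ('P05', 'Pubco One', 'Boston', 'MA')")
# Column list: omitted columns get their default value or null.
conn.execute("INSERT INTO publishers (pub_id, pub_name) VALUES ('P06', 'Pubco Two')")
after_values = count_rows()

# Subquery that matches nothing: zero rows are inserted.
conn.execute("""
    INSERT INTO publishers (pub_id, pub_name, city, state)
    SELECT pub_id, pub_name, city, state FROM new_publishers WHERE pub_name = 'XXX'
""")
after_empty_select = count_rows()

# Subquery that matches two rows: two rows are inserted.
conn.execute("""
    INSERT INTO publishers (pub_id, pub_name, city, state)
    SELECT pub_id, pub_name, city, state FROM new_publishers
""")
after_select = count_rows()

defaults_row = conn.execute(
    "SELECT city, state FROM publishers WHERE pub_id = 'P06'").fetchone()
conn.close()
print(after_values, after_empty_select, after_select, defaults_row)
```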
For information about transactions, see Chapter The value returned by expr replaces the existing value in column. This statement updates 13 rows; see Figure This statement updates three rows; see Figure This statement updates two rows; see Figure You can update values in a given table based on the values stored in another table.
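Updating one table based on values stored in another can be sketched with a subquery in the WHERE clause. The example uses Python's sqlite3 module; the rows, prices, and the doubling rule are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE publishers (pub_id TEXT, pub_name TEXT)")
conn.execute("CREATE TABLE titles (title_id TEXT, pub_id TEXT, price REAL)")
# Hypothetical sample rows.
conn.executemany("INSERT INTO publishers VALUES (?, ?)",
                 [("P01", "Abatis Publishers"), ("P02", "Core Dump Books")])
conn.executemany("INSERT INTO titles VALUES (?, ?, ?)",
                 [("T01", "P01", 10.0), ("T02", "P02", 20.0), ("T03", "P01", 30.0)])

# Double the price of every title whose publisher has a given name.
cursor = conn.execute("""
    UPDATE titles
       SET price = price * 2
     WHERE pub_id IN (SELECT pub_id
                        FROM publishers
                       WHERE pub_name = 'Abatis Publishers')
""")
rows_updated = cursor.rowcount  # number of rows the UPDATE changed
prices = conn.execute(
    "SELECT title_id, price FROM titles ORDER BY title_id").fetchall()
conn.close()
print(rows_updated, prices)
```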
Each listing updates values in a different column or columns from those in the other listings. The updated values in each column are shown in red. The updated values are shown in red. These operations informally are called upserts. To run Listing Then run the statements programmatically in a host language such as Visual Basic or C , using the result of the first statement as the input for the second statement.
To run Listings The changes are Listing Even if you remove all rows from a table, the table itself still exists. This statement deletes 13 rows; see Figure This statement deletes two rows; see Figure This statement deletes 12 rows; see Figure These statements modify database objects and data, so your database administrator might need to grant you permission to run them.
To a user or an SQL programmer, a database appears to be a collection of one or more tables and nothing but tables. Constraints define properties such as nullability, keys, and permissible values. An optional list of table constraints follows the final column definition. By convention, I start each column definition and table constraint on its own line.
Creating, Altering, and Dropping Tables Table Your DBMS uses these rules to enforce the integrity of information in the database automatically. Constraints come in two flavors: A column constraint is part of a column definition and imposes a condition on that column only. You must use a table constraint to include more than one column in a single constraint. If a primary key contains one column, for example, you can define it as either a column constraint or a table constraint.
If the primary key has two or more columns, you must use a table constraint. Constraint names are optional, but many SQL programmers and database designers name all constraints.
Constraint names also appear in warnings, error messages, and logs, which is another good reason to name constraints yourself. Constraint names must be unique within a table. You must create at least one column. The table name must be unique within the database, and each column name must be unique within the table. If the nullability constraint is omitted, the column accepts nulls. The database designer makes these types of decisions before creating a table.
Where omitted, the nullability constraint defaults to allow nulls. For the table authors created by Listing MySQL, for example, assigns a default value of zero to numeric, non-nullable columns without explicit defaults. The pubdate and contract defaults show that the defaults can be expressions more complex than plain literals. Where no default is specified, the DBMS inserts a null. If no default is specified, NULL is assumed. In practice, you always should define a primary key for every table.
The database designer picks one of the candidate keys to be the primary key. Listings This syntax shows the easiest way to create a simple primary key. This restriction is called referential integrity. In practice, foreign-key constraints almost always are named explicitly. Poor design can lead to time-consuming routine queries, circular rules, tricky backup-and-restore operations, and psychotically ambitious cascading deletes.
Updating a row in the foreign-key table. Deleting a row in the foreign-key table. A referential-integrity check is unnecessary. Inserting a row into the parent table.
Updating a row in the parent table. Deleting a row from the parent table. This syntax shows the easiest way to create a simple foreign key. This syntax shows the preferred way to add foreign keys; you can use the names if you decide to change or delete the keys later.
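Which operations trigger a referential-integrity check can be seen in a small sketch run through Python's sqlite3 module. The constraint name titles_pub_fk, the rows, and the helper violates are hypothetical. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, which is a SQLite-specific detail; most other DBMSs enforce them by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: turn enforcement on
conn.execute(
    "CREATE TABLE publishers (pub_id TEXT PRIMARY KEY, pub_name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE titles (
        title_id TEXT PRIMARY KEY,
        pub_id   TEXT NOT NULL,
        CONSTRAINT titles_pub_fk          -- named foreign-key table constraint
            FOREIGN KEY (pub_id) REFERENCES publishers (pub_id)
    )
""")
conn.execute("INSERT INTO publishers VALUES ('P01', 'Abatis Publishers')")
conn.execute("INSERT INTO titles VALUES ('T01', 'P01')")

def violates(sql):
    """Run a statement and report whether it broke an integrity constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# Inserting a child row with no matching parent fails.
orphan_insert = violates("INSERT INTO titles VALUES ('T02', 'P99')")
# Deleting a parent row that is still referenced fails.
parent_delete = violates("DELETE FROM publishers WHERE pub_id = 'P01'")
# Inserting a parent row, or deleting a child row, cannot orphan anything.
parent_insert = violates("INSERT INTO publishers VALUES ('P02', 'Core Dump Books')")
child_delete = violates("DELETE FROM titles WHERE title_id = 'T01'")
conn.close()
print(orphan_insert, parent_delete, parent_insert, child_delete)
```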
Each foreign-key column is an individual key and not part of a single composite key. This action is the default. An ISBN is a unique, standardized identification number that marks a book unmistakably. A simple unique constraint can be a column constraint or a table constraint; a composite unique constraint always is a table constraint. A unique constraint is similar to a primary-key constraint, except that a unique column can contain nulls and a table can have multiple unique columns.
This syntax shows the easiest way to create a simple unique constraint.
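A sketch of a named unique constraint, run through Python's sqlite3 module. The constraint name titles_isbn_unique and the ISBN values are made up. SQLite accepts any number of nulls in a unique column; how many nulls a unique column may hold varies by DBMS, so treat that part as SQLite's behavior rather than a general rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE titles (
        title_id TEXT PRIMARY KEY,
        isbn     TEXT,
        CONSTRAINT titles_isbn_unique UNIQUE (isbn)   -- table-constraint form
    )
""")
# A non-null ISBN and two nulls are all accepted by SQLite.
conn.execute("INSERT INTO titles VALUES ('T01', '111')")
conn.execute("INSERT INTO titles VALUES ('T02', NULL)")
conn.execute("INSERT INTO titles VALUES ('T03', NULL)")

try:
    # A repeated non-null ISBN violates the unique constraint.
    conn.execute("INSERT INTO titles VALUES ('T04', '111')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM titles").fetchone()[0]
conn.close()
print(duplicate_rejected, row_count)
```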