MCTS 70 – 433 Applying additional query techniques

Applying additional query techniques

Implement subqueries

Subqueries can be correlated or noncorrelated; correlated subqueries depend on the outer query whereas noncorrelated subqueries are independent. The subqueries are defined within brackets.

An example correlated query would be:

SELECT
p.ProductId
p.Name
p.ListPrice
FROM
Production.Product p
WHERE EXISTS (SELECT 1 FROM Sales.SalesOrderDetail sod WHERE sod.ProductId = p.ProductId)

The above subquery within the brackets will be repeatedly executed based in input fron the outer query i.e. p.ProductId

Noncorrelated queries allow you to build dynamic queries which do not require the end user to know the intermediate data e.g.

SELECT
a.Name,
a.ListPrice
FROM 
Production.Product a
WHERE
a.ListPrice > (SELECT AVG(b.ListPrice) FROM Production.Product b)
ORDER BY
ListPrice

The above subquery within the brackets will be executed independently of the outer query.

Derived tables

SELECT
he.EmployeeID,
he.Title,
NoJobTitles.NumTitles
FROM (
SELECT he.Title, COUNT(*) NumTitles
FROM HumanResources.Employee he
GROUP BY Title) NoJobTitles
JOIN
HumanResources.Employee he
ON
NoJobTitles.Title = he.Title

Derived tables or virtual tables are tables created at runtime from a SELECT statement embedded in the FROM clause. SQL server will execute the embedded SELECT statement first to build the table structure and tag is ready for it to be referenced by the outer query.

Implement Common Table Expression (CTE) queries

CTEs are defined in two parts; a WITH clause containing a SELECT statement that generates a valid table and a outer SELECT statement that references the table expression.

A recursive CTE expands the definition of a table expression. Recursive CTEs contain a anchor query; this a SELECT statement that returns a starting point. e.g.

SELECT
*
FROM
table tbl1
WHERE
tbl1.id = [some value]

The anchor query is combined with a second query using the UNION ALL operator and executed recursively up the anchor query. The outer query returns the results of the recursive query (you can only do joins on the outer SELECT to return additional data).

Example here

Apply ranking functions

There are four ranking functions:

row_number, rank, dense_rank and ntile.

ROW_NUMBER assigns a number from 1 to n based on a user specified sorting order i.e. column. ROW_NUMBER also has the PARTITION BY option which will segment results depending on what is defined after the PARTITION BY clause.

The next functions RANK and DENSE_RANK are used to RANK a row on a particular column e.g. ranking components by the quantity is stock. If the RANK function doesn’t come across any rows which tie i.e. have the same data for that particular column then RANK, DENSE_RANK and ROW_NUMBER return the same result set.

If there are ties in the result set then RANK will display gaps in the ranking sequence where n number of rows are tied e.g. if the 1th ranking row has 3 equivalently ranked rows then the next ranking assigned will be 14th; DENSE_RANK doesn’t produce gaps in this scenario. RANK and DENSE_RANK also have the PARTITION BY option.

The NTILE(n) function divides the result set into n number of groups which are equal where applicable. NTILE also has the PARTITION BY option.

Examples here

Control execution plans

The execution of a query can be controlled with TABLE, QUERY and JOIN hints.

TABLE hints override the default behaviour of the query optimiser by specifying a locking mechanism, index or processing operation.

The TABLE hints available are:

FORCESEEK – forces the query optimser to only use an index seek operation to retrieve the data. Can be used on clustered and nonclustered indexes. If no index is found for that particular table then an error is returned.

KEEPIDENTITY – Used with BULK INSERTS statements. This allows for the identities contained in the import data to be retained; as long as they don’t violate constraints.

KEEPDEFAULTS – Used with BULK INSERT statements. This allows the column defaults to be used when no value is specified for that column in the BULK INSERT.

HOLDLOCK (SERIALIZABLE) – This option holds the shared lock until the transaction is complete; ordinarily the shared lock would be released when the statement completes.

NOEXPAND – used with indexed views; ensures the index isn’t expanded to access data from the underlying tables.

INDEX() – specifies an index to use to process the query statement.

IGNORE_CONSTRAINTS – Used with BULK INSERTS statements e.g. foreign key and check constraints can be ignored. NOTE: Primary key, Unique and NOT NULL constraints cannot be overridden.

IGNORE_TRIGGERS – Used with BULK INSERT statements; any triggers defined are ignore whilst the BULK INSERT executes.

NOLOCK (READUNCOMMITTED) – Allows dirty reads i.e. uncommitted transactions can be read. Any schema style tranactions in progress will block this transaction e.g.

BEGIN TRANSACTION
ALTER TABLE table_name ADD CONSTRAINT constraint_name CHECK (LEN(Column1) > 1);

In another query window run:

SELECT * FROM table_name WITH (READUNCOMITTED);

NOWAIT – returns an error message to the end user or application encounters a lock.

PAGLOCK – places a lock on the page rather than a row or table; this would be pointless in most scenarios if the transaction isolation level is SNAPSHOT. This is because the SNAPSHOT isolation level defines that transactions writing data do not block SNAPSHOT transactions.

READCOMMITTED (READCOMMITTEDLOCK) – follows the READ COMMITTED transaction isolation level. The isolation level is locking or row_versioning depending on whether READ_COMMITTED_SNAPSHOT is ON or OFF. READCOMMITTEDLOCK overrides READCOMMITTED by using locking.

READPAST – Skips row and page level locks and returns the unlocked rows. READPAST is only used in UPDATE and DELETE statement when trying to identity which records to modify.

REPEATABLEREAD – specifies that queries cannot READ UNCOMMITTED data but also that other transactions cannot modify data being read by the REPEATABLEREAD transaction e.g.

BEGIN TRANSACTION
SELECT * FROM table_name WITH (REPEATABLEREAD)

Attempting to execute an UPDATE, INSERT or DELETE statement whilst this transaction is in progress will be blocked.

TABLOCK (TABLOCKX) – places a shared lock on the table until the statement is complete; if holdlock is also specified then the lock will be in place until the end of the transaction. If tablock is specified for an INSERT statement then the recovery model must be simple or bulk logged. TABLOCKX places a exclusive lock on the table rather than a shared lock.

UPDLOCK – places update locks on the table until the transaction is complete.

XLOCK – places an exclusive lock on the table, row or page depending on whether ROWLOCK, PAGLOCK or TABLOCK is specified. The lock is held in place until the transaction completes.

JOIN hints override the default behaviour of the query optimiser by specifying JOIN strategy. 

The JOIN hints available are:

HASH – this type of JOIN is primarily used when a JOIN is used to JOIN tables with no indexes defined.

LOOP – this type of JOIN is used when one or more tables have an index be it clustered or nonclustered.

MERGE – this type of JOIN is used when both columns defined in the ON clause have clustered indexes; you may find that small tables i.e those that consume less than one data page would probably use a LOOP JOIN.

REMOTE – this JOIN type can only be used for INNER JOINs and the right hand table is REMOTE.

QUERY hints override the default behaviour of the query optimiser and are used throughout the query.

RECOMPILE – instructs the database engine to discard the query plan generated after the query has executed.

MAXRECUSRION – specifies the no. of recursions allowed. Allowable values are 0 to 32767; 100 is the default.

OPTIMIZE FOR – used to force the query optimiser to use a optimised plan for most queries; used when SQL server cannot do parameter sniffing due to those values not being available until runtime.  

Manage international considerations

The collation used to store, sort and compare char and varchar data types is defined at the server level; it can also be defined per column. The collation format is as follows: character_set(code page)_[CI|CS]_[AI|AS]

CI | CS – Case Insensitive or Case Sensitive

AI | AS – Accent Insensitive or Accent Sensitive 

nchar and nvarchar are different, they use a unicode universal code page to store characters; this code page can store any character regardless of collation because it uses two bytes to store each character whereas char and varchar code pages use one ot two bytes depending on the collation.

You can compare the data from a record against a different COLLATION e.g.

SELECT
pp.Firstname,
pp.LastName
FROM
Person.Person pp
WHERE
pp.FirstName = 'Francois' -- this query would not return two records
which contain the Persons François
Adding COLLATE Latin1_General_CI_AI would return those two records;
I believe this is because the default COLLATION is accent sensitive.

or sort order:

SELECT
pp.Name
FROM
Production.Product
ORDER BY
pp.Name COLLATE [Latin1_General_CS_AS|Traditional_Spanish_CI_AI]

The methods above negates any defined indexes.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.