Implementing tables and views
Create and alter tables
SQL server data types
char, nchar, varchar and nvarchar are the supported character data types. char and nchar are fixed-length whereas varchar and nvarchar are variable-length; the parameter defines the (maximum) number of characters, which ultimately determines how much storage each column requires. The n-prefixed types store Unicode data at two bytes per character.
tinyint, smallint, int, bigint, decimal and numeric are the supported exact numeric data types. The decimal and numeric data types take two parameters, precision and scale: precision is the total number of digits to store and scale is the number of those digits to the right of the decimal point.
The integer data types don't take any parameters but each has its own range, e.g. tinyint supports 0 – 255 (eight bits or one byte) and smallint supports -32,768 – 32,767 (sixteen bits or two bytes).
The real and float data types are approximate numeric data types; the parameter supplied to float defines its mantissa (floating-point precision). Any value from 1 to 24 is assigned a 24-bit mantissa (stored in four bytes, equivalent to real); any value from 25 up to 53 is assigned a 53-bit mantissa (stored in eight bytes).
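The character and numeric types above can be sketched in a single table definition (the table and column names are illustrative):

```sql
-- Illustrative table showing the data types discussed above
CREATE TABLE dbo.ProductStock
(
    ProductCode char(8)       NOT NULL, -- fixed-length: always 8 bytes
    ProductName nvarchar(100) NOT NULL, -- variable-length Unicode, 2 bytes per character
    StockLevel  smallint      NOT NULL, -- -32,768 to 32,767 in two bytes
    UnitPrice   decimal(9, 2) NOT NULL, -- 9 digits in total, 2 after the decimal point
    Weight      float(24)     NULL      -- 24-bit mantissa, four bytes (same as real)
);
```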
Date and time
The new date and time data type datetime2 is the preferred data type to use in SQL Server 2008. The related new types add a time zone offset (datetimeoffset) and separate date-only and time-only storage (date and time). The older datetime type stored date and time together, which caused problems when performing comparisons; the other improvement is precision, i.e. datetime2 stores time to the nearest 100 nanoseconds.
The new data types are: datetime2, datetimeoffset, date and time.
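A minimal sketch of the new types side by side (the table and column names are illustrative):

```sql
-- Illustrative use of the SQL Server 2008 date and time types
CREATE TABLE dbo.EventLog
(
    EventDate    date           NOT NULL, -- date only, no time component
    EventTime    time(7)        NOT NULL, -- time only, precise to 100 nanoseconds
    RecordedAt   datetime2(7)   NOT NULL, -- date and time, precise to 100 nanoseconds
    RecordedAtTz datetimeoffset NOT NULL  -- date and time plus a time zone offset
);
```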
First of all, tables should be assigned a schema at creation time; if a schema isn't specified then the default, dbo, is used. SQL Server has a number of table types: permanent tables, temporary tables (local and global, named #tablename and ##tablename respectively) and table variables, declared as @tablename.
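The naming conventions for each table type can be sketched as follows (the names are illustrative):

```sql
CREATE TABLE dbo.Orders (OrderId int); -- permanent table in the dbo schema
CREATE TABLE #Staging (OrderId int);   -- local temporary table (visible to this session only)
CREATE TABLE ##Shared (OrderId int);   -- global temporary table (visible to all sessions)
DECLARE @Batch TABLE (OrderId int);    -- table variable (scoped to the current batch)
```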
There are a number of rules to take into consideration when creating tables. Per the ANSI SQL standard, table and column names cannot start with a digit or contain spaces unless they are surrounded by square brackets (delimited identifiers).
Delimited identifiers can use double quotes or square brackets; double quotes only work while QUOTED_IDENTIFIER is ON. When QUOTED_IDENTIFIER is OFF, double-quoted text is interpreted as a string literal.
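A quick demonstration of the difference (the table name is illustrative):

```sql
SET QUOTED_IDENTIFIER ON;
CREATE TABLE "Order Details" (Id int); -- double quotes delimit an identifier
DROP TABLE [Order Details];            -- square brackets always work

SET QUOTED_IDENTIFIER OFF;
SELECT "Order Details";                -- now treated as the string 'Order Details'
```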
When naming a table you should consider using PascalCasing and a descriptive name.
The data types you define for your columns should be appropriate, i.e. varchar(max) should be the exception not the standard. It should be noted that the varchar(max), nvarchar(max), varbinary(max) and xml data types prevent online index rebuilds.
When defining columns, an IDENTITY column must use a whole-number type, i.e. int (or tinyint, smallint, bigint) or a zero-scale decimal/numeric; a uniqueidentifier column cannot use IDENTITY, instead it is typically populated via a DEFAULT of NEWID() or NEWSEQUENTIALID(). NULLs should be used sparingly, i.e. only allow NULLs when absolutely necessary.
Compression is a new feature of SQL Server 2008; this feature allows you to define PAGE or ROW level compression. If you have a number of fixed-length data types such as datetime2, int, decimal, nchar etc. then ROW level compression can be beneficial: ROW level compression stores fixed-length data types in a variable-length format.
PAGE level compression includes ROW level compression but adds COLUMN PREFIXING and a PAGE DICTIONARY. The page dictionary introduces pointers which remove redundant values at page level; column prefixing reuses common value prefixes across a column's values on the page.
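Compression is enabled per table (or per index/partition) with the DATA_COMPRESSION option; a minimal sketch, with illustrative table names:

```sql
-- Rebuild an existing table with ROW compression
ALTER TABLE dbo.Orders REBUILD WITH (DATA_COMPRESSION = ROW);

-- Or create a table with PAGE compression from the start
CREATE TABLE dbo.OrderHistory
(
    OrderId   int          NOT NULL,
    OrderedAt datetime2(0) NOT NULL
)
WITH (DATA_COMPRESSION = PAGE);
```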
Create and alter views
A view is simply a stored SELECT statement that can be queried directly or used within another query as if it were a table.
The basic syntax to create a view is: CREATE VIEW [schema].[name] AS SELECT ...
Views can reference tables, other views and functions, but cannot contain COMPUTE or COMPUTE BY, cannot create or use temporary tables, cannot reference a table variable and cannot use ORDER BY (unless the TOP operator is specified).
Views can contain multiple SELECT statements as long as they are combined with the UNION or UNION ALL operator.
VIEWS have the following options:
ENCRYPTION – this obfuscates the view's code; note it can be easily reverse engineered.
SCHEMABINDING – this prevents the underlying objects that the view depends on from being altered or dropped while the view exists.
VIEW_METADATA – this returns metadata for the view rather than the underlying tables.
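Putting the options together (the object names are illustrative; note that SCHEMABINDING requires two-part object names in the SELECT):

```sql
CREATE VIEW Sales.vRecentOrders
WITH SCHEMABINDING, VIEW_METADATA
AS
SELECT OrderId, OrderedAt
FROM Sales.Orders                 -- two-part name required by SCHEMABINDING
WHERE OrderedAt >= '20080101';
```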
VIEWs can also be updatable as long as the VIEW only references one table, no column is derived from an aggregate and no column is computed from a UNION, UNION ALL, CROSS JOIN, EXCEPT or INTERSECT. The HAVING, DISTINCT and GROUP BY clauses must not affect the columns being updated, and the TOP operator cannot be used. If your VIEW doesn't meet the above requirements then consider an INSTEAD OF DML trigger on the VIEW.
VIEWs also have a WITH CHECK OPTION; this ensures that data modified through the view must still be visible through the view after the change.
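A minimal sketch of WITH CHECK OPTION (the names are illustrative):

```sql
CREATE VIEW Sales.vOpenOrders
AS
SELECT OrderId, Status
FROM Sales.Orders
WHERE Status = 'Open'
WITH CHECK OPTION;

-- This UPDATE fails: the changed row would no longer satisfy the view's WHERE clause
-- UPDATE Sales.vOpenOrders SET Status = 'Closed' WHERE OrderId = 1;
```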
VIEWs can also be partitioned (across tables) or distributed (across instances). A partitioned VIEW brings multiple tables together into one; this is manual partitioning of tables. Partitioned VIEWs require ANSI_PADDING to be turned on; ANSI_PADDING controls the way data is stored, i.e. whether trailing zeroes or blanks are added. A distributed VIEW is very similar to a partitioned VIEW but its member tables exist across different instances. To reference distributed VIEWs you need to define linked servers and use the four part name: linkedserver.database.schema.view.
Basic VIEWs can also be indexed; the first index must be clustered and unique, and the view must be created WITH SCHEMABINDING. An indexed VIEW is materialised, i.e. the VIEW's data is stored and maintained by SQL Server, so queries against the VIEW can be answered from the stored index rather than re-executing the underlying SELECT.
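A sketch of an indexed view (the names are illustrative, and Total is assumed to be a NOT NULL column; an indexed view using GROUP BY must include COUNT_BIG(*)):

```sql
CREATE VIEW Sales.vOrderTotals
WITH SCHEMABINDING
AS
SELECT CustomerId,
       COUNT_BIG(*) AS OrderCount, -- required when the indexed view uses GROUP BY
       SUM(Total)   AS TotalValue
FROM Sales.Orders
GROUP BY CustomerId;
GO
-- The first index must be unique and clustered; creating it materialises the view
CREATE UNIQUE CLUSTERED INDEX IX_vOrderTotals ON Sales.vOrderTotals (CustomerId);
```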
Create and alter indexes
Filtered indexes are nonclustered indexes which cover a well-defined subset of the data using a WHERE clause e.g.
CREATE NONCLUSTERED INDEX [index name] ON [schema.table] ([column]) WHERE [column] = [some filter]
Clustered index; known as a sorted table, i.e. the table data is stored in the index. The column or columns used to define the clustered index should use an appropriate data type. Creating a primary key creates a clustered index by default.
Nonclustered index; a table without a clustered index is known as an unsorted table or heap. A nonclustered index stores its key columns together with a pointer to the actual data rather than the data itself. When building a nonclustered index intended to cover queries, consider including all columns which appear in the SELECT, JOIN, WHERE, GROUP BY, HAVING etc. NOTE: every nonclustered index has a bookmark; if the table is a heap the bookmark is the row ID (RID), [FILE]:[PAGE]:[ROW], and if the table has a clustered index the bookmark is the clustered index key.
The fill factor determines how full each leaf-level page of an index is when the index is built or rebuilt; frequently updated tables should use a lower fill factor (leaving free space on each page) to minimise page splits. NOTE: the lower the fill factor, the more free space per page, and therefore the more disk space the index consumes.
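Fill factor is set per index; a sketch with illustrative names:

```sql
-- Leave 20% free space on each leaf page to absorb inserts without page splits
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
ON dbo.Orders (CustomerId)
WITH (FILLFACTOR = 80);

-- Or apply it when rebuilding an existing index
ALTER INDEX IX_Orders_CustomerId ON dbo.Orders REBUILD WITH (FILLFACTOR = 80);
```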
Included columns build on top of the covering nonclustered index by allowing you to go beyond the sixteen columns (or 900 bytes) an index key allows; the caveat is that included columns cannot be used for sort and filter operations. Their primary goal is to reduce the number of page reads.
CREATE NONCLUSTERED INDEX [index name] ON [schema.table] (column1) INCLUDE (column2, column3, etc.)
If indexes become fragmented then you’ll need to rebuild your indexes.
ALTER INDEX [index name] ON [schema.table] REBUILD; or ALTER INDEX ALL ON [schema.table] REBUILD;
Index stats can be viewed by querying the dynamic system view: sys.dm_db_index_usage_stats.
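For example, to spot indexes that are being maintained but rarely read (joining to sys.indexes for the index name):

```sql
SELECT OBJECT_NAME(s.object_id) AS table_name,
       i.name                   AS index_name,
       s.user_seeks, s.user_scans, s.user_lookups, -- reads served by the index
       s.user_updates                              -- maintenance cost of the index
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id AND i.index_id = s.index_id
WHERE s.database_id = DB_ID()
ORDER BY s.user_updates DESC;
```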
Create and modify constraints
Data can be validated using declarative or procedural integrity. Declarative integrity is defined by constraints; the same can be achieved with RULES, though RULES are deprecated and should be avoided.
Procedural integrity uses either stored procedures or triggers to check data before or after a DML statement has been issued.
Declarative integrity is implemented using a PRIMARY KEY, UNIQUE CONSTRAINT, FOREIGN KEY, CHECK or DEFAULT.
Primary keys and unique constraints uniquely identify a column or combination of columns. Primary keys and unique constraints have the same limitations as an index KEY, i.e. they cannot contain more than sixteen columns or 900 bytes, thus a column with a (max) data type cannot participate in a PRIMARY KEY or UNIQUE CONSTRAINT.
The default index created for a PRIMARY KEY is clustered and for a UNIQUE CONSTRAINT is nonclustered; this behaviour can be changed by specifying CLUSTERED or NONCLUSTERED when defining the constraint. Leaving a unique constraint NONCLUSTERED is usually sensible as it exists for integrity rather than for queries.
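A sketch showing the defaults made explicit (the names are illustrative):

```sql
CREATE TABLE dbo.Customers
(
    CustomerId int          NOT NULL,
    Email      varchar(256) NOT NULL,
    CONSTRAINT PK_Customers PRIMARY KEY CLUSTERED (CustomerId), -- clustered is the default
    CONSTRAINT UQ_Customers_Email UNIQUE NONCLUSTERED (Email)   -- nonclustered is the default
);
```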
You can query for PRIMARY KEYs and UNIQUE CONSTRAINTs via sys.key_constraints and sys.indexes:
SELECT * FROM sys.indexes WHERE name LIKE 'PK%' OR name LIKE 'UQ%';
SELECT * FROM sys.key_constraints WHERE type = 'PK' OR type = 'UQ';
FOREIGN KEY constraints manage referential integrity between tables or within a table. The following rules must be met for a FOREIGN KEY: the columns must be the same data type as their PRIMARY KEY reference (when the FOREIGN KEY is a string the collation must also be the same), and the size limitations of the PRIMARY KEY also apply to the FOREIGN KEY.
FOREIGN KEYs benefit from indexes because SQL Server queries the FOREIGN KEY columns when checking referential integrity and when JOINs are performed.
FOREIGN KEYs can be queried via sys.foreign_keys and sys.foreign_key_columns.
The default behaviour of a FOREIGN KEY constraint is to roll back the transaction should there be a referential integrity violation. The options are NO ACTION (the default), SET NULL, SET DEFAULT and CASCADE; these options can be defined separately for UPDATE and DELETE statements on each FOREIGN KEY reference. CASCADE can be dangerous as a single DELETE can silently remove rows from every referencing table.
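A sketch of a FOREIGN KEY with explicit referential actions (the names are illustrative, and dbo.Customers is assumed to have a primary key on CustomerId):

```sql
CREATE TABLE dbo.Orders
(
    OrderId    int NOT NULL PRIMARY KEY,
    CustomerId int NOT NULL,
    CONSTRAINT FK_Orders_Customers
        FOREIGN KEY (CustomerId) REFERENCES dbo.Customers (CustomerId)
        ON DELETE CASCADE   -- deleting a customer deletes their orders: use with care
        ON UPDATE NO ACTION -- the default: a violating update is rolled back
);
```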
CHECK constraints are simple to implement, are checked automatically and can improve performance, but the errors they raise are not user friendly and a constraint added WITH NOCHECK does not validate existing values. A CHECK constraint only rejects values that evaluate to FALSE; NULL (unknown) values pass the check.
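A minimal CHECK constraint sketch (the names are illustrative):

```sql
ALTER TABLE dbo.Orders
ADD CONSTRAINT CK_Orders_Total CHECK (Total >= 0);

-- NOTE: only FALSE is rejected; a NULL Total evaluates to unknown and is accepted
```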
CHECK and FOREIGN KEY constraints can help query performance because when the query optimiser finds a trusted constraint it knows that it is unnecessary to execute that particular part of the plan.
The trust status of FOREIGN KEYs can be queried via sys.foreign_keys; for example, to find untrusted keys:
SELECT name, is_not_trusted FROM sys.foreign_keys WHERE is_not_trusted = 1
Implement data types
Filestream – the filestream data type is used to store Binary Large OBjects (BLOBs), i.e. objects which are at least 1MB in size. Filestream stores the objects on the file system rather than in the database; this can greatly enhance read and write performance.
To enable Filestream:
1. Enable filestream via SQL configuration manager
Right click the SQL Server service > Properties > FILESTREAM tab > Enable FILESTREAM for Transact-SQL access > Apply > OK
2. Configure filestream via management studio
EXEC sp_configure 'filestream access level', [0, 1 or 2]; RECONFIGURE;
0. FILESTREAM disabled.
1. T-SQL filestream access only.
2. T-SQL plus file system access to filestream (including network access via a share).
Filestream requires its own filegroup and data file; this is where the filestream directory is stored.
The disk array which will contain the filestream data should be formatted with a 64KB file allocation unit size.
NOTE: when you delete filestream data via an UPDATE or DELETE statement the actual underlying data is not removed until the database checkpoint process runs; this in turn runs the garbage collection process.
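Creating a FILESTREAM-enabled table can be sketched as follows (this assumes a FILESTREAM filegroup has already been created and set as the default; the names are illustrative):

```sql
-- A FILESTREAM table needs a uniqueidentifier ROWGUIDCOL column with a UNIQUE constraint
CREATE TABLE dbo.Documents
(
    DocumentId   uniqueidentifier ROWGUIDCOL NOT NULL UNIQUE
                 DEFAULT NEWSEQUENTIALID(),
    DocumentName nvarchar(260) NOT NULL,
    Content      varbinary(max) FILESTREAM NULL -- stored on the file system
);
```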
Spatial – the spatial data types are geography and geometry; geography stores geodetic (round-earth) data such as longitude and latitude coordinates, whereas geometry stores planar (flat-earth) X and Y coordinates, optionally with Z and M values.
To use these data types create a column in a table which uses one of the types e.g.
CREATE TABLE Places ( PlaceId INT, PlaceName VARCHAR(100), PlaceLocation Geography )
Geography or geometry data is inserted into the table like so:
INSERT INTO Places VALUES ( 1, 'Some place', geography::Parse('POINT(x y)') );
INSERT INTO Places VALUES ( 2, 'Some other place', geography::STGeomFromText('POINT(x y)', 4326) );
Implement partitioning solutions
Partitioning improves write performance when bulk importing data because the bulk insert can be directed at a staging table with no indexes defined; after the inserts are complete the staging table can be switched in as a partition of the main table.
Partitioning requires two objects: a partition function and a partition scheme.
Partition function: defines the boundary points of each partition. With RANGE LEFT each boundary value belongs to the partition on its left (a boundary of 10 gives a partition covering values up to and including 10); with RANGE RIGHT the boundary belongs to the partition on its right (values up to 9, then 10 upwards).
CREATE PARTITION FUNCTION PF(INT) AS RANGE [LEFT|RIGHT] FOR VALUES (10, 20, 30...);
Partition scheme: defines which filegroup each partition lives on e.g. partitions which contain rarely accessed data can be placed on cheaper storage arrays whereas frequently accessed data could be placed on expensive storage arrays.
CREATE PARTITION SCHEME PS AS PARTITION PF TO ([Name of filegroup],[...]);
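A complete sketch tying the two objects together (the filegroup and table names are illustrative; note the scheme needs one filegroup more than there are boundary values):

```sql
-- Three boundary values create four partitions...
CREATE PARTITION FUNCTION PF (int) AS RANGE LEFT FOR VALUES (10, 20, 30);

-- ...so the scheme must name four filegroups (or use ALL TO for a single filegroup)
CREATE PARTITION SCHEME PS AS PARTITION PF TO (FG1, FG2, FG3, FG4);

-- The table is then created on the scheme, partitioned by the chosen column
CREATE TABLE dbo.PartitionedTable
(
    Id   int NOT NULL,
    Data varchar(50) NULL
) ON PS (Id);
```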
See the attached sql script for an example of how to add a normal (staging) table to a partitioned table, paying attention to the check constraint on the identity column of the staging table; this ensures the staging table cannot contain more rows than a single partition can hold.
The SPLIT function allows you to split an existing range, creating a new partition. To extend the range at the end of the table, the new boundary is typically derived from the highest value currently in the partitioned table, e.g.
SELECT MAX(Column) AS NewBoundary FROM PartitionedTable; ALTER PARTITION FUNCTION pf() SPLIT RANGE (NewBoundary); -- substituting the value returned by the SELECT
The MERGE function allows you to drop a boundary, effectively merging the data of two adjacent partitions into one; the example below, run against the partition function created above, would merge the 10 > 20 and 20 > 30 partitions into a single 10 > 30 partition.
ALTER PARTITION FUNCTION pf() MERGE RANGE (20);
Partitioned views are a manual way of bringing multiple tables together; these tables are usually found on different filegroups. The ANSI padding i.e. the way the data is stored must be the same across all member tables of the view.
Partitioned views cannot be indexed.