Getting Started with Business Intelligence: Data Warehouse (T-SQL, Indexes, Isolation Levels and Performance)

Hello.

When we’re talking about Business Intelligence (BI), we straightly imagine all Kimball concepts, star-schema, the famous slow changing dimensions among others, but we forget where it all begins. Today I will talk about some basic concepts (but quite important) in database concepts before things can start serious. The key points for this post are T-SQL, Indexes, Isolations Levels and Performance.

If you know how to answer the questions below, probably you should jump straight away to my next post I am going to post regarding this category.

Review questions:

What’s the difference between a Clustered and Non-Clustered index?
What’s a covered index?
What means NOLOCK in a query?
What’s the difference between Read Committed and Serialisable?
How do you measure the performance of a query?

Introduction to SQL

Transact SQL Statements

DELETE – Removes one or more rows from a table or view in SQL Server.DELETE FROM table_name WHERE condition;
TRUNCATE – Removes all rows (content) from a table or specified partitions of a table, without logging the individual row deletions. TRUNCATE TABLE is similar to the DELETE statement with no WHERE clause; however, TRUNCATE TABLE is faster and uses fewer system and transaction log resources.TRUNCATE table_name
DROP TABLE – Removes (for ever) one or more table definitions and all data, indexes, triggers, constraints, and permission specifications for those tables. A dropped database can be re-created only by restoring a backup. You can also be more specific in your removal and drop for example a specific column or index.
INSERT– Adds one or more rows to a table or a view in SQL Server.
MERGE – Performs insert, update, or delete operations on a target table based on the results of a join with a source table. For example, you can synchronize two tables by inserting, updating, or deleting rows in one table based on differences found in the other table.

Indexes

Indexes are special lookup tables that the database search engine can use to speed up data retrieval. Simply put, an index is a pointer to data in a table. An index in a database is very similar to an index in the back of a book.

For example, if you want to reference all pages in a book that discusses a certain topic, you first refer to the index, which lists all the topics alphabetically and are then referred to one or more specific page numbers.

An index helps to speed up SELECT queries and WHERE clauses, but it slows down data input, with the UPDATE and the INSERT statements. Indexes can be created or dropped with no effect on the data.

Creating an index involves the CREATE INDEX statement, which allows you to name the index, to specify the table and which column or columns to index, and to indicate whether the index is in an ascending or descending order.

Indexes can also be unique, like the UNIQUE constraint, in that the index prevents duplicate entries in the column or combination of columns on which there is an index.

Composite Indexes

A composite index is an index on two or more columns of a table. Its basic syntax is as follows.

CREATE INDEX index_name
on table_name (column1, column2);

Whether to create a single-column index or a composite index, take into consideration the column(s) that you may use very frequently in a query’s WHERE clause as filter conditions.

Should there be only one column used, a single-column index should be the choice. Should there be two or more columns that are frequently used in the WHERE clause as filters, the composite index would be the best choice.

Implicit Indexes

Implicit indexes are indexes that are automatically created by the database server when an object is created. Indexes are automatically created for primary key constraints and unique constraints.

When should indexes be avoided?

Although indexes are intended to enhance a database’s performance, there are times when they should be avoided.

The following guidelines indicate when the use of an index should be reconsidered.

Indexes should not be used on small tables.
Tables that have frequent, large batch updates or insert operations.
Indexes should not be used on columns that contain a high number of NULL values.
Columns that are frequently manipulated should not be indexed.

Types of indexes

A table or view can contain the following types of indexes:

Clustered
- Clustered indexes sort and store the data rows in the table or view based on their key values. These are the columns included in the index definition. There can be only one clustered index per table, because the data rows themselves can be stored in only one order.
- The only time the data rows in a table are stored in sorted order is when the table contains a clustered index. When a table has a clustered index, the table is called a clustered table. If a table has no clustered index, its data rows are stored in an unordered structure called a heap.
Nonclustered
- Nonclustered indexes have a structure separate from the data rows. A nonclustered index contains the nonclustered index key values and each key value entry has a pointer to the data row that contains the key value.
- The pointer from an index row in a nonclustered index to a data row is called a row locator. The structure of the row locator depends on whether the data pages are stored in a heap or a clustered table. For a heap, a row locator is a pointer to the row. For a clustered table, the row locator is the clustered index key.
- You can add nonkey columns to the leaf level of the nonclustered index to by-pass existing index key limits, and execute fully covered, indexed, queries. For more information, see Create Indexes with Included Columns. For details about index key limits see Maximum Capacity Specifications for SQL Server.

Both clustered and nonclustered indexes can be unique. This means no two rows can have the same value for the index key. Otherwise, the index is not unique and multiple rows can share the same key value. For more information, see Create Unique Indexes.

Indexes are automatically maintained for a table or view whenever the table data is modified.

There are special cases:

Covered queries and covering indexes are different, yet closely related. A query is covered if all the columns it uses come from one or more indexes. These columns include the columns you want the query to return as well as columns in any JOIN, WHERE, HAVING, and ORDER BY clause. A covered query typically is considered advantageous because data access through indexes can be more efficient. However, the high-speed access that this kind of query facilitates can become costly when you update the table because you must maintain the indexes.
A covering index—which is used in covered queries—can provide some or all of the indexed columns that a covered query uses. If a covering index is also a composite index (i.e., it indexes more than one column in its base table or view), it might contain columns that aren’t used in the covered query but are used instead in other queries that have overlapping columns.

Isolation Levels

Following are the different types of isolations available in SQL Server.

READ COMMITTED
In select query it will take only commited values of table. If any transaction is opened and incompleted on table in others sessions then select query will wait till no transactions are pending on same table.Read Committed is the default transaction isolation level.
READ UNCOMMITTED aka NOLOCK
If any table is updated(insert or update or delete) under a transaction and same transaction is not completed that is not committed or roll backed then uncommitted values will displaly(Dirty Read) in select query of “Read Uncommitted” isolation transaction sessions. There won’t be any delay in select query execution because this transaction level does not wait for committed values on table.The NOLOCK hint allows SQL to read data from tables by ignoring any locks and therefore not being blocked by other processes.
REPEATABLE READ
Select query data of table that is used under transaction of isolation level “Repeatable Read” can not be modified from any other sessions till transcation is completed.
SERIALIZABLE
Serializable Isolation is similar to Repeatable Read Isolation but the difference is it prevents Phantom Read. This works based on range lock. If table has index then it locks records based on index range used in WHERE clause(like where ID between 1 and 3). If table doesn’t have index then it locks complete table.
SNAPSHOT
Snapshot isolation is similar to Serializable isolation. The difference is Snapshot does not hold lock on table during the transaction so table can be modified in other sessions. Snapshot isolation maintains versioning in Tempdb for old data in case of any data modification occurs in other sessions then existing transaction displays the old data from Tempdb.

Measure a SQL query performance

When running workshops on the subject of SQL Query Performance the first question I always ask is how do we measure the performance of a query? Or to look at it another way – how do we measure the resources it consumes?

Attendees have come up with a range of answers from “With a stopwatch” (which I like a lot) to the slightly more technical “Using Profiler”. I like the first answer just because that’s where we all start. We run something from an application and we literally time how long it takes, or we run something in SSMS and we use the counter near the bottom right of our query window to tell us how long it took. We may even run something 100 times in a loop and capture the overall time so we can take an average.

SQL Server Profiler is also good when trying to capture what is actually happening in a given environment.
When we’re working on a query in SSMS there is an easier, more accurate, and more detailed approach than any of the others available, which is to use the following two commands:

SET STATISTICS TIME ON;
Your query here
SET STATISTICS TIME OFF;

This command tells SSMS to capture the consumed CPU time and elapsed time related to any queries you subsequently run in that query window. The output is sent to the Messages tab.

SET STATISTICS IO ON;
Your query here
SET STATISTICS IO OFF;

This command tell SSMS to capture the amount of data read from disk or memory by any subsequent queries in that window. Again the output is sent to the Messages tab.

I hope this helps you to understand and trigger your best BI side.

Thank you.

Kind regards,
Rafael

5/5 - (1 vote)

#RMC #walkthrough

Rafael Martins Cardoso

IT Service Management | SharePoint and Powershell | SQL Server and Oracle Database | Tableau, SAP Business Objects, Microsoft SQL Reporting Services and PowerBI | Business Intelligence | Web services