SQL Server Best Practices

Tips for writing high performance programming and scripting in SQL Server

1. Normalize your tables: There are two common excuses for not normalizing databases: performance and pure laziness. Do not optimize what's not slow for performance. DBMS’s were designed to be used with normalized databases and SQL Server is no exception, so design with normalization in mind.

2. Avoid using cursors: Use cursors wisely. Cursors are fundamentally evil. They force the database engine to repeatedly fetch rows, negotiate blocking, manage locks, and transmit results. They consume network bandwidth as the results are transmitted back to the client, where they consume RAM, disk space, and screen real estate. Consider the resources consumed by each cursor you build and multiply this demand by the number of simultaneous users.

a. Don't use scrollable cursors unless required
b. Use read-only cursors if you do not intend to update. This would be 90% of the situations.
c. Try to use Forward-Only cursor when using cursors
d. Don’t forget to close and de-allocate the cursors used.
e. Try to reduce the number of columns and records fetched in a cursor

3. Index Columns: Create Index on columns that are going to be highly selective. Indexes are vital to efficient data access; however, there is a cost associated with creating and maintaining an index structure. For every insert, update and delete, each index must be updated. In a data warehouse, this is acceptable, but in a transactional database, you should weigh the cost of maintaining an index on tables that incur heavy changes. The bottom line is to use effective indexes judiciously. On analytical databases, use as many indexes as necessary to read the data quickly and efficiently.
For example: DONOT index a column like "Gender". This would have a selectivity of 50% and if your table is having 10 Million records, you can be least assured that using this index you may have to travel half the number of rows ... Hence maintaining such indexes can slow your performance.

4. Use transactions: Use transaction judiciously. This will save you when things get wrong. Working with data for some time you'll soon discover some unexpected situation which will make your stored procured crash. See that the transaction starts as late as possible and ends as early as possible. This would reduce the requirement to lock down the resources while accessing.

5. Analyze deadlock: Access your tables on the same order always. When working with stored procedures and transactions, you may find this soon. Any SQL programmer / database analyst would have come across this problem. If the order changes then there would be a cyclic wait for resources to be released and the users would experience a permanent hang in the application. Deadlocks can be tricky to find if the lock sequence is not carefully designed. To summarize, Deadlock occurs when two users have locks on separate objects and each user is trying to lock the other user's objects. SQL Server automatically detects and breaks the deadlock. The terminated transaction will be automatically rolled back and an error code 1205 will be issued.

6. GOTO Usage: Avoid using the infamous GOTO. This is a time-proven means of adding disorder to program flow. There are some cases where intelligent use of GOTO is preferable to strictly refusing to use it. On the other hand, unintelligent use of GOTO is a quick ticket to unreadable code.

7. Increase timeout: When querying a database, the default timeout is often low, like 30 seconds. Remember that report queries may run longer than this, especially when your database grows. Hence increase this value to an acceptable value.

8. Avoid NULLable columns: When possible, normalize your table and separate your nullable columns. They consume an extra byte on each NULLable column in each row and have more overhead associated when querying data. It will be more flexible and faster, and will reduce the NULLable columns. I believe they can simplify coding when "missing data" is part of your business rules.

9. TEXT data type: Unless you are using it for really large data. The TEXT data type is not flexible to query, is slow and wastes a lot of space if used incorrectly. Sometimes a VARCHAR will handle your data better. You can also look at the "text in row" feature with the table options for SQL Server . But still I would stick to the first statement, Avoid using them on first place.

10. SELECT * Usage: It’s very difficult to get out of this habit, but believe me this is very essential. Please DONOT use this syntax. Always qualify the full list of columns. Using all columns increases network traffic, requires more buffers and processing, and could prove error prone if the table or view definition changes.

11. Temporary tables’ usage: Unless strictly necessary. More often than not a sub query can substitute a temporary table. In SQL Server , there are alternatives like the TABLE variable data type which can provide in-memory solutions for small tables inside stored procedures too. Some of the advantages of using the same:
a. A table variable behaves like a local variable. It has a well-defined scope, which is the function, stored procedure, or batch in which it is declared. Within its scope, a table variable may be used like a regular table.
b. However, table may not be used in the following statements: INSERT INTO table_variable EXEC stored_procedure SELECT select_list INTO table_variable statements.
c. Table variables are cleaned up automatically at the end of the function, stored procedure, or batch in which they are defined.
d. Table variables used in stored procedures result in fewer recompilations of the stored procedures than their counterparts temporary tables.
e. Transactions involving table variables last only for the duration of an update on the table variable. Thus, table variables require less locking and logging resources.

12. Using UDF: UDF can replace stored procedures. But be careful in their usage. Sometimes UDFs can take a toll on your applications performance. And UDFs have to prefix with the owners name. This is not a drawback but a requirement. I support usage of SPs more than UDFs.

13. Multiple User Scenario: Sometimes two users will edit the same record at the same time. While writing back, the last writer wins and some of the updates will be lost. It's easy to detect this situation: create a timestamp column and check it before you write. Code for these practical situations and test your application for these scenarios.

14. Use SCOPE_IDENTITY: Do not do SELECT max (ID) from MasterTable when inserting in a Details table. This is a common mistake, and will fail when concurrent users are inserting data at the same instance. Use one of SCOPE_IDENTITY or IDENT_CURRENT. The best choice would be SCOPE_IDENTITY as this would give you the identity value from the current context in perspective.

15. Analyze Query Plans: The SQL Server query analyzer is a powerful tool. And surely is your friend and you'll learn a lot of how it works and how the query and index design can affect performance through it. Understand the execution plan that the execution plan window shows for potential bottlenecks.

16. Parameterized queries: Parameterize all your queries using the sp_executesql. This would help the optimizer to cache the execution plans and use the same when requested the second time. You can cache-in the time required to parse, compile and place the execution plan. Avoid using of D-SQL as much as possible.

17. Keep Procedures Small: Keep SPs small in size and scope. Two users invoking the same stored procedure simultaneously will cause the procedure to create two query plans in cache. It is much more efficient to have a stored procedure call other ones then to have one large procedure.

18. Bulk INSERT: Use DTS or the BCP utility and you'll have both a flexible and fast solution. Try avoiding use of Insert statement for the Bulk loading feature, they are not efficient and are not designed for the same.

19. Using JOINS: Make sure that there are n-1 join criteria if there are n tables. Make sure that ALL tables included in the statement are joined. Make sure that only tables that
a. Have columns in the select clause
b. Have columns referenced in the where clause
c. Allow two unrelated tables to be joined together are included.

20. Trap Errors: Make sure that the @@ERROR global variable is checked after every statement which causes an update to the database (INSERT, UPDATE, and DELETE). Make sure that rollbacks (if appropriate) are performed prior to inserting rows into an exception table.

21. Small Result Set: Retrieving needlessly large result sets (for example, thousands of rows) for browsing on the client adds CPU and network I/O load, makes the application less capable of remote use, and limits multi-user scalability. It is better to design the application to prompt the user for sufficient input so queries are submitted that generates modest result sets.

22. Negative Arguments: Minimize the use of not equal operations, <> or !=. SQL Server has to scan a table or index to find all values to see if they are not equal to the value given in the expression. Try rephrasing the expression using ranges:
WHERE KeyColumn < 'TestValue' AND KeyColumn > 'TestValue'

23. Date Assumption: Prevent issues with the interpretation of centuries in dates, do not specify years using two digits. Assuming dates formats is the first place to break an application. Hence avoid making this assumption.

24. SP_ Name: DONOT start the name of a stored procedure with SP_. This is because all the system related stored procedures follow this convention. Hence a valid procedure today may clash with the naming convention of a system procedure that gets bundled with a Service pack / Security patch tomorrow. Hence do not follow this convention.

25. Apply the latest Security Packs / Service Packs: Even though this point applies to the network and the database administrators, it is always better to keep up-to date on the software’s. With the "slammer" virus and many more still outside, it is one of the best practices to be up-to date on the same. Consider this strongly.

26. Using Count(*): The only 100 percent accurate way to check the number of rows in a table is to use a COUNT(*) operation. The statement might consume significant resources if your tables are very big because scanning a large table or index can consume a lot of I/O. Avoid using count(*) to the maximum. Use short circuiting methods as EXISTS etc. Here is one other way you can find the total number of rows in a table. SQL Server Books Online (BOL) documents the structure of sysindexes; the value of sysindexes.indid will always be 0 for a table and 1 for a clustered index. If a table doesn't have a clustered index, its entry in sysindexes will always have an indid value of 0. If a table does have a clustered index, its entry in sysindexes will always have an indid value of 1.
SELECT object_name(id) ,rowcnt
FROM sysindexes
WHERE indid IN (1,0) AND OBJECTPROPERTY(id, 'IsUserTable') = 1

27. Ownership Chaining: Try using this feature (available from SQL Server  SP3), for permission management within a single database. Avoid using this feature to manage permissions across database.

28. SQL Injection: Security has been a prime concern for everyone. Hence validate all the incoming parameters at all levels of the application. Limit the scope of possible damage by permitting only minimally privileged accounts to send user input to the server.

29. Fill-factor: The 'fill factor' option specifies how full SQL Server will make each index page. When there is no free space to insert new row on the index page, SQL Server will create new index page and transfer some rows from the previous page to the new one. This operation is called page splits. You can reduce the number of page splits by setting the appropriate fill factor option to reserve free space on each index page. The fill factor is a value from 1 through 100 that specifies the percentage of the index page to be left empty. The default value for fill factor is 0. It is treated similarly to a fill factor value of 100, the difference in that SQL Server leaves some space within the upper level of the index tree for FILLFACTOR = 0. The fill factor percentage is used only at the time the index is created. If the table contains read-only data (or data that very rarely changed), you can set the 'fill factor' option to 100. When the table's data modified very often, you can decrease the 'fill factor' option to 70 percent, for example. Having explained page splits in detail I would warn you in over looking at this point because more free space means that SQL Server has to traverse through more pages to get the same amount of data. Hence try to strike a balance and arrive at an appropriate value.

30. Analyze Blocking: More often than not any implementer’s nightmare would be to see a blocking process. Blocking occurs when a process must wait for another process to complete. The process must wait because the resources it needs are exclusively used by another process. A blocked process will resume operation after the resources are released by the other process. Sometimes this can become cyclic and the system comes to a stand still. The only solution is to analyze your indexing strategy and table design. Consider these points strongly.

31. Avoid Un-necessary Indexes: Avoid creating un-necessary indexes on table thinking they would improve your performance. Understand that creating Indexes and maintaining them are overheads that you incur. And these surely do reduce the throughput for the whole application. You can create a simple test on a large table and find it for yourself how multiple indexes on the same column decrease performance.

32. Consider Indexed Views: Sometimes we would require an view to be indexed. This feature is bundled with SQL Server . The result set of the indexed view is persist in the database and indexed for fast access. Because indexed views depend on base tables, you should create indexed views with SCHEMABINDING option to prevent the table or column modification that would invalidate the view. Hence using them can reduce a lot of load on the base tables but increases the maintainability.

33. WITH SORT_IN_TEMPDB Option: Consider using this option when you create an index and when tempdb is on a different set of disks than the user database. This is more of a tuning recommendation. Using this option can reduce the time it takes to create an index, but increases the amount of disk space used to create an index. Time is precious, disk is cheaper.

34. Reduce Number of Columns: Try to reduce the number of columns in a table. The fewer the number of columns in a table, the less space the table will use, since more rows will fit on a single data page, and less I/O overhead will be required to access the table's data. This should be considered strongly by applications that talk across different machines. More the unwanted data passed more is the network latency observed.

35. Write SET NOCOUNT ON in top of the procedure.

36. Every SELECT statement should return with WITH NOLOCK key word.

Get latest updates on Microsoft Technology @ http://stsadm2010.blogspot.com

By Vivek Jagga   Popularity  (3993 Views)
Picture
Biography - Vivek Jagga
SharePoint Developer
For latest updates view My Blog