Tuesday, October 12, 2004

SQL Turbocharging

To really make your pages fly, depends on a lot of different things, such as indexing, table relationships, and good sql statements.

Normally we have two starting points, whether you are creating a table from scratch or whether you are working on existing code.

Let's say you are working with an existing database, and that you want the queries for this specific application to work fast.

First thing, you have to consider is the freshness of data needs. How fresh does the data have to be? This information, can help shape the coldfusion side of your queries, whether you use caching or not.

Second thing, is the control of the datetype enforcement of incoming variables to be applied to your query. For security it always pays to enforce the specific data type, either by using cfqueryparam, or other means.

Third thing, what is the logic of the data you are grabbing, are you looping from 1 query to another query? This is ALWAYS a bad practice for performance.

One of the hardest things I learned, is that you should combine the two queries, instead of doing the looping. But it will take patience and practice to formulate the query so as to return the same data.

Remember each ODBC/JDBC connection to the database adds a hit of time,threads and performance. Less queries is always better.

Now you have this query, now we want to make sure the tables are prepared to deliver the best results.

I always start by making sure tables have a primary key that is also clustered.

You can run the below script to get a list of tables, with primary keys, and whether or not they are clustered or not.

select i.table_name, cu.column_name, tc.constraint_name, case objectproperty(object_id(i.table_name), 'tablehasclustindex') when 0 then 'no' when 1 then 'yes' end as [has clustered index]
from information_schema.table_constraints tc
inner join information_schema.constraint_column_usage as cu on tc.constraint_name = cu.constraint_name
full outer join information_schema.tables i on tc.table_name = i.table_name
where objectproperty(object_id(i.table_name), 'isusertable') = 1
order by i.table_name asc, [has clustered index]

It's a great script for documenting the indexes of all tables in your database.

Now that the tables have primary keys that are clustered, then you look at additional indexes, to make sure the joins are indexed.

For example if you are always WHEREING against a username, but it's not the primary key, then you'd want to make sure it is indexed. Now only the primary key should be clustered, it also does pay to use a easy to understand name of each index.

Once all tables have indexes, that are using in your joining queries, then it is time to start creating relationships.

Creating Relationships in SQL Server can be a painful process, but also a great tool to help improve performance.

You start by specifying a relationship that has no referrential integrity. Referrential Integrity means whether it is strict in making sure that when table 1 is updated, table 2 has to be updated right after. You should consult with a dba, if you want to make sure you are making the right decision for the type of relationship.

But I always start out with a non-enforcing relationship, because that takes less effort in maintenance, and can cause less problems.

Now you don't want to create 5 billion relationships, but you want to look through all the main queries of your applications, for what joins that are commonly used.

So now your tables have primary keys, that are clustered, and have indexes for joined queries, and have relationships with joined tables.

The Database is now setup to deliver speedier data.

Now it is up to how you query that database.

SQL Server is different from Oracle, in that your from statement, the order of tables in it, can help/hinder performance.

In SQL Server 2000, the big thing is your where clauses and your joins.

This is where we take out the SQL Query Analyzer, turn on Show Explanation Plan, grab a sample query, and looking step by step.

An explanation plan is a powerful and graphical way of looking of how SQL Server plans to deliver the data you requested. I start out by looking for anything that takes more than 20% in resources, I look for table scans, or any sign, that a query isn't performing the way I want to.

Let's say you have two tables, Table A has 100,000 records, and Table B has 200 records. Now obviously if you have to do a join, you don't want to scan thru all 100,000 records from Table A, just to get 2 matching records from Table B?

You have to play with the joins, trying subqueries, or in statements, or exists, to create the query that delivers the right data, without unnecessary data scanning.

It pays to think about what kind of data and how much data you need first, then create the query to grab that data...

This is one thing that takes experience, and even I, still have a lot more to learn.

I hope this has helped by putting together all the main ingredients to improving performance.

After all, what's the point of creating a nice web application, if it can't load for the user, or it keeps crashing your site?

Later...

2 comments:

  1. Anonymous10:43 AM

    Please stop posting this rubbish.

    ReplyDelete
  2. This is not rubbish, this is practical information to help make your pages run faster.

    It takes detail work, on code, tables, databases to get everything to work together well...

    ReplyDelete