Friday, December 23, 2011

The Cowboy or rogue DBA/Developer

“I don’t always test my code, but when I do, I use Production” (a parody of the Most Interesting Man in the World Dos Equis commercial).

A lot has been written about version control and the process of promoting code through environments (see Coding Horror and When Database Source Control Goes Bad). In many shops the DBA will find a development/test environment, then (hopefully) a QA and/or stage environment, and finally the production environment. Along with these different environments there is (in good IT shops) a process that defines the promotion of code from one to the next. That is:

• Make changes to source code in dev/test

• Compile code and unit test

• Promote code to QA

• QA group performs integration and regression testing

• QA either returns code to dev or promotes to UA or a stage environment

• UA group validates functional and non-functional requirements and either returns the code to development, or a Release Manager schedules a code release (go-live) date to perform the final promotion

For non-scripted code (C++, C#, Java, etc.), where the source code must be compiled into object code or an executable, promotion is straightforward and limits the damage a developer can do from one environment to the next. That is, in their development or test area, their C++ code is compiled and then unit tested. The SOURCE code is not promoted; the compiled bits are. Even if the source is moved to each environment, the ability to compile is limited, and so is the ability to effect a more permanent change. This prevents code from being modified outside of the release cycle and environments from getting out of sync.

For scripting environments (and I consider SQL Server such a case) it is much more difficult. Even with a process like the one listed above in place, the nature of T-SQL makes it easy for someone with sufficient privileges to make a change directly to the database. For example, consider a complex procedure that creates and loads temporary tables for some type of row-by-row processing (another article down the road), where one of the temp tables is missing an index. In the development environment it may have been missed and caused no performance issue due to the relatively low number of records in the table. However, when promoted to a QA or UA environment, and yes, even production, where datasets are often much larger, a performance problem surfaces. Someone recognizes that the Orders table needs a non-clustered index on the OrderDate column because the query is performing a table scan, reading every row of the 500 million row table.
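In that scenario, the quick fix is a one-line piece of DDL (the table and index names here are my own illustration):

```sql
-- Hypothetical fix applied directly to QA or production
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate
    ON dbo.Orders (OrderDate);
```

That one line is exactly the kind of change that is trivial to apply by hand, and just as trivial to forget to check in.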

At this point a defect should be raised and the code sent back for remediation. This is where the rogue or cowboy DBA/developer comes into play. Rather than follow the prescribed process, the index is added, the test is “passed” and the code promoted.

Now there’s a problem. Even if the environments are source controlled, the underlying databases are now out of sync. The index is not in source control and it’s not in the development environment. And in the “worst case” scenario, the release manager gives the green light for a release date and code gets promoted to production without the necessary index. In the meantime, a patch or additional feature release gets promoted to QA. Remember, the change was made outside of development and source control, so the required index is not included. Therefore the code promotion overwrites the “fix” that was put in place. And then guess what? That’s usually the time production, to put it mildly, “goes to shit”. The web cart is not responding, pages are timing out and some exec is screaming “Heads will roll! How did this mess get promoted? Who approved this?”

Since the environments are now out of sync, it’s hard to find the breakdown. Even worse, there’s still the possibility of that cowboy DBA doing one of the following (which usually involves making direct changes to production):

• Remembering that they added the index in QA, they quietly go into production and add it there in order to cover their tracks (yes, I’ve seen it done)

• Some other cowboy DBA recognizes that an index is missing and decides to perform the quick fix themselves

In both cases, we wind up with unsynchronized environments again. The “fix” is in production but not in version control, nor in any lower environment. Therefore, when the next promotion occurs, whatever change was made in production will likely be overwritten again.

How can you combat the cowboy DBA? The simplest solution involves locking down each environment. That is, only the QA DBAs can make changes in QA; only production or operations DBAs can make or promote changes in production. This has a few limitations. Chief among them, it requires additional resources. The reality is that usually one or two DBAs are responsible for all the environments from dev through production, so the likelihood of the cowboy DBA is higher.
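As a sketch of what locking down an environment might look like (the role and schema names are assumptions, not a complete security model):

```sql
-- Developers can read and execute in QA, but cannot change schema objects
CREATE ROLE QAReadExecute;
GRANT SELECT, EXECUTE ON SCHEMA::dbo TO QAReadExecute;
DENY ALTER ON SCHEMA::dbo TO QAReadExecute;
```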

Since most shops are small and companies can’t (or won’t) shell out for additional resources, other solutions are required. Database Audits and Audit Specifications can be a big help. While they can’t prevent changes altogether (after all, SOMEONE has to make changes), they do provide tools to see WHO is making changes. They can be used to log who is modifying live data and who performed a DDL operation (ALTER TABLE, ALTER/CREATE PROCEDURE, etc.). When these events don’t correspond to release dates, there is data to “gently reprimand” the offending cowboy for violating the prescribed release process.
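A minimal sketch of such an audit (SQL Server 2008 and later; the audit names, database name and file path are assumptions):

```sql
-- Server audit writes events to a file target
USE master;
CREATE SERVER AUDIT DDL_Audit
    TO FILE (FILEPATH = 'C:\AuditLogs\');
ALTER SERVER AUDIT DDL_Audit WITH (STATE = ON);

-- Database audit specification captures CREATE/ALTER/DROP of schema objects
USE MyDatabase;
CREATE DATABASE AUDIT SPECIFICATION DDL_Changes
    FOR SERVER AUDIT DDL_Audit
    ADD (SCHEMA_OBJECT_CHANGE_GROUP)
    WITH (STATE = ON);
```

Any ALTER TABLE or ALTER PROCEDURE run outside a release window will now show up in the audit log along with the login that ran it.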

Monday, April 18, 2011

SQL Saturday 74

Update: Too much travel. Don't know where I am or what week it is. SQL Sat 74 is not THIS coming weekend but the FOLLOWING, April 30th.

Had a blast at SQL Saturday 63 in Dallas on the 2nd. Thanks to everyone who had both positive and negative feedback. Taking it to Jacksonville this weekend and looking to take it up a notch! See you there!

What is this cache and how did it get so big?

Friday, March 4, 2011

SQL Server Denali tidbits

So yesterday I was using some DMVs to check the space allocated in tempdb for user objects, internal objects and the version store, the sys.dm_db_file_space_usage view in particular. With it you can see unallocated extents and how many pages are used by user objects, internal objects and the version store. On a whim I ran the same query on my Denali instance and got two extra columns: total_page_count and allocated_extent_page_count. Ahhh, this is even more useful! Better still, the SQL Server 2008 version of the view ONLY returns page usage for tempdb, regardless of the database it is executed from. That's fine when I want to see what user objects (temp tables, etc.) and internal objects (spools, hash joins) are consuming tempdb.
However, with SQL Server Denali, sys.dm_db_file_space_usage works on ANY database.
USE AdventureWorksDenali;
SELECT * FROM sys.dm_db_file_space_usage;

USE tempdb;
SELECT * FROM sys.dm_db_file_space_usage;
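Since the view reports in 8 KB pages, a quick conversion to megabytes is handy. A sketch using one of the new Denali columns (total_page_count) alongside the existing unallocated_extent_page_count:

```sql
SELECT DB_NAME() AS database_name,
       SUM(total_page_count) * 8 / 1024 AS total_mb,             -- new in Denali
       SUM(unallocated_extent_page_count) * 8 / 1024 AS free_mb
FROM sys.dm_db_file_space_usage;
```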

So make the most of this view and keep an eye on what's being allocated in your databases, particularly tempdb.

Friday, February 11, 2011

Procedure cache, what is it?

SQL Server uses pooled memory. Within this pool SQL Server stores both data pages and execution plans. When you execute a query, SQL Server looks through the procedure cache to see if a plan already exists. If not, a plan is compiled and added to the procedure cache, while the data pages the query touches are cached in the remainder of the pool (the buffer pool). From this you can see that as both the procedure cache and buffer pool grow, you could start to run out of memory, or encounter memory pressure. So how big is your procedure cache?
SQL Server provides the sys.dm_exec_cached_plans DMV that can be used to get details on cached plans.

SELECT SUM(size_in_bytes) / 1024 / 1024. AS [Cache Size in MB]
FROM sys.dm_exec_cached_plans;

SELECT refcounts, usecounts,
       size_in_bytes / 1024. / 1024. AS [Plan Size in MB]
FROM sys.dm_exec_cached_plans
ORDER BY size_in_bytes DESC;
These queries will give you the total size of your procedure cache and show which plan/query is using the most space.
The next blog in this series will cover how poor design can bloat the cache and how to avoid it.

Thursday, February 10, 2011


I've been a consultant at a respectable Microsoft partner for the last few years. We make our living off our professionalism and reputation. With that in mind, I've become more active in the SQL Server blog world in order to "increase my brand". There is a lot of information regarding SQL Server, starting with Books Online (BOL) and several publications from Microsoft Press to WROX and O'Reilly. That said, while doing a tech review of a training class I came across some content that was copied directly from BOL without citation. Even more astounding, one paragraph referenced someone else's blog directly; however, that blog itself was taken from a Microsoft TechNet article. (Fortunately not a well-known blogger in the SQL community.)
To make what could be a long blog entry short, I just want to say: "Come on folks, if it's not your idea or thoughts, CITE IT!" A good example would be when recommending an index strategy. The first line is a best practice you could recommend, followed by a reference to SUPPORT your idea.
"Because a copy of the data is stored at the leaf level, it is a best practice to avoid using wide columns as the key column of a non-clustered index and to use them as part of an INCLUDE clause instead.
'Redesign nonclustered indexes with a large index key size so that only columns used for searching and lookups are key columns. Make all other columns that cover the query included nonkey columns. In this way, you will have all columns needed to cover the query, but the index key itself is small and efficient.' [1: MSDN index design guidelines]"
As opposed to
"Because a copy of the data is stored at the leaf level it is a best practice to avoid using wide columns as the key column of a non-clustered index and use them as part of an INCLUDE clause. Therefore, redesign nonclustered indexes with a large index key size so that only columns used for searching and lookups are key columns. Make all other columns that cover the query included nonkey columns. In this way, you will have all columns needed to cover the query, but the index key itself is small and efficient."
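To make the recommendation concrete, here is what that practice looks like in DDL (the table and column names are my own example, not from the cited guideline):

```sql
-- Narrow key column for searching; the wide column is carried at the leaf level only
CREATE NONCLUSTERED INDEX IX_Customers_LastName
    ON dbo.Customers (Customer_LastName)
    INCLUDE (Customer_Notes);
```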

Keep blogging and stay professional!

Friday, January 28, 2011

A pet peeve

SQL Server has over 400 wait types available via the sys.dm_os_wait_stats DMV. The most common are fairly well documented. However, try to find some help if you see high counts of PREEMPTIVE_OS_ENCRYPT_MESSAGE. You'll find PLENTY of "help" sites that will tell you "SQL Server has all these cool new waits" and then list each and every one of them with no explanation.

In short, if you don't know what something is, don't post it in your blog. So, to avoid being my OWN pet peeve, here's at least what I discovered regarding the wait type mentioned above. The client uses SSL connections with their JDBC driver but NOT with SQL Server. Therefore, when SQL Server returns a result to the application, it makes an OS call to encrypt the data using the server's certificate rather than SQL Server's certificate.
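If you want to see whether this wait is significant on your own instance, a quick check (the counters are cumulative since the last service restart):

```sql
SELECT wait_type, waiting_tasks_count, wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type LIKE 'PREEMPTIVE_OS_ENCRYPT%'
ORDER BY wait_time_ms DESC;
```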

Thursday, January 13, 2011

SQL Server Denali: SEQUENCES

I'm primarily a SQL Server DBA, although I've worked with Sybase, MySQL and Oracle throughout my career. As such, in my current consulting role, I'm often called upon to work with companies migrating from one platform to SQL Server. When migrating Oracle to SQL Server, some functionality does not map. For instance, Oracle's SEQUENCE objects. While SQL Server does have IDENTITY columns, the functionality is not the same. Therefore, in order to do a full migration, you must implement your own versions of Oracle's NEXTVAL and CURRVAL functions.

One of the new features in SQL Server Denali is the SEQUENCE object.
The functionality is very similar to Oracle's implementation. It has min and max values, a START WITH value, an increment and, of course, NEXT VALUE FOR. The SEQUENCE can also be cached. That is, it will reserve the next N values for the SEQUENCE in memory; when the cache is exhausted, it generates the next N values.

I'll provide some sample code and use cases for this construct in my next post.
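In the meantime, a minimal sketch of the syntax from the CTP (the sequence name and numbers are mine):

```sql
CREATE SEQUENCE dbo.OrderSeq
    AS int
    START WITH 1
    INCREMENT BY 1
    MINVALUE 1
    MAXVALUE 1000000
    CACHE 50;          -- reserve 50 values in memory at a time

SELECT NEXT VALUE FOR dbo.OrderSeq;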

Wednesday, January 5, 2011

SQL Denali: New Feature

I was going through the CTP of Denali today. That's the upcoming release of SQL Server. One of the new T-SQL enhancements in the engine is PAGINATION. That is, scrolling or limiting results returned to the application. I first encountered issues with this in 2006 when migrating a PHP web app with a MySQL backend to SQL Server 2000. The developers had used the MySQL construct LIMIT to restrict the rows returned to the application.

SELECT Customer_LastName, Customer_FirstName, City, State
FROM Customers
WHERE Customer_LastName LIKE 'bren%'
LIMIT 5

This would return only the first 5 rows. The developer asked how to do this with SQL Server 2000. It wasn't easy, but using a combination of TOP and ORDER BY along with subqueries we got what we wanted. A year later, we upgraded to SQL Server 2005 and were able to take advantage of the ROW_NUMBER() OVER() function.

SELECT * FROM (
    SELECT ROW_NUMBER() OVER (ORDER BY Customer_LastName) AS ROWID,
           Customer_LastName, Customer_FirstName
    FROM Customers
    WHERE Customer_LastName LIKE '%bre%'
) AS numbered
WHERE ROWID BETWEEN 1 AND 5

Now comes the OFFSET CLAUSE in ORDER BY.
This first example returns all rows AFTER the first 5.

SELECT DepartmentID, Name, GroupName
FROM HumanResources.Department
ORDER BY DepartmentID
OFFSET 5 ROWS
This example returns rows 1 through 5

SELECT DepartmentID, Name, GroupName
FROM HumanResources.Department
ORDER BY DepartmentID
OFFSET 0 ROWS FETCH NEXT 5 ROWS ONLY

Much simpler! Looking forward to digging into more new Denali features: HADRON with readable mirrors/replicas, and SEQUENCES (sound familiar, you Oracle peeps?).

Tuesday, January 4, 2011

Tuning and Algorithm Analysis Part II

In part one of this series I gave a quick, high level explanation of Algorithm Analysis and "Big O", O(n), notation. Today I will continue and show the impact iterative logic can have on a database, particularly within a lengthy, complex stored procedure.

I'll use a design pattern I see frequently when working with clients on slow queries. That is, the WHILE loop. I've come across many people who say "We know they're bad so we don't use CURSORS". Well, a WHILE loop is just as bad.

So let me set up the first example.
The task was to update the location for "units" under an organization. It was performed in 2 separate WHILE loops.

CREATE TABLE #hierarchy (rowid int IDENTITY(1,1) NOT NULL, UnitId int /* etc. */);

WHILE @hasdependent = 'TRUE'
BEGIN
    INSERT INTO #hierarchy (UnitId /* etc. */)
    SELECT ChildId /* etc. */
    FROM UnitTbl
    WHERE ParentId = @ParentId;   -- walk one level down the hierarchy
    SELECT @NumRows = @NumRows + 1;
END

WHILE @LoopCnt <= @NumRows
BEGIN
    SELECT @UNIT_ID = UnitId FROM #hierarchy WHERE rowid = @LoopCnt;
    UPDATE UnitTbl SET Location = @LOC WHERE UnitId = @UNIT_ID;
    SELECT @LoopCnt = @LoopCnt + 1;
END

A quick look at this algorithm shows that the first loop executes N times for the number of units under a given Org. The second loop then executes N times, resulting in O(N+N), or O(2N). Unfortunately, it did not stop there. The #hierarchy temp table was used in several other loops, resulting in something more like O(M*N). So even if the SELECT that populates the temp table and the UPDATE use an index seek, the nature of the algorithm itself is inefficient and leads to excessive reads and writes. To make matters worse, #hierarchy would typically have several thousand rows and it's not indexed, so table scans degrade this further. Running their query with client statistics and SET STATISTICS IO ON revealed the following:
Table '#hierarchy________________________________': Scan count 10, logical reads 10, physical reads 0
Table 'b_unit_equipment_item': Scan count 15, logical reads 23, physical reads 6
Table '#hierarchy________________________________': Scan count 10, logical reads 2, physical reads 0
Table '#hierarchy________________________________': Scan count 10, logical reads 1, physical reads 0
Table '#hierarchy________________________________': Scan count 10, logical reads 2, physical reads 0
Table '#hierarchy________________________________': Scan count 10, logical reads 1, physical reads 0
Table '#hierarchy________________________________': Scan count 1, logical reads 3, physical reads 0

The end result is an inefficient algorithm combined with poor query design; on each iteration, the procedure incurs scans and, ultimately, physical I/O. By replacing the first loop with a Common Table Expression (CTE) to build the hierarchy, and the second loop with a single set-based UPDATE, the I/O for that portion of the procedure was reduced to this:
Table 'b_unit_equipment_item': Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
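The set-based rewrite described above might look something like this sketch, reusing the names from the example (the ParentId column and the anchor variable are assumptions about the schema):

```sql
-- Build the hierarchy once with a recursive CTE, then update in a single statement
;WITH hierarchy AS (
    SELECT UnitId FROM UnitTbl WHERE UnitId = @OrgUnitId
    UNION ALL
    SELECT u.UnitId
    FROM UnitTbl u
    JOIN hierarchy h ON u.ParentId = h.UnitId
)
UPDATE u
SET Location = @LOC
FROM UnitTbl u
JOIN hierarchy h ON u.UnitId = h.UnitId;
```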

In conclusion, tuning a poorly performing query can involve more than just looking for bad indexes. Look also for inefficient algorithms and excessive I/O. These three items together can really create a performance bottleneck.

Monday, January 3, 2011

Tuning and Algorithm Analysis Part I

As database professionals, at one time or another we are faced with tuning a poorly performing query or procedure. The Database Tuning Advisor (DTA) and Profiler are both useful tools for finding missing indexes, poor join choices and scans. However, many times it's more the design or algorithm of a lengthy procedure that is affecting performance. The most common culprit is iterative logic vs. set-based logic. And that's the focus of this series of posts: algorithm analysis. A poor algorithm is just as bad for a query (or any program) as a table scan.

So part 1 of this series is a brief explanation of algorithm analysis.

Algorithm Notation
The most common representation of algorithm efficiency is called big O, or O(N), notation. You might see an algorithm described as O(N²) or O(N log N), etc. You may also see what's called big Theta, Θ(N). Big O is an upper bound, big Omega (Ω) is a lower bound, and big Theta is a tight bound (both at once).

Algorithm derivation
So what determines the efficiency of an algorithm? I'll demonstrate with a brute-force method, the bubble sort algorithm. For simplicity, I'll use standard pseudocode.

    procedure bubbleSort( A : list of sortable items )
  1. n = length(A)
  2. for (i = 1; i <= n; i++)
  3.     for (j = 1; j <= n-1; j++)
  4.        if A[j-1] > A[j] then
  5.           swap(A[j-1], A[j])
  6.        end if
  7.     end for
  8. end for
  9. end procedure

We'll consider each line an atomic action and count the number of times it is executed. We'll also assume for this analysis that n = 10.
Line 1 is executed once
Lines 2 and 8 executed n, or 10, times.
For each iteration of n, lines 3 through 7 are executed n-1 times.

This gives a total execution count of n * (n-1). As N increases, we approach an upper bound of N². For example: for N = 100, N * (N-1) = 100 * 99; for N = 1000, 1000 * 999.

That means the number of executions grows quadratically: as the input size doubles, the execution count quadruples.
N=10, executions = 100
N=20, executions = 400
N= 40, executions = 1600

So regardless of whether this is C++, Java or T-SQL, this would be an inefficient algorithm.

Part II of this series will discuss these types of algorithms, commonly referred to as RBAR ("Row By Agonizing Row"), and their impact on the database.