Friday, January 31, 2014

Self documentation and extended properties - Building a Data Dictionary

In many of the environments I've worked, the amount of documentation around databases, schemas and objects has been, how shall I say it, somewhat limited. Most of the "self documentation" that I've seen has been within stored procedures and functions explaining purpose and reason for the code.
An example (from the AdventureWorks database)  follows.

CREATE PROCEDURE [dbo].[uspGetManagerEmployees]


@BusinessEntityID [int]
AS

BEGIN
SET NOCOUNT ON;

-- Use recursive query to list out all Employees required for a particular Manager

WITH [EMP_cte]([BusinessEntityID], [OrganizationNode], [FirstName], [LastName], [RecursionLevel]) -- CTE name and columns

AS (

SELECT e.[BusinessEntityID], e.[OrganizationNode], p.[FirstName], p.[LastName], 0 -- Get the initial list of Employees for Manager n

FROM [HumanResources].[Employee] e

INNER JOIN [Person].[Person] p

   ... (code omitted for brevity)

UNION ALL

SELECT e.[BusinessEntityID], e.[OrganizationNode], p.[FirstName], p.[LastName], [RecursionLevel] + 1 -- Join recursive member to anchor

... (code omitted for brevity)

)

-- Join back to Employee to return the manager name

... (code omitted for brevity)


What about documentation for other schema objects? Are there data dictionaries? Many places I've worked, or consulted, may have some external documentation including ER diagrams and data modeling but that's it. I've found very little internal documentation on database objects. Ask a question like "what are all the transactional tables" and you'll get an answer like "Go ask Bill. He designed that". There's a lot of information and documentation that's in someone's brain.


A recent issue at work brought this to light. We were migrating a hub and spoke model database to a multi-tenant model database. It's not as simple as a "backup-copy-restore" process as we needed to preserve the data in both databases. Add to the complexity is that both databases use identity columns for the primary keys. That means when migrating to the multi-tenant db those id columns will change. The due-diligence was performed, data migrated and boom. It all blew up. Emails sent to the migrated client used the wrong client template.


Eyebrows were raised, questions asked, "How did this happen?". "How can we prevent this from happening again?"


Well, there is a way. But it requires SOME knowledge of the schema. It's possible to query the metadata from the DMVS to get tables that have foreign keys, the tables that reference those foreign key tables, the columns referenced and the primary keys. Then, create a checksum on the non-data fields (dates tend to change frequently and will change the checksum) and compare the rows for deltas. But wait...
Problem #1 - Because the identity values changed, every row came back as a delta. That's thousands of rows to check manually.
Problem number #2 - What were we actually trying to compare? That is the real issue.


For example... The Person table has a foreign key to the Address table as each employee has an address. And the Address table has a foreign key to the State and Country tables.
In these cases, the Address table is more of a transactional table while the State and Country tables are lookup tables.
In our "Oops" example, it was the Letters and LettersTemplate tables. The task then is to "find lookup tables, compare the values for the client (or employer) and make sure that the value in the multi-tenant database is the same value as the original database.


So we have way to find child tables referenced by parent tables. But what are the values we need to compare? In some tables the column containing the lookup value is called SystemCode. In others, it's Type, or Description, or Status. There is nothing consistent and there was no documentation other than "check with ".


Enter the world of Extended Properties! This is my "go-to" tool for self documenting a database.
Properties can be added at
  • The database or Schema level (@level0Type)
  • Object level  such table, view or procedure (@level1Type)
  • Column level (@Level2Type)
Properties at each level have a name and value as well


I sat down and went through each of the tables referenced by foreign key and made a list based on my "knowledge" of the application, table name, column names. Then I sat down with the application architect and filtered out tables that were transactional in nature (like Address).




In our case, we used _Property to differentiate them from default supplied Microsoft properties (MS_Description).
Then for the @Value it depended on what we're looking for
  • LookupTable for lookup tables
  • LookupKey to identify the column(s) used as the foreign/lookup key for that table
  • LookupValue to identify the column(s) that contain the actual lookup values
This is what I wound up with:
EXEC sp_addextendedproperty
@name = N', @value = 'LookupTable',
@level0type = N'Schema', @level0name = dbo, -- dbo, Sales, HumanResources, etc
@level1type = N'Table', @level1name = tblBenefitType -- Name of table that is a lookup table

GO
EXEC sp_addextendedproperty
@name = N'CoName_Property', @value = 'LookupKey',
@level0type = N'Schema', @level0name = dbo, -- dbo, Sales, HumanResources, etc
@level1type = N'Table', @level1name = tblBenefitType, -- Name of table that is a lookup table
@level2type = N'Column', @level2name = BenefitTypeId; -- Name of column that contains the key value

GO
EXEC sp_addextendedproperty -- Will need to be executed for each column that comprises a lookup value
@name = N'CoName_Property', @value = 'LookupValue',
@level0type = N'Schema', @level0name = dbo, -- dbo, Sales, HumanResources, etc
@level1type = N'Table', @level1name = tblBenefitType, -- Name of table that is a lookup table
@level2type = N'Column', @level2name = BenefitType; -- Name of column that contains the value pair

GO


Now it's just s simple matter of querying the sys.extended_properties system view to find lookup tables, their keys and values, and using those values in a dynamic query that generates a checksum to compare against databases.

No comments:

Post a Comment