Tuesday, December 31, 2019

Data Migration - Good Practice



As part of Data Migration project, I followed following things.



1. Validation Scripts during development. This will play a vital role in validating data in Go Live and Production (post migration).


2. Reconciliation: There are two kind of reconciliation

  • Technical reconciliation – As part of this, we store the record count of each table (i.e. Source table, Transformed table and Target table)
  • Financial reconciliation -  As part of this, we match the financial numbers as per business needs (i.e. accounts receivable - in old and new system, accounts payable - in old and new system and breakdown of transactions)


3. Actual Data Migration can be done in two ways/approach

  • Bring the old data till last month or a specified date and then last month data during migration day. The advantage in this approach is that you will be migrating Delta on final day and business can do some validation with old data prior to migration day. In this approach we can keep system down time as minimal as possible.
  • Bing Bang migration is a risk-based approach, because success of migration is unknown until migration day. Downtime window is large. In case of any error, migration team needs to fix it as soon as possible in order to proceed further on migration. In addition to these resources needs to monitor the migration flow and record count at the time of migration. 


4. Error records needs to store in a separate table with relevant keys (i.e. the field that indicates uniqueness of the record). These records should be fixed and inserted back into new system after discussing with the business.



5. Additional measure needs to be taken while loading data in SQL Server. (i.e. SP_SpaceUsed ‘<tablename>’ to find the record count, Partition)



6. BDD feature in SQL Server – Balanced Data Distributor is used to read from the source table and write in target table simultaneously. This is because, read operation doesn’t take time but write operation takes more time.



7. Partition can play a vital role in improving the performance.



8. Indexes can improve the performance. I suggest to create Partition and then do Index on top of it.


Tuesday, March 26, 2019

Date functions & Date format in SQL Server



We use lot of date functions in day today coding in SQL Server world. So, thought of preparing a quick reference of commonly used date functions.


1. To get the current date use GetDate() function




2. To get UTC date use GetUTCDate() function


3. See below example show how to get Date only from DateTime


Select CONVERT(DATE , Getdate()) AS DateOnly






4. This example show how to get Time only from DateTime

Select CONVERT(TIME , Getdate()) AS TimeOnly






5. Following example show how to format date


            SELECT FORMAT(GETDATE(), 'HH:mm:ss') AS TimeUsingFormat








6. Use DatePart function to get specific value from Date.

                    a. Year only
SELECT DATEPART(yyyy,GETDATE())





                    b. Month only

-- To get Month                           
SELECT DATEPART(mm, GETDATE())
-- To get Month amended with '0'

SELECT FORMAT(GETDATE() , 'MM')





                    c. To display Month Name

-- To get MonthName
SELECT Format(GETDATE() , 'MMMM') AS NameoftheMonth

-- To get MonthName (3 Character only)
SELECT Format(GETDATE() , 'MMM') AS NameoftheMonth







                    d. Day only

-- To get Day
SELECT DATEPART(dd,   GETDATE())
-- To get Day amended with '0'

SELECT FORMAT(DATEPART(dd, GETDATE()) - 15, '00')






                    e. Year and Month using Format function

-- To get Month and Year
SELECT Format(GETDATE() , 'mmyyyy') AS MonthAndYear







                    f.   The below script will scan "myTable" table from StartDate to EndDate (whatever date range I choose), and iterate through this date-range, and if a record does not exist with any of the dates in this date-range, insert a new record with that particular date in the Timestamp field, but empty/NULL data for the rest of the fields.

DECLARE @StartDate AS DATETIME
DECLARE @EndDate AS DATETIME
DECLARE @CurrentDate AS DATETIME
SET @StartDate = '2015-01-01'
SET @EndDate = GETDATE()
SET @CurrentDate = @StartDate
WHILE (@CurrentDate < @EndDate)
BEGIN
    SELECT * FROM myTable WHERE myTable.Timestamp = "@CurrentDate"
    IF @@ROWCOUNT < 1
        print @CurrentDate
        /*insert a new row query here*/
SET @CurrentDate = convert(varchar(30), dateadd(day,1, @CurrentDate), 101); /*increment current date*/

END





FYI - Starting for SQL Server 2012 version, you can use FORMAT function.




-----------------------------------------------------------------------------------------------------------


Please see following link for further details on Date Functions

Date and Time Conversions Using SQL Server
        https://www.mssqltips.com/sqlservertip/1145/date-and-time-conversions-using-sql-server/


Format SQL Server Dates with FORMAT Function
          https://www.mssqltips.com/sqlservertip/2655/format-sql-server-dates-with-format-function/




DatePart Function






---------------------------------------------------------------------------------------------------




Date Format


Select CONVERT(VARCHAR(10), Getdate(), 112) AS fmtUTCDate




Conversion


----------------------------------------------------------------------------------------------------------------------------


DatePart


SELECT DATEPART(yyyy,'2007-10-30 12:15:32.1234567 +05:10')
SELECT DATEPART(mm,'2007-08-09 12:15:32.1234567 +05:10')
SELECT DATEPART(dd,'2007-10-30 12:15:32.1234567 +05:10')




 ----------------------------------------------------------------------------------------------------------------------------



Tuesday, January 22, 2019

How to get latest record or first record from a table in SQL Server?





One of the common requirements for SQL developer is to get latest or last record from a transaction table. This can be achieved by using Correlated subqueries.


Lets take an example, we have employee table with 7 fields.








Now, I would like to fetch latest transaction record from employee table (based on TransId field). 



SELECT a.transid, a.emp_id, a.emp_name, a.emp_address, a.sex, a.marital_status, a.CreatedDate
from Employee a
WHERE NOT EXISTS (select 1
         from Employee b
         where b.emp_name = a.emp_name and
               b.emp_address = a.emp_address and
               b.sex = a.sex and
                        --b.[CreatedDate] = a.[CreatedDate] and

               b.[TransId] > a.[TransId])








Next, based on transaction date I am displaying latest record.


SELECT a.transid, a.emp_id, a.emp_name, a.emp_address, a.sex, a.marital_status, a.CreatedDate
from Employee a
WHERE NOT EXISTS (select 1
         from Employee b
         where b.emp_name = a.emp_name and
               b.emp_address = a.emp_address and
               b.sex = a.sex and
                        b.[CreatedDate] > a.[CreatedDate] 
               --b.[TransId] > a.[TransId]
                        )








Following example shows how to fetch first transaction record


SELECT a.transid, a.emp_id, a.emp_name, a.emp_address, a.sex, a.marital_status, a.CreatedDate
from Employee a
WHERE NOT EXISTS (select 1
         from Employee b
         where b.emp_name = a.emp_name and
               b.emp_address = a.emp_address and
               b.sex = a.sex and
                        --b.[CreatedDate] = a.[CreatedDate] and
               b.[TransId] < a.[TransId])








I came across with few business scenario’s where few of the employee’s doesn’t have FirstLoginDate, so business asked me to take the first transaction date (in this case field name is CreatedDate) as FirstLoginDate. I used below query to update it.





UPDATE  T1 

       SET [FirstLoginDate] = T2.CreatedDate
       FROM [Varada_ETL_Test].[dbo].EmployeeLoginDetails T1
       INNER Join
              (
                     SELECT a.transid, a.emp_id, a.emp_name, a.emp_address, a.sex, a.marital_status, a.CreatedDate
                     from Employee a
                     WHERE NOT EXISTS (select 1
                                         from Employee b
                                         where b.emp_name = a.emp_name and
                                                b.emp_address = a.emp_address and
                                                b.sex = a.sex and
                                                b.[CreatedDate] < a.[CreatedDate] 
                                                --b.[TransId] > a.[TransId]
                        )
              ) AS T2 ON  T2.emp_id = T1.emp_id
              WHERE [FirstLoginDate] IS NULL
                          









Reference material for Subqueries: