Pages

Thursday, November 3, 2011

Checkpoints in SSIS


Introduction


SQL Server Integration Services (SSIS) packages are designed to extract data from different data sources like databases, flat files and FTP locations etc and transform using different transformation and then load the transformed data into a different data sources. Some packages may take several hours to execute depending on the implementation. In middle of the package execution, in case the database is not available or the source text file is missing, execution will fail. Most of the times, you need to attend to the relevant issue and re-run the package. This means that package will run from the beginning and which means that you have to wait again for long time. 


In simple terms, you are wasting your times by re-running the entire package. If you have an option where you can restart the package from the point of failure it will be easier.


The Checkpoint feature in SSIS is about restarting package from the point of failure. 


Configuration
The technique behind checkpoint is to save the variables values at the time of failure to a file in a XML format. When it re-run, it reads from the saved file and load the variables and will start from the failed flows. When the package completes successfully, the Checkpoint file is removed; the next time the package runs it starts executing from the beginning since there will be no Checkpoint file. When a package fails, the Checkpoint file remains on disk and can be used the next time the package is executed to restore the values of package variables and restart at the point of failure.


First you need to configure the checkpoint file name. For this you need to right-click the package and select properties.


In the properties window, navigate to Checkpoint category and configure like following image.


The following items need to be configured on the properties tab for SSIS package: 

CheckpointFileName - Specify the full path to the Checkpoint file that the package uses to save the value of package variables and log completed tasks. Rather than using a hard-coded path as shown above, it's a good idea to use an expression that concatenates a path defined in a package variable and the package name. 


The following strategy is the recommended method that I used and it is working fine up to now.
1. Have a common folder like C:\checkpointfiles to save all the package configuration files in your SSIS project. In case you need to restart from the start, you simply need to delete the file in the above folder.


2. Pass the folder name to a variable in the child packages from the parent package by using parent package variable type in the package configuration. We have already discussed how to create package configurations for SSIS packages. By following above two steps, by simply changing the main package variable which contains the value for checkpoint folder, all the locations will be changed.


3. The best way to assign a name for package is to assign the package name to the checkpoint file name as there can be only one checkpoint file for a package. For this you can use an expression in the package properties which will be like following.
4. By using the @[System::PackageName] variable, you can address the issues of renaming SSIS packages.

CheckpointUsage - Determines if/how checkpoints are used. Choose from these options: Never (default), IfExists, or Always. Never indicates that you are not using Checkpoints. IfExists is the typical setting and implements the restart at the point of failure behavior. If a Checkpoint file is found it is used to restore package variable values and restart at the point of failure. If a Checkpoint file is not found the package starts execution with the first task. The Always choice raises an error if the Checkpoint file does not exist. 

SaveCheckpoints - Choose from these options: True or False (default). You must select True to implement the Checkpoint behavior.

After configuring package for the checkpoint configuration, next is to configure each control task



For each control flow task, you need to set True for FailPackageOnFailure and FailParentOnFailure properties.


FailPackageOnFailure should set to True so that the SSIS package fails if this task fails; this implements the restart at the point of failure behaviour when the SSIS package property SaveCheckpoints is True and CheckpointFileUsage is IfExists. If you have not configured this to true then package will return success, though this task is failed hence checkpoint configuration will not be effected.


FailParentOnFailure Select True when the task is inside of a container task such as the Sequence container; set FailPackageOnFailure for the task to False; set FailPackageOnFailure for the container to True.



Example:

I have created simple SSIS package to demonstrate the SSIS Checkpoint. In this there are three Execute SQL Tasks.
Task A: Select 1/1
Task B: Select 1/0 - Which will introduce an error
Task C: Select 1/4
After configure the SSIS checkpoint, let us run the package and you will end-up with following state.

Let us check the checkpoint file at this moment.

You can see that checkpoint file will save all the variables along the with value time at which the package was failed. Then it will save the ID(s) of the controls that was successful. In the give example {7C89E7ED-4225-44CB-949F-0FA797D0238E} is the ID of the Task A.
Then fix the error at the Task B and re-run the package and you will see the following state for the SSIS package.

Important thing to note is that, it will start from the failed task earlier. As it does not run the failed task, it will reduce the run time. After the successful completion of the SSIS package, checkpoint file will be deleted.




No comments: