The Kepion Planning Data Integration Guide is designed to allow a user to understand the core elements that are involved with extraction, translation, and loading (ETL) of a Kepion application SQL database from a source system.
Skill Set Requirements
- Intermediate knowledge of SQL
- Intermediate knowledge of SQL Server 2008 Integration Services for data integration
- Intermediate understanding of OLAP concepts including star schemas and use of fact, Dimension and hierarchy tables
- Microsoft SQL Server 2008
- Microsoft SQL Server Analysis Services
- Microsoft SQL Server Integration Services
- Microsoft SQL Server Business Intelligence Development Studio
- Microsoft SQL Server Management Studio
Data Integration Concepts
Each Kepion application comes with its own relational database. All ETL can be performed on the application database and may include the following tables:
Dimension Tables: Dimension tables include any table that begins with D_ and DO_. The D_ tables store the main records for the Dimension while the DO_ store the ordering information of the Dimension Members.
Kepion Planning provides support for Dimension ETL through the use of staging tables DS_ and DSE_ tables, representing Dimension staging and Dimension staging error tables respectively. These tables are used in conjunction with the Dimension data loading Stored Procedure of [dbo].[sp_Load_Dimension_From_Staging].
Hierarchy Tables: Hierarchy (or Member List) tables include any table that begins with H_ and HO_. The H_ tables store both the Members in the List and the Parent-Child (PC) Hierarchy member relationships. The HO_ store the ordering information of the PC Hierarchy members.
Hierarchy staging tables are provided by tables that start with Staging_H_. Note that there is no ETL support via Stored Procedure for hierarchy loading.
Note: For PC Hierarchies there are V_H_ and SysCache_V_H tables. These are dropped and created automatically during deployment or processing. If you're manually updating a PC Hierarchy from outside Kepion, you must also truncate SysCache_V_H_ and insert V_H_ into SysCache_V_H_ for any PC Hierarchy.
Model Tables: Model tables include any table that begins with F_. The F_ table stores all the fact records found within a Model sliced by the Dimension Members from the hierarchy table.
Fact staging tables are provided by tables that start with Staging_F_. Note that there is no ETL support via Stored Procedures for fact loading.
Fact records are stored within the application database in a star schema. Looking at the example above, the fact table for the Model is represented by the Model Table and contains logical hierarchy keys to the H_ tables as represented by the Hierarchy Table which in turn have logical Dimension keys to the D_ tables as represented by the Dimension Table.
Data Integration Process
Overview of Data Integration Process
When loading the initial set of data to the Kepion application, perform the following in order:
- Load data to Dimension staging table DS_
- Call Dimension staging SP for all Dimensions
- Review errors in Dimension staging error tables DSE_
- Load all hierarchies directly to H_ and HO_
- Truncate SysCache_V_H_ (Parent-Child Hierarchies)
- Insert V_H_ into SysCache_V_H_ (Parent-Child Hierarchies)
- Load all Models directly to F_
Illustrated below are two methods that can be performed by a data integrator for ETL. Note that other methods are also possible.
- Load from source system directly to application tables. This method skips using any intermediate tables (including Dimension ETL via DS_ tables and SP)
- Load from source system to staging tables and from staging tables to application tables:
Use the application’s staging tables that are created along with each D_, H_, and F_ table to ETL data to the application. Note that these tables are provided solely to assist in an ETL process and are not required to be used. The staging tables include DS_, Staging_H_, Staging_F_.
Data Integration Tools
It's recommended to use Microsoft SQL Server Integration Services (SSIS) packages to perform ETL. To create and manage a data integration process, open the Microsoft Business Intelligence Development Studio (BIDS).
Load Dimensions into the application by populating the DS_ and DO_ tables. The following columns are included in the Dimension staging table DS_:
RowId: This column will be auto-populated.
MemberLabel: This field represents the unique label for the Dimension Members within the Dimension table. This field can take Unicode character strings of up to 200 in length.
MemberName: This field represents the friendly name for the Dimension Members within the Dimension table. This field can take Unicode character strings of up to 200 in length.
Input: A single bit field indicating if data input is allowed for the member. A value of 1 indicates this member accepts input while a value of 0 disallows input.
Annotate: A single bit field indicating whether data cells that intersect with this member should display annotation or value on plan forms. A value of 1 indicates that this member should display data cells with annotations.
Account Type: For the Account Dimension the field Account Type is used to indicate the type of general ledger (GL) account for the member. The values from this field should correspond to the Label field from the AccountLogic_AccountType table.
Dimension PROPERTIES: Additional Dimension properties created from the Modeler will be available on the Dimension table as columns. Populate these columns with the appropriate column types.
Dimension Schema Example
The images below show a typical Dimension structure and the members that it can contain.
Once the DS_ table is populated, you can call the Dimension loading stored procedure to handle the inserts and updates automatically for you. MemberIds will be automatically managed by the SP and stored in the D_ table.
@DimensionName = N'Product',
-- This indicates column that is unique, we will be loading by MemberLabel
-- but is possible to load by other columns as well, in which case you
-- specify in comma delimited format: [column1],[Column2],etc
@BusinessKeyColumns = N'MemberLabel',
@CheckStagingUniqueness = TRUE
In general, when a Dimension is populated, check the following for validity within the Kepion application.
- Do not use the value 0 for MemberID
- All MemberLabel within a single Dimension table must be unique.
- For account Dimension, ensure that all values within the AccountType field map to the Label field of the AccountLogic_AccountType
If loading through the stored procedure, check the rows in the DSE_ table for any errors that are logged.
Load hierarchies into the application by populating the H_ and HO_ tables. The following columns are included in the hierarchy table:
MemberId: A member Id from the Dimension table. MemberId must be unique within the table and cannot contain the value 0.
ParentMemberId: The member’s parent Id from the Dimension table. When the MemberId field and the ParentMemberId field are the same, then the member is a root member.
Hierarchy Schema Example
The images below show a typical hierarchy structure and the members that it can contain.
When a hierarchy is populated, check the following for validity within the Kepion application.
- Ensure all MemberId and ParentMemberId fields are found in the Dimension table.
- Ensure all MemberId are unique within the hierarchy table.
- Ensure there are no cycles and that each member has a path up to a root member.
Load data into the Models by populating the F_. The following columns are included in the Writeback fact table.
RowId: Ignore this column during data load. This field auto-populates when new rows are entered into the table.
HIERARCHY MEMBER IDs: One or more fields that map to members from the spatial hierarchy table as defined by the Model Dimensions used by the Model.
Value: The value field stores the fact record.
Comment: The comment field is used to store data annotations. Populate the comment field with appropriate annotations.
Status: When loading data, this field can be set to 0 or any value greater than or equal to 5. Numbers less than 5 are reserved system values, and numbers greater than 5 can be used to assign any meaning you choose. Values in this field should correspond to records from the Status table.
Activity: The activity field is used to track changes to the fact records. For data loading purposes you can set the field to NULL. Non empty values should correspond to records from the Activities table.
There are two types of F_ table that can belong to a Model:
- Writeback - This table stores data that will change or is considered volatile. For instance, Forecast records should be stored in the Writeback table as changes would be frequent. Data that is stored in the Writeback table will be immediately available for forms and reports to query from as this table will be used by a cube partition on the application’s SSAS database with storage mode set to ROLAP. All input data collected from plans will be stored to this type of table.
- Non-Writeback – These tables can be used to store dynamic or static data, depending on the configuration. By default, a Model is created with a “Static” partition with the OLAP storage mode set to MOLAP. However, Modelers can create additional partitions and that can have different storage modes based on the data to be stored. These tables do not participate in data input but can be configured to store the results of SQL based rule calculations or straight ETL records.
Model Schema Example
The images below show a typical Model structure and the records that it can contain.
When a Model is populated, check the following for validity within the Kepion application.
- Ensure all hierarchy key fields are found in the hierarchy table.
- Ensure all hierarchy keys are at the lowest level within each hierarchy.