This guide explains how to perform data integration (a.k.a. extract-transform-load (ETL)) from a data source to a Kepion SQL database.
Before you Start
Skill Set Requirements
- SQL knowledge
- Familiarity with Microsoft SQL Server Management Studio
Tool Requirements
- Microsoft SQL Server 2016+
- Microsoft SQL Server Management Studio
Note: Refer to our Data Integration Overview to learn about data integration basics and tools.
Data Integration Concepts
Each Kepion Application has a SQL database that can be used for data integration. There are two types of tables used in data integration: staging and target.
Staging Tables
Staging tables are an intermediate workspace used for data processing before loading your data to your target tables. Typically, when the data integration process runs, staging tables are truncated (i.e., data is cleared) before they are populated with the loading data. Each target table has its associated staging table.
Dimension Staging Tables
Includes tables starting with DS_ (Dimension Staging) and DSE_ (Dimension Staging Error). These tables are used in conjunction with the Dimension data loading Stored Procedure (SP):
[dbo].[sp_Load_Dimension_From_Staging]
Attention: You must go through the Dimension loading process to update your Dimension data. This process ensures Dimension MemberIds remain consistent with your Application's assets (e.g., Forms, Rules, etc.).
Hierarchy Staging Tables
Includes tables starting with Staging_H. These tables are pre-created for convenience but are optional in the data integration process. You can also use the target H_ (hierarchy) tables directly. Be aware that changes made to the target table will impact the Application.
Model Staging Tables
Includes tables starting with Staging_F. These tables are pre-created for convenience but are optional in the data integration process. You can also use the target F_ (Model/Fact) tables directly. Be aware that changes made to the target table will impact the Application.
Target Tables
The target tables hold the Application's data. There are three types of target tables: Dimension, Hierarchy, and Model.
Note: Model data are structured in a star schema. The F_ table contains hierarchy keys referencing H_ tables, which in turn have dimension keys referencing D_ tables.
Dimension Tables
Includes any table that begins with D_ or DO_. The D_ tables store the list of Dimension Members; the DO_ tables store the ordering information of the Dimension Members.
Attention: Avoid truncating records in the D_ table as you may lose MemberIds. However, if there are no fact records or assets (e.g., Forms, Rules, etc.) tied to the MemberIds, there is no concern.
Hierarchy Tables (a.k.a. Member Lists)
Includes any table that begins with H_ or HO_. The H_ tables store a subset of a D_ table's Dimension Members and any Parent-Child (PC) relationships. The HO_ tables store the hierarchy members' ordering information.
Note: Each PC hierarchy (i.e., Member List in Hierarchy Mode) has a V_H_ view and SysCache_V_H table to optimize application performance. When updating H_ tables, truncate SysCache_V_H_ and re-insert the records from V_H_.
Model Tables (a.k.a. Fact Tables)
Includes any table that begins with F_. The F_ table stores all of a Model's fact records.
Tip: The FH_ table is a Model Table that logs all posted data changes when the Audit option is enabled. Refer to our audit trail article for more information.
Data Integration Process
Overview
When loading the initial data set to your application database, perform the following in order:
1. Load data to DS_
2. Call Dimension staging SP for all Dimensions
3. Review errors in DSE_ tables
4. Load all hierarchies directly to H_ and HO_
i. Truncate SysCache_V_H_ (Only for Parent-Child Hierarchies)
ii. Insert V_H_ into SysCache_V_H_ (Only for Parent-Child Hierarchies)
5. Load all Models directly to F_
Below are two data integration methods. Note that other viable methods exist.
Method One
Load from source system directly to application tables:
This method does not use any staging tables (including Dimension data integration via DS_ tables and SP).
Method Two
Load from source system to and from staging tables to application tables:
Use the Application’s staging tables with each D_, H_, and F_ table to integrate your data with the Application. These tables are provided solely to assist in the data integration process—meaning they do not need to be used. The staging tables include DS_, Staging_H_, and Staging_F_.
Load/Update Dimension
Load Dimensions to the Application by populating the DS_ and DO_ tables. The following columns are included in the DS_ table:
RowId: This column will be auto-populated.
MemberLabel: The unique label for the Dimension Members within the D_ table. This field can take Unicode character strings of up to 200 in length.
MemberName: The user-friendly name for the Dimension Members within the D_ table. This field can take Unicode character strings of up to 200 in length.
Input: A single-bit field indicating if data input is allowed for the member. A value of 1 indicates this Member accepts input, while a value of 0 disallows input.
Annotate: A single-bit field indicating whether data cells that intersect with this Member should display an annotation or value on a Form. A value of 1 indicates that this Member will display annotation data cells.
Account Type (only Account Dimension): Indicates the type of general ledger (GL) account for the Member. The values from this field should correspond to the Label field from the AccountLogic_AccountType table.
Dimension Properties: Additional Dimension properties created in the Modeler will appear as columns. You can populate these columns as necessary.
Dimension Schema Example
The images below show a typical Dimension structure and the Members it can contain.
Once the DS_ table is populated, you can automatically call the Dimension loading stored procedure to handle the inserts and updates. MemberIds will be automatically managed by the SP and stored in the D_ table.
EXEC [dbo].[sp_Load_Dimension_From_Staging]
@DimensionName = N'Product',
-- This indicates column that is unique, we will be loading by MemberLabel
-- but is possible to load by other columns as well, in which case you
-- specify in comma delimited format: [column1],[Column2],etc
@BusinessKeyColumns = N'MemberLabel',
@CheckStagingUniqueness = TRUE
Dimension Validation
When any Dimension is populated, verify the following:
- The MemberID is not 0
- All MemberLabel values within each D_ table are unique
- For the Account Dimension, ensure that all values in the AccountType field map to the values in the Label field of the AccountLogic_AccountType
- If loading through the stored procedure, any logged errors in the DSE_ table
Load/Update Hierarchy
Load hierarchies into the Application by populating the H_ and HO_ tables. The following columns are included in the H_ table:
MemberId: A MemberId from the DO_ table, which must be unique within the table and cannot contain the value 0.
ParentMemberId: The Member’s parent ID from its DO_ table. When the MemberId field and the ParentMemberId field are the same, the Member is a root member.
Hierarchy Schema Example
The images below show a typical hierarchy structure and the members it can contain.
Hierarchy Validation
When a hierarchy is populated, double-check the following:
- All MemberId and ParentMemberId fields are found in the H_ table
- All MemberId values are unique within the H_ table
- There are no cycles
- Each Member has a path up to a root Member
Load/Update Model
Load data into the Models by populating the F_ table. The following columns are included in the Writeback fact table.
RowId: Ignore this column during data load. This field auto-populates when new rows are entered into the table.
Hierarchy MemberIDs: One or more fields that map to Members from the spatial hierarchy table as defined by the Model Dimensions used.
Value: The value field stores the fact records/data.
Comment: The comment field stores annotations.
Status: When loading data, this field can be set to 0 or any value ≥ 5. Numbers < 5 are reserved for system values, and numbers > 5 can be used for user-defined values. Values in this field should correspond to records in the Status table.
Activity: The activity field tracks changes to the fact records. For data loading purposes, you can set the field to NULL. Non-empty values should correspond to records in the Activities table.
Two types of F_ tables can belong to a Model:
- Writeback – This table stores data that will change or be volatile. For instance, Forecast records should be stored in the Writeback table as changes would be frequent. Data stored in the Writeback table will be immediately available for Forms and reports to query. A cube partition will use this table on the Application’s SSAS database with storage mode set to ROLAP. All input data collected from plans will be stored in this table type.
- Non-Writeback – These tables can be used to store dynamic or static data, depending on the configuration. By default, a Model is created with a static partition with the OLAP storage mode set to MOLAP. However, Modelers can create additional partitions with different storage modes based on the data to be stored. These tables do not participate in data input but can be configured to store the results of SQL-based rule calculations or straight ETL records.
Model Schema Example
The images below show a typical Model structure and the records it can contain.
Model Validation
When a Model is populated, double-check the following:
- All hierarchy key fields are found in the H_ table.
- All hierarchy keys are at the lowest level of each hierarchy.
Comments
0 comments
Please sign in to leave a comment.