Introduction
This overview explains the data integration process, also known as extract, transform, load (ETL), from a data source to a Kepion SQL database, and covers the required data integration tools.
Concepts
Each Kepion Application has a SQL database that can be used for data integration. There are three concepts you need to know:
- Data Sources
- Staging Tables
- Target Tables
Data sources
Data sources are the systems from which you extract the data you wish to transform and load into Kepion databases. Some potential data sources are:
- Excel Workbooks
- SQL databases
Staging tables
Staging tables are an intermediate workspace for data processing before loading your data to your target tables. Typically, staging tables are truncated (i.e., their data is cleared) at the start of each data integration run before they are repopulated with the incoming data.
Dimension staging tables
Includes tables starting with DS_ (Dimension Staging) and DSE_ (Dimension Staging Error). These tables are used in conjunction with the Dimension data loading Stored Procedure (SP):
[dbo].[sp_Load_Dimension_From_Staging]
Attention: You must go through the dimension loading process to update your dimension data. This process ensures Dimension MemberIds remain consistent with your solution's assets (e.g., forms, rules, etc.).
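A typical dimension load can be sketched as below. The table and column names (`DS_Product`, `DSE_Product`, etc.) are hypothetical placeholders; only the stored procedure name comes from this article, and its parameters (if any) depend on your Kepion version, so check the procedure's definition before running it.

```sql
-- Illustrative sketch only: table/column names are hypothetical.
-- 1. Clear the dimension staging table.
TRUNCATE TABLE [dbo].[DS_Product];

-- 2. Populate it from the source (here, another SQL database).
INSERT INTO [dbo].[DS_Product] ([Name], [Label], [Description])
SELECT [ProductCode], [ProductCode], [ProductName]
FROM   [SourceDB].[dbo].[Products];

-- 3. Run the Kepion dimension loading procedure, which moves valid
--    rows into the D_ table and writes rejected rows to the DSE_ table.
--    (Parameters, if any, depend on your Kepion version.)
EXEC [dbo].[sp_Load_Dimension_From_Staging];

-- 4. Review the error table for rows that failed validation.
SELECT * FROM [dbo].[DSE_Product];
```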
Model staging tables
Includes tables starting with Staging_F. These tables are pre-created for convenience but optional in data integration. You can also use the target F_ (Model/Fact) tables directly. Be aware that changes made to an F_ table will impact the Application.
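A model data load via staging might look like the following sketch. All table and column names here are hypothetical; the delete-then-insert pattern is one common way to make re-runs safe, not a prescribed Kepion procedure.

```sql
-- Illustrative sketch only: table/column names are hypothetical.
-- 1. Refresh the model staging table from the source.
TRUNCATE TABLE [dbo].[Staging_F_Finance];

INSERT INTO [dbo].[Staging_F_Finance] ([Account], [Department], [Period], [Value])
SELECT [AccountKey], [DeptKey], [PeriodKey], [Amount]
FROM   [SourceDB].[dbo].[GLActuals];

-- 2. Load from staging into the target fact table. Deleting the
--    affected slice first avoids duplicate rows on re-runs.
DELETE FROM [dbo].[F_Finance] WHERE [Period] = 202401;

INSERT INTO [dbo].[F_Finance] ([Account], [Department], [Period], [Value])
SELECT [Account], [Department], [Period], [Value]
FROM   [dbo].[Staging_F_Finance];
```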
Target tables
The target tables hold the solution's data. There are three types of target tables: Dimension, Hierarchy, and Model.
Note: Model data are structured in a star schema. The F_ table contains hierarchy keys referencing H_ tables, which have dimension keys referencing D_ tables.
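The star schema can be illustrated with a query that resolves a fact row back to a dimension member name. The table and key column names below are hypothetical; actual key columns vary by application.

```sql
-- Illustrative sketch only: names are hypothetical.
-- A fact row's hierarchy key joins to an H_ table, whose
-- dimension key joins to the corresponding D_ table.
SELECT d.[Name] AS AccountName,
       f.[Value]
FROM   [dbo].[F_Finance] AS f
JOIN   [dbo].[H_Account] AS h ON f.[Account]  = h.[MemberId]
JOIN   [dbo].[D_Account] AS d ON h.[MemberId] = d.[MemberId];
```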
Dimension tables
Includes any table that begins with D_ or DO_. The D_ tables store the list of dimension members; the DO_ tables store the ordering information of the dimension members.
Attention: Avoid truncating records in the D_ table as you may lose MemberIds. However, if there are no fact records or assets (e.g., forms, rules, etc.) tied to the MemberIds, there is no concern.
Hierarchy tables (a.k.a. member lists)
Includes any table that begins with H_ or HO_. The H_ tables store a subset of a D_ table's dimension members and any Parent-Child (PC) relationships. The HO_ tables store the hierarchy members' ordering information.
Note: Each PC hierarchy (i.e., member list in Hierarchy mode) has a V_H_ view and a SysCache_V_H_ table to optimize application performance. When updating H_ tables, truncate the SysCache_V_H_ table and re-insert the records from the V_H_ view.
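The cache refresh described in the note above can be sketched as follows; "Account" is a hypothetical hierarchy name, so substitute your own V_H_ and SysCache_V_H_ object names.

```sql
-- Refresh the cached hierarchy after updating the H_ table.
-- "Account" is a hypothetical hierarchy name.
TRUNCATE TABLE [dbo].[SysCache_V_H_Account];

INSERT INTO [dbo].[SysCache_V_H_Account]
SELECT * FROM [dbo].[V_H_Account];
```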
Model tables (a.k.a. Fact tables)
Includes any table that begins with F_. The F_ table stores all of a model's fact records. There are two types of F_ tables that can belong to a model:
- Writeback – Stores data that will change based on user input. For example, user-entered forecast records are stored in the Writeback table.
- Non-Writeback – Stores all non-user-entered data. For example, actuals are stored in the Actuals table, not the Writeback table.
Tip: The FH_ table is a Model table that logs all user-entered data when the Audit option is enabled. Refer to our audit trail article for more information.
Process
The data integration process occurs in two steps:
- Extracting data from a source system and loading it to staging tables.
- Loading data from the staging tables to target tables.
Tip: Before starting data integration, ensure you have a backup of your application database in case of accidental data updates.
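The backup itself can be taken with a standard T-SQL command. The database name and file path below are placeholders; substitute your application database and a path your SQL Server service account can write to.

```sql
-- Placeholder names: substitute your application database and a valid path.
BACKUP DATABASE [KepionApp]
TO DISK = N'D:\Backups\KepionApp_PreETL.bak'
WITH INIT, COMPRESSION, STATS = 10;
```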
As a rule of thumb, execute data integration in the following order: load dimension data first, then hierarchy data, and finally model (fact) data. Because fact records reference hierarchy keys, which in turn reference dimension members, loading in this order keeps the star schema consistent.
Tools
For data integration tools, we have three recommendations:
Management Studio
SQL Server Management Studio (SSMS) is the essential tool for data integration with Kepion databases; you will need it for any advanced data integration work.
Azure Data Factory
ADF, released in 2015, is a serverless data integration service built on Microsoft Azure. ADF organizes its work into 'pipelines,' the equivalent of SSIS's project and package structures.
ADF will suit your business if the following applies:
- You have Kepion hosted in the cloud
- Your Kepion cloud is not integrated with your on-premises infrastructure
| Pros | Cons |
| --- | --- |
| Cloud-integrated and works with most cloud data sources | Pricing scheme can be complex depending on the services your company needs |
| No installation, machine, or SQL Server required | |
Integration Services
Launched in 2005, SSIS has long been the top data integration tool for businesses using Microsoft SQL Server. SSIS combines its functional units of data integration work into 'packages,' which can be run individually or in groups.
SSIS will suit your business if you have Kepion installed on-premises.
| Pros | Cons |
| --- | --- |
| Easily connects with Microsoft tools | Limited integration with cloud data sources and non-Microsoft tools |
| Long-established product | May require additional components and/or licensing |
| One-time payment; no subscription or service charges | |
Third-party data integration
Third-party data integration allows data transfer between Kepion and other applications:
- Kepion enables CData and Skyvia users to configure processes that extract, transform, and load data from an external system into Kepion.