ETL Capabilities of WSO2 Enterprise Integrator — Introduction

Natasha Wijesekare
May 17, 2019

ETL (Extract, Transform, Load) is a type of data integration used to blend data from multiple sources, often to build a data warehouse. During this process, data is extracted from a source system, transformed into a format that serves business needs, and loaded into a data warehouse, a data repository, or another target system.

ETL is also used to migrate data from one database to another and to convert large databases from one format or type to another. The ETL process is also used to load data from data marts, which are usually oriented to a specific department or team, and from data warehouses.

ETL involves the following tasks:

  • Extracting data from source systems

The source systems can be SAP, ERP, or other operational systems. Data pulled from these systems is converted into a single, combined data warehouse format that is ready for transformation. When large volumes of data and multiple source systems are involved, the data is consolidated at this stage.

  • Transforming the data extracted

Transforming data can involve the following tasks: applying business rules, filtering, cleaning, merging data together from multiple sources, and applying simple or complex data validation.

  • Loading the data

The transformed data is loaded into a data warehouse, a repository, or another target system.
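The three tasks above can be sketched in a few lines of Python. This is only an illustration of the pattern, not how WSO2 Enterprise Integrator implements it: the in-memory SQLite databases, the `orders` and `clean_orders` tables, and the validation rules are all assumptions made up for the example.

```python
import sqlite3


def extract(source):
    """Extract: pull raw rows from the (hypothetical) source system."""
    return source.execute(
        "SELECT id, name, email, amount FROM orders"
    ).fetchall()


def transform(rows):
    """Transform: validate, filter, and clean the extracted rows."""
    cleaned = []
    for row_id, name, email, amount in rows:
        # Validation rule (assumed): a usable email address is required.
        if email is None or "@" not in email:
            continue
        # Business rule (assumed): only positive amounts are kept.
        if amount is None or amount <= 0:
            continue
        cleaned.append(
            (row_id, name.strip().title(), email.strip().lower(), round(amount, 2))
        )
    return cleaned


def load(warehouse, rows):
    """Load: write the transformed rows into the target store."""
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS clean_orders "
        "(id INTEGER PRIMARY KEY, name TEXT, email TEXT, amount REAL)"
    )
    warehouse.executemany("INSERT INTO clean_orders VALUES (?, ?, ?, ?)", rows)
    warehouse.commit()


def run_etl():
    """Wire the three steps together using in-memory sample data."""
    source = sqlite3.connect(":memory:")
    source.execute(
        "CREATE TABLE orders (id INTEGER, name TEXT, email TEXT, amount REAL)"
    )
    source.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [
            (1, "  alice smith ", "Alice@Example.com", 120.5),
            (2, "bob", None, 40.0),                       # fails email validation
            (3, "carol jones", "carol@example.com", -5.0),  # fails amount rule
            (4, "dave", "dave@example.com ", 10.0),
        ],
    )
    warehouse = sqlite3.connect(":memory:")
    load(warehouse, transform(extract(source)))
    return warehouse.execute(
        "SELECT id, name, email, amount FROM clean_orders ORDER BY id"
    ).fetchall()
```

Calling `run_etl()` leaves only the two valid rows in the target table, with names and email addresses normalized. Keeping each step in its own function mirrors the way the stages are described separately above.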

Organizations have relied on ETL processes to get a consolidated view of their data and make better business decisions. Integrating data from multiple sources and systems is still an important part of an organization’s data integration strategy, and many organizations use the traditional ETL process as their data processing model. However, there are newer data processing models, such as stream processing, which can deal with real-time data, and automated data management, which bypasses traditional ETL and uses the ELT model (Extract, Load, then Transform).
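To make the difference in ordering concrete, here is a minimal ELT sketch in Python. The raw data is loaded into the target first and only then transformed, inside the target, using SQL. The in-memory SQLite database, the `raw_orders` and `clean_orders` tables, and the filter rules are assumptions chosen for illustration.

```python
import sqlite3


def run_elt(raw_rows):
    """Load raw rows first, then transform them inside the target store."""
    target = sqlite3.connect(":memory:")

    # Load: the raw data lands in the target unchanged.
    target.execute("CREATE TABLE raw_orders (id INTEGER, email TEXT, amount REAL)")
    target.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)

    # Transform: cleaning and filtering run as SQL in the target itself.
    target.execute(
        """
        CREATE TABLE clean_orders AS
        SELECT id, lower(trim(email)) AS email, amount
        FROM raw_orders
        WHERE email LIKE '%@%' AND amount > 0
        """
    )
    return target.execute(
        "SELECT id, email, amount FROM clean_orders ORDER BY id"
    ).fetchall()
```

Because the raw table is kept, the transformation can be re-run or changed later without extracting from the source again, which is one reason the ELT ordering is attractive when the target system is powerful enough to do the transformation work.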

Integrating standalone ETL products with existing SOA infrastructures is complex. Modern organizations are evolving rapidly and are increasingly focused on becoming “connected businesses”, so adopting a standalone ETL product is often not worthwhile. These products usually rely on proprietary data integration patterns, which increases maintenance effort, and most of them provide limited support for open standards, extension points, and connectors. Organizations instead tend to reuse in-house business components to leverage the advantages of SOA.

Rearchitecting data models is made easier with the WSO2 Enterprise Middleware Platform. WSO2 Enterprise Integrator can be used to seamlessly integrate an ETL process into an existing organizational infrastructure.

Source: https://docs.wso2.com/display/EI630/WSO2+Enterprise+Integrator+Documentation

WSO2 Enterprise Integrator offers the following ETL capabilities:

  • Comprehensive, out-of-the-box (OOTB) support for each aspect of ETL (Extract, Transform, Load)
  • Extensive tooling support for ETL patterns
  • Ability to develop data models that vary from simple data mappings to complex data models
  • WSO2 EI connectors that encapsulate third-party API calls. These connectors enable you to connect to and interact with the APIs of services such as Twitter, Salesforce, and JIRA.

This blog series will take you through the following ETL use cases, which are implemented using WSO2 Enterprise Integrator:

1- Data migration from a legacy database (RDBMS) to Salesforce via a scheduled task

2- Real-time data migration from Salesforce to a data lake (MongoDB) and a legacy database (RDBMS)

Stay tuned for all the blogs in the series!! :) :)
