What is the world’s most valuable resource? If you answered oil, you might be stuck in the 1990s! Many economists now argue that data has surpassed oil as the most precious commodity. Indeed, it is the fuel that propelled Alphabet, Amazon, Facebook, Apple and countless other household names to the top of the business world. It is easy to understand how this came to be. The properties of data are unique: Data is constantly being generated, it can be stored and transported/transferred cheaply, and it becomes more valuable the more it is used. In this article, we will look at an Extract, Transform, Load (ETL) solution that was developed by the Stack8 team for one of our enterprise customers that wished to centralize some of their Cisco Unified Communications data in an internal data warehouse for analytics and business intelligence purposes.
The company operates multiple contact centers hosted on several Cisco Unified Contact Center Express (UCCX) platforms. The objective was to extract the Unified Communications Manager (CUCM) Call Detail Records (CDRs), Unity Connection (CUC) voicemail information and some UCCX historical data, transform the data to match the data warehouse expected formats and load the data to the warehouse.
As the name implies, ETL is the process of extracting data from one or more sources, transforming the data according to some business logic while respecting the expected format of the destination, and loading the modified data into the destination.
The organization faced two challenges at the extraction level. The first was volume. The company generated several million new records daily, distributed across hundreds of tables in diverse sources using different systems.
After analyzing the needs of the company, our engineers narrowed the extraction scope significantly, which allowed for faster processing time, reduction in noise and lower storage costs. The team also created methods of identifying records already processed in order to ensure that each run would only process the new data.
The second challenge was accessibility. The company lacked the tools to establish connections to various data sources. Stack8 solved this problem by building custom database connectors to query any Cisco data source seamlessly.
The data for this project came from three sources: IBM Informix database, MS-SQL database, and continuously generated CSV files accessed via an FTP server.
To conform with the expected data type formats of the destination, Date and Boolean entries were transformed to be compatible with the warehouse’s system. The dates were also converted to all match the same time zone (UTC), and data in reference columns were replaced with their actual values for improved readability.
The load phase for this project was relatively simple. A single destination and tables mimicking the sources were created to reduce complexity. While the loading process itself was simple, it was nonetheless necessary to develop of a validation mechanism to ensure data integrity and fidelity. Statistical analysis was performed after each run to produce a detailed report of the ETL process.
After several weeks of testing, it was concluded that the tool operated at a 100% accuracy rate.
Is your company sitting on a gold mine? Contact us today to discuss ETL opportunities.