Understanding Modern Data Integration Best Practices

Access to data and real-time analytics is vital to any business. Important decisions have to be made, process improvements can be achieved, and supply chain notifications are needed for immediate business choices, along with the other insights used to drive data-led decisions. With an ever-growing number of tools and cloud-based solutions available, the data landscape is constantly evolving. Understanding best practices when moving to a modern data architecture can help organizations succeed. Follow along and learn how Strive sets up our clients with the tools needed to bring business value every time.

Centralize Data

Data should be centralized to create a single source of truth. That is easy to say and far harder to accomplish, depending on the current source systems, because data silos multiply the places data lives. As you can imagine, this complicates things: two individuals can view two data sets with underlying differences when they should be seeing the same results. A single source of truth delivers reliable data, trusted sources, and consistent results.
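
As a minimal sketch of what this can look like in practice (the database, schema, and table names below are hypothetical), a single conformed view can serve every consumer, so two analysts querying the same metric always see identical results:

    -- One centralized landing table feeds one conformed view; every consumer
    -- queries the view, so the logic lives in exactly one place.
    CREATE OR REPLACE VIEW analytics.conformed.daily_sales AS
    SELECT
        order_date,
        region,
        SUM(order_amount) AS total_sales
    FROM analytics.raw.orders
    GROUP BY order_date, region;

    -- Finance and operations run the same query and get the same answer.
    SELECT * FROM analytics.conformed.daily_sales WHERE region = 'MIDWEST';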

Scalability

One of the wonders of modern cloud tools is access to a massive pool of resources with no increase in on-premises hardware. That elasticity and scalability are essential to making all of an organization's data accessible. Coinciding with the data lake concept, all data can be landed at low storage cost. If more data needs to move downstream into the different data layers, resources can be added and scaled accordingly: larger servers for more power, additional clusters for more concurrency. In a modern data architecture, scalability should always be top of mind.
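
As one illustration, here is a sketch of how that scaling looks in Snowflake (the warehouse name and sizes are hypothetical, and multi-cluster warehouses depend on the Snowflake edition):

    -- Start small and let the platform scale; no on-premises hardware involved.
    CREATE WAREHOUSE IF NOT EXISTS load_wh
        WAREHOUSE_SIZE    = 'XSMALL'
        MIN_CLUSTER_COUNT = 1
        MAX_CLUSTER_COUNT = 3      -- extra clusters spin up for more concurrency
        AUTO_SUSPEND      = 60     -- suspend after 60 idle seconds to save cost
        AUTO_RESUME       = TRUE;

    -- Scale up for a heavy load window, then scale back down.
    ALTER WAREHOUSE load_wh SET WAREHOUSE_SIZE = 'LARGE';
    ALTER WAREHOUSE load_wh SET WAREHOUSE_SIZE = 'XSMALL';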

Extract, Load, Transform (ELT) vs. Extract, Transform, Load (ETL)

Traditionally, resources had to be planned out on-premises and ahead of time, and databases had storage limits. This meant developers had to extract data from their sources, transform only what was needed, and load the result into the target. Tools or special coding were required for this process, and the resources consumed by all of the transforming and loading were cumbersome and time consuming.

With the modern data integration approach, businesses can move to an ELT pattern. Because storage is cheap across cloud providers, the data is extracted and loaded first, allowing it to be transformed downstream. For example, with all of the structured and unstructured data loaded directly into a data lake, data scientists have access to everything and can analyze every bit of it. The data can then be transformed according to that analysis and the business rules as it moves downstream.
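
A minimal ELT sketch in Snowflake terms (the stage, table, and column names are hypothetical): raw files are loaded as-is, and the transformation happens later, inside the platform:

    -- Load first: land the raw files untouched, because storage is cheap.
    COPY INTO raw.orders_landing
        FROM @raw.order_stage
        FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);

    -- Transform later: apply business rules downstream, without re-extracting.
    CREATE OR REPLACE TABLE conformed.orders AS
    SELECT
        order_id,
        TRY_TO_DATE(order_date)      AS order_date,
        UPPER(TRIM(region))          AS region,
        order_amount::NUMBER(12, 2)  AS order_amount
    FROM raw.orders_landing
    WHERE order_id IS NOT NULL;      -- a simple quality rule, applied downstream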

Reusable Code

As the number of tools in the technology space grows, keeping reusability in mind is very important. It improves speed to market and shortens the time it takes to deploy code, send extracts, load data into different layers, and so on. For example, Strive partnered with a client who needed 150 different files sent to a vendor. Using our proprietary ELT Accelerator, Strive built reusable code and a control database that could be run a single time to produce all 150 files quickly and efficiently, saving our client time, money, and additional resources while increasing speed to market.
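
The ELT Accelerator itself is proprietary, but the general metadata-driven idea behind this kind of reusability can be sketched as follows (the control table, stage, and procedure below are all hypothetical): one control table describes every extract, and one procedure loops through it, so file number 151 becomes a new row rather than new code:

    -- A control table describes every extract once.
    CREATE TABLE IF NOT EXISTS etl.extract_control (
        extract_name STRING,   -- file name to produce for the vendor
        source_query STRING    -- SQL that generates that file's contents
    );

    -- One reusable procedure runs them all in a single execution.
    CREATE OR REPLACE PROCEDURE etl.run_extracts()
    RETURNS STRING
    LANGUAGE SQL
    AS
    $$
    DECLARE
        c CURSOR FOR SELECT extract_name, source_query FROM etl.extract_control;
    BEGIN
        FOR rec IN c DO
            -- Unload each extract to the vendor-facing stage as its own file.
            EXECUTE IMMEDIATE 'COPY INTO @etl.vendor_stage/' || rec.extract_name
                || ' FROM (' || rec.source_query || ') OVERWRITE = TRUE';
        END FOR;
        RETURN 'All extracts complete';
    END;
    $$;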

Excellent Commenting and Documentation

Writing detailed documentation is difficult but immensely important, especially at the beginning of a project. Making commenting and documentation foundational saves rework and helps colleagues understand the work in the long run. Each piece of code, each SQL statement, and every deployment should be documented and commented, making support, future debugging, and future code changes easier.
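
As an illustration of the level of commenting that pays off later (the object and descriptions are hypothetical), comments can live both in the script header and on the database objects themselves:

    -- =====================================================================
    -- Object  : conformed.orders
    -- Purpose : Cleansed order data for downstream reporting
    -- Source  : raw.orders_landing, loaded nightly from the data lake
    -- =====================================================================
    CREATE OR REPLACE TABLE conformed.orders (
        order_id     NUMBER        COMMENT 'Natural key from the source system',
        order_date   DATE          COMMENT 'Date the order was placed',
        order_amount NUMBER(12, 2) COMMENT 'Order total in USD'
    )
    COMMENT = 'Single source of truth for order-level reporting';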

Tool Agnostic Thinking

To understand a modern data architecture, it is important to think agnostically when deciding on tools. There are many cloud providers, such as AWS, Azure, and Google Cloud, and a growing number of cloud data platforms, such as Snowflake. When sifting through them, it's always important to take a step back and remember what is happening behind the scenes: we are moving data from one area to another, and if you can learn one tool, it's simple to learn another.

Minimal Approach

In the data landscape, there is no need to overcomplicate data pipelines. Complexity adds to the skill set a support team needs, creates data latency, and increases the number of places code must change. Keep it as simple as possible. For example, a data lake can be housed within an AWS S3 folder structure and moved into Snowflake, with all subsequent layers built within Snowflake itself. If developers keep ease of deployment and tool use in mind, everyone wins.
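
A sketch of that minimal pipeline (the bucket, stage, and credential values are placeholders): one external stage points Snowflake at the S3 folder structure, and one COPY command moves the data in, with no extra tools in between:

    -- The S3 folder structure is the data lake; Snowflake reads it directly.
    CREATE OR REPLACE STAGE raw.lake_stage
        URL = 's3://example-data-lake/sales/'
        CREDENTIALS = (AWS_KEY_ID = '...' AWS_SECRET_KEY = '...');

    -- One hop from the lake into Snowflake; every later layer lives in-platform.
    COPY INTO raw.sales_landing
        FROM @raw.lake_stage
        FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);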

Remember, it's important to understand the current and future state of modern data integration and architecture. Data lakes are needed for data scientists, conformed layers are needed for downstream consumption, and semantic layers are needed for reporting. As long as we treat data and solutions as scalable, centralized, and reusable, we are working toward a purpose that makes everyone's job easier.

 

Interested in Strive’s Data & Analytics Practice?

Here at Strive Consulting, our subject matter experts team up with you to understand your core business needs while taking a deeper dive into your organization's growth strategy. Whether you're interested in modern data integration or an overall data and analytics assessment, Strive Consulting is dedicated to being your partner, committed to your success. Learn more about our Data & Analytics practice here.

Author

Jeffrey Mahany

Senior Data Engineer

Chicago, Illinois

 

 
