Data & Analytics Archives - Strive Consulting, LLC

Exercising Data Governance Best Practices – How to Stay the Course

Have you ever planned to wake up early in the morning to work out, but instead chose to lie in bed and catch up on some sleep? This can happen even after you have committed—mentally, at least—to a new workout regimen.

That’s because the hard part isn’t resolving to do something new; it’s adjusting your daily habits and generating enough momentum to carry the changes forward. This requires discipline and drive.

The same challenges apply to data governance initiatives. If you have ever been part of a data governance program that faltered, stalled, or stopped completely in its tracks, you know what I’m talking about. Companies are accruing ever-increasing amounts of data and want to be able to transform all that information into insights, the same way you want to get in shape. The first step is data governance, but getting your organization to buy in to a new program conceptually is the easy part. Taking action and sticking to it can be much more challenging.

Indeed, many organizations believe that simply implementing technology—like a Master Data Management system—will improve the health of their data. But if you simply buy workout equipment, do you get healthier? Tools will help streamline your organizational processes and complement information governance and information management, but building and maintaining a culture that treats data as an asset to your organization is the key to ongoing success.

Below are some key factors to building good habits to generate momentum once your data governance program is underway:

1. Impart a sense of urgency for the program.

For every organization with a plan to manage its data assets, there needs to be a sense of urgency to keep the plan in place. The reasons are unique from organization to organization, but they might be driven by compliance, customer satisfaction, sales, revenues, or M&A. Regardless of the reason, it needs to resonate with senior leadership and ideally be tied to the company’s strategic goals in order to be most effective.

2. Communicate, communicate, communicate.

The cornerstone of a successful data governance program is a well-organized, cross-departmental communication plan. A solid plan helps remove silos and maintain cross-departmental support for the initiative. Seek out your champions throughout the organization and meet with key stakeholders regularly to document their pain points. It is important to get people engaged early to keep the excitement going.

3. Operationalize change within the organization.

Your delivery will need to be agile in nature because the plan you put in place will naturally evolve. The goal is to learn what works within your organization early on to ensure you deliver value quickly and the process is sustainable moving forward. Complete tasks iteratively and agree upon a small set of high-value data attributes to aid in validating your data governance process. In addition, manage your data elements to ensure their best quality.

4. Make the plan as RACI as possible.

Actively listen to your supporters and put together a plan that encompasses a RACI (Responsible, Accountable, Consulted & Informed) model so that everyone on the team knows their role across the process. This plan will keep your team focused and guide your initiatives moving forward. You’ll raise your odds of success by forming a strong governance organizational structure with roles and responsibilities in place (for data ownership, stewardship and data champions), along with approvals that complement your existing change management process.

5. Measure, Communicate, Repeat.

Keep in mind that “you can’t manage what you don’t measure.” You’ll need to face the facts and communicate your findings. It’s wise to document and implement KPIs (Key Performance Indicators) so that you can measure the progress of your initiative over time. Linking the KPIs to revenue or sales loss, for example, can be a strong indicator to help drive change. As you learn more about your data, it’s important to communicate what’s meaningful to your stakeholders and continue to move forward.

Similar to continuing on a workout regimen, data governance demands a discipline that takes time and patience to fine tune. This requires changing years of undisciplined behaviors regarding data within your organization, and the change will not happen overnight. Changing these behaviors is an ongoing process that needs to resonate throughout an organization’s culture in order for success to occur.

In addition, it’s important to keep things fresh. When working out, you need to rotate through different core muscle groups and vary the routine to keep things interesting and progressive. It’s the same with data governance initiatives. Don’t let people get bored with the same repetitive activities day in and day out. Try conducting data discovery sessions where team members present findings from an internal or external dataset that would be interesting to other team members. You can also share successes and learnings from past data-related projects to drive discussion. Another suggestion is to discuss future cross-departmental data projects (or “wish list” items) that can lead into great data roadmap discussions. The objective is to keep everyone engaged and finding value in meetings so that the team continues to show up and make progress.

Remember that data governance is a journey that requires commitment and hard work. As with exercise, just working out for a month is a great start, but it’s with continued dedication that you really start to notice the change. If you want to take your organization to the next level, you need to develop the discipline toward information management that your organization requires for long-term sustainable success. For those with little experience in implementing or maintaining a data governance plan, experienced consultants can be of great value.

Strive Can Help With Your Data Governance Needs! 

Here at Strive Consulting, our subject matter experts team up with you to understand your core business needs, while taking a deeper dive into your organization’s growth strategy. Whether you’re interested in modern data integration or an overall data and analytics assessment, Strive Consulting is dedicated to being your partner, committed to success. Learn more about our Data & Analytics practice HERE.


Categories: blog, Data, Data & Analytics, Snowflake, Thought Leadership

Have Your Data and Query It Too!

“Have your cake and eat it too.” How would it make sense to have cake and not be able to eat it? And yet, we have, for decades, had similar experiences with enterprise data warehouses. We have our data; we want to query it too!

Organizations spend so much time, effort, and resources building a single source of truth. Millions of dollars are spent on hardware and software, and then there is the cleansing, collating, aggregating, and applying of business rules to data. Yet when it comes time to query… we pull data out of the enterprise data warehouse and put it into data marts. There simply is never enough power to service everybody who wants to query the data.

With the Snowflake Data Cloud, companies of all sizes can store their data in one place – and every department, every team, every individual can query that data. No more need for the time, expense, effort, and delay to move data out of an enterprise data warehouse and into data marts.

The advent of the ‘data lake’ promised a place where all enterprise data could be stored. Structured, semi-structured, and unstructured data could be stored together, cost effectively. And yet, as so many soon found out, data ended up needing to be moved out to achieve the query performance desired. More data marts, more cost, and more time delay to get to business insight.

Snowflake solved this problem by separating data storage from compute. Departments and teams can each have their own virtual warehouse, a separate query compute engine that can be sized appropriately for each use case. These query engines do not interfere with each other. Your data science team can run massive and complex queries without impacting the accounting team’s dashboards.
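For example, each team might get its own warehouse sized for its workload. A minimal sketch in Snowflake SQL, assuming hypothetical warehouse names and sizes:

    -- Separate compute per team so workloads never contend for the same resources
    CREATE WAREHOUSE data_science_wh WITH WAREHOUSE_SIZE = 'XLARGE' AUTO_SUSPEND = 300 AUTO_RESUME = TRUE;
    CREATE WAREHOUSE accounting_wh   WITH WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60  AUTO_RESUME = TRUE;

Both warehouses read the same data; only the compute is separate, so one team’s heavy queries never slow another team’s dashboards.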

Snowflake does this because it was designed for the cloud from the ground up. A massively parallel processing database, Snowflake was built to use the cloud infrastructure and services of AWS, quickly followed by Azure and GCP. Organizations get all the scalability promised by “Hadoop-based Big Data” in an easy-to-use, ANSI-standard SQL data warehouse that delivers the 5 V’s of big data (Volume, Value, Variety, Velocity and Veracity). Not to mention, all of these benefits come with industry-leading cost and value propositions.

Speaking of Variety… Snowflake has broken out of the “data warehouse” box and has become ‘The Data Cloud’. All your data types: structured, semi-structured and now, unstructured.  All your workloads: Data Warehouse, Data Engineering, Data Science, Data Lake, Data Applications, and Data Marketplace. You have the scalability in data volume and in query compute engines across all types of data and use cases.

With the Snowflake Data Cloud, you truly can have all your data and query it too. Extracting business value for all departments and all employees along the way.

 

Want to learn more about the Snowflake Data Cloud? 

Strive Consulting is a business and technology consulting firm, and proud partner of Snowflake, having direct experience helping our clients understand and maximize the benefits the Snowflake Data Platform presents. Our team of experts can work hand-in-hand with you to determine if leveraging Snowflake is right for your organization. Check out Strive’s additional Snowflake thought leadership HERE.

About Snowflake

Snowflake delivers the Data Cloud – a global network where thousands of organizations mobilize data with near-unlimited scale, concurrency, and performance. Inside the Data Cloud, organizations unite their siloed data, easily discover and securely share governed data, and execute diverse analytic workloads. Join the Data Cloud at SNOWFLAKE.COM.


Categories: blog, Data, Data & Analytics, Snowflake, Thought Leadership

Why Choose Open-Source Technologies?

In 2022, almost every enterprise has some cloud footprint, especially around their data. These cloud platforms offer closed-source tools which, while offering many benefits, may not always be the best choice for some organizations. First and foremost, these proprietary services can be expensive. In addition to paying for the storage and compute needed to store and access data, you also end up paying for the software itself. You could also become locked into a multi-year contract, or you might find yourself locked into a cloud’s tech stack. Once that happens, it’s very difficult (and expensive) to migrate to a different technology or re-tool your tech stack. To put it simply, if you ever reach a roadblock your closed-source tool can’t solve, there may be no workarounds.

Since closed-source technologies can create a whole host of issues, open-source technologies may be the right choice. Open-source tech is not owned by any single vendor; rather, anyone can use, repackage, and distribute it. Several companies have monetized open-source technology by packaging and distributing it in innovative ways. Databricks, for example, built a platform on Apache Spark, a big-data processing framework. In addition to providing Spark as a managed service, Databricks offers many other features that organizations find valuable. However, a small organization might not have the capital or the use case that a managed service like Databricks aims to solve. Instead, you can deploy Apache Spark on your own server or a cloud compute instance and have total control. This is especially attractive when addressing security concerns: an organization can benefit from a tool like Spark without having to involve a third party and risk exposing data to it.

Another benefit is fine-tuning resource provisioning.

Because you’re deploying the code on your own server or compute instance, you can configure the specifications however you want. That way, you can avoid over-provisioning or under-provisioning. You can even manage scaling, failover, redundancy, security, and more. While many managed platforms offer auto-scaling and failover, they are rarely as granular as provisioning resources yourself.

Many proprietary tools, specifically ETL (Extract, Transform, Load) and data integration tools, are no-code, GUI-based solutions that require some prior experience to be implemented correctly. While the GUIs are intended to make it easier for analysts and less-technical people to create data solutions, more technical engineers can find them frustrating. Unfortunately, as the market becomes more inundated with new tools, it can be difficult to find proper training and resources. Even documentation can be iffy! Open-source technologies can be similarly peculiar, but it’s entirely possible to create an entire data stack – data engineering, modeling, analytics, and more – all using popular open-source tech. These tools will almost certainly lack a no-code GUI but are compatible with your favorite programming languages. Spark supports Scala, Python, Java, SQL, and R, so anyone who knows one of those languages can be effective using Spark.

But how does this work with cloud environments?

You can choose how much of the open-source stack you want to incorporate. A fully open-source stack would simply be running all your open-source data components on cloud compute instances: database, data lake, ETL, data warehouse, and analytics all on virtual machine(s). However, that’s quite a bit of infrastructure to set up, so it may make sense to offload some parts to cloud-native technologies. Instead of creating and maintaining your own data lake, it would make sense to use AWS S3, Azure Data Lake Storage Gen2, or Google Cloud Storage. Instead of managing a compute instance for a database, it would make sense to use AWS RDS, Azure SQL DB, or Google Cloud SQL with an open-source flavor of database like MySQL or MariaDB. Instead of managing a Spark cluster, it might make sense to let the cloud manage the scaling, software patching, and other maintenance, and use AWS EMR, Azure HDInsight, or Google Dataproc. You could also abandon the idea of using compute instances entirely and architect a solution using a cloud’s managed open-source offerings: AWS EMR, AWS MWAA, AWS RDS, Azure Database, Azure HDInsight, GCP’s Dataproc and Cloud Composer, and those are just the data-specific services. As mentioned before, these native services bear some responsibility for maintaining the compute/storage, software versions, runtimes, and failover. As a result, a managed offering will be more expensive than doing it yourself, but you’re still not paying software licensing costs.

In the end, there’s a tradeoff.

There’s a tradeoff between total control on one hand and ease of use, maintenance, and cost optimization on the other, but there is a myriad of options for building an open-source data stack. You have the flexibility to host it on-premises or in the cloud of your choice. Most importantly, you can reduce spend significantly by avoiding software licensing costs.

 

Interested in Learning More About Open-Source Technologies? 

Here at Strive Consulting, our subject matter experts team up with you to understand your core business needs, while taking a deeper dive into your organization’s growth strategy. Whether you’re interested in modern data integration or an overall data and analytics assessment, Strive Consulting is dedicated to being your partner, committed to success. Learn more about our Data & Analytics practice HERE.


Categories: blog, Data, Data & Analytics, Snowflake, Thought Leadership

5 Key Concepts in Design Thinking for Visual Analytics

Design Thinking for visual analytics is a proven framework that puts the needs of end users at the forefront of development, enabling organizations to fail-fast, iterate, and design analytical solutions that can scale with excellence and sustain with improvement. Design thinking enables organizations to introduce agile ways of working, data fluency, value creation, and storytelling with data.

Some key concepts involved in Design Thinking:

  • Visualization & KPIs
  • Personas
  • User Journey Mappings
  • Conceptual Data Modeling
  • Wire-framing/Prototyping

Visualization & KPIs  

Visual analytics are essential for enabling users to take action and make data-driven decisions through insights and storytelling. Although visualization is at the forefront of most reporting products, there is a broad spectrum of needs and analytics use cases across any business, all of which are important. Visualizations are great, but a visualization is only effective if there is clear alignment on KPIs and how they can be leveraged. Developing KPIs is both an art and a science. The objective is to identify measures that can meaningfully communicate accomplishment of key goals and drive desired behaviors. Every KPI should relate to a specific business outcome with a performance measure. Unfortunately, KPIs are often confused with business metrics. Although often used in the same spirit, KPIs need to be defined according to critical or core business objectives, while metrics can provide insights that may feed into KPIs. KPIs and metrics can be defined at the organizational level and trickle down to other functional areas, such as Sales. Therefore, defining personas is a good exercise to understand the different needs of users across an organization.


Persona Development 

What is a Persona?

A persona is a fictional character, rooted in research, that represents the needs and interests of your customer. It is created to represent a segment of users that might leverage a product in a similar way. Personas facilitate visualization of the user and create empathy with users throughout the design process.

Why do we want Personas?

Developing personas helps gain an understanding of how different users within an organization leverage analytics. This is integral in designing user-centric applications that are organized by users’ needs, with relevant content combined from different sources.

What is a good persona?

A good persona accurately represents the experience of a single user or a group of users, taking into consideration their needs in order to formulate requirements, the design context, and the complexity of situational behaviors.

By developing personas and understanding the needs of users, we can leverage different approaches to design analytics to guide the user through their desired experience. Whether it’s creating actionable KPIs to measure performance/progress, or enabling the user to self-serve through a guided analytics experience, understanding their analytical needs will help drive the design of the solution.

User Journey 

One of the underlying principles of design thinking is putting the user’s needs first when designing and developing applications. Building empathy for the user by mapping moments of frustration and delight throughout their analytical journey will help formulate the best experience possible.

Conceptual Data Model  

A conceptual data model is a visual representation of concepts and rules that convey the makeup of a system or organization. Conceptual data models are key for defining, organizing, and prioritizing concepts and informational needs of the business.

Wireframing  

Humans are innately visual creatures and often struggle to articulate their needs. Wireframes and prototypes are visual representations that define the reporting and analytics experience and visually depict the requirements or needs of a user in preparation for development.

What is it good for?

  • Makes Things Tangible: Helps with visualizing the concept and engaging stakeholders around a product vision
  • Enables Collaboration: Customer/User feedback can be taken into consideration before development begins.
  • Saves Time: Increases the speed of shared understanding and provides guidance to the development team
  • Supports User Testing: Supports usability test iterations to get insight from actual users of the product.

What is the process of wireframing?

A good iterative design process increases fidelity at each step to get closer to the final product that satisfies the needs of a user.

Helpful Tips for wireframing:

  • You don’t have to be an artist.
  • Keep it simple – Sketches, wireframes are meant to convey information.
  • Short, Sharp Annotations – Drawings and sketches help articulate ideas, but annotations, explanations, and callouts are necessary to explain functionality and concepts.
  • Encourage Feedback – Feedback is necessary to iterate, refine, and improve the design and to engage stakeholders around the product vision.

Summary 

Design thinking is a framework that can be applied to almost every user-centric application. The biggest value an organization can realize by instilling design thinking principles is understanding the needs and empathy of users as they begin to adopt analytics to enable a data-driven culture. If you’re curious how design thinking can be applied to your organization’s visual analytics and products, Strive has proven, strategic offerings to help you achieve your desired goals.

Interested in Design Thinking? What about Data & Analytics?

You’re in luck! Strive Consulting helps companies compete in a data-driven world. We turn information into insight through powerful analytics and data engineering, and our Data & Analytics specialists create new opportunities for our clients out of untapped information tucked away across your business. Whether it’s capturing more market share or identifying unmet customer needs, effectively mining and utilizing the data available to you will help you make faster, more informed decisions and keep pace with today’s rapidly changing business world. Click here to learn more about our D&A practice.


Categories: blog, Data, Data & Analytics, Thought Leadership

How Snowflake Saved My Life

Let me tell you a story of how the Snowflake Data Cloud saved my life. I know it sounds dramatic, but just hear me out.

A few years ago, I worked with a multi-billion-dollar wholesale distributor that had never implemented a data warehouse. Their main goal? Consolidate all data into one location and enable KPI roll-ups from across their disparate systems. However, in this case, they did not want to invest in additional licensing. So, my team set about building a traditional data warehouse leveraging their current platform, SQL Server. Initially, it was a successful four-layer architecture with Staging, Consolidation, Dimensional Model, and Tabular Cubes, with the end visualization solution being Power BI… but within a few months, issues began to surface.

The number of sources feeding into this platform had increased dramatically, and this increase started to impact load times. Initially, the batch load processes were running between two and three hours, but over time they increased to taking 5, 6, sometimes 7 hours to run! We needed a long-term solution, but in the short term we had to keep the platform running to deliver data to the organization.

What we were experiencing were challenges with Constraints, Indexing, Locks, Fragmentation, etc… To mitigate these issues, I personally took the step of waking up every morning at 3:00AM to log in and ensure certain process milestones successfully completed in a timely manner. If those milestones were not achieved, the batch process would either stall, fail, or run excessively long and the last thing I wanted was to explain to the business why they were not going to have data until 9:00, 10:00, 11:00AM. After a couple weeks of doing this, it became apparent – we needed a better solution, and fast!

In the past, I had some experience with Big Data platforms, but decided to research options outside of established technologies such as Cloudera or Hadoop-based solutions, and instead looked into something new – Snowflake. Snowflake is a cloud data platform where organizations can access, share, and maintain their data at scale, so I thought, why not? Let’s give it a shot!

We set up a proof of concept, initially trying to mimic the 4-layer architecture we had set up in SQL Server. After seeing limited success, as well as being laughed at for even trying it, we took a step back, reevaluated our approach, and flipped the architecture from ‘Extract Transform Load’ to ‘Extract Load Transform’. And… Eureka! With this change, we were able to reduce overnight batch runtimes from the 5-, 6-, 7-hour SQL Server runs to less than 20 minutes. In fact, our average runtimes for our load processes were around 17 minutes, but now I’m just showing off.

Not only did this have an incredible effect on our ability to deliver data in a timely manner, but it also enabled an increase in the frequency in which we processed data. You see, with the SQL Server we were never able to update data more than once a day, but with Snowflake, we could run the batch process every 20 minutes and quickly deliver requested changes to the models, measures, and dimensions.

The implementation process went from taking weeks to taking days, or even hours, resulting in some very happy stakeholders. With these results, coupled with the fact that I no longer had to wake up at 3:00AM to verify successful batch processes…Snowflake truly saved my life.

Want to learn more about the Snowflake Data Cloud? 

Strive Consulting is a business and technology consulting firm, and proud partner of Snowflake, having direct experience helping our clients understand and maximize the benefits the Snowflake Data Platform presents. Our team of experts can work hand-in-hand with you to determine if leveraging Snowflake is right for your organization. Check out Strive’s additional Snowflake thought leadership here.

About Snowflake

Snowflake delivers the Data Cloud – a global network where thousands of organizations mobilize data with near-unlimited scale, concurrency, and performance. Inside the Data Cloud, organizations unite their siloed data, easily discover and securely share governed data, and execute diverse analytic workloads. Join the Data Cloud at Snowflake.com.


Categories: blog, Data, Data & Analytics, Snowflake, Thought Leadership

Snowflake: A Data Engineering Tool

Snowflake began as a best-in-class cloud data warehouse. It set out to simply store and query data while providing the near-infinite scalability of the cloud. Right out of the gate it offered incredible flexibility with its decoupled storage and compute. You could model your data in a Kimball dimensional model, a Data Vault, or even a loosely structured data lake, all in the same instance. It also handled table clustering, auto-scaling, security, caching, and many other features most data engineers and analysts don’t want to worry about. However, it fell short when it came to data extraction and transformation. Other platforms like Redshift, BigQuery, and Synapse integrated so well with their respective cloud data stacks, making data engineering and processing as simple as it gets. Snowflake, on the other hand, offers a few options for loading and transforming data.

Option 1: Snowpipe and Stages

Snowflake has a proprietary continuous data loading solution called Snowpipe. It uses named Snowflake objects called ‘pipes’, which contain a COPY statement to take data from a stage and load it into a table. A stage is a location where data files are stored in the cloud. These can be internal (within Snowflake itself) or external (on a cloud platform like AWS S3 or Azure Data Lake Storage). Snowpipe can be called by a public REST API endpoint or by using a cloud event messaging service like AWS SNS.

For example:

  1. Log files are uploaded to an AWS S3 bucket
  2. AWS SNS or S3 Event Notifications notify Snowpipe new data is available
  3. Snowpipe grabs the new data and loads it into the target table
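A minimal sketch of this flow in Snowflake SQL, assuming hypothetical stage, pipe, and table names and an existing storage integration:

    -- External stage pointing at the S3 bucket that receives the log files
    CREATE OR REPLACE STAGE raw.log_stage
      URL = 's3://my-company-logs/app/'
      STORAGE_INTEGRATION = s3_logs_int
      FILE_FORMAT = (TYPE = JSON);

    -- Pipe that copies new files into the target table as S3 event notifications arrive
    CREATE OR REPLACE PIPE raw.log_pipe
      AUTO_INGEST = TRUE
    AS
      COPY INTO raw.app_logs      -- raw.app_logs is assumed to have a single VARIANT column
      FROM @raw.log_stage;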

There are numerous benefits to this approach:

  • It’s serverless, so you don’t need to configure or monitor resources.
  • It’s continuous, so it’s a good approach for constantly flowing data or small incremental loads.
  • You’re only billed for the compute Snowflake uses to move the data, as opposed to running a dedicated virtual warehouse.
  • If you already have a cloud data lake, this is a simple way to migrate data into Snowflake using native technology.

However, Snowpipe may not be appropriate for the following reasons:

  • Snowpipe alone doesn’t solve Change-Data-Capture (CDC), which requires the use of streams
  • Snowpipe requires additional cloud resources:
    • Cloud data store
    • Event messaging service
    • IAM
    • Snowflake file format and copy options
  • Integration between your Snowflake instance and your cloud instance is required.
  • File sizing, queuing, and load latency need to be considered and managed.

Option 2: Third-Party ETL Tools

From the beginning, Snowflake created a robust partner network and integrations with some of the most popular ETL tools in the market. These ETL tools have built native integrations with Snowflake and simplified the process of extracting, transforming, and loading data from various sources into Snowflake. While each ETL tool is different, they should all be able to offer the following:

  • One single tool for extraction, transformation, and loading of data
  • Native CDC capabilities
  • Orchestration and automation of ETL/ELT processes
  • Less setup than using Snowpipe and external stages
  • No-code solutions for data transformations

There are many use cases where a third-party ETL tool is the right choice and, when implemented successfully, it can save time and money for data engineering teams. There are also reasons not to use third-party tools:

  • Price for proprietary software can be very expensive
  • Some tools are not fully managed, meaning your team will have to set up the cloud infrastructure to serve the ETL tool
  • Potentially less scalability than Snowflake
  • Difficult to switch from one tool to another
  • Each tool requires additional skills in order to implement effectively
  • Continuous, near-real-time, and real-time loading is immature or nonexistent

Other Options

There are other options for loading data into Snowflake:

  1. BULK loading via COPY command
  2. Data loading wizard in the Snowflake UI

The data loading wizard allows you to load data directly into tables with no actual code. It is intended for smaller datasets and rare circumstances, and should not be used for regular data loading. BULK loading via the COPY command allows you to manually load data by staging it and then copying it to a table; this requires a running virtual warehouse. It too should not be used for regularly scheduled loads or any kind of volatile or business-critical data.
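For completeness, a minimal sketch of a one-off bulk load, assuming hypothetical stage and table names (the PUT step runs from the SnowSQL client, and a virtual warehouse must be running for the COPY):

    -- Internal stage for ad hoc file uploads
    CREATE OR REPLACE STAGE adhoc_stage FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);

    -- Upload a local file to the stage (from SnowSQL), then copy it into the target table
    PUT file:///tmp/customers.csv @adhoc_stage;
    COPY INTO analytics.customers FROM @adhoc_stage;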

The Current State of Data Engineering in Snowflake

While Snowflake started off as a cloud storage and querying platform, competition has forced it to create more data engineering capabilities. Platforms like Databricks are starting to encroach on data storage, and Snowflake has responded by adding new features:

Stored Procedures

Stored procedures allow you to execute procedural logic. Procedures can take parameters and execute tasks like DML actions. As of October 2021, stored procedures are only available in JavaScript.
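A minimal sketch of a JavaScript stored procedure, assuming hypothetical staging and ODS table names:

    CREATE OR REPLACE PROCEDURE load_ods_customers()
      RETURNS STRING
      LANGUAGE JAVASCRIPT
    AS
    $$
      // Run a single DML statement; a real procedure would add error handling and logging
      var stmt = snowflake.createStatement({
        sqlText: "INSERT INTO ods.customers SELECT * FROM staging.customers"
      });
      stmt.execute();
      return "staging.customers loaded into ods.customers";
    $$;

    CALL load_ods_customers();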

Tasks

Tasks are named objects that execute a SQL statement at a scheduled time. The emphasis is on the scheduled aspect, as this is the first semblance of orchestration in Snowflake. A task will use a virtual warehouse to run a SQL statement, such as calling a stored procedure. In other words, you can combine tasks and stored procedures to create data pipelines!

Because tasks are named database objects, you can use them in third party ETL solutions as you would a table or view. Certain limitations may apply depending on the tool.
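A minimal sketch of a scheduled task, assuming a hypothetical warehouse name and the (equally hypothetical) stored procedure above:

    CREATE OR REPLACE TASK load_ods_customers_task
      WAREHOUSE = etl_wh                       -- compute used each time the task runs
      SCHEDULE  = 'USING CRON 0 * * * * UTC'   -- hourly, at the top of the hour
    AS
      CALL load_ods_customers();

    ALTER TASK load_ods_customers_task RESUME; -- tasks are created in a suspended state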

Streams

These are the last piece of the puzzle for creating a data engineering environment. Streams are named objects created off an existing table. A stream will have the same columns as the table, but with three additional columns: METADATA$ACTION, METADATA$ISUPDATE, and METADATA$ROW_ID.

  • METADATA$ACTION specifies if the row is an insert or delete
  • METADATA$ISUPDATE specifies if the insert or delete is an update
  • METADATA$ROW_ID provides a unique and immutable ID for each row in the table.

With these three columns, you can implement Change-Data-Capture on tables. This article shows how to build a relatively simple Type II slowly changing dimension and all the upsert logic associated with it.

Streams can provide an easier alternative to stored procedures for processing Change-Data-Capture. In a traditional Kimball architecture, this would be best used in the ODS layer and for Type II slowly changing dimensions in the presentation layer. Instead of having to program a stored procedure in Javascript, you can combine a stream to provide the table history and a view to handle the insert/update/delete logic. However, a stream only applies to a single table. Streams are not appropriate for ETL processes that require complex joins and aggregations like you would typically use in a staging layer for fact and dimension tables.

Streams and stored procedures still need to be used together for CDC, but the process is simplified. Instead of comparing the ODS table with the new raw data, the stream will determine the table changes for you. The stream and corresponding view that manages the upsert logic are used in a simple stored procedure to expire changed and deleted records, insert new and current changed records, and append metadata to all records.
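A minimal sketch of that pattern, assuming hypothetical raw and ODS tables keyed on customer_id (the full expire-and-insert logic for a Type II dimension is omitted for brevity):

    -- The stream records inserts, updates, and deletes made to the raw table
    CREATE OR REPLACE STREAM raw.customers_stream ON TABLE raw.customers;

    -- Consuming the stream in DML (here, a MERGE) applies the changes and advances its offset
    MERGE INTO ods.customers AS tgt
    USING (
        SELECT * FROM raw.customers_stream
        WHERE METADATA$ACTION = 'INSERT'       -- new rows, plus the new side of updates
    ) AS src
    ON tgt.customer_id = src.customer_id
    WHEN MATCHED THEN UPDATE SET
        tgt.customer_name = src.customer_name,
        tgt.updated_at    = CURRENT_TIMESTAMP()
    WHEN NOT MATCHED THEN INSERT (customer_id, customer_name, updated_at)
        VALUES (src.customer_id, src.customer_name, CURRENT_TIMESTAMP());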

Bringing it all together

With these native Snowflake objects, data processing no longer requires third-party tools. Achieving a robust persistent staging layer with auditability is now possible with streams, and when they are chained with tasks and called by stored procedures, the entire process can be automated. Pipes can continuously bring new data into Snowflake, so your data will never be stale. The data orchestration and processing components that previously lived in a third-party tool now reside entirely within Snowflake, and the code is entirely SQL.

Interested in Learning More about Snowflake as a Data Engineering Tool?

Strive Consulting is a business and technology consulting firm, and proud partner of Snowflake, having direct experience with query usage and helping our clients understand and maximize the benefits the Snowflake Data Platform presents. Our team of experts can work hand-in-hand with you to determine if leveraging Snowflake is right for your organization. Check out Strive’s additional Snowflake thought leadership here.

About Snowflake

Snowflake delivers the Data Cloud – a global network where thousands of organizations mobilize data with near-unlimited scale, concurrency, and performance. Inside the Data Cloud, organizations unite their siloed data, easily discover and securely share governed data, and execute diverse analytic workloads. Join the Data Cloud at Snowflake.com.


Categories: blog, Data, Data & Analytics, Snowflake, Thought Leadership

3 Steps to Building Rapport with Clients in a Hybrid Work Environment

Recently, the workforce has undergone a series of transitions. COVID-19 forced companies to quickly adopt a digital transformation within their organizations and workforce. Even in industries that formerly prohibited working remotely, working from the comfort of one’s home seemingly became the norm overnight. With the release of the COVID-19 vaccine, many companies adopted a hybrid working solution, allowing employees to work from home while also choosing an in-office environment. In this new day and age, companies have a host of difficult challenges to overcome, with relationship building at the core of them all. Healthy relationships sit at the center of everything we do, so it left me wondering… in this dynamic environment, how do we continue to connect and build relationships with our clients? Well, you’re in luck! Check out these 3 ways you can build rapport with your clients virtually and in person.

Why Is It Important to Build Rapport?

Rapport is a relationship in which the people or groups involved are “in sync” with each other, understand each other’s feelings or ideas, and communicate smoothly. In other words, when people or teams get to know and like one another, they become friendly and open, fostering a more collaborative work experience.

How Does One Build Better Rapport?

Rapport building takes time, commitment, and dedication. The most successful way to build rapport is to start by focusing on 3 fundamental areas: active listening, intentional relationship building, and consistent delivery. These are the things that people talk about the most when describing a great experience with Strive and their own internal organizational needs.

1. Listen to your clients and stakeholders

One of the most valuable and underrated methods to building rapport and relationships with a client is listening. It is easier said than done, but with practice can be one of the most successful tools in your toolbox. During client discussions and one-on-ones, ensure that you are practicing the 3 components of active listening (comprehend, retain, respond). In a remote environment, we are unable to rely on body language, so try focusing on not only what the client is saying, but how they are saying it. This should give us an understanding of what they are trying to convey. Retain the key points and the emotions the client is trying to express. Reiterating what they stated in a clear and concise manner, while avoiding overuse of jargon is key. Listening to our clients ensures that they feel understood and heard.

2. Create space for relationships

In this hybrid and dynamic work environment, it is easier than ever to lose sight of the relationship building that would naturally occur sitting in a conference room or chatting at a coffee machine. Now, to build relationships, we need to be intentional. Most experts agree that 70 to 93 percent of all communication is nonverbal. Luckily, with the shift to a more hybrid environment, we can now safely meet in person, as well as utilize other digital mediums. Maximize this opportunity to host a happy hour, lunch, or quick coffee in a responsible and safe manner, such as on an open restaurant patio. The change of scenery can foster a change in conversation and create a shared experience that will translate into smoother communication in the digital space. Alternatively, set aside time at the beginning of a meeting or throughout your day to connect with your clients or peers. This can be a quick “ping” or “slack” message asking them about their weekend, seeing if they need help on a task, or talking about a favorite sports team. We can, and should, utilize all these tools to support rapport building. A client who feels a deeper connection to you is likely to be more collaborative.

3. Deliver results

While this may seem straightforward, it tends to be overlooked. The ability to deliver impressive results is often what separates successful consulting companies from the others. Delivering results is, after all, what we are hired to do. If you are successful in 1 and 2 above, delivery should be straightforward. You will have a list of key objectives, pain points, and deliverables the client not only expects, but that you can build on. Conversely, listening and relationship building become moot points if you or your team are not delivering or following through on commitments, contractual or otherwise. Continual delivery and follow-through will ensure a client is satisfied.

Although the future of traditional small and large business models is still unknown, understanding the fundamentals in successful relationship building is key in making sure we’re prepared for whatever life throws at us. Active listening and creating the space needed to foster genuine relationships, all while continuing to deliver your best results, will set you and your organization up to be the best of the very best.

Interested in learning more about how Strive puts its clients first?

Here at Strive Consulting, we foster an active, innovative culture, providing the coaching, mentoring, and support our employees and clients need to work at the top of their game and succeed, personally and professionally. Understanding that the corporate world has changed and been challenging on all parties involved in the workplace, Strive is here to provide guidance and assist in any way we can. Strive Consulting is dedicated to being your partner, committed to success.

Categories: blog, Data, Data & Analytics, Snowflake, Thought Leadership

Understanding Modern Data Integration Best Practices

Access to data and real-time analytics is vital to any business. There are important decisions to be made, process improvements to be achieved, and supply chain notifications needed for immediate business choices, as well as other insights and information used to create or drive data-led decisions. With an increasing number of tools and cloud-based solutions available, the data landscape is always evolving. Understanding best practices when moving to a modern data architecture can help organizations succeed. Follow along and learn how Strive sets up our clients with the tools needed to bring business value every time.

Centralize Data

Data should be centralized, creating a single source of truth. It’s easy to say and a lot more complicated to do, depending on current source systems, but data silos can create multiple sources of data. As you can imagine, this complicates things: two individuals can view two different data sets with underlying differences when they should be seeing the same results. Creating a single source of truth delivers reliable data, trusted data sources, and consistent results.

Scalability

One of the wonders of modern cloud tools is access to a massive number of resources with no increase in on-premises hardware. Using cloud tools, elasticity and scalability can be essential to making all data accessible. Coinciding with the data lake concept, all data can be landed at low storage cost. If more data needs to move downstream into the different data layers, resources can be added and scaled accordingly. Additional servers can be added for more power and clusters can be added for more concurrency. In a modern data architecture, scalability should always be top of mind.

Extract, Load, Transform (ELT) vs. Extract, Transform, Load (ETL)

Traditionally, resources needed to be planned out on-premises and ahead of time… plus, databases had storage limits. This meant developers had to extract data from their sources, transform only what was needed, and load the data into the target. Tools or special coding were needed for this process, and the resources utilized for all of the transforming and loading of data were cumbersome and time consuming.

With the modern data integration approach, businesses can take an ELT approach. The data is extracted and loaded first, thanks to low storage costs amongst cloud providers, allowing data to be transformed downstream. For example, with all of the unstructured and structured data loaded directly into a data lake, data scientists have access to everything and the ability to analyze every single bit. The data can then be transformed, based on the analysis and business rules, as it moves downstream.
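A minimal ELT sketch in Snowflake SQL, assuming a hypothetical raw table with a single VARIANT column and a stage that already points at the source files:

    -- Extract and load: land the source files as-is in the raw layer
    COPY INTO raw.orders (raw_json)
      FROM @raw.orders_stage
      FILE_FORMAT = (TYPE = JSON);

    -- Transform downstream, entirely inside the platform
    CREATE OR REPLACE TABLE conformed.orders AS
    SELECT
        raw_json:order_id::NUMBER            AS order_id,
        raw_json:customer_id::NUMBER         AS customer_id,
        raw_json:order_total::NUMBER(12,2)   AS order_total,
        raw_json:created_at::TIMESTAMP_NTZ   AS created_at
    FROM raw.orders;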

Re-usable Coding

As the number of tools in the technology space grows and grows, keeping re-usability in mind is very important. This improves speed to market and can reduce the time it takes to deploy code, send extracts, load data into different layers, etc. For example, Strive partnered with one of our clients who needed 150 different files sent to a vendor. Using our proprietary ELT Accelerator, Strive developed reusable code and a database that could be run once, loading all 150 files quickly and efficiently, saving our client time, money, and additional resources, while increasing speed to market.

Excellent Commenting and Documentation

Writing detailed documentation is difficult, but immensely important, especially at the beginning of a project. Making commenting and documentation foundational saves re-work and aids colleagues’ understanding in the long run. Each piece of code, each SQL statement, and every deployment should have documentation and commenting. This makes support, future debugging, and future code changes easier.

Tool Agnostic Thinking

In order to understand a modern data architecture, it is important to think agnostically when deciding on tools. There are many cloud providers, from AWS and Azure to Google Cloud, and a growing number of cloud data platforms, such as Snowflake. When sifting through them, it’s always important to take a step back and remember what is happening behind the scenes: we are moving data from one area to another, and if you can learn one tool, it’s simple to learn another.

Minimal Approach

In the data landscape, there is no need to over-complicate data pipelines. Complexity adds to the skillset needed for a support team, creates data latency, and can increase the areas needing code changes. Let’s keep it as simple as possible. For example, a data lake can be housed within an AWS S3 folder structure and moved to Snowflake, and all layers can then be built within Snowflake itself. If ease of deployment and tool use stay front of mind, everyone wins.

Remember, it’s important to understand the current and future state of modern data integration and architecture. Data lakes are needed for data scientists, conformed layers are needed for downstream consumption, and semantic layers are needed for reporting purposes. As long as we all look at data and solutions as scalable, centralized, and re-usable, we are working towards a purpose that makes everyone’s job easier.

Interested in Strive’s Data & Analytics Practice?

Here at Strive Consulting, our subject matter experts team up with you to understand your core business needs, while taking a deeper dive into your organization’s growth strategy. Whether you’re interested in modern data integration or an overall data and analytics assessment, Strive Consulting is dedicated to being your partner, committed to success. Learn more about our Data & Analytics practice here.

Categories: blog, Data, Data & Analytics, Snowflake, Thought Leadership

Navigating the Storm When You’re Drowning in Data

The rise in demand for accurate, verifiable, and up-to-the-second data has put a massive amount of pressure on companies’ existing data warehouse architecture. Many companies are struggling with large sets of data, what to do with it, and how to make sense of it all. They built out large, robust data lakes, but now the data sets are too overwhelming for IT teams to handle. One could even say they’re drowning in data!

Luckily, our team here at Strive, in partnership with Snowflake, the data cloud platform, is capable of being your port in the storm, bringing a much-needed life preserver to organizations along with the expertise to help right-size data elements and simplify data challenges. Interested? Let’s walk through a recent example of doing just that.

This year, a Client partnered with Strive to help assess their current data warehouse initiative. Our Client had multiple source systems (15+) and wanted a single source of truth for reporting. Drowning in manual, highly cumbersome tasks and long hours for their employees, the Client needed a more efficient process for data extraction, transformation, loading, and sharing.

Enter Strive.

Our approach was simple – assist with the building of a strategic vision of future business processes, while implementing a proactive, rather than reactive, approach to technology adoption. Strive introduced the Client to our trusted partner Snowflake as the data platform. Snowflake enables data storage, processing, and analytic solutions that are faster, easier to use, and far more flexible than traditional offerings. Additionally, the sharing functionality makes it easy for organizations to quickly share governed and secure data in real time. 

Knowing that executives and senior leadership wanted “better” reporting accuracy, but development staff wanted a more efficient way to accomplish these tasks, Strive had their work cut out for them. Our Data & Analytics team turned to the implementation of Strive’s proprietary ELT Accelerator to help build a dedicated reporting environment using Snowflake. We analyzed the Client’s current processes, conducted technical interview meetings, and identified opportunities to achieve the desired future state and enable a modern data integration platform capable of high-volume data transfer.

Using the ELT Accelerator, Strive was able to:

  • Set up an Azure Database for ELT Accelerator source tables and stored procedures
  • Customize Azure Data Factory Settings to support new environments
  • Create Scheduled pipeline for recurring data extraction to the Snowflake data lake

The Extract, Load, and Transform (ELT) process leverages built-in distributed query processing capabilities and eliminates the resources needed for data transformation prior to loading, saving the Client headaches and unnecessary spending on compute resources. Once our team fully implemented our ELT tool, we were able to help our Client load millions of rows of data, drive additional insights, and ultimately enable the business to make better decisions.

So again, we ask you – is your organization drowning in data? By partnering with our team of trusted advisors, Strive can help you navigate the rough waters of any large data lake and calm the storm.

Interested in Strive’s ELT Accelerator?

Strive Consulting is a business and technology consulting firm, and proud partner of Snowflake, having direct experience helping our clients understand and maximize the benefits the Snowflake Data Platform presents. Our team of experts can work hand-in-hand with you to determine if leveraging Snowflake is right for your organization. Check out Strive’s additional Snowflake thought leadership here.

About Snowflake

Snowflake delivers the Data Cloud – a global network where thousands of organizations mobilize data with near-unlimited scale, concurrency, and performance. Inside the Data Cloud, organizations unite their siloed data, easily discover and securely share governed data, and execute diverse analytic workloads. Join the Data Cloud at Snowflake.com.

Categories: blog, Data, Data & Analytics, Snowflake, Thought Leadership

Reduce Costs, Improve Query Performance, and Optimize Warehouse Usage with Snowflake

As organizations expand their Snowflake footprint, there comes a time to pose questions like ‘Are there opportunities to save on costs?’ or ‘Is our environment running efficiently?’ Fortunately, this is where Snowflake shines, with several levers to pull to help optimize cost and performance. For those who are not familiar with Snowflake’s billing structure, Snowflake bills for compute and storage independently. Follow along to learn about ways to tune data environments to reduce cost and improve query and general performance when working within the Snowflake Data Platform.

Putting it simply, organizations can tune environments in a variety of ways. Two powerful methods that can greatly impact cost and performance are scaling up and scaling out.

SCALING UP:

What is scaling up? In an ideal world, queries (requests for data from one or more database tables) return in less than a second. In the real world, query response times can vary from less than a second to hours, even days, based on a number of factors. Warehouse size plays a key role in the time it takes to process a query: scaling up means giving a query more processing power, or horsepower. Available warehouse sizes follow t-shirt sizing (XS, S, M, L, XL, etc.).

When is it best to scale up? Since scaling up increases processing power, it is best used when organizations want to improve the performance of complex queries. It’s worth mentioning that ‘larger’ does not always mean ‘faster’. For basic queries that are already executing quickly, you may not see performance improvements after resizing.

What is considered a complex query? The more tables, columns, and/or conditional logic specified in the query, the more complex it is. An example of a complex query is one that joins a frequently used fact table with 1 billion records and 10 years of history to 5 dimensions. Warehouse size directly affects the time and effort needed to process such a query: on an extra-small warehouse the query might return in 20+ minutes; on a small warehouse, in 12 minutes; on a medium warehouse, in as little as 5 minutes, and so on… you get the drift.
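A minimal sketch of scaling up around a heavy workload, assuming a hypothetical warehouse name (per-second billing means you pay for the larger size only while it runs):

    ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'LARGE';
    -- ...run the complex queries...
    ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'SMALL';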

How does this impact cost and benefit organizations at the enterprise level? One of the benefits most loved about Snowflake is the per-second billing model, meaning you can run larger warehouses and simply suspend the warehouse when not in use. Snowflake makes this easy with its auto-suspend capability. When the auto-suspend feature is disabled, costs can run high. It is recommended businesses set auto-suspend to a low value (e.g., 5 or 10 minutes or less).

Auto-suspend can be configured in the warehouse settings when a warehouse is created or edited, or directly in SQL.
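A minimal sketch of setting it in SQL, assuming a hypothetical warehouse name:

    -- Suspend after 5 minutes of inactivity and wake automatically on the next query
    ALTER WAREHOUSE reporting_wh SET
      AUTO_SUSPEND = 300      -- seconds
      AUTO_RESUME  = TRUE;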

One of the many services Strive provides is a Health Check, where one of our highly experienced consultants partners with a client to review this and a variety of other configurations in their environment, identify bottlenecks and issues, and ensure the organization is maximizing all the features Snowflake has to offer.

“2000 Simultaneous Queries against a 40 billion record table all return in under 10 seconds.” – Larry Querbach: Enterprise Data Architect, devon

SCALING OUT:

Scaling out is where warehouses are added to create a multi-cluster warehouse (requires a Snowflake Enterprise Edition or higher).

When is it best to scale out? Concurrency, concurrency, concurrency. Multi-cluster warehouses are designed specifically for handling queuing and performance issues related to large numbers of concurrent users and/or queries. It’s helpful to remember that the difference between scaling up and scaling out is that scaling out is best for a higher number of concurrent queries. When more queries are submitted than can be processed, queries accumulate in the queue and wait to be run. An example of where scaling out can add value can be found here: picture the load on a warehouse from 1 PM to 5 PM, where some queries run concurrently while others pile up in the queue. Remember, if the running query load is high or there’s queuing, consider starting a separate warehouse and moving queued queries there.
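A minimal sketch of configuring a multi-cluster warehouse in auto-scale mode, assuming a hypothetical warehouse name and Enterprise Edition or higher:

    -- Snowflake adds clusters (up to the max) as queries queue, and removes them as load drops
    ALTER WAREHOUSE bi_wh SET
      MIN_CLUSTER_COUNT = 1
      MAX_CLUSTER_COUNT = 4
      SCALING_POLICY    = 'STANDARD';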


How does this impact cost? Not scaling out translates to delays in viewing the key data points required to make quick decisions; the business sits in a waiting room while bottlenecks build and time is wasted. Auto-scaling out enables organizations and data departments to process more data at one time, getting data into your hands quicker to make data-led decisions and stay ahead of today’s fast-paced business environment.

So, what have we learned?

Scaling up is best for complex queries. Scaling out is best for processing concurrent queries. Ways to reduce costs, improve query performance, and use warehouses effectively and efficiently are:

  1. Scaling out is also known as using multi-cluster warehouses. Scaling out is best when servicing high volume of concurrent queries.
  2. If you are using Snowflake Enterprise Edition (or a higher edition), all your warehouses should be configured as multi-cluster warehouses with auto-scale mode. This enables Snowflake to automatically start and stop clusters as needed, saving on costs.
  3. Remember to enable auto-suspend and keep the value low, between 5 and 10 minutes or less, to avoid paying for usage when a warehouse is not in use.
  4. Larger warehouses do not always translate to faster query response times. Increasing the size of a warehouse helps complex queries, not so much for basic queries.
  5. Experiment with different types of queries and different warehouse sizes to determine the combinations that best meet your specific query needs and workload.

Interested in optimizing warehouse usage and reducing cost with Snowflake?

Strive Consulting is a business and technology consulting firm, and proud partner of Snowflake, having direct experience with query usage and helping our clients understand and maximize the benefits the Snowflake Data Platform presents. Our team of experts can work hand-in-hand with you to determine if leveraging Snowflake is right for your organization. Check out Strive’s additional Snowflake thought leadership here.

About Snowflake

Snowflake delivers the Data Cloud – a global network where thousands of organizations mobilize data with near-unlimited scale, concurrency, and performance. Inside the Data Cloud, organizations unite their siloed data, easily discover and securely share governed data, and execute diverse analytic workloads. Join the Data Cloud at Snowflake.com.


Categories: blog, Data, Data & Analytics, Snowflake, Thought Leadership