No Phishing in the Data Lake

How to Find and Mitigate Security Risks in Large Data Storage

For any business with a data strategy in place, the next step on the roadmap to data transformation is to capture all the structured and unstructured data flowing into the organization. To do so, organizations must create a data lake to store data from IoT devices, social media, mobile apps, and other disparate sources in a usable way.

What is a Data Lake?

A data lake, per AWS, is a centralized repository that allows an organization to store data as is, without having to first structure the data, and run different types of analytics—from dashboards and visualizations to big data processing, real-time analytics, and machine learning to guide better decisions.

Data lakes differ from data warehouses in that data warehouses are like libraries. As data comes into a warehouse, it gets carefully filed according to a structured system that has been defined in advance, making it easy and quick to find exactly what you’re looking for given a specific request. In a data lake, there’s no predefined schema, which means data can be stored without needing to know what questions may require answers in the future. A data lake is more like an online bookstore: you can search broadly, pull relevant results across media types, and make decisions based on machine learning recommendations and other readers’ insights.

Many organizations are evolving their data storage to incorporate data lakes. However, maintaining any online information storage comes with security risks that must be identified and mitigated.

Security Risks and Consequences Within Data Lakes

Over the past few decades, improvements in compute power and storage space coupled with much more affordable storage prices have made it possible to store massive amounts of data in one place. Not long ago, storing a database of every citizen’s Social Security number would have been impractical—now it’s pennies on the dollar to store as a table in a data lake.

As much opportunity as large data storage provides organizations, it also creates risk. When vulnerabilities occur in repositories, their infrastructure, or any dependencies, the level of impact depends on the type and scale of the information compromised. Because data lakes concentrate vast amounts of data in a single location, when breaches occur, the impact is often enormous in both scale and severity.

Common tactics hackers use to exploit enterprise data include Initial Access, Defense Evasion, and Credential Access. Kurt Alaybeyoglu, Senior Director of Cybersecurity and Compliance at Strive Consulting, says organizations often make the mistake of focusing too strongly on preventing Initial Access—a cybercriminal getting into the org’s network. Data lakes interact with so many sources that an attacker doesn’t need network access to cause damage.

“The two primary security risks in a data lake,” Kurt says, “are exfiltration of, and impact to, sensitive data.” As the name suggests, data exfiltration is the unauthorized transfer of data. Attackers can either steal specific pieces of data or, more often, simply take a copy of an entire lake—akin to a burglar carrying away a safe so they can open it and rifle through its contents at their leisure. Data impact ranges from encrypting the data in the lake to wiping it, corrupting it, or destroying the means of access to the platform.

Both tactics can be, and have proven to be, catastrophic for an organization’s survival.

Are Data Lakes Worth the Risk?

Facing such dire consequences in the event of a cyberattack, why do businesses choose to use data lakes? Conventional wisdom says not to keep all your eggs in one basket—compartmentalizing data to avoid total compromise is surely more secure. But for many, according to Kurt, the rewards of data lakes outweigh the risks.

“Being able to access massive data at your fingertips with simple queries is what allows modern apps to exist,” he explains. “Take Uber as an example. Uber, as a technology, completely disrupted the taxi service model. It got rid of the need for dispatchers because at its heart was software that acted as one, pairing users and drivers faster than most humans can. Their software functions because Uber created a data lake that contains information like riders, drivers, maps, payment information, etc. that allows all of these disparate aspects to function seamlessly.”

While separating this data into different repositories may be more secure, the application would take significantly longer to function, from running all the queries to payment processing to calculating ride times—completely undermining the app’s usefulness. Not to mention, the added complexity would make securing the data just as difficult, if not more so.

“As security professionals, we have to try to mitigate those risks as best as possible,” he says. “At the end of the day, data security is a business function. Our job is to say ‘yes, we can do that, but here are the risks.’ Leaders must decide what they’re willing to pay to mitigate, what they’ll pay to transfer, and what risks they’re willing to accept.”

3 Ways to Prevent Security Breaches in the Data Lake

What makes data lakes so risky is that the valuable commodity, data, by necessity must be accessible, whether that’s to a platform, an end user, or someplace else. The data must be available in order to be useful. So, an organization’s top three focus points to protect that data are as follows:

  1. Rigorous access control: More people with unfettered access to the data lake means more potential entry points for a hacker to attempt to exploit. To secure the data lake, be thoughtful about who can access it and when. Validate those users’ identities using strong passwords and multi-factor authentication (MFA). If the data lake contains particularly sensitive information, consider more advanced hardware solutions such as FIDO2 keys.
  2. Regular vulnerability scanning and testing: Because data lakes and supporting platforms aren’t tied to a single device, hackers no longer need to achieve initial access to get ahold of the data. For most applications that interact with data lakes, a successful breach may only take a SQL or command injection that forces the system to respond with data it’s not supposed to—no device compromise needed. Because of that risk, proactively looking for the holes in a data lake’s security is paramount. Use a combination of application threat modeling, vulnerability scans, and application penetration testing to identify weak points, then remediate them quickly.
  3. Better detection through better training: “Data lakes are examples of what modern storage/compute allows us to do,” Kurt says. “We haven’t put the same level of effort and value into collecting audit logs to be able to make detection and analytics earlier in the cyberattack chain possible.” The answer? Staffing and training. Proactive threat detection comes from a skillset that knows what to investigate. “How do I collect audit logs from the platform? What logs should I collect? How do I determine when someone has accessed the data versus what’s just noise? That investigative mindset and skillset is in high demand and low supply,” says Kurt.

His suggestion to overcome the talent gap: companies that rely on data lakes should build detection skillsets from within. It’s easier to train a person who is already well-versed in the inner workings of an organization’s platform to build data security than it is to bring in a security generalist to work within the org’s data lake.

The advantage of training an internal employee is that they have the full view of the data product roadmap, which means they can start developing future updates on the platform that build security in from the ground up. That’s security by design—the brass ring of risk management in a data lake.
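Point 1 above calls for multi-factor authentication. As a minimal illustration of what an MFA factor actually computes, here is a sketch of an RFC 6238 time-based one-time password (TOTP), the algorithm behind most authenticator apps. The secret below is the RFC’s published test vector; a real deployment should use a vetted library rather than hand-rolled crypto.

```python
import hashlib
import hmac
import struct

def totp(secret: bytes, for_time: int, step: int = 30, digits: int = 6) -> str:
    """RFC 6238 TOTP: HMAC-SHA1 over the current time-step counter."""
    counter = for_time // step
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F                             # dynamic truncation (RFC 4226)
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)

# RFC 6238 test vector: ASCII secret "12345678901234567890" at t = 59 seconds
print(totp(b"12345678901234567890", 59))  # 287082
```

The server and the authenticator app share the secret and the clock; a stolen password alone can no longer satisfy the login check.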
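Point 2 above mentions SQL injection. A minimal sketch of the difference between an injectable query and a parameterized one, using an in-memory SQLite table as a stand-in for whatever engine fronts the data lake (the table and payload are hypothetical, but the principle carries over to any SQL interface):

```python
import sqlite3

# In-memory SQLite stands in for the engine fronting the lake.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "a@example.com"), (2, "b@example.com")])

user_input = "1 OR 1=1"  # classic injection payload

# Vulnerable: string interpolation lets the payload rewrite the WHERE clause
vulnerable = conn.execute(
    f"SELECT email FROM customers WHERE id = {user_input}").fetchall()

# Safe: a bound parameter is treated as a value, never as SQL
safe = conn.execute(
    "SELECT email FROM customers WHERE id = ?", (user_input,)).fetchall()

print(len(vulnerable), len(safe))  # 2 0
```

The interpolated query returns every row in the table; the parameterized query returns none, because the payload is compared as a literal value instead of being executed.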

Where exploitable data exists, opportunists will try to access it. Data lakes provide organizations an incomparable ability to un-silo work, answer new questions by drawing information from diverse sources, and innovate technology that creates the next apex experience. For that reason, businesses must level up their investment in data security in concert with their investment in data storage and usability. On the data roadmap, that’s the ultimate step toward data transformation.

Protect your data. Protect your business.

Learn more about Strive’s cybersecurity services HERE, or set up a Launch Future State of Data Workshop to create your 1-3-year data vision plan HERE.

Adventures in Snowflake Cost Management

Pay-per-use is both an exciting and challenging aspect of using the Snowflake Data Cloud. I’ve led workshops and proudly proclaimed, “Snowflake has brought an end to capacity planning!” And it has. You never have to figure out how much storage or processing power you are going to need. You don’t have to plan for three years of storage and hope you haven’t bought too little. That used to be a constant dance—but no more. With Snowflake, you can just add whatever data you need and pay only for what you use. The same is true for query processing power. When Black Friday hits, you have power on demand, yet you’re not paying for that power all year long.

Now, budget planning? That’s a different story. Traditionally, you bought a certain size machine to run your database or contracted for a certain amount of cloud capacity, and whether you used it a little or a lot, you paid the same. When you see your Snowflake costs skyrocket, you’ll start to think about usage in ways you never had to before.

Here are some tips for being more efficient with your Snowflake spend.

Think Small, Run Big

Thinking and development time should be spent on an X-Small or Small compute warehouse. When it comes time to run a job or a long query, spin up a larger warehouse, run the job, and then shut the warehouse down. You have capacity on demand, so size your warehouse to optimize cost both in what Snowflake charges and in human capital. Why wait 2 hours on a long job when you can run it in 15 minutes on a warehouse 8 times the size? For the most part, you’ll see run times cut in half and the hourly cost doubled at each size up. So it’s roughly cost neutral to use a bigger warehouse, and it saves human cost.
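The sizing arithmetic can be sketched with Snowflake’s credits-per-hour ladder, where each size up doubles the rate. The assumption that runtime scales near-linearly with warehouse size is an idealization that real jobs only approximate:

```python
# Snowflake bills credits per hour by warehouse size, doubling at each step up.
CREDITS_PER_HOUR = {"XS": 1, "S": 2, "M": 4, "L": 8, "XL": 16, "2XL": 32, "3XL": 64}

def job_cost(size: str, runtime_hours: float) -> float:
    """Credits consumed by one job, assuming the warehouse suspends afterward."""
    return CREDITS_PER_HOUR[size] * runtime_hours

# A 2-hour job on Medium vs. the same job on 2XL (8x the nodes), assuming
# near-linear speedup so it finishes in 15 minutes:
print(job_cost("M", 2.0))     # 8.0 credits
print(job_cost("2XL", 0.25))  # 8.0 credits -- same spend, 1h45m of waiting saved
```

Same credit spend either way; the only difference is how long a human sits waiting for the result.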

Sometimes even the Snowflake cost is lower on a more expensive, larger warehouse. How so? If the compute warehouse is too small, it may have to spill data to local or even remote storage. Disk is a lot slower than RAM. A larger warehouse also has more RAM, so the query or load can complete so much faster that you save more than the extra cost of being large.

One Expandable Warehouse for All

It is typical for companies to assign each team or business unit their own warehouse. It’s one of the ways companies can manage cost charge-back. However, it’s inefficient to have multiple warehouses with their meters running up charges when a single shared warehouse will do. To handle overuse, you set it up as a multi-cluster that will spawn other instances when there is demand and shrink them when demand goes away. You use roles or tags to handle divvying up the shared cost across those using the warehouse.

Break Large Load Files Into Many Smaller Ones

Snowflake is a massively parallel database. Each node in a Snowflake warehouse cluster has 8 processes, so a Large warehouse with 8 nodes has 64 processes. If you try to load a single large file, only one of those processes is used. If you break the file up (Snowflake recommends 100–250 MB compressed chunks), then all the processes will work in parallel, rocketing your loading performance.
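The splitting step can be sketched as follows; the chunk size here is tiny so the demo runs instantly, whereas a real load would target the 100–250 MB guidance:

```python
import io

def split_on_lines(src, chunk_bytes):
    """Split a text stream into ~chunk_bytes pieces, breaking only on line ends."""
    chunks, buf, size = [], [], 0
    for line in src:
        buf.append(line)
        size += len(line)
        if size >= chunk_bytes:        # close the chunk once the target is reached
            chunks.append("".join(buf))
            buf, size = [], 0
    if buf:                            # flush the final partial chunk
        chunks.append("".join(buf))
    return chunks

# Tiny demo file standing in for a multi-gigabyte load file.
rows = "".join(f"row-{i}\n" for i in range(1000))
chunks = split_on_lines(io.StringIO(rows), chunk_bytes=2000)
print(len(chunks), "".join(chunks) == rows)  # 4 True
```

Each chunk can then be staged as its own file, letting every warehouse process ingest one in parallel.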

Judicious Use of Cluster Keys

Snowflake builds micro-partitions as data is loaded. In 90% of scenarios, you can just let Snowflake do its thing and you will get great performance. This is one of the ways Snowflake is so cost-effective: it doesn’t take an army of tuning DBAs to operate. However, there will be times when you need to put a cluster key on a table to get the performance required, because poorly performing queries cost extra money.

In one case, a 40-billion-row table joined to a 3-billion-row table in a view brought reporting to its knees. Clustering both tables on the join keys enabled the report to run in less than 2 minutes. For more information on clustering, see Snowflake’s documentation.

Lift and Shift Still Needs Tuning

One of the most common mistakes is to assume that “if it worked in the old system, it will work in Snowflake.” You will encounter performance issues (and thus cost issues) whose solution will not lie in adjusting Snowflake warehouses.

Here are just some recent tuning scenarios I’ve encountered:

There was a data load running $500 to $1,500 per day. It loaded 8 billion rows of inventory daily: every item in every store across the world was scanned. The loading procedure used a MERGE—8 billion searches to find the right row and update the data. And yet there was no history; once the merge happened, the current value was the only value. A merge wasn’t needed at all. In effect, the table was a daily snapshot of inventory, and the incoming data was all that was needed. Removing the merge took the process from 8 hours on a very expensive 64-node Snowflake warehouse to a couple of minutes on a 32-node warehouse, a savings of $15k–$30k per month.
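The equivalence that made the MERGE unnecessary can be shown in miniature: when the incoming feed is a complete snapshot and no history is kept, a keyed merge and a plain overwrite produce the same table. The SKUs below are made up for illustration:

```python
def merge(table: dict, incoming: dict) -> dict:
    """MERGE-style load: one keyed lookup per incoming row, update or insert."""
    out = dict(table)
    for key, value in incoming.items():
        out[key] = value
    return out

def overwrite(table: dict, incoming: dict) -> dict:
    """Snapshot-style load: truncate and bulk insert, no per-row search."""
    return dict(incoming)

# Because the daily feed covers every item, the two strategies produce
# identical tables -- a complete snapshot leaves nothing for MERGE to preserve.
yesterday = {"sku-1": 40, "sku-2": 12}
today = {"sku-1": 35, "sku-2": 12, "sku-3": 7}
print(merge(yesterday, today) == overwrite(yesterday, today))  # True
```

The merge pays for 8 billion keyed lookups to arrive at the same result the overwrite reaches with a bulk insert.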

Just because “the query worked on XYZ database” doesn’t mean everything is okay. A very expensive, long-running query on Snowflake was fixed by discovering a Cartesian join. When all the proper keys were added to the join, the query ran fast.

Oftentimes in mature systems there are views built upon views built upon views. A slow report sent me spelunking through the “view jungle.” I discovered one of the views had a join to a table from which no fields were used, plus a DISTINCT. At half a billion rows, this unnecessary join, and the unnecessary DISTINCT it forced, caused the performance problem.

The takeaway is that a good deal of the work will be taking a fresh look at the problem and not taking “the old system” as gospel for the new one.

Monitor the Spend

Snowflake has views to help you monitor cost and performance. They are located in the SNOWFLAKE database, in the ACCOUNT_USAGE schema. If you have multiple accounts, the combined values are in the ORGANIZATION_USAGE schema. There are prebuilt Tableau, Power BI, Sigma, and other dashboards you can download. There is no substitute, however, for getting familiar with the views themselves.
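As a sketch of the kind of aggregation you would run against these views, here is the local equivalent of a GROUP BY over rows shaped like ACCOUNT_USAGE’s warehouse metering history (warehouse name and credits used); the figures are invented for illustration:

```python
from collections import defaultdict

# Rows shaped like (warehouse_name, credits_used) from a metering-history view.
metering = [
    ("LOAD_WH", 12.5), ("BI_WH", 3.0), ("LOAD_WH", 7.5), ("ADHOC_WH", 1.0),
]

# Sum credits per warehouse, then rank by spend -- the GROUP BY ... ORDER BY
# you would otherwise write in SQL against the view itself.
totals = defaultdict(float)
for warehouse, credits in metering:
    totals[warehouse] += credits

top = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
print(top[0])  # ('LOAD_WH', 20.0)
```

Ranking warehouses by credits consumed is usually the first question a spend review needs answered, and it points you at which team or workload to tune first.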

Strive is a proud partner of Snowflake!

Strive Consulting is a business and technology consulting firm, and proud partner of Snowflake, having direct experience with query usage and helping our clients understand and maximize the benefits the Snowflake Data Platform presents. Our team of experts can work hand-in-hand with you to determine if leveraging Snowflake is right for your organization. Check out Strive’s additional Snowflake thought leadership HERE.


Snowflake delivers the Data Cloud – a global network where thousands of organizations mobilize data with near-unlimited scale, concurrency, and performance. Inside the Data Cloud, organizations unite their siloed data, easily discover and securely share governed data, and execute diverse analytic workloads. Join the Data Cloud at SNOWFLAKE.COM.

How to Modernize A Data Strategy Approach

Modernizing your company’s data strategy can be a daunting task. Yet making this change — and doing it right — has never been more important, with torrents of data now dictating much of the day-to-day in many organizations.

Missing the boat on making this change now can hold your business back in meaningful ways down the line. Changing your approach to capturing, sharing, and managing your data can help you avoid many of the pitfalls that befall businesses today, such as duplicating data across the organization and processing overlaps.

Implementing an effective data strategy will enable you to treat data not as an accidental byproduct of your business, but an essential component that can help you realize its full potential. Setting out clear, company-specific targets will help you tackle these challenges effectively.

Before you embark on this journey, however, it is crucial to understand why you want to modernize, assess where you are now, and identify the most efficient path to the finish line.

Strategic Vision – Future of Your Data

The first step is to define a vision for your own data modernization. Do you know why you want to modernize your data strategy and what your business can gain in the process? Do you have an aligned strategy and a clear idea of what a thriving data ecosystem will entail?

Defining your goals — whether that is to gain a better grasp of your data, enhance accuracy or take specific actions based on the insights it can provide — is paramount before initializing this process.

Equally essential is to ensure early on that executive leadership is on board, since overhauling your data strategy will require significant investment in time and resources. This will be needlessly difficult without full buy-in at the very top. Figuring out how better data management will tie in with your overall business strategy will also help you make your case to leadership.

Ways of Working – Operating Model

Next, you need to figure out how this modernization will take place and pinpoint how your operating structure will change under this new and improved system.

Setting out ahead of time how data engineers and data scientists will work with managers to shepherd this strategy and maintain it in the long run will ensure a smooth process and help you avoid wasting time and resources.

Identifying what your team will look like and gathering the required resources to implement this project will lead you directly into implementation.

Accessibility & Transparency — See the Data

Gaining access and transparency, at its core, is about implementing new systems so that you gain better visibility of the data you have. You want to make sure that your structured and unstructured content — and associated metadata — is identifiable and easy to access and reference.

Putting the infrastructure in place to ingest the data your business already creates, and format it in a way that lets you access it efficiently, might appear basic. But figuring out how to achieve this through data integration or engineering is a vital step and getting this wrong can easily jeopardize the entire project.

Data Guardianship — Trust the Data

Once you have brought your data to the surface, determining ownership within your organization will ensure both that accuracy is maintained, and that data is managed and updated within the correct frameworks. 

This includes applying ethical and data sharing principles, as well as internal governance and testing, so that you can ensure your data is always up-to-date and handled responsibly. Making sure that you can trust the data you are seeing is essential to guarantee the long-term benefits you are hoping to gain through data modernization in the first place.

Plus, you can rest easy knowing that your reporting data is accurate instead of worrying about falling foul of external compliance metrics and other publication requirements.

Data Literacy — Use the Data

Tying back to your internal data management, literacy is all about making sure that you have the right skillsets in place to make savvy use of the insights you are gaining from your data.

You and your team need to make sure you are trained and equipped to handle this process both during implementation and once your new system is in place — so you can leverage the results in the best possible way and make it easier to access and share data throughout the company.

After all, making secure financial and operational decisions will depend on how much you trust in your own core capabilities. Ideally, a successful data management strategy will enable you to understand every part of your business. This applies not just internally, but also spans your customers, suppliers and even competitors.

Take the First Step with Strive

Our experts at Strive Consulting are here to help you assess whether you are ready to embark on this journey and provide you with a clear perspective of where you are, what’s involved, and how to get there. We are ready to walk you through this process and make sure the final product ends up in the right place, so you can be confident that your data is in safe hands — your own. Learn more about Strive’s Data & Analytics and Management Consulting practices HERE.

Contact Us

An Example of a Living Data Mesh: The Snowflake Data Marketplace

The enterprise data world has been captivated by a new trend: Data Mesh. The “What Is Data Mesh” articles have already come out, but in this publication, I want to highlight a live, in-production, worldwide Data Mesh example: the Snowflake Data Marketplace.

As with every “new thing” that comes down the pike, people will change the definition to suit their purposes and point of view, and I am no different. Zhamak Dehghani, a Director of Emerging Technologies at ThoughtWorks, writes that Data Mesh must contain the following shifts:

  • Organization: From central controlled to distributed data owners. From enterprise IT to the domain business owners.
  • Technology: It shifts from technology solutions that treat data as a byproduct of running pipeline code to solutions that treat data and code that maintains it as one lively autonomous unit.
  • Value: It shifts our value system from data as an asset to be collected to data as a product to serve and delight the data users (internal and external to the organization).
  • Architecture: From central warehouses and data lakes to a distributed mesh of data products with a standardized interface. 

It is on this last principle that I depart and advocate for the Snowflake Data Cloud. I believe the advantages that have always come with a centralized data store can be retained, while the infinite scale of Snowflake’s Data Cloud facilitates the rest of the goals behind Data Mesh.

There is a lot to understand about the new paradigm and its benefits, or even about what an up-and-running Data Mesh would look like; to date, even simplified overview articles are lengthy. As I wrestled with my own understanding of Data Mesh and how Strive could bring our decades of successful implementations in all things data, software development, and organizational change management to bear, I was struck by a simple notion: there is already a great example of a successfully implemented, worldwide, multi-organization Data Mesh—the Snowflake Marketplace.

There are more than 1,100 data sets from more than 240 providers available to any Snowflake customer. The data sets from the market become part of the customer’s own Snowflake account yet are managed and kept up to date by the providers. No ETL and no scheduling are needed. When providers update their data, it is updated for all subscribers. This is the definition of “data as a product.”

In effect, the Snowflake Data Cloud is the self-service, data-as-a-platform infrastructure, and the Snowflake Marketplace is the discovery and governance tool within it. Everyone who has published data into the Marketplace has become a product owner delivering data as a product.

We can see the promised benefit of the Snowflake Marketplace as a Data Mesh in one thing: massive scalability. I’m not speaking of the Snowflake platform’s near-infinite compute scalability, impressive as that is, but of how every team publishing data into the marketplace has been able to do so without the cooperation of another team. None of the teams that have published data have had to wait in line for their priorities to bubble up to the top of IT’s agenda. A thousand new teams can publish data today. A hundred thousand new teams can publish their data tomorrow.

This meets the organizational shift from centralized control to decentralized domain ownership, the value shift to data as a product, and the technology shift to data and the code that maintains it living together as one unit.

Data consumers can go to market and find data that they need, regardless of which organization created the data. If it’s in the Snowflake Marketplace, any Snowflake customer can use the data for their own needs. Each consumer of the data will bring their own compute, so that nobody’s use of the data is impacting or slowing down the performance of another team’s dashboards.

Imagine that instead of weather data published by AccuWeather and financial data by Capital One, it’s your own organization’s customer, employee, marketing, and logistics data. Each data set is owned by the business team that creates the data. They are the team that knows the data best. They curate, cleanse, and productize the data themselves. They do so on their own schedule and with their own resources. That data is then discoverable and usable by anyone else in the enterprise (gated by role-based security). Imagine that you can scale as your business demands, as new businesses are acquired, as ideation for new products occurs. All facilitated by IT, but never hindered by IT as a bottleneck.

With Snowflake’s hyper scalability and separation of storage and compute, and its handling of structured, semi-structured, and unstructured data, it’s the perfect platform to enable enterprise IT to offer “data as self-serve infrastructure” to the business domain teams. From there, it is a small leap to see how the Snowflake Data Marketplace is, in fact, a living example of a Data Mesh with all the benefits realized in Zhamak Dehghani’s papers.

As a data practitioner with over 3 decades of my own experience, I am as excited today as ever to see the continuous evolution of how to get value out of data and deal with the explosion in data types and volumes. I welcome Data Mesh and the innovations it is promising, along with Data Vault 2.0, cloud data hyper-scale databases, like Snowflake, to facilitate the scale and speed to value of today’s data environment.


Exercising Data Governance Best Practices – How to Stay the Course

Have you ever planned to wake up early in the morning to work out, but instead chose to lie in bed and catch up on some sleep? This can happen even after you have committed—mentally, at least—to a new workout regimen.

That’s because the hard part isn’t resolving to do something new; it’s adjusting your daily habits and generating enough momentum to carry the changes forward. This requires discipline and drive.

The same challenges apply to data governance initiatives. If you have ever been part of a data governance program that hesitated, backfired or stopped completely in its tracks, you know what I’m talking about. Companies are accruing ever-increasing amounts of data and want to be able to transform all that information into insights the same way you want to get in shape. The first step is data governance, but getting your organization to buy in to a new program conceptually is the easy part. Taking action and sticking to it can be much more challenging.

Indeed, many organizations believe that simply implementing technology—like a Master Data Management system—will improve the health of their data. But if you simply buy workout equipment, do you get healthier? Tools will help streamline your organizational processes and complement information governance and information management, but building and maintaining a culture that treats data as an asset to your organization is the key to ongoing success.

Below are some key factors to building good habits to generate momentum once your data governance program is underway:

1. Impart a sense of urgency for the program.

For every organization with a plan to manage its data assets, there needs to be a sense of urgency to keep the plan in place. The reasons are unique from organization to organization, but they might be driven by compliance, customer satisfaction, sales, revenues, or M&A. Regardless of the reason, it needs to resonate with senior leadership and ideally be tied to the company’s strategic goals in order to be most effective.

2. Communicate, communicate, communicate.

The cornerstone of a successful data governance program is a well-organized, cross-departmental communication plan. A solid plan helps remove silos and maintain cross-departmental support for the initiative. Seek out your champions throughout the organization and meet with key stakeholders regularly to document their pain points. It is important to get people engaged early to keep the excitement going.

3. Operationalize change within the organization.

Your delivery will need to be agile in nature because the plan you put in place will naturally evolve. The goal is to learn what works within your organization early on to ensure you deliver value quickly and the process is sustainable moving forward. Complete tasks iteratively and agree upon a small set of high-value data attributes to aid in validating your data governance process. In addition, manage your data elements to ensure their best quality.

4. Make the plan as RACI as possible.

Actively listen to your supporters and put together a plan that encompasses a RACI (Responsible, Accountable, Consulted & Informed) model so that everyone on the team knows their role across the process. This plan will keep your team focused and guide your initiatives moving forward. You’ll raise your odds of success by forming a strong governance organizational structure with roles and responsibilities in place (for data ownership, stewardship and data champions), along with approvals that complement your existing change management process.

5. Measure, Communicate, Repeat.

Keep in mind that “you can’t manage what you don’t measure.” You’ll need to face the facts and communicate your findings. It’s wise to document and implement KPIs (Key Performance Indicators) so that you can measure the progress of your initiative over time. Linking the KPIs to revenue or sales loss, for example, can be a strong indicator to help drive change. As you learn more about your data, it’s important to communicate what’s meaningful to your stakeholders and continue to move forward.

Similar to continuing on a workout regimen, data governance demands a discipline that takes time and patience to fine tune. This requires changing years of undisciplined behaviors regarding data within your organization, and the change will not happen overnight. Changing these behaviors is an ongoing process that needs to resonate throughout an organization’s culture in order for success to occur.

In addition, it’s important to keep things fresh. When working out, you need to rotate through different core muscle groups and vary the routine to keep things interesting and progressive. It’s the same with data governance initiatives. Don’t let people get bored with the same repetitive activities day in and day out. Try conducting data discovery sessions where team members present findings from an internal or external dataset that would interest other team members. You can also share successes and learnings from past data-related projects to drive discussion. Another suggestion is to discuss future cross-departmental data projects (or “wish list” items) that can lead to great data roadmap discussions. The objective is to keep everyone engaged and finding value in meetings so that the team continues to show up and make progress.

Remember that data governance is a journey that requires commitment and hard work. As with exercise, just working out for a month is a great start, but it’s with continued dedication that you really start to notice the change. If you want to take your organization to the next level, you need to develop the discipline toward information management that your organization requires for long-term sustainable success. For those with little experience in implementing or maintaining a data governance plan, experienced consultants can be of great value.

Strive Can Help With Your Data Governance Needs! 

Here at Strive Consulting, our subject matter experts team up with you to understand your core business needs, while taking a deeper dive into your organization’s growth strategy. Whether you’re interested in modern data integration or an overall data and analytics assessment, Strive Consulting is dedicated to being your partner, committed to success. Learn more about our Data & Analytics practice HERE.

Contact Us

Getting Change Management Right

Strategic initiative failure rates remain high, but working with the right partner can yield success. Knowing what’s at stake, executives put tremendous resources into planning and implementing their transformative projects.

However, seasoned executives also know that the success of those projects rests on getting users to adapt to new technologies, new processes, and new ways of working as much as – if not even more so – than any other element of the endeavor.

Unfortunately, successful change remains elusive. The failure rate for all change initiatives has been stuck around 70% for the past two decades and remains there today. 1

Consider figures from Gartner, the tech advisory firm: Its research shows that only 34% of all organizational change initiatives are a clear success, while half are out-and-out failures. 2

Those figures tell only part of the story, though. Here at Strive Consulting, we’ve found that companies without internal Change Management teams generally experience even higher failure rates. Why? Because they have neither the deep knowledge, nor the experience and tools, to enable change.

As a result, these companies often use online tutorials that offer only highlights on the topic, or they rely on overly complex white papers that don’t provide guidance on tailoring a program to the organization’s own unique needs.

Neither option delivers information on the concrete tools and techniques needed to effectively teach people how to work in new and different ways. Rather, they tend to focus on the psychology – how the end user feels about the changes – and share some generic guiding principles, such as the ‘importance of communication’.

In reality, Change Management is a specialized skill, and it is one that needs to be expertly adapted to each initiative and tailored to every organization to ensure success. Strive’s Change Management framework acknowledges that reality and brings together four critical elements that must be addressed for an organization to successfully navigate transformation.

Those four elements are:

  • Alignment and Engagement
  • Change Impact and Analytics
  • Communication
  • Readiness and Training

Our extensive experience in helping a broad range of clients steer their companies through change has allowed us to home in on these key areas and build a Change Management framework that leverages each of them to maximum effect. We’ll focus on five critical tools across three elements of our framework.

Let’s look at the first element: Alignment and Engagement. This element ensures that we’re collaborating with the right people in the plan and that their goals and priorities are well understood. With our ‘Story for Change’ we ask five important questions: What is happening? Why now? So what? How are we going to achieve this? And now what? Asking these questions and listening to responses from project leaders gives us and, more importantly, the organization a clear, precise understanding of where it wants to be at the end of the transformation. While collaborating with these same leaders, we group and assess different stakeholder cohorts on a 2×2 grid that measures each cohort’s level of influence on the initiative’s success against the degree of impact imposed on them. The Stakeholder Assessment is the backbone of tailoring change, considering that cohorts come from very different starting points and play different roles within the broader future state.
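One way to picture that 2×2 stakeholder grid is as a simple quadrant classification. The sketch below is illustrative: the cohort names and scores are hypothetical, and the quadrant labels are generic stakeholder-analysis terms rather than Strive’s own terminology.

```python
# Classify stakeholder cohorts on a 2x2 grid:
# influence on the initiative's success vs. impact imposed on the cohort.
def quadrant(influence, impact, threshold=0.5):
    """Scores are normalized to 0..1; the threshold splits the grid."""
    if influence >= threshold and impact >= threshold:
        return "Manage closely"   # high influence, high impact
    if influence >= threshold:
        return "Keep satisfied"   # high influence, low impact
    if impact >= threshold:
        return "Keep informed"    # low influence, high impact
    return "Monitor"              # low influence, low impact

# Hypothetical cohorts: (influence, impact)
cohorts = {
    "Executive sponsors": (0.9, 0.3),
    "Frontline operations": (0.4, 0.9),
    "Back-office finance": (0.2, 0.2),
}
for name, (infl, imp) in cohorts.items():
    print(name, "->", quadrant(infl, imp))
```

Mapping each cohort to a quadrant makes it easy to see at a glance where engagement and communication effort should concentrate.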

Next, we look at Change Impacts and Analytics. For this, Strive evaluates how someone’s responsibilities will change and by how much. With Change Analysis we document all unique impacts and map them against the stakeholder cohorts, identifying whether groups will perceive the impact as positive, negative, or neutral. This lets us understand how users will feel about the changes they’re facing and develop the various engagement, communication, and training activities needed to build understanding, knowledge, and commitment. We also develop metrics that track adoption, so we can confirm success as well as identify those cohorts who may need additional support.

In tandem, we plan the necessary Communications. This is all about informing key stakeholders through integrated, targeted, and timely program messaging. It’s also about understanding how communication flows within an organization. We believe there must be a communication cascade strategy within any program undergoing change for it to successfully transform. So, top-level sponsors need to effectively communicate with their direct reports, and in turn those managers need to effectively convey messages to their teams. Moreover, a communication plan complements this cascade of information for each audience. Communication timed appropriately, focused on the right message, and delivered via the right vehicle helps all parties understand the importance of transformation for the organization as a whole.

On top of all this, we evaluate Readiness and Training. While training is hyper-focused and can be niche, we’ll focus here on readiness. Quantitative metrics showing before-and-after results tell a clear part of the story, but they are one-sided. Qualitative surveying helps leadership understand whether, and to what degree, stakeholder cohorts and users understand why the change is taking place, are aware of the impacts to their day-to-day responsibilities, know where to go for resources, and believe the change is positive overall.

Now, none of these four framework elements works in isolation. Rather, we consider them all together. In fact, we factor them into the lifecycle of a broader Change Management approach, creating a timeline from start to go-live that includes markers along the way. This means planning, for example, what milestones should be achieved counting down from 90, 60, 30, 15, 7, and 1 day out.

The payoff for having a structured Change Management workstream is significant, and it demonstrates both the value of having a solid Change Management strategy in place and the importance of having a partner who can deliver such results.

Looking for sample deliverables? Or maybe a bit more information? Let’s Talk!  

Here at Strive, we take pride in our Management Consulting practice, where we can assist you in your initial digital product development needs all the way through to completion. Our subject matter experts team up with you to understand your core business needs, while taking a deeper dive into your company’s growth strategy.

Have Your Data and Query It Too!

“Have your cake and eat it too.” How would it make sense to have cake and not be able to eat it? And yet, we have, for decades, had similar experiences with enterprise data warehouses. We have our data; we want to query it too!

Organizations spend so much time, effort, and resources building a single source of truth. Millions of dollars are spent on hardware and software, and then there is the cleansing, collating, aggregating, and applying of business rules to data. When it comes time to query… we pull data out of the enterprise data warehouse and put it into data marts. There simply is never enough power to service everybody who wants to query the data.

With the Snowflake Data Cloud, companies of all sizes can store their data in one place – and every department, every team, every individual can query that data. No more need for the time, expense, effort, and delay to move data out of an enterprise data warehouse and into data marts.

The advent of the ‘data lake’ promised to be the place where all enterprise data could be stored. Structured, semi-structured, and unstructured data could be stored together, cost effectively. And yet, as so many soon found out, data ended up needing to be moved out to achieve the query performance desired. More data marts, more cost, and more time delay to get to business insight.

Snowflake solved this problem by separating data storage from compute. Departments and teams can have their own virtual warehouse, a separate query compute engine that can be sized appropriately for each use case. These query engines do not interfere with each other. Your data science team can run massive and complex queries without impacting the accounting team’s dashboards.
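As a rough sketch of what per-team compute isolation looks like in practice, the snippet below generates Snowflake-style `CREATE WAREHOUSE` statements from a sizing table. The team names and sizes are illustrative assumptions, and only a few of the available options are shown; consult Snowflake’s DDL reference for the full set.

```python
# Each team gets its own virtual warehouse, sized for its workload.
# Because storage and compute are separate, these engines don't contend.
SIZING = {
    "DATA_SCIENCE_WH": "LARGE",   # heavy, complex queries
    "ACCOUNTING_WH": "XSMALL",    # lightweight dashboard refreshes
    "ETL_WH": "MEDIUM",           # scheduled batch loads
}

def warehouse_ddl(name, size):
    """Build a CREATE WAREHOUSE statement; auto-suspend helps control cost."""
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {name} "
        f"WITH WAREHOUSE_SIZE = '{size}' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE;"
    )

for name, size in SIZING.items():
    print(warehouse_ddl(name, size))
```

Each warehouse can later be resized independently as a team’s workload grows, without touching anyone else’s compute.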

Snowflake does this by being designed for the cloud from the ground up. A massively parallel processing database, Snowflake was built on the cloud infrastructure and services of AWS, quickly followed by Azure and GCP. Organizations get all the scalability promised by “Hadoop-based Big Data” in an easy-to-use, ANSI-standard SQL data warehouse that delivers the 5 V’s of big data (Volume, Value, Variety, Velocity and Veracity). Not to mention, all of these benefits come with industry-leading cost and value propositions.

Speaking of Variety… Snowflake has broken out of the “data warehouse” box and has become ‘The Data Cloud’. All your data types: structured, semi-structured and now, unstructured.  All your workloads: Data Warehouse, Data Engineering, Data Science, Data Lake, Data Applications, and Data Marketplace. You have the scalability in data volume and in query compute engines across all types of data and use cases.

With the Snowflake Data Cloud, you truly can have all your data and query it too. Extracting business value for all departments and all employees along the way.


Want to learn more about the Snowflake Data Cloud? 

Strive Consulting is a business and technology consulting firm, and proud partner of Snowflake, with direct experience helping our clients understand and maximize the benefits the Snowflake Data Platform presents. Our team of experts can work hand-in-hand with you to determine if leveraging Snowflake is right for your organization. Check out Strive’s additional Snowflake thought leadership HERE.

About Snowflake

Snowflake delivers the Data Cloud – a global network where thousands of organizations mobilize data with near-unlimited scale, concurrency, and performance. Inside the Data Cloud, organizations unite their siloed data, easily discover and securely share governed data, and execute diverse analytic workloads. Join the Data Cloud at SNOWFLAKE.COM.

Contact Us

Why Choose Open-Source Technologies?

In 2022, almost every enterprise has some cloud footprint, especially around their data. These cloud platforms offer closed-source tools which, while offering many benefits, may not always be the best choice for some organizations. First and foremost, these proprietary services can be expensive. In addition to paying for the storage and compute needed to store and access data, you also end up paying for the software itself. You could also become locked into a multi-year contract, or you might find yourself locked into a cloud’s tech stack. Once that happens, it’s very difficult (and expensive) to migrate to a different technology or re-tool your tech stack. To put it simply, if you ever reach a roadblock your closed-source tool can’t solve, there may be no workarounds.

Since closed-source technologies can create a whole host of issues, open-source technologies may be the right choice. Open-source tech is not owned by any single vendor; anyone can use, repackage, and distribute it. Several companies have monetized open-source technology by packaging and distributing it in innovative ways. Databricks, for example, built a platform on Apache Spark, a big-data processing framework. In addition to providing Spark as a managed service, Databricks offers many other features that organizations find valuable. However, a small organization might not have the capital or the use case that a managed service like Databricks aims to solve. Instead, you can deploy Apache Spark on your own server or a cloud compute instance and have total control. This is especially attractive when addressing security concerns: an organization can benefit from a tool like Spark without having to involve a third party and risk exposing data to it.

Another benefit is fine-tuning resource provisioning.

Because you’re deploying the code on your own server or compute instance, you can configure the specifications however you want. That way, you can avoid over-provisioning or under-provisioning. You can even manage scaling, failover, redundancy, security, and more. While many managed platforms offer auto-scaling and failover, they are never as granular as when you provision resources yourself.

Many proprietary tools, specifically ETL (Extract, Transform, Load) and data integration tools, are no-code, GUI-based solutions that require some prior experience to be implemented correctly. While the GUIs are intended to make it easier for analysts and less-technical people to create data solutions, more technical engineers can find them frustrating. Unfortunately, as the market becomes more inundated with new tools, it can be difficult to find proper training and resources. Even documentation can be iffy! Open-source technologies can be similarly peculiar, but it’s entirely possible to create an entire data stack – data engineering, modeling, analytics, and more – all using popular open-source tech. These tools will almost certainly lack a no-code GUI but are compatible with your favorite programming languages. Spark supports Scala, Python, Java, SQL, and R, so anyone who knows one of those languages can be effective using Spark.
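As a toy illustration of building data tooling from open components using only a language you already know, the sketch below stages CSV rows into SQLite with nothing but Python’s standard library. The table and data are made up, and SQLite here is just a stand-in for whichever open-source database your stack actually uses.

```python
import csv
import io
import sqlite3

# "Extract": CSV data as it might arrive from an upstream system.
raw = "order_id,amount\n1,120.50\n2,75.00\n3,210.25\n"

# "Load": stage the raw rows into an open-source database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, amount REAL)")
rows = [(int(r["order_id"]), float(r["amount"]))
        for r in csv.DictReader(io.StringIO(raw))]
conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

# "Transform"/query: compute total revenue in SQL.
total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
print(total)  # 405.75
```

The same extract-load-query shape scales up when SQLite is swapped for Postgres or MySQL and the Python script for a Spark job; no licensing fees are involved at any layer.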

But how does this work with cloud environments?

You can choose how much of the open-source stack you want to incorporate. A fully open-source stack would simply be running all your open-source data components on cloud compute instances: database, data lake, ETL, data warehouse, and analytics all on virtual machine(s). However, that’s quite a bit of infrastructure to set up, so it may make sense to offload some parts to cloud-native technologies. Instead of creating and maintaining your own data lake, it would make sense to use AWS S3, Azure Data Lake Storage Gen2, or Google Cloud Storage. Instead of managing a compute instance for a database, it would make sense to use AWS RDS, Azure SQL DB, or Google Cloud SQL with an open-source flavor of database like MySQL or MariaDB. Instead of managing a Spark cluster, it might make sense to let the cloud manage the scaling, software patching, and other maintenance, and use AWS EMR, Azure HDInsight, or Google Dataproc. You could also abandon the idea of using compute instances and architect a solution using a cloud’s managed open-source offerings: AWS EMR, AWS MWAA, AWS RDS, Azure Database, Azure HDInsight, GCP’s Dataproc and Cloud Composer, and those are just the data-specific services. As mentioned before, these native services bear some responsibility for maintaining the compute/storage, software versions, runtimes, and failover. As a result, the managed offering will be more expensive than doing it yourself, but you’re still not paying software licensing costs.

In the end, there’s a tradeoff.

There’s a tradeoff between total control on one hand and ease of use, maintenance, and cost optimization on the other, but there is a myriad of options for building an open-source data stack. You have the flexibility to host it on-premises or in the cloud of your choice. Most importantly, you can reduce spend significantly by avoiding software licensing costs.


Interested in Learning More About Open-Source Technologies? 

Here at Strive Consulting, our subject matter experts team up with you to understand your core business needs, while taking a deeper dive into your organization’s growth strategy. Whether you’re interested in modern data integration or an overall data and analytics assessment, Strive Consulting is dedicated to being your partner, committed to success. Learn more about our Data & Analytics practice HERE.

Contact Us

5 Key Concepts in Design Thinking for Visual Analytics

Design Thinking for visual analytics is a proven framework that puts the needs of end users at the forefront of development, enabling organizations to fail-fast, iterate, and design analytical solutions that can scale with excellence and sustain with improvement. Design thinking enables organizations to introduce agile ways of working, data fluency, value creation, and storytelling with data.

Some key concepts involved in Design Thinking:

  • Visualization & KPIs
  • Personas
  • User Journey Mappings
  • Conceptual Data Modeling
  • Wire-framing/Prototyping

Visualization & KPIs  

Visual analytics are essential for enabling users to take action and make data-driven decisions through insights and storytelling. Although visualization is at the forefront of most reporting products, there is a broad spectrum of needs and analytics use cases across any business, all of which are important. Visualizations are great, but a visualization is only effective if there is clear alignment on KPIs and how they can be leveraged. Developing KPIs is both an art and a science. The objective is to identify measures that can meaningfully communicate accomplishment of key goals and drive desired behaviors. Every KPI should relate to a specific business outcome with a performance measure. Unfortunately, KPIs are often confused with business metrics. Although often used in the same spirit, KPIs need to be defined according to critical or core business objectives, while metrics can provide insights that may feed into KPIs. KPIs and metrics can be at the organizational level and trickle down to other functional areas, as seen in the example below with Sales. Therefore, defining personas is a good exercise to understand the different needs of users across an organization.
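To illustrate the KPI-versus-metric distinction in code, the sketch below derives a KPI (a lead-to-close conversion rate measured against a target) from two underlying metrics. All the figures and names are hypothetical.

```python
# Metrics: raw operational counts (hypothetical figures).
leads = 1200
closed_deals = 180

# KPI: a measure tied to a core business objective, with a target.
conversion_rate = closed_deals / leads
TARGET = 0.12  # hypothetical target: 12% conversion

kpi = {
    "name": "Lead-to-close conversion rate",
    "value": round(conversion_rate, 3),
    "target": TARGET,
    "on_track": conversion_rate >= TARGET,
}
print(kpi)
```

The point is the structure, not the numbers: the metrics (`leads`, `closed_deals`) feed the KPI, while the KPI alone carries the target and the judgment of whether the business objective is being met.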


Persona Development 

What is a Persona?

A persona is a fictional character, rooted in research, that represents the needs and interests of your customer. It is created to represent a segment of users that might leverage a product in a similar way. Personas facilitate visualization of the user and create empathy with users throughout the design process.

Why do we want Personas?

Developing personas helps us understand how different users within an organization leverage analytics. This is integral to designing user-centric applications that are organized around users’ needs, with relevant content combined from different sources.

What is a good persona?

A good persona accurately represents the experience of a single user or a group of users, taking into consideration the users’ needs (to formulate requirements), the design context, and the complexity of situational behaviors.

By developing personas and understanding the needs of users, we can leverage different approaches to design analytics to guide the user through their desired experience. Whether it’s creating actionable KPIs to measure performance/progress, or enabling the user to self-serve through a guided analytics experience, understanding their analytical needs will help drive the design of the solution.

User Journey 

One of the underlying principles of design thinking is putting the user’s needs first when designing and developing applications. Building empathy for the user by mapping moments of frustration and delight throughout their analytical journey will help formulate the best experience possible.

Conceptual Data Model  

A conceptual data model is a visual representation of concepts and rules that convey the makeup of a system or organization. Conceptual data models are key for defining, organizing, and prioritizing concepts and informational needs of the business.


Wire-framing/Prototyping

Humans are innately visual creatures and often struggle to articulate their needs. Wireframes and prototypes are visual representations that define the experience with reporting and analytics, visually depicting the requirements or needs of a user in preparation for development.

What is it good for?

  • Makes Things Tangible: Helps with visualizing the concept and engaging stakeholders around a product vision
  • Enables Collaboration: Customer/User feedback can be taken into consideration before development begins.
  • Saves Time: Increases the speed of shared understanding and provides guidance to the development team
  • Supports User Testing: Supports usability test iterations to get insight from actual users of the product.

What is the process of wireframing?

A good iterative design process increases fidelity at each step to get closer to the final product that satisfies the needs of a user.

Helpful Tips for wireframing:

  • You don’t have to be an artist.
  • Keep it simple – sketches and wireframes are meant to convey information.
  • Short, sharp annotations – drawings and sketches help articulate ideas, but annotations, explanations, and callouts are necessary to explain functionalities and concepts.
  • Encourage feedback – feedback is necessary to iterate, refine, and improve the design and to engage stakeholders around the product vision.


Design thinking is a framework that can be applied to almost every user-centric application. The biggest value an organization can recognize by instilling design thinking principles is understanding the needs and empathy of users as they begin to adopt analytics to enable a data-driven culture. If you’re curious how design thinking can be applied to your organization’s visual analytics and products, Strive has proven, strategic offerings to help you achieve your desired goals.

Interested in Design Thinking? What about Data & Analytics?

You’re in luck! Strive Consulting helps companies compete in a data-driven world. We turn information into insight through powerful analytics and data engineering, and our Data & Analytics specialists create new opportunities for our clients out of untapped information tucked away across your business. Whether it’s capturing more market share or identifying unmet customer needs, effectively mining and utilizing the data available to you will help you make faster, more informed decisions and keep pace with today’s rapidly changing business world. Click here to learn more about our D&A practice.


How Snowflake Saved My Life

Let me tell you a story of how the Snowflake Data Cloud saved my life. I know it sounds dramatic, but just hear me out.

A few years ago, I worked with a multi-billion-dollar wholesale distributor that had never implemented a data warehouse. Their main goal? Consolidate all data into one location and enable KPI roll-ups from across their disparate systems. However, in this case, they did not want to invest in additional licensing. So, my team set about building a traditional data warehouse leveraging their current platform, SQL Server. Initially, it was a successful four-layer architecture with Staging, Consolidation, Dimensional Model, and Tabular Cubes, with the end visualization solution being Power BI… but within a few months, issues began to surface.

The number of sources feeding into this platform had increased dramatically, and this increase started to impact load times. Initially, the batch load processes ran between two and three hours, but over time they took 5, 6, sometimes 7 hours to run! We needed a long-term solution, but in the short term we had to keep the platform running to deliver data to the organization.

What we were experiencing were challenges with constraints, indexing, locks, fragmentation, and more. To mitigate these issues, I personally took the step of waking up every morning at 3:00AM to log in and ensure certain process milestones completed successfully and on time. If those milestones were not achieved, the batch process would either stall, fail, or run excessively long, and the last thing I wanted was to explain to the business why they were not going to have data until 9:00, 10:00, 11:00AM. After a couple weeks of doing this, it became apparent – we needed a better solution, and fast!

In the past, I had some experience with Big Data platforms, but I decided to research options outside of established technologies such as Cloudera or Hadoop-based solutions and instead looked into something new – Snowflake. Snowflake is a cloud data platform where organizations can access, share, and maintain their data at scale, so I thought, why not? Let’s give it a shot!

We set up a proof of concept, initially trying to mimic the 4-layer architecture we had set up in SQL Server. After seeing limited success, as well as being laughed at for even trying it, we took a step back, reevaluated our approach, and flipped the architecture from ‘Extract Transform Load’ to ‘Extract Load Transform’. And… Eureka! With this change, we were able to reduce overnight batch runtimes from the 5, 6, 7 hours on SQL Server to less than 20 minutes. In fact, our average runtime for our load processes was around 17 minutes, but now I’m just showing off.
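The ETL-to-ELT flip described above can be sketched in miniature: land the raw records first, then transform inside the engine with SQL, rather than transforming in application code before loading. In the sketch below SQLite stands in for the warehouse, and the records are made up.

```python
import sqlite3

# Raw, untransformed records land first (the "E" and "L" of ELT).
raw_rows = [("2021-01-05", "  Widget ", 3), ("2021-01-06", "Gadget", 5)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_sales (sale_date TEXT, product TEXT, qty INTEGER)")
conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", raw_rows)

# The transform (the "T") runs inside the engine, where compute can scale.
conn.execute("""
    CREATE TABLE sales AS
    SELECT sale_date, TRIM(product) AS product, qty
    FROM raw_sales
""")
products = [r[0] for r in conn.execute("SELECT product FROM sales ORDER BY product")]
print(products)  # ['Gadget', 'Widget']
```

On a platform like Snowflake, that in-engine transform step is exactly where the separated, elastic compute pays off, which is why the flip cut the batch window so dramatically.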

Not only did this have an incredible effect on our ability to deliver data in a timely manner, but it also enabled an increase in the frequency with which we processed data. You see, with SQL Server we were never able to update data more than once a day, but with Snowflake, we could run the batch process every 20 minutes and quickly deliver requested changes to the models, measures, and dimensions.

The implementation process went from taking weeks to taking days, or even hours, resulting in some very happy stakeholders. With these results, coupled with the fact that I no longer had to wake up at 3:00AM to verify successful batch processes…Snowflake truly saved my life.

Want to learn more about the Snowflake Data Cloud? 

Strive Consulting is a business and technology consulting firm, and proud partner of Snowflake, with direct experience helping our clients understand and maximize the benefits the Snowflake Data Platform presents. Our team of experts can work hand-in-hand with you to determine if leveraging Snowflake is right for your organization. Check out Strive’s additional Snowflake thought leadership here.

About Snowflake

Snowflake delivers the Data Cloud – a global network where thousands of organizations mobilize data with near-unlimited scale, concurrency, and performance. Inside the Data Cloud, organizations unite their siloed data, easily discover and securely share governed data, and execute diverse analytic workloads. Join the Data Cloud at SNOWFLAKE.COM.