Correlation Vs Causation

By now we have probably all heard the old adage, “Correlation does not equal causation.” But what does this mean for the field of data science? Businesses often try to solve complex problems with machine learning, but machine learning is not always the best tool, especially for evaluating interventions. Machine learning has many valuable applications, but the relationships between variables that it finds are correlations, not causations. So, if you just need an accurate output without needing to understand the underlying factors driving that output, machine learning may be for you! If instead you are trying to evaluate a business decision or action and the impact it had on revenue or other key metrics, what you really want to understand is the causal relationship between your intervention and the resulting outcome. That analysis is better suited to causal inference, which I will demo in this blog.

Let’s suppose that you work for Volusia County government in Florida (the shark attack capital of the world). One of your tasks is to reduce the incidence of shark attacks that occur on Volusia beaches. A data analyst is giving a presentation and shows a chart of ice cream sales plotted alongside shark attacks:

Additionally, the analyst has an algorithm that can predict shark attacks with ~95% accuracy using ice cream sales as one of the predictor variables. You wonder how knowing this information and having an algorithm helps you reduce the incidence of shark attacks. Furthermore, one of your coworkers exclaims, “That’s it! We should ban the sale of ice cream on our beaches! Clearly it is causing shark attacks!” Immediately, you are skeptical. It doesn’t seem like ice cream would have any impact on shark attacks, and your instincts are correct: something else is occurring here. The answer lies within confounding variables. Ice cream consumption and the incidence of shark attacks both occur more often in warmer temperatures, since people swim at the beach when it is warm outside. The confounding variable here is the temperature outside. A confounding variable is any variable that you’re not investigating that can potentially affect the outcomes of your research study, and it is exactly the reason why correlation does not equal causation!

So, back to the issue at hand: how do we reduce the incidence of shark attacks? One hypothesis is that increasing the number of lifeguards on duty would allow sharks to be spotted sooner, so we could get people out of the water faster, before they are attacked. So, you want to know whether the increase in the number of lifeguards last summer was the reason shark attacks went down, because if it was, you could further reduce shark attacks by securing funding to hire more lifeguards. But how do we make sure that we account for possible confounding variables and that the observed decrease in shark attacks wasn’t due to chance? Enter causal inference to save the day!

There are multiple ways to control for confounding variables. Three regularly used methods are:

  1. Back-door criterion
  2. Front-door criterion
  3. Instrumental variables

The back-door and front-door criteria come from Judea Pearl’s do-calculus, which you can read about in his book, “Causality: Models, Reasoning and Inference.” Instrumental variables were introduced as early as 1928 by Philip Wright and are frequently used in econometrics.

Back-door Criterion

This method requires that there are no hidden confounding variables in or outside of the data. In other words, we cannot have any variables that influence both the intervention and the outcome that we haven’t controlled for. It’s not always possible to rule out every possible confounding variable, but with proper hypotheses, we can be reasonably certain.

Figure 1. Temperature is a confounding variable in this scenario. It is correlated to the intervention and causally related to the outcome. We would use the back-door criterion in this case to control for it.
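
To make the back-door adjustment concrete, here is a minimal sketch using the open-source DoWhy library on a simulated beach-season dataset. The column names, effect sizes, and workflow are illustrative assumptions, not output from a real Volusia County analysis.

```python
import numpy as np
import pandas as pd
from dowhy import CausalModel

# Simulate a toy beach-season dataset (illustrative numbers only).
rng = np.random.default_rng(42)
n = 500
temperature = rng.normal(85, 7, n)                                       # confounder
extra_lifeguards = (temperature + rng.normal(0, 5, n) > 88).astype(int)  # intervention
shark_attacks = (
    0.3 * temperature                 # hot days bring more swimmers, hence more attacks
    - 2.0 * extra_lifeguards          # the true causal effect we hope to recover
    + rng.normal(0, 1, n)
)
df = pd.DataFrame({
    "temperature": temperature,
    "extra_lifeguards": extra_lifeguards,
    "shark_attacks": shark_attacks,
})

# Declaring temperature as a common cause tells DoWhy to close the back-door path.
model = CausalModel(
    data=df,
    treatment="extra_lifeguards",
    outcome="shark_attacks",
    common_causes=["temperature"],
)
identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
print(identified_estimand)  # lists temperature in the back-door adjustment set
```

If temperature were left out of the common causes, the back-door path would stay open and a naive comparison would mix the effect of lifeguards with the effect of hot weather.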

Front-door Criterion

You can have a hidden confounding variable with this method as long as you have a third, mediating variable that mediates the effect of the intervention on the outcome, and that mediating variable is not impacted by the confounding variable (e.g., level of alertness is a mediating variable between the intervention, lack of sleep, and the outcome, academic achievement).

Figure 2. In this scenario, we don’t need to control for temperature since we have a mediating variable. The number of lifeguards determines how many lifeguard stands there will be, and temperature does not impact how many stands there are. We can use the front-door criterion to measure the impact of the number of lifeguards on the incidence of shark attacks.

Instrumental Variables

You can also have a confounding variable as long as you have a third variable (an instrument) that is correlated with the intervention, affects the outcome only through the intervention, and is not impacted by the confounding variable (e.g., if you want to know the effect of classroom size on test scores, you would need to find a variable that is highly correlated with classroom size, affects test scores only through classroom size, and is not impacted by the confounding variable of school funding and resources). Such variables can be hard to come by.

Figure 3. If you had a variable Z that was correlated with the number of lifeguards, affected shark attacks only through the number of lifeguards, and was not directly impacted by temperature, you could use the instrumental variables method to measure the impact of lifeguards on shark attacks.

ATE and CATE

At this point we should take a step back and understand ATE, CATE, and counterfactuals. Oftentimes, we don’t just want to know whether the intervention was statistically significant and successfully caused the outcome we are interested in. We also want to know the magnitude, or by how much, our intervention caused an outcome. In our shark attack example, we would want to know how many shark attacks we prevented by increasing the number of lifeguards. This is the ATE, or Average Treatment Effect. If we wanted to know how our intervention impacted different beaches, we would use the Conditional Average Treatment Effect (CATE), which is simply the average treatment effect for a subset of the population. The ATE is calculated by taking the difference between the outcome with the intervention and the outcome without the intervention. So, in this example, we would take the difference between the outcome of increasing lifeguards and the outcome of not increasing lifeguards. But if we can only give one intervention at a time (we can’t simultaneously increase and not increase lifeguards), how can we know the outcome of the intervention that the beach did not get? This is where counterfactuals come in. Counterfactuals are things that did not happen, but could have happened (e.g., Joe got the treatment and recovered in 10 days; the counterfactual is the outcome Joe would have had without the treatment). I will not get into the weeds of how this is calculated, but at a high level it is estimated using covariates.
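
As a toy illustration of the definition, the sketch below simulates both potential outcomes for every beach, something we can never observe in practice, and averages the difference. All numbers are made up; real estimators have to approximate the missing counterfactual from covariates.

```python
import numpy as np

rng = np.random.default_rng(0)
n_beaches = 1_000

# Potential outcomes: attacks for each beach WITHOUT and WITH extra lifeguards.
attacks_without = rng.poisson(lam=6.0, size=n_beaches)
attacks_with = np.maximum(attacks_without - rng.binomial(3, 0.7, size=n_beaches), 0)

# ATE = average difference between the two potential outcomes.
ate = (attacks_with - attacks_without).mean()
print(f"ATE: {ate:.2f} attacks per beach")

# A CATE conditions on a subgroup, e.g. the busiest beaches.
busy = rng.random(n_beaches) > 0.5
cate_busy = (attacks_with[busy] - attacks_without[busy]).mean()
print(f"CATE for busy beaches: {cate_busy:.2f}")
```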

Estimates

Once we know whether we need the ATE or the CATE, and we know which method we are using to control for confounding variables, we can identify the method we will use to calculate our estimate. Typically, if we have low dimensionality/complexity in our data, we can use simple methods like matching, stratification, propensity score matching, inverse propensity weighting, and the Wald estimator. If we want to calculate the CATE or we have high-dimensional/complex data, we can use more advanced ML methods such as Double ML, Orthoforests, T-Learners, X-Learners, and Intent-to-Treat DRIV. Back-door methods we could use include linear regression, distance matching, propensity score stratification, propensity score matching, and inverse propensity weighting. Instrumental variable methods include the Wald estimator and regression discontinuity. If the front-door criterion is met, we could use a two-stage regression. This is not an exhaustive list, but a sample of the methods we could use to calculate the ATE.
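
Here is a minimal estimation sketch with DoWhy under the same toy back-door setup as above; the method_name string is where you choose among the estimators listed here, and every name and number below is an illustrative assumption.

```python
import numpy as np
import pandas as pd
from dowhy import CausalModel

# Toy data: temperature confounds both the intervention and the outcome.
rng = np.random.default_rng(1)
n = 2_000
temperature = rng.normal(85, 7, n)
extra_lifeguards = (temperature + rng.normal(0, 5, n) > 88).astype(int)
shark_attacks = 0.3 * temperature - 2.0 * extra_lifeguards + rng.normal(0, 1, n)
df = pd.DataFrame({"temperature": temperature,
                   "extra_lifeguards": extra_lifeguards,
                   "shark_attacks": shark_attacks})

model = CausalModel(data=df, treatment="extra_lifeguards",
                    outcome="shark_attacks", common_causes=["temperature"])
identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)

# A simple back-door estimator; other identification strategies map to strings
# such as "iv.instrumental_variable" or "frontdoor.two_stage_regression".
estimate = model.estimate_effect(identified_estimand,
                                 method_name="backdoor.linear_regression")
print("Estimated ATE:", estimate.value)  # should land near the simulated effect of -2
```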

Refutation

Once we have calculated the ATE or the CATE, we have one more step to perform. This is the refutation step. Refutation tests check the robustness of the estimate. This is essentially a validation test that looks for violations in our assumptions when we calculated our estimates. Some refutation tests we can do to check the strength of our causal relationship are:

  • Adding a random cause variable to see if that significantly changes the ATE/CATE
  • Replacing the intervention with a random (placebo) variable to see if that significantly changes the ATE/CATE
  • Removing a random subset of the data to see if that significantly changes the ATE/CATE

If these refutation tests come back insignificant (p-values above .05), then the estimated causal effect is likely robust and the intervention is likely truly causal.
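
The three checks above map onto DoWhy’s built-in refuters. A hedged sketch, reusing the same toy setup as the estimation example (all names and numbers are illustrative):

```python
import numpy as np
import pandas as pd
from dowhy import CausalModel

# Same toy setup as the estimation sketch above.
rng = np.random.default_rng(2)
n = 2_000
temperature = rng.normal(85, 7, n)
extra_lifeguards = (temperature + rng.normal(0, 5, n) > 88).astype(int)
shark_attacks = 0.3 * temperature - 2.0 * extra_lifeguards + rng.normal(0, 1, n)
df = pd.DataFrame({"temperature": temperature,
                   "extra_lifeguards": extra_lifeguards,
                   "shark_attacks": shark_attacks})
model = CausalModel(data=df, treatment="extra_lifeguards",
                    outcome="shark_attacks", common_causes=["temperature"])
estimand = model.identify_effect(proceed_when_unidentifiable=True)
estimate = model.estimate_effect(estimand, method_name="backdoor.linear_regression")

# The three refutation checks described in the bullets above.
for refuter in ("random_common_cause",        # add a random cause variable
                "placebo_treatment_refuter",  # replace the intervention with a placebo
                "data_subset_refuter"):       # re-estimate on random subsets of the data
    print(model.refute_estimate(estimand, estimate, method_name=refuter))
```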

Summary

Causal inference is much better suited than machine learning to many of the problems businesses face. Furthermore, it allows us to identify and quantify a causal relationship between our intervention and the outcome we observe. Going a step beyond identifying correlations to identifying causal relationships can be a very impactful exercise. If we know that increasing lifeguards is not just associated with a reduction in shark attacks, but actually causes that reduction, it gives us a direct action we can take to achieve our goal of reducing shark attacks: hire more lifeguards.

Interested in Causal Inference?

Strive’s Data & Analytics team can help you identify causal relationships in your business. Want to know if the marketing strategy rolled out in Q3 caused an increase in customers and revenue in Q4? Would you like to know if implementing a more robust PTO policy could decrease employee churn? No matter what causal question you have, we are happy to help! Our team of data analysts and data scientists are uniquely positioned to help you take action that will allow your business to reach its goals and beyond! Let us uncover valuable insights that will help your company today!

Contact Us

No Phishing in the Data Lake

How to Find and Mitigate Security Risks in Large Data Storage

For any business with a data strategy in place, the next step on the roadmap to data transformation is to capture all the structured and unstructured data flowing into the organization. To do so, organizations must create a data lake to store data from IoT devices, social media, mobile apps, and other disparate sources in a usable way.

What is a Data Lake?

A data lake, per AWS, is a centralized repository that allows an organization to store data as is, without having to first structure the data, and run different types of analytics—from dashboards and visualizations to big data processing, real-time analytics, and machine learning to guide better decisions.

Data lakes differ from data warehouses in that data warehouses are like libraries. As data comes into a warehouse, it gets carefully filed according to a structured system that has been defined in advance, making it easy and quick to find exactly what you’re looking for given a specific request. In data lakes, there’s no defined schema, which means data can be stored without needing to know what questions may require answers in the future. As in an e-bookstore, you can search generally, pull all relevant results from various media types, and make decisions based on machine learning recommendations and other people’s insights.

Many organizations are evolving their data storage to incorporate data lakes. However, maintaining any online information storage comes with security risks that must be identified and mitigated.

Security Risks and Consequences Within Data Lakes

Over the past few decades, improvements in compute power and storage space, coupled with much more affordable storage prices, have made it possible to store massive amounts of data in one place. Not long ago, storing a database of every citizen’s Social Security number would have been impractical; now it’s pennies on the dollar to store as a table in a data lake.

As much opportunity as large data storage provides organizations, it also creates risk. When vulnerabilities occur in repositories, their infrastructure, or any dependencies, the level of impact depends on the type and scale of the information that was compromised. Since data lakes hold vast amounts of data in a single location, when breaches occur, the impact is often enormous.

Common tactics hackers use to exploit enterprise data are Initial Access, Defense Evasion, and Credential Access. Kurt Alaybeyoglu, Senior Director of Cybersecurity and Compliance at Strive Consulting, says organizations often make a mistake by focusing too strongly on preventing Initial Access—a cybercriminal getting into the org’s network. Data lakes interact with so many sources that it doesn’t take network access to be able to cause damage.

“The two primary security risks in a data lake,” Kurt says, “are exfiltration of, and impact to, sensitive data.” As the name suggests, data exfiltration is the unauthorized transfer of data. Attackers can either steal specific pieces of data or, more often, simply take a copy of an entire lake, akin to a burglar carrying away a safe so they can open it and rifle through its contents at their leisure. Data impact ranges from encrypting the data in the lake to wiping it, corrupting it, or destroying the means of access to the platform.

Both tactics can be, and have proven to be, catastrophic for an organization’s survival.

Are Data Lakes Worth the Risk?

Facing such dire consequences in the event of a cyberattack, why do businesses choose to use data lakes? Conventional wisdom says not to keep all your eggs in one basket—compartmentalizing data to avoid total compromise is surely more secure. But for many, according to Kurt, the rewards of data lakes outweigh the risks.

“Being able to access massive data at your fingertips with simple queries is what allows modern apps to exist,” he explains. “Take Uber as an example. Uber, as a technology, completely disrupted the taxi service model. It got rid of the need for dispatchers because at its heart was software that acted as one, pairing users and drivers faster than most humans can. Their software functions because Uber created a data lake that contains information like riders, drivers, maps, payment information, etc., that allows all of these disparate aspects to function seamlessly.”

While separating this data into different repositories might be more secure, it would take significantly longer for the application to function, from running all the queries to processing payments to calculating ride times; it would completely undercut the app’s usefulness. Not to mention the added complexity would make securing the data just as difficult, if not more so.

“As security professionals, we have to try to mitigate those risks as best as possible,” he says. “At the end of the day, data security is a business function. Our job is to say ‘yes, we can do that, but here are the risks.’ Leaders must decide what they’re willing to pay to mitigate, what they’ll pay to transfer, and what risks they’re willing to accept.”

3 Ways to Prevent Security Breaches in the Data Lake

What makes data lakes so risky is that the valuable commodity, data, by necessity must be accessible, whether that’s to a platform, an end user, or someplace else. The data must be available in order to be useful. So, an organization’s top three focus points to protect that data are as follows:

  1. Rigorous access control: More people with unfettered access to the data lake means more potential entry points for a hacker to attempt to exploit. To secure the data lake, be thoughtful about who can access it and when. Validate those users’ identities using strong passwords and multi-factor authentication (MFA). If the data lake contains particularly sensitive information, consider more advanced hardware solutions such as FIDO2 keys.
  2. Regular vulnerability scanning and testing: Because data lakes and supporting platforms aren’t tied to a single device, hackers no longer need to achieve initial access to get ahold of the data. For most applications that interact with data lakes, a successful breach may only take a SQL or command injection that forces the system to respond with data it’s not supposed to—no device compromise needed. Because of that risk, proactively looking for the holes in a data lake’s security is paramount. Use a combination of application threat modeling, vulnerability scans, and application penetration testing to identify weak points, then remediate them quickly.
  3. Better detection through better training: “Data lakes are examples of what modern storage/compute allows us to do,” Kurt says. “We haven’t put the same level of effort and value into collecting audit logs to be able to make detection and analytics earlier in the cyberattack chain possible.” The answer? Staffing and training. Proactive threat detection comes from a skillset that knows what to investigate. “How do I collect audit logs from the platform? What logs should I collect? How do I determine when someone has accessed the data versus what’s just noise? That investigative mindset and skillset is in high demand and low supply,” says Kurt.

His suggestion to overcome the talent gap: companies that rely on data lakes should build detection skillsets from within. It’s easier to train a person who is already well-versed in the inner workings of the organization’s platform to build data security than it is to bring in a security generalist to learn the org’s data lake.

The advantage of training an internal employee is that they have the full view of the data product roadmap, which means they can start developing future updates on the platform that build security in from the ground up. That’s security by design—the brass ring of risk management in a data lake.

Where exploitable data exists, opportunists will try to access it. Data lakes provide organizations an incomparable ability to un-silo work, answer new questions by drawing information from diverse sources, and innovate technology that creates the next apex experience. For that reason, businesses must level up their investment in data security in concert with their investment in data storage and usability. On the data roadmap, that’s the ultimate step toward data transformation.

Protect your data. Protect your business.

Learn more about Strive’s cybersecurity services HERE, or set up a Launch Future State of Data Workshop to create your 1-3-year data vision plan HERE.

What is Tech Debt and why you need to pay it down

As an executive, you know to keep a close eye on company finances, including whether, when and how much debt the company can handle yet remain fiscally healthy.

Unfortunately, most businesses do not track tech debt and the negative impacts it has on their business. As a result, many businesses make the mistake of believing that managing their technical debt is a simple matter of good housekeeping, a “keep the lights on” operation isolated to their IT departments.

Nothing could be further from the truth.

Technical debt is anything and everything that slows down or hinders development.

Hindered development leads to problems that cascade through the entire company, slowing down improvements in every part of the business with a stake in that development. This often means the *entire* company. Tech debt will delay the delivery of every new feature.

That’s not the worst of it.

Like other kinds of debt, tech debt accrues interest and the problems it causes compound over time. It’s like a tax that’s added to everything. Companies with an unacceptable level of tech debt pay the price with slower time to market, bloated production costs, higher IT support costs, and a high level of complexity that draws IT’s time away from more transformative work. Companies with a lot of tech debt also experience high churn rates among their developers, who face frustrating limits on the work they can do because of that debt.

Such companies often add more developers, erroneously thinking that if they just turn out more code, they can keep pace with the new features and functions they need to compete. Or they spend increasing amounts of time and IT budget in a game of Whack-A-Mole as they try to fix all the problems that keep popping up.

To help explain the issue, we sometimes equate tech debt to old factory equipment with broken bits that aren’t ever replaced. The factory owner can add more workers to the production line and tack on more pieces to the equipment, but neither approach fixes the underlying mechanical problems. Neither approach makes the equipment run right, and neither will help the factory produce more or better widgets. Ultimately, the only way to make that factory efficient and agile is to fix the broken parts.

It’s the same with technical debt.

But here’s the other challenge with technical debt: In most organizations, it’s something that’s invisible, unmeasured and unquantified. As a result, it typically remains unaddressed.

In fact, most organizations don’t have the slightest idea how much debt they carry. That’s why both young and old companies can see product and engineering costs way above what’s optimal: They have more technical debt than they even know.

As an executive, you may think it’s time to clear the technical books and get rid of all the tech debt that exists within your company. That, though, isn’t the goal. The reality is that there’s always going to be some amount of it.

Instead, here at Strive, we recommend that you tackle the problem by first understanding how much tech debt you have and then whether it’s an acceptable amount or whether it’s so much it’s slowing you down.

Tech debt fits into the overall set of methodologies for investing in and borrowing against technology in order to be agile and move quickly. We have been developing a model to quantify tech debt. It allows us to identify and measure the financial impact of that debt, including the opportunities it takes off the table and the business risks it introduces.

This approach helps everyone on the executive team and the board understand the importance of paying down its technical debt.

In addition to offering this standardized model for assessing and measuring the ongoing cost of technical debt, we advise clients on ways to pay down that debt and how to design, build and implement debt-free alternatives moving forward.

We know this approach is effective. We worked with one client, a services company, to measure its tech debt and quantify how it was impacting its agility. Executives there were able to see how resolving software bugs and other problems in existing code would unblock product teams and make them both faster and more responsive to evolving market needs. So instead of accumulating more debt, they actually saw better returns.

Companies that have paid down their tech debt and now have a good credit score are well positioned to compete against any upstarts that enter their market. They also require fewer resources to produce superior outcomes and to produce them quickly.

That’s a message the board and your shareholders will want to act on – especially now, as companies increasingly feel the pressure to be as efficient as possible in the face of a constricting economy.

Do you have Tech Debt?

Here at Strive, we take pride in our Technology Enablement practice, where we can assist you in understanding and mitigating your organization’s tech debt. Our subject matter experts team up with you to understand your core business needs, while taking a deeper dive into your organization’s growth strategy. Click to learn more about our Technology Enablement capabilities and how we can help.

An Ode to Project Managers

Today is an opportunity to recognize advancements in Project Management, as well as the increased role it plays in corporate environments.

There have always been Project Managers, even before there was a formal title or official responsibilities. It is interesting to note, however, that the formalization of the ‘Project Manager’ role, and the specific knowledge areas it covers, came to fruition in the 1950s and accelerated with the introduction of the personal computer and microprocessors in the 1980s. As more and more people became users, the demand for new functionality, and for the position, increased. More competitors meant faster deployment to stay ahead of the marketplace! Increased competition also raised quality requirements: if your software didn’t function well, a competitor’s would. Furthermore, the introduction of the cloud and virtual replacements for things that used to be physical meant projects moved faster and with an expectation of quality that had never been seen before. Enter the “PM”.

Project Management in our current environment requires not only faster delivery, but also identifying the best projects to benefit an organization and their customers. This drive has spawned a multitude of delivery methodologies and highlighted the need for formal approval processes and prioritization within companies. Project Management is the natural group to take on the prioritization and therefore Project Management Offices (PMO) have grown in significance within corporations. Many companies employ the PMO to own the process of intake and approval and when these projects are aligned with corporate strategy, that prioritization process may determine whether the company meets its goals for the year or not. This expanded corporate impact and influence means Project Management is now an integral part of corporate success.

Project Managers as a whole are organized, cool under pressure, and have impeccable collaboration and communication skills. They are innate leaders in their organizations and have the technical expertise to ebb and flow between multiple happenings at once. Great negotiation skills and the ability to stand up and manage multiple teams are incredibly important in any organization, and luckily, Project Managers do just that. Any consulting company worth its salt is filled with PMs and has a knack for enabling and training them as well.

On this International Project Management Day be proud of the impact you make on your respective companies and clients. Continue to grow your leadership skills and always be on the lookout for improvements to ensure Project Management remains relevant in an ever-changing landscape.

Happy Project Management Day to all the members of the Project Management community!

Learn More About Strive’s Project Management Capabilities

Our Management Consulting teams roll up their sleeves and utilize their expertise, best practices, and firsthand experience to optimize business processes, deliver on strategic initiatives, launch new products, and create sustainable growth for our clients. We know there’s a job to do and we’re eager to do it.

In an era of rapidly changing business and market requirements, the right kind of thought leadership and experience is needed to guarantee customer success. Our Delivery Leadership experts work with key stakeholders to devise, plan, and execute your organization’s most transformational projects and programs. Our Delivery Leadership services include Portfolio, Program, and Project Management. Strive helps your organization better manage strategic initiatives by identifying and implementing a set of processes, methods, and technologies best suited to meeting your organization’s business objectives.

Adventures in Snowflake Cost Management

Pay for use is both an exciting and challenging aspect of using the Snowflake Data Cloud. I’ve led workshops and proudly proclaimed, “Snowflake has brought an end to capacity planning!” And it has. You never have to figure out how much storage or processing power you are going to need. You don’t have to plan for three-year storage needs and hope that you haven’t bought too little. It used to be a constant dance, but no more. With Snowflake you can just add whatever data you need and only pay for what you are using. The same is true for query processing power. When Black Friday hits, you have power on demand, and yet you’re not paying for that power all year long.

Now budget planning? That’s a different story… Typically, you will have bought a certain size machine to run your database or contracted for a certain amount of cloud capacity, and whether you use it a little or a lot, you pay the same. When you see your Snowflake costs skyrocket, you’ll start to think about usage in ways you never had to before.

Here are some tips for being more efficient with your Snowflake spend.

Think Small, Run Big

Thinking time and development time should be done on an x-small or small compute warehouse. When it comes time to run a job or a long query, that’s when you spin up a larger warehouse, run the job, and then shut the warehouse down. You have capacity on demand, so you will want to size your warehouse to optimize cost both in what Snowflake charges and in human capital. Why wait for 2 hours on a long job when you can run it in 15 minutes by using a warehouse 8 times the size? For the most part, you’ll see run times cut in half and the cost doubled at each size up. So, it’s cost neutral to use a bigger warehouse but saves human cost.

Sometimes even the Snowflake cost is lower when you run a more expensive, larger warehouse. How so? If the compute warehouse is too small, it may have to spill data to local or even remote storage. Disk is a lot slower than RAM. When you use a larger warehouse, you also get more RAM. Thus, the query or load can complete so much faster that you save more than the extra cost of running large.
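
Here is a minimal sketch of that pattern with the Snowflake Python connector: size up right before the heavy job, then size back down and suspend the moment it finishes. The warehouse, procedure, and connection details are placeholders, not a prescription.

```python
import snowflake.connector

# Placeholder credentials; in practice pull these from a secrets manager.
conn = snowflake.connector.connect(
    account="my_account", user="my_user", password="...",
    warehouse="TRANSFORM_WH", database="ANALYTICS", schema="PUBLIC",
)
cur = conn.cursor()
try:
    # Develop and test on XSMALL; scale up only for the heavy run.
    cur.execute("ALTER WAREHOUSE TRANSFORM_WH SET WAREHOUSE_SIZE = 'XLARGE'")
    cur.execute("CALL LOAD_DAILY_INVENTORY()")  # the long-running job (hypothetical procedure)
finally:
    # Scale back down and suspend so the meter stops as soon as the job is done.
    cur.execute("ALTER WAREHOUSE TRANSFORM_WH SET WAREHOUSE_SIZE = 'XSMALL'")
    cur.execute("ALTER WAREHOUSE TRANSFORM_WH SUSPEND")
    cur.close()
    conn.close()
```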

One Expandable Warehouse for All

It is typical for companies to assign each team or business unit its own warehouse. It’s one of the ways companies manage cost charge-back. However, it’s inefficient to have multiple warehouses with their meters running up charges when a single shared warehouse will do. To handle spikes in demand, you set it up as a multi-cluster warehouse that will spawn additional clusters when there is demand and shrink back when demand goes away. You can then use roles or tags to divvy up the shared cost across those using the warehouse.
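
A sketch of what that shared, elastic warehouse could look like (multi-cluster warehouses require Enterprise edition or above; the name and limits below are illustrative):

```python
import snowflake.connector

DDL = """
CREATE WAREHOUSE IF NOT EXISTS SHARED_BI_WH
  WAREHOUSE_SIZE    = 'MEDIUM'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 4          -- extra clusters spin up only under concurrent load
  SCALING_POLICY    = 'STANDARD'
  AUTO_SUSPEND      = 60         -- seconds of idle time before the meter stops
  AUTO_RESUME       = TRUE
"""

# Placeholder credentials.
conn = snowflake.connector.connect(account="my_account", user="my_user", password="...")
conn.cursor().execute(DDL)
conn.close()
```

Cost charge-back can then be handled with roles or object tags rather than separate warehouses, as described above.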

Break Large Load Files Into Many Smaller Ones

Snowflake is a massively parallel database. Each node in a Snowflake warehouse cluster has 8 processes. A Large warehouse has 8 nodes, so 64 processes. If you try to load a single large file, only one of those processes is used. If you break the file up (Snowflake recommends 100-250 MB chunks), then all of the processes work in parallel, rocketing your loading performance.
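
A quick sketch of pre-chunking a large CSV into roughly Snowflake-friendly pieces before staging them; the 200 MB target and file names are arbitrary choices for illustration.

```python
import csv

CHUNK_BYTES = 200 * 1024 * 1024  # target roughly 100-250 MB per chunk

def split_csv(path: str, prefix: str) -> None:
    """Write the source rows into numbered chunk files of roughly CHUNK_BYTES each."""
    with open(path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader)
        part, out, written = 0, None, CHUNK_BYTES  # force a new file on the first row
        for row in reader:
            if written >= CHUNK_BYTES:
                if out:
                    out.close()
                part += 1
                out = open(f"{prefix}_{part:04d}.csv", "w", newline="")
                writer = csv.writer(out)
                writer.writerow(header)
                written = 0
            writer.writerow(row)
            written += sum(len(field) for field in row) + len(row)  # rough byte count
        if out:
            out.close()

split_csv("inventory_full.csv", "inventory_chunk")  # hypothetical file names
```

Each chunk can then be PUT to a stage and loaded with a single COPY INTO, so that all of the warehouse’s processes pull files in parallel.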

Judicious Use of Cluster Keys

Snowflake builds micro-partitions when data is loaded. For 90% of scenarios, you can just let Snowflake do its thing and you will get great performance. This is one of the reasons Snowflake is so cost effective: it doesn’t take an army of tuning DBAs to operate. However, there are going to be times when you will need to put a cluster key on a table to get the performance you need. And poorly performing queries cost extra money.

For example, a 40-billion-row table was joined to a 3-billion-row table in a view, and it brought reporting to its knees. Clustering both tables on the join keys enabled the report to run in less than 2 minutes. For more information on clustering, see Snowflake’s documentation.

Lift and Shift Still Needs Tuning

One of the most common mistakes is to assume that “if it worked in the old system, it should work in Snowflake.” You will encounter performance issues (and thus cost issues) whose solution will not lie in adjusting Snowflake warehouses.

Here are just some recent tuning scenarios I’ve encountered:

There was a data load that was costing $500 to $1,500 per day. Eight billion rows of inventory were loaded each day: every item in every store across the world was scanned. The loading procedure used a MERGE, so 8 billion searches to find the right row and update the data. And yet, there was no history; once the merge happened, the current value was the only value. Thus, a merge wasn’t needed at all. In effect, the table was a daily snapshot of inventory, and the data coming in was all that was needed. Removing the merge took the process from 8 hours on a very expensive 64-node Snowflake warehouse to a couple of minutes on a 32-node warehouse. A savings of $15k to $30k per month was realized.

Just because “the query worked on XYZ database” doesn’t mean everything is okay. A very expensive and long running query on Snowflake was fixed by discovering a cartesian join. When all the proper keys were added to the join, the query ran fast.

Oftentimes in mature systems there are views built upon views built upon views. A slow report sent me spelunking through the “view jungle.” I discovered that one of the views had a join to a table from which no fields were used, plus a DISTINCT. At half a billion rows, this unnecessary join, and thus unnecessary DISTINCT, caused the performance problem.

The takeaway is that a good deal of the work is taking a fresh look at the problem and not taking “the old system” as gospel for the new system.

Monitor the Spend

Snowflake provides views to help you monitor cost and performance. They are located in the SNOWFLAKE database in the ACCOUNT_USAGE schema. If you have multiple accounts, the combined values are in the ORGANIZATION_USAGE schema. There are prebuilt Tableau, Power BI, Sigma, and other dashboards you can download. There is no substitute, however, for getting familiar with the views themselves.
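
For example, here is a hedged sketch of pulling the last 30 days of credit consumption by warehouse out of ACCOUNT_USAGE (connection details are placeholders):

```python
import snowflake.connector

QUERY = """
SELECT warehouse_name,
       DATE_TRUNC('day', start_time) AS usage_day,
       SUM(credits_used)             AS credits
FROM   snowflake.account_usage.warehouse_metering_history
WHERE  start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY 1, 2
ORDER BY credits DESC
"""

conn = snowflake.connector.connect(account="my_account", user="my_user", password="...")
for warehouse, day, credits in conn.cursor().execute(QUERY):
    print(f"{day} {warehouse}: {credits:.2f} credits")
conn.close()
```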

Strive is a proud partner of Snowflake!

Strive Consulting is a business and technology consulting firm and proud partner of Snowflake, with direct experience in query usage and helping our clients understand and maximize the benefits the Snowflake Data Platform presents. Our team of experts can work hand-in-hand with you to determine if leveraging Snowflake is right for your organization. Check out Strive’s additional Snowflake thought leadership HERE.

ABOUT SNOWFLAKE

Snowflake delivers the Data Cloud – a global network where thousands of organizations mobilize data with near-unlimited scale, concurrency, and performance. Inside the Data Cloud, organizations unite their siloed data, easily discover and securely share governed data, and execute diverse analytic workloads. Join the Data Cloud at SNOWFLAKE.COM.

What is the Spatial Web and why should your company care?

First things first. What is the Spatial Web?    

Technically, it is a three-dimensional computing environment that combines layers of information gathered from countless geo-located connected devices to create seamless, immersive experiences. If that sounds a lot like the Metaverse, you’re right. The two terms are often used interchangeably. However, I prefer the name Spatial Web over Metaverse, believing the former is both more accurate and more descriptive.

Whichever term you use, the best way to understand it is to experience it. The good news is you can do that right now with your smartphone!  

Head outside for a short walk using Google Maps with Live View enabled. What you will see is an Augmented Reality (AR) view of your walking path. The traditional map view sits at the bottom of the screen, while the main screen area uses the camera and places icons and directional information intelligently on top of the real-world camera view.

This is the Spatial Web! It’s cool on the smartphone, but imagine if all sunglasses magically projected this as a layer and you did not have to hold the phone at arm’s length as you walk. This is a great way to start imagining what the Spatial Web will be in the near future as new devices seamlessly integrate into our lives. 

Ok, so why should your company care? 

The Spatial Web is coming. There’s no question about that. In fact, many of the component elements such as AR, VR, and the Internet of Things are already in use across multiple sectors. 

The gaming industry and the entertainment sector are the most visible leaders in harnessing the Spatial Web. And Facebook (now Meta) has become, perhaps, the most well-known proponent of it. 

But forward-thinking organizations in many other industries are piloting Spatial Web projects that demonstrate the expanse of potential use cases. 

There are good reasons for pursuing those use cases, too, with pilots already delivering benefits and good returns on investments. 

Let’s take an example where a complex, critical system is down, say a wind turbine, and a mechanic is working on it. The mechanic could use AR-enabled goggles to pull up instructions that guide on-site repairs. Or that mechanic could use the goggles to see design schematics that, using artificial intelligence, help pinpoint problems. The mechanic could also use those AR-enabled, internet-connected goggles to collaborate with an engineer from the manufacturer, with the engineer able to see in real time exactly what the mechanic sees and does on the machine.

Such capabilities are already here and improving all the time, giving us a glimpse of what’s on the horizon.  

So, what’s ahead? A future where the Spatial Web will simply be part of how we live, work and engage. 

When that day arrives, these metaverse-type technologies will feel like an extension of yourself, just as smartphones have become ever-present ubiquitous tools that constantly inform, guide and connect us.  

And when that time comes, seeing someone wearing smart glasses will be the norm, not the exception. 

The timeline for that future state is years away. Gartner, a tech research firm, has predicted that widescale adoption of metaverse technologies is a decade away.

There are, for sure, technical hurdles that need to be overcome on this Spatial Web journey. 

There have been concerns, for example, about the heat generated from the compute processing in smart glasses, the battery life in connected devices and the vertigo some suffer when using virtual reality. 

But tech companies are working on those issues, and it’s only a matter of time before they have them worked out. After all, they have the incentive to do so as there’s existing market demand for these technologies. 

And we’re already seeing tech companies deliver big advances. They are developing audio technologies to ensure immersive audio experiences. They’re maturing haptic technology, or 3D touch, so you’ll be able to actually feel those actions happening in a virtual world. Some companies are trying to do the same thing with smell. 

These technologies will work with existing ones, such as geolocational tech, sensors, artificial intelligence, 5G and eventually 6G, to instantaneously deliver layers of information to users. 

While a fully-realized Spatial Web is still years away, you shouldn’t wait to start making plans for how you will harness its potential; you can’t wait to think about your strategy until everybody starts buying connected glasses. 

If you do, then you’ll already be behind. And if you wait too long, you’ll miss out. 

The reality today, right now, is that you will have to respond to the Spatial Web as it evolves and as it delivers new ways for organizations and individuals to interact. 

This new technology-driven realm will enable increasingly frictionless services to consumers, seamless B-to-B services and new potential applications that some are already starting to imagine. 

Here at Strive, we are exploring the component technologies that collectively make up the emerging Spatial Web. 

And we’re partnering with clients to envision their Spatial Web strategies, outline the infrastructure and skills they’ll need, and devise the optimal business cases to pursue – all so they’re ready to move as the technologies mature and the Spatial Web moves into the mainstream.

Connect with Strive! 

Here at Strive Consulting, our subject matter experts team up with you to understand your core business needs, while taking a deeper dive into your organization’s growth strategy. Whether you’re interested in the Spatial Web or an overall Technology Enablement assessment, Strive Consulting is dedicated to being your partner, committed to your success.

Contact Us


How to Modernize A Data Strategy Approach

Modernizing your company’s data strategy can be a daunting task. Yet making this change — and doing it right — has never been more important, with torrents of data now dictating much of the day-to-day in many organizations.

Missing the boat on making this change now can hold your business back in meaningful ways down the line. Changing your approach to capturing, sharing, and managing your data can help you avoid many of the pitfalls that befall businesses today, such as duplicating data across the organization and processing overlaps.

Implementing an effective data strategy will enable you to treat data not as an accidental byproduct of your business, but an essential component that can help you realize its full potential. Setting out clear, company-specific targets will help you tackle these challenges effectively.

Before you embark on this journey, however, it is crucial to understand why you want to modernize and where you are now and identify the most efficient path to the finish line.

Strategic Vision – Future of Your Data

The first step is to define a vision for your own data modernization. Do you know why you want to modernize your data strategy and what your business can gain in the process? Do you have an aligned strategy and a clear idea of what a thriving data ecosystem will entail?

Defining your goals — whether that is to gain a better grasp of your data, enhance accuracy or take specific actions based on the insights it can provide — is paramount before initializing this process.

Equally essential is to ensure early on that executive leadership is on board, since overhauling your data strategy will require significant investment in time and resources. This will be needlessly difficult without full buy-in at the very top. Figuring out how better data management will tie in with your overall business strategy will also help you make your case to leadership.

Ways of Working – Operating Model

Next, you need to figure out how this modernization will take place and pinpoint how your operating structure will change under this new and improved system.

Setting out ahead of time how data engineers and data scientists will work with managers to shepherd this strategy and maintain it in the long run will ensure a smooth process and help you avoid wasting time and resources.

Identifying what your team will look like and gathering the required resources to implement this project will lead you directly into implementation.

Accessibility & Transparency — See the Data

Gaining access and transparency, at its core, is about implementing new systems so that you gain better visibility of the data you have. You want to make sure that your structured and unstructured content — and associated metadata — is identifiable and easy to access and reference.

Putting the infrastructure in place to ingest the data your business already creates, and format it in a way that lets you access it efficiently, might appear basic. But figuring out how to achieve this through data integration or engineering is a vital step and getting this wrong can easily jeopardize the entire project.

Data Guardianship — Trust the Data

Once you have brought your data to the surface, determining ownership within your organization will ensure both that accuracy is maintained, and that data is managed and updated within the correct frameworks. 

This includes applying ethical and data-sharing principles, as well as internal governance and testing, so that you can ensure your data is always up to date and handled responsibly. Making sure that you can trust the data you are seeing is essential to securing the long-term benefits you are hoping to gain through data modernization in the first place.

Plus, you can rest easy knowing that your reporting data is accurate instead of worrying about falling foul of external compliance metrics and other publication requirements.

Data Literacy — Use the Data

Tying back to your internal data management, literacy is all about making sure that you have the right skillsets in place to make savvy use of the insights you are gaining from your data.

You and your team need to make sure you are trained and equipped to handle this process both during implementation and once your new system is in place — so you can leverage the results in the best possible way and make it easier to access and share data throughout the company.

After all, making secure financial and operational decisions will depend on how much you trust in your own core capabilities. Ideally, a successful data management strategy will enable you to understand every part of your business. This applies not just internally, but also spans your customers, suppliers and even competitors.

Take the First Step with Strive

Our experts at Strive Consulting are here to help you assess whether you are ready to embark on this journey and provide you with a clear perspective of where you are, what’s involved, and how to get there. We are ready to walk you through this process and make sure the final product ends up in the right place, so you can be confident that your data is in safe hands — your own. Learn more about Strive’s Data & Analytics and Management Consulting practices HERE.

Contact Us

3 Ways to Improve Collaboration in the Remote / Hybrid World

We’ve all been in that meeting. There are probably too many people invited, the agenda is vague, 90% of folks are remote and off camera (10% of those are probably folding laundry or doing some other household chore while they listen in). Then you hear the inevitable words, “Let’s brainstorm this.” Two or three enthusiastic participants end up doing most of the talking, and the conversation takes on a circular quality until a senior leader or manager tries to stop the swirl by making a suggestion of their own. Everyone else falls in line, and the meeting moves on to the next arduous loop. After over an hour, you’re left wondering: what did we actually accomplish?

It’s time to face facts that collaborating in the remote and hybrid world requires different ways of working together. The natural structures of the office that act as palaces of accountability, collaboration, and innovation have been replaced by impersonal video calls in far flung home offices.

Current trends would suggest that the reality of remote and hybrid work isn’t about to end anytime soon, but there are things you can do now to make your virtual meeting time more efficient and enjoyable.  

Here are 3 tips to start improving your virtual meetings.

  • Never start from scratch: structure and visual starting points are always in style

One thing that most people dislike is uncertainty. People are more willing to engage when they know what to expect and feel confident their time will be spent effectively. By providing agendas, objectives, and materials to review before the meeting, you’ll prime your audience or colleagues to start thinking about topics you want to discuss (even if subconsciously) and get better feedback and participation when the time comes.   

About 65% of people are visual learners. Having visual aids to help provide context and bring people up to speed quickly will always supercharge your meeting efficiency, particularly in virtual settings, where you can’t draw pretty pictures on whiteboards. Having something to react to will always elicit more effective feedback and progress than trying to start from scratch.

  • Use design thinking techniques to unlock diversity in brainstorming sessions

Idea generation and innovation are some of the most difficult things to accomplish in a virtual setting. At Strive, we often leverage ideation techniques from design thinking practices such as affinity mapping, mind mapping, or SCAMPER (among many others) to make brainstorming more enjoyable and participatory for your team. By introducing individual brainstorming and voting principles within these techniques, you’re able to increase participation and better democratize decision making. Using these methods will help your team quickly align around creative ideas everyone can get excited about. Turning boring meetings into fun workshops helps bring some variety and intrigue to your colleague’s days – and they’ll thank you for it.

  • Leverage a virtual collaboration tool like Miro™

Virtual whiteboarding tools have made huge improvements since being thrust into the limelight during the COVID-19 pandemic. We leverage tools like Miro™ with many of our clients to help facilitate engaging workshops, capture requirements, and build relationships. The infinite virtual whiteboard canvas helps you capture some of the magic previously only possible when working in person. On top of the real-time collaboration these tools enable, they also provide access to a universe of templates and ideas for creative ways to effectively facilitate a variety of meeting types (including the design thinking techniques mentioned above).

Whatever your role may be in the corporate world, meetings are practically unavoidable, but with these tips and tools, you can become the meeting hero that saves your team from boring and unproductive virtual meetings.

Strive has become an expert at virtual collaboration… Need help?

Here at Strive, we take pride in our Management Consulting practice, where we can assist you from your initial digital product development needs all the way through to completion. Our subject matter experts team up with you to understand your core business needs, while taking a deeper dive into your company’s growth strategy.

An Example of a Living Data Mesh: The Snowflake Data Marketplace

The enterprise data world has been captivated by a new trend: Data Mesh. The “What Is Data Mesh” articles have already come out, but in this publication, I want to highlight a live, in production, worldwide Data Mesh example – The Snowflake Data Marketplace.

As with every “new thing” that comes down the pike, people will change the definition to suit their purposes and point of view, and I am no different. Zhamak Dehghani, a Director of Emerging Technologies at ThoughtWorks, writes that Data Mesh must contain the following shifts:

  • Organization: From central controlled to distributed data owners. From enterprise IT to the domain business owners.
  • Technology: It shifts from technology solutions that treat data as a byproduct of running pipeline code to solutions that treat data and code that maintains it as one lively autonomous unit.
  • Value: It shifts our value system from data as an asset to be collected to data as a product to serve and delight the data users (internal and external to the organization).
  • Architecture: From central warehouses and data lakes to a distributed mesh of data products with a standardized interface. 

It is on this last principle that I depart and advocate for the Snowflake Data Cloud. I believe that the advantages that have always come with a centralized data store can be retained, while the infinite scale of Snowflake’s Data Cloud facilitates the rest of the goals behind Data Mesh.

There is a lot to understand about the new paradigm and its benefits, and even grasping what an up-and-running Data Mesh would look like is hard; to date, even simplified overview articles are lengthy. As I wrestled with coming to my own understanding of Data Mesh, and how Strive could bring our decades of successful implementations in all things data, software development, and organizational change management to bear, I was struck by a simple notion: there is already a great example of a successfully implemented, worldwide, multi-organization Data Mesh, the Snowflake Marketplace.

There are more than 1,100 data sets from more than 240 providers available to any Snowflake customer. The data sets from the Marketplace become part of the customer’s own Snowflake account, yet are managed and kept up to date by the providers. No ETL needed and no scheduling. When providers update their data, it is updated for all subscribers. This is the definition of “data as a product.”

In effect, the Snowflake Data Cloud is the self-service, data-as-a-platform infrastructure, and the Snowflake Marketplace is the discovery and governance tool within it. Everyone that has published data into the Marketplace has become a product owner and delivered data as a product.

We can see the promised benefit of the Snowflake Marketplace as a Data Mesh in this: massive scalability. I’m not speaking of the Snowflake platform’s near-infinite scalability, impressive as that is, but of how every team publishing data into the Marketplace has been able to do so without the cooperation of another team. None of the teams that have published data have had to wait in line for their priorities to bubble up to the top of IT’s agenda. A thousand new teams can publish data today. A hundred thousand new teams can publish their data tomorrow.

This meets the organizational shift from centralized control to decentralized domain ownership, the value shift to data as a product, and, technically, the shift to data and the code that maintains it living together as one unit.

Data consumers can go to market and find data that they need, regardless of which organization created the data. If it’s in the Snowflake Marketplace, any Snowflake customer can use the data for their own needs. Each consumer of the data will bring their own compute, so that nobody’s use of the data is impacting or slowing down the performance of another team’s dashboards.
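
As a concrete illustration: once a Marketplace listing is added to your account, it shows up as just another database, and each consuming team queries it with its own warehouse. A hedged sketch with placeholder names for the shared database, table, and warehouse:

```python
import snowflake.connector

# Placeholder names: the shared database comes from a Marketplace listing,
# while the warehouse (and its cost) belongs to the consuming team.
conn = snowflake.connector.connect(
    account="my_account", user="analyst", password="...",
    warehouse="MARKETING_TEAM_WH",
)
cur = conn.cursor()
cur.execute("""
    SELECT postal_code, observation_date, max_temperature_f
    FROM   weather_share_db.public.daily_history      -- provider-owned, always current
    WHERE  observation_date >= DATEADD('day', -7, CURRENT_DATE())
""")
for row in cur.fetchmany(10):
    print(row)
conn.close()
```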

Imagine that instead of weather data published by AccuWeather and financial data published by Capital One, it’s your own organization’s customer, employee, marketing, and logistics data. Each data set is owned by the business team that creates the data. They are the team that knows the data best. They curate, cleanse, and productize the data themselves. They do so on their own schedule and with their own resources. That data is then discoverable and usable by anyone else in the enterprise (gated by role-based security). Imagine that you can scale as your business demands, as new businesses are acquired, as ideation for new products occurs. All facilitated by IT, but never hindered by IT as a bottleneck.

With Snowflake’s hyper scalability and separation of storage and compute, and its handling of structured, semi-structured, and unstructured data, it’s the perfect platform to enable enterprise IT to offer “data as self-serve infrastructure” to the business domain teams. From there, it is a small leap to see how the Snowflake Data Marketplace is, in fact, a living example of a Data Mesh with all the benefits realized in Zhamak Dehghani’s papers.

As a data practitioner with over three decades of experience, I am as excited today as ever to see the continuous evolution of how we get value out of data and deal with the explosion in data types and volumes. I welcome Data Mesh and the innovations it promises, along with Data Vault 2.0 and hyper-scale cloud databases like Snowflake, which provide the scale and speed to value that today’s data environment demands.

Strive is a proud partner of Snowflake!

Strive Consulting is a business and technology consulting firm and proud partner of Snowflake, with direct experience in query usage and helping our clients understand and maximize the benefits the Snowflake Data Platform presents. Our team of experts can work hand-in-hand with you to determine if leveraging Snowflake is right for your organization. Check out Strive’s additional Snowflake thought leadership HERE.

ABOUT SNOWFLAKE

Snowflake delivers the Data Cloud – a global network where thousands of organizations mobilize data with near-unlimited scale, concurrency, and performance. Inside the Data Cloud, organizations unite their siloed data, easily discover and securely share governed data, and execute diverse analytic workloads. Join the Data Cloud at SNOWFLAKE.COM.

Migration to the Cloud Needs Experienced Help

Executives are already sold on the benefits of moving to the cloud. They know that they need cloud computing to be agile, fast, and flexible; they know cloud allows them to successfully compete in this digital era.

Yet, many enterprise leaders struggle to advance their cloud strategies, with plenty of companies still working to migrate away from on-premise applications and out of their own data centers.

Here at Strive Consulting, we aren’t surprised by such reports: We know that cloud migration comes with numerous significant challenges…. and research backs that up.

Consider the figures from the 2022 State of the Cloud Report from the software company Flexera.

It found that understanding application dependencies is the no. 1 challenge to cloud migrations, with 53% of respondents listing this as a pain point.

Other top challenges include assessing technical feasibility, assessing on-premise vs. cloud costs, right-sizing/selecting best instance, selecting the right cloud provider, and prioritizing the applications to migrate.

Such challenges deter and derail many cloud migration plans.

Many companies don’t have the technical skills they need to address those specific challenges to move their cloud strategies forward, as their staff has, understandably, been trained and focused on supporting their on-premise and legacy systems.

On a similar note, organizations don’t have in-house workers with the experience required to analyze and assess all the available cloud options and to select the best architecture for current and future needs.

As a result, companies slow-walk – or outright put off – their cloud migrations. Or they move forward as best they can, only to realize that they need to redo their work when their new cloud infrastructure fails to yield the financial or transformational benefits they expected.

Those scenarios demonstrate why companies need an experienced hand when they migrate to the cloud and why they need people who can advise them on the right architecture for their own specific environment and their industry’s unique needs.

At Strive, we understand the myriad cloud options – from serverless, containers and virtual machines to infrastructure-as-a-service, platform-as-a-service, and software-as-a-service. We understand the nuances and requirements associated with each choice, the strategic reasons that would make one better than another, how they work together, and the supporting pieces needed to optimize each one’s performance.

Take virtual machines, for example. Going that route requires the creation of automation scripts to spin up and turn off based on use. Companies without much experience or expertise in virtual machines may overlook this critical component and, thus, end up with infrastructure that doesn’t deliver on its objectives.

Companies find that this is often the case, particularly when they’re embarking on their own.

In fact, selecting the wrong cloud option and implementing suboptimal cloud infrastructure are two of the leading reasons for poor outcomes and failed initiatives.

When we partner with companies to advance their cloud adoption, we start by understanding their own unique environment, their enterprise needs, and any industry-specific requirements that could impact their choices around cloud.

We work with our clients to determine whether, for example, they want to modernize by re-architecting their systems and using platform-as-a-service.

Whether the right move is shifting everything as is to the cloud.

Whether going with IaaS or SaaS provides the features, functions, and cost benefits they’re looking for.

Whether and when to go with hybrid, multi-cloud, multitenant, private, or public cloud.

Or whether it’s better to go the serverless route, leveraging features like containers, so they’re not paying for consumption when apps aren’t in use.

We help clients understand the financial implications of their cloud strategy decisions, and we build monitoring tools to track both performance and consumption, so they can detail what they’re using and how much that usage costs. We know from experience that finance departments are particularly interested in that information. But we also see how it benefits IT leaders, who want to allow their developers the freedom to innovate, but still want visibility into the resources being used and at what cost.

We also know from experience the importance of building a cloud environment that’s both secure and scalable, with automation in place to build that infrastructure over and over so organizations can easily build up and tear down as often as needed.

Furthermore, we advise companies on the change management that’s required to successfully migrate to the cloud. As such, we work with developers and engineers to help them understand new processes and to support them as they develop the expertise they’ll need to maintain, manage, and eventually mature an organization’s cloud strategy.

There’s one more point I want to address: Strive knows that a cloud migration plan is not just about technology, that it’s also – and, in fact, more so – about what the technology can do for the business.

The right cloud environment enables companies to pivot quickly. Companies can rapidly and cost effectively create or adopt new functions or test and tweak proof of concepts because they can spin up and wind down computing resources.

All of this enables faster time to market with products and services and an overall more responsive organization.

Our experienced teams help clients achieve that kind of transformation by helping them design and implement the right cloud infrastructure to support those bigger objectives.

Thinking about Migrating to the Cloud? Strive can help!

We take pride in our Technology Enablement practice, where we can assist your organization with all of your cloud enablement needs. Our subject matter experts team up with you to understand your core business needs, while taking a deeper dive into Platform Assessment, Platform Migration, and even Platform Modernization.

Contact Us