Correlation Vs Causation

By now we have probably all heard the old adage, “Correlation does not equal causation.” But what does this mean for the field of data science? Businesses often try to solve complex problems with machine learning, but machine learning is not always the best tool, especially for evaluating interventions. Machine learning has many valuable applications, but the relationships it finds between variables are correlations, not causal relationships. So, if you just need an accurate prediction without needing to understand the underlying factors driving it, machine learning may be for you! If, instead, you are trying to evaluate a business decision or action and the impact it had on revenue or other key metrics, what you really want to understand is the causal relationship between your intervention and the resulting outcome. That kind of analysis is better suited to causal inference, which I will demo in this blog.

Let’s suppose that you work for Volusia County government in Florida (the shark attack capital of the world). One of your tasks is to reduce the incidence of shark attacks that occur on Volusia beaches. A Data Analyst is giving a presentation and shows the following chart:

Additionally, the analyst has an algorithm that can predict shark attacks with ~95% accuracy using ice cream sales as one of the predictor variables. You wonder how knowing this information and having an algorithm helps you reduce the incidence of shark attacks. Then one of your coworkers exclaims, “That’s it! We should ban the sale of ice cream on our beaches! Clearly it is causing shark attacks!” Immediately, you are skeptical. It doesn’t seem like ice cream would have any impact on shark attacks, and your instincts are correct. Something else is going on here, and the answer lies in confounding variables. Ice cream consumption and shark attacks both rise when it is warm outside, since that is when people buy ice cream and swim at the beach. The confounding variable here is the temperature. A confounding variable is any variable that you’re not investigating that can affect both the intervention and the outcome of your study, and it is exactly the reason why correlation does not equal causation!
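
To see how a confounder manufactures a correlation, here is a minimal simulation in Python (all numbers are made up for illustration): ice cream sales and shark attacks are each driven by temperature and have no causal link to each other, yet they appear strongly correlated until temperature is accounted for.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    n = 1_000
    temperature = rng.normal(80, 10, n)                    # the confounder
    ice_cream = 5.0 * temperature + rng.normal(0, 20, n)   # driven by temperature
    attacks = 0.2 * temperature + rng.normal(0, 2, n)      # also driven by temperature

    df = pd.DataFrame({"temperature": temperature, "ice_cream": ice_cream, "attacks": attacks})
    print(df.corr().round(2))    # ice_cream and attacks look strongly correlated

    # Remove temperature's influence from both; the apparent relationship disappears.
    ice_resid = ice_cream - np.poly1d(np.polyfit(temperature, ice_cream, 1))(temperature)
    att_resid = attacks - np.poly1d(np.polyfit(temperature, attacks, 1))(temperature)
    print(round(float(np.corrcoef(ice_resid, att_resid)[0, 1]), 2))   # roughly zero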

So back to the issue at hand: how do we reduce the incidence of shark attacks? One hypothesis is that increasing the number of lifeguards on duty would allow sharks to be spotted more quickly, so we could get people out of the water faster – before they are attacked. So, you want to know whether the increase in the number of lifeguards last summer was the reason shark attacks went down, because, if it was, you could further reduce shark attacks by securing funding to hire more lifeguards. But how do we make sure that we account for possible confounding variables and that the observed decrease in shark attacks wasn’t due to chance? Enter causal inference to save the day!

There are multiple ways we can control for confounding variables. Three methods that are regularly used are:

  1. Back-door criterion
  2. Front-door criterion
  3. Instrumental variables

The back-door and front-door criteria come from Judea Pearl’s do-calculus, which you can read about in his book, “Causality: Models, Reasoning and Inference.” Instrumental variables were introduced as early as 1928 by Philip Wright and are frequently used in econometrics.

Back-door Criterion

This method requires that there are no hidden confounding variables in or outside of the data. In other words, we cannot have any variables that influence both the intervention and the outcome that we haven’t controlled for. It’s not always possible to rule out every conceivable confounder, but with careful domain knowledge and well-reasoned hypotheses, we can be reasonably confident.

Figure 1. Temperature is a confounding variable in this scenario. It is correlated to the intervention and causally related to the outcome. We would use the back-door criterion in this case to control for it.
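
To make the back-door idea concrete, here is a hedged sketch on simulated data (the variable names and effect sizes are invented; statsmodels is just one convenient way to run the adjustment). The naive comparison makes extra lifeguards look harmful, because hot days bring both more lifeguards and more swimmers; controlling for temperature recovers the true effect of about -2 attacks.

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(1)
    n = 5_000
    temp = rng.normal(80, 10, n)
    lifeguards = (temp + rng.normal(0, 10, n) > 85).astype(int)    # more guards scheduled on hot days
    attacks = 0.3 * temp - 2.0 * lifeguards + rng.normal(0, 2, n)  # true effect of extra guards: -2

    # Naive difference in means is badly biased by the confounder.
    print(round(attacks[lifeguards == 1].mean() - attacks[lifeguards == 0].mean(), 2))

    # Back-door adjustment: include the confounder and read off the treatment coefficient.
    X = sm.add_constant(np.column_stack([lifeguards, temp]))
    print(round(sm.OLS(attacks, X).fit().params[1], 2))            # close to -2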

Front-door Criterion

You can have a hidden confounding variable with this method as long as there is a third, mediating variable that carries the effect of the intervention to the outcome and is not itself impacted by the confounding variable (Ex: level of alertness is a mediating variable between the intervention, lack of sleep, and the outcome, academic achievement).

Figure 2. In this scenario, we don’t need to control for temperature since we have a mediating variable. The number of lifeguards determines how many lifeguard stands there will be, and temperature does not impact how many stands there are. We can use the front-door criterion to measure the impact of the number of lifeguards on the incidence of shark attacks.
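
Under the front-door criterion, the effect can be recovered with the two-stage regression mentioned in the Estimates section below. Here is a minimal sketch on simulated data (continuous, invented quantities; the hidden confounder is never used by the estimator):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    n = 20_000
    u = rng.normal(0, 1, n)                                  # hidden confounder we never measure
    lifeguards = 2.0 * u + rng.normal(0, 1, n)               # intervention, confounded by u
    stands = 1.5 * lifeguards + rng.normal(0, 1, n)          # mediator, untouched by u
    attacks = -0.8 * stands + 3.0 * u + rng.normal(0, 1, n)  # true effect of lifeguards: 1.5 * -0.8 = -1.2

    # Stage 1: effect of the intervention on the mediator.
    a = sm.OLS(stands, sm.add_constant(lifeguards)).fit().params[1]
    # Stage 2: effect of the mediator on the outcome, holding the intervention fixed.
    b = sm.OLS(attacks, sm.add_constant(np.column_stack([stands, lifeguards]))).fit().params[1]
    print(round(a * b, 2))                                   # about -1.2, despite the hidden confounder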

Instrumental Variables

You can also have a hidden confounding variable as long as you have a third variable – the instrument – that is correlated with the intervention, affects the outcome only through the intervention, and is not impacted by the confounding variable (Ex: if you want to know the effect of classroom size on test scores, you would need to find a variable that is highly correlated with classroom size, has no direct impact on test scores, and is not impacted by the confounding variable of school funding and resources). Good instruments can be hard to come by.

Figure 3. If you had a variable Z that was correlated with the number of lifeguards, affected shark attacks only through the lifeguards, and was not directly impacted by temperature, you could use the instrumental variables method to measure the impact of lifeguards on shark attacks.
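
A hedged sketch of the instrumental-variable idea using the simple Wald estimator (simulated data; z is a hypothetical instrument, say a staffing-budget rule that shifts lifeguard counts but has no direct path to shark attacks):

    import numpy as np

    rng = np.random.default_rng(3)
    n = 50_000
    u = rng.normal(0, 1, n)                                      # hidden confounder
    z = rng.normal(0, 1, n)                                      # instrument: affects attacks only through staffing
    lifeguards = 1.0 * z + 1.0 * u + rng.normal(0, 1, n)
    attacks = -2.0 * lifeguards + 2.0 * u + rng.normal(0, 1, n)  # true effect: -2

    naive_slope = np.polyfit(lifeguards, attacks, 1)[0]            # biased by the confounder
    wald = np.cov(attacks, z)[0, 1] / np.cov(lifeguards, z)[0, 1]  # Wald / IV estimate
    print(round(naive_slope, 2), round(wald, 2))                   # roughly -1.33 vs. -2.0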

ATE and CATE

At this point we should take a step back and understand ATE, CATE, and counterfactuals. Oftentimes, we don’t just want to know whether the intervention was statistically significant and successfully caused the outcome we are interested in; we also want to know the magnitude – by how much our intervention changed the outcome. In our shark attack example, we would want to know how many shark attacks we prevented by increasing the number of lifeguards. This is called the ATE, or Average Treatment Effect. If we wanted to know how our intervention impacted different beaches, we would use the Conditional Average Treatment Effect (CATE), which is simply the average treatment effect for a subset of the population. The ATE is calculated by averaging the difference between the outcome with the intervention and the outcome without the intervention. So, in this example, we would take the difference between the outcome of increasing lifeguards and the outcome of not increasing lifeguards. But if we can only give one intervention at a time (we can’t simultaneously increase and not increase lifeguards), how can we know the outcome of the intervention that the beach did not get? This is where counterfactuals come in. Counterfactuals are outcomes that did not happen but could have happened (Ex: Joe got the treatment and recovered in 10 days; the counterfactual outcome is how long Joe would have taken to recover without the treatment). I will not get into the weeds of how counterfactual outcomes are estimated, but at a high level they are estimated using covariates.
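
A quick illustrative sketch (toy numbers) of what the ATE, the CATE, and counterfactuals mean in terms of potential outcomes. Here we cheat and generate both potential outcomes for every beach-week, which is exactly what we never get to observe in practice:

    import numpy as np

    rng = np.random.default_rng(4)
    n = 10_000
    # Potential outcomes for each beach-week: attacks WITHOUT and WITH the extra lifeguards.
    y0 = rng.poisson(5, n)                            # outcome if not treated
    y1 = np.maximum(y0 - rng.binomial(3, 0.7, n), 0)  # outcome if treated (a few attacks prevented)

    print(round((y1 - y0).mean(), 2))                 # ATE: average attacks prevented per beach-week

    # A CATE is the same average within a subgroup, e.g., only the busiest beaches.
    busy = rng.random(n) < 0.3
    print(round((y1[busy] - y0[busy]).mean(), 2))

    # In real data only one potential outcome is observed per unit; the other is the counterfactual,
    # which is why we need the identification strategies above plus an estimator.
    treated = rng.random(n) < 0.5
    observed = np.where(treated, y1, y0)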

Estimates

Once we know whether we need the ATE or the CATE and which method we are using to control for confounding variables, we can identify the method we will use to calculate the effect. Typically, if we have low dimensionality/complexity in our data, we can use simple methods like matching, stratification, propensity score matching, inverse propensity weighting, and the Wald estimator. If we want to calculate the CATE or we have high-dimensional/complex data, we can use more advanced ML methods such as Double ML, Ortho Forests, T-Learners, X-Learners, and Intent-to-Treat DRIV. Back-door methods we could use include linear regression, distance matching, propensity score stratification, propensity score matching, or inverse propensity weighting. Instrumental variable methods include the Wald estimator and regression discontinuity. If the front-door criterion is met, we could use a two-stage regression. This is not an exhaustive list, but a set of potential methods we could use to calculate the ATE.
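
To make one of these estimators concrete, here is a hedged sketch of inverse propensity weighting on simulated data (scikit-learn for the propensity model; names and effect sizes are invented):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(5)
    n = 20_000
    temp = rng.normal(80, 10, n)
    p_treat = 1 / (1 + np.exp(-(temp - 80) / 5))                # hot days are more likely to get extra guards
    treated = (rng.random(n) < p_treat).astype(int)
    attacks = 0.3 * temp - 2.0 * treated + rng.normal(0, 2, n)  # true ATE: -2

    # 1. Model the propensity to be treated from the confounder(s).
    ps = LogisticRegression().fit(temp.reshape(-1, 1), treated).predict_proba(temp.reshape(-1, 1))[:, 1]

    # 2. Inverse-propensity-weighted difference in means (Horvitz-Thompson form).
    ate = np.mean(treated * attacks / ps) - np.mean((1 - treated) * attacks / (1 - ps))
    print(round(ate, 2))                                        # close to -2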

Refutation

Once we have calculated the ATE or the CATE, we have one more step to perform: the refutation step. Refutation tests check the robustness of the estimate. This is essentially a validation step that looks for violations of the assumptions we made when calculating our estimate. Some refutation tests we can run to check the strength of our causal relationship are:

  • Adding a random cause variable to see if that significantly changes the ATE/CATE
  • Replacing interventions with random (placebo) variables to see if that significantly changes the ATE/CATE
  • Removing a random subset of the data and seeing if that significantly changes the ATE/CATE

If these refutation tests come back insignificant (p-values above .05) – that is, none of these checks meaningfully changes the estimate – then we can be more confident that the intervention truly caused the outcome.
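
Libraries such as DoWhy bundle these refuters, but the logic is simple enough to sketch by hand. Reusing the simulated back-door example from above (again, invented numbers), a shuffled placebo treatment should show roughly no effect, and a random subset of the data should give a similar estimate:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(6)
    n = 5_000
    temp = rng.normal(80, 10, n)
    guards = (temp + rng.normal(0, 10, n) > 85).astype(int)
    attacks = 0.3 * temp - 2.0 * guards + rng.normal(0, 2, n)

    def backdoor_ate(treatment, outcome, confounder):
        X = sm.add_constant(np.column_stack([treatment, confounder]))
        return sm.OLS(outcome, X).fit().params[1]

    estimate = backdoor_ate(guards, attacks, temp)

    placebo = rng.permutation(guards)              # placebo refuter: shuffled "treatment" should show ~no effect
    subset = rng.choice(n, n // 2, replace=False)  # subset refuter: estimate should be stable on random halves

    print(round(estimate, 2),
          round(backdoor_ate(placebo, attacks, temp), 2),
          round(backdoor_ate(guards[subset], attacks[subset], temp[subset]), 2))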

Summary

Causal inference is much better suited than machine learning for many of the problems businesses face, because it allows us to identify and quantify the causal relationship between an intervention and the outcome we observe. Going a step beyond identifying correlations to identifying causal relationships can be a very impactful exercise. If we know that increasing lifeguards is not just associated with a reduction in shark attacks but actually causes that reduction, it gives us a direct action we can take to achieve our goal of reducing shark attacks (hire more lifeguards).

Interested in Causal Inference?

Strive’s Data & Analytics team can help you identify causal relationships in your business. Want to know if the marketing strategy rolled out in Q3 caused an increase in customers and revenue in Q4? Would you like to know if implementing a more robust PTO policy could decrease employee churn? No matter what causal question you have, we are happy to help! Our team of data analysts and data scientists is uniquely positioned to help you take action that will allow your business to reach its goals and beyond! Let us uncover valuable insights that will help your company today!

Contact Us

No Phishing in the Data Lake

How to Find and Mitigate Security Risks in Large Data Storage

For any business with a data strategy in place, the next step on the roadmap to data transformation is to capture all the structured and unstructured data flowing into the organization. To do so, organizations must create a data lake to store data from IoT devices, social media, mobile apps, and other disparate sources in a usable way.

What is a Data Lake?

A data lake, per AWS, is a centralized repository that allows an organization to store data as is, without having to first structure the data, and run different types of analytics—from dashboards and visualizations to big data processing, real-time analytics, and machine learning to guide better decisions.

Data lakes differ from data warehouses in that data warehouses are like libraries. As data comes into a warehouse, it gets carefully filed according to a structured system that has been defined in advance, making it easy and quick to find exactly what you’re looking for given a specific request. In data lakes, there’s no defined schema, which means data can be stored without needing to know what questions may require answers in the future. A data lake is more like an e-bookstore: you can search broadly, pull back relevant results from various media types, and make decisions based on machine learning recommendations and other people’s insights.

Many organizations are evolving their data storage to incorporate data lakes. However, maintaining any online information storage comes with security risks that must be identified and mitigated.

Security Risks and Consequences Within Data Lakes

Over the past few decades, improvements in compute power and storage space coupled with much more affordable storage prices have made it possible to store massive amounts of data in one place. Not long ago, storing a database of every citizen’s Social Security number would have been impractical – now it’s pennies on the dollar to store as a table in a data lake.

As much opportunity as large data storage provides organizations, it also creates risk. When vulnerabilities occur in repositories, their infrastructure, or any dependencies, the level of impact depends on the type and scale of the information that was compromised. Since data lakes hold vast amounts of data in a single location, when breaches occur, the impact is often spectacular in scale.

Common tactics hackers use to exploit enterprise data are Initial Access, Defense Evasion, and Credential Access. Kurt Alaybeyoglu, Senior Director of Cybersecurity and Compliance at Strive Consulting, says organizations often make a mistake by focusing too strongly on preventing Initial Access—a cybercriminal getting into the org’s network. Data lakes interact with so many sources that it doesn’t take network access to be able to cause damage.

“The two primary security risks in a data lake,” Kurt says, “are exfiltration of, and impact to, sensitive data.” As the name suggests, data exfiltration is the unauthorized transfer of data. Attackers can either steal specific pieces of data or, more often, simply take a copy of an entire lake – akin to a burglar carrying away a safe so they can open it and rifle through its contents at their leisure. Data impact ranges from encrypting the data in the lake, to wiping it, corrupting it, or destroying the means of access to the platform.

Both tactics can be, and have proven to be, catastrophic for an organization’s survival.

Are Data Lakes Worth the Risk?

Facing such dire consequences in the event of a cyberattack, why do businesses choose to use data lakes? Conventional wisdom says not to keep all your eggs in one basket—compartmentalizing data to avoid total compromise is surely more secure. But for many, according to Kurt, the rewards of data lakes outweigh the risks.

“Being able to access massive data at your fingertips with simple queries is what allows modern apps to exist,” he explains. “Take Uber as an example. Uber, as a technology, completely disrupted the taxi service model. It got rid of the need for dispatchers because at its heart was software that acted as one, pairing users and drivers faster than most humans can. Their software functions because Uber created a data lake that contains information like riders, drivers, maps, payment information, etc. that allow all of these disparate aspects to function seamlessly”

While separating this data into different repositories may be more secure, it would take significantly longer for the application to function – from running all the queries, to payment processing, to calculating the time for the ride – which would completely undermine the app’s usefulness. Not to mention, the added complexity would make securing the data just as, if not more, difficult.

“As security professionals, we have to try to mitigate those risks as best as possible,” he says. “At the end of the day, data security is a business function. Our job is to say ‘yes, we can do that, but here are the risks.’ Leaders must decide what they’re willing to pay to mitigate, what they’ll pay to transfer, and what risks they’re willing to accept.”

3 Ways to Prevent Security Breaches in the Data Lake

What makes data lakes so risky is that the valuable commodity, data, by necessity must be accessible, whether that’s to a platform, an end user, or someplace else. The data must be available in order to be useful. So, an organization’s top three focus points to protect that data are as follows:

  1. Rigorous access control: More people with unfettered access to the data lake means more potential entry points for a hacker to attempt to exploit. To secure the data lake, be thoughtful about who can access it and when. Validate those users’ identities using strong passwords and multi-factor authentication (MFA). If the data lake contains particularly sensitive information, consider more advanced hardware solutions such as FIDO2 keys.
  2. Regular vulnerability scanning and testing: Because data lakes and supporting platforms aren’t tied to a single device, hackers no longer need to achieve initial access to get ahold of the data. For most applications that interact with data lakes, a successful breach may only take a SQL or command injection that forces the system to respond with data it’s not supposed to – no device compromise needed (see the sketch after this list). Because of that risk, proactively looking for the holes in a data lake’s security is paramount. Use a combination of application threat modeling, vulnerability scans, and application penetration testing to identify weak points, then remediate them quickly.
  3. Better detection through better training: “Data lakes are examples of what modern storage/compute allows us to do,” Kurt says. “We haven’t put the same level of effort and value into collecting audit logs to be able to make detection and analytics earlier in the cyberattack chain possible.” The answer? Staffing and training. Proactive threat detection comes from a skillset that knows what to investigate. “How do I collect audit logs from the platform? What logs should I collect? How do I determine when someone has accessed the data versus what’s just noise? That investigative mindset and skillset is in high demand and low supply,” says Kurt.
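
To make the injection risk from point 2 concrete, here is a tiny, generic illustration (SQLite in memory with hypothetical table names – not any particular data lake engine) of how an unsanitized query hands back rows it shouldn’t, and how a parameterized query closes that hole:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, email TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "a@example.com"), (2, "b@example.com")])

    user_input = "1 OR 1=1"   # attacker-controlled value

    # Unsafe: string concatenation lets the input rewrite the query and dump every row.
    print(conn.execute("SELECT email FROM customers WHERE id = " + user_input).fetchall())

    # Safer: a bound parameter is treated as data, not SQL, so the same input matches nothing.
    print(conn.execute("SELECT email FROM customers WHERE id = ?", (user_input,)).fetchall())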

His suggestion to overcome the talent gap: Companies that rely on data lakes should build detection skillsets from within. It’s easier to pay to train a person who is already well-versed in the inner workings of an organization’s platform to build in data security than it is to bring in a security generalist and teach them the org’s data lake.

The advantage of training an internal employee is that they have the full view of the data product roadmap, which means they can start developing future updates on the platform that build security in from the ground up. That’s security by design—the brass ring of risk management in a data lake.

Where exploitable data exists, opportunists will try to access it. Data lakes provide organizations an incomparable ability to un-silo work, answer new questions by drawing information from diverse sources, and innovate technology that creates the next apex experience. For that reason, businesses must level up their investment in data security in concert with their investment in data storage and usability. On the data roadmap, that’s the ultimate step toward data transformation.

Protect your data. Protect your business.

Learn more about Strive’s cybersecurity services HERE, or set up a Launch Future State of Data Workshop to create your 1-3-year data vision plan HERE.

What is Tech Debt and why you need to pay it down

As an executive, you know to keep a close eye on company finances, including whether, when and how much debt the company can handle yet remain fiscally healthy.

Unfortunately, most businesses do not track tech debt or the negative impact it has on their business. As a result, many businesses make the mistake of believing that managing their technical debt is a simple matter of good housekeeping – a “keep the lights on” operation isolated to their IT departments.

Nothing could be further from the truth.

Technical debt is anything and everything that slows down or hinders development.

Hindered development leads to problems that cascade through the entire company, slowing down improvements in every part of the business with a stake in that development. This often means the *entire* company. Tech debt will delay the delivery of every new feature.

That’s not the worst of it.

Like other kinds of debt, tech debt accrues interest and the problems it causes compound over time. It’s like a tax that’s added to everything. Companies with an unacceptable level of tech debt pay the price with slower time to market, bloated production costs, higher IT support costs, and a high level of complexity that draws IT’s time away from more transformative work. Companies with a lot of tech debt also experience high churn rates among their developers, who face frustrating limits on the work they can do because of that tech debt.

Such companies often add more developers, erroneously thinking that if they just turn out more code, they can keep pace with the new features and functions they need to compete. Or they spend increasing amounts of time and IT budget in a game of Whack-A-Mole as they try to fix all the problems that keep popping up.

To help explain the issue, we sometimes equate tech debt to old factory equipment with broken bits that aren’t ever replaced. The factory owner can add more workers to the production line and tack on more pieces to the equipment, but neither approach fixes the underlying mechanical problems. Neither approach makes the equipment run right, and neither will help the factory produce more or better widgets. Ultimately, the only way to make that factory efficient and agile is to fix the broken parts.

It’s the same with technical debt.

But here’s the other challenge with technical debt: In most organizations, it’s something that’s invisible, unmeasured and unquantified. As a result, it typically remains unaddressed.

In fact, most organizations don’t have the slightest idea how much debt they carry. That’s why both young and old companies can see product and engineering costs way above what’s optimal: They have more technical debt than they even know.

As an executive, you may think it’s time to clear the technical books and get rid of all the tech debt that exists within your company. That, though, isn’t the goal. The reality is that there’s always going to be some amount of it.

Instead, here at Strive, we recommend that you tackle the problem by first understanding how much tech debt you have and then whether it’s an acceptable amount or whether it’s so much it’s slowing you down.

Tech debt, like financial debt, is something a company can deliberately take on and borrow against in order to be agile and move quickly. We’ve been developing a model to quantify tech debt, which allows us to identify and measure its financial impact, including the opportunities that get taken off the table and the business risks introduced by that debt.

This approach helps everyone on the executive team and the board understand the importance of paying down the company’s technical debt.

In addition to offering this standardized model for assessing and measuring the ongoing cost of technical debt, we advise clients on ways to pay down that debt and how to design, build and implement debt-free alternatives moving forward.

We know this approach is effective. We worked with one client, a services company, to measure its tech debt and quantify how it was impacting its agility. Executives there were able to see how resolving software bugs and other problems in existing code would unblock product teams and make them both faster and more responsive to evolving market needs. So instead of accumulating more debt, they actually saw better returns.

Companies that have paid down their tech debt and now have a good credit score are well positioned to compete against any upstarts that enter their market. They also require fewer resources to produce superior outcomes and to produce them quickly.

That’s a message the board and your shareholders will want to act on – especially now, as companies increasingly feel the pressure to be as efficient as possible in the face of a constricting economy.

Do you have Tech Debt?

Here at Strive, we take pride in our Technology Enablement practice, where we can assist you in understanding and mitigating your organization’s tech debt. Our subject matter experts team up with you to understand your core business needs, while taking a deeper dive into your organization’s growth strategy. Click to learn more about our Technology Enablement capabilities and how we can help.

An Ode to Project Managers

Today is an opportunity to recognize advancements in Project Management, as well as the increased role it plays in corporate environments.

There have always been project managers, even if there wasn’t a formal title or official responsibilities. However, it is interesting to note that the formalization of the ‘Project Manager’ role, and the specific knowledge areas they tackle, took shape in the 1950s and accelerated with the introduction of the personal computer and microprocessors in the 1980s. As more and more people became users, the demand for new functionality – and for the position – increased. More competitors meant faster deployment to stay ahead of the marketplace! Increased competition also raised quality requirements – if your software didn’t function well, a competitor’s would. Furthermore, the introduction of the cloud and virtual replacements for things that used to be physical meant projects moved faster and with an expectation of quality that had never been seen before. Enter the “PM”.

Project Management in our current environment requires not only faster delivery, but also identifying the best projects to benefit an organization and their customers. This drive has spawned a multitude of delivery methodologies and highlighted the need for formal approval processes and prioritization within companies. Project Management is the natural group to take on the prioritization and therefore Project Management Offices (PMO) have grown in significance within corporations. Many companies employ the PMO to own the process of intake and approval and when these projects are aligned with corporate strategy, that prioritization process may determine whether the company meets its goals for the year or not. This expanded corporate impact and influence means Project Management is now an integral part of corporate success.

Project managers as a whole are organized, cool under pressure, and have impeccable collaboration and communication skills. They are innate leaders in their organizations and have the technical expertise to ebb and flow between multiple happenings at once. Great negotiation skills and the ability to stand up and manage multiple teams are incredibly important in any organization, and luckily, project managers do just that. Any consulting company worth its salt is filled with PMs and has a knack for enabling and training them as well.

On this International Project Management Day be proud of the impact you make on your respective companies and clients. Continue to grow your leadership skills and always be on the lookout for improvements to ensure Project Management remains relevant in an ever-changing landscape.

Happy Project Management Day to all the members of the Project Management community!

Learn More About Strive’s Project Management Capabilities

Our Management Consulting teams roll up their sleeves and utilize their expertise, best practices, and firsthand experience to optimize business processes, deliver on strategic initiatives, launch new products, and create sustainable growth for our clients. We know there’s a job to do and we’re eager to do it.

In an era of rapidly changing business and market requirements, the right kind of thought leadership and experience is needed to guarantee customer success. Our Delivery Leadership experts work with key stakeholders to devise, plan, and execute your organization’s most transformational projects and programs. Our Delivery Leadership services include Portfolio, Program, and Project Management. Strive helps your organization better manage strategic initiatives by identifying and implementing a set of processes, methods, and technologies best-suited to meeting your organization’s business objectives.

Adventures in Snowflake Cost Management

Pay for use is both an exciting and challenging aspect of using the Snowflake Data Cloud. I’ve led workshops and proudly proclaimed, “Snowflake has brought an end to capacity planning!” And it has. You never have to figure out how much storage or processing power you are going to need. You don’t have to plan for three-year storage needs and hope that you haven’t bought too little or too much. That constant dance is over. With Snowflake you can just add whatever data you need and only pay for what you are using. The same is true for query processing power. When Black Friday hits, you have power on demand, and yet you’re not paying for that power all year long.

Now, budget planning? That’s a different story. In the past, you typically bought a certain size machine to run your database or contracted for a certain amount of cloud capacity – and whether you used it a little or a lot, you paid the same. When you see your Snowflake costs skyrocket, you’ll start to think about usage in ways you never had to before.

Here are some tips for being more efficient with your Snowflake spend.

Think Small, Run Big

Thinking time and development time should be spent on an X-Small or Small compute warehouse. When it comes time to run a big job or a long query, that’s when you spin up a larger warehouse, run the job, and then shut the warehouse down. You have capacity on demand, so you will want to size your warehouse to optimize cost both in what Snowflake charges and in human capital. Why wait 2 hours on a long job when you can run it in 15 minutes by using a warehouse 8 times the size? For the most part, you’ll see run times cut in half and the hourly cost doubled at each size up, so a bigger warehouse is roughly cost neutral in Snowflake charges while saving human cost.

Sometimes even the Snowflake cost is lower when you run a more expensive, larger warehouse. How so? If the compute warehouse is too small, it may have to spill data to local disk or even remote storage. Disk is a lot slower than RAM. When you use a larger warehouse, you also get more RAM, so the query or load can complete so much faster that you save more than the extra cost of running large.
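
Here is a rough sketch of that pattern using the snowflake-connector-python package; the account, credentials, warehouse, and table names are placeholders, and the sizes are only an example:

    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account", user="my_user", password="...", warehouse="DEV_WH"
    )
    cur = conn.cursor()

    # Think small: develop and test on an X-Small warehouse.
    cur.execute("ALTER WAREHOUSE DEV_WH SET WAREHOUSE_SIZE = 'XSMALL'")

    # Run big: scale up just for the heavy job, then drop back down and stop the meter.
    cur.execute("ALTER WAREHOUSE DEV_WH SET WAREHOUSE_SIZE = 'XLARGE'")
    cur.execute("INSERT INTO big_target SELECT * FROM big_source")   # the long-running job
    cur.execute("ALTER WAREHOUSE DEV_WH SET WAREHOUSE_SIZE = 'XSMALL'")
    cur.execute("ALTER WAREHOUSE DEV_WH SUSPEND")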

One Expandable Warehouse for All

It is typical for companies to assign each team or business unit its own warehouse. It’s one of the ways companies manage cost charge-back. However, it’s inefficient to have multiple warehouses with their meters running up charges when a single shared warehouse will do. To handle spikes in demand, you set it up as a multi-cluster warehouse that spawns additional clusters when there is demand and shrinks them when demand goes away. You then use roles or tags to divvy up the shared cost across those using the warehouse.
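
A hedged sketch of the setup (hypothetical names; note that multi-cluster warehouses require Snowflake’s Enterprise edition or above). The statements are shown as Python strings to run through whichever Snowflake client you already use:

    shared_warehouse_sql = """
    CREATE WAREHOUSE IF NOT EXISTS SHARED_WH
      WAREHOUSE_SIZE    = 'MEDIUM'
      MIN_CLUSTER_COUNT = 1
      MAX_CLUSTER_COUNT = 4            -- spawns extra clusters under load, shrinks when idle
      SCALING_POLICY    = 'STANDARD'
      AUTO_SUSPEND      = 60           -- seconds of inactivity before the meter stops
      AUTO_RESUME       = TRUE
    """

    # Tag each session so shared-warehouse usage can be charged back to the right team.
    chargeback_sql = "ALTER SESSION SET QUERY_TAG = 'team=marketing'"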

Break Large Load Files Into Many Smaller Ones

Snowflake is a massively parallel database. Each node in a Snowflake warehouse cluster has 8 processes. A Large warehouse has 8 nodes – 64 processes. If you try to load a single large file, only one of those processes is used. If you break the file up (Snowflake recommends 100–250 MB chunks), then all of the processes will work in parallel, rocketing your loading performance.
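
A simple sketch of the splitting step (hypothetical file names; the 200 MB target is approximate since it counts characters rather than compressed bytes). Each chunk keeps the header row so every file loads the same way:

    import os

    CHUNK_BYTES = 200 * 1024 * 1024   # aim for Snowflake's recommended 100-250 MB range

    def split_csv(path, out_dir):
        os.makedirs(out_dir, exist_ok=True)
        with open(path) as src:
            header = src.readline()
            part, size, out = 0, 0, None
            for line in src:
                if out is None or size >= CHUNK_BYTES:
                    if out:
                        out.close()
                    part += 1
                    out = open(os.path.join(out_dir, f"part_{part:04d}.csv"), "w")
                    out.write(header)
                    size = len(header)
                out.write(line)
                size += len(line)
            if out:
                out.close()

    split_csv("inventory.csv", "inventory_parts")
    # Then stage and load, e.g.:  PUT file://inventory_parts/*.csv @my_stage  followed by  COPY INTO inventory FROM @my_stage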

Judicious Use of Cluster Keys

Snowflake builds micro-partitions as data is loaded. For 90% of scenarios, you can just let Snowflake do its thing and you will get great performance. This is one of the reasons Snowflake is so cost effective: it doesn’t take an army of tuning DBAs to operate. However, there are going to be times when you will need to put a cluster key on a table to get the performance you need. And poorly performing queries cost extra money.

In one case, a 40-billion-row table was joined to a 3-billion-row table in a view that brought reporting to its knees. Clustering both tables on the join keys enabled the report to run in less than 2 minutes. For more information on clustering, see Snowflake’s documentation.
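
The statements involved are short; the table and key names below are hypothetical, and keep in mind that adding a cluster key to a large existing table kicks off background re-clustering, which has its own (usually one-time) credit cost:

    cluster_sql = [
        "ALTER TABLE sales_fact CLUSTER BY (store_id, item_id)",
        "ALTER TABLE item_dim   CLUSTER BY (item_id)",
        # Check how well a table is clustered on a candidate key:
        "SELECT SYSTEM$CLUSTERING_INFORMATION('sales_fact', '(store_id, item_id)')",
    ]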

Lift and Shift Still Needs Tuning

One of the common mistakes is to assume that “if it worked in the old system, it should work in Snowflake.” You will encounter performance issues (and thus cost issues) whose solution will not lie in adjusting Snowflake warehouses.

Here are just some recent tuning scenarios I’ve encountered:

There was a data load that was costing $500 to $1,500 per day. 8 billion rows of inventory were loaded each day – every item in every store across the world. The loading procedure used a MERGE: 8 billion searches to find the right row and update the data. And yet, there was no history; once the merge happened, the current value was the only value. Thus, a merge wasn’t needed at all. In effect, the table was a daily snapshot of inventory, and the incoming data was all that was needed. Removing the merge took the process from 8 hours on a very expensive 64-node Snowflake warehouse to a couple of minutes on a 32-node warehouse. A savings of $15k–$30k per month was realized.
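
A hedged sketch of that fix (hypothetical table, stage, and file-format names): when the table only ever holds today’s snapshot, a truncate-and-reload does the job without billions of per-row MERGE lookups:

    snapshot_reload_sql = [
        "TRUNCATE TABLE daily_inventory",
        "COPY INTO daily_inventory FROM @inventory_stage FILE_FORMAT = (FORMAT_NAME = 'csv_fmt')",
    ]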

Just because “the query worked on XYZ database” doesn’t mean everything is okay. A very expensive, long-running query on Snowflake was fixed by discovering a Cartesian join. When all the proper keys were added to the join, the query ran fast.

Oftentimes in mature systems there are views built upon views built upon views. A slow report sent me spelunking through the “view jungle.” I discovered that one of the views joined to a table from which no fields were used, plus a DISTINCT. At half a billion rows, this unnecessary join – and the unnecessary DISTINCT it forced – caused the performance problem.

The takeaway is that a good deal of the work will be taking a fresh look at the problem and not taking “the old system” as gospel for the new system.

Monitor the Spend

Snowflake has views to help you monitor cost and performance. They are located in the SNOWFLAKE database in the ACCOUNT_USAGE schema. If you have multiple accounts, the combined values are in the ORGANIZATION_USAGE schema. There are prebuilt Tableau, Power BI, Sigma, and other dashboards you can download. There is no substitute, however, for getting familiar with the views themselves.
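
For example, a credits-by-warehouse query against that schema might look like the following (the view and columns exist in SNOWFLAKE.ACCOUNT_USAGE; note that the data there lags real time by an hour or more):

    credits_by_warehouse_sql = """
    SELECT warehouse_name,
           DATE_TRUNC('day', start_time) AS usage_day,
           SUM(credits_used)             AS credits
    FROM   SNOWFLAKE.ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY
    WHERE  start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
    GROUP  BY 1, 2
    ORDER  BY usage_day, credits DESC
    """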

Strive is a proud partner of Snowflake!

Strive Consulting is a business and technology consulting firm, and proud partner of Snowflake, having direct experience with query usage and helping our clients understand and maximize the benefits the Snowflake Data Platform presents. Our team of experts can work hand-in-hand with you to determine if leveraging Snowflake is right for your organization. Check out Strive’s additional Snowflake thought leadership HERE.

ABOUT SNOWFLAKE

Snowflake delivers the Data Cloud – a global network where thousands of organizations mobilize data with near-unlimited scale, concurrency, and performance. Inside the Data Cloud, organizations unite their siloed data, easily discover and securely share governed data, and execute diverse analytic workloads. Join the Data Cloud at SNOWFLAKE.COM.

What is the Spatial Web and why should your company care?

First things first. What is the Spatial Web?    

Technically, it is a three-dimensional computing environment that combines layers of information gathered from countless geo-located connected devices to create seamless, immersive experiences. If that sounds a lot like the Metaverse, you’re right. The two terms are often used interchangeably. However, I prefer the name Spatial Web over Metaverse, believing the former is both more accurate and more descriptive.

Whichever term you use, the best way to understand it is to experience it. The good news is you can do that right now with your smartphone!  

Head outside for a short walk using Google Maps with Live View enabled. What you will see is an Augmented Reality (AR) view of your walking path. The traditional map view will be at the bottom of the screen, while the main screen area uses the camera and intelligently places icons and directional information on top of the real-world camera view.

This is the Spatial Web! It’s cool on the smartphone, but imagine if all sunglasses magically projected this as a layer and you did not have to hold the phone at arm’s length as you walk. This is a great way to start imagining what the Spatial Web will be in the near future as new devices seamlessly integrate into our lives. 

Ok, so why should your company care? 

The Spatial Web is coming. There’s no question about that. In fact, many of the component elements such as AR, VR, and the Internet of Things are already in use across multiple sectors. 

The gaming industry and the entertainment sector are the most visible leaders in harnessing the Spatial Web. And Facebook (now Meta) has become, perhaps, the most well-known proponent of it. 

But forward-thinking organizations in many other industries are piloting Spatial Web projects that demonstrate the expanse of potential use cases. 

There are good reasons for pursuing those use cases, too, with pilots already delivering benefits and good returns on investments. 

Let’s take an example where a complex, critical system is down – say a wind turbine – with a mechanic working on it. The mechanic could use AR-enabled goggles to pull up instructions to guide on-site repairs. Or that mechanic could use the goggles to see design schematics that, using artificial intelligence programs, help pinpoint problems. The mechanic could also use those AR-enabled and internet-connected goggles to collaborate with an engineer from the manufacturer, with the engineer being able to see in real-time exactly what the mechanic sees and does on the machine.

Such capabilities are already here and improving all the time, giving us a glimpse of what’s on the horizon.  

So, what’s ahead? A future where the Spatial Web will simply be part of how we live, work and engage. 

When that day arrives, these metaverse-type technologies will feel like an extension of yourself, just as smartphones have become ever-present ubiquitous tools that constantly inform, guide and connect us.  

And when that time comes, seeing someone wearing smart glasses will be the norm, not the exception. 

The timeline for that future state is years away. Gartner, a tech research firm, has predicted that widescale adoption of metaverse technologies is a decade away.

There are, for sure, technical hurdles that need to be overcome on this Spatial Web journey. 

There have been concerns, for example, about the heat generated from the compute processing in smart glasses, the battery life in connected devices and the vertigo some suffer when using virtual reality. 

But tech companies are working on those issues, and it’s only a matter of time before they have them worked out. After all, they have the incentive to do so as there’s existing market demand for these technologies. 

And we’re already seeing tech companies deliver big advances. They are developing audio technologies to ensure immersive audio experiences. They’re maturing haptic technology, or 3D touch, so you’ll be able to actually feel those actions happening in a virtual world. Some companies are trying to do the same thing with smell. 

These technologies will work with existing ones, such as geolocational tech, sensors, artificial intelligence, 5G and eventually 6G, to instantaneously deliver layers of information to users. 

While a fully-realized Spatial Web is still years away, you shouldn’t wait to start making plans for how you will harness its potential; you can’t wait to think about your strategy until everybody starts buying connected glasses. 

If you do, then you’ll already be behind. And if you wait too long, you’ll miss out. 

The reality today, right now, is that you will have to respond to the Spatial Web as it evolves and as it delivers new ways for organizations and individuals to interact. 

This new technology-driven realm will enable increasingly frictionless services to consumers, seamless B-to-B services and new potential applications that some are already starting to imagine. 

Here at Strive, we are exploring the component technologies that collectively make up the emerging Spatial Web. 

And we’re partnering with clients to envision their Spatial Web strategies, outline the infrastructure and skills they’ll need, and devise the optimal business cases to pursue – all so they’re ready to move as the technologies mature and the Spatial Web moves into the mainstream.

Connect with Strive! 

Here at Strive Consulting, our subject matter experts team up with you to understand your core business needs, while taking a deeper dive into your organization’s growth strategy. Whether you’re interested in the Spatial Web or an overall Technology Enablement assessment, Strive Consulting is dedicated to being your partner, committed to success.

Contact Us

 

Proposed Federal Law Would Boost Security Training for Utilities, Critical Infrastructure Operators

Legislation aims to bolster cyber defenses, but operators should still act now to strengthen security skills

Congress wants to require organizations deemed critical infrastructure to have a cybersecurity awareness training program. And it’s pushing through legislation that would provide such training for free.

More specifically, the Industrial Control Systems Cybersecurity Training Act requires the Cybersecurity and Infrastructure Security Agency – better known as CISA – to provide cybersecurity workers with no-cost training on best practices for securing industrial control systems.

It also calls for CISA to provide both virtual and in-person training, with courses targeted to workers at various skill levels.

These programs would be available to security workers in government entities as well as the private sector.

This new training initiative would supplement a raft of existing training programs already offered by CISA.

The government’s goal here is to ensure security professionals know about emerging threats and how they can most effectively mitigate them – an essential skill in a world where adversarial tactics and techniques are constantly evolving.

Indeed, the bill’s sponsor – U.S. Rep. Eric Swalwell, a California Democrat serving on both the House Select Committee on Intelligence and the House Homeland Security Committee – introduced the bill in May in response to the increasing number of cyberthreats coming out of Russia, saying that the country “must be cognizant of cyberwarfare from state-sponsored actors.”

He noted that this legislation “would help train our information technology professionals in the federal government, national laboratories, and private sector to better defend against damaging foreign attacks.”

Members of both parties agreed: The House on June 21 passed the bill with strong bipartisan support, sending it to the Senate for its approval.

This training initiative has Strive’s vote of confidence, too, as we have long believed that a well-trained, well-informed cybersecurity workforce is essential to protecting both operational technology (OT) and information technology (IT).

And we expect this bill to be enacted into law – as it should be.

Our country needs more training to counter the growing number and sophistication of attacks coming at us here in the United States and at the critical infrastructure sector in particular.

We also recognize that this training could help address some of the challenges that organizations face on the talent front.

First, there’s a lack of cybersecurity professionals in general. A report from the International Information System Security Certification Consortium, or (ISC)², puts the number of unfilled cybersecurity jobs at 377,000 in the United States alone. (It’s about 2.7 million globally.)

The nonprofit Cyberseek puts the number of unfilled U.S. cybersecurity positions even higher, at 714,548 as of mid-August.

At the same time, many of the existing cybersecurity professionals lack some of the essential skills needed to be most effective in their roles – a lack that’s particularly acute in the area of OT cybersecurity, where practitioners must have an understanding of both IT and OT systems as well as the policies, procedures and tools that can protect them.

Consider the findings from The 2022 State of Operational Technology report, which surveyed 3,500 OT security professionals across the globe and found that 69% believe the lack of OT security staff “is diminishing the effectiveness of their organization’s OT security.”

The ICS training act, if passed by the Senate and then signed into law by President Biden, could help alleviate some of those dire findings.

That said, we see no need for critical infrastructure owners and operators to wait for Congress to finalize this act.

Upskilling your existing staff and providing ongoing training to your team is one of the most effective investments you can make – and it’s one you should be making now.

Your security pros already know your environment and have a good handle on the components that present the highest risks and, thus, need the highest levels of protection. So give them the additional skills they need to perform at their best and to their top potential.

As mentioned above, CISA already offers numerous free training programs, including both independent study and instructor-led courses, tailored for critical infrastructure owners and operators. That’s in addition to the training programs offered by multiple other sources, including (ISC)² and SANS as well as colleges and universities.

At the same time, critical infrastructure owners and operators should review the cybersecurity awareness training program for their overall workforce to ensure it’s comprehensive and up-to-date.

It’s worth the effort.

According to the World Economic Forum’s Global Risks Report 2022, 95% of cybersecurity issues can be traced to human error. And the 2022 State of Operational Technology report found that 79% of survey respondents think human error poses the greatest risk for compromise to OT systems.

With figures like that, it’s easy to demonstrate why solid cybersecurity training programs for both security pros and general staff pay off. We see it. The U.S. House of Representatives sees it. And you should know, too, that an investment in security training delivers real returns by decreasing your risk and increasing your security posture.

So, if and when the ICS security training act becomes law, take advantage of the free courses. But don’t feel you should wait for it. Training and up-skilling should be an ongoing activity, and you should be doing it now.

Looking for more information?

Our Cybersecurity & Compliance solutions ensure that your business is protected and secured from cyber threats whenever, wherever. Minimize your risk to cyber attack exposure and regulatory fines without impacting your business operations – Strive can help.

Contact Us

How to Modernize A Data Strategy Approach

Modernizing your company’s data strategy can be a daunting task. Yet making this change — and doing it right — has never been more important, with torrents of data now dictating much of the day-to-day in many organizations.

Missing the boat on making this change now can hold your business back in meaningful ways down the line. Changing your approach to capturing, sharing, and managing your data can help you avoid many of the pitfalls that befall businesses today, such as duplicating data across the organization and processing overlaps.

Implementing an effective data strategy will enable you to treat data not as an accidental byproduct of your business, but as an essential asset that can help the business realize its full potential. Setting out clear, company-specific targets will help you tackle these challenges effectively.

Before you embark on this journey, however, it is crucial to understand why you want to modernize, assess where you are now, and identify the most efficient path to the finish line.

Strategic Vision – Future of Your Data

The first step is to define a vision for your own data modernization. Do you know why you want to modernize your data strategy and what your business can gain in the process? Do you have an aligned strategy and a clear idea of what a thriving data ecosystem will entail?

Defining your goals — whether that is to gain a better grasp of your data, enhance accuracy or take specific actions based on the insights it can provide — is paramount before initializing this process.

Equally essential is to ensure early on that executive leadership is on board, since overhauling your data strategy will require significant investment in time and resources. This will be needlessly difficult without full buy-in at the very top. Figuring out how better data management will tie in with your overall business strategy will also help you make your case to leadership.

Ways of Working – Operating Model

Next, you need to figure out how this modernization will take place and pinpoint how your operating structure will change under this new and improved system.

Setting out ahead of time how data engineers and data scientists will work with managers to shepherd this strategy and maintain it in the long run will ensure a smooth process and help you avoid wasting time and resources.

Identifying what your team will look like and gathering the required resources to implement this project will lead you directly into implementation.

Accessibility & Transparency — See the Data

Gaining access and transparency, at its core, is about implementing new systems so that you gain better visibility of the data you have. You want to make sure that your structured and unstructured content — and associated metadata — is identifiable and easy to access and reference.

Putting the infrastructure in place to ingest the data your business already creates, and format it in a way that lets you access it efficiently, might appear basic. But figuring out how to achieve this through data integration or engineering is a vital step and getting this wrong can easily jeopardize the entire project.

Data Guardianship — Trust the Data

Once you have brought your data to the surface, determining ownership within your organization will ensure both that accuracy is maintained, and that data is managed and updated within the correct frameworks. 

This includes applying ethical and data-sharing principles, as well as internal governance and testing, so that you can ensure your data is always up-to-date and handled responsibly. Making sure that you can trust the data you are seeing is essential to securing the long-term benefits you are hoping to gain from data modernization in the first place.

Plus, you can rest easy knowing that your reporting data is accurate instead of worrying about falling foul of external compliance metrics and other publication requirements.

Data Literacy — Use the Data

Tying back to your internal data management, literacy is all about making sure that you have the right skillsets in place to make savvy use of the insights you are gaining from your data.

You and your team need to make sure you are trained and equipped to handle this process both during implementation and once your new system is in place — so you can leverage the results in the best possible way and make it easier to access and share data throughout the company.

After all, making secure financial and operational decisions will depend on how much you trust in your own core capabilities. Ideally, a successful data management strategy will enable you to understand every part of your business. This applies not just internally, but also spans your customers, suppliers and even competitors.

Take the First Step with Strive

Our experts at Strive Consulting are here to help you assess whether you are ready to embark on this journey and provide you with a clear perspective of where you are, what’s involved, and how to get there. We are ready to walk you through this process and make sure the final product ends up in the right place, so you can be confident that your data is in safe hands — your own. Learn more about Strive’s Data & Analytics and Management Consulting practices HERE.

Contact Us

Cybersecurity for Utilities: Compliance Does Not Equal Security

The utilities industry remains one of the most heavily regulated sectors in the United States. In fact, every utility must demonstrate its compliance with a significant number of rules and regulations designed to ensure that they each can deliver clean, reliable and safe energy, water or related services.

Given such regulatory obligations, utility executives are intensely focused on ensuring that their organizations comply with the guidelines established by the Environmental Protection Agency, the Federal Energy Regulatory Commission and other such entities. Similarly, utility executives are diligent in making sure they align with frameworks such as the North American Electric Reliability Corporation’s Critical Infrastructure Protection standards (NERC CIP).

That attention to regulations is well-placed. Compliance is non-negotiable, not only because it’s required but because it certifies that you as a utility are performing at the highest levels of safety and efficiency. However, you should not assume that being compliant with all relevant rules and regulations means you’re safe from cyber threats. Compliance does not equal security.

Organizations in the utilities space – and indeed in all other industry verticals – are finding that even when they meet regulatory requirements, they still can have vulnerabilities that unduly expose them to cyber risks.

How can this be? Just consider, for example, that CIP didn’t regulate low-impact assets until recently. In that case, a utility could have been fully compliant with all CIP standards yet still have unprotected low-level assets – a gap that hackers could have exploited and used as entry points to higher-impact assets that, if successfully breached, could have hindered utility operations.

The proof of the compliance vs. security gap can be seen in figures from Verizon’s 2021 Data Breach Investigations Report. It tallied 546 incidents this year (including 355 with confirmed data disclosures) in the mining, quarrying, oil & gas extraction, and utilities sector. Furthermore, the report found that social engineering accounts for 86% of the breaches in the sector, followed by system intrusions and basic web application attacks.

Such statistics indicate that organizations remain vulnerable to cyber attacks even when they’re fully compliant with all the rules and regulations that pertain to this industry. Note, for instance, that phishing attacks and other similar social engineering hacking strategies could possibly succeed even if just one single person in a fully-compliant enterprise falls for the scam.

We see a few other reasons for this dichotomy between being compliant and not necessarily being secure.

As stated earlier, some utilities continue to falsely believe that they’ve adequately secured their environments against cyber threats if they are compliant with all the rules and regulations. Therefore, they’re not investing in needed security measures that fall outside of regulatory requirements.

Similarly, some utilities focus more on compliance and thus invest there to the exclusion of adequate security investments. In such cases, executives often want to ensure that the utility doesn’t encounter negative findings and subsequent fines from regulators; they may not realize that the cost of a cyber incident could be significantly more and bring much more disruption than any regulatory action would.

In other cases, utilities combine security and compliance in one function and task the same people with both jobs – even though those two functions require different skills and expertise and must know and implement completely different strategies and standards. In such circumstances, organizations run the risk of doing neither security nor compliance well and thus falling short in both areas.

On the other hand, some utilities have compliance teams and security teams working independently of each other, each in its own silo. That practice can lead to duplication of efforts, wasted resources and missed opportunities to create a strategic risk management approach that addresses both needs in the most efficient, effective manner.

None of these scenarios is acceptable in an era when the number of cyber threats is growing – one study counted 304 million ransomware attacks worldwide in 2020, a 62% increase from the 2019 tally – and the impact of such attacks is also on the rise.

Companies in critical industries such as utilities are facing a constant threat to their ability to maintain operations and deliver essential services. Given that reality, you must devote the same high-level diligence to security as you commit to compliance.

That means having a security team with the resources needed to think comprehensively about the threats that could impact your utility, the likelihood and potential impact of those threats, and how to guard against them.

It means, too, having a security team capable of implementing, maintaining and maturing the people, processes and technology required to protect the enterprise.

At the same time, you must create an environment where your security and compliance teams can work collaboratively. This helps both departments stay on top of needed actions, as regulators are constantly updating standards to meet new challenges and address emerging threats. It also allows both teams to devise strategies that meet all relevant rules and regulations in a holistic fashion that eliminates gaps but doesn’t waste resources by duplicating efforts.

Keep in mind the payoff for such efforts. You’ll have an environment that delivers the reliability and security the company needs and your customers expect, where compliance requirements inform the security strategy and vice versa. Indeed, in the end you’ll have security and compliance in lockstep to effectively counter their common foe: those bad actors who seek to harm your organization.

If your utility is compliant it could still be ripe for a cyberattack. Let’s talk about how we can help!

3 Ways to Improve Collaboration in the Remote / Hybrid World

We’ve all been in that meeting. There were probably too many people invited, the agenda is vague, and 90% of folks are remote and off camera (10% of those are probably folding laundry or doing some other household chore while they listen in). Then you hear the inevitable words, “Let’s brainstorm this.” Two or three enthusiastic participants end up doing most of the talking, and the conversation goes in circles until a senior leader or manager tries to stop the swirl by making a suggestion of their own. Everyone else is inclined to fall in line, and the meeting moves on to the next arduous loop. After over an hour, you’re left wondering: what did we actually accomplish?

It’s time to face facts: collaborating in the remote and hybrid world requires different ways of working together. The natural structures of the office that act as palaces of accountability, collaboration, and innovation have been replaced by impersonal video calls in far-flung home offices.

Current trends would suggest that the reality of remote and hybrid work isn’t about to end anytime soon, but there are things you can do now to make your virtual meeting time more efficient and enjoyable.  

Here are 3 tips to start improving your virtual meetings.

  • Never start from scratch: structure and visual starting points are always in style

One thing that most people dislike is uncertainty. People are more willing to engage when they know what to expect and feel confident their time will be spent effectively. By providing agendas, objectives, and materials to review before the meeting, you’ll prime your audience or colleagues to start thinking about topics you want to discuss (even if subconsciously) and get better feedback and participation when the time comes.   

About 65% of people are visual learners. Having visual aids to provide context and bring people up to speed quickly will always supercharge your meeting efficiency, particularly in virtual settings, where you can’t draw pretty pictures on a physical whiteboard. Having something to react to will always elicit more effective feedback and progress than trying to start from scratch.

  • Use design thinking techniques to unlock diversity in brainstorming sessions

Idea generation and innovation are some of the most difficult things to accomplish in a virtual setting. At Strive, we often leverage ideation techniques from design thinking practices such as affinity mapping, mind mapping, or SCAMPER (among many others) to make brainstorming more enjoyable and participatory for your team. By introducing individual brainstorming and voting principles within these techniques, you’re able to increase participation and better democratize decision making. Using these methods will help your team quickly align around creative ideas everyone can get excited about. Turning boring meetings into fun workshops helps bring some variety and intrigue to your colleagues’ days – and they’ll thank you for it.

  • Leverage a virtual collaboration tool like Miro™

Virtual white-boarding tools have made huge improvements since being thrust into the limelight during the COVID-19 pandemic. We leverage tools like Miro™ with many of our clients to help facilitate engaging workshops, capture requirements, and build relationships. The features of the infinite virtual white-boarding canvas help you capture some of the magic previously only possible when working in person. On top of the real-time collaboration these tools enable, they also provide access to a universe of templates and ideas for creative ways to effectively facilitate a variety of types of meetings (including design thinking techniques mentioned above).

Whatever your role may be in the corporate world, meetings are practically unavoidable, but with these tips and tools, you can become the meeting hero that saves your team from boring and unproductive virtual meetings.

Strive has become an expert at virtual collaboration… Need help?

Here at Strive, we take pride in our Management Consulting practice, where we can assist you with your initial digital product development needs, all the way through to completion. Our subject matter experts team up with you to understand your core business needs, while taking a deeper dive into your company’s growth strategy.