Reduce Costs, Improve Query Performance, and Optimize Warehouse Usage with Snowflake
As organizations expand their Snowflake footprint, there comes a time to pose questions like ‘Are there opportunities to save on costs?’ or ‘Is our environment running efficiently?’ Fortunately, this is where Snowflake shines with several levers to pull to help maximize cost and performance. For those who are not familiar with Snowflake’s billing structure / platform, Snowflake bills for compute and storage independently. Follow along to learn about ways that help tune data environments to help reduce cost and improve query/general performance when working within the Snowflake Data Platform.
Putting it simply, organizations can tune environments through a variety of ways. Two powerful methods that can greatly impact cost and performance is through scaling up and scaling out.
What is scaling up? In an ideal world, queries (a request for data from one or more database tables) return in less than a second. However, in the real world, query response times can vary from less than a second to hours, even days, based on a number of factors. Size of the warehouse plays a key factor in the time it takes to process a query. Scaling is the processing power, or horsepower, used to process a query. Available warehouse sizing follow t-shirt sizes (XS, S, M, L, XL, etc.).
When is it best to scale up? Since scaling up increases processing power, it is best used when organizations want to improve the performance of complex queries. It’s worth mentioning that ‘larger’ does not always mean ‘faster’. For basic queries that are already executing quickly, you may not see performance improvements after resizing.
What is considered a complex query? The more tables, columns, and/or conditional logic specified in the query, the more complex the query is. An example of a complex query is a query that joins a frequently used fact table with 1 billion records, 10 years of history, and 5 dimensions. Different sized warehouses impact the time and effort used when processing data. For example, if you used an extra small warehouse, the query returns in 20+ minutes. Small warehouse, the query returns in 12 minutes. Medium warehouse, query returns in as little as 5 minutes and so on…you get the drift.
How does this impact cost and benefit organizations at the enterprise level? One of the benefits most loved about Snowflake is the per-second billing model, meaning you can run larger warehouses and simply suspend the warehouse when not in use. Snowflake makes this easy with its auto-suspend capability. When the auto-suspend feature is disabled, costs can run high. It is recommended businesses set auto-suspend to a low value (e.g., 5 or 10 minutes or less).
Below is an example of where auto-suspend can be configured.
One of the many services Strive provides is a Health Check where one of our highly experienced consultants partner with a client to check this and a variety of other configurations in their environment to optimize, identify bottlenecks and issues, and ensure organizations are maximizing all features that Snowflake has to offer.
“2000 Simultaneous Queries against a 40 billion record table all return in under 10 seconds.” – Larry Querbach: Enterprise Data Architect, devon
Scaling out is where warehouses are added to create a multi-cluster warehouse (requires a Snowflake Enterprise Edition or higher).
When is it best to scale out? Concurrency, concurrency, concurrency. Multi-clusters are designed specifically for handling queuing and performance issues related to large numbers of concurrent users and/or queries. It’s helpful to remember that the difference between scaling up vs. out is that scaling out is best used for higher number of concurrent queries. When there are more queries submitted that can be processed, queries accumulate in the queue and wait to be run. An example of where scaling out can add value can be found here and is shown in the image below. The image shows the load of a warehouse from 1pm to 5 pm. The orange bars indicate that queries accumulated in the queue. Blue bars are queries that ran concurrently during this time frame. Remember, if the running query load is high or there’s queuing, consider starting a separate warehouse and moving queued queries.
How does this impact cost? Not auto-scaling out translates to delays in viewing key data points required to make quick decisions. Businesses can be stuck in a waiting room causing bottlenecks, which leads to time wasted. Auto-scaling out enables organizations and data departments to process more data at one time, getting data in your hands quicker to make data-led decisions and stay ahead of the fast-paced business environments seen today in our marketplace.
So, what have we learned?
Scaling up is best for complex queries. Scaling out is best for processing concurrent queries. Ways to reduce costs, improve query performance, and use warehouses effectively and efficiently are:
- Scaling out is also known as using multi-cluster warehouses. Scaling out is best when servicing high volume of concurrent queries.
- If you are using Snowflake Enterprise Edition (or a higher edition), all your warehouses should be configured as multi-cluster warehouses and with auto-scale mode. This enables Snowflake to automatically start and stop warehouses as needed saving on costs.
- Remember to enable auto-suspend and keep the number low between 5 to 10 minutes to avoid paying for usage when not in use.
- Larger warehouses do not always translate to faster query response times. Increasing the size of a warehouse helps complex queries, not so much for basic queries.
- Experiment with different types of queries and different warehouse sizes to determine the combinations that best meet your specific query needs and workload.
Interested in optimizing warehouse usage and reducing cost with Snowflake?
Strive Consulting is a business and technology consulting firm, and proud partner of Snowflake, having direct experience with query usage and helping our clients understand and monopolize the benefits the Snowflake Data Platform presents. Our team of experts can work hand-in-hand with you to determine if leveraging Snowflake is right for your organization. Check out Strive’s additional Snowflake thought leadership here.
Snowflake delivers the Data Cloud – a global network where thousands of organizations mobilize data with near-unlimited scale, concurrency, and performance. Inside the Data Cloud, organizations unite their siloed data, easily discover and securely share governed data, and execute diverse analytic workloads. Join the Data Cloud at Snowflake.com.