Adventures in Snowflake Cost Management

Pay for use is both an exciting and a challenging aspect of using the Snowflake Data Cloud. I’ve led workshops and proudly proclaimed, “Snowflake has brought an end to capacity planning!” And it has. You never have to figure out how much storage or processing power you are going to need. You don’t have to plan for three years of storage and hope you haven’t bought too little (or too much). That used to be a constant dance; no more. With Snowflake you can add whatever data you need and pay only for what you use. The same is true for query processing power: when Black Friday hits, you have power on demand, yet you’re not paying for that power all year long.

Now budget planning? That’s a different story. In the past, you bought a certain size machine to run your database or contracted for a fixed amount of cloud capacity, and whether you used it a little or a lot, you paid the same. With pay for use, when you see your Snowflake costs skyrocket, you’ll start to think about usage in ways you never had to before.

Here are some tips for being more efficient with your Snowflake spend.

Think Small, Run Big

Thinking time and development time should happen on an X-Small or Small compute warehouse. When it comes time to run a job or a long query, that’s when you spin up a larger warehouse, run the job, and then shut the warehouse down. You have capacity on demand, so size your warehouse to optimize both what Snowflake charges and the human cost. Why wait 2 hours on a long job when you can run it in 15 minutes on a warehouse 8 times the size? For the most part, each size up doubles the per-hour credit rate but cuts the run time roughly in half, so the total cost of the job stays about the same. Using a bigger warehouse is close to cost neutral in Snowflake spend, and it saves human time.
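
A minimal sketch of the pattern in Snowflake SQL, with illustrative warehouse names: keep a small warehouse for development, and a larger one that auto-suspends so you only pay while the job runs.

    -- Small warehouse for thinking and development time.
    CREATE WAREHOUSE IF NOT EXISTS DEV_XS
      WAREHOUSE_SIZE = 'XSMALL'
      AUTO_SUSPEND = 60          -- suspend after 60 seconds idle
      AUTO_RESUME = TRUE;

    -- Larger warehouse, created suspended, used only for the heavy job.
    CREATE WAREHOUSE IF NOT EXISTS HEAVY_JOB_XL
      WAREHOUSE_SIZE = 'XLARGE'
      AUTO_SUSPEND = 60
      AUTO_RESUME = TRUE
      INITIALLY_SUSPENDED = TRUE;

    USE WAREHOUSE HEAVY_JOB_XL;  -- resumes automatically on the first query
    -- ... run the long job here ...
    ALTER WAREHOUSE HEAVY_JOB_XL SUSPEND;  -- stop the meter when done

With AUTO_SUSPEND set low, even forgetting the final SUSPEND only costs a minute of idle time.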

Sometimes a larger, more expensive warehouse even saves on the Snowflake cost itself. How so? If the compute warehouse is too small, it may have to spill intermediate data to local disk or even to remote storage. Disk is a lot slower than RAM, and a larger warehouse comes with more RAM. The query or load can complete so much faster that the time saved more than pays for the larger size.
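
You can spot this situation directly. The documented SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY view records how many bytes each query spilled; anything spilling to remote storage is a strong candidate for a bigger warehouse:

    -- Recent queries that spilled to disk: a sign the warehouse is undersized.
    SELECT query_id,
           warehouse_name,
           warehouse_size,
           bytes_spilled_to_local_storage,
           bytes_spilled_to_remote_storage,
           total_elapsed_time / 1000 AS elapsed_seconds
    FROM snowflake.account_usage.query_history
    WHERE start_time > DATEADD('day', -7, CURRENT_TIMESTAMP())
      AND (bytes_spilled_to_local_storage > 0
           OR bytes_spilled_to_remote_storage > 0)
    ORDER BY bytes_spilled_to_remote_storage DESC;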

One Expandable Warehouse for All

It is typical for companies to assign each team or business unit its own warehouse; it’s one of the ways companies manage cost charge-back. However, it’s inefficient to have multiple warehouses all running their meters when a single shared warehouse will do. To handle peaks in demand, set it up as a multi-cluster warehouse that spawns additional clusters when demand rises and shrinks back when demand goes away. Use roles or tags to divvy up the shared cost across the teams using the warehouse.
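
A sketch of such a shared warehouse (multi-cluster warehouses require Snowflake Enterprise Edition or higher; the name and sizing here are illustrative):

    CREATE WAREHOUSE IF NOT EXISTS SHARED_WH
      WAREHOUSE_SIZE = 'MEDIUM'
      MIN_CLUSTER_COUNT = 1        -- one cluster when demand is light
      MAX_CLUSTER_COUNT = 4        -- scale out under concurrency, back in when idle
      SCALING_POLICY = 'STANDARD'
      AUTO_SUSPEND = 60
      AUTO_RESUME = TRUE;

You pay for extra clusters only while they are running, so quiet periods cost the same as a single warehouse.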

Break Large Load Files Into Many Smaller Ones

Snowflake is a massively parallel database. Each node in a Snowflake warehouse cluster has 8 processes, and a Large warehouse has 8 nodes, so 64 processes in all. If you try to load a single large file, only one of those processes is used. If you break the file up (Snowflake recommends chunks of roughly 100-250 MB compressed), all 64 processes work in parallel, rocketing your loading performance.
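
A hedged sketch of a parallel load, assuming the source file has already been split into chunks and staged (the stage, table, and file names are illustrative):

    -- Internal stage to hold the file chunks.
    CREATE OR REPLACE STAGE inventory_stage;

    -- From SnowSQL, upload the local chunks (PUT compresses them by default):
    -- PUT file:///data/inventory_part_*.csv @inventory_stage;

    -- One COPY picks up all matching files, and the warehouse loads them
    -- in parallel, one file per process.
    COPY INTO inventory
      FROM @inventory_stage
      PATTERN = '.*inventory_part_.*[.]csv[.]gz'
      FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY = '"');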

Judicious Use of Cluster Keys

Snowflake builds micro-partitions as data is loaded. In 90% of scenarios, you can just let Snowflake do its thing and you will get great performance. This is one of the ways Snowflake is so cost effective: it doesn’t take an army of tuning DBAs to operate. There will be times, though, when you need to put a cluster key on a table to get the performance you need, and poorly performing queries cost extra money.

In one case, a 40-billion-row table was joined to a 3-billion-row table in a view that brought reporting to its knees. Clustering both tables on the join keys enabled the report to run in less than 2 minutes. For more information on clustering, see Snowflake’s documentation.
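
The pattern looks like this, with hypothetical table and column names standing in for the real ones:

    -- Cluster both sides of the join on the join keys.
    ALTER TABLE fact_sales     CLUSTER BY (item_id, store_id);
    ALTER TABLE fact_inventory CLUSTER BY (item_id, store_id);

    -- Check how well a table is clustered on those keys.
    SELECT SYSTEM$CLUSTERING_INFORMATION('fact_sales', '(item_id, store_id)');

Note that automatic reclustering consumes credits of its own, which is why cluster keys deserve the word “judicious.”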

Lift and Shift Still Needs Tuning

A common mistake is to assume that “if it worked in the old system, it should work in Snowflake.” You will encounter performance issues (and thus cost issues) whose solution will not lie in adjusting Snowflake warehouses.

Here are just some recent tuning scenarios I’ve encountered:

There was a data load costing $500 to $1,500 per day. It loaded 8 billion rows of inventory daily: every item in every store across the world. The loading procedure used a MERGE, so 8 billion searches to find the right row and update the data. And yet, no history was kept; once the merge ran, the current value was the only value. A merge wasn’t needed at all. In effect, the table was a daily snapshot of inventory, and the incoming data was all that was needed. Removing the merge took the process from 8 hours on a very expensive 64-node Snowflake warehouse to a couple of minutes on a 32-node warehouse, a savings of $15k-$30k per month.
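
A simplified sketch of that fix, with hypothetical names: since the table is just a daily snapshot with no history, the row-by-row MERGE can become a bulk overwrite.

    -- Before (expensive): 8 billion key lookups to update rows in place.
    -- MERGE INTO inventory t
    -- USING staged_inventory s
    --   ON t.item_id = s.item_id AND t.store_id = s.store_id
    -- WHEN MATCHED THEN UPDATE SET qty = s.qty
    -- WHEN NOT MATCHED THEN
    --   INSERT (item_id, store_id, qty) VALUES (s.item_id, s.store_id, s.qty);

    -- After (cheap): truncate and bulk-load the new snapshot in one pass.
    INSERT OVERWRITE INTO inventory
    SELECT item_id, store_id, qty
    FROM staged_inventory;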

Just because “the query worked on XYZ database” doesn’t mean everything is okay. One very expensive, long-running query on Snowflake was fixed by discovering a Cartesian join; once all the proper keys were added to the join, the query ran fast.
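
An illustrative reconstruction (not the actual client query): a join missing part of its key multiplies rows instead of matching them.

    -- Before: joining on order date alone pairs every order with every
    -- line sharing that date, a near-Cartesian result.
    -- SELECT ... FROM orders o JOIN order_lines l ON o.order_date = l.order_date;

    -- After: join on the full key so each line matches exactly one order.
    SELECT o.order_id, o.order_date, l.line_no, l.amount
    FROM orders o
    JOIN order_lines l
      ON l.order_id = o.order_id;  -- the key that was missing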

Oftentimes in mature systems there are views built upon views built upon views. A slow report sent me spelunking through the “view jungle,” where I discovered a view with a join to a table from which no fields were used, plus a DISTINCT. At half a billion rows, the unnecessary join, and the unnecessary DISTINCT needed to collapse the duplicates it created, caused the performance problem.
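
A hypothetical sketch of that shape of fix, assuming (as in the real case) the join contributed no columns and filtered no rows:

    -- Before: the join duplicates rows, so a DISTINCT is needed to undo it.
    -- CREATE OR REPLACE VIEW sales_v AS
    -- SELECT DISTINCT s.*
    -- FROM sales s
    -- JOIN stores st ON st.store_id = s.store_id;  -- no stores columns used

    -- After: drop the join, and the DISTINCT disappears with it.
    CREATE OR REPLACE VIEW sales_v AS
    SELECT s.*
    FROM sales s;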

The takeaway is that a good deal of the work is taking a fresh look at the problem rather than taking “the old system” as gospel for the new one.

Monitor the Spend

Snowflake provides views to help you monitor cost and performance. They are located in the SNOWFLAKE database, in the ACCOUNT_USAGE schema; if you have multiple accounts, the combined values are in the ORGANIZATION_USAGE schema. There are prebuilt Tableau, Power BI, Sigma, and other dashboards you can download. There is no substitute, however, for getting familiar with the views themselves.
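
A simple starting point using the documented WAREHOUSE_METERING_HISTORY view: credits consumed per warehouse over the last 30 days, so you can see where the spend is going.

    SELECT warehouse_name,
           SUM(credits_used) AS credits_30d
    FROM snowflake.account_usage.warehouse_metering_history
    WHERE start_time > DATEADD('day', -30, CURRENT_TIMESTAMP())
    GROUP BY warehouse_name
    ORDER BY credits_30d DESC;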

Strive is a proud partner of Snowflake!

Strive Consulting is a business and technology consulting firm and proud partner of Snowflake, with direct experience in query usage, helping our clients understand and maximize the benefits the Snowflake Data Platform presents. Our team of experts can work hand-in-hand with you to determine if leveraging Snowflake is right for your organization. Check out Strive’s additional Snowflake thought leadership HERE.

ABOUT SNOWFLAKE

Snowflake delivers the Data Cloud – a global network where thousands of organizations mobilize data with near-unlimited scale, concurrency, and performance. Inside the Data Cloud, organizations unite their siloed data, easily discover and securely share governed data, and execute diverse analytic workloads. Join the Data Cloud at SNOWFLAKE.COM.