A Guide To The Snowflake Results Cache
Caching is a critical performance feature in complex systems. It's often a huge win to re-use previous results to avoid an expensive calculation.
Snowflake speeds up queries with caching in two ways: the results cache, which stores query results for 24 hours, and the micro-partition cache, which stores blocks of table data locally in a virtual warehouse.
For many Snowflake users, it's not critical to know how these features work. They're designed to invisibly speed up queries in the background and can't directly be controlled or configured.
If you manage a large Snowflake deployment however, it may be useful to understand how caching works and how you can use it to your advantage. In this post, I'll share the most important aspects of results caching that you should know.
Results caching
Quickstart
SELECTs
and queries that return data are cached by Snowflake. Results are cached for 24 hours.- The results cache is used automatically for repeat queries. If the underlying data has changed or the tables are reclustered, the cache won't be used.
- The results cache is global. Different users or roles running an identical query will use the cache.
- It's essentially free. Cached queries are served out of the cloud services layer and do not use storage or compute resources.
- Queries must be identical for the cache to work. Any syntax change will be considered a new query, even if logically equivalent.
- Not every query will benefit. Queries using external functions or volatile functions like
CURRENT_TIMESTAMP
won't leverage the results cache. - It can be disabled. You can force Snowflake to re-run a query and ignore the cache using the
USE_CACHED_RESULT
parameter.
Deep dive
For most queries that return data, the results are cached for 24 hours. Snowflake calls this feature Persisted Query Results. The core use case for the results cache is to optimize reporting, like when a popular report in a BI tool is reloaded frequently despite that the underlying data hasn't changed.
Result caching is used automatically. You can see this in action if you run the same query twice in a row - the first run will compute the results while the second run will pull the results from cache.
Access the results cache directly
Snowflake allows you to read from the results cache programmatically with the RESULT_SCAN
('{query_id}')
function. This gives you full access to all data that has been cached in the last 24 hours. Here is a simple example:
// Run a query
SELECT 1;
// Fetch the query id
SELECT LAST_QUERY_ID();
// => 01af1fae-0001-577b-0001-ed9600561e1a
// Retrieve query results from cache
SELECT
*
FROM
TABLE(RESULT_SCAN('01af1fae-0001-577b-0001-ed9600561e1a'))
;
The cache is typically for SELECT
queries. However, using RESULT_SCAN
you can access the results of any query that returns data, including:
SELECT
SHOW
DESC
CALL
DELETE
The RESULT_SCAN
function can a helpful tool for running queries on top of system commands like SHOW
that can't be embedded directly in a SELECT
. For example, if we needed to calculate the total number of rows across all tables, we could write:
SHOW TABLES;
WITH tbls AS (
SELECT *
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
)
SELECT
sum(tbls."rows") AS all_rows
FROM tbls
It is also possible to write results directly to the cache. This is helpful if you want to pre-warm the cache for queries that return lots of data. Using Snowflake scripting, we can execute a query and load the results into cache without returning those results to our client.
EXECUTE IMMEDIATE $$
BEGIN
EXECUTE IMMEDIATE 'SELECT ...';
RETURN LAST_QUERY_ID();
END;
$$;
Disabling results caching
There are situations where you don't want the results cache to be used, like benchmarking query performance. You can disable this by setting the USE_CACHED_RESULT
parameter on the session, the user, or the account.
// Change cache behavior for this session
ALTER SESSION SET USE_CACHED_RESULT = false;
// Change cache behavior for a user
ALTER USER teej SET USE_CACHED_RESULT = false;
// Change cache behavior globally
ALTER ACCOUNT SET USE_CACHED_RESULT = false;
Cache Utilization
It's possible to measure cache utilization with Snowflake's QUERY_HISTORY
view. While this view doesn't directly say if a query used the cache, you can use warehouse_size
as a proxy. When warehouse_size
is NULL, that means the query was either served from metadata or served from results cache.
SELECT
start_time::DATE as query_date,
COUNT_IF(WAREHOUSE_NAME IS NULL) / COUNT(*) as pct_cached
FROM
SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE
query_type = 'SELECT'
AND start_time >= current_timestamp - INTERVAL '14 day'
GROUP BY ALL
Caveats
There are situations where Snowflake will internally cache results, but they won't be used on subsequent queries. You'll most likely to run into this by using a function like CURRENT_TIMESTAMP()
which changes on every run.
Using these will always prevent Snowflake from using the results cache:
- Volatile functions (both built-in and user-defined)
- External functions
These other conditions will prevent Snowflake from using the results cache:
- Changing certain session parameters that modify query results, such as
ROWS_PER_RESULTSET
orTIMESTAMP_OUTPUT_FORMAT
. Session parameters that don't modify results likeAUTOCOMMIT
are OK. - Modifying table micro-partitions in any way. This includes all reclustering and DML operations.
- Modifying table schema by adding or removing columns, even if micro-partitions aren't modified. Other schema modifications like column comments and constraints are OK.
- Any changes to the query syntax. Whitespace changes are generally ok, but there are edge cases that will break caching, like adding a space between a function name and the starting parentheses.
COUNT
░(*)
. Inline comments are OK.
Other considerations
- In the query profile, a query that was served directly from cache will be represented by a single
QUERY RESULT REUSE
node. Queries that build on top of cached results will include anInternalObject
node to represent theRESULT_SCAN
call. - Snowflake respects permissions when accessing the results cache. Roles that wouldn't otherwise have access to the underlying tables will not be able to retrieve the cached data.
Results caching is an important feature that helps you achieve fast query times in Snowflake. Understanding how it works will help you improve performance and decrease costs.