How to maintain cache consistency? Different kinds of caches and different ways of keeping them consistent
As the famous saying goes, "There are two hard things in computer science: cache invalidation and naming things." This article introduces the basics of how a cache can become inconsistent with its source data.
作者:Lu Pan
出处:https://blog.the-pans.com/different-ways-of-caching-in-distributed-system/
Phil Karlton once said, "There are only two hard things in Computer Science: cache invalidation and naming things."[1] There are other great variations of the quote. My personal favorite is Jeff Atwood's: "There are two hard things in computer science: cache invalidation, naming things, and off-by-one errors." Apparently caching is hard. Like almost everything in distributed systems, it might not even look that hard at first glance. I will go through some common ways of caching in distributed systems, which should cover the vast majority of the caching systems you will use. Specifically, I will focus on how to keep a cache consistent.
Cache and cache consistency
Before we talk about different ways of caching, we need to be very precise about what we mean by cache and cache consistency, especially because consistency is a heavily overloaded term.
Here we define a cache as:
a separate system that stores a materialized partial view of an underlying data store.
Note that this is a very generic and loose definition. It includes what is commonly considered a cache: a system that stores the same values as the (usually durable) data store. It even includes things people usually would not think of as a cache, e.g. a database's disaggregated secondary index[2]. By our definition that can be a cache too, and maintaining cache consistency for it matters.
Here we call a cache consistent if: eventually the value of key k is the same as in the underlying data store, if k exists in the cache.
With this definition, a cache is always consistent if it stores nothing. But that's not interesting at all, since such a cache is utterly useless.
Why use a cache
Usually a cache is deployed to improve read/write performance. Performance here can mean latency, throughput, resource utilization, etc., and usually these are correlated. Protecting the database is usually also a very important motivation for building a cache, though you can argue that this too is a performance problem being solved.
Different kinds of caches
Look-aside / demand-fill cache
For a look-aside cache, the client queries the cache before querying the data store. If it's a HIT, it returns the value from the cache. If it's a MISS, it returns the value from the data store. That's it. This says nothing about how the cache should be filled; it only specifies how it is queried. But usually it's demand-fill. Demand-fill means that on a MISS, the client not only uses the value from the data store but also puts that value into the cache. Usually when you see a look-aside cache, it's also a demand-fill cache. But it doesn't have to be. E.g. you can have both the cache and the data store subscribe to the same log (e.g. Kafka) and materialize independently. This is a very reasonable setup. The cache in this case is a look-aside cache but not demand-fill, and the cache can even have fresher data than the durable data store.
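The read path described above can be sketched as follows. This is a minimal illustration, assuming dict-like cache and data-store objects; the class and method names are made up for the example, not a real API.

```python
# Sketch of a look-aside / demand-fill client. The client, not the cache,
# talks to the data store and fills the cache on a MISS.

class LookAsideClient:
    def __init__(self, cache, store):
        self.cache = cache    # e.g. Memcache (modeled as a dict here)
        self.store = store    # e.g. a durable database (also a dict here)

    def get(self, key):
        value = self.cache.get(key)
        if value is not None:           # HIT: serve straight from the cache
            return value
        value = self.store.get(key)     # MISS: fall back to the data store
        if value is not None:
            self.cache[key] = value     # demand-fill: populate the cache
        return value

cache, store = {}, {"k": "A"}
client = LookAsideClient(cache, store)
client.get("k")   # MISS: reads "A" from the store and fills the cache
client.get("k")   # HIT: now served from the cache
```

Note that the demand-fill step is an unguarded write to the cache, which is exactly where the inconsistency discussed next comes from.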
Simple, right? However, even a simple look-aside/demand-fill cache can have permanent inconsistency! This is often overlooked because of the simplicity of look-aside caches. Fundamentally, it happens because when the client PUTs a value into the cache, that value can already be stale. Concretely, for example: a client reads A for key k from the data store on a MISS; before it fills the cache, another client writes the new value B to the data store and invalidates the cache; the first client then fills the cache with the stale value A.
Then from that point on, clients will keep getting A from the cache instead of B, the latest value. Depending on your use case, this may or may not be OK. It also depends on whether the cache entry has a TTL on it. But you should be aware of this before using a look-aside/demand-fill cache.
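The race can be replayed deterministically by writing out one bad interleaving by hand. This is a toy illustration with dict-like cache/store objects; the step ordering below is the point, not the data structures.

```python
# Deterministic replay of the stale-fill race in a look-aside cache.

cache, store = {}, {"k": "A"}

# Client 1 misses in the cache and reads the soon-to-be-stale value.
v1 = cache.get("k") or store["k"]   # v1 == "A"

# Client 2 now updates the store and invalidates the cache entry.
store["k"] = "B"
cache.pop("k", None)

# Client 1, unaware of the update, demand-fills the stale value.
cache["k"] = v1

# Every later read HITs the cache and returns "A", not the latest "B".
print(cache["k"], store["k"])       # prints "A B"
```

Without a TTL or an invalidation that arrives after the fill, nothing ever corrects the cached "A".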
This problem can be solved. Memcache, for example, uses a lease to solve it. Fundamentally, the client is doing a read-modify-write on the cache with no primitives to guarantee the safety of the operation. In this setting, the read is reading from the cache, the modify is reading from the db, and the write is writing back to the cache. A simple solution for doing read-modify-write safely is to keep some kind of "ticket" that represents the state of the cache on read and to compare the "ticket" on write. This is effectively how Memcache solves the problem. Memcache calls it a lease, which you can think of as a simple counter that gets bumped on every cache mutation. So on read, the client gets back a lease from the Memcache host, and on write the client passes the lease along. Memcache fails the write if the lease has changed on the host. Now let's get back to our previous example:
Things are consistent again :)
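The lease idea can be sketched as a per-key version counter. This is a toy model of the mechanism described above, not Memcache's real protocol (which hands out leases on misses and has more machinery for thundering herds); all names here are illustrative.

```python
# Toy lease: a per-key counter bumped on every mutation/invalidation.
# A fill is rejected if the lease changed between the read and the write.

class LeaseCache:
    def __init__(self):
        self.data = {}     # key -> value
        self.lease = {}    # key -> version counter

    def get(self, key):
        """Return (value, lease). On a MISS, value is None but a lease is handed out."""
        self.lease.setdefault(key, 0)
        return self.data.get(key), self.lease[key]

    def invalidate(self, key):
        self.data.pop(key, None)
        self.lease[key] = self.lease.get(key, 0) + 1   # break outstanding leases

    def put(self, key, value, lease):
        if self.lease.get(key, 0) != lease:            # lease changed: reject the stale fill
            return False
        self.data[key] = value
        return True

# Replay the race from before:
cache, store = LeaseCache(), {"k": "A"}
_, lease = cache.get("k")        # client 1: MISS, gets a lease
v1 = store["k"]                  # client 1: reads "A" from the store
store["k"] = "B"                 # client 2: writes B ...
cache.invalidate("k")            # ... and invalidates, bumping the lease
ok = cache.put("k", v1, lease)   # client 1's stale fill is rejected (ok == False)
```

The stale value A never lands in the cache, so the next MISS will fetch the fresh B from the store.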
Write-through / read-through cache
A write-through cache means that for mutations, the client writes to the cache directly, and the cache is responsible for synchronously writing to the data store. It says nothing about reads: clients can do look-aside reads or read-through.
A read-through cache means that for reads, the client reads directly from the cache, and on a MISS the cache is responsible for filling the data from the data store and replying to the client's query. It says nothing about writes: clients can do demand-fill writes to the cache or write-through.
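The key difference from look-aside/demand-fill is that the cache service itself sits in front of the store for both paths. A minimal sketch, again with dict-like stand-ins and made-up names:

```python
# Sketch of a cache service that is both read-through and write-through:
# the cache, not the client, talks to the backing data store.

class ThroughCache:
    def __init__(self, store):
        self.store = store
        self.data = {}

    def read(self, key):
        if key in self.data:            # HIT
            return self.data[key]
        value = self.store.get(key)     # MISS: the cache fills itself from the store
        if value is not None:
            self.data[key] = value
        return value

    def write(self, key, value):
        self.store[key] = value         # synchronously write through to the store
        self.data[key] = value          # then update the cached copy

store = {"k": "A"}
cache = ThroughCache(store)
cache.write("k", "B")   # store and cache both see "B"
cache.read("k")         # HIT, returns "B"
```

Because reads and writes for a key all funnel through one place, they can be serialized there, which is what the consistency argument below relies on.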
Now you get a 2x2 table of combinations: {look-aside, read-through} on the read path crossed with {demand-fill, write-through} on the write path.
TAO (TAO: Facebook's Distributed Data Store for the Social Graph), for example, is a read-through & write-through cache.
It's not very common to have a write-through & look-aside cache: if you have already built a service that sits between the client and the data store and knows how to talk to the data store, why not use it for both read and write operations? That being said, with limited cache size, a write-through & look-aside cache might be the best option for hit ratio, depending on your query pattern. E.g. if most reads immediately follow a write, then a write-through & look-aside cache might provide the best hit ratio. Read-through & demand-fill doesn't make sense.
Now let's take a look at the consistency aspect of write-through & read-through caches. On a single box, as long as an update-lock for writes and a fill-lock for reads are grabbed properly, reads and writes to the same key can be serialized, and it's not hard to see that cache consistency will be maintained. With many replicas of the cache, it becomes a distributed systems problem, for which a few potential solutions exist. The most straightforward way to keep multiple cache replicas consistent is to have a log of mutations/events and update the caches based on that log. The log serves as the single point of serialization; it can be Kafka or even a MySQL binlog. As long as mutations are globally totally ordered in a way that makes the events easy to replay, eventual cache consistency can be maintained. Notice that the reasoning behind this is the same as for synchronization in distributed systems.
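The log-replay idea can be sketched in a few lines. Here a plain list stands in for Kafka or a MySQL binlog, and the replicas are dicts; the function names are illustrative.

```python
# Sketch: keeping cache replicas consistent by replaying one totally
# ordered mutation log, the single point of serialization.

log = [("set", "k", "A"), ("set", "k", "B"), ("delete", "k", None)]

def apply_log(replica, log, from_offset=0):
    """Replay log entries in order; return the next offset to resume from."""
    for op, key, value in log[from_offset:]:
        if op == "set":
            replica[key] = value
        elif op == "delete":
            replica.pop(key, None)
    return len(log)

replica1, replica2 = {}, {}
apply_log(replica1, log)
apply_log(replica2, log)
# Both replicas converge to the same state, no matter when each one replays.
```

Replicas may lag by different amounts, but because every replica applies the same mutations in the same order, they all converge, which is exactly the eventual consistency claimed above.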
Write-back / memory-only cache
There's another class of caches that can suffer data loss. E.g. a write-back cache acknowledges a write before writing it to the durable data store, so it can obviously lose data if it crashes in between. This type of cache has its own set of use cases, usually for very high throughput and QPS, where durability and consistency don't matter as much. Redis with persistence turned off fits into this category.
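The durability gap of a write-back cache is easy to see in a sketch: the window between the acknowledgment and the flush is where a crash loses data. Names and structure here are illustrative, not a real cache implementation.

```python
# Sketch of a write-back cache: writes are acknowledged immediately
# and flushed to the durable store later.

class WriteBackCache:
    def __init__(self, store):
        self.store = store
        self.data = {}
        self.dirty = set()      # keys acknowledged but not yet durable

    def write(self, key, value):
        self.data[key] = value
        self.dirty.add(key)     # ack now; durability is deferred

    def flush(self):
        for key in self.dirty:
            self.store[key] = self.data[key]
        self.dirty.clear()

store = {}
cache = WriteBackCache(store)
cache.write("k", "A")   # acknowledged, but the store is still empty
cache.flush()           # now durable; a crash before this line loses "A"
```

The upside is that writes complete at memory speed and can be batched into the store, which is why this design shows up in very high-throughput systems.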
According to https://martinfowler.com/bliki/TwoHardThings.html and https://skeptics.stackexchange.com/questions/19836/has-phil-karlton-ever-said-there-are-only-two-hard-things-in-computer-science, this famous quote indeed came from Phil Karlton. ↩︎
It doesn't work like a regular aggregated secondary index: if you read from the index, it might be lagging and show stale results. The normal ACID properties of a relational database don't specifically mention indexes. I would say a database with a disaggregated secondary index breaks atomicity in ACID. ↩︎
This article is a repost; the copyright belongs to the original author, and the content does not represent the views of this site.