荐 Python functools.lru_cache 实现高速缓存及其原理

2022-07-31,,,

Python functools.lru_cache 实现高速缓存及其原理

      • 原理
      • 原理测试
        • 功能
        • 缓存原理测试
        • LRU算法
        • 图解绘图来源
      • 源码讲解
        • 依赖
        • lru_cache函数
        • _lru_cache_wrapper 函数
        • _make_key 生成缓存存贮的键
        • func.cache_clear() 清除缓存
        • func.cache_info() 获取缓存信息

原理

lru_cache的实现,依赖于Python的闭包,以及LRU算法。另外,这个缓存方式是线程安全的,其生命周期,开始于进程创立后的被装饰函数的的第一次运行,直到进程结束

  1. 借助Python的闭包,实现函数结果的高速缓存
  2. 借助LRU算法(最近最少使用),实现函数结果的高速缓存更新。
  3. 如果,maxsize=None,则将禁用LRU功能,并且缓存可以无限增长。
  4. 如果将typed设置为true,则将分别缓存不同类型的函数参数。例如,f(3)和f(3.0)将被视为具有不同结果的不同调用
  5. 由于使用字典来缓存结果,因此函数的位置和关键字参数必须是可哈希的。
  6. 可以将不同的参数模式视为具有单独的缓存项的不同调用。例如,f(a=1,b=2)f(b=2,a=1) 的关键字参数顺序不同,并且可能具有两个单独的缓存条目。
  7. 借助func.cache_info()方法,查看缓存信息
  8. 借助func.cache_clear()方法,清除缓存信息

讲解中,注释写的比较明白,如想进一步了解,可以仔细阅读

原理测试

功能

下面这个测试函数,只会在第一次运行的时候,停顿1s。其它次运行,不会停顿

import time
from functools import lru_cache


@lru_cache
def sleep(x, y):
    time.sleep(1)
    print('I woke up')
    return x * 10, y * 10

sleep(1, 2)
sleep(1, 2)

缓存原理测试

可以仔细观察,闭包参数的变化,执行函数前后,变化的参数前五个参数(下表中的最上面的5个参数,参数含义参考下文函数讲解)。由此可见,这个装饰器的缓存位置确实是利用闭包来实现

closure_pre = sleep.__closure__
for i in closure_pre:
    print(f'| {i} | {i.cell_contents} |')

sleep(1, 2)
closure = sleep.__closure__
for i in closure:
    print(f' {i} | {i.cell_contents} |')
print('| 第一次运行前 | 闭包参数 | 第一次运行后 | 闭包参数 |')
print('|--|--|--|--|')
参数名 第一次运行前 闭包参数 第一次运行后 闭包参数
cache <cell at 0x000001570D814B80: dict object at 0x000001570D554440> {} <cell at 0x000001570D814B80: dict object at 0x000001570D554440> {[1, 2]: [[[…], […], None, None], [[…], […], None, None], [1, 2], (10, 20)]}
full <cell at 0x000001570D814C10: bool object at 0x00007FF8232CD770> False <cell at 0x000001570D814C10: bool object at 0x00007FF8232CD770> False
misses <cell at 0x000001570D814C40: int object at 0x00007FF823311680> 0 <cell at 0x000001570D814C40: int object at 0x00007FF823311680> s0
hits <cell at 0x000001570D814D00: int object at 0x00007FF823311680> 0 <cell at 0x000001570D814D00: int object at 0x00007FF8233116A0> 1
root <cell at 0x000001570D814D30: list object at 0x000001570D7F4C80> [[…], […], None, None] <cell at 0x000001570D814D30: list object at 0x000001570D7F4C80> [[[…], […], [1, 2], (10, 20)], [[…], […], [1, 2], (10, 20)], None, None]
原函数 <cell at 0x000001570D814DF0: function object at 0x000001570D5A8940> <function sleep at 0x000001570D5A8940> <cell at 0x000001570D814DF0: function object at 0x000001570D5A8940> <function sleep at 0x000001570D5A8940>
PREV <cell at 0x000001570D644250: int object at 0x00007FF823311680> 0 <cell at 0x000001570D644250: int object at 0x00007FF823311680> 0
NEXT <cell at 0x000001570D6441F0: int object at 0x00007FF8233116A0> 1 <cell at 0x000001570D6441F0: int object at 0x00007FF8233116A0> 1
KEY <cell at 0x000001570D5396A0: int object at 0x00007FF8233116C0> 2 <cell at 0x000001570D5396A0: int object at 0x00007FF8233116C0> 2
RESULT <cell at 0x000001570D8143D0: int object at 0x00007FF8233116E0> 3 <cell at 0x000001570D8143D0: int object at 0x00007FF8233116E0> 3
maxsize <cell at 0x000001570D814CD0: int object at 0x00007FF823312680> 128 <cell at 0x000001570D814CD0: int object at 0x00007FF823312680> 128
typed <cell at 0x000001570D814D90: bool object at 0x00007FF8232CD770> False <cell at 0x000001570D814D90: bool object at 0x00007FF8232CD770> False
cache_len <cell at 0x000001570D814BE0: method-wrapper object at 0x000001570D814E20> <method-wrapper ‘len’ of dict object at 0x000001570D554440> <cell at 0x000001570D814BE0: method-wrapper object at 0x000001570D814E20> <method-wrapper ‘len’ of dict object at 0x000001570D554440>
cache_get <cell at 0x000001570D814BB0: builtin_function_or_method object at 0x000001570D7D3BD0> <built-in method get of dict object at 0x000001570D554440> <cell at 0x000001570D814BB0: builtin_function_or_method object at 0x000001570D7D3BD0> <built-in method get of dict object at 0x000001570D554440>
make_key <cell at 0x000001570D814CA0: function object at 0x000001570D81C310> <function _make_key at 0x000001570D81C310> <cell at 0x000001570D814CA0: function object at 0x000001570D81C310> <function _make_key at 0x000001570D81C310>
lock <cell at 0x000001570D814C70: _thread.RLock object at 0x000001570D814E40> <unlocked _thread.RLock object owner=0 count=0 at 0x000001570D814E40> <cell at 0x000001570D814C70: _thread.RLock object at 0x000001570D814E40> <unlocked _thread.RLock object owner=0 count=0 at 0x000001570D814E40>

LRU算法

这里主要是借助循环双链表实现的,参考下图,理解LRU实现过程。

初始化

添加第一个缓存

添加第二个缓存

重新运行第一次调用

缓存已经到了最大值,最旧的缓存替换为新缓存

图解绘图来源

Python Tutor,交互式网站,可以查看变量的引用动画

绘图代码,一次运行下面代码,观察变量变化

# 初始化

PREV, NEXT, KEY, RESULT = 0, 1, 2, 3
cache = {}
root = []
root[:] = [root, root, None, None] 

# 添加第一个缓存 
key= 1
result = 10
last = root[PREV]
link = [last, root, key, result]
last[NEXT] = root[PREV] = cache[key] = link

# 添加第二个缓存

key1 = 2
result1 = 20
last1 = root[PREV]
link1 =  [last, root, key1, result1]
last1[NEXT] = root[PREV] = cache[key1] = link1

# 重新运行第一次调用

link_prev, link_next, _key, result2 = link

link_prev[NEXT] = link_next
link_next[PREV] = link_prev
last = root[PREV]
last[NEXT] = root[PREV] = link
link[PREV] = last
link[NEXT] = root

# 缓存已经到了最大值,最旧的缓存替换为新缓存
key2 = 3
result2 = 30
oldroot = root
oldroot[KEY] = key2
oldroot[RESULT] = result2
root = oldroot[NEXT]
oldkey = root[KEY]
oldresult = root[RESULT]
root[KEY] = root[RESULT] = None
del cache[oldkey]
cache[key2] = oldroot

源码讲解

依赖

from collections import namedtuple
from _thread import RLock

WRAPPER_ASSIGNMENTS = ('__module__', '__name__', '__qualname__', '__doc__',
                       '__annotations__')
WRAPPER_UPDATES = ('__dict__',)

_initial_missing = object()

# 类的一种简易声明方式
_CacheInfo = namedtuple("CacheInfo", ["hits", "misses", "maxsize", "currsize"])

lru_cache函数

  1. maxsize:这个参数有两个含义:user_function,直接以lru_cache(user_function)形式使用,maxsize默认为128。另一种含义就是maxsize=maxsize,用来指定最大能缓存几个函数结果。
  2. typed:参数类型是否作为存储键的生成
def lru_cache(maxsize=128, typed=False):
    """
    :param maxsize: 这个参数有两个含义:user_function,直接以lru_cache(user_function)形式使用,maxsize默认为128。另一种含义就是maxsize=maxsize,用来指定最大能缓存几个函数结果。
    :param typed:
    :return:
    """
    if isinstance(maxsize, int):
        # Negative maxsize is treated as 0
        if maxsize < 0:
            maxsize = 0
    elif callable(maxsize) and isinstance(typed, bool):
        # The user_function was passed in directly via the maxsize argument
        user_function, maxsize = maxsize, 128
        wrapper = _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo)
        return update_wrapper(wrapper, user_function)
    elif maxsize is not None:
        raise TypeError(
            'Expected first argument to be an integer, a callable, or None')

    # maxsize=None
    def decorating_function(user_function):
        wrapper = _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo)
        return update_wrapper(wrapper, user_function)

    return decorating_function

_lru_cache_wrapper 函数

def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo):
    # 一个被装饰的函数对象,所有的调用,共用一个缓存对象
    sentinel = object()  # 未命中缓存时的默认值
    make_key = _make_key  # 用于生成缓存内容的键
    PREV, NEXT, KEY, RESULT = 0, 1, 2, 3  # names for the link fields

    cache = {}  # 闭包缓存内容存储位置
    hits = misses = 0  # hit:命中次数,misses:缓存结果个数
    full = False  # misses是否大于等于maxsize
    cache_get = cache.get  # 为了方便,写的一个中间函数
    cache_len = cache.__len__  # 缓存的长度,未调用,替代len(cache)函数
    lock = RLock()  # 列表的插入不是线程安全的,用这个锁来保证这个装饰器的线程安全
    root = []  # 循环双链表的根
    root[:] = [root, root, None, None]  # 初始化循环双向列表

    if maxsize == 0:
        def wrapper(*args, **kwds):
            # 没有缓存-只是统计信息更新
            nonlocal misses
            misses += 1
            result = user_function(*args, **kwds)
            return result

    elif maxsize is None:
        def wrapper(*args, **kwds):
            # 只会无限添加缓存,不会限制长度,顺序
            nonlocal hits, misses
            # 利用函数参数创建缓存键
            key = make_key(args, kwds, typed)
            result = cache_get(key, sentinel)
            if result is not sentinel:
                hits += 1
                return result
            misses += 1
            result = user_function(*args, **kwds)
            cache[key] = result
            return result

    else:
        def wrapper(*args, **kwds):
            nonlocal root, hits, misses, full
            key = make_key(args, kwds, typed)
            with lock:
                link = cache_get(key)
                if link is not None:
                    # 如果链表中存在这个缓存,将其移到链表的最前面
                    link_prev, link_next, _key, result = link
                    link_prev[NEXT] = link_next
                    link_next[PREV] = link_prev
                    last = root[PREV]
                    last[NEXT] = root[PREV] = link
                    link[PREV] = last
                    link[NEXT] = root
                    hits += 1
                    return result
                misses += 1
            result = user_function(*args, **kwds)
            with lock:
                if key in cache:
                    # 命中缓存
                    pass
                elif full:
                    # 如果缓存已经达到最大,将会删除最旧的缓存,写入新内容
                    oldroot = root
                    oldroot[KEY] = key
                    oldroot[RESULT] = result
                    root = oldroot[NEXT]
                    oldkey = root[KEY]
                    root[KEY] = root[RESULT] = None
                    del cache[oldkey]
                    cache[key] = oldroot
                else:
                    # 添加新内容到缓存中
                    last = root[PREV]
                    link = [last, root, key, result]
                    last[NEXT] = root[PREV] = cache[key] = link
                    full = (cache_len() >= maxsize)
            return result

    def cache_info():
        """缓存内容"""
        with lock:
            return _CacheInfo(hits, misses, maxsize, cache_len())

    def cache_clear():
        """清除缓存"""
        nonlocal hits, misses, full
        with lock:
            cache.clear()
            root[:] = [root, root, None, None]
            hits = misses = 0
            full = False

    wrapper.cache_info = cache_info
    wrapper.cache_clear = cache_clear
    return wrapper

_make_key 生成缓存存贮的键

class _HashedSeq(list):
    """ 
     此类保证每个元素调用hash()的次数不超过一次。 
     这很重要,因为lru_cache()将在缓存未命中多次哈希键。
    """
	# 这个声明的作用是阻止在实例化类时为实例分配dict,节省存储空间
    __slots__ = 'hashvalue'

    def __init__(self, tup, hash=hash):
        self[:] = tup
        self.hashvalue = hash(tup)

    def __hash__(self):
        return self.hashvalue


def _make_key(args, kwds, typed, kwd_mark=(object(),), fasttypes={int, str}, tuple=tuple, type=type, len=len):
	"""
    键值是由原函数参数和参数类型组成的,参数类型可以不适用
    :param typed: 
    """

    key = args
    if kwds:
        key += kwd_mark
        for item in kwds.items():
            key += item
    if typed:
        key += tuple(type(v) for v in args)
        if kwds:
            key += tuple(type(v) for v in kwds.values())
    elif len(key) == 1 and type(key[0]) in fasttypes:
        return key[0]
    return _HashedSeq(key)

func.cache_clear() 清除缓存

更改闭包参数,置为初始值。

def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo):
	...
	 def cache_clear():
        """清除缓存"""
        # 闭包中,在函数内部,用来更新闭包参数值,须先利用关键字nonlocal声明
        # 作用类似于 global
        nonlocal hits, misses, full
        with lock:
            cache.clear()
            root[:] = [root, root, None, None]
            hits = misses = 0
            full = False

func.cache_info() 获取缓存信息

输出结构声明参见上文的,依赖部分

def _lru_cache_wrapper(user_function, maxsize, typed, _CacheInfo):
	...
    def cache_info():
        """缓存内容"""
        with lock:
            return _CacheInfo(hits, misses, maxsize, cache_len())
# 输出内容
# CacheInfo(hits=0, misses=2, maxsize=128, currsize=2)

本文地址:https://blog.csdn.net/wei_bo_cai/article/details/107860395

《荐 Python functools.lru_cache 实现高速缓存及其原理.doc》

下载本文的Word格式文档,以方便收藏与打印。