
store the cached data, but only the indexes of the data. When a data item has appeared K times in the retrieval-history queue, it is placed into the cache. When K is 1, the LRU-K algorithm reduces to LRU. The larger K is, the higher the hit ratio, but the lower the adaptability of the algorithm; the authors propose K = 2 as a reasonable trade-off between performance and overhead. Although LRU-K strengthens the influence of retrieval frequency on replacement through the retrieval-history sequence, this sequence takes up a good deal of cache space: if the history sequence is too short, the hit ratio drops; if it is too long, the remaining cache space shrinks, which hurts cache efficiency.
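As a rough illustration of the admission idea described above, the sketch below caches a data item only after it has appeared K times in a retrieval-history structure and evicts in LRU order. It is a minimal sketch under simplifying assumptions (the history is an unbounded counter rather than a bounded queue, and the class and method names are illustrative, not taken from the original LRU-K definition):

```python
# Minimal sketch of the LRU-K admission idea as described above: data is only
# admitted to the cache after K appearances in the retrieval history, which
# itself stores no cached data. Names are illustrative, not from the paper.
from collections import OrderedDict, defaultdict

class LRUKCache:
    def __init__(self, capacity, k=2):
        self.capacity = capacity
        self.k = k
        self.history = defaultdict(int)   # retrieval history: counts only, no data
        self.cache = OrderedDict()        # cached data kept in LRU order

    def get(self, key, load):
        if key in self.cache:             # hit: refresh recency and return
            self.cache.move_to_end(key)
            return self.cache[key]
        value = load(key)                 # miss: fetch from the underlying store
        self.history[key] += 1
        if self.history[key] >= self.k:   # admit only after K appearances
            if len(self.cache) >= self.capacity:
                self.cache.popitem(last=False)   # evict least recently used item
            self.cache[key] = value
            del self.history[key]
        return value
```

Bounding the history structure is exactly the trade-off noted above: a short history lowers the hit ratio, while a long history consumes cache space.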
Hongbin Tsai et al. [10] propose the Time-shift Least Recently Used (TSLRU) algorithm, which calculates the data access frequency within a period. TSLRU uses two data structures: a FIFO queue that stores recently retrieved data, and a min-heap that stores cached data that has been retrieved again. Caching data in different structures ensures that data with different retrieval frequencies has different cache lifetimes. Because TSLRU has to maintain the min-heap, inserting or deleting data requires a long time to adjust the heap structure.

Sai Huang et al. [11] put forward LARC, a dynamic cache algorithm based on SSDs (Solid State Drives). LARC caches data by blocks and divides them into mediocre blocks and popular blocks. Mediocre blocks are not admitted into the cache, which reduces the volume of data written to the SSD and improves performance and durability; to keep the hit ratio high, popular blocks are cached in the SSD. Under the premise of low overhead, LARC can adjust dynamically to accommodate different types of workloads. However, LARC requires SSD hardware support and cannot be used in hardware environments without SSDs; at the same time, it inherits the disadvantages of the ARC algorithm.

In distributed environments, scholars have also conducted extensive research. For example, Wei Ge et al. [12] propose Top-K, a cache replacement algorithm for hotspot index data based on the distributed memory of HiBase, which caches index data to improve index query efficiency. The core idea of Top-K is to use an “access-to-insert” strategy to make full use of the cache space. When the cache is full and a replacement has to be performed, a cumulative heat score is computed for each data item from its access frequency and a heat attenuation coefficient, and the data with the lowest cumulative heat score is evicted from the cache. However, in environments where the access frequencies of different data items are almost the same, swap-in and swap-out operations increase drastically, causing the hit ratio to drop. Jing Zhang et al. [13] adopt a cache system oriented to real-time cloud services, in which a distributed cache layer based on the LRU algorithm is added to the Hadoop Distributed File System to cache files. This cache system still has the weakness of LRU: it pays no attention to data access frequency.

Johnson et al. [7] propose the 2Q algorithm, which is based on two cache queues. Newly retrieved data is cached in the first queue; when data in the first queue is retrieved again, it is moved to the second queue. Both queues are managed with the LRU policy. However, the second queue of 2Q directly discards the data evicted by LRU, so page jitter tends to occur when data is accessed periodically.

Zhou et al. [8] propose the Multi-Queue (MQ) algorithm, which uses multiple LRU queues. When data in a queue is retrieved more than two times, it is promoted to the next queue, until it reaches the last queue. To prevent data from never being evicted, data that is not accessed within a specified time range is demoted from its current queue to the previous one. Instead of being eliminated directly, data removed from a cache queue is placed in a retrieval-history queue; if requested data hits in the retrieval-history queue, its priority is recalculated and it is moved to the cache queue of the appropriate level. However, the MQ algorithm is complex, has high execution overhead, and its delay is high in real-time systems; in application environments with strict real-time requirements, an overly complicated replacement strategy increases the response time.

The ARC algorithm [9] is a dynamic cache algorithm that uses two LRU queues, L1 and L2, to cache data; the capacities of the two queues, however, change dynamically. Corresponding one-to-one to L1 and L2 are two retrieval-history queues, H1 and H2, which, like the history queue of LRU-K, do not cache data. Newly requested data is put into L1 according to the LRU policy; when data in L1 is visited again, it is moved into L2. When L1 or L2 evicts data, the index of the evicted data is put into the corresponding history queue H1 or H2. When data in H1 is accessed, ARC considers the capacity of L1 insufficient, so the capacity of L2 is reduced by one and the capacity of L1 is increased by one; when data in H2 is retrieved, the symmetric adjustment is made in favor of L2. Both ARC and LRU-K need extra storage space, and the ARC algorithm is more complex and its replacement is more costly.

Considering both the frequency of data access and the use of double queues, this paper organizes cached data so that frequently retrieved data is kept in the cache space and is not easily evicted. In the cache space, the indexes of the cached data and the cached data itself are organized with different data structures, and data search and the corresponding cache replacement operations are carried out asynchronously to shorten the query response time.

III. AN LRU-BASED DOUBLE QUEUE K-FREQUENCY CACHE METHOD

A. Cache Architecture

In this paper we propose a two-layer cache architecture consisting of a cache layer and a data layer. The cache layer stores hot data; the data layer stores the original data in a distributed environment. Fig. 1 illustrates the cache architecture and the operation flow of data retrieval. The client sends a retrieval request to the cache layer. If the data is found directly in the cache, the cache layer returns the results to the client and, at the same time, triggers an asynchronous operation to update the related attributes of the data in the cache. If no data is found, the request is redirected to the data layer for retrieval; after the retrieval result is returned to the client, an asynchronous update operation is triggered and the result is inserted into the cache.
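The retrieval flow of Fig. 1 can be summarized in a short sketch. The cache_layer and data_layer interfaces below are hypothetical placeholders rather than components defined in this paper; the sketch only illustrates that cache updates are submitted asynchronously so they do not delay the response to the client:

```python
# Minimal sketch of the two-layer retrieval flow, assuming hypothetical
# cache_layer (get/put/update_attributes) and data_layer (query) interfaces.
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=4)

def retrieve(key, cache_layer, data_layer):
    result = cache_layer.get(key)
    if result is not None:
        # Cache hit: return immediately and update the data's cache attributes
        # (e.g. retrieval frequency) in the background.
        executor.submit(cache_layer.update_attributes, key)
        return result
    # Cache miss: redirect the request to the data layer.
    result = data_layer.query(key)
    # Return the result to the client while inserting it into the cache asynchronously.
    executor.submit(cache_layer.put, key, result)
    return result
```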