IV. EXPERIMENTAL EVALUATION
A. Experimental Setup
In this paper, we use HBase as the data layer and set up a cluster of five PCs. HBase stores its data on Hadoop's HDFS file system, and the cache layer uses Redis for in-memory storage management. The node configuration and software versions of the cluster are shown in Table I.
TABLE I. COMPUTER NODE CONFIGURATION INFORMATION.

Name               Configuration
CPU                Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz*4
Memory             16GB
Disk               1TB 7200RPM SATA II
OS                 openSUSE Leap 42.3
Network            Bandwidth 100Mbps
JVM Version        jdk1.8.0_144
Hadoop Version     hadoop-2.6.5
HBase Version      hbase-1.2.6
ZooKeeper Version  zookeeper-3.4.10
Redis Version      redis-4.0.9
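
For illustration, the following is a minimal sketch of the cache-aside read path implied by this setup: a query first checks the Redis cache layer and only falls back to HBase/HDFS on a miss. The table name, column family, and key scheme are assumptions made for the example, and the paper's DLK admission and eviction policy is not shown here.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;
    import redis.clients.jedis.Jedis;

    /** Cache-aside read path: try Redis first, fall back to HBase on a miss. */
    public class CacheAsideReader {
        private final Table table;   // HBase data layer
        private final Jedis cache;   // Redis cache layer

        public CacheAsideReader(Connection hbase, Jedis cache) throws Exception {
            this.table = hbase.getTable(TableName.valueOf("transactions")); // hypothetical table name
            this.cache = cache;
        }

        public String get(String rowKey) throws Exception {
            String cached = cache.get(rowKey);           // 1) look in Redis
            if (cached != null) {
                return cached;                           //    cache hit: no HBase I/O
            }
            Result row = table.get(new Get(Bytes.toBytes(rowKey)));  // 2) miss: read from HBase/HDFS
            String value = Bytes.toString(
                    row.getValue(Bytes.toBytes("cf"), Bytes.toBytes("record"))); // hypothetical family/qualifier
            if (value != null) {
                cache.set(rowKey, value);                // 3) populate the cache for later queries
            }
            return value;
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();  // reads hbase-site.xml from the classpath
            try (Connection hbase = ConnectionFactory.createConnection(conf);
                 Jedis redis = new Jedis("localhost", 6379)) {
                System.out.println(new CacheAsideReader(hbase, redis).get("row-0001"));
            }
        }
    }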
The data set used in the experiment is a set of online bank transaction records, with a total of 1,737,044 records in a 909MB CSV file. In the experiment, according to the data retrieval characteristics of data analysis and model building, 20,000 pieces of data are extracted, and an ordered access sequence in which high- and low-frequency items are uniformly distributed is generated, with a length of 15 sequences. To demonstrate the effectiveness of the proposed method, we compare it with LRU, LRU-K and 2Q. In addition, we repeat each experiment several times and average the results for each operation of the same algorithm to ensure the stability of the results, and we use three different cache capacities of 6000, 8000, and 1000.
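
As an illustration of how such a workload might be produced, the sketch below generates an ordered access sequence that mixes high-frequency and low-frequency keys; the hot/cold ratio and repeat counts are arbitrary assumptions, not the values used in the paper.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Random;

    /** Illustrative generator for an ordered access sequence mixing
     *  high-frequency ("hot") and low-frequency ("cold") record keys. */
    public class AccessSequenceGenerator {
        public static List<String> generate(int totalKeys, long seed) {
            Random rnd = new Random(seed);
            List<String> sequence = new ArrayList<>();
            for (int key = 0; key < totalKeys; key++) {
                // Assume 20% of the keys are hot and are requested ten times as often.
                int repeats = (rnd.nextDouble() < 0.2) ? 10 : 1;
                for (int i = 0; i < repeats; i++) {
                    sequence.add("row-" + key);
                }
            }
            // Shuffle so hot and cold accesses are uniformly interleaved;
            // the resulting list is then replayed in order against the cache.
            Collections.shuffle(sequence, rnd);
            return sequence;
        }

        public static void main(String[] args) {
            List<String> workload = generate(20_000, 42L);
            System.out.println("workload length: " + workload.size());
        }
    }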
B. Hit Rate
In the comparative experiment on the hit rate, the hit rate hr can be calculated as follows:

hr = hit(D) / (hit(D) + miss(D))    (3)

where D denotes the query data, hit(D) is the number of cache hits in the query data, and miss(D) is the number of cache misses.
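For example, with purely illustrative counts: if 12,000 of 20,000 queries hit the cache and the remaining 8,000 miss, then hr = 12,000 / (12,000 + 8,000) = 0.6.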
Fig. 4 depicts the comparative results of the hit rates. The results show that the hit rate of 2Q is no larger than DLK's. The reason is that the second-level LRU of 2Q evicts data directly, whereas DLK evicts data from the second-level queue (i.e., L2) and inserts it back into the first-level queue (i.e., L1), since DLK considers data in the second-level queue more likely to be retrieved again. LRU-K needs to maintain a retrieval-history sequence, so its available cache capacity is smaller than that of the other algorithms; moreover, when the retrieval interval of a data item is greater than the length of the LRU-K retrieval-history sequence, the item cannot be cached in memory.

Fig. 4 Hit rate.
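
The demotion step described above is what separates DLK from 2Q. The following is a minimal sketch of that idea, assuming an LRU-ordered double queue with a K-access promotion threshold; the class and method names, the promotion rule, and the capacity handling are assumptions made for illustration rather than the authors' implementation.

    import java.util.LinkedHashMap;
    import java.util.Map;

    /** Sketch of a double-queue, K-frequency cache (inspired by the paper's DLK).
     *  L1 holds newly admitted records; after K accesses a record is promoted to L2.
     *  Unlike 2Q, a record evicted from L2 is demoted back into L1 instead of
     *  being discarded, since it was recently popular and may be retrieved again. */
    public class DlkSketch<K, V> {
        private final int l1Capacity, l2Capacity, k;
        private final Map<K, Integer> counts = new LinkedHashMap<>();  // access counts for L1 entries
        private final LinkedHashMap<K, V> l1 = new LinkedHashMap<>(16, 0.75f, true);  // access-order LRU
        private final LinkedHashMap<K, V> l2 = new LinkedHashMap<>(16, 0.75f, true);

        public DlkSketch(int l1Capacity, int l2Capacity, int k) {
            this.l1Capacity = l1Capacity;
            this.l2Capacity = l2Capacity;
            this.k = k;
        }

        /** Record an access; returns the cached value, or null on a miss (valueIfMiss is admitted into L1). */
        public V access(K key, V valueIfMiss) {
            if (l2.containsKey(key)) {
                return l2.get(key);                    // hit in the hot queue
            }
            if (l1.containsKey(key)) {
                int c = counts.merge(key, 1, Integer::sum);
                V v = l1.get(key);
                if (c >= k) {                          // promoted after K accesses
                    l1.remove(key);
                    counts.remove(key);
                    putL2(key, v);
                }
                return v;
            }
            putL1(key, valueIfMiss);                   // miss: admit into L1
            counts.put(key, 1);
            return null;
        }

        private void putL1(K key, V value) {
            if (l1.size() >= l1Capacity) {             // L1 victim is simply dropped (served by HBase next time)
                K victim = l1.keySet().iterator().next();
                l1.remove(victim);
                counts.remove(victim);
            }
            l1.put(key, value);
        }

        private void putL2(K key, V value) {
            if (l2.size() >= l2Capacity) {             // L2 victim is demoted into L1, not discarded
                K victim = l2.keySet().iterator().next();
                V demoted = l2.remove(victim);
                putL1(victim, demoted);
                counts.put(victim, 1);
            }
            l2.put(key, value);
        }
    }

With this demotion, a recently hot record gets a second chance in L1 instead of being sent straight back to disk, which is consistent with the hit-rate advantage over 2Q reported above.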
C. Retrieval Time
The retrieval time refers to the time spent querying the entire retrieval data sequence, and it is directly related to the hit rate: when retrieved data hits the cache, the result is returned directly, saving the I/O and disk-seek time. Fig. 5 shows the data series retrieval time with and without cache algorithms. No Cache in the figure indicates that no cache is used, BlockCache indicates that only the LRU-based block cache provided by HBase is used, DLK indicates that only the record-oriented double-queue K-frequency cache method of this paper is used, and Both indicates that the BlockCache and DLK methods are used simultaneously. As seen from the figure, the retrieval time is lower when a cache is used, the record-oriented cache method is superior to the file-oriented BlockCache method, and DLK and BlockCache work best when used together.
Fig. 5 Retrieval time with and without cache.
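
To give a sense of how such a timing run could be collected, the sketch below replays an access sequence against the Redis/HBase read path and times the whole run; the HBase BlockCache is bypassed per request with setCacheBlocks(false) so that only the record-oriented cache is measured. The table name, column family, and helper classes are the same hypothetical ones used in the earlier sketches.

    import java.util.List;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;
    import redis.clients.jedis.Jedis;

    /** Times one replay of an access sequence, counting Redis hits and misses. */
    public class RetrievalTimer {
        public static void run(Connection hbase, Jedis redis, List<String> workload) throws Exception {
            Table table = hbase.getTable(TableName.valueOf("transactions"));  // hypothetical table
            long hits = 0, misses = 0;
            long start = System.nanoTime();
            for (String rowKey : workload) {
                if (redis.get(rowKey) != null) {
                    hits++;                               // served from the record-oriented cache
                    continue;
                }
                misses++;
                Get get = new Get(Bytes.toBytes(rowKey));
                get.setCacheBlocks(false);                // bypass the HBase BlockCache for this read
                byte[] value = table.get(get).getValue(Bytes.toBytes("cf"), Bytes.toBytes("record"));
                if (value != null) {
                    redis.set(rowKey, Bytes.toString(value));  // populate the cache for later accesses
                }
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            double hitRate = hits / (double) (hits + misses);  // hr = hit(D) / (hit(D) + miss(D)), Eq. (3)
            System.out.printf("retrieval time: %d ms, hit rate: %.3f%n", elapsedMs, hitRate);
            table.close();
        }
    }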
Fig. 6 shows the comparative results of the different cache algorithms. The experiment for each cache algorithm is performed with the HBase BlockCache disabled. As can be seen from the figure, the retrieval time of DLK is the shortest, because its higher hit rate requires less retrieval time.
Fig. 6 Retrieval time of the different cache algorithms.