IV. EXPERIMENTAL EVALUATION
A. Experimental Setup
In this paper, we use HBase as the data layer and set up a cluster of five PCs. HBase stores its data on Hadoop's HDFS file system, and the cache layer uses Redis for in-memory storage management. The node configuration and software versions of the cluster are shown in Table I.
TABLE I. COMPUTER NODE CONFIGURATION INFORMATION.

Name               Configuration
CPU                Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz*4
Memory             16GB
Disk               1TB 7200RPM SATA II
OS                 openSUSE Leap 42.3
Network            Bandwidth 100Mbps
JVM Version        jdk1.8.0_144
Hadoop Version     hadoop-2.6.5
HBase Version      hbase-1.2.6
ZooKeeper Version  zookeeper-3.4.10
Redis Version      redis-4.0.9
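
For illustration, the following is a minimal sketch of the cache-aside read path implied by this setup: a query first checks the Redis cache layer and only falls back to HBase/HDFS on a miss. The table name, column family, and key scheme are assumptions made for the example, and the paper's DLK admission and eviction policy is not shown here.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;
    import redis.clients.jedis.Jedis;

    /** Cache-aside read path: try Redis first, fall back to HBase on a miss. */
    public class CacheAsideReader {
        private final Table table;   // HBase data layer
        private final Jedis cache;   // Redis cache layer

        public CacheAsideReader(Connection hbase, Jedis cache) throws Exception {
            this.table = hbase.getTable(TableName.valueOf("transactions")); // hypothetical table name
            this.cache = cache;
        }

        public String get(String rowKey) throws Exception {
            String cached = cache.get(rowKey);           // 1) look in Redis
            if (cached != null) {
                return cached;                           //    cache hit: no HBase I/O
            }
            Result row = table.get(new Get(Bytes.toBytes(rowKey)));  // 2) miss: read from HBase/HDFS
            String value = Bytes.toString(
                    row.getValue(Bytes.toBytes("cf"), Bytes.toBytes("record"))); // hypothetical family/qualifier
            if (value != null) {
                cache.set(rowKey, value);                // 3) populate the cache for later queries
            }
            return value;
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();  // reads hbase-site.xml from the classpath
            try (Connection hbase = ConnectionFactory.createConnection(conf);
                 Jedis redis = new Jedis("localhost", 6379)) {
                System.out.println(new CacheAsideReader(hbase, redis).get("row-0001"));
            }
        }
    }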
The data set used in the experiment is a set of online bank transaction records, with a total of 1,737,044 records in a 909MB CSV file. In the experiment, according to the data retrieval characteristics of data analysis and model building, 20,000 pieces of data are extracted, and an ordered access sequence in which high- and low-frequency items are uniformly distributed is generated, with a length of 15 sequences. To demonstrate the effectiveness of the proposed method, we compare it with LRU, LRU-K and 2Q. In addition, we repeat each experiment several times and average the results for each operation of the same algorithm to ensure the stability of the results, and we use three different cache capacities of 6000, 8000, and 1000.
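
As an illustration of how such a workload might be produced, the sketch below generates an ordered access sequence that mixes high-frequency and low-frequency keys; the hot/cold ratio and repeat counts are arbitrary assumptions, not the values used in the paper.

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Random;

    /** Illustrative generator for an ordered access sequence mixing
     *  high-frequency ("hot") and low-frequency ("cold") record keys. */
    public class AccessSequenceGenerator {
        public static List<String> generate(int totalKeys, long seed) {
            Random rnd = new Random(seed);
            List<String> sequence = new ArrayList<>();
            for (int key = 0; key < totalKeys; key++) {
                // Assume 20% of the keys are hot and are requested ten times as often.
                int repeats = (rnd.nextDouble() < 0.2) ? 10 : 1;
                for (int i = 0; i < repeats; i++) {
                    sequence.add("row-" + key);
                }
            }
            // Shuffle so hot and cold accesses are uniformly interleaved;
            // the resulting list is then replayed in order against the cache.
            Collections.shuffle(sequence, rnd);
            return sequence;
        }

        public static void main(String[] args) {
            List<String> workload = generate(20_000, 42L);
            System.out.println("workload length: " + workload.size());
        }
    }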
B. Hit Rate
In the comparative experiment on the hit rate, the hit rate hr can be calculated as follows:

hr = hit(D) / (hit(D) + miss(D))    (3)

where D denotes the query data, hit(D) is the number of cache hits in the query data, and miss(D) is the number of cache misses.
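For example, with purely illustrative counts: if 12,000 of 20,000 queries hit the cache and the remaining 8,000 miss, then hr = 12,000 / (12,000 + 8,000) = 0.6.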
Fig. 4 depicts the comparative results of the hit rates. The results show that the hit rate of 2Q is no larger than DLK's. The reason is that the second-level LRU of 2Q evicts data directly, whereas DLK evicts data from the second-level queue (i.e., L2) and inserts it back into the first-level queue (i.e., L1), since DLK considers data in the second-level queue more likely to be retrieved again. LRU-K needs to maintain a retrieval-history sequence, so its available cache capacity is smaller than that of the other algorithms; moreover, when the retrieval interval of a data item is greater than the length of the LRU-K retrieval-history sequence, the item cannot be cached in memory.

Fig. 4 Hit rate.
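
The demotion step described above is what separates DLK from 2Q. The following is a minimal sketch of that idea, assuming an LRU-ordered double queue with a K-access promotion threshold; the class and method names, the promotion rule, and the capacity handling are assumptions made for illustration rather than the authors' implementation.

    import java.util.LinkedHashMap;
    import java.util.Map;

    /** Sketch of a double-queue, K-frequency cache (inspired by the paper's DLK).
     *  L1 holds newly admitted records; after K accesses a record is promoted to L2.
     *  Unlike 2Q, a record evicted from L2 is demoted back into L1 instead of
     *  being discarded, since it was recently popular and may be retrieved again. */
    public class DlkSketch<K, V> {
        private final int l1Capacity, l2Capacity, k;
        private final Map<K, Integer> counts = new LinkedHashMap<>();  // access counts for L1 entries
        private final LinkedHashMap<K, V> l1 = new LinkedHashMap<>(16, 0.75f, true);  // access-order LRU
        private final LinkedHashMap<K, V> l2 = new LinkedHashMap<>(16, 0.75f, true);

        public DlkSketch(int l1Capacity, int l2Capacity, int k) {
            this.l1Capacity = l1Capacity;
            this.l2Capacity = l2Capacity;
            this.k = k;
        }

        /** Record an access; returns the cached value, or null on a miss (valueIfMiss is admitted into L1). */
        public V access(K key, V valueIfMiss) {
            if (l2.containsKey(key)) {
                return l2.get(key);                    // hit in the hot queue
            }
            if (l1.containsKey(key)) {
                int c = counts.merge(key, 1, Integer::sum);
                V v = l1.get(key);
                if (c >= k) {                          // promoted after K accesses
                    l1.remove(key);
                    counts.remove(key);
                    putL2(key, v);
                }
                return v;
            }
            putL1(key, valueIfMiss);                   // miss: admit into L1
            counts.put(key, 1);
            return null;
        }

        private void putL1(K key, V value) {
            if (l1.size() >= l1Capacity) {             // L1 victim is simply dropped (served by HBase next time)
                K victim = l1.keySet().iterator().next();
                l1.remove(victim);
                counts.remove(victim);
            }
            l1.put(key, value);
        }

        private void putL2(K key, V value) {
            if (l2.size() >= l2Capacity) {             // L2 victim is demoted into L1, not discarded
                K victim = l2.keySet().iterator().next();
                V demoted = l2.remove(victim);
                putL1(victim, demoted);
                counts.put(victim, 1);
            }
            l2.put(key, value);
        }
    }

With this demotion, a recently hot record gets a second chance in L1 instead of being sent straight back to disk, which is consistent with the hit-rate advantage over 2Q reported above.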
C. Retrieval Time
The retrieval time refers to the time spent querying the entire retrieval data sequence, and it is directly related to the hit rate: when retrieved data hits the cache, the result is returned directly, saving the I/O and disk-seek time. Fig. 5 shows the data series retrieval time with and without cache algorithms. No Cache in the figure indicates that no cache is used, BlockCache indicates that only the LRU-based block cache provided by HBase is used, DLK indicates that only the record-oriented double-queue K-frequency cache method of this paper is used, and Both indicates that the BlockCache and DLK methods are used simultaneously. As seen from the figure, the retrieval time is lower when a cache is used, the record-oriented cache method is superior to the file-oriented BlockCache method, and DLK and BlockCache work best when used together.
Fig. 5 Retrieval time with and without cache.
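
To give a sense of how such a timing run could be collected, the sketch below replays an access sequence against the Redis/HBase read path and times the whole run; the HBase BlockCache is bypassed per request with setCacheBlocks(false) so that only the record-oriented cache is measured. The table name, column family, and helper classes are the same hypothetical ones used in the earlier sketches.

    import java.util.List;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;
    import redis.clients.jedis.Jedis;

    /** Times one replay of an access sequence, counting Redis hits and misses. */
    public class RetrievalTimer {
        public static void run(Connection hbase, Jedis redis, List<String> workload) throws Exception {
            Table table = hbase.getTable(TableName.valueOf("transactions"));  // hypothetical table
            long hits = 0, misses = 0;
            long start = System.nanoTime();
            for (String rowKey : workload) {
                if (redis.get(rowKey) != null) {
                    hits++;                               // served from the record-oriented cache
                    continue;
                }
                misses++;
                Get get = new Get(Bytes.toBytes(rowKey));
                get.setCacheBlocks(false);                // bypass the HBase BlockCache for this read
                byte[] value = table.get(get).getValue(Bytes.toBytes("cf"), Bytes.toBytes("record"));
                if (value != null) {
                    redis.set(rowKey, Bytes.toString(value));  // populate the cache for later accesses
                }
            }
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            double hitRate = hits / (double) (hits + misses);  // hr = hit(D) / (hit(D) + miss(D)), Eq. (3)
            System.out.printf("retrieval time: %d ms, hit rate: %.3f%n", elapsedMs, hitRate);
            table.close();
        }
    }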
Fig. 6 shows the comparative results of the different cache algorithms. The experiment for each cache algorithm is performed with the HBase BlockCache disabled. As can be seen from the figure, the retrieval time of DLK is the shortest, because its higher hit rate requires less retrieval time.
Fig. 6 Retrieval time of the different cache algorithms.