Apache Ranger Centralized Security and Audit Framework
Apache Ranger offers a centralized security framework to manage fine-grained access control over Hadoop, Hive and other related components such as HBase. Using the Apache Ranger administration console, users can easily manage policies controlling access to a Hive database, table, or column for a particular set of users and/or groups. For deeper control over the environment, Apache Ranger also provides audit tracking and policy analytics.
Apache Ranger policy control consists of two major parts:
• Specification of the resources to which the policy applies (such as Hive databases, tables, or columns)
• Specification of the conditions under which access should be allowed, such as users/groups, access types, and custom conditions
Hive Performance Optimization
We have already discussed three important elements of an Apache Hive implementation that need to be considered carefully to get optimal performance from Apache Hive:
• Make sure Tez is installed on the EMR cluster and used as the Hive execution engine
• Partition the data to avoid full table scans
• Use ORC as the underlying storage file format
In this section, we want to discuss Tez in more detail and mention three more performance levers that can significantly improve query performance in Hive: Vectorized Query Execution, the Cost-Based Optimizer (CBO) and Live Long and Process (LLAP).
Apache Hive on Tez
In Urdu, the word ‘Tez’ means fast, swift, or intelligent. Apache Tez has become the new paradigm for Hive execution by enabling sub-second query performance that was not possible in the ‘MapReduce’ world. MapReduce is still supported for Hive execution, but Tez is now the default engine when running Hive jobs in Hadoop. As mentioned before, Tez reduces disk I/O by avoiding expensive shuffles and sorts, while leveraging more efficient map-side joins. In a typical execution pattern, data flows from node to node of an execution graph (like Apache Spark, Tez represents the computation as a directed acyclic graph, or DAG). One reducer's intermediate output is passed to the next reducer without being written to HDFS in between. Consequently, Apache Tez benefits from more memory (such as the heap size of HiveServer) and from tuning of memory parameters. For
Figure 4: Apache Ranger Architecture (security admins define policies and audit user activity through the Policy Admin Server and its Policy DB; the Ranger User Sync Server syncs users from enterprise directory services; the HDFS, Hive, HBase, Knox and Storm plugins sync policies, authorize enterprise users' access to data in Hadoop systems, and store audit activity in the Audit Store; policy integration with enterprise legacy systems)
28 | THE DOPPLER | FALL 2017