Cloudera is stepping further into the security arena with Sentry, an open-source project that provides an authorization framework for Apache Hadoop.
The security module is designed to make Hadoop, a popular framework for crunching massive amounts of unstructured data, a more palatable option for high-security industries such as financial services and healthcare. It includes authorization controls for multi-user applications that can meet the Role Based Access Control (RBAC) requirements of those industries.
Sentry integrates with Apache Hive and Cloudera Impala, the latter an open-source SQL query engine for data stored in HDFS (Hadoop Distributed File System) and HBase. (Cloudera claims Impala can process queries 10 to 30 times faster than Hive/MapReduce.) It can assign user privileges at a sub-file level, which means companies and developers can have employees crunch massive datasets without worrying that more sensitive data will end up visible.
For example, say a company wants an employee to run a query that incorporates multiple datasets about customer behavior, but wants to keep certain types of information—credit card data, addresses, etc.—hidden from that employee. In theory, Sentry provides that sort of fine-grained control over access; administrators can choose to block data based on employee roles, or particular subsets of data within a database. There’s also multi-tenant administration, so those administrators can hand off security tasks to underlings.
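To make that scenario concrete, here is a minimal Python sketch of role-based filtering. It is an illustration only and does not use Sentry's API: the role names, column names, and the `visible_rows` helper are all hypothetical, chosen to mirror the credit-card example above.

```python
# Hypothetical illustration of role-based access control (RBAC).
# This is NOT Sentry's API; roles, columns, and helper names are assumptions.

# Each role maps to the set of columns it is allowed to read.
ROLE_COLUMNS = {
    "analyst": {"customer_id", "purchase_total"},
    "billing_admin": {"customer_id", "purchase_total", "credit_card", "address"},
}


def visible_rows(role, rows):
    """Return copies of the rows containing only the columns the role may see."""
    # Unknown roles get an empty set, so they see no columns (deny by default).
    allowed = ROLE_COLUMNS.get(role, set())
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]


rows = [
    {
        "customer_id": 1,
        "purchase_total": 42.5,
        "credit_card": "4111-XXXX",
        "address": "1 Main St",
    },
]

# The analyst can query customer behavior without seeing payment details.
print(visible_rows("analyst", rows))
```

In this sketch the analyst's result keeps only `customer_id` and `purchase_total`, while an unrecognized role gets back rows with no columns at all. A real deployment would enforce such rules inside the query engine rather than in application code.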
While other security solutions have hardened the perimeter around datasets, and encryption can shield data itself from unauthorized viewing, relatively few tools have dealt with the access side of the equation—something Sentry means to address.
But Sentry isn’t the first granular security solution for Big Data: software from database startup Sqrrl, for example, allows users to lock down data at an individual-cell level as it progresses through a particular system. Like Sentry, Sqrrl relies on authorization-based access controls, paired with other security measures such as auditing, encryption, and integration with authentication protocols such as Kerberos.
Big Data is a hot sub-industry at the moment, with revenues and software offerings expected to only increase in coming years. As it gets bigger, expect more security tools to roll out, as well.
Image: Fillipe Matos Frazao/Shutterstock.com