As if the engineering and conceptual issues surrounding data analysis weren’t complicated enough, firms attempting to crunch “real world” data—i.e., information on actual people, such as financial and medical records—have to deal with a bewildering array of regulations designed to keep that data safe and secure.
Hence the evolution of startups such as Sqrrl, which utilizes Apache Accumulo to crunch data at a cell level, and can scale up to handle petabytes’ worth of data. Most of its founding staff, including CEO Oren Falkowitz, previously did stints at the National Security Agency; because after you’ve created data-analysis tools for top-secret intelligence work, what’s the next logical life-step but to start a small company with a cute cartoon squirrel for a logo? A few on the team also helped develop the Apache Accumulo project, which is why it forms the startup’s software backbone.
Apache Accumulo is based on Google’s BigTable design and built atop Apache Hadoop. Other open-source projects incorporating some elements of BigTable include Cassandra and HBase. Accumulo innovations include the aforementioned cell-level access control and the ability to “modify key/value pairs at various points in the data management process,” according to the Apache Software Foundation’s Website.
Apache Hadoop is an increasingly popular framework for reliably running distributed applications on large hardware clusters. It’s already in use by a number of organizations large and small, including IBM and Facebook, and research firms such as IDC estimate that worldwide revenues from software relying on it will reach hundreds of millions, if not billions, of dollars over the next few years.
However, Sqrrl claims that its software can “move beyond” Hadoop batch processing “and conduct a wide variety of real-time analyses, including information retrieval, statistical, graph, and visualizations across diverse data environments.”
In Sqrrl’s estimation, row- and column-level security restrictions simply aren’t enough to actually lock down data while ensuring a smooth analysis. Organizations need cell-level security, which Sqrrl claims can more easily support flexible data schemas, a variety of indexing patterns, and analytic adaptability “at scale.” Only certain users can see certain kinds of data, thanks to authorization-based access controls; product security measures include auditing, encryption, and integration with authentication systems such as Kerberos.
Sqrrl claims user authentication is undertaken with minimal loss of performance, due to built-in caching and compression algorithms. By locking down individual cells, the software can ensure that all pieces of information within a dataset are truly secure; contrast that with importing a whole record for analysis, which may include a combination of secured and unsecured data, a big no-no when it comes time to crunch data for, say, healthcare studies.
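To make the cell-versus-record distinction concrete, here is a minimal Python sketch of the general idea. It is not Sqrrl’s code and does not use the Accumulo API; the Cell class, the visible_cells function, the label names, and the sample patient record are all hypothetical. It also assumes the simplest possible rule: a reader sees a cell only if the reader holds every label attached to it. Accumulo’s own visibility expressions are richer, allowing boolean combinations of labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One value in a table, tagged with its own security labels (hypothetical model)."""
    row: str
    column: str
    value: str
    labels: frozenset  # the reader must hold ALL of these labels (assumed AND-only rule)


def visible_cells(cells, authorizations):
    """Return only the cells whose labels are all covered by the reader's authorizations."""
    auths = frozenset(authorizations)
    return [cell for cell in cells if cell.labels <= auths]


# A single "record" whose cells carry different sensitivity levels (made-up data).
cells = [
    Cell("patient-001", "age", "54", frozenset()),
    Cell("patient-001", "diagnosis", "asthma", frozenset({"medical"})),
    Cell("patient-001", "ssn", "000-00-0000", frozenset({"medical", "identity"})),
]

# A researcher cleared for medical data, but not for identifying data.
researcher_view = visible_cells(cells, {"medical"})
print([cell.column for cell in researcher_view])  # prints ['age', 'diagnosis']
```

With row-level control, the researcher in this sketch would either get the whole patient record, identifying data included, or none of it. Filtering per cell lets the same record feed a healthcare study with the restricted field withheld.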
Evidently, others see value in Sqrrl’s model: the company recently raised $2 million in capital.