EMC’s recent acquisition of Pivotal Labs coincided with the release of Greenplum Chorus. The acquisition seems to be driven by the need to inject talent and leadership around Agile delivery into EMC’s internal software organization; large software shops need a bit of cultural change from time to time to enhance productivity and remain nimble. For Pivotal, distribution and marketing just got a shot in the arm: EMC’s sales and marketing, coupled with its penetration in F1000 companies, will help it compete better against the likes of Rally just as Agile delivery is starting to gain traction.
The main story, though, is around the notion of self-service. Just a few years ago, the idea of business users writing their own queries would have given their IT counterparts the shivers. IT has long had a centralized, locked-down mentality when it comes to business intelligence and analytics. IT staff felt that only they were intimately familiar with optimizing data access patterns, and that they understood the ramifications of data distribution better than the data’s owners. Times are changing, though: most analytically oriented databases can now manage ad hoc workloads better through smarter query optimization and processing, better-suited designs (e.g., columnar databases) and new infrastructure capabilities (e.g., SSD, in-memory architecture). In addition, the tools used by business users, such as MicroStrategy or Cognos, are getting better at pushing operations down to the database, allowing for better use of its processing power.
In the last year or so, the notion of self-service has expanded beyond queries. Chorus is a prime example of this trend. Some of its capabilities are foundational, like a federated metadata repository and search. On those dimensions, it addresses areas where Greenplum was lagging, and with this release Greenplum is closer to parity with some of its competitors.
The more notable, leading capabilities are self-service provisioning (or ‘spinning out’) of data sets for studies and the ability for users to integrate their own data sources via REST or by uploading common file formats. If the underlying data is stored appropriately, this allows data scientists to be self-sufficient for most of their daily activities, without relying on IT support. In addition, it accelerates the integration of third-party data sources in the investigative phase, enhancing overall organizational learning and productivity.
Finally, a few of the capabilities are an implicit acknowledgement that researchers and data analysts are not the most organized bunch: the concept of shared libraries and code seems like a marketing euphemism for code management tools. For most developers these tools are second nature, but for analytics departments that need to scale as they grow, they become a necessity to preserve institutional knowledge and manage rapid iterations across a large team.
Like EMC, Microsoft has been focusing on self-service, though more heavily on the accessibility and integration of third-party data than on ‘spinning out’ datamarts. Its first foray was the launch of the Data Marketplace on Azure, which offers applications as well as data sets. Supposedly, building on SQL Server 2012, there will also be a ‘private’ version of the marketplace for use within an organization, allowing users to collaborate on queries, data sets and visualizations.
It will be interesting to see how the database vendors (e.g., Teradata) react, and how the vertically integrated offerings (e.g., Cognos + DB2 from IBM) evolve, to address the growing awareness that social collaboration is key to unlocking the information potential of corporate data assets.