In some recent articles, there has been discussion of data standards for emerging insurance technologies such as telematics. The advice offered, however, misses the mark when it comes to optimizing business value from ‘Big Data’ concepts.
A key aspect of ‘Big Data’ is the ability to store raw data as it is generated and to leverage it once a need has been identified. Technologists call this ‘late binding’: data is extracted from source systems in its raw form, and structure is imposed only when a query is run. This allows one to store raw data and apply evolving structure as one learns more about the information needed.
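A minimal sketch of the ‘late binding’ idea in Python (the event fields and policy identifiers below are purely illustrative): raw telematics records are stored exactly as emitted, and structure is imposed only at query time, for only the fields the current question needs.

```python
import json

# Hypothetical raw telematics events, stored as-is in their source form.
raw_events = [
    '{"policy_id": "P-100", "ts": "2024-01-05T08:12:00Z", "speed_mph": 62}',
    '{"policy_id": "P-100", "ts": "2024-01-05T08:13:00Z", "speed_mph": 71, "hard_brake": true}',
    '{"policy_id": "P-200", "ts": "2024-01-05T09:01:00Z", "speed_mph": 35}',
]

def query(events, fields):
    """Impose structure only at query time: extract just the fields the
    current question needs, tolerating records that lack some of them."""
    rows = []
    for line in events:
        record = json.loads(line)
        rows.append({f: record.get(f) for f in fields})
    return rows

# Today's question needs speed; tomorrow's may need braking events instead,
# with no re-ingestion of the raw data required.
speeds = query(raw_events, ["policy_id", "speed_mph"])
```

Because the raw events are never reshaped on ingest, a new question simply means a new `fields` list, not a new extraction pipeline.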
To aid an iterative exploratory use of data, the following guiding principles should be considered:
1. Store raw information in non-relational but structured format
- At an entity level (e.g., customer, account), structure allows linkages to similar data items. This aids analysis and allows one to incrementally apply more detailed query drill-through at the time of analysis
- Lack of a relational construct does not mean that the data is unstructured – it just means that enough structure is in place to support further mapping into relational constructs at the time of analysis
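The entity-level structure described above can be sketched as follows (the record shapes and customer identifiers are hypothetical): records from different sources stay in their raw form, but an entity key provides just enough structure to link them and drill through at analysis time.

```python
from collections import defaultdict

# Hypothetical raw records from different sources, keyed by a business entity.
raw_records = [
    {"customer_id": "C-1", "source": "telematics", "payload": {"trip_miles": 12.4}},
    {"customer_id": "C-1", "source": "claims", "payload": {"claim_amount": 1800}},
    {"customer_id": "C-2", "source": "telematics", "payload": {"trip_miles": 3.1}},
]

# Non-relational but structured: no schema beyond the entity key,
# yet related items are linked and ready for drill-through.
by_entity = defaultdict(list)
for rec in raw_records:
    by_entity[rec["customer_id"]].append(rec)

# At analysis time: all sources for one customer, in one place.
customer_view = by_entity["C-1"]
```

The entity key is the only structure committed to up front; everything inside `payload` can be mapped into relational constructs later, once the analysis has made clear which fields matter.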
2. Premature optimization can impede innovation and discovery
- Discovery of insights relies on prototyping in an iterative manner
- Building out specific data models for the raw data to be mapped to (traditionally done as part of the ETL process) adds to overall solution time and effort, slowing that iteration
- Optimizing and mapping the raw information into a data model usually involves assumptions that can cause fidelity and detail to be lost – details that become key if the original assumptions are later found to be invalid
3. Controls to ensure privacy and protection need to be embedded at time of data sourcing
- The complexity of the sources and data structures necessitates that controls be embedded at the point of sourcing. Through a pragmatic approach, one can categorize the level of safeguards in a balanced manner that does not impede the prototyping and discovery approach
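One way to embed a safeguard at the point of sourcing can be sketched as below (the field names and the choice of a truncated SHA-256 hash are illustrative assumptions, not a prescribed control): direct identifiers are replaced with one-way tokens before the record is ever stored, so downstream prototyping works on protected data by default.

```python
import hashlib

def ingest(record, pii_fields=("driver_name", "vin")):
    """Apply a privacy control at the point of sourcing: replace
    direct identifiers with a one-way hash before storage."""
    safe = dict(record)
    for field in pii_fields:
        if field in safe:
            digest = hashlib.sha256(str(safe[field]).encode()).hexdigest()
            safe[field] = digest[:12]  # truncated token, still linkable
    return safe

stored = ingest({"driver_name": "Jane Doe", "vin": "1HGCM82633A", "speed_mph": 55})
```

Note that a plain hash is only a sketch: a production control would more likely use a keyed hash (HMAC) or a tokenization service to resist dictionary attacks, with the safeguard level chosen per data category as described above.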
4. Throw out the book on sampling
- Unless your data runs to petabytes or more, the need to sample down has diminished. For once, the bias of sampling is removed from the discovery process, allowing for an unadulterated perspective.