This was a great discussion on how established leaders in their markets (e.g., Allstate in the insurance pricing arena) are using competitive platforms (in this case Kaggle) to bring new algorithmic concepts and analytical approaches to classic problems.
Allstate’s competition was discussed, and some interesting perspectives were shared.
Eric (from Allstate) mentioned that some of the winning algorithms would have been too complex to explain to Allstate’s customers and thus may not have been suitable for practical implementation. That statement is only partially true: most insurance carriers already use generalized linear models (GLMs) and their derivatives, whereas the public’s general comprehension ends with simple regression. So while the cause of a higher premium may be explained in generic terms, the actual contributors are usually deemed trade secrets. The ‘disconnect from the model’ or ‘black box model’ perception is going to increase over time as complex algorithms (sometimes referred to as ‘machine learning’) begin to gain prominence in more consumer-facing interactions.
Another point that was tangentially discussed was the inadvertent over-fitting of the submitted models to the sample data set. This is not surprising given the competitive personalities engaged on Kaggle, as Jeremy (Kaggle’s Chief Scientist) noted. The takeaway is that data preparation is paramount and that the criteria used for benchmarking need further scrutiny. After all, the participating teams are astute, and their deep technical expertise will be focused on winning as the competition defines it. In some cases, that definition may unintentionally deviate from the perceived objectives.
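To make the over-fitting point concrete, here is a toy sketch in Python. It is not based on the Allstate competition or its data. Everything in it is an assumption made for illustration: the function name `best_of_many`, the sample sizes, and the idea of treating each submission as pure random guessing on coin-flip labels. It simply shows that when many submissions are scored against the same small sample, the best score on that sample tends to overstate how the same model does on data it was never tuned against.

```python
import random


def best_of_many(n_models=200, n_public=50, n_private=5000, seed=0):
    """Pick the 'best' of many random models on a small public sample.

    Hypothetical illustration: labels are coin flips, so no model can
    genuinely do better than about 50% accuracy.
    """
    rng = random.Random(seed)
    public = [rng.randint(0, 1) for _ in range(n_public)]
    private = [rng.randint(0, 1) for _ in range(n_private)]

    def accuracy(preds, labels):
        return sum(p == y for p, y in zip(preds, labels)) / len(labels)

    best_public, best_private = -1.0, 0.0
    for _ in range(n_models):
        # Each "model" is random guessing on both data sets.
        public_preds = [rng.randint(0, 1) for _ in range(n_public)]
        private_preds = [rng.randint(0, 1) for _ in range(n_private)]
        public_acc = accuracy(public_preds, public)
        # Selection uses only the small public sample, as a leaderboard would.
        if public_acc > best_public:
            best_public = public_acc
            best_private = accuracy(private_preds, private)
    return best_public, best_private


if __name__ == "__main__":
    public_acc, private_acc = best_of_many()
    print(f"best score on small sample: {public_acc:.2f}")
    print(f"same model on held-out data: {private_acc:.2f}")
```

The winning score on the small sample looks well above chance, while the same model stays near 50% on the larger held-out set. The gap comes purely from selecting on the sample, which is why the benchmark criteria deserve as much scrutiny as the models.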
Finally, Jeremy raised an interesting point: the best models often come from outside the subject domain. Existing benchmarks are set by those familiar with the subject domain and thus well versed in its conventional ways. A ‘game changing shift’ requires a radical approach, and that generally happens when patterns from other domains or fields are applied.
It is going to be interesting to watch as new algorithms and tools are leveraged in established industries. Though I am not a big fan of ‘payday loan companies’, the founder of ZestCash was interviewed at the same conference about how his company is using thousands of attributes coalesced into 10 models to better rate and underwrite loans to the underbanked. This is quite revolutionary in a field where the ‘industry benchmark’ is a regression model developed in the 70s using 12-15 criteria and commonly referred to as FICO.