The goal is to build a production-grade system for predicting the buying behavior of current and prospective customers of insurance companies, and a model for predicting the probability that a customer cancels their policy within the next year.
The entire insurance industry in Denmark is buzzing with machine learning and data-driven opportunities, yet few companies have attempted to improve their processes using these technologies. As a collaboration between TIA Technology and one of our customers, we set out to build a product recommendation system and a churn model that bring measurable business value and set a precedent for how machine learning can be used efficiently in the Nordic insurance market.
We use KNN and neural network models for product recommendations, and an LSTM and XGBoost for churn.
The gif below shows a live integration between a front-end platform and a machine learning API.
The model's predictions update when an insurance product is added to a customer's profile.
Oracle SQL Developer, Python, Pandas, Keras, Sklearn, Flask, Docker
We mostly work with data stored in a data warehouse built for our BI department, which also provides us with expertise on data extraction. We use Pandas for data transformations, Sklearn for the product recommendations, and Keras for time series modelling.
The slides are a part of a presentation for a client we’re working with.
- Product recommendation
The product recommendation system consists of two parts: a KNN model that approximates current customers' buying behavior by comparing each customer to their nearest neighbors, and a neural network that does the same for new customers. We use XGBoost to interpret the results in terms of feature importances.
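The nearest-neighbor idea can be illustrated with a minimal, dependency-free sketch. It is not the project's implementation (which uses Sklearn); the feature layout, the product names, the values of `k` and `top_n`, and the function name `recommend_products` are all assumptions made for illustration. Features are also left unscaled here, which a real pipeline would normally address.

```python
from collections import Counter
from math import dist


def recommend_products(target_features, target_products, customers, k=3, top_n=2):
    """Rank products the target lacks by how many of the k nearest customers own them."""
    # Sort existing customers by Euclidean distance to the target and keep the k closest.
    neighbours = sorted(
        customers, key=lambda c: dist(target_features, c["features"])
    )[:k]
    # Count products owned by the neighbours that the target does not own yet.
    counts = Counter(
        product
        for neighbour in neighbours
        for product in neighbour["products"]
        if product not in target_products
    )
    return [product for product, _ in counts.most_common(top_n)]


# Hypothetical customers: features are (age, number of children).
customers = [
    {"features": (30, 1), "products": ["car", "home"]},
    {"features": (32, 1), "products": ["car", "home", "travel"]},
    {"features": (65, 0), "products": ["health"]},
    {"features": (29, 2), "products": ["car", "home", "travel", "accident"]},
]

recommendations = recommend_products((31, 1), ["car"], customers)
print(recommendations)
```

In this toy data the 65-year-old customer is far from the target and is excluded from the neighbourhood, so only products held by the three similar customers are counted.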
- Churn model
The churn model works at the insurance policy level: year-by-year time series data is used to estimate the probability that a policy will not be renewed next year. As a baseline we use an XGBoost linear model on the last row before prediction, and we try to beat its performance with an RNN.
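The difference between the two inputs can be sketched as follows: the baseline sees only the most recent yearly row of each policy, while a sequence model sees the whole history. This is a simplified illustration in plain Python rather than the Pandas code used in the project; the column names (`policy_id`, `year`, `premium`, `claims`) and both helper functions are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def policy_histories(rows, prediction_year):
    """Group yearly rows per policy, ordered by year, using only data before prediction_year."""
    history = sorted(
        (r for r in rows if r["year"] < prediction_year),
        key=itemgetter("policy_id", "year"),
    )
    return {
        policy_id: list(group)
        for policy_id, group in groupby(history, key=itemgetter("policy_id"))
    }


def last_rows(histories):
    """Baseline input: the single most recent row of each policy."""
    return {policy_id: sequence[-1] for policy_id, sequence in histories.items()}


# Hypothetical year-by-year policy data.
rows = [
    {"policy_id": "A", "year": 2017, "premium": 1000, "claims": 0},
    {"policy_id": "A", "year": 2018, "premium": 1100, "claims": 1},
    {"policy_id": "B", "year": 2018, "premium": 800, "claims": 0},
    {"policy_id": "A", "year": 2019, "premium": 1200, "claims": 0},
]

sequences = policy_histories(rows, prediction_year=2019)  # input for a sequence model
baseline_inputs = last_rows(sequences)                    # input for the baseline
```

Filtering on `prediction_year` before grouping keeps rows from the predicted year out of both inputs, which avoids leaking the outcome into the features.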
My contribution as an ML Software Engineer
I am involved in every part of the pipeline and am responsible
for the entire process, from the data becoming available in CSV format,
through conceptualization, experimentation, model building, and communication with the client, to a live API.
I developed the product recommendation and churn models and created a Flask API for retraining them
and exposing their predictions. I integrated it with a front-end application and used docker-compose
to make an online version of the product available for demo purposes.
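
A retrain-and-predict API of this kind might look roughly like the sketch below. The endpoint paths, the JSON payload shapes, and the `BaseRateModel` placeholder (which simply returns the churn rate seen in its training labels) are assumptions for illustration and do not reflect the actual service or models.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


class BaseRateModel:
    """Placeholder model: predicts the churn rate observed in the training labels."""

    def __init__(self):
        self.rate = 0.0

    def fit(self, labels):
        self.rate = sum(labels) / len(labels) if labels else 0.0

    def predict(self, features):
        # A real model would use the features; the placeholder ignores them.
        return self.rate


model = BaseRateModel()


@app.route("/retrain", methods=["POST"])
def retrain():
    # Hypothetical payload: {"labels": [1, 0, ...]} where 1 means the policy churned.
    payload = request.get_json(force=True)
    model.fit(payload["labels"])
    return jsonify({"status": "retrained", "base_rate": model.rate})


@app.route("/predict", methods=["POST"])
def predict():
    # Hypothetical payload: {"features": {...}} describing one policy.
    payload = request.get_json(force=True)
    return jsonify({"churn_probability": model.predict(payload["features"])})
```

A front end would then POST to `/predict` whenever a customer's profile changes and display the returned probability. Keeping the model in a module-level variable is only suitable for a single-process demo.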