As part of Lyft's data science challenge, a classmate and I used a dataset containing three months of data about drivers, riders, and rides to recommend how a driver's lifetime value can be assessed from simple data points.
Before throwing ourselves into building models, tuning parameters, or hunting for a distribution that happened to fit the data set, we took the time to understand what problem we were really solving and what it might entail. We then created a roadmap that would help us work toward a solution systematically, step by step.
1. General Research - Acquiring Domain Knowledge
2. White Boarding - What questions do we have to ask and what would we have to know?
3. Exploratory Analysis - Cleaning and analyzing descriptive properties as well as fitting models for analysis
4. Recalibration - Improving models based on new insights and identified pitfalls
5. Story Telling - Weaving results together with domain knowledge to build a story
1. General Research
To make sense of the data, we first researched how a driver's lifetime value is generated and the context around it. This included research about incentives for increased driving time, driver and company benefits, general lifetime value calculations, and specific driving behavior.
2. White Boarding
Once we understood what we were looking at, we broke the problem down into its subcomponents and the questions we had to answer, either by using the knowledge we had gained or by identifying the gaps we still had.
3. Exploratory Analysis
To begin our analysis, we created a simple descriptive analysis to understand the lifetime of a driver, measured up to the last date stored for that driver in the dataset. This also involved deciding whether to delete or replace missing data. We chose to remove incomplete data to keep the analyses simple: we had neither the domain knowledge nor enough data to be sure that filling gaps with means or medians would not skew the explanatory power of the analysis. Following this, we created a correlation heat map to understand which factors might affect the lifetime of a driver.
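The steps described above can be sketched roughly as follows. This is a minimal illustration, not the code from the challenge: the records, the field names (`onboard`, `last_ride`, `rides`), and the dates are all made up, and a single Pearson correlation stands in for one cell of the heat map.

```python
from datetime import date
from math import sqrt

# Hypothetical toy records; field names and values are assumptions,
# not the schema or data of the actual challenge.
drivers = [
    {"driver_id": "a", "onboard": date(2016, 4, 1), "last_ride": date(2016, 6, 20), "rides": 240},
    {"driver_id": "b", "onboard": date(2016, 4, 10), "last_ride": date(2016, 5, 1), "rides": 30},
    {"driver_id": "c", "onboard": date(2016, 5, 2), "last_ride": None, "rides": None},
    {"driver_id": "d", "onboard": date(2016, 4, 15), "last_ride": date(2016, 6, 25), "rides": 180},
]


def drop_incomplete(rows):
    """Keep only rows where every field is present (removal, no imputation)."""
    return [r for r in rows if all(v is not None for v in r.values())]


def lifetime_days(row):
    """Observed lifetime: days from onboarding to the last recorded ride."""
    return (row["last_ride"] - row["onboard"]).days


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


clean = drop_incomplete(drivers)                # driver "c" is removed
lifetimes = [lifetime_days(r) for r in clean]   # observed lifetime per driver
rides = [r["rides"] for r in clean]
r_value = pearson(rides, lifetimes)             # one cell of a correlation matrix
```

Repeating the `pearson` call for every pair of numeric columns would give the matrix that a heat map visualizes.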
4. Recalibration
Following the initial analysis, we delved further into what our data was telling us. For example, we did not have the entire scope of a driver's lifetime: a driver might appear to have stopped driving only because the dataset contains no data about their later trips. These limitations of the data set meant that we had to make educated decisions about which drivers to include, as well as how to interpret behavior in terms of driving regularity.
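One way to make the incomplete-lifetime problem concrete is to flag drivers whose last ride falls close to the end of the data window, since their observed lifetime is only a lower bound. The sketch below is an illustration under assumptions, not the rule we actually used: the end date, the 14-day cutoff, and the driver records are all hypothetical.

```python
from datetime import date

DATASET_END = date(2016, 6, 27)  # assumed last date covered by the toy data
INACTIVITY_DAYS = 14             # hypothetical cutoff, not taken from the challenge

# Hypothetical last recorded ride per driver.
last_rides = {
    "a": date(2016, 6, 20),
    "b": date(2016, 5, 1),
    "d": date(2016, 6, 25),
}


def still_active(last_ride, dataset_end=DATASET_END, cutoff=INACTIVITY_DAYS):
    """True if the driver rode within `cutoff` days of the end of the data,
    meaning the dataset likely ends before the driver's lifetime does."""
    return (dataset_end - last_ride).days < cutoff


status = {
    driver: "incomplete" if still_active(last_ride) else "stopped"
    for driver, last_ride in last_rides.items()
}
```

Drivers marked `incomplete` can then be excluded or treated separately, because counting their lifetime as finished would understate it.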
5. Story Telling
Using the information gained, we then built a story around what this might mean for Lyft, together with the limitations that we had found. This included a much deeper behavioral analysis to find patterns showing where advertising could be most effective at prolonging the lifetime of a driver.
To see the exact code and results please have a look at the Gist!