Explore how digital footprint analysis improves credit scoring models, boosting default prediction accuracy and approval rates in emerging markets.
Credit scoring models usually rely on traditional financial data, which can limit their predictive accuracy.
Let’s explore how incorporating digital footprint analysis can enhance default prediction and approval rates, especially in emerging markets.
This article draws on an extensive study of how digital footprints affect credit scoring: “On the Rise of FinTechs: Credit Scoring Using Digital Footprints.”
We’ll discuss insights from this study alongside data from RiskSeal’s experience to show how digital footprint analysis improves credit scoring models, both in predicting defaults and in raising approval rates.
Credit score modeling uses statistical methods to predict how likely a borrower is to default, that is, to fail to repay a loan as agreed.
Credit models use various data about the applicant to calculate the score.
Traditional models analyze the applicant's financial information, such as payment history, credit history length, types of credit accounts, and recent credit inquiries.
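A common way to turn such data into a default probability is logistic regression. The formula below is a generic sketch, not the specification of any model discussed in this article. Each x_i is an applicant variable (for example, a feature built from payment history), and the coefficients β_0 to β_k are assumed to be estimated from past loans with known outcomes:

```latex
\mathrm{PD}(x) = \Pr(\text{default} \mid x) = \frac{1}{1 + \exp\left(-\left(\beta_0 + \sum_{i=1}^{k} \beta_i x_i\right)\right)}
```

A credit score is then typically a decreasing function of this probability, so a lower predicted PD maps to a higher score.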
When lenders incorporate alternative data for credit scoring, they leverage a variety of non-traditional data sources. For example, RiskSeal offers digital footprint analysis as part of its comprehensive credit scoring solution.
Enriching the data in this way improves the predictive accuracy of scoring models, leading to higher approval rates and fewer defaults.
The research team from the Frankfurt School of Finance & Management identified 10 key variables commonly analyzed in digital footprint analysis:
1. Device type. The device from which the customer visits the site. Examples: Desktop, Tablet, Mobile.
2. Operating system. The operating system the customer uses. Examples: Windows, iOS, Android.
3. Email provider. The email service the customer uses. Examples: Gmail, Yahoo, T-Online.
4. Channel. The path by which the customer reached the site. Examples: clicking an ad, typing the URL directly.
5. Check-out time. The time of day the customer makes a purchase, which can indicate the customer's habits.
6. Do Not Track settings. Whether the customer has enabled the Do Not Track option, signaling that they do not want their device or operating system to be tracked.
7. Email error. Whether the customer's email address contains an error.
8. Name in email. Whether the customer's email address contains their name.
9. Number in email. Whether the email address contains numbers.
10. Is lower case. Whether the customer types their details (first name, last name, street, or city) entirely in lower case.
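Several of the email-based variables above are simple string checks. The sketch below shows one way to derive them; the names `email_features` and `is_lower_case` are hypothetical, the name-matching rule (a case-insensitive substring test) is an assumption, and the study may define these variables differently. Device, operating system, channel, and Do Not Track values would come from web session data rather than from the address itself.

```python
from dataclasses import dataclass


@dataclass
class EmailFeatures:
    provider: str          # domain part of the address, e.g. "gmail.com"
    name_in_email: bool    # first or last name appears in the local part
    number_in_email: bool  # local part contains at least one digit


def email_features(email: str, first_name: str, last_name: str) -> EmailFeatures:
    """Derive simple footprint variables from an email address (illustrative rules only)."""
    local, _, domain = email.strip().lower().partition("@")
    names = [n.strip().lower() for n in (first_name, last_name) if n.strip()]
    return EmailFeatures(
        provider=domain,
        name_in_email=any(n in local for n in names),
        number_in_email=any(ch.isdigit() for ch in local),
    )


def is_lower_case(*fields: str) -> bool:
    """True if every checkout field containing letters is typed entirely in lower case."""
    # Fields without letters (e.g. a house number) are ignored
    typed = [f for f in fields if any(ch.isalpha() for ch in f)]
    return bool(typed) and all(f.islower() for f in typed)


print(email_features("John.Smith84@Gmail.com", "John", "Smith"))
print(is_lower_case("john", "smith", "main st", "berlin"))
```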
These variables can serve as proxies for certain customer characteristics, such as income, character, or reputation, making it possible to segment customers along these criteria.
The resulting data are used to improve service quality, customize marketing campaigns, and assess risks.
RiskSeal uses a number of variables that significantly improve the accuracy of a financial organization's credit scoring model.
We enrich the data with hundreds of additional data points, obtained in three ways: email lookup, phone number lookup, and IP lookup.
These are RiskSeal’s variables:
1. By email
2. By phone number
3. By IP
The researchers found a clear link between the default rate and the borrower's credit score: the lower the score, the higher the default rate.
The default rate also depends on several variables derived from digital footprint analysis.
This is displayed in the diagram below:
Based on this, we can draw the following conclusions:
1. The default rate is closely linked to the borrower's credit score. The deciles in the bottom left corner of the graph show that users with the highest credit scores have the lowest default rates.
Conversely, users with the lowest credit scores have the highest default rates. They appear in the top right corner of the diagram.
2. Default risk is linked to digital footprints. Certain combinations of digital footprint variables, such as “Android + Yahoo” or “Android + Hotmail,” are associated with higher default rates.
Other combinations, such as “Mac + T-Online,” indicate a lower likelihood of delinquency.
3. Individual variables also affect credit risk. Beyond combinations of footprint variables, the graph indicates that single variables can also help assess credit risk.
Individual variables, such as the operating system or email host, affect the likelihood of default, although their impact may be smaller than that of variable combinations.
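The breakdown behind conclusions 2 and 3 can be reproduced by grouping loans by one or more footprint variables and computing the default rate in each group. The sketch below assumes each loan is a dict with hypothetical keys `os`, `email_host`, and `defaulted`; the toy records are invented only to exercise the code and are not data from the study.

```python
from collections import defaultdict


def default_rate_by_segment(loans, keys=("os", "email_host")):
    """Return {segment: (n_loans, default_rate)} for each combination of `keys`."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [loans, defaults]
    for loan in loans:
        segment = " + ".join(str(loan[k]) for k in keys)
        counts[segment][0] += 1
        counts[segment][1] += int(bool(loan["defaulted"]))
    return {seg: (n, d / n) for seg, (n, d) in counts.items()}


# Invented toy records, for illustration only
loans = [
    {"os": "Android", "email_host": "Yahoo", "defaulted": True},
    {"os": "Android", "email_host": "Yahoo", "defaulted": False},
    {"os": "Mac", "email_host": "T-Online", "defaulted": False},
    {"os": "Mac", "email_host": "T-Online", "defaulted": False},
]
for segment, (n, rate) in sorted(default_rate_by_segment(loans).items()):
    print(f"{segment}: {n} loans, default rate {rate:.0%}")
```

Passing a single key, such as `keys=("os",)`, gives the individual-variable view from conclusion 3.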
The graph demonstrates that enhancing scorecards with alternative data can be useful for predicting credit risk alongside traditional credit scores.
At RiskSeal, we also combine several variables to improve credit scoring models. For example, being registered with both Amazon and Netflix (“Amazon + Netflix”) is a positive attribute: this group of users tends to have a low default rate.
RiskSeal assigns borrowers a Digital Credit Score ranging from 0 to 999, determined through digital footprint analysis.
This digital credit scoring model predicts the probability of a person meeting their financial obligations, providing a direct assessment of default risk.
Our team analyzed the relationship between digital credit scores and default rates for a major credit institution in Mexico. The results show a clear correlation between the two. See the table below for details.
This table shows our client's default rate trend:
The chart below shows this trend more clearly:
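A table like the one above is typically built by bucketing scores into bands and computing the default rate in each band. The sketch below assumes integer scores on the 0 to 999 scale described earlier and a hypothetical band width of 100 points; the bands behind the actual table may differ.

```python
def default_rate_by_band(scores, defaults, band_width=100, max_score=999):
    """Return {"lo-hi": default_rate} for each non-empty score band, lowest band first."""
    bands = {}  # band lower bound -> (loans, defaults)
    for score, defaulted in zip(scores, defaults):
        if not 0 <= score <= max_score:
            raise ValueError(f"score {score} outside 0..{max_score}")
        lo = (score // band_width) * band_width
        n, k = bands.get(lo, (0, 0))
        bands[lo] = (n + 1, k + int(bool(defaulted)))
    return {
        f"{lo}-{min(lo + band_width - 1, max_score)}": k / n
        for lo, (n, k) in sorted(bands.items())
    }


print(default_rate_by_band([50, 120, 150, 910], [True, True, False, False]))
```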
AUC, or Area Under the Curve, is a statistical metric that represents the area under the Receiver Operating Characteristic (ROC) curve.
In the context of credit scoring, it allows you to evaluate the performance of a credit score model.
AUC is the probability that the model ranks a randomly chosen defaulter as riskier than a randomly chosen non-defaulter.
An AUC of 50% corresponds to random guessing and 100% to perfect separation, so the higher the AUC, the better the model distinguishes defaults from non-defaults.
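The pairwise definition of AUC translates directly into code. The sketch below assumes that a higher score means lower risk, as with credit scores, so a pair is ranked correctly when the defaulter has the lower score; ties count as half-correct. It compares every pair, which is fine for illustration but quadratic in sample size, so rank-based formulas are preferable for large portfolios.

```python
def auc(scores, defaults):
    """Share of (defaulter, non-defaulter) pairs the score ranks correctly; ties count as 0.5.

    Assumes a higher score means lower risk, so a pair is ranked correctly
    when the defaulter has the LOWER score.
    """
    bad = [s for s, d in zip(scores, defaults) if d]
    good = [s for s, d in zip(scores, defaults) if not d]
    if not bad or not good:
        raise ValueError("need at least one defaulter and one non-defaulter")
    correct = 0.0
    for b in bad:
        for g in good:
            if b < g:
                correct += 1.0
            elif b == g:
                correct += 0.5
    return correct / (len(bad) * len(good))


# Toy example: defaulters scored 300 and 700, non-defaulters 500 and 900
print(auc([300, 500, 700, 900], [1, 0, 1, 0]))  # 3 of 4 pairs correct -> 0.75
```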
As a rule of thumb, an AUC of 60% is desirable when little information is available (e.g., for new customers or regions), and 70% when sufficient data is available.
The graph above shows that the 25% of customers with the lowest scores account for 65% of all defaults. In other words, the model concentrates most defaults in the bottom quarter of the score distribution.
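The share of defaults captured by the lowest-scored customers can be computed as follows. This is a sketch: the function name is hypothetical, the sample is cut at `int(n * bottom_share)` customers, and customers with equal scores at the cutoff keep their input order.

```python
def default_capture_rate(scores, defaults, bottom_share=0.25):
    """Share of all defaults found among the lowest-scored `bottom_share` of customers."""
    ranked = sorted(zip(scores, defaults), key=lambda pair: pair[0])  # lowest score first
    cutoff = int(len(ranked) * bottom_share)
    total = sum(1 for _, d in ranked if d)
    if total == 0:
        raise ValueError("sample contains no defaults")
    captured = sum(1 for _, d in ranked[:cutoff] if d)
    return captured / total


# Toy example: 8 customers scored 1..8, three of whom defaulted
print(default_capture_rate(list(range(1, 9)), [1, 1, 0, 1, 0, 0, 0, 0]))
```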
The graph below provides a visual comparison of the discriminatory power of traditional and alternative credit scoring.
The key values represented on it are as follows:
The AUC for the “Credit Bureau Score” is 68.3%. This is better than random guessing (50%), but its discriminatory power is only moderate.
The AUC for “Digital Footprint” is 69.6%, slightly higher than that of the credit bureau score. In other words, digital footprints on their own provide comparable discriminatory power.
The AUC for the combined “Credit Bureau Score + Digital Footprint” model is 73.6%, which is 5.3 percentage points higher than that of the credit bureau score alone.
Such performance suggests that adding digital footprints to the credit score model significantly improves its predictive power.
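Because AUC is itself a percentage, improvements are best read as percentage-point (pp) differences. Using the AUC values reported above:

```latex
\begin{aligned}
\Delta\mathrm{AUC}_{\text{vs. bureau}} &= 73.6\% - 68.3\% = 5.3\ \text{pp} \\
\Delta\mathrm{AUC}_{\text{vs. footprint}} &= 73.6\% - 69.6\% = 4.0\ \text{pp} \\
\text{relative gain vs. bureau} &= \frac{5.3}{68.3} \approx 7.8\%
\end{aligned}
```

So the combined model improves on the credit bureau score by 5.3 percentage points, a relative improvement of roughly 7.8%.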
At RiskSeal, we have also observed a significant increase in predictive power when credit scoring models are enriched with digital footprints.
For example, one of our clients, a Mexican credit organization, achieved an AUC of 80% after just three months of using our solution.
This case shows that AUC values at this level are achievable using only RiskSeal's digital footprint data.
Adding digital footprints to traditional credit scores (Credit Bureau Scores) improves the discriminatory power of the model. It becomes more accurate in predicting defaults and reducing credit risk.
To show how individual variables contribute to AUC, we present the relevant results of the study.
This lets us analyze how specific variables affect the accuracy of default prediction.
The Marginal AUC is used for this purpose. It shows by how many percentage points the AUC of the credit scoring model increases when a specific variable or a combination of variables is added.
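The marginal AUC can be sketched as the difference between two fitted models, one with and one without the variables of interest. This is not the study's estimation procedure: it assumes a plain, unregularized logistic regression fitted by gradient descent, evaluates AUC on the same data used for fitting (a real evaluation should use held-out data), and all function names are hypothetical.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic regression (intercept + weights) by batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = _sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict_pd(w, X):
    """Predicted probability of default for each row of X."""
    return [_sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]


def auc_from_pd(probs, y):
    """Share of (defaulter, non-defaulter) pairs where the defaulter gets the higher PD; ties = 0.5."""
    bad = [p for p, d in zip(probs, y) if d]
    good = [p for p, d in zip(probs, y) if not d]
    if not bad or not good:
        raise ValueError("need at least one defaulter and one non-defaulter")
    correct = sum(1.0 if b > g else 0.5 if b == g else 0.0 for b in bad for g in good)
    return correct / (len(bad) * len(good))


def marginal_auc(X_base, X_extra, y):
    """AUC gain, in percentage points, from adding the X_extra columns to the base model."""
    X_full = [b + e for b, e in zip(X_base, X_extra)]
    base_auc = auc_from_pd(predict_pd(fit_logistic(X_base, y), X_base), y)
    full_auc = auc_from_pd(predict_pd(fit_logistic(X_full, y), X_full), y)
    return 100.0 * (full_auc - base_auc)


# Toy data: an uninformative base variable plus a binary footprint flag
X_base = [[0.0], [0.0], [0.0], [0.0]]
X_extra = [[1.0], [1.0], [0.0], [0.0]]
y = [1, 1, 0, 0]
print(marginal_auc(X_base, X_extra, y))
```

In this toy case the base model cannot rank anyone (AUC 50%) and the added flag separates the two classes perfectly, so the marginal AUC is 50 percentage points.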
The main conclusions that were obtained are:
This highlights that user behavior is critical in the credit risk prediction process.
Based on our own observations and the conclusions of the study, we at RiskSeal are confident that digital footprint analysis is crucial for credit score modeling.
Incorporating data from alternative data providers substantially boosts the predictive power of scoring models.
This approach leads to more accurate default prediction.
In addition to making risk assessment more efficient, the methods described in this article can also increase approval rates.
This is especially relevant in emerging markets, where many potential borrowers have little or no credit bureau history.
Data from digital footprint analysis can be incorporated into credit scoring models to increase their predictive power. In the study discussed in this article, combining traditional and alternative data raised AUC by 5.3 percentage points over the credit bureau score alone and by 4.0 percentage points over digital footprints alone.
Credit scoring models can use two types of variables: those that may indicate a consumer's income (e.g., a paid or free email host, or the operating system) and those that reflect the applicant's everyday behavior (e.g., browser settings or checkout time).
AUC is a statistical metric that measures the area under the ROC (Receiver Operating Characteristic) curve. In the context of credit scoring, it is used to evaluate how effective a credit score model is.
AUC is the probability that the model ranks a randomly chosen defaulter as riskier than a randomly chosen non-defaulter. The higher the AUC, the better the model distinguishes borrowers who will default from those who will not.
RiskSeal derives credit scoring data from the digital footprints of potential borrowers. In our experience, scoring models built on this data have strong predictive power: lenders using them have achieved an AUC of up to 0.8.