November 5, 2024

Improving Credit Score Modeling With Digital Footprint Insights

Explore how digital footprint analysis improves credit scoring models, boosting default prediction accuracy and approval rates in emerging markets.

Vadim Ilyasov

CTO @RiskSeal

How to Improve Credit Score Modeling With Digital Footprints

Credit scoring models usually rely only on traditional financial data, which can limit their prediction accuracy.

Let’s explore how incorporating digital footprint analysis can enhance default prediction and approval rates, especially in emerging markets.

This article draws on an extensive study of how digital footprints affect credit scoring: “On the Rise of FinTechs: Credit Scoring Using Digital Footprints.”

We’ll discuss insights from this study alongside data from RiskSeal’s experience to show how these solutions elevate business credit scoring models beyond default prediction.

Credit score modeling - from traditional to data-driven insights

Credit score modeling uses statistical methods to predict how likely a borrower is to default, meaning they may not repay a loan as agreed.

Credit models use various data about the applicant to calculate the score.

Traditional models analyze the applicant's financial information, such as payment history, credit history length, types of credit accounts, and recent credit inquiries.

When lenders incorporate alternative data for credit scoring, they leverage a variety of non-traditional data sources. For example, RiskSeal offers digital footprint analysis as part of its comprehensive credit scoring solution.

Data enrichment enhances the predictive accuracy of score models, leading to higher approval rates and a reduction in defaults.

Top variables available from digital footprint analysis

The research team from the Frankfurt School of Finance & Management identified 10 key variables commonly analyzed in digital footprint analysis:

1. Device type. The device the customer uses to visit the site. Examples: Desktop, Tablet, Mobile.

2. Operating system. The operating system the customer is using. Examples: Windows, iOS, Android.

3. Email provider. The type of email service used by the client. Examples: Gmail, Yahoo, T-Online.

4. Channel. The path through which the customer came to the site. Examples: through ad clicks or direct URL entry.

5. Check-out time. The time of day the customer makes a purchase. This can be an indication of the customer's habits.

6. Do Not Track setting. Whether the customer's browser settings block tracking of device, operating system, and channel information.

7. Email error. Whether an error was made in the customer's email address.

8. Name in email. Whether the customer's email address contains their first or last name.

9. Number in email. The presence of numbers in the email address.

10. Is lower case. Whether the customer typed their first name, last name, street, or city in lower case when registering.

These variables can serve as proxy indicators of certain customer characteristics, such as income, character, or reputation, allowing you to segment customers accordingly.
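
Several of the variables listed above are simple flags that can be derived from registration data. The Python sketch below shows one possible way to compute three of them. The function name, the field names, and the exact matching rules (for example, checking only the part of the address before the @ sign) are assumptions made for illustration and are not taken from the study or from RiskSeal.

```python
import re


def footprint_flags(email: str, first_name: str, last_name: str) -> dict:
    """Derive three illustrative digital footprint flags (hypothetical rules)."""
    # Assumption: only the part before "@" is inspected.
    local_part = email.split("@", 1)[0].lower()
    typed_names = [n for n in (first_name, last_name) if n]
    return {
        # "Name in email": first or last name appears in the address.
        "name_in_email": any(n.lower() in local_part for n in typed_names),
        # "Number in email": at least one digit in the address.
        "number_in_email": bool(re.search(r"\d", local_part)),
        # "Is lower case": a name field was typed entirely in lower case.
        "is_lower_case": any(n.islower() for n in typed_names),
    }


# Made-up applicant, for illustration only.
flags = footprint_flags("anna.schmidt84@example.com", "anna", "Schmidt")
print(flags)
```

In practice such flags would be only a few of the inputs to a scoring model, alongside the device, channel, and timing variables.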

The resulting data are used to improve service quality, customize marketing campaigns, and assess risks.

Digital footprint variables used by RiskSeal

RiskSeal uses a number of variables that significantly improve the accuracy of a financial organization's credit scoring model.

We enrich the data with hundreds more data points. They can be obtained in three ways – email lookup, phone number lookup, and IP lookup.

These are RiskSeal’s variables:

1. By email

Linked online accounts on various online platforms
Email delivery success
Age of email
Domain information
Whether the email has been blacklisted
Type of email (e.g., identification of temporary email addresses)

2. By phone number

Type of phone number (burner phones, disposable numbers, virtual SIM cards)
Country code (can be compared with geolocation data)
Registered online accounts
Whether the number is on blacklists or spam lists

3. By IP

Type of connection (residential, commercial, or mobile)
Use of anonymizers (TOR, Proxy, VPN, etc.)
User geolocation
Presence of high-risk IPs in databases
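
As noted in the list above, the phone number's country code can be compared with geolocation data. The sketch below illustrates that comparison in Python. The lookup table is deliberately tiny, and the function names and the rule itself are hypothetical. This is not RiskSeal's implementation.

```python
from typing import Optional

# Deliberately small lookup of calling codes to ISO country codes (illustrative).
CALLING_CODE_TO_COUNTRY = {"+52": "MX", "+49": "DE", "+44": "GB"}


def phone_country(phone: str) -> Optional[str]:
    """Return the country implied by the phone number's calling code, if known."""
    for code, country in CALLING_CODE_TO_COUNTRY.items():
        if phone.startswith(code):
            return country
    return None


def country_mismatch(phone: str, ip_country: str) -> Optional[bool]:
    """True if the phone's country differs from the IP geolocation country.

    Returns None when the calling code is not in the lookup table.
    """
    country = phone_country(phone)
    if country is None:
        return None
    return country != ip_country.upper()


# Made-up example: a German number used from an IP geolocated in Mexico.
print(country_mismatch("+4915112345678", "MX"))
```

A mismatch on its own is not proof of fraud (people travel and use VPNs), so a signal like this is better treated as one input among many.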

Correlation between credit scores and default rates

The researchers found a clear relationship between the default rate and the borrower's credit score: the higher the score, the lower the default rate.

The default rate also varies with certain variables derived from digital footprint analysis.

This is displayed in the diagram below:

A scatter plot comparing digital footprint scores and credit bureau scores. A trend line suggests a slight positive correlation (R² = 1.0%), indicating that credit scores explain very little of the variation in digital footprint scores. The dataset includes 2,488 data points.

Based on this, we can draw the following conclusions:

1. The default rate is closely tied to the borrower's credit rating. The deciles in the bottom left corner of the graph indicate that users with the highest credit scores have the lowest default rates.

Conversely, users with the lowest credit scores have the highest default rate. These are placed in the top right corner of the diagram.

2. Linking default risk and digital footprints. Certain combinations of digital footprints, such as “Android + Yahoo” or “Android + Hotmail,” are associated with higher levels of default.

Other combinations, such as “Mac + T-Online”, indicate a lower likelihood of delinquency.

3. Impact of individual variables on credit risk levels. The graph indicates that, in addition to combinations of digital footprint variables, individual variables can also be useful in assessing credit risk.

Individual variables, like the operating system or email host, also affect the likelihood of default, though their impact may be less significant than that of data combinations.

The graph demonstrates that enhancing scorecards with alternative data can be useful for predicting credit risk alongside traditional credit scores.

RiskSeal's Digital Credit Scoring model explained

At RiskSeal, we also combine several variables to improve the credit score model. For example, being registered with both Amazon and Netflix is a positive attribute: this group of users tends to have a low default rate.

RiskSeal assigns borrowers a Digital Credit Score ranging from 0 to 999, determined through digital footprint analysis.

This digital credit scoring model predicts the probability of a person meeting their financial obligations, providing a direct assessment of default risk.

Our team analyzed the link between digital credit scores and default rates for a major credit institution in Mexico. The results show a clear relationship: the higher the score, the lower the default rate. See the table below for details.

A table showing digital credit score ranges from 0 to 999 with corresponding defaulter and good borrower counts, where default rates decrease from 52% in the 0–99 range to 5% in the 900–999 range.

This table shows our client's default rate trend:

Borrowers with a score of 0-99 have the highest default rate, at 52%
Borrowers with a score of 900-999 have the lowest default rate, at 5%

The chart below shows this trend more clearly:

A bar chart showing default rates by digital credit score range, decreasing from 52% at 0–99 to 5% at 900–999.
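
A breakdown like the one above can be produced by grouping borrowers into 100-point score bands and dividing defaulters by total borrowers in each band. The sketch below shows the arithmetic in Python. The function name and the sample records are made up for illustration and are not the client data described in this article.

```python
def default_rates_by_band(records, band_width=100):
    """Compute the default rate per score band.

    records: iterable of (score, defaulted) pairs, with score in 0..999.
    Returns a dict mapping a band label such as "0-99" to its default rate.
    """
    totals, defaults = {}, {}
    for score, defaulted in records:
        low = (score // band_width) * band_width
        band = f"{low}-{low + band_width - 1}"
        totals[band] = totals.get(band, 0) + 1
        defaults[band] = defaults.get(band, 0) + int(defaulted)
    return {band: defaults[band] / totals[band] for band in totals}


# Made-up sample: (digital credit score, did the borrower default?)
sample = [(42, True), (87, False), (950, False),
          (910, False), (999, False), (930, True)]
rates = default_rates_by_band(sample)
print(rates)
```

With real portfolio data, each band needs enough borrowers for its rate to be meaningful. Very small bands produce noisy percentages.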

Evaluating credit score models with AUC

AUC, or Area Under the Curve, is a statistical metric that represents the area under the Receiver Operating Characteristic (ROC) curve.

In the context of credit scoring, it allows you to evaluate the performance of a credit score model.

This chart shows a Receiver Operating Characteristic (ROC) curve for a credit scoring model. The x-axis represents the percentile by score, and the y-axis shows the proportion of defaults. The curve shows how effectively the model ranks individuals by their risk of default. A key highlight on the graph is the annotation that the lowest 25% of scores account for 65% of all defaults.

AUC indicates the probability that the model correctly ranks a random pair of borrowers (one defaulter, one non-defaulter), that is, rates the defaulter as the riskier of the two.

The higher the AUC, the more likely the model is to correctly predict defaults versus non-defaults.
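
The pairwise interpretation of AUC described above can be computed directly by comparing every defaulter with every non-defaulter. The sketch below is a minimal illustration in plain Python. The function name and sample data are hypothetical, and the quadratic loop is meant for clarity rather than for large portfolios.

```python
from itertools import product


def pairwise_auc(risk_scores, defaulted):
    """AUC as the share of (defaulter, non-defaulter) pairs ranked correctly.

    risk_scores: higher value means the model considers the borrower riskier.
    defaulted:   True if the borrower defaulted.
    Ties count as half a correct ranking.
    """
    bad = [s for s, d in zip(risk_scores, defaulted) if d]
    good = [s for s, d in zip(risk_scores, defaulted) if not d]
    if not bad or not good:
        raise ValueError("need at least one defaulter and one non-defaulter")
    correct = 0.0
    for b, g in product(bad, good):
        if b > g:
            correct += 1.0
        elif b == g:
            correct += 0.5
    return correct / (len(bad) * len(good))


# Made-up example: five borrowers, two of whom defaulted.
risk = [0.9, 0.8, 0.4, 0.3, 0.2]
outcome = [True, False, True, False, False]
auc = pairwise_auc(risk, outcome)
print(round(auc, 3))
```

If the model outputs a credit score where higher means safer, negate the score before passing it in so that higher still means riskier.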

When little information is available (e.g., for new customers or regions), an AUC of around 60% is generally considered desirable. When sufficient data is available, the target rises to 70%.

The graph above shows that the lowest-scoring 25% of customers account for 65% of all defaults. In other words, the model concentrates most defaults in the bottom quarter of the score distribution.
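
A figure of this kind (the share of all defaults found among the lowest-scoring customers) can be calculated by sorting customers by score and counting defaults below a cutoff. The Python sketch below illustrates the idea. The function name, the cutoff handling, and the sample data are assumptions for illustration only.

```python
def default_capture_rate(scores, defaulted, bottom_share=0.25):
    """Share of all defaults that fall in the lowest-scoring group.

    scores:       credit scores, where lower means riskier.
    defaulted:    True if the customer defaulted.
    bottom_share: fraction of customers, counted from the lowest score.
    """
    ranked = sorted(zip(scores, defaulted), key=lambda pair: pair[0])
    cutoff = int(len(ranked) * bottom_share)
    total_defaults = sum(1 for _, d in ranked if d)
    if total_defaults == 0:
        raise ValueError("no defaults in the sample")
    captured = sum(1 for _, d in ranked[:cutoff] if d)
    return captured / total_defaults


# Made-up sample of eight customers, three of whom defaulted.
scores = [100, 200, 300, 400, 500, 600, 700, 800]
outcome = [True, True, False, True, False, False, False, False]
share = default_capture_rate(scores, outcome)
print(round(share, 3))
```

In this toy sample the bottom quarter is two customers, and both defaulted, so the group captures two of the three defaults.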

Discriminatory power of credit bureaus vs. digital footprints

The graph below provides a visual comparison of the discriminatory power of traditional and alternative credit scoring.

ROC curve comparing credit bureau score (AUC 68.3%), digital footprint score (AUC 69.6%), and combined model (AUC 73.6%). Combined model shows 5.3 point improvement over bureau score, indicating stronger predictive power.

The key values represented on it are as follows:

The AUC for “Credit Bureau Score” is 68.3%. This shows that the credit score can predict defaults better than random guessing. However, its quality is not high.

The AUC for “Digital Footprint” is 69.6%. This figure is slightly higher than that of the credit score. That is, digital footprints provide comparable discriminatory power.

The AUC for the combined “Credit Bureau Score + Digital Footprint” model is 73.6%. This is 5.3 percentage points higher than that of the credit score model.

Such performance suggests that adding digital footprints to the credit score model significantly improves its predictive power.
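
The gaps between the AUC figures reported above are differences in percentage points, not relative percentages. The short Python sketch below reproduces the arithmetic using the three AUC values from the comparison. The function and constant names are made up for illustration.

```python
def auc_gain_pp(auc_model: float, auc_baseline: float) -> float:
    """Difference between two AUC values, in percentage points."""
    return round((auc_model - auc_baseline) * 100, 1)


# AUC figures reported in the comparison above, expressed as fractions.
AUC_BUREAU = 0.683
AUC_FOOTPRINT = 0.696
AUC_COMBINED = 0.736

gain_vs_bureau = auc_gain_pp(AUC_COMBINED, AUC_BUREAU)        # 5.3
gain_vs_footprint = auc_gain_pp(AUC_COMBINED, AUC_FOOTPRINT)  # 4.0
print(gain_vs_bureau, gain_vs_footprint)
```

The combined model therefore gains more relative to the bureau score alone than relative to the digital footprint alone.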

At RiskSeal, we have also seen scoring models gain significant predictive power once digital footprint data is added.

For example, one of our clients, a Mexican credit organization, achieved an AUC of 80% after just three months of using our solution.

This case study confirms that such AUC scores are realistic using only digital footprint data from RiskSeal.

ROC curve of a binary classifier with AUC 0.8, indicating strong performance. The curve rises sharply above the diagonal random baseline, showing effective discrimination between classes.

Adding digital footprints to traditional credit scores (Credit Bureau Scores) improves the discriminatory power of the model. It predicts defaults more accurately, which helps lenders reduce credit risk.

Influence of individual variables on AUC

To visualize the contribution of individual variables to AUC, we lay out the relevant results of the study.

Table showing AUC and marginal AUC gains for digital footprint variables predicting default risk.

This table presents how combinations of digital footprint variables contribute to predicting default risk, grouped by conceptual relevance.

This information permits us to analyze the impact of certain variables on the accuracy of default prediction.

The Marginal AUC is used for this purpose. It shows by how many percentage points the AUC of the credit scoring model increases when a specific variable or a combination of variables is added.

The main conclusions that were obtained are:

Computer and operating system: +1.71 p.p.
Email host: +2.44 p.p. (maximum for individual variables).
Other variables: +0.19 to +1.79 p.p.
Do Not Track setting: no effect on AUC.
Combinations of variables potentially indicative of consumer income (e.g., paid email hosts and operating system): +2.20 p.p.
Combinations of variables reflecting everyday behavior (e.g., browser settings, transaction completion time, etc.): +8.52 p.p. (maximum for combinations of variables).

This highlights that user behavior is critical in the credit risk prediction process.

Conclusion

Based on our own observations and the conclusions of the study, we at RiskSeal can speak with confidence about the crucial importance of digital footprint analysis for credit score modeling.

Incorporating data from alternative data providers boosts the predictive power of scoring models to new levels.

This approach contributes to the accurate prediction of default.

Beyond more efficient risk assessment, the methods described in this article make it possible to increase approval rates.

This is even more relevant in emerging markets, where credit bureaus often hold little or no information about potential borrowers.

FAQ

How does digital footprint analysis improve credit score predictions?

Data from digital footprint analysis can be incorporated into credit scoring models, increasing their predictive power. In the study discussed above, combining traditional and alternative data produced an AUC 5.3 percentage points higher than the credit bureau score alone and 4.0 percentage points higher than the digital footprint alone.

What types of digital variables are used in credit scoring models?

Two types of variables can be used in credit scoring models: those that potentially indicate a consumer's income (e.g., paid/free email host or OS), and those that reflect the applicant's daily behavior (e.g., browser settings, transaction completion time, etc.).

What is AUC, and why is it used to evaluate credit score models?

AUC is a statistical metric that measures the area under the ROC curve – Receiver Operating Characteristic. In the context of credit scoring, it allows you to evaluate the effectiveness of a credit score model.

AUC indicates the probability that the model correctly ranks a random pair (one default, one non-default). The higher the AUC, the better the model separates defaulters from non-defaulters.

How does RiskSeal’s data compare to traditional credit scoring results?

RiskSeal uses credit scoring data derived from analyzing the digital footprint of potential borrowers. Credit scoring models based on them are characterized by excellent predictive power. According to our observations, lenders can use them to achieve a record AUC of up to 0.8.
