
Improving Credit Score Modeling With Digital Footprint Insights

Explore how digital footprint analysis improves credit scoring models, boosting default prediction accuracy and approval rates in emerging markets.

Vadim Ilyasov
CTO @RiskSeal

Credit scoring models usually rely solely on traditional financial data, which can limit their prediction accuracy.

Let’s explore how incorporating digital footprint analysis can enhance default prediction and approval rates, especially in emerging markets.

This article draws on an extensive study of the impact of digital footprints on credit scoring: “On the Rise of FinTechs – Credit Scoring using Digital Footprints”.

We’ll discuss insights from this study alongside data from RiskSeal’s experience to show how these solutions elevate business credit scoring models beyond default prediction.

Credit score modeling - from traditional to data-driven insights

Credit score modeling uses statistical methods to predict how likely a borrower is to default, meaning they may not repay a loan as agreed.

Credit models use various data about the applicant to calculate the score. 

Traditional models analyze the applicant's financial information, such as payment history, credit history length, types of credit accounts, and recent credit inquiries.

When lenders incorporate alternative data for credit scoring, they leverage a variety of non-traditional data sources. For example, RiskSeal offers digital footprint analysis as part of its comprehensive credit scoring solution.

Data enrichment enhances the predictive accuracy of score models, leading to higher approval rates and a reduction in defaults.


Top variables available from digital footprint analysis

The research team from the Frankfurt School of Finance & Management identified 10 key digital footprint variables:

1. Device type. The device the customer uses to visit the site. Examples: Desktop, Tablet, Mobile.

2. Operating system. The operating system the customer is using. Examples: Windows, iOS, Android.

3. Email provider. The type of email service used by the client. Examples: Gmail, Yahoo, T-Online.

4. Channel. The path through which the customer came to the site. Examples: through ad clicks or direct URL entry.

5. Check-out time. The time of day the customer makes a purchase. This can be an indication of the customer's habits.

6. Do Not Track settings. Customers can choose whether to allow tracking of their device or operating system.

7. Email error. Whether an error was made in the customer's email address. 

8. Name in email. Whether the customer's email address contains their name.

9. Number in email. The presence of numbers in the email address.

10. Is lower case. Whether the customer entered their details (first name, last name, street, or city) in lower case letters.

These variables can serve as proxy indicators of certain customer characteristics, such as income, character, or reputation, allowing you to segment customers accordingly.

The resulting data are used to improve service quality, customize marketing campaigns, and assess risks.
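Several of the email-related variables above can be derived directly from the application form. The following is a minimal sketch, not the study's or RiskSeal's actual implementation: the function name, the field names, and the exact rules (for example, checking only the part of the address before the "@") are assumptions made for illustration.

```python
import re

def email_footprint_features(email: str, first_name: str, last_name: str) -> dict:
    """Derive a few email-based digital footprint variables (illustrative only)."""
    local_part, _, domain = email.partition("@")
    local_lower = local_part.lower()
    names = [n.lower() for n in (first_name, last_name) if n]
    return {
        # Email provider: the domain after the "@", lower-cased.
        "email_provider": domain.lower(),
        # Name in email: first or last name appears before the "@".
        "name_in_email": any(n in local_lower for n in names),
        # Number in email: any digit before the "@".
        "number_in_email": bool(re.search(r"\d", local_part)),
        # Is lower case: name fields typed entirely in lower case
        # (an assumed reading of how this variable is defined).
        "is_lower_case": first_name.islower() and last_name.islower(),
    }

# Made-up applicant for demonstration.
features = email_footprint_features("anna.schmidt84@t-online.de", "Anna", "Schmidt")
print(features)
```

In practice these flags would be computed for every applicant and fed into the scoring model alongside the other variables.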

Digital footprint variables used by RiskSeal

RiskSeal uses a number of variables that significantly improve the accuracy of a financial organization's credit scoring model.

We enrich the data with hundreds more data points. They can be obtained in three ways – email lookup, phone number lookup, and IP lookup. 

These are RiskSeal’s variables:

1. By email

  • Linked online accounts on various online platforms
  • Email delivery success
  • Age of email
  • Domain information
  • Whether the email has been blacklisted
  • Type of email (e.g., identification of temporary email addresses)

2. By phone number

  • Type of phone number (burner phones, disposable numbers, virtual SIM cards)
  • Country code (can be compared with geolocation data)
  • Registered online accounts
  • Whether the number is on blacklists or spam lists

3. By IP

  • Type of connection (residential, commercial, or mobile)
  • Use of anonymizers (TOR, Proxy, VPN, etc.)
  • User geolocation
  • Presence of high-risk IPs in databases

Correlation between credit scores and default rates

The researchers found a clear relationship between the default rate and the borrower's credit score: the higher the score, the lower the default rate.

The default rate also depends on some variables derived from digital footprint analysis.

This is displayed in the diagram below:

Correlation between credit scores and default rates

Based on this, we can draw the following conclusions:

1. The default rate is directly related to the borrower's credit rating. The deciles in the bottom left corner of the graph indicate that users with the highest credit scores have the lowest default rates.

Conversely, users with the lowest credit scores have the highest default rate. These are placed in the top right corner of the diagram.

2. Linking default risk and digital footprints. Certain combinations of digital footprints, such as “Android + Yahoo” or “Android + Hotmail,” are associated with higher levels of default. 

Other combinations, such as “Mac + T-Online”, indicate a lower likelihood of delinquency.

3. Impact of individual variables on credit risk levels. The graph indicates that, in addition to combinations of digital footprint variables, individual variables can also be useful in assessing credit risk.

Individual variables, like the operating system or email host, also affect the likelihood of default, though their impact may be less significant than that of data combinations.

The graph demonstrates that enhancing scorecards with alternative data can be useful for predicting credit risk alongside traditional credit scores.
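To see how a combination such as “Android + Yahoo” can be turned into a single risk signal, the sketch below groups applications by operating system and email host and computes the default rate of each group. The data is made up for illustration and the function name is hypothetical; it does not reproduce the study's figures.

```python
from collections import defaultdict

# Toy applications: (operating system, email host, defaulted?). Made-up data.
applications = [
    ("Android", "Yahoo", True),
    ("Android", "Yahoo", False),
    ("Android", "Yahoo", True),
    ("Mac", "T-Online", False),
    ("Mac", "T-Online", False),
    ("Windows", "Gmail", False),
    ("Windows", "Gmail", True),
    ("Windows", "Gmail", False),
]

def default_rate_by_combination(rows):
    """Return {"OS + email host": default rate} for each observed combination."""
    counts = defaultdict(lambda: [0, 0])  # combination -> [defaults, total]
    for os_name, email_host, defaulted in rows:
        key = f"{os_name} + {email_host}"
        counts[key][0] += int(defaulted)
        counts[key][1] += 1
    return {key: defaults / total for key, (defaults, total) in counts.items()}

rates = default_rate_by_combination(applications)
# Print combinations from riskiest to safest.
for combination, rate in sorted(rates.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{combination}: {rate:.0%}")
```

With real portfolios, groups with very few applications should be treated with caution, since their default rates are noisy.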

RiskSeal's Digital Credit Scoring model explained

At RiskSeal, we also combine several variables to improve the credit score model. For example, being registered on both Amazon and Netflix (“Amazon + Netflix”) is a positive attribute. This group of users tends to have a low default rate.

RiskSeal assigns borrowers a Digital Credit Score ranging from 0 to 999, determined through digital footprint analysis. 

This digital credit scoring model predicts the probability of a person meeting their financial obligations, providing a direct assessment of default risk.

Our team analyzed the link between digital credit scores and default rates for a major credit institution in Mexico. The results show a direct correlation. See the table below for details.

Default rates and Digital Credit Score

This table shows our client's default rate trend:

  • Borrowers with a score of 0-99 have the highest default rate, at 52%
  • Borrowers with a score of 900-999 have a default rate of no more than 5%

The chart below shows this trend more clearly:

The chart of the correlation of digital credit scores and default rates
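A band-level breakdown of this kind can be reproduced from loan-level data. The sketch below is a minimal illustration, assuming 100-point bands on the 0 to 999 scale and a made-up list of (score, defaulted) pairs. The function names are hypothetical and the numbers are not RiskSeal client data.

```python
def score_band(score: int, width: int = 100) -> str:
    """Map a 0-999 score to a band label such as '0-99' or '900-999'."""
    if not 0 <= score <= 999:
        raise ValueError("score must be between 0 and 999")
    low = (score // width) * width
    return f"{low}-{low + width - 1}"

def default_rate_by_band(loans):
    """loans: iterable of (score, defaulted) pairs -> {band: default rate}."""
    totals, defaults = {}, {}
    for score, defaulted in loans:
        band = score_band(score)
        totals[band] = totals.get(band, 0) + 1
        defaults[band] = defaults.get(band, 0) + int(defaulted)
    return {band: defaults[band] / totals[band] for band in totals}

# Made-up loans for illustration only.
loans = [
    (42, True), (87, False), (15, True),
    (950, False), (910, False), (999, False), (905, True), (930, False),
]
band_rates = default_rate_by_band(loans)
for band, rate in band_rates.items():
    print(f"{band}: {rate:.0%}")
```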

Evaluating credit score models with AUC

AUC, or Area Under the Curve, is a statistical metric that represents the area under the Receiver Operating Characteristic (ROC) curve.

In the context of credit scoring, it allows you to evaluate the performance of a credit score model.

Evaluating credit score models with AUC

AUC indicates the probability that, for a random pair of borrowers (one default, one non-default), the model ranks the defaulter as riskier.

The higher the AUC, the more likely the model is to correctly predict defaults versus non-defaults.
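The pairwise interpretation can be computed directly on a small sample. The sketch below counts the share of (defaulter, non-defaulter) pairs that a model ranks correctly. It assumes the model outputs a risk value where higher means riskier (the opposite direction of a credit score), counts ties as half, and uses made-up data. The nested loop is fine for illustration but too slow for large portfolios.

```python
def pairwise_auc(scores, defaulted):
    """AUC as the share of (defaulter, non-defaulter) pairs ranked correctly.

    `scores` are predicted default risks (higher = riskier); ties count as half.
    """
    bad = [s for s, d in zip(scores, defaulted) if d]
    good = [s for s, d in zip(scores, defaulted) if not d]
    if not bad or not good:
        raise ValueError("need at least one default and one non-default")
    correct = 0.0
    for b in bad:
        for g in good:
            if b > g:
                correct += 1.0
            elif b == g:
                correct += 0.5
    return correct / (len(bad) * len(good))

# Made-up predictions and outcomes for six borrowers.
risk = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
outcome = [True, False, True, False, False, False]
print(pairwise_auc(risk, outcome))  # 6 of 8 pairs ranked correctly -> 0.75
```

A model that ranks every defaulter above every non-defaulter scores 1.0, while random ranking averages about 0.5.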

When little information is available (e.g., for new customers or regions), an AUC of around 60% is considered desirable. When sufficient data is available, the target is 70%.

The graph above shows that the lowest-rated 25% of customers account for 65% of defaults. This means that the model identifies the majority of defaults, which are concentrated in the bottom quarter of the rating distribution.
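The "bottom 25% covers X% of defaults" figure is a capture rate, and it can be computed by sorting customers by score. The sketch below is illustrative only: it assumes lower scores mean higher risk, uses made-up data, and the function name is hypothetical.

```python
def capture_rate(scores, defaulted, bottom_share=0.25):
    """Share of all defaults found among the lowest-scored `bottom_share` of customers.

    `scores` are credit scores where lower means riskier.
    """
    ranked = sorted(zip(scores, defaulted), key=lambda pair: pair[0])
    cutoff = int(len(ranked) * bottom_share)
    total_defaults = sum(1 for _, d in ranked if d)
    if total_defaults == 0:
        raise ValueError("no defaults in the sample")
    captured = sum(1 for _, d in ranked[:cutoff] if d)
    return captured / total_defaults

# Twelve made-up customers; three of them defaulted.
scores = [120, 180, 250, 400, 480, 560, 640, 720, 810, 900, 950, 990]
defaulted = [True, True, False, True, False, False,
             False, False, False, False, False, False]
print(f"{capture_rate(scores, defaulted):.0%} of defaults fall in the bottom 25%")
```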

Discriminatory power of credit bureaus vs. digital footprints 

The graph below provides a visual comparison of the discriminatory power of traditional and alternative credit scoring.

Discriminatory power of credit bureaus vs. digital footprints 

The key values represented on it are as follows:

The AUC for “Credit Bureau Score” is 68.3%. This shows that the credit score can predict defaults better than random guessing. However, its discriminatory power is modest.

The AUC for “Digital Footprint” is 69.6%. This figure is slightly higher than that of the credit score. That is, digital footprints provide comparable discriminatory power.

The AUC for the combined “Credit Bureau Score + Digital Footprint” model is 73.6%. This is 5.3 percentage points higher than that of the credit score model.

Such performance suggests that adding digital footprints to the credit score model significantly improves its predictive power.

We at RiskSeal have also seen a significant increase in predictive power when enhancing credit score models with digital footprints.

For example, one of our clients, a Mexican credit organization, achieved an AUC of 80% after just three months of using our solution.

This case study suggests that such AUC scores are achievable using only digital footprint data from RiskSeal.

Receiver Operating Characteristic (ROC) curve

Adding digital footprints to traditional credit scores (Credit Bureau Scores) improves the discriminatory power of the model. It becomes more accurate in predicting defaults and reducing credit risk.

Influence of individual variables on AUC 

To visualize the contribution of individual variables to AUC, we lay out the relevant results of the study.

Table: Influence of individual variables on AUC
Graph: Influence of individual variables on AUC

This information permits us to analyze the impact of certain variables on the accuracy of default prediction. 

The Marginal AUC is used for this purpose. It shows by how many percentage points the AUC of the credit scoring model increases when a specific variable or a combination of variables is added.
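As a quick worked example of this calculation, the sketch below treats marginal AUC as a simple difference between two AUC values, expressed in percentage points. It uses the model-level AUC figures quoted earlier in this article, and the function name is hypothetical.

```python
def marginal_auc_pp(auc_with: float, auc_without: float) -> float:
    """Marginal AUC in percentage points; both inputs are AUCs in percent."""
    return round(auc_with - auc_without, 2)

# AUC figures quoted in this article for the study's models (in percent).
auc_credit_bureau = 68.3
auc_digital_footprint = 69.6
auc_combined = 73.6

# Gain from adding the digital footprint to the credit bureau score.
gain_over_bureau = marginal_auc_pp(auc_combined, auc_credit_bureau)
# Gain from adding the credit bureau score to the digital footprint.
gain_over_footprint = marginal_auc_pp(auc_combined, auc_digital_footprint)

print(f"+{gain_over_bureau} p.p. over the credit bureau score alone")
print(f"+{gain_over_footprint} p.p. over the digital footprint alone")
```

The same subtraction applies to a single variable: compare the model's AUC with and without that variable.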

The main conclusions that were obtained are:

  • Computer and operating system: +1.71 p.p.
  • Email host: +2.44 p.p. (maximum for individual variables).
  • Other variables: +0.19 to +1.79 p.p.
  • Do Not Track setting: no effect on AUC.
  • Combinations of variables potentially indicative of consumer income (e.g., paid email hosts and operating system): +2.20 p.p.
  • Combinations of variables reflecting everyday behavior (e.g., browser settings, transaction completion time, etc.): +8.52 p.p. (maximum for combinations of variables).

This highlights that user behavior is critical in the credit risk prediction process.

Conclusion

Based on our own observations and the conclusions of the study, we at RiskSeal can speak with confidence about the crucial importance of digital footprint analysis for credit score modeling.

Incorporating data from alternative data providers boosts the predictive power of scoring models to new levels.

This approach contributes to the accurate prediction of default.

In addition to making risk assessment more efficient, the methods described in this article can increase the approval rate.

This is even more relevant in emerging markets, where credit bureaus often hold little or no information about potential borrowers.


FAQ

How does digital footprint analysis improve credit score predictions?

Data from digital footprint analysis can be incorporated into credit scoring models, increasing their predictive power. In the study discussed in this article, combining the credit bureau score with digital footprint data produced an AUC 5.3 percentage points higher than the credit bureau score alone and 4.0 percentage points higher than the digital footprint alone.

What types of digital variables are used in credit scoring models?

Two broad types of variables can be used in credit scoring models: those that potentially indicate a consumer's income (e.g., a paid or free email host, or the operating system), and those that reflect the applicant's everyday behavior (e.g., browser settings or transaction completion time).

What is AUC, and why is it used to evaluate credit score models?

AUC is a statistical metric that measures the area under the Receiver Operating Characteristic (ROC) curve. In the context of credit scoring, it is used to evaluate how well a credit score model separates defaulters from non-defaulters.

AUC indicates the probability that, for a random pair of borrowers (one default, one non-default), the model ranks the defaulter as riskier. The higher the AUC, the better the model distinguishes defaults from non-defaults.

How does RiskSeal’s data compare to traditional credit scoring results?

RiskSeal builds its credit scoring data from the digital footprints of potential borrowers. Models based on this data show strong predictive power: in our experience, lenders have achieved an AUC of up to 80%.
