ML / Classification Project

Bank Term Deposit Prediction

A two-stage machine learning pipeline predicting whether a bank client will subscribe to a term deposit after a phone marketing campaign.

Project objective

Maximise recall, controlled by F2.

The positive class is small, so accuracy alone is misleading. The model is tuned to catch as many likely subscribers as possible, while F2 still rewards precision so that the number of wasted calls cannot grow without limit.

Recall: primary metric
F2: guardrail
ROC-AUC: tracked
Lift: business metric

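F2 = (1 + 2²) · (P · R) / (2² · P + R) = 5 · P · R / (4P + R), where P is precision and R is recall. With β = 2, recall is treated as twice as important as precision.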
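A minimal Python sketch of the F-beta formula above. `f_beta` is a hypothetical helper written for illustration, not code from this project. Swapping the same precision and recall values shows why F2 favours the high-recall model.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score; beta = 2 gives the F2 used for model selection."""
    b2 = beta ** 2
    denominator = b2 * precision + recall
    if denominator == 0:
        return 0.0  # precision and recall are both zero
    return (1 + b2) * precision * recall / denominator


# Same two numbers, swapped: F2 scores the high-recall model higher.
print(round(f_beta(precision=0.5, recall=1.0), 3))  # 0.833
print(round(f_beta(precision=1.0, recall=0.5), 3))  # 0.556
```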

Problem understanding

Predict subscriber intent.

Only 11.7% of clients subscribed, so a naive model that predicts “no” for everyone still reaches about 88.3% accuracy while finding no subscribers at all.

Because missing a real subscriber is worse than making an extra call, the pipeline prioritises recall and uses F2 as the model-selection metric.
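The accuracy trap can be checked with a quick sketch on a hypothetical population of 1,000 clients at the same 11.7% positive rate (the counts are illustrative, not taken from the dataset):

```python
# Hypothetical population at the same 11.7% positive rate: 117 subscribers in 1,000 clients.
y_true = [1] * 117 + [0] * 883
y_pred = [0] * len(y_true)  # naive model: predict "no" for everyone

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / sum(y_true)

print(f"accuracy = {accuracy:.1%}, recall = {recall:.1%}")  # accuracy = 88.3%, recall = 0.0%
```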

Target distribution

Dataset split

Train / validation / test: 70% / 15% / 15%, stratified on the target

45,211 records · random_state = 42
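The stratified 70 / 15 / 15 split can be sketched in plain Python: shuffle each class separately with a fixed seed, then cut every class at the same fractions so each part keeps roughly the 11.7% positive rate. `stratified_split` is a hypothetical helper; with scikit-learn the same effect is commonly obtained by calling `train_test_split(..., stratify=y, random_state=42)` twice, which is an assumption about how this project did it.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.70, 0.15, 0.15), seed=42):
    """Return (train, val, test) index lists with the class ratio preserved in each part."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # seeded shuffle within one class
        n_train = round(len(indices) * fractions[0])
        n_val = round(len(indices) * fractions[1])
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to test
    return train, val, test


labels = [1] * 117 + [0] * 883  # hypothetical 11.7% positive rate
train, val, test = stratified_split(labels)
for name, part in [("train", train), ("val", val), ("test", test)]:
    print(name, len(part), f"{sum(labels[i] for i in part) / len(part):.1%}")
```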

Input features

15 features + 1 dropped

Compact feature summary with type, encoding notes, and missing-value strategy.

age (numeric)

Description: client age in years.

Values / encoding: 18–95

Missing strategy: no special missing-value handling required.

Exploratory data analysis

What the data tells us

The main signals are class imbalance, right-skewed numeric features, and a strong duration effect.

Numeric feature IQR distributions

Feature | Min | Max | IQR outliers
age | 18 | 95 | 4.9%
balance | -8,019 | 102,127 | 9.8%
day | 1 | 31 | 0%
duration | 0 | 4,918 | 7.1%
campaign | 1 | 63 | 8.2%
pdays | -1 | 871 | 12.4%
previous | 0 | 275 | 14.3%

Most records have pdays = −1, meaning the client was never contacted in a previous campaign. The features duration, balance, campaign, and previous are strongly right-skewed.
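The page does not state which outlier rule produced the percentages above; a common choice is Tukey's fences (values beyond 1.5 × IQR outside the quartiles), sketched here as an assumption with a hypothetical right-skewed sample:

```python
from statistics import quantiles


def iqr_outlier_share(values, k=1.5):
    """Fraction of values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' quartile method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return sum(v < low or v > high for v in values) / len(values)


# Hypothetical right-skewed sample: most clients get few contacts, a few get very many.
campaign = [1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 5, 25, 63]
print(f"{iqr_outlier_share(campaign):.1%}")  # 15.4%
```

A bounded feature such as day (1 to 31) has no values beyond the fences, which is consistent with its 0% outlier share.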

Feature types breakdown

7 numeric · 4 categorical · 3 binary · 2 ordinal

Job distribution

Multi-metric radar comparison

Missing value summary

Feature | Missing | Missing % | Strategy | Fill value
job | 74 | 0.16% | Mode fill | 'blue-collar'
education | 174 | 0.39% | Mode fill | 'secondary'
contact | 1,324 | 2.93% | Mode fill | 'unknown'
poutcome | 36,959 | 81.7% | Column dropped | n/a

Data preprocessing

Cleaning, encoding & balancing

Every transformation is fitted on the training set only and then reused unchanged on the validation, test, and inference data, which avoids leakage and keeps inference reproducible.

01

Fill missing values

job: filled with 'blue-collar' (74 rows)
education: filled with 'secondary' (174 rows)
contact: filled with 'unknown' (1,324 rows)
poutcome: column dropped (36,959 rows missing)
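The mode-fill step can be sketched as "learn on train, apply everywhere". This is a hypothetical pure-Python version that assumes missing values are represented as `None` and rows as dicts; it covers the job and education mode fills and the poutcome drop, not the project's actual code.

```python
from collections import Counter


def fit_fill_values(train_rows, columns=("job", "education")):
    """Most frequent non-missing value per column, learned from the training rows only."""
    fills = {}
    for col in columns:
        observed = [row[col] for row in train_rows if row[col] is not None]
        fills[col] = Counter(observed).most_common(1)[0][0]
    return fills


def apply_fill(rows, fills, drop=("poutcome",)):
    """Drop the high-missing column(s) and fill missing values with the stored training modes."""
    cleaned = []
    for row in rows:
        new_row = {key: value for key, value in row.items() if key not in drop}
        for col, value in fills.items():
            if new_row.get(col) is None:
                new_row[col] = value
        cleaned.append(new_row)
    return cleaned


# Hypothetical training rows; None marks a missing value.
train_rows = [
    {"job": "blue-collar", "education": "secondary", "poutcome": None},
    {"job": "blue-collar", "education": None, "poutcome": "success"},
    {"job": None, "education": "secondary", "poutcome": None},
    {"job": "admin.", "education": "tertiary", "poutcome": None},
]
fills = fit_fill_values(train_rows)
print(apply_fill([{"job": None, "education": None, "poutcome": "failure"}], fills))
```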
02

SMOTENC oversampling

Applied only to the training set to avoid leakage. It synthesises extra minority-class (subscriber) rows while keeping categorical features on valid category values.

Strategy: 0.25
Scope: train only
Handles: mixed numeric and categorical types
Purpose: recall boost
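In the imbalanced-learn library this step corresponds to `SMOTENC(categorical_features=..., sampling_strategy=0.25, random_state=...)`, presumably what the project used (an assumption). There, a float `sampling_strategy` is the desired minority-to-majority ratio after resampling. The simplified pure-Python sketch below follows the original SMOTE-NC idea, not imbalanced-learn's exact implementation: numeric features are interpolated between a sample and one of its nearest minority neighbours, categorical features take the most frequent value among those neighbours, and each categorical mismatch adds the median numeric standard deviation (squared) to the distance. `smotenc_sketch` and the toy data are hypothetical.

```python
import random
from collections import Counter
from statistics import median, pstdev


def smotenc_sketch(minority, n_majority, cat_idx, ratio=0.25, k=5, seed=42):
    """Simplified SMOTE-NC: synthetic minority rows until minority / majority reaches `ratio`.

    minority: list of rows (lists) mixing numeric values and category strings.
    cat_idx: set of column positions holding categorical values.
    """
    rng = random.Random(seed)
    n_new = int(n_majority * ratio - len(minority))
    if n_new <= 0:
        return []
    num_idx = [j for j in range(len(minority[0])) if j not in cat_idx]
    # SMOTE-NC penalty for a categorical mismatch: median of the numeric std devs.
    med_std = median(pstdev([row[j] for row in minority]) for j in num_idx)

    def sq_distance(a, b):
        numeric = sum((a[j] - b[j]) ** 2 for j in num_idx)
        mismatches = sum(a[j] != b[j] for j in cat_idx)
        return numeric + mismatches * med_std ** 2

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [row for row in minority if row is not base]
        neighbours = sorted(others, key=lambda row: sq_distance(base, row))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()
        new_row = list(base)
        for j in num_idx:
            # Numeric features: interpolate between the sample and one neighbour.
            new_row[j] = base[j] + gap * (partner[j] - base[j])
        for j in cat_idx:
            # Categorical features: most frequent value among the k neighbours.
            new_row[j] = Counter(row[j] for row in neighbours).most_common(1)[0][0]
        synthetic.append(new_row)
    return synthetic


# Hypothetical toy minority rows: [numeric feature, contact type].
minority = [[1.0, "cellular"], [2.0, "cellular"], [3.0, "telephone"],
            [1.5, "cellular"], [2.5, "telephone"], [2.2, "cellular"]]
new_rows = smotenc_sketch(minority, n_majority=40, cat_idx={1})
print(len(new_rows))  # int(40 * 0.25 - 6) = 4 synthetic rows
```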

03

Feature encoding

Ordinal: education (primary → 1, secondary → 2, tertiary → 3)
Ordinal: month (jan → 1, feb → 2, … dec → 12)
Binary: default, housing, loan (yes → 1, no → 0)
One-hot: job, marital, contact
Drop: contact_unknown removed after one-hot encoding
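The encoding rules above can be sketched in plain Python. The function names are hypothetical and the one-hot category lists are learned from training rows only, which matches the train-only rule stated earlier; the project itself may have used pandas or scikit-learn encoders instead.

```python
EDUCATION = {"primary": 1, "secondary": 2, "tertiary": 3}
MONTH = {m: i for i, m in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}
BINARY = ("default", "housing", "loan")
ONE_HOT = ("job", "marital", "contact")


def fit_categories(train_rows):
    """Learn the one-hot category lists from the training rows only."""
    return {col: sorted({row[col] for row in train_rows}) for col in ONE_HOT}


def encode(row, categories):
    """Encode the categorical columns of one client record (numeric columns are not handled here)."""
    out = {"education": EDUCATION[row["education"]], "month": MONTH[row["month"]]}
    for col in BINARY:
        out[col] = 1 if row[col] == "yes" else 0
    for col in ONE_HOT:
        for category in categories[col]:
            out[f"{col}_{category}"] = int(row[col] == category)
    out.pop("contact_unknown", None)  # dropped after one-hot encoding
    return out


# Hypothetical training rows.
rows = [
    {"education": "tertiary", "month": "may", "default": "no", "housing": "yes", "loan": "no",
     "job": "admin.", "marital": "married", "contact": "cellular"},
    {"education": "primary", "month": "dec", "default": "yes", "housing": "no", "loan": "no",
     "job": "blue-collar", "marital": "single", "contact": "unknown"},
]
print(encode(rows[1], fit_categories(rows)))
```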
04

StandardScaler

Fitted on the training data only, then applied to validation and test sets.

Scaled numeric columns: age, balance, day, duration, campaign, pdays, previous
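A hypothetical pure-Python sketch of fit-on-train standardisation. Like scikit-learn's `StandardScaler`, it uses the population standard deviation and leaves a constant column with scale 1; the helper names are illustrative.

```python
from statistics import fmean, pstdev

NUMERIC = ("age", "balance", "day", "duration", "campaign", "pdays", "previous")


def fit_scaler(train_rows, columns=NUMERIC):
    """Per-column mean and population std, learned from the training rows only."""
    params = {}
    for col in columns:
        values = [row[col] for row in train_rows]
        std = pstdev(values)
        params[col] = (fmean(values), std if std > 0 else 1.0)  # constant column: centre only
    return params


def transform(rows, params):
    """Apply the stored training statistics to any split (train, validation, test, inference)."""
    return [{**row, **{col: (row[col] - mean) / std for col, (mean, std) in params.items()}}
            for row in rows]


# Hypothetical one-column example: training mean 30, std 10.
params = fit_scaler([{"age": 20}, {"age": 40}], columns=("age",))
print(transform([{"age": 50}], params))  # [{'age': 2.0}]
```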

Pipeline

How the model pipeline works

Six reproducible stages from raw data to final model evaluation.

01

Problem & Goal

Predict term deposit subscription. Optimise for recall using F2 as the tuning guardrail.

02

EDA

Explore imbalance, distributions, outliers, missing values, and target correlations.

03

Preprocessing

Fill missing values, drop high-missing columns, encode categorical features, scale numeric columns.

04

Stage 1 Training

Train seven classifiers and shortlist the strongest candidates by recall and F2.

05

Stage 2 Tuning

Run a finer search on the winners and compare tuned models on validation performance.

06

Evaluation

Review final metrics, confusion matrices, feature importance, radar comparison, and lift curve.

Model training

Two-stage model selection

Stage 1 evaluates seven classifiers. Stage 2 fine-tunes the shortlisted winners.

Stage 1 — all models evaluated

Seven classifiers trained on preprocessed data

Recall vs F2 — pre-tuning

Full metrics table

Model | Recall | F2 | Precision | Accuracy | ROC-AUC
Logistic Reg. | 87.8% | 61.4% | 27.8% | 71.9% | 86.5%
Grad. Boost | 66.5% | 55.2% | 32.9% | 80.2% | 83.0%
Extra Trees | 76.1% | 54.8% | 25.9% | 71.7% | 81.1%
Rand. Forest | 30.2% | 30.7% | 31.0% | 82.8% | 76.6%
KNN | 35.4% | 35.1% | 25.1% | 76.8% | 73.0%
Dec. Tree | 39.3% | 38.9% | 20.0% | 73.2% | 66.8%
Baseline | 19.4% | 17.0% | 11.4% | 72.9% | 49.7%

Model selection

Recall-specificity goodness plot

Points closer to the top-right ideal point (recall = 1, specificity = 1) combine higher recall with fewer false positives.
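The plot is described only qualitatively. One hypothetical way to turn "closer to the top-right" into a single number is the Euclidean distance from each model's (specificity, recall) point to the ideal corner (1, 1); the points below are made up for illustration, not read from the plot.

```python
from math import hypot


def distance_to_ideal(specificity: float, recall: float) -> float:
    """Euclidean distance from (specificity, recall) to the ideal corner (1, 1); smaller is better."""
    return hypot(1.0 - specificity, 1.0 - recall)


# Hypothetical points, not values read from the plot.
candidates = {"model_a": (0.70, 0.88), "model_b": (0.85, 0.66), "model_c": (0.90, 0.30)}
ranking = sorted(candidates, key=lambda name: distance_to_ideal(*candidates[name]))
print(ranking)  # ['model_a', 'model_b', 'model_c']
```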

Hyperparameter search — Stage 1

GridSearchCV · cv=3 · F2 scorer

Before Stage 1 tuning

After Stage 1 tuning

Improvement after Stage 1 tuning

Model | Recall before | Recall after | Δ Recall | F2 before | F2 after | Δ F2
LR ★ | 87.8% | 90.8% | +3.0 pp | 61.4% | 64.9% | +3.5 pp
Grad. Boost | 66.5% | 71.0% | +4.5 pp | 55.2% | 58.1% | +2.9 pp
Extra Trees | 76.1% | 79.2% | +3.1 pp | 54.8% | 56.7% | +1.9 pp
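In scikit-learn, the search described as "GridSearchCV · cv=3 · F2 scorer" would typically be written `GridSearchCV(estimator, param_grid, scoring=make_scorer(fbeta_score, beta=2), cv=3)`. The project's actual grids are not shown, so the pure-Python sketch below only illustrates the mechanics: every parameter combination is scored by mean F2 over three stratified folds (scikit-learn also uses stratified folds for an integer `cv` with a classifier). The threshold "model" and the data are hypothetical.

```python
from itertools import product


def f2_score(y_true, y_pred):
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denominator = 5 * tp + 4 * fn + fp  # equivalent to 5PR / (4P + R)
    return 5 * tp / denominator if denominator else 0.0


def stratified_folds(y, k):
    """Deal each class's indices round-robin into k folds so every fold keeps the class ratio."""
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        members = [i for i, t in enumerate(y) if t == label]
        for position, i in enumerate(members):
            folds[position % k].append(i)
    return folds


def grid_search_cv(fit, predict, param_grid, X, y, cv=3):
    """Return (best_params, mean_cv_f2) over every combination in param_grid."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for fold in stratified_folds(y, cv):
            held_out = set(fold)
            train_idx = [i for i in range(len(y)) if i not in held_out]
            model = fit([X[i] for i in train_idx], [y[i] for i in train_idx], **params)
            predictions = [predict(model, X[i]) for i in fold]
            fold_scores.append(f2_score([y[i] for i in fold], predictions))
        mean_score = sum(fold_scores) / len(fold_scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Hypothetical toy "model": a fixed threshold on a single score feature.
def fit_threshold(X, y, threshold):
    return threshold


def predict_threshold(threshold, x):
    return int(x >= threshold)


X = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 0.35, 0.65, 0.15, 0.85]
y = [0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1]
print(grid_search_cv(fit_threshold, predict_threshold, {"threshold": [0.2, 0.5, 0.8]}, X, y))
# ({'threshold': 0.5}, 1.0)
```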

Hyperparameter search — Stage 2

Finer grid on Stage 1 winners

Before Stage 2 tuning

After Stage 2 tuning

Improvement after Stage 2 tuning

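Model | Recall before | Recall after | Δ Recall | F2 before | F2 after | Δ F2
LR Stage 1 ★ | 80.6% | 90.8% | +10.2 pp | 56.2% | 64.9% | +8.7 pp
Grad. Boost | 74.1% | 71.0% | -3.1 pp | 53.1% | 58.1% | +5.0 pp
Extra Trees | 79.2% | n/a | n/a | 56.7% | n/a | n/a
Baseline | 19.4% | n/a | n/a | 17.0% | n/a | n/a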

Final results

Model evaluation

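The final model, LR Stage 1 ★, finds the most subscribers (highest recall) and also has the highest F2 and ROC-AUC of the candidates, at the cost of lower precision and accuracy than Gradient Boosting.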

Final Recall vs F2

Final metrics table

Model | Recall | F2 | Precision | Accuracy | ROC-AUC
LR Stage 1 ★ | 90.8% | 64.9% | 26.9% | 69.3% | 87.3%
Grad. Boost | 71.0% | 58.1% | 32.1% | 79.6% | 84.3%
Extra Trees | 79.2% | 56.7% | 25.5% | 70.6% | 82.0%
Baseline | 19.4% | 17.0% | 11.4% | 72.9% | 49.7%

Confusion matrices — final models

Best model: LR S1 ★

Actual \ Predicted | No | Yes
No | TN 4,181 (61.6%) | FP 1,807 (26.6%)
Yes | FN 97 (1.4%) | TP 697 (10.3%)
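All headline metrics follow from the four confusion-matrix counts. A small sketch (the helper name is hypothetical) applied to the counts shown for LR S1 ★:

```python
def metrics_from_counts(tn, fp, fn, tp):
    """Derive the reported classification metrics from a 2x2 confusion matrix."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    f2 = 5 * tp / (5 * tp + 4 * fn + fp)  # same as 5PR / (4P + R)
    return {"recall": recall, "precision": precision, "specificity": specificity,
            "accuracy": accuracy, "f2": f2}


for name, value in metrics_from_counts(tn=4181, fp=1807, fn=97, tp=697).items():
    print(f"{name}: {value:.1%}")
# recall: 87.8%, precision: 27.8%, specificity: 69.8%, accuracy: 71.9%, f2: 61.4%
```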

Multi-metric radar comparison

Cumulative lift curve

The dashed line is the random-targeting baseline; the top-scored deciles capture subscribers faster than random targeting.
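A cumulative lift curve ranks clients by model score and, for each top slice, compares the share of all subscribers captured with the share of clients called; random targeting gives a lift of 1 everywhere. A minimal sketch with hypothetical scores (the function name and data are illustrative):

```python
def cumulative_gains(y_true, scores, bins=10):
    """For each top k/bins slice ranked by score: (slice, share of subscribers captured, lift)."""
    ranked = [label for _, label in sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)]
    n, total_positive = len(ranked), sum(ranked)
    rows = []
    for k in range(1, bins + 1):
        cutoff = max(1, round(n * k / bins))  # number of top-scored clients called
        captured = sum(ranked[:cutoff]) / total_positive
        rows.append((k, captured, captured / (cutoff / n)))
    return rows


# Hypothetical scores: the two subscribers get the highest scores.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
for decile, captured, lift in cumulative_gains(y_true, scores):
    print(f"top {decile * 10}%: {captured:.0%} of subscribers, lift {lift:.1f}")
```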

Feature importance

Feature importance table

Feature | Importance | Rank
duration | 0.312 | #1
balance | 0.118 | #2
age | 0.094 | #3
pdays | 0.087 | #4
day | 0.076 | #5
campaign | 0.068 | #6
previous | 0.052 | #7
month | 0.041 | #8
job | 0.038 | #9
education | 0.033 | #10

Ready to try the model?

Enter a single client profile or upload a CSV batch.
