Classification


Welcome to My Page.

View the Project on GitHub clembrain/ML-Classification

ML-Classification

📅 Date: April 02, 2024
🔍 Domain: Banking
📊 Topic: Predictive Modeling for Telemarketing Campaigns


🔗 View Full Project Files


🧠 Abstract

This project leverages supervised machine learning classification techniques to predict whether a customer will subscribe to a term deposit during a Portuguese bank's telemarketing campaign. Using publicly available data from the UCI Machine Learning Repository, the aim is to explore how decision-making can be optimized using AI models like Random Forest, Decision Trees, and Boosted Models.

Key benefits for the bank include:


📂 Dataset Overview
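As a minimal sketch, the UCI Bank Marketing data can be loaded with pandas. The semicolon delimiter matches the standard UCI CSV export; the two stand-in rows below are purely illustrative, not taken from the real file.

```python
import io
import pandas as pd

def load_bank_data(source) -> pd.DataFrame:
    """Read the semicolon-delimited bank telemarketing CSV."""
    return pd.read_csv(source, sep=";")

# Demonstrated on a tiny stand-in for the real file:
sample = io.StringIO(
    "age;job;month;y\n"
    "29;admin.;mar;yes\n"
    "54;blue-collar;may;no\n"
)
df = load_bank_data(sample)
print(df.shape)
```

In practice the `source` argument would be the local path to the downloaded UCI CSV.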


📈 Exploratory Data Analysis

The following visuals were generated:


Figure 1: Counts of the "no" and "yes" classes for the (subscription) target variable.


Figure 2: Subscription status by age group; customers under 30 subscribe the most.


Figure 3: A stylised bar chart of subscription rates by month; March has the highest subscription rate.
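The monthly-rate computation behind this chart can be sketched as follows, using a toy frame standing in for the bank data (the real values come from the full dataset):

```python
import pandas as pd

# Illustrative rows only; the project's data has far more of each month.
df = pd.DataFrame({
    "month": ["mar", "mar", "may", "may", "may", "jun"],
    "y":     ["yes", "yes", "no",  "yes", "no",  "no"],
})

# Subscription rate = share of "yes" per month, sorted high to low.
rates = (
    df.assign(subscribed=df["y"].eq("yes"))
      .groupby("month")["subscribed"]
      .mean()
      .sort_values(ascending=False)
)
print(rates)
# rates.plot(kind="bar") would render the bar chart itself.
```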


Figure 4: Frequency of job categories, together with histograms and KDE plots showing the distribution of the continuous features.


Figure 5: Correlation heat map of the numerical features, visualised interactively for insights into the relationships between them.
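A sketch of the computation behind such a heat map, on synthetic numeric columns (the real heat map uses the bank dataset's numeric features):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
age = rng.integers(18, 70, size=100)
df = pd.DataFrame({
    "age": age,
    "balance": age * 50 + rng.normal(0, 100, size=100),  # correlated with age
    "duration": rng.integers(0, 1000, size=100),
})

# Correlation matrix over the numeric columns only.
corr = df.select_dtypes(include="number").corr()
print(corr.round(2))
# seaborn.heatmap(corr, annot=True, cmap="coolwarm") draws the static heat map;
# plotly's px.imshow(corr) gives an interactive version.
```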


🔑 Variables


🧹 Data Preprocessing




Figure 6: First, I removed the target variable and dropped the dummy class variables used for visualisation. The categorical variables are then fully one-hot encoded and ready for modelling.
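A minimal sketch of this one-hot encoding step with `pandas.get_dummies`, on stand-in rows (column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "job":   ["admin.", "technician", "admin."],
    "month": ["mar", "may", "jun"],
    "age":   [29, 41, 35],
    "y":     ["yes", "no", "no"],
})

# Hold the target aside, then one-hot encode the remaining categoricals;
# numeric columns such as "age" pass through unchanged.
target = df["y"]
features = pd.get_dummies(df.drop(columns="y"))
print(features.columns.tolist())
```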


Figure 7: I defined the features and target, converted the target "y" from yes/no to 1/0, and split the data into a feature matrix 'X' and target vector 'y'.
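This step can be sketched as below; the toy frame and 70/30 split ratio are illustrative assumptions, not the project's exact code:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.DataFrame({
    "age": range(20, 30),
    "duration": range(100, 110),
    "y": ["yes", "no"] * 5,
})

# Map the target to 1/0 and separate features from target.
y = df["y"].map({"yes": 1, "no": 0})
X = df.drop(columns="y")

# 70/30 split, stratified so both classes appear in each set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)
print(X_train.shape, X_test.shape)
```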


Figure 8: No features were removed, which means every feature in the training and testing sets had variance above the threshold.
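For illustration, here is how scikit-learn's `VarianceThreshold` behaves when a feature does fall below the threshold; in the project's run, no column was this uninformative:

```python
import numpy as np
from sklearn.feature_selection import VarianceThreshold

# Three features; the middle one is constant (zero variance).
X = np.array([
    [1.0, 5.0, 0.2],
    [2.0, 5.0, 0.9],
    [3.0, 5.0, 0.4],
])

# threshold=0.0 removes features whose variance is exactly zero.
selector = VarianceThreshold(threshold=0.0)
X_reduced = selector.fit_transform(X)
print(X_reduced.shape)        # the constant column is gone
print(selector.get_support())  # mask of kept features
```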


Figure 9: The result ((31647, 37), (13564, 37)) shows that the feature selection process reduced the dataset from 42 to 37 features: 5 features were identified as less important and removed, leaving a more focused feature set. This makes the model more efficient and can improve performance by removing irrelevant or redundant information.
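A sketch of importance-based feature selection with a Random Forest, using scikit-learn's `SelectFromModel` on synthetic data (the exact selector and thresholds used in the project are not shown here):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic data: 10 features, only 3 of them informative.
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=3,
    n_redundant=0, random_state=42,
)

# Keep features whose importance exceeds the mean importance (the default).
selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=42)
).fit(X, y)
X_selected = selector.transform(X)
print(X.shape, "->", X_selected.shape)
```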


Figure 10: Addressing class imbalance by applying random undersampling, balancing the target variable for fairer model training.
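The idea can be sketched with plain pandas, downsampling every class to the size of the smallest one (the project may well have used a library sampler such as imbalanced-learn's `RandomUnderSampler`; this is just the underlying idea):

```python
import pandas as pd

# Imbalanced toy frame: eight class-0 rows vs two class-1 rows.
df = pd.DataFrame({
    "age": range(30, 40),
    "y":   [0] * 8 + [1] * 2,
})

# Randomly sample each class down to the minority-class size.
n_min = df["y"].value_counts().min()
balanced = pd.concat(
    [g.sample(n=n_min, random_state=42) for _, g in df.groupby("y")]
)
print(balanced["y"].value_counts().to_dict())
```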


🤖 Models Implemented

1️⃣ Decision Tree Classifier


Figure 11: The Decision Tree classifier achieves 80% accuracy, with the confusion matrix shown above.
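A minimal, self-contained sketch of this step on synthetic data (the project's accuracy of 80% comes from the real bank data, not from this toy example):

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=8, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=7
)

tree = DecisionTreeClassifier(random_state=7).fit(X_train, y_train)
pred = tree.predict(X_test)

acc = accuracy_score(y_test, pred)
cm = confusion_matrix(y_test, pred)
print(f"accuracy = {acc:.2%}")
print(cm)  # rows: true class, columns: predicted class
```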


Figure 12: After tuning, the best cross-validated accuracy is 82.36%, an improvement over the untuned model. This configuration balances complexity and predictive accuracy, making the model more robust.
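Hyperparameter tuning of this kind is typically done with `GridSearchCV`; the small grid below is illustrative, not the project's actual search space:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=8, random_state=7)

# Cross-validated search over an illustrative parameter grid.
grid = GridSearchCV(
    DecisionTreeClassifier(random_state=7),
    param_grid={
        "max_depth": [3, 5, 10, None],
        "min_samples_split": [2, 10, 50],
    },
    cv=5,
    scoring="accuracy",
).fit(X, y)

print(grid.best_params_)
print(f"best CV accuracy = {grid.best_score_:.2%}")
```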


2️⃣ Random Forest Classifier


Figure 13: The Random Forest classifier achieved 83.04% accuracy. It performs better on the minority class (Class 1) than the Decision Tree but still struggles with precision, indicating room for improvement in predicting positive cases.


Figure 14: The best cross-validated accuracy after tuning is 86.14%, demonstrating improved performance with the tuned parameters. This setup balances complexity and predictive power for robust model performance.
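The Random Forest tuning step can be sketched the same way; again the grid below is an illustrative stand-in for the project's actual parameter ranges:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=400, n_features=8, random_state=7)

# Deliberately small grid so the search stays quick.
grid = GridSearchCV(
    RandomForestClassifier(random_state=7),
    param_grid={
        "n_estimators": [50, 100],
        "max_depth": [5, None],
    },
    cv=3,
    scoring="accuracy",
).fit(X, y)

print(grid.best_params_, f"{grid.best_score_:.2%}")
```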


Figure 15: The ROC curve above shows how well the model distinguishes the positive and negative classes, summarised by the Area Under the Curve (AUC).
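Computing the curve and its AUC takes predicted probabilities rather than hard labels; a sketch on synthetic data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=8, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=7
)

model = RandomForestClassifier(random_state=7).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]  # probability of the positive class

auc = roc_auc_score(y_test, scores)
fpr, tpr, _ = roc_curve(y_test, scores)
print(f"AUC = {auc:.3f}")
# plt.plot(fpr, tpr) draws the ROC curve itself.
```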


3️⃣ Azure ML Boosted Decision Tree


4️⃣ Azure ML Neural Network


Azure Dataset Upload Figure 16: Uploading the dataset into an Azure table and visualising it.


Convert to Indicator Values - Step 1
Convert to Indicator Values - Step 2
Figure 17: An Execute Python Script module converts the categorical target variable from yes/no to 1/0.


Normalise
Figure 18: The Normalize Data module above scales the dataset using the MinMax scaling method.
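MinMax scaling maps each column onto the [0, 1] range; the Azure module behaves like scikit-learn's `MinMaxScaler`, sketched here on toy values:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Two toy columns on very different scales.
X = np.array([
    [20.0, 1000.0],
    [40.0, 3000.0],
    [60.0, 5000.0],
])

scaler = MinMaxScaler()  # (x - min) / (max - min), per column
X_scaled = scaler.fit_transform(X)
print(X_scaled)
```

After scaling, every column's minimum is 0 and maximum is 1, so features with large raw ranges no longer dominate distance- or gradient-based models.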


transformeddata
Figure 19: The imbalanced class label "y", in which class 1 is heavily outnumbered by class 0.


smotetarget
Figure 20: I split the data into 80% training and 20% testing, then applied SMOTE to rebalance "y" (the subscription target).
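The pipeline uses Azure's SMOTE module; as a rough illustration of the idea behind SMOTE, synthetic minority points can be created by interpolating between a minority sample and one of its nearest minority neighbours. The helper below is a simplified sketch, not the Azure implementation:

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)

# Imbalanced toy data: 50 majority points (class 0), 10 minority (class 1).
X_maj = rng.normal(0.0, 1.0, size=(50, 2))
X_min = rng.normal(3.0, 1.0, size=(10, 2))

def smote_like(X_minority, n_new, k=3, rng=rng):
    """Create synthetic minority points by interpolating towards neighbours."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_minority)
    _, idx = nn.kneighbors(X_minority)  # idx[:, 0] is the point itself
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_minority))
        j = idx[i, rng.integers(1, k + 1)]  # a random true neighbour
        u = rng.random()                    # interpolation factor in [0, 1)
        synthetic.append(X_minority[i] + u * (X_minority[j] - X_minority[i]))
    return np.array(synthetic)

X_new = smote_like(X_min, n_new=40)  # bring the minority up to 50
print(len(X_min) + len(X_new), "minority vs", len(X_maj), "majority")
```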


azuretune
Figure 21: The Two-Class Boosted Decision Tree is connected to a Tune Model Hyperparameters module, which trains the tuned version of the model, outputs the most accurate parameter combination, and allows comparison against the base model.


Figure 22: Score Model and Evaluate Model modules follow, generating and analysing model performance.


📊 ROC Curve Visual

Figure 23: The evaluation report for the Two-Class Boosted Decision Tree.


Figure 24: The results for the Two-Class Neural Network.