ML Fraud Detection

📦 Repository Structure

This project currently lives inside the broader DataInsideData technical portfolio monorepo.

The implementation is located within: applied-data-science-machine-learning/

This project contains the notebooks, charts, and documentation for an applied fraud detection machine learning workflow built around exploratory analysis, preprocessing, imbalanced classification, and model evaluation.

As the project evolves, it may be broken out of the monorepo or expanded into a more reusable ML case study structure. For now, the monorepo is the primary source for implementation and documentation.

🏗️ How This Page Is Generated

The documentation below is dynamically rendered from the project's GitHub README inside the portfolio repository.

This site retrieves the README and converts it into HTML for presentation on DataInsideData.com.

This keeps the repository as the single source of truth for project documentation, while the website acts as the presentation layer for easier browsing, portfolio storytelling, and technical context.

Over time, this workflow may expand into a more automated publishing pattern connecting repository updates, project documentation, and portfolio rendering.

🔎 Data Sources

This project works from a large simulated financial transaction dataset designed for fraud detection analysis. Its fields include:

Transaction type and amount fields
Origin and destination balance fields
Built-in fraud flag field (IsFlaggedFraud)
Target fraud label (IsFraud)

These inputs are explored and transformed into a modeling-ready dataset used for classification experiments, feature preparation, and imbalanced-learning evaluation.
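
As a minimal sketch of a first look at these fields, the snippet below computes the fraud rate and cross-tabulates the built-in flag against the target label. The rows are made-up illustrative values, not records from the project's dataset. The `type` and `amount` keys and the helper names `fraud_rate` and `flag_vs_label` are assumptions; only `IsFlaggedFraud` and `IsFraud` come from the field list above.

```python
from collections import Counter

# Hypothetical rows for illustration only; not taken from the project's dataset.
rows = [
    {"type": "PAYMENT", "amount": 120.0, "IsFlaggedFraud": 0, "IsFraud": 0},
    {"type": "TRANSFER", "amount": 250000.0, "IsFlaggedFraud": 1, "IsFraud": 1},
    {"type": "CASH_OUT", "amount": 9800.0, "IsFlaggedFraud": 0, "IsFraud": 1},
    {"type": "PAYMENT", "amount": 35.5, "IsFlaggedFraud": 0, "IsFraud": 0},
    {"type": "CASH_OUT", "amount": 410.0, "IsFlaggedFraud": 0, "IsFraud": 0},
]


def fraud_rate(records):
    """Return the share of records whose IsFraud label is 1."""
    return sum(r["IsFraud"] for r in records) / len(records)


def flag_vs_label(records):
    """Count each (IsFlaggedFraud, IsFraud) pair across the records."""
    return Counter((r["IsFlaggedFraud"], r["IsFraud"]) for r in records)


rate = fraud_rate(rows)
pairs = flag_vs_label(rows)
print(f"fraud rate: {rate:.2f}")
print(f"fraud cases the flag missed: {pairs[(0, 1)]}")
print(f"fraud cases the flag caught: {pairs[(1, 1)]}")
```

On a real fraud dataset the fraud rate would typically be far lower than in this toy sample, which is what motivates the imbalanced-learning evaluation.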

⚙️ Engineering Note

This project follows a staged machine learning workflow:

Exploration → EDA and fraud-pattern inspection
Transformation → preprocessing and feature preparation
Modeling → classification and hyperparameter tuning
Evaluation → precision, recall (sensitivity), confusion-matrix analysis
Output → reproducible notebooks, charts, and project documentation

The repository is the execution layer, while this page serves as the documentation and presentation layer.
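
To make the Transformation stage more concrete, here is one possible sketch of turning raw transaction records into numeric features. It is not the project's actual preprocessing code. The function name `prepare_features` and the keys `type`, `amount`, `old_balance_orig`, and `new_balance_orig` are hypothetical stand-ins for the transaction type, amount, and origin balance fields; only `IsFraud` is named in the source.

```python
def prepare_features(records):
    """Turn raw transaction dicts into numeric feature rows and labels.

    Assumed (hypothetical) keys: type, amount, old_balance_orig,
    new_balance_orig, IsFraud.
    """
    # Use the categories seen in the data rather than a hard-coded list.
    types = sorted({r["type"] for r in records})
    features, labels = [], []
    for r in records:
        # One-hot encode the transaction type.
        one_hot = [1.0 if r["type"] == t else 0.0 for t in types]
        # How much the origin account balance dropped in this transaction.
        origin_delta = r["old_balance_orig"] - r["new_balance_orig"]
        features.append(one_hot + [r["amount"], origin_delta])
        labels.append(r["IsFraud"])
    return types, features, labels


# Made-up example records for illustration only.
sample = [
    {"type": "TRANSFER", "amount": 500.0, "old_balance_orig": 500.0,
     "new_balance_orig": 0.0, "IsFraud": 1},
    {"type": "PAYMENT", "amount": 20.0, "old_balance_orig": 100.0,
     "new_balance_orig": 80.0, "IsFraud": 0},
]

types, X, y = prepare_features(sample)
print(types)
print(X)
print(y)
```

Deriving the categories from the data keeps the sketch free of assumptions about which transaction types the dataset contains. In a real pipeline the category list would be fixed on the training split so that train and test rows share the same columns.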

🧠 Modeling Insight

A core theme of this project is that accuracy alone is not enough in fraud detection.

Because fraudulent transactions are rare, a model can reach high accuracy while missing most of the fraud. The project therefore emphasizes metrics and evaluation strategies that better reflect how useful a model is at catching fraud, including:

precision
recall (also called sensitivity)
confusion matrix behavior
with-SMOTE vs without-SMOTE model comparison

This helps frame fraud detection as a business-sensitive classification problem rather than a simple accuracy exercise.
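
The point about accuracy can be illustrated with a small worked example. The sketch below uses invented counts (1,000 transactions, 10 of them fraudulent), not results from the project, and the helper names `confusion_counts` and `summarize` are hypothetical. It compares a baseline that labels everything as legitimate with a model that catches 7 of the 10 fraud cases at the cost of 5 false alarms.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn) for binary labels where 1 means fraud."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def summarize(y_true, y_pred):
    """Compute accuracy, precision, and recall (sensitivity)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / len(y_true),
        # Guard against division by zero when nothing is predicted as fraud.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Invented data: 10 fraud cases followed by 990 legitimate transactions.
y_true = [1] * 10 + [0] * 990

# Baseline that never predicts fraud.
always_legit = [0] * 1000

# Model that catches 7 of 10 frauds and raises 5 false alarms.
model_pred = [1] * 7 + [0] * 3 + [1] * 5 + [0] * 985

baseline = summarize(y_true, always_legit)
model = summarize(y_true, model_pred)
print("baseline:", baseline)
print("model:   ", model)
```

The two accuracy figures are nearly identical (0.990 and 0.992), yet the baseline has zero recall while the model recovers 70% of the fraud. Only precision and recall expose that difference.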

📊 Visual Analytics

The project includes visual outputs that support both exploratory analysis and model interpretation, including:

naïve fraud flag vs actual fraud comparison
transaction type distribution by fraud status
transaction amount by type
final model confusion matrix
model comparison with and without SMOTE

Charts and model-result visuals are displayed below as part of the evolving case study documentation.

An applied machine learning case study focused on fraud detection, imbalanced classification, preprocessing, model comparison, and evaluation in a high-risk financial context.

Fari Lindo