📦 Repository Structure

This project currently lives inside the broader DataInsideData technical portfolio monorepo.

The implementation is located in the applied-data-science-machine-learning/ directory of the DataEden / fari-tech-portfolio repository.

This project contains the notebooks, charts, and documentation for an applied fraud detection machine learning workflow built around exploratory analysis, preprocessing, imbalanced classification, and model evaluation.

As the project evolves, it may later be broken out further or expanded into a more reusable ML case study structure, but for now the monorepo serves as the primary source for implementation and documentation.

🏗️ How This Page Is Generated

The documentation below is dynamically rendered from the project's GitHub README inside the portfolio repository.

This site retrieves the README and converts it into HTML for presentation on DataInsideData.com.

This keeps the repository as the single source of truth for project documentation, while the website acts as the presentation layer for easier browsing, portfolio storytelling, and technical context.

Over time, this workflow may expand into a more automated publishing pattern connecting repository updates, project documentation, and portfolio rendering.
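For illustration only, the retrieval step could look roughly like the sketch below. It assumes the repository is public on GitHub under the owner DataEden with the name fari-tech-portfolio, that the branch is main, and that the README sits at applied-data-science-machine-learning/README.md. None of these details are confirmed here, and the Markdown-to-HTML conversion is left to whatever renderer the site actually uses.

```python
import urllib.request

# Hypothetical values: owner/repo come from "DataEden / fari-tech-portfolio",
# but the branch name and README path are assumptions.
OWNER = "DataEden"
REPO = "fari-tech-portfolio"
BRANCH = "main"
README_PATH = "applied-data-science-machine-learning/README.md"


def raw_readme_url(owner: str, repo: str, branch: str, path: str) -> str:
    """Build the raw.githubusercontent.com URL for a file in a public repository."""
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{path}"


def fetch_readme_markdown(url: str, timeout: float = 10.0) -> str:
    """Download the README as Markdown text (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling fetch_readme_markdown(raw_readme_url(OWNER, REPO, BRANCH, README_PATH)) would return the Markdown source for a renderer to turn into HTML. Caching that result is one way to avoid re-fetching it on every page view.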

🔎 Data Sources

This project works from a large simulated financial transaction dataset designed for fraud detection analysis. Its key fields include:

  • Transaction type and amount fields
  • Origin and destination balance fields
  • Built-in fraud flag field (IsFlaggedFraud)
  • Target fraud label (IsFraud)

These inputs are explored and transformed into a modeling-ready dataset used for classification experiments, feature preparation, and imbalanced-learning evaluation.
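As a rough sketch of turning raw rows into a modeling-ready matrix, the snippet below one-hot encodes the transaction type, keeps the amount and balance fields as numbers, and separates the IsFraud target. Only IsFlaggedFraud and IsFraud are named in this project description; the other column names are hypothetical placeholders, and the real notebooks may use pandas (for example pd.get_dummies) rather than plain Python.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical column names for the amount and balance fields.
NUMERIC_COLUMNS = ["amount", "oldbalanceOrg", "newbalanceOrig",
                   "oldbalanceDest", "newbalanceDest"]
TARGET = "IsFraud"


def prepare_features(rows: Sequence[Dict[str, Any]]
                     ) -> Tuple[List[List[float]], List[int], List[str]]:
    """One-hot encode 'type', keep numeric columns, and split off the target.

    IsFlaggedFraud is deliberately left out of the features here (an
    assumption) so it can be compared separately as a naive baseline.
    """
    categories = sorted({str(row["type"]) for row in rows})
    feature_names = NUMERIC_COLUMNS + [f"type_{c}" for c in categories]

    X: List[List[float]] = []
    y: List[int] = []
    for row in rows:
        numeric = [float(row[col]) for col in NUMERIC_COLUMNS]
        one_hot = [1.0 if str(row["type"]) == c else 0.0 for c in categories]
        X.append(numeric + one_hot)
        y.append(int(row[TARGET]))
    return X, y, feature_names
```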

⚙️ Engineering Note

This project follows a staged machine learning workflow:

  • Exploration → EDA and fraud-pattern inspection
  • Transformation → preprocessing and feature preparation
  • Modeling → classification and hyperparameter tuning
  • Evaluation → precision, recall (sensitivity), confusion-matrix analysis
  • Output → reproducible notebooks, charts, and project documentation

The repository is the execution layer, while this page serves as the documentation and presentation layer.
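Because class imbalance touches both the Transformation and Evaluation stages, one common ordering is to split the data first (stratified, so rare fraud cases land in both splits) and oversample only the training split, so the test split keeps the real class ratio. The sketch below shows that ordering with a minimal SMOTE-style interpolation written in pure Python. The function names and parameters are hypothetical; in practice scikit-learn's train_test_split(..., stratify=y) and imbalanced-learn's SMOTE would be the usual tools, and this is not necessarily how the project's notebooks do it.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def stratified_split(X: Sequence[Vector], y: Sequence[int],
                     test_fraction: float = 0.25, seed: int = 0
                     ) -> Tuple[List[Vector], List[int], List[Vector], List[int]]:
    """Split each class separately so both splits keep a similar fraud rate."""
    rng = random.Random(seed)
    X_train, y_train, X_test, y_test = [], [], [], []
    for label in sorted(set(y)):
        idx = [i for i, lab in enumerate(y) if lab == label]
        rng.shuffle(idx)
        n_test = max(1, round(len(idx) * test_fraction))
        for pos, i in enumerate(idx):
            if pos < n_test:
                X_test.append(list(X[i]))
                y_test.append(label)
            else:
                X_train.append(list(X[i]))
                y_train.append(label)
    return X_train, y_train, X_test, y_test


def smote_oversample(X: Sequence[Vector], y: Sequence[int], minority: int = 1,
                     k: int = 3, seed: int = 0) -> Tuple[List[Vector], List[int]]:
    """Add synthetic minority rows until both classes have the same count.

    Each synthetic row lies on the segment between a random minority sample
    and one of its k nearest minority neighbours (Euclidean distance).
    """
    rng = random.Random(seed)
    minority_rows = [list(x) for x, lab in zip(X, y) if lab == minority]
    n_majority = len(y) - len(minority_rows)
    n_needed = n_majority - len(minority_rows)
    if len(minority_rows) < 2 or n_needed <= 0:
        return [list(x) for x in X], list(y)

    synthetic: List[Vector] = []
    for _ in range(n_needed):
        base = rng.choice(minority_rows)
        neighbours = sorted(
            (row for row in minority_rows if row is not base),
            key=lambda row: math.dist(base, row),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])

    return [list(x) for x in X] + synthetic, list(y) + [minority] * len(synthetic)
```

Only the training output of stratified_split is passed to smote_oversample; the test split is evaluated as-is, which is what makes a with-SMOTE vs without-SMOTE comparison meaningful.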

🧠 Modeling Insight

A core theme of this project is that accuracy alone is not enough in fraud detection.

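Because fraudulent transactions are rare, a model that labels every transaction as legitimate can still score very high accuracy while catching no fraud at all. The project therefore emphasizes metrics and evaluation strategies that better reflect real fraud-detection usefulness, including: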

  • precision
  • recall (also called sensitivity)
  • confusion matrix behavior
  • with-SMOTE vs without-SMOTE model comparison

This helps frame fraud detection as a business-sensitive classification problem rather than a simple accuracy exercise.
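To make the accuracy point concrete, the sketch below computes accuracy, precision, and recall from confusion-matrix counts on a hypothetical set of 1,000 transactions containing 10 fraud cases. The data and both "models" are invented for illustration and are not results from this project.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, FN, TN with fraud (1) as the positive class."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        elif t == 1 and p == 0:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def summarize(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    c = confusion_counts(y_true, y_pred)
    total = sum(c.values())
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total,
        # Precision is undefined when nothing is flagged; 0.0 is reported here.
        "precision": c["tp"] / predicted_pos if predicted_pos else 0.0,
        # Recall and sensitivity are the same quantity: TP / (TP + FN).
        "recall": c["tp"] / actual_pos if actual_pos else 0.0,
    }


# Hypothetical labels: 10 fraud cases followed by 990 legitimate transactions.
y_true = [1] * 10 + [0] * 990
always_legit = [0] * 1000
catches_most = [1] * 8 + [0] * 2 + [1] * 4 + [0] * 986

print(summarize(y_true, always_legit))
# → {'accuracy': 0.99, 'precision': 0.0, 'recall': 0.0}
print(summarize(y_true, catches_most))
# → {'accuracy': 0.994, 'precision': 0.6666666666666666, 'recall': 0.8}
```

Accuracy barely changes between the two predictors (0.99 vs 0.994), while recall moves from 0.0 to 0.8, which is the gap that accuracy alone hides.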

📊 Visual Analytics

The project includes visual outputs that support both exploratory analysis and model interpretation, including:

  • naïve fraud flag vs actual fraud comparison
  • transaction type distribution by fraud status
  • transaction amount by type
  • final model confusion matrix
  • model comparison with and without SMOTE

Charts and model-result visuals are displayed below as part of the evolving case study documentation.
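The naïve-flag comparison amounts to cross-tabulating IsFlaggedFraud against IsFraud. The sketch below assumes each row is a dictionary keyed by those two column names and uses made-up rows; it does not reproduce the project's chart or its numbers.

```python
from collections import Counter
from typing import Dict, Iterable


def flag_vs_label(rows: Iterable[Dict[str, int]]) -> Counter:
    """Count (IsFlaggedFraud, IsFraud) pairs across transactions."""
    return Counter((int(r["IsFlaggedFraud"]), int(r["IsFraud"])) for r in rows)


def flag_recall(table: Counter) -> float:
    """Share of actual fraud that the built-in flag also marked."""
    actual_fraud = table[(1, 1)] + table[(0, 1)]
    return table[(1, 1)] / actual_fraud if actual_fraud else 0.0


# Made-up example rows, not project data.
rows = ([{"IsFlaggedFraud": 1, "IsFraud": 1}] * 2
        + [{"IsFlaggedFraud": 0, "IsFraud": 1}] * 18
        + [{"IsFlaggedFraud": 0, "IsFraud": 0}] * 980)
table = flag_vs_label(rows)
print(table[(1, 1)], table[(0, 1)], flag_recall(table))  # → 2 18 0.1
```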

ML Fraud Detection

An applied machine learning case study focused on fraud detection, imbalanced classification, preprocessing, model comparison, and evaluation in a high-risk financial context.

