Builder Showcase README Readiness Research

Purpose

This component extends the Builder Showcase project by using a GitHub README dataset as a research baseline for project-readiness analysis.

The goal is to score a sample of public GitHub READMEs with the readiness rubric used in the Builder Showcase review pipeline, or a close adaptation of it. This allows the project to compare the documentation quality of public GitHub repositories against Builder Showcase submissions over time.

Why this matters:

Builder Showcase is not only reviewing projects. It is building evidence around what makes a project understandable, credible, and ready to be used as proof of skill.

Value Proposition Mindset

GitHub contains millions of public projects, but a public repository is not automatically a clear, complete, or credible portfolio asset.

Many repositories may contain useful work, but reviewers, recruiters, collaborators, and funders often need more than code. They need to understand:

  • What the project does
  • Why it matters
  • How to run or use it
  • What problem it solves
  • What tools and methods were used
  • Whether the project is active
  • Whether it has results, screenshots, tests, or documentation
  • Whether it is ready to be evaluated as proof of skill

Builder Showcase addresses the gap between:

A project exists on GitHub.

and

A project is understandable, reviewable, and credible as evidence of technical ability.

The README Readiness Research component helps quantify that gap.

Core Product Argument

The central product argument is:

Public GitHub repositories are abundant, but project-readiness is uneven. Builder Showcase adds a structured proof layer that evaluates and improves how technical work is presented, documented, and interpreted.

This research component supports that argument with data.

Research Questions

This analysis is designed to answer questions such as:

  1. What percentage of sampled public repositories have a README?
  2. What percentage of READMEs are minimal, thin, semi-ready, strong, or showcase-ready?
  3. Which README sections are most commonly missing?
  4. How often do public READMEs include setup, usage, screenshots, tests, results, architecture, or deployment information?
  5. How does README readiness vary by language, stars, forks, activity, or project age?
  6. How do public GitHub README readiness scores compare to Builder Showcase submissions?
  7. Do Builder Showcase users improve their readiness over time after review or feedback?

Current Starting Point

The first CSV has already been created:

github_readme_readiness_scores.csv

This file should act as the first research output and scoring dataset.

At the notebook stage, the CSV-first approach is preferred because it allows the rubric, scoring logic, columns, and analysis outputs to be tested before importing anything into Supabase.

GitHub README sample
        ↓
Fetch README metadata and/or README content
        ↓
Apply Builder Showcase scoring rubric
        ↓
Save results to github_readme_readiness_scores.csv
        ↓
Analyze readiness patterns in notebook
        ↓
Create charts, summary tables, and product insights
        ↓
If results are useful, import cleaned version into Supabase
        ↓
Compare public GitHub baseline against Builder Showcase submissions
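
The fetch step in the flow above could look roughly like the sketch below. It assumes the GitHub REST API README endpoint (GET /repos/{owner}/{repo}/readme) with the raw media type. The function names, the status strings, the User-Agent value, and the injectable opener parameter are hypothetical and exist only for illustration.

```python
import urllib.error
import urllib.request

API_ROOT = "https://api.github.com"


def build_readme_request(repo_full_name, token=None):
    """Build a request for a repository README (repo_full_name is 'owner/repo')."""
    headers = {
        # Ask GitHub for the raw README text instead of base64-encoded JSON.
        "Accept": "application/vnd.github.raw+json",
        "User-Agent": "readme-readiness-research",  # hypothetical client name
    }
    if token:
        headers["Authorization"] = f"Bearer {token}"
    url = f"{API_ROOT}/repos/{repo_full_name}/readme"
    return urllib.request.Request(url, headers=headers)


def fetch_readme(repo_full_name, token=None, opener=urllib.request.urlopen):
    """Return a dict with has_readme, readme_fetch_status, and readme_text."""
    request = build_readme_request(repo_full_name, token)
    try:
        with opener(request, timeout=10) as response:
            text = response.read().decode("utf-8", errors="replace")
        return {"has_readme": True, "readme_fetch_status": "ok", "readme_text": text}
    except urllib.error.HTTPError as err:
        # HTTPError must be caught before URLError because it is a subclass of it.
        if err.code == 404:
            return {"has_readme": False, "readme_fetch_status": "not_found", "readme_text": ""}
        # Any other HTTP error (rate limit, server error) leaves existence unknown.
        return {"has_readme": None, "readme_fetch_status": f"http_{err.code}", "readme_text": ""}
    except urllib.error.URLError:
        return {"has_readme": None, "readme_fetch_status": "network_error", "readme_text": ""}
```

Keeping has_readme as None when the request fails for reasons other than a 404 avoids counting rate-limited or unreachable repositories as having no README, which would otherwise distort the existence rate.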

Why CSV First

The CSV-first approach keeps the research flexible and safe.

Benefits:

  • No production database changes during early experimentation
  • Easy to inspect scoring outputs manually
  • Easier to tune rubric weights
  • Easy to rerun notebooks
  • Simple to version the dataset
  • Cleaner Supabase import later
  • Better separation between research data and Builder Showcase user/submission data

The early CSV should be treated as a research artifact, not yet as product data.

Suggested CSV Columns

Recommended columns for github_readme_readiness_scores.csv:

sample_id
sample_date
repo_full_name
owner
repo_name
repo_url
source
language
stars
forks
watchers
open_issues
created_at
updated_at
pushed_at
has_readme
readme_fetch_status
readme_word_count
readme_char_count
rubric_version
readme_score
readiness_level
overview_score
setup_score
usage_score
features_score
visuals_score
architecture_score
data_methods_score
results_score
tests_score
license_score
deployment_score
contact_score
sections_found
sections_missing
notes
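
One way to keep the column list stable across notebook runs is to define it once and write every row through it. The sketch below uses only the standard library; the helper names new_row and write_scores are hypothetical.

```python
import csv

# Column order copied from the recommended list above.
COLUMNS = [
    "sample_id", "sample_date", "repo_full_name", "owner", "repo_name",
    "repo_url", "source", "language", "stars", "forks", "watchers",
    "open_issues", "created_at", "updated_at", "pushed_at", "has_readme",
    "readme_fetch_status", "readme_word_count", "readme_char_count",
    "rubric_version", "readme_score", "readiness_level", "overview_score",
    "setup_score", "usage_score", "features_score", "visuals_score",
    "architecture_score", "data_methods_score", "results_score",
    "tests_score", "license_score", "deployment_score", "contact_score",
    "sections_found", "sections_missing", "notes",
]


def new_row(**values):
    """Return a full row dict, rejecting column names that are not in COLUMNS."""
    unknown = set(values) - set(COLUMNS)
    if unknown:
        raise KeyError(f"Unknown columns: {sorted(unknown)}")
    # Columns that were not supplied are written as empty strings.
    return {column: values.get(column, "") for column in COLUMNS}


def write_scores(handle, rows):
    """Write a header plus rows to an open text handle in COLUMNS order."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
```

When writing to disk, open the file with newline="" (for example open(path, "w", newline="", encoding="utf-8")) so the csv module controls line endings.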

The most important long-term column is:

rubric_version

This allows scoring changes to be tracked over time.

Example rubric versions:

builder_showcase_readme_v1
builder_showcase_readme_v1_1
builder_showcase_readme_v2
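
Because scores from different rubric versions are not directly comparable, averages are probably best computed per version. A minimal sketch, assuming rows are dicts as returned by csv.DictReader; the function name is hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mean_score_by_rubric_version(rows):
    """Average readme_score separately for each rubric_version."""
    scores = defaultdict(list)
    for row in rows:
        raw = str(row.get("readme_score", "")).strip()
        if raw:  # skip rows that have no score yet
            scores[row["rubric_version"]].append(float(raw))
    return {version: round(mean(values), 1) for version, values in scores.items()}
```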

Readiness Levels

Recommended scoring levels:

90–100: showcase_ready
75–89: strong
50–74: semi_ready
25–49: thin
1–24: minimal
0: missing

These levels should support both analysis and product language.

For external-facing summaries, the categories can be simplified by merging thin and minimal into a single not_ready level:

showcase_ready
strong
semi_ready
not_ready
missing
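
The score bands and the simplified labels can be expressed as two small functions. This is a sketch that assumes integer scores from 0 to 100; mapping both thin and minimal onto not_ready is an assumption inferred from the two lists above, and the function names are hypothetical.

```python
def score_to_level(score):
    """Map a 0-100 integer score to the detailed readiness level."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 90:
        return "showcase_ready"
    if score >= 75:
        return "strong"
    if score >= 50:
        return "semi_ready"
    if score >= 25:
        return "thin"
    if score >= 1:
        return "minimal"
    return "missing"


# Assumed mapping: the two lowest non-missing levels collapse into not_ready.
SIMPLIFIED = {"thin": "not_ready", "minimal": "not_ready"}


def simplify_level(level):
    """Map a detailed level to the external-facing label."""
    return SIMPLIFIED.get(level, level)
```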

README Rubric Categories

The Builder Showcase scoring structure can be reused or adapted to score public README samples.

Suggested categories:

Category                  Why it matters
Overview                  Explains what the project is
Problem / purpose         Explains why the project exists
Setup / installation      Shows whether others can run it
Usage                     Shows how the project is used
Features                  Explains core capabilities
Screenshots / visuals     Helps reviewers understand quickly
Architecture / workflow   Shows technical structure
Data / methods            Important for data, AI, and analytics projects
Results / findings        Shows outcomes and impact
Tests / validation        Signals engineering maturity
License                   Supports reuse and open-source clarity
Deployment / demo         Shows product readiness
Contact / author          Supports attribution and follow-up
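
As an illustration of how these categories might be turned into a scoring function, the sketch below detects sections by matching keywords against markdown headings. It is not the Builder Showcase rubric: the keyword lists, the equal weighting, the 0/1 per-category scores, the semicolon separator, and the folding of Problem / purpose into the overview category (to match the twelve *_score columns) are all assumptions.

```python
import re

# Hypothetical keyword prefixes; keys mirror the *_score CSV columns.
CATEGORY_KEYWORDS = {
    "overview": ["overview", "about", "introduction", "purpose", "problem"],
    "setup": ["setup", "install", "getting started", "requirements"],
    "usage": ["usage", "how to use", "quickstart", "quick start"],
    "features": ["feature"],
    "visuals": ["screenshot", "preview", "gallery"],
    "architecture": ["architecture", "workflow", "design"],
    "data_methods": ["data", "method"],
    "results": ["result", "finding"],
    "tests": ["test", "validation"],
    "license": ["license", "licence"],
    "deployment": ["deploy", "demo"],
    "contact": ["contact", "author", "maintainer"],
}

HEADING = re.compile(r"^#{1,6}\s+(.+)$")


def extract_headings(readme_text):
    """Return lowercased markdown heading texts (lines starting with #)."""
    headings = []
    for line in readme_text.splitlines():
        match = HEADING.match(line.strip())
        if match:
            headings.append(match.group(1).lower())
    return headings


def score_readme(readme_text):
    """Score a README: 0/1 per category plus an equal-weight 0-100 total."""
    headings = extract_headings(readme_text)
    found = []
    for category, keywords in CATEGORY_KEYWORDS.items():
        # A keyword counts only when it starts a word in some heading.
        hit = any(
            re.search(r"\b" + re.escape(keyword), heading)
            for keyword in keywords
            for heading in headings
        )
        # Embedded images also count as visuals, even without a heading.
        if category == "visuals" and ("![" in readme_text or "<img" in readme_text.lower()):
            hit = True
        if hit:
            found.append(category)
    missing = [category for category in CATEGORY_KEYWORDS if category not in found]
    result = {f"{category}_score": int(category in found) for category in CATEGORY_KEYWORDS}
    result["readme_score"] = round(100 * len(found) / len(CATEGORY_KEYWORDS))
    result["sections_found"] = ";".join(found)
    result["sections_missing"] = ";".join(missing)
    return result
```

Heading matching is a coarse heuristic. It misses READMEs that describe setup or usage in prose without headings, and it can misread comment lines inside shell code blocks as headings, so results are worth spot-checking by hand while the rubric is tuned.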

Analysis Outputs

The notebook should produce summary outputs such as:

README existence rate
Average README score
Median README score
Readiness level distribution
Most common missing sections
README score by language
README score by stars/forks
README score by recent activity
README score by project age
Top examples of strong READMEs
Common patterns in thin or missing READMEs
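
Several of these summaries can be computed directly from the scored rows. The sketch below uses only the standard library so it runs without extra dependencies, although a notebook would more likely use pandas. It assumes has_readme is stored as text such as "True", that sections_missing is semicolon-separated, and that mean and median cover only repositories that have a README; all three are assumptions, as is the function name.

```python
from collections import Counter
from statistics import mean, median


def is_true(value):
    """Interpret CSV text such as 'True', 'true', or '1' as a boolean."""
    return str(value).strip().lower() in {"true", "1", "yes"}


def summarize(rows):
    """Compute headline readiness statistics from scored rows."""
    with_readme = [row for row in rows if is_true(row["has_readme"])]
    scores = [float(row["readme_score"]) for row in with_readme]
    missing = Counter()
    for row in with_readme:
        # Count each missing section once per repository.
        missing.update(s for s in row["sections_missing"].split(";") if s)
    return {
        "readme_existence_rate": len(with_readme) / len(rows) if rows else 0.0,
        "mean_score": mean(scores) if scores else None,
        "median_score": median(scores) if scores else None,
        "level_distribution": Counter(row["readiness_level"] for row in rows),
        "most_common_missing": missing.most_common(3),
    }
```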

Useful product-facing metrics:

% of sampled repos with README
% of sampled repos with setup instructions
% of sampled repos with usage examples
% of sampled repos with screenshots or diagrams
% of sampled repos with tests or validation
% of sampled repos with results or findings
% of sampled repos that are showcase_ready
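
The percentage metrics above reduce to counting rows that satisfy a condition. A minimal sketch, assuming a section counts as present when its *_score column is greater than zero and that has_readme is stored as the text "True"; the function and key names are hypothetical.

```python
def percent(rows, predicate):
    """Percentage of rows (0-100, one decimal) for which predicate is true."""
    if not rows:
        return 0.0
    hits = sum(1 for row in rows if predicate(row))
    return round(100 * hits / len(rows), 1)


def has_section(column):
    """Build a predicate that is true when the score column is above zero."""
    return lambda row: float(row.get(column) or 0) > 0


def product_metrics(rows):
    """Compute the product-facing percentages over all sampled repositories."""
    return {
        "pct_with_readme": percent(
            rows, lambda row: str(row.get("has_readme")).strip().lower() == "true"
        ),
        "pct_with_setup": percent(rows, has_section("setup_score")),
        "pct_with_usage": percent(rows, has_section("usage_score")),
        "pct_with_visuals": percent(rows, has_section("visuals_score")),
        "pct_with_tests": percent(rows, has_section("tests_score")),
        "pct_with_results": percent(rows, has_section("results_score")),
        "pct_showcase_ready": percent(
            rows, lambda row: row.get("readiness_level") == "showcase_ready"
        ),
    }
```

The denominator here is every sampled repository, including those without a README. Reporting the same metrics over README-bearing repositories only is an equally reasonable choice, as long as the denominator is stated alongside the number.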

Product Insight Framing

The analysis should be translated into value-proposition insights.

Example:

In a sampled set of public GitHub repositories, many projects had some documentation, but fewer included the full set of sections needed for reviewability, reproducibility, and portfolio readiness.

Example:

The most common gaps were setup instructions, screenshots, results, and testing details. These are exactly the kinds of gaps Builder Showcase is designed to identify and improve.

Example:

Builder Showcase can help turn raw repository links into structured evidence of project quality, completeness, and technical communication.

Future Supabase Integration

If the notebook analysis produces useful results, the cleaned dataset can be imported into Supabase using research-specific tables.

Recommended research tables:

github_repo_samples
github_readme_scores
github_readme_rubric_versions

These should remain separate from production Builder Showcase submission tables.

Later, the platform can compare:

Public GitHub baseline
        vs.
Builder Showcase submitted projects
        vs.
Builder Showcase approved/published projects
        vs.
Builder Showcase post-review improved projects

Suggested Supabase Purpose

The Supabase layer should support business intelligence, not just storage.

Potential uses:

  • Track public README readiness trends over time
  • Compare Builder Showcase users against the public baseline
  • Measure improvement after feedback
  • Generate product impact metrics
  • Support investor/funder storytelling
  • Support blog posts and public research notes
  • Support dashboard visualizations

Product Impact Metrics

Once Builder Showcase has users, the following product impact metrics become possible:

Average public GitHub README readiness score
Average Builder Showcase initial submission score
Average Builder Showcase approved project score
Average score improvement after review
Percentage of users who improve from thin/semi_ready to strong/showcase_ready
Most common improvements after feedback

Example future claim:

Builder Showcase users improved their average README readiness score from 52/100 at submission to 81/100 after review.
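
A claim of this kind could be derived from paired scores per project. The sketch below assumes each pair is (initial_score, post_review_score) on the 0–100 scale, and it takes the percentage of promoted users over those who started in the thin or semi_ready bands (25–74); both the function name and that choice of denominator are assumptions.

```python
from statistics import mean


def improvement_summary(pairs):
    """Summarize score change for (initial_score, post_review_score) pairs."""
    if not pairs:
        raise ValueError("at least one score pair is required")
    initial = [before for before, _ in pairs]
    final = [after for _, after in pairs]
    # Users who started in the thin or semi_ready bands (25-74).
    started_low = [(before, after) for before, after in pairs if 25 <= before <= 74]
    # Of those, users who reached strong or showcase_ready (75 and above).
    promoted = [pair for pair in started_low if pair[1] >= 75]
    return {
        "mean_initial": round(mean(initial), 1),
        "mean_final": round(mean(final), 1),
        "mean_improvement": round(mean([after - before for before, after in pairs]), 1),
        "pct_promoted": round(100 * len(promoted) / len(started_low), 1) if started_low else None,
    }
```

For example, with the made-up pairs [(50, 80), (40, 70), (80, 90), (60, 80)], the mean improvement is 22.5 points and two of the three projects that started below 75 were promoted. These numbers are illustrative inputs, not Builder Showcase results.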

That kind of metric directly supports the product value proposition.

Documentation and Folder Structure

This component should be documented either inside the existing notebooks README or as a separate README.

Recommended approach:

  • Add a short section to the existing notebooks README for visibility.
  • Create a separate README for the component if the pipeline grows into multiple notebooks, scripts, or datasets.

Suggested component name:

GitHub README Readiness Research

Suggested notebook name:

01_github_readme_readiness_scoring.ipynb

Suggested folder structure:

notebooks/
  01_github_readme_readiness_scoring.ipynb

data/
  raw/
    readmes/
  processed/
    github_readme_readiness_scores.csv

src/
  readme_fetcher.py
  readme_scorer.py
  readiness_analysis.py

docs/
  github_readme_readiness_research.md

Immediate Build Steps

  1. Inspect the existing github_readme_readiness_scores.csv.
  2. Confirm the final column names.
  3. Convert the Builder Showcase rubric into a reusable scoring function.
  4. Apply the scoring function to the first sample.
  5. Save the scored CSV.
  6. Create summary tables in the notebook.
  7. Create initial charts.
  8. Review whether the results support the product argument.
  9. Tune the scoring rubric if needed.
  10. Prepare a clean Supabase import only after the notebook results are stable.

First Milestone

The first milestone is not Supabase import.

The first milestone is:

A notebook that scores a public GitHub README sample and produces clear readiness statistics that support the Builder Showcase value proposition.

Suggested milestone outputs:

github_readme_readiness_scores.csv
readiness_level_distribution.csv
missing_sections_summary.csv
readme_score_by_language.csv
readme_score_by_activity.csv
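
Each summary file can be produced by a small grouping step over the scored rows. The sketch below writes readme_score_by_language.csv; the output column names, the "unknown" label for blank languages, and the function name are assumptions.

```python
import csv
from collections import defaultdict
from pathlib import Path
from statistics import mean


def write_score_by_language(rows, out_dir):
    """Write mean README score per language and return the output path."""
    by_language = defaultdict(list)
    for row in rows:
        raw = str(row.get("readme_score", "")).strip()
        if raw:  # skip rows that were never scored
            by_language[row.get("language") or "unknown"].append(float(raw))
    out_path = Path(out_dir) / "readme_score_by_language.csv"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["language", "repo_count", "mean_readme_score"])
        for language in sorted(by_language):
            values = by_language[language]
            writer.writerow([language, len(values), round(mean(values), 1)])
    return out_path
```

Including repo_count next to each mean makes it easier to spot languages whose average rests on only a handful of repositories.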

Final Direction

This component strengthens Builder Showcase by turning the review pipeline into a research-backed product intelligence system.

It helps prove that Builder Showcase is not only a submission platform. It is a structured proof layer that can evaluate project readiness, identify documentation gaps, and measure improvement over time.

The long-term opportunity is to show that Builder Showcase helps builders move from:

public but unclear

to:

reviewable, explainable, and showcase-ready