GitHub README Readiness Research

This notebook component supports the Builder Showcase value proposition by scoring a sample of public GitHub READMEs with the same readiness mindset as the Builder Showcase review pipeline.

The goal is to create a public GitHub baseline for project-readiness analysis.

This baseline helps answer:

  • How many sampled public repositories have a README?
  • How complete are those READMEs?
  • Which sections are commonly missing?
  • How many projects appear showcase-ready?
  • How does the public GitHub baseline compare to Builder Showcase submissions over time?
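
As a rough illustration of how the first three questions could be turned into per-repository fields, the sketch below checks a README for a small set of expected sections. The section checklist, the regex patterns, the score formula, and the readiness thresholds are all assumptions made for this example; they are not the Builder Showcase review pipeline's actual criteria.

```python
import re
from typing import Optional

# Hypothetical checklist: section names and heading patterns are assumptions,
# not the criteria used by the Builder Showcase review pipeline.
EXPECTED_SECTIONS = {
    "description": r"^#{1,6}\s*(about|overview|description|introduction)\b",
    "installation": r"^#{1,6}\s*(install|installation|getting started|setup)\b",
    "usage": r"^#{1,6}\s*(usage|quick ?start|examples?)\b",
    "license": r"^#{1,6}\s*licen[sc]e\b",
    "contributing": r"^#{1,6}\s*contribut(ing|e|ors)\b",
}


def readiness_level(score: float, has_readme: bool) -> str:
    """Map a 0..1 completeness score to a label (thresholds are assumed)."""
    if not has_readme:
        return "missing"
    if score >= 0.8:
        return "showcase-ready"
    if score >= 0.4:
        return "partial"
    return "minimal"


def score_readme(text: Optional[str]) -> dict:
    """Score one README by the fraction of expected sections it contains."""
    has_readme = bool(text and text.strip())
    found = []
    if has_readme:
        for name, pattern in EXPECTED_SECTIONS.items():
            # Match markdown headings at the start of any line, ignoring case.
            if re.search(pattern, text, flags=re.IGNORECASE | re.MULTILINE):
                found.append(name)
    missing = [name for name in EXPECTED_SECTIONS if name not in found]
    score = len(found) / len(EXPECTED_SECTIONS)
    return {
        "has_readme": has_readme,
        "score": score,
        "missing_sections": missing,
        "readiness_level": readiness_level(score, has_readme),
    }


# Made-up README text, used only to exercise the function.
sample = "# Demo\n\n## Installation\npip install demo\n\n## Usage\nRun it.\n\n## License\nMIT\n"
result = score_readme(sample)
```

A heading-based check like this only sees markdown-style headings, so READMEs written in reStructuredText or plain text would need additional patterns before their scores are comparable.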

Primary output:

data/processed/github_readme_readiness_scores.csv

Recommended notebook:

notebooks/01_github_readme_readiness_scoring.ipynb

Recommended future outputs:

readiness_level_distribution.csv
missing_sections_summary.csv
readme_score_by_language.csv
readme_score_by_activity.csv
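
The first two future outputs could be derived from the primary scores file with a simple aggregation. The sketch below uses only the Python standard library and an in-memory sample. The column names (repo, readiness_level, missing_sections), the semicolon-separated encoding of missing sections, and the sample rows themselves are assumptions for illustration, not the actual schema or data of github_readme_readiness_scores.csv.

```python
import csv
import io
from collections import Counter


def level_distribution(rows):
    """Count repositories per readiness level."""
    return Counter(row["readiness_level"] for row in rows)


def missing_sections_summary(rows):
    """Count how often each section is reported missing."""
    counts = Counter()
    for row in rows:
        # Assumed encoding: section names joined with ';', empty if none missing.
        for section in row["missing_sections"].split(";"):
            if section:
                counts[section] += 1
    return counts


def write_counts(counter, key_name, out):
    """Write a Counter as a two-column CSV, most frequent first."""
    writer = csv.writer(out)
    writer.writerow([key_name, "count"])
    for key, count in counter.most_common():
        writer.writerow([key, count])


# Made-up rows standing in for the scores CSV (hypothetical schema).
SAMPLE = """repo,readiness_level,missing_sections
a/one,partial,license;contributing
b/two,showcase-ready,
c/three,partial,license
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
levels = level_distribution(rows)
missing = missing_sections_summary(rows)

# In a notebook, `out` would be an open file handle instead of a StringIO.
buffer = io.StringIO()
write_counts(levels, "readiness_level", buffer)
```

Keeping each summary as its own small CSV fits the CSV-first approach: every derived file can be regenerated from the scores file and inspected without a database.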

This component should remain CSV-first during early experimentation. If the results are useful, the cleaned and versioned dataset can later be imported into Supabase for Builder Showcase business intelligence, product comparison, and long-term readiness tracking.