Builder Showcase README Readiness Research
Purpose
This component extends the Builder Showcase project by using a GitHub README dataset as a research baseline for project-readiness analysis.
The goal is to score a sample of public GitHub READMEs with the same readiness rubric as the Builder Showcase review pipeline, or one closely adapted from it. This allows the project to compare general public GitHub documentation quality against Builder Showcase submissions over time.
In plain terms:
Builder Showcase is not only reviewing projects. It is building evidence around what makes a project understandable, credible, and ready to be used as proof of skill.
Value Proposition Mindset
GitHub contains millions of public projects, but a public repository is not automatically a clear, complete, or credible portfolio asset.
Many repositories may contain useful work, but reviewers, recruiters, collaborators, and funders often need more than code. They need to understand:
- What the project does
- Why it matters
- How to run or use it
- What problem it solves
- What tools and methods were used
- Whether the project is active
- Whether it has results, screenshots, tests, or documentation
- Whether it is ready to be evaluated as proof of skill
Builder Showcase addresses the gap between:
A project exists on GitHub.
and
A project is understandable, reviewable, and credible as evidence of technical ability.
The README Readiness Research component helps quantify that gap.
Core Product Argument
The central product argument is:
Public GitHub repositories are abundant, but project-readiness is uneven. Builder Showcase adds a structured proof layer that evaluates and improves how technical work is presented, documented, and interpreted.
This research component supports that argument with data.
Research Questions
This analysis is designed to answer questions such as:
- What percentage of sampled public repositories have a README?
- What percentage of READMEs fall into each readiness level (minimal, thin, semi-ready, strong, or showcase-ready)?
- Which README sections are most commonly missing?
- How often do public READMEs include setup, usage, screenshots, tests, results, architecture, or deployment information?
- How does README readiness vary by language, stars, forks, activity, or project age?
- How do public GitHub README readiness scores compare to Builder Showcase submissions?
- Do Builder Showcase users improve their readiness over time after review or feedback?
Current Starting Point
The first CSV has already been created:
github_readme_readiness_scores.csv
This file should act as the first research output and scoring dataset.
At the notebook stage, the CSV-first approach is preferred because it allows the rubric, scoring logic, columns, and analysis outputs to be tested before importing anything into Supabase.
Recommended Workflow
GitHub README sample
↓
Fetch README metadata and/or README content
↓
Apply Builder Showcase scoring rubric
↓
Save results to github_readme_readiness_scores.csv
↓
Analyze readiness patterns in notebook
↓
Create charts, summary tables, and product insights
↓
If results are useful, import cleaned version into Supabase
↓
Compare public GitHub baseline against Builder Showcase submissions
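The fetch step in the workflow above could be sketched as follows. This is a minimal example, assuming the GitHub REST API README endpoint (`GET /repos/{owner}/{repo}/readme`) is the data source and using only the Python standard library. The function names, the `User-Agent` value, and the `readme_fetch_status` labels (`ok`, `not_found`, and so on) are illustrative assumptions, not part of the existing pipeline.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple

GITHUB_API = "https://api.github.com"


def build_readme_request(repo_full_name: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a request for a repository's README via the GitHub REST API."""
    headers = {
        # Ask GitHub for the raw README text instead of base64-encoded JSON.
        "Accept": "application/vnd.github.raw+json",
        "X-GitHub-Api-Version": "2022-11-28",
        "User-Agent": "readme-readiness-research",  # hypothetical client name
    }
    if token:
        # A token is optional but raises the API rate limit.
        headers["Authorization"] = f"Bearer {token}"
    url = f"{GITHUB_API}/repos/{repo_full_name}/readme"
    return urllib.request.Request(url, headers=headers)


def fetch_readme(repo_full_name: str, token: Optional[str] = None) -> Tuple[str, str]:
    """Return (readme_fetch_status, readme_text). Status labels are assumptions."""
    request = build_readme_request(repo_full_name, token)
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return "ok", response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # GitHub answers 404 when no README (or no such repository) is found.
        return ("not_found" if err.code == 404 else f"http_{err.code}"), ""
    except urllib.error.URLError:
        return "network_error", ""
```

Separating request construction from the network call keeps the URL and header logic testable offline, which fits the notebook-first approach.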
Why CSV First
The CSV-first approach keeps the research flexible and safe.
Benefits:
- No production database changes during early experimentation
- Easy to inspect scoring outputs manually
- Easier to tune rubric weights
- Easy to rerun notebooks
- Simple to version the dataset
- Cleaner Supabase import later
- Better separation between research data and Builder Showcase user/submission data
The early CSV should be treated as a research artifact, not yet as product data.
Suggested CSV Columns
Recommended columns for github_readme_readiness_scores.csv:
sample_id
sample_date
repo_full_name
owner
repo_name
repo_url
source
language
stars
forks
watchers
open_issues
created_at
updated_at
pushed_at
has_readme
readme_fetch_status
readme_word_count
readme_char_count
rubric_version
readme_score
readiness_level
overview_score
setup_score
usage_score
features_score
visuals_score
architecture_score
data_methods_score
results_score
tests_score
license_score
deployment_score
contact_score
sections_found
sections_missing
notes
The most important long-term column is:
rubric_version
This allows scoring changes to be tracked over time.
Example rubric versions:
builder_showcase_readme_v1
builder_showcase_readme_v1_1
builder_showcase_readme_v2
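One way to keep the column list and the rubric version consistent is to define them once and write every row through the same helper. The sketch below assumes the suggested columns are adopted unchanged. The `write_scores` helper and its default version string are illustrative.

```python
import csv

# Column order copied from the suggested list above.
CSV_COLUMNS = [
    "sample_id", "sample_date", "repo_full_name", "owner", "repo_name",
    "repo_url", "source", "language", "stars", "forks", "watchers",
    "open_issues", "created_at", "updated_at", "pushed_at", "has_readme",
    "readme_fetch_status", "readme_word_count", "readme_char_count",
    "rubric_version", "readme_score", "readiness_level", "overview_score",
    "setup_score", "usage_score", "features_score", "visuals_score",
    "architecture_score", "data_methods_score", "results_score",
    "tests_score", "license_score", "deployment_score", "contact_score",
    "sections_found", "sections_missing", "notes",
]


def write_scores(rows, handle, rubric_version="builder_showcase_readme_v1"):
    """Write scored rows to an open text handle, stamping a rubric version.

    Missing columns are left empty. Unknown columns raise ValueError, which
    is csv.DictWriter's default behaviour (extrasaction="raise").
    """
    writer = csv.DictWriter(handle, fieldnames=CSV_COLUMNS, restval="")
    writer.writeheader()
    for row in rows:
        # A rubric_version already present on the row wins over the default.
        writer.writerow({"rubric_version": rubric_version, **row})
```

For example, `write_scores(rows, open("github_readme_readiness_scores.csv", "w", newline=""))` would produce a file whose header matches the list above, and a typo in a column name would fail loudly instead of silently adding a column.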
Readiness Levels
Recommended scoring levels:
90–100: showcase_ready
75–89: strong
50–74: semi_ready
25–49: thin
1–24: minimal
0: missing
These levels should support both analysis and product language.
For external-facing summaries, the categories can be simplified, with thin and minimal merged into not_ready:
showcase_ready
strong
semi_ready
not_ready
missing
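The level boundaries above translate directly into a small lookup function. The sketch below uses the thresholds as listed. Two details are assumptions: fractional scores fall into the band below the next threshold (so 89.5 counts as strong), and the simplified mapping folds both thin and minimal into not_ready.

```python
def readiness_level(score: float) -> str:
    """Map a 0-100 README score to the detailed readiness level."""
    if not 0 <= score <= 100:
        raise ValueError(f"score must be between 0 and 100, got {score}")
    if score >= 90:
        return "showcase_ready"
    if score >= 75:
        return "strong"
    if score >= 50:
        return "semi_ready"
    if score >= 25:
        return "thin"
    if score > 0:
        return "minimal"
    return "missing"


# Assumed mapping from detailed levels to the external-facing categories.
SIMPLIFIED_LEVEL = {
    "showcase_ready": "showcase_ready",
    "strong": "strong",
    "semi_ready": "semi_ready",
    "thin": "not_ready",
    "minimal": "not_ready",
    "missing": "missing",
}


def simplified_level(score: float) -> str:
    """Map a 0-100 README score to the simplified external-facing level."""
    return SIMPLIFIED_LEVEL[readiness_level(score)]
```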
README Rubric Categories
The Builder Showcase scoring structure can be reused or adapted to score public README samples.
Suggested categories:
| Category | Why it matters |
|---|---|
| Overview | Explains what the project is |
| Problem / purpose | Explains why the project exists |
| Setup / installation | Shows whether others can run it |
| Usage | Shows how the project is used |
| Features | Explains core capabilities |
| Screenshots / visuals | Helps reviewers understand quickly |
| Architecture / workflow | Shows technical structure |
| Data / methods | Important for data, AI, and analytics projects |
| Results / findings | Shows outcomes and impact |
| Tests / validation | Signals engineering maturity |
| License | Supports reuse and open-source clarity |
| Deployment / demo | Shows product readiness |
| Contact / author | Supports attribution and follow-up |
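As a starting point for the scoring function, the categories can be detected by matching README headings against keywords. This is a deliberately simple sketch, not the Builder Showcase rubric itself: the keyword patterns and the equal weighting are assumptions, and it only covers the twelve categories that have a `*_score` column in the suggested CSV (Problem / purpose is folded into the overview pattern here). It does not yet fill the per-category score columns.

```python
import re

# Hypothetical heading keywords per rubric category; tune before real use.
SECTION_PATTERNS = {
    "overview": r"\b(overview|about|introduction|purpose)\b",
    "setup": r"\b(setup|install(ation)?|getting started)\b",
    "usage": r"\b(usage|how to use|quick ?start)\b",
    "features": r"\bfeatures?\b",
    "visuals": r"\b(screenshots?|visuals?|diagrams?)\b",
    "architecture": r"\b(architecture|workflow|design)\b",
    "data_methods": r"\b(data|datasets?|methods?|methodology)\b",
    "results": r"\b(results?|findings)\b",
    "tests": r"\b(tests?|testing|validation)\b",
    "license": r"\blicen[sc]e\b",
    "deployment": r"\b(deploy(ment)?|demo)\b",
    "contact": r"\b(contact|authors?|maintainers?)\b",
}

# Matches markdown ATX headings such as "## Installation".
HEADING_RE = re.compile(r"^#{1,6}\s+(.+)$", re.MULTILINE)


def score_readme(text: str) -> dict:
    """Score a README by which rubric categories appear in its headings.

    Every category carries equal weight (an assumption), so the score is
    the share of categories found, scaled to 0-100 and rounded.
    """
    headings = [heading.lower() for heading in HEADING_RE.findall(text)]
    found = [
        name
        for name, pattern in SECTION_PATTERNS.items()
        if any(re.search(pattern, heading) for heading in headings)
    ]
    missing = [name for name in SECTION_PATTERNS if name not in found]
    return {
        "readme_score": round(100 * len(found) / len(SECTION_PATTERNS)),
        "sections_found": found,
        "sections_missing": missing,
    }
```

Heading-only matching will miss READMEs that describe setup or usage in running prose, so results from a sketch like this should be spot-checked by hand before the rubric weights are tuned.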
Analysis Outputs
The notebook should produce summary outputs such as:
README existence rate
Average README score
Median README score
Readiness level distribution
Most common missing sections
README score by language
README score by stars/forks
README score by recent activity
README score by project age
Top examples of strong READMEs
Common patterns in thin or missing READMEs
Useful product-facing metrics:
% of sampled repos with README
% of sampled repos with setup instructions
% of sampled repos with usage examples
% of sampled repos with screenshots or diagrams
% of sampled repos with tests or validation
% of sampled repos with results or findings
% of sampled repos that are showcase_ready
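Several of these summary outputs can be computed from the scored rows with the standard library alone. The sketch below assumes rows are dictionaries as produced by `csv.DictReader`, that `has_readme` is stored as the text `True` or `False`, and that `sections_missing` is a semicolon-separated string. None of these encodings is fixed by the current CSV, so adjust them to match the real file. Average and median are taken over repositories that have a README, which is a design choice rather than a requirement.

```python
from collections import Counter
from statistics import mean, median


def summarize(rows):
    """Compute headline readiness statistics from scored CSV rows."""
    if not rows:
        raise ValueError("no rows to summarize")
    with_readme = [r for r in rows if str(r["has_readme"]).lower() == "true"]
    scores = [float(r["readme_score"]) for r in with_readme]
    # Count how often each section is reported missing across READMEs.
    missing = Counter(
        section
        for r in with_readme
        for section in r["sections_missing"].split(";")
        if section
    )
    return {
        "readme_existence_rate": len(with_readme) / len(rows),
        "average_readme_score": mean(scores) if scores else None,
        "median_readme_score": median(scores) if scores else None,
        "readiness_level_distribution": dict(Counter(r["readiness_level"] for r in rows)),
        "most_common_missing_sections": missing.most_common(5),
    }
```

Breakdowns by language, stars, or activity follow the same pattern: group the rows by the relevant column first, then call `summarize` on each group.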
Product Insight Framing
The analysis should be translated into value-proposition insights.
Example:
In a sampled set of public GitHub repositories, many projects had some documentation, but fewer included the full set of sections needed for reviewability, reproducibility, and portfolio readiness.
Example:
The most common gaps were setup instructions, screenshots, results, and testing details. These are exactly the kinds of gaps Builder Showcase is designed to identify and improve.
Example:
Builder Showcase can help turn raw repository links into structured evidence of project quality, completeness, and technical communication.
Future Supabase Integration
If the notebook analysis produces useful results, the cleaned dataset can be imported into Supabase using research-specific tables.
Recommended research tables:
github_repo_samples
github_readme_scores
github_readme_rubric_versions
These should remain separate from production Builder Showcase submission tables.
Later, the platform can compare:
Public GitHub baseline
vs.
Builder Showcase submitted projects
vs.
Builder Showcase approved/published projects
vs.
Builder Showcase post-review improved projects
Suggested Supabase Purpose
The Supabase layer should support business intelligence, not just storage.
Potential uses:
- Track public README readiness trends over time
- Compare Builder Showcase users against the public baseline
- Measure improvement after feedback
- Generate product impact metrics
- Support investor/funder storytelling
- Support blog posts and public research notes
- Support dashboard visualizations
Product Impact Metrics
Once Builder Showcase has users, the following product impact metrics become possible:
Average public GitHub README readiness score
Average Builder Showcase initial submission score
Average Builder Showcase approved project score
Average score improvement after review
Percentage of users who improve from thin/semi_ready to strong/showcase_ready
Most common improvements after feedback
Example future claim:
Builder Showcase users improved their average README readiness score from 52/100 at submission to 81/100 after review.
That kind of metric directly supports the product value proposition.
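A claim of that shape could be computed from paired before-and-after records once they exist. The sketch below is hypothetical throughout: the record fields (`initial_score`, `final_score`, `initial_level`, `final_level`) are assumed names, and no such Builder Showcase data is implied to exist yet.

```python
from statistics import mean

UPGRADE_FROM = {"thin", "semi_ready"}
UPGRADE_TO = {"strong", "showcase_ready"}


def improvement_metrics(records):
    """Summarize score changes between initial submission and post-review."""
    if not records:
        raise ValueError("no records to summarize")
    # Users who moved from thin/semi_ready up to strong/showcase_ready.
    upgraded = [
        r for r in records
        if r["initial_level"] in UPGRADE_FROM and r["final_level"] in UPGRADE_TO
    ]
    return {
        "average_initial_score": mean(r["initial_score"] for r in records),
        "average_final_score": mean(r["final_score"] for r in records),
        "average_improvement": mean(r["final_score"] - r["initial_score"] for r in records),
        "share_upgraded": len(upgraded) / len(records),
    }
```

Using the same `rubric_version` for both the initial and the post-review score matters here, since a rubric change between the two measurements would show up as a false improvement or decline.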
Recommended Notebook README Section
This component should be documented either inside the existing notebooks README or as a separate README.
Recommended approach:
- Add a short section to the existing notebooks README for visibility.
- Create a separate README for the component if the pipeline grows into multiple notebooks, scripts, or datasets.
Suggested component name:
GitHub README Readiness Research
Suggested notebook name:
01_github_readme_readiness_scoring.ipynb
Suggested folder structure:
notebooks/
  01_github_readme_readiness_scoring.ipynb
data/
  raw/
    readmes/
  processed/
    github_readme_readiness_scores.csv
src/
  readme_fetcher.py
  readme_scorer.py
  readiness_analysis.py
docs/
  github_readme_readiness_research.md
Immediate Build Steps
- Inspect the existing github_readme_readiness_scores.csv.
- Confirm the final column names.
- Convert the Builder Showcase rubric into a reusable scoring function.
- Apply the scoring function to the first sample.
- Save the scored CSV.
- Create summary tables in the notebook.
- Create initial charts.
- Review whether the results support the product argument.
- Tune the scoring rubric if needed.
- Prepare a clean Supabase import only after the notebook results are stable.
First Milestone
The first milestone is not Supabase import.
The first milestone is:
A notebook that scores a public GitHub README sample and produces clear readiness statistics that support the Builder Showcase value proposition.
Suggested milestone outputs:
github_readme_readiness_scores.csv
readiness_level_distribution.csv
missing_sections_summary.csv
readme_score_by_language.csv
readme_score_by_activity.csv
Final Direction
This component strengthens Builder Showcase by turning the review pipeline into a research-backed product intelligence system.
It helps prove that Builder Showcase is not only a submission platform. It is a structured proof layer that can evaluate project readiness, identify documentation gaps, and measure improvement over time.
The long-term opportunity is to show that Builder Showcase helps builders move from:
public but unclear
to:
reviewable, explainable, and showcase-ready