DataInsideData Blog — Site & System Architecture & Design Specification
1) High-Level System Architecture
flowchart TB
U[Site Owner / Contributor] --> C[Content Sources]
subgraph Content_Layer[Content Layer]
P[_posts]
PR[_projects]
H[_how_tos]
F[_fixes]
SP[Standalone Pages]
end
subgraph Config_Layer[Configuration Layer]
CFG[_config.yml]
NAV[_data/navigation.yml]
end
subgraph Presentation_Layer[Presentation Layer]
MM[Minimal Mistakes Theme]
INC[_includes]
SCSS[assets/css/main.scss]
CUSTOM[_sass/minimal-mistakes/_custom.scss]
JS[assets/js/mermaid-init.js]
IMG[assets/images]
end
C --> P
C --> PR
C --> H
C --> F
C --> SP
P --> J[Jekyll Build Engine]
PR --> J
H --> J
F --> J
SP --> J
CFG --> J
NAV --> J
MM --> J
INC --> J
SCSS --> J
CUSTOM --> J
JS --> J
IMG --> J
J --> SITE[_site Compiled Static Site]
SITE --> GA[GitHub Actions Deploy Workflow]
GA --> PUB[Public Repo: did-site-public]
PUB --> GHP[GitHub Pages / gh-pages]
GHP --> DNS[Route 53 + datainsidedata.com]
DNS --> V[Site Visitor]
What this diagram shows
- The site is organized into clear layers: content, configuration, presentation, build, and deployment.
- Jekyll reads the configuration, renders Markdown through the theme layouts and includes, compiles SCSS to CSS, and copies scripts and images into _site.
- Deployment is separated from authoring through the two-repo strategy (separation of concerns).
Why it matters
- This is the best “executive view” of the system.
- It helps future contributors understand where content lives versus how the site is rendered and shipped.
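The build inputs named in the diagram can be checked before Jekyll runs. This is a minimal sketch, assuming the repository layout matches the diagram: the path list is copied from it, while `missing_build_inputs` is a hypothetical helper rather than part of the existing workflow.

```python
from pathlib import Path

# Build inputs named in the high-level diagram (content, configuration, presentation).
BUILD_INPUTS = (
    "_posts", "_projects", "_how_tos", "_fixes",
    "_config.yml", "_data/navigation.yml", "_includes",
    "assets/css/main.scss", "_sass/minimal-mistakes/_custom.scss",
    "assets/js/mermaid-init.js", "assets/images",
)


def missing_build_inputs(root: Path) -> list[str]:
    """Return the diagram's build inputs that do not exist under root (hypothetical helper)."""
    return [rel for rel in BUILD_INPUTS if not (root / rel).exists()]
```

Calling `missing_build_inputs(Path("."))` from the repository root returns an empty list when every input is present, so it could run as an early CI step before the build.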
2) Public Site Map / Navigation Flow
flowchart TD
HOME["Start Here /"]
BLOG["Posts /blog/"]
PROJECTS["Projects /projects/"]
HOWTOS["How Tos /how-tos/"]
FIXES["Fixes /fixes/"]
ABOUT["About /about/"]
CONTACT["Contact /contact/"]
ARCHIVE["Archive /archive/"]
PRIVACY["Privacy /privacy/"]
TERMS["Terms /terms/"]
NOTFOUND[404 Page]
HOME --> BLOG
HOME --> PROJECTS
HOME --> HOWTOS
HOME --> FIXES
HOME --> ABOUT
HOME --> CONTACT
BLOG --> ARCHIVE
BLOG --> PRIVACY
BLOG --> TERMS
PROJECTS --> PROJECT_DETAIL[Project Detail Pages]
HOWTOS --> HOWTO_DETAIL[How-To Detail Pages]
FIXES --> FIX_DETAIL[Fix Detail Pages]
BLOG --> POST_DETAIL[Post Detail Pages]
ARCHIVE --> POST_DETAIL
What this diagram shows
- The homepage is the landing page, while /blog/ is the posts hub.
- Collections each have a hub page and then detail pages underneath.
- Legal and utility pages support the site but are not primary discovery hubs; they are linked from the footer and the footer-bottom area.
Why it matters
- This is the visitor-facing navigation model.
- It is useful for both UX planning and contributor onboarding.
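This navigation model can also be treated as a graph: a breadth-first search from the homepage should reach every page except the 404 page, which is served on errors rather than linked. The sketch below encodes the diagram's edges; placeholder names such as `post-detail` stand in for the many detail pages and are illustrative only.

```python
from collections import deque

# Edges copied from the site map diagram: page -> pages it links to.
SITE_MAP = {
    "/": ["/blog/", "/projects/", "/how-tos/", "/fixes/", "/about/", "/contact/"],
    "/blog/": ["/archive/", "/privacy/", "/terms/", "post-detail"],
    "/projects/": ["project-detail"],
    "/how-tos/": ["how-to-detail"],
    "/fixes/": ["fix-detail"],
    "/archive/": ["post-detail"],
}

# Every page that appears in the diagram, plus the unlinked 404 page.
ALL_PAGES = set(SITE_MAP) | {p for targets in SITE_MAP.values() for p in targets} | {"404"}


def reachable_from(start: str, graph: dict[str, list[str]]) -> set[str]:
    """Breadth-first search: every page a visitor can reach by following links from start."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Pages no visitor can reach by clicking; only the 404 page should appear here.
orphans = ALL_PAGES - reachable_from("/", SITE_MAP)
```

If a new hub is added to the site map but not linked from anywhere, it shows up in `orphans`, which makes this a cheap check when the navigation changes.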
3) Collection Architecture
flowchart LR
POSTS_SRC[_posts] --> POSTS_HUB["/blog/"]
POSTS_SRC --> POST_PAGES[Blog Post Pages]
PROJECTS_SRC[_projects] --> PROJECTS_HUB["/projects/"]
PROJECTS_SRC --> PROJECT_PAGES[Project Pages]
HOWTOS_SRC[_how_tos] --> HOWTOS_HUB["/how-tos/"]
HOWTOS_SRC --> HOWTO_PAGES[How-To Pages]
FIXES_SRC[_fixes] --> FIXES_HUB["/fixes/"]
FIXES_SRC --> FIX_PAGES[Fix Pages]
What this diagram shows
- Each collection has a source folder and a public-facing hub.
- The hub and the item pages are separate concepts.
- Jekyll generates item pages from collection entries.
Why it matters
- This makes it easier to explain where to place new content.
- It also helps maintain clean boundaries between content types.
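The folder-to-hub mapping can be expressed as a small lookup that tells a contributor which hub a new file will surface under. This is a sketch: the file names in the test are hypothetical, and the actual item URLs depend on the permalink settings in _config.yml, which are not shown here.

```python
from pathlib import PurePosixPath
from typing import Optional

# Source folder -> public hub URL, as shown in the collection diagram.
COLLECTION_HUBS = {
    "_posts": "/blog/",
    "_projects": "/projects/",
    "_how_tos": "/how-tos/",
    "_fixes": "/fixes/",
}


def hub_for(source_path: str) -> Optional[str]:
    """Return the hub URL for a collection entry, or None for a standalone page."""
    parts = PurePosixPath(source_path).parts
    if not parts:
        return None
    return COLLECTION_HUBS.get(parts[0])
```

A `None` result signals a standalone page, which is managed individually rather than through a collection.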
4) Standalone Page Architecture
flowchart TD
ROOT[Standalone Pages]
ROOT --> START["Start Here /"]
ROOT --> ABOUT["About /about/"]
ROOT --> CONTACT["Contact /contact/"]
ROOT --> ARCHIVE["Archive /archive/"]
ROOT --> PRIVACY["Privacy /privacy/"]
ROOT --> TERMS["Terms /terms/"]
ROOT --> ERR[404 Page]
What this diagram shows
- These pages are not part of collections.
- They serve informational, navigational, legal, or utility purposes.
- They are managed as individual pages rather than repeated content patterns.
Why it matters
- This helps separate “site structure pages” from “content publishing pages.”
- It is especially useful for future staff and contributors.
5) Section Navigation Diagram
flowchart TD
MAIN[Main Navigation]
MAIN --> START[Start Here]
MAIN --> POSTS[Posts]
MAIN --> PROJECTS[Projects]
MAIN --> HOWTOS[How Tos]
MAIN --> FIXES[Fixes]
MAIN --> ABOUT[About]
MAIN --> CONTACT[Contact]
HOWTOS --> AWS[AWS]
HOWTOS --> GITHUB[GitHub]
HOWTOS --> WINDOWS[Windows]
FIXES --> CONDA[Conda]
FIXES --> JEKYLL[Jekyll]
FIXES --> PYTHON[Python]
What this diagram shows
- The primary top-level nav is concise and brand-friendly.
- The secondary navs organize How Tos and Fixes by topic families.
- Navigation is intentionally layered rather than flat.
Why it matters
- This gives contributors a quick mental model of site taxonomy.
- It also helps spot future expansion points.
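The two-level structure can be sketched as data, assuming it resembles the `main` list plus per-section lists that Minimal Mistakes reads from _data/navigation.yml. The topic URLs below are hypothetical placeholders, and `nav_paths` is an illustrative helper: it derives each topic's parent from the first URL segment, which also exposes any topic whose parent is missing from the main nav (it would show up with a `?` parent).

```python
# Python mirror of the layered navigation. Main entries follow the diagram;
# topic URLs such as /how-tos/aws/ are hypothetical placeholders.
NAVIGATION = {
    "main": [
        {"title": "Start Here", "url": "/"},
        {"title": "Posts", "url": "/blog/"},
        {"title": "Projects", "url": "/projects/"},
        {"title": "How Tos", "url": "/how-tos/"},
        {"title": "Fixes", "url": "/fixes/"},
        {"title": "About", "url": "/about/"},
        {"title": "Contact", "url": "/contact/"},
    ],
    "how_tos": [
        {"title": "AWS", "url": "/how-tos/aws/"},
        {"title": "GitHub", "url": "/how-tos/github/"},
        {"title": "Windows", "url": "/how-tos/windows/"},
    ],
    "fixes": [
        {"title": "Conda", "url": "/fixes/conda/"},
        {"title": "Jekyll", "url": "/fixes/jekyll/"},
        {"title": "Python", "url": "/fixes/python/"},
    ],
}


def nav_paths(nav: dict) -> list[str]:
    """Flatten the layered navigation into 'Section > Topic' labels."""
    main_by_url = {item["url"]: item["title"] for item in nav["main"]}
    paths = [item["title"] for item in nav["main"]]
    for key, items in nav.items():
        if key == "main":
            continue
        for item in items:
            # Parent section = first URL segment, e.g. /how-tos/aws/ -> /how-tos/
            parent_url = "/" + item["url"].strip("/").split("/")[0] + "/"
            parent = main_by_url.get(parent_url, "?")
            paths.append(f"{parent} > {item['title']}")
    return paths
```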
6) Rendering / Theme Dependency Diagram
flowchart TB
PAGE[Markdown Page or <br/>Collection Item] --> FM[Front Matter]
FM --> LAYOUT[Minimal Mistakes Layout]
LAYOUT --> INC[_includes]
INC --> FOOTER[footer.html Override]
INC --> GHPROJ[github-project.html]
INC --> GHREADME[github-readme.html]
SCSS_ENTRY[assets/css/main.scss] --> MM_SKIN[MM Skin Import]
SCSS_ENTRY --> MM_THEME[Minimal Mistakes <br/>Theme Import]
SCSS_ENTRY --> CUSTOM_SCSS[_sass/minimal-mistakes/_custom.scss]
JS_INIT[assets/js/mermaid-init.js] --> MERMAID[Mermaid Rendering]
LAYOUT --> OUTPUT[Rendered HTML]
FOOTER --> OUTPUT
GHPROJ --> OUTPUT
GHREADME --> OUTPUT
CUSTOM_SCSS --> OUTPUT
MERMAID --> OUTPUT
What this diagram shows
- Minimal Mistakes provides layout structure.
- Local includes extend the theme without replacing the whole layout system.
- Styling flows through main.scss into MM imports and the custom overrides.
Why it matters
- This is the clearest view of how the customization layer sits on top of the theme.
- It makes future debugging much easier.
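Because CSS rules of equal specificity resolve by source order, custom rule overrides only win if they are imported after the theme. The sketch below checks that ordering in main.scss. It assumes the custom file is imported as `minimal-mistakes/custom` and that _custom.scss holds rule overrides; Sass variable overrides would instead need to come before the theme import, since the theme's variables are declared with `!default`. Both function names are hypothetical.

```python
import re

# Matches @import "target"; and captures the target string.
IMPORT_RE = re.compile(r'@import\s+"([^"]+)"')


def import_order(scss_text: str) -> list[str]:
    """Return the @import targets of an SCSS file in source order."""
    return IMPORT_RE.findall(scss_text)


def custom_is_last(scss_text: str, custom: str = "minimal-mistakes/custom") -> bool:
    """True if the custom overrides are the final import, after the skin and theme."""
    imports = import_order(scss_text)
    return bool(imports) and imports[-1] == custom
```

A failing check is a quick explanation for "my override has no effect" bugs during debugging.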
7) Content Publishing Flow
flowchart TD
AUTHOR[Write Markdown Content] --> FRONTMATTER[Add Front Matter]
FRONTMATTER --> TYPE{Content Type?}
TYPE -->|Post| POSTS[_posts]
TYPE -->|Project| PROJECTS[_projects]
TYPE -->|How-To| HOWTOS[_how_tos]
TYPE -->|Fix| FIXES[_fixes]
TYPE -->|Standalone| PAGES[Standalone Page]
POSTS --> BUILD[Jekyll Build]
PROJECTS --> BUILD
HOWTOS --> BUILD
FIXES --> BUILD
PAGES --> BUILD
BUILD --> HTML[Generated HTML in _site]
HTML --> DEPLOY[Deployment Pipeline]
DEPLOY --> LIVE[Live Website]
What this diagram shows
- All content types follow the same lifecycle: write, classify, build, deploy.
- Front matter is the key control point for layout and metadata.
- Jekyll normalizes everything into static output.
Why it matters
- This is the best contributor-facing view of the authoring workflow.
- It supports SDLC and content governance.
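Since front matter is the control point, a light pre-commit check can catch common authoring mistakes. This is a hypothetical sketch: the required-key list is an assumed policy, and the parser only reads flat `key: value` lines rather than full YAML. The filename rule reflects Jekyll's convention that files in `_posts` are named `YYYY-MM-DD-title`.

```python
import re

# Jekyll expects files in _posts to be named YYYY-MM-DD-title.ext.
POST_NAME_RE = re.compile(r"^\d{4}-\d{2}-\d{2}-[\w-]+\.(md|markdown|html)$")


def parse_front_matter(text: str) -> dict[str, str]:
    """Read simple 'key: value' lines between the opening and closing '---'.

    Nested YAML is ignored; a real check would use a YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        key, sep, value = line.partition(":")
        # Skip indented (nested) lines and list items.
        if sep and not line.startswith((" ", "-")):
            fields[key.strip()] = value.strip().strip('"')
    return {}  # no closing '---', so this is not valid front matter


def check_post(filename: str, text: str, required=("title",)) -> list[str]:
    """Return a list of problems; an empty list means the post looks publishable."""
    problems = []
    if not POST_NAME_RE.match(filename):
        problems.append("filename must look like YYYY-MM-DD-title.md")
    fields = parse_front_matter(text)
    problems += [f"missing front matter key: {k}" for k in required if k not in fields]
    return problems
```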
8) Deployment Flow — Two Repo Strategy
flowchart TB
SRC[Private Source Repo] --> PUSH[Push to main]
PUSH --> ACTIONS[GitHub Actions Workflow]
ACTIONS --> BUILD[Jekyll Production Build]
BUILD --> CNAME[Write CNAME to _site]
CNAME --> VERIFY[Sanity Checks]
VERIFY --> PUBLISH[Push _site to Public Repo]
PUBLISH --> PUBLIC[Public Repo: did-site-public]
PUBLIC --> GHPAGES[gh-pages Branch]
GHPAGES --> DOMAIN[datainsidedata.com via Route 53]
What this diagram shows
- Source authoring and public hosting are intentionally separated.
- GitHub Actions compiles the site and publishes only the built artifact.
- The public repo acts as the serving layer rather than the authoring layer.
Why it matters
- Building in CI and publishing only the compiled artifact is a widely used, professional deployment pattern.
- It reduces accidental exposure of source-only material and keeps hosting clean.
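The "Sanity Checks" step is not specified further here, so the following is only a sketch of what it could verify before the push: the CNAME written to _site, the presence of the pages GitHub Pages serves by default, and the absence of internal folders. The function name, the internal folder list, and the exact checks are assumptions; the domain comes from the diagram.

```python
from pathlib import Path

EXPECTED_DOMAIN = "datainsidedata.com"
# Assumed internal-only folder names (see the workspace diagram); extend as needed.
INTERNAL_DIRS = ("docs",)


def sanity_check(site_dir: Path, domain: str = EXPECTED_DOMAIN) -> list[str]:
    """Return problems found in the built _site; an empty list means OK to publish."""
    problems = []
    cname = site_dir / "CNAME"
    if not cname.is_file():
        problems.append("CNAME is missing")
    elif cname.read_text().strip() != domain:
        problems.append(f"CNAME does not contain {domain}")
    # GitHub Pages serves index.html as the root and 404.html for missing pages.
    for required in ("index.html", "404.html"):
        if not (site_dir / required).is_file():
            problems.append(f"{required} is missing")
    # Internal-only folders should never reach the public repo.
    for internal in INTERNAL_DIRS:
        if (site_dir / internal).exists():
            problems.append(f"internal folder {internal}/ was published")
    return problems
```

In a workflow, a non-empty result would stop the job before anything is pushed to did-site-public.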
9) Public vs Internal Workspace Diagram
flowchart TB
WORKSPACE[Project Workspace]
WORKSPACE --> PUBLIC[Public Build Inputs]
WORKSPACE --> INTERNAL[Internal / Non-Build Assets]
PUBLIC --> POSTS[_posts]
PUBLIC --> PROJECTS[_projects]
PUBLIC --> HOWTOS[_how_tos]
PUBLIC --> FIXES[_fixes]
PUBLIC --> PAGES[Standalone Pages]
PUBLIC --> CONFIG[_config.yml]
PUBLIC --> NAV[_data/navigation.yml]
PUBLIC --> INCLUDES[_includes]
PUBLIC --> STYLES[SCSS/CSS/JS]
PUBLIC --> IMAGES[Public Assets]
INTERNAL --> DRAFTS[_drafts]
INTERNAL --> DOCS[docs]
INTERNAL --> RAW[raw/source folders]
INTERNAL --> BACKUPS[backup files]
INTERNAL --> LOCALONLY[ignored local artifacts]
What this diagram shows
- Not everything in your repo/workspace is part of the public site.
- Internal folders support drafting, documentation, and source preservation.
- Build inputs and maintainer-only artifacts should stay conceptually separate.
Why it matters
- This helps contributors avoid touching the wrong areas.
- It also supports cleaner repo hygiene and long-term maintainability.
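The split can be checked mechanically. The sketch below classifies workspace paths using the two lists in the diagram and reports which internal folders would still be published unless they are listed under `exclude` in _config.yml. Jekyll already skips names beginning with `_` or `.` (apart from the folders it treats specially), so only the remaining ones need excluding. The folder names `raw` and `backups`, the suffix rules, and both function names are hypothetical.

```python
from pathlib import PurePosixPath

# Top-level entries on each side of the workspace diagram. "raw" and "backups"
# are hypothetical names for the raw/source and backup folders.
BUILD_INPUTS = {"_posts", "_projects", "_how_tos", "_fixes", "_config.yml",
                "_data", "_includes", "_sass", "assets"}
INTERNAL = {"_drafts", "docs", "raw", "backups"}


def workspace_role(path: str) -> str:
    """Classify a workspace path as 'build input', 'internal', or 'unclassified'."""
    parts = PurePosixPath(path).parts
    top = parts[0] if parts else ""
    if top in INTERNAL or path.endswith((".bak", "~")):
        return "internal"
    # Root-level .md/.html files are treated as standalone pages.
    if top in BUILD_INPUTS or path.endswith((".md", ".html")):
        return "build input"
    return "unclassified"


def needs_exclude(internal_paths: list[str]) -> list[str]:
    """Internal top-level folders Jekyll would publish unless listed in `exclude`."""
    tops = {PurePosixPath(p).parts[0] for p in internal_paths}
    return sorted(t for t in tops if not t.startswith(("_", ".")))
```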
10) Generated Pages / Utility Architecture
flowchart TD
SITE[Jekyll Site Build] --> FEED[feed.xml]
SITE --> SITEMAP[sitemap.xml]
SITE --> ROBOTS[robots.txt]
SITE --> PAGINATION[Paginated Blog Pages]
SITE --> ERR[404 Page]
SITE --> ARCHIVE[Archive Page]
SITE --> SEARCH[Lunr Search Index]
SITE --> TAGS[Future Tags Page]
What this diagram shows
- Some important site outputs are generated or utility-oriented.
- These pages support discovery, navigation, crawling, and user recovery.
- The future tags page fits naturally into this supporting architecture.
Why it matters
- This is key for SEO, findability, and content operations.
- It also shows that the site is more than a collection of markdown pages.
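In practice these files usually come from plugins (for example, jekyll-sitemap for sitemap.xml) rather than hand-written code, but a sketch makes their contents concrete. The helper names are hypothetical and the `https://` base URL is an assumption; the output follows the sitemaps.org 0.9 schema and the standard `Sitemap:` directive in robots.txt.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(base_url: str, paths: list[str]) -> str:
    """Return a minimal sitemap.xml listing each path as an absolute URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for path in paths:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = base_url.rstrip("/") + path
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


def build_robots(base_url: str) -> str:
    """Return a robots.txt that allows crawling and points crawlers at the sitemap."""
    return f"User-agent: *\nAllow: /\nSitemap: {base_url.rstrip('/')}/sitemap.xml\n"
```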
11) Future Tags Architecture
flowchart TD
POSTS[Blog Posts] --> TAGS_META[Tags in Front Matter]
TAGS_META --> TAGS_INDEX["/tags/"]
TAGS_INDEX --> TAG_GROUPS[Grouped Tag Sections]
TAG_GROUPS --> RELATED_POSTS[Linked Post Lists]
ARCHIVE["/archive/"] --> TAGS_INDEX
BLOG["/blog/"] --> TAGS_INDEX
What this diagram shows
- Posts declare tags in front matter.
- A /tags/ page can act as a taxonomy index.
- That page can connect back into blog discovery and archive browsing.
Why it matters
- This is the cleanest MM-friendly way to improve content discoverability.
- It supports internal linking, topical clustering, and better browsing behavior.
Notes
- Use a master /tags/ page with grouped sections first.
- Later, if needed, expand toward more advanced tag archive behavior.
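A master /tags/ page is essentially a grouping of posts by their front-matter tags; in Liquid this data is available as `site.tags`. The sketch below shows the same grouping in Python so the intended output is easy to reason about. Lowercasing tags is an assumed convention that keeps `Python` and `python` in one group, and the post dictionaries are hypothetical stand-ins for parsed front matter.

```python
from collections import defaultdict


def group_by_tag(posts: list[dict]) -> dict[str, list[str]]:
    """Build the data behind a master /tags/ page: tag -> post titles, newest first.

    Each post is a dict with 'title', 'date' (ISO string), and optional 'tags'.
    """
    groups: dict[str, list[dict]] = defaultdict(list)
    for post in posts:
        for tag in post.get("tags", []):
            groups[tag.strip().lower()].append(post)
    # Tags alphabetically; posts within a tag newest first (ISO dates sort as strings).
    return {
        tag: [p["title"] for p in sorted(items, key=lambda p: p["date"], reverse=True)]
        for tag, items in sorted(groups.items())
    }
```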
12) GitHub Project Embed Flow
flowchart TD
PROJECT_PAGE[Project Page] --> INCLUDE[github-project.html Include]
INCLUDE --> API[GitHub Repo API]
INCLUDE --> README_RAW[Raw README URL]
README_RAW --> MARKED[marked.js Markdown Parser]
API --> META[Repo Metadata Rendered]
MARKED --> README_HTML[README Rendered in Page]
META --> FINAL[Enhanced Project Presentation]
README_HTML --> FINAL
What this diagram shows
- Your project pages can dynamically enrich themselves with GitHub data.
- Repo metadata and README content are fetched separately.
- The rendered result becomes a richer project showcase page.
Why it matters
- This is a strong portfolio feature.
- It also documents a custom behavior future contributors would not know from theme defaults alone.
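The two requests the include depends on can be illustrated independently of the page. This Python sketch mirrors that logic under stated assumptions: the owner and repository names are placeholders, the default branch `main` is assumed, and the selected fields are an example of what a project card might show. The field names themselves (`full_name`, `stargazers_count`, `html_url`, and so on) are part of GitHub's repository API response.

```python
import json


def repo_api_url(owner: str, repo: str) -> str:
    """GitHub REST endpoint for repository metadata."""
    return f"https://api.github.com/repos/{owner}/{repo}"


def raw_readme_url(owner: str, repo: str, branch: str = "main") -> str:
    """Raw README location; the branch name is an assumption per repository."""
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/README.md"


def summarize_repo(api_json: str) -> dict:
    """Pick the fields a project card might show from the API response body."""
    data = json.loads(api_json)
    return {
        "name": data["full_name"],
        "description": data.get("description") or "",
        "language": data.get("language"),
        "stars": data.get("stargazers_count", 0),
        "url": data["html_url"],
    }
```

Unauthenticated GitHub API calls are rate-limited per client IP, which is worth keeping in mind if a visitor opens many project pages in a short time.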
13) Analytics & Observability Architecture
flowchart TB
VISITOR[Site Visitor] --> SITE[datainsidedata.com]
SITE --> SCRIPT[Google gtag Script]
SCRIPT --> GA4[Google Analytics Property]
GA4 --> DASH[Analytics Dashboard]
DASH --> INSIGHTS[Traffic + Behavior Insights]
What this diagram shows
- When a visitor loads a page, the gtag script runs in the browser.
- The script sends pageview and event data to the Google Analytics property.
- The analytics dashboard aggregates that telemetry for reporting.
Why it matters
- This provides observability into how the site is used.
- It supports decisions about content strategy, SEO performance, and site navigation.
Configuration Architecture (Analytics)
The _config.yml entry is the control point:
analytics:
  provider: "google-gtag"
  google:
    tracking_id: "G-XXXXXXX"
    anonymize_ip: true
What this configuration does
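- provider: "google-gtag" enables the GA4 tracking integration built into Minimal Mistakes.
- tracking_id connects the site to your specific Google Analytics property.
- anonymize_ip: true enables IP anonymization for privacy compliance. GA4 properties already avoid logging or storing IP addresses, so this flag mainly matters for older Universal Analytics setups.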
How Jekyll activates analytics
This step is central to the architecture. DID’s GitHub Actions workflow sets:
env:
  JEKYLL_ENV: production
During the build, the workflow runs:
bundle exec jekyll build
Minimal Mistakes only injects the analytics scripts when:
JEKYLL_ENV == production
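Jekyll exposes the JEKYLL_ENV value to templates as `jekyll.environment`, defaulting to `development` when the variable is unset. The build-time decision can be sketched in Python as follows; `analytics_enabled` is a hypothetical name, and the sketch reduces the theme's template condition to the two inputs described here (environment and configured provider).

```python
import os
from typing import Optional


def analytics_enabled(config: dict, env: Optional[str] = None) -> bool:
    """Mirror the build-time gate: inject analytics only for production builds
    that have an analytics provider configured."""
    if env is None:
        # Jekyll treats an unset JEKYLL_ENV as "development".
        env = os.environ.get("JEKYLL_ENV", "development")
    provider = config.get("analytics", {}).get("provider")
    return env == "production" and bool(provider)
```

Under this model, a plain local `bundle exec jekyll serve` leaves analytics off, while running it with `JEKYLL_ENV=production` would turn it on and send local visits to the live property.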
14) Analytics Activation Flow
flowchart TD
CONFIG[_config.yml <br/>analytics settings] --> BUILD[Jekyll Build]
BUILD --> CHECK{JEKYLL_ENV == production?}
CHECK -->|Yes| INJECT[Inject gtag Script]
CHECK -->|No| SKIP[Skip Analytics Script]
INJECT --> SITE[Compiled Static Site]
SITE --> VISITOR[Visitor Browser]
VISITOR --> GA4[Google Analytics]
Why this is important
- Analytics does not run during local development.
- It only runs in production builds.
- This prevents test traffic from polluting metrics.
This mirrors how production analytics is typically gated in enterprise environments.
Analytics & Telemetry
The site integrates Google Analytics via the Minimal Mistakes google-gtag provider. Tracking is configured in _config.yml and injected into the site during production builds.
Analytics scripts are only included when the environment variable JEKYLL_ENV=production is set. This ensures development and preview builds do not send telemetry data.
Collected analytics data includes:
- page views
- traffic sources
- geographic distribution
- device types
- user navigation paths
IP anonymization is enabled for privacy compliance.
15) Small pro-level improvement (optional)
A later enhancement is an event-tracking layer.
Example events:
- project page views
- GitHub repo clicks
- outbound link clicks
- tutorial completion
Architecture addition:
flowchart LR
USER --> PAGE[Project Page]
PAGE --> EVENT[Custom Event]
EVENT --> GA4[Google Analytics Events]
This turns analytics from passive measurement into product insight.
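Before wiring events into pages, it can help to pin down a small event schema. The sketch below checks proposed names against GA4's naming rules (start with a letter; letters, digits, and underscores only; at most 40 characters) and requires each planned event's parameters. The event and parameter names are hypothetical, and the returned dict simply mirrors the name and parameters a page would pass to `gtag('event', ...)`; nothing is sent.

```python
import re

# GA4 event names: start with a letter, then letters, digits, or underscores; max 40 chars.
EVENT_NAME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_]{0,39}$")

# Hypothetical names and required parameters for the example events above.
PLANNED_EVENTS = {
    "project_page_view": ("project",),
    "github_repo_click": ("repo",),
    "outbound_click": ("link_url",),
    "tutorial_complete": ("tutorial",),
}


def make_event(name: str, **params) -> dict:
    """Build a gtag-style event payload after validating the name and parameters."""
    if not EVENT_NAME_RE.match(name):
        raise ValueError(f"invalid GA4 event name: {name!r}")
    expected = PLANNED_EVENTS.get(name)
    if expected is None:
        raise ValueError(f"unplanned event: {name!r}")
    missing = [key for key in expected if key not in params]
    if missing:
        raise ValueError(f"{name} is missing parameters: {missing}")
    return {"event": name, "params": params}
```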
The site now includes all major production-grade layers:
- content model
- presentation framework
- CI/CD pipeline
- custom component system
- analytics telemetry
- SEO infrastructure
Together these cover the main layers expected of a modern static site.
Recommended order of diagrams in the document:
- System Overview
- Public Site Architecture
- Content Architecture
- Rendering / Theme Architecture
- Publishing Workflow
- Deployment Architecture
- Analytics & Observability
- SEO / Discovery Systems
- Future Enhancements
Data Inside Data™.