Problem #
I wanted a tool to manage both my online portfolio and resume PDFs.
Keeping a portfolio and resume in sync is annoying. Every time I updated my resume PDF, the portfolio website lagged behind. Every new project added to the website meant manually reformatting the same content for my LaTeX resume. With three resume variants (Full-Stack, Backend, AI), this became tedious.
The pain points:
- Duplicate Content Management - Editing the same information in multiple places (website JSON, LaTeX file, multiple resume variants)
- Format Fragmentation - Different technologies (React, LaTeX, JSON) with no shared data layer
- Manual Deployment - Updating resumes required local LaTeX compilation, then manual upload to hosting
- Version Control - Hard to find the latest version of resume PDFs
I wanted a system where I could edit content once, and have both the portfolio website and all resume PDFs update automatically.
Constraints #
- Single Source of Truth: All content (profile, experience, projects, skills) lives in JSON files that feed both the website and resumes
- Multi-Variant Resumes: Support for 3 resume types (Full-Stack, Backend, AI) with variant-specific content overrides
- Automated PDF Generation: Resume PDFs auto-generate on every commit - no local LaTeX compilation
- SEO Optimized: Website scores 100/100 on Lighthouse and generates proper structured data
- Minimal Infrastructure: No databases, no servers - just static files and GitHub Actions
- ATS Compatibility: Generated PDFs are machine-readable for Applicant Tracking Systems
Architecture #
```mermaid
flowchart TB
    subgraph json["JSON Data Layer"]
        direction LR
        variants["fullstack.json
backend.json
ai.json"]
        profile ~~~ experience ~~~ projects ~~~ variants
    end
    subgraph merge["Data Merge Layer"]
        direction LR
        astro_loader["loadPortfolioData.ts
(Astro Build-Time)"]
        python_render["render_resume.py
(Python + Jinja2)"]
        astro_loader ~~~ python_render
    end
    subgraph website["Astro + React Website"]
        direction LR
        context["PortfolioContext
(Variant Aware)"]
        pages["Dynamic Pages:
/, /backend, /ai
/projects/[slug]"]
        context ~~~ pages
    end
    subgraph cicd["GitHub Actions CI/CD"]
        direction LR
        steps["1. Run render_resume.py
2. pdflatex compile
3. Upload to R2 CDN"]
        outputs["Outputs:
• ankit_jangwan_fullstack.pdf
• ankit_jangwan_backend.pdf
• ankit_jangwan_ai.pdf"]
        steps ~~~ outputs
    end
    json --> merge
    merge --> website
    merge --> cicd
```
Key components:
- JSON Data Layer: Base files (profile, experience, projects) + variant-specific overrides
- Astro + React: Static site generation with React islands for interactivity
- Jinja2 + LaTeX: Python script renders JSON → LaTeX template → PDF
- GitHub Actions: Matrix build for all 3 resume variants, with R2 CDN upload
Decisions & Tradeoffs #
Why migrate from Vite + React to Astro?
The portfolio originally used Vite + React (SPA). It worked, but I hit limitations:
- Poor SEO - SPA rendering meant search engines saw minimal content. Used `react-helmet` for meta tags but it wasn't enough.
- No GEO - couldn't implement proper structured data for AI search engines
- Lighthouse scores - JavaScript-heavy bundles hurt performance

I migrated to Astro for its partial hydration model, which delivers better Core Web Vitals while still supporting React components where interactivity is needed:

- Zero JS by default - components only hydrate when needed (`client:load`)
- Built-in static generation - `getStaticPaths()` for project pages
- Native sitemap generation - SEO-friendly out of the box
- Schema.org support - structured data for GEO
Tradeoff: Learning curve for the islands architecture pattern, but the SEO and performance gains were worth it.
Why Jinja2 + LaTeX over PDF libraries?
I evaluated several approaches:
- react-pdf: Poor typography control, difficult ATS compatibility
- Puppeteer/Playwright: Browser-based rendering adds complexity
- LaTeX: Professional typography, ATS-friendly, widely used in tech
I chose Jinja2 for templating because:

- Familiar Python ecosystem - I've used it extensively in my work
- Custom delimiters `(( ))` avoid conflicts with LaTeX's `{ }` syntax
- Filter functions for escaping special characters (`&`, `%`, `$`)
Why variant-specific JSON files?
Instead of a monolithic config, I split variant data because:
- Skills differ between Full-Stack, Backend, and AI roles
- Experience descriptions need tailoring (e.g., more backend focus for Backend resume)
- Project priorities change per variant (visibility filtering)
The merge strategy: base files provide raw data, variant files provide experienceDetails, skills, projectOverrides.
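As a sketch of that merge strategy (helper and field names beyond `experienceDetails`, `skills`, and `projectOverrides` are hypothetical; the real `loadPortfolioData.ts` / `render_resume.py` logic is not shown here), the variant file's overrides win over the base data:

```python
# Illustrative sketch of the base + variant merge. Field names follow
# the text above; the "visible" flag is a hypothetical override key.
def merge_variant(base: dict, variant: dict) -> dict:
    """Overlay variant-specific overrides onto the shared base data."""
    merged = dict(base)
    merged["skills"] = variant.get("skills", base.get("skills", []))
    merged["experienceDetails"] = variant.get("experienceDetails", {})
    # Apply per-project overrides (e.g. visibility filtering) keyed by slug
    overrides = variant.get("projectOverrides", {})
    merged["projects"] = [
        {**p, **overrides.get(p["slug"], {})}
        for p in base.get("projects", [])
        if overrides.get(p["slug"], {}).get("visible", True)
    ]
    return merged

base = {"projects": [{"slug": "chatbot"}, {"slug": "blog"}],
        "skills": ["Python"]}
variant = {"skills": ["Go", "Postgres"],
           "projectOverrides": {"blog": {"visible": False}}}
merged = merge_variant(base, variant)
# "blog" is filtered out for this variant; skills come from the variant file
```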
Why Cloudflare R2 over GitHub Releases?
Initially, the GitHub Actions workflow published resume PDFs as GitHub Release artifacts. This worked, but I hit a blocker: release artifacts are not publicly accessible for private repositories.
I wanted to:
- Keep my personal portfolio repo private (contains personal details)
- Share a public template repo for others to fork and customize
- Have publicly accessible resume download links
GitHub Releases couldn't satisfy all three. Cloudflare R2 solved this:
- Public CDN access - PDFs are accessible even when the source repo is private
- Simple upload - AWS CLI (pre-installed on GitHub Actions runners) works with R2's S3-compatible API
Tradeoff: Requires R2 bucket setup and credentials in GitHub Secrets.
Implementation Details #
LaTeX Template with Jinja2
The template uses custom delimiters to avoid LaTeX conflicts:
```python
from jinja2 import Environment

env = Environment(
    block_start_string="((",
    block_end_string="))",
    variable_start_string="((=",
    variable_end_string="=))",
)
```

Example template snippet:

```latex
(( for job in experience ))
\resumeSubheading
  {((= job.role | latex_escape =))}{((= job.location =))}
  {((= job.company =))}{((= job.startDate | format_duration(job.endDate) =))}
(( endfor ))
```

Data Merge in TypeScript (Website)
The loadPortfolioData.ts function merges at build time:
```typescript
// Filter projects by variant visibility
const mergedProjects = mergeProjectsWithVariant(
  projData.projects,
  variantData, // Contains projectOverrides
  version      // 'fullstack' | 'backend' | 'ai'
);
```

GitHub Actions Matrix Build
All three resume variants build in parallel:
```yaml
strategy:
  matrix:
    variant: [fullstack, backend, ai]
    include:
      - variant: fullstack
        output_name: ankit_jangwan_fullstack.pdf
        is_default: true
```

Build triggers on any change to data files:

```yaml
paths:
  - 'public/data/resume.tex.j2'
  - 'public/data/variants/**'
  - 'public/data/profile.json'
  - 'public/data/experience.json'
```

Failure Modes & Mitigations #
LaTeX Compilation Errors
- Problem: Special characters (`&`, `#`, `%`) break LaTeX compilation
- Mitigation: `latex_escape` Jinja2 filter sanitizes all text fields
- Result: No compilation failures from content
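Such a filter fits in a few lines. This is an illustrative sketch, not the actual `latex_escape` from `render_resume.py`, which may cover additional characters like `\`, `^`, and `~`:

```python
# Sketch of a LaTeX-escaping Jinja2 filter: map each special
# character to its escaped form, leave everything else untouched.
LATEX_SPECIALS = {
    "&": r"\&", "%": r"\%", "$": r"\$",
    "#": r"\#", "_": r"\_", "{": r"\{", "}": r"\}",
}

def latex_escape(text: str) -> str:
    """Escape characters that would otherwise break LaTeX compilation."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in text)

print(latex_escape("R&D at 100% #1"))  # R\&D at 100\% \#1
```

Registered via `env.filters["latex_escape"] = latex_escape`, it becomes available as `((= job.role | latex_escape =))` in templates.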
JSON Validation
- Problem: Invalid JSON breaks both website and resume builds
- Mitigation: TypeScript types enforce schema at build time
- Future: Add JSON Schema validation in CI
Resume Over-Length
- Problem: Too many experience bullets overflow page margins
- Mitigation: Template limits experience details to top 3 jobs with bullets; older roles show header only
- Design: Projects limited to top 3, with link to full portfolio
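Combined with the custom delimiters shown earlier, that length cap is just a slice in the template. A minimal sketch (the real `resume.tex.j2` is more involved, and the template strings here are stand-ins):

```python
from jinja2 import Environment

# Same custom delimiters render_resume.py uses to coexist with LaTeX
env = Environment(
    block_start_string="((", block_end_string="))",
    variable_start_string="((=", variable_end_string="=))",
)

# Full bullets for the top 3 jobs, header-only entries for older roles
template = env.from_string(
    "(( for job in jobs[:3] ))((= job.role =)): detailed\n(( endfor ))"
    "(( for job in jobs[3:] ))((= job.role =)): header only\n(( endfor ))"
)
jobs = [{"role": f"Role {i}"} for i in range(1, 6)]
print(template.render(jobs=jobs))
```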
Results & Metrics #
- Data Consistency: Zero content drift - all outputs generated from shared JSON schema
- Automation: Resume updates went from manual workflows to commit-driven builds (~30 min saved/update)
- SEO & GEO: Static rendering + structured data = perfect Lighthouse scores
- Build time: All resume variants generated in parallel in ~45 seconds
Lessons Learned #
What I'd do differently
- Add JSON Schema validation - TypeScript catches errors at build time, but a schema would catch them at commit time
- Preview before merge - A GitHub Action that renders resume previews on PRs would catch formatting issues earlier
- Structured logging in Python script - Currently uses print statements; proper logging would help debug CI issues
- Use Astro from the start - Astro is the right choice for content-heavy websites
What worked well
- Variant system - One codebase, three resumes, zero duplication
- Vite → Astro migration - SEO and Lighthouse improvements justified the refactor
- LaTeX for resumes - Professional output, ATS-compatible, and version-controllable
- R2 CDN for PDFs - Public template sharing while keeping personal repo private
AI Assistant Layer: Building a $0-Cost RAG Chatbot #
I didn't just want to list "AI/ML" under my skills - I wanted to prove it. The portfolio includes a custom-built AI chatbot that answers visitor questions using the portfolio's own content.
Why I built it: I wanted to go beyond simple API wrappers and build a RAG pipeline from scratch. The goal was to work with vector databases, embedding models, real-time streaming, and multi-provider setups. It's built to demonstrate my ability to build real AI systems.
The constraint ($0 cost): I wanted to run this at exactly $0 without cutting corners on architecture. I achieved this by using Cerebras (serving open-weight Llama models) for fast inference and local sentence-transformers for embeddings. The output quality of open-weight models is slightly lower than GPT-4 or Claude, but the architecture is provider-agnostic. If I need better quality, swapping to a premium model is just an API key change in LLMClient.
Architecture Overview
```mermaid
flowchart TB
    subgraph ingest["Ingestion"]
        direction LR
        sync["sync_knowledge_base.py
(chunking + embeddings)"]
        qdrant["Vector DB"]
        json_data --> sync
        case_studies --> sync
        sync --> qdrant
    end
    subgraph runtime["Runtime (per question)"]
        direction LR
        query["User Question"]
        embed["sentence-transformers
Embedding"]
        search["Vector DB Similarity Search"]
        prompt["Prompt Builder
(context + history)"]
        llm["Multi-Provider LLM
(Cerebras / Gemini / OpenAI)"]
        suggest["Function Calling
(parallel suggestions)"]
        query --> embed --> search --> prompt --> llm
        search --> suggest
    end
    subgraph delivery["Delivery"]
        direction LR
        sse["SSE Stream"]
        worker["Cloudflare Worker
Proxy"]
        frontend["React Chat Widget"]
        storage["localStorage
(1h TTL)"]
        sse --> worker --> frontend
        frontend --> storage
    end
    ingest --> runtime --> delivery
```
RAG Pipeline
- Ingestion: `sync_knowledge_base.py` reads JSON data files and markdown case studies, chunks them using `RecursiveCharacterTextSplitter`, generates embeddings via local `sentence-transformers` (all-MiniLM-L6-v2), and upserts vectors into Qdrant with metadata (source URL, title, project slug).
- Retrieval: On each query, the user's question is embedded and searched against Qdrant with similarity scoring. Top results form the context for the LLM prompt.
- Generation: A system prompt forces the LLM to answer only from the provided context, preventing hallucinations and ensuring accurate representation.
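Stripped of the `qdrant-client` and `sentence-transformers` dependencies, the retrieval step reduces to ranking chunk embeddings by cosine similarity against the query embedding. A toy sketch with hand-made 3-dimensional vectors (the real pipeline gets 384-dimensional vectors from all-MiniLM-L6-v2 and lets Qdrant do the ranking):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], chunks: list[tuple], k: int = 2) -> list[str]:
    """Return the text of the k chunks most similar to the query."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[0]), reverse=True)
    return [text for _, text in ranked[:k]]

# Toy "embeddings" standing in for model output
chunks = [
    ([1.0, 0.0, 0.1], "Resume pipeline case study"),
    ([0.0, 1.0, 0.1], "AI chatbot case study"),
    ([0.1, 0.9, 0.0], "RAG ingestion details"),
]
query = [0.0, 1.0, 0.0]  # pretend embedding of "how does the chatbot work?"
print(top_k(query, chunks))  # chatbot-related chunks rank first
```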
Multi-Provider LLM Client
I built an LLMClient class that acts as a universal adapter using the OpenAI SDK structure:
- Cerebras (Llama) — Primary, fast and free
- Gemini / OpenAI GPT-4 — Easy-to-swap alternatives via `settings.yaml`
- OpenRouter / NVIDIA — Additional fallbacks
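Since every provider here exposes an OpenAI-style chat API, the adapter is mostly configuration: resolve a base URL and model name, then construct one SDK client. A sketch with hypothetical endpoints and model names standing in for the real `settings.yaml` entries (the actual `LLMClient` internals are not shown in this post):

```python
from dataclasses import dataclass

@dataclass
class ProviderConfig:
    base_url: str
    model: str

# Hypothetical registry; in the real system this comes from settings.yaml
PROVIDERS = {
    "cerebras": ProviderConfig("https://api.cerebras.ai/v1", "llama-3.3-70b"),
    "openai": ProviderConfig("https://api.openai.com/v1", "gpt-4o"),
}

def resolve(name: str, fallback: str = "cerebras") -> ProviderConfig:
    """Pick the configured provider, falling back to the free default."""
    return PROVIDERS.get(name, PROVIDERS[fallback])

cfg = resolve("cerebras")
# An OpenAI-SDK client would then be built as:
#   client = OpenAI(base_url=cfg.base_url, api_key=...)
# so swapping providers is one config change, not a code change.
```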
Streaming & Function Calling
Nobody likes staring at a loading spinner.
- Streaming Response: Server-Sent Events (SSE). The FastAPI backend yields tokens as they generate, and a Cloudflare Worker proxies this stream to the React frontend. Sub-second time-to-first-token.
- Zero-Latency Suggestions: After each response, the system generates 2-3 follow-up questions using OpenAI-compatible function calling. This runs in parallel with the main text stream in a `ThreadPoolExecutor`, so suggestion chips appear instantly when the answer finishes.
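The concurrency pattern can be sketched without FastAPI or an LLM: submit suggestion generation to a worker thread before streaming tokens, then collect the (usually already-finished) future at the end. Function names and the event shape here are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def generate_suggestions(topic: str) -> list[str]:
    """Stand-in for the function-calling request that yields follow-ups."""
    time.sleep(0.05)  # simulate LLM latency
    return [f"Tell me more about {topic}?", "What tech stack was used?"]

def stream_answer(tokens, executor):
    # Kick off suggestion generation in parallel with the main stream
    future = executor.submit(generate_suggestions, "the RAG pipeline")
    for tok in tokens:  # in production: yield one SSE event per token
        yield {"type": "token", "data": tok}
    # The future has typically finished by now, so this barely blocks
    yield {"type": "suggestions", "data": future.result()}

with ThreadPoolExecutor(max_workers=2) as pool:
    events = list(stream_answer(["Hello", " world"], pool))
# The final event carries the suggestion chips for the frontend
```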
Chat History
Conversation context is managed client-side using localStorage (1-hour TTL). The last 5 exchanges are sent to the backend with each request, allowing the LLM to handle follow-up questions ("tell me more about that project") without a backend database for session state.
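The backend half of that contract is simple: whatever the browser sends, keep only the most recent five exchanges before building the prompt. A sketch assuming an alternating user/assistant message list (the message shape is hypothetical):

```python
def trim_history(messages: list[dict], max_exchanges: int = 5) -> list[dict]:
    """Keep only the most recent user/assistant exchanges.

    Assumes an alternating [user, assistant, user, ...] list,
    so one exchange is two messages.
    """
    return messages[-(max_exchanges * 2):]

# 7 exchanges (14 messages); only the last 5 exchanges survive
history = [{"role": "user" if i % 2 == 0 else "assistant", "content": str(i)}
           for i in range(14)]
trimmed = trim_history(history)
```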
Decisions & Tradeoffs
Why Qdrant over Pinecone? I wanted to avoid per-vector pricing. Qdrant is open-source, runs in Docker, and supports the metadata filtering I needed.
Why sentence-transformers over OpenAI embeddings?
Cost and speed. Completely free, runs locally without network latency, and all-MiniLM-L6-v2 handles portfolio-scale semantic search well.
Why client-side history? Stateless backends are easier to scale. By keeping chat history in the visitor's browser, I eliminated the need for a session database, making the system GDPR-friendly by default.