Problem #
Inspired by FutureMe, I wanted a personal tool to schedule messages to myself and others across multiple channels (email, Telegram). Existing solutions fell short in one of three ways:
- Too expensive for personal use
- No multi-channel support (email-only or SMS-only)
- Cumbersome scheduling UX: most tools require manual date/time selection, while I wanted to schedule messages in natural language (e.g. "tomorrow at 9am", "next Friday evening")
The core pain: I needed a way to schedule reminders, follow-ups, and time-delayed notifications that would actually arrive when expected.
Constraints #
- Scale: Handle 10,000+ scheduled messages per day without degradation
- Reliability: 99.9% uptime with failure handling
- Timing accuracy: Sub-second scheduling precision
- Multi-channel: Support email (via Amazon SES) and Telegram Bot API
- Timezone-aware: Correct delivery times regardless of sender/recipient location
- Self-hosted: Run on a single VPS with limited resources
- Cost: Minimize infrastructure costs (no managed queues like SQS)
Architecture #
```mermaid
flowchart TB
    frontend["Frontend"]
    subgraph backend["Backend"]
        direction LR
        django["Django"]
        postgres["PostgreSQL"]
        redis["Redis (Broker)"]
        django ~~~ postgres ~~~ redis
    end
    subgraph workers["Celery Workers"]
        direction LR
        worker1["Celery Worker 1"]
        worker2["Celery Worker 2"]
        beat["Celery Beat"]
        worker1 ~~~ worker2 ~~~ beat
    end
    subgraph delivery["Delivery Channels"]
        direction LR
        ses["Amazon SES (Email)"]
        telegram["Telegram Bot API"]
        ses ~~~ telegram
    end
    frontend --> backend
    backend --> workers
    workers --> delivery
```
Key components:
- Django REST API: Handles scheduling requests, stores messages in PostgreSQL
- Celery Beat: Runs once daily at midnight to schedule that day's messages
- Celery Workers: Process delivery tasks asynchronously with retry logic
- Redis: Message broker for task queue (lightweight, fast)
Deployment #
- Docker Compose orchestrates the Django API, Celery workers, Celery Beat, Redis, and PostgreSQL
- Nginx acts as a reverse proxy for the API and static assets
- Systemd ensures services restart automatically on host reboots
- Single-VPS deployment keeps operational overhead low and makes the system easy to debug
No complex configurations — just a single-node setup.
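A minimal Compose file for this layout might look like the following; service names, images, module paths, and environment variables are illustrative, not the actual config:

```yaml
services:
  api:
    build: .
    command: gunicorn config.wsgi:application --bind 0.0.0.0:8000
    depends_on: [postgres, redis]
  worker:
    build: .
    command: celery -A config worker --loglevel=info
    depends_on: [redis, postgres]
  beat:
    build: .
    command: celery -A config beat --loglevel=info
    depends_on: [redis]
  redis:
    image: redis:7-alpine
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: example
```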
Decisions & Tradeoffs #
Why Django + Celery over FastAPI + custom scheduler?
- Django's ORM and admin panel sped up development
- Celery's `apply_async(eta=...)` provides precise scheduling without polling
- Tradeoff: slightly higher memory footprint than FastAPI
Why natural language scheduling?
Manual date pickers add friction for a task that's often quick and contextual.
Supporting natural language input (e.g. "in 2 weeks", "next Monday morning") makes scheduling faster and less error-prone, a natural fit for a reminder-focused product: type "next year" or "next Monday" and the message is scheduled.
This was implemented using chrono-node, inspired by the approach described in this blog post, with all parsed times normalized to UTC before persistence.
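The normalize-to-UTC step can be sketched with the standard library (chrono-node itself runs in JavaScript; this Python sketch assumes a hypothetical per-user `user_tz` string and a naive datetime from the parser):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def normalize_to_utc(parsed: datetime, user_tz: str) -> datetime:
    """Attach the user's timezone to a naive parsed datetime, then convert to UTC."""
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=ZoneInfo(user_tz))
    return parsed.astimezone(timezone.utc)

# "tomorrow at 9am" parsed for a New York user (EDT, UTC-4 in June)
local = datetime(2024, 6, 1, 9, 0)
print(normalize_to_utc(local, "America/New_York"))  # 2024-06-01 13:00:00+00:00
```

Storing only the UTC result means the delivery workers never need to know where the sender was.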
Why Redis over RabbitMQ?
- Simpler to operate on a single VPS
- Lower memory usage for our scale
- Tradeoff: Less durable than RabbitMQ, but acceptable for a reminder system where occasional delayed delivery is preferable to operational complexity.
Why Amazon SES over SendGrid?
- Pay-per-email pricing better for variable volume
- Native AWS integration for future scaling
- Tradeoff: requires domain verification setup
ETA scheduling vs polling
Instead of polling every minute, I use Celery's built-in `eta` parameter for precise delivery.

Same-day events are scheduled immediately when created:

```python
message = ScheduledMessage.objects.create(send_at=send_time)
if message.send_at.date() == timezone.now().date():
    # Fire immediately with exact ETA
    send_message.apply_async(args=[message.id], eta=message.send_at)
```

Future events are picked up by the daily cron at midnight:
```python
# Celery Beat runs once at 00:00
messages = ScheduledMessage.objects.filter(
    send_at__date=today,
    status='pending',
)
for msg in messages:
    send_message.apply_async(args=[msg.id], eta=msg.send_at)
```

Why this works:
- Fewer checks: one daily cron vs polling multiple times a day
- Precise timing: `eta` starts delivery at the exact scheduled time
- Low broker load: tasks sit in Redis until their ETA
- No overhead for same-day messages: no cron needed, they're scheduled at creation time
Failure Modes #
Message delivery failures
- Problem: External APIs (SES, Telegram) can fail transiently
- Mitigation: Exponential backoff retry (3 attempts over 15 minutes)
- Fallback (to-do): Move to a dead-letter queue after max retries and notify the user
Worker crashes
- Problem: Celery worker dies mid-task
- Mitigation: Tasks are redelivered and retried automatically when a worker crashes or restarts, so no message is lost
- Monitoring: Health checks restart workers automatically
Timezone edge cases
- Problem: Mismatched client and server timezones cause scheduling ambiguity
- Mitigation: Store all times in UTC, convert at display/delivery time
- Lesson: Never store local times in the database
Queue floods
- Problem: Celery Beat could enqueue same message multiple times
- Mitigation: Idempotency key per message, checked before delivery
- Result: No duplicate deliveries in production
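One way to implement that check, sketched here with SQLite standing in for PostgreSQL, is an atomic compare-and-set on the message's status, so only the first enqueue wins:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO messages VALUES (1, 'pending')")

def claim(conn: sqlite3.Connection, message_id: int) -> bool:
    # Atomic compare-and-set: flips pending -> sending exactly once
    cur = conn.execute(
        "UPDATE messages SET status = 'sending' WHERE id = ? AND status = 'pending'",
        (message_id,),
    )
    return cur.rowcount == 1

print(claim(conn, 1))  # True: this delivery proceeds
print(claim(conn, 1))  # False: duplicate enqueue is skipped
```

Because the check and the state change happen in one `UPDATE`, two workers racing on the same message can never both deliver it.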
Results & Metrics #
- Throughput: Designed for 10,000+ messages/day (~7 messages/minute on average) on a single VPS.
- Uptime: 99.9% over 6 months
- Timing accuracy: P99 < 1 second from scheduled time
- Retry success rate: 94% of transient failures recovered
- Cost: ~$5/month infrastructure (single shared VPS + SES)
Lessons Learned #
What I'd do differently
- Add structured logging from day one - debugging async workers is painful without it
- Implement a dead-letter queue - today, failed messages can be silently dropped; they're only monitored via Celery Flower
- Adopt chrono-node from day one - converting natural language directly into dates noticeably improves UX, and it should have been in from the start
What worked well
- Celery Beat + Redis - solid scheduling with minimal ops burden
- Idempotency keys - prevented all duplicate delivery issues
- UTC-everywhere - eliminated timezone bugs entirely