Problem #
Inspired by FutureMe, I wanted a personal tool to schedule messages to myself and others across multiple channels (email, Telegram). Existing solutions fell short in one of three ways:
- Too expensive for personal use
- No multi-channel support (email-only or SMS-only)
- Cumbersome scheduling UX: most tools require manual date/time selection, while I wanted to schedule messages in natural language (e.g. "tomorrow at 9am", "next Friday evening")
The core pain: I needed a way to schedule reminders, follow-ups, and time-delayed notifications that would actually arrive when expected.
Constraints #
- Scale: Handle 10,000+ scheduled messages per day without degradation
- Reliability: 99.9% uptime with failure handling
- Timing accuracy: Sub-second scheduling precision
- Multi-channel: Support email (via Amazon SES) and Telegram Bot API
- Timezone-aware: Correct delivery times regardless of sender/recipient location
- Self-hosted: Run on a single VPS with limited resources
- Cost: Minimize infrastructure costs (no managed queues like SQS)
Architecture #
```mermaid
flowchart TB
    frontend["Frontend"]
    subgraph backend["Backend"]
        direction LR
        django["Django"]
        postgres["PostgreSQL"]
        redis["Redis (Broker)"]
        django ~~~ postgres ~~~ redis
    end
    subgraph workers["Celery Workers"]
        direction LR
        worker1["Celery Worker 1"]
        worker2["Celery Worker 2"]
        beat["Celery Beat"]
        worker1 ~~~ worker2 ~~~ beat
    end
    subgraph delivery["Delivery Channels"]
        direction LR
        ses["Amazon SES (Email)"]
        telegram["Telegram Bot API"]
        ses ~~~ telegram
    end
    frontend --> backend
    backend --> workers
    workers --> delivery
```
Key components:
- Django REST API: Handles scheduling requests, stores messages in PostgreSQL
- Celery Beat: Runs once daily at midnight to schedule that day's messages
- Celery Workers: Process delivery tasks asynchronously with retry logic
- Redis: Message broker for task queue (lightweight, fast)
Deployment #
- Docker Compose orchestrates the Django API, Celery workers, Celery Beat, Redis, and PostgreSQL
- Nginx acts as a reverse proxy for the API and static assets
- Systemd ensures services restart automatically on host reboots
- Single-VPS deployment keeps operational overhead low and makes the system easy to debug
No complex configurations — just a single-node setup.
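A minimal Compose file for this layout might look like the following; service names, images, module paths, and environment variables are illustrative, not the actual config:

```yaml
services:
  api:
    build: .
    command: gunicorn config.wsgi:application --bind 0.0.0.0:8000
    depends_on: [postgres, redis]
  worker:
    build: .
    command: celery -A config worker --loglevel=info
    depends_on: [redis, postgres]
  beat:
    build: .
    command: celery -A config beat --loglevel=info
    depends_on: [redis]
  redis:
    image: redis:7-alpine
  postgres:
    image: postgres:16-alpine
    environment:
      POSTGRES_PASSWORD: example
```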
Decisions & Tradeoffs #
Why Django + Celery over FastAPI + custom scheduler?
- Django's ORM and admin panel sped up development
- Celery's `apply_async(eta=...)` provides precise scheduling without polling
- Tradeoff: slightly higher memory footprint than FastAPI
Why natural language scheduling?
Manual date pickers add friction for a task that's often quick and contextual.
Supporting natural language input (e.g. "in 2 weeks", "next Monday morning") makes scheduling faster and less error-prone, a natural fit for a reminder-focused product: type "next year" or "next Monday" and the message is scheduled.
This was implemented using chrono-node, inspired by the approach described in this blog post, with all parsed times normalized to UTC before persistence.
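The normalize-to-UTC step can be sketched with the standard library (chrono-node itself runs in JavaScript; this Python sketch assumes a hypothetical per-user `user_tz` string and a naive datetime from the parser):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def normalize_to_utc(parsed: datetime, user_tz: str) -> datetime:
    """Attach the user's timezone to a naive parsed datetime, then convert to UTC."""
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=ZoneInfo(user_tz))
    return parsed.astimezone(timezone.utc)

# "tomorrow at 9am" parsed for a New York user (EDT, UTC-4 in June)
local = datetime(2024, 6, 1, 9, 0)
print(normalize_to_utc(local, "America/New_York"))  # 2024-06-01 13:00:00+00:00
```

Storing only the UTC result means the delivery workers never need to know where the sender was.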
Why Redis over RabbitMQ?
- Simpler to operate on a single VPS
- Lower memory usage for our scale
- Tradeoff: Less durable than RabbitMQ, but acceptable for a reminder system where occasional delayed delivery is preferable to operational complexity.
Why Amazon SES over SendGrid?
- Pay-per-email pricing better for variable volume
- Native AWS integration for future scaling
- Tradeoff: requires domain verification setup
ETA scheduling vs polling
Instead of polling every minute, I use Celery's built-in `eta` parameter for precise delivery.

Same-day events are scheduled immediately when created:

```python
message = ScheduledMessage.objects.create(send_at=send_time)
if message.send_at.date() == timezone.now().date():
    # Fire immediately with exact ETA
    send_message.apply_async(args=[message.id], eta=message.send_at)
```

Future events are picked up by the daily cron at midnight:
```python
# Celery Beat runs once at 00:00
messages = ScheduledMessage.objects.filter(
    send_at__date=today,
    status='pending',
)
for msg in messages:
    send_message.apply_async(args=[msg.id], eta=msg.send_at)
```

Why this works:
- Fewer checks: one daily cron vs polling multiple times a day
- Precise timing: `eta` starts delivery at the exact scheduled time
- Low broker load: tasks sit in Redis until their ETA
- No overhead for same-day messages: no cron needed, they're scheduled at creation time
Failure Modes #
Message delivery failures
- Problem: External APIs (SES, Telegram) can fail transiently
- Mitigation: Exponential backoff retry (3 attempts over 15 minutes)
- Fallback (to-do): Move to a dead-letter queue after max retries and notify the user
Worker crashes
- Problem: Celery worker dies mid-task
- Mitigation: Tasks are redelivered and retried automatically when a worker crashes or restarts, so no message is lost
- Monitoring: Health checks restart workers automatically
Timezone edge cases
- Problem: Mismatched client and server timezones cause scheduling ambiguity
- Mitigation: Store all times in UTC, convert at display/delivery time
- Lesson: Never store local times in the database
Queue floods
- Problem: Celery Beat could enqueue same message multiple times
- Mitigation: Idempotency key per message, checked before delivery
- Result: No duplicate deliveries in production
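One way to implement that check, sketched here with SQLite standing in for PostgreSQL, is an atomic compare-and-set on the message's status, so only the first enqueue wins:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO messages VALUES (1, 'pending')")

def claim(conn: sqlite3.Connection, message_id: int) -> bool:
    # Atomic compare-and-set: flips pending -> sending exactly once
    cur = conn.execute(
        "UPDATE messages SET status = 'sending' WHERE id = ? AND status = 'pending'",
        (message_id,),
    )
    return cur.rowcount == 1

print(claim(conn, 1))  # True: this delivery proceeds
print(claim(conn, 1))  # False: duplicate enqueue is skipped
```

Because the check and the state change happen in one `UPDATE`, two workers racing on the same message can never both deliver it.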
Results & Metrics #
- Throughput: Designed for 10,000+ messages/day (~7 messages/minute on average) on a single VPS.
- Uptime: 99.9% over 6 months
- Timing accuracy: P99 < 1 second from scheduled time
- Retry success rate: 94% of transient failures recovered
- Cost: ~$5/month infrastructure (single shared VPS + SES)
Lessons Learned #
What I'd do differently
- Add structured logging from day one - debugging async workers is painful without it
- Implement a dead-letter queue - today, failed messages can be silently dropped; they're only monitored via Celery Flower
- Adopt chrono-node from day one - converting natural language directly into dates noticeably improves UX, and it should have been in from the start
What worked well
- Celery Beat + Redis - solid scheduling with minimal ops burden
- Idempotency keys - prevented all duplicate delivery issues
- UTC-everywhere - eliminated timezone bugs entirely