
Debugging

Logging

Local Development

Use pretty logging for readable output:

bash
MAILMAN_LOGGING__FORMAT=pretty RUST_LOG=debug cargo run --bin mailman-bin-api

Log levels:

  • error — unrecoverable failures
  • warn — degraded but functional (e.g., ClamAV unreachable, using noop scanner)
  • info — request lifecycle, queue processing events
  • debug — detailed handler logic, SQL queries, SQS messages
  • trace — full request/response bodies, raw MIME content
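
A single global level can be noisy. As a minimal sketch, assuming the binary's log filter accepts the usual RUST_LOG target=level directives (the syntax used by tracing-subscriber's EnvFilter and by env_logger), you can keep most output at info and raise one crate to debug. The choice of sqlx as the target is illustrative:

```shell
# Assumption: the binary honours RUST_LOG directives of the form target=level.
export MAILMAN_LOGGING__FORMAT=pretty
# Default everything to info, but show debug output from the sqlx crate only.
export RUST_LOG="info,sqlx=debug"
# Then start the API as usual:
#   cargo run --bin mailman-bin-api
```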

Production Logs

In production, logs are emitted as structured JSON and shipped to CloudWatch:

bash
# Tail API logs
aws logs tail /ecs/mailman-api --since 30m --follow

# Filter worker logs for errors from the last hour
# (--start-time is epoch milliseconds; date -d requires GNU date)
aws logs filter-log-events \
  --log-group-name /ecs/mailman-worker \
  --filter-pattern "ERROR" \
  --start-time $(date -d '1 hour ago' +%s)000
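
The date -d form above is specific to GNU date and fails with the BSD date shipped on macOS. A portable sketch that relies only on date +%s and shell arithmetic could look like this; the helper name epoch_ms_ago is hypothetical:

```shell
# Print the epoch time, in milliseconds, for a moment N seconds in the past.
# Uses only `date +%s` and shell arithmetic, so GNU and BSD date behave the same.
epoch_ms_ago() {
  local seconds_ago="$1"
  echo $(( ( $(date +%s) - seconds_ago ) * 1000 ))
}

# Usage (needs AWS credentials, so it is left commented out here):
#   aws logs filter-log-events \
#     --log-group-name /ecs/mailman-worker \
#     --filter-pattern "ERROR" \
#     --start-time "$(epoch_ms_ago 3600)"
```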

Common Issues

SQLx Compile Errors

SQLx validates queries at compile time against a live database, so the build fails when the schema is missing or out of date. If you see an error such as error returned from database: relation "foo" does not exist:

  1. Ensure Postgres is running and DATABASE_URL is set
  2. Run migrations: sqlx migrate run --source crates/adapters-postgres/migrations
  3. For CI/Docker builds, use offline mode: run cargo sqlx prepare to generate the .sqlx/ cache, then build with SQLX_OFFLINE=true
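
The three steps can be chained into one helper. This is a sketch under assumptions: sqlx-cli is installed, DATABASE_URL points at a running Postgres, and the repository is a Cargo workspace (hence --workspace, which recent sqlx-cli versions accept). The function name refresh_sqlx_cache is hypothetical:

```shell
# Rebuild the SQLx offline cache after a schema or query change.
# Assumes sqlx-cli is installed and DATABASE_URL points at a running Postgres.
refresh_sqlx_cache() {
  # Bring the schema up to date so compile-time checks see every table.
  sqlx migrate run --source crates/adapters-postgres/migrations || return 1
  # Regenerate the .sqlx/ query cache (--workspace assumes a Cargo workspace).
  cargo sqlx prepare --workspace || return 1
  # Confirm the project builds from the cache alone, as CI or Docker would.
  SQLX_OFFLINE=true cargo build
}
```

Call refresh_sqlx_cache from the repository root. Committing the regenerated .sqlx/ directory lets offline builds pick it up.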

"Connection refused" on Tests

Integration tests need Postgres running on the port in DATABASE_URL:

bash
# Check if Postgres container is running
docker ps | grep mailman-postgres

# Restart if needed
docker start mailman-postgres
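
A running container does not guarantee that tests connect to the right place, since DATABASE_URL may point at a different host or port. The sketch below extracts the host and port from a postgres:// URL so they can be passed to pg_isready. The helper name db_host_port is hypothetical, and the parsing is deliberately simple (it does not handle IPv6 literals or an unescaped @ in the password):

```shell
# Print "host port" parsed from a postgres:// URL; the port defaults to 5432.
db_host_port() {
  local url="$1"
  local rest hostport host port
  rest="${url#*://}"          # drop the scheme
  rest="${rest#*@}"           # drop user:password@ if present
  hostport="${rest%%/*}"      # drop /database and anything after it
  hostport="${hostport%%\?*}" # drop a query string if there was no path
  host="${hostport%%:*}"
  port="${hostport##*:}"
  # Without a ":" the two expansions are identical, so no port was given.
  if [ "$port" = "$hostport" ]; then
    port=5432
  fi
  echo "$host $port"
}

# Usage (pg_isready ships with the Postgres client tools):
#   set -- $(db_host_port "$DATABASE_URL")
#   pg_isready -h "$1" -p "$2"
```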

Worker Not Processing Messages

The worker needs valid SQS queue URLs. Check:

  1. Queue URLs are set in environment variables
  2. AWS credentials are available (profile, env vars, or instance role)
  3. The queues exist in the configured region

Without AWS access you cannot run the worker locally: it polls SQS and fails immediately.
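
The checklist above can be partly automated. The following sketch reports which required environment variables are unset or empty. The helper name missing_env and the variable name MAILMAN_SQS__INBOUND_QUEUE_URL in the usage comment are hypothetical; substitute the names your deployment actually uses:

```shell
# Print the name of every listed variable that is unset or empty.
# Returns non-zero if at least one is missing.
missing_env() {
  local name value status=0
  for name in "$@"; do
    eval "value=\${$name:-}"   # indirect lookup that also works in POSIX sh
    if [ -z "$value" ]; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}

# Usage (variable name is a placeholder; the aws calls need credentials):
#   missing_env MAILMAN_SQS__INBOUND_QUEUE_URL AWS_REGION
#   aws sts get-caller-identity
#   aws sqs get-queue-attributes \
#     --queue-url "$MAILMAN_SQS__INBOUND_QUEUE_URL" \
#     --attribute-names ApproximateNumberOfMessages
```

Here aws sts get-caller-identity confirms that credentials resolve, and get-queue-attributes fails if the queue does not exist in the configured region.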

ClamAV Scan Failures

ClamAV is optional. If the host is not configured, a noop scanner is used. If it IS configured but unreachable:

  • The worker logs a warning and processing of the inbound message fails
  • Messages stay in the SQS queue for retry
  • Check MAILMAN_SCAN__CLAMAV_HOST and MAILMAN_SCAN__CLAMAV_PORT
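
To confirm reachability from the machine running the worker, you can send clamd a PING, which it answers with PONG. This is a sketch under assumptions: nc (netcat) is installed, and port 3310, clamd's usual TCP port, is used when MAILMAN_SCAN__CLAMAV_PORT is unset (the application's own default may differ). The function name clamav_ping is hypothetical:

```shell
# Report whether clamd answers a PING on the configured host and port.
clamav_ping() {
  local host="${MAILMAN_SCAN__CLAMAV_HOST:-}"
  local port="${MAILMAN_SCAN__CLAMAV_PORT:-3310}"
  local reply

  if [ -z "$host" ]; then
    echo "ClamAV host not configured (noop scanner in use)"
    return 0
  fi

  # clamd replies PONG to a PING command; -w 3 gives up after 3 seconds.
  reply=$(printf 'PING\n' | nc -w 3 "$host" "$port" 2>/dev/null) || reply=""
  case "$reply" in
    PONG*) echo "clamd reachable at $host:$port" ;;
    *)     echo "clamd unreachable at $host:$port"; return 1 ;;
  esac
}
```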

Rate Limiting in Tests

Integration tests may hit rate limits when many send tests run in sequence. The TestApp helper configures generous limits, but if you see 429 responses:

  1. Check that tests use the TestApp helper (which sets high limits)
  2. If testing rate limiting specifically, use the dedicated rate_limit_integration.rs test file

Useful SQL Queries

Check Recent Message Activity

sql
SELECT message_id, direction, subject, sent_at
FROM messages
ORDER BY created_at DESC
LIMIT 20;

Check Thread State

sql
SELECT t.id, t.subject, COUNT(m.message_id) as msg_count,
       t.created_at, t.updated_at
FROM threads t
LEFT JOIN messages m ON m.thread_id = t.id
WHERE t.deleted_at IS NULL
GROUP BY t.id
ORDER BY t.updated_at DESC
LIMIT 10;

Check Delivery Pipeline

sql
SELECT m.message_id, m.subject,
       de.status, de.ses_message_id, de.created_at
FROM messages m
LEFT JOIN delivery_events de ON de.message_id = m.message_id
WHERE m.direction = 'outbound'
ORDER BY m.created_at DESC
LIMIT 20;

Check Webhook Health

sql
SELECT url, status, failure_count,
       last_success_at, last_failure_at
FROM webhook_endpoints
ORDER BY failure_count DESC;

Check Suppression List

sql
SELECT address, reason, created_at
FROM suppressed_addresses
ORDER BY created_at DESC
LIMIT 20;

Request Tracing

Every API request gets a unique trace ID in the response headers and logs. Use it to correlate a user-reported issue with the specific log entries:

bash
# Find logs for a specific request
aws logs filter-log-events \
  --log-group-name /ecs/mailman-api \
  --filter-pattern '"trace_id":"abc123"'
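
Because the logs are structured JSON, CloudWatch's JSON filter syntax can match on the field itself instead of on a substring. The sketch below assumes trace_id is a top-level field in each log event; if it is nested, the selector needs the full path. The helper name trace_filter_pattern is hypothetical:

```shell
# Build a CloudWatch Logs JSON filter pattern that matches one trace ID.
# Assumption: each log event is a JSON object with a top-level "trace_id" field.
trace_filter_pattern() {
  printf '{ $.trace_id = "%s" }\n' "$1"
}

# Usage (needs AWS credentials, so it is left commented out here):
#   aws logs filter-log-events \
#     --log-group-name /ecs/mailman-api \
#     --filter-pattern "$(trace_filter_pattern abc123)" \
#     --query 'events[].message' \
#     --output text
```

The --query and --output options trim the response down to the raw log messages, which is easier to read than the full event envelope.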