Webhook Best Practices: Retry Logic, Idempotency, and Error Handling

By HookCap Team

Most webhook integrations fail silently. A handler returns 500, the provider retries a few times, then stops. Your system never processed the event and no one knows.

Webhook delivery is not guaranteed by default. How reliably your integration works depends almost entirely on how you write the receiver. This guide covers the patterns that make webhook handlers production-grade: proper retry handling, idempotency, error response codes, and queue-based processing.

Understand the Delivery Model

Before building handlers, understand what you are dealing with:

  • Providers send webhook events as HTTP POST requests
  • They expect a 2xx response within a timeout (typically 5-30 seconds)
  • If they do not receive 2xx, they retry on a schedule (often exponential backoff over hours or days)
  • Most providers have a maximum retry count after which the event is dropped
  • Some providers allow you to manually retry from their dashboard
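
Example: Stripe's retry schedule (approximate; Stripe documents retries with exponential backoff for up to three days in live mode):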
Attempt 1:  immediate
Attempt 2:  5 minutes
Attempt 3:  30 minutes
Attempt 4:  2 hours
Attempt 5:  5 hours
Attempt 6:  10 hours
Attempt 7:  24 hours
... continues for ~72 hours total

This retry behavior is your safety net — but only if your handler is idempotent.
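Providers don't always publish the exact formula behind a schedule like this, but its shape is exponential backoff: each wait is a multiple of the previous one, up to a cap. The sketch below illustrates the general technique. `backoffDelayMs`, `retryTimeline`, and the default base, cap, and jitter option are hypothetical and are not Stripe's actual parameters.

```javascript
// Sketch: capped exponential backoff with optional "full jitter".
// baseMs, capMs, and the jitter strategy are illustrative assumptions,
// not any provider's real configuration.
function backoffDelayMs(attempt, { baseMs = 60_000, capMs = 24 * 60 * 60_000, jitter = false } = {}) {
  // attempt 0 is the first retry: baseMs, then 2x, 4x, 8x, ... up to capMs
  const exponential = Math.min(capMs, baseMs * 2 ** attempt);
  // Full jitter picks a random delay below the ceiling so that many failed
  // deliveries don't all retry at the same instant.
  return jitter ? Math.floor(Math.random() * exponential) : exponential;
}

// Cumulative time from the first failure until each retry fires.
function retryTimeline(retries, options) {
  const timeline = [];
  let elapsed = 0;
  for (let attempt = 0; attempt < retries; attempt++) {
    elapsed += backoffDelayMs(attempt, options);
    timeline.push(elapsed);
  }
  return timeline;
}
```

For example, `retryTimeline(4, { baseMs: 1000 })` returns `[1000, 3000, 7000, 15000]`: each wait doubles, so retries thin out quickly while still spanning a long outage.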

Rule 1: Respond Fast, Process Async

Your webhook handler should acknowledge receipt immediately and do the actual work in the background. If you do database writes, call external APIs, or send emails synchronously inside the handler, you risk timing out.

// BAD: synchronous processing risks timeout
app.post('/webhook/stripe', async (req, res) => {
  const event = JSON.parse(req.body);

  if (event.type === 'payment_intent.succeeded') {
    // This could take several seconds
    await fulfillOrder(event.data.object);
    await sendConfirmationEmail(event.data.object.metadata.email);
    await updateInventory(event.data.object.metadata.items);
  }

  res.json({ received: true }); // might never get here if above throws
});

// GOOD: acknowledge immediately, process async
app.post('/webhook/stripe', async (req, res) => {
  const event = JSON.parse(req.body);

  // Queue the work — respond in milliseconds
  await queue.add('stripe-webhook', { event });

  res.json({ received: true }); // responds as soon as the job is queued
});

// Worker processes the queue
queue.process('stripe-webhook', async (job) => {
  const { event } = job.data;
  if (event.type === 'payment_intent.succeeded') {
    await fulfillOrder(event.data.object);
    await sendConfirmationEmail(event.data.object.metadata.email);
    await updateInventory(event.data.object.metadata.items);
  }
});

The queue gives you retry logic, failure visibility, and async processing without blocking the HTTP response.
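To make the split concrete without Redis or Express, here is a self-contained sketch. `InMemoryQueue` and `handleWebhook` are hypothetical stand-ins for a real queue (Bull, BullMQ, SQS, and so on) and your route handler. An in-memory queue loses jobs when the process restarts, so this only illustrates the flow, not durable delivery.

```javascript
// Hypothetical in-memory queue standing in for a real job queue.
// Jobs live only in process memory: this illustrates the flow, not durability.
class InMemoryQueue {
  constructor({ attempts = 3 } = {}) {
    this.jobs = [];   // pending jobs, processed first-in first-out
    this.failed = []; // jobs that used up every attempt (a dead letter list)
    this.attempts = attempts;
  }

  // Enqueueing is cheap and synchronous, so the HTTP handler can return at once.
  add(data) {
    this.jobs.push({ data, attemptsMade: 0 });
  }

  // Worker loop: run each job; a job that throws is re-queued until it has
  // been tried `attempts` times, then moved to `failed`.
  async drain(processor) {
    while (this.jobs.length > 0) {
      const job = this.jobs.shift();
      try {
        await processor(job);
      } catch (err) {
        job.attemptsMade += 1;
        if (job.attemptsMade < this.attempts) {
          this.jobs.push(job);
        } else {
          this.failed.push({ job, error: err.message });
        }
      }
    }
  }
}

// Hypothetical framework-agnostic handler: parse, enqueue, acknowledge.
// No business logic runs here, so response time doesn't depend on it.
function handleWebhook(rawBody, queue) {
  let event;
  try {
    event = JSON.parse(rawBody);
  } catch {
    return { status: 400, body: { error: 'Invalid JSON' } };
  }
  queue.add({ event });
  return { status: 200, body: { received: true } };
}
```

In Express, the route would call `handleWebhook` and pass its `status` and `body` to `res.status(...).json(...)`, while a separate worker process calls `drain`.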

Rule 2: Make Handlers Idempotent

Since providers retry webhooks, your handler may receive the same event multiple times. You must make your handler safe to run more than once with the same event ID.

Without idempotency, a network blip that causes Stripe to retry a payment_intent.succeeded event could fulfill the same order twice, create duplicate records, or send duplicate emails.

Track Processed Event IDs

The simplest approach: store event IDs and skip events you have already processed.

async function handleStripeEvent(event) {
  // Check if we already processed this event
  const existing = await db.query(
    'SELECT id FROM processed_webhooks WHERE event_id = $1',
    [event.id]
  );

  if (existing.rows.length > 0) {
    console.log(`Skipping duplicate event: ${event.id}`);
    return; // idempotent: no-op on duplicate
  }

  // Process the event
  await processEvent(event);

  // Record that we processed it
  await db.query(
    'INSERT INTO processed_webhooks (event_id, processed_at) VALUES ($1, NOW())',
    [event.id]
  );
}
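The same check-then-record flow can be sketched without a database. Here a `Set` stands in for the processed_webhooks table, and `handleEventOnce` is a hypothetical name; a real system needs durable storage shared by every worker.

```javascript
// Sketch: check-then-record deduplication. The Set is a hypothetical
// in-memory stand-in for the processed_webhooks table.
const processedEventIds = new Set();

async function handleEventOnce(event, processEvent) {
  if (processedEventIds.has(event.id)) {
    return 'duplicate'; // already handled: acknowledge without side effects
  }
  await processEvent(event);
  // Record only after success, so a failed attempt can be retried later.
  processedEventIds.add(event.id);
  return 'processed';
}
```

Note the gap between the check and the record: two deliveries of the same event that arrive concurrently can both pass the check. The transaction-based version later in this section closes that gap.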

Upsert Instead of Insert

When creating records from webhook data, use upsert (insert-or-update) instead of plain insert:

-- BAD: fails or creates duplicate on retry
INSERT INTO subscriptions (stripe_id, user_id, status, plan)
VALUES ($1, $2, $3, $4);

-- GOOD: idempotent, safe to run multiple times
INSERT INTO subscriptions (stripe_id, user_id, status, plan)
VALUES ($1, $2, $3, $4)
ON CONFLICT (stripe_id)
DO UPDATE SET status = EXCLUDED.status, plan = EXCLUDED.plan;
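To see why the upsert is safe to repeat, here is an in-memory emulation of the same semantics. The `subscriptions` Map and `upsertSubscription` are hypothetical illustrations, not a database client. Columns not listed in DO UPDATE SET (user_id here) keep the value from the first insert.

```javascript
// Sketch: what ON CONFLICT (stripe_id) DO UPDATE does, emulated with a Map
// keyed by stripe_id (hypothetical in-memory table for illustration only).
const subscriptions = new Map();

function upsertSubscription({ stripeId, userId, status, plan }) {
  const existing = subscriptions.get(stripeId);
  if (existing) {
    // Conflict: update only the columns named in DO UPDATE SET.
    existing.status = status;
    existing.plan = plan;
  } else {
    // No conflict: a plain insert.
    subscriptions.set(stripeId, { stripeId, userId, status, plan });
  }
  return subscriptions.get(stripeId);
}
```

Replaying the same event any number of times leaves exactly one row in the same final state.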

Use Database Transactions with an Idempotency Key

For more complex operations, wrap the idempotency check and business logic in a transaction:

async function handleWebhookIdempotent(eventId, operation) {
  return await db.transaction(async (trx) => {
    // Atomic check-and-insert prevents race conditions on concurrent retries
    const result = await trx.raw(`
      INSERT INTO processed_webhooks (event_id, processed_at)
      VALUES (?, NOW())
      ON CONFLICT (event_id) DO NOTHING
      RETURNING id
    `, [eventId]);

    if (result.rows.length === 0) {
      // Already processed — skip
      return null;
    }

    // Run business logic inside the same transaction
    return await operation(trx);
  });
}
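The key ideas in that transaction are claiming the event ID before doing any work and releasing the claim if the work fails (the rollback). The sketch below mimics both in memory; `claimedEventIds` and `runOnce` are hypothetical. One difference: in Postgres, a concurrent transaction inserting the same event_id waits for the first to commit or roll back, whereas here the duplicate returns `null` immediately.

```javascript
// Sketch: atomic "claim, then run" with release on failure, mirroring the
// transaction above. The Set is a hypothetical stand-in for processed_webhooks.
// The claim is atomic here only because the check-and-add runs synchronously
// on JavaScript's single thread, before any await.
const claimedEventIds = new Set();

async function runOnce(eventId, operation) {
  if (claimedEventIds.has(eventId)) return null; // like ON CONFLICT DO NOTHING
  claimedEventIds.add(eventId); // claim first, so concurrent calls see it

  try {
    return await operation();
  } catch (err) {
    // Like a transaction rollback: release the claim so a retry can run.
    claimedEventIds.delete(eventId);
    throw err;
  }
}
```

Two concurrent deliveries of the same event run the operation once, and a failed attempt stays retryable.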

Rule 3: Return the Right HTTP Status Codes

Your response code tells the provider whether to retry. Use it correctly:

Status     Meaning                                    Provider behavior
200-299    Success                                    No retry
400        Bad request (your choice not to process)   Varies: some providers stop retrying, others retry any non-2xx
401/403    Unauthorized                               Varies, as with 400
500-503    Your server error                          Provider retries
Timeout    No response in time                        Provider retries

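The key distinction: use 5xx when the error is transient (database temporarily down, external API timeout) and 4xx when the request itself is permanently bad (invalid signature, malformed payload). Events you deliberately ignore, such as unsupported event types, should get a 2xx so they are not redelivered. Because retry behavior on 4xx varies, check your provider's documentation.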

app.post('/webhook', async (req, res) => {
  let event;

  // Signature verification failure: return 400 (permanent; retrying won't help)
  try {
    event = verifyAndParseWebhook(req.body, req.headers);
  } catch (err) {
    return res.status(400).json({ error: 'Invalid signature' });
  }

  // Unknown event type: return 200, don't retry
  if (!supportedEvents.includes(event.type)) {
    return res.status(200).json({ received: true, skipped: true });
  }

  // Queue for async processing, return 200 fast
  try {
    await queue.add({ event });
    return res.status(200).json({ received: true });
  } catch (err) {
    // Queue is down: return 503 so provider retries later
    return res.status(503).json({ error: 'Service unavailable' });
  }
});
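If your code throws errors from several places, it can help to classify them once and map each class to a status code. The error classes and mapping below are a hypothetical convention, not part of Express or any provider SDK.

```javascript
// Sketch: classify failures as permanent (4xx) or transient (5xx).
// These error classes are hypothetical conventions for illustration.
class PermanentWebhookError extends Error {} // e.g. bad signature, malformed payload
class TransientWebhookError extends Error {} // e.g. queue or database temporarily down

function statusForError(err) {
  if (err instanceof PermanentWebhookError) return 400;
  if (err instanceof TransientWebhookError) return 503;
  // Unknown errors: let the provider retry rather than risk dropping the event.
  return 500;
}
```

Defaulting unknown errors to 5xx errs on the side of redelivery, which is safe as long as the handler is idempotent.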

Rule 4: Handle Out-of-Order Delivery

Providers do not guarantee that webhooks arrive in the order events occurred. A customer.subscription.updated event might arrive before the customer.subscription.created event for the same subscription.

Design your handlers to work regardless of order:

async function handleSubscriptionEvent(event) {
  const sub = event.data.object;

  if (event.type === 'customer.subscription.updated') {
    // Don't assume the subscription already exists in your DB.
    // Store the event's creation time (event.created, Unix seconds), not NOW(),
    // so the WHERE clause compares event times rather than processing times.
    await db.query(`
      INSERT INTO subscriptions (stripe_id, status, plan, updated_at)
      VALUES ($1, $2, $3, to_timestamp($4))
      ON CONFLICT (stripe_id)
      DO UPDATE SET
        status = EXCLUDED.status,
        plan = EXCLUDED.plan,
        updated_at = EXCLUDED.updated_at
      WHERE subscriptions.updated_at < EXCLUDED.updated_at
    `, [sub.id, sub.status, sub.items.data[0].price.id, event.created]);
  }
}

The WHERE subscriptions.updated_at < EXCLUDED.updated_at clause handles the case where an older event arrives after a newer one. Because updated_at holds the event's creation time rather than the time you processed it, a stale event will not overwrite newer data.
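The same guard can be expressed in application code. In this sketch, `subscriptionState` and `applySubscriptionEvent` are hypothetical; ties are ignored, matching the strict `<` in the SQL.

```javascript
// Sketch: ignore stale events by comparing each event's own timestamp,
// the same idea as the WHERE clause above. Names are hypothetical.
const subscriptionState = new Map();

function applySubscriptionEvent(event) {
  const sub = event.data.object;
  const current = subscriptionState.get(sub.id);
  // event.created is when the provider created the event (Unix seconds in Stripe).
  if (current && current.eventCreated >= event.created) {
    return false; // older or same-second event: keep the newer state
  }
  subscriptionState.set(sub.id, { status: sub.status, eventCreated: event.created });
  return true;
}
```

Applying a newer event and then an older one leaves the newer state in place, whatever order they arrive in.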

Rule 5: Log Everything

Log enough to reconstruct what happened to any webhook event without going back to the provider’s dashboard:

const logger = require('pino')();

app.post('/webhook', async (req, res) => {
  // The event ID is in the parsed JSON body (event.id)
  const eventId = req.body?.id ?? 'unknown';
  const eventType = req.body?.type ?? 'unknown';

  logger.info({ eventId, eventType }, 'Webhook received');

  try {
    await queue.add({ event: req.body });
    logger.info({ eventId, eventType }, 'Webhook queued');
    res.json({ received: true });
  } catch (err) {
    logger.error({ eventId, eventType, err }, 'Failed to queue webhook');
    res.status(503).json({ error: 'Unavailable' });
  }
});

// In your queue worker
queue.process(async (job) => {
  const { event } = job.data;
  logger.info({ eventId: event.id, type: event.type, attempt: job.attemptsMade }, 'Processing webhook');

  try {
    await processEvent(event);
    logger.info({ eventId: event.id }, 'Webhook processed successfully');
  } catch (err) {
    logger.error({ eventId: event.id, err }, 'Webhook processing failed');
    throw err; // let the queue retry
  }
});

Rule 6: Monitor Webhook Health

Failed webhooks are silent by default. Set up monitoring:

  1. Check provider dashboards — Stripe, GitHub, and Shopify all show webhook delivery history. Check them regularly or set up alerts.

  2. Alert on queue depth — If your webhook queue keeps growing, your workers are failing or falling behind.

  3. Track error rates — Log a counter whenever a webhook handler fails. Alert if the error rate spikes.

  4. Set up dead letter queues — Events that fail after all retries should go to a dead letter queue for manual inspection, not disappear silently.

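// BullMQ dead letter queue example
const { Queue, Worker } = require('bullmq');

const connection = { host: 'localhost', port: 6379 }; // your Redis instance

// In BullMQ, attempts and backoff are job options, so set them as queue defaults
const queue = new Queue('webhooks', {
  connection,
  defaultJobOptions: {
    attempts: 5,
    backoff: { type: 'exponential', delay: 1000 },
  },
});
const deadLetterQueue = new Queue('webhooks-dead-letter', { connection });

// processWebhook(job) is your processing function, defined elsewhere
const worker = new Worker('webhooks', processWebhook, { connection });

worker.on('failed', async (job, err) => {
  // Dead-letter only after every attempt has been used up
  if (job && job.attemptsMade >= job.opts.attempts) {
    await deadLetterQueue.add('failed-webhook', {
      event: job.data.event,
      error: err.message,
      failedAt: new Date().toISOString(),
    });
  }
});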
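For the "track error rates" step, a sliding-window tracker is one simple approach. `ErrorRateTracker` and its defaults (5-minute window, 5% threshold, 20-sample minimum) are hypothetical; connect `shouldAlert()` to whatever alerting you already use.

```javascript
// Sketch: sliding-window error-rate tracker for webhook processing.
// Window, threshold, and minimum sample count are hypothetical defaults.
class ErrorRateTracker {
  constructor({ windowMs = 5 * 60_000, threshold = 0.05, minSamples = 20 } = {}) {
    this.windowMs = windowMs;
    this.threshold = threshold;
    this.minSamples = minSamples;
    this.samples = []; // { at: timestamp in ms, ok: boolean }, in time order
  }

  record(ok, now = Date.now()) {
    this.samples.push({ at: now, ok });
    this.prune(now);
  }

  // Drop samples older than the window (they were appended in time order).
  prune(now) {
    while (this.samples.length > 0 && now - this.samples[0].at > this.windowMs) {
      this.samples.shift();
    }
  }

  errorRate(now = Date.now()) {
    this.prune(now);
    if (this.samples.length === 0) return 0;
    const failures = this.samples.filter((s) => !s.ok).length;
    return failures / this.samples.length;
  }

  shouldAlert(now = Date.now()) {
    const rate = this.errorRate(now); // prunes stale samples first
    // Require a minimum sample count so one failure at low traffic doesn't page anyone.
    return this.samples.length >= this.minSamples && rate > this.threshold;
  }
}
```

For example, call `tracker.record(true)` from a `worker.on('completed', ...)` listener and `tracker.record(false)` from the `'failed'` listener, then check `shouldAlert()` periodically.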

Testing Webhook Handling with HookCap

HookCap makes it easy to test these patterns before production:

  1. Capture real webhook payloads — Point your provider to a HookCap endpoint to collect real events. Inspect headers, body structure, and signature format.

  2. Test retry handling — Use HookCap’s replay feature to send the same event to your handler multiple times. Verify that your idempotency logic prevents duplicate processing.

  3. Test error recovery — Replay a captured event while your worker is deliberately broken (for example, make it throw). Watch your queue retry the job, then fix the worker and replay again.

  4. Simulate out-of-order delivery — Capture a sequence of related events and replay them in reverse order to verify your handler processes them correctly.

The replay feature is especially useful for idempotency testing: you can replay the same event ID dozens of times and confirm your database still shows exactly one processed record.

Summary

Production webhook handlers need:

  1. Fast acknowledgment — Return 200 immediately, process async
  2. Idempotency — Track event IDs, use upserts, handle duplicate deliveries
  3. Correct status codes — 5xx for transient errors (retry-worthy), 4xx for permanent errors
  4. Order independence — Design DB writes to handle out-of-order events
  5. Comprehensive logging — Log receipt, queuing, processing, and failures
  6. Dead letter queues — Capture events that exhaust all retries

Most webhook failures come down to missing one of these. Add them to your integration checklist before going to production.