How We Handle Webhook Deduplication (And Why You Should Too)
April 9, 2026 • Bojan

If you build integrations with HubSpot or Salesforce long enough, you will encounter duplicate webhook events. Not occasionally. Regularly. Both platforms document this behavior. Both recommend that receivers handle it gracefully. Most integrations do not.
The consequence of processing a duplicate event depends on what the event triggers. A duplicate contact update might overwrite a field with the same value, which is harmless. A duplicate deal close event that triggers attribution computation will double-count revenue. A duplicate order sync will create duplicate records. The damage scales with the importance of the event.
When I built the CRM integration layer for SampleHQ’s WordPress-based platform, webhook deduplication was one of the first problems I had to solve. Here is the pattern I landed on and why it works.
CRM platforms send webhooks over HTTP. If your endpoint does not respond quickly enough, or if the response gets lost in transit, the platform assumes delivery failed and retries. That retry is a duplicate. Your system has already processed the event, but the platform does not know that.
HubSpot retries failed webhook deliveries multiple times with increasing delays. Salesforce does the same. Both platforms also occasionally send genuine duplicates due to internal event propagation, where a single user action triggers multiple events that look identical from the receiver’s perspective.
You cannot prevent duplicates from being sent. You can only prevent them from being processed twice.
The solution is an idempotency layer. Every incoming webhook gets a unique hash computed from two things: the scope of the event and a unique identifier within that scope.
The scope distinguishes between different types of events. A HubSpot deal update has a different scope than a HubSpot contact update, which has a different scope than a Salesforce opportunity change, which has a different scope than a Shippo tracking webhook. Without scoping, you risk hash collisions between unrelated events.
The unique identifier is whatever makes the event distinct within its scope. For HubSpot, it might be the event ID or a combination of object ID and timestamp. For Salesforce, it might be the replay ID. For Shippo, it is the tracking number plus the status.
The hash is computed as MD5 of the scope concatenated with the identifier. Before processing any event, the system checks whether that hash exists in the idempotency table. If it does, the event is acknowledged (HTTP 200 so the CRM does not retry) but not processed. If it does not, the event is processed and the hash is stored.
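Here is a minimal sketch of that flow in PHP, assuming a PDO connection, a table named idempotency_keys, and a process_event() function standing in for the real work. The names are illustrative, not the production code:

```php
<?php
// Check-before-process pattern. $pdo, $scope, $identifier, $payload,
// and process_event() are assumed to be provided by the handler.

function event_hash(string $scope, string $identifier): string {
    // MD5 here is a compact, uniform key, not a security measure.
    return md5($scope . ':' . $identifier);
}

function already_processed(PDO $pdo, string $hash): bool {
    $stmt = $pdo->prepare('SELECT 1 FROM idempotency_keys WHERE key_hash = ?');
    $stmt->execute([$hash]);
    return $stmt->fetchColumn() !== false;
}

function mark_processed(PDO $pdo, string $scope, string $hash): void {
    $stmt = $pdo->prepare(
        'INSERT INTO idempotency_keys (scope, key_hash, created_at) VALUES (?, ?, NOW())'
    );
    $stmt->execute([$scope, $hash]);
}

// In the webhook handler:
$hash = event_hash($scope, $identifier);
if (already_processed($pdo, $hash)) {
    http_response_code(200); // acknowledge so the platform does not retry
    exit;
}
process_event($payload);
mark_processed($pdo, $scope, $hash);
http_response_code(200);
```

A unique index on key_hash also closes the small race between check and insert: attempt the insert first and treat a duplicate-key error as "already processed."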
Idempotency keys do not need to live forever. Duplicate events typically arrive within minutes of the original. A seven-day TTL provides generous coverage while keeping the table small. A daily cleanup job removes expired entries.
The TTL matters because without it, the idempotency table grows indefinitely. In a multi-tenant system where every tenant receives CRM webhooks, the table can grow to millions of rows within months. The cleanup job keeps it manageable.
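The cleanup itself is a single scheduled query; on a WordPress stack it would typically hang off a daily WP-Cron hook. A sketch, reusing the illustrative table from above:

```php
<?php
// Daily cleanup: drop idempotency keys older than the 7-day TTL.
$pdo->exec(
    'DELETE FROM idempotency_keys WHERE created_at < NOW() - INTERVAL 7 DAY'
);
```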
Getting the scope right is the difference between a deduplication system that works and one that causes subtle bugs.
Consider this scenario: a HubSpot deal is updated twice in quick succession, once to change the stage and once to change the amount. These are different events with different payloads, but if your scope is too broad (just “hubspot”) or your identifier is too narrow (just the deal ID), the second event will be incorrectly deduplicated. The system will see the deal ID hash, assume it is a duplicate, and skip the amount update.
The fix is to make the scope specific enough that genuinely different events always produce different hashes, while genuinely duplicate events always produce the same hash. In practice, this means: scope = provider + object type + action, identifier = object ID + relevant payload fields or event timestamp.
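As a sketch, here is what that looks like for a HubSpot deal property change. The payload field names are assumptions about the shape of the event, not a guaranteed schema:

```php
<?php
// Scope: provider + object type + action, so a deal update can never
// collide with a contact update or another provider's events.
$scope = implode(':', ['hubspot', 'deal', 'propertyChange']);

// Identifier: object ID plus the fields that make this event distinct.
// The $payload keys here are illustrative.
$identifier = implode(':', [
    $payload['objectId'],
    $payload['propertyName'],
    $payload['occurredAt'],
]);

$hash = md5($scope . ':' . $identifier);
```

With this key, the stage change and the amount change from the scenario above hash differently, while a retried delivery of either one hashes the same.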
In a multi-tenant system, webhooks from different tenants arrive at the same endpoint. The idempotency table needs to be tenant-aware. A deal update for Tenant A and an identical deal update for Tenant B are different events, even if they have the same deal ID and payload.
The simplest approach is to include the tenant identifier in the scope. So the hash becomes MD5 of tenant_id + provider + object_type + action + event_identifier. This ensures complete isolation between tenants while using a single idempotency table.
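The tenant-aware key is the same hash with one extra component. A sketch:

```php
<?php
// Identical events from different tenants now produce different hashes.
// $scope is the provider + object type + action string from above.
$hash = md5(implode(':', [$tenant_id, $scope, $identifier]));
```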
If the idempotency check itself fails, say because the database query times out, the system has to make a choice: process the event (risking duplication) or skip it (risking data loss). There is no safe default.
I chose to process on failure. The reasoning is that most events are not duplicates, so skipping an event is more likely to cause data loss than processing it is to cause duplication. And most duplicate processing is recoverable (attribution can be recomputed, notifications can be ignored), while missed events often are not.
The fallback is a nightly reconciliation job that checks for obvious duplication artifacts and corrects them. This is the safety net that makes the “process on failure” policy safe in practice.
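In code, the policy is a fail-open try/catch around the check; a sketch using the illustrative helpers from earlier, and assuming PDO is configured to throw exceptions:

```php
<?php
// Fail-open: if the dedupe check errors out, process anyway and let the
// nightly reconciliation job correct any duplication it causes.
$duplicate = false;
try {
    $duplicate = already_processed($pdo, $hash);
} catch (PDOException $e) {
    error_log('Idempotency check failed, processing anyway: ' . $e->getMessage());
}
if ($duplicate) {
    http_response_code(200);
    exit;
}
process_event($payload);
```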
If you are building webhook integrations with any CRM, here is what you need:
An idempotency table with columns for scope, key_hash, and created_at. A hash function that combines scope and event identifier. A check-before-process pattern on every webhook handler. A TTL-based cleanup job. Scoping that is specific enough to distinguish genuinely different events. And a failure mode policy that you have thought through rather than left to chance.
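For reference, the entire table can be this small. A sketch in MySQL-flavored DDL, with the same illustrative names used throughout:

```php
<?php
// The unique index on key_hash is what makes concurrent inserts safe;
// the created_at index keeps the TTL cleanup cheap.
$pdo->exec('
    CREATE TABLE IF NOT EXISTS idempotency_keys (
        id         BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
        scope      VARCHAR(191) NOT NULL,
        key_hash   CHAR(32)     NOT NULL,
        created_at DATETIME     NOT NULL,
        UNIQUE KEY uniq_key_hash (key_hash),
        KEY idx_created_at (created_at)
    )
');
```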
This is not complex engineering. It is about fifteen lines of code in the webhook handler and a database table. But the difference between having it and not having it is the difference between an integration that works reliably at scale and one that corrupts data under load.
This pattern is running in production in SampleHQ, processing webhooks from HubSpot, Salesforce, and Shippo across hundreds of tenant workspaces. It has prevented thousands of duplicate events from being processed since launch.

I’m Bojan Josifoski, co-founder and creator of SampleHQ, a multi-tenant SaaS platform for packaging and label manufacturers.