Handling Sync Failures

Data sync operations interact with external APIs that can fail for many reasons: expired credentials, rate limits, network timeouts, schema changes, or provider outages. Otesse's sync engine is designed to handle these failures gracefully with automatic retries, circuit breakers, and dead letter queues. This page explains how failures are detected, handled, and resolved.

Types of Failures

Execution-Level Failures

These affect the entire sync run, not just individual records:

Failure	Cause	Automatic Response
Credentials expired	OAuth token expired during execution	Attempt automatic refresh. If refresh succeeds, resume. If not, mark execution as "failed" and connection as "expired"
Provider API down	5xx response from provider	Retry the request up to 3 times with increasing backoff delays (1s, 5s, 15s). If all retries fail, mark execution as "failed"
Rate limited	429 response from provider	Respect the `Retry-After` header. Pause execution for the specified duration. Resume after cooldown
Timeout	Execution exceeds configured timeout	Mark as "timed_out". Record progress counts as-is and note which batch was in progress
Schema mismatch	Provider API returns unexpected field structure	Log the mismatch. Skip affected records. Continue processing others
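
The 5xx and 429 rows above can be sketched as a small retry loop. This is a minimal illustration, not the engine's actual code: `call_with_retries`, the `(status, retry_after)` response shape, and the 1-second fallback when `Retry-After` is missing are assumptions made for the example. Only the delay schedule and the three-retry limit come from the table.

```python
import time
from typing import Callable, Optional, Tuple

# Delays from the table above; all names in this sketch are hypothetical.
BACKOFF_DELAYS_SECONDS = (1, 5, 15)

# A response is modeled as (status_code, Retry-After seconds or None).
Response = Tuple[int, Optional[float]]


def call_with_retries(
    send: Callable[[], Response],
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return "ok" or "failed" after applying the 5xx and 429 rules."""
    retries_used = 0
    while True:
        status, retry_after = send()
        if status == 429:
            # Rate limited: pause for the provider's cooldown, then resume.
            # Assumption: fall back to 1s if no Retry-After value was sent.
            # A real engine would also bound the total wait by the timeout.
            sleep(retry_after if retry_after is not None else 1)
            continue
        if 500 <= status < 600:
            if retries_used == len(BACKOFF_DELAYS_SECONDS):
                return "failed"  # initial request plus three retries all failed
            sleep(BACKOFF_DELAYS_SECONDS[retries_used])
            retries_used += 1
            continue
        # Other statuses (for example 401) are handled elsewhere.
        return "ok" if status < 400 else "failed"
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; a cooldown does not consume one of the three 5xx retries here, which is a design choice of this example.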

Record-Level Failures

These affect individual records while allowing the rest of the batch to continue:

Failure	Cause	Response
Validation error	A required field is missing or has an invalid value	Log the error, increment `recordsFailed`, continue to next record
Duplicate conflict	The record already exists at the target	Skip the record if duplicate detection is enabled; otherwise create it
Reference not found	A foreign key references an entity that does not exist at the target	Log as failed, continue processing
Transform error	A field mapping transformation failed (e.g., invalid date format)	Use default value if configured, otherwise mark as failed
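
The key property of record-level handling is that one bad record does not stop the batch. A minimal sketch of that loop follows; `RecordError`, `BatchResult`, `process_batch`, and `require_email` are hypothetical names, and `records_failed` simply mirrors the `recordsFailed` counter mentioned above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


class RecordError(Exception):
    """Hypothetical error for one record (validation, reference, transform)."""


@dataclass
class BatchResult:
    records_succeeded: int = 0
    records_failed: int = 0  # mirrors the `recordsFailed` counter
    errors: List[str] = field(default_factory=list)


def process_batch(
    records: Iterable[Dict], sync_one: Callable[[Dict], None]
) -> BatchResult:
    """Sync each record, logging failures and continuing with the rest."""
    result = BatchResult()
    for record in records:
        try:
            sync_one(record)
            result.records_succeeded += 1
        except RecordError as exc:
            # Log the error, count it, and move on to the next record.
            result.records_failed += 1
            result.errors.append(f"{record.get('id')}: {exc}")
    return result


def require_email(record: Dict) -> None:
    """Example validator: treat a missing or empty email as a failure."""
    if not record.get("email"):
        raise RecordError("ValidationError - Required field 'email' is null")
```

Only `RecordError` is caught, so unexpected exceptions still surface as execution-level failures in this sketch.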

Automatic Retry

When a sync execution fails or times out, the system can automatically retry if `retryOnFailure` is enabled on the sync job:

Retry	Delay	Behavior
1st	5 minutes	Creates a new SyncExecution with trigger type "retry"
2nd	30 minutes	Same
3rd	2 hours	Same
Beyond max retries	N/A	No more automatic retries. Alert created if configured. Admin notified

Each retry creates a fresh SyncExecution record, preserving the history of every attempt. The retry uses incremental mode if the original job was incremental, processing only records that changed since the last successful run.
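
As an illustration, the schedule above can be expressed as a lookup that returns when the next retry should run. The function name and parameters are hypothetical, and the sketch assumes the maximum is the three retries listed in the table.

```python
from datetime import datetime, timedelta
from typing import Optional

# Delays from the retry table: 1st, 2nd, and 3rd retry.
RETRY_DELAYS = (timedelta(minutes=5), timedelta(minutes=30), timedelta(hours=2))


def next_retry_at(
    failed_at: datetime, retries_so_far: int, retry_on_failure: bool
) -> Optional[datetime]:
    """Return the time of the next automatic retry, or None if there is none."""
    if not retry_on_failure or retries_so_far >= len(RETRY_DELAYS):
        return None  # retries disabled or exhausted: alert instead
    return failed_at + RETRY_DELAYS[retries_so_far]
```

For example, a first failure at 14:00 would schedule a retry for 14:05; if that retry fails at 14:06, the second retry would run at 14:36.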

Circuit Breaker

The circuit breaker prevents the sync engine from generating excessive errors when something is fundamentally wrong (like an invalid API key or a breaking schema change):

Trigger condition: More than 50% of processed records fail AND at least 10 records have been processed.

When the circuit breaker activates:

The current batch is halted
Remaining records are not processed
The execution status is set to "failed" with a circuit breaker note
The progress counts reflect what was processed before halting
The administrator is notified

The circuit breaker resets on the next execution attempt, so a single bad batch does not permanently disable the sync job. Within a run, however, it stops a 10,000-record sync from generating 10,000 error entries.
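
The trigger condition can be sketched in a few lines. The function names and the per-record check are assumptions for illustration (the engine may evaluate the condition per batch instead); the thresholds are the ones stated above.

```python
from typing import Iterable, Tuple


def should_trip(
    processed: int,
    failed: int,
    min_processed: int = 10,
    failure_ratio: float = 0.5,
) -> bool:
    """True when more than 50% of at least 10 processed records failed."""
    return processed >= min_processed and failed > processed * failure_ratio


def run_with_breaker(outcomes: Iterable[bool]) -> Tuple[bool, int, int]:
    """Consume per-record outcomes (True = success) until the breaker trips.

    Returns (tripped, processed, failed); the counts reflect what was
    processed before halting.
    """
    processed = failed = 0
    for succeeded in outcomes:
        processed += 1
        if not succeeded:
            failed += 1
        if should_trip(processed, failed):
            return True, processed, failed  # remaining records are skipped
    return False, processed, failed
```

With these thresholds, a run in which every record fails halts after the tenth record, while exactly 50% failures does not trip the breaker because the condition is "more than".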

Dead Letter Queue

Records that fail repeatedly across multiple execution attempts are moved to a dead letter queue.

How Records Enter the Dead Letter Queue

The engine tracks failures per record using ExternalReference and error details
If a specific record fails 3 times across separate executions, it is flagged as "dead letter"
Dead letter records are excluded from future sync runs to prevent infinite retry loops
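
The steps above could be modeled roughly as follows. `FailureTracker` and its methods are hypothetical; the sketch keys failures by a record identifier and counts distinct execution IDs, so repeated errors within one execution count once.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set

DEAD_LETTER_THRESHOLD = 3  # failures across separate executions


class FailureTracker:
    """Tracks which executions each record has failed in."""

    def __init__(self) -> None:
        self._failed_in: DefaultDict[str, Set[str]] = defaultdict(set)

    def record_failure(self, record_id: str, execution_id: str) -> None:
        # A set ensures each execution is counted at most once per record.
        self._failed_in[record_id].add(execution_id)

    def is_dead_letter(self, record_id: str) -> bool:
        failures = self._failed_in.get(record_id, set())
        return len(failures) >= DEAD_LETTER_THRESHOLD

    def eligible(self, record_ids: Iterable[str]) -> List[str]:
        """Filter out dead letter records before a sync run."""
        return [r for r in record_ids if not self.is_dead_letter(r)]
```

Calling `eligible` at the start of each run is one way to keep dead letter records out of future executions without deleting their failure history.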

Managing Dead Letter Records

Dead letter records appear in a "Failed Records" tab on the sync job detail view:

Column	Description
Record ID	The local or external record identifier
Entity Type	Customer, invoice, product, etc.
First Failed	When the record first failed
Last Failed	When the record most recently failed
Attempts	Number of failed attempts
Error	Most recent error message
Actions	Retry (individual) or Dismiss

Retry: Re-processes the single record immediately, removing it from the dead letter queue if successful.

Dismiss: Marks the record as intentionally skipped. It will not be retried automatically and is removed from the dead letter view. A note is recorded in the audit log.

Error Investigation

When a sync execution shows failures, administrators can investigate using the execution detail view:

Error Log

The error log lists every failed record with its identifier, entity type, error message, and timestamp. For example:

Record: customer-uuid-1234
Entity: Customer
Error: ValidationError - Required field 'email' is null
Timestamp: 2026-03-01 14:23:45

Record Breakdown

A visual breakdown shows the proportion of records in each outcome, color-coded as follows:

Green: Created records
Blue: Updated records
Grey: Skipped records
Red: Failed records

Common Error Patterns

Pattern	Likely Cause	Resolution
All records fail with 401	Expired or invalid credentials	Re-authenticate the connection
All records fail with 429	Rate limit exceeded	Reduce sync frequency or batch size
Specific records fail with validation errors	Data quality issues	Fix the source data or update field mappings
Intermittent 5xx errors	Provider instability	Wait and retry; the automatic retry policy handles this
All records fail with schema error	Provider API changed	Update field mappings to match the new schema
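
When triaging many executions, the table above can be turned into a first-pass lookup. This sketch assumes each failed record has been reduced to a code: an HTTP status as an integer, or the strings "validation" and "schema". Those codes and the function name are hypothetical, not part of the product.

```python
from typing import Iterable, Union

Code = Union[int, str]  # HTTP status, or "validation" / "schema" (assumed)


def suggest_resolution(codes: Iterable[Code]) -> str:
    """Map the failure codes of one execution to a suggested resolution."""
    distinct = set(codes)
    if not distinct:
        return "No failures to diagnose"
    if distinct == {401}:
        return "Re-authenticate the connection"
    if distinct == {429}:
        return "Reduce sync frequency or batch size"
    if distinct == {"schema"}:
        return "Update field mappings to match the new schema"
    if any(isinstance(c, int) and 500 <= c < 600 for c in distinct):
        return "Wait and retry"
    if "validation" in distinct:
        return "Fix the source data or update field mappings"
    return "Investigate the error log manually"
```

The "all records" patterns are checked first because they indicate a systemic cause; mixed results fall through to the per-record suggestions.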

Preventing Failures

Start with incremental mode: Full syncs process more records and are more likely to hit rate limits or timeouts
Set reasonable timeouts: Give long-running syncs enough time to complete, but not so much that a stalled sync goes undetected
Monitor execution history: Check the sync job list regularly for executions with "completed_with_errors" status
Keep credentials fresh: For OAuth providers, monitor connection health and re-authenticate before tokens expire
Use field mapping defaults: Set default values for optional fields to prevent null validation errors