Handling Sync Failures

Data sync operations interact with external APIs that can fail for many reasons: expired credentials, rate limits, network timeouts, schema changes, or provider outages. Otesse's sync engine is designed to handle these failures gracefully with automatic retries, circuit breakers, and dead letter queues. This page explains how failures are detected, handled, and resolved.

Types of Failures

Execution-Level Failures

These affect the entire sync run, not just individual records:

| Failure | Cause | Automatic Response |
| --- | --- | --- |
| Credentials expired | OAuth token expired during execution | Attempt automatic refresh. If refresh succeeds, resume. If not, mark execution as "failed" and connection as "expired" |
| Provider API down | 5xx response from provider | Retry the request up to 3 times with increasing backoff delays (1s, 5s, 15s). If all retries fail, mark execution as "failed" |
| Rate limited | 429 response from provider | Respect the Retry-After header. Pause execution for the specified duration. Resume after cooldown |
| Timeout | Execution exceeds configured timeout | Mark as "timed_out". Record progress counts as-is. Note which batch was in progress |
| Schema mismatch | Provider API returns unexpected field structure | Log the mismatch. Skip affected records. Continue processing others |
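
The 5xx and 429 rows can be illustrated with a minimal sketch. This is not Otesse's implementation: the function name call_with_retries, the (status, retry_after) response shape, and the fallback delay used when a 429 carries no Retry-After value are all assumptions made for illustration.

```python
import time
from typing import Callable, Optional, Tuple

# Delays from the table: wait 1s, then 5s, then 15s before each retry.
BACKOFF_DELAYS_SECONDS = (1, 5, 15)

# Hypothetical response shape: (HTTP status, Retry-After seconds or None).
Response = Tuple[int, Optional[float]]


def call_with_retries(
    send: Callable[[], Response],
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return "ok" or "failed" after applying the 5xx and 429 rules."""
    retries_used = 0
    while True:
        status, retry_after = send()
        if status == 429:
            # Rate limited: pause for the provider-specified cooldown, then resume.
            # Assumption: fall back to the first backoff delay if no value is given.
            # This sketch does not cap repeated 429 responses.
            sleep(retry_after if retry_after is not None else BACKOFF_DELAYS_SECONDS[0])
            continue
        if 500 <= status < 600:
            if retries_used == len(BACKOFF_DELAYS_SECONDS):
                return "failed"  # all 3 retries exhausted
            sleep(BACKOFF_DELAYS_SECONDS[retries_used])
            retries_used += 1
            continue
        # Other error statuses (for example 401) are outside this sketch.
        return "ok" if status < 400 else "failed"
```

Passing sleep as a parameter keeps the sketch testable without real waiting; a 429 pause does not consume one of the three 5xx retries here, which is a design choice of the sketch rather than documented behavior.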

Record-Level Failures

These affect individual records while allowing the rest of the batch to continue:

| Failure | Cause | Response |
| --- | --- | --- |
| Validation error | A required field is missing or has an invalid value | Log the error, increment recordsFailed, continue to next record |
| Duplicate conflict | The record already exists at the target | Skip if duplicate detection is enabled, create if not |
| Reference not found | A foreign key references an entity that does not exist at the target | Log as failed, continue processing |
| Transform error | A field mapping transformation failed (e.g., invalid date format) | Use default value if configured, otherwise mark as failed |
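
As a rough sketch of how record-level failures can be isolated so that one bad record does not stop the batch, consider the loop below. The names RecordError, Progress, and process_batch are hypothetical, and the counters only mirror the recordsFailed idea from the table; the real engine's data model is not shown in this page.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set


class RecordError(Exception):
    """Hypothetical base class for validation, reference, and transform errors."""


@dataclass
class Progress:
    records_created: int = 0
    records_skipped: int = 0
    records_failed: int = 0
    errors: List[str] = field(default_factory=list)


def process_batch(
    records: Iterable[Dict],
    write: Callable[[Dict], None],
    existing_ids: Set[str],
    duplicate_detection: bool = True,
) -> Progress:
    """Process every record, counting failures instead of aborting the batch."""
    progress = Progress()
    for record in records:
        # Duplicate conflict: skip when duplicate detection is enabled.
        if duplicate_detection and record["id"] in existing_ids:
            progress.records_skipped += 1
            continue
        try:
            write(record)
        except RecordError as exc:
            # Log the error, increment the failure count, move to the next record.
            progress.records_failed += 1
            progress.errors.append(f"{record['id']}: {exc}")
            continue
        progress.records_created += 1
    return progress
```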

Automatic Retry

When a sync execution fails or times out, the system can automatically retry if retryOnFailure is enabled on the sync job:

| Retry | Delay | Behavior |
| --- | --- | --- |
| 1st | 5 minutes | Creates a new SyncExecution with trigger type "retry" |
| 2nd | 30 minutes | Same |
| 3rd | 2 hours | Same |
| Beyond max retries | N/A | No more automatic retries. Alert created if configured. Admin notified |

Each retry creates a fresh SyncExecution record, preserving the history of every attempt. The retry uses incremental mode if the original job was incremental, processing only records that changed since the last successful run.
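
The retry schedule can be expressed as a small lookup. The sketch below assumes that the maximum is the three retries listed and that each delay is measured from the time the previous attempt failed; neither detail is confirmed here, and next_retry_at is a hypothetical helper name.

```python
from datetime import datetime, timedelta
from typing import Optional

# Delays for the 1st, 2nd, and 3rd automatic retry.
RETRY_DELAYS = (timedelta(minutes=5), timedelta(minutes=30), timedelta(hours=2))


def next_retry_at(
    failed_at: datetime,
    retries_so_far: int,
    retry_on_failure: bool,
) -> Optional[datetime]:
    """Return when the next retry should run, or None if no retry is due."""
    if not retry_on_failure or retries_so_far >= len(RETRY_DELAYS):
        # Retries disabled, or the maximum has been reached: alert instead.
        return None
    return failed_at + RETRY_DELAYS[retries_so_far]
```

For example, a first failure at 14:00 with retryOnFailure enabled would schedule the first retry for 14:05 under these assumptions.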

Circuit Breaker

The circuit breaker prevents the sync engine from generating excessive errors when something is fundamentally wrong (like an invalid API key or a breaking schema change):

Trigger condition: More than 50% of processed records fail AND at least 10 records have been processed.

When the circuit breaker activates:

  1. The current batch is halted
  2. Remaining records are not processed
  3. The execution status is set to "failed" with a circuit breaker note
  4. The progress counts reflect what was processed before halting
  5. The administrator is notified

The circuit breaker resets on the next execution attempt. This means a single bad batch does not permanently disable the sync job — but it prevents a 10,000-record sync from generating 10,000 error entries.
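
The trigger condition reduces to a two-part check, sketched here with a hypothetical function name. Integer arithmetic is used so that exactly 50% does not trip the breaker, matching the "more than 50%" wording.

```python
MIN_PROCESSED = 10  # at least 10 records must have been processed


def circuit_breaker_tripped(processed: int, failed: int) -> bool:
    """True when more than 50% of at least 10 processed records have failed."""
    if processed < MIN_PROCESSED:
        return False
    # failed / processed > 0.5, written without floating point.
    return failed * 2 > processed
```

Because the counts belong to a single execution, calling this with fresh counters on the next run gives the reset behavior described above.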

Dead Letter Queue

Records that fail repeatedly across multiple execution attempts are moved to a dead letter queue.

How Records Enter the Dead Letter Queue

  1. The engine tracks failures per record using ExternalReference and error details
  2. If a specific record fails 3 times across separate executions, it is flagged as "dead letter"
  3. Dead letter records are excluded from future sync runs to prevent infinite retry loops
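
The steps above can be sketched as a per-record failure tracker. This is an illustration only: FailureTracker is a hypothetical class, records are keyed by a plain string rather than a real ExternalReference, and the sketch assumes failures do not need to be consecutive, which this page does not specify.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set

DEAD_LETTER_THRESHOLD = 3  # failures across separate executions


class FailureTracker:
    """Tracks which executions each record has failed in."""

    def __init__(self) -> None:
        # record id -> set of execution ids in which the record failed
        self.failed_executions: Dict[str, Set[str]] = defaultdict(set)

    def record_failure(self, record_id: str, execution_id: str) -> None:
        # A set ensures repeated failures within one execution count once.
        self.failed_executions[record_id].add(execution_id)

    def is_dead_letter(self, record_id: str) -> bool:
        failures = self.failed_executions.get(record_id, ())
        return len(failures) >= DEAD_LETTER_THRESHOLD

    def eligible(self, record_ids: Iterable[str]) -> List[str]:
        """Filter out dead letter records before a sync run."""
        return [r for r in record_ids if not self.is_dead_letter(r)]
```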

Managing Dead Letter Records

Dead letter records appear in a "Failed Records" tab on the sync job detail view:

| Column | Description |
| --- | --- |
| Record ID | The local or external record identifier |
| Entity Type | Customer, invoice, product, etc. |
| First Failed | When the record first failed |
| Last Failed | When the record most recently failed |
| Attempts | Number of failed attempts |
| Error | Most recent error message |
| Actions | Retry (individual) or Dismiss |

Retry: Re-processes the single record immediately, removing it from the dead letter queue if successful.

Dismiss: Marks the record as intentionally skipped. It will not be retried automatically and is removed from the dead letter view. A note is recorded in the audit log.

Error Investigation

When a sync execution shows failures, administrators can investigate using the execution detail view.

Error Log

The error log lists every failed record with its record ID, entity type, error message, and timestamp. For example:

Record: customer-uuid-1234
Entity: Customer
Error: ValidationError - Required field 'email' is null
Timestamp: 2026-03-01 14:23:45

Record Breakdown

A visual breakdown shows the proportion of records in each outcome:

  • Green: Created records
  • Blue: Updated records
  • Grey: Skipped records
  • Red: Failed records

Common Error Patterns

| Pattern | Likely Cause | Resolution |
| --- | --- | --- |
| All records fail with 401 | Expired or invalid credentials | Re-authenticate the connection |
| All records fail with 429 | Rate limit exceeded | Reduce sync frequency or batch size |
| Specific records fail with validation errors | Data quality issues | Fix the source data or update field mappings |
| Intermittent 5xx errors | Provider instability | Wait and retry; the automatic retry policy handles this |
| All records fail with schema error | Provider API changed | Update field mappings to match the new schema |

Preventing Failures

  1. Start with incremental mode: full syncs process more records and are more likely to hit rate limits or timeouts
  2. Set reasonable timeouts: give long-running syncs enough time to complete, but not so much that failures go undetected
  3. Monitor execution history: check the sync job list regularly for executions with "completed_with_errors" status
  4. Keep credentials fresh: for OAuth providers, monitor connection health and re-authenticate before tokens expire
  5. Use field mapping defaults: set default values for optional fields to prevent null validation errors