Status and Monitoring

Last updated March 21, 2026

This page explains how to read SynthLink's operational signals — not as a guide to the status page UI, but as a reference for interpreting what those signals mean and how they relate to the data you receive.

Overview

Status in SynthLink does not mean a single server's uptime. It is a combined view of several independent operational layers — each of which can be healthy or degraded independently of the others.

The full picture requires reading four types of signal together: whether each crawler has run successfully recently, whether the insight pipeline is processing documents without backlog, whether the public API endpoints are responding, and whether the stored data is structurally consistent. No single signal tells the whole story.

What the status page represents

The /status page is a summary view of pipeline health — not a per-document diagnostic tool and not a real-time debugger. It shows whether the pipeline as a whole is collecting, processing, and serving data in a way that is consistent with normal operation.

Three categories of signal are shown.

Worker runs

Whether each crawler and the insight worker have executed recently, how many records were processed, and whether the most recent run succeeded.

API health

Whether the public endpoints — /documents, /insights, /combined — are responding and returning valid data.

Data integrity

Whether the stored records are structurally consistent — checking for orphan insights, duplicate URLs, and completed insights with missing summaries.

Worker health

Each crawler and the insight worker run independently. Their run results are recorded separately, so a problem with one worker does not automatically degrade the others.

Worker health reflects more than whether the last run succeeded. It accounts for how recently the worker ran, how many records were processed, and whether the run history shows a stable pattern or intermittent failures. A worker that succeeded once after several failures is not necessarily healthy.

When interpreting worker state, the run history is more informative than any single result. A worker that has been consistently processing records on schedule is in a different state from one that ran once recently but has a long gap before that. The status page surfaces this history so you can distinguish between a transient failure and a persistent operational issue.

Note: A low processed count does not always indicate a problem. If a source published few new items in a given cycle, the crawler may have run successfully but had little to collect.
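One way to read run history rather than a single result is to look at the success rate over a recent window. The sketch below is illustrative only: it assumes run results are available as a newest-first list of booleans, which is not necessarily how SynthLink stores them.

```python
def run_stability(results, window=10):
    """Fraction of successful runs in the most recent window.

    `results` is a list of booleans, newest first (an assumed shape,
    not SynthLink's actual schema). Returns None with no history.
    """
    recent = results[:window]
    if not recent:
        return None
    return sum(recent) / len(recent)

# A worker that succeeded once after several failures scores poorly
# even though its latest run was a success:
run_stability([True, False, False, False, True])  # 0.4 — worth reviewing
```

A threshold on this fraction is one simple way to distinguish a transient failure from an intermittent pattern.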

Data integrity

A system can be operationally healthy — all workers running, all endpoints responding — while still having structural problems in its stored data. Data integrity checks exist to surface this category of issue separately from run health.

SynthLink tracks three integrity metrics.

Orphan insights: Insight records whose linked document no longer exists. These cannot be joined to a document and represent a structural gap.

Duplicate URLs: Multiple document records sharing the same URL. This may indicate a deduplication issue in a recent crawl cycle.

Null summaries: Insights with status completed but an empty llm_summary field. These passed processing but produced no usable output.

These counts are not failure states in isolation — small numbers are expected as a side effect of normal operation. They become meaningful when they grow unexpectedly or persist across multiple integrity checks.
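The three metrics can be computed with straightforward joins and counts. The sketch below works over in-memory records; the field names (`id`, `url`, `document_id`, `status`, `llm_summary`) are assumptions for illustration, not SynthLink's actual schema.

```python
from collections import Counter

def integrity_counts(documents, insights):
    """Compute the three integrity metrics over in-memory records.

    Field names are illustrative. `documents` and `insights` are
    lists of dicts standing in for stored rows.
    """
    doc_ids = {d["id"] for d in documents}
    # Orphans: insights whose linked document no longer exists.
    orphan_count = sum(1 for i in insights if i["document_id"] not in doc_ids)
    # Duplicates: extra document records beyond the first per URL.
    url_counts = Counter(d["url"] for d in documents)
    duplicate_count = sum(n - 1 for n in url_counts.values() if n > 1)
    # Null summaries: completed insights with no usable output.
    null_summary_count = sum(
        1 for i in insights
        if i["status"] == "completed" and not i.get("llm_summary")
    )
    return {
        "orphan_count": orphan_count,
        "duplicate_count": duplicate_count,
        "null_summary_count": null_summary_count,
    }
```

Comparing these counts across successive snapshots, rather than reading any one snapshot in isolation, is what makes them meaningful.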

API health

API health confirms that the public endpoints are reachable and returning valid responses. From an external consumer's perspective, this is the most immediate operational signal — if the API is not responding, no data can be retrieved regardless of pipeline state.

However, API health is not a proxy for data completeness or freshness. An endpoint can return 200 OK while serving data that was last updated several crawl cycles ago, or while an enrichment backlog is building. API health confirms that the last layer of the pipeline is accessible — it does not confirm that the layers before it are current.
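As a minimal sketch of the reachability check described above, the function below classifies API health from a map of endpoint paths to HTTP status codes. The input shape is an assumption; only the endpoint paths and the non-2xx rule come from this page.

```python
def api_health(responses):
    """Classify API health from endpoint status codes.

    `responses` maps endpoint path to the HTTP status code observed
    during the snapshot check, e.g. {"/documents": 200, ...}.
    Any missing or non-2xx endpoint marks the API degraded.
    """
    required = ("/documents", "/insights", "/combined")
    for path in required:
        code = responses.get(path)
        if code is None or not (200 <= code < 300):
            return "degraded"
    return "operational"
```

Note that this checks reachability only; as the text above stresses, a 200 response says nothing about how fresh the served data is.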

Status labels

The status page uses three labels to summarize each signal. These are quick-read indicators, not definitive judgments.

operational

Recent run history and key signals are within expected ranges. No notable issues are visible from the available operational data.

degraded

One or more signals fall outside expected ranges — such as a recent run failure, a long gap since last run, an API health issue, or elevated integrity counts. Further review is warranted.

no data

Insufficient recent run records to produce a meaningful assessment. This may appear for a newly added source or one that has not run in a long time.

Note: A degraded label does not mean the API is unavailable. Previously collected data remains accessible. The label indicates that some part of the pipeline needs attention, not that all data consumption is blocked.

Status criteria

The status labels map to specific rules. These are evaluated at snapshot time and may lag real-time conditions.

Worker health is degraded if the most recent run failed (ok=false) or if the last successful run is more than 48 hours old.

Worker health shows no data when there are no recent run records for that worker.

API health is degraded if any of /documents, /insights, or /combined returns a non-2xx response during the snapshot check.

Integrity is degraded when orphan_count > 0 or duplicate_count > 0.

Null summaries are tracked separately; null_summary_count > 10 is shown as a warning but does not mark the system degraded by itself.
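The rules above translate directly into code. This sketch applies the worker and integrity rules as stated; the record shape (`ok`, `ran_at`) is an assumption for illustration.

```python
from datetime import datetime, timedelta, timezone

def worker_label(last_run, now=None, max_age=timedelta(hours=48)):
    """Worker rules: 'no data' with no run records; degraded on a
    failed latest run or a latest run older than 48 hours.

    `last_run` is a dict with assumed fields `ok` (bool) and
    `ran_at` (timezone-aware datetime), or None if no runs exist.
    """
    now = now or datetime.now(timezone.utc)
    if last_run is None:
        return "no data"
    if not last_run["ok"] or now - last_run["ran_at"] > max_age:
        return "degraded"
    return "operational"

def integrity_label(orphan_count, duplicate_count, null_summary_count):
    """Integrity rules: orphans or duplicates degrade the system;
    null summaries past 10 only raise a warning."""
    if orphan_count > 0 or duplicate_count > 0:
        return "degraded"
    if null_summary_count > 10:
        return "warning"
    return "operational"
```

Because these rules are evaluated at snapshot time, a label can lag a condition that has already resolved (or newly appeared) since the last snapshot.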

Reading operational signals

Operational status and data availability are related but not equivalent. Understanding the difference helps avoid over- or under-interpreting the signals on the status page.

Crawler failed this cycle

Documents collected in previous cycles remain accessible. No new documents will appear from that source until the next successful run.

Insight worker backlogged

Documents are available via /documents but their insight records may be pending. /combined will show more items with insight set to null until the backlog clears.

API health degraded

External access to data is affected. Existing cached or locally stored data continues to be valid, but new requests may fail or time out.

Integrity counts elevated

Some records may be incomplete or structurally inconsistent. Normal queries continue to work, but specific records may be missing linked data.
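On the consumer side, the backlog case above can be handled by separating /combined items whose enrichment is still pending. The sketch assumes each item carries an `insight` field that is null until processing completes, as described above; the overall item shape is illustrative.

```python
def split_combined(items):
    """Separate /combined items with pending enrichment.

    Assumes each item is a dict whose `insight` field is None while
    the insight worker backlog clears (illustrative shape).
    """
    ready = [i for i in items if i.get("insight") is not None]
    pending = [i for i in items if i.get("insight") is None]
    return ready, pending
```

Treating pending items as "not yet enriched" rather than "failed" matches the semantics described here: the documents themselves are already available via /documents.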

Monitoring and expectations

SynthLink's monitoring objective is not simple uptime. The goal is to confirm that the full pipeline — collection, enrichment, and public access — is functioning in a way that produces reliable, current, structurally consistent data over time.

The /status page is a high-level operational view, not a per-request diagnostic interface. It reflects the state of the pipeline at the time of the most recent snapshot. For assessing whether specific recent documents have been collected or whether a particular source is current, the actual API response is the more direct signal.

When the status page and the API response appear to conflict — for example, when a source shows operational but recent documents are not appearing — the likely explanation is that the crawl cycle has not yet run since the content was published, or that the items did not pass quality filtering. Both are normal conditions, not errors.
