Limitations
Last updated March 21, 2026
This page describes the operational boundaries of SynthLink — not to diminish what the platform does, but to ensure you understand the constraints well enough to use it correctly. Knowing where the edges are is the prerequisite for getting the most out of what is inside them.
Not a real-time system
SynthLink does not reflect changes at the moment they happen. Data becomes available after it has passed through a crawl cycle, quality filtering, normalization, and — for insight fields — asynchronous enrichment. Each of these stages introduces latency between when something is published at the source and when it appears in the API.
This is not a failure mode — it is the expected behavior of a periodic collection system. The implication is that SynthLink is well-suited for workflows that need structured access to recent public information, and less suited for workflows that depend on detecting events the moment they occur.
Note: The fastest update interval across current sources is Hacker News, crawled every 3 hours. For workflows requiring lower latency than that, a direct source integration is more appropriate.
Coverage is curated, not exhaustive
SynthLink does not attempt to collect everything from every source. Coverage is intentionally selective — sources are added based on usefulness and collectability, and within each source, only items that pass quality thresholds are stored.
Items may be excluded because they are too short, too sparse, behind a paywall, structurally unstable, or inaccessible at the time of collection. This means that a source being supported does not guarantee that every document it publishes will appear in SynthLink. The platform is designed to provide a reusable, normalized data layer — not a complete archive.
Source quality and structure vary
Different sources provide different levels of structure and completeness. Some offer stable APIs with rich metadata. Others provide only short RSS summaries. Some require an additional detail page fetch to obtain usable body text, which introduces another point of potential failure.
This variation affects the data you receive. A document from one source may have a complete content field while a document from another source has only a short summary. Some sources produce consistently structured output; others are more variable. When querying across multiple sources, do not assume uniform completeness.
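Because completeness varies by source, client code should select the best text available rather than assume a full content field. The sketch below illustrates one way to do this; the field names content and summary are assumptions for illustration and should be checked against the actual SynthLink schema.

```python
from typing import Optional


def best_text(doc: dict) -> Optional[str]:
    """Return the richest text available for a document record.

    Prefers the full body, falls back to the short summary, and
    returns None when neither survived collection. Field names are
    illustrative assumptions, not a confirmed schema.
    """
    return doc.get("content") or doc.get("summary") or None
```

A fallback chain like this keeps downstream logic uniform even when one source delivers full articles and another delivers only RSS summaries.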
Publication time and ingestion time are different
created_at is the timestamp of first ingestion into SynthLink — not the original publication date at the source. A document published three days ago may have a created_at of today if it was only collected in the most recent crawl cycle.
For workflows where the original publication time matters, the source document URL is the authoritative reference. SynthLink timestamps describe the data's lifecycle within the platform, not the source's lifecycle.
Warning: Do not use created_at as a proxy for when the source document was published. Use it to understand when the document became available in SynthLink, which may be later than the source publication date.
Insights are interpretive, not authoritative
The insight layer — llm_summary, keywords, tags, category — is a generated interpretation of the source document, not a verified or authoritative representation of its content. These fields are designed to support discovery, filtering, and triage — not to serve as a final source of truth.
Generated summaries may omit nuance, mischaracterize edge cases, or reflect the model's interpretation rather than the author's intent. Keywords and categories are inferred, not assigned by the source. Any workflow that requires precise factual accuracy should trace back to the original document URL before drawing conclusions.
SynthLink is a system that helps you reach the original source faster — not one that replaces it.
Partial records are normal
Not every record arrives fully formed. A document may be visible in the API before its insight exists. A document may have a summary but no content. An insight may be in a pending or failed state when your application first reads it.
This is not an error state — it is a consequence of a multi-stage pipeline where each layer updates independently. Integrations that assume complete records will encounter gaps. Integrations designed to handle partial data gracefully will be more resilient.
content: May be null if the detail page fetch failed or the source provided a summary only.
llm_summary: Only present when enrichment has completed. The /insights endpoint returns completed insight records only.
keywords / tags / category: Not available until enrichment completes. Use /combined and check that insight is present before relying on these fields.
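The checks above can be folded into a single accessor that tolerates missing enrichment. This is a minimal sketch assuming a /combined record carries its enrichment in a nested insight object with llm_summary, keywords, tags, and category fields; that shape is an assumption, not a confirmed contract.

```python
def insight_fields(doc: dict) -> dict:
    """Read enrichment fields from a record without assuming they exist.

    The nested "insight" object and its field names are illustrative
    assumptions. Missing fields come back as None (or an empty list
    for the plural fields) instead of raising.
    """
    insight = doc.get("insight") or {}
    return {
        "llm_summary": insight.get("llm_summary"),
        "keywords": insight.get("keywords", []),
        "tags": insight.get("tags", []),
        "category": insight.get("category"),
    }
```

An accessor like this lets the same rendering or filtering code run against documents in any pipeline stage, treating absent enrichment as "not yet" rather than as an error.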
Status and availability are not the same
A healthy operational status does not guarantee that the most recent data is fully reflected. A degraded worker status does not mean all data is inaccessible. These two signals describe different things and should be interpreted independently.
When a crawler fails for one cycle, previously collected documents remain accessible. When the API is healthy, it serves whatever is currently in the database — which may not include the last few hours of a slow source. Reading the Status and Monitoring page alongside actual API responses gives a more complete picture than either alone.
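One way to combine the two signals is to check data recency directly from API responses instead of inferring it from a healthy status. The sketch below assumes ISO-8601 timestamps on records; the 6-hour threshold is an arbitrary illustrative choice, not a platform recommendation.

```python
from datetime import datetime, timedelta, timezone


def is_fresh(latest_created_at: str, max_age_hours: float = 6.0) -> bool:
    """Check whether the newest visible record is recent enough.

    Parses an ISO-8601 timestamp (with or without a trailing "Z")
    and compares its age against a caller-chosen threshold. Both the
    timestamp format and the default threshold are assumptions.
    """
    latest = datetime.fromisoformat(latest_created_at.replace("Z", "+00:00"))
    age = datetime.now(timezone.utc) - latest
    return age <= timedelta(hours=max_age_hours)
```

Pairing a check like this with the Status and Monitoring page distinguishes "the API is up" from "the data I need is current", which is exactly the gap this section describes.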
External source constraints apply
SynthLink collects from public sources it does not control. If a source changes its feed format, restricts access, introduces rate limits, or becomes temporarily unavailable, those changes affect what SynthLink can collect.
This is not a platform failure — it is a structural property of any system that reads external public data. The platform is designed to recover from transient failures and adapt to structural changes, but it cannot fully insulate consumers from upstream instability.
Best fit
SynthLink is best suited for workflows that need structured, attributed access to recent public information — across multiple sources, in a consistent format, with a first-pass interpretation layer attached.
It is a poor fit for workflows that require real-time event detection, complete source coverage, exact publication timestamps, or authoritative interpretations. Knowing this boundary is not a reason to avoid the platform — it is the prerequisite for using it well.
Good fit
- Structured access to recent public documents
- Cross-source discovery and filtering
- First-pass triage and candidate selection
- Feed and dashboard experiences
- Input layer for agents and pipelines
Poor fit
- Real-time event detection
- Complete source archive
- Exact original publication timestamps
- Authoritative fact verification
- Private or authenticated sources