Concepts

Last updated March 21, 2026

This page explains the core ideas behind SynthLink — how data moves through the system, what each layer does, and why the API is structured the way it is.

Pipeline

SynthLink processes public data through a four-stage pipeline before it reaches your application. Each stage has a distinct responsibility.

The Crawler fetches raw content from public sources on a fixed schedule and normalizes each item into a structured record. The Database deduplicates records by URL and tracks timestamps. The LLM Enrichment stage processes each new document to generate summaries, keywords, tags, and categories. The REST API exposes the enriched records as read-only JSON.
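The Database stage's deduplication can be pictured with a small in-memory sketch. It is not SynthLink's implementation: the `upsertByUrl` helper, the `Map` store, and the choice to ignore a re-crawled item entirely are assumptions made for illustration.

```javascript
// Hypothetical stand-in for the Database stage: records are keyed by URL,
// so a re-crawled item never creates a second record.
function upsertByUrl(store, record, now = new Date()) {
  const existing = store.get(record.url);
  if (existing) {
    // Duplicate URL: keep the stored record (and its created_at) untouched.
    return { record: existing, inserted: false };
  }
  // First sighting: stamp created_at once, at ingestion time.
  const stored = { ...record, created_at: now.toISOString() };
  store.set(record.url, stored);
  return { record: stored, inserted: true };
}

// Usage: the second call sees the same URL and does not insert again.
const demoStore = new Map();
upsertByUrl(demoStore, { url: "https://example.com/a", title: "A" });
upsertByUrl(demoStore, { url: "https://example.com/a", title: "A (re-crawled)" });
```

Keying on the URL is what makes repeated crawler cycles safe to run: a source can be fetched again without producing duplicate documents.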

Processing time

After a document is collected, enrichment typically completes within a few minutes. During periods of high volume, it may take longer. To see whether enrichment has finished for a document, check whether its record in /api/v1/combined includes an insight.
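A client-side check along these lines could look like the sketch below. This page does not document the shape of a /api/v1/combined record, so the flat layout (insight fields such as llm_summary sitting next to the document fields, missing or null until enrichment completes) is an assumption, and `hasInsight` and `splitByEnrichment` are hypothetical helper names.

```javascript
// ASSUMPTION: a combined record carries insight fields (e.g. llm_summary)
// alongside the document fields, and they are absent or null until
// enrichment has completed. Adjust the check to the real response shape.
function hasInsight(record) {
  return typeof record.llm_summary === "string" && record.llm_summary.trim() !== "";
}

// Split a batch into enriched records and records without an insight.
// "waiting" covers both pending and permanently failed enrichment, because
// the public response does not expose the internal status.
function splitByEnrichment(records) {
  const enriched = [];
  const waiting = [];
  for (const record of records) {
    (hasInsight(record) ? enriched : waiting).push(record);
  }
  return { enriched, waiting };
}
```

Records in `waiting` can simply be re-checked on a later fetch; there is no need to re-request them individually.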

Failure scenarios

If the crawler fails to reach a source, no new documents are written for that cycle. Existing documents remain unchanged. If enrichment fails after the maximum number of retries, the insight record is marked failed and the document is still accessible via /api/v1/documents — only the insight is missing.

Note: The /status page shows the latest crawler run results and enrichment integrity checks so you can verify the pipeline is healthy.

Data freshness

SynthLink is not a real-time API. Data is collected on a schedule that varies by source. There is always some lag between when a document is published and when it appears in the API.

Update intervals by source

| Source | Interval | content_source |
| --- | --- | --- |
| openai_news | every 12h | detail |
| nasa_news | every 24h | detail |
| github_trending | every 6h | api |
| arxiv | every 12h | rss |
| hn | every 3h | api |
| nvd | configured externally | api |
| cisa_cyber_advisory | every 6h | detail |
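Because each source refreshes on its own schedule, polling more often than the collection interval returns nothing new. A minimal sketch of a per-source polling interval, using the intervals listed above; the `UPDATE_INTERVAL_HOURS` map and `suggestedPollMs` helper are illustrative names, not part of the API.

```javascript
// Collection intervals in hours, taken from the intervals listed above.
// nvd is scheduled externally, so no fixed interval is known here.
const UPDATE_INTERVAL_HOURS = {
  openai_news: 12,
  nasa_news: 24,
  github_trending: 6,
  arxiv: 12,
  hn: 3,
  nvd: null,
  cisa_cyber_advisory: 6,
};

// Returns a polling interval in milliseconds, or null when the source has
// no fixed schedule or is not listed.
function suggestedPollMs(source) {
  const hours = UPDATE_INTERVAL_HOURS[source];
  return typeof hours === "number" ? hours * 60 * 60 * 1000 : null;
}
```

For a source that returns null, pick a conservative interval of your own rather than polling tightly.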

created_at

created_at is set when the document is first ingested and never changes. It tells you when the document entered SynthLink, not when the source published it.

Filtering by freshness

If you need only recent content, filter on the client side after fetching. By default, the API returns documents in reverse chronological order, so the newest records come first.

Example — filter documents from the last 24 hours

const res = await fetch(
  "https://synth-link.com/api/v1/documents?limit=50",
  { headers: { "X-SYNTHLINK-KEY": process.env.SYNTHLINK_KEY } }
);
const docs = await res.json();

const oneDayAgo = Date.now() - 24 * 60 * 60 * 1000;
const recent = docs.filter(
  (doc) => new Date(doc.created_at).getTime() > oneDayAgo
);

Enrichment

Every document is sent to a language model which produces four outputs — a plain-language summary, a list of keywords, semantic tags, and a category label.

Example output

Insight fields

{
  "llm_summary": "OpenAI releases GPT-4o, a new multimodal model capable of reasoning across text, audio, and images with improved latency.",
  "keywords": ["gpt-4o", "multimodal", "openai", "model release"],
  "tags": ["AI", "language model", "product launch"],
  "category": "AI Research",
  "source": "openai_news",
  "created_at": "2026-03-19T06:01:00Z"
}

Status lifecycle

Enrichment is asynchronous. A new document starts with status: pending, then transitions to completed when the model finishes, or to failed after the maximum retry count is exceeded.

Status is tracked internally and is not returned in the public insight response. The /api/v1/insights endpoint returns only completed insight records.

| Status | Meaning |
| --- | --- |
| pending | Queued or currently being processed |
| completed | Enrichment finished successfully |
| failed | Max retries exceeded; insight unavailable |
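The lifecycle can be read as a small state machine. The sketch below is illustrative only: the status is internal to SynthLink, the `MAX_RETRIES` value is hypothetical (the real limit is not stated on this page), and `afterAttempt` is an invented helper.

```javascript
// Hypothetical retry limit; the actual value is not documented here.
const MAX_RETRIES = 3;

// Returns the next state of an insight after one enrichment attempt.
// `failures` counts failed attempts so far: the first attempt plus
// MAX_RETRIES retries may fail before the insight is marked failed.
function afterAttempt(state, succeeded) {
  // completed and failed are terminal; nothing moves out of them.
  if (state.status !== "pending") return state;
  if (succeeded) return { status: "completed", failures: state.failures };
  const failures = state.failures + 1;
  return failures > MAX_RETRIES
    ? { status: "failed", failures }
    : { status: "pending", failures };
}
```

Under this model, a document stays pending through every retry and only becomes failed once the retry budget is used up, which matches the table above.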

Common failure causes

Most failures are caused by documents with very little extractable text — empty pages, paywalled content, or documents in unsupported languages. Transient model errors are retried automatically and rarely result in a permanent failure.

Document and insight

SynthLink separates raw content from enriched content into two concepts. A document is the original collected item. An insight is the LLM output attached to it, linked internally in the enrichment pipeline.

Side by side

/api/v1/documents

{
  "title": "GPT-4o System Card",
  "url": "https://openai.com/...",
  "summary": "OpenAI releases...",
  "source": "openai_news",
  "content_source": "detail",
  "created_at": "2026-03-19T06:00:00Z"
}

/api/v1/insights

{
  "llm_summary": "OpenAI releases GPT-4o...",
  "keywords": ["gpt-4o", "multimodal"],
  "tags": ["AI", "product launch"],
  "category": "AI Research",
  "source": "openai_news",
  "created_at": "2026-03-19T06:01:00Z"
}

When to use /combined

Use /api/v1/combined when you need both the document and its insight in a single request — for example, when rendering a feed that shows the title, source, and LLM summary together.

Use the separate endpoints when you only need one side, when you want different filters on each, or when you're paginating large result sets and want finer control over each query.
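For the feed case, a combined record could be mapped to a display item roughly as follows. The flat record shape (document fields and insight fields on one object) is an assumption, since this page does not show a /api/v1/combined response, and `toFeedItem` is a hypothetical helper.

```javascript
// ASSUMPTION: a combined record is one flat object holding both the
// document fields (title, source, summary) and the insight fields
// (llm_summary, ...). Adapt the property access to the real shape.
function toFeedItem(record) {
  return {
    title: record.title,
    source: record.source,
    // Fall back to the document's own summary while the insight is missing.
    summary: record.llm_summary ?? record.summary ?? "",
    enriched: record.llm_summary != null,
  };
}
```

The fallback keeps the feed usable for documents whose enrichment is still pending or has failed.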

Source identifiers

Every document has a source field that identifies where it was collected from. Use this value with the source query parameter to filter results to a specific source.

| source | Description |
| --- | --- |
| openai_news | Official posts, release notes, and model updates from OpenAI. |
| nasa_news | Mission reports and planetary science discoveries from NASA. |
| github_trending | Daily trending repositories across all languages. |
| arxiv | Preprints in CS, ML, and physics, enriched for scanning. |
| hn | Top stories filtered by score and relevance. |
| nvd | Recent CVEs with severity, KEV signal, and reference metadata. |
| cisa_cyber_advisory | CISA cybersecurity advisories with mitigation guidance. |

Note: New sources are added over time. Check the Sources reference for the full up-to-date list.
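A small helper can keep the source filter out of hand-built query strings. This is a sketch: `buildListUrl` and `BASE_URL` are illustrative names, and only the source and limit parameters mentioned on this page are used.

```javascript
const BASE_URL = "https://synth-link.com/api/v1";

// Builds a list-endpoint URL, e.g. for "documents", "insights", or "combined".
// Parameters that are not provided are left out of the query string.
function buildListUrl(endpoint, { source, limit } = {}) {
  const url = new URL(`${BASE_URL}/${endpoint}`);
  if (source) url.searchParams.set("source", source);
  if (limit !== undefined) url.searchParams.set("limit", String(limit));
  return url.toString();
}

// Usage: pass the result to fetch() together with the X-SYNTHLINK-KEY header.
const hnUrl = buildListUrl("documents", { source: "hn", limit: 50 });
```

Using `URL` and `searchParams` means values are encoded correctly even if a future source identifier contains characters that need escaping.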

Pagination

The data list endpoints (/documents, /insights, /combined) accept a limit parameter that controls the maximum number of records returned per request. The default is 10 and the maximum is 100.
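Since this page does not say how the server treats an out-of-range limit, a client-side guard is a cheap precaution. A minimal sketch; `normalizeLimit` is a hypothetical helper, and the constants mirror the default and maximum stated above.

```javascript
const DEFAULT_LIMIT = 10; // documented default
const MAX_LIMIT = 100;    // documented maximum

// Coerces user input into a limit the list endpoints accept.
// Missing, non-numeric, or non-positive values fall back to the default;
// values above the maximum are capped.
function normalizeLimit(value) {
  const n = Number.parseInt(value, 10);
  if (!Number.isFinite(n) || n < 1) return DEFAULT_LIMIT;
  return Math.min(n, MAX_LIMIT);
}
```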

SynthLink does not currently support cursor-based or offset-based pagination. To retrieve large datasets, use a smaller limit and filter by created_at to walk through records in batches.

Example — paginate by timestamp

let before = new Date().toISOString();
const all = [];

while (true) {
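  // NOTE: `before` as a query parameter name is an assumption. This page does
  // not document a server-side created_at filter, so confirm the name in the
  // GET /documents reference. Without it, every iteration refetches the same batch.
  const res = await fetch(
    `https://synth-link.com/api/v1/documents?limit=100&before=${encodeURIComponent(before)}`,
    { headers: { "X-SYNTHLINK-KEY": process.env.SYNTHLINK_KEY } }
  );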
  const batch = await res.json();
  if (batch.length === 0) break;

  all.push(...batch);

  // Use the last item's created_at as the cutoff for the next batch
  before = batch[batch.length - 1].created_at;
  if (batch.length < 100) break;
}

Note: Cursor-based pagination is planned for a future API version. Check the Changelog for updates.

Next steps

Getting Started

GET /documents