
Discovery Context Ingester

discovery-context-ingester produces a compact discovery fact sheet in chat from whatever raw material you throw at it: pasted notes, call transcripts, local files, and explicit private-source requests. It is the first skill to reach for when source material is messy, duplicated, or spread across disconnected systems. Every downstream Solutioneer skill consumes the fact sheet it produces.

It is a conversational normalization workflow: you hand it an account seed and a pile of raw source material, and it returns:

  • A short summary.
  • A discovery fact sheet.
  • A source index that identifies where each claim came from.
  • Citations when grounding was possible.
  • Explicit assumptions and unknown markers when it was not.
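The last two bullets amount to a rule: every claim is cited if grounding succeeded, and otherwise it is explicitly marked as an assumption or an unknown. The sketch below shows that rule in TypeScript. `Claim`, `Grounding`, and `labelClaim` are hypothetical names for illustration, not the skill's documented internals.

```typescript
// Hypothetical shapes for illustration; the skill's real internal types are not documented here.
type Grounding =
  | { kind: "cited"; citations: string[] }
  | { kind: "assumption"; rationale: string }
  | { kind: "unknown" };

interface Claim {
  text: string;
  sourceIds: string[]; // keys into the source index
  grounding: Grounding;
}

// Cite when grounding succeeded; otherwise label the claim explicitly
// as an assumption (with a rationale) or as an unknown. Never invent support.
function labelClaim(
  text: string,
  sourceIds: string[],
  citations: string[],
  assumedRationale?: string,
): Claim {
  if (citations.length > 0) {
    return { text, sourceIds, grounding: { kind: "cited", citations } };
  }
  if (assumedRationale !== undefined) {
    return { text, sourceIds, grounding: { kind: "assumption", rationale: assumedRationale } };
  }
  return { text, sourceIds, grounding: { kind: "unknown" } };
}

// Usage: a pasted note with no external grounding ends up marked as unknown.
const example = labelClaim("Customer wants a pilot tied to faster onboarding.", ["S1"], []);
```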

Under the hood, it reads supplied artifacts and local files first, grounds any web URLs you pass in with repo-local Firecrawl helpers, and touches Composio only when the user explicitly asks to ingest from a private source (Slack, Gmail, Google Drive, HubSpot, or Granola).
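One way to picture that precedence is to classify each entry in `notes.sources` and sort by kind. The sketch below is a hypothetical illustration only: `classifySource`, `orderSources`, and the file-path heuristic are assumptions, while the private-source names come from this page.

```typescript
type SourceKind = "text" | "file" | "url" | "private";

// Private-source names accepted in notes.sources (as listed on this page).
const PRIVATE_SOURCES = new Set(["hubspot", "slack", "gmail", "googledrive", "granola_mcp"]);

// Hypothetical classifier: URLs, named private sources, path-like tokens, else pasted text.
function classifySource(raw: string): SourceKind {
  const s = raw.trim();
  if (/^https?:\/\//i.test(s)) return "url";
  if (PRIVATE_SOURCES.has(s.toLowerCase())) return "private";
  // Assumption: a single whitespace-free token that looks like a path is a local file.
  if (!/\s/.test(s) && /^(\.{0,2}\/|~\/|[A-Za-z]:\\)|\.(md|txt|csv|json|pdf|docx)$/i.test(s)) return "file";
  return "text";
}

// Lower rank is handled earlier: supplied text and local files, then
// Firecrawl-grounded URLs, then private sources (only when explicitly named).
const PRECEDENCE: Record<SourceKind, number> = { text: 0, file: 0, url: 1, private: 2 };

function orderSources(sources: string[]): { raw: string; kind: SourceKind }[] {
  return sources
    .map((raw) => ({ raw, kind: classifySource(raw) }))
    .sort((a, b) => PRECEDENCE[a.kind] - PRECEDENCE[b.kind]); // stable sort keeps input order within a rank
}
```

Note that naming a private source is what counts as an explicit request here; a source the user never names is never reached.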

Use this skill when the user asks for:

  • Note cleanup or transcript normalization.
  • A discovery fact sheet.
  • Ingestion of Slack, Gmail, Google Drive, HubSpot, or Granola context.
  • Prep work before discovery questions, architecture mapping, or security review.
  • “Can you make sense of all these threads?”

Codex asks one question at a time, in this order:

  1. accountSeed — company domain, company name, or product name.
  2. notes.sources — one or more of: pasted text, local file paths, URLs, or private-source names like hubspot, slack, gmail, googledrive, granola_mcp.
  3. notes.callGoal — the immediate reason you need the fact sheet (for example, “Confirm pilot scope and risks”).

If you already supplied files or artifacts, the skill uses those before asking for more. If the invocation already includes accountSeed ($discovery-context-ingester ...), the first question is skipped.
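The question order and skip rules can be sketched as a small "next question" function. `IntakeState` and `nextQuestion` are hypothetical names, and treating already-supplied artifacts as satisfying the sources question is an assumption drawn from the paragraph above.

```typescript
// Hypothetical view of what the conversation has collected so far.
interface IntakeState {
  accountSeed?: string;
  sources: string[];         // notes.sources
  callGoal?: string;         // notes.callGoal
  suppliedArtifacts: number; // files/artifacts already attached to the chat
}

type Question = "accountSeed" | "notes.sources" | "notes.callGoal";

// Ask one question at a time, in the documented order; null means intake is complete.
function nextQuestion(state: IntakeState): Question | null {
  if (!state.accountSeed) return "accountSeed";
  // Assumption: supplied files or artifacts are used first, so no source prompt is needed.
  if (state.sources.length === 0 && state.suppliedArtifacts === 0) return "notes.sources";
  if (!state.callGoal) return "notes.callGoal";
  return null;
}
```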

The artifact is returned in chat with these sections:

  1. Summary
  2. Discovery fact sheet
  3. Source index
  4. Citations when available
  5. Assumptions / unknowns
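As a rough illustration of that layout, the sketch below renders the five sections in order and omits Citations when nothing was grounded. `FactSheetArtifact` and `renderArtifact` are hypothetical; the skill's actual chat formatting may differ.

```typescript
// Hypothetical container for the five documented sections.
interface FactSheetArtifact {
  summary: string;
  factSheet: string[];                 // normalized facts
  sourceIndex: Record<string, string>; // source id -> where it came from
  citations: string[];                 // empty when no grounding was possible
  assumptionsUnknowns: string[];
}

function renderArtifact(a: FactSheetArtifact): string {
  const bullet = (s: string) => `- ${s}`;
  const sections: [string, string[]][] = [
    ["Summary", [a.summary]],
    ["Discovery fact sheet", a.factSheet.map(bullet)],
    ["Source index", Object.entries(a.sourceIndex).map(([id, origin]) => `- ${id}: ${origin}`)],
    ["Citations", a.citations.map(bullet)],
    ["Assumptions / unknowns", a.assumptionsUnknowns.map(bullet)],
  ];
  return sections
    // "Citations when available": drop the section if it would be empty.
    .filter(([title, lines]) => title !== "Citations" || lines.length > 0)
    .map(([title, lines], i) => `${i + 1}. ${title}\n${lines.join("\n")}`)
    .join("\n\n");
}
```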

Run this first when source material is messy. It is the sanctioned way to turn a pile of threads, docs, and notes into a single normalized artifact. Downstream skills (discovery-question-generator, architecture-fit-mapper, security-review-prep, and poc-handoff-orchestrator) all prefer a fact sheet over raw notes, so running it first pays off at every later step.

Paste what you have. Do not curate first. The skill is designed for messy input: it drops duplicates, boilerplate, and chatter automatically and keeps only opportunity-relevant statements.
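To make "duplicates and chatter are dropped" concrete, here is a minimal dedupe pass. It is a hypothetical sketch: the `CHATTER` patterns and the punctuation-insensitive key are assumptions, and the skill's real relevance filtering is certainly richer than a regex list.

```typescript
// Hypothetical filler patterns; the skill's actual relevance rules are not documented.
const CHATTER: RegExp[] = [/^(thanks|thank you|ok|okay|sounds good)[.!]*$/i, /^sent from my /i];

// Case- and punctuation-insensitive key so near-verbatim repeats collapse.
function normalizeKey(line: string): string {
  return line.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

function normalizeStatements(raw: string[]): string[] {
  const seen = new Set<string>();
  const kept: string[] = [];
  for (const line of raw) {
    const text = line.trim();
    if (text === "" || CHATTER.some((re) => re.test(text))) continue; // drop blanks and chatter
    const key = normalizeKey(text);
    if (seen.has(key)) continue; // duplicate after normalization
    seen.add(key);
    kept.push(text); // keep the first wording seen
  }
  return kept;
}
```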

Call private sources by name. To request ingestion from a managed-auth system, name it in notes.sources: hubspot, slack, gmail, googledrive. Codex will present a Connect now / Skip choice if the toolkit is not connected. See Discovery Ingestion Sources for the per-source setup.

Treat HubSpot as the only auto-CRM in v1. The skill will auto-ingest HubSpot records when that toolkit is connected. For Salesforce or other CRMs, ask for a CSV export, pasted records, or a local file — the skill will normalize those the same way.

Use Granola only when auth is already preconfigured. granola_mcp requires existing MCP auth. If it is not ready, the skill will offer a manual transcript fallback rather than attempting to auto-configure Granola.
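The private-source rules in the last three paragraphs can be condensed into one decision function. This is a hypothetical sketch (`planPrivateSource` and the `Plan` action names are illustrative); it only encodes the behaviors stated above: Connect now / Skip for unconnected Composio toolkits, manual transcript fallback for Granola, and an export request for any other CRM.

```typescript
type Plan =
  | { action: "ingest"; source: string }
  | { action: "offer-connect-or-skip"; source: string }
  | { action: "manual-transcript-fallback"; source: string }
  | { action: "request-export"; source: string };

// Managed-auth toolkits named on this page.
const COMPOSIO_TOOLKITS = new Set(["hubspot", "slack", "gmail", "googledrive"]);

function planPrivateSource(source: string, connected: Set<string>, granolaAuthReady: boolean): Plan {
  const s = source.toLowerCase();
  if (s === "granola_mcp") {
    // Never auto-configure Granola; fall back to a pasted transcript instead.
    return granolaAuthReady ? { action: "ingest", source: s } : { action: "manual-transcript-fallback", source: s };
  }
  if (COMPOSIO_TOOLKITS.has(s)) {
    return connected.has(s) ? { action: "ingest", source: s } : { action: "offer-connect-or-skip", source: s };
  }
  // Any other CRM (e.g. Salesforce): ask for a CSV export, pasted records, or a local file.
  return { action: "request-export", source: s };
}
```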

Reuse the fact sheet, do not regenerate it. Once you have a discovery-fact-sheet.json in chat, keep it as an artifactRef for the rest of the deal. Every downstream skill will pick it up.

Save is opt-in. The fact sheet lands in chat. Saving to disk requires a confirmation and a path.
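The opt-in rule reduces to a guard: nothing is written unless the user both confirms and supplies a path. In the sketch below, `maybeSave` and the injected `Writer` are hypothetical, chosen so the example stays free of any particular filesystem API.

```typescript
// Hypothetical writer hook (for example, a wrapper around a file write in the host runtime).
type Writer = (path: string, contents: string) => void;

// Returns true only when the fact sheet was actually written.
function maybeSave(factSheetJson: string, confirmed: boolean, path: string | undefined, write: Writer): boolean {
  if (!confirmed || !path) return false; // default: the fact sheet stays in chat only
  write(path, factSheetJson);
  return true;
}
```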

The fact sheet is the canonical input to the rest of the pre-sales chain.

  • Into Discovery Question Generator. The question generator is built to consume a fact sheet first. Run the ingester, then pass the fact sheet as an artifactRef.
  • Into Architecture Fit Mapper. The mapper prefers a fact sheet over raw notes; constraints recorded in the sheet carry straight into its architecture constraints.
  • Into Integration Fit Gap Analyzer. Stack mentions and vendor names from the fact sheet are a reliable seed for notes.targetSystems.
  • Into Security Review Prep. Security prep reuses the fact sheet to ground the control matrix and extract customer questions into follow-ups.
  • Into POC Handoff Orchestrator. The handoff package references the fact sheet for account context instead of re-researching the company.
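As an example of reuse, the sketch below builds an Integration Fit Gap Analyzer input that points at the existing fact sheet via `artifactRefs` and seeds `notes.targetSystems` from stack mentions. Only `artifactRefs`, `notes.targetSystems`, and the field names shown in the example input on this page come from the docs; `buildIntegrationInput`, the `objective` string, and the exact downstream shape are assumptions.

```typescript
// Assumed downstream shape, modeled on the normalized input example on this page.
interface DownstreamInput {
  account: { name: string };
  objective: string;
  notes: { sources: string[]; targetSystems?: string[] };
  artifactRefs: string[];
  destinations: Record<string, unknown>;
  providerMode: string;
}

// Hypothetical helper: stackMentions are vendor/stack names already recorded in the fact sheet.
function buildIntegrationInput(account: string, factSheetRef: string, stackMentions: string[]): DownstreamInput {
  // Trim, drop blanks, and dedupe while preserving first-seen order.
  const targetSystems = Array.from(new Set(stackMentions.map((s) => s.trim()).filter((s) => s !== "")));
  return {
    account: { name: account },
    objective: "Assess integration fit", // placeholder wording
    notes: { sources: [], targetSystems },
    artifactRefs: [factSheetRef], // reuse the fact sheet instead of regenerating it
    destinations: {},
    providerMode: "local-only",
  };
}
```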

Explicit:

$discovery-context-ingester "Acme Health"

Natural language:

Normalize these discovery notes into a fact sheet for Acme Health. The next call needs to confirm pilot scope and risks.

With a private source:

Ingest the #acme-health Slack channel and the HubSpot deal for Acme Health. Turn it into a fact sheet.

Example normalized input (for reference — you never type this directly):

{
  "account": { "name": "Acme Health" },
  "objective": "Normalize discovery context before the next call",
  "notes": {
    "sources": [
      "Customer wants a pilot tied to faster onboarding.",
      "Security review depends on proving Slack and HubSpot data flow.",
      "Slack"
    ],
    "callGoal": "Confirm pilot scope and risks"
  },
  "artifactRefs": [],
  "destinations": {},
  "providerMode": "local-only"
}
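If you are generating that input from code or checking it in tests, a typed shape plus a small structural check helps. The `NormalizedInput` interface mirrors the fields in the example above; `validateInput` is a hypothetical helper. The page shows only `"local-only"` for `providerMode`, so the sketch types it as a plain string.

```typescript
// Mirrors the example input above; other providerMode values are not documented here.
interface NormalizedInput {
  account: { name: string };
  objective: string;
  notes: { sources: string[]; callGoal?: string };
  artifactRefs: string[];
  destinations: Record<string, unknown>;
  providerMode: string;
}

// Minimal structural check; returns a list of problems (empty when valid).
function validateInput(x: any): string[] {
  const problems: string[] = [];
  if (typeof x?.account?.name !== "string" || x.account.name === "") problems.push("account.name missing");
  if (!Array.isArray(x?.notes?.sources)) problems.push("notes.sources must be an array");
  if (x?.notes?.callGoal !== undefined && typeof x.notes.callGoal !== "string") problems.push("notes.callGoal must be a string");
  if (!Array.isArray(x?.artifactRefs)) problems.push("artifactRefs must be an array");
  if (typeof x?.providerMode !== "string") problems.push("providerMode must be a string");
  return problems;
}

const exampleInput: NormalizedInput = {
  account: { name: "Acme Health" },
  objective: "Normalize discovery context before the next call",
  notes: {
    sources: ["Customer wants a pilot tied to faster onboarding.", "Slack"],
    callGoal: "Confirm pilot scope and risks",
  },
  artifactRefs: [],
  destinations: {},
  providerMode: "local-only",
};
```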

Provider behavior depends on which credentials are configured:

  • With FIRECRAWL_API_KEY: Firecrawl grounds any web URLs in notes.sources via tools/firecrawl/search.ts, tools/firecrawl/scrape.ts, and related wrappers. Grounded claims carry citations.
  • Without FIRECRAWL_API_KEY: ingestion stays local-only. Unresolved external claims are labeled as unknowns or assumptions rather than invented.
  • With COMPOSIO_API_KEY: managed-auth ingestion is enabled for hubspot, slack, gmail, and googledrive. Each source goes through the Connect now / Skip flow the first time you reach for it.
  • Without COMPOSIO_API_KEY: private-source ingestion stays pending. The skill still works against pasted text, files, and artifact refs.
  • Granola: granola_mcp needs preconfigured MCP auth. Without it, the skill uses the manual transcript fallback.
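The provider matrix above can be read as a function from configuration to capabilities. In this sketch, `resolveCapabilities` and `Capabilities` are hypothetical; the env var names and toolkit list come from this page. The environment is passed in as a plain record so the example does not depend on Node typings; in Node you would pass `process.env`.

```typescript
interface Capabilities {
  webGrounding: boolean;        // Firecrawl-backed citations for URLs
  managedAuthSources: string[]; // toolkits offered through Connect now / Skip
  granola: "mcp" | "manual-transcript";
}

function resolveCapabilities(env: Record<string, string | undefined>, granolaMcpReady: boolean): Capabilities {
  // Treat unset or blank keys as missing.
  const has = (k: string) => (env[k] ?? "").trim() !== "";
  return {
    webGrounding: has("FIRECRAWL_API_KEY"), // otherwise: local-only, unresolved claims marked unknown/assumption
    managedAuthSources: has("COMPOSIO_API_KEY") ? ["hubspot", "slack", "gmail", "googledrive"] : [],
    granola: granolaMcpReady ? "mcp" : "manual-transcript",
  };
}
```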