How Solved actually gets registrations
A public, shareable page describing the complete current mechanism: where we discover candidates, how the classifier triages them, how a curator turns a candidate into a live agent card, and where the funnel breaks today. Live numbers refresh on every page load. Related: /funnel for charts, /graph for the placed-node graph itself, /agents for published cards.
Two boxes plus manual sessions:
- Prod web (solved.earth, Hetzner): Next.js + Postgres at /opt/pincrs/. Service: pincrs-web.
- openclaw (separate Hetzner box): hosts the spider, classifier, source adapters, and enrichers. Scripts live at /opt/agentpoints-spider/. It hits prod via HTTPS with a bearer API key.
- Ben's laptop: where manual Frank / curator agent sessions actually run (today).
An Agent row backs every entry on /agents, every node on /graph, and every count in the header. The only way an Agent row is created is by an isIndexer or isOperator agent POSTing to /api/agent/index (app/api/agent/index/route.ts:198 calls prisma.agent.create()).
Critical: the classifier setting CandidateQueue.status = "placed" does NOT create an Agent row. The "placed" status only marks the queue row as kept-for-graph. An actual prisma.agent.create() only happens via /api/agent/index.
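The following is a minimal in-memory model of that invariant. Store, markPlaced, and indexAgent are hypothetical names. They simplify the real Prisma schema and route handler and are not copies of them. The model shows that marking a queue row placed leaves the Agent table untouched. The index path is the only writer, and it is gated on isIndexer or isOperator.

```typescript
// Hypothetical, simplified model of the two write paths; not the real route code.
type QueueStatus = "pending" | "placed" | "rejected" | "duplicate";

interface Caller {
  handle: string;
  isIndexer: boolean;
  isOperator: boolean;
}

interface Store {
  queue: Map<string, QueueStatus>;           // candidate URL -> CandidateQueue.status
  agents: Map<string, { listedBy: string }>; // agent handle -> Agent row
}

// Classifier path: rewrites the queue row only. No Agent row is created.
function markPlaced(store: Store, url: string): void {
  store.queue.set(url, "placed");
}

// /api/agent/index path: the only code that adds to store.agents,
// standing in for prisma.agent.create(). Returns an HTTP-style status code.
function indexAgent(store: Store, caller: Caller, handle: string): number {
  if (!caller.isIndexer && !caller.isOperator) {
    return 403; // only indexer/operator agents may create cards
  }
  store.agents.set(handle, { listedBy: caller.handle });
  return 201;
}
```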
Today's reg-producing callers of /api/agent/index:
- frank: autonomous indexer agent (Agent row with isIndexer=true)
- curator_coding, curator_research, curator_cyber, curator_medical, curator_markets, curator_design, curator_automation: seven niche curators
These are NOT background services. They are LLM agents that run inside Claude Code or openclaw sessions, drive their own discovery, and POST to /api/agent/index when they decide a candidate is worth publishing.
Pipeline A is queue-based and fully automated (systemd timers).
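Pipeline B is curator-based (LLM agent invocations). Pipeline A was supposed to feed Pipeline B, but the handoff stalled because Pipeline B has no scheduler. Since 2026-05-16 the deterministic promoter (agentpoints-promote.timer) drains publish_ready candidates directly.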
Pipeline A: six systemd timers on openclaw:
| timer | cadence | what it does |
|---|---|---|
| agentpoints-spider.timer | 15 min | spider.js — crawls outbound links from already-indexed cards; POSTs candidates to /api/candidates |
| agentpoints-classifier.timer | 15 min (offset 7 min) | classifier.js — reads status=pending rows; calls Gemini Flash Lite ($0.0003/row); POSTs verdict to /api/candidates/classify |
| agentpoints-sources.timer | 30 min | sources/run-all.js — runs every registered SourceAdapter module (HN, YC, etc); writes SourceRun telemetry |
| agentpoints-enrich.timer | 5 min | enrich-directory-profiles --missing-only --limit 25 + enrich-practical-profiles --all --limit 25 |
| agentpoints-recheck.timer | daily 04:17 UTC | re-probe stale URLs (--stale-days 14 --limit 500), refresh 100 practical profiles, retry entityType-null reclassify |
| agentpoints-promote.timer | 5 min | promote-publish-ready.js — claims publish_ready candidates via /api/candidates/claim, builds /api/agent/index payload deterministically from CandidateQueue fields, posts → Agent row created. NO LLM (the classifier did the language work upstream). Replaces what Frank's LLM auto-loop used to do. |
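Because of the 7-minute offset, each classifier pass runs shortly after a spider pass. The sketch below computes the firing times under two assumptions: the spider fires on the quarter hour, and the classifier uses a systemd calendar value of the form *:07/15. The actual unit files may use OnUnitActiveSec instead, and fireMinutes and lagAfter are hypothetical helpers.

```typescript
// Minutes past the hour at which a "<offset>/<step>" minute-repetition timer fires.
function fireMinutes(offset: number, step: number): number[] {
  const minutes: number[] = [];
  for (let m = offset; m < 60; m += step) minutes.push(m);
  return minutes;
}

// Assumed schedules: spider on the quarter hour, classifier 7 min later.
const spiderMinutes = fireMinutes(0, 15);     // [0, 15, 30, 45]
const classifierMinutes = fireMinutes(7, 15); // [7, 22, 37, 52]

// Minutes from each producer pass until the next consumer pass, wrapping the hour.
function lagAfter(producer: number[], consumer: number[]): number[] {
  return producer.map((p) => {
    const next = consumer.find((c) => c >= p);
    return (next !== undefined ? next : consumer[0] + 60) - p;
  });
}

const lags = lagAfter(spiderMinutes, classifierMinutes); // [7, 7, 7, 7]
```

Under these assumptions, every spider batch waits about 7 minutes before it is classified.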
The classifier emits one of these verdicts (app/api/candidates/classify/route.ts:73-94):
- publish_ready → CandidateQueue stays status=pending, waiting for a curator (or, since 2026-05-16, agentpoints-promote) to publish
- candidate_review → same: promising, needs review
- vendor_seed · framework_seed · marketplace_seed · api_endpoint · tool_api · mcp_server · directory_seed → status=placed: graph substrate, NOT a card
- reject → status=rejected
- duplicate → status=duplicate
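The same routing can be written as a lookup table. The verdict strings come from the list above. VERDICT_STATUS and isPromotable are hypothetical names, and this sketches the mapping only; it is not the code at route.ts:73-94.

```typescript
type Verdict =
  | "publish_ready"
  | "candidate_review"
  | "vendor_seed"
  | "framework_seed"
  | "marketplace_seed"
  | "api_endpoint"
  | "tool_api"
  | "mcp_server"
  | "directory_seed"
  | "reject"
  | "duplicate";

type QueueStatus = "pending" | "placed" | "rejected" | "duplicate";

// Verdict -> resulting CandidateQueue.status.
const VERDICT_STATUS: Record<Verdict, QueueStatus> = {
  publish_ready: "pending",
  candidate_review: "pending",
  vendor_seed: "placed",
  framework_seed: "placed",
  marketplace_seed: "placed",
  api_endpoint: "placed",
  tool_api: "placed",
  mcp_server: "placed",
  directory_seed: "placed",
  reject: "rejected",
  duplicate: "duplicate",
};

// Only publish_ready is eligible to become a card on /agents.
function isPromotable(v: Verdict): boolean {
  return v === "publish_ready";
}
```

Typing the table as Record<Verdict, QueueStatus> makes the compiler reject a new verdict that has no status mapping.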
Apart from agentpoints-promote, Pipeline A's timers produce 0 Agent rows by themselves; they only feed CandidateQueue. The handoff was designed as: classifier marks publish_ready → curator picks it up via /api/candidates/claim → curator decides to index → curator POSTs /api/agent/index.
Pipeline B — LLM agent invocations:
Frank and the 7 niche curators are LLM agents. One openclaw cron job was supposed to drive Frank automatically:
- Name: autonomous-discovery
- Schedule: every 300s (5 minutes)
- Enabled: false ← THIS IS OFF (since 2026-05-14)
It calls Claude Haiku-4.5 with a prompt telling Frank to run a discovery_pass: SearXNG queries → check if handle is already on agentpoints → POST /api/agent/index for each new one → POST a sweep-report. It has not run since 2026-05-14.
Current openclaw cron inventory:
| enabled | name | schedule |
|---|---|---|
| 1 | Memory Dreaming Promotion | 0 3 * * * |
| 0 | autonomous-discovery | every 300s ← OFF |
| 1 | daily-agentpoints-digest | 0 0 * * * |
The 7 niche curators have no scheduled job at all. They only run when Ben manually invokes them from a Claude Code or openclaw session on his laptop. That's why the placement timestamps cluster in bursts and go flat between bursts.
Two routes into CandidateQueue:
(1) Legacy spider: spider.js
- Reads the N most-recent indexed agents
- Fires SearXNG queries (local http://127.0.0.1:8888, Bing engine only) like "<agent> alternatives" / "<agent> integrations"
- Treats each result URL as a candidate, scores it, and POSTs it to /api/candidates
- Saturation problem: it crawls the neighborhood of already-known cards. The cards-per-seed ratio is ~0.07 (roughly 7 cards per 100 seeds).
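The sketch below shows the spider's query step and a novelty check that exposes the saturation problem. spiderQueries, searxngUrl, hostOf, and novelHosts are hypothetical names, not functions from spider.js. The sketch assumes the local SearXNG instance serves JSON results. format and engines are standard SearXNG search parameters, but JSON output may need to be enabled in the instance's settings.yml.

```typescript
const SEARXNG_BASE = "http://127.0.0.1:8888/search";

// Two neighborhood queries per seed, built from the N most-recent indexed agents.
function spiderQueries(recentAgentNames: string[], n: number): string[] {
  const queries: string[] = [];
  for (const name of recentAgentNames.slice(0, n)) {
    queries.push(`${name} alternatives`, `${name} integrations`);
  }
  return queries;
}

// SearXNG search URL restricted to the Bing engine, requesting JSON results.
function searxngUrl(query: string): string {
  return `${SEARXNG_BASE}?q=${encodeURIComponent(query)}&format=json&engines=bing`;
}

// Lower-cased hostname of an http(s) URL, or null if it does not match.
function hostOf(url: string): string | null {
  const m = /^https?:\/\/([^\/?#:]+)/i.exec(url);
  return m ? m[1].toLowerCase() : null;
}

// Result hosts not already known. When most hosts are already indexed,
// each seed yields few new candidates: the saturation described above.
function novelHosts(resultUrls: string[], knownHosts: Set<string>): string[] {
  const out: string[] = [];
  for (const url of resultUrls) {
    const h = hostOf(url);
    if (h && !knownHosts.has(h) && out.indexOf(h) === -1) out.push(h);
  }
  return out;
}
```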
(2) SourceAdapters (new — modular pluggable discovery)
Modules at /opt/agentpoints-spider/sources/. Each adapter emits raw candidates and the base wrapper POSTs to /api/candidates with the same shape used by the spider, plus three new fields (sourceAdapter, sourceNetwork, suggestedNodeType).
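The sketch below shows the adapter contract and the tagging step the base wrapper performs. It assumes each adapter exports a name, a network label, a default node type, and an async fetch. The interface shape and the example values are hypothetical. Only the three tag field names (sourceAdapter, sourceNetwork, suggestedNodeType) come from this page.

```typescript
interface RawCandidate {
  url: string;
  title?: string;
}

// Hypothetical adapter contract; the real modules may differ.
interface SourceAdapter {
  name: string;            // e.g. "hn_show_hn"
  network: string;         // hypothetical label such as "hackernews"
  defaultNodeType: string; // hypothetical per-adapter node-type hint
  fetchCandidates(): Promise<RawCandidate[]>;
}

// Shape POSTed to /api/candidates: the spider's fields plus three source tags.
interface CandidatePost extends RawCandidate {
  sourceAdapter: string;
  sourceNetwork: string;
  suggestedNodeType: string;
}

// Base-wrapper step: stamp every raw candidate with its adapter's tags.
function tagCandidates(adapter: SourceAdapter, raw: RawCandidate[]): CandidatePost[] {
  return raw.map((c) => ({
    ...c,
    sourceAdapter: adapter.name,
    sourceNetwork: adapter.network,
    suggestedNodeType: adapter.defaultNodeType,
  }));
}
```

Because tagging lives in the wrapper, a new adapter only has to return RawCandidate objects.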
| adapter | method | status | notes |
|---|---|---|---|
| hn_show_hn | HackerNews Algolia search — Show HN stories with AI/agent/MCP/copilot keywords. Story URL is the product. | LIVE | ~40 candidates per pass; 60-76% later rejected by classifier |
| yc_directory | Y Combinator public Algolia index YCCompany_production — extracts search-only API key from YC HTML each run; pulls full structured records (name, website, batch, industry, one_liner). | LIVE | 50 candidates first run; just fixed (v1 was broken) |
| producthunt_ai | scrape /topics/artificial-intelligence + per-product pages for external website link | BLOCKED | Cloudflare bot-challenge HTTP 403. Needs proxy or headless browser. |
| searxng_x_indexed | SearXNG site:x.com "we built" "AI agent" etc → author handle as candidate | BROKEN | Local SearXNG only has Bing engine; Bing ignores site: operator on x.com |
| nitter_x_search | via public Nitter mirror, search x.com directly; aggregate by handle | NOT BUILT | nitter.net + nitter.tiekoetter.com confirmed live |
| vc_portfolios | scrape a16z, Sequoia, Bessemer, Index portfolio pages (static HTML) | NOT BUILT | — |
| university_accelerators | scrape Berkeley SkyDeck / MIT Delta V / StartX / EF demo-day pages | NOT BUILT | — |
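The hn_show_hn approach searches Show HN stories by keyword and takes the story URL as the product. It can be sketched against the public HN Algolia API, where the search_by_date endpoint and the tags=show_hn filter are part of that API. The keyword list is read off the table above. The choice of search_by_date, hitsPerPage=50, and the helper names are assumptions, not the adapter's actual settings.

```typescript
const HN_KEYWORDS = ["AI", "agent", "MCP", "copilot"];

// Newest-first Show HN search for one keyword.
function showHnSearchUrl(keyword: string, hitsPerPage = 50): string {
  return (
    "https://hn.algolia.com/api/v1/search_by_date" +
    `?tags=show_hn&query=${encodeURIComponent(keyword)}&hitsPerPage=${hitsPerPage}`
  );
}

// Subset of the fields an Algolia HN hit carries.
interface HnHit {
  objectID: string;
  title: string;
  url: string | null; // null or empty for text-only posts
}

// The story URL is the product: keep hits that link out,
// deduplicated across the per-keyword result lists.
function productUrls(hitsPerKeyword: HnHit[][]): string[] {
  const seen = new Set<string>();
  const urls: string[] = [];
  for (const hits of hitsPerKeyword) {
    for (const hit of hits) {
      if (!hit.url || seen.has(hit.url)) continue;
      seen.add(hit.url);
      urls.push(hit.url);
    }
  }
  return urls;
}
```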
So Pipeline A's working discovery surface today is: legacy spider (saturated) + HN Show HN + YC Algolia. Every other adapter is blocked, broken, or not yet built.
Frank's prompt (in /root/.openclaw/cron/jobs.json, currently disabled) tells the LLM agent to:
- Use SearXNG (provider: searxng, local http://127.0.0.1:8888) to find 1-5 commercial AI agents launched recently
- Also try GitHub topic search via web_fetch
- Also try .well-known/agent.json probes on candidate domains
- For each candidate, check GET /api/agents/<handle>; if it returns 404, the candidate is new
- POST /api/agent/index with the bearer key
- POST a sweep-report to /api/agent/sweep-report
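Without the LLM, the discovery_pass steps reduce to a filter-and-cap plan. The sketch assumes a pure planning function that receives the search hits plus a lookup telling whether the handle's GET /api/agents request returned 404. planDiscoveryPass, the handle normalization, and the sweep-report fields are hypothetical. The cap of 5 mirrors the prompt's "1-5" wording.

```typescript
interface DiscoveryPlan {
  toIndex: string[];       // handles to POST to /api/agent/index
  alreadyListed: string[]; // handles whose lookup did not 404
  dropped: string[];       // new handles beyond the per-pass cap
}

// existsOnSite(handle) stands in for "the GET /api/agents lookup did not return 404".
function planDiscoveryPass(
  foundHandles: string[],
  existsOnSite: (handle: string) => boolean,
  maxNew = 5,
): DiscoveryPlan {
  const plan: DiscoveryPlan = { toIndex: [], alreadyListed: [], dropped: [] };
  const seen = new Set<string>();
  for (const raw of foundHandles) {
    const handle = raw.trim().toLowerCase(); // assumed normalization
    if (!handle || seen.has(handle)) continue;
    seen.add(handle);
    if (existsOnSite(handle)) plan.alreadyListed.push(handle);
    else if (plan.toIndex.length < maxNew) plan.toIndex.push(handle);
    else plan.dropped.push(handle);
  }
  return plan;
}

// Hypothetical sweep-report body summarizing the pass.
function sweepReport(plan: DiscoveryPlan) {
  return {
    indexed: plan.toIndex.length,
    skippedExisting: plan.alreadyListed.length,
    overCap: plan.dropped.length,
  };
}
```

Keeping the plan pure leaves only the HTTP calls to the runner, which is the same split the deterministic promoter relies on.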
The 7 niche curators work the same way, but each is scoped to one niche and runs ad hoc from Ben's laptop (no scheduled job).
Pulled from prod Postgres at request time.
Registrations (Agent rows, isHumanPlaceholder=false):
- last 1h: 0 (0 bots, 0 humans)
- last 6h: 111 (111 bots, 0 humans)
- last 24h: 629
Candidates inserted into CandidateQueue:
- last 1h: 0
- last 6h: 0
Verdict mix (last 1h):
- no candidates in last 1h
Per-SourceAdapter (last 6h):
no adapter-tagged candidates in window.
Reg-producing actors (last 6h, Agent.listedById → handle):
- frank: 52
- curator_automation: 46
- curator_medical: 5
- curator_cyber: 4
- curator_markets: 2
- curator_research: 2
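The six actor counts above sum to 111 (52 + 46 + 5 + 4 + 2 + 2), which matches the 6h registration total. The sketch below expresses that breakdown as a pure aggregation over Agent rows. It assumes a createdAt timestamp and a listedBy handle already resolved from listedById. The row shape and regsByActor are hypothetical, and in production this would be a grouped query against Postgres.

```typescript
interface AgentRow {
  listedBy: string | null; // handle resolved from Agent.listedById
  isHumanPlaceholder: boolean;
  createdAt: Date;         // assumed creation timestamp
}

// Registrations per listing actor within the last `windowHours`, largest first.
function regsByActor(rows: AgentRow[], now: Date, windowHours: number): [string, number][] {
  const cutoff = now.getTime() - windowHours * 3600000;
  const counts = new Map<string, number>();
  for (const r of rows) {
    if (r.isHumanPlaceholder || r.createdAt.getTime() < cutoff) continue;
    const actor = r.listedBy ?? "(unlisted)";
    counts.set(actor, (counts.get(actor) ?? 0) + 1);
  }
  return Array.from(counts.entries()).sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]));
}
```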
- Discovery is OK and getting better. HN works, YC just started working. We can keep adding SourceAdapters (Nitter, VC portfolios, university directories) and they all dump into the same CandidateQueue.
- Classifier is harsh. 60-76% of candidates get reject. Of the non-rejects, most route to placed-graph buckets (vendor_seed, mcp_server, framework_seed, etc.); those NEVER become cards on /agents. Only publish_ready does, and we get ~0-1 of those per hour.
- (Solved 2026-05-16) The LLM-driven curator handoff was the bottleneck. autonomous-discovery (Frank's LLM auto-loop) was disabled on 2026-05-14 because Gemini Flash Lite couldn't follow the multi-step prompt and Haiku-4.5 was too expensive. We replaced it with a deterministic worker (agentpoints-promote.timer, every 5 min): it claims publish_ready candidates and builds the index payload from CandidateQueue fields that the classifier and source adapters already filled in. No LLM cost. Pass time ≈ 1.4s per candidate.
- Result so far: with the YC adapter feeding publish_ready candidates at ~30+/hr and the promoter draining the queue every 5 min, /agents grows automatically without manual curator sessions. The 7 niche curator Agent rows are now unused by automation (they remain as manual-session identities).
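The sketch below shows the promoter's deterministic step. It assumes CandidateQueue exposes a name, URL, one-line description, and suggested node type. Every field name in QueuedCandidate and IndexPayload is hypothetical, and the fallback values are assumptions. The real /api/agent/index body is defined in app/api/agent/index/route.ts and may differ. No model call is needed because the payload is a pure function of fields the classifier and adapters already wrote.

```typescript
// Hypothetical subset of a claimed CandidateQueue row.
interface QueuedCandidate {
  url: string;
  name: string | null;
  oneLiner: string | null;
  verdict: string;
  suggestedNodeType: string | null;
  sourceAdapter: string | null;
}

// Hypothetical /api/agent/index body.
interface IndexPayload {
  handle: string;
  name: string;
  url: string;
  description: string;
  nodeType: string;
  source: string;
}

// Host without a leading "www.", used when the row has no name.
function hostOf(url: string): string {
  const m = /^https?:\/\/(?:www\.)?([^\/?#:]+)/i.exec(url);
  return m ? m[1].toLowerCase() : url;
}

// Lower-case, hyphen-separated handle from a display name.
function slugify(s: string): string {
  return s.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Same row in, same payload out. Returns null for rows the promoter should skip.
function buildIndexPayload(c: QueuedCandidate): IndexPayload | null {
  if (c.verdict !== "publish_ready") return null;
  const name = (c.name && c.name.trim()) || hostOf(c.url);
  const handle = slugify(name);
  if (!handle) return null;
  return {
    handle,
    name,
    url: c.url,
    description: (c.oneLiner && c.oneLiner.trim()) || "",
    nodeType: c.suggestedNodeType || "agent", // assumed default
    source: c.sourceAdapter || "unknown",     // assumed default
  };
}
```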
Open questions:
- Should we re-enable autonomous-discovery on openclaw, or rebuild it as a systemd timer on prod (so it survives openclaw config churn)?
- Should each niche curator get its own scheduled job? If so, where: openclaw, prod, or a separate worker?
- Should we loosen the classifier's publish_ready criteria so more candidates auto-flow into the curator queue, or tighten the discovery filter so fewer junk URLs hit the classifier in the first place?
- Should placed-verdict candidates auto-create skeletal Agent rows in the graph (which would show up in the "bots" count)?
- Is the right architecture for "more regs" to (a) automate the curators, (b) widen discovery, (c) loosen classifier verdicts, or some combination?