Keyword discovery is the first thing Ranket does after brand setup, and it runs again every Monday. The output is a pool of 15-25 keywords with low difficulty, real search volume, and high relevance to your product — the “unicorns” every SEO blog hopes to find.
What makes this different
Most AI SEO tools ask an LLM to brainstorm keyword variations and call it a day. That produces phrases with no real volume data and no way to know if a low-DR brand can actually rank for them.
Ranket’s discovery is grounded in DataForSEO search data for every candidate. Claude is only used for judgment (relevance scoring, editorial picks), not for invention. Every keyword surfaced has been confirmed to exist as a real search query with real volume and known difficulty.
The 4 seed sources
Discovery starts from four real-data sources, in priority order:
1. Brand profile target keywords (highest priority)
When the brand was set up, Claude extracted 15 seed keywords from your scraped marketing pages. These are the brand’s curated topical territory. Example for a virtual-staging SaaS:
"virtual staging software real estate"
"AI photo enhancement real estate"
"furniture removal photo editor"
"day to dusk real estate photography"
"virtual tour software alternative to Matterport"
...10 more
These get priority because they’re the most explicitly on-brand. Many have high keyword difficulty themselves (KD 40-60), but the variations that keyword_suggestions returns for them are usually rankable.
2. Self-ranked keywords
We call DataForSEO’s keywords_for_site on your domain. Every term your site already ranks for (anywhere in Google’s top 100, even position 87) becomes a seed. These are gold because Google has already decided your site is topically relevant for them.
For a brand-new site with no rankings, this source returns nothing — that’s why we have sources 1 and 3 as fallbacks.
3. Scraped page H1s / titles
Every H1 and <title> tag from the marketing pages we scraped becomes a seed. We normalise them first:
"Master Real Estate Video Editing in 2026: AI Workflow Step-by-Step"
→ drop "Master" leadword
→ strip year marker
→ split on colon
→ "real estate video editing"
The normaliser strips year markers, leadwords (interrogative ones such as “How to” and “What is”, and imperative ones such as “Master”), trailing punctuation, and subtitles after colons.
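A minimal sketch of such a normaliser, assuming a small hypothetical leadword list and treating "in 2026" style phrases as part of the year marker. The function name and the exact rules are illustrative, not Ranket's actual implementation.

```python
import re

# Hypothetical leadword list: the text names "How to" and "What is",
# and the example above drops "Master".
LEADWORDS = ("how to", "what is", "master")

# A year marker, optionally preceded by "in" (assumption, so that
# "... Editing in 2026" does not leave a dangling "in").
YEAR = re.compile(r"\b(?:in\s+)?(?:19|20)\d{2}\b", re.IGNORECASE)


def normalise_title(title: str) -> str:
    """Reduce an H1 or <title> string to a bare seed phrase."""
    text = title.split(":", 1)[0]           # keep only the part before a colon
    text = YEAR.sub(" ", text)              # strip year markers
    text = " ".join(text.lower().split())   # lowercase, collapse whitespace
    for lead in LEADWORDS:
        if text.startswith(lead + " "):     # drop one leading leadword
            text = text[len(lead) + 1:]
            break
    return text.rstrip(" ?!.,;-|").strip()  # strip trailing punctuation


print(normalise_title(
    "Master Real Estate Video Editing in 2026: AI Workflow Step-by-Step"
))
```

The steps run in a different order from the walkthrough above (colon split first), which gives the same result for that example.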
4. Google Search Console queries (when connected)
If you’ve authorised GSC, the top queries you get impressions for in the last 28 days flow in as seeds. These are the strongest signal because Google has already shown your site for them.
Phase B — Claude supplement (only when thin)
If after combining all 4 real sources the total seed count is under 25, we run a single Sonnet call that generates 15-20 additional on-brand seed phrases based on the brand profile.
This kicks in for brand-new sites with 1-3 pages and no GSC data. Once the brand publishes a few articles, the real-data sources fill out and Phase B stops triggering.
Variation expansion
For each seed (up to 100, prioritised by source), we call DataForSEO keyword_suggestions/live:
seed: "virtual staging software"
returns up to 30 variations like:
• "virtual staging software for realtors"
• "best virtual staging software"
• "free virtual staging software"
• "virtual staging software cost"
• "virtual staging software comparison"
...
each with searchVolume, keywordDifficulty, cpc already attached
Total raw candidate pool: typically 1,500-3,000 phrases.
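The seed selection that feeds this step can be sketched as follows. The source labels and function names are hypothetical, and placing the Claude supplement last in the priority order is an assumption; only the source order, the 25-seed threshold, and the 100-seed cap come from the text.

```python
# Priority order of the four real sources as listed above.
# Putting "claude_supplement" last is an assumption.
SOURCE_PRIORITY = ["profile", "self_ranked", "page_title", "gsc", "claude_supplement"]


def merge_seeds(seeds_by_source: dict[str, list[str]]) -> list[str]:
    """Merge seeds in priority order, dropping case/whitespace duplicates."""
    seen, merged = set(), []
    for source in SOURCE_PRIORITY:
        for seed in seeds_by_source.get(source, []):
            key = " ".join(seed.lower().split())
            if key and key not in seen:
                seen.add(key)
                merged.append(key)
    return merged


def needs_supplement(real_seed_count: int, threshold: int = 25) -> bool:
    """Phase B only triggers when the real sources yield fewer than 25 seeds."""
    return real_seed_count < threshold


def seeds_for_expansion(merged: list[str], max_seeds: int = 100) -> list[str]:
    """Cap the number of seeds sent to keyword_suggestions."""
    return merged[:max_seeds]
```

Because the merged list is already in priority order, a plain slice keeps the highest-priority seeds when the cap applies.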
Hard filters
We drop candidates that fail any of:
- KD > 30 — too competitive for a typical brand to rank
- Volume < 30 — not enough traffic to be worth an article
- Branded — contains the brand’s own domain root (e.g. “bright-shot login” is dropped)
- Garbage — contains URL-encoded characters or weird symbols
- Volume > 10,000 with KD < 2 — almost always a navigational mega-term where DataForSEO couldn’t compute proper KD
Result: typically 50-200 candidates remain.
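The five rules above can be sketched as a single predicate. The `Candidate` type, the garbage regex, and the hyphen-insensitive brand match are assumptions made for illustration; the thresholds are the ones listed above.

```python
import re
from dataclasses import dataclass


@dataclass
class Candidate:
    keyword: str
    volume: int
    kd: float


# Assumed reading of "URL-encoded characters or weird symbols":
# a %XX escape, or any character outside letters, digits, whitespace
# and a short allow-list of punctuation.
GARBAGE = re.compile(r"%[0-9A-Fa-f]{2}|[^\w\s'&.+-]")


def _squash(text: str) -> str:
    """Lowercase and keep only a-z0-9 so 'bright-shot' matches 'brightshot'."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def passes_hard_filters(c: Candidate, brand_root: str,
                        max_kd: float = 30, min_volume: int = 30) -> bool:
    root = _squash(brand_root)
    if c.kd > max_kd:                           # too competitive
        return False
    if c.volume < min_volume:                   # not enough traffic
        return False
    if root and root in _squash(c.keyword):     # branded
        return False
    if GARBAGE.search(c.keyword):               # garbage
        return False
    if c.volume > 10_000 and c.kd < 2:          # navigational mega-term
        return False
    return True
```

For example, `passes_hard_filters(Candidate("bright-shot login", 90, 3), "bright-shot")` is rejected by the branded rule even though its KD and volume are within limits.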
Word-set dedup
DataForSEO often returns the same concept multiple times with words in different order or filler words added:
"360 virtual tour software"
"virtual tour software 360"
"software for 360 virtual tour"
"virtual tour 360 software"
"virtual 360 tour software"
↓ word-set dedup (sort words, strip stop words)
"360 software tour virtual"
→ keep the highest-volume variant
The dedup key sorts words alphabetically AND strips stop words (of, for, to, your, the, etc.) so cosmetic variants collapse:
"real estate aerial photos" → "aerial estate photos real"
"aerial photos of real estate" → "aerial estate photos real" ← match
"aerial photos for real estate" → "aerial estate photos real" ← match
Typical reduction: 50-200 → 30-150.
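A minimal sketch of the dedup key and the keep-highest-volume rule. The stop-word set is an assumption beyond the five words the text names ("of", "for", "to", "your", "the"), and the function names are illustrative.

```python
# The text lists of/for/to/your/the "etc."; the remaining entries are assumed.
STOP_WORDS = {"of", "for", "to", "your", "the", "a", "an", "in", "and"}


def dedup_key(keyword: str) -> str:
    """Sort the distinct non-stop words alphabetically."""
    words = {w for w in keyword.lower().split() if w not in STOP_WORDS}
    return " ".join(sorted(words))


def dedup_by_word_set(candidates: list[tuple[str, int]]) -> list[tuple[str, int]]:
    """candidates are (keyword, volume) pairs; keep the top-volume variant per key."""
    best: dict[str, tuple[str, int]] = {}
    for keyword, volume in candidates:
        key = dedup_key(keyword)
        if key not in best or volume > best[key][1]:
            best[key] = (keyword, volume)
    return list(best.values())


print(dedup_key("aerial photos of real estate"))   # aerial estate photos real
print(dedup_key("software for 360 virtual tour"))  # 360 software tour virtual
```

Using a set of words means a phrase that repeats a word collapses to the same key as one that does not, which is one reasonable reading of "word-set".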
Claude relevance scoring
We send the surviving candidates to Claude Haiku in batches of 150, asking for a 0-1 relevance score against the brand profile. The prompt is calibrated to be honest — Haiku is told that a real estate SaaS shouldn’t rate “best laptop for travel” at 0.5 just because the audience uses laptops.
Candidates scoring below 0.6 get dropped. Typical floor pass: 30-150 → 12-30.
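The batching and floor logic can be sketched like this. `score_batch` is a hypothetical callable standing in for the Haiku call; treating a keyword with no returned score as irrelevant is also an assumption.

```python
from typing import Callable


def batched(items: list[str], size: int = 150) -> list[list[str]]:
    """Split candidates into batches of at most `size`."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def filter_by_relevance(
    candidates: list[str],
    score_batch: Callable[[list[str]], dict[str, float]],
    floor: float = 0.6,
    batch_size: int = 150,
) -> dict[str, float]:
    """Score each batch and keep keywords at or above the floor."""
    kept: dict[str, float] = {}
    for batch in batched(candidates, batch_size):
        scores = score_batch(batch)            # one model call per batch
        for keyword in batch:
            score = scores.get(keyword, 0.0)   # missing score counts as 0 (assumption)
            if score >= floor:                 # "below 0.6 get dropped"
                kept[keyword] = score
    return kept
```

A keyword scoring exactly 0.6 survives, which matches the worked example further down where a 0.60 keyword is in the final pool.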
Cascade fallback
If after relevance scoring the pool is under 15 keywords, we cascade:
1. Take the top 10 keywords from the current pool as new seeds
2. Run keyword_suggestions on them
3. Apply hard filters + dedup to new candidates
4. Score relevance for new ones
5. Add to pool
Repeat up to 2 rounds if still under 15.
Each cascade round opens a new keyword neighborhood. “Best virtual staging software” as a seed surfaces variations like “virtual staging software comparison” and “virtual staging for realtors near me” — terms the original seeds didn’t reach.
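The cascade loop can be sketched as below. `expand` is a hypothetical callable that wraps steps 2 to 4 (suggestions, hard filters, dedup, relevance) and returns already-scored survivors; ranking the pool by interim score to pick the top 10 is an assumption.

```python
from typing import Callable


def cascade(
    pool: dict[str, float],
    expand: Callable[[list[str]], dict[str, float]],
    min_target: int = 15,
    max_rounds: int = 2,
    seeds_per_round: int = 10,
) -> tuple[dict[str, float], int]:
    """Grow a thin pool; returns the new pool and the number of rounds run."""
    pool = dict(pool)  # do not mutate the caller's pool
    rounds = 0
    while len(pool) < min_target and rounds < max_rounds:
        # Step 1: top keywords by score become the new seeds.
        seeds = sorted(pool, key=pool.get, reverse=True)[:seeds_per_round]
        # Steps 2-4 happen inside expand().
        for keyword, score in expand(seeds).items():
            pool.setdefault(keyword, score)  # step 5: add only unseen keywords
        rounds += 1
    return pool, rounds
```

The loop stops as soon as the pool reaches the target or the round limit is hit, whichever comes first, so a pool that is already large enough costs nothing extra.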
SERP rank-gap validation
For the top 50 candidates by interim score, we call DataForSEO SERP and inspect the top 10. If the SERP is dominated by mega-domains (Wikipedia, Amazon, NYT), the rank gap is penalised. If the top 10 contains weaker domains that a low-DR brand could realistically outrank, the rank gap score is boosted.
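The text does not give the rank-gap formula, so the following is purely an illustrative sketch of the idea: a multiplier around 1.0 that drops as mega-domains fill the top 10 and rises as weaker domains do. The domain list, the `dr_margin` and `weight` parameters, and the linear form are all assumptions.

```python
# Illustrative list only; the real list is not given in the text.
MEGA_DOMAINS = {"wikipedia.org", "amazon.com", "nytimes.com"}


def _is_mega(domain: str) -> bool:
    d = domain.lower()
    return any(d == m or d.endswith("." + m) for m in MEGA_DOMAINS)


def rank_gap(top10: list[tuple[str, float]], brand_dr: float,
             dr_margin: float = 10, weight: float = 0.3) -> float:
    """top10 holds (domain, domain_rating) pairs for the first results page.

    Returns a multiplier: below 1.0 when mega-domains dominate,
    above 1.0 when most results are within reach of the brand.
    """
    if not top10:
        return 1.0  # no SERP data: stay neutral
    mega = sum(1 for domain, _ in top10 if _is_mega(domain))
    weak = sum(
        1 for domain, dr in top10
        if not _is_mega(domain) and dr <= brand_dr + dr_margin
    )
    return 1.0 + weight * (weak - mega) / len(top10)
```

With these assumed defaults the multiplier ranges from 0.7 (ten mega-domains) to 1.3 (ten beatable domains).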
Composite opportunity score
Every candidate gets a final score 0-100:
score = composeScore(volume, kd, relevance, rankGap)
where:
volC = min(4.7, log10(volume + 1)) // capped volume
kdC = log10(max(5, kd) + 2) // floored KD
raw = (volC / kdC) × relevance × rankGap
score = clamp(0, 100, round(raw / 3 × 100))
The volume cap of 4.7 (≈ log10(50,000)) prevents generic mega-keywords from dominating. The KD floor of 5 prevents candidates with KD 0 or 1 from exploding the score, since kdC sits in the denominator. The result is a balanced metric that rewards real opportunities over fake unicorns.
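The formula translates directly into a few lines of Python. This is a transcription of the pseudocode above; the snake_case names are the only change, and a `rank_gap` of 1.0 in the examples is an assumed neutral value.

```python
import math


def compose_score(volume: int, kd: float, relevance: float, rank_gap: float) -> int:
    vol_c = min(4.7, math.log10(volume + 1))   # capped volume term
    kd_c = math.log10(max(5, kd) + 2)          # floored KD term
    raw = (vol_c / kd_c) * relevance * rank_gap
    return max(0, min(100, round(raw / 3 * 100)))


# KD floor: KD 0 and KD 5 score the same.
print(compose_score(1_000, 0, 1.0, 1.0) == compose_score(1_000, 5, 1.0, 1.0))
# Volume cap: above the cap, extra volume no longer raises the score.
print(compose_score(100_000, 10, 0.5, 1.0) == compose_score(5_000_000, 10, 0.5, 1.0))
```

One detail to watch when porting: Python's `round` rounds exact halves to the nearest even integer, which can differ by one point from a round-half-up implementation on rare boundary values.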
Editorial judgment pass (Opus)
For the final polish, we send the top 30 candidates to Claude Opus 4.7 with the brand profile and ask: which 5-10 of these would a senior content strategist write articles for FIRST?
Opus considers:
- Intent alignment — does the searcher actually want what the brand sells?
- Brand-defensibility — will the article naturally showcase the brand’s product?
- Funnel position — problem-aware and solution-aware over fully top-of-funnel
- Editorial freshness — skip near-duplicates that survived word-set dedup
Picked keywords get a +20 opportunity score boost so they surface at the very top of the pool. Each pick gets a short editorial reason (“strong commercial intent, low competition, fits virtual-staging buyer journey”).
Worked example: BrightShot first refresh
For bright-shot.com (DR 23, 100 scraped pages), one refresh surfaced:
| Score | KD | Volume | Relevance | Keyword |
|---|---|---|---|---|
| 83 | 1 | 1,300 | 0.60 | decluttering house for sale |
| 79 | 5 | 720 | 0.75 | real estate hdr photography |
| 77 | 6 | 880 | 0.75 | real estate aerial photos |
| 66 | 10 | 170 | 0.95 | virtual real estate staging software |
| 57 | 0 | 30 | 0.90 | 360 degree virtual tour software |
| 52 | 19 | 140 | 0.85 | virtual tour platforms |
| 50 | 19 | 50 | 0.95 | best virtual staging software for real estate |
| 49 | 12 | 50 | 0.80 | videotour ai reviews |
| … | … | … | … | … |
18 keywords total. Distribution by source: 7 from competitor-ranked (in this run, keywords the brand’s own domain already ranks for), 10 from variation, 1 from page-h1. All on-brand. All KD ≤ 30. Editorial pass marked 8 as top picks.
Cost and timing
Steady-state, established brand:
DataForSEO
keywords_for_site × 1 $0.013
keyword_suggestions × 100 + items $1.50
SERP × 50 (rank-gap) $0.10
bulk_search_volume × 1 $0.075
Claude
Haiku relevance × ~5 batches $0.04
Opus editorial × 1 $0.12
Embeddings (semantic dedup if enabled) $0.0001
─────────────────────────────────────────────────────────
Total per refresh ~$1.85
Duration: 30-60 seconds.
For thin brands (cascade triggers): +$0.50 per cascade round.
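The arithmetic behind these figures can be checked with a short sketch. The line-item amounts are copied from the breakdown above; the function name is illustrative.

```python
# Per-refresh line items in USD, as listed in the breakdown above.
LINE_ITEMS_USD = {
    "keywords_for_site x 1": 0.013,
    "keyword_suggestions x 100 + items": 1.50,
    "SERP x 50 (rank-gap)": 0.10,
    "bulk_search_volume x 1": 0.075,
    "Haiku relevance x ~5 batches": 0.04,
    "Opus editorial x 1": 0.12,
    "Embeddings (semantic dedup if enabled)": 0.0001,
}
CASCADE_ROUND_USD = 0.50  # extra cost per cascade round
MAX_CASCADE_ROUNDS = 2


def refresh_cost(cascade_rounds: int = 0) -> float:
    """Estimated cost of one refresh, rounded to cents."""
    if not 0 <= cascade_rounds <= MAX_CASCADE_ROUNDS:
        raise ValueError("cascade_rounds must be between 0 and 2")
    total = sum(LINE_ITEMS_USD.values()) + cascade_rounds * CASCADE_ROUND_USD
    return round(total, 2)


print(refresh_cost())   # 1.85
print(refresh_cost(2))  # 2.85
```

The line items sum to $1.8481, which rounds to the ~$1.85 total, and a thin brand that uses both cascade rounds lands at about $2.85.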
Configuration
Default thresholds, overridable per-brand:
maxKeywordDifficulty: 30
minSearchVolume: 30
relevanceFloor: 0.6
maxSeeds: 100 (capped to control variation API spend)
cascadeMinTarget: 15
maxCascadeRounds: 2
editorialPickCount: 8
Higher-DR brands (DR > 50) can lift maxKeywordDifficulty to 40-50 to chase more competitive terms — at that authority, KD 45 is plausibly rankable.
Limits
- Maximum 100 seeds expanded per refresh
- Maximum 30 variations per seed
- Maximum 50 SERP calls per refresh (rank-gap window)
- Maximum 2 cascade rounds (prevents runaway cost on thin pools)
Diagnostics
Every refresh logs counts at each stage:
{
"seedsFromPages": 128,
"seedsFromSelfRanked": 36,
"seedsFromProfile": 15,
"seedsFromGsc": 16,
"seedsFromClaudeSupplement": 0,
"totalSeedsAfterDedupe": 195,
"totalSeedsUsedForVariations": 100,
"rawCandidatePool": 1832,
"afterHardFilters": 287,
"afterWordSetDedup": 198,
"afterRelevance": 24,
"cascadeRoundsRan": 0,
"afterRankGap": 24,
"editorialPicks": 8,
"costUsd": 1.83,
"requests": 152,
"durationMs": 47000
}
The dashboard surfaces these so you can debug a thin pool (e.g. relevance scoring is too strict) or a failed refresh (e.g. DataForSEO 5xx).
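When debugging a thin pool, it can help to turn the stage counts into retention rates. The sketch below does that for the sample log above; the stage ordering follows the pipeline as described, and the function name is illustrative.

```python
# Candidate-count stages in pipeline order, using the logged field names.
STAGES = [
    "rawCandidatePool",
    "afterHardFilters",
    "afterWordSetDedup",
    "afterRelevance",
    "afterRankGap",
]


def funnel(diag: dict) -> list[tuple[str, int, float]]:
    """Return (stage, count, share kept from the previous stage)."""
    rows = []
    for prev, cur in zip(STAGES, STAGES[1:]):
        kept = diag[cur] / diag[prev] if diag[prev] else 0.0
        rows.append((cur, diag[cur], round(kept, 3)))
    return rows


sample = {
    "rawCandidatePool": 1832,
    "afterHardFilters": 287,
    "afterWordSetDedup": 198,
    "afterRelevance": 24,
    "afterRankGap": 24,
}
for stage, count, kept in funnel(sample):
    print(f"{stage:<20} {count:>5} {kept:.1%}")
```

In the sample log, relevance scoring keeps about 12% of its input (24 of 198), so it is the first place to look if a pool comes out thin.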