Improve search: multi-term AND + relevance ranking (FTS spike) #95
rdhyee merged 2 commits into isamplesorg:main
Search input was passed into ILIKE patterns with only single-quote escaping, so a literal "%" or "_" in the query (e.g. "100%", "co_op") silently turned into wildcards. Escape % _ \ and add ESCAPE '\' in both whereClause and the relevance-score expression.

Also reframe tools/build_fts_index.py as a spike artifact: the docstring told readers to upload the index to data.isamples.org, but per PR isamplesorg#95 findings the 200-358 MB result is too large to ship. Mark the script as NOT in the production pipeline and drop the misleading upload instructions.

Smoke-tested locally with /tmp/explorer_smoke_test.py (multi-term "pottery cyprus" + wildcard "100%"): 0 JS exceptions, 0 console errors, 0 failed requests.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
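The escaping rule in this commit can be illustrated with a minimal Python sketch. This is not the explorer's code: the helper names (escape_like, sql_literal, contains_clause, search) and the sample rows are hypothetical. Python's built-in sqlite3 has no ILIKE, so the demo passes op="LIKE" as a stand-in (SQLite's LIKE is case-insensitive for ASCII by default); the generated pattern and the ESCAPE '\' clause have the same shape as an ILIKE version.

```python
import sqlite3

def escape_like(term: str) -> str:
    """Escape LIKE/ILIKE metacharacters so they match literally (escape char: backslash)."""
    # Escape the backslash first; otherwise the backslashes added for % and _
    # would themselves get doubled.
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")

def sql_literal(s: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + s.replace("'", "''") + "'"

def contains_clause(column: str, term: str, op: str = "ILIKE") -> str:
    """Build `<column> <op> '%<term>%' ESCAPE '\\'` with the term escaped."""
    pattern = "%" + escape_like(term) + "%"
    return f"{column} {op} {sql_literal(pattern)} ESCAPE '\\'"

# Demo against an in-memory SQLite table (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (label TEXT)")
conn.executemany(
    "INSERT INTO samples VALUES (?)",
    [("100% cotton",), ("1000 items",), ("co_op site",), ("co-op site",), ("O'Brien collection",)],
)

def search(term: str) -> list[str]:
    where = contains_clause("label", term, op="LIKE")
    sql = f"SELECT label FROM samples WHERE {where} ORDER BY label"
    return [row[0] for row in conn.execute(sql)]

print(search("100%"))   # "%" is literal, so "1000 items" is excluded
print(search("co_op"))  # "_" is literal, so "co-op site" is excluded
```

Without the escaping, the pattern '%100%%' would also match "1000 items", which is exactly the silent-wildcard bug the commit describes.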
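Reviewed and pushed two small follow-ups (134aca2):

1. ILIKE wildcard escaping. Search input was passed into the ILIKE patterns with only single-quote escaping, so a literal "%" or "_" in the query silently turned into a wildcard.
2. FTS spike script header. tools/build_fts_index.py is now marked as a spike artifact, not part of the production pipeline.

Smoke test (/tmp/explorer_smoke_test.py). Exercised: initial load, multi-term search ("pottery cyprus"), …

Other notes from review (not blocking):

LGTM to merge once you've eyeballed the diff.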
Branch updated from 134aca2 to 7623ff5.
Search improvements (immediate):
- Multi-term search: "pottery Cyprus" requires BOTH words to match
- Relevance ranking: label matches weighted 3x, place 2x, description 1x
- Results sorted by relevance score when searching (random order when browsing)

FTS spike (future path, documented):
- Added tools/build_fts_index.py to build a DuckDB FTS index offline
- Tested: 358 MB full index, 211 MB lite; both too large for auto-download
- BM25 scoring works correctly (Porter stemming, stopwords)
- Next step: explore smaller index strategies or on-demand loading

Closes isamplesorg#84 (spike complete; findings documented in PR)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
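The multi-term AND and the 3x/2x/1x weighting can be sketched as generated SQL. This is a minimal illustration under stated assumptions, not the explorer's actual query builder: build_search_sql, like_literal, the samples table, and the label/place/description column names are hypothetical, and a term is assumed to count as matching if it appears in any of the three columns. The score uses CASE WHEN instead of multiplying booleans, so it does not depend on implicit BOOLEAN-to-integer casts. The demo runs against Python's built-in sqlite3 with op="LIKE", since SQLite has no ILIKE.

```python
import sqlite3

# Hypothetical column weights mirroring "label 3x, place 2x, description 1x".
WEIGHTS = {"label": 3, "place": 2, "description": 1}

def like_literal(term: str) -> str:
    """Return '%term%' as a SQL literal with LIKE metacharacters and quotes escaped."""
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return "'%" + escaped.replace("'", "''") + "%'"

def build_search_sql(query: str, op: str = "ILIKE") -> str:
    terms = query.split()
    if not terms:
        # Browsing (no query): no filter, random order.
        return "SELECT * FROM samples ORDER BY random()"
    where_parts, score_parts = [], []
    for term in terms:
        pattern = like_literal(term)
        hits = {col: f"{col} {op} {pattern} ESCAPE '\\'" for col in WEIGHTS}
        # OR across columns: the term must appear in at least one of them.
        where_parts.append("(" + " OR ".join(hits.values()) + ")")
        # Each column the term appears in adds that column's weight.
        score_parts += [f"CASE WHEN {hits[col]} THEN {w} ELSE 0 END" for col, w in WEIGHTS.items()]
    # AND across terms: every term must match somewhere.
    where = " AND ".join(where_parts)
    score = " + ".join(score_parts)
    return (f"SELECT *, ({score}) AS relevance FROM samples "
            f"WHERE {where} ORDER BY relevance DESC")

# Demo against an in-memory SQLite table (hypothetical rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE samples (label TEXT, place TEXT, description TEXT)")
conn.executemany("INSERT INTO samples VALUES (?, ?, ?)", [
    ("Pottery sherd", "Cyprus", "Red slip ware"),
    ("Bone fragment", "Cyprus", "Found near a pottery kiln"),
    ("Pottery vessel", "Greece", "Cypriot style"),  # no "cyprus" anywhere: excluded
    ("Cyprus pottery survey", None, None),
])
results = conn.execute(build_search_sql("pottery cyprus", op="LIKE")).fetchall()
for label, place, description, relevance in results:
    print(relevance, label)
```

Sorting by the summed score puts rows where the terms hit the label ahead of rows where they only appear in the place or description; in this sketch a term found in several columns is counted once per column, and a NULL column simply contributes 0.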
Summary
Closes #84 — FTS spike complete with immediate search improvements and documented future path.
Shipped now (zero new dependencies):
- `%`, `_`, or `\` in the query are escaped, so they match literal characters instead of acting as wildcards

FTS spike findings:
- `tools/build_fts_index.py` (preserved as a non-production spike artifact, clearly marked in its module docstring)
- `ATTACH` over HTTP in DuckDB-WASM is supported, but downloading 200–358 MB is impractical

Recommended next steps (not in this PR):
Test plan
- `tools/build_fts_index.py` runs successfully with local parquet (spike artifact, not part of the deploy)

🤖 Generated with Claude Code