From cbfc3f52c0fdfab7ba83b60115bff8c370ba2d76 Mon Sep 17 00:00:00 2001
From: zhishu
Date: Tue, 26 May 2026 10:23:40 +0800
Subject: [PATCH] feat: add pg_tiktoken_c - pure C tiktoken extension, 1700x faster than Rust version

- Pure C implementation of OpenAI tiktoken BPE tokenizer for PostgreSQL
- 1700x faster than pg_tiktoken (Rust/pgrx) for typical inputs
- Encoder cached in TopMemoryContext per backend (no per-call init overhead)
- Supports tiktoken_count(), tiktoken_encode(), chunk_text_table() for RAG
- Compatible with PostgreSQL 13-17, Apache 2.0 license
- GitHub: https://github.com/relytcloud/pg_tiktoken_c
---
 data/pigsty.csv | 1 +
 1 file changed, 1 insertion(+)

diff --git a/data/pigsty.csv b/data/pigsty.csv
index b68fe6e..24ba15c 100644
--- a/data/pigsty.csv
+++ b/data/pigsty.csv
@@ -38,6 +38,7 @@ id,name,alias,category,url,license,tags,version,repo,lang,utility,lead,has_solib
 1850,smlar,smlar,RAG,https://github.com/jirutka/smlar,PostgreSQL,{nil-lic},1.0,PIGSTY,C,f,t,t,t,f,,t,,"{17,16,15,14,13}",,1.0,PIGSTY,smlar_$v*,"{17,16,15,14,13}",,1.0,PIGSTY,postgresql-$v-smlar,,"{17,16,15,14,13}",,Effective similarity search,高效的相似度搜索函数,"fix math.abs, gist pointer problem"
 1860,pg_summarize,pg_summarize,RAG,https://github.com/HexaCluster/pg_summarize,PostgreSQL,{pgrx},0.0.1,PIGSTY,Rust,f,t,t,t,f,f,f,,"{17,16,15,14,13}",,0.0.1,PIGSTY,pg_summarize_$v,"{17,16,15,14,13}",,0.0.1,PIGSTY,postgresql-$v-pg-summarize,,"{17,16,15,14,13}",,Text Summarization using LLMs. Built using pgrx,使用LLM对文本字段进行总结,pgrx=0.12.4
 1870,pg_tiktoken,pg_tiktoken,RAG,https://github.com/kelvich/pg_tiktoken,Apache-2.0,{pgrx},0.0.1,PIGSTY,Rust,f,t,t,t,f,f,f,,"{17,16,15,14,13}",,0.0.1,PIGSTY,pg_tiktoken_$v,"{17,16,15,14,13}",,0.0.1,PIGSTY,postgresql-$v-pg-tiktoken,,"{17,16,15,14,13}",,tiktoken tokenizer for use with OpenAI models in postgres,在PostgreSQL中计算OpenAI使用的Token数,pgrx=0.12.6
+1875,pg_tiktoken_c,pg_tiktoken_c,RAG,https://github.com/relytcloud/pg_tiktoken_c,Apache-2.0,,1.1,NONE,C,f,t,t,t,f,f,f,,"{17,16,15,14,13}",,,,,,,,,,,,"tiktoken tokenizer for PostgreSQL in pure C, 1700x faster than pg_tiktoken (Rust/pgrx)","纯C实现的PostgreSQL tiktoken分词器,比Rust版本快1700倍,支持RAG文档切分与Token计数",
 1880,pg4ml,pg4ml,RAG,https://gitee.com/guotiecheng/plpgsql_pg4ml,AGPL-3.0,,2.0,PIGSTY,C,f,t,f,t,f,t,t,,"{17,16,15,14,13}","{plpgsql,tablefunc,cube,plpython3u}",2.0,PIGSTY,pg4ml_$v,"{17,16,15,14,13}",,2.0,PIGSTY,postgresql-$v-pg4ml,,"{17,16,15,14,13}",,Machine learning framework for PostgreSQL,PG4ML是一个机器学习框架,require python3
 1890,pgml,pgml,RAG,https://github.com/postgresml/postgresml,MIT,{pgrx},2.10.0,PIGSTY,Rust,f,t,t,t,t,f,f,{pgml},"{17,16,15,14}",,2.10.0,PIGSTY,pgml_$v,"{17,16,15,14,13}",,2.10.0,PIGSTY,postgresql-$v-pgml,,"{17,16,15,14}",,Run AL/ML workloads with SQL interface,PostgresML:用SQL运行机器学习算法并训练模型,pgrx=0.12.9
 2100,pg_search,pg_search,FTS,https://github.com/paradedb/paradedb/tree/dev/pg_search,AGPL-3.0,{pgrx},0.16.0,PIGSTY,Rust,f,t,t,t,f,t,f,{paradedb},"{17,16,15,14}",,0.15.18,PIGSTY,pg_search_$v,"{17,16,15,14}",,0.16.0,PIGSTY,postgresql-$v-pg-search,,"{17,16,15,14}",,Full text search for PostgreSQL using BM25,ParadeDB BM25算法全文检索插件,ES全文检索,pgrx=0.14.1 0.15.19+ has broken libicu on el systems