[ai-assisted] feat(chunking): Add Blockify chunking PoC #515
Open
donghyuck wants to merge 1 commit into
Issue:
- Closes #514

Why:
- A blockify strategy was needed so that question-and-answer-centered Knowledge Block chunking can be validated as a PoC in Markdown-based RAG.

What:
- Added a blockify strategy to ChunkingStrategyType and allowed it in Markdown pipeline option validation.
- Added an opt-in BlockifyChunker, a BlockifyGenerator port, a deterministic heuristic generator, and blockify configuration metadata to the chunking starter.
- Stored the blockify metadata schema, fingerprint, requested/actual strategy, source evidence, and section fallback information in chunk metadata.
- Added tests for the blockify-disabled error, fallback, fingerprint stability, auto-configuration, and the Markdown option.
- Updated the README and CHANGELOG.

Validation:
- ./gradlew :studio-platform-chunking:test :starter:studio-platform-starter-chunking:test :studio-platform-markdown:test :starter:studio-platform-starter-markdown:test: PASS
- git diff --check: PASS
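The commit message mentions a fingerprint whose stability is covered by tests. A minimal sketch of how such a fingerprint could be derived follows. It assumes SHA-256 over the source text, prompt version, and generator model; the class name, method names, and the "v1" / "heuristic" values are hypothetical and not taken from the PR. Only the metadata key names are borrowed from the PR description.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration; class and method names are not taken from the PR.
final class BlockifyFingerprintSketch {

    // Hash every input that should change the fingerprint. A zero byte after
    // each part keeps ("ab", "c", ...) distinct from ("a", "bc", ...).
    static String fingerprint(String sourceText, String promptVersion, String generatorModel) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            for (String part : new String[] {sourceText, promptVersion, generatorModel}) {
                digest.update(part.getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0);
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    // Builds a metadata map using key names from the PR description.
    // The prompt version and model values are assumed placeholders.
    static Map<String, Object> metadata(String sourceText, String actualStrategy) {
        Map<String, Object> metadata = new LinkedHashMap<>();
        metadata.put("requestedChunkingStrategy", "blockify");
        metadata.put("actualChunkingStrategy", actualStrategy);
        metadata.put("promptVersion", "v1");
        metadata.put("generatorModel", "heuristic");
        metadata.put("blockifyFingerprint", fingerprint(sourceText, "v1", "heuristic"));
        return metadata;
    }

    public static void main(String[] args) {
        System.out.println(metadata("# Title\nBody", "blockify"));
    }
}
```

Because the hash depends only on its inputs, re-chunking an unchanged document with the same prompt version and model yields the same fingerprint, which is the property a stability test would check.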
Why
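A blockify strategy is needed so that question-and-answer-centered Knowledge Block chunking can be validated as a PoC in Markdown-based RAG. Its Projection is kept separate from the existing structure-based one over the same source, so that retrieval quality, answer quality, processing performance, and cost can be compared.

What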
- Added BLOCKIFY("blockify") to ChunkingStrategyType and allowed chunkingStrategy=blockify in the Markdown pipeline options.
- Added an opt-in BlockifyChunker, a BlockifyGenerator port, and a deterministic HeuristicBlockifyGenerator to starter-chunking.
- Stores schemaVersion, requestedChunkingStrategy, actualChunkingStrategy, blockifyFingerprint, sourceEvidence, validationStatus, promptVersion, generatorModel, and related fields.
- … is preserved as a structure-based fallback chunk.
- Added studio.chunking.blockify.* configuration metadata, README and CHANGELOG updates, and unit, auto-configuration, and Markdown option tests.

Related Issues
Validation
- ./gradlew :studio-platform-chunking:test :starter:studio-platform-starter-chunking:test :studio-platform-markdown:test :starter:studio-platform-starter-markdown:test
- git diff --check

Risk / Rollback
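- In environments that set studio.chunking.blockify.enabled=true, the output of the PoC heuristic generator may differ from what a production LLM would be expected to produce. The default is disabled, so existing strategies are unaffected.
- The blockify strategy, its settings, and its metadata generation path are removed, restoring the existing fixed-size, recursive, and structure-based behavior.

AI / Subagent Usage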
- … ran git diff --check.

Checklist
- AI-Assisted value is correct
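The opt-in gate and the structure-based fallback described in this PR could behave roughly as sketched below. This is an illustration under stated assumptions, not the PR's implementation: the class, the Chunk type, and the generator signature are hypothetical, and the fallback condition (the generator produces no blocks for a section) is assumed because the description does not spell it out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration; class and method names are not taken from the PR.
final class BlockifyOptInSketch {

    // Minimal chunk that records the requested and the actually applied strategy.
    static final class Chunk {
        final String text;
        final String requestedStrategy;
        final String actualStrategy;

        Chunk(String text, String requestedStrategy, String actualStrategy) {
            this.text = text;
            this.requestedStrategy = requestedStrategy;
            this.actualStrategy = actualStrategy;
        }
    }

    static List<Chunk> chunk(String section, boolean blockifyEnabled,
                             Function<String, List<String>> generator) {
        // Opt-in: requesting blockify while the feature is disabled is an error.
        if (!blockifyEnabled) {
            throw new IllegalStateException(
                    "blockify was requested but studio.chunking.blockify.enabled is not true");
        }
        List<String> blocks = generator.apply(section);
        // Assumed fallback condition: no blocks were generated for this section,
        // so the section itself is kept and labeled structure-based.
        if (blocks == null || blocks.isEmpty()) {
            return Collections.singletonList(new Chunk(section, "blockify", "structure-based"));
        }
        List<Chunk> chunks = new ArrayList<>();
        for (String block : blocks) {
            chunks.add(new Chunk(block, "blockify", "blockify"));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Chunk> fallback = chunk("## Intro\nNo Q&A here.", true, s -> Collections.emptyList());
        System.out.println(fallback.get(0).actualStrategy);
    }
}
```

Recording both the requested and the actual strategy on every chunk keeps fallback cases visible when comparing blockify against structure-based results.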