Commit 6d1237a

fix(security): cap KB document download size to prevent memory-exhaustion DoS
Knowledge-base ingestion downloaded an attacker-controlled external fileUrl with no byte cap: downloadFileFromUrl defaults maxBytes to MAX_SAFE_INTEGER, so the streaming reader buffered the entire response into memory. An authenticated user could OOM the processing worker by pointing fileUrl at a server that streams an unbounded body.

Wire the documented 100 MB file-size limit (MAX_FILE_SIZE) into the ingestion download helper. The existing stream limiter aborts the read once the cap is exceeded and rejects up front on an oversized Content-Length, so the body is never fully buffered.
1 parent 3c6c6b1 commit 6d1237a

1 file changed

Lines changed: 12 additions & 1 deletion

apps/sim/lib/knowledge/documents/document-processor.ts

@@ -22,6 +22,7 @@ import { retryWithExponentialBackoff } from '@/lib/knowledge/documents/utils'
 import { StorageService } from '@/lib/uploads'
 import { isInternalFileUrl } from '@/lib/uploads/utils/file-utils'
 import { downloadFileFromUrl } from '@/lib/uploads/utils/file-utils.server'
+import { MAX_FILE_SIZE } from '@/lib/uploads/utils/validation'
 import { mistralParserTool } from '@/tools/mistral/parser'
 
 const logger = createLogger('DocumentProcessor')
@@ -380,8 +381,18 @@ async function handleFileForOCR(
   }
 }
 
+/**
+ * Downloads an ingestion source file, enforcing the {@link MAX_FILE_SIZE} document
+ * limit. `maxBytes` aborts the streaming read once the cap is exceeded (and rejects
+ * up front on an oversized `Content-Length`), so an attacker-controlled `fileUrl`
+ * pointing at an unbounded body cannot exhaust the processing worker's memory.
+ */
 async function downloadFileWithTimeout(fileUrl: string, userId?: string): Promise<Buffer> {
-  return downloadFileFromUrl(fileUrl, { timeoutMs: TIMEOUTS.FILE_DOWNLOAD, userId })
+  return downloadFileFromUrl(fileUrl, {
+    timeoutMs: TIMEOUTS.FILE_DOWNLOAD,
+    maxBytes: MAX_FILE_SIZE,
+    userId,
+  })
 }
 
 async function downloadFileForBase64(fileUrl: string, userId?: string): Promise<Buffer> {
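
The commit relies on an existing stream limiter inside downloadFileFromUrl, whose implementation is not part of this diff. As a rough illustration of the two behaviours the commit message describes (reject up front on an oversized Content-Length, and abort the streaming read once the cap is exceeded), a minimal sketch might look like the following. The names readBodyWithCap and SizeLimitError are hypothetical and do not come from the repository; the sketch assumes a runtime with the standard Fetch API Response and ReadableStream (for example Node 18+).

```typescript
// Hypothetical illustration only: this is NOT the project's downloadFileFromUrl.
class SizeLimitError extends Error {}

async function readBodyWithCap(response: Response, maxBytes: number): Promise<Uint8Array> {
  // 1. Cheap pre-check: reject before reading anything if the server
  //    declares a body larger than the cap.
  const header = response.headers.get('content-length')
  const declared = header === null ? NaN : Number(header)
  if (Number.isFinite(declared) && declared > maxBytes) {
    await response.body?.cancel()
    throw new SizeLimitError(`Content-Length ${declared} exceeds cap of ${maxBytes} bytes`)
  }

  if (!response.body) return new Uint8Array(0)

  // 2. Streaming check: Content-Length can be absent or wrong, so count the
  //    bytes actually received and stop as soon as the cap is exceeded.
  const reader = response.body.getReader()
  const chunks: Uint8Array[] = []
  let received = 0
  while (true) {
    const result = await reader.read()
    if (result.done) break
    received += result.value.byteLength
    if (received > maxBytes) {
      await reader.cancel() // stop pulling data from the source
      throw new SizeLimitError(`Body exceeded cap of ${maxBytes} bytes`)
    }
    chunks.push(result.value)
  }

  // 3. Concatenate the buffered chunks; total size is at most maxBytes.
  const out = new Uint8Array(received)
  let offset = 0
  for (const chunk of chunks) {
    out.set(chunk, offset)
    offset += chunk.byteLength
  }
  return out
}
```

With a helper of this shape, memory use is bounded by roughly maxBytes plus one chunk, regardless of how much data the remote server tries to send. A caller would pass the result of fetch(fileUrl) together with the cap, which corresponds to MAX_FILE_SIZE in this commit.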
