Description
I've been having an issue where the 'convert.base64-encode' stream filter produces corrupt output under certain circumstances.
This code:
$streamUrl = "http://localhost/generate.pdf.php";
$streamSessionId = internal_function_here_gets_valid_session_id();
$streamContext = stream_context_create([
'http'=> [
'method'=>"GET",
'header'=>"Accept-language: en\r\n" .
"Cookie: ".session_name()."=$streamSessionId\r\n"
]
]);
$stream = fopen($streamUrl, 'r', false, $streamContext);
\stream_filter_append($stream, 'convert.base64-encode');
header('Content-Type: application/pdf');
echo base64_decode(\stream_get_contents($stream));
Produces this corrupt PDF:
bad.pdf
Here's the raw base64 that was produced:
$streamUrl = "http://localhost/generate.pdf.php";
$streamSessionId = internal_function_here_gets_valid_session_id();
$streamContext = stream_context_create([
'http'=> [
'method'=>"GET",
'header'=>"Accept-language: en\r\n" .
"Cookie: ".session_name()."=$streamSessionId\r\n"
]
]);
$stream = fopen($streamUrl, 'r', false, $streamContext);
\stream_filter_append($stream, 'convert.base64-encode');
header('Content-Type: text/plain');
echo \stream_get_contents($stream);
bad.pdf.base64.txt
By contrast, if I use the base64_encode() function rather than the stream filter, I get different (valid) results:
$streamUrl = "http://localhost/generate.pdf.php";
$streamSessionId = internal_function_here_gets_valid_session_id();
$streamContext = stream_context_create([
'http'=> [
'method'=>"GET",
'header'=>"Accept-language: en\r\n" .
"Cookie: ".session_name()."=$streamSessionId\r\n"
]
]);
$stream = fopen($streamUrl, 'r', false, $streamContext);
header('Content-Type: text/plain');
echo base64_encode(\stream_get_contents($stream));
good.pdf.base64.txt
And the actual PDF:
$streamUrl = "http://localhost/generate.pdf.php";
$streamSessionId = internal_function_here_gets_valid_session_id();
$streamContext = stream_context_create([
'http'=> [
'method'=>"GET",
'header'=>"Accept-language: en\r\n" .
"Cookie: ".session_name()."=$streamSessionId\r\n"
]
]);
$stream = fopen($streamUrl, 'r', false, $streamContext);
header('Content-Type: application/pdf');
echo \stream_get_contents($stream);
good.pdf
Note that the internal URL above generates the PDF dynamically (using PHP). I cannot replicate the issue using a static PDF served directly by Apache. Nevertheless, I don't think the generating URL is to blame, as it works fine so long as the stream filter is not applied. There is something about the way PHP interacts with itself over HTTP that triggers the bug.
Reducing the dynamic PDF generation endpoint proved to be a bit of a challenge, as seemingly unrelated things would affect whether the bug occurred or not. Eventually, though, I was able to reduce it to:
// This is generate.pdf.php
$pdf = file_get_contents('good.pdf');
header('Content-Type: application/pdf');
echo $pdf;
sleep(1);
The sleep() call at the end proved to be the trigger: if the HTTP connection doesn't close immediately, the stream filter's output gets corrupted.
A few observations about the corruption itself:
- The first 73,719 bytes of both PDFs are identical, but the files diverge completely from that point forward
- Likewise, the first 1,223,189 characters of both base64 strings are identical, but the strings diverge completely from that point forward
- The stream filter inserted two stray padding characters ('=') in the middle of the string, right at the point where things started to go wrong
- Despite those added padding characters, both base64 strings and both PDFs are the same size (the good base64 string has two padding characters in the normal place at the end of the string)
- Although PHP is able to decode the bad base64, other systems (such as Python) cannot
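The observations above can be checked mechanically. The following is a minimal sketch, not part of the original report: `firstDivergence()` and `interiorPadding()` are hypothetical helper names, and the inputs are assumed to be the contents of good.pdf.base64.txt and bad.pdf.base64.txt loaded as strings.

```php
<?php
// Hypothetical helpers for comparing a known-good and a suspect base64 string.

/** Returns the offset of the first differing byte, or null if the strings are identical. */
function firstDivergence(string $a, string $b): ?int
{
    $limit = min(strlen($a), strlen($b));
    for ($i = 0; $i < $limit; $i++) {
        if ($a[$i] !== $b[$i]) {
            return $i;
        }
    }
    // No difference within the common prefix: either equal, or one is a prefix of the other.
    return strlen($a) === strlen($b) ? null : $limit;
}

/** Returns the offsets of '=' characters that are followed by more base64 data. */
function interiorPadding(string $b64): array
{
    // Ignore legitimate padding (and line endings) at the very end of the string.
    $body = rtrim($b64, "=\r\n");
    $offsets = [];
    $pos = 0;
    while (($pos = strpos($body, '=', $pos)) !== false) {
        $offsets[] = $pos;
        $pos++;
    }
    return $offsets;
}
```

Calling `firstDivergence($good, $bad)` and `interiorPadding($bad)` on the two attached base64 files should show whether the stray padding sits exactly at the divergence point, as described in the list above.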
My best guess is that the bug has something to do with timing and/or buffering in the stream filter implementation.
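To illustrate why buffering could matter here, the sketch below contrasts two ways of base64-encoding data that arrives in chunks. It is only an illustration of the guess above, not PHP's actual filter implementation; the function names and the chunk split are assumptions made for the example. A streaming encoder has to hold back any bytes that do not fill a whole 3-byte group until more data arrives. If it flushes them early, padding lands mid-stream, and the result can still be exactly as long as the correct encoding, which would match the symptoms described above.

```php
<?php
/**
 * Encodes chunks while carrying leftover bytes across chunk boundaries,
 * so '=' padding can only appear at the very end.
 */
function encodeChunks(iterable $chunks): string
{
    $out = '';
    $carry = '';
    foreach ($chunks as $chunk) {
        $buffer = $carry . $chunk;
        // Encode only whole 3-byte groups; hold back the remainder.
        $usable = strlen($buffer) - (strlen($buffer) % 3);
        $out .= base64_encode(substr($buffer, 0, $usable));
        $carry = substr($buffer, $usable);
    }
    // Only this final flush may emit padding.
    return $out . base64_encode($carry);
}

/** Flushes at every chunk boundary, which can emit padding mid-stream. */
function encodeChunksNaively(iterable $chunks): string
{
    $out = '';
    foreach ($chunks as $chunk) {
        $out .= base64_encode($chunk);
    }
    return $out;
}

$data   = 'ABCDEFGHIJ';        // 10 bytes, so the correct encoding ends in "=="
$chunks = ['ABCD', 'EFGHIJ'];  // assumed split: the first chunk ends one byte past a 3-byte group

$good = encodeChunks($chunks);        // "QUJDREVGR0hJSg=="
$bad  = encodeChunksNaively($chunks); // "QUJDRA==RUZHSElK"

var_dump($good === base64_encode($data)); // bool(true)
var_dump($bad === base64_encode($data));  // bool(false)
var_dump(strlen($good) === strlen($bad)); // bool(true)
```

In this toy case the naive output has its two padding characters in the middle rather than at the end, yet both strings are 16 characters long.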
PHP Version
PHP 8.5.1 (cli) (built: Dec 16 2025 15:59:07) (NTS gcc x86_64)
Copyright (c) The PHP Group
Built by Amazon Linux
Zend Engine v4.5.1, Copyright (c) Zend Technologies
with Zend OPcache v8.5.1, Copyright (c), by Zend Technologies
Operating System
Amazon Linux 2023