Merged
4 changes: 2 additions & 2 deletions .github/workflows/proxy_integration_tests_javascript.yml
@@ -19,12 +19,12 @@ jobs:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 (Node 24)
        with:
          persist-credentials: false

      - name: Set up Node
        uses: actions/setup-node@v4
        uses: actions/setup-node@53b83947a5a98c8d113130e565377fae1a50d02f # v6.3.0 (Node 24)
        with:
          node-version: "24"
          cache: npm
2 changes: 1 addition & 1 deletion .github/workflows/proxy_integration_tests_php.yml
@@ -19,7 +19,7 @@ jobs:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 (Node 24)
        with:
          persist-credentials: false

11 changes: 6 additions & 5 deletions .github/workflows/proxy_integration_tests_python.yml
@@ -19,22 +19,23 @@ jobs:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 (Node 24)
        with:
          persist-credentials: false

      - name: Set up Python
        uses: actions/setup-python@v5
        uses: actions/setup-python@a309ff8b426b58ec0e2a45f0f869d46889d02405 # v6.2.0 (Node 24)
        with:
          python-version: "3.x"
          # Pin for reproducible dependency wheels (pycurl, etc.); adjust as needed.
          python-version: "3.12"

      - name: Install system dependencies (pycurl)
        run: sudo apt-get update && sudo apt-get install -y libcurl4-openssl-dev

      - name: Install python-proxy-headers and example dependencies
      - name: Install example dependencies
        run: |
          python -m pip install --upgrade pip
          pip install python-proxy-headers requests urllib3 aiohttp httpx cloudscraper autoscraper pycurl
          pip install -r python/requirements.txt

      - name: Require PROXY_URL Actions secret
        env:
4 changes: 2 additions & 2 deletions .github/workflows/proxy_integration_tests_ruby.yml
@@ -19,12 +19,12 @@ jobs:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2
      - uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2 (Node 24)
        with:
          persist-credentials: false

      - name: Set up Ruby
        uses: ruby/setup-ruby@2e007403fc1ec238429ecaa57af6f22f019cc135 # v1.234.0
        uses: ruby/setup-ruby@3ff19f5e2baf30647122352b96108b1fbe250c64 # v1.299.0 (Node 24)
        with:
          ruby-version: "3.3"
          bundler-cache: true
61 changes: 24 additions & 37 deletions README.md
@@ -9,69 +9,56 @@ Example code for using proxy servers in different programming languages. Current

## Python Proxy Examples

### Using python-proxy-headers

The [python-proxy-headers](https://github.com/proxymesh/python-proxy-headers) library enables sending custom headers to proxy servers and receiving proxy response headers. This is essential for services like [ProxyMesh](https://proxymesh.com) that use custom headers for country selection and IP assignment.

**Installation:**

```bash
pip install python-proxy-headers
pip install -r python/requirements.txt
```

**Running Examples:**
`pycurl` needs libcurl and `curl-config` (for example Debian/Ubuntu: `libcurl4-openssl-dev`). The test runner skips `pycurl-*` examples when `pycurl` is not installed, and skips `scrapy-proxy` when `import scrapy` fails (for example a broken `cryptography` / `cffi` install).
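
The skip rule described above could be sketched roughly as follows. This is not the repository's `run_tests.py`, which is not shown here; `REQUIRED_MODULES`, `can_import`, and `skip_reason` are hypothetical names, and catching every exception on import is an assumption meant to cover cases such as a broken `cryptography` / `cffi` install.

```python
import importlib
from typing import Callable, Optional

# Hypothetical mapping from an example-name prefix to the module it needs.
REQUIRED_MODULES = {
    'pycurl-': 'pycurl',
    'scrapy-proxy': 'scrapy',
}


def can_import(module_name: str) -> bool:
    """Return True if the module imports cleanly, False on any failure."""
    try:
        importlib.import_module(module_name)
    except Exception:
        # Assumption: a broken native dependency may raise something other
        # than ImportError, so every exception counts as "not usable".
        return False
    return True


def skip_reason(
    example_name: str,
    importable: Callable[[str], bool] = can_import,
) -> Optional[str]:
    """Return a reason string if the example should be skipped, else None."""
    for prefix, module_name in REQUIRED_MODULES.items():
        if example_name.startswith(prefix) and not importable(module_name):
            return f'{module_name} is not importable'
    return None
```

Passing `importable` as a parameter keeps the rule testable without installing or uninstalling `pycurl` or `scrapy`.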

All examples read proxy configuration from environment variables:
**Running Examples:**

```bash
# Required: Set your proxy URL
export PROXY_URL='http://user:pass@proxy.example.com:8080'

# Optional: Custom test URL (default: https://api.ipify.org?format=json)
# Optional: Target URL (default: https://api.ipify.org?format=json)
export TEST_URL='https://httpbin.org/ip'

# Optional: Send a custom header to the proxy
export PROXY_HEADER='X-ProxyMesh-Country'
export PROXY_VALUE='US'

# Optional: Read a specific header from the response
# Optional: Print one response header
export RESPONSE_HEADER='X-ProxyMesh-IP'

# Run a single example
python python/requests-proxy-headers.py
# Single example
python python/requests-proxy.py

# Run all examples as tests
# All examples as tests
python python/run_tests.py

# Run specific examples
python python/run_tests.py requests-proxy-headers httpx-proxy-headers
# Specific examples (substring match, like the JS runner)
python python/run_tests.py requests httpx
```
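
The substring matching mentioned in the last command might look something like the sketch below. `select_examples` is a hypothetical helper, not the runner's actual code, and it assumes matching is case-sensitive and that passing no patterns selects every example.

```python
from typing import List, Sequence


def select_examples(available: Sequence[str], patterns: Sequence[str]) -> List[str]:
    """Keep examples whose name contains any pattern; no patterns keeps all."""
    if not patterns:
        return list(available)
    # Preserve the original order of `available` rather than pattern order.
    return [name for name in available if any(p in name for p in patterns)]
```

Under these assumptions, `requests httpx` would select `requests-proxy`, `requests-session-proxy`, `httpx-proxy`, and `httpx-async-proxy`, but not `aiohttp-proxy`.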

**Examples:**

| Library | Example | Description |
|---------|---------|-------------|
| [requests](https://docs.python-requests.org/) | [requests-proxy-headers.py](python/requests-proxy-headers.py) | Simple HTTP requests with proxy headers |
| [requests](https://docs.python-requests.org/) | [requests-proxy-headers-session.py](python/requests-proxy-headers-session.py) | Session-based requests for connection pooling |
| [urllib3](https://urllib3.readthedocs.io/) | [urllib3-proxy-headers.py](python/urllib3-proxy-headers.py) | Low-level HTTP client with proxy headers |
| [aiohttp](https://docs.aiohttp.org/) | [aiohttp-proxy-headers.py](python/aiohttp-proxy-headers.py) | Async HTTP client with proxy headers |
| [httpx](https://www.python-httpx.org/) | [httpx-proxy-headers.py](python/httpx-proxy-headers.py) | Modern HTTP client with proxy headers |
| [httpx](https://www.python-httpx.org/) | [httpx-async-proxy-headers.py](python/httpx-async-proxy-headers.py) | Async httpx with proxy headers |
| [pycurl](http://pycurl.io/) | [pycurl-proxy-headers.py](python/pycurl-proxy-headers.py) | libcurl bindings with proxy headers |
| [pycurl](http://pycurl.io/) | [pycurl-proxy-headers-lowlevel.py](python/pycurl-proxy-headers-lowlevel.py) | Low-level pycurl integration |
| [cloudscraper](https://github.com/venomous/cloudscraper) | [cloudscraper-proxy-headers.py](python/cloudscraper-proxy-headers.py) | Cloudflare bypass with proxy headers |
| [autoscraper](https://github.com/alirezamika/autoscraper) | [autoscraper-proxy-headers.py](python/autoscraper-proxy-headers.py) | Automatic web scraping with proxy headers |

> **Note:** Most Python HTTP libraries do not expose custom headers on HTTPS `CONNECT` tunneling by default. These examples use [python-proxy-headers](https://github.com/proxymesh/python-proxy-headers) adapters to send proxy headers and read proxy response headers consistently.

### Basic Proxy Examples

* [requests-proxy.py](python/requests-proxy.py) - Basic proxy usage with requests
* [requests-random-proxy.py](python/requests-random-proxy.py) - Random proxy rotation
| [requests](https://docs.python-requests.org/) | [requests-proxy.py](python/requests-proxy.py) | Basic `GET` with `proxies=` |
| [requests](https://docs.python-requests.org/) | [requests-session-proxy.py](python/requests-session-proxy.py) | Session with pooled connections |
| [urllib3](https://urllib3.readthedocs.io/) | [urllib3-proxy.py](python/urllib3-proxy.py) | `ProxyManager` |
| [aiohttp](https://docs.aiohttp.org/) | [aiohttp-proxy.py](python/aiohttp-proxy.py) | Async client, `proxy=` on the request |
| [httpx](https://www.python-httpx.org/) | [httpx-proxy.py](python/httpx-proxy.py) | Sync client, `proxy=` on the client |
| [httpx](https://www.python-httpx.org/) | [httpx-async-proxy.py](python/httpx-async-proxy.py) | Async client |
| [pycurl](http://pycurl.io/) | [pycurl-proxy.py](python/pycurl-proxy.py) | libcurl via `setopt` (`PROXY`, `WRITEDATA`, etc.) |
| [cloudscraper](https://github.com/VeNoMouS/cloudscraper) | [cloudscraper-proxy.py](python/cloudscraper-proxy.py) | Requests-based scraper with `proxies` |
| [autoscraper](https://github.com/alirezamika/autoscraper) | [autoscraper-proxy.py](python/autoscraper-proxy.py) | Offline `html=` demo (matches upstream tests); README shows `request_args` + `proxies` for live URLs |
| [Scrapy](https://scrapy.org/) | [scrapy-proxy.py](python/scrapy-proxy.py) | `scrapy runspider` with `meta['proxy']` |

### Other Python scripts

### Scrapy
* [requests-random-proxy.py](python/requests-random-proxy.py) - Random proxy rotation

* [scrapy-proxy-headers.py](python/scrapy-proxy-headers.py) - Scrapy spider with proxy headers
> **Note:** Like the Ruby, JavaScript, and PHP examples here, these scripts use each library's normal proxy options only. Most of them do not send custom headers on the HTTPS `CONNECT` tunnel or surface proxy `CONNECT` response headers. For that, see [python-proxy-headers](https://github.com/proxymesh/python-proxy-headers) or [scrapy-proxy-headers](https://github.com/proxymesh/scrapy-proxy-headers).
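
As a rough illustration of what "normal proxy options" means for `requests`, the sketch below passes the standard `proxies=` mapping. It is not necessarily how `requests-proxy.py` is written; `build_proxies` and `fetch_via_proxy` are hypothetical names, and routing both schemes through one proxy URL is an assumption.

```python
def build_proxies(proxy_url: str) -> dict:
    """Route both http:// and https:// targets through the same proxy."""
    return {'http': proxy_url, 'https': proxy_url}


def fetch_via_proxy(url: str, proxy_url: str, timeout: float = 30.0):
    """GET a URL through the proxy using requests' standard `proxies=` argument."""
    # Imported lazily so build_proxies can be used without requests installed.
    import requests

    return requests.get(url, proxies=build_proxies(proxy_url), timeout=timeout)
```

A call such as `fetch_via_proxy(os.environ['TEST_URL'], os.environ['PROXY_URL'])` would then return the usual `requests.Response`. For HTTPS targets the proxy only sees the `CONNECT` request, which is why custom proxy headers need the separate libraries linked in the note.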

## JavaScript / Node.js Proxy Examples

48 changes: 0 additions & 48 deletions python/aiohttp-proxy-headers.py

This file was deleted.

39 changes: 39 additions & 0 deletions python/aiohttp-proxy.py
@@ -0,0 +1,39 @@
#!/usr/bin/env python3
"""
aiohttp with an HTTP proxy.

Configuration via environment variables:
    PROXY_URL       - Proxy URL (required), e.g., http://user:pass@proxy:8080
    TEST_URL        - URL to request (default: https://api.ipify.org?format=json)
    RESPONSE_HEADER - Optional header name to print from the response

Documentation: https://docs.aiohttp.org/en/stable/client_advanced.html#proxy-support
"""
import asyncio
import os
import sys

import aiohttp

proxy_url = os.environ.get('PROXY_URL') or os.environ.get('HTTPS_PROXY')
if not proxy_url:
    print('Error: Set PROXY_URL environment variable', file=sys.stderr)
    sys.exit(1)

test_url = os.environ.get('TEST_URL', 'https://api.ipify.org?format=json')
response_header = os.environ.get('RESPONSE_HEADER')


async def main() -> None:
    timeout = aiohttp.ClientTimeout(total=30)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.get(test_url, proxy=proxy_url) as response:
            body = await response.text()
            print(f'Status: {response.status}')
            print(f'Body: {body}')
            if response_header:
                print(f'{response_header}: {response.headers.get(response_header)}')


if __name__ == '__main__':
    asyncio.run(main())
45 changes: 0 additions & 45 deletions python/autoscraper-proxy-headers.py

This file was deleted.

60 changes: 60 additions & 0 deletions python/autoscraper-proxy.py
@@ -0,0 +1,60 @@
#!/usr/bin/env python3
"""
AutoScraper with a proxy (how to pass ``request_args``).

The AutoScraper project tests ``build`` / ``get_result_similar`` with **inline HTML**
only — see ``tests/unit/test_build.py`` and ``tests/integration/`` in
https://github.com/alirezamika/autoscraper — not with live URLs. That keeps tests
deterministic. This script does the same for the integration runner.

**Using a proxy with a real URL** matches the library README::

    scraper.build(url, wanted_list, request_args={'proxies': proxies, 'timeout': 30})
    scraper.get_result_similar(url, request_args={'proxies': proxies, 'timeout': 30})

``PROXY_URL`` is required here so this example fits the same env as the other scripts;
this demo does not open a network connection — it only exercises AutoScraper on
embedded HTML.

Configuration via environment variables:
    PROXY_URL - Required by the test runner (same as other examples), e.g.
                http://user:pass@proxy:8080

Documentation: https://github.com/alirezamika/autoscraper
"""
import os
import sys

from autoscraper import AutoScraper

# Same idea as upstream tests/unit/test_build.py — fixed HTML, no HTTP.
SAMPLE_HTML = """<!DOCTYPE html>
<html><head><title>Proxy example</title></head>
<body>
<h1>AutoScraper proxy example</h1>
<p>Paragraph one.</p>
</body></html>
"""
PLACEHOLDER_URL = 'https://example.invalid/autoscraper-proxy-demo'


def main() -> None:
    scraper = AutoScraper()
    wanted_list = ['AutoScraper proxy example']
    learned = scraper.build(
        html=SAMPLE_HTML,
        url=PLACEHOLDER_URL,
        wanted_list=wanted_list,
    )
    similar = scraper.get_result_similar(html=SAMPLE_HTML, url=PLACEHOLDER_URL)
    print(f'AutoScraper build: {learned}')
    print(f'AutoScraper get_result_similar: {similar}')
    if not learned:
        sys.exit(1)


if __name__ == '__main__':
    if not (os.environ.get('PROXY_URL') or os.environ.get('HTTPS_PROXY')):
        print('Error: Set PROXY_URL environment variable', file=sys.stderr)
        sys.exit(1)
    main()
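
For comparison, the live-URL variant mentioned in the docstring could be sketched as below. This is not part of the change set; `build_request_args` and `build_from_live_url` are hypothetical helpers, and using one proxy URL for both the `http` and `https` keys is an assumption.

```python
def build_request_args(proxy_url: str, timeout: int = 30) -> dict:
    """Build a ``request_args`` dict in the shape the docstring describes."""
    proxies = {'http': proxy_url, 'https': proxy_url}
    return {'proxies': proxies, 'timeout': timeout}


def build_from_live_url(url: str, wanted_list: list, proxy_url: str):
    """Fetch ``url`` through the proxy and learn scraping rules from it."""
    # Imported lazily: only the live path needs the library and network access.
    from autoscraper import AutoScraper

    scraper = AutoScraper()
    request_args = build_request_args(proxy_url)
    learned = scraper.build(url, wanted_list, request_args=request_args)
    similar = scraper.get_result_similar(url, request_args=request_args)
    return learned, similar
```

Unlike the offline demo, this path depends on the target page and the proxy being reachable, which is the reason the integration runner sticks to embedded HTML.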