11 changes: 6 additions & 5 deletions .rules.md
@@ -4,7 +4,7 @@ This file provides guidance to programming agents when working with code in this

## Project Overview

-The Apify SDK for Python (`apify` package on PyPI) is the official library for creating [Apify Actors](https://docs.apify.com/platform/actors) in Python. It provides Actor lifecycle management, storage access (datasets, key-value stores, request queues), event handling, proxy configuration, and pay-per-event charging. It builds on top of the [Crawlee](https://crawlee.dev/python) web scraping framework and the [Apify API Client](https://docs.apify.com/api/client/python). Supports Python 3.10–3.14. Build system: hatchling.
+The Apify SDK for Python (`apify` package on PyPI) is the official library for creating [Apify Actors](https://docs.apify.com/platform/actors) in Python. It provides Actor lifecycle management, storage access (datasets, key-value stores, request queues), event handling, proxy configuration, and pay-per-event charging. It builds on top of the [Crawlee](https://crawlee.dev/python) web scraping framework and the [Apify API Client](https://docs.apify.com/api/client/python). Supports Python 3.11–3.14. Build system: hatchling.

## Common Commands

@@ -46,7 +46,7 @@ uv run poe e2e-tests
## Code Style

- **Formatter/Linter**: Ruff (line length 120, single quotes for inline, double quotes for docstrings)
-- **Type checker**: ty (targets Python 3.10)
+- **Type checker**: ty (targets Python 3.11)
- **All ruff rules enabled** with specific ignores — see `pyproject.toml` `[tool.ruff.lint]` for the full ignore list
- Tests are exempt from docstring rules (`D`), assert warnings (`S101`), and private member access (`SLF001`)
- Unused imports are allowed in `__init__.py` files (re-exports)
@@ -71,7 +71,7 @@ uv run poe e2e-tests

- **`_proxy_configuration.py`** — `ProxyConfiguration` manages Apify proxy setup (residential, datacenter, groups, country targeting).

-- **`_models.py`** — Pydantic models for API data structures (Actor runs, webhooks, pricing info, etc.).
+- **`_webhook.py`** — The `Webhook` dataclass (ad-hoc / persistent webhook definition) and the `to_client_representations` helper. Response and data models are no longer defined in the SDK — they come from `apify-client` v3 (e.g. `Run`, the Actor pricing-info models).

### Storage Clients (`src/apify/storage_clients/`)

@@ -101,8 +101,9 @@ Optional integration (`apify[scrapy]` extra) providing Scrapy scheduler, middlew
### Key Dependencies

- **`crawlee`** — Base framework providing storage abstractions, event system, configuration, service locator pattern
-- **`apify-client`** — HTTP client for the Apify API (`ApifyClientAsync`)
-- **`apify-shared`** — Shared constants and utilities (`ApifyEnvVars`, `ActorEnvVars`, etc.)
+- **`apify-client`** — HTTP client for the Apify API (`ApifyClientAsync`); also the source of response and data models (`Run`, pricing info, webhook representations)
+
+The SDK no longer depends on `apify-shared`. The platform env-var enums (`ApifyEnvVars`, `ActorEnvVars`) are vendored in `apify._consts` and re-exported from the top-level `apify` package.

## Testing

4 changes: 2 additions & 2 deletions docs/02_concepts/code/07_webhook.py
@@ -1,13 +1,13 @@
 import asyncio

-from apify import Actor, Webhook, WebhookEventType
+from apify import Actor, Webhook


 async def main() -> None:
     async with Actor:
         # Create a webhook that will be triggered when the Actor run fails.
         webhook = Webhook(
-            event_types=[WebhookEventType.ACTOR_RUN_FAILED],
+            event_types=['ACTOR.RUN.FAILED'],
             request_url='https://example.com/run-failed',
         )

9 changes: 5 additions & 4 deletions docs/02_concepts/code/07_webhook_preventing.py
@@ -1,18 +1,19 @@
 import asyncio

-from apify import Actor, Webhook, WebhookEventType
+from apify import Actor, Webhook


 async def main() -> None:
     async with Actor:
-        # Create a webhook that will be triggered when the Actor run fails.
+        # Create a webhook with an idempotency key to prevent duplicates on retries.
         webhook = Webhook(
-            event_types=[WebhookEventType.ACTOR_RUN_FAILED],
+            event_types=['ACTOR.RUN.FAILED'],
             request_url='https://example.com/run-failed',
+            idempotency_key=Actor.configuration.actor_run_id,
         )

         # Add the webhook to the Actor.
-        await Actor.add_webhook(webhook, idempotency_key=Actor.configuration.actor_run_id)
+        await Actor.add_webhook(webhook)

         # Raise an error to simulate a failed run.
         raise RuntimeError('I am an error and I know it!')
130 changes: 130 additions & 0 deletions docs/04_upgrading/upgrading_to_v4.md
@@ -68,3 +68,133 @@ run = await Actor.call('user/actor', timeout='inherit')
The deprecated `latest_sdk_version`, `log_format`, and `standby_port` fields have been removed from `Configuration`:
- In place of `standby_port`, use `web_server_port`.
- `latest_sdk_version` and `log_format` have no replacement. The Python SDK doesn't support SDK version checking, and the log format should be adjusted in code instead.

## Built on `apify-client` v3

The SDK is now built on [`apify-client`](https://docs.apify.com/api/client/python) v3 and no longer depends on `apify-shared`. The sections below cover the user-visible consequences; see the client's [Upgrading to v3](https://docs.apify.com/api/client/python/docs/upgrading/upgrading-to-v3) guide for the full list of changes in the client itself.

### Environment variable enums moved

If you imported the platform environment-variable enums from `apify_shared.consts` (`ApifyEnvVars`, `ActorEnvVars`), import them from `apify` instead — they are now vendored in the SDK and re-exported from the top-level package.

```python
# Before (v3)
from apify_shared.consts import ApifyEnvVars

# After (v4)
from apify import ApifyEnvVars
```

## Typed responses

`Actor.start`, `Actor.abort`, `Actor.call`, and `Actor.call_task` now return `apify_client._models.Run` instead of the SDK-side `ActorRun`. Both are [Pydantic](https://docs.pydantic.dev/latest/) models with the same snake_case fields, so field access is unchanged — only the type and import path differ. The SDK no longer ships its own response models (`apify._models` has been removed); response shapes come from `apify-client`.

## Literal string aliases instead of StrEnum classes

Generated enum-like types are now [`Literal`](https://docs.python.org/3/library/typing.html#typing.Literal) string aliases instead of `StrEnum` classes. Pass plain strings instead of enum members.

- `apify.WebhookEventType` is now a `Literal[...]` instead of a `StrEnum`. Use plain string values (`'ACTOR.RUN.FAILED'`) instead of enum members.
- `apify_shared.consts.ActorEventTypes` (a `StrEnum`) is replaced by `apify.ActorEventTypes`, now a `Literal['systemInfo', 'persistState', 'migrating', 'aborting']`. For runtime values, use `apify.Event` (re-exported from Crawlee) instead of enum members.

```python
# Before (v3)
from apify import Actor
from apify_shared.consts import ActorEventTypes

Actor.on(ActorEventTypes.SYSTEM_INFO, callback)

# After (v4)
from apify import Actor, Event

Actor.on(Event.SYSTEM_INFO, callback)
```
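
A `Literal` alias has no members at runtime, so checks that used to rely on enum membership need another approach. The following is a minimal sketch, assuming a local alias that mirrors the `ActorEventTypes` values listed above; the names `ActorEventName`, `ALLOWED_EVENT_NAMES`, and `is_actor_event` are hypothetical and not part of the SDK.

```python
from typing import Literal, get_args

# Local stand-in mirroring the values listed above; not imported from the SDK.
ActorEventName = Literal['systemInfo', 'persistState', 'migrating', 'aborting']

# get_args() recovers the allowed strings from the Literal alias at runtime.
ALLOWED_EVENT_NAMES: frozenset[str] = frozenset(get_args(ActorEventName))


def is_actor_event(name: str) -> bool:
    """Return True if `name` is one of the allowed event names."""
    return name in ALLOWED_EVENT_NAMES
```

A type checker still validates string literals passed where the alias is expected; the runtime check is only needed for values that arrive as untyped input.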

## Actor pricing info models

The Actor pricing-info models exposed through `Actor.configuration.actor_pricing_info` — `FreeActorPricingInfo`, `FlatPricePerMonthActorPricingInfo`, `PricePerDatasetItemActorPricingInfo`, `PayPerEventActorPricingInfo`, and the nested `ActorChargeEvent` / `PricingPerEvent` — are now thin subclasses of the corresponding `apify-client` models instead of standalone SDK copies. The discriminated-union shape is unchanged, so existing access (`pricing_model`, per-event titles and prices) keeps working; the models now expose the full `apify-client` field set, and a charge event's `event_price_usd` is optional (it is unset for tier-priced events). `ChargingManager.get_pricing_info()` is unchanged.
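
Because `event_price_usd` can now be unset, code that does arithmetic on it may need a guard. The following is a minimal sketch using a hypothetical stand-in dataclass rather than the real model: `ChargeEventStub`, `estimate_cost_usd`, and the `event_title` field are illustrative assumptions, and only `event_price_usd` comes from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChargeEventStub:
    """Hypothetical stand-in for a charge event; not the real model."""

    event_title: str
    event_price_usd: Optional[float] = None  # None stands for a tier-priced event


def estimate_cost_usd(event: ChargeEventStub, count: int) -> Optional[float]:
    """Return count * price, or None when the event has no flat price."""
    if event.event_price_usd is None:
        return None
    return event.event_price_usd * count
```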

## `Webhook` API simplified

The `Webhook` model has been slimmed down to only the fields a user sets when defining a webhook. Server-populated response fields (`id`, `created_at`, `modified_at`, `user_id`, `is_ad_hoc`, `condition`, `last_dispatch`, `stats`) and the unused `WebhookCondition` helper class have been removed. The `description` and `should_interpolate_strings` fields have also been removed — they are not part of the ad-hoc webhook representation (`event_types`, `request_url`, `payload_template`, `headers_template`) that `Actor.start` / `Actor.call` / `Actor.call_task` and `Actor.add_webhook` now send. `Webhook` is now a plain `@dataclass` instead of a Pydantic `BaseModel` — construct it with snake_case kwargs; `.model_dump()` / `.model_validate()` are gone.

The retry and idempotency kwargs that used to live on `Actor.add_webhook` have moved onto the `Webhook` instance itself.

```python
# Before (v3)
from apify import Actor, Webhook

await Actor.add_webhook(
    Webhook(event_types=['ACTOR.RUN.FAILED'], request_url='https://example.com'),
    ignore_ssl_errors=False,
    do_not_retry=False,
    idempotency_key='my-key',
)

# After (v4)
from apify import Actor, Webhook

await Actor.add_webhook(
    Webhook(
        event_types=['ACTOR.RUN.FAILED'],
        request_url='https://example.com',
        ignore_ssl_errors=False,
        do_not_retry=False,
        idempotency_key='my-key',
    )
)
```

The `idempotency_key` kwarg form on `Actor.add_webhook` still works for one more release but emits a `DeprecationWarning` and will be removed in v5.0. The `ignore_ssl_errors` and `do_not_retry` kwargs have been removed outright — set them on the `Webhook` instance.

`apify.WebhookCondition` is no longer exported; the SDK now binds the webhook to the current Actor run internally.

The `webhooks` argument on `Actor.start`, `Actor.call`, and `Actor.call_task` still accepts `list[Webhook]` and the fields used at the call site (`event_types`, `request_url`, `payload_template`, `headers_template`) are unchanged.
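
With `.model_dump()` gone, the standard `dataclasses` helpers cover similar ground for a plain dataclass. The following is a minimal sketch using a hypothetical stand-in, `WebhookStub`, with a few of the field names mentioned above; the defaults are assumptions and the real `Webhook` may differ.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class WebhookStub:
    """Hypothetical stand-in for `apify.Webhook`; field defaults are assumed."""

    event_types: list[str]
    request_url: str
    payload_template: Optional[str] = None
    idempotency_key: Optional[str] = None


webhook = WebhookStub(event_types=['ACTOR.RUN.FAILED'], request_url='https://example.com')

# asdict() converts a dataclass instance into a plain dict.
webhook_dict = asdict(webhook)
```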

## `Actor.new_client` — `timeout` scales all tiers

`apify-client` v3 split its single timeout into four tiers (short / medium / long / max). `Actor.new_client(timeout=...)` still takes a single `timedelta`; the SDK uses it as the medium-tier baseline and scales the other tiers proportionally (short = `timeout / 6`, long = `timeout * 12`, max = `timeout * 12`). The public signature is unchanged — no migration needed.
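
The scaling ratios above can be worked through with plain `timedelta` arithmetic. The following is a minimal sketch; `scale_timeouts` is a hypothetical helper written for illustration, not the SDK's internal function.

```python
from datetime import timedelta


def scale_timeouts(timeout: timedelta) -> dict[str, timedelta]:
    """Derive the four tiers from the medium-tier baseline, using the ratios above."""
    return {
        'short': timeout / 6,
        'medium': timeout,
        'long': timeout * 12,
        'max': timeout * 12,
    }


# A 30-second baseline gives short = 5 s, long = 6 min, max = 6 min.
tiers = scale_timeouts(timedelta(seconds=30))
```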

## Using the client from `Actor.new_client`

`Actor.new_client()` (and the `Actor.apify_client` property) now returns an `apify-client` v3 `ApifyClientAsync`. When you use that client directly, the client's v3 breaking changes apply — the most impactful ones are below. See the client's [Upgrading to v3](https://docs.apify.com/api/client/python/docs/upgrading/upgrading-to-v3) guide for the complete reference.

### 404 raises `NotFoundError` on ambiguous endpoints

Direct `.get(id)` and `.delete(id)` calls still swallow 404 into `None`. But where a 404 could mean either the parent or the sub-resource is missing, the client now raises `NotFoundError` instead of returning `None`.

```python
# Before (v3)
client = Actor.new_client()

# Returned None on 404.
dataset = await client.run('some-run-id').dataset().get()

# After (v4)
from apify_client.errors import NotFoundError

client = Actor.new_client()

# Raises NotFoundError; handle it explicitly.
try:
    dataset = await client.run('some-run-id').dataset().get()
except NotFoundError:
    dataset = None
```

### Keyword-only arguments

Secondary parameters on several client methods can no longer be passed positionally.

```python
# Before (v3)
await client.key_value_store('my-store').set_record('my-key', {'data': 1}, 'application/json')
await client.run('my-run').charge('my-event', 5)

# After (v4)
await client.key_value_store('my-store').set_record('my-key', {'data': 1}, content_type='application/json')
await client.run('my-run').charge('my-event', count=5)
```
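
In Python, keyword-only parameters are the ones declared after a bare `*` in the signature. The following is a minimal sketch with a hypothetical function, `charge_stub`, which is illustrative and not the client's method; it shows why a positional call that used to work now fails with `TypeError`.

```python
def charge_stub(event_name: str, *, count: int = 1) -> str:
    """Hypothetical function; parameters after `*` must be passed by keyword."""
    return f'{event_name} x{count}'


ok = charge_stub('my-event', count=5)

try:
    charge_stub('my-event', 5)  # positional `count` is rejected at call time
except TypeError:
    positional_rejected = True
else:
    positional_rejected = False
```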

### Async `iterate_*` are no longer coroutine functions

`DatasetClientAsync.iterate_items()` and `KeyValueStoreClientAsync.iterate_keys()` are now plain `def` functions returning `AsyncIterator[T]`. Consumer code (`async for ...`) is unchanged; if you annotate the call's return value, change `AsyncGenerator[T, None]` to `AsyncIterator[T]`.
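
The difference only shows up in type annotations and introspection. The following is a minimal sketch with hypothetical functions (`iterate_old`, `iterate_new`, `collect`) contrasting an async generator function with a plain `def` that returns an `AsyncIterator`; both are consumed the same way.

```python
import asyncio
import inspect
from collections.abc import AsyncGenerator, AsyncIterator


async def iterate_old() -> AsyncGenerator[int, None]:
    """Async generator function (the older annotation shape)."""
    for i in range(3):
        yield i


def iterate_new() -> AsyncIterator[int]:
    """Plain function that returns an async iterator (the newer shape)."""
    return iterate_old()


async def collect(items: AsyncIterator[int]) -> list[int]:
    """Consumer code is identical for both shapes."""
    return [item async for item in items]


old_items = asyncio.run(collect(iterate_old()))
new_items = asyncio.run(collect(iterate_new()))
```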
5 changes: 2 additions & 3 deletions pyproject.toml
@@ -34,8 +34,7 @@ keywords = [
     "scraping",
 ]
 dependencies = [
-    "apify-client>=2.3.0,<3.0.0",
-    "apify-shared>=2.0.0,<3.0.0",
+    "apify-client>=3.0.0,<4.0.0",
     "crawlee>=1.0.4,<2.0.0",
     "cachetools>=5.5.0",
     "cryptography>=42.0.0",
@@ -197,7 +196,7 @@ builtins-ignorelist = ["id"]

[tool.ruff.lint.isort]
known-local-folder = ["apify"]
-known-first-party = ["apify_client", "apify_shared", "crawlee"]
+known-first-party = ["apify_client", "crawlee"]

[tool.ruff.lint.pylint]
max-branches = 18
9 changes: 7 additions & 2 deletions src/apify/__init__.py
@@ -1,6 +1,6 @@
 from importlib import metadata

-from apify_shared.consts import WebhookEventType
+from apify_client._literals import WebhookEventType
 from crawlee import Request
 from crawlee.events import (
     Event,
@@ -14,13 +14,18 @@

 from apify._actor import Actor
 from apify._configuration import Configuration
-from apify._models import Webhook
+from apify._consts import ActorEnvVars, ApifyEnvVars
 from apify._proxy_configuration import ProxyConfiguration, ProxyInfo
+from apify._webhook import Webhook
+from apify.events._types import ActorEventTypes

 __version__ = metadata.version('apify')

 __all__ = [
     'Actor',
+    'ActorEnvVars',
+    'ActorEventTypes',
+    'ApifyEnvVars',
     'Configuration',
     'Event',
     'EventAbortingData',