Merged
8 changes: 8 additions & 0 deletions .github/workflows/skywalking.yaml
@@ -390,6 +390,14 @@ jobs:
  config: test/e2e-v2/cases/storage/es/es-sharding/e2e.yaml
  env: ES_VERSION=8.18.8

- name: Inspect API BanyanDB
  config: test/e2e-v2/cases/inspect/banyandb/e2e.yaml
- name: Inspect API PostgreSQL
  config: test/e2e-v2/cases/inspect/postgresql/e2e.yaml
- name: Inspect API Elasticsearch 8.18.8
  config: test/e2e-v2/cases/inspect/elasticsearch/e2e.yaml
  env: ES_VERSION=8.18.8

- name: Runtime Rule MAL Storage BanyanDB
  config: test/e2e-v2/cases/runtime-rule/mal-storage/banyandb/e2e.yaml
- name: Runtime Rule MAL Storage PostgreSQL
1 change: 1 addition & 0 deletions docs/en/changes/changes.md
@@ -2,6 +2,7 @@

#### Project

* Extend the `GET /inspect/entities` admin API to inspect a metric persisted by **any** OAP, even one this node does not define locally. When the metric is unknown to the local registry, the caller supplies `valueColumn` + `valueType` and the storage backend resolves the physical index/table/group from its own running config (no DB schema/table-metadata read): ES uses the merged `metrics-all` index + `metric_table` discriminator, JDBC probes the node's function tables by the `table_name` discriminator, and BanyanDB synthesizes a read-only measure schema. Scope is no longer required — the `entity_id` is decoded structurally (service / 2nd-level / relations) with a generic `name` leaf. Locally-defined metrics keep the exact field names, scope, and `mqeEntity` as before.
* Remove the always-on alarm-to-event conversion (`EventHookCallback`). A triggered alarm is no longer synthesized into the events pipeline as an `Alarm`/`AlarmRecovery` event; events now originate only from real event sources (agents, SkyWalking CLI, Kubernetes Event Exporter). Alarms remain available through the alarm store (`getAlarm`/`queryAlarms`) and the configured alarm hooks. This drops a documented "Known Event" and removes 1-2 synthetic event records per alarm fire.
* **New `queryAlarms` GraphQL query — entity / layer / rule filters for alarms.** Adds
a comprehensive alarm query API alongside the legacy `getAlarm`. The new
87 changes: 78 additions & 9 deletions docs/en/setup/backend/admin-api/inspect.md
@@ -6,9 +6,12 @@ let operators answer two questions without writing exploratory MQE:
1. *Which metrics has OAP registered, and at what downsampling?*
2. *For metric `X` in time range `T`, which entities currently hold values?*

The output of (2) carries a ready-to-paste `mqeEntity` payload, so the
follow-up MQE call against the public GraphQL `execExpression` mutation is
copy-paste from the inspect response.
For a locally-defined metric, the output of (2) carries a ready-to-paste
`mqeEntity` payload, so the follow-up MQE call against the public GraphQL
`execExpression` mutation is copy-paste from the inspect response. A metric
persisted by **another OAP** that this node does not define can also be
inspected with caller-supplied metadata (see
[Foreign metrics](#foreign-metrics-not-defined-on-this-oap)).

## Enabling

@@ -83,8 +86,12 @@ curl 'http://oap-admin:17128/inspect/metrics?regex=service_cpm'

### `GET /inspect/entities`

For a metric + time range + step, returns the entities holding values, each
decoded into a human-readable shape and an MQE-ready `mqeEntity` payload.
For a metric + time range + step, returns the entities holding values. For a
metric this OAP defines locally, each row is decoded into a human-readable
shape and an MQE-ready `mqeEntity` payload. A metric persisted by **another
OAP** that this node does not define can also be inspected by additionally
supplying `valueColumn` + `valueType` — see
[Foreign metrics](#foreign-metrics-not-defined-on-this-oap).

Restricted to `REGULAR_VALUE` and `LABELED_VALUE` metrics. The non-MQE
metric types (`HEATMAP` / `SAMPLED_RECORD`) and the out-of-scope scopes
@@ -94,11 +101,13 @@ Query parameters:

| Name | Required | Description |
|------|----------|-------------|
| `metric` | yes | Metric name. Must resolve in `ValueColumnMetadata`. |
| `metric` | yes | Metric name. If unknown to this OAP's local registry, also supply `valueColumn` + `valueType` (see [Foreign metrics](#foreign-metrics-not-defined-on-this-oap)). |
| `start` | yes | Time-range start. Same format as MQE `Duration.start`: `yyyy-MM-dd` (DAY), `yyyy-MM-dd HH` (HOUR), `yyyy-MM-dd HHmm` (MINUTE), `yyyy-MM-dd HHmmss` (SECOND). Note `HHmm` is no-separator — use `1230`, not `12:30`. |
| `end` | yes | Time-range end. Format mirrors `start`. |
| `step` | yes | One of `MINUTE` / `HOUR` / `DAY`. Must be one of the metric's `downsamplings`. |
| `step` | yes | One of `MINUTE` / `HOUR` / `DAY`. For a locally-defined metric, must be one of the metric's `downsamplings`; for a foreign metric the requested step is trusted as-is. |
| `limit` | no | Server-side cap. Default 300, hard-capped at 300. |
| `valueColumn` | conditional | **Required when `metric` is not defined on this OAP.** The metric's value column (post-override physical name, e.g. `value`, `value_`, `double_value`, `datatable_value`, `dataset`). Ignored for a locally-defined metric. |
| `valueType` | conditional | **Required when `metric` is not defined on this OAP.** One of `LONG` / `INT` / `DOUBLE` / `LABELED`. Ignored for a locally-defined metric. |
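
The per-step `start` / `end` formats in the table above are easy to get wrong (`12:30` instead of `1230`). A minimal client-side sketch, assuming Java 9+ (`Map.of`) and `java.time`, that checks a value against the format for the requested step before the request is sent. `InspectTimeRange` and its methods are hypothetical helpers, not OAP's own `Duration` parser, and the start-not-after-end check is a client-side assumption rather than documented server behaviour.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Map;

// Hypothetical client-side helper; not OAP's own Duration parser.
final class InspectTimeRange {
    // Per-step patterns taken from the query-parameter table. MINUTE uses "HHmm" with no separator.
    private static final Map<String, String> PATTERNS = Map.of(
            "DAY", "yyyy-MM-dd",
            "HOUR", "yyyy-MM-dd HH",
            "MINUTE", "yyyy-MM-dd HHmm");

    /** Parses a start/end value for the given step; throws DateTimeParseException on a format mismatch. */
    static LocalDateTime parse(String step, String value) {
        String pattern = PATTERNS.get(step);
        if (pattern == null) {
            throw new IllegalArgumentException("step must be one of MINUTE / HOUR / DAY (got " + step + ")");
        }
        DateTimeFormatter formatter = new DateTimeFormatterBuilder()
                .appendPattern(pattern)
                // Fill the fields a coarser step omits, so every step yields a LocalDateTime.
                .parseDefaulting(ChronoField.HOUR_OF_DAY, 0)
                .parseDefaulting(ChronoField.MINUTE_OF_HOUR, 0)
                .toFormatter();
        return LocalDateTime.parse(value, formatter);
    }

    /** Assumed client-side sanity check: start must not be after end. */
    static void checkRange(String step, String start, String end) {
        if (parse(step, start).isAfter(parse(step, end))) {
            throw new IllegalArgumentException("start " + start + " is after end " + end);
        }
    }
}
```

With this helper, `parse("MINUTE", "2026-05-10 12:30")` fails with a `DateTimeParseException`, which is the client-side equivalent of the separator mistake the `start` row warns about.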

The `limit` is applied as `LIMIT N` at the storage layer — it bounds the
total rows scanned (300 ≈ 10 buckets × 30 entities), not 300 distinct
@@ -159,6 +168,65 @@ curl 'http://oap-admin:17128/inspect/entities?metric=service_cpm&start=2026-05-1
}
```

### Foreign metrics (not defined on this OAP)

A metric persisted by **another OAP** — a different OAL/MAL/runtime-rule set —
is absent from this node's local registry, so its value column, type, and scope
cannot be recovered from the metric name alone (there is no OAL/MAL text here to
read). Supply `valueColumn` + `valueType` on the request and the backend resolves
the physical index/table/group from its own running configuration (the
deterministic metric → storage mapping that merging has used for years), with
**no storage schema / table-metadata read**:

* **ES** — the merged `metrics-all` index, filtered by the `metric_table`
discriminator. Not supported under `logicSharding=true`, where the physical
index is derived from the metric's stream class (returns `500`).
* **JDBC** — probes the node's aggregation-function metric tables
(`metrics_<fn>` / `meter_<fn>`) by the `table_name` discriminator.
* **BanyanDB** — synthesizes a read-only measure schema from the deterministic
measure/group mapping.
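
For the JDBC case, probing by discriminator amounts to trying each candidate function table with a `table_name` filter until one returns rows. A minimal sketch under stated assumptions: the candidate table names (`metrics_sum`, `metrics_avg`), the `entity_id` / `time_bucket` column names and the `DiscriminatorProbe` class are illustrative, and the real query is abstracted behind a callback instead of a JDBC call. Stopping at the first hit assumes a metric is written to exactly one function table.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.ToIntFunction;

// Hypothetical sketch of a discriminator probe over candidate tables; not the OAP JDBC DAO.
final class DiscriminatorProbe {
    /**
     * Parameterised probe for one table. Table names come from the node's own config, never from
     * the request; column names other than table_name are assumptions.
     */
    static String probeSql(String table) {
        return "SELECT entity_id FROM " + table
                + " WHERE table_name = ? AND time_bucket >= ? AND time_bucket <= ? LIMIT ?";
    }

    /**
     * Returns the first candidate table holding rows for the metric, or empty when none do.
     * Empty means "no rows in range", not "metric absent": there is no metadata to consult.
     */
    static Optional<String> firstTableWithRows(List<String> candidateTables, ToIntFunction<String> countRows) {
        for (String table : candidateTables) {
            if (countRows.applyAsInt(table) > 0) {
                return Optional.of(table);
            }
        }
        return Optional.empty();
    }
}
```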

Because the scope is unknown, the response degrades gracefully:

* `scope` is `null` (the structural kind is per-row in `decoded`).
* `entity_id` is decoded **structurally**: a single entity yields `serviceName`
(plus a generic `name` leaf for a 2nd-level instance/endpoint — the two are
byte-identical and not distinguishable without the scope); a relation yields a
`source` / `destination` pair.
* **No `mqeEntity`** is produced — MQE needs the exact scope, and a foreign
metric is not MQE-queryable on this node anyway.
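
The structural rules above (also spelled out in the `decodeUnknownScope` javadoc of this change) can be reproduced without SkyWalking on the classpath. A minimal sketch, assuming the ID layouts described there: a service ID is `base64(name).<1|0>` (the `1` suffix is assumed to mean a real service), a 2nd-level ID appends `_base64(leaf)`, and relation sides are joined with `-`. `StructuralEntityId` is a hypothetical name; the real code delegates to `IDManager`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical re-implementation of the scope-free decode rules; the real code uses IDManager.
final class StructuralEntityId {
    enum Kind { SERVICE, LEVEL2, SERVICE_RELATION, LEVEL2_RELATION, ENDPOINT_RELATION }

    /** Classifies an entity_id purely by its '-' and '_' delimiters (neither occurs in base64 output). */
    static Kind kind(String entityId) {
        String[] parts = entityId.split("-");
        switch (parts.length) {
            case 1:
                return entityId.contains("_") ? Kind.LEVEL2 : Kind.SERVICE;
            case 2:
                return parts[0].contains("_") ? Kind.LEVEL2_RELATION : Kind.SERVICE_RELATION;
            case 4:
                return Kind.ENDPOINT_RELATION;
            default:
                throw new IllegalArgumentException("cannot structurally decode entity_id without scope: " + entityId);
        }
    }

    static Map<String, Object> decode(String entityId) {
        String[] parts = entityId.split("-");
        switch (kind(entityId)) {
            case SERVICE:
                return service(entityId);
            case LEVEL2:
                return level2(entityId);
            case SERVICE_RELATION:
                return pair(service(parts[0]), service(parts[1]));
            case LEVEL2_RELATION:
                return pair(level2(parts[0]), level2(parts[1]));
            default: // ENDPOINT_RELATION: srcServiceId-srcEndpoint-destServiceId-destEndpoint
                return pair(level2(parts[0] + "_" + parts[1]), level2(parts[2] + "_" + parts[3]));
        }
    }

    // "<base64 name>.<1|0>": the suffix is assumed to be 1 for a real (non-virtual) service.
    private static Map<String, Object> service(String serviceId) {
        int dot = serviceId.indexOf('.');
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("serviceName", b64(serviceId.substring(0, dot)));
        m.put("isReal", "1".equals(serviceId.substring(dot + 1)));
        return m;
    }

    // "<serviceId>_<base64 leaf>": instance and endpoint look the same, so the leaf is a generic "name".
    private static Map<String, Object> level2(String id) {
        int underscore = id.indexOf('_');
        Map<String, Object> m = service(id.substring(0, underscore));
        m.put("name", b64(id.substring(underscore + 1)));
        return m;
    }

    private static Map<String, Object> pair(Map<String, Object> source, Map<String, Object> destination) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("source", source);
        m.put("destination", destination);
        return m;
    }

    private static String b64(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }
}
```

For `cGF5bWVudA==.1` this yields `{serviceName=payment, isReal=true}`, the same content as the `decoded` field in the example response.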

Existence is decided by the data probe itself, so an **empty result means "no
rows in range", not "metric absent"**. Nothing is validated against metadata up
front: a wrong `valueColumn` / `valueType` surfaces as a storage error (`500`)
or an empty result.
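
Since nothing is checked against metadata, the only up-front checks on the foreign path are syntactic: the `valueColumn` / `valueType` pair must be present, and `valueType` must name one of the four types, matching the two corresponding `400` rows in the error table. A minimal sketch of that decision, assuming the local registry can be modelled as a set of metric names and that the type match is case-sensitive; `ForeignMetricRequest` is hypothetical, and `valueColumn` is deliberately passed through unchecked.

```java
import java.util.Set;

// Hypothetical sketch of request routing for /inspect/entities; not the OAP handler.
final class ForeignMetricRequest {
    enum Path { LOCAL, FOREIGN }

    enum ValueType { LONG, INT, DOUBLE, LABELED }

    static Path resolve(Set<String> localMetrics, String metric, String valueColumn, String valueType) {
        if (localMetrics.contains(metric)) {
            return Path.LOCAL; // valueColumn / valueType are ignored for a locally-defined metric
        }
        if (valueColumn == null || valueType == null) {
            throw new IllegalArgumentException("metric unknown locally: " + metric
                    + " \u2014 provide valueColumn and valueType to inspect a metric persisted by another OAP");
        }
        try {
            ValueType.valueOf(valueType); // assumed case-sensitive
        } catch (IllegalArgumentException e) {
            throw new IllegalArgumentException(
                    "valueType must be one of LONG / INT / DOUBLE / LABELED (got " + valueType + ")");
        }
        // valueColumn is not validated: a wrong column surfaces later as a storage error or an empty result.
        return Path.FOREIGN;
    }
}
```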

Tip: query the writing OAP's own `/inspect/metrics?regex=<metric>` to read the
exact `valueColumnName`, then pass that as `valueColumn`.

Example — `meter_custom_x`, defined on another OAP, inspected here:

```bash
curl 'http://oap-admin:17128/inspect/entities?metric=meter_custom_x&valueColumn=value&valueType=LONG&start=2026-05-10%201230&end=2026-05-10%201240&step=MINUTE'
```
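
When calling the endpoint from code rather than curl, the space inside `start` / `end` must be percent-encoded as in the example above. A minimal sketch, assuming Java 10+ for `URLEncoder.encode(String, Charset)`; `InspectEntitiesUrl` is a hypothetical helper, and the base URL is the document's placeholder `http://oap-admin:17128`.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that builds an /inspect/entities URL; pass a LinkedHashMap to keep parameter order.
final class InspectEntitiesUrl {
    static String build(String baseUrl, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> param : params.entrySet()) {
            // URLEncoder applies form encoding (space -> '+'); switch to %20 to match the curl example.
            String value = URLEncoder.encode(param.getValue(), StandardCharsets.UTF_8).replace("+", "%20");
            query.add(param.getKey() + "=" + value);
        }
        return baseUrl + "/inspect/entities?" + query;
    }
}
```

The resulting string can be handed to `java.net.http.HttpRequest.newBuilder(URI.create(url))` for the actual GET.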

```json
{
"metric": "meter_custom_x",
"scope": null,
"step": "MINUTE",
"start": "2026-05-10 1230",
"end": "2026-05-10 1240",
"rows": [
{
"entityId": "cGF5bWVudA==.1",
"decoded": { "serviceName": "payment", "isReal": true },
"layer": "GENERAL"
}
]
}
```

## Discovering the OAP REST URL for the MQE follow-up

To keep the surface minimal, the inspect API does not introduce a separate
@@ -173,8 +241,9 @@ session start is enough.

| Status | Body | Cause |
|--------|------|-------|
| 400 | `{"error":"unknown metric: foo"}` | Metric not in `ValueColumnMetadata`. |
| 400 | `{"error":"step DAY not supported by metric foo (MINUTE,HOUR)"}` | Metric not materialised at the requested downsampling. |
| 400 | `{"error":"metric unknown locally: foo — provide valueColumn and valueType to inspect a metric persisted by another OAP"}` | Metric not defined on this OAP, and the `valueColumn` / `valueType` pair was not supplied. See [Foreign metrics](#foreign-metrics-not-defined-on-this-oap). |
| 400 | `{"error":"valueType must be one of LONG / INT / DOUBLE / LABELED (got X)"}` | Invalid `valueType` on the foreign-metric path. |
| 400 | `{"error":"step DAY not supported by metric foo (MINUTE,HOUR)"}` | Metric not materialised at the requested downsampling (locally-defined metric only). |
| 400 | `{"error":"metric type HEATMAP is not MQE-queryable; /inspect/entities only accepts REGULAR_VALUE and LABELED_VALUE"}` | Metric is `HEATMAP` (`HISTOGRAM` `dataType`). |
| 400 | `{"error":"metric type SAMPLED_RECORD is out of scope for /inspect/entities"}` | Metric is `SAMPLED_RECORD`. |
| 400 | `{"error":"process scope is out of scope"}` | Scope is `Process` / `ProcessRelation`. |
@@ -21,6 +21,7 @@
import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.skywalking.oap.server.admin.inspect.response.MqeEntity;
import org.apache.skywalking.oap.server.core.Const;
import org.apache.skywalking.oap.server.core.analysis.IDManager;
import org.apache.skywalking.oap.server.core.query.enumeration.Scope;

@@ -92,6 +93,102 @@ public static Decoded decode(final Scope scope, final String entityId) {
}
}

/**
* Structural, scope-free decode for a metric this OAP does not define (no {@link Scope}
* available). The stored {@code entity_id} self-encodes the names with standard base64 plus
* the {@code .} / {@code _} / {@code -} delimiters (none of which appear in base64 output), so
* the entity kind is recoverable from delimiter structure alone:
* <ul>
* <li>no {@code -}, no {@code _} → service</li>
* <li>no {@code -}, one {@code _} → 2nd-level entity (service instance OR endpoint —
* byte-identical encoding, emitted as a generic {@code name})</li>
* <li>one {@code -} (2 parts) → service relation, or 2nd-level relation when each side has a
* {@code _}</li>
* <li>three {@code -} (4 parts) → endpoint relation</li>
* </ul>
* The only thing not recoverable is the instance-vs-endpoint label, so the leaf is reported as
* {@code name} and no {@link MqeEntity} is produced — MQE re-query needs the exact scope, and a
* foreign metric is not MQE-queryable on this node anyway.
*/
public static Decoded decodeUnknownScope(final String entityId) {
final String[] relationParts = entityId.split(Const.RELATION_ID_PARSER_SPLIT);
switch (relationParts.length) {
case 1:
return entityId.contains(Const.ID_CONNECTOR)
? decodeLevel2Generic(entityId)
: decodeServiceGeneric(entityId);
case 2:
return relationParts[0].contains(Const.ID_CONNECTOR)
? decodeLevel2RelationGeneric(entityId)
: decodeServiceRelationGeneric(entityId);
case 4:
return decodeEndpointRelationGeneric(entityId);
default:
throw new IllegalArgumentException(
"cannot structurally decode entity_id without scope: " + entityId);
}
}

private static Decoded decodeServiceGeneric(final String entityId) {
final IDManager.ServiceID.ServiceIDDefinition def = IDManager.ServiceID.analysisId(entityId);
final Map<String, Object> decoded = new LinkedHashMap<>();
decoded.put("serviceName", def.getName());
decoded.put("isReal", def.isReal());
return new Decoded(decoded, null, entityId);
}

private static Decoded decodeLevel2Generic(final String entityId) {
final IDManager.ServiceInstanceID.InstanceIDDefinition def =
IDManager.ServiceInstanceID.analysisId(entityId);
final IDManager.ServiceID.ServiceIDDefinition svc =
IDManager.ServiceID.analysisId(def.getServiceId());
return new Decoded(toLevel2Map(svc, def.getName()), null, def.getServiceId());
}

private static Decoded decodeServiceRelationGeneric(final String entityId) {
final IDManager.ServiceID.ServiceRelationDefine rel =
IDManager.ServiceID.analysisRelationId(entityId);
final IDManager.ServiceID.ServiceIDDefinition src = IDManager.ServiceID.analysisId(rel.getSourceId());
final IDManager.ServiceID.ServiceIDDefinition dst = IDManager.ServiceID.analysisId(rel.getDestId());
final Map<String, Object> decoded = new LinkedHashMap<>();
decoded.put("source", toServiceMap(src));
decoded.put("destination", toServiceMap(dst));
return new Decoded(decoded, null, rel.getSourceId());
}

private static Decoded decodeLevel2RelationGeneric(final String entityId) {
final IDManager.ServiceInstanceID.ServiceInstanceRelationDefine rel =
IDManager.ServiceInstanceID.analysisRelationId(entityId);
final IDManager.ServiceInstanceID.InstanceIDDefinition srcInst =
IDManager.ServiceInstanceID.analysisId(rel.getSourceId());
final IDManager.ServiceInstanceID.InstanceIDDefinition dstInst =
IDManager.ServiceInstanceID.analysisId(rel.getDestId());
final Map<String, Object> decoded = new LinkedHashMap<>();
decoded.put("source", toLevel2Map(IDManager.ServiceID.analysisId(srcInst.getServiceId()), srcInst.getName()));
decoded.put("destination", toLevel2Map(IDManager.ServiceID.analysisId(dstInst.getServiceId()), dstInst.getName()));
return new Decoded(decoded, null, srcInst.getServiceId());
}

private static Decoded decodeEndpointRelationGeneric(final String entityId) {
final IDManager.EndpointID.EndpointRelationDefine rel =
IDManager.EndpointID.analysisRelationId(entityId);
final IDManager.ServiceID.ServiceIDDefinition srcSvc =
IDManager.ServiceID.analysisId(rel.getSourceServiceId());
final IDManager.ServiceID.ServiceIDDefinition dstSvc =
IDManager.ServiceID.analysisId(rel.getDestServiceId());
final Map<String, Object> decoded = new LinkedHashMap<>();
decoded.put("source", toLevel2Map(srcSvc, rel.getSource()));
decoded.put("destination", toLevel2Map(dstSvc, rel.getDest()));
return new Decoded(decoded, null, rel.getSourceServiceId());
}

private static Map<String, Object> toLevel2Map(final IDManager.ServiceID.ServiceIDDefinition svc,
final String leafName) {
final Map<String, Object> map = toServiceMap(svc);
map.put("name", leafName);
return map;
}

private static Decoded decodeService(final String entityId) {
final IDManager.ServiceID.ServiceIDDefinition def = IDManager.ServiceID.analysisId(entityId);
final Map<String, Object> decoded = new LinkedHashMap<>();