From 116ba05293e4ec5291e84c73e43de843dd2b68a5 Mon Sep 17 00:00:00 2001
From: Wu Sheng
Date: Tue, 23 Jun 2026 22:08:30 +0800
Subject: [PATCH 1/3] Extend GET /inspect/entities to inspect foreign metrics persisted by any OAP
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The inspect admin API could only enumerate entities for metrics defined on the queried OAP. Extend it so a metric persisted by ANY OAP — one this node never loaded the OAL/MAL/runtime-rule for — can be inspected when the caller supplies the metric's valueColumn + valueType.

The backend resolves the physical index/table/group from its own running config (no DB schema read): ES uses the merged metrics-all index + metric_table discriminator, JDBC probes the node's function tables by the table_name discriminator, BanyanDB synthesizes a read-only measure schema. entity_id is decoded structurally (scope-free); locally-defined metrics keep exact field names, scope and mqeEntity.

Adds a two-OAP inspect e2e (aware + foreign) across BanyanDB/ES/PostgreSQL driven by 'swctl admin inspect entities --value-column/--value-type' (apache/skywalking-cli#230; SW_CTL_COMMIT bumped to the cli commit that includes the flags).
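A minimal sketch of the scope-free entity_id classification described above. It assumes '-' separates relation parts, '_' separates a service id from a 2nd-level name, and '.' precedes the real/conjectured flag. The class and method names are hypothetical; this is not the EntityDecoder code in this patch, which delegates to IDManager.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration only: classifies an entity_id by delimiter structure.
public class StructuralEntityIdSketch {

    // Assumed delimiters: "-" between relation parts, "_" inside a 2nd-level id.
    public static String kindOf(String entityId) {
        String[] parts = entityId.split("-");
        switch (parts.length) {
            case 1:
                // No relation delimiter: a service, or service + generic leaf name.
                return entityId.contains("_") ? "level2" : "service";
            case 2:
                // Two sides: the structure of one side decides the relation kind.
                return parts[0].contains("_") ? "level2-relation" : "service-relation";
            case 4:
                // sourceService - sourceEndpoint - destService - destEndpoint
                return "endpoint-relation";
            default:
                throw new IllegalArgumentException(
                    "cannot structurally decode entity_id without scope: " + entityId);
        }
    }

    // Assumed service id layout: base64(name) + "." + flag (1 = real, 0 = conjectured).
    public static String serviceName(String serviceId) {
        int dot = serviceId.lastIndexOf('.');
        String encoded = serviceId.substring(0, dot);
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String id = "cGF5bWVudA==.1";
        // prints "service payment"
        System.out.println(kindOf(id) + " " + serviceName(id));
    }
}
```

Standard base64 output never contains '-', '_' or '.', which is why the delimiters alone are enough to tell the kinds apart; only the instance-vs-endpoint label is lost.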
--- .github/workflows/skywalking.yaml | 8 ++ docs/en/changes/changes.md | 1 + docs/en/setup/backend/admin-api/inspect.md | 87 ++++++++++-- .../admin/inspect/decoder/EntityDecoder.java | 97 +++++++++++++ .../inspect/handler/InspectRestHandler.java | 128 +++++++++++++++++- .../inspect/decoder/EntityDecoderTest.java | 97 +++++++++++++ .../core/storage/query/IMetricsQueryDAO.java | 41 ++++-- .../plugin/banyandb/MetadataRegistry.java | 52 +++++++ .../measure/BanyanDBMetricsQueryDAO.java | 20 ++- .../elasticsearch/base/IndexController.java | 9 +- .../query/MetricsQueryEsDAO.java | 30 +++- .../storage/plugin/jdbc/TableMetaInfo.java | 10 ++ .../plugin/jdbc/common/TableHelper.java | 47 +++++++ .../jdbc/common/dao/JDBCMetricsQueryDAO.java | 24 +++- test/e2e-v2/cases/inspect/README.md | 30 ++++ .../cases/inspect/banyandb/docker-compose.yml | 79 +++++++++++ test/e2e-v2/cases/inspect/banyandb/e2e.yaml | 59 ++++++++ .../inspect/elasticsearch/docker-compose.yml | 92 +++++++++++++ .../cases/inspect/elasticsearch/e2e.yaml | 59 ++++++++ test/e2e-v2/cases/inspect/expected/ok.txt | 1 + .../cases/inspect/inspect-foreign-flow.sh | 106 +++++++++++++++ .../cases/inspect/otel-rules/inspect-e2e.yaml | 25 ++++ .../inspect/postgresql/docker-compose.yml | 92 +++++++++++++ test/e2e-v2/cases/inspect/postgresql/e2e.yaml | 59 ++++++++ .../inspect-entities-unknown-metric.yml | 7 +- test/e2e-v2/cases/storage/storage-cases.yaml | 9 +- test/e2e-v2/script/env | 2 +- 27 files changed, 1231 insertions(+), 40 deletions(-) create mode 100644 test/e2e-v2/cases/inspect/README.md create mode 100644 test/e2e-v2/cases/inspect/banyandb/docker-compose.yml create mode 100644 test/e2e-v2/cases/inspect/banyandb/e2e.yaml create mode 100644 test/e2e-v2/cases/inspect/elasticsearch/docker-compose.yml create mode 100644 test/e2e-v2/cases/inspect/elasticsearch/e2e.yaml create mode 100644 test/e2e-v2/cases/inspect/expected/ok.txt create mode 100755 test/e2e-v2/cases/inspect/inspect-foreign-flow.sh create mode 100644 
test/e2e-v2/cases/inspect/otel-rules/inspect-e2e.yaml create mode 100644 test/e2e-v2/cases/inspect/postgresql/docker-compose.yml create mode 100644 test/e2e-v2/cases/inspect/postgresql/e2e.yaml diff --git a/.github/workflows/skywalking.yaml b/.github/workflows/skywalking.yaml index 221de0f4aec8..ae0e30d2b562 100644 --- a/.github/workflows/skywalking.yaml +++ b/.github/workflows/skywalking.yaml @@ -390,6 +390,14 @@ jobs: config: test/e2e-v2/cases/storage/es/es-sharding/e2e.yaml env: ES_VERSION=8.18.8 + - name: Inspect API BanyanDB + config: test/e2e-v2/cases/inspect/banyandb/e2e.yaml + - name: Inspect API PostgreSQL + config: test/e2e-v2/cases/inspect/postgresql/e2e.yaml + - name: Inspect API Elasticsearch 8.18.8 + config: test/e2e-v2/cases/inspect/elasticsearch/e2e.yaml + env: ES_VERSION=8.18.8 + - name: Runtime Rule MAL Storage BanyanDB config: test/e2e-v2/cases/runtime-rule/mal-storage/banyandb/e2e.yaml - name: Runtime Rule MAL Storage PostgreSQL diff --git a/docs/en/changes/changes.md b/docs/en/changes/changes.md index b046523837e9..fde18400e254 100644 --- a/docs/en/changes/changes.md +++ b/docs/en/changes/changes.md @@ -2,6 +2,7 @@ #### Project +* Extend the `GET /inspect/entities` admin API to inspect a metric persisted by **any** OAP, even one this node does not define locally. When the metric is unknown to the local registry, the caller supplies `valueColumn` + `valueType` and the storage backend resolves the physical index/table/group from its own running config (no DB schema/table-metadata read): ES uses the merged `metrics-all` index + `metric_table` discriminator, JDBC probes the node's function tables by the `table_name` discriminator, and BanyanDB synthesizes a read-only measure schema. Scope is no longer required — the `entity_id` is decoded structurally (service / 2nd-level / relations) with a generic `name` leaf. Locally-defined metrics keep the exact field names, scope, and `mqeEntity` as before. 
* Remove the always-on alarm-to-event conversion (`EventHookCallback`). A triggered alarm is no longer synthesized into the events pipeline as an `Alarm`/`AlarmRecovery` event; events now originate only from real event sources (agents, SkyWalking CLI, Kubernetes Event Exporter). Alarms remain available through the alarm store (`getAlarm`/`queryAlarms`) and the configured alarm hooks. This drops a documented "Known Event" and removes 1-2 synthetic event records per alarm fire. * **New `queryAlarms` GraphQL query — entity / layer / rule filters for alarms.** Adds a comprehensive alarm query API alongside the legacy `getAlarm`. The new diff --git a/docs/en/setup/backend/admin-api/inspect.md b/docs/en/setup/backend/admin-api/inspect.md index 9163d57e4729..c43dea12c958 100644 --- a/docs/en/setup/backend/admin-api/inspect.md +++ b/docs/en/setup/backend/admin-api/inspect.md @@ -6,9 +6,12 @@ let operators answer two questions without writing exploratory MQE: 1. *Which metrics has OAP registered, and at what downsampling?* 2. *For metric `X` in time range `T`, which entities currently hold values?* -The output of (2) carries a ready-to-paste `mqeEntity` payload, so the -follow-up MQE call against the public GraphQL `execExpression` mutation is -copy-paste from the inspect response. +For a locally-defined metric, the output of (2) carries a ready-to-paste +`mqeEntity` payload, so the follow-up MQE call against the public GraphQL +`execExpression` mutation is copy-paste from the inspect response. A metric +persisted by **another OAP** that this node does not define can also be +inspected with caller-supplied metadata (see +[Foreign metrics](#foreign-metrics-not-defined-on-this-oap)). ## Enabling @@ -83,8 +86,12 @@ curl 'http://oap-admin:17128/inspect/metrics?regex=service_cpm' ### `GET /inspect/entities` -For a metric + time range + step, returns the entities holding values, each -decoded into a human-readable shape and an MQE-ready `mqeEntity` payload. 
+For a metric + time range + step, returns the entities holding values. For a +metric this OAP defines locally, each row is decoded into a human-readable +shape and an MQE-ready `mqeEntity` payload. A metric persisted by **another +OAP** that this node does not define can also be inspected by additionally +supplying `valueColumn` + `valueType` — see +[Foreign metrics](#foreign-metrics-not-defined-on-this-oap). Restricted to `REGULAR_VALUE` and `LABELED_VALUE` metrics. The non-MQE metric types (`HEATMAP` / `SAMPLED_RECORD`) and the out-of-scope scopes @@ -94,11 +101,13 @@ Query parameters: | Name | Required | Description | |------|----------|-------------| -| `metric` | yes | Metric name. Must resolve in `ValueColumnMetadata`. | +| `metric` | yes | Metric name. If unknown to this OAP's local registry, also supply `valueColumn` + `valueType` (see [Foreign metrics](#foreign-metrics-not-defined-on-this-oap)). | | `start` | yes | Time-range start. Same format as MQE `Duration.start`: `yyyy-MM-dd` (DAY), `yyyy-MM-dd HH` (HOUR), `yyyy-MM-dd HHmm` (MINUTE), `yyyy-MM-dd HHmmss` (SECOND). Note `HHmm` is no-separator — use `1230`, not `12:30`. | | `end` | yes | Time-range end. Format mirrors `start`. | -| `step` | yes | One of `MINUTE` / `HOUR` / `DAY`. Must be one of the metric's `downsamplings`. | +| `step` | yes | One of `MINUTE` / `HOUR` / `DAY`. For a locally-defined metric, must be one of the metric's `downsamplings`; for a foreign metric the requested step is trusted as-is. | | `limit` | no | Server-side cap. Default 300, hard-capped at 300. | +| `valueColumn` | conditional | **Required when `metric` is not defined on this OAP.** The metric's value column (post-override physical name, e.g. `value`, `value_`, `double_value`, `datatable_value`, `dataset`). Ignored for a locally-defined metric. | +| `valueType` | conditional | **Required when `metric` is not defined on this OAP.** One of `LONG` / `INT` / `DOUBLE` / `LABELED`. Ignored for a locally-defined metric. 
| The `limit` is applied as `LIMIT N` at the storage layer — it bounds the total rows scanned (300 ≈ 10 buckets × 30 entities), not 300 distinct @@ -159,6 +168,65 @@ curl 'http://oap-admin:17128/inspect/entities?metric=service_cpm&start=2026-05-1 } ``` +### Foreign metrics (not defined on this OAP) + +A metric persisted by **another OAP** — a different OAL/MAL/runtime-rule set — +is absent from this node's local registry, so its value column, type, and scope +cannot be recovered from the metric name alone (there is no OAL/MAL text here to +read). Supply `valueColumn` + `valueType` on the request and the backend resolves +the physical index/table/group from its own running configuration (the +deterministic metric → storage mapping that merging has used for years), with +**no storage schema / table-metadata read**: + +* **ES** — the merged `metrics-all` index, filtered by the `metric_table` + discriminator. Not supported under `logicSharding=true`, where the physical + index is derived from the metric's stream class (returns `500`). +* **JDBC** — probes the node's aggregation-function metric tables + (`metrics_` / `meter_`) by the `table_name` discriminator. +* **BanyanDB** — synthesizes a read-only measure schema from the deterministic + measure/group mapping. + +Because the scope is unknown, the response degrades gracefully: + +* `scope` is `null` (the structural kind is per-row in `decoded`). +* `entity_id` is decoded **structurally**: a single entity yields `serviceName` + (plus a generic `name` leaf for a 2nd-level instance/endpoint — the two are + byte-identical and not distinguishable without the scope); a relation yields a + `source` / `destination` pair. +* **No `mqeEntity`** is produced — MQE needs the exact scope, and a foreign + metric is not MQE-queryable on this node anyway. + +Existence is decided by the data probe itself, so an **empty result means "no +rows in range", not "metric absent"**. 
Nothing is validated against metadata up +front: a wrong `valueColumn` / `valueType` surfaces as a storage error (`500`) +or an empty result. + +Tip: query the writing OAP's own `/inspect/metrics?regex=` to read the +exact `valueColumnName`, then pass that as `valueColumn`. + +Example — `meter_custom_x`, defined on another OAP, inspected here: + +```bash +curl 'http://oap-admin:17128/inspect/entities?metric=meter_custom_x&valueColumn=value&valueType=LONG&start=2026-05-10%201230&end=2026-05-10%201240&step=MINUTE' +``` + +```json +{ + "metric": "meter_custom_x", + "scope": null, + "step": "MINUTE", + "start": "2026-05-10 1230", + "end": "2026-05-10 1240", + "rows": [ + { + "entityId": "cGF5bWVudA==.1", + "decoded": { "serviceName": "payment", "isReal": true }, + "layer": "GENERAL" + } + ] +} +``` + ## Discovering the OAP REST URL for the MQE follow-up To keep the surface minimal, the inspect API does not introduce a separate @@ -173,8 +241,9 @@ session start is enough. | Status | Body | Cause | |--------|------|-------| -| 400 | `{"error":"unknown metric: foo"}` | Metric not in `ValueColumnMetadata`. | -| 400 | `{"error":"step DAY not supported by metric foo (MINUTE,HOUR)"}` | Metric not materialised at the requested downsampling. | +| 400 | `{"error":"metric unknown locally: foo — provide valueColumn and valueType to inspect a metric persisted by another OAP"}` | Metric not defined on this OAP, and the `valueColumn` / `valueType` pair was not supplied. See [Foreign metrics](#foreign-metrics-not-defined-on-this-oap). | +| 400 | `{"error":"valueType must be one of LONG / INT / DOUBLE / LABELED (got X)"}` | Invalid `valueType` on the foreign-metric path. | +| 400 | `{"error":"step DAY not supported by metric foo (MINUTE,HOUR)"}` | Metric not materialised at the requested downsampling (locally-defined metric only). 
| | 400 | `{"error":"metric type HEATMAP is not MQE-queryable; /inspect/entities only accepts REGULAR_VALUE and LABELED_VALUE"}` | Metric is `HEATMAP` (`HISTOGRAM` `dataType`). | | 400 | `{"error":"metric type SAMPLED_RECORD is out of scope for /inspect/entities"}` | Metric is `SAMPLED_RECORD`. | | 400 | `{"error":"process scope is out of scope"}` | Scope is `Process` / `ProcessRelation`. | diff --git a/oap-server/server-admin/inspect/src/main/java/org/apache/skywalking/oap/server/admin/inspect/decoder/EntityDecoder.java b/oap-server/server-admin/inspect/src/main/java/org/apache/skywalking/oap/server/admin/inspect/decoder/EntityDecoder.java index db8ea609e983..01d0b1abf196 100644 --- a/oap-server/server-admin/inspect/src/main/java/org/apache/skywalking/oap/server/admin/inspect/decoder/EntityDecoder.java +++ b/oap-server/server-admin/inspect/src/main/java/org/apache/skywalking/oap/server/admin/inspect/decoder/EntityDecoder.java @@ -21,6 +21,7 @@ import java.util.LinkedHashMap; import java.util.Map; import org.apache.skywalking.oap.server.admin.inspect.response.MqeEntity; +import org.apache.skywalking.oap.server.core.Const; import org.apache.skywalking.oap.server.core.analysis.IDManager; import org.apache.skywalking.oap.server.core.query.enumeration.Scope; @@ -92,6 +93,102 @@ public static Decoded decode(final Scope scope, final String entityId) { } } + /** + * Structural, scope-free decode for a metric this OAP does not define (no {@link Scope} + * available). The stored {@code entity_id} self-encodes the names with standard base64 plus + * the {@code .} / {@code _} / {@code -} delimiters (none of which appear in base64 output), so + * the entity kind is recoverable from delimiter structure alone: + *
<ul>
+ * <li>no {@code -}, no {@code _} → service</li>
+ * <li>no {@code -}, one {@code _} → 2nd-level entity (service instance OR endpoint —
+ * byte-identical encoding, emitted as a generic {@code name})</li>
+ * <li>one {@code -} (2 parts) → service relation, or 2nd-level relation when each side has a
+ * {@code _}</li>
+ * <li>three {@code -} (4 parts) → endpoint relation</li>
+ * </ul>
+ * The only thing not recoverable is the instance-vs-endpoint label, so the leaf is reported as + * {@code name} and no {@link MqeEntity} is produced — MQE re-query needs the exact scope, and a + * foreign metric is not MQE-queryable on this node anyway. + */ + public static Decoded decodeUnknownScope(final String entityId) { + final String[] relationParts = entityId.split(Const.RELATION_ID_PARSER_SPLIT); + switch (relationParts.length) { + case 1: + return entityId.contains(Const.ID_CONNECTOR) + ? decodeLevel2Generic(entityId) + : decodeServiceGeneric(entityId); + case 2: + return relationParts[0].contains(Const.ID_CONNECTOR) + ? decodeLevel2RelationGeneric(entityId) + : decodeServiceRelationGeneric(entityId); + case 4: + return decodeEndpointRelationGeneric(entityId); + default: + throw new IllegalArgumentException( + "cannot structurally decode entity_id without scope: " + entityId); + } + } + + private static Decoded decodeServiceGeneric(final String entityId) { + final IDManager.ServiceID.ServiceIDDefinition def = IDManager.ServiceID.analysisId(entityId); + final Map decoded = new LinkedHashMap<>(); + decoded.put("serviceName", def.getName()); + decoded.put("isReal", def.isReal()); + return new Decoded(decoded, null, entityId); + } + + private static Decoded decodeLevel2Generic(final String entityId) { + final IDManager.ServiceInstanceID.InstanceIDDefinition def = + IDManager.ServiceInstanceID.analysisId(entityId); + final IDManager.ServiceID.ServiceIDDefinition svc = + IDManager.ServiceID.analysisId(def.getServiceId()); + return new Decoded(toLevel2Map(svc, def.getName()), null, def.getServiceId()); + } + + private static Decoded decodeServiceRelationGeneric(final String entityId) { + final IDManager.ServiceID.ServiceRelationDefine rel = + IDManager.ServiceID.analysisRelationId(entityId); + final IDManager.ServiceID.ServiceIDDefinition src = IDManager.ServiceID.analysisId(rel.getSourceId()); + final IDManager.ServiceID.ServiceIDDefinition dst = 
IDManager.ServiceID.analysisId(rel.getDestId()); + final Map decoded = new LinkedHashMap<>(); + decoded.put("source", toServiceMap(src)); + decoded.put("destination", toServiceMap(dst)); + return new Decoded(decoded, null, rel.getSourceId()); + } + + private static Decoded decodeLevel2RelationGeneric(final String entityId) { + final IDManager.ServiceInstanceID.ServiceInstanceRelationDefine rel = + IDManager.ServiceInstanceID.analysisRelationId(entityId); + final IDManager.ServiceInstanceID.InstanceIDDefinition srcInst = + IDManager.ServiceInstanceID.analysisId(rel.getSourceId()); + final IDManager.ServiceInstanceID.InstanceIDDefinition dstInst = + IDManager.ServiceInstanceID.analysisId(rel.getDestId()); + final Map decoded = new LinkedHashMap<>(); + decoded.put("source", toLevel2Map(IDManager.ServiceID.analysisId(srcInst.getServiceId()), srcInst.getName())); + decoded.put("destination", toLevel2Map(IDManager.ServiceID.analysisId(dstInst.getServiceId()), dstInst.getName())); + return new Decoded(decoded, null, srcInst.getServiceId()); + } + + private static Decoded decodeEndpointRelationGeneric(final String entityId) { + final IDManager.EndpointID.EndpointRelationDefine rel = + IDManager.EndpointID.analysisRelationId(entityId); + final IDManager.ServiceID.ServiceIDDefinition srcSvc = + IDManager.ServiceID.analysisId(rel.getSourceServiceId()); + final IDManager.ServiceID.ServiceIDDefinition dstSvc = + IDManager.ServiceID.analysisId(rel.getDestServiceId()); + final Map decoded = new LinkedHashMap<>(); + decoded.put("source", toLevel2Map(srcSvc, rel.getSource())); + decoded.put("destination", toLevel2Map(dstSvc, rel.getDest())); + return new Decoded(decoded, null, rel.getSourceServiceId()); + } + + private static Map toLevel2Map(final IDManager.ServiceID.ServiceIDDefinition svc, + final String leafName) { + final Map map = toServiceMap(svc); + map.put("name", leafName); + return map; + } + private static Decoded decodeService(final String entityId) { final 
IDManager.ServiceID.ServiceIDDefinition def = IDManager.ServiceID.analysisId(entityId); final Map decoded = new LinkedHashMap<>(); diff --git a/oap-server/server-admin/inspect/src/main/java/org/apache/skywalking/oap/server/admin/inspect/handler/InspectRestHandler.java b/oap-server/server-admin/inspect/src/main/java/org/apache/skywalking/oap/server/admin/inspect/handler/InspectRestHandler.java index 48f29c37370c..e3a5fb1b0b0a 100644 --- a/oap-server/server-admin/inspect/src/main/java/org/apache/skywalking/oap/server/admin/inspect/handler/InspectRestHandler.java +++ b/oap-server/server-admin/inspect/src/main/java/org/apache/skywalking/oap/server/admin/inspect/handler/InspectRestHandler.java @@ -74,6 +74,9 @@ public class InspectRestHandler { private static final int LIMIT_DEFAULT = 300; private static final int LIMIT_MAX = 300; + /** Value types a caller may declare for a foreign (locally-undefined) metric. */ + private static final Set ACCEPTED_FOREIGN_VALUE_TYPES = + Set.of("LONG", "INT", "DOUBLE", "LABELED"); private final ModuleManager moduleManager; @@ -161,17 +164,50 @@ public HttpResponse listMetrics(@Param("regex") final Optional regex, return HttpResponse.ofJson(MediaType.JSON_UTF_8, new MetricsResponse(rows)); } + /** + * Enumerate the entities holding values for a metric in a time range. + * + *
<p>For a metric defined on this OAP, only {@code metric} + time params are needed; metadata
+ * is read from the local registry and the response carries exact field names, scope, and a
+ * re-queryable {@code mqeEntity}.
+ *
+ * <p>
For a metric persisted by ANOTHER OAP that this node does not define (no local registry + * entry, no OAL/MAL text to recover it from), the caller MUST supply the metric's storage + * metadata, which cannot be inferred from the name: + * + * @param valueColumn The metric's value column. Required when the metric is unknown locally. + * A property of the metric's aggregation FUNCTION — one of the built-in + * value columns: {@code value} (common scalar), {@code double_value}, + * {@code int_value}, {@code percentage}, {@code datatable_value} (labeled), + * {@code dataset} (histogram). On MySQL / PostgreSQL pass the + * reserved-word-overridden physical name ({@code value_} etc.). + * @param valueType How to read/decode the value. Required when the metric is unknown locally. + * Accepted: {@code LONG} / {@code INT} / {@code DOUBLE} (scalar) or + * {@code LABELED} (DataTable). HISTOGRAM/heatmap and SAMPLED_RECORD are out + * of scope for this endpoint. + */ @Get("/inspect/entities") public HttpResponse listEntities(@Param("metric") final String metric, @Param("start") final String start, @Param("end") final String end, @Param("step") final String stepStr, - @Param("limit") final Optional limitOpt) { + @Param("limit") final Optional limitOpt, + @Param("valueColumn") final Optional valueColumnOpt, + @Param("valueType") final Optional valueTypeOpt) { // Resolve metadata. final Optional vcOpt = ValueColumnMetadata.INSTANCE.readValueColumnDefinition(metric); if (vcOpt.isEmpty()) { - return error(HttpStatus.BAD_REQUEST, "unknown metric: " + metric); + // Foreign metric: not defined on this OAP. There is no OAL/MAL text or local model to + // recover its value column / type / scope from, so the caller must supply them. Without + // both, fall back to the original "unknown metric" rejection. 
+ if (valueColumnOpt.isEmpty() || valueTypeOpt.isEmpty()) { + return error(HttpStatus.BAD_REQUEST, + "metric unknown locally: " + metric + " — provide valueColumn and valueType to " + + "inspect a metric persisted by another OAP"); + } + return listForeignEntities(metric, valueColumnOpt.get(), valueTypeOpt.get(), + start, end, stepStr, limitOpt); } final ValueColumnMetadata.ValueColumn vc = vcOpt.get(); @@ -247,7 +283,7 @@ public HttpResponse listEntities(@Param("metric") final String metric, final List entityIds; try { entityIds = metricsQueryDAO() - .listEntityIdsInRange(metric, vc.getValueCName(), duration, limit); + .listEntityIdsInRange(metric, vc.getValueCName(), null, duration, limit); } catch (IOException e) { log.warn("listEntityIdsInRange failed for metric={} step={}", metric, step, e); return error(HttpStatus.INTERNAL_SERVER_ERROR, e.getMessage()); @@ -278,6 +314,92 @@ public HttpResponse listEntities(@Param("metric") final String metric, return HttpResponse.ofJson(MediaType.JSON_UTF_8, body); } + /** + * Foreign-metric path: the metric is not defined on this OAP, so the caller supplied + * {@code valueColumn} + {@code valueType}. Nothing is resolved from the local registry — the + * storage DAO derives the physical target from its own running config — and each entity_id is + * decoded structurally (scope-free), emitting a generic {@code name} leaf and no + * {@code mqeEntity}. Errors and empty results flow straight back to the caller; an empty result + * means "no rows in range", not a reliable "metric absent". 
+ */ + private HttpResponse listForeignEntities(final String metric, + final String valueColumn, + final String valueType, + final String start, + final String end, + final String stepStr, + final Optional limitOpt) { + final String type = valueType.toUpperCase(); + if (!ACCEPTED_FOREIGN_VALUE_TYPES.contains(type)) { + return error(HttpStatus.BAD_REQUEST, + "valueType must be one of LONG / INT / DOUBLE / LABELED (got " + valueType + ")"); + } + + final Step step; + try { + step = Step.valueOf(stepStr.toUpperCase()); + } catch (Exception e) { + return error(HttpStatus.BAD_REQUEST, + "step must be one of MINUTE / HOUR / DAY (got " + stepStr + ")"); + } + if (step == Step.SECOND) { + return error(HttpStatus.BAD_REQUEST, + "step must be one of MINUTE / HOUR / DAY (got SECOND)"); + } + + final int limit = limitOpt.orElse(LIMIT_DEFAULT); + if (limit < 1 || limit > LIMIT_MAX) { + return error(HttpStatus.BAD_REQUEST, "limit must be between 1 and " + LIMIT_MAX); + } + + final Duration duration = new Duration(); + duration.setStart(start); + duration.setEnd(end); + duration.setStep(step); + try { + duration.getStartTimeBucket(); + duration.getEndTimeBucket(); + } catch (IllegalArgumentException | UnexpectedException e) { + return error(HttpStatus.BAD_REQUEST, + "start / end must follow the step's date format (DAY: yyyy-MM-dd, HOUR: " + + "yyyy-MM-dd HH, MINUTE: yyyy-MM-dd HHmm): " + e.getMessage()); + } + + final List entityIds; + try { + entityIds = metricsQueryDAO().listEntityIdsInRange(metric, valueColumn, type, duration, limit); + } catch (Exception e) { + // Optimistic read: surface the storage error directly. A wrong valueColumn/valueType, an + // unsupported storage mode (e.g. ES logicSharding), or a missing table lands here. 
+ log.warn("foreign-metric listEntityIdsInRange failed for metric={} step={}", metric, step, e); + return error(HttpStatus.INTERNAL_SERVER_ERROR, e.getMessage()); + } + + final List rows = new ArrayList<>(); + for (final String entityId : entityIds) { + final EntityDecoder.Decoded decoded; + try { + decoded = EntityDecoder.decodeUnknownScope(entityId); + } catch (Exception e) { + log.warn("Failed to structurally decode entity_id={}", entityId, e); + continue; + } + final List layers = lookupLayers(decoded.serviceIdForLayer); + if (layers.isEmpty()) { + rows.add(new EntityRow(entityId, decoded.decodedFields, null, decoded.mqeEntity)); + } else { + for (final String layer : layers) { + rows.add(new EntityRow(entityId, decoded.decodedFields, layer, decoded.mqeEntity)); + } + } + } + + // scope is null: a foreign metric's structural kind is per-row in `decoded`, and a single + // metric's entities all share one structure anyway. + final EntitiesResponse body = new EntitiesResponse(metric, null, step.name(), start, end, rows); + return HttpResponse.ofJson(MediaType.JSON_UTF_8, body); + } + /** * Mirror of the {@code /inspect/entities} type acceptance set. 
Kept in one place so * the {@code mqeQueryable=true} filter on {@code /inspect/metrics} and the actual diff --git a/oap-server/server-admin/inspect/src/test/java/org/apache/skywalking/oap/server/admin/inspect/decoder/EntityDecoderTest.java b/oap-server/server-admin/inspect/src/test/java/org/apache/skywalking/oap/server/admin/inspect/decoder/EntityDecoderTest.java index 890c8bafbaf4..84486f1e174e 100644 --- a/oap-server/server-admin/inspect/src/test/java/org/apache/skywalking/oap/server/admin/inspect/decoder/EntityDecoderTest.java +++ b/oap-server/server-admin/inspect/src/test/java/org/apache/skywalking/oap/server/admin/inspect/decoder/EntityDecoderTest.java @@ -186,4 +186,101 @@ void serviceMqeEntityOmitsRelationFields() { assertNull(d.mqeEntity.getEndpointName()); assertTrue(d.mqeEntity.getNormal()); } + + // ==================== scope-free (foreign metric) decode ==================== + + @Test + void unknownScopeService() { + final String id = IDManager.ServiceID.buildId("payment", true); + final EntityDecoder.Decoded d = EntityDecoder.decodeUnknownScope(id); + assertEquals("payment", d.decodedFields.get("serviceName")); + assertEquals(Boolean.TRUE, d.decodedFields.get("isReal")); + assertNull(d.mqeEntity); + assertEquals(id, d.serviceIdForLayer); + } + + @Test + void unknownScopeServiceConjectured() { + final String id = IDManager.ServiceID.buildId("mysql", false); + final EntityDecoder.Decoded d = EntityDecoder.decodeUnknownScope(id); + assertEquals("mysql", d.decodedFields.get("serviceName")); + assertEquals(Boolean.FALSE, d.decodedFields.get("isReal")); + } + + @Test + void unknownScopeLevel2InstanceAndEndpointDecodeIdentically() { + // Instance and endpoint encode byte-identically (serviceId + "_" + base64(name)), so the + // scope-free decode yields the same shape with a generic "name" leaf for both. 
+ final String svcId = IDManager.ServiceID.buildId("payment", true); + final String instId = IDManager.ServiceInstanceID.buildId(svcId, "pod-01"); + final String epId = IDManager.EndpointID.buildId(svcId, "POST:/charge"); + + final EntityDecoder.Decoded inst = EntityDecoder.decodeUnknownScope(instId); + assertEquals("payment", inst.decodedFields.get("serviceName")); + assertEquals("pod-01", inst.decodedFields.get("name")); + assertNull(inst.decodedFields.get("serviceInstanceName")); + assertNull(inst.mqeEntity); + assertEquals(svcId, inst.serviceIdForLayer); + + final EntityDecoder.Decoded ep = EntityDecoder.decodeUnknownScope(epId); + assertEquals("payment", ep.decodedFields.get("serviceName")); + assertEquals("POST:/charge", ep.decodedFields.get("name")); + assertNull(ep.decodedFields.get("endpointName")); + } + + @SuppressWarnings("unchecked") + @Test + void unknownScopeServiceRelation() { + final String src = IDManager.ServiceID.buildId("checkout", true); + final String dst = IDManager.ServiceID.buildId("payment", true); + final String id = IDManager.ServiceID.buildRelationId( + new IDManager.ServiceID.ServiceRelationDefine(src, dst)); + + final EntityDecoder.Decoded d = EntityDecoder.decodeUnknownScope(id); + final Map source = (Map) d.decodedFields.get("source"); + final Map dest = (Map) d.decodedFields.get("destination"); + assertEquals("checkout", source.get("serviceName")); + assertEquals("payment", dest.get("serviceName")); + assertNull(source.get("name")); + assertNull(d.mqeEntity); + assertEquals(src, d.serviceIdForLayer); + } + + @SuppressWarnings("unchecked") + @Test + void unknownScopeLevel2Relation() { + final String srcSvc = IDManager.ServiceID.buildId("consumer", true); + final String dstSvc = IDManager.ServiceID.buildId("provider", true); + final String srcInst = IDManager.ServiceInstanceID.buildId(srcSvc, "pod-a"); + final String dstInst = IDManager.ServiceInstanceID.buildId(dstSvc, "pod-b"); + final String id = 
IDManager.ServiceInstanceID.buildRelationId( + new IDManager.ServiceInstanceID.ServiceInstanceRelationDefine(srcInst, dstInst)); + + final EntityDecoder.Decoded d = EntityDecoder.decodeUnknownScope(id); + final Map source = (Map) d.decodedFields.get("source"); + final Map dest = (Map) d.decodedFields.get("destination"); + assertEquals("consumer", source.get("serviceName")); + assertEquals("pod-a", source.get("name")); + assertEquals("provider", dest.get("serviceName")); + assertEquals("pod-b", dest.get("name")); + assertEquals(srcSvc, d.serviceIdForLayer); + } + + @SuppressWarnings("unchecked") + @Test + void unknownScopeEndpointRelation() { + final String srcSvc = IDManager.ServiceID.buildId("consumer", true); + final String dstSvc = IDManager.ServiceID.buildId("provider", true); + final String id = IDManager.EndpointID.buildRelationId( + new IDManager.EndpointID.EndpointRelationDefine(srcSvc, "/order", dstSvc, "/charge")); + + final EntityDecoder.Decoded d = EntityDecoder.decodeUnknownScope(id); + final Map source = (Map) d.decodedFields.get("source"); + final Map dest = (Map) d.decodedFields.get("destination"); + assertEquals("consumer", source.get("serviceName")); + assertEquals("/order", source.get("name")); + assertEquals("provider", dest.get("serviceName")); + assertEquals("/charge", dest.get("name")); + assertEquals(srcSvc, d.serviceIdForLayer); + } } diff --git a/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/storage/query/IMetricsQueryDAO.java b/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/storage/query/IMetricsQueryDAO.java index 6b5b49426ae4..3b0061e45bcf 100644 --- a/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/storage/query/IMetricsQueryDAO.java +++ b/oap-server/server-core/src/main/java/org/apache/skywalking/oap/server/core/storage/query/IMetricsQueryDAO.java @@ -118,26 +118,51 @@ List readLabeledMetricsValuesWithoutEntity(String metricName, /** * List distinct 
{@code entity_id}s that have at least one row for the given metric in the * given time range, capped at {@code limit}. Used by the {@code /inspect/entities} - * admin-server endpoint to enumerate the entities currently emitting values for a metric — - * feeds the decoded {@code mqeEntity} payload the inspect API hands back to operators so - * they can follow up with a public-GraphQL MQE query. + * admin-server endpoint to enumerate the entities currently emitting values for a metric. * *

<p>Order: most recent timestamp first within the range so callers see live entities ahead * of stale ones. Backends dedup by {@code entity_id} before returning. The {@code limit} * argument is a server-side cap on the rows scanned, not a guarantee on distinct entities * (300 rows ≈ 10 buckets × 30 entities). * + *
<p>Handles two cases through one path, mirroring the single {@code /inspect/entities} + * endpoint: + *
<ul> + *
<li>Locally-defined metric — the model / {@code ValueColumnMetadata} entry exists, + * so the backend resolves the physical index/table/group from its registry as before and + * the {@code valueType} hint is unused.</li> + *
<li>Foreign metric — persisted by another OAP whose OAL/MAL/runtime-rule set this + * node never loaded, so there is no local model. The backend resolves the physical target + * from its OWN running configuration (the deterministic metric → storage mapping that + * merging has used for years) WITHOUT reading any storage schema/table metadata, using + * the caller-supplied {@code valueColumnName} + {@code valueType}. Existence is decided by + * the data probe itself (the merged-table discriminator {@code metric_table} / + * {@code table_name} on ES / JDBC, the synthesized measure on BanyanDB), so an empty + * result means "no rows in range", never a reliable "metric absent".</li> + *
</ul> + * * <p>
Abstract on purpose — any 3rd party storage backend that implements - * {@code IMetricsQueryDAO} after 10.5.0 MUST provide this override. A default - * (empty list or thrown exception) would let a missing override slip through - * compilation and surface as a runtime "no entities" or 500 the first time the - * inspect API hit that backend; the breaking-at-compile signal is the safer - * contract for the inspect storage path. + * {@code IMetricsQueryDAO} MUST provide this override. A default (empty list or thrown + * exception) would let a missing override slip through compilation and surface as a runtime + * "no entities" or 500 the first time the inspect API hit that backend; the breaking-at-compile + * signal is the safer contract for the inspect storage path. * + * @param metricName metric (model) name; also the merged-table discriminator value. + * @param valueColumnName the metric's value column (post-override physical name). Required for + * the foreign-metric path (BanyanDB projects/defines the field with it); + * ES / JDBC entity enumeration is value-column-agnostic. + * @param valueType value data type for a foreign metric — one of {@code LONG} / + * {@code INT} / {@code DOUBLE} / {@code LABELED}; drives BanyanDB + * field-type synthesis. {@code null} for a locally-defined metric (the + * backend reads the type from its local model). + * @param duration query time range + step. + * @param limit server-side row cap. + * @return distinct entity ids holding values for the metric in range, most-recent first. 
* @since 10.5.0 */ List listEntityIdsInRange(String metricName, String valueColumnName, + String valueType, Duration duration, int limit) throws IOException; diff --git a/oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/MetadataRegistry.java b/oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/MetadataRegistry.java index a8591e5ee9a3..415e183a88ce 100644 --- a/oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/MetadataRegistry.java +++ b/oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/MetadataRegistry.java @@ -425,6 +425,58 @@ public Schema findMetricMetadata(final String modelName, DownSampling downSampli return this.registry.get(SchemaMetadata.formatName(modelName, downSampling)); } + /** + * Synthesize a read-only measure {@link Schema} for a metric this OAP does not define locally + * (persisted by another OAP). No storage schema read is performed: the measure name / group / + * entity_id tag are derived from the deterministic metric → measure mapping, and the value field + * from the caller-supplied {@code valueColumn} / {@code valueType}. The node-global namespace is + * borrowed from any registered measure (the node always has its own metrics). BanyanDB validates + * the projection server-side, so a wrong value column surfaces as a query error. + * + * @return the synthesized schema, or {@code null} if this node has no registered measure to + * borrow the namespace from (foreign-metric inspect is then unavailable here). 
+ */ + public Schema synthesizeForeignMetricSchema(final String metricName, + final Step step, + final String valueColumn, + final String valueType) { + final DownSampling downSampling = deriveFromStep(step); + final String rawGroup; + switch (downSampling) { + case Minute: + rawGroup = BanyanDB.MeasureGroup.METRICS_MINUTE.getName(); + break; + case Hour: + rawGroup = BanyanDB.MeasureGroup.METRICS_HOUR.getName(); + break; + case Day: + rawGroup = BanyanDB.MeasureGroup.METRICS_DAY.getName(); + break; + default: + throw new IllegalArgumentException( + "foreign-metric inspect supports step MINUTE / HOUR / DAY only, got " + step); + } + // namespace is node-global; borrow it from any registered measure. convertGroupName treats a + // null/empty namespace as "no prefix". + String namespace = null; + for (final Schema registered : this.registry.values()) { + if (registered.getMetadata().getKind() == Kind.MEASURE) { + namespace = registered.getMetadata().getNamespace(); + break; + } + } + final SchemaMetadata metadata = new SchemaMetadata( + namespace, rawGroup, metricName, Kind.MEASURE, downSampling, null); + final Class fieldClass = "DOUBLE".equalsIgnoreCase(valueType) ? 
double.class : long.class; + return Schema.builder() + .metadata(metadata) + .tag(Metrics.ENTITY_ID, metadata.indexFamily()) + .field(valueColumn) + .spec(Metrics.ENTITY_ID, new ColumnSpec(ColumnType.TAG, String.class)) + .spec(valueColumn, new ColumnSpec(ColumnType.FIELD, fieldClass)) + .build(); + } + public Schema findRecordMetadata(final String modelName) { return this.registry.get(modelName); } diff --git a/oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/measure/BanyanDBMetricsQueryDAO.java b/oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/measure/BanyanDBMetricsQueryDAO.java index 1c139df0beec..413914177438 100644 --- a/oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/measure/BanyanDBMetricsQueryDAO.java +++ b/oap-server/server-storage-plugin/storage-banyandb-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/banyandb/measure/BanyanDBMetricsQueryDAO.java @@ -169,12 +169,26 @@ protected void apply(MeasureQuery query) { @Override public List listEntityIdsInRange(final String metricName, final String valueColumnName, + final String valueType, final Duration duration, final int limit) throws IOException { final boolean isColdStage = duration != null && duration.isColdStage(); - final MetadataRegistry.Schema schema = MetadataRegistry.INSTANCE.findMetricMetadata(metricName, duration.getStep()); - if (schema == null) { - throw new IOException("schema is not registered"); + final MetadataRegistry.Schema schema; + if (valueType != null) { + // Foreign metric: no local schema. Synthesize a read-only measure schema from the + // deterministic name → measure/group mapping plus the caller's value column / type. 
+ schema = MetadataRegistry.INSTANCE.synthesizeForeignMetricSchema( + metricName, duration.getStep(), valueColumnName, valueType); + if (schema == null) { + throw new IOException( + "cannot inspect foreign metric " + metricName + " on BanyanDB: this node has no " + + "registered measure to resolve the namespace/group from"); + } + } else { + schema = MetadataRegistry.INSTANCE.findMetricMetadata(metricName, duration.getStep()); + if (schema == null) { + throw new IOException("schema is not registered"); + } } final MeasureQueryResponse resp = query( isColdStage, schema, diff --git a/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/base/IndexController.java b/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/base/IndexController.java index 29f4211d9d8b..d41ebf755443 100644 --- a/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/base/IndexController.java +++ b/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/base/IndexController.java @@ -53,12 +53,19 @@ public enum IndexController { @Getter private boolean enableCustomRouting = false; + /** + * The single physical index every metric model merges into when {@link #logicSharding} is off + * (the default for years). Used at install/write time and by the inspect read path for a + * foreign metric, whose physical index cannot be resolved through the (absent) local model. + */ + public static final String METRICS_LOGIC_TABLE_NAME = "metrics-all"; + public String getTableName(Model model) { if (!model.isTimeSeries()) { return "management"; } if (!logicSharding) { - return model.isMetric() ? "metrics-all" : + return model.isMetric() ? 
METRICS_LOGIC_TABLE_NAME : (model.isRecord() && !model.isSuperDataset() ? "records-all" : model.getName()); } String aggFuncName = FunctionCategory.uniqueFunctionName(model.getStreamClass()); diff --git a/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/query/MetricsQueryEsDAO.java b/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/query/MetricsQueryEsDAO.java index 82123c83c109..b4c44e0be2e9 100644 --- a/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/query/MetricsQueryEsDAO.java +++ b/oap-server/server-storage-plugin/storage-elasticsearch-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/elasticsearch/query/MetricsQueryEsDAO.java @@ -18,6 +18,7 @@ package org.apache.skywalking.oap.server.storage.plugin.elasticsearch.query; +import java.io.IOException; import java.util.ArrayList; import java.util.HashMap; import java.util.LinkedHashSet; @@ -197,8 +198,31 @@ public List readLabeledMetricsValuesWithoutEntity(final String me @Override public List listEntityIdsInRange(final String metricName, final String valueColumnName, + final String valueType, final Duration duration, - final int limit) { + final int limit) throws IOException { + // valueType != null signals a foreign metric (not defined on this OAP). The value column + // is unused by ES entity enumeration; only the physical index + discriminator differ. + final boolean foreign = valueType != null; + final String physicalIndex; + final boolean filterByDiscriminator; + if (foreign) { + // Resolve from running config, not the local registry. In the default merged mode + // every metric lives in the single METRICS_LOGIC_TABLE_NAME index and carries the + // metric_table discriminator. 
Under logicSharding the physical index is derived from + // the (absent) stream class, so it cannot be resolved without the local model. + if (IndexController.INSTANCE.isLogicSharding()) { + throw new IOException( + "inspecting a foreign metric is unsupported under ES logicSharding=true: the " + + "physical index is derived from the metric's stream class, which this OAP " + + "does not have for " + metricName); + } + physicalIndex = IndexController.METRICS_LOGIC_TABLE_NAME; + filterByDiscriminator = true; + } else { + physicalIndex = IndexController.LogicIndicesRegister.getPhysicalTableName(metricName); + filterByDiscriminator = IndexController.LogicIndicesRegister.isMergedTable(metricName); + } final SearchBuilder search = Search.builder().size(limit); // Most-recent-first ordering must be explicit — without sort the hit set is // score / index-internal ordered, so a hot entity that ingested late can be dropped @@ -207,7 +231,7 @@ public List listEntityIdsInRange(final String metricName, final BoolQueryBuilder query = Query.bool().must(Query.range(Metrics.TIME_BUCKET) .lte(duration.getEndTimeBucket()) .gte(duration.getStartTimeBucket())); - if (IndexController.LogicIndicesRegister.isMergedTable(metricName)) { + if (filterByDiscriminator) { query.must(Query.term( IndexController.LogicIndicesRegister.METRIC_TABLE_NAME, metricName @@ -215,7 +239,7 @@ public List listEntityIdsInRange(final String metricName, } search.query(query); final SearchResponse response = getClient().search(new TimeRangeIndexNameGenerator( - IndexController.LogicIndicesRegister.getPhysicalTableName(metricName), + physicalIndex, duration.getStartTimeBucketInSec(), duration.getEndTimeBucketInSec()), search.build()); // Top-N hits across the time range, dedup client-side on entity_id. 
LinkedHashSet diff --git a/oap-server/server-storage-plugin/storage-jdbc-hikaricp-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/jdbc/TableMetaInfo.java b/oap-server/server-storage-plugin/storage-jdbc-hikaricp-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/jdbc/TableMetaInfo.java index 559d7cf391a8..339eb8ac4bc9 100644 --- a/oap-server/server-storage-plugin/storage-jdbc-hikaricp-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/jdbc/TableMetaInfo.java +++ b/oap-server/server-storage-plugin/storage-jdbc-hikaricp-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/jdbc/TableMetaInfo.java @@ -20,6 +20,8 @@ import org.apache.skywalking.oap.server.core.storage.model.Model; +import java.util.Collection; +import java.util.Collections; import java.util.HashMap; import java.util.Map; @@ -39,4 +41,12 @@ public static void addModel(Model model) { public static Model get(String moduleName) { return TABLES.get(moduleName); } + + /** + * All locally-installed models. The inspect foreign-metric probe uses this to enumerate the + * node's metric function tables without a per-metric {@link Model} lookup. 
+ */ + public static Collection getModels() { + return Collections.unmodifiableCollection(TABLES.values()); + } } diff --git a/oap-server/server-storage-plugin/storage-jdbc-hikaricp-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/jdbc/common/TableHelper.java b/oap-server/server-storage-plugin/storage-jdbc-hikaricp-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/jdbc/common/TableHelper.java index 95575b6da78a..b3c7e358b695 100644 --- a/oap-server/server-storage-plugin/storage-jdbc-hikaricp-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/jdbc/common/TableHelper.java +++ b/oap-server/server-storage-plugin/storage-jdbc-hikaricp-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/jdbc/common/TableHelper.java @@ -174,6 +174,53 @@ public List getTablesWithinTTL(String modelName) { .collect(toList()); } + /** + * Distinct physical (raw) table names of every aggregation-FUNCTION metric model installed on + * this node — the closed set of {@code metrics_} / {@code meter_} tables that a foreign + * metric (defined by another OAP) must also live in. Used by the inspect probe. + * + *

Filtered to function metrics, NOT all {@code isMetric()} models: metadata "metrics" such as + * {@code ServiceTraffic} / {@code InstanceTraffic} / {@code EndpointTraffic} are {@link Metrics} + * subclasses (so {@code isMetric()} is true) but carry no aggregation function, no + * {@code entity_id}, and no {@code table_name} discriminator column. Probing them with + * {@code select entity_id ... where table_name = ?} would hit "column not found" and 500. Only + * function metrics are merged into the shared {@code metrics_} tables and always carry both + * columns. + */ + public static List getMetricRawTables() { + return TableMetaInfo.getModels().stream() + .filter(TableHelper::isFunctionMetric) + .map(TableHelper::getTableName) + .distinct() + .collect(toList()); + } + + /** + * Day-partitioned tables for a RAW physical table name (a metric function table) within a + * time-bucket range, filtered to those that actually exist. Unlike + * {@link #getTablesForRead(String, long, long)} this needs no local {@link Model}, so it backs + * the foreign-metric inspect probe across the node's known function tables. 
+ */ + public List getExistingDayTables(String rawTableName, long timeBucketStart, long timeBucketEnd) { + final var timestampStart = TimeBucket.getTimestamp(timeBucketStart); + final var timestampEnd = TimeBucket.getTimestamp(timeBucketEnd); + final var timeBuckets = LongStream.builder(); + for (var timestamp = timestampStart; timestamp <= timestampEnd; timestamp += TimeUnit.DAYS.toMillis(1)) { + timeBuckets.add(TimeBucket.getTimeBucket(timestamp, DownSampling.Day)); + } + return timeBuckets.build() + .distinct() + .mapToObj(timeBucket -> getTable(rawTableName, timeBucket)) + .filter(table -> { + try { + return tableExistence.get(table); + } catch (Exception e) { + throw new RuntimeException(e); + } + }) + .collect(toList()); + } + public static String generateId(Model model, String originalID) { if (model.isRecord() && !model.isSuperDataset()) { return generateId(model.getName(), originalID); diff --git a/oap-server/server-storage-plugin/storage-jdbc-hikaricp-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/jdbc/common/dao/JDBCMetricsQueryDAO.java b/oap-server/server-storage-plugin/storage-jdbc-hikaricp-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/jdbc/common/dao/JDBCMetricsQueryDAO.java index 2ded45d40871..5aa18fc9e2a9 100644 --- a/oap-server/server-storage-plugin/storage-jdbc-hikaricp-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/jdbc/common/dao/JDBCMetricsQueryDAO.java +++ b/oap-server/server-storage-plugin/storage-jdbc-hikaricp-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/jdbc/common/dao/JDBCMetricsQueryDAO.java @@ -203,13 +203,27 @@ public List readLabeledMetricsValuesWithoutEntity(final String me @SneakyThrows public List listEntityIdsInRange(final String metricName, final String valueColumnName, + final String valueType, final Duration duration, final int limit) { - final var tables = tableHelper.getTablesForRead( - metricName, - duration.getStartTimeBucket(), - 
duration.getEndTimeBucket() - ); + // valueType != null signals a foreign metric (not defined on this OAP). Its physical table + // is per-function, derivable only from the absent model, so probe the node's known metric + // function tables; the table_name = ? discriminator below keeps only this metric's rows. A + // locally-defined metric resolves straight to its own table set. + final List tables; + if (valueType != null) { + tables = new ArrayList<>(); + for (final var rawTable : TableHelper.getMetricRawTables()) { + tables.addAll(tableHelper.getExistingDayTables( + rawTable, duration.getStartTimeBucket(), duration.getEndTimeBucket())); + } + } else { + tables = tableHelper.getTablesForRead( + metricName, + duration.getStartTimeBucket(), + duration.getEndTimeBucket() + ); + } // For each entity_id, track the latest time_bucket seen across every day-partitioned // table the range touches. Per-table query shape is GROUP BY entity_id with MAX(time_bucket) // and ORDER BY that max — portable across H2 / MySQL / PostgreSQL (Postgres rejects diff --git a/test/e2e-v2/cases/inspect/README.md b/test/e2e-v2/cases/inspect/README.md new file mode 100644 index 000000000000..32ff07a7404a --- /dev/null +++ b/test/e2e-v2/cases/inspect/README.md @@ -0,0 +1,30 @@ +# Inspect API e2e — aware (existing) + foreign-metric (new) paths + +Two independent OAPs share one storage backend (no cluster): + +- **oap-a** loads `otel-rules/inspect-e2e.yaml`, turning the OTLP emitter's + `e2e_rr_pool_size` into the Service metric `meter_inspect_e2e_pool`. +- **oap-b** loads no such rule, so that metric is **foreign** to it — absent from its + local registry but present in the shared storage. 
+ +`inspect-foreign-flow.sh` drives both paths and asserts inline: + +| Path | OAP | Assertion | +|------|-----|-----------| +| aware (existing) | oap-a | `/inspect/metrics` lists it; `/inspect/entities` returns `inspect-e2e-svc` with an `mqeEntity` | +| — | oap-b | `/inspect/metrics` excludes it | +| aware, no metadata | oap-b | `/inspect/entities` → `400 metric unknown locally …` | +| **foreign (new)** | oap-b | `/inspect/entities --value-column --value-type` returns the same entity, `scope:null`, no `mqeEntity` | + +Covered storages: `banyandb/`, `elasticsearch/`, `postgresql/`. + +## CI wiring (gated on skywalking-cli) + +The foreign assertion calls `swctl admin inspect entities --value-column / --value-type`, +flags added in [skywalking-cli #230](https://github.com/apache/skywalking-cli/pull/230). +The e2e builds swctl from `SW_CTL_COMMIT` (`test/e2e-v2/script/env`), which is pinned to a +cli commit that includes those flags, and the three storage variants are wired into the +`e2e` matrix in `.github/workflows/skywalking.yaml`. + +To validate locally, build swctl from that commit (or newer) and run any variant's +`e2e.yaml` with `skywalking-infra-e2e`; all three storages pass. diff --git a/test/e2e-v2/cases/inspect/banyandb/docker-compose.yml b/test/e2e-v2/cases/inspect/banyandb/docker-compose.yml new file mode 100644 index 000000000000..10ce114659bd --- /dev/null +++ b/test/e2e-v2/cases/inspect/banyandb/docker-compose.yml @@ -0,0 +1,79 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# Inspect foreign-metric e2e — BanyanDB. Two independent OAPs share one BanyanDB +# (no cluster). oap-a loads the inspect-e2e otel-rule (→ meter_inspect_e2e_pool); +# oap-b does not, so that metric is foreign to it. +services: + oap-a: + extends: + file: ../../../script/docker-compose/base-compose.yml + service: oap + volumes: + - ./../otel-rules/inspect-e2e.yaml:/skywalking/config/otel-rules/inspect-e2e.yaml + environment: + SW_ADMIN_SERVER: default + SW_INSPECT: default + JAVA_OPTS: "-Xms512m -Xmx1g" + SW_STORAGE: banyandb + SW_OTEL_RECEIVER: default + SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES: "inspect-e2e" + ports: + - "17128:17128" + - "12800:12800" + depends_on: + banyandb: + condition: service_healthy + + oap-b: + extends: + file: ../../../script/docker-compose/base-compose.yml + service: oap + environment: + SW_ADMIN_SERVER: default + SW_INSPECT: default + JAVA_OPTS: "-Xms512m -Xmx1g" + SW_STORAGE: banyandb + ports: + - "17129:17128" + - "12801:12800" + depends_on: + # Stagger boot: let oap-a initialize the shared storage schema first, then + # oap-b joins an already-initialized backend. Two OAPs racing first-time + # schema init on a fresh shared DB conflict on ES/JDBC; this also mirrors a + # real "second OAP joins the cluster" order. 
+ oap-a: + condition: service_healthy + + banyandb: + extends: + file: ../../../script/docker-compose/base-compose.yml + service: banyandb + + otlp-emitter: + build: + context: ../../runtime-rule/mal-storage/otlp-emitter + networks: + - e2e + environment: + OTLP_ENDPOINT: http://oap-a:11800 + EMITTER_SERVICE: inspect-e2e-svc + EMITTER_INSTANCE: inspect-e2e-i1 + depends_on: + oap-a: + condition: service_healthy + +networks: + e2e: diff --git a/test/e2e-v2/cases/inspect/banyandb/e2e.yaml b/test/e2e-v2/cases/inspect/banyandb/e2e.yaml new file mode 100644 index 000000000000..f49abbddf129 --- /dev/null +++ b/test/e2e-v2/cases/inspect/banyandb/e2e.yaml @@ -0,0 +1,59 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# Inspect API e2e (existing aware path + new foreign-metric path) — BanyanDB. +# Two OAPs share one BanyanDB; oap-a defines meter_inspect_e2e_pool, oap-b does not. +# The flow script asserts the aware path on oap-a (17128) and the foreign path on +# oap-b (17129, via --value-column / --value-type). 
+ +setup: + env: compose + file: docker-compose.yml + timeout: 25m + init-system-environment: ../../../script/env + steps: + - name: set PATH + command: export PATH=/tmp/skywalking-infra-e2e/bin:$PATH + - name: install yq + command: bash test/e2e-v2/script/prepare/setup-e2e-shell/install.sh yq + - name: install swctl + command: bash test/e2e-v2/script/prepare/setup-e2e-shell/install.sh swctl + - name: install jq + command: | + if ! command -v jq >/dev/null 2>&1; then + curl -fsSL -o /tmp/skywalking-infra-e2e/bin/jq \ + https://github.com/jqlang/jq/releases/download/jq-1.7.1/jq-linux-amd64 + chmod +x /tmp/skywalking-infra-e2e/bin/jq + fi + - name: drive inspect aware + foreign flow + command: | + set -euo pipefail + export PATH=/tmp/skywalking-infra-e2e/bin:$PATH + export A_REST=http://127.0.0.1:17128 + export B_REST=http://127.0.0.1:17129 + bash test/e2e-v2/cases/inspect/inspect-foreign-flow.sh + +verify: + retry: + count: 1 + interval: 1s + cases: + # The flow script drives every assertion inline; this trailing smoke check + # confirms the inspect port is still serving after the foreign-read phase. + - query: swctl --display json --admin-url=http://127.0.0.1:17128 admin inspect metrics --regex meter_inspect_e2e_pool >/dev/null && echo ok + expected: ../expected/ok.txt + +cleanup: + on: always diff --git a/test/e2e-v2/cases/inspect/elasticsearch/docker-compose.yml b/test/e2e-v2/cases/inspect/elasticsearch/docker-compose.yml new file mode 100644 index 000000000000..d408ac3928d1 --- /dev/null +++ b/test/e2e-v2/cases/inspect/elasticsearch/docker-compose.yml @@ -0,0 +1,92 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. 
You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# Inspect foreign-metric e2e — Elasticsearch. Two OAPs share one ES (no cluster). +# oap-a loads the inspect-e2e otel-rule (→ meter_inspect_e2e_pool); oap-b does not, +# so that metric is foreign to it (read via the merged metrics-all index). +services: + es: + image: elastic/elasticsearch:${ES_VERSION} + expose: + - 9200 + networks: + - e2e + environment: + - discovery.type=single-node + - cluster.routing.allocation.disk.threshold_enabled=false + - xpack.security.enabled=false + healthcheck: + test: ["CMD", "bash", "-c", "cat < /dev/null > /dev/tcp/127.0.0.1/9200"] + interval: 5s + timeout: 60s + retries: 120 + + oap-a: + extends: + file: ../../../script/docker-compose/base-compose.yml + service: oap + volumes: + - ./../otel-rules/inspect-e2e.yaml:/skywalking/config/otel-rules/inspect-e2e.yaml + environment: + SW_ADMIN_SERVER: default + SW_INSPECT: default + JAVA_OPTS: "-Xms512m -Xmx1g" + SW_STORAGE: elasticsearch + SW_STORAGE_ES_CLUSTER_NODES: es:9200 + SW_OTEL_RECEIVER: default + SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES: "inspect-e2e" + ports: + - "17128:17128" + - "12800:12800" + depends_on: + es: + condition: service_healthy + + oap-b: + extends: + file: ../../../script/docker-compose/base-compose.yml + service: oap + environment: + SW_ADMIN_SERVER: default + SW_INSPECT: default + JAVA_OPTS: "-Xms512m -Xmx1g" + SW_STORAGE: elasticsearch + SW_STORAGE_ES_CLUSTER_NODES: es:9200 + ports: + - "17129:17128" + - "12801:12800" + depends_on: + # Stagger boot: let oap-a initialize the shared storage schema first, then + # oap-b joins an already-initialized 
backend. Two OAPs racing first-time + # index init on a fresh shared ES conflict; this also mirrors a real + # "second OAP joins the cluster" order. + oap-a: + condition: service_healthy + + otlp-emitter: + build: + context: ../../runtime-rule/mal-storage/otlp-emitter + networks: + - e2e + environment: + OTLP_ENDPOINT: http://oap-a:11800 + EMITTER_SERVICE: inspect-e2e-svc + EMITTER_INSTANCE: inspect-e2e-i1 + depends_on: + oap-a: + condition: service_healthy + +networks: + e2e: diff --git a/test/e2e-v2/cases/inspect/elasticsearch/e2e.yaml b/test/e2e-v2/cases/inspect/elasticsearch/e2e.yaml new file mode 100644 index 000000000000..4e12b3b05216 --- /dev/null +++ b/test/e2e-v2/cases/inspect/elasticsearch/e2e.yaml @@ -0,0 +1,59 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# Inspect API e2e (existing aware path + new foreign-metric path) — Elasticsearch. +# Two OAPs share one Elasticsearch; oap-a defines meter_inspect_e2e_pool, oap-b does not. +# The flow script asserts the aware path on oap-a (17128) and the foreign path on +# oap-b (17129, via --value-column / --value-type). 
+ +setup: + env: compose + file: docker-compose.yml + timeout: 25m + init-system-environment: ../../../script/env + steps: + - name: set PATH + command: export PATH=/tmp/skywalking-infra-e2e/bin:$PATH + - name: install yq + command: bash test/e2e-v2/script/prepare/setup-e2e-shell/install.sh yq + - name: install swctl + command: bash test/e2e-v2/script/prepare/setup-e2e-shell/install.sh swctl + - name: install jq + command: | + if ! command -v jq >/dev/null 2>&1; then + curl -fsSL -o /tmp/skywalking-infra-e2e/bin/jq \ + https://github.com/jqlang/jq/releases/download/jq-1.7.1/jq-linux-amd64 + chmod +x /tmp/skywalking-infra-e2e/bin/jq + fi + - name: drive inspect aware + foreign flow + command: | + set -euo pipefail + export PATH=/tmp/skywalking-infra-e2e/bin:$PATH + export A_REST=http://127.0.0.1:17128 + export B_REST=http://127.0.0.1:17129 + bash test/e2e-v2/cases/inspect/inspect-foreign-flow.sh + +verify: + retry: + count: 1 + interval: 1s + cases: + # The flow script drives every assertion inline; this trailing smoke check + # confirms the inspect port is still serving after the foreign-read phase. + - query: swctl --display json --admin-url=http://127.0.0.1:17128 admin inspect metrics --regex meter_inspect_e2e_pool >/dev/null && echo ok + expected: ../expected/ok.txt + +cleanup: + on: always diff --git a/test/e2e-v2/cases/inspect/expected/ok.txt b/test/e2e-v2/cases/inspect/expected/ok.txt new file mode 100644 index 000000000000..9766475a4185 --- /dev/null +++ b/test/e2e-v2/cases/inspect/expected/ok.txt @@ -0,0 +1 @@ +ok diff --git a/test/e2e-v2/cases/inspect/inspect-foreign-flow.sh b/test/e2e-v2/cases/inspect/inspect-foreign-flow.sh new file mode 100755 index 000000000000..f86a130f2093 --- /dev/null +++ b/test/e2e-v2/cases/inspect/inspect-foreign-flow.sh @@ -0,0 +1,106 @@ +#!/usr/bin/env bash +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. 
See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. +# +# Inspect API e2e — existing (OAP-aware) + new (foreign-metric) paths against two +# independent OAPs that SHARE one storage backend (no cluster). +# +# OAP-A loads otel-rules/inspect-e2e.yaml and turns the OTLP emitter's +# e2e_rr_pool_size into the Service metric meter_inspect_e2e_pool. +# OAP-B loads NO such rule, so meter_inspect_e2e_pool is "foreign" to it — +# absent from its local registry but present in the shared storage. +# +# Asserts: +# 1. OAP-A /inspect/metrics lists the metric, /inspect/entities (AWARE/old path) +# returns inspect-e2e-svc with an mqeEntity. +# 2. OAP-B /inspect/metrics does NOT list it. +# 3. OAP-B /inspect/entities WITHOUT valueColumn/valueType → "unknown locally". +# 4. OAP-B /inspect/entities WITH valueColumn/valueType (FOREIGN/new path) returns +# the SAME entity, scope=null, no mqeEntity. 
+set -euo pipefail + +A_REST="${A_REST:-http://127.0.0.1:17128}" +B_REST="${B_REST:-http://127.0.0.1:17129}" +METRIC="meter_inspect_e2e_pool" +SVC="inspect-e2e-svc" +SETTLE="${SETTLE:-360}" + +log() { echo "[inspect-flow] $*" >&2; } +fail() { echo "[inspect-flow] FAIL: $*" >&2; exit 1; } + +a_inspect() { swctl --display json --admin-url="${A_REST}" admin inspect "$@"; } +b_inspect() { swctl --display json --admin-url="${B_REST}" admin inspect "$@"; } + +DAY="$(date -u +%Y-%m-%d)" + +log "wait for both OAP inspect ports" +for _ in $(seq 1 90); do + a_inspect metrics --regex 'service_.*' >/dev/null 2>&1 && \ + b_inspect metrics --regex 'service_.*' >/dev/null 2>&1 && break + sleep 2 +done + +# --- 1. OAP-A registers + persists the custom metric (AWARE / existing path) --- +log "await ${METRIC} registered + producing entities on OAP-A (≤${SETTLE}s)" +deadline=$(( $(date +%s) + SETTLE )) +a_rows="" +while :; do + if a_inspect metrics --regex "${METRIC}" | jq -e '.metrics[]? | select(.name=="'"${METRIC}"'")' >/dev/null 2>&1; then + a_rows="$(a_inspect entities --metric "${METRIC}" --start "${DAY}" --end "${DAY}" --step DAY 2>/dev/null || true)" + if echo "${a_rows}" | jq -e '.rows[]? | select(.decoded.serviceName=="'"${SVC}"'")' >/dev/null 2>&1; then + break + fi + fi + (( $(date +%s) < deadline )) || fail "OAP-A never produced ${METRIC} entity ${SVC} within ${SETTLE}s" + sleep 5 +done +echo "${a_rows}" | jq -e '.rows[] | select(.decoded.serviceName=="'"${SVC}"'") | .mqeEntity.serviceName=="'"${SVC}"'"' >/dev/null \ + || fail "OAP-A aware row missing mqeEntity for ${SVC}: ${a_rows}" +log " ✓ OAP-A aware /inspect/entities returns ${SVC} with mqeEntity (old path)" + +# Value column as OAP-A reports it (applies the per-engine override, e.g. value -> value_ on jdbc). 
+VC="$(a_inspect metrics --regex "${METRIC}" | jq -r '.metrics[] | select(.name=="'"${METRIC}"'") | .valueColumnName')" +[ -n "${VC}" ] && [ "${VC}" != "null" ] || fail "could not read valueColumnName from OAP-A" +log " OAP-A reports valueColumn=${VC}" + +# --- 2. OAP-B does not know the metric --- +if b_inspect metrics --regex "${METRIC}" | jq -e '.metrics[]? | select(.name=="'"${METRIC}"'")' >/dev/null 2>&1; then + fail "OAP-B unexpectedly knows ${METRIC} (should be foreign)" +fi +log " ✓ OAP-B /inspect/metrics excludes ${METRIC}" + +# --- 3. OAP-B aware path (no metadata) is rejected --- +if out="$(b_inspect entities --metric "${METRIC}" --start "${DAY}" --end "${DAY}" --step DAY 2>&1)"; then + fail "OAP-B aware path unexpectedly succeeded for foreign metric: ${out}" +fi +echo "${out}" | grep -qi "unknown locally" \ + || fail "OAP-B aware path expected 'unknown locally', got: ${out}" +log " ✓ OAP-B aware /inspect/entities (no metadata) → unknown locally" + +# --- 4. OAP-B foreign path (valueColumn + valueType) reads the shared-storage rows --- +b_rows="$(b_inspect entities --metric "${METRIC}" --value-column "${VC}" --value-type LONG \ + --start "${DAY}" --end "${DAY}" --step DAY)" \ + || fail "OAP-B foreign path errored" +echo "${b_rows}" | jq -e '.rows[]? | select(.decoded.serviceName=="'"${SVC}"'")' >/dev/null \ + || fail "OAP-B foreign path returned no ${SVC} row: ${b_rows}" +# Foreign response degrades: scope null (the OAP emits null; swctl's typed model renders it as +# an empty string — accept either), no mqeEntity. 
+[ -z "$(echo "${b_rows}" | jq -r '.scope // ""')" ] \ + || fail "OAP-B foreign response should have empty/null scope: ${b_rows}" +echo "${b_rows}" | jq -e '.rows[] | select(.decoded.serviceName=="'"${SVC}"'") | .mqeEntity == null' >/dev/null \ + || fail "OAP-B foreign row should carry no mqeEntity: ${b_rows}" +log " ✓ OAP-B FOREIGN /inspect/entities returns ${SVC}, scope=null, no mqeEntity (new path)" + +log "=== inspect-foreign-flow.sh PASSED ===" diff --git a/test/e2e-v2/cases/inspect/otel-rules/inspect-e2e.yaml b/test/e2e-v2/cases/inspect/otel-rules/inspect-e2e.yaml new file mode 100644 index 000000000000..70477a3b6f34 --- /dev/null +++ b/test/e2e-v2/cases/inspect/otel-rules/inspect-e2e.yaml @@ -0,0 +1,25 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# Mounted on OAP-A only. Turns the shared OTLP emitter's `e2e_rr_pool_size` gauge +# into the Service-scoped metric `meter_inspect_e2e_pool`. OAP-B does NOT load this +# rule, so the metric is "foreign" to B — readable only through the new +# valueColumn/valueType inspect path. (Reuses the runtime-rule otlp-emitter, which +# publishes e2e_rr_pool_size tagged with service_name.) 
+expSuffix: service(['service_name'], Layer.GENERAL) +metricPrefix: meter_inspect_e2e +metricsRules: + - name: pool + exp: e2e_rr_pool_size.sum(['service_name']) diff --git a/test/e2e-v2/cases/inspect/postgresql/docker-compose.yml b/test/e2e-v2/cases/inspect/postgresql/docker-compose.yml new file mode 100644 index 000000000000..e38a2c2a09f9 --- /dev/null +++ b/test/e2e-v2/cases/inspect/postgresql/docker-compose.yml @@ -0,0 +1,92 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# Inspect foreign-metric e2e — PostgreSQL (JDBC). Two OAPs share one PostgreSQL +# (no cluster). oap-a loads the inspect-e2e otel-rule (→ meter_inspect_e2e_pool); +# oap-b does not, so that metric is foreign to it (read by probing the function +# tables on the table_name discriminator). 
+services: + postgres: + image: postgres:13 + expose: + - 5432 + networks: + - e2e + environment: + - POSTGRES_PASSWORD=123456 + - POSTGRES_DB=skywalking + healthcheck: + test: ["CMD", "bash", "-c", "cat < /dev/null > /dev/tcp/127.0.0.1/5432"] + interval: 5s + timeout: 60s + retries: 120 + + oap-a: + extends: + file: ../../../script/docker-compose/base-compose.yml + service: oap + volumes: + - ./../otel-rules/inspect-e2e.yaml:/skywalking/config/otel-rules/inspect-e2e.yaml + environment: + SW_ADMIN_SERVER: default + SW_INSPECT: default + JAVA_OPTS: "-Xms512m -Xmx1g" + SW_STORAGE: postgresql + SW_JDBC_URL: "jdbc:postgresql://postgres:5432/skywalking" + SW_OTEL_RECEIVER: default + SW_OTEL_RECEIVER_ENABLED_OTEL_METRICS_RULES: "inspect-e2e" + ports: + - "17128:17128" + - "12800:12800" + depends_on: + postgres: + condition: service_healthy + + oap-b: + extends: + file: ../../../script/docker-compose/base-compose.yml + service: oap + environment: + SW_ADMIN_SERVER: default + SW_INSPECT: default + JAVA_OPTS: "-Xms512m -Xmx1g" + SW_STORAGE: postgresql + SW_JDBC_URL: "jdbc:postgresql://postgres:5432/skywalking" + ports: + - "17129:17128" + - "12801:12800" + depends_on: + # Stagger boot: let oap-a initialize the shared storage schema first, then + # oap-b joins an already-initialized backend. Two OAPs racing first-time + # DDL on a fresh shared PostgreSQL conflict; this also mirrors a real + # "second OAP joins the cluster" order. 
+ oap-a: + condition: service_healthy + + otlp-emitter: + build: + context: ../../runtime-rule/mal-storage/otlp-emitter + networks: + - e2e + environment: + OTLP_ENDPOINT: http://oap-a:11800 + EMITTER_SERVICE: inspect-e2e-svc + EMITTER_INSTANCE: inspect-e2e-i1 + depends_on: + oap-a: + condition: service_healthy + +networks: + e2e: diff --git a/test/e2e-v2/cases/inspect/postgresql/e2e.yaml b/test/e2e-v2/cases/inspect/postgresql/e2e.yaml new file mode 100644 index 000000000000..f11ae2b7f0e8 --- /dev/null +++ b/test/e2e-v2/cases/inspect/postgresql/e2e.yaml @@ -0,0 +1,59 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# Inspect API e2e (existing aware path + new foreign-metric path) — PostgreSQL. +# Two OAPs share one PostgreSQL; oap-a defines meter_inspect_e2e_pool, oap-b does not. +# The flow script asserts the aware path on oap-a (17128) and the foreign path on +# oap-b (17129, via --value-column / --value-type). 
+ +setup: + env: compose + file: docker-compose.yml + timeout: 25m + init-system-environment: ../../../script/env + steps: + - name: set PATH + command: export PATH=/tmp/skywalking-infra-e2e/bin:$PATH + - name: install yq + command: bash test/e2e-v2/script/prepare/setup-e2e-shell/install.sh yq + - name: install swctl + command: bash test/e2e-v2/script/prepare/setup-e2e-shell/install.sh swctl + - name: install jq + command: | + if ! command -v jq >/dev/null 2>&1; then + curl -fsSL -o /tmp/skywalking-infra-e2e/bin/jq \ + https://github.com/jqlang/jq/releases/download/jq-1.7.1/jq-linux-amd64 + chmod +x /tmp/skywalking-infra-e2e/bin/jq + fi + - name: drive inspect aware + foreign flow + command: | + set -euo pipefail + export PATH=/tmp/skywalking-infra-e2e/bin:$PATH + export A_REST=http://127.0.0.1:17128 + export B_REST=http://127.0.0.1:17129 + bash test/e2e-v2/cases/inspect/inspect-foreign-flow.sh + +verify: + retry: + count: 1 + interval: 1s + cases: + # The flow script drives every assertion inline; this trailing smoke check + # confirms the inspect port is still serving after the foreign-read phase. + - query: swctl --display json --admin-url=http://127.0.0.1:17128 admin inspect metrics --regex meter_inspect_e2e_pool >/dev/null && echo ok + expected: ../expected/ok.txt + +cleanup: + on: always diff --git a/test/e2e-v2/cases/storage/expected/inspect-entities-unknown-metric.yml b/test/e2e-v2/cases/storage/expected/inspect-entities-unknown-metric.yml index 80d0b6f0e309..f327020b8d66 100644 --- a/test/e2e-v2/cases/storage/expected/inspect-entities-unknown-metric.yml +++ b/test/e2e-v2/cases/storage/expected/inspect-entities-unknown-metric.yml @@ -12,6 +12,7 @@ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. # See the License for the specific language governing permissions and # limitations under the License. 
-# /inspect/entities for a metric that is not in ValueColumnMetadata — 400 -# error JSON shape (curl -s strips status but the body is the ErrorResponse). -error: 'unknown metric: nonexistent_metric_xyz' +# /inspect/entities for a metric that is not in ValueColumnMetadata and carries no +# valueColumn/valueType — 400. The handler returns "metric unknown locally: — +# provide valueColumn and valueType …"; the query reconstructs the {error: …} prefix. +error: 'metric unknown locally: nonexistent_metric_xyz' diff --git a/test/e2e-v2/cases/storage/storage-cases.yaml b/test/e2e-v2/cases/storage/storage-cases.yaml index 799125791a51..155889aa1037 100644 --- a/test/e2e-v2/cases/storage/storage-cases.yaml +++ b/test/e2e-v2/cases/storage/storage-cases.yaml @@ -216,12 +216,13 @@ cases: DAY=$(date -u +"%Y-%m-%d") swctl --display json --admin-url=http://${oap_host}:${oap_17128} admin inspect entities --metric service_cpm --start "${DAY} 00" --end "${DAY} 23" --step HOUR | yq -P expected: expected/inspect-entities-service-cpm-hour.yml - # Negative — unknown metric: swctl exits non-zero and renders the typed error - # envelope ("... HTTP 400: unknown metric: "). Reconstruct the original - # {error: ...} shape from the message so the expected file is unchanged. + # Negative — metric absent from this OAP: swctl exits non-zero and renders the + # typed error envelope ("... HTTP 400: metric unknown locally: — provide + # valueColumn and valueType …"). The handler steers the caller toward the + # foreign-metric path; reconstruct the {error: ...} shape from the message prefix. 
- query: | DAY=$(date -u +"%Y-%m-%d") out=$(swctl --display yaml --admin-url=http://${oap_host}:${oap_17128} admin inspect entities --metric nonexistent_metric_xyz --start "${DAY}" --end "${DAY}" --step DAY 2>&1 || true) - msg=$(echo "$out" | grep -o 'unknown metric: [a-z_0-9]*' | head -1) + msg=$(echo "$out" | grep -o 'metric unknown locally: [a-z_0-9]*' | head -1) yq -n ".error = \"${msg}\"" expected: expected/inspect-entities-unknown-metric.yml diff --git a/test/e2e-v2/script/env b/test/e2e-v2/script/env index ced8ba857f25..373df11394fc 100644 --- a/test/e2e-v2/script/env +++ b/test/e2e-v2/script/env @@ -27,7 +27,7 @@ SW_BANYANDB_COMMIT=c2d925e4eae4d77edda94e1fd438243483960150 SW_AGENT_PHP_COMMIT=d1114e7be5d89881eec76e5b56e69ff844691e35 SW_PREDICTOR_COMMIT=54a0197654a3781a6f73ce35146c712af297c994 -SW_CTL_COMMIT=b447211a9319eeb29a445335e9c2536f8c1aa23d +SW_CTL_COMMIT=90365c4bc59de3704ff81b4cefe55d09f706d00d # Third-party image versions used by e2e infrastructure (not skywalking # components). Pinned here so the matrix is reproducible. From 29a959512ba24fbc4cd71ac434ab7fd5c72b5f91 Mon Sep 17 00:00:00 2001 From: Wu Sheng Date: Tue, 23 Jun 2026 23:08:47 +0800 Subject: [PATCH 2/3] Fix javadoc build: fold foreign-metric param docs into prose The listEntities javadoc used @param valueColumn / @param valueType, but the Optional parameters are named valueColumnOpt / valueTypeOpt, so javadoc failed with 'error: @param name not found' (failOnError), breaking the dist build and every dependent e2e job. Move the valueColumn/valueType documentation into the method-description list so there are no @param tags to mismatch. 
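For illustration only, a minimal sketch of the pattern this fix lands on: every `@param` tag names a real parameter, and the `valueColumn` / `valueType` notes live in the description as a list. The class `JavadocParamSketch`, the method `describe`, and its return strings are hypothetical stand-ins, not the real `InspectRestHandler` code.

```java
import java.util.Optional;

public class JavadocParamSketch {
    /**
     * Describes which inspect path a request would take (hypothetical stand-in).
     *
     * <p>When the metric is unknown locally, the caller also supplies:
     * <ul>
     *   <li>{@code valueColumn}: the metric's value column.</li>
     *   <li>{@code valueType}: how to read/decode the value.</li>
     * </ul>
     *
     * @param metric         the metric name
     * @param valueColumnOpt optional value column; the tag matches the parameter name
     * @param valueTypeOpt   optional value type; the tag matches the parameter name
     * @return a short label for the path taken
     */
    public static String describe(String metric,
                                  Optional<String> valueColumnOpt,
                                  Optional<String> valueTypeOpt) {
        // Assumption for this sketch: both values present selects the foreign path.
        if (valueColumnOpt.isPresent() && valueTypeOpt.isPresent()) {
            return "foreign:" + metric + ":" + valueColumnOpt.get() + ":" + valueTypeOpt.get();
        }
        return "aware:" + metric;
    }

    public static void main(String[] args) {
        System.out.println(describe("meter_inspect_e2e_pool", Optional.of("value"), Optional.of("LONG")));
        System.out.println(describe("service_cpm", Optional.empty(), Optional.empty()));
    }
}
```

Writing `@param valueColumn` above a parameter named `valueColumnOpt` is the mismatch that made the javadoc build fail with "@param name not found".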
--- .../inspect/handler/InspectRestHandler.java | 22 +++++++++---------- 1 file changed, 10 insertions(+), 12 deletions(-) diff --git a/oap-server/server-admin/inspect/src/main/java/org/apache/skywalking/oap/server/admin/inspect/handler/InspectRestHandler.java b/oap-server/server-admin/inspect/src/main/java/org/apache/skywalking/oap/server/admin/inspect/handler/InspectRestHandler.java index e3a5fb1b0b0a..1d358542ad31 100644 --- a/oap-server/server-admin/inspect/src/main/java/org/apache/skywalking/oap/server/admin/inspect/handler/InspectRestHandler.java +++ b/oap-server/server-admin/inspect/src/main/java/org/apache/skywalking/oap/server/admin/inspect/handler/InspectRestHandler.java @@ -172,19 +172,17 @@ public HttpResponse listMetrics(@Param("regex") final Optional regex, * re-queryable {@code mqeEntity}. * *

<p>For a metric persisted by ANOTHER OAP that this node does not define (no local registry
- * entry, no OAL/MAL text to recover it from), the caller MUST supply the metric's storage
+ * entry, no OAL/MAL text to recover it from), the caller MUST also supply the metric's storage
 * metadata, which cannot be inferred from the name:
- *
- * @param valueColumn The metric's value column. Required when the metric is unknown locally.
- * A property of the metric's aggregation FUNCTION — one of the built-in
- * value columns: {@code value} (common scalar), {@code double_value},
- * {@code int_value}, {@code percentage}, {@code datatable_value} (labeled),
- * {@code dataset} (histogram). On MySQL / PostgreSQL pass the
- * reserved-word-overridden physical name ({@code value_} etc.).
- * @param valueType How to read/decode the value. Required when the metric is unknown locally.
- * Accepted: {@code LONG} / {@code INT} / {@code DOUBLE} (scalar) or
- * {@code LABELED} (DataTable). HISTOGRAM/heatmap and SAMPLED_RECORD are out
- * of scope for this endpoint.
+ * <ul>
+ * <li>{@code valueColumn} — the metric's value column, a property of its aggregation FUNCTION:
+ * one of {@code value} (common scalar), {@code double_value}, {@code int_value},
+ * {@code percentage}, {@code datatable_value} (labeled), {@code dataset} (histogram). On
+ * MySQL / PostgreSQL pass the reserved-word-overridden physical name ({@code value_}).</li>
+ * <li>{@code valueType} — how to read/decode the value: {@code LONG} / {@code INT} /
+ * {@code DOUBLE} (scalar) or {@code LABELED} (DataTable). HISTOGRAM/heatmap and
+ * SAMPLED_RECORD are out of scope for this endpoint.</li>
+ * </ul>
*/ @Get("/inspect/entities") public HttpResponse listEntities(@Param("metric") final String metric, From 3bf939838b0bec7caa3ecf1b278e5faa8161f7cb Mon Sep 17 00:00:00 2001 From: Wu Sheng Date: Tue, 23 Jun 2026 23:42:37 +0800 Subject: [PATCH 3/3] Fix javadoc on JDK 11: {@link Metrics} -> {@code Metrics} in TableHelper MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Metrics is not imported in TableHelper (only used in the doc), so JDK 11's doclint rejects the simple-name {@link} with 'reference not found', failing the dist build at :storage-jdbc-hikaricp-plugin. Newer local JDKs (21/25) resolve it via the classpath and don't flag it. Use {@code Metrics} to match the sibling {@code ServiceTraffic}/... refs — no import, no reference resolution. Verified the javadoc build of all changed modules under temurin-11. --- .../oap/server/storage/plugin/jdbc/common/TableHelper.java | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/oap-server/server-storage-plugin/storage-jdbc-hikaricp-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/jdbc/common/TableHelper.java b/oap-server/server-storage-plugin/storage-jdbc-hikaricp-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/jdbc/common/TableHelper.java index b3c7e358b695..7d02b4afceaa 100644 --- a/oap-server/server-storage-plugin/storage-jdbc-hikaricp-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/jdbc/common/TableHelper.java +++ b/oap-server/server-storage-plugin/storage-jdbc-hikaricp-plugin/src/main/java/org/apache/skywalking/oap/server/storage/plugin/jdbc/common/TableHelper.java @@ -180,7 +180,7 @@ public List getTablesWithinTTL(String modelName) { * metric (defined by another OAP) must also live in. Used by the inspect probe. * *

<p>Filtered to function metrics, NOT all {@code isMetric()} models: metadata "metrics" such as
- * {@code ServiceTraffic} / {@code InstanceTraffic} / {@code EndpointTraffic} are {@link Metrics}
+ * {@code ServiceTraffic} / {@code InstanceTraffic} / {@code EndpointTraffic} are {@code Metrics}
 * subclasses (so {@code isMetric()} is true) but carry no aggregation function, no
 * {@code entity_id}, and no {@code table_name} discriminator column. Probing them with
 * {@code select entity_id ... where table_name = ?} would hit "column not found" and 500. Only