From b48f88b944b0ff2ab31623973765bfd4f3f8eaba Mon Sep 17 00:00:00 2001
From: Jan-Erik Rediger
Date: Mon, 22 Jun 2026 14:58:48 +0200
Subject: [PATCH] Update database docs [doc only]

---
 docs/dev/core/internal/database.md            | 56 +++++++++----------
 docs/dev/core/internal/directory-structure.md |  2 +-
 2 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/docs/dev/core/internal/database.md b/docs/dev/core/internal/database.md
index 0c7eaccf80..556b7e62cd 100644
--- a/docs/dev/core/internal/database.md
+++ b/docs/dev/core/internal/database.md
@@ -1,46 +1,45 @@
 # Database format
 
-The Glean SDK stores all recorded data in a database for persistence.
+The Glean SDK stores all recorded data in an [SQLite] database for persistence.
 This data is read, written and transformed by the core implementation.
 
-Some internal metrics are stored similar to user-defined metrics,
-but encode additional implementation-defined information in the key or value of the entry.
+Internal metrics are stored similarly to user-defined metrics.
 
 We guarantee backwards-compatibility of already stored data.
 If necessary an old database will be converted to the new format.
 
-## Database stores
+## Database tables
 
-The Glean SDK will use one store per metric lifetime:
-`user`, `application` and `ping`.
-This allows to separately read and clear metrics based on their respective lifetimes.
-
-## Key
-
-The key of a database entry uniquely identifies the stored metric data.
-It encodes additional information about the stored data in the key name using special characters.
-The full list of special characters in use is:
-
-`. # / +`
-
-These characters cannot be used in a user-defined ping name, metric category, metric name or label.
-
-A key will usually look like:
+The Glean SDK will store all metric data in a table called `telemetry`.
+This table has the following schema:
 
 ```
-ping#category.name[/label]
+CREATE TABLE telemetry(
+  id TEXT NOT NULL,
+  ping TEXT NOT NULL,
+  lifetime TEXT NOT NULL,
+  labels TEXT NOT NULL,
+  value BLOB,
+  UNIQUE(id, ping, labels)
+);
 ```
 
-where:
+| Column | Type | Description |
+| ------ | ---- | ----------- |
+| `id` | `TEXT` | A full metric identifier: `category.name`. |
+| `ping` | `TEXT` | The ping this value is recorded for. |
+| `lifetime` | `TEXT` | The lifetime of the stored value, one of `ping`, `app` or `user`. |
+| `labels` | `TEXT` | The label or labels for this value. Multiple labels are separated by the record separator (`\x1E`). Empty string when no labels are specified. |
+| `value` | `BLOB` | The encoded value. |
+
+### Indices
 
-| Field | Description | Allowed characters | Maximum length\* | Note |
-| ----- | ----------- | ------ | ----- | ----- |
-| `ping` | The ping name this data is stored for | `[a-z0-9-]` | 30 |
-| `category` | The metric's category | `[a-z0-9._]` | 40 | Empty string possible. |
-| `name` | The metric's name | `[a-z0-9._#]` | 70 |
-| `label` | The label (optional) | `[a-z0-9._-]` | 111 |
+`UNIQUE(id, ping, labels)`
 
-_\* The maximum length is not enforced for internal metrics, but is enforced for user metrics as per schema definition._
+Every row is unique by the id, ping and associated labels.
+This allows for efficient fetching and updating of those values.
+Recorded values for a specific `id` can go into multiple pings.
+For a specific `id` values can be recorded for different labels.
 
 {{#include ../../../shared/blockquote-info.html}}
 
@@ -55,4 +54,5 @@ _\* The maximum length is not enforced for internal metrics, but is enforced for
 The value is stored in an implementation-defined format to encode the value's data.
 It can be read, modified and serialized into the [Payload format].
 
+[SQLite]: https://sqlite.org/
 [Payload format]: payload.md
diff --git a/docs/dev/core/internal/directory-structure.md b/docs/dev/core/internal/directory-structure.md
index 0b4b47d0b8..60ea45247c 100644
--- a/docs/dev/core/internal/directory-structure.md
+++ b/docs/dev/core/internal/directory-structure.md
@@ -12,7 +12,7 @@ For the Python bindings, if no directory is specified, it is stored in a tempora
 
 Within the `glean_data` directory are the following contents:
 
-- `db`: Contains the [rkv](https://github.com/mozilla/rkv) database used to persist ping and user lifetime metrics.
+- `db`: Contains the [SQLite](https://sqlite.org/) database files used to persist ping and user lifetime metrics.
 - `events`: Contains flat files containing persisted events before they are collected into pings.
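
Note on the `telemetry` schema documented in this patch: the sketch below illustrates how the `UNIQUE(id, ping, labels)` constraint behaves, using Python's built-in `sqlite3` module against an in-memory database. It is not how the Glean SDK itself writes data. The `record` helper, the metric and ping names, and the byte-string values are hypothetical, and `INSERT OR REPLACE` is only an assumption about one way to update a row in place.

```python
import sqlite3

# Schema as documented in the patch above.
SCHEMA = """
CREATE TABLE telemetry(
  id TEXT NOT NULL,
  ping TEXT NOT NULL,
  lifetime TEXT NOT NULL,
  labels TEXT NOT NULL,
  value BLOB,
  UNIQUE(id, ping, labels)
);
"""

# Multiple labels are joined with the ASCII record separator.
RECORD_SEPARATOR = "\x1e"


def record(conn, metric_id, ping, lifetime, labels, value):
    """Hypothetical helper: insert or overwrite the row for (id, ping, labels)."""
    conn.execute(
        "INSERT OR REPLACE INTO telemetry(id, ping, lifetime, labels, value) "
        "VALUES (?, ?, ?, ?, ?)",
        (metric_id, ping, lifetime, RECORD_SEPARATOR.join(labels), value),
    )


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Same id, ping and labels twice: the second write replaces the first.
record(conn, "example.counter", "metrics", "ping", [], b"1")
record(conn, "example.counter", "metrics", "ping", [], b"2")
# Same id in a different ping: stored as a separate row.
record(conn, "example.counter", "baseline", "ping", [], b"1")
# Two labels: stored as one string joined by the record separator.
record(conn, "example.labeled", "metrics", "ping", ["a", "b"], b"1")

rows = conn.execute(
    "SELECT id, ping, labels, value FROM telemetry ORDER BY id, ping"
).fetchall()
for row in rows:
    print(row)
```

The unlabeled rows store an empty string in `labels` rather than `NULL`, which matters for the constraint: SQLite treats `NULL` values as distinct from each other in a `UNIQUE` index, so a nullable `labels` column would not deduplicate unlabeled metrics.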