Merged
11 changes: 11 additions & 0 deletions SERIALIZATIONS.md
@@ -70,6 +70,11 @@ Source-specific variants (parallel to the substrate, not derived from it):
oc_isamples_pqg.parquet (GCS, 11.8 M, narrow, OC-only)
oc_isamples_pqg_wide.parquet (GCS, 2.5 M, wide, OC-only)
└─► serve as upstream for OpenContext thumbnails folded into 202604 wide

Vocabulary labels (parallel to the substrate, sourced from isamplesorg/vocabularies):

vocab_labels.parquet (58 KB, 537 SKOS concepts)
└─► consumed by Search Explorer to render facet URIs as prefLabels
```

Arrows indicate derivation, not containment. Every file in the left
@@ -115,6 +120,12 @@ column can be rebuilt from its parent by a script in
| `isamples_202601_facet_summaries.parquet` | Baseline `(facet_type, facet_value, scheme, count)` | 2 KB | 56 | wide | Every tutorial (instant initial facet counts) | QUERY_SPEC §3.3 tier 1 |
| `isamples_202601_facet_cross_filter.parquet` | Pre-computed counts for single-filter cross-facet queries | 6 KB | 526 | wide | Search Explorer cross-filter UI | QUERY_SPEC §3.3 tier 2a |

### Tier: vocabulary labels

| File | Role | Size | Rows | Upstream | Consumers | Spec |
|---|---|---:|---:|---|---|---|
| `vocab_labels.parquet` | SKOS concept URI → human-readable `pref_label` map (plus `definition`, `alt_labels`, `scheme`); covers material, sample object type, and sampled feature type vocabularies | 58 KB | 537 | `isamplesorg/vocabularies` TTLs (built by `scripts/build_vocab_labels.py`) | Search Explorer (renders facet URIs as prefLabels); any tutorial that surfaces controlled-vocabulary URIs | issue #148 |

### Tier: alternative export formats (upstream of the aggregated Zenodo export)

The `export_client` can emit each source's records in multiple formats;
16 changes: 16 additions & 0 deletions data.qmd
@@ -57,6 +57,7 @@ cite `https://data.isamples.org/<file>`.
| Aggregate map clusters by zoom | [`h3_summary_res{4,6,8}.parquet`](https://data.isamples.org/isamples_202601_h3_summary_res4.parquet) | ≤ 2.4 MB each |
| Filter by material / context / object-type | [`sample_facets_v2.parquet`](https://data.isamples.org/isamples_202601_sample_facets_v2.parquet) | 63 MB |
| Walk relationships (graph queries) | [`isamples_202512_narrow.parquet`](https://data.isamples.org/isamples_202512_narrow.parquet) | 820 MB |
| Translate vocabulary URIs to human-readable labels | [`vocab_labels.parquet`](https://data.isamples.org/vocab_labels.parquet) | 58 KB |

## 3. Copy-pasteable DuckDB snippets

@@ -129,6 +130,21 @@ con.sql("""
""").df()
```

### 3.6 Vocab labels: render facet URIs as human-readable text

```python
# Join sample facets to vocabulary prefLabels so the UI shows
# "Ceramic Clay" instead of the raw concept URI.
con.sql("""
SELECT f.pid, f.label, v.pref_label AS material_label
FROM read_parquet('https://data.isamples.org/isamples_202601_sample_facets_v2.parquet') f
LEFT JOIN read_parquet('https://data.isamples.org/vocab_labels.parquet') v
ON f.material = v.uri
WHERE f.material IS NOT NULL
LIMIT 10
""").df()
```
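
The `LEFT JOIN` above leaves `material_label` NULL for any URI that has no row in `vocab_labels.parquet`. A minimal offline sketch of one way to handle that, falling back to the raw URI with `COALESCE`, is shown below. It uses Python's built-in `sqlite3` in place of DuckDB so that it runs without network access; the table names, rows, and `example.org` URIs are invented for illustration and are not taken from the real files.

```python
import sqlite3

# Toy stand-ins for sample_facets_v2.parquet and vocab_labels.parquet.
# All rows and URIs here are made up for illustration.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE facets (pid TEXT, label TEXT, material TEXT)")
con.execute("CREATE TABLE vocab (uri TEXT, pref_label TEXT)")
con.executemany(
    "INSERT INTO facets VALUES (?, ?, ?)",
    [
        ("pid:1", "Sherd A", "https://example.org/vocab/material/clay"),
        ("pid:2", "Core B", "https://example.org/vocab/material/unlisted"),
        ("pid:3", "No material", None),
    ],
)
con.executemany(
    "INSERT INTO vocab VALUES (?, ?)",
    [("https://example.org/vocab/material/clay", "Ceramic Clay")],
)

# Same join shape as the snippet above, plus a COALESCE fallback so that
# URIs without a prefLabel are shown as-is rather than as NULL.
rows = con.execute("""
    SELECT f.pid, f.label, COALESCE(v.pref_label, f.material) AS material_label
    FROM facets f
    LEFT JOIN vocab v ON f.material = v.uri
    WHERE f.material IS NOT NULL
    ORDER BY f.pid
""").fetchall()

for row in rows:
    print(row)
```

The same `COALESCE(v.pref_label, f.material)` expression should carry over to the DuckDB query unchanged, since both engines implement standard `COALESCE` and `LEFT JOIN` semantics.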

## 4. H3 tier breakpoints (for map authors)

The H3 summary files back a progressive-globe rendering pattern:
2 changes: 2 additions & 0 deletions how-to-use.qmd
@@ -90,6 +90,7 @@ and counts instantly, without touching the 278 MB primary file:
| [`isamples_202601_facet_summaries.parquet`](https://data.isamples.org/isamples_202601_facet_summaries.parquet) | 2 KB | `(facet_type, facet_value, count)` for source, material, context, object_type | You want instant initial facet counts with no filters applied |
| [`isamples_202601_facet_cross_filter.parquet`](https://data.isamples.org/isamples_202601_facet_cross_filter.parquet) | 6 KB | Pre-computed counts for single-facet selections | You want instant cross-filtered counts for a single active filter |
| [`isamples_202601_sample_facets_v2.parquet`](https://data.isamples.org/isamples_202601_sample_facets_v2.parquet) | 63 MB | `(pid, material, context, object_type)` facet URIs per sample | You need to filter on *combinations* of facets at query time |
| [`vocab_labels.parquet`](https://data.isamples.org/vocab_labels.parquet) | 58 KB | `(uri, pref_label, definition, alt_labels, scheme)` for 537 SKOS concepts (material, sample object type, sampled feature type) | You need to render facet URIs as human-readable text |
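
Because `vocab_labels.parquet` holds only 537 rows, a client could plausibly load it once and keep it in memory rather than joining on every query. The sketch below assumes rows shaped like `(uri, pref_label, definition, alt_labels, scheme)`; the sample values, the list type for `alt_labels`, the plain-text `scheme` values, and the `render` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows shaped like (uri, pref_label, definition, alt_labels, scheme).
# Values are invented; the real column types and scheme identifiers may differ.
rows = [
    ("https://example.org/material/clay", "Ceramic Clay", "Fired clay.", ["pottery"], "material"),
    ("https://example.org/material/rock", "Rock", "Consolidated mineral matter.", [], "material"),
    ("https://example.org/objecttype/core", "Core", "A cylindrical sample.", [], "sample object type"),
]

# uri -> pref_label, for constant-time lookups while rendering facets.
label_by_uri = {uri: pref_label for uri, pref_label, *_ in rows}

# scheme -> sorted labels, e.g. to populate one facet list per vocabulary.
labels_by_scheme = defaultdict(list)
for _uri, pref_label, _definition, _alt_labels, scheme in rows:
    labels_by_scheme[scheme].append(pref_label)
for labels in labels_by_scheme.values():
    labels.sort()


def render(uri):
    """Return the prefLabel for a facet URI, or the URI itself if unmapped."""
    return label_by_uri.get(uri, uri)


print(render("https://example.org/material/clay"))
print(render("https://example.org/material/unknown"))
```

Returning the URI unchanged for unmapped concepts keeps the UI usable when a facet value is not yet covered by the vocabulary file.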

### Geospatial aggregates (H3) {.unnumbered}

@@ -123,6 +124,7 @@ browsers use the parquet versions.
| `sample_facets_v2.parquet` | ● | ● | |
| `h3_summary_res4/6/8.parquet` | ● | | |
| `samples_map_lite.parquet` | ● | | |
| `vocab_labels.parquet` | ● | ● | |

### Quick query recipes {.unnumbered}
