Merged
Changes from all commits
47 commits
6d488cc
added check for indirect path from data entities to root data entity …
floWetzels Feb 11, 2026
d9a90e7
test crate for cyclic datasets
floWetzels Feb 11, 2026
244f398
Merge branch 'crs4:develop' into fix-cyclic-datasets
floWetzels Feb 19, 2026
f8b0e55
feat(file-descriptor): :sparkles: add internal remote-context retriev…
kikkomep Feb 23, 2026
0ce2619
refactor(file-descriptor): :recycle: update checks to use the new met…
kikkomep Feb 23, 2026
4e3a27d
fix(file-descriptor): :pencil2: fix typo
kikkomep Feb 23, 2026
fd52f98
test(test-data): :wrench: extend valid crate context with remote cont…
kikkomep Feb 23, 2026
944ec82
test(file-descriptor): :white_check_mark: add tests for remote contex…
kikkomep Feb 23, 2026
fe5ba1c
fix(file-descriptor): :wrench: accept application/json and treat Link…
kikkomep Feb 23, 2026
4f0ed43
fix(test-data): :adhesive_bandage: replace incorrect scheme for schem…
kikkomep Feb 23, 2026
5fe8171
fix: :adhesive_bandage: allow terms defined by context prefixes
kikkomep Feb 23, 2026
928026e
test(test-data): :card_file_box: extend test data to include prefixed…
kikkomep Feb 23, 2026
45a7017
fix(file-descriptor): :adhesive_bandage: refine compacted JSON-LD key…
kikkomep Feb 23, 2026
99f9b0d
Merge pull request #152 from nfdi4plants/fix-cyclic-datasets
kikkomep Mar 24, 2026
2f2a873
feat(utils): :sparkles: extend HttpRequester constructor to support c…
kikkomep Nov 26, 2025
36ca0ac
chore(utils): :wrench: increase session cache max age to 300 seconds
kikkomep Nov 26, 2025
b2b47ba
feat(model): :sparkles: enable cache configuration in ValidationSettings
kikkomep Nov 26, 2025
564230f
feat(cli): :sparkles: add CLI options for cache configuration (`cache…
kikkomep Nov 26, 2025
c5848bc
docs(cli): :memo: document `-1` option for no cache expiration in `--…
kikkomep Mar 24, 2026
ee6a223
fix(core): :bug: fix import
kikkomep Mar 24, 2026
4b45368
Merge pull request #123 from kikkomep/feat/extend-caching-support
kikkomep Mar 24, 2026
90a9f06
🐛 fix(SHACL-core): improve SHACL violation parsing with better error …
kikkomep Mar 25, 2026
635c86b
Merge pull request #155 from kikkomep/fix/issue-128
kikkomep Mar 25, 2026
46ae808
Merge branch 'develop' into fix/issue-120
kikkomep Mar 25, 2026
517175d
Merge pull request #154 from kikkomep/fix/issue-120
kikkomep Mar 25, 2026
740266c
chore(SHACL-core): reformat and clean up
kikkomep Mar 26, 2026
7c89093
test(SHACL-core): :white_check_mark: add unit/integration tests
kikkomep Mar 27, 2026
867fe53
fix(ro-crate): :adhesive_bandage: refine file descriptor selector
kikkomep Feb 24, 2026
1a91aa4
fix(ro-crate): :bug: target metadata descriptor shapes by class (inst…
kikkomep Feb 24, 2026
2c6ea76
feat(ro-crate): :sparkles: refine constraint enforcing metadata descr…
kikkomep Feb 24, 2026
2abb275
test(ro-crate): :white_check_mark: test RO-Crate with `@base` set
kikkomep Mar 30, 2026
39bd761
fix(ro-crate): :bug: use SPARQL target to select the candidate RO-Cra…
kikkomep Mar 30, 2026
8219f27
fix(ro-crate): :adhesive_bandage: define `ROCrateMetadataFileDescript…
kikkomep Apr 1, 2026
ccb04a7
fix(core): :adhesive_bandage: remove extra quote
kikkomep Apr 1, 2026
61ddbb5
refactor(ro-crate): :wrench: relax ROCrateMetadataFileDescriptor clas…
kikkomep Apr 1, 2026
57f5c54
fix(shacl): extract @base from JSON-LD for ontology parsing
kikkomep Apr 1, 2026
6ab03fe
test(ro-crate): :wrench: reconfigure test dataset
kikkomep Apr 1, 2026
146a522
fix: add empty directory to example crate
kikkomep Apr 7, 2026
f5d3256
Merge pull request #159 from kikkomep/fix/issue-137
kikkomep Apr 7, 2026
1c1a0c4
Merge pull request #158 from kikkomep/fix/issue-142
kikkomep Apr 8, 2026
523fbf4
fix(core): :lipstick: fix output formatting
kikkomep Apr 20, 2026
8906c5d
Merge pull request #160 from kikkomep/fix/output-formatting
kikkomep Apr 20, 2026
d565c5d
ci(gh-actions): :arrow_up: update outdated GitHub Actions
kikkomep Apr 20, 2026
c534b74
chore(deps): :lock: update lock file
kikkomep Apr 20, 2026
81d5d89
chore: :bookmark: bump version number to 0.9
kikkomep Apr 20, 2026
1a1f548
Merge pull request #161 from kikkomep/ci/fix-outdated-github-actions
kikkomep Apr 20, 2026
8306911
Merge pull request #162 from kikkomep/prepare-release/0.9
kikkomep Apr 20, 2026
4 changes: 2 additions & 2 deletions .github/workflows/release.yaml
@@ -66,9 +66,9 @@ jobs:
steps:
# Access the tag from the first workflow's outputs
- name: ⬇️ Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v6
- name: 🐍 Set up Python
uses: actions/setup-python@v5
uses: actions/setup-python@v6
with:
python-version: "3.x"
- name: 🚧 Set up Python Environment
8 changes: 4 additions & 4 deletions .github/workflows/testing.yaml
@@ -46,9 +46,9 @@ jobs:
steps:
# Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
- name: ⬇️ Checkout code
uses: actions/checkout@v4
uses: actions/checkout@v6
- name: 🐍 Set up Python v${{ env.PYTHON_VERSION }}
uses: actions/setup-python@v5
uses: actions/setup-python@v6
with:
python-version: ${{ env.PYTHON_VERSION }}
- name: 🔽 Install flake8
@@ -65,9 +65,9 @@
needs: [lint]
steps:
- name: ⬇️ Checkout
uses: actions/checkout@v4
uses: actions/checkout@v6
- name: 🐍 Set up Python v${{ env.PYTHON_VERSION }}
uses: actions/setup-python@v5
uses: actions/setup-python@v6
with:
python-version: ${{ env.PYTHON_VERSION }}
- name: 🔄 Upgrade pip
884 changes: 506 additions & 378 deletions poetry.lock

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[tool.poetry]
name = "roc-validator"
version = "0.8.1"
version = "0.9.0"
description = "A Python package to validate RO-Crates"
authors = [
"Marco Enrico Piras <kikkomep@crs4.it>",
39 changes: 36 additions & 3 deletions rocrate_validator/cli/commands/validate.py
@@ -25,7 +25,7 @@
from rich.rule import Rule

from rocrate_validator.utils import log as logging
from rocrate_validator import services
from rocrate_validator import constants, services
from rocrate_validator.cli.commands.errors import handle_error
from rocrate_validator.cli.main import cli
from rocrate_validator.cli.ui.text.validate import ValidationCommandView
@@ -205,6 +205,29 @@ def validate_uri(ctx, param, value):
show_default=True,
help="Width of the output line",
)
@click.option(
'--cache-max-age',
type=click.INT,
default=constants.DEFAULT_HTTP_CACHE_MAX_AGE,
show_default=True,
help="Maximum age of the HTTP cache in seconds ([bold green]-1[/bold green] for no expiration)",
)
@click.option(
'--cache-path',
type=click.Path(),
default=None,
show_default=True,
help="Path to the HTTP cache directory",
)
@click.option(
'-nc',
'--no-cache',
is_flag=True,
help="Disable the HTTP cache",
default=False,
show_default=True,
hidden=True
)
@click.pass_context
def validate(ctx,
profiles_path: Path = DEFAULT_PROFILES_PATH,
@@ -223,7 +246,10 @@ def validate(ctx,
verbose: bool = False,
output_format: str = "text",
output_file: Optional[Path] = None,
output_line_width: Optional[int] = None):
output_line_width: Optional[int] = None,
cache_max_age: int = constants.DEFAULT_HTTP_CACHE_MAX_AGE,
cache_path: Optional[Path] = None,
no_cache: bool = False):
"""
[magenta]rocrate-validator:[/magenta] Validate a RO-Crate against a profile
"""
@@ -247,6 +273,11 @@ def validate(ctx,
logger.debug("fail_fast: %s", fail_fast)
logger.debug("no fail fast: %s", not fail_fast)

# Cache settings
logger.debug("cache_max_age: %s", cache_max_age)
logger.debug("cache_path: %s", os.path.abspath(cache_path) if cache_path else None)
logger.debug("no_cache: %s", no_cache)

if rocrate_uri:
logger.debug("rocrate_path: %s", os.path.abspath(rocrate_uri))

@@ -282,7 +313,9 @@ def validate(ctx,
"rocrate_relative_root_path": relative_root_path,
"abort_on_first": fail_fast,
"skip_checks": skip_checks_list,
"metadata_only": metadata_only
"metadata_only": metadata_only,
"cache_max_age": cache_max_age if not no_cache else -1,
"cache_path": cache_path
}

# Print the application header
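The new `--cache-max-age`, `--cache-path`, and `--no-cache` options above feed the validation settings, with `--no-cache` overriding the max-age value. A minimal sketch of that mapping: the helper name is hypothetical, only the expression `cache_max_age if not no_cache else -1` and the 300-second default come from the diff.

```python
from typing import Optional

DEFAULT_HTTP_CACHE_MAX_AGE = 300  # seconds, as set in constants.py by this PR


def resolve_cache_settings(cache_max_age: int = DEFAULT_HTTP_CACHE_MAX_AGE,
                           cache_path: Optional[str] = None,
                           no_cache: bool = False) -> dict:
    # --no-cache takes precedence over --cache-max-age, forcing -1,
    # mirroring the diff's `cache_max_age if not no_cache else -1`
    return {
        "cache_max_age": cache_max_age if not no_cache else -1,
        "cache_path": cache_path,
    }
```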
2 changes: 1 addition & 1 deletion rocrate_validator/constants.py
Expand Up @@ -87,5 +87,5 @@
JSON_OUTPUT_FORMAT_VERSION = "0.2"

# Http Cache Settings
DEFAULT_HTTP_CACHE_TIMEOUT = 60
DEFAULT_HTTP_CACHE_MAX_AGE = 300 # in seconds
DEFAULT_HTTP_CACHE_PATH_PREFIX = '/tmp/rocrate_validator_cache'
24 changes: 17 additions & 7 deletions rocrate_validator/models.py
@@ -31,9 +31,9 @@
import enum_tools
from rdflib import RDF, RDFS, Graph, Namespace, URIRef

from rocrate_validator.utils import log as logging
from rocrate_validator import __version__
from rocrate_validator.constants import (DEFAULT_ONTOLOGY_FILE,
from rocrate_validator.constants import (DEFAULT_HTTP_CACHE_MAX_AGE,
DEFAULT_ONTOLOGY_FILE,
DEFAULT_PROFILE_IDENTIFIER,
DEFAULT_PROFILE_README_FILE,
IGNORED_PROFILE_DIRECTORIES,
@@ -48,11 +48,13 @@
ROCrateMetadataNotFoundError)
from rocrate_validator.events import Event, EventType, Publisher, Subscriber
from rocrate_validator.rocrate import ROCrate
from rocrate_validator.utils.collections import (MapIndex)
from rocrate_validator.utils import log as logging
from rocrate_validator.utils.collections import MapIndex, MultiIndexMap
from rocrate_validator.utils.http import HttpRequester
from rocrate_validator.utils.paths import get_profiles_path
from rocrate_validator.utils.python_helpers import get_requirement_name_from_file
from rocrate_validator.utils.python_helpers import \
get_requirement_name_from_file
from rocrate_validator.utils.uri import URI
from rocrate_validator.utils.collections import MultiIndexMap

# set the default profiles path
DEFAULT_PROFILES_PATH = get_profiles_path()
@@ -1774,7 +1776,7 @@ def update(self, event: Event, ctx: Optional[ValidationContext] = None) -> None:
logger.debug("Validation ended with result: %s", event.validation_result)

def to_dict(self) -> dict:
""""
"""
Get the computed validation statistics as a dictionary
"""
return {
@@ -2388,11 +2390,19 @@ class ValidationSettings:
metadata_dict: dict = None
#: Verbose output
verbose: bool = False
#: Cache max age in seconds
cache_max_age: Optional[int] = DEFAULT_HTTP_CACHE_MAX_AGE
#: Cache path
cache_path: Optional[Path] = None

def __post_init__(self):
# if requirement_severity is a str, convert to Severity
if isinstance(self.requirement_severity, str):
self.requirement_severity = Severity[self.requirement_severity]
# initialize the HTTP cache
HttpRequester.initialize_cache(cache_path=self.cache_path, cache_max_age=self.cache_max_age)
logger.debug("HTTP cache initialized at %s with max age %s seconds",
self.cache_path, self.cache_max_age)

def to_dict(self):
"""
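The `__post_init__` hunk above converts a string severity into its enum member by name lookup (`Severity[self.requirement_severity]`). A minimal sketch of that pattern; the member names here are illustrative, not taken from rocrate-validator.

```python
import enum


class Severity(enum.Enum):
    # Member names are assumed for illustration; the library's actual
    # Severity members may differ.
    OPTIONAL = 0
    RECOMMENDED = 1
    REQUIRED = 2


requirement_severity = "REQUIRED"
# Name-based lookup, as used in ValidationSettings.__post_init__
if isinstance(requirement_severity, str):
    requirement_severity = Severity[requirement_severity]
```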
@@ -2637,7 +2647,7 @@ def detect_rocrate_profiles(self) -> list[Profile]:
if len(unmatched_profiles) > 0:
logger.warning(
"The conformance to the following profiles could not be verified: %s",
unmatched_profiles,
", ".join(unmatched_profiles),
)
return candidate_profiles

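The `detect_rocrate_profiles` change above swaps a raw list for `", ".join(...)` in the warning message. The difference in rendered output (profile names here are hypothetical):

```python
unmatched_profiles = ["ro-crate-1.1", "workflow-ro-crate-1.0"]  # hypothetical names

# Interpolating the list directly renders Python's repr of the list:
list_form = "could not be verified: %s" % unmatched_profiles
# Joining first produces the human-readable form the PR switches to:
joined_form = "could not be verified: %s" % ", ".join(unmatched_profiles)
```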
115 changes: 95 additions & 20 deletions rocrate_validator/profiles/ro-crate/must/0_file_descriptor_format.py
@@ -12,12 +12,14 @@
# See the License for the specific language governing permissions and
# limitations under the License.

import re
from typing import Any
from urllib.parse import urljoin

from rocrate_validator.utils import log as logging
from rocrate_validator.models import ValidationContext
from rocrate_validator.requirements.python import (PyFunctionCheck, check,
requirement)
from rocrate_validator.utils import log as logging
from rocrate_validator.utils.http import HttpRequester

# set up logging
@@ -86,17 +88,80 @@ class FileDescriptorJsonLdFormat(PyFunctionCheck):
The file descriptor MUST be a valid JSON-LD file
"""

def __get_remote_context__(self, context_uri: str) -> object:
raw_data = HttpRequester().get(context_uri, headers={"Accept": "application/ld+json, application/json"})
if raw_data.status_code != 200:
raise RuntimeError(f"Unable to retrieve the JSON-LD context '{context_uri}'", self)
logger.debug(f"Retrieved context from {context_uri}")

# Check if the response header contains the correct content type
content_type = raw_data.headers.get("Content-Type", "")
is_valid_content_type = "application/ld+json" in content_type or "application/json" in content_type
# If the content type is not application/ld+json or application/json,
# try to find an alternate link for the JSON-LD context in the response header
if not is_valid_content_type:
logger.debug(
f"The retrieved context from {context_uri} "
f"does not have a Content-Type of application/ld+json or application/json: "
f"the actual Content-Type is {content_type}. "
)
# check if the response header contains an alternate link location for the JSON-LD context
# (https headers are case-insensitive, according to RFC 7230,
# so we can use .get() without worrying about the case)
link_header = raw_data.headers.get("Link", "")
logger.debug(f"Checking Link header for alternate JSON-LD context: {link_header}")
has_alternate_link = ('rel="alternate"' in link_header and
('type="application/ld+json"' in link_header or
'type="application/json"' in link_header))

if has_alternate_link:
logger.debug(f"Found alternate link for JSON-LD context in Link header: {link_header}")
# extract the URL of the alternate link
match = re.search(r'<([^>]+)>;\s*rel="alternate";\s*type="application/(ld\+json|json)"', link_header)
if match:
alternate_url = match.group(1)
# If the alternate URL is relative, resolve it against the original context URI
if not alternate_url.startswith("http"):
alternate_url = urljoin(context_uri, alternate_url)
logger.debug(f"Trying to retrieve JSON-LD context from alternate URL: {alternate_url}")
raw_data = HttpRequester().get(alternate_url, headers={
"Accept": "application/ld+json, application/json"})
if raw_data.status_code != 200:
raise RuntimeError(
f"Unable to retrieve the JSON-LD context from alternate URL '{alternate_url}'", self)
logger.debug(f"Retrieved context from alternate URL {alternate_url}")
content_type = raw_data.headers.get("Content-Type", "")
if "application/ld+json" not in content_type and "application/json" not in content_type:
raise RuntimeError(
f"The retrieved context from alternate URL {alternate_url} "
"does not have a Content-Type of application/ld+json or application/json: "
f"the actual Content-Type is {content_type}. ", self)
else:
logger.debug(f"No valid alternate link found in Link header: {link_header}")
raise RuntimeError(
f"Unable to retrieve the JSON-LD context from {context_uri} and no valid "
f"alternate link found in Link header: {link_header}", self)
else:
logger.debug(f"No alternate link for JSON-LD context found in Link header: {link_header}")
raise RuntimeError(
f"Unable to retrieve the JSON-LD context from {context_uri} "
f"and no alternate link found in Link header: {link_header}", self)

# Try to parse the JSON-LD and access the context
jsonLD = raw_data.json()["@context"]
# logger.warning(f"Retrieved JSON-LD context: {jsonLD}")
assert isinstance(jsonLD, dict)
# return the JSON-LD context
return jsonLD

def __check_remote_context__(self, context_uri: str) -> bool:
# Try to retrieve the context
try:
raw_data = HttpRequester().get(context_uri, headers={"Accept": "application/ld+json"})
if raw_data.status_code != 200:
raise RuntimeError(f"Unable to retrieve the JSON-LD context '{context_uri}'", self)
logger.debug(f"Retrieved context from {context_uri}")

# Try to parse the JSON-LD and access the context
jsonLD = raw_data.json()["@context"]
assert isinstance(jsonLD, dict)
jsonLD = self.__get_remote_context__(context_uri)
assert isinstance(
jsonLD, dict), f"The retrieved context from {context_uri} is not \
a valid JSON-LD context: it is not a dictionary"
return True
except Exception as e:
if logger.isEnabledFor(logging.DEBUG):
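The alternate-link handling in `__get_remote_context__` above boils down to a regex match over the `Link` header plus `urljoin` for relative URLs. A standalone sketch: the helper name is hypothetical, but the regex is the one from the diff.

```python
import re
from urllib.parse import urljoin


def find_alternate_context(link_header: str, context_uri: str):
    """Return the alternate JSON-LD context URL from a Link header, or None."""
    # Regex taken verbatim from the diff's __get_remote_context__
    match = re.search(
        r'<([^>]+)>;\s*rel="alternate";\s*type="application/(ld\+json|json)"',
        link_header)
    if not match:
        return None
    alternate_url = match.group(1)
    # Resolve relative alternate URLs against the original context URI
    if not alternate_url.startswith("http"):
        alternate_url = urljoin(context_uri, alternate_url)
    return alternate_url
```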
@@ -306,16 +371,8 @@ def __get_remote_context_keys__(self, context_uri: str) -> set:
""" Get the keys of the context URI """

logger.debug(f"Retrieving context from {context_uri}...")
# Try to retrieve the context
raw_data = HttpRequester().get(context_uri, headers={"Accept": "application/ld+json"})
if raw_data.status_code != 200:
raise RuntimeError(f"Unable to retrieve the JSON-LD context '{context_uri}'")

logger.debug(f"Retrieved context from {context_uri}")

# Get the keys of the context
jsonLD = raw_data.json()
jsonLD_ctx = jsonLD["@context"]
jsonLD_ctx = self.__get_remote_context__(context_uri)
if not isinstance(jsonLD_ctx, dict):
raise RuntimeError("The context is not a dictionary", self)
return set(jsonLD_ctx.keys())
@@ -339,9 +396,27 @@ def add_unexpected_key(k: str, u_keys: dict) -> None:
# If the entity is a dictionary, check each key
if isinstance(entity, dict):
for k, v in entity.items():
if k not in context_keys and k not in SKIP_KEYS:
# If the key is in the skip keys, skip it
if k in SKIP_KEYS:
logger.debug(f"Key {k} is a reserved JSON-LD keyword, skipping")

# If the key is not in the context keys,
# it can be used in compacted format only if it is a valid compact IRI
# with a prefix that is in the context
elif k not in context_keys:
logger.debug(f"Key {k} not in context keys")
add_unexpected_key(k, unexpected_keys)

# Try to get the prefix of the compact IRI, if it has one
prefix = k.split(":", 1)[0] if ":" in k else None
logger.debug(f"Checking prefix {prefix} of key {k}")
# If the key does not have a prefix (no colon) or the prefix is not in the context keys,
# it cannot be used as a key in compacted format
if prefix is None or prefix not in context_keys:
logger.debug(
f"Key {k} does not have a valid prefix in context keys, adding to unexpected keys")
add_unexpected_key(k, unexpected_keys)

# If the value is a dictionary or a list, check its keys recursively
if isinstance(v, (dict, list)):
self.__check_entity_keys__(v, context_keys, unexpected_keys)

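The key check above accepts reserved JSON-LD keywords, terms defined in the context, and compact IRIs whose prefix is a context term. A condensed sketch of that decision; the contents of `SKIP_KEYS` are assumed, since the diff does not show its definition.

```python
SKIP_KEYS = {"@context", "@id", "@type", "@graph"}  # assumed reserved JSON-LD keywords


def is_allowed_compacted_key(key: str, context_keys: set) -> bool:
    # Reserved keywords and context-defined terms are always acceptable
    if key in SKIP_KEYS or key in context_keys:
        return True
    # Otherwise the key must be a compact IRI whose prefix is a context term
    prefix = key.split(":", 1)[0] if ":" in key else None
    return prefix is not None and prefix in context_keys
```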
@@ -382,7 +457,7 @@ def check_compaction(self, context: ValidationContext) -> bool:
# Check if k is a term or a URI
if k.startswith("http"):
context.result.add_issue(
f'The The {v} occurrence{suffix} of the "{k}" URI cannot be used as a key{suffix} "'
f'The {v} occurrence{suffix} of the "{k}" URI cannot be used as a key{suffix} "'
'because the compacted format requires simple terms as keys '
'(see https://www.w3.org/TR/json-ld-api/#compaction for more details).', self)
else: