The current .nc files are simply PyPSA's standard I/O format. They were not designed for this app and have several disadvantages here. PyPSA currently cannot lazy load at all: even for a small plot, the full .nc file has to be loaded into memory. The schema is also very PyPSA-specific, not cleanly structured, and hard to extend. ...
Proposed new data record / schema:
- The schema is derived from PyPSA but owned by the App, so we can adapt it as needed.
- A data record is a single instance of that schema, filled with data.
- Processing multiple records (queries for analytics, but it could be anything) works by pointing to N records, as long as the query and processing generalise over multiple dimensions. The same query definition should return the energy balance of a single data record as well as of multiple data records combined.
- Multiple scenarios are not represented by a single record. Instead, the same paros logic runs on multiple records, which is what allows processing of multiple scenarios.
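As a minimal sketch of the "same query over 1 or N records" idea: the snippet below uses a hypothetical in-memory stand-in for a record (a dict from attribute name to long-format rows) and a deliberately simplified query that sums an attribute per component type. The names `Row`, `Record` and `total_by_component_type`, the attribute name `p`, and the sample values are all assumptions for illustration. This is not a real energy balance and not an existing API.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Row(NamedTuple):
    """One long-format row, mirroring the proposed <attr>.parquet columns."""
    component_type: str
    component: str
    snapshot: str
    scenario: str
    period: int
    value: float


# Hypothetical stand-in for one data record: attribute name -> rows.
Record = dict[str, list[Row]]


def total_by_component_type(records: Iterable[Record], attr: str) -> dict[str, float]:
    """Sum `attr` per component type across any number of records."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        # A record that lacks the attribute simply contributes nothing.
        for row in record.get(attr, []):
            totals[row.component_type] += row.value
    return dict(totals)


# Made-up sample data, only to exercise the query.
record_a: Record = {
    "p": [
        Row("Generator", "wind", "2030-01-01T00", "base", 2030, 5.0),
        Row("Load", "city", "2030-01-01T00", "base", 2030, -5.0),
    ]
}
record_b: Record = {
    "p": [Row("Generator", "gas", "2030-01-01T00", "base", 2030, 3.0)]
}

# The same query definition handles one record or several.
single = total_by_component_type([record_a], "p")
combined = total_by_component_type([record_a, record_b], "p")
```

The query never needs to know how many records it receives. A single record is just the N = 1 case, so no separate multi-scenario code path is required.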
DataRecordA/
├── manifest.json # version, attribute catalog — immutable, user may not change anything here
├── snapshots.parquet
├── periods.parquet
├── components.parquet
├── scenarios.parquet # for stochastic optimization, not workflow scenarios
├── data/
│   └── <attr>.parquet   # ComponentType | component | snapshot | scenario | period | value
└── results/
    └── <attr>.parquet   # ComponentType | component | snapshot | scenario | period | value
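One way the per-attribute layout could support lazy loading is sketched below, assuming a hypothetical `DataRecord` helper that is not an existing class. It only resolves paths and lists attributes; it never reads attribute data itself. Actual reading would be left to a Parquet reader of choice, applied only to the one file a query needs.

```python
import json
from pathlib import Path


class DataRecord:
    """Hypothetical handle on a record directory; reads no attribute data itself."""

    def __init__(self, root):
        self.root = Path(root)

    def manifest(self) -> dict:
        # The manifest is a small JSON file, so loading it eagerly is cheap.
        return json.loads((self.root / "manifest.json").read_text())

    def attributes(self, group: str) -> list[str]:
        """List attribute names under 'data' or 'results' without opening the files."""
        return sorted(p.stem for p in (self.root / group).glob("*.parquet"))

    def attribute_path(self, group: str, attr: str) -> Path:
        """Resolve the single file a query needs; only that file has to be read."""
        path = self.root / group / f"{attr}.parquet"
        if not path.is_file():
            raise FileNotFoundError(f"{group}/{attr}.parquet not found in {self.root}")
        return path
```

With one file per attribute, a small plot touches a single `<attr>.parquet` instead of the whole record. Column and row filtering inside that file would then be up to the Parquet reader.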
To be discussed/ unclear:
Todos: