Data Platform

gRPC API and database handler for storing and serving energy forecast data.

[Figure: bar chart of benchmark results]

The Data Platform is a gRPC API server that provides efficient access to, and storage of, renewable energy forecast data. It is designed to perform well under the specific workflows and data access patterns of OCF's applications, both to enable scaling and to improve the developer experience when integrating with OCF's stack. To that end, the focus is on the quality of the tooling surrounding the codebase as well as on the code itself. The Data Platform replaces the old SQLAlchemy datamodel repositories and databases.

Quickstart

Running the server

The Data Platform gRPC API server is packaged for portability as a container. This can be run using a container orchestration tool, e.g. with Docker:

$ docker run -p 50051:50051 ghcr.io/openclimatefix/data-platform

Alternatively, it can be run locally using Go. See Local Running in the Development section.

Once running, the server RPCs can be investigated using a gRPC client tool.

Configuration

To connect to a backend database and persist platform data, the server must be configured via environment variables. All available options are defined in the configuration file at cmd/server.conf.

Important

Whilst the configuration is held in a file, this is NOT intended to be overwritten or modified in order to configure the Data Platform. Configuration should always be handled via environment variables; the config file is simply provided as a version-controlled single point of reference for what those variables might be.

The available configuration may differ between versions of the Data Platform. Ensure you check the correct version of the configuration file for your deployment.
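As a minimal sketch of configuring the container purely through environment variables, the helper below builds a `docker run` argument list with one `-e KEY=VALUE` flag per setting. The helper name and the `DATABASE_URL` variable are assumptions made for illustration; the real variable names are those listed in cmd/server.conf for your version.

```python
import shlex


def docker_run_command(image, env, port=50051):
    """Build a `docker run` argument list that passes settings as environment variables."""
    args = ["docker", "run", "-p", f"{port}:{port}"]
    # Sort the keys so the generated command is deterministic.
    for key in sorted(env):
        args += ["-e", f"{key}={env[key]}"]
    args.append(image)
    return args


# DATABASE_URL is a hypothetical variable name, used here for illustration only.
command = docker_run_command(
    "ghcr.io/openclimatefix/data-platform",
    {"DATABASE_URL": "postgres://user:pass@db:5432/dp"},
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run` directly, which avoids shell quoting issues with values such as connection strings.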

Connecting a client

An example Python script demonstrates how to use the Python bindings in a client of a Data Platform server. The example runs through a data analysis workflow. To run it, first ensure that the Data Platform server is running on localhost:50051 (see Getting Started) and that the Python bindings have been generated (see Generating Code). Then use uvx to run the notebook:

$ make gen.proto.python
$ uvx marimo edit --headless --sandbox examples/python-notebook/example.py 

For ease, the above process is wrapped in a Makefile target:

$ make run.notebook

Architecture

The Data Platform has clear separation boundaries between its components:

                +-------------------------------------------------------------+
                |                     Data Platform Server                    |
                +-------------------+                     +-------------------+
--- Clients --> | External Schema   | <-- Server Impl --- | Database Schema   | <-- Database
                +-------------------+                     +-------------------+
                |                                                             |
                +-------------------------------------------------------------+

gRPC API schema

The Data Platform defines a strongly typed data contract as its external interface, served via gRPC. This is the API that any external clients have to use to interact with the platform. The schema for this is defined in Protocol Buffers, located at proto/ocf/dp.

Boilerplate code for client and server implementations is generated in the required language from these .proto files using the protoc compiler.

Important

Changes to the schema modify the data contract, and may require client and server implementations to regenerate their bindings and update their code. As such, they should be made with purpose and care, and should aim to be backwards compatible whenever they affect the hot path.

Database schema

The Data Platform can be configured to use different database backends. Each backend has a server implementation that inherits the External Schema. The currently supported backends are:

  • PostgreSQL
  • Dummy (a memoryless backend for quick testing)

and are selected according to the relevant environment variables (see the Configuration section).

The schema for the PostgreSQL backend is defined using PostgreSQL's native SQL dialect in the internal/server/postgres/sql/migrations directory, and access functions to the data are defined in internal/server/postgres/sql/queries.

Boilerplate code for using these queries is generated using the sqlc tool. This generated code provides a strongly typed interface to the database.

Note

These changes can be made without having to update the data contract, and so will not require updates to clients using the Data Platform.

Defining the queries in SQL allows for more efficient interaction with the database, as they can be written to take advantage of the database's features and to make optimal use of its indexes.

Important

If using PostgreSQL as a backend, it is recommended that you tune your database instance according to the specifications of said instance (available CPU and RAM etc). This will ensure optimal performance for the Data Platform server.

Server

The Database Schema is mapped to the External Schema by implementing the server interface generated from the Data Contract. This is done in internal/server/<database>/serverimpl.go. It isn't much more than a conversion layer, with the business logic shared between the implemented functions and the SQL queries.

Development

Getting Started

This project requires the Go Toolchain to be installed.

Note

This project uses Go modules for dependency management. Ensure that your PATH environment variable has been updated to include the Go binary installation location, as per the instructions linked above, otherwise you may see errors.

Clone the repository, then run

$ make init

This will fetch the dependencies, and install the git hooks required for development.

Important

Since this project uses a lot of generated code, these hooks are vital for keeping that code up to date. Running make init is therefore a necessary step towards a smooth development experience.

Local running

The server can be run locally without a database connection, using a fake database implementation, via a Make target. This is the recommended approach, as it ensures that code generation is up to date and that the running version has been embedded into the built binary.

$ make run

This will start the Data Platform gRPC API server on localhost:50051. The RPCs can then be investigated using a tool such as grpcurl or grpcui. In this testing mode, the data returned by the server is entirely generated and has little bearing on the request objects themselves.

There is also an example Docker Compose file in examples/docker-compose.yml, which runs the Data Platform API server in a container, backed by Postgres, and which also includes a gRPC UI for testing.

Testing

Unit tests can be run using make test. Benchmarks can be run using make bench. Both of these utilise TestContainers, so ensure you meet their general system requirements.

Generating Code

In order to make changes to the SQL queries, or add a new Database migration, you will need to add or modify the relevant .sql files in the sql directory. Then, regenerate the Go library code to reflect these changes. This can be done using

$ make gen

This will populate the internal/server/postgres/gen directory with language-specific bindings for implementations of server and client code. Next, update the serverimpl.go file for the given database to use the newly generated code, and ensure the test suite passes. Since the Data Platform container automatically migrates the database on startup, simply re-deploying the container will propagate the changes to your deployment environment.

In order to change the Data Contract, you will need to modify the .proto files in the proto directory, and regenerate the code. gRPC client/server interfaces, along with boilerplate code, are generated from these Protocol Buffer definitions. The make gen target already handles generating the Go code used internally in the application, placing generated code in internal/gen.

Language-specific client/server bindings for external applications are generated as part of the CI pipeline, but can also be generated manually, e.g. for Python:

$ make gen.proto.python

This places the generated code in gen/python. See the Makefile for more external targets.

gRPC API Documentation

Messages (ocf/dp/dp-data.messages.proto)

CreateForecastRequest

Field Type Label Description
forecaster Forecaster

CreateForecastRequest.ForecastValue

Field Type Label Description
horizon_mins uint32

CreateForecastRequest.ForecastValue.OtherStatisticsFractionsEntry

Field Type Label Description
key string

CreateForecastResponse

Field Type Label Description
forecast_uuid string

CreateForecasterRequest

Field Type Label Description
name string

CreateForecasterResponse

Field Type Label Description
forecaster Forecaster

CreateLocationRequest

Field Type Label Description
location_name string

CreateLocationResponse

Field Type Label Description
location_uuid string

CreateObservationsRequest

Field Type Label Description
location_uuid string

CreateObservationsRequest.Value

Field Type Label Description
timestamp_utc google.protobuf.Timestamp

CreateObservationsResponse

CreateObserverRequest

Field Type Label Description
name string

CreateObserverResponse

Field Type Label Description
observer_uuid string

DeleteForecastRequest

Field Type Label Description
location_uuid string

DeleteForecastResponse

ForecastDatum

Field Type Label Description
init_timestamp google.protobuf.Timestamp

ForecastDatum.MetadataEntry

Field Type Label Description
key string

ForecastDatum.OtherStatisticsFractionsEntry

Field Type Label Description
key string

Forecaster

Forecaster represents a generative source of predicted values.

Field Type Label Description
forecaster_name string

GetForecastAsTimeseriesRequest

Field Type Label Description
location_uuid string

GetForecastAsTimeseriesResponse

Field Type Label Description
location_uuid string

GetForecastAsTimeseriesResponse.Value

Field Type Label Description
target_timestamp_utc google.protobuf.Timestamp

GetForecastAsTimeseriesResponse.Value.OtherStatisticsFractionsEntry

Field Type Label Description
key string

GetForecastAtTimestampRequest

Field Type Label Description
location_uuids string repeated

GetForecastAtTimestampResponse

Field Type Label Description
timestamp_utc google.protobuf.Timestamp

GetForecastAtTimestampResponse.Value

Field Type Label Description
location_uuid string

GetLatestForecastsRequest

Field Type Label Description
location_uuid string

GetLatestForecastsResponse

Field Type Label Description
forecasts GetLatestForecastsResponse.Forecast repeated

GetLatestForecastsResponse.Forecast

Field Type Label Description
initialization_timestamp_utc google.protobuf.Timestamp

GetLatestObservationsRequest

Field Type Label Description
location_uuids string repeated

GetLatestObservationsResponse

Field Type Label Description
observations GetLatestObservationsResponse.Observation repeated

GetLatestObservationsResponse.Observation

Field Type Label Description
location_uuid string

GetLocationAsTimeseriesRequest

Field Type Label Description
location_uuid string

GetLocationAsTimeseriesResponse

Field Type Label Description
values GetLocationAsTimeseriesResponse.LocationSnapshot repeated

GetLocationAsTimeseriesResponse.LocationSnapshot

Field Type Label Description
timestamp_utc google.protobuf.Timestamp

GetLocationRequest

Field Type Label Description
location_uuid string

GetLocationResponse

Field Type Label Description
location_uuid string

GetLocationsAsGeoJSONRequest

Field Type Label Description
location_uuids string repeated

GetLocationsAsGeoJSONResponse

Field Type Label Description
geojson string

GetObservationsAsTimeseriesRequest

Field Type Label Description
location_uuid string

GetObservationsAsTimeseriesResponse

Field Type Label Description
location_uuid string

GetObservationsAsTimeseriesResponse.Value

Field Type Label Description
timestamp_utc google.protobuf.Timestamp

GetObservationsAtTimestampRequest

Field Type Label Description
location_uuids string repeated

GetObservationsAtTimestampResponse

Field Type Label Description
timestamp_utc google.protobuf.Timestamp

GetObservationsAtTimestampResponse.Value

Field Type Label Description
location_uuid string

GetWeekAverageDeltasRequest

Field Type Label Description
location_uuid string

GetWeekAverageDeltasResponse

Field Type Label Description
deltas GetWeekAverageDeltasResponse.AverageDelta repeated

GetWeekAverageDeltasResponse.AverageDelta

Field Type Label Description
horizon_mins uint32

ListForecastersRequest

Field Type Label Description
forecaster_names_filter string repeated Optional filter to only return forecasters from a given set. If empty, all forecasters will be returned.

ListForecastersResponse

Field Type Label Description
forecasters Forecaster repeated

ListLocationsRequest

Field Type Label Description
energy_source_filter EnergySource optional Optional filter to only return locations of a specific energy source.

ListLocationsResponse

Field Type Label Description
locations ListLocationsResponse.LocationSummary repeated

ListLocationsResponse.LocationSummary

Field Type Label Description
location_uuid string

ListObserversRequest

Field Type Label Description
observer_names_filter string repeated Optional filter to only return observers from a given set. If empty, all observers will be returned.

ListObserversResponse

Field Type Label Description
observers ListObserversResponse.ObserverSummary repeated

ListObserversResponse.ObserverSummary

Field Type Label Description
observer_uuid string

StreamForecastDataRequest

Field Type Label Description
location_uuids string repeated

StreamForecastDataResponse

Field Type Label Description
values ForecastDatum repeated

TimeWindow

Field Type Label Description
start_timestamp_utc google.protobuf.Timestamp The start of the time window, inclusive. Cannot be more than 7 days before end_timestamp_utc, nor more than 1 month in the future.
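The constraints on start_timestamp_utc can be checked on the client before sending a request. The sketch below is an illustration only: the function name is hypothetical, "1 month" is approximated as 30 days (an assumption), and the server's own validation remains authoritative.

```python
from datetime import datetime, timedelta, timezone

MAX_WINDOW = timedelta(days=7)
MAX_FUTURE = timedelta(days=30)  # assumption: "1 month" approximated as 30 days


def validate_time_window(start, end, now=None):
    """Return a list of problems with the window; an empty list means it looks valid."""
    now = now or datetime.now(timezone.utc)
    errors = []
    # The start may not be more than 7 days before the end.
    if end - start > MAX_WINDOW:
        errors.append("start is more than 7 days before end")
    # The start may not be more than (roughly) 1 month in the future.
    if start > now + MAX_FUTURE:
        errors.append("start is more than 1 month in the future")
    return errors


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(validate_time_window(now - timedelta(days=2), now + timedelta(days=1), now=now))
```

Passing `now` explicitly keeps the check deterministic, which is convenient in tests.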

UpdateForecasterRequest

Field Type Label Description
name string

UpdateForecasterResponse

Field Type Label Description
forecaster Forecaster

UpdateLocationRequest

Field Type Label Description
location_uuid string

UpdateLocationResponse

Field Type Label Description
location_uuid string

DataPlatformDataService (ocf/dp/dp-data.service.proto)

GetForecastAsTimeseries

GetForecastAsTimeseries fetches a 1-D horizontal slice of predicted data. These values can either come from a sample of many forecasts, or from one specific forecast. In the case of the sample, values whose timestamps are shared across overlapping forecasts are cherry-picked based on the lowest allowable lead time (horizon).

GetForecastAsTimeseriesRequest / GetForecastAsTimeseriesResponse
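The selection rule for overlapping forecasts can be illustrated with a small in-memory sketch. This is not the server's implementation (which lives in SQL); the function name, the input layout, and the `min_horizon_mins` parameter are assumptions chosen to show the idea of keeping, for each target time, the value with the lowest allowable horizon.

```python
from datetime import datetime, timedelta, timezone


def stitch_timeseries(forecasts, min_horizon_mins=0):
    """Merge overlapping forecasts into one series.

    forecasts: list of (init_time, [(horizon_mins, value), ...]).
    For each target time, keep the value with the smallest horizon
    that is still at least min_horizon_mins.
    """
    best = {}  # target time -> (horizon_mins, value)
    for init_time, values in forecasts:
        for horizon_mins, value in values:
            if horizon_mins < min_horizon_mins:
                continue  # lead time too short to be allowable
            target = init_time + timedelta(minutes=horizon_mins)
            if target not in best or horizon_mins < best[target][0]:
                best[target] = (horizon_mins, value)
    return [(target, best[target][1]) for target in sorted(best)]


t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
older = (t0, [(0, 1.0), (30, 2.0), (60, 3.0)])
newer = (t0 + timedelta(minutes=30), [(0, 10.0), (30, 20.0), (60, 30.0)])
for target, value in stitch_timeseries([older, newer]):
    print(target.isoformat(), value)
```

With `min_horizon_mins=0` the newer forecast wins wherever the two overlap; raising the minimum horizon pushes the selection back towards older forecasts.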

GetForecastAtTimestamp

GetForecastAtTimestamp fetches a 1-D vertical slice of predicted data. Useful for spatial snapshots at a given time, for instance to display on a map.

GetForecastAtTimestampRequest / GetForecastAtTimestampResponse
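One way to picture the "horizontal" and "vertical" slices is as rows and columns of a grid of values indexed by location and time. The sketch below uses a plain dictionary with made-up location names and values; it illustrates the terminology only and does not reflect how the data is stored.

```python
# grid[location][timestamp] = value; names and numbers are illustrative only.
grid = {
    "site-a": {"12:00": 0.1, "12:30": 0.2},
    "site-b": {"12:00": 0.4, "12:30": 0.5},
}


def horizontal_slice(grid, location):
    """One location across time, as in the *AsTimeseries RPCs."""
    return sorted(grid[location].items())


def vertical_slice(grid, timestamp):
    """Many locations at one time, as in the *AtTimestamp RPCs."""
    return {
        location: series[timestamp]
        for location, series in grid.items()
        if timestamp in series
    }


print(horizontal_slice(grid, "site-a"))
print(vertical_slice(grid, "12:30"))
```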

GetObservationsAsTimeseries

GetObservationsAsTimeseries fetches a 1-D horizontal slice of observed data. It is the observations analogue of GetForecastAsTimeseries.

GetObservationsAsTimeseriesRequest / GetObservationsAsTimeseriesResponse

GetObservationsAtTimestamp

GetObservationsAtTimestamp fetches a 1-D vertical slice of observation data. It is the observations analogue of GetForecastAtTimestamp.

GetObservationsAtTimestampRequest / GetObservationsAtTimestampResponse

GetLocation

GetLocation fetches a snapshot of information about a specific location at a point in time. It can also optionally return the geometry of the location.

GetLocationRequest / GetLocationResponse

GetLocationAsTimeseries

GetLocationAsTimeseries fetches the history of a location across a given time window.

GetLocationAsTimeseriesRequest / GetLocationAsTimeseriesResponse

CreateLocation

CreateLocation registers a new location in which to log or forecast generation.

CreateLocationRequest / CreateLocationResponse

UpdateLocation

UpdateLocation modifies various attributes associated with a given location.

UpdateLocationRequest / UpdateLocationResponse

ListLocations

ListLocations fetches a list of registered locations that match the supplied filters.

ListLocationsRequest / ListLocationsResponse

CreateForecaster

CreateForecaster registers a new forecaster. A forecaster is a producer of predicted values. Forecasters are differentiated by their name and version.

CreateForecasterRequest / CreateForecasterResponse

UpdateForecaster

UpdateForecaster modifies the version of an existing forecaster.

UpdateForecasterRequest / UpdateForecasterResponse

ListForecasters

ListForecasters fetches a list of registered forecasters that match the supplied filters.

ListForecastersRequest / ListForecastersResponse

CreateForecast

CreateForecast saves a timeseries of predicted values from a given forecaster.

CreateForecastRequest / CreateForecastResponse

GetLatestForecasts

GetLatestForecasts fetches metadata for the most recently produced forecasts.

GetLatestForecastsRequest / GetLatestForecastsResponse

DeleteForecast

DeleteForecast removes a series of forecast values from the database.

DeleteForecastRequest / DeleteForecastResponse

CreateObserver

CreateObserver registers a new observer. An observer is a producer of observed, or measured, values.

CreateObserverRequest / CreateObserverResponse

ListObservers

ListObservers fetches a list of registered observers that match the supplied filters.

ListObserversRequest / ListObserversResponse

CreateObservations

CreateObservations saves a timeseries of observed values from a given observer.

CreateObservationsRequest / CreateObservationsResponse

GetLatestObservations

GetLatestObservations fetches the most recent observation for a given location and observer.

GetLatestObservationsRequest / GetLatestObservationsResponse

GetLocationsAsGeoJSON

GetLocationsAsGeoJSON fetches a given set of locations as GeoJSON, suitable for display on a map or for integration with GIS software.

GetLocationsAsGeoJSONRequest / GetLocationsAsGeoJSONResponse

GetWeekAverageDeltas

GetWeekAverageDeltas fetches the average delta at the given init time over the past week. This is useful for making adjustments based on recent performance.

GetWeekAverageDeltasRequest / GetWeekAverageDeltasResponse
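As a rough illustration of what an average delta per horizon could look like, the sketch below averages forecast-minus-observation differences grouped by horizon. The definition of "delta" as forecast minus observed value, the function name, and the input layout are all assumptions; the source does not spell out the exact calculation.

```python
from collections import defaultdict


def average_deltas(samples):
    """Average (forecast - observed) per horizon.

    samples: iterable of (horizon_mins, forecast_value, observed_value),
    assumed to be already restricted to the past week.
    """
    totals = defaultdict(lambda: [0.0, 0])  # horizon -> [sum of deltas, count]
    for horizon_mins, forecast_value, observed_value in samples:
        totals[horizon_mins][0] += forecast_value - observed_value
        totals[horizon_mins][1] += 1
    return {h: total / count for h, (total, count) in sorted(totals.items())}


samples = [(30, 0.5, 0.25), (30, 0.75, 0.25), (60, 0.5, 1.0)]
print(average_deltas(samples))
```

A consumer could subtract the delta for a given horizon from a fresh forecast value to correct for recent bias, under the same assumed sign convention.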

StreamForecastData

StreamForecastData streams forecast data for a given location, forecasters, and time range. Useful for analytics and performance monitoring.

StreamForecastDataRequest / StreamForecastDataResponse stream

Messages (ocf/dp/dp.common.proto)

LatLng

LatLng represents a WGS84 coordinate pair. Float precision enables a resolution of about 1cm, which is more precise than we'll ever have data for.

Field Type Label Description
latitude float

Enums (ocf/dp/dp.common.proto)

EnergySource

EnergySource indicates the type of energy generation. NOTE: These enum numbers are used to find the corresponding entry in the postgres database. Do not change without considering this first!

Name Number Description
ENERGY_SOURCE_UNSPECIFIED 0
ENERGY_SOURCE_SOLAR 1
ENERGY_SOURCE_WIND 2
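The warning about enum numbers can be made concrete with a small sketch. The enum below mirrors the names and numbers in the table above; the lookup dictionary is a hypothetical stand-in for the corresponding database table, and its contents are assumptions.

```python
from enum import IntEnum


class EnergySource(IntEnum):
    ENERGY_SOURCE_UNSPECIFIED = 0
    ENERGY_SOURCE_SOLAR = 1
    ENERGY_SOURCE_WIND = 2


# Hypothetical stand-in for a database lookup table keyed by enum number.
ENERGY_SOURCE_ROWS = {0: "unspecified", 1: "solar", 2: "wind"}


def describe(source):
    """Resolve an enum value to its database entry via the shared number."""
    return ENERGY_SOURCE_ROWS[int(source)]


print(describe(EnergySource.ENERGY_SOURCE_SOLAR))
```

Because the number is the only link between the two sides, renumbering the enum without migrating the stored rows would silently relabel existing data.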

LocationType

LocationType indicates the type of location. NOTE: These enum numbers are used to find the corresponding entry in the postgres database. Do not change without considering this first! The values are spaced apart in order to allow for future expansion.

Name Number Description
LOCATION_TYPE_UNSPECIFIED 0
LOCATION_TYPE_SITE 1
LOCATION_TYPE_GSP 2
LOCATION_TYPE_DNO 3
LOCATION_TYPE_NATION 4
LOCATION_TYPE_STATE 5
LOCATION_TYPE_COUNTY 6
LOCATION_TYPE_CITY 7
LOCATION_TYPE_PRIMARY_SUBSTATION 8

Permission

Permission indicates the level of access a user has to a resource. NOTE: These enum numbers are used to find the corresponding entry in the postgres database. Do not change without considering this first!

Name Number Description
PERMISSION_UNSPECIFIED 0
PERMISSION_READ 1
PERMISSION_WRITE 2

Scalar Value Types

.proto Type Notes C++ Java Python Go C# PHP Ruby
double double double float float64 double float Float
float float float float float32 float float Float
int32 Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint32 instead. int32 int int int32 int integer Bignum or Fixnum (as required)
int64 Uses variable-length encoding. Inefficient for encoding negative numbers – if your field is likely to have negative values, use sint64 instead. int64 long int/long int64 long integer/string Bignum
uint32 Uses variable-length encoding. uint32 int int/long uint32 uint integer Bignum or Fixnum (as required)
uint64 Uses variable-length encoding. uint64 long int/long uint64 ulong integer/string Bignum or Fixnum (as required)
sint32 Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int32s. int32 int int int32 int integer Bignum or Fixnum (as required)
sint64 Uses variable-length encoding. Signed int value. These more efficiently encode negative numbers than regular int64s. int64 long int/long int64 long integer/string Bignum
fixed32 Always four bytes. More efficient than uint32 if values are often greater than 2^28. uint32 int int uint32 uint integer Bignum or Fixnum (as required)
fixed64 Always eight bytes. More efficient than uint64 if values are often greater than 2^56. uint64 long int/long uint64 ulong integer/string Bignum
sfixed32 Always four bytes. int32 int int int32 int integer Bignum or Fixnum (as required)
sfixed64 Always eight bytes. int64 long int/long int64 long integer/string Bignum
bool bool boolean boolean bool bool boolean TrueClass/FalseClass
string A string must always contain UTF-8 encoded or 7-bit ASCII text. string String str/unicode string string string String (UTF-8)
bytes May contain any arbitrary sequence of bytes. string ByteString str []byte ByteString string String (ASCII-8BIT)
