diff --git a/docs/hands-on.md b/docs/hands-on.md index a8ca8d9..fc71a6e 100644 --- a/docs/hands-on.md +++ b/docs/hands-on.md @@ -5,12 +5,12 @@ This page provides a step-by-step guide to help you run different components of --- ### Tile-based Training Implementation -Refer to the [training component documentation](./training-cwl.md) and run multiple training jobs with various model hyperparameter. +Refer to the [training component documentation](./training-cwl.md) and run multiple training jobs with various model hyperparameters. --- ### Tile-based Inference Implementation -Check the [inference component documentation](./inference-cwl.md) and run inference using different sentinel-2 products in parallel using calrissian or once in a time using cwltool. +Check the [inference component documentation](./inference-cwl.md) and run inference on different Sentinel-2 products, either in parallel using `calrissian` or one at a time using `cwltool`. --- diff --git a/docs/index.md b/docs/index.md index 0ac0c99..c75bf80 100644 --- a/docs/index.md +++ b/docs/index.md @@ -3,9 +3,9 @@ ## Introduction -This learning resource demonstrates a machine learning system for classification of Sentinel-2 images into 10 different classes using cloud-native technologies. The system leverages MLFLOW to track the training process and select the best candidate trained model from MLFLOW server. +This learning resource demonstrates a Machine Learning (ML) system for classification of Sentinel-2 images into 10 different classes using cloud-native technologies. The system leverages MLFLOW to track the training process and selects the best candidate trained model from the MLFLOW server. -There are two workflows developed one for training a deep learning model classifier on EuroSAT dataset and one for running prediction on a real world Sentinel-2 data.
The automation is achieved using Kubernetes-native tools, making the setup scalable, modular, and suitable for Earth observation and geospatial applications. +There are two workflows: one for training a deep learning classifier on the EuroSAT dataset, and one for running predictions on real-world Sentinel-2 data. The automation is achieved using Kubernetes-native tools, making the setup scalable, modular, and suitable for Earth Observation and geospatial applications. @@ -15,8 +15,8 @@ This setup integrates the following technologies and concepts: ### MLFLOW -* Manage end-to-end ML workflows, from development to production -* End-to-end MLOps solution for traditional ML, including integrations with traditional ML models, and Deep learning one. +* Manages end-to-end ML workflows, from development to production +* End-to-end MLOps solution, with integrations for both traditional ML models and deep learning models * Simple, low-code performance tracking with autologging * State-of-the-art UI for model analysis and comparison @@ -28,7 +28,7 @@ This setup integrates the following technologies and concepts: The system is designed to handle the following flow: -1. Training pipeline: A CNN model trained on [EuroSAT](https://github.com/phelber/EuroSAT) dataset which already exist on a dedicated STAC endpoint. The MLFLOW track the whole process to monitor the life cycle of training. +1. Training pipeline: A CNN model trained on the [EuroSAT](https://github.com/phelber/EuroSAT) dataset, which is already available on a dedicated STAC endpoint. MLFLOW tracks the whole process to monitor the training life cycle. 2. Inference: Run the inference pipeline to perform tile-based classification on Sentinel-2 L1C products.
diff --git a/docs/inference-container.md b/docs/inference-container.md index b718abb..ac292c4 100644 --- a/docs/inference-container.md +++ b/docs/inference-container.md @@ -1,18 +1,22 @@ # Inference container: -This module enables users to create an inference pipeline that take a Sentinel-2 STAC Item from the [Planetary Computer](https://planetarycomputer.microsoft.com/api/stac/v1/collections), and generates a binary mask TIFF image using a pre-trained CNN model. For details on how the model was trained, refer to the [training container documentation](./training-container.md). +This module enables users to create an inference pipeline that takes a Sentinel-2 STAC Item from the [Planetary Computer](https://planetarycomputer.microsoft.com/api/stac/v1/collections), and generates a binary mask TIFF image using a pre-trained CNN model. For details on how the model was trained, refer to the [training container documentation](./training-container.md). -## **Make Inference Module:** +## **`Make Inference` Module:** **Inputs**: -- `input_reference`: The reference to a Sentinel-2 product on [planetary computer](https://planetarycomputer.microsoft.com/api/stac/v1/collections). The application will give you an accurate result if the sentinel-2 product has no/low cloud-cover. + +- `input_reference`: A list of Sentinel-2 product references from [Planetary Computer](https://planetarycomputer.microsoft.com/api/stac/v1/collections). Note: the inference application provides accurate results only when the Sentinel-2 product has low or no cloud cover. High cloud coverage may significantly reduce prediction accuracy. **Outputs**: -- `{STAC_ITEM_ID}_classified.tif`: A binary `.tif` image in `COG` format classifies: +- `{STAC_ITEM_ID}_classified.tif`: A binary `.tif` image in `COG` format containing the full-resolution land cover classification predicted by the model, with each pixel assigned to a land cover class as defined in the table below. 
+- `overview_{STAC_ITEM_ID}_classified.tif`: A binary `.tif` image in `COG` format containing a lower-resolution overview of the classification result, generated to support fast visualisation and efficient browsing across zoom levels. +- `STAC objects`: STAC objects related to the provided masks, including STAC Catalog and STAC Item. +*Land Cover Classes* | Class ID | Class Name | |----------|-----------------------| | 0 | AnnualCrop | @@ -27,28 +31,12 @@ This module enables users to create an inference pipeline that take a Sentinel-2 | 9 | SeaLake | | 10 | No Data | -- `overview_{STAC_ITEM_ID}_classified.tif`: A binary `.tif` image in `COG` format classifies: - -| Class ID | Class Name | -|----------|-----------------------| -| 0 | AnnualCrop | -| 1 | Forest | -| 2 | HerbaceousVegetation | -| 3 | Highway | -| 4 | Industrial | -| 5 | Pasture | -| 6 | PermanentCrop | -| 7 | Residential | -| 8 | River | -| 9 | SeaLake | -| 10 | No Data | - -- `STAC objects`: STAC objects related to the provided masks, including STAC catalog and STAC Item. ## How the Application Works -The application begins by reading a Sentinel-2 STAC Item from the [Planetary Computer](https://planetarycomputer.microsoft.com/api/stac/v1/collections). It then filters and selects 12 specific asset references in the order expected by the machine learning model. These assets correspond to common Sentinel-2 bands, as shown below: +The application begins by reading the input Sentinel-2 STAC Item(s) from the [Planetary Computer](https://planetarycomputer.microsoft.com/api/stac/v1/collections) and then extracts the 12 common Sentinel-2 spectral bands (see the table below), in the order expected by the trained ML model.
+*Sentinel-2 Spectral Bands* | Index | Asset Key | Asset Common Name | |-------|------------|-------------------| | 1 | B01 | Coastal | @@ -64,11 +52,11 @@ The application begins by reading a Sentinel-2 STAC Item from the [Planetary Com | 11 | B11 | SWIR 1 (16) | | 12 | B12 | SWIR 2 (22) | -As a preprocessing step, all selected assets are resampled to a uniform resolution of 10 meters. +As part of the preprocessing, all selected bands are resampled to a consistent spatial resolution of 10 meters. -The pipeline then proceeds with a sliding window approach: it reads and stacks small image chips from the selected bands in the order listed above. These chips are fed into a trained CNN model, which predicts the corresponding class for each chip. +The pipeline then proceeds with a sliding window approach: it reads and stacks small image chips from the resampled bands (in the specified order), forming multi-band input arrays. These image chips are fed to the trained CNN model, which predicts the corresponding land cover (LC) class for each chip. -Finally, the application generates: -- The classification prediction map (as a GeoTIFF mask) +At the end of the process, the application generates: +- The LC classification prediction map (COG mask) - A visual overview image -- An updated STAC item containing metadata and references to the output files \ No newline at end of file +- An updated STAC Catalog and Item containing metadata and references to the output files. \ No newline at end of file diff --git a/docs/inference-cwl.md b/docs/inference-cwl.md index 36db0fb..2cd44de 100644 --- a/docs/inference-cwl.md +++ b/docs/inference-cwl.md @@ -7,15 +7,15 @@ This Application Package provides a CWL document that performs inference by appl To execute the application, users have the option to use either [cwltool](https://github.com/common-workflow-language/cwltool) or [Calrissian](https://github.com/Duke-GCB/calrissian) as the CWL runner.
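The sliding-window step of the inference container (bands resampled to 10 m, stacked in the model's band order, cut into chips, one LC class per chip) can be sketched in a few lines of NumPy. This is a minimal sketch, not the project's implementation. The 64-pixel chip size follows the model input shape given in the training docs. The following are assumptions: non-overlapping windows, channels-first batches, the `predict` callable standing in for the trained CNN, and filling the right/bottom margin that does not fit a whole chip with class 10 (No Data).

```python
import numpy as np


def sliding_window_classify(stack, predict, chip=64, no_data_class=10):
    """Assign one class per non-overlapping chip of a (bands, height, width) stack.

    Assumptions (illustrative, not the project's code):
    - `stack` already holds the bands resampled to 10 m, in the model's band order;
    - `predict` maps a batch shaped (n, bands, chip, chip) to class
      probabilities shaped (n, n_classes);
    - pixels in the right/bottom margin that do not fill a whole chip keep
      `no_data_class` (10, "No Data", in the class table).
    """
    if stack.ndim != 3:
        raise ValueError("expected a (bands, height, width) array")
    _, height, width = stack.shape
    out = np.full((height, width), no_data_class, dtype=np.uint8)
    cols = list(range(0, width - chip + 1, chip))
    if not cols:
        return out  # image narrower than one chip: nothing to classify
    # Process one row of chips at a time to keep each batch small
    for row in range(0, height - chip + 1, chip):
        batch = np.stack([stack[:, row:row + chip, col:col + chip] for col in cols])
        labels = np.argmax(predict(batch), axis=1)
        for col, label in zip(cols, labels):
            out[row:row + chip, col:col + chip] = label
    return out
```

Batching one row of chips at a time is a design choice for the sketch: a full Sentinel-2 tile yields tens of thousands of chips, and stacking them all at once would need several gigabytes of memory.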
## Inputs: -- `input_reference`: A list of reference to a Sentinel-2 product on [planetary computer](https://planetarycomputer.microsoft.com/api/stac/v1/collections). The application will give you an accurate result if the sentinel-2 product has no/low cloud-cover. +- `input_reference`: A list of Sentinel-2 product references from [Planetary Computer](https://planetarycomputer.microsoft.com/api/stac/v1/collections). Note: the inference application provides accurate results only when the Sentinel-2 product has low or no cloud cover. High cloud coverage may significantly reduce prediction accuracy. ## How to Execute the Application Package Before running the application with a CWL runner, make sure to download and use the latest version of the CWL document: ```bash -cd /workspace/machine-learning-process/inference/app-package -VERSION="0.0.4" +cd inference/app-package +VERSION=$(curl -s https://api.github.com/repos/eoap/machine-learning-process/releases/latest | jq -r '.tag_name') curl -L -o "tile-sat-inference.cwl" \ "https://github.com/eoap/machine-learning-process/releases/download/${VERSION}/tile-sat-inference.${VERSION}.cwl" ``` @@ -23,13 +23,13 @@ curl -L -o "tile-sat-inference.cwl" \ ### **Run the Application Package**: There are two methods to execute the application: -- Executing the `tile-sat-inference` app using `cwltool`: +- Executing `tile-sat-inference` using `cwltool`: ```bash cwltool --podman --debug --parallel tile-sat-inference.cwl#tile-sat-inference params.yml ``` -- Executing the `tile-sat-inference` using `calrissian`: +- Executing `tile-sat-inference` using `calrissian`: ```bash @@ -39,16 +39,14 @@ There are two methods to execute the application: > > `kubectl get pods` -## How the CWL document designed: -The CWL file can be triggered using `cwltool` or `calrissian`. The user provides a `params.yml` file that passes all inputs needed by the CWL file to execute the module. 
The CWL file is designed to execute the module based on the structure below: +## How the CWL document is designed: +The CWL file can be triggered using `cwltool` or `calrissian`. The execution requires a `params.yml` file, which supplies all the necessary inputs defined in the CWL specification. The workflow is structured to run the module according to the diagram outlined below: -![Inference Workflow](imgs/inference.png) +![image](imgs/inference.png "Inference Workflow") -> **`[]`** in the image above indicates that the user may pass a list of parameters to the application package. - -The Application Package will generate a list of directories containing intermediate or final output. The number of folders containing a `{STAC_ITEM_ID}_classified.tif` and the corresponding STAC objects, such as STAC Catalog and STAC Item, depends on the number of input Sentinel-2 items. +The Application Package will generate a number of directories containing intermediate and final outputs. Each directory will contain a `{STAC_ITEM_ID}_classified.tif` file, along with the corresponding STAC objects (i.e. the STAC Catalog and STAC Item). The number of directories depends on the number of input Sentinel-2 products provided. ## Troubleshooting -The user might encounter to memory issues during the execution with CWL Runners(especially with the `cwltool`). This can be addressing by reducing the `ramMax`(e.g. `ramMax: 1000`) parameter in the cwl file. \ No newline at end of file +Users might encounter memory-related issues when executing workflows with CWL Runners (especially with `cwltool`). These issues can often be mitigated by reducing the `ramMax` parameter (e.g. `ramMax: 1000`) specified in the CWL file, which can help prevent excessive memory allocation. 
\ No newline at end of file diff --git a/docs/insights.md b/docs/insights.md index 5ce3bae..ab71082 100644 --- a/docs/insights.md +++ b/docs/insights.md @@ -10,34 +10,28 @@ It also includes recommendations for future improvements and practical advice fo ### Modular Workflow Templates -Decision: Separate the CWL execution, training pipeline, and inference pipeline into distinct workflow templates. +* Decision: Separate the CWL execution, training pipeline, and inference pipeline into distinct workflow templates. -Outcome: -* Enhanced reusability for other geospatial pipelines requiring similar preprocessing steps. +* Outcome: Enhanced reusability for other geospatial pipelines requiring similar preprocessing steps. ### STAC Integration -Decision: Leverage the STAC API, Geoparquet, and DuckDB for querying and storing geospatial data. - -Outcome: -* Improved interoperability with other geospatial tools and standards. +* Decision: Leverage the STAC API, GeoParquet, and DuckDB for querying and storing geospatial data. +* Outcome: Improved interoperability with other geospatial tools and standards. ### Tracking the process -Decision: Use MLFLOW exclusively for tracking the process of training workflow and selecting the best model candidate. +* Decision: Use MLFLOW exclusively for tracking the training workflow and selecting the best model candidate. ### Test inference with Sentinel-2 product -Decision: Use Stars tool to stage-in a sentinel-2 product ready to pass to inference module. - +* Decision: Use the Stars tool to stage in a Sentinel-2 product so that it is ready to be passed to the inference module. ## Challenges and Solutions - ### Build Docker Images -Challenge: Initially, we used an [advanced tooling technique](https://github.com/eoap/advanced-tooling) that leveraged **Taskfile** to build a Kaniko-based image and reference the CWL files. The image was then pushed to [ttl.sh](https://ttl.sh/), a temporary image registry.
This will help us to execute the application packages using calrissian. However, this process was slow and hard to debug, often failing due to the large size of the Kaniko images. - -Solution: We now push the Docker images to a dedicated GitHub Container Registry. +* Challenge: Initially, we used an [advanced tooling technique](https://github.com/eoap/advanced-tooling) that leveraged **Taskfile** to build a Kaniko-based image and reference the CWL files. The image was then pushed to [ttl.sh](https://ttl.sh/), a temporary image registry. This made it possible to execute the application packages with `calrissian`. However, this process was slow and hard to debug, often failing due to the large size of the Kaniko images. +* Solution: We now push the Docker images to a dedicated GitHub Container Registry. diff --git a/docs/mlm.md b/docs/mlm.md index 5bb8325..eb92b82 100644 --- a/docs/mlm.md +++ b/docs/mlm.md @@ -1,15 +1,16 @@ -# Describes a trained machine learning model -This Item describe a trained machine learning model using [MLM](https://github.com/stac-extensions/mlm) STAC extension. The STAC Machine Learning Model (MLM) Extension provides a standard set of fields to describe machine learning models trained on overhead imagery and enable running model inference. +# Describes a trained Machine Learning model + +This tutorial describes a trained Machine Learning model using the [MLM](https://github.com/stac-extensions/mlm) STAC extension. The STAC Machine Learning Model (MLM) Extension provides a standard set of fields to describe machine learning models trained on overhead imagery and to enable running model inference. The main objectives of the extension are: -- to enable building model collections that can be searched alongside associated STAC datasets -- record all necessary bands, parameters, modeling artifact locations, and high-level processing steps to deploy an inference service.
+- to enable building model collections that can be searched alongside associated STAC datasets; +- to record all necessary bands, parameters, modeling artifact locations, and high-level processing steps to deploy an inference service. For additional information please follow this [Describe-MLmodel](./Describe-MLmodel.md) notebook. ## For developers: -To run the notebook successfully, you must install the dependencies with hatch: +To run the notebook successfully, you must install the dependencies with `hatch`: ``` hatch shell prod diff --git a/docs/packages.md b/docs/packages.md index 8fcde81..adeda37 100644 --- a/docs/packages.md +++ b/docs/packages.md @@ -7,4 +7,4 @@ This tutorial provides two separate application packages: Each application package has its own Docker image, which has been published to a dedicated GitHub Container Registry. -For more details on how each package works, refer to the documentation for [training](./training-container.md) and [inference](./inference-container.md). +For more details on how each package works, refer to the Reference Guides for [training](./training-container.md) and [inference](./inference-container.md). diff --git a/docs/training-container.md b/docs/training-container.md index bc3cd39..3f8cdba 100644 --- a/docs/training-container.md +++ b/docs/training-container.md @@ -1,5 +1,8 @@ # Training a Machine Learning Model- Container -This tutorial containing a python application for training a deep learning model on EuroSAT dataset for tile-based classification task and employs [MLflow](https://mlflow.org/) for monitoring the ML model development cycle. MLflow is a crucial tool that ensures effective log tracking and preserves key information, including specific code versions, datasets used, and model hyperparameters. By logging this information, the reproducibility of the work drastically increases, enabling users to revisit and replicate past experiments accurately. 
Moreover, quality metrics such as classification accuracy, loss function fluctuations, and inference time are also tracked, enabling easy comparison between different models. The dataset used in this project consists of Sentinel-2 satellite images labeled with corresponding land use and cover categories. It provides a comprehensive representation of various land features. The dataset comprises 27,000 labeled and geo-referenced images, divided into 10 distinct classes. The multi-spectral version of the dataset includes all 13 Sentinel-2 bands, which retains the original value range of the Sentinel-2 bands, enabling access to a more comprehensive set of spectral information. You can find the dataset on a dedicated [STAC endpoint](https://radiantearth.github.io/stac-browser/#/external/ai-extensions-stac.terradue.com/collections/Euro_SAT). + +This tutorial contains a Python application for training a deep learning model on the EuroSAT dataset for a tile-based classification task and employs [MLflow](https://mlflow.org/) for monitoring the ML model development cycle. MLflow is a crucial tool that ensures effective log tracking and preserves key information, including specific code versions, datasets used, and model hyperparameters. Logging this information greatly improves the reproducibility of the work, enabling users to revisit and replicate past experiments accurately. Moreover, quality metrics such as classification accuracy, loss function fluctuations, and inference time are also tracked, enabling easy comparison between different models. + +The [EuroSAT dataset](https://github.com/phelber/EuroSAT) used in this tutorial consists of Sentinel-2 satellite images labeled with corresponding land use and land cover categories. It provides a comprehensive representation of various land features. The dataset comprises 27,000 labeled and geo-referenced images, divided into 10 distinct classes.
The multi-spectral version of the dataset includes all 13 Sentinel-2 bands and retains their original value range, giving access to a more comprehensive set of spectral information. This dataset has been published on a dedicated [STAC endpoint](https://radiantearth.github.io/stac-browser/#/external/ai-extensions-stac.terradue.com/collections/Euro_SAT).
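The data ingestion component queries the EuroSAT items and splits them into training, validation, and test sets. The training CWL also exposes a `SAMPLES_PER_CLASS` input, which suggests that sampling is done per class. The sketch below illustrates per-class sampling followed by a split. It uses plain Python rather than the DuckDB/GeoParquet query the project relies on, so it is an illustration of the idea, not the project's code. The `(item_id, label)` record format, the hypothetical `sample_and_split` helper, the 70/15/15 ratios, and the fixed seed are all assumptions.

```python
import random
from collections import defaultdict


def sample_and_split(records, samples_per_class, ratios=(0.7, 0.15, 0.15), seed=42):
    """Hypothetical helper: sample up to `samples_per_class` items per label,
    then split each class into train/validation/test according to `ratios`.

    `records` is an iterable of (item_id, label) pairs (assumed format).
    Returns three lists of (item_id, label) pairs.
    """
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    by_class = defaultdict(list)
    for item_id, label in records:
        by_class[label].append((item_id, label))
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, val, test = [], [], []
    for label in sorted(by_class):
        items = by_class[label]
        chosen = rng.sample(items, min(samples_per_class, len(items)))
        # round() avoids losing an item to floating-point truncation (e.g. 0.7 * 20)
        n_train = round(len(chosen) * ratios[0])
        n_val = round(len(chosen) * ratios[1])
        train += chosen[:n_train]
        val += chosen[n_train:n_train + n_val]
        test += chosen[n_train + n_val:]
    return train, val, test
```

Splitting within each class keeps the class balance identical across the three subsets, which matters when `samples_per_class` is small.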


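The base model of the training container consists of an input layer, four convolutional blocks (Conv2D, BatchNormalization, 2D MaxPooling, Dropout), a dense layer, and a 10-unit softmax output. It can be sanity-checked by tracing tensor shapes through these layers. The sketch below is a hypothetical shape walk-through, not the project's `prepare_base_model.py`. The following are all assumptions: the filter counts `(32, 64, 128, 256)`, the 128 dense units, 'same'-padded convolutions, 2x2 max-pooling, and a flatten step before the dense layer.

```python
def base_cnn_shapes(input_shape=(13, 64, 64), filters=(32, 64, 128, 256),
                    dense_units=128, n_classes=10, pool=2):
    """Return (layer name, output shape) for the seven layers of the base CNN.

    Assumptions (not taken from the project code): every Conv2D uses 'same'
    padding, so it keeps height and width; every MaxPooling2D uses a
    pool x pool window with stride `pool`; the last conv output is flattened
    before the dense layer. Shapes are channels-first, as in the docs.
    """
    channels, height, width = input_shape
    shapes = [("input", (channels, height, width))]
    for i, n_filters in enumerate(filters, start=1):
        # Conv2D sets the channel count to n_filters; BatchNormalization and
        # Dropout keep the shape; MaxPooling2D floor-divides height and width.
        height, width = height // pool, width // pool
        shapes.append((f"conv_block_{i}", (n_filters, height, width)))
    # Flatten feeds filters[-1] * height * width values into the dense layer
    shapes.append(("dense", (dense_units,)))
    shapes.append(("output_softmax", (n_classes,)))
    return shapes


if __name__ == "__main__":
    for name, shape in base_cnn_shapes():
        print(f"{name:>16}: {shape}")
```

Under these assumptions each block halves the spatial size (64, 32, 16, 8, 4), so the dense layer receives 256 x 4 x 4 = 4096 flattened values. Changing the number of input bands only affects the input layer, because the first Conv2D maps any channel count to its own filter count.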
@@ -7,7 +10,7 @@ This tutorial containing a python application for training a deep learning model ## Inputs -This application supports training the CNN model using either CPU or GPU to accelerate the process. It accepts the following input parameters: +This application supports training the Convolutional Neural Network (CNN) model on either a CPU or, to accelerate the process, a GPU. It accepts the following input parameters: | Parameter | Type | Default Value | Description | |------------------------|----------|----------------|-------------| @@ -29,8 +32,8 @@ This application supports training the CNN model using either CPU or GPU to acce - `mlruns`: Directory containing artifacts, metrics, and metadata for each training run, tracked and organized by MLflow. -## How the application structured internally -The training pipeline for developing the training module encompasses 4 main components including: +## How the training process is structured internally +The training pipeline comprises four main components: * [Data Ingestion](#data-ingestion) @@ -47,9 +50,22 @@ The pipeline for this task is illustrated in the diagram below: This component is designed to fetch data from a dedicated STAC endpoint containing a collection of STAC Items representing EuroSAT image chips. The user can query the collection using **DuckDB** on a **GeoParquet** file and split the resulting data into **training**, **validation**, and **test** datasets. ### Based Model Architecture -In this component, the user will design a CNN based model with 7 layers. The first layer serves as the input layer, accepting an image with a shape of (12, 64, 64) or any other cubic image shapes (e.g. (3,64,64)). This is followed by 4 convolutional layers, each employing a relu activation function, a **BatchNormalization** layer,a **2D MaxPooling** operation, and a **Dropout** layer.
Subsequently, the model includes a Dense layer, and finally, the output layer generates a vector with 10 cells. Notably, the output layer utilizes the **softmax activation function** to produce the probabilities associated with each class. The user will choose a loss function, and an optimizer among available loss functions and optimizers. Eventually, the model is compiled and located under `output/prepare_based_model`. +In this component, the user will design a CNN composed of **seven distinct layers**. The model begins with an **input layer**, which accepts images of shape `(13, 64, 64)` or any other three-dimensional `(channels, height, width)` shape (e.g. `(3, 64, 64)`). This is followed by **four convolutional blocks**, each of which consists of the following sequence: + +* A **Conv2D** layer with ReLU activation +* A **BatchNormalization** layer +* A **2D MaxPooling** layer +* A **Dropout** layer + +Although each block contains multiple operations, they are collectively treated as four convolutional layers in the context of the model architecture. + +After these convolutional blocks, the model includes a **dense (fully connected) layer**, which links the convolutional feature extraction to the classification. Finally, the **output layer** is a Dense layer with 10 units (corresponding to the number of output classes) and uses the **softmax** activation function to produce a probability distribution over the classes. + +The user is required to select an appropriate loss function and optimizer from the available options. Once configured, the model is compiled and saved under `output/prepare_based_model`. + ### Training This component is responsible for training the model for a specified number of epochs, as provided by the user through the application package inputs. + ### Evaluation The user can evaluate the trained model, and **MLflow** will track the process for each run under a designated experiment.
Once the MLflow service is deployed and running on port `5000`, the UI can be accessed at [http://localhost:5000](http://localhost:5000). @@ -57,14 +73,14 @@ MLflow tracks the following: - **Evaluation metrics**, including `Accuracy`, `Precision`, `Recall`, and the loss value - **Trained machine learning model** saved after each run -- **Additional artifacts**, such as: - - Loss curve plot during training - - Confusion matrix +- **Additional artifacts**, such as the loss curve plotted during training and the confusion matrix. ## For developers -1. `src`/ `tile_based_training` / + +The folder structure is as follows: +`src`/ `tile_based_training` / - **components** / - - Containing all components such as data_ingestion.py, prepare_base_model.py, train_model.py , model_evaluation.py, inference.py. + - Containing all components, such as `data_ingestion.py`, `prepare_base_model.py`, `train_model.py`, `model_evaluation.py`, and `inference.py`. - **config** / - Containing all configuration needed for each component. - **utils** / diff --git a/docs/training-cwl.md b/docs/training-cwl.md index f0568e2..6a3ef1c 100644 --- a/docs/training-cwl.md +++ b/docs/training-cwl.md @@ -1,6 +1,6 @@ # Training Module & CWL Runner -This Application Package provides a CWL document containing a top-level workflow with a singleCommandLineToolstep that executes the training pipeline. It also supports **parallel execution**, allowing users to specify multiple sets of hyperparameter or training configurations. This makes it suitable for large-scale experiments and hyperparameter tuning on platforms like a Minikube cluster. +This Application Package provides a CWL document containing a top-level workflow with a single `CommandLineTool` step that executes the training pipeline. It also supports **parallel execution**, allowing users to specify multiple sets of hyperparameters or training configurations.
This makes it suitable for large-scale experiments and hyperparameter tuning on platforms like a Minikube cluster. To execute the training workflow, users can choose between [cwltool](https://github.com/common-workflow-language/cwltool) and [Calrissian](https://github.com/Duke-GCB/calrissian) as their CWL runners. @@ -23,55 +23,49 @@ To execute the training workflow, users can choose between [cwltool](https://git | SAMPLES_PER_CLASS | int | Number of samples to use for training per class. | -## How to execute the application-package? +## How to execute the Application Package? Before running the application with a CWL runner, make sure to download and use the latest version of the CWL document: ``` -cd /workspace/machine-learning-process/training/app-package -VERSION="0.0.4" +cd training/app-package +VERSION=$(curl -s https://api.github.com/repos/eoap/machine-learning-process/releases/latest | jq -r '.tag_name') curl -L -o "tile-sat-training.cwl" \ "https://github.com/eoap/machine-learning-process/releases/download/${VERSION}/tile-sat-training.${VERSION}.cwl" ``` -### **Run the Application package**: +### **Run the Application Package**: There are two methods to execute the application: -- Executing the tile-based-training using cwltool in a terminal: +- Executing `tile-sat-training` using `cwltool` in a terminal: ``` cwltool --podman --debug --parallel tile-sat-training.cwl#tile-sat-training params.yaml ``` -- Executing the tile-based classification using calrissian in a terminal: +- Executing `tile-sat-training` using `calrissian` in a terminal: ``` calrissian --debug --stdout /calrissian/out.json --stderr /calrissian/stderr.log --usage-report /calrissian/report.json --parallel --max-ram 10G --max-cores 2 --tmp-outdir-prefix /calrissian/tmp/ --outdir /calrissian/results/ --tool-logs-basepath /calrissian/logs tile-sat-training.cwl#tile-sat-training params.yaml ``` - > You can monitor the pod creation using command below: + > You can monitor the pod
creation using the `kubectl` command below: > > `kubectl get pods` +## How the CWL document is designed: +The CWL workflow can be executed using either `cwltool` or `calrissian`. The execution requires a `params.yaml` file, which supplies all the necessary inputs defined in the CWL document. The workflow is structured to run the module as shown in the diagram below: - -## How the CWL document designed: -The CWL file can be triggered using `cwltool` or `calrissian`. The user provides a `params.yml` file that passes all inputs needed by the CWL file to execute the module. The CWL file is designed to execute the module based on the structure below: - -![Training Workflow](imgs/training.png) - +![Training Workflow](imgs/training.png "Training Workflow") > **`[]`** in the image above indicates that the user may pass a list of parameters to the application package. ## For developers -The user may train several tile-based classifiers using the CWL runner. One of the tracked artifacts through MLflow is the model's weights. The next step is to retrieve the best model, based on the desired evaluation metric, from the MLflow artifact registry and convert it to the ONNX format. This activity is explained in ["Export the Best Model to ONNX Format"](./extract-model.md). Finally, this model can be integrated into the inference application package. - -> **Note:** This process has already been completed. However, users may need to repeat it with their own candidate models. - +Users can train multiple tile-based classifiers using the CWL runner, with model weights tracked as artifacts in MLflow. Once training is complete, the next step is to retrieve the best-performing model, based on the chosen evaluation metric, from the MLflow artifact registry and convert it to ONNX format. This process is detailed in the ["Export the Best Model to ONNX Format"](./extract-model.md) guide. The resulting ONNX model can then be integrated into the inference application package.
## Troubleshooting -The user might encounter to memory issues during the execution with CWL Runners(especially with the `cwltool`). This can be addressing by reducing the `ramMax`(e.g. `ramMax: 1000`) parameter in the cwl file. \ No newline at end of file +Users might encounter memory-related issues when executing workflows with CWL Runners (especially with `cwltool`). These issues can often be mitigated by reducing the `ramMax` parameter (e.g. `ramMax: 1000`) specified in the CWL file, which can help prevent excessive memory allocation. \ No newline at end of file diff --git a/mkdocs.yaml b/mkdocs.yaml index c37007e..0cb1d1c 100644 --- a/mkdocs.yaml +++ b/mkdocs.yaml @@ -1,7 +1,8 @@ site_name: Machine Learning Process theme: - logo: imgs/icon.png name: material + icon: + logo: fontawesome/solid/earth-europe palette: - media: "(prefers-color-scheme: light)" @@ -30,6 +31,7 @@ markdown_extensions: - abbr - admonition - footnotes + - pymdownx.caret - pymdownx.mark - pymdownx.tilde @@ -47,13 +49,15 @@ markdown_extensions: base_path: './' - pymdownx.highlight: line_spans: __span + - pymdownx.emoji: + emoji_index: !!python/name:material.extensions.emoji.twemoji + emoji_generator: !!python/name:material.extensions.emoji.to_svg extra_css: - styles/css/app.css extra_javascript: - javascripts/config.js - - https://polyfill.io/v3/polyfill.min.js?features=es6 - https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js nav: @@ -72,4 +76,4 @@ nav: - Technical Insights and Learnings: 'insights.md' -copyright: License CC BY-SA 4.0, by Creative Commons \ No newline at end of file +copyright: License CC BY-SA 4.0, by Creative Commons diff --git a/practice-labs/1-Application_Steps/1-training.ipynb b/practice-labs/1-Application_Steps/1-training.ipynb deleted file mode 100644 index 6db1428..0000000 --- a/practice-labs/1-Application_Steps/1-training.ipynb +++ /dev/null @@ -1,728 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Run the 
Training step\n", - "This notebook provides step-by-step instructions on how to install the training module for tile-based classification and execute a training run to evaluate its performance.\n", - "\n", - "> Note: Before proceeding, make sure to select the correct kernel. In the top-right corner of the notebook, choose the Jupyter kernel named `Bash`." - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Setup the environment" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": { - "vscode": { - "languageId": "shellscript" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "XDG_RUNTIME_DIR=/workspace/.local\n", - "RUNTIME=/workspace/machine-learning-process/runs\n", - "/workspace/machine-learning-process/runs\n" - ] - } - ], - "source": [ - "export WORKSPACE=/workspace/machine-learning-process\n", - "export RUNTIME=${WORKSPACE}/runs\n", - "mkdir -p ${RUNTIME}\n", - "cd ${RUNTIME}\n", - "printenv | grep RUNTIME\n", - "pwd" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Create a hatch environment\n", - "\n", - "The hatch environment provides a dedicated Python where the `make-ml-model` step dependencies are installed. This process can be done with hatch." - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": { - "vscode": { - "languageId": "shellscript" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[2K∙●∙ Unpacking distribution (tar|gzip) 0/09.17 MiB/49.49 MiB\u001b[1AA\n", - "\u001b[2K\u001b[32m.. \u001b[0m \u001b[1;35mCreating environment: default\u001b[0m0m\n", - "\u001b[2K\u001b[32m .\u001b[0m \u001b[1;35mInstalling project in development mode\u001b[0mt mode\u001b[0m\n", - "\u001b[1A\u001b[2K\u001b[?25l\u001b[32m. 
\u001b[0m \u001b[1;35mChecking dependencies\u001b[0m\n", - "\u001b[2K\u001b[32m ..\u001b[0m \u001b[1;35mSyncing dependencies\u001b[0mencies\u001b[0m\n", - "\u001b[1A\u001b[2K\n" - ] - } - ], - "source": [ - "cd ${WORKSPACE}/training/make-ml-model\n", - "hatch env prune\n", - "hatch env create default" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Run the make-ml-model application \n", - "\n", - "First dump the help:" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": { - "vscode": { - "languageId": "shellscript" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "2025-05-12 13:42:43.564862: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n", - "2025-05-12 13:42:43.573013: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.\n", - "2025-05-12 13:42:43.621030: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.\n", - "2025-05-12 13:42:43.656655: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n", - "WARNING: All log messages before absl::InitializeLog() is called are written to STDERR\n", - "E0000 00:00:1747057363.673890 739 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n", - "E0000 00:00:1747057363.678972 739 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n", - "W0000 00:00:1747057363.695506 739 
computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.\n", - "W0000 00:00:1747057363.695531 739 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.\n", - "W0000 00:00:1747057363.695534 739 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.\n", - "W0000 00:00:1747057363.695536 739 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.\n", - "2025-05-12 13:42:43.699237: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n", - "To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n", - "[2025-05-12 13:42:46,106: INFO: font_manager: generated new fontManager]\n", - "[2025-05-12 13:42:47,270: INFO: common: created directory at: config]\n", - "\u001b[32m2025-05-12 13:42:47.272\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mtile_based_training.constants\u001b[0m:\u001b[36mwrite_yaml\u001b[0m:\u001b[36m13\u001b[0m - \u001b[1mYAML file: config/config.yaml written successfully\u001b[0m\n", - "Usage: tile-based-training [OPTIONS]\n", - "\n", - " A selected model with highest evaluation metrics will making an inference on\n", - " a sentinel-2 L1C data\n", - "\n", - "Options:\n", - " --stac_reference, --sr TEXT The url which point to STAC input reference\n", - " [default: https://raw.githubusercontent.com/\n", - " eoap/machine-learning-\n", - " process/main/training/app-package/EUROSAT-\n", - " Training-Dataset/catalog.json; required]\n", - " --BATCH_SIZE, --b INTEGER BATCH_SIZE [default: 2]\n", - " --CLASSES, --c INTEGER Number of classes to train 
[default: 10]\n", - " --DECAY, --d FLOAT DECAY - model metadata [default: 0.1]\n", - " --EPOCHS, --ep INTEGER Number of epochs\n", - " --EPSILON, --e FLOAT EPSILON - model metadata [default: 1e-06]\n", - " --LEARNING_RATE, --lr FLOAT LEARNING_RATE [default: 0.0001]\n", - " --LOSS, --lo TEXT loss function [default:\n", - " categorical_crossentropy]\n", - " --MEMENTUM, --m FLOAT MEMENTUM - model metadata [default: 0.95]\n", - " --OPTIMIZER, --o TEXT OPTIMIZER [default: Adam]\n", - " --REGULARIZER, --r TEXT REGULARIZER\n", - " --SAMPLES_PER_CLASS, --s INTEGER\n", - " number of sample for each class to train\n", - " model based on [default: 10]\n", - " --help Show this message and exit.\n" - ] - } - ], - "source": [ - "hatch run default:tile-based-training --help" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In the cell below, the user can check the MLFLOW_TRACKING_URI which defined as environment variable during deployment of the code-server." - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": { - "vscode": { - "languageId": "shellscript" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "http://my-mlflow:5000\n" - ] - } - ], - "source": [ - "echo ${MLFLOW_TRACKING_URI} " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Now, run the `tile-based-training` command line tool with the parameters:\n", - "\n", - "- stac_reference: https://raw.githubusercontent.com/eoap/machine-learning-process/main/training/app-package/EUROSAT-Training-Dataset/catalog.json\n", - "- BATCH_SIZE: 2 \n", - "- CLASSES: 10 \n", - "- DECAY: 0.1 \n", - "- EPOCHS: 50 \n", - "- EPSILON: 0.000001 \n", - "- LEARNING_RATE: 0.0001 \n", - "- LOSS: categorical_crossentropy \n", - "- MEMENTUM: 0.95 \n", - "- OPTIMIZER: Adam \n", - "- REGULARIZER: None \n", - "- SAMPLES_PER_CLASS: 1000" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Make sure your mlflow is running 
" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": { - "vscode": { - "languageId": "shellscript" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "2025-05-12 13:42:51.353632: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n", - "2025-05-12 13:42:51.354419: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.\n", - "2025-05-12 13:42:51.358270: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.\n", - "2025-05-12 13:42:51.369121: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n", - "WARNING: All log messages before absl::InitializeLog() is called are written to STDERR\n", - "E0000 00:00:1747057371.385913 786 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n", - "E0000 00:00:1747057371.391298 786 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n", - "W0000 00:00:1747057371.406897 786 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.\n", - "W0000 00:00:1747057371.406925 786 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.\n", - "W0000 00:00:1747057371.406928 786 computation_placer.cc:177] computation placer already registered. 
Please check linkage and avoid linking the same target more than once.\n", - "W0000 00:00:1747057371.406930 786 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.\n", - "2025-05-12 13:42:51.411314: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n", - "To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n", - "[2025-05-12 13:42:54,142: INFO: common: created directory at: config]\n", - "\u001b[32m2025-05-12 13:42:54.144\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mtile_based_training.constants\u001b[0m:\u001b[36mwrite_yaml\u001b[0m:\u001b[36m13\u001b[0m - \u001b[1mYAML file: config/config.yaml written successfully\u001b[0m\n", - "2025-05-12 13:42:55.531793: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)\n", - "[2025-05-12 13:42:55,531: INFO: main: MLFLOW URI: http://my-mlflow:5000]\n", - "[2025-05-12 13:42:55,532: INFO: main: /workspace/machine-learning-process/training/make-ml-model]\n", - "[2025-05-12 13:42:55,532: INFO: main: \n", - "=================================================================\n", - "Device name is: None \n", - "=================================================================]\n", - "{'BATCH_SIZE': 2,\n", - " 'CLASSES': 10,\n", - " 'DECAY': 0.1,\n", - " 'EPOCHS': 50,\n", - " 'EPSILON': 1e-06,\n", - " 'LEARNING_RATE': 0.0001,\n", - " 'LOSS': 'categorical_crossentropy',\n", - " 'MEMENTUM': 0.95,\n", - " 'OPTIMIZER': 'Adam',\n", - " 'REGULARIZER': 'None',\n", - " 'SAMPLES_PER_CLASS': 10,\n", - " 'stac_reference': 'https://raw.githubusercontent.com/eoap/machine-learning-process/main/training/app-package/EUROSAT-Training-Dataset/catalog.json'}\n", - 
"[2025-05-12 13:42:55,533: INFO: common: YAML file: params.yaml written successfully]\n", - "[2025-05-12 13:42:55,533: INFO: main: \n", - "=================================================================\n", - ">>>>>> stage Data Ingestion stage started <<<<<<]\n", - "[2025-05-12 13:42:55,535: INFO: common: yaml file: config/config.yaml loaded successfully]\n", - "[2025-05-12 13:42:55,538: INFO: common: yaml file: params.yaml loaded successfully]\n", - "[2025-05-12 13:42:55,538: INFO: common: created directory at: output]\n", - "[2025-05-12 13:42:55,538: INFO: common: created directory at: src/tile_based_training/output/data_ingestion]\n", - "DataIngestionConfig(root_dir='src/tile_based_training/output/data_ingestion', stac_reference='https://raw.githubusercontent.com/eoap/machine-learning-process/main/training/app-package/EUROSAT-Training-Dataset/catalog.json', local_data_file='src/tile_based_training/output/data_ingestion', data_classes=BoxList(['AnnualCrop', 'Forest', 'HerbaceousVegetation', 'Highway', 'Industrial', 'Pasture', 'PermanentCrop', 'Residential', 'River', 'SeaLake']), samples_per_class=10)\n", - "[2025-05-12 13:42:55,538: INFO: data_ingestion: Accessing STAC endpoint]\n", - " 0%| | 0/10 [00:00>>>>> stage Data Ingestion stage completed <<<<<<\n", - "=================================================================]\n", - "[2025-05-12 13:48:26,167: INFO: main: \n", - "=================================================================\n", - ">>>>>> stage Prepare Base Model started <<<<<<]\n", - "[2025-05-12 13:48:26,169: INFO: common: yaml file: config/config.yaml loaded successfully]\n", - "[2025-05-12 13:48:26,170: INFO: common: yaml file: params.yaml loaded successfully]\n", - "[2025-05-12 13:48:26,171: INFO: common: created directory at: output]\n", - "[2025-05-12 13:48:26,171: INFO: common: created directory at: src/tile_based_training/output/prepare_base_model]\n", - "\u001b[1mModel: \"sequential\"\u001b[0m\n", - 
"┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓\n", - "┃\u001b[1m \u001b[0m\u001b[1mLayer (type) \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mOutput Shape \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1m Param #\u001b[0m\u001b[1m \u001b[0m┃\n", - "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩\n", - "│ conv2d (\u001b[94mConv2D\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m64\u001b[0m, \u001b[32m64\u001b[0m, \u001b[32m32\u001b[0m) │ \u001b[32m3,488\u001b[0m │\n", - "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", - "│ activation (\u001b[94mActivation\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m64\u001b[0m, \u001b[32m64\u001b[0m, \u001b[32m32\u001b[0m) │ \u001b[32m0\u001b[0m │\n", - "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", - "│ conv2d_1 (\u001b[94mConv2D\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m62\u001b[0m, \u001b[32m62\u001b[0m, \u001b[32m32\u001b[0m) │ \u001b[32m9,248\u001b[0m │\n", - "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", - "│ activation_1 (\u001b[94mActivation\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m62\u001b[0m, \u001b[32m62\u001b[0m, \u001b[32m32\u001b[0m) │ \u001b[32m0\u001b[0m │\n", - "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", - "│ max_pooling2d (\u001b[94mMaxPooling2D\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m31\u001b[0m, \u001b[32m31\u001b[0m, \u001b[32m32\u001b[0m) │ \u001b[32m0\u001b[0m │\n", - "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", - "│ dropout (\u001b[94mDropout\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m31\u001b[0m, \u001b[32m31\u001b[0m, \u001b[32m32\u001b[0m) │ \u001b[32m0\u001b[0m │\n", - "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", - "│ conv2d_2 (\u001b[94mConv2D\u001b[0m) │ (\u001b[96mNone\u001b[0m, 
\u001b[32m31\u001b[0m, \u001b[32m31\u001b[0m, \u001b[32m64\u001b[0m) │ \u001b[32m18,496\u001b[0m │\n", - "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", - "│ activation_2 (\u001b[94mActivation\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m31\u001b[0m, \u001b[32m31\u001b[0m, \u001b[32m64\u001b[0m) │ \u001b[32m0\u001b[0m │\n", - "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", - "│ conv2d_3 (\u001b[94mConv2D\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m29\u001b[0m, \u001b[32m29\u001b[0m, \u001b[32m64\u001b[0m) │ \u001b[32m36,928\u001b[0m │\n", - "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", - "│ activation_3 (\u001b[94mActivation\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m29\u001b[0m, \u001b[32m29\u001b[0m, \u001b[32m64\u001b[0m) │ \u001b[32m0\u001b[0m │\n", - "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", - "│ max_pooling2d_1 (\u001b[94mMaxPooling2D\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m14\u001b[0m, \u001b[32m14\u001b[0m, \u001b[32m64\u001b[0m) │ \u001b[32m0\u001b[0m │\n", - "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", - "│ dropout_1 (\u001b[94mDropout\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m14\u001b[0m, \u001b[32m14\u001b[0m, \u001b[32m64\u001b[0m) │ \u001b[32m0\u001b[0m │\n", - "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", - "│ flatten (\u001b[94mFlatten\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m12544\u001b[0m) │ \u001b[32m0\u001b[0m │\n", - "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", - "│ dense (\u001b[94mDense\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m512\u001b[0m) │ \u001b[32m6,423,040\u001b[0m │\n", - "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", - "│ activation_4 (\u001b[94mActivation\u001b[0m) │ (\u001b[96mNone\u001b[0m, 
\u001b[32m512\u001b[0m) │ \u001b[32m0\u001b[0m │\n", - "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", - "│ dropout_2 (\u001b[94mDropout\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m512\u001b[0m) │ \u001b[32m0\u001b[0m │\n", - "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", - "│ dense_1 (\u001b[94mDense\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m10\u001b[0m) │ \u001b[32m5,130\u001b[0m │\n", - "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", - "│ activation_5 (\u001b[94mActivation\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m10\u001b[0m) │ \u001b[32m0\u001b[0m │\n", - "└─────────────────────────────────┴────────────────────────┴───────────────┘\n", - "\u001b[1m Total params: \u001b[0m\u001b[32m6,496,330\u001b[0m (24.78 MB)\n", - "\u001b[1m Trainable params: \u001b[0m\u001b[32m6,496,330\u001b[0m (24.78 MB)\n", - "\u001b[1m Non-trainable params: \u001b[0m\u001b[32m0\u001b[0m (0.00 B)\n", - "[2025-05-12 13:48:26,502: INFO: main: \n", - "=================================================================\n", - ">>>>>> stage Prepare Base Model completed <<<<<<\n", - "=================================================================]\n", - "[2025-05-12 13:48:26,502: INFO: main: \n", - "=================================================================\n", - ">>>>>> stage Training Model started <<<<<<]\n", - "[2025-05-12 13:48:26,505: INFO: common: yaml file: config/config.yaml loaded successfully]\n", - "[2025-05-12 13:48:26,506: INFO: common: yaml file: params.yaml loaded successfully]\n", - "[2025-05-12 13:48:26,506: INFO: common: created directory at: output]\n", - "[2025-05-12 13:48:26,506: INFO: common: json file loaded succesfully from: src/tile_based_training/output/data_ingestion/splitted_data.json]\n", - "[2025-05-12 13:48:26,507: INFO: common: json file loaded succesfully from: src/tile_based_training/output/data_ingestion/splitted_data.json]\n", - 
"[2025-05-12 13:48:26,507: INFO: common: created directory at: src/tile_based_training/output/training]\n", - "Loading data: 22%|██████▎ | 13/60 [00:16<00:41, 1.14it/s][2025-05-12 13:48:44,112: INFO: common: GDAL signalled an error: err_no=1, msg='In file /io/gdal-3.9.3/port/cpl_vsil_gzip.cpp, at line 1214, decompression failed with z_err = -1, return = 5174']\n", - "[2025-05-12 13:48:44,112: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip:Read error at scanline 4294967295; got 5174 bytes, expected 6656']\n", - "[2025-05-12 13:48:44,112: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip() failed.']\n", - "[2025-05-12 13:48:44,112: INFO: common: GDAL signalled an error: err_no=1, msg='Industrial_997.tif, band 1: IReadBlock failed at X offset 0, Y offset 1: TIFFReadEncodedStrip() failed.']\n", - "[2025-05-12 13:48:54,474: INFO: common: GDAL signalled an error: err_no=1, msg='In file /io/gdal-3.9.3/port/cpl_vsil_gzip.cpp, at line 1214, decompression failed with z_err = -1, return = 5174']\n", - "[2025-05-12 13:48:54,474: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip:Read error at scanline 4294967295; got 5174 bytes, expected 6656']\n", - "[2025-05-12 13:48:54,474: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip() failed.']\n", - "[2025-05-12 13:48:54,474: INFO: common: GDAL signalled an error: err_no=1, msg='Industrial_997.tif, band 1: IReadBlock failed at X offset 0, Y offset 1: TIFFReadEncodedStrip() failed.']\n", - "[2025-05-12 13:49:09,889: INFO: common: GDAL signalled an error: err_no=1, msg='In file /io/gdal-3.9.3/port/cpl_vsil_gzip.cpp, at line 1214, decompression failed with z_err = -1, return = 5174']\n", - "[2025-05-12 13:49:09,889: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip:Read error at scanline 4294967295; got 5174 bytes, expected 6656']\n", - "[2025-05-12 13:49:09,889: INFO: common: GDAL signalled an error: err_no=1, 
msg='TIFFReadEncodedStrip() failed.']\n", - "[2025-05-12 13:49:09,889: INFO: common: GDAL signalled an error: err_no=1, msg='Industrial_997.tif, band 1: IReadBlock failed at X offset 0, Y offset 1: TIFFReadEncodedStrip() failed.']\n", - "Loading data: 52%|██████████████▉ | 31/60 [01:21<00:23, 1.21it/s][2025-05-12 13:49:53,275: INFO: __init__: GDAL signalled an error: err_no=1, msg='cpl_unzOpenCurrentFile() failed']\n", - "[2025-05-12 13:50:07,943: INFO: __init__: GDAL signalled an error: err_no=1, msg='cpl_unzOpenCurrentFile() failed']\n", - "[2025-05-12 13:50:27,959: INFO: __init__: GDAL signalled an error: err_no=1, msg='cpl_unzOpenCurrentFile() failed']\n", - "Loading data: 80%|███████████████████████▏ | 48/60 [02:38<00:08, 1.35it/s][2025-05-12 13:51:11,024: INFO: common: GDAL signalled an error: err_no=1, msg='In file /io/gdal-3.9.3/port/cpl_vsil_gzip.cpp, at line 1214, decompression failed with z_err = -1, return = 1295']\n", - "[2025-05-12 13:51:11,024: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip:Read error at scanline 4294967295; got 1295 bytes, expected 6656']\n", - "[2025-05-12 13:51:11,024: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip() failed.']\n", - "[2025-05-12 13:51:11,024: INFO: common: GDAL signalled an error: err_no=1, msg='AnnualCrop_954.tif, band 1: IReadBlock failed at X offset 0, Y offset 2: TIFFReadEncodedStrip() failed.']\n", - "[2025-05-12 13:51:26,218: INFO: common: GDAL signalled an error: err_no=1, msg='In file /io/gdal-3.9.3/port/cpl_vsil_gzip.cpp, at line 1214, decompression failed with z_err = -1, return = 1295']\n", - "[2025-05-12 13:51:26,218: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip:Read error at scanline 4294967295; got 1295 bytes, expected 6656']\n", - "[2025-05-12 13:51:26,218: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip() failed.']\n", - "[2025-05-12 13:51:26,218: INFO: common: GDAL signalled an error: 
err_no=1, msg='AnnualCrop_954.tif, band 1: IReadBlock failed at X offset 0, Y offset 2: TIFFReadEncodedStrip() failed.']\n", - "[2025-05-12 13:51:46,044: INFO: common: GDAL signalled an error: err_no=1, msg='In file /io/gdal-3.9.3/port/cpl_vsil_gzip.cpp, at line 1214, decompression failed with z_err = -1, return = 1295']\n", - "[2025-05-12 13:51:46,044: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip:Read error at scanline 4294967295; got 1295 bytes, expected 6656']\n", - "[2025-05-12 13:51:46,044: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip() failed.']\n", - "[2025-05-12 13:51:46,044: INFO: common: GDAL signalled an error: err_no=1, msg='AnnualCrop_954.tif, band 1: IReadBlock failed at X offset 0, Y offset 2: TIFFReadEncodedStrip() failed.']\n", - "Loading data: 100%|█████████████████████████████| 60/60 [03:53<00:00, 3.89s/it]\n", - "Loading data: 25%|███████▌ | 5/20 [00:04<00:12, 1.17it/s][2025-05-12 13:52:29,210: INFO: common: GDAL signalled an error: err_no=1, msg='In file /io/gdal-3.9.3/port/cpl_vsil_gzip.cpp, at line 1214, decompression failed with z_err = -1, return = 3442']\n", - "[2025-05-12 13:52:29,210: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip:Read error at scanline 4294967295; got 3442 bytes, expected 6656']\n", - "[2025-05-12 13:52:29,210: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip() failed.']\n", - "[2025-05-12 13:52:29,210: INFO: common: GDAL signalled an error: err_no=1, msg='PermanentCrop_987.tif, band 1: IReadBlock failed at X offset 0, Y offset 4: TIFFReadEncodedStrip() failed.']\n", - "[2025-05-12 13:52:45,021: INFO: common: GDAL signalled an error: err_no=1, msg='In file /io/gdal-3.9.3/port/cpl_vsil_gzip.cpp, at line 1214, decompression failed with z_err = -1, return = 3442']\n", - "[2025-05-12 13:52:45,022: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip:Read error at scanline 4294967295; got 
3442 bytes, expected 6656']\n", - "[2025-05-12 13:52:45,022: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip() failed.']\n", - "[2025-05-12 13:52:45,022: INFO: common: GDAL signalled an error: err_no=1, msg='PermanentCrop_987.tif, band 1: IReadBlock failed at X offset 0, Y offset 4: TIFFReadEncodedStrip() failed.']\n", - "[2025-05-12 13:53:05,162: INFO: common: GDAL signalled an error: err_no=1, msg='In file /io/gdal-3.9.3/port/cpl_vsil_gzip.cpp, at line 1214, decompression failed with z_err = -1, return = 3442']\n", - "[2025-05-12 13:53:05,163: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip:Read error at scanline 4294967295; got 3442 bytes, expected 6656']\n", - "[2025-05-12 13:53:05,163: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip() failed.']\n", - "[2025-05-12 13:53:05,163: INFO: common: GDAL signalled an error: err_no=1, msg='PermanentCrop_987.tif, band 1: IReadBlock failed at X offset 0, Y offset 4: TIFFReadEncodedStrip() failed.']\n", - "Loading data: 100%|█████████████████████████████| 20/20 [01:23<00:00, 4.17s/it]\n", - "[2025-05-12 13:53:43,248: INFO: train_model: Device is: None, Built with CUDA: True]\n", - "Epoch 1/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 238ms/step - accuracy: 0.0083 - loss: 2.3066 - precision: 0.0000e+00 - recall: 0.0000e+00 \n", - "Epoch 1: val_accuracy improved from -inf to 0.10000, saving model to src/tile_based_training/output/training/trained_model.keras\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 557ms/step - accuracy: 0.0111 - loss: 2.3086 - precision: 0.0000e+00 - recall: 0.0000e+00 - val_accuracy: 0.1000 - val_loss: 2.2851 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", - "Epoch 2/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 
307ms/step - accuracy: 0.1292 - loss: 2.2710 - precision: 0.0000e+00 - recall: 0.0000e+00\n", - "Epoch 2: val_accuracy did not improve from 0.10000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 388ms/step - accuracy: 0.1306 - loss: 2.2717 - precision: 0.0000e+00 - recall: 0.0000e+00 - val_accuracy: 0.1000 - val_loss: 2.2733 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", - "Epoch 3/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 302ms/step - accuracy: 0.1292 - loss: 2.2612 - precision: 0.0000e+00 - recall: 0.0000e+00\n", - "Epoch 3: val_accuracy improved from 0.10000 to 0.15000, saving model to src/tile_based_training/output/training/trained_model.keras\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 538ms/step - accuracy: 0.1306 - loss: 2.2575 - precision: 0.0000e+00 - recall: 0.0000e+00 - val_accuracy: 0.1500 - val_loss: 2.2617 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", - "Epoch 4/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 310ms/step - accuracy: 0.0562 - loss: 2.2317 - precision: 0.0000e+00 - recall: 0.0000e+00\n", - "Epoch 4: val_accuracy did not improve from 0.15000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 390ms/step - accuracy: 0.0542 - loss: 2.2335 - precision: 0.0000e+00 - recall: 0.0000e+00 - val_accuracy: 0.1500 - val_loss: 2.2552 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", - "Epoch 5/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 267ms/step - accuracy: 0.1781 - loss: 2.2019 - precision: 0.0000e+00 - recall: 0.0000e+00\n", - "Epoch 5: val_accuracy improved from 0.15000 to 0.25000, saving model to 
src/tile_based_training/output/training/trained_model.keras\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 528ms/step - accuracy: 0.1854 - loss: 2.1973 - precision: 0.0000e+00 - recall: 0.0000e+00 - val_accuracy: 0.2500 - val_loss: 2.2461 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", - "Epoch 6/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 301ms/step - accuracy: 0.1771 - loss: 2.2115 - precision: 0.0000e+00 - recall: 0.0000e+00\n", - "Epoch 6: val_accuracy did not improve from 0.25000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 386ms/step - accuracy: 0.1736 - loss: 2.2073 - precision: 0.0000e+00 - recall: 0.0000e+00 - val_accuracy: 0.2500 - val_loss: 2.2434 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", - "Epoch 7/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 195ms/step - accuracy: 0.1302 - loss: 2.2168 - precision: 0.0000e+00 - recall: 0.0000e+00\n", - "Epoch 7: val_accuracy improved from 0.25000 to 0.30000, saving model to src/tile_based_training/output/training/trained_model.keras\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 444ms/step - accuracy: 0.1424 - loss: 2.2163 - precision: 0.0000e+00 - recall: 0.0000e+00 - val_accuracy: 0.3000 - val_loss: 2.2388 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", - "Epoch 8/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 306ms/step - accuracy: 0.1937 - loss: 2.1773 - precision: 0.0000e+00 - recall: 0.0000e+00\n", - "Epoch 8: val_accuracy did not improve from 0.30000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 392ms/step - accuracy: 0.1958 - loss: 
2.1792 - precision: 0.0000e+00 - recall: 0.0000e+00 - val_accuracy: 0.2000 - val_loss: 2.2393 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", - "Epoch 9/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 200ms/step - accuracy: 0.2573 - loss: 2.1277 - precision: 0.0000e+00 - recall: 0.0000e+00\n", - "Epoch 9: val_accuracy did not improve from 0.30000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 281ms/step - accuracy: 0.2493 - loss: 2.1300 - precision: 0.0000e+00 - recall: 0.0000e+00 - val_accuracy: 0.1500 - val_loss: 2.2347 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", - "Epoch 10/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 293ms/step - accuracy: 0.1854 - loss: 2.1958 - precision: 0.0000e+00 - recall: 0.0000e+00\n", - "Epoch 10: val_accuracy did not improve from 0.30000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 372ms/step - accuracy: 0.1847 - loss: 2.1820 - precision: 0.0000e+00 - recall: 0.0000e+00 - val_accuracy: 0.1500 - val_loss: 2.2257 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", - "Epoch 11/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 276ms/step - accuracy: 0.2167 - loss: 2.1165 - precision: 0.0000e+00 - recall: 0.0000e+00\n", - "Epoch 11: val_accuracy did not improve from 0.30000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 393ms/step - accuracy: 0.2056 - loss: 2.1213 - precision: 0.0000e+00 - recall: 0.0000e+00 - val_accuracy: 0.1500 - val_loss: 2.2093 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", - "Epoch 12/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 252ms/step - 
accuracy: 0.2896 - loss: 2.0376 - precision: 0.0000e+00 - recall: 0.0000e+00\n", - "Epoch 12: val_accuracy did not improve from 0.30000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 336ms/step - accuracy: 0.2819 - loss: 2.0472 - precision: 0.0000e+00 - recall: 0.0000e+00 - val_accuracy: 0.1500 - val_loss: 2.1924 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", - "Epoch 13/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 258ms/step - accuracy: 0.2594 - loss: 2.1109 - precision: 0.0000e+00 - recall: 0.0000e+00\n", - "Epoch 13: val_accuracy did not improve from 0.30000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 352ms/step - accuracy: 0.2729 - loss: 2.0909 - precision: 0.0000e+00 - recall: 0.0000e+00 - val_accuracy: 0.1500 - val_loss: 2.1740 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", - "Epoch 14/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 222ms/step - accuracy: 0.2198 - loss: 2.0708 - precision: 0.0000e+00 - recall: 0.0000e+00\n", - "Epoch 14: val_accuracy did not improve from 0.30000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 305ms/step - accuracy: 0.2410 - loss: 2.0603 - precision: 0.0000e+00 - recall: 0.0000e+00 - val_accuracy: 0.1500 - val_loss: 2.1519 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", - "Epoch 15/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 216ms/step - accuracy: 0.2417 - loss: 1.9981 - precision: 0.0000e+00 - recall: 0.0000e+00\n", - "Epoch 15: val_accuracy did not improve from 0.30000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 296ms/step - accuracy: 0.2389 - loss: 
2.0087 - precision: 0.0000e+00 - recall: 0.0000e+00 - val_accuracy: 0.1500 - val_loss: 2.1229 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", - "Epoch 16/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 312ms/step - accuracy: 0.3063 - loss: 1.9678 - precision: 0.0000e+00 - recall: 0.0000e+00\n", - "Epoch 16: val_accuracy did not improve from 0.30000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 391ms/step - accuracy: 0.3042 - loss: 1.9566 - precision: 0.0000e+00 - recall: 0.0000e+00 - val_accuracy: 0.1500 - val_loss: 2.0963 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", - "Epoch 17/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 251ms/step - accuracy: 0.3469 - loss: 1.9257 - precision: 0.0000e+00 - recall: 0.0000e+00\n", - "Epoch 17: val_accuracy did not improve from 0.30000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 328ms/step - accuracy: 0.3479 - loss: 1.9182 - precision: 0.0000e+00 - recall: 0.0000e+00 - val_accuracy: 0.1500 - val_loss: 2.0673 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", - "Epoch 18/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 236ms/step - accuracy: 0.3875 - loss: 1.9049 - precision: 0.5000 - recall: 0.0083 \n", - "Epoch 18: val_accuracy did not improve from 0.30000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 325ms/step - accuracy: 0.3917 - loss: 1.8983 - precision: 0.6667 - recall: 0.0111 - val_accuracy: 0.2000 - val_loss: 2.0420 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", - "Epoch 19/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 306ms/step - accuracy: 
0.4417 - loss: 1.7988 - precision: 0.2667 - recall: 0.0240\n", - "Epoch 19: val_accuracy did not improve from 0.30000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 388ms/step - accuracy: 0.4222 - loss: 1.8093 - precision: 0.2444 - recall: 0.0215 - val_accuracy: 0.2500 - val_loss: 2.0092 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", - "Epoch 20/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 205ms/step - accuracy: 0.4604 - loss: 1.8450 - precision: 0.5833 - recall: 0.0323\n", - "Epoch 20: val_accuracy did not improve from 0.30000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 295ms/step - accuracy: 0.4681 - loss: 1.8216 - precision: 0.6111 - recall: 0.0326 - val_accuracy: 0.2500 - val_loss: 1.9760 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", - "Epoch 21/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 254ms/step - accuracy: 0.4052 - loss: 1.7750 - precision: 1.0000 - recall: 0.0562\n", - "Epoch 21: val_accuracy did not improve from 0.30000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 337ms/step - accuracy: 0.4257 - loss: 1.7643 - precision: 1.0000 - recall: 0.0542 - val_accuracy: 0.2500 - val_loss: 1.9521 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", - "Epoch 22/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 290ms/step - accuracy: 0.4917 - loss: 1.6915 - precision: 0.2500 - recall: 0.0083 \n", - "Epoch 22: val_accuracy did not improve from 0.30000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 376ms/step - accuracy: 0.4889 - loss: 1.6829 - precision: 0.3333 - recall: 0.0111 - val_accuracy: 0.2500 - 
val_loss: 1.9226 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", - "Epoch 23/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 228ms/step - accuracy: 0.4292 - loss: 1.6548 - precision: 0.6250 - recall: 0.0406\n", - "Epoch 23: val_accuracy did not improve from 0.30000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 307ms/step - accuracy: 0.4472 - loss: 1.6383 - precision: 0.6667 - recall: 0.0437 - val_accuracy: 0.3000 - val_loss: 1.8824 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", - "Epoch 24/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 280ms/step - accuracy: 0.5323 - loss: 1.5850 - precision: 0.9167 - recall: 0.0885\n", - "Epoch 24: val_accuracy did not improve from 0.30000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 361ms/step - accuracy: 0.5326 - loss: 1.5852 - precision: 0.8889 - recall: 0.0868 - val_accuracy: 0.3000 - val_loss: 1.8763 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", - "Epoch 25/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 253ms/step - accuracy: 0.5156 - loss: 1.5344 - precision: 0.6250 - recall: 0.0406\n", - "Epoch 25: val_accuracy did not improve from 0.30000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 333ms/step - accuracy: 0.5104 - loss: 1.5419 - precision: 0.6667 - recall: 0.0437 - val_accuracy: 0.3000 - val_loss: 1.8499 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", - "Epoch 26/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 219ms/step - accuracy: 0.6115 - loss: 1.3640 - precision: 0.9286 - recall: 0.2094\n", - "Epoch 26: val_accuracy improved from 0.30000 to 
0.35000, saving model to src/tile_based_training/output/training/trained_model.keras\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 479ms/step - accuracy: 0.5965 - loss: 1.3968 - precision: 0.9048 - recall: 0.2062 - val_accuracy: 0.3500 - val_loss: 1.7947 - val_precision: 1.0000 - val_recall: 0.1500\n", - "Epoch 27/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 358ms/step - accuracy: 0.6198 - loss: 1.3762 - precision: 0.8056 - recall: 0.2333\n", - "Epoch 27: val_accuracy improved from 0.35000 to 0.40000, saving model to src/tile_based_training/output/training/trained_model.keras\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 580ms/step - accuracy: 0.6076 - loss: 1.3890 - precision: 0.7778 - recall: 0.2278 - val_accuracy: 0.4000 - val_loss: 1.7429 - val_precision: 1.0000 - val_recall: 0.1500\n", - "Epoch 28/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 261ms/step - accuracy: 0.5135 - loss: 1.3772 - precision: 0.7620 - recall: 0.2406\n", - "Epoch 28: val_accuracy did not improve from 0.40000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 347ms/step - accuracy: 0.4868 - loss: 1.4124 - precision: 0.7433 - recall: 0.2271 - val_accuracy: 0.4000 - val_loss: 1.7178 - val_precision: 1.0000 - val_recall: 0.1500\n", - "Epoch 29/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 362ms/step - accuracy: 0.6187 - loss: 1.1856 - precision: 0.8258 - recall: 0.2656\n", - "Epoch 29: val_accuracy did not improve from 0.40000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 456ms/step - accuracy: 0.5958 - loss: 1.2019 - precision: 0.8283 - recall: 0.2604 - 
val_accuracy: 0.4000 - val_loss: 1.7165 - val_precision: 1.0000 - val_recall: 0.1500\n", - "Epoch 30/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 248ms/step - accuracy: 0.5646 - loss: 1.3748 - precision: 0.7926 - recall: 0.3135\n", - "Epoch 30: val_accuracy did not improve from 0.40000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 331ms/step - accuracy: 0.5653 - loss: 1.3648 - precision: 0.7748 - recall: 0.3035 - val_accuracy: 0.4000 - val_loss: 1.7342 - val_precision: 1.0000 - val_recall: 0.1000\n", - "Epoch 31/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 320ms/step - accuracy: 0.4927 - loss: 1.3821 - precision: 0.6761 - recall: 0.2115\n", - "Epoch 31: val_accuracy did not improve from 0.40000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 407ms/step - accuracy: 0.5007 - loss: 1.3584 - precision: 0.6932 - recall: 0.2299 - val_accuracy: 0.4000 - val_loss: 1.6941 - val_precision: 1.0000 - val_recall: 0.1000\n", - "Epoch 32/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 323ms/step - accuracy: 0.5396 - loss: 1.2235 - precision: 0.7324 - recall: 0.2896\n", - "Epoch 32: val_accuracy did not improve from 0.40000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 400ms/step - accuracy: 0.5319 - loss: 1.2543 - precision: 0.7202 - recall: 0.2819 - val_accuracy: 0.4000 - val_loss: 1.6481 - val_precision: 1.0000 - val_recall: 0.2000\n", - "Epoch 33/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 267ms/step - accuracy: 0.6208 - loss: 1.1212 - precision: 0.8533 - recall: 0.3781\n", - "Epoch 33: val_accuracy did not improve from 0.40000\n", 
- "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 348ms/step - accuracy: 0.6194 - loss: 1.1349 - precision: 0.8489 - recall: 0.3688 - val_accuracy: 0.4000 - val_loss: 1.6342 - val_precision: 1.0000 - val_recall: 0.3000\n", - "Epoch 34/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 366ms/step - accuracy: 0.5875 - loss: 1.3199 - precision: 0.7833 - recall: 0.3792\n", - "Epoch 34: val_accuracy did not improve from 0.40000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 449ms/step - accuracy: 0.5750 - loss: 1.3080 - precision: 0.7778 - recall: 0.3806 - val_accuracy: 0.4000 - val_loss: 1.6416 - val_precision: 0.8571 - val_recall: 0.3000\n", - "Epoch 35/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 257ms/step - accuracy: 0.6115 - loss: 1.0886 - precision: 0.7989 - recall: 0.3542\n", - "Epoch 35: val_accuracy did not improve from 0.40000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 340ms/step - accuracy: 0.5965 - loss: 1.1189 - precision: 0.7795 - recall: 0.3472 - val_accuracy: 0.4000 - val_loss: 1.6461 - val_precision: 0.8571 - val_recall: 0.3000\n", - "Epoch 36/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 250ms/step - accuracy: 0.6531 - loss: 1.1970 - precision: 0.7583 - recall: 0.4260\n", - "Epoch 36: val_accuracy did not improve from 0.40000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 329ms/step - accuracy: 0.6521 - loss: 1.1967 - precision: 0.7611 - recall: 0.4118 - val_accuracy: 0.4000 - val_loss: 1.6288 - val_precision: 0.8571 - val_recall: 0.3000\n", - "Epoch 37/50\n", - "\u001b[1m2/2\u001b[0m 
\u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 213ms/step - accuracy: 0.5740 - loss: 1.1781 - precision: 0.7014 - recall: 0.2990\n", - "Epoch 37: val_accuracy improved from 0.40000 to 0.45000, saving model to src/tile_based_training/output/training/trained_model.keras\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 476ms/step - accuracy: 0.5882 - loss: 1.1617 - precision: 0.7210 - recall: 0.3049 - val_accuracy: 0.4500 - val_loss: 1.6309 - val_precision: 0.8333 - val_recall: 0.2500\n", - "Epoch 38/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 242ms/step - accuracy: 0.5729 - loss: 1.2367 - precision: 0.6477 - recall: 0.2667\n", - "Epoch 38: val_accuracy did not improve from 0.45000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 325ms/step - accuracy: 0.5764 - loss: 1.2218 - precision: 0.6585 - recall: 0.2722 - val_accuracy: 0.4500 - val_loss: 1.6164 - val_precision: 0.7143 - val_recall: 0.2500\n", - "Epoch 39/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 234ms/step - accuracy: 0.6052 - loss: 1.0972 - precision: 0.7647 - recall: 0.4198\n", - "Epoch 39: val_accuracy did not improve from 0.45000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 321ms/step - accuracy: 0.6090 - loss: 1.0868 - precision: 0.7647 - recall: 0.4243 - val_accuracy: 0.4500 - val_loss: 1.5781 - val_precision: 0.7143 - val_recall: 0.2500\n", - "Epoch 40/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 275ms/step - accuracy: 0.5802 - loss: 1.1049 - precision: 0.7490 - recall: 0.3865\n", - "Epoch 40: val_accuracy improved from 0.45000 to 0.50000, saving model to 
src/tile_based_training/output/training/trained_model.keras\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 502ms/step - accuracy: 0.5757 - loss: 1.1191 - precision: 0.7438 - recall: 0.3799 - val_accuracy: 0.5000 - val_loss: 1.5713 - val_precision: 0.8750 - val_recall: 0.3500\n", - "Epoch 41/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 320ms/step - accuracy: 0.6292 - loss: 0.9834 - precision: 0.8090 - recall: 0.4760\n", - "Epoch 41: val_accuracy did not improve from 0.50000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 445ms/step - accuracy: 0.6306 - loss: 0.9940 - precision: 0.8155 - recall: 0.4785 - val_accuracy: 0.4500 - val_loss: 1.5856 - val_precision: 0.7500 - val_recall: 0.3000\n", - "Epoch 42/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 271ms/step - accuracy: 0.7552 - loss: 0.9854 - precision: 0.8351 - recall: 0.4583\n", - "Epoch 42: val_accuracy did not improve from 0.50000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 349ms/step - accuracy: 0.7257 - loss: 1.0237 - precision: 0.8171 - recall: 0.4444 - val_accuracy: 0.4000 - val_loss: 1.6037 - val_precision: 0.7778 - val_recall: 0.3500\n", - "Epoch 43/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 262ms/step - accuracy: 0.6052 - loss: 0.9190 - precision: 0.8382 - recall: 0.4604\n", - "Epoch 43: val_accuracy did not improve from 0.50000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 342ms/step - accuracy: 0.6090 - loss: 0.9271 - precision: 0.8431 - recall: 0.4681 - val_accuracy: 0.4000 - val_loss: 1.6242 - val_precision: 0.6000 - val_recall: 0.3000\n", - "Epoch 
44/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 274ms/step - accuracy: 0.7333 - loss: 0.8572 - precision: 0.8038 - recall: 0.5312\n", - "Epoch 44: val_accuracy did not improve from 0.50000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 369ms/step - accuracy: 0.7278 - loss: 0.8875 - precision: 0.7990 - recall: 0.5208 - val_accuracy: 0.4000 - val_loss: 1.5952 - val_precision: 0.6000 - val_recall: 0.3000\n", - "Epoch 45/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 299ms/step - accuracy: 0.6542 - loss: 1.0279 - precision: 0.8237 - recall: 0.5240\n", - "Epoch 45: val_accuracy did not improve from 0.50000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 377ms/step - accuracy: 0.6639 - loss: 1.0210 - precision: 0.8284 - recall: 0.5215 - val_accuracy: 0.4500 - val_loss: 1.5619 - val_precision: 0.7000 - val_recall: 0.3500\n", - "Epoch 46/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 276ms/step - accuracy: 0.7177 - loss: 0.9175 - precision: 0.8376 - recall: 0.5542\n", - "Epoch 46: val_accuracy did not improve from 0.50000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 368ms/step - accuracy: 0.7174 - loss: 0.9214 - precision: 0.8269 - recall: 0.5306 - val_accuracy: 0.4500 - val_loss: 1.5599 - val_precision: 0.7778 - val_recall: 0.3500\n", - "Epoch 47/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 328ms/step - accuracy: 0.7573 - loss: 0.8398 - precision: 0.8816 - recall: 0.5406\n", - "Epoch 47: val_accuracy did not improve from 0.50000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m 
\u001b[1m1s\u001b[0m 413ms/step - accuracy: 0.7493 - loss: 0.8488 - precision: 0.8772 - recall: 0.5437 - val_accuracy: 0.5000 - val_loss: 1.5482 - val_precision: 0.7000 - val_recall: 0.3500\n", - "Epoch 48/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 261ms/step - accuracy: 0.7240 - loss: 0.9242 - precision: 0.8130 - recall: 0.5635\n", - "Epoch 48: val_accuracy did not improve from 0.50000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 343ms/step - accuracy: 0.7049 - loss: 0.9255 - precision: 0.8087 - recall: 0.5535 - val_accuracy: 0.4500 - val_loss: 1.5520 - val_precision: 0.7000 - val_recall: 0.3500\n", - "Epoch 49/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 250ms/step - accuracy: 0.6698 - loss: 0.8783 - precision: 0.8491 - recall: 0.5010\n", - "Epoch 49: val_accuracy did not improve from 0.50000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 333ms/step - accuracy: 0.6743 - loss: 0.8739 - precision: 0.8544 - recall: 0.5118 - val_accuracy: 0.5000 - val_loss: 1.5703 - val_precision: 0.7778 - val_recall: 0.3500\n", - "Epoch 50/50\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 263ms/step - accuracy: 0.6708 - loss: 0.8069 - precision: 0.8604 - recall: 0.5969\n", - "Epoch 50: val_accuracy did not improve from 0.50000\n", - "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 348ms/step - accuracy: 0.6861 - loss: 0.8038 - precision: 0.8593 - recall: 0.5979 - val_accuracy: 0.5000 - val_loss: 1.5987 - val_precision: 0.7778 - val_recall: 0.3500\n", - "[2025-05-12 13:54:18,177: INFO: train_model: Training completed and session cleared.]\n", - "[2025-05-12 13:54:18,181: INFO: main: \n", - 
"=================================================================\n", - ">>>>>> stage Training Model completed <<<<<<\n", - "\n", - "x==========x]\n", - "[2025-05-12 13:54:18,181: INFO: main: \n", - "=================================================================\n", - ">>>>>> stage Evaluating Model started <<<<<<]\n", - "[2025-05-12 13:54:18,183: INFO: common: yaml file: config/config.yaml loaded successfully]\n", - "[2025-05-12 13:54:18,184: INFO: common: yaml file: params.yaml loaded successfully]\n", - "[2025-05-12 13:54:18,184: INFO: common: created directory at: output]\n", - "[2025-05-12 13:54:18,184: INFO: common: json file loaded succesfully from: src/tile_based_training/output/data_ingestion/splitted_data.json]\n", - "[2025-05-12 13:54:18,185: INFO: common: created directory at: mlruns]\n", - "[2025-05-12 13:54:27,512: INFO: common: GDAL signalled an error: err_no=1, msg='In file /io/gdal-3.9.3/port/cpl_vsil_gzip.cpp, at line 1214, decompression failed with z_err = -1, return = 719']\n", - "[2025-05-12 13:54:27,512: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip:Read error at scanline 4294967295; got 995 bytes, expected 6656']\n", - "[2025-05-12 13:54:27,513: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip() failed.']\n", - "[2025-05-12 13:54:27,513: INFO: common: GDAL signalled an error: err_no=1, msg='River_984.tif, band 1: IReadBlock failed at X offset 0, Y offset 0: TIFFReadEncodedStrip() failed.']\n", - "[2025-05-12 13:54:56,590: INFO: __init__: GDAL signalled an error: err_no=1, msg='cpl_unzOpenCurrentFile() failed']\n", - "[2025-05-12 13:55:11,682: INFO: __init__: GDAL signalled an error: err_no=1, msg='cpl_unzOpenCurrentFile() failed']\n", - "[2025-05-12 13:55:35,463: INFO: __init__: GDAL signalled an error: err_no=1, msg='cpl_unzOpenCurrentFile() failed']\n", - "\u001b[1m1/1\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 97ms/step - accuracy: 
0.3500 - loss: 1.5942 - precision: 0.6667 - recall: 0.2000\n", - "\u001b[1m1/1\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 84ms/step\n", - "\u001b[1m1/1\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 37ms/step\n", - "{'new_experiment': {'test_loss': 1.5941531658172607, 'test_accuracy': 0.3499999940395355, 'test_precision': 0.6666666865348816, 'test_recall': 0.20000000298023224}}\n", - "[2025-05-12 13:55:47,114: INFO: model_evaluation: MLFLOW_TRACKING_URI: http://my-mlflow:5000]\n", - "2025/05/12 13:55:48 INFO mlflow.tracking.fluent: Experiment with name 'EuroSAT_classification' does not exist. Creating a new experiment.\n", - "Successfully registered model 'CNN'.\n", - "2025/05/12 13:55:57 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: CNN, version 1\n", - "Created version '1' of model 'CNN'.\n", - "🏃 View run sassy-deer-933 at: http://my-mlflow:5000/#/experiments/1/runs/f373d14fb3e146aba4ca5bffcfaee313\n", - "🧪 View experiment at: http://my-mlflow:5000/#/experiments/1\n", - "[2025-05-12 13:55:57,450: INFO: main: \n", - "=================================================================\n", - ">>>>>> stage Evaluating Model completed <<<<<<\n", - "\n", - "x==========x]\n" - ] - } - ], - "source": [ - "hatch run default:tile-based-training \\\n", - " --stac_reference https://raw.githubusercontent.com/eoap/machine-learning-process/main/training/app-package/EUROSAT-Training-Dataset/catalog.json \\\n", - " --BATCH_SIZE 2 \\\n", - " --CLASSES 10 \\\n", - " --DECAY 0.1 \\\n", - " --EPOCHS 5 \\\n", - " --EPSILON 0.000001 \\\n", - " --LEARNING_RATE 0.0001 \\\n", - " --LOSS categorical_crossentropy \\\n", - " --MEMENTUM 0.95 \\\n", - " --OPTIMIZER Adam \\\n", - " --REGULARIZER None \\\n", - " --SAMPLES_PER_CLASS 10\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "List the outputs:" - ] 
- }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "vscode": { - "languageId": "shellscript" - } - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "/workspace/machine-learning-process/training/make-ml-model/src/tile_based_training/output\n", - "├── data_ingestion\n", - "│   └── splitted_data.json\n", - "├── prepare_base_model\n", - "│   └── base_model.keras\n", - "└── training\n", - " └── trained_model.keras\n", - "\n", - "4 directories, 3 files\n" - ] - } - ], - "source": [ - "tree ${WORKSPACE}/training/make-ml-model/src/tile_based_training/output " - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Clean-up " - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "vscode": { - "languageId": "shellscript" - } - }, - "outputs": [], - "source": [ - "rm -fr ${RUNTIME}/envs ${WORKSPACE}/training/make-ml-model/src/tile_based_training/output " - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Bash", - "language": "bash", - "name": "bash" - }, - "language_info": { - "codemirror_mode": "shell", - "file_extension": ".sh", - "mimetype": "text/x-sh", - "name": "bash" - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} diff --git a/practice-labs/Alternative1-Application_Steps/Step1A-training.ipynb b/practice-labs/Alternative1-Application_Steps/Step1A-training.ipynb new file mode 100644 index 0000000..c587866 --- /dev/null +++ b/practice-labs/Alternative1-Application_Steps/Step1A-training.ipynb @@ -0,0 +1,566 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Run the Training step\n", + "This notebook provides step-by-step instructions on how to install the training module for tile-based classification and execute a training run to evaluate its performance.\n", + "\n", + "> Note: Before proceeding, make sure to select the correct kernel. 
In the top-right corner of the notebook, choose the Jupyter kernel named `Bash`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Set up the environment" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "vscode": { + "languageId": "shellscript" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "XDG_RUNTIME_DIR=/workspace/.local\n", + "RUNTIME=/workspace/machine-learning-process/runs\n", + "/workspace/machine-learning-process/runs\n" + ] + } + ], + "source": [ + "export WORKSPACE=/workspace/machine-learning-process\n", + "export RUNTIME=${WORKSPACE}/runs\n", + "mkdir -p ${RUNTIME}\n", + "cd ${RUNTIME}\n", + "printenv | grep RUNTIME\n", + "pwd" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create a hatch environment\n", + "\n", + "Hatch creates a dedicated Python environment in which the dependencies of the `make-ml-model` step are installed." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "vscode": { + "languageId": "shellscript" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[2K∙∙∙ Waiting on shared resource \n", + "\u001b[2K\u001b[32m.. \u001b[0m \u001b[1;35mCreating environment: default\u001b[0m0m\n", + "\u001b[2K\u001b[32m .\u001b[0m \u001b[1;35mInstalling project in development mode\u001b[0mt mode\u001b[0m\n", + "\u001b[1A\u001b[2K\u001b[?25l\u001b[32m.
\u001b[0m \u001b[1;35mChecking dependencies\u001b[0m\n", + "\u001b[2K\u001b[32m \u001b[0m \u001b[1;35mSyncing dependencies\u001b[0mencies\u001b[0m\n", + "\u001b[1A\u001b[2K\n" + ] + } + ], + "source": [ + "cd ${WORKSPACE}/training/make-ml-model\n", + "hatch env prune\n", + "hatch env create default" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Run the make-ml-model application \n", + "\n", + "First dump the help:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "vscode": { + "languageId": "shellscript" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2025-05-12 14:46:46.361114: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n", + "2025-05-12 14:46:46.369642: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.\n", + "2025-05-12 14:46:46.423756: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.\n", + "2025-05-12 14:46:46.469253: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n", + "WARNING: All log messages before absl::InitializeLog() is called are written to STDERR\n", + "E0000 00:00:1747061206.519975 1783 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n", + "E0000 00:00:1747061206.534276 1783 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n", + "W0000 00:00:1747061206.628545 1783 
computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.\n", + "W0000 00:00:1747061206.628572 1783 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.\n", + "W0000 00:00:1747061206.628576 1783 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.\n", + "W0000 00:00:1747061206.628579 1783 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.\n", + "2025-05-12 14:46:46.638581: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n", + "To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n", + "[2025-05-12 14:46:49,189: INFO: font_manager: generated new fontManager]\n", + "[2025-05-12 14:46:50,346: INFO: common: created directory at: config]\n", + "\u001b[32m2025-05-12 14:46:50.348\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mtile_based_training.constants\u001b[0m:\u001b[36mwrite_yaml\u001b[0m:\u001b[36m13\u001b[0m - \u001b[1mYAML file: config/config.yaml written successfully\u001b[0m\n", + "Usage: tile-based-training [OPTIONS]\n", + "\n", + " A selected model with highest evaluation metrics will making an inference on\n", + " a sentinel-2 L1C data\n", + "\n", + "Options:\n", + " --stac_reference, --sr TEXT The url which point to STAC input reference\n", + " [default: https://raw.githubusercontent.com/\n", + " eoap/machine-learning-\n", + " process/main/training/app-package/EUROSAT-\n", + " Training-Dataset/catalog.json; required]\n", + " --BATCH_SIZE, --b INTEGER BATCH_SIZE [default: 2]\n", + " --CLASSES, --c INTEGER Number of classes to train 
[default: 10]\n", + " --DECAY, --d FLOAT DECAY - model metadata [default: 0.1]\n", + " --EPOCHS, --ep INTEGER Number of epochs\n", + " --EPSILON, --e FLOAT EPSILON - model metadata [default: 1e-06]\n", + " --LEARNING_RATE, --lr FLOAT LEARNING_RATE [default: 0.0001]\n", + " --LOSS, --lo TEXT loss function [default:\n", + " categorical_crossentropy]\n", + " --MEMENTUM, --m FLOAT MEMENTUM - model metadata [default: 0.95]\n", + " --OPTIMIZER, --o TEXT OPTIMIZER [default: Adam]\n", + " --REGULARIZER, --r TEXT REGULARIZER\n", + " --SAMPLES_PER_CLASS, --s INTEGER\n", + " number of sample for each class to train\n", + " model based on [default: 10]\n", + " --help Show this message and exit.\n" + ] + } + ], + "source": [ + "hatch run default:tile-based-training --help" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In the cell below, the user can check the `MLFLOW_TRACKING_URI` environment variable, which is defined when the code-server is deployed." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "vscode": { + "languageId": "shellscript" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "http://my-mlflow:5000\n" + ] + } + ], + "source": [ + "echo ${MLFLOW_TRACKING_URI} " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now, run the `tile-based-training` command-line tool with the following parameters:\n", + "\n", + "- stac_reference: https://raw.githubusercontent.com/eoap/machine-learning-process/main/training/app-package/EUROSAT-Training-Dataset/catalog.json\n", + "- BATCH_SIZE: 2 \n", + "- CLASSES: 10 \n", + "- DECAY: 0.1 \n", + "- EPOCHS: 5 \n", + "- EPSILON: 0.000001 \n", + "- LEARNING_RATE: 0.0001 \n", + "- LOSS: categorical_crossentropy \n", + "- MEMENTUM: 0.95 \n", + "- OPTIMIZER: Adam \n", + "- REGULARIZER: None \n", + "- SAMPLES_PER_CLASS: 10" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Make sure your MLflow server is running.
" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "vscode": { + "languageId": "shellscript" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2025-05-12 14:46:54.435264: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n", + "2025-05-12 14:46:54.436021: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.\n", + "2025-05-12 14:46:54.439686: I external/local_xla/xla/tsl/cuda/cudart_stub.cc:32] Could not find cuda drivers on your machine, GPU will not be used.\n", + "2025-05-12 14:46:54.450275: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:467] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered\n", + "WARNING: All log messages before absl::InitializeLog() is called are written to STDERR\n", + "E0000 00:00:1747061214.468311 1845 cuda_dnn.cc:8579] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered\n", + "E0000 00:00:1747061214.473466 1845 cuda_blas.cc:1407] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered\n", + "W0000 00:00:1747061214.487559 1845 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.\n", + "W0000 00:00:1747061214.487589 1845 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.\n", + "W0000 00:00:1747061214.487591 1845 computation_placer.cc:177] computation placer already registered. 
Please check linkage and avoid linking the same target more than once.\n", + "W0000 00:00:1747061214.487595 1845 computation_placer.cc:177] computation placer already registered. Please check linkage and avoid linking the same target more than once.\n", + "2025-05-12 14:46:54.492010: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n", + "To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n", + "[2025-05-12 14:46:57,187: INFO: common: created directory at: config]\n", + "\u001b[32m2025-05-12 14:46:57.189\u001b[0m | \u001b[1mINFO \u001b[0m | \u001b[36mtile_based_training.constants\u001b[0m:\u001b[36mwrite_yaml\u001b[0m:\u001b[36m13\u001b[0m - \u001b[1mYAML file: config/config.yaml written successfully\u001b[0m\n", + "2025-05-12 14:46:58.464878: E external/local_xla/xla/stream_executor/cuda/cuda_platform.cc:51] failed call to cuInit: INTERNAL: CUDA error: Failed call to cuInit: UNKNOWN ERROR (303)\n", + "[2025-05-12 14:46:58,464: INFO: main: MLFLOW URI: http://my-mlflow:5000]\n", + "[2025-05-12 14:46:58,465: INFO: main: /workspace/machine-learning-process/training/make-ml-model]\n", + "[2025-05-12 14:46:58,465: INFO: main: \n", + "=================================================================\n", + "Device name is: None \n", + "=================================================================]\n", + "{'BATCH_SIZE': 2,\n", + " 'CLASSES': 10,\n", + " 'DECAY': 0.1,\n", + " 'EPOCHS': 5,\n", + " 'EPSILON': 1e-06,\n", + " 'LEARNING_RATE': 0.0001,\n", + " 'LOSS': 'categorical_crossentropy',\n", + " 'MEMENTUM': 0.95,\n", + " 'OPTIMIZER': 'Adam',\n", + " 'REGULARIZER': 'None',\n", + " 'SAMPLES_PER_CLASS': 10,\n", + " 'stac_reference': 'https://raw.githubusercontent.com/eoap/machine-learning-process/main/training/app-package/EUROSAT-Training-Dataset/catalog.json'}\n", + 
"[2025-05-12 14:46:58,466: INFO: common: YAML file: params.yaml written successfully]\n", + "[2025-05-12 14:46:58,466: INFO: main: \n", + "=================================================================\n", + ">>>>>> stage Data Ingestion stage started <<<<<<]\n", + "[2025-05-12 14:46:58,468: INFO: common: yaml file: config/config.yaml loaded successfully]\n", + "[2025-05-12 14:46:58,470: INFO: common: yaml file: params.yaml loaded successfully]\n", + "[2025-05-12 14:46:58,470: INFO: common: created directory at: output]\n", + "[2025-05-12 14:46:58,470: INFO: common: created directory at: src/tile_based_training/output/data_ingestion]\n", + "DataIngestionConfig(root_dir='src/tile_based_training/output/data_ingestion', stac_reference='https://raw.githubusercontent.com/eoap/machine-learning-process/main/training/app-package/EUROSAT-Training-Dataset/catalog.json', local_data_file='src/tile_based_training/output/data_ingestion', data_classes=BoxList(['AnnualCrop', 'Forest', 'HerbaceousVegetation', 'Highway', 'Industrial', 'Pasture', 'PermanentCrop', 'Residential', 'River', 'SeaLake']), samples_per_class=10)\n", + "[2025-05-12 14:46:58,470: INFO: data_ingestion: Accessing STAC endpoint]\n", + " 0%| | 0/10 [00:00>>>>> stage Data Ingestion stage completed <<<<<<\n", + "=================================================================]\n", + "[2025-05-12 14:52:21,404: INFO: main: \n", + "=================================================================\n", + ">>>>>> stage Prepare Base Model started <<<<<<]\n", + "[2025-05-12 14:52:21,406: INFO: common: yaml file: config/config.yaml loaded successfully]\n", + "[2025-05-12 14:52:21,407: INFO: common: yaml file: params.yaml loaded successfully]\n", + "[2025-05-12 14:52:21,407: INFO: common: created directory at: output]\n", + "[2025-05-12 14:52:21,407: INFO: common: created directory at: src/tile_based_training/output/prepare_base_model]\n", + "\u001b[1mModel: \"sequential\"\u001b[0m\n", + 
"┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓\n", + "┃\u001b[1m \u001b[0m\u001b[1mLayer (type) \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1mOutput Shape \u001b[0m\u001b[1m \u001b[0m┃\u001b[1m \u001b[0m\u001b[1m Param #\u001b[0m\u001b[1m \u001b[0m┃\n", + "┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩\n", + "│ conv2d (\u001b[94mConv2D\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m64\u001b[0m, \u001b[32m64\u001b[0m, \u001b[32m32\u001b[0m) │ \u001b[32m3,488\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ activation (\u001b[94mActivation\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m64\u001b[0m, \u001b[32m64\u001b[0m, \u001b[32m32\u001b[0m) │ \u001b[32m0\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ conv2d_1 (\u001b[94mConv2D\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m62\u001b[0m, \u001b[32m62\u001b[0m, \u001b[32m32\u001b[0m) │ \u001b[32m9,248\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ activation_1 (\u001b[94mActivation\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m62\u001b[0m, \u001b[32m62\u001b[0m, \u001b[32m32\u001b[0m) │ \u001b[32m0\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ max_pooling2d (\u001b[94mMaxPooling2D\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m31\u001b[0m, \u001b[32m31\u001b[0m, \u001b[32m32\u001b[0m) │ \u001b[32m0\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ dropout (\u001b[94mDropout\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m31\u001b[0m, \u001b[32m31\u001b[0m, \u001b[32m32\u001b[0m) │ \u001b[32m0\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ conv2d_2 (\u001b[94mConv2D\u001b[0m) │ (\u001b[96mNone\u001b[0m, 
\u001b[32m31\u001b[0m, \u001b[32m31\u001b[0m, \u001b[32m64\u001b[0m) │ \u001b[32m18,496\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ activation_2 (\u001b[94mActivation\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m31\u001b[0m, \u001b[32m31\u001b[0m, \u001b[32m64\u001b[0m) │ \u001b[32m0\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ conv2d_3 (\u001b[94mConv2D\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m29\u001b[0m, \u001b[32m29\u001b[0m, \u001b[32m64\u001b[0m) │ \u001b[32m36,928\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ activation_3 (\u001b[94mActivation\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m29\u001b[0m, \u001b[32m29\u001b[0m, \u001b[32m64\u001b[0m) │ \u001b[32m0\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ max_pooling2d_1 (\u001b[94mMaxPooling2D\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m14\u001b[0m, \u001b[32m14\u001b[0m, \u001b[32m64\u001b[0m) │ \u001b[32m0\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ dropout_1 (\u001b[94mDropout\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m14\u001b[0m, \u001b[32m14\u001b[0m, \u001b[32m64\u001b[0m) │ \u001b[32m0\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ flatten (\u001b[94mFlatten\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m12544\u001b[0m) │ \u001b[32m0\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ dense (\u001b[94mDense\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m512\u001b[0m) │ \u001b[32m6,423,040\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ activation_4 (\u001b[94mActivation\u001b[0m) │ (\u001b[96mNone\u001b[0m, 
\u001b[32m512\u001b[0m) │ \u001b[32m0\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ dropout_2 (\u001b[94mDropout\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m512\u001b[0m) │ \u001b[32m0\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ dense_1 (\u001b[94mDense\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m10\u001b[0m) │ \u001b[32m5,130\u001b[0m │\n", + "├─────────────────────────────────┼────────────────────────┼───────────────┤\n", + "│ activation_5 (\u001b[94mActivation\u001b[0m) │ (\u001b[96mNone\u001b[0m, \u001b[32m10\u001b[0m) │ \u001b[32m0\u001b[0m │\n", + "└─────────────────────────────────┴────────────────────────┴───────────────┘\n", + "\u001b[1m Total params: \u001b[0m\u001b[32m6,496,330\u001b[0m (24.78 MB)\n", + "\u001b[1m Trainable params: \u001b[0m\u001b[32m6,496,330\u001b[0m (24.78 MB)\n", + "\u001b[1m Non-trainable params: \u001b[0m\u001b[32m0\u001b[0m (0.00 B)\n", + "[2025-05-12 14:52:21,668: INFO: main: \n", + "=================================================================\n", + ">>>>>> stage Prepare Base Model completed <<<<<<\n", + "=================================================================]\n", + "[2025-05-12 14:52:21,668: INFO: main: \n", + "=================================================================\n", + ">>>>>> stage Training Model started <<<<<<]\n", + "[2025-05-12 14:52:21,670: INFO: common: yaml file: config/config.yaml loaded successfully]\n", + "[2025-05-12 14:52:21,671: INFO: common: yaml file: params.yaml loaded successfully]\n", + "[2025-05-12 14:52:21,671: INFO: common: created directory at: output]\n", + "[2025-05-12 14:52:21,672: INFO: common: json file loaded succesfully from: src/tile_based_training/output/data_ingestion/splitted_data.json]\n", + "[2025-05-12 14:52:21,672: INFO: common: json file loaded succesfully from: src/tile_based_training/output/data_ingestion/splitted_data.json]\n", + 
"[2025-05-12 14:52:21,672: INFO: common: created directory at: src/tile_based_training/output/training]\n", + "Loading data: 23%|██████▊ | 14/60 [00:19<00:53, 1.17s/it][2025-05-12 14:52:42,550: INFO: common: GDAL signalled an error: err_no=1, msg='In file /io/gdal-3.9.3/port/cpl_vsil_gzip.cpp, at line 1214, decompression failed with z_err = -1, return = 1192']\n", + "[2025-05-12 14:52:42,550: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip:Read error at scanline 4294967295; got 1192 bytes, expected 6656']\n", + "[2025-05-12 14:52:42,550: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip() failed.']\n", + "[2025-05-12 14:52:42,550: INFO: common: GDAL signalled an error: err_no=1, msg='Industrial_968.tif, band 1: IReadBlock failed at X offset 0, Y offset 2: TIFFReadEncodedStrip() failed.']\n", + "[2025-05-12 14:52:53,013: INFO: common: GDAL signalled an error: err_no=1, msg='In file /io/gdal-3.9.3/port/cpl_vsil_gzip.cpp, at line 1214, decompression failed with z_err = -1, return = 1192']\n", + "[2025-05-12 14:52:53,013: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip:Read error at scanline 4294967295; got 1192 bytes, expected 6656']\n", + "[2025-05-12 14:52:53,013: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip() failed.']\n", + "[2025-05-12 14:52:53,013: INFO: common: GDAL signalled an error: err_no=1, msg='Industrial_968.tif, band 1: IReadBlock failed at X offset 0, Y offset 2: TIFFReadEncodedStrip() failed.']\n", + "[2025-05-12 14:53:08,425: INFO: common: GDAL signalled an error: err_no=1, msg='In file /io/gdal-3.9.3/port/cpl_vsil_gzip.cpp, at line 1214, decompression failed with z_err = -1, return = 1192']\n", + "[2025-05-12 14:53:08,425: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip:Read error at scanline 4294967295; got 1192 bytes, expected 6656']\n", + "[2025-05-12 14:53:08,426: INFO: common: GDAL signalled an error: err_no=1, 
msg='TIFFReadEncodedStrip() failed.']\n", + "[2025-05-12 14:53:08,426: INFO: common: GDAL signalled an error: err_no=1, msg='Industrial_968.tif, band 1: IReadBlock failed at X offset 0, Y offset 2: TIFFReadEncodedStrip() failed.']\n", + "Loading data: 53%|███████████████▍ | 32/60 [01:26<00:24, 1.14it/s][2025-05-12 14:53:53,682: INFO: __init__: GDAL signalled an error: err_no=1, msg='cpl_unzOpenCurrentFile() failed']\n", + "[2025-05-12 14:54:08,916: INFO: __init__: GDAL signalled an error: err_no=1, msg='cpl_unzOpenCurrentFile() failed']\n", + "[2025-05-12 14:54:29,227: INFO: __init__: GDAL signalled an error: err_no=1, msg='cpl_unzOpenCurrentFile() failed']\n", + "Loading data: 82%|███████████████████████▋ | 49/60 [02:44<00:07, 1.43it/s][2025-05-12 14:55:10,315: INFO: __init__: GDAL signalled an error: err_no=1, msg='cpl_unzOpenCurrentFile() failed']\n", + "[2025-05-12 14:55:24,315: INFO: __init__: GDAL signalled an error: err_no=1, msg='cpl_unzOpenCurrentFile() failed']\n", + "[2025-05-12 14:55:43,054: INFO: __init__: GDAL signalled an error: err_no=1, msg='cpl_unzOpenCurrentFile() failed']\n", + "Loading data: 100%|█████████████████████████████| 60/60 [03:53<00:00, 3.90s/it]\n", + "Loading data: 25%|███████▌ | 5/20 [00:03<00:09, 1.54it/s][2025-05-12 14:56:19,799: INFO: common: GDAL signalled an error: err_no=1, msg='In file /io/gdal-3.9.3/port/cpl_vsil_gzip.cpp, at line 1214, decompression failed with z_err = -1, return = 3442']\n", + "[2025-05-12 14:56:19,800: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip:Read error at scanline 4294967295; got 3442 bytes, expected 6656']\n", + "[2025-05-12 14:56:19,800: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip() failed.']\n", + "[2025-05-12 14:56:19,800: INFO: common: GDAL signalled an error: err_no=1, msg='PermanentCrop_987.tif, band 1: IReadBlock failed at X offset 0, Y offset 4: TIFFReadEncodedStrip() failed.']\n", + "[2025-05-12 14:56:30,088: INFO: common: GDAL 
signalled an error: err_no=1, msg='In file /io/gdal-3.9.3/port/cpl_vsil_gzip.cpp, at line 1214, decompression failed with z_err = -1, return = 3442']\n", + "[2025-05-12 14:56:30,089: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip:Read error at scanline 4294967295; got 3442 bytes, expected 6656']\n", + "[2025-05-12 14:56:30,089: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip() failed.']\n", + "[2025-05-12 14:56:30,089: INFO: common: GDAL signalled an error: err_no=1, msg='PermanentCrop_987.tif, band 1: IReadBlock failed at X offset 0, Y offset 4: TIFFReadEncodedStrip() failed.']\n", + "[2025-05-12 14:56:45,373: INFO: common: GDAL signalled an error: err_no=1, msg='In file /io/gdal-3.9.3/port/cpl_vsil_gzip.cpp, at line 1214, decompression failed with z_err = -1, return = 3442']\n", + "[2025-05-12 14:56:45,373: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip:Read error at scanline 4294967295; got 3442 bytes, expected 6656']\n", + "[2025-05-12 14:56:45,373: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip() failed.']\n", + "[2025-05-12 14:56:45,374: INFO: common: GDAL signalled an error: err_no=1, msg='PermanentCrop_987.tif, band 1: IReadBlock failed at X offset 0, Y offset 4: TIFFReadEncodedStrip() failed.']\n", + "Loading data: 100%|█████████████████████████████| 20/20 [01:03<00:00, 3.19s/it]\n", + "[2025-05-12 14:57:19,675: INFO: train_model: Device is: None, Built with CUDA: True]\n", + "Epoch 1/5\n", + "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 221ms/step - accuracy: 0.1760 - loss: 2.2842 - precision: 0.0000e+00 - recall: 0.0000e+00\n", + "Epoch 1: val_accuracy improved from -inf to 0.00000, saving model to src/tile_based_training/output/training/trained_model.keras\n", + "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 582ms/step - accuracy: 
0.1618 - loss: 2.2858 - precision: 0.0000e+00 - recall: 0.0000e+00 - val_accuracy: 0.0000e+00 - val_loss: 2.3006 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", + "Epoch 2/5\n", + "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 222ms/step - accuracy: 0.1052 - loss: 2.3094 - precision: 0.0000e+00 - recall: 0.0000e+00\n", + "Epoch 2: val_accuracy did not improve from 0.00000\n", + "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 310ms/step - accuracy: 0.1090 - loss: 2.3063 - precision: 0.0000e+00 - recall: 0.0000e+00 - val_accuracy: 0.0000e+00 - val_loss: 2.2952 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", + "Epoch 3/5\n", + "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 234ms/step - accuracy: 0.1448 - loss: 2.2620 - precision: 0.0000e+00 - recall: 0.0000e+00\n", + "Epoch 3: val_accuracy did not improve from 0.00000\n", + "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 311ms/step - accuracy: 0.1410 - loss: 2.2615 - precision: 0.0000e+00 - recall: 0.0000e+00 - val_accuracy: 0.0000e+00 - val_loss: 2.2892 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", + "Epoch 4/5\n", + "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 263ms/step - accuracy: 0.0969 - loss: 2.2145 - precision: 0.0000e+00 - recall: 0.0000e+00\n", + "Epoch 4: val_accuracy improved from 0.00000 to 0.15000, saving model to src/tile_based_training/output/training/trained_model.keras\n", + "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 520ms/step - accuracy: 0.0979 - loss: 2.2173 - precision: 0.0000e+00 - recall: 0.0000e+00 - val_accuracy: 0.1500 - val_loss: 2.2823 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", + "Epoch 5/5\n", + 
"\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 328ms/step - accuracy: 0.1688 - loss: 2.2573 - precision: 0.0000e+00 - recall: 0.0000e+00\n", + "Epoch 5: val_accuracy did not improve from 0.15000\n", + "\u001b[1m2/2\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m1s\u001b[0m 406ms/step - accuracy: 0.1625 - loss: 2.2429 - precision: 0.0000e+00 - recall: 0.0000e+00 - val_accuracy: 0.1500 - val_loss: 2.2791 - val_precision: 0.0000e+00 - val_recall: 0.0000e+00\n", + "[2025-05-12 14:57:23,829: INFO: train_model: Training completed and session cleared.]\n", + "[2025-05-12 14:57:23,829: INFO: main: \n", + "=================================================================\n", + ">>>>>> stage Training Model completed <<<<<<\n", + "\n", + "x==========x]\n", + "[2025-05-12 14:57:23,830: INFO: main: \n", + "=================================================================\n", + ">>>>>> stage Evaluating Model started <<<<<<]\n", + "[2025-05-12 14:57:23,832: INFO: common: yaml file: config/config.yaml loaded successfully]\n", + "[2025-05-12 14:57:23,833: INFO: common: yaml file: params.yaml loaded successfully]\n", + "[2025-05-12 14:57:23,833: INFO: common: created directory at: output]\n", + "[2025-05-12 14:57:23,833: INFO: common: json file loaded succesfully from: src/tile_based_training/output/data_ingestion/splitted_data.json]\n", + "[2025-05-12 14:57:23,834: INFO: common: created directory at: mlruns]\n", + "[2025-05-12 14:57:25,831: INFO: common: GDAL signalled an error: err_no=1, msg='In file /io/gdal-3.9.3/port/cpl_vsil_gzip.cpp, at line 1214, decompression failed with z_err = -1, return = 719']\n", + "[2025-05-12 14:57:25,832: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip:Read error at scanline 4294967295; got 995 bytes, expected 6656']\n", + "[2025-05-12 14:57:25,832: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip() failed.']\n", 
+ "[2025-05-12 14:57:25,832: INFO: common: GDAL signalled an error: err_no=1, msg='Highway_998.tif, band 1: IReadBlock failed at X offset 0, Y offset 0: TIFFReadEncodedStrip() failed.']\n", + "[2025-05-12 14:57:36,102: INFO: common: GDAL signalled an error: err_no=1, msg='In file /io/gdal-3.9.3/port/cpl_vsil_gzip.cpp, at line 1214, decompression failed with z_err = -1, return = 719']\n", + "[2025-05-12 14:57:36,103: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip:Read error at scanline 4294967295; got 995 bytes, expected 6656']\n", + "[2025-05-12 14:57:36,103: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip() failed.']\n", + "[2025-05-12 14:57:36,103: INFO: common: GDAL signalled an error: err_no=1, msg='Highway_998.tif, band 1: IReadBlock failed at X offset 0, Y offset 0: TIFFReadEncodedStrip() failed.']\n", + "[2025-05-12 14:57:51,373: INFO: common: GDAL signalled an error: err_no=1, msg='In file /io/gdal-3.9.3/port/cpl_vsil_gzip.cpp, at line 1214, decompression failed with z_err = -1, return = 719']\n", + "[2025-05-12 14:57:51,374: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip:Read error at scanline 4294967295; got 995 bytes, expected 6656']\n", + "[2025-05-12 14:57:51,374: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip() failed.']\n", + "[2025-05-12 14:57:51,374: INFO: common: GDAL signalled an error: err_no=1, msg='Highway_998.tif, band 1: IReadBlock failed at X offset 0, Y offset 0: TIFFReadEncodedStrip() failed.']\n", + "[2025-05-12 14:58:30,463: INFO: common: GDAL signalled an error: err_no=1, msg='In file /io/gdal-3.9.3/port/cpl_vsil_gzip.cpp, at line 1214, decompression failed with z_err = -1, return = 3128']\n", + "[2025-05-12 14:58:30,464: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip:Read error at scanline 4294967295; got 3128 bytes, expected 6656']\n", + "[2025-05-12 14:58:30,464: INFO: common: GDAL signalled an 
error: err_no=1, msg='TIFFReadEncodedStrip() failed.']\n", + "[2025-05-12 14:58:30,464: INFO: common: GDAL signalled an error: err_no=1, msg='River_975.tif, band 1: IReadBlock failed at X offset 0, Y offset 1: TIFFReadEncodedStrip() failed.']\n", + "[2025-05-12 14:58:44,241: INFO: common: GDAL signalled an error: err_no=1, msg='In file /io/gdal-3.9.3/port/cpl_vsil_gzip.cpp, at line 1214, decompression failed with z_err = -1, return = 3128']\n", + "[2025-05-12 14:58:44,241: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip:Read error at scanline 4294967295; got 3128 bytes, expected 6656']\n", + "[2025-05-12 14:58:44,241: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip() failed.']\n", + "[2025-05-12 14:58:44,241: INFO: common: GDAL signalled an error: err_no=1, msg='River_975.tif, band 1: IReadBlock failed at X offset 0, Y offset 1: TIFFReadEncodedStrip() failed.']\n", + "[2025-05-12 14:59:03,618: INFO: common: GDAL signalled an error: err_no=1, msg='In file /io/gdal-3.9.3/port/cpl_vsil_gzip.cpp, at line 1214, decompression failed with z_err = -1, return = 3128']\n", + "[2025-05-12 14:59:03,618: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip:Read error at scanline 4294967295; got 3128 bytes, expected 6656']\n", + "[2025-05-12 14:59:03,618: INFO: common: GDAL signalled an error: err_no=1, msg='TIFFReadEncodedStrip() failed.']\n", + "[2025-05-12 14:59:03,618: INFO: common: GDAL signalled an error: err_no=1, msg='River_975.tif, band 1: IReadBlock failed at X offset 0, Y offset 1: TIFFReadEncodedStrip() failed.']\n", + "\u001b[1m1/1\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 95ms/step - accuracy: 0.1500 - loss: 2.3326 - precision: 0.0000e+00 - recall: 0.0000e+00\n", + "\u001b[1m1/1\u001b[0m \u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 64ms/step\n", + "\u001b[1m1/1\u001b[0m 
\u001b[32m━━━━━━━━━━━━━━━━━━━━\u001b[0m\u001b[37m\u001b[0m \u001b[1m0s\u001b[0m 32ms/step\n", + "{'new_experiment': {'test_loss': 2.332581043243408, 'test_accuracy': 0.15000000596046448, 'test_precision': 0.0, 'test_recall': 0.0}}\n", + "[2025-05-12 14:59:29,407: INFO: model_evaluation: MLFLOW_TRACKING_URI: http://my-mlflow:5000]\n", + "2025/05/12 14:59:30 INFO mlflow.tracking.fluent: Experiment with name 'EuroSAT_classification' does not exist. Creating a new experiment.\n", + "Successfully registered model 'CNN'.\n", + "2025/05/12 14:59:39 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: CNN, version 1\n", + "Created version '1' of model 'CNN'.\n", + "🏃 View run thundering-roo-978 at: http://my-mlflow:5000/#/experiments/1/runs/75324bdb9d7c46c0a4537bf2ca9013e8\n", + "🧪 View experiment at: http://my-mlflow:5000/#/experiments/1\n", + "[2025-05-12 14:59:39,120: INFO: main: \n", + "=================================================================\n", + ">>>>>> stage Evaluating Model completed <<<<<<\n", + "\n", + "x==========x]\n" + ] + } + ], + "source": [ + "hatch run default:tile-based-training \\\n", + " --stac_reference https://raw.githubusercontent.com/eoap/machine-learning-process/main/training/app-package/EUROSAT-Training-Dataset/catalog.json \\\n", + " --BATCH_SIZE 2 \\\n", + " --CLASSES 10 \\\n", + " --DECAY 0.1 \\\n", + " --EPOCHS 5 \\\n", + " --EPSILON 0.000001 \\\n", + " --LEARNING_RATE 0.0001 \\\n", + " --LOSS categorical_crossentropy \\\n", + " --MEMENTUM 0.95 \\\n", + " --OPTIMIZER Adam \\\n", + " --REGULARIZER None \\\n", + " --SAMPLES_PER_CLASS 10\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "List the outputs:\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "vscode": { + "languageId": "shellscript" + } + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + 
"/workspace/machine-learning-process/training/make-ml-model/src/tile_based_training/output\n", + "├── data_ingestion\n", + "│   └── splitted_data.json\n", + "├── prepare_base_model\n", + "│   └── base_model.keras\n", + "└── training\n", + " └── trained_model.keras\n", + "\n", + "4 directories, 3 files\n" + ] + } + ], + "source": [ + "tree ${WORKSPACE}/training/make-ml-model/src/tile_based_training/output " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The user may train several tile-based classifiers using the `tile-based-training` module. One of the tracked artifacts through MLflow is the model's weights. The next step is to retrieve the best model, based on the desired evaluation metric, from the MLflow artifact registry and convert it to the ONNX format. This activity is explained in [\"Export the Best Model to ONNX Format\"](./Step1B-ExtractModel.ipynb). Finally, this model can be integrated into the inference application package.\n", + "\n", + "> **Note:** This process has already been completed. 
However, users may need to repeat it with their own candidate models.\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Clean-up " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "vscode": { + "languageId": "shellscript" + } + }, + "outputs": [], + "source": [ + "rm -fr ${RUNTIME}/envs ${WORKSPACE}/training/make-ml-model/src/tile_based_training/output " + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Bash", + "language": "bash", + "name": "bash" + }, + "language_info": { + "codemirror_mode": "shell", + "file_extension": ".sh", + "mimetype": "text/x-sh", + "name": "bash" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/practice-labs/Alternative1-Application_Steps/Step1B-ExtractModel.ipynb b/practice-labs/Alternative1-Application_Steps/Step1B-ExtractModel.ipynb new file mode 100644 index 0000000..f7be326 --- /dev/null +++ b/practice-labs/Alternative1-Application_Steps/Step1B-ExtractModel.ipynb @@ -0,0 +1,144 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Export the Best Model to ONNX Format\n", + "\n", + "This notebook provides a step-by-step tutorial for exporting a selected model from the MLflow model registry to ONNX format. The converted model is saved within the inference Python module to support the development of a new Python application and the creation of an inference Docker image, which is then published to the designated container registry. \n", + "\n", + "> **Note**: This process has already been completed. However, users may need to repeat it with their own candidate models." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Install dependencies" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%pip install tf2onnx onnxmltools onnxruntime onnx mlflow tensorflow" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Import dependencies" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "import os\n", + "import mlflow\n", + "import tensorflow as tf\n", + "import tf2onnx\n", + "import keras\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Save Model in ONNX Format\n", + "\n", + "In the cells below, the user downloads the best model artifact from the MLflow tracking server and then saves it in the ONNX format.\n", + "\n", + "> **Note:** If no run exceeds `desired_test_accuracy`, you may need to lower it to find matching runs on the MLflow tracking server.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [], + "source": [ + "params = {\n", + " \"MLFLOW_TRACKING_URI\": \"http://localhost:5000/\",\n", + " \"experiment_id\": \"EuroSAT_classification\",\n", + "}\n", + "mlflow.set_tracking_uri(params[\"MLFLOW_TRACKING_URI\"])\n", + "desired_test_accuracy = 0.85" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Search for best run\n", + "active_runs = (\n", + " mlflow.search_runs(\n", + " experiment_names=[params[\"experiment_id\"]],\n", + " filter_string=f\"metrics.test_accuracy > {desired_test_accuracy}\",\n", + " search_all_experiments=True,\n", + " )\n", + " .sort_values(by=[\"metrics.test_accuracy\", \"metrics.test_precision\"], ascending=False)\n", + " .reset_index()\n", + " .loc[0]\n", + ")\n", + "run_id = active_runs[\"run_id\"]\n", + "print(f\"Selected run_id: {run_id}\")\n", + "\n", + "# Download just the .keras file\n", + "model_uri = 
f\"runs:/{run_id}/model/model.keras/data/model.keras\"\n", + "keras_path = mlflow.artifacts.download_artifacts(artifact_uri=model_uri)\n", + "print(f\"Downloaded Keras file path: {keras_path}\")\n", + "\n", + "# Load the Keras v3 model\n", + "keras_model = keras.models.load_model(keras_path)\n", + "\n", + "# Define input signature\n", + "input_signature = [tf.TensorSpec([None, 64, 64, 12], tf.float32, name=\"input\")]\n", + "\n", + "@tf.function(input_signature=input_signature)\n", + "def model_func(x):\n", + " return keras_model(x)\n", + "\n", + "# Convert to ONNX\n", + "onnx_model, _ = tf2onnx.convert.from_function(\n", + " model_func,\n", + " input_signature=input_signature,\n", + " opset=13,\n", + " output_path=\"model.onnx\"\n", + ")\n", + "\n", + "print(\"✅ Successfully saved model.onnx\")\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "base", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.14" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/practice-labs/1-Application_Steps/2-inference.ipynb b/practice-labs/Alternative1-Application_Steps/Step2-inference.ipynb similarity index 100% rename from practice-labs/1-Application_Steps/2-inference.ipynb rename to practice-labs/Alternative1-Application_Steps/Step2-inference.ipynb diff --git a/practice-labs/2-Containers/1-training.ipynb b/practice-labs/Alternative2-Containers/Step1A-training.ipynb similarity index 99% rename from practice-labs/2-Containers/1-training.ipynb rename to practice-labs/Alternative2-Containers/Step1A-training.ipynb index 03fa753..9038b70 100644 --- a/practice-labs/2-Containers/1-training.ipynb +++ b/practice-labs/Alternative2-Containers/Step1A-training.ipynb @@ -1108,6 +1108,15 @@ "tree ${RUNTIME}" ] }, + { + 
"cell_type": "markdown", + "metadata": {}, + "source": [ + "The user may train several tile-based classifiers using the `tile-based-training` module. One of the artifacts tracked through MLflow is the trained model. The next step is to retrieve the best model, based on the desired evaluation metric, from the MLflow artifact store and convert it to the ONNX format. This activity is explained in [\"Export the Best Model to ONNX Format\"](./Step1B-ExtractModel.ipynb). Finally, this model can be integrated into the inference application package.\n", + "\n", + "> **Note:** This process has already been completed. However, users may need to repeat it with their own candidate models.\n" + ] + }, { "cell_type": "markdown", "metadata": {}, diff --git a/practice-labs/Alternative2-Containers/Step1B-ExtractModel.ipynb b/practice-labs/Alternative2-Containers/Step1B-ExtractModel.ipynb new file mode 100644 index 0000000..f7be326 --- /dev/null +++ b/practice-labs/Alternative2-Containers/Step1B-ExtractModel.ipynb @@ -0,0 +1,144 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Export the Best Model to ONNX Format\n", + "\n", + "This notebook provides a step-by-step tutorial for exporting a selected model from the MLflow tracking server to ONNX format. The converted model is saved within the inference Python module to support the development of a new Python application and the creation of an inference Docker image, which is then published to the designated container registry.\n", + "\n", + "> **Note:** This process has already been completed. However, users may need to repeat it with their own candidate models."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Install dependencies" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%pip install tf2onnx onnxmltools onnxruntime onnx mlflow tensorflow" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Import dependencies" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "import os\n", + "import mlflow\n", + "import tensorflow as tf\n", + "import tf2onnx\n", + "import keras\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Save Model in ONNX Format\n", + "\n", + "In the cells below, the user downloads the best model artifact from the MLflow tracking server and then saves it in the ONNX format.\n", + "\n", + "> **Note:** If no run exceeds `desired_test_accuracy`, you may need to lower it to find matching runs on the MLflow tracking server.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [], + "source": [ + "params = {\n", + " \"MLFLOW_TRACKING_URI\": \"http://localhost:5000/\",\n", + " \"experiment_id\": \"EuroSAT_classification\",\n", + "}\n", + "mlflow.set_tracking_uri(params[\"MLFLOW_TRACKING_URI\"])\n", + "desired_test_accuracy = 0.85" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Search for best run\n", + "active_runs = (\n", + " mlflow.search_runs(\n", + " experiment_names=[params[\"experiment_id\"]],\n", + " filter_string=f\"metrics.test_accuracy > {desired_test_accuracy}\",\n", + " search_all_experiments=True,\n", + " )\n", + " .sort_values(by=[\"metrics.test_accuracy\", \"metrics.test_precision\"], ascending=False)\n", + " .reset_index()\n", + " .loc[0]\n", + ")\n", + "run_id = active_runs[\"run_id\"]\n", + "print(f\"Selected run_id: {run_id}\")\n", + "\n", + "# Download just the .keras file\n", + "model_uri = 
f\"runs:/{run_id}/model/model.keras/data/model.keras\"\n", + "keras_path = mlflow.artifacts.download_artifacts(artifact_uri=model_uri)\n", + "print(f\"Downloaded Keras file path: {keras_path}\")\n", + "\n", + "# Load the Keras v3 model\n", + "keras_model = keras.models.load_model(keras_path)\n", + "\n", + "# Define input signature\n", + "input_signature = [tf.TensorSpec([None, 64, 64, 12], tf.float32, name=\"input\")]\n", + "\n", + "@tf.function(input_signature=input_signature)\n", + "def model_func(x):\n", + " return keras_model(x)\n", + "\n", + "# Convert to ONNX\n", + "onnx_model, _ = tf2onnx.convert.from_function(\n", + " model_func,\n", + " input_signature=input_signature,\n", + " opset=13,\n", + " output_path=\"model.onnx\"\n", + ")\n", + "\n", + "print(\"✅ Successfully saved model.onnx\")\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "base", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.14" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/practice-labs/2-Containers/2-inference.ipynb b/practice-labs/Alternative2-Containers/Step2-inference.ipynb similarity index 100% rename from practice-labs/2-Containers/2-inference.ipynb rename to practice-labs/Alternative2-Containers/Step2-inference.ipynb diff --git a/practice-labs/3-CWL-Workflows/1-training.ipynb b/practice-labs/Alternative3-CWL-Workflows/Step1A-training.ipynb similarity index 91% rename from practice-labs/3-CWL-Workflows/1-training.ipynb rename to practice-labs/Alternative3-CWL-Workflows/Step1A-training.ipynb index 852e30f..b3ba82f 100644 --- a/practice-labs/3-CWL-Workflows/1-training.ipynb +++ b/practice-labs/Alternative3-CWL-Workflows/Step1A-training.ipynb @@ -56,7 +56,7 @@ }, { "cell_type": "code", - "execution_count": 
2, + "execution_count": null, "metadata": { "vscode": { "languageId": "shellscript" @@ -148,7 +148,7 @@ } ], "source": [ - "yq '.[\"$graph\"][0].inputs' ${WORKSPACE}/training/app-package/tile-sat-training.cwl" + "yq -e '.[\"$graph\"][0].inputs' ${WORKSPACE}/training/app-package/tile-sat-training.cwl" ] }, { @@ -160,7 +160,7 @@ }, { "cell_type": "code", - "execution_count": 3, + "execution_count": null, "metadata": { "vscode": { "languageId": "shellscript" @@ -176,7 +176,7 @@ } ], "source": [ - "yq '.[\"$graph\"][] | select(.class == \"CommandLineTool\") | .hints.DockerRequirement.dockerPull' ${WORKSPACE}/training/app-package/tile-sat-training.cwl" + "yq -e '.[\"$graph\"][] | select(.class == \"CommandLineTool\") | .hints.DockerRequirement.dockerPull' ${WORKSPACE}/training/app-package/tile-sat-training.cwl" ] }, { @@ -188,7 +188,7 @@ }, { "cell_type": "code", - "execution_count": 4, + "execution_count": null, "metadata": { "vscode": { "languageId": "shellscript" @@ -213,7 +213,7 @@ "curl -L -o ${WORKSPACE}/training/app-package/tile-sat-training.cwl \\\n", " \"https://github.com/eoap/machine-learning-process/releases/download/${VERSION}/tile-sat-training.${VERSION}.cwl\"\n", "\n", - "echo \"Updated DockerPull: \" && yq '.[\"$graph\"][] | select(.class == \"CommandLineTool\") | .hints.DockerRequirement.dockerPull' ${WORKSPACE}/training/app-package/tile-sat-training.cwl\n" + "echo \"Updated DockerPull: \" && yq -e '.[\"$graph\"][] | select(.class == \"CommandLineTool\") | .hints.DockerRequirement.dockerPull' ${WORKSPACE}/training/app-package/tile-sat-training.cwl\n" ] }, { @@ -298,6 +298,15 @@ "tree ${WORKSPACE}/runs" ] }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The user may train several tile-based classifiers using the `tile-based-training` module. One of the artifacts tracked through MLflow is the trained model.
The next step is to retrieve the best model, based on the desired evaluation metric, from the MLflow artifact store and convert it to the ONNX format. This activity is explained in [\"Export the Best Model to ONNX Format\"](./Step1B-ExtractModel.ipynb). Finally, this model can be integrated into the inference application package.\n", + "\n", + "> **Note:** This process has already been completed. However, users may need to repeat it with their own candidate models.\n" + ] + }, { "cell_type": "markdown", "metadata": {}, diff --git a/practice-labs/Alternative3-CWL-Workflows/Step1B-ExtractModel.ipynb b/practice-labs/Alternative3-CWL-Workflows/Step1B-ExtractModel.ipynb new file mode 100644 index 0000000..f7be326 --- /dev/null +++ b/practice-labs/Alternative3-CWL-Workflows/Step1B-ExtractModel.ipynb @@ -0,0 +1,144 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Export the Best Model to ONNX Format\n", + "\n", + "This notebook provides a step-by-step tutorial for exporting a selected model from the MLflow tracking server to ONNX format. The converted model is saved within the inference Python module to support the development of a new Python application and the creation of an inference Docker image, which is then published to the designated container registry.\n", + "\n", + "> **Note:** This process has already been completed. However, users may need to repeat it with their own candidate models."
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Install dependencies" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "%pip install tf2onnx onnxmltools onnxruntime onnx mlflow tensorflow" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Import dependencies" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "import json\n", + "import os\n", + "import mlflow\n", + "import tensorflow as tf\n", + "import tf2onnx\n", + "import keras\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Save Model in ONNX Format\n", + "\n", + "In the cells below, the user downloads the best model artifact from the MLflow tracking server and then saves it in the ONNX format.\n", + "\n", + "> **Note:** If no run exceeds `desired_test_accuracy`, you may need to lower it to find matching runs on the MLflow tracking server.\n" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [], + "source": [ + "params = {\n", + " \"MLFLOW_TRACKING_URI\": \"http://localhost:5000/\",\n", + " \"experiment_id\": \"EuroSAT_classification\",\n", + "}\n", + "mlflow.set_tracking_uri(params[\"MLFLOW_TRACKING_URI\"])\n", + "desired_test_accuracy = 0.85" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# Search for best run\n", + "active_runs = (\n", + " mlflow.search_runs(\n", + " experiment_names=[params[\"experiment_id\"]],\n", + " filter_string=f\"metrics.test_accuracy > {desired_test_accuracy}\",\n", + " search_all_experiments=True,\n", + " )\n", + " .sort_values(by=[\"metrics.test_accuracy\", \"metrics.test_precision\"], ascending=False)\n", + " .reset_index()\n", + " .loc[0]\n", + ")\n", + "run_id = active_runs[\"run_id\"]\n", + "print(f\"Selected run_id: {run_id}\")\n", + "\n", + "# Download just the .keras file\n", + "model_uri = 
f\"runs:/{run_id}/model/model.keras/data/model.keras\"\n", + "keras_path = mlflow.artifacts.download_artifacts(artifact_uri=model_uri)\n", + "print(f\"Downloaded Keras file path: {keras_path}\")\n", + "\n", + "# Load the Keras v3 model\n", + "keras_model = keras.models.load_model(keras_path)\n", + "\n", + "# Define input signature\n", + "input_signature = [tf.TensorSpec([None, 64, 64, 12], tf.float32, name=\"input\")]\n", + "\n", + "@tf.function(input_signature=input_signature)\n", + "def model_func(x):\n", + " return keras_model(x)\n", + "\n", + "# Convert to ONNX\n", + "onnx_model, _ = tf2onnx.convert.from_function(\n", + " model_func,\n", + " input_signature=input_signature,\n", + " opset=13,\n", + " output_path=\"model.onnx\"\n", + ")\n", + "\n", + "print(\"✅ Successfully saved model.onnx\")\n" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "base", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.14" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/practice-labs/3-CWL-Workflows/2-inference.ipynb b/practice-labs/Alternative3-CWL-Workflows/Step2-inference.ipynb similarity index 100% rename from practice-labs/3-CWL-Workflows/2-inference.ipynb rename to practice-labs/Alternative3-CWL-Workflows/Step2-inference.ipynb diff --git a/practice-labs/3-CWL-Workflows/params_inference.yaml b/practice-labs/Alternative3-CWL-Workflows/params_inference.yaml similarity index 100% rename from practice-labs/3-CWL-Workflows/params_inference.yaml rename to practice-labs/Alternative3-CWL-Workflows/params_inference.yaml diff --git a/practice-labs/3-CWL-Workflows/params_training.yaml b/practice-labs/Alternative3-CWL-Workflows/params_training.yaml similarity index 100% rename from 
practice-labs/3-CWL-Workflows/params_training.yaml rename to practice-labs/Alternative3-CWL-Workflows/params_training.yaml
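The run-selection rule in the new `Step1B-ExtractModel.ipynb` notebooks keeps runs whose `metrics.test_accuracy` is above `desired_test_accuracy`, then ranks them by accuracy and breaks ties on precision. That rule can be checked without an MLflow server. The sketch below is minimal and hypothetical: plain dicts stand in for the rows of the DataFrame that `mlflow.search_runs` returns, the run IDs and metric values are made up, and `select_best_run` is an illustrative helper, not part of the notebooks. Unlike the notebook's `.reset_index().loc[0]`, which raises a `KeyError` when no run passes the threshold, it returns `None` in that case.

```python
from typing import Optional


def select_best_run(runs: list, min_accuracy: float) -> Optional[dict]:
    """Pick the run with the highest test accuracy, breaking ties on precision.

    `runs` is a stand-in for the rows returned by mlflow.search_runs(); the keys
    mirror MLflow's "metrics.<name>" column naming. Returns None when no run
    clears the threshold instead of raising.
    """
    # Same strict comparison as filter_string="metrics.test_accuracy > ..."
    candidates = [r for r in runs if r["metrics.test_accuracy"] > min_accuracy]
    if not candidates:
        return None
    # Highest (accuracy, precision) pair wins, like a descending sort_values(...)
    return max(
        candidates,
        key=lambda r: (r["metrics.test_accuracy"], r["metrics.test_precision"]),
    )


# Hypothetical runs, for illustration only
runs = [
    {"run_id": "run-a", "metrics.test_accuracy": 0.91, "metrics.test_precision": 0.88},
    {"run_id": "run-b", "metrics.test_accuracy": 0.91, "metrics.test_precision": 0.90},
    {"run_id": "run-c", "metrics.test_accuracy": 0.80, "metrics.test_precision": 0.95},
]

best = select_best_run(runs, min_accuracy=0.85)
if best is None:
    print("No run above the threshold; try lowering desired_test_accuracy")
else:
    print(f"Selected run_id: {best['run_id']}")  # prints "Selected run_id: run-b"
```

Returning `None` lets the caller print a hint to lower `desired_test_accuracy` rather than stopping on an opaque pandas error.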