ardabys/Object-Classification-with-Radar

Radar-Based Human Activity Classification

This repository contains a Python machine learning pipeline for classifying human activities from radar-derived feature tables. The classifier uses hand-crafted numerical features extracted from radar spectrograms and predicts one of six activity classes.

Label  Activity
1      Walking back and forth
2      Sitting down on a chair
3      Standing up
4      Bending to pick up an object
5      Drinking from a cup
6      Falling down

The pipeline uses an RBF-kernel Support Vector Machine (SVM). During training, model performance is evaluated with subject-wise cross-validation using GroupKFold, feature subsets are selected with Sequential Forward Selection, and SVM hyperparameters are tuned with grid search. The final model is then trained on the full training set and evaluated once on the unseen test set.


Repository Structure

.
├── run_classifier_pipeline.py
├── training_classifier.py
├── testing_classifier.py
│
├── data/
│   ├── training_features.csv
│   ├── testing_features.csv
│   ├── sfs_subset_size_results.csv
│   ├── single_feature_results.csv
│   └── trained_classifier_config.json
│
└── figures/
    ├── macro_f1_vs_number_of_features.png
    ├── accuracy_vs_number_of_features.png
    ├── training_cv_confusion_matrix.png
    └── test_confusion_matrix.png

Only the input CSV files are required before running the pipeline:

data/training_features.csv
data/testing_features.csv

The other files in data/ and figures/ are generated by the scripts.


Input Data Format

Both the training and testing CSV files should have the following structure:

File,Activity,Feature_1,Feature_2,Feature_3,...

The required columns are:

Column             Description
File               Radar recording filename
Activity           Activity label
Remaining columns  Numerical features used for classification

The subject ID is extracted from the filename using the pattern P<number>. For example:

1P36A01R01.dat

is interpreted as subject 36.
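
The extraction rule can be sketched with a regular expression. This is only an illustration of the P<number> pattern described above; the function name extract_subject_id and the error handling are assumptions, not the repository's actual code.

```python
import re

# Assumed pattern: the letter P followed by one or more digits.
SUBJECT_PATTERN = re.compile(r"P(\d+)")


def extract_subject_id(filename: str) -> int:
    """Return the subject number embedded in a recording filename."""
    match = SUBJECT_PATTERN.search(filename)
    if match is None:
        # Hypothetical behaviour: fail loudly instead of guessing a subject.
        raise ValueError(f"No subject ID found in {filename!r}")
    return int(match.group(1))


print(extract_subject_id("1P36A01R01.dat"))  # 36
```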

This subject ID is used as the group label for GroupKFold, which keeps all samples from the same subject in the same fold. The model is therefore never validated on a subject it was trained on, which would otherwise inflate the cross-validation scores.
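
A small toy example may make the grouping behaviour concrete. The arrays below are invented for illustration (five hypothetical subjects with two samples each) and are not taken from the radar dataset.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Toy data: 10 samples from 5 hypothetical subjects (2 samples each).
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([1, 2, 1, 2, 1, 2, 1, 2, 1, 2])
groups = np.array([36, 36, 37, 37, 38, 38, 39, 39, 40, 40])

cv = GroupKFold(n_splits=5)
for fold, (train_idx, val_idx) in enumerate(cv.split(X, y, groups)):
    train_subjects = set(groups[train_idx])
    val_subjects = set(groups[val_idx])
    # No subject appears on both sides of the split.
    assert train_subjects.isdisjoint(val_subjects)
    print(f"fold {fold}: validation subjects = {sorted(int(s) for s in val_subjects)}")
```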


Requirements

Install the required Python packages:

pip install pandas matplotlib scikit-learn

The scripts use:

pandas
matplotlib
scikit-learn

How to Run

Run the full pipeline

From the repository root, run:

python run_classifier_pipeline.py

This executes both stages:

  1. training_classifier.py
  2. testing_classifier.py

The training stage selects features, tunes the SVM, and saves the final configuration. The testing stage trains the final model on the full training set and evaluates it on the unseen test set.


Main Settings

The project does not use command-line arguments. Settings are defined directly in the scripts.

The main settings are in run_classifier_pipeline.py:

TRAINING_CSV_PATH = Path("data/training_features.csv")
TESTING_CSV_PATH = Path("data/testing_features.csv")

N_SPLITS = 5
N_FEATURES_TO_SELECT = 5
AUTO_CHOOSE_K = False
RUN_SFS_SUBSET_TEST = False

SUBSET_RESULTS_CSV = Path("data/sfs_subset_size_results.csv")
SINGLE_FEATURE_RESULTS_CSV = Path("data/single_feature_results.csv")
OUTPUT_CONFIG_PATH = Path("data/trained_classifier_config.json")

TEST_CONFUSION_MATRIX_PATH = Path("figures/test_confusion_matrix.png")
SHOW_PLOTS = False

Change these variables if you want to use different input files, select a different number of features, or recompute the expensive feature-subset experiment.


Important Options

N_FEATURES_TO_SELECT

Controls how many features are selected by Sequential Forward Selection.

N_FEATURES_TO_SELECT = 5

AUTO_CHOOSE_K

If set to True, the number of selected features is chosen from the saved subset-size results based on the best macro F1-score.

AUTO_CHOOSE_K = True
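
One way such a selection could work is sketched below. The column names n_features and macro_f1, the scores, and the tie-breaking rule (prefer the smaller subset) are all assumptions made for illustration; the real CSV layout and selection rule may differ.

```python
import pandas as pd

# Hypothetical stand-in for data/sfs_subset_size_results.csv.
# Column names and scores are invented for illustration only.
results = pd.DataFrame(
    {
        "n_features": [1, 2, 3, 4, 5],
        "macro_f1": [0.61, 0.70, 0.78, 0.77, 0.78],
    }
)


def choose_k(results: pd.DataFrame) -> int:
    """Pick the subset size with the highest macro F1 (smallest size on ties)."""
    best_row = results.sort_values(
        ["macro_f1", "n_features"], ascending=[False, True]
    ).iloc[0]
    return int(best_row["n_features"])


print(choose_k(results))  # 3
```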

RUN_SFS_SUBSET_TEST

If set to True, the script recomputes the expensive experiment where performance is evaluated for increasing numbers of selected features.

RUN_SFS_SUBSET_TEST = True

If set to False, the script loads data/sfs_subset_size_results.csv. If this file is missing, the training script automatically runs the experiment and creates it.
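
The load-or-recompute behaviour described above resembles a simple file cache. The sketch below assumes a hypothetical helper name and a caller-supplied run_experiment function that returns a DataFrame; it is not the repository's implementation.

```python
from pathlib import Path

import pandas as pd


def load_or_run_subset_experiment(csv_path: Path, run_experiment, force_rerun=False):
    """Return cached results, or run the experiment and cache them.

    force_rerun plays the role of RUN_SFS_SUBSET_TEST = True.
    """
    if force_rerun or not csv_path.exists():
        results = run_experiment()  # the expensive step
        csv_path.parent.mkdir(parents=True, exist_ok=True)
        results.to_csv(csv_path, index=False)
        return results
    # Cached file is present and no rerun was requested.
    return pd.read_csv(csv_path)
```

With force_rerun=False, the expensive function runs only when the CSV does not exist yet, which matches the fallback described for a missing file.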

SHOW_PLOTS

By default, plots are saved but not shown interactively.

SHOW_PLOTS = False

The scripts use Matplotlib's non-interactive Agg backend, so figures are written to disk without opening GUI windows. This also allows the pipeline to run on machines without a display.
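
As a rough illustration of saving a figure with the Agg backend, consider the sketch below. The helper name save_line_plot and its arguments are hypothetical and only show the general pattern.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # choose the non-interactive backend before importing pyplot
import matplotlib.pyplot as plt


def save_line_plot(x, y, path: Path, xlabel: str, ylabel: str) -> None:
    """Draw a simple line plot and write it to disk without showing it."""
    fig, ax = plt.subplots()
    ax.plot(x, y, marker="o")
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    path.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(path, dpi=150, bbox_inches="tight")
    plt.close(fig)  # free the figure so repeated calls do not accumulate memory
```

A call such as save_line_plot(sizes, scores, Path("figures/example.png"), "Number of features", "Macro F1") would then write the image without opening a window.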


Training Pipeline

The training pipeline performs the following steps:

Load training features
↓
Split into X, y, and subject groups
↓
Create GroupKFold cross-validation splits
↓
Evaluate baseline SVM with all features
↓
Evaluate each single feature
↓
Run or load SFS subset-size experiment
↓
Select final feature subset
↓
Tune SVM hyperparameters with GridSearchCV
↓
Evaluate selected + tuned model with cross-validation
↓
Save selected features and best parameters to JSON

Generated training outputs:

data/sfs_subset_size_results.csv
data/single_feature_results.csv
data/trained_classifier_config.json
figures/macro_f1_vs_number_of_features.png
figures/accuracy_vs_number_of_features.png
figures/training_cv_confusion_matrix.png
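
The core of the training steps (subject-wise splits, forward selection, grid search) can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions: the data is synthetic, the parameter grid and the macro F1 scoring are illustrative choices, and the actual scripts may be organized differently.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import GridSearchCV, GroupKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the radar feature table (not the real data).
X, y = make_classification(
    n_samples=120, n_features=8, n_informative=4, n_classes=3, random_state=0
)
groups = np.repeat(np.arange(12), 10)  # 12 hypothetical subjects, 10 samples each

# Precompute subject-wise splits so the same folds are reused in every step.
splits = list(GroupKFold(n_splits=4).split(X, y, groups))


def make_svm():
    """Scaler + RBF SVM, so scaling is refit inside every fold."""
    return Pipeline([("scaler", StandardScaler()), ("svc", SVC(kernel="rbf"))])


# Step 1: forward selection of 3 features, scored by macro F1 (assumed metric).
sfs = SequentialFeatureSelector(
    make_svm(),
    n_features_to_select=3,
    direction="forward",
    scoring="f1_macro",
    cv=splits,
)
sfs.fit(X, y)
selected = np.flatnonzero(sfs.get_support())

# Step 2: tune C and gamma on the selected columns only (hypothetical grid).
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.1]}
search = GridSearchCV(make_svm(), param_grid, scoring="f1_macro", cv=splits)
search.fit(X[:, selected], y)

print("selected feature indices:", selected.tolist())
print("best parameters:", search.best_params_)
```

Passing the precomputed list of splits as cv is one way to keep the folds subject-wise in both steps without relying on each estimator forwarding a groups argument.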

Testing Pipeline

The testing pipeline performs the following steps:

Load training features
↓
Load testing features
↓
Use selected features and best SVM parameters from training
↓
Train final model on the full training set
↓
Predict labels for the unseen testing set
↓
Print final performance metrics
↓
Save test confusion matrix

Generated testing output:

figures/test_confusion_matrix.png

The test set is only used for final evaluation. It should not be used for feature selection, hyperparameter tuning, or model selection.
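
A single train-then-evaluate pass of this kind might look like the sketch below. The data is synthetic, and the selected feature indices and hyperparameters are hypothetical placeholders for the values that training would store in the configuration file.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the training and testing feature tables.
X, y = make_classification(
    n_samples=200, n_features=6, n_informative=4, n_classes=3, random_state=0
)
X_train, y_train = X[:150], y[:150]
X_test, y_test = X[150:], y[150:]

# Hypothetical values that would normally come from the training stage.
selected = [0, 2, 3]
best_params = {"C": 10, "gamma": "scale"}

model = Pipeline(
    [("scaler", StandardScaler()), ("svc", SVC(kernel="rbf", **best_params))]
)
model.fit(X_train[:, selected], y_train)  # fit on the full training set
y_pred = model.predict(X_test[:, selected])  # predict the held-out set once

cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])
print("accuracy:", accuracy_score(y_test, y_pred))
print("macro F1:", f1_score(y_test, y_pred, average="macro"))
print(cm)
```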


Running Individual Scripts

Training only

python training_classifier.py

This runs the training pipeline and saves the configuration to:

data/trained_classifier_config.json

Testing only

python testing_classifier.py

This requires that data/trained_classifier_config.json already exists. The testing script loads the selected features and best SVM hyperparameters from this file.
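
Reading such a configuration file could be sketched as below. The key names selected_features and best_params are assumptions for illustration; check the generated JSON for the actual keys.

```python
import json
from pathlib import Path

# Hypothetical key names; the real file may use different ones.
REQUIRED_KEYS = {"selected_features", "best_params"}


def load_classifier_config(path: Path) -> dict:
    """Load the saved configuration, failing early if it is missing or incomplete."""
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; run training_classifier.py first.")
    with path.open(encoding="utf-8") as handle:
        config = json.load(handle)
    missing = REQUIRED_KEYS - config.keys()
    if missing:
        raise KeyError(f"Config is missing keys: {sorted(missing)}")
    return config
```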


Output Files

File                                        Description
data/single_feature_results.csv             Cross-validated performance of each individual feature
data/sfs_subset_size_results.csv            Performance for increasing numbers of selected features
data/trained_classifier_config.json         Selected features and best SVM hyperparameters
figures/macro_f1_vs_number_of_features.png  Macro F1 versus number of features
figures/accuracy_vs_number_of_features.png  Accuracy versus number of features
figures/training_cv_confusion_matrix.png    Training cross-validation confusion matrix
figures/test_confusion_matrix.png           Final unseen test confusion matrix

Notes

  • The SVM is implemented as a Pipeline with StandardScaler followed by SVC(kernel="rbf").
  • Scaling is performed inside the pipeline to avoid data leakage during cross-validation.
  • GroupKFold is used to prevent samples from the same subject appearing in both training and validation folds.
  • Existing output figures and CSV files are overwritten when the pipeline is rerun.
  • The final test results are the main independent estimate of model performance.
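
The first three notes can be illustrated together: when the scaler lives inside the Pipeline, cross-validation refits it on each training fold, so validation samples never influence the scaling statistics. The sketch below uses synthetic data and hypothetical subject groups, and collects out-of-fold predictions into a confusion matrix.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GroupKFold, cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data with 12 hypothetical subjects, 10 samples each.
X, y = make_classification(
    n_samples=120, n_features=6, n_informative=4, n_classes=3, random_state=0
)
groups = np.repeat(np.arange(12), 10)

model = Pipeline([("scaler", StandardScaler()), ("svc", SVC(kernel="rbf"))])

# Each sample is predicted by a model (and scaler) fitted without its subject.
y_oof = cross_val_predict(model, X, y, groups=groups, cv=GroupKFold(n_splits=4))

cm = confusion_matrix(y, y_oof, labels=[0, 1, 2])
print(cm)
```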
