Radar-Based Human Activity Classification

This repository contains a Python machine learning pipeline for classifying human activities from radar-derived feature tables. The classifier uses hand-crafted numerical features extracted from radar spectrograms and predicts one of six activity classes.

Label	Activity
1	Walking back and forth
2	Sitting down on a chair
3	Standing up
4	Bending to pick up an object
5	Drinking from a cup
6	Falling down

The pipeline uses an RBF-kernel Support Vector Machine (SVM). During training, model performance is evaluated with subject-wise cross-validation using GroupKFold, feature subsets are selected with Sequential Forward Selection, and SVM hyperparameters are tuned with grid search. The final model is then trained on the full training set and evaluated once on the unseen test set.

Repository Structure

.
├── run_classifier_pipeline.py
├── training_classifier.py
├── testing_classifier.py
│
├── data/
│   ├── training_features.csv
│   ├── testing_features.csv
│   ├── sfs_subset_size_results.csv
│   ├── single_feature_results.csv
│   └── trained_classifier_config.json
│
└── figures/
    ├── macro_f1_vs_number_of_features.png
    ├── accuracy_vs_number_of_features.png
    ├── training_cv_confusion_matrix.png
    └── test_confusion_matrix.png

Only the input CSV files are required before running the pipeline:

data/training_features.csv
data/testing_features.csv

The other files in data/ and figures/ are generated by the scripts.

Input Data Format

Both the training and testing CSV files should have the following structure:

File,Activity,Feature_1,Feature_2,Feature_3,...

The required columns are:

Column	Description
`File`	Radar recording filename
`Activity`	Activity label
Remaining columns	Numerical features used for classification

The subject ID is extracted from the filename using the pattern P<number>. For example:

1P36A01R01.dat

is interpreted as subject 36.

The subject ID serves as the group label for GroupKFold, which keeps all samples from one subject in the same fold. As a result, no subject contributes to both the training and the validation portion of a split.
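The subject extraction and grouping described above can be sketched as follows. This is a minimal illustration, not the repository's actual code: the helper name `extract_subject_id`, the regular expression `P(\d+)`, and every filename and value in the inline table other than `1P36A01R01.dat` are assumptions made for the example.

```python
import io
import re

import pandas as pd
from sklearn.model_selection import GroupKFold

# Tiny made-up table in the documented layout (values are illustrative only).
CSV_TEXT = """File,Activity,Feature_1,Feature_2
1P36A01R01.dat,1,0.10,1.2
1P36A02R01.dat,2,0.40,0.9
1P37A01R01.dat,1,0.15,1.1
1P37A02R01.dat,2,0.35,1.0
1P38A01R01.dat,1,0.12,1.3
1P38A02R01.dat,2,0.42,0.8
"""

# Assumed pattern: the letter P followed by one or more digits.
SUBJECT_PATTERN = re.compile(r"P(\d+)")


def extract_subject_id(filename: str) -> int:
    """Return the number that follows the first 'P<digits>' in the filename."""
    match = SUBJECT_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"No subject ID found in {filename!r}")
    return int(match.group(1))


df = pd.read_csv(io.StringIO(CSV_TEXT))
groups = df["File"].map(extract_subject_id).to_numpy()
y = df["Activity"].to_numpy()
X = df.drop(columns=["File", "Activity"]).to_numpy()

# Each subject lands entirely in either the training or the validation part.
for train_idx, val_idx in GroupKFold(n_splits=3).split(X, y, groups):
    assert set(groups[train_idx]).isdisjoint(groups[val_idx])
```

Here `extract_subject_id("1P36A01R01.dat")` returns 36, matching the example above.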

Requirements

Install the required Python packages:

pip install pandas matplotlib scikit-learn

The scripts use:

pandas
matplotlib
scikit-learn

How to Run

Run the full pipeline

From the repository root, run:

python run_classifier_pipeline.py

This executes both stages:

training_classifier.py
testing_classifier.py

The training stage selects features, tunes the SVM, and saves the final configuration. The testing stage trains the final model on the full training set and evaluates it on the unseen test set.

Main Settings

The project does not use command-line arguments. Settings are defined directly in the scripts.

The main settings are in run_classifier_pipeline.py:

TRAINING_CSV_PATH = Path("data/training_features.csv")
TESTING_CSV_PATH = Path("data/testing_features.csv")

N_SPLITS = 5
N_FEATURES_TO_SELECT = 5
AUTO_CHOOSE_K = False
RUN_SFS_SUBSET_TEST = False

SUBSET_RESULTS_CSV = Path("data/sfs_subset_size_results.csv")
SINGLE_FEATURE_RESULTS_CSV = Path("data/single_feature_results.csv")
OUTPUT_CONFIG_PATH = Path("data/trained_classifier_config.json")

TEST_CONFUSION_MATRIX_PATH = Path("figures/test_confusion_matrix.png")
SHOW_PLOTS = False

Change these variables if you want to use different input files, select a different number of features, or recompute the expensive feature-subset experiment.

Important Options

`N_FEATURES_TO_SELECT`

Controls how many features are selected by Sequential Forward Selection.

N_FEATURES_TO_SELECT = 5

`AUTO_CHOOSE_K`

If set to True, the number of selected features is chosen from the saved subset-size results based on the best macro F1-score.

AUTO_CHOOSE_K = True
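One way the automatic choice could work is sketched below. The column names `n_features` and `macro_f1`, the function name `choose_k`, and the numbers in the table are assumptions for illustration; the actual layout of data/sfs_subset_size_results.csv may differ.

```python
import pandas as pd

# Hypothetical contents of the subset-size results table (made-up scores).
results = pd.DataFrame(
    {
        "n_features": [1, 2, 3, 4, 5],
        "macro_f1": [0.61, 0.72, 0.80, 0.80, 0.79],
    }
)


def choose_k(results: pd.DataFrame, default_k: int, auto: bool) -> int:
    """Pick the subset size with the best macro F1, or fall back to default_k."""
    if not auto:
        return default_k
    # idxmax returns the first row holding the maximum, so with rows sorted
    # by n_features a tie is resolved in favour of the smaller subset.
    best_row = results.loc[results["macro_f1"].idxmax()]
    return int(best_row["n_features"])


k_auto = choose_k(results, default_k=5, auto=True)
k_fixed = choose_k(results, default_k=5, auto=False)
```

With this made-up table, `k_auto` is 3 and `k_fixed` is 5.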

`RUN_SFS_SUBSET_TEST`

If set to True, the script recomputes the expensive experiment where performance is evaluated for increasing numbers of selected features.

RUN_SFS_SUBSET_TEST = True

If set to False, the script loads data/sfs_subset_size_results.csv. If this file is missing, the training script automatically runs the experiment and creates it.
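The load-or-recompute behaviour described here follows a common caching pattern. The sketch below shows one possible shape of it; `load_or_compute` and `fake_experiment` are hypothetical names, and the placeholder experiment stands in for the real SFS subset-size run.

```python
import tempfile
from pathlib import Path
from typing import Callable

import pandas as pd


def load_or_compute(
    csv_path: Path,
    recompute: bool,
    compute: Callable[[], pd.DataFrame],
) -> pd.DataFrame:
    """Reuse cached results unless recompute is requested or the file is missing."""
    if recompute or not csv_path.exists():
        results = compute()
        csv_path.parent.mkdir(parents=True, exist_ok=True)
        results.to_csv(csv_path, index=False)
        return results
    return pd.read_csv(csv_path)


calls = []


def fake_experiment() -> pd.DataFrame:
    """Placeholder for the expensive experiment; records each invocation."""
    calls.append(1)
    return pd.DataFrame({"n_features": [1, 2], "macro_f1": [0.6, 0.7]})


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "data" / "sfs_subset_size_results.csv"
    first = load_or_compute(path, recompute=False, compute=fake_experiment)   # file missing: computes
    second = load_or_compute(path, recompute=False, compute=fake_experiment)  # file present: loads
    third = load_or_compute(path, recompute=True, compute=fake_experiment)    # forced: recomputes
```

The placeholder experiment runs twice in this usage: once because the file is missing and once because recomputation is forced.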

`SHOW_PLOTS`

By default, plots are saved but not shown interactively.

SHOW_PLOTS = False

The scripts use Matplotlib's non-interactive Agg backend, so figures are written to disk without opening GUI windows.
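Selecting the Agg backend and saving a figure could look roughly like this. The helper `save_figure` and the plotted values are assumptions for illustration, and the sketch does not show how the scripts handle `SHOW_PLOTS = True`.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # choose the non-interactive backend before importing pyplot
import matplotlib.pyplot as plt


def save_figure(fig, output_path: Path) -> None:
    """Write the figure to disk and release it; no window is opened."""
    output_path.parent.mkdir(parents=True, exist_ok=True)
    fig.savefig(output_path, dpi=150, bbox_inches="tight")
    plt.close(fig)


fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4, 5], [0.61, 0.72, 0.80, 0.80, 0.79], marker="o")  # made-up scores
ax.set_xlabel("Number of features")
ax.set_ylabel("Macro F1")

with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "figures" / "macro_f1_vs_number_of_features.png"
    save_figure(fig, out)
    saved_ok = out.exists() and out.stat().st_size > 0
```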

Training Pipeline

The training pipeline performs the following steps:

Load training features
↓
Split into X, y, and subject groups
↓
Create GroupKFold cross-validation splits
↓
Evaluate baseline SVM with all features
↓
Evaluate each single feature
↓
Run or load SFS subset-size experiment
↓
Select final feature subset
↓
Tune SVM hyperparameters with GridSearchCV
↓
Evaluate selected + tuned model with cross-validation
↓
Save selected features and best parameters to JSON

Generated training outputs:

data/sfs_subset_size_results.csv
data/single_feature_results.csv
data/trained_classifier_config.json
figures/macro_f1_vs_number_of_features.png
figures/accuracy_vs_number_of_features.png
figures/training_cv_confusion_matrix.png
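The core of the training steps above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the repository's implementation: the data is synthetic, the parameter grid values are placeholders, the use of scikit-learn's `SequentialFeatureSelector` is assumed from the listed requirements, and the baseline and single-feature evaluations are left out for brevity.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.metrics import f1_score
from sklearn.model_selection import GridSearchCV, GroupKFold, cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in data: 10 subjects x 12 recordings, 6 features, 3 classes.
n_subjects, per_subject, n_features = 10, 12, 6
groups = np.repeat(np.arange(n_subjects), per_subject)
y = np.tile(np.arange(3), n_subjects * per_subject // 3)
X = rng.normal(size=(groups.size, n_features))
X[:, 0] += 2.0 * y  # features 0 and 1 carry the class signal
X[:, 1] -= 1.5 * y


def make_svm() -> Pipeline:
    # Scaling lives inside the pipeline, so it is refit on each training fold.
    return Pipeline([("scaler", StandardScaler()), ("svc", SVC(kernel="rbf"))])


# Precompute subject-wise splits so every step reuses the same folds.
splits = list(GroupKFold(n_splits=5).split(X, y, groups))

# Sequential Forward Selection scored by macro F1 on the subject-wise folds.
sfs = SequentialFeatureSelector(
    make_svm(),
    n_features_to_select=2,
    direction="forward",
    scoring="f1_macro",
    cv=splits,
)
sfs.fit(X, y)
selected = np.flatnonzero(sfs.get_support())
X_sel = X[:, selected]

# Grid search over C and gamma (placeholder grid values).
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.01, 0.1]}
search = GridSearchCV(make_svm(), param_grid, scoring="f1_macro", cv=splits)
search.fit(X_sel, y)

# Cross-validated predictions of the tuned model, e.g. for a confusion matrix.
y_pred = cross_val_predict(search.best_estimator_, X_sel, y, cv=splits)
cv_macro_f1 = f1_score(y, y_pred, average="macro")
```

Passing the precomputed list of splits as `cv` is one way to keep the folds subject-wise in every step, since the grouping is then fixed before selection and tuning start.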

Testing Pipeline

The testing pipeline performs the following steps:

Load training features
↓
Load testing features
↓
Use selected features and best SVM parameters from training
↓
Train final model on the full training set
↓
Predict labels for the unseen testing set
↓
Print final performance metrics
↓
Save test confusion matrix

Generated testing output:

figures/test_confusion_matrix.png

The test set is only used for final evaluation. It should not be used for feature selection, hyperparameter tuning, or model selection.
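A minimal sketch of the testing stage is shown below. The JSON keys `selected_features` and `best_params`, the parameter names, and the synthetic data are assumptions for illustration; the real layout of data/trained_classifier_config.json may differ.

```python
import json

import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical config layout; the real keys in the JSON file may differ.
config = json.loads(
    '{"selected_features": ["Feature_1", "Feature_3"],'
    ' "best_params": {"svc__C": 10, "svc__gamma": "scale"}}'
)

feature_names = ["Feature_1", "Feature_2", "Feature_3"]
cols = [feature_names.index(name) for name in config["selected_features"]]

rng = np.random.default_rng(1)


def make_split(n_per_class: int):
    """Synthetic stand-in for a feature table with three activity labels."""
    y = np.repeat([1, 2, 3], n_per_class)
    X = rng.normal(size=(y.size, len(feature_names)))
    X[:, 0] += 3.0 * y  # Feature_1 carries the class signal
    return X, y


X_train, y_train = make_split(40)
X_test, y_test = make_split(15)

model = Pipeline([("scaler", StandardScaler()), ("svc", SVC(kernel="rbf"))])
model.set_params(**config["best_params"])
model.fit(X_train[:, cols], y_train)     # fit on the full training set only
y_pred = model.predict(X_test[:, cols])  # the test set is touched once, here

test_accuracy = accuracy_score(y_test, y_pred)
test_macro_f1 = f1_score(y_test, y_pred, average="macro")
test_cm = confusion_matrix(y_test, y_pred, labels=[1, 2, 3])
```

The scaler statistics come from the training data alone, because `StandardScaler` is fitted as part of `model.fit`.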

Running Individual Scripts

Training only

python training_classifier.py

This runs the training pipeline and saves the configuration to:

data/trained_classifier_config.json

Testing only

python testing_classifier.py

This requires that data/trained_classifier_config.json already exists. The testing script loads the selected features and best SVM hyperparameters from this file.

Output Files

File	Description
`data/single_feature_results.csv`	Cross-validated performance of each individual feature
`data/sfs_subset_size_results.csv`	Performance for increasing numbers of selected features
`data/trained_classifier_config.json`	Selected features and best SVM hyperparameters
`figures/macro_f1_vs_number_of_features.png`	Macro F1 versus number of features
`figures/accuracy_vs_number_of_features.png`	Accuracy versus number of features
`figures/training_cv_confusion_matrix.png`	Training cross-validation confusion matrix
`figures/test_confusion_matrix.png`	Final unseen test confusion matrix

Notes

The SVM is implemented as a Pipeline with StandardScaler followed by SVC(kernel="rbf").
Scaling is performed inside the pipeline to avoid data leakage during cross-validation.
GroupKFold is used to prevent samples from the same subject appearing in both training and validation folds.
Existing output figures and CSV files are overwritten when the pipeline is rerun.
The test-set results are the main estimate of model performance that is independent of feature selection and hyperparameter tuning.
