This repository contains a Python machine learning pipeline for classifying human activities from radar-derived feature tables. The classifier uses hand-crafted numerical features extracted from radar spectrograms and predicts one of six activity classes.
| Label | Activity |
|---|---|
| 1 | Walking back and forth |
| 2 | Sitting down on a chair |
| 3 | Standing up |
| 4 | Bending to pick up an object |
| 5 | Drinking from a cup |
| 6 | Falling down |
The pipeline uses an RBF-kernel Support Vector Machine (SVM). During training, model performance is evaluated with subject-wise cross-validation using GroupKFold, feature subsets are selected with Sequential Forward Selection, and SVM hyperparameters are tuned with grid search. The final model is then trained on the full training set and evaluated once on the unseen test set.
.
├── run_classifier_pipeline.py
├── training_classifier.py
├── testing_classifier.py
│
├── data/
│ ├── training_features.csv
│ ├── testing_features.csv
│ ├── sfs_subset_size_results.csv
│ ├── single_feature_results.csv
│ └── trained_classifier_config.json
│
└── figures/
    ├── macro_f1_vs_number_of_features.png
    ├── accuracy_vs_number_of_features.png
    ├── training_cv_confusion_matrix.png
    └── test_confusion_matrix.png
Only the input CSV files are required before running the pipeline:
data/training_features.csv
data/testing_features.csv
The other files in data/ and figures/ are generated by the scripts.
Both the training and testing CSV files should have the following structure:
File,Activity,Feature_1,Feature_2,Feature_3,...
The required columns are:
| Column | Description |
|---|---|
| `File` | Radar recording filename |
| `Activity` | Activity label |
| Remaining columns | Numerical features used for classification |
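
As a rough illustration of this layout, the sketch below loads a tiny inline table and separates the feature matrix from the labels. The rows, the second and third filenames, and the feature values are invented for the example, and the variable names are assumptions rather than the ones used in the scripts.

```python
import io

import pandas as pd

# Tiny stand-in for data/training_features.csv; all values are made up.
csv_text = """File,Activity,Feature_1,Feature_2
1P36A01R01.dat,1,0.12,3.4
2P36A02R01.dat,2,0.55,1.9
1P37A01R01.dat,1,0.20,3.1
"""

df = pd.read_csv(io.StringIO(csv_text))

# Every column except the two metadata columns is treated as a feature.
feature_columns = [c for c in df.columns if c not in ("File", "Activity")]
X = df[feature_columns].to_numpy(dtype=float)
y = df["Activity"].to_numpy()

print(feature_columns)
print(X.shape, y.tolist())
```

Selecting features by exclusion means extra feature columns can be added to the CSV without changing the loading code.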
The subject ID is extracted from the filename using the pattern P<number>. For example:
1P36A01R01.dat
is interpreted as subject 36.
This subject ID is used for GroupKFold, which keeps all samples from the same subject in the same fold. This avoids training and validating on the same subject.
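
A minimal sketch of how such an extraction could look, assuming a plain regular expression on the filename. The function name `extract_subject_id` and the error handling are hypothetical and may differ from the scripts.

```python
import re

# "P" followed by one or more digits, as in the pattern P<number>.
SUBJECT_PATTERN = re.compile(r"P(\d+)")


def extract_subject_id(filename: str) -> int:
    """Return the subject number embedded in a recording filename."""
    match = SUBJECT_PATTERN.search(filename)
    if match is None:
        raise ValueError(f"No subject ID found in {filename!r}")
    return int(match.group(1))


print(extract_subject_id("1P36A01R01.dat"))  # 36
```

The resulting integers can be passed as the `groups` argument when the GroupKFold splits are created.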
Install the required Python packages:
pip install pandas matplotlib scikit-learn
The scripts use:
- pandas
- matplotlib
- scikit-learn
From the repository root, run:
python run_classifier_pipeline.py
This executes both stages:
- training_classifier.py
- testing_classifier.py
The training stage selects features, tunes the SVM, and saves the final configuration. The testing stage trains the final model on the full training set and evaluates it on the unseen test set.
The project does not use command-line arguments. Settings are defined directly in the scripts.
The main settings are in run_classifier_pipeline.py:
TRAINING_CSV_PATH = Path("data/training_features.csv")
TESTING_CSV_PATH = Path("data/testing_features.csv")
N_SPLITS = 5
N_FEATURES_TO_SELECT = 5
AUTO_CHOOSE_K = False
RUN_SFS_SUBSET_TEST = False
SUBSET_RESULTS_CSV = Path("data/sfs_subset_size_results.csv")
SINGLE_FEATURE_RESULTS_CSV = Path("data/single_feature_results.csv")
OUTPUT_CONFIG_PATH = Path("data/trained_classifier_config.json")
TEST_CONFUSION_MATRIX_PATH = Path("figures/test_confusion_matrix.png")
SHOW_PLOTS = False
Change these variables if you want to use different input files, select a different number of features, or recompute the expensive feature-subset experiment.
N_FEATURES_TO_SELECT controls how many features are selected by Sequential Forward Selection:
N_FEATURES_TO_SELECT = 5
If AUTO_CHOOSE_K is set to True, the number of selected features is chosen from the saved subset-size results based on the best macro F1-score:
AUTO_CHOOSE_K = True
If RUN_SFS_SUBSET_TEST is set to True, the script recomputes the expensive experiment in which performance is evaluated for increasing numbers of selected features:
RUN_SFS_SUBSET_TEST = True
If it is set to False, the script loads data/sfs_subset_size_results.csv. If this file is missing, the training script automatically runs the experiment and creates it.
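
The interaction of these settings can be sketched as follows. The column names `n_features` and `macro_f1`, the helper names, and the scores are assumptions made for the example; the real results file may use different headers.

```python
import io
from pathlib import Path

import pandas as pd


def must_run_subset_experiment(run_flag: bool, results_csv: Path) -> bool:
    """Recompute when requested, or when no saved results exist yet."""
    return run_flag or not results_csv.exists()


def choose_n_features(results: pd.DataFrame, default_k: int, auto_choose_k: bool) -> int:
    """Pick the subset size with the best macro F1, or keep the fixed value."""
    if not auto_choose_k:
        return default_k
    best_row = results.loc[results["macro_f1"].idxmax()]
    return int(best_row["n_features"])


# Made-up subset-size results standing in for the saved CSV.
results = pd.read_csv(
    io.StringIO("n_features,macro_f1\n1,0.61\n2,0.70\n3,0.74\n4,0.73\n")
)

k_auto = choose_n_features(results, default_k=5, auto_choose_k=True)
k_fixed = choose_n_features(results, default_k=5, auto_choose_k=False)
print(k_auto, k_fixed)  # 3 5
```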
By default, plots are saved but not shown interactively:
SHOW_PLOTS = False
The scripts use Matplotlib's non-interactive Agg backend, so figures are saved safely without opening GUI windows.
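
For reference, selecting the Agg backend and saving a figure typically looks like the sketch below. The plotted values and the temporary output directory are placeholders, not the pipeline's actual data or paths.

```python
import tempfile
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # select the non-interactive backend before plotting
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [0.61, 0.70, 0.74, 0.73], marker="o")  # made-up scores
ax.set_xlabel("Number of features")
ax.set_ylabel("Macro F1")

# Written to a temporary directory here; the scripts save under figures/.
output_path = Path(tempfile.mkdtemp()) / "macro_f1_vs_number_of_features.png"
fig.savefig(output_path, dpi=150, bbox_inches="tight")
plt.close(fig)  # release the figure's memory once it is on disk
print(output_path.exists())
```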
The training pipeline performs the following steps:
Load training features
↓
Split into X, y, and subject groups
↓
Create GroupKFold cross-validation splits
↓
Evaluate baseline SVM with all features
↓
Evaluate each single feature
↓
Run or load SFS subset-size experiment
↓
Select final feature subset
↓
Tune SVM hyperparameters with GridSearchCV
↓
Evaluate selected + tuned model with cross-validation
↓
Save selected features and best parameters to JSON
Generated training outputs:
data/sfs_subset_size_results.csv
data/single_feature_results.csv
data/trained_classifier_config.json
figures/macro_f1_vs_number_of_features.png
figures/accuracy_vs_number_of_features.png
figures/training_cv_confusion_matrix.png
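
The core of these training steps can be sketched with scikit-learn on synthetic data. This is an illustration under assumptions, not the repository's code: it uses scikit-learn's `SequentialFeatureSelector` for forward selection, a small made-up parameter grid, and `f1_macro` scoring, any of which may differ in `training_classifier.py`.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import GridSearchCV, GroupKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 10 subjects, 12 samples each, 3 classes, 6 features.
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(10), 12)
y = np.tile(np.arange(1, 4), 40)
X = rng.normal(size=(len(y), 6))
X[:, 0] += y        # make two features informative
X[:, 1] -= 0.5 * y


def make_svm() -> Pipeline:
    """Scaler and RBF SVM in one estimator, so scaling is refit per fold."""
    return Pipeline([("scaler", StandardScaler()), ("svc", SVC(kernel="rbf"))])


# Precompute subject-wise splits so every step shares the same folds.
cv_splits = list(GroupKFold(n_splits=5).split(X, y, groups))

# Forward selection of a fixed number of features.
sfs = SequentialFeatureSelector(
    make_svm(), n_features_to_select=2, direction="forward",
    scoring="f1_macro", cv=cv_splits,
)
sfs.fit(X, y)
selected = np.flatnonzero(sfs.get_support())

# Hyperparameter search restricted to the selected columns.
param_grid = {"svc__C": [0.1, 1, 10], "svc__gamma": ["scale", 0.1]}
search = GridSearchCV(make_svm(), param_grid, scoring="f1_macro", cv=cv_splits)
search.fit(X[:, selected], y)

print(selected, search.best_params_, round(search.best_score_, 3))
```

Passing the precomputed splits as `cv` keeps the subject grouping intact even in steps whose `fit` method does not take a `groups` argument.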
The testing pipeline performs the following steps:
Load training features
↓
Load testing features
↓
Use selected features and best SVM parameters from training
↓
Train final model on the full training set
↓
Predict labels for the unseen testing set
↓
Print final performance metrics
↓
Save test confusion matrix
Generated testing output:
figures/test_confusion_matrix.png
The test set is only used for final evaluation. It should not be used for feature selection, hyperparameter tuning, or model selection.
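
A hedged sketch of the testing stage on synthetic tables is shown below. The JSON keys `selected_features` and `best_params`, the parameter values, and the helper `make_table` are hypothetical; the actual configuration file may be laid out differently.

```python
import json

import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, confusion_matrix, f1_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical config layout; the real JSON keys may differ.
config = json.loads(
    '{"selected_features": ["Feature_1", "Feature_3"],'
    ' "best_params": {"svc__C": 10, "svc__gamma": "scale"}}'
)


def make_table(n_rows: int, seed: int) -> pd.DataFrame:
    """Build a synthetic feature table with three activity labels."""
    rng = np.random.default_rng(seed)
    activity = rng.integers(1, 4, size=n_rows)
    table = pd.DataFrame({"Activity": activity})
    for i in range(1, 5):
        table[f"Feature_{i}"] = rng.normal(size=n_rows)
    table["Feature_1"] += 2 * activity  # make one feature informative
    return table


train_df = make_table(150, seed=0)
test_df = make_table(60, seed=1)
features = config["selected_features"]

# Fit once on the full training table with the stored settings.
model = Pipeline([("scaler", StandardScaler()), ("svc", SVC(kernel="rbf"))])
model.set_params(**config["best_params"])
model.fit(train_df[features], train_df["Activity"])

# The test table is touched only here, for the final evaluation.
labels = [1, 2, 3]
y_true = test_df["Activity"]
y_pred = model.predict(test_df[features])
test_accuracy = accuracy_score(y_true, y_pred)
test_macro_f1 = f1_score(y_true, y_pred, average="macro", labels=labels)
test_cm = confusion_matrix(y_true, y_pred, labels=labels)
print(f"accuracy={test_accuracy:.3f} macro_f1={test_macro_f1:.3f}")
print(test_cm)
```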
To run only the training stage:
python training_classifier.py
This runs the training pipeline and saves the configuration to:
data/trained_classifier_config.json
To run only the testing stage:
python testing_classifier.py
This requires that data/trained_classifier_config.json already exists. The testing script loads the selected features and best SVM hyperparameters from this file.
| File | Description |
|---|---|
| `data/single_feature_results.csv` | Cross-validated performance of each individual feature |
| `data/sfs_subset_size_results.csv` | Performance for increasing numbers of selected features |
| `data/trained_classifier_config.json` | Selected features and best SVM hyperparameters |
| `figures/macro_f1_vs_number_of_features.png` | Macro F1 versus number of features |
| `figures/accuracy_vs_number_of_features.png` | Accuracy versus number of features |
| `figures/training_cv_confusion_matrix.png` | Training cross-validation confusion matrix |
| `figures/test_confusion_matrix.png` | Final unseen test confusion matrix |
- The SVM is implemented as a `Pipeline` with `StandardScaler` followed by `SVC(kernel="rbf")`.
- Scaling is performed inside the pipeline to avoid data leakage during cross-validation.
- `GroupKFold` is used to prevent samples from the same subject appearing in both training and validation folds.
- Existing output figures and CSV files are overwritten when the pipeline is rerun.
- The final test results are the main independent estimate of model performance.
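
The subject separation described in these notes can be checked directly. The sketch below uses synthetic data with made-up subject IDs, confirms that no subject appears on both sides of any fold, and then collects out-of-fold predictions for a cross-validation confusion matrix; it is an illustration, not the repository's code.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import GroupKFold, cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 8 subjects with 6 samples each, 3 classes.
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(8), 6)
y = np.tile([1, 2, 3], 16)
X = rng.normal(size=(48, 4)) + y[:, None]

cv = GroupKFold(n_splits=4)

# Subjects shared between the training and validation side of each fold.
overlaps = [
    set(groups[train_idx]) & set(groups[val_idx])
    for train_idx, val_idx in cv.split(X, y, groups)
]
print(overlaps)  # four empty sets

# The scaler sits inside the pipeline, so it is fit on training folds only.
model = Pipeline([("scaler", StandardScaler()), ("svc", SVC(kernel="rbf"))])
y_pred = cross_val_predict(model, X, y, groups=groups, cv=cv)
cv_cm = confusion_matrix(y, y_pred, labels=[1, 2, 3])
print(cv_cm)
```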