Add Gaussian Naive Bayes classifier in machine_learning/ #14853

Open
PRERITARYA wants to merge 2 commits into TheAlgorithms:master from PRERITARYA:add/gaussian-naive-bayes

@PRERITARYA


Describe your change:

Add a Gaussian Naive Bayes classifier, implemented from scratch without any
external ML libraries (no sklearn).

Implements the full pipeline:

  • separate_by_class: splits training data by class label
  • compute_mean_variance: computes per-feature Gaussian statistics
  • train: fits priors and per-class feature summaries
  • gaussian_log_probability: evaluates the Gaussian PDF in log space
  • predict / predict_single: classifies new samples
  • accuracy: evaluates classifier performance

  • Add an algorithm?
  • Fix a bug or typo in an existing algorithm?
  • Add or change doctests?
  • Documentation change?
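
For orientation, the first two helpers in the pipeline list above could look roughly like the sketch below. The signatures and the choice of population variance are assumptions inferred from the helper names, not taken from the PR's file.

```python
from collections import defaultdict


def separate_by_class(
    data: list[list[float]], labels: list[int]
) -> dict[int, list[list[float]]]:
    """Group feature rows by class label (hypothetical signature)."""
    if len(data) != len(labels):
        raise ValueError("data and labels must have the same length.")
    separated: dict[int, list[list[float]]] = defaultdict(list)
    for row, label in zip(data, labels):
        separated[label].append(row)
    return dict(separated)


def compute_mean_variance(values: list[float]) -> tuple[float, float]:
    """Return (mean, variance) of one feature column.

    Assumption: population variance (divide by n); the PR may instead
    use the sample variance (divide by n - 1).
    """
    if not values:
        raise ValueError("values must not be empty.")
    mean = sum(values) / len(values)
    variance = sum((value - mean) ** 2 for value in values) / len(values)
    return mean, variance
```

With these definitions, `separate_by_class([[1.0], [2.0], [3.0]], [0, 1, 0])` groups the first and third rows under label 0 and the second row under label 1.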

Checklist:

  • I have read CONTRIBUTING.md.
  • This pull request is all my own work -- I have not plagiarized.
  • I know that pull requests will not be merged if they fail the automated tests.
  • This PR only changes one algorithm file. To ease review, please open separate PRs for separate algorithms.
  • All new Python files are placed inside an existing directory.
  • All filenames are in all lowercase characters with no spaces or dashes.
  • All functions and variable names follow Python naming conventions.
  • All function parameters and return values are annotated with Python type hints.
  • All functions have doctests that pass the automated testing.
  • All new algorithms include at least one URL that points to Wikipedia or another similar explanation.

Copilot AI review requested due to automatic review settings June 24, 2026 03:20
@algorithms-keeper (bot) added the labels "awaiting reviews" (This PR is ready to be reviewed) and "require descriptive names" (This PR needs descriptive function and/or variable names) on Jun 24, 2026

@algorithms-keeper algorithms-keeper Bot left a comment



Automated review generated by algorithms-keeper. If there's any problem regarding this review, please open an issue about it.

algorithms-keeper commands and options

algorithms-keeper actions can be triggered by commenting on this PR:

  • @algorithms-keeper review to trigger the checks for only added pull request files
  • @algorithms-keeper review-all to trigger the checks for all the pull request files, including the modified files. As we cannot post review comments on lines not part of the diff, this command will post all the messages in one comment.

NOTE: Commands are in beta and so this feature is restricted only to a member or owner of the organization.


return priors, summaries



Please provide a descriptive name for the parameter: x

Copilot AI left a comment

Contributor


Pull request overview

Adds a from-scratch Gaussian Naive Bayes classifier implementation under machine_learning/, intended to provide a lightweight probabilistic classifier without external ML dependencies.

Changes:

  • Introduces training helpers to compute per-class priors and per-feature Gaussian summaries (mean/variance).
  • Implements log-space Gaussian likelihood scoring for stable prediction.
  • Adds doctests for core helpers and an executable doctest.testmod() entrypoint.
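
To illustrate why log-space scoring is described as stable, here is a minimal sketch of a Gaussian log-density next to the naive product of densities. The parameter names and the `min_variance` floor are assumptions made for this example and may differ from the PR's `gaussian_log_probability`.

```python
import math


def gaussian_log_probability(
    feature_value: float,
    mean: float,
    variance: float,
    min_variance: float = 1e-9,
) -> float:
    """Log of the Gaussian density N(feature_value; mean, variance)."""
    # Assumption: floor the variance so a constant feature column
    # (variance 0) does not cause a division by zero or log(0).
    variance = max(variance, min_variance)
    log_normaliser = -0.5 * math.log(2.0 * math.pi * variance)
    log_exponent = -((feature_value - mean) ** 2) / (2.0 * variance)
    return log_normaliser + log_exponent


# 100 features, each five standard deviations away from the class mean.
log_terms = [gaussian_log_probability(5.0, 0.0, 1.0) for _ in range(100)]

# Summing logs stays finite (about -1341.9) ...
log_score = sum(log_terms)

# ... while multiplying the raw densities underflows to exactly 0.0.
naive_product = math.prod(math.exp(term) for term in log_terms)
```

Once the product underflows, every class ties at 0.0 and cannot be ranked, whereas the summed log scores remain comparable.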


Comment on lines +147 to +161
n_samples = len(data)
separated = separate_by_class(data, labels)

priors: dict[int, float] = {}
summaries: dict[int, list[tuple[float, float]]] = {}

for class_label, class_samples in separated.items():
    priors[class_label] = math.log(len(class_samples) / n_samples)
    # transpose to get per-feature lists
    features_by_column = [
        [row[col] for row in class_samples] for col in range(len(class_samples[0]))
    ]
    summaries[class_label] = [
        compute_mean_variance(column) for column in features_by_column
    ]
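
As a side note on the transpose step in the excerpt above, `zip(*class_samples)` yields the same per-feature columns when every row has the same length. The sketch below compares both forms; the function names are hypothetical and exist only for this illustration.

```python
def columns_by_index(class_samples: list[list[float]]) -> list[list[float]]:
    """Transpose rows into columns by explicit indexing, as in the excerpt."""
    return [
        [row[col] for row in class_samples] for col in range(len(class_samples[0]))
    ]


def columns_by_zip(class_samples: list[list[float]]) -> list[list[float]]:
    """Transpose rows into columns with zip (hypothetical alternative)."""
    return [list(column) for column in zip(*class_samples)]


rows = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
assert columns_by_index(rows) == columns_by_zip(rows) == [[1.0, 3.0, 5.0], [2.0, 4.0, 6.0]]
```

The two differ on ragged input: the indexed form raises `IndexError` when a later row is shorter than the first, while plain `zip` silently truncates to the shortest row. On Python 3.10 or newer, `zip(*class_samples, strict=True)` raises `ValueError` instead of truncating.
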
Comment on lines +226 to +229
for class_label, feature_summaries in summaries.items():
    score = priors[class_label]
    for feature_value, (mean, variance) in zip(feature_vector, feature_summaries):
        score += gaussian_log_probability(feature_value, mean, variance)
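
The scoring loop above can be completed into a single-sample prediction by keeping the class with the highest log score. The following is a minimal sketch under stated assumptions: `priors` holds log priors (as in the training excerpt), the explicit length check is an addition for this example, and the compact Gaussian helper stands in for the PR's own.

```python
import math


def gaussian_log_probability(feature_value: float, mean: float, variance: float) -> float:
    """Log Gaussian density; assumes variance > 0 (stand-in for the PR's helper)."""
    return -0.5 * math.log(2.0 * math.pi * variance) - (
        (feature_value - mean) ** 2
    ) / (2.0 * variance)


def predict_single(
    feature_vector: list[float],
    priors: dict[int, float],
    summaries: dict[int, list[tuple[float, float]]],
) -> int:
    """Return the class label with the highest log posterior score."""
    best_label: int | None = None
    best_score = -math.inf
    for class_label, feature_summaries in summaries.items():
        # Added check (assumption): plain zip would silently ignore extra features.
        if len(feature_vector) != len(feature_summaries):
            raise ValueError("Feature vector length does not match the model.")
        score = priors[class_label]  # log prior
        for feature_value, (mean, variance) in zip(feature_vector, feature_summaries):
            score += gaussian_log_probability(feature_value, mean, variance)
        if best_label is None or score > best_score:
            best_label, best_score = class_label, score
    if best_label is None:
        raise ValueError("summaries must not be empty.")
    return best_label


# Two equally likely classes with one feature, centred at 0.0 and 5.0.
example_priors = {0: math.log(0.5), 1: math.log(0.5)}
example_summaries = {0: [(0.0, 1.0)], 1: [(5.0, 1.0)]}
```

With this toy model, `predict_single([0.2], example_priors, example_summaries)` selects class 0 and `predict_single([4.9], example_priors, example_summaries)` selects class 1.
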
Comment on lines +302 to +305
if not predictions:
    raise ValueError("Inputs must not be empty.")
if len(predictions) != len(actual):
    raise ValueError("Predictions and actual labels must have the same length.")
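
For context, the validation above would sit at the top of an accuracy helper along these lines. This is a sketch, not the PR's implementation: the signature and the choice to return a fraction in [0, 1] rather than a percentage are assumptions.

```python
def accuracy(predictions: list[int], actual: list[int]) -> float:
    """Return the fraction of predictions that match the actual labels."""
    if not predictions:
        raise ValueError("Inputs must not be empty.")
    if len(predictions) != len(actual):
        raise ValueError("Predictions and actual labels must have the same length.")
    # Booleans sum as 0/1, so this counts the matching positions.
    correct = sum(
        predicted == expected for predicted, expected in zip(predictions, actual)
    )
    return correct / len(predictions)
```

For example, `accuracy([1, 0, 1, 1], [1, 0, 0, 1])` gives 0.75, since three of the four predictions match.
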
Comment on lines +24 to +25
Time Complexity: O(n * k * d) for training, O(k * d) for prediction
where n = samples, k = classes, d = features
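
One way to sanity-check the training bound is to count how many feature values the training loop touches. The sketch below mirrors the structure of the training excerpt under two assumptions: samples are grouped in a single pass, and the statistics helper makes a constant number of passes over each column. The function name is hypothetical.

```python
def count_training_feature_visits(data: list[list[float]], labels: list[int]) -> int:
    """Count feature values read when summarising every class once."""
    separated: dict[int, list[list[float]]] = {}
    for row, label in zip(data, labels):
        separated.setdefault(label, []).append(row)

    visits = 0
    for class_samples in separated.values():
        n_features = len(class_samples[0])
        # Each column of this class is read once: len(class_samples) values.
        visits += n_features * len(class_samples)
    return visits


samples = [[0.0, 1.0, 2.0] for _ in range(6)]  # n = 6, d = 3
two_classes = count_training_feature_visits(samples, [0, 0, 0, 1, 1, 1])
three_classes = count_training_feature_visits(samples, [0, 0, 1, 1, 2, 2])
```

Both counts come out as n * d = 18 regardless of the number of classes, because each sample belongs to exactly one class. Under the stated assumptions this suggests the training loop does O(n * d) work, so the O(n * k * d) figure reads as a loose upper bound; prediction still evaluates every class for every feature, which matches O(k * d).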