Naive Bayes Classifier Calculator & Visual Guide

Build a simple naive Bayes classifier with binary (present/absent) features: set class priors and conditional probabilities, classify an observation, and see the posterior probabilities together with the Bayes’ rule steps.

Naive Bayes · Bayes’ theorem · Posterior probabilities · Discrete/Bernoulli features · Machine learning

1. Configure the naive Bayes model structure

Choose how many classes and binary features your model has. Features are treated as present/absent; for each class you will specify \(P(\text{feature present} \mid \text{class})\).

E.g. spam vs not spam, positive/neutral/negative sentiment, etc.

Features are binary (present/absent) in this tool.

Input mode

This calculator uses probabilities as input. To go from counts to probabilities, divide the number of training examples in which the feature is present by the total number of examples in that class. For sparse data you may want to use Laplace smoothing (see the training section below).
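For example (with made-up counts): if 30 of 40 training emails labelled spam contain the word “offer”, then \(P(\text{offer present} \mid \text{spam}) = 30/40 = 0.75\).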

2. Enter priors and conditional probabilities

For each class \(C_k\), enter a prior \(P(C_k)\). If all priors are left at 0 or empty, the calculator assumes equal priors. For each feature \(X_j\) and class \(C_k\), enter \(P(X_j = \text{present} \mid C_k)\).
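A minimal sketch of this default, assuming empty fields are treated as zero; whether non-zero priors are renormalized to sum to 1 is an assumption here, not something stated above:

```python
import numpy as np

def resolve_priors(raw_priors):
    """Fall back to equal priors when every entry is empty (None) or zero."""
    p = np.array([v if v is not None else 0.0 for v in raw_priors], dtype=float)
    if p.sum() == 0.0:
        return np.full(len(p), 1.0 / len(p))
    return p / p.sum()  # assumed normalization of user-entered priors

# e.g. all priors left empty -> equal priors for three classes
print(resolve_priors([None, None, None]))  # [0.333... 0.333... 0.333...]
```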


Each cell is \(P(\text{feature present} \mid \text{class})\) in [0, 1]. The probability of absence is \(1 - p\).

To avoid zero-probability issues, probabilities of exactly 0 or 1 are internally clipped slightly toward the interior of (0, 1) when computing log-likelihoods.
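A minimal sketch of this kind of clipping (the exact margin used by the calculator is not documented; 1e-9 here is an assumption):

```python
import numpy as np

EPS = 1e-9  # assumed margin; the calculator's exact value is not stated

def safe_log(p):
    """Clip probabilities away from exactly 0 and 1 before taking the log."""
    return np.log(np.clip(p, EPS, 1.0 - EPS))
```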

3. Describe the observation to classify

Use the toggles to indicate which features are present in the observation. Features left off are treated as absent.

Naive Bayes classifier in a nutshell

A naive Bayes classifier uses Bayes’ theorem to assign a class label to an observation. For a discrete feature vector \(x = (x_1, \dots, x_d)\) and classes \(C_1, \dots, C_K\), the classifier predicts the class with the largest posterior probability \[ P(C_k \mid x) = \frac{P(C_k) P(x \mid C_k)}{P(x)}. \] Since \(P(x)\) is the same for every class, we only need to compare \[ P(C_k \mid x) \propto P(C_k) P(x \mid C_k). \]
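As a concrete example with made-up numbers and two classes: if \(P(C_1) = 0.6\), \(P(C_2) = 0.4\), \(P(x \mid C_1) = 0.05\) and \(P(x \mid C_2) = 0.10\), the unnormalized scores are \(0.6 \times 0.05 = 0.03\) and \(0.4 \times 0.10 = 0.04\), so \(P(C_1 \mid x) = 0.03/0.07 \approx 0.43\) and \(P(C_2 \mid x) = 0.04/0.07 \approx 0.57\); the classifier predicts \(C_2\) even though \(C_1\) has the larger prior.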

The “naive” conditional independence assumption

The difficult part in high dimensions is modelling the joint likelihood \(P(x \mid C_k)\). Naive Bayes makes a simplifying – and often unrealistic – assumption: conditional independence of features given the class:

\[ P(x \mid C_k) = P(x_1, \dots, x_d \mid C_k) \approx \prod_{j=1}^{d} P(x_j \mid C_k). \]

For binary features (present/absent) this becomes especially simple. If we denote \(X_j \in \{0,1\}\) and write \[ \theta_{jk} = P(X_j = 1 \mid C_k), \] then \[ P(x_j \mid C_k) = \begin{cases} \theta_{jk}, & \text{if } x_j = 1,\\[4pt] 1 - \theta_{jk}, & \text{if } x_j = 0. \end{cases} \]
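A one-line sketch of this case split in Python (the function name is illustrative):

```python
def bernoulli_likelihood(x_j, theta_jk):
    """P(X_j = x_j | C_k) for a binary feature, with theta_jk = P(X_j = 1 | C_k)."""
    return theta_jk if x_j == 1 else 1.0 - theta_jk
```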

Posterior computation in this calculator

The calculator computes an unnormalized log-score for each class:

\[ s_k = \log P(C_k) + \sum_{j=1}^{d} \log P(x_j \mid C_k). \]

To avoid numerical underflow, we work in log space and then subtract the maximum score before exponentiating:

\[ \tilde p_k = \exp(s_k - s_{\max}), \quad P(C_k \mid x) = \frac{\tilde p_k}{\sum_{\ell=1}^{K} \tilde p_\ell}. \]

This yields stable posterior probabilities that sum to 1, even when there are many features.
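A minimal NumPy sketch of these two steps (the function name and the clipping margin `eps` are illustrative, not the calculator's actual implementation):

```python
import numpy as np

def naive_bayes_posteriors(priors, theta, x, eps=1e-9):
    """Posterior P(C_k | x) for binary features, computed in log space.

    priors : shape (K,),   P(C_k) for each class
    theta  : shape (K, d), theta[k, j] = P(X_j = 1 | C_k)
    x      : shape (d,),   observed 0/1 feature vector
    """
    priors = np.clip(np.asarray(priors, dtype=float), eps, 1.0)
    theta = np.clip(np.asarray(theta, dtype=float), eps, 1.0 - eps)
    x = np.asarray(x)

    # log P(x_j | C_k): log(theta) where x_j = 1, log(1 - theta) where x_j = 0
    log_lik = x * np.log(theta) + (1 - x) * np.log(1.0 - theta)

    # unnormalized log-scores s_k = log P(C_k) + sum_j log P(x_j | C_k)
    scores = np.log(priors) + log_lik.sum(axis=1)

    # subtract the maximum score before exponentiating (numerical stability)
    unnorm = np.exp(scores - scores.max())
    return unnorm / unnorm.sum()

# e.g. two classes, two binary features, observation with only feature 1 present
print(naive_bayes_posteriors(priors=[0.5, 0.5],
                             theta=[[0.9, 0.2],
                                    [0.3, 0.7]],
                             x=[1, 0]))
```

Here `x` is the 0/1 vector of toggles from step 3, and the index of the largest returned posterior (e.g. via `np.argmax`) is the predicted class.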

From counts to probabilities (training a naive Bayes model)

In practice, you usually start from a labelled dataset. For a binary feature \(X_j\) and class \(C_k\), let:

  • \(n_{jk}^{(1)}\): number of training examples in class \(C_k\) with feature \(X_j = 1\)
  • \(n_{k}\): total number of training examples in class \(C_k\)

A simple estimate of the conditional probability is:

\[ \hat \theta_{jk} = \frac{n_{jk}^{(1)}}{n_k}. \]

To avoid zero probabilities, many implementations use Laplace (add-one) smoothing:

\[ \hat \theta_{jk}^{(\text{Laplace})} = \frac{n_{jk}^{(1)} + \alpha}{n_k + 2\alpha}, \] where \(\alpha = 1\) is a common choice for Bernoulli features.
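A minimal sketch of these estimates (the function name is illustrative; `X` and `y` stand for a hypothetical binary feature matrix and label vector):

```python
import numpy as np

def estimate_bernoulli_naive_bayes(X, y, alpha=1.0):
    """Estimate priors P(C_k) and theta_jk = P(X_j = 1 | C_k) from labelled 0/1 data.

    X     : shape (n, d), binary feature matrix
    y     : shape (n,),   class labels
    alpha : Laplace smoothing parameter (alpha = 1 is add-one smoothing)
    """
    X = np.asarray(X)
    y = np.asarray(y)
    classes = np.unique(y)

    # prior: fraction of training examples in each class
    priors = np.array([(y == k).mean() for k in classes])

    # smoothed estimate (n_jk + alpha) / (n_k + 2 * alpha) for each feature and class
    theta = np.array([
        (X[y == k].sum(axis=0) + alpha) / ((y == k).sum() + 2 * alpha)
        for k in classes
    ])
    return classes, priors, theta
```

The resulting priors and \(\hat\theta_{jk}\) values can be typed directly into the tables in step 2.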

FAQ – naive Bayes classifier

How does naive Bayes compare to logistic regression?

Both are probabilistic classifiers, but they model different quantities. Naive Bayes models the class-conditional likelihoods \(P(x \mid C_k)\) and priors \(P(C_k)\), then uses Bayes’ theorem to obtain the posterior. Logistic regression directly models \(P(C_k \mid x)\) as a linear function of \(x\) passed through the logistic link. Logistic regression does not assume feature independence, but it usually requires more data and iterative optimization to train.

What are typical applications of naive Bayes?

Classic applications include spam filtering, document classification, sentiment analysis and simple medical triage. Because the model is extremely fast to train and to evaluate, it is often used as a baseline for large-scale text and recommender systems.

What are the main limitations of naive Bayes?

The independence assumption is often violated: correlated features can lead to over-confident probabilities. In addition, naive Bayes with simple likelihoods (e.g. Bernoulli or Gaussian) may underfit complex class-conditional structures. It is therefore not a universal solution but a strong baseline and a good choice when data are limited and interpretability and speed are important.

Can I use this calculator for continuous features?

This specific implementation assumes binary features (present/absent). In practice, continuous naive Bayes models (e.g. Gaussian naive Bayes) are common: each feature is modelled as a normal distribution conditional on the class. To use this calculator with continuous features, you would need to first discretize them into binary indicators (for example, high vs low).
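A minimal sketch of such a discretization (the threshold is a modelling choice you make per feature):

```python
import numpy as np

def binarize(values, threshold):
    """Map a continuous feature to a present (1) / absent (0) indicator."""
    return (np.asarray(values, dtype=float) > threshold).astype(int)

# e.g. treat "high temperature" as present above a hypothetical cut-off of 38.0
binarize([36.5, 39.2, 37.8], threshold=38.0)  # -> array([0, 1, 0])
```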