The Real Reason Why Logistic Regression Uses the Sigmoid Function
By Elijah Tobs
Tech
Jun 1, 2026 • 7:09 AM
8 min read
Source: Unsplash
The Core Insight
This article deconstructs the common, often flawed, explanations for why the sigmoid function is used in logistic regression. By moving beyond the 'squashing' intuition, it provides a formal derivation using Bayes' theorem, showing that the sigmoid function arises naturally when modeling posterior probabilities for binary classification with Gaussian classes that share a variance. It further explores the trade-offs between generative and discriminative modeling approaches.
Lead Tech Editor
Elijah Tobs
Elijah is a software engineer and technology editor with a passion for emerging tech, artificial intelligence, and consumer electronics.
The Kodawire Editorial Team consists of experienced journalists and subject matter experts dedicated to delivering accurate, well-researched, and engaging content.
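The Sigmoid isn't arbitrary: It emerges naturally from Bayes' Rule for binary classification, and when the class-conditional distributions are Gaussian with a shared variance, its argument is linear in the features, which is exactly the logistic regression model.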
Log-odds is a result, not a cause: Logistic regression doesn't "start" by modeling log-odds; that relationship is a mathematical consequence of the sigmoid function.
Generative vs. Discriminative: Use generative models (like LDA) when you have prior knowledge of data distributions; use discriminative models (like Logistic Regression) when you prefer flexibility and feature engineering.
Feature Engineering is key: If your classes have unequal variances, the optimal decision boundary becomes quadratic, and standard logistic regression cannot capture it unless you manually add polynomial features. (Unequal priors alone only shift the boundary; they don't bend it.)
If you’ve spent time in data science, you’ve likely been told that logistic regression uses the sigmoid function to "squash" linear outputs into a probability range of (0, 1). It’s a convenient, tidy explanation. But after years of working with these models, I’ve found it to be fundamentally hollow. It treats the sigmoid as a magic wand rather than a mathematical necessity.
The sigmoid function is a mathematical inevitability, not an arbitrary choice. (Credit: Jeswin Thomas via Unsplash)
Most online resources treat the sigmoid function as an arbitrary choice. They suggest it’s just a way to prevent gradient issues or that it’s "just what we do." These explanations are lazy and technically incorrect. The sigmoid function isn't a design choice; it is a mathematical inevitability derived from first principles.
Why You Can Trust This
I’ve spent the last decade building and auditing machine learning pipelines. When I set out to demystify the sigmoid, I didn't rely on the standard "squashing" narrative. Instead, I went back to foundational probability theory, specifically Bayes' Rule, to see how the sigmoid function emerges when we treat classification as a problem of estimating posterior probabilities. This article is the result of that deep dive, stripping away the marketing-speak often found in introductory tutorials.
Deriving Sigmoid from First Principles
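To understand why the sigmoid is the "correct" function, we have to stop looking at it as a transformation of linear regression and start looking at it as a posterior probability. Imagine we have two classes, A ($y=1$) and B ($y=0$), sampled from two normal distributions $\mathcal{N}(\mu_1, \sigma^2)$ and $\mathcal{N}(\mu_0, \sigma^2)$ with equal variance, and with priors $\pi_1 = P(y=1)$ and $\pi_0 = P(y=0)$. If we want to classify a new point $x$, we need to calculate the posterior probability, which Bayes' rule gives as $P(y=1|x) = \frac{P(x|y=1)\,\pi_1}{P(x|y=1)\,\pi_1 + P(x|y=0)\,\pi_0}$.
Dividing the numerator and denominator by $P(x|y=1)\,\pi_1$ puts the posterior in the form $P(y=1|x) = \frac{1}{1 + e^{-a}}$, where $a = \ln\frac{P(x|y=1)\,\pi_1}{P(x|y=0)\,\pi_0}$ is the log-odds. Now substitute the Gaussian probability density functions into $a$: because the two classes share a variance, the $x^2$ terms cancel out, leaving a linear function $a = wx + b$ with $w = (\mu_1 - \mu_0)/\sigma^2$ and $b = (\mu_0^2 - \mu_1^2)/(2\sigma^2) + \ln(\pi_1/\pi_0)$. The priors only shift $b$; with equal priors the last term vanishes. This is the "Aha!" moment: the sigmoid isn't something we force onto the data; it is the natural shape of the posterior probability when our class distributions are Gaussian with a shared variance.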
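A quick numerical sanity check of that derivation, written as a sketch: the snippet computes the posterior directly from Bayes' rule for two 1-D Gaussians with a shared standard deviation, then compares it with a sigmoid of the linear log-odds $wx + b$, where $w = (\mu_1 - \mu_0)/\sigma^2$ and $b = (\mu_0^2 - \mu_1^2)/(2\sigma^2) + \ln(\pi_1/\pi_0)$. The means, variance, and prior are arbitrary illustrative values (not from any dataset), and the priors are deliberately unequal to show that they only move the bias term.

```python
import numpy as np


def gaussian_pdf(x, mu, sigma):
    # Density of N(mu, sigma^2) evaluated at x
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Illustrative parameters (assumptions, not from a real dataset):
# class A (y = 1) ~ N(mu1, sigma^2) with prior pi1; class B (y = 0) ~ N(mu0, sigma^2) with prior pi0
mu1, mu0, sigma = 2.0, -1.0, 1.5
pi1 = 0.3
pi0 = 1.0 - pi1

x = np.linspace(-6.0, 6.0, 121)

# Posterior computed directly from Bayes' rule
joint1 = gaussian_pdf(x, mu1, sigma) * pi1
joint0 = gaussian_pdf(x, mu0, sigma) * pi0
posterior_bayes = joint1 / (joint1 + joint0)

# Closed form: with a shared variance the x^2 terms cancel, so the log-odds is w*x + b
w = (mu1 - mu0) / sigma ** 2
b = (mu0 ** 2 - mu1 ** 2) / (2.0 * sigma ** 2) + np.log(pi1 / pi0)
posterior_sigmoid = sigmoid(w * x + b)

# The two curves should agree up to floating-point rounding
print(np.max(np.abs(posterior_bayes - posterior_sigmoid)))
```

Changing `pi1` slides the curve left or right but never changes its shape, which is the "priors only shift $b$" point in code form.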
In my own testing, I’ve found that the "linear" assumption of logistic regression is its greatest weakness. If you are working with datasets where the classes have unequal variances (common in real-world fraud detection or medical diagnostics), a standard logistic regression model will produce a linear decision boundary that misses the nuance of the data. To fix this, you must perform polynomial feature engineering. Without it, you are essentially forcing a straight line through a parabolic reality.
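To see the unequal-variance problem concretely, here is a minimal scikit-learn sketch on synthetic 1-D data (the class distributions are assumed purely for illustration): both classes are centred at 0, but class 1 is three times as spread out. Plain logistic regression can only place a single threshold on $x$, while adding an $x^2$ column lets the log-odds be quadratic, which matches the Bayes-optimal form for this setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)


def sample(n):
    # Hypothetical toy data: class 0 ~ N(0, 1), class 1 ~ N(0, 3^2), equal priors
    x = np.concatenate([rng.normal(0.0, 1.0, n), rng.normal(0.0, 3.0, n)])
    y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])
    return x, y


def quadratic_features(x):
    # Manual feature engineering: add an x^2 column next to x
    return np.column_stack([x, x ** 2])


x_train, y_train = sample(2000)
x_test, y_test = sample(2000)

# Plain logistic regression: the log-odds is linear in x, so the boundary is a single threshold
linear = LogisticRegression(max_iter=1000).fit(x_train.reshape(-1, 1), y_train)
linear_acc = linear.score(x_test.reshape(-1, 1), y_test)

# With x^2 the log-odds can be quadratic, matching the Bayes-optimal form for this data
quad = LogisticRegression(max_iter=1000).fit(quadratic_features(x_train), y_train)
quad_acc = quad.score(quadratic_features(x_test), y_test)

print(f"linear features only: {linear_acc:.3f}")
print(f"with x^2 feature:     {quad_acc:.3f}")
```

Exact accuracies depend on the random seed; the point is the gap between the two models, not the specific numbers.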
Visualizing class distributions helps determine if a linear boundary is sufficient. (Credit: Brian McGowan via Unsplash)
The Other Side of the Story
Most practitioners argue that logistic regression is "simple and robust." I disagree. It is only robust if you understand the underlying distribution of your features. If you blindly apply logistic regression to complex, non-linear data without feature engineering, you aren't being "simple"; you are being inaccurate. The "simplicity" of logistic regression is often a mask for a lack of rigorous data analysis.
Generative vs. Discriminative: The Strategic Trade-off
The choice between generative models (like Naive Bayes or LDA) and discriminative models (like Logistic Regression) is a strategic one. Generative models describe how the features are distributed within each class, which requires strong assumptions about the data distribution. If those assumptions are correct, you need significantly less data to reach high accuracy. Discriminative models, however, are more flexible. They model $P(y|X)$ directly and make no assumptions about how the features themselves are distributed, but they demand that you do the heavy lifting through feature engineering.
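To make the trade-off concrete, here is a hedged scikit-learn sketch on synthetic data chosen so that LDA's assumptions hold exactly (two Gaussian classes with a shared covariance). Under those conditions both approaches end up with the same kind of linear boundary and almost always make the same predictions; they just get there differently. The class means, covariance, and sample sizes are illustrative assumptions.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)


def sample(n):
    # Assumed toy data where LDA's assumptions hold exactly:
    # two 2-D Gaussians with a shared identity covariance and equal priors
    cov = np.eye(2)
    X = np.vstack([
        rng.multivariate_normal([0.0, 0.0], cov, n),
        rng.multivariate_normal([2.0, 2.0], cov, n),
    ])
    y = np.repeat([0, 1], n)
    return X, y


X_train, y_train = sample(500)
X_test, y_test = sample(2000)

# Generative: estimate class means, a shared covariance and priors, then apply Bayes' rule
lda = LinearDiscriminantAnalysis().fit(X_train, y_train)

# Discriminative: fit P(y | X) directly by (L2-regularized) maximum likelihood
logreg = LogisticRegression().fit(X_train, y_train)

agreement = np.mean(lda.predict(X_test) == logreg.predict(X_test))
print(f"LDA accuracy:         {lda.score(X_test, y_test):.3f}")
print(f"LogReg accuracy:      {logreg.score(X_test, y_test):.3f}")
print(f"prediction agreement: {agreement:.3f}")
```

Shrinking `sample(500)` to a few dozen points per class is an easy way to explore the data-efficiency claim yourself; results will vary with the seed.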
The Decision Matrix
Not sure which model to use? Follow this logic:
Do you have prior knowledge of the data distribution, and does it match the model's assumptions (for LDA, Gaussian classes with a shared covariance)? Use a Generative Model (e.g., LDA).
Is your data complex or are you unsure of the distribution? Use a Discriminative Model (e.g., Logistic Regression).
Is your decision boundary non-linear? If using Logistic Regression, you must add polynomial features.
Future-Proofing Your Setup
Deep learning and transformer-based models dominate the headlines, but reliance on "black box" models is increasingly challenged by the need for explainable AI (XAI), and logistic regression remains a staple for interpretability. It is inherently interpretable, but only if you don't over-engineer your features to the point of obfuscation. Keep your features meaningful, and your model will remain relevant.
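One concrete reason it is interpretable: because the log-odds is linear in the features, increasing feature $j$ by one unit (holding the others fixed) multiplies the predicted odds $P(y=1|x)/P(y=0|x)$ by exactly $e^{w_j}$. The scikit-learn sketch below checks that identity on a model fitted to synthetic data; the data-generating coefficients and the query point are arbitrary assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)

# Synthetic data (assumed for illustration) drawn from a known linear log-odds
X = rng.normal(size=(1000, 2))
true_logit = 0.5 + 1.2 * X[:, 0] - 0.8 * X[:, 1]
y = (rng.random(1000) < 1.0 / (1.0 + np.exp(-true_logit))).astype(int)

model = LogisticRegression().fit(X, y)


def odds(p):
    # Convert a probability into odds p / (1 - p)
    return p / (1.0 - p)


x = np.array([[0.3, -0.4]])
x_plus = x + np.array([[1.0, 0.0]])  # bump feature 0 by one unit

ratio = odds(model.predict_proba(x_plus)[0, 1]) / odds(model.predict_proba(x)[0, 1])

# The odds ratio equals exp(coefficient of feature 0), regardless of the starting point x
print(ratio, np.exp(model.coef_[0, 0]))
```

This is also why meaningful features matter: $e^{w_j}$ is only easy to explain when feature $j$ itself is.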
Analytical Synthesis: When Assumptions Break
When we violate the Gaussian assumptions, specifically when the classes have unequal variances, the quadratic terms no longer cancel and the decision boundary becomes quadratic (parabolic) rather than linear. Unequal priors, by contrast, only shift the boundary without bending it. If you try to model this with a standard logistic regression, you will see the model struggle to separate the classes. This is where the practitioner's skill comes in. You aren't just training a model; you are mapping the geometry of your data. If the geometry is curved, your model must be capable of curvature.
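For a single feature $x$, with class A ($y=1$) distributed as $\mathcal{N}(\mu_1, \sigma_1^2)$ with prior $\pi_1$ and class B ($y=0$) as $\mathcal{N}(\mu_0, \sigma_0^2)$ with prior $\pi_0$, substituting both densities into the Bayes'-rule log-odds gives the expression below. It shows where the curvature comes from and why priors cannot create it:

```latex
\ln\frac{P(y=1 \mid x)}{P(y=0 \mid x)}
  = \left(\frac{1}{2\sigma_0^2} - \frac{1}{2\sigma_1^2}\right) x^2
  + \left(\frac{\mu_1}{\sigma_1^2} - \frac{\mu_0}{\sigma_0^2}\right) x
  + \frac{\mu_0^2}{2\sigma_0^2} - \frac{\mu_1^2}{2\sigma_1^2}
  + \ln\frac{\sigma_0}{\sigma_1}
  + \ln\frac{\pi_1}{\pi_0}
```

The prior ratio appears only in the constant term, so it moves the boundary without curving it. The $x^2$ coefficient is nonzero whenever $\sigma_0 \neq \sigma_1$, and that is what a plain logistic regression (whose log-odds is linear in $x$) cannot represent unless you supply $x^2$ as a feature. Setting $\sigma_0 = \sigma_1 = \sigma$ removes both the $x^2$ and $\ln(\sigma_0/\sigma_1)$ terms and recovers the linear, equal-variance case.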
Rigorous feature engineering is essential when dealing with non-linear data boundaries. (Credit: Mehedi Hasan via Unsplash)
My Recommended Setup
Scikit-Learn: For standard logistic regression and quick baseline testing.
Statsmodels: When I need deep statistical summaries and p-values to validate feature significance.
Pandas/NumPy: For the manual feature engineering required to handle non-linear boundaries.
The Practical Verdict
Logistic regression is not just a "linear model with a sigmoid." It is a powerful tool that, when understood through the lens of Bayes' Rule, reveals exactly why it works and where it fails. Stop treating the sigmoid as a black box. Start treating it as a consequence of your data's distribution. If you do that, you’ll stop guessing which model to use and start knowing.
Do you prefer the "black box" flexibility of modern neural networks, or do you still find yourself reaching for the interpretability of logistic regression in your daily work? I’ll be in the comments for the next 24 hours to discuss your experiences with model selection.
The sigmoid function is not an arbitrary choice; it is a mathematical inevitability that emerges from Bayes' Rule when modeling binary classification with Gaussian class-conditional distributions that share a variance.
You should use a generative model (like LDA) when you have prior knowledge of the data distribution, as they can reach high accuracy with less data under correct assumptions.
If your classes have unequal variances, standard logistic regression will still produce a linear decision boundary that may fail to separate them effectively. You must perform polynomial feature engineering to capture the non-linear boundary.