Math & Data
Abductive Reasoning
Retroduction
A form of logical inference which starts with an observation or set of observations, and then seeks to find the simplest and most likely explanation for the observations.
Algorithm
A specification of how to solve a class of problems — can be a calculation, a process for data, or a form of automated reasoning.
Ambiguous Middle Term
Ambiguous Middle
A categorical syllogism that uses an ambiguous middle term to make its three-part claim, such as, "Only man is a rational animal. No woman is a man. Therefore, no woman is a rational animal."
Ansatz
A starting assumption or educated guess used to solve a problem, which is then validated by whether it produces correct results. Common in mathematics and physics.
Anscombe's Quartet
A set of four datasets that share nearly identical summary statistics yet reveal strikingly different patterns when graphed, illustrating the importance of data visualization.
Asymptote
In geometry, an asymptote of a curve is a line such that the distance between the curve and the line approaches zero as one or both of the x or y coordinates tends to infinity — i.e. approaching a value or curve arbitrarily closely.
Bayes' Theorem
Bayesian Thinking
Mode of applying probability where rather than thinking in terms of frequency or likelihood of some phenomenon, one thinks in terms of current expectations, current states of knowledge, and a quantification of personal belief, wherein new information is processed in a systematic way as it comes in to continually improve on a given estimate.
Belief Bias
The tendency to judge the strength of arguments based on the plausibility of their conclusion rather than the validity of the reasoning. People accept arguments with believable conclusions and reject those with unbelievable ones, regardless of logical structure.
Bell Curve
Normal Distribution
The bell-shaped curve of a very common distribution of probabilities (hence it being called the 'normal' distribution) where the most probable events in a series of data occur at the highest point, and all other probabilities distribute uniformly below that in both directions, creating rare event 'tails' on the sides.
Benford's Law
Newcomb-Benford's Law · Law of Anomalous Numbers · First-Digit Law
The surprising observation that in many real-world datasets, the leading digit is far more likely to be 1 than 9. This counterintuitive pattern is so reliable it's used to detect fraud in financial data.
Berkson's Paradox
Berkson's Bias · Berkson's Fallacy
A statistical illusion where two positive traits appear negatively correlated because you're only observing a filtered subset. The missing data you never see is what creates the false pattern.
Bulverism
Psychogenetic Fallacy
The act of inferring why an argument is being used, associating it to some psychological reason, then assuming it is invalid as a result. The assumption that if the origin of an idea comes from a biased mind, then the idea itself must also be a falsehood.
Campbell's Law
The adage that "the more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor."
Central Limit Theorem
In probability, when independent random variables are added, their properly normalized sum tends toward a normal distribution (informally a "bell curve") even if the original variables themselves are not normally distributed.
Cherry Picking
Suppressed Evidence · Incomplete Evidence
The act of pointing at individual cases or data that seem to confirm a particular position, while ignoring a significant portion of related cases or data that may contradict that position.
Circular Reasoning
Petitio Principii · Begging the Question
A logical fallacy where the conclusion is assumed in the premise — the argument goes in a circle, proving nothing. "It's true because it's true."
Clustering Illusion
The tendency to erroneously consider the inevitable "streaks" or "clusters" arising in small samples from random distributions to be non-random.
Confidence Interval
Error Bar
A range of values (interval) that act as good estimates of the unknown overall population parameter.
Deductive Reasoning
Reasoning from one or more premises to reach a certain conclusion. If the premises are true, then the deduction is necessarily true — it is "top-down" (in contrast with 'Induction' which is "bottom-up").
Dimensionality Reduction
Dimension Reduction
In statistics, machine learning, and information theory, the process of reducing the number of random variables under consideration by obtaining a set of principal variables.
Double Counting
Counting events or occurrences more than once — seen in accounting as a mathematical error, but in macroeconomics as an embedded challenge where boundary problems and logical unit problems arise (i.e. household input is institutional output).
Early Stopping Problem
Optimal Stopping Problem
In mathematics, a situation concerned with the problem of choosing a time to take a particular action, in order to maximise an expected reward or minimise an expected cost.
Expected Value
The probability-weighted average of all possible values. For example, the expected value in rolling a six-sided die is 3.5, because the average of all the numbers that come up in an extremely large number of rolls is close to 3.5.
Fallibilism
The philosophical claim that no belief can have justification which guarantees the truth of the belief.
Fermi Problem
Back-of-the-Envelope Calculation · Ballpark · Guesstimation
A rough calculation to arrive at a reasonable estimate — unknowns and all — where the result could be considered logically approximate.
Fibonacci Numbers
Fibonacci Sequence
Numbers characterized by the fact that every number after the first two is the sum of the two preceding ones, in the following sequence: 1, 1, 2, 3, 5, 8, etc.
Galileo Gambit
A logical fallacy in which someone argues that their rejected or ridiculed ideas must be correct because Galileo was also ridiculed and turned out to be right — ignoring that being mocked does not make one correct — i.e., "everyone says I am wrong, _therefore_ I am right."
Gambler's Fallacy
Monte Carlo Fallacy · Fallacy of the Maturity of Chances
The mistaken belief that, if something happens more frequently than normal during a given period, it will happen less frequently in the future (or vice versa).
Golden Ratio
Golden Mean · Golden Section
A mathematical relationship where the ratio of two quantities equals the ratio of their sum to the larger of the two quantities (~1.618), a pattern that appears in natural environments including the spiral arrangement of leaves and other plant parts.
Goodhart's Law
Juking the Stats
When a measure becomes a target, it ceases to be a good measure. People optimize for the metric itself rather than the outcome it was meant to track.
Gray Rhino
An analogy referring to a highly probable, high impact yet neglected threat. Example include the impact of new technologies, global climate change, rising inequality, etc.
Hasty Generalization
Blanket Statement · Overgeneralization · Fallacy of Insufficient Statistics
Drawing a broad conclusion from too few instances or an unrepresentative sample, ignoring that the evidence is insufficient to support the generalization.
Hill Climbing
A mathematical optimization technique where an iterative algorithm starts with an arbitrary solution to a problem, then attempts to find a better solution by making an incremental change to the solution. If the change produces a better solution, another incremental change is made to the new solution, and so on until no further improvements can be found.
Hot Hand Fallacy
Hot Hand Phenomenon
The apparent phenomenon that a person who experiences a successful outcome with a random event has a greater probability of success in further attempts. Not necessarily a fallacy, as recent studies using modern statistical analysis show there is evidence for the "hot hand" in some sporting activities.
Ignoring a Common Cause
Assuming that correlations within data show that one variable causes another, and ignoring a possible underlying variable that is responsible for variables to correlate.
Improbability Factor
The dangerous assumption that because a known error is unlikely, it won't happen. In complex systems, improbable events become inevitable given enough time or scale.
Included Middle
Theory proposing that logic has a three-part structure: asserting something, the negation of this assertion, and a third position that is neither or both.
Inductive Reasoning
Induction
A method of reasoning in which the premises are viewed as supplying some evidence for the truth of the conclusion (in contrast to deductive reasoning and abductive reasoning).
Insensitivity to Sample Size
Sample Size Neglect
The cognitive bias that occurs when people judge the probability of obtaining a sample statistic without respect to the sample size. In other words, variation is more likely in smaller samples, but people may not expect this.
Inverse Gambler's Fallacy
The fallacy of concluding, on the basis of an unlikely outcome of a random process, that the process is likely to have occurred many times before.
Kettle Logic
Using multiple contradictory arguments to defend the same position, none of which are consistent with each other. Named after Freud's story of the borrowed kettle.
Kurtosis Risk
In statistics and decision theory, the risk that results when a statistical model assumes the normal distribution, but is applied to observations that have a tendency to occasionally be much farther (in terms of number of standard deviations) from the average than is expected for a normal distribution.
Law of Large Numbers
In probability theory, a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.
Long-Tail Distribution
In statistics, a model which describes a distribution of occurrences where a large portion of the distribution are far from the "head" or central part of the distribution. Often applied in a business, to apply to business models that can offer many different varieties of uncommon goods (Amazon or Netflix), as opposed to few varieties of common goods (Walmart).
Look Elsewhere Effect
Multiple Comparisons Problem
A phenomenon where an apparently statistically significant observation may have actually arisen by chance due to the sheer size of the parameter space to be searched.
Ludic Fallacy
The belief that the outcomes of non-regulated random occurrences can be encapsulated by statistics and modeling.
Modifiable Areal Unit Problem (MAUP)
MAUP
A source of statistical bias where point-based measures of spatial phenomena are aggregated into districts, (such as population density or illness rates), and the resulting summary values (e.g., totals, rates, proportions, densities) are arbitrarily influenced by both the shape and scale of the aggregation unit.
Monte Carlo Simulation
Monte Carlo Method
Algorithmic approach for building simulations and predictive models where the intervention of random variables makes them hard to predict in more standard models.
Neglect of Probability
Base Rate Neglect · Probability Neglect
The tendency to disregard probability when making a decision under uncertainty. Small risks are typically either neglected entirely or hugely overrated.
Newton's Flaming Laser Sword
Alder's Razor
Philosophical razor asserting that what cannot be settled by experiment is not worth debating — a stricter position than Occam's Razor, reflecting the view that disputes without observable consequences fall outside meaningful discourse.
Order of Magnitude
A way of expressing how large a number is by which power of ten it is closest to — used for rough comparison, where a difference of one order of magnitude means roughly a tenfold difference.
Outlier
An observation point that is distant from other observation, due perhaps to variability in the measurement, or perhaps an indication of experimental error.
Paradox
A statement that, despite apparently valid reasoning from true premises, leads to an apparently-self-contradictory or logically unacceptable conclusion.
Power Law
Pareto Distribution · Zipf's Law
A functional relationship between two quantities, where a relative change in one quantity results in a proportional relative change in the other quantity, independent of the initial size of those quantities.
Prevalence Effect
Low-Prevalence Effect
The phenomenon that one is more likely to miss (or fail to detect) a target with a low prevalence (or frequency) than a target with a high prevalence or frequency.
Proxy
Proxy Variable
A variable that is not in itself directly relevant, but that serves in place of an unobservable or immeasurable variable.
Pseudocertainty Effect
The tendency to treat an outcome as certain when it's merely probable, especially when framed as eliminating risk. Perceived certainty distorts decisions.
Recursion
Tautology
When something is defined in terms of itself — a function that calls itself, a story within a story. A powerful pattern in computing, mathematics, and thought.
Regression Toward the Mean
The statistical tendency that for any event where luck or probability plays a role, the extreme outcomes are followed by outcomes closer to the actual average.
S Curve
Sigmoid Function · Logistic Curve
A mathematical curve shaped like the letter S — starting slow, accelerating through a rapid growth phase, then leveling off at a natural limit. It describes technology adoption, population growth, learning curves, and many other natural processes.
Sampling Bias
A bias in which a sample is collected in such a way that some members of the intended population are less likely to be included than others, which results in a biased sample (a non-random sample of a population (or non-human factors) in which all individuals, or instances, were not equally likely to have been selected).
Selection Bias
Sampling Bias
The selection of individuals, groups or data for analysis in such a way that proper randomization is not achieved, thereby ensuring that the sample obtained is not representative of the population intended to be analyzed.
Self-Similarity
In mathematics, the characteristic of something being exactly or approximately similar to a part of itself (i.e. the whole has the same shape as one or more of the parts). Many objects in the real world, such as coastlines, are statistically self-similar, as parts of them show the same statistical properties at many scales.
Simpson's Paradox
Low Birth-Weight Paradox
A problem in statistics where trends appear in different groups of data but disappear (or even reverse) when these groups are combined.
Stochastic
A process or system that is inherently random or involves unpredictability.
Stochastic Volatility Models
Models where the variance of a stochastic process is itself randomly distributed — used in the field of mathematical finance to evaluate derivative securities, such as options.
Streetlight Effect
Drunkard's Search
The tendency to search for answers only where it's easiest to look, rather than where they're most likely to be found. Named after the joke about a man looking for lost keys under a streetlight because "that's where the light is."
Subadditivity Effect
The tendency to judge the probability of an overall event as less than the sum of its component parts — leading to probability estimates that add up to more than 100%.
Subway Uncertainty vs Coconut Uncertainty
A framework for two kinds of risk: subway uncertainty covers predictable, quantifiable variability (your commute might be late, but within bounds), while coconut uncertainty covers rare, unforeseeable events that defy statistical modeling.
Survivorship Bias
A focus on the examples that survive some process while accidentally overlooking those that did not survive — because they are no longer visible.
Systematic Bias
Systematic Error
The inherent tendency of a process to support particular outcomes — generally referring to human systems such as institutions, but also the bias in non-human systems (such as measurement instruments or mathematical models) that leads to systematic error in measurements or estimates.
Tail Distributions
Set of probability distributions that display particular characteristics, owing to their statistical makeup, such as a or 'fat-tailed' distribution, meaning they decay like a power law, or a 'normal' tail which follows the normal distribution.
Type I and Type II Errors
False Positives Vs. False Negatives
In statistical hypothesis testing, a Type I Error is the rejection of a true null hypothesis (also known as a "false positive" finding), while a Type II Error is failing to reject a false null hypothesis (also known as a "false negative" finding).
Zipf's Law
The observation that given a large sample of words used, the frequency of any word is inversely proportional to its rank in the frequency table. Therefore, the most frequent word will occur about twice as often as the second most frequent word, three times as often as the third most frequent word, etc.