Math & Data
Abductive Reasoning
Retroduction
A form of logical inference which starts with an observation or set of observations, and then seeks to find the simplest and most likely explanation for the observations.
Algorithm
A specification of how to solve a class of problems — can be a calculation, a process for data, or a form of automated reasoning.
Ambiguous Middle Term
Ambiguous Middle
A categorical syllogism that uses an ambiguous middle term to make its three-part claim, such as, "Only man is a rational animal. No woman is a man. Therefore, no woman is a rational animal."
Ansatz
An educated guess that is verified later by its results.
Anscombe's Quartet
An illustration of why data should be graphed rather than only summarized: four datasets with nearly identical simple descriptive statistics appear very different when plotted.
Asymptote
In geometry, an asymptote of a curve is a line such that the distance between the curve and the line approaches zero as one or both of the x or y coordinates tends to infinity — i.e. approaching a value or curve arbitrarily closely.
Bayes' Theorem
Bayesian Thinking
Mode of applying probability where rather than thinking in terms of frequency or likelihood of some phenomenon, one thinks in terms of current expectations, current states of knowledge, and a quantification of personal belief, wherein new information is processed in a systematic way as it comes in to continually improve on a given estimate.
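A minimal sketch of the updating step described above, using Bayes' theorem in its two-hypothesis form. The function name `bayes_update` and all numbers are hypothetical, chosen only for illustration.

```python
def bayes_update(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Return P(H | E) from a prior P(H) and the two likelihoods of the evidence."""
    numerator = p_evidence_given_h * prior
    evidence = numerator + p_evidence_given_not_h * (1.0 - prior)
    return numerator / evidence


# Hypothetical numbers: 1% prior belief, evidence seen 90% of the time when
# the hypothesis is true and 5% of the time when it is false.
posterior = bayes_update(0.01, 0.90, 0.05)

# A second, independent observation reuses the first posterior as the new prior.
posterior_2 = bayes_update(posterior, 0.90, 0.05)

print(round(posterior, 3), round(posterior_2, 3))
```

With these assumed inputs the estimate moves from 1% to roughly 15% after one observation and to roughly 77% after two, which is the "continually improve on a given estimate" behavior in the definition.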
Belief Bias
The tendency to judge the strength of arguments based on the plausibility of their conclusion rather than how strongly they support that conclusion. A person is more likely to accept arguments that support a conclusion that aligns with their values, beliefs and prior knowledge, while rejecting counterarguments to the conclusion.
Benford's Law
Newcomb-Benford's Law · Law of Anomalous Numbers · First-Digit Law
An observation about the frequency distribution of leading digits in many real-life sets of numerical data, where in many naturally occurring collections of numbers, the leading significant digit is likely to be small. For example, in sets that obey the law, the number 1 appears as the most significant digit about 30% of the time, while 9 appears as the most significant digit less than 5% of the time. If the digits were distributed uniformly, they would each occur about 11.1% of the time.
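The percentages quoted above follow from the usual statement of the law, P(d) = log10(1 + 1/d) for a leading digit d. A short sketch that tabulates it (the variable name `benford` is just for illustration):

```python
import math

# Probability of each leading digit d under Benford's Law: log10(1 + 1/d).
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for digit, probability in benford.items():
    print(digit, f"{probability:.1%}")
```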
Berkson's Paradox
Berkson's Bias · Berkson's Fallacy
A counterintuitive result in conditional probability and statistics where there is a false observation of a negative correlation between two positive traits, i.e., that members of a population which have some positive trait tend to lack a second. For example, a person may observe from their experience that fast food restaurants in their area which serve good hamburgers tend to serve bad fries and vice versa; but because they would likely not eat anywhere where both were bad, they fail to allow for the large number of restaurants in this category which would weaken or even flip the correlation.
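The restaurant example can be simulated. The sketch below assumes burger and fries quality are independent uniform scores in [0, 1] and that a restaurant is only ever visited if at least one score exceeds 0.5; both assumptions, the seed, and the helper `pearson` are hypothetical choices for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(1)
# (burger quality, fries quality), drawn independently of each other.
restaurants = [(rng.random(), rng.random()) for _ in range(20_000)]
# The observer never eats where both burgers and fries are bad.
visited = [(b, f) for b, f in restaurants if b > 0.5 or f > 0.5]

r_all = pearson(*zip(*restaurants))
r_visited = pearson(*zip(*visited))
print(round(r_all, 2), round(r_visited, 2))
```

Across all restaurants the correlation is near zero, while in the visited subset it is clearly negative, even though nothing links the two traits.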
Bulverism
Psychogenetic Fallacy
The act of inferring why an argument is being used, associating it to some psychological reason, then assuming it is invalid as a result. The assumption that if the origin of an idea comes from a biased mind, then the idea itself must also be a falsehood.
Campbell's Law
The adage that "the more any quantitative social indicator is used for social decision-making, the more subject it will be to corruption pressures and the more apt it will be to distort and corrupt the social processes it is intended to monitor."
Central Limit Theorem
In probability, when independent random variables are added, their properly normalized sum tends toward a normal distribution (informally a "bell curve") even if the original variables themselves are not normally distributed.
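A small simulation sketch of this statement, assuming sums of 30 uniform random numbers (which are not themselves normally distributed). The sample sizes and seed are arbitrary. If the normalized sums behave like a standard normal variable, about 68% of them should fall within one standard deviation of zero.

```python
import math
import random


def standardized_sum(n, rng):
    """Sum n Uniform(0, 1) draws, then subtract the mean and divide by the std. dev."""
    total = sum(rng.random() for _ in range(n))
    # A Uniform(0, 1) variable has mean 1/2 and variance 1/12.
    return (total - n * 0.5) / math.sqrt(n / 12.0)


rng = random.Random(0)
samples = [standardized_sum(30, rng) for _ in range(10_000)]
within_one_sd = sum(abs(z) <= 1.0 for z in samples) / len(samples)
print(f"fraction within one standard deviation: {within_one_sd:.3f}")
```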
Cherry Picking
Suppressed Evidence · Incomplete Evidence
The act of pointing at individual cases or data that seem to confirm a particular position, while ignoring a significant portion of related cases or data that may contradict that position.
Circular Reasoning
Assuming the Conclusion · Circulus in Demonstrando
The reasoner begins with what he or she is trying to end up with.
Clustering Illusion
The tendency to erroneously consider the inevitable "streaks" or "clusters" arising in small samples from random distributions to be non-random.
Confidence Interval
Error Bar
A range of values (interval), computed from sample data, that is likely to contain the unknown population parameter at a stated confidence level.
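As a rough sketch, an approximate 95% interval for a population mean can be computed as the sample mean plus or minus 1.96 standard errors. This relies on a normal approximation that is only reasonable for fairly large samples; the simulated data, seed, and variable names are hypothetical.

```python
import math
import random
import statistics

rng = random.Random(7)
# Hypothetical sample: 200 measurements from a population with mean 50, std. dev. 10.
sample = [rng.gauss(50.0, 10.0) for _ in range(200)]

mean = statistics.mean(sample)
std_error = statistics.stdev(sample) / math.sqrt(len(sample))

z = 1.96  # normal-approximation multiplier for a 95% interval
low, high = mean - z * std_error, mean + z * std_error
print(f"95% CI for the mean: ({low:.2f}, {high:.2f})")
```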
Deductive Reasoning
Reasoning from one or more premises to reach a certain conclusion. If the premises are true, then the deduction is necessarily true — it is "top-down" (in contrast with 'Induction' which is "bottom-up").
Dimensionality Reduction
Dimension Reduction
In statistics, machine learning, and information theory, the process of reducing the number of random variables under consideration by obtaining a set of principal variables.
Double Counting
Counting events or occurrences more than once — seen in accounting as a mathematical error, but in macroeconomics as an embedded challenge where boundary problems and logical unit problems arise (i.e. household input is institutional output).
Expected Value
The probability-weighted average of all possible values. For example, the expected value in rolling a six-sided die is 3.5, because the average of all the numbers that come up in an extremely large number of rolls is close to 3.5.
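The die example can be checked directly by weighting each face by its probability of 1/6. A minimal sketch using exact fractions:

```python
from fractions import Fraction

faces = range(1, 7)
# Each face of a fair six-sided die has probability 1/6.
expected = sum(Fraction(face, 6) for face in faces)
print(expected, float(expected))  # 7/2 3.5
```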
Fallibilism
The philosophical claim that no belief can have justification which guarantees the truth of the belief.
Fermi Problem
Back-of-the-Envelope Calculation · Ballpark · Guesstimation
A rough calculation to arrive at a reasonable estimate — unknowns and all — where the result could be considered logically approximate.
Fibonacci Sequence
Fibonacci Numbers
Numbers characterized by the fact that every number after the first two is the sum of the two preceding ones, in the following sequence: 1, 1, 2, 3, 5, 8, etc.
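The rule above translates into a short generator. This is one of several possible sketches; the name `fibonacci` is illustrative.

```python
from itertools import islice


def fibonacci():
    """Yield 1, 1, 2, 3, 5, 8, ... where each term is the sum of the previous two."""
    a, b = 1, 1
    while True:
        yield a
        a, b = b, a + b


first_ten = list(islice(fibonacci(), 10))
print(first_ten)  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```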
Galileo Gambit
Fallacious argument that someone's unconventional or controversial views are true simply because they were initially ridiculed or rejected, drawing a false comparison to Galileo's persecution by the church for his scientific discoveries.
Gambler's Fallacy
Monte Carlo Fallacy · Fallacy of the Maturity of Chances
The mistaken belief that, if something happens more frequently than normal during a given period, it will happen less frequently in the future (or vice versa).
Golden Ratio
Golden Mean · Golden Section
A mathematical relationship where the ratio of two quantities equals the ratio of their sum to the larger of the two quantities (~1.618), a pattern that appears in natural environments including the spiral arrangement of leaves and other plant parts.
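The defining relationship can be verified numerically. The sketch below takes the larger quantity to be the golden ratio itself and the smaller to be 1, then compares the two ratios named in the definition.

```python
import math

phi = (1 + math.sqrt(5)) / 2  # closed form of the golden ratio, about 1.618

a, b = phi, 1.0                  # larger and smaller quantity
ratio_of_parts = a / b           # larger : smaller
ratio_sum_to_larger = (a + b) / a  # sum : larger

print(ratio_of_parts, ratio_sum_to_larger)
```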
Goodhart's Law
Juking the Stats
An axiom stating that once a measure becomes a target, it ceases to be a good measure.
Gray Rhino
An analogy referring to a highly probable, high-impact yet neglected threat. Examples include the impact of new technologies, global climate change, rising inequality, etc.
Hasty Generalization
Blanket Statement · Law of Small Numbers
Making a rushed conclusion without considering all of the environmental factors or variables.
Hill Climbing
A mathematical optimization technique where an iterative algorithm starts with an arbitrary solution to a problem, then attempts to find a better solution by making an incremental change to the solution. If the change produces a better solution, another incremental change is made to the new solution, and so on until no further improvements can be found.
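A minimal sketch of the loop described above, on integers with a fixed step size. The objective `score`, the starting point, and the step are hypothetical; real uses typically need to handle plateaus and local optima, which this sketch does not.

```python
def hill_climb(score, start, step=1, max_iters=10_000):
    """Move to the better neighbor until neither neighbor improves the score."""
    current = start
    for _ in range(max_iters):
        best_neighbor = max((current - step, current + step), key=score)
        if score(best_neighbor) <= score(current):
            return current  # no incremental change improves the solution
        current = best_neighbor
    return current


def score(x):
    """Hypothetical objective with a single peak at x = 7."""
    return -(x - 7) ** 2


peak = hill_climb(score, start=0)
print(peak)  # 7
```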
Hot Hand Fallacy
Hot Hand Phenomenon
The apparent phenomenon that a person who experiences a successful outcome with a random event has a greater probability of success in further attempts. Not necessarily a fallacy, as recent studies using modern statistical analysis show there is evidence for the "hot hand" in some sporting activities.
Ignoring a Common Cause
Assuming that a correlation in the data shows that one variable causes another, while ignoring a possible underlying variable that is responsible for the correlation.
Improbability Factor
Assuming that it is improbable that a known error will occur.
Included Middle
Theory proposing that logic has a three-part structure: asserting something, the negation of this assertion, and a third position that is neither or both.
Inductive Reasoning
A method of reasoning in which the premises are viewed as supplying some evidence for the truth of the conclusion (in contrast to deductive reasoning and abductive reasoning).
Insensitivity to Sample Size
The cognitive bias that occurs when people judge the probability of obtaining a sample statistic without respect to the sample size. In other words, variation is more likely in smaller samples, but people may not expect this.
Inverse Gambler's Fallacy
The fallacy of concluding, on the basis of an unlikely outcome of a random process, that the process is likely to have occurred many times before.
Kettle Logic
Using multiple, jointly-inconsistent arguments to defend a position.
Kurtosis Risk
In statistics and decision theory, the risk that results when a statistical model assumes the normal distribution, but is applied to observations that have a tendency to occasionally be much farther (in terms of number of standard deviations) from the average than is expected for a normal distribution.
Law of Large Numbers
In probability theory, a theorem that describes the result of performing the same experiment a large number of times. According to the law, the average of the results obtained from a large number of trials should be close to the expected value, and will tend to become closer as more trials are performed.
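A simulation sketch of the law using fair die rolls, whose expected value is 3.5. The trial counts and seed are arbitrary choices; individual runs vary, but the largest run should land close to 3.5.

```python
import random

rng = random.Random(3)


def average_roll(n_rolls):
    """Average of n_rolls simulated rolls of a fair six-sided die."""
    return sum(rng.randint(1, 6) for _ in range(n_rolls)) / n_rolls


averages = {n: average_roll(n) for n in (10, 1_000, 100_000)}
for n, avg in averages.items():
    print(f"{n:>7} rolls: average {avg:.3f}")
```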
Long-Tail Distribution
In statistics, a model which describes a distribution of occurrences where a large portion of the distribution is far from the "head" or central part of the distribution. Often applied to business models that offer many varieties of uncommon goods (Amazon or Netflix), as opposed to a few varieties of common goods (Walmart).
Ludic Fallacy
The belief that the outcomes of non-regulated random occurrences can be encapsulated by statistics and modeling.
Modifiable Areal Unit Problem (MAUP)
A source of statistical bias where point-based measures of spatial phenomena (such as population density or illness rates) are aggregated into districts, and the resulting summary values (e.g., totals, rates, proportions, densities) are arbitrarily influenced by both the shape and scale of the aggregation unit.
Monte Carlo Simulation
Algorithmic approach for building simulations and predictive models where the intervention of random variables makes them hard to predict in more standard models.
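A small sketch of the idea: estimate a probability by repeated random trials instead of deriving it. The question chosen here (two fair dice summing to 10 or more) is hypothetical and simple enough that the exact answer, 6/36, is available for comparison.

```python
import random


def estimate_probability(trials, rng):
    """Estimate P(sum of two fair dice >= 10) by simulating many throws."""
    hits = sum(rng.randint(1, 6) + rng.randint(1, 6) >= 10 for _ in range(trials))
    return hits / trials


estimate = estimate_probability(100_000, random.Random(11))
exact = 6 / 36  # six of the 36 equally likely outcomes sum to 10 or more
print(f"simulated {estimate:.4f} vs exact {exact:.4f}")
```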
Multiple Comparisons Problem
Look Elsewhere Effect · Multiplicity · Multiple Testing Problem
A phenomenon where an apparently statistically significant observation may have actually arisen by chance due to the sheer size of the parameter space to be searched.
Neglect of Probability
The tendency to disregard probability when making a decision under uncertainty. Small risks are typically either neglected entirely or hugely overrated.
Newton's Flaming Laser Sword
Alder's Razor
An anecdote on the conflicting positions of scientists and philosophers on epistemology and knowledge, which can be summarized as, "what cannot be settled by experiment is not worth debating."
Normal Distribution
Bell Curve
The bell-shaped curve of a very common probability distribution (hence the name 'normal' distribution) where the most probable values lie at the highest point, the mean, and probabilities fall off symmetrically in both directions, creating rare-event 'tails' on the sides.
Optimal Stopping Problem
Early Stopping Problem · Secretary Problem
In mathematics, a situation concerned with the problem of choosing a time to take a particular action, in order to maximise an expected reward or minimise an expected cost.
Order of Magnitude
An approximate measure of the number of digits that a number has in the commonly-used base-ten number system. It is equal to the logarithm (base 10) rounded to a whole number. For example, the order of magnitude of 1500 is 3, because 1500 = 1.5 × 10^3.
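A minimal sketch of the calculation. It assumes "rounded" means rounded down (floor), which matches the exponent in scientific notation and the 1500 example above; the function name is hypothetical.

```python
import math


def order_of_magnitude(x):
    """Exponent of 10 when x is written in scientific notation (assumes x != 0)."""
    return math.floor(math.log10(abs(x)))


print(order_of_magnitude(1500))  # 3, since 1500 = 1.5 x 10^3
print(order_of_magnitude(0.02))  # -2, since 0.02 = 2 x 10^-2
```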
Outlier
An observation point that is distant from other observations, due perhaps to variability in the measurement, or perhaps an indication of experimental error.
Paradox
A statement that, despite apparently valid reasoning from true premises, leads to an apparently-self-contradictory or logically unacceptable conclusion.
Power Law
A functional relationship between two quantities, where a relative change in one quantity results in a proportional relative change in the other quantity, independent of the initial size of those quantities.
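A short sketch of the scale-independence in the definition, assuming the common form y = a * x^k. The constants `a` and `k` are arbitrary: doubling x multiplies y by the same factor, 2^k, whether x starts small or large.

```python
import math


def power_law(x, a=3.0, k=2.0):
    """Hypothetical power-law relationship y = a * x**k."""
    return a * x ** k


# Doubling x multiplies y by 2**k regardless of the starting size of x.
ratio_small = power_law(2 * 1.0) / power_law(1.0)
ratio_large = power_law(2 * 1000.0) / power_law(1000.0)
print(ratio_small, ratio_large)
```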
Prevalence Effect
The phenomenon that one is more likely to miss (or fail to detect) a target with a low prevalence (or frequency) than a target with a high prevalence or frequency.
Proxy
A variable that is not in itself directly relevant, but that serves in place of an unobservable or immeasurable variable.
Pseudocertainty Effect
The tendency for people to perceive an outcome as certain while it is actually uncertain.
Recursion
Tautology
The instance of a thing being defined in terms of itself or of its type.
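A familiar illustration is the factorial, which can be defined in terms of itself: n! = n × (n − 1)!, with 0! = 1 as the base case that stops the self-reference. A minimal sketch:

```python
def factorial(n):
    """Factorial of a non-negative integer, defined in terms of itself."""
    if n == 0:
        return 1  # base case: stops the recursion
    return n * factorial(n - 1)


print(factorial(5))  # 120
```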
Regression Toward the Mean
The statistical tendency that for any event where luck or probability plays a role, the extreme outcomes are followed by outcomes closer to the actual average.
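A simulation sketch of the effect, assuming each observed score is a fixed "skill" plus independent "luck" of equal spread. The model, population size, and seed are hypothetical. The top 10% on a first attempt are selected, and their average is compared with their average on a second attempt.

```python
import random

rng = random.Random(42)
n = 5_000

skill = [rng.gauss(0, 1) for _ in range(n)]
first = [s + rng.gauss(0, 1) for s in skill]   # skill plus luck, attempt 1
second = [s + rng.gauss(0, 1) for s in skill]  # same skill, fresh luck

# Indices of the top 10% of performers on the first attempt.
top = sorted(range(n), key=lambda i: first[i], reverse=True)[: n // 10]

mean_first = sum(first[i] for i in top) / len(top)
mean_second = sum(second[i] for i in top) / len(top)
print(f"top group: first {mean_first:.2f}, second {mean_second:.2f}")
```

The same group still scores above the overall average of zero on the second attempt, but noticeably closer to it, because the luck that helped select them does not repeat.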
S Curve
Sigmoid Function
A mathematical function having a characteristic "S"-shaped (or "sigmoid") curve that exhibits a progression from small beginnings that accelerates and approaches a climax over time, as displayed by many natural processes. Examples include the response of crop yield to soil salinity and to the depth of the water table in the soil.
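One common S-shaped function is the logistic function, 1 / (1 + e^(−x)). The sketch below evaluates it at a few arbitrary points to show the slow start, rapid middle, and flattening end.

```python
import math


def logistic(x):
    """Standard logistic function, a common example of a sigmoid curve."""
    return 1.0 / (1.0 + math.exp(-x))


values = [logistic(x) for x in (-6, -2, 0, 2, 6)]
print([round(v, 3) for v in values])
```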
Sampling Bias
A bias in which a sample is collected in such a way that some members of the intended population are less likely to be included than others. The result is a biased, non-random sample in which not all individuals or instances (human or non-human) were equally likely to have been selected.
Selection Bias
The selection of individuals, groups or data for analysis in such a way that proper randomization is not achieved, thereby ensuring that the sample obtained is not representative of the population intended to be analyzed.
Self-Similarity
In mathematics, the characteristic of something being exactly or approximately similar to a part of itself (i.e. the whole has the same shape as one or more of the parts). Many objects in the real world, such as coastlines, are statistically self-similar, as parts of them show the same statistical properties at many scales.
Simpson's Paradox
Low Birth-Weight Paradox
A problem in statistics where trends appear in different groups of data but disappear (or even reverse) when these groups are combined.
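A worked sketch with made-up counts: option A has the higher success rate inside each group, yet option B has the higher rate once the groups are pooled, because the options were tried unevenly across the groups. All numbers and names are hypothetical.

```python
from fractions import Fraction

# Hypothetical (successes, trials) for two options within two groups.
data = {
    "group 1": {"A": (9, 10), "B": (80, 100)},
    "group 2": {"A": (30, 100), "B": (2, 10)},
}


def rate(successes, trials):
    return Fraction(successes, trials)


def combined_rate(option):
    """Success rate for an option after pooling all groups."""
    successes = sum(group[option][0] for group in data.values())
    trials = sum(group[option][1] for group in data.values())
    return Fraction(successes, trials)


a_wins_each_group = all(rate(*g["A"]) > rate(*g["B"]) for g in data.values())
b_wins_combined = combined_rate("B") > combined_rate("A")
print(a_wins_each_group, b_wins_combined)  # True True
```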
Stochastic
Derived from the Greek word "stochastikos," meaning "pertaining to conjecture" or "random." It is used across various fields to describe processes or systems that are inherently random or involve a degree of randomness and unpredictability.
Stochastic Volatility Models
Models where the variance of a stochastic process is itself randomly distributed — used in the field of mathematical finance to evaluate derivative securities, such as options.
Streetlight Effect
Drunkard's Search
A type of observational bias that occurs when people only search for something where it is easiest to look. Named for the well-known joke where a policeman sees a drunk man searching for something under a streetlight and asks what the drunk has lost. He says he lost his keys and they both look under the streetlight together. After a few minutes the policeman asks if he is sure he lost them here, and the drunk replies, no, and that he lost them in the park. The policeman asks why he is searching here, and the drunk replies, "this is where the light is."
Subadditivity Effect
The tendency to judge the probability of the whole to be less than the sum of the probabilities of its parts.
Subway Uncertainty vs Coconut Uncertainty
Two related types of risks: Subway Uncertainty refers to well-defined, calculable risks, while Coconut Uncertainty refers to more unpredictable, unexpected risks.
Survivorship Bias
A focus on the examples that survive some process while accidentally overlooking those that did not survive — because they are no longer visible.
Systematic Bias
The inherent tendency of a process to support particular outcomes — generally referring to human systems such as institutions, but also the bias in non-human systems (such as measurement instruments or mathematical models) that leads to systematic error in measurements or estimates.
Tail Distributions
Set of probability distributions that display particular characteristics, owing to their statistical makeup, such as a 'fat-tailed' distribution, whose tail decays like a power law, or a 'normal' tail, which follows the normal distribution.
Type I and Type II Errors
False Positives Vs. False Negatives
In statistical hypothesis testing, a Type I Error is the rejection of a true null hypothesis (also known as a "false positive" finding), while a Type II Error is failing to reject a false null hypothesis (also known as a "false negative" finding).
Zipf's Law
The observation that given a large sample of words used, the frequency of any word is inversely proportional to its rank in the frequency table. Therefore, the most frequent word will occur about twice as often as the second most frequent word, three times as often as the third most frequent word, etc.
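In its idealized form, the law says the expected count at a given rank is the top count divided by that rank. A minimal sketch, with a hypothetical top count of 6000 occurrences:

```python
def zipf_expected_counts(top_count, n_ranks):
    """Idealized Zipf counts: the word at rank r occurs top_count / r times."""
    return [top_count / rank for rank in range(1, n_ranks + 1)]


counts = zipf_expected_counts(6000, 4)
print(counts)  # [6000.0, 3000.0, 2000.0, 1500.0]
```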