Topic: Statistics (Page 7)

You are looking at all articles with the topic "Statistics". We found 70 matches.


🔗 Chinese restaurant process

🔗 Statistics

In probability theory, the Chinese restaurant process is a discrete-time stochastic process, analogous to seating customers at tables in a Chinese restaurant. Imagine a Chinese restaurant with an infinite number of circular tables, each with infinite capacity. Customer 1 sits at the first table. The next customer either sits at the same table as customer 1, or the next table. This continues, with each customer choosing to either sit at an occupied table with a probability proportional to the number of customers already there (i.e., they are more likely to sit at a table with many customers than few), or an unoccupied table. At time n, the n customers have been partitioned among m ≤ n tables (or blocks of the partition). The results of this process are exchangeable, meaning the order in which the customers sit does not affect the probability of the final distribution. This property greatly simplifies a number of problems in population genetics, linguistic analysis, and image recognition.
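The seating rule above can be sketched as a short simulation. This is a minimal illustration, not a reference implementation: the text does not say how much weight an unoccupied table gets, so the `theta` parameter (set to 1 here) is an assumption, as are the function name and the seed.

```python
import random

def chinese_restaurant_process(n_customers, theta=1.0, seed=None):
    """Seat customers one at a time and return the list of table sizes.

    theta is an assumed weight for opening a new table; the source text
    only says occupied tables are chosen in proportion to their size.
    """
    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers seated at table k
    for _ in range(n_customers):
        # Occupied table k has weight tables[k]; a new table has weight theta.
        weights = tables + [theta]
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(tables):
            tables.append(1)      # open a new table
        else:
            tables[choice] += 1   # join an occupied table
    return tables

sizes = chinese_restaurant_process(100, seed=0)
print(len(sizes), "tables with sizes", sizes)
```

The table sizes always sum to the number of customers, and the number of tables m never exceeds n, matching the partition described in the text.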

David J. Aldous attributes the restaurant analogy to Jim Pitman and Lester Dubins in his 1983 book.

Discussed on

"Chinese restaurant process" | 2014-02-17 | 11 Upvotes 5 Comments

🔗 Datasaurus dozen – Different datasets with the same descriptive statistics

🔗 Mathematics 🔗 Statistics

The Datasaurus dozen comprises thirteen data sets that have nearly identical simple descriptive statistics to two decimal places, yet have very different distributions and appear very different when graphed. It was inspired by the smaller Anscombe's quartet.
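The same effect can be seen on a smaller scale with the y-values of Anscombe's quartet. The sketch below uses the quartet as a stand-in (it is not the Datasaurus data) and compares the mean and sample variance of each set with its extremes.

```python
from statistics import mean, variance

# y-values of the four sets in Anscombe's quartet (not the Datasaurus dozen).
anscombe_y = {
    "I":   [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68],
    "II":  [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74],
    "III": [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73],
    "IV":  [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
}

means = {name: mean(ys) for name, ys in anscombe_y.items()}
variances = {name: variance(ys) for name, ys in anscombe_y.items()}

for name, ys in anscombe_y.items():
    # Summary statistics nearly agree, while the range of values does not.
    print(f"{name:>3}: mean={means[name]:.3f} var={variances[name]:.3f} "
          f"min={min(ys)} max={max(ys)}")
```

The means and variances agree closely across the four sets even though their largest values differ by several units, which is why plotting the data matters.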

Discussed on

"Datasaurus dozen – Different datasets with the same descriptive statistics" | 2024-12-14 | 14 Upvotes 1 Comments

🔗 Survivorship bias – Wikipedia

🔗 Statistics

Survivorship bias or survival bias is the logical error of concentrating on the people or things that made it past some selection process and overlooking those that did not, typically because of their lack of visibility. This can lead to false conclusions in several different ways. It is a form of selection bias.

Survivorship bias can lead to overly optimistic beliefs because failures are ignored, such as when companies that no longer exist are excluded from analyses of financial performance. It can also lead to the false belief that the successes in a group have some special property, rather than being just coincidence (the mistaken inference that correlation proves causality). For example, if three of the five students with the best college grades went to the same high school, that can lead one to believe that the high school must offer an excellent education. This could be true, but the question cannot be answered without looking at the grades of all the other students from that high school, not just the ones who "survived" the top-five selection process. Another, distinct mode of survivorship bias is thinking that an incident was not as dangerous as it was because everyone you communicate with afterwards survived. Even if you know that some people died, they cannot add their voices to the conversation, which biases it.
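The financial-performance case can be illustrated with a toy simulation. Everything here is hypothetical: the fund count, the return distribution, and the rule that a fund "fails" when its return falls below a threshold are assumptions chosen only to show the effect.

```python
import random

def survivorship_demo(n_funds=1000, threshold=0.0, seed=0):
    """Compare the mean return of all funds with that of survivors only."""
    rng = random.Random(seed)
    # Hypothetical annual returns drawn around a true mean of zero.
    returns = [rng.gauss(0.0, 0.10) for _ in range(n_funds)]
    # Funds at or below the threshold are assumed to close and vanish
    # from the data set an analyst would see later.
    survivors = [r for r in returns if r > threshold]
    mean_all = sum(returns) / len(returns)
    mean_survivors = sum(survivors) / len(survivors)
    return mean_all, mean_survivors

mean_all, mean_survivors = survivorship_demo()
print(f"all funds: {mean_all:+.3f}   survivors only: {mean_survivors:+.3f}")
```

Averaging only the survivors overstates performance, because the excluded funds are exactly the ones with the worst returns.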

Discussed on

"Survivorship bias – Wikipedia" | 2017-07-11 | 14 Upvotes 1 Comments

🔗 Benford's Law: Fraud Detection

🔗 Mathematics 🔗 Statistics

Benford's law, also called the Newcomb–Benford law, the law of anomalous numbers, or the first-digit law, is an observation about the frequency distribution of leading digits in many real-life sets of numerical data. The law states that in many naturally occurring collections of numbers, the leading digit is likely to be small. For example, in sets that obey the law, the number 1 appears as the leading significant digit about 30% of the time, while 9 appears as the leading significant digit less than 5% of the time. If the digits were distributed uniformly, they would each occur about 11.1% of the time. Benford's law also makes predictions about the distribution of second digits, third digits, digit combinations, and so on.
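The percentages quoted above follow from the usual statement of the law, P(d) = log_b(1 + 1/d), which the text does not spell out. A minimal sketch, with the function name chosen here for illustration:

```python
import math

def benford_probability(digit, base=10):
    """Probability that `digit` is the leading digit under Benford's law."""
    return math.log(1 + 1 / digit, base)

probabilities = {d: benford_probability(d) for d in range(1, 10)}

for d, p in probabilities.items():
    print(f"leading digit {d}: {p:.1%}")
```

The probability for 1 comes out at about 30% and for 9 at under 5%, consistent with the figures in the text, and the nine probabilities sum to 1.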

Benford's law for base 10 is one of infinitely many cases of a generalized law regarding numbers expressed in arbitrary (integer) bases, which rules out the possibility that the phenomenon might be an artifact of the base 10 number system. Further generalizations were published by Hill in 1995, including analogous statements for both the nth leading digit and the joint distribution of the leading n digits, the latter of which leads to a corollary wherein the significant digits are shown to be a statistically dependent quantity.

It has been shown that this result applies to a wide variety of data sets, including electricity bills, street addresses, stock prices, house prices, population numbers, death rates, lengths of rivers, and physical and mathematical constants. Like other general principles about natural data (for example, the fact that many data sets are well approximated by a normal distribution), there are illustrative examples and explanations that cover many of the cases where Benford's law applies, though there are many other cases where Benford's law applies that resist a simple explanation. It tends to be most accurate when values are distributed across multiple orders of magnitude, especially if the process generating the numbers is described by a power law (power laws are common in nature).

The law is named after physicist Frank Benford, who stated it in 1938 in a paper titled "The Law of Anomalous Numbers", although it had been previously stated by Simon Newcomb in 1881.

Discussed on

"Benford's Law: Fraud Detection" | 2020-11-06 | 13 Upvotes 2 Comments

🔗 Edward Tufte

🔗 Biography 🔗 Mathematics 🔗 Statistics 🔗 Systems 🔗 Biography/science and academia 🔗 Systems/Visualization 🔗 Graphic design

Edward Rolf Tufte (born March 14, 1942) is an American statistician and professor emeritus of political science, statistics, and computer science at Yale University. He is noted for his writings on information design and as a pioneer in the field of data visualization.

Discussed on

"Edward Tufte" | 2009-12-26 | 12 Upvotes 2 Comments

🔗 Gauss–Markov theorem

🔗 Russia 🔗 Mathematics 🔗 Statistics 🔗 Russia/science and education in Russia

In statistics, the Gauss–Markov theorem states that the ordinary least squares (OLS) estimator has the lowest sampling variance within the class of linear unbiased estimators, if the errors in the linear regression model are uncorrelated, have equal variances and expectation value of zero. The errors do not need to be normal, nor do they need to be independent and identically distributed (only uncorrelated with mean zero and homoscedastic with finite variance). The requirement that the estimator be unbiased cannot be dropped, since biased estimators exist with lower variance. See, for example, the James–Stein estimator (which also drops linearity) or ridge regression.
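In symbols, the statement above can be written as follows. The notation is an assumption introduced here (the text uses none): X is a fixed n × p design matrix of full column rank, and the comparison estimator is any linear unbiased estimator of the form Cy.

```latex
\begin{align*}
  y &= X\beta + \varepsilon, &
  \operatorname{E}[\varepsilon] &= 0, &
  \operatorname{Var}(\varepsilon) &= \sigma^2 I_n, \\
  \hat\beta_{\mathrm{OLS}} &= (X^\top X)^{-1} X^\top y, \\
  \operatorname{Var}(\tilde\beta) - \operatorname{Var}(\hat\beta_{\mathrm{OLS}}) &\succeq 0
  && \text{for every linear unbiased } \tilde\beta = Cy .
\end{align*}
```

The last line says that the difference of the covariance matrices is positive semidefinite, which is the precise sense of "lowest sampling variance".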

The theorem was named after Carl Friedrich Gauss and Andrey Markov, although Gauss' work significantly predates Markov's. But while Gauss derived the result under the assumption of independence and normality, Markov reduced the assumptions to the form stated above. A further generalization to non-spherical errors was given by Alexander Aitken.

🔗 A graph is moral if two nodes that have a common child are married

🔗 Computing 🔗 Mathematics 🔗 Statistics 🔗 Robotics

In graph theory, a moral graph is used to find the equivalent undirected form of a directed acyclic graph. It is a key step of the junction tree algorithm, used in belief propagation on graphical models.

The moralized counterpart of a directed acyclic graph is formed by adding edges between all pairs of non-adjacent nodes that have a common child, and then making all edges in the graph undirected. Equivalently, a moral graph of a directed acyclic graph G is an undirected graph in which each node of the original G is now connected to its Markov blanket. The name stems from the fact that, in a moral graph, two nodes that have a common child are required to be married by sharing an edge.
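The moralization step can be sketched in a few lines. The representation is an assumption made for illustration: the DAG is given as a dictionary mapping each node to its parents, and undirected edges are stored as frozensets.

```python
from itertools import combinations

def moralize(parents):
    """Return the undirected edge set of the moral graph of a DAG.

    parents maps each node to an iterable of its parent nodes.
    """
    edges = set()
    for child, node_parents in parents.items():
        node_parents = list(node_parents)
        # Keep every original edge, dropping its direction.
        for parent in node_parents:
            edges.add(frozenset((parent, child)))
        # "Marry" every pair of parents that share this child.
        for u, v in combinations(node_parents, 2):
            edges.add(frozenset((u, v)))
    return edges

# Hypothetical example: A -> C <- B, and C -> D.
dag = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}
moral_edges = moralize(dag)
print(sorted(tuple(sorted(edge)) for edge in moral_edges))
```

In the example, A and B are not adjacent in the DAG but share the child C, so the moral graph gains the edge A-B alongside the three original edges.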

Moralization may also be applied to mixed graphs, called in this context "chain graphs". In a chain graph, a connected component of the undirected subgraph is called a chain. Moralization adds an undirected edge between any two vertices that both have outgoing edges to the same chain, and then forgets the orientation of the directed edges of the graph.

🔗 Gompertz Function

🔗 Mathematics 🔗 Statistics

The Gompertz curve or Gompertz function is a type of mathematical model for a time series, named after Benjamin Gompertz (1779–1865). It is a sigmoid function which describes growth as being slowest at the start and end of a given time period. The right-side or future value asymptote of the function is approached much more gradually by the curve than the left-side or lower valued asymptote. This is in contrast to the simple logistic function, in which both asymptotes are approached by the curve symmetrically. It is a special case of the generalised logistic function. The function was originally designed to describe human mortality, but has since been modified for use in biology to describe populations.
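The asymmetry can be checked numerically. The text gives no formula, so the sketch below assumes the common parameterization f(t) = a·exp(−b·exp(−c·t)), with a as the upper asymptote; the parameter values are arbitrary.

```python
import math

def gompertz(t, a=1.0, b=5.0, c=1.0):
    """Gompertz curve a * exp(-b * exp(-c * t)) (assumed parameterization)."""
    return a * math.exp(-b * math.exp(-c * t))

# With this form, growth is fastest where b * exp(-c * t) = 1.
t_inflection = math.log(5.0) / 1.0
value_at_inflection = gompertz(t_inflection)

print(f"value at fastest growth: {value_at_inflection:.3f} of the asymptote")
for t in (-2, 0, 2, 4, 8):
    print(f"t={t:>2}: {gompertz(t):.4f}")
```

The curve is growing fastest when it has reached only a/e, roughly 37% of its upper asymptote, whereas a logistic curve is growing fastest at 50%. That leaves the Gompertz curve a longer, slower approach to its upper limit.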

🔗 Accuracy and Precision

🔗 Mathematics 🔗 Statistics 🔗 Psychology

Accuracy and precision are two measures of observational error. Accuracy is how close a given set of measurements (observations or readings) are to their true value. Precision is how close the measurements are to each other.

The International Organization for Standardization (ISO) defines a related measure: trueness, "the closeness of agreement between the arithmetic mean of a large number of test results and the true or accepted reference value."

While precision is a description of random errors (a measure of statistical variability), accuracy has two different definitions:

More commonly, a description of systematic errors (a measure of statistical bias of a given measure of central tendency, such as the mean). In this definition of "accuracy", the concept is independent of "precision", so a particular set of data can be said to be accurate, precise, both, or neither. This concept corresponds to ISO's trueness.
A combination of both precision and trueness, accounting for the two types of observational error (random and systematic), so that high accuracy requires both high precision and high trueness. This usage corresponds to ISO's definition of accuracy (trueness and precision).
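The distinction can be made concrete with two made-up sets of readings of a quantity whose true value is assumed to be 10. In this sketch, trueness is measured as the bias of the mean and precision as the sample standard deviation; the data and function name are hypothetical.

```python
from statistics import mean, stdev

def bias_and_spread(measurements, true_value):
    """Return (bias of the mean, sample standard deviation)."""
    bias = mean(measurements) - true_value   # systematic error (trueness)
    spread = stdev(measurements)             # random error (precision)
    return bias, spread

true_value = 10.0
# Hypothetical readings, invented for illustration only.
precise_but_biased = [10.9, 11.0, 11.1, 11.0]
true_but_scattered = [8.0, 12.0, 9.0, 11.0]

bias_a, spread_a = bias_and_spread(precise_but_biased, true_value)
bias_b, spread_b = bias_and_spread(true_but_scattered, true_value)

print(f"precise but biased: bias={bias_a:+.2f}, spread={spread_a:.2f}")
print(f"true but scattered: bias={bias_b:+.2f}, spread={spread_b:.2f}")
```

The first set is precise but not true; the second is true on average but imprecise. Under the second definition above, neither counts as highly accurate.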

🔗 Zipf's Law

🔗 Mathematics 🔗 Statistics 🔗 Linguistics 🔗 Linguistics/Applied Linguistics

Zipf's law is an empirical law formulated using mathematical statistics that refers to the fact that many types of data studied in the physical and social sciences can be approximated with a Zipfian distribution, one of a family of related discrete power law probability distributions. The Zipf distribution is related to the zeta distribution, but is not identical to it.

Zipf's law was originally formulated in terms of quantitative linguistics, stating that given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. Thus the most frequent word will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, etc.: the rank-frequency distribution is an inverse relation. For example, in the Brown Corpus of American English text, the word "the" is the most frequently occurring word, and by itself accounts for nearly 7% of all word occurrences (69,971 out of slightly over 1 million). True to Zipf's law, the second-place word "of" accounts for slightly over 3.5% of words (36,411 occurrences), followed by "and" (28,852). Only 135 vocabulary items are needed to account for half the Brown Corpus.
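The Brown Corpus counts quoted above can be compared with the idealized inverse-rank rule, under which the word of rank r occurs 1/r times as often as the top word. The helper name below is chosen for illustration.

```python
# Counts for the three most frequent words, as quoted in the text.
brown_counts = [("the", 69971), ("of", 36411), ("and", 28852)]

def zipf_prediction(top_count, rank):
    """Count predicted for a given rank if frequency is proportional to 1/rank."""
    return top_count / rank

top_count = brown_counts[0][1]
for rank, (word, observed) in enumerate(brown_counts, start=1):
    predicted = zipf_prediction(top_count, rank)
    print(f"rank {rank} {word!r}: observed {observed}, "
          f"predicted {predicted:.0f}, ratio {observed / predicted:.2f}")
```

The second-ranked word lands close to the prediction, while the third is noticeably above it, a reminder that the law is an approximation.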

The law is named after the American linguist George Kingsley Zipf (1902–1950), who popularized it and sought to explain it (Zipf 1935, 1949), though he did not claim to have originated it. The French stenographer Jean-Baptiste Estoup (1868–1950) appears to have noticed the regularity before Zipf. It was also noted in 1913 by German physicist Felix Auerbach (1856–1933).