Topic: Statistics (Page 4)

You are looking at all articles with the topic "Statistics". We found 68 matches.


🔗 History of the Monte Carlo method

🔗 Computing 🔗 Computer science 🔗 Mathematics 🔗 Physics 🔗 Statistics

Monte Carlo methods, or Monte Carlo experiments, are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. The underlying concept is to use randomness to solve problems that might be deterministic in principle. They are often used in physical and mathematical problems and are most useful when it is difficult or impossible to use other approaches. Monte Carlo methods are mainly used in three problem classes: optimization, numerical integration, and generating draws from a probability distribution.
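As a minimal sketch of the numerical-integration use case (not drawn from the article itself), the following Python snippet estimates the integral of exp(-x²) over [0, 1] by averaging the integrand over uniform random samples; the integrand and sample count are arbitrary choices for illustration.

# Minimal sketch: Monte Carlo estimate of the integral of exp(-x^2) over [0, 1].
# The integrand and the sample count are arbitrary illustrative choices.
import math
import random

def mc_integral(f, n_samples=100_000):
    # Average f over uniform samples on [0, 1]; by the law of large numbers
    # this average converges to the integral of f over [0, 1].
    return sum(f(random.random()) for _ in range(n_samples)) / n_samples

estimate = mc_integral(lambda x: math.exp(-x * x))
print(f"Monte Carlo estimate: {estimate:.4f}")   # true value is about 0.7468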

In physics-related problems, Monte Carlo methods are useful for simulating systems with many coupled degrees of freedom, such as fluids, disordered materials, strongly coupled solids, and cellular structures (see cellular Potts model, interacting particle systems, McKean–Vlasov processes, kinetic models of gases).

Other examples include modeling phenomena with significant uncertainty in inputs such as the calculation of risk in business and, in mathematics, evaluation of multidimensional definite integrals with complicated boundary conditions. In application to systems engineering problems (space, oil exploration, aircraft design, etc.), Monte Carlo–based predictions of failure, cost overruns and schedule overruns are routinely better than human intuition or alternative "soft" methods.

In principle, Monte Carlo methods can be used to solve any problem having a probabilistic interpretation. By the law of large numbers, integrals described by the expected value of some random variable can be approximated by taking the empirical mean (a.k.a. the sample mean) of independent samples of the variable. When the probability distribution of the variable is parameterized, mathematicians often use a Markov chain Monte Carlo (MCMC) sampler. The central idea is to design a judicious Markov chain model with a prescribed stationary probability distribution. That is, in the limit, the samples being generated by the MCMC method will be samples from the desired (target) distribution. By the ergodic theorem, the stationary distribution is approximated by the empirical measures of the random states of the MCMC sampler.
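A minimal sketch of the MCMC idea, assuming a simple random-walk Metropolis sampler whose stationary distribution is a standard normal; the target, proposal scale, and chain length are illustrative choices, not anything prescribed by the article.

# Minimal sketch: random-walk Metropolis sampler targeting a standard normal
# distribution. Proposal scale and chain length are illustrative choices.
import math
import random

def target_density(x):
    # Unnormalized standard normal density; only density ratios are needed.
    return math.exp(-0.5 * x * x)

def metropolis(n_steps=50_000, proposal_scale=1.0):
    x = 0.0
    samples = []
    for _ in range(n_steps):
        proposal = x + random.gauss(0.0, proposal_scale)
        # Accept with probability min(1, pi(proposal) / pi(x)).
        if random.random() < target_density(proposal) / target_density(x):
            x = proposal
        samples.append(x)
    return samples

chain = metropolis()
print(f"empirical mean ~ {sum(chain) / len(chain):.3f} (target mean is 0)")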

In other problems, the objective is generating draws from a sequence of probability distributions satisfying a nonlinear evolution equation. These flows of probability distributions can always be interpreted as the distributions of the random states of a Markov process whose transition probabilities depend on the distributions of the current random states (see McKean–Vlasov processes, nonlinear filtering equation). In other instances we are given a flow of probability distributions with an increasing level of sampling complexity (path spaces models with an increasing time horizon, Boltzmann–Gibbs measures associated with decreasing temperature parameters, and many others). These models can also be seen as the evolution of the law of the random states of a nonlinear Markov chain. A natural way to simulate these sophisticated nonlinear Markov processes is to sample multiple copies of the process, replacing in the evolution equation the unknown distributions of the random states by the sampled empirical measures. In contrast with traditional Monte Carlo and MCMC methodologies, these mean-field particle techniques rely on sequential interacting samples. The terminology mean field reflects the fact that each of the samples (a.k.a. particles, individuals, walkers, agents, creatures, or phenotypes) interacts with the empirical measures of the process. When the size of the system tends to infinity, these random empirical measures converge to the deterministic distribution of the random states of the nonlinear Markov chain, so that the statistical interaction between particles vanishes.
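The mean-field particle idea can be sketched with a toy McKean–Vlasov-type dynamic in which each particle drifts toward the mean of its own (unknown) law; the particle system substitutes the empirical mean of the current sample for that unknown quantity. The dynamic, particle count, step size, and noise level below are illustrative assumptions, not taken from the article.

# Minimal sketch of a mean-field particle approximation: a toy McKean-Vlasov
# dynamic dX = -(X - E[X]) dt + sigma dW is simulated by replacing the unknown
# mean E[X] with the empirical mean of all particles at each step.
# Particle count, step size, noise level and horizon are illustrative choices.
import random

def mean_field_particles(n_particles=1_000, n_steps=200, dt=0.01, sigma=0.5):
    particles = [random.gauss(5.0, 2.0) for _ in range(n_particles)]
    for _ in range(n_steps):
        empirical_mean = sum(particles) / n_particles  # stands in for E[X]
        particles = [
            x - (x - empirical_mean) * dt
            + sigma * (dt ** 0.5) * random.gauss(0.0, 1.0)
            for x in particles
        ]
    return particles

final = mean_field_particles()
print(f"final empirical mean ~ {sum(final) / len(final):.2f}")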

Despite its conceptual and algorithmic simplicity, the computational cost associated with a Monte Carlo simulation can be staggeringly high. In general the method requires many samples to get a good approximation, which may incur an arbitrarily large total runtime if the processing time of a single sample is high. Although this is a severe limitation in very complex problems, the embarrassingly parallel nature of the algorithm allows this large cost to be reduced (perhaps to a feasible level) through parallel computing strategies in local processors, clusters, cloud computing, GPU, FPGA, etc.
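Because the samples are independent, the cost can be spread across workers almost trivially; a minimal sketch using Python's standard process pool (the estimated quantity and the batch sizes are arbitrary illustrative choices):

# Minimal sketch of embarrassingly parallel Monte Carlo: independent batches of
# samples are generated in separate processes and their estimates are averaged.
# The estimated quantity (pi, via the quarter-circle hit rate) and the batch
# sizes are illustrative choices only.
import random
from concurrent.futures import ProcessPoolExecutor

def batch_estimate(n_samples):
    hits = 0
    for _ in range(n_samples):
        x, y = random.random(), random.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / n_samples

if __name__ == "__main__":
    batches = [250_000] * 8
    with ProcessPoolExecutor() as pool:
        estimates = list(pool.map(batch_estimate, batches))
    print(f"parallel estimate of pi: {sum(estimates) / len(estimates):.4f}")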

Discussed on

🔗 Herd Immunity

🔗 Medicine 🔗 Statistics 🔗 Microbiology 🔗 Game theory

Herd immunity (also called herd effect, community immunity, population immunity, or social immunity) is a form of indirect protection from infectious disease that occurs when a large percentage of a population has become immune to an infection, whether through previous infection or vaccination, thereby providing a measure of protection for individuals who are not immune. When a large proportion of individuals in a population possess immunity, such people are unlikely to contribute to disease transmission, so chains of infection are more likely to be disrupted, which either stops or slows the spread of disease. The greater the proportion of immune individuals in a community, the smaller the probability that non-immune individuals will come into contact with an infectious individual, which helps to shield non-immune individuals from infection.

Individuals can become immune by recovering from an earlier infection or through vaccination. Some individuals cannot become immune due to medical reasons, such as an immunodeficiency or immunosuppression, and in this group herd immunity is a crucial method of protection. Once a certain threshold has been reached, herd immunity gradually eliminates a disease from a population. This elimination, if achieved worldwide, may result in the permanent reduction in the number of infections to zero, called eradication. Herd immunity created via vaccination contributed to the eventual eradication of smallpox in 1977 and has contributed to the reduction of the frequencies of other diseases. Herd immunity does not apply to all diseases, just those that are contagious, meaning that they can be transmitted from one individual to another. Tetanus, for example, is infectious but not contagious, so herd immunity does not apply.
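The threshold mentioned above is conventionally expressed through the basic reproduction number R0, a quantity not defined in this excerpt; under that standard textbook relation the critical immune fraction is 1 − 1/R0. A minimal sketch, with illustrative R0 values:

# Minimal sketch, assuming the standard textbook relation between the basic
# reproduction number R0 and the herd immunity threshold: 1 - 1/R0.
# The R0 values below are illustrative inputs, not figures from the article.
def herd_immunity_threshold(r0):
    # Fraction of the population that must be immune so that, on average,
    # each infection causes fewer than one new infection.
    return 1.0 - 1.0 / r0

for r0 in (1.5, 3.0, 12.0):
    print(f"R0 = {r0:>4}: threshold ~ {herd_immunity_threshold(r0):.0%}")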

The term "herd immunity" was first used in 1923. It was recognized as a naturally occurring phenomenon in the 1930s when it was observed that after a significant number of children had become immune to measles, the number of new infections temporarily decreased, including among susceptible children. Mass vaccination to induce herd immunity has since become common and proved successful in preventing the spread of many infectious diseases. Opposition to vaccination has posed a challenge to herd immunity, allowing preventable diseases to persist in or return to communities that have inadequate vaccination rates.


🔗 Anscombe's Quartet

🔗 Mathematics 🔗 Statistics

Anscombe's quartet comprises four data sets that have nearly identical simple descriptive statistics, yet have very different distributions and appear very different when graphed. Each dataset consists of eleven (x,y) points. They were constructed in 1973 by the statistician Francis Anscombe to demonstrate both the importance of graphing data when analyzing it, and the effect of outliers and other influential observations on statistical properties. He described the article as being intended to counter the impression among statisticians that "numerical calculations are exact, but graphs are rough."
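The effect is straightforward to reproduce. The sketch below hard-codes the commonly published quartet values (worth checking against the article before reuse) and prints the shared summary statistics, assuming NumPy is available:

# Minimal sketch: compute the near-identical summary statistics of Anscombe's
# quartet. The data values are the commonly published ones; verify them against
# the article before relying on them.
import numpy as np

x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I":   (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II":  (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV":  ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
            [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}

for name, (x, y) in quartet.items():
    x, y = np.asarray(x, float), np.asarray(y, float)
    slope, intercept = np.polyfit(x, y, 1)     # least-squares line y = a + b x
    print(f"{name}: mean_x={x.mean():.2f} mean_y={y.mean():.2f} "
          f"var_x={x.var(ddof=1):.2f} var_y={y.var(ddof=1):.2f} "
          f"corr={np.corrcoef(x, y)[0, 1]:.3f} fit: y={intercept:.2f}+{slope:.3f}x")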


🔗 Optimal Stopping

🔗 Mathematics 🔗 Statistics

In mathematics, the theory of optimal stopping or early stopping is concerned with the problem of choosing a time to take a particular action, in order to maximise an expected reward or minimise an expected cost. Optimal stopping problems can be found in areas of statistics, economics, and mathematical finance (related to the pricing of American options). A key example of an optimal stopping problem is the secretary problem. Optimal stopping problems can often be written in the form of a Bellman equation, and are therefore often solved using dynamic programming.
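As a concrete illustration, the classical strategy for the secretary problem is to reject roughly the first n/e candidates and then accept the first candidate who beats all of them; the simulation below (candidate count and trial count are arbitrary choices) estimates the success probability of that rule, which should come out near 1/e ≈ 0.37.

# Minimal sketch: simulate the classical 1/e stopping rule for the secretary
# problem. Candidate count and number of trials are illustrative choices.
import math
import random

def secretary_trial(n):
    ranks = list(range(n))          # 0 is the best candidate
    random.shuffle(ranks)
    cutoff = int(n / math.e)        # observe-only phase
    best_seen = min(ranks[:cutoff]) if cutoff > 0 else n
    for rank in ranks[cutoff:]:
        if rank < best_seen:        # first candidate better than all observed
            return rank == 0        # success iff it is the overall best
    return ranks[-1] == 0           # otherwise forced to take the last one

n, trials = 100, 20_000
wins = sum(secretary_trial(n) for _ in range(trials))
print(f"success rate ~ {wins / trials:.3f} (theory: about 1/e = {1 / math.e:.3f})")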


🔗 Howland Will Forgery Trial

🔗 Law 🔗 Statistics

The Howland will forgery trial was a U.S. court case in 1868 to decide Henrietta Howland Robinson's contest of the will of Sylvia Ann Howland. It is famous for the forensic use of mathematics by Benjamin Peirce as an expert witness.


🔗 List of probability distributions

🔗 Lists 🔗 Statistics

Many probability distributions that are important in theory or applications have been given specific names.


🔗 Alexey Chervonenkis found dead

🔗 Biography 🔗 Russia 🔗 Statistics 🔗 Biography/science and academia 🔗 Russia/science and education in Russia

Alexey Yakovlevich Chervonenkis (Russian: Алексей Яковлевич Червоненкис; 7 September 1938 – 22 September 2014) was a Soviet and Russian mathematician and, with Vladimir Vapnik, one of the main developers of the Vapnik–Chervonenkis theory, also known as the "fundamental theory of learning", an important part of computational learning theory. Chervonenkis held joint appointments with the Russian Academy of Sciences and Royal Holloway, University of London.

Alexey Chervonenkis got lost in Losiny Ostrov National Park on 22 September 2014 and was later found dead near Mytishchi, a suburb of Moscow, during a search operation. He had died of hypothermia.


🔗 Wikipedia list of algorithms

🔗 Computing 🔗 Statistics 🔗 Computational Biology

The following is a list of algorithms along with one-line descriptions for each.


🔗 Pólya Urn Model

🔗 Statistics

In statistics, a Pólya urn model (also known as a Pólya urn scheme or simply as Pólya's urn), named after George Pólya, is a type of statistical model used as an idealized mental exercise framework, unifying many treatments.

In an urn model, objects of real interest (such as atoms, people, cars, etc.) are represented as colored balls in an urn or other container. In the basic Pólya urn model, the urn contains x white and y black balls; one ball is drawn randomly from the urn and its color observed; it is then returned to the urn, an additional ball of the same color is added, and the selection process is repeated. Questions of interest are the evolution of the urn population and the sequence of colors of the balls drawn out.
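A minimal sketch of this basic scheme (starting counts and the number of draws are arbitrary illustrative choices):

# Minimal sketch of the basic Polya urn scheme: draw a ball, note its colour,
# return it together with one extra ball of the same colour. Starting counts
# and the number of draws are illustrative choices.
import random

def polya_urn(white=1, black=1, draws=1_000):
    colours = []
    for _ in range(draws):
        # Probability of white is proportional to the current white count.
        if random.random() < white / (white + black):
            white += 1
            colours.append("white")
        else:
            black += 1
            colours.append("black")
    return white, black, colours

white, black, colours = polya_urn()
print(f"final composition: {white} white, {black} black "
      f"({white / (white + black):.2%} white)")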

This endows the urn with a self-reinforcing property sometimes expressed as "the rich get richer."

Note that in some sense, the Pólya urn model is the "opposite" of the model of sampling without replacement, where every time a particular value is observed, it is less likely to be observed again, whereas in a Pólya urn model, an observed value is more likely to be observed again. In both of these models, the act of measurement has an effect on the outcome of future measurements. (For comparison, when sampling with replacement, observation of a particular value has no effect on how likely it is to observe that value again.) In a Pólya urn model, successive acts of measurement over time have less and less effect on future measurements, whereas in sampling without replacement, the opposite is true: After a certain number of measurements of a particular value, that value will never be seen again.

One of the reasons for interest in this particular rather elaborate urn model (i.e. with duplication and then replacement of each ball drawn) is that it provides an example in which the count (initially x black and y white) of balls in the urn is not concealed, yet it approximates the correct updating of subjective probabilities appropriate to a different case in which the original urn content is concealed while ordinary sampling with replacement is conducted (without the Pólya ball-duplication). Because of the simple "sampling with replacement" scheme in this second case, the urn content is static, but this greater simplicity is compensated for by the assumption that the urn content is unknown to the observer.

A Bayesian analysis of the observer's uncertainty about the urn's initial content can be made, using a particular choice of (conjugate) prior distribution. Specifically, suppose that an observer knows that the urn contains only identical balls, each coloured either black or white, but does not know the absolute number of balls present nor the proportion of each colour. Suppose that the observer holds prior beliefs about these unknowns: for him the probability distribution of the urn content is well approximated by some prior distribution for the total number of balls in the urn and a beta prior distribution with parameters (x, y) for the initial proportion of these that are black, this proportion being (for him) considered approximately independent of the total number. Then the process of outcomes of a succession of draws from the urn (with replacement but without the duplication) has approximately the same probability law as the Pólya scheme above, in which the actual urn content is not hidden.

The approximation error here relates to the fact that an urn containing a known finite number m of balls cannot have an exactly beta-distributed unknown proportion of black balls, since the domain of possible values for that proportion is confined to multiples of 1/m, rather than having the full freedom to assume any value in the continuous unit interval, as an exactly beta-distributed proportion would. This slightly informal account is given by way of motivation and can be made more mathematically precise.
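The correspondence sketched above can be checked numerically in its idealized form: the number of white balls seen in n Pólya-urn draws has the same distribution as a Binomial(n, p) count with p drawn from a Beta prior whose parameters are the starting counts (exact in the beta case, only approximate for a finite concealed urn). A rough comparison, with arbitrarily chosen starting counts:

# Minimal sketch of the correspondence described above: the number of white
# balls drawn in n Polya-urn draws has (exactly, in the idealized beta case)
# the same distribution as a Binomial(n, p) count with p ~ Beta(white, black).
# Starting counts, n and the number of replications are illustrative choices.
import random

def polya_white_count(white, black, n):
    count = 0
    for _ in range(n):
        if random.random() < white / (white + black):
            white += 1
            count += 1
        else:
            black += 1
    return count

def beta_binomial_white_count(white, black, n):
    p = random.betavariate(white, black)          # prior on the white proportion
    return sum(random.random() < p for _ in range(n))

reps, n, w0, b0 = 20_000, 10, 2, 3
polya = [polya_white_count(w0, b0, n) for _ in range(reps)]
betab = [beta_binomial_white_count(w0, b0, n) for _ in range(reps)]
print(f"mean white draws: Polya {sum(polya) / reps:.2f}  "
      f"beta-binomial {sum(betab) / reps:.2f}")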

This basic Pólya urn model has been enriched and generalized in many ways.


🔗 Buffon's Needle Problem

🔗 Statistics

In mathematics, Buffon's needle problem is a question first posed in the 18th century by Georges-Louis Leclerc, Comte de Buffon:

Suppose we have a floor made of parallel strips of wood, each the same width, and we drop a needle onto the floor. What is the probability that the needle will lie across a line between two strips?

Buffon's needle was the earliest problem in geometric probability to be solved; it can be solved using integral geometry. The solution for the sought probability p, in the case where the needle length l is not greater than the width t of the strips, is

p = \frac{2}{\pi} \, \frac{l}{t}.

This can be used to design a Monte Carlo method for approximating the number π, although that was not the original motivation for de Buffon's question.
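A minimal sketch of that Monte Carlo use (needle length, strip width, and throw count are arbitrary choices):

# Minimal sketch: estimate pi with Buffon's needle. A needle of length l <= t
# crosses a line when the distance from its centre to the nearest line is at
# most (l/2) * sin(theta). Needle length, strip width and throw count are
# illustrative choices.
import math
import random

def buffon_pi(l=1.0, t=2.0, throws=200_000):
    crossings = 0
    for _ in range(throws):
        centre = random.uniform(0.0, t / 2.0)          # distance to nearest line
        theta = random.uniform(0.0, math.pi / 2.0)     # acute angle with the lines
        if centre <= (l / 2.0) * math.sin(theta):
            crossings += 1
    # P(cross) = 2 l / (pi t), so pi ~ 2 l * throws / (t * crossings).
    return 2.0 * l * throws / (t * crossings)

print(f"estimate of pi: {buffon_pi():.4f}")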
