STATISTICS
Random variables
A random variable is a function that assigns a numerical value to each possible outcome of a random experiment, capturing important aspects of those outcomes in a simplified form.
Discrete vs. Continuous
• Discrete: If the random variable has a finite or countably infinite set of values (e.g., count of allergic reactions), it’s discrete.
• Continuous: If it can take any real number in a range, it’s continuous.
Example:
- In a dice game, let X be the outcome of a single roll. Here, X can be any integer from 1 to 6.
- In a study measuring people’s heights, let Y represent the height of a randomly selected person. Here, Y is continuous, as it can take on any value within a range (e.g., 150 cm to 200 cm).
A random variable is usually denoted by a capital letter (e.g., X, Y), and its specific values are represented by lowercase letters (e.g., x, y). For instance, X = 3 means the random variable X takes the value 3.
Binomial distribution
Probability Formula for the Binomial Distribution
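The probability of having exactly $k$ successes in $n$ independent trials is

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, 1, \dots, n$$

where:
• $P(X = k)$ is the probability of getting exactly $k$ successes.
• $\binom{n}{k}$ is the binomial coefficient (number of ways to choose $k$ successes from $n$ trials).
• $p$ is the probability of success on each trial.
• $1 - p$ is the probability of failure on each trial.
• $n - k$ is the number of failures.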
example1
An information technology center uses nine aging disk drives for storage. The probability that any one of them is out of service is 0.06. For the center to function properly, at least seven of the drives must be available. What is the probability that the computing center can get its work done?
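One way to work this example, as a sketch: the center functions if at most two of the nine drives are out of service. The code assumes the drives fail independently (the problem implies this but does not say so), and the helper name `binom_pmf` is illustrative.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# X = number of drives out of service; n = 9 drives, p = 0.06 each.
# "At least seven available" is the same event as "at most two out of service".
p_works = sum(binom_pmf(k, 9, 0.06) for k in range(3))
print(round(p_works, 4))  # about 0.986
```

So the center can get its work done roughly 98.6% of the time.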
example2
Kingwest Pharmaceuticals is experimenting with a new affordable AIDS medication, PM-17. Thirty monkeys infected with the HIV complex have been given the drug. Any inexpensive drug capable of being effective 60% of the time would be considered a major breakthrough; medications whose chances of success are 50% or less are not likely to have any commercial potential.
Kingwest hopes to avoid making either of two errors:
(1) rejecting a drug that would ultimately prove to be marketable and (2) spending additional development dollars on a drug whose effectiveness, in the long run, would be 50% or less.
As a tentative “decision rule,” the project manager suggests that unless sixteen or more of the monkeys show improvement, research on PM-17 should be discontinued.
a. What are the chances that the “sixteen or more” rule will cause the company to reject PM-17, even if the drug is 60% effective?
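Assume $X$, the number of monkeys showing improvement, is binomial with $n = 30$ and $p = 0.60$. The drug is rejected when $X \le 15$, so the probability of wrongly rejecting it is

$$P(X \le 15) = \sum_{k=0}^{15} \binom{30}{k} (0.60)^k (0.40)^{30-k}$$

b. How often will the “sixteen or more” rule allow a 50%-effective drug to be perceived as a major breakthrough? Put simply, this is the probability of a false breakthrough.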
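Both error probabilities can be evaluated numerically. The sketch below assumes the thirty monkeys respond independently and that the number showing improvement is binomial; the names `binom_cdf`, `p_reject_good`, and `p_false_breakthrough` are illustrative.

```python
from math import comb

def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(x + 1))

n = 30

# (a) The drug is 60% effective, yet fewer than sixteen monkeys improve.
p_reject_good = binom_cdf(15, n, 0.60)

# (b) The drug is only 50% effective, yet sixteen or more monkeys improve.
p_false_breakthrough = 1 - binom_cdf(15, n, 0.50)

print(f"(a) P(X <= 15 | p = 0.60) = {p_reject_good:.4f}")
print(f"(b) P(X >= 16 | p = 0.50) = {p_false_breakthrough:.4f}")
```

Part (a) comes out a little above 17% and part (b) close to 43%, so the rule as stated leaves both errors fairly likely.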
example3
The junior mathematics class at Superior High School knows that the probability of making a 600 or greater on the SAT Reasoning Test in Mathematics is 0.231, while the similar probability for the Critical Reading Test is 0.191.
Each group will select four students and have them take the respective test. The mathematics students will win the challenge if more of their members exceed 600 on the mathematics test than do the other students on the Critical Reading Test.
Let M denote the number of mathematics students scoring 600 or more, and let CR denote the corresponding number for the Critical Reading Test. Because M and CR are independent, each joint probability is the product of two binomial probabilities:
CR \ M | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
0 | 0.1498 | 0.1800 | 0.0811 | 0.0162 | 0.0012 |
1 | 0.1415 | 0.1700 | 0.0766 | 0.0153 | 0.0012 |
2 | 0.0501 | 0.0602 | 0.0271 | 0.0054 | 0.0004 |
3 | 0.0079 | 0.0095 | 0.0043 | 0.0009 | 0.0001 |
4 | 0.0005 | 0.0006 | 0.0003 | 0.0001 | 0.0000 |
The sum of the probabilities in the cells where the mathematics students win (M > CR) is 0.3775. The mathematics students need to study more probability.
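The table and the 0.3775 total can be reproduced with a short sketch. It assumes, as the example does, that M and CR are independent binomial counts with n = 4; the variable names are illustrative.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 4
p_math = [binom_pmf(m, n, 0.231) for m in range(n + 1)]  # P(M = m)
p_cr = [binom_pmf(c, n, 0.191) for c in range(n + 1)]    # P(CR = c)

# joint[c][m] = P(CR = c) * P(M = m), using independence.
joint = [[pc * pm for pm in p_math] for pc in p_cr]

# Mathematics wins in the cells where m > c.
p_math_wins = sum(
    joint[c][m] for c in range(n + 1) for m in range(n + 1) if m > c
)

for c, row in enumerate(joint):
    print(c, " ".join(f"{v:.4f}" for v in row))
print(round(p_math_wins, 4))  # about 0.3775
```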
Hypergeometric Distribution
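Suppose an urn contains $r$ red chips and $w$ white chips, where $r + w = N$. If $n$ chips are drawn at random without replacement and $X$ denotes the number of red chips selected, then

$$P(X = k) = \frac{\binom{r}{k}\binom{w}{n-k}}{\binom{N}{n}}$$

where:
• $\binom{r}{k}$: the number of ways to choose $k$ successes from $r$ items.
• $\binom{w}{n-k}$: the number of ways to choose $n - k$ failures from $w$ items.
• $\binom{N}{n}$: the total number of ways to choose $n$ items from $N$.

If the chips are drawn with replacement, the problem reduces to the binomial distribution with $p = r/N$.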
example1
A tax collector, finding himself short of funds, delayed depositing a large property tax payment ten different times. The money was subsequently repaid, and the whole amount deposited in the proper account. The tip-off to this behavior was the delay of the deposit (bad behaviour!). During the period of these irregularities, there was a total of 470 tax collections.
An auditing firm was preparing to do a routine annual audit of these transactions. They decided to randomly sample nineteen of the collections (approximately 4%) of the payments. The auditors would assume a pattern of malfeasance only if they saw three or more irregularities. What is the probability that three or more of the delayed deposits would be chosen in this sample?
Here, $N = 470$, $r = 10$, $w = 460$, and $n = 19$. Let $X$ be the number of delayed deposits in the sample. Then

$$P(X \ge 3) = 1 - \sum_{k=0}^{2} \frac{\binom{10}{k}\binom{460}{19-k}}{\binom{470}{19}}$$
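A minimal sketch of the calculation for this audit, treating the ten delayed deposits as the "red chips" among 470 collections. The function name `hypergeom_pmf` is illustrative.

```python
from math import comb

def hypergeom_pmf(k, r, w, n):
    """P(exactly k red chips) when n chips are drawn without
    replacement from an urn with r red and w white chips."""
    return comb(r, k) * comb(w, n - k) / comb(r + w, n)

r, w, n = 10, 460, 19  # delayed deposits, regular deposits, sample size

# P(X >= 3) = 1 - P(X = 0) - P(X = 1) - P(X = 2)
p_three_or_more = 1 - sum(hypergeom_pmf(k, r, w, n) for k in range(3))
print(f"P(X >= 3) = {p_three_or_more:.4f}")
```

The result is well under 1%, so an audit of this size would be very unlikely to flag the pattern.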
example2
Biting into a plump, juicy apple is one of the innocent pleasures of autumn. Critical to that enjoyment is the firmness of the apple, a property that growers and shippers monitor closely. The apple industry goes so far as to set a lowest acceptable limit for firmness, which is measured (in lbs) by inserting a probe into the apple. For the Red Delicious variety, for example, firmness is supposed to be at least 12 lbs; in the state of Washington, wholesalers are not allowed to sell apples if more than 10% of their shipment falls below that 12-lb limit.
How can shippers demonstrate that their apples meet the 10% standard? Sampling is the only viable strategy.
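Suppose, for example, a shipper has a supply of 144 apples, and suppose there are actually 10 defective apples among the original 144. Since 10/144 ≈ 6.9% is below the 10% limit, this shipment meets the standard and should pass.
Modeling the number of defective apples in a random sample with the hypergeometric distribution, we can calculate the probability that the sample passes inspection.
That looks fine. But suppose the number of bad apples increases to 30, so that 30/144 ≈ 20.8% of the shipment is substandard.
The result is worse than before: the sample still passes inspection about 36% of the time even though the shipment fails the standard.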
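The sampling plan itself (sample size and acceptance threshold) is not stated in the text above. As a sketch only, the code below assumes a hypothetical plan: sample 15 apples and accept the shipment if at most 2 are substandard. Under that assumption the pass rate for 30 bad apples comes out near the 36% quoted above, but both constants are assumptions, not values taken from the source.

```python
from math import comb

N_APPLES = 144
SAMPLE_SIZE = 15  # assumption: not stated in the text
MAX_BAD = 2       # assumption: accept if at most this many are substandard

def p_pass(bad):
    """P(sample passes inspection) when `bad` of the N_APPLES are substandard."""
    good = N_APPLES - bad
    total = comb(N_APPLES, SAMPLE_SIZE)
    return sum(
        comb(bad, k) * comb(good, SAMPLE_SIZE - k) for k in range(MAX_BAD + 1)
    ) / total

for bad in (10, 30):
    print(f"{bad} bad apples: P(pass) = {p_pass(bad):.3f}")
```

Changing `SAMPLE_SIZE` or `MAX_BAD` shows how the two error probabilities trade off against each other.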
Every sampling plan invariably allows for two kinds of errors: rejecting shipments that should be accepted and accepting shipments that should be rejected. The probabilities of committing these errors can be manipulated by redefining the decision rule and/or changing the sample size. For more detail, see HYPOTHESIS TESTING.
Discrete Random Variables
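Definition 3.3.1. Suppose that $S$ is a finite or countably infinite sample space. Let $p$ be a real-valued function defined for each element of $S$ such that
a. $p(s) \ge 0$ for each $s \in S$
b. $\sum_{s \in S} p(s) = 1$
Then $p$ is said to be a discrete probability function.
Definition 3.3.2. A function whose domain is a sample space $S$ and whose values form a finite or countably infinite set of real numbers is called a discrete random variable.
Definition 3.3.3. Associated with every discrete random variable $X$ is a probability density function (pdf), denoted $p_X(k)$, where

$$p_X(k) = P(\{s \in S \mid X(s) = k\})$$

Note that $p_X(k) = 0$ for any $k$ not in the range of $X$. For simplicity, we usually write $p_X(k) = P(X = k)$.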
example
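- Binomial distribution: $p_X(k) = \binom{n}{k} p^k (1-p)^{n-k}$, $k = 0, 1, \dots, n$
- Hypergeometric Distribution: $p_X(k) = \binom{r}{k}\binom{w}{n-k} \big/ \binom{N}{n}$
Consider again the rolling of two dice. Let $i$ and $j$ denote the faces showing on the first and second die, and let $X(i, j) = i + j$ be their sum.
Assume all 36 outcomes $(i, j)$ are equally likely, so $p_X(k)$ is the number of outcomes with $i + j = k$ divided by 36:
k | p_X(k) | k | p_X(k) |
---|---|---|---|
2 | 1/36 | 8 | 5/36 |
3 | 2/36 | 9 | 4/36 |
4 | 3/36 | 10 | 3/36 |
5 | 4/36 | 11 | 2/36 |
6 | 5/36 | 12 | 1/36 |
7 | 6/36 | | |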
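The pdf of the sum of two fair dice can be obtained by enumerating all 36 equally likely outcomes. A small sketch, with illustrative variable names:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 outcomes (i, j) give each sum i + j.
counts = Counter(i + j for i, j in product(range(1, 7), repeat=2))

# p_X(k) = (number of outcomes with sum k) / 36, kept as exact fractions.
pmf = {k: Fraction(c, 36) for k, c in sorted(counts.items())}

for k, c in sorted(counts.items()):
    print(f"p_X({k}) = {c}/36")
```

Using `Fraction` keeps the probabilities exact, so the check that they sum to 1 involves no rounding.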
Discrete Cumulative Distribution Function
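Definition 3.3.4. Let $X$ be a discrete random variable. For any real number $t$, the probability that $X$ takes on a value less than or equal to $t$ is the cumulative distribution function (cdf) of $X$, written $F_X(t)$:

$$F_X(t) = P(X \le t)$$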
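A discrete cdf accumulates the pdf over all values not exceeding $t$, so it is defined for every real $t$, not only for values the variable can take. The sketch below uses a single roll of a fair die as an illustrative case; the function name `cdf` is hypothetical.

```python
from fractions import Fraction

# pdf of a single fair die roll: p_X(k) = 1/6 for k = 1, ..., 6.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def cdf(t, pmf):
    """F_X(t) = P(X <= t): add up p_X(k) over all k <= t."""
    return sum((p for k, p in pmf.items() if k <= t), Fraction(0))

print(cdf(2.5, pmf))  # same as F_X(2), since X cannot fall between 2 and 3
print(cdf(6, pmf))
```

Between possible values the cdf is flat, which is why a discrete cdf is a step function.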
example
Suppose that two fair dice are rolled. Let the random variable
(a) Find
(b) Find
Continuous Random Variables
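Definition 3.4.1. A probability function $P$ on a set of real numbers $S$ is called continuous if there exists a function $f(t)$ such that for any closed interval $[a, b] \subset S$,

$$P([a, b]) = \int_a^b f(t)\,dt$$

Comment If a probability function $P$ satisfies Definition 3.4.1, then

$$P(A) = \int_A f(t)\,dt$$

for any set $A$ where the integral is defined.
Conversely, suppose a function $f(t)$ has the two properties
1. $f(t) \ge 0$ for all $t$.
2. $\int_{-\infty}^{\infty} f(t)\,dt = 1$.
If $P(A) = \int_A f(t)\,dt$ for all $A$, then $P$ satisfies the probability axioms.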
Fitting to Data: The Density-Scaled Histogram
We can use a continuous probability function to approximate an integer-valued discrete probability model.
example
Suppose an electronic surveillance monitor is turned on briefly at the beginning of every hour and has a 0.905 probability of working properly, regardless of how long it has remained in service.
We can draw the histogram of
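To illustrate how a continuous density can stand in for a discrete model here, the sketch below makes two assumptions that are not stated in the text: that $X$ denotes the hour at which the monitor first fails, so $p_X(k) = 0.905^{k-1}(0.095)$, and that the approximating curve is an exponential density with rate $-\ln 0.905 \approx 0.0998$. With that rate, the area under the curve over $[k-1, k]$ equals $p_X(k)$.

```python
from math import exp, log

P_WORKS = 0.905  # probability the monitor works in any given hour

def first_failure_pmf(k):
    """Assumed model: works for k - 1 hours, then fails at hour k."""
    return P_WORKS ** (k - 1) * (1 - P_WORKS)

# Assumed approximating density: f(x) = RATE * exp(-RATE * x), x >= 0.
RATE = -log(P_WORKS)

def exp_area(a, b):
    """Integral of RATE * exp(-RATE * x) over [a, b]."""
    return exp(-RATE * a) - exp(-RATE * b)

for k in (1, 2, 5, 10, 20):
    print(k, round(first_failure_pmf(k), 5), round(exp_area(k - 1, k), 5))
```

Each histogram bar of width 1 and height $p_X(k)$ therefore has the same area as the curve above it.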
example2
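Suppose we have reason to believe that these forty $y_i$'s may be a random sample from a uniform probability function defined over the interval [20, 70].
In a frequency histogram, the sum of the bar areas is not 1. To make the total area equal 1, rescale each bar height to a density:

$$\text{density} = \frac{\text{class frequency}}{\text{total no. of observations} \times \text{class width}}$$

For example, the uniform pdf over [20, 70] is the constant $f_Y(y) = 1/50$; integrating that constant over the interval [20, 30) would give

$$\int_{20}^{30} \frac{1}{50}\,dy = 0.20$$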
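The density scaling can be sketched as follows. The forty observations are not given in the text, so the data below are hypothetical values simulated from a uniform distribution on [20, 70]; the class edges and the function name `density_histogram` are also assumptions.

```python
import random

random.seed(0)  # hypothetical data, generated only for illustration
data = [random.uniform(20, 70) for _ in range(40)]
edges = [20, 30, 40, 50, 60, 70]  # assumed class boundaries

def density_histogram(values, edges):
    """Return (lo, hi, frequency, density) for each class [lo, hi)."""
    n = len(values)
    rows = []
    for lo, hi in zip(edges, edges[1:]):
        last = hi == edges[-1]  # the final class also includes its right endpoint
        freq = sum(1 for v in values if lo <= v < hi or (last and v == hi))
        rows.append((lo, hi, freq, freq / (n * (hi - lo))))
    return rows

rows = density_histogram(data, edges)
for lo, hi, freq, dens in rows:
    print(f"[{lo}, {hi}): frequency = {freq}, density = {dens:.4f}")

total_area = sum(dens * (hi - lo) for lo, hi, _, dens in rows)
print(total_area)
```

After scaling, the bar heights are directly comparable to the uniform density 1/50 = 0.02.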
Continuous Probability Density Functions (pdf)
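Definition 3.4.2. Let $Y$ be a function from a sample space $S$ to the real numbers. The function $Y$ is called a continuous random variable if there exists a function $f_Y(y)$ such that for any real numbers $a$ and $b$ with $a < b$,

$$P(a \le Y \le b) = \int_a^b f_Y(y)\,dy$$

The function $f_Y(y)$ is the probability density function (pdf) for $Y$.
As in the discrete case, the cumulative distribution function (cdf) is defined by

$$F_Y(y) = P(Y \le y)$$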
Expected Values
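Definition 3.5.1. Let $X$ be a discrete random variable with probability function $p_X(k)$. The expected value of $X$, denoted $E(X)$ (or sometimes $\mu$ or $\mu_X$), is

$$E(X) = \sum_{\text{all } k} k \cdot p_X(k)$$

Similarly, if $Y$ is a continuous random variable with pdf $f_Y(y)$,

$$E(Y) = \int_{-\infty}^{\infty} y \cdot f_Y(y)\,dy$$

Comment We assume that both the sum and the integral in Definition 3.5.1 converge absolutely:

$$\sum_{\text{all } k} |k| \, p_X(k) < \infty, \qquad \int_{-\infty}^{\infty} |y| \, f_Y(y)\,dy < \infty$$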
If not, we say that the random variable has no finite expected value. One immediate reason for requiring absolute convergence is that a convergent sum that is not absolutely convergent depends on the order in which the terms are added, and order should obviously not be a consideration when defining an average.
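Riemann Series Theorem (Rearrangement Theorem)
The Riemann series theorem states that the terms of any conditionally convergent series can be rearranged so that the sum equals any chosen real number, or so that the series diverges. This shows that without absolute convergence a sum (or integral) has no stable, unique value: it can give different results depending on the order in which the terms are added. That dependence on order is a problem when defining a meaningful expected value for a random variable, because the result should be unique and unchanged by rearranging the terms.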
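The effect of rearrangement can be seen numerically with the alternating harmonic series $1 - 1/2 + 1/3 - 1/4 + \cdots$, which converges conditionally. The sketch below uses a greedy rule (add unused positive terms while at or below the target, otherwise add unused negative terms); the function name and the chosen targets are illustrative.

```python
def rearranged_partial_sum(target, n_terms):
    """Partial sum of a greedy rearrangement of 1 - 1/2 + 1/3 - 1/4 + ...
    that steers the running total toward `target`."""
    total = 0.0
    next_odd = 1   # denominator of the next unused positive term
    next_even = 2  # denominator of the next unused negative term
    for _ in range(n_terms):
        if total <= target:
            total += 1.0 / next_odd
            next_odd += 2
        else:
            total -= 1.0 / next_even
            next_even += 2
    return total

# The same terms, taken in different orders, approach different limits.
for target in (-1.0, 0.5, 2.0):
    print(target, round(rearranged_partial_sum(target, 200_000), 4))
```

Every term of the original series is still used exactly once in the limit; only the order changes.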
- Binomial random variable $X$ with parameters $n$ and $p$. Then $E(X) = np$.
- Hypergeometric random variable $X$ with parameters $r$, $w$, and $n$. That is, suppose an urn contains $r$ red balls and $w$ white balls. A sample of size $n$ is drawn simultaneously from the urn. Let $X$ be the number of red balls in the sample. Then $E(X) = \dfrac{rn}{r + w}$.
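Both closed forms, $np$ for the binomial and $rn/(r+w)$ for the hypergeometric, can be checked against the defining sum $\sum_k k \cdot p_X(k)$. A quick sketch, reusing numbers from the earlier examples (nine drives with $p = 0.06$; 10 delayed deposits among 470 with a sample of 19); the function names are illustrative.

```python
from math import comb

def binom_mean(n, p):
    """E(X) computed directly as the sum of k * p_X(k)."""
    return sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

def hypergeom_mean(r, w, n):
    """E(X) for the number of red balls in a sample of size n."""
    total = comb(r + w, n)
    return sum(k * comb(r, k) * comb(w, n - k) for k in range(n + 1)) / total

print(binom_mean(9, 0.06), 9 * 0.06)
print(hypergeom_mean(10, 460, 19), 10 * 19 / 470)
```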
example
Among the more common versions of the “numbers” racket is a game called D.J., its name deriving from the fact that the winning ticket is determined from Dow Jones averages.
Let
On the average, then, we lose $1.50 on a $5.00 bet.
example2
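Consider the following game. A fair coin is flipped until the first tail appears; we win $2 if it appears on the first toss, $4 if it appears on the second toss, and, in general, $2^k$ dollars if it first appears on the $k$th toss.
Known as the St. Petersburg paradox, this problem has a rather unusual answer.
First, note that if $X$ denotes our winnings, then

$$p_X(2^k) = P(X = 2^k) = \frac{1}{2^k}, \quad k = 1, 2, \dots$$

Therefore,

$$E(X) = \sum_{k=1}^{\infty} 2^k \cdot \frac{1}{2^k} = 1 + 1 + 1 + \cdots$$

That is, the sum diverges, and $X$ does not have a finite expected value.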
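The divergence is easy to see by truncating the game. The sketch below assumes a hypothetical variant in which play stops, paying nothing, if no tail has appeared after `max_tosses` tosses; each possible toss then contributes exactly 1 dollar to the expected winnings.

```python
from fractions import Fraction

def truncated_expected_winnings(max_tosses):
    """Expected payoff when the game is cut off after max_tosses tosses
    (assumed to pay nothing if no tail has appeared by then)."""
    return sum(
        Fraction(2**k) * Fraction(1, 2**k) for k in range(1, max_tosses + 1)
    )

for m in (1, 10, 100, 1000):
    print(m, truncated_expected_winnings(m))
```

The truncated expectation equals the cutoff itself, so it grows without bound as the cutoff is raised.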