Statistics: Expected Value and Standard Deviation
The expected value is the theoretically anticipated mean of a measurement. The expected value \( \class{red}{\mu} \) of a random variable \(X\) that can take the \(n\) values \(x_1\), \(x_2\), \(x_3\), ..., \(x_n\) with the corresponding probabilities \(p_1\), \(p_2\), \(p_3\), ..., \(p_n\) is given by the following formula: $$ \class{red}{\mu} ~=~ x_1 \, p_1 ~+~ x_2 \, p_2 ~+~ ... ~+~ x_n \, p_n $$
In an experiment, we usually do not know the probabilities, so we use the empirical mean as an estimate of the expected value. The empirical mean \( \class{red}{\bar{x}} \) is the sum of the measurements \(x_1\), \(x_2\), \(x_3\), ..., \(x_n\) divided by the number of measurements \(n\): $$ \class{red}{\bar{x}} ~=~ \frac{x_1 ~+~ x_2 ~+~ ... ~+~ x_n}{n} $$
Variance \( \sigma^2 \) is the sum of the squared deviations \( (x_1 - \class{red}{\mu})^2 \), \( (x_2 - \class{red}{\mu})^2 \), and so on, from the expected value \( \class{red}{\mu} \), each weighted by its probability: $$ \sigma^2 ~=~ (x_1 - \class{red}{\mu})^2 p_1 ~+~ (x_2 - \class{red}{\mu})^2 p_2 ~+~ ... ~+~ (x_n - \class{red}{\mu})^2 p_n $$
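The two formulas for \( \mu \) and \( \sigma^2 \) can be sketched in a few lines of Python. The fair six-sided die used here is a hypothetical example that does not come from the text, and the function names are chosen only for illustration:

```python
# Hypothetical example: a fair six-sided die (not taken from the text).
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6  # every face is equally likely


def expected_value(xs, ps):
    """mu = x_1 p_1 + x_2 p_2 + ... + x_n p_n"""
    return sum(x * p for x, p in zip(xs, ps))


def variance(xs, ps):
    """sigma^2 = (x_1 - mu)^2 p_1 + ... + (x_n - mu)^2 p_n"""
    mu = expected_value(xs, ps)
    return sum((x - mu) ** 2 * p for x, p in zip(xs, ps))


mu = expected_value(values, probs)
sigma_squared = variance(values, probs)
print(f"mu = {mu:.4f}, sigma^2 = {sigma_squared:.4f}")
```

For the die, the expected value is 3.5 and the variance is 35/12, which is about 2.92.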
We can calculate the empirical variance \( \sigma_{\text e}^2 \) in an experiment as follows: $$ \sigma_{\text e}^2 ~=~ \frac{(x_1 - \class{red}{\bar{x}})^2 ~+~ (x_2 - \class{red}{\bar{x}})^2 ~+~ ... ~+~ (x_n - \class{red}{\bar{x}})^2}{n-1}$$
The square root of the variance yields the standard deviation \( \sigma \), and the square root of the empirical variance yields the empirical standard deviation \( \sigma_{\text e} \). The standard deviation indicates how much the measurements \(x_1\), \(x_2\), \(x_3\), ..., \(x_n\) typically deviate from the expected value \( \class{red}{\mu} \) or mean \( \class{red}{\bar{x}} \): $$ \sigma ~=~ \sqrt{(x_1 - \class{red}{\mu})^2 p_1 ~+~ (x_2 - \class{red}{\mu})^2 p_2 ~+~ ... ~+~ (x_n - \class{red}{\mu})^2 p_n} $$ $$ \sigma_{\text e} ~=~ \sqrt{\frac{(x_1 - \class{red}{\bar{x}})^2 ~+~ (x_2 - \class{red}{\bar{x}})^2 ~+~ ... ~+~ (x_n - \class{red}{\bar{x}})^2}{n-1}}$$
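As a minimal sketch, the empirical mean, variance and standard deviation can be computed by hand and compared with Python's standard `statistics` module, whose `variance` and `stdev` functions also divide by \( n-1 \). The five sample values are made up for illustration:

```python
import math
import statistics

# Hypothetical measurements (not taken from the text).
sample = [9.8, 10.1, 10.0, 9.9, 10.2]
n = len(sample)

# Empirical mean: sum of the measurements divided by n.
x_bar = sum(sample) / n

# Empirical variance: squared deviations from the mean, divided by n - 1.
var_e = sum((x - x_bar) ** 2 for x in sample) / (n - 1)

# Empirical standard deviation: square root of the empirical variance.
sigma_e = math.sqrt(var_e)

print(f"mean = {x_bar:.3f}, variance = {var_e:.4f}, std = {sigma_e:.4f}")

# The standard library uses the same n - 1 convention.
print(statistics.variance(sample), statistics.stdev(sample))
```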
If \( \class{red}{\bar{x}} \) is the mean and the measurements are normally distributed, then:
- 68% of all measurements lie within \( \class{red}{\bar{x}} \pm \sigma \).
- 95.4% of all measurements lie within \( \class{red}{\bar{x}} \pm 2\sigma \).
- 99.7% of all measurements lie within \( \class{red}{\bar{x}} \pm 3\sigma \).
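These percentages follow from the normal distribution: the fraction of values within \( k \) standard deviations of the mean equals \( \operatorname{erf}(k/\sqrt{2}) \). A short check, with `normal_coverage` as an illustrative helper name:

```python
import math


def normal_coverage(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} sigma: {100 * normal_coverage(k):.1f} %")
```

The three values round to 68.3 %, 95.4 % and 99.7 %.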
The standard deviation \( \sigma(\class{red}{\bar{x}}) \) of the mean \( \class{red}{\bar{x}} \) for \(n\) measurements is given by the following formula: $$ \sigma(\class{red}{\bar{x}}) ~=~ \frac{\sigma_{\text e}}{\sqrt{n}} $$
Because of the factor \( 1/\sqrt{n} \), doubling the accuracy (halving the standard deviation of the mean) requires quadrupling the number of measurements!
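The effect of the number of measurements can be seen in a small sketch. The value \( \sigma_{\text e} = 0.5 \) is an assumed number for illustration only:

```python
import math


def std_of_mean(sigma_e, n):
    """Standard deviation of the mean: sigma_e / sqrt(n)."""
    return sigma_e / math.sqrt(n)


sigma_e = 0.5  # hypothetical empirical standard deviation

# Each step uses four times as many measurements as the previous one.
for n in (10, 40, 160):
    print(f"n = {n:3d}: sigma(mean) = {std_of_mean(sigma_e, n):.4f}")
```

Each time \( n \) is multiplied by four, the standard deviation of the mean is halved.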
- For multiplication \( \class{red}{\bar{x}_1} \cdot \class{red}{\bar{x}_2}\) and division \( \frac{\class{red}{\bar{x}_1}}{\class{red}{\bar{x}_2}} \) of two means, their relative errors \(f_1\) and \(f_2\) add up to a total relative error: \(f = f_1 + f_2\).
- For addition \( \class{red}{\bar{x}_1} + \class{red}{\bar{x}_2} \) and subtraction \( \class{red}{\bar{x}_1} - \class{red}{\bar{x}_2} \) of two means, their absolute errors \( \Delta x_1 \) and \( \Delta x_2 \) add up to a total absolute error: \( \Delta x = \Delta x_1 + \Delta x_2 \).
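The two rules above can be tried out with a short sketch. The means and errors are invented numbers, and the function names are illustrative, not part of any library:

```python
def product_error(x1, dx1, x2, dx2):
    """Multiplication: relative errors add, f = f1 + f2. Returns (value, absolute error)."""
    value = x1 * x2
    f = dx1 / abs(x1) + dx2 / abs(x2)  # total relative error
    return value, f * abs(value)


def quotient_error(x1, dx1, x2, dx2):
    """Division: relative errors add as well. Returns (value, absolute error)."""
    value = x1 / x2
    f = dx1 / abs(x1) + dx2 / abs(x2)
    return value, f * abs(value)


def sum_error(x1, dx1, x2, dx2):
    """Addition: absolute errors add. Returns (value, absolute error)."""
    return x1 + x2, dx1 + dx2


def difference_error(x1, dx1, x2, dx2):
    """Subtraction: absolute errors add as well. Returns (value, absolute error)."""
    return x1 - x2, dx1 + dx2


# Hypothetical means with absolute errors: 10.0 +/- 0.2 and 5.0 +/- 0.1
args = (10.0, 0.2, 5.0, 0.1)
print("product:   ", product_error(*args))
print("quotient:  ", quotient_error(*args))
print("sum:       ", sum_error(*args))
print("difference:", difference_error(*args))
```

Both hypothetical means have a relative error of 2 %, so the product and the quotient each carry a relative error of 4 %, while the sum and the difference each carry an absolute error of 0.3.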
Exercises with Solutions
Exercise #1: Mean and Standard Deviation of a Measurement
10 measurements were taken:
Number \( i \) | Measurement \( x_i \) |
---|---|
1 | 45.0 |
2 | 45.7 |
3 | 44.6 |
4 | 45.2 |
5 | 45.6 |
6 | 44.5 |
7 | 44.9 |
8 | 45.2 |
9 | 45.8 |
10 | 44.7 |
- What is the mean \( \class{red}{\bar{x}} \) of the sample?
- What is the empirical standard deviation \( s \) of the sample?
- How much does the mean deviate from the true value \( x \) with 95% confidence?
Tips:
- Use the formula for the mean.
- Standard deviation is given by: \[ s ~=~ \sqrt{ \frac{1}{N-1} \sum_{i=1}^N (x_i - \bar x)^2 } \]
- Use the so-called (Student's) t-distribution. For your case \( N = 10 \), \( t = 2.30 \). Calculate: \[ x ~=~ \bar{x} \pm \frac{s}{\sqrt N} \, t \]
Solution to Exercise #1.1
Using the formula for the mean: \[ \bar x ~=~ \frac{1}{N} \sum_{i=1}^N x_i \] By substituting the 10 measurements from the table: \[ \bar x ~=~ \frac{1}{10} \cdot (45.0 + 45.7 + 44.6 + 45.2 + 45.6 + 44.5 + 44.9 + 45.2 + 45.8 + 44.7) \] Typed into the calculator, the mean is \( \bar x ~=~ 45.12 \)
Solution to Exercise #1.2
Use the formula for the standard deviation from the tips to find the empirical standard deviation of the sample: \[ s ~=~ \sqrt{ \frac{1}{N-1} \sum_{i=1}^N (x_i - \bar x)^2 } \] Substitute the mean calculated in Exercise #1.1 and the measurements from the table: \[ s ~=~ \sqrt{ \frac{1}{9} \sum_{i=1}^{10} (x_i - 45.12)^2 } \]
Entered into the calculator: \[ s ~=~ 0.463 \]
Solution to Exercise #1.3
To find out how far the mean calculated in Exercise #1.1 may deviate from the true value at a confidence level of 95%, use the t-value from the t-distribution that matches your sample, that is, \( N = 10 \) and \( P = 95 \)%. The t-distribution is used whenever the standard deviation of the population is not known, which is the case for a sample like the one in this task.
For \( N = 10 \) and \( P = 95 \)%, the t-value is \( t = 2.30 \). The formula from the tips gives the interval around the mean \( \bar{x} \) in which the true value \( x \) lies: \[ x ~=~ \bar{x} \pm \frac{s}{\sqrt N} \, t \] Substituting the values: \[ x ~=~ 45.12 \pm \frac{0.463}{\sqrt{10}} \cdot 2.30 \] So the true value lies within about \( \pm 0.337 \) of the mean.
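The three parts of Exercise #1 can be checked with a short script. This is a sketch using Python's standard `statistics` module; the t-value \( t = 2.30 \) is taken from the tips of the exercise and is not computed here:

```python
import math
import statistics

# The 10 measurements from the table of Exercise #1.
measurements = [45.0, 45.7, 44.6, 45.2, 45.6, 44.5, 44.9, 45.2, 45.8, 44.7]
t_value = 2.30  # t-value given in the tips for N = 10 and P = 95 %

n = len(measurements)
mean = statistics.mean(measurements)  # Exercise #1.1
s = statistics.stdev(measurements)    # Exercise #1.2, divides by N - 1
half_width = t_value * s / math.sqrt(n)  # Exercise #1.3

print(f"mean = {mean:.2f}")
print(f"s = {s:.4f}")
print(f"x = {mean:.2f} +/- {half_width:.3f}")
```

Up to rounding in the last digit, the script reproduces the values found with the calculator.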
Exercise #2: Frequency Distribution - Relative Cumulative Frequency
A sample of 200 capacitors was taken from production to perform a quality control of the capacitances \( C_i \). The capacitances of the capacitors were measured and grouped into classes with the class midpoints shown in the following table.
Class | Class Midpoint in \( \text{nF} \) | Number of Capacitors |
---|---|---|
1 | 841 | 3 |
2 | 842 | 4 |
3 | 843 | 3 |
4 | 844 | 10 |
5 | 845 | 2 |
6 | 846 | 35 |
7 | 847 | 70 |
8 | 848 | 50 |
9 | 849 | 23 |
- Determine the relative frequencies \( h_i \) in percent.
- Determine the relative cumulative frequencies \( H_i \) in percent.
Tips: The relative frequency \( h_i \) indicates what percentage the capacitors of a class midpoint make up of the total sample.
The relative cumulative frequency \( H_i \) is the sum of all relative frequencies up to the \(i\)-th class midpoint.
Solution to Exercise #2.1
The relative frequency \( h_i \) is calculated for a sample of 200 capacitors as follows: \[ h_i ~=~ \frac{\text{Number in a class}}{200} ~\cdot~ 100 \]
For example, for the 1st class: \begin{align} h_1 &~=~ \frac{3}{200} ~\cdot~ 100 \\\\ &~=~ \frac{3}{2} \, \% \\\\ &~=~ 1.5 \, \% \end{align}
If you do the same for each class, you get the following table with relative frequencies:
Class | Number of Capacitors | Relative Frequency \( h_i \) in % |
---|---|---|
1 | 3 | 1.5 |
2 | 4 | 2 |
3 | 3 | 1.5 |
4 | 10 | 5 |
5 | 2 | 1 |
6 | 35 | 17.5 |
7 | 70 | 35 |
8 | 50 | 25 |
9 | 23 | 11.5 |
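The same calculation for all classes at once can be sketched in Python. The list `counts` holds the numbers of capacitors from the table of the exercise; the variable names are chosen for illustration:

```python
# Number of capacitors per class (classes 1 to 9) from the exercise table.
counts = [3, 4, 3, 10, 2, 35, 70, 50, 23]
total = sum(counts)  # size of the sample

# Relative frequency h_i in percent: count in the class divided by the total, times 100.
rel_freq = [count / total * 100 for count in counts]

for i, (count, h) in enumerate(zip(counts, rel_freq), start=1):
    print(f"class {i}: {count:3d} capacitors, h = {h:.1f} %")
```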
Solution to Exercise #2.2
To calculate the relative cumulative frequency \( H_n \), sum all relative frequencies \( h_i \) up to the \(n\)-th class. \[ H_n ~=~ h_1 ~+~ h_2 ~+~...~+~ h_n \]
For example, the relative cumulative frequency up to the 3rd class is: \begin{align} H_3 &~=~ h_1 + h_2 + h_3 \\\\ &~=~ 1.5\% + 2\% + 1.5\% \\\\ &~=~ 5\% \end{align}
Class | Number of Capacitors | Relative Cumulative Frequency \( H_n \) in % |
---|---|---|
1 | 3 | 1.5 |
2 | 4 | 3.5 |
3 | 3 | 5 |
4 | 10 | 10 |
5 | 2 | 11 |
6 | 35 | 28.5 |
7 | 70 | 63.5 |
8 | 50 | 88.5 |
9 | 23 | 100 |
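The running sum can be sketched with `itertools.accumulate` from the Python standard library. In this sketch the class counts are accumulated first and then converted to percent, which gives the same result as summing the \( h_i \) and avoids accumulating rounding errors:

```python
from itertools import accumulate

# Number of capacitors per class (classes 1 to 9) from the exercise table.
counts = [3, 4, 3, 10, 2, 35, 70, 50, 23]
total = sum(counts)

# Running totals of the counts: 3, 3 + 4, 3 + 4 + 3, ...
cumulative_counts = list(accumulate(counts))

# Relative cumulative frequency H_n in percent.
cum_freq = [c / total * 100 for c in cumulative_counts]

for n, H in enumerate(cum_freq, start=1):
    print(f"class {n}: H = {H:.1f} %")
```

The last value must always be 100 %, which is a quick check that no class was left out.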