Alexander Fufaev

Statistics: Expected Value and Standard Deviation

Table of contents
  1. Exercises with Solutions

The expected value is the theoretically anticipated mean of a measurement. The expected value \( \class{red}{\mu} \) of a random variable \(X\) that can take the \(n\) values \(x_1\), \(x_2\), \(x_3\), ..., \(x_n\) with corresponding probabilities \(p_1\), \(p_2\), \(p_3\), ..., \(p_n\) is given by the following formula: $$ \class{red}{\mu} ~=~ x_1 \, p_1 ~+~ x_2 \, p_2 ~+~ ... ~+~ x_n \, p_n $$
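As a small illustration of the formula, here is a sketch in Python. The fair six-sided die is a hypothetical example chosen for this sketch and is not part of the article; `values` and `probs` are illustrative names.

```python
from fractions import Fraction

# Hypothetical example (not from the article): a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6  # every face is equally likely

# mu = x_1 p_1 + x_2 p_2 + ... + x_n p_n
mu = sum(x * p for x, p in zip(values, probs))
print(float(mu))  # 3.5
```

Exact fractions are used here only to avoid floating-point rounding in the illustration.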

In an experiment, we usually do not know the probabilities, so we use the empirical mean as an estimate of the expected value. The empirical mean \( \class{red}{\bar{x}} \) is the sum of the measurements \(x_1\), \(x_2\), \(x_3\), ..., \(x_n\) divided by the number of measurements \(n\): $$ \class{red}{\bar{x}} ~=~ \frac{x_1 ~+~ x_2 ~+~ ... ~+~ x_n}{n} $$

The variance \( \sigma^2 \) is the sum of the squared deviations \( (x_1 - \class{red}{\mu})^2 \), \( (x_2 - \class{red}{\mu})^2 \), and so on, from the expected value \( \class{red}{\mu} \), each weighted by its probability: $$ \sigma^2 ~=~ (x_1 - \class{red}{\mu})^2 p_1 ~+~ (x_2 - \class{red}{\mu})^2 p_2 ~+~ ... ~+~ (x_n - \class{red}{\mu})^2 p_n $$
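The variance formula can be sketched in the same way. The fair six-sided die below is again a hypothetical example, not taken from the article.

```python
from fractions import Fraction

# Hypothetical example (not from the article): a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6

# Expected value, needed for the deviations.
mu = sum(x * p for x, p in zip(values, probs))

# sigma^2 = (x_1 - mu)^2 p_1 + ... + (x_n - mu)^2 p_n
variance = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
print(variance)  # 35/12
```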

We can calculate the empirical variance \( \sigma_{\text e}^2 \) in an experiment as follows: $$ \sigma_{\text e}^2 ~=~ \frac{(x_1 - \class{red}{\bar{x}})^2 ~+~ (x_2 - \class{red}{\bar{x}})^2 ~+~ ... ~+~ (x_n - \class{red}{\bar{x}})^2}{n-1}$$

The square root of the variance yields the standard deviation \( \sigma \); the square root of the empirical variance yields the empirical standard deviation \( \sigma_{\text e} \). The standard deviation indicates how much the measurements \(x_1\), \(x_2\), \(x_3\), ..., \(x_n\) typically deviate from the expected value \( \class{red}{\mu} \) or from the mean \( \class{red}{\bar{x}} \): $$ \sigma ~=~ \sqrt{(x_1 - \class{red}{\mu})^2 p_1 ~+~ (x_2 - \class{red}{\mu})^2 p_2 ~+~ ... ~+~ (x_n - \class{red}{\mu})^2 p_n} $$ $$ \sigma_{\text e} ~=~ \sqrt{\frac{(x_1 - \class{red}{\bar{x}})^2 ~+~ (x_2 - \class{red}{\bar{x}})^2 ~+~ ... ~+~ (x_n - \class{red}{\bar{x}})^2}{n-1}}$$
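The empirical formulas can be checked with a few lines of Python. This is only a sketch: the five measurements are made up for illustration, and the comparison with Python's `statistics` module is added here because that module also divides by \( n-1 \).

```python
import math
import statistics

# Hypothetical measurements (not from the article).
data = [1.0, 2.0, 3.0, 4.0, 5.0]

n = len(data)
mean = sum(data) / n

# Empirical variance: squared deviations from the mean, divided by n - 1.
var_e = sum((x - mean) ** 2 for x in data) / (n - 1)
sigma_e = math.sqrt(var_e)

# The standard library uses the same n - 1 convention.
assert math.isclose(var_e, statistics.variance(data))
assert math.isclose(sigma_e, statistics.stdev(data))
print(var_e)  # 2.5
```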

If \( \class{red}{\bar{x}} \) is the mean and the measurements are normally distributed, then:

  • About 68.3% of all measurements lie within \( \class{red}{\bar{x}} \pm \sigma \).
  • 95.4% of all measurements lie within \( \class{red}{\bar{x}} \pm 2\sigma \).
  • 99.7% of all measurements lie within \( \class{red}{\bar{x}} \pm 3\sigma \).
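These percentages can be reproduced from the error function: for a normal distribution, the probability of lying within \( k \) standard deviations of the mean is \( \operatorname{erf}(k/\sqrt{2}) \). A minimal sketch, where `coverage` is a helper name introduced only for this example:

```python
import math

def coverage(k: float) -> float:
    """Probability that a normally distributed value lies within
    k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    # Prints 68.3, 95.4 and 99.7 (percent) for k = 1, 2, 3.
    print(k, round(100 * coverage(k), 1))
```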

The standard deviation \( \sigma(\class{red}{\bar{x}}) \) of the mean \( \class{red}{\bar{x}} \) for \(n\) measurements is given by the following formula: $$ \sigma(\class{red}{\bar{x}}) ~=~ \frac{\sigma_{\text e}}{\sqrt{n}} $$

Because of the factor \( 1/\sqrt{n} \), doubling the accuracy (halving the standard deviation of the mean) requires quadrupling the number of measurements!
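The \( 1/\sqrt{n} \) scaling can be made concrete with a short sketch. The value of `sigma_e` and the sample sizes are hypothetical; `std_of_mean` is a helper name introduced here.

```python
import math

def std_of_mean(sigma_e: float, n: int) -> float:
    """Standard deviation of the mean for n measurements."""
    return sigma_e / math.sqrt(n)

sigma_e = 0.5  # hypothetical empirical standard deviation

few = std_of_mean(sigma_e, 25)    # 25 measurements
many = std_of_mean(sigma_e, 100)  # four times as many measurements

# Four times the measurements halves the standard deviation of the mean.
print(few / many)
```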

  • For multiplication \( \class{red}{\bar{x}_1} \cdot \class{red}{\bar{x}_2}\) and division \( \frac{\class{red}{\bar{x}_1}}{\class{red}{\bar{x}_2}} \) of two means, their relative errors \(f_1\) and \(f_2\) add up to a total relative error: \(f = f_1 + f_2\).
  • For addition \( \class{red}{\bar{x}_1} + \class{red}{\bar{x}_2} \) and subtraction \( \class{red}{\bar{x}_1} - \class{red}{\bar{x}_2} \) of two means, their absolute errors \( \Delta x_1 \) and \( \Delta x_2 \) add up to a total absolute error: \( \Delta x = \Delta x_1 + \Delta x_2 \).
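The two rules above can be sketched as follows. The means and their absolute errors are made-up numbers for illustration only.

```python
# Hypothetical means with absolute errors (not from the article).
x1, dx1 = 10.0, 0.2
x2, dx2 = 4.0, 0.1

# Addition / subtraction: the absolute errors add up.
d_sum = dx1 + dx2  # absolute error of x1 + x2 and of x1 - x2

# Multiplication / division: the relative errors add up.
f1 = dx1 / x1
f2 = dx2 / x2
f_total = f1 + f2

# Convert the total relative error back to an absolute error of the product.
d_product = f_total * (x1 * x2)

print(round(d_sum, 3), round(f_total, 3), round(d_product, 3))
```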

Exercises with Solutions


Exercise #1: Mean and Standard Deviation of a Measurement

10 measurements were taken:

Number \( i \) | Measurement \( x_i \)
1 | 45.0
2 | 45.7
3 | 44.6
4 | 45.2
5 | 45.6
6 | 44.5
7 | 44.9
8 | 45.2
9 | 45.8
10 | 44.7

  1. What is the mean \( \class{red}{\bar{x}} \) of the sample?
  2. What is the empirical standard deviation \( s \) of the sample?
  3. By how much can the mean deviate from the true value \( x \) at a confidence level of 95%?

Tips:

  1. Use the formula for the mean.
  2. Standard deviation is given by: \[ s ~=~ \sqrt{ \frac{1}{N-1} \sum_{i=1}^N (x_i - \bar x)^2 } \]
  3. Use the so-called (Student's) t-distribution. For your case, \( N = 10 \) and 95% confidence, use \( t = 2.30 \). Calculate: \[ x ~=~ \bar{x} \pm \frac{s}{\sqrt N} \, t \]

Solution to Exercise #1.1

Using the formula for the mean: \[ \bar x ~=~ \frac{1}{N} \sum_{i=1}^N x_i \] and substituting the 10 measurements from the table: \[ \bar x ~=~ \frac{1}{10} \cdot (45.0 + 45.7 + 44.6 + 45.2 + 45.6 + 44.5 + 44.9 + 45.2 + 45.8 + 44.7) \] Evaluating this gives the mean \( \bar x ~=~ 45.12 \).

Solution to Exercise #1.2

Use the formula for the standard deviation from the tips to find the empirical standard deviation of the sample: \[ s ~=~ \sqrt{ \frac{1}{N-1} \sum_{i=1}^N (x_i - \bar x)^2 } \] Substituting the mean calculated in Exercise #1.1 and the measurements from the table gives: \[ s ~=~ \sqrt{ \frac{1}{9} \sum_{i=1}^{10} (x_i - 45.12)^2 } \]

Entered into the calculator: \[ s ~=~ 0.463 \]
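If you prefer to check the calculator results for the mean and the standard deviation in code, here is a minimal sketch using Python's standard `statistics` module, whose `stdev` function divides by \( N-1 \) as in the formula above. The variable names are chosen for this example.

```python
import statistics

# The 10 measurements from the table of Exercise #1.
x = [45.0, 45.7, 44.6, 45.2, 45.6, 44.5, 44.9, 45.2, 45.8, 44.7]

mean = statistics.mean(x)
s = statistics.stdev(x)  # empirical standard deviation (divides by N - 1)

print(round(mean, 2))  # 45.12
print(round(s, 4))     # 0.4638
```

The value 0.463 quoted in the solution is this result cut off after three decimal places.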

Solution to Exercise #1.3

To find out how far the mean calculated in Exercise #1.1 may deviate from the true value at a confidence level of 95%, use the t-value from the (Student's) t-distribution that matches your sample, that is, \( N = 10 \) and \( P = 95 \)%. The t-distribution is the right choice whenever the standard deviation of the population is not known and has to be estimated from the sample, as in this exercise.

For \( N = 10 \) and \( P = 95 \)%, the t-value is \( t = 2.30 \). With the formula from the tips: \[ x ~=~ \bar{x} \pm \frac{s}{\sqrt N} \, t \] you obtain the interval around the mean \( \bar{x} \) in which the true value \( x \) lies: \[ x ~=~ 45.12 \pm \frac{0.463}{\sqrt{10}} \cdot 2.30 \] So the true value lies within about \( \pm 0.337 \) of the mean.
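The same calculation can be sketched in Python. The t-value is taken as given in the exercise (\( t = 2.30 \)) so that the numbers match the solution; common t-tables list roughly 2.26 for 9 degrees of freedom at 95% (two-sided), which would make the interval slightly narrower. `half_width` is a name introduced for this example.

```python
import math
import statistics

# The 10 measurements from the table of Exercise #1.
x = [45.0, 45.7, 44.6, 45.2, 45.6, 44.5, 44.9, 45.2, 45.8, 44.7]
t = 2.30  # t-value as given in the exercise tips

n = len(x)
mean = statistics.mean(x)
s = statistics.stdev(x)  # empirical standard deviation (divides by N - 1)

# Half-width of the confidence interval: t * s / sqrt(N)
half_width = t * s / math.sqrt(n)

print(round(mean, 2), "+/-", round(half_width, 3))  # 45.12 +/- 0.337
```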

Exercise #2: Frequency Distribution - Relative Cumulative Frequency

A sample of 200 capacitors was taken from production for quality control of their capacitances \( C_i \). The capacitances were measured and grouped into classes, each represented by its class midpoint, as shown in the following table.

Class | Class Midpoint in \( \text{nF} \) | Number of Capacitors
1 | 841 | 3
2 | 842 | 4
3 | 843 | 3
4 | 844 | 10
5 | 845 | 2
6 | 846 | 35
7 | 847 | 70
8 | 848 | 50
9 | 849 | 23

  1. Determine the relative frequencies \( h_i \) in percent.
  2. Determine the relative cumulative frequencies \( H_i \) in percent.

Tips: The relative frequency \( h_i \) indicates what percentage of the total sample the capacitors of the \(i\)-th class make up.

The relative cumulative frequency \( H_i \) is the sum of all relative frequencies up to the \(i\)-th class midpoint.

Solution to Exercise #2.1

The relative frequency \( h_i \) is calculated for a sample of 200 capacitors as follows: \[ h_i ~=~ \frac{\text{Number in a class}}{200} ~\cdot~ 100 \]

For example, for the 1st class: \begin{align} h_1 &~=~ \frac{3}{200} ~\cdot~ 100 \\\\ &~=~ \frac{3}{2} \, \% \\\\ &~=~ 1.5 \, \% \end{align}

If you do the same for each class, you get the following table with relative frequencies:

Class | Number of Capacitors | Relative Frequency \( h_i \) in %
1 | 3 | 1.5
2 | 4 | 2
3 | 3 | 1.5
4 | 10 | 5
5 | 2 | 1
6 | 35 | 17.5
7 | 70 | 35
8 | 50 | 25
9 | 23 | 11.5
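The relative frequencies can also be computed in one step for all classes. A minimal sketch in Python, using the capacitor counts from the exercise table; the names `counts` and `h` are chosen for this example.

```python
# Number of capacitors in classes 1 to 9, from the exercise table.
counts = [3, 4, 3, 10, 2, 35, 70, 50, 23]
total = sum(counts)  # 200 capacitors in the sample

# Relative frequency of each class in percent.
h = [100 * c / total for c in counts]
print(h)  # [1.5, 2.0, 1.5, 5.0, 1.0, 17.5, 35.0, 25.0, 11.5]
```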

Solution to Exercise #2.2

To calculate the relative cumulative frequency \( H_n \), sum all relative frequencies \( h_i \) up to the \(n\)-th class. \[ H_n ~=~ h_1 ~+~ h_2 ~+~...~+~ h_n \]

For example, the relative cumulative frequency up to the 3rd class: \begin{align} H_3 &~=~ h_1 + h_2 + h_3 \\\\ &~=~ 1.5\% + 2\% + 1.5\% \\\\ &~=~ 5\% \end{align}

Class | Number of Capacitors | Relative Cumulative Frequency \( H_n \) in %
1 | 3 | 1.5
2 | 4 | 3.5
3 | 3 | 5
4 | 10 | 10
5 | 2 | 11
6 | 35 | 28.5
7 | 70 | 63.5
8 | 50 | 88.5
9 | 23 | 100
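The running sum can be sketched with `itertools.accumulate` from Python's standard library. The counts are those from the exercise table; the names `counts`, `h` and `H` are chosen for this example.

```python
from itertools import accumulate

# Number of capacitors in classes 1 to 9, from the exercise table.
counts = [3, 4, 3, 10, 2, 35, 70, 50, 23]
total = sum(counts)

# Relative frequencies in percent, then their running sum.
h = [100 * c / total for c in counts]
H = list(accumulate(h))
print(H)  # [1.5, 3.5, 5.0, 10.0, 11.0, 28.5, 63.5, 88.5, 100.0]
```

The last value must always be 100%, which is a quick check that no class was left out.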