This is the second post about statistical testing. In the first one I explained the principal concept behind statistical tests.

In the following I will describe the four steps that I showed in my introduction, specifically for the **one-sample t-test**. In the last part I will show how to perform it in R and how to interpret the results.

### A. One sample t-test

The one sample t-test is used when you have **i.i.d. (independent and identically distributed)** data points. Single data points are also often called “sample”, but in my opinion this leads to confusion because the complete set of data points is also called a **sample**. The sample is drawn from a population with unknown mean \(\mu\) and standard deviation \(\sigma\) (or variance \(\sigma^2\)).

The Central Limit Theorem states that the mean \(\bar{X}\) of a sample X is approximately normally distributed when the sample is large enough, whatever the underlying distribution of X is (as long as it has a finite variance). A common rule of thumb is \(n\ge 30\) data points.
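To make the Central Limit Theorem a bit more tangible, here is a small simulation sketch. The exponential distribution, the seed and the number of repetitions are my own arbitrary choices for illustration; any clearly non-normal distribution with finite variance would do.

```r
set.seed(42)     # arbitrary seed, only for reproducibility

n <- 30          # data points per sample
n_rep <- 5000    # number of simulated samples

# Draw n_rep samples from an exponential distribution (mean 1, clearly skewed)
# and keep the mean of each sample.
sample_means <- replicate(n_rep, mean(rexp(n, rate = 1)))

# The sample means cluster around the population mean 1 ...
mean(sample_means)
# ... with a spread close to sigma / sqrt(n) = 1 / sqrt(30)
sd(sample_means)
1 / sqrt(n)

# hist(sample_means, breaks = 50)  # uncomment to look at the bell shape
```

Although single draws from the exponential distribution are strongly skewed, the histogram of the sample means should already look roughly bell-shaped.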

The **purpose** of the one sample t-test is to find out, based on the mean \(\bar{X}\) of our sample, whether the true mean \(\mu\) of the population can plausibly be equal to a given reference value.

#### 1. Formulate \(H_0\)

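As mentioned above, we want to test whether the true population mean \(\mu\) equals a reference mean, which I call \(\mu_0\). This leads to the following hypotheses:

$$H_0: \mu = \mu_0$$

$$H_1: \mu \ne \mu_0$$

The reference mean \(\mu_0\) is a fixed number that we choose before looking at the data; you will also see it written as \(\hat{\mu}\). The sample mean \(\bar{X}\) is our estimate of the unknown \(\mu\). Since \(\mu\) itself is unknown, the reference value takes its place when we compute the test statistic.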

#### 2. Identify a test statistic

For the computation of the test statistic we need the sample mean \(\bar{X}\) and the sample variance \(s^2\). I skip these two formulas because most people know how to compute them, but the two links will help you otherwise.

To do it in R just write:

```
mean(X) # computes the mean of a numerical vector
var(X) # computes the sample variance of a numerical vector
```

So, the test statistic is:

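$$T = \left|\frac{\bar{X}-\mu_0}{s}\sqrt{n}\right|$$

Here \(\mu_0\) is the reference mean we test against (the value that \(H_0\) assumes for \(\mu\)), \(s = \sqrt{s^2}\) is the sample standard deviation and \(n\) is the number of data points. You might notice that this is the absolute standardized value of \(\bar{X}\).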
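As a quick sketch, the test statistic can be computed by hand in R. The data vector and the reference mean `mu0` below are made up for illustration, and I call the result `T_stat` instead of `T`.

```r
# Hypothetical example data (not taken from this post)
X <- c(14.2, 15.8, 16.1, 13.9, 15.3, 14.7, 16.4, 15.0)
mu0 <- 15.5        # reference mean to test against (assumed value)
n <- length(X)     # number of data points

# |X_bar - mu0| / s * sqrt(n), where sd(X) equals sqrt(var(X))
T_stat <- abs((mean(X) - mu0) / sd(X) * sqrt(n))
T_stat

# t.test() reports the signed version of the same quantity
t_builtin <- unname(t.test(X, mu = mu0)$statistic)
t_builtin
```

The two values should agree up to the sign, because `t.test()` does not take the absolute value.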

#### 3. Compute a p-value

To compute a p-value, which corresponds to the area under the curve that lies at least \(T\) away from 0 (in both the positive and the negative direction), we need a statistics program or a table.

The first thing we need to choose is the reference distribution. In the introduction I described the most commonly used distributions in detail. Because we estimate the standard deviation from the sample, the test statistic follows a t distribution with \(n-1\) degrees of freedom (exactly so for normally distributed data). As mentioned above, with enough data points the normal distribution is a good approximation, and the usual rule of thumb is \(n \ge 30\).

Secondly, we need to know whether we have a one-sided or a two-sided t-test. I will not go into detail about this in this post, but you can check out the link I provided.

Because the alternative hypothesis \(H_1\) uses \(\ne\) rather than \(\lt\) or \(\gt\), it should be clear that this is a two-sided test, which is depicted in the plot:

To compute the cumulative probability up to a given quantile in R you type:

```
pt(T, df) # t distribution
pnorm(T) # normal distribution
```

You will notice that the t distribution requires the degrees of freedom *df*, which is the number of data points in the sample minus 1 (\(n-1\)).

These calls give you the area in \((-\infty, T]\), but we want the area in \([T, \infty)\), which is one minus that value. We then multiply it by 2 to get the two-sided p-value (because the curve is symmetric).

```
(1-pt(T, df))*2 # t distribution
(1-pnorm(T))*2 # normal distribution
```
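To check that this recipe gives the same p-value as R's built-in test, here is a minimal sketch with made-up data. The names `T_stat` and `dof` are my own: in R, `T` is also a shorthand for `TRUE` and `df` is the name of a built-in function, so I avoid overwriting them.

```r
# Hypothetical example data and reference mean (not taken from this post)
X <- c(14.2, 15.8, 16.1, 13.9, 15.3, 14.7, 16.4, 15.0)
mu0 <- 15.5
n <- length(X)
dof <- n - 1                     # degrees of freedom

# Test statistic as defined in step 2
T_stat <- abs((mean(X) - mu0) / sd(X) * sqrt(n))

# Two-sided p-value: upper tail area beyond T_stat, doubled
p_manual <- (1 - pt(T_stat, dof)) * 2

# Same p-value as reported by t.test()
p_builtin <- t.test(X, mu = mu0)$p.value
c(manual = p_manual, t.test = p_builtin)
```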

#### 4. Compare to an appropriate \(\alpha\)-value

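By convention you use an \(\alpha\)-value of \(0.05\). A little more detail can be found in the introduction.

An \(\alpha\)-value of \(0.05\) corresponds to a confidence level of \(95\%\): the \(95\%\) confidence interval contains exactly those reference means that the two-sided test would not reject.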

**Reject or not reject?**

- \(p \lt \alpha\): reject
- \(p \ge \alpha\): don’t reject
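The connection between \(\alpha\) and the confidence interval can be checked numerically. The following is only a sketch on simulated data; the seed, the sample and the helper `check_guess()` are my own assumptions for illustration.

```r
set.seed(1)                          # arbitrary seed for reproducibility
X <- rnorm(20, mean = 15, sd = 2)    # simulated data, not the sample used later
alpha <- 0.05

# For a reference mean mu0: does the test reject, and is mu0 outside the CI?
check_guess <- function(mu0) {
  res <- t.test(X, mu = mu0, conf.level = 1 - alpha)
  ci <- res$conf.int
  c(rejected = res$p.value < alpha,
    outside_ci = mu0 < ci[1] || mu0 > ci[2])
}

guess_far <- check_guess(12)    # a guess far away from the data
guess_near <- check_guess(15)   # a guess close to the data
guess_far
guess_near
```

For each guess the two entries should agree: the test rejects a reference mean whenever that value lies outside the confidence interval.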

#### 3. + 4. Directly compare test statistic

We can also directly compare T to the z- or t-score of the reference distribution.

To get this value, we can look it up in a table or use a statistics program like R.

```
qnorm(1 - alpha/2) # normal distribution, two-sided critical value
qt(1 - alpha/2, df) # t distribution, two-sided critical value
```

What we get is the critical value: the z-score \(u\) (I do not know why it is mostly called u) or the t-score \(t\).

**Reject or not reject?**

- \(T \gt t\): reject
- \(T \le t\): don’t reject
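Because \(T\) is an absolute value and the test is two-sided, the critical value is the \(1-\alpha/2\) quantile, so that \(\alpha/2\) of the area lies in each tail. A minimal sketch of the decision, where the value of `T_stat` and the degrees of freedom are purely illustrative assumptions:

```r
alpha <- 0.05
dof <- 19            # degrees of freedom, e.g. n = 20 data points
T_stat <- 0.3139     # absolute test statistic, taken as given here

# Two-sided critical values: alpha / 2 of the area lies in each tail
t_crit <- qt(1 - alpha / 2, dof)   # about 2.093
z_crit <- qnorm(1 - alpha / 2)     # about 1.96

# Reject H0 only if the test statistic exceeds the critical value
reject <- T_stat > t_crit
reject
```

The t-score is larger than the z-score because the t distribution has heavier tails for small samples; the two get closer as the degrees of freedom grow.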

#### Perform this test in R

Phew. This was a lot of explanations and many variables and a few formulas. Now to the simple part, let’s do this in R!

Let’s say we have a sample with 20 data points, drawn from a population with a true mean of \(\mu=15\) and a standard deviation of \(\sigma=2\) (this could be temperature or any other continuous metric data). I let three of my friends guess what the mean of the data is.

```
X1 <- rnorm(20, 15, 2) # 20 draws with mean 15 and standard deviation 2
mu1 <- 15.5 # friend 1
mu2 <- 12 # friend 2
mu3 <- 15 # friend 3
t.test(X1, mu = mu1) # two-sided test by default
t.test(X1, mu = mu2)
t.test(X1, mu = mu3)
```

The output for the first guess looks like this:

```
One Sample t-test
data: X1
t = -0.3139, df = 19, p-value = 0.757
alternative hypothesis: true mean is not equal to 15.5
95 percent confidence interval:
14.47450 16.25801
sample estimates:
mean of x
15.36625
```

The function reports the test statistic (here denoted by lowercase t), the degrees of freedom, the p-value, the confidence interval and the mean of the sample. The interpretation is up to you.

The results from the *t.test()* function are:

- p-value: 0.757, t: -0.3139, *my interpretation: don't reject*
- p-value: \(2.016 \times 10^{-7}\), t: 7.9009, *my interpretation: reject*
- p-value: 0.4007, t: 0.8596, *my interpretation: don't reject*

You might notice a few things:

- The test statistic t can be positive or negative.
- The p-values can get pretty high, but even for the "correct" guess the p-value is not 1.
- For two of the guesses the p-value is too large to reject the null hypothesis.

We might get some more data points and do our t-tests again:

```
X2<-rnorm(1000, 15, 2)
t.test(X2, mu = mu1)
t.test(X2, mu = mu2)
t.test(X2, mu = mu3)
```

The results from the *t.test()* function are:

- p-value: 0.03879, t: -2.22, *my interpretation: reject*
- p-value: \(3.149 \times 10^{-8}\), t: 8.9306, *my interpretation: reject*
- p-value: 0.5381, t: -0.627, *my interpretation: don't reject*

This time the first guess is also rejected, although not with a very low p-value: if the guess were right, a test statistic at least this extreme would still show up in about \(3.9\%\) of samples.

### Thoughts

Comparing the two scenarios, we clearly see that more data helps us to reject a wrong guess, even though the difference between the true value and the guess is the same. What we should also have learned is that **not rejecting** the null hypothesis does not mean **accepting** it, and it certainly does not mean that the null hypothesis is true.
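The effect of the sample size can also be seen in a small simulation. This is only a sketch: the helper `rejection_rate()`, the seed and the number of repetitions are my own assumptions, while the population (mean 15, standard deviation 2) and the guess of 15.5 follow the example above.

```r
set.seed(123)   # arbitrary seed for reproducibility

# Share of simulated samples in which H0: mu = mu0 is rejected at level alpha
rejection_rate <- function(n, mu0, n_rep = 2000, alpha = 0.05) {
  p_values <- replicate(
    n_rep,
    t.test(rnorm(n, mean = 15, sd = 2), mu = mu0)$p.value
  )
  mean(p_values < alpha)
}

rate_small <- rejection_rate(n = 20, mu0 = 15.5)     # few data points
rate_large <- rejection_rate(n = 1000, mu0 = 15.5)   # many data points
c(n20 = rate_small, n1000 = rate_large)
```

With 20 data points the slightly wrong guess should be rejected only in a minority of the simulated samples, while with 1000 data points it should be rejected almost every time.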

This test is important and worth understanding because it is, for example, the basis for the paired t-test, and it also helps to understand other, more complicated statistical tests.