How to interpret chi-square probability plot

Chi square test

In this post, we show you how to carry out the chi-square test. We walk through the test step by step, explain its most important requirements, and work through chi-square test examples for further understanding.

Chi square test explained in simple terms

The chi-square test is a statistical test that makes statements about the relationship between two variables measured on a nominal or ordinal scale. It is a type of hypothesis test. The variant described here is called the chi-square test of independence, because it checks whether the variables are stochastically independent; it should not be confused with the related chi-square goodness-of-fit test, which compares a single variable against a theoretical distribution. The data are displayed and the calculation is carried out mainly with a crosstab (contingency table). The exact calculation and specific examples follow in the sections below.

Calculate Chi square

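There is a general formula for calculating the chi-square value, which looks like this:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

Here $O_{ij}$ is the observed frequency and $E_{ij}$ the expected frequency in row $i$ and column $j$ of a crosstab with $r$ rows and $c$ columns.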

The calculation procedure can be derived directly from the formula. In the first step, you use the row and column sums to calculate the expected frequency for each cell of your crosstab. Together with the observed frequencies, you then evaluate the formula for each cell and finally sum the cell values to obtain the chi-square value. At the end, as part of the interpretation, the calculated chi-square value must be compared with a critical value.
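
The procedure can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a statistics package: the function name `chi_square` and the small `toy_table` are assumptions made up for this example.

```python
def chi_square(observed):
    """Return the chi-square value for a crosstab given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    total = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Step 1: expected frequency from the row and column sums.
            expected = row_totals[i] * col_totals[j] / n
            # Step 2: squared deviation, divided by the expected frequency.
            total += (obs - expected) ** 2 / expected
    return total


# Hypothetical 2 x 2 crosstab, used only to exercise the function.
toy_table = [[10, 20],
             [20, 10]]
print(round(chi_square(toy_table), 2))  # 6.67
```

In the toy table every expected frequency is 15, so each of the four cells contributes 25 / 15 to the sum.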

Chi square test requirements

The main requirement for a chi-square test is the appropriate scale level: the variables under consideration should be either nominal or ordinal. In addition, as a rule of thumb, the sample should contain more than 50 cases, because no cell should have an expected frequency of less than 5. Otherwise the chi-square test may not produce valid results, and the smaller the number of cases, the more likely this is to happen. In such cases it makes more sense to use Fisher's exact test.
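
Both rules of thumb are easy to check before running the test. The following sketch assumes the thresholds named above (more than 50 cases, no expected frequency below 5); the helper names `expected_frequencies` and `check_requirements` and the small table are hypothetical.

```python
def expected_frequencies(observed):
    """Expected frequency of each cell: row total * column total / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def check_requirements(observed, min_cases=50, min_expected=5):
    """Return a list of warnings; an empty list means both rules hold."""
    warnings = []
    n = sum(sum(row) for row in observed)
    if n <= min_cases:
        warnings.append(f"only {n} cases (more than {min_cases} recommended)")
    smallest = min(min(row) for row in expected_frequencies(observed))
    if smallest < min_expected:
        warnings.append(
            f"smallest expected frequency is {smallest:.2f} (below {min_expected})"
        )
    return warnings


# Hypothetical small sample that violates both rules of thumb.
small_table = [[3, 2],
               [4, 6]]
for warning in check_requirements(small_table):
    print(warning)
```

If the list of warnings is not empty, Fisher's exact test is the safer choice.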

Chi square test example

The procedure just explained will now be made clear on the basis of two different examples. In the first example we work with two nominally scaled variables and a 2 × 2 crosstab and in the second example with two ordinally scaled variables and a 3 × 3 crosstab.

Calculation with nominally scaled variables

We start with the example in which we want to check the relationship between the two nominal variables gender and playing an instrument using the chi-square test. For this we assume that 100 primary school children were interviewed in a survey. The results are then grouped and recorded in a crosstab.

| Gender / instrument | plays an instrument | doesn't play an instrument | Total |
| --- | --- | --- | --- |
| Male | 23 | 29 | 52 |
| Female | 38 | 10 | 48 |
| Total | 61 | 39 | 100 |

The last column and the last row contain the row and column sums, which are needed for the first step in calculating the chi-square value. In this first step, the expected frequency is calculated for each cell in the crosstab. To do this, you multiply the row total and the column total of the cell under consideration and then divide by the number of cases, which is noted in the bottom right of the table. To illustrate the process, here is the formula for calculating the expected frequency:

$$E_{ij} = \frac{R_i \times C_j}{n}$$

where $R_i$ is the total of row $i$, $C_j$ is the total of column $j$, and $n$ is the number of cases.

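If you now carry out this calculation for all four cells of your 2 × 2 crosstab, you will get these results:

$$E_{\text{male, plays}} = \frac{52 \times 61}{100} = 31.72 \qquad E_{\text{male, doesn't play}} = \frac{52 \times 39}{100} = 20.28$$

$$E_{\text{female, plays}} = \frac{48 \times 61}{100} = 29.28 \qquad E_{\text{female, doesn't play}} = \frac{48 \times 39}{100} = 18.72$$

If the observed frequencies matched these expected frequencies exactly, there would be no relationship at all between the two variables.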

To keep track of the values that belong together, note the expected frequencies you have just calculated next to the observed frequencies that have already been entered in the table.

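Observed frequencies, with expected frequencies in parentheses:

| Gender / instrument | plays an instrument | doesn't play an instrument | Total |
| --- | --- | --- | --- |
| Male | 23 (31.72) | 29 (20.28) | 52 |
| Female | 38 (29.28) | 10 (18.72) | 48 |
| Total | 61 | 39 | 100 |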
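
The expected frequencies can be reproduced with a short Python sketch. The variable names are chosen for this illustration only; the counts are the observed frequencies from the survey.

```python
# Observed frequencies: rows = male, female; columns = plays, doesn't play.
observed = [[23, 29],
            [38, 10]]

row_totals = [sum(row) for row in observed]        # [52, 48]
col_totals = [sum(col) for col in zip(*observed)]  # [61, 39]
n = sum(row_totals)                                # 100

# Expected frequency of a cell = row total * column total / number of cases.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(value, 2) for value in row])
# [31.72, 20.28]
# [29.28, 18.72]
```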

Although you cannot make a final statement about the relationship between the variables until the full calculation has been completed, clear deviations between observed and expected frequencies are visible at first glance. These offer at least a first numerical clue that a relationship between the variables may be demonstrable here.

Chi square value

In the next step, the chi-square formula that you have already seen is applied. Having calculated the expected frequencies for all cells, you already have all the necessary values. Specifically, for each cell the difference between the observed and the expected frequency is calculated, squared and then divided by the expected frequency. In a final step, the results for the individual cells are summed to give the chi-square value.

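As a reminder, here is the formula again, where $O_{ij}$ and $E_{ij}$ are the observed and expected frequencies of each cell:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$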

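You now carry out the calculation specified by the fraction in the formula for each cell individually. It looks like this for the respective cells:

$$\text{male, plays: } \frac{(23 - 31.72)^2}{31.72} \approx 2.40 \qquad \text{male, doesn't play: } \frac{(29 - 20.28)^2}{20.28} \approx 3.75$$

$$\text{female, plays: } \frac{(38 - 29.28)^2}{29.28} \approx 2.60 \qquad \text{female, doesn't play: } \frac{(10 - 18.72)^2}{18.72} \approx 4.06$$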

Now the individual results only have to be added up to get the chi square value.

Chi square = 2.40 + 3.75 + 2.60 + 4.06 = 12.81
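
The same arithmetic can be checked with a small Python sketch. The variable names are illustrative; the observed and expected frequencies are the ones from the table in this example.

```python
observed = [[23, 29],
            [38, 10]]
expected = [[31.72, 20.28],
            [29.28, 18.72]]

# Contribution of each cell: (observed - expected)^2 / expected.
contributions = [
    [(o - e) ** 2 / e for o, e in zip(obs_row, exp_row)]
    for obs_row, exp_row in zip(observed, expected)
]
chi_square = sum(sum(row) for row in contributions)

print([[round(x, 2) for x in row] for row in contributions])
# [[2.4, 3.75], [2.6, 4.06]]
print(round(chi_square, 2))  # 12.81
```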

You will learn how to use the chi-square value to make statements about the relationship between your variables in the section on interpreting the chi-square test. For now, remember that calculating the chi-square value completes only the first part of the chi-square test.

Before diving into the further use of the chi square value and its subsequent interpretation, let's consider a second example for the sake of completeness, since the chi square test can also be carried out on ordinal variables.

Calculation with ordinally scaled variables

In the second example of the chi-square test, we work with two ordinally scaled variables, which can be represented in a 3 × 3 crosstab. The aim is to use the chi-square test to check the relationship between school-leaving qualifications and income. For school-leaving qualifications, a distinction is made between three categories: secondary school leaving certificate, intermediate school leaving certificate (medium maturity) and (technical) high school diploma. There are three income levels: Level I: €0 to €30,000, Level II: €30,001 to €60,000 and Level III: €60,001 to €100,000. The fictitious survey therefore covered only people who hold one of these school-leaving qualifications and earn between €0 and €100,000 per year. All of this information can be displayed in a crosstab as follows:

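| School leaving certificate / income (year) | €0 - €30,000 | €30,001 - €60,000 | €60,001 - €100,000 | Total |
| --- | --- | --- | --- | --- |
| Secondary school leaving certificate | 39 | 40 | 13 | 92 |
| Intermediate school leaving certificate (medium maturity) | 22 | 48 | 17 | 87 |
| (Technical) high school diploma | 11 | 66 | 44 | 121 |
| Total | 72 | 154 | 74 | 300 |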

As in the first example, you use the column and row sums as well as the total number of cases to calculate the expected frequencies for each individual cell.

If you now transfer all expected frequencies into the table as in the first example, the following situation arises:

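Observed frequencies, with expected frequencies in parentheses:

| School leaving certificate / income (year) | €0 - €30,000 | €30,001 - €60,000 | €60,001 - €100,000 | Total |
| --- | --- | --- | --- | --- |
| Secondary school leaving certificate | 39 (22.08) | 40 (47.23) | 13 (22.69) | 92 |
| Intermediate school leaving certificate (medium maturity) | 22 (20.88) | 48 (44.66) | 17 (21.46) | 87 |
| (Technical) high school diploma | 11 (29.04) | 66 (62.11) | 44 (29.85) | 121 |
| Total | 72 | 154 | 74 | 300 |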

Now you apply the formula to each individual cell again, so that at the end you can add everything up to obtain the chi-square value and thus your result.

So the chi-square value for your crosstab with the underlying variables school-leaving certificate and income is approximately 37.61.
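
Because the sum runs over nine cells, it is easy to slip when adding by hand. A short Python sketch that recomputes the value directly from the observed counts, using unrounded expected frequencies, gives about 37.61. The variable names are illustrative.

```python
observed = [
    [39, 40, 13],  # secondary school leaving certificate
    [22, 48, 17],  # medium maturity
    [11, 66, 44],  # (technical) high school diploma
]

row_totals = [sum(row) for row in observed]        # [92, 87, 121]
col_totals = [sum(col) for col in zip(*observed)]  # [72, 154, 74]
n = sum(row_totals)                                # 300

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected frequency from the row and column sums.
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (obs - expected) ** 2 / expected

print(round(chi_square, 2))  # approximately 37.61
```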

Using both examples, in the following section you will now learn exactly how to use the chi square value in the course of the chi square test and how to work with the chi square distribution table in this context.

Chi square test interpretation

Now that you have calculated the chi-square value for two different examples, you are probably wondering how it helps you make a statement about the relationship between the variables under consideration. The size of the chi-square value alone does not indicate the direction or the strength of the relationship, not least because the value also depends on the sample size and the size of the crosstab. Based on the values alone, you therefore could not claim that gender and playing an instrument, with a chi-square value of 12.81, are less strongly related than school-leaving qualifications and income, with a chi-square value of 37.61. This is where the second part of the chi-square test comes in.

Chi square degrees of freedom

With the chi-square test, as with other hypothesis tests (e.g. the t-test), a critical value must be read from the chi-square distribution table using the level of significance and the degrees of freedom, so that the calculated chi-square value can be compared with it. This is the only way to determine whether there is a statistically significant relationship.

For example 1, the relationship between gender and playing an instrument, you have calculated a chi-square value of 12.81. For a calculation of this type, a significance level of 5% is a common choice. In the distribution table you therefore have to look up the column with the value 0.95 (that is, 1 − 0.05). You also have to calculate the degrees of freedom. They depend on the size of your crosstab, which is 2 × 2 in example 1. The calculation is very simple and follows this formula:

$$df = (\text{number of columns} - 1) \times (\text{number of rows} - 1)$$

In the case of example 1, you can now look up the relevant critical value in the chi-square distribution table for a significance level of 5% and $df = (2 - 1) \times (2 - 1) = 1$ degree of freedom.

For example 2, the significance level of 5% also applies when looking up the table. According to the formula, the number of degrees of freedom is $df = (3 - 1) \times (3 - 1) = 4$.
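
As a quick check, the degrees of freedom for both examples can be computed with a one-line function. The name `degrees_of_freedom` is an assumption made for this sketch.

```python
def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom of a crosstab: (columns - 1) * (rows - 1)."""
    return (n_cols - 1) * (n_rows - 1)


print(degrees_of_freedom(2, 2))  # 1  (example 1, 2 x 2 crosstab)
print(degrees_of_freedom(3, 3))  # 4  (example 2, 3 x 3 crosstab)
```

Only the number of categories per variable counts here; the row and column holding the totals are not included.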

Chi square table

You can now read off the critical values for the two examples in the distribution table. Without this step, the chi-square test would not be complete and therefore not meaningful. The relevant excerpt from the chi-square distribution table is shown here.

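Critical values by degrees of freedom (df) and probability 1 − α, where α is the level of significance:

| df | 0.70 | 0.75 | 0.80 | 0.85 | 0.90 | 0.95 |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 1.07 | 1.32 | 1.64 | 2.07 | 2.71 | 3.84 |
| 2 | 2.41 | 2.77 | 3.22 | 3.79 | 4.61 | 5.99 |
| 3 | 3.66 | 4.11 | 4.64 | 5.32 | 6.25 | 7.81 |
| 4 | 4.88 | 5.39 | 5.99 | 6.74 | 7.78 | 9.49 |
| 5 | 6.06 | 6.63 | 7.29 | 8.12 | 9.24 | 11.07 |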

Based on the level of significance and the number of degrees of freedom, you can read off a critical value of 3.84 for example 1. At 12.81, the calculated chi-square value is considerably higher. After performing the chi-square test, you can therefore conclude that there is a statistically significant relationship between the variables gender and playing an instrument.

For example 2, you can likewise look up the critical value in the table, which is 9.49. Here, too, your calculated chi-square value of 37.61 is well above the critical value. The chi-square test therefore also indicates a statistically significant relationship between the variables in the second example, namely school-leaving qualifications and income.
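
The decision rule can be written down as a small lookup. This sketch hard-codes the 0.95 column of the table excerpt, so it only covers a significance level of 5% and 1 to 5 degrees of freedom; the names `CRITICAL_VALUES_95` and `is_significant` are hypothetical.

```python
# Critical values for a significance level of 5 % (column 0.95 of the excerpt).
CRITICAL_VALUES_95 = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49, 5: 11.07}


def is_significant(chi_square, df):
    """True if the chi-square value exceeds the 5 % critical value for df."""
    return chi_square > CRITICAL_VALUES_95[df]


print(is_significant(12.81, df=1))  # True  (example 1)
print(is_significant(2.50, df=1))   # False (hypothetical value for contrast)
```

A value at or below the critical value means the assumption of independence cannot be rejected at the 5% level.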

Summary chi square test

In summary, the chi-square test can make the following statement: is there a statistically significant relationship between two variables or are the variables independent of one another?

However, if you want to make more detailed claims about the relationship between your variables, you have to carry out further calculations such as Phi, Cramér's V or Pearson's contingency coefficient. These measures can provide information about the direction or strength of a relationship, which the chi-square test cannot do on its own.
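
As a hedged illustration of such a follow-up calculation, the sketch below computes Cramér's V and Pearson's contingency coefficient for both examples using their standard definitions, V = sqrt(χ² / (n · (k − 1))) with k the smaller of the number of rows and columns, and C = sqrt(χ² / (χ² + n)). The function names are assumptions made for this example.

```python
from math import sqrt


def chi_square(observed):
    """Chi-square value of a crosstab given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return sum(
        (obs - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(observed, row_totals)
        for obs, c in zip(row, col_totals)
    )


def cramers_v(observed):
    """Cramér's V: chi-square scaled by sample size and table size."""
    n = sum(sum(row) for row in observed)
    k = min(len(observed), len(observed[0]))
    return sqrt(chi_square(observed) / (n * (k - 1)))


def contingency_coefficient(observed):
    """Pearson's contingency coefficient C."""
    n = sum(sum(row) for row in observed)
    chi2 = chi_square(observed)
    return sqrt(chi2 / (chi2 + n))


example_1 = [[23, 29], [38, 10]]
example_2 = [[39, 40, 13], [22, 48, 17], [11, 66, 44]]

print(round(cramers_v(example_1), 2))  # 0.36
print(round(cramers_v(example_2), 2))  # 0.25
print(round(contingency_coefficient(example_1), 2))  # 0.34
print(round(contingency_coefficient(example_2), 2))  # 0.33
```

By this measure the relationship in example 1 is the stronger one, even though example 2 has the larger chi-square value. For a 2 × 2 crosstab, Cramér's V coincides with the absolute value of Phi.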