# Statistical analysis and multiple comparison correction for EEG data

## Introduction

The objective of this tutorial is to give an introduction to the statistical analysis of EEG data using different methods to control for the false alarm rate. The tutorial starts with sketching the background of cluster-based permutation tests. Subsequently it is shown how to use FieldTrip to perform statistical analysis (including cluster-based permutation tests) on the time-frequency response to a movement, and to the auditory mismatch negativity. The tutorial makes use of a between-trials (within-subject) design.

In this tutorial we will continue working on the dataset described in the Preprocessing and event-related activity and the Time-frequency analysis of MEG and EEG tutorials. We will repeat some code here to select the trials and preprocess the data. We assume that the preprocessing and the computation of the ERFs/TFRs are already clear to the reader.

This tutorial is not covering group analysis. Look here for that. If you are interested, you can read the other tutorials that cover cluster-based permutation tests on event related fields and on time-frequency data. If you are interested in a more gentle introduction as to how parametric statistical tests can be used with FieldTrip, you can read the Parametric and non-parametric statistics on event-related fields tutorial.

This tutorial contains the hands-on material of the NatMEG workshop. The background is explained in this lecture, which was recorded at the Aston MEG-UK workshop.

Embedded video is disabled due to your privacy settings, you can find the video on YouTube.

## Background

The topic of this tutorial is the statistical analysis of multi-channel EEG data. In cognitive experiments, the data is usually collected in different experimental conditions, and the experimenter wants to know whether there is a difference in the data observed in these conditions. In statistics, a result (for example, a difference among conditions) is statistically significant if it is unlikely given the null hypothesis that there is no difference between the conditions. The “unlikeliness” is evaluated according to a predetermined threshold probability, the alpha level.

An important feature of EEG data is that it has spatio-temporal structure, i.e. the data is sampled at multiple time-points and sensors. The nature of the data influences which kind of statistics is the most suitable for comparing conditions. If the experimenter is interested in a difference in the signal at a certain time-point and sensor, then the more widely used parametric tests are sufficient. If it is not possible to predict where the differences are, then many statistical comparisons are necessary which lead to the multiple comparisons problem (MCP). The MCP arises from the fact that the effect of interest (i.e., a difference between experimental conditions) is evaluated at an extremely large number of (channel,time)-pairs. This number is usually in the order of several thousands. Now, the MCP involves that, due to the large number of statistical comparisons (one per (channel,time)-pair), it is not possible to control the so called family-wise error rate (FWER) by means of the standard statistical procedures that operate at the level of single (channel,time)-pairs. The FWER is the probability, under the hypothesis of no effect, of falsely concluding that there is a difference between the experimental conditions at one or more (channel,time)-pairs. A solution of the MCP requires a procedure that controls the FWER at some critical alpha-level (typically, 0.05 or 0.01). The FWER is also called the false alarm rate.

When parametric statistics are used, one method that addresses this problem is the so-called Bonferroni correction. The idea is if the experimenter is conducting n number of statistical tests then each of the individual tests should be tested under a significance level that is divided by n. The Bonferroni correction was derived from the observation that if n tests are performed with an alpha significance level then the probability that one test comes out significant is smaller than or equal to n times alpha (Boole’s inequality). In order to keep this probability lower, we can use an alpha that is divided by n for each test. However, the correction comes at the cost of increasing the probability of false negatives, i.e. the test does not have enough power to reveal differences among conditions.

In contrast to the familiar parametric statistical framework, it is straightforward to solve the MCP in the nonparametric framework. Nonparametric tests offer more freedom to the experimenter regarding which test statistics are used for comparing conditions, and help to maximize the sensitivity to the expected effect. For more details see the publication by Maris and Oostenveld (2007).

## Procedure for ERPs

### Before we begin

We will clear all variables that we have in the workspace, restore the default path, add fieldtrip and run ft_defaults

clear variables
restoredefaultpath

ft_defaults


### Load the ERP data and preprocess

load cleaned_data_ERP.mat


First, we load the data that we created in the ERP tutorial

Note the naming convention used - each saved .mat-file contains one and only one variable, which has the same name as the .mat-file
This makes it clear what variables are loaded into the workspace.

We then apply the same preprocessing as before.

cfg                = [];
cfg.lpfilter       = 'yes';
cfg.lpfreq         = 30;
cfg.demean         = 'yes'; % we demean (baseline correct) ...
cfg.detrend        = 'yes'; % removing linear trends
cfg.baselinewindow = [-Inf 0];% using the mean activity in this window

data_EEG_filt = ft_preprocessing(cfg, cleaned_data_ERP);


#### The Student’s t-test

A ubiquitous test used to assess statistical significance is the Student’s t-test Wikipedia entry original article.
To perform the t-test, t-values need to be calculated, which a bit simplified are: \frac{µ}{SEM}, where µ is the mean difference between the conditions and SEM is the standard deviation divided by \sqrt{n}, where n is the number of observations.

We will do a within-subject between-trials statistical test. We proceed by computing the statistical test, which returns the t-value, the probability and a binary mask that contains a 0 for all data points where the probability is below the a-priori threshold, and 1 where it is above the threshold. The cfg.design field specifies the condition in which each of the trials is observed. For the indepsamplesT statistic, it should contain 1’s and 2’s.

cfg           = [];

cfg.method    = 'analytic'; % using a parametric test
cfg.statistic = 'ft_statfun_indepsamplesT'; % using independent samples
cfg.correctm  = 'no'; % no multiple comparisons correction
cfg.alpha     = 0.05;

cfg.design    = data_EEG_filt.trialinfo; % indicating which trials belong ...
% to what category
cfg.ivar      = 1; % indicating that the independent variable is found in ...
% first row of cfg.design

stat_t = ft_timelockstatistics(cfg, data_EEG_filt);


stat_t contains:

stat_t =

stat: [128x200 double]
df: 572
critval: [-1.9641 1.9641]
prob: [128x200 double]
dimord: 'chan_time'
elec: [1x1 struct]
label: {128x1 cell}
time: [1x200 double]
cfg: [1x1 struct]

• stat contains the t-values at all channels and time points
• df is the degrees of freedom determining the t-distribution that the t-value is compared against to obtain the p-values
• critval contains the critical values for the test performed; if a t-value is lesser than the negative value or greater than the positive value, the difference is declared significant. The critval is dependent on cfg.alpha and the degrees of freedom (df)
• prob contains the p-values assciated with the t-values given the degrees of freedom (df)
• mask is a logical matrix, 0’s are where p-values (prob) are greater than cfg.alpha, 1’s are where they are lesser than cfg.alpha
• dimord indicates the ordering of dimensions, rows are channels and columns are time
• elec contains information about the electrodes, e.g., positions and names
• label contains the names of all channels
• time is a row vector with the time points in seconds
• cfg shows the cfg that gave rise to this structure

We can now plot the ERPs using the field cfg.maskparameter of the plotting functions: ft_multiplotER, ft_singleplotER and ft_topoplotER

Here, we will show ft_singleplotER

ERP_standard.mask = stat_t.mask; % adding mask to ERP

figure

cfg               = [];
cfg.layout        = 'natmeg_customized_eeg1005.lay';
cfg.channel       = 'EEG124';
cfg.ylim          = [-5e-6 5e-6]; % Volts

ft_singleplotER(cfg, ERP_standard, ERP_deviant);
hold on
xlabel('Time (s)')
ylabel('Electric Potential (V)')
plot([ERP_standard.time(1), ERP_standard.time(end)], [0 0], 'k--') % hor. line
plot([0 0], cfg.ylim, 'k--') % vert. l
axes = gca;
legend(axes.Children([10 3]), {'Standard', 'Deviant'})

print -dpng singleplot_t_uncorrected.png


Figure 1: Single channel plot - no correction

Note that we see the MMN difference (~120-200 ms), but we also see other differences and even a pre-zero one. We cannot decide which are true positives and which are false positives. In fact, we know that there will be a lot false positives (assuming that the data were from identical distributions, i.e. the null hypothesis is true, we would expect that 5% of our significant differences are false positives)

#### Bonferroni correction

We will now control the false positives by using the Bonferroni Correction Wikipedia article

cfg           = [];

cfg.method    = 'analytic'; % using a parametric test
cfg.statistic = 'ft_statfun_indepsamplesT'; % using independent samples
cfg.correctm  = 'bonferroni'; % correction method
cfg.alpha     = 0.05;

cfg.design    = data_EEG_filt.trialinfo; % indicating which trials belong ...
% to what category
cfg.ivar      = 1; % indicating that the independent variable is found in ...
% first row of cfg.design

stat_t_bonferroni = ft_timelockstatistics(cfg, data_EEG_filt);


Figure 2: Single channel plot - Bonferroni correction

The Bonferroni correction has eliminated some likely false positives, but probably at the expense of introducing some false negatives. According to our image, only the peak of our MMN is significant. But EEG responses are not peaky in nature, they wax and wane smoothly in time and are smeared out over several electrodes. The Bonferroni correction implicitly assumes that EEG responses are uncorrelated, which they are patently not. Our next correction, the cluster correction addresses the issue of correlation.

#### Cluster-based correction for multiple comparisons

As noted above, EEG data is smooth over the spatio-temporal dimensions. We can easily imaging how we can build clusters in the temporal dimension. These are simply data points that are neighbouring each other in time. For the spatial dimension, it is necessary to build a neighbour structure using ft_prepare_neighbours. This uses the (digitized) positions of the electrodes.

##### Neighbors

Sometimes we use the word ‘neighbour’ according to the British spelling and sometimes we use it as neighbor according to the American spelling.

cleaned_data_ERP.elec = elec; % add the elec structure

cfg          = [];
cfg.method   = 'triangulation';
cfg.feedback = 'yes'; % visualizes the neighbors

neighbors = ft_prepare_neighbours(cfg, cleaned_data_ERP);

print -dpng neighbor_structure.png


Figure 3: Electrode neighbor structure

##### Permutation

The cluster correction is not meaningful for parametric statistics, e.g., t-tests, therefore we are going to use a non-parameteric test.
We will first run it and then discuss some of the options and details afterwards.

cfg                  = [];
cfg.method           = 'montecarlo'; % use montecarlo to permute the data
cfg.statistic        = 'ft_statfun_indepsamplesT'; % function to use when ...
% calculating the ...
% parametric t-values

cfg.correctm         = 'cluster'; % the correction to use
cfg.clusteralpha     = 0.05; % the alpha level used to determine whether or ...
% not a channel/time pair can be included in a ...
% cluster
cfg.alpha            = 0.025; % corresponds to an alpha level of 0.05, since ...
% two tests are made ...
% (negative and positive: 2*0.025=0.05)
cfg.numrandomization = 100;  % number of permutations run

cfg.design           = data_EEG_filt.trialinfo; % same design as before
cfg.ivar             = 1; % indicating that the independent variable is found in ...
% first row of cfg.design
cfg.neighbours       = neighbors; % the spatial structure
cfg.minnbchan        = 2; % minimum number of channels required to form a cluster

stat_t_cluster = ft_timelockstatistics(cfg, data_EEG_filt);


The output of stat_t_cluster is:

stat_t_cluster =

prob: [128x200 double]
posclusters: [1x11 struct]
posclusterslabelmat: [128x200 double]
posdistribution: [1x100 double]
negclusters: [1x14 struct]
negclusterslabelmat: [128x200 double]
negdistribution: [1x100 double]
cirange: [128x200 double]
stat: [128x200 double]
ref: [128x200 double]
dimord: 'chan_time'
label: {128x1 cell}
time: [1x200 double]
cfg: [1x1 struct]

• prob contains the p-values for each channel/time pair inherited from the cluster that pair is part of
• posclusters contains a structure for each positive cluster found that contains
• prob contains the p-value associated with the cluster
• clusterstat contains the T-value associated with this cluster
• stddev contains the standard deviation of the prob of this cluster
• cirange contains the 95% confidence interval for the prob of thus cluster - if this range include cfg.alpha a warning is issued recommending increasing the number of permutations
• posclusterslabelmat contains a matrix with a number indicating which positive cluster each channel/time pair belongs (0 if it doesn’t belong to
• posdistribution contains a row vector containing the T-value for each of the permutations
• negclusters contains the negative equivalent of posclusters
• negclusterslabelmat contains the negative equivalent of negclusterslabelmat
• negdistribution contains the negative equivalent of posdistribution
• cirange contains the 95% confidence interval for the prob
• mask is a logical matrix, 0’s are where p-values (prob) are greater than cfg.alpha, 1’s are where they are lesser than cfg.alpha
• stat contains the t-values (the first step)
• ref contains the mean value of the t-value over all of the permutations
• dimord indicates the ordering of dimensions, rows are channels and columns are time
• label contains the names of all channels
• time is a row vector with the time points in seconds
• cfg shows the cfg that gave rise to this structure

There’s quite a lot to unpack here. It is critical to distinguish between t-values and T-values here. We will now state the procedure step by step.

1. Do a t-test similar to above (stat_t) - these provide the t-values in stat_t_cluster.stat
2. Find the T-values for each cluster of t-values that pass the analytic significance test based on cfg.clusteralpha. The T-value for a cluster is the sum of all the t-values in that cluster
3. Permute the condition labels (cfg.design) as many times as set in cfg.numrandomization; then compute the t-values (as in step 1 above), and compute the T-values for each of the clusters that pass the analytic significance test based on cfg.clusteralpha (as in step 2 above).
4. For each of the cfg.numrandomization permutations, retrieve the maximum T-value, and create a permutation based distribution of T-values
5. Observe the likelihood of the maximum T-value under the permuted distribution for the positive clusters direction and evaluate at cfg.alpha
6. Repeat step 5 for the negative clusters

Below follows some figures and operations illustrating key features of these steps - (the code for these plots in the Appendix below)

###### Equivalence of the t-values (step 1)
>> isequal(stat_t.stat, stat_t_cluster.stat)

ans =

1


Figure 4: Equivalence of t-values

###### Cluster T-values (step 2)

Figure 5: The T-values of each of the clusters

###### Maximum positive and negative cluster T-values compared to the permuted distributions (steps 3 and 4)

Figure 6: The observed positive and negative T-values compared to the permuted distributions

###### Compare against cfg.alpha (steps 5 and 6)

Since both the positive and negative p-values are lesser than cfg.alpha (0.025), we reject the null hypothesis for both the positive and negative directions.
Put informally: our way of labelling the conditions does matter

Let’s have a look at the cluster corrected channel:

Figure 7: Single channel plot - Cluster correction

And here the three tested corrections are side by side

Figure 8: Single channel plot - corrections side by side

We can also do topographical plots.
Here are the three side by side at 168 ms

n_plots = 3;
figure('units', 'normalized', 'outerposition', [0 0 0.5 0.5]);
stats = {stat_t stat_t_bonferroni stat_t_cluster};

for plot_index = 1:n_plots

stat = stats{plot_index};
subplot(1, 3, plot_index)

cfg               = [];
cfg.layout        = 'natmeg_customized_eeg1005.lay';
cfg.xlim          = [0.168 0.168];
cfg.zlim          = [-4.5e-6 4.5e-6]; % Volts
if plot_index > 1;
cfg.comment   = 'no';
end

ft_topoplotER(cfg, difference_wave)

title(['Correction: ' stat.cfg.correctm])

end

print -dpng difference_wave_topoplots.png


Figure 9: Difference wave (MMN) topographical plots

And here’s the difference between the normal t-mask and the t-cluster-mask. Note it is that not big.

Figure 10: Difference wave showing the difference in masks coming from stat_t and stat_t_cluster

Do note that we, in EEG, most of the time do not make the inference at the level of the individual subject, but it is relevant to do so in for example diagnostic measurements.

We will now do a quick example of applying this to time-frequency data (TFR).

## Procedure for TFRs

Here, we’ll just quickly show how to do within-subject statistics on TFRs

First, we’ll load the data, both the ones with the trials and the ones with the average of the trials. We append the two trial data structures to one another using ft_appendfreq and we calculate the difference between the two averages using ft_math.

load tfr_left_trials.mat

cfg = [];

tfr = ft_appendfreq(cfg, tfr_right_trials, tfr_left_trials);

cfg = [];
cfg.parameter    = 'powspctrm';
cfg.operation    = '(x1-x2) / (x1+x2)';

tfr_difference = ft_math(cfg, tfr_right, tfr_left);


We can see the sizes and ordering of dimensions using the in-built function size and checking the dimord

>> size(tfr.powspctrm)

ans =

110   128    20    26

>> tfr.dimord

ans =

rpt_chan_freq_time


meaning that we have 110 trials on 128 channels at 20 frequencies and at 26 time points.
For the difference between the averages, we have:

>> size(tfr_difference.powspctrm)

ans =

128    20    26

>> tfr_difference.dimord

ans =

chan_freq_time


meaning that we have a difference between averages on 128 channels at 20 frequencies and at 26 time points.

### Applying the tests

Note that we here apply the tests on the frequency range between 15 Hz and 30 Hz (the beta band range) and on the time interval between 400 ms and 1,000 ms (the time range of the beta rebound). This is set using cfg.frequency and cfg.latency. We do this because we have a priori knowledge that it is around here we should observe our beta rebound.

#### t-test with no correction

cfg           = [];
cfg.method    = 'analytic'; % using a parametric test
cfg.statistic = 'ft_statfun_indepsamplesT'; % using independent samples
cfg.correctm  = 'no'; % no multiple comparisons correction
cfg.alpha     = 0.05;
cfg.frequency = [15 30];
cfg.latency   = [0.400 1.000];

cfg.design    = zeros(1, length(tfr.trialinfo));
cfg.design(tfr.trialinfo == 256)  = 1; % indicating which trials belong ...
cfg.design(tfr.trialinfo == 4096) = 2; % to what category

cfg.ivar      = 1; % indicating that the independent variable is found in ...
% first row of cfg.design

stat_t_freq = ft_freqstatistics(cfg, tfr);


#### t-test with Bonferroni correction

cfg           = [];
cfg.method    = 'analytic'; % using a parametric test
cfg.statistic = 'ft_statfun_indepsamplesT'; % using independent samples
cfg.correctm  = 'bonferroni'; % no multiple comparisons correction
cfg.alpha     = 0.05;
cfg.frequency = [15 30];
cfg.latency   = [0.400 1.000];

cfg.design    = zeros(1, length(tfr.trialinfo));
cfg.design(tfr.trialinfo == 256)  = 1; % indicating which trials belong ...
cfg.design(tfr.trialinfo == 4096) = 2; % to what category

cfg.ivar      = 1; % indicating that the independent variable is found in ...
% first row of cfg.design

stat_t_bonferroni_freq = ft_freqstatistics(cfg, tfr);


#### Permutation test with cluster correction

cfg                  = [];
cfg.method           = 'montecarlo'; % use montecarlo to permute the data
cfg.statistic        = 'ft_statfun_indepsamplesT'; % function to use when ...
% calculating the ...
% parametric t-values
cfg.alpha            = 0.025; % corresponds to an alpha level of 0.05, since ...
% two tests are made ...
% (negative and positive: 2*0.025=0.05)
cfg.frequency = [15 30];
cfg.latency   = [0.400 1.000];

cfg.correctm         = 'cluster'; % the correction to use
cfg.clusteralpha     = 0.05; % the alpha level used to determine whether or ...
% not a channel/time pair can be included in a ...
% cluster
cfg.clustertail      = 0; % two-way t-test
cfg.clusterstatistic = 'maxsum';

cfg.numrandomization = 1000;  % number of permutations run

cfg.design    = zeros(1, length(tfr.trialinfo));
cfg.design(tfr.trialinfo == 256)  = 1; % indicating which trials belong ...
cfg.design(tfr.trialinfo == 4096) = 2; % to what category
cfg.ivar             = 1; % indicating that the independent variable is found in ...
% first row of cfg.design
cfg.neighbours       = neighbors; % the spatial structire
cfg.minnbchan        = 2; % minimum number of channels required to form a cluster

stat_t_cluster_freq = ft_freqstatistics(cfg, tfr);


### Plotting the test results

stats = {stat_t_freq stat_t_bonferroni_freq stat_t_cluster_freq};
n_tests = length(stats);
h = figure;
for test_index = 1:n_tests

subplot(1, 3, test_index)
stat = stats{test_index};

cfg = [];
cfg.frequency = [stat.freq(1) stat.freq(end)];
cfg.latency   = [stat.time(1) stat.time(end)];

this_tfr = ft_selectdata(cfg, tfr_difference);

cfg = [];
cfg.layout = 'natmeg_customized_eeg1005.lay';
cfg.parameter = 'powspctrm';

ft_multiplotTFR(cfg, this_tfr);
c = colorbar('location', 'southoutside');
c.Label.String = 'Power ratio (right over left)';
title(['Correction: ' stat.cfg.correctm]);

end

set(h, 'units', 'normalized', 'outerposition', [0 0 1 1])


Figure 11: Three multiplots showing the differences between the three tests/corrections

Note how the cluster correction seems to catch the clusters that “catch” our eyes, while not showing the many non-clustered values that the non-corrected t-test showed. Also note that the Bonferroni correction removed all significant effects, showing that it is too conservative to apply to TFR data.

## Extending the frequency and time ranges of the tests

Do the three tests again without setting cfg.frequency and cfg.latency (you can comment them out)
Compare with the plot below - why may it be important to use one’s prior knowledge?

Figure 12: Testing on all frequencies and all latencies Three multiplots showing the differences between the three tests/corrections. Note what this means for the cluster corrected test

## Appendix - code snippets for producing images

%% INDICATE EQUIVALENCE (FIG. 4)
figure
subplot(1, 2, 1)
plot(stat_t.time, stat_t.stat(1, :)) % plot t-values for first EEG
title('Stat field for "stat_t"', 'interpreter', 'none')
xlim([stat_t.time(1) stat_t.time(end)])
xlabel('Time (s)')
ylabel('t');

subplot(1, 2, 2)
plot(stat_t_cluster.time, stat_t_cluster.stat(1, :)) % repeat
title('Stat field for "stat_t_cluster"', 'interpreter', 'none')
xlim([stat_t_cluster.time(1) stat_t_cluster.time(end)])
xlabel('Time (s)')
ylabel('t-value');

print -dpng stat_equivalence.png

%% PLOT CLUSTER T-VALUES (FIG. 5)
figure
hold on

plot([stat_t_cluster.posclusters.clusterstat], 'ro')
plot([stat_t_cluster.negclusters.clusterstat], 'bo')
xlabel('Cluster index')
ylabel('T-value')
ylim([-9000 9000])
title('T-values per cluster')
legend({'Positive clusters' 'Negative clusters'})

print -dpng cluster_T_values.png

%% PLOT DISTRIBUTIONS (FIG. 6)

figure('units', 'normalized', 'outerposition', [0 0 0.35 0.5]);
hold on

positive_p = stat_t_cluster.posclusters(1).prob;
negative_p = stat_t_cluster.negclusters(1).prob;

positive_T = stat_t_cluster.posclusters(1).clusterstat;
negative_T = stat_t_cluster.negclusters(1).clusterstat;

positive_text = sprintf(['Observed T\np = ' num2str(round(positive_p, 4))]);
negative_text = sprintf(['Observed T\np = ' num2str(round(negative_p, 4))]);

subplot(1, 2, 1)
histogram(stat_t_cluster.posdistribution, 'facecolor', 'r')
xlabel('Permuted T-values')
ylabel('Observations (#)')
title('Permutation Distribution - Positive')
xlim([0 positive_T + 2500])
ylim([0 60])
arrow([positive_T, 10], [positive_T, 0])
text(positive_T - 2000 , 12, positive_text)

subplot(1, 2, 2)
histogram(stat_t_cluster.negdistribution, 'facecolor', 'b')
xlabel('Permuted T-values')
ylabel('Observations (#)')
title('Permutation Distribution - Negative')
xlim([negative_T - 1000 0])
ylim([0 60])
arrow([negative_T, 10], [negative_T, 0])
text(negative_T - 500 , 12, negative_text)

print -dpng permutation_distributions.png