In the case of two identical unimodal von Mises distributions, seven tests did not maintain Type-I error near the nominal 5% level, at least when sample sizes were small. These tests included the Kuiper two-sample test, the non-equal concentration parameters approach ANOVA, the P-test, Watson’s large-sample nonparametric test, the Watson–Williams test and the Rao dispersion test (Fig. 2). The Type-I error results were similar for the unimodal wrapped skew-normal distribution, except that the Wallraff test and Fisher’s method also showed Type-I error inflation (Fig. S1). No other methods showed evidence of failure to control the Type-I error rate across the different testing situations (Figs. S2–S5), except for the Log-likelihood ratio ANOVA in the case of two identical asymmetrical bimodal distributions (Fig. S3). In summary, only eight out of 18 tests reliably controlled the Type-I error rate near the nominal 5% level across all the situations investigated. These comprised five tests for identical distribution (Watson’s U2 test, the Large-sample Mardia–Watson–Wheeler test, the Watson–Wheeler test, the embedding approach ANOVA and the MANOVA approach), the Rao polar test for differences in mean direction, and two tests for differences in concentration (Levene’s test and the concentration test). We focus only on these tests in our explorations of statistical power.
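How a Type-I error rate of this kind can be estimated by simulation is sketched below. This is an illustration under stated assumptions, not the procedure used in the study: it computes the two-sample Watson’s U2 statistic from its standard textbook formula and, instead of tabulated critical values, obtains p-values by permutation. The function names, the number of simulations and permutations, and the chosen concentration are all hypothetical.

```python
import math
import random


def watson_u2(a, b):
    """Two-sample Watson's U2 statistic for angles in radians (no ties assumed)."""
    n, m = len(a), len(b)
    total = n + m
    two_pi = 2 * math.pi
    # Pool both samples, tag group membership, and sort by angle on [0, 2*pi).
    pooled = sorted([(x % two_pi, 0) for x in a] + [(x % two_pi, 1) for x in b])
    i = j = 0
    sum_d = sum_d2 = 0.0
    for _, group in pooled:
        if group == 0:
            i += 1
        else:
            j += 1
        d = i / n - j / m  # difference between the two empirical CDFs
        sum_d += d
        sum_d2 += d * d
    return (n * m / total**2) * (sum_d2 - sum_d**2 / total)


def permutation_p_value(a, b, n_perm, rng):
    """Permutation p-value (an assumption; replaces tabulated critical values)."""
    observed = watson_u2(a, b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if watson_u2(pooled[:len(a)], pooled[len(a):]) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def type_one_error(n1, n2, kappa, n_sim=200, n_perm=199, alpha=0.05, seed=1):
    """Share of rejections when both samples come from the same von Mises."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        a = [rng.vonmisesvariate(math.pi, kappa) for _ in range(n1)]
        b = [rng.vonmisesvariate(math.pi, kappa) for _ in range(n2)]
        if permutation_p_value(a, b, n_perm, rng) <= alpha:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    # Hypothetical setting: n = 10 per group, concentration kappa = 2.
    print(type_one_error(10, 10, 2.0))
```

A test that controls Type-I error should return a rejection share close to the chosen alpha, up to Monte Carlo noise; with only a few hundred simulated data sets that noise is a few percentage points.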
Power to detect differences in concentration
The most powerful test to detect concentration differences between two von Mises distributions was the MANOVA approach, which offered superior power especially at lower sample sizes (Fig. 3). Watson’s U2 test was also very powerful, followed by the Watson–Wheeler and the Large-sample Mardia–Watson–Wheeler tests with only marginally lower power. The embedding approach ANOVA had lower power but, notably, was still more powerful than the concentration test and Levene’s test, both of which are specifically designed to detect differences in concentration. As expected, the Rao polar test was not sensitive to differences in concentration. The general results for two unimodal wrapped skew-normal distributions were comparable to those for unimodal von Mises distributions, the only exception being the superior performance of Levene’s test in situations with highly asymmetric sample sizes (Fig. S6).
When comparing axial von Mises distributions, only Watson’s U2 test offered acceptable power (Fig. S7). For the symmetrical trimodal distributions, overall power was very low and, again, only Watson’s U2 test provided some power (Fig. S8). In the asymmetrical bimodal situation, the MANOVA approach and Watson’s U2 test showed acceptable power (Fig. S9); for the asymmetrical trimodal distribution, however, power was low, with Watson’s U2 test providing the best results (Fig. S10).
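The notion of power to detect a concentration difference can be made concrete with a small simulation. The sketch below is an assumption-laden illustration rather than any of the tests compared here: it uses a hypothetical stand-in statistic, the absolute difference in mean resultant length between the two samples, with a permutation p-value, and estimates power as the share of simulated data sets in which the null hypothesis is rejected. All names and parameter values are chosen for illustration only.

```python
import math
import random


def resultant_length(points):
    """Mean resultant length of unit vectors given as (cos, sin) pairs."""
    c = sum(p[0] for p in points)
    s = sum(p[1] for p in points)
    return math.hypot(c, s) / len(points)


def r_difference(points, n1):
    """Absolute difference in mean resultant length between the two groups."""
    return abs(resultant_length(points[:n1]) - resultant_length(points[n1:]))


def permutation_p_value(a, b, n_perm, rng):
    """Permutation p-value for the stand-in statistic (not a test from the text)."""
    points = [(math.cos(x), math.sin(x)) for x in list(a) + list(b)]
    n1 = len(a)
    observed = r_difference(points, n1)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(points)
        if r_difference(points, n1) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def estimate_power(n1, n2, kappa1, kappa2,
                   n_sim=100, n_perm=199, alpha=0.05, seed=1):
    """Share of rejections for two von Mises samples with equal mean direction."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sim):
        a = [rng.vonmisesvariate(0.0, kappa1) for _ in range(n1)]
        b = [rng.vonmisesvariate(0.0, kappa2) for _ in range(n2)]
        if permutation_p_value(a, b, n_perm, rng) <= alpha:
            rejections += 1
    return rejections / n_sim


if __name__ == "__main__":
    # Hypothetical setting: a diffuse sample against a concentrated one.
    print(estimate_power(30, 30, 0.5, 8.0))
```

Calling `estimate_power` with equal concentrations gives the Type-I error rate of the same procedure, which is why the two quantities are usually reported side by side.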
Power to detect differences in the mean/median
The power to detect angular differences between two von Mises distributions was highest for the MANOVA approach at small sample sizes (n = 10), followed by Watson’s U2 test, the Watson–Wheeler test and the Large-sample Mardia–Watson–Wheeler test (Fig. 4). Notably, Levene’s test also showed acceptable power levels, and thus clearly failed to detect concentration differences specifically (to which it was less sensitive, see Fig. 3). The concentration test was not sensitive to differences in mean direction. Special cases were the embedding approach ANOVA and the Rao polar test. The embedding approach ANOVA showed, with the exception of very unequal sample sizes (n = 10/50), a unimodal response: power increased from 0° to 90° difference, then decreased rapidly towards 180°. The Rao polar test showed an even stranger pattern: at higher sample sizes, power was very good when the difference was around either 45° or 135°, but dropped to 0.05 between these two peaks (at 90°). The results were similar for the wrapped skew-normal distribution, with the exceptions that the Rao polar test showed strongly reduced power and switched from a bimodal to a unimodal power curve with a peak around 60°, and that Levene’s test completely lost its power (Fig. S11).
For axial distributions, only Watson’s U2 test offered acceptable power levels, although large sample sizes (n ≈ 100) were required for the power to exceed 50% (Fig. S12). All other tests failed to detect the difference in mean direction between two axial distributions. For symmetric trimodal distributions, none of the tests used was sensitive to differences in mean direction (Fig. S13).
When comparing asymmetrical bimodal distributions, the general trends were similar to the unimodal case. However, across all sample sizes, the MANOVA approach offered the best power. The Watson–Wheeler test was considerably less powerful in this situation, as were Watson’s U2 test and the Large-sample Mardia–Watson–Wheeler test (Fig. S14). Levene’s test showed a unimodal power curve. The asymmetrical trimodal situation was, again, similar to the asymmetrical bimodal situation (Fig. S15), with the exception of Levene’s test, which showed a steady increase in power with angular difference (instead of the hump-shaped curve).
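A general property of axial data may help in reading the axial results; the sketch below illustrates that property and is not an analysis from the study. In a symmetric axial sample the two opposite modes cancel, so the mean resultant vector is close to zero whatever the orientation of the axis, whereas doubling the angles (a standard device for axial data) maps both modes onto one and recovers the axis. The mixture sampler, function names and parameter values are hypothetical.

```python
import math
import random


def mean_resultant_length(angles):
    """Length of the mean resultant vector of angles given in radians."""
    c = sum(math.cos(x) for x in angles)
    s = sum(math.sin(x) for x in angles)
    return math.hypot(c, s) / len(angles)


def mean_axis(angles):
    """Axis orientation in [0, pi), estimated from the doubled angles."""
    c2 = sum(math.cos(2 * x) for x in angles)
    s2 = sum(math.sin(2 * x) for x in angles)
    return (0.5 * math.atan2(s2, c2)) % math.pi


def axial_sample(axis, kappa, n, rng):
    """Symmetric axial mixture: von Mises modes at axis and axis + pi."""
    two_pi = 2 * math.pi
    sample = []
    for _ in range(n):
        mode = (axis + rng.choice((0.0, math.pi))) % two_pi
        sample.append(rng.vonmisesvariate(mode, kappa))
    return sample


if __name__ == "__main__":
    rng = random.Random(7)
    # Hypothetical setting: axis at 30 degrees, kappa = 5, n = 500.
    sample = axial_sample(math.radians(30), 5.0, 500, rng)
    print("raw resultant length:    ", mean_resultant_length(sample))
    print("doubled resultant length:", mean_resultant_length([2 * x for x in sample]))
    print("estimated axis (degrees):", math.degrees(mean_axis(sample)))
```

Two axial samples with different axes therefore both have raw mean vectors near zero, so a procedure that relies on the raw mean direction has little to compare.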
Power to detect differences in distribution type
When comparing a unimodal and an axial bimodal distribution, which increased similarly in concentration, we found that the MANOVA approach again offered the best power, in particular at low sample sizes, followed by Watson’s U2 test, the Large-sample Mardia–Watson–Wheeler test and the Watson–Wheeler test (Fig. 5). While the embedding approach ANOVA and Levene’s test had varying but usable power levels, the concentration test was only sensitive to such differences at low concentration values. The Rao polar test was not sensitive to such differences.
The picture was only marginally different when comparing a von Mises with a wrapped skew-normal distribution (Fig. S16). For low sample sizes (n = 10), the MANOVA approach offered high power, followed by the embedding approach ANOVA. The latter offered good power throughout the range of sample sizes tested, followed by Watson’s U2 test, the Large-sample Mardia–Watson–Wheeler test and Levene’s test. The Rao polar test also showed lower, but acceptable, sensitivity to distribution type. The concentration test showed only very low power, which (as expected) increased with increasing concentrations of the respective distributions.
We summarize the results obtained in the power analysis in Table 2. In all situations, either Watson’s U2 test or the MANOVA approach offered the best power.
Real data examples
Testing the performance of the robust tests on real data sets revealed, predominantly, the expected test behavior. In the example of homing pigeons, where a difference in concentration was expected, all tests, with the exception of the Rao polar test and, notably, the concentration test, showed a significant difference between the distributions (Fig. 6A). Therefore, we can conclude, in accordance with the respective publication (ref. 19), that sectioning of the olfactory nerve disrupted the homing behavior of pigeons.
In the ant example, where no difference between the groups was expected, most of the tests detected no significant difference between the distributions (Fig. 6B). Only the concentration test showed a significant difference. Based on the other tests, we would conclude that there was no biologically meaningful difference between the two distributions. Therefore, ants appear to be able to transfer visual information from one eye to the other.
In the bat example, where a difference in mean direction was expected, Watson’s U2 test, the Mardia–Watson–Wheeler test, the Watson–Wheeler test and the MANOVA approach showed a significant difference (Fig. 6C). Notably, the Rao polar, Levene’s and concentration tests and the embedding approach ANOVA failed to show a significant difference. At least for the Rao polar test, one would have expected a significant difference, as the two distributions are clearly 180° apart. This outcome concurs with our simulation results, where the Rao polar test failed to distinguish distributions on the same and orthogonal axes (Fig. 4). As the results of the tests were quite mixed, this example highlights the need for choosing a test with appropriate power to detect the expected differences. Based on the results of the most powerful tests, we conclude that the bats showed a mirrored orientation, as expected in the experimental design.
Source: Ecology - nature.com