Deep learning-assisted comparative analysis of animal trajectories with DeepHL

DeepHL system architecture

The DeepHL system consists of three server computers. The first one is a web server that receives a trajectory data file from a user and provides analysis results to the user (Intel Xeon E5-2620 v4, 16 cores, 32 GB RAM, Ubuntu 14.04). The second one is a storage server that stores data files and analysis results. The third one is a GPU server that analyzes data provided by the user (Intel Xeon E5-2620 v4, 32 cores, 512 GB RAM, four NVIDIA Quadro P6000, Ubuntu 14.04). Supplementary Information, Algorithm, provides a complete description of the DeepHL method. DeepHL is accessible on the Internet through http://www-mmde.ist.osaka-u.ac.jp/maekawa/deephl/. Supplementary Information, User guide to DeepHL, provides a user guide to DeepHL. In addition, Supplementary Information, Usage of Python-based Software, and Supplementary Software 1 present the Python code of DeepHL.

Preprocessing

An input trajectory is a series of timestamps and X/Y coordinates associated with a class label. To perform position- and rotation-independent analysis, we convert the series into time series of speed and relative angular speed and then standardize them (Supplementary Information, Algorithm). Note that the absolute coordinates of wild animals, which can relate to the distance from a nest or feeding location, for example, are important in understanding behavior of the animals. Hence, DeepHL allows the original coordinates to be input to DeepHL-Net along with the speed and relative angular speed. In addition, other biological time-series sensor data measured by the user can be fed into DeepHL-Net when these time-series data are included in a data file uploaded by the user. For example, a time series of the heading direction of animals obtained from digital compasses can be useful for behavior understanding. Moreover, primitive features usually used in trajectory analysis can be easily fed into DeepHL-Net. DeepHL automatically computes the travel distance from the initial position, the straight-line distance from the initial position, and the angle from the initial position (Supplementary Table 1) as primitive features. Using the web interface of DeepHL, the user can easily select primitive features and other sensor data to be fed into DeepHL-Net (Supplementary Information, User guide to DeepHL). See Supplementary Information, Effect of input features, for effects of input features on classification accuracy. Normally, the inputs of DeepHL-Net are two-dimensional time series, that is, speed and relative angular speed. When we input an additional time series (such as the original coordinates) into DeepHL-Net, the additional time series are added as additional dimensions of the inputs.

Multi-scale layer-wise attention model (DeepHL-Net)

Here, we explain DeepHL-Net shown in Fig. 2f in detail. The input of the model is a time series of primitive features, that is, an l_MAX × N_f matrix, where l_MAX is the maximum length of the input trajectories and N_f is the dimensionality of the time series, that is, the number of the primitive features. Because the lengths of observed trajectories are not identical to each other in many cases, we fill in missing elements in the matrix with −1.0 and mask them when we train DeepHL-Net. In each 1D convolutional layer of the convolutional stacks, we extract features by convolving input features through the time dimension using a filter with a width (kernel size) of F_t. We use different filter widths in the four convolutional stacks (3%, 6%, 9%, and 12% of l_MAX) to extract features at different levels of scale. We use a stride (step size) of one sample in terms of the time axis. We also use padding to allow the outputs of a layer to have the same length as the layer inputs. In addition, to reduce an overfitting, we employ a dropout, which is a simple regularization technique in which randomly selected neurons are dropped during training⁴⁴. The dropout rate used in this study is 0.5.

In each LSTM layer of the LSTM stacks, we extract features considering the long-term dependencies of the input features. LSTM is a recurrent neural network architecture with memory cells, and it permits us to learn temporal relationships over a long time scale. LSTM learns long-term dependencies by employing memory cells that hold past information, updating the cell state using write, read, and reset operations with input, output, and forget gates (see Supplementary Information, Algorithm). In addition, we employ dropout to reduce overfitting. The attention information of each layer is computed by using Eq. (1), and then it is multiplied by the layer output. Here, the softmax and tanh functions in Eq. (1) are defined as follows:

$$,{text{softmax}},({x}_{j})=frac{exp ({x}_{j})}{{sum }_{i}exp ({x}_{i})},$$

(2)

$$tanh ({x}_{j})=frac{exp ({x}_{j})-exp (-{x}_{j})}{exp ({x}_{j})+exp (-{x}_{j})}.$$

(3)

Note that parameters in Eq. (1) for each layer, that is, W_a and b_a, as well as parameters in the convolutional and LSTM layers are estimated during the network training phase. Here, we introduced the tanh activation function into Eq. (1) to smooth out the output attention values. When an outlying large value is included in W_aZ^T + b_a at time t, attention values other than time t become extremely small without using the tanh function. When we visualize a trajectory using such attention values, only a single data point is colored in red, making it difficult for a user to identify important segments.

Training and testing of DeepHL-Net

The DeepHL user can select the parameters of DeepHL-Net used in the analysis, that is, the number of convolutional/LSTM layers and the number of neurons in each layer (default: four layers with 16 neurons). Then, DeepHL-Net is trained on 80% of randomly selected trajectories to minimize the binary classification error of the training data, employing backpropagation based on Adam⁴⁵ (Supplementary Information, Algorithm). (Note that each trajectory has a class label for binary classification.) Then, the trained DeepHL-Net is tested using the remaining 20% of trajectories to compute the classification accuracy, providing an indication of the degree of difference between the two classes.

Computing the score of each layer

To screen the layers in DeepHL-Net, we compute a score for each layer according to Eq. (4)

$$s({A}_{i,{C}_{mathrm{A}}},{A}_{i,{C}_{mathrm{B}}})={s}_{mathrm{fc}}({A}_{i,{C}_{mathrm{A}}},{A}_{i,{C}_{mathrm{B}}})+{s}_{mathrm{it}}({A}_{i,{C}_{mathrm{A}}},{A}_{i,{C}_{mathrm{B}}}).$$

(4)

Here, ({A}_{i,{C}_{mathrm{A}}}) is a set of attention vectors calculated from trajectories belonging to class A using the ith layer. In addition, ({A}_{i,{C}_{mathrm{B}}}) is a set of attention vectors calculated from trajectories belonging to class B using the ith layer. As mentioned in the main text, an attention vector from a discriminator layer should have large values within limited segments. Therefore, ({s}_{mathrm{fc}}({A}_{i,{C}_{mathrm{A}}},{A}_{i,{C}_{mathrm{B}}})) in Eq. (4) calculates the averaged variance of the attention values normalized by the average length of the trajectories, as described in Eq. (5). When the layer focuses on a part of a trajectory, the variance increases

$${s}_{mathrm{fc}}({A}_{i,{C}_{mathrm{A}}},{A}_{i,{C}_{mathrm{B}}})=sqrt{frac{1}{| {A}_{i,{C}_{mathrm{A}}}cup {A}_{i,{C}_{mathrm{B}}}| cdot l({A}_{i,{C}_{mathrm{A}}}cup {A}_{i,{C}_{mathrm{B}}})}sum _{{bf{a}}in {A}_{i,{C}_{mathrm{A}}}cup {A}_{i,{C}_{mathrm{B}}}}V({bf{a}})}.$$

(5)

Note that V(⋅) calculates the variance and l(⋅) calculates the average length of the trajectories. We take the square root of the average variance to derive the average standard deviation. Using (l({A}_{i,{C}_{mathrm{A}}}cup {A}_{i,{C}_{mathrm{B}}})), which calculates the average length of ({A}_{i,{C}_{mathrm{A}}}cup {A}_{i,{C}_{mathrm{B}}}), we normalize the computed variance. Because the softmax function in Eq. (1) ensures that all values sum to 1, resulting in a larger variance for longer trajectories, we normalize the average variance using the average length.

In addition, as mentioned in the main text, the distribution of attention values by the layer for one class should be different from that for another class. Therefore, ({s}_{mathrm{it}}({A}_{i,{C}_{mathrm{A}}},{A}_{i,{C}_{mathrm{B}}})) calculates the difference between the distributions of the attention values of classes A and B as follows:

$${s}_{mathrm{it}}({A}_{i,{C}_{mathrm{A}}},{A}_{i,{C}_{mathrm{B}}})=(1-,{mathrm{Intersect}},(h(A_{{i,{C}}_{mathrm{A}}}),h({{A}}_{{i,{C}}_{mathrm{B}}}))).$$

(6)

Here, h(⋅) calculates a normalized histogram of attention with 200 bins, and Intersect(⋅ , ⋅) calculates the area overlap between two histograms, and is described as follows:

$${mathrm{Intersect}},(H_{1},H_{2})=mathop{sum}limits_{i}min (H_{1}(i),H_{2}(i)),$$

(7)

where H₁(i) shows the normalized frequency of the ith bin of histogram H₁. As described in Eq. (4), the final score is calculated as the sum of the two scores of ({s}_{mathrm{fc}}({A}_{i,{C}_{mathrm{A}}},{A}_{i,{C}_{mathrm{B}}})) and ({s}_{mathrm{it}}({A}_{i,{C}_{mathrm{A}}},{A}_{i,{C}_{mathrm{B}}})).

Here, ({s}_{mathrm{fc}}({A}_{i,{C}_{mathrm{A}}},{A}_{i,{C}_{mathrm{B}}})) in Eq. (4) is used to find a layer that focuses only on a portion of a trajectory. Owing to the term, only a small important portion of trajectories is highlighted in many cases, as shown in Figs. 3, 5, and 6, especially for the trajectories of beetles. However, substantial portions of several trajectories of the normal mice are highlighted, as shown in Fig. 4d. Because the characteristics of the normal mouse trajectories are the distance from the initial position, the segments in the trajectories far from the initial position are highlighted.

Computing the correlation between attention values and handcrafted features

To help the user understand the meaning of the highlights, DeepHL automatically computes the Pearson correlation coefficients between the attention values of each layer and handcrafted features computed by DeepHL, as shown in Supplementary Table 1. In addition, the correlation coefficients with sensor data and handcrafted features included in a trajectory data file are automatically computed. Computing the correlation with environmental sensor data can reveal the relationship between a behavior and environmental conditions. If a specific behavior is exhibited only when the temperature is high, for example, we can infer that the behavior relates to the high temperature condition. Furthermore, DeepHL automatically computes the moving average, moving variance, and derivative of each of the above features/sensor data, and then computes the correlation coefficients with the attention values, which are presented to the user (Supplementary Fig. 1).

Computing the difference between distributions of each handcrafted feature for the two classes within highlighted segments

To help the user understand the meaning of the highlights, DeepHL automatically computes the difference between distributions of each handcrafted feature for two classes within highlighted segments. The difference is computed as follows:

$${mathrm{diff}}({A}_{i,{C}_{mathrm{A}}},{F}_{j,{C}_{mathrm{A}}},{A}_{i,{C}_{mathrm{B}}},{F}_{j,{C}_{mathrm{B}}})=1-,{mathrm{Intersect}},(h(m({{A}}_{{i,{C}}_{mathrm{A}}},{{F}}_{{j,{C}}_{mathrm{A}}})),h(m({{A}}_{{i,{C}}_{mathrm{B}}},{{F}}_{{j,{C}}_{mathrm{B}}}))).$$

(8)

Here, ({F}_{j,{C}_{mathrm{A}}}) is a set of time series of the jth handcrafted feature calculated from trajectories belonging to class A. In addition, m(⋅ , ⋅) is a masking function that extracts feature values within highlighted segments. Because the softmax function in each attention layer ensures that all attention values in a sum of 1, we consider an attention value larger than c/(# time slices) as a potential attended value (c = 1.2 in our implementation).

Data acquisition of worms

Data acquisition was performed according to Yamazoe-Umemoto et al.²². In brief, several worms were placed in the center of an agar plate in a 9-cm Petri dish, 30% 2-nonanone (v/v, EtOH) was spotted on the left side of the plate, which was covered by a lid and placed on the bench upside down. Then, the images of the plate were captured with a high-resolution USB camera for 12 min at 1 Hz. Because the worms do not exhibit odor avoidance behavior during the first 2 min because of the rapid increase in odor concentration⁴⁶, the data for the following 10 min (i.e., 600 s) was used. From the images, individual worms were identified and the position of the centroid was recorded by an image processing software Move-tr/2D (v. 8.31; Library Inc., Japan). The number of recorded trajectories is 325 (Supplementary Table 2). The comparison was between the naive worms (control class) and the worms after preexposure to the odor (preexposed class).

DeepHL analysis of worms

A multivariate time series of movement speed, relative angular speed, distances from the initial position, and angle from the initial position extracted from the time series of trajectories was fed into DeepHL-Net, yielding a binary classification accuracy of 93.9%, where 20% of the data are used as test data. The discriminator layer used in this investigation has the highest score of all layers. As shown in Fig. 3d, which was calculated from the moving variance of the speed within highlighted segments, we can state that the changes in the speed of preexposed worms is larger than those of control worms. Figure 3e shows spectrograms of the speed calculated from entire trajectories (Fig. 3c) with a 128-s wide sliding window shifted in 1-sample intervals. In addition, Fig. 3f shows histograms of the dominant frequency of speed calculated from entire trajectories using the 128-s wide sliding window shifted in 1-sample intervals. These results also indicate the difference in the frequency of speed between the preexposed and control worms. Our investigation revealed that the dominant frequency of speed significantly differs between the preexposed and control worms using GLMM with Gaussian distributions (t = −6.60; d.f. = 322.8; p = 1.68 × 10⁻¹⁰, effect size(r²) = 0.232). The p value is two sided. Individual factors were treated as random effects. The number of data points for the control class is n = 76, 784 and that for the preexposed class is n = 75, 750. We used GLMM with Gaussian distributions because the objective variable has a continuous value and we used the lmerTest package (v. 2.0–36) of R (v. 3.4.3) for the analysis.

Data acquisition of mice

We collected 52 trajectories of normal mice and unilateral 6-hydroxydopamine (OHDA) lesion mouse models of PD while they freely moved for 10 min in an open field (60 × 55 cm², wall height = 20 cm; normal: 22, PD: 30). The trajectories were detected by the animal’s head position, which was captured by an overhead digital video camera (60 fps). Two sets of small red and green light-emitting diodes were mounted above the animal’s head so that it could be located in each frame. Custom softwares based on Matlab (R2018b, Mathworks, MA, USA) and LabVIEW (Labview 2018, National Instruments, TX, USA) were used for tracking. We then created 30-s segments by splitting each trajectory because training a DNN requires a number of trajectories. We used 966 segments in total (normal: 374, PD: 592) collected from nine C57BL/6J mice (normal: 5, PD: 4). Note that we excluded 30-s segments that contain no movements of a mouse.

DeepHL analysis of mice

Movement speed, relative angular speed, travel distances, straight-line and travel distances from the initial position, and angle from the initial position were fed into our model. The accuracy for the binary classification of normal and 6-OHDA model mice was 74.7%, where 20% of the data are used as test data. The score of the discriminator layer was the highest of all LSTM layers and the sixth highest of all layers. Our investigation revealed that the behavior of visiting locations far away from the initial position can be characteristic of normal mice.

To evaluate PD symptoms from animal behaviors, previous studies have exclusively focused on the movement speed of animals in the open-field tests (frequency and bout duration of ambulation as well as immobility or fine movement) because typical symptoms in the animal model of PD are thought to be slowness of movement and a paucity of spontaneous movements. As shown in Fig. 4e–g, we found significant differences in average movement speed during ambulation periods, average movement speed during fine movement periods, and average maximum distance within a ±60-s window in a session. These differences were derived from the findings of DeepHL using the two-sided Wilcoxon rank-sum test (W = 544, p = 3.486 × 10⁻⁵, effect size (Cliff’s delta) = −0.648; W = 511, p = 5.869 × 10⁻⁴, effect size (Cliff’s delta) = −0.548; W = 521, p = 2.666 × 10⁻⁴, effect size (Cliff’s delta) = −0.579). The 95% confidence intervals are [1.222, 3.481], [0.139, 0.468], and [13.726, 43.175], respectively. We used the exactRankTests package (v. 0.8–29) of R (v. 3.2.3). Note that these behavioral features are extracted from original 10-min trajectories.

The maximum distance, which was derived from a finding of DeepHL, is more useful for evaluating the PD symptoms than conventional measures based on the movement speed. Note that the new feature is designed based on an insight drawn from an analysis by deep learning. These results suggest that DeepHL helps find a novel measure not directly linked to the movement speed, that is, a straight-line distance within a certain time window. When the aim of an animal is to visit all locations in an area, the travel distance over a short duration commonly becomes longer. Besides, it is well known that rodents, including mice and rats, spontaneously prefer to explore an environment, particularly in novel places. Thus, DeepHL may capture the fact that the abnormal behavior of the 6-OHDA lesion model of PD hinders such spontaneous behavioral traits of normal mice. Indeed, the 6-OHDA lesion mouse model appears to remain in the same place. Although this hypothesis should be verified based on the causality between behavioral traits and neural activity patterns underlying PD symptoms using neuronal recording together with its optogenetic manipulation in the basal ganglia and motor cortex²³, it is beyond the scope of this study.

Behavioral features of mice

According to Kravitz et al.²³, ambulation was defined as periods when the velocity of the animal’s center point averaged >2 cm/s for at least 0.5 s. Immobility was defined as continuous periods of time during which the average change of the trajectory was <1 cm for at least 1 s. Fine movement was defined as any movement that was not ambulation or immobility. Maximum travel distance within a ±60-s window was defined as the maximum straight-line distance between the center of the window and each point within the window. Note that each feature value is computed for each entire 10-min trajectory.

6-OHDA injection of mice

Under isoflurane anesthesia, 6-OHDA (4 mg/ml; Sigma) was injected through the implanted cannulae (AP −1.2 mm, ML 1.1 mm, DV 5.0 mm, 2 μl). Animals were allowed to recover for at least 1 week before post-lesion behavioral testing.

Histological verification of dopaminergic cell loss

After the mice were sacrificed by pentobarbital sodium overdose and perfused with formalin, their brains were frozen and cut coronally at 30 μl with a sliding microtome. For immunostaining, sections were divided into six interleaved sets. Immunohistochemistry was performed on the free-floating sections. Sections were pretreated with 3% hydrogen peroxide and incubated overnight with primary antibody mouse anti-tyrosine hydroxylase (1:1000; Millipore). As a secondary antibody, we used biotinylated donkey anti-mouse IgG (1:100; Jackson ImmunoResearch Inc.), followed by incubation with avidin–biotin–peroxidase complex solution (1:100; VECTASTAIN Elite ABC STANDARD KIT, Vector Laboratories). The immunoreactivities were visualized by 3-3′ diaminobenzidine tetrahydrochloride (Dojindo Laboratories). The degree of dopaminergic cell loss was estimated by dividing the number of cells manually counted across three sections of the SNc (most rostral, most caudal, and the intermediate between them) of the lesioned hemisphere from that of the non-lesioned hemisphere.

Data acquisition of beetles

In the present study, we analyzed 419 walking trails collected from S- and L-strain beetles freely moving on a treadmill⁴⁷ (tracking software: custom software based on OpenCV, v. 2.4.9) using DeepHL (Supplementary Table 2). The number of the S-strain (L-strain) beetles is 20, consisting of 10 males and 10 females. The sampling rate of the treadmill is ~14.3 Hz, and the average duration of the trajectories is 52 s.

DeepHL analysis of beetles

In addition to the movement speed and relative angular speed, the distances and angle from the initial position were fed into DeepHL-Net. The classification accuracy for the binary classification between S and L strains was 84.5%, where 20% of the data are used as test data. The score of the discriminator layer in Fig. 5b was the third highest of all layers. Because the layer seems to focus on turns, we computed an angle of a trajectory segment for each point according to Fig. 5e. Figure 5f shows the average angles for the S and L strains (the number of data points for the S strain is 185,884 and the number of data points for the L strain is 219,497). We found that the angle for the S strain is significantly larger than that for the L strain, indicating that beetles of the S-strain beetles walk with more angle changing. Note that we used two-sided analysis of variance (ANOVA) (F = 12.57; d.f. = 1; p = 0.001; effect size(η²) = 0.09). Because multiple data points were computed from each individual beetle’s trajectory, we treated the individuals as a random factor. The 95% confidence interval is [0.05, 0.18]. We used JMP 12.2.0., SAS. This result could indicate the difference in strategies for survival between the S- and L-strain beetles. The L-strain beetles can survive because of their long duration of TI against predators. In contrast, the S-strain beetles attempt to escape from a predator by frequently changing their moving directions.

Previous studies have shown a significantly lower expression level of brain dopamine in the beetles derived from the L strain than those from the S strain³⁰. Nishi et al.⁴⁸ showed that injection of caffeine, which activates dopamine, decreases the duration of immobility in the L strain of T. castaneum. These phenomena concerning dopamine show an analogy to PD, which alters walking patterns⁴⁹. In many animals, dopamine expression level relates to movement patterns, and a specific trajectory segment pattern of the L strain might be similarly deeply affected. To test this new hypothesis, the relationship between dopamine expression and detailed analysis for walking ability, which should be done apart from the present study, should be examined in the future. In conclusion, the analysis using DeepHL revealed significantly different walking trajectories between beetles from the S and L strains using ANOVA: the S-strain beetles walk with more angle changes along the direction of travel compared to the L-strain beetles.

Ethics statement

The studies on streaked shearwaters, mice, and bears were approved by the Animal Experimental Committees of Nagoya University (streaked shearwaters), the Doshisha University Institutional Animal Care and Use Committees (mice), and the Institutional Animal Care and Use Committee of Tokyo University of Agriculture and Technology (bears), respectively. The research on streaked shearwaters was conducted with permits from the Ministry of the Environment, Japan. All experimental procedures used in the bear research followed the Guidelines Concerning Animal Experimentation of the Tokyo University of Agriculture and Technology and the Mammal Society of Japan. They specify no requirements for the treatment of insects in experiments. Details of animals used in this study are described in Supplementary Information, Animals.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Source: Ecology - nature.com