New machine learning-based automatic high-throughput video tracking system for assessing water toxicity using Daphnia Magna locomotory responses

Test organisms and exposures

In this study, test organisms and reagents were used in accordance with the Acute Toxicity Test Method of Daphnia magna Straus (Cladocera, Crustacea); ES 04704.1b²⁹. Daphnia magna cultured at the National Institute of Environmental Research were obtained for this study. Adult female Daphnia magna over two weeks of age, cultured over several generations, were transferred to a freshly prepared container the day before the test. Daphnia magna are considered neonates for less than 24 h after birth²⁹. To maintain the sensitivity of the organism, neonates less than 24 h old, produced by these adults the following day, were used. Individuals of similar size were selected for the test. Daphnia magna were fed YCT, a mixture of green algae (Chlorella sp.), yeast, Cerophy II(R), and trout chow. Sufficient food was supplied 2 h before the test to minimize the effects of feeding during the test. The test medium was prepared by dissolving KCl (8 mg/L), \(\text{MgSO}_4\) (120 mg/L), \(\text{CaSO}_4 \cdot 2\text{H}_2\text{O}\) (120 mg/L), and \(\text{NaHCO}_3\) (192 mg/L) in deionized water.

Automatic high-throughput Daphnia magna tracking system

To build an automatic high-throughput Daphnia magna tracking system, we combined flow cells with a video analysis algorithm (Fig. 1). Six flow cells filled with culture medium were installed in the device, each containing 10 Daphnia magna. To measure the state of the Daphnia magna automatically, the six flow cells were recorded using a camera (Industrial Development Systems imaging) equipped with a CMOSIS sensor capable of infrared imaging. A red light close to the infrared spectrum was placed behind the flow cells to provide uniform illumination and minimize stress on the Daphnia magna. To capture the size and movement of the Daphnia magna as accurately as possible, the camera was set to a frame rate of 15 frames per second (fps) and a resolution of 2048 \(\times\) 1088 (2.23 MB), using a 12 mm lens. The distance between the flow cell and the camera was set to 16 cm. To measure the number of mobile Daphnia magna, their lethality, and swimming inhibition automatically and simultaneously, one camera was used for every two flow cells. To assess ecotoxicity, the video analysis system used the images obtained from the six flow cells to track each Daphnia magna and estimate key statistics such as the number of mobile individuals, average distance, and radius of activity.

Figure 1

New automatic high-throughput video tracking system for behavioral analysis using Daphnia magna as a model organism


The automatic high-throughput video tracking system in the ecotoxicity measuring device was designed to measure ecotoxicity continuously using Daphnia magna (Fig. 2). Daphnia magna move faster at high temperatures and are less active at low temperatures. Thus, a constant-temperature module that can be set to an appropriate Daphnia magna habitat temperature (20 ± 2 °C) was added to create a suitable culture environment²⁹. Natural pseudo-light (\(\lambda > 590\) nm, 3000 K) was installed on the upper part of the detector to provide a suitable habitat light intensity (500–1000 lux). The flow cell was made as small as possible while still allowing the movement of the Daphnia magna to be observed. An automatic feeding system was installed so that food could be injected during the replacement cycle. The six independent flow cells were connected to an automatic dilution injection module, so that the cells could be set to six different concentrations (100%, 50%, 25%, 12.5%, 6.25%, and 0%).

Figure 2

Schematic representation of the automatic high-throughput video tracking system


Automatic tracking algorithm

Daphnia magna tracking was run on an Intel i5-9300H CPU @ 2.40 GHz with 8 GB of memory and the Windows 10 Pro 64-bit operating system. The algorithms were trained using 12 Daphnia magna videos and tested using four additional videos, after which the detection and tracking methods were compared. Each video was 30 s long and was captured at 15 frames per second. For long or real-time videos, three factors must be considered when tracking Daphnia magna: automatic binarization that separates objects from the background, effective classification of objects as Daphnia magna or noise, and the speed of the algorithm. To develop an efficient tracking algorithm, we therefore propose the following tracking process (Fig. 3A). First, each frame of the video is extracted as an image and the background is estimated (Fig. 3B). The background is the average of the frames over the previous 20 s, so the tracking system takes 20 s to obtain the first background image. The background is then subtracted from each frame for object detection (Fig. 3C). The detected objects include Daphnia magna and noise such as droplets and sediment. The difference between the background and the frame image is binarized, and each region of the binarized image is regarded as an object. Conventionally, the binarization threshold is set manually. In this study, the threshold is selected automatically using k-means clustering. After binarization, machine learning methods are used to classify the objects as Daphnia magna or noise (Fig. 3D). To keep the algorithm fast, we use simple machine learning methods such as random forest (RF) and support vector machine (SVM). The objects predicted as Daphnia magna are tracked using SORT²⁴, a fast and highly accurate tracking algorithm (Fig. 3E). Finally, based on the tracked results, statistics for assessing ecotoxicity, such as the number of mobile individuals, average distance, and radius of activity, are estimated to evaluate the toxicity of the aquatic environment.
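The background step can be illustrated with a minimal sketch, assuming grayscale frames stored as nested lists and an absolute difference for the subtraction (the source specifies neither). The class name `RunningBackground` and the function `subtract_background` are hypothetical; only the 20 s window and the 15 fps frame rate come from the text.

```python
from collections import deque

FPS = 15                 # frame rate stated in the text
WINDOW_S = 20            # background window in seconds, from the text
WINDOW = FPS * WINDOW_S  # 300 frames

class RunningBackground:
    """Mean of the most recent `window` frames (hypothetical helper)."""

    def __init__(self, window=WINDOW):
        self.window = window
        self.frames = deque()
        self.sums = None  # per-pixel running sums

    def update(self, frame):
        """Add a frame (list of rows of grey values) and drop the oldest one."""
        if self.sums is None:
            self.sums = [[0.0] * len(row) for row in frame]
        self.frames.append(frame)
        for y, row in enumerate(frame):
            for x, value in enumerate(row):
                self.sums[y][x] += value
        if len(self.frames) > self.window:
            oldest = self.frames.popleft()
            for y, row in enumerate(oldest):
                for x, value in enumerate(row):
                    self.sums[y][x] -= value

    def ready(self):
        """True once a full window (20 s of frames) has been collected."""
        return len(self.frames) == self.window

    def background(self):
        n = len(self.frames)
        return [[s / n for s in row] for row in self.sums]

def subtract_background(frame, background):
    """Per-pixel absolute difference (the absolute value is an assumption)."""
    return [[abs(v - b) for v, b in zip(f_row, b_row)]
            for f_row, b_row in zip(frame, background)]
```

Keeping running sums avoids re-averaging all 300 frames for every new frame, which matters when the process has to keep up with a live camera.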

Figure 3

Automatic Daphnia magna tracking algorithm process. (A) Overview of automatic tracking algorithm process. (B) Image extraction step. (C) Background subtraction step. (D) Daphnia magna detection step. (E) Daphnia magna tracking step.


k-means clustering for automatic background subtraction

Many tracking algorithms assume that the background is fixed. With a fixed background, the difference between the frame and the background can be used to identify objects. However, automatically selecting a precise threshold for binarizing the image pixels is one of the key problems in identifying objects. The proposed method applies k-means clustering to the pixel values of the subtracted image³⁰, and the midpoint between the two cluster means is selected as the threshold (Fig. 4). In k-means clustering, data points are repeatedly grouped according to their distance from the group means³¹. For binarization, two groups are formed. Let \(\mu_1(t)\) be the mean of the pixels below the threshold and \(\mu_2(t)\) be the mean of the pixels above the threshold. First, \(\mu_1(t)\) and \(\mu_2(t)\) are randomly initialized. Each pixel is then assigned to the group whose mean is closer, and the group means are recalculated. These steps are repeated until the group assignments barely change. Finally, the threshold is calculated as the average of the two means.
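The procedure can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the function name `kmeans_threshold` is hypothetical, and the two means are initialized at the minimum and maximum pixel values instead of randomly so that the result is reproducible.

```python
def kmeans_threshold(pixels, max_iter=100, tol=1e-6):
    """Two-cluster k-means on 1-D pixel values; returns the midpoint threshold.

    Assumption: means start at the min and max pixel values (the text
    describes random initialization).
    """
    mu1, mu2 = float(min(pixels)), float(max(pixels))
    for _ in range(max_iter):
        threshold = (mu1 + mu2) / 2
        # In 1-D, "closer to mu1 than to mu2" is the same as "below the midpoint".
        low = [p for p in pixels if p <= threshold]
        high = [p for p in pixels if p > threshold]
        if not low or not high:
            break  # degenerate case: all pixels fall into one group
        new_mu1 = sum(low) / len(low)
        new_mu2 = sum(high) / len(high)
        converged = abs(new_mu1 - mu1) < tol and abs(new_mu2 - mu2) < tol
        mu1, mu2 = new_mu1, new_mu2
        if converged:
            break
    return (mu1 + mu2) / 2

# Mostly dark difference image with three bright object pixels.
diff_pixels = [0, 1, 2, 1, 0, 100, 101, 99]
t = kmeans_threshold(diff_pixels)
binary = [p > t for p in diff_pixels]
```

Here the means settle at 0.8 and 100, so the threshold is 50.4 and only the three bright pixels are marked as object pixels.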

Figure 4

Example of automatic threshold value setting for binarization between objects and background using k-means clustering


Classification methods

Object detection based solely on subtracting the background from the frame images may have low accuracy. Because the background in the proposed process is the average of the frame images, noise may occur. Although some of this noise is removed by threshold selection during binarization, threshold selection alone is not sufficient for long or real-time videos. Therefore, the remaining noise must be classified and removed using machine learning models, which requires the construction of a database. In the database, the detected objects are manually labeled as noise or Daphnia magna; these labels serve as the ground truth. For classification, each object is resized to an 8 \(\times\) 8 image and stored in the database. The resized image is transformed into features using the Sobel edge detection algorithm³², which are then used as inputs to the classification models. In this study, RF³³ and SVM³⁴ were used as classification models.
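As a rough illustration of the feature step, the sketch below turns an 8 × 8 image into 64 Sobel gradient magnitudes. The source does not say how the Sobel output is turned into a feature vector, so the use of the gradient magnitude, the replicated border pixels, and the flattening order are all assumptions, and `sobel_features` is a hypothetical name.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]

def sobel_features(img):
    """Flattened Sobel gradient magnitudes of a small grayscale image.

    Assumptions: border pixels are replicated, and the feature vector is
    the row-major list of gradient magnitudes.
    """
    h, w = len(img), len(img[0])

    def px(y, x):
        # Clamp coordinates so that the border is replicated.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    features = []
    for y in range(h):
        for x in range(w):
            gx = sum(SOBEL_X[j][i] * px(y + j - 1, x + i - 1)
                     for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * px(y + j - 1, x + i - 1)
                     for j in range(3) for i in range(3))
            features.append(math.hypot(gx, gy))
    return features

# 8 x 8 image with a vertical step edge between columns 3 and 4.
step_edge = [[0, 0, 0, 0, 1, 1, 1, 1] for _ in range(8)]
features = sobel_features(step_edge)
```

For this step edge, the response is 4 in the two columns next to the edge and 0 elsewhere, so the 64-element vector highlights the object outline and ignores uniform regions.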

RF is a model that integrates several decision tree models³⁵. Each decision tree is trained on a sample drawn with replacement from the training data. A decision tree is trained to split the range of each independent variable into intervals by minimizing the Gini index (Eq. 1) or the entropy index (Eq. 2), both of which measure the impurity within the intervals.

$$\begin{aligned} G = 1 - \sum_{i=1}^{c} p_i^2 \end{aligned}$$

(1)

$$\begin{aligned} E = -\sum_{i=1}^{c} p_i \log_2 p_i \end{aligned}$$

(2)

where \(p_i\) is the proportion of observations in the i-th class within an interval, and c is the number of classes. For better performance, RF randomly selects a subset of the independent variables for training each tree. This step reduces the correlation between the trees. If the predictions of the decision trees are uncorrelated, the variance of the integrated prediction is smaller than the variance of each individual model. RF integrates the predictions of the trees by voting. An advantage of the RF method is that it reduces overfitting because the model combines many predictions.
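A short numerical sketch of Eqs. 1 and 2 may help, using the standard convention that the proportions are class proportions within one interval. The function names and the `"daphnia"` and `"noise"` labels are illustrative only.

```python
import math
from collections import Counter

def class_proportions(labels):
    """Proportion of each class among the labels that fall in one interval."""
    counts = Counter(labels)
    total = len(labels)
    return [n / total for n in counts.values()]

def gini(labels):
    """Gini index (Eq. 1): 1 minus the sum of squared class proportions."""
    return 1 - sum(p * p for p in class_proportions(labels))

def entropy(labels):
    """Entropy index (Eq. 2): minus the sum of p * log2(p) over classes."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))

pure = ["daphnia"] * 4                            # only one class
mixed = ["daphnia", "daphnia", "noise", "noise"]  # evenly mixed
```

A pure interval gives 0 for both indices, whereas the evenly mixed interval gives a Gini index of 0.5 and an entropy of 1, the maximum values for two classes. A split is good when it produces intervals close to the pure case.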

SVM is a model that searches for the hyperplane that maximizes the margin, that is, the distance between the support vectors of the two groups. The hyperplane is the plane that divides the two groups, and the support vectors are the training vectors closest to the hyperplane. Let \(D = \{(\mathbf{x}_i, y_i)\},\ i = 1, \ldots, n,\ \mathbf{x}_i \in \mathbb{R}^p,\ y_i \in \{-1, 1\}\) be the training data. Suppose that the training data are completely linearly separable by a hyperplane; then, the hyperplane is expressed as Eq. 3.

$$\begin{aligned} \mathbf{w}^T \mathbf{x} + b = 0, \end{aligned}$$

(3)

where \(\mathbf{w}\) is the weight vector of the hyperplane, and b is the bias. The weight vector is obtained by minimizing Eq. 4.

$$\begin{aligned} L = \frac{1}{2} \mathbf{w}^T \mathbf{w} \quad \text{subject to } y_i (\mathbf{w}^T \mathbf{x}_i + b) \ge 1 \end{aligned}$$

(4)

Eq. 4 can be transformed into Eq. 5 using the Lagrange multiplier method.

$$\begin{aligned} L^* = \frac{1}{2} \mathbf{w}^T \mathbf{w} - \sum_{i=1}^{n} a_i \{ y_i (\mathbf{w}^T \mathbf{x}_i + b) - 1 \}, \end{aligned}$$

(5)

where \(a_i\) is the Lagrange multiplier. Eq. 5 can be solved efficiently in its dual form. Furthermore, when the data are not completely separable, Eq. 5 can still be solved by introducing slack variables, and a kernel trick can be used to estimate a nonlinear hyperplane.
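For reference, the dual form mentioned here can be written out using the standard textbook derivation, which is not given in the source. Setting the partial derivatives of \(L^*\) in Eq. 5 with respect to \(\mathbf{w}\) and b to zero and substituting the results back gives a problem in the multipliers \(a_i\) only:

$$\begin{aligned} \frac{\partial L^*}{\partial \mathbf{w}} = 0 &\Rightarrow \mathbf{w} = \sum_{i=1}^{n} a_i y_i \mathbf{x}_i, \\ \frac{\partial L^*}{\partial b} = 0 &\Rightarrow \sum_{i=1}^{n} a_i y_i = 0, \\ \max_{a} \ & \sum_{i=1}^{n} a_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j y_i y_j \mathbf{x}_i^T \mathbf{x}_j \quad \text{subject to } a_i \ge 0,\ \sum_{i=1}^{n} a_i y_i = 0. \end{aligned}$$

Because the training vectors enter the dual only through the inner products \(\mathbf{x}_i^T \mathbf{x}_j\), replacing these inner products with a kernel function is what allows a nonlinear hyperplane to be estimated.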

SORT tracker

SORT is a framework for the multiple object tracking (MOT) problem that aims to achieve efficient real-time tracking²⁴. It combines an estimation step and an association step. The estimation step forecasts the next position of each predicted Daphnia magna. The association step matches each forecast position with the true position of the predicted Daphnia magna in the next frame. In the estimation step, the SORT framework uses the Kalman filter to forecast the position of each predicted Daphnia magna in the next frame. The state of each predicted Daphnia magna is expressed as Eq. 6.

$$\begin{aligned} \mathbf{x} = [u, v, s, r, \dot{u}, \dot{v}, \dot{s}]^T \end{aligned}$$

(6)

where u and v are the horizontal and vertical center positions of each predicted Daphnia magna, s is the scale (area) of the bounding box, and r is the aspect ratio of the bounding box. \(\dot{u}\), \(\dot{v}\), and \(\dot{s}\) are the rates of change of the corresponding variables. In the association step, the framework adopts the intersection-over-union (IOU)³⁶ as the metric for associating the forecast position with the true position. The Hungarian algorithm is used within the SORT framework to perform the association quickly and efficiently. In this study, a mixed metric of IOU³⁶ and Euclidean distance³⁷ (Eq. 7) was used instead of the IOU alone, as in the original SORT, for more efficient association.

$$\begin{aligned} C_{ij} = (1 - \lambda) \frac{\mathrm{max}_d - d_{ij}}{\mathrm{max}_d} + \lambda \cdot IOU_{ij} \end{aligned}$$

(7)

where \(d_{ij}\) is the Euclidean distance between the i-th predicted Daphnia magna in the previous frame and the j-th predicted Daphnia magna in the next frame, \(IOU_{ij}\) is the IOU between the same pair, and \(\lambda\) is the weight of \(IOU_{ij}\).
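The mixed metric of Eq. 7 can be sketched as follows. Several points are assumptions and not taken from the source: boxes are given as `(x1, y1, x2, y2)`, distances are measured between box centers and clipped at `max_d`, the values `lam=0.5` and `max_d=50.0` are placeholders, and an exhaustive search over permutations stands in for the Hungarian algorithm so that the example needs only the standard library. Because Eq. 7 grows with both closeness and overlap, it is treated here as a score to be maximized.

```python
import math
from itertools import permutations

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def centre_distance(a, b):
    """Euclidean distance between box centers (assumed reference points)."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return math.hypot(ax - bx, ay - by)

def mixed_score(a, b, lam, max_d):
    """Eq. 7, with the distance clipped at max_d (the clipping is an assumption)."""
    d = min(centre_distance(a, b), max_d)
    return (1 - lam) * (max_d - d) / max_d + lam * iou(a, b)

def associate(prev_boxes, next_boxes, lam=0.5, max_d=50.0):
    """Best one-to-one matching by exhaustive search (small inputs only).

    Stand-in for the Hungarian algorithm; assumes
    len(prev_boxes) <= len(next_boxes).
    """
    if len(prev_boxes) > len(next_boxes):
        raise ValueError("expects no more previous boxes than next boxes")
    scores = [[mixed_score(p, q, lam, max_d) for q in next_boxes]
              for p in prev_boxes]
    best = max(permutations(range(len(next_boxes)), len(prev_boxes)),
               key=lambda perm: sum(scores[i][j] for i, j in enumerate(perm)))
    return list(enumerate(best))

prev_boxes = [(0, 0, 10, 10), (100, 100, 110, 110)]
next_boxes = [(101, 101, 111, 111), (1, 1, 11, 11)]
matches = associate(prev_boxes, next_boxes)
```

In this example each box has moved by one pixel, so the first previous box is matched to the second detection and vice versa. The distance term keeps the score informative for small, fast objects whose boxes no longer overlap between frames, where the IOU alone would be zero.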

Metrics

The binary confusion matrix consists of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN)³⁸. TP is the number of objects predicted as Daphnia magna that are actual Daphnia magna, TN is the number of objects predicted as noise that are actual noise, FP is the number of objects predicted as Daphnia magna that are actually noise, and FN is the number of objects predicted as noise that are actually Daphnia magna. In this study, accuracy, recall, precision, and F1 score (Eq. 8) were used as the metrics for comparing the machine learning methods.

$$\begin{aligned} \text{Accuracy} &= \frac{TP + TN}{TP + TN + FP + FN} \\ \text{Recall} &= \frac{TP}{TP + FN} \\ \text{Precision} &= \frac{TP}{TP + FP} \\ \text{F1 score} &= 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \end{aligned}$$

(8)
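As a small worked example, the four metrics can be computed from confusion-matrix counts using their standard definitions. The function name and the counts below are made up for illustration and are not results from the study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, recall, precision and F1 score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}

# Illustrative counts: 8 Daphnia magna found, 1 missed, 2 noise objects
# mistaken for Daphnia magna, 5 noise objects correctly rejected.
m = classification_metrics(tp=8, tn=5, fp=2, fn=1)
```

With these counts, accuracy is 13/16, recall is 8/9, precision is 8/10, and the F1 score is 16/19. Recall penalizes missed Daphnia magna, whereas precision penalizes noise that is carried into the tracking step.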

Standard MOT metrics for evaluating tracking performance include multi-object tracking accuracy (MOTA) and multi-object tracking precision (MOTP). An important task in MOT is to identify and track the same object across frames. Identification (ID) precision (IDP), ID recall (IDR), ID F1 measure (IDF1), and ID switches (IDs) can be used to evaluate the identification and tracking of the same objects³⁹,⁴⁰.

Data analysis

The toxicity test using Daphnia magna was performed following the Korean official Acute Toxicity Test Method²⁹. The test medium was prepared by dissolving KCl (8 mg/L), \(\text{MgSO}_4\) (120 mg/L), \(\text{CaSO}_4 \cdot 2\text{H}_2\text{O}\) (120 mg/L), and \(\text{NaHCO}_3\) (192 mg/L) in deionized water. Considering that Daphnia magna are neonates for less than 24 h after birth²⁹, five neonates were placed in 50 mL of test solution containing a heavy metal compound such as potassium dichromate, copper(II) sulfate pentahydrate, or lead(II) sulfate at different concentrations (6.25, 12.5, 25, 50, and 100%), or in 50 mL of culture medium. Potassium dichromate is a common inorganic reagent used as an oxidizing agent in chemical industries. Copper(II) sulfate pentahydrate is a trace material widely used in industrial processes and agriculture. A significant amount of copper is emitted in semiconductor manufacturing processes, which adversely impacts the aquatic ecosystem. When present as an ion in water, copper can be acutely toxic to aquatic organisms such as Daphnia magna. Lead(II) sulfate is a compound of lead, another nonessential and nonbiodegradable heavy metal. It is highly toxic to numerous organisms even at low concentrations and can accumulate in aquatic ecosystems⁴¹. Twenty Daphnia magna (four replicates of five each) were exposed to each test solution for 24 h. The term “immobility” means that the Daphnia magna remains stationary after exposure to chemicals such as potassium dichromate, copper(II) sulfate pentahydrate, and lead(II) sulfate. In this study, immobility was used as the endpoint, and the number of mobile Daphnia magna was counted to evaluate the EC50 values for the samples using the ToxCalc 5.0 program (Tidepoll Software, USA).

The locomotory responses of Daphnia magna were tested after 0, 12, 18, and 24 h of exposure at different concentrations. A 2 mg/L potassium dichromate (\(\text{K}_2\text{Cr}_2\text{O}_7\)) solution was connected to the Daphnia magna tracking system as the standard toxic substance and was automatically diluted to 100%, 50%, 25%, 12.5%, and 6.25%. The automatic high-throughput Daphnia magna tracking system recorded and tracked a 1-min video at hourly intervals. The average distance moved per 20 s by each Daphnia magna in each chamber was analyzed using a repeated measures ANOVA (RMANOVA). RMANOVA is used to analyze data obtained by repeatedly measuring the same Daphnia magna⁴². It analyzes the concentration effect while excluding the time effect at each hour, where the time effect is the change in the average distance per 20 s over time. RMANOVA was implemented using the agricolae package in R 4.0.4⁴³. To remove noise affecting the RMANOVA, Daphnia magna that remained stationary for 20 s or more were removed from the observations. A significance level of 5% was used.
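The preparation of the distance data for this analysis can be sketched as below. This is an illustration under assumptions: a track is taken to be a list of `(x, y)` center positions in pixels, one per frame, and an individual counts as stationary when it moves no more than a hypothetical `min_move` pixels within a 20 s window. The source does not state the stationarity criterion or the pixel-to-length conversion.

```python
import math

FPS = 15       # frame rate stated in the text
WINDOW_S = 20  # length of the averaging window in seconds, from the text

def path_length(points):
    """Total distance along consecutive (x, y) positions."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

def window_distances(track, fps=FPS, window_s=WINDOW_S):
    """Distance moved in each consecutive window of one track."""
    step = fps * window_s
    # Each slice includes the first frame of the next window so that the
    # movement across the window boundary is not lost.
    return [path_length(track[i:i + step + 1])
            for i in range(0, len(track) - 1, step)]

def is_stationary(track, min_move=0.5, fps=FPS, window_s=WINDOW_S):
    """True if the individual barely moves in any window (min_move is hypothetical)."""
    return any(d <= min_move for d in window_distances(track, fps, window_s))

# Toy example with 2 fps and 2 s windows to keep the tracks short.
moving = [(i, 0) for i in range(9)]
stopped = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (4, 0), (4, 0), (4, 0), (4, 0)]
kept = [t for t in (moving, stopped) if not is_stationary(t, fps=2, window_s=2)]
```

In the toy example, the second track covers 4 pixels in the first window and none in the second, so it is excluded and only the first track would be passed on to the statistical analysis.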

Source: Ecology - nature.com
