
# As good as human experts in detecting plant roots in minirhizotron images but efficient and reproducible: the convolutional neural network “RootDetector”

### Datasets

#### Image acquisition

For this study, we assembled three datasets: one for training of the RootDetector Convolutional Neural Network (Training-Set), one for a performance comparison between humans and RootDetector in segmenting roots in minirhizotron images (Comparison-Set), and one for the validation of the algorithm (Validation-Set). The Training-Set contained 129 images: 17 randomly selected minirhizotron images sampled in a mesocosm experiment (see “Mesocosm sampling” Section), 47 randomly selected minirhizotron images sampled in a field study (see “Field sampling” Section), and the 65 minirhizotron images of soy roots published by Wang et al.15. The Comparison-Set contained 25 randomly selected minirhizotron images from the field study, none of which were part of the Training- or Validation-Sets. The Validation-Set contained 10 randomly selected minirhizotron images from the same field study, which had not been used in the Training-Set. All images were recorded at 2550 ✕ 2273 pixels and 300 dpi with a CI-600 In-Situ Root Imager (CID Bio-Science Inc., Camas, WA, USA) and stored as .tiff files to reduce compression loss. For all training and evaluation purposes we used raw, unprocessed output images from the CI-600.

#### Mesocosm sampling

The mesocosm experiment was established in 2018 on the premises of the Institute for Botany and Landscape Ecology of the University of Greifswald (Fig. S1). It features 108 heavy-duty plastic buckets of 100 l each, filled to two thirds of their height with moderately decomposed sedge fen peat. Each mesocosm contained one minirhizotron (inner diameter: 64 mm, outer diameter: 70 mm, length: 650 mm) installed at a 45° angle and capped in order to avoid penetration by light. The mesocosms were planted with varying compositions of plant species that typically occur in north-east German sedge fens (Carex rostrata, Carex acutiformis, Glyceria maxima, Equisetum fluviatile, Juncus inflexus, Mentha aquatica, Acorus calamus and Lycopus europaeus). The mesocosms were subjected to three different water table regimes: stable at soil surface level, stable at 20 cm below soil surface, and fluctuating between the two levels every two weeks. The minirhizotrons were scanned weekly at two levels of soil depth (0–20 cm and 15–35 cm) between April 2019 and December 2021, resulting in roughly 9500 minirhizotron images of 216 × 196 mm. Manual quantification of root length would, based on our own experience, take approximately three hours per image, resulting in approximately 28,500 h of manual processing for the complete dataset. Specimens planted were identified by author Dr. Blume-Werry; however, no voucher specimens were deposited. All methods were carried out in accordance with relevant institutional, national, and international guidelines and legislation.

#### Field sampling

The field study was established as part of the Wetscapes project in 201716. The study sites were located in Mecklenburg-Vorpommern, Germany, in three of the most common wetland types of the region: alder forest, percolation fen and coastal fen (Fig. S2). For each wetland type, a pair of drained versus rewetted study sites was established. A detailed description of the study sites and the experimental setup can be found in Jurasinski et al.16. At each site, 15 minirhizotrons (same diameter as above, length: 1500 mm) were installed at a 45° angle along a central boardwalk. The minirhizotrons were scanned biweekly from April 2018, then monthly from January 2019, at two to four levels of soil depth (0–20 cm, 20–40 cm, 40–60 cm and 60–80 cm), resulting in roughly 12,000 minirhizotron images of 216 × 196 mm, i.e. an estimated 36,000 h of manual processing for the complete dataset. Permission for the study was obtained from all field owners.

### The CNN RootDetector

#### Image annotation

For the generation of training data for the CNN, human analysts manually masked all root pixels in the 74 images of the Training-Set using GIMP 2.10.12. The resulting ground truth data are binary, black-and-white images in Portable Network Graphics (.png) format, where white pixels represent root structures and black pixels represent non-root objects and soil (Fig. 2b). All training data were checked and, if required, corrected by an expert (see “Selection of participants” for definition). The Validation-Set was created in the same way but exclusively by experts.

#### Architecture

RootDetector’s core consists of a Deep Neural Network (DNN) based on the U-Net image segmentation architecture27 and is implemented in the TensorFlow and Keras frameworks18. Although U-Net was originally developed for biomedical applications, it has since been successfully applied to other domains due to its generic design.

RootDetector is built from four down-sampling blocks, four up-sampling blocks and a final output block (Fig. 1). Each block contains two 3 × 3 convolutional layers, each followed by rectified linear units (ReLU). The final output layer instead uses a sigmoid activation. Starting from 64 initial feature channels, this number is doubled in every down-block while the resolution is halved via 2 × 2 max-pooling. Every up-block doubles the resolution again via bilinear interpolation and applies a 1 × 1 convolution which halves the number of channels. Importantly, after each up-sampling step, the feature map is concatenated with the corresponding feature map from the down-sampling path, which is crucial for preserving fine spatial details.

Our modifications to the original architecture include BatchNormalization19 after each convolutional layer, which considerably speeds up training, and zero-padding instead of the cropping suggested by Ronneberger, Fischer & Brox20, in order to preserve the original image size.
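
The block structure and modifications described above can be sketched as follows. This is a minimal illustration in TensorFlow/Keras, not RootDetector's actual implementation; the input size, the 3-channel input and all layer names are our assumptions.

```python
# Sketch of a U-Net-style segmentation network with the modifications
# described in the text: BatchNorm after each convolution, zero-padding
# ("same") instead of cropping, bilinear up-sampling with 1x1 convolutions,
# and skip connections from the down-sampling path.
import tensorflow as tf
from tensorflow.keras import layers

def conv_block(x, filters):
    # Two 3x3 convolutions, each followed by BatchNormalization and ReLU
    for _ in range(2):
        x = layers.Conv2D(filters, 3, padding="same")(x)  # zero-padding keeps size
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    return x

def build_unet(input_shape=(256, 256, 3)):
    inputs = layers.Input(input_shape)
    skips, x, filters = [], inputs, 64
    # Four down-sampling blocks: double the channels, halve the resolution
    for _ in range(4):
        x = conv_block(x, filters)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)
        filters *= 2
    x = conv_block(x, filters)
    # Four up-sampling blocks: bilinear interpolation, 1x1 convolution
    # halving the channels, then concatenation with the down-path feature map
    for skip in reversed(skips):
        filters //= 2
        x = layers.UpSampling2D(2, interpolation="bilinear")(x)
        x = layers.Conv2D(filters, 1, padding="same")(x)
        x = layers.Concatenate()([x, skip])
        x = conv_block(x, filters)
    # Final 1x1 convolution with sigmoid activation yields a per-pixel
    # root probability map
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs)
```

Because all convolutions use zero-padding, the output mask has the same spatial size as the input image.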

In addition to the root segmentation network, we trained a second network to detect foreign objects, specifically the adhesive tape that is used as a light barrier on the aboveground part of the minirhizotrons. We used the same network architecture as above and trained in a supervised fashion with the binary cross-entropy loss. During inference, the result is thresholded (predefined threshold value: 0.5) and used without post-processing.

#### Training

We pre-trained RootDetector on the COCO dataset21 to generate a starting point. Although the COCO dataset contains a wide variety of image types and classes not specifically related to minirhizotron images, Majurski et al.22 showed that for small annotation counts, transfer-learning even from unrelated datasets may improve a CNN’s performance by up to 20%. We fine-tuned on our dataset with the Adam optimizer23 for 15 epochs and trained on a total of 129 images from the Training-Set (17 mesocosm images, 47 field-experiment images, 65 soy root images). To enhance the dataset size and reduce over-fitting effects, we performed a series of augmentation operations as described by Shorten & Khoshgoftaar24. In many images, relatively coarse roots (> 3 mm) occupied a major part of the positive (white) pixel space, which might have caused RootDetector to underestimate fine root details overall. Similarly, negative space (black pixels) between tightly packed, parallel roots was often very small and might have impacted the training process to a lesser extent when compared to large areas with few or no roots (Fig. 2). To mitigate both effects, we multiplied the result of the cross-entropy loss map with a weight map which emphasizes positive–negative transitions. This weight map is generated by applying the following formula to the annotated ground truth images:

$$\omega(x) = 1 - \left( \tanh\left( 2\tilde{x} - 1 \right) \right)^{2}$$

(1)

where ω(x) is the weight at pixel x and x̃ is the average pixel value of the annotated ground truth in a 5 × 5 neighborhood around pixel x. Ronneberger, Fischer, & Brox20 implemented a similar weight map, however with stronger emphasis on space between objects. As this requires computation of distances between two comparatively large sets of points, we adapted and simplified their formula to be computable in a single 5 × 5 convolution.
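
The weight map in Eq. (1) can be sketched as follows, assuming a binary ground-truth mask with values in {0, 1}; the `uniform_filter` call stands in for the single 5 × 5 averaging convolution mentioned above.

```python
# A minimal sketch of the weight map in Eq. (1).
import numpy as np
from scipy.ndimage import uniform_filter

def weight_map(mask):
    # x~: average pixel value in a 5x5 neighborhood around each pixel
    x_tilde = uniform_filter(mask.astype(float), size=5)
    # Eq. (1): the weight peaks at 1 where x~ is near 0.5, i.e. at
    # positive-negative transitions, and is smaller in uniform regions
    return 1.0 - np.tanh(2.0 * x_tilde - 1.0) ** 2
```

In a uniformly black (or white) region the weight settles at 1 − tanh(1)² ≈ 0.42, so transitions between roots and background receive roughly twice the emphasis of homogeneous areas.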

For the loss function we applied a combination of cross-entropy and Dice loss25:

$$\mathcal{L} = \mathcal{L}_{CE} + \lambda \mathcal{L}_{Dice} = -\frac{1}{N}\sum\nolimits_{i} w\left( x_{i} \right) y_{i} \log\left( x_{i} \right) + \lambda \frac{2\sum\nolimits_{i} x_{i} y_{i}}{\sum\nolimits_{i} x_{i}^{2} + \sum\nolimits_{i} y_{i}^{2}}$$

(2)

where x are the predicted pixels, y the corresponding ground truth labels, N the number of pixels in an image and λ a balancing factor which we set to 0.01. This value was derived empirically. The Dice loss is applied per-image to counteract the usually high positive-to-negative pixel imbalance. Since this may produce overly confident outputs and restrict the application of weight maps, we used a relatively low value for λ.
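
A minimal sketch of this combined loss, assuming predictions x and labels y as float arrays in [0, 1] and a per-pixel weight map w; λ defaults to 0.01 as above. Note that Eq. (2) writes the Dice term as the overlap itself; for a quantity that is minimized during training we use the common 1 − Dice form here, which is our choice, not necessarily RootDetector's.

```python
# Sketch of the combined cross-entropy + Dice loss of Eq. (2).
import numpy as np

def combined_loss(x, y, w, lam=0.01, eps=1e-7):
    x = np.clip(x, eps, 1.0 - eps)   # avoid log(0)
    # Weighted cross-entropy term, averaged over the N pixels of the image
    ce = -np.mean(w * y * np.log(x))
    # Per-image Dice coefficient, counteracting positive/negative imbalance
    dice = 2.0 * np.sum(x * y) / (np.sum(x**2) + np.sum(y**2) + eps)
    # Small lambda keeps the weight map effective, as described in the text
    return ce + lam * (1.0 - dice)
```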

#### Output and post-processing

RootDetector generates two types of output. The first is greyscale .png files in which white pixels represent root structures and black pixels represent non-root structures and soil (Fig. 2c). The advantage of .png images is their lossless and artifact-free compression at relatively small file sizes. RootDetector further skeletonizes the output images and reduces root structures to single-pixel representations using the skeletonize function of scikit-image v. 0.17.1 (26; Fig. 2e,f). This reduces the impact of large-diameter roots or root-like structures such as rhizomes in subsequent analyses and is directly comparable to estimations of root length. The second type of output is a comma-separated values (.csv) file with numerical values indicating, for each processed image, the number of identified root pixels, the number of root pixels after skeletonization, the number of orthogonal and diagonal connections between pixels after skeletonization and an estimate of the combined physical length of all roots. The latter is a metric commonly used in root research because in many species, fine roots provide most vital functions such as nutrient and water transport3. The combined length of all roots in a given space therefore puts an emphasis on fine roots, as they typically occupy a relatively smaller fraction of the area in a 2D image compared to the often much thicker coarse roots. To derive physical length estimates from skeletonized images, RootDetector counts orthogonal and diagonal connections between pixels of the skeletonized images and employs the formula proposed by Kimura et al.17 (Eq. 3).

$$L = \left[ N_{d}^{2} + \left( N_{d} + N_{o}/2 \right)^{2} \right]^{1/2} + N_{o}/2$$

(3)

where Nd is the number of diagonally connected and No the number of orthogonally connected skeleton pixels. To compute Nd we convolve the skeletonized image with two 2 × 2 binary kernels, one for top-left-to-bottom-right connections and one for bottom-left-to-top-right connections, and count the number of pixels with maximum response in the convolution result. Similarly, No is computed with 1 × 2 and 2 × 1 convolutional kernels.
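
The post-processing steps above can be sketched as follows, assuming a binary segmentation mask: skeletonization with scikit-image's skeletonize function as named in the text, then the Kimura length estimate of Eq. (3). For brevity, the connection counts use array slicing, which is equivalent to the small convolution kernels described above; the toy mask is purely illustrative.

```python
# Sketch of skeletonization and the Kimura root-length estimate (Eq. 3).
import numpy as np
from skimage.morphology import skeletonize

def kimura_length(skel):
    s = skel.astype(bool)
    # No: orthogonally connected pixel pairs (1x2 horizontal, 2x1 vertical)
    n_o = int(np.sum(s[:, :-1] & s[:, 1:]) + np.sum(s[:-1, :] & s[1:, :]))
    # Nd: diagonally connected pixel pairs (both diagonal directions)
    n_d = int(np.sum(s[:-1, :-1] & s[1:, 1:]) + np.sum(s[:-1, 1:] & s[1:, :-1]))
    # Eq. (3): estimated root length in pixel units
    return (n_d**2 + (n_d + n_o / 2.0) ** 2) ** 0.5 + n_o / 2.0

# Toy example: a thick horizontal "root", 4 px wide and 16 px long
mask = np.zeros((20, 20), dtype=bool)
mask[8:12, 2:18] = True
skeleton = skeletonize(mask)        # reduced to a single-pixel-wide line
length_px = kimura_length(skeleton) # length estimate from the skeleton
```

As a sanity check, a purely horizontal skeleton of 11 pixels yields No = 10, Nd = 0 and thus L = 10, matching its geometric length.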

### Performance comparison

#### Selection of participants

For the performance comparison, we selected 10 human analysts and divided them into three groups according to their expertise in plant physiology and with digital root measuring tools. The novice group consisted of 3 ecology students (2 bachelor’s, 1 master’s) who had taken or were taking courses in plant physiology but had no prior experience with minirhizotron images or digital root measuring tools. This group represents undergraduate students producing data for a Bachelor thesis or student assistants employed to process data. The advanced group consisted of 3 ecology students (1 bachelor’s, 2 master’s) who had already taken courses in plant physiology and had at least 100 h of experience with minirhizotron images and digital root measuring tools. The expert group consisted of 4 scientists (2 PhD, 2 PhD candidates) who had extensive experience in root science and at least 250 h of experience with digital root measuring tools. All methods were carried out in accordance with relevant institutional, national, and international guidelines and legislation, and informed consent was obtained from all participants.

#### Instruction and root tracing

All three groups were instructed by showing them a 60 min live demo of an expert tracing roots in minirhizotron images, during which commonly encountered challenges and pitfalls were thoroughly discussed. Additionally, all participants were provided with a previously generated, in-depth manual containing guidelines on the identification of root structures, the correct operation of the root tracing program and examples of often encountered challenges and suggested solutions. Before working on the Comparison-Set, all participants traced roots in one smaller-size sample image and received feedback from one expert.

#### Image preparation and root tracing

Because the minirhizotron images acquired in the field covered a variety of substrates, roots of different plant species and a range of image qualities, and because tracing roots is very time consuming, we maximized the number of images by tracing roots only in small sections, in order to cover the largest possible number of cases. To do this, we placed a box of 1000 × 1000 pixels (8.47 × 8.47 cm) at a random location in each of the images in the Comparison-Set and instructed participants to trace only roots within that box. Similarly, we provided RootDetector with images in which the parts outside the rectangle were occluded. All groups used RootSnap! 1.3.2.25 (CID Bio-Science Inc., Camas, WA, USA;27), a vector-based tool, to manually trace roots in each of the 25 images in the Comparison-Set. We decided on RootSnap! due to our previous good experience with the software and its relative ease of use. The combined length of all roots was then exported as a .csv file for each person and image and compared to RootDetector’s output of the Kimura root length.

### Validation

We tested the accuracy of RootDetector on a set of 10 image segments of 1000 × 1000 pixels cropped from random locations in the 10 images of the Validation-Set. These images were annotated by a human expert without knowledge of the algorithm’s estimations and were exempt from the training process. As commonly applied in binary classification, we used the F1 score as a metric to evaluate the performance of RootDetector. F1 is calculated from precision (Eq. 4) and recall (Eq. 5) and represents their harmonic mean (Eq. 6). It ranges from 0 to 1, with higher values indicating better classification (segmentation) performance. As one of the 10 image sections contained no roots and thus no F1 score could be calculated, it was excluded from the validation. We calculated the F1 score for each of the nine remaining image sections and averaged the values as a metric for overall segmentation performance.

$$\mathrm{Precision}\;(P) = \frac{tp}{tp + fp}$$

(4)

$$\mathrm{Recall}\;(R) = \frac{tp}{tp + fn}$$

(5)

$$F1 = 2 \cdot \frac{P \cdot R}{P + R}$$

(6)

where P = precision, R = recall, tp = true positives, fp = false positives and fn = false negatives.
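
Eqs. (4)–(6) can be sketched directly from two binary masks; this is a minimal illustration assuming prediction and ground truth as boolean arrays of equal shape.

```python
# Sketch of precision, recall and F1 score from binary masks (Eqs. 4-6).
import numpy as np

def f1_score(pred, truth):
    tp = np.sum(pred & truth)    # root pixels correctly detected
    fp = np.sum(pred & ~truth)   # non-root pixels marked as root
    fn = np.sum(~pred & truth)   # root pixels missed
    precision = tp / (tp + fp)   # Eq. (4)
    recall = tp / (tp + fn)      # Eq. (5)
    # Eq. (6): harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)
```

Note that, as in the validation described above, the score is undefined for an image without any root pixels (tp + fn = 0), which is why such sections must be excluded.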

### Statistical analysis

We used R Version 4.1.2 (R Core Team, 2021) for all statistical analyses and the R package ggplot2 Version 3.2.128 for visualizations. Pixel identification-performance comparisons were based on least-squares fits and the Pearson method. Root length estimation-performance comparisons between groups of human analysts (novice, advanced, expert) and RootDetector were based on the respective estimates of total root length plotted over the minirhizotron images in increasing order of total root length. Linear models were calculated using the lm function for each group of analysts. To determine significant differences between the groups and the algorithm, 95% CIs as well as 83% CIs were displayed, and RootDetector root lengths outside the 95% CI were considered significantly different from the group estimate at α = 0.0529. The groups of human analysts were considered significantly different if their 83% CIs did not overlap, as the comparison of two 83% CIs approximates an alpha level of 5%30,31.
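
The interval-comparison rule can be sketched as follows; the analysis itself was done in R, so this Python illustration, including the group means, standard errors and degrees of freedom, is entirely hypothetical.

```python
# Sketch of the 83%-CI overlap rule: non-overlap of two 83% confidence
# intervals approximates a two-group test at alpha = 0.05.
from scipy import stats

def ci(mean, se, df, level=0.83):
    # Two-sided confidence interval from a t distribution
    t = stats.t.ppf(0.5 + level / 2.0, df)
    return (mean - t * se, mean + t * se)

def overlap(a, b):
    # True if the two intervals share any common range
    return a[0] <= b[1] and b[0] <= a[1]

# Illustrative values only (not data from this study)
ci_novice = ci(mean=12.0, se=0.8, df=24)
ci_expert = ci(mean=15.0, se=0.7, df=24)
# Groups are considered significantly different if their 83% CIs do not overlap
different = not overlap(ci_novice, ci_expert)
```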

This study was approved by the Ethikkommission der Universitätsmedizin Greifswald, University of Greifswald, Germany.

Source: Ecology - nature.com