Whales from space dataset, an annotated satellite image dataset of whales for training machine learning models

Very high-resolution (VHR) satellite imagery allows us to regularly survey remote and large areas of the ocean that are difficult to access by boat or plane. Interest in using VHR satellite imagery to study great whales (including sperm whales and baleen whales) has grown in recent years1,2,3,4,5, since Abileah6 and Fretwell et al.7 showed its potential. This growing interest may be linked to the improvement in the spatial resolution of satellite imagery, which improved in 2014 from 46 cm to 31 cm. This upgrade increased confidence in the detection of whales in satellite imagery, as more detail could be seen, such as whale-defining features (e.g. flukes).
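To give a rough sense of what the change from 46 cm to 31 cm means in practice, the short Python sketch below estimates how many pixels lie along a whale's body at each resolution. The 15 m body length and the function name are illustrative assumptions, not values from this dataset, and the calculation treats the quoted spatial resolution as the ground size of one pixel.

```python
def pixels_spanned(object_length_m: float, resolution_cm: float) -> float:
    """Approximate number of pixels along an object of the given length.

    Assumes the quoted spatial resolution equals the ground size of one pixel.
    """
    return object_length_m / (resolution_cm / 100.0)


if __name__ == "__main__":
    whale_length_m = 15.0  # hypothetical body length, for illustration only
    for resolution_cm in (46.0, 31.0):
        n_pixels = pixels_spanned(whale_length_m, resolution_cm)
        print(f"{resolution_cm:.0f} cm resolution: ~{n_pixels:.0f} pixels along the body")
```

Under these assumptions, the same animal spans about 33 pixels at 46 cm and about 48 pixels at 31 cm, which is why finer features such as flukes become easier to resolve.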

Whales are detected in the imagery either manually1,4,5,7 or automatically2,3. A downside of the manual approach is that it is time-consuming, with manual counters often having to view hundreds and sometimes thousands of square kilometres of open ocean. The development of automated approaches to detect whales in satellite imagery would not only speed up this application, but also standardize the procedure and reduce the possibility of missing whales due to observer fatigue. Various automated approaches exist, from pixel-based methods to artificial intelligence. Machine learning, an application of artificial intelligence, seems to be the most appropriate automated method to detect whales efficiently in satellite imagery2,3,8,9.

In machine learning, an algorithm learns how to identify features by repeatedly testing different search parameters against a training dataset10,11. For whales, the algorithm needs to be trained to detect the wide variety of shapes and colours that characterise whales. Shape and colour are influenced by the species, the environment (e.g. varying degrees of turbidity), the light conditions, and behaviour (e.g. foraging, travelling, breaching), as different behaviours result in different postures. The larger a training dataset is, the more accurate and transferable to other satellite images the algorithm will be. At the time of writing, such a dataset does not exist or is not publicly available.

Creating a dataset large enough to train algorithms to detect whales in VHR satellite imagery will require the various research groups analysing such imagery to openly share examples of whale and non-whale objects. This could be facilitated by uploading the data to a central open-source repository, similar to GenBank12 for DNA code or OBIS-Seamap13 for marine wildlife observations. Ideally, clipped-out image chips of the whale objects would be shared as tiff files, which retain most of the characteristics of the original image. However, all VHR satellites are commercially owned, except for Cartosat-3, which is owned by the government of India14, which means it is not possible to publicly share image chips as tiff files. Instead, image chips could be shared in png or jpeg format, which involves losing some spectral information. If tiff files are required, georeferenced and labelled boxes encompassing the whale objects could also be shared, along with information on the satellite imagery, to allow anyone to request the exact imagery from the commercial providers.

Here we present a database of whale objects found in VHR satellite imagery. It represents four species of whales (i.e. southern right whale, Eubalaena australis; grey whale, Eschrichtius robustus; humpback whale, Megaptera novaeangliae; fin whale, Balaenoptera physalus; Fig. 1), which were manually detected in images captured by different satellites (i.e. GeoEye-1, Quickbird-2, WorldView-2, WorldView-3). We created the database by (i) detecting whale objects manually in satellite imagery; (ii) classifying whale objects as either “definite”, “probable” or “possible”, as in Cubaynes et al.1; and (iii) creating georeferenced and labelled points and boxes centered around each whale object, and providing image chips in png format. By making this database publicly available, we aim to initiate the creation of a central database that can be built upon.
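As an illustration of step (iii), the Python sketch below builds a labelled square box centered on a georeferenced point. It is a minimal sketch under stated assumptions, not the procedure used to build the database: the class and function names, the 30 m box side, the example coordinates, and the use of a projected coordinate system in metres are all hypothetical. The output is a GeoJSON-style dictionary; note that strict GeoJSON expects longitude and latitude in WGS 84, so projected coordinates would need to be reprojected before sharing in that format.

```python
from dataclasses import dataclass

# Certainty classes named in the text.
CERTAINTY_LEVELS = ("definite", "probable", "possible")


@dataclass(frozen=True)
class WhaleAnnotation:
    """A labelled point marking the centre of a whale object (hypothetical schema)."""
    x: float         # easting in metres (assumed projected coordinate system)
    y: float         # northing in metres (assumed projected coordinate system)
    species: str
    certainty: str

    def __post_init__(self) -> None:
        if self.certainty not in CERTAINTY_LEVELS:
            raise ValueError(f"certainty must be one of {CERTAINTY_LEVELS}")


def centered_box(ann: WhaleAnnotation, side_m: float) -> tuple:
    """Return (xmin, ymin, xmax, ymax) of a square box centered on the point."""
    half = side_m / 2.0
    return (ann.x - half, ann.y - half, ann.x + half, ann.y + half)


def to_feature(ann: WhaleAnnotation, side_m: float) -> dict:
    """Build a GeoJSON-style polygon feature carrying the annotation labels."""
    xmin, ymin, xmax, ymax = centered_box(ann, side_m)
    # A closed ring: the first and last vertices are identical.
    ring = [[xmin, ymin], [xmax, ymin], [xmax, ymax], [xmin, ymax], [xmin, ymin]]
    return {
        "type": "Feature",
        "geometry": {"type": "Polygon", "coordinates": [ring]},
        "properties": {"species": ann.species, "certainty": ann.certainty},
    }


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    whale = WhaleAnnotation(500000.0, 4000000.0, "Eubalaena australis", "definite")
    print(to_feature(whale, side_m=30.0))
```

Keeping the point and the box derivation separate means the box size can be changed later without re-annotating, since only the centre point and its labels are stored.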

Fig. 1

Database of annotated whales detected in satellite imagery covering different species and areas. Humpback whales were detected in Maui Nui, US (a); grey whales in Laguna San Ignacio, Mexico (b); fin whales in the Pelagos Sanctuary, France, Monaco and Italy (c); southern right whales were observed in three areas, off the Peninsula Valdes, Argentina (d); off Witsand, South Africa (e); and off the Auckland Islands, New Zealand (f). The dot size represents the number of annotated whales per location. Whale silhouettes were sourced from (the grey and humpback whales silhouettes are from Chris Luh).
