More stories

  • New tools are available to help reduce the energy that AI models devour

    When searching for flights on Google, you may have noticed that each flight’s carbon-emission estimate is now presented next to its cost. It’s a way to inform customers about their environmental impact, and to let them factor this information into their decision-making.

    A similar kind of transparency doesn’t yet exist for the computing industry, despite its carbon emissions exceeding those of the entire airline industry. Artificial intelligence models are a major driver of this escalating energy demand. Huge, popular models like ChatGPT signal a trend of large-scale artificial intelligence, boosting forecasts that predict data centers will draw up to 21 percent of the world’s electricity supply by 2030.

    The MIT Lincoln Laboratory Supercomputing Center (LLSC) is developing techniques to help data centers reel in energy use. Their techniques range from simple but effective changes, like power-capping hardware, to adopting novel tools that can stop AI training early on. Crucially, they have found that these techniques have a minimal impact on model performance.

    In the wider picture, their work is mobilizing green-computing research and promoting a culture of transparency. “Energy-aware computing is not really a research area, because everyone’s been holding on to their data,” says Vijay Gadepally, senior staff in the LLSC who leads energy-aware research efforts. “Somebody has to start, and we’re hoping others will follow.”

    Curbing power and cooling down

    Like many data centers, the LLSC has seen a significant uptick in the number of AI jobs running on its hardware. Noticing an increase in energy usage, computer scientists at the LLSC were curious about ways to run jobs more efficiently. Green computing is a principle of the center, which is powered entirely by carbon-free energy.

    Training an AI model — the process by which it learns patterns from huge datasets — requires using graphics processing units (GPUs), which are power-hungry hardware. As one example, the GPUs that trained GPT-3 (the precursor to ChatGPT) are estimated to have consumed 1,300 megawatt-hours of electricity, roughly equal to that used by 1,450 average U.S. households per month.

    While most people seek out GPUs because of their computational power, manufacturers offer ways to limit the amount of power a GPU is allowed to draw. “We studied the effects of capping power and found that we could reduce energy consumption by about 12 percent to 15 percent, depending on the model,” Siddharth Samsi, a researcher within the LLSC, says.

    The trade-off for capping power is increasing task time — GPUs will take about 3 percent longer to complete a task, an increase Gadepally says is “barely noticeable” considering that models are often trained over days or even months. In one of their experiments in which they trained the popular BERT language model, limiting GPU power to 150 watts saw a two-hour increase in training time (from 80 to 82 hours) but saved the equivalent of a U.S. household’s week of energy.

    The team then built software that plugs this power-capping capability into the widely used scheduler system, Slurm. The software lets data center owners set limits across their system or on a job-by-job basis.
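
    The LLSC’s Slurm plug-in itself is not reproduced in this article; as a rough, hypothetical sketch of the idea, a job-prolog script could apply a per-job cap using NVIDIA’s standard management tool (the 150-watt default and the JOB_GPU_POWER_CAP_W variable below are assumptions for illustration):

    ```python
    # Illustrative sketch only: apply a per-job GPU power cap before a job starts,
    # in the spirit of the LLSC's Slurm integration (not their actual software).
    import os
    import subprocess

    def cap_gpu_power(watts: int) -> None:
        """Set a power limit (in watts) on every GPU assigned to this job."""
        # CUDA_VISIBLE_DEVICES is typically set by the scheduler; default to GPU 0.
        gpu_ids = os.environ.get("CUDA_VISIBLE_DEVICES", "0").split(",")
        for gpu in gpu_ids:
            # nvidia-smi -pl sets the board power limit; requires admin privileges.
            subprocess.run(
                ["nvidia-smi", "-i", gpu.strip(), "-pl", str(watts)],
                check=True,
            )

    if __name__ == "__main__":
        # JOB_GPU_POWER_CAP_W is a hypothetical per-job setting; the BERT
        # experiment described above used a 150 W cap.
        cap_gpu_power(int(os.environ.get("JOB_GPU_POWER_CAP_W", "150")))
    ```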

    “We can deploy this intervention today, and we’ve done so across all our systems,” Gadepally says.

    Side benefits have arisen, too. Since putting power constraints in place, the GPUs on LLSC supercomputers have been running about 30 degrees Fahrenheit cooler and at a more consistent temperature, reducing stress on the cooling system. Running the hardware cooler can potentially also increase reliability and service lifetime. They can now consider delaying the purchase of new hardware — reducing the center’s “embodied carbon,” or the emissions created through the manufacturing of equipment — until the efficiencies gained by using new hardware offset this aspect of the carbon footprint. They’re also finding ways to cut down on cooling needs by strategically scheduling jobs to run at night and during the winter months.

    “Data centers can use these easy-to-implement approaches today to increase efficiencies, without requiring modifications to code or infrastructure,” Gadepally says.

    Taking this holistic look at a data center’s operations to find opportunities to cut down can be time-intensive. To make this process easier for others, the team — in collaboration with Professor Devesh Tiwari and Baolin Li at Northeastern University — recently developed and published a comprehensive framework for analyzing the carbon footprint of high-performance computing systems. System practitioners can use this analysis framework to gain a better understanding of how sustainable their current system is and consider changes for next-generation systems.  

    Adjusting how models are trained and used

    On top of making adjustments to data center operations, the team is devising ways to make AI-model development more efficient.

    When training models, AI developers often focus on improving accuracy, and they build upon previous models as a starting point. To achieve the desired output, they have to figure out what parameters to use, and getting it right can require testing thousands of configurations. This process, called hyperparameter optimization, is one area LLSC researchers have found ripe for cutting down energy waste.

    “We’ve developed a model that basically looks at the rate at which a given configuration is learning,” Gadepally says. Given that rate, their model predicts the likely performance. Underperforming models are stopped early. “We can give you a very accurate estimate early on that the best model will be in this top 10 of 100 models running,” he says.

    In their studies, this early stopping led to dramatic savings: an 80 percent reduction in the energy used for model training. They’ve applied this technique to models developed for computer vision, natural language processing, and material design applications.
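
    The LLSC’s prediction model is not published in this article; the sketch below illustrates the general pattern only, using a crude linear extrapolation of each configuration’s early learning curve to decide which runs to stop (all names and numbers are invented):

    ```python
    # Minimal sketch of learning-curve-based early stopping (not the LLSC model).
    # Assumes each configuration reports a validation score after every epoch.
    import numpy as np

    def predicted_final_score(scores: list[float], total_epochs: int) -> float:
        """Crude extrapolation: fit the early trend and project it forward."""
        epochs = np.arange(1, len(scores) + 1)
        slope, intercept = np.polyfit(epochs, scores, 1)
        return float(slope * total_epochs + intercept)

    def prune_configs(histories: dict[str, list[float]],
                      total_epochs: int,
                      keep_fraction: float = 0.10) -> set[str]:
        """Return the configs worth continuing; the rest can be stopped early."""
        predictions = {
            name: predicted_final_score(scores, total_epochs)
            for name, scores in histories.items()
        }
        n_keep = max(1, int(len(predictions) * keep_fraction))
        ranked = sorted(predictions, key=predictions.get, reverse=True)
        return set(ranked[:n_keep])

    # Example: keep the projected top 10 of 100 configurations after 5 epochs,
    # e.g. histories = {"config_042": [0.61, 0.68, 0.71, 0.73, 0.74], ...}
    ```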

    “In my opinion, this technique has the biggest potential for advancing the way AI models are trained,” Gadepally says.

    Training is just one part of an AI model’s emissions. The largest contributor to emissions over time is model inference, or the process of running the model live, like when a user chats with ChatGPT. To respond quickly, these models use redundant hardware, running all the time, waiting for a user to ask a question.

    One way to improve inference efficiency is to use the most appropriate hardware. Also with Northeastern University, the team created an optimizer that matches a model with the most carbon-efficient mix of hardware, such as high-power GPUs for the computationally intense parts of inference and low-power central processing units (CPUs) for the less-demanding aspects. This work recently won the best paper award at the International ACM Symposium on High-Performance Parallel and Distributed Computing.

    Using this optimizer can decrease energy use by 10-20 percent while still meeting the same “quality-of-service target” (how quickly the model can respond).
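
    The optimizer’s internals are described in the team’s paper rather than here; as a hedged sketch, its core decision can be framed as choosing the lowest-energy hardware option that still meets the quality-of-service target (the candidate figures below are invented for illustration):

    ```python
    # Illustrative hardware selection: choose the most energy-efficient option
    # that still meets the quality-of-service (latency) target. Candidate values
    # are made up; a real optimizer would profile each model on each device mix.
    from dataclasses import dataclass

    @dataclass
    class HardwareOption:
        name: str
        latency_ms: float        # measured or estimated inference latency
        energy_joules: float     # energy per inference request

    def pick_hardware(options: list[HardwareOption],
                      latency_target_ms: float) -> HardwareOption:
        feasible = [o for o in options if o.latency_ms <= latency_target_ms]
        if not feasible:
            raise ValueError("No hardware option meets the latency target")
        return min(feasible, key=lambda o: o.energy_joules)

    candidates = [
        HardwareOption("high-power GPU", latency_ms=12.0, energy_joules=45.0),
        HardwareOption("low-power GPU",  latency_ms=35.0, energy_joules=20.0),
        HardwareOption("CPU only",       latency_ms=140.0, energy_joules=9.0),
    ]
    print(pick_hardware(candidates, latency_target_ms=50.0).name)  # low-power GPU
    ```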

    This tool is especially helpful for cloud customers, who lease systems from data centers and must select hardware from among thousands of options. “Most customers overestimate what they need; they choose over-capable hardware just because they don’t know any better,” Gadepally says.

    Growing green-computing awareness

    The energy saved by implementing these interventions also reduces the associated costs of developing AI, often by a one-to-one ratio. In fact, cost is usually used as a proxy for energy consumption. Given these savings, why aren’t more data centers investing in green techniques?

    “I think it’s a bit of an incentive-misalignment problem,” Samsi says. “There’s been such a race to build bigger and better models that almost every secondary consideration has been put aside.”

    They point out that while some data centers buy renewable-energy credits, these renewables aren’t enough to cover the growing energy demands. The majority of electricity powering data centers comes from fossil fuels, and water used for cooling is contributing to stressed watersheds. 

    Hesitancy may also exist because systematic studies on energy-saving techniques haven’t been conducted. That’s why the team has been pushing their research in peer-reviewed venues in addition to open-source repositories. Some big industry players, like Google DeepMind, have applied machine learning to increase data center efficiency but have not made their work available for others to deploy or replicate. 

    Top AI conferences are now pushing for ethics statements that consider how AI could be misused. The team sees the climate aspect as an AI ethics topic that has not yet been given much attention, but this also appears to be slowly changing. Some researchers are now disclosing the carbon footprint of training the latest models, and industry is showing a shift in energy transparency too, as in this recent report from Meta AI.

    They also acknowledge that transparency is difficult without tools that can show AI developers their consumption. Reporting is on the LLSC roadmap for this year. They want to be able to show every LLSC user, for every job, how much energy they consume and how this amount compares to others, similar to home energy reports.
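
    Such reporting tools are still on the roadmap, but one simple way to approximate a job’s GPU energy, sketched below under the assumption that NVIDIA’s standard query interface is available, is to sample the reported power draw and integrate it over the job’s runtime:

    ```python
    # Rough per-job GPU energy estimate: periodically sample the power draw
    # reported by nvidia-smi and integrate over time. A sketch, not the LLSC tool.
    import subprocess
    import time

    def sample_power_watts() -> float:
        """Total instantaneous power draw (W) across visible GPUs."""
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=power.draw",
             "--format=csv,noheader,nounits"],
            text=True,
        )
        return sum(float(line) for line in out.splitlines() if line.strip())

    def measure_job_energy_kwh(duration_s: int, interval_s: int = 10) -> float:
        joules = 0.0
        for _ in range(duration_s // interval_s):
            joules += sample_power_watts() * interval_s   # W * s = J
            time.sleep(interval_s)
        return joules / 3.6e6   # 1 kWh = 3.6 million joules
    ```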

    Part of this effort requires working more closely with hardware manufacturers to make getting these data off hardware easier and more accurate. If manufacturers can standardize the way the data are read out, then energy-saving and reporting tools can be applied across different hardware platforms. A collaboration is underway between the LLSC researchers and Intel to work on this very problem.

    Even AI developers who are aware of AI’s intense energy needs can often do little on their own to curb that use. The LLSC team wants to help other data centers apply these interventions and provide users with energy-aware options. Their first partnership is with the U.S. Air Force, a sponsor of this research, which operates thousands of data centers. Applying these techniques can make a significant dent in their energy consumption and cost.

    “We’re putting control into the hands of AI developers who want to lessen their footprint,” Gadepally says. “Do I really need to gratuitously train unpromising models? Am I willing to run my GPUs slower to save energy? To our knowledge, no other supercomputing center is letting you consider these options. Using our tools, today, you get to decide.”

    Visit this webpage to see the group’s publications related to energy-aware computing and findings described in this article.

  • A new dataset of Arctic images will spur artificial intelligence research

    As the U.S. Coast Guard (USCG) icebreaker Healy takes part in a voyage across the North Pole this summer, it is capturing images of the Arctic to further the study of this rapidly changing region. Lincoln Laboratory researchers installed a camera system aboard the Healy while at port in Seattle before it embarked on a three-month science mission on July 11. The resulting dataset, which will be one of the first of its kind, will be used to develop artificial intelligence tools that can analyze Arctic imagery.

    “This dataset not only can help mariners navigate more safely and operate more efficiently, but also help protect our nation by providing critical maritime domain awareness and an improved understanding of how AI analysis can be brought to bear in this challenging and unique environment,” says Jo Kurucar, a researcher in Lincoln Laboratory’s AI Software Architectures and Algorithms Group, which led this project.

    As the planet warms and sea ice melts, Arctic passages are opening up to more traffic, from both military vessels and ships conducting illegal fishing. These movements may pose national security challenges to the United States. The opening Arctic also raises questions about how its climate, wildlife, and geography are changing.

    Today, very few imagery datasets of the Arctic exist to study these changes. Overhead images from satellites or aircraft can only provide limited information about the environment. An outward-looking camera attached to a ship can capture more details of the setting and different angles of objects, such as other ships, in the scene. These types of images can then be used to train AI computer-vision tools, which can help the USCG plan naval missions and automate analysis. According to Kurucar, USCG assets in the Arctic are spread thin and can benefit greatly from AI tools, which can act as a force multiplier.

    The Healy is the USCG’s largest and most technologically advanced icebreaker. Given its current mission, it was a fitting candidate to be equipped with a new sensor to gather this dataset. The laboratory research team collaborated with the USCG Research and Development Center to determine the sensor requirements. Together, they developed the Cold Region Imaging and Surveillance Platform (CRISP).

    “Lincoln Laboratory has an excellent relationship with the Coast Guard, especially with the Research and Development Center. Over a decade, we’ve established ties that enabled the deployment of the CRISP system,” says Amna Greaves, the CRISP project lead and an assistant leader in the AI Software Architectures and Algorithms Group. “We have strong ties not only because of the USCG veterans working at the laboratory and in our group, but also because our technology missions are complementary. Today it was deploying infrared sensing in the Arctic; tomorrow it could be operating quadruped robot dogs on a fast-response cutter.”

    The CRISP system comprises a long-wave infrared camera, manufactured by Teledyne FLIR (for forward-looking infrared), that is designed for harsh maritime environments. The camera can stabilize itself during rough seas and image in complete darkness, fog, and glare. It is paired with a GPS-enabled time-synchronized clock and a network video recorder to record both video and still imagery along with GPS-positional data.  

    The camera is mounted at the front of the ship’s fly bridge, and the electronics are housed in a ruggedized rack on the bridge. The system can be operated manually from the bridge or be placed into an autonomous surveillance mode, in which it slowly pans back and forth, recording 15 minutes of video every three hours and a still image once every 15 seconds.

    “The installation of the equipment was a unique and fun experience. As with any good project, our expectations going into the install did not meet reality,” says Michael Emily, the project’s IT systems administrator who traveled to Seattle for the install. Working with the ship’s crew, the laboratory team had to quickly adjust their route for running cables from the camera to the observation station after they discovered that the expected access points weren’t in fact accessible. “We had 100-foot cables made for this project just in case of this type of scenario, which was a good thing because we only had a few inches to spare,” Emily says.

    The CRISP project team plans to publicly release the dataset, anticipated to be about 4 terabytes in size, once the USCG science mission concludes in the fall.

    The goal in releasing the dataset is to enable the wider research community to develop better tools for those operating in the Arctic, especially as this region becomes more navigable. “Collecting and publishing the data allows for faster and greater progress than what we could accomplish on our own,” Kurucar adds. “It also enables the laboratory to engage in more advanced AI applications while others make more incremental advances using the dataset.”

    On top of providing the dataset, the laboratory team plans to provide a baseline object-detection model, from which others can make progress on their own models. More advanced AI applications planned for development are classifiers for specific objects in the scene and the ability to identify and track objects across images.
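
    The laboratory’s baseline model has not yet been released; purely as an illustration of the kind of starting point others could build on, a generic pretrained detector can be run on a single frame (the torchvision model and the file name below are assumptions, not the laboratory’s baseline):

    ```python
    # Illustrative only: run a generic pretrained object detector on one image.
    # This stands in for the planned baseline model, which has not been released.
    import torch
    import torchvision
    from torchvision.io import read_image
    from torchvision.transforms.functional import convert_image_dtype

    # Pretrained COCO weights; requires torchvision >= 0.13 for the weights API.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
    model.eval()

    # "healy_arctic_frame.jpg" is a hypothetical file name for one camera frame.
    image = convert_image_dtype(read_image("healy_arctic_frame.jpg"), torch.float)
    with torch.no_grad():
        detections = model([image])[0]   # boxes, labels, scores for one image

    # Keep only confident detections (threshold chosen arbitrarily here).
    keep = detections["scores"] > 0.5
    print(detections["boxes"][keep], detections["labels"][keep])
    ```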

    Beyond assisting with USCG missions, this project could create an influential dataset for researchers looking to apply AI to data from the Arctic to help combat climate change, says Paul Metzger, who leads the AI Software Architectures and Algorithms Group.

    Metzger adds that the group was honored to be a part of this project and is excited to see the advances that come from applying AI to novel challenges facing the United States: “I’m extremely proud of how our group applies AI to the highest-priority challenges in our nation, from predicting outbreaks of Covid-19 and assisting the U.S. European Command in their support of Ukraine to now employing AI in the Arctic for maritime awareness.”

    Once the dataset is available, it will be free to download on the Lincoln Laboratory dataset website.

  • System tracks movement of food through global humanitarian supply chain

    Although more than enough food is produced to feed everyone in the world, as many as 828 million people face hunger today. Poverty, social inequity, climate change, natural disasters, and political conflicts all contribute to inhibiting access to food. For decades, the U.S. Agency for International Development (USAID) Bureau for Humanitarian Assistance (BHA) has been a leader in global food assistance, supplying millions of metric tons of food to recipients worldwide. Alleviating hunger — and the conflict and instability hunger causes — is critical to U.S. national security.

    But BHA is only one player within a large, complex supply chain in which food gets handed off between more than 100 partner organizations before reaching its final destination. Traditionally, the movement of food through the supply chain has been a black-box operation, with stakeholders largely out of the loop about what happens to the food once it leaves their custody. This lack of direct visibility into operations is due to siloed data repositories, insufficient data sharing among stakeholders, and different data formats that operators must manually sort through and standardize. As a result, accurate, real-time information — such as where food shipments are at any given time, which shipments are affected by delays or food recalls, and when shipments have arrived at their final destination — is lacking. A centralized system capable of tracing food along its entire journey, from manufacture through delivery, would enable a more effective humanitarian response to food-aid needs.

    In 2020, a team from MIT Lincoln Laboratory began engaging with BHA to create an intelligent dashboard for their supply-chain operations. This dashboard brings together the expansive food-aid datasets from BHA’s existing systems into a single platform, with tools for visualizing and analyzing the data. When the team started developing the dashboard, they quickly realized the need for considerably more data than BHA had access to.

    “That’s where traceability comes in, with each handoff partner contributing key pieces of information as food moves through the supply chain,” explains Megan Richardson, a researcher in the laboratory’s Humanitarian Assistance and Disaster Relief Systems Group.

    Richardson and the rest of the team have been working with BHA and their partners to scope, build, and implement such an end-to-end traceability system. This system consists of serialized, unique identifiers (IDs) — akin to fingerprints — that are assigned to individual food items at the time they are produced. These individual IDs remain linked to items as they are aggregated along the supply chain, first domestically and then internationally. For example, individually tagged cans of vegetable oil get packaged into cartons; cartons are placed onto pallets and transported via railway and truck to warehouses; pallets are loaded onto shipping containers at U.S. ports; and pallets are unloaded and cartons are unpackaged overseas.
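
    As a rough sketch of how serialized IDs could stay linked through that aggregation, consider a simple nested record in which any pallet or container can be traced back to every individual item it holds (the ID formats below are invented):

    ```python
    # Sketch of serialized IDs staying linked as items are aggregated:
    # cans -> cartons -> pallets -> shipping containers. IDs here are invented.
    from dataclasses import dataclass, field

    @dataclass
    class Unit:
        unit_id: str                       # serialized, globally unique ID
        level: str                         # "item", "carton", "pallet", "container"
        children: list["Unit"] = field(default_factory=list)

        def all_item_ids(self) -> list[str]:
            """Every individual item ID contained anywhere below this unit."""
            if self.level == "item":
                return [self.unit_id]
            return [i for child in self.children for i in child.all_item_ids()]

    cans = [Unit(f"OIL-CAN-{n:06d}", "item") for n in range(1, 7)]
    carton = Unit("CARTON-000123", "carton", children=cans)
    pallet = Unit("PALLET-000045", "pallet", children=[carton])
    print(pallet.all_item_ids())   # traces every can back from the pallet
    ```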

    With a trace

    Today, visibility at the single-item level doesn’t exist. Most suppliers mark pallets with a lot number (a lot is a batch of items produced in the same run), but this is for internal purposes (i.e., to track issues stemming back to their production supply, like over-enriched ingredients or machinery malfunction), not data sharing. So, organizations know which supplier lot a pallet and carton are associated with, but they can’t track the unique history of an individual carton or item within that pallet. As the lots move further downstream toward their final destination, they are often mixed with lots from other productions, and possibly other commodity types altogether, because of space constraints. On the international side, such mixing and the lack of granularity make it difficult to quickly pull commodities out of the supply chain if food safety concerns arise. Current response times can span several months.

    “Commodities are grouped differently at different stages of the supply chain, so it is logical to track them in those groupings where needed,” Richardson says. “Our item-level granularity serves as a form of Rosetta Stone to enable stakeholders to efficiently communicate throughout these stages. We’re trying to enable a way to track not only the movement of commodities, including through their lot information, but also any problems arising independent of lot, like exposure to high humidity levels in a warehouse. Right now, we have no way to associate commodities with histories that may have resulted in an issue.”

    “You can now track your checked luggage across the world and the fish on your dinner plate,” adds Brice MacLaren, also a researcher in the laboratory’s Humanitarian Assistance and Disaster Relief Systems Group. “So, this technology isn’t new, but it’s new to BHA as they evolve their methodology for commodity tracing. The traceability system needs to be versatile, working across a wide variety of operators who take custody of the commodity along the supply chain and fitting into their existing best practices.”

    As food products make their way through the supply chain, operators at each receiving point would be able to scan these IDs via a Lincoln Laboratory-developed mobile application (app) to indicate a product’s current location and transaction status — for example, that it is en route on a particular shipping container or stored in a certain warehouse. This information would get uploaded to a secure traceability server. By scanning a product, operators would also see its history up until that point.   
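
    A minimal sketch of what a single scan event might look like when the app reports it to the traceability server is shown below; the field names and endpoint are assumptions, not the actual BHA or Lincoln Laboratory schema:

    ```python
    # Hypothetical scan event posted by the mobile app to the traceability server.
    # Field names and endpoint are illustrative; the real schema is not public.
    from dataclasses import dataclass, asdict
    from datetime import datetime, timezone
    import json
    import urllib.request

    @dataclass
    class ScanEvent:
        unit_id: str          # serialized ID read from the 2D barcode
        location: str
        status: str           # e.g., "loaded", "in_transit", "received", "damaged"
        scanned_at: str
        notes: str = ""

    def report_scan(event: ScanEvent, server_url: str) -> int:
        req = urllib.request.Request(
            server_url,
            data=json.dumps(asdict(event)).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(req) as resp:   # returns the HTTP status code
            return resp.status

    event = ScanEvent(
        unit_id="CARTON-000123",
        location="Port of Djibouti, warehouse 4",
        status="received",
        scanned_at=datetime.now(timezone.utc).isoformat(),
    )
    # report_scan(event, "https://example.org/traceability/scans")  # hypothetical URL
    ```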

    Hitting the mark

    At the laboratory, the team tested the feasibility of their traceability technology, exploring different ways to mark and scan items. In their testing, they considered barcodes and radio-frequency identification (RFID) tags, as well as handheld and fixed scanners. Their analysis revealed that 2D barcodes (specifically data matrices) and smartphone-based scanners were the most feasible options in terms of how the technology works and how it fits into existing operations and infrastructure.

    “We needed to come up with a solution that would be practical and sustainable in the field,” MacLaren says. “While scanners can automatically read any RFID tags in close proximity as someone is walking by, they can’t discriminate exactly where the tags are coming from. RFID is expensive, and it’s hard to read commodities in bulk. On the other hand, a phone can scan a barcode on a particular box and tell you that code goes with that box. The challenge then becomes figuring out how to present the codes for people to easily scan without significantly interrupting their usual processes for handling and moving commodities.” 

    As the team learned from partner representatives in Kenya and Djibouti, offloading at the ports is a chaotic, fast operation. At manual warehouses, porters fling bags over their shoulders or stack cartons atop their heads any which way they can and run them to a drop point; at bagging terminals, commodities come down a conveyor belt and land this way or that way. With this variability comes several questions: How many barcodes do you need on an item? Where should they be placed? What size should they be? What will they cost? The laboratory team is considering these questions, keeping in mind that the answers will vary depending on the type of commodity; vegetable oil cartons will have different specifications than, say, 50-kilogram bags of wheat or peas.

    Leaving a mark

    Leveraging results from their testing and insights from international partners, the team has been running a traceability pilot evaluating how their proposed system meshes with real-world domestic and international operations. The current pilot features a domestic component in Houston, Texas, and an international component in Ethiopia, and focuses on tracking individual cartons of vegetable oil and identifying damaged cans. The Ethiopian team with Catholic Relief Services recently received a container filled with pallets of uniquely barcoded cartons of vegetable oil cans (in the next pilot, the cans will be barcoded, too). They are now scanning items and collecting data on product damage by using smartphones with the laboratory-developed mobile traceability app on which they were trained. 

    “The partners in Ethiopia are comparing a couple lid types to determine whether some are more resilient than others,” Richardson says. “With the app — which is designed to scan commodities, collect transaction data, and keep history — the partners can take pictures of damaged cans and see if a trend with the lid type emerges.”

    Next, the team will run a series of pilots with the World Food Program (WFP), the world’s largest humanitarian organization. The first pilot will focus on data connectivity and interoperability, and the team will engage with suppliers to directly print barcodes on individual commodities instead of applying barcode labels to packaging, as they did in the initial feasibility testing. The WFP will provide input on which of their operations are best suited for testing the traceability system, considering factors like the network bandwidth of WFP staff and local partners, the commodity types being distributed, and the country context for scanning. The BHA will likely also prioritize locations for system testing.

    “Our goal is to provide an infrastructure to enable as close to real-time data exchange as possible between all parties, given intermittent power and connectivity in these environments,” MacLaren says.

    In subsequent pilots, the team will try to integrate their approach with existing systems that partners rely on for tracking procurements, inventory, and movement of commodities under their custody so that this information is automatically pushed to the traceability server. The team also hopes to add a capability for real-time alerting of statuses, like the departure and arrival of commodities at a port or the exposure of unclaimed commodities to the elements. Real-time alerts would enable stakeholders to more efficiently respond to food-safety events. Currently, partners are forced to take a conservative approach, pulling out more commodities from the supply chain than are actually suspect, to reduce risk of harm. Both BHA and WFP are interested in testing out a food-safety event during one of the pilots to see how the traceability system works in enabling rapid communication response.

    To implement this technology at scale will require some standardization for marking different commodity types as well as give and take among the partners on best practices for handling commodities. It will also require an understanding of country regulations and partner interactions with subcontractors, government entities, and other stakeholders.

    “Within several years, I think it’s possible for BHA to use our system to mark and trace all their food procured in the United States and sent internationally,” MacLaren says.

    Once collected, the trove of traceability data could be harnessed for other purposes, among them analyzing historical trends, predicting future demand, and assessing the carbon footprint of commodity transport. In the future, a similar traceability system could scale for nonfood items, including medical supplies distributed to disaster victims, resources like generators and water trucks localized in emergency-response scenarios, and vaccines administered during pandemics. Several groups at the laboratory are also interested in such a system to track items such as tools deployed in space or equipment people carry through different operational environments.

    “When we first started this program, colleagues were asking why the laboratory was involved in simple tasks like making a dashboard, marking items with barcodes, and using hand scanners,” MacLaren says. “Our impact here isn’t about the technology; it’s about providing a strategy for coordinated food-aid response and successfully implementing that strategy. Most importantly, it’s about people getting fed.”

  • Preparing Colombia’s cities for life amid changing forests

    It was an uncharacteristically sunny morning as Marcela Angel MCP ’18, flanked by a drone pilot from the Boston engineering firm AirWorks and a data collection team from the Colombian regional environmental agency Corpoamazonia, climbed a hill in the Andes Mountains of southwest Colombia. The area’s usual mountain cloud cover — one of the major challenges to working with satellite imagery or flying UAVs (unpiloted aerial vehicles, or drones) in the Pacific highlands of the Amazon — would roll through in the hours to come. But for now, her team had chosen a good day to hike out for their first flight.

    Angel is used to long travel for her research. Raised in Bogotá, she maintained strong ties to Colombia throughout her master’s program in the MIT Department of Urban Studies and Planning (DUSP). Her graduate thesis, examining Bogotá’s management of its public green space, took her regularly back to her hometown, exploring how the city could offer residents more equal access to the clean air, flood protection and day-to-day health and social benefits provided by parks and trees.

    But the hill she was hiking this morning, outside the remote city of Mocoa, had taken an especially long time to climb: five years building relationships with the community of Mocoa and the Colombian government, recruiting project partners, and navigating the bureaucracy of bringing UAVs into the country. Now, her team finally unwrapped their first, knee-high drone from its tarp and set it carefully in the grass. Under the gathering gray clouds, the buzz of its rotors joined the hum of insects in the trees, and the machine at last took to the skies.

    From Colombia to Cambridge

    “I actually grew up on the last street before the eastern mountains reserve,” Angel says of her childhood in Bogotá. “I’ve always been at that border between city and nature.” This idea, that urban areas are married to the ecosystems around them, would inform Angel’s whole education and career.

    Before coming to MIT, she studied architecture at Bogotá’s Los Andes University; for her graduation project she proposed a plan to resettle an informal neighborhood on Bogotá’s outskirts to minimize environmental risks to its residents. Among her projects at MIT was an initiative to spatially analyze Bogotá’s tree canopy, providing data for the city to plan a tree-planting program as a strategy to give vulnerable populations in the city more access to nature.

    And she was naturally intrigued when Colombia’s former minister of environment and sustainable development came to MIT in 2017 to give a guest presentation to the DUSP master’s program. The minister, Luis Gilberto Murillo (now the Colombian ambassador to the United States), introduced the students to the challenges triggered by a recent disaster in the city of Mocoa, on the border between the lowland Amazon and the Andes Mountains. Unprecedented rainstorms had destabilized the surrounding forests, and that April a devastating flood and landslide had killed hundreds of people and destroyed entire neighborhoods. And as climate change contributed to growing rainfall in the region, the risks of more landslide events were rising.

    Murillo provided useful insights into how city planning decisions had contributed to the crisis. But he also asked for MIT’s support addressing future landslide risks in the area. Angel and Juan Camilo Osorio, a PhD candidate at DUSP, decided to take up the challenge, and in January 2018 and 2019, a research delegation from MIT traveled to Colombia for a newly-created graduate course. Returning once again to Bogotá, Angel interviewed government agencies and nonprofits to understand the state of landslide monitoring and public policy. In Mocoa, further interviews and a series of workshops helped clarify what locals needed most and what MIT could provide: better information on where and when landslides might strike, and a process to increase risk awareness and involve traditionally marginalized groups in decision-making processes around that risk.

    Over the coming year, a core team formed to put the insights from this trip into action, including Angel, Osorio, postdoc Norhan Bayomi of the MIT Environmental Solutions Initiative (ESI) and MIT Professor John Fernández, director of the ESI and one of Angel’s mentors at DUSP. After a second visit to Mocoa that brought into the fold Indigenous groups, environmental agencies, and the national army, a plan was formed: MIT would partner with Corpoamazonia and build a network of community researchers to deploy and test drone technology and machine learning models to monitor the mountain forests for both landslide risks and signs of forest health, while implementing a participatory planning process with residents. “What our projects aim to do is give the communities new tools to continue protecting and restoring the forest,” says Angel, “and support new and inclusive development models, even in the face of new challenges.”

    Lifelines for the climate

    The goal of tropical forest conservation is an urgent one. As forests are cut down, their trees and soils release carbon they have stored over millennia, adding huge amounts of heat-trapping carbon dioxide to the atmosphere. Deforestation, mainly in the tropics, is now estimated to contribute more to climate change than any country besides the United States and China — and once lost, tropical forests are exceptionally hard to restore. “Tropical forests should be a natural way to slow and reverse climate change,” says Angel. “And they can be. But today, we are reaching critical tipping points where it is just the opposite.”

    This became the motivating force for Angel’s career after her graduation. In 2019, Fernández invited her to join the ESI and lead a new Natural Climate Solutions Program, with the Mocoa project as its first centerpiece. She quickly mobilized the partners to raise funding for the project from the Global Environmental Facility and the CAF Development Bank of Latin America and the Caribbean, and recruited additional partners including MIT Lincoln Laboratory, AirWorks, and the Pratt Institute, where Osorio had become an assistant professor. She hired machine learning specialists from MIT to begin designing the UAVs’ data processing, and helped assemble a local research network in Mocoa to increase risk awareness, promote community participation, and better understand what information city officials and community groups needed for city planning and conservation.

    “This is the amazing thing about MIT,” she says. “When you study a problem here, you’re not just playing in a sandbox. Everyone I’ve worked with is motivated by the complexity of the technical challenge and the opportunity for meaningful engagement in Mocoa, and hopefully in many more places besides.”

    At the same time, Angel created opportunities for the next generation of MIT graduate students to follow in her footsteps. With Fernández and Bayomi, she created a new course, 4.S23 (Biodiversity and Cities), in which students traveled to Colombia to develop urban planning strategies for the cities of Quibdó and Leticia, located in carbon-rich and biodiverse areas. The course has been taught twice, with Professor Gabriella Carolini joining the teaching team for spring 2023, and has already led to a student report to city officials in Quibdó recommending ways to enhance biodiversity and adapt to climate change as the city grows, a multi-stakeholder partnership to train local youth and implement a citizen-led biodiversity survey, and a seed grant from the MIT Climate and Sustainability Consortium to begin providing both cities detailed data on their tree cover derived from satellite images.

    “These regions face serious threats, especially on a warming planet, but many of the solutions for climate change, biodiversity conservation, and environmental equity in the region go hand-in-hand,” Angel says. “When you design a city to use fewer resources, to contribute less to climate change, it also causes less pressure on the environment around it. When you design a city for equity and quality of life, you’re giving attention to its green spaces and what they can provide for people and as habitat for other species. When you protect and restore forests, you’re protecting local bioeconomies.”

    Bringing the data home

    Meanwhile, in Mocoa, Angel’s original vision is taking flight. With the team’s test flights behind them, they can now begin creating digital models of the surrounding area. Regular drone flights and soil samples will fill in changing information about trees, water, and local geology, allowing the project’s machine learning specialists to identify warning signs for future landslides and extreme weather events. More importantly, there is now an established network of local community researchers and leaders ready to make use of this information. With feedback from their Mocoan partners, Angel’s team has built a prototype of the online platform they will use to share their UAV data; they’re now letting Mocoa residents take it for a test drive and suggest how it can be made more user-friendly.

    Her visit this January also paved the way for new projects that will tie the Environmental Solutions Initiative more tightly to Mocoa. With her project partners, Angel is exploring developing a course to teach local students how to use UAVs like the ones her team is flying. She is also considering expanded efforts to collect the kind of informal knowledge of Mocoa, on the local ecology and culture, that people everywhere use in making their city planning and emergency response decisions, but that is rarely codified and included in scientific risk analyses.

    It’s a great deal of work to offer this one community the tools to adapt successfully to climate change. But even with all the robotics and machine learning models in the world, this close, slow-unfolding engagement, grounded in trust and community inclusion, is what it takes to truly prepare people to confront profound changes in their city and environment. “Protecting natural carbon sinks is a global socio-environmental challenge, and one where it is not enough for MIT to just contribute to the knowledge base or develop a new technology,” says Angel. “But we can help mobilize decision-makers and nontraditional actors, and design more inclusive and technology-enhanced processes, to make this easier for the people who have lifelong stakes in these ecosystems. That is the vision.”

  • Detailed images from space offer clearer picture of drought effects on plants

    “MIT is a place where dreams come true,” says César Terrer, an assistant professor in the Department of Civil and Environmental Engineering. Here at MIT, Terrer says he’s given the resources needed to explore ideas he finds most exciting, and at the top of his list is climate science. In particular, he is interested in plant-soil interactions, and how the two can mitigate impacts of climate change. In 2022, Terrer received seed grant funding from the Abdul Latif Jameel Water and Food Systems Lab (J-WAFS) to produce drought monitoring systems for farmers. The project is leveraging a new generation of remote sensing devices to provide high-resolution plant water stress at regional to global scales.

    Growing up in Granada, Spain, Terrer always had an aptitude and passion for science. He studied environmental science at the University of Murcia, where he interned in the Department of Ecology. Using computational analysis tools, he worked on modeling species distribution in response to human development. Early on in his undergraduate experience, Terrer says he regarded his professors as “superheroes” with a kind of scholarly prowess. He knew he wanted to follow in their footsteps by one day working as a faculty member in academia. Of course, there would be many steps along the way before achieving that dream. 

    Upon completing his undergraduate studies, Terrer set his sights on exciting and adventurous research roles. He thought perhaps he would conduct field work in the Amazon, engaging with native communities. But when the opportunity arose to work in Australia on a state-of-the-art climate change experiment that simulates future levels of carbon dioxide, he headed south to study how plants react to CO2 in a biome of native Australian eucalyptus trees. It was during this experience that Terrer started to take a keen interest in the carbon cycle and the capacity of ecosystems to buffer rising levels of CO2 caused by human activity.

    Around 2014, he began to delve deeper into the carbon cycle as he began his doctoral studies at Imperial College London. The primary question Terrer sought to answer during his PhD was “will plants be able to absorb predicted future levels of CO2 in the atmosphere?” To answer the question, Terrer became an early adopter of artificial intelligence, machine learning, and remote sensing to analyze data from real-life, global climate change experiments. His findings from these “ground truth” values and observations resulted in a paper in the journal Science. In it, he claimed that climate models most likely overestimated how much carbon plants will be able to absorb by the end of the century, by a factor of three. 

    After postdoctoral positions at Stanford University and the Universitat Autonoma de Barcelona, followed by a prestigious Lawrence Fellowship, Terrer says he had “too many ideas and not enough time to accomplish all those ideas.” He knew it was time to lead his own group. Not long after applying for faculty positions, he landed at MIT. 

    New ways to monitor drought

    Terrer is employing similar methods to those he used during his PhD to analyze data from all over the world for his J-WAFS project. He and postdoc Wenzhe Jiao collect data from remote sensing satellites and field experiments and use machine learning to come up with new ways to monitor drought. Terrer says Jiao is a “remote sensing wizard,” who fuses data from different satellite products to understand the water cycle. With Jiao’s hydrology expertise and Terrer’s knowledge of plants, soil, and the carbon cycle, the duo is a formidable team to tackle this project.

    According to the U.N. World Meteorological Organization, the number and duration of droughts has increased by 29 percent since 2000, as compared to the two previous decades. From the Horn of Africa to the Western United States, drought is devastating vegetation and severely stressing water supplies, compromising food production and spiking food insecurity. Drought monitoring can offer fundamental information on drought location, frequency, and severity, but assessing the impact of drought on vegetation is extremely challenging. This is because plants’ sensitivity to water deficits varies across species and ecosystems. 

    Terrer and Jiao are able to obtain a clearer picture of how drought is affecting plants by employing the latest generation of remote sensing observations, which offer images of the planet with incredible spatial and temporal resolution. Satellite products such as Sentinel, Landsat, and Planet can provide daily images from space with such high resolution that individual trees can be discerned. Along with the images and datasets from satellites, the team is using ground-based observations from meteorological data. They are also using the MIT SuperCloud at MIT Lincoln Laboratory to process and analyze all of the data sets. The J-WAFS project is among one of the first to leverage high-resolution data to quantitatively measure plant drought impacts in the United States with the hopes of expanding to a global assessment in the future.

    Assisting farmers and resource managers 

    Every week, the U.S. Drought Monitor provides a map of drought conditions in the United States. The map has coarse resolution and is more of a drought recap or summary, unable to predict future drought scenarios. The lack of a comprehensive spatiotemporal evaluation of historic and future drought impacts on global vegetation productivity is detrimental to farmers both in the United States and worldwide.

    Terrer and Jiao plan to generate metrics for plant water stress at an unprecedented resolution of 10-30 meters. This means that they will be able to provide drought monitoring maps at the scale of a typical U.S. farm, giving farmers more precise, useful data every one to two days. The team will use the information from the satellites to monitor plant growth and soil moisture, as well as the time lag of plant growth response to soil moisture. In this way, Terrer and Jiao say they will eventually be able to create a kind of “plant water stress forecast” that may be able to predict adverse impacts of drought four weeks in advance. “According to the current soil moisture and lagged response time, we hope to predict plant water stress in the future,” says Jiao. 
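
    The team’s models are far more sophisticated, but the core idea of exploiting the lagged response of vegetation to soil moisture can be sketched with a simple lagged regression on synthetic data:

    ```python
    # Toy sketch of the lag idea: today's soil moisture helps predict a vegetation
    # water-stress index a few weeks later. Synthetic data; not the project's model.
    import numpy as np

    rng = np.random.default_rng(0)
    weeks = 200
    soil_moisture = rng.uniform(0.1, 0.4, size=weeks)       # volumetric fraction
    lag = 4                                                  # weeks of delay
    noise = rng.normal(0.0, 0.02, size=weeks - lag)
    # Vegetation stress rises as soil moisture (4 weeks earlier) falls.
    veg_stress = 1.0 - 2.0 * soil_moisture[:-lag] + noise

    # Fit the lagged linear relationship and forecast stress 4 weeks ahead.
    slope, intercept = np.polyfit(soil_moisture[:-lag], veg_stress, 1)
    forecast = slope * soil_moisture[-1] + intercept
    print(f"predicted stress {lag} weeks from now: {forecast:.2f}")
    ```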

    The expected outcomes of this project will give farmers, land and water resource managers, and decision-makers more accurate data at the farm-specific level, allowing for better drought preparation, mitigation, and adaptation. “We expect to make our data open-access online, after we finish the project, so that farmers and other stakeholders can use the maps as tools,” says Jiao. 

    Terrer adds that the project “has the potential to help us better understand the future states of climate systems, and also identify the regional hot spots more likely to experience water crises at the national, state, local, and tribal government scales.” He also expects the project will enhance our understanding of global carbon-water-energy cycle responses to drought, with applications in determining climate change impacts on natural ecosystems as a whole.

  • New nanosatellite tests autonomy in space

    In May 2022, a SpaceX Falcon 9 rocket launched the Transporter-5 mission into orbit. The mission contained a collection of micro and nanosatellites from both industry and government, including one from MIT Lincoln Laboratory called the Agile MicroSat (AMS).

    AMS’s primary mission is to test automated maneuvering capabilities in the tumultuous very low-Earth orbit (VLEO) environment, starting at 525 kilometers above the surface and descending from there. VLEO is a challenging location for satellites because the higher air density, coupled with variable space weather, causes increased and unpredictable drag that requires frequent maneuvers to maintain position. Using a commercial off-the-shelf electric-ion propulsion system and custom algorithms, AMS is testing how well it can execute automated navigation and control over an initial mission period of six months.

    “AMS integrates electric propulsion and autonomous navigation and guidance control algorithms that push a lot of the operation of the thruster onto the spacecraft — somewhat like a self-driving car,” says Andrew Stimac, who is the principal investigator for the AMS program and the leader of the laboratory’s Integrated Systems and Concepts Group.

    Stimac sees AMS as a kind of pathfinder mission for the field of small satellite autonomy. Autonomy is essential to support the growing number of small satellite launches for industry and science because it can reduce the cost and labor needed to maintain them, enable missions that call for quick and impromptu responses, and help to avoid collisions in an already-crowded sky.

    AMS is the first-ever test of a nanosatellite with this type of automated maneuvering capability.

    AMS uses an electric propulsion thruster that was selected to meet the size and power constraints of a nanosatellite while providing enough thrust and endurance to enable multiyear missions that operate in VLEO. The flight software, called the Bus Hosted Onboard Software Suite, was designed to autonomously operate the thruster to change the spacecraft’s orbit. Operators on the ground can give AMS a high-level command, such as to descend to and maintain a 300-kilometer orbit, and the software will schedule thruster burns to achieve that command autonomously, using measurements from the onboard GPS receiver as feedback. This experimental software is separate from the bus flight software, which allows AMS to safely test its novel algorithms without endangering the spacecraft.
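
    The Bus Hosted Onboard Software Suite itself is not public; conceptually, though, the autonomy can be pictured as a guidance loop that compares GPS-derived altitude against the commanded orbit and schedules a burn when the spacecraft drifts outside a tolerance band (a toy sketch with invented thresholds):

    ```python
    # Toy sketch of a "descend to and maintain 300 km" style command loop.
    # Not the AMS Bus Hosted Onboard Software Suite; thresholds are invented.
    from dataclasses import dataclass

    @dataclass
    class OrbitCommand:
        target_altitude_km: float
        deadband_km: float = 2.0     # tolerated drift before scheduling a burn

    def plan_burn(gps_altitude_km: float, cmd: OrbitCommand) -> str:
        """Decide whether a thruster burn is needed and in which direction."""
        error = gps_altitude_km - cmd.target_altitude_km
        if abs(error) <= cmd.deadband_km:
            return "no burn: within deadband"
        if error > 0:
            return f"schedule lowering burn ({error:.1f} km above target)"
        return f"schedule raising burn ({-error:.1f} km below target, drag make-up)"

    cmd = OrbitCommand(target_altitude_km=300.0)
    for altitude in (525.0, 301.5, 297.2):       # simulated GPS readings
        print(plan_burn(altitude, cmd))
    ```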

    “One of the enablers for AMS is the way in which we’ve created this software sandbox onboard the spacecraft,” says Robert Legge, who is another member of the AMS team. “We have our own hosted software that’s running on the primary flight computer, but it’s separate from the critical health and safety avionics software. Basically, you can view this as being a little development environment on the spacecraft where we can test out different algorithms.”

    AMS has two secondary missions called Camera and Beacon. Camera’s mission is to take photos and short video clips of the Earth’s surface while AMS is in different low-Earth orbit positions.

    “One of the things we’re hoping to demonstrate is the ability to respond to current events,” says Rebecca Keenan, who helped to prepare the Camera payload. “We could hear about something that happened, like a fire or flood, and then respond pretty quickly to maneuver the satellite to image it.”

    Keenan and the rest of the AMS team are collaborating with the laboratory’s DisasterSat program, which aims to improve satellite image processing pipelines to help relief agencies respond to disasters more quickly. Small satellites that could schedule operations on-demand, rather than planning them months in advance before launch, could be a great asset to disaster response efforts.

    The other payload, Beacon, is testing new adaptive optics capabilities for tracking fast-moving targets by sending laser light from the moving satellite to a ground station at the laboratory’s Haystack Observatory in Westford, Massachusetts. Enabling precise laser pointing from an agile satellite could aid many different types of space missions, such as communications and tracking space debris. It could also be used for emerging programs such as Breakthrough Starshot, which is developing a satellite that can accelerate to high speeds using a laser-propelled lightsail.

    “As far as we know, this is the first on-orbit artificial guide star that has launched for a dedicated adaptive optics purpose,” says Lulu Liu, who worked on the Beacon payload. “Theoretically, the laser it carries can be maneuvered into position on other spacecraft to support a large number of science missions in different regions of the sky.”

    The team developed Beacon with a strict budget and timeline and hope that its success will shorten the design and test loop of next-generation laser transmitter systems. “The idea is that we could have a number of these flying in the sky at once, and a ground system can point to one of them and get near-real-time feedback on its performance,” says Liu.

    AMS weighs under 12 kilograms with 6U dimensions (23 x 11 x 36 centimeters). The bus was designed by Blue Canyon Technologies and the thruster was designed by Enpulsion GmbH.

    Legge says that the AMS program was approached as an opportunity for Lincoln Laboratory to showcase its ability to conduct work in the space domain quickly and flexibly. Some major roadblocks to rapid development of new space technology have been long timelines, high costs, and the extremely low risk tolerance associated with traditional space programs. “We wanted to show that we can really do rapid prototyping and testing of space hardware and software on orbit at an affordable cost,” Legge says.

    “AMS shows the value and fast time-to-orbit afforded by teaming with rapid space commercial partners for spacecraft core bus technologies and launch and ground segment operations, while allowing the laboratory to focus on innovative mission concepts, advanced components and payloads, and algorithms and processing software,” says Dan Cousins, who is the program manager for AMS. “The AMS team appreciates the support from the laboratory’s Technology Office for allowing us to showcase an effective operating model for rapid space programs.”

    AMS took its first image on June 1, completed its thruster commissioning in July, and has begun to descend toward its target VLEO position.

  • New materials could enable longer-lasting implantable batteries

    For the last few decades, battery research has largely focused on rechargeable lithium-ion batteries, which are used in everything from electric cars to portable electronics and have improved dramatically in terms of affordability and capacity. But nonrechargeable batteries have seen little improvement during that time, despite their crucial role in many important uses such as implantable medical devices like pacemakers.

    Now, researchers at MIT have come up with a way to improve the energy density of these nonrechargeable, or “primary,” batteries. They say it could enable up to a 50 percent increase in useful lifetime, or a corresponding decrease in size and weight for a given amount of power or energy capacity, while also improving safety, with little or no increase in cost.

    The new findings, which involve substituting the conventionally inactive battery electrolyte with a material that is active for energy delivery, are reported today in the journal Proceedings of the National Academy of Sciences, in a paper by MIT Kavanaugh Postdoctoral Fellow Haining Gao, graduate student Alejandro Sevilla, associate professor of mechanical engineering Betar Gallant, and four others at MIT and Caltech.

    Replacing the battery in a pacemaker or other medical implant requires a surgical procedure, so any increase in the longevity of their batteries could have a significant impact on the patient’s quality of life, Gallant says. Primary batteries are used for such essential applications because they can provide about three times as much energy for a given size and weight as rechargeable batteries.

    That difference in capacity, Gao says, makes primary batteries “critical for applications where charging is not possible or is impractical.” The new materials work at human body temperature, so would be suitable for medical implants. In addition to implantable devices, with further development to make the batteries operate efficiently at cooler temperatures, applications could also include sensors in tracking devices for shipments, for example to ensure that temperature and humidity requirements for food or drug shipments are properly maintained throughout the shipping process. Or, they might be used in remotely operated aerial or underwater vehicles that need to remain ready for deployment over long periods.

    Pacemaker batteries typically last from five to 10 years, and even less if they require high-voltage functions such as defibrillation. Yet for such batteries, Gao says, the technology is considered mature, and “there haven’t been any major innovations in fundamental cell chemistries in the past 40 years.”

    The key to the team’s innovation is a new kind of electrolyte — the material that lies between the two electrical poles of the battery, the cathode and the anode, and allows charge carriers to pass through from one side to the other. Using a new liquid fluorinated compound, the team found that they could combine some of the functions of the cathode and the electrolyte in one compound, called a catholyte. This allows for saving much of the weight of typical primary batteries, Gao says.

    While there are other materials besides this new compound that could theoretically play a similar catholyte role in a high-capacity battery, Gallant explains, those materials have lower inherent voltages that do not match that of the remaining material in a conventional pacemaker battery, a type known as CFx. Because the overall output from the battery can’t be more than that of the lesser of the two electrode materials, the extra capacity from such a mismatched pairing would simply go to waste. But with the new material, “one of the key merits of our fluorinated liquids is that their voltage aligns very well with that of CFx,” Gallant says.

    In a conventional CFx battery, the liquid electrolyte is essential because it allows charged particles to pass through from one electrode to the other. But “those electrolytes are actually chemically inactive, so they’re basically dead weight,” Gao says. As a result, about 50 percent of the battery’s key components, mainly the electrolyte, are inactive material. In the new design with the fluorinated catholyte material, the amount of dead weight can be reduced to about 20 percent, she says.
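
    To see roughly why that matters, here is a back-of-the-envelope sketch in Python that assumes, as a simplification, that only the active mass stores energy and reuses the 50 percent and 20 percent inactive fractions quoted above; the baseline specific energy is an arbitrary placeholder, not a measured property of these cells.

        # Illustrative only: effect of shrinking the inactive ("dead weight")
        # fraction on gravimetric energy density, all else held equal.
        # The 50% and 20% inactive fractions come from the article; the
        # baseline specific energy is a made-up placeholder value.

        active_specific_energy = 1.0  # arbitrary units per gram of active material

        def cell_energy_density(inactive_fraction, specific_energy):
            """Energy per gram of total cell mass if only active mass stores energy."""
            return (1.0 - inactive_fraction) * specific_energy

        conventional = cell_energy_density(0.50, active_specific_energy)  # 0.5
        new_design = cell_energy_density(0.20, active_specific_energy)    # 0.8

        print(f"Relative gain: {(new_design / conventional - 1) * 100:.0f}%")  # ~60%

    This crude mass-fraction argument lands around 60 percent, in the same ballpark as the roughly 50 percent cell-level gain the team projects; real cells also involve voltage matching and packaging overheads that the sketch ignores.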

    The new cells also offer safety improvements over other proposed chemistries that rely on toxic and corrosive catholyte materials, which the team’s formula avoids, Gallant says. And preliminary tests have demonstrated a stable shelf life of more than a year, an important characteristic for primary batteries, she says.

    The team has not yet experimentally achieved the full 50 percent improvement in energy density predicted by their analysis. They have demonstrated a 20 percent improvement, which in itself would be an important gain for some applications, Gallant says. The design of the cell itself has not yet been fully optimized, but the researchers can project the cell performance based on the performance of the active material itself. “We can see the projected cell-level performance when it’s scaled up can reach around 50 percent higher than the CFx cell,” she says. Achieving that level experimentally is the team’s next goal.

    Sevilla, a doctoral student in the mechanical engineering department, will be focusing on that work in the coming year. “I was brought into this project to try to understand some of the limitations of why we haven’t been able to attain the full energy density possible,” he says. “My role has been trying to fill in the gaps in terms of understanding the underlying reaction.”

    One big advantage of the new material, Gao says, is that it can easily be integrated into existing battery manufacturing processes, as a simple substitution of one material for another. Preliminary discussions with manufacturers confirm this potentially easy substitution, Gao says. The basic starting material, used for other purposes, has already been scaled up for production, she says, and its price is comparable to that of the materials currently used in CFx batteries. The cost of batteries using the new material is likely to be comparable to the existing batteries as well, she says. The team has already applied for a patent on the catholyte, and they expect that the medical applications are likely to be the first to be commercialized, perhaps with a full-scale prototype ready for testing in real devices within about a year.

    Further down the road, other applications could take advantage of the new materials as well, such as smart water or gas meters that can be read out remotely, or devices like EZPass transponders, increasing their usable lifetime, the researchers say. Drone aircraft or undersea vehicles would require higher power and so may take longer to develop. Other uses could include batteries for equipment at remote sites, such as oil and gas drilling rigs, including devices sent down into the wells to monitor conditions.

    The team also included Gustavo Hobold, Aaron Melemed, and Rui Guo at MIT and Simon Jones at Caltech. The work was supported by MIT Lincoln Laboratory and the Army Research Office.

  • in

    Taking a magnifying glass to data center operations

    When the MIT Lincoln Laboratory Supercomputing Center (LLSC) unveiled its TX-GAIA supercomputer in 2019, it provided the MIT community a powerful new resource for applying artificial intelligence to their research. Anyone at MIT can submit a job to the system, which churns through trillions of operations per second to train models for diverse applications, such as spotting tumors in medical images, discovering new drugs, or modeling climate effects. But with this great power comes the great responsibility of managing and operating it in a sustainable manner — and the team is looking for ways to improve.

    “We have these powerful computational tools that let researchers build intricate models to solve problems, but they can essentially be used as black boxes. What gets lost in there is whether we are actually using the hardware as effectively as we can,” says Siddharth Samsi, a research scientist in the LLSC. 

    To gain insight into this challenge, the LLSC has been collecting detailed data on TX-GAIA usage over the past year. More than a million user jobs later, the team has released the dataset open source to the computing community.

    Their goal is to empower computer scientists and data center operators to better understand avenues for data center optimization — an important task as processing needs continue to grow. They also see potential for leveraging AI in the data center itself, by using the data to develop models for predicting failure points, optimizing job scheduling, and improving energy efficiency. While cloud providers are actively working on optimizing their data centers, they do not often make their data or models available for the broader high-performance computing (HPC) community to leverage. The release of this dataset and associated code seeks to fill that gap.

    “Data centers are changing. We have an explosion of hardware platforms, the types of workloads are evolving, and the types of people who are using data centers are changing,” says Vijay Gadepally, a senior researcher at the LLSC. “Until now, there hasn’t been a great way to analyze the impact to data centers. We see this research and dataset as a big step toward coming up with a principled approach to understanding how these variables interact with each other and then applying AI for insights and improvements.”

    Papers describing the dataset and potential applications have been accepted to a number of venues, including the IEEE International Symposium on High-Performance Computer Architecture, the IEEE International Parallel and Distributed Processing Symposium, the Annual Conference of the North American Chapter of the Association for Computational Linguistics, the IEEE High-Performance and Embedded Computing Conference, and the International Conference for High Performance Computing, Networking, Storage and Analysis.

    Workload classification

    TX-GAIA, which ranks among the world’s TOP500 supercomputers, combines traditional computing hardware (central processing units, or CPUs) with nearly 900 graphics processing unit (GPU) accelerators. These NVIDIA GPUs are specialized for deep learning, the class of AI that has given rise to speech recognition and computer vision.

    The dataset covers CPU, GPU, and memory usage by job; scheduling logs; and physical monitoring data. Compared to similar datasets, such as those from Google and Microsoft, the LLSC dataset offers “labeled data, a variety of known AI workloads, and more detailed time series data compared with prior datasets. To our knowledge, it’s one of the most comprehensive and fine-grained datasets available,” Gadepally says. 

    Notably, the team collected time-series data at an unprecedented level of detail: 100-millisecond intervals on every GPU and 10-second intervals on every CPU, as the machines processed more than 3,000 known deep-learning jobs. One of the first goals is to use this labeled dataset to characterize the workloads that different types of deep-learning jobs place on the system. This process would extract features that reveal differences in how the hardware processes natural language models versus image classification or materials design models, for example.   
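
    As a rough illustration of what such workload characterization might look like in practice, the sketch below computes summary features from a per-GPU utilization trace; the file name and column names are hypothetical placeholders, not the dataset’s actual schema.

        import pandas as pd

        # Hypothetical per-GPU trace for one job: columns "timestamp",
        # "gpu_util", and "mem_util", sampled every 100 ms. The path and
        # schema are placeholders, not the published dataset's layout.
        samples = pd.read_csv("job_12345_gpu0.csv")

        # Summary features that might help distinguish, say, steady language-model
        # training from burstier image-classification or materials-design jobs.
        features = {
            "gpu_util_mean": samples["gpu_util"].mean(),
            "gpu_util_std": samples["gpu_util"].std(),
            "mem_util_mean": samples["mem_util"].mean(),
            "util_burstiness": samples["gpu_util"].diff().abs().mean(),
            "duration_s": 0.1 * len(samples),  # 100 ms per sample
        }
        print(features)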

    The team has now launched the MIT Datacenter Challenge to mobilize this research. The challenge invites researchers to use AI techniques to identify with 95 percent accuracy the type of job that was run, using their labeled time-series data as ground truth.
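
    For a sense of what a baseline Challenge entry could involve, here is a minimal sketch that trains an off-the-shelf classifier on per-job summary features; the feature matrix and labels below are random stand-ins, and reaching 95 percent accuracy would almost certainly require richer features and models.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        # X: one row of summary features per job (e.g., the features sketched
        # above, stacked); y: the labeled workload type. Both are random
        # placeholders here; a real entry would derive them from the dataset.
        rng = np.random.default_rng(0)
        X = rng.random((3000, 5))            # ~3,000 labeled jobs, 5 features each
        y = rng.integers(0, 3, size=3000)    # 3 hypothetical workload classes

        clf = RandomForestClassifier(n_estimators=200, random_state=0)
        scores = cross_val_score(clf, X, y, cv=5)
        print(f"Cross-validated accuracy: {scores.mean():.2f}")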

    Such insights could enable data centers to better match a user’s job request with the hardware best suited for it, potentially conserving energy and improving system performance. Classifying workloads could also allow operators to quickly notice discrepancies resulting from hardware failures, inefficient data access patterns, or unauthorized usage.

    Too many choices

    Today, the LLSC offers tools that let users submit their job and select the processors they want to use, “but it’s a lot of guesswork on the part of users,” Samsi says. “Somebody might want to use the latest GPU, but maybe their computation doesn’t actually need it and they could get just as impressive results on CPUs, or lower-powered machines.”

    Professor Devesh Tiwari at Northeastern University is working with the LLSC team to develop techniques that can help users match their workloads to appropriate hardware. Tiwari explains that the emergence of different types of AI accelerators, GPUs, and CPUs has left users suffering from too many choices. Without the right tools to take advantage of this heterogeneity, they are missing out on the benefits: better performance, lower costs, and greater productivity.

    “We are fixing this very capability gap — making users more productive and helping users do science better and faster without worrying about managing heterogeneous hardware,” says Tiwari. “My PhD student, Baolin Li, is building new capabilities and tools to help HPC users leverage heterogeneity near-optimally without user intervention, using techniques grounded in Bayesian optimization and other learning-based optimization methods. But, this is just the beginning. We are looking into ways to introduce heterogeneity in our data centers in a principled approach to help our users achieve the maximum advantage of heterogeneity autonomously and cost-effectively.”
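
    A greatly simplified stand-in for this kind of matching (not the team’s actual Bayesian-optimization tooling) is to profile a short trial run of a job on each available device class and route it to whichever delivers the best measured performance per watt; the device names and numbers below are hypothetical.

        # Simplified illustration of workload-to-hardware matching: profile a
        # short trial run on each device class, then pick the best performance
        # per watt. This stands in for the learning-based methods described
        # above; the device names and measurements are made-up placeholders.

        trial_profiles = {
            # device: (throughput in samples/s, average power draw in watts)
            "new_gpu": (1200.0, 300.0),
            "older_gpu": (700.0, 250.0),
            "cpu_node": (150.0, 120.0),
        }

        def perf_per_watt(throughput, power):
            return throughput / power

        best_device = max(
            trial_profiles,
            key=lambda d: perf_per_watt(*trial_profiles[d]),
        )
        print(f"Route this job to: {best_device}")
        # A real system would also weigh queue wait times, cost, and deadlines.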

    Workload classification is the first of many problems to be posed through the Datacenter Challenge. Others include developing AI techniques to predict job failures, conserve energy, and create job-scheduling approaches that improve data center cooling efficiency.

    Energy conservation 

    To mobilize research into greener computing, the team is also planning to release an environmental dataset of TX-GAIA operations, containing rack temperature, power consumption, and other relevant data.

    According to the researchers, huge opportunities exist to improve the power efficiency of HPC systems being used for AI processing. As one example, recent work in the LLSC determined that simple hardware tuning, such as limiting the amount of power an individual GPU can draw, could reduce the energy cost of training an AI model by 20 percent, with only modest increases in computing time. “This reduction translates to approximately an entire week’s worth of household energy for a mere three-hour time increase,” Gadepally says.
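
    On NVIDIA hardware, a per-GPU power cap of this kind can be set with the vendor’s nvidia-smi utility; the sketch below simply wraps that command in Python. It requires administrative privileges, the appropriate wattage depends on the GPU model, and the 150-watt value is only an example.

        import subprocess

        def set_gpu_power_cap(gpu_index: int, watts: int) -> None:
            """Cap one GPU's power draw via nvidia-smi (requires admin rights)."""
            subprocess.run(
                ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)],
                check=True,
            )

        if __name__ == "__main__":
            set_gpu_power_cap(0, 150)  # example: cap GPU 0 at 150 W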

    They have also been developing techniques to predict model accuracy, so that users can quickly terminate experiments that are unlikely to yield meaningful results, saving energy. The Datacenter Challenge will share relevant data to enable researchers to explore other opportunities to conserve energy.
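
    One simple way to approximate this idea (a plateau heuristic, standing in for the accuracy-prediction models the team is developing) is to watch the validation-accuracy curve during training and stop once additional epochs stop paying off; the thresholds below are arbitrary examples.

        def should_stop_early(val_accuracies, patience=3, min_gain=0.002):
            """Return True if validation accuracy has improved by less than
            `min_gain` over the last `patience` epochs."""
            if len(val_accuracies) <= patience:
                return False
            best_recent = max(val_accuracies[-patience:])
            best_before = max(val_accuracies[:-patience])
            return best_recent - best_before < min_gain

        # Example: accuracy has plateaued, so the run could be cut short to save energy.
        history = [0.61, 0.70, 0.745, 0.746, 0.7461, 0.7462]
        print(should_stop_early(history))  # True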

    The team expects that lessons learned from this research can be applied to the thousands of data centers operated by the U.S. Department of Defense. The U.S. Air Force is a sponsor of this work, which is being conducted under the USAF-MIT AI Accelerator.

    Other collaborators include researchers at MIT Computer Science and Artificial Intelligence Laboratory (CSAIL). Professor Charles Leiserson’s Supertech Research Group is investigating performance-enhancing techniques for parallel computing, and research scientist Neil Thompson is designing studies on ways to nudge data center users toward climate-friendly behavior.

    Samsi presented this work at the inaugural AI for Datacenter Optimization (ADOPT’22) workshop last spring as part of the IEEE International Parallel and Distributed Processing Symposium. The workshop officially introduced their Datacenter Challenge to the HPC community.

    “We hope this research will allow us and others who run supercomputing centers to be more responsive to user needs while also reducing the energy consumption at the center level,” Samsi says.