Beyond the Code: Carbon Impact of Large Language Models (LLMs)

I’m sure you’ve heard of ChatGPT, Copilot, or any of the other many (many, many, many) AI tools hitting the tech scene. But something a little more nuanced is coming to light: the environmental cost. I mean, the ‘L’ in LLM doesn’t exactly stand for ‘low-impact’.

So let’s dive into the environmental cost of that LinkedIn post you just generated, and the realities of training large language models.

Measuring large language model energy consumption

Diving into this subject, I found out that GPT-3, the base model that GPT-3.5 builds on, used around 1287 MWh to be trained (roughly the annual electricity consumption of 120 average households in the United States). This was based on the understanding that training took about 405 years of machine time in 2020 on NVIDIA V100 GPUs.

While OpenAI utilised Microsoft data centres to achieve a low PUE (Power Usage Effectiveness) score, there’s still a lot to consider as we examine the emissions calculations:

# Microsoft's reported best data centre: Northern Virginia
# Carbon intensity for region (5 Dec 2023): 401 g CO₂eq/kWh
# Training energy in kilowatt hours: 1287000 kWh
# Microsoft's PUE score for the Americas: 1.17
(1287000 kWh x 401 g) x 1.17 = 603821.79 kg CO₂eq

That’s still around 604 tonnes of CO₂eq created just to train the GPT-3 model.
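The calculation above boils down to one formula: energy consumed × grid carbon intensity × PUE. A minimal Python sketch of that formula (the function name and structure are my own, not from any emissions library):

```python
def training_emissions_kg(energy_kwh: float, intensity_g_per_kwh: float, pue: float) -> float:
    """Estimate training emissions in kg CO2eq.

    energy_kwh          -- total energy drawn by the training hardware
    intensity_g_per_kwh -- grid carbon intensity (g CO2eq per kWh)
    pue                 -- data centre Power Usage Effectiveness multiplier
    """
    grams = energy_kwh * intensity_g_per_kwh * pue
    return grams / 1000  # grams -> kilograms

# GPT-3 in Northern Virginia, using the figures above
print(training_emissions_kg(1_287_000, 401, 1.17))  # ~603,821.79 kg
```

The PUE multiplier accounts for the overhead energy (cooling, power distribution) the facility uses on top of what the servers themselves draw.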

To put that into perspective, I recently wrote about the environmental impact of manufacturing a laptop, showcasing that its creation results in 243 kg CO₂eq. So training GPT-3 had the same environmental impact as manufacturing roughly 2485 laptops.

If we look at smaller LLMs like Koala, which is based on Meta’s LLaMA 1 model, training the 7B-parameter model took around 36 MWh.

For comparison, with the same setup that GPT-3 used:

# Microsoft's reported best data centre: Northern Virginia
# Carbon intensity for region (5 Dec 2023): 401 g CO₂eq/kWh
# Training energy in kilowatt hours: 36000 kWh
# Microsoft's PUE score for the Americas: 1.17
(36000 kWh x 401 g) x 1.17 = 16890.12 kg CO₂eq

You quickly pick up that the smaller model emits just under 17 tonnes of CO₂eq (around 3% of GPT-3’s training emissions) while being able to perform close to the same quality of work as the GPT-3 model.

Location, location, location.

In the previous estimates, we very specifically chose one data centre to handle the calculations. The biggest factor in emission calculations is the type of energy you use, and there are better options than Northern Virginia, USA.

One moment: what do we mean by “type of energy”?

Well, we generally throw around a term like “Green Energy”. Green energy is defined as “energy that can be produced using a method, and from a source, that causes no harm to the natural environment.” I do feel like “no harm” is a bit of a stretch (there are factors like manufacturing waste and land usage), so maybe “little harm” is a better term. Using greener energy in the training of your model will naturally mean that the emissions factor is lower. There is a handy tool called Electricity Maps that tells you just how “green” your energy is.

Coming back to the topic at hand: if we decided to train GPT-3 in another region, say Norway, and used the carbon intensity score there, we would find that it greatly reduces the emissions of the training cycle.

# Region: Norway
# Carbon intensity for region (5 Dec 2023): 33 g CO₂eq/kWh
# Training energy in kilowatt hours: 1287000 kWh
# Microsoft's PUE score for Europe: 1.185
(1287000 kWh x 33 g) x 1.185 = 50328.135 kg CO₂eq

Just shifting where your model is trained reduces emissions by about 92% compared to its previous data centre location.
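The regional comparison follows from the same formula; a short sketch (the function and variable names are mine) that reproduces the reduction:

```python
def emissions_kg(energy_kwh: float, intensity_g_per_kwh: float, pue: float) -> float:
    """Emissions in kg CO2eq: energy x grid intensity x PUE, grams -> kg."""
    return energy_kwh * intensity_g_per_kwh * pue / 1000

# Same training energy, two different grids and PUE scores
virginia = emissions_kg(1_287_000, 401, 1.17)   # ~603,821.79 kg
norway   = emissions_kg(1_287_000, 33, 1.185)   # ~50,328.14 kg

reduction = (1 - norway / virginia) * 100
print(f"{reduction:.1f}% lower emissions")      # ~91.7% lower
```

Note that the slightly higher European PUE score (1.185 vs 1.17) barely matters; the grid’s carbon intensity dominates the result.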

Using your model also causes carbon emissions

When talking about the energy costs of running a GPT-3 model at scale, the calculations start becoming tricky. Towards Data Science compared two different methods for calculating the energy consumption of ChatGPT and arrived at a result of around 0.0018 kWh per query. Using similar logic as before, and the added knowledge that ChatGPT gets around 10 million queries a day, we can see how much CO₂eq ChatGPT is emitting.

# Microsoft's reported best data centre: Northern Virginia
# Carbon intensity for region (5 Dec 2023): 401 g CO₂eq/kWh
# Kilowatt hours per query: 0.0018 kWh
# Microsoft's PUE score for the Americas: 1.17
# Queries per day: 10 000 000
(0.0018 kWh x 10000000 x 401 g) x 1.17 = 8445.06 kg CO₂eq

ChatGPT could be pushing nearly 8.5 tonnes of CO₂eq into the atmosphere. Every. Single. Day. (A typical gasoline-powered car emits about 4.6 tonnes of CO₂eq per year.)
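Per-query numbers scale linearly, so the daily figure is just query energy times query volume fed into the same formula. A sketch under the article’s assumptions (0.0018 kWh per query, 10 million queries per day; names are my own):

```python
def daily_inference_emissions_kg(kwh_per_query: float, queries_per_day: int,
                                 intensity_g_per_kwh: float, pue: float) -> float:
    """Daily inference emissions in kg CO2eq."""
    daily_kwh = kwh_per_query * queries_per_day
    return daily_kwh * intensity_g_per_kwh * pue / 1000  # grams -> kg

# ChatGPT in Northern Virginia
print(daily_inference_emissions_kg(0.0018, 10_000_000, 401, 1.17))  # ~8,445 kg/day
```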

Let’s see what would happen if we moved ChatGPT to a greener data centre:

# Region: Norway
# Carbon intensity for region (5 Dec 2023): 33 g CO₂eq/kWh
# Kilowatt hours per query: 0.0018 kWh
# Microsoft's PUE score for Europe: 1.185
# Queries per day: 10 000 000
(0.0018 kWh x 10000000 x 33 g) x 1.185 = 703.89 kg CO₂eq

Just by changing where we run it, we would emit only about 8% of what we did in the less green data centre.
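For a sense of scale, we can annualise both scenarios (a sketch assuming 365 days of the same per-query figures; the names are mine):

```python
def daily_emissions_kg(kwh_per_query: float, queries_per_day: int,
                       intensity_g_per_kwh: float, pue: float) -> float:
    """Daily inference emissions in kg CO2eq."""
    return kwh_per_query * queries_per_day * intensity_g_per_kwh * pue / 1000

virginia_daily = daily_emissions_kg(0.0018, 10_000_000, 401, 1.17)   # ~8,445 kg
norway_daily   = daily_emissions_kg(0.0018, 10_000_000, 33, 1.185)   # ~704 kg

print(f"Norway's share: {norway_daily / virginia_daily:.1%}")        # ~8.3%
# Annual saving from the move, in tonnes of CO2eq
print(f"Annual saving: {(virginia_daily - norway_daily) * 365 / 1000:.0f} tonnes")
```

At this query volume, the move would save well over two thousand tonnes of CO₂eq per year.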

Is it worth the energy?

We have gone through just how much energy these LLMs use and the emissions they cause; the next question, naturally, is whether they bring enough value to justify those emissions.

When answering, we need to look at the scope of impact that these LLMs are having. They could help optimise industries such as logistics, by making shipments more efficient, or reduce computational load by generating more efficient code. The value is high, and the emissions savings (if they are used in this way) could be enormous.

Conclusion

The use cases for large language models are varied, and they have impacted the world (for the better? Still to be determined.)

If you are thinking of using LLMs in your company or for your workflows, try not to use the biggest and baddest model just for the sake of it. Start looking at smaller models that could still serve your purpose well. When training and running models, make use of cleaner regions. While this might not be viable for ChatGPT, as there are limitations on the number of GPUs in each region, it is a great start for companies looking to utilise greener data centres and to put pressure on providers to make their data centres greener. Lastly, consider using open-source models like LLaMA as a base to reduce the amount of training you need to perform.

Resources: