Examining Global Energy Consumption and Production

Annika Haraikawa, Chloe Tian, and Daniel Zhou (Data Science at Reed College, https://reed-statistics.github.io/math241-spring2022/)
May 5, 2022

Introduction: Background and Problem Statement

Climate change is a defining issue of our time, and we are at a defining moment to tackle this gargantuan problem. A major step is to introspect and examine our energy consumption and production habits, and how they change over time.

We chose this energy consumption data set for several reasons. First, it covers most countries’ energy consumption since the early 20th century, with detailed breakdowns by energy source and other information, so both time-series and spatial visualizations are informative. Furthermore, although the data set spans more than a century, it also includes newer variables such as individual renewable energy sources and their annual percentage changes. Additionally, the data set contains GDP and population, which are useful control variables for our research. Given these features, we are able to explore the geospatial distribution of various energy sources, shifts in the aggregate global and regional energy mix, and the time series of those shifts. We can also examine the interplay between energy sources (i.e., the substitution effect among them), as well as the associations between energy consumption and variables such as population and GDP.

Global energy consumption is an important and fascinating topic because the energy sources in the data set are essential to our existence and well-being. More importantly, energy consumption is one of the main drivers of the climate crisis. Rapid industrialization and global consumerism have boosted energy consumption, depleting the planet through the extraction of unsustainable energy sources. Fortunately, recent technological breakthroughs and the green revolution have made new energy sources viable. The data set captures the introduction of new energy sources at the country level along with the aggregate trend towards more sustainable energy sources. Our objectives are to discern the broader trends and to lay out the associations between energy consumption and variables such as GDP, population, and research investment.

Introduction: Data Descriptions

The Data set we are using is “World Energy Consumption” from Kaggle. The website we obtained this data set from is World Energy Consumption (2021). This data set is a “collection of key metrics maintained by Hannah Ritchie (2020). It is updated regularly and includes data on energy consumption (primary energy, per capita levels, and growth rates), energy mix, electricity mix and other relevant metrics.” The energy consumption data is “sourced from a combination of two sources—the BP Statistical Review of World Energy and SHIFT Data Portal”. The electricity consumption data is “sourced from a combination of two sources—the BP Statistical Review of World Energy and EMBER – Global Electricity Dashboard”, and other data is taken from “United Nations, World Bank, Gapminder, Maddison Project Database, etc.”

Each row of this data set represents the energy status of a specific country in a specific year, ranging from 1900 to 2019, so the data set is fairly up to date. Each row has 122 columns (some values are missing because data were not collected in the early 1900s) covering energy usage, energy production (split into separate columns by type of energy), electricity usage, and electricity production, as well as other columns we have not yet analyzed.

There are too many columns to give a detailed description of each column, so we will choose a few key ones:

Variable                  Description
iso_code                  ISO3 code for each country
country                   Name of each country
year                      Year of data collected
electricity_demand        Electricity demand, measured in terawatt-hours
electricity_generation    Electricity generation, measured in terawatt-hours
population                Population of country
gdp                       Total real gross domestic product, inflation-adjusted
fossil_cons_change_twh    Annual change in fossil fuel consumption, measured in terawatt-hours
carbon_intensity_elec     Carbon intensity of electricity production, measured in grams of carbon dioxide emitted per kilowatt-hour

Detailed descriptions for each column can be found at the codebook: World Energy Consumption (2021)

Methods: Data Explorations and Initial Findings

We aim to explore the relationship between fossil fuels and renewable energy sources. The data set provides many variables related to those energy sources, mostly at the country level. However, it also has a “World” aggregate, which we took advantage of.

The data was filtered to only keep data points with ISO code = OWID_WRL, followed by a pivot_longer to put the column names for fossil shares and renewable shares under a new category, type, that would be easier to use for ggplot. A ggplot to show the time series of fossil fuel shares and renewable energy shares in the world is presented.

library(tidyverse) #dplyr, tidyr, and ggplot2 are used throughout

#df is the World Energy Consumption data frame (its import step is not shown)
fuel_world_ts <- df %>% filter(iso_code == "OWID_WRL") %>% #filter for summarized world data
  select(year, iso_code, country, fossil_share_energy, renewables_share_energy,
         fossil_fuel_consumption, renewables_consumption
         ) %>% #removing unnecessary columns
  rename("fossil share" = "fossil_share_energy",
         "renewables share" = "renewables_share_energy") %>% #making it look better for ggplot
  pivot_longer(cols = c("fossil share", "renewables share"),
               names_to = "type",
               values_to = "share")

ggplot(fuel_world_ts, aes(x = year, y = share, color = type
                    )) + 
  geom_line() +
  theme(legend.position="bottom") +
  facet_wrap(~type, scales = "free_y") + #separate y axis for each panel
  xlim(1970, 2020) +
  labs(title = "Shares in fossil fuels vs renewable energy consumed by world")

The plot above shows fossil fuel shares going down while renewable energy shares go up. What about the actual change in consumption? To answer this question, the annual percent changes in fossil fuel and renewable consumption were plotted using the same data wrangling steps as for the shares plot.

fuel_pct <- df %>% filter(iso_code == "OWID_WRL") %>% 
  select(year, iso_code, country, fossil_cons_change_pct, renewables_cons_change_pct
         ) %>% 
  rename("fossil consumption" = "fossil_cons_change_pct",
         "renewables consumption" = "renewables_cons_change_pct") %>%
  pivot_longer(cols = c("fossil consumption", "renewables consumption"),
               names_to = "type",
               values_to = "percentage")

ggplot(fuel_pct, aes(x = year, y = percentage, color = type
                    )) + 
  geom_line() +
  geom_point(size = 0.5) +
  theme(legend.position="bottom") +
  #facet_wrap(~type, scale = "free_y") +
  xlim(1970, 2020) +
  geom_hline(yintercept=0, linetype="dashed") +
  labs(title = "Annual percent change of fossil fuels vs renewables consumption in world",
       y = "percent change"
       )

This plot indicates that, in most years, consumption of both fossil fuels and renewables increases (positive percent change). This is likely because more energy is used each year, so even if the share of fossil fuels falls, the absolute amount consumed has not necessarily fallen. However, the plot also shows that the percent change of renewable energy consumption has been consistently greater than that of fossil fuels for the past 15 years, which is a promising trend.
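The distinction between a falling share and rising absolute consumption can be illustrated with a small worked example. The figures below are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the data set, and the names energy and pct_change are introduced here for illustration.

```r
# Hypothetical consumption figures (TWh); NOT values from the data set
energy <- data.frame(
  year       = c(2000, 2019),
  fossil     = c(100, 130),
  renewables = c(10, 25)
)

# Share of each source in the fossil + renewables total, in percent
energy$total            <- energy$fossil + energy$renewables
energy$fossil_share     <- 100 * energy$fossil / energy$total
energy$renewables_share <- 100 * energy$renewables / energy$total

# Percent change between the first and second row
pct_change <- function(x) 100 * (x[2] - x[1]) / x[1]

fossil_growth     <- pct_change(energy$fossil)      # 30: fossil consumption rose
renewables_growth <- pct_change(energy$renewables)  # 150: renewables rose faster

print(energy)
```

In this toy case fossil consumption grows by 30 percent while its share of the total still falls, because renewables grow faster from a smaller base.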

Part II: Fossil Fuel Ratio by Country

To see what was happening to the share of fossil fuels at the level of individual countries, the United States, China, India, and Norway were compared against the world in an animation showing the share of fossil fuels from 1970 to 2019.

ratio_ff_rnw <- df %>% #filter(iso_code != "") %>% 
  select(year, iso_code, country, fossil_share_energy, renewables_share_energy) %>% 
  drop_na() %>% group_by(year, country) %>%
  mutate(fossil_ratio = #variable for ratio between renewable and fossil
           (fossil_share_energy/(fossil_share_energy+renewables_share_energy)),
         renewables_ratio = 
           (renewables_share_energy/(fossil_share_energy+renewables_share_energy))) %>% 
  pivot_longer(cols = c(fossil_ratio, renewables_ratio),
               names_to = "type",
               values_to = "ratio")

library(gganimate) #transition_states() and animate()

#choosing countries to show
countries_shown <- c("World", "China", "United States", "India", "Norway")

#named ff_sample to avoid masking base::sample()
ff_sample <- ratio_ff_rnw %>%
  filter(country %in% countries_shown,
         type == "fossil_ratio",
         year >= 1970)

#for animation
p <- ggplot(ff_sample, 
       aes(x = country, y = ratio, fill = country)) +
  geom_col() +
  scale_x_discrete(limits = countries_shown) +
  transition_states(year, transition_length=0.3, state_length=10) +
  #ease_aes('sine-in-out') +
  theme(legend.position="none") +
  labs(title = 'Share of fossil fuels for year: {closest_state}',
       y = "fossil fuel share")

animate(p)

An interesting observation is that India’s share of fossil fuels is increasing, which is likely related to its economic development. Norway shows more prominent year-to-year fluctuations but clearly has the lowest fossil fuel share. For the other countries, the ratio of fossil fuels generally follows the world’s declining trend, which is best observed from 2010 to 2019.

Part III: GDP vs Renewable Energy Usage (sample year = 2016)

To see whether there was a relationship between a country’s GDP and its renewable energy usage, GDP per capita data from GDP Per Capita World Bank Data (2020) was combined with the World Energy Consumption (2021) data by left-joining the data frames on ISO code. GDP per capita was plotted against renewable energy consumption per capita with both axes on logarithmic scales, and a linear regression was fit to the data.

gdp_per_capita <- read_csv("GDP_per_capita.csv", 
                           skip = 3) #importing dataset from World Bank
gdp_vs_renewable <- df %>% filter(iso_code != "") %>% 
  select(country,
         year,
         iso_code,
         gdp,
         renewables_elec_per_capita,
         renewables_energy_per_capita
         ) %>%
  filter(year == 2016) #selecting for data from the year of 2016

gdp_pc_2016 <- gdp_per_capita %>% select("Country Code", "2016") %>%
  rename("gpc_2016" = "2016", "iso_code" = "Country Code") #modifying column names for joining

gdp_vs_renewable_2 <- gdp_vs_renewable %>% left_join(gdp_pc_2016, by ="iso_code") #join

ggplot(gdp_vs_renewable_2, aes(x = renewables_energy_per_capita, y = gpc_2016)) +
  geom_point() +
  #xlim(0, 40000) +
  #ylim(0, 60000) +
  scale_y_log10() +
  scale_x_log10() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "GDP per capita vs renewable energy per capita of countries in 2016\n(log scale)",
       y = "gdp per capita",
       x ="renewable energy per capita")

#linear regression on the untransformed values (the plot above uses log scales)
lmgdp <- lm(gpc_2016~renewables_energy_per_capita, data = gdp_vs_renewable_2)
#summary(lmgdp)

The lm fit of GDP per capita on renewable energy consumption per capita gave y = 0.479 x + 20710. This model was fit on the untransformed values, whereas the plot uses log scales. The R-squared value of this simple linear regression is 0.1389, so the model explains only about 14% of the variance in GDP per capita and could be improved considerably. Other variables, such as geographical access to renewable resources, may also need to be considered. However, the p-value is 0.0005, so the null hypothesis (GDP per capita and renewable energy per capita are not related) can be rejected.
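Because the scatter plot uses log10 scales on both axes, the line drawn by geom_smooth(method = "lm") is fit to the log-transformed values, which is a different model from the untransformed lm() reported above. The sketch below contrasts the two fits on synthetic data; the data frame toy, its columns, and every number in it are made up for illustration and do not reproduce the report's results.

```r
set.seed(241)  # arbitrary seed so the sketch is reproducible

# Synthetic stand-in for a 2016 cross-section (NOT the real data)
n        <- 150
renew_pc <- exp(rnorm(n, mean = 7, sd = 1.5))
gdp_pc   <- exp(4 + 0.7 * log(renew_pc) + rnorm(n, sd = 0.8))
toy      <- data.frame(renew_pc, gdp_pc)

# Raw-scale fit, analogous to lm(gpc_2016 ~ renewables_energy_per_capita)
fit_raw <- lm(gdp_pc ~ renew_pc, data = toy)

# Log-log fit, analogous to the line drawn on the log-scaled plot
fit_log <- lm(log10(gdp_pc) ~ log10(renew_pc), data = toy)

# The log-log slope reads as an elasticity: the approximate percent change
# in gdp_pc associated with a 1 percent change in renew_pc
coef(fit_log)

# The two models can have quite different R-squared values
c(raw = summary(fit_raw)$r.squared, log = summary(fit_log)$r.squared)
```

If the relationship is closer to linear on the log scale, as the plot suggests, summary() on the log-log model may be the more natural companion to the figure.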

Part IV: Analyzing and Visualizing Carbon Intensity of Energy Production

One of the most interesting columns in the data set is the carbon intensity of electricity production (carbon_intensity_elec). It is one of the main links between energy production and the pollution it creates. This measurement is relatively new and the data is difficult to collect, so it is only available for the years 2000-2020 for 28 countries in the EU. We will examine this data with respect to time, as well as other variables such as population and GDP.

We first did some data wrangling to get the necessary data. We wanted to use GDP per capita as a possible variable, specifically to classify different countries and see how carbon intensity changes based on this classification. It would be counterproductive if the classifications change over time, so we decided to use the midpoint year 2010 to calculate GDP per capita for our classifications.

# selecting important columns for investigation: country identification, year, and carbon intensity
df_intensity <- df %>%
  select(iso_code, country, year, carbon_intensity_elec) %>%
  drop_na()

# separating countries into quartiles by their GDP per capita in 2010
df_gdpcalc <- df %>%
  filter(year == 2010) %>% 
  select(iso_code, gdp, population, carbon_intensity_elec) %>%
  drop_na() %>%
  mutate(gdppercap = gdp / population) %>%
  # gdpcut is the quartile (1 = lowest GDP per capita, 4 = highest)
  mutate(gdpcut = as.numeric(cut_number(gdppercap, 4))) %>%
  # including iso_code to join properly
  select(iso_code, gdpcut)

# joining the actual data set with the categories
df_intensity <- df_intensity %>%
  left_join(df_gdpcalc, by = "iso_code")

# finding the mean carbon intensity for each year for each GDP per capita quartile
df_intensity_cut <- df_intensity %>%
  group_by(gdpcut, year) %>%
  summarise(meanelec = mean(carbon_intensity_elec), .groups = "drop")
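To make the quartile step concrete, here is a minimal base-R sketch of an equal-count split of 28 values into four groups. It approximates cut_number(gdppercap, 4) with quantile() and cut(), which is an assumption about how that helper behaves rather than a copy of it, and the GDP per capita values are made up.

```r
# 28 made-up, distinct GDP per capita values standing in for the 28 countries
gdppercap <- rev(5000 + 3000 * (0:27))

# Quartile boundaries: 0%, 25%, 50%, 75%, and 100% quantiles
breaks <- quantile(gdppercap, probs = seq(0, 1, by = 0.25))

# Assign each value to a quartile; 1 = lowest GDP per capita, 4 = highest
gdpcut <- as.numeric(cut(gdppercap, breaks = breaks, include.lowest = TRUE))

# Number of countries in each quartile
quartile_sizes <- table(gdpcut)
print(quartile_sizes)
```

With 28 distinct values each group holds seven observations, which matches the quartiles of 7 countries described in the text.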

We first compare carbon intensity with GDP per capita, with the countries split into quartiles of 7 countries each. Two graphs show this relationship: one shows the time series of every country, faceted by GDP per capita quartile, and one shows the mean of each quartile.

# faceted graph based on GDP per capita quartile
ggplot(df_intensity, aes(x = year, y = carbon_intensity_elec, color = country)) + 
  geom_point() + 
  geom_line() + 
  facet_wrap(~gdpcut) + 
  scale_colour_discrete() + 
  labs(x = "Year",
       y = "Carbon Intensity of Electricity Production",
       title= "Carbon Intensity of Electricity Production Faceted by GDP per Capita Quartiles",
       color = "Country")

# mean carbon intensity for each quartile
ggplot(df_intensity_cut, aes(x = year, y = meanelec, color = factor(gdpcut))) + 
  geom_point() + 
  geom_line() + 
  scale_color_manual(values = c('#bdc9e1','#74a9cf','#2b8cbe','#045a8d')) + 
  labs(x = "Year",
       y = "Mean Carbon Intensity of Electricity Production",
       title = "Mean Carbon Intensity for GDP per capita Quartiles between 2000 and 2020",
       color = "GDP per capita Quartile")

There appears to be a general decrease in carbon intensity over time, and carbon intensity is lowest in the wealthiest quartile (highest GDP per capita). The second graph shows a larger decrease in carbon intensity for the lowest GDP per capita quartile, suggesting a catch-up effect in which countries with high carbon intensity of electricity production converge towards countries with lower intensity. Overall, the countries in the EU are shifting towards less carbon-intensive energy sources, possibly due to recent technological advances in clean energy and the subsequent drop in costs.

We also mapped carbon intensity spatially to look for a relationship between geographical position and carbon intensity. The result is the following GIF.

library(gifski) #gifski() assembles image files into a GIF

# put the png files together in a gif
png_files <- list.files(getwd(), pattern = "\\.png$") #pattern is a regular expression, not a glob
gifski(png_files, gif_file = "animation.gif", width = 800, height = 600, delay = 0.5)
knitr::include_graphics("animation.gif")

We still see a general decrease of carbon intensity as time goes on. There is a consistent but slow decrease of carbon intensity in Western Europe, but a few countries in Central and Eastern Europe such as Greece and Estonia had very significant decreases in carbon intensity.

For the countries in Western Europe, which generally have high GDP per capita, a possible reason for this steady decrease is the continual improvement of technology and renewable energy. For countries such as Poland with significant decreases in carbon intensity, the decrease may be attributable to strong government policy to wean the country off carbon-intensive means of producing electricity such as burning coal, which allowed Poland to lower its carbon intensity substantially in just 20 years.

Part V: Further Exploration: Energy Production and Consumption in Relation to Other Political and Economic Variables

Research and Development (R&D) and Renewable Energy Adoption:

We compare R&D investment from Research and Development (r&d) - Researchers - OECD Data (2019) with renewable energy, looking for associations that indicate which political and economic variables shape how each country adopts renewable energy production and consumption.

df_RD <- read.csv("OECD_data_Research_and_development.csv")

We focus on the most recent relationship between R&D investment (as a share of GDP) and renewable energy, and therefore look only at the most recent available year, 2019. Renewable energy per capita indicates how much of each citizen’s energy use comes from renewable sources, which cause fewer negative environmental externalities. Our assumption is that a high share and per capita level of renewable energy does not arise in a vacuum. National-level R&D should be highly correlated with how innovative a country is, and environmental innovation should be a significant share of that technological advance.

RD_renewable <- df_RD %>% left_join(df, by=c("LOCATION"="iso_code", "TIME"="year")) 

RD_renewable <- RD_renewable %>% 
  rename("RD_prct_GDP" = "Value") %>% 
  select(LOCATION, RD_prct_GDP, TIME, gdp, country, renewables_share_energy,renewables_energy_per_capita) %>% 
  filter(TIME == 2019) 

ggplot(data=RD_renewable, aes(y=RD_prct_GDP,
                              x=log(renewables_energy_per_capita)))+
  geom_point(aes(size=renewables_share_energy), alpha=0.8)+ #size applies to the points only
  geom_smooth(method = "lm", se = FALSE)+
  #axis labels follow the aes() mapping: renewables on x, R&D on y
  labs(x="Renewable energy per capita in log scale",
       y="R&D as a share of GDP",
       size="Renewable energy as a share of overall energy",
       title="The association between R&D and renewable energy per capita in usage")

In this linear-log graph of R&D as a share of GDP against renewable energy per capita, we observe a positive association: countries with higher R&D as a share of GDP tend to have higher renewable energy per capita.
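A visual association like this one can be given a rough numerical summary with a correlation test. The sketch below runs cor.test() from base R on synthetic data; the variables rd_share and renew_pc and all of their values are invented for illustration, so the output says nothing about the OECD data itself.

```r
set.seed(2019)  # arbitrary seed so the sketch is reproducible

# Synthetic stand-in for an OECD cross-section (NOT the real data)
n        <- 35
rd_share <- runif(n, min = 0.3, max = 4.5)                # R&D as % of GDP
renew_pc <- exp(6 + 0.8 * rd_share + rnorm(n, sd = 1))    # renewable energy per capita

# Pearson correlation between R&D share and log renewable energy per capita
pearson <- cor.test(rd_share, log(renew_pc))

# Spearman rank correlation, which is unchanged by the log transform
spearman <- cor.test(rd_share, log(renew_pc), method = "spearman")

c(pearson = unname(pearson$estimate), spearman = unname(spearman$estimate))
c(pearson_p = pearson$p.value, spearman_p = spearman$p.value)
```

The rank-based version is a reasonable cross-check when a handful of countries with very high renewable energy per capita might dominate the linear fit.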

Next, we look at Environmental Policy - Patents on Environment Technologies - OECD Data (2019) to see whether patent protection is associated with more renewable energy at the country level.

df_patents <- read.csv("Patents_on_environment_technologies.csv")

“Inventors seek protection for their inventions in countries where they expect to invest, export or otherwise market their products. Often they do so in multiple jurisdictions (geographic markets). Patent data present a number of attractive properties compared to other alternative metrics of innovation”. —OECD Data

patents_renewable <- df_patents %>% left_join(df, by=c("LOCATION"="iso_code", "TIME"="year")) 

patents_renewable <- patents_renewable %>% 
  rename("greentech_protect_index" = "Value") %>% 
  select(LOCATION, greentech_protect_index, TIME, gdp, country, renewables_share_energy) %>% 
  filter(TIME == 2019) 

ggplot(data=patents_renewable, aes(x=log(greentech_protect_index),
                              y=log(renewables_share_energy)))+
  geom_point(alpha=0.8)+
  geom_smooth(method = "lm", se = FALSE)+
  labs(x="Patents in environment-related technologies: Technology indicators, log scale",
       y="Share of renewable energy, log scale",
       title="The association between environmental patent protection and the share of renewable energy at the national level")

Based on the log-log scatter plot of the technology indicator (patents in environment-related technologies) against the share of renewable energy for each OECD country, there appears to be a negative association between the two variables: more patent protection is associated with a smaller share of renewable energy. This is counterintuitive, since we would reasonably assume that strong intellectual property rights (patent protection on environmental technologies) incentivize the private sector to invest in renewable energy, thereby increasing the country’s share of renewable energy.

In other words, areas with high patent protection on environment-related technologies appear to have a smaller share of renewable energy in overall energy consumption. One potential explanation is that less protected markets incentivize small energy producers to produce clean energy by imitating existing technologies without bearing R&D costs in the first place.

Conclusion

We chose this topic in the first place because of the importance and urgency of research on energy consumption and production, a main contributor to the climate crisis.

There are several key takeaways that can be useful for further academic research and for increasing the public’s knowledge of the topic. First, there is a promising trend: the percent change of renewable energy consumption has been consistently greater than that of fossil fuels for the past 15 years. At the country level, wealthier and more developed countries such as Norway show large fluctuations in fossil fuel usage, but their ratio of fossil fuels declines gradually in line with the world’s trend. On a more concerning note, India, one of the world’s fastest-growing economies and most populous countries, has seen an increase in its share of fossil fuel usage. This is a common pattern in developing countries: more energy, including fossil fuels, is consumed as the economy grows rapidly. Additionally, using 2016 as our sample year, a simple linear regression shows a weak positive relationship between renewable energy per capita and GDP per capita (plotted on log scales), with an R-squared of 0.14 and a p-value small enough to reject the null hypothesis of no relationship. When we investigate carbon intensity, we observe a general decrease over the past two decades, and carbon intensity is lowest in the wealthiest quartile (highest GDP per capita). Lastly, we explore other economic and political variables such as environmental technology patent protection and each country’s overall R&D spending as a share of GDP. We find that countries with a greater R&D share have more renewable energy per capita, and that countries with less patent protection on environmental technologies have a higher share of renewable energy.

Our findings may shed some light on the topic of energy consumption and production, and could potentially inform policy aimed at shifting global energy usage towards greener, more sustainable sources.

Class Peer Reviews

References

Environmental policy - patents on environment technologies - OECD data. (2019). Organisation for Economic Co-operation and Development. https://data.oecd.org/envpolicy/patents-on-environment-technologies.htm
GDP per capita world bank data. (2020). The World Bank. https://data.worldbank.org/indicator/NY.GDP.PCAP.CD
Ritchie, H., & Roser, M. (2020). Energy. https://ourworldindata.org/energy
Research and development (r&d) - researchers - OECD data. (2019). Organisation for Economic Co-operation and Development. https://data.oecd.org/rd/researchers.htm
World energy consumption. (2021). https://www.kaggle.com/datasets/pralabhpoudel/world-energy-consumption

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".