What is a footprint analysis?
A footprint analysis answers the question: “How much of an environmental pressure (land, water, emissions…) caused in country A is ultimately driven by consumption in country B?”
Modern supply chains are global. Spain might import soybeans from Brazil, feed them to pigs, and export ham to France. A footprint analysis traces these flows backwards through the entire supply chain, so that the environmental cost of growing those soybeans is attributed to the French consumer who ate the ham — not to the Brazilian farmer who grew the beans.
This is done by building a Multi-Regional Input-Output (MRIO) model, a mathematical framework that captures all inter-industry and inter-country linkages in one system of equations.
The three-step pipeline
The whep package implements this in three steps:
-
build_io_model()— Assemble the MRIO tables from supply-use data, bilateral trade, and commodity balance sheets. -
compute_leontief_inverse()— Compute the Leontief inverse, which captures both direct and indirect supply chain linkages. -
compute_footprint()— Multiply the Leontief inverse by environmental extension data to trace pressures through to final consumption.
The methodology follows the FABIO model (Bruckner et al., 2019), which builds physical MRIO tables for the global food system.
Concepts and terminology
Before diving into the code, here are the key concepts:
Supply-use tables and process groups
A supply table records how much of each product each process produces. A use table records how much of each product each process consumes as input. Together they describe the technology of the economy.
In whep, each process belongs to one of four process groups, which together cover all transformations in the food system:
| Process group | What it does | Inputs | Outputs |
|---|---|---|---|
crop_production |
Grow crops | Seed (from CBS) | Harvested crop + residues (straw, etc.) |
husbandry |
Raise livestock | Feed (crops, grass, etc.) | Live animals (heads) + non-slaughter products (milk, eggs, wool…) |
slaughtering |
Slaughter animals | Live animals (heads) | Meat, offals, fats, hides and skins |
processing |
Derive secondary products | Primary commodities (tonnes) | Processed products (oil, flour, sugar, etc.) |
Each process is uniquely identified by a
(proc_group, proc_cbs_code) pair.
Mixed units in the supply-use system
Most flows are measured in tonnes, but live animals
are measured in heads (number of animals). This matches
the FABIO convention. Different rows of the Z matrix can
have different units because each row’s entries are only compared to
that row’s total output X[i]. The technical coefficients
are dimensionless ratios, so the system remains mathematically
consistent.
For example, a slaughtering process for cattle:
- Use: 1000 heads of cattle (heads)
- Supply: 400 tonnes of beef + 50 tonnes of offals + 30 tonnes of hides (tonnes)
The cattle heads and beef tonnes live in different rows of
Z, each with its own unit.
Bilateral trade
A bilateral trade matrix records, for each product, how many tonnes (or heads, for live animals) flow from each exporting country to each importing country. This is what connects domestic economies into a global system.
Commodity balance sheets (CBS)
A commodity balance sheet is an accounting identity for each product in each country:
The right-hand categories (food, feed, other uses) form the final demand — the end of the supply chain where products are consumed rather than being used as intermediate inputs.
For live animals (which FAO does not include in its CBS data), whep constructs synthetic CBS entries from primary production data:
-
Meat animals (cattle, pigs, poultry…): all
production goes to
processing(i.e. they enter the slaughtering process). -
Draft animals (horses, camels, asses…): all
production goes to
other_uses.
The Z matrix (inter-industry flows)
Z is the core of the MRIO model. It is a square matrix
where each row and column represents a (country, product) pair.
Entry
gives the physical flow from sector
to sector
.
For example, if row 5 is (Spain, wheat) and column 12 is (Spain, flour), then is the amount of Spanish wheat used by Spanish flour mills.
The model constructs Z using the industry
technology assumption:
where:
- is the multi-regional use matrix. It starts from single-country use tables and expands them with trade shares, so that each input can originate from any country. For example, if Spain’s pig farming uses 1000 tonnes of soybeans, and 60% of Spain’s soybean supply comes from Brazil, then 600 tonnes are attributed to Brazil and 400 to domestic production.
- is the row-normalized supply matrix (the transformation matrix). Each row sums to 1 and encodes how each process distributes its output across products. This converts process-level flows to product-level flows.
Trade shares
For each product, trade shares tell us where each country gets its supply from. If Spain consumes 1000 tonnes of soybeans, and 600 come from domestic production while 400 come from Brazil, then the trade share from Brazil to Spain is 0.4 and the domestic share is 0.6.
These shares are computed from the bilateral trade matrices combined with domestic production data from the CBS. They are used to expand the single-country use tables into a multi-regional system where each input can originate from any country.
The Y matrix (final demand)
Y is the final demand matrix. Rows are (country,
product) pairs (same as Z), and columns are
(country, demand category) pairs. Entry
tells us how much of product
goes to final demand category
.
The demand categories are derived from the CBS columns present in the
data, typically: food, other_uses, and
stock_addition.
Like Z, the final demand is expanded with trade shares
so that, for example, food consumption of wheat in France can partly
originate from French wheat and partly from imported wheat.
The X vector (total output)
X is a vector where
is the total output of sector
.
By construction:
meaning total output equals intermediate deliveries plus final demand.
FABIO adjustments
After the initial construction of Z and Y,
the model applies several corrections following the FABIO
methodology:
Leftover redistribution: When CBS reports
processingorseeduse but the supply-use table does not fully account for it, the unmatched quantity is redistributed to thefoodcategory inY. This prevents demand from vanishing.Loss endogenization (optional, via
endogenize_losses = TRUE): If the CBS contains alossescolumn, losses are moved fromYto the diagonal ofZ. This treats losses as self-use within each sector (the commodity is “consumed” by its own production process) rather than as final demand.Diagonal rebalancing: When (the diagonal accounts for all or more than total output, usually due to reporting errors), 80% of the diagonal value is moved to final demand and 20% is kept on the diagonal, proportionally distributed across the sector’s demand categories.
Negative output fixing: Any sectors with (which can arise from trade data inconsistencies) are corrected by adjusting negative entries in
Y.
The Leontief inverse (L)
The Leontief inverse is the key analytical tool. The technical coefficients matrix is defined as , representing how much input from sector is needed per unit of output from sector .
Before inversion, two corrections are applied:
- Negative entries in are set to zero: these arise from data inconsistencies and would distort the inverse.
- Column sums of are capped at 1: this ensures is invertible even when reported inputs exceed reported output.
The Leontief inverse captures the total requirements — both direct and indirect — needed to deliver one unit of final demand. Entry tells us: “To deliver one extra unit of product to final consumers, how much total output does sector need to produce (including all the intermediate steps)?”
This is what makes footprint tracing possible. A consumer buying bread in France triggers not just French bakeries, but also French mills, French wheat farmers, Brazilian soybean exporters (if feed is involved), and so on. The Leontief inverse encodes all of these ripple effects in a single matrix.
Environmental extensions
An environmental extension is any pressure you want to trace through the supply chain. Common examples:
- Land use (hectares): How many hectares of cropland does each sector use?
- Water use (cubic meters): How much blue water is consumed?
- Greenhouse gas emissions (tonnes CO2-eq): How much does each sector emit?
- Nitrogen surplus (tonnes N): How much excess nitrogen enters the environment?
You provide these as a vector with one value per (country, product) sector.
The footprint
The footprint matrix is computed as:
where is a diagonal matrix of extension intensities (pressure per unit of output). Each entry tells us: “How much environmental pressure occurring in sector is driven by final demand category ?”
When fd_labels is provided, the footprint uses the
FABIO diagonal approach: for each final demand column
,
the function computes
,
where
groups sectors by item. This decomposes the footprint by both the target
item being consumed and the demand category (food, other uses,
etc.).
The resulting tidy tibble decomposes this by origin country/product and target country/product, giving you a full bilateral footprint map.
Running the analysis
Step 1: Build the IO model
The simplest call uses default arguments — the function fetches all input data internally:
library(whep)
io <- build_io_model()This downloads supply-use tables, bilateral trade, and CBS data, then builds the IO model for all available years. To restrict to specific years:
io <- build_io_model(years = c(2010, 2015))If you want to inspect or pre-filter the input data, you can build them separately and pass them in:
supply_use <- build_supply_use()
bilateral_trade <- get_bilateral_trade()
cbs <- get_wide_cbs()
io <- build_io_model(supply_use, bilateral_trade, cbs, years = 2010)To endogenize losses (move them from final demand to the diagonal of
Z):
io <- build_io_model(endogenize_losses = TRUE)The result is a tibble with one row per year. Each row contains:
-
Z: The inter-industry flow matrix (sparse). -
Y: The final demand matrix (sparse). -
X: The total output vector. -
labels: A tibble mapping row/column indices toarea_codeanditem_cbs_code. -
fd_labels: A tibble mapping each Y column to itsarea_code(consuming country) andfd_col(demand category).
You can inspect a single year:
# Extract matrices for one year
z_mat <- io$Z[[1]]
y_mat <- io$Y[[1]]
x_vec <- io$X[[1]]
labels <- io$labels[[1]]
fd_labels <- io$fd_labels[[1]]
selected_year <- io$year[[1]]
# Dimensions: (n_countries * n_products) x (n_countries * n_products)
dim(z_mat)
# Label mapping
labels
# Final demand column labels
fd_labelsStep 2: Compute the Leontief inverse
For small systems (< 5000 sectors), you can pre-compute L:
l_inv <- compute_leontief_inverse(z_mat, x_vec)
dim(l_inv)This inverts the matrix. Column sums of are internally capped at 1 to prevent singularity, and negative entries are zeroed.
For large systems
For systems larger than ~5000 sectors (e.g. 192 countries x 125+
products = 24 000 sectors), computing the explicit Leontief inverse
requires too much memory (e.g. 4.8 GiB for 25 000 sectors). In those
cases, pass z_mat directly to
compute_footprint(), which solves
without ever materialising the full L.
Step 3: Compute the footprint
Now provide your environmental extension vector. This must have one
value per sector, in the same order as the labels. For land use, you can
use get_land_fp_production():
land_use_data <- get_land_fp_production() |>
dplyr::filter(year == selected_year) |>
dplyr::select(area_code, item_cbs_code, hectares = impact_u)
# Match it to the IO model's label order
extensions <- labels |>
dplyr::left_join(
land_use_data,
by = c("area_code", "item_cbs_code")
) |>
dplyr::mutate(
hectares = tidyr::replace_na(hectares, 0)
) |>
dplyr::pull(hectares)Dense path (pre-computed L, small systems)
If you pre-computed L:
footprint <- compute_footprint(
l_inv, x_vec, y_mat, extensions, labels,
fd_labels = fd_labels
)
footprintUsing Z directly (large systems)
Alternatively, pass the Z matrix and compute_footprint()
will derive the Leontief inverse for you:
footprint <- compute_footprint(
z_mat = z_mat,
x_vec = x_vec,
y_mat = y_mat,
extensions = extensions,
labels = labels,
fd_labels = fd_labels
)
footprintWith and without fd_labels
When fd_labels is provided (recommended), the result
includes target_item (the consumed product) and
target_fd (the demand category like "food" or
"other_uses"). This uses the FABIO diagonal decomposition
approach.
When fd_labels is omitted, columns of Y are treated as
individual sectors. This is appropriate only for square Y matrices and
does not produce a target_fd column.
The result is a tidy tibble with columns:
| Column | Meaning |
|---|---|
origin_area |
Country where the pressure physically occurs |
origin_item |
Product causing the pressure |
target_area |
Country whose consumption drives the pressure |
target_item |
Product being consumed |
target_fd |
Demand category (only with fd_labels) |
value |
Footprint in extension units (e.g. hectares) |
Analysing the results
Because the output is a tidy tibble, you can use standard dplyr operations to answer questions:
Which country’s consumption drives the most land use?
footprint |>
dplyr::summarise(
total_land = sum(value), .by = target_area
) |>
dplyr::arrange(dplyr::desc(total_land))How much of Spain’s land use is driven by foreign consumption?
spain_code <- 203L
footprint |>
dplyr::filter(origin_area == spain_code) |>
dplyr::mutate(
destination = ifelse(
target_area == spain_code, "domestic", "exported"
)
) |>
dplyr::summarise(
total_land = sum(value), .by = destination
)What are the top bilateral flows?
footprint |>
dplyr::summarise(
total = sum(value),
.by = c(origin_area, target_area)
) |>
dplyr::arrange(dplyr::desc(total)) |>
head(20)Which products drive the most land use in food consumption?
A conservation check
A useful sanity check: the total footprint should equal the total extensions. All pressure must be attributed somewhere.
# These two numbers should be equal (up to floating-point tolerance)
sum(footprint$value)
sum(extensions)Small differences can arise from negative-output corrections or column-sum capping in the technical coefficients matrix.
Processing multiple years
Since build_io_model() returns one row per year, you can
loop over them:
io <- build_io_model(years = 2010:2013)
land <- get_land_fp_production()
all_footprints <- purrr::pmap_dfr(
list(
io$Z, io$X, io$Y, io$labels,
io$fd_labels, io$year
),
function(z, x, y, lab, fdl, yr) {
ext <- lab |>
dplyr::left_join(
land |>
dplyr::filter(year == yr) |>
dplyr::select(area_code, item_cbs_code, impact_u),
by = c("area_code", "item_cbs_code")
) |>
dplyr::mutate(impact_u = tidyr::replace_na(impact_u, 0)) |>
dplyr::pull(impact_u)
compute_footprint(
z_mat = z, x_vec = x, y_mat = y,
extensions = ext, labels = lab,
fd_labels = fdl
) |>
dplyr::mutate(year = yr)
}
)Parallel execution (optional)
Since each year is independent, you can parallelise using
furrr:
library(furrr)
future::plan(multisession, workers = 4)
all_footprints <- furrr::future_pmap_dfr(
list(
io$Z, io$X, io$Y, io$labels,
io$fd_labels, io$year
),
function(z, x, y, lab, fdl, yr) {
ext <- lab |>
dplyr::left_join(
land |>
dplyr::filter(year == yr) |>
dplyr::select(area_code, item_cbs_code, impact_u),
by = c("area_code", "item_cbs_code")
) |>
dplyr::mutate(impact_u = tidyr::replace_na(impact_u, 0)) |>
dplyr::pull(impact_u)
compute_footprint(
z_mat = z, x_vec = x, y_mat = y,
extensions = ext, labels = lab,
fd_labels = fdl
) |>
dplyr::mutate(year = yr)
}
)Summary
| Step | Function | Input | Output |
|---|---|---|---|
| 1 | build_io_model() |
Supply-use, trade, CBS (all optional — fetched by default) | Z, Y, X, labels, fd_labels per year |
| 2 | compute_leontief_inverse() |
Z, X | Leontief inverse L (small systems only) |
| 3 | compute_footprint() |
L or Z, X, Y, extensions, labels, fd_labels | Tidy bilateral footprint |
References
Bruckner, M., Wood, R., Moran, D., Kuschnig, N., Wieland, H., Maus, V., & Borner, J. (2019). FABIO — The Construction of the Food and Agriculture Biomass Input-Output Model. Environmental Science & Technology, 53(19), 11302–11312. https://doi.org/10.1021/acs.est.8b03704
