devtools::install_github("jbrowell/bulktrends")User Guide: bulktrends
Example usage of the bulktrends package (version 0.1.0).
Set-up
bulktrends can be installed from it’s GitHub repository using
Once installed, load the package and set the path to where we are storing data.
library(bulktrends)
data_directory <- "path/to/my_data/"Parallel computation is supported in bullktrends using on the future framework. Its use is optional but recommended. To use parallel computation, simply set a “plan”…
future::plan("multisession")Monthly UK Trade Info Data
Load UK Trade Info Bulk Data Files
This package has been developed to work with HMRC Monthly Import data. For more details, see the package README. We can use it to load a single bulk data file or all of the files in a directory (and subdirectories). This can take a while for multiple years of data, so it is recommended to save the loaded data as an .rds for quicker loading in the future.
imports <- read_uktradeinfo(path = paste0(data_directory,"imports/"))
saveRDS(imports, file = paste0(data_directory,"imports.rds"))
# imports <- readRDS(paste0(data_directory,"imports.rds"))head(imports) PERREF TYPE MONTHAC COMCODE SITC COD_SEQ COD_ALPHA PORT_SEQ PORT_CODE
<char> <char> <char> <char> <char> <char> <char> <char> <char>
1: 201601 2 202507 01------ 00--- 001 FR 000 ZZZ
2: 201601 2 202507 01------ 00--- 003 NL 000 ZZZ
3: 201601 2 202507 01------ 00--- 004 DE 000 ZZZ
4: 201601 2 202507 01------ 00--- 005 IT 000 ZZZ
5: 201601 2 202507 01------ 00--- 007 IE 000 ZZZ
6: 201601 2 202507 01------ 00--- 008 DK 000 ZZZ
COO_SEQ COO_ALPHA MODE_OF_TRANSPORT STAT_VALUE NET_MASS SUMM_UNIT
<char> <char> <char> <num> <num> <char>
1: 298531 2091 000000000022
2: 319270 20318 000000000000
3: 555 25 000000000000
4: 5033 258 000000000000
5: 1332312 406697 000000795057
6: 37035 7805 000000000289
SUPRESSION FLOW REC_TYPE DATE_START DATE_END
<char> <char> <char> <Date> <Date>
1: 0 imp 0 2016-01-01 2016-01-31
2: 0 imp 0 2016-01-01 2016-01-31
3: 0 imp 0 2016-01-01 2016-01-31
4: 0 imp 0 2016-01-01 2016-01-31
5: 0 imp 0 2016-01-01 2016-01-31
6: 0 imp 0 2016-01-01 2016-01-31
Things to be aware of
Country of origin reporting changes from 2022 onward, individual EU countries are specified where previously they were not. There is therefore a large increase in volume (number or rows) of import data (see methodology and quality report for more information on the data).
Low value Trade EU: Imports via port code QVV only report chapter of
COMCODEand do not includeNET_MASS.
Request Data from UK Trade Data API
Download look-up tables, e.g. for commodity codes and ports, and save, or load if already available.
Commodity <- uktrades_request(endpoint = "Commodity")$valuehead(Commodity) CommodityId Cn8Code Hs2Code Hs4Code Hs6Code
1 0 00 <NA> <NA> <NA>
2 1 01 01 <NA> <NA>
3 2 02 02 <NA> <NA>
4 3 03 03 <NA> <NA>
5 4 04 04 <NA> <NA>
6 5 05 05 <NA> <NA>
Hs2Description
1 <NA>
2 Live animals
3 Meat and edible meat offal
4 Fish and crustaceans, molluscs and other aquatic invertebrates
5 Dairy produce; birds' eggs; natural honey; edible products of animal origin, not elsewhere specified or included
6 Products of animal origin not elsewhere specified or included
Hs4Description Hs6Description SitcCommodityCode Cn8LongDescription
1 <NA> <NA> 0 -
2 <NA> <NA> 0 -
3 <NA> <NA> 0 -
4 <NA> <NA> 0 -
5 <NA> <NA> 0 -
6 <NA> <NA> 0 -
Quick Visualisation
The comcode_plot() function produces time plots of imports by various measures included in import data, including mass, value (£) and volume (number of imports).
Here are some examples:
comcode_plot(imports, "01", comcode_lookup=Commodity)comcode_plot(imports, "0602", variable="STAT_VALUE", comcode_lookup=Commodity)comcode_plot(imports, "01", variable="volume",comcode_lookup = Commodity)- Hierarchical classification of a given commodity code
comcode_description("01012990", Commodity)
Hierarchical Description for Code: 01012990
---------------------------------------------
Chapter (Hs2): 01 — Live animals
Heading (Hs4): 0101 — Live horses, asses, mules and hinnies
Subheading (Hs6): 010129 — Live horses (excl. pure-bred for breeding)
Commodity code (Cn8): 01012990 — Live horses (excl. for slaughter, pure-bred for breeding)
---------------------------------------------
comcode_description("08039010", Commodity)
Hierarchical Description for Code: 08039010
---------------------------------------------
Chapter (Hs2): 08 — Edible fruit and nuts; peel of citrus fruits or melons
Heading (Hs4): 0803 — Bananas, incl. plantains, fresh or dried
Subheading (Hs6): 080390 — Fresh or dried bananas (excl. plantains)
Commodity code (Cn8): 08039010 — Bananas, fresh (excl. plantains)
---------------------------------------------
comcode_description("16041428", Commodity)
Hierarchical Description for Code: 16041428
---------------------------------------------
Chapter (Hs2): 16 — Preparations of meat, fish or crustaceans, molluscs or other aquatic invertebrates
Heading (Hs4): 1604 — Prepared or preserved fish; caviar and caviar substitutes prepared from fish eggs
Subheading (Hs6): 160414 — Prepared or preserved tunas, skipjack and Atlantic bonito, whole or in pieces (excl. minced)
Commodity code (Cn8): 16041428 — Prepared or preserved skipjack, whole or in pieces (excl. minced, fillets known as "loins" and such products in vegetable oil)
---------------------------------------------
Filter by commodity groups
bulktrends includes useful groupings of commodity codes, such as plants and animals relevant for Sanitary and Phytosanitary Controls (SPS). An object called comcode_groups is included in bulktrends and loaded along with the package. Warning: comcode_groups does not include any suppressed codes, e.g. “01——”, take care when using.
head(imports[COMCODE %in% comcode_groups$animal_SPS]) PERREF TYPE MONTHAC COMCODE SITC COD_SEQ COD_ALPHA PORT_SEQ PORT_CODE
<char> <char> <char> <char> <char> <char> <char> <char> <char>
1: 201601 1 202507 01012100 00150 001 FR 000 ZZZ
2: 201601 1 202507 01012100 00150 005 IT 000 ZZZ
3: 201601 1 202507 01012100 00150 007 IE 000 ZZZ
4: 201601 1 202507 01022130 00111 007 IE 000 ZZZ
5: 201601 1 202507 01022921 00119 007 IE 000 ZZZ
6: 201601 1 202507 01022949 00119 004 DE 000 ZZZ
COO_SEQ COO_ALPHA MODE_OF_TRANSPORT STAT_VALUE NET_MASS SUMM_UNIT
<char> <char> <char> <num> <num> <char>
1: 502526 3282 000000000007
2: 57 2386 000000000005
3: 3813813 8337 000000000018
4: 1252 600 000000000001
5: 989810 99680 000000000917
6: 369744 69200 000000000346
SUPRESSION FLOW REC_TYPE DATE_START DATE_END
<char> <char> <char> <Date> <Date>
1: 0 imp 0 2016-01-01 2016-01-31
2: 0 imp 0 2016-01-01 2016-01-31
3: 0 imp 0 2016-01-01 2016-01-31
4: 0 imp 0 2016-01-01 2016-01-31
5: 0 imp 0 2016-01-01 2016-01-31
6: 0 imp 0 2016-01-01 2016-01-31
Anomaly detection
For a given commodity and quantity, a series of helper functions extract an aggregated time series by summing all commodities with the specified code by time period, and a regression model is automatically selected from a pool of candidates. Finally, an outlier detection algorithm is run to detect outliers of various types (“AO” additive outliers, “LS” level shifts, “TC” temporary changes, “IO” innovative outliers and “SLS” seasonal level shifts).
ts_data <- extract_ts(imports, "01", quantity="NET_MASS")
selected_model <- select_best_model(ts_data, response_col = "NET_MASS", scale_ts = T)
detect_anomaly <- tso(y = ts_data[,as.ts(scale(NET_MASS))],
cval=5,
types = c("AO", "LS", "TC", "IO"),
xreg = if(ncol(selected_model$xreg)>0){selected_model$xreg}else{NULL})
print(detect_anomaly)Series: ts_data[, as.ts(scale(NET_MASS))]
Regression with ARIMA(0,0,0) errors
Coefficients:
TC39
-4.3752
s.e. 0.5884
sigma^2 = 0.6846: log likelihood = -147.03
AIC=298.07 AICc=298.17 BIC=303.64
Outliers:
type ind time coefhat tstat
1 TC 39 39 -4.375 -7.436
plot.tsoutliers(detect_anomaly)This procedure is performed for a list of commodity codes…
hs2_codes <- unique(substr(imports$COMCODE, 1,2))
hs2_anomalies <- detect_anomalies(import_data = imports,
codes = hs2_codes[1:3],
scale=T,
cval=5,
types = c("AO", "LS", "TC", "IO"))
print(hs2_anomalies) type ind time coefhat tstat code
<fctr> <int> <Date> <num> <num> <char>
1: TC 39 2019-03-01 -4.375237 -7.435678 01
2: <NA> NA <NA> NA NA 02
3: <NA> NA <NA> NA NA 03
model_formula
<char>
1: NET_MASS ~ -1
2: NET_MASS ~ -1
3: NET_MASS ~ linear_trend + annual_sin + annual_cos
Daily IPAFFS Data
Load IPAFFS Data Files
The Food Standards Agency (FSA) makes some Import of Products, Animals, Food and Feed System (IPAFFS) data public which we can also analyse with bulktrends. The data files contain daily data of high-risk products of animal origin and high-risk food of non-animal origin imported into the United Kingdom via approved designated ports (see the FSA website for more information). We can load a single csv file or all csv files in a directory (and subdirectories) using read_ipaffs().
imports_ipaffs <- read_ipaffs(path = paste0(data_directory,"imports_ipaffs"))For consistency, some column names have been standardised. For example, (i) date columns include DATE_START and DATE_END, (ii) all variations of net weight columns across .csv files have been renamed as NET_MASS and (iii) all variations of commodity code columns are renamed as COMCODE.
head(imports_ipaffs[,c(1:4,9,10,55,56,43,17)]) YearOfDeclaration QuarterOfDeclaration MonthOfDeclaration DayOfDeclaration
<char> <char> <char> <char>
1: 2017 Q1 1 23
2: 2017 Q1 2 10
3: 2017 Q1 2 15
4: 2017 Q1 2 15
5: 2017 Q1 2 16
6: 2017 Q1 2 16
ImporterCountry CountryOfOrigin DATE_START DATE_END COMCODE NET_MASS
<char> <char> <Date> <Date> <char> <num>
1: GB IN 2017-01-23 2017-01-23 230910 0.63
2: GB CN 2017-02-10 2017-02-10 230910 10502.00
3: GB AU 2017-02-15 2017-02-15 02044250 12575.70
4: GB AU 2017-02-15 2017-02-15 02044310 12575.70
5: GB NZ 2017-02-16 2017-02-16 02069099 17979.20
6: GB NZ 2017-02-16 2017-02-16 02044310 17952.00
Data Visualisation
The comcode_plot() function can be used to also visualise daily time series. For the IPAFFS data, we can currently view NET_MASS and volume as follows:
comcode_plot(imports_ipaffs[DATE_START>="2024-01-01"],
code = "02",
variable = "NET_MASS",
comcode_lookup=Commodity)comcode_plot(imports_ipaffs[DATE_START>="2024-01-01"],
code = "02",
variable = "volume",
comcode_lookup=Commodity)ts_data <- extract_ts(imports_ipaffs,
code = "15",
quantity = "NET_MASS",
date_col = "DATE_START",
fill_missing = 0,
freq = "day")
plot(ts_data)Anomaly detection
The anomaly detection procedure described using the monthly HMRC data can also be applied to the daily IPAFFS data. This involves using the same functions, such as, extract_ts to obtain the time series of a given commodity code, select_best_model to select an appropriate model from a set of candidate models and using the outlier detection algorithm to identify the various types of outliers.
ts_data <- extract_ts(imports_ipaffs,
code = "23",
quantity = "NET_MASS",
date_col = "DATE_START",
fill_missing=0)
selected_model <- select_best_model(ts_data, response_col = "NET_MASS", scale_ts = T)
detect_anomaly <- tso(y = as.ts(scale(ts_data[["NET_MASS"]])),
cval=5,
maxit.iloop = 20,
maxit.oloop = 50,
types = c("AO", "LS", "TC", "IO"),
xreg = if(ncol(selected_model$xreg)>0) {
selected_model$xreg} else {NULL})
print(detect_anomaly)
plot.tsoutliers(detect_anomaly)Here is also an example of using the detect_anomalies() function to a selected group of commodity codes from the IPAFFS dataset for large-scale anomaly detection:
hs2_codes <- unique(substr(imports_ipaffs$COMCODE, 1,2))
hs2_anomalies_netmass <- detect_anomalies(import_data = imports_ipaffs,
codes = hs2_codes[1:3],
quantity = "NET_MASS",
scale=T,
cval=5,
types = c("AO", "LS", "TC", "IO"),
freq = "day")
hs2_anomalies_volume <- detect_anomalies(import_data = imports_ipaffs,
codes = hs2_codes[1:3],
quantity = "volume",
scale=T,
cval=5,
types = c("AO", "LS", "TC", "IO"),
freq = "day")
head(hs2_anomalies_netmass)
head(hs2_anomalies_volume)Note: The above anomaly detection procedures for daily data work but can take a while to run so their output has been excluded in the current version of the userguide.