User Guide: bulktrends

Example usage of the bulktrends package (version 0.1.0).

Set-up

bulktrends can be installed from it’s GitHub repository using

devtools::install_github("jbrowell/bulktrends")

Once installed, load the package and set the path to where we are storing data.

library(bulktrends)
data_directory <- "path/to/my_data/"

Parallel computation is supported in bullktrends using on the future framework. Its use is optional but recommended. To use parallel computation, simply set a “plan”…

future::plan("multisession")

Monthly UK Trade Info Data

Load UK Trade Info Bulk Data Files

This package has been developed to work with HMRC Monthly Import data. For more details, see the package README. We can use it to load a single bulk data file or all of the files in a directory (and subdirectories). This can take a while for multiple years of data, so it is recommended to save the loaded data as an .rds for quicker loading in the future.

imports <- read_uktradeinfo(path = paste0(data_directory,"imports/"))
saveRDS(imports, file = paste0(data_directory,"imports.rds"))
# imports <- readRDS(paste0(data_directory,"imports.rds"))
head(imports)
   PERREF   TYPE MONTHAC  COMCODE   SITC COD_SEQ COD_ALPHA PORT_SEQ PORT_CODE
   <char> <char>  <char>   <char> <char>  <char>    <char>   <char>    <char>
1: 201601      2  202507 01------  00---     001        FR      000       ZZZ
2: 201601      2  202507 01------  00---     003        NL      000       ZZZ
3: 201601      2  202507 01------  00---     004        DE      000       ZZZ
4: 201601      2  202507 01------  00---     005        IT      000       ZZZ
5: 201601      2  202507 01------  00---     007        IE      000       ZZZ
6: 201601      2  202507 01------  00---     008        DK      000       ZZZ
   COO_SEQ COO_ALPHA MODE_OF_TRANSPORT STAT_VALUE NET_MASS    SUMM_UNIT
    <char>    <char>            <char>      <num>    <num>       <char>
1:                                         298531     2091 000000000022
2:                                         319270    20318 000000000000
3:                                            555       25 000000000000
4:                                           5033      258 000000000000
5:                                        1332312   406697 000000795057
6:                                          37035     7805 000000000289
   SUPRESSION   FLOW REC_TYPE DATE_START   DATE_END
       <char> <char>   <char>     <Date>     <Date>
1:          0    imp        0 2016-01-01 2016-01-31
2:          0    imp        0 2016-01-01 2016-01-31
3:          0    imp        0 2016-01-01 2016-01-31
4:          0    imp        0 2016-01-01 2016-01-31
5:          0    imp        0 2016-01-01 2016-01-31
6:          0    imp        0 2016-01-01 2016-01-31

Things to be aware of

  • Country of origin reporting changes from 2022 onward, individual EU countries are specified where previously they were not. There is therefore a large increase in volume (number or rows) of import data (see methodology and quality report for more information on the data).

  • Low value Trade EU: Imports via port code QVV only report chapter of COMCODE and do not include NET_MASS.

Request Data from UK Trade Data API

Download look-up tables, e.g. for commodity codes and ports, and save, or load if already available.

Commodity <- uktrades_request(endpoint = "Commodity")$value
head(Commodity)
  CommodityId Cn8Code Hs2Code Hs4Code Hs6Code
1           0      00    <NA>    <NA>    <NA>
2           1      01      01    <NA>    <NA>
3           2      02      02    <NA>    <NA>
4           3      03      03    <NA>    <NA>
5           4      04      04    <NA>    <NA>
6           5      05      05    <NA>    <NA>
                                                                                                    Hs2Description
1                                                                                                             <NA>
2                                                                                                     Live animals
3                                                                                       Meat and edible meat offal
4                                                   Fish and crustaceans, molluscs and other aquatic invertebrates
5 Dairy produce; birds' eggs; natural honey; edible products of animal origin, not elsewhere specified or included
6                                                    Products of animal origin not elsewhere specified or included
  Hs4Description Hs6Description SitcCommodityCode Cn8LongDescription
1           <NA>           <NA>                 0                  -
2           <NA>           <NA>                 0                  -
3           <NA>           <NA>                 0                  -
4           <NA>           <NA>                 0                  -
5           <NA>           <NA>                 0                  -
6           <NA>           <NA>                 0                  -

Quick Visualisation

The comcode_plot() function produces time plots of imports by various measures included in import data, including mass, value (£) and volume (number of imports).

Here are some examples:

comcode_plot(imports, "01", comcode_lookup=Commodity)

comcode_plot(imports, "0602", variable="STAT_VALUE", comcode_lookup=Commodity)

comcode_plot(imports, "01", variable="volume",comcode_lookup = Commodity)

  • Hierarchical classification of a given commodity code
comcode_description("01012990", Commodity)

Hierarchical Description for Code: 01012990 
---------------------------------------------
Chapter (Hs2):           01  —  Live animals 
Heading (Hs4):           0101  —  Live horses, asses, mules and hinnies 
Subheading (Hs6):        010129  —  Live horses (excl. pure-bred for breeding) 
Commodity code (Cn8):    01012990  —  Live horses (excl. for slaughter, pure-bred for breeding) 
---------------------------------------------
comcode_description("08039010", Commodity)

Hierarchical Description for Code: 08039010 
---------------------------------------------
Chapter (Hs2):           08  —  Edible fruit and nuts; peel of citrus fruits or melons 
Heading (Hs4):           0803  —  Bananas, incl. plantains, fresh or dried 
Subheading (Hs6):        080390  —  Fresh or dried bananas (excl. plantains) 
Commodity code (Cn8):    08039010  —  Bananas, fresh (excl. plantains) 
---------------------------------------------
comcode_description("16041428", Commodity)

Hierarchical Description for Code: 16041428 
---------------------------------------------
Chapter (Hs2):           16  —  Preparations of meat, fish or crustaceans, molluscs or other aquatic invertebrates 
Heading (Hs4):           1604  —  Prepared or preserved fish; caviar and caviar substitutes prepared from fish eggs 
Subheading (Hs6):        160414  —  Prepared or preserved tunas, skipjack and Atlantic bonito, whole or in pieces (excl. minced) 
Commodity code (Cn8):    16041428  —  Prepared or preserved skipjack, whole or in pieces (excl. minced, fillets known as "loins" and such products in vegetable oil) 
---------------------------------------------

Filter by commodity groups

bulktrends includes useful groupings of commodity codes, such as plants and animals relevant for Sanitary and Phytosanitary Controls (SPS). An object called comcode_groups is included in bulktrends and loaded along with the package. Warning: comcode_groups does not include any suppressed codes, e.g. “01——”, take care when using.

head(imports[COMCODE %in% comcode_groups$animal_SPS])
   PERREF   TYPE MONTHAC  COMCODE   SITC COD_SEQ COD_ALPHA PORT_SEQ PORT_CODE
   <char> <char>  <char>   <char> <char>  <char>    <char>   <char>    <char>
1: 201601      1  202507 01012100  00150     001        FR      000       ZZZ
2: 201601      1  202507 01012100  00150     005        IT      000       ZZZ
3: 201601      1  202507 01012100  00150     007        IE      000       ZZZ
4: 201601      1  202507 01022130  00111     007        IE      000       ZZZ
5: 201601      1  202507 01022921  00119     007        IE      000       ZZZ
6: 201601      1  202507 01022949  00119     004        DE      000       ZZZ
   COO_SEQ COO_ALPHA MODE_OF_TRANSPORT STAT_VALUE NET_MASS    SUMM_UNIT
    <char>    <char>            <char>      <num>    <num>       <char>
1:                                         502526     3282 000000000007
2:                                             57     2386 000000000005
3:                                        3813813     8337 000000000018
4:                                           1252      600 000000000001
5:                                         989810    99680 000000000917
6:                                         369744    69200 000000000346
   SUPRESSION   FLOW REC_TYPE DATE_START   DATE_END
       <char> <char>   <char>     <Date>     <Date>
1:          0    imp        0 2016-01-01 2016-01-31
2:          0    imp        0 2016-01-01 2016-01-31
3:          0    imp        0 2016-01-01 2016-01-31
4:          0    imp        0 2016-01-01 2016-01-31
5:          0    imp        0 2016-01-01 2016-01-31
6:          0    imp        0 2016-01-01 2016-01-31

Anomaly detection

For a given commodity and quantity, a series of helper functions extract an aggregated time series by summing all commodities with the specified code by time period, and a regression model is automatically selected from a pool of candidates. Finally, an outlier detection algorithm is run to detect outliers of various types (“AO” additive outliers, “LS” level shifts, “TC” temporary changes, “IO” innovative outliers and “SLS” seasonal level shifts).

ts_data <- extract_ts(imports, "01", quantity="NET_MASS")

selected_model <- select_best_model(ts_data, response_col = "NET_MASS", scale_ts = T)

detect_anomaly <- tso(y = ts_data[,as.ts(scale(NET_MASS))],
                      cval=5,
                      types = c("AO", "LS", "TC", "IO"),
                      xreg = if(ncol(selected_model$xreg)>0){selected_model$xreg}else{NULL})

print(detect_anomaly)
Series: ts_data[, as.ts(scale(NET_MASS))] 
Regression with ARIMA(0,0,0) errors 

Coefficients:
         TC39
      -4.3752
s.e.   0.5884

sigma^2 = 0.6846:  log likelihood = -147.03
AIC=298.07   AICc=298.17   BIC=303.64

Outliers:
  type ind time coefhat  tstat
1   TC  39   39  -4.375 -7.436
plot.tsoutliers(detect_anomaly)

This procedure is performed for a list of commodity codes…

hs2_codes <- unique(substr(imports$COMCODE, 1,2))

hs2_anomalies <- detect_anomalies(import_data = imports,
                                  codes = hs2_codes[1:3],
                                  scale=T,
                                  cval=5,
                                  types = c("AO", "LS", "TC", "IO"))

print(hs2_anomalies)
     type   ind       time   coefhat     tstat   code
   <fctr> <int>     <Date>     <num>     <num> <char>
1:     TC    39 2019-03-01 -4.375237 -7.435678     01
2:   <NA>    NA       <NA>        NA        NA     02
3:   <NA>    NA       <NA>        NA        NA     03
                                       model_formula
                                              <char>
1:                                     NET_MASS ~ -1
2:                                     NET_MASS ~ -1
3: NET_MASS ~ linear_trend + annual_sin + annual_cos

Daily IPAFFS Data

Load IPAFFS Data Files

The Food Standards Agency (FSA) makes some Import of Products, Animals, Food and Feed System (IPAFFS) data public which we can also analyse with bulktrends. The data files contain daily data of high-risk products of animal origin and high-risk food of non-animal origin imported into the United Kingdom via approved designated ports (see the FSA website for more information). We can load a single csv file or all csv files in a directory (and subdirectories) using read_ipaffs().

imports_ipaffs <- read_ipaffs(path = paste0(data_directory,"imports_ipaffs"))

For consistency, some column names have been standardised. For example, (i) date columns include DATE_START and DATE_END, (ii) all variations of net weight columns across .csv files have been renamed as NET_MASS and (iii) all variations of commodity code columns are renamed as COMCODE.

head(imports_ipaffs[,c(1:4,9,10,55,56,43,17)])
   YearOfDeclaration QuarterOfDeclaration MonthOfDeclaration DayOfDeclaration
              <char>               <char>             <char>           <char>
1:              2017                   Q1                  1               23
2:              2017                   Q1                  2               10
3:              2017                   Q1                  2               15
4:              2017                   Q1                  2               15
5:              2017                   Q1                  2               16
6:              2017                   Q1                  2               16
   ImporterCountry CountryOfOrigin DATE_START   DATE_END  COMCODE NET_MASS
            <char>          <char>     <Date>     <Date>   <char>    <num>
1:              GB              IN 2017-01-23 2017-01-23   230910     0.63
2:              GB              CN 2017-02-10 2017-02-10   230910 10502.00
3:              GB              AU 2017-02-15 2017-02-15 02044250 12575.70
4:              GB              AU 2017-02-15 2017-02-15 02044310 12575.70
5:              GB              NZ 2017-02-16 2017-02-16 02069099 17979.20
6:              GB              NZ 2017-02-16 2017-02-16 02044310 17952.00

Data Visualisation

The comcode_plot() function can be used to also visualise daily time series. For the IPAFFS data, we can currently view NET_MASS and volume as follows:

comcode_plot(imports_ipaffs[DATE_START>="2024-01-01"],
             code =  "02",
             variable = "NET_MASS",
             comcode_lookup=Commodity)

comcode_plot(imports_ipaffs[DATE_START>="2024-01-01"],
             code =  "02",
             variable = "volume",
             comcode_lookup=Commodity)

ts_data <- extract_ts(imports_ipaffs,
                      code = "15",
                      quantity = "NET_MASS",
                      date_col = "DATE_START",
                      fill_missing = 0,
                      freq = "day")
plot(ts_data)

Anomaly detection

The anomaly detection procedure described using the monthly HMRC data can also be applied to the daily IPAFFS data. This involves using the same functions, such as, extract_ts to obtain the time series of a given commodity code, select_best_model to select an appropriate model from a set of candidate models and using the outlier detection algorithm to identify the various types of outliers.

ts_data <- extract_ts(imports_ipaffs,
                      code = "23",
                      quantity = "NET_MASS",
                      date_col = "DATE_START",
                      fill_missing=0)

selected_model <- select_best_model(ts_data, response_col = "NET_MASS", scale_ts = T)

detect_anomaly <- tso(y = as.ts(scale(ts_data[["NET_MASS"]])),
                      cval=5,
                      maxit.iloop = 20,
                      maxit.oloop = 50,
                      types = c("AO", "LS", "TC", "IO"),
                      xreg = if(ncol(selected_model$xreg)>0) {
                              selected_model$xreg} else {NULL})

print(detect_anomaly)
plot.tsoutliers(detect_anomaly)

Here is also an example of using the detect_anomalies() function to a selected group of commodity codes from the IPAFFS dataset for large-scale anomaly detection:

hs2_codes <- unique(substr(imports_ipaffs$COMCODE, 1,2))

hs2_anomalies_netmass <- detect_anomalies(import_data = imports_ipaffs,
                                  codes = hs2_codes[1:3],
                                  quantity = "NET_MASS",
                                  scale=T,
                                  cval=5,
                                  types = c("AO", "LS", "TC", "IO"),
                                  freq = "day")

hs2_anomalies_volume <- detect_anomalies(import_data = imports_ipaffs,
                                  codes = hs2_codes[1:3],
                                  quantity = "volume",
                                  scale=T,
                                  cval=5,
                                  types = c("AO", "LS", "TC", "IO"),
                                  freq = "day")
head(hs2_anomalies_netmass)
head(hs2_anomalies_volume)

Note: The above anomaly detection procedures for daily data work but can take a while to run so their output has been excluded in the current version of the userguide.