Accurate Demand Forecasting for Retails with Deep Neural Networks
Contents
- Abstract
- Introduction
- Problem Formulation
- Framework
- Graph Attention Component
- Recurrent Component
- Variable-wise Temporal attention
- Autoregressive Component
- Final Prediction
0. Abstract
Previous Works
- prediction of each individual product item
- adopt MTS forecasting approach
→ none of them leveraged “structural information of product items”
- ex) product brand / multi-level categories
Proposal : DL-based prediction model to find…
- 1) **inherent inter-dependencies**
- 2) temporal characteristics
among product items!
1. Introduction
Univariate TS model
- ex) ARIMA, AR, MA, ARMA,…
- treat each product separately
Multivariate TS model
- take into account the “INTER-dependencies” among items
- ex) VAR
DL-based models
- ex) RNN, LSTM, GRU
- ex2) LSTNet
- 1) CNN + GRU for MTS forecasting
- 2) special recurrent-skip component
- to capture very long-term periodic patterns
- 3) assumption : all variables in the MTS share the same periodicity
Existing prediction methods ignore that product items have inherent structural information, e.g., the relations between product items and brands, and the relations among various product items (which may share the same multi-level categories).
Product tree
- Internal nodes : product categories
- Leaf nodes : product items
- extend the product tree by incorporating product brands
  → construct a product graph structure
Product Graph structure
- ex) product graph of 4 product items
- without the product graph structure as prior…
  - case 1) treat all product items equally
  - case 2) have to implicitly infer the inherent relationships
    ( but at the cost of accuracy loss )
Structural Temporal Attention Network (STANet)
- predict product demands in MTS forecasting
- using the graph structure above
- incorporates both…
  - 1) the product graph structure ……… via GAT
  - 2) temporal characteristics of product items …… via GRU + temporal attention
- both 1) & 2) may “change over time”
  → use “attention mechanism” to deal with these
- based on “GAT”, “GRU”, “special temporal attention”
2. Problem Formulation
Data : transaction records
- 1) time stamp
- 2) item ID
- ( + 4 product categories & 1 product brand )
- 3) amount sold
→ total of 8 fields
Pre-processing
- 1) change into MTS
- 2) for each category/brand : SUM over its member items
→ result : MTS of volumes of (1) product items, (2) categories, (3) brands
( a pre-processing sketch is given at the end of this section )
Adjacency Matrix
- info : product graph structure
Notation
- $N_p$ : # of product items
- $N$ : # of product items + brands + categories
- $X = \{x_1, x_2, \dots, x_T\}$
  - where $x_t \in \mathbb{R}^{N \times 1}$, $t = 1, 2, \dots, T$
- $M \in \mathbb{R}^{N \times N}$ : adjacency matrix
Goal :
- Input : $\{x_t, \dots, x_{t+\tau-1}\}$ & (fixed) $M$
- Output : $x_{t+\tau-1+h}$
- model : $f_M : \mathbb{R}^{N \times \tau} \rightarrow \mathbb{R}^{N \times 1}$
Testing stage
- only need to calculate the evaluation metrics for the $N_p$ product items
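A minimal sketch of the pre-processing and adjacency-matrix construction, assuming pandas/NumPy; the field names (`date`, `item`, `cat`, `brand`, `amount`) are hypothetical, and the 4 category levels are collapsed to one for brevity:

```python
import numpy as np
import pandas as pd

# toy transaction records (hypothetical schema; the real data has 4 category levels)
tx = pd.DataFrame({
    "date":   pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-02"]),
    "item":   ["item_a", "item_b", "item_a"],
    "cat":    ["snacks", "snacks", "snacks"],
    "brand":  ["brand_x", "brand_y", "brand_x"],
    "amount": [3, 1, 2],
})

# one series per item; category/brand series = SUM over their member items
pivots = [tx.pivot_table(index="date", columns=col, values="amount",
                         aggfunc="sum", fill_value=0)
          for col in ("item", "cat", "brand")]
mts = pd.concat(pivots, axis=1)              # T x N multivariate time series

# adjacency matrix M (N x N): link each item to its category and its brand
nodes = {name: i for i, name in enumerate(mts.columns)}
M = np.eye(len(nodes))                       # self-loops
for _, r in tx.drop_duplicates("item").iterrows():
    for parent in (r["cat"], r["brand"]):
        M[nodes[r["item"]], nodes[parent]] = 1
        M[nodes[parent], nodes[r["item"]]] = 1
```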
3. Framework
(1) Graph Attention Component
key point :
- capture the inter-dependencies between different variables
  → use a GNN to capture them
- dynamic : “inter-dependencies may change over time!”
  → use an attention mechanism
First layer : multi-head “GAT” layer

- input : $X \in \mathbb{R}^{N \times \tau}$ & $M \in \mathbb{R}^{N \times N}$
- at time step $t$…

$$h_i^t = \sigma\left( \frac{1}{K} \sum_{k=1}^{K} \sum_{j \in \mathcal{N}_i} \alpha_{ij}^k W^k x_j^t \right)$$

- $x_j^t$ : sale of variable (product, or brand, or category) $j$ at time step $t$
- $W^k \in \mathbb{R}^{F \times 1}$ : linear transformation
- $\mathcal{N}_i$ : all the adjacent nodes of variable $i$
- $K$ : # of attention heads
- $\sigma$ : activation function
- $\alpha_{ij}^k$ : attention coefficient

$$\alpha_{ij}^k = \frac{\exp\left(\mathrm{LeakyReLU}\left(f_a(W^k x_i^t, W^k x_j^t)\right)\right)}{\sum_{\ell \in \mathcal{N}_i} \exp\left(\mathrm{LeakyReLU}\left(f_a(W^k x_i^t, W^k x_\ell^t)\right)\right)}$$

- $f_a$ : scoring function
- output : $X_G \in \mathbb{R}^{FN \times \tau}$
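A minimal sketch of the per-time-step multi-head GAT layer, assuming PyTorch and the additive scoring function of the original GAT paper for $f_a$ (the paper's exact choice may differ):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayer(nn.Module):
    """One multi-head GAT layer applied to the sales vector of a single time step."""
    def __init__(self, K: int, F_out: int):
        super().__init__()
        self.K, self.F_out = K, F_out
        self.W = nn.Parameter(torch.randn(K, 1, F_out))   # W^k : R^{F x 1} per head
        self.a = nn.Parameter(torch.randn(K, 2 * F_out))  # parameters of f_a

    def forward(self, x_t, M):
        # x_t: (N, 1) sales of all N variables at time t
        # M: (N, N) adjacency matrix, assumed to contain self-loops
        h = torch.einsum("nf,kfo->kno", x_t, self.W)      # W^k x_j^t, (K, N, F_out)
        # e_ij = LeakyReLU(a^T [W^k x_i ; W^k x_j]), split into source/target parts
        src = torch.einsum("kno,ko->kn", h, self.a[:, :self.F_out])
        dst = torch.einsum("kno,ko->kn", h, self.a[:, self.F_out:])
        e = F.leaky_relu(src.unsqueeze(2) + dst.unsqueeze(1))   # (K, N, N)
        e = e.masked_fill(M.unsqueeze(0) == 0, float("-inf"))   # restrict to N_i
        alpha = torch.softmax(e, dim=2)                         # alpha^k_ij
        out = torch.einsum("knm,kmo->kno", alpha, h).mean(0)    # average over K heads
        return torch.elu(out)                                   # sigma(...), (N, F_out)
```

Applying this layer to each of the $\tau$ columns of $X$ and flattening the $N \times F$ outputs column-by-column yields $X_G \in \mathbb{R}^{FN \times \tau}$.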
(2) Recurrent Component
- in the previous step…
  - variable-to-variable relationships have been processed
- use GRU as the recurrent layer
  - input : $X_G \in \mathbb{R}^{FN \times \tau}$
  - notation
    - $d_r$ : hidden size of GRU
  - output : $X_R \in \mathbb{R}^{d_r \times \tau}$
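A sketch of the recurrent component under the same assumptions, feeding the $\tau$ columns of $X_G$ into a GRU with hidden size $d_r$:

```python
import torch
import torch.nn as nn

F_out, N, tau, d_r = 4, 10, 12, 32        # hypothetical sizes
gru = nn.GRU(input_size=F_out * N, hidden_size=d_r, batch_first=True)

X_G = torch.randn(1, tau, F_out * N)      # columns of X_G as a length-tau sequence
H, _ = gru(X_G)                           # H: (1, tau, d_r) = [h_t, ..., h_{t+tau-1}]
X_R = H.squeeze(0).T                      # X_R in R^{d_r x tau}, as in the notation above
```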
(3) Variable-wise Temporal attention
previous steps
- 1) GAT : captured “inter-dependencies”
- 2) recurrent component : captured “temporal patterns”
- but both could be “DYNAMIC”
  → use “TEMPORAL attention”
Temporal attention
- $\alpha_{t+\tau-1} = f_a(H_{t+\tau-1}, h_{t+\tau-1})$
  - $\alpha_{t+\tau-1} \in \mathbb{R}^{\tau \times 1}$
  - $f_a$ : scoring function
  - $h_{t+\tau-1}$ : last hidden state of the RNN
  - $H_{t+\tau-1} = [h_t, \dots, h_{t+\tau-1}]$
“VARIABLE-wise” temporal attention
- various products may have rather different temporal characteristics, such as periodicity!
- $\alpha_{t+\tau-1}^i = f_a(H_{t+\tau-1}^i, h_{t+\tau-1}^i)$
  - $i = 1, 2, \dots, d_r$
  - the attention is calculated for each individual GRU hidden variable
- weighted context vector of the $i$-th hidden variable :
  - $c_{t+\tau-1}^i = H_{t+\tau-1}^i \alpha_{t+\tau-1}^i$
  - $H_{t+\tau-1}^i \in \mathbb{R}^{1 \times \tau}$
  - $\alpha_{t+\tau-1}^i \in \mathbb{R}^{\tau \times 1}$
- context vector of all hidden variables :
  - $c_{t+\tau-1} = [c_{t+\tau-1}^1; \dots; c_{t+\tau-1}^{d_r}] \in \mathbb{R}^{d_r \times 1}$ ( each $c_{t+\tau-1}^i$ is a scalar )
Calculate the final output for horizon $h$ as

$$y_{t+\tau-1+h} = W [c_{t+\tau-1}; h_{t+\tau-1}] + b$$

- $W \in \mathbb{R}^{N \times 2 d_r}$ and $b \in \mathbb{R}^{1 \times 1}$
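A sketch of the variable-wise attention plus the output layer, assuming a simple dot-product scoring for $f_a$ (the paper's choice may differ); since each $H_{t+\tau-1}^i$ is a $1 \times \tau$ row of scalars, the per-variable score reduces to an element-wise product:

```python
import torch
import torch.nn as nn

tau, d_r, N = 12, 32, 10                  # hypothetical sizes
H = torch.randn(d_r, tau)                 # row i = H^i_{t+tau-1}, history of hidden variable i
h_last = H[:, -1]                         # h_{t+tau-1}, last GRU hidden state

scores = H * h_last.unsqueeze(1)          # f_a(H^i, h^i) per variable, (d_r, tau)
alpha = torch.softmax(scores, dim=1)      # one attention vector alpha^i per hidden variable
c = (H * alpha).sum(dim=1)                # c^i = H^i alpha^i  ->  c_{t+tau-1} in R^{d_r}

proj = nn.Linear(2 * d_r, N)              # W in R^{N x 2 d_r}, plus bias b
y_nn = proj(torch.cat([c, h_last]))       # y_{t+tau-1+h} in R^N
```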
(4) Autoregressive Component
- just like LSTNet
- add an “autoregressive component”
- to capture the local trend of product demands
- linear bypass that predicts future demands from input data, to address the “scale problem”
- fit all products’ historical data with a single linear layer
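A sketch of the autoregressive bypass in the style of LSTNet's AR component: one linear layer over the last $q$ lags of each product's own series, shared across all products ($q$ is a hypothetical window size):

```python
import torch
import torch.nn as nn

N, tau, q = 10, 12, 7                     # hypothetical sizes; q = AR window length
ar = nn.Linear(q, 1)                      # a single linear layer shared by all products

X = torch.randn(N, tau)                   # raw input window {x_t, ..., x_{t+tau-1}}
y_ar = ar(X[:, -q:]).squeeze(-1)          # per-product linear forecast, (N,)
```

Because the bypass works on the raw series, it can track changes of scale that the nonlinear neural part tends to miss.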
(5) Final Prediction
- integrate the outputs of the…
  - 1) neural network part
  - 2) autoregressive component
- ( using an automatically learned weight )
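A sketch of the integration step; modeling the combination as a single learned scalar weight is an assumption for illustration (LSTNet itself simply sums the two parts):

```python
import torch
import torch.nn as nn

w = nn.Parameter(torch.tensor(0.5))       # automatically learned mixing weight (assumed form)
y_nn = torch.randn(10)                    # output of the neural network part
y_ar = torch.randn(10)                    # output of the autoregressive component
y_hat = w * y_nn + (1 - w) * y_ar         # final forecast of x_{t+tau-1+h}
```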