Accurate Demand Forecasting for Retails with Deep Neural Networks

Contents

  0. Abstract
  1. Introduction
  2. Problem Formulation
  3. Framework
    1. Graph Attention Component
    2. Recurrent Component
    3. Variable-wise Temporal attention
    4. Autoregressive Component
    5. Final Prediction


0. Abstract

Previous Works

  • prediction of each individual product item
  • adopt MTS forecasting approach

→ none of them leveraged “structural information of product items”

  • ex) product brand / multi-level categories


Proposal : DL-based prediction model to find…

  • 1) **inherent inter-dependencies**
  • 2) temporal characteristics

among product items!


1. Introduction

Univariate TS model

  • ex) ARIMA, AR, MA, ARMA,…
  • treat each product separately


Multivariate TS model

  • take into account the “INTER-dependencies” among items
  • ex) VAR


DL-based models

  • ex) RNN, LSTM, GRU
  • ex2) LSTNet
    • 1) CNN + GRU for MTS forecasting
    • 2) special recurrent-skip component
      • to capture very long-term periodic patterns
    • 3) assumption : all variables in MTS have same periodicity


Existing prediction methods ignore that product items have inherent structural information, e.g., the relations between product items and brands, and the relations among various product items (which may share the same multi-level categories).


Product tree

  • Internal nodes : product categories

  • Leaf nodes : product items

  • extend the product tree by incorporating product brands

    → construct a product graph structure


Product Graph structure

(Figure: product graph structure of 4 product items)

  • without the product graph structure as prior…

    • case 1) treat all product items equally

    • case 2) have to implicitly infer the inherent relationships

      ( but only at the cost of accuracy )


Structural Temporal Attention Network (STANet)

  • predict product demands in MTS forecasting

  • using the graph structure above

  • incorporates both…

    • 1) the product graph structure ……… via GAT
    • 2) temporal characteristics of product items …… via GRU + temporal attention
  • both 1) & 2) may “change over time”

    → use “attention mechanism” to deal with these

  • based on “GAT”, “GRU”, “Special temporal attention”


2. Problem Formulation

Data : transaction records

  • 1) time stamp
  • 2) item ID
    • ( + 4 product categories & 1 product brand )
  • 3) amount sold

total of 8 fields


Pre-processing

  • 1) convert the transaction records into MTS
  • 2) for each category/brand : SUM the volumes of its member items

result : MTS of sales volumes of (1) product items, (2) categories, (3) brands
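
A minimal pandas sketch of this pre-processing ( column names such as `timestamp`, `item_id`, `amount` are illustrative assumptions, not from the paper; only one category level is shown, while the data has 4 ):

```python
import pandas as pd

# hypothetical transaction log : one row per sale event
df = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-01", "2024-01-01", "2024-01-02"]),
    "item_id":   ["A", "B", "A"],
    "category":  ["c1", "c2", "c1"],
    "brand":     ["b1", "b1", "b1"],
    "amount":    [3, 5, 2],
})

# 1) item-level MTS : one column (variable) per product item
item_ts = df.pivot_table(index="timestamp", columns="item_id",
                         values="amount", aggfunc="sum", fill_value=0)

# 2) category- / brand-level MTS : SUM the volumes of member items
cat_ts = df.pivot_table(index="timestamp", columns="category",
                        values="amount", aggfunc="sum", fill_value=0)
brand_ts = df.pivot_table(index="timestamp", columns="brand",
                          values="amount", aggfunc="sum", fill_value=0)

# result : N = (#items + #categories + #brands) variables
mts = pd.concat([item_ts, cat_ts, brand_ts], axis=1)
```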


Adjacency Matrix

  • info : product graph structure


Notation

  • $N_p$ : # of product items
  • $N$ : # of product items + brands + categories
  • $X = \{x_1, x_2, \dots, x_T\}$
    • where $x_t \in \mathbb{R}^{N \times 1}$, $t = 1, 2, \dots, T$
  • $M \in \mathbb{R}^{N \times N}$ : adjacency matrix


Goal :

  • Input : $\{x_t, \dots, x_{t+\tau-1}\}$ & (fixed) $M$

  • Output : $x_{t+\tau-1+h}$ ( $h$ : forecasting horizon )
  • model : $f_M : \mathbb{R}^{N \times \tau} \rightarrow \mathbb{R}^{N \times 1}$


Testing stage

  • only need to calculate the evaluation metric for the $N_p$ product items


3. Framework

(Figure: overall framework of STANet)


(1) Graph Attention Component

key point :

  • capture the inter-dependencies between different variables

    → use GNN to capture them

  • dynamic : “inter-dependencies may change over time!”

    → use attention mechanism


First layer : multi-head “GAT” layer

  • input : $X \in \mathbb{R}^{N \times \tau}$ & $M \in \mathbb{R}^{N \times N}$

  • at time step $t$ …

    $h_i^t = \sigma\left( \frac{1}{K} \sum_{k=1}^{K} \sum_{j \in \mathcal{N}_i} \alpha_{ij}^k W^k x_j^t \right)$

    • $x_j^t$ : sale of variable (product, or brand, or category) $j$ at time step $t$
    • $W^k$ : linear transformation ( $W^k \in \mathbb{R}^{F \times 1}$ )
    • $\mathcal{N}_i$ : all the adjacent nodes of variable $i$
    • $K$ : # of attention heads
    • $\sigma$ : activation function
    • $\alpha_{ij}^k$ : attention coefficient
  • $\alpha_{ij}^k = \dfrac{\exp\left( \mathrm{LeakyReLU}\left( f_a(W^k x_i^t, W^k x_j^t) \right) \right)}{\sum_{l \in \mathcal{N}_i} \exp\left( \mathrm{LeakyReLU}\left( f_a(W^k x_i^t, W^k x_l^t) \right) \right)}$

    • $f_a$ : scoring function
  • output : $X^G \in \mathbb{R}^{FN \times \tau}$
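
A minimal PyTorch sketch of this per-time-step multi-head GAT ( my own illustration, not the authors' code; the additive scoring used for $f_a$ and the class name `TimeStepGAT` are assumptions ):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TimeStepGAT(nn.Module):
    """Multi-head GAT over one time step; heads are AVERAGED as in the equation above."""
    def __init__(self, out_dim: int, num_heads: int):
        super().__init__()
        self.K, self.F_out = num_heads, out_dim
        self.W = nn.Parameter(torch.randn(num_heads, 1, out_dim) * 0.1)   # W^k : 1 -> F
        self.a = nn.Parameter(torch.randn(num_heads, 2 * out_dim) * 0.1)  # additive f_a

    def forward(self, x_t, adj):
        # x_t : (N, 1) sales of all N variables at time t
        # adj : (N, N) product-graph adjacency M (assumed to include self-loops)
        z = torch.einsum("ni,kio->kno", x_t, self.W)                 # W^k x^t : (K, N, F)
        src = torch.einsum("kno,ko->kn", z, self.a[:, :self.F_out])  # score part of node i
        dst = torch.einsum("kno,ko->kn", z, self.a[:, self.F_out:])  # score part of node j
        e = F.leaky_relu(src.unsqueeze(2) + dst.unsqueeze(1))        # (K, N, N)
        e = e.masked_fill(adj.unsqueeze(0) == 0, float("-inf"))      # only adjacent nodes
        alpha = torch.softmax(e, dim=2)                              # α_ij^k
        h = torch.einsum("knm,kmo->kno", alpha, z).mean(dim=0)       # average over K heads
        return torch.sigmoid(h)                                      # σ(...) : (N, F)
```

Applying this layer at each of the $\tau$ time steps and flattening every $(N, F)$ output into an $FN$-vector yields $X^G \in \mathbb{R}^{FN \times \tau}$.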


(2) Recurrent Component

  • in the previous step…
    • variable-to-variable relationships have been processed
  • use GRU as recurrent layer
  • input : $X^G \in \mathbb{R}^{FN \times \tau}$
  • notation
    • $d_r$ : hidden size of GRU
  • output : $X^R \in \mathbb{R}^{d_r \times \tau}$
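
A shape-level sketch of this component ( batch dimension omitted; all sizes are illustrative ):

```python
import torch
import torch.nn as nn

F_dim, N, tau, d_r = 8, 100, 24, 64
x_g = torch.randn(F_dim * N, tau)            # X^G from the GAT component

gru = nn.GRU(input_size=F_dim * N, hidden_size=d_r)
# nn.GRU expects (seq_len, batch, input_size) : reshape (F*N, tau) -> (tau, 1, F*N)
out, h_last = gru(x_g.T.unsqueeze(1))
x_r = out.squeeze(1).T                       # X^R : (d_r, tau)
h_final = h_last.squeeze(0)                  # last hidden state h_{t+tau-1} : (1, d_r)
```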


(3) Variable-wise Temporal attention

previous step

  • 1) GAT : captured “inter-dependencies”

  • 2) recurrent component : captured “temporal patterns”

    • but, these patterns could be “DYNAMIC”

      → use “TEMPORAL attention”


Temporal attention

  • $\alpha_{t+\tau-1} = f_a(H_{t+\tau-1}, h_{t+\tau-1})$
    • $\alpha_{t+\tau-1} \in \mathbb{R}^{\tau \times 1}$
    • $f_a$ : scoring function
    • $h_{t+\tau-1}$ : last hidden state of the RNN
    • $H_{t+\tau-1} = [h_t, \dots, h_{t+\tau-1}]$


“VARIABLE-wise” temporal attention

  • various products may have rather different temporal characteristics, such as periodicity!
  • $\alpha_i^{t+\tau-1} = f_a(H_i^{t+\tau-1}, h_i^{t+\tau-1})$
    • $i = 1, 2, \dots, d_r$
  • the attention mechanism is calculated for “each individual GRU hidden variable”
  • weighted context vector of the $i$-th hidden variable :
    • $c_i^{t+\tau-1} = H_i^{t+\tau-1} \alpha_i^{t+\tau-1}$
      • $H_i^{t+\tau-1} \in \mathbb{R}^{1 \times \tau}$
      • $\alpha_i^{t+\tau-1} \in \mathbb{R}^{\tau \times 1}$
  • context vector of all hidden variables :
    • $c_{t+\tau-1}$ ( = the $c_i^{t+\tau-1}$ of all $d_r$ hidden variables stacked together )


Calculate the final output for horizon h as

  • $y_{t+\tau-1+h} = W [c_{t+\tau-1} ; h_{t+\tau-1}] + b$
    • $W \in \mathbb{R}^{N \times 2 d_r}$ and $b \in \mathbb{R}^{1 \times 1}$
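
A minimal sketch of the variable-wise attention and the final projection ( batch omitted; the element-wise scoring used for $f_a$ is an assumption, since the notes only require *some* scoring function ):

```python
import torch

d_r, tau, N = 64, 24, 120
H = torch.randn(d_r, tau)               # H_{t+tau-1} : all GRU hidden states
h_last = H[:, -1]                       # h_{t+tau-1} : (d_r,)

# one attention distribution PER hidden variable i :
# score each time step of row H_i against that row's last value h_i
scores = H * h_last.unsqueeze(1)        # f_a(H_i, h_i) : (d_r, tau)
alpha = torch.softmax(scores, dim=1)    # α_i^{t+tau-1} : (d_r, tau)

# weighted context per hidden variable : c_i = H_i @ α_i
c = (H * alpha).sum(dim=1)              # c_{t+tau-1} : (d_r,)

# final output for horizon h : y = W [c ; h_last] + b
W, b = torch.randn(N, 2 * d_r), torch.zeros(N)
y = W @ torch.cat([c, h_last]) + b      # (N,) demand prediction
```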


(4) Autoregressive Component

  • just like LSTNet
  • add an “autoregressive component”
    • to capture the local trend of product demands
    • a linear bypass that predicts future demands from the raw input data, to address the “scale problem”
  • fit all products’ historical data with a single linear layer
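
A sketch of such an LSTNet-style AR bypass ( the window length `q` is a hypothetical hyperparameter ):

```python
import torch
import torch.nn as nn

N, tau, q = 120, 24, 7
x = torch.randn(N, tau)           # raw input window (before the neural network part)

ar = nn.Linear(q, 1)              # ONE linear layer shared by all products
y_ar = ar(x[:, -q:]).squeeze(1)   # (N,) : linear local-trend forecast per variable
```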


(5) Final Prediction

  • integrate the outputs of the…

    • 1) neural network part
    • 2) autoregressive component

    ( using an automatically learned weight )
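
A minimal sketch, assuming the “automatically learned weight” is a learnable scalar gate ( one plausible realization; the paper may use a different form ):

```python
import torch
import torch.nn as nn

class Combine(nn.Module):
    """Blend the neural-network output with the AR output via a learned weight."""
    def __init__(self):
        super().__init__()
        self.w = nn.Parameter(torch.tensor(0.5))   # learned jointly with the model

    def forward(self, y_nn, y_ar):
        return self.w * y_nn + (1.0 - self.w) * y_ar
```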
