Dilated Recurrent Neural Networks (2017)
Contents
- Abstract
- Introduction
- Dilated RNN
- Dilated recurrent skip-connection
- Exponentially Increasing Dilation
0. Abstract
Training RNNs on long sequences is difficult!
[ 3 main challenges ]
- 1) complex dependencies
- 2) vanishing / exploding gradients
- 3) efficient parallelization
Simple, yet effective RNN structure: DilatedRNN
- key : dilated recurrent skip connections
- advantages
- reduce # of parameters
- enhance training efficiency
- match SOTA
1. Introduction
attempts to overcome problems of RNNs
- LSTM, GRU, clockwork RNNs, phased LSTM, hierarchical multi-scale RNNs
Dilated CNNs
- the length of dependencies captured by a dilated CNN is limited by its kernel size, whereas an RNN's autoregressive modeling can capture potentially infinitely long dependencies
→ introduce DilatedRNN
2. Dilated RNN
main ingredients of Dilated RNN
- 1) Dilated recurrent skip-connection
- 2) use of exponentially increasing dilation
(1) Dilated recurrent skip-connection
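The dilated recurrent skip-connection replaces the usual one-step recurrence $c_t^{(l)} = f(x_t^{(l)}, c_{t-1}^{(l)})$ with $c_t^{(l)} = f(x_t^{(l)}, c_{t-s^{(l)}}^{(l)})$, where $s^{(l)}$ is the dilation of layer $l$. Below is a minimal sketch of one such layer, assuming a PyTorch `GRUCell` as the per-step cell; the class name `DilatedRecurrentLayer` and its interface are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class DilatedRecurrentLayer(nn.Module):
    """One recurrent layer whose skip connection spans `dilation` steps:
    c_t = cell(x_t, c_{t - dilation})  instead of  c_t = cell(x_t, c_{t-1})."""
    def __init__(self, input_size, hidden_size, dilation):
        super().__init__()
        self.cell = nn.GRUCell(input_size, hidden_size)
        self.hidden_size = hidden_size
        self.dilation = dilation

    def forward(self, x):                        # x: (seq_len, batch, input_size)
        seq_len, batch, _ = x.shape
        # keep one running state per phase of the dilation,
        # so states[t % dilation] always holds c_{t - dilation}
        states = [x.new_zeros(batch, self.hidden_size) for _ in range(self.dilation)]
        outputs = []
        for t in range(seq_len):
            h = self.cell(x[t], states[t % self.dilation])
            states[t % self.dilation] = h
            outputs.append(h)
        return torch.stack(outputs)              # (seq_len, batch, hidden_size)
```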
(2) Exponentially Increasing Dilation
- stack dilated recurrent layers
- (similar to WaveNet) dilation increases exponentially across layers
- $s^{(l)}$ : dilation of the $l$-th layer
- $s^{(l)} = M^{l-1}, \quad l = 1, \cdots, L$
- ex) Figure 2 depicts an example with $L = 3$ and $M = 2$ (a code sketch follows this list)
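As a rough illustration of the exponential schedule, the sketch below stacks the hypothetical `DilatedRecurrentLayer` from the previous sketch with dilations $M^{l-1}$; the helper names are assumptions, not the paper's code.

```python
import torch.nn as nn

def build_dilated_rnn(input_size, hidden_size, num_layers=3, M=2):
    """Stack layers with exponentially increasing dilation s(l) = M**(l-1),
    e.g. L=3, M=2 gives dilations [1, 2, 4] as in the Figure 2 example."""
    layers = []
    for l in range(1, num_layers + 1):
        in_size = input_size if l == 1 else hidden_size
        layers.append(DilatedRecurrentLayer(in_size, hidden_size, dilation=M ** (l - 1)))
    return nn.ModuleList(layers)

def forward_stack(layers, x):
    for layer in layers:   # the output sequence of layer l feeds layer l+1
        x = layer(x)
    return x
```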
Benefits
- 1) makes different layers focus on different temporal resolutions
- 2) reduces the average length of paths between nodes at different timestamps
Generalized Dilated RNN
- the dilation does not start at one, but at $M^{l_0}$
- $s^{(l)} = M^{l-1+l_0}, \quad l = 1, \cdots, L$ and $l_0 \ge 0$ (see the sketch below)
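For the generalized schedule, only the dilation computation changes; `l0` below corresponds to $l_0$ and is a hypothetical keyword argument for illustration.

```python
def generalized_dilations(num_layers, M=2, l0=0):
    """s(l) = M**(l - 1 + l0); l0 = 0 recovers [1, 2, 4] for L=3, M=2,
    while l0 = 1 starts at dilation M, giving [2, 4, 8]."""
    return [M ** (l - 1 + l0) for l in range(1, num_layers + 1)]
```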