24. Personalizing Dialogue Agents : I have a dog, do you have pets too? (2018)
Table of Contents
- Abstract
- Introduction
- PERSONA-CHAT Dataset
- Models
- Baseline ranking models
- Ranking Profile Memory Network
- Key-Value Profile Memory Network
- Seq2Seq
- Generative Profile Memory Network
Abstract
Problems of Chit-chat models
- lack specificity
- do not display consistent personality
Present the task of making chit-chat more “engaging”, by conditioning on profile info
collect data & train models to…
- 1) condition on their given profile information
- 2) use information about the person they are talking to
1. Introduction
common issues :
- 1) lack of consistent personality
- \(\because\) trained over many dialogues, each with different speakers
- 2) lack of explicit long-term memory
- \(\because\) trained to produce an utterance given only RECENT dialogue history
- 3) tendency to produce non-specific answers, ex) “I don’t know”
\(\rightarrow\) due to there being no good publicly available dataset!
Goal : make more ENGAGING chit-chat dialogue agents!
- give them a configurable & persistent persona ( encoded by multiple sentences = “profile” )
- trained to both ask & answer questions about personal topics
- the resulting dialogue can be used to build a model of the persona of the speaking partner!
2. PERSONA-CHAT Dataset
- crowd-sourced dataset
- collect via Amazon Mechanical Turk
- each pair of speakers condition their dialogue on a given profile, which is provided
3. Models
2 classes of model, for next utterance prediction
- 1) ranking models
- produce the next utterance by considering any utterance in the training set as a possible candidate reply
- 3-1) ~ 3-3)
- 2) generative models
- generate novel sentences, conditioning on the dialogue history
- generate the response word-by-word
- 3-4) ~ 3-5)
3-1) Baseline ranking models
- 1) IR baseline & 2) Starspace
- similarity function \(sim (q, c')\)
- cosine similarity of the sum of word embeddings of query \(q\) and candidate \(c'\)
- in both 1) & 2), to incorporate the profile, concatenate it to the query vector
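A minimal sketch of this scoring scheme, assuming a toy vocabulary and a randomly initialized embedding table (the paper uses trained embeddings, e.g. learned by Starspace); `sent_vec` and `sim` are illustrative names, not from the paper:

```python
import torch
import torch.nn.functional as F

# toy vocabulary & embedding table (assumption: real models use trained embeddings)
vocab = {"i": 0, "have": 1, "a": 2, "dog": 3, "do": 4, "you": 5, "pets": 6, "too": 7}
embed = torch.nn.Embedding(len(vocab), 16)

def sent_vec(words):
    """Sum of word embeddings of a tokenized sentence."""
    idx = torch.tensor([vocab[w] for w in words])
    return embed(idx).sum(dim=0)

def sim(q_words, c_words):
    """sim(q, c') : cosine similarity of the summed embeddings."""
    return F.cosine_similarity(sent_vec(q_words), sent_vec(c_words), dim=0)

# incorporate the profile by concatenating it to the query
profile = ["i", "have", "a", "dog"]
query = ["do", "you", "have", "pets", "too"]
score = sim(profile + query, ["i", "have", "a", "dog", "too"])
```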
3-2) Ranking Profile Memory Network
Comparison
- baseline models : use profile information by combining it with the dialogue query
- this model : use a memory network with the dialogue history as input, which then performs attention over the profile to find relevant lines
\(q^{+}=q+\sum_{i} s_{i} p_{i}, \quad s_{i}=\operatorname{Softmax}\left(\operatorname{sim}\left(q, p_{i}\right)\right)\)
- then, rank candidates \(c^{\prime}\) using \(\operatorname{sim}\left(q^{+}, c^{\prime}\right)\)
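A sketch of this attention step, assuming the query \(q\), profile sentences \(p_i\), and candidates \(c'\) have already been embedded as vectors (e.g. summed word embeddings as above); the function name is hypothetical:

```python
import torch
import torch.nn.functional as F

def rank_with_profile(q, P, C):
    """q : (d,) query vector; P : (n_profile, d) profile entries;
    C : (n_cand, d) candidate replies. Returns candidates ranked by sim(q+, c')."""
    s = torch.softmax(F.cosine_similarity(P, q.unsqueeze(0)), dim=0)  # s_i
    q_plus = q + (s.unsqueeze(1) * P).sum(dim=0)  # q+ = q + sum_i s_i p_i
    scores = F.cosine_similarity(C, q_plus.unsqueeze(0))
    return scores.argsort(descending=True)
```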
3-3) Key-Value Profile Memory Network
improvement to the memory network : perform attention over keys and output the values
- keys : dialogue histories
- values : next dialogue utterances
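A sketch of the key-value read, assuming `K` holds encoded dialogue histories from the training set and `V` the encodings of their next utterances; names are illustrative. The returned vector can then be combined with the query before ranking, as in 3-2):

```python
import torch
import torch.nn.functional as F

def kv_read(q, K, V):
    """Attend over keys (dialogue histories), output a weighted
    sum of the corresponding values (next utterances)."""
    a = torch.softmax(F.cosine_similarity(K, q.unsqueeze(0)), dim=0)
    return (a.unsqueeze(1) * V).sum(dim=0)  # blended next-utterance vector
```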
3-4) Seq2Seq
- encode the input sentence by \(h_{t}^{e}=\operatorname{LSTM}_{enc}\left(x_{t} \mid h_{t-1}^{e}\right)\)
- use GloVe for word embeddings
- the final hidden state \(h_{t}^{e}\) is fed into the decoder as its initial state \(h_{0}^{d}\)
- for each time step \(t\), the decoder produces the probability of a word \(j\) occurring in that place via the softmax: \(p\left(y_{t, j}=1 \mid y_{t-1}, \ldots, y_{1}\right)=\frac{\exp \left(w_{j} h_{t}^{d}\right)}{\sum_{j^{\prime}=1}^{K} \exp \left(w_{j^{\prime}} h_{t}^{d}\right)}\)
- trained via negative log-likelihood
- can be extended to include persona information by prepending the profile to the input : \(x=\forall p \in P \,\|\, x\), where \(\|\) denotes concatenation
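A minimal PyTorch sketch of this encoder-decoder, with toy dimensions; embeddings are GloVe-initialized in the paper, and target shifting / decoding details are omitted here for brevity:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab_size, d=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d)   # GloVe-initialized in the paper
        self.enc = nn.LSTM(d, d, batch_first=True)
        self.dec = nn.LSTM(d, d, batch_first=True)
        self.out = nn.Linear(d, vocab_size)      # rows play the role of the w_j above

    def forward(self, x, y):
        # persona variant : prepend the profile tokens to x before encoding
        _, state = self.enc(self.emb(x))         # final encoder state
        h_dec, _ = self.dec(self.emb(y), state)  # h_0^d = final h_t^e
        return self.out(h_dec)                   # logits -> softmax over words

model = Seq2Seq(vocab_size=1000)
x = torch.randint(0, 1000, (1, 7))               # input utterance (token ids)
y = torch.randint(0, 1000, (1, 5))               # teacher-forced reply
loss = nn.CrossEntropyLoss()(model(x, y).view(-1, 1000), y.view(-1))  # NLL
```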
3-5) Generative Profile Memory Network
- generative model that encodes “each of the profile entries as individual memory representations in a memory network”
- the decoder attends over the encoded profile entries \(F\) at each step :
\(a_{t}=\operatorname{softmax}\left(F W_{a} h_{t}^{d}\right), \quad c_{t}=a_{t}^{\top} F ; \quad \hat{x}_{t}=\tanh \left(W_{c}\left[c_{t-1}, x_{t}\right]\right)\)
- if the model has no profile information ( no memory ), it becomes the same as the Seq2Seq model in 3-4)
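A sketch of this decoder-side attention, assuming `F_mem` is the matrix of encoded profile entries (one row per profile sentence) and the weight matrices are random stand-ins for learned parameters:

```python
import torch

d = 64
F_mem = torch.randn(4, d)     # 4 encoded profile entries (rows of F)
W_a = torch.randn(d, d)       # attention weights (stand-in)
W_c = torch.randn(d, 2 * d)   # combination weights (stand-in)

def profile_attention(h_dec, c_prev, x_t):
    a_t = torch.softmax(F_mem @ W_a @ h_dec, dim=0)     # a_t = softmax(F W_a h_t^d)
    c_t = F_mem.T @ a_t                                 # c_t = a_t^T F
    x_hat = torch.tanh(W_c @ torch.cat([c_prev, x_t]))  # x^_t = tanh(W_c [c_{t-1}, x_t])
    return c_t, x_hat   # x^_t is fed to the decoder at step t
```

With an empty memory (no profile entries), the attention read contributes nothing and the model falls back to the plain Seq2Seq behaviour, matching the last bullet above.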