24. Personalizing Dialogue Agents : I have a dog, do you have pets too? (2018)

Table of Contents

  1. Abstract
  2. Introduction
  3. PERSONA-CHAT Dataset
  4. Models
    1. Baseline ranking models
    2. Ranking Profile Memory Network
    3. Key-Value Profile Memory Network
    4. Seq2Seq
    5. Generative Profile Memory Network


Abstract

Problems of Chit-chat models

  • lack specificity
  • do not display consistent personality


Present the task of making chit-chat more “engaging” by conditioning on profile information

collect data & train models to…

  • 1) condition on their given profile information
  • 2) condition on information about the person they are talking to


1. Introduction

common issues :

  • 1) lack of consistent personality
    • \(\because\) trained on dialogues from many different speakers
  • 2) lack of explicit long-term memory
    • \(\because\) trained to produce an utterance given only the RECENT dialogue history
  • 3) tendency to produce non-specific answers... ex) “I don’t know”

\(\rightarrow\) due to there being no good publicly available dataset!


Goal : make more ENGAGING chit-chat dialogue agents!

  • give them a configurable & persistent persona

    ( encoded by multiple sentences = “profile” )

  • trained to both ask & answer questions about personal topics

  • resulting dialogue can be used to build a model of the persona of the speaking partner!


2. PERSONA-CHAT Dataset

  • crowd-sourced dataset
  • collect via Amazon Mechanical Turk
  • each pair of speakers conditions their dialogue on a given profile


3. Models

2 classes of model, for next utterance prediction

  • 1) ranking models
    • produce the next utterance by considering any utterance in the training set as a possible candidate reply
    • 3-1) ~ 3-3)
  • 2) generative models
    • generate novel sentences, conditioning on the dialogue history
    • generate the response word-by-word
    • 3-4) ~ 3-5)


3-1) Baseline ranking models

  • 1) IR baseline & 2) StarSpace
  • similarity function \(\operatorname{sim}(q, c')\)
    • cosine similarity of the sum of word embeddings of the query \(q\) and the candidate \(c'\)
  • in both 1) & 2), to incorporate the profile, concatenate it to the query vector ( sketched below )
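
A minimal NumPy sketch of this baseline scoring ( the helper names `embed_sum` / `rank_candidates` and the `emb` lookup table are assumptions for illustration, not from the paper ) :

```python
import numpy as np

def embed_sum(tokens, emb):
    # emb : dict token -> embedding vector ; sum the embeddings of known tokens
    return np.sum([emb[t] for t in tokens if t in emb], axis=0)

def sim(u, v):
    # cosine similarity, used as sim(q, c')
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)

def rank_candidates(query_tokens, profile_tokens, candidates, emb):
    # incorporate the profile by concatenating it to the query
    q = embed_sum(query_tokens + profile_tokens, emb)
    scores = [sim(q, embed_sum(c, emb)) for c in candidates]
    return int(np.argmax(scores))   # index of the best-scoring candidate reply
```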


3-2) Ranking Profile Memory Network

Comparison

  • baseline models : use profile information by “combining it with the dialogue”

  • this model : use a memory network with the dialogue history as input, then perform attention over the profile


\(q^{+}=q+\sum s_{i} p_{i}, \quad s_{i}=\operatorname{Softmax}\left(\operatorname{sim}\left(q, p_{i}\right)\right)\).

  • then, rank candidates \(c^{\prime}\) using \(\operatorname{sim}\left(q^{+}, c^{\prime}\right)\) ( the attention step is sketched below )
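
A minimal NumPy sketch of this attention step ( representing the profile as a matrix `P` whose rows are the sentence embeddings \(p_i\) is an assumption for illustration, not the paper's code ) :

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attend_profile(q, P):
    # q : (d,) query embedding, P : (n, d) profile sentence embeddings p_i
    sims = P @ q / (np.linalg.norm(P, axis=1) * np.linalg.norm(q) + 1e-8)
    s = softmax(sims)      # s_i = Softmax(sim(q, p_i)), with cosine similarity
    return q + s @ P       # q+ = q + sum_i s_i p_i

# candidates c' are then ranked with sim(q_plus, c') exactly as in 3-1)
```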


3-3) Key-Value Profile Memory Network

an improvement to the memory network : perform attention over keys and output the values ( see the sketch below )

  • keys : dialogue histories ( from the training set )
  • values : the next dialogue utterances
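
A sketch of one key-value read ( treating `K` / `V` as matrices of encoded dialogue histories and their next utterances, and combining the readout with the query as in 3-2), are assumptions for illustration, not the paper's code ) :

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kv_read(q, K, V):
    # K : (n, d) encoded dialogue histories ( keys )
    # V : (n, d) encoded next utterances    ( values )
    a = softmax(K @ q)   # attention is computed over the keys...
    return q + a @ V     # ...but the output is read from the values
```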


3-4) Seq2Seq

  • encode input sentence by \(h_{t}^{e}=L S T M_{e n c}\left(x_{t} \mid h_{t-1}^{e}\right) .\)

  • use GloVe word embeddings

  • the final hidden state \(h_{t}^{e}\) is fed into the decoder as its initial state \(h_{0}^{d}\)

  • For each time step \(t\), the decoder produces the probability of word \(j\) occurring in that place via the softmax :

    \(p\left(y_{t, j}=1 \mid y_{t-1}, \ldots, y_{1}\right)=\frac{\exp \left(w_{j} h_{t}^{d}\right)}{\sum_{j^{\prime}=1}^{K} \exp \left(w_{j^{\prime}} h_{t}^{d}\right)}\)

  • trained via negative log likelihood.

  • can be extended to include persona information!

    prepend the profile sentences to the input sequence : \(x=\forall p \in P \;\|\; x\), where \(\|\) denotes concatenation ( sketched below )
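
A small sketch of both pieces, the per-step softmax over the vocabulary and the persona prepending ( `W` as the output projection with rows \(w_j\), and the token-list representation, are assumptions for illustration ) :

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def word_probs(h_dec, W):
    # p(y_{t,j} = 1 | y_{t-1}, ..., y_1) : softmax over the K vocabulary words;
    # row W[j] plays the role of w_j in the formula above
    return softmax(W @ h_dec)

def prepend_profile(profile_sentences, x_tokens):
    # x = ( forall p in P ) || x : put every profile sentence before the input
    flat = [tok for p in profile_sentences for tok in p]
    return flat + x_tokens
```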


3-5) Generative Profile Memory Network

  • generative model that encodes “each of the profile entries as individual memory representations in a memory network”

  • the decoder attends over the profile memory \(F\) ( rows = encoded profile entries ) at each step :

    \(a_{t}=\operatorname{softmax}\left(F W_{a} h_{t}^{d}\right), \quad c_{t}=a_{t}^{\top} F, \quad \hat{x}_{t}=\tanh \left(W_{c}\left[c_{t-1}, x_{t}\right]\right)\)

  • if the model has no profile information ( no memory ), it reduces to the Seq2Seq model in 3-4) ( one decoder step is sketched below )
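
A sketch of one decoder step with the memory read ( `F` as a matrix whose rows are the encoded profile entries, and the parameter shapes of `W_a` / `W_c`, are assumptions for illustration ) :

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def memory_step(h_dec, x_t, c_prev, F, W_a, W_c):
    # F : (n, d_mem) encoded profile entries, h_dec : decoder state h_t^d
    a = softmax(F @ (W_a @ h_dec))                        # a_t = softmax(F W_a h_t^d)
    c_t = a @ F                                           # c_t = a_t^T F
    x_hat = np.tanh(W_c @ np.concatenate([c_prev, x_t]))  # x̂_t = tanh(W_c [c_{t-1}, x_t])
    return c_t, x_hat   # x̂_t replaces the raw input at the next decoder step
```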