[ Recommender System ]

13. [code] Wide & Deep Learning for Recommender Systems

( Reference : Fastcampus recommender system lecture )

paper : Wide & Deep Learning for Recommender Systems ( HT Cheng et al., 2016 )

( https://arxiv.org/abs/1606.07792 )

1. Data Preprocessing

  • Basic preprocessing
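import pandas as pd

# Assumes movies_df is indexed by movie id and train_df['movie'] holds those ids.
# Multi-hot encode the '/'-separated genres, then look them up per training row.
dummy_genres_df = movies_df['genres'].str.get_dummies(sep='/')
train_genres_df = train_df['movie'].apply(lambda x: dummy_genres_df.loc[x])

# One-hot encode the grade (age rating) and look it up the same way.
dummy_grade_df = pd.get_dummies(movies_df['grade'], prefix='grade')
train_grade_df = train_df['movie'].apply(lambda x: dummy_grade_df.loc[x])

# Attach the release year, then append the dummy columns to train_df.
train_df['year'] = train_df['movie'].map(movies_df['year'])
train_df = pd.concat([train_df, train_grade_df, train_genres_df], axis=1)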
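
To make the genre encoding concrete, here is a minimal plain-Python sketch of the kind of multi-hot table that splitting on '/' produces. The movie ids and genre strings are invented for illustration, and the helper `multi_hot` is hypothetical (it is not part of pandas).

```python
# Toy data, invented for illustration only.
genres_by_movie = {
    10001: "Drama/Romance",
    10002: "Action",
    10003: "Action/Drama",
}

def multi_hot(genres_by_movie, sep="/"):
    """Return (sorted vocabulary, {movie_id: 0/1 vector}) for separator-joined labels."""
    vocab = sorted({g for s in genres_by_movie.values() for g in s.split(sep)})
    rows = {
        movie_id: [1 if g in s.split(sep) else 0 for g in vocab]
        for movie_id, s in genres_by_movie.items()
    }
    return vocab, rows

vocab, rows = multi_hot(genres_by_movie)
print(vocab)
for movie_id, row in rows.items():
    print(movie_id, row)
```

A movie with several genres gets several 1s in its row, which is why the genre columns are multi-hot while the grade columns are one-hot.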


2. WIDE part

  • We want to capture the interactions between genre and grade
wide_cols = list(dummy_genres_df.columns) + list(dummy_grade_df.columns)


  • Generate the cross-product columns
from itertools import permutations

# All ordered pairs (a, b) of distinct wide columns, used as crossed columns.
# This is the same set of pairs obtained by enumerating
# product(wide_cols, repeat=len(wide_cols)) and dropping a == b,
# but without materializing n**n tuples first.
cross_cols = list(permutations(wide_cols, 2))
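
In the paper, the cross-product transformation of binary features is 1 only when every participating feature is 1. The sketch below illustrates that idea on a single toy row; the column names and the helper `cross_feature` are hypothetical, and it does not reproduce how the library encodes crossed columns internally.

```python
from itertools import combinations, permutations

# Hypothetical one-hot column names, for illustration only.
cols = ["Action", "Drama", "grade_12", "grade_15"]

ordered_pairs = list(permutations(cols, 2))    # (a, b) and (b, a) both appear
unordered_pairs = list(combinations(cols, 2))  # each pair appears once

def cross_feature(row, pair):
    """Cross-product of two binary features: 1 only when both are 1."""
    a, b = pair
    return row[a] * row[b]

# One toy training row: an Action movie with grade_15.
row = {"Action": 1, "Drama": 0, "grade_12": 0, "grade_15": 1}
active = [p for p in unordered_pairs if cross_feature(row, p) == 1]

print(len(ordered_pairs), len(unordered_pairs))
print(active)
```

Ordered pairs carry each interaction twice, so unordered pairs would halve the number of crossed columns. The notes keep ordered pairs because the DEEP part later reads the first element of every pair.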


3. DEEP part

  • Features to embed
z_dim=16
embed_cols = list(set([(x[0], z_dim) for x in cross_cols]))
continuous_cols = ['year']
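
Each entry of `embed_cols` is a (column name, embedding dimension) tuple. As a small sanity check, assuming the crossed columns contain every ordered pair of distinct wide columns, taking the first element of each pair simply recovers the wide columns themselves. The column names below are hypothetical.

```python
from itertools import permutations

wide_cols = ["Action", "Drama", "grade_12"]  # hypothetical names
z_dim = 16
cross_cols = list(permutations(wide_cols, 2))

# Built from the crossed pairs, as in the notes.
via_cross = sorted(set((a, z_dim) for a, _ in cross_cols))
# Built directly from the wide columns.
direct = sorted((c, z_dim) for c in wide_cols)

print(via_cross == direct)
```

Under that assumption the direct form is the easier one to read, and it also avoids the arbitrary ordering that `set` introduces.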


Final target

  • Solve it as binary classification ( rating of 10 → 1, rating of 9 or below → 0 )
target = train_df['rate'].apply(lambda x: 1 if x > 9 else 0).values
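
Since the model is evaluated with accuracy, it helps to know the class balance that this threshold produces. A minimal sketch on made-up ratings (not the lecture's data):

```python
from collections import Counter

rates = [10, 9, 7, 10, 8, 10, 1]  # made-up ratings on a 1-10 scale

# Same rule as above: only a rating of 10 counts as positive.
target = [1 if r > 9 else 0 for r in rates]

counts = Counter(target)
positive_ratio = counts[1] / len(target)
# Accuracy obtained by always predicting the majority class.
majority_baseline = max(positive_ratio, 1 - positive_ratio)

print(target)
print(counts[1], counts[0], round(majority_baseline, 3))
```

Any accuracy reported during training should be read against this majority-class baseline.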


4. WIDE & DEEP

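import numpy as np  # used below for np.unique

# These class names follow the pytorch-widedeep release used in the lecture;
# other releases of the library may expose different names.
from pytorch_widedeep.preprocessing import WidePreprocessor, DensePreprocessor
from pytorch_widedeep.models import Wide, DeepDense, WideDeep
from pytorch_widedeep.metrics import Accuracy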


  • WIDE : WidePreprocessor ( wide_cols , crossed_cols )
preprocess_wide = WidePreprocessor(wide_cols=wide_cols, crossed_cols=cross_cols)
X_wide = preprocess_wide.fit_transform(train_df)
wide = Wide(wide_dim=np.unique(X_wide).shape[0], pred_dim=1)


  • DEEP : DensePreprocessor ( embed_cols , continuous_cols )
preprocess_deep = DensePreprocessor(embed_cols=embed_cols, continuous_cols=continuous_cols)
X_deep = preprocess_deep.fit_transform(train_df)
deepdense = DeepDense(
    hidden_layers=[64, 32],
    deep_column_idx=preprocess_deep.deep_column_idx,
    embed_input=preprocess_deep.embeddings_input,
    continuous_cols=continuous_cols,
)


5. Build model & Train

model = WideDeep(wide=wide, deepdense=deepdense)
model.compile(method="binary", metrics=[Accuracy])


model.fit(
    X_wide=X_wide,
    X_deep=X_deep,
    target=target,
    n_epochs=5,
    batch_size=256,
    val_split=0.1,
)
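
In the paper, the wide and deep components are trained jointly: their outputs are summed as logits and passed through a sigmoid, and the result is trained with logistic loss. The sketch below shows only that final combination step with hand-picked numbers; the function names are hypothetical and it is not the library's implementation.

```python
import math

def sigmoid(z):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def wide_deep_probability(wide_logit, deep_logit, bias=0.0):
    """P(Y=1 | x): sigmoid of the summed wide and deep logits plus a bias."""
    return sigmoid(wide_logit + deep_logit + bias)

def binary_cross_entropy(p, y):
    """Logistic loss for a single example with label y in {0, 1}."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

# Hand-picked logits, for illustration only.
p = wide_deep_probability(wide_logit=0.8, deep_logit=-0.3)
loss = binary_cross_entropy(p, y=1)
print(round(p, 4), round(loss, 4))
```

Because both parts contribute to the same logit, the gradient of the loss flows into the wide weights and the deep network at the same time, which is what distinguishes joint training from an ensemble of separately trained models.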
