[ Recommender System ]
13.[code] Wide and Deep Learning for Recommender System
( 참고 : Fastcampus 추천시스템 강의 )
paper : Wide and Deep Learning for Recommender System ( HT Cheng et al., 2016 )
( https://arxiv.org/abs/1606.07792 )
- Tensorflow : tf.keras.experimental.WideDeepModel & tf.estimator.DNNLinearCombinedClassifier
- Pytorch : pytorch-widedeep
1. Data Preprocessing
- 기본 전처리
dummy_genres_df = movies_df['genres'].str.get_dummies(sep='/')
train_genres_df = train_df['movie'].apply(lambda x: dummy_genres_df.loc[x])
dummy_grade_df = pd.get_dummies(movies_df['grade'], prefix='grade')
train_grade_df = train_df['movie'].apply(lambda x: dummy_grade_df.loc[x])
train_df['year'] = train_df.apply(lambda x: movies_df.loc[x['movie']]['year'], axis=1)
train_df = pd.concat([train_df, train_grade_df, train_genres_df], axis=1)
2. WIDE part
- genre & grade간의 interaction을 반영하고 싶음
wide_cols = list(dummy_genres_df.columns) + list(dummy_grade_df.columns)
- cross-product 을 생성
import itertools
from itertools import product
unique_combinations = list(list(zip(wide_cols, element))
for element in product(wide_cols, repeat = len(wide_cols)))
cross_cols = [item for sublist in unique_combinations for item in sublist]
cross_cols = list(set([x for x in cross_cols if x[0] != x[1]]))
3. DEEP part
- embedding하고 싶은 feature들
z_dim=16
embed_cols = list(set([(x[0], z_dim) for x in cross_cols]))
continuous_cols = ['year']
최종적인 target
- Binary Classification으로 풀 것 ( 10점 = 1 & 9점 이하 : 0 )
target = train_df['rate'].apply(lambda x: 1 if x > 9 else 0).values
4. WIDE & DEEP
from pytorch_widedeep.preprocessing import WidePreprocessor, DensePreprocessor
from pytorch_widedeep.models import Wide, DeepDense, WideDeep
from pytorch_widedeep.metrics import Accuracy
- WIDE :
WidePreprocessor ( wide_cols , crossed_cols )
preprocess_wide = WidePreprocessor(wide_cols=wide_cols, crossed_cols=cross_cols)
X_wide = preprocess_wide.fit_transform(train_df)
wide = Wide(wide_dim=np.unique(X_wide).shape[0], pred_dim=1)
- DEEP :
DensePreprocessor (embed_cols, continuous_cols)
preprocess_deep = DensePreprocessor(embed_cols=embed_cols, continuous_cols=continuous_cols)
X_deep = preprocess_deep.fit_transform(train_df)
deepdense = DeepDense(
hidden_layers=[64, 32],
deep_column_idx=preprocess_deep.deep_column_idx,
embed_input=preprocess_deep.embeddings_input,
continuous_cols=continuous_cols,
)
5. Build model & Train
model = WideDeep(wide=wide, deepdense=deepdense)
model.compile(method="binary", metrics=[Accuracy])
model.fit(
X_wide=X_wide,
X_deep=X_deep,
target=target,
n_epochs=5,
batch_size=256,
val_split=0.1,
)