2019 International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM)
Optimizing Ranking Algorithm in Recommender
System via Deep Reinforcement Learning
Jianhua Han 1,a, Yong Yu 2,b, Feng Liu 3,c, Ruiming Tang 4,d, Yuzhou Zhang 5,e
1,2 Data & Knowledge Management Lab, Professor of Computer Science Department, Shanghai Jiao Tong University, Shanghai, China
3 Harbin Institute of Technology
4,5 Huawei Noah’s Ark Lab, Shenzhen, China
a [email protected], b [email protected]
c [email protected]
d,e {tangruiming, zhangyuzhou3}@huawei.com
Abstract—Recommender system, which attempts to narrow
down selections for users based on their preference, plays a
crucial role in many E-commerce platforms like Amazon and
Taobao. Sponsored search can be regarded as a major revenue
contributor for recommender systems. The platform sorts the
items by a ranking function and charges the advertisers for the
users’ positive feedback (e.g., a click). However, traditional
platforms usually apply greedy ranking strategies (e.g., ranking
by bid × pCTR), which consider the recommendation process to
be static and focus only on the immediate reward.
In practice, recommendation processes may be highly
correlated with each other and cumulative reward needs to be
taken into account. To address these issues, this paper redefines
the ranking function and proposes a Deep Reinforcement
learning based Ranking Strategy (DRRS) to maximize the
cumulative reward of the platform. The experiments conducted
on real-world datasets demonstrate the effectiveness of the
proposed framework.
Keywords—Recommender system, Actor-Critic network, Deep
reinforcement learning, Ranking
I. INTRODUCTION
A recommender system is a subclass of information filtering
systems that seeks to predict the "rating" or "preference" a user
would give to an item [1]. Sponsored search, which has been
widely used in industry [2], can be regarded as a major revenue
contributor for recommender systems. Under the traditional
pay-per-click mechanism, the platform designs a ranking function
to sort the advertisements that have bid on the searched keywords,
and the advertisers are charged for each user click. The
generalized first price (GFP) mechanism [3] is applied as the
charging mechanism in this paper, which means that for each click,
the platform charges the corresponding advertiser its bid price.
Traditionally, the platform defines the ranking function as the
product of the bid price of the advertisement and the predicted
Click-Through Rate (pCTR), so as to maximize the immediate revenue
of the platform.
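As a concrete illustration of this baseline (a minimal sketch, not taken from the paper; the candidate advertisements, bids, and pCTR values below are made up), the platform can be viewed as sorting candidates by the score bid × pCTR and, under GFP, charging a clicked advertiser its own bid:

```python
from dataclasses import dataclass

@dataclass
class Ad:
    ad_id: str
    bid: float    # advertiser's bid price per click
    pctr: float   # predicted click-through rate in [0, 1]

def greedy_rank(ads):
    """Sort ads by the greedy score bid * pCTR (expected immediate revenue)."""
    return sorted(ads, key=lambda ad: ad.bid * ad.pctr, reverse=True)

def gfp_charge(ad, clicked):
    """Generalized first price: a clicked ad is charged its own bid."""
    return ad.bid if clicked else 0.0

if __name__ == "__main__":
    candidates = [Ad("a1", bid=1.2, pctr=0.03),
                  Ad("a2", bid=0.8, pctr=0.06),
                  Ad("a3", bid=2.0, pctr=0.01)]
    ranking = greedy_rank(candidates)
    print([ad.ad_id for ad in ranking])          # ['a2', 'a1', 'a3']
    print(gfp_charge(ranking[0], clicked=True))  # 0.8
```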
Most existing methods in recommender systems, including
collaborative filtering [4], hybrid methods [5] and content-based
methods [6], consider the whole recommendation procedure as an
independent process and do not take the bid into account. Besides,
traditional sponsored search ranking methods usually apply greedy
ranking strategies, e.g., ranking by bid × pCTR, which represents
the expected revenue of selecting the corresponding advertisement.
However, these methods suffer from two limitations.

Firstly, traditional recommender systems consider the
recommendation procedure as a static process and may ignore the
dynamics of user preference, e.g., the preference for previous
items will affect how the user feels about the next recommended
items. Secondly, traditional greedy ranking strategies are
designed to select the items which maximize the short-term reward
while completely overlooking whether these recommended items are
beneficial for the long-term reward [7]. For example, the movie
Star Wars Ⅰ may fail to be selected because of its low immediate
reward; however, recommending Star Wars Ⅰ makes the user want to
watch the other movies in the Star Wars series, which in turn
brings the platform more cumulative reward. Therefore, a ranking
strategy that takes the cumulative reward into account is needed.
In this paper we redefine the ranking function as f(bid, pCTR),
and our goal is to learn the f which maximizes the cumulative
reward of the platform.

Reinforcement Learning (RL) [8] is an area of machine learning
concerned with how agents ought to take actions in an interactive
environment to maximize some pre-defined cumulative reward. Due to
its generality, RL is studied in many disciplines, such as game
theory, control theory and operations research. In traditional
reinforcement learning, the problem spaces are very limited and an
environment has only a few possible states. Deep Reinforcement
Learning (DRL) was therefore proposed to deal with tasks with
large state spaces by applying deep neural networks. As an
important DRL model, Deep Deterministic Policy Gradient (DDPG) [9]
can handle problems with continuous action spaces. DDPG consists
of two eponymous components, the Actor network and the Critic
network. The Actor generates an action based on the given state,
and the Critic leverages a deep neural network to approximate the
optimal state-action value function.
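To make the Actor-Critic structure concrete, the following minimal PyTorch sketch (an illustration only, not the architecture used by DRRS; the state/action dimensions and layer sizes are arbitrary assumptions) shows an actor that maps a state to a continuous action and a critic that approximates Q(s, a):

```python
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps a state to a deterministic continuous action."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),  # action bounded in [-1, 1]
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Approximates the state-action value Q(s, a)."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

# Example: score a batch of 4 states with a 10-dim state and a 1-dim action.
actor, critic = Actor(10, 1), Critic(10, 1)
s = torch.randn(4, 10)
a = actor(s)
q = critic(s, a)  # shape (4, 1)
```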
In this paper, we consider the recommendation procedure between
the platform and users as sequential interactions and propose a
Deep Reinforcement learning based Ranking Strategy (DRRS) for
sponsored search in recommender systems. The new ranking function
f(bid, pCTR) replaces the greedy bid × pCTR product, and the
Actor-Critic model is utilized to learn the optimal f which
maximizes the cumulative reward of the platform. To avoid hurting
the performance of the commercial platform, we build an offline
recommender system to simulate the user-item interactions for the
given datasets.
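How such an offline simulator could be wired to a learning agent is sketched below; the `env` and `agent` interfaces and the reward signal are hypothetical placeholders for illustration, not the authors' implementation:

```python
def run_episode(env, agent, max_steps=100):
    """Roll out one simulated recommendation session and return its cumulative reward."""
    state = env.reset()                               # initial user/session state
    total_reward = 0.0
    for _ in range(max_steps):
        action = agent.act(state)                     # e.g., ranking-function parameters
        next_state, reward, done = env.step(action)   # simulated clicks / revenue
        agent.store(state, action, reward, next_state, done)  # replay buffer for training
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward
```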
The main contributions of this paper are summarized.