International Core Journal of Engineering 2020-26

2019 International Conference on Artificial Intelligence and Advanced Manufacturing (AIAM)
978-1-7281-4691-1/19/$31.00 ©2019 IEEE. DOI 10.1109/AIAM48774.2019.00011

Optimizing Ranking Algorithm in Recommender System via Deep Reinforcement Learning

Jianhua Han 1,a, Yong Yu 2,b, Feng Liu 3,c, Ruiming Tang 4,d, Yuzhou Zhang 5,e
1,2 Data & Knowledge Management Lab, Professor of Computer Science department
1,2 Shanghai Jiao Tong University
a [email protected], b [email protected]
Shanghai, China
3 Harbin Institute of Technology
4,5 Huawei Noah’s Ark Lab
c [email protected], d,e {tangruiming, zhangyuzhou3}@huawei.com
Shenzhen, China

Abstract—Recommender systems, which attempt to narrow down selections for users based on their preferences, play a crucial role in many e-commerce platforms such as Amazon and Taobao. Sponsored search can be regarded as a major revenue contributor of recommender systems. The platform sorts the items by a ranking function and charges the advertisers for the users’ positive feedback (e.g., clicks). However, traditional ranking methods usually apply greedy ranking strategies (e.g., bid ∗ pCTR), which consider the recommendation processes to be static and focus only on the immediate reward. In practice, recommendation processes may be highly correlated with each other, and the cumulative reward needs to be taken into account. To address these issues, this paper redefines the ranking function and proposes a Deep Reinforcement learning based Ranking Strategy (DRRS) to maximize the cumulative reward of the platform. Experiments conducted on real-world datasets demonstrate the effectiveness of the proposed framework.

Keywords—Recommender system, Actor-Critic network, Deep reinforcement learning, Ranking

I. INTRODUCTION

A recommender system is a subclass of information filtering system that seeks to predict the "rating" or "preference" a user would give to an item [1]. Sponsored search, which has been widely used in industry [2], can be regarded as a major revenue contributor of recommender systems. In the traditional pay-per-click mechanism, the platform designs a ranking function to sort the advertisements that have bid on the searched keywords, and the advertisers are charged for each user click. The generalized first price (GFP) mechanism [3] is applied as the charging mechanism in this paper, which means that for each click, the platform charges the corresponding advertiser its bid price. Traditionally, the platform defines the ranking function as the product of the bid price of the advertisement and the predicted Click-Through Rate (pCTR) in order to maximize the immediate revenue of the platform.

Most existing methods in recommender systems, including collaborative filtering [4], hybrid methods [5] and content-based methods [6], consider the whole recommendation procedure as an independent process and do not take the bid into account. Besides, traditional sponsored search ranking methods usually apply greedy ranking strategies, e.g., bid ∗ pCTR, which represents the expected revenue of selecting the corresponding advertisement. However, these methods suffer from two limitations.

Firstly, traditional recommender systems consider the recommendation procedure as a static process and may ignore the dynamics of user preference; e.g., a user's preference for previous items affects how the user responds to the next recommended items.

Secondly, traditional greedy ranking strategies are designed to select the items which maximize the short-term reward while completely overlooking whether these recommended items are beneficial for the long-term reward [7]. For example, the movie Star Wars Ⅰ may fail to be selected in the recommendation procedure due to its low immediate reward. However, recommending Star Wars Ⅰ may make the user want to watch the other films in the Star Wars series, which in turn brings the platform more cumulative reward. Therefore, a ranking strategy that takes the cumulative reward into account is needed. In this paper we redefine the ranking function as a function of the bid and the pCTR, and our goal is to learn the ranking function which maximizes the cumulative reward of the platform.

Reinforcement Learning (RL) [8] is an area of machine learning concerned with how agents ought to take actions in an interactive environment to maximize some pre-defined cumulative reward. Due to its generality, RL is studied in many disciplines, such as game theory, control theory and operations research. In traditional reinforcement learning, the problem spaces are very limited and an environment has only a few possible states. Deep Reinforcement Learning (DRL) was then proposed to deal with tasks with large state spaces by applying deep neural networks. As an important DRL model, Deep Deterministic Policy Gradient (DDPG) [9] can deal with problems with continuous action spaces. DDPG follows the Actor-Critic architecture and consists of two networks, an Actor and a Critic. The Actor generates an action based on the given state, and the Critic leverages a deep neural network to approximate the optimal state-action value function.

In this paper, we consider the recommendation procedure between the platform and users as sequential interactions and propose a Deep Reinforcement learning based Ranking Strategy (DRRS) for sponsored search in recommender systems. The Actor-Critic model is utilized to learn the optimal new ranking function of the bid and the pCTR, i.e., the one which maximizes the cumulative reward of the platform. To avoid hurting the performance of the commercial platform, we build an offline recommender system to simulate the user-item interactions for the given datasets. The main contributions of this paper are summarized.
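The traditional greedy strategy described in the introduction (rank by bid ∗ pCTR, charge the bid on each click under GFP) can be illustrated with a minimal Python sketch. The `Ad` class, its field names, the helper functions and the numeric values are all hypothetical and chosen only for illustration; they are not taken from the paper or its datasets.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Ad:
    """Hypothetical advertisement record (names and fields are assumptions)."""
    name: str
    bid: float   # price charged per click under GFP
    pctr: float  # predicted click-through rate, in [0, 1]


def expected_immediate_revenue(ad: Ad) -> float:
    # Under GFP a click costs the advertiser its bid,
    # so the expected revenue of showing the ad is bid * pCTR.
    return ad.bid * ad.pctr


def greedy_rank(ads: List[Ad]) -> List[Ad]:
    # Greedy strategy: sort by expected immediate revenue, highest first.
    # It looks only at the current step and ignores any long-term effect.
    return sorted(ads, key=expected_immediate_revenue, reverse=True)


# Illustrative values only.
ads = [Ad("A", 2.0, 0.10), Ad("B", 1.0, 0.30), Ad("C", 5.0, 0.02)]
ranked = greedy_rank(ads)
print([ad.name for ad in ranked])  # ['B', 'A', 'C']
```

Note that ad C, with the highest bid, is ranked last because its score depends only on the immediate product bid ∗ pCTR. This is the kind of purely short-term decision that a ranking strategy based on cumulative reward is meant to revisit.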
