
achieves the best performance over all baseline methods on both datasets. To verify the robustness of the proposed method, a significance test has been conducted. The calculated p-values under the evaluation metric on both datasets are less than 0.001, which validates the effectiveness of DRRS.

C. Convergence of the Proposed Model

To verify the convergence of the proposed model, the convergence curve is illustrated in Fig. 3. It can be observed that the performance curve (cumulative reward) increases monotonically until convergence after about 80 iterations, which demonstrates that DRRS does extract information from the training set and utilizes that information for recommendation.

Fig. 3. Convergence curve of DRRS

D. Hyper-parameter Investigation

The impact of different hyper-parameters on the performance of the proposed model is explored in this section on the ML100k dataset, including the number of learning steps and the number of hidden layers. Experiments are conducted by holding all other factors fixed while varying the hyper-parameter being explored.

(a) Impact of learning steps  (b) Impact of hidden layers
Fig. 4. Hyper-parameter Investigation

Number of Learning Steps: With the number of evaluation steps fixed at 10, the influence of different numbers of learning steps is studied. We compare four settings when training the model: 5, 10, 15, and 20. We can observe from Fig. 4(a) that fewer steps weaken the ability of the proposed model, while more steps bring similar performance at the cost of longer execution time. Therefore, the number of training steps is set to 10, the same as the number of evaluation steps, in this paper.

Number of Hidden Layers: We explore the impact of the hidden-layer depth on the performance of the proposed model. The depth is varied over {1, 2, 3, 4} for both the Actor and the Critic network. The result is illustrated in Fig. 4(b). We can observe that the recommendation performance initially increases with the number of hidden layers. However, as the depth grows, the performance decreases due to overfitting. The network with two hidden layers achieves the best result and is adopted as the setting of DRRS.
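To make the hidden-layer comparison concrete, the following minimal Python/PyTorch sketch shows one way an Actor and a Critic with a configurable number of hidden layers could be built and swept over the depths {1, 2, 3, 4}. The layer width of 128, the state and action dimensions, and the helper name mlp are illustrative assumptions and are not taken from the paper.

    import torch.nn as nn

    def mlp(in_dim, out_dim, n_hidden, width=128, out_act=None):
        # Stack n_hidden hidden layers (the quantity varied in Fig. 4(b)),
        # each followed by ReLU, then a final linear output layer.
        layers, d = [], in_dim
        for _ in range(n_hidden):
            layers += [nn.Linear(d, width), nn.ReLU()]
            d = width
        layers.append(nn.Linear(d, out_dim))
        if out_act is not None:
            layers.append(out_act)
        return nn.Sequential(*layers)

    # Hypothetical dimensions, for illustration only.
    state_dim, action_dim = 50, 10

    for n_hidden in (1, 2, 3, 4):  # the depths compared in the experiment
        actor = mlp(state_dim, action_dim, n_hidden, out_act=nn.Tanh())
        critic = mlp(state_dim + action_dim, 1, n_hidden)  # Q(s, a) -> scalar
        # ... train both networks as usual and record the cumulative reward ...

Under this reading, the best-performing configuration reported in Fig. 4(b) simply corresponds to n_hidden = 2 for both networks.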
"Google news personalization: scalable online collaborative filtering." Proceedings of the 16th international conference on World Wide Web. ACM, 2007. Liu, Jiahui, Peter Dolan, and Elin Rønby Pedersen. "Personalized news recommendation based on click behavior." Proceedings of the 15th international conference on Intelligent user interfaces. ACM, 2010. Kompan, Michal, and Mária Bieliková. "Content-based news recommendation." International conference on electronic commerce and web technologies. Springer, Berlin, Heidelberg, 2010. Shani, Guy, David Heckerman, and Ronen I. Brafman. "An MDP- based recommender system." Journal of Machine Learning Research 6.Sep (2005): 1265-1295. Sutton, Richard S., Andrew G. Barto, and Francis Bach. Reinforcement learning: An introduction. MIT press, 1998. Lillicrap, Timothy P., et al. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015). Silver, David, et al. "Mastering the game of Go with deep neural networks and tree search." nature 529.7587 (2016): 484. Levine, Sergey, et al. "End-to-end training of deep visuomotor policies." The Journal of Machine Learning Research 17.1 (2016): 1334-1373. Wu, Yuxin, and Yuandong Tian. "Training agent for first-person shooter game with actor-critic curriculum learning." (2016). Zhao, Xiangyu, et al. "Deep Reinforcement Learning for List-wise Recommendations." arXiv preprint arXiv:1801.00209 (2017). Zhao, Xiangyu, et al. "Deep Reinforcement Learning for Page-wise Recommendations." arXiv preprint arXiv:1805.02343 (2018). He, Li, et al. "Optimizing Sponsored Search Ranking Strategy by Deep Reinforcement Learning." arXiv preprint arXiv:1803.07347 (2018). Mnih, Volodymyr, et al. "Human-level control through deep sreinforcement learning." Nature 518.7540 (2015): 529. Mnih, Andriy, and Ruslan R. Salakhutdinov. "Probabilistic matrix factorization." Advances in neural information processing systems. 2008. Silver, David, et al. "Deterministic policy gradient algorithms." ICML. 2014 Bellman, Richard. Dynamic programming. Courier Corporation, 2013. Lin, Long-Ji. Reinforcement learning for robots using neural networks. No. CMU-CS-93-103. Carnegie-Mellon Univ Pittsburgh PA School of Computer Science, 1993. Mnih, Volodymyr, et al. "Playing atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).