achieves the best performance over all baseline methods on both datasets. To verify the robustness of the proposed method, a significance test has been conducted. The calculated p-values under the evaluation metric on both datasets are less than 0.001, which validates the effectiveness of DRRS.
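The paper does not state which statistical test produced these p-values; such comparisons are usually run on paired per-user (or per-query) metric scores. The sketch below, with simulated placeholder scores, illustrates one plausible way such a test could be carried out; the data and the choice of paired t-test / Wilcoxon test are assumptions, not the authors' procedure.

```python
# Sketch of a paired significance test between per-user metric scores of DRRS
# and a baseline ranker. The score arrays are simulated placeholders; the paper
# does not state which test was used, so a paired t-test and a Wilcoxon
# signed-rank test are shown purely as illustrations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
drrs_scores = rng.normal(0.62, 0.05, size=500)      # e.g. per-user metric under DRRS
baseline_scores = rng.normal(0.58, 0.05, size=500)  # e.g. the same metric under a baseline

t_stat, p_t = stats.ttest_rel(drrs_scores, baseline_scores)
w_stat, p_w = stats.wilcoxon(drrs_scores - baseline_scores)

print(f"paired t-test:        t = {t_stat:.2f}, p = {p_t:.2e}")
print(f"Wilcoxon signed-rank: W = {w_stat:.1f}, p = {p_w:.2e}")
# p < 0.001 would indicate the improvement is unlikely to be random noise.
```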
C. Convergence of the proposed model
To verify the correctness of the proposed model, the convergence curve is illustrated in Fig. 3. It can be observed that the performance curve (cumulative reward) increases monotonically until convergence after about 80 iterations, which demonstrates that DRRS does learn from the training set and utilizes that information for recommendation.
Fig. 3. Convergence curve of DRRS
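To make the notion of convergence concrete, the sketch below logs the cumulative reward of every training iteration, which is the quantity plotted in Fig. 3, and stops once a moving-average criterion flattens out. The `run_training_iteration` function here is a toy stand-in (a saturating curve plus noise), not the actual DRRS training step.

```python
# Sketch of producing a convergence curve like Fig. 3: log the cumulative
# reward of each training iteration and stop once a moving-average criterion
# flattens. `run_training_iteration` is a toy stand-in that merely simulates a
# saturating reward curve; in practice it would run one DRRS training iteration.
import numpy as np

rng = np.random.default_rng(0)

def run_training_iteration(i):
    return 100.0 * (1.0 - np.exp(-i / 15.0)) + rng.normal(0.0, 0.5)

def has_converged(history, window=10, tol=1e-3):
    """True when the moving average of the last `window` iterations stops improving."""
    if len(history) < 2 * window:
        return False
    recent = np.mean(history[-window:])
    previous = np.mean(history[-2 * window:-window])
    return abs(recent - previous) < tol * max(abs(previous), 1.0)

rewards = []
for iteration in range(200):
    rewards.append(run_training_iteration(iteration))
    if has_converged(rewards):
        print(f"converged after {iteration + 1} iterations")
        break
```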
D. Hyper-parameter Investigation
In this section, the impact of different hyper-parameters on the performance of the proposed model is explored on the ML100k dataset, including the number of learning steps and the number of hidden layers. Experiments are conducted by holding all other factors fixed while varying the hyper-parameter being explored.
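As a concrete illustration of this one-factor-at-a-time protocol, the sketch below varies a single hyper-parameter over its candidate values while every other setting stays at its default. The `train_and_evaluate` stub, the default values, and the metric it returns are illustrative assumptions rather than the paper's exact configuration.

```python
# Sketch of the one-factor-at-a-time protocol: all settings stay at their
# defaults except the single hyper-parameter being swept. `train_and_evaluate`
# is a stub; the defaults and candidate values mirror the ranges reported in
# this section but are otherwise illustrative.
DEFAULTS = {"learning_steps": 10, "hidden_layers": 2, "learning_rate": 1e-3}
SEARCH_SPACE = {
    "learning_steps": [5, 10, 15, 20],
    "hidden_layers": [1, 2, 3, 4],
}

def train_and_evaluate(config):
    """Stub: would train DRRS with `config` and return the evaluation metric."""
    return 0.0  # placeholder value

def sweep(name):
    results = {}
    for value in SEARCH_SPACE[name]:
        config = dict(DEFAULTS, **{name: value})
        results[value] = train_and_evaluate(config)
    return results

print(sweep("learning_steps"))  # corresponds to the study behind Fig. 4(a)
print(sweep("hidden_layers"))   # corresponds to the study behind Fig. 4(b)
```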
Fig. 4. Hyper-parameter investigation: (a) impact of learning steps; (b) impact of hidden layers
Number of Learning Steps: With the number of evaluation steps fixed at 10, the influence of different numbers of learning steps is studied. We compare learning-step settings of 5, 10, 15, and 20 when training the model. As Fig. 4(a) shows, fewer steps weaken the ability of the proposed model, while more steps bring similar performance at the cost of longer execution time. Therefore, the number of training steps is set to 10 in this paper, the same as the number of evaluation steps.
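This schedule can be made concrete with a small sketch: each outer iteration performs a fixed number of learning steps on the offline simulator and is then scored with a 10-step evaluation episode. The toy agent and simulator below are stand-ins for the DRRS actor-critic agent and the offline recommender environment, not the paper's implementation.

```python
# Toy sketch of the schedule: LEARNING_STEPS agent updates per iteration,
# followed by a fixed 10-step evaluation episode without exploration or
# updates. ToyAgent/ToySimulator are stand-ins for the DRRS actor-critic
# agent and the offline recommender environment.
import random

LEARNING_STEPS = 10  # candidate values studied: 5, 10, 15, 20
EVAL_STEPS = 10      # evaluation episode length, kept fixed

class ToySimulator:
    def reset(self):
        return 0.0                          # dummy user state
    def step(self, action):
        return 0.0, random.random(), False  # next_state, reward, done

class ToyAgent:
    def act(self, state, explore):
        return random.random()              # dummy ranking action
    def update(self, state, action, reward, next_state):
        pass                                # gradient step would go here

def run_iteration(agent, sim):
    state = sim.reset()
    for _ in range(LEARNING_STEPS):         # learning phase
        action = agent.act(state, explore=True)
        next_state, reward, done = sim.step(action)
        agent.update(state, action, reward, next_state)
        state = sim.reset() if done else next_state
    total, state = 0.0, sim.reset()
    for _ in range(EVAL_STEPS):             # evaluation phase
        state, reward, _ = sim.step(agent.act(state, explore=False))
        total += reward
    return total

print(run_iteration(ToyAgent(), ToySimulator()))
```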
Number of Hidden Layers: We explore the impact of the hidden-layer depth on the performance of the proposed model. The depth is varied among {1, 2, 3, 4} for both the Actor and the Critic network. The result is illustrated in Fig. 4(b). At first, recommendation performance increases with the number of hidden layers; however, as the depth grows, performance decreases due to overfitting. The network with two hidden layers achieves the best result and is adopted as the setting of DRRS.
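The sweep over hidden-layer depth can be illustrated with a small PyTorch sketch in which the actor and critic are plain multilayer perceptrons whose number of hidden layers is a constructor argument. The layer width, activations, and state/action dimensions are assumptions for illustration, not the paper's exact architecture.

```python
# Sketch of actor/critic networks with a configurable hidden-layer depth,
# the quantity swept in Fig. 4(b). Width, activations, and dimensions are
# illustrative assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, n_hidden, width=128):
    layers, dim = [], in_dim
    for _ in range(n_hidden):
        layers += [nn.Linear(dim, width), nn.ReLU()]
        dim = width
    layers.append(nn.Linear(dim, out_dim))
    return nn.Sequential(*layers)

class Actor(nn.Module):
    """Maps a state to a continuous ranking-weight action."""
    def __init__(self, state_dim, action_dim, n_hidden=2):
        super().__init__()
        self.net = mlp(state_dim, action_dim, n_hidden)
    def forward(self, state):
        return torch.tanh(self.net(state))  # bounded action, DDPG-style

class Critic(nn.Module):
    """Estimates the Q-value of a (state, action) pair."""
    def __init__(self, state_dim, action_dim, n_hidden=2):
        super().__init__()
        self.net = mlp(state_dim + action_dim, 1, n_hidden)
    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

# Two hidden layers gave the best trade-off in the experiments above.
actor = Actor(state_dim=32, action_dim=8, n_hidden=2)
critic = Critic(state_dim=32, action_dim=8, n_hidden=2)
```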
V. CONCLUSION
In this paper, a Deep Reinforcement learning based Ranking Strategy (DRRS) is proposed for sponsored search in recommender systems. DRRS treats the recommendation process as a sequential decision-making process and maximizes the cumulative reward for the platform. Different from previous work, we consider the problem from the side of the platform and take the bid price into consideration. To avoid hurting the performance of the commercial platform, an offline recommender system is built to simulate user-item interactions. Experiments conducted on real-world datasets further demonstrate that DRRS outperforms traditional ranking algorithms. Future work includes changing the form of the ranking function.