Multi-objective reinforcement learning for bi-objective time-dependent pickup and delivery problem with late penalties

Santiyuda, G. and Wardoyo, R. and Pulungan, R. and Yu, V.F. (2024) Multi-objective reinforcement learning for bi-objective time-dependent pickup and delivery problem with late penalties. Engineering Applications of Artificial Intelligence, 128. ISSN 09521976

[thumbnail of 1-s2.0-S0952197623015658-main.pdf]

Text
1-s2.0-S0952197623015658-main.pdf - Published Version
Restricted to Registered users only
Download (2MB)

Official URL: https://www.sciencedirect.com/science/article/pii/...

Abstract

This study addresses the bi-objective time-dependent pickup and delivery problem with late penalties (TDPDPLP). Incorporating time-dependent travel time into the problem formulation to model traffic congestion is critical, especially for problems with time-related costs, to decrease the difference in the projected quality of solutions when applying optimization methods in the real world. This study proposes a multi-objective reinforcement learning (MORL)-based method with hypernetwork and heterogeneous attention mechanism (HAM) with a two-stage training scheme to solve the bi-objective TDPDPLP. The proposed method can instantly generate an approximation of the Pareto optimal front (POF) after offline training. The conducted ablation study also showed that discarding coordinates from the features simplifies the model and saves several hours of training while improving the quality of the solutions. The performance of the trained model is evaluated on various instances, including real-world-based instances from Barcelona, Berlin, and Porto-Alegre. The performance of the proposed method is evaluated based on the hypervolume (HV) and additive epsilon (Ïµ+) of the generated POF. We compare the performance of the proposed method to another MORL method, namely the preference-conditioned multi-objective combinatorial optimization (PMOCO) and several well-known multi-objective evolutionary algorithms (MOEAs). Experiments showed that the proposed method performs better than PMOCO and the employed MOEAs on various problem instances. The trained method only needs minutes to generate a POF approximation, while the MOEAs require hours. Furthermore, it also generalizes well on different characteristics of problem instances and performs well on instances from cities other than the city in the training instances. Â© 2023 Elsevier Ltd

Item Type:	Article
Additional Information:	cited By 0
Uncontrolled Keywords:	Combinatorial optimization; Deep learning; Evolutionary algorithms; Pareto principle; Pickups; Traffic congestion; Travel time, Attention mechanisms; Bi objectives; Deep reinforcement learning; Hypernetwork; Multi objective; Pareto-optimal front; Performance; Pickup and delivery problems; Reinforcement learnings; Time dependent, Reinforcement learning
Subjects:	T Technology > TK Electrical engineering. Electronics Nuclear engineering > Electronics T Technology > TK Electrical engineering. Electronics Nuclear engineering > Electronics > Computer engineering. Computer hardware
Divisions:	Faculty of Engineering > Electrical and Information Technology Department
Depositing User:	Arif Surachman
Date Deposited:	28 Aug 2024 03:26
Last Modified:	28 Aug 2024 03:26
URI:	https://ir.lib.ugm.ac.id/id/eprint/18

Actions (login required)

: View Item