Abstract
Deep reinforcement learning (DRL) agents can learn end-to-end portfolio allocation policies, but their performance is extremely sensitive to hyperparameter choices and they are blind to non-financial objectives such as ESG compliance. Bayesian optimization (BO) is a natural tool for tuning these agents, yet standard BO ignores domain constraints.
We introduce the ESG-Constrained Composite Acquisition (ECCA), a soft–hard constraint framework for BO that steers the search toward ESG-feasible configurations without discarding the infeasible landscape. ECCA combines a soft penalty — an adaptive β-weighted ESG score — with a hard probabilistic feasibility gate, enabling the optimizer to exploit constraint-boundary structure that hard-only methods miss.
We validate ECCA in three settings. On synthetic constrained landscapes, it recovers the constrained optimum at zero Sharpe cost. On a 28-stock DJIA universe (out-of-sample 2023–2024), it matches the unconstrained Sharpe ratio while raising portfolio ESG scores above the compliance threshold. On the IBEX 35 (480 runs across 10 seeds and 4 ESG thresholds), it maintains stable returns as the ESG constraint tightens, a regime in which hard-only BO collapses.
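As a rough illustration of the soft–hard combination described above, the sketch below scores one candidate configuration from Gaussian posterior summaries. It is not the ECCA definition. The functional form (expected improvement on Sharpe plus a β-weighted ESG mean, multiplied by the probability that ESG meets the threshold), all function names, and the assumption that ESG scores lie in [0, 1] are hypothetical choices made for this example. The adaptive schedule for β is not specified here, so β is passed in as a plain argument.

```python
import math


def normal_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def expected_improvement(mu, sigma, best):
    """Expected improvement for maximization under a Gaussian posterior."""
    if sigma <= 0.0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    return (mu - best) * normal_cdf(z) + sigma * normal_pdf(z)


def feasibility_probability(esg_mu, esg_sigma, threshold):
    """Hard gate: posterior probability that the ESG score meets the threshold."""
    if esg_sigma <= 0.0:
        return 1.0 if esg_mu >= threshold else 0.0
    return normal_cdf((esg_mu - threshold) / esg_sigma)


def composite_acquisition(sharpe_mu, sharpe_sigma, best_sharpe,
                          esg_mu, esg_sigma, threshold, beta):
    """Hypothetical composite score: feasibility gate times (EI + beta * ESG mean).

    Assumes esg_mu >= 0 so the soft term stays non-negative and the gate
    only scales it down, never flips its sign.
    """
    soft = expected_improvement(sharpe_mu, sharpe_sigma, best_sharpe) + beta * esg_mu
    gate = feasibility_probability(esg_mu, esg_sigma, threshold)
    return gate * soft


# Two candidates with identical Sharpe posteriors but different ESG posteriors.
feasible = composite_acquisition(1.0, 0.2, 0.9, esg_mu=0.8, esg_sigma=0.05,
                                 threshold=0.6, beta=0.5)
infeasible = composite_acquisition(1.0, 0.2, 0.9, esg_mu=0.4, esg_sigma=0.05,
                                   threshold=0.6, beta=0.5)
```

In this form the likely-infeasible candidate is strongly down-weighted but still receives a positive score, which is one way to keep information about the infeasible landscape available to the optimizer instead of discarding it.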