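Asset management (portfolio management) has become one of the promising applications of AI based on reinforcement learning. However, existing methods have suffered from poor extensibility and reusability. Once a decision-making system suited to a particular case had been built with reinforcement learning, it was not easy to rebuild that system when the assets changed or when new types of data needed to be fed in.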

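In this study, we therefore developed MSPM, a modular multi-agent reinforcement learning system suited to asset management, designed from the outset to make the resulting systems extensible and reusable. MSPM consists of two types of modules: the EAM (Evolving Agent Module), one of which is prepared for each individual asset, and the SAM (Strategic Agent Module), which makes decisions based on input from the group of EAMs. Because each EAM can be trained asynchronously and can also be reused, MSPM is highly scalable for asset management.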

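In validation experiments using eight years of U.S. stock market data, simulations comparing the proposed method with five representative existing methods showed that it outperformed all of them in rate of return. In addition, when the internals of MSPM were examined using four different portfolios, enabling the EAMs was confirmed to improve the rate of return substantially compared with disabling them.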

PDF
Press release

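Principal Investigator
Division of Intelligent Interaction Technologies, Faculty of Engineering, Information and Systems, University of Tsukuba
Associate Professor TANAKA Fumihide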

Related Link
Faculty of Engineering, Information and Systems

Researchers from the University of Tsukuba have developed a deep reinforcement learning-based framework that enables portfolio managers to reallocate multiple portfolios with a large volume of assets at scale.

Tsukuba, Japan—The ability to predict movements in the stock market can be an extremely lucrative skill. For portfolio managers, who reallocate capital into the multiple assets of a portfolio, predicting price trends enables them to maximize capital returns.

Many approaches to price prediction have been taken over the years, and the formulas and patterns that make up technical analysis are now being replaced by deep learning-based methods, especially those based on a type of learning called deep reinforcement learning. However, existing reinforcement learning-based portfolio management systems tend to have a fixed architecture and lack a modular design, so they cannot be expanded with additional reinforcement learning agents or be applied to multiple portfolios. Moreover, they can only handle a limited number of assets or types of market information.

In a recent paper published in PLOS ONE, researchers from the University of Tsukuba describe a deep reinforcement learning-based framework for portfolio management that overcomes these problems. "By building this framework with a modular design," says Zhenhan Huang, lead author of the paper, "systems targeting different portfolios can share and be built with pre-trained modules, just like assembling LEGO bricks, in different configurations."

The proposed system consists of evolving agent modules, one for each asset, and strategic agent modules, one for each portfolio. An evolving agent module uses a deep Q-network to predict price trends based on historical prices and web news sentiment. A strategic agent module uses a proximal policy optimization agent to reallocate assets according to the information generated by the evolving agent modules.
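As a rough illustration of this division of labor, the data flow between the two kinds of module might be sketched as follows. This is a minimal sketch and not the researchers' implementation: the class names, the ticker symbols, the sample numbers, and the `CASH` fallback are all hypothetical, and simple arithmetic rules stand in for the deep Q-network and the proximal policy optimization agent.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EvolvingAgentModule:
    """One module per asset (hypothetical stand-in for the trend predictor)."""
    asset: str

    def signal(self, prices: List[float], sentiment: float) -> float:
        # Placeholder heuristic, NOT a deep Q-network: combine the latest
        # relative price change with a news-sentiment score in [-1, 1].
        momentum = (prices[-1] - prices[-2]) / prices[-2]
        return momentum + 0.01 * sentiment


@dataclass
class StrategicAgentModule:
    """One module per portfolio (hypothetical stand-in for the allocator)."""
    assets: List[str]

    def reallocate(self, signals: Dict[str, float]) -> Dict[str, float]:
        # Placeholder rule, NOT proximal policy optimization: weight each
        # asset in proportion to its positive signal, else hold cash.
        positive = {a: max(signals[a], 0.0) for a in self.assets}
        total = sum(positive.values())
        if total == 0.0:
            return {"CASH": 1.0}
        return {a: s / total for a, s in positive.items()}


# Illustrative inputs only; these are not data from the paper.
eams = {name: EvolvingAgentModule(name) for name in ("GOOGL", "AAPL")}
history = {"GOOGL": [100.0, 102.0], "AAPL": [50.0, 49.0]}
sentiment = {"GOOGL": 0.5, "AAPL": -0.2}

# Step 1: each per-asset module produces a signal for its own asset.
signals = {a: eams[a].signal(history[a], sentiment[a]) for a in eams}

# Step 2: the per-portfolio module turns the signals into portfolio weights.
sam = StrategicAgentModule(["GOOGL", "AAPL"])
weights = sam.reallocate(signals)
```

The point of the sketch is the interface: the portfolio-level module sees only the signals, so a per-asset module could be replaced without touching the allocator.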

"Separating the tasks of predicting trends and making strategic decisions has several advantages," Huang says. The evolving agent module only needs to be trained once for an asset like Alphabet Inc. before it can be used (and reused) for any portfolio that includes that asset. Moreover, the scalability of the system allows new assets with heterogeneous data or different reinforcement-learning agents to be added into existing portfolios without retraining the whole system. The modules in the system can also be run in parallel, increasing efficiency and scalability.
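The train-once, reuse-everywhere idea can be pictured with a small sketch. Everything here is an assumption made for illustration: `train_eam` is a hypothetical placeholder that does no real training, the portfolio names and tickers are invented, and a thread pool is just one possible way to run independent modules side by side.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def train_eam(asset: str) -> Dict[str, str]:
    # Hypothetical stand-in for training one per-asset module; a real
    # system would fit a reinforcement learning agent here.
    return {"asset": asset, "status": "trained"}


def train_all(assets: List[str]) -> Dict[str, Dict[str, str]]:
    # Per-asset modules do not depend on one another, so they can be
    # trained concurrently. Duplicates are removed so that each asset
    # is trained exactly once.
    unique = sorted(set(assets))
    with ThreadPoolExecutor() as pool:
        trained = list(pool.map(train_eam, unique))
    return {module["asset"]: module for module in trained}


# Two hypothetical portfolios that share one asset.
portfolios = {
    "tech": ["GOOGL", "AAPL"],
    "mixed": ["GOOGL", "XOM"],
}

all_assets = [a for members in portfolios.values() for a in members]
registry = train_all(all_assets)

# Each portfolio is assembled from the shared registry of trained modules.
assembled = {
    name: [registry[a] for a in members]
    for name, members in portfolios.items()
}
```

In this sketch both portfolios point at the very same `GOOGL` module object, which mirrors the reuse described above; adding a new asset would mean training one more module and leaving the others untouched.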

The researchers compared the proposed system with several conventional portfolio management strategies and one cutting-edge RL-based method. They found that the system achieved the best results on metrics such as the accumulated rate of return and the daily rate of return, even under the extreme conditions of the US stock market during the global pandemic in 2020.
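For readers unfamiliar with these two metrics, the following sketch shows one common way to compute them from a series of daily portfolio values. The exact definitions used in the paper may differ, and the function names and sample values are hypothetical, chosen only for illustration.

```python
from typing import List


def daily_returns(values: List[float]) -> List[float]:
    # Relative change of the portfolio value from one day to the next.
    return [values[t] / values[t - 1] - 1.0 for t in range(1, len(values))]


def accumulated_return(values: List[float]) -> float:
    # Relative change over the whole period, from first value to last.
    return values[-1] / values[0] - 1.0


# Illustrative portfolio values over four days (not data from the paper).
values = [100.0, 110.0, 99.0, 108.9]

rets = daily_returns(values)
mean_daily = sum(rets) / len(rets)
total = accumulated_return(values)

# Compounding the daily returns should recover the accumulated return.
compounded = 1.0
for r in rets:
    compounded *= 1.0 + r
compounded -= 1.0
```

Note that the average daily return and the accumulated return answer different questions: the first summarizes a typical day, while the second reflects compounding over the whole period.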

The modularity of the proposed system opens up exciting opportunities for its further development. The team used the deep Q-network and proximal policy optimization in the current implementation, but plan to implement other algorithms. They also plan to use other, unconventional sources of data such as satellite images to predict asset price trends.

Original Paper
The paper, "MSPM: A modularized and scalable multi-agent reinforcement learning-based system for financial portfolio management," is available from PLOS ONE with DOI: 10.1371/journal.pone.0263689

Correspondence
Associate Professor TANAKA Fumihide
Faculty of Engineering, Information and Systems, University of Tsukuba

Related Link
Faculty of Engineering, Information and Systems (in Japanese)