Commit 1f26cac6 authored by seanlabor

first commit

repo_token: a9eSNI8pkeeDAKwGtKKBSUPCaFIiQGvYU
service_name: travis-ci
**/.DS_Store
__pycache__/
*.pyc
/*.egg-info
.idea/
*.swp
*.wsn
*.swo
.scannerwork/
.vscode/
htmlcov/
sonar-project.properties
.coverage*
docs/rst
docs/sphinx
experiments/
dist/
rlcard/games/doudizhu/jsondata/
rlcard/agents/gin_rummy_human_agent/gui_cards/cards_png
language: python
python:
- "3.6"
- "3.7"
install:
- pip install -e .[torch]
before_script:
- pip install python-coveralls
- pip install pytest-cover
script:
- py.test tests/ --cov=rlcard
after_success:
- coveralls
# Contributing Guide
Contributions to this project are greatly appreciated! If you find any bugs or have any feedback, please create an issue or send a pull request to fix the bug. If you want to contribute code for new features, please contact [daochen.zha@tamu.edu](mailto:daochen.zha@tamu.edu) or [khlai@tamu.edu](mailto:khlai@tamu.edu). We currently have several plans. Please create an issue or contact us via email if you have other suggestions.
## Roadmaps
* **Game-Specific Configurations.** We plan to gradually support game-specific configurations. Currently, we only support specifying the number of players in Blackjack.
* **Rule-based Agent and Pre-trained Models.** Provide more rule-based agents and pre-trained models to benchmark the evaluation. We currently have several models in `/models`.
* **More Games and Algorithms.** Develop more games and algorithms.
* **Hyperparameter Search.** Search hyperparameters for each environment and update the best ones in the examples.
## How to Create a Pull Request
If this is your first time contributing to a project, kindly follow the instructions below. You may find [Creating a pull request](https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request) helpful. In short, you need to take the following steps to send a pull request:
* Click **Fork** in the upper-right corner of the project main page to create a copy of the repository under your own GitHub account.
* Clone your fork to your computer.
* Make your changes locally.
* Commit and push your changes to your fork.
* Open a pull request to merge your branch into the RLCard project.
## Testing Your Code
We strongly encourage you to write the testing code in parallel with your development. We use `unittest` in RLCard. An example is [Blackjack environment testing](tests/envs/test_blackjack_env.py).
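As an illustrative sketch (the class name and assertions below are ours, not the contents of that test file), a minimal test case might look like the following:
```python
import unittest

import rlcard
from rlcard.agents import RandomAgent


class TestBlackjackEnv(unittest.TestCase):

    def test_run(self):
        # Make the environment and attach one random agent per player
        env = rlcard.make('blackjack')
        env.set_agents([RandomAgent(num_actions=env.num_actions)
                        for _ in range(env.num_players)])
        trajectories, payoffs = env.run()
        # We expect one trajectory list and one payoff per player
        self.assertEqual(len(trajectories), env.num_players)
        self.assertEqual(len(payoffs), env.num_players)


if __name__ == '__main__':
    unittest.main()
```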
## Making Configurable Environments
We take Blackjack as an example to show how game-specific configurations can be defined in RLCard. The key points are highlighted as follows:
* We add a `DEFAULT_GAME_CONFIG` in [Blackjack Env](rlcard/envs/blackjack.py) to define the default values of the game configurations. Each field should start with `game_`.
* Modify the game and environment according to the configurations. For example, we need to support multiple players in Blackjack.
* Modify [Env](rlcard/envs/env.py) to add your game to the `supported_envs`.
* When making the environment, we pass the newly defined fields in `config`. For example, we pass `config={'game_num_players': 2}` for Blackjack.
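As a hedged sketch of this pattern (the values below are illustrative, not the actual contents of `rlcard/envs/blackjack.py`):
```python
# Illustrative defaults: every game-specific field starts with `game_`
DEFAULT_GAME_CONFIG = {
    'game_num_players': 1,
}

# From the user side, the new field is then overridden through `config`
import rlcard
env = rlcard.make('blackjack', config={'game_num_players': 2})
```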
Copyright (c) 2019 DATA Lab at Texas A&M University
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
# RLCard: A Toolkit for Reinforcement Learning in Card Games
<img width="500" src="https://dczha.com/files/rlcard/logo.jpg" alt="Logo" />
[![Build Status](https://travis-ci.org/datamllab/RLCard.svg?branch=master)](https://travis-ci.org/datamllab/RLCard)
[![Codacy Badge](https://api.codacy.com/project/badge/Grade/248eb15c086748a4bcc830755f1bd798)](https://www.codacy.com/manual/daochenzha/rlcard?utm_source=github.com&amp;utm_medium=referral&amp;utm_content=datamllab/rlcard&amp;utm_campaign=Badge_Grade)
[![Coverage Status](https://coveralls.io/repos/github/datamllab/rlcard/badge.svg)](https://coveralls.io/github/datamllab/rlcard?branch=master)
[![Downloads](https://pepy.tech/badge/rlcard)](https://pepy.tech/project/rlcard)
[![Downloads](https://pepy.tech/badge/rlcard/month)](https://pepy.tech/project/rlcard)
[中文文档](README.zh-CN.md)
RLCard is a toolkit for Reinforcement Learning (RL) in card games. It supports multiple card environments with easy-to-use interfaces for implementing various reinforcement learning and searching algorithms. The goal of RLCard is to bridge reinforcement learning and imperfect information games. RLCard is developed by [DATA Lab](http://faculty.cs.tamu.edu/xiahu/) at Texas A&M University and community contributors.
* Official Website: [https://www.rlcard.org](https://www.rlcard.org)
* Tutorial in Jupyter Notebook: [https://github.com/datamllab/rlcard-tutorial](https://github.com/datamllab/rlcard-tutorial)
* Paper: [https://arxiv.org/abs/1910.04376](https://arxiv.org/abs/1910.04376)
* GUI: [RLCard-Showdown](https://github.com/datamllab/rlcard-showdown)
* Dou Dizhu Demo: [Demo](https://douzero.org/)
* Resources: [Awesome-Game-AI](https://github.com/datamllab/awesome-game-ai)
* Related Project: [DouZero Project](https://github.com/kwai/DouZero)
**Community:**
* **Slack**: Discuss in our [#rlcard-project](https://join.slack.com/t/rlcard/shared_invite/zt-rkvktsaq-xkMwz8BfKupCM6zGhO01xg) slack channel.
* **QQ Group**: Join our QQ group 665647450. Password: rlcardqqgroup
**News:**
* Please follow [DouZero](https://github.com/kwai/DouZero), a strong Dou Dizhu AI and the [ICML 2021 paper](https://arxiv.org/abs/2106.06135). An online demo is available [here](https://douzero.org/). The algorithm is also integrated in RLCard. See [Training DMC on Dou Dizhu](docs/toy-examples.md#training-dmc-on-dou-dizhu).
* Our package is used in [PettingZoo](https://github.com/PettingZoo-Team/PettingZoo). Please check it out!
* We have released RLCard-Showdown, GUI demo for RLCard. Please check out [here](https://github.com/datamllab/rlcard-showdown)!
* Jupyter Notebook tutorial available! We also added some examples in R that call the Python interfaces of RLCard through reticulate. See [here](docs/toy-examples-r.md).
* Thanks to [@Clarit7](https://github.com/Clarit7) for supporting different numbers of players in Blackjack. We welcome contributions that gradually make the games more configurable. See [here](CONTRIBUTING.md#making-configurable-environments) for more details.
* Thanks to [@Clarit7](https://github.com/Clarit7) for the Blackjack and Limit Hold'em human interfaces.
* Now RLCard supports environment local seeding and multiprocessing. Thanks for the testing scripts provided by [@weepingwillowben](https://github.com/weepingwillowben).
* Human interface of NoLimit Holdem available. The action space of NoLimit Holdem has been abstracted. Thanks for the contribution of [@AdrianP-](https://github.com/AdrianP-).
* New game Gin Rummy and human GUI available. Thanks for the contribution of [@billh0420](https://github.com/billh0420).
* PyTorch implementation available. Thanks for the contribution of [@mjudell](https://github.com/mjudell).
## Cite this work
If you find this repo useful, you may cite:
Zha, Daochen, et al. "RLCard: A Platform for Reinforcement Learning in Card Games." IJCAI. 2020.
```bibtex
@inproceedings{zha2020rlcard,
title={RLCard: A Platform for Reinforcement Learning in Card Games},
author={Zha, Daochen and Lai, Kwei-Herng and Huang, Songyi and Cao, Yuanpu and Reddy, Keerthana and Vargas, Juan and Nguyen, Alex and Wei, Ruzhe and Guo, Junyu and Hu, Xia},
booktitle={IJCAI},
year={2020}
}
```
## Installation
Make sure that you have **Python 3.6+** and **pip** installed. We recommend installing the stable version of `rlcard` with `pip`:
```
pip3 install rlcard
```
The default installation will only include the card environments. To use the PyTorch implementation of the training algorithms, run
```
pip3 install rlcard[torch]
```
If you are in China and the above command is too slow, you can use the mirror provided by Tsinghua University:
```
pip3 install rlcard -i https://pypi.tuna.tsinghua.edu.cn/simple
```
Alternatively, you can clone the latest version with (if you are in China and Github is slow, you can use the mirror in [Gitee](https://gitee.com/daochenzha/rlcard)):
```
git clone https://github.com/datamllab/rlcard.git
```
or clone only one branch to make it faster:
```
git clone -b master --single-branch --depth=1 https://github.com/datamllab/rlcard.git
```
Then install with
```
cd rlcard
pip3 install -e .
pip3 install -e .[torch]
```
We also provide a [**conda** installation method](https://anaconda.org/toubun/rlcard):
```
conda install -c toubun rlcard
```
The conda installation only includes the card environments; you need to install PyTorch manually as needed.
## Examples
A **short example** is shown below.
```python
import rlcard
from rlcard.agents import RandomAgent
env = rlcard.make('blackjack')
env.set_agents([RandomAgent(num_actions=env.num_actions)])
print(env.num_actions) # 2
print(env.num_players) # 1
print(env.state_shape) # [[2]]
print(env.action_shape) # [None]
trajectories, payoffs = env.run()
```
RLCard can be flexibly connected to various algorithms. See the following examples:
* [Playing with random agents](docs/toy-examples.md#playing-with-random-agents)
* [Deep-Q learning on Blackjack](docs/toy-examples.md#deep-q-learning-on-blackjack)
* [Training CFR (chance sampling) on Leduc Hold'em](docs/toy-examples.md#training-cfr-on-leduc-holdem)
* [Having fun with pretrained Leduc model](docs/toy-examples.md#having-fun-with-pretrained-leduc-model)
* [Training DMC on Dou Dizhu](docs/toy-examples.md#training-dmc-on-dou-dizhu)
* [Evaluating Agents](docs/toy-examples.md#evaluating-agents)
## Demo
Run `examples/human/leduc_holdem_human.py` to play with the pre-trained Leduc Hold'em model. Leduc Hold'em is a simplified version of Texas Hold'em. Rules can be found [here](docs/games.md#leduc-holdem).
```
>> Leduc Hold'em pre-trained model
>> Start a new game!
>> Agent 1 chooses raise
=============== Community Card ===============
┌─────────┐
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
└─────────┘
=============== Your Hand ===============
┌─────────┐
│J │
│ │
│ │
│ ♥ │
│ │
│ │
│ J│
└─────────┘
=============== Chips ===============
Yours: +
Agent 1: +++
=========== Actions You Can Choose ===========
0: call, 1: raise, 2: fold
>> You choose action (integer):
```
We also provide a GUI for easy debugging. Please check [here](https://github.com/datamllab/rlcard-showdown/). Some demos:
![doudizhu-replay](https://github.com/datamllab/rlcard-showdown/blob/master/docs/imgs/doudizhu-replay.png?raw=true)
![leduc-replay](https://github.com/datamllab/rlcard-showdown/blob/master/docs/imgs/leduc-replay.png?raw=true)
## Available Environments
We provide a complexity estimation for the games on several aspects. **InfoSet Number:** the number of information sets; **InfoSet Size:** the average number of states in a single information set; **Action Size:** the size of the action space; **Name:** the name that should be passed to `rlcard.make` to create the game environment. We also provide links to the documentation and a random-agent example for each game.
| Game | InfoSet Number | InfoSet Size | Action Size | Name | Usage |
| :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------: | :---------------: | :---------: | :-------------: | :-----------------------------------------------------------------------------------------: |
| Blackjack ([wiki](https://en.wikipedia.org/wiki/Blackjack), [baike](https://baike.baidu.com/item/21%E7%82%B9/5481683?fr=aladdin)) | 10^3 | 10^1 | 10^0 | blackjack | [doc](docs/games.md#blackjack), [example](examples/blackjack_random.py) |
| Leduc Hold’em ([paper](http://poker.cs.ualberta.ca/publications/UAI05.pdf)) | 10^2 | 10^2 | 10^0 | leduc-holdem | [doc](docs/games.md#leduc-holdem), [example](examples/leduc_holdem_random.py) |
| Limit Texas Hold'em ([wiki](https://en.wikipedia.org/wiki/Texas_hold_%27em), [baike](https://baike.baidu.com/item/%E5%BE%B7%E5%85%8B%E8%90%A8%E6%96%AF%E6%89%91%E5%85%8B/83440?fr=aladdin)) | 10^14 | 10^3 | 10^0 | limit-holdem | [doc](docs/games.md#limit-texas-holdem), [example](examples/limit_holdem_random.py) |
| Dou Dizhu ([wiki](https://en.wikipedia.org/wiki/Dou_dizhu), [baike](https://baike.baidu.com/item/%E6%96%97%E5%9C%B0%E4%B8%BB/177997?fr=aladdin)) | 10^53 ~ 10^83 | 10^23 | 10^4 | doudizhu | [doc](docs/games.md#dou-dizhu), [example](examples/doudizhu_random.py) |
| Mahjong ([wiki](https://en.wikipedia.org/wiki/Competition_Mahjong_scoring_rules), [baike](https://baike.baidu.com/item/%E9%BA%BB%E5%B0%86/215)) | 10^121 | 10^48 | 10^2 | mahjong | [doc](docs/games.md#mahjong), [example](examples/mahjong_random.py) |
| No-limit Texas Hold'em ([wiki](https://en.wikipedia.org/wiki/Texas_hold_%27em), [baike](https://baike.baidu.com/item/%E5%BE%B7%E5%85%8B%E8%90%A8%E6%96%AF%E6%89%91%E5%85%8B/83440?fr=aladdin)) | 10^162 | 10^3 | 10^4 | no-limit-holdem | [doc](docs/games.md#no-limit-texas-holdem), [example](examples/nolimit_holdem_random.py) |
| UNO ([wiki](https://en.wikipedia.org/wiki/Uno_\(card_game\)), [baike](https://baike.baidu.com/item/UNO%E7%89%8C/2249587)) | 10^163 | 10^10 | 10^1 | uno | [doc](docs/games.md#uno), [example](examples/uno_random.py) |
| Gin Rummy ([wiki](https://en.wikipedia.org/wiki/Gin_rummy), [baike](https://baike.baidu.com/item/%E9%87%91%E6%8B%89%E7%B1%B3/3471710)) | 10^52 | - | - | gin-rummy | [doc](docs/games.md#gin-rummy), [example](examples/gin_rummy_random.py) |
## Supported Algorithms
| Algorithm | example | reference |
| :--------------------------------------: | :-----------------------------------------: | :------------------------------------------------------------------------------------------------------: |
| Deep Monte-Carlo (DMC) | [examples/run\_dmc.py](examples/run_dmc.py) | [[paper]](https://arxiv.org/abs/2106.06135) |
| Deep Q-Learning (DQN) | [examples/run\_rl.py](examples/run_rl.py) | [[paper]](https://arxiv.org/abs/1312.5602) |
| Neural Fictitious Self-Play (NFSP) | [examples/run\_rl.py](examples/run_rl.py) | [[paper]](https://arxiv.org/abs/1603.01121) |
| Counterfactual Regret Minimization (CFR) | [examples/run\_cfr.py](examples/run_cfr.py) | [[paper]](http://papers.nips.cc/paper/3306-regret-minimization-in-games-with-incomplete-information.pdf) |
## Pre-trained and Rule-based Models
We provide a [model zoo](rlcard/models) to serve as the baselines.
| Model | Explanation |
| :--------------------------------------: | :------------------------------------------------------: |
| leduc-holdem-cfr | Pre-trained CFR (chance sampling) model on Leduc Hold'em |
| leduc-holdem-rule-v1 | Rule-based model for Leduc Hold'em, v1 |
| leduc-holdem-rule-v2 | Rule-based model for Leduc Hold'em, v2 |
| uno-rule-v1 | Rule-based model for UNO, v1 |
| limit-holdem-rule-v1 | Rule-based model for Limit Texas Hold'em, v1 |
| doudizhu-rule-v1 | Rule-based model for Dou Dizhu, v1 |
| gin-rummy-novice-rule | Gin Rummy novice rule model |
## API Cheat Sheet
### How to create an environment
You can use the following interface to make an environment. You may optionally specify some configurations with a dictionary.
* **env = rlcard.make(env_id, config={})**: Make an environment. `env_id` is the string name of an environment; `config` is a dictionary that specifies some environment configurations, which are as follows.
* `seed`: Default `None`. Set an environment-local random seed for reproducing the results.
* `allow_step_back`: Default `False`. `True` if allowing the `step_back` function to traverse backward in the tree.
* Game-specific configurations: These fields start with `game_`. Currently, we only support `game_num_players` in Blackjack.
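A small sketch of passing these configurations when making an environment (the values are illustrative):
```python
import rlcard

env = rlcard.make(
    'blackjack',
    config={
        'seed': 42,                # environment-local random seed
        'allow_step_back': False,  # default; set True to enable step_back
        'game_num_players': 2,     # Blackjack-specific game configuration
    },
)
```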
Once the environment is made, we can access some information about the game.
* **env.num_actions**: The number of actions.
* **env.num_players**: The number of players.
* **env.state_shape**: The shape of the state space of the observations.
* **env.action_shape**: The shape of the action features (Dou Dizhu's actions can be encoded as features).
### What is state in RLCard
State is a Python dictionary. It consists of observation `state['obs']`, legal actions `state['legal_actions']`, raw observation `state['raw_obs']` and raw legal actions `state['raw_legal_actions']`.
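For illustration, these fields can be inspected directly (their exact contents depend on the game):
```python
import rlcard

env = rlcard.make('blackjack')
state, player_id = env.reset()
print(state['obs'])                # encoded observation
print(state['legal_actions'])      # legal actions of the current player
print(state['raw_obs'])            # raw (human-readable) observation
print(state['raw_legal_actions'])  # raw legal actions
```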
### Basic interfaces
The following interfaces provide basic usage. They are easy to use but make assumptions about the agent. The agent must follow the [agent template](docs/developping-algorithms.md).
* **env.set_agents(agents)**: `agents` is a list of `Agent` objects. The length of the list should be equal to the number of players in the game.
* **env.run(is_training=False)**: Run a complete game and return trajectories and payoffs. The function can be used after `set_agents` is called. If `is_training` is `True`, it will use the `step` function of the agent to play the game. If `is_training` is `False`, `eval_step` will be called instead.
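A minimal usage sketch of these two interfaces with the bundled random agent (evaluation mode, so `eval_step` is used internally):
```python
import rlcard
from rlcard.agents import RandomAgent

env = rlcard.make('blackjack')
env.set_agents([RandomAgent(num_actions=env.num_actions)
                for _ in range(env.num_players)])

# Run a few complete games and print the payoffs of each one
for _ in range(3):
    trajectories, payoffs = env.run(is_training=False)
    print(payoffs)
```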
### Advanced interfaces
For advanced usage, the following interfaces allow flexible operations on the game tree. These interfaces do not make any assumptions about the agent.
* **env.reset()**: Initialize a game. Return the state and the first player ID.
* **env.step(action, raw_action=False)**: Take one step in the environment. `action` can be raw action or integer; `raw_action` should be `True` if the action is raw action (string).
* **env.step_back()**: Available only when `allow_step_back` is `True`. Take one step backward. This can be used for algorithms that operate on the game tree, such as CFR (chance sampling).
* **env.is_over()**: Return `True` if the current game is over. Otherwise, return `False`.
* **env.get_player_id()**: Return the Player ID of the current player.
* **env.get_state(player_id)**: Return the state that corresponds to `player_id`.
* **env.get_payoffs()**: At the end of the game, return a list of payoffs for all the players.
* **env.get_perfect_information()**: (Currently supported for only some of the games) Obtain the perfect information at the current state.
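A hedged sketch of stepping through a game manually with these interfaces; we assume here that `env.step` returns the next state and the next player ID, mirroring `env.reset`:
```python
import random

import rlcard

env = rlcard.make('blackjack', config={'seed': 42})
state, player_id = env.reset()

while not env.is_over():
    # Pick any legal action for the current player (assumed to be action IDs)
    action = random.choice(list(state['legal_actions']))
    state, player_id = env.step(action)

print(env.get_payoffs())  # payoffs of all players at the end of the game
```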
## Library Structure
The purposes of the main modules are listed below:
* [/examples](examples): Examples of using RLCard.
* [/docs](docs): Documentation of RLCard.
* [/tests](tests): Testing scripts for RLCard.
* [/rlcard/agents](rlcard/agents): Reinforcement learning algorithms and human agents.
* [/rlcard/envs](rlcard/envs): Environment wrappers (state representation, action encoding etc.)
* [/rlcard/games](rlcard/games): Various game engines.
* [/rlcard/models](rlcard/models): Model zoo including pre-trained models and rule models.
## More Documents
For more documentation, please refer to the [Documents](docs/README.md) for general introductions. API documents are available at our [website](http://www.rlcard.org).
## Contributing
Contribution to this project is greatly appreciated! Please create an issue for feedbacks/bugs. If you want to contribute codes, please refer to [Contributing Guide](./CONTRIBUTING.md). If you have any questions, please contact [Daochen Zha](https://github.com/daochenzha) with [daochen.zha@tamu.edu](mailto:daochen.zha@tamu.edu).
## Acknowledgements
We would like to thank JJ World Network Technology Co.,LTD for the generous support and all the contributions from the community contributors.
# RLCard: 卡牌游戏强化学习工具包
<img width="500" src="https://dczha.com/files/rlcard/logo.jpg" alt="Logo" />
[![Build Status](https://travis-ci.org/datamllab/RLCard.svg?branch=master)](https://travis-ci.org/datamllab/RLCard)
[![Codacy Badge](https://api.codacy.com/project/badge/Grade/248eb15c086748a4bcc830755f1bd798)](https://www.codacy.com/manual/daochenzha/rlcard?utm_source=github.com&amp;utm_medium=referral&amp;utm_content=datamllab/rlcard&amp;utm_campaign=Badge_Grade)
[![Coverage Status](https://coveralls.io/repos/github/datamllab/rlcard/badge.svg)](https://coveralls.io/github/datamllab/rlcard?branch=master)
[![Downloads](https://pepy.tech/badge/rlcard)](https://pepy.tech/project/rlcard)
[![Downloads](https://pepy.tech/badge/rlcard/month)](https://pepy.tech/project/rlcard)
[English README](README.md)
RLCard是一款卡牌游戏强化学习 (Reinforcement Learning, RL) 的工具包。 它支持多种卡牌游戏环境,具有易于使用的接口,以用于实现各种强化学习和搜索算法。 RLCard的目标是架起强化学习和非完全信息游戏之间的桥梁。 RLCard由[DATA Lab](http://faculty.cs.tamu.edu/xiahu/) at Texas A&M University以及社区贡献者共同开发.
* 官方网站:[https://www.rlcard.org](https://www.rlcard.org)
* Jupyter Notebook教程:[https://github.com/datamllab/rlcard-tutorial](https://github.com/datamllab/rlcard-tutorial)
* 论文:[https://arxiv.org/abs/1910.04376](https://arxiv.org/abs/1910.04376)
* 图形化界面:[RLCard-Showdown](https://github.com/datamllab/rlcard-showdown)
* 斗地主演示:[Demo](https://douzero.org/)
* 资源:[Awesome-Game-AI](https://github.com/datamllab/awesome-game-ai)
* 相关项目:[DouZero项目](https://github.com/kwai/DouZero)
**社区:**
* **Slack**: 在我们的[#rlcard-project](https://join.slack.com/t/rlcard/shared_invite/zt-rkvktsaq-xkMwz8BfKupCM6zGhO01xg) slack频道参与讨论.
* **QQ群**: 加入我们的QQ群665647450. 密码:rlcardqqgroup
**新闻:**
* 请关注[DouZero](https://github.com/kwai/DouZero), 一个强大的斗地主AI,以及[ICML 2021论文](https://arxiv.org/abs/2106.06135)。点击[此处](https://douzero.org/)进入在线演示。该算法同样集成到了RLCard中,详见[在斗地主中训练DMC](docs/toy-examples.md#training-dmc-on-dou-dizhu)
* 我们的项目被用在[PettingZoo](https://github.com/PettingZoo-Team/PettingZoo)中,去看看吧!
* 我们发布了RLCard的可视化演示项目:RLCard-Showdown。请点击[此处](https://github.com/datamllab/rlcard-showdown)查看详情!
* Jupyter Notebook教程发布了!我们添加了一些R语言的例子,包括用reticulate调用RLCard的Python接口。[点击](docs/toy-examples-r.md)查看详情。
* 感谢[@Clarit7](https://github.com/Clarit7)为支持不同人数的二十一点游戏(Blackjack)做出的贡献。我们欢迎更多的贡献,以使得RLCard中的游戏配置更加多样化。点击[这里](CONTRIBUTING.md#making-configurable-environments)查看详情。
* 感谢[@Clarit7](https://github.com/Clarit7)为二十一点游戏(Blackjack)和限注德州扑克的人机界面做出的贡献。
* RLCard现支持本地随机环境种子和多进程。感谢[@weepingwillowben](https://github.com/weepingwillowben)提供的测试脚本。
* 无限注德州扑克人机界面现已可用。无限注德州扑克的动作空间已被抽象化。感谢[@AdrianP-](https://github.com/AdrianP-)做出的贡献。
* 新游戏Gin Rummy以及其可视化人机界面现已可用,感谢[@billh0420](https://github.com/billh0420)做出的贡献。
* PyTorch实现现已可用,感谢[@mjudell](https://github.com/mjudell)做出的贡献。
## 引用
如果本项目对您有帮助,请添加引用:
Zha, Daochen, et al. "RLCard: A Platform for Reinforcement Learning in Card Games." IJCAI. 2020.
```bibtex
@inproceedings{zha2020rlcard,
title={RLCard: A Platform for Reinforcement Learning in Card Games},
author={Zha, Daochen and Lai, Kwei-Herng and Huang, Songyi and Cao, Yuanpu and Reddy, Keerthana and Vargas, Juan and Nguyen, Alex and Wei, Ruzhe and Guo, Junyu and Hu, Xia},
booktitle={IJCAI},
year={2020}
}
```
## 安装
确保您已安装**Python 3.6+**和**pip**。我们推荐您使用`pip`安装稳定版本的`rlcard`:
```
pip3 install rlcard
```
默认安装方式只包括卡牌环境。如果想使用PyTorch实现的训练算法,运行
```
pip3 install rlcard[torch]
```
如果您访问较慢,国内用户可以通过清华镜像源安装:
```
pip3 install rlcard -i https://pypi.tuna.tsinghua.edu.cn/simple
```
或者,您可以克隆最新版本(如果您访问Github较慢,国内用户可以使用[Gitee镜像](https://gitee.com/daochenzha/rlcard)):
```
git clone https://github.com/datamllab/rlcard.git
```
或者只克隆一个分支以使其更快:
```
git clone -b master --single-branch --depth=1 https://github.com/datamllab/rlcard.git
```
然后运行以下命令进行安装
```
cd rlcard
pip3 install -e .
pip3 install -e .[torch]
```
我们也提供[**conda**安装方法](https://anaconda.org/toubun/rlcard):
```
conda install -c toubun rlcard
```
Conda安装只包含卡牌环境,您需要按照您的需求手动安装PyTorch。
## 释例
以下是一个**小例子**
```python
import rlcard
from rlcard.agents import RandomAgent
env = rlcard.make('blackjack')
env.set_agents([RandomAgent(num_actions=env.num_actions)])
print(env.num_actions) # 2
print(env.num_players) # 1
print(env.state_shape) # [[2]]
print(env.action_shape) # [None]
trajectories, payoffs = env.run()
```
RLCard可以灵活地连接各种算法,参考以下例子:
* [小试随机智能体](docs/toy-examples.md#playing-with-random-agents)
* [Blackjack上的Deep-Q学习](docs/toy-examples.md#deep-q-learning-on-blackjack)
* [在Leduc Hold'em上训练CFR(机会抽样)](docs/toy-examples.md#training-cfr-on-leduc-holdem)
* [与预训练Leduc模型游玩](docs/toy-examples.md#having-fun-with-pretrained-leduc-model)
* [在斗地主上训练DMC](docs/toy-examples.md#training-dmc-on-dou-dizhu)
* [评估智能体](docs/toy-examples.md#evaluating-agents)
## 演示
运行`examples/human/leduc_holdem_human.py`来游玩预训练的Leduc Hold'em模型。Leduc Hold'em是简化版的德州扑克,具体规则可以参考[这里](docs/games.md#leduc-holdem)
```
>> Leduc Hold'em pre-trained model
>> Start a new game!
>> Agent 1 chooses raise
=============== Community Card ===============
┌─────────┐
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
│░░░░░░░░░│
└─────────┘
=============== Your Hand ===============
┌─────────┐
│J │
│ │
│ │
│ ♥ │
│ │
│ │
│ J│
└─────────┘
=============== Chips ===============
Yours: +
Agent 1: +++
=========== Actions You Can Choose ===========
0: call, 1: raise, 2: fold
>> You choose action (integer):
```
我们也提供图形界面以实现更便捷的调试,详情请查看[这里](https://github.com/datamllab/rlcard-showdown/)。以下是一些演示:
![斗地主回放](https://github.com/datamllab/rlcard-showdown/blob/master/docs/imgs/doudizhu-replay.png?raw=true)
![Leduc回放](https://github.com/datamllab/rlcard-showdown/blob/master/docs/imgs/leduc-replay.png?raw=true)
## 可用环境
我们从不同角度提供每种游戏的估算复杂度。
**InfoSet数量:** 信息集数量;**InfoSet尺寸:** 单个信息集的平均状态数量;**动作尺寸:** 动作空间的尺寸;**环境名:** 应该传入`rlcard.make`以创建游戏环境的名称。除此之外,我们也提供每种环境的文档链接和随机智能体释例。
| 游戏 | InfoSet数量 | InfoSet尺寸 | 动作尺寸 | 环境名 | 用法 |
| :--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :-------------: | :---------------: | :---------: | :-------------: | :-----------------------------------------------------------------------------------------: |
| 二十一点 Blackjack ([wiki](https://en.wikipedia.org/wiki/Blackjack), [百科](https://baike.baidu.com/item/21%E7%82%B9/5481683?fr=aladdin)) | 10^3 | 10^1 | 10^0 | blackjack | [文档](docs/games.md#blackjack), [释例](examples/blackjack_random.py) |
| Leduc Hold’em ([论文](http://poker.cs.ualberta.ca/publications/UAI05.pdf)) | 10^2 | 10^2 | 10^0 | leduc-holdem | [文档](docs/games.md#leduc-holdem), [释例](examples/leduc_holdem_random.py) |
| 限注德州扑克 Limit Texas Hold'em ([wiki](https://en.wikipedia.org/wiki/Texas_hold_%27em), [百科](https://baike.baidu.com/item/%E5%BE%B7%E5%85%8B%E8%90%A8%E6%96%AF%E6%89%91%E5%85%8B/83440?fr=aladdin)) | 10^14 | 10^3 | 10^0 | limit-holdem | [文档](docs/games.md#limit-texas-holdem), [释例](examples/limit_holdem_random.py) |
| 斗地主 Dou Dizhu ([wiki](https://en.wikipedia.org/wiki/Dou_dizhu), [百科](https://baike.baidu.com/item/%E6%96%97%E5%9C%B0%E4%B8%BB/177997?fr=aladdin)) | 10^53 ~ 10^83 | 10^23 | 10^4 | doudizhu | [文档](docs/games.md#dou-dizhu), [释例](examples/doudizhu_random.py) |
| 麻将 Mahjong ([wiki](https://en.wikipedia.org/wiki/Competition_Mahjong_scoring_rules), [百科](https://baike.baidu.com/item/%E9%BA%BB%E5%B0%86/215)) | 10^121 | 10^48 | 10^2 | mahjong | [文档](docs/games.md#mahjong), [释例](examples/mahjong_random.py) |
| 无限注德州扑克 No-limit Texas Hold'em ([wiki](https://en.wikipedia.org/wiki/Texas_hold_%27em), [百科](https://baike.baidu.com/item/%E5%BE%B7%E5%85%8B%E8%90%A8%E6%96%AF%E6%89%91%E5%85%8B/83440?fr=aladdin)) | 10^162 | 10^3 | 10^4 | no-limit-holdem | [文档](docs/games.md#no-limit-texas-holdem), [释例](examples/nolimit_holdem_random.py) |
| UNO ([wiki](https://en.wikipedia.org/wiki/Uno_\(card_game\)), [百科](https://baike.baidu.com/item/UNO%E7%89%8C/2249587)) | 10^163 | 10^10 | 10^1 | uno | [文档](docs/games.md#uno), [释例](examples/uno_random.py) |
| Gin Rummy ([wiki](https://en.wikipedia.org/wiki/Gin_rummy), [百科](https://baike.baidu.com/item/%E9%87%91%E6%8B%89%E7%B1%B3/3471710)) | 10^52 | - | - | gin-rummy | [文档](docs/games.md#gin-rummy), [释例](examples/gin_rummy_random.py) |
## 支持算法
| 算法 | 释例 | 参考 |
| :--------------------------------------: | :-----------------------------------------: | :------------------------------------------------------------------------------------------------------: |
| 深度蒙特卡洛(Deep Monte-Carlo,DMC) | [examples/run\_dmc.py](examples/run_dmc.py) | [[论文]](https://arxiv.org/abs/2106.06135) |
| 深度Q学习 (Deep Q Learning, DQN) | [examples/run\_rl.py](examples/run_rl.py) | [[论文]](https://arxiv.org/abs/1312.5602) |
| 虚拟自我对局 (Neural Fictitious Self-Play,NFSP) | [examples/run\_rl.py](examples/run_rl.py) | [[论文]](https://arxiv.org/abs/1603.01121) |
| 虚拟遗憾最小化算法(Counterfactual Regret Minimization,CFR) | [examples/run\_cfr.py](examples/run_cfr.py) | [[论文]](http://papers.nips.cc/paper/3306-regret-minimization-in-games-with-incomplete-information.pdf) |
## 预训练和基于规则的模型
我们提供了一个[模型集合](rlcard/models)作为基准线。
| 模型 | 解释 |
| :--------------------------------------: | :------------------------------------------------------: |
| leduc-holdem-cfr | Leduc Hold'em上的预训练CFR(机会抽样)模型 |
| leduc-holdem-rule-v1 | 基于规则的Leduc Hold'em模型, v1 |
| leduc-holdem-rule-v2 | 基于规则的Leduc Hold'em模型, v2 |
| uno-rule-v1 | 基于规则的UNO模型,v1 |
| limit-holdem-rule-v1 | 基于规则的限注德州扑克模型,v1 |
| doudizhu-rule-v1 | 基于规则的斗地主模型,v1 |
| gin-rummy-novice-rule | Gin Rummy新手规则模型 |
## API小抄
### 如何创建新的环境
您可以使用以下的接口创建新环境,并且可以用字典传入一些可选配置项
* **env = rlcard.make(env_id, config={})**: 创建一个环境。`env_id`是环境的字符串代号;`config`是一个包含一些环境配置的字典,具体包括:
* `seed`:默认值`None`。设置一个本地随机环境种子用以复现结果。
* `allow_step_back`: 默认值`False`. `True`将允许`step_back`函数用以回溯遍历游戏树。
* 其他特定游戏配置:这些配置将以`game_`开头。目前我们只支持配置Blackjack游戏中的玩家数量`game_num_players`
环境创建完成后,我们就能访问一些游戏信息。
* **env.num_actions**: 动作数量。
* **env.num_players**: 玩家数量。
* **env.state_shape**: 观测到的状态空间的形状(shape)。
* **env.action_shape**: 动作特征的形状(shape),斗地主的动作可以被编码为特征。
### RLCard中的状态是什么
状态(State)是一个Python字典。它包括观测值`state['obs']`,合规动作`state['legal_actions']`,原始观测值`state['raw_obs']`和原始合规动作`state['raw_legal_actions']`
### 基础接口
以下接口提供基础功能,虽然其简单易用,但会对智能体做出一些前提假设。智能体必须符合[智能体模版](docs/developping-algorithms.md)
* **env.set_agents(agents)**: `agents``Agent`对象的列表。列表长度必须等于游戏中的玩家数量。
* **env.run(is_training=False)**: 运行一局完整游戏并返回轨迹(trajectories)和回报(payoffs)。该函数可以在`set_agents`被调用之后调用。如果`is_training`设定为`True`,它将使用智能体中的`step`函数来进行游戏;如果`is_training`设定为`False`,则会调用`eval_step`
### 高级接口
对于更高级的方法,可以使用以下接口来对游戏树进行更灵活的操作。这些接口不会对智能体有前提假设。
* **env.reset()**: 初始化一个游戏,返回状态和第一个玩家的ID。
* **env.step(action, raw_action=False)**: 推进环境到下一步骤。`action`可以是一个原始动作或整型数值;当传入原始动作(字符串)时,`raw_action`应该被设置为`True`
* **env.step_back()**: 只有当`allow_step_back`设定为`True`时可用,向后回溯一步。 该函数可以被用在需要操作游戏树的算法中,例如CFR(机会抽样)。
* **env.is_over()**: 如果当前游戏结束,则返回`True`,否则返回`False`
* **env.get_player_id()**: 返回当前玩家的ID。
* **env.get_state(player_id)**: 返回玩家ID`player_id`对应的状态。
* **env.get_payoffs()**: 在游戏结束时,返回所有玩家的回报(payoffs)列表。
* **env.get_perfect_information()**: (目前仅支持部分游戏)获取当前状态的完全信息。
## 库结构
主要模块的功能如下:
* [/examples](examples): 使用RLCard的一些样例。
* [/docs](docs): RLCard的文档。
* [/tests](tests): RLCard的测试脚本。
* [/rlcard/agents](rlcard/agents): 强化学习算法以及人类智能体。
* [/rlcard/envs](rlcard/envs): 环境包装(状态表述,动作编码等)。
* [/rlcard/games](rlcard/games): 不同的游戏引擎。
* [/rlcard/models](rlcard/models): 包括预训练模型和规则模型在内的模型集合。
## 更多文档
更多文档请参考[Documents](docs/README.md)。API文档可在我们的[网站](http://www.rlcard.org)查阅。
## 贡献
我们非常感谢对本项目的贡献!请为反馈或漏洞创建Issue。如果您想贡献代码,请参考[贡献指引](./CONTRIBUTING.md)。如果您有任何问题,请通过[daochen.zha@tamu.edu](mailto:daochen.zha@tamu.edu)联系[Daochen Zha](https://github.com/daochenzha)。
## 致谢
我们诚挚地感谢竞技世界网络技术有限公司(JJ World Network Technology Co.,LTD)为本项目提供的大力支持,以及所有来自社区成员的贡献。
# Documents of RLCard
## Overview
The toolkit wraps each game in an `Env` class with easy-to-use interfaces. The goal of this toolkit is to enable users to focus on algorithm development without caring about the environment. The following design principles are applied when developing the toolkit:
* **Reproducible.** Results on the environments can be reproduced. The same result should be obtained with the same random seed in different runs.
* **Accessible.** The experiences are collected and well organized after each game with easy-to-use interfaces. Users can conveniently configure state representation, action encoding, reward design, or even the game rules.
* **Scalable.** New card environments can be added conveniently into the toolkit with the above design principles. We also try to minimize the dependencies in the toolkit so that the codes can be easily maintained.
## User Guide
* [Toy examples](toy-examples.md)
* [RLCard high-level design](high-level-design.md)
* [Games in RLCard](games.md)
* [Algorithms in RLCard](algorithms.md)
## Developer Guide
* [Developing new algorithms](developping-algorithms.md)
* [Adding new environments](adding-new-environments.md)
* [Customizing environments](customizing-environments.md)
* [Adding pre-trained/rule-based models](adding-models.md)
## Application Programming Interface (API)
The API documents are available at the [Official Website](http://www.rlcard.org).
# Adding Pre-trained/Rule-based models
You can add your own pre-trained/rule-based models to the toolkit by following several steps:
* **Develop models.** You can either design a rule-based model or save a neural network model. For each game, you need to develop agents for all the players at the same time. You need to wrap each agent as an `Agent` class and make sure that `step`, `eval_step` and `use_raw` work correctly.
* **Wrap models.** You need to inherit the `Model` class in `rlcard/models/model.py`. Then put all the agents into a list. Rewrite the `agent` property to return this list.
* **Register the model.** Register the model in `rlcard/models/__init__.py`.
* **Load the model in the environment.** An example of loading the `leduc-holdem-nfsp` model is as follows:
```python
from rlcard import models