banditpylib.protocols
Classes
Protocol
: Abstract class for a communication protocol which defines the principles of the interactions between the learner and the bandit environment.
SinglePlayerProtocol
: Single player protocol
CollaborativeLearningProtocol
: Collaborative learning protocol [TZZ19]
- class banditpylib.protocols.Protocol(bandit: banditpylib.bandits.utils.Bandit, learners: List[banditpylib.learners.utils.Learner])
Abstract class for a communication protocol which defines the principles of the interactions between the learner and the bandit environment.
- Parameters
bandit (Bandit) – bandit environment
learners (List[Learner]) – learners to be compared
- abstract property name: str
Protocol name
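The abstract name property means that every concrete protocol must supply its own name before it can be instantiated. The following is a minimal sketch of that pattern using Python's abc module. ProtocolSketch, SinglePlayerSketch and can_instantiate are hypothetical names introduced for illustration and are not part of banditpylib.

```python
from abc import ABC, abstractmethod


class ProtocolSketch(ABC):
    """Hypothetical stand-in for an abstract protocol; not the library class."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Protocol name."""


class SinglePlayerSketch(ProtocolSketch):
    """Hypothetical concrete protocol that supplies the required name."""

    @property
    def name(self) -> str:
        return 'single_player_sketch'


def can_instantiate(cls) -> bool:
    """Return True if cls() succeeds, False if abstract members are missing."""
    try:
        cls()
    except TypeError:
        return False
    return True
```

In this sketch, the abstract base cannot be instantiated, while the subclass that overrides name can.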
- play(trials: int, output_filename: str, processes: int = -1, debug: bool = False, intermediate_horizons: List[int] = [], horizon: int = inf)
Start playing the game
- Parameters
trials – number of repetitions
output_filename – name of the file used to dump the simulation results
processes – maximum number of processes to run. -1 means no limit
debug – debug mode. When it is set to True, trials will be automatically set to 1 and debug information of the trial will be printed out.
intermediate_horizons – report intermediate regrets after these horizons
horizon – horizon of the game. Different protocols may have different interpretations.
Warning
By default, output_filename will be opened in append mode (mode a), so new results are added after any results already in the file.
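The practical consequence of append mode is that results from repeated runs accumulate in the same file. The sketch below illustrates this with plain Python file handling. The one-JSON-record-per-line format and the helpers dump_trial, count_records and fresh_output are assumptions made for illustration; the format the library actually writes is not described here.

```python
import json
import os
import tempfile


def dump_trial(output_filename: str, record: dict) -> None:
    """Append one record per line, as a file opened with mode 'a' would."""
    with open(output_filename, 'a') as f:
        f.write(json.dumps(record) + '\n')


def count_records(output_filename: str) -> int:
    """Count the lines (records) currently stored in the file."""
    with open(output_filename) as f:
        return sum(1 for _ in f)


def fresh_output(output_filename: str) -> str:
    """Remove a stale results file so that a new run starts from empty."""
    if os.path.exists(output_filename):
        os.remove(output_filename)
    return output_filename


path = os.path.join(tempfile.mkdtemp(), 'results.jsonl')
dump_trial(path, {'trial': 0, 'regret': 1.5})  # first run
dump_trial(path, {'trial': 0, 'regret': 2.0})  # second run appends
records_after_two_runs = count_records(path)

fresh_output(path)
dump_trial(path, {'trial': 0, 'regret': 1.0})  # run after cleanup
records_after_cleanup = count_records(path)
```

Removing or renaming the output file between experiments, as fresh_output does here, is one way to keep results from different runs apart.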
- class banditpylib.protocols.SinglePlayerProtocol(bandit: banditpylib.bandits.utils.Bandit, learners: List[banditpylib.learners.utils.SinglePlayerLearner])
Single player protocol
This class defines the communication protocol for the ordinary single-player game. The game runs in rounds and during each round, the protocol runs the following steps in sequence:
fetch the state of the bandit environment and ask the learner for actions;
send the actions to the bandit environment for execution;
update the learner with the feedback of the bandit environment.
The game runs until one of the following two stopping conditions is satisfied:
no actions are returned by the learner;
the total number of actions reaches horizon.
- Parameters
bandit (Bandit) – bandit environment
learners (List[SinglePlayerLearner]) – learners to be compared
Note
A learner may perform multiple actions in a single round; such a learner is called a batched learner.
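The round structure and the two stopping conditions described above can be sketched as a short loop. This is a simplified illustration, not the library's implementation: ToyBandit, BatchedRoundRobin and run_game are hypothetical, and the representation of actions as (arm index, number of pulls) pairs is an assumption.

```python
import random
from typing import List, Optional, Tuple

Actions = List[Tuple[int, int]]         # (arm index, number of pulls)
Feedback = List[Tuple[int, List[int]]]  # (arm index, observed rewards)


class ToyBandit:
    """Hypothetical Bernoulli bandit; not the banditpylib Bandit class."""

    def __init__(self, means: List[float], seed: int = 0):
        self.means = means
        self.rng = random.Random(seed)

    def context(self) -> None:
        # An ordinary multi-armed bandit has no state to report.
        return None

    def feed(self, actions: Actions) -> Feedback:
        return [(arm, [int(self.rng.random() < self.means[arm])
                       for _ in range(pulls)])
                for arm, pulls in actions]


class BatchedRoundRobin:
    """Hypothetical batched learner: `batch` pulls of every arm per round."""

    def __init__(self, arm_num: int, batch: int, budget: int):
        self.arm_num, self.batch, self.budget = arm_num, batch, budget
        self.pulls_made = 0

    def actions(self, context: Optional[object]) -> Actions:
        if self.pulls_made >= self.budget:
            return []  # returning no actions ends the game
        return [(arm, self.batch) for arm in range(self.arm_num)]

    def update(self, feedback: Feedback) -> None:
        self.pulls_made += sum(len(rewards) for _, rewards in feedback)


def run_game(bandit, learner, horizon: int) -> Tuple[int, int]:
    """Return (total number of actions, number of rounds played)."""
    total_actions, rounds = 0, 0
    while total_actions < horizon:                   # stopping condition 2
        actions = learner.actions(bandit.context())  # step 1
        if not actions:                              # stopping condition 1
            break
        feedback = bandit.feed(actions)              # step 2
        learner.update(feedback)                     # step 3
        total_actions += sum(pulls for _, pulls in actions)
        rounds += 1
    return total_actions, rounds


means = [0.2, 0.5, 0.8]
stopped_by_horizon = run_game(
    ToyBandit(means), BatchedRoundRobin(3, 2, budget=100), horizon=12)
stopped_by_learner = run_game(
    ToyBandit(means), BatchedRoundRobin(3, 2, budget=6), horizon=1000)
```

Because this sketch checks the horizon only at the start of a round, a batched learner could overshoot it within a round. How the library treats that case is not stated here.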
- property name: str
Protocol name
- class banditpylib.protocols.CollaborativeLearningProtocol(bandit: banditpylib.bandits.utils.Bandit, learners: List[banditpylib.learners.utils.CollaborativeLearner])
Collaborative learning protocol [TZZ19]
This class defines the communication protocol for the collaborative learning multi-agent game as discussed in the reference paper. The game runs in rounds. During each round, the protocol runs the following steps in sequence:
For each agent,
fetch the state of the corresponding bandit environment and ask the agent for actions;
send the actions to the bandit environment for execution;
update the agent with the feedback of the bandit environment;
repeat the above steps until the agent enters the WAIT or STOP state.
If at least one agent is in the WAIT state, fetch the information broadcast by every waiting agent and send it to the master, which decides the arm assignment for the next round. Otherwise, stop the game.
- Parameters
bandit (Bandit) – bandit environment
learners (List[CollaborativeLearner]) – learners to be compared
Note
Each agent interacts with an independent bandit environment.
Note
Each action counts as a timestep. The time (or sample) complexity equals the maximum number of pulls across different agents.
Note
According to the protocol, the number of rounds always equals the number of communication rounds plus one.
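The round structure and the notes above can be illustrated with a toy simulation. This is only a sketch under stated assumptions and is not the algorithm of [TZZ19] or the library's code: ToyBandit, ToyAgent, ToyMaster and run_game are hypothetical, each agent pulls a single assigned arm a fixed number of times per round, and the master simply keeps the better half of the arms after each communication.

```python
import random
from enum import Enum
from typing import List, Optional, Tuple


class State(Enum):
    ACTIVE = 0
    WAIT = 1
    STOP = 2


class ToyBandit:
    """Hypothetical Bernoulli bandit; each agent owns an independent copy."""

    def __init__(self, means: List[float], seed: int):
        self.means, self.rng = means, random.Random(seed)

    def pull(self, arm: int) -> int:
        return int(self.rng.random() < self.means[arm])


class ToyAgent:
    """Hypothetical agent: pulls its assigned arm a fixed number of times."""

    def __init__(self, pulls_per_round: int):
        self.pulls_per_round = pulls_per_round
        self.total_pulls = 0

    def start_round(self, arm: Optional[int]) -> None:
        self.arm, self.rewards = arm, []
        # With no arm assigned, the agent has nothing left to do.
        self.state = State.STOP if arm is None else State.ACTIVE

    def actions(self) -> Optional[int]:
        if len(self.rewards) >= self.pulls_per_round:
            self.state = State.WAIT  # done for this round, ready to broadcast
            return None
        return self.arm

    def update(self, reward: int) -> None:
        self.rewards.append(reward)
        self.total_pulls += 1  # each action counts as one timestep

    def broadcast(self) -> Tuple[int, float]:
        return self.arm, sum(self.rewards) / len(self.rewards)


class ToyMaster:
    """Hypothetical master: keeps the better half of the arms each round."""

    def __init__(self, arm_num: int):
        self.surviving = list(range(arm_num))

    def spread(self, agent_num: int) -> List[Optional[int]]:
        return [self.surviving[i % len(self.surviving)]
                for i in range(agent_num)]

    def assign(self, messages, agent_num: int) -> List[Optional[int]]:
        means = {}
        for arm, mean in messages:
            means.setdefault(arm, []).append(mean)
        ranked = sorted(means, reverse=True,
                        key=lambda a: sum(means[a]) / len(means[a]))
        self.surviving = ranked[:max(1, len(ranked) // 2)]
        if len(self.surviving) == 1:
            return [None] * agent_num  # one arm left: every agent stops
        return self.spread(agent_num)


def run_game(means: List[float], agent_num: int, pulls_per_round: int):
    """Return (rounds, communication rounds, time complexity)."""
    bandits = [ToyBandit(means, seed=i) for i in range(agent_num)]
    agents = [ToyAgent(pulls_per_round) for _ in range(agent_num)]
    master = ToyMaster(len(means))
    assignment = master.spread(agent_num)
    rounds = communication_rounds = 0
    while True:
        rounds += 1
        for agent, bandit, arm in zip(agents, bandits, assignment):
            agent.start_round(arm)
            while agent.state is State.ACTIVE:
                action = agent.actions()
                if action is not None:
                    agent.update(bandit.pull(action))
        waiting = [a for a in agents if a.state is State.WAIT]
        if not waiting:
            break  # nobody is waiting, so the game stops
        communication_rounds += 1
        assignment = master.assign([a.broadcast() for a in waiting], agent_num)
    return rounds, communication_rounds, max(a.total_pulls for a in agents)


rounds, communication_rounds, time_complexity = run_game(
    [0.1, 0.3, 0.5, 0.9], agent_num=4, pulls_per_round=50)
```

In this run, four arms are halved twice, so there are two communication rounds followed by a final round in which every agent enters STOP, giving three rounds in total. Each agent makes 100 pulls, which is also the time complexity under the definition above.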
- property name: str
Protocol name