`banditpylib.protocols`

Classes

Protocol: Abstract class for a communication protocol which defines the principles of the interactions between the learner and the bandit environment
SinglePlayerProtocol: Single player protocol
CollaborativeLearningProtocol: Collaborative learning protocol [TZZ19]

class banditpylib.protocols.Protocol(bandit: banditpylib.bandits.utils.Bandit, learners: List[banditpylib.learners.utils.Learner])

Abstract class for a communication protocol which defines the principles of the interactions between the learner and the bandit environment.

Parameters

bandit (Bandit) – bandit environment
learners (List[Learner]) – learners used to run simulations

Inheritance

abstract property name: str – Protocol name

play(trials: int, output_filename: str, processes: int = -1, debug: bool = False, intermediate_horizons: List[int] = [], horizon: int = inf)

Start playing the game

Parameters

trials – number of repetitions
output_filename – name of the file used to dump the simulation results
processes – maximum number of processes to run. -1 means no limit
debug – debug mode. When it is set to True, trials will be automatically set to 1 and debug information of the trial will be printed out.
intermediate_horizons – report intermediate regrets after these horizons
horizon – horizon of the game. Different protocols may have different interpretations.

Warning

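By default, output_filename is opened in append mode (`a`), so results already written to the same file by earlier runs are kept and new results are added after them.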
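The effect of append mode can be illustrated without banditpylib at all. The sketch below is a minimal illustration, assuming a hypothetical helper `dump_trial` that writes one JSON line per trial; the JSON format and the helper names are assumptions made for this example and say nothing about the format the library actually writes. It shows that reusing a file name accumulates records, and that removing the file first gives a clean result set.

```python
import json
import os
import tempfile


def dump_trial(output_filename: str, record: dict) -> None:
    """Hypothetical stand-in for a protocol dumping one trial's result."""
    # Mode "a" appends to the end of the file and never truncates it.
    with open(output_filename, "a") as f:
        f.write(json.dumps(record) + "\n")


def count_records(output_filename: str) -> int:
    """Count the non-empty lines currently stored in the file."""
    with open(output_filename) as f:
        return sum(1 for line in f if line.strip())


output_filename = os.path.join(tempfile.mkdtemp(), "results.jsonl")

# First experiment: two trials.
for trial in range(2):
    dump_trial(output_filename, {"trial": trial, "regret": 1.0})

# Second experiment reusing the same file name: the old records remain.
for trial in range(2):
    dump_trial(output_filename, {"trial": trial, "regret": 2.0})
records_without_cleanup = count_records(output_filename)

# Removing the file before the next experiment avoids mixing results.
os.remove(output_filename)
for trial in range(2):
    dump_trial(output_filename, {"trial": trial, "regret": 3.0})
records_after_cleanup = count_records(output_filename)
```

In practice this suggests either deleting the output file or choosing a fresh file name before each new experiment.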

class banditpylib.protocols.SinglePlayerProtocol(bandit: banditpylib.bandits.utils.Bandit, learners: List[banditpylib.learners.utils.SinglePlayerLearner])

Single player protocol

This class defines the communication protocol for the ordinary single-player game. The game runs in rounds and during each round, the protocol runs the following steps in sequence:

fetch the state of the bandit environment and ask the learner for actions;
send the actions to the bandit environment for execution;
update the learner with the feedback of the bandit environment.

The game runs until one of the following two stopping conditions is satisfied:

no actions are returned by the learner;
the total number of actions reaches horizon.
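The round structure and the two stopping conditions above can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the library's implementation: `ToyBandit`, `RoundRobinLearner`, and `run_single_player` are hypothetical names invented for the example, actions are assumed to be `(arm, pulls)` pairs, and the horizon is checked only at the start of a round.

```python
import random
from typing import List, Optional, Tuple

Action = Tuple[int, int]           # (arm index, number of pulls)
Feedback = Tuple[int, int, float]  # (arm index, number of pulls, total reward)


class ToyBandit:
    """Hypothetical Bernoulli bandit; not the banditpylib Bandit class."""

    def __init__(self, means: List[float], seed: int = 0) -> None:
        self.means = means
        self.rng = random.Random(seed)

    def context(self) -> Optional[object]:
        # An ordinary multi-armed bandit exposes no state to the learner.
        return None

    def feed(self, actions: List[Action]) -> List[Feedback]:
        feedback = []
        for arm, pulls in actions:
            reward = sum(self.rng.random() < self.means[arm] for _ in range(pulls))
            feedback.append((arm, pulls, float(reward)))
        return feedback


class RoundRobinLearner:
    """Hypothetical learner: pulls arms in turn, `budget` pulls in total."""

    def __init__(self, arm_num: int, budget: int) -> None:
        self.arm_num = arm_num
        self.budget = budget
        self.pulls = [0] * arm_num

    def actions(self, context: Optional[object]) -> List[Action]:
        done = sum(self.pulls)
        if done >= self.budget:
            return []  # returning no actions ends the game
        return [(done % self.arm_num, 1)]

    def update(self, feedback: List[Feedback]) -> None:
        for arm, pulls, _ in feedback:
            self.pulls[arm] += pulls


def run_single_player(bandit, learner, horizon: float = float("inf")) -> int:
    """Run rounds until the learner returns no actions or horizon is reached."""
    total_actions = 0
    while total_actions < horizon:
        actions = learner.actions(bandit.context())  # step 1: state, then actions
        if not actions:
            break
        feedback = bandit.feed(actions)              # step 2: execute the actions
        learner.update(feedback)                     # step 3: update the learner
        total_actions += sum(pulls for _, pulls in actions)
    return total_actions


learner = RoundRobinLearner(arm_num=2, budget=10)
total = run_single_player(ToyBandit([0.3, 0.7]), learner)
capped = run_single_player(ToyBandit([0.3, 0.7]), RoundRobinLearner(2, 10), horizon=4)
```

In the first run the learner ends the game by returning no actions after its budget of 10 pulls; in the second the horizon of 4 ends it first.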

Parameters

bandit (Bandit) – bandit environment
learners (List[SinglePlayerLearner]) – learners to be compared

Note

During a round, a learner may want to perform multiple actions. Such a learner is called a batched learner.

Inheritance

property name: str – Protocol name

class banditpylib.protocols.CollaborativeLearningProtocol(bandit: banditpylib.bandits.utils.Bandit, learners: List[banditpylib.learners.utils.CollaborativeLearner])

Collaborative learning protocol [TZZ19]

This class defines the communication protocol for the collaborative learning multi-agent game as discussed in the reference paper. The game runs in rounds. During each round, the protocol runs the following steps in sequence:

For each agent,
- fetch the state of the corresponding bandit environment and ask the agent for actions;
- send the actions to the bandit environment for execution;
- update the agent with the feedback of the bandit environment;
- repeat the above steps until the agent enters the WAIT or STOP state.
If at least one agent is in the WAIT state, fetch the information broadcast by every waiting agent and send it to the master, which decides the arm assignment for the next round. Otherwise, stop the game.
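The round structure above can be sketched as a small self-contained simulation. Everything here is an assumption made for illustration: `ToyAgent`, `ToyMaster`, `run_collaborative`, and the `PULL` state label are hypothetical and are not the banditpylib classes, the master's rule (drop the worst reported arm each round) is a stand-in for a real elimination algorithm, and rewards are deterministic so the run is reproducible.

```python
from typing import Dict, List, Optional, Tuple

WAIT, STOP, PULL = "WAIT", "STOP", "PULL"  # hypothetical state labels


class ToyAgent:
    """Hypothetical agent: pulls its assigned arm a fixed number of times, then waits."""

    def __init__(self, pulls_per_round: int) -> None:
        self.pulls_per_round = pulls_per_round
        self.state = STOP
        self.arm: Optional[int] = None
        self.round_pulls = 0
        self.round_reward = 0.0
        self.total_pulls = 0

    def assign(self, arm: Optional[int]) -> None:
        # An agent that receives no arm has nothing left to do and stops.
        self.arm, self.round_pulls, self.round_reward = arm, 0, 0.0
        self.state = STOP if arm is None else PULL

    def update(self, reward: float) -> None:
        self.round_pulls += 1
        self.total_pulls += 1
        self.round_reward += reward
        if self.round_pulls == self.pulls_per_round:
            self.state = WAIT

    def broadcast(self) -> Tuple[int, float]:
        return self.arm, self.round_reward / self.round_pulls


class ToyMaster:
    """Hypothetical master: drops the worst reported arm after each round."""

    def __init__(self, arm_num: int) -> None:
        self.active = list(range(arm_num))

    def eliminate(self, messages: Dict[int, float]) -> None:
        self.active.remove(min(messages, key=messages.get))

    def assignment(self, agent_num: int) -> List[Optional[int]]:
        if len(self.active) <= 1:
            return [None] * agent_num  # a single arm is left: everyone stops
        arms = self.active[:agent_num]
        return arms + [None] * (agent_num - len(arms))


def run_collaborative(agents, master, arm_means) -> Tuple[int, int]:
    for agent, arm in zip(agents, master.assignment(len(agents))):
        agent.assign(arm)
    rounds = communication_rounds = 0
    while True:
        rounds += 1
        for agent in agents:
            # Repeat the action/feedback/update steps until WAIT or STOP.
            while agent.state not in (WAIT, STOP):
                agent.update(arm_means[agent.arm])  # deterministic toy feedback
        waiting = [a for a in agents if a.state == WAIT]
        if not waiting:
            break  # no agent is waiting: stop the game
        communication_rounds += 1
        master.eliminate(dict(a.broadcast() for a in waiting))
        for agent, arm in zip(agents, master.assignment(len(agents))):
            agent.assign(arm)
    return rounds, communication_rounds


agents = [ToyAgent(pulls_per_round=2) for _ in range(3)]
rounds, communication_rounds = run_collaborative(agents, ToyMaster(3), [0.2, 0.5, 0.9])
time_complexity = max(a.total_pulls for a in agents)
```

In this toy run the game takes three rounds with two communication rounds, and the time complexity is the largest per-agent pull count rather than the sum over agents.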

Parameters

bandit (Bandit) – bandit environment
learners (List[CollaborativeLearner]) – learners to be compared

Note

Each agent interacts with an independent bandit environment.

Note

Each action counts as a timestep. The time (or sample) complexity equals the maximum number of pulls across different agents.

Note

According to the protocol, the number of rounds always equals the number of communication rounds plus one.

Inheritance

property name: str – Protocol name