banditpylib.protocols

Classes

class banditpylib.protocols.Protocol(bandit: banditpylib.bandits.utils.Bandit, learners: List[banditpylib.learners.utils.Learner])[source]

Abstract class for a communication protocol which defines the principles of the interactions between the learner and the bandit environment.

Parameters
  • bandit (Bandit) – bandit environment

  • learners (List[Learner]) – learners used to run simulations

Inheritance

Inheritance diagram of Protocol

abstract property name: str

Protocol name

play(trials: int, output_filename: str, processes: int = -1, debug: bool = False, intermediate_horizons: List[int] = [], horizon: int = inf)[source]

Start playing the game

Parameters
  • trials – number of repetitions

  • output_filename – name of the file used to dump the simulation results

  • processes – maximum number of processes to run. -1 means no limit

  • debug – debug mode. When it is set to True, trials will be automatically set to 1 and debug information of the trial will be printed out.

  • intermediate_horizons – report intermediate regrets after these horizons

  • horizon – horizon of the game. Different protocols may have different interpretations.
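To illustrate how intermediate_horizons and horizon might relate, the sketch below records cumulative pseudo-regret at a few checkpoints and once more at the final horizon. It does not use banditpylib: run_trial, the arm means, and the round-robin policy are hypothetical stand-ins chosen so that the numbers are deterministic, and a real protocol may interpret the horizon differently.

```python
from typing import Dict, List


def run_trial(means: List[float], horizon: int,
              intermediate_horizons: List[int]) -> Dict[int, float]:
    """Hypothetical trial: pull arms round-robin and record cumulative
    pseudo-regret at every checkpoint and at the final horizon."""
    best = max(means)
    checkpoints = set(intermediate_horizons) | {horizon}
    regret = 0.0
    report = {}
    for t in range(1, horizon + 1):
        arm = (t - 1) % len(means)     # round-robin policy (assumption)
        regret += best - means[arm]    # expected loss of this pull
        if t in checkpoints:
            report[t] = regret
    return report


report = run_trial(means=[0.5, 1.0], horizon=6, intermediate_horizons=[2, 4])
print(report)
```

The returned dictionary maps each reporting point (here 2, 4 and the horizon 6) to the regret accumulated up to that point.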

Warning

By default, output_filename is opened in append mode ('a'). Results already stored in the file are kept, and new results are written after them.
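The practical consequence of append mode is that repeated runs with the same file name accumulate in one file. The sketch below shows this with plain Python file handling; dump_results is a hypothetical stand-in for the library's result dump, not its actual implementation.

```python
import os
import tempfile


def dump_results(output_filename: str, line: str) -> None:
    """Hypothetical stand-in for a results dump that opens in append mode."""
    with open(output_filename, 'a') as f:
        f.write(line + '\n')


path = os.path.join(tempfile.mkdtemp(), 'results.txt')

dump_results(path, 'run 1')
dump_results(path, 'run 2')    # same file: the second run is appended
with open(path) as f:
    lines_after_two_runs = f.read().splitlines()

os.remove(path)                # start again from a clean file
dump_results(path, 'run 3')
with open(path) as f:
    lines_after_reset = f.read().splitlines()
```

Deleting the file (or choosing a fresh name) before a new experiment avoids mixing results from unrelated runs.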

class banditpylib.protocols.SinglePlayerProtocol(bandit: banditpylib.bandits.utils.Bandit, learners: List[banditpylib.learners.utils.SinglePlayerLearner])[source]

Single player protocol

This class defines the communication protocol for the ordinary single-player game. The game runs in rounds and during each round, the protocol runs the following steps in sequence:

  • fetch the state of the bandit environment and ask the learner for actions;

  • send the actions to the bandit environment for execution;

  • update the learner with the feedback of the bandit environment.

The game runs until one of the following two stopping conditions is satisfied:

  • no actions are returned by the learner;

  • the total number of actions reaches the horizon.
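The round structure and the two stopping conditions above can be sketched as a short loop. This is a minimal illustration, not banditpylib's implementation: ToyBandit, ToyLearner, play_one_trial, and the (arm, pulls) action format are all hypothetical, and the horizon is only checked between rounds for simplicity.

```python
from typing import Dict, List, Tuple


class ToyBandit:
    """Hypothetical deterministic bandit: arm i always pays means[i]."""

    def __init__(self, means: List[float]):
        self.means = means

    def context(self):
        return None  # this toy environment has no state to report

    def feed(self, actions: List[Tuple[int, int]]) -> List[Tuple[int, float]]:
        # Each action is (arm, pulls); feedback is (arm, total reward).
        return [(arm, self.means[arm] * pulls) for arm, pulls in actions]


class ToyLearner:
    """Hypothetical learner: pulls each arm once, then returns no actions."""

    def __init__(self, arm_num: int):
        self.pending = list(range(arm_num))
        self.rewards: Dict[int, float] = {}

    def actions(self, context) -> List[Tuple[int, int]]:
        if not self.pending:
            return []
        return [(self.pending.pop(0), 1)]

    def update(self, feedback: List[Tuple[int, float]]) -> None:
        for arm, reward in feedback:
            self.rewards[arm] = self.rewards.get(arm, 0.0) + reward


def play_one_trial(bandit: ToyBandit, learner: ToyLearner,
                   horizon: float) -> int:
    total_actions = 0
    while total_actions < horizon:                   # stop: horizon reached
        actions = learner.actions(bandit.context())  # step 1: ask for actions
        if not actions:                              # stop: nothing returned
            break
        feedback = bandit.feed(actions)              # step 2: execute actions
        learner.update(feedback)                     # step 3: pass feedback
        total_actions += sum(pulls for _, pulls in actions)
    return total_actions


means = [0.1, 0.5, 0.9]
stopped_by_learner = play_one_trial(ToyBandit(means), ToyLearner(3),
                                    horizon=float('inf'))
stopped_by_horizon = play_one_trial(ToyBandit(means), ToyLearner(3),
                                    horizon=2)
```

In the first call the learner runs out of actions after three pulls; in the second the horizon of 2 ends the game first.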

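Parameters
  • bandit (Bandit) – bandit environment

  • learners (List[SinglePlayerLearner]) – learners used to run simulations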

Note

During a round, a learner may want to perform multiple actions. Such a learner is called a batched learner.

Inheritance

Inheritance diagram of SinglePlayerProtocol

property name: str

Protocol name

class banditpylib.protocols.CollaborativeLearningProtocol(bandit: banditpylib.bandits.utils.Bandit, learners: List[banditpylib.learners.utils.CollaborativeLearner])[source]

Collaborative learning protocol [TZZ19]

This class defines the communication protocol for the collaborative learning multi-agent game as discussed in the reference paper. The game runs in rounds. During each round, the protocol runs the following steps in sequence:

  • For each agent,

    • fetch the state of the corresponding bandit environment and ask the agent for actions;

    • send the actions to the bandit environment for execution;

    • update the agent with the feedback of the bandit environment;

    • repeat the above steps until the agent enters the WAIT or STOP state.

  • If at least one agent is in the WAIT state, fetch the information broadcast by every waiting agent and send it to the master, which decides the arm assignment for the next round. Otherwise, stop the game.
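The control flow described above can be sketched as follows. Everything here is a hypothetical toy rather than the library's classes: State, ToyBandit, ToyAgent, the master function (which simply sends all waiting agents to the best arm reported), and the (arm, pulls) action format are assumptions made for illustration.

```python
from enum import Enum
from typing import Dict, List, Tuple


class State(Enum):
    ACTIVE = "active"
    WAIT = "wait"
    STOP = "stop"


class ToyBandit:
    """Hypothetical deterministic bandit: arm i always pays means[i]."""

    def __init__(self, means: List[float]):
        self.means = means

    def feed(self, actions):
        return [(arm, self.means[arm] * pulls) for arm, pulls in actions]


class ToyAgent:
    """Hypothetical agent that plays a fixed number of rounds."""

    def __init__(self, arm: int, pulls_per_round: int, max_rounds: int):
        self.pulls_per_round = pulls_per_round
        self.rounds_left = max_rounds
        self.total_pulls = 0
        self.assign(arm)

    def assign(self, arm: int) -> None:
        # Start a new round on the arm chosen by the master.
        self.arm = arm
        self.pulls_left = self.pulls_per_round
        self.reward_sum = 0.0
        self.state = State.ACTIVE

    def actions(self) -> List[Tuple[int, int]]:
        return [(self.arm, 1)]

    def update(self, feedback) -> None:
        self.reward_sum += sum(reward for _, reward in feedback)
        self.pulls_left -= 1
        self.total_pulls += 1
        if self.pulls_left == 0:
            self.rounds_left -= 1
            self.state = State.WAIT if self.rounds_left > 0 else State.STOP

    def broadcast(self) -> Tuple[int, float]:
        return self.arm, self.reward_sum / self.pulls_per_round


def master(messages: Dict[int, Tuple[int, float]]) -> Dict[int, int]:
    # Toy rule: every waiting agent moves to the best arm reported so far.
    best_arm, _ = max(messages.values(), key=lambda m: m[1])
    return {agent_id: best_arm for agent_id in messages}


def play(agents: List[ToyAgent], bandits: List[ToyBandit]) -> Tuple[int, int]:
    rounds = communication_rounds = 0
    while True:
        rounds += 1
        for agent, bandit in zip(agents, bandits):
            # The toy bandit has no state to fetch, so that step is omitted.
            while agent.state is State.ACTIVE:
                agent.update(bandit.feed(agent.actions()))
        waiting = {i: a for i, a in enumerate(agents) if a.state is State.WAIT}
        if not waiting:
            break                                   # nobody waits: game over
        messages = {i: a.broadcast() for i, a in waiting.items()}
        for i, arm in master(messages).items():
            agents[i].assign(arm)
        communication_rounds += 1
    return rounds, communication_rounds


means = [0.1, 0.5, 0.9]
agents = [ToyAgent(arm=i, pulls_per_round=2, max_rounds=3) for i in range(3)]
bandits = [ToyBandit(means) for _ in agents]  # one environment per agent
rounds, communication_rounds = play(agents, bandits)
```

In this toy run the agents play three rounds with two communication steps in between, and all of them end up on arm 2, the arm with the highest mean.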

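Parameters
  • bandit (Bandit) – bandit environment

  • learners (List[CollaborativeLearner]) – learners used to run simulations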

Note

Each agent interacts with an independent bandit environment.

Note

Each action counts as one timestep. The time (or sample) complexity equals the maximum number of pulls across the agents.

Note

According to the protocol, the number of rounds always equals the number of communication rounds plus one.
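The two accounting rules in the notes above can be checked on a small example. The pull counts below are invented for illustration and do not come from any real run.

```python
from typing import List

# Hypothetical log: pulls_per_agent[i][r] is the number of pulls agent i
# made in round r. All three agents took part in three rounds.
pulls_per_agent: List[List[int]] = [
    [4, 2, 1],
    [4, 3, 0],
    [4, 2, 2],
]

# Each action is one timestep, so an agent's cost is its total pull count,
# and the time complexity is the largest total over all agents.
time_complexity = max(sum(per_round) for per_round in pulls_per_agent)

# Communication happens only between consecutive rounds.
rounds = len(pulls_per_agent[0])
communication_rounds = rounds - 1
```

Here the agents' totals are 7, 7 and 8 pulls, so the time complexity is 8, and three rounds imply two communication rounds.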

Inheritance

Inheritance diagram of CollaborativeLearningProtocol

property name: str

Protocol name