banditpylib.learners.mab_fbbai_learner

Classes

class banditpylib.learners.mab_fbbai_learner.MABFixedBudgetBAILearner(arm_num: int, budget: int, name: Optional[str])[source]

Abstract class for best-arm identification learners playing with the ordinary multi-armed bandit

This kind of learner aims to identify the best arm within a fixed budget of pulls.

Parameters
  • arm_num (int) – number of arms

  • budget (int) – total number of pulls

  • name (Optional[str]) – alias name

Inheritance

Inheritance diagram of MABFixedBudgetBAILearner
property arm_num: int

Number of arms

abstract property best_arm: int

Index of the best arm identified by the learner

property budget: int

Budget of the learner

property goal: banditpylib.learners.utils.Goal

Goal of the learner

property running_environment: Union[type, List[type]]

Type of bandit environment the learner plays with
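
Example

A minimal sketch of the fixed-budget interaction protocol this interface implies. The driver below is illustrative only: bandit is a hypothetical environment object (not banditpylib's real simulation API), and the arm_pulls field name is an assumption about the data_pb2.Actions schema.

  def run_fixed_budget_trial(bandit, learner) -> int:
      # `learner` exposes reset(), actions(context), update(feedback) and
      # best_arm as documented above; `bandit` is a hypothetical environment
      # with context() and feed(actions) methods (an assumption, not the
      # library's API).
      learner.reset()
      while True:
          actions = learner.actions(bandit.context())
          if not actions.arm_pulls:  # assumption: empty actions signal that the budget is spent
              break
          learner.update(bandit.feed(actions))
      return learner.best_arm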

class banditpylib.learners.mab_fbbai_learner.Uniform(arm_num: int, budget: int, name: Optional[str] = None)[source]

Uniform sampling policy

Play each arm the same number of times and then output the arm with the highest empirical mean.

Parameters
  • arm_num (int) – number of arms

  • budget (int) – total number of pulls

  • name (Optional[str]) – alias name
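
Example

A self-contained re-implementation sketch of the policy, independent of the banditpylib interfaces. Here pull is a hypothetical reward oracle standing in for the bandit environment, and discarding the remainder when budget is not divisible by arm_num is an assumption, not necessarily what this class does.

  import numpy as np

  def uniform_best_arm(pull, arm_num: int, budget: int) -> int:
      pulls_per_arm = budget // arm_num  # remainder pulls discarded (assumption)
      means = [
          np.mean([pull(arm) for _ in range(pulls_per_arm)])
          for arm in range(arm_num)
      ]
      return int(np.argmax(means))  # arm with the highest empirical mean

  # Usage with a toy Bernoulli bandit:
  # best = uniform_best_arm(lambda a: np.random.binomial(1, [0.3, 0.5, 0.7][a]), 3, 300)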

Inheritance

Inheritance diagram of Uniform
actions(context: data_pb2.Context) → data_pb2.Actions[source]

Actions of the learner

Parameters

context – contextual information about the bandit environment

Returns

actions to take

property best_arm: int

Index of the best arm identified by the learner

reset()[source]

Reset the learner

Warning

This function should be called before the start of the game.

update(feedback: data_pb2.Feedback)[source]

Update the learner

Parameters

feedback – feedback returned by the bandit environment after actions() is executed

class banditpylib.learners.mab_fbbai_learner.SH(arm_num: int, budget: int, threshold: int = 2, name: Optional[str] = None)[source]

Sequential halving policy [KKS13]

Eliminate half of the remaining arms in each round.

Parameters
  • arm_num (int) – number of arms

  • budget (int) – total number of pulls

  • threshold (int) – fall back to uniform sampling once the number of remaining arms is no greater than this number

  • name (Optional[str]) – alias name
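
Example

A re-implementation sketch of sequential halving with the threshold fallback described above. pull is a hypothetical reward oracle, and the exact per-round budget accounting is an assumption; the class's own bookkeeping may differ.

  import math
  import numpy as np

  def sequential_halving(pull, arm_num: int, budget: int, threshold: int = 2) -> int:
      active = list(range(arm_num))
      total_rounds = max(1, math.ceil(math.log2(arm_num)))
      budget_left = budget
      while len(active) > threshold:
          # Spread this round's share of the budget evenly over surviving arms.
          times = max(1, budget // (len(active) * total_rounds))
          means = [np.mean([pull(arm) for _ in range(times)]) for arm in active]
          budget_left -= times * len(active)
          # Keep the better half (rounded up), ranked by empirical mean.
          order = np.argsort(means)[::-1]
          active = [active[i] for i in order[: math.ceil(len(active) / 2)]]
      # Uniform sampling over the arms that remain, using the leftover budget.
      times = max(1, budget_left // len(active))
      means = [np.mean([pull(arm) for _ in range(times)]) for arm in active]
      return active[int(np.argmax(means))]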

Inheritance

Inheritance diagram of SH
actions(context: data_pb2.Context) → data_pb2.Actions[source]

Actions of the learner

Parameters

context – contextual information about the bandit environment

Returns

actions to take

property best_arm: int

Index of the best arm identified by the learner

reset()[source]

Reset the learner

Warning

This function should be called before the start of the game.

update(feedback: data_pb2.Feedback)[source]

Update the learner

Parameters

feedback – feedback returned by the bandit environment after actions() is executed

class banditpylib.learners.mab_fbbai_learner.SR(arm_num: int, budget: int, name: Optional[str] = None)[source]

Successive rejects policy [AB10]

Eliminate one arm in each round.

Parameters
  • arm_num (int) – number of arms

  • budget (int) – total number of pulls

  • name (Optional[str]) – alias name
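
Example

A re-implementation sketch of the round schedule from [AB10], using the usual normalizer log-bar(K) = 1/2 + sum_{i=2}^{K} 1/i. pull is a hypothetical reward oracle, the sketch assumes budget > arm_num, and it is not the class's actual code.

  import math
  import numpy as np

  def successive_rejects(pull, arm_num: int, budget: int) -> int:
      log_bar = 0.5 + sum(1.0 / i for i in range(2, arm_num + 1))
      active = list(range(arm_num))
      sums = np.zeros(arm_num)    # cumulative reward per arm
      counts = np.zeros(arm_num)  # cumulative pulls per arm
      n_prev = 0
      for k in range(1, arm_num):  # arm_num - 1 elimination rounds
          n_k = math.ceil((budget - arm_num) / (log_bar * (arm_num + 1 - k)))
          for arm in active:
              for _ in range(n_k - n_prev):  # top each surviving arm up to n_k pulls
                  sums[arm] += pull(arm)
                  counts[arm] += 1
          n_prev = n_k
          # Reject the surviving arm with the lowest empirical mean.
          worst = min(active, key=lambda a: sums[a] / counts[a])
          active.remove(worst)
      return active[0]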

Inheritance

Inheritance diagram of SR
actions(context: data_pb2.Context) → data_pb2.Actions[source]

Actions of the learner

Parameters

context – contextual information about the bandit environment

Returns

actions to take

property best_arm: int

Index of the best arm identified by the learner

reset()[source]

Reset the learner

Warning

This function should be called before the start of the game.

update(feedback: data_pb2.Feedback)[source]

Update the learner

Parameters

feedback – feedback returned by the bandit environment after actions() is executed