banditpylib.learners.mab_fcbai_learner¶
Classes¶
- MABFixedConfidenceBAILearner: Base class for bai learners in the ordinary multi-armed bandit
- LilUCBHeuristic: LilUCB heuristic policy [JMNB14]
- class banditpylib.learners.mab_fcbai_learner.MABFixedConfidenceBAILearner(arm_num: int, confidence: float, name: Optional[str])[source]¶
- Base class for bai learners in the ordinary multi-armed bandit - This learner aims to identify the best arm with fixed confidence. - Parameters
- arm_num (int) – number of arms 
- confidence (float) – confidence level. It should be within (0, 1). The algorithm should output the best arm with probability at least this value. 
- name (str) – alias name 
 
 - Inheritance  - property arm_num: int¶
- Number of arms 
 - abstract property best_arm: int¶
- Index of the best arm identified by the learner 
 - property confidence: float¶
- Confidence level of the learner 
 - property goal: banditpylib.learners.utils.Goal¶
- Goal of the learner 
 - property running_environment: Union[type, List[type]]¶
- Type of bandit environment the learner plays with 
 
- class banditpylib.learners.mab_fcbai_learner.ExpGap(arm_num: int, confidence: float, threshold: int = 2, name: Optional[str] = None)[source]¶
- Exponential-gap elimination policy [KKS13] - Parameters
- arm_num (int) – number of arms 
- confidence confidence (float) – confidence level. It should be within (0, 1). The algorithm should output the best arm with probability at least this value. 
- threshold (int) – do uniform sampling when the active arms are no greater than the threshold within median elimination 
- name (Optional[str]) – alias name 
 
 - Inheritance  - actions(context: data_pb2.Context) → data_pb2.Actions[source]¶
- Actions of the learner - Parameters
- context – contextual information about the bandit environment 
- Returns
- actions to take 
 
 - property best_arm: int¶
- Index of the best arm identified by the learner 
 - reset()[source]¶
- Reset the learner - Warning - This function should be called before the start of the game. 
 - property stage: str¶
- Stage of the learner 
 
- class banditpylib.learners.mab_fcbai_learner.LilUCBHeuristic(arm_num: int, confidence: float, name: Optional[str] = None)[source]¶
- LilUCB heuristic policy [JMNB14] - Parameters
- arm_num (int) – number of arms 
- confidence (float) – confidence level. It should be within (0, 1). The algorithm should output the best arm with probability at least this value. 
- name (Optional[str]) – alias name 
 
 - Inheritance  - actions(context: data_pb2.Context) → data_pb2.Actions[source]¶
- Actions of the learner - Parameters
- context – contextual information about the bandit environment 
- Returns
- actions to take 
 
 - property best_arm: int¶
- Index of the best arm identified by the learner