banditpylib.learners.mnl_bandit_learner

Classes

class banditpylib.learners.mnl_bandit_learner.MNLBanditLearner(revenues: numpy.ndarray, reward: banditpylib.bandits.mnl_bandit_utils.Reward, card_limit: int, use_local_search: bool, random_neighbors: int, name: Optional[str])[source]

Abstract class for learners playing with an MNL bandit

Product 0 is reserved for the non-purchase option, and its preference parameter is assumed to be 1 (see the sketch after the parameter list).

Parameters
  • revenues (np.ndarray) – product revenues

  • reward (Reward) – reward the learner wants to maximize

  • card_limit (int) – cardinality constraint

  • use_local_search (bool) – whether to use local search for searching the best assortment

  • random_neighbors (int) – number of random neighbors to look up if local search is enabled

  • name (Optional[str]) – alias name
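
The revenue convention above can be made concrete with a short sketch (plain numpy; the value 0 for the non-purchase entry follows from non-purchase generating no revenue):

    import numpy as np

    # Index 0 is reserved for the non-purchase option, so its revenue is 0;
    # entries 1..3 are the revenues of the actual products.
    revenues = np.array([0.0, 0.45, 0.8, 0.9])

    # product_num excludes the non-purchase option
    product_num = len(revenues) - 1  # 3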

Inheritance

Inheritance diagram of MNLBanditLearner
property card_limit: int

Cardinality limit

property goal: banditpylib.learners.utils.Goal

Goal of the learner

property product_num: int

Number of products (not including product 0)

property random_neighbors: int

Number of random neighbors to look up when local search is enabled

property revenues: numpy.ndarray

Revenues of products (product 0 is included)

property reward: banditpylib.bandits.mnl_bandit_utils.Reward

Reward the learner wants to maximize

property running_environment: Union[type, List[type]]

Type of bandit environment the learner plays with

property use_local_search: bool

Whether local search is enabled
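
When use_local_search is enabled, the learner searches for the best assortment by local search instead of enumerating all feasible assortments. The following hill-climbing sketch illustrates the idea only; it is not banditpylib's implementation, and reward_of is a stand-in for the learner's estimated reward of an assortment:

    import random
    from typing import Callable, FrozenSet

    def local_search_sketch(
            product_num: int,
            card_limit: int,
            random_neighbors: int,
            reward_of: Callable[[FrozenSet[int]], float]) -> FrozenSet[int]:
        """Illustrative local search over assortments (not banditpylib's code)."""
        products = list(range(1, product_num + 1))  # product 0 excluded
        # Start from a random feasible assortment.
        current = frozenset(random.sample(products,
                                          k=min(card_limit, product_num)))
        improved = True
        while improved:
            improved = False
            # Look up a fixed number of random neighbors per round.
            for _ in range(random_neighbors):
                neighbor = set(current)
                # Perturb by toggling membership of one random product.
                product = random.choice(products)
                if product in neighbor:
                    neighbor.discard(product)
                else:
                    neighbor.add(product)
                if (neighbor and len(neighbor) <= card_limit
                        and reward_of(frozenset(neighbor)) > reward_of(current)):
                    current = frozenset(neighbor)
                    improved = True
        return current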

class banditpylib.learners.mnl_bandit_learner.UCB(revenues: numpy.ndarray, reward: banditpylib.bandits.mnl_bandit_utils.Reward, card_limit: int = inf, use_local_search: bool = False, random_neighbors: int = 10, name: Optional[str] = None)[source]

UCB policy [AAGZ19]

Parameters
  • revenues (np.ndarray) – product revenues

  • reward (Reward) – reward the learner wants to maximize

  • card_limit (int) – cardinality constraint

  • use_local_search (bool) – whether to use local search for searching the best assortment

  • random_neighbors (int) – number of random neighbors to look up if local search is enabled

  • name (Optional[str]) – alias name

Inheritance

Inheritance diagram of UCB
actions(context: data_pb2.Context) → data_pb2.Actions[source]

Actions of the learner

Parameters

context – contextual information about the bandit environment

Returns

actions to take

reset()[source]

Reset the learner

Warning

This function should be called before the start of the game.

update(feedback: data_pb2.Feedback)[source]

Update the learner

Parameters

feedback – feedback returned by the bandit environment after actions() is executed
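
Together, reset(), actions(), and update() form the learner's interaction protocol. The loop below is a sketch of one game; bandit is a hypothetical environment object, and its context() and feed() accessors are illustrative names only (banditpylib's protocol classes normally drive this loop):

    # Hypothetical interaction loop; accessor names on `bandit` are assumed.
    learner.reset()  # must be called before the start of the game
    for _ in range(horizon):
        context = bandit.context()          # hypothetical accessor
        actions = learner.actions(context)  # assortment(s) to serve
        feedback = bandit.feed(actions)     # hypothetical accessor
        learner.update(feedback)            # incorporate purchase observations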

class banditpylib.learners.mnl_bandit_learner.EpsGreedy(revenues: numpy.ndarray, reward: banditpylib.bandits.mnl_bandit_utils.Reward, card_limit: int = inf, use_local_search: bool = False, random_neighbors: int = 10, eps: float = 1.0, name: Optional[str] = None)[source]

Epsilon-Greedy policy

With probability \(\frac{\epsilon}{t}\) at time step \(t\), sample an assortment uniformly at random; with the remaining probability, serve the assortment with the maximum empirical reward (see the sketch after the parameter list).

Parameters
  • revenues (np.ndarray) – product revenues

  • reward (Reward) – reward the learner wants to maximize

  • card_limit (int) – cardinality constraint

  • use_local_search (bool) – whether to use local search for searching the best assortment

  • random_neighbors (int) – number of random neighbors to look up if local search is enabled

  • eps (float) – epsilon

  • name (Optional[str]) – alias name
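
The exploration rule is simple enough to state in code. A minimal sketch of the decision (illustrative, not banditpylib's implementation; t is the 1-based time step):

    import random

    def should_explore(eps: float, t: int) -> bool:
        """Epsilon-Greedy rule: explore with probability eps / t."""
        return random.random() < eps / t

With eps = 1.0 (the default), the exploration probability decays as 1/t, so the learner explores less as its empirical estimates sharpen.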

Inheritance

Inheritance diagram of EpsGreedy
actions(context: data_pb2.Context) → data_pb2.Actions[source]

Actions of the learner

Parameters

context – contextual information about the bandit environment

Returns

actions to take

reset()[source]

Reset the learner

Warning

This function should be called before the start of the game.

update(feedback: data_pb2.Feedback)[source]

Update the learner

Parameters

feedback – feedback returned by the bandit environment after actions() is executed

class banditpylib.learners.mnl_bandit_learner.ThompsonSampling(revenues: numpy.ndarray, horizon: int, reward: banditpylib.bandits.mnl_bandit_utils.Reward, card_limit: int = inf, use_local_search: bool = False, random_neighbors: int = 10, name: Optional[str] = None)[source]

Thompson sampling policy [AAGZ17]

Parameters
  • revenues (np.ndarray) – product revenues

  • horizon (int) – total number of time steps

  • reward (Reward) – reward the learner wants to maximize

  • card_limit (int) – cardinality constraint

  • use_local_search (bool) – whether to use local search for searching the best assortment

  • random_neighbors (int) – number of random neighbors to look up if local search is enabled

  • name (Optional[str]) – alias name
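
A minimal construction sketch follows. MeanReward is assumed here as a concrete Reward implementation; check banditpylib.bandits for the class actually exported:

    import numpy as np

    from banditpylib.bandits import MeanReward  # assumed concrete Reward class
    from banditpylib.learners.mnl_bandit_learner import ThompsonSampling

    revenues = np.array([0.0, 0.45, 0.8, 0.9])  # index 0: non-purchase
    learner = ThompsonSampling(
        revenues=revenues,
        horizon=1000,           # total number of time steps, needed up front
        reward=MeanReward(),
        card_limit=2,           # serve at most 2 products per assortment
        use_local_search=True,  # approximate the best-assortment search
        random_neighbors=10)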

Inheritance

Inheritance diagram of ThompsonSampling
actions(context: data_pb2.Context) → data_pb2.Actions[source]

Actions of the learner

Parameters

context – contextual information about the bandit environment

Returns

actions to take

reset()[source]

Reset the learner

Warning

This function should be called before the start of the game.

update(feedback: data_pb2.Feedback)[source]

Update the learner

Parameters

feedback – feedback returned by the bandit environment after actions() is executed