## Maximize An Online Store Profit

[This work is based on this course: Artificial Intelligence for Business.]

A company wants to implement a plan premium for its clients. The goal for this online reatail business is getting as many suscribers as possible to the special plan. This is offered for 100$per year. The marketing department has provided us 10 differents strategies and our IA will not have any idea which one is the best. So we will assume that each has a fixed conversion rate. To simulate this Case Study, we will assume these strategies have the following conversion rates: StrategyConversion Rate 10.01 20.08 30.1 40.04 50.26 60.12 70.17 80.09 90.19 100.25 We will simulate 10,000 clients being approached successively by one of the 10 strategies, and with the rewards matrix we will simulate the client’s decision to subscribe yes(1) or no(0) to the premium plan. As an example, the first few rows of a simulated reward matrix:  0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 1 0 0 0 0 0 0 0 0 0 Thompson sampling will collect the feedback from each of these subscribers and thanks to its powerful algorithm, it will quickly figure out the strategy with the highest conversion rate. ### Thompson sampling steps: We repeat for more than 10000 iteratios for each round C. Step 1: A random value is chosen for each iteration i. This value has to follow the distribution:$latex \theta_i(n) \sim \beta(N_i^1(n)+1,N_i^0(n)+1)$,where: •$latex N_i^1(n)$is the number of times the i-th strategy has gotten a reward equal to 1 until the n-th round. •$latex N_i^0(n)$is the number of times the i-th strategy has gotten a reward equal to 0 until the n-th round. Step 2: We select the strategy$s(n)$that gives us the highest value$latex \theta_i(n)$: $$latex s(n) = \underset{i\in{1,…,9}}{\textrm{argmax}}(\theta_i(n))$$ Step 3: We update$latex N_{s(n)}^1(n)$and$latex N_{s(n)}^0(n)$according to the next conditions: • If the selected strategy$latex s(n)$has a reward equal to 1:$latex N_{s(n)}^1(n) := N_{s(n)}^1(n) + 1$. • If the selected strategy$latex s(n)$has a reward equal to 0:$latex N_{s(n)}^0(n) := N_{s(n)}^0(n) + 1$. Each strategy has its own beta distribution. Throughout the rounds, the beta distribution strategy with the highest conversion rate will progressively move to the right, and the beta distributions strategy with the lowest conversion rates will progressively move to the left (Steps 1 and 3). Therefore, due to Step 2, the strategy with the highest conversion rate will be selected. [Source: https://medium.com/pursuitnotes/day-28-reinforcement-learning-3-7a98bacc0927] We will also implement the random selection algorithm, which will select a random strategy each round. This will be our benchmark for evaluating the return of our Thompson sampling model. Of course, the Thompson sampling and the random selection algorithm will compete in the same simulation using the same reward matrix. Once the complete simulation is done, we will evaluate the Thompson Sampling return by calculating the relative return:$latex \textrm{Relative return} = \frac{\textrm{(Return of Thompson)} – (\textrm{Return of random selection})}{\textrm{Random Selection Reward}} \cdot 100 $## Import libraries import numpy as np import matplotlib.pyplot as plt import random ## Config Parameters C = 100000 #Number of rounds = Number of clients s = 10 #Number of marketing strategies ## Building the simulation conversion_rates = [0.01, 0.08, 0.1, 0.04, 0.26, 0.12, 0.17, 0.09, 0.19, 0.25] X = np.array(np.zeros([C, s])) for i in range(C): #Rounds=clients for j in range(s): #Strategy if np.random.rand() <= conversion_rates[j]: #if our random number.....then... X[i,j] = 1 ## Random assignment and Thompson sampling strategies_selected_rs = [] #random select strategies_selected_ts = [] #thompson select total_reward_rs = 0 total_reward_ts = 0 number_of_rewards_1 =  * s #creating a s-zeros list number_of_rewards_0 =  * s for n in range(0, C): #Starting rounds=clients # Random Selection strategy_rs = random.randrange(s) #select a random strategy strategies_selected_rs.append(strategy_rs) reward_rs = X[n, strategy_rs] #WE get the reward which has been created before total_reward_rs += reward_rs # Thompson Sampling strategy_ts = 0 max_random = 0 for i in range(0, s): #Step 1 random_beta = random.betavariate(number_of_rewards_1[i]+1, number_of_rewards_0[i]+1) #Step 2 if random_beta > max_random: max_random = random_beta strategy_ts = i strategies_selected_ts.append(strategy_ts) #Step 3 reward_ts = X[n, strategy_ts] if reward_ts == 1: number_of_rewards_1[strategy_ts] += 1 else: number_of_rewards_0[strategy_ts] += 1 total_reward_ts += reward_ts ## Relative and absolute return absolute_return = (total_reward_ts - total_reward_rs)*100 #We assume 100$/sale.
relative_return = (total_reward_ts - total_reward_rs) / total_reward_rs * 100
print("Absolute Return: {:.0f} $".format(absolute_return)) print("Relative Return: {:.0f} %".format(relative_return))  Absolute Return: 1287500$
Relative Return: 98 %

## Plotting histogram

plt.hist(strategies_selected_ts)
plt.title("Histogram of selections")
plt.xlabel("Strategy")
plt.ylabel("Number of times has been selected the strategy")
plt.show()

## Regret Curve

The Regret curve is the representation of the difference between the best strategy and the model, with respect to the rounds. The best strategy is calculated obtaining the maximum accumulated rewards in all the different strategies in each round.

rewards_strategies =  * s
for n in range(0, C):
# Best Strategy
for i in range(0, s):
rewards_strategies[i] = rewards_strategies[i] + X[n, i]
total_reward_bs = max(rewards_strategies)

– Regret of Thompson Sampling:

strategies_selected_ts = []
total_reward_ts = 0
total_reward_bs = 0
numbers_of_rewards_1 =  * s
numbers_of_rewards_0 =  * s
rewards_strategies =  * s
regret = []

for n in range(0, C):
# Thompson Sampling
strategy_ts = 0
max_random = 0
for i in range(0, s):
random_beta = random.betavariate(numbers_of_rewards_1[i] + 1,
numbers_of_rewards_0[i] + 1)
if random_beta > max_random:
max_random = random_beta
strategy_ts = i
reward_ts = X[n, strategy_ts]
if reward_ts == 1:
numbers_of_rewards_1[strategy_ts] = numbers_of_rewards_1[strategy_ts] + 1
else:
numbers_of_rewards_0[strategy_ts] = numbers_of_rewards_0[strategy_ts] + 1
strategies_selected_ts.append(strategy_ts)
total_reward_ts = total_reward_ts + reward_ts
# Best Strategy
for i in range(0, s):
rewards_strategies[i] = rewards_strategies[i] + X[n, i]
total_reward_bs = max(rewards_strategies)
# Regret
regret.append(total_reward_bs - total_reward_ts)

– Plotting the Regret Curve of Thompson Sampling:

plt.plot(regret)
plt.title('Regret Curve of Thompson Sampling')
plt.xlabel('Round')
plt.ylabel('Regret')
plt.show()

– Regret of the Random Strategy:

strategies_selected_rs = []
total_reward_rs = 0
total_reward_bs = 0
numbers_of_rewards_1 =  * s
numbers_of_rewards_0 =  * s
rewards_strategies =  * s
regret = []
for n in range(0, C):
# Random Strategy
strategy_rs = random.randrange(s)
strategies_selected_rs.append(strategy_rs)
reward_rs = X[n, strategy_rs]
total_reward_rs = total_reward_rs + reward_rs
# Best Strategy
for i in range(0, s):
rewards_strategies[i] = rewards_strategies[i] + X[n, i]
total_reward_bs = max(rewards_strategies)
# Regret
regret.append(total_reward_bs - total_reward_rs)

– Plotting the Regret Curve of Random Strategy:

plt.plot(regret)
plt.title('Regret Curve of Random Strategy')
plt.xlabel('Round')
plt.ylabel('Regret')
plt.show()