Udacity Capstone Project

The goal is to create cleaned data and answer the two points above. The tasks involved are:
1. Downloading the data and libraries
2. Cleaning the data to make analysis easier
3. Analyzing the cleaned data to answer the two points above

Profit (profit per customer) is a metric that has a big effect on the profitability calculation. I assume that the cost rate per drink is 0.262, based on the Starbucks P/L.
Profit (per customer) = “total sales per customer” × (1 − 0.262) − “total reward cost per customer”

View rate is a metric to evaluate promotion efficiency.
View rate = “total offers viewed” / “total offers received”
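As a minimal sketch of these two metrics (the function names and the toy numbers are my own, not from the project code):

```python
COST_RATE = 0.262  # assumed cost rate per drink, from the Starbucks P/L

def profit_per_customer(total_sales: float, total_reward_cost: float) -> float:
    """Profit = sales * (1 - cost rate) - reward cost paid to the customer."""
    return total_sales * (1 - COST_RATE) - total_reward_cost

def view_rate(offers_viewed: int, offers_received: int) -> float:
    """Share of received offers that were actually viewed."""
    return offers_viewed / offers_received

print(profit_per_customer(100.0, 5.0))  # 68.8
print(view_rate(877, 1000))             # 0.877
```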

First, I loaded the three data tables below.

The data table below is portfolio.json, which gives an overview of the promotions Starbucks offers to customers. (Of course, this is simulated data, not real data.)

BOGO (bogo), disc, and info stand for buy-one-get-one, discount, and informational. In a BOGO offer, a user needs to spend a certain amount to get a reward equal to that threshold amount. In a discount, a user gains a reward equal to a fraction of the amount spent. In an informational offer, there is no reward, but neither is there a requisite amount that the user is expected to spend. Offers can be delivered via multiple channels.

The last table is transcript.json, which contains the event log. The records below are stored in it.

As the offer id, amount, and reward are packed together in a single cell, I split the data into separate columns.
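A minimal sketch of that split, assuming the `value` column holds dicts with keys such as 'offer id', 'offer_id', 'amount', and 'reward' (the exact keys in the real file may differ):

```python
import pandas as pd

transcript = pd.read_json('transcript.json', orient='records', lines=True)

# Pull each possible key out of the `value` dict into its own column; some
# versions of the data mix the 'offer id' and 'offer_id' spellings, so try both.
transcript['offer_id'] = transcript['value'].apply(
    lambda v: v.get('offer id', v.get('offer_id')))
transcript['amount'] = transcript['value'].apply(lambda v: v.get('amount'))
transcript['reward'] = transcript['value'].apply(lambda v: v.get('reward'))
```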

I created three new data tables for analysis: time-oriented, person-oriented, and offer-oriented.

The time-oriented data is a DataFrame grouped by time, used to check how the counts of received, viewed, and completed offers shift over time. The chart below shows the result.
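A minimal sketch of how such a table can be built, continuing the `transcript` DataFrame from the earlier sketch (the event names follow the dataset's conventions):

```python
import matplotlib.pyplot as plt

# Count each event type at every timestamp in the event log.
time_df = (transcript
           .groupby(['time', 'event'])
           .size()
           .unstack(fill_value=0))

time_df[['offer received', 'offer viewed', 'offer completed']].plot()
plt.xlabel('time (hours)')
plt.ylabel('offer count')
plt.show()
```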

As the chart above shows…
- Offers were sent regularly, once every 3~5 days. The last offer was sent at 576 hours.
- The last offers were viewed and completed at 714 hours.
- After the last offer was sent, it took (714 - 576) / 24 = 5.75 days for activity to finish.
- Just after each offer was sent, the counts of viewed and completed offers increased.
From this I judged:
-> As the viewed count (50) and completed count (103) had decreased by 714 hours (the last hour), I decided not to drop any data.

The person-oriented data is grouped by person and includes all personal data, the total offers viewed per person, and the total amount and reward per person. With this data, we can check the distribution of amount and reward per person.
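A minimal sketch of that grouping, again continuing the `transcript` DataFrame from the earlier sketch (the aggregate names are my own):

```python
# Total amount spent, total reward received, and offers viewed per person.
person_df = (transcript
             .groupby('person')
             .agg(total_amount=('amount', 'sum'),
                  total_reward=('reward', 'sum'),
                  offers_viewed=('event', lambda e: (e == 'offer viewed').sum())))
```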

The offer-oriented data is grouped by each offer transcript and includes personal data plus each offer's type and status. With this data, we can count the offers received, viewed, and completed.

Considering the workflow, the received count should be the largest and the completed count the smallest. However, for bogo3 and disc1, the viewed count is smaller than the completed count. I would like to check whether the raw data has missing values, but that isn't possible because I don't have the authorization to check it. Therefore I will treat this data as correct and analyze it as-is later.

To create the three tables mentioned above, I implemented the following (a sketch of the profile-cleaning steps appears after this list):
- An offer-abbreviation column was created in the portfolio data.
- Rows with an unrealistic age (118 years old), no gender, or no income were deleted from the profile data.
- The gender column in the profile data was converted into dummy variables.
- The amount and offer id were extracted from the dictionaries in the value column of the transcript data.
- The cost incurred when a bogo or discount offer is completed was calculated and stored in a new column.
- Offer id and status were set as columns, and dummy variables for each offer were created.
- Several tables were merged and filtered to create the person-oriented, time-oriented, and offer-oriented DataFrames.
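A minimal sketch of the profile-cleaning steps, assuming the column names used in the dataset ('age', 'gender', 'income'):

```python
import pandas as pd

profile = pd.read_json('profile.json', orient='records', lines=True)

# Drop rows with the placeholder age (118) or with missing gender/income.
profile = profile[profile['age'] != 118]
profile = profile.dropna(subset=['gender', 'income'])

# Convert gender into dummy variables.
profile = pd.get_dummies(profile, columns=['gender'])
```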

I'd like to analyze two points: (i) ways to improve the offer view rate, and (ii) whether the offers are effective at increasing Starbucks' profitability.

(i) Ways to improve the offer view rate
As the portfolio chart below shows, bogo1 is a more attractive offer than bogo2: same difficulty and same reward, but a longer duration. However, bogo1 isn't sent via the web channel. So if bogo2's view rate turns out to be larger than bogo1's, we can reject the idea that the web channel is meaningless. That's why an A/B test was run. In addition, I checked for any sampling deviation between the bogo1 and bogo2 groups.
The relationship between bogo3 and bogo4 is the same situation: the same A/B test was run, and I checked the effectiveness of the social channel.

i-i Check the view rate between bogo1 and bogo2

i-i-i Check the view rate

The bogo1 view rate: 87.7%
The bogo2 view rate: 96.11%
The bogo1&2 view rate: 91.89%

i-i-ii Check the sampling deviation

i-i-iii Calculate the p-value between view rates
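The post doesn't show how the p-value was computed; a common choice for comparing two proportions is a two-sample z-test, sketched below. The counts are toy numbers consistent with the reported rates, not the real ones:

```python
from statsmodels.stats.proportion import proportions_ztest

# Toy counts chosen to roughly match the reported view rates (87.7% vs 96.11%).
viewed = [877, 961]      # offers viewed for bogo1, bogo2
received = [1000, 1000]  # offers received for bogo1, bogo2

stat, p_value = proportions_ztest(count=viewed, nobs=received)
print(p_value)  # a small p-value lets us reject "no difference in view rate"
```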

The p-value is 0.0. As the p-value is small enough, we can reject the null hypothesis that there is no deviation between the bogo1 and bogo2 view rates.
This means the web channel is effective at improving the offer view rate.

i-ii Check the view rate between bogo3 and bogo4

i-ii-i Check the view rate

The bogo3 view rate: 54.33%
The bogo4 view rate: 95.95%
The bogo3&4 view rate: 74.99%

i-ii-ii Check the sampling deviation

i-ii-iii Calculate the p-value between view rates

The p-value is 0.0. As the p-value is small enough, we can reject the null hypothesis that there is no deviation between the bogo3 and bogo4 view rates.
This means the social channel is effective at improving the offer view rate.

Caution

As I mentioned above, for some offers the viewed count is smaller than the completed count. We need to re-check whether the original transcript.json is missing data.
(As I don’t have the authorization to check it, I skipped this task.)

(ii) Whether the offers are effective at increasing Starbucks' profitability
I created a profit column and predicted profit with a scikit-learn LinearRegression model.
First, I did this without any filtering.

ii-i Create heat map

ii-ii Split the data into train and test sets and calculate the r2 score
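A minimal sketch of this step with scikit-learn; the synthetic `X` and `y` below stand in for the person-oriented features and the profit column, and the split ratio is my assumption:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-ins for the real feature matrix and profit target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = X @ rng.normal(size=5) + rng.normal(size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

model = LinearRegression().fit(X_train, y_train)
print(r2_score(y_train, model.predict(X_train)))  # train r2
print(r2_score(y_test, model.predict(X_test)))    # test r2
```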

The r2 score between the training data and its prediction is 0.05, and the r2 score between the test data and its prediction is also 0.05.
I can't say that this prediction is useful.

As the prediction score for topic (ii) is too low, I considered how to improve the r2 score. First, I checked the relationship between profit and each float variable.

I found that we have two customer types: “heavy users” and “normal users”.
I defined a “heavy user” as a customer with profit greater than or equal to 100, and a “normal user” as one with profit less than 100.
(The split histogram is shown below.)
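A minimal sketch of that split; `person_df` stands in for the person-oriented table, and the toy profits are mine:

```python
import pandas as pd

# Toy person-oriented table with only the column the split needs.
person_df = pd.DataFrame({'profit': [12.5, 240.0, 87.3, 150.9]})

heavy_users = person_df[person_df['profit'] >= 100]
normal_users = person_df[person_df['profit'] < 100]
```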

i. Whether the offers are effective for “normal users” at increasing Starbucks' profitability

i-i Create heat map

i-ii Split the data into train and test sets and calculate the r2 score
The r2 score between the training data and its prediction is 0.39, and the r2 score between the test data and its prediction is also 0.39. This prediction is useful.

i-iii Check the coefficients of the linear model
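A minimal sketch of reading the fitted coefficients, continuing the `model` from the earlier regression sketch; the feature names here are stand-ins for the offer dummy columns:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Pair each coefficient with its feature name and sort; negative values
# mean the feature lowers the predicted profit.
feature_names = [f'feature_{i}' for i in range(len(model.coef_))]  # stand-ins
coefficients = pd.Series(model.coef_, index=feature_names).sort_values()
print(coefficients)
coefficients.plot(kind='barh')
plt.show()
```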

As the graph above shows, all discount and bogo offers have negative coefficients in the low-profit group.
This means that all discount and bogo offers have a negative impact on profit.

ii. Whether the offers are effective for “heavy users” at increasing Starbucks' profitability

ii-i Create heat map

ii-ii Split the data into train and test sets and calculate the r2 score
The r2 score between the training data and its prediction is 0.63, and the r2 score between the test data and its prediction is 0.70. This prediction is useful.

ii-iii Check the coefficients of the linear model

As the graph above shows, all discount and bogo offers have negative or only slightly positive coefficients in the high-profit group.
This means that all discount and bogo offers have a negative or only slightly positive impact on profit.

Caution

The sample size is 376 rows by 20 columns. I need to get more data to make this prediction reliable.

(i) Ways to improve the offer view rate

The web and social channels are effective at increasing the view rate, though we need to re-check the raw data.

(ii) Whether the offers are effective at increasing Starbucks' profitability

We can split customers into “heavy users” and “normal users”.
For normal users, all buy-one-get-one and discount offers had a negative effect on profitability during the study period.
For heavy users, some discount offers might have a small positive effect on profitability during the study period, though we need a larger sample size to get a more precise result.

The definition of the cost rate per product
I assumed that the cost rate per drink is 0.262, based on Starbucks Japan P/L data.
- Profit margins differ between products, so the products in these offers may have a better or worse cost rate.
- I used the Starbucks Japan P/L; Starbucks headquarters might have a better or worse cost rate.

Sample size
When checking the coefficients for heavy users, the sample size is only 376. If I can get more sample data, I might be able to estimate the coefficients more accurately.
