Turn my stacked bar chart into a 100% stacked bar chart

I currently have a stacked bar chart for brewers. There are 6 brewers. The chart is good for understanding the volume, but I want to highlight in my analysis that some of the brewers are being used more than others. To do so, I need to turn my bar chart into a 100% stacked bar chart.
What it currently looks like
I want each of these bars to always reach 100 on the y-axis.
The code I have at the moment is:
def brewer_number_bar(location):
    brewer_df_filtered = brewer_df[(brewer_df['Location Name'].isin(location))]
    traces = []
    for brewer in brewer_df['Menu Item Name'].unique():
        brewer_df_by_brewer = brewer_df_filtered[brewer_df_filtered['Menu Item Name']==brewer]
        traces.append(go.Bar(
            x = brewer_df_by_brewer['Business Date'],
            y = brewer_df_by_brewer['Sales Count'],
            name=brewer,
        ))
    return {'data': traces,
            'layout': go.Layout(title='Brewer Volume',
                                xaxis={'title': 'Date', 'categoryorder': 'total descending'},
                                yaxis={'title': 'Brewer Numbers Used'},
                                barmode='stack')
            }
I tried dividing brewer_df_by_brewer['Sales Count'] by brewer_df_by_brewer['Sales Count'].sum() and creating a new trace for each brewer, but because the data is also split by location, this did not work.
Head of the dataframe brewer_df
{'Business Date': {0: Timestamp('2022-09-05 00:00:00'), 1: Timestamp('2022-09-05 00:00:00'), 2: Timestamp('2022-09-05 00:00:00'), 3: Timestamp('2022-09-05 00:00:00'), 4: Timestamp('2022-09-05 00:00:00')}, 'Major Category': {0: 'Brewer Number', 1: 'Brewer Number', 2: 'Brewer Number', 3: 'Brewer Number', 4: 'Brewer Number'}, 'Location Name': {0: 'France', 1: 'France', 2: 'France', 3: 'Germany', 4: 'Germany'}, 'Menu Item Name': {0: '1', 1: '2', 2: '3', 3: '4', 4: '1'}, 'Sales Count': {0: 176, 1: 163, 2: 22, 3: 7, 4: 89}}

You can simplify your function by using plotly.express, which stacks bars by default and lets you pass the date column to the x parameter and the category column to color. I've also used a pandas groupby + transform operation that divides each row's Sales Count by the total for its location and date in your filtered dataframe. This is a little cleaner and also more performant than looping through brewer_df['Menu Item Name'].unique().
In order to make sure we are taking dates into account, I've extended your sample dataframe to include more than one day:
import pandas as pd
Timestamp = pd.Timestamp
brewer_df = pd.DataFrame({
'Business Date': {
0: Timestamp('2022-09-05 00:00:00'),
1: Timestamp('2022-09-05 00:00:00'),
2: Timestamp('2022-09-05 00:00:00'),
3: Timestamp('2022-09-05 00:00:00'),
4: Timestamp('2022-09-05 00:00:00'),
5: Timestamp('2022-09-06 00:00:00'),
6: Timestamp('2022-09-06 00:00:00'),
7: Timestamp('2022-09-06 00:00:00'),
8: Timestamp('2022-09-06 00:00:00'),
9: Timestamp('2022-09-06 00:00:00')
},
'Major Category': {
0: 'Brewer Number',
1: 'Brewer Number',
2: 'Brewer Number',
3: 'Brewer Number',
4: 'Brewer Number',
5: 'Brewer Number',
6: 'Brewer Number',
7: 'Brewer Number',
8: 'Brewer Number',
9: 'Brewer Number'
},
'Location Name':{
0: 'France',
1: 'France',
2: 'France',
3: 'Germany',
4: 'Germany',
5: 'France',
6: 'France',
7: 'France',
8: 'Germany',
9: 'Germany'
},
'Menu Item Name': {
0: '1',
1: '2',
2: '3',
3: '4',
4: '1',
5: '2',
6: '3',
7: '4',
8: '1',
9: '2'
},
'Sales Count': {
0: 176,
1: 163,
2: 22,
3: 7,
4: 89,
5: 90,
6: 6,
7: 14,
8: 22,
9: 200
}
})
Then the modified brewer_number_bar callback looks like the following:
import plotly.express as px
def brewer_number_bar(location):
    # Keep only the rows for the selected location
    brewer_df_filtered = brewer_df[brewer_df['Location Name'] == location].copy()
    # Share of each row within the total for its location and date
    brewer_df_filtered['Sales Count Percent'] = (
        brewer_df_filtered['Sales Count']
        / brewer_df_filtered.groupby(['Location Name','Business Date'])['Sales Count'].transform('sum')
    )
    fig = px.bar(brewer_df_filtered, x="Business Date", y="Sales Count Percent", color="Menu Item Name")
    fig.update_layout(
        title='Brewer Volume',
        xaxis={'title': 'Date', 'categoryorder': 'total descending'},
        yaxis={'title': 'Brewer Numbers Used', 'tickformat': ',.0%'},
    )
    return fig
Below are two example figs that this callback would return, one when you pass 'France' as the location with fig = brewer_number_bar('France'), and the other when you pass 'Germany' as the location:
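If you want to convince yourself that the groupby + transform step really makes every bar add up to 100%, here is a minimal sketch using only pandas. The small frame and the names totals and check are made up for illustration; they are not part of the original callback.

```python
import pandas as pd

# Hypothetical miniature version of brewer_df (one location, two dates)
df = pd.DataFrame({
    "Business Date": pd.to_datetime(["2022-09-05"] * 3 + ["2022-09-06"] * 2),
    "Location Name": ["France"] * 5,
    "Menu Item Name": ["1", "2", "3", "1", "2"],
    "Sales Count": [176, 163, 22, 90, 10],
})

# Total sales per location and date, broadcast back onto every row
totals = df.groupby(["Location Name", "Business Date"])["Sales Count"].transform("sum")
df["Sales Count Percent"] = df["Sales Count"] / totals

# Each location/date group should now sum to 1.0 (i.e. 100%)
check = df.groupby(["Location Name", "Business Date"])["Sales Count Percent"].sum()
print(check)
```

Because transform returns a result aligned with the original rows, the division works row by row without any explicit loop over brewers.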

Related

Pipeline & ColumnTransformer: ValueError: Selected columns are not unique in dataframe

Background: I am trying to learn from a notebook used in Kaggle House Price Prediction Dataset.
I am trying to use a Pipeline to transform the numerical and categorical columns in a dataframe. It has a problem with the names of my categorical variables, which are stored as a list in the variable categ_cols_names. The error says that those categorical columns are not unique in the dataframe, and I'm not sure what that means.
categ_cols_names = ['MSZoning','Street','LotShape','LandContour','Utilities','LotConfig','LandSlope','Neighborhood','Condition1','Condition2','BldgType','HouseStyle','OverallQual','OverallCond','YearBuilt','YearRemodAdd','RoofStyle','RoofMatl','Exterior1st','Exterior2nd','MasVnrType','ExterQual','ExterCond','Foundation','BsmtQual','BsmtCond','BsmtExposure','BsmtFinType1','BsmtFinType2','Heating','HeatingQC','CentralAir','Electrical','BsmtFullBath','BsmtHalfBath','FullBath','HalfBath','BedroomAbvGr','KitchenAbvGr','KitchenQual','Functional','Fireplaces','GarageType','GarageYrBlt','GarageFinish','GarageCars','GarageQual','GarageCond','PavedDrive','MoSold','YrSold','SaleType','SaleCondition','OverallQual','GarageCars','FullBath','YearBuilt']
Below is my code:
# Get numerical columns names
num_cols_names = X_train.columns[X_train.dtypes != object].to_list()
# Numerical columns with missing values
num_nan_cols = X_train[num_cols_names].columns[X_train[num_cols_names].isna().sum() > 0]
# Assign np.nan type to NaN values in numerical features
# in order to ensure detectability in posterior methods
X_train[num_nan_cols] = X_train[num_nan_cols].fillna(value = np.nan, axis = 1)
# Define pipeline for imputation of the numerical features
num_pipeline = Pipeline(steps = [
('Simple Imputer', SimpleImputer(strategy = 'median')),
('Robust Scaler', RobustScaler()),
('Power Transformer', PowerTransformer())
]
)
# Get categorical columns names
categ_cols_names = X_train.columns[X_train.dtypes == object].to_list()
# Categorical columns with missing values
categ_nan_cols = X_train[categ_cols_names].columns[X_train[categ_cols_names].isna().sum() > 0]
# Assign np.nan type to NaN values in categorical features
# in order to ensure detectability in posterior methods
X_train[categ_nan_cols] = X_train[categ_nan_cols].fillna(value = np.nan, axis = 1)
# Define pipeline for imputation and encoding of the categorical features
categ_pipeline = Pipeline(steps = [
('Categorical Imputer', SimpleImputer(strategy = 'most_frequent')),
('One Hot Encoder', OneHotEncoder(drop = 'first'))
])
ct = ColumnTransformer([
('Categorical Pipeline', categ_pipeline, categ_cols_names),
('Numerical Pipeline', num_pipeline, num_cols_names)],
remainder = 'passthrough',
sparse_threshold = 0,
n_jobs = -1)
pipe = Pipeline(steps = [('Column Transformer', ct)])
pipe.fit_transform(X_train)
The ValueError occurs on the .fit_transform() line:
Here is a sample of my X_train:
{'MSZoning': {0: 'RL', 1: 'RL', 2: 'RL', 3: 'RL', 4: 'RL'},
'Street': {0: 'Pave', 1: 'Pave', 2: 'Pave', 3: 'Pave', 4: 'Pave'},
'LotShape': {0: 'Reg', 1: 'Reg', 2: 'IR1', 3: 'IR1', 4: 'IR1'},
'LandContour': {0: 'Lvl', 1: 'Lvl', 2: 'Lvl', 3: 'Lvl', 4: 'Lvl'},
'Utilities': {0: 'AllPub',
1: 'AllPub',
2: 'AllPub',
3: 'AllPub',
4: 'AllPub'},
'LotConfig': {0: 'Inside', 1: 'FR2', 2: 'Inside', 3: 'Corner', 4: 'FR2'},
'LandSlope': {0: 'Gtl', 1: 'Gtl', 2: 'Gtl', 3: 'Gtl', 4: 'Gtl'},
'Neighborhood': {0: 'CollgCr',
1: 'Veenker',
2: 'CollgCr',
3: 'Crawfor',
4: 'NoRidge'},
'Condition1': {0: 'Norm', 1: 'Feedr', 2: 'Norm', 3: 'Norm', 4: 'Norm'},
'Condition2': {0: 'Norm', 1: 'Norm', 2: 'Norm', 3: 'Norm', 4: 'Norm'},
'BldgType': {0: '1Fam', 1: '1Fam', 2: '1Fam', 3: '1Fam', 4: '1Fam'},
'HouseStyle': {0: '2Story',
1: '1Story',
2: '2Story',
3: '2Story',
4: '2Story'},
'OverallQual': {0: '7', 1: '6', 2: '7', 3: '7', 4: '8'},
'OverallCond': {0: '5', 1: '8', 2: '5', 3: '5', 4: '5'},
'YearBuilt': {0: '2003', 1: '1976', 2: '2001', 3: '1915', 4: '2000'},
'YearRemodAdd': {0: '2003', 1: '1976', 2: '2002', 3: '1970', 4: '2000'},
'RoofStyle': {0: 'Gable', 1: 'Gable', 2: 'Gable', 3: 'Gable', 4: 'Gable'},
'RoofMatl': {0: 'CompShg',
1: 'CompShg',
2: 'CompShg',
3: 'CompShg',
4: 'CompShg'},
'Exterior1st': {0: 'VinylSd',
1: 'MetalSd',
2: 'VinylSd',
3: 'Wd Sdng',
4: 'VinylSd'},
'Exterior2nd': {0: 'VinylSd',
1: 'MetalSd',
2: 'VinylSd',
3: 'Wd Shng',
4: 'VinylSd'},
'MasVnrType': {0: 'BrkFace',
1: 'None',
2: 'BrkFace',
3: 'None',
4: 'BrkFace'},
'ExterQual': {0: 'Gd', 1: 'TA', 2: 'Gd', 3: 'TA', 4: 'Gd'},
'ExterCond': {0: 'TA', 1: 'TA', 2: 'TA', 3: 'TA', 4: 'TA'},
'Foundation': {0: 'PConc', 1: 'CBlock', 2: 'PConc', 3: 'BrkTil', 4: 'PConc'},
'BsmtQual': {0: 'Gd', 1: 'Gd', 2: 'Gd', 3: 'TA', 4: 'Gd'},
'BsmtCond': {0: 'TA', 1: 'TA', 2: 'TA', 3: 'Gd', 4: 'TA'},
'BsmtExposure': {0: 'No', 1: 'Gd', 2: 'Mn', 3: 'No', 4: 'Av'},
'BsmtFinType1': {0: 'GLQ', 1: 'ALQ', 2: 'GLQ', 3: 'ALQ', 4: 'GLQ'},
'BsmtFinType2': {0: 'Unf', 1: 'Unf', 2: 'Unf', 3: 'Unf', 4: 'Unf'},
'Heating': {0: 'GasA', 1: 'GasA', 2: 'GasA', 3: 'GasA', 4: 'GasA'},
'HeatingQC': {0: 'Ex', 1: 'Ex', 2: 'Ex', 3: 'Gd', 4: 'Ex'},
'CentralAir': {0: 'Y', 1: 'Y', 2: 'Y', 3: 'Y', 4: 'Y'},
'Electrical': {0: 'SBrkr', 1: 'SBrkr', 2: 'SBrkr', 3: 'SBrkr', 4: 'SBrkr'},
'BsmtFullBath': {0: '1', 1: '0', 2: '1', 3: '1', 4: '1'},
'BsmtHalfBath': {0: '0', 1: '1', 2: '0', 3: '0', 4: '0'},
'FullBath': {0: '2', 1: '2', 2: '2', 3: '1', 4: '2'},
'HalfBath': {0: '1', 1: '0', 2: '1', 3: '0', 4: '1'},
'BedroomAbvGr': {0: '3', 1: '3', 2: '3', 3: '3', 4: '4'},
'KitchenAbvGr': {0: '1', 1: '1', 2: '1', 3: '1', 4: '1'},
'KitchenQual': {0: 'Gd', 1: 'TA', 2: 'Gd', 3: 'Gd', 4: 'Gd'},
'Functional': {0: 'Typ', 1: 'Typ', 2: 'Typ', 3: 'Typ', 4: 'Typ'},
'Fireplaces': {0: '0', 1: '1', 2: '1', 3: '1', 4: '1'},
'GarageType': {0: 'Attchd',
1: 'Attchd',
2: 'Attchd',
3: 'Detchd',
4: 'Attchd'},
'GarageYrBlt': {0: '2003.0',
1: '1976.0',
2: '2001.0',
3: '1998.0',
4: '2000.0'},
'GarageFinish': {0: 'RFn', 1: 'RFn', 2: 'RFn', 3: 'Unf', 4: 'RFn'},
'GarageCars': {0: '2', 1: '2', 2: '2', 3: '3', 4: '3'},
'GarageQual': {0: 'TA', 1: 'TA', 2: 'TA', 3: 'TA', 4: 'TA'},
'GarageCond': {0: 'TA', 1: 'TA', 2: 'TA', 3: 'TA', 4: 'TA'},
'PavedDrive': {0: 'Y', 1: 'Y', 2: 'Y', 3: 'Y', 4: 'Y'},
'MoSold': {0: '2', 1: '5', 2: '9', 3: '2', 4: '12'},
'YrSold': {0: '2008', 1: '2007', 2: '2008', 3: '2006', 4: '2008'},
'SaleType': {0: 'WD', 1: 'WD', 2: 'WD', 3: 'WD', 4: 'WD'},
'SaleCondition': {0: 'Normal',
1: 'Normal',
2: 'Normal',
3: 'Abnorml',
4: 'Normal'},
'GrLivArea': {0: 1710, 1: 1262, 2: 1786, 3: 1717, 4: 2198},
'GarageArea': {0: 548, 1: 460, 2: 608, 3: 642, 4: 836},
'TotalBsmtSF': {0: 856, 1: 1262, 2: 920, 3: 756, 4: 1145},
'1stFlrSF': {0: 856, 1: 1262, 2: 920, 3: 961, 4: 1145},
'TotRmsAbvGrd': {0: 8, 1: 6, 2: 6, 3: 7, 4: 9}}
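One thing that may be worth checking here: the categ_cols_names list printed above repeats 'OverallQual', 'GarageCars', 'FullBath' and 'YearBuilt' at the end. If X_train.columns contains the same repeats, selecting columns by name is ambiguous, which would be consistent with the "not unique in dataframe" message. A minimal sketch, assuming a small hypothetical frame in place of the real X_train, of how duplicated column names can be found and dropped with pandas:

```python
import pandas as pd

# Hypothetical miniature frame with one repeated column name,
# standing in for the real X_train
X_train = pd.DataFrame(
    [["RL", "7", 1710, "7"], ["RL", "6", 1262, "6"]],
    columns=["MSZoning", "OverallQual", "GrLivArea", "OverallQual"],
)

# Names that occur more than once in the column index
dup_names = X_train.columns[X_train.columns.duplicated()].unique().tolist()
print(dup_names)

# Keep only the first occurrence of each column name
X_train_unique = X_train.loc[:, ~X_train.columns.duplicated()]
print(X_train_unique.columns.tolist())
```

Whether dropping the later copies is the right fix depends on how the duplicates got there (for example, a concat that appended the same columns twice), so this is only a starting point for debugging.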

Plotly - set decimal place in choropleth

How do you display the number 1.425887B as 1.4B in a Plotly choropleth?
data2022 = dict(type = 'choropleth',
colorscale = 'agsunset',
reversescale = True,
locations = df['Country/Territory'],
locationmode = 'country names',
z = df['2022 Population'],
text = df['CCA3' ],
marker = dict(line = dict(color = 'rgb(12, 12, 12)', width=1)),
colorbar = {'title': 'Population'})
layout2022 = dict(title = '<b>World Population 2022</b>',
geo = dict(showframe = True,
showland = True, landcolor = 'rgb(198, 197, 198)',
showlakes = True, lakecolor = 'rgb(85, 173, 240)',
showrivers = True, rivercolor = 'rgb(173, 216, 230)',
showocean = True, oceancolor = 'rgb(173, 216, 230)',
projection = {'type': 'natural earth'}))
choromap2022 = go.Figure(data=[data2022], layout=layout2022)
choromap2022.update_geos(lataxis_showgrid = True, lonaxis_showgrid = True)
choromap2022.update_layout(height = 600,
title_x = 0.5,
title_font_color = 'red',
title_font_family = 'Times New Roman',
title_font_size = 30,
margin=dict(t=80, r=50, l=50))
iplot(choromap2022)
This is the image of the result I got. I want to display the population of China as 1.4B instead of 1.425887B.
I tried looking through the Plotly documentation but could not find anything.
This is the output of df.head().to_dict():
{'CCA3': {0: 'AFG', 1: 'ALB', 2: 'DZA', 3: 'ASM', 4: 'AND'},
'Country/Territory': {0: 'Afghanistan',
1: 'Albania',
2: 'Algeria',
3: 'American Samoa',
4: 'Andorra'},
'Capital': {0: 'Kabul',
1: 'Tirana',
2: 'Algiers',
3: 'Pago Pago',
4: 'Andorra la Vella'},
'Continent': {0: 'Asia', 1: 'Europe', 2: 'Africa', 3: 'Oceania', 4: 'Europe'},
'2022 Population': {0: 41128771, 1: 2842321, 2: 44903225, 3: 44273, 4: 79824},
'2020 Population': {0: 38972230, 1: 2866849, 2: 43451666, 3: 46189, 4: 77700},
'2015 Population': {0: 33753499, 1: 2882481, 2: 39543154, 3: 51368, 4: 71746},
'2010 Population': {0: 28189672, 1: 2913399, 2: 35856344, 3: 54849, 4: 71519},
'2000 Population': {0: 19542982, 1: 3182021, 2: 30774621, 3: 58230, 4: 66097},
'1990 Population': {0: 10694796, 1: 3295066, 2: 25518074, 3: 47818, 4: 53569},
'1980 Population': {0: 12486631, 1: 2941651, 2: 18739378, 3: 32886, 4: 35611},
'1970 Population': {0: 10752971, 1: 2324731, 2: 13795915, 3: 27075, 4: 19860},
'Area (km²)': {0: 652230, 1: 28748, 2: 2381741, 3: 199, 4: 468},
'Density (per km²)': {0: 63.0587,
1: 98.8702,
2: 18.8531,
3: 222.4774,
4: 170.5641},
'Growth Rate': {0: 1.0257, 1: 0.9957, 2: 1.0164, 3: 0.9831, 4: 1.01},
'World Population Percentage': {0: 0.52, 1: 0.04, 2: 0.56, 3: 0.0, 4: 0.0}}

This is trickier than it appears. Plotly uses d3-format, but I believe it applies additional metric abbreviations by default, so that numbers larger than 1000 are displayed in a format like 1.425887B.
My original idea was to round to two significant digits in the hovertemplate with something like:
data2022 = dict(..., hovertemplate = "%{z:.2r}<br>%{text}<extra></extra>")
However, this removes the default metric abbreviation and causes the full number to be displayed: the population of China would show up as 1400000000 instead of 1.4B.
So one possible workaround is to create a new column in your DataFrame called "2022 Population Text" and format the number using a custom function that rounds and abbreviates it (credit goes to @rtaft for their function, which does exactly that). Then you can pass this column to customdata, and display customdata in your hovertemplate (instead of z).
import pandas as pd
import plotly.graph_objects as go
data = {'CCA3': {0: 'AFG', 1: 'ALB', 2: 'DZA', 3: 'ASM', 4: 'AND'},
'Country/Territory': {0: 'Afghanistan',
1: 'Albania',
2: 'Algeria',
3: 'American Samoa',
4: 'Andorra'},
'Capital': {0: 'Kabul',
1: 'Tirana',
2: 'Algiers',
3: 'Pago Pago',
4: 'Andorra la Vella'},
'Continent': {0: 'Asia', 1: 'Europe', 2: 'Africa', 3: 'Oceania', 4: 'Europe'},
'2022 Population': {0: 1412000000, 1: 2842321, 2: 44903225, 3: 44273, 4: 79824},
'2020 Population': {0: 38972230, 1: 2866849, 2: 43451666, 3: 46189, 4: 77700},
'2015 Population': {0: 33753499, 1: 2882481, 2: 39543154, 3: 51368, 4: 71746},
'2010 Population': {0: 28189672, 1: 2913399, 2: 35856344, 3: 54849, 4: 71519},
'2000 Population': {0: 19542982, 1: 3182021, 2: 30774621, 3: 58230, 4: 66097},
'1990 Population': {0: 10694796, 1: 3295066, 2: 25518074, 3: 47818, 4: 53569},
'1980 Population': {0: 12486631, 1: 2941651, 2: 18739378, 3: 32886, 4: 35611},
'1970 Population': {0: 10752971, 1: 2324731, 2: 13795915, 3: 27075, 4: 19860},
'Area (km²)': {0: 652230, 1: 28748, 2: 2381741, 3: 199, 4: 468},
'Density (per km²)': {0: 63.0587,
1: 98.8702,
2: 18.8531,
3: 222.4774,
4: 170.5641},
'Growth Rate': {0: 1.0257, 1: 0.9957, 2: 1.0164, 3: 0.9831, 4: 1.01},
'World Population Percentage': {0: 0.52, 1: 0.04, 2: 0.56, 3: 0.0, 4: 0.0}
}
# rounds a number to the specified precision, and adds metric abbreviations
# i.e. 14230000000 --> 14B
# reference: https://stackoverflow.com/a/45846841/5327068
def human_format(num):
    num = float('{:.2g}'.format(num))
    magnitude = 0
    while abs(num) >= 1000:
        magnitude += 1
        num /= 1000.0
    return '{}{}'.format('{:f}'.format(num).rstrip('0').rstrip('.'), ['', 'K', 'M', 'B', 'T'][magnitude])
df = pd.DataFrame(data=data)
df['2022 Population Text'] = df['2022 Population'].apply(human_format)
data2022 = dict(type = 'choropleth',
colorscale = 'agsunset',
reversescale = True,
locations = df['Country/Territory'],
locationmode = 'country names',
z = df['2022 Population'],
text = df['CCA3'],
customdata = df['2022 Population Text'],
marker = dict(line = dict(color = 'rgb(12, 12, 12)', width=1)),
colorbar = {'title': 'Population'},
hovertemplate = "%{customdata}<br>%{text}<extra></extra>"
)
layout2022 = dict(title = '<b>World Population 2022</b>',
geo = dict(showframe = True,
showland = True, landcolor = 'rgb(198, 197, 198)',
showlakes = True, lakecolor = 'rgb(85, 173, 240)',
showrivers = True, rivercolor = 'rgb(173, 216, 230)',
showocean = True, oceancolor = 'rgb(173, 216, 230)',
projection = {'type': 'natural earth'}))
choromap2022 = go.Figure(data=[data2022], layout=layout2022)
choromap2022.update_geos(lataxis_showgrid = True, lonaxis_showgrid = True)
choromap2022.update_layout(height = 600,
title_x = 0.5,
title_font_color = 'red',
title_font_family = 'Times New Roman',
title_font_size = 30,
margin=dict(t=80, r=50, l=50),
)
choromap2022.show()
Note: Since China wasn't included in your sample data, I changed the population of AFG to 1412000000 to test that the hovertemplate would display it as '1.4B'.
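To see what the rounding helper does on its own, without building the map, here is a small self-contained sketch. It repeats the same logic as human_format above; the samples list is just a few illustrative values (the China figure from the question and two from the sample data).

```python
def human_format(num):
    """Round to two significant digits and append a K/M/B/T suffix."""
    # '{:.2g}' keeps two significant digits, e.g. 1425887000 -> 1.4e+09
    num = float('{:.2g}'.format(num))
    magnitude = 0
    # Divide by 1000 until the number is below 1000, counting the steps
    while abs(num) >= 1000:
        magnitude += 1
        num /= 1000.0
    # Fixed-point text with trailing zeros (and a trailing dot) stripped
    text = '{:f}'.format(num).rstrip('0').rstrip('.')
    return text + ['', 'K', 'M', 'B', 'T'][magnitude]

samples = [1425887000, 2842321, 44273]
formatted = [human_format(n) for n in samples]
print(formatted)  # ['1.4B', '2.8M', '44K']
```

Note that the precision is fixed at two significant digits, so smaller countries lose detail as well (44273 becomes 44K). If that is too coarse, the '.2g' format spec is the place to change it.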

How to highlight certain table rows in Plotly?

In a table built from my dataset, I need to highlight in bold the rows that contain "All" in the columns Building, Floor or Team:
My code:
headerColor = 'darkgrey'
rowEvenColor = 'lightgrey'
rowOddColor = 'white'
fig_occ_fl_team = go.Figure(data=[go.Table(
header=dict(
values=list(final_table_occ_fl_team.columns),
line_color='black',
fill_color=headerColor,
align=['left','left','left','left','left','left','left','left','left','left'],
font=dict(color='black', size=9)
),
cells=dict(
values=[final_table_occ_fl_team['Building'],
final_table_occ_fl_team['Floor'],
final_table_occ_fl_team['Team'],
final_table_occ_fl_team['Number of Desks'],
final_table_occ_fl_team['Avg Occu (#)'],
final_table_occ_fl_team['Avg Occu (%)'],
final_table_occ_fl_team['Avg Occu 10-4 (#)'],
final_table_occ_fl_team['Avg Occu 10-4 (%)'],
final_table_occ_fl_team['Max Occu (#)'],
final_table_occ_fl_team['Max Occu (%)'],
],
line_color='black',
# 2-D list of colors for alternating rows
fill_color = [[rowOddColor,rowEvenColor]*56],
align = ['left','left','left','left','left','left','left','left','left','left'],
font = dict(color = 'black', size = 7)
))
])
fig_occ_fl_team.show()
Dataset head :
data = {'Building': {0: 'All',
1: '1LWP',
2: '1LWP',
3: '1LWP',
4: '1LWP',
5: '1LWP',
6: '1LWP',
7: '1LWP',
8: '1LWP',
9: '1LWP'},
'Floor': {0: 'All',
1: 'All',
2: '2nd',
3: '2nd',
4: '2nd',
5: '2nd',
6: '2nd',
7: '2nd',
8: '2nd',
9: '2nd'},
'Team': {0: 'All',
1: 'All',
2: 'All',
3: 'Anderson/Money',
4: 'Banking & Treasury',
5: 'Charities',
6: 'Client Management',
7: 'Compliance, Legal & Risk',
8: 'DFM',
9: 'Emmerson'},
'Number of Desks': {0: 2297,
1: 2008,
2: 381,
3: 22,
4: 8,
5: 19,
6: 9,
7: 41,
8: 20,
9: 33},
'Avg Occu (#)': {0: 1261,
1: 1126,
2: 195,
3: 14,
4: 4,
5: 9,
6: 5,
7: 21,
8: 13,
9: 18},
'Avg Occu (%)': {0: '55%',
1: '56%',
2: '51%',
3: '64%',
4: '50%',
5: '48%',
6: '56%',
7: '52%',
8: '65%',
9: '55%'},
'Avg Occu 10-4 (#)': {0: 851,
1: 759,
2: 132,
3: 8,
4: 3,
5: 6,
6: 3,
7: 14,
8: 9,
9: 12},
'Avg Occu 10-4 (%)': {0: '37%',
1: '38%',
2: '35%',
3: '37%',
4: '38%',
5: '32%',
6: '34%',
7: '35%',
8: '45%',
9: '37%'},
'Max Occu (#)': {0: 1901,
1: 1680,
2: 274,
3: 22,
4: 6,
5: 13,
6: 7,
7: 27,
8: 17,
9: 25},
'Max Occu (%)': {0: '83%',
1: '84%',
2: '72%',
3: '100%',
4: '75%',
5: '69%',
6: '78%',
7: '66%',
8: '85%',
9: '76%'}}

You can add the bold style to your dataframe prior to creating the table as follows:
import pandas as pd
df = pd.DataFrame.from_dict(data)
# Rows where Building, Floor and Team are all equal to "All"
indices = df.index[(df[["Building","Floor","Team"]] == "All").all(axis=1)]
for i in indices:
    for j in range(len(df.columns)):
        # Wrap every cell of the matching row in HTML bold tags
        df.iloc[i,j] = "<b>{}</b>".format(df.iloc[i,j])
You can now create the table; I increased the cell font size to 12:
import plotly.graph_objects as go
headerColor = 'darkgrey'
rowEvenColor = 'lightgrey'
rowOddColor = 'white'
fig_occ_fl_team = go.Figure(data=[go.Table(
header=dict(
values=list(df.columns),
line_color='black',
fill_color=headerColor,
align=['left','left','left','left','left','left','left','left','left','left'],
font=dict(color='black', size=9)
),
cells=dict(
values=[df['Building'],
df['Floor'],
df['Team'],
df['Number of Desks'],
df['Avg Occu (#)'],
df['Avg Occu (%)'],
df['Avg Occu 10-4 (#)'],
df['Avg Occu 10-4 (%)'],
df['Max Occu (#)'],
df['Max Occu (%)'],
],
line_color='black',
# 2-D list of colors for alternating rows
fill_color = [[rowOddColor,rowEvenColor]*56],
align = ['left','left','left','left','left','left','left','left','left','left'],
font = dict(color = 'black', size = 12)
))
])
fig_occ_fl_team.show()
Output:
You will notice that the first and fourth columns are bold. If you want to keep the original dataframe unchanged, work on a copy instead, e.g. df2 = df1.copy().
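The loop above writes strings into numeric columns cell by cell, which newer pandas versions may warn about. A possible variation, sketched under a couple of assumptions: the frame is first converted to strings, and a row counts as a match if any of the three columns equals "All" (use .all(axis=1) instead if every one of them must match). The shortened data, mask and styled are illustrative names only.

```python
import pandas as pd

# Hypothetical shortened version of the dataset from the question
data = {
    "Building": ["All", "1LWP", "1LWP", "1LWP"],
    "Floor": ["All", "All", "2nd", "2nd"],
    "Team": ["All", "All", "All", "Anderson/Money"],
    "Number of Desks": [2297, 2008, 381, 22],
}
df = pd.DataFrame(data)

# True for rows where at least one of the three columns is "All"
mask = (df[["Building", "Floor", "Team"]] == "All").any(axis=1)

# Work on a string copy so the original numeric dtypes stay untouched
styled = df.astype(str)
styled.loc[mask] = styled.loc[mask].apply(lambda col: "<b>" + col + "</b>")
print(styled)
```

The styled frame can then be passed to go.Table in place of df, while df itself keeps its numeric columns for any further calculations.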

How do I select specific values in column and add character before

How can I select all the values in the column 'Funding' that end with "M", then remove the M and the $ and add "0," before the value?
e.g. from $535M to 0,535
That's because I have both billion and million values. I've decided to express the column in billions, so values in millions must become 0,...
df.head(10).to_dict()
{'Company': {0: 'Bytedance',
1: 'SpaceX',
2: 'SHEIN',
3: 'Stripe',
4: 'Klarna',
5: 'Canva',
6: 'Checkout.com',
7: 'Instacart',
8: 'JUUL Labs',
9: 'Databricks'},
'Valuation': {0: '$180B',
1: '$100B',
2: '$100B',
3: '$95B',
4: '$46B',
5: '$40B',
6: '$40B',
7: '$39B',
8: '$38B',
9: '$38B'},
'Date Joined': {0: '2017-04-07',
1: '2012-12-01',
2: '2018-07-03',
3: '2014-01-23',
4: '2011-12-12',
5: '2018-01-08',
6: '2019-05-02',
7: '2014-12-30',
8: '2017-12-20',
9: '2019-02-05'},
'Industry': {0: 'Artificial intelligence',
1: 'Other',
2: 'E-commerce & direct-to-consumer',
3: 'Fintech',
4: 'Fintech',
5: 'Internet software & services',
6: 'Fintech',
7: 'Supply chain, logistics, & delivery',
8: 'Consumer & retail',
9: 'Data management & analytics'},
'City': {0: 'Beijing',
1: 'Hawthorne',
2: 'Shenzhen',
3: 'San Francisco',
4: 'Stockholm',
5: 'Surry Hills',
6: 'London',
7: 'San Francisco',
8: 'San Francisco',
9: 'San Francisco'},
'Country': {0: 'China',
1: 'United States',
2: 'China',
3: 'United States',
4: 'Sweden',
5: 'Australia',
6: 'United Kingdom',
7: 'United States',
8: 'United States',
9: 'United States'},
'Continent': {0: 'Asia',
1: 'North America',
2: 'Asia',
3: 'North America',
4: 'Europe',
5: 'Oceania',
6: 'Europe',
7: 'North America',
8: 'North America',
9: 'North America'},
'Year Founded': {0: 2012,
1: 2002,
2: 2008,
3: 2010,
4: 2005,
5: 2012,
6: 2012,
7: 2012,
8: 2015,
9: 2013},
'Funding': {0: '$8B',
1: '$7B',
2: '$2B',
3: '$2B',
4: '$4B',
5: '$572M',
6: '$2B',
7: '$3B',
8: '$14B',
9: '$3B'},
'Select Investors': {0: 'Sequoia Capital China, SIG Asia Investments, Sina Weibo, Softbank Group',
1: 'Founders Fund, Draper Fisher Jurvetson, Rothenberg Ventures',
2: 'Tiger Global Management, Sequoia Capital China, Shunwei Capital Partners',
3: 'Khosla Ventures, LowercaseCapital, capitalG',
4: 'Institutional Venture Partners, Sequoia Capital, General Atlantic',
5: 'Sequoia Capital China, Blackbird Ventures, Matrix Partners',
6: 'Tiger Global Management, Insight Partners, DST Global',
7: 'Khosla Ventures, Kleiner Perkins Caufield & Byers, Collaborative Fund',
8: 'Tiger Global Management',
9: 'Andreessen Horowitz, New Enterprise Associates, Battery Ventures'}}
I did a similar manipulation with Valuation; here is how I did it. I hope it's right.
df['Valuation'] = df['Valuation'].str.replace(
"B","").str.replace(
"$","").astype(int)
I've tried several approaches, but none of them work. Here are some of them:
df['Funding'] = np.where(df.Funding.str.contain("M"),
df['Funding'] = ('0,'+ df['Funding']),
pass)
df['Funding'] = df['Funding'].str.replace(
"B", "").str.replace(
"$","").str.replace(
"M","0,")
if df['Funding'].str.contains("M").any():
df['Funding'] = df['Funding'].str.replace("M", "")
asd = "M"
if any(("M" in asd) for M in df['Funding']):
df['Funding'].join((df['Funding'][:0],'0,',df['Funding'][0:])) and replace("M", "")
Thanks to all who want to help me. It's my first time with Python; I'm more familiar with R.

If you want all your column values in billions, you can use:
df["Valuation"] = df["Funding"].str[1:-1].astype(int).where(df["Funding"].str.endswith("B"),df["Funding"].str[1:-1].astype(int).div(1000))
>>> df
  Funding  Valuation
0     $8B      8.000
1     $2B      2.000
2   $535M      0.535
Input df:
df = pd.DataFrame({"Funding": ["$8B", "$2B", "$535M"]})
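Note that the one-liner above stores its result in a column named Valuation, which would overwrite the real Valuation column in the full dataset. A minimal sketch of the same idea that writes to a separate column instead; the column name "Funding (B)" and the divisor mapping are assumptions made for illustration, and only the B and M suffixes are handled.

```python
import pandas as pd

df = pd.DataFrame({"Funding": ["$8B", "$2B", "$535M"]})

# Numeric part between the leading "$" and the trailing unit letter
amount = df["Funding"].str[1:-1].astype(float)

# Trailing unit letter, mapped to the divisor that converts to billions
divisor = df["Funding"].str[-1].map({"B": 1, "M": 1000})

# New column in billions; the original Funding strings are left as they are
df["Funding (B)"] = amount / divisor
print(df)
```

Any value with a suffix other than B or M ends up as NaN in the new column, which makes unexpected formats easy to spot.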

Multiple pie charts from pandas dataframe

I have the following dataframe:
df = pd.DataFrame({'REC2': {0: '18-24',
1: '18-24',
2: '25-34',
3: '25-34',
4: '35-44',
5: '35-44',
6: '45-54',
7: '45-54',
8: '55-64',
9: '55-64',
10: '65+',
11: '65+'},
'Q8_1': {0: 'No',
1: 'Yes',
2: 'No',
3: 'Yes',
4: 'No',
5: 'Yes',
6: 'No',
7: 'Yes',
8: 'No',
9: 'Yes',
10: 'No',
11: 'Yes'},
'val': {0: 0.9642857142857143,
1: 0.03571428571428571,
2: 0.8208955223880597,
3: 0.1791044776119403,
4: 0.8507462686567164,
5: 0.14925373134328357,
6: 0.8484848484848485,
7: 0.15151515151515152,
8: 0.8653846153846154,
9: 0.1346153846153846,
10: 0.9375,
11: 0.0625}})
which looks like this:
I am trying to create a separate pie chart for each age bin. Currently I am using a hardcoded version, where I need to type in all the available bins. However, I am looking for a solution that does this within a loop or automatically assigns the correct bins. This is my current solution:
df = data.pivot_table(values="val",index=["REC2","Q8_1"])
rcParams['figure.figsize'] = (6,10)
f, a = plt.subplots(3,2)
df.xs('18-24').plot(kind='pie',ax=a[0,0],y="val")
df.xs('25-34').plot(kind='pie',ax=a[1,0],y="val")
df.xs('35-44').plot(kind='pie',ax=a[2,0],y="val")
df.xs('45-54').plot(kind='pie',ax=a[0,1],y="val")
df.xs('55-64').plot(kind='pie',ax=a[1,1],y="val")
df.xs('65+').plot(kind='pie',ax=a[2,1],y="val")
Output:

I think you want:
df.groupby('REC2').plot.pie(x='Q8_1', y='val', layout=(2,3))
Update: I took a look, and it turns out that groupby.plot does something different. So you can try a for loop instead:
df = df.set_index("Q8_1")
f, a = plt.subplots(3,2)
# unique() keeps the age bins in order of appearance, unlike set()
for age, ax in zip(df.REC2.unique(), a.ravel()):
    df[df.REC2.eq(age)].plot.pie(y='val', ax=ax)
plt.show()
which yields:
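If the number of age bins is not known in advance, the grid size can be derived from the data rather than hardcoded as 3 by 2. A rough sketch, assuming a shortened copy of the dataframe and two columns per row; ncols, nrows and groups are illustrative names, and the Agg backend is selected only so the snippet runs without a display.

```python
import math

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical shortened version of the dataframe from the question
df = pd.DataFrame({
    "REC2": ["18-24", "18-24", "25-34", "25-34", "35-44", "35-44"],
    "Q8_1": ["No", "Yes"] * 3,
    "val": [0.96, 0.04, 0.82, 0.18, 0.85, 0.15],
})

# One (age bin, sub-frame) pair per group, sorted by age bin
groups = list(df.groupby("REC2"))
ncols = 2
nrows = math.ceil(len(groups) / ncols)
fig, axes = plt.subplots(nrows, ncols, figsize=(6, 3 * nrows), squeeze=False)

# Draw one pie per age bin
for ax, (age, sub) in zip(axes.ravel(), groups):
    ax.pie(sub["val"].tolist(), labels=sub["Q8_1"].tolist(), autopct="%1.0f%%")
    ax.set_title(age)

# Hide any axes left over when the grid has more cells than groups
for ax in axes.ravel()[len(groups):]:
    ax.set_visible(False)

fig.savefig("pies.png")
```

With three bins this produces a 2 by 2 grid with the last cell hidden; with the six bins from the question it fills a 3 by 2 grid.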
