adding label number to first and last observation - python

I have created the following graph. I would like to annotate the first and the last value of each line in the graph, with the first value placed just before the start of the line and the last value placed at its end.
How can I do this?
data = {'Time':['1', '2', '1', '2', '1', '2', '1', '2', '1', '2', '1', '2', '1', '2' , '1', '2'], 'Country':['Italy', 'Italy', 'Italy', 'Italy', 'Italy', 'Italy', 'Italy', 'Italy', 'Russia', 'Russia', 'Russia', 'Russia' , 'Russia', 'Russia', 'Russia', 'Russia'], 'Score':[20, 21, 14, 15, 19, 18, 5, 9, 5, 8, 3, 3, 5, 1, 3, 8]}
df = pd.DataFrame(data)
sns.lineplot(x="Time", y="Score", hue="Country", data=df)

Without more details on exactly what you want, you can take the axes object returned by sns.lineplot and call its text method.
In the following I calculate the mean 'Score' for each country at each 'Time' and then add it as text to the graph. You can customise this as needed:
ax = sns.lineplot(x="Time", y="Score", hue="Country", data=df)
means = df.groupby(['Country', 'Time'])['Score'].mean()
times = ['1', '2']
x_positions = [0.02, 0.9]  # x coordinates of the first and last label
for country in df['Country'].unique():
    for time, xpos in zip(times, x_positions):
        mean = means.loc[(country, time)]  # scalar mean score for this country and time
        ax.text(xpos, mean - 1.1, str(mean))
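If you prefer labels that are anchored to the data points instead of hand-picked x positions, here is a minimal sketch of the same idea. It is only an illustration under a few assumptions of mine: it draws the group means with plain matplotlib rather than seaborn, converts 'Time' to integers so the x coordinates are numeric, and uses ax.annotate with a small pixel offset. The offsets and number format are arbitrary choices.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Same data as in the question, written more compactly.
data = {
    "Time": ["1", "2"] * 8,
    "Country": ["Italy"] * 8 + ["Russia"] * 8,
    "Score": [20, 21, 14, 15, 19, 18, 5, 9, 5, 8, 3, 3, 5, 1, 3, 8],
}
df = pd.DataFrame(data)
df["Time"] = df["Time"].astype(int)  # assumption: numeric x axis

# One row per country, one column per time, holding the mean score.
means = df.groupby(["Country", "Time"])["Score"].mean().unstack("Time")

fig, ax = plt.subplots()
for country, row in means.iterrows():
    ax.plot(row.index, row.to_numpy(), label=country)
    # First value: text shifted 6 points to the left of the first point.
    ax.annotate(f"{row.iloc[0]:g}", xy=(row.index[0], row.iloc[0]),
                xytext=(-6, 0), textcoords="offset points",
                ha="right", va="center")
    # Last value: text shifted 6 points to the right of the last point.
    ax.annotate(f"{row.iloc[-1]:g}", xy=(row.index[-1], row.iloc[-1]),
                xytext=(6, 0), textcoords="offset points",
                ha="left", va="center")
ax.set_xticks(list(means.columns))
ax.legend()
```

The same annotate calls should also work on the axes returned by sns.lineplot, as long as the x values are numeric. Call plt.show() or fig.savefig(...) to see the result.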

Related

pandas : drop duplicates in the same time when grouping by

I'm doing a simple groupby on my data, as shown in the code below. Is there a way to do it directly, in the same line of code, without the drop_duplicates?
Thank you
df_brut['Revenue'] = df_brut.groupby(['cod', 'date', 'zone'])['Revenue'].transform('sum')
df_brut = df_brut.drop_duplicates()
df_brut.columns = ['cod','date', 'zone','SUM_']
My data
data1 = {'date': ['2021-06', '2021-06', '2021-07', '2021-07', '2021-07', '2021-07'], 'cod': ['12', '12', '14', '15', '15', '18'], 'zone': ['LA', 'LA', 'LA', 'PARIS', 'PARIS', 'PARIS'], 'Revenue': [10, 20, 30, 50, 40, 10]}
df_brut= pd.DataFrame(data1)
the grouped data expected is
data2 = {'date': ['2021-06', '2021-07', '2021-07', '2021-07'], 'cod': ['12', '14', '15','18'], 'zone': ['LA', 'LA', 'PARIS', 'PARIS'], 'SUM_': [30, 30, 90, 10]}
df_grouped= pd.DataFrame(data2)
You could do:
(df_brut.groupby(['cod', 'date', 'zone'], as_index=False)['Revenue']
    .sum()
    .rename(columns={'Revenue': 'SUM_'})
)
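As an alternative sketch, named aggregation does the sum and the renaming in one step. This assumes a pandas version that supports named aggregation (0.25 or later); the data is the sample from the question.

```python
import pandas as pd

data1 = {
    "date": ["2021-06", "2021-06", "2021-07", "2021-07", "2021-07", "2021-07"],
    "cod": ["12", "12", "14", "15", "15", "18"],
    "zone": ["LA", "LA", "LA", "PARIS", "PARIS", "PARIS"],
    "Revenue": [10, 20, 30, 50, 40, 10],
}
df_brut = pd.DataFrame(data1)

# One row per (cod, date, zone) group; the output column is named SUM_ directly.
df_grouped = (
    df_brut.groupby(["cod", "date", "zone"], as_index=False)
    .agg(SUM_=("Revenue", "sum"))
)
print(df_grouped)
```

Because groupby already returns one row per group, there is nothing left to deduplicate afterwards.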

add alpha (shade) to plotly express stacked bar

I have the following pandas dataframe
import pandas as pd
df_dict = {'index': [0, 1, 2, 3, 4, 5, 6, 7, 8], 'columns': ['from', 'moving', 'N', 'total', 'perc', 'helper', 'label'], 'data': [['0', 'no', 29, 39, 74.35897435897436, 'all', '74.4 %'], ['0', 'yes', 10, 39, 25.641025641025642, 'all', '25.6 %'], ['1', 'no', 77, 84, 91.66666666666667, 'all', '91.7 %'], ['1', 'yes', 7, 84, 8.333333333333334, 'all', '8.3 %'], ['2', 'no', 6, 6, 100.0, 'all', '100.0 %'], ['3', 'no', 19, 25, 76.0, 'all', '76.0 %'], ['3', 'yes', 6, 25, 24.0, 'all', '24.0 %'], ['4', 'no', 30, 45, 66.66666666666667, 'all', '66.7 %'], ['4', 'yes', 15, 45, 33.333333333333336, 'all', '33.3 %']]}
df = pd.DataFrame(index=df_dict['index'], columns=df_dict['columns'], data=df_dict['data'])
I am using the following code:
import plotly.express as px
def pl(dt, color_col, title, facet_col=None,
       color_discrete_map=dict(zip(['4', '0', '2', '3', '1'],['#003898', '#164461','#61B3C1', '#8ED3F6 ','#8DD1C8']))):
    px.bar(dt, x='helper', y='perc', color=color_col, facet_col=facet_col, category_orders={col: sorted(dt_temp[col].unique())},
           color_discrete_map=color_discrete_map, title=title, text='label').show()

pl(dt=df, facet_col='from',
   color_col='from', title='title')
In order to produce this plot:
I would like to add a shade of the color specified in color_discrete_map depending on the moving column of the df, so that the "no" bars are a bit more faded.
How could I do that with plotly express?
I don't believe that you can access the text font through any of the px.bar parameters. However, you can save your px.bar in an object called fig, then directly modify each bar trace through fig.data[0], fig.data[1], ... fig.data[n-1] for n traces.
The text color of each of these traces can be modified by assigning a dictionary to its textfont attribute. For example, you could make the text of the first trace red with the line: fig.data[0].textfont = {'color': 'red'}. This lends itself to looping through each fig.data trace and setting its textfont attribute to the desired color.
The last part is to make your color a shade of the bar color. I am not that familiar with hex color codes, so it makes the most sense to convert each hex color code to an rgb tuple of three values, and find the intermediate color between this value and white. plotly.colors conveniently has the methods hex_to_rgb and find_intermediate_color so we can use these to convert each of your hex colors to rgb, then find the rgb tuple between that color and white which is rgb(255,255,255).
To be consistent with the way you've structured your program, I put the code setting the textfont attributes in your pl function.
import pandas as pd
import plotly.express as px
from plotly.colors import hex_to_rgb, find_intermediate_color
df_dict = {'index': [0, 1, 2, 3, 4, 5, 6, 7, 8], 'columns': ['from', 'moving', 'N', 'total', 'perc', 'helper', 'label'], 'data': [['0', 'no', 29, 39, 74.35897435897436, 'all', '74.4 %'], ['0', 'yes', 10, 39, 25.641025641025642, 'all', '25.6 %'], ['1', 'no', 77, 84, 91.66666666666667, 'all', '91.7 %'], ['1', 'yes', 7, 84, 8.333333333333334, 'all', '8.3 %'], ['2', 'no', 6, 6, 100.0, 'all', '100.0 %'], ['3', 'no', 19, 25, 76.0, 'all', '76.0 %'], ['3', 'yes', 6, 25, 24.0, 'all', '24.0 %'], ['4', 'no', 30, 45, 66.66666666666667, 'all', '66.7 %'], ['4', 'yes', 15, 45, 33.333333333333336, 'all', '33.3 %']]}
df = pd.DataFrame(index=df_dict['index'], columns=df_dict['columns'], data=df_dict['data'])
bar_color_map = dict(zip(['4', '0', '2', '3', '1'],['#003898', '#164461','#61B3C1', '#8ED3F6','#8DD1C8']))
def pl(dt, color_col, title, facet_col=None, color_discrete_map=bar_color_map):
    fig = px.bar(dt, x='helper', y='perc', color=color_col, facet_col=facet_col,
                 # category_orders={col: sorted(dt_temp[col].unique())},
                 category_orders={color_col: sorted(dt[color_col].unique())},
                 color_discrete_map=color_discrete_map, title=title, text='label')
    ## set the textfont attribute of each trace in fig.data
    for bar_number in range(len(fig.data)):
        # assumes the traces follow the sorted category order '0', '1', ...
        bar_color = hex_to_rgb(color_discrete_map[str(bar_number)])
        # color halfway between the bar color and white
        shaded_text_color = find_intermediate_color(bar_color, (255, 255, 255), 0.5)
        shaded_int_rgb_color = tuple(int(channel) for channel in shaded_text_color)
        # print('rgb'+str(shaded_int_rgb_color))
        fig.data[bar_number].textfont = {'color': 'rgb' + str(shaded_int_rgb_color)}
    fig.show()

pl(dt=df, facet_col='from', color_col='from', title='title')
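To make the blending step easier to check by hand, here is a small pure-Python sketch of what "halfway between a color and white" means. The helper names hex_to_rgb_tuple and blend_with_white are my own and only stand in for the plotly.colors helpers used above; they are not part of plotly.

```python
def hex_to_rgb_tuple(hex_color):
    """Convert a '#RRGGBB' string to an (R, G, B) tuple of ints."""
    value = hex_color.strip().lstrip("#")  # also tolerates stray whitespace
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def blend_with_white(hex_color, amount=0.5):
    """Move each channel `amount` of the way from the color towards white."""
    r, g, b = (int(c + (255 - c) * amount) for c in hex_to_rgb_tuple(hex_color))
    return f"rgb({r}, {g}, {b})"


print(hex_to_rgb_tuple("#003898"))       # (0, 56, 152)
print(blend_with_white("#003898"))       # rgb(127, 155, 203)
print(blend_with_white("#003898", 0.0))  # rgb(0, 56, 152)
```

A larger amount gives a more faded color; amount=1.0 is pure white.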

Index error while concatenating two dataframes in pandas

I get the following error
pandas.core.indexes.base.InvalidIndexError: Reindexing only valid with uniquely valued Index objects
On code
dfp = pd.concat([df, tdf], axis=1)
I am trying to concatenate columns of tdf to the columns of df.
For these print statements
print(df.shape)
print(tdf.shape)
print(df.columns)
print(tdf.columns)
print(df.index)
print(tdf.index)
I get the following output:
(70000, 25)
(70000, 20)
Index(['300', '301', '302', '303', '304', '305', '306', '307', '308', '309',
'310', '311', '312', '313', '314', '315', '316', '317', '318', '319',
'320', '321', '322', '323', '324'],
dtype='object')
Index(['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13',
'14', '15', '16', '17', '18', '19', '20'],
dtype='object')
Int64Index([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
...
9990, 9991, 9992, 9993, 9994, 9995, 9996, 9997, 9998, 9999],
dtype='int64', length=70000)
RangeIndex(start=0, stop=70000, step=1)
Any idea what is the issue? Why would indexing be a problem? Indexes are supposed to be the same since I concat columns, not rows. Column values seem to be perfectly different.
Thanks!
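The problem is that df is not uniquely indexed: its printed index ends at 9999 although it has 70000 entries, so index labels must repeat. With axis=1, pd.concat aligns the rows of both frames on their index labels, and that alignment only works when the labels are unique. So you need to either reset the index, which keeps the old index as a new column
pd.concat([df.reset_index(), tdf], axis=1)
or reset it and drop the old index
pd.concat([df.reset_index(drop=True), tdf], axis=1)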
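A quick way to confirm the diagnosis is to check index.is_unique on both frames. The sketch below uses two tiny made-up frames (not the asker's data) to show the check and the fix. The failing call is wrapped in a broad try/except because the exact exception class and message can differ between pandas versions.

```python
import pandas as pd

# Hypothetical small frames: df repeats the labels 0 and 1,
# tdf has a default RangeIndex.
df = pd.DataFrame({"300": [1, 2, 3, 4]}, index=[0, 1, 0, 1])
tdf = pd.DataFrame({"1": [10, 20, 30, 40]})

print(df.index.is_unique)   # False
print(tdf.index.is_unique)  # True

try:
    pd.concat([df, tdf], axis=1)
except Exception as exc:  # exception type depends on the pandas version
    print(type(exc).__name__, exc)

# After dropping the duplicated index, rows are matched by position 0..3.
dfp = pd.concat([df.reset_index(drop=True), tdf], axis=1)
print(dfp)
```

Note that reset_index(drop=True) pairs the rows purely by position, so it is only appropriate when both frames are already in the same row order.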

pandas pivot aggregate percentage by index value counts

I have a dataframe as follows
df = pd.DataFrame({'date': ['2013-04-01','2013-04-01','2013-04-01','2013-04-02', '2013-04-02'],
'month': ['1','1','3','3','5'],
'pmonth': ['1', '1', '2', '5', '5'],
'duration': [30, 15, 20, 15, 30],
'user_id': ['10', '20', '30', '40', '50']})
I can calculate the percentage of user_id counts by date, month and pmonth using
pd.crosstab(index=[df.date,df.month,df.pmonth],columns=df.duration,values=df.user_id,normalize ='index',aggfunc='count')
But I want to calculate the percentage of user_id counts for the date, month combination only. Is this possible using crosstab?

Plot a list of dictionaries using matplotlib

List = [{'Month': '1', 'Store': 'A', 'Sales': '100'},
        {'Month': '2', 'Store': 'A', 'Sales': '50'},
        {'Month': '3', 'Store': 'A', 'Sales': '200'},
        {'Month': '1', 'Store': 'B', 'Sales': '300'},
        {'Month': '2', 'Store': 'B', 'Sales': '200'},
        {'Month': '3', 'Store': 'B', 'Sales': '250'}]
I know how to plot a basic line.
But how can I get a combined plot with both data sets, like the expected result?
This will do it. Putting the data in a pandas DataFrame simplifies things. Then plot each store's line on the same axes, and both lines will be shown on the same chart.
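import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(List)  # List is the list of dictionaries from the question
# Month and Sales are stored as strings, so convert them to numbers first
df[['Month', 'Sales']] = df[['Month', 'Sales']].apply(pd.to_numeric, errors='coerce')
a = df[df.Store == 'A']
b = df[df.Store == 'B']
fig = plt.figure(figsize=(10,6))
ax = fig.add_subplot(111)
# plotting both subsets on the same axes puts both lines on one chart
a.plot('Month', 'Sales', ax=ax, label='A')
b.plot('Month', 'Sales', ax=ax, label='B')
ax.grid(True)
fig.set_facecolor('white')
plt.show()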
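If there may be more than two stores, a loop over groupby avoids writing one filter per store. This is a minimal sketch of that variation, using the data from the question under the variable name records (my own name) and a "Store X" legend label that is just an illustrative choice.

```python
import pandas as pd
import matplotlib.pyplot as plt

records = [
    {"Month": "1", "Store": "A", "Sales": "100"},
    {"Month": "2", "Store": "A", "Sales": "50"},
    {"Month": "3", "Store": "A", "Sales": "200"},
    {"Month": "1", "Store": "B", "Sales": "300"},
    {"Month": "2", "Store": "B", "Sales": "200"},
    {"Month": "3", "Store": "B", "Sales": "250"},
]
df = pd.DataFrame(records)
# The values arrive as strings, so convert the numeric columns.
df[["Month", "Sales"]] = df[["Month", "Sales"]].apply(pd.to_numeric, errors="coerce")

fig, ax = plt.subplots(figsize=(10, 6))
# One line per store, all drawn on the same axes.
for store, group in df.groupby("Store"):
    group.plot(x="Month", y="Sales", ax=ax, label=f"Store {store}")
ax.set_ylabel("Sales")
ax.grid(True)
```

Call plt.show() or fig.savefig(...) afterwards to display or save the chart.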
