dates.YearLocator() does not show years - python

Unfortunately, I cannot upload my dataset, but here is what it looks like:
UMTMVS month
DATE
1992-01-01 209438.0 1
1992-02-01 232679.0 2
1992-03-01 249673.0 3
1992-04-01 239666.0 4
1992-05-01 243231.0 5
1992-06-01 262854.0 6
1992-07-01 222832.0 7
1992-08-01 240299.0 8
1992-09-01 260216.0 9
1992-10-01 252272.0 10
1992-11-01 245261.0 11
1992-12-01 245603.0 12
1993-01-01 223258.0 1
1993-02-01 246941.0 2
1993-03-01 264886.0 3
1993-04-01 249181.0 4
1993-05-01 250870.0 5
1993-06-01 271047.0 6
1993-07-01 224077.0 7
1993-08-01 248963.0 8
1993-09-01 269227.0 9
1993-10-01 263075.0 10
1993-11-01 256142.0 11
1993-12-01 252830.0 12
1994-01-01 234097.0 1
1994-02-01 259041.0 2
1994-03-01 277243.0 3
1994-04-01 261755.0 4
1994-05-01 267573.0 5
1994-06-01 287336.0 6
1994-07-01 239931.0 7
1994-08-01 276947.0 8
1994-09-01 291357.0 9
1994-10-01 282489.0 10
1994-11-01 280455.0 11
1994-12-01 279888.0 12
1995-01-01 260175.0 1
1995-02-01 286290.0 2
1995-03-01 303201.0 3
1995-04-01 283129.0 4
1995-05-01 289257.0 5
1995-06-01 310201.0 6
1995-07-01 255163.0 7
1995-08-01 293605.0 8
1995-09-01 313228.0 9
1995-10-01 301301.0 10
1995-11-01 293164.0 11
1995-12-01 290963.0 12
1996-01-01 263041.0 1
1996-02-01 290317.0 2
I want to set a major tick locator for each year, so I ran the following code:
ax = df.UMTMVS.plot(figsize=(12, 5))
ax.xaxis.set_major_locator(dates.YearLocator())
but it simply gives the following figure without any locator at all
Why does the locator fail to point out the years?

Try applying set_major_locator() to the axis before df.plot(). Like this:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import dates
# reading your sample data into dataframe
df = pd.read_clipboard()
# dates should be dates (datetime), not strings
df.index = df.index.to_series().apply(pd.to_datetime)
fig, ax = plt.subplots(1,1,figsize=(12, 5))
# set locator before df.plot()
ax.xaxis.set_major_locator(dates.YearLocator())
# plot onto the axis configured above
df.UMTMVS.plot(ax=ax)
Result:
A slightly different result can be achieved by changing the last part of the code above to the following:
fig, ax = plt.subplots(1,1,figsize=(12, 5))
ax.plot(df.UMTMVS)
ax.xaxis.set_major_locator(dates.YearLocator())
plt.xlabel('DATE')
plt.show()
Result_alt (note the "padding" and loss of minor ticks):
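A possible explanation for the original behaviour, offered as an assumption rather than something confirmed in the question: when a Series with a regular DatetimeIndex is plotted through pandas, pandas switches the x-axis to its own period-based units, so matplotlib's date locators no longer line up with the data. Passing x_compat=True to plot() is the documented way to turn that adjustment off. The sketch below uses made-up monthly values in place of the real dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import dates

# made-up monthly values standing in for the UMTMVS column
idx = pd.date_range("1992-01-01", periods=50, freq="MS", name="DATE")
s = pd.Series(np.linspace(209438.0, 290317.0, len(idx)), index=idx, name="UMTMVS")

fig, ax = plt.subplots(figsize=(12, 5))
# x_compat=True keeps the x-axis in matplotlib date units
s.plot(ax=ax, x_compat=True)
ax.xaxis.set_major_locator(dates.YearLocator())
ax.xaxis.set_major_formatter(dates.DateFormatter("%Y"))

# years at which the locator places a major tick
tick_years = [dates.num2date(t).year for t in ax.get_xticks()]
```

With x_compat=True the locator can be set after the plot call, which matches the order used in the question.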

Related

python: cumulative density plot

I have the following dataframe:
df =
Time_to_event event
0 0 days 443
1 1 days 226
2 2 days 162
3 3 days 72
4 4 days 55
5 5 days 30
6 6 days 36
7 7 days 18
8 8 days 15
9 9 days 14
10 10 days 21
11 11 days 13
12 12 days 10
13 13 days 10
14 14 days 8
I want to produce a cumulative density plot of the sum of the events per day. For example, 0 days = 443, 1 day = 443 + 226, and so on.
I am currently trying this code:
stat = "count" # or proportion
sns.histplot(df, stat=stat, cumulative=True, alpha=.4)
but I come up with a pretty terrible plot:
If I could also come up with a line instead of bars that would be awesome!
You can try a combination of pandas.Series.cumsum and seaborn.lineplot:
import matplotlib.pyplot as plt
import seaborn as sns
df["cumsum"] = df["event"].cumsum()
plt.figure(figsize=(6,4))
sns.lineplot(x="Time_to_event", y="cumsum", data=df);
Output :
I think the values you are looking for in your plot are:
xvalues=df["Time_to_event"]
yvalues=df["event"].cumsum()
The code could look like this:
import pandas as pd
import matplotlib.pyplot as plt
df=pd.read_csv("test.txt")
print(df.columns)
print(df)
plt.bar(df["Time_to_event"],df["event"].cumsum())
# replace plt.bar with plt.plot for a plotted diagram
plt.show()
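Putting both answers together, here is a minimal sketch with the data from the question typed in by hand. It assumes Time_to_event is a timedelta column (as the "0 days" display suggests) and converts it to whole days; the days, cum_count and cum_share column names are made up for this example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop for interactive use
import pandas as pd
import matplotlib.pyplot as plt

events = [443, 226, 162, 72, 55, 30, 36, 18, 15, 14, 21, 13, 10, 10, 8]
df = pd.DataFrame({
    "Time_to_event": pd.to_timedelta(list(range(len(events))), unit="D"),
    "event": events,
})

df["days"] = df["Time_to_event"].dt.days                 # plain integers for the x-axis
df["cum_count"] = df["event"].cumsum()                   # running total of events
df["cum_share"] = df["cum_count"] / df["event"].sum()    # same curve as a proportion

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(df["days"], df["cum_count"], marker="o")         # a line instead of bars
ax.set_xlabel("Days to event")
ax.set_ylabel("Cumulative number of events")
```

Plotting cum_share instead of cum_count gives the "proportion" variant mentioned in the question.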

Add x-axis to matplotlib with multiple y-axis line chart

How do I add the x-axis (Month) to a simple Matplotlib line chart?
My Dataset:
Month Views CMA30
0 11 24662 24662.000000
1 11 2420 13541.000000
2 11 11318 12800.000000
3 11 8529 11732.250000
4 10 78861 25158.000000
5 10 1281 21178.500000
6 10 22701 21396.000000
7 10 17088 20857.500000
This is my code:
df[['Views', 'CMA30']].plot(label='Views', figsize=(5, 5))
This is giving me Views and CMA30 on the y-axis. How do I add Month(1-12) on the x-axis?
If you want to average the values per month, try groupby/mean:
df.groupby('Month')[['Views','CMA30']].mean().plot(label='Views', figsize=(5, 5))
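A runnable sketch of the same idea, using the eight rows from the question. Grouping makes Month the index, and pandas then uses that index for the x-axis. Note that averaging collapses each month to a single point, which is an assumption about what the chart should show.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop for interactive use
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "Month": [11, 11, 11, 11, 10, 10, 10, 10],
    "Views": [24662, 2420, 11318, 8529, 78861, 1281, 22701, 17088],
    "CMA30": [24662.0, 13541.0, 12800.0, 11732.25, 25158.0, 21178.5, 21396.0, 20857.5],
})

# one row per month, with Month as the index
monthly = df.groupby("Month")[["Views", "CMA30"]].mean()

ax = monthly.plot(marker="o", figsize=(5, 5))
ax.set_xlabel("Month")
ax.set_xticks(list(monthly.index))  # tick only at months that occur in the data
```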

How to add more data to an existing plotly graph?

I have successfully plotted the below data using plotly from an Excel file.
Here is my code:
file_loc1 = "AgeGroupData_time_to_treatment.xlsx"
df_centroid_CoordNew = pd.read_excel(file_loc1, index_col=None, na_values=['NA'], usecols="C:D,AB")
df_centroid_CoordNew.head()
df_centroid_Coord['Ambulance_Treatment_Time'] = df_centroid_Coord ['Base_TT']
fig = px.scatter(df_centroid_Coord, x="x", y="y",
title="Southern Region Centroids",
color='Ambulance_Treatment_Time',
hover_name="KnNamn",
hover_data= ['Ambulance_Treatment_Time', "TotPop"],
log_x=True, size_max=60,
color_continuous_scale='Reds', range_color=(0.5,2), width=1250, height=1000)
fig.update_traces(marker={'size': 8, 'symbol': 1})
#fig.update_traces(marker={'symbol': 1})
fig.update_layout(paper_bgcolor="LightSteelBlue")
fig.show()
The plotted data points are drawn as squares.
Here is the output of my code:
Now I want to plot more data points, as circles or any other shape, on the same plotly graph by reading an Excel file. Please have a look at the data below.
How can I add the new data to an existing graph in plotly?
Map data with total population and treatment time (Base_TT):
ID KnNamn x y TotPop Base_TT
1 2 Växjö 14.662290 57.027520 9 1.599971
2 3 Bromölla 14.494072 56.065635 264 1.307165
3 4 Trelleborg 13.219968 55.478675 40 1.411554
4 5 Tomelilla 14.005013 55.721209 6 1.968138
5 6 Halmstad 12.737361 56.710973 386 1.309849
6 7 Alvesta 14.566685 56.748729 47 1.719117
7 8 Laholm 13.241388 56.413591 0 2.000620
8 9 Tingsryd 14.943081 56.542837 16 1.668725
9 10 Sölvesborg 14.574474 56.056953 1147 1.266862
10 11 Halmstad 13.068009 56.635666 38 1.589239
11 12 Tingsryd 14.699642 56.479597 3 1.960050
12 13 Vellinge 13.029769 55.484749 61 1.254957
13 14 Örkelljunga 13.169010 56.232819 12 1.429789
14 15 Svalöv 13.059068 55.853696 26 1.553722
15 16 Sjöbo 13.738205 55.601936 6 1.326429
16 17 Hässleholm 13.729872 56.347672 13 1.709021
17 18 Olofström 14.588037 56.290604 6 1.444833
18 19 Eslöv 13.168712 55.900311 3 1.527547
19 20 Ronneby 15.024222 56.273317 3 1.692005
20 21 Ängelholm 12.910101 56.246689 19 1.090544
Ambulance Data:
ID Ambulance station name Longtitude Latitude
0 1 Älmhult 14.128734 56.547992
1 2 Ängelholm 12.870739 56.242114
2 3 Alvesta 14.549503 56.920740
3 4 Östra Ljungby 13.057450 56.188099
4 5 Broby 14.080958 56.254481
5 6 Bromölla 14.466869 56.072272
6 7 Förslöv 12.814913 56.350098
7 9 Hässleholm 13.778234 56.161536
8 10 Höganäs 12.556995 56.206016
9 11 Hörby 13.643265 55.849811
10 12 Halmstad, Väster 12.819960 56.674306
11 13 Halmstad, Öster 12.882289 56.676871
12 14 Helsingborg 12.738642 56.084708
13 15 Hyltebruk 13.238277 56.993058
14 16 Karlshamn 14.854022 56.186596
15 17 Karlskrona 15.606300 56.183054
16 18 Kristianstad 14.171371 56.031201
17 20 Löddeköpinge 12.995037 55.766946
18 21 Laholm 13.033763 56.498955
19 22 Landskrona 12.867245 55.872659
20 23 Lenhovda 15.283913 57.001953
21 24 Lessebo 15.267357 56.756860
22 25 Ljungby 13.935399 56.835023
23 26 Lund 13.226607 55.695212
24 27 Markaryd 13.591491 56.452057
25 28 Olofström 14.545848 56.272221
26 29 Osby 13.983674 56.384833
27 30 Perstorp 13.388304 56.130752
28 31 Ronneby 15.280554 56.211863
29 32 Sölvesborg 14.570503 56.052113
30 33 Simrishamn 14.338632 55.552765
Merged Dataset for plotting
KnNamn x y TotPop Base_TT Ambulance station name Longtitude Latitude
Växjö 14.66229 57.02752 9 1.599971 Ängelholm 12.87074 56.24211
Bromölla 14.49407 56.06564 264 1.307165 Alvesta 14.5495 56.92074
Trelleborg 13.21997 55.47868 40 1.411554 Östra Ljungby 13.05745 56.1881
Tomelilla 14.00501 55.72121 6 1.968138 Broby 14.08096 56.25448
Halmstad 12.73736 56.71097 386 1.309849
Alvesta 14.56669 56.74873 47 1.719117
Laholm 13.24139 56.41359 0 2.00062
Tingsryd 14.94308 56.54284 16 1.668725
If the data is the same and only the column names differ, you can align them to either set of names for the chart.
The second chart is added as a graph object, reusing the trace that plotly.express created: first I add the chart that was already completed, then a chart built from the stations' longitude and latitude. Station names and locations are drawn with a scatter trace in markers-and-text mode.
df_station.rename(columns={'Longtitude':'x', 'Latitude':'y'}, inplace=True)
import plotly.express as px
import plotly.graph_objects as go
df_centroid_Coord['Ambulance_Treatment_Time'] = df_centroid_Coord ['Base_TT']
sca = px.scatter(df_centroid_Coord, x="x", y="y",
title="Southern Region Centroids",
color='Ambulance_Treatment_Time',
hover_name="KnNamn",
#hover_data= ['Ambulance_Treatment_Time', "TotPop"],
log_x=True,
size_max=60,
color_continuous_scale='Reds',
range_color=(0.5,2),
)
sca.update_traces(marker={'size': 8, 'symbol': 1})
fig = go.Figure()
fig.add_trace(go.Scatter(sca.data[0]))
fig.add_trace(go.Scatter(x=df_station['x'],
y=df_station['y'],
mode='markers+text',
text=df_station['Ambulance station name'],
textposition='top center',
showlegend=False,
marker=dict(
size=5,
symbol=2,
color='blue'
)
)
)
#fig.update_traces(marker={'symbol': 1})
fig.update_layout(width=625, height=500, paper_bgcolor="LightSteelBlue")
fig.show()

How to plot small floating numbers properly

How to plot the set of numbers like (first column is x-axis, second column is y-axis):
1 3.4335e-14
2 5.8945e-28
3 6.7462e-42
4 5.7908e-56
5 3.9765e-70
6 2.2756e-84
7 1.1162e-98
8 4.7904e-113
9 1.8275e-127
10 6.2749e-142
11 1.9586e-156
12 5.6041e-171
13 1.4801e-185
14 3.6300e-200
15 8.3091e-215
16 1.7831e-229
17 3.6013e-244
18 6.8694e-259
19 1.2414e-273
For now I get:
And I can't figure out how to plot it properly, meaning no flat line from 2 to the end and correct y-axis values. I read these values from the file with:
x_values.append(line.split(' ')[0])
y_values.append(float(line.split(' ')[1]))
You may wish to switch the yscale to "log" scale, e.g.:
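import matplotlib.pyplot as plt
import matplotlib.ticker as mtick
# x and y are the lists of values read from the file
_,ax = plt.subplots()
plt.plot(x,y)
plt.xticks(x)
# a log scale spreads out values that span many orders of magnitude
plt.yscale("log")
ax.yaxis.set_major_formatter(mtick.FormatStrFormatter('%.2e'));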
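A self-contained version of the same idea, with some of the rows from the question embedded as text and parsed the way the question does. The raw variable is made up for this sketch, and x is converted to int here, which the question's code does not do.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop for interactive use
import matplotlib.pyplot as plt

# a few rows copied from the question (x, then y)
raw = """1 3.4335e-14
2 5.8945e-28
3 6.7462e-42
4 5.7908e-56
5 3.9765e-70
6 2.2756e-84
7 1.1162e-98"""

x_values, y_values = [], []
for line in raw.splitlines():
    parts = line.split(" ")
    x_values.append(int(parts[0]))
    y_values.append(float(parts[1]))

fig, ax = plt.subplots()
ax.plot(x_values, y_values, marker="o")
ax.set_xticks(x_values)
ax.set_yscale("log")  # on a linear axis everything after the first point looks like zero
ax.set_xlabel("x")
ax.set_ylabel("y (log scale)")
```

On a linear axis the first value dominates by many orders of magnitude, which is why the original plot looks flat from 2 onwards.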

Is there an option to add standard errors to a bar plot that is grouped by 2 elements?

I'm a bit stuck with a plotting issue here. I have this dataframe:
experiment_type date hour AVG STD
1 1 280917 0730 0.249848 0.05733176946343718
2 2 280917 0730 0.328861 0.057735162344068565
3 3 280917 0730 0.302126 0.04303528661289821
4 4 280917 0730 0.212397 0.047732078563537034
5 5 280917 0730 0.297650 0.06917274408851469
6 6 280917 0730 0.306201 0.058643980490341
7 1 280917 1000 0.355719 0.10123070455064967
8 2 280917 1000 0.318242 0.06653079852300682
9 3 280917 1000 0.400407 0.0551857288095858
10 4 280917 1000 0.392078 0.07128036827900652
11 5 280917 1000 0.458792 0.0536016257165336
12 6 280917 1000 0.421946 0.09203557459964495
13 1 280917 1130 0.326355 0.07685731886302632
14 2 280917 1130 0.295412 0.05515868490280801
15 3 280917 1130 0.369003 0.052296418927459745
16 4 280917 1130 0.310969 0.058653995798575775
17 5 280917 1130 0.391034 0.0848147338348273
18 6 280917 1130 0.328540 0.0685519298043828
19 1 021017 0730 0.371137 0.06654942076753678
20 2 021017 0730 0.590593 0.08694478976189386
21 3 021017 0730 0.509631 0.09217340399261317
22 4 021017 0730 0.588429 0.11754539860104395
23 5 021017 0730 0.759006 0.03217804532952569
24 6 021017 0730 0.516125 0.10400866621070887
25 1 021017 1200 0.562901 0.07442696030744335
26 2 021017 1200 0.584997 0.09530613874682822
27 3 021017 1200 0.368201 0.06716307188306521
28 4 021017 1200 0.323314 0.07897174337377368
29 5 021017 1200 0.573152 0.055731097595140985
30 6 021017 1200 0.536843 0.0250192994887813
31 1 101017 0730 0.566245 0.05591184701727823
32 2 101017 0730 0.740925 0.0298011175002202
33 3 101017 0730 0.812121 0.020692910083544295
34 4 101017 0730 0.732448 0.03678606897543907
35 5 101017 0730 0.716778 0.03991758033052914
36 6 101017 0730 0.696405 0.015314129335472805
Each row will be grouped by date, the x-axis will be the experiment_type clustered by hour, the y-axis is the AVG, and the standard deviation is STD.
Now I have everything working except the standard deviation part.
Can someone please help me add it?
Here is the current result: barplot
PS: I would also like to know how to label each y-axis and x-axis.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
date_list = list(set(df['date']))
fig, axes = plt.subplots(nrows=len(date_list), ncols=1)
fig.subplots_adjust(hspace=1, wspace=3)
for i in range(len(date_list)):
    date_separator = df[df['date'] == date_list[i]]
    groupby_experimentType_hour = date_separator.groupby(['experiment_type ', 'hour' ])
    AVG = groupby_experimentType_hour['AVG'].aggregate(np.sum).unstack()
    std = groupby_experimentType_hour['STD'].aggregate(np.sum).unstack()
    AVG.plot( ax=axes[i] , kind = 'bar', title = date_list[i])
plt.show()
i tried:
AVG.plot( ax=axes[i] , kind = 'bar', title = date_list[i], yerr=std)
but got this error:
AttributeError: 'NoneType' object has no attribute 'update'
I just tried running this and it worked fine:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
a = {'exp':[1,2,3,4,5,6,7,8,9,10],'date':[280917,280917,280917,280917,'021017','021017','021017','021017',101017,101017],'hour':['0730','0730','0730',1000,1000,1000,1130,1130,'0730','0730'],'AVG':[12,13,15,31,23,25,25,21,20,14],'STD':[1,3,6,3,2,3,5,1,2,4]}
df = pd.DataFrame(a)
date_list = list(set(df['date']))
fig, axes = plt.subplots(nrows=len(date_list), ncols=1)
fig.subplots_adjust(hspace=1, wspace=3)
for i in range(len(date_list)):
    date_separator = df[df['date'] == date_list[i]]
    groupby_experimentType_hour = date_separator.groupby(['exp', 'hour' ])
    AVG = groupby_experimentType_hour['AVG'].aggregate(np.sum).unstack()
    std = groupby_experimentType_hour['STD'].aggregate(np.sum).unstack()
    AVG.plot( ax=axes[i] , kind = 'bar', title = date_list[i], yerr=std)
plt.show()
I created an example to show how it works:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
df=pd.DataFrame()
df['index']=[1,2,3,4,5,6,7,8,9,10]
df['experiment_type ']=[1,2,3,4,5,1,2,3,4,5]
df['hour']=['0730','0730','0730','1000','1000','1000','1000','0730','1000','0730']
df['AVG']=[0.01,0.02,0.1,0.2,0.3,0.1,0.5,0.6,0.9,0.7]
df['STD']=[0.05,0.05,0.05,0.04,0.02,0.1,0.1,0.09,0.05,0.2]
df['date']=['280917','280917','280917','280917','280917','021017','021017','021017','021017','021017']
df.set_index('index')
Now slightly modifying your code:
date_list = list(set(df['date']))
fig, axes = plt.subplots(nrows=len(date_list), ncols=1, figsize=(12, 12))
for i in range(len(date_list)):
    date_separator = df[df['date'] == date_list[i]]
    groupby_experimentType_hour = date_separator.groupby(['experiment_type ', 'hour'])
    # select both columns with a list; tuple-style selection fails on recent pandas
    AVG_STD = groupby_experimentType_hour[['AVG', 'STD']].aggregate(np.sum).unstack()
    ax = AVG_STD.plot(ax=axes[i], kind='bar', title=date_list[i], fontsize=20)
    ax.set_ylabel('AVG/STD', fontsize=20)
    ax.set_xlabel('experiment_type', fontsize=20)
plt.show()
Output:
If you look at the legend, the bars are classified by hour and by AVG/STD.
Now you just have to apply it to your dataframe!
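For actual error bars rather than separate STD bars, pandas accepts a yerr argument whose frame has the same index and columns as the plotted one. Below is a sketch for a single date, using a few rows from the question with STD rounded to six decimals. pivot is used here in place of groupby/unstack because each (experiment_type, hour) pair occurs once, and the column name is written without the trailing space that appears in the question's code; both are choices made for this example.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop for interactive use
import pandas as pd
import matplotlib.pyplot as plt

# rows for date 280917, experiment types 1-3, hours 0730 and 1000
df = pd.DataFrame({
    "experiment_type": [1, 2, 3, 1, 2, 3],
    "hour": ["0730", "0730", "0730", "1000", "1000", "1000"],
    "AVG": [0.249848, 0.328861, 0.302126, 0.355719, 0.318242, 0.400407],
    "STD": [0.057332, 0.057735, 0.043035, 0.101231, 0.066531, 0.055186],
})

# same index (experiment_type) and columns (hour) for values and errors
avg = df.pivot(index="experiment_type", columns="hour", values="AVG")
std = df.pivot(index="experiment_type", columns="hour", values="STD")

fig, ax = plt.subplots(figsize=(8, 4))
avg.plot(kind="bar", yerr=std, ax=ax, capsize=3, title="280917")
ax.set_xlabel("experiment_type")  # axis labels, as asked in the PS
ax.set_ylabel("AVG")
```

In the per-date loop from the question, the same call would go on axes[i], with avg and std built from that date's rows.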
