python: cumulative density plot


I have the following dataframe:
df =
Time_to_event event
0 0 days 443
1 1 days 226
2 2 days 162
3 3 days 72
4 4 days 55
5 5 days 30
6 6 days 36
7 7 days 18
8 8 days 15
9 9 days 14
10 10 days 21
11 11 days 13
12 12 days 10
13 13 days 10
14 14 days 8
I want to produce a cumulative plot of the events per day, i.e. a running sum. For example: 0 days = 443, 1 days = 443 + 226 = 669, etc.
I am currently trying this code:
stat = "count" # or proportion
sns.histplot(df, stat=stat, cumulative=True, alpha=.4)
but it produces a pretty terrible plot.
It would also be great if I could get a line instead of bars!

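You can try a combination of pandas.Series.cumsum and seaborn.lineplot:
import matplotlib.pyplot as plt
import seaborn as sns
# running total of events: 443, 669, 831, ...
df["cumsum"] = df["event"].cumsum()
plt.figure(figsize=(6, 4))
sns.lineplot(x="Time_to_event", y="cumsum", data=df)
plt.show()
Output: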
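A self-contained sketch of the same idea, assuming Time_to_event is a timedelta column (as the "0 days" display suggests) and using plain matplotlib instead of seaborn. It draws the running total with ax.step, which gives a line rather than bars, and also divides by the total to get the "proportion" variant the question mentions. As another option, seaborn's histplot accepts a weights= argument, so each row can count `event` times instead of once.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Rebuild the question's data; Time_to_event is assumed to be a timedelta column
df = pd.DataFrame({
    "Time_to_event": pd.to_timedelta(range(15), unit="D"),
    "event": [443, 226, 162, 72, 55, 30, 36, 18, 15, 14, 21, 13, 10, 10, 8],
})

days = df["Time_to_event"].dt.days          # timedelta -> integer day number
cum_count = df["event"].cumsum()             # running total of events
cum_prop = cum_count / df["event"].sum()     # same curve scaled to 0..1

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.step(days, cum_count, where="post")       # a line, not bars
ax1.set(xlabel="days", ylabel="cumulative count")
ax2.step(days, cum_prop, where="post")
ax2.set(xlabel="days", ylabel="cumulative proportion")
fig.tight_layout()
```

Using where="post" keeps each day's total flat until the next day, which matches how a running count behaves between discrete days.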

I think the values you want to plot are:
xvalues=df["Time_to_event"]
yvalues=df["event"].cumsum()
The code could look like this:
import pandas as pd
import matplotlib.pyplot as plt
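# assumes test.txt holds the Time_to_event and event columns shown above
df = pd.read_csv("test.txt")
print(df.columns)
print(df)
# bar height = running total of events up to each day
plt.bar(df["Time_to_event"], df["event"].cumsum())
# replace plt.bar with plt.plot to get a line instead of bars
plt.show()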

Related

Add x-axis to matplotlib with multiple y-axis line chart

How do I add the x-axis (Month) to a simple Matplotlib line chart?
My dataset:
Month Views CMA30
0 11 24662 24662.000000
1 11 2420 13541.000000
2 11 11318 12800.000000
3 11 8529 11732.250000
4 10 78861 25158.000000
5 10 1281 21178.500000
6 10 22701 21396.000000
7 10 17088 20857.500000
This is my code:
df[['Views', 'CMA30']].plot(label='Views', figsize=(5, 5))
This gives me Views and CMA30 on the y-axis. How do I add Month (1-12) to the x-axis?

If you want to average the values per month, try groupby/mean:
df.groupby('Month')[['Views','CMA30']].mean().plot(label='Views', figsize=(5, 5))
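A sketch that rebuilds the sample rows shown above, averages them per month and then reindexes to months 1 to 12, so the x-axis gets a slot for every month even though only months 10 and 11 have data here. The reindex step and the explicit tick settings are assumptions about what the asker wants, not part of the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# The sample rows from the question
df = pd.DataFrame({
    "Month": [11, 11, 11, 11, 10, 10, 10, 10],
    "Views": [24662, 2420, 11318, 8529, 78861, 1281, 22701, 17088],
    "CMA30": [24662.0, 13541.0, 12800.0, 11732.25,
              25158.0, 21178.5, 21396.0, 20857.5],
})

# One row per month with the mean of each column; Month becomes the index
monthly = df.groupby("Month")[["Views", "CMA30"]].mean()
# Give every month 1-12 a slot; months without data become NaN
monthly = monthly.reindex(range(1, 13))

# The index (Month) is used as the x-axis automatically
ax = monthly.plot(figsize=(5, 5), marker="o")
ax.set_xticks(range(1, 13))
ax.set_xlabel("Month")
```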

How to add up more data in an existing plotly graph?

I have successfully plotted the data below with Plotly from an Excel file.
Here is my code:
import pandas as pd
import plotly.express as px
file_loc1 = "AgeGroupData_time_to_treatment.xlsx"
# read into the same DataFrame name that is used below
df_centroid_Coord = pd.read_excel(file_loc1, index_col=None, na_values=['NA'], usecols="C:D,AB")
df_centroid_Coord.head()
df_centroid_Coord['Ambulance_Treatment_Time'] = df_centroid_Coord['Base_TT']
fig = px.scatter(df_centroid_Coord, x="x", y="y",
title="Southern Region Centroids",
color='Ambulance_Treatment_Time',
hover_name="KnNamn",
hover_data= ['Ambulance_Treatment_Time', "TotPop"],
log_x=True, size_max=60,
color_continuous_scale='Reds', range_color=(0.5,2), width=1250, height=1000)
# symbol 1 is "square" in Plotly's marker symbol list
fig.update_traces(marker={'size': 8, 'symbol': 1})
fig.update_layout(paper_bgcolor="LightSteelBlue")
fig.show()
The shapes of the plotted data points are square.
Here is the output of my code:
Now I want to plot more data points, as circles or any other shape, on the same Plotly graph by reading another Excel file. Please have a look at the data below.
How can I add the new data to an existing graph in Plotly?
Map data with total population and treatment time (Base_TT):
ID KnNamn x y TotPop Base_TT
1 2 Växjö 14.662290 57.027520 9 1.599971
2 3 Bromölla 14.494072 56.065635 264 1.307165
3 4 Trelleborg 13.219968 55.478675 40 1.411554
4 5 Tomelilla 14.005013 55.721209 6 1.968138
5 6 Halmstad 12.737361 56.710973 386 1.309849
6 7 Alvesta 14.566685 56.748729 47 1.719117
7 8 Laholm 13.241388 56.413591 0 2.000620
8 9 Tingsryd 14.943081 56.542837 16 1.668725
9 10 Sölvesborg 14.574474 56.056953 1147 1.266862
10 11 Halmstad 13.068009 56.635666 38 1.589239
11 12 Tingsryd 14.699642 56.479597 3 1.960050
12 13 Vellinge 13.029769 55.484749 61 1.254957
13 14 Örkelljunga 13.169010 56.232819 12 1.429789
14 15 Svalöv 13.059068 55.853696 26 1.553722
15 16 Sjöbo 13.738205 55.601936 6 1.326429
16 17 Hässleholm 13.729872 56.347672 13 1.709021
17 18 Olofström 14.588037 56.290604 6 1.444833
18 19 Eslöv 13.168712 55.900311 3 1.527547
19 20 Ronneby 15.024222 56.273317 3 1.692005
20 21 Ängelholm 12.910101 56.246689 19 1.090544
Ambulance Data:
ID Ambulance station name Longtitude Latitude
0 1 Älmhult 14.128734 56.547992
1 2 Ängelholm 12.870739 56.242114
2 3 Alvesta 14.549503 56.920740
3 4 Östra Ljungby 13.057450 56.188099
4 5 Broby 14.080958 56.254481
5 6 Bromölla 14.466869 56.072272
6 7 Förslöv 12.814913 56.350098
7 9 Hässleholm 13.778234 56.161536
8 10 Höganäs 12.556995 56.206016
9 11 Hörby 13.643265 55.849811
10 12 Halmstad, Väster 12.819960 56.674306
11 13 Halmstad, Öster 12.882289 56.676871
12 14 Helsingborg 12.738642 56.084708
13 15 Hyltebruk 13.238277 56.993058
14 16 Karlshamn 14.854022 56.186596
15 17 Karlskrona 15.606300 56.183054
16 18 Kristianstad 14.171371 56.031201
17 20 Löddeköpinge 12.995037 55.766946
18 21 Laholm 13.033763 56.498955
19 22 Landskrona 12.867245 55.872659
20 23 Lenhovda 15.283913 57.001953
21 24 Lessebo 15.267357 56.756860
22 25 Ljungby 13.935399 56.835023
23 26 Lund 13.226607 55.695212
24 27 Markaryd 13.591491 56.452057
25 28 Olofström 14.545848 56.272221
26 29 Osby 13.983674 56.384833
27 30 Perstorp 13.388304 56.130752
28 31 Ronneby 15.280554 56.211863
29 32 Sölvesborg 14.570503 56.052113
30 33 Simrishamn 14.338632 55.552765
Merged Dataset for plotting
KnNamn x y TotPop Base_TT Ambulance station name Longtitude Latitude
Växjö 14.66229 57.02752 9 1.599971 Ängelholm 12.87074 56.24211
Bromölla 14.49407 56.06564 264 1.307165 Alvesta 14.5495 56.92074
Trelleborg 13.21997 55.47868 40 1.411554 Östra Ljungby 13.05745 56.1881
Tomelilla 14.00501 55.72121 6 1.968138 Broby 14.08096 56.25448
Halmstad 12.73736 56.71097 386 1.309849
Alvesta 14.56669 56.74873 47 1.719117
Laholm 13.24139 56.41359 0 2.00062
Tingsryd 14.94308 56.54284 16 1.668725

The station data holds the same kind of values (coordinates) under different column names, so renaming its columns to match the centroid data is enough for charting.
Build a graph_objects figure that reuses the trace data created with plotly.express. First I added the chart you already had, then a second trace with the stations' longitude and latitude. Station names and locations are drawn as scatter markers with text labels (mode='markers+text').
import plotly.express as px
import plotly.graph_objects as go
# give the station coordinates the same column names as the centroid data
df_station.rename(columns={'Longtitude':'x', 'Latitude':'y'}, inplace=True)
df_centroid_Coord['Ambulance_Treatment_Time'] = df_centroid_Coord['Base_TT']
sca = px.scatter(df_centroid_Coord, x="x", y="y",
title="Southern Region Centroids",
color='Ambulance_Treatment_Time',
hover_name="KnNamn",
#hover_data= ['Ambulance_Treatment_Time', "TotPop"],
log_x=True,
size_max=60,
color_continuous_scale='Reds',
range_color=(0.5,2),
)
sca.update_traces(marker={'size': 8, 'symbol': 1})
# start from the px layout so the title, log x-axis and Reds color axis are kept
fig = go.Figure(layout=sca.layout)
fig.add_trace(go.Scatter(sca.data[0]))
fig.add_trace(go.Scatter(x=df_station['x'],
y=df_station['y'],
mode='markers+text',
text=df_station['Ambulance station name'],
textposition='top center',
showlegend=False,
marker=dict(
size=5,
symbol=2,
color='blue'
)
)
)
fig.update_layout(width=625, height=500, paper_bgcolor="LightSteelBlue")
fig.show()

dates.YearLocator() does not show years

Unfortunately, I cannot upload my dataset, but here is what it looks like:
UMTMVS month
DATE
1992-01-01 209438.0 1
1992-02-01 232679.0 2
1992-03-01 249673.0 3
1992-04-01 239666.0 4
1992-05-01 243231.0 5
1992-06-01 262854.0 6
1992-07-01 222832.0 7
1992-08-01 240299.0 8
1992-09-01 260216.0 9
1992-10-01 252272.0 10
1992-11-01 245261.0 11
1992-12-01 245603.0 12
1993-01-01 223258.0 1
1993-02-01 246941.0 2
1993-03-01 264886.0 3
1993-04-01 249181.0 4
1993-05-01 250870.0 5
1993-06-01 271047.0 6
1993-07-01 224077.0 7
1993-08-01 248963.0 8
1993-09-01 269227.0 9
1993-10-01 263075.0 10
1993-11-01 256142.0 11
1993-12-01 252830.0 12
1994-01-01 234097.0 1
1994-02-01 259041.0 2
1994-03-01 277243.0 3
1994-04-01 261755.0 4
1994-05-01 267573.0 5
1994-06-01 287336.0 6
1994-07-01 239931.0 7
1994-08-01 276947.0 8
1994-09-01 291357.0 9
1994-10-01 282489.0 10
1994-11-01 280455.0 11
1994-12-01 279888.0 12
1995-01-01 260175.0 1
1995-02-01 286290.0 2
1995-03-01 303201.0 3
1995-04-01 283129.0 4
1995-05-01 289257.0 5
1995-06-01 310201.0 6
1995-07-01 255163.0 7
1995-08-01 293605.0 8
1995-09-01 313228.0 9
1995-10-01 301301.0 10
1995-11-01 293164.0 11
1995-12-01 290963.0 12
1996-01-01 263041.0 1
1996-02-01 290317.0 2
I want a major tick for each year, so I ran the following code:
ax = df.UMTMVS.plot(figsize=(12, 5))
ax.xaxis.set_major_locator(dates.YearLocator())
but it simply produces the following figure without any year ticks at all.
Why does the locator fail to mark the years?

Try applying set_major_locator() to the axis before calling df.plot(), like this:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import dates
# reading your sample data into dataframe
df = pd.read_clipboard()
# the index must hold datetimes, not strings
df.index = pd.to_datetime(df.index)
fig, ax = plt.subplots(1, 1, figsize=(12, 5))
# set the locator before df.plot()
ax.xaxis.set_major_locator(dates.YearLocator())
df.UMTMVS.plot(ax=ax)
Result:
A slightly different result can be achieved by changing the last part of the code above to:
fig, ax = plt.subplots(1,1,figsize=(12, 5))
ax.plot(df.UMTMVS)
ax.xaxis.set_major_locator(dates.YearLocator())
plt.xlabel('DATE')
plt.show()
Alternative result (note the "padding" and the loss of minor ticks):
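A different fix, sketched under the assumption that the index is a regular monthly DatetimeIndex like the one shown. As far as I understand, pandas plots such a series on its own period-based x-axis, so matplotlib's date locators do not line up with its x values. Passing x_compat=True to plot() makes pandas use ordinary matplotlib date units instead, after which YearLocator behaves as expected. Only the first 25 rows of the sample data are used.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# 1992-01 .. 1994-01 from the question's sample
values = [209438.0, 232679.0, 249673.0, 239666.0, 243231.0, 262854.0,
          222832.0, 240299.0, 260216.0, 252272.0, 245261.0, 245603.0,
          223258.0, 246941.0, 264886.0, 249181.0, 250870.0, 271047.0,
          224077.0, 248963.0, 269227.0, 263075.0, 256142.0, 252830.0,
          234097.0]
index = pd.date_range("1992-01-01", periods=len(values), freq="MS", name="DATE")
df = pd.DataFrame({"UMTMVS": values}, index=index)

fig, ax = plt.subplots(figsize=(12, 5))
# x_compat=True: plot on matplotlib date units rather than pandas period units
df["UMTMVS"].plot(ax=ax, x_compat=True)
ax.xaxis.set_major_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))

# Convert the tick positions back to datetimes to inspect them
ticks = [mdates.num2date(t) for t in ax.get_xticks()]
```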

Is it possible to generate a clock chart using Plotly?

I'm developing a data visualization project and came across the report generated by Last.FM, which uses a clock chart to show the distribution of records by hour.
The chart in question is this:
It is an interactive chart, so I tried to replicate it with the Plotly library, but without success.
Is there any way to replicate this in Plotly? Here is the data I need to represent:
listeningHour = df.hour.value_counts().rename_axis('hour').reset_index(name='counts')
listeningHour
hour counts
0 17 16874
1 18 16703
2 16 14741
3 19 14525
4 23 14440
5 22 13455
6 20 13119
7 21 12766
8 14 11605
9 13 11575
10 15 11491
11 0 10220
12 12 7793
13 1 6057
14 9 3774
15 11 3476
16 10 1674
17 8 1626
18 2 1519
19 3 588
20 6 500
21 7 163
22 4 157
23 5 26

The closest chart type Plotly provides is the polar bar chart (go.Barpolar), so I wrote the code below using it with your data. As far as I could find, there does not seem to be a way to place the tick labels inside the doughnut hole. The key point is to make the angular axis start at 0:00 and run clockwise. The clock labels are a list of 24 tick positions that show the hour every 6 hours and an empty string elsewhere. Each bar is centred half a slot (7.5°) after a grid line, so the angular grid lines fall on the bar edges.
import numpy as np
import plotly.graph_objects as go
# value_counts() sorts by frequency; sort by hour so that bar i is hour i
df = listeningHour.sort_values('hour')
r = df['counts'].tolist()
# 24 bar centres: 7.5, 22.5, ..., 352.5 (one 15-degree slot per hour)
theta = np.arange(7.5, 360, 15)
width = [15]*24
# label only 0, 6, 12 and 18; the other ticks stay blank
ticktexts = [rf'$\large{i}$' if i % 6 == 0 else '' for i in np.arange(24)]
fig = go.Figure(go.Barpolar(
r=r,
theta=theta,
width=width,
marker_color=df['counts'],
marker_colorscale='Blues',
marker_line_color="white",
marker_line_width=2,
opacity=0.8
))
fig.update_layout(
template=None,
polar=dict(
hole=0.4,
bgcolor='rgb(223, 223,223)',
radialaxis=dict(
showticklabels=False,
ticks='',
linewidth=2,
linecolor='white',
showgrid=False,
),
angularaxis=dict(
tickvals=np.arange(0,360,15),
ticktext=ticktexts,
showline=True,
# clockwise polar axes start at the top (12 o'clock) by default
direction='clockwise',
period=24,
linecolor='white',
gridcolor='white',
showticklabels=True,
ticks=''
)
)
)
fig.show()
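A small pandas/NumPy sketch of the data preparation for the polar bar chart, using the counts from the question. value_counts() returns hours ordered by frequency, so the counts have to be put back in hour order before they are paired with angles. The reindex(range(24), fill_value=0) step is an extra safeguard (an assumption, not part of the answer) for hours that never occur. The resulting by_hour, theta and labels can be passed to go.Barpolar as r, theta and the angular ticktext.

```python
import numpy as np
import pandas as pd

# Counts per hour from the question, in the order value_counts() returned them
listeningHour = pd.DataFrame({
    "hour": [17, 18, 16, 19, 23, 22, 20, 21, 14, 13, 15, 0,
             12, 1, 9, 11, 10, 8, 2, 3, 6, 7, 4, 5],
    "counts": [16874, 16703, 14741, 14525, 14440, 13455, 13119, 12766,
               11605, 11575, 11491, 10220, 7793, 6057, 3774, 3476,
               1674, 1626, 1519, 588, 500, 163, 157, 26],
})

# Clock order (hour 0..23); any hour missing from the data gets a count of 0
by_hour = listeningHour.set_index("hour")["counts"].reindex(range(24), fill_value=0)

slot = 360 / 24                            # 15 degrees per hour
theta = np.arange(24) * slot + slot / 2    # bar centres between grid lines
labels = [str(h) if h % 6 == 0 else "" for h in range(24)]
```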

Plot histogram using two columns (values, counts) in python dataframe

I have a dataframe with several pairs of columns: one column holds the values and the adjacent column holds the corresponding counts. I want to plot a histogram using the values as the x variable and the counts as the frequencies.
For example, I have the following columns:
Age Counts
60 1204
45 700
21 400
. .
. .
34 56
10 150
I want my code to bin the Age values into ten-year intervals between the minimum and maximum values, add up the frequencies from the Counts column for each interval, and then plot a histogram. Is there a way to do this using matplotlib?
I have tried the following, without success:
patient_dets.plot(x='PatientAge', y='PatientAgecounts', kind='hist')
(patient_dets is the dataframe with 'PatientAge' and 'PatientAgecounts' as columns)

I think you need Series.plot.bar:
patient_dets.set_index('PatientAge')['PatientAgecounts'].plot.bar()
If you need bins, one possible solution uses pd.cut:
import numpy as np
import pandas as pd
# helper df with the min and max age of each group
df1 = pd.DataFrame({'G':['14 yo and younger','15-19','20-24','25-29','30-34',
'35-39','40-44','45-49','50-54','55-59','60-64','65+'],
'Min':[0, 15,20,25,30,35,40,45,50,55,60,65],
'Max':[14,19,24,29,34,39,44,49,54,59,64,120]})
print (df1)
G Max Min
0 14 yo and younger 14 0
1 15-19 19 15
2 20-24 24 20
3 25-29 29 25
4 30-34 34 30
5 35-39 39 35
6 40-44 44 40
7 45-49 49 45
8 50-54 54 50
9 55-59 59 55
10 60-64 64 60
11 65+ 120 65
cutoff = np.hstack([np.array(df1.Min[0]), df1.Max.values])
labels = df1.G.values
patient_dets['Groups'] = pd.cut(patient_dets.PatientAge, bins=cutoff, labels=labels, right=True, include_lowest=True)
print (patient_dets)
PatientAge PatientAgecounts Groups
0 60 1204 60-64
1 45 700 45-49
2 21 400 20-24
3 34 56 30-34
4 10 150 14 yo and younger
# sum the counts within each age group (observed=False also keeps empty groups)
patient_dets.groupby('Groups', observed=False)['PatientAgecounts'].sum().plot.bar()

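You can use pd.cut() to bin your data and then plot the result with plot.bar():
import numpy as np
import pandas as pd
nBins = 10
# nBins + 1 equally spaced edges give nBins intervals between min and max
my_bins = np.linspace(patient_dets.Age.min(), patient_dets.Age.max(), nBins + 1)
binned = pd.cut(patient_dets.Age, bins=my_bins, include_lowest=True)
patient_dets.groupby(binned, observed=False)['Counts'].sum().plot.bar()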
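Another option, sketched with only the five rows shown in the question (the elided rows are left out): matplotlib's hist accepts a weights= argument, so each Age value is counted Counts times instead of once, which is exactly a histogram of values with frequencies. The bin edges are ten-year intervals rounded outwards from the minimum and maximum age.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Only the rows shown in the question
patient_dets = pd.DataFrame({"Age": [60, 45, 21, 34, 10],
                             "Counts": [1204, 700, 400, 56, 150]})

# Ten-year edges: round min down and max up to multiples of 10 -> 10, 20, ..., 70
lo = patient_dets["Age"].min() // 10 * 10
hi = patient_dets["Age"].max() // 10 * 10 + 10
edges = np.arange(lo, hi + 1, 10)

# weights= makes each Age value contribute its Counts to its bin
n, bins, patches = plt.hist(patient_dets["Age"], bins=edges,
                            weights=patient_dets["Counts"], edgecolor="black")
plt.xlabel("Age")
plt.ylabel("Count")
```

The bins are half-open [a, b) except the last one, which also includes its right edge, so age 60 lands in the 60-70 bin.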
