How do I add the x-axis(Month) to a simple Matplotlib
My Dataset:
Month Views CMA30
0 11 24662 24662.000000
1 11 2420 13541.000000
2 11 11318 12800.000000
3 11 8529 11732.250000
4 10 78861 25158.000000
5 10 1281 21178.500000
6 10 22701 21396.000000
7 10 17088 20857.500000
This is my code:
df[['Views', 'CMA30']].plot(label='Views', figsize=(5, 5))
This is giving me Views and CMA30 on the y-axis. How do I add Month(1-12) on the x-axis?
If you average the values per month, then try groupby/mean:
df.groupby('Month')[['Views','CMA30']].mean().plot(label='Views', figsize=(5, 5))
Related
I have the following dataframe:
df =
Time_to_event event
0 0 days 443
1 1 days 226
2 2 days 162
3 3 days 72
4 4 days 55
5 5 days 30
6 6 days 36
7 7 days 18
8 8 days 15
9 9 days 14
10 10 days 21
11 11 days 13
12 12 days 10
13 13 days 10
14 14 days 8
I want to produce a cumulative density plot of the sum of the events per days. For example 0 days 443, 1 days = 443 + 226 etc.
I am currently trying this code:
stat = "count" # or proportion
sns.histplot(df, stat=stat, cumulative=True, alpha=.4)
but I come up with a pretty terrible plot:
If I could also come up with a line instead of bars that would be awesome!
You can try a combo of pandas.Series.cumsum and seaborn.lineplot :
df["cumsum"] = df["event"].cumsum()
plt.figure(figsize=(6,4))
sns.lineplot(x="Time_to_event", y="cumsum", data=df);
Output :
I think what you are looking for your plot values is:
xvalues=df["Time_to_event"]
yvalues=df["event"].cumsum()
The code could look like this:
import pandas as pd
import matplotlib.pyplot as plt
df=pd.read_csv("test.txt")
print(df.columns)
print(df)
plt.bar(df["Time_to_event"],df["event"].cumsum())
# replace plt.bar with plt.plot for a plotted diagram
plt.show()
I unfortunately cannot upload my dataset but here is how my dataset looks like:
UMTMVS month
DATE
1992-01-01 209438.0 1
1992-02-01 232679.0 2
1992-03-01 249673.0 3
1992-04-01 239666.0 4
1992-05-01 243231.0 5
1992-06-01 262854.0 6
1992-07-01 222832.0 7
1992-08-01 240299.0 8
1992-09-01 260216.0 9
1992-10-01 252272.0 10
1992-11-01 245261.0 11
1992-12-01 245603.0 12
1993-01-01 223258.0 1
1993-02-01 246941.0 2
1993-03-01 264886.0 3
1993-04-01 249181.0 4
1993-05-01 250870.0 5
1993-06-01 271047.0 6
1993-07-01 224077.0 7
1993-08-01 248963.0 8
1993-09-01 269227.0 9
1993-10-01 263075.0 10
1993-11-01 256142.0 11
1993-12-01 252830.0 12
1994-01-01 234097.0 1
1994-02-01 259041.0 2
1994-03-01 277243.0 3
1994-04-01 261755.0 4
1994-05-01 267573.0 5
1994-06-01 287336.0 6
1994-07-01 239931.0 7
1994-08-01 276947.0 8
1994-09-01 291357.0 9
1994-10-01 282489.0 10
1994-11-01 280455.0 11
1994-12-01 279888.0 12
1995-01-01 260175.0 1
1995-02-01 286290.0 2
1995-03-01 303201.0 3
1995-04-01 283129.0 4
1995-05-01 289257.0 5
1995-06-01 310201.0 6
1995-07-01 255163.0 7
1995-08-01 293605.0 8
1995-09-01 313228.0 9
1995-10-01 301301.0 10
1995-11-01 293164.0 11
1995-12-01 290963.0 12
1996-01-01 263041.0 1
1996-02-01 290317.0 2
I want to set a locator for each year and ran the following code
ax = df.UMTMVS.plot(figsize=(12, 5))
ax.xaxis.set_major_locator(dates.YearLocator())
but it simply gives the following figure without any locator at all
Why does the locator fail to point out the years?
Try applying set_major_locator() to the axis before df.plot(). Like this:
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import dates
# reading your sample data into dataframe
df = pd.read_clipboard()
# dates should be dates (datetime), not strings
df.index = df.index.to_series().apply(pd.to_datetime)
fig, ax = plt.subplots(1,1,figsize=(12, 5))
# set locator before df.plot()
ax.xaxis.set_major_locator(dates.YearLocator())
df.UMTMVS.plot()
Result:
Slightly different result could be achieved with last bit of code above modified to the following:
fig, ax = plt.subplots(1,1,figsize=(12, 5))
ax.plot(df.UMTMVS)
ax.xaxis.set_major_locator(dates.YearLocator())
plt.xlabel('DATE')
plt.show()
Result_alt (note the "padding" and loss of minor ticks):
I'm developing a dataviz project and I came across the report generated by Last.FM, in which there is a clock chart to represent the distribution of records by hours.
The chart in question is this:
It is an interactive graph, so I tried to use the Plotly library to try to replicate the chart, but without success.
Is there any way to replicate this in Plotly? Here are the data I need to represent
listeningHour = df.hour.value_counts().rename_axis('hour').reset_index(name='counts')
listeningHour
hour counts
0 17 16874
1 18 16703
2 16 14741
3 19 14525
4 23 14440
5 22 13455
6 20 13119
7 21 12766
8 14 11605
9 13 11575
10 15 11491
11 0 10220
12 12 7793
13 1 6057
14 9 3774
15 11 3476
16 10 1674
17 8 1626
18 2 1519
19 3 588
20 6 500
21 7 163
22 4 157
23 5 26
The graph provided by Plotly is a polar bar chart. I have written a code using it with your data. At the time of my research, there does not seem to be a way to place the ticks inside the doughnut. The point of the code is to start at 0:00 in the direction of the angle axis. The clock display is a list of 24 tick places with an empty string and a string every 6 hours. The angle grid is aligned with the center of the bar chart.
import plotly.graph_objects as go
r = df['counts'].tolist()
theta = np.arange(7.5,368,15)
width = [15]*24
ticktexts = [f'$\large{i}$' if i % 6 == 0 else '' for i in np.arange(24)]
fig = go.Figure(go.Barpolar(
r=r,
theta=theta,
width=width,
marker_color=df['counts'],
marker_colorscale='Blues',
marker_line_color="white",
marker_line_width=2,
opacity=0.8
))
fig.update_layout(
template=None,
polar=dict(
hole=0.4,
bgcolor='rgb(223, 223,223)',
radialaxis=dict(
showticklabels=False,
ticks='',
linewidth=2,
linecolor='white',
showgrid=False,
),
angularaxis=dict(
tickvals=np.arange(0,360,15),
ticktext=ticktexts,
showline=True,
direction='clockwise',
period=24,
linecolor='white',
gridcolor='white',
showticklabels=True,
ticks=''
)
)
)
fig.show()
I have a dataframe with 12 columns and 30 rows (only the first 5 rows are shown here):
0 1 2 3 4 5 6 7 8 9 10 11
0
10 0.420000 0.724000 0.552000 0.316000 0.176000 0.320000 0.228000 0.552000 0.476000 0.468000 0.560000 0.332000
20 0.387097 0.701613 0.516129 0.338710 0.177419 0.346774 0.217742 0.443548 0.483871 0.435484 0.516129 0.330645
30 0.353659 0.731707 0.365854 0.280488 0.158537 0.243902 0.231707 0.451220 0.524390 0.414634 0.451220 0.329268
40 0.377049 0.557377 0.311475 0.213115 0.213115 0.262295 0.262295 0.459016 0.540984 0.475410 0.377049 0.262295
50 0.285714 0.673469 0.183673 0.183673 0.163265 0.285714 0.204082 0.387755 0.489796 0.367347 0.306122 0.244898
I would like to plot a dot plot with rows indices as the x-axis columns values as the y-axis (ie. 12 dots on each x).
I have tried the following:
df.plot()
and I get this plot
I would like to show only the markers (dots) and not the lines
I tried df.plot(linestyle='None') but then I get an empty plot.
How can I change my code to show the dots/markers and hide the lines?
pandas.DataFrame.plot passes **kwargs to matplotlib's .plot method. Thus you can use any of the matplotlib.lines.Line2D properties:
df.plot(ls='', marker='.')
How to plot the set of numbers like (first column is x-axis, second column is y-axis):
1 3.4335e-14
2 5.8945e-28
3 6.7462e-42
4 5.7908e-56
5 3.9765e-70
6 2.2756e-84
7 1.1162e-98
8 4.7904e-113
9 1.8275e-127
10 6.2749e-142
11 1.9586e-156
12 5.6041e-171
13 1.4801e-185
14 3.6300e-200
15 8.3091e-215
16 1.7831e-229
17 3.6013e-244
18 6.8694e-259
19 1.2414e-273
For now I get:
And I can't figure out how to make it properly. It means no flat line from 2 to the end and correct y-axis values. I read these values from the file with:
x_values.append(line.split(' ')[0])
y_values.append(float(line.split(' ')[1]))
You may wish to switch the yscale to "log" scale, e.g.:
import matplotlib.ticker as mtick
_,ax = plt.subplots()
plt.plot(x,y)
plt.xticks(x)
plt.yscale("log")
ax.yaxis.set_major_formatter(mtick.FormatStrFormatter('%.2e'));