I've a dataframe like this:
DATE
VALUE
TYPE
2021-01-11
57
A
2021-02-11
34
B
2021-03-11
43
A
2021-04-11
15
B
...
My question is how I could plot a bar graph, the mean monthly ordered by date of course and grouped by 'TYPE'
I'm using Pandas with this extract of code:
df = df.set_index('DATE')
df.index = pd.to_datetime(df.index)
df = df.resample('M').mean()
df.plot(kind='bar',stacked=True)
I want to draw a stacked bar plot but I don't know how...
Not sure if I understand correctly but if you want to stack values by types with date as x-axis, I would use pivot (do not set index first):
df = df.pivot('DATE', 'TYPE', 'VALUE')
df.plot(kind='bar', stacked=True, rot=0)
plt.show()
With the slightly edited table to show the stacking:
DATE
VALUE
TYPE
2021-01-11
57
A
2021-02-11
34
A
2021-02-11
12
B
2021-03-11
43
A
2021-04-11
15
B
You get the following:
Related
I have a dataframe that looks like this:
commits commitdates Age (in days) Year-Month server_version
0 97 2021-04-07 75 days 2021-04 v1
1 20 2021-05-31 43 days 2021-05 v3
2 54 2021-06-21 54 days 2021-06 v0.1
3 100 2021-06-18 75 days 2021-06 v2.1.0
4 12 2020-12-06 22 days 2020-12 Nan
I want to plot the Age(in days) which is of type timedelta64[ns] , by the number of commits of type int64 by the server_versions on the x axis.
I tried to do this using plotly, but the age values are a bit strange and don't seem correct, I just want them to be displayed starting from 0 to all the rest, like normal int values, I am not sure how can I fix this.
My code is this:
import plotly.express as px
fig = px.scatter(final_api, x="Version", y="Age (in days)", color="commits", width =1000, height = 800)
fig.show()
I am a bit new to plotly, any help will be appreciated.
All you have to do is to convert your timedelta64 to days and add days as suffix for the yaxis, I answer your question based on this random data:
import pandas as pd
import numpy as np
s = pd.Series(pd.timedelta_range(start='1 days', end='75 days'))
df = pd.DataFrame()
df['commits'] = np.random.randint(100, size=len(s))
df['Version'] = 'v'+df['commits'].astype(str)
df['Age (in days)'] = s
fig = px.scatter(df, x="Version", y=df['Age (in days)'].dt.days, color="commits", width =1000, height = 800)
fig.update_yaxes(ticksuffix = " days")
fig.show()
Output
I have a data frame like that (it's just the head) :
Timestamp Function_code Node_id Delta
0 2000-01-01 10:39:51.790683 Tx_PDO_2 54 551.0
1 2000-01-01 10:39:51.791650 Tx_PDO_2 54 601.0
2 2000-01-01 10:39:51.792564 Tx_PDO_3 54 545.0
3 2000-01-01 10:39:51.793511 Tx_PDO_3 54 564.0
There are only two types of Function_code : Tx_PDO_2 and Tx_PDO_3
I plot in two windows, a graph with Timestamp on the x-axis and Delta on the y-axis. One for Tx_PDO_2 and the other for Tx_PDO_3 :
delta_rx_tx_df.groupby("Function_code").plot(x="Timestamp", y="Delta", )
Now, I want to know which window corresponds to which Function_code
I tried to use title=delta_rx_tx_df.groupby("Function_code").groups but it did not work.
There may be a better way, but for starters, you can assign the titles to the plots after they are created:
plots = delta_rx_tx_df.groupby("Function_code").plot(x="Timestamp", y="Delta")
plots.reset_index()\
.apply(lambda x: x[0].set_title(x['Function_code']), axis=1)
I'm new to Python. I hope you can help me.
I have a dataframe with two columns. The first column is called dates and the second column is filled with numbers. The dataframe has 351 row.
dates numbers
01.03.2019 5
02.03.2019 8
...
20.02.2020 3
21.02.2020 2
I want the whole first column to be on the x axis from. I tried to plot it like this:
graph = FinalDataframe.plot(figsize=(12, 8))
graph.legend(loc='upper center', bbox_to_anchor=(0.5, -0.075), ncol=4)
graph.set_xticklabels(FinalDataframe['dates'])
plt.show()
But on the x axis are only the first few values from the column instead of the whole column. Furthermore, they are not correlated to the data from the second column.
Any suggestions?
Thank you in advance!
Your issue is that x ticks are generated automatically, and spaced out to be readable. However you the tell matplotlib to use all the labels. The simple fix is to tell him to use one tick label per entry, but that’s going to make your x-axis unreadable:
graph.set_xticks(range(len(FinalDataframe['dates'])))
Now you could space them out manually:
graph.set_xticks(range(0, len(FinalDataframe['dates']), 61))
graph.set_xticklabels(FinalDataframe['dates'][::61])
However the best result to plot dates on the x-axis is still to use pandas’ built-in date objects. We can do this with pd.to_datetime
This will also allow pandas to know where to place points on the x-axis, by specifying that you want the x-axis to be the dates. In that way, if dates are not sorted or missing, the gaps will be skipped properly, and points will be above the ordinate of the right date.
I’m first recreating a dataframe that looks like what you posted:
>>> df = pd.DataFrame({'dates': pd.date_range('20190301', '20200221', freq='D').strftime('%d.%m.%Y'), 'numbers': np.random.randint(0, 10, 358)})
>>> df
dates numbers
0 01.03.2019 2
1 02.03.2019 2
2 03.03.2019 5
3 04.03.2019 4
4 05.03.2019 3
.. ... ...
353 17.02.2020 2
354 18.02.2020 1
355 19.02.2020 2
356 20.02.2020 3
357 21.02.2020 1
(This should be the same as FinalDataFrame, or if your dates are the index, then it’s the same as FinalDataFrame.reset_index())
Now I’m converting the dates:
>>> df['dates'] = pd.to_datetime(df['dates'], format='%d.%m.%Y')
>>> df
dates numbers
0 2019-03-01 2
1 2019-03-02 2
2 2019-03-03 5
3 2019-03-04 4
4 2019-03-05 3
.. ... ...
353 2020-02-17 2
354 2020-02-18 1
355 2020-02-19 2
356 2020-02-20 3
357 2020-02-21 1
You can check your columns contain dates and not string representations of dates by checking their dtypes:
>>> df.dtypes
dates datetime64[ns]
numbers int64
Finally plotting:
>>> ax = df.plot(x='dates', y='numbers', figsize=(12, 8))
>>> ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.075), ncol=4)
<matplotlib.legend.Legend object at 0x7fc8c24fd4f0>
>>> plt.show()
Legends are taken care of automatically. This is what you get:
I have created a visualization utilizing the plotly library within Python. Everything looks fine, except the axis is starting with 2020 and then shows 2019. The axis should be the opposite of what is displayed.
Here is the data (df):
date percent type
3/1/2020 10 a
3/1/2020 0 b
4/1/2020 15 a
4/1/2020 60 b
1/1/2019 25 a
1/1/2019 1 b
2/1/2019 50 c
2/1/2019 20 d
This is what I am doing
import plotly.express as px
px.scatter(df, x = "date", y = "percent", color = "type", facet_col = "type")
How would I make it so that the dates are sorted correctly, earliest to latest? The dates are sorted within the raw data so why is it not reflecting this on the graph?
Any suggestion will be appreciated.
Here is the result:
It is plotting in the order of your df. If you want date order then sort so in date order.
df.sort_values('date', inplace=True)
A lot of other graphing utilities (Seaborn, etc) by default sort when plotting. Plotly Express does not do this.
Your date column seems to be a string. If you convert it to a datetime you don't have to sort your dataframe: plotly express will set the x-axis to datetime:
Working code example:
import pandas as pd
import plotly.express as px
from io import StringIO
text = """
date percent type
3/1/2020 10 a
3/1/2020 0 b
4/1/2020 15 a
4/1/2020 60 b
1/1/2019 25 a
1/1/2019 1 b
2/1/2019 50 c
2/1/2019 20 d
"""
df = pd.read_csv(StringIO(text), sep='\s+', header=0)
px.scatter(df, x="date", y="percent", color="type", facet_col="type")
I have a temperature file with many years temperature records, in a format as below:
2012-04-12,16:13:09,20.6
2012-04-12,17:13:09,20.9
2012-04-12,18:13:09,20.6
2007-05-12,19:13:09,5.4
2007-05-12,20:13:09,20.6
2007-05-12,20:13:09,20.6
2005-08-11,11:13:09,20.6
2005-08-11,11:13:09,17.5
2005-08-13,07:13:09,20.6
2006-04-13,01:13:09,20.6
Every year has different numbers, time of the records, so the pandas datetimeindices are all different.
I want to plot the different year's data in the same figure for comparing . The X-axis is Jan to Dec, the Y-axis is temperature. How should I go about doing this?
Try:
ax = df1.plot()
df2.plot(ax=ax)
If you a running Jupyter/Ipython notebook and having problems using;
ax = df1.plot()
df2.plot(ax=ax)
Run the command inside of the same cell!! It wont, for some reason, work when they are separated into sequential cells. For me at least.
Chang's answer shows how to plot a different DataFrame on the same axes.
In this case, all of the data is in the same dataframe, so it's better to use groupby and unstack.
Alternatively, pandas.DataFrame.pivot_table can be used.
dfp = df.pivot_table(index='Month', columns='Year', values='value', aggfunc='mean')
When using pandas.read_csv, names= creates column headers when there are none in the file. The 'date' column must be parsed into datetime64[ns] Dtype so the .dt extractor can be used to extract the month and year.
import pandas as pd
# given the data in a file as shown in the op
df = pd.read_csv('temp.csv', names=['date', 'time', 'value'], parse_dates=['date'])
# create additional month and year columns for convenience
df['Year'] = df.date.dt.year
df['Month'] = df.date.dt.month
# groupby the month a year and aggreate mean on the value column
dfg = df.groupby(['Month', 'Year'])['value'].mean().unstack()
# display(dfg)
Year 2005 2006 2007 2012
Month
4 NaN 20.6 NaN 20.7
5 NaN NaN 15.533333 NaN
8 19.566667 NaN NaN NaN
Now it's easy to plot each year as a separate line. The OP only has one observation for each year, so only a marker is displayed.
ax = dfg.plot(figsize=(9, 7), marker='.', xticks=dfg.index)
To do this for multiple dataframes, you can do a for loop over them:
fig = plt.figure(num=None, figsize=(10, 8))
ax = dict_of_dfs['FOO'].column.plot()
for BAR in dict_of_dfs.keys():
if BAR == 'FOO':
pass
else:
dict_of_dfs[BAR].column.plot(ax=ax)
This can also be implemented without the if condition:
fig, ax = plt.subplots()
for BAR in dict_of_dfs.keys():
dict_of_dfs[BAR].plot(ax=ax)
You can make use of the hue parameter in seaborn. For example:
import seaborn as sns
df = sns.load_dataset('flights')
year month passengers
0 1949 Jan 112
1 1949 Feb 118
2 1949 Mar 132
3 1949 Apr 129
4 1949 May 121
.. ... ... ...
139 1960 Aug 606
140 1960 Sep 508
141 1960 Oct 461
142 1960 Nov 390
143 1960 Dec 432
sns.lineplot(x='month', y='passengers', hue='year', data=df)