I'm new to Python. I hope you can help me.
I have a dataframe with two columns. The first column is called dates and the second column is filled with numbers. The dataframe has 351 rows.
dates numbers
01.03.2019 5
02.03.2019 8
...
20.02.2020 3
21.02.2020 2
I want the whole first column to be on the x-axis. I tried to plot it like this:
graph = FinalDataframe.plot(figsize=(12, 8))
graph.legend(loc='upper center', bbox_to_anchor=(0.5, -0.075), ncol=4)
graph.set_xticklabels(FinalDataframe['dates'])
plt.show()
But the x-axis only shows the first few values from the column instead of the whole column. Furthermore, the labels do not line up with the data from the second column.
Any suggestions?
Thank you in advance!
Your issue is that x ticks are generated automatically and spaced out to be readable. However, you then tell matplotlib to use all the labels, so only the first few labels end up on the handful of existing ticks. The simple fix is to tell it to use one tick per entry, but that’s going to make your x-axis unreadable:
graph.set_xticks(range(len(FinalDataframe['dates'])))
Now you could space them out manually:
graph.set_xticks(range(0, len(FinalDataframe['dates']), 61))
graph.set_xticklabels(FinalDataframe['dates'][::61])
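As a self-contained sketch of the manual spacing above: the dataframe here is a made-up stand-in for FinalDataframe, and the names step, positions and labels are only illustrative. The point it tries to show is that tick positions and tick labels should be built with the same step, so each label lands on its own row.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in for FinalDataframe (assumption): dates stored as strings, as in the question.
dates = pd.date_range("2019-03-01", "2020-02-21", freq="D").strftime("%d.%m.%Y")
df = pd.DataFrame({"dates": dates, "numbers": np.arange(len(dates)) % 10})

step = 61  # label every 61st row
positions = list(range(0, len(df), step))   # tick locations in row-number coordinates
labels = df["dates"].iloc[::step].tolist()  # the date string found at each of those rows

ax = df.plot(y="numbers", figsize=(12, 8))
ax.set_xticks(positions)     # set the tick locations first...
ax.set_xticklabels(labels)   # ...then one label per tick
# plt.show()  # uncomment in an interactive session (and drop the Agg line)
```

Setting the positions before the labels matters here: set_xticklabels assigns labels to whatever ticks currently exist, which is why the original attempt mislabelled the axis.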
However, the best way to plot dates on the x-axis is still to use pandas’ built-in date objects. We can do this with pd.to_datetime.
This will also allow pandas to know where to place points on the x-axis, once you specify that the x-axis should be the dates. That way, if dates are unsorted or missing, the gaps will be handled properly, and each point will sit above the abscissa of the right date.
I’m first recreating a dataframe that looks like what you posted:
>>> df = pd.DataFrame({'dates': pd.date_range('20190301', '20200221', freq='D').strftime('%d.%m.%Y'), 'numbers': np.random.randint(0, 10, 358)})
>>> df
dates numbers
0 01.03.2019 2
1 02.03.2019 2
2 03.03.2019 5
3 04.03.2019 4
4 05.03.2019 3
.. ... ...
353 17.02.2020 2
354 18.02.2020 1
355 19.02.2020 2
356 20.02.2020 3
357 21.02.2020 1
(This should be the same as FinalDataframe, or if your dates are the index, then it’s the same as FinalDataframe.reset_index())
Now I’m converting the dates:
>>> df['dates'] = pd.to_datetime(df['dates'], format='%d.%m.%Y')
>>> df
dates numbers
0 2019-03-01 2
1 2019-03-02 2
2 2019-03-03 5
3 2019-03-04 4
4 2019-03-05 3
.. ... ...
353 2020-02-17 2
354 2020-02-18 1
355 2020-02-19 2
356 2020-02-20 3
357 2020-02-21 1
You can check your columns contain dates and not string representations of dates by checking their dtypes:
>>> df.dtypes
dates datetime64[ns]
numbers int64
Finally plotting:
>>> ax = df.plot(x='dates', y='numbers', figsize=(12, 8))
>>> ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.075), ncol=4)
<matplotlib.legend.Legend object at 0x7fc8c24fd4f0>
>>> plt.show()
Legends are taken care of automatically. The result is a line plot with the dates along the x-axis (image not included).
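If you would rather see the tick labels in the original dd.mm.yyyy style, one possible approach is to plot with matplotlib directly and attach a locator and formatter from matplotlib.dates. This is only a sketch under assumptions: the data is randomly generated, and the two-month tick interval is an arbitrary choice. It uses ax.plot rather than df.plot, since pandas installs its own date tick handling for time series.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data with real datetime values, like df after pd.to_datetime.
rng = np.random.default_rng(0)
dates = pd.date_range("2019-03-01", "2020-02-21", freq="D")
df = pd.DataFrame({"dates": dates, "numbers": rng.integers(0, 10, len(dates))})

fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(df["dates"], df["numbers"], label="numbers")

# One tick every second month (arbitrary), labelled as day.month.year.
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=2))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d.%m.%Y"))
fig.autofmt_xdate()  # rotate the labels so they do not overlap
ax.legend()
# plt.show()  # uncomment in an interactive session (and drop the Agg line)
```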
I've got a df that has three columns, one of them has a repetitive pattern, the df looks like this
>>> df
date hour value
0 01/01/2022 1 0.267648
1 01/01/2022 2 1.564420
2 01/01/2022 ... 0.702019
3 01/01/2022 24 1.504663
4 01/02/2022 1 0.309097
5 01/02/2022 2 0.309097
6 01/02/2022 ... 0.309097
7 01/02/2022 24 0.309097
>>>
I want to make a heatmap with this, the x-axis would be the month, the y axis the hour of the day and the value would be the median of all the values in that specific hour from everyday in the month.
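import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# parse the date strings, then extract the month number
df.date = pd.to_datetime(df.date)
df['month'] = df.date.dt.month
# one cell per (hour, month) pair, holding the median of the values
pivot = df.pivot_table(columns='month', index='hour', values='value', aggfunc='median')
# descending sort puts hour 1 at the bottom of the y-axis
sns.heatmap(pivot.sort_index(ascending=False))
plt.show()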
Output: a Seaborn heatmap (image not included).
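To check what the pivot actually contains before drawing it, here is a small sketch on made-up data (two months of random hourly values; every name other than the column names is illustrative). It builds the same pivot table and compares one cell against a median computed by hand.

```python
import numpy as np
import pandas as pd

# Made-up data: 24 hourly values for every day of January and February 2022.
days = pd.date_range("2022-01-01", "2022-02-28", freq="D")
hours = np.arange(1, 25)
rng = np.random.default_rng(0)

df = pd.DataFrame({
    "date": days.repeat(len(hours)),    # each day repeated 24 times
    "hour": np.tile(hours, len(days)),  # 1..24 cycled once per day
})
df["value"] = rng.random(len(df))
df["month"] = df["date"].dt.month

# Rows are hours, columns are months, cells are medians over all days of that month.
pivot = df.pivot_table(columns="month", index="hour", values="value", aggfunc="median")

# Cross-check a single cell: hour 5 in January.
expected = df[(df["month"] == 1) & (df["hour"] == 5)]["value"].median()
print(pivot.loc[5, 1], expected)
```

The resulting pivot can be passed to sns.heatmap exactly as in the answer above.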
I would like to plot a graph with the most recent dates on the left instead of the right of the x-axis.
Is there a way to do this in pandas and matplotlib and still get the date axis?
Invert an axis in a matplotlib grafic
shows how to do this for the y-axis using invert_yaxis(). However, this is not available for xaxis.
Set xlim() from pyplot. Let's take this example:
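import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
# twelve monthly periods covering 2013
period = pd.period_range("1.1.2013","12.1.2013",freq="M")
data = np.arange(12)
s = pd.Series(data=data,index=period)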
#Output
2013-01 0
2013-02 1
2013-03 2
2013-04 3
2013-05 4
2013-06 5
2013-07 6
2013-08 7
2013-09 8
2013-10 9
2013-11 10
2013-12 11
Set first value of xlim to be last index of series and second value to be the first index, like this:
s.plot()
plt.xlim(s.index[-1],s.index[0])
plt.show()
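For what it is worth, matplotlib Axes objects also provide invert_xaxis(), the counterpart of invert_yaxis(), so the limits do not have to be swapped by hand. A minimal sketch, assuming a monthly DatetimeIndex instead of the period index used above:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up monthly series for 2013.
idx = pd.date_range("2013-01-01", periods=12, freq="MS")
s = pd.Series(np.arange(12), index=idx)

fig, ax = plt.subplots()
ax.plot(s.index, s.values)
ax.invert_xaxis()  # most recent dates now appear on the left
# plt.show()  # uncomment in an interactive session (and drop the Agg line)
```

The axis keeps its date ticks; only the direction changes.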
I've got two pandas series, one with a 7 day rolling mean for the entire year and another with monthly averages. I'm trying to plot them both on the same matplotlib figure, with the averages as a bar graph and the 7 day rolling mean as a line graph. Ideally, the line would be drawn on top of the bar graph.
The issue I'm having is that, with my current code, the bar graph is showing up without the line graph, but when I try plotting the line graph first I get a ValueError: ordinal must be >= 1.
Here's what the series look like:
These are the first 15 values of the 7 day rolling mean series; it has a date and a value for the entire year:
date
2016-01-01 NaN
2016-01-03 NaN
2016-01-04 NaN
2016-01-05 NaN
2016-01-06 NaN
2016-01-07 NaN
2016-01-08 0.088473
2016-01-09 0.099122
2016-01-10 0.086265
2016-01-11 0.084836
2016-01-12 0.076741
2016-01-13 0.070670
2016-01-14 0.079731
2016-01-15 0.079187
2016-01-16 0.076395
This is the entire monthly average series:
dt_month
2016-01-01 0.498323
2016-02-01 0.497795
2016-03-01 0.726562
2016-04-01 1.000000
2016-05-01 0.986411
2016-06-01 0.899849
2016-07-01 0.219171
2016-08-01 0.511247
2016-09-01 0.371673
2016-10-01 0.000000
2016-11-01 0.972478
2016-12-01 0.326921
Here's the code I'm using to try and plot them:
ax = series_one.plot(kind="bar", figsize=(20,2))
series_two.plot(ax=ax)
plt.show()
Here's the graph that generates:
Any help is hugely appreciated! Also, advice on formatting this question and creating code to make two series for a minimum working example would be awesome.
Thanks!!
The problem is that pandas bar plots are categorical (bars are placed at consecutive integer positions). Since in your case the two series have a different number of elements, plotting the line graph in categorical coordinates is not really an option. What remains is to plot the bar graph in numerical coordinates as well. This is not possible with pandas, but is the default behaviour with matplotlib.
Below I shift the monthly dates by 15 days to the middle of the month to have nicely centered bars.
import matplotlib.pyplot as plt
import numpy as np; np.random.seed(42)
import pandas as pd
t1 = pd.date_range("2018-01-01", "2018-12-31", freq="D")
s1 = pd.Series(np.cumsum(np.random.randn(len(t1)))+14, index=t1)
s1[:6] = np.nan
t2 = pd.date_range("2018-01-01", "2018-12-31", freq="MS")
s2 = pd.Series(np.random.rand(len(t2))*15+5, index=t2)
# shift monthly data to middle of month
s2.index += pd.Timedelta('15 days')
fig, ax = plt.subplots()
ax.bar(s2.index, s2.values, width=14, alpha=0.3)
ax.plot(s1.index, s1.values)
plt.show()
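On the request for code that creates two such series for a minimal working example: one option is to start from a single daily series and derive both from it. This is a sketch with made-up random data; the names daily, rolling_7d and monthly_avg are illustrative and not taken from the question.

```python
import numpy as np
import pandas as pd

# Made-up daily values for 2016.
rng = np.random.default_rng(42)
days = pd.date_range("2016-01-01", "2016-12-31", freq="D")
daily = pd.Series(rng.random(len(days)), index=days)

# 7 day rolling mean: the first six entries are NaN, as in the question.
rolling_7d = daily.rolling(window=7).mean()

# Monthly averages, indexed by the first day of each month.
monthly_avg = daily.resample("MS").mean()

print(rolling_7d.head(8))
print(monthly_avg)
```

These two series have the same shape as series_two and series_one in the question and can be fed into the ax.bar / ax.plot code above.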
The problem might be that the two series' indices are on very different scales. You can use ax.twiny to plot them:
ax = series_one.plot(kind="bar", figsize=(20,2))
ax_tw = ax.twiny()
series_two.plot(ax=ax_tw)
plt.show()
Output:
My df has a datetime64 index, and my columns are timedelta and float64. Below are 3 example rows of my df.
CTIL downtime ratio
quater
2015-04-01 4859 days 01:46:00 1699 days 17:20:00 0.349804
2015-07-01 4553 days 14:16:00 1862 days 03:27:00 0.408939
2015-10-01 5502 days 21:18:00 2442 days 20:15:00 0.443920
I would like to plot it all on one chart. CTIL and downtime should be area plots and ratio should be a line chart.
Currently I have 2 separate plots:
df_quater[['CTIL', 'downtime']].plot()
df_quater['ratio'].plot()
Question 1:
How can I make an area plot when the type of x is different from the type of y?
I tried this:
df_quater[['CTIL', 'downtime']].plot(kind='area')
It generates this error:
TypeError: Cannot cast ufunc greater_equal input (...) with casting rule 'same_kind'
Question 2:
Can my labels on y be in timedelta format? The current plot has numbers.
Question 3:
Can I combine these 2 plots into one? Label for CTIL and downtime should be on left and label for ratio should be on
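One possible way to approach this, offered as a sketch rather than a definitive answer: convert the timedelta columns to plain floats (days) so they can be drawn as filled areas, and put ratio on a second y-axis with twinx(). The dataframe below is rebuilt from the three example rows; measuring in days and using fill_between instead of a pandas area plot are assumptions of this sketch. The left axis is labelled in days here, not in a full timedelta format.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Rebuilt from the three example rows in the question.
df_quater = pd.DataFrame({
    "CTIL": pd.to_timedelta(["4859 days 01:46:00", "4553 days 14:16:00", "5502 days 21:18:00"]),
    "downtime": pd.to_timedelta(["1699 days 17:20:00", "1862 days 03:27:00", "2442 days 20:15:00"]),
    "ratio": [0.349804, 0.408939, 0.443920],
})
df_quater.index = pd.to_datetime(["2015-04-01", "2015-07-01", "2015-10-01"])

# Dividing a timedelta column by one day gives float days, which can be plotted.
one_day = pd.Timedelta(days=1)
ctil_days = df_quater["CTIL"] / one_day
downtime_days = df_quater["downtime"] / one_day

fig, ax = plt.subplots()
ax.fill_between(df_quater.index, ctil_days, alpha=0.3, label="CTIL")
ax.fill_between(df_quater.index, downtime_days, alpha=0.3, label="downtime")
ax.set_ylabel("days")  # left axis: CTIL and downtime
ax.legend(loc="upper left")

ax_ratio = ax.twinx()  # second y-axis sharing the same x-axis
ax_ratio.plot(df_quater.index, df_quater["ratio"], color="black", label="ratio")
ax_ratio.set_ylabel("ratio")  # right axis: ratio
ax_ratio.legend(loc="upper right")
# plt.show()  # uncomment in an interactive session (and drop the Agg line)
```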
So I have data like the below when I use this code:
>>datefreq = textdata[['date','smstext']]
>>date_freq = datefreq.groupby('date').agg(len)
>>date_freq
smstext_freq
date
2015-02-03 1
2015-02-04 1500
2015-02-05 13526
2015-02-03 54444
How can I convert this into a dataframe?
Expected output:
date_freq
date smstext_freq
0 2015-02-03 1
1 2015-02-04 1500
2 2015-02-05 13526
3 2015-02-03 54444
Please help me write the code for the above in Python, and please tell me how to draw a histogram of the converted data if you can.
It appears that date_freq is already a Pandas DataFrame object. However, the dates are set as the index; you would like the dates to appear as an ordinary column. There is a one-line solution to this: use the reset_index() method.
>>> date_freq.reset_index(inplace=True)
>>> print(date_freq)
date smstext_freq
0 2015-02-03 1
1 2015-02-04 1500
2 2015-02-05 13526
3 2015-02-06 54444
Now you also want a histogram. Call the Pandas plot() method on date_freq['smstext_freq'], and set kind='hist'. This will return a matplotlib Axes object, which you can fine-tune however you want. Here, I've just added a title and an x-axis label.
>>> ax = date_freq['smstext_freq'].plot(kind='hist')
>>> ax.set_title('SMS Text Frequency per Day')
>>> ax.set_xlabel('SMS Text Frequency')
>>> plt.show()
Here is the resulting histogram. It should look more interesting with more data, as well as with any other tweaks you may want to make to the graph.
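For reference, the whole pipeline can also be written in one go. The sketch below uses a tiny made-up textdata frame, and it swaps agg(len) for groupby(...).size(), which counts rows per date and lets reset_index name the count column directly.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for textdata: one row per SMS.
textdata = pd.DataFrame({
    "date": ["2015-02-03", "2015-02-04", "2015-02-04",
             "2015-02-05", "2015-02-05", "2015-02-05"],
    "smstext": ["a", "b", "c", "d", "e", "f"],
})

# Count messages per date and turn the date index into an ordinary column.
date_freq = textdata.groupby("date").size().reset_index(name="smstext_freq")
print(date_freq)

ax = date_freq["smstext_freq"].plot(kind="hist")
ax.set_title("SMS Text Frequency per Day")
ax.set_xlabel("SMS Text Frequency")
# plt.show()  # uncomment in an interactive session (and drop the Agg line)
```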