How to set the width of a daily bar chart in Python matplotlib

I want to plot five years of daily rainfall data as a bar chart. When the bar width is 1, the bars become lines without any visible width, and when I change the width they overlap each other, as in the image below. I want discrete bars with a good-looking width. This is my code.
import pandas as pd
from datetime import datetime, timedelta
from matplotlib import pyplot as plt
data=pd.read_excel('final.xlsx')
data['Date']=pd.to_datetime(data['Date'])
date = data['Date']
amount = data['Amount']
plt.bar (date, amount, color='gold', edgecolor='blue', align='center', width=5)
plt.ylabel('rainfall amount (mm)')
plt.show()

Just to note, you can also pass a Timedelta to the width parameter; I find this helpful to be explicit about how many units in x (e.g. days here) the bars will take up. Additionally for some time series the int widths are less intuitive:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
#fake data with minute frequency for an hour
dr = pd.date_range('01-01-2016 9:00:00', '01-01-2016 10:00:00', freq='1min')
df = pd.DataFrame(np.random.rand(len(dr)), index=dr)
#graph 1, using int width
plt.figure(figsize=(10,2))
plt.bar (df.index, df[0], color='gold', edgecolor='blue', align='center',
width=1)
#graph 2, using Timedelta width
plt.figure(figsize=(10,2))
plt.bar (df.index, df[0], color='gold', edgecolor='blue', align='center',
width=pd.Timedelta(minutes=1))
Graph 1:
Graph 2:
This was what came to mind when I saw your issue, but I think the real problem is the number of data points (as @JohanC pointed out). Even when you plot just 365 days, you can barely see the yellow anymore (and by 3 or 4 years it's definitely gone):
You can also see above that different bars get rendered with different apparent widths, but that is just because there are too few pixels in the space provided to show the bar fill and bar widths consistently for each point.
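To put rough numbers on the "too few pixels" point, here is a minimal sketch that measures how many screen pixels one day spans for a one-year and a five-year plot. The helper name pixels_per_day, the random data, and the 10x2 inch figure at 100 dpi are assumptions for illustration, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit when working interactively
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def pixels_per_day(n_days, figsize=(10, 2), dpi=100):
    """Return the on-screen width, in pixels, of a bar that is one day wide."""
    # Fake daily data (hypothetical stand-in for the rainfall series)
    dates = pd.date_range("2016-01-01", periods=n_days, freq=pd.Timedelta(days=1))
    amount = np.random.default_rng(0).random(n_days)

    fig, ax = plt.subplots(figsize=figsize, dpi=dpi)
    ax.bar(dates, amount, width=pd.Timedelta(days=1))
    ax.get_xlim()  # make sure the autoscaled limits are up to date

    # Matplotlib stores dates as floating-point days, so x0 + 1 is one day later
    x0 = mdates.date2num(dates[0])
    left = ax.transData.transform((x0, 0))[0]
    right = ax.transData.transform((x0 + 1, 0))[0]
    plt.close(fig)
    return right - left


px_one_year = pixels_per_day(365)
px_five_years = pixels_per_day(5 * 365)
print(f"1 year:  {px_one_year:.2f} px per bar")
print(f"5 years: {px_five_years:.2f} px per bar")
```

Once a bar is narrower than a pixel or two, the edge color takes over and the fill disappears, whatever width you pass. A wider figure, a higher dpi, or fewer bars (for example, weekly or monthly totals) are the usual ways around that.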

Related

How to properly display date from csv in matplotlib plot?

I have a csv with the following columns: recorded, humidity and temperature. I want to display the recorded values (date and time) on the x-axis and the humidity on the y-axis. How can I properly display the dates (it is quite a big csv)? My current plot shows a black band instead of readable dates. My date format is like this: 2019-09-12T07:26:55, so the csv contains both the date and the time.
I have displayed the plot using this code:
from matplotlib import pyplot as plt
import pandas as pd
data = pd.read_csv('home_data.csv')
plt.plot(data.recorded, data.humidity)
plt.xlabel('date')
plt.ylabel('humidity')
plt.title('Visualizing date and humidity')
plt.show()
This is a screenshot of the plot:
https://snipboard.io/d4hfS7.jpg
Actually, the plot is displaying every date in your dataset. There are so many of them that they look like a black blob. You can downsample the xticks to improve readability. Do something like this:
fig, ax = plt.subplots()
ax.plot(data.recorded, data.humidity)
# some axes labelling
# Reduce now the number of the ticks printed in the figure
ax.set_xticks(ax.get_xticks()[::4])
ax.tick_params(axis='x', labelrotation=45)
The line ax.set_xticks(ax.get_xticks()[::4]) sets the x-axis ticks by keeping one date out of every four via list slicing, which reduces the number of dates printed. You can increase the step as much as you want.
To improve readability, you can also rotate the tick labels, as in the line
ax.tick_params(axis='x', labelrotation=45).
Hope this helps.
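An alternative to thinning the ticks by hand, not part of the answer above, is to parse the recorded column into real datetimes so that matplotlib's date locators pick a sensible number of ticks. This is a minimal sketch under that assumption; the in-memory CSV is a hypothetical stand-in for home_data.csv with made-up values.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit when working interactively
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for home_data.csv: one reading per minute
stamps = pd.date_range("2019-09-12T07:26:55", periods=2000, freq=pd.Timedelta(minutes=1))
csv_text = pd.DataFrame({
    "recorded": stamps.strftime("%Y-%m-%dT%H:%M:%S"),
    "humidity": 50 + np.random.default_rng(0).normal(size=2000).cumsum(),
}).to_csv(index=False)

# parse_dates turns the text column into datetime64 values;
# without it every distinct string becomes its own tick
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["recorded"])

fig, ax = plt.subplots()
ax.plot(data["recorded"], data["humidity"])

# Let matplotlib choose tick positions and compact labels for a date axis
locator = mdates.AutoDateLocator()
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
ax.set_xlabel("date")
ax.set_ylabel("humidity")
ax.set_title("Visualizing date and humidity")
```

With a real file you would pass the file name to pd.read_csv instead of the StringIO object.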

Python barchart overlapping vertical bars

I have two graphs that share the same x-axis. They are both time series with 2880 times (4 months of hourly data). I have an array with the precipitation values for every hour (2880). I want to overlay this data as a vertical bar chart on the first graph, so that each bar is 1 hour wide and centered on the corresponding hour.
My issue is that the widths of the bars are too wide and overlap with each other. I have tried changing the width option to width=1/24 in the plot with no success (bars don't appear at all). Here is a snippet of the code where I do not set the width at all.
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter
import matplotlib.dates as mdates
import datetime
import pandas as pd
import numpy as np
t = np.arange(datetime.datetime(2010,1,1,0), datetime.datetime(2010,5,1,0),datetime.timedelta(hours=1)).astype(datetime.datetime)
stn_temp = np.random.rand(2880)
model_temp = stn_temp-0.2
stn_rh = np.random.randint(0,100,2880)
model_rh = stn_rh -1
prec = np.random.rand(2880)  # placeholder hourly precipitation; the question does not show how prec is built
fig, (ax1,ax2) = plt.subplots(2,1,sharex=True)
ax1.plot(t,stn_temp,'r',linewidth=0.3)
ax1.plot(t,model_temp,'k',linewidth=0.3)
minor_ticks_temp = np.arange(min(stn_temp),max(stn_temp),1)
ax1.set_yticks(minor_ticks_temp, minor=True)
myFmt = mdates.DateFormatter('%m-%d')
ax1.xaxis.set_major_formatter(myFmt)
ax1.legend(loc=0)
ax1.set_ylabel(r'2 m Temperature ($^\circ$C)')
ax1 = ax1.twinx()
ax1.bar(t,prec,alpha=0.7,color='g')
ax1.set_ylabel('Accumulated \n Precipitation (mm)')
ax2.plot(t,stn_rh,'b',linewidth=0.3)
ax2.plot(t,model_rh,'k',linewidth=0.3)
ax2.set_ylim([0,100.5])
ax2.set_ylabel('Relative Humidity (%)')
fig.tight_layout()
The widths of the bars should be a lot smaller, only the width of an hour. This image is a zoomed in version to show the bar width issue.
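A minimal sketch of one way to get hour-wide bars on a date axis, following the Timedelta approach from the first answer on this page: pass the width as a time span instead of a bare number. The data below is random stand-in data, and the names ax1b and bars are made up for illustration. As a guess about the width=1/24 attempt: under Python 2, 1/24 is integer division and evaluates to 0, which would give zero-width bars.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit when working interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 4 months of hourly values (2880 points)
t = pd.date_range("2010-01-01", periods=2880, freq=pd.Timedelta(hours=1))
rng = np.random.default_rng(0)
stn_temp = rng.random(t.size)
prec = rng.random(t.size)

fig, ax1 = plt.subplots(figsize=(10, 3))
ax1.plot(t, stn_temp, "r", linewidth=0.3)
ax1.set_ylabel("2 m Temperature")

# Second y-axis sharing the same x-axis, used for the precipitation bars
ax1b = ax1.twinx()
bars = ax1b.bar(t, prec, width=pd.Timedelta(hours=1), align="center",
                alpha=0.7, color="g")
ax1b.set_ylabel("Accumulated\nPrecipitation (mm)")

# Zoom in on the first two days so that individual bars are visible
ax1.set_xlim(t[0], t[48])
```

Over the full four months each bar is still only a fraction of a pixel wide on a normal-sized figure, so the bars only look discrete once you zoom in as in the last line.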

Seaborn/Matplotlib Date Axis barplot minor-major tick formatting

I'm building a Seaborn barplot. The x-axis are dates, and the y-axis are integers.
I'd like to format major/minor ticks for the dates. I'd like Mondays' ticks to be bold and a different color (ie, "major ticks"), with the rest of the week less bold.
I have not been able to get major and minor tick formatting on the x-axis to work with Seaborn barplots. I'm stumped, and thus turning here for help.
I'm starting with the stackoverflow example that answered this question: Pandas timeseries plot setting x-axis major and minor ticks and labels
If I make a simple modification to use a Seaborn barplot, I lose my x-axis ticks:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as dates
import seaborn as sns
idx = pd.date_range('2011-05-01', '2011-07-01')
s = pd.Series(np.random.randn(len(idx)), index=idx)
###########################################
## Swap out these two lines of code:
#fig, ax = plt.subplots()
#ax.plot_date(idx.to_pydatetime(), s, 'v-')
## with this one
ax = sns.barplot(x=idx.to_pydatetime(), y=s)
###########################################
ax.xaxis.set_minor_locator(dates.WeekdayLocator(byweekday=(1),
interval=1))
ax.xaxis.set_minor_formatter(dates.DateFormatter('%d\n%a'))
ax.xaxis.grid(True, which="minor")
ax.yaxis.grid()
ax.xaxis.set_major_locator(dates.MonthLocator())
ax.xaxis.set_major_formatter(dates.DateFormatter('\n\n\n%b\n%Y'))
plt.tight_layout()
## save the result to a png instead of plotting to screen:
myFigure = plt.gcf()
myFigure.autofmt_xdate()
myFigure.set_size_inches(11,3.8)
plt.title('Example Chart', loc='center')
plt.savefig('/tmp/chartexample.png', format='png', bbox_inches='tight')
I've tried a variety of approaches but something in Seaborn seems to be overriding or undoing any attempts at major and minor axis formatting that I've managed to cook up yet beyond some simple styling for all ticks when I use set_xticklabels().
I can sort of get formatting on just the major ticks by using MultipleLocator(), but I can't get any formatting on the minor ticks.
I've also experimented with myFigure.autofmt_xdate() to see if it would help, but it doesn't seem to like mixed major & minor ticks on the same axis either.
I came across this while trying to solve the same problem. Based on the useful pointer from @mwaskom (that categorical plots like boxplots lose their structure and just become date-named categories), I ended up doing the location and formatting in Python like so:
from datetime import datetime
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as dates
import seaborn as sns
idx = pd.date_range('2011-05-01', '2011-07-01')
s = pd.Series(np.random.randn(len(idx)), index=idx)
fig, ax = plt.subplots(figsize = (12,6))
ax = sns.barplot(x=idx.to_pydatetime(), y=s, ax=ax)
major_ticks = []
major_tick_labels = []
minor_ticks = []
minor_tick_labels = []
for loc, label in zip(ax.get_xticks(), ax.get_xticklabels()):
    when = datetime.strptime(label.get_text(), '%Y-%m-%d %H:%M:%S')
    if when.day == 1:
        major_ticks.append(loc)
        major_tick_labels.append(when.strftime("\n\n\n%b\n%Y"))
    else:
        minor_ticks.append(loc)
        if when.weekday() == 0:
            minor_tick_labels.append(when.strftime("%d\n%a"))
        else:
            minor_tick_labels.append(when.strftime("%d"))
ax.set_xticks(major_ticks)
ax.set_xticklabels(major_tick_labels)
ax.set_xticks(minor_ticks, minor=True)
ax.set_xticklabels(minor_tick_labels, minor=True)
Of course, you don't have to set the ticks based on parsing the labels which were installed from the data, if it's easier to start with the source data and just keep the indices aligned, but I prefer to have a single source of truth.
You can also mess with font weight, rotation, etc, on individual labels by getting the Text objects for the relevant label and calling set_ methods on it.
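As a minimal sketch of that last point, the following styles only the Monday labels. It uses a plain matplotlib bar chart at integer positions as a stand-in for the categorical axis that Seaborn produces; the data, colors and date range are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit when working interactively
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

idx = pd.date_range("2011-05-01", "2011-05-21")
values = np.random.default_rng(0).normal(size=len(idx))

fig, ax = plt.subplots(figsize=(12, 4))

# Bars at integer positions 0..n-1, like a categorical axis
positions = np.arange(len(idx))
ax.bar(positions, values)
ax.set_xticks(positions)
ax.set_xticklabels([day.strftime("%d\n%a") for day in idx])

# Each tick label is a Text object, so it can be styled individually
for day, label in zip(idx, ax.get_xticklabels()):
    if day.weekday() == 0:  # Monday
        label.set_fontweight("bold")
        label.set_color("crimson")
```

The same loop should work on the labels returned by ax.get_xticklabels(minor=True) if the Mondays sit among the minor ticks, as in the code above.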

How to label and change the scale of Seaborn kdeplot's axes

Here's my code
import numpy as np
from numpy.random import randn
import pandas as pd
from scipy import stats
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
fig = sns.kdeplot(treze, shade=True, color=c1,cut =0, clip=(0,2000))
fig = sns.kdeplot(cjjardim, shade=True, color=c2,cut =0, clip=(0,2000))
fig.figure.suptitle("Plot", fontsize = 24)
plt.xlabel('Purchase amount', fontsize=18)
plt.ylabel('Distribution', fontsize=16)
This results in the following plot:
I want to do two things:
1) Change the scale of the y-axis by multiplying its values by 10000 and, if it's possible, add a % sign to the numbers. In other words, I want the y-axis values shown in the above plot to be 0%, 5%, 10%, 15%, 20%, 25%, and 30%.
2) Add more values to the x-axis. I'm particularly interested in showing the data in intervals of 200. In other words, I want the x-axis values shown in the plot to be 0, 200, 400, 600,... and so on.
1) What you are looking for is most probably some combination of get_yticks() and set_yticks():
plt.yticks(fig.get_yticks(), fig.get_yticks() * 100)
plt.ylabel('Distribution [%]', fontsize=16)
Note: as mwaskom points out in the comments, multiplying by 10000 and adding a % sign is mathematically incorrect.
2) You can specify where you want your ticks via the xticks function. That gives you more ticks and makes the data easier to read; it does not give you more data.
plt.xticks([0, 200, 400, 600])
plt.xlabel('Purchase amount', fontsize=18)
Note: if you want to limit the view to your specified x-values, you might also have a look at plt.xlim() and restrict the figure to the interesting range.
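The same two adjustments can also be sketched with matplotlib.ticker objects instead of hard-coded tick lists, so the labels follow the ticks if the limits change. This is only an illustration: the curve below is a made-up stand-in for the KDE output, and relabeling the y-axis as percent carries the same caveat as the note above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit when working interactively
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import numpy as np

# Hypothetical stand-in for the curve drawn by sns.kdeplot
x = np.linspace(0, 2000, 400)
density = 0.3 * np.exp(-((x - 600) / 350) ** 2)

fig, ax = plt.subplots()
ax.fill_between(x, density, alpha=0.4)
ax.set_xlim(0, 2000)

# 1) Show y tick values multiplied by 100 with a % sign (0.05 becomes "5%")
percent = mticker.PercentFormatter(xmax=1.0, decimals=0)
ax.yaxis.set_major_formatter(percent)

# 2) Put an x tick at every multiple of 200
ax.xaxis.set_major_locator(mticker.MultipleLocator(200))

ax.set_xlabel("Purchase amount", fontsize=18)
ax.set_ylabel("Distribution", fontsize=16)
```

With a Seaborn plot, the Axes object returned by sns.kdeplot (called fig in the question) can be used in place of ax.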

How to highlight specific x-value ranges

I'm making a visualization of historical stock data for a project, and I'd like to highlight regions of drops. For instance, when the stock is experiencing significant drawdown, I would like to highlight it with a red region.
Can I do this automatically, or will I have to draw a rectangle or something?
Have a look at axvspan (and axhspan for highlighting a region of the y-axis).
import matplotlib.pyplot as plt
plt.plot(range(10))
plt.axvspan(3, 6, color='red', alpha=0.5)
plt.show()
If you're using dates, then you'll need to convert your min and max x values to matplotlib dates. Use matplotlib.dates.date2num for datetime objects or matplotlib.dates.datestr2num for various string timestamps.
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import datetime as dt
import numpy as np  # needed for np.sin below
t = mdates.drange(dt.datetime(2011, 10, 15), dt.datetime(2011, 11, 27),
dt.timedelta(hours=2))
y = np.sin(t)
fig, ax = plt.subplots()
ax.plot_date(t, y, 'b-')
ax.axvspan(*mdates.datestr2num(['10/27/2011', '11/2/2011']), color='red', alpha=0.5)
fig.autofmt_xdate()
plt.show()
Here is a solution that uses axvspan to draw multiple highlights where the limits of each highlight are set by using the indices of the stock data corresponding to the peaks and troughs.
Stock data usually contain a discontinuous time variable where weekends and holidays are not included. Plotting them in matplotlib or pandas will produce gaps along the x-axis for weekends and holidays when dealing with daily stock prices. This may not be noticeable with long date ranges and/or small figures (like in this example), but it will become apparent if you zoom in and it may be something that you want to avoid.
This is why I share here a complete example that features:
A realistic sample dataset that includes a discontinuous DatetimeIndex based on the New York Stock Exchange trading calendar imported with the pandas_market_calendars as well as fake stock data that looks like the real thing.
A pandas plot created with use_index=False which removes the gaps for weekends and holidays by using instead a range of integers for the x-axis. The returned ax object is used in a way that avoids the need to import matplotlib.pyplot (unless you need plt.show).
An automatic detection of drawdowns over the entire date range by using the scipy.signal find_peaks function which returns the indices needed to plot the highlights with axvspan. Computing drawdowns in a more correct way would require a clear definition of what would count as a drawdown and would lead to more complicated code which is a topic for another question.
Properly formatted ticks created by looping through the timestamps of the DatetimeIndex seeing as all the convenient matplotlib.dates tick locators and formatters as well as DatetimeIndex properties like .is_month_start cannot be used in this case.
Create sample dataset
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import pandas_market_calendars as mcal # v 1.6.1
from scipy.signal import find_peaks # v 1.5.2
# Create datetime index with a 'trading day end' frequency based on the New York Stock
# Exchange trading hours (end date is inclusive)
nyse = mcal.get_calendar('NYSE')
nyse_schedule = nyse.schedule(start_date='2019-10-01', end_date='2021-02-01')
nyse_dti = mcal.date_range(nyse_schedule, frequency='1D').tz_convert(nyse.tz.zone)
# Create sample of random data for daily stock closing price
rng = np.random.default_rng(seed=1234) # random number generator
price = 100 + rng.normal(size=nyse_dti.size).cumsum()
df = pd.DataFrame(data=dict(price=price), index=nyse_dti)
df.head()
# price
# 2019-10-01 16:00:00-04:00 98.396163
# 2019-10-02 16:00:00-04:00 98.460263
# 2019-10-03 16:00:00-04:00 99.201154
# 2019-10-04 16:00:00-04:00 99.353774
# 2019-10-07 16:00:00-04:00 100.217517
Plot highlights for drawdowns with properly formatted ticks
# Plot stock price
ax = df['price'].plot(figsize=(10, 5), use_index=False, ylabel='Price')
ax.set_xlim(0, df.index.size-1)
ax.grid(axis='x', alpha=0.3)
# Highlight drawdowns using the indices of stock peaks and troughs: find peaks and
# troughs based on signal analysis rather than an algorithm for drawdowns to keep
# example simple. Width and prominence have been handpicked for this example to work.
peaks, _ = find_peaks(df['price'], width=7, prominence=4)
troughs, _ = find_peaks(-df['price'], width=7, prominence=4)
for peak, trough in zip(peaks, troughs):
    ax.axvspan(peak, trough, facecolor='red', alpha=.2)
# Create and format monthly ticks
ticks = [idx for idx, timestamp in enumerate(df.index)
         if (timestamp.month != df.index[idx-1].month) | (idx == 0)]
ax.set_xticks(ticks)
labels = [tick.strftime('%b\n%Y') if df.index[ticks[idx]].year
          != df.index[ticks[idx-1]].year else tick.strftime('%b')
          for idx, tick in enumerate(df.index[ticks])]
ax.set_xticklabels(labels)
ax.figure.autofmt_xdate(rotation=0, ha='center')
ax.set_title('Drawdowns are highlighted in red', pad=15, size=14);
For the sake of completeness, it is worth noting that you can achieve exactly the same result using the fill_between plotting function, though it takes a few more lines of code:
ax.set_ylim(*ax.get_ylim()) # remove top and bottom gaps with plot frame
drawdowns = np.repeat(False, df['price'].size)
for peak, trough in zip(peaks, troughs):
    drawdowns[np.arange(peak, trough+1)] = True
ax.fill_between(np.arange(df.index.size), *ax.get_ylim(), where=drawdowns,
                facecolor='red', alpha=.2)
You are using matplotlib's interactive interface and want to have dynamic ticks when you zoom in? Then you will need to use locators and formatters from the matplotlib.ticker module. You could for example keep the major ticks fixed like in this example and add dynamic minor ticks to show days or weeks of the year when zooming in. You can find an example of how to do this at the end of this answer.
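A minimal sketch of what such dynamic minor ticks could look like on an integer x-axis: a FuncFormatter maps each integer position back to its timestamp, and a MaxNLocator picks integer positions for whatever range is currently visible. The business-day index, the helper name index_to_day and the tick count are assumptions for illustration; this is not the example the answer links to.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; omit when working interactively
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import numpy as np
import pandas as pd

# Hypothetical stand-in for the trading-day index: plain business days
dti = pd.bdate_range("2020-01-01", "2020-06-30")
price = 100 + np.random.default_rng(1234).normal(size=dti.size).cumsum()

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(np.arange(dti.size), price)  # integer x-axis, so no weekend gaps


def index_to_day(x, pos=None):
    """Map an integer x position to the day of month of its timestamp."""
    i = int(round(x))
    if 0 <= i < dti.size and abs(x - i) < 1e-9:
        return dti[i].strftime("%d")
    return ""  # no label outside the data or between integer positions


# Minor ticks are recomputed on every zoom or pan, limited to integer positions
ax.xaxis.set_minor_locator(mticker.MaxNLocator(nbins=20, integer=True))
ax.xaxis.set_minor_formatter(mticker.FuncFormatter(index_to_day))
ax.tick_params(axis="x", which="minor", labelsize=7)
```

In an interactive window the minor labels then show individual days once you zoom in far enough, while any fixed major ticks stay where they were set.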
