How to change spacing between two ticks in matplotlib chart? - python

I'm plotting some data that requires Day 0 to not be shown on the x-axis. The dataframe has no column for Day 0, but Matplotlib creates a space for it between day -1 and 1. I've looked through the documentation, but can't find a way to adjust spacing between only two ticks. The dataframe is:
group stat -1.0 1.0 2.0 3.0 4.0 5.0
abc mean 8.362999 17.043362 3.526539 22.931884 10.835121 6.035011
abc sem 1.481135 5.029173 0.822778 13.768812 2.149704 0.840965
abc std 3.311919 11.245573 1.839788 30.787999 4.806885 1.880455
Code to plot:
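import matplotlib.pyplot as plt
# index the rows by statistic (mean/sem/std) and drop the text column
df.set_index(['subject'], inplace=True)
df.drop(['group'], axis=1, inplace=True)
x = df.columns.values        # day numbers: -1, 1, 2, 3, 4, 5
y = df.loc['mean'].values
sem = df.loc['sem'].values   # standard error, drawn as error bars
plt.errorbar(x, y, sem, color='#0075d9', marker='o', clip_on=False)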
This is an example of the chart (please ignore the shading):
You can see that it has more space between -1 and 1 than the other ticks. Is there a way to 'drop' the Day 0 tick from the X-axis?
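A common workaround, offered here as a sketch rather than a verified answer, is to plot the points against evenly spaced positions (0, 1, 2, ...) instead of the day numbers, then relabel those ticks with the day values. The mean and sem values are copied from the question. The hardcoded lists stand in for df.loc['mean'] and df.loc['sem'].

```python
import matplotlib.pyplot as plt
import numpy as np

# Values copied from the 'mean' and 'sem' rows in the question
days = [-1.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [8.362999, 17.043362, 3.526539, 22.931884, 10.835121, 6.035011]
sem = [1.481135, 5.029173, 0.822778, 13.768812, 2.149704, 0.840965]

# Plot against evenly spaced positions so the missing Day 0 leaves no gap
positions = np.arange(len(days))

fig, ax = plt.subplots()
ax.errorbar(positions, y, yerr=sem, color="#0075d9", marker="o", clip_on=False)

# One tick per position, labelled with the real day number ("-1", "1", ...)
ax.set_xticks(positions)
ax.set_xticklabels([f"{d:g}" for d in days])
ax.set_xlabel("Day")
plt.show()
```

Because the x positions are now plain indices, every gap in the days is collapsed, not only Day 0. This is only appropriate when equal spacing is what you want.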

Related

How to plot groups of points on a map by associating them with the date of detection in Python

I'm trying to assess the displacement of a particular fish on the seabed according to seasonality. I would therefore like to create a map with differently colored points according to the month in which the detection occurred (e.g., all points from August in blue, all points from September in red, all points from October in yellow).
In my dataframe I have both the coordinates of each point (LAT, LON) and the dates of detection (Dates):
    LAT        LON         Dates
0   49.302005  -67.684971  2019-08-06
1   49.302031  -67.684960  2019-08-12
2   49.302039  -67.684983  2019-08-21
3   49.302039  -67.684979  2019-08-30
4   49.302041  -67.684980  2019-09-03
5   49.302041  -67.684983  2019-09-10
6   49.302042  -67.684979  2019-09-18
7   49.302043  -67.684980  2019-09-25
8   49.302045  -67.684980  2019-10-01
9   49.302045  -67.684983  2019-10-09
10  49.302048  -67.684979  2019-10-14
11  49.302049  -67.684981  2019-10-21
12  49.302049  -67.684982  2019-10-29
Would anyone know how to create this kind of map? I know how to create a simple map with all the points, but I'm not sure how to plot the points according to their date of detection.
Thank you very much
Here's one way to do it entirely with Pandas and matplotlib:
import pandas as pd
from matplotlib import pyplot as plt
# I'll just create some fake data for the example
df = pd.DataFrame(
    {
        "LAT": [49.2, 49.2, 49.3, 45.6, 47.8],
        "LON": [-67.7, -68.1, -65.2, -67.8, -67.4],
        "Dates": ["2019-08-06", "2019-08-03", "2019-07-17", "2019-06-12", "2019-05-29"],
    }
)
# add a column containing the month number (1-12)
df["Month"] = pd.to_datetime(df["Dates"]).dt.month
# make a scatter plot with the colour based on the month
# (longitude on the x-axis and latitude on the y-axis, as on a map)
fig, ax = plt.subplots()
df.plot.scatter(x="LON", y="LAT", c="Month", ax=ax, colormap="viridis")
plt.show()
If you want the months as names rather than numbers, and a slightly fancier plot (e.g., with a legend labelling the months) using seaborn, you could do:
import seaborn as sns
# get the month as an abbreviated name, e.g. "Aug"
df["Month"] = pd.to_datetime(df["Dates"]).dt.strftime("%b")
fig, ax = plt.subplots()
sns.scatterplot(data=df, x="LON", y="LAT", hue="Month", ax=ax)
plt.show()
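If you want exactly the colours named in the question (August blue, September red, October yellow) rather than a colormap, one option is to group by month and call ax.scatter once per group. This also gives one legend entry per month. The sketch below uses the question's data. The month_colors mapping is a hypothetical choice, and longitude goes on the x-axis as on a map.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Detections copied from the question
df = pd.DataFrame({
    "LAT": [49.302005, 49.302031, 49.302039, 49.302039, 49.302041, 49.302041,
            49.302042, 49.302043, 49.302045, 49.302045, 49.302048, 49.302049,
            49.302049],
    "LON": [-67.684971, -67.684960, -67.684983, -67.684979, -67.684980, -67.684983,
            -67.684979, -67.684980, -67.684980, -67.684983, -67.684979, -67.684981,
            -67.684982],
    "Dates": ["2019-08-06", "2019-08-12", "2019-08-21", "2019-08-30", "2019-09-03",
              "2019-09-10", "2019-09-18", "2019-09-25", "2019-10-01", "2019-10-09",
              "2019-10-14", "2019-10-21", "2019-10-29"],
})
df["Dates"] = pd.to_datetime(df["Dates"])

# Hypothetical month-number -> colour mapping, following the question's example
month_colors = {8: "blue", 9: "red", 10: "yellow"}

fig, ax = plt.subplots()
# One scatter call per month: its own colour and its own legend entry
for month, group in df.groupby(df["Dates"].dt.month):
    ax.scatter(group["LON"], group["LAT"],
               color=month_colors.get(month, "grey"),
               label=group["Dates"].dt.strftime("%B").iloc[0])
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.legend(title="Month of detection")
plt.show()
```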

Pandas: Plotting / annotating from DataFrame

I have this (rather boring) dataframe with stock data:
date close MA100 buy sell
2022-02-14 324.95 320.12 0 0
2022-02-13 324.87 320.11 1 0
2022-02-12 327.20 321.50 0 0
2022-02-11 319.61 320.71 0 1
Then I am plotting the prices
import pandas as pd
import matplotlib.pyplot as plt
df = ...
df['close'].plot()
df['MA100'].plot()
plt.show()
So far so good...
Then I'd like to show a marker on the chart if there was a buy (green) or a sell (red) on that day.
It's just to highlight if there was a transaction on that day. The exact intraday price at which the trade happened is not important.
So the x/y-coordinates could be the date and the close if there is a 1 in column buy (sell).
I am not sure how to implement this.
Would I need a loop to iterate over all rows where buy = 1 (sell = 1) and then somehow add these matches to the plot (probably with annotate?)
I'd really appreciate it if someone could point me in the right direction!
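You can query the data frame for the buy/sell rows and add them as scatter plots:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
# price and moving average as lines
df.plot(x='date', y=['close', 'MA100'], ax=ax)
# rows with a transaction, drawn on top as coloured markers
df.query("buy==1").plot.scatter(x='date', y='close', c='g', ax=ax)
df.query("sell==1").plot.scatter(x='date', y='close', c='r', ax=ax)
plt.show()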
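An alternative sketch skips DataFrame.plot and uses matplotlib directly. Boolean masks select the buy and sell rows, so no explicit loop (or annotate) is needed. It assumes the date column has been converted to datetime. The marker shapes and zorder are illustrative choices.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "date": pd.to_datetime(["2022-02-14", "2022-02-13", "2022-02-12", "2022-02-11"]),
    "close": [324.95, 324.87, 327.20, 319.61],
    "MA100": [320.12, 320.11, 321.50, 320.71],
    "buy": [0, 1, 0, 0],
    "sell": [0, 0, 0, 1],
}).sort_values("date")  # plot in chronological order

fig, ax = plt.subplots()
ax.plot(df["date"], df["close"], label="close")
ax.plot(df["date"], df["MA100"], label="MA100")

# Boolean masks pick the rows with a transaction; no loop required
buys = df[df["buy"] == 1]
sells = df[df["sell"] == 1]
ax.scatter(buys["date"], buys["close"], color="green", marker="^", zorder=3, label="buy")
ax.scatter(sells["date"], sells["close"], color="red", marker="v", zorder=3, label="sell")
ax.legend()
plt.show()
```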

How can I plot different length pandas series with matplotlib?

I've got two pandas series, one with a 7-day rolling mean for the entire year and another with monthly averages. I'm trying to plot them both on the same matplotlib figure, with the averages as a bar graph and the 7-day rolling mean as a line graph. Ideally, the line would be drawn on top of the bar graph.
The issue I'm having is that, with my current code, the bar graph is showing up without the line graph, but when I try plotting the line graph first I get a ValueError: ordinal must be >= 1.
Here's what the series look like:
These are the first 15 values of the 7-day rolling mean series; it has a date and a value for the entire year:
date
2016-01-01 NaN
2016-01-03 NaN
2016-01-04 NaN
2016-01-05 NaN
2016-01-06 NaN
2016-01-07 NaN
2016-01-08 0.088473
2016-01-09 0.099122
2016-01-10 0.086265
2016-01-11 0.084836
2016-01-12 0.076741
2016-01-13 0.070670
2016-01-14 0.079731
2016-01-15 0.079187
2016-01-16 0.076395
This is the entire monthly average series:
dt_month
2016-01-01 0.498323
2016-02-01 0.497795
2016-03-01 0.726562
2016-04-01 1.000000
2016-05-01 0.986411
2016-06-01 0.899849
2016-07-01 0.219171
2016-08-01 0.511247
2016-09-01 0.371673
2016-10-01 0.000000
2016-11-01 0.972478
2016-12-01 0.326921
Here's the code I'm using to try and plot them:
import matplotlib.pyplot as plt
ax = series_one.plot(kind="bar", figsize=(20,2))
series_two.plot(ax=ax)
plt.show()
Here's the graph that generates:
Any help is hugely appreciated! Also, advice on formatting this question and creating code to make two series for a minimum working example would be awesome.
Thanks!!
The problem is that pandas bar plots are categorical (bars are placed at consecutive integer positions 0, 1, 2, ...). Since in your case the two series have a different number of elements, plotting the line graph in categorical coordinates is not really an option. What remains is to plot the bar graph in numerical (date) coordinates as well. This is not possible with pandas, but it is the default behaviour with matplotlib.
Below I shift the monthly dates by 15 days to the middle of the month to have nicely centered bars.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
np.random.seed(42)
# daily series (random walk) with NaNs for the first 6 days, like a 7-day rolling mean
t1 = pd.date_range("2018-01-01", "2018-12-31", freq="D")
s1 = pd.Series(np.cumsum(np.random.randn(len(t1))) + 14, index=t1)
s1.iloc[:6] = np.nan
# monthly series, one value per month start
t2 = pd.date_range("2018-01-01", "2018-12-31", freq="MS")
s2 = pd.Series(np.random.rand(len(t2)) * 15 + 5, index=t2)
# shift monthly data to middle of month
s2.index += pd.Timedelta('15 days')
fig, ax = plt.subplots()
# matplotlib bars on a real date axis; width is given in days
ax.bar(s2.index, s2.values, width=14, alpha=0.3)
ax.plot(s1.index, s1.values)
plt.show()
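To see the categorical behaviour described above, the following sketch (with a hypothetical three-month series) inspects where pandas places the bars. Their centres land on 0, 1, 2 rather than on the dates, which is why a date-indexed line drawn on the same axes does not line up with them.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly series with a DatetimeIndex
s = pd.Series([1.0, 2.0, 3.0],
              index=pd.date_range("2016-01-01", periods=3, freq="MS"))

fig, ax = plt.subplots()
s.plot(kind="bar", ax=ax)

# Centre of each bar: sits on an integer position, not on a date value
centres = [p.get_x() + p.get_width() / 2 for p in ax.patches]
print(centres)
```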
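The problem might be that the two series' indices have very different scales. You can use ax.twiny to plot the second series against its own x-axis:
import matplotlib.pyplot as plt
ax = series_one.plot(kind="bar", figsize=(20,2))
# twin axes: shares the y-axis but gets an independent x-axis
ax_tw = ax.twiny()
series_two.plot(ax=ax_tw)
plt.show()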

Python - x-axis labels not lining up with tick marks

I've successfully created the code to generate a bunch of charts. However, the x axis labels are slightly offset (to the left) from the x axis tick marks.
Dataframe
stationId date variable value prefix uom
0 site 1 2016-04-07 pH 6.90 NaN pH
1 site 1 2016-07-11 pH 6.80 NaN pH
2 site 1 2017-10-09 pH 6.80 NaN pH
3 site 1 2017-10-09 pH 6.80 NaN pH
4 site 1 2016-06-29 pH 6.79 NaN pH
I can't see anything in the code that would cause this.
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
from matplotlib import rcParams
#plot
# linedf is the dataframe shown above (its loading code is not shown)
for line, group in linedf.groupby(['variable']):
    x = group['date']
    ax1 = group.plot(x='date', figsize=(8.2, 4.5), linestyle='--',
                     linewidth=0.75, rot=0, marker='o', markersize=3)
    # set axis labels and chart title
    plt.title("chartTitle", fontsize=12)
    ax1.set_xlabel('Date', fontsize=10)
    ax1.set_ylabel('GWL (mAHD)', fontsize=10)
    # set text font
    rcParams['font.family'] = 'serif'
    rcParams['font.serif'] = ['Cambria']
    # set dates for x tick labels
    # (note: these locators and the formatter are created but never applied to ax1)
    years = mdates.YearLocator()    # every year
    months = mdates.MonthLocator()  # every month
    yearsFmt = mdates.DateFormatter('%Y')
    lgd = plt.legend(bbox_to_anchor=(0.0, -0.13, 1.0, -0.03),
                     loc=2, ncol=6, mode="expand", borderaxespad=0.0, shadow=True)
    plt.show()
Without seeing the dataframe you are using (or at least a chunk of it) I have to speculate a bit, but it should suffice to simply adjust the alignment of the tick labels manually using
for tick in ax1.xaxis.get_major_ticks():
    tick.label1.set_horizontalalignment('center')
Without the dataframe I can't test to ensure this works in your case, but from the plot in the question it appears the alignment of the x-tick labels has been set to 'right' and setting them to 'center' will align them how you desire.
    Drawn from the centered ticklabels example in the matplotlib docs.
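The same fix can also be written as a single plt.setp call, which sets one property on every tick label at once. In this sketch the data is hypothetical. The first setp call only stands in for whatever right-aligned the labels in the question. Pandas' date handling is one possible cause, but that is an assumption.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical pH readings standing in for the question's data
dates = pd.to_datetime(["2016-04-07", "2016-06-29", "2016-07-11", "2017-10-09"])
values = [6.90, 6.79, 6.80, 6.80]

fig, ax = plt.subplots()
ax.plot(dates, values, linestyle="--", marker="o", markersize=3)

# Stand-in for the symptom: labels right-aligned to their tick marks
plt.setp(ax.get_xticklabels(), ha="right")

# The fix: centre every major tick label under its tick mark
plt.setp(ax.get_xticklabels(), ha="center")
plt.show()
```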

plotting a vbar_stack using a dataframe

I'm struggling to get a stacked vbar working.
With Python/pandas and Bokeh I want to plot several statistics about the players of a football team. The dataframe is filled correctly: values are strings where they should be strings and numeric where they should be numeric.
I used one of the Bokeh examples and tried to adjust it for my purpose, but I'm stuck on this error:
ValueError: Keyword argument sequences for broadcasting must be the same length as stackers
My code (without imports and scraping pieces) is:
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure, show
# statsdfsource and colors are defined in the scraping part (not shown)
source = ColumnDataSource(data=statsdfsource[['goals','assists','naam']])
p = figure(plot_height=250, title="Fruit Counts by Year",
           toolbar_location=None, tools="")
p.vbar_stack(['goals','assists'], x='naam', width=0.9, color=colors,
             source=source)
p.y_range.start = 0
p.x_range.range_padding = 0.1
p.xgrid.grid_line_color = None
p.axis.minor_tick_line_color = None
p.outline_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
show(p)
The dataframe I fill the columndatasource with is
goals assists naam
0 NaN NaN Miguel Santos
1 NaN NaN Aykut Özer
2 NaN NaN Job van de Walle
3 NaN NaN Rowen Koot
4 8.0 6.0 Perr Schuurs
5 4.0 2.0 Wessel Dammers
6 12.0 2.0 Stefan Askovski
7 1.0 NaN Mica Pinto
8 NaN NaN Christopher Braun
9 1.0 4.0 Marco Ospitalieri
10 NaN 1.0 Clint Esser
The result I want is a stacked bar chart with the player names on the x-axis and, above each name, one segment for the player's goals and one for their assists.
I think I'm messing up somewhere in how my dataframe is built, but I'm unsure how it should be structured (on the other hand, I can't really imagine that the dataframe doesn't fit the purpose).
When using categorical ranges, you have to tell figure what the categories for the axis are and in what order you want them to appear, e.g. by providing an x_range like:
# specify all the factors for the x-axis by passing x_range
p = figure(..., x_range=sorted(df.naam.unique()))
It's also possible the NaN values are a problem, since they are "contagious". I'd recommend changing them to zeros instead in any case.
Finally, the error message probably indicates that your colors list is the wrong length. You are stacking two bars in each column, so the list of colors also needs exactly two entries (one color for each "row" in the stack).
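Putting the three suggestions together, a minimal sketch might look like this. It assumes a recent Bokeh version (height= instead of the older plot_height=) and uses a few rows from the question's dataframe. NaN is replaced by 0, x_range lists the player names explicitly, and colors has one entry per stacker. The colour values and title are arbitrary choices.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# A few rows from the question's dataframe
df = pd.DataFrame({
    "goals": [None, 8.0, 4.0, 12.0, 1.0],
    "assists": [None, 6.0, 2.0, 2.0, None],
    "naam": ["Miguel Santos", "Perr Schuurs", "Wessel Dammers",
             "Stefan Askovski", "Mica Pinto"],
})

stackers = ["goals", "assists"]
# Replace NaN with 0 so every stacked segment has a numeric height
df[stackers] = df[stackers].fillna(0)

source = ColumnDataSource(df)
# One colour per stacker (two here), not one per player
colors = ["#718dbf", "#e84d60"]

# Categorical x-axis: the factors (player names) are listed explicitly
p = figure(x_range=list(df["naam"]), height=250, title="Goals and assists per player",
           toolbar_location=None, tools="")
renderers = p.vbar_stack(stackers, x="naam", width=0.9, color=colors,
                         source=source, legend_label=stackers)

p.y_range.start = 0
p.xgrid.grid_line_color = None
p.legend.location = "top_left"
p.legend.orientation = "horizontal"
```

Call show(p) (from bokeh.plotting) to render the chart in a browser.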
