matplotlib set xticks to column, labels to corresponding index


I am pretty new to matplotlib and I find the tick locators and labels confusing, so please bear with me. I swear I've been googling for hours.
I have a dataframe 'frame' like this (relevant columns):
         dayofweek       sla
weekday
Mon              1  0.889734
Tue              2  0.895131
Wed              3  0.879747
Thu              4  0.935000
Fri              5  0.967742
Sat              6  0.852941
Sun              7  1.000000
where the weekday name is the index and the weekday number is a column. There are no datetime objects in this frame.
I turn this into a plt.figure
fig=plt.figure(figsize=(7,5))
ax=plt.subplot(111)
I need to have my x-axis as numeric values, because I want to add a scatter plot later, which is not possible with string values.
x_=frame.dayofweek.values
anbar=ax.bar(x_,y_an,width=0.8,color=an_c,label='angekommen')
This works ok
So basically I want my xticks to be the 'dayofweek' column and their labels to be the corresponding index.
Now if I set_xticklabels manually by
ax.set_xticklabels(frame.index)
the labels start from position 0 on the axis.
I can work around this by rearranging the list of labels, but there should be a 'correct' way to do this with the Locators or Formatters. As mentioned above, I find those quite confusing.
Can someone point me to how I make the labels correspond to their index?

The straightforward solution is to set not only the xticklabels but also the ticks themselves:
ax.set_xticks(frame.dayofweek.values)
ax.set_xticklabels(frame.index)
The same can be accomplished with a FixedLocator and a FixedFormatter,
ax.xaxis.set_major_locator(matplotlib.ticker.FixedLocator(frame.dayofweek.values))
ax.xaxis.set_major_formatter(matplotlib.ticker.FixedFormatter(frame.index))
but that seems quite unnecessary for this simple task.
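Putting the pieces together, a minimal sketch of the tick-setting approach might look like the following. The frame is rebuilt by hand from the table in the question, the `Agg` backend is only chosen so the snippet runs without a display, and the names `tick_positions` and `tick_labels` are introduced here purely to inspect the result.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Rebuild the frame from the question: weekday names as index, numbers as a column.
frame = pd.DataFrame(
    {
        "dayofweek": [1, 2, 3, 4, 5, 6, 7],
        "sla": [0.889734, 0.895131, 0.879747, 0.935000, 0.967742, 0.852941, 1.000000],
    },
    index=pd.Index(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"], name="weekday"),
)

fig, ax = plt.subplots(figsize=(7, 5))
ax.bar(frame["dayofweek"].to_numpy(), frame["sla"].to_numpy(), width=0.8)

# Fix the tick positions first, then label them; the two sequences pair up one to one.
ax.set_xticks(frame["dayofweek"].to_numpy())
ax.set_xticklabels(frame.index)

# A later scatter can still use the numeric x values on the same axis.
ax.scatter(frame["dayofweek"].to_numpy(), frame["sla"].to_numpy(), color="black", zorder=3)

# Read the ticks back to check that positions and labels line up.
tick_positions = [float(t) for t in ax.get_xticks()]
tick_labels = [label.get_text() for label in ax.get_xticklabels()]
```

Because the tick positions are pinned before the labels are assigned, the label 'Mon' sits at x=1 rather than at x=0.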

Related

Understanding Plotly Time Difference units

So, I have a problem similar to this question. I have a DataFrame with a column 'diff' and a column 'date' with the following dtypes:
delta_df['diff'].dtype
>>> dtype('<m8[ns]')
delta_df['date'].dtype
>>> datetime64[ns, UTC]
According to this answer, they are (kind of) equivalent. However, when I plot using plotly (I used histogram and scatter), the 'diff' axis has a weird unit, something like 2T, 2.5T, 3T, etc. What is this? The data in the 'diff' column looks like 0 days 00:29:36.000001, so I don't understand what is happening (the 'date' column looks like 2018-06-11 01:04:25.000005+00:00).
BTW, the diff column was generated using df['date'].diff().
So my question is:
What is this T? Is it a standard chosen by plotly, like 30 mins, so that 2T is 1 hour? If so, how do I check the value of the chosen T?
Maybe more important, how to plot with the axis as it appears on the column so it's easier to read?

The "T" on the axis is most likely not a time unit at all. Plotly has no dedicated timedelta axis type, so a timedelta64[ns] column ends up being treated as plain numbers, namely the raw nanosecond counts. Large numbers on a numeric axis are abbreviated with SI prefixes, and "T" is tera (10^12). A value such as 0 days 00:29:36 is 1776 seconds, or about 1.776 × 10^12 ns, which lands just under the 2T tick.
So 2T means 2 × 10^12 ns, roughly 33 minutes. This is easy to confuse with the pandas offset alias "T", which stands for minutes because "M" is already used for months, but that alias is not what the axis is showing.
To confirm this, divide a tick value by 6 × 10^10 (the number of nanoseconds in a minute) and check that the result matches the gaps in your 'diff' column in minutes. To get an axis that is easier to read, convert the column to a plain numeric unit such as seconds or minutes before plotting.
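One way to get a readable axis, whatever abbreviation Plotly applies to the raw values, is to convert the timedelta column to plain floats in a chosen unit before plotting. The sketch below uses a small hypothetical stand-in for `delta_df` with timestamps shaped like those in the question; the column name `diff_minutes` is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical stand-in for delta_df, using timestamps shaped like the ones in the question.
delta_df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2018-06-11 01:04:25", "2018-06-11 01:34:01", "2018-06-11 02:19:01"],
            utc=True,
        )
    }
)
delta_df["diff"] = delta_df["date"].diff()

# Convert the timedelta column to plain floats in a unit that reads well on an axis.
delta_df["diff_minutes"] = delta_df["diff"].dt.total_seconds() / 60

# For comparison: the first gap (29 min 36 s) in minutes and as a raw nanosecond count.
first_gap_minutes = delta_df["diff_minutes"].iloc[1]
first_gap_ns = delta_df["diff"].iloc[1] / pd.Timedelta(nanoseconds=1)
```

The `diff_minutes` column can then be passed to the plotting call like any other numeric column, with the unit stated in the axis title.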

How to create year-over-year plots in python

I have the following df
player season pts
A 2017 6
A 2018 5
A 2019 9
B 2017 2
B 2018 1
B 2019 3
C 2017 10
C 2018 8
C 2019 7
I would like to make a plot to look at the stability of pts year-over-year. That is, I want to see how correlated pts are on a year-to-year basis. I have tried various ways to plot this, but can't seem to get it quite right. Here is what I tried initially:
fig, ax = plt.subplots(figsize=(15,10))
for i in df.season:
    sns.scatterplot(df.pts.iloc[i],df.pts.iloc[i]+1)
plt.xlabel('WOPR Year n')
plt.ylabel('WOPR Year n+1')
IndexError: single positional indexer is out-of-bounds
I thought about it some more, and thought something like this may work:
fig, ax = plt.subplots(figsize=(15,10))
seasons = [2017,2018,2019]
for i in seasons:
    sns.scatterplot(df.pts.loc[df.season==i],df.pts.loc[df.season==i+1])
plt.xlabel('WOPR Year n')
plt.ylabel('WOPR Year n+1')
This didn't return an error, but just gave me a blank plot. I think I am close here. Any help is appreciated. Thanks! To clarify, I want each player to be plotted twice. Once for x=2017 and y=2018, and another for x=2018 and y=2019 (hence the year n+1). EDIT: a sns.regplot() would probably be better here compared to sns.scatterplot as I could leverage the trendline to my liking. The below image captures the stability of the desired metric from year to year.

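I think you can do a self-merge. The left frame keeps each row's own season, while the shifted copy carries the previous season's pts forward by one year, so the left columns are "current" and the right columns are "last":
sns.lineplot(data=df.merge(df.assign(season=df.season+1),
                           on=['player','season'],
                           suffixes=('_current','_last')),
             x='pts_last', y='pts_current', hue='player')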
Note: If you don't care for players, then you could drop hue. Also, use scatterplot instead of lineplot if it fits you better.
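To see what the self-merge produces before any plotting, here is a small sketch on the data from the question. The names `previous`, `pairs`, and `stability` are introduced only for this illustration, and the Pearson correlation at the end is just one possible way to put a number on year-over-year stability.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "player": ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
        "season": [2017, 2018, 2019] * 3,
        "pts": [6, 5, 9, 2, 1, 3, 10, 8, 7],
    }
)

# Shift a copy forward one season so each row lines up with the following year.
previous = df.assign(season=df["season"] + 1)

# Left frame holds the current season, right frame holds the season before it.
pairs = df.merge(previous, on=["player", "season"], suffixes=("_current", "_last"))

# Pearson correlation between consecutive seasons across all players.
stability = pairs["pts_last"].corr(pairs["pts_current"])
```

Each player appears twice in `pairs` (2017 to 2018 and 2018 to 2019), which is the layout asked for in the question. The frame can be handed to `sns.regplot(data=pairs, x='pts_last', y='pts_current')` if a trendline is wanted.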

Based on your second idea:
for i in seasons[:-1]:
    sns.scatterplot(x=df.pts.loc[df.season==i].tolist(), y=df.pts.loc[df.season==(i+1)].tolist())
It seems there were two issues. First, the two Series carry different index labels, so seaborn aligns them on the index and finds no rows where both have a value; converting each Series to a list drops the index, so the values are paired by position instead. Second, you need to exclude the last element of seasons, since you're plotting n against n+1.

Python Pandas - Don't sort bar graph on y axis values

I am a beginner in Python. I have a Series with Date and count of some observation as below
Date Count
2003 10
2005 50
2015 12
2004 12
2003 15
2008 10
2004 05
I wanted to plot a graph to find out the count against the year with a Bar graph (x axis as year and y axis being count). I am using the below code
import pandas as pd
pd.value_counts(sfdf.Date_year).plot(kind='bar')
I am getting a bar graph which is automatically sorted on the count, so I am not able to clearly visualize how the count is distributed over the years. Is there any way to stop sorting the bars on the count and instead sort on the x axis values (i.e. year)?

I know this is an old question, but in case someone is still looking for another answer.
I solved this by adding .sort_index(axis=0)
So, instead of this:
pd.value_counts(sfdf.Date_year).plot(kind='bar')
you can write this:
pd.value_counts(sfdf.Date_year).sort_index(axis=0).plot(kind='bar')
Hope this helps.
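As a quick check of the idea, the sketch below applies the same sort to a hypothetical stand-in for `sfdf.Date_year`, using the years from the question. It calls the Series method `value_counts()` rather than the top-level `pd.value_counts`, which newer pandas releases flag as deprecated.

```python
import pandas as pd

# Hypothetical stand-in for sfdf.Date_year: one entry per observation.
date_year = pd.Series([2003, 2005, 2015, 2004, 2003, 2008, 2004], name="Date_year")

# value_counts() orders by frequency; sort_index() reorders by year instead.
counts_by_year = date_year.value_counts().sort_index()

# counts_by_year.plot(kind="bar") would then draw the bars in year order.
```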

The following code uses groupby() to collect the multiple rows for the same year, then calls sum() on the grouped object to total the counts. groupby() moves the grouping column into the index and sorts by it by default (sort=True), so the sort_index() call is only there to make the ordering explicit. All that then remains is to plot:
import pandas as pd
df = pd.DataFrame([[2003,10],[2005,50],[2015,12],[2004,12],[2003,15],[2008,10],[2004,5]],columns=['Date','Count'])
df.groupby('Date').sum().sort_index().plot(kind='bar')

Manipulating a non-evenly spaced data series in python

Hello guys, I've been trying to plot a bunch of data from measurements taken at uneven intervals of time and make a cubic spline interpolation of it. Here is a sample of the data:
1645 2 28 .0
1645 6 30 .0
1646 6 30 .0
1646 7 31 .0
The first column corresponds to the year which the measurement was made, the second column is the month, the third one is the number of measurements and the fourth one is the standard deviation of the measurements.
The thing is that I can't seem to figure out how to make a scatter plot of the data keeping the "unevenness" of the intervals of measurement. Also I'm not quite sure how to implement the interpolation cause I don't know what should be my x value for the data points (months maybe?)
Any advice or help would be greatly appreciated. Thank You.
Btw I'm working with python and using Scipy.

For x, you could either convert the year and month to a datetime object:
np.datetime64('2005-02')
Or convert it to months (assuming 1645 is your first value):
CumulativeMonth = (year - 1645) * 12 + month
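The month-count option can be sketched as follows, using the four sample rows from the question and SciPy's `CubicSpline`. This assumes the third column is the quantity to interpolate; the array names and the 50-point grid are choices made for this illustration.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Sample rows from the question: year, month, third column (taken as the y value).
year = np.array([1645, 1645, 1646, 1646])
month = np.array([2, 6, 6, 7])
count = np.array([28.0, 30.0, 30.0, 31.0])

# Months elapsed since the first year keep the uneven spacing on the x-axis.
cumulative_month = (year - year[0]) * 12 + month

# CubicSpline needs strictly increasing x values, which the month count provides.
spline = CubicSpline(cumulative_month, count)

# Evaluate on a dense, evenly spaced grid to draw a smooth curve over the points.
grid = np.linspace(cumulative_month[0], cumulative_month[-1], 50)
smooth = spline(grid)
```

Plotting `cumulative_month` against `count` as a scatter and `grid` against `smooth` as a line shows the raw points at their true spacing with the interpolated curve through them.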

How to change axis limits for time in Matplotlib?

I have a data set stored in a Pandas dataframe object, and the first column of the dataframe is a datetime type, which looks like this:
0 2013-09-09 10:35:42.640000
1 2013-09-09 10:35:42.660000
2 2013-09-09 10:35:42.680000
3 2013-09-09 10:35:42.700000
I have another column called eventno, and that one looks like:
0 0
1 0
2 0
3 0
I am trying to create a scatter plot with Matplotlib, and once I have the scatter plot ready, I would like to change the range in the date axis (x-axis) to focus on certain times in the data. My problem is, I could not find a way to change the range the data will be plotted over in the x axis. I tried this below, but I get a Not implemented for this type error.
plt.figure(figsize=(13,7), dpi=200)
ax.set_xlim(['2013-09-09 10:35:00','2013-09-09 10:36:00'])
scatter(df2['datetime'][df.eventno<11],df2['eventno'][df.eventno<11])
If I comment out the ax.set_xlim line, I get the scatter plot, but with some default x axis range that does not even match my dates.
Do I have to tell matplotlib that my data is of datetime type, as well? If so, then how can I do it? Assuming this part is somehow accomplished, then how can I change the range of my data to be plotted?
Thanks!
PS: I tried uploading the picture, but I got a "Framing not allowed" error. Oh well... It just plots it from Jan 22 1970 to Jan 27 1970. No clue how it comes up with that :)

Try putting ax.set_xlim after the scatter command.
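A minimal sketch of that ordering, with a few sample rows from the question, might look like this. Passing `datetime` objects instead of strings for the limits is an additional assumption on my part rather than something the answer states, and the `Agg` backend is only used so the snippet runs without a display.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# A few rows shaped like the data in the question.
df2 = pd.DataFrame(
    {
        "datetime": pd.to_datetime(
            [
                "2013-09-09 10:35:42.640",
                "2013-09-09 10:35:42.660",
                "2013-09-09 10:35:42.680",
                "2013-09-09 10:35:42.700",
            ]
        ),
        "eventno": [0, 0, 0, 0],
    }
)

fig, ax = plt.subplots(figsize=(13, 7))

# Plot first, so the x-axis already knows it is dealing with dates.
ax.scatter(df2["datetime"], df2["eventno"])

# Then set the limits, using datetime objects rather than strings.
left = datetime(2013, 9, 9, 10, 35, 0)
right = datetime(2013, 9, 9, 10, 36, 0)
ax.set_xlim(left, right)

# Matplotlib stores date limits internally as float day numbers.
xmin, xmax = ax.get_xlim()
```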
