How to change axis limits for time in Matplotlib? - python

I have a data set stored in a Pandas dataframe object, and the first column of the dataframe is a datetime type, which looks like this:
0 2013-09-09 10:35:42.640000
1 2013-09-09 10:35:42.660000
2 2013-09-09 10:35:42.680000
3 2013-09-09 10:35:42.700000
I have another column called eventno, which looks like this:
0 0
1 0
2 0
3 0
I am trying to create a scatter plot with Matplotlib, and once I have the scatter plot ready, I would like to change the range in the date axis (x-axis) to focus on certain times in the data. My problem is, I could not find a way to change the range the data will be plotted over in the x axis. I tried this below, but I get a Not implemented for this type error.
plt.figure(figsize=(13,7), dpi=200)
ax.set_xlim(['2013-09-09 10:35:00','2013-09-09 10:36:00'])
scatter(df2['datetime'][df.eventno<11],df2['eventno'][df.eventno<11])
If I comment out the ax.set_xlim line, I get the scatter plot, but with some default x-axis range that does not even match my dates.
Do I have to tell matplotlib that my data is of datetime type, as well? If so, then how can I do it? Assuming this part is somehow accomplished, then how can I change the range of my data to be plotted?
Thanks!
PS: I tried uploading the picture, but I got a "Framing not allowed" error. Oh well... It just plots it from Jan 22 1970 to Jan 27 1970. No clue how it comes up with that :)

Try putting ax.set_xlim after the scatter command.
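
A minimal sketch of that suggestion, using made-up data shaped like the question's dataframe. It also passes the limits as pd.Timestamp objects instead of plain strings; that the string limits caused the error is an assumption, not something the question confirms.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data shaped like the question's dataframe (values are illustrative only).
df2 = pd.DataFrame({
    "datetime": pd.date_range("2013-09-09 10:35:42.640", periods=50, freq="20ms"),
    "eventno": [i // 10 for i in range(50)],
})
subset = df2[df2["eventno"] < 11]

fig, ax = plt.subplots(figsize=(13, 7))
ax.scatter(subset["datetime"], subset["eventno"])

# Set the limits after plotting, with datetime-like objects rather than strings.
xmin = pd.Timestamp("2013-09-09 10:35:00")
xmax = pd.Timestamp("2013-09-09 10:36:00")
ax.set_xlim(xmin, xmax)
```

Because the datetime column has a real datetime dtype, matplotlib treats the x-axis as dates without being told explicitly.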


pandas/matplotlib graph on frequency of appearance

I am a pandas newbie and I want to make a graph from a CSV I have. This CSV contains a list of dates, and I want to make a graph of how frequently each date appears.
This is how it looks :
2022-01-12
2022-01-12
2022-01-12
2022-01-13
2022-01-13
2022-01-14
Here, we can see that I have three records on the 12th of January, two records on the 13th and only one record on the 14th. So we should see a decrease on the graph.
So, I tried converting my csv like this :
date,records
2022-01-12,3
2022-01-13,2
2022-01-14,1
And then make a graph with the date as the x axis and the records amount as the y axis.
But is there a way pandas (or matplotlib, I never understand which one to use) can make a graph based on the frequency of appearance, so that I don't have to convert the CSV first?
pandas has a method that counts how many times each value occurs.
First off, you'd need to read your csv file into a dataframe. Do this by using:
import pandas as pd
df = pd.read_csv("~csv file name~")
Using the unique() method on a column, you can get all of its unique values. The syntax should look like:
uniqueVals = df["~column name~"].unique()
That returns an array of all the unique values. To count them, call value_counts() on the same column: it returns a Series indexed by the unique values, with the number of occurrences as its values. You can then look up the count for each date:
counts = df["~column name~"].value_counts()
totalOfVals = []
for date in uniqueVals:
    numDate = counts[date]
    totalOfVals.append(numDate)
Then you can use the two arrays you have for the unique dates and the amount of dates there are to then use matplotlib to create a graph.
You'll want to use the syntax:
import matplotlib.pyplot as mpl
mpl.plot(uniqueVals, totalOfVals, color = "~whatever colour you want the line to be~", marker = "~whatever you want the marker to look like~")
mpl.xlabel('Date')
mpl.ylabel('Number of occurrences')
mpl.title('Number of occurrences of dates')
mpl.grid(True)
mpl.show()
And that should display a graph with all the dates and number of occurrences with a grid behind it. Of course, if you don't want the grid, either pass False to mpl.grid or remove that line.
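
A shorter route, sketched under the assumption that the CSV is a single headerless column of dates like the one in the question. The column name date and the inline CSV text are made up for the example.

```python
import io
import pandas as pd

# Hypothetical stand-in for the CSV file: one headerless column of dates.
csv_text = "2022-01-12\n2022-01-12\n2022-01-12\n2022-01-13\n2022-01-13\n2022-01-14\n"
df = pd.read_csv(io.StringIO(csv_text), header=None, names=["date"], parse_dates=["date"])

# value_counts() counts every distinct date in one call;
# sort_index() puts the dates back in chronological order.
counts = df["date"].value_counts().sort_index()
```

Since counts is a Series indexed by date, calling counts.plot(marker="o") should draw the frequency line directly, without the intermediate lists.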

How to create a line plot using the mean of a column and extracting year from a Date column?

Update: I've now managed to solve this. For extracting the year this is what I used,
df['year'] = pd.DatetimeIndex(df['Date']).year
this allowed me to add a new column for the year and then use that column to plot the chart.
sns.lineplot(y="Class", x="year", data=df)
plt.xlabel("Year",fontsize=20)
plt.ylabel("Success Rate",fontsize=20)
plt.show()
Now I managed the right plot chart.
I'm trying to get a line plot using the mean of a column and linking that to the year extracted from the date column. However, I can't seem to get the right outcome.
Here's how I extracted the Year value from the date column,
year=[]
def Extract_year(date):
    for i in df["Date"]:
        year.append(i.split("-")[0])
    return year
And here's how I plotted the values to create a line plot,
sns.lineplot(y=df['Class'].mean(), x=Extract_year(df))
plt.xlabel("Year",fontsize=20)
plt.ylabel("Success Rate",fontsize=20)
plt.show()
But instead of seeing a trend (see screenshot-1), it only displays a straight line (see screenshot-2) for the mean value. Could someone please explain to me, what I am doing wrong and how can I correct it?
Thanks!
What you are plotting is df['Class'].mean(), which is a single fixed value, so the plot is a flat line. I don't know which date format you're using, but you probably need to calculate a separate mean for each year.
EDIT:
Yes there is:
df = pd.DataFrame({'Date':['2020-01-20','2019-01-20','2022-01-20','2021-01-20','2012-01-20','2013-01-20','2016-01-20','2018-01-20']})
years = pd.to_datetime(df['Date'],format='%Y-%m-%d').dt.year.sort_values().tolist()
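
To get one mean per year, as suggested above, one option is to group by the extracted year. This is a small sketch with invented Date and Class values, not the asker's data:

```python
import pandas as pd

# Invented sample data: Class is a 0/1 outcome, Date is a YYYY-MM-DD string.
df = pd.DataFrame({
    "Date": ["2019-01-20", "2019-06-01", "2020-01-20", "2020-07-15", "2021-03-02"],
    "Class": [0, 1, 1, 1, 0],
})

# Extract the year, then average Class within each year.
df["year"] = pd.to_datetime(df["Date"], format="%Y-%m-%d").dt.year
yearly_mean = df.groupby("year")["Class"].mean()
```

yearly_mean is a Series indexed by year, so it can be drawn with yearly_mean.plot() or passed to a plotting function as x=yearly_mean.index, y=yearly_mean.values.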

xlwings - Get data range of existing chart

let's say I have an excel file, where there is data from A1 to C5. Meaning it looks like this:
  A    B C
1 1997 1 2
2 1997 2 4
3 1997 3 5
I now have one graph that plots the first time series B so the range of the graph is "A1:B3". The second graph is plotting time series C so the range in xlwings language is ("A1:A3, C1:C3").
What I want to do is open the graph in python with xlwings and extract the range of the graph. I already tried:
import xlwings as xw
wb = xw.Book("myfile.xlsx")
ws = wb.sheets["mysheet"]
for chart in ws.charts:
    print(chart.parent.used_range)
But this only gives back the range of all data of that sheet. So in this case "A1:C3" and not the range of the data the chart uses.
Is there any way to extract the exact range of data the chart uses?
Best,
Stefan
Even directly in VBA, the overall chart source data range is not available. In many cases, this range is undefined: if series have different X values, for example, or if series have a different number of points, or if series are plotted out of order, etc.
But you can get the range for the individual series in the chart through the series formulas, and along with some validation and adjustment, merge these ranges to get the source data range.
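
As a rough illustration of the series-formula idea, the sketch below splits a simple formula of the form =SERIES(name,x_values,y_values,order) into its parts. The helper parse_series_formula and the example formula are hypothetical, and how you read the formula string from the chart (it goes through the underlying Excel object model and is platform-specific) is not shown. Formulas with multi-area ranges or commas in sheet names would need a more careful parser.

```python
def parse_series_formula(formula: str) -> dict:
    """Split a simple =SERIES(name,x_values,y_values,order) formula into its parts."""
    prefix, suffix = "=SERIES(", ")"
    if not (formula.startswith(prefix) and formula.endswith(suffix)):
        raise ValueError(f"not a SERIES formula: {formula!r}")
    parts = formula[len(prefix):-len(suffix)].split(",")
    if len(parts) != 4:
        # Multi-area ranges or unusual sheet names break the naive comma split.
        raise ValueError("formula is not in the simple four-argument form")
    name, x_values, y_values, order = (p.strip() for p in parts)
    return {"name": name, "x_values": x_values, "y_values": y_values, "order": int(order)}


# Hypothetical formula for a series plotted from column C against column A.
example = "=SERIES(mysheet!$C$1,mysheet!$A$2:$A$3,mysheet!$C$2:$C$3,2)"
parsed = parse_series_formula(example)
```

Collecting the x_values and y_values entries for every series in a chart gives the individual ranges, which can then be validated and merged as described above.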

Seaborn lineplot hue input could not be interpreted

I am trying to plot my dataframe as a lineplot.
The data is 2D movement data of x and y coordinates.
The dataframe has a column which identifies the data of each individual by a unique ID and a column that identifies the test group of the individual and an additional relevant column that shows the timepoints.
index Location_Center_Y unique_id Location_Center_X classifier
0 0 872.044 B21 0.000 ctrl
1 1 868.727 B21 -3.317 ctrl
2 2 864.918 B21 -7.126 ctrl
3 3 866.462 B21 -5.582 ctrl
I do want to display the data of each individual in a lineplot and want the lines to have different colours based on the test group.
Getting each individual as a single track I achieved by plotting the data of each individual at a time.
I tried using the input units='unique_id' but this unfortunately only works for seaborn.scatterplot. When using it with seaborn.lineplot it raises the error
"ValueError: Could not interpret input 'unique_id'"
But whatever, looping works. However I want it coloured by the different groups (classifier column). This should be doable by using the input argument hue='classifier'.
#looping through the individuals
for n in data.cells:
    ix=data.tracks[data.tracks['unique_id']==n]
    ax=sns.lineplot(ix['Location_Center_X_Zeroed'],
                    ix['Location_Center_Y_Zeroed'], hue='classifier')
However, again this raises the error
"ValueError: Could not interpret input 'unique_id'".
So I have no idea how to group my plot.
I should get something like this but with only 2 colours
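It's hard to be sure since you didn't provide enough data for me to directly try it out, but it seems like this is what you are looking for? Passing the dataframe through data= lets seaborn resolve 'classifier' and 'unique_id' as column names, and estimator=None is required together with units so that each individual is drawn as its own line without aggregation.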
sns.lineplot(data=df, x='Location_Center_X', y='Location_Center_Y',
hue='classifier', units="unique_id", estimator=None)

matplotlib set xticks to column, labels to corresponding index

I am pretty new to matplotlib and I find the tick locators and labels confusing, so please bear with me. I swear I've been googling for hours.
I have a dataframe 'frame' like this (relevant columns):
dayofweek sla
weekday
Mon 1 0.889734
Tue 2 0.895131
Wed 3 0.879747
Thu 4 0.935000
Fri 5 0.967742
Sat 6 0.852941
Sun 7 1.000000
where the weekday name is the index and the weekday number is a column. There are no datetime objects in this frame.
I turn this into a plt.figure
fig=plt.figure(figsize=(7,5))
ax=plt.subplot(111)
I need to have my x-axis as numeric values, because I want to add a scatter plot later, which is not possible with string values.
x_=frame.dayofweek.values
anbar=ax.bar(x_,y_an,width=0.8,color=an_c,label='angekommen')
This works ok
So basically I want my xticks to be the 'dayofweek' column and their labels to be the corresponding index.
Now if I set_xticklabels manually by
ax.set_xticklabels(frame.index)
the labels start from position 0 on the axis.
I can work around this by rearranging the list of labels, but there should be a 'correct' way to use the Locators or Formatter, but (see above) this is quite confusing for me.
Can someone point me to how I make the labels correspond to their index?
The straightforward solution is to set not only the xticklabels but also the ticks themselves:
ax.set_xticks(frame.dayofweek.values)
ax.set_xticklabels(frame.index)
The same can be accomplished with a FixedLocator and a FixedFormatter,
ax.xaxis.set_major_locator(matplotlib.ticker.FixedLocator(frame.dayofweek.values))
ax.xaxis.set_major_formatter(matplotlib.ticker.FixedFormatter(frame.index))
but seems quite unnecessary for this simple task.
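
For reference, here is a self-contained sketch of the first variant, rebuilding the frame from the values shown in the question. The bar colour and the y_an values from the original snippet are not available, so the sla column is plotted instead as a stand-in.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Frame rebuilt from the question: weekday names as index, weekday numbers as a column.
frame = pd.DataFrame(
    {
        "dayofweek": [1, 2, 3, 4, 5, 6, 7],
        "sla": [0.889734, 0.895131, 0.879747, 0.935000, 0.967742, 0.852941, 1.000000],
    },
    index=pd.Index(["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"], name="weekday"),
)

fig, ax = plt.subplots(figsize=(7, 5))
ax.bar(frame["dayofweek"].to_numpy(), frame["sla"].to_numpy(), width=0.8)

# Fix the tick positions first, then attach one label per position.
ax.set_xticks(frame["dayofweek"].to_numpy())
ax.set_xticklabels(frame.index.tolist())
```

The x-axis stays numeric, so a scatter plot added later with ax.scatter can reuse the same dayofweek values.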
