Seaborn lineplot hue input could not be interpreted - python

I am trying to plot my dataframe as a lineplot.
The data is 2D movement data of x and y coordinates.
The dataframe has one column that identifies each individual by a unique ID, one column that gives the individual's test group, and one column that holds the timepoints.
   index  Location_Center_Y  unique_id  Location_Center_X  classifier
0      0            872.044        B21              0.000        ctrl
1      1            868.727        B21             -3.317        ctrl
2      2            864.918        B21             -7.126        ctrl
3      3            866.462        B21             -5.582        ctrl
I want to display each individual's data as its own line, with the line colours determined by the test group.
I got one track per individual by plotting the data of one individual at a time.
I tried passing units='unique_id', but this unfortunately only works for seaborn.scatterplot. With seaborn.lineplot it raises the error
"ValueError: Could not interpret input 'unique_id'"
Looping works, though. However, I want the lines coloured by group (the classifier column). This should be doable with the argument hue='classifier'.
# looping through the individuals
for n in data.cells:
    ix = data.tracks[data.tracks['unique_id'] == n]
    ax = sns.lineplot(ix['Location_Center_X_Zeroed'],
                      ix['Location_Center_Y_Zeroed'], hue='classifier')
However, again this raises the error
"ValueError: Could not interpret input 'unique_id'".
So I have no idea how to group my plot.
I should get something like this but with only 2 colours

It's hard to be sure since you didn't provide enough data for me to try it out directly, but it seems like this is what you are looking for:
sns.lineplot(data=df, x='Location_Center_X', y='Location_Center_Y',
             hue='classifier', units="unique_id", estimator=None)

Related

Calculating the cumulative bottom values for a stacked bar chart when the length of the array varies

Found how to do it:
I used pandas to group by strike and expiration and sum openInterest. Then, after a couple of hours of scratching my head, I learned what .unstack() does and applied it:
y = option_chain.groupby(['strike', 'expirationDate'])['openInterest'].sum().unstack(level=-1)
y.plot.bar(stacked=True)
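To see what .unstack() does here, a small sketch on a made-up option chain (the numbers and dates are invented; only the column names come from the post). Strikes that are missing for an expiration simply become NaN, so every expiration ends up on the same set of strikes:

```python
import pandas as pd

# Made-up option chain: the two expirations do not share all strikes.
option_chain = pd.DataFrame({
    "strike": [100, 105, 110, 100, 110],
    "expirationDate": ["2021-01-15"] * 3 + ["2021-02-19"] * 2,
    "openInterest": [10, 20, 30, 5, 15],
})

y = (option_chain
     .groupby(["strike", "expirationDate"])["openInterest"]
     .sum()
     .unstack(level=-1))  # rows: strike, columns: expirationDate

print(y)
# y.plot.bar(stacked=True) then stacks the columns for each strike.
```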
I am looking to plot a stacked bar chart for options open interest. I am looking to do the exact same thing as the person in this link : https://medium.com/#txlian13/webscraping-options-data-with-python-and-yfinance-e4deb0124613 . I have the data from the same source and I have it arranged in the same way.
My problem is that I can't find a way to calculate the bottom argument, and the chart looks like this:
all the values start at y=0 and not at the previous bar height.
I tried this code, among other options, but did not manage to make it work.
exp is a list of all possible expiration dates for the options
bottom = np.zeros(12)  # (using 12 because I am testing with the same stock, so I know my first array needs to be 12 to match the number of strikes for the first date)
for i in exp:
    z = option_chain.loc[option_chain['expirationDate'] == i]
    zx = z['strike']
    zy = z['openInterest']
    # here i print my bottom and its an empty array of 0s so it will plot the next line from 0
    plt.bar(zx, zy, label=i, alpha=0.7, bottom=bottom)
    bottom += zy
    # i print bottom again here and I can see that it has the 12 correct values of the open interest
    # then i get an error "ValueError: shape mismatch: objects cannot be broadcast to a single shape"
So my problem is that the strikes (my x values) change with every iteration. For example, my first iteration has 12 values for x and the second one has 9 values for x.
So, is there a way to have a variable-length array that changes with my x? I also realize this will lead to another problem: how to match the x's so that each value gets added to the correct strike.
One way I was thinking of is to find which date has the most strikes and use that as my base, but the problem is that the date with the most strikes is not guaranteed to contain all the strikes of the other dates.
If the problem can be easily fixed with another plotting package, I have no issue with using that. I am a finance graduate just trying to learn Python, so I have only used matplotlib, as it's the one with the most learning materials out there.
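For the manual plt.bar loop, one possible way around the shape mismatch is to put every expiration on the union of all strikes before stacking. This is only a sketch on invented data, assuming missing strikes should count as zero open interest; the bar width is an arbitrary choice for this toy data.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up option chain: the two expirations do not share all strikes.
option_chain = pd.DataFrame({
    "strike": [100, 105, 110, 100, 110],
    "expirationDate": ["2021-01-15"] * 3 + ["2021-02-19"] * 2,
    "openInterest": [10, 20, 30, 5, 15],
})

strikes = np.sort(option_chain["strike"].unique())  # shared x values
bottom = np.zeros(len(strikes))                      # running bar tops

fig, ax = plt.subplots()
for exp_date, z in option_chain.groupby("expirationDate"):
    # Align this expiration to the shared strikes; absent strikes become 0.
    heights = (z.groupby("strike")["openInterest"].sum()
                .reindex(strikes, fill_value=0)
                .to_numpy(dtype=float))
    ax.bar(strikes, heights, bottom=bottom, label=exp_date, alpha=0.7, width=3)
    bottom += heights  # the next expiration starts where this one ended
ax.legend()
```

Because heights always has one entry per shared strike, bottom never changes length and each value lands on the correct strike.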

xlwings - Get data range of existing chart

Let's say I have an Excel file with data from A1 to C3, meaning it looks like this:
     A     B  C
1    1997  1  2
2    1997  2  4
3    1997  3  5
I now have one graph that plots the first time series B so the range of the graph is "A1:B3". The second graph is plotting time series C so the range in xlwings language is ("A1:A3, C1:C3").
What I want to do is open the graph in python with xlwings and extract the range of the graph. I already tried:
wb = xw.Book("myfile.xlsx")
ws = wb.sheets["mysheet"]
for chart in ws.charts:
    print(chart.parent.used_range)
But this only gives back the range of all data of that sheet. So in this case "A1:C3" and not the range of the data the chart uses.
Is there any way to extract the exact range of data the chart uses?
Best,
Stefan
Even directly in VBA, the overall chart source data range is not available. In many cases, this range is undefined: if series have different X values, for example, or if series have a different number of points, or if series are plotted out of order, etc.
But you can get the range for the individual series in the chart through the series formulas, and along with some validation and adjustment, merge these ranges to get the source data range.
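An Excel series formula has the general form =SERIES(name, x_values, y_values, plot_order). Once such a formula string is in hand, the ranges can be pulled apart in plain Python. The sketch below assumes the string has already been read from the chart; how to obtain it through xlwings (via the platform-specific native chart object) is not shown. The function name and the sample formula are hypothetical.

```python
def split_series_formula(formula):
    """Split '=SERIES(a,b,c,d)' into its top-level arguments.

    Commas inside parentheses (multi-area ranges) or inside quotes
    (sheet names, literal series names) do not split an argument.
    """
    prefix = "=SERIES("
    if not (formula.startswith(prefix) and formula.endswith(")")):
        raise ValueError(f"not a SERIES formula: {formula!r}")
    body = formula[len(prefix):-1]

    args, current, depth, quote = [], [], 0, None
    for ch in body:
        if quote is not None:
            if ch == quote:          # closing quote
                quote = None
        elif ch in "'\"":            # opening quote
            quote = ch
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == "," and depth == 0:
            args.append("".join(current))  # top-level separator
            current = []
            continue
        current.append(ch)
    args.append("".join(current))
    return args


# Hypothetical formula for a chart of column C against column A.
formula = "=SERIES(mysheet!$C$1,mysheet!$A$2:$A$3,mysheet!$C$2:$C$3,2)"
name_ref, x_ref, y_ref, order = split_series_formula(formula)
print(x_ref, y_ref)
```

Collecting x_ref and y_ref over all series of a chart gives the pieces that would then have to be validated and merged, as described above.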

multiple boxplot in subplots in python

I have 18 individual np.arrays, each containing 30 numbers with a similar range (share = True).
I want to create boxplots for all 18 arrays in a figure of subplots with 1 row and 4 columns. Each subplot will contain a few of the arrays.
How do I do this?
When I try it, it looks like this:
This was my attempt to put them in one figure; the red scribble is what I want it to look like.
I got this solved!
Since there is only 1 row,
I should index with only
axes[num]
instead of
axes[num, 0]
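A minimal sketch of that layout, assuming random stand-in data and an arbitrary split of the 18 arrays into groups of five (neither is from the post). With a single row, plt.subplots returns a 1-D array of axes, which is why one index is enough:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
arrays = [rng.normal(size=30) for _ in range(18)]  # stand-in data

fig, axes = plt.subplots(1, 4, sharey=True, figsize=(12, 3))

per_axis = 5  # hypothetical grouping: 5 + 5 + 5 + 3 arrays
for i in range(4):
    group = arrays[i * per_axis:(i + 1) * per_axis]
    axes[i].boxplot(group)  # axes[i], not axes[i, 0]
    axes[i].set_title(f"arrays {i * per_axis + 1}-{i * per_axis + len(group)}")
```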

Creating a line graph for a top X in a dataframe (Pandas)

I'm trying to make a line graph for my dataframe that has the names of 10 customers on the X axis and the number of purchases they made on the Y axis.
I have over 100 customers in my dataframe, so I created a new dataframe that is grouped by customer and shows the sum of their orders. I want to display only the top 10 customers on my graph.
I have tried using
TopCustomers.nlargest(10, 'Company', keep='first')
But I run into the error nlargest() got multiple values for argument 'keep', and if I don't use keep, I am told it's a required argument.
TopCustomers is composed of TopCustomers = raw.groupby(raw['Company'])['Orders'].sum()
Sorting is not required at the moment, but it'd be good to know in advance.
On an additional note: the list of customer names is rather lengthy and, after playing with some dummy data, I see that the labels on the X axis are stacked on top of each other. Is there a way to make the plot bigger so that all 10 are clearly visible, and maybe mark a dot where X and Y meet?
We can use sort_values and tail:
TopCustomers.sort_values().tail(10)
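A short sketch on made-up data (company names and order counts are invented, and 3 stands in for 10). Since TopCustomers is a Series, nlargest takes no column name, which would explain the error in the question: the second positional argument is taken as keep. The rotated labels and markers are one possible answer to the side question, not part of the original answer.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

raw = pd.DataFrame({
    "Company": ["A", "B", "A", "C", "B", "D"],
    "Orders": [5, 3, 2, 9, 1, 6],
})
TopCustomers = raw.groupby("Company")["Orders"].sum()  # a Series, not a DataFrame

top = TopCustomers.sort_values().tail(3)   # ascending, largest last
top_alt = TopCustomers.nlargest(3)         # Series.nlargest: no column argument

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(top_alt.index, top_alt.to_numpy(), marker="o")  # dot at every point
ax.tick_params(axis="x", rotation=45)                   # avoid overlapping labels
fig.tight_layout()
```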

How to change axis limits for time in Matplotlib?

I have a data set stored in a Pandas dataframe object, and the first column of the dataframe is a datetime type, which looks like this:
0 2013-09-09 10:35:42.640000
1 2013-09-09 10:35:42.660000
2 2013-09-09 10:35:42.680000
3 2013-09-09 10:35:42.700000
I have another column called eventno, which looks like:
0 0
1 0
2 0
3 0
I am trying to create a scatter plot with Matplotlib, and once I have the scatter plot ready, I would like to change the range in the date axis (x-axis) to focus on certain times in the data. My problem is, I could not find a way to change the range the data will be plotted over in the x axis. I tried this below, but I get a Not implemented for this type error.
plt.figure(figsize=(13,7), dpi=200)
ax.set_xlim(['2013-09-09 10:35:00','2013-09-09 10:36:00'])
scatter(df2['datetime'][df.eventno<11],df2['eventno'][df.eventno<11])
If I comment out the ax.set_xlim line, I get the scatter plot, but with some default x axis range that does not even match my dates.
Do I have to tell matplotlib that my data is of datetime type, as well? If so, then how can I do it? Assuming this part is somehow accomplished, then how can I change the range of my data to be plotted?
Thanks!
PS: I tried uploading the picture, but I got a "Framing not allowed" error. Oh well... It just plots it from Jan 22 1970 to Jan 27 1970. No clue how it comes up with that :)
Try putting ax.set_xlim after the scatter command.
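A minimal sketch of that ordering on stand-in data. The timestamps are generated to resemble the question's, and passing datetime objects rather than strings as limits is an assumption of this sketch:

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in data resembling the question's columns.
df2 = pd.DataFrame({
    "datetime": pd.date_range("2013-09-09 10:35:42.640", periods=4, freq="20ms"),
    "eventno": [0, 0, 0, 0],
})

fig, ax = plt.subplots(figsize=(13, 7))
ax.scatter(df2["datetime"], df2["eventno"])  # plot first: the axis becomes a date axis

start = dt.datetime(2013, 9, 9, 10, 35, 0)
end = dt.datetime(2013, 9, 9, 10, 36, 0)
ax.set_xlim(start, end)                      # then restrict the date range

lo, hi = ax.get_xlim()  # limits as matplotlib date numbers
```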
