xlwings - Get data range of existing chart - python

Let's say I have an Excel file with data from A1 to C5, so it looks like this:
     A     B   C
1    1997  1   2
2    1997  2   4
3    1997  3   5
I now have one graph that plots the first time series B so the range of the graph is "A1:B3". The second graph is plotting time series C so the range in xlwings language is ("A1:A3, C1:C3").
What I want to do is open the graph in python with xlwings and extract the range of the graph. I already tried:
import xlwings as xw
wb = xw.Book("myfile.xlsx")
ws = wb.sheets["mysheet"]
for chart in ws.charts:
    print(chart.parent.used_range)
But this only gives back the range of all data of that sheet. So in this case "A1:C3" and not the range of the data the chart uses.
Is there any way to extract the exact range of data the chart uses?
Best,
Stefan

Even directly in VBA, the overall chart source data range is not available. In many cases, this range is undefined: if series have different X values, for example, or if series have a different number of points, or if series are plotted out of order, etc.
But you can get the range for the individual series in the chart through the series formulas, and along with some validation and adjustment, merge these ranges to get the source data range.
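A minimal sketch of the parsing step, assuming each series formula has the usual form =SERIES(name, x_values, y_values, plot_order). The helper name split_series_formula is hypothetical. How the formula text is read from the workbook is left out: on Windows it is typically available through the chart's underlying COM object (reached via chart.api), but that part is platform-specific and is an assumption here.

```python
import re


def split_series_formula(formula):
    """Split an Excel '=SERIES(...)' formula into its top-level arguments."""
    match = re.fullmatch(r"=SERIES\((.*)\)", formula.strip(),
                         flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError(f"not a SERIES formula: {formula!r}")

    args, current = [], []
    depth = 0      # nesting level of parentheses (multi-area ranges)
    quote = None   # quote character we are currently inside, if any
    for ch in match.group(1):
        if quote is not None:
            # Inside a quoted sheet name or literal: commas do not split.
            current.append(ch)
            if ch == quote:
                quote = None
        elif ch in "'\"":
            quote = ch
            current.append(ch)
        elif ch == "(":
            depth += 1
            current.append(ch)
        elif ch == ")":
            depth -= 1
            current.append(ch)
        elif ch == "," and depth == 0:
            # A top-level comma separates two arguments.
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    args.append("".join(current).strip())
    return args


# Example formula for the second chart (series C plotted against column A).
parts = split_series_formula(
    "=SERIES(Sheet1!$C$1,Sheet1!$A$2:$A$3,Sheet1!$C$2:$C$3,2)"
)
name_ref, x_ref, y_ref, order = parts
```

The x and y references of every series can then be collected and, after checking that they are compatible, merged into one source range. Series with literal values or mismatched lengths have no single rectangular range, which is why the validation step matters.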

Related

pandas/matplotlib graph on frequency of appearance

I am a pandas newbie and I want to make a graph from a CSV I have. The CSV contains dates, one per line, and I want to make a graph of how frequently each date appears.
This is how it looks :
2022-01-12
2022-01-12
2022-01-12
2022-01-13
2022-01-13
2022-01-14
Here, we can see that I have three records on the 12th of January, two records on the 13th and only one record on the 14th. So we should see a decrease on the graph.
So, I tried converting my csv like this :
date,records
2022-01-12,3
2022-01-13,2
2022-01-14,1
And then make a graph with the date as the x axis and the records amount as the y axis.
But is there a way pandas (or matplotlib, I never understand which one to use) can make a graph based on the frequency of appearance, so that I don't have to convert the CSV first?
pandas has a function that counts how many times each value occurs.
First off, you'd need to read your csv file into a dataframe. Do this by using:
import pandas as pd
df = pd.read_csv("~csv file name~")
The value_counts() method counts how often each unique value appears in a column, so there is no need to call unique() and loop over the results yourself. Sorting by index puts the dates in chronological order. The syntax should look like:
counts = df["~column name~"].value_counts().sort_index()
uniqueVals = counts.index
totalOfVals = counts.values
Then you can use the two arrays you have for the unique dates and the number of occurrences of each to create a graph with matplotlib.
You'll want to use the syntax:
import matplotlib.pyplot as mpl
mpl.plot(uniqueVals, totalOfVals, color = "~whatever colour you want the line to be~", marker = "~whatever you want the marker to look like~")
mpl.xlabel('Date')
mpl.ylabel('Number of occurrences')
mpl.title('Number of occurrences of dates')
mpl.grid(True)
mpl.show()
And that should display a graph with all the dates and number of occurrences with a grid behind it. Of course if you don't want the grid just either set mpl.grid to False or just get rid of it.
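As a small, self-contained sketch of the counting step: the sample CSV from the question has no header row, so this assumes header=None and gives the single column the hypothetical name "date". The data is read from an in-memory string only to keep the example runnable.

```python
import io

import pandas as pd

# Sample data from the question: one date per line, no header row.
csv_text = (
    "2022-01-12\n2022-01-12\n2022-01-12\n"
    "2022-01-13\n2022-01-13\n"
    "2022-01-14\n"
)

# header=None prevents the first date from being consumed as a column name.
df = pd.read_csv(io.StringIO(csv_text), header=None,
                 names=["date"], parse_dates=["date"])

# Count occurrences of each date, then order the result chronologically.
counts = df["date"].value_counts().sort_index()
```

With matplotlib installed, counts.plot(marker="o") should draw the same line chart directly from the Series, with the dates on the x axis.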

Reshaping a material science Dataset (probably using melt() )

I'm dealing with a materials science dataset and I'm in the following situation,
I have data organized like this:
Chemical_Formula   Property_name        Property_Scalar
He                 Electrical conduc.   1
NO_2               Resistance           50
CuO3               Hardness
...                ...                  ...
CuO3               Fluorescence         300
He                 Toxicity             39
NO2                Hardness             80
...                ...                  ...
As you can see, it is really messy because the same chemical formula appears more than once throughout the dataset, each time referring to a different property. My question is: how can I easily split the dataset into smaller ones, matching every formula with its descriptors in ORDER? (I used fictional names and values, just to explain my problem.)
I'm on Jupyter Notebook and I'm using Pandas.
I'm editing my question trying to be more clear:
My goal would be to plot some histograms of (for example) number of materials vs conductivity at different temperatures (100K, 200K, 300K). So I need to have both conductivity and temperature for each material to be clearly comparable. For example, I guess that a more convenient thing to obtain would be:
Chemical formula   Conductivity   Temperature
He                 5              10K
NO_2               7              59K
CuO_3              10             300K
...                ...            ...
He                 14             100K
NO_2               5              70K
...                ...            ...
I think that this issue can be related to reshaping the dataset but I should also have each formula to MATCH exactly the temperature and conductivity. Thank you for your help!
If you want to plot Conductivity versus Temperature for a given formula, you can simply select the rows that match this condition.
import pandas as pd
import matplotlib.pyplot as plt
formula = 'NO_2'
subset = df.loc[df['Chemical_Formula'] == formula].sort_values('Temperature')
x = subset['Temperature'].values
y = subset['Conductivity'].values
plt.plot(x, y)
Here, we are defining the formula you want to extract. Then we select only the rows in the DataFrame where the value in the column 'Chemical_Formula' matches your specified formula, using df.loc[]. This returns a new DataFrame that contains only the rows where the condition is satisfied. We sort this subset by 'Temperature' (I assume you want to plot Temperature on the x-axis) and store it as subset. We then select the 'Temperature' and 'Conductivity' columns, which are returned as pandas.Series objects, and convert them to numpy arrays by calling .values. We store these in the x and y variables and pass them to the matplotlib plot function.
EDIT:
To get from the first DataFrame to the second DataFrame described in your post, you can use the pivot function (assuming your first DataFrame is named df):
df = df.pivot(index='Chemical_Formula', columns='Property_name', values='Property_Scalar')
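A small sketch of what the pivot call does, using made-up values in the same long layout as the question. The column names follow the code above; the data itself is purely illustrative.

```python
import pandas as pd

# Long format: one row per (formula, property) pair. Values are invented.
long_df = pd.DataFrame({
    "Chemical_Formula": ["He", "He", "NO_2", "NO_2"],
    "Property_name": ["Conductivity", "Temperature",
                      "Conductivity", "Temperature"],
    "Property_Scalar": [5, 10, 7, 59],
})

# Wide format: one row per formula, one column per property.
wide_df = long_df.pivot(index="Chemical_Formula",
                        columns="Property_name",
                        values="Property_Scalar")
```

Note that pivot raises a ValueError when the same formula/property pair occurs more than once, which seems likely in the dataset described (He appears with several conductivity values). In that case the rows need an extra key that tells the measurements apart, or pivot_table with an explicit aggfunc can be used to combine the duplicates.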

Time Series Analysis from reading a csv file with pandas

I want to create a Python program that plots a graph for every 3rd column, using columns 1 and 2 as the date and time (x axis) and column 3 as the value (y axis). Can someone help with the code?

Seaborn lineplot hue input could not be interpreted

I am trying to plot my dataframe as a lineplot.
The data is 2D movement data of x and y coordinates.
The dataframe has a column which identifies the data of each individual by a unique ID and a column that identifies the test group of the individual and an additional relevant column that shows the timepoints.
   index  Location_Center_Y  unique_id  Location_Center_X  classifier
0      0            872.044        B21              0.000        ctrl
1      1            868.727        B21             -3.317        ctrl
2      2            864.918        B21             -7.126        ctrl
3      3            866.462        B21             -5.582        ctrl
I do want to display the data of each individual in a lineplot and want the lines to have different colours based on the test group.
Getting each individual as a single track I achieved by plotting the data of each individual at a time.
I tried using the input units='unique_id' but this unfortunately only works for seaborn.scatterplot. When using it with seaborn.lineplot it raises the error
"ValueError: Could not interpret input 'unique_id'"
But whatever, looping works. However I want it coloured by the different groups (classifier column). This should be doable by using the input argument hue='classifier'.
# looping through the individuals
for n in data.cells:
    ix = data.tracks[data.tracks['unique_id'] == n]
    ax = sns.lineplot(ix['Location_Center_X_Zeroed'],
                      ix['Location_Center_Y_Zeroed'], hue='classifier')
However, again this raises the error
"ValueError: Could not interpret input 'unique_id'".
So I have no idea how to group my plot.
I should get something like this but with only 2 colours
It's hard to be sure since you didn't provide enough data for me to directly try it out, but it seems like this is what you are looking for?
sns.lineplot(data=df, x='Location_Center_X', y='Location_Center_Y',
             hue='classifier', units="unique_id", estimator=None)

How to change axis limits for time in Matplotlib?

I have a data set stored in a Pandas dataframe object, and the first column of the dataframe is a datetime type, which looks like this:
0 2013-09-09 10:35:42.640000
1 2013-09-09 10:35:42.660000
2 2013-09-09 10:35:42.680000
3 2013-09-09 10:35:42.700000
I have another column called eventno, and that one looks like:
0 0
1 0
2 0
3 0
I am trying to create a scatter plot with Matplotlib, and once I have the scatter plot ready, I would like to change the range in the date axis (x-axis) to focus on certain times in the data. My problem is, I could not find a way to change the range the data will be plotted over in the x axis. I tried this below, but I get a Not implemented for this type error.
plt.figure(figsize=(13,7), dpi=200)
ax.set_xlim(['2013-09-09 10:35:00','2013-09-09 10:36:00'])
scatter(df2['datetime'][df.eventno<11],df2['eventno'][df.eventno<11])
If I comment out the ax.set_xlim line, I get the scatter plot, however with some default x axis range, not even matching my dates.
Do I have to tell matplotlib that my data is of datetime type, as well? If so, then how can I do it? Assuming this part is somehow accomplished, then how can I change the range of my data to be plotted?
Thanks!
PS: I tried uploading the picture, but I got a "Framing not allowed" error. Oh well... It just plots it from Jan 22 1970 to Jan 27 1970. No clue how it comes up with that :)
Try putting ax.set_xlim after the scatter command.
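A minimal sketch of that ordering, assuming the column is already of datetime type. The limits are passed as pandas Timestamps rather than strings, so matplotlib can convert them with the same date converter it used for the data. The sample values are taken from the question; the Agg backend is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

df2 = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2013-09-09 10:35:42.640000",
        "2013-09-09 10:35:42.660000",
        "2013-09-09 10:35:42.680000",
        "2013-09-09 10:35:42.700000",
    ]),
    "eventno": [0, 0, 0, 0],
})

fig, ax = plt.subplots(figsize=(13, 7))

# Plot first, so the x axis is set up as a date axis ...
ax.scatter(df2["datetime"], df2["eventno"])

# ... then restrict the visible range using datetime-like limits.
start = pd.Timestamp("2013-09-09 10:35:00")
end = pd.Timestamp("2013-09-09 10:36:00")
ax.set_xlim(start, end)

# Read the limits back as datetimes to check what was applied.
xmin, xmax = (mdates.num2date(v) for v in ax.get_xlim())
```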
