Plotting a pandas.core.series.Series: x-axis values not showing (Python)

I am trying to plot the availability of my network per hour. I have a massive dataframe containing multiple variables, including Availability and Hour. I can see everything I want to plot when I do the following:
mond_data = mond_data.groupby('Hour')['Availability'].mean()
The only problem is that if I wrap the whole expression in parentheses and call .plot() on it (i.e. (the code above).plot()), I do not get any values on my x-axis labelled 'Hour'. How can I plot this so that the x-axis shows the Hour values? I should have 24 values, since the code above gives an average for each hour of the day, from midnight to 11pm.

Here is how I solved it:
hourly = mond_data.groupby('Hour')['Availability'].mean()
plt.plot(hourly.index, hourly)
For some reason the index (Hour) was not plotted unless I passed it explicitly. I have not tested many cases, so further explanation of this problem is still welcome.
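For reference, a minimal self-contained sketch of the hourly plot, assuming a DataFrame with an integer Hour column (0 to 23) and a numeric Availability column; the sample data below is made up. Series.plot() uses the index as the x values, and setting the ticks explicitly gives one label per hour:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for mond_data: 7 days of hourly availability readings.
rng = np.random.default_rng(0)
mond_data = pd.DataFrame({
    "Hour": np.tile(np.arange(24), 7),
    "Availability": rng.uniform(95, 100, size=7 * 24),
})

# Mean availability per hour of day: a Series whose index is Hour (0..23).
hourly = mond_data.groupby("Hour")["Availability"].mean()

fig, ax = plt.subplots()
hourly.plot(ax=ax, marker="o")  # x values come from the Series index
ax.set_xticks(hourly.index)     # one tick per hour, midnight to 11pm
ax.set_xlabel("Hour")
ax.set_ylabel("Mean availability")
```

Calling plt.show() (with an interactive backend) or fig.savefig(...) afterwards displays or stores the chart.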

Related

How to find periods in time series data in Python

I have time series data. How do I get its period?
The first graph in the first picture and the first graph in the second picture are the same graph;
please ignore the second graph in the second picture.
We want to measure micro-current while a person works out.
This chart shows a person pressing the sensor with a finger,
because we have not built a good sensor yet.
To remove noise, my team set all values below 400 to 0, but we can revert this to the original data.
There seem to be 7 similar periods.
I have tried:
https://github.com/gsubramani/SignalRecognition
but this raised an error and the code did not work well.
https://github.com/guillaume-chevalier/seq2seq-signal-prediction
My computer has no GPU, so I couldn't test it properly; it raised errors.
https://github.com/tbnsilveira/STFT_analysis/blob/master/STFT_sinusoidal_signal.ipynb
This doesn't seem to show how to get the period.
I am using Python.
Any help would be appreciated. Thank you in advance.
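One common approach (not taken from any of the repositories above) is to estimate the dominant period from the signal's autocorrelation: a periodic signal correlates strongly with itself when shifted by one period. A minimal NumPy sketch, assuming an evenly sampled 1-D signal; estimate_period is a hypothetical helper name:

```python
import numpy as np

def estimate_period(signal, min_lag=1):
    """Estimate the dominant period (in samples) of an evenly sampled 1-D signal.

    Sketch only: assumes the signal is roughly periodic with at least two periods.
    """
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()  # remove the DC offset so it does not dominate the correlation
    n = len(x)
    # Autocorrelation for non-negative lags 0..n-1.
    ac = np.correlate(x, x, mode="full")[n - 1:]
    ac /= ac[0]  # normalise so that lag 0 equals 1
    # Skip the central lobe around lag 0: start after the first negative value.
    neg = np.where(ac < 0)[0]
    start = max(min_lag, neg[0]) if len(neg) else min_lag
    # The highest remaining peak (up to half the length) is taken as the period.
    return int(start + np.argmax(ac[start:n // 2]))

t = np.arange(350)
signal = np.sin(2 * np.pi * t / 50)  # 7 periods of 50 samples each
print(estimate_period(signal))  # → 50
```

The result is in samples; multiply it by the sampling interval to get a time. Zeroing values below 400 changes the shape of the signal, so running this on the original data may give a cleaner estimate.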

Can't get sorted bar plot

I am trying to get a bar plot of feature importances from an XGBoost classifier. It should have worked, but it didn't, even after many attempts. Can you check the code below and tell me what is wrong with it?
feat_import=clf.feature_importances_
feat_names=X.columns
sorted_idx=clf.feature_importances_.argsort()[-20:]
plt.barh(feat_names[sorted_idx], clf.feature_importances_[sorted_idx])
It selects the 20 most important features, but it plots them unsorted.
When I use plain numbers instead of column names, I get a sorted bar graph:
plt.barh(range(20),feat_import[sorted_idx])
I couldn't figure out the problem here.
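A workaround that does not depend on how a given matplotlib version orders string categories on an axis: draw the bars at numeric positions (which, as noted above, already come out sorted) and attach the feature names as tick labels afterwards. A minimal sketch, with made-up importances standing in for clf.feature_importances_ and X.columns:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-ins for clf.feature_importances_ and X.columns.
feat_import = np.array([0.05, 0.30, 0.10, 0.25, 0.02, 0.28])
feat_names = pd.Index(["age", "income", "height", "score", "zip", "tenure"])

top_k = 4
# argsort is ascending, so the last top_k indices are the most important features.
sorted_idx = feat_import.argsort()[-top_k:]

positions = np.arange(len(sorted_idx))
fig, ax = plt.subplots()
# Numeric positions fix the bar order (barh draws position 0 at the bottom).
ax.barh(positions, feat_import[sorted_idx])
# Attach the names in the same order as the bars.
ax.set_yticks(positions)
ax.set_yticklabels(list(feat_names[sorted_idx]))
ax.set_xlabel("Feature importance")
```

With the real model, replace the two hypothetical arrays with clf.feature_importances_ and X.columns and set top_k = 20.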

How to automatically reduce the range of a chart?

First of all, sorry for my English; it is not my first language.
I have recently started learning Python and I am trying to develop a "simple" program, but I have run into a problem.
I am using xlwings to modify and interact with Excel. What I want to achieve (or at least to know whether it's possible) is the following:
Excel reads some data and plots a chart from it. However, this chart sometimes has, for example, 20 values on the X-axis and other times only 10, which leaves 10 empty #N/A slots. In that case I want the chart to show only the 10 values, by changing the range the chart is built from.
The function get_prod_hours() looks how many values I want on the X-Axis:
def get_prod_hours():
    """From the input, get the production hours used to adapt the charts."""
    # wb is the xlwings Book opened elsewhere in the program
    dt = wb.sheets['Calculatrice']
    return dt.range('E24').value
Based on the value returned by this function, I need to shrink the range of values the chart uses.
Solutions such as creating the charts from scratch are not acceptable, because the Excel file is a "standard" at my company; I only want to modify the range of the existing chart.
I am hoping for something like this:
Column A in Excel contains the values 1, 2, 3, 4, 5 and get_prod_hours() returns 5, so my chart has only 5 points and not, for example, 6 points of which one is #N/A.
Thank you very much, and sorry for the wall of text.
The xlwings API doesn't offer a lot of options for charts (see https://docs.xlwings.org/en/stable/api.html?highlight=charts#xlwings.main.Charts).
Try to find the chart in wb.sheets[0].charts.
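The range can then be modified with something like:
sheet = wb.sheets[0]
chart = sheet.charts[0]  # assumes the chart of interest is the first one on the sheet
n_rows = int(get_prod_hours())  # Excel returns numbers as floats
chart.set_source_data(sheet.range((1, 1), (n_rows, 1)))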
But from looking at the API and knowing how many options Excel charts have, the API feels too thin.
If this doesn't work, another option is to add a VBA macro that modifies the chart and call it from Python. See "How do I call an Excel macro from Python using xlwings?"

How do I plot this time series?

I am trying to plot this time series in a chart, but the canvas is empty.
As you can see in the image above, my time series is quite simple. I want to plot DATE on the x-axis and PAYEMS on the y-axis.
At first I was getting an error because my dates were strings, so I converted them in cell 11.
You do not want to use tsplot to plot a time series. The name is a bit confusing, but as the documentation puts it, tsplot is "intended to be used with data where observations are nested within sampling units that were measured at multiple timepoints". As a rule of thumb: if you understand this sentence, you will know when to use it; if you don't, don't use it. Apart from that, tsplot is deprecated and will be removed or significantly altered in the future.
But that doesn't matter, because you can directly use pandas to plot the time series.
df.plot(x="DATE", y="PAYEMS")
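A minimal end-to-end sketch, assuming the columns are named DATE (initially strings) and PAYEMS as in the question; the four rows are made-up sample values, not real data. Converting DATE with pd.to_datetime lets pandas treat the x-axis as time:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample with the same shape as the question's data.
df = pd.DataFrame({
    "DATE": ["2019-01-01", "2019-02-01", "2019-03-01", "2019-04-01"],
    "PAYEMS": [150000, 150200, 150350, 150600],
})

# Convert the string dates to datetime64 values.
df["DATE"] = pd.to_datetime(df["DATE"])

# pandas draws the line directly; no seaborn tsplot needed.
ax = df.plot(x="DATE", y="PAYEMS", legend=False)
ax.set_ylabel("PAYEMS")
```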

How to set x lim to 99.5 percentile of my data series for matplotlib histogram?

I'm currently producing a number of histograms with matplotlib. The issue is that one or two outliers squeeze the whole graph, making it almost impossible to read, because it ends up looking like two separate histograms. The fix I am struggling with is dropping the outliers above roughly the 99th/99.5th percentile. I have tried using:
plt.xlim([np.percentile(df,0), np.percentile(df,99.5)])
plt.xlim([df.min(),np.percentile(df,99.5)])
It seems like it should be a simple fix, but I'm missing some key information to make it happen. Any input would be much appreciated; thanks in advance.
To restrict focus to just the middle 99% of the values, you could do something like this:
trimmed_data = df[(df.Column > df.Column.quantile(0.005)) & (df.Column < df.Column.quantile(0.995))]
Then you could plot your histogram from trimmed_data. Exactly how to exclude outliers is more of a statistics question than a Python question, but the idea I suggested in a comment is to clean up the dataset using whatever methods you can defend, and then do everything (plots, stats, etc.) on the cleaned dataset only, rather than tweaking each individual plot to look right while the outliers are still in the data.
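A minimal sketch of two ways to act on this, using made-up data with two extreme outliers and a placeholder column named Column: option 1 histograms only the trimmed data; option 2 keeps all the data but passes range=(lo, hi) to hist, so the bins are computed only inside the 0.5th to 99.5th percentile window and values outside it are ignored:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: well-behaved values plus two extreme outliers.
rng = np.random.default_rng(42)
df = pd.DataFrame({"Column": np.concatenate([rng.normal(10, 2, 1000), [500.0, 800.0]])})

# Bounds of the middle 99% of the values.
lo, hi = df["Column"].quantile([0.005, 0.995])
trimmed_data = df[(df["Column"] > lo) & (df["Column"] < hi)]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
# Option 1: histogram of the cleaned data only.
ax1.hist(trimmed_data["Column"], bins=50)
ax1.set_title("Trimmed data")
# Option 2: all data, but the 50 bins span only [lo, hi].
ax2.hist(df["Column"], bins=50, range=(lo, hi))
ax2.set_title("range=(lo, hi)")
```

Option 1 matches the advice above (clean once, then reuse trimmed_data everywhere); option 2 is only a per-plot adjustment.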
