I have an experimental bench from which I retrieve power data for a compressor.
I import the CSV using Python and pandas, so I end up with a DataFrame with a datetime index and a float column P_comp.
I would like to define and calculate the area under the curve for each period, like this:
For the moment I do it manually, which is really tedious: I plot all the data, manually select a range where there is a periodic steady state, and then integrate P_comp over that range with np.trapz.
I tried scipy.signal, but I'm not sure it's the right tool for this job. Do you have any ideas?
It looks like the intervals are fairly regular and the low values are almost equal too, so you might get away with taking the first value below a defined threshold, then, after a period of time, the next one, and so on.
Thank you, I found a solution using scipy.signal.find_peaks and numpy.diff.
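For reference, a minimal sketch of that kind of approach, assuming df is the DataFrame described above (datetime index plus a P_comp column) and that each cycle starts at a power peak; the height and distance values are placeholders to tune:

import numpy as np
from scipy.signal import find_peaks

p = df["P_comp"].to_numpy()
t = (df.index - df.index[0]).total_seconds().to_numpy()   # elapsed time in seconds

# One peak per compressor cycle; height and distance need tuning for the real signal.
peaks, _ = find_peaks(p, height=p.mean(), distance=10)

# Length of each period, in samples, via numpy.diff.
period_lengths = np.diff(peaks)

# Trapezoidal integration of P_comp over each period (energy in joules if
# P_comp is in watts and t is in seconds).
areas = [np.trapz(p[a:b + 1], t[a:b + 1])
         for a, b in zip(peaks[:-1], peaks[1:])]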
There are good technical analysis libraries for Python, like pandas_ta or ta-lib. However, I could not find a way to analyze streaming data with them.
Let me explain what I mean. For example, I have an array of 120 intraday (one-minute timespan) close price values. I calculated RSI based on this data. However, in one minute the data is updated, because I get another close price value for another minute. With a custom RSI implementation, I can easily calculate the next RSI value based on the previously calculated values. However, if I use the TA libraries I mentioned above, I need to recalculate over the whole data from the beginning (or maybe I am missing something).
Is there a way to calculate indicators on streamed data, where each new calculation is based on previously calculated values?
I appreciate any help.
There is TA-Lib RT, a fork of TA-Lib with some fixes and changes, and its biggest innovation is support for exactly this kind of streaming calculation. Unfortunately, its Python wrapper is experimental. There is a discussion of it and its alternatives.
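If the experimental wrapper is a problem, the underlying idea can also be coded by hand. Below is a minimal sketch (not the TA-Lib RT API, just an illustration of the concept) of a Wilder-smoothed RSI that is updated one close at a time, so each new value only needs the previously calculated averages:

class StreamingRSI:
    """Wilder-smoothed RSI updated one close at a time."""

    def __init__(self, period=14):
        self.period = period
        self.prev_close = None
        self.avg_gain = 0.0
        self.avg_loss = 0.0
        self.count = 0

    def update(self, close):
        """Feed one new close; returns the current RSI, or None while warming up."""
        if self.prev_close is None:
            self.prev_close = close
            return None
        change = close - self.prev_close
        gain = max(change, 0.0)
        loss = max(-change, 0.0)
        self.prev_close = close
        self.count += 1
        if self.count <= self.period:
            # Plain averages over the first `period` changes.
            self.avg_gain += (gain - self.avg_gain) / self.count
            self.avg_loss += (loss - self.avg_loss) / self.count
            if self.count < self.period:
                return None
        else:
            # Wilder smoothing: only the previous averages are needed.
            self.avg_gain = (self.avg_gain * (self.period - 1) + gain) / self.period
            self.avg_loss = (self.avg_loss * (self.period - 1) + loss) / self.period
        if self.avg_loss == 0:
            return 100.0
        return 100.0 - 100.0 / (1.0 + self.avg_gain / self.avg_loss)

# Usage: feed the existing 120 one-minute closes, then each new close as it arrives.
# rsi = StreamingRSI(period=14)
# for close in closes:
#     value = rsi.update(close)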
Edit!
For anyone wondering the same thing, I figured it out. There is nothing wrong with the implementations below. It's just that an EMA requires more than 21 data points to compute a 20-period exponential moving average, because the earlier data points affect the values you are trying to calculate. In simple terms: I tested it, and you need about 40-50 data points to get the same 20-day EMA as with 100+ data points.
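A quick way to see this warm-up effect, assuming only a pandas Series of closes (the series below is a placeholder): the last EMA value computed from 22 points differs from the one computed from the full history, and the gap shrinks as more history is included.

import pandas as pd

closes = pd.Series(range(1, 121), dtype=float)   # placeholder price series

ema_from_22 = closes.tail(22).ewm(span=20, adjust=False).mean().iloc[-1]
ema_from_all = closes.ewm(span=20, adjust=False).mean().iloc[-1]
print(ema_from_22, ema_from_all)                 # the two values differ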
I'm trying to calculate the EMA (exponential moving average) of a stock, but there is something wrong with my calculations. I have exported the last 22+ days of stock data for AAPL, and every time I try to calculate the EMA for it, something is off.
Here is the data for my example: https://pastebin.com/raw/2MsgCeQx
Here are the approaches I have tried for calculating the 20-day EMA.
# Imported the data as "data" (a pandas DataFrame with a uClose column).
import talib
import pandas as pd

# With TA-Lib
data["EMA20Talib"] = talib.EMA(data.uClose, timeperiod=20)

# And with pandas
data["EMA20Pandas"] = data["uClose"].ewm(span=20, adjust=False).mean()
Here is an image of the data and the results:
https://i.imgur.com/pFtc7x8.png
As you can see, the Real20EMA does not match the TA-Lib or the pandas 20 EMA. What am I doing wrong?
uClose is the column I'm calculating the EMA on; the "Real20EMA" is taken from TradingView (cross-referenced with MarketWatch to make sure it's correct).
I noticed there was a similar question on here earlier with the same problem: Pandas' EMA not matching the stock's EMA?. The problem was solved there by sorting the index, and I have made sure that mine is correctly sorted, but alas I still get the same problem.
I want to get the same numbers as the other finance sites using some tool. Weirdly enough, the two methods I tried do not even return the same result as each other.
I suggest using Pandas TA to calculate technical indicators in Python. I find it more accurate, and it is easier to install than TA-Lib.
Using Pandas TA, the 20-period exponential moving average is calculated like this:
import pandas_ta as ta
data["EMA20"] = ta.ema(data["uClose"], length=20)
I am trying to decompose a time series; however, my data does not have dates. It is composed of entries taken at regular (but unknown) time intervals.
This solution is great and exactly what I want; however, it assumes that my series has a datetime index, which it does not.
I can estimate the frequency parameter in this specific case, but this will need to be automated for different data, so I cannot simply hard-code the freq parameter of the seasonal_decompose function (unless there is some way to calculate it automatically) to make up for the fact that my series lacks a datetime index.
I have managed to estimate the season length by using the seasonal Python package:
using the fit_seasons function and then taking the length of the returned seasons.
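In case it helps anyone else, a minimal sketch of how the two pieces can fit together, assuming the seasonal package's fit_seasons and statsmodels' seasonal_decompose (the argument is named period in recent statsmodels and freq in older versions); the data below is a placeholder:

import numpy as np
from seasonal import fit_seasons
from statsmodels.tsa.seasonal import seasonal_decompose

# Placeholder series: a seasonal signal of period 25 plus a slow trend.
values = np.sin(np.arange(300) * 2 * np.pi / 25) + 0.01 * np.arange(300)

seasons, trend = fit_seasons(values)   # seasons holds one estimated cycle (None if nothing is found)
period = len(seasons)                  # the estimated season length

result = seasonal_decompose(values, model="additive", period=period)
# result.trend, result.seasonal and result.resid are then available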
I want to make a music (WAV) visualizer in Python.
I have code to get the volume and frequency, but my output is only a single pair, e.g. 440 Hz, 30 dB.
I want to see several bands at the same time, e.g.:
100 Hz, 5 dB
400 Hz, 20 dB
800 Hz, 30 dB
1600 Hz, 20 dB
4000 Hz, 2 dB
How can I do that?
I would need more details to be sure, but I believe that some sort of FFT algorithm would be required.
Perhaps try something from numpy.fft or one of the scipy FFT implementations.
Both of those would need some love to convert amplitude to dB, but it seems possible.
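For what it's worth, a minimal sketch along those lines, assuming a file called song.wav and a handful of example band centres (both are placeholders):

import numpy as np
from scipy.io import wavfile

rate, samples = wavfile.read("song.wav")            # sample rate (Hz) and raw samples
if samples.ndim > 1:                                # mix stereo down to mono
    samples = samples.mean(axis=1)

chunk = samples[:4096].astype(float)                # analyse one short window
spectrum = np.abs(np.fft.rfft(chunk))               # amplitude per frequency bin
freqs = np.fft.rfftfreq(len(chunk), d=1.0 / rate)   # frequency of each bin (Hz)

bands = [100, 400, 800, 1600, 4000]                 # example band centres (Hz)
for f in bands:
    i = np.argmin(np.abs(freqs - f))                # nearest FFT bin
    db = 20 * np.log10(spectrum[i] + 1e-12)         # amplitude to dB (relative scale)
    print(f"{f}hz, {db:.0f}db")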
I'm currently pumping out some histograms with matplotlib. The issue is that, because of one or two outliers, my whole graph is incredibly compressed and almost impossible to read, especially since two separate histograms are being plotted. The fix I'm having problems with is dropping the outliers at around the 99th/99.5th percentile. I have tried using:
plt.xlim([np.percentile(df,0), np.percentile(df,99.5)])
plt.xlim([df.min(),np.percentile(df,99.5)])
It seems like it should be a simple fix, but I'm missing some key information to make it happen. Any input would be much appreciated; thanks in advance.
To restrict focus to just the middle 99% of the values, you could do something like this:
trimmed_data = df[(df.Column > df.Column.quantile(0.005)) & (df.Column < df.Column.quantile(0.995))]
Then you could do your histogram on trimmed_data. Exactly how to exclude outliers is more of a stats question than a Python question. Basically, the idea I was suggesting in a comment is to clean up the data set using whatever methods you can defend, and then do everything (plots, stats, etc.) on the cleaned data only, rather than trying to tweak each individual plot to look right while the outlier data is still in there.
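Putting it together, a minimal sketch with made-up data and a placeholder column name Column:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data: mostly normal values plus two extreme outliers.
rng = np.random.default_rng(0)
df = pd.DataFrame({"Column": np.append(rng.normal(0, 1, 1000), [50, -40])})

# Keep only the middle 99% of the values.
lo, hi = df["Column"].quantile([0.005, 0.995])
trimmed_data = df[(df["Column"] > lo) & (df["Column"] < hi)]

plt.hist(trimmed_data["Column"], bins=50)
plt.xlabel("Column")
plt.ylabel("Count")
plt.show()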