Calculate normalized spectral entropy of time series - python
I have time series data:
c1= c(0.558642328,
0.567173803,
0.572518969,
0.579917556,
0.592155421,
0.600239837,
0.598955071,
0.608857572,
0.615442061,
0.613502347,
0.618076897,
0.626769781,
0.633930194,
0.645518577,
0.66773088,
0.68128165,
0.695552504,
0.6992836,
0.702771866,
0.700840271,
0.684032428,
0.665082645,
0.646948862,
0.621813893,
0.597888613,
0.577744126,
0.555984044,
0.533597678,
0.523645413,
0.522041142,
0.525437844,
0.53053292,
0.543152606,
0.549038792,
0.555300856,
0.563411331,
0.572663951,
0.584438777,
0.589476192,
0.604197562,
0.61670388,
0.624161184,
0.624345171,
0.629342985,
0.630379665,
0.620067096,
0.597480375,
0.576228619,
0.561285031,
0.543921304,
0.530826211,
0.519563568,
0.514228535,
0.515202665,
0.516663855,
0.525673366,
0.543545395,
0.551681638,
0.558951402,
0.566816133,
0.573842585,
0.578611696,
0.589180577,
0.603297615,
0.624550509,
0.641310155,
0.655093217,
0.668385196,
0.671600127,
0.658876967,
0.641041982,
0.605081463,
0.585503519,
0.556173635,
0.527428073,
0.502755737,
0.482510734,
0.453295642,
0.439938772,
0.428757811,
0.422361642,
0.40945864,
0.399504355,
0.412688798,
0.42684828,
0.456935656,
0.48355422,
0.513727218,
0.541630101,
0.559122121,
0.561763656,
0.572532833,
0.576761365,
0.576146233,
0.580199403,
0.584954906)
corresponding to dates
dates = seq(as.Date("2016-09-01"), as.Date("2020-07-30"), by=15)
What I want to do is compute the normalized spectral entropy of this time series. I have read in the literature that a high value indicates high stability of a system.
I have found a function here: https://rdrr.io/cran/ForeCA/man/spectral_entropy.html, but I cannot get it to produce what I want.
I am new to this topic, so any help with interpretation would be appreciated too.
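In case a Python sketch helps (the title says Python even though the snippet above is R): this is one minimal way to compute a normalized spectral entropy with SciPy. It assumes the c1 values have been loaded into a NumPy array x, uses a plain periodogram as the spectral estimate, and divides the Shannon entropy by the log of the number of frequency bins so the result lies in [0, 1]; a value near 1 corresponds to a flat, white-noise-like spectrum, a value near 0 to a spectrum concentrated at a few frequencies. ForeCA::spectral_entropy follows a similar definition, although its default spectrum estimator may differ.

    import numpy as np
    from scipy.signal import periodogram

    def normalized_spectral_entropy(x):
        # Shannon entropy of the normalized periodogram, rescaled to [0, 1].
        x = np.asarray(x, dtype=float)
        freqs, psd = periodogram(x, detrend='constant')
        psd = psd[freqs > 0]                      # drop the zero-frequency (mean) term
        p = psd / psd.sum()                       # treat the spectrum as a probability distribution
        h = -np.sum(p[p > 0] * np.log(p[p > 0]))  # Shannon entropy in nats
        return h / np.log(len(p))                 # divide by the maximum entropy log(#bins)

    # x = np.array([0.558642328, 0.567173803, ...])  # the c1 values from above
    # print(normalized_spectral_entropy(x))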
Related
statistics program gives out values different from test sample
I wrote a program for an educational statistics problem. Simply put, I was supposed to predict prices for the next 250 days, then extract the lowest and highest price from 10k tries of 250-day predictions. I followed the instructions in the problem: use the gauss method from the random module with the mean and std of the given sample. The highest and lowest prices in the test sample are in the range of 45-55, but I predict 18-88. Is there a problem with my code, or is this just not a good method for prediction?

    from random import gauss

    with open('AAPL_train.csv','r') as sheet:
        # we categorize the data here
        Date=[]
        Open=[]
        High=[]
        Low=[]
        Close=[]
        Adj_Close=[]
        Volume=[]
        for lines in sheet.readlines()[1:-1]:
            words=lines.strip().split(',')
            Date.append(words[0])
            Open.append(float(words[1]))
            High.append(float(words[2]))
            Low.append(float(words[3]))
            Close.append(float(words[4]))
            Adj_Close.append(float(words[5]))
            Volume.append(int(words[6]))

    # find the pattern of price changing by finding the day-to-day changes
    subtract=[]
    for i in range(1,len(Volume)):
        subtract.append(Adj_Close[i]-Adj_Close[i-1])

    # find the mean and std of the change pattern
    mean=sum(subtract)/len(subtract)
    accum=0
    for amount in subtract:
        accum += (amount-mean)**2
    var=accum/len(subtract)
    stdev=var**0.5

    worst=[]
    best=[]

    def Getwb():  # a function to predict
        index=Adj_Close[-1]
        index_lst=[]
        for i in range(250):
            index+=gauss(mean,stdev)
            index_lst.append(index)
        worst=(min(index_lst))
        best=(max(index_lst))
        return worst,best

    for i in range(10000):  # try predicting 10000 times and then extract highest and lowest result
        x,y=Getwb()
        worst.append(x)
        best.append(y)

    print(min(worst))
    print(max(best))
Algorithm for efficient portfolio optimization
I'm trying to find the best allocation for a portfolio based on backtesting data. As a general rule, I've divided stocks into large caps and small/mid caps, and into growth and value, and I want no more than 80% of my portfolio in large caps or 70% of my portfolio in value. I need an algorithm that will be flexible enough to use for more than two stocks. So far, what I have is (this uses a class called Ticker):

    randomBoolean=True
    listOfTickers=[]
    listOfLargeCaps=[]
    listOfSmallMidCaps=[]
    largeCapAllocation=0
    listOfValue=[]
    listOfGrowthBlend=[]
    valueAllocation=0

    while randomBoolean:
        tickerName=input("What is the name of the ticker?")
        tickerCap=input("What is the cap of the ticker?")
        tickerAllocation=int(input("Around how much do you want to allocate in this ticker?"))
        tickerValue=input("Is this ticker a Value, Growth, or Blend stock?")
        tickerName=Ticker(tickerCap,tickerValue,tickerAllocation,tickerName)
        listOfTickers.append(tickerName)
        closer=input("Type DONE if you are finished. Type ENTER to continue entering tickers")
        if closer=="DONE":
            randomBoolean=False

    for ticker in listOfTickers:
        if ticker.cap==("Large" or "large"):
            listOfLargeCaps.append(ticker)
        else:
            listOfSmallMidCaps.append(ticker)
        if ticker.value==("Value" or "value"):
            listOfValue.append(ticker)
        else:
            listOfGrowthBlend.append(ticker)

    for largeCap in listOfLargeCaps:
        largeCapAllocation += largeCap.allocation
    if largeCapAllocation>80:
        pass  # run a function that will readjust ticker stuff and decrease allocation to large cap stocks

    for value in listOfValue:
        valueAllocation += value.allocation
    if valueAllocation>70:
        pass  # run a function that will readjust ticker stuff and decrease allocation to value stocks

The "function" I have so far just iterates through -5 to 6, in a sort of:

    for i in range(-5,6):
        ticker1AllocationPercent + i
        ticker2AllocationPercent - i
        # update the bestBalance if the new allocation is better

How would I modify this algorithm to work for 3, 4, 5, etc. stocks, and how would I go about changing the allocations for the large/small-mid cap stocks and such?
As mentioned in the above answer, a quadratic solver is typically used for such problems. You can use the quadratic solver available in Pyportfolio. See this link for more details.
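For what it's worth, a generic constrained optimizer can also handle any number of tickers without hand-rolled nested loops. The sketch below is only illustrative: the expected returns and the large-cap/value indicator vectors are made-up placeholders, and the objective (maximize expected backtest return) stands in for whatever criterion you actually score allocations with; the 80% and 70% caps come from the question.

    import numpy as np
    from scipy.optimize import minimize

    # Hypothetical inputs, one entry per ticker; any number of tickers works.
    expected_return = np.array([0.08, 0.12, 0.10, 0.15])  # placeholder backtest returns
    is_large_cap    = np.array([1, 1, 0, 0])               # 1 if the ticker is a large cap
    is_value        = np.array([1, 0, 1, 0])               # 1 if the ticker is a value stock
    n = len(expected_return)

    def neg_return(w):
        # Maximize expected return by minimizing its negative.
        return -np.dot(w, expected_return)

    constraints = [
        {"type": "eq",   "fun": lambda w: np.sum(w) - 1.0},               # fully invested
        {"type": "ineq", "fun": lambda w: 0.8 - np.dot(w, is_large_cap)}, # at most 80% large cap
        {"type": "ineq", "fun": lambda w: 0.7 - np.dot(w, is_value)},     # at most 70% value
    ]
    bounds = [(0.0, 1.0)] * n        # no shorting, no ticker above 100%
    w0 = np.full(n, 1.0 / n)         # equal-weight starting point

    result = minimize(neg_return, w0, bounds=bounds, constraints=constraints)
    print(result.x)                  # optimized weight per ticker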
Filter out/omit regions with rapid slope change
I am trying to find the slopes of data sets but have many different type of sets. Ideally, I would have a relatively straight line with some noise where I can take the average slope and std. This plot can be seen below: However, sometimes some datasets will have sharp jumps (because the output data has to be resized) that interfere with the averaging and std processing. Is there a technique or method to filter out and omit these sharp spikes? I'm thinking of identifying the spikes (by their max/min peaks) and then deleting the data within a region of the spike to clean up the data. Not much code to show for these but I am using T for time, Y for the output data, and dYdT for the slope and graphing them. plt.semilogy(T,Y) plt.title('CUT QiES') plt.xlabel('time') plt.show() plt.semilogy(T,dYdT) plt.title('Slope') plt.xlabel('time') plt.show() print('average slope:', np.average(dYdT)) print('std slope:', np.std(dYdT)) Here is an example dataset for T, Y, and dYdT T = [2.34354 2.34632 2.3491 2.35188 2.35466 2.35744 2.36022 2.363 2.36578 2.36856 2.37134 2.37412 2.3769 2.37968 2.38246 2.38524 2.38802 2.3908 2.39358 2.39636 2.39914 2.40192 2.4047 2.40748 2.41026 2.41304 2.41582 2.4186 2.42138 2.42416 2.42694 2.42972 2.4325 2.43528 2.43806 2.44084 2.44362 2.4464 2.44918 2.45196 2.45474 2.45752 2.4603 2.46308 2.46586 2.46864 2.47142 2.4742 2.47698 2.47976 2.48254 2.48532 2.4881 2.49088 2.49366 2.49644 2.49922 2.502 2.50478 2.50756 2.51034 2.51312 2.5159 2.51868 2.52146 2.52424 2.52702 2.5298 2.53258 2.53536 2.53814 2.54092 2.5437 2.54648 2.54926 2.55204 2.55482 2.5576 2.56038 2.56316 2.56594 2.56872 2.5715 2.57428 2.57706 2.57984 2.58262 2.5854 2.58818 2.59096 2.59374 2.59652 2.5993 2.60208 2.60486 2.60764 2.61042 2.6132 2.61598 2.61876 2.62154 2.62432 2.6271 2.62988 2.63266 2.63544 2.63822 2.641 2.64378 2.64656 2.64934 2.65212 2.6549 2.65768 2.66046 2.66324 2.66602 2.6688 2.67158 2.67436 2.67714 2.67992 2.6827 2.68548 2.68826 2.69104 2.69382 2.6966 2.69938 2.70216 2.70494 2.70772 2.7105 2.71328 2.71606 2.71884 2.72162 2.7244 2.72718 2.72996 2.73274 2.73552 2.7383 2.74108 2.74386 2.74664 2.74942 2.7522 2.75498 2.75776 2.76054 2.76332 2.7661 2.76888 2.77166 2.77444 2.77722 2.78 2.78278 2.78556 2.78834 2.79112 2.7939 2.79668 2.79946 2.80224 2.80502 2.8078 2.81058 2.81336 2.81614 2.81892 2.8217 2.82448 2.82726 2.83004 2.83282 2.8356 2.83838 2.84116 2.84394 2.84672 2.8495 2.85228 2.85506 2.85784 2.86062 2.8634 2.86618 2.86896 2.87174 2.87452 2.8773 2.88008 2.88286 2.88564 2.88842 2.8912 2.89398 2.89676 2.89954 2.90232 2.9051 2.90788 2.91066 2.91344 2.91622 2.919 2.92178] Y = [1.4490e+24 4.1187e+24 1.1708e+25 3.3279e+25 9.4596e+25 2.6889e+26 7.6435e+26 2.1727e+27 6.1762e+27 1.7556e+28 1.6093e-01 4.5747e-01 1.3004e+00 3.6967e+00 1.0508e+01 2.9872e+01 8.4918e+01 2.4140e+02 6.8623e+02 1.9508e+03 5.5455e+03 1.5764e+04 4.4814e+04 1.2739e+05 3.6214e+05 1.0295e+06 2.9265e+06 8.3192e+06 2.3649e+07 6.7227e+07 1.9111e+08 5.4325e+08 1.5443e+09 4.3899e+09 1.2479e+10 3.5473e+10 1.0084e+11 2.8663e+11 8.1479e+11 2.3161e+12 6.5837e+12 1.8714e+13 5.3197e+13 1.5121e+14 4.2983e+14 1.2218e+15 3.4730e+15 9.8721e+15 2.8062e+16 7.9766e+16 2.2674e+17 6.4450e+17 1.8320e+18 5.2075e+18 1.4803e+19 4.2077e+19 1.1961e+20 3.3998e+20 9.6642e+20 2.7471e+21 7.8089e+21 2.2197e+22 6.3098e+22 1.7936e+23 5.0986e+23 1.4493e+24 4.1199e+24 1.1711e+25 3.3291e+25 9.4636e+25 2.6902e+26 7.6473e+26 2.1739e+27 6.1796e+27 1.7567e+28 1.6088e-01 4.5734e-01 1.3001e+00 3.6957e+00 1.0506e+01 2.9864e+01 8.4893e+01 2.4132e+02 6.8600e+02 1.9501e+03 
5.5434e+03 1.5758e+04 4.4794e+04 1.2733e+05 3.6196e+05 1.0289e+06 2.9248e+06 8.3141e+06 2.3634e+07 6.7181e+07 1.9097e+08 5.4284e+08 1.5431e+09 4.3863e+09 1.2468e+10 3.5442e+10 1.0075e+11 2.8638e+11 8.1404e+11 2.3140e+12 6.5776e+12 1.8697e+13 5.3148e+13 1.5108e+14 4.2944e+14 1.2207e+15 3.4700e+15 9.8637e+15 2.8038e+16 7.9701e+16 2.2656e+17 6.4401e+17 1.8307e+18 5.2039e+18 1.4793e+19 4.2049e+19 1.1953e+20 3.3978e+20 9.6587e+20 2.7456e+21 7.8048e+21 2.2186e+22 6.3068e+22 1.7928e+23 5.0963e+23 1.4487e+24 4.1181e+24 1.1706e+25 3.3277e+25 9.4595e+25 2.6890e+26 7.6439e+26 2.1729e+27 6.1767e+27 1.7558e+28 1.6085e-01 4.5724e-01 1.2998e+00 3.6947e+00 1.0503e+01 2.9855e+01 8.4867e+01 2.4124e+02 6.8576e+02 1.9493e+03 5.5412e+03 1.5751e+04 4.4775e+04 1.2728e+05 3.6179e+05 1.0284e+06 2.9234e+06 8.3100e+06 2.3622e+07 6.7147e+07 1.9087e+08 5.4256e+08 1.5423e+09 4.3840e+09 1.2462e+10 3.5424e+10 1.0070e+11 2.8624e+11 8.1366e+11 2.3129e+12 6.5747e+12 1.8689e+13 5.3126e+13 1.5102e+14 4.2928e+14 1.2203e+15 3.4688e+15 9.8604e+15 2.8029e+16 7.9677e+16 2.2649e+17 6.4383e+17 1.8302e+18 5.2025e+18 1.4789e+19 4.2039e+19 1.1950e+20 3.3970e+20 9.6565e+20 2.7450e+21 7.8030e+21 2.2181e+22 6.3052e+22 1.7923e+23 5.0950e+23 1.4483e+24 4.1170e+24 1.1703e+25 3.3267e+25 9.4566e+25 2.6882e+26 7.6414e+26 2.1721e+27 6.1745e+27 1.7552e+28 1.6085e-01 4.5723e-01 1.2997e+00 3.6946e+00] dYdT = [ 375.78737971 375.77838714 375.80388102 375.77489158 375.78727498 375.78675618 375.79979331 375.79140766 375.80307981 375.78869648 -24051.07160929 375.80641042 375.79707901 375.81605048 375.79005109 375.82183992 375.81456769 375.81626751 375.81206469 375.82085444 375.80836087 375.80650046 375.82436377 375.80311386 375.81929146 375.82649247 375.80356916 375.81256353 375.81105553 375.81083518 375.81806689 375.79871965 375.81165219 375.80421386 375.80603732 375.80022022 375.8141218 375.77592917 375.80511723 375.7948217 375.79574124 375.78237753 375.80219314 375.77971056 375.79862682 375.78801386 375.78906206 375.78914497 375.79271626 375.78453256 375.79375471 375.78087618 375.78731038 375.78835536 375.80214704 375.7810828 375.80402049 375.77350442 375.79558632 375.79229009 375.7979493 375.78886178 375.80285205 375.79348417 375.80619385 375.79128849 375.80870862 375.79125174 375.81241695 375.80966301 375.80855141 375.80471369 375.81123673 375.80242985 375.81604229 -24051.40870014 375.81595371 375.81631899 375.80172556 375.8188996 375.7939637 375.80499941 375.80295453 375.81071013 375.81233977 375.80121497 375.80580644 375.80072988 375.79422292 375.80991615 375.79562622 375.80425607 375.8009951 375.80341171 375.79284757 375.80067561 375.79074445 375.80361218 375.78872914 375.78392619 375.80294807 375.80742564 375.7832372 375.78773547 375.79978522 375.78860014 375.78890145 375.79762286 375.80180688 375.78148795 375.79054293 375.80220477 375.79379778 375.79114431 375.79906468 375.80132293 375.79296526 375.80555123 375.7949414 375.80782426 375.78471547 375.80279891 375.80250484 375.8024821 375.80059702 375.80550314 375.79947132 375.81008995 375.80407198 375.80436789 375.80464364 375.80046381 375.79483415 375.81472546 375.80509096 375.80393624 375.80523987 375.80569415 375.79908899 375.80055332 -24051.29144699 375.80437535 375.81196702 375.78739346 375.81351451 375.78827315 375.80323568 375.79387173 375.80410925 375.79061168 375.80602519 375.78876673 375.80794699 375.80555255 375.78221186 375.78976345 375.80687988 375.79578647 375.79815551 375.79344053 375.79436049 375.79356487 375.80266523 375.78659745 375.79944765 375.79336058 375.81159827 
375.78590649 375.79567215 375.7967047 375.80100764 375.79358484 375.80263853 375.80785172 375.79032673 375.80669873 375.79567722 375.79784997 375.79602616 375.80621348 375.79850067 375.80356916 375.80784654 375.79641323 375.80733162 375.79643801 375.79806206 375.80809489 375.80524264 375.80392238 375.80115114 375.80136383 375.79989791 375.79500521 375.8129336 375.79707955 375.80370071 375.79873252 375.79881131 375.80290963 375.80719684 375.79460713 375.79090002 375.80340505 375.80575394 -24051.16850348 375.79650823 375.79215864 375.80533293]
I found a way to identify the outlier points: use the median to find the most common data value and delete any points outside a range around that median. The code is as follows:

    import statistics
    import numpy as np

    med = statistics.median(dYdT)  # find median of dataset
    perc = .05                     # cutoff threshold
    med_min = med - perc*med       # min cutoff
    med_max = med + perc*med       # max cutoff

    # find the locations of the sharp changes
    peak_loc = []
    peak_loc.extend(np.where(dYdT < med_min)[0])
    peak_loc.extend(np.where(dYdT > med_max)[0])

    # delete the data where the peaks are located
    dYdT = np.delete(dYdT, peak_loc, axis=0)
    Y = np.delete(Y, peak_loc, axis=0)
    T = np.delete(T, peak_loc, axis=0)

Here are plots showing how it deleted points with rapidly varying slopes (Example 1, Example 2), and proof that datasets that don't need cleaning are unaffected. You can adjust the perc variable to make the cutoff more or less lax as you need.
You can use a median filter to remove outliers from a signal. More specifically, you can use medfilt from scipy.signal; applying scipy.signal.medfilt(dYdT, 3) removes the isolated spikes while leaving the rest of the slope data essentially unchanged.
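A minimal sketch of that call, assuming T and dYdT have already been loaded as NumPy arrays with the same names as in the question:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.signal import medfilt

    # A 3-point median filter replaces each isolated spike with the median of
    # its neighborhood while leaving smooth regions essentially untouched.
    dYdT_filtered = medfilt(dYdT, kernel_size=3)

    plt.plot(T, dYdT, label='raw slope')
    plt.plot(T, dYdT_filtered, label='median filtered')
    plt.legend()
    plt.show()

    print('average slope:', np.average(dYdT_filtered))
    print('std slope:', np.std(dYdT_filtered))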
Why pdf values are Nan with hypsecant distribution?
I looked into the question Best fit Distribution plots and I found that best fit distribution for my dataset is hypsecant distribution. When I ran the code with one part of my dataset, I got the parameters for loc and scale to be: loc=0.040593036736931196 ; scale=-0.008338984851829193 however, when I input for instance this data into hysecant.pdf with loc and scale, I get Nans as a result instead of values. Why is that? Here is my input: [ 1. 0.99745057 0.93944708 0.83424019 0.78561204 0.63211263 0.62151259 0.57883015 0.57512492 0.43878694 0.43813347 0.37548233 0.33836774 0.29610761 0.32102182 0.31378472 0.24809515 0.24638145 0.22580595 0.18480387 0.19404362 0.18919147 0.16377272 0.16954728 0.10912106 0.12407758 0.12819846 0.11673824 0.08957689 0.10353764 0.09469576 0.08336001 0.08591166 0.06309568 0.07445366 0.07062173 0.05535625 0.05682546 0.06803674 0.05217558 0.0492794 0.05403819 0.04535857 0.04562529 0.04259798 0.03830373 0.0374102 0.03217575 0.03291147 0.0288506 0.0268235 0.02467415 0.02409625 0.02486308 -0.02563436 -0.02801487 -0.02937738 -0.02948851 -0.03272476 -0.03324265 -0.03435844 -0.0383104 -0.03864602 -0.04091095 -0.04269355 -0.04428056 -0.05009069 -0.05037519 -0.05122204 -0.05770342 -0.06348465 -0.06468936 -0.06849683 -0.07477151 -0.08893675 -0.097383 -0.1033376 -0.10796748 -0.11835636 -0.13741154 -0.14920072 -0.16698451 -0.1715277 -0.20449029 -0.22241856 -0.25270058 -0.25699927 -0.26731036 -0.31098857 -0.35426224 -0.36204168 -0.44059844 -0.46754863 -0.53560093 -0.61463112 -0.65583547 -0.66378605 -0.70644849 -0.75217157 -0.92236344]
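For reference, the NaNs can be reproduced with a short snippet: scipy.stats distributions return nan from pdf whenever the scale parameter is not strictly positive, so the negative fitted scale above is the likely cause. The x values below are arbitrary illustration points, not data from the question.

    import numpy as np
    from scipy.stats import hypsecant

    x = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])  # arbitrary evaluation points

    loc, scale = 0.040593036736931196, -0.008338984851829193  # values from the question
    print(hypsecant.pdf(x, loc=loc, scale=scale))       # all nan: scale must be > 0

    print(hypsecant.pdf(x, loc=loc, scale=abs(scale)))  # finite values with a positive scale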
How to plot a document topic distribution in structural topic modeling R-package?
If I am using Python's sklearn for LDA topic modeling, I can use the transform function to get a "document topic distribution" of the LDA results, like here:

    document_topic_distribution = lda_model.transform(document_term_matrix)

Now I have also tried the R structural topic models (stm) package, and I want to get the same thing. Is there any function in the stm package which can produce the same thing (a document topic distribution)? I have the stm object created as follows:

    stm_model <- stm(documents = out$documents,
                     vocab = out$vocab,
                     K = number_of_topics,
                     data = out$meta,
                     max.em.its = 75,
                     init.type = "Spectral")

But I didn't find out how I can get the desired distribution out of this object. The documentation didn't really help me either.
As emilliman5 pointed out, your stm_model provides access to the underlying parameters of the model, as is shown in the documentation. Indeed, the theta parameter is a "Number of Documents by Number of Topics matrix of topic proportions." That phrasing requires a bit of parsing: it is an N_DOCS by N_TOPICS matrix, i.e. it has N_DOCS rows, one per document, and N_TOPICS columns, one per topic. The values are the topic proportions; for example, if stm_model$theta[1, ] == c(.3, .2, .5), that means Document 1 is 30% Topic 1, 20% Topic 2 and 50% Topic 3.

To find out which topic dominates a document, you have to find the (column!) index of the maximum value, which can be retrieved e.g. by calling apply with MARGIN=1, which basically says "do this row-wise"; which.max simply returns the index of the maximum value:

    apply(stm_model$theta, MARGIN=1, FUN=which.max)