How can I calculate the time lag between two similar time series? - python
I'm trying to compute/visualize the time lag between 2 time series (I want to know the time lag between the humidity progression of outside and inside a room).
Each data point of my series was taken hourly. Plotting the 2 series together, I can clearly see a shift between them (sorry for hiding the axes in the plot).
Here is part of my time series data, packed into 2 arrays:
inside_humidity = [11.77961297, 11.59755268, 12.28761522, 11.88797553, 11.78122077, 11.5694668,
11.70421932, 11.78122077, 11.74272005, 11.78122077, 11.69438733, 11.54126933,
11.28460592, 11.05624965, 10.9611012, 11.07527934, 11.25417308, 11.56040908,
11.6657186, 11.51171572, 11.49246536, 11.78594142, 11.22968373, 11.26840678,
11.26840678, 11.29447992, 11.25553344, 11.19711371, 11.17764047, 11.11922075,
11.04132778, 10.86996123, 10.67410607, 10.63493504, 10.74922916, 10.74922916,
10.6294765, 10.61011497, 10.59075345, 10.80373021, 11.07479154, 11.15223764,
11.19711371, 11.17764047, 11.15816723, 11.22250051, 11.22250051, 11.202915,
11.18332948, 11.16374396, 11.14415845, 11.12457293, 11.10498742, 11.14926578,
11.16896413, 11.16896413, 11.14926578, 10.8307902, 10.51742195, 10.28187137,
10.12608544, 9.98977276, 9.62267727, 9.31289289, 8.96438546, 8.77077022,
8.69332413, 8.51907042, 8.30609366, 8.38353975, 8.4513867, 8.47085994,
8.50980642, 8.52927966, 8.50980642, 8.55887037, 8.51969934, 8.48052831,
8.30425867, 8.2177078, 7.98402891, 7.92560918, 7.89950166, 7.83489682,
7.75789537, 7.5984808, 7.28426807, 7.39778913, 7.71943214, 8.01149931,
8.18276652, 8.23009255, 8.16215295, 7.93822471, 8.00350215, 7.93843482,
7.85072729, 7.49778011, 7.31782649, 7.29862668, 7.60162032, 8.29665484,
8.58797834, 8.50011383, 8.86757784, 8.76600556, 8.60491125, 8.4222628,
8.24923231, 8.14470714, 8.17351638, 8.52530093, 8.72220151, 9.26745883,
9.1580007, 8.61762692, 8.22187405, 8.43693644, 8.32414835, 8.32463974,
8.46833012, 8.55865487, 8.72647164, 9.04112806, 9.35578449, 9.59465974,
10.47339785, 11.07218093, 10.54091351, 10.56138918, 10.46099958, 10.38129168,
10.16434831, 10.10612612, 10.009246, 10.53502351, 10.8307902, 11.13420052,
11.64337309, 11.18958511, 10.49630791, 10.60856932, 10.37029108, 9.86281478,
9.64699826, 9.95341012, 10.24329812, 10.6848196, 11.47604231, 11.30505352,
10.72194974, 10.30058448, 10.05022037, 10.06318411, 9.90118897, 9.68530059,
9.47790657, 9.48585784, 9.61639418, 9.86244265, 10.29009361, 10.28297229,
10.32073088, 10.65389513, 11.09656351, 11.20188562, 11.24124169, 10.40503955,
9.74632512, 9.07606098, 8.85145589, 9.37080152, 9.65082743, 10.0707891,
10.68776091, 11.25879751, 11.0416348, 10.89558456, 10.7908258, 10.66539685,
10.7297755, 10.77571398, 10.9268264, 11.16021492, 11.60961709, 11.43827534,
11.96155427, 12.16116437, 12.80412266, 12.52540805, 11.96752965, 11.58099292]
outside_humidity = [10.17449206, 10.4823292, 11.06818167, 10.82768699, 11.27582592, 11.4196233,
10.99393027, 11.4122507, 11.18192837, 10.87247831, 10.68664321, 10.37949651,
9.57155882, 10.86611665, 11.62547196, 11.32004266, 11.75537602, 11.51292063,
11.03107569, 10.7297755, 10.4345622, 10.61271497, 9.49271162, 10.15594248,
9.99053828, 9.80915398, 9.6452438, 10.06900573, 11.18075689, 11.8289847,
11.83334752, 11.27480708, 11.14370467, 10.88149985, 10.73930381, 10.7236597,
10.26210496, 11.01260226, 11.05428228, 11.58321342, 12.70523808, 12.5181118,
11.90023799, 11.67756426, 11.28859471, 10.86878222, 9.73984486, 10.18253902,
9.80915398, 10.50980784, 11.38673459, 11.22751685, 10.94171823, 10.56484228,
10.38220753, 10.05388847, 9.96147203, 9.90698862, 9.7732203, 9.85262125,
8.7412938, 8.88281702, 8.07919545, 8.02883587, 8.32341424, 8.07357711,
7.27302616, 6.73660684, 6.66722819, 7.29408637, 7.00046542, 6.46322019,
6.07150988, 6.00207234, 5.8818402, 6.82443881, 7.20212882, 7.52167696,
7.88857771, 8.351627, 8.36547023, 8.24802846, 8.18520693, 7.92420816,
7.64926024, 7.87944972, 7.82118727, 8.02091833, 7.93071882, 7.75789457,
7.5416447, 6.94430133, 6.65907535, 6.67454591, 7.25493614, 7.76939457,
7.55357806, 6.61479472, 7.17641357, 7.24664082, 8.62732387, 8.66913548,
8.70925667, 9.0477017, 8.24558224, 8.4330502, 8.44366397, 8.17995798,
8.1875752, 9.33296518, 9.66567041, 9.88581085, 8.95449382, 8.3587624,
9.20584448, 8.90605388, 8.87494884, 9.12694892, 8.35055177, 7.91879933,
7.78867253, 8.22800878, 9.03685287, 12.49630018, 11.11819755, 10.98869374,
10.65897176, 10.36444573, 10.052609, 10.87627021, 10.07379564, 10.02233847,
9.62022856, 11.21575473, 10.85483543, 11.67324627, 11.89234248, 11.10068132,
10.06942096, 8.50405894, 8.13168561, 8.83616476, 8.35675085, 8.33616802,
8.35675085, 9.02209801, 9.5530404, 9.44738836, 10.89645958, 11.44771721,
11.79943601, 10.7765335, 11.1453622, 10.74874776, 10.55195175, 10.34494483,
9.83813522, 11.26931785, 11.20641798, 10.51555027, 10.90808954, 11.80923545,
11.68300879, 11.60313809, 7.95163365, 7.77213815, 7.54209557, 7.30603673,
7.17842173, 8.25899805, 8.56494995, 10.44245578, 11.08542758, 11.74129079,
11.67979686, 12.94362214, 11.96285343, 11.8289847, 11.01388413, 10.6793698,
11.20662595, 11.97684701, 12.46383177, 11.34178655, 12.12477078, 12.48698059,
12.89325064, 12.07470295, 12.6777319, 10.91689448, 10.7676326, 10.66710434]
I know cross-correlation is the right tool, but even after a while I still don't understand how to use scipy.signal.correlate and numpy.correlate, because all I get is an array full of NaNs. So clearly I need some more knowledge in this area.
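A minimal sketch of estimating the lag with numpy.correlate, assuming both series have the same length and are sampled every hour. An array full of NaNs is what np.correlate returns when any input value is NaN, so this sketch removes the mean and then replaces NaNs with 0 (i.e. treats missing readings as average values, which is an assumption). The helper name xcorr_lags and the synthetic demo data are hypothetical.

```python
import numpy as np

def xcorr_lags(inside, outside):
    """Normalized cross-correlation of two equally long, evenly sampled series.

    Returns (lags, corr). corr at lag k compares inside[n + k] with outside[n],
    so a peak at a positive k means `inside` trails `outside` by k samples.
    """
    a = np.asarray(inside, dtype=float)
    b = np.asarray(outside, dtype=float)
    if a.shape != b.shape:
        raise ValueError("series must have the same length")
    # A single NaN makes every output of np.correlate NaN. Remove the mean,
    # then set NaNs to 0, i.e. treat them as average values (an assumption).
    a = np.nan_to_num(a - np.nanmean(a))
    b = np.nan_to_num(b - np.nanmean(b))
    # 'full' mode evaluates every overlap, from lag -(n-1) to n-1
    corr = np.correlate(a, b, mode="full") / (len(a) * a.std() * b.std())
    lags = np.arange(-(len(a) - 1), len(a))
    return lags, corr

# Demo on synthetic data: `inside` is `outside` delayed by 5 samples,
# with one missing reading.
rng = np.random.default_rng(0)
outside = rng.normal(size=500)
inside = np.concatenate([rng.normal(size=5), outside[:-5]])
inside[100] = np.nan
lags, corr = xcorr_lags(inside, outside)
best_lag = lags[np.argmax(corr)]
print("estimated lag:", best_lag)
```

With the arrays from the question, xcorr_lags(inside_humidity, outside_humidity) followed by lags[np.argmax(corr)] would give the most likely lag in hours. For long series, scipy.signal.correlate computes the same 'full' correlation faster, but it does not handle NaNs either.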
What I hope to achieve is a plot like those in the answers to the thread "How to make a correlation plot with a certain lag of two time series", where I can see which lag (in hours) is most likely.
Thanks a lot in advance!
With the given data, you can use numpy and matplotlib to scatter one series against the other and fit a straight line through the points, for example:
import numpy as np
from matplotlib import pyplot as plt
x = np.array(inside_humidity)
y = np.array(outside_humidity)
fig = plt.figure()
# least-squares fit of a straight line (degree-1 polynomial): outside ~ a * inside + b
a, b = np.polyfit(x, y, 1)
y_fit = a * x + b
# scatter plot of the raw pairs, plus the fitted line
plt.scatter(x, y)
plt.plot(x, y_fit, color='red')
plt.xlabel('inside humidity')
plt.ylabel('outside humidity')
plt.show()
This draws a scatter plot of outside vs. inside humidity with the fitted line on top.
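The fit above only compares the two series at the same hour (zero lag). To get a correlation-vs-lag plot like the one the question asks for, one option is to shift one series hour by hour and compute the Pearson correlation at each shift with pandas Series.shift and Series.corr; Series.corr skips pairs containing NaN, so gaps do not turn the result into NaN. This is a sketch on synthetic data: lagged_corr is a hypothetical helper, and the 4-hour delay is made up for the demo.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def lagged_corr(inside, outside, max_lag=24):
    """Pearson correlation of inside(t) with outside(t - lag), lag = -max_lag..max_lag.

    A peak at a positive lag means `inside` trails `outside` by that many samples.
    NaN pairs are skipped by Series.corr.
    """
    inside = pd.Series(inside, dtype=float)
    outside = pd.Series(outside, dtype=float)
    lags = np.arange(-max_lag, max_lag + 1)
    # shift(lag) moves `outside` later in time by `lag` samples
    return pd.Series([inside.corr(outside.shift(lag)) for lag in lags], index=lags)

# Synthetic demo: smoothed noise for "outside", delayed by 4 hours for "inside"
rng = np.random.default_rng(1)
outside = pd.Series(rng.normal(size=300)).rolling(5, min_periods=1).mean()
inside = outside.shift(4)
inside.iloc[50] = np.nan  # a missing reading
corr = lagged_corr(inside, outside, max_lag=12)
best_lag = corr.idxmax()

ax = corr.plot(marker="o")
ax.axvline(best_lag, color="red", linestyle="--")
ax.set_xlabel("lag in hours (inside(t) vs. outside(t - lag))")
ax.set_ylabel("Pearson correlation")
plt.show()
```

With the question's data, lagged_corr(inside_humidity, outside_humidity) would be called the same way; the x position of the highest point is the most likely lag in hours.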
Related
Barplot with significant differences and interactions in python?
I started using Python 6 months ago, so maybe my question is a naive one. I would like to visualize my data and ANOVA statistics. It is common to do this with a barplot with added lines indicating significant differences and interactions. How do you make such a plot using Python? Here is a simple dataframe with 3 columns (A, B, and the p-values already calculated with a t-test):
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
ar = np.array([[565.0, 81.0, 1.630947e-02], [1006.0, 311.0, 1.222740e-27], [2929.0, 1292.0, 5.559912e-12], [3365.0, 1979.0, 2.507474e-22], [2260.0, 1117.0, 1.540305e-01]])
df = pd.DataFrame(ar, columns=['A', 'B', 'p_value'])
ax = plt.subplot()
# I calculate the percentage
(df.iloc[:, 0:2] / df.iloc[:, 0:2].sum() * 100).plot.bar(ax=ax)
for container, p_val in zip(ax.containers, df['p_value']):
    labels = [f"{round(v,1)}%" if (p_val > 0.05) else f"(**)\n{round(v,1)}%" for v in container.datavalues]
    ax.bar_label(container, labels=labels, fontsize=10, padding=8)
plt.show()
Initially I just wanted to add a "**" each time a significant difference is observed between the 2 columns A and B, but the code above does not really work. Now I would prefer to add lines indicating significant differences and interactions between the A and B columns, but I have no idea how to make that happen.
Regards, JYK
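One likely reason the original labels misbehave: ax.containers holds one BarContainer per column (A and B), not one per row, so the zip pairs the two columns with only the first two p-values. The sketch below instead walks over the rows and, for each row, draws a bracket from the A bar to the B bar with a star label. It only covers the pairwise A vs. B comparison per row, not interactions; the star thresholds (0.05, 0.01, 0.001) and the stars helper are assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

ar = np.array([
    [565.0, 81.0, 1.630947e-02],
    [1006.0, 311.0, 1.222740e-27],
    [2929.0, 1292.0, 5.559912e-12],
    [3365.0, 1979.0, 2.507474e-22],
    [2260.0, 1117.0, 1.540305e-01]])
df = pd.DataFrame(ar, columns=["A", "B", "p_value"])
pct = df[["A", "B"]] / df[["A", "B"]].sum() * 100

def stars(p):
    """Common star convention (assumed thresholds: 0.05, 0.01, 0.001)."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "n.s."

ax = pct.plot.bar(rot=0)
bars_a, bars_b = ax.containers          # one BarContainer per column (A, B)
pad = 0.03 * pct.to_numpy().max()       # vertical spacing for the brackets
for i, p in enumerate(df["p_value"]):
    left, right = bars_a[i], bars_b[i]  # the A and B bars of row i
    x1 = left.get_x() + left.get_width() / 2
    x2 = right.get_x() + right.get_width() / 2
    y = max(left.get_height(), right.get_height()) + pad
    # bracket: up from bar A, across, down to bar B
    ax.plot([x1, x1, x2, x2], [y, y + pad, y + pad, y], color="black", linewidth=1)
    ax.text((x1 + x2) / 2, y + pad, stars(p), ha="center", va="bottom")
ax.set_ylim(0, pct.to_numpy().max() * 1.25)
ax.set_ylabel("percentage of column total")
plt.show()
```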
How to iterate distance calculation for different vehicles from coordinates
I am new to coding and need help creating a Time Space Diagram (TSD) from a CSV file that I got as the output of a VISSIM simulation. A TSD shows one line per vehicle, with time on the x axis and position on the y axis. From the CSV I want to use "VEHICLE:SIMSEC", the simulation time, as the x axis of the TSD; "NO", the vehicle number, to draw one line per vehicle (there are 185 different vehicles and I want to plot all of them); and "COORDFRONTX" and "COORDFRONTY", the x and y coordinates from the simulation, as the positions on the y axis. I have tried the following code but did not get the result I want:
import pandas as pd
import matplotlib.pyplot as mp
# take data
data = pd.read_csv(r"C:\Users\hk385\Desktop\VISSIM_DATA_CSV.csv")
df = pd.DataFrame(data, columns=["VEHICLE:SIMSEC", "NO", "DISTTRAVTOT"])
# plot the dataframe
df.plot(x="NO", y=["DISTTRAVTOT"], kind="scatter")
# show the plot
mp.show()
The plot came out uninterpretable because there were too many dots. Could you help me or guide me toward getting a TSD from this CSV?
Following mitoRibo's suggestion, here are the top 20 rows of the CSV:
VEHICLE:SIMSEC,NO,LANE\LINK\NO,LANE\INDEX,POS,POSLAT,COORDFRONTX,COORDFRONTY,COORDREARX,COORDREARY,DISTTRAVTOT
5.9,1,1,1,2.51,0.5,-1.259,-3.518,-4.85,-1.319,8.42
6.0,1,1,1,10.94,0.5,0.932,-4.86,-2.659,-2.661,16.86
6.1,1,1,1,19.37,0.5,3.125,-6.203,-0.466,-4.004,25.29
6.2,1,1,1,27.82,0.5,5.319,-7.547,1.728,-5.348,33.73
6.3,1,1,1,36.26,0.5,7.515,-8.892,3.924,-6.693,42.18
6.4,1,1,1,44.72,0.5,9.713,-10.238,6.122,-8.039,50.64
6.5,1,1,1,53.18,0.5,11.912,-11.585,8.321,-9.386,59.1
6.6,1,1,1,61.65,0.5,14.112,-12.933,10.521,-10.734,67.56
6.7,1,1,1,70.12,0.5,16.314,-14.282,12.724,-12.082,76.04
6.8,1,1,1,78.6,0.5,18.518,-15.632,14.927,-13.432,84.51
6.9,1,1,1,87.08,0.5,20.723,-16.982,17.132,-14.783,93.0
7.0,1,1,1,95.57,0.5,22.93,-18.334,19.339,-16.135,101.49
7.1,1,1,1,104.07,0.5,25.138,-19.687,21.547,-17.487,109.99
7.2,1,1,1,112.57,0.5,27.348,-21.04,23.757,-18.841,118.49
7.3,1,1,1,121.08,0.5,29.56,-22.395,25.969,-20.195,127.0
7.4,1,1,1,129.59,0.5,31.773,-23.75,28.182,-21.551,135.51
7.5,1,1,1,138.11,0.5,33.987,-25.107,30.396,-22.907,144.03
7.6,1,1,1,146.64,0.5,36.203,-26.464,32.612,-24.264,152.56
7.7,1,1,1,155.17,0.5,38.421,-27.822,34.83,-25.623,161.09
Thank you.
You can use groupby to iterate through the different vehicles, adding each one to your plot. I changed your example data so that there are 2 different vehicles:
import pandas as pd
import io
import matplotlib.pyplot as plt
df = pd.read_csv(io.StringIO("""
VEHICLE:SIMSEC,NO,LANE_LINK_NO,LANE_INDEX,POS,POSLAT,COORDFRONTX,COORDFRONTY,COORDREARX,COORDREARY,DISTTRAVTOT
5.9,1,1,1,2.51,0.5,-1.259,-3.518,-4.85,-1.319,0
6.0,1,1,1,10.94,0.5,0.932,-4.86,-2.659,-2.661,16.86
6.1,1,1,1,19.37,0.5,3.125,-6.203,-0.466,-4.004,25.29
6.2,1,1,1,27.82,0.5,5.319,-7.547,1.728,-5.348,33.73
6.3,1,1,1,36.26,0.5,7.515,-8.892,3.924,-6.693,42.18
6.4,1,1,1,44.72,0.5,9.713,-10.238,6.122,-8.039,50.64
6.5,1,1,1,53.18,0.5,11.912,-11.585,8.321,-9.386,59.1
6.6,1,1,1,61.65,0.5,14.112,-12.933,10.521,-10.734,67.56
6.7,1,1,1,70.12,0.5,16.314,-14.282,12.724,-12.082,76.04
6.8,1,1,1,78.6,0.5,18.518,-15.632,14.927,-13.432,84.51
6.9,1,1,1,87.08,0.5,20.723,-16.982,17.132,-14.783,90
6.0,2,1,1,95.57,0.5,22.93,-18.334,19.339,-16.135,0
6.1,2,1,1,104.07,0.5,25.138,-19.687,21.547,-17.487,30
6.2,2,1,1,112.57,0.5,27.348,-21.04,23.757,-18.841,40
6.3,2,1,1,121.08,0.5,29.56,-22.395,25.969,-20.195,50
6.4,2,1,1,129.59,0.5,31.773,-23.75,28.182,-21.551,60
6.5,2,1,1,138.11,0.5,33.987,-25.107,30.396,-22.907,70
6.6,2,1,1,146.64,0.5,36.203,-26.464,32.612,-24.264,80
6.7,2,1,1,155.17,0.5,38.421,-27.822,34.83,-25.623,90
"""), sep=',')
fig = plt.figure()
# Iterate through each vehicle, adding it to the plot
for vehicle_no, vehicle_df in df.groupby('NO'):
    plt.plot(vehicle_df['VEHICLE:SIMSEC'], vehicle_df['DISTTRAVTOT'], label=vehicle_no)
plt.legend()  # comment this out if you don't want a legend
plt.show()
plt.close()
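The question's title asks about calculating distance from coordinates, while the answer above plots DISTTRAVTOT. If a distance has to be derived from COORDFRONTX/COORDFRONTY instead, one option is to sum the straight-line steps between consecutive samples of each vehicle. This is a sketch, not VISSIM's own definition of distance: the column DIST_FROM_COORDS is hypothetical, and the rows are taken from the sample CSV and split into two vehicles as in the answer. Note that in the sample rows the coordinate steps (about 2.6 per row) are much smaller than the DISTTRAVTOT steps (about 8.4 per row), so the two are not directly interchangeable.

```python
import io
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# A few rows in the question's column layout (coordinates from the sample CSV)
csv_text = """VEHICLE:SIMSEC,NO,COORDFRONTX,COORDFRONTY
5.9,1,-1.259,-3.518
6.0,1,0.932,-4.86
6.1,1,3.125,-6.203
6.0,2,22.93,-18.334
6.1,2,25.138,-19.687
6.2,2,27.348,-21.04
"""
df = pd.read_csv(io.StringIO(csv_text))

# Order samples in time within each vehicle before taking differences
df = df.sort_values(["NO", "VEHICLE:SIMSEC"]).reset_index(drop=True)
grouped = df.groupby("NO")
dx = grouped["COORDFRONTX"].diff()
dy = grouped["COORDFRONTY"].diff()
# Straight-line step between consecutive samples; each vehicle's first
# sample has no predecessor (NaN), so its step is set to 0
step = np.hypot(dx, dy).fillna(0.0)
# Running total per vehicle (in the simulation's coordinate units)
df["DIST_FROM_COORDS"] = step.groupby(df["NO"]).cumsum()

fig, ax = plt.subplots()
for vehicle_no, vehicle_df in df.groupby("NO"):
    ax.plot(vehicle_df["VEHICLE:SIMSEC"], vehicle_df["DIST_FROM_COORDS"],
            label=f"vehicle {vehicle_no}")
ax.set_xlabel("simulation time")
ax.set_ylabel("distance from coordinates")
ax.legend()
plt.show()
```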
If you don't mind, could you please try this:
mp.scatter(df["NO"], df["DISTTRAVTOT"])
If it still doesn't work, please attach your data so I can test it on my side.
I want to detect ranges with the same numerical boundaries of a dataset using matplotlib or pandas in python 3.7
I have a ton of ranges, all consisting of numbers. Each range has a minimum and a maximum that cannot be exceeded. Given two ranges where the maximum of one lies above the minimum of the other, there is a small area that is covered by both of them, and you can write one range that includes the others. I want to see whether some ranges overlap, or whether I can find some ranges that cover most of the others. The goal is to see if I can simplify them by using one range that covers the others. For example, 7.8-9.6 and 7.9-9.6 can be covered with one range. My attempt to visualize them is below, but when I use my entire dataset of hundreds of ranges, the graph is no longer useful. I know that I can detect recurrent ranges using Python, but I don't want to know how often a range occurs. I want to know how many ranges lie within the same numerical boundaries, and whether a couple of ranges can cover all of them. Finally, my goal is to have the master ranges sorted into categories: range 1 covering 50 other ranges, then range 2 covering 25 ranges, and so on. My current program shows the overlap of the ranges, but I also want the result as printed output with the exact numbers. It would be nice if you could share some ideas for solving this, or suggest tools within Python 3.7.
import matplotlib.pyplot as plt
intervals = [[3.6,4.5], [3.6,4.5], [7.8,9.6], [7.9,9.6], [7.8,9.6], [3.4,4.1], [2.8,3.4], [8.25,9.83], [3.62,3.96], [8.25,9.83], [0.62,0.68], [2.15,2.49], [0.8,1.0], [0.8,1.0], [3.1,3.9], [6.7,8.3], [1,1.5], [1,1.2], [1.5,1.8], [1.8,2.5], [3,4.0], [6.5,8.0], [1.129,1.35], [2.82,3.38], [1.69,3.38], [3.38,6.21], [2.25,2.82], [5.649,6.214], [1.920,6.214] ]
for interval in intervals:
    plt.plot(interval, [0, 0], 'b', alpha=0.2, linewidth=100)
plt.show()
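For the printed "master range" ranking, one plain-Python approach is to count, for each distinct range, how many ranges lie completely inside it (itself included), then sort by that count. This is a sketch of containment counting only; it does not pick a minimal set of covering ranges, and containment_counts is a hypothetical helper name.

```python
intervals = [[3.6,4.5], [3.6,4.5], [7.8,9.6], [7.9,9.6], [7.8,9.6], [3.4,4.1],
             [2.8,3.4], [8.25,9.83], [3.62,3.96], [8.25,9.83], [0.62,0.68],
             [2.15,2.49], [0.8,1.0], [0.8,1.0], [3.1,3.9], [6.7,8.3], [1,1.5],
             [1,1.2], [1.5,1.8], [1.8,2.5], [3,4.0], [6.5,8.0], [1.129,1.35],
             [2.82,3.38], [1.69,3.38], [3.38,6.21], [2.25,2.82], [5.649,6.214],
             [1.920,6.214]]

def containment_counts(intervals):
    """For each distinct range, count the ranges lying completely inside it (itself included)."""
    distinct = sorted(set(map(tuple, intervals)))
    counts = []
    for lo, hi in distinct:
        n = sum(1 for a, b in intervals if lo <= a and b <= hi)
        counts.append(((lo, hi), n))
    # Ranges covering the most other ranges first
    counts.sort(key=lambda item: item[1], reverse=True)
    return counts

for (lo, hi), n in containment_counts(intervals):
    print(f"{lo} - {hi}: covers {n} of {len(intervals)} ranges")
```

Because the counts are printed with the exact boundaries, the output can be read directly instead of relying on the overlapping bars in the plot.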
Here is an idea: make a pandas DataFrame from the array and subtract column 1 from column 2 (column 1 is x, the lower bound, and column 2 is y, the upper bound) to get the width of each range. Then create a histogram of the widths to see how frequently each width occurs.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
intervals = [[3.6,4.5], [3.6,4.5], [7.8,9.6], [7.9,9.6], [7.8,9.6], [3.4,4.1], [2.8,3.4], [8.25,9.83], [3.62,3.96], [8.25,9.83], [0.62,0.68], [2.15,2.49], [0.8,1.0], [0.8,1.0], [3.1,3.9], [6.7,8.3], [1,1.5], [1,1.2], [1.5,1.8], [1.8,2.5], [3,4.0], [6.5,8.0], [1.129,1.35], [2.82,3.38], [1.69,3.38], [3.38,6.21], [2.25,2.82], [5.649,6.214], [1.920,6.214]]
intervals_ar = np.array(intervals)
df = pd.DataFrame({'Column1': intervals_ar[:, 0], 'Column2': intervals_ar[:, 1]})
# width of each range
df['Ranges'] = df['Column2'] - df['Column1']
print(df)
frequency_range = df['Ranges'].value_counts().sort_index()
print(frequency_range)
# histogram of the widths themselves (not of their counts)
df['Ranges'].plot(kind='hist', bins=5)
plt.title("Histogram: frequency vs. range width (column 2 - column 1)")
plt.show()
Plot spectroscopic data from pandas dataframe in 3D with different array length
Is it possible to get something like this plot (each spectrum drawn as its own line along a third axis) from a pandas DataFrame, in a similar fashion to how I would simply make 2D plots (df.plot())? More precisely, I have data that I read from CSV files into pandas DataFrames with the following structure:
1st level header           A         B         C         D         E         F
2nd level header         2.0       1.0       0.2       0.4       0.6       0.8
Index
126.4348             -467048   -814795    301388    298430   -187654  -1903170
126.4310             -468329   -810060    304366    305343   -192035  -1881625
126.4272             -469209   -804697    305795    312472   -197013  -1854848
126.4234             -469685   -799604    305647    318936   -200957  -1827665
126.4195             -469795   -795708    304101    323922   -202192  -1805153
126.4157             -469610   -793795    301497    326780   -199323  -1791743
126.4119             -469213   -794362    298257    327092   -191547  -1790418
126.4081             -468687   -797499    294817    324717   -178875  -1802122
126.4043             -468097   -802853    291546    319800   -162225  -1825540
126.4005             -467486   -809663    288700    312745   -143334  -1857270
126.3967             -466863   -816878    286401    304170   -124505  -1892389
126.3929             -466210   -823335    284645    294827   -108228  -1925312
126.3890             -465485   -827966    283331    285520    -96733  -1950795
126.3852             -464637   -829997    282315    277018    -91559  -1964894
126.3814             -463617   -829104    281457    269965    -93242  -1965702
126.3776             -462399   -825487    280670    264824   -101170  -1953728
126.3738             -460982   -819857    279942    261819   -113660  -1931820
126.3700             -459408   -813317    279344    260927   -128242  -1904669
126.3662             -457757   -807177    279009    261885   -142112  -1877955
126.3624             -456143   -802715    279090    264233   -152667  -1857303
126.3585             -454700   -800940    279722    267380   -158023  -1847241
126.3547             -453566   -802397    280969    270692   -157406  -1850358
126.3509             -452862   -807050    282792    273579   -151350  -1866803
126.3471             -452672   -814262    285033    275591   -141627  -1894249
126.3433             -453030   -822898    287426    276486   -130942  -1928303
126.3395             -453910   -831501    289627    276273   -122426  -1963297
126.3357             -455223   -838544    291266    275222   -119021  -1993312
126.3319             -456834   -842695    292004    273824   -122882  -2013246
126.3280             -458571   -843048    291599    272725   -134907  -2019718
126.3242             -460252   -839292    289952    272620   -154497  -2011656
...                      ...       ...       ...       ...       ...       ...
What I would like to do is plot each of these columns (they are NMR spectra) against the index. As a 2D overlay, this is simple with the pandas wrapper around matplotlib. However, I would like to plot each spectrum on its own "line" along a third axis that has the second-level headers as ticks. I tried matplotlib's 3D plotting functionality, but it seems to be suitable only if you actually have three arrays of equal length, which does not make sense for my data, because each spectrum is recorded for a single value from the second-level header.
Am I overcomplicating this by trying to make a 3D plot? Is the figure I want perhaps not an actual 3D plot but rather a special kind of overlaid 2D plot?
How I would prefer to do it
Bonus points for: using only Python; using only pandas and matplotlib; already implemented functionality. If there is no obvious Python way to do it, I would also be happy with libraries in other languages that can do the same, such as R or Octave. I am just not as familiar with these, so I would probably not be able to adapt hacky solutions in those languages to my requirements. This question might be very similar, but as I understand it, it does not extend to software other than Python and doesn't show an example of what the result should look like, so I am not sure whether its answers would help for this specific purpose.
What is wrong with matplotlib's gallery examples
As lanery pointed out, polygon3D from the matplotlib gallery gets close to what I wish for.
However, it has some drawbacks, some of which are not acceptable for most scientific publications:
1. With negative values, the whole plot gets shifted to what I would call "the middle of the screen", which looks ugly, makes it hard to extract information from the figure, and makes it different from the examples provided.
2. You get an interactive plot window, which requires you to find an angle from which you can see everything you need. That might be good for some data-exploration tasks, but if you use scripts for your visualization and a minor change to the graphic forces you to do manual work again, you lose the advantage you expect from scripting.
3. If the values of your third dimension (here, the second-level header) differ strongly and are not evenly spaced, something like [0, 1, 1.7, 2.5, 6.2], the 2D plots end up at very different distances from one another, which is unacceptable, at least for any non-programming audience reading the publications.
4. It is quite long and technical for a plotting operation that is very common in spectroscopy. That amount of code would be fine if I wanted to build software that makes 3D plots in some context, but for science it would be preferable to accomplish something like this with little code.
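As the question itself suspects, a stacked overlay of 2D plots can avoid all four drawbacks: each spectrum is drawn on an ordinary 2D axes, shifted upward by a constant step, and the y ticks are labeled with the header values. Spacing by position rather than by the header value keeps uneven values evenly spaced, and nothing depends on a 3D view angle. This is a sketch, not a built-in pandas feature: stacked_plot is a hypothetical helper, the data are synthetic Gaussian peaks with the question's two-level header, and treating the index as a chemical shift in ppm is an assumption.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Synthetic stand-in for the CSV: a ppm index and six spectra with the
# question's two-level header (A..F over 2.0, 1.0, 0.2, 0.4, 0.6, 0.8)
ppm = np.linspace(126.45, 126.30, 300)
columns = pd.MultiIndex.from_arrays(
    [list("ABCDEF"), [2.0, 1.0, 0.2, 0.4, 0.6, 0.8]])
spectra = np.column_stack([
    1e6 * np.exp(-((ppm - 126.40 + 0.01 * i) / 0.005) ** 2) - 4e5
    for i in range(len(columns))])
df = pd.DataFrame(spectra, index=ppm, columns=columns)

def stacked_plot(df, ax, gap=1.1):
    """Draw every column as its own trace, offset upward by a constant step.

    Traces are spaced by position, not by header value, so uneven values
    such as [0, 1, 1.7, 2.5, 6.2] still give evenly spaced traces.
    Negative values are fine: each trace keeps its own zero line.
    """
    step = gap * (np.nanmax(df.to_numpy()) - np.nanmin(df.to_numpy()))
    offsets = []
    for i, col in enumerate(df.columns):
        offset = i * step
        ax.plot(df.index, df[col] + offset, color="black", linewidth=0.8)
        offsets.append(offset)
    # Label each trace's zero line with its header values
    ax.set_yticks(offsets)
    ax.set_yticklabels([f"{col[0]} ({col[1]})" for col in df.columns])
    ax.invert_xaxis()  # NMR convention: chemical shift decreases to the right
    ax.set_xlabel("chemical shift (ppm)")
    return offsets

fig, ax = plt.subplots(figsize=(6, 6))
offsets = stacked_plot(df, ax)
plt.show()
```

To order the traces by the second-level value instead of by column order, the DataFrame could be sorted first with df.sort_index(axis=1, level=1).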
I gave you an example of plotting the continuous X and Y data, with z hard-coded from your second-level header:
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older matplotlib versions
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import matplotlib
# Jupyter/IPython only; remove the next line in a plain script
%matplotlib inline
df = pd.read_csv(r"C:\Users\User\SkyDrive\Documents\import_data.tcsv.txt", header=None)
fig = plt.figure()
ax = fig.add_subplot(projection='3d')
# Plot each spectrum (columns 1-6) against column 0, at a fixed z position.
x = df[0]
ax.plot(x, df[1], zs=2, zdir='z', label='A')
ax.plot(x, df[2], zs=1, zdir='z', label='B')
ax.plot(x, df[3], zs=0.2, zdir='z', label='C')
ax.plot(x, df[4], zs=0.4, zdir='z', label='D')
ax.plot(x, df[5], zs=0.6, zdir='z', label='E')
ax.plot(x, df[6], zs=0.8, zdir='z', label='F')
# Set the view angle; adjust elev/azim until the axes are oriented as you want.
ax.view_init(elev=-150., azim=40)
plt.show()
You're going to have to play with the view_init options to rotate the plot and get the axes where you want them. I'm not entirely clear on what your end goal was, but this gives one line per spectrum at its z position.
what is the difference between the two datasets for numpy.fft
I am trying to find the period of a sine curve. I can find the right period for sin(t), but for sin(k*t) the frequency shifts, and I do not know how it shifts. I can adjust the value of interd below to get the right signal, but only if I already know that the dataset is sin(0.6*t). Why do I get the right result for sin(t) but not for sin(k*t)? Can anyone detect the right signal based on my code, perhaps with just a small change?
[Figure: power spectral density of sin(0.6*t)]
The dataset looks like this:
1,sin(1*0.6)
2,sin(2*0.6)
3,sin(3*0.6)
.........
2000,sin(2000*0.6)
And my code:
import numpy as np
import matplotlib.pyplot as pl
timepoints = np.loadtxt('dataset', usecols=(0,), unpack=True, delimiter=",")
intensity = np.loadtxt('dataset', usecols=(1,), unpack=True, delimiter=",")
binshu = 300
lastime = 2000
interd = 2000.0/300
sp = np.fft.fft(intensity)
freq = np.fft.fftfreq(len(intensity), d=interd)
freqnum = np.fft.fftfreq(len(intensity), d=interd).argsort()
pl.xlabel("frequency(Hz)")
pl.plot(freq[freqnum]*6.28, np.sqrt(sp.real**2+sp.imag**2)[freqnum])
pl.show()
I think you're making it too complicated. If you consider timepoints to be in seconds, then interd is 1 (the difference between consecutive values in timepoints). This works fine for me:
import numpy as np
import matplotlib.pyplot as pl
# you can do this in one line, that's what 'unpack' is for:
timepoints, intensity = np.loadtxt('dataset', usecols=(0, 1), unpack=True, delimiter=",")
interd = timepoints[1] - timepoints[0]  # if this is 1, it can be ignored
sp = np.fft.fft(intensity)
freq = np.fft.fftfreq(len(intensity), d=interd)
pl.plot(np.fft.fftshift(freq), np.fft.fftshift(np.abs(sp)))
pl.xlabel("frequency(Hz)")
pl.show()
You'll also note that I didn't sort the frequencies; that's what fftshift is for. Also, don't do np.sqrt(sp.imag**2 + sp.real**2); that's what np.abs is for :)
If you're not sampling often enough (the signal frequency k/(2*pi) is higher than half your sample rate 1/interd, i.e., k*interd > pi), then there's no way for the FFT to know how much data you're missing, so it assumes you're not missing any. You can't expect it to know that a priori.
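To recover k directly from the spectrum, one option is to take the frequency of the largest peak (in cycles per time unit, as returned by fftfreq/rfftfreq) and multiply by 2*pi. This sketch regenerates the question's dataset in memory (t = 1..2000, spacing 1) instead of reading the file, and dominant_angular_frequency is a hypothetical helper. The estimate is only as precise as the frequency resolution, 2*pi/(N*d) in angular units.

```python
import numpy as np

def dominant_angular_frequency(intensity, d=1.0):
    """Angular frequency (rad per time unit) of the largest peak in the spectrum."""
    spectrum = np.abs(np.fft.rfft(intensity))
    freq = np.fft.rfftfreq(len(intensity), d=d)  # cycles per time unit
    spectrum[0] = 0.0                            # ignore the DC (mean) term
    return 2 * np.pi * freq[np.argmax(spectrum)]

t = np.arange(1, 2001)   # timepoints 1..2000 as in the question, spacing d = 1
for k in (1.0, 0.6, 4.0):
    omega = dominant_angular_frequency(np.sin(k * t), d=t[1] - t[0])
    print(f"k = {k}: peak at omega = {omega:.3f}, period = {2 * np.pi / omega:.2f}")
```

For k = 1.0 and k = 0.6 the peak lands next to k itself, so the period is 2*pi/omega. For k = 4.0, which exceeds the limit pi for d = 1, the peak shows up near 2*pi - 4 (about 2.28) instead: this is the aliasing described above.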