vectorize for-loop to fill Pandas DataFrame - python

For a financial application, I'm trying to create a DataFrame where each row is a session date value for a particular equity. To get the data, I'm using Pandas Remote Data. So, for example, the features I'm trying to create might be the adjusted closes for the preceding 32 sessions.
This is easy to do in a for-loop, but it takes quite a long time for large feature sets (like going back to 1960 on "ge" and making each row contain the preceding 256 session values). Does anyone see a good way to vectorize this code?
import pandas as pd

def featurize(equity_data, n_sessions, col_label='Adj Close'):
    """
    Generate a raw (unnormalized) feature set from the input data.
    The value at col_label on the given date is taken
    as a feature, and each row contains values for n_sessions
    """
    features = pd.DataFrame(index=equity_data.index[(n_sessions - 1):],
            columns=range((-n_sessions + 1), 1))
    for i in range(len(features.index)):
        features.iloc[i, :] = equity_data[i:(n_sessions + i)][col_label].values
    return features
I could alternatively just multi-thread this easily, but I'm guessing that pandas does that automatically if I can vectorize it. I mention that mainly because my primary concern is performance. So, if multi-threading is likely to outperform vectorization in any significant way, then I'd prefer that.
Short example of input and output:
>>> eq_data
Open High Low Close Volume Adj Close
Date
2014-01-02 15.42 15.45 15.28 15.44 31528500 14.96
2014-01-03 15.52 15.64 15.30 15.51 46122300 15.02
2014-01-06 15.72 15.76 15.52 15.58 42657600 15.09
2014-01-07 15.73 15.74 15.35 15.38 54476300 14.90
2014-01-08 15.60 15.71 15.51 15.54 48448300 15.05
2014-01-09 15.83 16.02 15.77 15.84 67836500 15.34
2014-01-10 16.01 16.11 15.94 16.07 44984000 15.57
2014-01-13 16.37 16.53 16.08 16.11 57566400 15.61
2014-01-14 16.31 16.43 16.17 16.40 44039200 15.89
2014-01-15 16.37 16.73 16.35 16.70 64118200 16.18
2014-01-16 16.67 16.76 16.56 16.73 38410800 16.21
2014-01-17 16.78 16.78 16.45 16.52 37152100 16.00
2014-01-21 16.64 16.68 16.36 16.41 35597200 15.90
2014-01-22 16.44 16.62 16.37 16.55 28741900 16.03
2014-01-23 16.49 16.53 16.31 16.43 37860800 15.92
2014-01-24 16.19 16.21 15.78 15.83 66023500 15.33
2014-01-27 15.90 15.91 15.52 15.71 51218700 15.22
2014-01-28 15.97 16.01 15.51 15.72 57677500 15.23
2014-01-29 15.48 15.53 15.20 15.26 52241500 14.90
2014-01-30 15.43 15.45 15.18 15.25 32654100 14.89
2014-01-31 15.09 15.10 14.90 14.96 64132600 14.61
>>> features = data.featurize(eq_data, 3)
>>> features
-2 -1 0
Date
2014-01-06 14.96 15.02 15.09
2014-01-07 15.02 15.09 14.9
2014-01-08 15.09 14.9 15.05
2014-01-09 14.9 15.05 15.34
2014-01-10 15.05 15.34 15.57
2014-01-13 15.34 15.57 15.61
2014-01-14 15.57 15.61 15.89
2014-01-15 15.61 15.89 16.18
2014-01-16 15.89 16.18 16.21
2014-01-17 16.18 16.21 16
2014-01-21 16.21 16 15.9
2014-01-22 16 15.9 16.03
2014-01-23 15.9 16.03 15.92
2014-01-24 16.03 15.92 15.33
2014-01-27 15.92 15.33 15.22
2014-01-28 15.33 15.22 15.23
2014-01-29 15.22 15.23 14.9
2014-01-30 15.23 14.9 14.89
2014-01-31 14.9 14.89 14.61
So each row of features is a series of 3 (n_sessions) successive values from the 'Adj Close' column of the eq_data DataFrame.
====================
Improved version based on Primer's answer below:
def featurize(equity_data, n_sessions, column='Adj Close'):
    """
    Generate a raw (unnormalized) feature set from the input data.
    The value at column on the given date is taken
    as a feature, and each row contains values for n_sessions

    >>> timeit.timeit('data.featurize(data.get("ge", dt.date(1960, 1, 1),
            dt.date(2014, 12, 31)), 256)', setup=s, number=1)
    1.6771750450134277
    """
    features = pd.DataFrame(index=equity_data.index[(n_sessions - 1):],
            columns=map(str, range((-n_sessions + 1), 1)), dtype='float64')
    values = equity_data[column].values
    for i in range(n_sessions - 1):
        features.iloc[:, i] = values[i:(-n_sessions + i + 1)]
    features.iloc[:, n_sessions - 1] = values[(n_sessions - 1):]
    return features
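For reference, on recent numpy (1.20+) the loop can be eliminated entirely with numpy's sliding_window_view. A minimal sketch under that assumption; featurize_np is an illustrative name, not from the original code:
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

def featurize_np(equity_data, n_sessions, column='Adj Close'):
    # each row is a length-n_sessions window over the column; no Python-level loop
    windows = sliding_window_view(equity_data[column].values, n_sessions)
    return pd.DataFrame(windows,
                        index=equity_data.index[(n_sessions - 1):],
                        columns=range(-n_sessions + 1, 1))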

It looks like shift is your friend here and something like this will do:
import numpy as np
import pandas as pd

df = pd.DataFrame({'adj close': np.random.random(10) + 15},
                  index=pd.date_range(start='2014-01-02', periods=10, freq='B'))
df.index.name = 'date'
df
adj close
date
2014-01-02 15.650
2014-01-03 15.775
2014-01-06 15.750
2014-01-07 15.464
2014-01-08 15.966
2014-01-09 15.475
2014-01-10 15.164
2014-01-13 15.281
2014-01-14 15.568
2014-01-15 15.648
features = pd.DataFrame(data=df['adj close'], index=df.index)
features.columns = ['0']
features['-1'] = df['adj close'].shift()
features['-2'] = df['adj close'].shift(2)
features.dropna(inplace=True)
features
0 -1 -2
date
2014-01-06 15.750 15.775 15.650
2014-01-07 15.464 15.750 15.775
2014-01-08 15.966 15.464 15.750
2014-01-09 15.475 15.966 15.464
2014-01-10 15.164 15.475 15.966
2014-01-13 15.281 15.164 15.475
2014-01-14 15.568 15.281 15.164
2014-01-15 15.648 15.568 15.281
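The same shift idea generalizes to any n_sessions without writing each column by hand. A hedged sketch (featurize_shift is an illustrative name, not part of the answer above):
def featurize_shift(df, n_sessions, column='adj close'):
    # one shifted copy of the column per lag, collected into a single frame
    cols = {str(-k): df[column].shift(k) for k in range(n_sessions)}
    return pd.concat(cols, axis=1).dropna()

features = featurize_shift(df, 3)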

Related

How to calculate moving average using pandas for a daily frequency over 3 years

I have a large dataset and need to calculate rolling returns over 3 years for each date. I am new to pandas and cannot work out how to do this with it. Below is my sample data frame.
nav_date price
1989 2019-11-29 25.02
2338 2019-11-28 25.22
1991 2019-11-27 25.11
1988 2019-11-26 24.98
1990 2019-11-25 25.06
1978 2019-11-22 24.73
1984 2019-11-21 24.84
1985 2019-11-20 24.90
1980 2019-11-19 24.78
1971 2019-11-18 24.67
1975 2019-11-15 24.69
1970 2019-11-14 24.64
1962 2019-11-13 24.58
1977 2019-11-11 24.73
1976 2019-11-08 24.72
1987 2019-11-07 24.93
1983 2019-11-06 24.84
1979 2019-11-05 24.74
1981 2019-11-04 24.79
1974 2019-11-01 24.68
2337 2019-10-31 24.66
1966 2019-10-30 24.59
1957 2019-10-29 24.47
1924 2019-10-25 24.06
2336 2019-10-24 24.06
1929 2019-10-23 24.10
1923 2019-10-22 24.05
1940 2019-10-18 24.20
1921 2019-10-17 24.05
1890 2019-10-16 23.77
1882 2019-10-15 23.70
1868 2019-10-14 23.52
1860 2019-10-11 23.45
1846 2019-10-10 23.30
1862 2019-10-09 23.46
2335 2019-10-07 23.08
1837 2019-10-04 23.18
1863 2019-10-03 23.47
1873 2019-10-01 23.57
1894 2019-09-30 23.80
1901 2019-09-27 23.88
1916 2019-09-26 24.00
1885 2019-09-25 23.73
1919 2019-09-24 24.04
1925 2019-09-23 24.06
1856 2019-09-20 23.39
1724 2019-09-19 22.22
1773 2019-09-18 22.50
1763 2019-09-17 22.45
1811 2019-09-16 22.83
1825 2019-09-13 22.98
1806 2019-09-12 22.79
1817 2019-09-11 22.90
1812 2019-09-09 22.84
1797 2019-09-06 22.72
1777 2019-09-05 22.52
1776 2019-09-04 22.51
2334 2019-09-03 22.42
1815 2019-08-30 22.88
1798 2019-08-29 22.73
1820 2019-08-28 22.93
1830 2019-08-27 23.05
1822 2019-08-26 22.95
1770 2019-08-23 22.48
1737 2019-08-22 22.30
1794 2019-08-21 22.66
2333 2019-08-20 22.86
1821 2019-08-19 22.93
1819 2019-08-16 22.92
1814 2019-08-14 22.88
I can do this in plain Python, but it takes too long to execute. In Python I do it like this:
from datetime import date
from dateutil.relativedelta import relativedelta

start_date = date(2019, 10, 31)
end_date = date(2016, 10, 31)  # for 3 years
years = 3
# Look at the price on each date between end_date and start_date,
# calculate the 3-year CAGR from that date, then average them all.
# (price_dict maps a date to its price.)
total_returns = 0
for n in range((start_date - end_date).days):
    sd = start_date - relativedelta(days=n)
    ed = sd - relativedelta(years=years)
    returns = (((price_dict[sd] / price_dict[ed]) ** (1 / years)) - 1) * 100
    total_returns += returns
roll_return = total_returns / (start_date - end_date).days
I am sure there is a way to get the same output using pandas without so much iteration, since this is getting too slow and takes too much time to execute. Thanks in advance.
You didn't show the expected result... In any case, this is just an example, and I think you'll understand my approach.
df = pd.DataFrame({
    'nav_date': (
        '2019-11-29',
        '2018-11-29',
        '2017-11-29',
        '2016-11-29',
        '2019-11-28',
        '2018-11-28',
        '2017-11-28',
        '2016-11-28',
    ),
    'price': (
        25.02,  # <- example of your price(2019-11-29)
        25.11,
        25.06,
        26.50,  # <- example of your price(2016-11-29)
        30.51,
        30.41,
        30.31,
        30.21,
    ),
})
# parse year from date string
df['year'] = df['nav_date'].apply(lambda x: x[0:4])
# parse date without year
df['nav_date'] = df['nav_date'].apply(lambda x: x[5:])
# years to columns, prices to rows
df = df.pivot(index='nav_date', columns='year', values='price')
df = pd.DataFrame(df.to_records())
# value calculation by columns: 3-year CAGR in percent
df['2019'] = ((df['2019'] / df['2016']) ** (1 / 3) - 1) * 100
# df['2018'] = blablabla...
print(df)
Result:
  nav_date   2016   2017   2018      2019
0    11-28  30.21  30.31  30.41  0.329927
1    11-29  26.50  25.06  25.11 -1.897409 # <- your expected value
So you'll have a dataframe with the calculated values for each day, and you can easily do anything with it (avg()/max()/min()/any other manipulation).
Hope this helps.
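A more vectorized sketch of the same idea, assuming the full frame from the question (columns nav_date and price) and a fixed 3-year lookback; base, cagr and result are illustrative names:
df['nav_date'] = pd.to_datetime(df['nav_date'])
prices = df.set_index('nav_date')['price'].sort_index()
# for each date, take the last available price at or before the date 3 years earlier
base = prices.reindex(prices.index - pd.DateOffset(years=3), method='ffill')
cagr = ((prices.values / base.values) ** (1 / 3) - 1) * 100
result = pd.Series(cagr, index=prices.index).dropna()
print(result.mean())  # average rolling 3-year return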

How to plot candlestick skipping empty dates using matplotlib?

I'm still a newbie to matplotlib. Currently, I have the dataset below for plotting:
Date Open High Low Close
Trade_Date
2018-01-02 736696.0 42.45 42.45 41.45 41.45
2018-01-03 736697.0 41.60 41.70 40.70 40.95
2018-01-04 736698.0 40.90 41.05 40.20 40.25
2018-01-05 736699.0 40.35 41.60 40.35 41.50
2018-01-08 736702.0 40.20 40.20 37.95 38.00
2018-01-09 736703.0 37.15 39.00 37.15 38.00
2018-01-10 736704.0 38.70 38.70 37.15 37.25
2018-01-11 736705.0 37.50 37.50 36.55 36.70
2018-01-12 736706.0 37.00 37.40 36.90 37.20
2018-01-15 736709.0 37.50 37.70 37.15 37.70
2018-01-16 736710.0 37.80 38.25 37.45 37.95
2018-01-17 736711.0 38.00 38.05 37.65 37.75
2018-01-18 736712.0 38.00 38.20 37.70 37.75
2018-01-19 736713.0 36.70 37.10 35.30 36.45
2018-01-22 736716.0 36.25 36.25 35.50 36.10
2018-01-23 736717.0 36.20 36.30 35.65 36.00
2018-01-24 736718.0 35.80 36.00 35.60 36.00
2018-01-25 736719.0 36.10 36.10 35.45 35.45
2018-01-26 736720.0 35.50 35.75 35.00 35.00
2018-01-29 736723.0 34.80 35.00 33.65 33.70
2018-01-30 736724.0 33.70 34.45 33.65 33.90
I've converted the date values to numbers using mdates.date2num.
After that, I've tried to plot candlestick graph with codes below:
f1, ax = plt.subplots(figsize= (10,5))
candlestick_ohlc(ax, ohlc.values, width=.6, colorup='red', colordown='green')
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
plt.show()
However, I'm still getting a graph with gaps.
I've tried the possible solution from How do I plot only weekdays using Python's matplotlib candlestick?
However, I was not able to solve my problem with the solution above.
Can anyone kindly help me with this issue?
Thanks!
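One common workaround is to plot against consecutive row positions instead of date numbers, then relabel the ticks with the real dates, so non-trading days never appear on the axis at all. A hedged sketch, assuming ohlc is the frame above (columns Date/Open/High/Low/Close, DatetimeIndex) and candlestick_ohlc comes from mpl_finance:
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
from mpl_finance import candlestick_ohlc

quotes = ohlc.copy()
quotes['Date'] = np.arange(len(quotes))  # consecutive positions instead of dates

f1, ax = plt.subplots(figsize=(10, 5))
candlestick_ohlc(ax, quotes.values, width=.6, colorup='red', colordown='green')

def format_date(x, pos=None):
    # map a tick position back to the trading date it represents
    idx = int(round(x))
    return ohlc.index[idx].strftime('%Y-%m-%d') if 0 <= idx < len(ohlc) else ''

ax.xaxis.set_major_locator(mticker.MaxNLocator(6))
ax.xaxis.set_major_formatter(mticker.FuncFormatter(format_date))
plt.show()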

Pandas data-frame column without labels

I have the following data set in a pandas dataframe:
print data
Result:
Open High Low Close Adj Close Volume
Date
2018-05-25 12.70 12.73 12.48 12.61 12.610000 1469800
2018-05-24 12.99 13.08 12.93 12.98 12.980000 814800
2018-05-23 13.19 13.30 13.06 13.12 13.120000 1417500
2018-05-22 13.46 13.57 13.25 13.27 13.270000 1189000
2018-05-18 13.41 13.44 13.36 13.38 13.380000 986300
2018-05-17 13.19 13.42 13.19 13.40 13.400000 1056200
2018-05-16 13.01 13.14 13.01 13.12 13.120000 481300
If I just want to print a single column, such as Low, it shows with the date index:
print data.Low
Result:
Date
2018-05-25 12.48
2018-05-24 12.93
2018-05-23 13.06
2018-05-22 13.25
2018-05-18 13.36
2018-05-17 13.19
2018-05-16 13.01
Is there a way to slice/print just the price values? So the output will be like:
12.48
12.93
13.06
13.25
13.36
13.19
13.01
In pandas, Series and DataFrames always need some index values.
A default RangeIndex can be created by:
print data.reset_index(drop=True).Low
But if you need to write only the values to a file, as a column with no index and no header:
data.Low.to_csv(file, index=False, header=None)
If you need to convert the column to a list:
print data.Low.tolist()
[12.48, 12.93, 13.06, 13.25, 13.36, 13.19, 13.01]
And for a 1d numpy array:
print data.Low.values
[12.48 12.93 13.06 13.25 13.36 13.19 13.01]
If you want an Mx1 array:
print (data[['Low']].values)
[[12.48]
[12.93]
[13.06]
[13.25]
[13.36]
[13.19]
[13.01]]
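One more option, not in the original answer: Series.to_string can render the values alone, without the index and without writing to a file (a small hedged addition):
print data.Low.to_string(index=False)
12.48
12.93
13.06
13.25
13.36
13.19
13.01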

Select certain dates from Pandas dataframe

I am learning how to filter dates on a Pandas data frame and need some help with the following please. This is my original data frame (from this data):
data
Out[120]:
Open High Low Last Volume NumberOfTrades BidVolume AskVolume
Timestamp
2014-03-04 09:30:00 1783.50 1784.50 1783.50 1784.50 171 17 29 142
2014-03-04 09:31:00 1784.75 1785.75 1784.50 1785.25 28 21 10 18
2014-03-04 09:32:00 1785.00 1786.50 1785.00 1786.50 81 19 4 77
2014-03-04 09:33:00 1786.00 1786.00 1785.25 1785.25 41 14 8 33
2014-03-04 09:34:00 1785.00 1785.25 1784.75 1785.25 11 8 2 9
2014-03-04 09:35:00 1785.50 1786.75 1785.50 1785.75 49 27 13 36
2014-03-04 09:36:00 1786.00 1786.00 1785.25 1785.75 12 8 3 9
2014-03-04 09:37:00 1786.00 1786.25 1785.25 1785.25 15 8 10 5
2014-03-04 09:38:00 1785.50 1785.50 1784.75 1785.25 24 17 17 7
data.dtypes
Out[118]:
Open float64
High float64
Low float64
Last float64
Volume int64
NumberOfTrades int64
BidVolume int64
AskVolume int64
dtype: object
I then resampled to 5 minute sections:
five_min = data.resample('5T').sum()
And looked for the high-volume days:
max_volume = five_min.Volume.at_time('9:30') > 65000
I then try to get the high-volume days as follows:
five_min.Volume = max_volume[max_volume == True]
for_high_vol = five_min.Volume.dropna()
for_high_vol
Timestamp
2014-03-21 09:30:00 True
2014-04-11 09:30:00 True
2014-04-16 09:30:00 True
2014-04-17 09:30:00 True
2014-07-18 09:30:00 True
2014-07-31 09:30:00 True
2014-09-19 09:30:00 True
2014-10-07 09:30:00 True
2014-10-10 09:30:00 True
2014-10-14 09:30:00 True
2014-10-15 09:30:00 True
2014-10-16 09:30:00 True
2014-10-17 09:30:00 True
I would like to use the index from "for_high_vol" to select all of the days from the original "data" Pandas dataframe.
I'm sure there are much better ways to approach this, so can someone please show me the simplest way to do it?
IIUC, you can do it this way:
x.loc[(x.groupby(pd.Grouper(key='Timestamp', freq='5T'))['Volume'].transform('sum') > 65000)
&
(x.Timestamp.dt.hour==9)
&
(x.Timestamp.dt.minute>=30) & (x.Timestamp.dt.minute<=34)]
In order to set the index back:
x.loc[(x.groupby(pd.Grouper(key='Timestamp', freq='5T'))['Volume'].transform('sum') > 65000)
&
(x.Timestamp.dt.hour==9)
&
(x.Timestamp.dt.minute>=30) & (x.Timestamp.dt.minute<=34)].set_index('Timestamp')
PS: Timestamp is a regular column in my DF, not an index.
Explanation:
Resample/group our DF by 5-minute intervals, calculate the sum of Volume for each group, and assign this sum to all rows in the group. For example, below, 332 is the sum of Volume in the first 5-minute group:
In [41]: (x.groupby(pd.Grouper(key='Timestamp', freq='5T'))['Volume'].transform('sum')).head(10)
Out[41]:
0 332
1 332
2 332
3 332
4 332
5 113
6 113
7 113
8 113
9 113
dtype: int64
Filter the time - the conditions are self-explanatory:
(x.Timestamp.dt.hour==9) & (x.Timestamp.dt.minute>=30) & (x.Timestamp.dt.minute<=34)
and finally combine all conditions (filters) together - pass it to the .loc[] indexer and set the index back to Timestamp:
x.loc[(x.groupby(pd.Grouper(key='Timestamp', freq='5T'))['Volume'].transform('sum') > 65000)
&
(x.Timestamp.dt.hour==9)
&
(x.Timestamp.dt.minute>=30) & (x.Timestamp.dt.minute<=34)].set_index('Timestamp')
Output:
Out[32]:
Timestamp Open High Low Last Volume NumberOfTrades BidVolume AskVolume
5011 2014-03-21 09:30:00 1800.75 1802.50 1800.00 1802.25 30181 6006 13449 16732
5012 2014-03-21 09:31:00 1802.50 1803.25 1802.25 1802.50 15588 3947 5782 9806
5013 2014-03-21 09:32:00 1802.50 1803.75 1802.25 1803.25 16409 3994 6867 9542
5014 2014-03-21 09:33:00 1803.00 1803.50 1802.75 1803.25 10790 3158 4781 6009
5015 2014-03-21 09:34:00 1803.25 1804.75 1803.25 1804.75 13377 3466 4690 8687
11086 2014-04-11 09:30:00 1744.75 1744.75 1743.00 1743.50 21504 5876 11178 10326
11087 2014-04-11 09:31:00 1743.50 1746.50 1743.25 1746.00 21582 6191 8830 12752
11088 2014-04-11 09:32:00 1746.00 1746.50 1744.25 1745.75 18961 5214 9521 9440
11089 2014-04-11 09:33:00 1746.00 1746.25 1744.00 1744.25 12832 3658 7219 5613
11090 2014-04-11 09:34:00 1744.25 1744.25 1742.00 1742.75 15478 4919 8912 6566
12301 2014-04-16 09:30:00 1777.50 1778.25 1776.25 1777.00 21178 5431 10775 10403
12302 2014-04-16 09:31:00 1776.75 1779.25 1776.50 1778.50 16456 4400 6351 10105
12303 2014-04-16 09:32:00 1778.50 1779.25 1777.25 1777.50 9956 3015 5810 4146
12304 2014-04-16 09:33:00 1777.50 1778.00 1776.25 1776.25 8724 2470 5326 3398
12305 2014-04-16 09:34:00 1776.25 1777.00 1775.50 1776.25 9566 2968 5098 4468
12706 2014-04-17 09:30:00 1781.50 1782.50 1781.25 1782.25 16474 4583 7510 8964
12707 2014-04-17 09:31:00 1782.25 1782.50 1781.00 1781.25 10328 2587 6310 4018
12708 2014-04-17 09:32:00 1781.25 1782.25 1781.00 1781.25 9072 2142 4618 4454
12709 2014-04-17 09:33:00 1781.00 1781.75 1780.25 1781.25 17866 3807 10665 7201
12710 2014-04-17 09:34:00 1781.50 1782.25 1780.50 1781.75 11322 2523 5538 5784
38454 2014-07-18 09:30:00 1893.50 1893.75 1892.50 1893.00 24864 5135 13874 10990
38455 2014-07-18 09:31:00 1892.75 1893.50 1892.75 1892.75 8003 1751 3571 4432
38456 2014-07-18 09:32:00 1893.00 1893.50 1892.75 1893.50 7062 1680 3454 3608
38457 2014-07-18 09:33:00 1893.25 1894.25 1893.00 1894.25 10581 1955 3925 6656
38458 2014-07-18 09:34:00 1894.25 1895.25 1894.00 1895.25 15309 3347 5516 9793
42099 2014-07-31 09:30:00 1886.25 1886.25 1884.25 1884.75 21668 5857 11910 9758
42100 2014-07-31 09:31:00 1884.50 1884.75 1882.25 1883.00 17487 5186 11403 6084
42101 2014-07-31 09:32:00 1883.00 1884.50 1882.50 1884.00 13174 3782 4791 8383
42102 2014-07-31 09:33:00 1884.25 1884.50 1883.00 1883.25 9095 2814 5299 3796
42103 2014-07-31 09:34:00 1883.25 1884.25 1883.00 1884.25 7593 2528 3794 3799
... ... ... ... ... ... ... ... ... ...
193508 2016-01-21 09:30:00 1838.00 1838.75 1833.00 1834.00 22299 9699 12666 9633
193509 2016-01-21 09:31:00 1834.00 1836.50 1833.00 1834.50 8851 4520 4010 4841
193510 2016-01-21 09:32:00 1834.25 1835.25 1832.50 1833.25 7957 3672 3582 4375
193511 2016-01-21 09:33:00 1833.00 1838.50 1832.00 1838.00 12902 5564 5174 7728
193512 2016-01-21 09:34:00 1838.00 1841.50 1837.75 1840.50 13991 6130 6799 7192
199178 2016-02-10 09:30:00 1840.00 1841.75 1839.00 1840.75 13683 5080 6743 6940
199179 2016-02-10 09:31:00 1840.75 1842.00 1838.75 1841.50 11753 4623 5616 6137
199180 2016-02-10 09:32:00 1841.50 1844.75 1840.75 1843.00 16402 6818 8226 8176
199181 2016-02-10 09:33:00 1843.00 1843.50 1841.00 1842.00 14963 5402 8431 6532
199182 2016-02-10 09:34:00 1842.25 1843.50 1840.00 1840.00 8397 3475 4537 3860
200603 2016-02-16 09:30:00 1864.00 1866.25 1863.50 1864.75 19585 6865 9548 10037
200604 2016-02-16 09:31:00 1865.00 1865.50 1863.75 1864.25 16604 5936 8095 8509
200605 2016-02-16 09:32:00 1864.25 1864.75 1862.75 1863.50 10126 4713 5591 4535
200606 2016-02-16 09:33:00 1863.25 1863.75 1861.50 1862.25 9648 3786 5824 3824
200607 2016-02-16 09:34:00 1862.25 1863.50 1861.75 1862.25 10748 4143 5413 5335
205058 2016-03-02 09:30:00 1952.75 1954.25 1952.00 1952.75 19812 6684 10350 9462
205059 2016-03-02 09:31:00 1952.75 1954.50 1952.25 1953.50 10163 4236 3884 6279
205060 2016-03-02 09:32:00 1953.50 1954.75 1952.25 1952.50 15771 5519 8135 7636
205061 2016-03-02 09:33:00 1952.75 1954.50 1952.50 1953.75 9556 3583 3768 5788
205062 2016-03-02 09:34:00 1953.75 1954.75 1952.25 1952.50 11898 4463 6459 5439
209918 2016-03-18 09:30:00 2027.50 2028.25 2026.50 2028.00 38092 8644 17434 20658
209919 2016-03-18 09:31:00 2028.00 2028.25 2026.75 2027.25 11631 3209 6384 5247
209920 2016-03-18 09:32:00 2027.25 2027.75 2027.00 2027.50 9664 3270 5080 4584
209921 2016-03-18 09:33:00 2027.50 2027.75 2026.75 2026.75 10610 3117 5358 5252
209922 2016-03-18 09:34:00 2026.75 2027.00 2026.00 2026.50 8076 3022 4670 3406
227722 2016-05-20 09:30:00 2034.25 2035.25 2033.50 2034.50 30272 7815 16098 14174
227723 2016-05-20 09:31:00 2034.75 2035.75 2034.50 2035.50 12997 3690 6458 6539
227724 2016-05-20 09:32:00 2035.50 2037.50 2035.50 2037.25 12661 3864 5233 7428
227725 2016-05-20 09:33:00 2037.25 2037.75 2036.50 2037.00 9057 2524 5190 3867
227726 2016-05-20 09:34:00 2037.00 2037.50 2036.75 2037.00 5190 1620 2748 2442
[255 rows x 9 columns]
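As a side note, the OP's original plan - using the index of for_high_vol to select rows of data - also works directly when data keeps its DatetimeIndex. A hedged sketch (high_vol_days and selected are illustrative names):
high_vol_days = for_high_vol.index.normalize().unique()
selected = data[data.index.normalize().isin(high_vol_days)]
selected then contains every 1-minute row of data falling on one of the high-volume days.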

Filter a timeseries with some predefined dates in Pandas

I have this code:
close1 = 'Close'; start = '12/18/2015 00:00:00';
end = '3/1/2016 00:00:00'; freq = '1d0h00min';
datefilter = pd.date_range(start=start, end=end, freq=freq).values
close[close['Datetime'].isin(datefilter)]  # only dates in the range
But, strangely, some columns come back with NaN:
Datetime ENTA KITE BSTC SAGE AGEN MGNX ESPR FPRX
2015-12-18 31.73 63.38 16.34 56.88 12.24 NaN NaN 38.72
2015-12-21 32.04 63.60 16.26 56.75 12.18 NaN NaN 42.52
Just wondering the reason, and how can we remedy it?
Original:
Datetime ENTA KITE BSTC SAGE AGEN MGNX ESPR FPRX
0 2013-03-21 17.18 29.0 20.75 30.1 11.52 11.52 38.72
1 2013-03-22 16.81 30.53 21.25 30.0 11.64 11.52 39.42
2 2013-03-25 16.83 32.15 20.8 27.59 11.7 11.52 42.52
3 2013-03-26 17.09 29.55 20.6 27.5 11.76 11.52 11.52
EDIT:
It seems related to the datetime hh:mm:ss filtering.
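If the hh:mm:ss component is indeed the culprit, one hedged remedy is to compare calendar dates only, so a time-of-day mismatch on either side can no longer drop values (a sketch, assuming close['Datetime'] is parseable by pd.to_datetime):
close['Datetime'] = pd.to_datetime(close['Datetime']).dt.normalize()  # strip hh:mm:ss
datefilter = pd.date_range(start=start, end=end, freq=freq).normalize()
close[close['Datetime'].isin(datefilter)]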
