Select certain dates from Pandas dataframe - python

I am learning how to filter dates on a Pandas dataframe and need some help with the following, please. This is my original dataframe (from this data):
data
Out[120]:
Open High Low Last Volume NumberOfTrades BidVolume AskVolume
Timestamp
2014-03-04 09:30:00 1783.50 1784.50 1783.50 1784.50 171 17 29 142
2014-03-04 09:31:00 1784.75 1785.75 1784.50 1785.25 28 21 10 18
2014-03-04 09:32:00 1785.00 1786.50 1785.00 1786.50 81 19 4 77
2014-03-04 09:33:00 1786.00 1786.00 1785.25 1785.25 41 14 8 33
2014-03-04 09:34:00 1785.00 1785.25 1784.75 1785.25 11 8 2 9
2014-03-04 09:35:00 1785.50 1786.75 1785.50 1785.75 49 27 13 36
2014-03-04 09:36:00 1786.00 1786.00 1785.25 1785.75 12 8 3 9
2014-03-04 09:37:00 1786.00 1786.25 1785.25 1785.25 15 8 10 5
2014-03-04 09:38:00 1785.50 1785.50 1784.75 1785.25 24 17 17 7
data.dtypes
Out[118]:
Open float64
High float64
Low float64
Last float64
Volume int64
NumberOfTrades int64
BidVolume int64
AskVolume int64
dtype: object
I then resampled to 5-minute sections:
five_min = data.resample('5T').sum()
and looked for the high-volume days:
max_volume = five_min.Volume.at_time('9:30') > 65000
I then try to get the high-volume days as follows:
five_min.Volume = max_volume[max_volume == True]
for_high_vol = five_min.Volume.dropna()
for_high_vol
Timestamp
2014-03-21 09:30:00 True
2014-04-11 09:30:00 True
2014-04-16 09:30:00 True
2014-04-17 09:30:00 True
2014-07-18 09:30:00 True
2014-07-31 09:30:00 True
2014-09-19 09:30:00 True
2014-10-07 09:30:00 True
2014-10-10 09:30:00 True
2014-10-14 09:30:00 True
2014-10-15 09:30:00 True
2014-10-16 09:30:00 True
2014-10-17 09:30:00 True
I would like to use the index from "for_high_vol" to select all of the days from the original "data" Pandas dataframe.
I'm sure there are much better ways to approach this, so can someone please show me the simplest way to do it?

IIUC, you can do it this way:
x.loc[(x.groupby(pd.Grouper(key='Timestamp', freq='5T'))['Volume'].transform('sum') > 65000)
      & (x.Timestamp.dt.hour == 9)
      & (x.Timestamp.dt.minute >= 30) & (x.Timestamp.dt.minute <= 34)]
In order to set the index back:
x.loc[(x.groupby(pd.Grouper(key='Timestamp', freq='5T'))['Volume'].transform('sum') > 65000)
      & (x.Timestamp.dt.hour == 9)
      & (x.Timestamp.dt.minute >= 30) & (x.Timestamp.dt.minute <= 34)].set_index('Timestamp')
P.S. Timestamp is a regular column in my DF, not the index.
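(As an aside: if Timestamp is your DatetimeIndex, as in the question's data, a minimal sketch of the same selection without resetting the index might look like this.)
# A sketch, assuming 'data' keeps its DatetimeIndex as in the question.
# Sum the Volume of the 09:30-09:34 bars per day, keep the days above the
# threshold, then select every row of the original frame on those days.
opening = data.between_time('09:30', '09:34')
vol_per_day = opening.groupby(opening.index.normalize())['Volume'].sum()
high_days = vol_per_day[vol_per_day > 65000].index
for_high_vol = data[data.index.normalize().isin(high_days)]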
Explanation:
Group our DF into 5-minute intervals, calculate the sum of Volume for each group, and assign that sum to all rows in the group. For example, below, 332 is the sum of Volume in the first 5-minute group:
In [41]: (x.groupby(pd.Grouper(key='Timestamp', freq='5T'))['Volume'].transform('sum')).head(10)
Out[41]:
0 332
1 332
2 332
3 332
4 332
5 113
6 113
7 113
8 113
9 113
dtype: int64
Filter the time; the conditions are self-explanatory:
(x.Timestamp.dt.hour == 9) & (x.Timestamp.dt.minute >= 30) & (x.Timestamp.dt.minute <= 34)
And finally combine all conditions (filters) together, pass them to the .loc[] indexer, and set the index back to Timestamp:
x.loc[(x.groupby(pd.Grouper(key='Timestamp', freq='5T'))['Volume'].transform('sum') > 65000)
      & (x.Timestamp.dt.hour == 9)
      & (x.Timestamp.dt.minute >= 30) & (x.Timestamp.dt.minute <= 34)].set_index('Timestamp')
Output:
Out[32]:
Timestamp Open High Low Last Volume NumberOfTrades BidVolume AskVolume
5011 2014-03-21 09:30:00 1800.75 1802.50 1800.00 1802.25 30181 6006 13449 16732
5012 2014-03-21 09:31:00 1802.50 1803.25 1802.25 1802.50 15588 3947 5782 9806
5013 2014-03-21 09:32:00 1802.50 1803.75 1802.25 1803.25 16409 3994 6867 9542
5014 2014-03-21 09:33:00 1803.00 1803.50 1802.75 1803.25 10790 3158 4781 6009
5015 2014-03-21 09:34:00 1803.25 1804.75 1803.25 1804.75 13377 3466 4690 8687
11086 2014-04-11 09:30:00 1744.75 1744.75 1743.00 1743.50 21504 5876 11178 10326
11087 2014-04-11 09:31:00 1743.50 1746.50 1743.25 1746.00 21582 6191 8830 12752
11088 2014-04-11 09:32:00 1746.00 1746.50 1744.25 1745.75 18961 5214 9521 9440
11089 2014-04-11 09:33:00 1746.00 1746.25 1744.00 1744.25 12832 3658 7219 5613
11090 2014-04-11 09:34:00 1744.25 1744.25 1742.00 1742.75 15478 4919 8912 6566
12301 2014-04-16 09:30:00 1777.50 1778.25 1776.25 1777.00 21178 5431 10775 10403
12302 2014-04-16 09:31:00 1776.75 1779.25 1776.50 1778.50 16456 4400 6351 10105
12303 2014-04-16 09:32:00 1778.50 1779.25 1777.25 1777.50 9956 3015 5810 4146
12304 2014-04-16 09:33:00 1777.50 1778.00 1776.25 1776.25 8724 2470 5326 3398
12305 2014-04-16 09:34:00 1776.25 1777.00 1775.50 1776.25 9566 2968 5098 4468
12706 2014-04-17 09:30:00 1781.50 1782.50 1781.25 1782.25 16474 4583 7510 8964
12707 2014-04-17 09:31:00 1782.25 1782.50 1781.00 1781.25 10328 2587 6310 4018
12708 2014-04-17 09:32:00 1781.25 1782.25 1781.00 1781.25 9072 2142 4618 4454
12709 2014-04-17 09:33:00 1781.00 1781.75 1780.25 1781.25 17866 3807 10665 7201
12710 2014-04-17 09:34:00 1781.50 1782.25 1780.50 1781.75 11322 2523 5538 5784
38454 2014-07-18 09:30:00 1893.50 1893.75 1892.50 1893.00 24864 5135 13874 10990
38455 2014-07-18 09:31:00 1892.75 1893.50 1892.75 1892.75 8003 1751 3571 4432
38456 2014-07-18 09:32:00 1893.00 1893.50 1892.75 1893.50 7062 1680 3454 3608
38457 2014-07-18 09:33:00 1893.25 1894.25 1893.00 1894.25 10581 1955 3925 6656
38458 2014-07-18 09:34:00 1894.25 1895.25 1894.00 1895.25 15309 3347 5516 9793
42099 2014-07-31 09:30:00 1886.25 1886.25 1884.25 1884.75 21668 5857 11910 9758
42100 2014-07-31 09:31:00 1884.50 1884.75 1882.25 1883.00 17487 5186 11403 6084
42101 2014-07-31 09:32:00 1883.00 1884.50 1882.50 1884.00 13174 3782 4791 8383
42102 2014-07-31 09:33:00 1884.25 1884.50 1883.00 1883.25 9095 2814 5299 3796
42103 2014-07-31 09:34:00 1883.25 1884.25 1883.00 1884.25 7593 2528 3794 3799
... ... ... ... ... ... ... ... ... ...
193508 2016-01-21 09:30:00 1838.00 1838.75 1833.00 1834.00 22299 9699 12666 9633
193509 2016-01-21 09:31:00 1834.00 1836.50 1833.00 1834.50 8851 4520 4010 4841
193510 2016-01-21 09:32:00 1834.25 1835.25 1832.50 1833.25 7957 3672 3582 4375
193511 2016-01-21 09:33:00 1833.00 1838.50 1832.00 1838.00 12902 5564 5174 7728
193512 2016-01-21 09:34:00 1838.00 1841.50 1837.75 1840.50 13991 6130 6799 7192
199178 2016-02-10 09:30:00 1840.00 1841.75 1839.00 1840.75 13683 5080 6743 6940
199179 2016-02-10 09:31:00 1840.75 1842.00 1838.75 1841.50 11753 4623 5616 6137
199180 2016-02-10 09:32:00 1841.50 1844.75 1840.75 1843.00 16402 6818 8226 8176
199181 2016-02-10 09:33:00 1843.00 1843.50 1841.00 1842.00 14963 5402 8431 6532
199182 2016-02-10 09:34:00 1842.25 1843.50 1840.00 1840.00 8397 3475 4537 3860
200603 2016-02-16 09:30:00 1864.00 1866.25 1863.50 1864.75 19585 6865 9548 10037
200604 2016-02-16 09:31:00 1865.00 1865.50 1863.75 1864.25 16604 5936 8095 8509
200605 2016-02-16 09:32:00 1864.25 1864.75 1862.75 1863.50 10126 4713 5591 4535
200606 2016-02-16 09:33:00 1863.25 1863.75 1861.50 1862.25 9648 3786 5824 3824
200607 2016-02-16 09:34:00 1862.25 1863.50 1861.75 1862.25 10748 4143 5413 5335
205058 2016-03-02 09:30:00 1952.75 1954.25 1952.00 1952.75 19812 6684 10350 9462
205059 2016-03-02 09:31:00 1952.75 1954.50 1952.25 1953.50 10163 4236 3884 6279
205060 2016-03-02 09:32:00 1953.50 1954.75 1952.25 1952.50 15771 5519 8135 7636
205061 2016-03-02 09:33:00 1952.75 1954.50 1952.50 1953.75 9556 3583 3768 5788
205062 2016-03-02 09:34:00 1953.75 1954.75 1952.25 1952.50 11898 4463 6459 5439
209918 2016-03-18 09:30:00 2027.50 2028.25 2026.50 2028.00 38092 8644 17434 20658
209919 2016-03-18 09:31:00 2028.00 2028.25 2026.75 2027.25 11631 3209 6384 5247
209920 2016-03-18 09:32:00 2027.25 2027.75 2027.00 2027.50 9664 3270 5080 4584
209921 2016-03-18 09:33:00 2027.50 2027.75 2026.75 2026.75 10610 3117 5358 5252
209922 2016-03-18 09:34:00 2026.75 2027.00 2026.00 2026.50 8076 3022 4670 3406
227722 2016-05-20 09:30:00 2034.25 2035.25 2033.50 2034.50 30272 7815 16098 14174
227723 2016-05-20 09:31:00 2034.75 2035.75 2034.50 2035.50 12997 3690 6458 6539
227724 2016-05-20 09:32:00 2035.50 2037.50 2035.50 2037.25 12661 3864 5233 7428
227725 2016-05-20 09:33:00 2037.25 2037.75 2036.50 2037.00 9057 2524 5190 3867
227726 2016-05-20 09:34:00 2037.00 2037.50 2036.75 2037.00 5190 1620 2748 2442
[255 rows x 9 columns]

Related

From hours to String

I have this df:
Index Dates
0 2017-01-01 23:30:00
1 2017-01-12 22:30:00
2 2017-01-20 13:35:00
3 2017-01-21 14:25:00
4 2017-01-28 22:30:00
5 2017-08-01 13:00:00
6 2017-09-26 09:39:00
7 2017-10-08 06:40:00
8 2017-10-04 07:30:00
9 2017-12-13 07:40:00
10 2017-12-31 14:55:00
The goal was that, for the time range 05:00 to 11:59, a new df would be created with data saying: morning. To achieve this I converted those hours to booleans:
hour_morning=(pd.to_datetime(df['Dates']).dt.strftime('%H:%M:%S').between('05:00:00','11:59:00'))
and then passed them into a list comprehension with the "morning" str:
text_morning=[str('morning') for x in hour_morning if x==True]
The error is in the last line: it only returns 'morning' string values, as if the 'x' ignored the 'if' condition. Why is this happening and how do I fix it?
Do
text_morning = ['morning' if x else 'not_morning' for x in hour_morning]
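The difference: a trailing if in a list comprehension filters elements out entirely, while the x if cond else y form maps every element. A quick illustration:
flags = [True, False, True]
['morning' for x in flags if x]                     # ['morning', 'morning'] - the False element is dropped
['morning' if x else 'not_morning' for x in flags]  # ['morning', 'not_morning', 'morning'] - every element is kept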
You can also use np.where:
import numpy as np

text_morning = np.where(hour_morning, 'morning', 'not morning')
Given:
Dates values
0 2017-01-01 23:30:00 0
1 2017-01-12 22:30:00 1
2 2017-01-20 13:35:00 2
3 2017-01-21 14:25:00 3
4 2017-01-28 22:30:00 4
5 2017-08-01 13:00:00 5
6 2017-09-26 09:39:00 6
7 2017-10-08 06:40:00 7
8 2017-10-04 07:30:00 8
9 2017-12-13 07:40:00 9
10 2017-12-31 14:55:00 10
Doing:
# df.Dates = pd.to_datetime(df.Dates)
df = df.set_index("Dates")
Now we can use pd.DataFrame.between_time:
new_df = df.between_time('05:00:00','11:59:00')
print(new_df)
Output:
values
Dates
2017-09-26 09:39:00 6
2017-10-08 06:40:00 7
2017-10-04 07:30:00 8
2017-12-13 07:40:00 9
Or use it to update the original dataframe:
df.loc[df.between_time('05:00:00','11:59:00').index, 'morning'] = 'morning'
# Output:
values morning
Dates
2017-01-01 23:30:00 0 NaN
2017-01-12 22:30:00 1 NaN
2017-01-20 13:35:00 2 NaN
2017-01-21 14:25:00 3 NaN
2017-01-28 22:30:00 4 NaN
2017-08-01 13:00:00 5 NaN
2017-09-26 09:39:00 6 morning
2017-10-08 06:40:00 7 morning
2017-10-04 07:30:00 8 morning
2017-12-13 07:40:00 9 morning
2017-12-31 14:55:00 10 NaN
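If the remaining rows should be labelled too (assuming 'not morning' is the desired fallback, which the question doesn't specify), a short follow-up:
df['morning'] = df['morning'].fillna('not morning')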

Calculate time difference with mean in pandas dataframe?

Let's take any two datetime columns; I want to calculate the formula below in order to get the mean values.
mean(24*(closed_time - created_time ))
In Excel, I tried applying the same logic and got the values below:
closed time created date mean(24*(closed_time - created_time ))
5/14/2022 8:35 5/11/2022 1:08 79.45
5/14/2022 8:12 5/13/2022 8:45 23.45
5/14/2022 8:34 5/13/2022 11:47 20.78333333
5/11/2022 11:21 5/9/2022 16:43 42.63333333
5/11/2022 11:30 5/8/2022 19:51 63.65
5/11/2022 11:22 5/6/2022 16:45 114.6166667
5/11/2022 11:25 5/9/2022 19:53 39.53333333
5/11/2022 11:28 5/9/2022 10:52 48.6
Any help would be appreciated!
Not sure about the mean; on the sample data I got the same output by subtracting the columns and converting seconds to hours:
cols = ['closed time','created date']
df[cols] = df[cols].apply(pd.to_datetime)
df['mean1'] = df['closed time'].sub(df['created date']).dt.total_seconds().div(3600)
print (df)
closed time created date mean mean1
0 2022-05-14 08:35:00 2022-05-11 01:08:00 79.450000 79.450000
1 2022-05-14 08:12:00 2022-05-13 08:45:00 23.450000 23.450000
2 2022-05-14 08:34:00 2022-05-13 11:47:00 20.783333 20.783333
3 2022-05-11 11:21:00 2022-05-09 16:43:00 42.633333 42.633333
4 2022-05-11 11:30:00 2022-05-08 19:51:00 63.650000 63.650000
5 2022-05-11 11:22:00 2022-05-06 16:45:00 114.616667 114.616667
6 2022-05-11 11:25:00 2022-05-09 19:53:00 39.533333 39.533333
7 2022-05-11 11:28:00 2022-05-09 10:52:00 48.600000 48.600000
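If the single average of those hour differences is what the outer mean() in the Excel formula asks for, it is presumably just the mean of that column:
# average of 24 * (closed_time - created_time) across all rows
avg_hours = df['mean1'].mean()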
The mean of both datetimes is computed by:
import numpy as np

df['mean'] = pd.to_datetime(df[['closed time','created date']].astype(np.int64).mean(axis=1))
print (df)
closed time created date mean
0 2022-05-14 08:35:00 2022-05-11 01:08:00 2022-05-12 16:51:30
1 2022-05-14 08:12:00 2022-05-13 08:45:00 2022-05-13 20:28:30
2 2022-05-14 08:34:00 2022-05-13 11:47:00 2022-05-13 22:10:30
3 2022-05-11 11:21:00 2022-05-09 16:43:00 2022-05-10 14:02:00
4 2022-05-11 11:30:00 2022-05-08 19:51:00 2022-05-10 03:40:30
5 2022-05-11 11:22:00 2022-05-06 16:45:00 2022-05-09 02:03:30
6 2022-05-11 11:25:00 2022-05-09 19:53:00 2022-05-10 15:39:00
7 2022-05-11 11:28:00 2022-05-09 10:52:00 2022-05-10 11:10:00

Trying to insert the values of one Pandas Dataframe into another Dataframe using DateTimeIndex

Can't figure this one out. I want to merge the values of one DataFrame into the other DataFrame using the DateTimeIndex, but I can't seem to make it work. Here are the (printed) dataframes:
open high low close volume trade_count vwap minutes
timestamp
2022-04-11 04:00:00-04:00 17.95 18.13 17.89 17.95 7107 184 17.963452 0
2022-04-11 04:01:00-04:00 17.90 17.94 17.84 17.84 2978 71 17.895107 1
2022-04-11 04:02:00-04:00 17.90 17.94 17.66 17.70 4495 67 17.717039 2
2022-04-11 04:03:00-04:00 17.90 17.94 17.66 17.84 3795 58 17.764274 3
2022-04-11 04:04:00-04:00 17.90 17.94 17.66 17.80 3912 55 17.758436 4
... ... ... ... ... ... ... ... ...
2022-04-11 19:55:00-04:00 18.37 18.44 18.30 18.34 7957 31 18.327004 55
2022-04-11 19:56:00-04:00 18.37 18.44 18.30 18.35 5361 6 18.340563 56
2022-04-11 19:57:00-04:00 18.37 18.44 18.30 18.36 1250 16 18.346664 57
2022-04-11 19:58:00-04:00 18.37 18.44 18.30 18.37 2524 30 18.366807 58
2022-04-11 19:59:00-04:00 18.37 18.44 18.30 18.43 3305 41 18.409014 59
[790 rows x 8 columns]
open high low close volume trade_count vwap
timestamp
2022-04-08 04:00:00-04:00 19.69 19.88 19.69 19.82 8246 157 19.780987
2022-04-08 04:15:00-04:00 19.82 19.84 19.77 19.77 2995 73 19.804855
2022-04-08 04:30:00-04:00 19.80 19.80 19.77 19.80 2630 42 19.794878
2022-04-08 04:45:00-04:00 19.79 19.80 19.79 19.79 2294 23 19.793871
2022-04-08 05:00:00-04:00 19.80 19.81 19.80 19.81 888 15 19.805281
... ... ... ... ... ... ... ...
2022-04-11 18:45:00-04:00 18.40 18.42 18.31 18.34 21587 112 18.368550
2022-04-11 19:00:00-04:00 18.39 18.67 18.35 18.39 26144 72 18.388739
2022-04-11 19:15:00-04:00 18.39 18.39 18.30 18.35 46662 128 18.340306
2022-04-11 19:30:00-04:00 18.33 18.44 18.33 18.44 10784 61 18.351895
2022-04-11 19:45:00-04:00 18.42 18.43 18.30 18.43 24923 163 18.356868
[128 rows x 7 columns]
help!!
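No answer was posted, but here is a hedged sketch of two common approaches; the frame names df_1min and df_15min are assumptions, not from the original:
# Exact matches only: 1-minute rows whose timestamp equals a 15-minute bar
# boundary get that bar's values; every other row gets NaN.
merged = df_1min.join(df_15min, rsuffix='_15m')

# Or carry each 15-minute bar forward across the minutes it covers:
merged = pd.merge_asof(
    df_1min.sort_index(), df_15min.sort_index(),
    left_index=True, right_index=True,
    direction='backward', suffixes=('', '_15m'),
)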

Pandas - How to get the sum of rows by multiple columns in a DataFrame

I have the following Pandas DataFrame object df, which denotes incidents that occurred between 2000-07-01 and 2018-03-31. Each row represents an incident that occurred on that date. FID_1 is the index column and uniquely identifies each row. The ICC_NAME column contains 33 unique values for where the incident occurred.
comb_date ICC_NAME
FID_1
267 2000-09-18 09:49:00 Alexandra
462 2000-10-19 01:00:00 Alexandra
696 2000-11-26 15:08:00 Alexandra
734 2000-11-27 19:20:00 Alexandra
760 2000-11-28 20:00:00 Alexandra
761 2000-11-28 20:30:00 Alexandra
945 2000-05-12 12:37:00 Alexandra
1242 2000-12-12 14:35:00 Alexandra
1440 2000-12-16 06:45:00 Alexandra
1523 2000-12-17 12:55:00 Alexandra
1701 2000-12-19 18:40:00 Alexandra
1899 2000-12-26 11:42:00 Alexandra
1963 2000-12-29 09:43:00 Alexandra
1975 2000-12-29 15:54:00 Alexandra
2004 2000-12-30 13:26:00 Alexandra
2044 2000-12-31 13:18:00 Alexandra
2100 2001-01-01 00:06:00 Alexandra
2202 2001-02-01 13:34:00 Alexandra
2826 2001-11-01 13:32:00 Alexandra
2991 2001-01-15 10:55:00 Alexandra
3175 2001-01-20 11:18:00 Alexandra
3176 2001-01-20 11:35:00 Alexandra
3212 2001-01-20 22:55:00 Alexandra
3371 2001-01-26 14:25:00 Alexandra
3386 2001-01-26 19:05:00 Alexandra
3395 2001-01-27 13:20:00 Alexandra
3432 2001-01-28 18:03:00 Alexandra
3701 2001-06-02 18:29:00 Alexandra
3881 2001-02-14 10:00:00 Alexandra
4131 2001-02-21 17:48:00 Alexandra
... ... ...
... ... ...
... ... Boort
... ... Boort
... ... ...
... ... ...
96968 2018-01-25 17:27:00 Woori Yallock
96983 2018-01-25 19:04:00 Woori Yallock
96995 2018-01-26 00:03:00 Woori Yallock
97002 2018-01-26 09:39:00 Woori Yallock
97105 2018-01-28 11:12:00 Woori Yallock
97143 2018-01-29 14:42:00 Woori Yallock
97144 2018-01-29 15:00:00 Woori Yallock
97160 2018-01-30 21:54:00 Woori Yallock
97249 2018-06-02 22:40:00 Woori Yallock
97314 2018-11-02 12:38:00 Woori Yallock
97361 2018-02-13 16:49:00 Woori Yallock
97362 2018-02-13 16:55:00 Woori Yallock
97368 2018-02-14 05:48:00 Woori Yallock
97446 2018-02-18 11:17:00 Woori Yallock
97475 2018-02-19 18:52:00 Woori Yallock
97485 2018-02-20 15:42:00 Woori Yallock
97496 2018-02-20 22:19:00 Woori Yallock
97514 2018-02-22 14:47:00 Woori Yallock
97563 2018-02-25 20:37:00 Woori Yallock
97641 2018-02-28 17:19:00 Woori Yallock
97642 2018-02-28 17:45:00 Woori Yallock
97769 2018-07-03 07:35:00 Woori Yallock
97786 2018-07-03 22:05:00 Woori Yallock
97902 2018-11-03 16:20:00 Woori Yallock
97938 2018-12-03 14:33:00 Woori Yallock
97939 2018-12-03 14:35:00 Woori Yallock
97946 2018-12-03 20:23:00 Woori Yallock
98046 2018-03-17 18:24:00 Woori Yallock
98090 2018-03-18 11:06:00 Woori Yallock
98207 2018-03-22 19:58:00 Woori Yallock
[98372 rows x 2 columns]
What I want to achieve is to get the sum of incidents per YYYY-MM for each ICC_NAME.
yyyy-mm Alexandra Boort ... Woori Yallock
2000-07 29 12 ... 8
2000-08 20 16 ... 13
... ...
... ...
2018-03 41 8 ... 28
I was thinking of using resample, but I am not sure which column the sum() should be applied to.
Use crosstab, converting the datetimes to month periods with Series.dt.to_period; then change the index and column names with DataFrame.rename_axis and convert the PeriodIndex to a column with DataFrame.reset_index:
df['comb_date'] = pd.to_datetime(df['comb_date'])
df1 = (pd.crosstab(df['comb_date'].dt.to_period('m'), df['ICC_NAME'])
         .rename_axis(columns=None, index='yyyy-mm')
         .reset_index())
print (df1)
yyyy-mm Alexandra Woori Yallock
0 2000-05 1 0
1 2000-09 1 0
2 2000-10 1 0
3 2000-11 4 0
4 2000-12 9 0
5 2001-01 9 0
6 2001-02 3 0
7 2001-06 1 0
8 2001-11 1 0
9 2018-01 0 8
10 2018-02 0 11
11 2018-03 0 3
12 2018-06 0 1
13 2018-07 0 2
14 2018-11 0 2
15 2018-12 0 3
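An equivalent sketch without crosstab, counting rows per (month, ICC_NAME) pair with groupby and pivoting with unstack:
df1 = (df.groupby([df['comb_date'].dt.to_period('m'), 'ICC_NAME'])
         .size()
         .unstack(fill_value=0)
         .rename_axis(columns=None, index='yyyy-mm')
         .reset_index())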

How to concatenate a series of smaller size to the bottom of a pandas dataframe

I'm trying to concatenate a Series onto the right side of a dataframe with the column name 'RSI'. However, because the Series is of shorter length than the other columns in the dataframe, I need to ensure that NaN values are appended to the top of the column and not the bottom. Right now, I've used the following code but I can't find an argument that would allow me to have the desired output.
RSI = pd.Series(RSI)
df = pd.concat((df, RSI.rename('RSI')), axis='columns')
So far, this is my output:
Dates Prices Volumes RSI
0 2013-02-08 201.68 2893254 47.7357
1 2013-02-11 200.16 2944651 53.3967
2 2013-02-12 200.04 2461779 56.3866
3 2013-02-13 200.09 2169757 60.1845
4 2013-02-14 199.65 3294126 62.1784
5 2013-02-15 200.98 3627887 63.9720
6 2013-02-19 200.32 2998317 62.9671
7 2013-02-20 199.31 3715311 63.9232
8 2013-02-21 198.33 3923051 66.8817
9 2013-02-22 201.09 3107876 72.8258
10 2013-02-25 197.51 3845276 69.6578
11 2013-02-26 199.14 3391562 63.8458
12 2013-02-27 202.33 4185545 64.2776
13 2013-02-28 200.83 4689698 67.2445
14 2013-03-01 202.91 3308544 58.2408
15 2013-03-04 205.19 3693365 57.7058
16 2013-03-05 206.53 3807706 53.7482
17 2013-03-06 208.38 3594899 57.5396
18 2013-03-07 209.42 3884317 53.2722
19 2013-03-08 210.38 3700086 58.6824
20 2013-03-11 210.08 3048901 56.0161
21 2013-03-12 210.55 3591261 60.2066
22 2013-03-13 212.06 3355969 55.3322
23 2013-03-14 215.80 5505484 51.7492
24 2013-03-15 214.92 7935024 47.1241
25 2013-03-18 213.21 3006125 46.9102
26 2013-03-19 213.44 3198577 46.6569
27 2013-03-20 215.06 3019153 54.0822
28 2013-03-21 212.26 5830566 56.2525
29 2013-03-22 212.08 3015847 51.8359
... ... ... ... ...
1229 2017-12-26 152.83 2479017 80.1930
1230 2017-12-27 153.13 2149257 80.7444
1231 2017-12-28 154.04 2687624 56.4425
1232 2017-12-29 153.42 3327087 56.9183
1233 2018-01-02 154.25 4202503 63.6958
1234 2018-01-03 158.49 9441567 61.1962
1235 2018-01-04 161.70 7556249 61.3816
1236 2018-01-05 162.49 5195764 64.7724
1237 2018-01-08 163.47 5237523 63.0508
1238 2018-01-09 163.83 4341810 53.9559
1239 2018-01-10 164.18 4174105 54.1351
1240 2018-01-11 164.20 3794453 50.6824
1241 2018-01-12 163.14 5031886 43.0222
1242 2018-01-16 163.85 7794195 32.7428
1243 2018-01-17 168.65 11710033 39.4754
1244 2018-01-18 169.12 14259345 37.3409
1245 2018-01-19 162.37 21172488 NaN
1246 2018-01-22 162.60 8480795 NaN
1247 2018-01-23 166.25 7466232 NaN
1248 2018-01-24 165.37 5645003 NaN
1249 2018-01-25 165.47 3302520 NaN
1250 2018-01-26 167.34 3787913 NaN
1251 2018-01-29 166.80 3516995 NaN
1252 2018-01-30 163.62 4902341 NaN
1253 2018-01-31 163.70 4072830 NaN
1254 2018-02-01 162.40 4434242 NaN
1255 2018-02-02 159.03 5251938 NaN
1256 2018-02-05 152.53 8746599 NaN
1257 2018-02-06 155.34 9867678 NaN
1258 2018-02-07 153.85 6149207 NaN
However, I need it to look like this:
Dates Prices Volumes RSI
0 2013-02-08 201.68 2893254 NaN
1 2013-02-11 200.16 2944651 NaN
2 2013-02-12 200.04 2461779 NaN
3 2013-02-13 200.09 2169757 NaN
4 2013-02-14 199.65 3294126 NaN
5 2013-02-15 200.98 3627887 NaN
6 2013-02-19 200.32 2998317 NaN
7 2013-02-20 199.31 3715311 NaN
8 2013-02-21 198.33 3923051 NaN
9 2013-02-22 201.09 3107876 NaN
10 2013-02-25 197.51 3845276 NaN
11 2013-02-26 199.14 3391562 NaN
12 2013-02-27 202.33 4185545 NaN
13 2013-02-28 200.83 4689698 NaN
14 2013-03-01 202.91 3308544 NaN
15 2013-03-04 205.19 3693365 57.7058
16 2013-03-05 206.53 3807706 53.7482
17 2013-03-06 208.38 3594899 57.5396
18 2013-03-07 209.42 3884317 53.2722
19 2013-03-08 210.38 3700086 58.6824
20 2013-03-11 210.08 3048901 56.0161
21 2013-03-12 210.55 3591261 60.2066
22 2013-03-13 212.06 3355969 55.3322
23 2013-03-14 215.80 5505484 51.7492
24 2013-03-15 214.92 7935024 47.1241
25 2013-03-18 213.21 3006125 46.9102
26 2013-03-19 213.44 3198577 46.6569
27 2013-03-20 215.06 3019153 54.0822
28 2013-03-21 212.26 5830566 56.2525
29 2013-03-22 212.08 3015847 51.8359
... ... ... ... ...
1229 2017-12-26 152.83 2479017 80.1930
1230 2017-12-27 153.13 2149257 80.7444
1231 2017-12-28 154.04 2687624 56.4425
1232 2017-12-29 153.42 3327087 56.9183
1233 2018-01-02 154.25 4202503 63.6958
1234 2018-01-03 158.49 9441567 61.1962
1235 2018-01-04 161.70 7556249 61.3816
1236 2018-01-05 162.49 5195764 64.7724
1237 2018-01-08 163.47 5237523 63.0508
1238 2018-01-09 163.83 4341810 53.9559
1239 2018-01-10 164.18 4174105 54.1351
1240 2018-01-11 164.20 3794453 50.6824
1241 2018-01-12 163.14 5031886 43.0222
1242 2018-01-16 163.85 7794195 32.7428
1243 2018-01-17 168.65 11710033 39.4754
1244 2018-01-18 169.12 14259345 36.9999
1245 2018-01-19 162.37 21172488 41.1297
1246 2018-01-22 162.60 8480795 12.1231
1247 2018-01-23 166.25 7466232 39.0977
1248 2018-01-24 165.37 5645003 63.6958
1249 2018-01-25 165.47 3302520 56.4425
1250 2018-01-26 167.34 3787913 80.7444
1251 2018-01-29 166.80 3516995 61.1962
1252 2018-01-30 163.62 4902341 58.6824
1253 2018-01-31 163.70 4072830 53.7482
1254 2018-02-01 162.40 4434242 43.0222
1255 2018-02-02 159.03 5251938 61.1962
1256 2018-02-05 152.53 8746599 56.4425
1257 2018-02-06 155.34 9867678 36.0978
1258 2018-02-07 153.85 6149207 41.1311
Thanks for the help.
Another way is manipulating the rsi Series index to match the df index from the bottom up (I use only 13 rows of your sample for the demo):
size_diff = df.index.size - rsi.index.size  # number of leading rows missing from rsi
rsi.index = df.index[size_diff:]            # align rsi with the tail of df's index
pd.concat([df, rsi], axis=1)
Out[1490]:
Dates Prices Volumes RSI
0 2013-02-08 201.68 2893254 NaN
1 2013-02-11 200.16 2944651 NaN
2 2013-02-12 200.04 2461779 NaN
3 2013-02-13 200.09 2169757 NaN
4 2013-02-14 199.65 3294126 NaN
5 2013-02-15 200.98 3627887 47.7357
6 2013-02-19 200.32 2998317 53.3967
7 2013-02-20 199.31 3715311 56.3866
8 2013-02-21 198.33 3923051 60.1845
9 2013-02-22 201.09 3107876 62.1784
10 2013-02-25 197.51 3845276 63.9720
11 2013-02-26 199.14 3391562 62.9671
12 2013-02-27 202.33 4185545 63.9232
13 2013-02-28 200.83 4689698 66.8817
Try it like this; shift pushes the values down by the number of missing rows, so the NaNs end up on top:
df["RSI"] = df["RSI"].shift(len(df) - len(df["RSI"].dropna()))
We can get the difference in rows between the Series and the dataframe, then prepend that many NaN values to the Series (on top) with np.repeat, and finally concatenate the padded Series to your original dataframe over axis=1 (columns):
import numpy as np

diff = df.shape[0] - RSI.shape[0]  # rows missing at the top
rpts = np.repeat(np.nan, diff)     # leading NaN padding
RSI = pd.concat([pd.Series(rpts, name='RSI'), RSI], ignore_index=True)
pd.concat([df, RSI.rename('RSI')], axis=1).head(20)
Dates Prices Volumes RSI
0 2013-02-08 201.68 2893254 NaN
1 2013-02-11 200.16 2944651 NaN
2 2013-02-12 200.04 2461779 NaN
3 2013-02-13 200.09 2169757 NaN
4 2013-02-14 199.65 3294126 NaN
5 2013-02-15 200.98 3627887 NaN
6 2013-02-19 200.32 2998317 NaN
7 2013-02-20 199.31 3715311 NaN
8 2013-02-21 198.33 3923051 NaN
9 2013-02-22 201.09 3107876 NaN
10 2013-02-25 197.51 3845276 NaN
11 2013-02-26 199.14 3391562 NaN
12 2013-02-27 202.33 4185545 NaN
13 2013-02-28 200.83 4689698 47.7357
14 2013-03-01 202.91 3308544 53.3967
15 2013-03-04 205.19 3693365 56.3866
16 2013-03-05 206.53 3807706 60.1845
17 2013-03-06 208.38 3594899 62.1784
18 2013-03-07 209.42 3884317 63.9720
19 2013-03-08 210.38 3700086 62.9671
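A compact variant of the same index-alignment idea, as a sketch: build the Series directly on the tail of the frame's index and let assignment fill the rest with NaN:
df['RSI'] = pd.Series(RSI.to_numpy(), index=df.index[-len(RSI):])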
