I have a timeseries:
date
2009-12-23 0.0
2009-12-28 0.0
2009-12-29 0.0
2009-12-30 0.0
2009-12-31 0.0
2010-01-04 0.0
2010-01-05 0.0
2010-01-06 0.0
2010-01-07 0.0
2010-01-08 0.0
2010-01-11 0.0
2010-01-12 0.0
2010-01-13 0.0
2010-01-14 0.0
2010-01-15 0.0
2010-01-18 0.0
2010-01-19 0.0
2010-01-20 0.0
2010-01-21 0.0
2010-01-22 0.0
2010-01-25 0.0
2010-01-26 0.0
2010-01-27 0.0
2010-01-28 0.0
2010-01-29 0.0
2010-02-01 0.0
2010-02-02 0.0
I would like to set the value to 1 based on the following rule:
If the constant is set to 9, this means the 9th of each month. Because 2010-01-09 doesn't exist, I would like to set the next date that does exist in the series to 1, which is 2010-01-11 above.
I have tried to create two series: one (series1) with day < 9 set to 1 and one (series2) with day > 9 set to 1, and then computed series1.shift(1) * series2.
It works in the middle of the month, but not if the day constant is set to 1, because the last date of the previous month is set to 0 in series1.
Assume your timeseries is s with a DatetimeIndex.
I'll create a groupby object over a boolean series marking the index values whose day is greater than or equal to 9, grouped by month (pd.TimeGrouper has since been removed from pandas; pd.Grouper(freq='M') is its replacement):
g = s.index.to_series().dt.day.ge(9).groupby(pd.Grouper(freq='M'))
Then I'll check that each month has at least one day >= 9 and grab the first among them. At those dates, I'll assign the value 1.
s.loc[g.idxmax()[g.any()]] = 1
s
date
2009-12-23 1.0
2009-12-28 0.0
2009-12-29 0.0
2009-12-30 0.0
2009-12-31 0.0
2010-01-04 0.0
2010-01-05 0.0
2010-01-06 0.0
2010-01-07 0.0
2010-01-08 0.0
2010-01-11 1.0
2010-01-12 0.0
2010-01-13 0.0
2010-01-14 0.0
2010-01-15 0.0
2010-01-18 0.0
2010-01-19 0.0
2010-01-20 0.0
2010-01-21 0.0
2010-01-22 0.0
2010-01-25 0.0
2010-01-26 0.0
2010-01-27 0.0
2010-01-28 0.0
2010-01-29 0.0
2010-02-01 0.0
2010-02-02 0.0
Name: val, dtype: float64
Note that 2009-12-23 was also assigned a 1, as it satisfies the requirement as well.
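Here is a self-contained, runnable version of the approach on a few of the dates above (it uses pd.Grouper, since pd.TimeGrouper no longer exists in current pandas):

```python
import pandas as pd

# Sample of the dates from the question, all starting at 0.0
idx = pd.to_datetime(["2009-12-23", "2009-12-28", "2010-01-08", "2010-01-11", "2010-01-12"])
s = pd.Series(0.0, index=idx, name="val")

day = 9
# Boolean series: is this date's day-of-month >= 9, grouped by month
g = s.index.to_series().dt.day.ge(day).groupby(pd.Grouper(freq="M"))
# idxmax gives the first True per month; g.any() filters out months with no match
s.loc[g.idxmax()[g.any()]] = 1
print(s)
```

This marks 2009-12-23 and 2010-01-11 with 1, matching the output above.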
Related
I have a dataframe which I pivoted, and I now want to select specific rows from the data. I have seen similar questions such as this one: Selecting columns in a pandas pivot table based on specific row value?. In my case I want to return all the columns but select only specific rows.
timestamp,value
2008-03-01 00:00:00,55.0
2008-03-01 00:15:00,20.0
2008-03-01 00:30:00,13.0
2008-03-01 00:45:00,78.0
2008-03-01 01:00:00,34.0
2008-03-01 01:15:00,123.0
2008-03-01 01:30:00,25.0
2008-03-01 01:45:00,91.0
2008-03-02 00:00:00,55.0
2008-03-02 00:15:00,46.0
2008-03-02 00:30:00,66.0
2008-03-02 00:45:00,24.0
2008-03-02 01:00:00,70.0
2008-03-02 01:15:00,32.0
2008-03-02 01:30:00,15.0
2008-03-02 01:45:00,92.0
I have done the following to generate the output below:
import pandas as pd
import numpy as np
from datetime import datetime
df = pd.read_csv('df.csv')
df.timestamp = pd.to_datetime(df.timestamp)
df = df.set_index('timestamp')
df['date'] = df.index.map(lambda t: t.date())
df['time'] = df.index.map(lambda t: t.time())
df_pivot = pd.pivot_table(df, values='value', index='timestamp', columns='time')
df_pivot = df_pivot.fillna(0.0)
print(df_pivot)
Generated output
time 00:00:00 00:15:00 00:30:00 00:45:00 01:00:00 01:15:00 01:30:00 01:45:00
timestamp
2008-03-01 00:00:00 55.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2008-03-01 00:15:00 0.0 20.0 0.0 0.0 0.0 0.0 0.0 0.0
2008-03-01 00:30:00 0.0 0.0 13.0 0.0 0.0 0.0 0.0 0.0
2008-03-01 00:45:00 0.0 0.0 0.0 78.0 0.0 0.0 0.0 0.0
2008-03-01 01:00:00 0.0 0.0 0.0 0.0 34.0 0.0 0.0 0.0
2008-03-01 01:15:00 0.0 0.0 0.0 0.0 0.0 123.0 0.0 0.0
2008-03-01 01:30:00 0.0 0.0 0.0 0.0 0.0 0.0 25.0 0.0
2008-03-01 01:45:00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 91.0
2008-03-02 00:00:00 55.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2008-03-02 00:15:00 0.0 46.0 0.0 0.0 0.0 0.0 0.0 0.0
2008-03-02 00:30:00 0.0 0.0 66.0 0.0 0.0 0.0 0.0 0.0
2008-03-02 00:45:00 0.0 0.0 0.0 24.0 0.0 0.0 0.0 0.0
2008-03-02 01:00:00 0.0 0.0 0.0 0.0 70.0 0.0 0.0 0.0
2008-03-02 01:15:00 0.0 0.0 0.0 0.0 0.0 32.0 0.0 0.0
2008-03-02 01:30:00 0.0 0.0 0.0 0.0 0.0 0.0 15.0 0.0
2008-03-02 01:45:00 0.0 0.0 0.0 0.0 0.0 0.0 0.0 92.0
I want to select e.g., only the data for 2008-03-01 00:00:00, 2008-03-01 01:15:00, and 2008-03-02 01:00:00.
Expected output
time 00:00:00 00:15:00 00:30:00 00:45:00 01:00:00 01:15:00 01:30:00 01:45:00
timestamp
2008-03-01 00:00:00 55.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2008-03-01 01:15:00 0.0 0.0 0.0 0.0 0.0 123.0 0.0 0.0
2008-03-02 01:00:00 0.0 0.0 0.0 0.0 70.0 0.0 0.0 0.0
How can I do that?
Use a list of datetimes converted by to_datetime and select with DataFrame.loc:
#create DatetimeIndex
df = pd.read_csv('df.csv', index_col='timestamp', parse_dates=['timestamp'])
#used pandas methods
df['date'] = df.index.date
df['time'] = df.index.time
#added fill_value parameter
df_pivot = pd.pivot_table(df,values='value',index='timestamp',columns='time',fill_value=0)
L = ['2008-03-01 00:00:00','2008-03-01 01:15:00','2008-03-02 01:00:00']
df = df_pivot.loc[pd.to_datetime(L)]
print (df)
time 00:00:00 00:15:00 00:30:00 00:45:00 01:00:00 \
2008-03-01 00:00:00 55 0 0 0 0
2008-03-01 01:15:00 0 0 0 0 0
2008-03-02 01:00:00 0 0 0 0 70
time 01:15:00 01:30:00 01:45:00
2008-03-01 00:00:00 0 0 0
2008-03-01 01:15:00 123 0 0
2008-03-02 01:00:00 0 0 0
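If some of the requested timestamps might not exist in the pivot index, .loc with the full list would raise a KeyError; one option is to intersect the labels first. A small sketch with made-up data (the two-row frame below is not the poster's real pivot):

```python
import pandas as pd

# Made-up two-row pivot for illustration
df_pivot = pd.DataFrame(
    {"00:00:00": [55.0, 0.0], "01:15:00": [0.0, 123.0]},
    index=pd.to_datetime(["2008-03-01 00:00:00", "2008-03-01 01:15:00"]),
)
L = ["2008-03-01 00:00:00", "2008-03-05 09:00:00"]  # the second label does not exist
# Index.intersection keeps only the requested labels that are present
subset = df_pivot.loc[df_pivot.index.intersection(pd.to_datetime(L))]
print(subset)
```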
UPDATED - 4.13.22
I am new to programming in Python and am trying to create a program, using for loops, that goes through a dataframe row by row to identify different types of 'group sales' made up of different combinations of product sales, posting the results in a 'Result' column.
I was told in previous comments to print the df and paste it:
Date LFMIX SALE LCSIX SALE LOTIX SALE LSPIX SALE LEQIX SALE \
0 0.0 0.0 30000.0 0.0 0.0 0.0
1 0.0 0.0 30000.0 0.0 0.0 0.0
2 0.0 30000.0 0.0 0.0 0.0 0.0
3 0.0 25000.0 25000.0 0.0 0.0 0.0
4 0.0 30000.0 30000.0 0.0 0.0 0.0
5 0.0 30000.0 0.0 0.0 0.0 30000.0
6 0.0 0.0 30000.0 0.0 0.0 30000.0
7 0.0 25000.0 25000.0 0.0 0.0 25000.0
AUM LFMIX AUM LCSIX AUM LOTIX AUM LSPIX AUM LEQIX \
0 200000.0 0.0 0.0 0.0 0.0
1 500000.0 0.0 0.0 0.0 0.0
2 0.0 200000.0 0.0 0.0 0.0
3 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 200000.0
5 0.0 200000.0 0.0 0.0 0.0
6 200000.0 0.0 0.0 0.0 0.0
7 0.0 0.0 0.0 0.0 0.0
is the sale = 10% of pairing fund AUM LFMIX LCSIX LOTIX LSPIX LEQIX \
0 0.0 1 1 0.0 0.0 0.0
1 0.0 1 1 0.0 0.0 0.0
2 0.0 1 1 0.0 0.0 0.0
3 0.0 1 1 0.0 0.0 0.0
4 0.0 1 1 0.0 0.0 1.0
5 0.0 1 1 0.0 0.0 1.0
6 0.0 1 1 0.0 0.0 1.0
7 0.0 1 1 0.0 0.0 1.0
Expected_Result Result
0 DP1
1 0
2 DP2
3 DP3
4 TT1
5 TT2
6 TT3
7 TT4
my Python code to identify just the 1st group type (DP1):
for row in range(len(df)):
    if (df["LCSIX"][row] >= (df["AUM LFMIX"][row] * .1)): df["Result"][row] = "DP1"
and the results:
Date LFMIX SALE LCSIX SALE LOTIX SALE LSPIX SALE LEQIX SALE \
0 0.0 0.0 30000.0 0.0 0.0 0.0
1 0.0 0.0 30000.0 0.0 0.0 0.0
2 0.0 30000.0 0.0 0.0 0.0 0.0
3 0.0 25000.0 25000.0 0.0 0.0 0.0
4 0.0 30000.0 30000.0 0.0 0.0 0.0
5 0.0 30000.0 0.0 0.0 0.0 30000.0
6 0.0 0.0 30000.0 0.0 0.0 30000.0
7 0.0 25000.0 25000.0 0.0 0.0 25000.0
AUM LFMIX AUM LCSIX AUM LOTIX AUM LSPIX AUM LEQIX \
0 200000.0 0.0 0.0 0.0 0.0
1 500000.0 0.0 0.0 0.0 0.0
2 0.0 200000.0 0.0 0.0 0.0
3 0.0 0.0 0.0 0.0 0.0
4 0.0 0.0 0.0 0.0 200000.0
5 0.0 200000.0 0.0 0.0 0.0
6 200000.0 0.0 0.0 0.0 0.0
7 0.0 0.0 0.0 0.0 0.0
is the sale = 10% of pairing fund AUM LFMIX LCSIX LOTIX LSPIX LEQIX \
0 0.0 1 1 0.0 0.0 0.0
1 0.0 1 1 0.0 0.0 0.0
2 0.0 1 1 0.0 0.0 0.0
3 0.0 1 1 0.0 0.0 0.0
4 0.0 1 1 0.0 0.0 1.0
5 0.0 1 1 0.0 0.0 1.0
6 0.0 1 1 0.0 0.0 1.0
7 0.0 1 1 0.0 0.0 1.0
Expected_Result Result
0 DP1
1 0
2 DP2 DP1
3 DP3 DP1
4 TT1 DP1
5 TT2 DP1
6 TT3
7 TT4 DP1
As you can see, the code fails to identify row 0 as a DP1 and misidentifies other rows.
I am planning to code for loops that identify 17 different types of group sales; this is simply the 1st group I am trying to identify...
Thanks for the help.
When you're working with pandas, you need to think in terms of doing things with whole columns, NOT row by row, which is hopelessly slow in pandas. If you need to go row by row, then do all of that before you convert to pandas.
In this case, you need to set the "Result" column for all rows where your condition is met. This does that in one line:
df.loc[df["LCSIX"] >= df["AUM LFMIX"] * 0.1, "Result"] = "DP1"
So we select the rows where the relation is true and assign "DP1" to their "Result" column in a single .loc call; chained indexing like df["Result"][mask] = ... would trigger a SettingWithCopyWarning and may not write through. Simple. ;)
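A minimal, self-contained sketch of this mask-based assignment (toy column names, not the poster's real data):

```python
import pandas as pd

# Toy frame: flag rows where the sale is at least 10% of the paired fund's AUM
df = pd.DataFrame({"sale": [30000.0, 5000.0, 25000.0],
                   "aum": [200000.0, 200000.0, 0.0]})
df["Result"] = "0"
# One vectorized assignment instead of a Python-level loop over rows
df.loc[df["sale"] >= df["aum"] * 0.1, "Result"] = "DP1"
print(df)
```

The boolean mask is evaluated for the whole column at once, so this scales to the full 17-rule set by writing one such line (or one combined mask) per rule.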
Suppose I have the following code that calculates how many products I can purchase given my budget:
import math
import pandas as pd
data = [['2021-01-02', 5.5], ['2021-02-02', 10.5], ['2021-03-02', 15.0], ['2021-04-02', 20.0]]
df = pd.DataFrame(data, columns=['Date', 'Current_Price'])
df.Date = pd.to_datetime(df.Date)
mn = df.Date.min()
mx = df.Date.max()
dr = pd.date_range(mn - pd.tseries.offsets.MonthBegin(), mx + pd.tseries.offsets.MonthEnd(), name="Date")
df = df.set_index("Date").reindex(dr).reset_index()
df['Current_Price'] = df.groupby(
pd.Grouper(key='Date', freq='1M'))['Current_Price'].ffill().bfill()
# The dataframe below shows the current price of the product
# I'd like to buy at the specific date_range
print(df)
# Create 'Day' column to know which day of the month
df['Day'] = pd.to_datetime(df['Date']).dt.day
# Create 'Deposit' column to record how much money is
# deposited in, say, my bank account to buy the product.
# 'Withdrawal' column is to record how much I spent in
# buying product(s) at the current price on a specific date.
# 'Num_of_Products_Bought' shows how many items I bought
# on that specific date.
#
# Please note that the calculate below takes into account
# the left over money, which remains after I've purchased a
# product, for future purchase. For example, if you observe
# the resulting dataframe at the end of this code, you'll
# notice that I was able to purchase 7 products on March 1, 2021
# although my deposit on that day was $100. That is because
# on the days leading up to March 1, 2021, I have been saving
# the spare change from previous product purchases and that
# extra money allows me to buy an extra product on March 1, 2021
# despite my budget of $100 should only allow me to purchase
# 6 products.
df[['Deposit', 'Withdrawal', 'Num_of_Products_Bought']] = 0.0
# Suppose I save $100 at the beginning of every month in my bank account
df.loc[df['Day'] == 1, 'Deposit'] = 100.0
for index, row in df.iterrows():
    if df.loc[index, 'Day'] == 1:
        # num_prod_bought = (sum_of_deposit_so_far - sum_of_withdrawal) / current_price
        df.loc[index, 'Num_of_Products_Bought'] = math.floor(
            (sum(df.iloc[0:(index + 1)]['Deposit'])
             - sum(df.iloc[0:(index + 1)]['Withdrawal']))
            / df.loc[index, 'Current_Price'])
        # Record how much I spent buying the product on a specific date
        df.loc[index, 'Withdrawal'] = df.loc[index, 'Num_of_Products_Bought'] * df.loc[index, 'Current_Price']
print(df)
# This code above is working as intended,
# but how can I make it more efficient/pandas-like?
# In particular, I don't like to idea of having to
# iterate the rows and having to recalculate
# the running (sum of) deposit amount and
# the running (sum of) the withdrawal.
As mentioned in the comment in the code, I would like to know how to accomplish the same without having to iterate the rows one by one and calculating the sum of the rows up to the current row in my iteration (I read around StackOverflow and saw cumsum() function, but I don't think cumsum has the notion of current row in the iteration).
Thank you very much in advance for your suggestions/answers!
A solution using .apply with a generator that carries the leftover money forward between months:
def fn():
    leftover = 0
    amount, deposit = yield
    while True:
        new_amount, new_deposit = yield (deposit + leftover) // amount
        leftover = (deposit + leftover) % amount
        amount, deposit = new_amount, new_deposit
df = df.set_index("Date")
s = fn()
next(s)
m = df.index.day == 1
df.loc[m, "Deposit"] = 100
df.loc[m, "Num_of_Products_Bought"] = df.loc[
m, ["Current_Price", "Deposit"]
].apply(lambda x: s.send((x["Current_Price"], x["Deposit"])), axis=1)
df.loc[m, "Withdrawal"] = (
df.loc[m, "Num_of_Products_Bought"] * df.loc[m, "Current_Price"]
)
print(df.fillna(0).reset_index())
Prints:
Date Current_Price Deposit Num_of_Products_Bought Withdrawal
0 2021-01-01 5.5 100.0 18.0 99.0
1 2021-01-02 5.5 0.0 0.0 0.0
2 2021-01-03 5.5 0.0 0.0 0.0
3 2021-01-04 5.5 0.0 0.0 0.0
4 2021-01-05 5.5 0.0 0.0 0.0
5 2021-01-06 5.5 0.0 0.0 0.0
6 2021-01-07 5.5 0.0 0.0 0.0
7 2021-01-08 5.5 0.0 0.0 0.0
8 2021-01-09 5.5 0.0 0.0 0.0
9 2021-01-10 5.5 0.0 0.0 0.0
10 2021-01-11 5.5 0.0 0.0 0.0
11 2021-01-12 5.5 0.0 0.0 0.0
12 2021-01-13 5.5 0.0 0.0 0.0
13 2021-01-14 5.5 0.0 0.0 0.0
14 2021-01-15 5.5 0.0 0.0 0.0
15 2021-01-16 5.5 0.0 0.0 0.0
16 2021-01-17 5.5 0.0 0.0 0.0
17 2021-01-18 5.5 0.0 0.0 0.0
18 2021-01-19 5.5 0.0 0.0 0.0
19 2021-01-20 5.5 0.0 0.0 0.0
20 2021-01-21 5.5 0.0 0.0 0.0
21 2021-01-22 5.5 0.0 0.0 0.0
22 2021-01-23 5.5 0.0 0.0 0.0
23 2021-01-24 5.5 0.0 0.0 0.0
24 2021-01-25 5.5 0.0 0.0 0.0
25 2021-01-26 5.5 0.0 0.0 0.0
26 2021-01-27 5.5 0.0 0.0 0.0
27 2021-01-28 5.5 0.0 0.0 0.0
28 2021-01-29 5.5 0.0 0.0 0.0
29 2021-01-30 5.5 0.0 0.0 0.0
30 2021-01-31 5.5 0.0 0.0 0.0
31 2021-02-01 10.5 100.0 9.0 94.5
32 2021-02-02 10.5 0.0 0.0 0.0
33 2021-02-03 10.5 0.0 0.0 0.0
34 2021-02-04 10.5 0.0 0.0 0.0
35 2021-02-05 10.5 0.0 0.0 0.0
36 2021-02-06 10.5 0.0 0.0 0.0
37 2021-02-07 10.5 0.0 0.0 0.0
38 2021-02-08 10.5 0.0 0.0 0.0
39 2021-02-09 10.5 0.0 0.0 0.0
40 2021-02-10 10.5 0.0 0.0 0.0
41 2021-02-11 10.5 0.0 0.0 0.0
42 2021-02-12 10.5 0.0 0.0 0.0
43 2021-02-13 10.5 0.0 0.0 0.0
44 2021-02-14 10.5 0.0 0.0 0.0
45 2021-02-15 10.5 0.0 0.0 0.0
46 2021-02-16 10.5 0.0 0.0 0.0
47 2021-02-17 10.5 0.0 0.0 0.0
48 2021-02-18 10.5 0.0 0.0 0.0
49 2021-02-19 10.5 0.0 0.0 0.0
50 2021-02-20 10.5 0.0 0.0 0.0
51 2021-02-21 10.5 0.0 0.0 0.0
52 2021-02-22 10.5 0.0 0.0 0.0
53 2021-02-23 10.5 0.0 0.0 0.0
54 2021-02-24 10.5 0.0 0.0 0.0
55 2021-02-25 10.5 0.0 0.0 0.0
56 2021-02-26 10.5 0.0 0.0 0.0
57 2021-02-27 10.5 0.0 0.0 0.0
58 2021-02-28 10.5 0.0 0.0 0.0
59 2021-03-01 15.0 100.0 7.0 105.0
60 2021-03-02 15.0 0.0 0.0 0.0
61 2021-03-03 15.0 0.0 0.0 0.0
62 2021-03-04 15.0 0.0 0.0 0.0
63 2021-03-05 15.0 0.0 0.0 0.0
64 2021-03-06 15.0 0.0 0.0 0.0
65 2021-03-07 15.0 0.0 0.0 0.0
66 2021-03-08 15.0 0.0 0.0 0.0
67 2021-03-09 15.0 0.0 0.0 0.0
68 2021-03-10 15.0 0.0 0.0 0.0
69 2021-03-11 15.0 0.0 0.0 0.0
70 2021-03-12 15.0 0.0 0.0 0.0
71 2021-03-13 15.0 0.0 0.0 0.0
72 2021-03-14 15.0 0.0 0.0 0.0
73 2021-03-15 15.0 0.0 0.0 0.0
74 2021-03-16 15.0 0.0 0.0 0.0
75 2021-03-17 15.0 0.0 0.0 0.0
76 2021-03-18 15.0 0.0 0.0 0.0
77 2021-03-19 15.0 0.0 0.0 0.0
78 2021-03-20 15.0 0.0 0.0 0.0
79 2021-03-21 15.0 0.0 0.0 0.0
80 2021-03-22 15.0 0.0 0.0 0.0
81 2021-03-23 15.0 0.0 0.0 0.0
82 2021-03-24 15.0 0.0 0.0 0.0
83 2021-03-25 15.0 0.0 0.0 0.0
84 2021-03-26 15.0 0.0 0.0 0.0
85 2021-03-27 15.0 0.0 0.0 0.0
86 2021-03-28 15.0 0.0 0.0 0.0
87 2021-03-29 15.0 0.0 0.0 0.0
88 2021-03-30 15.0 0.0 0.0 0.0
89 2021-03-31 15.0 0.0 0.0 0.0
90 2021-04-01 20.0 100.0 5.0 100.0
91 2021-04-02 20.0 0.0 0.0 0.0
92 2021-04-03 20.0 0.0 0.0 0.0
93 2021-04-04 20.0 0.0 0.0 0.0
94 2021-04-05 20.0 0.0 0.0 0.0
95 2021-04-06 20.0 0.0 0.0 0.0
96 2021-04-07 20.0 0.0 0.0 0.0
97 2021-04-08 20.0 0.0 0.0 0.0
98 2021-04-09 20.0 0.0 0.0 0.0
99 2021-04-10 20.0 0.0 0.0 0.0
100 2021-04-11 20.0 0.0 0.0 0.0
101 2021-04-12 20.0 0.0 0.0 0.0
102 2021-04-13 20.0 0.0 0.0 0.0
103 2021-04-14 20.0 0.0 0.0 0.0
104 2021-04-15 20.0 0.0 0.0 0.0
105 2021-04-16 20.0 0.0 0.0 0.0
106 2021-04-17 20.0 0.0 0.0 0.0
107 2021-04-18 20.0 0.0 0.0 0.0
108 2021-04-19 20.0 0.0 0.0 0.0
109 2021-04-20 20.0 0.0 0.0 0.0
110 2021-04-21 20.0 0.0 0.0 0.0
111 2021-04-22 20.0 0.0 0.0 0.0
112 2021-04-23 20.0 0.0 0.0 0.0
113 2021-04-24 20.0 0.0 0.0 0.0
114 2021-04-25 20.0 0.0 0.0 0.0
115 2021-04-26 20.0 0.0 0.0 0.0
116 2021-04-27 20.0 0.0 0.0 0.0
117 2021-04-28 20.0 0.0 0.0 0.0
118 2021-04-29 20.0 0.0 0.0 0.0
119 2021-04-30 20.0 0.0 0.0 0.0
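The generator/send pattern can be hard to follow inside the .apply call, so here is a stand-alone demo of just that piece, using the prices from the sample data:

```python
# Coroutine that carries leftover money forward between purchases.
# Equivalent to fn() above, just with the arithmetic spelled out.
def running_budget():
    leftover = 0.0
    price, deposit = yield  # primed with the first (price, deposit) pair
    while True:
        bought = (deposit + leftover) // price
        leftover = (deposit + leftover) % price
        price, deposit = yield bought

gen = running_budget()
next(gen)  # advance to the first yield so .send() can deliver values
print(gen.send((5.5, 100.0)))   # 18.0 products, 1.0 left over
print(gen.send((10.5, 100.0)))  # (100 + 1.0) // 10.5 -> 9.0
```

Each .send() delivers the current month's price and deposit and receives back the number of products bought, with the spare change remembered inside the generator, which is exactly why March's $100 buys 7 products in the full output above.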
Suppose we have a list of dataframes A which contains three dataframes df_a, df_b, and df_c:
A = [df_a, df_b, df_c]
df_a =
morning noon night
date
2019-12-31 B 3.0 3.0 0.0
C 0.0 0.0 1.0
D 0.0 1.0 0.0
E 142.0 142.0 142.0
df_b =
morning noon night
date
2020-01-31 A 3.0 0.0 0.0
B 1.0 0.0 0.0
E 142.0 145.0 145.0
df_c =
morning noon night
date
2020-02-29 F 145.0 145.0 145.0
All dataframes have morning, noon, and night columns and share the same two-level index: date plus a label from [A, B, C, D, E, F]. I want to concatenate the three dataframes into one (say full_df) in which every date has the same set of rows.
But as you can see, each dataframe has a different number of rows: df_a, df_b, and df_c have [B, C, D, E], [A, B, E], and [F] respectively.
Is there a way to concat these dataframes so that each date carries all the unique labels from the three combined, with 0.0 wherever the corresponding label is not present in the original dataframe?
This is what I had in mind for full_df:
full_df =
morning noon night
date
2019-12-31 A 0.0 0.0 0.0
B 3.0 3.0 0.0
C 0.0 0.0 1.0
D 0.0 1.0 0.0
E 142.0 142.0 142.0
F 0.0 0.0 0.0
2020-01-31 A 3.0 0.0 0.0
B 1.0 0.0 0.0
C 0.0 0.0 0.0
D 0.0 0.0 0.0
E 142.0 145.0 145.0
F 0.0 0.0 0.0
2020-02-29 A 0.0 0.0 0.0
B 0.0 0.0 0.0
C 0.0 0.0 0.0
D 0.0 0.0 0.0
E 0.0 0.0 0.0
F 145.0 145.0 145.0
You can try:
pd.concat(A).unstack(level=-1, fill_value=0).stack()
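A runnable sketch of that one-liner on two tiny frames shaped like the ones above (made-up values, and only two of the three frames for brevity): unstack moves the label level into the columns, filling the gaps with 0, and stack moves it back so every date ends up with every label.

```python
import pandas as pd

# Two frames with (date, label) MultiIndex rows; labels differ between frames
df_a = pd.DataFrame(
    {"morning": [3.0, 142.0], "noon": [3.0, 142.0]},
    index=pd.MultiIndex.from_product([["2019-12-31"], ["B", "E"]]),
)
df_b = pd.DataFrame(
    {"morning": [3.0, 145.0], "noon": [0.0, 145.0]},
    index=pd.MultiIndex.from_product([["2020-01-31"], ["A", "E"]]),
)

full_df = pd.concat([df_a, df_b]).unstack(level=-1, fill_value=0).stack()
print(full_df)
```

Every (date, label) combination appears in the result, with 0 where a label was absent from the original frame.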
Here is the sample dataframe:
Trade_signal
2007-07-31 0.0
2007-08-31 0.0
2007-09-28 0.0
2007-10-31 0.0
2007-11-30 0.0
2007-12-31 0.0
2008-01-31 0.0
2008-02-29 0.0
2008-03-31 0.0
2008-04-30 0.0
2008-05-30 0.0
2008-06-30 0.0
2008-07-31 -1.0
2008-08-29 0.0
2008-09-30 -1.0
2008-10-31 -1.0
2008-11-28 -1.0
2008-12-31 0.0
2009-01-30 -1.0
2009-02-27 -1.0
2009-03-31 0.0
2009-04-30 0.0
2009-05-29 1.0
2009-06-30 1.0
2009-07-31 1.0
2009-08-31 1.0
2009-09-30 1.0
2009-10-30 0.0
2009-11-30 1.0
2009-12-31 1.0
1 represents buy and -1 represents sell. I want to subset the dataframe so that the new dataframe starts at the first occurrence of 1. Expected output:
2009-05-29 1.0
2009-06-30 1.0
2009-07-31 1.0
2009-08-31 1.0
2009-09-30 1.0
2009-10-30 0.0
2009-11-30 1.0
2009-12-31 1.0
Please suggest the way forward. Apologies if this is a repeated question.
Simply find the first row where the signal equals 1 and slice from there. Here "Trade_signal" is the column containing the buy/sell data, and Series.idxmax on the boolean mask returns the label of its first True (mixing a date label into .iloc, as in df.iloc[df[...].index[0]:], would fail, since .iloc expects positions):
new_df = df.loc[df["Trade_signal"].eq(1).idxmax():]
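A self-contained sketch of that slice on a shortened version of the series (it assumes at least one 1 exists; with no 1 present, idxmax would return the first label):

```python
import pandas as pd

# Shortened version of the sample data
df = pd.DataFrame(
    {"Trade_signal": [0.0, -1.0, 0.0, 1.0, 0.0, 1.0]},
    index=pd.to_datetime(
        ["2009-01-30", "2009-02-27", "2009-03-31", "2009-05-29", "2009-10-30", "2009-11-30"]
    ),
)
# idxmax on the boolean mask gives the label of the first 1;
# .loc then slices from that label through the end
new_df = df.loc[df["Trade_signal"].eq(1).idxmax():]
print(new_df)
```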