I have the following code:
import datetime

def excel_date(date1):
    temp = datetime.datetime(1899, 12, 30)
    delta = date1 - temp if date1 != 0 else temp - temp
    return float(delta.days) + (float(delta.seconds) / 86400)

df3['SuperID'] = df3['Break_date'].apply(excel_date)
df3['SuperID2'] = df3['ticker'] + str(df3['SuperID'])
I pass a date in as date1 and get a number back from the excel_date function.
My ticker and SuperID fields themselves are fine (screenshots of the fields and the column types omitted here).
I want to concatenate the two and get TSLA44462, but if I use str() or .astype(str) on the SuperID column, the whole series gets concatenated into every value.
Here is my solution, if I understood your problem correctly:
import pandas as pd

df = pd.DataFrame({"Col1": [1.0, 2.0, 3.0, 4.4],
                   "Col2": ["Michel", "Sardou", "Paul", "Jean"],
                   "Other Col": [2, 3, 5, 2]})
# cast the float column to int (drops the decimals), then to str, then concatenate
df["Concat column"] = df["Col1"].astype(int).astype(str) + df["Col2"]
df[df["Concat column"] == "1Michel"]
or
df = pd.DataFrame({"Col1": [1.0, 2.0, 3.0, 4.4],
                   "Col2": ["Michel", "Sardou", "Paul", "Jean"],
                   "Other Col": [2, 3, 5, 2]})
# filter directly on the two columns instead of building a concatenated key
df[(df["Col1"] == 1) & (df["Col2"] == "Michel")]
After some hours of investigation, and with the help of the comments, this is the way of working with series, integers, floats and strings that worked for me:
def excel_date(date1):
    temp = datetime.datetime(1899, 12, 30)
    delta = date1 - temp if date1 != 0 else temp - temp
    return float(delta.days) + (float(delta.seconds) / 86400)
First of all, I convert the float to an integer to get rid of the decimals. int(x) does not work on a Series, so use .astype(int) instead, which works fine.
df3['SuperID'] = df3['Break_date'].apply(excel_date).astype(int)
After that, convert everything to characters with np.char.array rather than str(x) or .astype(str) on the Series. You can then add the two arrays and cast the result back with .astype(str) to get the desired result:
import numpy as np

a = np.char.array(df3['ticker'].values)
b = np.char.array(df3['SuperID'].values)
df3['SuperID2'] = (a + b).astype(str)
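For reference, a plain pandas equivalent that should produce the same result (a sketch, untested against the original data):

df3['SuperID2'] = df3['ticker'] + df3['SuperID'].astype(str)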
Hope this helps others working with series.
Regards
(Sample table shown in the original post.)
I am trying to look up the corresponding commodity prices from the columns (CU00.SHF, AU00.SHF, SC00.SHF, I8888.DCE, C00.DCE) for a new set of timestamps whose dates are 32 days later than the dates in the 'history_date' column.
I tried .loc and .at in a loop to extract the matching values with the functions below:
latest_day = data.iloc[data.shape[0] - 1, 0].date()

def next_trade_day(x):
    x = pd.to_datetime(x).date()  # the imported is_workday function requires a datetime.date
    while not is_workday(x + timedelta(32)):  # step forward until x + 32 days lands on a workday
        x = x + timedelta(1)
    return pd.Timestamp(x + timedelta(32))
def end_price(x):
    x = pd.Timestamp(x)
    if x <= latest_day:
        return data.at[x, 'CU00.SHF']
    return 'None'
but it always gives
KeyError: Timestamp('2023-02-03 00:00:00')
Any idea how I should achieve the target?
Thanks in advance!
If you want to work with datetimes: convert the column to datetime, check that the conversion worked, and then use the filter:

pd.to_datetime(df['your column'], errors='ignore')
df.loc[df['your column'] > 'your-date']

If both of those work, then check your full code.
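For what it's worth, the KeyError in the question most likely comes from calling data.at with a timestamp that is not in the index (a non-trading day, for example). A sketch of a guarded lookup, assuming data has a sorted DatetimeIndex and the 'CU00.SHF' column from the question:

def end_price(x):
    x = pd.Timestamp(x)
    if x > latest_day:
        return None
    if x in data.index:  # exact match available
        return data.at[x, 'CU00.SHF']
    # otherwise fall back to the most recent earlier row, if there is one
    pos = data.index.get_indexer([x], method='ffill')[0]
    return data.iloc[pos]['CU00.SHF'] if pos != -1 else None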
I have the dataframe below, called "df", and I am calculating the sum by the unique id in the "Id" column.
Can anyone help me optimize the code I have tried?
import pandas as pd
from datetime import datetime, timedelta

df = {'Date': ['2019-01-11 10:23:45', '2019-01-09 10:23:45', '2019-01-11 10:27:45',
               '2019-01-11 10:25:45', '2019-01-11 10:30:45', '2019-01-11 10:35:45',
               '2019-02-09 10:25:45'],
      'Id': ['100', '200', '300', '100', '100', '100', '200'],
      'Amount': [200, 400, 330, 100, 300, 200, 500],
      }
df = pd.DataFrame(df)
df["Date"] = pd.to_datetime(df['Date'])
You can try groupby; that way each adjustment is made within its sub-group rather than against the whole df:
s = {}
for x, y in df.groupby(['Id', 'NCC']):
    for i in y.index:
        start_date = y['Date'][i] - timedelta(seconds=300)
        end_date = y['Date'][i]
        mask = (y['Date'] >= start_date) & (y['Date'] < end_date)
        count = y.loc[mask]
        count = count.loc[(y['Sys'] == 1)]
        if len(count) == 0:
            s.update({i: 0})
        else:
            s.update({i: count['Amount'].sum()})
df['New'] = pd.Series(s)
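(Note: the 'NCC' and 'Sys' columns referenced here come from the original poster's full data; they are not in the sample frame above.)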
If the original data frame has 2 million rows, it would probably be faster to convert the 'Date' column to an index and sort it. Then you can sub-select each 5-minute interval:
df = df.set_index('Date').sort_index()
df['Sum_Amt'] = 0
for end in df.index:
    start = end - pd.Timedelta('5min')
    current_window = df[start:end]  # data frame with 5-minute look-back
    sum_amt = <calc logic applied to `current_window` goes here>
    df.at[end, 'Sum_Amt'] = sum_amt
    print(current_window)
    print()
I'm not following the logic for calculating Sum_Amt, so I left that out.
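For what it's worth, pandas' time-based rolling window can express the same look-back without an explicit loop. A sketch, assuming the goal is the sum of Amount over the trailing 5 minutes (closed='left' excludes the current row, matching the `< end_date` mask in the answer above):

df = df.set_index('Date').sort_index()
# time-based window ending at each row; closed='left' leaves the row itself out
df['Sum_Amt'] = df['Amount'].rolling('5min', closed='left').sum()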
I tried the following code. result1 is filtered by the given date, but result2 isn't.
How can I filter by date inside a function?
import pandas as pd

over20 = 'https://gist.githubusercontent.com/shinokada/dfcdc538dedf136d4a58b9bcdcfc8f18/raw/d1db4261b76af67dd67c00a400e373c175eab428/LNS14000024.csv'
df_over20 = pd.read_csv(over20)
display(df_over20)

result1 = df_over20[df_over20['DATE'] >= '1972-01-01']
display(result1)

def changedate(item):
    # something more here
    item['DATE'] = pd.to_datetime(item['DATE'])
    start = pd.to_datetime('1972-01-01')
    item[item['DATE'] >= start]
    return item

result2 = changedate(df_over20)
display(result2)
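For what it's worth, the likely culprit is the second-to-last line of changedate: item[item['DATE'] >= start] builds a filtered copy but never assigns it to anything, so the function returns the unfiltered frame. A minimal sketch of the fix:

def changedate(item):
    item = item.copy()  # avoid mutating the caller's frame
    item['DATE'] = pd.to_datetime(item['DATE'])
    start = pd.to_datetime('1972-01-01')
    item = item[item['DATE'] >= start]  # keep the filtered result
    return item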
In my experience, I would make the DATE column the index by running:

df.index = df["DATE"]
df.drop("DATE", inplace=True, axis=1)

Then try to use the index column:

import datetime as DT
date = DT.datetime(2020, 4, 1)
x = df[df.index > date]

You can also use the following command to make sure your index is a datetime index:

df.index = pd.to_datetime(df.index)
You should not compare datetimes via their raw strings; that leads to bad results.
Please use this instead:
import datetime

def compare(date1, date2):
    date1 = datetime.datetime.fromisoformat(date1).timestamp()
    date2 = datetime.datetime.fromisoformat(date2).timestamp()
    if date1 > date2:
        return 1
    elif date1 == date2:
        return 0
    else:
        return -1
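Usage, with hypothetical dates: compare('2020-04-01', '2020-05-01') returns -1, compare('2020-05-01', '2020-04-01') returns 1, and equal dates return 0.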
I'm making a function to calculate the time difference between two durations using Pandas.
The function is:
import datetime
import numpy as np
import pandas as pd

def time_calc(dur1, dur2):
    date1 = pd.to_datetime(pd.Series(dur2))
    date2 = pd.to_datetime(pd.Series(dur1))
    df = pd.DataFrame(dict(ID=ids, DUR1=date2, DUR2=date1))  # ids comes from the enclosing scope
    df1 = pd.DataFrame(dict(ID=ids, Duration1=date2, Duration2=date1))
    df1['Duration1'] = df['DUR1'].dt.strftime('%H:%M:%S.%f')
    df1['Duration2'] = df['DUR2'].dt.strftime('%H:%M:%S.%f')
    df = df[['ID', 'DUR1', 'DUR2']]
    df['diff_seconds'] = df['DUR2'] - df['DUR1']
    df['diff_seconds'] = df['diff_seconds'] / np.timedelta64(1, 's')
    df['TimeDelta'] = df['diff_seconds'].apply(lambda d: str(datetime.timedelta(seconds=abs(d))))
    df3 = df1.merge(df, on='ID')
    df3 = df3[['ID', 'Duration1', 'Duration2', 'TimeDelta', 'diff_seconds']]
    print(df3)
The math is: Duration2 - Duration1 = TimeDelta
The function does it nicely:
Duration1 Duration2 TimeDelta diff_seconds
00:00:23.999891 00:00:25.102076 0:00:01.102185 1.102185
00:00:43.079173 00:00:44.621481 0:00:01.542308 1.542308
But when Duration2 < Duration1, diff_seconds goes negative while TimeDelta stays positive:
Duration1 Duration2 TimeDelta diff_seconds
00:05:03.744332 00:04:58.008081 0:00:05.736251 -5.736251
So what I need my function to do is convert TimeDelta to a negative value as well, like this:
Duration1 Duration2 TimeDelta diff_seconds
00:05:03.744332 00:04:58.008081 -0:00:05.736251 -5.736251
I suppose I need to convert 'TimeDelta' in a different way, but all my attempts have been useless.
I'd be very thankful if somebody could help me with this.
Thanks in advance!
I've solved this issue.
I pick the timestamps one by one and pass each of them to a time_convert function:
df['diff_seconds'] = df['DUR2'] - df['DUR1']
df['diff_seconds'] = df['diff_seconds'] / np.timedelta64(1, 's')

lst = []
for i in df['diff_seconds']:
    time_convert(i)  # appends the formatted value to lst
The time_convert function just prepends a "-" to the formatted timestamp if the seconds were negative:
def time_convert(d):
    if d >= 0:
        lst.append(str(datetime.timedelta(seconds=d)))
    else:
        lst.append('-' + str(datetime.timedelta(seconds=abs(d))))
Then I created a new data frame from lst and merged everything together:

df_t = pd.DataFrame(dict(ALERTS=alerts, TimeDelta=lst))
df_f = df_t.merge(df3, on='ID')
Hope this will help somebody.
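For what it's worth, the same sign-aware formatting can be done in one pass with apply, without the side-effect list (a sketch using the same columns as above):

df['TimeDelta'] = df['diff_seconds'].apply(
    lambda d: ('-' if d < 0 else '') + str(datetime.timedelta(seconds=abs(d))))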
I have a dataframe with 'Date' and 'Value', where the Date is in format m/d/yyyy. I need to convert to yyyymmdd.
df2= df[["Date", "Transaction"]]
I know datetime can do this for me, but I can't get it to accept my format.
Example data file:
6/15/2006,-4.27,
6/16/2006,-2.27,
6/19/2006,-6.35,
You first need to convert to datetime using pd.to_datetime; then you can format it as you wish using strftime:
>>> df
Date Transaction
0 6/15/2006 -4.27
1 6/16/2006 -2.27
2 6/19/2006 -6.35
df['Date'] = pd.to_datetime(df['Date'],format='%m/%d/%Y').dt.strftime('%Y%m%d')
>>> df
Date Transaction
0 20060615 -4.27
1 20060616 -2.27
2 20060619 -6.35
You can say:

df['Date'] = df['Date'].dt.strftime('%Y%m%d')

The dt accessor's strftime method is your friend here.
Note: if you haven't converted to pandas datetime yet, do:

df['Date'] = pd.to_datetime(df['Date']).dt.strftime('%Y%m%d')
Output:
Date Transaction
0 20060615 -4.27
1 20060616 -2.27
2 20060619 -6.35
For a raw Python solution, you could try something along the following lines (assuming datafile is a string):
datafile="6/15/2006,-4.27,\n6/16/2006,-2.27,\n6/19/2006,-6.35"
def zeroPad(str, desiredLen):
while (len(str) < desiredLen):
str = "0" + str
return str
def convToYYYYMMDD(datafile):
datafile = ''.join(datafile.split('\n')) # remove \n's, they're unreliable and not needed
datafile = datafile.split(',') # split by comma so '1,2' becomes ['1','2']
out = []
for i in range(0, len(datafile)):
if (i % 2 == 0):
tmp = datafile[i].split('/')
yyyymmdd = zeroPad(tmp[2], 4) + zeroPad(tmp[0], 2) + zeroPad(tmp[1], 2)
out.append(yyyymmdd)
else:
out.append(datafile[i])
return out
print(convToYYYYMMDD(datafile))
This outputs: ['20060615', '-4.27', '20060616', '-2.27', '20060619', '-6.35'].
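For reference, Python's built-in str.zfill does the same padding as the zeroPad helper, so zeroPad(tmp[2], 4) could be written tmp[2].zfill(4).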