Extracting values that match a new set of timestamps - Python

sample table here
I am trying to look up the corresponding commodity prices in the columns (CU00.SHF, AU00.SHF, SC00.SHF, I8888.DCE, C00.DCE) for a new set of timestamps, whose dates are 32 days later than the dates in the column 'history_date'.
I tried .loc and .at in a loop to extract the matching values, using the functions below:
latest_day = data.iloc[data.shape[0] - 1, 0].date()
def next_trade_day(x):
    x = pd.to_datetime(x).date() # the imported is_workday function requires a datetime type
    while True:
        if is_workday(x + timedelta(32)) != False:
            break
            return (pd.Timestamp((x + timedelta(32))))
        if is_workday(x + timedelta(32)) == False:
            x = x + timedelta(1)
    return pd.Timestamp(x + timedelta(32))
def end_price(x):
    x = pd.Timestamp(x)
    if x <= latest_day:
        return data.at[x,'CU00.SHF']
    if x > latest_day:
        return'None'
    return data.at[x,'CU00.SHF']
But it always raises:
KeyError: Timestamp('2023-02-03 00:00:00')
Any idea how I should achieve this?
Thanks in advance!

If you want to work with datetimes:
- convert the column to datetime
- check that the dates were converted, then filter
df['your column'] = pd.to_datetime(df['your column'], errors='coerce')  # unparseable values become NaT
df.loc[df['your column'] > 'your-date']
If both steps work, then check the rest of your code.
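A KeyError from data.at[x, ...] usually means the shifted date is simply not a label in the index (for example a holiday that is_workday does not know about). Here is a minimal sketch of one way around that, assuming the prices are indexed by a sorted DatetimeIndex of trading days. The sample frame and the offset_days parameter are made up for illustration; the idea is to take the first available trading day on or after the target date instead of requiring an exact match.

```python
import pandas as pd

# Hypothetical stand-in for the asker's table: prices indexed by trading date.
# 2023-02-03 is deliberately missing, as in the KeyError above.
data = pd.DataFrame(
    {"CU00.SHF": [100.0, 101.0, 102.0, 103.0]},
    index=pd.to_datetime(["2023-01-02", "2023-01-03", "2023-02-06", "2023-02-07"]),
)

def end_price(history_date, column="CU00.SHF", offset_days=32):
    """Return the price on the first trading day >= history_date + offset_days."""
    target = pd.Timestamp(history_date) + pd.Timedelta(days=offset_days)
    # Position of the first index entry that is >= target (index must be sorted).
    pos = data.index.searchsorted(target)
    if pos >= len(data.index):
        return None  # target lies beyond the last available date
    return data[column].iloc[pos]

print(end_price("2023-01-02"))  # target 2023-02-03 is absent, falls forward to 2023-02-06
print(end_price("2023-02-06"))  # target is past the end of the data
```

Because the lookup is positional, it never indexes by a label that might be absent, so no KeyError is raised.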

Related

How to concatenate series in Python

I've the following code:
def excel_date(date1):
    temp = datetime.datetime(1899, 12, 30)
    delta = date1 - temp if date1 != 0 else temp - temp
    return float(delta.days) + (float(delta.seconds) / 86400)
df3['SuperID'] = df3['Break_date'].apply(excel_date)
df3['SuperID2'] = df3['ticker'] + str(df3['SuperID'])
Here I pass a date as date1 and get back the corresponding Excel date number.
My ticker and SuperID fields are OK:
I want to concatenate both and get TSLA44462, but the whole Series gets concatenated if I use str() or .astype(str) on my SuperID column.
The column types:
Here is my solution, if I understood your problem:
import pandas as pd
df = pd.DataFrame({"Col1":[1.0,2.0,3.0,4.4], "Col2":["Michel", "Sardou", "Paul", "Jean"], "Other Col":[2,3,5,2]})
df["Concat column"] = df["Col1"].astype(int).astype(str) + df["Col2"]
df[df["Concat column"] == "1Michel"]
or
df = pd.DataFrame({"Col1":[1.0,2.0,3.0,4.4], "Col2":["Michel", "Sardou", "Paul", "Jean"], "Other Col":[2,3,5,2]})
df[(df["Col1"]==1) & (df["Col2"]=="Michel")]
After some hours of investigation, and with the help of the comments, this is the way of working with series, integers, floats and strings that worked for me:
def excel_date(date1):
    temp = datetime.datetime(1899, 12, 30)
    delta = date1 - temp if date1 != 0 else temp - temp
    return float(delta.days) + (float(delta.seconds) / 86400)
First of all, I convert the float to an integer to drop the decimals. int(x) does not work on a Series, so use .astype(int) instead, which works fine.
df3['SuperID'] = df3['Break_date'].apply(excel_date).astype(int)
After that, convert everything to character arrays with np.char.array rather than str(x) or .astype. Then you just add the arrays and call .astype(str) on the result to get the desired output.
a = np.char.array(df3['ticker'].values)
b = np.char.array(df3['SuperID'].values)
df3['SuperID2'] = (a + b).astype(str)
I hope this helps others working with series.
Regards
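For comparison, here is a minimal sketch of the same concatenation done with pandas alone. The key point is that str(series) returns the printed representation of the whole Series as one string, while .astype(str) converts element by element. The sample frame below is made up for illustration; only the column names come from the question.

```python
import datetime
import pandas as pd

def excel_date(date1):
    """Return the Excel serial number (days since 1899-12-30) for a datetime."""
    delta = date1 - datetime.datetime(1899, 12, 30)
    return delta.days + delta.seconds / 86400

# Hypothetical sample data using the column names from the question.
df3 = pd.DataFrame({
    "ticker": ["TSLA", "AAPL"],
    "Break_date": pd.to_datetime(["2021-09-23", "2021-09-24"]),
})

# Drop the fractional part, then convert each element to a string.
df3["SuperID"] = df3["Break_date"].apply(excel_date).astype(int)
df3["SuperID2"] = df3["ticker"] + df3["SuperID"].astype(str)

print(df3["SuperID2"].tolist())  # ['TSLA44462', 'AAPL44463']
```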

How to filter a DataFrame by date inside a Python function

I tried the following code.
The result1 is filtered by a given date, but the result2 isn't filtered.
How can I filter by date in a function?
import pandas as pd
over20='https://gist.githubusercontent.com/shinokada/dfcdc538dedf136d4a58b9bcdcfc8f18/raw/d1db4261b76af67dd67c00a400e373c175eab428/LNS14000024.csv'
df_over20 = pd.read_csv(over20)
display(df_over20)
result1=df_over20[df_over20['DATE']>='1972-01-01']
display(result1)
def changedate(item):
    # something more here
    item['DATE']=pd.to_datetime(item['DATE'])
    start=pd.to_datetime('1972-01-01')
    item[item['DATE']>=start]
    return item
result2=changedate(df_over20)
display(result2)
In my experience, it is easiest to make the DATE column the index:
df.index = df["DATE"]
df.drop("DATE", inplace=True, axis=1)
Then filter on the index:
date = pd.to_datetime('2020-04-01')
x = df[df.index > date]
You can also use the following command to make sure your index is a datetime index:
df.index = pd.to_datetime(df.index)
You should not compare dates as raw strings; it can lead to wrong results. Parse them first, for example:
import datetime
def compare(date1, date2):
    date1 = datetime.datetime.fromisoformat(date1).timestamp()
    date2 = datetime.datetime.fromisoformat(date2).timestamp()
    if date1 > date2:
        return 1
    elif date1 == date2:
        return 0
    else:
        return -1
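Note that in the question's changedate function, the expression item[item['DATE']>=start] builds a new, filtered DataFrame but never assigns or returns it, so the function hands back the unfiltered input. A minimal sketch of a version that returns the filtered frame follows. The small sample frame, the VALUE column and the start parameter are assumptions for illustration, standing in for the CSV from the question.

```python
import pandas as pd

def changedate(item, start="1972-01-01"):
    """Return only the rows whose DATE is on or after start."""
    out = item.copy()  # leave the caller's frame untouched
    out["DATE"] = pd.to_datetime(out["DATE"])
    # Boolean indexing returns a new frame, so it has to be returned (or assigned).
    return out[out["DATE"] >= pd.to_datetime(start)]

# Hypothetical sample data standing in for the downloaded CSV.
df_over20 = pd.DataFrame({
    "DATE": ["1971-12-01", "1972-01-01", "1972-02-01"],
    "VALUE": [5.1, 5.2, 5.0],
})

result2 = changedate(df_over20)
print(result2)
```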

Apply a function in a dataframe's columns [Python]

I just wrote this function to calculate a person's age from two columns in a pandas DataFrame. Unfortunately, if I use return, the function returns the same value for all rows, but if I use the print statement, it gives me the right values.
Here is the code:
def calc_age(dataset):
    index = dataset.index
    for element in index:
        year_nasc = train['DT_NASCIMENTO_BENEFICIARIO'][element][6:]
        year_insc = train['ANO_CONCESSAO_BOLSA'][element]
        age = int(year_insc) - int(year_nasc)
        print ('Age: ', age)
        #return age
# sample values:
train['DT_NASCIMENTO_BENEFICIARIO'] = '03-02-1987'
train['ANO_CONCESSAO_BOLSA'] = 2009
What am I doing wrong?!
If what you want is to subtract the year of DT_NASCIMENTO_BENEFICIARIO from ANO_CONCESSAO_BOLSA, and df is your DataFrame:
# cast to datetime
df["DT_NASCIMENTO_BENEFICIARIO"] = pd.to_datetime(df["DT_NASCIMENTO_BENEFICIARIO"])
df["age"] = df["ANO_CONCESSAO_BOLSA"] - df["DT_NASCIMENTO_BENEFICIARIO"].dt.year
# print the result, or do something else with it:
print(df["age"])
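As for the question itself: a return inside the for loop exits the function on the first element, which is why every row ends up with the same value. If the dates are stored as strings like '03-02-1987', another option is to slice the year out of the string column-wise, mirroring the [6:] in the question and avoiding any day/month ambiguity when parsing. A minimal sketch, with made-up sample rows:

```python
import pandas as pd

# Hypothetical sample data using the column names from the question.
train = pd.DataFrame({
    "DT_NASCIMENTO_BENEFICIARIO": ["03-02-1987", "15-07-1990"],
    "ANO_CONCESSAO_BOLSA": [2009, 2010],
})

# Characters from position 6 onward hold the four-digit year.
birth_year = train["DT_NASCIMENTO_BENEFICIARIO"].str[6:].astype(int)
train["age"] = train["ANO_CONCESSAO_BOLSA"].astype(int) - birth_year

print(train["age"].tolist())  # [22, 20]
```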

Adding a calculated column to pandas dataframe

I am completely new to Python, pandas and programming in general, and I cannot figure out the following:
I have accessed a database with the help of pandas and put the data from the query into a dataframe, df. One of the columns contains birthdays, which can have the following forms:
- 01/25/1980 (string)
- 01/25 (string)
- None (NoneType)
Now, I would like to add a new column to df, which stores the ages of the people in the database. So I have done the following:
def addAge(df):
    today = date.today()
    df["age"] = None
    for index, row in df.iterrows():
        if row["birthday"] != None:
            if len(row["birthday"]) == 10:
                birthday = df["birthday"]
                birthdayDate = datetime.date(int(birthday[6:]), int(birthday[:2]), int(birthday[3:5]))
                row["age"] = today.year - birthdayDate.year - ((today.month, today.day) < (birthdayDate.month, birthdayDate.day))
        print row["birthday"], row["age"] #this is just for testing
addAge(df)
print df
The line print row["birthday"], row["age"] correctly prints the birthdays and the ages. But when I call print df, the column age always contains "None". Could you guys explain to me what I have been doing wrong? Thanks!
When you call iterrows(), you get copies of each row, so assignments to them do not propagate back to the original dataframe. In general, you should try to use vectorized methods rather than iterating over the rows.
So for example, in this case, to parse the 'birthday' column, you could do something like this. Rows whose string has a length of 10 are parsed into a datetime; all others are filled with a missing value.
import numpy as np
import pandas as pd
# keep only the 10-character strings, then parse them; everything else becomes NaT
df['birthday'] = pd.to_datetime(df['birthday'].where(df['birthday'].str.len() == 10), format='%m/%d/%Y')
To calculate the ages, you can use .apply, which applies a function to each element of a Series.
So if you wrapped your age calculation in a function:
def calculate_age(birthdayDate, today):
    if pd.isnull(birthdayDate):
        return np.nan
    else:
        # subtract one if this year's birthday has not happened yet
        return (today.year - birthdayDate.year
                - ((today.month, today.day) < (birthdayDate.month, birthdayDate.day)))
Then, you could calculate the age column like this:
from datetime import date
today = date.today()
df['age'] = df['birthday'].apply(lambda x: calculate_age(x, today))
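If you want to avoid .apply as well, the same calculation can be written as plain column arithmetic with the .dt accessor. This is only a sketch under a few assumptions: the sample frame is made up, full birthdays use the MM/DD/YYYY form from the question, and today is pinned to a fixed date so the result is reproducible.

```python
import pandas as pd

# Hypothetical sample data covering the three forms listed in the question.
df = pd.DataFrame({"birthday": ["01/25/1980", "01/25", None]})

# Keep only the 10-character strings, then parse them; the rest become NaT.
full = df["birthday"].where(df["birthday"].str.len() == 10)
parsed = pd.to_datetime(full, format="%m/%d/%Y")

today = pd.Timestamp("2024-01-01")  # fixed for reproducibility; use pd.Timestamp.today() in practice

# True where this year's birthday has already happened (False for missing dates).
had_birthday = (parsed.dt.month < today.month) | (
    (parsed.dt.month == today.month) & (parsed.dt.day <= today.day)
)
# Missing birthdays propagate as NaN through the subtraction.
df["age"] = today.year - parsed.dt.year - (~had_birthday).astype(int)

print(df)
```

Rows without a full birthday end up with NaN in the age column, which matches what calculate_age returns for missing values.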

Python: xlrd discerning dates from floats

I wanted to import a file containing text, numbers and dates using xlrd on Python.
I tried something like:
if "/" in worksheet.cell_value:
    do_this
else:
    do_that
But that was of no use, as I later discovered that dates are stored as floats, not strings. To convert them to datetime type I did:
try:
    get_row = str(datetime.datetime(*xlrd.xldate_as_tuple(worksheet.cell_value(i, col - 1), workbook.datemode)))
except:
    get_row = unicode(worksheet.cell_value(i, col - 1))
I have an exception in place for when the cell contains text. Now I want to get the numbers as numbers and the dates as dates, because right now all numbers are converted to dates.
Any ideas?
I think you could make this much simpler by making more use of the tools available in xlrd:
cell_type = worksheet.cell_type(row - 1, i)
cell_value = worksheet.cell_value(row - 1, i)
if cell_type == xlrd.XL_CELL_DATE:
    # Returns a tuple.
    dt_tuple = xlrd.xldate_as_tuple(cell_value, workbook.datemode)
    # Create datetime object from this tuple.
    get_col = datetime.datetime(
        dt_tuple[0], dt_tuple[1], dt_tuple[2],
        dt_tuple[3], dt_tuple[4], dt_tuple[5]
    )
elif cell_type == xlrd.XL_CELL_NUMBER:
    get_col = int(cell_value)
else:
    get_col = unicode(cell_value)
Well, never mind, I found a solution and here it is!
try:
    cell = worksheet.cell(row - 1, i)
    if cell.ctype == xlrd.XL_CELL_DATE:
        date = datetime.datetime(1899, 12, 30)
        get_ = datetime.timedelta(int(worksheet.cell_value(row - 1, i)))
        get_col2 = str(date + get_)[:10]
        d = datetime.datetime.strptime(get_col2, '%Y-%m-%d')
        get_col = d.strftime('%d-%m-%Y')
    else:
        get_col = unicode(int(worksheet.cell_value(row - 1, i)))
except:
    get_col = unicode(worksheet.cell_value(row - 1, i))
A bit of explanation: it turns out that with xlrd you can actually check the type of a cell and see whether it is a date or not. Also, Excel has a peculiar way of storing datetimes. It saves them as floats (the integer part for days, the fractional part for the time of day) and counts from a specific base date (1899-12-30 seems to work OK), adding the days and hours from the float to get the date. So, to create the date that I wanted, I just added them together and kept only the first 10 characters ([:10]) to get rid of the time (00:00:00 or something...). I also changed the order to day-month-year because in Greece we use a different order. Finally, this code also checks whether it can convert a number to an integer (I don't want any floats to show in my program...) and, if everything fails, it just uses the cell as it is (in case there are strings in the cells...).
I hope you find this useful; I think there are other threads that say this is impossible or something...
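The arithmetic described above can be sketched with the standard library alone, without the string round trip. This assumes the workbook uses Excel's default 1900 date system and that the serial is 61 or greater (earlier serials are off by one because Excel treats 1900 as a leap year). The function name excel_serial_to_datetime is made up for illustration and is not part of xlrd.

```python
import datetime

# Base date that lines up with Excel's 1900 date system for serials >= 61.
EXCEL_EPOCH = datetime.datetime(1899, 12, 30)

def excel_serial_to_datetime(serial):
    """Convert an Excel serial (integer part: days, fraction: time of day) to a datetime."""
    return EXCEL_EPOCH + datetime.timedelta(days=float(serial))

converted = excel_serial_to_datetime(44462.5)
formatted = converted.strftime('%d-%m-%Y')  # day-month-year, as in the answer above

print(converted)  # 2021-09-23 12:00:00
print(formatted)  # 23-09-2021
```

Working with the datetime object directly keeps the time of day available and leaves the output format to a single strftime call.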
