I have a DataFrame like this:
I am trying to replace the value '2001-01-01' in a column with today's date, but this approach does not work:
date = dt.date.today()
df.loc[df['dat_csz_opzione_tech'] == '2001-01-01', 'dat_csz_opzione_tech'] = date
How can I do this?
Try this
import pandas as pd
import time
df = pd.DataFrame({ 'dat_csz_opzione_tech' :['2001-02-01','2001-01-01','2001-03-01','2001-04-01']})
todaysdate = time.strftime("%Y-%m-%d")
df.loc[df['dat_csz_opzione_tech'] == '2001-01-01', 'dat_csz_opzione_tech'] = todaysdate
print(df)
Output:
dat_csz_opzione_tech
0 2001-02-01
1 2017-02-14
2 2001-03-01
3 2001-04-01
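The question does not say whether the column holds strings or real datetimes. If it is meant to be a datetime column, a minimal sketch (reusing the sample values from the answer above, which are an assumption about the real data) is to convert the column first and then assign a normalized Timestamp, so the column keeps a datetime dtype instead of mixing strings and date objects:

```python
import pandas as pd

# Sample data copied from the answer above; the real frame is not shown in the question.
df = pd.DataFrame({
    "dat_csz_opzione_tech": ["2001-02-01", "2001-01-01", "2001-03-01", "2001-04-01"]
})

# Convert the strings to datetimes so comparisons and assignments stay type-consistent.
df["dat_csz_opzione_tech"] = pd.to_datetime(df["dat_csz_opzione_tech"])

# Today's date at midnight, as a pandas Timestamp.
today = pd.Timestamp.today().normalize()

# Replace only the placeholder date.
mask = df["dat_csz_opzione_tech"] == pd.Timestamp("2001-01-01")
df.loc[mask, "dat_csz_opzione_tech"] = today

print(df)
```

Unlike the string version, the result can be used directly with the .dt accessor and date arithmetic.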
I have a dataframe like so:
CREATED_AT COUNT
'1990-01-01' '2022-01-01 07:30:00' 5
'1990-01-02' '2022-01-01 07:30:00' 10
...
Where the index is a date and the CREATED_AT column is a datetime that is the same value for all rows.
How can I update the CREATED_AT column such that it inherits its date portion from the index?
The result should look like:
CREATED_AT COUNT
'1990-01-01' '1990-01-01 07:30:00' 5
'1990-01-02' '1990-01-02 07:30:00' 10
...
Attempts at this result in errors like:
cannot add DatetimeArray and DatetimeArray
You can use df.reset_index() to turn the index into a column and then do a simple manipulation to get the output you want, like this:
# Creating a test df
import pandas as pd
from datetime import datetime, timedelta, date
df = pd.DataFrame.from_dict({
"CREATED_AT": [datetime.now(), datetime.now() + timedelta(hours=1)],
"COUNT": [5, 10]
})
df_with_index = df.set_index(pd.Index([date.today() - timedelta(days=10), date.today() - timedelta(days=9)]))
# Creating the column with the result
df_result = df_with_index.reset_index()
df_result["NEW_CREATED_AT"] = pd.to_datetime(df_result["index"].astype(str) + ' ' + df_result["CREATED_AT"].dt.time.astype(str))
Result:
index CREATED_AT COUNT NEW_CREATED_AT
0 2022-11-11 2022-11-21 16:15:31.520960 5 2022-11-11 16:15:31.520960
1 2022-11-12 2022-11-21 17:15:31.520965 10 2022-11-12 17:15:31.520965
You can use:
# ensure CREATED_AT is a datetime
s = pd.to_datetime(df['CREATED_AT'])
# subtract the date to only get the time, add to the index
# ensuring the index is of datetime type
df['CREATED_AT'] = s.sub(s.dt.normalize()).add(pd.to_datetime(df.index))
If everything is already of datetime type, this simplifies to:
df['CREATED_AT'] = (df['CREATED_AT']
.sub(df['CREATED_AT'].dt.normalize())
.add(df.index)
)
Output:
CREATED_AT COUNT
1990-01-01 1990-01-01 07:30:00 5
1990-01-02 1990-01-02 07:30:00 10
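The error in the question arises because two datetimes cannot be added together; only a datetime and a timedelta can. The sketch below rebuilds the two sample rows from the question (the construction of the frame is an assumption, since the original code is not shown) and applies the same idea: reduce CREATED_AT to a time-of-day timedelta, then add it to the index.

```python
import pandas as pd

# Rebuild the sample from the question: a date index and a constant CREATED_AT.
df = pd.DataFrame(
    {
        "CREATED_AT": pd.to_datetime(["2022-01-01 07:30:00", "2022-01-01 07:30:00"]),
        "COUNT": [5, 10],
    },
    index=pd.to_datetime(["1990-01-01", "1990-01-02"]),
)

# Datetime minus its own midnight gives a timedelta holding only the time of day.
time_of_day = df["CREATED_AT"] - df["CREATED_AT"].dt.normalize()

# Timedelta plus the (datetime) index gives the index date with the original time.
df["CREATED_AT"] = time_of_day + df.index

print(df)
```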
How would I add two df columns together (date + weeks):
This works for me:
df['Date'] = pd.to_datetime(startDate, format='%Y-%m-%d') + datetime.timedelta(weeks = 3)
But when I try to add weeks from a column, I get a type error: unsupported type for timedelta weeks component: Series
df['Date'] = pd.to_datetime(startDate, format='%Y-%m-%d') + datetime.timedelta(weeks = df['Duration (weeks)'])
I would appreciate any help. Thank you!
You can use the pandas to_timedelta function to transform the number of weeks column to a timedelta, like this:
import pandas as pd
import numpy as np
# create a DataFrame with a `date` column
df = pd.DataFrame(
pd.date_range(start='1/1/2018', end='1/08/2018'),
columns=["date"]
)
# add a column `weeks` with a random number of weeks
df['weeks'] = np.random.randint(1, 6, df.shape[0])
# use `pd.to_timedelta` to transform the number of weeks column to a timedelta
# and add it to the `date` column
df["new_date"] = df["date"] + pd.to_timedelta(df["weeks"], unit="W")
df.head()
date weeks new_date
0 2018-01-01 5 2018-02-05
1 2018-01-02 2 2018-01-16
2 2018-01-03 2 2018-01-17
3 2018-01-04 4 2018-02-01
4 2018-01-05 3 2018-01-26
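Applied to the situation in the question (one fixed start date plus a column of week counts), the same idea might look like the sketch below. The value of start_date and the sample durations are made up for illustration; only the column name 'Duration (weeks)' comes from the question.

```python
import pandas as pd

start_date = "2021-01-04"  # hypothetical value; the question does not show startDate
df = pd.DataFrame({"Duration (weeks)": [1, 2, 3]})  # hypothetical durations

# datetime.timedelta only accepts scalars, so convert the whole column with pandas instead.
weeks_as_timedelta = pd.to_timedelta(df["Duration (weeks)"], unit="W")

# A single Timestamp plus a Series of timedeltas yields a Series of dates.
df["Date"] = pd.to_datetime(start_date, format="%Y-%m-%d") + weeks_as_timedelta

print(df)
```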
I want to add a column called 'Date' that starts from today's date and advances by one business day per row, for up to a year. I am trying the code below, but it repeats days because it is adding a business day to Fridays and Saturdays. The output should have row 1 = 2021-10-07 and end with 2022-10-08, with only business days shown. Can anyone help, please?
import datetime as dt
import pandas as pd
from pandas.tseries.offsets import BDay
from datetime import date
df = pd.DataFrame({'Date': pd.date_range(start=date.today(), end=date.today() + dt.timedelta(days=365))})
df['Date'] = df['Date'] + BDay(1)
It is unclear what your desired output is, but if you want a column 'Date' that only shows the dates for business days, you can use the code below.
import datetime as dt
import pandas as pd
from datetime import date
df = pd.DataFrame({'Date': pd.date_range(start=date.today(), end=date.today() + dt.timedelta(days=365))})
df = df[df.Date.dt.weekday < 5]  # keep Monday (0) through Friday (4); 6 is Sunday
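The repeats in the question's code come from the fact that adding BDay(1) to a Friday, a Saturday, or a Sunday lands on the same Monday each time. As an alternative to filtering afterwards, one possible sketch is to generate only business days in the first place with pd.bdate_range. Note that this treats every weekday as a business day and does not exclude public holidays.

```python
import pandas as pd

# Start today at midnight and cover one calendar year.
start = pd.Timestamp.today().normalize()
end = start + pd.DateOffset(years=1)

# bdate_range uses business-day frequency by default, so weekends are skipped.
df = pd.DataFrame({"Date": pd.bdate_range(start=start, end=end)})

print(df.head())
print(df.tail())
```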
I have a DataFrame:
https://docs.google.com/spreadsheets/d/19ssG8bvkZKVDR6V5yU9fZVRJbJNfTTEYmWqLwmDwBa0/edit#gid=0
This is the output that my code produces.
Here is the code:
from yahoofinancials import YahooFinancials
import pandas as pd
import datetime as datetime
df = pd.read_excel('C:/Users/User/Downloads/Div Tickers.xlsx', sheet_name='Sheet1')
tickers_list = df['Ticker'].tolist()
data = pd.DataFrame(columns=tickers_list)
yahoo_financials_ecommerce = YahooFinancials(data)
ecommerce_income_statement_data = yahoo_financials_ecommerce.get_financial_stmts('annual', 'income')
data = ecommerce_income_statement_data['incomeStatementHistory']
df_dict = dict()
for ticker in tickers_list:
df_dict[ticker] = pd.concat([pd.DataFrame(data[ticker][x]) for x in range(len(data[ticker]))],
sort=False, join='outer', axis=1)
df = pd.concat(df_dict, sort=True)
df_l = pd.DataFrame(df.stack())
df_l.reset_index(inplace=True)
df_l.columns = ['ticker', 'financials', 'date', 'value']
df_w = df_l.pivot_table(index=['date.year', 'financials'], columns='ticker', values='value')
export_excel = df_w.to_excel(r'C:/Users/User/Downloads/Income Statement Histories.xlsx', sheet_name="Sheet1", index= True)
How would I go about condensing the months into years so that the data is comparable Year-over-Year?
IIUC, you need to melt, then use groupby on your date column to group by year.
#df['date'] = pd.to_datetime(df['date'])
df = pd.melt(df,id_vars=['date','financials'],var_name='ticker')
df.groupby([df['date'].dt.year,df['financials'],df['ticker']])['value'].sum().unstack()
ticker AEM AGI ALB \
date financials
2016 costOfRevenue 1.030000e+09 309000000.0 1.710000e+09
discontinuedOperations 0.000000e+00 0.0 2.020000e+08
ebit 3.360000e+08 21300000.0 5.370000e+08
grossProfit 1.110000e+09 173000000.0 9.700000e+08
incomeBeforeTax 2.680000e+08 -7600000.0 5.750000e+08
... ... ... ...
2019 researchDevelopment 0.000000e+00 0.0 5.828700e+07
sellingGeneralAdministrative 1.210000e+08 19800000.0 4.390000e+08
totalOperatingExpenses 1.650000e+09 557000000.0 2.830000e+09
totalOtherIncomeExpenseNet -1.000000e+08 2900000.0 -6.900000e+07
totalRevenue 2.490000e+09 683000000.0 3.590000e+09
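To make the melt and groupby steps concrete, here is a small self-contained sketch. The numbers are invented for illustration; only the column layout (a date column, a financials column, and one column per ticker) is taken from the answer above.

```python
import pandas as pd

# Hypothetical wide-format data: one column per ticker, two dates in 2018, one in 2019.
wide = pd.DataFrame({
    "date": pd.to_datetime(["2018-03-31", "2018-12-31", "2019-12-31"]),
    "financials": ["totalRevenue", "totalRevenue", "totalRevenue"],
    "AEM": [1.0, 2.0, 4.0],
    "AGI": [10.0, 20.0, 40.0],
})

# Wide to long: one row per (date, financials, ticker) with the number in 'value'.
long = pd.melt(wide, id_vars=["date", "financials"], var_name="ticker")

# Group by calendar year instead of the full date, then put tickers back in columns.
yearly = (
    long.groupby([long["date"].dt.year, "financials", "ticker"])["value"]
    .sum()
    .unstack()
)

print(yearly)
```

Whether sum is the right aggregation depends on the data: it makes sense when several reporting dates within one year should be combined, while first or last may be more appropriate if each year has a single statement.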
Not sure, since you didn't give us any data, but you can change a datetime column to year with the following code. The first bit is just generating some sample data:
import pandas as pd
from datetime import datetime, timedelta
from random import randint
df = pd.DataFrame({
'dates': [datetime.today() - timedelta(randint(0, 1000)) for _ in range(50)]
})
print(df.head())
dates
0 2019-09-02 21:01:46.702300
1 2019-11-03 21:01:46.702329
2 2019-04-01 21:01:46.702338
3 2019-03-04 21:01:46.702345
4 2019-03-28 21:01:46.702351
The part that matters:
df.dates.dt.to_period('Y')
0 2018
1 2018
2 2019
3 2018
4 2019
5 2020
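For comparison, the sketch below shows two ways to reduce a datetime column to its year, using three fixed dates chosen for illustration rather than the random sample above. dt.year returns plain integers, while dt.to_period('Y') returns annual Period values that still behave like time spans.

```python
import pandas as pd

# Fixed illustrative dates (not from the original post).
dates = pd.Series(pd.to_datetime(["2018-05-17", "2019-01-03", "2019-11-30"]))

# Plain integer years, convenient as a groupby key.
years_as_int = dates.dt.year

# Annual periods, which keep time-span semantics (start and end of the year).
years_as_period = dates.dt.to_period("Y")

print(years_as_int.tolist())
print(years_as_period.astype(str).tolist())
```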
I am working on a project and I am trying to calculate the number of business days within a month. What I currently did was extract all of the unique months from one dataframe into a different dataframe and created a second column with
df2['Signin Date Shifted'] = df2['Signin Date'] + pd.DateOffset(months=1)
Thus the current dataframe looks like:
I know I can do dt.daysinmonth or a timedelta but that gives me all of the days within a month including Sundays/Saturdays (which I don't want).
Use busday_count from NumPy.
Example:
import pandas as pd
import numpy as np
df = pd.DataFrame({"Signin Date": ["2018-01-01", "2018-02-01"]})
df["Signin Date"] = pd.to_datetime(df["Signin Date"])
df['Signin Date Shifted'] = pd.DatetimeIndex(df['Signin Date']) + pd.DateOffset(months=1)
df["bussDays"] = np.busday_count( df["Signin Date"].values.astype('datetime64[D]'), df['Signin Date Shifted'].values.astype('datetime64[D]'))
print(df)
Output:
Signin Date Signin Date Shifted bussDays
0 2018-01-01 2018-02-01 23
1 2018-02-01 2018-03-01 20
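np.busday_count counts weekdays from the start date up to, but not including, the end date, and by default it knows nothing about public holidays. If holidays matter, they can be passed through the holidays parameter. The sketch below treats 2018-01-01 as a holiday purely for illustration; the actual holiday list is an assumption that depends on the calendar in use.

```python
import numpy as np

begin = np.datetime64("2018-01-01")
end = np.datetime64("2018-02-01")  # the end date itself is not counted

# Weekdays only (Monday to Friday), no holidays.
weekdays_only = np.busday_count(begin, end)

# Same range, but excluding an illustrative holiday on 2018-01-01.
with_holiday = np.busday_count(begin, end, holidays=[np.datetime64("2018-01-01")])

print(weekdays_only, with_holiday)
```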