I have trouble setting the correct datetime format with Pandas, and I do not understand why my command does not work. Any solution?
date = ['01/10/2014 00:03:20']
value = [33.24]
df = pd.DataFrame({'value':value,'index':date})
df.index = pd.to_datetime(df.index,format='%d/%m/%y %H:%M:%S')
Solution for DatetimeIndex:
import pandas as pd
date = ['01/10/2014 00:03:20']
value = [33.24]
# create the index from the date list
df = pd.DataFrame({'value':value},index=date)
# %Y matches a four-digit year (YYYY); %y matches a two-digit year (YY)
df.index = pd.to_datetime(df.index,format='%d/%m/%Y %H:%M:%S')
print (df)
value
2014-10-01 00:03:20 33.24
If you want a column named 'index', you must select it with [], otherwise df.index returns the RangeIndex instead of the column:
df = pd.DataFrame({'value':value,'index':date})
df['index'] = pd.to_datetime(df['index'],format='%d/%m/%Y %H:%M:%S')
print (df)
value index
0 33.24 2014-10-01 00:03:20
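To see why %y failed in the question, the two directives can be compared on the same string. This is a minimal sketch that reuses the sample value from the question and only assumes pandas is installed.

```python
import pandas as pd

raw = '01/10/2014 00:03:20'

# %Y consumes a four-digit year, so the whole string matches the format
ok = pd.to_datetime(raw, format='%d/%m/%Y %H:%M:%S')
print(ok)

# %y consumes only two digits ('20'), so the rest of '2014 ...' no longer fits
try:
    pd.to_datetime(raw, format='%d/%m/%y %H:%M:%S')
    y_failed = False
except ValueError:
    y_failed = True
print('format with %y raised ValueError:', y_failed)
```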
Calling a column 'index' is a bit confusing, so I changed it to 'index_date'.
import pandas as pd
date = ['01/10/2014 00:03:20']
value = [33.24]
df = pd.DataFrame({'value':value,'index_date':date})
df['index_date'] = pd.to_datetime(df["index_date"], format='%d/%m/%Y %H:%M:%S', errors="coerce")
Output of df:
value index_date
0 33.24 2014-10-01 00:03:20
And if you run df.dtypes:
value float64
index_date datetime64[ns]
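Note that '01/10/2014' is ambiguous: without a format or dayfirst, pandas reads it month first (10 January). The sketch below compares the three options on the sample value from the question.

```python
import pandas as pd

s = pd.Series(['01/10/2014 00:03:20'])

month_first = pd.to_datetime(s)               # default: month first -> 10 Jan 2014
day_first = pd.to_datetime(s, dayfirst=True)  # day first -> 1 Oct 2014
explicit = pd.to_datetime(s, format='%d/%m/%Y %H:%M:%S')  # unambiguous

print(month_first[0], day_first[0], explicit[0])
```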
import datetime
import yfinance as yf
import numpy as np
import pandas as pd
ETF_DB = ['QQQ', 'EGFIX']
fundsret = yf.download(ETF_DB, start=datetime.date(2020,12,31), end=datetime.date(2022,4,30), interval='1mo')['Adj Close'].pct_change()
df = pd.DataFrame(fundsret)
df
Gives me:
I'm trying to remove the rows in the dataframe that aren't month end, such as the row 2021-03-22. How can I make the dataframe drop the rows whose date doesn't end in '01'?
df.reset_index(inplace=True)
# Convert the date to datetime64
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
# keep only the rows whose day of the month is 1
filtered = df.loc[df['Date'].dt.day == 1]
Did you mean month start?
You can use:
df = df[df.index.day==1]
Reproducible example:
df = pd.DataFrame(columns=['A', 'B'],
index=['2021-01-01', '2021-02-01', '2021-03-01',
'2021-03-22', '2021-03-31'])
df.index = pd.to_datetime(df.index, dayfirst=False)
Output after filtering with df = df[df.index.day==1]:
A B
2021-01-01 NaN NaN
2021-02-01 NaN NaN
2021-03-01 NaN NaN
End of month
For the end of the month, you can add one day and check whether that moves the date into the next month:
end = (df.index+pd.Timedelta('1d')).month != df.index.month
df = df[end]
Or add a MonthEnd(0) offset, which rolls a date forward to its month end, and check whether the value is unchanged:
end = df.index == (df.index + pd.offsets.MonthEnd(0))
df = df[end]
output:
A B
2021-03-31 NaN NaN
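pandas also exposes ready-made boolean attributes for these checks, DatetimeIndex.is_month_start and DatetimeIndex.is_month_end (Series.dt has the same ones for a column). A minimal sketch on the same dates as the reproducible example, with an integer column A added only so the rows are easy to tell apart:

```python
import pandas as pd

idx = pd.to_datetime(['2021-01-01', '2021-02-01', '2021-03-01',
                      '2021-03-22', '2021-03-31'])
df = pd.DataFrame({'A': range(5)}, index=idx)

# Boolean arrays: True where the date is the first / last calendar day of its month
month_starts = df[df.index.is_month_start]
month_ends = df[df.index.is_month_end]

print(month_starts)
print(month_ends)
```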
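If the dates are stored as strings, you can also filter them with a regular expression:
import pandas as pd
# Dummy data (named data to avoid shadowing the built-in dict)
data = {
    'Date': ['2021-01-01', '2022-03-01', '2023-04-22', '2023-04-01'],
    'Name': ['A', 'B', 'C', 'D']
}
# Making a DataFrame
df = pd.DataFrame(data)
# Required date pattern YYYY-MM-01 (raw string, so \d is not treated as an escape)
pattern = r'\d{4}-\d{2}-01$'
new_df = df[df['Date'].str.match(pattern)]
print(new_df)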
My data frame contains an IGN_DATE column in which the values are of the form 20080727142700, i.e. the format is YYYYMMDDHHMMSS.
The column type is float64.
How can I get separate columns for time, date (without 00:00:00), day and month?
What I tried:
Column name IGN_DATE
dataframe - df
df['IGN_DATE'] = df['IGN_DATE'].apply(str)
df['DATE'] = pd.to_datetime(df['IGN_DATE'].str.slice(start = 0, stop = 8))
df['MONTH'] = df['IGN_DATE'].str.slice(start = 4, stop = 6).astype(int)
df['DAY'] = df['IGN_DATE'].str.slice(start = 6, stop = 8).astype(int)
df['TIME'] = df['IGN_DATE'].str.slice(start = 8, stop = 13)
DATE is in the format YYYY-MM-DD 00:00:00. I don't want 00:00:00 in DATE.
How do I get the time (which is a string) into HH:MM:SS format?
Is there any simpler way to do this?
If NaN values are not important, you can drop them with dropna, then convert with to_datetime using an explicit format, and then use the dt accessor to extract the desired values:
# Drop Rows with nan in IGN_DATE column
df = df.dropna(subset=['IGN_DATE'])
# Convert dtype to whole number then to `str`
df['IGN_DATE'] = df['IGN_DATE'].astype('int64').astype(str)
# Series of datetime values from Column
s = pd.to_datetime(df['IGN_DATE'], format='%Y%m%d%H%M%S')
# Extract out and add to DataFrame from `s`
df['DATE'] = s.dt.date
df['MONTH'] = s.dt.month
df['DAY'] = s.dt.day
df['TIME'] = s.dt.time
Otherwise, you can mask the non-null values of IGN_DATE and assign only those rows:
# Mask not null values
m = df['IGN_DATE'].notna()
# Convert to String
df.loc[m, 'IGN_DATE'] = df.loc[m, 'IGN_DATE'].astype('int64').astype(str)
# Series of datetime values from Column
s = pd.to_datetime(df['IGN_DATE'], format='%Y%m%d%H%M%S')
# Extract out and add to DataFrame from `s`
df.loc[m, 'DATE'] = s.dt.date
df.loc[m, 'MONTH'] = s.dt.month
df.loc[m, 'DAY'] = s.dt.day
df.loc[m, 'TIME'] = s.dt.time
Sample DF:
import numpy as np
import pandas as pd
df = pd.DataFrame({'IGN_DATE': [20080727142700, np.nan, 20151015171807]})
Sample Output with dropna:
IGN_DATE DATE MONTH DAY TIME
0 20080727142700 2008-07-27 7 27 14:27:00
2 20151015171807 2015-10-15 10 15 17:18:07
Sample Output with mask:
IGN_DATE DATE MONTH DAY TIME
0 20080727142700 2008-07-27 7.0 27.0 14:27:00
1 NaN NaN NaN NaN NaN
2 20151015171807 2015-10-15 10.0 15.0 17:18:07
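If you would rather neither drop nor mask rows, one possible variant (a sketch, not the approach above) is to cast through pandas' nullable Int64 dtype and let errors='coerce' turn the missing row into NaT. It also addresses the HH:MM:SS part by formatting the time with dt.strftime instead of storing datetime.time objects. It uses the same sample DF.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'IGN_DATE': [20080727142700, np.nan, 20151015171807]})

# Nullable Int64 keeps the missing value as <NA> instead of failing the int cast
as_text = df['IGN_DATE'].astype('Int64').astype(str)
# errors='coerce' turns anything unparsable (the text of the missing value) into NaT
s = pd.to_datetime(as_text, format='%Y%m%d%H%M%S', errors='coerce')

df['DATE'] = s.dt.date                  # datetime.date objects, no 00:00:00
df['MONTH'] = s.dt.month                # float64 because of the missing row
df['DAY'] = s.dt.day
df['TIME'] = s.dt.strftime('%H:%M:%S')  # 'HH:MM:SS' strings, NaN where NaT
print(df)
```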
df = pd.DataFrame('23.Jan.2020 01.Mar.2017 5663:33 20.May.2021 626'.split())
I want to convert the date-like elements to datetime and, for the numbers, return the original value.
I have tried:
t=pd.to_datetime(df[0], format='%d.%b.%Y', errors='ignore')
which just returns the original df with no change. I have also tried changing errors to 'coerce', which converts the date-like elements, but the numbers are dropped:
t=pd.to_datetime(df[0], format='%d.%b.%Y', errors='coerce')
Then I attempt to return the original df value where t is NaT, and otherwise substitute the new datetime from t:
df.where(t.isnull(), other=t, axis=1)
This works for returning the original df value where t is NaT, but it doesn't transfer the datetimes.
Maybe this is what you want?
dt = pd.Series('23.Jan.2020 01.Mar.2017 5663:33 20.May.2021 626'.split())
res = pd.to_datetime(dt, format="%d.%b.%Y", errors='coerce').fillna(dt)
This way the resulting elements in the Series have the correct types:
>>> res.map(type)
0 <class 'pandas._libs.tslibs.timestamps.Timesta...
1 <class 'pandas._libs.tslibs.timestamps.Timesta...
2 <class 'str'>
3 <class 'pandas._libs.tslibs.timestamps.Timesta...
4 <class 'str'>
dtype: object
PS: I used a Series because it's easier to pass to to_datetime, and to Series.fillna.
This will combine the two field types in the way you have specified:
import pandas as pd
df = pd.DataFrame('23.Jan.2020 01.Mar.2017 5663:33 20.May.2021 626'.split())
mod = pd.to_datetime(df[0], format='%d.%b.%Y', errors='coerce')
ndf = pd.concat([df, mod], axis=1)
ndf.columns = ['original', 'modified']
def funk(col1,col2):
    # keep the original value where parsing failed, else the parsed datetime
    return col1 if pd.isnull(col2) else col2
ndf.apply(lambda x: funk(x.original,x.modified), axis=1)
# 0 2020-01-23 00:00:00
# 1 2017-03-01 00:00:00
# 2 5663:33
# 3 2021-05-20 00:00:00
# 4 626
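The where idea from the question can also be made to work on the Series rather than on the DataFrame. This is a sketch under the assumption that an object-dtype result is acceptable: the parsed values are cast to object first, so that Timestamps and the original strings can sit in the same column.

```python
import pandas as pd

dt = pd.Series('23.Jan.2020 01.Mar.2017 5663:33 20.May.2021 626'.split())
t = pd.to_datetime(dt, format='%d.%b.%Y', errors='coerce')

# Cast the parsed values to object (Timestamp / NaT objects), then keep them
# wherever parsing succeeded and fall back to the original string elsewhere
res = t.astype(object).where(t.notna(), dt)
print(res.map(type))
```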
I would like to put the dates in a variable in order to pass them via Django to Chart.js.
Now I have the problem that I cannot access the dates, since they are apparently in the second row.
print df['Open'] or print df['High'] works, for example, but print df['Date'] doesn't work.
Can you guys tell me how I can restructure the df in a way that I can print the dates as well?
Thanks a lot for your help and kind regards.
Dates are not accessible
The first column is the index, so to select it you need:
print (df.index)
dates = df.index
Or use DataFrame.reset_index to create a new column from the values of the index:
df = df.reset_index()
dates = df['Date']
Sample:
df = pd.DataFrame({'Open':[1,2,3], 'High':[8,9,2]},
index=pd.date_range('2015-01-01', periods=3))
df.index.name = 'Date'
print (df)
High Open
Date
2015-01-01 8 1
2015-01-02 9 2
2015-01-03 2 3
print (df.index)
DatetimeIndex(['2015-01-01', '2015-01-02', '2015-01-03'],
dtype='datetime64[ns]', name='Date', freq='D')
df = df.reset_index()
print (df['Date'])
0 2015-01-01
1 2015-01-02
2 2015-01-03
Name: Date, dtype: datetime64[ns]
Or, starting again from the original df, reset the index in place:
df.reset_index(inplace=True)
print (df['Date'])
0 2015-01-01
1 2015-01-02
2 2015-01-03
Name: Date, dtype: datetime64[ns]
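Since the goal is to hand the dates to Chart.js, note that Timestamp objects are not JSON serializable with the standard json module. A minimal sketch, assuming the Django view passes a JSON string to the template; the chart_data name and its labels/data keys are hypothetical and would still need to be wired into the Chart.js config in the template:

```python
import json

import pandas as pd

df = pd.DataFrame({'Open': [1, 2, 3], 'High': [8, 9, 2]},
                  index=pd.date_range('2015-01-01', periods=3))
df.index.name = 'Date'

# Format the DatetimeIndex as plain strings; Timestamps are not JSON serializable
labels = df.index.strftime('%Y-%m-%d').tolist()
# Series.tolist() returns built-in Python ints, which json can encode
highs = df['High'].tolist()

# Hypothetical payload a Django view could pass to the template for Chart.js
chart_data = json.dumps({'labels': labels, 'data': highs})
print(chart_data)
```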
I have a DataFrame with a DatetimeIndex. I would like to ask the user for a date with input(), and then have Python look up the previous month in the index.
Here's an example (df is the name of the DataFrame):
date = input('Enter a date in YYYY-MM-DD format: ')
Enter a date in YYYY-MM-DD format: 2017-01-31
I would like Python to do df[date-1] and then print the result, so that I get:
2016-12-31 8.257478e+04
This is possible if the input date is already in the index, but I'm looking for a way to do it when it is not.
Any ideas? Thanks in advance.
It seems you need get_loc for the position of the date d in the index, and then iloc to select the previous row:
pos = df.index.get_loc(d)
print (df.iloc[[pos - 1]])
Sample:
start = pd.to_datetime('2016-11-30')
rng = pd.date_range(start, periods=10, freq='M')
df = pd.DataFrame({'a': range(10)}, index=rng)
print (df)
a
2016-11-30 0
2016-12-31 1
2017-01-31 2
2017-02-28 3
2017-03-31 4
2017-04-30 5
2017-05-31 6
2017-06-30 7
2017-07-31 8
2017-08-31 9
d = '2017-01-31'
pos = df.index.get_loc(d)
print (df.iloc[[pos - 1]])
a
2016-12-31 1
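If the date is not in the index, look up the nearest position instead. Older pandas versions accepted get_loc(d, method='nearest'), but that argument was removed in pandas 2.0, so use get_indexer:
d = '2017-01-20'
pos = df.index.get_indexer(pd.to_datetime([d]), method='nearest')[0]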
print (df.iloc[[pos - 1]])
a
2016-12-31 1
But if you need a more general solution, you have to add some conditions, for example:
d = '2017-11-30'
pos = df.index.get_indexer(pd.to_datetime([d]), method='nearest')[0]
if pos == 0:
    print ('Value less than or equal to the minimal date in DatetimeIndex')
else:
    print ('Value nearest to the date', df.index[pos])
    print ('Previous value', df.iloc[[pos - 1]])
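The 'nearest' approach can step back too far: for an input such as '2017-01-05' the nearest date is 2016-12-31, so pos - 1 selects 2016-11-30. A possible alternative, swapping in Index.searchsorted for get_loc, is to look for the last date strictly before the input. previous_row is a hypothetical helper name, and the sample data mirrors the month-end frame above (built with an offset object rather than a frequency string).

```python
import pandas as pd

# Month-end sample data, same values as the answer's sample
rng = pd.date_range('2016-11-30', periods=10, freq=pd.offsets.MonthEnd())
df = pd.DataFrame({'a': range(10)}, index=rng)

def previous_row(frame, date_str):
    """Hypothetical helper: last row whose index is strictly before date_str, or None."""
    ts = pd.Timestamp(date_str)
    # side='left' gives the position of the first index entry >= ts,
    # so the entry just before it is the last one strictly earlier than ts
    pos = frame.index.searchsorted(ts, side='left')
    if pos == 0:
        return None  # no date earlier than the input
    return frame.iloc[[pos - 1]]

print(previous_row(df, '2017-01-31'))
print(previous_row(df, '2017-01-05'))
print(previous_row(df, '2016-11-01'))
```

This requires the index to be sorted ascending, which is already the case for a date_range.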