I have a data frame with a DatetimeIndex. I would like to create an input where the user types a date, and Python then looks up the entry for the month before it.
Here's an example: df is the name of the dataframe
date = input('Enter a date in YYYY-MM-DD format: ')
Enter a date in YYYY-MM-DD format: 2017-01-31
I would like Python to do something like df[date - 1] and then print the result, so that I get:
2016-12-31 8.257478e+04
It's possible if the input date is already in the index, but I'm looking for a way to do it when the input is not.
Any ideas? Thanks in advance.
It seems you need get_loc to find the position of the value in the index, and then iloc for selecting:
pos = df.index.get_loc(d)
print (df.iloc[[pos - 1]])
Sample:
start = pd.to_datetime('2016-11-30')
rng = pd.date_range(start, periods=10, freq='M')
df = pd.DataFrame({'a': range(10)}, index=rng)
print (df)
a
2016-11-30 0
2016-12-31 1
2017-01-31 2
2017-02-28 3
2017-03-31 4
2017-04-30 5
2017-05-31 6
2017-06-30 7
2017-07-31 8
2017-08-31 9
d = '2017-01-31'
pos = df.index.get_loc(d)
print (df.iloc[[pos - 1]])
a
2016-12-31 1
If the date is not in the index, add method='nearest':
d = '2017-01-20'
pos = df.index.get_loc(d, method='nearest')
print (df.iloc[[pos - 1]])
a
2016-12-31 1
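Note: in recent pandas versions Index.get_loc no longer accepts a method argument, so the same 'nearest' lookup can be written with get_indexer instead. A minimal sketch, reusing the sample df above:
# get_indexer returns an array of positions; take the single element
pos = df.index.get_indexer([pd.Timestamp('2017-01-20')], method='nearest')[0]
print (df.iloc[[pos - 1]])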
But if you need a more general solution, you have to use some conditions, like:
d = '2017-11-30'
pos = df.index.get_loc(d, method='nearest')
if pos == 0:
    print ('Value is less than or equal to the minimal date in the DatetimeIndex')
else:
    print ('Nearest date in the index:', df.index[pos])
    print ('Previous value:', df.iloc[[pos - 1]])
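Putting the question's input() prompt together with the lookup, a minimal sketch (it assumes the entered date is present in the index, as in the sample above):
import pandas as pd

rng = pd.date_range('2016-11-30', periods=10, freq='M')
df = pd.DataFrame({'a': range(10)}, index=rng)

date = input('Enter a date in YYYY-MM-DD format: ')   # e.g. 2017-01-31
pos = df.index.get_loc(date)                          # position of the entered date
if pos == 0:
    print ('No earlier date in the index')
else:
    print (df.iloc[[pos - 1]])                        # previous entry, here 2016-12-31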
My data frame has 6 columns of dates which I want to combine into 1 column.
DATA FRAME IMAGE HERE
My code to make another column is below:
df['Mega'] = df['Mega'].append(df['RsWeeks','RsMonths','RsDays','PsWeeks','PsMonths','PsDays'])
I am new to Python and pandas and would like to learn more, so please point me to some sources too; I am really bad at debugging since I have no programming background.
The pandas documentation is a great resource to learn from, with a lot of good examples and visuals.
For your particular case:
We construct a sample DataFrame:
import pandas as pd
df = pd.DataFrame([
{"RsWeeks": "2015-11-10", "RsMonths": "2016-08-01"},
{"RsWeeks": "2015-11-11", "RsMonths": "2015-12-30"}
])
print("DataFrame preview:")
print(df)
Output:
DataFrame preview:
RsWeeks RsMonths
0 2015-11-10 2016-08-01
1 2015-11-11 2015-12-30
We concatenate the columns RsWeeks and RsMonths to create a Series:
my_series = pd.concat([df["RsWeeks"], df["RsMonths"]], ignore_index=True)
print("\nSeries preview:")
print(my_series)
Output:
Series preview:
0 2015-11-10
1 2015-11-11
2 2016-08-01
3 2015-12-30
Edit
If you really need to add the new Series as a column to your DataFrame, you can do the following:
df2 = pd.DataFrame({"Mega": my_series})
df = pd.concat([df, df2], axis=1)
print("\nDataFrame preview:")
print(df)
Output:
DataFrame preview:
RsWeeks RsMonths Mega
0 2015-11-10 2016-08-01 2015-11-10
1 2015-11-11 2015-12-30 2015-11-11
2 NaN NaN 2016-08-01
3 NaN NaN 2015-12-30
Data:
df = pd.DataFrame({"name" : 'Dav Las Oms'.split(),
'age' : [25, 50, 70]})
df['Name'] = list(['a', 'M', 'm'])
df:
name age Name
0 Dav 25 a
1 Las 50 M
2 Oms 70 m
df = pd.DataFrame(df.astype(str).apply('|'.join, axis=1))
df:
0
0 Dav|25|a
1 Las|50|M
2 Oms|70|m
You can use pd.melt(), which reshapes your dataframe from wide to long:
df_reshaped = pd.melt(df, id_vars = ['id_1','id_2','id_3'], var_name = 'new_name', value_name = 'Mega')
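The id_vars above (id_1, id_2, id_3) are placeholders for whatever identifier columns your frame has. A minimal sketch using only the date columns from the question, with made-up values for illustration:
import pandas as pd

df = pd.DataFrame({
    'RsWeeks':  ['2015-11-10', '2015-11-11'],
    'RsMonths': ['2016-08-01', '2015-12-30'],
    'RsDays':   ['2016-01-01', '2016-01-02'],
    'PsWeeks':  ['2016-02-01', '2016-02-02'],
    'PsMonths': ['2016-03-01', '2016-03-02'],
    'PsDays':   ['2016-04-01', '2016-04-02'],
})

# With no id columns, every column is melted: 'new_name' keeps the original
# column name and 'Mega' holds the stacked date values
df_long = pd.melt(df, var_name='new_name', value_name='Mega')
print(df_long)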
There is a huge dataframe containing multiple data types in different columns. I want to find rows that contain date values in different columns.
Here is a test dataframe:
import numpy as np
import pandas as pd
from datetime import datetime

dt = pd.Series(['abc', datetime.now(), 12, '', None, np.nan, '2020-05-05'])
dt1 = pd.Series([3, datetime.now(), 'sam', '', np.nan, 'abc-123', '2020-05-25'])
dt3 = pd.Series([1,2,3,4,5,6,7])
df = pd.DataFrame({"A":dt.values, "B":dt1.values, "C":dt3.values})
Now I want to create a new dataframe that contains only the rows with dates in both columns A and B (here, the 2nd and the last row).
Expected output:
A B C
1 2020-06-01 16:58:17.274311 2020-06-01 17:13:20.391394 2
6 2020-05-05 2020-05-25 7
What is the best way to do that? Thanks.
P.S.> Dates can be in any standard format.
Use:
m = df[['A', 'B']].transform(pd.to_datetime, errors='coerce').isna().any(axis=1)
df = df[~m]
Result:
# print(df)
A B C
1 2020-06-01 17:54:16.377722 2020-06-01 17:54:16.378432 2
6 2020-05-05 2020-05-25 7
A solution that tests only the A and B columns is boolean indexing with DataFrame.notna and DataFrame.all, so rows with any non-datetime value are excluded:
df = df[df[['A','B']].apply(pd.to_datetime, errors='coerce').notna().all(axis=1)]
print (df)
A B C
1 2020-06-01 16:14:35.020855 2020-06-01 16:14:35.021855 2
6 2020-05-05 2020-05-25 7
import numpy as np
import pandas as pd
from datetime import datetime
dt = pd.Series(['abc', datetime.now(), 12, '', None, np.nan, '2020-05-05'])
dt1 = pd.Series([3, datetime.now(), 'sam', '', np.nan, 'abc-123', '2020-05-25'])
dt3 = pd.Series([1,2,3,4,5,6,7])
df = pd.DataFrame({"A":dt.values, "B":dt1.values, "C":dt3.values})
m = pd.concat([pd.to_datetime(df['A'], errors='coerce'),
pd.to_datetime(df['B'], errors='coerce')], axis=1).isna().all(axis=1)
print(df[~m])
Prints:
A B C
1 2020-06-01 12:17:51.320286 2020-06-01 12:17:51.320826 2
6 2020-05-05 2020-05-25 7
I have trouble setting the correct datetime format with pandas; I do not understand why my command does not work. Any solution?
date = ['01/10/2014 00:03:20']
value = [33.24]
df = pd.DataFrame({'value':value,'index':date})
df.index = pd.to_datetime(df.index,format='%d/%m/%y %H:%M:%S')
Solution for DatetimeIndex:
date = ['01/10/2014 00:03:20']
value = [33.24]
#create index by date list
df = pd.DataFrame({'value':value},index=date)
#use %Y to match YYYY; %y matches two-digit YY years
df.index = pd.to_datetime(df.index,format='%d/%m/%Y %H:%M:%S')
print (df)
value
2014-10-01 00:03:20 33.24
If you want the date in a column named 'index' instead, use [] to select that column; otherwise you would be working with the RangeIndex:
df = pd.DataFrame({'value':value,'index':date})
df['index'] = pd.to_datetime(df['index'],format='%d/%m/%Y %H:%M:%S')
print (df)
value index
0 33.24 2014-10-01 00:03:20
Calling a column 'index' is a bit confusing, so I changed it to 'index_date'.
import pandas as pd
date = ['01/10/2014 00:03:20']
value = [33.24]
df = pd.DataFrame({'value':value,'index_date':date})
df['index_date'] = pd.to_datetime(df["index_date"], errors="coerce")
Output of df:
value index_date
0 33.24 2014-01-10 00:03:20
And if you run df.dtypes
value float64
index_date datetime64[ns]
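Note that without an explicit format or dayfirst=True, pd.to_datetime parses '01/10/2014' month-first, which is why the output above shows 2014-01-10 rather than 2014-10-01. If the source dates are day-first, a minimal sketch:
import pandas as pd

date = ['01/10/2014 00:03:20']
value = [33.24]
df = pd.DataFrame({'value': value, 'index_date': date})

# dayfirst=True tells the parser the dates are dd/mm/yyyy
df['index_date'] = pd.to_datetime(df['index_date'], dayfirst=True, errors='coerce')
print(df)   # 0  33.24 2014-10-01 00:03:20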
I would like to put the dates in a variable in order to pass them via django to charts.js.
Now I have the problem that I cannot access the dates, since they apparently sit in the second row.
print(df['Open']) or print(df['High']) works, for example, but print(df['Date']) doesn't work.
Can you guys tell me how I can restructure the df so that I can print the dates as well?
Thanks a lot for your help and kind regards.
Dates are not accessible
The first 'column' is actually the index, so to select it you need:
print (df.index)
dates = df.index
Or use DataFrame.reset_index to create a new column from the index values:
df = df.reset_index()
dates = df['Date']
Sample:
df = pd.DataFrame({'Open':[1,2,3], 'High':[8,9,2]},
                  index=pd.date_range('2015-01-01', periods=3))
df.index.name = 'Date'
print (df)
High Open
Date
2015-01-01 8 1
2015-01-02 9 2
2015-01-03 2 3
print (df.index)
DatetimeIndex(['2015-01-01', '2015-01-02', '2015-01-03'],
dtype='datetime64[ns]', name='Date', freq='D')
df = df.reset_index()
print (df['Date'])
0 2015-01-01
1 2015-01-02
2 2015-01-03
Name: Date, dtype: datetime64[ns]
Or, modifying the DataFrame in place:
df.reset_index(inplace=True)
print (df['Date'])
0 2015-01-01
1 2015-01-02
2 2015-01-03
Name: Date, dtype: datetime64[ns]
Is there a better way than bdate_range() to measure business days between two columns of dates via pandas?
df = pd.DataFrame({'A': ['1/1/2013', '2/2/2013', '3/3/2013'],
                   'B': ['1/12/2013', '4/4/2013', '3/3/2013']})
print(df)
df['A'] = pd.to_datetime(df['A'])
df['B'] = pd.to_datetime(df['B'])
f = lambda x: len(pd.bdate_range(x['A'], x['B']))
df['DIFF'] = df.apply(f, axis=1)
print(df)
With output of:
A B
0 1/1/2013 1/12/2013
1 2/2/2013 4/4/2013
2 3/3/2013 3/3/2013
A B DIFF
0 2013-01-01 00:00:00 2013-01-12 00:00:00 9
1 2013-02-02 00:00:00 2013-04-04 00:00:00 44
2 2013-03-03 00:00:00 2013-03-03 00:00:00 0
Thanks!
brian_the_bungler was onto the most efficient way of doing this using numpy's busday_count:
import numpy as np
A = [d.date() for d in df['A']]
B = [d.date() for d in df['B']]
df['DIFF'] = np.busday_count(A, B)
print(df)
On my machine this is 300x faster on your test case, and 1000s of times faster on much larger arrays of dates
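One caveat: np.busday_count counts business days in the half-open interval [A, B), so the end date itself is excluded, whereas pd.bdate_range above includes both endpoints; the two can therefore differ by one. As a sketch, the .date() list comprehensions can also be replaced by a vectorized cast (this rebuilds the question's df for illustration):
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': pd.to_datetime(['1/1/2013', '2/2/2013', '3/3/2013']),
                   'B': pd.to_datetime(['1/12/2013', '4/4/2013', '3/3/2013'])})

# busday_count expects datetime64[D]; casting the whole column at once
# avoids the Python-level loop over Timestamp.date()
df['DIFF'] = np.busday_count(df['A'].values.astype('datetime64[D]'),
                             df['B'].values.astype('datetime64[D]'))
print(df)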
You can use pandas' BDay offset to step through business days between two dates like this:
new_column = some_date - pd.tseries.offsets.BDay(15)
Read more in this conversation: https://stackoverflow.com/a/44288696
It also works if some_date is a single date value, not a series.
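A minimal sketch of that idea (the dates below are made up for illustration):
import pandas as pd
from pandas.tseries.offsets import BDay

some_date = pd.Series(pd.to_datetime(['2013-04-04', '2013-01-14']))

# step back 15 business days from each date
new_column = some_date - BDay(15)
print(new_column)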