How can I sort a DataFrame by date ddMMMyyyy? - python

I have a sample dataframe as follows.
How can I sort it by the index by year instead of by month?
test=DataFrame([1,2,3],index=['28FEB1993','28FEB1994','30MAR1993'],columns=['value'])
I would like to have the following dataFrame as the result
test=DataFrame([1,2,3],index=['28FEB1993','30MAR1993','28FEB1994'],columns=['value'])
I think I am stuck on how to parse the ddMMMyyyy date format into a datetime object.
Thanks a ton!

You can use strptime:
from datetime import datetime
import numpy as np
test.index = np.array([datetime.strptime(s, "%d%b%Y") for s in test.index.values])
test.sort_index()
# value
# 1993-02-28 1
# 1993-03-30 3
# 1994-02-28 2
Or as suggested by @chrisb:
test.index = pd.to_datetime(test.index, format="%d%b%Y")
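If the original string labels should stay as they are, one option is to parse the index only to decide the order. This is a minimal sketch, assuming pandas 1.1 or later (where sort_index accepts a key argument); the name by_date is made up for illustration.

```python
import pandas as pd

test = pd.DataFrame([1, 2, 3],
                    index=['28FEB1993', '28FEB1994', '30MAR1993'],
                    columns=['value'])

# Parse the ddMMMyyyy labels only to decide the order; the labels themselves are kept.
by_date = test.sort_index(key=lambda idx: pd.to_datetime(idx, format="%d%b%Y"))
print(by_date)
```

The key function receives the whole index and returns the parsed dates, so the frame is ordered by year first while the index still shows the ddMMMyyyy strings.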

Related

How can I add a zero to dates in a string so all months are 2 characters? [duplicate]

Using a Python script, I need to read a CSV file where dates are formatted as DD/MM/YYYY, and convert them to YYYY-MM-DD before saving this into a SQLite database.
This almost works, but fails because I don't provide time:
from datetime import datetime
lastconnection = datetime.strptime("21/12/2008", "%Y-%m-%d")
#ValueError: time data did not match format: data=21/12/2008 fmt=%Y-%m-%d
print lastconnection
I assume there's a method in the datetime object to perform this conversion very easily, but I can't find an example of how to do it. Thank you.
Your example code is wrong. This works:
import datetime
datetime.datetime.strptime("21/12/2008", "%d/%m/%Y").strftime("%Y-%m-%d")
The call to strptime() parses the first argument according to the format specified in the second, so those two need to match. Then you can call strftime() to format the result into the desired final format.
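To make the "formats must match" point concrete, here is a small sketch that wraps the two calls in a helper. The function name reformat_date and its default formats are assumptions for illustration, not part of the original answer.

```python
from datetime import datetime

def reformat_date(text, in_fmt="%d/%m/%Y", out_fmt="%Y-%m-%d"):
    """Parse text with in_fmt and return it formatted with out_fmt."""
    # strptime raises ValueError when text does not match in_fmt
    return datetime.strptime(text, in_fmt).strftime(out_fmt)

print(reformat_date("21/12/2008"))

try:
    # Same mismatch as in the question: slash-separated data, dash-separated format
    reformat_date("21/12/2008", in_fmt="%Y-%m-%d")
except ValueError as exc:
    print("mismatch:", exc)
```

The second call fails for the same reason as the code in the question: the input format describes a different layout from the data, not because a time is missing.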
First convert the string into a datetime object, then format that object back into a string, like this:
lastconnection = datetime.strptime("21/12/2008", "%d/%m/%Y").strftime('%Y-%m-%d')
I am new to programming. I wanted to convert from yyyy-mm-dd to dd/mm/yyyy to print out a date in the format that people in my part of the world use and recognise.
The accepted answer above got me on the right track.
The answer I ended up with to my problem is:
import datetime
today_date = datetime.date.today()
print(today_date)
new_today_date = today_date.strftime("%d/%m/%Y")
print (new_today_date)
The first two lines after the import statement give today's date in ISO format, YYYY-MM-DD (2017-01-26). The last two lines convert this to the DD/MM/YYYY format recognised in the UK and other countries (26/01/2017).
You can shorten this code, but I left it as is because it is helpful to me as a beginner. I hope this helps other beginner programmers starting out!
Does anyone else think it's a waste to convert these strings to date/time objects for what is, in the end, a simple text transformation? If you're certain the incoming dates will be valid, you can just use:
>>> ddmmyyyy = "21/12/2008"
>>> yyyymmdd = ddmmyyyy[6:] + "-" + ddmmyyyy[3:5] + "-" + ddmmyyyy[:2]
>>> yyyymmdd
'2008-12-21'
This will almost certainly be faster than the conversion to and from a date.
# case_date holds the date as a string in mm/dd/yyyy format
case_date = "03/31/2020"
demo = case_date.split("/")
new_case_date = demo[1] + "-" + demo[0] + "-" + demo[2]
# The new format is dd-mm-yyyy; print it to check
print(new_case_date)
If you need to convert an entire column (from pandas DataFrame), first convert it (pandas Series) to the datetime format using to_datetime and then use .dt.strftime:
def conv_dates_series(df, col, old_date_format, new_date_format):
    df[col] = pd.to_datetime(df[col], format=old_date_format).dt.strftime(new_date_format)
    return df
Sample usage:
import pandas as pd
test_df = pd.DataFrame({"Dates": ["1900-01-01", "1999-12-31"]})
old_date_format='%Y-%m-%d'
new_date_format='%d/%m/%Y'
conv_dates_series(test_df, "Dates", old_date_format, new_date_format)
Dates
0 01/01/1900
1 31/12/1999
The simplest way
While reading the CSV file, pass the parse_dates argument:
df = pd.read_csv("sample.csv", parse_dates=['column_name'])
This parses the named column into datetime values, which pandas displays in YYYY-MM-DD form.
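For day-first input such as the DD/MM/YYYY dates in the question, read_csv also accepts dayfirst=True. The sketch below uses an in-memory string in place of a real file; the column names and sample rows are made up for illustration.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk; the column names are hypothetical.
csv_text = "user,lastconnection\nalice,21/12/2008\nbob,05/01/2009\n"

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["lastconnection"], dayfirst=True)

# The column now holds datetimes; strftime turns them back into ISO-style strings.
iso_dates = df["lastconnection"].dt.strftime("%Y-%m-%d")
print(iso_dates.tolist())
```

Without dayfirst=True, an ambiguous value like 05/01/2009 could be read month-first, so it is worth being explicit when the layout is known.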
To convert the date format DD/MM/YYYY to YYYY-MM-DD as in your question, you can use this:
from datetime import datetime
lastconnection = datetime.strptime("21/12/2008", "%d/%m/%Y").strftime("%Y-%m-%d")
print(lastconnection)
df is your data frame
Dateclm is the column that you want to change
This column should be in DateTime datatype.
df['Dateclm'] = pd.to_datetime(df['Dateclm'])
df.dtypes
#Here is the solution to change the format of the column
df["Dateclm"] = pd.to_datetime(df["Dateclm"]).dt.strftime('%Y-%m-%d')
print(df)

python pandas converting UTC integer to datetime

I am calling some financial data from an API which is storing the time values as (I think) UTC (example below):
I cannot seem to convert the entire column into a usable date. I can do it for a single value using the following code, so I know this works, but I have thousands of rows with this problem and thought pandas would offer an easier way to update all the values.
from datetime import datetime
tx = int('1645804609719')/1000
print(datetime.utcfromtimestamp(tx).strftime('%Y-%m-%d %H:%M:%S'))
Any help would be greatly appreciated.
Simply use pandas.Series.apply on the column:
df['date'] = df.date.apply(lambda x: datetime.utcfromtimestamp(int(x)/1000).strftime('%Y-%m-%d %H:%M:%S'))
Another way to do it is by using pd.to_datetime as recommended by Panagiotos in the comments:
df['date'] = pd.to_datetime(df['date'],unit='ms')
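A self-contained sketch of the to_datetime route, assuming the API delivers the timestamps as strings of epoch milliseconds (the sample values and the values column are placeholders):

```python
import pandas as pd

df = pd.DataFrame({"date": ["1584199972000", "1645804609719"], "values": [30, 40]})

# The timestamps are strings here, so make them numeric before
# interpreting them as milliseconds since the Unix epoch.
df["date"] = pd.to_datetime(pd.to_numeric(df["date"]), unit="ms")

# Optional: render as text in the same layout as the single-value example.
formatted = df["date"].dt.strftime("%Y-%m-%d %H:%M:%S")
print(formatted.tolist())
```

This converts the whole column in one vectorized call, which tends to scale better than calling a Python function per row.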
You can use "to_numeric" to convert the column to integers, "div" to divide it by 1000 and finally a loop over the column with datetime to get the format you want.
import pandas as pd
from datetime import datetime
df = pd.DataFrame({'date': ['1584199972000', '1645804609719'], 'values': [30,40]})
# Convert the strings to numbers and scale milliseconds to seconds
seconds = pd.to_numeric(df['date']).div(1000)
# Loop over the values and format each timestamp
df['date'] = [datetime.utcfromtimestamp(s).strftime('%Y-%m-%d %H:%M:%S') for s in seconds]
print(df)
Output:
date values
0 2020-03-14 15:32:52 30
1 2022-02-25 15:56:49 40

How to get the time only from timestamps?

I have a column (Created AT) in my DataFrame which has timestamps like those shown below:
Created AT
1) 2021-04-19T09:14:10.526Z
2) 2021-04-19T09:13:06.809Z
3) 2021-04-19T09:13:06.821Z
I want to extract only the time from the above column. It should look like:
9:14:8 etc
How can I extract this?
If your date column is a string, you need to convert it to datetime and then take a substring of the time:
df = pd.DataFrame(data = {"Created At":["2021-04-19T09:14:10.526Z","2021-04-19T09:14:10.526Z"]})
df['Created At'] = pd.to_datetime(df['Created At'])
df['Created At'] = df['Created At'].dt.time.astype(str).str[:8]
df['time'] = pd.to_datetime(df['Created AT'])
print(df['time'].dt.time)
The first line converts the strings to datetime objects and writes them to a new column.
The second line gets the time from those datetime objects.
Here is one solution to your question; there can be several. This one uses plain string handling and the datetime module.
You can get the time string using:
import time
import datetime
s = '2021-04-19T09:14:10.526Z'
t = s.split('T')[1].split('.')[0]
print(t)
To turn it into a datetime object, add one more line (the date part defaults to 1900-01-01):
print(datetime.datetime.strptime(t,"%H:%M:%S"))
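If only a time object is wanted, without the placeholder date that strptime fills in, calling .time() on the result is one way to get it. A minimal sketch; the variable names parsed and time_only are made up:

```python
from datetime import datetime

s = "2021-04-19T09:14:10.526Z"
t = s.split("T")[1].split(".")[0]

parsed = datetime.strptime(t, "%H:%M:%S")  # date part defaults to 1900-01-01
time_only = parsed.time()                  # keep just the time of day
print(time_only)
```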
Convert to datetime and use strftime to format exactly as you like it.
data = ['2021-04-19T09:14:10.526Z',
'2021-04-19T09:13:06.809Z',
'2021-04-19T09:13:06.821Z']
df = pd.DataFrame(data=data, columns=['Created AT'])
df['Created AT'] = pd.to_datetime(df['Created AT']).dt.strftime('%H:%M:%S')
print(df)
Created AT
0 09:14:10
1 09:13:06
2 09:13:06
First convert the column to datetime format if not already in that format:
df['Created AT'] = pd.to_datetime(df['Created AT'])
Then, add the new column time with formatting by .dt.strftime() as follows (if you don't want the fractional-second part):
df['time'] = df['Created AT'].dt.strftime('%H:%M:%S')
print(df)
Created AT time
0 2021-04-19 09:14:10.526000+00:00 09:14:10
1 2021-04-19 09:13:06.809000+00:00 09:13:06
2 2021-04-19 09:13:06.821000+00:00 09:13:06

Python Dataframe Date plus months variable which comes from the other column

I have a dataframe with the date and month_diff variable. I would like to get a new date (name it as Target_Date) based on the following logic:
For example, the date is 2/13/2019, month_diff is 3, then the target date should be the month-end of the original date plus 3 months, which is 5/31/2019
I tried the following method to get the target date first:
df["Target_Date"] = df["Date"] + pd.DateOffset(months = df["month_diff"])
But it failed; as far as I know, the parameter in DateOffset should be a variable or a fixed number.
I also tried:
df["Target_Date"] = df["Date"] + relativedelta(months = df["month_diff"])
It fails too.
Can anyone help? Thank you.
edit:
This is a large dataset with millions of rows.
You could try this
import pandas as pd
from dateutil.relativedelta import relativedelta
df = pd.DataFrame({'Date': [pd.Timestamp(2019,1,1), pd.Timestamp(2019,2,1)], 'month_diff': [1,2]})
df.apply(lambda row: row.Date + relativedelta(months=row.month_diff), axis=1)
Or list comprehension
[date + relativedelta(months=month_diff) for date, month_diff in df[['Date', 'month_diff']].values]
I would compute your "target_date" with the following steps:
1. Apply the target month offset (in your case +3 months), using your pd.DateOffset.
2. Get the last day of that target month (using, for example, calendar.monthrange; see also "Get last day of the month"). This gives you the "flexible" part of the date offset.
3. Apply the flexible day offset, obtained by comparing the results of step 1 and step 2. This could be a new pd.DateOffset.
A solution could look something like this:
import calendar
from dateutil.relativedelta import relativedelta
for ii in df.index:
    # 'Date' is the date column from the question
    new_ = df.at[ii, 'Date'] + relativedelta(months=df.at[ii, 'month_diff'])
    max_date = calendar.monthrange(new_.year, new_.month)[1]
    end_ = new_ + relativedelta(days=max_date - new_.day)
    print(end_)
Further "cleaning" into a function and / or list comprehension will probably make it much faster
import pandas as pd
from datetime import datetime
from datetime import timedelta
This is my approach in solving your issue.
However for some reason I am getting a semantic error in my output even though I am sure it is the correct way. Please everyone correct me if you notice something wrong.
today = datetime.now()
today = today.strftime("%d/%m/%Y")
month_diff =[30,5,7]
n = 30
for i in month_diff:
    b = {'Date': today, 'month_diff':month_diff,"Target_Date": datetime.now()+timedelta(days=i*n)}
df = pd.DataFrame(data=b)
Output:
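The i appears not to be updated: b is overwritten on every pass of the loop, so only the Target_Date computed from the last i reaches the DataFrame.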
I was looking for a solution I can write in one line only and apply does the job. However, by default apply function performs action on each column, so you have to remember to specify correct axis: axis=1.
from datetime import datetime
from dateutil.relativedelta import relativedelta
# Create a new column with date adjusted by number of months from 'month_diff' column and later adjust to the last day of month
df['Target_Date'] = df.apply(lambda row: row.Date # to current date
+ relativedelta(months=row.month_diff) # add month_diff
+ relativedelta(day=+31) # and adjust to the last day of month
, axis=1) # 1 or ‘columns’: apply function to each row.
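Since the question mentions millions of rows, a row-wise apply may be slow. The sketch below is an alternative that swaps in NumPy month arithmetic instead of relativedelta; it is not taken from any of the answers, and the sample data and intermediate names (months, offsets, month_ends) are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-02-13", "2019-01-31", "2019-11-15"]),
    "month_diff": [3, 1, 2],
})

# Truncate each date to its month, then add the per-row month offsets.
months = df["Date"].to_numpy().astype("datetime64[M]")
offsets = df["month_diff"].to_numpy().astype("timedelta64[M]")
target_months = months + offsets

# The first day of the following month minus one day is the month-end.
first_of_next = (target_months + np.timedelta64(1, "M")).astype("datetime64[D]")
month_ends = first_of_next - np.timedelta64(1, "D")

df["Target_Date"] = pd.to_datetime(month_ends)
print(df)
```

Working at month resolution sidesteps the day-of-month overflow problem (for example 31 January plus one month), because the day is only reintroduced at the end as the last day of the target month.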

how to change the data type date object to datetime in python?

In a training data set, the datetime column has dtype object. The first row of this column is 2009-06-15 17:26:21 UTC. I tried splitting the data:
train['Date'] = train['pickup_datetime'].str.slice(0,11)
train['Time'] = test['pickup_datetime'].str.slice(11,19)
I did this so that I can split the date and time into two variables and change them to the datetime data type. I tried a lot of methods but could not get the result.
train['Date']=pd.to_datetime(train['Date'], format='%Y-%b-%d')
I also tried splitting the date, time and UTC:
train['DateTime'] = pd.to_datetime(train['DateTime'])
Please suggest some code for this. I am a beginner.
Thanks in advance.
I would try the following
import pandas as pd
#create some random dates matching your formatting
df = pd.DataFrame({"date": ["2009-06-15 17:26:21 UTC", "2010-08-16 19:26:21 UTC"]})
#convert to datetime objects
df["date"] = pd.to_datetime(df["date"])
print(df["date"].dt.date) #returns the date part without tz information
print(df["date"].dt.time) #returns the time part
Output:
0 2009-06-15
1 2010-08-16
Name: date, dtype: object
0 17:26:21
1 19:26:21
Name: date, dtype: object
For further information feel free to consult the docs:
dt.date
dt.time
For your particular case:
#convert to datetime object
df['pickup_datetime']= pd.to_datetime(df['pickup_datetime'])
# separate date and time
df['Date'] = df['pickup_datetime'].dt.date
df['Time'] = df['pickup_datetime'].dt.time
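If you prefer to state the layout explicitly, note that %b matches an abbreviated month name such as Jun, whereas this data uses numeric months (%m). A minimal sketch, assuming every value ends in the literal text " UTC" as in the sample row; the frame name train follows the question:

```python
import pandas as pd

train = pd.DataFrame({"pickup_datetime": ["2009-06-15 17:26:21 UTC",
                                          "2010-08-16 19:26:21 UTC"]})

# Spell out the layout, including the literal " UTC" suffix,
# and keep the result timezone-aware.
train["pickup_datetime"] = pd.to_datetime(
    train["pickup_datetime"], format="%Y-%m-%d %H:%M:%S UTC", utc=True
)

train["Date"] = train["pickup_datetime"].dt.date
train["Time"] = train["pickup_datetime"].dt.time
print(train)
```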
