Converting a series of DateTime values to the proper format

Currently, I have a series of datetime values that display like this:
0 Datetime
1 20041001
2 20041002
3 20041003
4 20041004
They are in a series named
d['Datetime']
They were originally something like
20041001ABCDEF
but I split off the end to leave just the numbers. How do I go about putting them into the following format?
2004-10-01

You can do the following:
df['Datetime'] = pd.to_datetime(df['Datetime'], format='%Y%m%d')
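As a minimal sketch of this answer, assuming the values are stored as strings (the sample data below is made up to mirror the question), parsing with an explicit format yields real datetime values that display as 2004-10-01:

```python
import pandas as pd

# Sample data mirroring the question; the values are assumed to be strings.
df = pd.DataFrame({"Datetime": ["20041001", "20041002", "20041003", "20041004"]})

# Parse with an explicit format so pandas does not have to guess.
df["Datetime"] = pd.to_datetime(df["Datetime"], format="%Y%m%d")

# Optional: render the dates as text in YYYY-MM-DD form.
as_text = df["Datetime"].dt.strftime("%Y-%m-%d")
print(as_text.tolist())
```

The column itself holds datetime values after the conversion; `as_text` is only needed when the dates must be written out as strings.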

Related

how to convert date in dd/mm/yyyy to days in python csv

I would like to convert dates to days.
For example, 01/01/2001 should be Day 1 and 02/01/2001 should be Day 2.
I have tried
prices_df['01/01/2001'] = prices_df['days'].dt.days

Without using any external function or a complicated solution, you can simply take the first 2 chars of the string.
int('01/01/2001'[0:2])
Output:
1
If you have to do this in a pandas column:
import pandas as pd
pd.to_numeric(df['days'].str[0:2])
N.B. This works only if all dates are in the form day/month/year, and it returns the day of the month.
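If "Day 1, Day 2" is meant as the number of days counted from the first date rather than the day of the month, one possible approach is to parse the column and subtract the earliest date. This is only a sketch under that assumption; the column name `date` and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data in day/month/year form.
prices_df = pd.DataFrame({"date": ["01/01/2001", "02/01/2001", "05/02/2001"]})

# Parse with an explicit day-first format.
parsed = pd.to_datetime(prices_df["date"], format="%d/%m/%Y")

# Days elapsed since the earliest date, counted from 1.
prices_df["day"] = (parsed - parsed.min()).dt.days + 1
print(prices_df)
```

Unlike slicing the first two characters, this keeps counting across month boundaries.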

How to convert all data in column to datetime - pandas

I have a large dataframe that, in its date column, has a mixture of date formats (only 2).
Most are in the correct format but there is some data that is in a different format.
i.e. most are 2013-11-07. Some are 20170510. Pandas throws an exception when I try to validate the data against a schema I have.
Is there a quick way to convert all dates to have the same format as the majority? Or do I have to do something more painful/manual?
i.e.
date \
0 2013-11-07 False
2 2013-11-07 False
... ... ... ... ... ...
3595037 20170510 NaN
3595038 20200701 NaN

Is there a quick way to convert all dates to have the same format as the majority?
Considering that you have only two formats, one represented by 2013-11-07 and the other by 20170510, it is enough to remove the - from the first to get a common format, i.e.
import pandas as pd
df = pd.DataFrame({'day':['2013-11-07','20170510']})
df['day'] = df['day'].str.replace('-','')
print(df)
output
day
0 20131107
1 20170510
pandas.to_datetime does understand it correctly
df['day'] = pd.to_datetime(df['day'])
print(df)
output
day
0 2013-11-07
1 2017-05-10
Disclaimer: I converted to the format of the minority, not the majority. It is possible to convert to the format of the majority using a regular expression; however, if you are interested in datetime objects, this is an unnecessary complication.
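An alternative sketch that leaves the original strings untouched: parse the column once per format with errors='coerce' and combine the two results. The sample data is made up, and this assumes the column contains only these two formats.

```python
import pandas as pd

df = pd.DataFrame({"day": ["2013-11-07", "20170510", "2013-11-08"]})

# Values that fail to parse become NaT instead of raising.
dashed = pd.to_datetime(df["day"], format="%Y-%m-%d", errors="coerce")
compact = pd.to_datetime(df["day"], format="%Y%m%d", errors="coerce")

# Prefer the dashed result and fall back to the compact one.
df["day"] = dashed.fillna(compact)
print(df)
```

Any value that matches neither format ends up as NaT, which makes leftover bad rows easy to find with `df["day"].isna()`.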

How to convert string/float to time in python/pandas?

I have a dataset which stores durations like 3 hours and 7 minutes as the string 3.11.
I want to convert the column containing these values into datetime in a way that I get: 03:07.
When I do:
df["ConnectedDuration"] = pd.to_datetime(df['ConnectedDuration'])
I get: 1970-01-01 00:00:00.000000003 which is obviously not what I want.
When I do:
df["ConnectedDuration"] = pd.to_datetime(df['ConnectedDuration'], format='%H:%M')
I get the following error: ValueError: time data '3' does not match format '%H:%M' (match)
Any help is highly appreciated

You want to convert these values to timedelta instead of datetime. Thus you should use the pd.to_timedelta function, like:
pd.to_timedelta(df["ConnectedDuration"].astype('float'), unit='h')
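To get from the timedelta to the 03:07 display the question asks for, one option is to round to whole minutes and format the result by hand. This is a sketch, assuming the strings are decimal hours (so 3.11 is roughly 3 hours 7 minutes); the `hhmm` column name and the sample values are made up.

```python
import pandas as pd

df = pd.DataFrame({"ConnectedDuration": ["3.11", "0.5", "12.25"]})

# Interpret the strings as fractional hours.
td = pd.to_timedelta(df["ConnectedDuration"].astype(float), unit="h")

# Round to whole minutes, then split into hours and minutes.
total_minutes = (td.dt.total_seconds() / 60).round().astype(int)
hours = (total_minutes // 60).astype(str).str.zfill(2)
minutes = (total_minutes % 60).astype(str).str.zfill(2)
df["hhmm"] = hours + ":" + minutes
print(df)
```

The `hhmm` column holds plain strings; keep the timedelta series around if the durations are needed for arithmetic.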

Filtering out improperly formatted datetime values in Python DataFrame

I have a DataFrame with one column storing the date.
However, some of these dates are properly formatted datetime objects like '2018-12-24 17:00:00' while others are not and are stored like '20181225'.
When I tried to plot these using plotly, the improperly formatted values got turned into EPOCH dates, which is a problem.
Is there any way I can get a copy of the DataFrame with only those rows with properly formatted dates?
I tried using
clean_dict= dailySum_df.where(dailySum_df[isinstance(dailySum_df['time'],datetime.datetime)])
but it doesn't work due to the 'Array conditional must be same shape as self' error.
dailySum_df = pd.DataFrame(list(cursors['dailySum']))
trace = go.Scatter(
x=dailySum_df['time'],
y=dailySum_df['countMessageIn']
)
data = [trace]
py.plot(data, filename='basic-line')

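Apply dateutil.parser to each value:
import pandas as pd
import dateutil.parser as dparser
def myparser(x):
    # Return a datetime, or None when the value cannot be parsed as a date.
    try:
        return dparser.parse(x)
    except (ValueError, OverflowError, TypeError):
        return None
df = pd.DataFrame({'time': ['2018-12-24 17:00:00', '20181225', 'no date at all'], 'countMessageIn': [1, 2, 3]})
df['time'] = df['time'].apply(myparser)
# Keep only the rows where a date was recognized.
df = df[df['time'].notnull()]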
Input:
time countMessageIn
0 2018-12-24 17:00:00 1
1 20181225 2
2 no date at all 3
Output:
time countMessageIn
0 2018-12-24 17:00:00 1
1 2018-12-25 00:00:00 2
Unlike Gustavo's solution this can handle rows with no recognizable date at all and it filters out such rows as required by your question.
If your original time column may contain other text besides the dates themselves, include the fuzzy=True parameter as shown here.
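If the goal is to keep only the rows that are already in the full 'YYYY-MM-DD HH:MM:SS' form, rather than repair the short ones, a pattern match on the raw strings is one possible approach. This is a sketch, assuming the column holds strings; the name `clean` and the regular expression are illustrative choices, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "time": ["2018-12-24 17:00:00", "20181225", "no date at all"],
    "countMessageIn": [1, 2, 3],
})

# True only for strings shaped exactly like YYYY-MM-DD HH:MM:SS.
mask = df["time"].str.fullmatch(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

# Copy the matching rows and convert them to real datetime values.
clean = df[mask].copy()
clean["time"] = pd.to_datetime(clean["time"], format="%Y-%m-%d %H:%M:%S")
print(clean)
```

This drops '20181225' as well as the unparseable row, which is stricter than the dateutil approach.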

Try parsing the dates column of your dataframe using dateutil.parser.parse and Pandas apply function.

how to change dd-mm-yyyy date format to yyyy-dd-mm in pandas

How to change dd-mm-yyyy date format to yyyy-dd-mm in pandas. I have a datefield which is already in dd-mm-yyyy format but when I try
df[('date')] = pd.to_datetime(df[('date')]).dt.strftime('%Y-%m-%d')
it gives the output as yyyy-dd-mm

I believe this is what you needed.
import pandas as pd
df = pd.read_csv("dates.csv")
df
id date
0 1 25/06/2018
1 2 14-11-2005
2 3 03/10/2010
3 4 13-08-2008
4 5 05-05-2005
There is no need to specify the format as you tried.
df['date'] = pd.to_datetime(df['date'])
df
id date
0 1 2018-06-25
1 2 2005-11-14
2 3 2010-03-10
3 4 2008-08-13
4 5 2005-05-05
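Note that in the output above 03/10/2010 came out as 2010-03-10, i.e. it was read month-first, which is the behaviour the question complains about. A sketch that avoids the guesswork, assuming every value is day-first and only the separator varies: normalize the separator, then parse with an explicit format.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "date": ["25/06/2018", "14-11-2005", "03/10/2010", "13-08-2008", "05-05-2005"],
})

# Use one separator everywhere so a single format string applies.
normalized = df["date"].str.replace("/", "-", regex=False)

# Explicit day-month-year format; nothing is inferred.
df["date"] = pd.to_datetime(normalized, format="%d-%m-%Y")
print(df)
```

With the explicit format, 03/10/2010 is read as 3 October 2010.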

Pandas datetime series data do not have an inherent string format.
datetime values are stored internally as integers. For more details, see this answer. String representations are just that, representations. For example, when you use the print command, a specific string representation is used so that data is displayed in a human-readable way.
For most purposes, you should not worry about the representation. If you need a format different from the default representation, i.e. "YYYY-MM-DD", you can use pd.Series.dt.strftime and specify a string format. For this, Python's strftime directives are a useful resource.
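A small sketch of the distinction between stored values and their string representation, using made-up sample dates:

```python
import pandas as pd

s = pd.to_datetime(pd.Series(["2018-06-25", "2005-11-14"]), format="%Y-%m-%d")

# The series holds datetime values, not text.
print(s.dtype)

# strftime produces strings in whatever layout is requested.
formatted = s.dt.strftime("%d/%m/%Y")
print(formatted.tolist())
```

The result of `strftime` is a series of strings, so date arithmetic and `.dt` accessors are only available on the original `s`.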

Use this:
import pandas as pd
# Specify the input format '%d-%m-%Y' and the output format '%Y-%m-%d'; change the output as desired, e.g. '%d/%m/%Y' gives dd/mm/yyyy.
df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y').dt.strftime('%Y-%m-%d')
