Currently, I have a series of Datetime values that display like this:
0 Datetime
1 20041001
2 20041002
3 20041003
4 20041004
They are in a series named
df['Datetime']
They were originally something like
20041001ABCDEF
But I split the end off just to leave them with the remaining numbers. How do I go about putting them into the following format?
2004-10-01
You can do the following:
df['Datetime'] = pd.to_datetime(df['Datetime'], format='%Y%m%d')
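As a minimal sketch of the full round trip, assuming the raw values look like 20041001ABCDEF as described in the question (the sample data and the name raw are made up for illustration), the first eight characters can be sliced off and parsed with an explicit format:

```python
import pandas as pd

# Hypothetical sample data shaped like the values in the question.
raw = pd.Series(["20041001ABCDEF", "20041002ABCDEF", "20041003ABCDEF"])
df = pd.DataFrame({"Datetime": raw.str[:8]})

# Parse the YYYYMMDD strings into real datetime values.
df["Datetime"] = pd.to_datetime(df["Datetime"], format="%Y%m%d")

# strftime turns the datetimes back into text in the requested layout.
as_text = df["Datetime"].dt.strftime("%Y-%m-%d").tolist()
print(as_text)
```

Keeping the column as datetime and only formatting it when text is needed leaves date arithmetic and sorting available.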
Related
I would like to convert dates to days.
For example, 01/01/2001 should be Day 1 and 02/01/2001 should be Day 2.
I have tried
prices_df['01/01/2001'] = prices_df['days'].dt.days
Without using any external function or a complicated solution, you can simply take the first 2 chars of the string.
int('01/01/2001'[0:2])
Output:
1
If you have to do this in a pandas column:
import pandas as pd
pd.to_numeric(df['days'].str[0:2])
N.B. This works only if all dates are in the form day/month/year, and it returns the day of the month, so the count restarts at 1 every month.
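If "Day N" is meant as a running count from the first date rather than the day of the month, a possible sketch is to parse the strings and subtract the earliest date. The sample data and the column name day_number are assumptions for illustration, and the dates are assumed to be day/month/year:

```python
import pandas as pd

# Hypothetical sample data; dates are assumed to be day/month/year.
prices_df = pd.DataFrame({"days": ["01/01/2001", "02/01/2001", "01/02/2001"]})

dates = pd.to_datetime(prices_df["days"], format="%d/%m/%Y")

# Whole days elapsed since the earliest date, shifted so that date is Day 1.
prices_df["day_number"] = (dates - dates.min()).dt.days + 1
day_numbers = prices_df["day_number"].tolist()
print(day_numbers)
```

Here 1 February 2001 becomes Day 32, whereas slicing the first two characters would give 1 again.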
I have a large dataframe that, in its date column, has a mixture of date formats (only 2).
Most are in the correct format but there is some data that is in a different format.
i.e. most are 2013-11-07. Some are 20170510. Pandas throws an exception when I try to validate the data against a schema I have.
Is there a quick way to convert all dates to have the same format as the majority? Or do I have to do something more painful/manual?
i.e.
date \
0 2013-11-07 False
2 2013-11-07 False
... ... ... ... ... ...
3595037 20170510 NaN
3595038 20200701 NaN
Is there a quick way to convert all dates to have the same format as the majority?
Considering that you have only two formats, one represented by 2013-11-07 and another by 20170510, it is enough to remove the hyphens from the first to get a common format, e.g.:
import pandas as pd
df = pd.DataFrame({'day':['2013-11-07','20170510']})
df['day'] = df['day'].str.replace('-','')
print(df)
output
day
0 20131107
1 20170510
pandas.to_datetime does understand it correctly
df['day'] = pd.to_datetime(df['day'])
print(df)
output
day
0 2013-11-07
1 2017-05-10
Disclaimer: I converted to the format of the minority, not the majority. It is possible to convert to the majority format using a regular expression; however, if you are interested in datetime objects, this is an unnecessary complication.
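If the majority layout is still wanted as text, one possible sketch is to parse with an explicit format and then format the result with strftime, with no regular expression involved. The extra sample row and the name majority_text are assumptions for illustration:

```python
import pandas as pd

# Hypothetical sample data containing both layouts from the question.
df = pd.DataFrame({"day": ["2013-11-07", "20170510", "20200701"]})

# Remove hyphens literally (regex=False) so every value is YYYYMMDD.
compact = df["day"].str.replace("-", "", regex=False)

# An explicit format avoids relying on pandas guessing the layout.
df["day"] = pd.to_datetime(compact, format="%Y%m%d")

# Optional: text in the majority layout, e.g. for a string-typed schema.
majority_text = df["day"].dt.strftime("%Y-%m-%d").tolist()
print(majority_text)
```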
I have a dataset which stores durations like 3 hours and 7 minutes as a string, in the format 3.11.
I want to convert the column containing these values into datetime in a way that I get: 03:07.
When I do:
df["ConnectedDuration"] = pd.to_datetime(df['ConnectedDuration'])
I get: 1970-01-01 00:00:00.000000003 which is obviously not what I want.
When I do:
df["ConnectedDuration"] = pd.to_datetime(df['ConnectedDuration'], format='%H:%M')
I get the following error: ValueError: time data '3' does not match format '%H:%M' (match)
Any help is highly appreciated
You want to convert these values to timedelta instead of datetime. Thus you should use the pd.to_timedelta function, like:
pd.to_timedelta(df["ConnectedDuration"].astype('float'), unit='h')
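To get from the timedelta to the 03:07 display asked for in the question, a possible sketch is to round to the nearest minute and format the hours and minutes by hand. The sample values and the names total_minutes and hh_mm are assumptions for illustration, and the values are assumed to be decimal hours:

```python
import pandas as pd

# Hypothetical sample data: decimal hours stored as strings.
df = pd.DataFrame({"ConnectedDuration": ["3.11", "0.5", "12.25"]})

durations = pd.to_timedelta(df["ConnectedDuration"].astype(float), unit="h")

# Round to the nearest minute, then count whole minutes.
rounded = durations.dt.round("min")
total_minutes = (rounded.dt.total_seconds() // 60).astype(int).tolist()

# Format as zero-padded HH:MM text.
hh_mm = [f"{m // 60:02d}:{m % 60:02d}" for m in total_minutes]
print(hh_mm)
```

3.11 hours is 3 hours and 6.6 minutes, which rounds to 03:07; truncating instead of rounding would give 03:06.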
I have a DataFrame with one column storing the date.
However, some of these dates are properly formatted datetime objects like '2018-12-24 17:00:00' while others are not and are stored like '20181225'.
When I tried to plot these using plotly, the improperly formatted values got turned into EPOCH dates, which is a problem.
Is there any way I can get a copy of the DataFrame with only those rows with properly formatted dates?
I tried using
clean_dict= dailySum_df.where(dailySum_df[isinstance(dailySum_df['time'],datetime.datetime)])
but it doesn't work because of the 'Array conditional must be same shape as self' error.
dailySum_df = pd.DataFrame(list(cursors['dailySum']))
trace = go.Scatter(
x=dailySum_df['time'],
y=dailySum_df['countMessageIn']
)
data = [trace]
py.plot(data, filename='basic-line')
Apply dateutil.parser, see also my answer here:
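import pandas as pd
import dateutil.parser as dparser

def myparser(x):
    # Return a datetime if x can be parsed as a date, otherwise None.
    try:
        return dparser.parse(x)
    except (ValueError, OverflowError, TypeError):
        return None

df = pd.DataFrame({'time': ['2018-12-24 17:00:00', '20181225', 'no date at all'], 'countMessageIn': [1, 2, 3]})
df.time = df.time.apply(myparser)
# Keep only the rows where parsing succeeded.
df = df[df.time.notnull()]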
Input:
time countMessageIn
0 2018-12-24 17:00:00 1
1 20181225 2
2 no date at all 3
Output:
time countMessageIn
0 2018-12-24 17:00:00 1
1 2018-12-25 00:00:00 2
Unlike Gustavo's solution this can handle rows with no recognizable date at all and it filters out such rows as required by your question.
If your original time column may contain other text besides the dates themselves, include the fuzzy=True parameter as shown here.
Try parsing the dates column of your dataframe using dateutil.parser.parse and Pandas apply function.
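If the goal is only to keep the rows that already use the 'YYYY-MM-DD HH:MM:SS' layout, rather than to repair the others, a pandas-only sketch is to parse with an explicit format and errors="coerce", then drop what failed. This assumes the column holds strings; the name clean_df is made up for illustration:

```python
import pandas as pd

# Sample data mirroring the example above.
df = pd.DataFrame({
    "time": ["2018-12-24 17:00:00", "20181225", "no date at all"],
    "countMessageIn": [1, 2, 3],
})

# Values that do not match the format become NaT instead of raising.
parsed = pd.to_datetime(df["time"], format="%Y-%m-%d %H:%M:%S", errors="coerce")

# Keep only rows whose time parsed successfully.
clean_df = df.assign(time=parsed).dropna(subset=["time"])
print(clean_df)
```

Unlike the dateutil approach, this discards '20181225' as well, so it suits the case where only one layout should be trusted.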
How do I change a dd-mm-yyyy date format to yyyy-mm-dd in pandas? I have a date field which is already in dd-mm-yyyy format, but when I try
df[('date')] = pd.to_datetime(df[('date')]).dt.strftime('%Y-%m-%d')
it gives the output as yyyy-dd-mm.
I believe this is what you needed.
import pandas as pd
df = pd.read_csv("dates.csv")
df
id date
0 1 25/06/2018
1 2 14-11-2005
2 3 03/10/2010
3 4 13-08-2008
4 5 05-05-2005
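There is no need to specify a format here, as you tried to do. Be aware, though, that without dayfirst=True an ambiguous value such as 03/10/2010 is read month-first, as row 2 of the output shows.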
df['date'] =pd.to_datetime(df['date'])
df
id date
0 1 2018-06-25
1 2 2005-11-14
2 3 2010-03-10
3 4 2008-08-13
4 5 2005-05-05
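If the column is day-first throughout (an assumption about the data), a sketch that does not depend on how a given pandas version guesses formats is to normalise the separators and pass an explicit format, so that 03/10/2010 comes out as 3 October rather than 10 March. The DataFrame below is built inline instead of read from dates.csv, and the name result is made up for illustration:

```python
import pandas as pd

# Same values as the table above, built inline instead of read from CSV.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5],
    "date": ["25/06/2018", "14-11-2005", "03/10/2010", "13-08-2008", "05-05-2005"],
})

# Make the separator uniform, then parse as day-month-year explicitly.
normalized = df["date"].str.replace("/", "-", regex=False)
df["date"] = pd.to_datetime(normalized, format="%d-%m-%Y")

result = df["date"].dt.strftime("%Y-%m-%d").tolist()
print(result)
```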
Pandas datetime series data do not have an inherent string format.
Datetime values are stored internally as integers. For more details, see this answer. String representations are just that, representations. For example, when you use the print command, a specific string representation is used so that data is displayed in a human-readable way.
For most purposes, you should not worry about the representation. If you need a format different from the default representation, i.e. "YYYY-MM-DD", you can use pd.Series.dt.strftime and specify a string format. Python's strftime directives are a useful reference for this.
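A small sketch of that distinction, using made-up sample values: the parsed series has a datetime64 dtype regardless of how the input text was laid out, and strftime produces whatever text layout is asked for.

```python
import pandas as pd

# Hypothetical dd-mm-yyyy input strings.
s = pd.to_datetime(pd.Series(["25-06-2018", "03-10-2010"]), format="%d-%m-%Y")

# The dtype is a datetime64 type; no string format is attached to it.
print(s.dtype)

# The same values rendered as text in two different layouts.
iso_text = s.dt.strftime("%Y-%m-%d").tolist()
slash_text = s.dt.strftime("%d/%m/%Y").tolist()
print(iso_text, slash_text)
```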
Use this:
import pandas as pd
# Input format is '%d-%m-%Y'; output format is '%Y-%m-%d'.
# Change the output as desired, e.g. '%d/%m/%Y' gives dd/mm/yyyy.
df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y').dt.strftime('%Y-%m-%d')