Pandas parsing dates when reading CSV file [duplicate]


This question already has answers here:
Can pandas automatically read dates from a CSV file?
(13 answers)
Closed 3 years ago.
I have a CSV file that contains a date column. The dates in this file have the format 'dd.mm.yy'. When pandas parses them, it reads the day as the month whenever the day is 12 or less, so 05.01.05 becomes 01/05/2005.
How can I solve this issue?
Regards

This is one way to solve it, using pandas.to_datetime with the argument dayfirst=True. However, I've had to make assumptions about the format of your data since you haven't shared any code. In the example below, the date column originally has dtype object.
import pandas as pd
df = pd.DataFrame({
'date': ['01.02.20', '25.12.19', '10.03.18'],
})
# Assign back to the column instead of replacing the whole DataFrame with a Series
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
df['date']
0 2020-02-01
1 2019-12-25
2 2018-03-10
Name: date, dtype: datetime64[ns]
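For the reading step itself, here is a minimal sketch. It assumes the column is called date and uses an inline CSV string (hypothetical sample data) in place of the real file. read_csv accepts the same dayfirst flag together with parse_dates. Alternatively, the column can be read as text and parsed with an explicit '%d.%m.%y' format, which removes the day/month ambiguity entirely.

```python
import io

import pandas as pd

# Inline stand-in for the real file (hypothetical sample data)
csv_text = "date,value\n25.12.19,1\n05.01.05,2\n10.03.18,3\n"

# Option 1: let read_csv parse the column, reading the first field as the day
df1 = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"], dayfirst=True)

# Option 2: read the column as text, then parse it with an explicit dd.mm.yy format
df2 = pd.read_csv(io.StringIO(csv_text), dtype={"date": str})
df2["date"] = pd.to_datetime(df2["date"], format="%d.%m.%y")

print(df1["date"].tolist())
print(df2["date"].tolist())
```

The explicit format is the stricter choice: a value that does not match dd.mm.yy raises an error instead of being silently guessed.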

Related

How to keep date format the same in pandas? [duplicate]

This question already has answers here:
How to change the datetime format in Pandas
(8 answers)
Closed 1 year ago.
import pandas as pd
import sys
df = pd.read_csv(sys.stdin, sep='\t', parse_dates=['Date'], index_col=0)
df.to_csv(sys.stdout, sep='\t')
Date Open
2020/06/15 182.809924
2021/06/14 257.899994
I got the following output with the input shown above.
Date Open
2020-06-15 182.809924
2021-06-14 257.899994
The date format has changed. Is there a way to preserve the date format automatically? (For example, if the input is in YYYY/MM/DD format, the output should be in YYYY/MM/DD; if the input is in YYYY-MM-DD, the output should be in YYYY-MM-DD, etc.)
I would prefer not to have to check the date format manually. Ideally, there would be an automatic way to preserve the date format, whatever it is.

You can specify the date_format argument in to_csv:
df.to_csv(sys.stdout, sep='\t', date_format="%Y/%m/%d")
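Here is a small round-trip check of that answer. It assumes the tab-separated input from the question and replaces sys.stdin/sys.stdout with in-memory buffers. to_csv applies date_format to a datetime index as well, so the dates come back in the slash format. The format string is still chosen by hand, though. This does not detect the input format automatically.

```python
import io

import pandas as pd

# Same data as the question's input, held in memory instead of stdin
tsv_in = "Date\tOpen\n2020/06/15\t182.809924\n2021/06/14\t257.899994\n"
df = pd.read_csv(io.StringIO(tsv_in), sep="\t", parse_dates=["Date"], index_col=0)

# date_format controls how datetime values (including the index) are written
out = io.StringIO()
df.to_csv(out, sep="\t", date_format="%Y/%m/%d")
print(out.getvalue())
```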

Keep the dates as strings, and parse them into an extra column if you need to operate on them as dates.
df = pd.read_csv(sys.stdin, sep='\t', index_col=0)
# With index_col=0, Date is the index, so parse df.index rather than df['Date']
df['DateParsed'] = pd.to_datetime(df.index)
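The sketch below shows this "keep the text, parse a copy" approach end to end. It uses the question's sample data in memory and a hypothetical filter date. Because parse_dates is not used, the Date index keeps its original text whatever the format is. The helper column is dropped again before writing.

```python
import io

import pandas as pd

tsv_in = "Date\tOpen\n2020/06/15\t182.809924\n2021/06/14\t257.899994\n"

# No parse_dates: the Date index stays as the original text, whatever its format
df = pd.read_csv(io.StringIO(tsv_in), sep="\t", index_col=0)

# Parsed copy used only for date logic (the cutoff date is a hypothetical example)
df["DateParsed"] = pd.to_datetime(df.index)
recent = df[df["DateParsed"] >= "2021-01-01"]

# Drop the helper column before writing so the output keeps the input layout
out = io.StringIO()
df.drop(columns="DateParsed").to_csv(out, sep="\t")
print(out.getvalue())
```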

Pandas data type catch errors while converting without using Try Except [duplicate]

This question already has answers here:
How to check for wrong datetime entries (python/pandas)?
(2 answers)
How do I convert strings in a Pandas data frame to a 'date' data type?
(10 answers)
Closed 2 years ago.
I am trying to change the data type of a column in a pandas DataFrame from object to date. I cannot check the data type with dtypes because both the string (text) and date values are of object type, and I shouldn't use try/except. Can I find out whether the selected column contains string values without using try/except?

pandas' to_datetime() has an errors argument. You can set it to 'coerce', for instance, to turn bad dates into NaT.
import pandas as pd
df = pd.DataFrame({'t': ['20200101', '2020-01-01', 'foobar', '2020-01-01T12:17:00.333']})
# pandas >= 2.0: format='mixed' parses each value separately (omit it on 1.x)
pd.to_datetime(df['t'], errors='coerce', format='mixed')
# out:
0 2020-01-01 00:00:00.000
1 2020-01-01 00:00:00.000
2 NaT
3 2020-01-01 12:17:00.333
Name: t, dtype: datetime64[ns]

Try this to convert an object column to datetime:
df[col] = pd.to_datetime(df[col], errors='coerce')
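To answer the "does this column contain non-date strings?" part without try/except, here is a minimal sketch. It coerces first, then flags values that were present in the input but came back as NaT. The column name t, the sample values, and the single '%Y-%m-%d' format are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"t": ["2020-01-01", "foobar", "2021-03-15", None]})

# Invalid entries become NaT instead of raising, so no try/except is needed
parsed = pd.to_datetime(df["t"], format="%Y-%m-%d", errors="coerce")

# A value is "not a date" if it was present in the input but parsed to NaT
not_a_date = parsed.isna() & df["t"].notna()

print(df.loc[not_a_date, "t"].tolist())
print(not_a_date.any())
```

Missing values are excluded on purpose, so that an empty cell is not reported as a bad string.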

How do I sort a pandas dataframe by a datetime column? [duplicate]

This question already has answers here:
how to sort pandas dataframe from one column
(13 answers)
Closed 2 years ago.
I have a column in my CSV file that I want to sort by datetime. Its values look like 2020-10-06 03:28:00. I tried the following, but nothing seems to have happened:
df = pd.read_csv('data.csv')
df = df.sort_index()
df.to_csv('btc.csv', index= False)
I need index=False in .to_csv so that the file is formatted correctly for later use, so I can't remove it, even if that is what causes the issue. The dtime column is the first column in the CSV file and the second column is a Unix timestamp, so I could sort by that instead if it works better.

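Use DataFrame.sort_values(by=column_name) to sort a pandas DataFrame by the contents of a column named column_name. df.sort_index() changes nothing here because it sorts by the index, which after read_csv is just the default 0, 1, 2, ... row numbering. If the column is stored in another format (e.g. as strings), convert it to datetime first with pandas.to_datetime(arg), where arg is the column of dates.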
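Here is a minimal sketch of that fix. It assumes the first column is named dtime and the second timestamp (hypothetical names), with inline sample rows in place of data.csv. The Unix timestamps are assumed to be UTC, so sorting by either column gives the same order. index=False is kept as the question requires.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv (timestamps are the UTC equivalents)
csv_text = (
    "dtime,timestamp\n"
    "2020-10-06 03:28:00,1601954880\n"
    "2020-10-05 21:10:00,1601932200\n"
    "2020-10-06 01:00:00,1601946000\n"
)

df = pd.read_csv(io.StringIO(csv_text))
df["dtime"] = pd.to_datetime(df["dtime"])

# Sort by the column's values, not by the row index
df = df.sort_values(by="dtime")  # by="timestamp" gives the same order here

out = io.StringIO()
df.to_csv(out, index=False)
print(out.getvalue())
```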

Find months between dates pandas [duplicate]

This question already has an answer here:
Create date range list with pandas
(1 answer)
Closed 2 years ago.
I have a large DataFrame with two columns, start_date and finish_date, holding dates as strings, e.g. "2018-06-01".
I want to create a third column with the list of months between the two dates.
So, if start_date is "2018-06-01" and finish_date is "2018-08-01", I expect ["2018-06-01", "2018-07-01", "2018-08-01"] in the third column. The day doesn't matter to me, so it can be dropped.
I found many ways to do this for plain strings, but none for a pandas DataFrame.

Pandas has a function called apply which allows you to apply logic to every row of a dataframe.
We can use dateutil to get all months between the start and end date, then apply the logic to every row of your dataframe as a new column.
import pandas as pd
import datetime
from dateutil.rrule import rrule, MONTHLY
#Dataframe creation, this is just for the example, use the one you already have created.
data = {'start': datetime.datetime.strptime("10-10-2020", "%d-%m-%Y"), 'end': datetime.datetime.strptime("10-12-2020", "%d-%m-%Y")}
df = pd.DataFrame(data, index=[0])
#df
# start end
#0 2020-10-10 2020-12-10
# Find all months between the start and end date for every row. The result is a list.
# dtstart is moved to the 1st so rrule does not skip short months or a final month whose end day is before the start day
df['months'] = df.apply(lambda x: [date.strftime("%m/%Y") for date in rrule(MONTHLY, dtstart=x.start.replace(day=1), until=x.end)], axis=1)
#df
# start end months
#0 2020-10-10 2020-12-10 [10/2020, 11/2020, 12/2020]
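Here is an alternative sketch that stays inside pandas and works directly on the string columns from the question. It uses pd.period_range with monthly frequency, so the day of the month never matters. Then it converts each period to its first day, formatted as "YYYY-MM-01" as the question asked. The sample rows and the helper name months_between are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "start_date": ["2018-06-01", "2018-11-15"],
    "finish_date": ["2018-08-01", "2019-01-03"],
})


def months_between(start, finish):
    # Monthly periods from start to finish, both ends inclusive
    periods = pd.period_range(start=start, end=finish, freq="M")
    # First day of each month, formatted as a string
    return periods.to_timestamp().strftime("%Y-%m-%d").tolist()


df["months"] = [
    months_between(s, f) for s, f in zip(df["start_date"], df["finish_date"])
]
print(df)
```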

Modify csv data in pandas dataframe [duplicate]

This question already has answers here:
How to change the datetime format in Pandas
(8 answers)
Closed 3 years ago.
I have a CSV file and want to select one specific column (a date string). Then I want to change the format of every date string in it from yyyymmdd to dd.mm.yyyy.
I read the CSV file into a DataFrame with pandas and then saved the column with the header DATE to a variable.
import pandas as pd
# read csv file
df = pd.read_csv('csv_file')
# save specific column
df_date_col = df['DATE']
Now I want to change the values in df_date_col. How can I do this?
I know I can do it one step earlier, like this:
df['DATE'] = modify(df['DATE'])
Is this possible just using the variable df_date_col?
If I try df_date_col['DATE'] = ..., I get a KeyError.

Use to_datetime with Series.dt.strftime:
df['DATE'] = pd.to_datetime(df['DATE'], format='%Y%m%d').dt.strftime('%d.%m.%Y')
Is this possible just using the variable df_date_col?
Sure, but df_date_col is a Series, so you cannot select from it by column name with [] again:
df_date_col = df['DATE']
df_date_col = pd.to_datetime(df_date_col, format='%Y%m%d').dt.strftime('%d.%m.%Y')
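One detail worth checking: rebinding df_date_col creates a new Series and leaves df unchanged, so the result has to be written back if the DataFrame should change. Here is a minimal sketch, using inline sample data (hypothetical) and reading DATE as text so the yyyymmdd values are not turned into integers.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file
csv_text = "DATE,VALUE\n20200131,1\n20191225,2\n"
df = pd.read_csv(io.StringIO(csv_text), dtype={"DATE": str})

df_date_col = df["DATE"]
df_date_col = pd.to_datetime(df_date_col, format="%Y%m%d").dt.strftime("%d.%m.%Y")

# df still holds the old strings at this point; assign the new Series back
df["DATE"] = df_date_col
print(df)
```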
