Converting Date Format in a Dataframe from a CSV File [duplicate]

This question already has answers here:
Convert DataFrame column type from string to datetime
(6 answers)
Convert Pandas Column to DateTime
(8 answers)
Closed 1 year ago.
I need to convert the dates in my csv file into a proper pandas datetime format so I can sort by them later. pandas cannot work with the current format in any reasonable way, so it has to be converted.
This is what my csv file looks like:
ARTIST,ALBUM,TRACK,DATE
ARTIST1,ALBUM1,TRACK1,23 Nov 2019 02:08
ARTIST1,ALBUM1,TRACK1,23 Nov 2019 02:11
ARTIST1,ALBUM1,TRACK1,23 Nov 2019 02:15
So far I've successfully converted it into pandas format by doing this:
df= pd.read_csv("mycsv.csv", delimiter=',')
convertdate= pd.to_datetime(df["DATE"])
print convertdate
####
#Original date format: 23 Nov 2019 02:08
#Output and desired date format: 2019-11-23 02:08:00
However, that only produces a converted copy of the "DATE" column. Printing the dataframe of the csv file still outputs the original, non-converted date format. I need to write the converted format back into the source csv file.
My desired output would then be
ARTIST,ALBUM,TRACK,DATE
ARTIST1,ALBUM1,TRACK1,2019-11-23 02:08:00
ARTIST1,ALBUM1,TRACK1,2019-11-23 02:11:00
ARTIST1,ALBUM1,TRACK1,2019-11-23 02:15:00

There are many options to the read_csv method.
Make sure to read the data in the format you want instead of fixing it later.
df = pd.read_csv('mycsv.csv', parse_dates=['DATE'])
Just pass in to the parse_dates argument the column names you want transformed.
There were 2 problems in the original code.
The converted column wasn't part of the original dataframe because you didn't assign it back to the column after converting it.
so instead of:
convertdate= pd.to_datetime(df["DATE"])
use:
df["DATE"]= pd.to_datetime(df["DATE"])
and for goodness' sake, stop using Python 2 (print convertdate is Python 2 syntax; in Python 3 it is print(convertdate)).
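Putting the fix together, here is a minimal sketch of the whole round trip: read, convert in place, sort, and write back. It assumes the dates always look like 23 Nov 2019 02:08; io.StringIO stands in for mycsv.csv so the snippet runs on its own, and the names raw, out and csv_text are made up for illustration.

```python
import io
import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for mycsv.csv.
raw = io.StringIO(
    "ARTIST,ALBUM,TRACK,DATE\n"
    "ARTIST1,ALBUM1,TRACK1,23 Nov 2019 02:08\n"
    "ARTIST1,ALBUM1,TRACK1,23 Nov 2019 02:11\n"
    "ARTIST1,ALBUM1,TRACK1,23 Nov 2019 02:15\n"
)
df = pd.read_csv(raw)

# Assign the parsed values back to the column so the dataframe itself changes.
# The explicit format is an assumption based on the sample rows.
df["DATE"] = pd.to_datetime(df["DATE"], format="%d %b %Y %H:%M")

# Once the column is datetime64, sorting is chronological.
df = df.sort_values("DATE")

# Write the result out; date_format controls how the datetimes are rendered.
out = io.StringIO()
df.to_csv(out, index=False, date_format="%Y-%m-%d %H:%M:%S")
csv_text = out.getvalue()
print(csv_text)
```

With a real file, replace raw with the input path and out with the output path.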

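from datetime import datetime
# pd.datetime is deprecated and gone from recent pandas; the format matches '23 Nov 2019 02:08'
dateparse = lambda x: datetime.strptime(x, '%d %b %Y %H:%M')
df = pd.read_csv('mycsv.csv', parse_dates=['DATE'], date_parser=dateparse)  # date_parser is deprecated since pandas 2.0 in favour of date_format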

Related

Python Pandas - Splitting a column

I am trying to split a column from a CSV file. The first column contains a date (YYYYmmdd) followed by a time (HHMMSS), so the value looks like 20221001131245. I want to split this so it reads 2022 10 01 in one column and 13:12:45 in another.
I have tried str.split, but I realise my data isn't stored as strings, so this isn't working.
Here is my code so far:
import pandas as pd
CSVPath = "/Desktop/Test Data.csv"
data = pd.read_csv(CSVPath)
print(data)

To answer the question from your comment:
You can use df.drop(['COLUMN_1', 'COLUMN_2'], axis=1) to drop unwanted columns.
I am guessing you want to write the data back to a .csv file? Use the following snippet to only write specific columns:
df[['COLUMN_1', 'COLUMN_2']].to_csv("/Desktop/Test Data Edited.csv")

Use to_datetime combined with strftime:
# convert to datetime
s = pd.to_datetime(df['col'], format='%Y%m%d%H%M%S')
# or if integer as input
# s = pd.to_datetime(df['col'].astype(str), format='%Y%m%d%H%M%S')
# format strings
df['date'] = s.dt.strftime('%Y %m %d')
df['time'] = s.dt.strftime('%H:%M:%S')
Output:
              col        date      time
0  20221001131245  2022 10 01  13:12:45
alternative
using string slicing and concatenation
s = df['col'].str
df['date'] = s[:4]+' '+s[4:6]+' '+s[6:8]
df['time'] = s[8:10]+':'+s[10:12]+':'+s[12:]
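Since the question mentions that the data is not a string, here is a small self-contained sketch of both approaches on an integer column. The one-row frame and the date_sliced and time_sliced column names are hypothetical; the key step is astype(str) before parsing or slicing.

```python
import pandas as pd

# Hypothetical one-row frame; read_csv would infer this column as integers.
df = pd.DataFrame({"col": [20221001131245]})

# Approach 1: parse to datetime, then format the two parts.
s = pd.to_datetime(df["col"].astype(str), format="%Y%m%d%H%M%S")
df["date"] = s.dt.strftime("%Y %m %d")
df["time"] = s.dt.strftime("%H:%M:%S")

# Approach 2: plain string slicing on the digits.
text = df["col"].astype(str).str
df["date_sliced"] = text[:4] + " " + text[4:6] + " " + text[6:8]
df["time_sliced"] = text[8:10] + ":" + text[10:12] + ":" + text[12:]

print(df)
```

The datetime route also validates the values (an impossible date raises an error), while slicing only rearranges characters.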

Keep the date and not the time in to_datetime pandas (while importing data from csv) [duplicate]

This question already has answers here:
Keep only date part when using pandas.to_datetime
(13 answers)
Closed last month.
Can you please help me with the following issue? When I import a csv file I get a dataframe that looks something like this:
df = pd.DataFrame(['29/12/17',
'30/12/17', '31/12/17', '01/01/18', '02/01/18'], columns=['Date'])
What I want is to convert the 'Date' column of df into datetime objects. So I use the code below:
df['date_f'] = pd.to_datetime(df['Date'])
What I get is something like this:
df1 = pd.DataFrame({'Date': ['29/12/17', '30/12/17', '31/12/17', '01/01/18', '02/01/18'],
'date_f':['2017-12-29T00:00:00.000Z', '2017-12-30T00:00:00.000Z', '2017-12-31T00:00:00.000Z', '2018-01-01T00:00:00.000Z', '2018-02-01T00:00:00.000Z']})
The question is, why am I getting date_f in the format '2017-12-29T00:00:00.000Z' and not just '2017-12-29', and how can I get the latter format ('2017-12-29')?
P.S.
If you use the code above, it will give date_f in the format that I need. However, if the data is imported from csv, date_f comes out in the format shown above.

use dt.date
df['date_f'] = pd.to_datetime(df['Date']).dt.date
or
df['date_f'] = pd.to_datetime(df['Date'], utc=False)
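Both cases print the same output (dt.date stores Python date objects, while the second keeps datetime64 values, which display without a time when every time is midnight):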
       Date      date_f
0  29/12/17  2017-12-29
1  30/12/17  2017-12-30
2  31/12/17  2017-12-31
3  01/01/18  2018-01-01
4  02/01/18  2018-02-01
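Note that the last row above reads 02/01/18 month-first (2018-02-01). If the column is day-first, as the other rows suggest, passing an explicit format removes the ambiguity. A minimal sketch, assuming the dates are dd/mm/yy; the date_str column name is made up for illustration:

```python
import datetime
import pandas as pd

df = pd.DataFrame(
    ["29/12/17", "30/12/17", "31/12/17", "01/01/18", "02/01/18"],
    columns=["Date"],
)

# Assumption: every value is day/month/two-digit year.
parsed = pd.to_datetime(df["Date"], format="%d/%m/%y")

# Python date objects (object dtype), no time component at all.
df["date_f"] = parsed.dt.date

# Plain strings, if only the textual form 'YYYY-MM-DD' is needed.
df["date_str"] = parsed.dt.strftime("%Y-%m-%d")

print(df)
```

Keeping the column as datetime64 and formatting only on output is usually more convenient for sorting and filtering; dt.date and dt.strftime both give up the datetime dtype.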

Date and Month formatting issue into Excel from DF

On Python 3.9 and Pandas 1.3.4.
So I'm trying to format 2 columns of my df and export it to excel. These 2 columns are date and month. Date is supposed to be formatted as %m/%d/%y and Month is supposed to be formatted as %B %Y.
When I do print(df['Date']) and print(df['Month']) it prints 01/04/22 and January 2022 respectively. However, when I do df.to_csv('file.csv') and open the file in Excel, it shows 1/4/2022 and Jan-22. I would like it to be formatted as 01/04/22 and January 2022 respectively. How can I solve this?
This is my current code:
import pandas as pd
df = pd.read_csv('file.csv', dtype=str)
df["Date"] = pd.Timestamp("today").strftime("%m/%d/%y")
df["Month"] = pd.Timestamp("today").strftime("%B %Y")
print(df["Date"])
print(df["Month"])
df.to_excel('file.xlsx', index=False)
NOTE: writing the file with to_excel instead of to_csv fixed this issue.
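A quick way to see where the reformatting happens is to look at the CSV text that pandas actually writes. In this sketch the date is fixed to 4 January 2022 so the output is reproducible, and the Activity column is a hypothetical placeholder for the real data:

```python
import io
import pandas as pd

# Fixed date (an assumption) instead of pd.Timestamp("today").
today = pd.Timestamp("2022-01-04")

df = pd.DataFrame({"Activity": ["a", "b"]})  # hypothetical placeholder data
df["Date"] = today.strftime("%m/%d/%y")
df["Month"] = today.strftime("%B %Y")

# Write to an in-memory buffer instead of a file to inspect the raw text.
buf = io.StringIO()
df.to_csv(buf, index=False)
csv_text = buf.getvalue()
print(csv_text)
```

The strings are written exactly as formatted. A CSV file carries no cell types, so Excel guesses that the text is a date when it opens the file and applies its own display format, whereas to_excel stores the values as text cells.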

Bad datetime conversion in pandas when a csv file is opened

I have a simple csv file with a Date column and an Activity column.
When I open it with pandas and try to convert the Date column with pd.to_datetime, the dates change wherever the month changes.
It seems that pandas swaps the day and the month, or something like that.
The date format that I want is dd-mm-yyyy or yyyy-mm-dd.
This is the code that I am using:
import pandas as pd
dataset = pd.read_csv(directory + "Time 2020 (Activities).csv", sep = ";")
dataset[["Date"]] = dataset[["Date"]].apply(pd.to_datetime)
How can I fix that?

You could specify the date format in the pd.to_datetime parameters, using the format string that matches how the dates are written in the file:
dataset['Date'] = pd.to_datetime(dataset['Date'], format='%Y-%m-%d')
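The format string has to describe the input, not the desired output. The sample file is not shown in the question, so the following sketch uses hypothetical day-first dates (dd-mm-yyyy) that cross a month boundary to illustrate the idea:

```python
import pandas as pd

# Hypothetical day-first dates spanning a change of month.
dates = pd.Series(["30-01-2020", "31-01-2020", "01-02-2020", "02-02-2020"])

# Describe the input layout explicitly so day and month cannot be swapped.
parsed = pd.to_datetime(dates, format="%d-%m-%Y")

# Render in yyyy-mm-dd for display; the parsed Series itself stays datetime64.
as_text = parsed.dt.strftime("%Y-%m-%d").tolist()
print(as_text)
```

If the file really contains yyyy-mm-dd strings, the format from the answer above applies unchanged.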

Data parsing in pandas, python

I have an Excel file with many columns. One of them, 'Column3', contains dates mixed with some text. It looks like this:
26/05/20
XXX
YYY
12/05/2020
The data is written in DD/MM/YY format, but pandas, just like Excel, thinks that 12/05/2020 is 05 Dec 2020 when it is really 12 May 2020. (My Windows is set to the American date format.)
Important note: when I open the original Excel file, the cells with 12/05/2020 are already of Date type; converting one to text gives me 44170, which gives the wrong date if I just reformat it as DD/MM/YY.
I added these lines of code:
import pandas as pd
dateparse = lambda x: pd.datetime.strptime(x,'%d/%m/%y')
df = pd.read_excel("my_file.xlsx", parse_dates=['Column3'], date_parser=dateparse)
But the text in the column generates an error.
ValueError: time data 'XXX' does not match format '%d/%m/%y'
I went a step further and manually removed all the text (obviously I can't do that every time) to see whether it works or not, but then I got the following error:
dateparse = lambda x: pd.datetime.strptime(x,'%d/%m/%y')
TypeError: strptime() argument 1 must be str, not datetime.datetime
I also tried this:
df['Column3'] = pd.to_datetime(df.Column3, format ='%d/%m/%y', errors="coerce")
# if I make errors="ignore" it doesn't change anything.
In that case my 26/05/20 was correctly converted to 26 May 2020, but I lost all my text data (that's ok) and the other dates that didn't match my format argument, because they had already been recognized as American-style dates.
My objective is to convert the data in Column3 to the same format so I can apply filters with pandas.
I think there are a couple of possible solutions:
tell pandas not to convert text to dates at all (but the values are already saved as Date type in the original file, so will it work?)
somehow ignore the text values and use date_parser= to convert all dates to DD/MM/YY
with the help of pd.to_datetime, convert 26/05/20 to 26 May 2020 and then convert 2020-09-06 00:00:00 to 9 June 2020 (this seems to be the simplest one, but the ignore argument doesn't work)
Here's link to small sample file https://easyupload.io/ca5p6w

You can pass a date_parser to read_excel:
dateparser = lambda x: pd.to_datetime(x, dayfirst=True)
pd.read_excel('test.xlsx', date_parser = dateparser)

Posting this as an answer, since it's too long for a comment
The problem originates in Excel. If I open the file in Excel, I see values that look like dates: 26/05/20, 05/12/2020 and 06/02/2020. Note the difference between the 20 and the 2020. On lines 24 and 48 I see dates in Column4. This seems to indicate that the Excel file was pieced together. Was it assembled by copy-paste, or programmatically?
loading it with just pd.read_excel gives these results for the dates:
26/05/20
2020-12-05 00:00:00
2020-02-06 00:00:00
Running df["Column3"].apply(type)
gives me
str
<class 'datetime.datetime'>
<class 'datetime.datetime'>
So in the Excel file these are marked as datetime.
Loading them with df = pd.read_excel(DATA_DIR / "sample.xlsx", dtype={"Column3": str}) changes the type of all to str, but does not change the output.
If you extract the file and look at the xml file xl\worksheets\sheet1.xml directly, cell C26 holds 44170, while C5 holds 6, which is a reference to 26/05/20 in xl/sharedStrings.xml.
How do you 'make' this Excel file? This is best solved at the point where the file is put together.
Workaround
As a workaround, you can convert the dates piecemeal. The different format allows this:
format1 = "%d/%m/%y"
format2 = "%Y-%d-%m %H:%M:%S"
Then you can do pd.to_datetime(dates, format=format1, errors="coerce") to only get the first dates, and NaT for the ones not according to the format. Then you use combine_first to fill the voids.
dates = df["Column3"]  # of the one imported with dtype={"Column3": str}
dates_parsed = (
    pd.to_datetime(dates, format=format1, errors="coerce")
    .combine_first(pd.to_datetime(dates, format=format2, errors="coerce"))
    .astype(object)
    .combine_first(dates)
)
The astype(object) is needed to fill in the empty places with the string values.
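To see the piecemeal conversion on its own, here is a small runnable sketch. The four-value Series is a hypothetical stand-in for Column3 as read with dtype={"Column3": str}, and it stops before the final astype(object) step so that the result stays a datetime column that can be filtered:

```python
import pandas as pd

# Hypothetical stand-in for Column3 after reading everything as str.
dates = pd.Series(["26/05/20", "XXX", "2020-12-05 00:00:00", "YYY"])

format1 = "%d/%m/%y"
# Day and month are deliberately swapped: Excel stored 12 May as 2020-12-05.
format2 = "%Y-%d-%m %H:%M:%S"

# Each pass parses only the values in its own layout; the rest become NaT.
first = pd.to_datetime(dates, format=format1, errors="coerce")
second = pd.to_datetime(dates, format=format2, errors="coerce")

# Fill the gaps of the first pass with the results of the second.
parsed = first.combine_first(second)

# Rows that are real dates can now be filtered directly.
only_dates = parsed[parsed.notna()]
print(only_dates)
```

The text values XXX and YYY end up as NaT here; the astype(object).combine_first(dates) step from the answer is what puts the original strings back in those positions.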

I think you should first import the file without date parsing and then convert the column to datetime using the following:
df['Column3'] = pd.to_datetime(df['Column3'], errors='coerce')
Hope this will work
