This question already has an answer here:
Can I parse dates in different formats?
(1 answer)
Closed 5 years ago.
I have some data in a CSV file in which some entries are in the MM/DD/YYYY format and others are in the DD-MM-YYYY format. I would like to read this column and store it as a new column in a pandas DataFrame. How would I go about this?
Example:
Entry Sampling Date
1 01-10-2004
2 01-13-2004
3 16/1/2004
I would like to convert the first two rows' date format to that in the third row.
Use the datetime module: define a function, then apply it to your column.
import datetime

def read_date(string):
    # Slash-separated entries are parsed as MM/DD/YYYY, dash-separated as DD-MM-YYYY
    if '/' in string:
        date = datetime.datetime.strptime(string, '%m/%d/%Y')
    elif '-' in string:
        date = datetime.datetime.strptime(string, '%d-%m-%Y')
    else:
        raise ValueError('Unrecognized date format: ' + string)
    return date

# If df is your dataframe
df['newdate'] = df['Sampling Date'].apply(read_date)
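The example rows in the question appear to use MM-DD-YYYY with dashes and D/M/YYYY with slashes, which differs from the formats named in the question's text. A minimal sketch that tries each candidate format in turn, assuming the layout of the example rows; the names FORMATS and parse_mixed_date are hypothetical:

```python
from datetime import datetime

# Assumed formats, taken from the example rows rather than the question's prose
FORMATS = ('%m-%d-%Y', '%d/%m/%Y')

def parse_mixed_date(text):
    """Return a datetime for the first format in FORMATS that matches."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue  # this format did not match, try the next one
    raise ValueError(f'No known format matches {text!r}')

samples = ['01-10-2004', '01-13-2004', '16/1/2004']
dates = [parse_mixed_date(s) for s in samples]

# Write every date in the third row's style (day/month/year, no zero padding)
formatted = [f'{d.day}/{d.month}/{d.year}' for d in dates]
print(formatted)
```

With a DataFrame, the same function could be passed to df['Sampling Date'].apply, as in the answer above.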
Related
This question already has answers here:
Pandas read_excel doesn't parse dates correctly - returns a constant date instead
(1 answer)
Convert Excel style date with pandas
(3 answers)
Closed 1 year ago.
I want to read an xlsb file in pandas.
It has three datetime columns:
1st column format is (2021-5-31 01:20:23)
2nd column format (total time) is (01:20:23)
3rd column format is (01:20:23 am)
But when I read the file, I get values like 46090.0 in the column.
Is there any method that can read the Excel column as it is?
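A value like 46090.0 is typically an Excel serial date: the number of days since 1899-12-30 in Excel's default 1900 date system, with the time of day stored as the fractional part. A minimal standard-library sketch of the conversion, assuming that date system; excel_serial_to_datetime is a hypothetical helper name, not a pandas function:

```python
from datetime import datetime, timedelta

# Day zero of Excel's 1900 date system (an assumption about how the workbook was saved)
EXCEL_EPOCH = datetime(1899, 12, 30)

def excel_serial_to_datetime(serial):
    """Convert an Excel serial number (days, with fractional time) to a datetime."""
    return EXCEL_EPOCH + timedelta(days=float(serial))

print(excel_serial_to_datetime(46090.0))   # value from the question
print(excel_serial_to_datetime(44347.5))   # noon on 2021-05-31
```

A pure time column such as the total-time one would arrive as a fraction of a day, so timedelta(days=value) alone gives the duration.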
This question already has answers here:
Convert pyspark string to date format
(6 answers)
Closed 2 years ago.
I have a dataset that contains multiple columns and rows.
One column is currently of string type, and I want to convert it to a date-time format for further tasks.
I tried the code below, which returns null:
df = df.withColumn('Date_Time',df['Date_Time'].cast(TimestampType()))
df.show()
I tried some of the solutions from "Convert pyspark string to date format", but none of them works; in the end, they all return null.
Since your date format is non-standard, you need to use to_timestamp and specify the corresponding format:
import pyspark.sql.functions as F
df2 = df.withColumn('Date_Time', F.to_timestamp('Date_Time', 'dd/MM/yyyy hh:mm:ss a'))
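One way to sanity-check a pattern before running the Spark job is to try the equivalent format on a single value with Python's datetime.strptime. In this sketch the sample string is hypothetical (the question does not show the data), and the strptime directives are a hand-written counterpart of the Spark pattern dd/MM/yyyy hh:mm:ss a:

```python
from datetime import datetime

# Hypothetical sample value; replace it with a real entry from the Date_Time column
sample = '31/05/2021 01:20:23 PM'

# %d/%m/%Y ~ dd/MM/yyyy, %I:%M:%S ~ hh:mm:ss (12-hour clock), %p ~ a (AM/PM marker)
parsed = datetime.strptime(sample, '%d/%m/%Y %I:%M:%S %p')
print(parsed)
```

If strptime raises ValueError for a real value, the Spark pattern most likely does not match the data either and would produce null.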
This question already has answers here:
Extracting the hour from a time column in pandas
(3 answers)
Convert string to timedelta in pandas
(4 answers)
Closed 2 years ago.
I want to convert each value in a pandas DataFrame column to a string and then delete some text. The values are times. For example, if the value is 11:21, I would like to delete the : and everything to the right of it, for every element in the column. 11:21 should be converted to 11.
Let's say you have the following dataset:
import pandas as pd

df = pd.DataFrame({
    'time': ['09:30:00','09:40:01','09:50:02','10:00:03']
})
df.head()
Output:
       time
0  09:30:00
1  09:40:01
2  09:50:02
3  10:00:03
If you want to work with the time column as a string, the following code may be used:
df['hour'] = df['time'].apply(lambda time: time.split(':')[0])
df.head()
Output:
       time hour
0  09:30:00   09
1  09:40:01   09
2  09:50:02   09
3  10:00:03   10
Alternatively, the time can be converted to datetime and the hour extracted as an integer:
df['hour'] = pd.to_datetime(df['time'], format='%H:%M:%S').dt.hour
df.head()
Output:
       time  hour
0  09:30:00     9
1  09:40:01     9
2  09:50:02     9
3  10:00:03    10
This question already has answers here:
How to convert date to the first day of month in a PySpark Dataframe column?
(3 answers)
Closed 3 years ago.
I have a dataframe like below:
+------+----------+----+
|ID    |   date   |flag|
+------+----------+----+
|123456|2015-04-21|null|
|234567|2017-04-18|null|
|345678|2009-06-25|null|
|456789|2001-11-07|null|
|567890|2016-10-02|null|
+------+----------+----+
I am trying to modify the dataframe to change the dates in the date column to show as 'YYYY-mm-01' like below.
+------+----------+----+
|ID    |   date   |flag|
+------+----------+----+
|123456|2015-04-01|null|
|234567|2017-04-01|null|
|345678|2009-06-01|null|
|456789|2001-11-01|null|
|567890|2016-10-01|null|
+------+----------+----+
I am trying to do so like this:
df = df.withColumn("date", f.trunc("date", "month"))
But it looks as if it's messing up the dates and setting every row to the same date. How can I change my PySpark column values from their original YYYY-mm-dd to YYYY-mm-01 for every row?
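You can use datetime.replace. For example, say you have one date string taken from the date column:
from datetime import datetime

# df.first()['date'] assumes a PySpark DataFrame whose date column holds strings
date = datetime.strptime(df.first()['date'], '%Y-%m-%d')
newdate = date.replace(day=1)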
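As a rough illustration of the replace(day=1) idea, here is a sketch applied to the sample dates from the question. It runs on plain Python strings, not on a Spark column, and first_of_month is a hypothetical helper name:

```python
from datetime import datetime

def first_of_month(date_str):
    """Parse a YYYY-mm-dd string and return the first day of that month as a string."""
    parsed = datetime.strptime(date_str, '%Y-%m-%d')
    return parsed.replace(day=1).strftime('%Y-%m-%d')

# Sample dates copied from the question's table
sample_dates = ['2015-04-21', '2017-04-18', '2009-06-25', '2001-11-07', '2016-10-02']
results = [first_of_month(d) for d in sample_dates]
print(results)
```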
This question already has answers here:
How to change the datetime format in Pandas
(8 answers)
Closed 3 years ago.
I have a CSV file and want to select one specific column (a date string). Then I want to change the format of the date string from yyyymmdd to dd.mm.yyyy for every entry.
I read the CSV file into a DataFrame with pandas and then saved the specific column with the header DATE to a variable.
import pandas as pd
# read csv file
df = pd.read_csv('csv_file')
# save specific column
df_date_col = df['DATE']
Now I want to change the values in df_date_col. How can I do this?
I know I can do it a step earlier, like this:
df['DATE'] = modify(df['DATE'])
Is this possible just using the variable df_date_col?
If I try df_date_col['DATE']=... it gives a KeyError.
Use to_datetime with Series.dt.strftime:
df['DATE'] = pd.to_datetime(df['DATE'], format='%Y%m%d').dt.strftime('%d.%m.%Y')
Is this possible just using the variable df_date_col?
Sure, but df_date_col is a Series, so you cannot select 'DATE' from it again with []:
df_date_col = df['DATE']
df_date_col = pd.to_datetime(df_date_col, format='%Y%m%d').dt.strftime('%d.%m.%Y')
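The two format strings can be checked on single values with the standard library before applying them to the whole column. A small sketch, assuming the entries are eight-digit yyyymmdd strings; reformat_date and the sample values are hypothetical:

```python
from datetime import datetime

def reformat_date(value):
    """Convert a yyyymmdd string to dd.mm.yyyy using the same formats as above."""
    return datetime.strptime(str(value), '%Y%m%d').strftime('%d.%m.%Y')

# Hypothetical sample entries standing in for the DATE column
samples = ['20210531', '20040110']
converted = [reformat_date(s) for s in samples]
print(converted)
```

The str(value) call is there because read_csv may load such a column as integers rather than strings.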