Modify csv data in pandas dataframe [duplicate] - python

This question already has answers here:
How to change the datetime format in Pandas
(8 answers)
Closed 3 years ago.
I have a CSV file and want to select one specific column (a date string). Then I want to change the format of the date string from yyyymmdd to dd.mm.yyyy for every entry.
I read the CSV file into a DataFrame with pandas and then saved the column with the header DATE to a variable.
import pandas as pd
# read csv file
df = pd.read_csv('csv_file')
# save specific column
df_date_col = df['DATE']
Now I want to change the values in df_date_col. How can I do this?
I know I can do it a step earlier, like this:
df['DATE'] = modify(df['DATE'])
Is this possible using just the variable df_date_col?
If I try df_date_col['DATE'] = ... it raises a KeyError.

Use to_datetime with Series.dt.strftime:
df['DATE'] = pd.to_datetime(df['DATE'], format='%Y%m%d').dt.strftime('%d.%m.%Y')
Is this possible just using the variable df_date_col?
Sure, but df_date_col is a Series, not a DataFrame, so you cannot select a column from it with ['DATE'] again:
df_date_col = df['DATE']
df_date_col = pd.to_datetime(df_date_col, format='%Y%m%d').dt.strftime('%d.%m.%Y')
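One thing worth checking for yourself: reassigning df_date_col only rebinds the variable, it does not touch the DataFrame. Here is a minimal sketch, assuming the DATE column holds strings (the sample values are made up for illustration):

```python
import pandas as pd

# Made-up sample data standing in for the CSV contents
df = pd.DataFrame({'DATE': ['20200131', '20191225']})

# Convert the standalone Series from yyyymmdd to dd.mm.yyyy
df_date_col = df['DATE']
df_date_col = pd.to_datetime(df_date_col, format='%Y%m%d').dt.strftime('%d.%m.%Y')

# The DataFrame column still holds the original strings at this point
unchanged = df['DATE'].tolist()

# Assign the converted Series back if the DataFrame should change too
df['DATE'] = df_date_col
print(df)
```

So if the goal is to update the DataFrame itself, the converted Series has to be assigned back to df['DATE'].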

Related

I'm stuck trying to figure how to parse this datetime that is formatted as 12.6.5 (datetime formatted with periods) in pandas [duplicate]

This question already has answers here:
Convert Pandas Column to DateTime
(8 answers)
Closed 1 year ago.
I'm stuck trying to figure how to parse this object containing date that is formatted as 18.26.01 (object formatted with periods) in pandas. I'm using the pd.to_datetime(youtube_US['trending_date']) method but it's returning a parsing error. The error is as follows:
ParserError: month must be in 1..12: 17.14.11
How do I parse this date so that it returns a proper datetime object? Do I need to use any kind of loop?
The error means the parser tried to read the middle field as a month: your strings are in yy.dd.mm order. You can work around it by reordering the fields so that the month comes first.
import pandas as pd
youtube_US = {'trending_date': ['18.26.01', '18.26.01']}
df = pd.DataFrame(data=youtube_US)
def datetime_split(value):
    # Input is yy.dd.mm; return mm.dd.yy so the month comes first
    parts = value.split('.')
    return parts[2] + "." + parts[1] + "." + parts[0]
# Reformat 'trending_date' column
df['trending_date'] = df['trending_date'].apply(datetime_split)
# Select only date from column
df['trending_date'] = pd.to_datetime(df['trending_date']).dt.date
print(df)
I hope this resolves your error.
Or simply pass format, as per Buran's comment:
import pandas as pd
youtube_US = {'trending_date': ['18.26.01', '18.26.01']}
df = pd.DataFrame(data=youtube_US)
df['trending_date'] = pd.to_datetime(df['trending_date'], format="%y.%d.%m")
# Keep only the date part
df['trending_date'] = df['trending_date'].dt.date
print(df)
There is another line in the notebook I'm working with that I don't understand:
youtube_US['count_max_view']=youtube_US.groupby(['video_id'])['views'].transform(max)
I don't understand what .transform(max) is doing, or in fact the whole line of code.
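As a rough illustration of that line: groupby(...).transform computes an aggregate per group and broadcasts it back to every row of the group, so the result has the same length as the original frame. The sketch below uses a small made-up table (the video_id and views values are assumptions, not the real dataset) and passes the string 'max' rather than the builtin max:

```python
import pandas as pd

# Made-up rows: two videos, each appearing on several days
videos = pd.DataFrame({
    'video_id': ['a', 'a', 'b', 'b', 'b'],
    'views': [10, 30, 5, 50, 20],
})

# For each row, store the highest view count seen for that row's video_id
videos['count_max_view'] = videos.groupby('video_id')['views'].transform('max')
print(videos)
```

Every row for video 'a' gets 30 and every row for video 'b' gets 50, which is why the result can be assigned straight back as a new column.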

How to change String column to Date-Time Format in PySpark? [duplicate]

This question already has answers here:
Convert pyspark string to date format
(6 answers)
Closed 2 years ago.
I have a dataset that contains multiple columns and rows.
The Date_Time column is currently a string, and I want to convert it to a date-time format for further processing.
I tried the code below, which returns null:
df = df.withColumn('Date_Time',df['Date_Time'].cast(TimestampType()))
df.show()
I tried some of the solutions from the question below, but none of them work; they all return null in the end.
Convert pyspark string to date format
Since your date format is non-standard, you need to use to_timestamp and specify the corresponding format:
import pyspark.sql.functions as F
df2 = df.withColumn('Date_Time', F.to_timestamp('Date_Time', 'dd/MM/yyyy hh:mm:ss a'))

How do I sort a pandas dataframe by a datetime column? [duplicate]

This question already has answers here:
how to sort pandas dataframe from one column
(13 answers)
Closed 2 years ago.
I have a column in my csv file that I want to have sorted by the datetime. It's in the format like 2020-10-06 03:28:00. I tried doing it like this but nothing seems to have happened.
df = pd.read_csv('data.csv')
df = df.sort_index()
df.to_csv('btc.csv', index= False)
I need to have that index= False in the .to_csv so that it is formatted properly for later so I can't remove that if that is causing an issue. The dtime is my first column in the csv file and the second column is a unix timestamp so I could also use that if it would work better.
Use DataFrame.sort_values(by=column_name) to sort a pandas DataFrame by the contents of the column named column_name. sort_index() only sorts by the row index, which is why nothing seemed to happen. If the column is stored in another format, convert it to datetime first with pandas.to_datetime(arg), where arg is the column of dates.
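A small sketch of that approach, assuming the datetime column is called dtime and the second column is called unix (both names are guesses based on the question, and the rows are made up):

```python
import io
import pandas as pd

# Stand-in for data.csv; column names and rows are assumptions
csv_text = """dtime,unix
2020-10-06 03:28:00,1601954880
2020-10-05 22:15:00,1601936100
2020-10-06 01:00:00,1601946000
"""

df = pd.read_csv(io.StringIO(csv_text))

# Convert the string column to real datetimes, then sort by it
df['dtime'] = pd.to_datetime(df['dtime'], format='%Y-%m-%d %H:%M:%S')
df = df.sort_values(by='dtime')

# index=False still works; the row order is preserved in the output
out = df.to_csv(index=False)
print(out)
```

With a real file you would pass a path to to_csv instead of capturing the string.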

Pandas parsing dates when reading CSV file [duplicate]

This question already has answers here:
Can pandas automatically read dates from a CSV file?
(13 answers)
Closed 3 years ago.
I have a CSV file that contains a date column in the format 'dd.mm.yy'. When pandas parses the dates, it treats the day as the month whenever the day is less than or equal to 12, so 05.01.05 becomes 01/05/2005.
How can I solve this issue?
Regards
This is one way to solve it, using pandas.to_datetime with the argument dayfirst=True. However, I've had to make assumptions about the format of your data since you did not share any code. In the case below, the original dtype of the date column is object.
import pandas as pd
df = pd.DataFrame({
    'date': ['01.02.20', '25.12.19', '10.03.18'],
})
df = pd.to_datetime(df['date'], dayfirst=True)
df
0 2020-02-01
1 2019-12-25
2 2018-03-10
Name: date, dtype: datetime64[ns]
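If every value really is dd.mm.yy, another option is to spell the format out so nothing is left to inference. A sketch under that assumption (the CSV content and the value column are made up):

```python
import io
import pandas as pd

# Stand-in for the CSV file; contents are assumptions
csv_text = "date,value\n05.01.05,1\n25.12.19,2\n"

df = pd.read_csv(io.StringIO(csv_text))

# An explicit format removes the day/month ambiguity
df['date'] = pd.to_datetime(df['date'], format='%d.%m.%y')
print(df)
```

Here 05.01.05 is read as 5 January 2005, and a value that does not match the format raises an error rather than being silently swapped.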

Convert from MM/DD/YYYY to DD-MM-YYYY [duplicate]

This question already has an answer here:
Can I parse dates in different formats?
(1 answer)
Closed 5 years ago.
I have some data in a CSV file which has some entries in the MM/DD/YYYY format and some entries in the DD-MM-YYYY format. I would like to read this column of entries and store it as a new column in a pandas DataFrame. How would I go about this?
Example:
Entry Sampling Date
1 01-10-2004
2 01-13-2004
3 16/1/2004
I would like to convert the first two rows' date format to that in the third row.
Use the datetime module, define a function and then apply it to your column
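import datetime
def read_date(string):
    # Formats follow the question's description; swap them if your data
    # uses MM-DD-YYYY and DD/MM/YYYY, as the sample rows suggest
    if '/' in string:
        return datetime.datetime.strptime(string, '%m/%d/%Y')
    elif '-' in string:
        return datetime.datetime.strptime(string, '%d-%m-%Y')
    # Fail loudly on anything that matches neither pattern
    raise ValueError('Unrecognized date format: ' + string)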
# If df is your dataframe
df['newdate'] = df['Sampling Date'].apply(read_date)
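For the three sample rows in the question, the separators seem to map the other way round (01-13-2004 can only be month-day-year, and 16/1/2004 can only be day/month/year). A sketch under that assumption, which also writes everything out in the third row's style; the formatted column name is made up:

```python
from datetime import datetime
import pandas as pd

# The sample rows from the question
df = pd.DataFrame({'Sampling Date': ['01-10-2004', '01-13-2004', '16/1/2004']})

def read_date(text):
    # Assumption: '/' means day/month/year, '-' means month-day-year
    fmt = '%d/%m/%Y' if '/' in text else '%m-%d-%Y'
    return datetime.strptime(text, fmt)

# Parse each entry, then make sure the column has a datetime dtype
df['newdate'] = pd.to_datetime(df['Sampling Date'].apply(read_date))

# Render all rows as zero-padded day/month/year strings
df['formatted'] = df['newdate'].dt.strftime('%d/%m/%Y')
print(df)
```

Check which separator goes with which order in your real data before relying on either mapping.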
