Change date format of a column in Data Frame - python

I've merged several Excel files into a single DataFrame. After some additional steps, I save it to a single Excel spreadsheet. Two of the columns are saved as dates in this format:
3/8/2021 12:00:00 AM
How do I change the format of the date? I need it as 'YYYY-MM-DD' without the time, or even better, saved as a string such as "YYYY-MM_DD".
Thanks

You can do:
df['Date_column']=pd.to_datetime(df['Date_column'])
df['Date_column']=df['Date_column'].dt.strftime("%Y-%m_%d")
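As a rough end-to-end sketch of the two lines above: the sample values and the Date_str / Date_underscore column names are made up for illustration, and the explicit format string assumes the text is month-first (3/8/2021 meaning March 8), which the question does not confirm.

```python
import pandas as pd

# Hypothetical sample data in the format shown in the question.
df = pd.DataFrame({"Date_column": ["3/8/2021 12:00:00 AM", "12/25/2021 12:00:00 AM"]})

# Parse the text into real datetimes. The format is an assumption (month first);
# drop it to let pandas infer the format instead.
df["Date_column"] = pd.to_datetime(df["Date_column"], format="%m/%d/%Y %I:%M:%S %p")

# Format as strings: with hyphens, or with the underscore variant from the question.
df["Date_str"] = df["Date_column"].dt.strftime("%Y-%m-%d")
df["Date_underscore"] = df["Date_column"].dt.strftime("%Y-%m_%d")

print(df[["Date_str", "Date_underscore"]])
```

After strftime the column holds plain strings (object dtype), so Excel will treat the values as text rather than dates.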

Related

Openpyxl Number_Format Not Applied Until Manually Applied

Code:
import openpyxl
from openpyxl.utils.dataframe import dataframe_to_rows

def write_pandas_dataframe_to_excel(df):
    book = openpyxl.load_workbook('~/Documents/test.xlsm', read_only=False, keep_vba=True)
    sheet = book['Database']
    # Delete every existing row (header included); the header is rewritten below from the dataframe
    sheet.delete_rows(1, sheet.max_row)
    # Write values from the pandas dataframe to the sheet
    for r in dataframe_to_rows(df, index=include_index, header=True):
        sheet.append(r)
    for row in sheet[2:sheet.max_row]:  # skip the header
        cell = row[0]  # column A is a Date field
        cell.number_format = 'YYYY-mm-dd'
    book.save(excel_file_path)
    book.close()
Expected Result: I open up test.xlsm, and in column A, all dates should already be in the format YYYY-mm-dd
Actual Result: While the YYYY-mm-dd format gets applied without any issues when I run the python code, I then have to open up the excel file, select each cell manually and hit 'Return' in the formula window for the YYYY-mm-dd format to be applied.
Is there a way for my specified date format to be applied through the python code rather than having to manually apply it by opening up excel and selecting each cell, going to the formula bar and hitting 'Return' every time?
Thanks in advance!
I've figured out the answer. Put simply, the date was being written to Excel as a string, and that was causing the issue.
In the pandas dataframe holding my data, I had used strftime to format the date, which turned the column into strings (the generic 'object' dtype). I removed the strftime call so that the column keeps its datetime values, and that way each date is written to Excel as a pandas Timestamp object rather than a string.
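The difference can be seen in a small in-memory sketch. This is not the asker's workbook: the sheet contents are invented, and a plain datetime stands in for the pandas Timestamp (which is a datetime subclass).

```python
import datetime
import openpyxl

# Build a throwaway workbook in memory; nothing is saved to disk.
wb = openpyxl.Workbook()
ws = wb.active
ws.append(["Date"])                          # header row
ws.append([datetime.datetime(2021, 3, 8)])   # a real datetime value
ws.append(["2021-03-08"])                    # the same date written as text

# Apply the same number format to both data cells in column A.
for row in ws.iter_rows(min_row=2, max_col=1):
    row[0].number_format = "yyyy-mm-dd"

real_date_cell = ws["A2"]
text_cell = ws["A3"]
print(real_date_cell.is_date, text_cell.is_date, text_cell.data_type)
```

Only the cell holding a datetime is stored as a date, so only that cell is rendered with the number format. The text cell keeps its string type until Excel re-parses it, which is what pressing Return in the formula bar does.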

Convert pyspark dataframe to json file

I have the dataframe below and want to write its contents to a .json file.
When creating the output, I do not want the _SUCCESS and part files that Spark writes, so I tried calling collect() on the dataframe and using json.dumps() to create the file. But I am losing the column names and formats compared with the expected format in the picture.
Please help!

Convert xlsx to csv while dropping sheets and keeping date formats

I have a .xlsx file that I want to convert to .csv file. I have done a demo file as shown in the screenshot. In the .xlsx file, I have 3 sheets and I want to keep the last sheet only. In addition, I want to preserve my dates in a MM/DD/YYYY format.
Found a few solutions here and there on converting then dropping sheets or vice versa. The closest I have come to is using the solution from this link :
But it doesn't keep the date format of MM/DD/YYYY and instead converts it to numbers e.g. 44079. Tried searching solution to convert the numbers to date but there is nothing on this.
Can anyone help me with this? I can provide more clarification if needed.
I am coding in Python.
Hi, I solved my own question by using the answer from "Python using pandas to convert xlsx to csv file. How to delete index column?"
In addition, the date in the converted .csv file came out in a format I did not want, e.g.
05-09-2020 00:00:00
I used pandas to load the converted CSV file into a dataframe. From there I used df['date made'] = pd.to_datetime(df['date made']) to convert the column from object to datetime. After that I used df['date made'] = df['date made'].dt.strftime('%m/%d/%Y') to get
09/05/2020
which is the date format I wanted. I repeated the steps for Date Due as well.
Hope this helps those who are looking to convert .xlsx to .csv and do some formatting on the dates.
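A minimal sketch of that reformatting step, using an in-memory CSV in place of the converted file. The sample rows are invented, and the explicit day-first format is an assumption chosen so that 05-09-2020 is read as 5 September, matching the 09/05/2020 result above.

```python
import io
import pandas as pd

# Hypothetical stand-in for the converted .csv file.
csv_text = (
    "date made,Date Due\n"
    "05-09-2020 00:00:00,12-09-2020 00:00:00\n"
)
df = pd.read_csv(io.StringIO(csv_text))

for col in ["date made", "Date Due"]:
    # Assumption: the text is day-first (DD-MM-YYYY), so state the format explicitly.
    df[col] = pd.to_datetime(df[col], format="%d-%m-%Y %H:%M:%S")
    # Render as MM/DD/YYYY strings.
    df[col] = df[col].dt.strftime("%m/%d/%Y")

# index=False keeps the pandas index column out of the CSV output.
csv_out = df.to_csv(index=False)
print(csv_out)
```

Stating the format avoids pandas guessing between day-first and month-first for ambiguous values such as 05-09-2020.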

Python Routine for modifications in excel columns

I want to create a routine in Python that reads an excel file in a given folder, modifies it, and saves it. What I want it to do with the excel file is to modify a column of dates in a mm/yyyy format into two columns with the same dates in a mm yyyy format.
This is what the initial spreadsheet looks like
This is what I would like to change it to:

convert object column into date type column using python

I have a CSV file with a column named DOB, but when I try to change its data type to a date type, I get an error.
Here is the code:
b['DOB'] = pd.to_datetime(b['DOB'], format='%Y-%m-%d')
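When you read the CSV in pandas, tell it which column holds dates: pd.read_csv(file_name, parse_dates=['DOB'])
parse_dates=['DOB'] converts that column to datetime if its values can be parsed as dates. A bare parse_dates=True only tries to parse the index, not the DOB column.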
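A small sketch of both routes, with made-up data since the question does not show the file: naming the column in parse_dates at load time, and using errors="coerce" to find the values that do not match the expected format (a common cause of the error in the question).

```python
import io
import pandas as pd

# Hypothetical CSV contents; the real file is not shown in the question.
csv_text = "name,DOB\nAnn,1990-05-17\nBob,1985-12-01\n"

# Route 1: parse the DOB column while reading. parse_dates takes a list of column names.
b = pd.read_csv(io.StringIO(csv_text), parse_dates=["DOB"])
print(b.dtypes)

# Route 2: convert afterwards. errors="coerce" turns values that do not match
# the format into NaT instead of raising, which makes bad rows easy to locate.
messy = pd.Series(["1990-05-17", "17/05/1990"])
parsed = pd.to_datetime(messy, format="%Y-%m-%d", errors="coerce")
print(messy[parsed.isna()])
```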
