Convert xlsx to csv while dropping sheets and keeping date formats - python

I have a .xlsx file that I want to convert to a .csv file. I have made a demo file as shown in the screenshot. The .xlsx file has 3 sheets and I want to keep only the last one. In addition, I want to preserve my dates in MM/DD/YYYY format.
I found a few solutions here and there on converting and then dropping sheets, or vice versa. The closest I have come is using the solution from this link:
But it doesn't keep the MM/DD/YYYY date format and instead converts the dates to numbers, e.g. 44079. I tried searching for a way to convert the numbers back to dates but couldn't find anything.
Can anyone help me with this? I can provide more clarification if needed.
I am coding in Python.

Hi, I solved my own question using the answer from "Python using pandas to convert xlsx to csv file. How to delete index column?"
The dates in the converted .csv file were still not in the format I wanted, e.g.
05-09-2020 00:00:00
So I used pandas to load the converted csv file into a dataframe. From there I used df['date made'] = pd.to_datetime(df['date made']) to convert the column from object to datetime. After that I used df['date made'] = df['date made'].dt.strftime('%m/%d/%Y') to get
09/05/2020
which is the date format I wanted. I repeated the steps for Date Due as well.
Hope this helps those who are looking to convert .xlsx to .csv and do some formatting on the date.
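For anyone who wants to skip the intermediate file, here is a minimal sketch of the same idea in one pass. It assumes the date cells are real Excel dates (so pandas reads them as datetime64) and relies on the date_format argument of DataFrame.to_csv. The function name last_sheet_to_csv and the demo data are made up for illustration.

```python
import pandas as pd


def last_sheet_to_csv(xlsx_path, csv_path, date_format="%m/%d/%Y"):
    """Write only the last sheet of a workbook to CSV (hypothetical helper)."""
    with pd.ExcelFile(xlsx_path) as workbook:
        last_sheet = workbook.sheet_names[-1]  # sheets are listed in workbook order
        df = workbook.parse(last_sheet)
    # date_format only affects datetime columns; index=False drops the index column
    df.to_csv(csv_path, index=False, date_format=date_format)
    return df


# In-memory demo of the formatting step, with made-up data
demo = pd.DataFrame({
    "date made": pd.to_datetime(["2020-09-05"]),
    "item": ["widget"],
})
csv_text = demo.to_csv(index=False, date_format="%m/%d/%Y")
print(csv_text)
```

If a date column comes back as plain text instead (for example a day-first string), date_format will not touch it, and it would need pd.to_datetime first, possibly with dayfirst=True.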

Related

Change date format of a column in Data Frame

I've merged several Excel files into a single DataFrame. After some additional steps, I am saving it into a single Excel spreadsheet. There are two columns that are saved as dates in this format:
3/8/2021 12:00:00 AM
How do I change the format of the date? I need it in the format 'YYYY-MM-DD' without the time, or even better, saved as a string like "YYYY-MM_DD".
Thanks
You can do:
df['Date_column']=pd.to_datetime(df['Date_column'])
df['Date_column']=df['Date_column'].dt.strftime("%Y-%m_%d")

How to stop python auto date parsing while reading a excel file

I need to read an Excel file into a DataFrame without changing any date, time, or float formats. This works fine if I convert the Excel file to CSV and read it using read_csv().
eg:
import pandas as pd
import numpy as np
# code for reading excel
df=pd.read_excel("605.xlsx",parse_dates=False,sheet_name="Group 1",keep_default_na=False,dtype=str)
print("df_excel:",df)
#code for reading csv
df1=pd.read_csv("Group 1.csv",parse_dates=False,dtype=str,na_filter = False)
print("df_csv:",df1)
output:
In the above code, parse_dates=False works fine when reading the CSV file, but it has no effect in read_excel().
Expected output:
I need the exact Excel data in a DataFrame, without any change to the date and time formats.
From the Pandas docs on the parse_dates parameter for read_excel():
If a column or index contains an unparseable date, the entire column or index will be returned unaltered as an object data type. If you don't want to parse some cells as date just change their type in Excel to "Text".
You could try this:
df = pd.read_excel("605.xlsx",parse_dates=False,sheet_name="Group 1",keep_default_na=False,dtype=str, converters={'as_at_date': str})
Explicitly converting the date column to string might help.

Converting dates with multiple formats in a CSV file

I have a CSV full of tweets containing a few headers. Among them, for some unknown reason, the date format changes midway from %Y-%m-%d to %d/%m/%Y as shown in the image below.
This makes it difficult when trying to export it into another program e.g. Matlab. I'm attempting to solve this in Python, but any other solution would be great.
I've attempted multiple solutions from googling around, mainly passing a date format when reading the CSV, datetime.strptime, and others. I'm very new to Python, so I'm sorry if I'm a bit clueless.
I'm looking to standardise all the dates, e.g. changing the %d/%m/%Y ones to the other format, while keeping each individual row separate.
I'm thinking of following the approach described here, but adding an if statement for when it recognises a certain format. How would I go about breaking the date down and changing it then?
This might work but I'm too lazy to check it against an image of a CSV file.
import pandas as pd
# Put all the formats into a list
possible_formats = ['%Y-%m-%d', '%d/%m/%Y']
# Read in the data
data = pd.read_csv("data_file.csv")
date_column = "date"
# Parse the dates in each format and stash them in a list
fixed_dates = [pd.to_datetime(data[date_column], errors='coerce', format=fmt) for fmt in possible_formats]
# Anything we could parse goes back into the CSV
data[date_column] = pd.NaT
for fixed in fixed_dates:
    data.loc[~pd.isnull(fixed), date_column] = fixed[~pd.isnull(fixed)]
data.to_csv("new_file.csv")
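A quick way to check the idea without the original file is to run it on a few made-up rows. This is only a sketch: the sample CSV text and the final strftime step are assumptions, and it fills gaps with fillna rather than .loc assignment.

```python
import io

import pandas as pd

# Made-up rows: one per format, plus one that matches neither
csv_text = "date,text\n2019-03-01,first\n15/04/2019,second\nnot a date,third\n"
data = pd.read_csv(io.StringIO(csv_text))

possible_formats = ["%Y-%m-%d", "%d/%m/%Y"]

parsed = None
for fmt in possible_formats:
    # Rows that do not match this format become NaT instead of raising
    attempt = pd.to_datetime(data["date"], format=fmt, errors="coerce")
    # Keep earlier successes and fill the remaining gaps from this attempt
    parsed = attempt if parsed is None else parsed.fillna(attempt)

# Write every parsed date back in a single format; unparseable rows stay missing
data["date"] = parsed.dt.strftime("%Y-%m-%d")
print(data)
```

Rows that match none of the formats end up as missing values, so it is worth counting them before exporting to another program.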

Converting CSV to HTML keeping format

My objective: convert a DataFrame to HTML, which is sent as a daily email.
Current method: converting the df to csv, then to html.
Problem: I created my df with as_index=True set, but when I save it to a csv this formatting is lost:
Example DataFrame:
When I save this df using to_csv(), the formatting in the index is lost (meaning ABC is now written 3 times down the index, instead of once as I want it).
I want the CSV to have the same formatting. Is that possible?
Please install pandas and use to_html().
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_html.html
Hope it can help you.
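A small sketch of what that looks like, assuming the DataFrame came from a groupby with a two-level index (the column names and values here are made up). With pandas' default settings, to_html() writes the repeated outer index label once, as a cell spanning several rows, which a flat CSV cannot express.

```python
import pandas as pd

# Made-up data standing in for the real DataFrame
sales = pd.DataFrame({
    "group": ["ABC", "ABC", "ABC", "XYZ"],
    "item": ["x", "y", "z", "x"],
    "value": [1, 2, 3, 4],
})

# groupby keeps the keys as a MultiIndex (as_index=True is the default)
summary = sales.groupby(["group", "item"]).sum()

# Render straight to HTML; no CSV step in between
html = summary.to_html()
print(html)
```

The resulting string can be used as the HTML body of the email.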

Python Openpyxl writing date to excel as short date

I am trying to write dates to an excel file using Openpyxl. I am using the following lines to write the date.
dttm = datetime.datetime.strptime(ls25Dict[cell.value][2], "%m/%d/%Y" )
ws1['B'+ str(cell.row)].value = dttm
This writes the date to excel but in the wrong format. This is the output:
2018-01-09 0:00:00
I am trying to get it to be 1/9/2018. Basically change the format to Short Date in excel.
Anyone know how to change it before the date is written to excel?
In Excel you always have to provide your own format for dates and times, because these are stored as serials. openpyxl defaults to ISO formats for minimal ambiguity.
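A minimal sketch of setting the format with openpyxl, assuming a fresh workbook and a made-up cell address. The custom format string "m/d/yyyy" is an assumption chosen to display 1/9/2018; it is not necessarily identical to Excel's locale-dependent built-in Short Date.

```python
import datetime

from openpyxl import Workbook

wb = Workbook()
ws = wb.active

dttm = datetime.datetime.strptime("01/09/2018", "%m/%d/%Y")

cell = ws["B2"]
cell.value = dttm                 # stored as a date serial in the file
cell.number_format = "m/d/yyyy"   # display format, set after assigning the value

# wb.save("dates.xlsx")  # uncomment to write the workbook to disk
```

The stored value stays a real date, so sorting and date arithmetic in Excel still work; only the display changes.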
