datetime issue with xlrd & xlwt python libs

I'm trying to copy some dates from one Excel spreadsheet to another. Currently the value shows up in Excel as a raw number rather than a date, for example: "40299.2501157407"
I can print the date to the console just fine, but it doesn't come out right when written to the spreadsheet. The value must be a date type in Excel; I can't use a text version of it.
Here's the line that reads the date in:
date_ccr = xldate_as_tuple(sheet_ccr.cell(row_ccr_index, 9).value, book_ccr.datemode)
Here's the line that writes the date out:
row.set_cell_date(11, datetime(*date_ccr))
There isn't anything being done to date_ccr in between those two lines other than a few comparisons.
Any ideas?

You can write the floating point number directly to the spreadsheet and set the number format of the cell. Set the format using the num_format_str of an XFStyle object when you write the value.
https://secure.simplistix.co.uk/svn/xlwt/trunk/xlwt/doc/xlwt.html#xlwt.Worksheet.write-method
The following example writes the date 01-05-2010. (The value also includes a time of 06:00:10, but the format chosen in this example hides it.)
import xlwt
# d can also be a datetime object
d = 40299.2501157407
wb = xlwt.Workbook()
sheet = wb.add_sheet('new')
style = xlwt.XFStyle()
style.num_format_str = 'DD-MM-YYYY'
sheet.write(5, 5, d, style)
wb.save('test_new.xls')
There are examples of number formats (num_formats.py) in the examples folder of the xlwt source code. On my Windows machine: C:\Python26\Lib\site-packages\xlwt\examples
You can read about how Excel stores dates (third section on this page): https://secure.simplistix.co.uk/svn/xlrd/trunk/xlrd/doc/xlrd.html
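As a rough illustration of that storage scheme, the sketch below converts a serial number to a Python datetime using only the standard library. It assumes the workbook uses the 1900 date system (datemode 0) and a date after February 1900, so that day 0 can be treated as 1899-12-30. The name excel_serial_to_datetime is made up for this example and is not part of xlrd or xlwt.

```python
from datetime import datetime, timedelta

# Assumption: 1900 date system, serial dates after 1900-02-28.
EXCEL_EPOCH_1900 = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial):
    """Convert an Excel serial date (days since the epoch) to a datetime."""
    # Round to whole seconds to absorb floating-point noise in the day fraction.
    seconds = int(round(serial * 86400))
    return EXCEL_EPOCH_1900 + timedelta(seconds=seconds)


print(excel_serial_to_datetime(40299.2501157407))  # 2010-05-01 06:00:10
```

The integer part of the number is the day count and the fractional part is the time of day, which is why the same float can be shown as a date, a time, or both depending on the cell's number format.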

Related

Pandas: How to write custom time duration format to Excel file with pd.ExcelWriter via Openpyxl

Using the Openpyxl engine for Pandas via pd.ExcelWriter, I'd like to know if there is a way to specify a (custom) Excel duration format for elapsed time.
The format I would like to use is: [hh]:mm:ss which should give a time like: 01:01:01 for 1 hour, 1 minute, 1 second.
I want to write from a DataFrame into this format so that Excel can recognize it when I open the spreadsheet file in the Excel application, after writing the file.
Here is my current demo code, taking a duration of two datetime.now() timestamps:
import pandas as pd
from time import sleep
from datetime import datetime
start_time = datetime.now()
sleep(1)
end_time = datetime.now()
elapsed_time = end_time - start_time
df = pd.DataFrame([[elapsed_time]], columns=['Elapsed'])
with pd.ExcelWriter('./sheet.xlsx') as writer:
    df.to_excel(writer, engine='openpyxl', index=False)
Note that in this implementation, type(elapsed_time) is <type 'datetime.timedelta'>.
The code will create an Excel file with approximately the value 0.0000116263657407407 in the column of "Elapsed". In Excel's time/date format, the value 1.0 equals 1 full day, so this is roughly 1 second of that 1 day.
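The day-fraction arithmetic described above can be sketched with the standard library alone. The helper name timedelta_to_excel_days is hypothetical, not a pandas or openpyxl function; it only shows how a timedelta maps to the kind of number Excel stores.

```python
from datetime import timedelta

SECONDS_PER_DAY = 24 * 60 * 60  # Excel treats 1.0 as one full day


def timedelta_to_excel_days(delta):
    """Return the duration as a fraction of a day."""
    return delta.total_seconds() / SECONDS_PER_DAY


one_second = timedelta_to_excel_days(timedelta(seconds=1))
print(one_second)  # roughly 1.157e-05
```

A measured interval slightly longer than one second, as in the demo code, gives a slightly larger value, which is consistent with the number quoted above.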
If I select the Custom category under Format > Cells > Number (CMD + 1) and specify the custom format [hh]:mm:ss for the cell, the value is displayed as a duration (00:00:01 for this roughly one-second interval).
This is the format I want to see every time I open the file in Excel after writing it.
However, I have looked around for solutions, and I cannot find a way to inherently tell pd.ExcelWriter, df.to_excel, or Openpyxl how to format the datetime.timedelta object in this way.
The Openpyxl documentation gives some very sparse indications:
Handling timedelta values Excel users can use number formats
resembling [h]:mm:ss or [mm]:ss to display time interval durations,
which openpyxl considers to be equivalent to timedeltas in Python.
openpyxl recognizes these number formats when reading XLSX files and
returns datetime.timedelta values for the corresponding cells.
When writing timedelta values from worksheet cells to file, openpyxl
uses the [h]:mm:ss number format for these cells.
How can I accomplish my goal of writing Excel-interpretable time (durations) in the format [hh]:mm:ss?
To achieve this, I do not need to keep the current method of creating a datetime.timedelta object via datetime.now(). If the goal can be reached by using or converting to a datetime object (or similar) and formatting it, I would like to know how.
NB: I am using Python 2 with its latest pandas version 0.24.2 (and the openpyxl version installed with pip is the latest, 2.6.4). I hope that is not a problem as I cannot upgrade to Python 3 and later versions of pandas right now.

It was some time ago I worked on this, but the below solution worked for me in Python 2.7.18 using Pandas 0.24.2 and Openpyxl 2.6.4 from PyPi.
As stated in the question comments, later versions may solve this more elegantly (and there might furthermore be a more elegant way to do it in the old versions I use):
If writing to a new Excel file:
writer = pd.ExcelWriter('./sheet.xlsx', engine='openpyxl')  # the path is the first positional argument
# Writes dataFrame to Writer Sheet, including column header
df.to_excel(writer, sheet_name='Sheet1', index=False)
# Selects which Sheet in Writer to manipulate
sheet = writer.sheets['Sheet1']
# Formats specific cell with desired duration format
cell = 'A2'
sheet[cell].number_format = '[hh]:mm:ss'
# Writes to file on disk
writer.save()
writer.close()
If writing to an existing Excel file:
from openpyxl import load_workbook
file = './sheet.xlsx'
writer = pd.ExcelWriter(file, engine='openpyxl')
# Loads content from existing Sheet in file
workbook = load_workbook(file)
writer.book = workbook  # writer.book may need to be set explicitly like this
writer.sheets = {sheet.title: sheet for sheet in workbook.worksheets}
sheet = writer.sheets['Sheet1']
# Writes dataFrame to Writer Sheet, below the last existing row, excluding column header
df.to_excel(writer, sheet_name='Sheet1', startrow=sheet.max_row, index=False, header=False)
# Updates the row count again, and formats specific cell with desired duration format
# (the last cell in column A)
cell = 'A' + str(sheet.max_row)
sheet[cell].number_format = '[hh]:mm:ss'
# Writes to file on disk
writer.save()
writer.close()
The above code can of course easily be abstracted into one function handling writing to both new files and existing files, and extended to managing any number of different sheets or columns, as needed.

How to get a range of cells while retaining their format from an .xlsx document?

I am trying to print a range of cells from an xlsx file to PDF using the code below:
from win32com import client
xlApp = client.Dispatch("Excel.Application")
books = xlApp.Workbooks.Open('C:\\Users\....\\2020.xlsx')
ws = books.Worksheets['TEST']
ws.Visible = 1
ws.ExportAsFixedFormat(0, 'C:\\Users\.....\\trial.pdf')
This code works fine. However, I would like to tweak it to print only the cells from A1 to K50, and to retain the Excel cell formatting when converting to PDF. Is this possible?

It is very hard to reproduce Excel formatting from Python code with third-party packages. The solution below is useful when the formatting would take a lot of effort to recreate, or when it may change dynamically and you just need to print the sheet the way it looks in Excel.
The trick is to pass IgnorePrintAreas=False in the code below, after setting the range I wanted to print as the print area in Excel itself. This prints only the range I am interested in, with the formatting that is present in Excel.
from win32com import client
xlApp = client.Dispatch("Excel.Application")
books = xlApp.Workbooks.Open('C:\\Users\....\\2020.xlsx')
ws = books.Worksheets['TEST']
ws.Visible = 1
ws.ExportAsFixedFormat(0, 'C:\\Users\.....\\trial.pdf',IgnorePrintAreas=False)

When converting to CSV UTF-8 with python, my date columns are being turned into date-time

When I run the following code
import glob,os
import pandas as pd
dirpath = os.getcwd()
inputdirectory = dirpath
for xls_file in glob.glob(os.path.join(inputdirectory,"*.xls*")):
    data_xls = pd.read_excel(xls_file, sheet_name=0, index_col=None)
    csv_file = os.path.splitext(xls_file)[0]+".csv"
    data_xls.to_csv(csv_file, encoding='utf-8', index=False)
It will convert all xls files in the folder into CSV as I want.
However, in doing so, any dates such as 20/12/2018 are converted to 20/12/2018 00:00:00, which causes major issues with later data processing.
What is going wrong with this?

Nothing is "going wrong" per se. You simply need to provide a custom date_format to df.to_csv:
date_format : string, default None
Format string for datetime objects
In your case that would be
data_xls.to_csv(csv_file, encoding='utf-8', index=False, date_format='%d/%m/%Y')
This will fix the way the raw data is saved to the file. If you open the file in Excel you may still see the full date-time format, because Excel guesses each cell's format from its content. You will need to right-click the column and select another cell format; there is nothing pandas or Python can do about that (as long as you are using to_csv and not to_excel).

If the answers above still don't work, try this:
# %Y matches a four-digit year such as 2018 (%y would expect two digits)
xls_data['date'] = pd.to_datetime(xls_data['date'], format="%d/%m/%Y")
xls_data['date'] = xls_data['date'].dt.date

The original xls file actually stores these fields as datetime values.
When you open it with Excel, you see them formatted the way Excel thinks you want to see them, based on your settings, OS locale, etc.
When Python reads the file, the date cells become Python datetime objects.
CSV files are basically just text; they cannot hold datetime objects.
When Python needs to write a datetime object to a text file, it writes the full text representation.
So you have 2 options:
Change the date column in the original file to text type.
Or, the better option:
Use Python to iterate over these fields and convert them to the text format you would like to see in the CSV.
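A minimal sketch of the second option, written for Python 3 and using only the standard library rather than pandas: each datetime cell is converted to text before the row is written. The function name cell_to_text, the sample rows, and the '%d/%m/%Y' pattern are assumptions for illustration.

```python
import csv
import io
from datetime import datetime


def cell_to_text(value, date_format='%d/%m/%Y'):
    """Format datetime cells as text; pass other values through unchanged."""
    if isinstance(value, datetime):
        return value.strftime(date_format)
    return value


rows = [
    ['name', 'date'],
    ['walla', datetime(1988, 12, 10)],
    ['cool', datetime(1999, 12, 10)],
]

buffer = io.StringIO()  # stands in for a real file opened with newline=''
writer = csv.writer(buffer)
for row in rows:
    writer.writerow([cell_to_text(cell) for cell in row])

print(buffer.getvalue())
```

Because the conversion happens before the writer sees the value, the CSV contains exactly the text you chose and no time component.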
I just tried to reproduce your issue with no success:
>>> import pandas as pd
>>> xls_data = pd.read_excel('test.xls', sheet_name=0, index_col=None)
>>> xls_data
    name       date
0  walla 1988-12-10
1   cool 1999-12-10
>>> xls_data.to_csv(encoding='utf-8', index=False)
'name,date\nwalla,1988-12-10\ncool,1999-12-10\n'
P.S. Any time you deal with datetime objects, you should test the result to see whether anything changes based on your PC's locale settings.

Preserving Rich Text Formatting in Excel via Python

I'm trying to add rows to an Excel file via Python (this needs to run and refresh daily). The Excel file is essentially a template. At the top are some cells in which only some of the words have specific formatting, e.g. a cell with the value "That cat is fluffy" where part of the text is formatted differently.
I can't find a way to get Python and Excel to work together to preserve that formatting: the format of the first letter in the cell gets applied to the whole cell.
From what I can tell, this is an issue with preserving rich text, but I haven't been able to find a package that can preserve rich text while reading and writing Excel files.
I followed this thread to come up with the code below: writing to existing workbook using xlwt
However, it looks like the copy step from the xlutils package isn't preserving the rich text formatting.
import xlwt
import xlrd
from xlutils.copy import copy
rb = xlrd.open_workbook(templateFile,formatting_info=True)
r_sheet = rb.sheet_by_index(0)
wb = copy(rb)
w_sheet = wb.get_sheet(0)
xlsfile = Infile
insheet = xlrd.open_workbook(xlsfile,formatting_info=True).sheets()[0]
outrow_idx = 10
for row_idx in xrange(insheet.nrows):
    for col_idx in xrange(insheet.ncols):
        w_sheet.write(outrow_idx, col_idx,
                      insheet.cell_value(row_idx, col_idx))
    outrow_idx += 1
wb.save(Outfile)

Please refer to the following thread, as it may help you keep the formatting:
Preserving styles using python's xlrd,xlwt, and xlutils.copy
Note that it doesn't keep the cell comments.

Format csv cells as text with python

I am given a row of data to write to a csv file. The values are mostly floats. But when the file is opened, the cells default to a custom format, so an input number like 3.25 is displayed as "Mar 25". How can I avoid this?
This is the piece of code:
import csv
data = [0.21, 3.25, 25.9, 5.2]
f = open('Boot.csv','w')
out = csv.writer(f, delimiter=';', quoting=csv.QUOTE_NONE)
out.writerow(data)
f.close()  # flush the row to disk

The csv module is writing the data fine - I'm guessing that you're opening it in Excel to look at the results and that Excel is deciding to autoformat it as a date.
It's an Excel issue: you need to tell it not to play around with that field by changing the cell format to Text (or anything that isn't General).
If you're writing Excel data, you may want to look at the xlwt module (check out the very useful site http://www.python-excel.org/) - then your value types will not be so liable to fluctuate.

This is not an issue, just MS Excel trying to 'help'. If you are going to process the output csv file programmatically, you'll have no issues.
If you have to process or view the data in Excel, you may want to quote all data (by using csv.QUOTE_ALL rather than csv.QUOTE_NONE), in which case Excel should treat everything as text and not try to be 'helpful'.

This isn't part of csv. csv is nothing more than comma separated values. If you open the file in notepad, it'll be as you expect.
When you open it in excel, it makes a guess as to what each value represents, since this information isn't and can't be encoded in the CSV file. For whatever reason, excel decides 3.25 represents a date, not a number.
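To check what the csv module actually writes, independent of Excel, a short sketch like the following prints the raw text. It is written for Python 3 and uses an in-memory buffer as a stand-in for Boot.csv; the variable names are illustrative.

```python
import csv
import io

data = [0.21, 3.25, 25.9, 5.2]

buffer = io.StringIO()  # in-memory stand-in for the file on disk
out = csv.writer(buffer, delimiter=';', quoting=csv.QUOTE_NONE)
out.writerow(data)

raw_text = buffer.getvalue()
print(repr(raw_text))
```

The value 3.25 is stored as the plain text 3.25; any date interpretation happens later, when a spreadsheet application opens the file.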

Try using a format that can't be misinterpreted as a date:
out.writerow(['%.12f' % item for item in data])
This will include trailing zeros so it should always be parsed by Excel as a number.

This is not a problem with the code you've written; it's with Excel (which you're likely using to open the CSV)--it's interpreting 3.25 as March 25. You can fix this by selecting the affected cells, right-clicking and pressing "Format Cells", and then in the "Number" tab selecting "Number" as your category, ensuring that you have the proper number of decimal places displayed.

If your only problem is Excel importing the CSV strangely, then you should write XLSX files directly instead of CSV. This gives you full control over how cell content is interpreted.
The best package I have used so far for writing Excel files in Python is openpyxl (even recommended by the author of the more widely used xlwt package).
Some example code taken from the openpyxl docs:
from openpyxl import Workbook
wb = Workbook()
# grab the active worksheet
ws = wb.active
# Data can be assigned directly to cells
ws['A1'] = 42
# Rows can also be appended
ws.append([1, 2, 3])
# Python types will automatically be converted
import datetime
ws['A2'] = datetime.datetime.now()
# Save the file
wb.save("sample.xlsx")
