I am writing a row of data to a CSV file. The values are mostly floats. But when the file is opened, the cells default to a custom format, so an input number like 3.25 is displayed as "Mar 25". How can I avoid this?
This is the piece of code:
import csv

data = [0.21, 3.25, 25.9, 5.2]
# newline='' lets the csv module control line endings (Python 3)
with open('Boot.csv', 'w', newline='') as f:
    out = csv.writer(f, delimiter=';', quoting=csv.QUOTE_NONE)
    out.writerow(data)
The csv module is writing the data fine - I'm guessing that you're opening it in Excel to look at the results and that Excel is deciding to autoformat it as a date.
It's an Excel issue: you need to tell Excel not to reinterpret that field by changing the cell format to Text (or anything that isn't General).
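One way to confirm that the csv module is not the culprit is to look at the raw text it produces, with no spreadsheet in between. This is a minimal sketch that writes to an in-memory buffer (io.StringIO) instead of Boot.csv; the buffer is an assumption made only to keep the example self-contained.

```python
import csv
import io

data = [0.21, 3.25, 25.9, 5.2]

# Write to an in-memory buffer instead of a file so the raw text can be inspected.
buffer = io.StringIO()
out = csv.writer(buffer, delimiter=';', quoting=csv.QUOTE_NONE)
out.writerow(data)

raw_text = buffer.getvalue()
# repr() makes the separators and the line terminator visible.
print(repr(raw_text))
```

The value 3.25 appears in the text as the characters 3.25; any "Mar 25" rendering happens later, in whatever program displays the file.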
If you're writing Excel data, you may want to look at the xlwt module (check out the very useful site http://www.python-excel.org/) - then your value types will not be so liable to fluctuate.
This is not an issue, just MS Excel trying to 'help'. If you are going to programmatically process the output csv file further, you'll have no issues.
If you have to process or view the data in Excel, you may want to quote all data by using csv.QUOTE_ALL rather than csv.QUOTE_NONE, in which case Excel should treat everything as text and not try to be 'helpful'.
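For reference, this is roughly what the quoted output looks like. The sketch reuses the data from the question and writes to an in-memory buffer (an assumption, to avoid touching the file system). Whether a given Excel version then treats the quoted values as text is not something this code can verify.

```python
import csv
import io

data = [0.21, 3.25, 25.9, 5.2]

buffer = io.StringIO()
# QUOTE_ALL wraps every field in double quotes, numbers included.
out = csv.writer(buffer, delimiter=';', quoting=csv.QUOTE_ALL)
out.writerow(data)

quoted_text = buffer.getvalue()
print(quoted_text)
```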
This isn't part of CSV. CSV is nothing more than comma-separated values. If you open the file in Notepad, it will look as you expect.
When you open it in Excel, Excel guesses what each value represents, since this information isn't and can't be encoded in the CSV file. For whatever reason, Excel decides that 3.25 represents a date, not a number.
Try using a format that can't be misinterpreted as a date:
out.writerow(['%.12f' % item for item in data])
This will include trailing zeros so it should always be parsed by Excel as a number.
This is not a problem with the code you've written; it's with Excel (which you're likely using to open the CSV)--it's interpreting 3.25 as March 25. You can fix this by selecting the affected cells, right-clicking and pressing "Format Cells", and then in the "Number" tab selecting "Number" as your category, ensuring that you have the proper number of decimal places displayed.
If all your problem is Excel importing CSV strangely, then you should directly write XLSX files instead of CSV. This gives you full control over the interpretation of cell content.
The best package I have used so far for writing Excel files in Python is openpyxl (even recommended by the author of the more widely used xlwt package).
Some example code taken from the openpyxl docs:
from openpyxl import Workbook
wb = Workbook()
# grab the active worksheet
ws = wb.active
# Data can be assigned directly to cells
ws['A1'] = 42
# Rows can also be appended
ws.append([1, 2, 3])
# Python types will automatically be converted
import datetime
ws['A2'] = datetime.datetime.now()
# Save the file
wb.save("sample.xlsx")
From Python, I want to export a dataframe to CSV format.
The dataframe contains two columns like this
So when I write this:
df['NAME'] = df['NAME'].astype(str) # or .astype('string')
df.to_csv('output.csv',index=False,sep=';')
When the CSV output is opened in Excel, the value "MAY8218" is read as a date and shown as "may-18", while I want it to be read as "MAY8218".
I've tried many approaches but none of them works. I don't want a workaround such as putting quotation marks around the value.
Thanks.
If you want to export the dataframe for use in Excel, just export it as XLSX. This works for me and keeps the value as a string in its original form.
df.to_excel('output.xlsx',index=False)
The CSV format is a text format. The file contains no hint for the type of the field. The problem is that Excel has the worst possible support for CSV files: it assumes that CSV files always use its own conventions when you try to read one. In short, one Excel implementation can only read correctly what it has written...
That means that you cannot prevent Excel from interpreting the CSV data the way it wants, at least when you open a CSV file directly. Fortunately you have other options:
Import the CSV file instead of opening it. This way you get options to configure how the file should be processed.
Use LibreOffice Calc for processing CSV files. LibreOffice is a little behind Microsoft Office on most points, except for CSV file handling, where it has excellent support.
So I have a CSV file with a column called reference_id. The values in reference_id are 15 characters long, so something like '162473985649957'. When I open the CSV file, Excel has set the data type to General and the numbers show up as something like '1.62474E+14'. To fix this in Excel, I change the column type to Number and remove the decimals, and it displays the correct value. I should add that it only does this with the CSV file; if I output to xlsx, it works fine. The problem is, the file has to be CSV.
Is there a way to fix this using Python? I'm trying to automate a process. I have tried using the following to convert the column to a string. It works in the sense that it converts the column to a string, but the value still shows up incorrectly when the CSV file is opened.
df['reference_id'] = df['reference_id'].astype(str)
df.to_csv(r'Prev Day Branch Transaction Mems.csv')
Thanks
When I open the CSV file, excel has changed the data
This is an Excel problem. You can't fix how Excel decides to interpret your CSV. (You can work around some issues by using the text import format, but that's cumbersome.)
Either use XLS/XLSX files when working with Excel, or use e.g. Gnumeric or something else that doesn't wantonly mangle your data.
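The reason astype(str) makes no difference can be illustrated without pandas. The sketch below uses the standard csv module as a stand-in (an assumption; the question uses DataFrame.to_csv) and a hypothetical helper, to_csv_text, to show that an integer and its string form produce exactly the same CSV text, so Excel sees no difference.

```python
import csv
import io

def to_csv_text(rows):
    """Hypothetical helper: return the CSV text the csv module writes for rows."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()

# The same reference_id, once as an int and once as a str.
as_int = to_csv_text([["reference_id"], [162473985649957]])
as_str = to_csv_text([["reference_id"], ["162473985649957"]])

# CSV carries no type information, so both variants are the same text.
print(as_int == as_str)
```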
import pandas as pd
check = pd.read_csv('1.csv')
nocheck = check['CUSIP'].str[:-1]
nocheck = nocheck.to_frame()
nocheck['CUSIP'] = nocheck['CUSIP'].astype(str)
nocheck.to_csv('NoCheck.csv')
This works but while writing the csv, a value for an identifier like 0003418 (type = str) converts to 3418 (type = general) when the csv file is opened in Excel. How do I avoid this?
I couldn't find a dupe for this question, so I'll post my comment as a solution.
This is an Excel issue, not a python error. Excel autoformats numeric columns to remove leading 0's. You can "fix" this by forcing pandas to quote when writing:
import csv
# insert pandas code from question here
# use csv.QUOTE_ALL when writing CSV.
nocheck.to_csv('NoCheck.csv', quoting=csv.QUOTE_ALL)
Note that this will actually put quotes around each value in your CSV. It will render the way you want in Excel, but you may run into issues if you try to read the file some other way.
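To see what the quoting does to the file and what "reading it some other way" can mean, here is a small sketch using the standard csv module in place of pandas (an assumption, to keep it dependency-free). A CSV-aware reader strips the quotes again, while a naive line split keeps them.

```python
import csv
import io

rows = [["CUSIP"], ["0003418"]]

buffer = io.StringIO()
# QUOTE_ALL puts double quotes around every field.
csv.writer(buffer, quoting=csv.QUOTE_ALL).writerows(rows)
quoted_text = buffer.getvalue()

# A CSV-aware reader removes the quotes and keeps the leading zeros.
read_back = list(csv.reader(io.StringIO(quoted_text)))

# A naive reader that just splits lines sees the quote characters.
naive_value = quoted_text.splitlines()[1]

print(read_back, naive_value)
```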
Another solution is to write the CSV without quoting and load it through Excel's text import, setting the column format to "Text" so that the leading zeros are kept.
2 Questions to ask:
Ques 1:
I just started studying about xlrd for reading excel file in python.
I was wondering if there is a method in xlrd similar to get_active_sheet() in openpyxl, or any other way to get the active sheet?
get_active_sheet() works like this in openpyxl:
import openpyxl
wb = openpyxl.load_workbook('example.xlsx')
active_sheet = wb.get_active_sheet()
output : Worksheet "Sheet1"
I had found methods in xlrd for retrieving the names of sheets, but none of them could tell me the active sheet.
Ques 2:
Is xlrd the best package in Python for reading Excel files? I also came across this, which had info about other Python packages (xlsxwriter, xlwt, xlutils) for reading and writing Excel files.
Which of the above would be best for making an app that reads an Excel file and applies different validations to different columns?
For example: a column with the header 'ID' should have unique values, and a column with the header 'Country' should contain valid countries.
By "active sheet" you seem to be referring to the last sheet selected when the workbook was saved or closed. You can get this sheet via the sheet_visible value.
import xlrd
xl = xlrd.open_workbook("example.xls")
for sht in xl.sheets():
    # sht.sheet_visible value of 1 is "active sheet"
    print(sht.name, sht.sheet_selected, sht.sheet_visible)
Usually only one sheet is selected at a time, so it may look like sheet_visible and sheet_selected are the same, but multiple sheets can be selected at a time (ctrl+click multiple sheet tabs, for example).
Another reason this may seem confusing is because Excel uses "visible" in terms of hidden/visible sheets. In xlrd, this is instead sheet.visibility (see https://stackoverflow.com/a/44583134/4258124)
Welcome to Stack Overflow.
I have been working with Excel files in Python for a while now, so I could help you with your question, I think.
openpyxl and xlrd solve different problems, one is for xlsx files (Excel 2007+), where the other one is for xls files (Excel 1997-2003), respectively.
Xenon said in his answer that Excel doesn't recognize the concept of an active sheet, which is not totally true. If you open an Excel document, go to some other sheet (that isn't the first one) and save and close the document, the next time you open it, Excel will open the document on the last sheet you were on.
However, xlrd does not support this kind of workflow, i.e. asking for the active sheet. If you know the sheet name, then you could use the method sheet_by_name, or if you know the sheet index, you could use the method sheet_by_index.
I don't know if the xlrd is the best package around, but it is pretty solid, and I have had nary a problem using it.
The example given could be solved by first iterating through the first row and building a dictionary that maps each header to its column index. Then store all the values in the ID column in a list and compare the length of that list with the length of a set created from it, i.e. len(values) == len(set(values)). Following that, you could iterate through the column with the header Country and check whether each value is in a dictionary you previously made with all the valid countries.
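The validation described above might look roughly like the following. This is only a sketch: the rows are hard-coded sample data (in practice they could be read with xlrd, for instance one list per row), the function name validate and the set of valid countries are hypothetical, and a set is used for the country lookup instead of a dictionary.

```python
def validate(rows, valid_countries):
    """Check ID uniqueness and country validity for rows with a header row."""
    header, *body = rows
    # Map each header name to its column index.
    column_of = {name: index for index, name in enumerate(header)}

    # IDs are unique when the list and the set built from it have the same length.
    ids = [row[column_of["ID"]] for row in body]
    ids_unique = len(ids) == len(set(ids))

    # Collect every country value that is not in the allowed set.
    countries = [row[column_of["Country"]] for row in body]
    bad_countries = [c for c in countries if c not in valid_countries]
    return ids_unique, bad_countries

# Hypothetical sample data standing in for the contents of a worksheet.
sample_rows = [
    ["ID", "Country"],
    [1, "France"],
    [2, "Atlantis"],
    [2, "Japan"],
]
ids_unique, bad_countries = validate(sample_rows, {"France", "Japan"})
print(ids_unique, bad_countries)
```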
I hope this answer suits your needs.
Summary: Stick with xlrd because it is mature enough.
You can see all worksheets in a given workbook with the sheet_names() function. Excel has no concept of an "active sheet", but if my assumption that you are referring to the first sheet is correct, you can get the first element of sheet_names() to get the "active sheet."
With regards to your second question, it's not easy to say that a package is better than another package objectively. However, xlrd is widely used, and the most popular Python library for what it does.
I would recommend sticking with it.
I'm trying to write some dates from one excel spreadsheet to another. Currently, I'm getting a representation in excel that isn't quite what I want such as this: "40299.2501157407"
I can get the date to print out fine to the console, however it doesn't seem to work right writing to the excel spreadsheet -- the data must be a date type in excel, I can't have a text version of it.
Here's the line that reads the date in:
date_ccr = xldate_as_tuple(sheet_ccr.cell(row_ccr_index, 9).value, book_ccr.datemode)
Here's the line that writes the date out:
row.set_cell_date(11, datetime(*date_ccr))
There isn't anything being done to date_ccr in between those two lines other than a few comparisons.
Any ideas?
You can write the floating point number directly to the spreadsheet and set the number format of the cell. Set the format using the num_format_str of an XFStyle object when you write the value.
https://secure.simplistix.co.uk/svn/xlwt/trunk/xlwt/doc/xlwt.html#xlwt.Worksheet.write-method
The following example writes the date 01-05-2010. (Also includes time of 06:00:10, but this is hidden by the format chosen in this example.)
import xlwt
# d can also be a datetime object
d = 40299.2501157407
wb = xlwt.Workbook()
sheet = wb.add_sheet('new')
style = xlwt.XFStyle()
style.num_format_str = 'DD-MM-YYYY'
sheet.write(5, 5, d, style)
wb.save('test_new.xls')
There are examples of number formats (num_formats.py) in the examples folder of the xlwt source code. On my Windows machine: C:\Python26\Lib\site-packages\xlwt\examples
You can read about how Excel stores dates (third section on this page): https://secure.simplistix.co.uk/svn/xlrd/trunk/xlrd/doc/xlrd.html
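As a rough illustration of that storage scheme, the float from the question can be converted to a Python datetime by hand. This sketch assumes the workbook uses the 1900 date system (xlrd datemode 0) and that the serial number falls after February 1900, where counting days from 1899-12-30 gives the right answer. The names EXCEL_EPOCH_1900 and serial_to_datetime are made up for this example.

```python
from datetime import datetime, timedelta

# Assumption: 1900 date system; this offset is valid for serials after Feb 1900.
EXCEL_EPOCH_1900 = datetime(1899, 12, 30)

def serial_to_datetime(serial):
    """Convert an Excel serial date (days, with the time as a fraction) to datetime."""
    return EXCEL_EPOCH_1900 + timedelta(days=serial)

converted = serial_to_datetime(40299.2501157407)
# The integer part is the day, the fractional part is the time of day.
print(converted)
```

The result falls on 1 May 2010 at about 06:00:10, matching the date and hidden time mentioned above. For real workbooks, xlrd's own xldate_as_tuple (used in the question) handles the datemode for you.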