I want the '000100' value to stay the same in the csv file, but it changes to 100 in Excel.
The problem is Excel dropping those zeros, not really the csv. I found this somewhere; perhaps add it and see if it works?
df['column'] = df['column'].apply('="{}"'.format)
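For instance, a minimal sketch of applying that trick before writing the CSV (the column and file names here are made up):

import pandas as pd

# hypothetical data: codes whose leading zeros must survive a round trip through Excel
df = pd.DataFrame({'code': ['000100', '000200']})

# wrap each value as ="000100"; Excel evaluates the field as a formula that returns the literal text
df['code'] = df['code'].apply('="{}"'.format)
df.to_csv('codes.csv', index=False)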
The csv module isn't doing this; it's the spreadsheet program you loaded it with (the csv module was passed a string after all; it can't even consider dropping leading zeroes). If you open the CSV file in a plain text editor you'll see this.
Given that, your options are:
Figure out if your spreadsheet program can customize the import rules to avoid screwing with the data (or after loading, change the column's interpretation rules to something that explicitly says it's text, not a numeric or auto-deduced type).
If the program supports recognizing quoted fields as non-numeric, pass quoting=csv.QUOTE_NONNUMERIC to the csv.writer so it quotes all the fields that aren't of numeric type (e.g. these strings), which will hopefully stop the editor from treating them as numeric (and stripping the leading zeros).
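A minimal sketch of that second option with the standard csv module (the data and file name here are invented):

import csv

rows = [['id', 'qty'], ['000100', 3], ['000200', 7]]  # hypothetical data: string ids, numeric quantities

with open('out.csv', 'w', newline='') as f:
    # QUOTE_NONNUMERIC wraps every non-numeric field in quotes, so '000100' is written as "000100"
    writer = csv.writer(f, quoting=csv.QUOTE_NONNUMERIC)
    writer.writerows(rows)

Whether the spreadsheet actually honours the quoting is still up to the spreadsheet, as noted above.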
The csv is probably fine. But you should open it in a text editor (not Excel) and see for yourself.
Excel will try to remove leading zeros, but you can force them back by adding a custom format to your column:
A custom format of 000000 will force 6 digits to be shown:
Highlight all the cells you want to format.
Click on the number format box.
Select More Number Formats.
Select Custom.
In the Type: box, enter 000000.
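If the data ends up in an .xlsx file produced from Python anyway (a plain CSV cannot store cell formats), the same custom format can be applied in code. A sketch assuming openpyxl, with made-up file and column names:

from openpyxl import load_workbook

wb = load_workbook('data.xlsx')       # hypothetical workbook
ws = wb.active
for cell in ws['A'][1:]:              # hypothetical column A, skipping the header row
    cell.number_format = '000000'     # same custom format as above: always show 6 digits
wb.save('data.xlsx')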
(This is a mix between a code and a 'user' issue, but since I suspect the issue is code, I opted to post on Stack Overflow instead of Super User.)
I generated a .csv file with the pandas.DataFrame.to_csv() method. This file consists of 2 columns: one is a label (text) and the other is a numeric value called accuracy (float). The delimiter used to separate columns is a comma (,) and all float values are stored with a dot as the decimal separator, like this: 0.9438245862
Even though I saved this column as float, Excel and Google Sheets infer its type as text. And when I try to format this column as a number, they ignore the "0." and return a very high value instead of decimals, like:
(text) 0.9438245862 => (number) 9438245862,00
I double-checked my .csv file by re-importing it with pandas.read_csv() and printing dataframe.dtypes, and the column is imported as float successfully.
I'd appreciate some guidance on what I'm missing.
Thanks,
By itself, the csv file should be correct. Both you and Pandas know what the delimiter and floating point format are. But Excel might not agree with you, depending on your locale. A simple way to make sure is to create a tiny Excel sheet containing, on the first row, one text value and one floating point value. Then export the file as csv and check what the delimiter and floating point formats are.
AFAIK, it is much easier to change your Python code to follow what your Excel expects than to try to explain to Excel that the format of CSV files can vary...
I know that you can change the delimiter and floating point format in the current locale on a Windows system. The catch is that it is a global setting...
A short example of data would be most useful here. Otherwise we have no idea what you're actually writing/reading. But I'll hazard a guess based on the information you've provided.
The pandas dataframe will have column names. These column names will be text. Unless you tell Excel/Sheets to use the first row as column names, it will have to treat the column as text. If this isn't the case, could you perhaps save the head of the dataframe to a csv, check it in a text editor, and see how Excel/Sheets imports it? Then include those five rows and two columns in your follow-up.
The coding is not necessarily the issue here, but a combination of various factors. I am assuming that your computer is not using the dot character as a decimal separator, due to your language settings (for example, French, Dutch, etc.). Instead, your computer (and thus also Excel) is likely using a comma as a decimal separator.
If you want to open the data from your analysis later in Excel with little to no changes, you can either change how Excel works or change how you store the data in the CSV file.
Choosing the latter, you can specify the decimal character for the df.to_csv method via its "decimal" keyword. You should then also remember to change the decimal character when importing your data again (if you want to read the data back).
Continuing with the approach of adapting your Python code, you can use the following snippet to change how you write the dataframe to a csv:
import pandas as pd
# ... some transformations here ...
df.to_csv('myfile.csv', decimal=',')
If you then want to read that output file back in with Python (using Pandas), you can use the following:
import pandas as pd
df = pd.read_csv('myfile.csv', decimal=',')
I'm working on automating back-end Python data processing for an Excel data set provided to my system. One of the columns should be formatted as text containing alphanumeric characters along with hyphen ('-') and period ('.') delimiters.
Excel is auto-formatting this column and converting two numeric values separated by a hyphen into a date. So when I load the file with the Pandas read_excel() function, it picks up the Excel formatting and causes unwanted behavior.
For example: a value entered as "4-5" auto-converts to "5-Apr" (April 5, 2019), but I want it to stay as "4-5". Of course I could just open the file manually and change the column to text, but that defeats the goal of full automation.
Using pandas.read_excel(), the column is dtype=object. I tried converting to str, but it just keeps the Excel format. Then I tried converting to int followed by str, but the alpha characters throw an error.
Is it possible to make this work on the raw Excel file in Python or do I need to ask the owner of the data source to force the desired formatting?
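One thing that may be worth trying, purely as a sketch: force pandas to read the column as text so it is not re-coerced on the Python side. The column name below is hypothetical, and this only helps if the cells still contain text; if Excel has already stored "4-5" as a real date value, the original text is gone from the file and has to be fixed at the source (e.g. by having the owner pre-format the column as Text).

import pandas as pd

# 'part_no' is a hypothetical column name; dtype=str keeps pandas from re-interpreting the values
df = pd.read_excel('input.xlsx', dtype={'part_no': str})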
I'm using to_csv to save a dataframe which looks like this:
PredictionIdx CustomerInterest
0 fe789a06f3 0.654059
1 6238f6b829 0.654269
2 b0e1883ce5 0.666289
3 85e07cdd04 0.664172
in which I have the value '0e15826235' in the first column. I'm writing this dataframe to csv using pandas to_csv(). But when I open this csv in Google Sheets, Excel, or LibreOffice, it shows 0E in Excel and 0 in LibreOffice. It is giving me a problem during submission on Kaggle. One point to note is that when I read the same csv back with pandas read_csv, it shows the above value correctly in the dataframe.
As noted in the first comment, the error results from your choice of editor. Many editors will use some version of scientific notation that reads an e (in specific places, like the second character) as an indicator of an exponent. Excel, for instance, will read it as "X times 10 raised to the power Y", where X is the number before the e and Y is the number after the e. This is a brief description of Excel's scientific notation.
This does not happen in the other cell entries because they contain other string-like characters. Excel, LibreOffice, and possibly Google Sheets attempt to interpret what the entry is, rather than taking it literally.
In your question you write '0e15826235' with single quotes, indicating that it might be a string, but this might be something to make sure of when writing out the values to a file -- Excel and the rest might not know this is meant to be a string literal.
In general, check the format of the value and consider what your eventual editor might "think" it is when it opens the file. For Excel specifically, a single quote character at the start of the string will force Excel to read it as a string. See this answer.
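As a quick illustration of that reading, Python's own float parser agrees with the spreadsheet: a zero mantissa times any power of ten is just zero.

float('0e15826235')   # interpreted as scientific notation: 0 x 10**15826235
0.0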
For me, the code below works correctly with Google Sheets:
import pandas as pd
df = pd.DataFrame({'PredictionIdx': ['fe789a06f3',
'6238f6b829',
'b0e1883ce5',
'85e07cdd04'],
'CustomerInterest': [0.654059,
0.654269,
0.666289,
0.664172]})
df.to_csv('./test.csv', index = None)
Also, csv is a very simple text format; it doesn't hold any information about data types.
So you could use df.to_excel() as Nihal suggested, or adjust the column type settings in your favourite spreadsheet viewer.
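For completeness, a sketch of the xlsx route with the dataframe above (it assumes an Excel writer such as openpyxl is installed; the file name is made up):

df.to_excel('test.xlsx', index=False)   # xlsx cells carry a type, so '0e15826235' stays text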
I am trying to create a csv using the Python csv.DictWriter and open it with Excel 2013 so that everything is a string (left-justified by default), and leave it in the *.csv format in Excel, too.
To get strings everywhere I use the QUOTE_ALL feature. To connect with Excel, I use the default Excel dialect. However, I get a name error when I put a string starting with a + sign into a cell, and fields that are numeric still get treated as numbers (even within the quotes). I can get rid of the name error by inserting a blank before the plus sign, but I really do not want the blank there.
I think that the way to get Excel to accept all this is to pass the strings with a leading equals sign before the opening double quote, i.e.:
="Field 1",="2",="+Third Field Here", ...
Is there any way to get the Python libraries (latest 3.6.1 release) to accomplish something like this? I like the idea of a bulletproof library instead of rolling my own, as I have no control over what shows up within my data fields, but is this a covered use case for the standard library?
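One sketch of that idea combines the csv.DictWriter setup from the question with the ="..." wrapping mentioned earlier (field names and data are invented, and it assumes the values don't themselves contain double quotes). The writer will quote-escape the wrapper, but Excel generally still evaluates the embedded ="..." formula and displays the literal text; it's worth verifying against your Excel version.

import csv

rows = [{'f1': 'Field 1', 'f2': '2', 'f3': '+Third Field Here'}]   # hypothetical data

with open('out.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['f1', 'f2', 'f3'], quoting=csv.QUOTE_ALL)
    writer.writeheader()
    for row in rows:
        # wrap each value as ="..." so Excel treats it as a formula returning the literal text
        writer.writerow({k: '="{}"'.format(v) for k, v in row.items()})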
I have a Python 3 script that is loading some data into an Excel file on a Windows machine. I need the cell, not just the number, to be formatted as Currency.
I can use the following format to set the Number format for a cell:
sheet['D48'].number_format = '#,##0'
However, when I try a similar approach using the number format for Currency:
sheet['M48'].number_format = '($#,##0.00_);[Red]($#,##0.00)'
I get this for the custom format. Notice the extra backslashes; they are being added to the format, so it does not match the pre-defined Currency style.
(\$#,##0.00_);[Red](\$#,##0.00)
I have seen this question and used it to get this far. However the answer does not solve the extra backslash issue I am seeing.
Set openpyxl cell format to currency
I just formatted the value before placing it into the cell.
"${:10,.2f}".format(7622086.82)
'$7,622,086.82'
I formatted the cell in Excel, and then copied the format.
This worked for me:
.number_format = '[$$-409]#,##0.00;[RED]-[$$-409]#,##0.00'
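Applied in context, a sketch with a made-up file and cell, assuming openpyxl as in the question:

from openpyxl import load_workbook

wb = load_workbook('report.xlsx')    # hypothetical workbook
sheet = wb.active
sheet['M48'] = 7622086.82
# [$$-409] embeds the US-English dollar symbol directly in the format code
sheet['M48'].number_format = '[$$-409]#,##0.00;[RED]-[$$-409]#,##0.00'
wb.save('report.xlsx')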