Changing dot to comma with openpyxl - python

I want to use openpyxl to work with an Excel file.
Why doesn't the dot change into a comma?
I added this minimal reproducible example:
My Excel file:
111.11
111.12
My Code:
import openpyxl
def someFunction():
    wb = openpyxl.load_workbook('test.xlsx')
    ws = wb.active
    for cell in ws['A']:
        cell.number_format = 'Comma'
        print(cell.number_format)
        print(cell.value)
    wb.save('test.xlsx')
someFunction()
Note: I tried different number_format values, such as #,##0, and they didn't work either.
Expected Output:
Comma
111,11
Comma
111,12
Actual Output:
Comma
111.11
Comma
111.12

I don't have a direct answer for you, but I have been troubleshooting this for some time and have some takeaways that may help.
You are definitely targeting and affecting the Excel sheet.
If you type the desired output 111,111 into a cell and then check that cell's number format, you'll find it is not Comma but #,##0. With that format you can display the value 111111 as 111,111. However, you cannot turn the value 111.111 into 111,111 with it: the cell just shows 111 and hides the three decimal digits. This appears to be because the comma in #,##0 is a thousands separator and the format defines no decimal places.
When I use your method and check the resulting Excel file, the cell shows a text error and actually contains a date. Something strange is happening here.
I tried a lot of steps, including running the code outside a function, first formatting the cells as numbers before changing to Comma, targeting individual cells and so on, to no avail. The Comma format also seems to produce an error every time, so perhaps it's just not supported?
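One possible reading, offered as a sketch rather than a confirmed explanation: number_format only controls how Excel displays a cell, so cell.value read back in Python is still the float 111.11 and prints with a dot. If the goal is comma-decimal text on the Python side, the value can be formatted explicitly. The helper name format_decimal_comma is hypothetical and not part of openpyxl.

```python
def format_decimal_comma(value, places=2):
    """Render a number with a comma as the decimal separator (illustrative helper)."""
    # Format with a dot first, then swap the separator.
    return f"{value:.{places}f}".replace(".", ",")


# The two sample values from the question's sheet.
for value in (111.11, 111.12):
    print(format_decimal_comma(value))
```

Inside the workbook itself, a numeric format such as '0.00' keeps the cell a number. As far as I know, Excel then draws the decimal separator according to the regional settings of the machine that opens the file, so this is worth checking there before converting anything to text.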

Related

If there is a regex match, append to list

I am trying to check a column of an Excel file for values in a given format and, if there is a match, append it to a list. Here is my code:
from openpyxl import load_workbook
import re
# Open file and read column with PBSID.
PBSID = []
wb = load_workbook(filename="FILE_PATH", data_only=True)
sheet = wb.active
for col in sheet["E"]:
    if re.search("\d{3}[-\.\s]??\d{5}", str(col)):
        PBSID.append(col.value)
print(PBSID)
Column E of the Excel file contains IDs like 431-00456 that I would like to append to the list named PBSID.
Expected result: PBSID list to be populated with ID in regex mask XXX-XXXXX.
Actual result: Output is an empty list ("[]").
Am I missing something? (I know there are more elegant ways of doing this, but I am relatively new to Python and very open to criticism.)
Thanks!
Semantically, I think the for loop should be written as:
for row in sheet["E"]:
since sheet["E"] already refers to column E, so each item is one row of that column.
Without seeing the exact data in a cell, my guess is that the value is not reaching the regex as the text you expect.
If something like 256-23690 were evaluated as arithmetic before being converted, str() would give '-23434', and the regular expression would find no match in '-23434' (hence no results). Make sure the pattern is written as a raw string and that the cell content is treated as text.
You also pass the whole object to str() in 'str(col)', but then use its value separately in 'PBSID.append(col.value)'. It's best to refer to the same thing in both places, whichever is more suitable in your case.
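To make the last point concrete, here is a minimal sketch that matches against the cell values rather than the cell objects. Calling str() on an openpyxl cell object gives its object representation (something like <Cell 'Sheet1'.E1>), not its content, which would explain the empty list. The function name collect_ids and the sample values are assumptions for illustration. The pattern is written as a raw string and uses a plain optional separator instead of the lazy ?? from the question.

```python
import re

# Three digits, an optional separator (dash, dot or whitespace), five digits.
ID_PATTERN = re.compile(r"\d{3}[-.\s]?\d{5}")


def collect_ids(values):
    """Return the values whose text form contains an ID such as 431-00456."""
    return [v for v in values if v is not None and ID_PATTERN.search(str(v))]


# With openpyxl this would be called as:
#   collect_ids(cell.value for cell in sheet["E"])
sample = ["431-00456", None, "n/a", "12-345", "999 12345"]
print(collect_ids(sample))
```

Empty cells come back as None from openpyxl, which is why the sketch skips them before converting to text.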

Using pandas in Python to export a user input string split into multiple rows

I have a column in Excel with ~11,500 rows that I'm using to build SAS formatting code. Unfortunately, many of the rows have apostrophes in the text, which clash with the apostrophes I added in Excel when I copy the text over.
I thought about the problem and figured, why not strip the excess apostrophes in Python? I ran into a similar problem with the apostrophes interfering with Python, but I was able to work around it by reading the string as input. Up to my conditional statement, everything works as expected: I input multiple words with apostrophes and it returns them without.
At this point, I want to export the apostrophe-stripped results back to Excel (I'm aware that I could copy and paste, but I don't want to risk losing any data). I was able to do this using pandas; the problem is that when I define my data to export, it puts all of the data into one cell in one row, whereas I want the data returned in the original ~11,500 rows. I tried using .split, but with no success, so I'm sure that isn't the way to go about it. Any suggestions? See below:
#Apostrophe Remover
#Ask user to input the desired text as var1
MyString=str(input("Enter Text Here: "))
#Define var2 as an apostrophe
MySubstring="'"
#Search for apostrophes in entry and remove if applicable; state if not applicable
if MySubstring in MyString:
    MyString = MyString.replace("'", "")
    print()
    print(MyString)
else:
    print()
    print("No apostrophes found!")
#Creating string split by commas as var3
MyStringSplit=MyString.split(',')
print(MyStringSplit)
#Export Python Output to Excel
import pandas as pd
data = {'ICD-10 Code & Description': [MyStringSplit],
        }
df = pd.DataFrame(data, columns = ['ICD-10 Code & Description'])
df.to_excel(r'C:/Users/tjm4q2/Desktop/TM Thesis DX Codes Python Output.xlsx', index = False, header=True)
data = {'ICD-10 Code & Description': [MyStringSplit],
}
I believe you just need to remove the square brackets around MyStringSplit. .split() returns a list of strings, so MyStringSplit is already a list.
Wrapping it in square brackets creates a list that contains one list, and pandas turns that into a single row.
Instead you want:
data = {'ICD-10 Code & Description': MyStringSplit,
}
By the way, you don't even need the data dictionary. You could delete that line entirely and simply do:
df = pd.DataFrame(MyStringSplit, columns = ['ICD-10 Code & Description'])
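The difference between the two forms can be seen without pandas by counting the top-level elements, since a DataFrame column built from a list gets one row per element. This is a small sketch; the sample text is made up for illustration and is not the asker's data.

```python
# Hypothetical input: three entries separated by commas, one with an apostrophe.
raw = "A00 Cholera,B01 Varicella,G20 Parkinson's disease"

cleaned = raw.replace("'", "")   # strip apostrophes
items = cleaned.split(",")       # already a list of three strings
wrapped = [items]                # a list holding that one list

print(len(wrapped))  # number of rows pandas would create from [MyStringSplit]
print(len(items))    # number of rows pandas would create from MyStringSplit
print(items[2])
```

With the brackets there is exactly one top-level element, so everything lands in a single cell; without them each entry becomes its own row.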

Python Pandas: how can I keep the whitespace of cell values with read_excel?

I have an Excel table whose cell values in column A contain leading whitespace (see also picture 1):
Все населення.................
Міське населення.............
Сільське населення...........
СІМФЕРОПОЛЬ(міськрада)......
Міське населення...........
м. СІМФЕРОПОЛЬ............
I read the contents of the table with the read_excel method:
df = pd.read_excel(file11, sheet_name=sheet, skiprows=9, usecols="A,D:E,G:H", dtype={"А": "str"})
but the reading process seems to strip the cell values, so the leading spaces are missing from the resulting dataframe (pic. 2), even though I explicitly specified the data type of column A.
I also tried a simple converter function:
def savespaces(f):
    return f.__repr__()
df = pd.read_excel(
    file11, sheet_name=sheet, skiprows=9, usecols="A,D:E,G:H", converters={"А": savespaces})
but it did not have the desired effect either: it kept only one space for every cell text starting with one, two, three or more spaces, regardless of the number of spaces in the raw Excel data (see pic. 3).
These spaces are crucial for me, so I would like to keep them.
UPDATE: As @dm2 mentioned in a comment below, "jupyter notebook it doesn't display them [spaces] when printing the whole df, but printing individual values the spaces are obviously there". With this, the issue became clearer: Jupyter Notebook renders the dataframe as HTML, which collapses several spaces into one. So the problem has a different context, and it is answered here:
keep the extra whitespaces in display of pandas dataframe in jupyter notebook
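A quick way to confirm that the spaces survived the read, independent of how the notebook renders the table, is to inspect individual values with repr() or to count the leading spaces. This is a minimal sketch: the helper name leading_spaces is hypothetical, and the indentation of the sample strings is made up for illustration.

```python
def leading_spaces(text):
    """Count the space characters at the start of a string."""
    return len(text) - len(text.lstrip(" "))


# Sample values with assumed indentation levels of 0, 2 and 4 spaces.
cells = ["Все населення", "  Міське населення", "    м. СІМФЕРОПОЛЬ"]

for cell in cells:
    # repr() shows the quotes, so leading spaces are visible in plain text.
    print(leading_spaces(cell), repr(cell))
```

On a real dataframe the same check could be applied to a single column value, for example the first entry of the column read from column A.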

Pandas to_csv now not writing values correctly

I'm using to_csv to save a dataframe which looks like this:
  PredictionIdx  CustomerInterest
0    fe789a06f3          0.654059
1    6238f6b829          0.654269
2    b0e1883ce5          0.666289
3    85e07cdd04          0.664172
One of the values in the first column is '0e15826235'. I'm writing this dataframe to CSV using pandas to_csv(). But when I open the CSV in Google's spreadsheet editor or LibreOffice, that value shows as 0E in the former and 0 in LibreOffice. This causes a problem when I submit to Kaggle. One point to note is that when I read the same CSV back with pandas read_csv, the value appears correctly in the dataframe.
As noted in the first comment, the error results from your choice of editor. Many editors apply some version of scientific notation that reads an e (in specific places, such as the second character) as an indicator of an exponent. Excel, for instance, reads XeY as "X times 10 raised to the power Y", where X is the number before the e and Y is the number after it, so 0e15826235 evaluates to 0. This is a brief description of Excel's scientific notation.
This does not happen in the other cell entries because they contain other string-like characters. Excel, LibreOffice, and possibly Google attempt to interpret what the entry is, rather than taking it literally.
In your question you write '0e15826235' with single quotes, indicating that it is meant to be a string, but this is worth checking when writing the values out to a file: Excel and the rest might not know it is meant to be a string literal.
In general, check the format of the value and consider what your eventual editor might "think" it is when it opens the file. For Excel specifically, a single quote character at the start of the string will force Excel to read it as a string. See this answer.
For me, the code below works correctly with Google Sheets:
import pandas as pd
df = pd.DataFrame({'PredictionIdx': ['fe789a06f3',
                                     '6238f6b829',
                                     'b0e1883ce5',
                                     '85e07cdd04'],
                   'CustomerInterest': [0.654059,
                                        0.654269,
                                        0.666289,
                                        0.664172]})
df.to_csv('./test.csv', index = None)
Also, CSV is a very simple text format; it doesn't hold any information about data types.
So you could use df.to_excel() as Nihal suggested, or adjust the column type settings in your favourite spreadsheet viewer.
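To find which IDs are at risk before opening the file in a spreadsheet, one rough check is to see which strings Python's own float() accepts. This is only a proxy under the assumption that spreadsheet editors guess in a similar way; their actual rules may differ. The function name risky_ids is hypothetical.

```python
def risky_ids(ids):
    """Return the IDs that parse as numbers and may be reinterpreted by a spreadsheet."""
    found = []
    for text in ids:
        try:
            float(text)  # succeeds for strings like "0e15826235"
        except ValueError:
            continue     # contains non-numeric characters, so it stays text
        found.append(text)
    return found


ids = ["fe789a06f3", "6238f6b829", "0e15826235", "b0e1883ce5", "85e07cdd04"]
risky = risky_ids(ids)
print(risky)
print(float("0e15826235"))
```

Only the ID made of digits around a single e is flagged, which matches the behaviour described in the question: zero times any power of ten is still zero.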

How to set cell format of currency with OpenPyXL?

I have a Python 3 script that loads some data into an Excel file on a Windows machine. I need the cell, not just the number, to be formatted as Currency.
I can use the following format to set the Number format for a cell:
sheet['D48'].number_format = '#,##0'
However, when I try a similar approach using the number format for Currency:
sheet['M48'].number_format = '($#,##0.00_);[Red]($#,##0.00)'
I get this for the custom format. Notice the extra backslashes: they are added to the format, so it no longer matches the predefined Currency style.
(\$#,##0.00_);[Red](\$#,##0.00)
I have seen this question and used it to get this far. However, the answer does not solve the extra backslash issue I am seeing.
Set openpyxl cell format to currency
I just formatted the value before placing it into the cell.
"${:10,.2f}".format(7622086.82)
'$7,622,086.82'
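The same idea can be wrapped in a small helper, sketched here with a hypothetical name, to_currency. Note the trade-off: the result is a string, so the cell would hold text rather than a number and Excel could not sum it. The handling of negative values is an added assumption, not part of the answer above.

```python
def to_currency(value):
    """Format a number as US-style currency text, e.g. $7,622,086.82."""
    sign = "-" if value < 0 else ""
    # The comma in the format spec adds thousands separators; .2f keeps two decimals.
    return f"{sign}${abs(value):,.2f}"


print(to_currency(7622086.82))
print(to_currency(-1234.5))
```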
I formatted the cell in Excel and then copied the format.
This worked for me:
.number_format = '[$$-409]#,##0.00;[RED]-[$$-409]#,##0.00'
