Convert Excel row,column indices to alphanumeric cell reference in python/openpyxl - python

I want to convert the row and column indices into an Excel alphanumeric cell reference like 'A1'. I'm using python and openpyxl, and I suspect there's a utility somewhere in that package that does this, but I haven't found anything after some searching.
I wrote the following, which works, but I'd rather use something that's part of the openpyxl package if it's available.
def xlref(row,column):
    """
    xlref - Simple conversion of row, column to an excel string format
    >>> xlref(0,0)
    'A1'
    >>> xlref(0,26)
    'AA1'
    """
    def columns(column):
        # ascii_uppercase exists on Python 2 and 3 (string.uppercase is Python 2 only)
        from string import ascii_uppercase as uppercase
        if column > 26**3:
            raise Exception("xlref only supports columns < 26^3")
        c2chars = [''] + list(uppercase)
        c2,c1 = divmod(column,26)
        c3,c2 = divmod(c2,26)
        return "%s%s%s" % (c2chars[c3],c2chars[c2],uppercase[c1])
    return "%s%d" % (columns(column),row+1)
Does anyone know a better way to do this?

Here's the full new xlref using openpyxl.utils.get_column_letter from @Rick's answer:
from openpyxl.utils import get_column_letter
def xlref(row, column, zero_indexed=True):
    if zero_indexed:
        row += 1
        column += 1
    return get_column_letter(column) + str(row)
Now
>>> xlref(0, 0)
'A1'
>>> xlref(100, 100)
'CW101'

Looks like openpyxl.utils.get_column_letter does the same job as my columns function above, and is no doubt a little more hardened than mine is. Thanks for reading!
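For readers who want to see what the column conversion involves without any dependency, here is a minimal pure-Python sketch. The names column_letter and xlref_plain are hypothetical and this is not openpyxl's implementation; it only illustrates the usual "bijective base 26" scheme, where there is no zero digit and 'Z' is followed by 'AA'.

```python
def column_letter(index):
    """Convert a 1-based column index to Excel letters (1 -> 'A', 27 -> 'AA')."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index > 0:
        # Subtract 1 before dividing because the letters run A..Z with no zero digit.
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def xlref_plain(row, column, zero_indexed=True):
    """Build an 'A1'-style reference from row and column indices."""
    if zero_indexed:
        row += 1
        column += 1
    return column_letter(column) + str(row)
```

With zero-based input, xlref_plain(0, 0) gives 'A1' and xlref_plain(100, 100) gives 'CW101', matching the results shown above.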

Older question, but maybe helpful: when using XlsxWriter, one can use xl_rowcol_to_cell() like this:
from xlsxwriter.utility import xl_rowcol_to_cell
cell = xl_rowcol_to_cell(1, 2) # C2
See Working with Cell Notation.

Related

make a copy of a string column and cut the string based on certain value

I have a DataFrame with a column with installation KKS-codes in Python.
The KKS-codes look like this:
1BLA43AA030
1BOR53AR021
1BHY28UI021
I want to make a new column where the string contains only the relevant information. Sometimes the code requires a number, but it usually doesn't. The required number is given after the 3digit letter which specify the certain object. Like this:
BLA
BOR
BHY2
I cut the full KKS-codes with
df_1['KKS'] = df_1.Object.str[1:4]
but for certain strings I need it to be
df_1['KKS'] = df_1.Object.str[1:5]
My if-statements don't work; please help.
I don't fully understand what you mean by
The required number is given after the 3digit letter which specify the certain object.
If you can explain this further with examples I can help more. Otherwise, this is how you can apply a function to a row in a dataframe:
import pandas as pd
def test_for_four(s: str) -> bool:
    third_digit_letter = s[4]
    if third_digit_letter != "2":
        return True
    return False
def split_kks_code(s: str) -> str:
    if test_for_four(s):
        return s[1:4]
    return s[1:5]
df = pd.DataFrame([{'KKS-Code': '1BLA43AA030'},
                   {'KKS-Code': '1BOR53AR021'},
                   {'KKS-Code': '1BHY28UI021'}])
df['KKS'] = df['KKS-Code'].apply(split_kks_code)
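Since the rule for when the number is needed is not spelled out in the question, here is an alternative sketch under a different assumption: that the object codes which keep their number are known in advance. The set NEEDS_NUMBER and the function short_kks are hypothetical names introduced only for illustration.

```python
# Hypothetical: three-letter object codes whose short form keeps the next digit.
NEEDS_NUMBER = {"BHY"}


def short_kks(code):
    """Return the three-letter object code, plus the following digit when required."""
    letters = code[1:4]
    if letters in NEEDS_NUMBER:
        return code[1:5]
    return letters


codes = ["1BLA43AA030", "1BOR53AR021", "1BHY28UI021"]
short_codes = [short_kks(c) for c in codes]
```

The same function could be passed to apply on the column, in the same way as split_kks_code above.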

How do I present my output as a Pandas dataframe?

CHECK_OUTPUT_HERE
Currently, the output I am getting is in the string format. I am not sure how to convert that string to a pandas dataframe.
I am getting 3 different tables in my output. It is in a string format.
One of the following 2 solutions will work for me:
Convert that string output to 3 different dataframes. OR
Change something in the function so that I get the output as 3 different data frames.
I have tried using RegEx to convert the string output to a dataframe but it won't work in my case since I want my output to be dynamic. It should work if I give another input.
def column_ch(self, sample_count=10):
    report = render("header.txt")
    match_stats = []
    match_sample = []
    any_mismatch = False
    for column in self.column_stats:
        if not column["all_match"]:
            any_mismatch = True
            match_stats.append(
                {
                    "Column": column["column"],
                    "{} dtype".format(self.df1_name): column["dtype1"],
                    "{} dtype".format(self.df2_name): column["dtype2"],
                    "# Unequal": column["unequal_cnt"],
                    "Max Diff": column["max_diff"],
                    "# Null Diff": column["null_diff"],
                }
            )
            if column["unequal_cnt"] > 0:
                match_sample.append(
                    self.sample_mismatch(column["column"], sample_count, for_display=True)
                )
    if any_mismatch:
        for sample in match_sample:
            report += sample.to_string()
            report += "\n\n"
    print("type is", type(report))
    return report
Since you have a string, you can pass your string into a file-like buffer and then read it with pandas read_csv into a dataframe.
Assuming that your string with the dataframe is called dfstring, the code would look like this:
import io
import pandas as pd
bufdf = io.StringIO(dfstring)
df = pd.read_csv(bufdf, sep=???)
If your string contains multiple dataframes, split it with split and use a loop.
import io
dflist = []
for sdf in dfstring.split('\n\n'):  # this seems to be the separator between two dataframes
    bufdf = io.StringIO(sdf)
    dflist.append(pd.read_csv(bufdf, sep=???))
Be careful to pass an appropriate sep parameter; my ??? means that I can't tell what the proper value would be. Your fields are separated by spaces, so you could use sep='\s+', but I see that you also have spaces that are not meant to be separators, so this may cause a parsing error.
sep accepts a regex, so to use 2 or more consecutive spaces as the separator, you could pass sep='\s\s+' (this requires the additional parameter engine='python'). But again, be sure that you have at least 2 spaces between two consecutive fields.
See here for reference about the io module and StringIO.
Note that in Python 2 the equivalent class lived in the separate StringIO module (io.StringIO there accepts only unicode text), but since the latest pandas versions require Python 3, I guess you are using Python 3.
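To see why the choice of separator matters, here is a small standard-library sketch that applies both regexes with re.split instead of read_csv. The sample report text is made up for illustration; only the '\n\n' block separator and the two patterns come from the discussion above.

```python
import re

# Made-up report: two tables, each followed by a blank line, fields separated
# by two spaces, and single spaces inside some values.
report = "name  city\nAnn Lee  New York\n\nkey  value\nmax diff  0.5\n\n"

line = "Ann Lee  New York"
loose = re.split(r"\s+", line)     # splits inside the values as well
strict = re.split(r"\s\s+", line)  # splits only on runs of 2+ whitespace characters

# Split the report into blocks, skipping the empty chunk after the last "\n\n".
blocks = [b for b in report.split("\n\n") if b.strip()]
tables = [
    [re.split(r"\s\s+", row.strip()) for row in block.splitlines()]
    for block in blocks
]
```

Here loose breaks "Ann Lee" and "New York" into four pieces, while strict keeps them as two fields, which is the behaviour you want from the sep regex.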

Pyexcel changing a cell value

So I was using openpyxl for all my Excel projects, but now I have to work with .xls files, so I was forced to change libraries. I chose pyexcel because it seemed fairly easy and well documented. I've gone through hell creating hundreds of variables, because there is no .index property or anything similar.
What I want to do now is read a column in the correct file, e.g. the "Quantity" column, and get a value from it, e.g. 12, then check the same column in the other file, and if the value there is not 12, make it 12. Easy. But I cannot find anything about changing a single cell value in the documentation. Can you help me?
I didn't get it, wouldn't it be the most simple thing?
import pyexcel as pe
column_name = 'Quantity'
value_to_find = 12
sheets1 = pe.get_book(file_name='Sheet1.xls')
sheets1[0].name_columns_by_row(0)
row = sheets1[0].column[column_name].index(value_to_find)
sheets2 = pe.get_book(file_name='Sheet2.xls')
sheets2[0].name_columns_by_row(0)
if sheets2[0][row, column_name] != value_to_find:
    sheets2[0][row, column_name] = value_to_find
EDIT
Strange, you can only assign values if you use cell_address indexing, must be some bug. Add this function:
def index_to_letter(n):
    # n is a zero-based column index: 0 -> 'A', 25 -> 'Z', 26 -> 'AA'
    alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    result = []
    n += 1
    while n > 0:
        n, remainder = divmod(n - 1, 26)
        result.insert(0, alphabet[remainder])
    return ''.join(result)
And modify the last part:
sheets2[0].name_columns_by_row(0)
col_letter = index_to_letter(sheets2[0].colnames.index(column_name))
cel_address = col_letter+str(row+1)
if sheets2[0][cel_address] != value_to_find:
    sheets2[0][cel_address] = value_to_find
EDIT 2
Looks like assignment fails only when you use the column name directly, so a workaround would be to find the column_name's index:
sheets2[0].name_columns_by_row(0)
col_index = sheets2[0].colnames.index(column_name)
if sheets2[0][row, col_index] != value_to_find:
    sheets2[0][row, col_index] = value_to_find
Excel uses 2 sets of references to a cell. Cell name ("A1") and cell vector (Row, Column).
The PyExcel documentation tutorial states it supports both methods. caiohamamura's method tries to build the cell name - you don't need to if the cells are in the same location in each file, you can use the vector.
Once you have the cell, assigning a value to a single cell is simple - you assign the value. Example:
import pyexcel
sheet = pyexcel.get_sheet(file_name="Units.xls")
print(sheet[3,2]) # this gives me "cloud, underwater"
sheet[3,2] = "cloud, underwater, special"
sheet.save_as("Units1.xls")
Note that all I had to do was "sheet[3,2] =".
This is not explicitly stated but is hinted at in the pyexcel documentation where it states that to update a whole column you do:
sheet.column["Column 2"] = [11, 12, 13]
i.e. replace a list by assigning a new list. Same logic applies to a single cell - just assign a new value.
Bonus - the [Row, column] method gets around cell locations in columns greater than 26 (i.e. 'AA' and above).
Caveat - make sure your comparison is like-for-like, i.e. that an int is compared with an int and not with a str. Python does not convert between the two when comparing (12 == "12" is False), so a value read as text will never match a number. This is especially easy to hit if you are using Python 2 and Unicode is involved.

Issues with adding a variable to python gspread

I have started to use the gspread library and already have a sheet that I'd like to append to after the last row that has data in it. I'll retrieve the values between A1 and maxrows to loop through them and check if they are empty. However, I am unable to add a variable to the second line here. But perhaps I am just not escaping it correctly? I bet this is very simple:
maxrows = "A" + str(worksheet.row_count)
cell_list = worksheet.range('A1:A%s') % (maxrows)
Your variable maxrows is already in the form "An": the concatenation already contains the letter and the number.
But you are adding an extra A to it here: worksheet.range('A1:A%s').
Also, you're not using string interpolation with % correctly (in your code % is applied to the return value of range() rather than to the range string).
It should have been one of these
maxrows = "A" + str(worksheet.row_count)
worksheet.range('A1:%s' % maxrows)
or
worksheet.range('A1:A%d' % worksheet.row_count)
(among other possible solutions)
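The string building can also be pulled out into a small helper, which makes it easy to check the result before handing it to gspread. This is only a sketch: column_range is a hypothetical name, and row_count stands in for worksheet.row_count.

```python
def column_range(column, last_row, first_row=1):
    """Build an A1-style range such as 'A1:A1000' for a single column."""
    # The % operator is applied to the format string itself, before any call.
    return "%s%d:%s%d" % (column, first_row, column, last_row)


row_count = 1000  # stand-in for worksheet.row_count
cell_range = column_range("A", row_count)
```

The resulting string ('A1:A1000' here) is what would then be passed to worksheet.range(...).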

Python xlrd package issue

Is there any way to overcome the following problem?
When using the xlrd package in Python, the value '1' in the Excel file shows up as '1.0'.
I am writing a script that needs to tell the difference. I am using these values as indexes, so '1' and '1.0' are completely different indexes, but I can't find any way to overcome this issue.
Is there something I can do?
Thanks.
Yes, this is a common problem with xlrd.
You generally need to convert to int when the number is actually a whole number, and keep it as a float the rest of the time.
So here is something that works out well in most cases:
int_value = int(float_value)
if float_value == int_value:
    converted_value = int_value
else:
    converted_value = float_value
Example
>>> a = 123.0
>>> type(a)
<type 'float'>
>>> b = int(a)
>>> a == b
True
I had the same issue while parsing Excel files using xlrd. The easiest route I found was to just convert the known float values to ints, unless you are in a situation where you don't know the datatype of the field you are parsing.
In that situation, you can try converting to int and catching the error. For example:
try:
    converted_int = int(float_number)
except (TypeError, ValueError, OverflowError):
    # log or deal with the error here
    pass
Use int(some_number) to convert the number to an integer.
You can see the documentation for the int function here: BIF
Hope this will help you :)
You can convert the number to int using int(input_number).
Using isinstance() also helps. You can use something like:
import xlrd
# Note: xlrd 2.0 and later reads only .xls files; .xlsx needs an older xlrd.
book = xlrd.open_workbook('input.xlsx')
sheet = book.sheet_by_index(0)
data = []
for row in range(1, sheet.nrows):  # skip the header
    l = []
    for column in range(0, sheet.ncols):
        val = sheet.cell(row, column).value
        if isinstance(val, float):
            l.append(int(val))
        else:
            l.append(val)
    data.append(l)
print(data)  # [['id101', 1], ['id102', 2], ['id103', 3]]
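One thing to watch with a plain int(val) is that it truncates, so a genuine 2.5 would silently become 2. A minimal sketch of a stricter helper is shown below; normalize_number is a hypothetical name, and the sample row is made up rather than read from a workbook.

```python
def normalize_number(value):
    """Return an int for whole-number floats; leave everything else unchanged."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value


row = ["id101", 1.0, 2.5, "1.0"]  # made-up cell values
cleaned = [normalize_number(v) for v in row]
```

Only the float 1.0 is converted; the fractional value and the text cell are passed through untouched, so they can still be told apart when used as indexes.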
