python excel reading last empty cell - python

I am trying to read till the last empty cell for the specified number of rows.
Here is my code:
for j in xrange(0,REPEAT_CONST,1):
    if r_sheet.cell_type(row+j,0)== xlrd.XL_CELL_EMPTY:
        break
This code only works if something was written to the cell earlier and then deleted. It does not work if the cell has never been edited. I'm not sure how to handle this.
Could you please help me to do this.
I will be grateful for your support.
Regards,
Pavan

The error you're experiencing suggests the rowx or colx arguments are out of bounds for this sheet.
In the cell access functions, "rowx" is a row index, counting from zero, and "colx" is a column index, counting from zero. Negative values for row/column indexes and slice positions are supported in the expected fashion.
I just found this out (never used XLRD before) but if you query the r_sheet.nrows I bet you'll find that the value is less than row+j. It appears that xlrd only reads part of the worksheet, essentially the UsedRange from Excel.
So you can use exception handling (a try/except block around the cell access), or check the bounds first. In the check below, the first part of the boolean expression short-circuits whenever row+j is past the last row xlrd loaded, so the cell is never accessed out of bounds:
if (row+j >= r_sheet.nrows) or (r_sheet.cell_value(row+j,0) == ''):
    break
Or, keeping your original method:
if (row+j >= r_sheet.nrows) or (r_sheet.cell_type(row+j,0) == xlrd.XL_CELL_EMPTY):
    break
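The bounds check can also be wrapped in a small helper. This is a minimal sketch, not part of xlrd: first_empty_offset and the FakeSheet stand-in are hypothetical names, and the helper only assumes the sheet object exposes nrows and cell_type(rowx, colx) the way an xlrd sheet does. The two constants are assumed to mirror xlrd.XL_CELL_EMPTY and xlrd.XL_CELL_BLANK; in real code, use the xlrd constants directly.

```python
XL_CELL_EMPTY = 0  # assumed to match xlrd.XL_CELL_EMPTY
XL_CELL_BLANK = 6  # assumed to match xlrd.XL_CELL_BLANK


def first_empty_offset(sheet, row, limit, col=0):
    """Return the first offset j in range(limit) where (row + j, col) is
    empty or lies past the rows that were loaded; return limit otherwise."""
    for j in range(limit):
        past_end = row + j >= sheet.nrows
        # "or" short-circuits, so cell_type is never called out of bounds.
        if past_end or sheet.cell_type(row + j, col) in (XL_CELL_EMPTY, XL_CELL_BLANK):
            return j
    return limit


class FakeSheet:
    """Hypothetical stand-in for an xlrd sheet, used only for this demo."""

    def __init__(self, column_types):
        self._types = column_types
        self.nrows = len(column_types)

    def cell_type(self, rowx, colx):
        return self._types[rowx]  # IndexError past the end, like a real sheet


# Three filled rows and nothing after them (cells that were never edited).
offset_never_edited = first_empty_offset(FakeSheet([1, 1, 2]), row=0, limit=10)
# A cell that was written and then cleared shows up as an empty cell type.
offset_cleared = first_empty_offset(FakeSheet([1, XL_CELL_EMPTY, 1]), row=0, limit=10)
```

Both cases end the scan the same way, which is the behaviour the original loop was after.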

Related

Python Pandas. Endless cycle

Why does this part of the code run as an infinite loop? It shouldn't, because when I stop this part of the code (in Jupyter Notebook), all the 99999 values have already been changed to oil_mean_by_year[data.loc[i]['year']].
for i in data.index:
    if data.loc[i]['dcoilwtico'] == 99999:
        data.loc[i, 'dcoilwtico'] = oil_mean_by_year[data.loc[i]['year']]
Use merge to align the oil mean of a year with the given row:
# Merge on data['year'] vs oil_mean_by_year's index
data_with_oil_mean = pd.merge(data, oil_mean_by_year.rename("oil_mean"),
                              left_on="year", right_index=True, how="left")
data_with_oil_mean['dcoilwtico'] = data_with_oil_mean['dcoilwtico'].mask(lambda xs: xs.eq(99999), data_with_oil_mean['oil_mean'])
This is a common mistake when using Pandas, and it comes from treating a DataFrame like a Python list. The loop is most likely not infinite, just slow. Let's take a look at what actually happens here.
We are trying to change the dcoilwtico value for each row where it equals 99999. Each pass through the loop evaluates data.loc[i], which builds the whole row as a new Series just to read one field, and the loop does that for every row of the frame. On a large frame this can take long enough to look like an endless cycle.
If you want to keep the loop, index the row and column in one step with square brackets (data.loc is indexed, not called), and use .get so that a year missing from oil_mean_by_year falls back to a default instead of raising a KeyError:
for i in data.index:
    if data.loc[i, 'dcoilwtico'] == 99999:
        data.loc[i, 'dcoilwtico'] = oil_mean_by_year.get(data.loc[i, 'year'], 0)
    else:
        pass  # ...
This avoids the chained lookups, although a vectorized replacement will still be much faster.
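For comparison, here is a minimal sketch of a loop-free replacement. It assumes oil_mean_by_year is a Series indexed by year, as in the question; fill_sentinel and the sample values are made up for illustration.

```python
import pandas as pd


def fill_sentinel(data, oil_mean_by_year, sentinel=99999):
    """Return a copy of data where sentinel prices are replaced by the
    mean for that row's year (hypothetical helper)."""
    # Look up each row's yearly mean in one vectorized step.
    yearly_mean = data["year"].map(oil_mean_by_year)
    out = data.copy()
    # mask() replaces values where the condition is True.
    out["dcoilwtico"] = out["dcoilwtico"].mask(out["dcoilwtico"].eq(sentinel), yearly_mean)
    return out


# Made-up sample data for the demo.
data = pd.DataFrame({
    "year": [2015, 2015, 2016, 2016],
    "dcoilwtico": [50.0, 99999.0, 99999.0, 40.0],
})
oil_mean_by_year = pd.Series({2015: 48.0, 2016: 43.0})

filled = fill_sentinel(data, oil_mean_by_year)
```

Unlike the .get version, a year that is missing from oil_mean_by_year produces NaN here rather than 0.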

The code that works individually breaks in the loop on 3rd-4th iteration, no matter what the input is

I wrote a script (I can't publish all of it here, it is big) that downloads a CSV file, checks the ranges, and creates a new CSV file that holds all the "out of range" info.
The script was checked on all existing CSV files and works without errors.
Now I am trying to loop through all of them to generate the "out of range" data but it errors after the 3rd or 4th iteration no matter what the input file is.
I tried to swap the queue of files, and the ones that errored before are processed just fine, but the error still appears on 3rd-4th iteration.
What may be the issue with this?
The error I get is ValueError: cannot reindex on an axis with duplicate labels
It is raised by the line that assigns the out-of-range values to the column:
dataframe.loc[dataframe['Flagged_measure'] == flags[i][0], ['Flagged_measure']] = dataframe[dataframe['Flagged_measure'] == flags[i][0]]['Flagged_measure'].astype(str) + ' , ' + csv_report_df.loc[flags[i][1], flags[i][0]].astype(str)
The ValueError you mentioned occurs when you join/assign to a column that has duplicate index values. From what I can infer from the single line of code you posted, I'll break it down and maybe it could be clear whether your assignment makes sense:
dataframe.loc[dataframe['Flagged_measure'] == flags[i][0], ['Flagged_measure']]
This selects the rows of the Flagged_measure column in dataframe that match flags[i][0], so that they can be assigned some right-hand-side value, preferably a single value per iteration.
dataframe[dataframe['Flagged_measure'] == flags[i][0]]['Flagged_measure'].astype(str) + ' , ' + csv_report_df.loc[flags[i][1], flags[i][0]].astype(str)
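This assignment does not line up. The right-hand side is built from whole Series (one value per matching row), so pandas has to align them by index, both with each other and with the rows selected on the left. That alignment fails when an index contains duplicate labels. A single value per iteration would avoid the alignment altogether.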
Might I suggest you try this?
dataframe['Flagged_measure'] = dataframe['Flagged_measure'].apply(lambda row: " , ".join([str(row), str(csv_report_df.iloc[flags[i][1], flags[i][0]])]) if row == flags[i][0] else row)
If it still doesn't work, maybe you need to look into csv_report_df as well. As far as I know, loc is for label-based indexing and iloc is for integer positions, which I think is what you're looking for here.
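Since the error only shows up after a few iterations, it may be worth checking whether the index of dataframe (or of csv_report_df) picks up repeated labels along the way, for example through concatenation. The following is a minimal sketch under that assumption; the sample frames and the "out of range" text are made up.

```python
import pandas as pd

# Hypothetical frame whose index repeats after concatenating two pieces.
dataframe = pd.concat([
    pd.DataFrame({"Flagged_measure": ["temp", "ph"]}),
    pd.DataFrame({"Flagged_measure": ["temp", "flow"]}),
])
had_duplicates = not dataframe.index.is_unique  # the index is 0, 1, 0, 1

# Give every row its own label before doing aligned assignments.
dataframe = dataframe.reset_index(drop=True)

mask = dataframe["Flagged_measure"] == "temp"
dataframe.loc[mask, "Flagged_measure"] = (
    dataframe.loc[mask, "Flagged_measure"].astype(str) + " , " + "out of range"
)
```

If is_unique is already True for both frames, the duplicate labels are coming from somewhere else and this reset will not help.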

How to avoid reading #Div/0 from Excel using python

I'm using openpyxl to read some numerical values from Excel files. While reading the numbers in a column, I want to skip the cells that contain a division by zero. I know that there are 4 or 5 among 100 numbers.
I used an if not condition this way:
N = []
if not ZeroDivisionError:
    N.append(cell.value)
else:
    break
But this leaves the list empty. If I don't use the condition, the error values also end up in the list.
I found the answer to my question and would like to leave it here in case it is useful to others.
Since a division by zero is read as the string '#DIV/0!', it is enough to use this condition:
if type(cell.value) is not str:
and the error cells will be skipped.
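The same idea can be written as a filter that keeps only numbers. This is a minimal sketch: numeric_values is a hypothetical helper, and column_values stands in for a list such as [cell.value for cell in column] read with openpyxl.

```python
def numeric_values(values):
    """Keep ints and floats; drop error strings such as '#DIV/0!',
    empty cells (None) and booleans."""
    return [
        v for v in values
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]


# Stand-in for the values read from one column.
column_values = [1.5, "#DIV/0!", 3, None, 4.25]
N = numeric_values(column_values)
```

Checking for numbers rather than "not a string" also skips empty cells, which the type(cell.value) is not str test would let through as None.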
Why not trap the error in the excel spreadsheet earlier:
=IFERROR(your_formula,0)
Replace the zero with whatever suits your application: a blank, some text, etc.

python Win32 Excel is cell a range

I am writing a bit of Python code to automate the manipulation of Excel spreadsheets. The idea is to use spreadsheet templates to create daily reports. Saw this idea working several years ago using Perl. Anyway.
Here are the simple rules:
Sheets within the workbook are processed in the order they appear.
Within the sheets, cells are processed left to right, then top to bottom.
There are names defined which are single-cell ranges; they can contain static values or the results of queries. Cells can contain comments which contain SQL queries to run. ...
Here is the problem: as I process the cells, I need to check whether the cell has an attached comment and whether the cell has a name. I am able to handle the attached cell comments, but I cannot figure out how to determine if a cell is within a named range. In my case, the range is a single cell.
I saw a posting that suggested this would work:
cellName = ws.ActiveCell.Name.Name
No luck.
Does anybody have any idea how to do this?
I am so close but no cigar.
Thanks for your attention to this matter.
KD
What you may consider doing is first building a list of all addresses of names in the worksheet, and checking the address of each cell against the list to see if it's named.
In VBA, you obtain the names collection (all the names in a workbook) this way:
Set ns = ActiveWorkbook.Names
You can determine if the names are pointed toward part of the current sheet, and a single cell, this way:
shname = ActiveSheet.Name
Dim SheetNamedCellAddresses() As String
ReDim SheetNamedCellAddresses(1 To ns.Count) ' Dim needs constant bounds, so size the array with ReDim
i = 1
For Each n In ns
    If Split(n.Value, "!")(0) = "=" & shname And InStr(n.Value, ":") = 0 Then
        ' The name Value is something like "=Sheet1!A1"
        ' If there is no colon, it is a single cell, not a range of cells
        SheetNamedCellAddresses(i) = Split(n.Value, "=")(1) 'Add the address to your array, remove the "="
        i = i + 1
    End If
Next
So now you have a string array containing the addresses of all the named cells in your current sheet. Move that array into a python list and you are good to go.
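The same filtering can be done directly in Python once the names are available as text. This is a minimal sketch: single_cell_names is a hypothetical helper, and it assumes you have already collected (name, refers_to) pairs, for example from the Name and RefersTo properties of each entry in the workbook's Names collection.

```python
def single_cell_names(names, sheet_name):
    """Map cell address -> name for names that refer to exactly one cell
    on sheet_name. Each refers_to string looks like '=Sheet1!$A$1'."""
    result = {}
    for name, refers_to in names:
        sheet_part, sep, address = refers_to.lstrip("=").rpartition("!")
        if not sep:
            continue  # no sheet qualifier, e.g. a constant or formula name
        # Sheet names containing spaces are quoted, e.g. ='My Sheet'!$A$1
        if sheet_part.strip("'") == sheet_name and ":" not in address:
            result[address.replace("$", "")] = name
    return result


# Made-up names for the demo.
names = [
    ("ReportDate", "=Sheet1!$B$2"),
    ("Totals", "=Sheet1!$A$5:$C$5"),  # multi-cell range, skipped
    ("Other", "=Sheet2!$A$1"),        # different sheet, skipped
]
lookup = single_cell_names(names, "Sheet1")
```

With the lookup built once per sheet, checking a cell is a dictionary access on its address (for example "B2") instead of a COM call per cell.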
OK, so it errors out if the cell does NOT have a range name. If the cell has a range name, the following bit of code returns the name. Great success!!
ws.Cells(r,c).Activate()
c = xlApp.ActiveCell
cellName = c.Name.Name
If there is no name associated with the cell, an exception is tossed.
So even in VBA you would have to wrap this bit of code in exception code. Sounds expensive to me to use exception processing for this call.

XlDirectionDown and selecting filled cells with Python

I've already asked the root question but I thought I might see if I can get more help with this. I'm trying to work with XlDirectionDown in order to select the last filled cell in an Excel spreadsheet.
Ultimately, I'd like to use Python to select all filled cells in this sheet from A through AE. It will be copied into a text file and appended into SQL Server...so I don't want any blanks.
What I have so far:
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
excel.Visible = 1;
excel.Workbooks.Open('G:/working.xlsx')
XlDirectionDown = 4
last = excel.Range("A:A").End(XlDirectionDown)
excel.Range("A1:A"+str(last)).Select()
First of all, the XlDirectionDown does not seem to work. The cursor in Excel remains on the first cell.
Secondly, I get an exception for the last line in this code (something to do with Range). Does anybody understand what's going on with this code? Also, is there ANY documentation on win32com or Pywin32 out there?? I can't find any how-to's! Thanks as always everyone.
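I have used a specific cell rather than a range of cells as the starting point. Note also that Excel's xlDown constant is -4121, so XlDirectionDown = 4 in your code is probably not doing what its name suggests. Replace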
last = excel.Range("A:A").End(XlDirectionDown)
with
last = excel.Range("A1:A1").End(XlDirectionDown)
However, if there are any blank cells, this will stop just before the first one. You probably want to use the worksheet's UsedRange instead. This is the smallest range that contains all your cells, according to Excel. You may find (as I have) that the resulting range is wider than AE (it contains blank columns at the end) and includes many entirely blank rows at the bottom. However, since you want to filter out blank cells anyway, those will be skipped during filtering.
As to the exception on the last line of code: End returns a Range object, not a row number, so "A1:A"+str(last) does not produce a valid range address. Use last.Row to get the row number instead.
As to filtering out blank cells, I'm not sure what that means: when you copy the data to a text file, what will you put for blank cells? If you have "A blank C" will you put "A C"? The C will end up in wrong column of your database. Anyways just something that caught my attention.
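One reading of "no blanks" that keeps the columns aligned is to drop only the rows that are entirely empty. This is a minimal sketch: drop_blank_rows is a hypothetical helper, and it assumes the data has already been read into a tuple of row tuples with None for empty cells, which is the shape a multi-cell range's Value usually has through win32com.

```python
def drop_blank_rows(rows):
    """Return the rows that contain at least one value other than None or ''."""
    return [row for row in rows if any(v not in (None, "") for v in row)]


# Made-up stand-in for the values of a used range.
used = (
    (1.0, "a", None),
    (None, None, None),  # entirely blank row, dropped
    (2.0, None, "c"),    # partly blank row, kept with its gap
)
kept = drop_blank_rows(used)
```

Partly blank rows keep their None entries, so "A blank C" stays in three columns when written out.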
There is no single place for win32com documentation, although the Python on Windows book has a lot of info, and Google gets you quite useful results, including SO hits. The one thing that keeps tripping me up whenever I use Excel COM (this is not specific to Python's win32com) is that everything in a workbook is a Range. You can't have an individual cell: even when some methods or properties might lead you to think you are getting a cell, you are actually getting a range. It often requires a bit of extra thinking about how to get to the desired cell.
I got started with win32com and Excel here.
In your code, what does excel.Range("A:A").End(XlDirectionDown) return? Test it. You might want to add .Select(), and then use excel.Selection.Address to get the last cell. Test it in interactive mode, it's easier to see what's going on there.
As an alternative, you can use a while loop to go through your cells. This code is looping the rows until an empty cell:
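excel.Range("A1").Select()
while excel.ActiveCell.Value:
    val = excel.ActiveCell.Value
    print(val)
    excel.ActiveCell.Offset(2,1).Select() # Move a row down
The last line is a bit funny; in VBA you would write Offset(1,0) to go one row down, but in Python you have to add one to both row and column. A likely explanation is that Offset is exposed as a property, so Offset(2,1) ends up as a 1-based index into the returned range rather than as an offset. With early binding, GetOffset(1,0) should be the direct equivalent.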
