XlDirectionDown and selecting filled cells with Python

XlDirectionDown and selecting filled cells with Python - python

I've already asked the root question but I thought I might see if I can get more help with this. I'm trying to work with XlDirectionDown in order to select the last filled cell in an Excel spreadsheet.
Ultimately, I'd like to use Python to select all filled cells in this sheet from A through AE. It will be copied into a text file and appended into SQL Server...so I don't want any blanks.
What I have so far:
import win32com.client as win32
excel = win32.gencache.EnsureDispatch('Excel.Application')
excel.Visible = 1;
excel.Workbooks.Open('G:/working.xlsx')
XlDirectionDown = 4
last = excel.Range("A:A").End(XlDirectionDown)
excel.Range("A1:A"+str(last)).Select()
First of all, the XlDirectionDown does not seem to work. The cursor in Excel remains on the first cell.
Secondly, I get an exception for the last line in this code (something to do with Range). Does anybody understand what's going on with this code? Also, is there ANY documentation on win32com or Pywin32 out there?? I can't find any how-to's! Thanks as always everyone.

I have used a specific cell rather than range of cells as starting point. Replace
last = excel.Range("A:A").End(XlDirectionDown)
with
last = excel.Range("A1:A1").End(XlDirectionDown)
However if there are any blank cells, this will stop just before it. You probably want to use UsedRange() instead. This will be the smallest range that contains all your cells, according to Excel: you may find (as I have) that resulting range is wider than AE (contains blank columns at end), and contains many entirely blank rows at the bottom. However, since you want to filter out blank cells anyways, those will be skipped during filtering.
As to the exception on last line of code, this is because End returns a Range object, and you can't convert a range to a string, or if you can then str(last) is a range so "A1:A"+str(last) will be an invalid range.
As to filtering out blank cells, I'm not sure what that means: when you copy the data to a text file, what will you put for blank cells? If you have "A blank C" will you put "A C"? The C will end up in wrong column of your database. Anyways just something that caught my attention.
There is no single place for documentation for win32com, although the Python on Windows book has a lot of info, and google gets you results quite useful, including SO hits. The one thing that keeps tripping me whenever I use Excel COM (this is not specific to python's win32com) is that everything in a workbook is a Range, you can't have an individual cells, even when some methods or properties might lead you to think you are getting a cell you're actually getting a range, it often requires a bit of extra thinking about how to go about getting to the desired cell.

I got started with win32com and Excel here.
In your code, what does excel.Range("A:A").End(XlDirectionDown) return? Test it. You might want to add .Select(), and then use excel.Selection.Address to get the last cell. Test it in interactive mode, it's easier to see what's going on there.
As an alternative, you can use a while loop to go through your cells. This code is looping the rows until an empty cell:
excel.Range("A1").Select()
while excel.ActiveCell.Value:
val = excel.ActiveCell.Value
print(val)
excel.ActiveCell.Offset(2,1).Select() # Move a row down
The last line is a bit funny; in VBA you should write Offset(1,0) to go one row down. However in Python you have to add one to both row and column. Maybe due to indexing?

Related

Pick From Drop-Down List Using XLwings in Python

I have an Excel sheet with a few cells locked with data validation using a drop-down list. I want to give a value to those cells using XLwings. I have seen some similar questions, but nothing solves my issue.
I tried with:
app = xw.App(visible=True)
wb = app.books.open(copy_file)
sht = wb.sheets['Sheet1']
list = sht.range('C21').api.Validation.Formula1[1:]
Honestly I don't know how that last line is supposed to work, I found it at https://github.com/ZoomerAnalytics/xlwings/issues/901.
When I try to run it, it just throws an error in xlwindows.py and stops the code.
Anyone can help?

I can't be sure, but after looking at the link you posted, I think the problem is that you're trying to set a range (the [1:] part means from index 1 to the end, not a single value). In the example from the link, it is asking for those values to be returned, so that was OK. But you're trying to set the value, and you can't set a drop-down menu to an entire range. Delete the ':' and you might have more success.

Pasting numbers into python so that they may be used as an array

Currently I have a large amount of numbers that I am copy and pasting from excel from a column into python and they paste as such:
12
13
14
15
16
...
Each number is on a seperate line.
Currently I am having to backspace in front of a number so that it jumps to the previous line then I insert commas between each number so that I can use this array in my code. These numbers come in quantities of hundreds and it is easier currently to copy and paste into excel and then into python versus saving an excel file with these numbers and referencing the excel file in my script. I am using visual studio code for my editor and I tried alt+v and that didnt work(read that as a shot). Let me know if there is an easy way to get my copy and pasted column into a useable array. Thanks

I would recommend you read in the file. But if that's not an option, I occasionally do something like this with Sublime when I want to paste in data quickly (should be able to do with any advanced text editor that allows multiple find/selection):

If I’m reading what you are trying to do correctly, in VS Code, you can do essentially what #sundance suggested by holding Alt and dragging your cursor down the different lines to get a multiline cursor.

You can use excel to transpose the data. copy the range you are wanting to transpose, then paste the column data on a blank sheet using the paste transposed funciton in excel. If your data is too large, 16384 rows, for the column limitations of excel this will cause an error. hopefully this is not the case. Once your data is horizontal you can paste it into python and use this scriptlet to change it into an array.
data = input("paste horizontal data here")
print(data)
data = data.split("\t")
print(data)
\t is a tab character. This what excel will use to separate the columns.
\n is a newline character.
if your data set it too large to use excel columns and you dont want to break it into separate rows, the next best thing is a text editor like sublime, as Sundance suggested.
I do the sublime editor in a different way though. In sublime, use find and replace to search for the return characters and replace them with tabs or commas or whatver separator you want to use.
I put the brackets first then paste the numbers. then go ctrl + h make sure the *. is selected for find regex.
Find : (\d.\d)\n
Replace: '$1',
there's a space after the comma.
make sure there's a new line after the last number in the list. click replace all or use ctrl + alt + enter to make the replacement. last thing is todelete the last comma and space. copy and paste that into your script.
there you go.
I think it would be much easier for users if the script you are using grabs the info from the web directly using selenium or beautiful soup and pandas data frames if you have to grab web tables. asking a user to do this stuff on sublime isn't the best solution in my view because they just want to click 1 button and complete the task they want done. they don't want to have to go through an algorithm each time they have to do this task. that's what programs and programmers are for.

python Win32 Excel is cell a range

I am writing a bit of Python code to automate the manipulation of Excel spreadsheets. The idea is to use spreadsheet templates to create daily reports. Saw this idea working several years ago using Perl. Anyway.
Here are the simple rules:
Sheets with the Workbook are process in the order they appear.
Within the sheets cells are process left to right, then top to bottom.
There are names defined which are single cell ranges, can contain static values or the results of queries. Cells can contain comments which contain SQL queries to run. ...
Here is the problem, as I process the cells I need to check if the cell has an attached comment and if the cell has a name. I am able to handle processing the attached cell comments. But I can not figure out how to determine if a cell is within a named range. In my case the single cell within the range.
I saw a posting the suggested this would work:
cellName = ws.ActiveCell.Name.Name
No luck.
Does anybody have any idea how to do this?
I am so close but no cigar.
Thanks for your attention to this matter.
KD

What you may consider doing is first building a list of all addresses of names in the worksheet, and checking the address of each cell against the list to see if it's named.
In VBA, you obtain the names collection (all the names in a workbook) this way:
Set ns = ActiveWorkbook.Names
You can determine if the names are pointed toward part of the current sheet, and a single cell, this way:
shname = ActiveSheet.Name
Dim SheetNamedCellAddresses(1 To wb.Names.Count) as String
i = 1
For Each n in ns:
If Split(n.Value, "!")(0) = "=" & shname And InStr(n.Value, ":") = 0 Then
' The name Value is something like "=Sheet1!A1"
' If there is no colon, it is a single cell, not a range of cells
SheetNamedCellAddresses(i) = Split(n,"=")(1) 'Add the address to your array, remove the "="
i = i + 1
End If
Next
So now you have a string array containing the addresses of all the named cells in your current sheet. Move that array into a python list and you are good to go.

OK so it errors out if the cell does NOT have a range name. If the cell has a range name the following bit of code returns the name: Great success!!
ws.Cells(r,c).Activate()
c = xlApp.ActiveCell
cellName = c.Name.Name
If there is no name associated with the cell, an exception is tossed.
So even in VBA you would have to wrap this bit of code in exception code. Sounds expensive to me to use exception processing for this call.

how can i change the size of a table in word using python (pywin32)

ms word table with python
I am working with python on word tables, i am generating tables, but all of them are
auto fit window..
is it possible to change it to auto fit contents?
i had tried something like this:
table = location.Tables.Add(location,len(df)+1,len(df.columns)
table.AutoFit(AutoFitBehavior.AutoFitToContents)
but it keeps to raise errors

You want to change you table creation to use this:
//''#Add two ones after your columns
table = location.Tables.Add(location,len(df)+1,len(df.columns),1,1)
Information about why you need those variables can be read here:
http://msdn.microsoft.com/en-us/library/office/ff845710(v=office.15).aspx
But basically, the default behavior is to disable Cell Autofitting and Use Table Autofit to Window. The first "1" enables Cell Autofitting. From the link I posted above, the DefaultTableBehavior can either be wdWord8TableBehavior (Autofit disabled --default) or wdWord9TableBehavior (Autofit enabled). The number comes from opening up Word's VBA editor and typing in the Immediate Window:
?Word.wdWord9TableBehavior
Next, from the link, we see another option called AutoFitBehavior. This is defined as:
Sets the AutoFit rules for how Word sizes tables. Can be one of the WdAutoFitBehavior constants.
So now we have another term to look up. In the VBA editor's Immediate window again type:
?Word.wdAutoFitBehavior.
After the last dot, the possible options should appear. These will be:
wdAutoFitContent
wdAutoFitFixed
wdAutoFitWindow
AutoFitContent looks to be the option we want, so let's finish up that previous line with:
?Word.wdAutoFitBehavior.wdAutoFitContent
The result will be a "1".
Now you may ask, why do we have to go through all this trouble finding the numerical representations of the values. From my experience, with using pywin32 with Excel, is that you can't get the Built-in values, from the string, most of the time. But putting in the numerical representation works just the same.
Also, One more reason for why your code may be failing is that the table object may not have a function "Autofit".
I'm using Word 2007, and Table has the function, AutoFitBehavior.
So Change:
table.AutoFit(AutoFitBehaviour.AutoFitToContent)
To:
table.AutoFitBehavior(1)
//''Which we know the 1 means wd.wdAutoFitBehavior.wdAutoFitContent
Hope I got it right, and this helps you out.

openpyxl please do not assume text as a number when importing

There are numerous questions about how to stop Excel from interpreting text as a number, or how to output number formats with openpyxl, but I haven't seen any solutions to this problem:
I have an Excel spreadsheet given to me by someone else, so I did not create it. When I open the file with Excel, I have certain values like "5E12" (clone numbers, if anyone cares) that appear to display correctly, but there's a little green arrow next to each one warning me that "This appears to be a number stored as text". Excel then asks me if I would like to convert it to a number, and if I saw yes, I get 5000000000000, which then converts automatically to scientific notation and displays 5E12 again, only this time a text output would show the full number with zeroes. Note that before the conversion, this really is text, even to Excel, and I'm only being warned/offered to convert it.
So, when reading this file in with openpyxl (from openpyxl.reader.excel import load_workbook), the 5E12 is getting converted automatically to 5000000000000. I assume that openpyxl is making the same assumption that Excel made, only the conversion happens without a prompt or input on my part.
How can I prevent this from happening? I do not want text that look like "numbers stored as text" to convert to numbers. They are text unless I say so.
So far, the only solution I have found is to add single quotes to the front of each cell, but this is not an ideal solution, as it's manual labor rather than a programmatic solution. Also, the solution needs to be general, since I don't always know where this problem might occur (I'm reading millions of lines per day, so I don't want to have to do anything by hand).
I think this is a problem with openpyxl. There is a google group discussion from the beginning of 2011 that mentions this problem, but assumes it's too rare to matter. https://groups.google.com/forum/?fromgroups=#!topic/openpyxl-users/HZfpShMp8Tk
So, any suggestions?

If you want to use openpyxl again (for whatever reason), the following changes to the worksheet reader routine do the trick of keeping the strings as strings:
diff --git a/openpyxl/reader/worksheet.py b/openpyxl/reader/worksheet.py
--- a/openpyxl/reader/worksheet.py
+++ b/openpyxl/reader/worksheet.py
## -134,8 +134,10 ##
data_type = element.get('t', 'n')
if data_type == Cell.TYPE_STRING:
value = string_table.get(int(value))
-
- ws.cell(coordinate).value = value
+ ws.cell(coordinate).set_value_explicit(value=value,
+ data_type=Cell.TYPE_STRING)
+ else:
+ ws.cell(coordinate).value = value
# to avoid memory exhaustion, clear the item after use
element.clear()
The Cell.value is a property and on assignment call Cell._set_value, which then does a Cell.bind_value which according to the method's doc: "Given a value, infer type and display options". As the types of the values are in the XML file those should be taken (here I only do that for strings) instead of doing something 'smart'.
As you can see from the code, the test whether it is a string was already there.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.