Using BeautifulSoup, how to guard against elements not being found?

I am looping through the rows of a table, but the first one or two rows don't have the elements I am looking for (they hold the table column headers, etc.).
Only from about the third row onward do the table cells (td) contain what I am looking for, e.g.
td[0].a.img['src']
Calling this fails on the first few rows, since they don't have these elements.
How can I guard against these cases so my script doesn't fail?
I get errors like:
TypeError: 'NoneType' object is unsubscriptable

Simplest and clearest, if you want your code "in line":
theimage = td[0].a.img
if theimage is not None:
    use(theimage['src'])
Or, preferably, wrap the None check in a tiny function of your own, e.g.:
def getsrc(image):
    return None if image is None else image['src']
and use getsrc(td[0].a.img).
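The None check above covers a missing img, but td[0].a can itself be None in a header row. A minimal sketch of one way to guard the whole chain is to catch the exceptions each missing link would raise. The name safe_src and the SimpleNamespace stand-ins are hypothetical, used only so the sketch runs without BeautifulSoup installed; real Tag objects should behave the same way for these three cases.

```python
from types import SimpleNamespace


def safe_src(cell):
    """Return cell.a.img['src'], or None if any link in the chain is missing."""
    try:
        return cell.a.img['src']
    except (AttributeError, TypeError, KeyError):
        # AttributeError: cell.a is None, so None.img fails
        # TypeError: cell.a.img is None, so None['src'] fails
        # KeyError: the img exists but has no src attribute
        return None


# Hypothetical stand-ins for table cells (not real BeautifulSoup tags).
header_cell = SimpleNamespace(a=None)
linked_cell = SimpleNamespace(a=SimpleNamespace(img=None))
bare_image_cell = SimpleNamespace(a=SimpleNamespace(img={}))
image_cell = SimpleNamespace(a=SimpleNamespace(img={'src': 'logo.png'}))

sources = [safe_src(c) for c in (header_cell, linked_cell, bare_image_cell, image_cell)]
```

The trade-off is that a broad except can also hide unrelated bugs inside the try block, so it is worth keeping that block to the single chained lookup.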

Starting from tr:
for td in tr.findChildren('td'):
    img = td.findChild('img')
    if img:
        src = img.get('src', '')  # return a blank string if there's no src attribute
        if src:
            pass  # do something with src

Related

forLoop except criteria is deleting whole row, which i do not want

Let me start with the background: I have a DataFrame with a column of hyperlinks. I use a for loop to extract each hyperlink's target attribute and add the targets as an appended column.
Result of the successful for loop.
Now let me throw a curveball: a blank. Let's say there is a gap in the column and Source C is out of the picture. What happens to the for loop then?
Result of the unwanted for loop.
What if, instead of deleting the entire row, I want the for loop to put a blank cell there, so that no data is rearranged and Source C has a blank or NaN cell next to it? Does that make sense? What are my options? (Also note that my print() call is not really working as I intend it to.) For what it's worth, ws.cell is an openpyxl operation that accesses a cell of an Excel sheet.
Here is the code, just in case:
links = []
for i in range(2, ws.max_row + 1):  # 2nd arg in range() not inclusive, so add 1
    try:
        links.append(ws.cell(row=i, column=1).hyperlink.target)
    except AttributeError or NaN:
        print('nothing here')
df['link'] = pd.Series(links)
df

I can't see your input data (i.e. the input xlsx file), so I may not be able to give a sure solution. Anyway, have you tried the following?
...
    except AttributeError:  # "AttributeError or NaN" evaluates to just AttributeError
        links.append('')  # still append a blank string to the list
        print('nothing here')
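The same idea can be sketched without try/except by asking for the target with a default, so every row contributes exactly one entry and nothing shifts upward. This is only an illustration: collect_targets and the SimpleNamespace objects are hypothetical stand-ins for openpyxl cells, on the assumption that a cell without a link has hyperlink set to None.

```python
from types import SimpleNamespace


def collect_targets(cells):
    """Return one entry per cell: the hyperlink target, or None when there is no link."""
    targets = []
    for cell in cells:
        # getattr on None falls back to the default instead of raising AttributeError
        targets.append(getattr(cell.hyperlink, "target", None))
    return targets


# Hypothetical stand-ins for worksheet cells; the second one has no hyperlink.
cells = [
    SimpleNamespace(hyperlink=SimpleNamespace(target="https://example.com/a")),
    SimpleNamespace(hyperlink=None),
    SimpleNamespace(hyperlink=SimpleNamespace(target="https://example.com/c")),
]
links = collect_targets(cells)
```

Because the list keeps one slot per row, assigning it to a DataFrame column of the same length leaves the row without a link holding a missing value rather than the next row's link.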

How to get cell background color in python-docx?

I'm trying to read data from an MS Word table using python-docx.
There is a way to set the background color of a table cell:
tcPr = cell._tc.get_or_add_tcPr()
shd = OxmlElement("w:shd")
shd.set(qn("w:fill"), rgb2hex(*color))
tcPr.append(shd)
My task is the opposite: I need to get the existing color. I'm not skilled in XML, and I tried this:
cell = table.cell(row, col)
tcPr = cell._tc.get_or_add_tcPr().get(qn('w:shd'))
However, it returns None for every cell I read, regardless of its color.

As scanny proposed, I parsed cell._tc.xml:
pattern = re.compile(r'w:fill="(\S*)"')
match = pattern.search(cell._tc.xml)
result = match.group(1) if match else None  # None when the cell has no w:fill
If the cell has color data, this returns either "auto" or the hex code of the background color, which can be converted to RGB.

As scanny said, you should first be sure of the element/property you are looking for.
To read this element, though, use the find method rather than get: get looks up an attribute of tcPr, whereas w:shd is a child element.
For example:
cell._tc.get_or_add_tcPr().get(qn('w:shd'))   # Returns None
cell._tc.get_or_add_tcPr().find(qn('w:shd'))  # Returns <Element {http://schemas.openxmlformats.org/wordprocessingml/2006/main}shd at ...>
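The difference between get and find can be sketched with the standard library alone. The XML string below is a hand-written sample of a cell's properties (the fill value is made up), and xml.etree.ElementTree stands in for the lxml elements that python-docx exposes; the namespaced names are written in the same {namespace}tag form that qn() produces.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hand-written sample of a <w:tcPr> element with a shading child.
sample = (
    f'<w:tcPr xmlns:w="{W}">'
    '<w:shd w:val="clear" w:color="auto" w:fill="FF0000"/>'
    '</w:tcPr>'
)
tc_pr = ET.fromstring(sample)

# get() looks for an ATTRIBUTE named w:shd on tcPr, which does not exist.
as_attribute = tc_pr.get(f"{{{W}}}shd")

# find() looks for a CHILD ELEMENT named w:shd, which does exist.
shd = tc_pr.find(f"{{{W}}}shd")

# The color itself is an attribute of that child, so get() is right here.
fill = shd.get(f"{{{W}}}fill") if shd is not None else None
```

Checking shd against None before reading the fill keeps the lookup safe for cells that have no shading element at all.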

Use finditem() only on one Column

I have a QTableWidget populated with QTableWidgetItems.
I want a search bar where I can type, and in response the table should refresh and show only the items that partially match the string in the search field.
I'm using findItems for that, but I want only one column to be used for the search. How can I do that?

Iterate the table manually.
columnOfInterest = 1  # or whatever
valueOfInterest = "foo"
for rowIndex in range(self.myTable.rowCount()):
    twItem = self.myTable.item(rowIndex, columnOfInterest)
    # item() returns None for a cell that has no item set
    matches = twItem is not None and twItem.text() == valueOfInterest
    self.myTable.setRowHidden(rowIndex, not matches)
You will have to implement better matching criteria. You can use string functions like str.find, str.startswith and others if you want to do it yourself.
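One possible matching criterion, sketched as plain functions so it can be tried without Qt: a case-insensitive substring test, where an empty query matches everything so that clearing the search bar shows all rows again. The names matches_query and hidden_flags are hypothetical, and the list of strings stands in for the text of each item in the column of interest.

```python
def matches_query(cell_text, query):
    """Case-insensitive partial match; None means the cell has no item."""
    if cell_text is None:
        return False
    return query.strip().lower() in cell_text.lower()


def hidden_flags(column_texts, query):
    """Return one boolean per row: True if the row should be hidden."""
    return [not matches_query(text, query) for text in column_texts]


# Stand-in for the texts found in the searched column, one entry per row.
column_texts = ["Foobar", "baz", None, "seafood"]
flags = hidden_flags(column_texts, "foo")
```

In the loop above, each flag would be passed to setRowHidden for the corresponding row index.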

Unreachable code block (in python) with html templater

I'm having trouble creating a fiddly HTML table in Python 3.4. The templater is html 1.16. Here's a simplified version of the problem: I would like to traverse a list and, for each list item, write the data to an HTML table. The table should be two columns wide.
from html import HTML
# create html object
h = HTML()
comments = ["blah1",
            "blah2",
            "blah3"
            ]
# create table object
c_table = h.table.tbody
for i, comment in enumerate(comments):
    # create row if we are at an odd index
    if i % 2 != 0:
        row = c_table.tr
        row.td(comment)
    else:
        # it is intended to add another <td> to the current row here
        # but because the row was declared in the if block, it is out of scope
        row.td(comment)
# write the html output now
print(h)
The difficulty is with the templater, specifically: accessing the row object for the second cell of the row without causing the </tr> closing tag. I have to create new cells through the row object, otherwise if I call c_table.tr.td it closes the row with </tr> and starts a new one.
Can anyone clever think of any code trickery that achieves what I'm trying to do in these circumstances?

Your comment is simply incorrect. Python does not have block scope, and the row that is defined in the if block is accessible in the else.
In fact, you can take the td out of the if block, and remove the else altogether.
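A minimal sketch of that restructuring, with plain lists standing in for the templater's row objects (an assumption made so the example runs without the html package). Note that it starts a new row on even indexes, so that a row already exists when the first item is added; the function name pair_rows is hypothetical.

```python
def pair_rows(comments):
    """Group items two per row, reusing the row created in the if block."""
    rows = []
    for i, comment in enumerate(comments):
        if i % 2 == 0:
            row = []          # stand-in for c_table.tr
            rows.append(row)
        # No else needed: row is still bound to the list created above,
        # because if blocks do not introduce a new scope in Python.
        row.append(comment)   # stand-in for row.td(comment)
    return rows


table = pair_rows(["blah1", "blah2", "blah3"])
```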

The row object is not available in your else branch on the first pass: index 0 is even, so the else runs before the if block has ever created a row.
Rather than juggling the row across both branches, try dividing the list into "chunks", a list of lists with 2 objects each.
from html import HTML
h = HTML()
comments = ["blah1",
            "blah2",
            "blah3",
            "blah4",
            "blah5"
            ]
fixed_list = []
for i in range(0, len(comments), 2):
    fixed_list.append(comments[i:i+2])
Now fixed_list looks like this:
[["blah1", "blah2"], ["blah3", "blah4"], .....]
And now you can easily iterate over that list and create a row for each sublist:
# create table object
body = h.body
tb = body.table
for comments_list in fixed_list:
    row = tb.tr
    for comment in comments_list:
        row.td(comment)
print(h)

Get last row of View by couchbase query

I have a query that returns a View object with all the entries I want to process. I know I can iterate over this View object and use the individual entries for my purposes.
Now I want to extract only the first and the last row. The first row is no problem, because I can just iterate and break out of the loop after the first item.
My question is how to get the last element from the View.
I tried:
for row in result_rows:
    rowvalue = row[3].value
    diagdata = rowvalue[models.DIAGDATA]
    if models.ODOMETER in diagdata:
        start_mileage = diagdata[models.ODOMETER]
        start_mileage_found = True
    break
row = result_rows[len(result_rows)]
rowvalue = row[3].value
diagdata = rowvalue[models.DIAGDATA]
if models.ODOMETER in diagdata:
    end_mileage = diagdata[models.ODOMETER]
    end_mileage_found = True
The second value I obviously won't get, because the view has no length and I can't access its rows by index. Does anyone have an idea how to get the last element?

You could run another request with the descending=True option, so that the server streams the results in reverse order.
Or you can convert the iterator to a list, which is basically the same as iterating through all the values. I'm not a Python expert, but it seems that list(result_rows) will do it for you. When you call len(...), it is probably doing that for you implicitly. There is also a rows_returned method to get the number of rows without turning the view into a list.
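If only the two end rows are needed, the whole result does not have to be kept in memory. A minimal sketch, assuming nothing about the view beyond it being iterable: take the first item, then let a loop variable run to the end. The function name first_and_last is hypothetical, and a plain generator stands in for the query result.

```python
def first_and_last(rows):
    """Return (first, last) from any iterable, or (None, None) if it is empty."""
    iterator = iter(rows)
    try:
        first = next(iterator)
    except StopIteration:
        return None, None
    last = first
    for last in iterator:   # after the loop, last holds the final item
        pass
    return first, last


# A generator stands in for a result that has no len() and no indexing.
first, last = first_and_last(n * 10 for n in range(1, 6))
```

This still reads every row once, so the descending=True request is likely the cheaper option when the result set is large.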
