Python xml.etree.ElementTree Problems

Never mind: I found my real issue. It was further on in my code.
I am having problems getting xml.etree.ElementTree to work like I expect it to.
import logging
import xml.etree.ElementTree as ET

xmlData = "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\"?><suggestedmatches><destination><sortOrder>1</sortOrder><destinationType>destinationType1</destinationType></destination><destination><sortOrder>2</sortOrder><destinationType>destinationType2</destinationType></destination></suggestedmatches>"
root = ET.fromstring(xmlData)
logging.debug("DIAG: %s: root.tag = %s"
              % (FUNCTION_NAME, root.tag))
logging.debug("DIAG: %s: root = %r" % (FUNCTION_NAME, ET.tostring(root)))
destinations = root.findall("destination")
logging.debug('DIAG: %s: destinations = %r' % (FUNCTION_NAME, ET.tostring(destinations)))
I'm trying to figure out why I can't find destinations in root. This is the log output:
DEBUG:root:DIAG: findDestinations(): root.tag = suggestedmatches
DEBUG:root:DIAG: findDestinations(): root = b'<suggestedmatches><destination><sortOrder>1</sortOrder><destinationType>destinationType1</destinationType></destination><destination><sortOrder>2</sortOrder><destinationType>destinationType2</destinationType></destination></suggestedmatches>'
ERROR:root:findDestinations(): Encountered exception on root.findall() - 'list' object has no attribute 'iter'
And if I add the following code after I get root, I see each of the destinations listed in the log:
destinationList = []
for destination in root:
    destinationList.append(destination)
    logging.debug('DIAG: %s: destination.tag = %s'
                  % (FUNCTION_NAME, destination.tag))
This same code is working in a different script, so I'm not sure why it's not working here.

You are getting None because ET.dump() writes to sys.stdout, and you are logging the return value of dump(), which is None.
From the docs:
xml.etree.ElementTree.dump(elem)
Writes an element tree or element structure to sys.stdout. This function should be used for debugging only.
The exact output format is implementation dependent. In this version, it’s written as an ordinary XML file.
elem is an element tree or an individual element.
Try using the tostring() function instead of dump():
logging.debug("DIAG: %s: root = %r" % (FUNCTION_NAME, ET.tostring(root)))
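The error in the edited question's log appears to come from the last logging.debug() call rather than from findall() itself: findall() returns an ordinary Python list, while ET.tostring() expects a single Element, which is consistent with the reported 'list' object has no attribute 'iter'. A minimal sketch using the question's XML (variable names are illustrative, not from the original script):

```python
import xml.etree.ElementTree as ET

xml_data = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    "<suggestedmatches>"
    "<destination><sortOrder>1</sortOrder>"
    "<destinationType>destinationType1</destinationType></destination>"
    "<destination><sortOrder>2</sortOrder>"
    "<destinationType>destinationType2</destinationType></destination>"
    "</suggestedmatches>"
)

root = ET.fromstring(xml_data)

# findall() returns a plain Python list of Element objects, not an Element.
destinations = root.findall("destination")
print(len(destinations))  # 2

# ET.tostring() expects a single Element, so passing the list fails.
# (The asker's log shows AttributeError: 'list' object has no attribute 'iter';
# the exact exception type may differ between Python versions.)
tostring_failed = False
try:
    ET.tostring(destinations)
except (AttributeError, TypeError):
    tostring_failed = True

# Serialize each Element separately instead.
serialized = [ET.tostring(d, encoding="unicode") for d in destinations]
for s in serialized:
    print(s)

# Or read the child values directly.
sort_orders = [d.findtext("sortOrder") for d in destinations]
print(sort_orders)  # ['1', '2']
```

Passing encoding="unicode" makes tostring() return a str instead of bytes, which avoids the b'...' prefix seen in the log.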

Related

Parsing Errno22 with xml Element Tree

I am trying to develop a simple web scraper of sorts, and I keep having issues with the parsing code for the XML file it uses.
Whenever I run it, it gives me Errno 22, even though the path is valid. Can anyone help?
try:
    xmlTree = ET.parse('C:\TestWork\RWPlus\test.xml')
    root = xmlTree.getroot()
    returnValue = root[tariffPOS][childPOS].text
    return returnValue
except Exception as error:
    errorMessage = "A " + str(
        error) + " error occurred when trying to read the XML file."
    ErrorReport(errorMessage)
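You need to escape backslashes in Python string literals. In 'C:\TestWork\RWPlus\test.xml', the \t at the start of \test.xml is read as a tab character, so the path Python passes to the OS is not the one you typed: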
ET.parse('C:\\TestWork\\RWPlus\\test.xml')
Or you can use a raw string (note the r prefix):
ET.parse(r'C:\TestWork\RWPlus\test.xml')
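To see what the original literal turns into, the sketch below compares the three spellings. pathlib.PureWindowsPath is used only to show how the resulting string would be split as a Windows path; since a tab is not a valid file-name character on Windows, that is a plausible source of the Errno 22 (Invalid argument) message.

```python
from pathlib import PureWindowsPath

# Two correct spellings of the same Windows path.
escaped = 'C:\\TestWork\\RWPlus\\test.xml'
raw = r'C:\TestWork\RWPlus\test.xml'
print(escaped == raw)  # True

# What the original literal effectively produced: "\t" in "\test.xml" is a tab.
# (\T and \R are not escape sequences, so they survive; they are written
# as "\\T" and "\\R" here to avoid invalid-escape warnings.)
original = 'C:\\TestWork\\RWPlus\test.xml'
print('\t' in original)  # True

# Parsed as a Windows path, the last component is no longer "test.xml".
print(PureWindowsPath(raw).name)             # test.xml
print(repr(PureWindowsPath(original).name))  # 'RWPlus\test.xml'
```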

Extracting values from parsed xml text

I am using lxml to parse the following XML text block:
<block>{<block_content><argument_list>(<argument><expr><name><name>String</name><operator>.</operator><name>class</name></name></expr></argument>, <argument><expr><name><name>Object</name><operator>.</operator><name>class</name></name></expr></argument>)</argument_list></block_content>}</block>
<block>{<block_content><argument_list>(<argument><expr><literal type="string">"Expected exception to be thrown"</literal></expr></argument>)</argument_list></block_content>}</block>
<block>{<block_content></block_content>}</block>
My requirement is to print the following from the above XML snippet:
String.class
Object.class
"Expected exception to be thrown"
Basically, I need to print the text values contained within the argument nodes of the XML snippet.
Below is the code block that I am using.
from lxml import etree
xml_text = '<unit>' \
'<block>{<block_content><argument_list>(<argument><expr><name><name>String</name><operator>.</operator><name>class</name></name></expr></argument>, <argument><expr><name><name>Object</name><operator>.</operator><name>class</name></name></expr></argument>)</argument_list></block_content>}</block> ' \
'<block>{<block_content><argument_list>(<argument><expr><literal type="string">"Expected exception to be thrown"</literal></expr></argument>)</argument_list></block_content>}</block> ' \
'<block>{<block_content></block_content>}</block>' \
'</unit>'
tree = etree.fromstring(xml_text)
args = tree.xpath('//argument_list/argument')
for i in range(len(args)):
    print('%s. %s' %(i+1, etree.tostring(args[i]).decode("utf-8")))
However, this code produces the following output, which does not meet my requirement:
1. <argument><expr><name><name>String</name><operator>.</operator><name>class</name></name></expr></argument>,
2. <argument><expr><name><name>Object</name><operator>.</operator><name>class</name></name></expr></argument>)
3. <argument><expr><literal type="string">"Expected exception to be thrown"</literal></expr></argument>)
I would appreciate it if someone could point out what modifications I need to make to my code.
I found that the strip_tags function gets the job done. Below is the updated code:
for i in range(len(args)):
    etree.strip_tags(args[i], "*")
    print('%s. %s' %(i+1, args[i].text))
Output from the updated code:
String.class
Object.class
"Expected exception to be thrown"
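strip_tags() removes the child elements from the tree in place. If the tree is needed afterwards, the text of each argument can instead be collected with itertext(), which yields an element's text and its descendants' text and tails but not the element's own tail. That is also why the trailing "," and ")" from the first attempt disappear: lxml's tostring() includes the tail by default (with_tail=True). This sketch uses the standard library's xml.etree.ElementTree so it runs without lxml; lxml elements also provide itertext(), so the same comprehension should work on the asker's tree.

```python
import xml.etree.ElementTree as ET

xml_text = (
    '<unit>'
    '<block>{<block_content><argument_list>(<argument><expr><name><name>String</name>'
    '<operator>.</operator><name>class</name></name></expr></argument>, <argument><expr>'
    '<name><name>Object</name><operator>.</operator><name>class</name></name></expr>'
    '</argument>)</argument_list></block_content>}</block> '
    '<block>{<block_content><argument_list>(<argument><expr><literal type="string">'
    '"Expected exception to be thrown"</literal></expr></argument>)</argument_list>'
    '</block_content>}</block> '
    '<block>{<block_content></block_content>}</block>'
    '</unit>'
)

root = ET.fromstring(xml_text)

# Join all text inside each <argument>, leaving the tree unmodified.
values = ["".join(arg.itertext()) for arg in root.iterfind(".//argument_list/argument")]
for value in values:
    print(value)
```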

How do I run a Python '__main__' program file from a bash prompt in Windows 10?

I am trying to run a python3 program file and am getting some unexpected behaviors.
I'll start with my PATH and environment setup. When I run:
which Python
I get:
/c/Program Files/Python36/python
From there, I cd into the directory where my Python program is located to prepare to run it.
Roughly speaking, this is how my Python program is set up:
import modulesNeeded
print('1st debug statement to show program execution')
# variables declared as needed
def aFunctionNeeded():
    print('2nd debug statement to show fxn exe, never prints')
    # ... function logic ...
if __name__ == '__main__':
    aFunctionNeeded() # Never gets called
Here is a link to the repository with the code I am working with, in case you would like more details about the implementation. Keep in mind that the API keys are not published, but they are set correctly in my local files:
https://github.com/lopezdp/API.Mashups
My question is: why do the first debug statements in the files print to the terminal, but not the second debug statements inside the functions?
This is happening in both of the findRestaurant.py file and the geocode.py file.
I know I have written my if __name__ == '__main__': program entry point correctly, as this is exactly how I have done it in other programs, but in this case I may be missing something that I am not noticing.
If this is my output when I run my program in my bash terminal:
$ python findRestaurant.py
inside geo
inside find
then why does it appear that the aFunctionNeeded() function in my pseudocode is not being called from the main block?
Why do both programs seem to fail immediately after the first debug statements are printed to the terminal?
findRestaurant.py (also available at the link above):
from geocode import getGeocodeLocation
import json
import httplib2
import sys
import codecs
print('inside find')
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
sys.stderr = codecs.getwriter('utf8')(sys.stderr)
foursquare_client_id = "..."
foursquare_client_secret = "..."
def findARestaurant(mealType,location):
    print('inside findFxn')
    #1. Use getGeocodeLocation to get the latitude and longitude coordinates of the location string.
    latitude, longitude = getGeocodeLocation(location)
    #2. Use foursquare API to find a nearby restaurant with the latitude, longitude, and mealType strings.
    #HINT: format for url will be something like https://api.foursquare.com/v2/venues/search?client_id=CLIENT_ID&client_secret=CLIENT_SECRET&v=20130815&ll=40.7,-74&query=sushi
    url = ('https://api.foursquare.com/v2/venues/search?client_id=%s&client_secret=%s&v=20130815&ll=%s,%s&query=%s' % (foursquare_client_id, foursquare_client_secret,latitude,longitude,mealType))
    h = httplib2.Http()
    result = json.loads(h.request(url,'GET')[1])
    if result['response']['venues']:
        #3. Grab the first restaurant
        restaurant = result['response']['venues'][0]
        venue_id = restaurant['id']
        restaurant_name = restaurant['name']
        restaurant_address = restaurant['location']['formattedAddress']
        address = ""
        for i in restaurant_address:
            address += i + " "
        restaurant_address = address
        #4. Get a 300x300 picture of the restaurant using the venue_id (you can change this by altering the 300x300 value in the URL or replacing it with 'original' to get the original picture
        url = ('https://api.foursquare.com/v2/venues/%s/photos?client_id=%s&v=20150603&client_secret=%s' % ((venue_id,foursquare_client_id,foursquare_client_secret)))
        result = json.loads(h.request(url, 'GET')[1])
        #5. Grab the first image
        if result['response']['photos']['items']:
            firstpic = result['response']['photos']['items'][0]
            prefix = firstpic['prefix']
            suffix = firstpic['suffix']
            imageURL = prefix + "300x300" + suffix
        else:
            #6. if no image available, insert default image url
            imageURL = "http://pixabay.com/get/8926af5eb597ca51ca4c/1433440765/cheeseburger-34314_1280.png?direct"
        #7. return a dictionary containing the restaurant name, address, and image url
        restaurantInfo = {'name':restaurant_name, 'address':restaurant_address, 'image':imageURL}
        print ("Restaurant Name: %s" % restaurantInfo['name'])
        print ("Restaurant Address: %s" % restaurantInfo['address'])
        print ("Image: %s \n" % restaurantInfo['image'])
        return restaurantInfo
    else:
        print ("No Restaurants Found for %s" % location)
        return "No Restaurants Found"
if __name__ == '__main__':
    findARestaurant("Pizza", "Tokyo, Japan")
geocode.py (also available at the link above):
import httplib2
import json
print('inside geo')
def getGeocodeLocation(inputString):
    print('inside of geoFxn')
    # Use Google Maps to convert a location into Latitute/Longitute coordinates
    # FORMAT: https://maps.googleapis.com/maps/api/geocode/json?address=1600+Amphitheatre+Parkway,+Mountain+View,+CA&key=API_KEY
    google_api_key = "..."
    locationString = inputString.replace(" ", "+")
    url = ('https://maps.googleapis.com/maps/api/geocode/json?address=%s&key=%s' % (locationString, google_api_key))
    h = httplib2.Http()
    result = json.loads(h.request(url,'GET')[1])
    latitude = result['results'][0]['geometry']['location']['lat']
    longitude = result['results'][0]['geometry']['location']['lng']
    return (latitude,longitude)
The reason you're not seeing the output of the later parts of your code is that you've rebound the standard output and error streams with these lines:
sys.stdout = codecs.getwriter('utf8')(sys.stdout)
sys.stderr = codecs.getwriter('utf8')(sys.stderr)
I'm not exactly sure why those lines are breaking things for you; perhaps your console does not expect UTF-8 encoded output. But because they don't work as intended, you're not seeing anything from the rest of your code, including error messages, since you rebound the stderr stream along with the stdout stream.
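One likely explanation, assuming the script runs under Python 3 (the PATH above points at Python36): wrapping a stream with codecs.getwriter('utf8') produces a writer that encodes each str to bytes and writes those bytes to the wrapped stream. In Python 2 sys.stdout accepted bytes, but in Python 3 it is a text stream that only accepts str, so the first print() after the rebinding raises a TypeError. Because sys.stderr was wrapped the same way, that would also explain why no error message appears. The sketch below reproduces this with in-memory streams, using io.StringIO and io.BytesIO as stand-ins for sys.stdout and sys.stdout.buffer:

```python
import codecs
import io

# Stand-in for Python 3's sys.stdout: a text stream that only accepts str.
text_stream = io.StringIO()
broken = codecs.getwriter("utf8")(text_stream)

error = None
try:
    # The StreamWriter encodes str -> bytes, then passes bytes to a str-only stream.
    broken.write("inside findFxn\n")
except TypeError as exc:
    error = exc
print("text stream write failed:", error)

# Stand-in for sys.stdout.buffer: a binary stream, which is what the writer expects.
byte_stream = io.BytesIO()
working = codecs.getwriter("utf8")(byte_stream)
working.write("Restaurant Name: Caf\u00e9\n")
print(byte_stream.getvalue())  # b'Restaurant Name: Caf\xc3\xa9\n'
```

In the real script, the simplest fix is probably to delete the two rebinding lines. If UTF-8 output is actually needed, wrapping sys.stdout.buffer instead of sys.stdout, or calling sys.stdout.reconfigure(encoding="utf-8") on Python 3.7 or later, avoids the str/bytes mismatch.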

shodan - country code python

Once my Python script searches for specific strings, I want to list the results, but I also want to add the two-letter country code. When I try, I get KeyError: 'country_code', but the API says ocation.country_code. How can I achieve that?
#!/usr/bin/python
import shodan
SHODAN_API_KEY="xxxxxxxxxxxxxxxxxxxx"
api = shodan.Shodan(SHODAN_API_KEY)
try:
    # Search Shodan
    results = api.search('ProFTPd-1.3.3c')
    # Show the results
    for result in results['matches']:
        print '%s' % result['ip_str']
        print '%s' % result['country_code']
except shodan.APIError, e:
    print 'Error: %s' % e
I think this is the method you are using in Python:
https://github.com/achillean/shodan-python/blob/master/shodan/client.py#L324
and it calls:
return self._request('/shodan/host/search', args)
Shodan API documentation:
https://developer.shodan.io/api
Check out the /shodan/host/search API.
I just saw that the answer is in your question, but you dropped one letter from location (ocation).
Try this:
print '%s' % result['location']['country_code']
So the field you are looking for is there, but it is nested in another dictionary (location).
I would recommend reading the API documentation carefully next time. And as Nofal Daud said, Python errors are self-explanatory: a KeyError on a dict means that key is not there. Next time, listen to Python; it will reveal the truth.
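A minimal Python 3 sketch of the nested lookup, using dict.get() so that a match without location data does not raise a KeyError. The sample data below is made up and only mimics the shape of entries in results['matches'] as described in this answer; real Shodan results contain many more fields. In a real Python 3 script the exception clause would also need to be written as except shodan.APIError as e:.

```python
# Hypothetical sample shaped like results['matches']; not real Shodan output.
results = {
    "matches": [
        {"ip_str": "198.51.100.7", "location": {"country_code": "US"}},
        {"ip_str": "203.0.113.42", "location": {}},
    ]
}


def describe(match):
    # country_code lives inside the nested "location" dictionary.
    location = match.get("location") or {}
    country = location.get("country_code", "??")  # fallback when missing
    return "%s %s" % (match["ip_str"], country)


lines = [describe(m) for m in results["matches"]]
for line in lines:
    print(line)
```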

Python: Why will this string print but not write to a file?

I am new to Python and working on a utility that converts an XML file into an HTML file. The XML comes from a call to request = urllib2.Request(url), where I generate the custom URL earlier in the code; I then set response = urllib2.urlopen(request) and, finally, xml_response = response.read(). This works okay, as far as I can tell.
My trouble is with parsing the response. For starters, here is a partial example of the XML structure I get back:
I tried adapting the slideshow example in the minidom tutorial here to parse my XML (which is eBay search results, by the way): http://docs.python.org/2/library/xml.dom.minidom.html
My code so far looks like this, with try blocks as an attempt to diagnose issues:
doc = minidom.parseString(xml_response)
#Extract relevant information and prepare it for HTML formatting.
try:
    handleDocument(doc)
except:
    print "Failed to handle document!"
def getText(nodelist): #taken straight from slideshow example
    rc = []
    for node in nodelist:
        if node.nodeType == node.TEXT_NODE:
            print "A TEXT NODE!"
            rc.append(node.data)
    return ''.join(rc) #this is a string, right?
def handleDocument(doc):
    outputFile = open("EbaySearchResults.html", "w")
    outputFile.write("<html>\n")
    outputFile.write("<body>\n")
    try:
        items = doc.getElementsByTagName("item")
    except:
        print "Failed to get elements by tag name."
    handleItems(items)
    outputFile.write("</body>\n")
    outputFile.write("</html>\n")
def handleItems(items):
    for item in items:
        title = item.getElementsByTagName("title")[0] #there should be only one title
        print "<h2>%s</h2>" % getText(title.childNodes) #this works fine!
        try: #none of these things work!
            outputFile.write("<h2>%s</h2>" % getText(title.childNodes))
            #outputFile.write("<h2>" + getText(title.childNodes) + "</h2>")
            #str = getText(title.childNodes)
            #outputFIle.write(string(str))
            #outputFile.write(getText(title.childNodes))
        except:
            print "FAIL"
I do not understand why the correct title text prints to the console, but writing it to the output file throws an exception. Writing plain strings such as outputFile.write("<html>\n") works fine. What is going on with my string construction? As far as I can tell, the getText function I am using from the minidom example returns a string, which is just the sort of thing you can write to a file.
If I print the actual stack trace...
...
except:
    print "Exception when trying to write to file:"
    print '-'*60
    traceback.print_exc(file=sys.stdout)  # requires: import sys, traceback
    print '-'*60
    traceback.print_tb(sys.exc_info()[2])  # sys.last_traceback is only set in interactive sessions
...
...I will instantly see the problem:
------------------------------------------------------------
Traceback (most recent call last):
File "tohtml.py", line 85, in handleItems
outputFile.write(getText(title.childNodes))
NameError: global name 'outputFile' is not defined
------------------------------------------------------------
Looks like something has gone out of scope! outputFile is a local variable of handleDocument(), so handleItems() cannot see it.
Fellow beginners, take note.
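One common way to fix the scope problem is to pass the open file to the function that writes to it. The sketch below is written for Python 3 (the original is Python 2) and uses a small made-up XML document, since the real eBay response structure was not shown; the function names are illustrative rewrites of the ones above. An io.StringIO stands in for the output file so the result can be inspected, and html.escape() is added so titles containing & or < do not break the HTML.

```python
import html
import io
from xml.dom import minidom

# Hypothetical stand-in for the eBay response; the real structure was not shown.
xml_response = (
    "<root><item><title>First item</title></item>"
    "<item><title>Second item</title></item></root>"
)


def get_text(nodelist):
    """Concatenate the data of all text nodes (same idea as the slideshow example)."""
    return "".join(node.data for node in nodelist if node.nodeType == node.TEXT_NODE)


def handle_items(items, output_file):
    # The file object is passed in explicitly, so it is in scope here.
    for item in items:
        title = item.getElementsByTagName("title")[0]
        output_file.write("<h2>%s</h2>\n" % html.escape(get_text(title.childNodes)))


def handle_document(doc, output_file):
    output_file.write("<html>\n<body>\n")
    handle_items(doc.getElementsByTagName("item"), output_file)
    output_file.write("</body>\n</html>\n")


doc = minidom.parseString(xml_response)

# With a real file: with open("EbaySearchResults.html", "w") as f: handle_document(doc, f)
buffer = io.StringIO()
handle_document(doc, buffer)
print(buffer.getvalue())
```

Using a with block for the real file also guarantees the file is closed, which the original handleDocument() never did.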
