Returning XPATH response as a python dictionary - python

Scrapy noob here. I am extracting the 'rel' attribute of a link, which looks like the following:
rel=""prodimage":"image_link","intermediatezoomimage":"image_link","fullimage":"image_link""
This can be seen as a dict-like structure within the attribute.
My main goal is to obtain the image URL stored under 'fullimage'. Hence, I want to store the response as a Python dictionary.
However, XPath returns a unicode "list" (not just a string but a list!) with one item (the whole rel contents as one item):
res = response.xpath('//*[@id="detail_product"]/div[1]/div[2]/ul/li[1]/a/@rel').extract()
print res
[u'"prodimage":"image_link", "intermediatezoomimage":"image_link", "fullimage":"image_link"']
type(res)
<type 'list'>
How do I convert the content of 'res' into something like a Python dictionary (with the items separated out, not just one whole item) so that I can grab individual components from the structure within 'rel'?
I hope I am clear. Thank you!

SOLVED
The XPath result above is a list containing ONE unicode item.
Convert that item into a string (using x.encode('ascii'))
and then form a string representation of a dict. In my case I had to wrap the string (the rel contents) in curly braces. That's all!
Then convert that string representation of a dict into an actual dict using the method described in the linked question below.
Convert a String representation of a Dictionary to a dictionary?
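A minimal sketch of the steps above, written for Python 3 and assuming the attribute text becomes valid JSON once it is wrapped in curly braces. The `res` value is a stand-in for the list returned by `.extract()`, and `rel_to_dict` is a hypothetical helper name, not part of Scrapy.

```python
import json

# Stand-in for the single-item list that .extract() returned in the question
res = ['"prodimage":"image_link", "intermediatezoomimage":"image_link", "fullimage":"image_link"']


def rel_to_dict(rel_text):
    """Wrap the attribute text in braces and parse it as a JSON object."""
    return json.loads("{" + rel_text + "}")


rel = rel_to_dict(res[0])       # take the one item out of the list first
full_image = rel["fullimage"]   # then look up the key of interest
print(full_image)
```

If the attribute ever contains something that is not valid JSON (single quotes, for instance), `json.loads` raises a `ValueError`, so wrapping the call in a try/except may be worthwhile in a real spider.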

Related

Convert Json format String to Link{"link":"https://i.imgur.com/zfxsqlk.png"}

I try to convert this String to only the link: {"link":"https://i.imgur.com/zfxsqlk.png"}
I'm trying to create a discord bot, which sends random pictures from the API https://some-random-api.ml/img/red_panda.
With imageURL = json.loads(requests.get(redpandaurl).content) I get the JSON string, but what do I have to do to get only the link, like this: https://i.imgur.com/zfxsqlk.png
Sorry if my question is confusingly written, I'm new to programming and don't really know how to describe this problem.
You can simply do this:
image_url = requests.get(your_api_url).json()["link"]
Use the response object's json() method directly; there is no need to load the string with json.loads or do other manual work.
What you get from json.loads() is a Python dict. You can access values in the dict by specifying their keys.
In your case, there is only one key-value pair in the dict: "link" is the key and "https://i.imgur.com/zfxsqlk.png" is the value. You can get the link and store it in the variable by appending ["link"] to your line of code:
imageURL = json.loads(requests.get(redpandaurl).content)["link"]
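The same lookup can be tried without touching the network. This is only a sketch: `body` is a hard-coded stand-in for the text the API returned in the question, not a live response.

```python
import json

# Stand-in for the response body shown in the question; no request is made here
body = '{"link": "https://i.imgur.com/zfxsqlk.png"}'

payload = json.loads(body)   # JSON object -> Python dict
image_url = payload["link"]  # plain dict lookup by key
print(image_url)
```

With a real `requests` response, `response.json()` performs the `json.loads` step for you.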

Extract value from json data using python

After an API request I get the JSON 'data' below, where each record sits in its own set of curly brackets inside the square brackets under 'result'.
I want to extract the numbers and store/print them separated by commas.
So the requested output is:
0010041,0010042
I have tried the code below, but it fails with the following error:
TypeError: list indices must be integers or slices, not str
If 'result' holds only one set of curly brackets it works fine. Do I have to merge the multiple results into one and then extract every occurrence of 'number'?
import json
import sys
# the data as a Python dict
data={'result': [{'number': '0010041', 'day_of_week': 'monday'}, {'number': '0010042', 'day_of_week': 'tuesday'}]}
# serialise the dict to a JSON string
json_str = json.dumps(data)
# parse the JSON string back into a Python dict
resp = json.loads(json_str)
print (resp['result'])
print (resp['result']['number'])
The error message is clear: resp['result'] is a list of dicts, so it has to be indexed by position or iterated over, not indexed by a key.
Replace your last line with:
for i in resp['result']:
    print(i['number'])
Update:
As suggested in comments, you can use list comprehension. So to get your desired result, you can do:
print(",".join([i['number'] for i in resp['result']]))

Pull url from string

I have the string below and I'm trying to pull the URL out of it with Python/Django. Any thoughts on how I can get to it? I've tried treating it like a list but didn't have any luck.
[(u'https://api.twilio.com/2010-04-01/Accounts/ACae738c5e6aaf12ffa887440a3143e55b/Messages/MM673cd77ab21b37ae435c1d1d5e767366/Media/ME33be4a0ae88358aaef2aa0ea25f31339', u'image/jpeg')]
It looks like your value is a list with one tuple with two items. So get the first of each using the 0th index:
lt = [(u'https://api.twilio.com/2010-04-01/Accounts/ACae738c5e6aaf12ffa887440a3143e55b/Messages/MM673cd77ab21b37ae435c1d1d5e767366/Media/ME33be4a0ae88358aaef2aa0ea25f31339', u'image/jpeg')]
url = lt[0][0]
print(url)
https://api.twilio.com/2010-04-01/Accounts/ACae738c5e6aaf12ffa887440a3143e55b/Messages/MM673cd77ab21b37ae435c1d1d5e767366/Media/ME33be4a0ae88358aaef2aa0ea25f31339
If your value is actually a string CONTAINING the list, you can get a list by using ast:
import ast
lt = ast.literal_eval(lt)
... then use the above code to access the inner contents of the list.
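Both cases can be folded into one helper. This is a rough sketch: `first_url` is a hypothetical name, and the shortened example.com URL is a placeholder rather than the real Twilio address.

```python
import ast


def first_url(value):
    """Return the URL from a [(url, content_type)] list, or from its string form."""
    if isinstance(value, str):
        # The value is text that merely looks like a list, so parse it safely
        value = ast.literal_eval(value)
    return value[0][0]  # first tuple in the list, first item in that tuple


as_list_value = [(u'https://example.com/media/1', u'image/jpeg')]
as_text_value = "[(u'https://example.com/media/1', u'image/jpeg')]"

print(first_url(as_list_value))
print(first_url(as_text_value))
```

`ast.literal_eval` only evaluates Python literals, which makes it a safer choice than `eval` for text that comes from outside your program.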

json change dictionary item to a list with one dictionary

I'm working with a Rest Api for finding address details. I pass it an address and it passes back details for that address: lat/long, suburb etc. I'm using the requests library with the json() method on the response and adding the json response to a list to analyse later.
What I'm finding is that when there is a single match for an address the 'FoundAddress' key in the json response contains a dictionary but when more than one match is found the 'FoundAddress' key contains a list of dictionaries.
The returned json looks something like:
For a single match:
{
'FoundAddress': {AddressDetails...}
}
For multiple matches:
{
'FoundAddress': [{Address1Details...}, {Address2Details...}]
}
I don't want to write code to handle a single match and then multiple matches.
How can I modify the 'FoundAddress' so that when there is a single match it changes it to a list with a single dictionary entry? Such that I get something like this:
{
'FoundAddress': [{AddressDetails...}]
}
If it's the external API sending responses in that format then you can't really change FoundAddress itself, since it will always arrive in that format.
You can change the response if you want to, since you have full control over what you've received:
r = response.json()  # or json.loads(text) if you only have the raw JSON string
fixed = r['FoundAddress'] if (type(r['FoundAddress']) is list) else [r['FoundAddress']]
r['FoundAddress'] = fixed
Alternatively you can do the distinction at address usage time:
def func(foundAddress):
    # work with a single dictionary instance here
    ...
then:
result = map(func, r['FoundAddress']) if (type(r['FoundAddress']) is list) else [func(r['FoundAddress'])]
But honestly I'd take a clear:
if type(r['FoundAddress']) is list:
    result = map(func, r['FoundAddress'])
else:
    result = [func(r['FoundAddress'])]
or the response fix-up over the a if b else c one-liner any day.
If you can, I would just change the API. If you can't, there's nothing magical you can do. You just have to handle the special case. You could probably do this in one place in your code with a function like:
def handle_found_addresses(found_addresses):
    if not isinstance(found_addresses, list):
        found_addresses = [found_addresses]
    ...
and then proceed from there to do whatever you do with found addresses as if the value is always a list with one or more items.
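The normalisation step can be tried on both response shapes. In this sketch `as_list` is a hypothetical helper name, and the `suburb` values are made-up sample data rather than real API output.

```python
def as_list(value):
    """Return value unchanged if it is already a list, otherwise wrap it in one."""
    return value if isinstance(value, list) else [value]


# Made-up responses mimicking the single-match and multiple-match shapes
single = {"FoundAddress": {"suburb": "A"}}
multiple = {"FoundAddress": [{"suburb": "A"}, {"suburb": "B"}]}

for response in (single, multiple):
    response["FoundAddress"] = as_list(response["FoundAddress"])

# From here on, both responses can be looped over in the same way
for response in (single, multiple):
    for address in response["FoundAddress"]:
        print(address["suburb"])
```

Doing the fix-up once, right after the response is received, keeps the rest of the code free of type checks.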

associative list python

I am parsing an HTML form with Beautiful Soup. Basically I have around 60 input fields, mostly radio buttons and checkboxes. So far this works with the following code:
from BeautifulSoup import BeautifulSoup
x = open('myfile.html','r').read()
out = open('outfile.csv','w')
soup = BeautifulSoup(x)
values = soup.findAll('input',checked="checked")
# echoes some output like ('name',1) and ('value',4)
for cell in values:
    # the following line is my problem!
    statement = cell.attrs[0][1] + ';' + cell.attrs[1][1] + ';\r'
    out.write(statement)
out.close()
# x holds the file's text (a str), so there is no close() to call on it
As indicated in the code, my problem is where the attributes are selected: the HTML template is ugly and mixes up the order of the attributes that belong to an input field. I am interested in name="somenumber" value="someothernumber". Unfortunately my attrs[1] approach does not work, since name and value do not occur in the same order throughout my HTML.
Is there any way to access the resulting BeautifulSoup list associatively?
Thanks in advance for any suggestions!
My suggestion is to make values a dict. If soup.findAll returns a list of tuples as you seem to imply, then it's as simple as:
values = dict(soup.findAll('input',checked="checked"))
After that you can simply refer to the values by their attribute name, like what Peter said.
Of course, if soup.findAll doesn't return a list of tuples as you've implied, or if your problem is that the tuples themselves are being returned in some weird way (such that instead of ('name', 1) it would be (1, 'name')), then it could be a bit more complicated.
On the other hand, if soup.findAll returns one of a certain set of data types (dict or list of dicts, namedtuple or list of namedtuples), then you'll actually be better off because you won't have to do any conversion in the first place.
...Yeah, after checking the BeautifulSoup documentation, it seems that findAll returns an object that can be treated like a list of dicts, so you can just do as Peter says.
http://www.crummy.com/software/BeautifulSoup/documentation.html#The%20attributes%20of%20Tags
Oh yeah, if you want to enumerate through the attributes, just do something like this:
for cell in values:
    # in BeautifulSoup 3, cell.attrs is a list of (name, value) tuples
    for attribute, value in cell.attrs:
        out.write(attribute + ';' + str(value) + ';\r')
I'm fairly sure you can use the attribute name like a key for a hash:
print cell['name']
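Since the question's code indexes `cell.attrs` as a list of (name, value) tuples, the order problem can also be sidestepped by turning each list into a dict. The sketch below uses plain tuples as stand-ins for `cell.attrs`, so it runs without BeautifulSoup; the attribute values are invented.

```python
# Invented attribute lists, shaped like cell.attrs in the question,
# with name and value deliberately appearing in different orders
cells = [
    [("name", "12"), ("value", "4"), ("checked", "checked")],
    [("value", "2"), ("checked", "checked"), ("name", "13")],
]

rows = []
for attrs in cells:
    lookup = dict(attrs)  # list of (key, value) tuples -> dict
    rows.append(lookup["name"] + ";" + lookup["value"] + ";")

print(rows)
```

Looking attributes up by name makes the output independent of the order in which they appear in the HTML.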
