I have the string below, and I'm trying to pull the URL out of it with Python (in Django). Any thoughts on how I can get to it? I've tried treating it like a list but didn't have any luck.
[(u'https://api.twilio.com/2010-04-01/Accounts/ACae738c5e6aaf12ffa887440a3143e55b/Messages/MM673cd77ab21b37ae435c1d1d5e767366/Media/ME33be4a0ae88358aaef2aa0ea25f31339', u'image/jpeg')]
It looks like your value is a list containing one tuple of two items, so take the first element of each with index 0:
lt = [(u'https://api.twilio.com/2010-04-01/Accounts/ACae738c5e6aaf12ffa887440a3143e55b/Messages/MM673cd77ab21b37ae435c1d1d5e767366/Media/ME33be4a0ae88358aaef2aa0ea25f31339', u'image/jpeg')]
url = lt[0][0]
print(url)
https://api.twilio.com/2010-04-01/Accounts/ACae738c5e6aaf12ffa887440a3143e55b/Messages/MM673cd77ab21b37ae435c1d1d5e767366/Media/ME33be4a0ae88358aaef2aa0ea25f31339
If your value is actually a string containing the list, you can turn it into a real list with ast.literal_eval:
import ast
lt = ast.literal_eval(lt)
Then use the indexing above to access the inner contents of the list.
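Both cases can be handled in one place. The sketch below assumes the value is either the real list or its string form; the helper name extract_media is hypothetical. Unlike eval(), ast.literal_eval only evaluates Python literals, and it accepts the u'' prefixes shown in the question.

```python
import ast

# The value from the question, as it would look if it arrived as a string
raw = "[(u'https://api.twilio.com/2010-04-01/Accounts/ACae738c5e6aaf12ffa887440a3143e55b/Messages/MM673cd77ab21b37ae435c1d1d5e767366/Media/ME33be4a0ae88358aaef2aa0ea25f31339', u'image/jpeg')]"


def extract_media(value):
    """Hypothetical helper: return (url, content_type) from the list or its string form."""
    if isinstance(value, str):
        # Parse the string representation back into a list of tuples
        value = ast.literal_eval(value)
    # First (and only) tuple in the list, unpacked into its two items
    url, content_type = value[0]
    return url, content_type


url, content_type = extract_media(raw)
print(url)
print(content_type)
```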
Hi, I am trying to get the first URL of a Google search for each query in a list. For simplicity, I am going to use the same code as a similar question from two years ago.
from googlesearch import search
list_of_queries = ["Geeksforgeeks", "stackoverflow", "GitHub"]
results = []
for query in list_of_queries:
    results.append(search(query, tld="co.in", num=1, stop=1, pause=2))
print(results)
This returns a list of generator objects. One suggested way to print the results was to add
for result in results:
    print(list(result))
However, I want results to be a list of strings so I can scrape the URLs for data. One solution I found was to add
results_str = []
for result in results:
    results_str.append(list(result))
When I print results_str, I get this output:
[['https://www.geeksforgeeks.org/'], ['https://stackoverflow.com/'], ['https://github.com/']]
As you can see, I cannot use results_str directly as a list of URLs to scrape, because each URL is wrapped in its own list. I thought I could work around this by stripping the brackets, following this answer, and added
results_str_new = [s.replace('[' and ']', '') for s in results_str]
But this simply raises an AttributeError:
AttributeError: 'list' object has no attribute 'replace'
Either way, even if I did get it to work, it seems like a lot of unnecessary work just to convert a list of generator objects into strings to use as URLs for scraping, so I was wondering if there are any alternatives. I know one option is Selenium, but I'd rather not use it because I don't want an instance of Chrome opening every time I run my script.
Thanks in advance
You are getting back a list of lists of strings. To flatten it, you can use a list comprehension like this:
results_str = [url for result in results for url in result]
Or, if you'd rather not use a list comprehension, switch from append to extend. extend adds each element of the iterable to the list individually, whereas append inserts the whole list as a single element.
results_str = []
for result in results:
    results_str.extend(result)
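To see the difference without hitting the network, here is a sketch that swaps search() for a hypothetical fake_search generator. It contrasts append, extend and itertools.chain.from_iterable. Note that extend accepts the generator directly, so the intermediate list(result) call is not needed.

```python
from itertools import chain


def fake_search(query):
    """Hypothetical stand-in for googlesearch.search(): yields one URL per query."""
    yield f"https://example.com/{query.lower()}"


queries = ["Geeksforgeeks", "stackoverflow", "GitHub"]

# append: each generator's contents become one nested list
nested = []
for q in queries:
    nested.append(list(fake_search(q)))

# extend: the generator's items are added one by one, so the result is flat
flat = []
for q in queries:
    flat.extend(fake_search(q))

# chain.from_iterable flattens all generators without an explicit loop
chained = list(chain.from_iterable(fake_search(q) for q in queries))

print(nested)
print(flat)
```

chain.from_iterable is handy when you want the flat list in a single expression; the extend loop is easier to extend with per-query logic.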
It looks like you may be using a different version of googlesearch. I'm using googlesearch-python 1.1.0, whose call parameters are different, but this should help:
from googlesearch import search
list_of_queries = ["Geeksforgeeks", "stackoverflow", "GitHub"]
results = []
for query in list_of_queries:
    results.extend([r for r in search(query, 1, 'en')])
print(results)
Output:
['https://www.youtube.com/c/GeeksforGeeksVideos/videos', 'https://stackoverflow.com/', 'https://stackoverflow.blog/', 'https://github.com/']
As you can see, this is a flat list of strings (URLs in this case).
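Since the goal is only the first URL per query, and search() hands back a generator, next() can pull just that first item. This is a sketch with a hypothetical fake_search standing in for the real call; the default argument to next() avoids a StopIteration when a query returns nothing.

```python
from typing import Iterator, Optional


def fake_search(query: str) -> Iterator[str]:
    """Hypothetical stand-in for googlesearch.search(): lazily yields result URLs."""
    for n in range(3):
        yield f"https://example.com/{query.lower()}/{n}"


def first_result(query: str) -> Optional[str]:
    # next() consumes only the first item; None is returned if the generator is empty
    return next(fake_search(query), None)


queries = ["Geeksforgeeks", "stackoverflow", "GitHub"]
first_urls = [first_result(q) for q in queries]
print(first_urls)
```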
This is how I'm doing it, but I'm wondering if there's a better way. When I search for the problem, the suggestions are list comprehensions or the any() function, which either don't work for me or I'm misunderstanding them. From what I gathered, they just return boolean values, but I want to keep using the matched URL if one of the phrases appears in it.
for URL in URLs:
    if 'phrase1' in URL or 'phrase2' in URL or 'phrase3' in URL:
        get_subcategories(URL)  # Calling a function with the matched URL
        ...
You need a list of phrases to iterate over. any() only supplies the boolean for the if condition, so URL is still available inside the block:
phrases = ['phrase1', ...]
if any(phrase in URL for phrase in phrases):
    ...
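Putting it together, here is a sketch with made-up URLs and a hypothetical get_subcategories stand-in. It shows the loop form, the same filter as a list comprehension that keeps the matching URLs themselves, and next() to find which phrase matched.

```python
def get_subcategories(url):
    """Hypothetical stand-in for the asker's function."""
    return f"processed {url}"


URLs = [
    "https://example.com/phrase1/page",
    "https://example.com/other",
    "https://example.com/a/phrase3",
]
phrases = ["phrase1", "phrase2", "phrase3"]

# any() decides whether to enter the block; URL itself is still used inside it
processed = []
for URL in URLs:
    if any(phrase in URL for phrase in phrases):
        processed.append(get_subcategories(URL))

# The same filter as a list comprehension keeps the matched URLs
matched = [URL for URL in URLs if any(phrase in URL for phrase in phrases)]

# next() with a default gives the first phrase found in each URL, or None
first_hit = {URL: next((p for p in phrases if p in URL), None) for URL in URLs}

print(matched)
```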
I'm using spotipy to get a list of my playlists. I use
if token:
    sp = spotipy.Spotify(auth=token)
    playlists = sp.user_playlists(username)
    for item in playlists['items']:
        id = item['uri']
        print id
This prints playlist URIs that look like:
spotify:user:ultramusicofficial:playlist:0gvQoG7iMMz8L5Ltsa4lkT
spotify:user:spotify:playlist:4Ha7Qja6HY3AgvNBgWz87p
spotify:user:ministryofsounduk:playlist:7grWVkJDQpcBie8oqKP6hv
But there is something weird about the way it returns them. It's not a normal list and I can't seem to make it into one. If I use
print id[1]
It would return something like
p
p
p
I want to be able to do something like
print id[1]
and have it return
spotify:user:spotify:playlist:4Ha7Qja6HY3AgvNBgWz87p
I've tried joining and slicing it in different ways, treating it as a tuple, and converting it to a string. Nothing works, and I'm clearly unsure what to do. I feel like it's probably something simple that I'm missing. Any help would be appreciated, thanks.
You are only printing each id, not collecting them into a list, so after the loop id holds just the last value, which is a single URI (a string). You can use a list comprehension to build a list from the loop:
uris = [item['uri'] for item in playlists['items']]
Or start with an empty list and append each URI to it:
uris = []
for item in playlists['items']:
    uris.append(item['uri'])
In your example, id is a string, so id[1] is its second character, which is the p in "spotify".
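To make the string-versus-list distinction concrete, here is a sketch with a hand-built dict that only mimics the 'items'/'uri' part of the user_playlists() response (the real response has more fields). Indexing the list gives a URI; indexing a single URI gives a character.

```python
# Hypothetical data shaped like the 'items' part of sp.user_playlists(); only 'uri' is used
playlists = {
    "items": [
        {"uri": "spotify:user:ultramusicofficial:playlist:0gvQoG7iMMz8L5Ltsa4lkT"},
        {"uri": "spotify:user:spotify:playlist:4Ha7Qja6HY3AgvNBgWz87p"},
        {"uri": "spotify:user:ministryofsounduk:playlist:7grWVkJDQpcBie8oqKP6hv"},
    ]
}

# Collect every URI instead of overwriting one variable on each pass
uris = [item["uri"] for item in playlists["items"]]

print(uris[1])     # the second playlist URI
print(uris[1][1])  # indexing one string yields a single character
```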
Scrapy noob here. I am extracting the 'rel' attribute of a link, which looks like the following:
rel='"prodimage":"image_link","intermediatezoomimage":"image_link","fullimage":"image_link"'
In effect, the attribute holds a dict-like structure.
My main goal is to get the image URL stored under 'fullimage', so I want to turn the response into a Python dictionary.
However, XPath returns a list (not just a string, but a list!) with one unicode item: the whole rel contents as a single string.
res = response.xpath('//*[@id="detail_product"]/div[1]/div[2]/ul/li[1]/a/@rel').extract()
print res
[u'"prodimage":"image_link", "intermediatezoomimage":"image_link", "fullimage":"image_link"']
type(res)
<type 'list'>
How do I convert the content of res into a Python dictionary (with the entries separated out, not one whole item) so that I can grab individual components of the structure within rel?
I hope I am clear. Thank you!
SOLVED
The XPath response above is a list with one unicode item.
Convert that item into a plain string (using x.encode('ascii')) and then turn it into a string representation of a dict. In my case that just meant wrapping the rel contents in curly braces. That's all!
Then convert that string representation of a dict into an actual dict using the method mentioned in the link below.
Convert a String representation of a Dictionary to a dictionary?
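Because the rel contents already use double-quoted keys and values, wrapping them in braces produces valid JSON, so json.loads is one way to do that last step (an alternative to the method in the linked question, not necessarily the one it describes). The sketch uses a hard-coded res in place of the real XPath result. In recent Scrapy versions, .get() instead of .extract() returns the first match as a string rather than a list.

```python
import json

# Hypothetical stand-in for response.xpath(...).extract(): a one-item list
res = ['"prodimage":"image_link", "intermediatezoomimage":"image_link", "fullimage":"image_link"']

rel_text = res[0]                            # the single string inside the list
rel_dict = json.loads("{" + rel_text + "}")  # braces turn the key/value pairs into a JSON object

print(rel_dict["fullimage"])
```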
I am parsing an HTML form with Beautiful Soup. It has around 60 input fields, mostly radio buttons and checkboxes. So far this works with the following code:
from BeautifulSoup import BeautifulSoup
x = open('myfile.html','r').read()
out = open('outfile.csv','w')
soup = BeautifulSoup(x)
values = soup.findAll('input',checked="checked")
# echoes some output like ('name',1) and ('value',4)
for cell in values:
    # the following line is my problem!
    statement = cell.attrs[0][1] + ';' + cell.attrs[1][1] + ';\r'
    out.write(statement)
out.close()
As indicated in the code, my problem is where the attributes are selected: the HTML template is ugly and mixes up the order of the attributes that belong to an input field. I am interested in name="somenumber" and value="someothernumber". Unfortunately, my attrs[1] approach does not work, since name and value do not always appear in the same order in my HTML.
Is there any way to access the resulting BeautifulSoup list associatively?
Thanks in advance for any suggestions!
My suggestion is to make values a dict. If soup.findAll returns a list of tuples as you seem to imply, then it's as simple as:
values = dict(soup.findAll('input',checked="checked"))
After that you can simply refer to the values by their attribute name, as Peter said.
Of course, if soup.findAll doesn't return a list of tuples as you've implied, or if your problem is that the tuples themselves are being returned in some weird way (such that instead of ('name', 1) it would be (1, 'name')), then it could be a bit more complicated.
On the other hand, if soup.findAll returns one of a certain set of data types (dict or list of dicts, namedtuple or list of namedtuples), then you'll actually be better off because you won't have to do any conversion in the first place.
...Yeah, after checking the BeautifulSoup documentation, it seems that findAll returns an object that can be treated like a list of dicts, so you can just do as Peter says.
http://www.crummy.com/software/BeautifulSoup/documentation.html#The%20attributes%20of%20Tags
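Also, if you want to enumerate the attributes, you can do something like this:
for cell in values:
    # In BeautifulSoup 3, cell.attrs is a list of (name, value) pairs;
    # iterating over cell itself would walk its child nodes instead
    for attribute, value in cell.attrs:
        out.write(attribute + ';' + value + ';\r')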
I'm fairly sure you can use the attribute name like a key for a hash:
print cell['name']
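Another way to get associative access in BeautifulSoup 3 is dict(cell.attrs), which turns the (name, value) pairs into a dict so their order no longer matters. (In bs4, Tag.attrs is already a dict.) The sketch below uses hand-written attribute lists in that BS3 pair format instead of a parsed file, so the values are made up, and it keeps the question's name;value;\r output format.

```python
# Hypothetical BS3-style attribute lists: (name, value) pairs in template order
inputs_attrs = [
    [("type", "radio"), ("name", "12"), ("value", "4"), ("checked", "checked")],
    [("value", "7"), ("checked", "checked"), ("name", "13"), ("type", "checkbox")],
]

lines = []
for attrs in inputs_attrs:
    a = dict(attrs)  # look up by attribute name, whatever position it had
    lines.append(a["name"] + ";" + a["value"] + ";\r")

output = "".join(lines)
print(repr(output))
```

With real tags, the loop body would become a = dict(cell.attrs) inside the existing for cell in values: loop.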