I've written a script in Python to scrape item names, along with the review texts and reviewer names connected to each item, from a webpage using its API. The thing is, my script below can only do those things partially. I need to do them in an organized manner.
For example, each item name has multiple review texts and reviewer names connected to it. I wish to get them along the columns like:
Name review text reviewer review text reviewer -----
Basically, I can't figure out how to make use of the already defined for loop in the right way within my script. Lastly, there are a few item names which do not have any reviews or reviewers, so the code breaks when it doesn't find any reviews.
This is my approach so far:
import requests

url = "https://eatstreet.com/api/v2/restaurants/{}?yelp_site="

res = requests.get("https://eatstreet.com/api/v2/locales/madison-wi/restaurants")
for item in res.json():
    itemid = item['id']
    req = requests.get(url.format(itemid))
    name = req.json()['name']
    for texualreviews in req.json()['yelpReviews']:
        reviews = texualreviews['message']
        reviewer = texualreviews['reviewerName']
        print(f'{name}\n{reviews}\n{reviewer}\n')
If I use the print statement outside the for loop, it only gives me a single review and reviewer.
Any help fixing that will be highly appreciated.
You need to append each review and reviewer name to a list to display them as you wish.
Try the following code:
review_data = dict()
review_data['name'] = req.json()['name']
review_data['reviews'] = []
for texualreviews in req.json()['yelpReviews']:
    review_sub_data = {'review': texualreviews['message'], 'reviewer': texualreviews['reviewerName']}
    review_data['reviews'].append(review_sub_data)
# O/P: {'name': 'xxx', 'reviews': [{'review': 'xxx', 'reviewer': 'xxx'}, {'review': 'xxx', 'reviewer': 'xxx'}]}
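If you also want to guard against items that have no reviews (the crash you mentioned), here is a minimal sketch that folds the dict above into your original loop; the only assumption is that the yelpReviews key may be missing or empty, which .get() turns into an empty list:

import requests

url = "https://eatstreet.com/api/v2/restaurants/{}?yelp_site="
res = requests.get("https://eatstreet.com/api/v2/locales/madison-wi/restaurants")

all_items = []
for item in res.json():
    data = requests.get(url.format(item['id'])).json()
    review_data = {'name': data['name'], 'reviews': []}
    # .get() with a fallback avoids a KeyError for restaurants without reviews
    for review in data.get('yelpReviews') or []:
        review_data['reviews'].append(
            {'review': review['message'], 'reviewer': review['reviewerName']}
        )
    all_items.append(review_data)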
Hope this helps!
Trying to extract coin names, prices, and market caps from coinmarketcap.com. I first tried using soup.find_all to search for certain tags with a specific class, but it always picked up information I didn't need or want. So instead I used find_all to search for 'td' and then planned on using a for loop to look for specific class names, append those to a new list, and then print that list, but it returns a data type for some reason.
coin_table = soup.find_all('td')
class_value = 'sc-1eb5slv-0 iJjGCS'
for i in coin_table:
    if class_value in coin_table:
        list.append(i)
print(list)
But this returns:
<class 'list'>
to the console, even though I'm not asking to see the data type. I'm very new to BeautifulSoup and coding in general, so sorry if this is a very basic question. Still trying to get my head around all of this.
As @RJAdriaansen mentioned, you don't need to scrape the website when they provide an API. Here is how you do it with the requests library:
import requests
url = 'https://api.coinmarketcap.com/data-api/v3/cryptocurrency/listing?start=1&limit=100&sortBy=market_cap&sortType=desc&convert=USD,BTC,ETH&cryptoType=all&tagType=all&audited=false&aux=ath,atl,high24h,low24h,num_market_pairs,cmc_rank,date_added,tags,platform,max_supply,circulating_supply,total_supply,volume_7d,volume_30d'
response = requests.get(url)
data = response.json()
This will give you the JSON data. Now you can grab everything you need by accessing the correct keys:
final_list = []
temp = []
for each_crypto in data['data']['cryptoCurrencyList']:
    temp.append(each_crypto['name'])
    # each_crypto['quotes'] gives you a list with the price and market cap of each crypto
    for quote in each_crypto['quotes']:
        # assuming you want the USD price of each crypto
        if quote['name'] == "USD":
            temp.append(quote['price'])
            temp.append(quote['marketCap'])
    final_list.append(temp)
    temp = []
Final result would look like this:
[
['Bitcoin', 34497.01819639692, 646704595579.0485],
['Ethereum', 2195.11816422801, 255815488972.87268],
['Tether', 1.0003936138399, 62398426501.02027],
['Binance Coin', 294.2550537711805, 45148405357.003],
...
]
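As a side note, the inner loop can be written a bit more compactly with next() over a generator expression; this is just a sketch of the same logic, and it skips coins that have no USD quote:

final_list = []
for each_crypto in data['data']['cryptoCurrencyList']:
    # find the USD quote (None if it is missing)
    usd = next((q for q in each_crypto['quotes'] if q['name'] == 'USD'), None)
    if usd is not None:
        final_list.append([each_crypto['name'], usd['price'], usd['marketCap']])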
I'm new to APIs and web development, so I'm sorry if my question is very basic :(.
I want to create a web page for browsing food recipes based on the ingredients they contain. I'm using two query URLs to obtain the information because I need to access two JSON files: the first one to obtain the id of each recipe based on the ingredient searched by the user, and the second one to obtain the information of each recipe based on the id returned by the first URL.
The code I have is this one:
# Function that returns the ids of recipes containing the word queried by the user.
def ids(query):
    try:
        api_key = os.environ.get("API_KEY")
        response = requests.get(f"https://api.spoonacular.com/recipes/autocomplete?apiKey={api_key}&query={urllib.parse.quote_plus(query)}")
        response.raise_for_status()
    except requests.RequestException:
        return response
    try:
        ids = []
        quotes = response.json()
        for quote in quotes:
            ids.append(quote['id'])
        return ids
    except (KeyError, TypeError, ValueError):
        return None

# Save in a list named "ids" the ids of recipes that contain the ingredient chicken.
ids = ids("chicken")

# Function that returns the different recipe options based on the ids.
def lookup(ids):
    for ID in ids:
        try:
            api_key = os.environ.get("API_KEY")
            response = requests.get(f"https://api.spoonacular.com/recipes/{ID}/information?apiKey={api_key}&includeNutrition=false")
            response.raise_for_status()
        except requests.RequestException:
            return response
The main issue I have is that I don't know how to store the information returned in each response. As you may notice, I use a loop inside the lookup function to get the responses for all the ids contained in the list ids, but each id produces its own response (for instance, if I have 6 ids, I'll obtain 6 different responses with 6 different JSON payloads).
Finally, the info I want to store is this:
quote = response.json()
results = {'id':quote["id"],'title':quote["title"],'url':quote["sourceUrl"]}
This is the link to a sample of the data and the URL used to obtain the JSON:
https://spoonacular.com/food-api/docs#Get-Recipe-Information
I'm stuck trying to store this information, located inside the different JSON files, in a dictionary using Python.
Any kind of help will be amazing!!
You would best use a dict for this, with a structure matching the recipes you get back.
Assuming the API returns name, duration, and difficulty, that these are fields you will use later, and that you also save other data besides recipes in your program, you could use a dict. If this is not the case, simply use a list of dicts, each representing a single recipe:
# Just a dummy setup to simulate getting different recipes back from the API.
one_response = {"name": "Chicken and Egg", "duration": 14, "difficulty": "easy"}
another_response = {"name": "Chicken square", "duration": 100, "difficulty": "hard"}

def get_recipe(id):
    if id == 1:
        return one_response
    else:
        return another_response

ids = [1, 2]

# Other information you capture elsewhere could live here as well. If you don't have
# any, simply use a list with the recipe dicts inside.
queried_recipes = {"recipes": []}
for i in ids:
    # Here you simply add a recipe to your recipes dict.
    queried_recipes["recipes"].append(get_recipe(i))

print(queried_recipes)
OUT: {'recipes': [{'name': 'Chicken and Egg', 'duration': 14, 'difficulty': 'easy'}, {'name': 'Chicken square', 'duration': 100, 'difficulty': 'hard'}]}
print(queried_recipes["recipes"][0]["duration"])
OUT: 14
You may want to use https://spoonacular.com/food-api/docs#Get-Recipe-Information-Bulk instead. That will get you all the information you want in one JSON document without having to loop through repeated calls to https://api.spoonacular.com/recipes/{ID}/information.
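For example, a minimal sketch of the bulk call might look like this (the recipe ids below are hypothetical; the informationBulk endpoint takes a comma-separated ids parameter, as described in the docs linked above):

import os
import requests

api_key = os.environ.get("API_KEY")
ids = [716429, 715538]  # hypothetical recipe ids

# One request for all recipes instead of one request per id.
response = requests.get(
    "https://api.spoonacular.com/recipes/informationBulk",
    params={"apiKey": api_key, "ids": ",".join(map(str, ids))},
)
response.raise_for_status()
results = [
    {"id": q["id"], "title": q["title"], "url": q["sourceUrl"]}
    for q in response.json()
]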
However, to answer the original question:
def lookup(ids):
    api_key = os.environ.get("API_KEY")
    results = []
    for ID in ids:
        response = requests.get(f"https://api.spoonacular.com/recipes/{ID}/information?apiKey={api_key}&includeNutrition=false")
        response.raise_for_status()
        quote = response.json()
        result = {'id': quote["id"], 'title': quote["title"], 'url': quote["sourceUrl"]}
        results.append(result)
    return results
My code is meant to find elements on a page with the class "lh-copy truncate silver" and then copy all links within those elements, as well as related info, into a list. As of right now, the code simply saves the values into variables instead, and I am having issues making the conversion.
Here is the code that I have so far:
age_sex = browser.find_elements_by_xpath('//*[@class="lh-copy truncate silver"]')
for ii in age_sex:
    link = ii.find_element_by_xpath('.//a').get_attribute('href')
    sex = ii.find_element_by_xpath('.//span').text
    print(link, sex)
The code returns the information that I need in variable format as opposed to list format.
Edit: The reason I need a list as opposed to a variable is that with a variable, if I type variable[1], it'll just give me the second letter of the https:// link, which is 't'. Whereas if it is in list format, list[1] will return the full link. It's the only way I know to divide the block of text in a variable into separate links that can be accessed individually by my script.
It appears that your for loop is only printing individual elements. If you want lists of links and sexes, this may be helpful:
age_sex = browser.find_elements_by_xpath('//*[@class="lh-copy truncate silver"]')
link_list = []
sex_list = []
for ii in age_sex:
    link = ii.find_element_by_xpath('.//a').get_attribute('href')
    link_list.append(link)
    sex = ii.find_element_by_xpath('.//span').text
    sex_list.append(sex)
print(link_list, sex_list)
If you want to keep things together (i.e. a list of link and sex pairs), you can do the following:
age_sex = browser.find_elements_by_xpath('//*[@class="lh-copy truncate silver"]')
result_list = []
for ii in age_sex:
    link = ii.find_element_by_xpath('.//a').get_attribute('href')
    sex = ii.find_element_by_xpath('.//span').text
    result_list.append([link, sex])
print(result_list)
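With either version, indexing behaves the way your edit asks for, assuming at least two matching elements were found:

print(link_list[1])       # the full second link, not a single character
print(result_list[0][0])  # the link from the first pair
print(result_list[0][1])  # the sex from the first pair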
I hope I'm understanding your problem correctly.
# If the info in the variable is separated by something, a space for example or any specific char, try the following:
new_list = variable.split(char)
# If it's a space:
new_list = variable.split(' ')
Could you please explain the problem more clearly?
I don't know if it's possible, but hopefully you guys know what I'm trying to do. I want to do the model changes in a for loop, because the keys of the values always have the same names as the model columns.
My current code:
sites = DataModel.objects.all()
for site in sites:
    d = self.getDataBySoup(soup)
    site.title = d['title']
    site.text = d['text']
    site.facebook = d['facebook']
    site.twitter = d['twitter']
    site.save()
As you can see, the keys are always the same as the Django columns, so I thought it might be possible to do this with less code.
What I tried (but it's not working):
sites = DataModel.objects.all()
for site in sites:
    d = self.getDataBySoup(soup)
    for key, value in d.items():
        site.key = value
    site.save()
I use Python 3.6.
The getDataBySoup method just returns a dict:
def getContentDataBySoup(self, soup):
    data = {}
    data['title'] = 'some text'
    # etc.
    return data
sites = DataModel.objects.all()
for site in sites:
    d = self.getDataBySoup(soup)
    DataModel.objects.filter(id=site.pk).update(**d)
The code above updates each site entry with the data in the dictionary d. If you're using the update method, it's important that you filter by id (or pk) first; otherwise it won't know which entry to update.
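Alternatively, your attempted loop works almost as written if you use setattr, which is the standard way to set an attribute whose name is only known at runtime; a minimal sketch:

sites = DataModel.objects.all()
for site in sites:
    d = self.getDataBySoup(soup)
    for key, value in d.items():
        # setattr(site, 'title', value) is equivalent to site.title = value
        setattr(site, key, value)
    site.save()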
I'm trying to parse the following XML data:
http://pastebin.com/UcbQQSM2
This is just an example of the two types of data I will run into: companies with the needed address information and companies without it.
From the data I need to collect 3 pieces of information:
1) The Company name
2) The Company street
3) The Company zipcode
I'm able to do this with the following code:
# Creates list of Company names
CompanyList = []
for company in xmldata.findall('company'):
    name = company.find('name').text
    CompanyList.append(name)

# Creates list of Company zipcodes
ZipcodeList = []
for company in xmldata.findall('company'):
    contact_data = company.find('contact-data')
    address1 = contact_data.find('addresses')
    for address2 in address1.findall('address'):
        ZipcodeList.append(address2.find('zip').text)

# Creates list of Company streets
StreetList = []
for company in xmldata.findall('company'):
    contact_data = company.find('contact-data')
    address1 = contact_data.find('addresses')
    for address2 in address1.findall('address'):
        StreetList.append(address2.find('street').text)
But it doesn't really do what I want, and I can't figure out how to get there. I believe it will involve some type of 'if' statement, but I don't know.
The problem is that where I have:
for address2 in address1.findall('address'):
    ZipcodeList.append(address2.find('zip').text)
and
for address2 in address1.findall('address'):
    StreetList.append(address2.find('street').text)
It only adds to the lists the places that actually have a street name or zipcode listed in the XML, but I need a placeholder for the companies that DON'T have that information listed, so that my lists match up.
I hope this makes sense. Let me know if I need to add more information.
But, basically, I'm trying to find a way to say: if there isn't a zipcode/street name for the company, put "None", and if there is, put the zipcode/street name.
Any help/guidance is appreciated.
Well I am going to do a bad thing and suggest you use a conditional (ternary) operator.
StreetList.append(address2.find('street').text if address2.find('street') is not None else 'None')
So this statement says: return address2.find('street').text if address2.find('street') is not None, else return 'None'. (Testing the element itself, rather than its text, avoids an AttributeError when the street tag is missing entirely.)
Additionally, you could create a new method to do the same test and call it in both places (note: my Python is rusty, but this should get you close):
def returnNoneIfEmpty(testText):
    if testText:
        return testText
    else:
        return 'None'
Then just call it:
StreetList.append(returnNoneIfEmpty(address2.findtext('street')))
(Using findtext here returns None instead of raising when the tag is missing.)
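As a further shorthand, ElementTree's findtext also accepts a default, so the helper isn't strictly needed; a minimal sketch:

for address2 in address1.findall('address'):
    # findtext returns the element's text, or the default when the tag is missing
    StreetList.append(address2.findtext('street', default='None'))
    ZipcodeList.append(address2.findtext('zip', default='None'))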