I have a list of part numbers to look up via the search bar on this site: https://www.partsfinder.com/catalog/preview?q=0119000230
I want to collect the prices from the results.
Here's what I've put together so far, but I'm not sure where to go from here:
import requests
from bs4 import BeautifulSoup

# Fetch the search page and parse it
r = requests.get('https://www.partsfinder.com/catalog/preview?q=0119000230')
soup = BeautifulSoup(r.text, 'html.parser')

# Try to pick out the result rows (this comes back empty)
resultsRow = soup.find_all('a', {'class': 'search_result_row'})
results = []
Any help appreciated, thanks!
The content is dynamically loaded via an XHR POST request. You can see this in your browser's dev tools when refreshing the page. You can minimize the request to the following:
import requests

# Reproduce the XHR call with just the required payload
data = {"partOptionFilter": {"PartNumber": "0119000230", "AlternativeOemId": "17155"}}
r = requests.post('https://www.partsfinder.com/Catalog/Service/GetPartOptions', json=data).json()
print(r['Data']['PartOptions'][0]['YourPrice'])
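Since you have a whole list of part numbers, you could wrap that call in a loop. Here's a minimal sketch; the part_numbers list is just an example (only the first number comes from your question), and you may need to check in dev tools whether AlternativeOemId varies per part:

import requests

part_numbers = ["0119000230", "0119000231"]  # placeholder list; only the first is from the question

prices = {}
for pn in part_numbers:
    payload = {"partOptionFilter": {"PartNumber": pn, "AlternativeOemId": "17155"}}
    r = requests.post('https://www.partsfinder.com/Catalog/Service/GetPartOptions', json=payload).json()
    options = r['Data']['PartOptions']
    if options:  # skip part numbers with no results
        prices[pn] = options[0]['YourPrice']

print(prices)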
When I inspect "https://dse.bigexam.hk/en/ssp?p=1&band=1&order=name&asc=1" I can find the data I want. For example, the page total ("Showing schools 1 to 10 of 143.") is right there. However, my script returns no data. Can anyone help? Thanks.
from bs4 import BeautifulSoup
import requests

def makeSoup(url):
    response = requests.get(url)
    return BeautifulSoup(response.text, 'lxml')

url = "https://dse.bigexam.hk/en/ssp?p=1&band=1&order=name&asc=1"
soup = makeSoup(url)
pages = soup.find('div', attrs={'class': 'col-sm'})
print(pages)
That's because the content is loaded with Ajax/JavaScript. The requests library doesn't execute scripts, so you'll need something that can run them and hand you the resulting DOM.
You can try Selenium.
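For example, a minimal sketch with Selenium (assuming Chrome, and assuming the col-sm div from your code is the right target):

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

url = "https://dse.bigexam.hk/en/ssp?p=1&band=1&order=name&asc=1"

driver = webdriver.Chrome()  # assumes Chrome; Selenium 4.6+ manages the driver binary itself
driver.get(url)

# Wait until the JavaScript has rendered the element you're after
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CLASS_NAME, "col-sm"))
)

# Hand the rendered HTML to BeautifulSoup and reuse your original parsing code
soup = BeautifulSoup(driver.page_source, 'lxml')
driver.quit()

pages = soup.find('div', attrs={'class': 'col-sm'})
print(pages)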
I tried to fetch all of the product names from the web page, but I only got 12.
If I scroll down, the page refreshes and loads more products.
How can I get all of them?
import requests
from bs4 import BeautifulSoup

url = "https://www.outre.com/product-category/wigs/"
res = requests.get(url)
res.raise_for_status()
soup = BeautifulSoup(res.text, "lxml")

items = soup.find_all("div", attrs={"class": "title-wrapper"})
for item in items:
    print(item.p.a.get_text())
Your code is fine. The thing is, on that website the products are loaded dynamically, so a single request only gets you the first 12 products.
You can check the developer console in your browser to track the Ajax calls made while browsing.
I did, and it turns out a call is made to retrieve more products from the URL
https://www.outre.com/product-category/wigs/page/2/
So if you want all of the products, you need to browse multiple pages. I suggest running your code in a loop, once per page.
N.B.: it's worth checking whether the site exposes a more convenient place to get the products (i.e. somewhere other than the main page).
The page loads the products from a different URL via JavaScript, so Beautiful Soup doesn't see them. To get all pages, you can use the following example:
import requests
from bs4 import BeautifulSoup

url = "https://www.outre.com/product-category/wigs/page/{}/"

page = 1
while True:
    soup = BeautifulSoup(requests.get(url.format(page)).content, "html.parser")
    titles = soup.select(".product-title")
    if not titles:
        # An empty page means we've gone past the last one
        break
    for title in titles:
        print(title.text)
    page += 1
Prints:
...
Wet & Wavy Loose Curl 18″
Wet & Wavy Boho Curl 20″
Nikaya
Jeanette
Natural Glam Body
Natural Free Deep
Hello, I am trying to use Beautiful Soup and requests to log the data coming from an anemometer that updates live every second. The link to the website is here:
http://88.97.23.70:81/
The piece of data I want to scrape is the live reading in the element with id="js-2-text", which I found by inspecting the HTML in my browser.
I have written the code below to try to print out that value, but when I run it, it prints None. I think this means the soup object doesn't in fact contain the whole HTML page? When I print soup.prettify() I cannot find the id="js-2-text" that I see when inspecting the HTML in my browser. If anyone has any ideas why this might be, or how to fix it, I would be most grateful.
from bs4 import BeautifulSoup
import requests
wind_url='http://88.97.23.70:81/'
r = requests.get(wind_url)
data = r.text
soup = BeautifulSoup(data, 'lxml')
print(soup.find(id='js-2-text'))
All the best,
Brendan
The data is loaded from an external URL, so Beautiful Soup never sees it. You can use the API URL the page connects to instead:
import requests
from bs4 import BeautifulSoup

api_url = "http://88.97.23.70:81/cgi-bin/CGI_GetMeasurement.cgi"
data = {"input_id": "1"}

# The endpoint returns the current reading as comma-separated values in a <csv> tag
soup = BeautifulSoup(requests.post(api_url, data=data).content, "html.parser")
_, direction, metres_per_second, *_ = soup.csv.text.split(",")

# Convert metres per second to knots
knots = float(metres_per_second) * 1.9438445
print(direction, metres_per_second, knots)
Prints:
210 006.58 12.79049681
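Since the instrument updates every second, you could wrap the same call in a polling loop to log readings over time. A rough sketch (the wind_log.csv filename is just a placeholder):

import time
import requests
from bs4 import BeautifulSoup

api_url = "http://88.97.23.70:81/cgi-bin/CGI_GetMeasurement.cgi"
data = {"input_id": "1"}

with open("wind_log.csv", "a") as log:  # placeholder filename
    while True:
        soup = BeautifulSoup(requests.post(api_url, data=data).content, "html.parser")
        _, direction, metres_per_second, *_ = soup.csv.text.split(",")
        # Timestamp each reading so the log can be analysed later
        log.write(f"{time.time()},{direction},{metres_per_second}\n")
        log.flush()
        time.sleep(1)  # the anemometer updates roughly once per second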
I'm trying to read a news webpage to get the titles of its stories. I'm attempting to put them in a list, but I keep getting an empty list. Can someone please point me in the right direction here? What am I missing? Please see the code below. Thanks.
import requests
from bs4 import BeautifulSoup

url = 'https://nypost.com/'
ttl_lst = []

soup = BeautifulSoup(requests.get(url).text, "lxml")
title = soup.find_all('h2', {'class': 'story-heading'})
for row in title:
    ttl_lst.append(row.text)

print(ttl_lst)
The requests module only returns the first HTML document the server sends. Sites like nypost.com load their articles with Ajax requests that run after the page loads. You will have to use something like Selenium for this, which lets those requests complete before you parse the page.
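A rough Selenium sketch (assuming Chrome, and assuming the h2.story-heading selector from your code still matches once the page has rendered):

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://nypost.com/')

# Wait for at least one story heading to be injected by the page's scripts
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "h2.story-heading"))
)

soup = BeautifulSoup(driver.page_source, "lxml")
driver.quit()

ttl_lst = [h2.get_text(strip=True) for h2 in soup.find_all('h2', {'class': 'story-heading'})]
print(ttl_lst)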
I'm trying to scrape the following website:
https://www.bandsintown.com/?came_from=257&sort_by_filter=Number+of+RSVPs
I'm able to successfully scrape the events listed on the page with BeautifulSoup, using the following code:
from bs4 import BeautifulSoup
import requests

url = 'https://www.bandsintown.com/?came_from=257&sort_by_filter=Number+of+RSVPs'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

dates = soup.find_all('div', {'class': 'event-b58f7990'})
month = []
day = []
for i in dates:
    md = i.find_all('div')
    month.append(md[0].text)
    day.append(md[1].text)
However, the issue I'm having is that I'm only able to scrape the first 18 events - the rest of the page is only available if the 'view all' button is clicked at the bottom. Is there a way in beautifulsoup, or otherwise, to simulate this button being clicked, so that I can scrape ALL of the data? I'd prefer to keep this in python as I'm doing most scraping with beautifulsoup. Thanks so much!
If you can work out the last page, or set an end point for the range in the following (with error handling for going too far), you can get a JSON response and parse out the info you require as follows. Depending on how many requests you're making, you may choose to re-use the connection with a session, as sketched after the code.
import requests
import pandas as pd

url = 'https://www.bandsintown.com/upcomingEvents?came_from=257&sort_by_filter=Number+of+RSVPs&page={}&latitude=51.5167&longitude=0.0667'

results = []
for page in range(1, 20):
    data = requests.get(url.format(page)).json()
    for item in data['events']:
        results.append([item['artistName'], item['eventDate']['day'], item['eventDate']['month']])

df = pd.DataFrame(results)
print(df)
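If you'd rather not hard-code the range, here's a sketch that re-uses a session and stops on its own, assuming the endpoint returns an empty events list once you request past the last page:

import requests
import pandas as pd

url = 'https://www.bandsintown.com/upcomingEvents?came_from=257&sort_by_filter=Number+of+RSVPs&page={}&latitude=51.5167&longitude=0.0667'

results = []
with requests.Session() as s:  # re-use one connection across all page requests
    page = 1
    while True:
        data = s.get(url.format(page)).json()
        events = data.get('events', [])
        if not events:
            break  # assumed stop condition: no events past the last page
        for item in events:
            results.append([item['artistName'], item['eventDate']['day'], item['eventDate']['month']])
        page += 1

df = pd.DataFrame(results)
print(df)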