There are 4 switchable tabs on the website. I managed to extract from the first tab but couldn't figure out how to extract from the other three tabs, because the tab needs to be clicked on (I think). The tabs are:
Product Details
Feedback
Shipping & Payment
Seller Guarantees
My code:
import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
myurl = 'https://www.aliexpress.com/item/Vfemage-Womens-Elegant-Ruched-Bow-Contrast-Patchwork-3-4-Sleeve-Vintage-Pinup-Work-Office-Party-Fitted/32831085887.html?spm=2114.search0103.3.12.iQlXqu&ws_ab_test=searchweb0_0,searchweb201602_3_10152_10065_10151_10344_10068_10345_10342_10325_10343_51102_10546_10340_10548_10341_10609_10541_10084_10083_10307_10610_10539_10312_10313_10059_10314_10534_100031_10604_10603_10103_10605_10594_10142_10107,searchweb201603_25,ppcSwitch_5&algo_expid=a3e03a67-d922-4c90-aba7-d3cc80101a75-1&algo_pvid=a3e03a67-d922-4c90-aba7-d3cc80101a75&rmStoreLevelAB=0'
uClient = uReq(myurl)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
productdetails = page_soup.select("ul.product-property-list.util-clearfix li")
How do you extract the contents from the other three tabs?
I used Selenium to click each tab and extract the contents from all of them.
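A minimal sketch of that approach: click each tab with Selenium, then hand the rendered page source to BeautifulSoup. The tab labels come from the question, but the locator strategy and the pane id used here are assumptions — inspect the live page's markup to find the real ones.

```python
from bs4 import BeautifulSoup


def pane_text(page_source, pane_id):
    """Extract the text of one tab pane from rendered page source."""
    page_soup = BeautifulSoup(page_source, "html.parser")
    pane = page_soup.find(id=pane_id)
    return pane.get_text(strip=True) if pane else ""


def scrape_all_tabs(product_url):
    """Click each tab in a real browser and collect each pane's text.
    Needs Selenium and a driver; the LINK_TEXT locator and pane id
    are assumptions about AliExpress's markup."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    driver.get(product_url)
    texts = []
    for tab_name in ["Feedback", "Shipping & Payment", "Seller Guarantees"]:
        driver.find_element(By.LINK_TEXT, tab_name).click()
        texts.append(pane_text(driver.page_source, "j-tab-pane"))
    driver.quit()
    return texts
```

Calling `scrape_all_tabs(myurl)` with the product URL from the question would return one text blob per tab, assuming the selectors are adjusted to match the page.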
I am trying to access the live price of a cryptocurrency coin from an exchange webpage through Python.
The XPath of the value I want is /html/body/luno-exchange-app/luno-navigation/div/luno-market/div/section[2]/luno-orderbook/div/luno-spread/div/luno-orderbook-entry[2]/div/span/span and I have been doing the following:
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
exchange_url = 'https://www.luno.com/trade/markets/LTCXBT'
uClient = uReq(exchange_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
print(page_soup.html.body.luno-exchange-app.luno-navigation.div.luno-market.div.section[2].luno-orderbook.div.luno-spread.div.luno-orderbook-entry[2].div.span.span)
However, the code doesn't work because of the '-' characters in print(page_soup.html.body.luno-exchange-app...).
Is there a way to get that value I want?
How I got the XPath:
I pressed F12, right-clicked the value on the page, and clicked "Inspect", which highlights a section in the HTML; I then right-clicked that and chose Copy -> XPath. (The original post included a screenshot highlighting the value I am interested in.)
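On the narrow question of the hyphens: dot access like page_soup.luno-exchange-app can never work, because Python parses the '-' as subtraction. find() and CSS selectors take the tag name as a string, so hyphenated custom elements are fine. A small sketch on inline HTML mimicking the page's structure (note that this only demonstrates the selector mechanics — the live Luno value is not in the static HTML at all, as the answer below explains):

```python
from bs4 import BeautifulSoup

# Inline HTML imitating the nested custom elements from the XPath.
html = """
<luno-orderbook>
  <luno-orderbook-entry><div><span><span>0.0123</span></span></div></luno-orderbook-entry>
</luno-orderbook>
"""
page_soup = BeautifulSoup(html, "html.parser")

# Tag names with hyphens work fine as strings:
entry = page_soup.find("luno-orderbook-entry")
value = entry.span.span.get_text()
print(value)  # 0.0123

# Or the same thing with a CSS selector:
same = page_soup.select_one("luno-orderbook-entry span span").get_text()
```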
It is dynamically added from an AJAX request:
import requests
r = requests.get('https://ajax.luno.com/ajax/1/ticker?pair=LTCXBT')
print(r.json()['bid'])
I am trying to scrape some data from a website.
But when I print it, I just get the tags back without the information inside them.
This is the code:
#Imports
import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
#URL
my_url = 'https://website.com'
#Opening connection grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
#closing the page
uClient.close()
#Parse html
page_soup = soup(page_html,"html.parser")
price = page_soup.findAll("div",{"id":"lastTrade"})
print(price)
This is what I get back:
[<div id="lastTrade"> </div>]
Can anyone tell me what I have to change or add so that I receive the actual information from inside this tag?
Maybe loop through your list like this:
for res in price:
    print(res.text)
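The loop above works when the tag actually contains text in the fetched HTML, as this inline-HTML sketch shows. But note that the output shown in the question, [&lt;div id="lastTrade"&gt; &lt;/div&gt;], is an empty tag, so .text would be only whitespace — most likely the price is filled in by JavaScript after the page loads and never appears in the raw HTML that urlopen fetches.

```python
from bs4 import BeautifulSoup

# With text actually present in the fetched HTML, the loop prints it:
html = '<div id="lastTrade">42.50</div>'
price = BeautifulSoup(html, "html.parser").find_all("div", {"id": "lastTrade"})
for res in price:
    print(res.text)  # 42.50
```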
I'm trying to scrape data from Robintrack, but I cannot get the data from the increases/decreases section; I can only scrape the home-page data. Here is my soup:
import bs4
import requests
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
robin_track_url= 'https://robintrack.net/popularity_changes?changeType=increases'
#r = requests.get('https://robintrack.net/popularity_changes?changeType=increases')
#soup = bs4.BeautifulSoup(r.text, "xml")
#Grabs and downloads html page
uClient = uReq(robin_track_url)
page_html = uClient.read()
uClient.close()
#Does HTML parsing
page_soup = soup(page_html, "html.parser")
print("On Page")
print(page_soup.body)
stocks = page_soup.body.findAll("div", {"class":"ReactVirtualized__Table__row"})
print(len(stocks))
print(stocks)
Am I doing something wrong?
Your version will not work because the data you want is loaded via JavaScript; requests only loads the static page. To get the data you want, call the JSON endpoint the page itself uses:
requests.get('https://robintrack.net/api/largest_popularity_increases?hours_ago=24&limit=50&percentage=false&min_popularity=50&start_index=0').json()
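A sketch of working with that response. The endpoint call is in its own function since it needs network access; the 'symbol' key used by the helper is an assumption about the payload shape — print one row of the real response to confirm the field names before relying on it.

```python
def top_symbols(rows, n=5):
    """Return the first n ticker symbols from the API rows.
    The 'symbol' key is a guess at the payload shape."""
    return [row.get("symbol") for row in rows[:n]]


def fetch_increases():
    """Fetch the JSON the increases page loads via JavaScript (needs network)."""
    import requests
    url = ("https://robintrack.net/api/largest_popularity_increases"
           "?hours_ago=24&limit=50&percentage=false"
           "&min_popularity=50&start_index=0")
    return requests.get(url).json()
```

Something like `top_symbols(fetch_increases())` would then give the tickers with the largest popularity increases, assuming the field name is right.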
I'm new to Python and trying some web scraping with BeautifulSoup. After following some YouTube videos and endlessly looking for answers on Stack Overflow, I decided I would post for help.
With the below code I am hoping to extract data from the website. I've noticed that only 3 divs are listed when I print(len(div)), but there are several divs on the webpage. Other posts point to soup.findAll('div', {'class': 'responsive-text-label'}) as a solution; however, when I print() that code (and others similar), I get [] as a result in PyCharm.
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
my_url = 'https://coronavirus.jhu.edu/map.html'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, 'html.parser')
total_cases = page_soup.findAll('div', {'class': 'responsive-text-label'})
print(total_cases)
RESULTS FROM PYCHARM
[]
Thank you in advance for taking the time and helping me out.
Very new to Python. The following code only lets me display individual p entries from the extracted page (the first entry, index 0, in the current example).
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
my_url = "https://en.wikipedia.org/wiki/Young_Thug"
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
paragraphs = page_soup.findAll("p")
paragraph = paragraphs[0].text.strip()
print(paragraph)
For some reason, I can't work out the particular for loop I would need to display all of the p elements on the site in a single block of text.
The eventual goal of the above code snippet is a reading grade level app, hence the stripped down text. Any help would be appreciated, thank you!
I’m not near my laptop to include the output, but generally it would be:
paragraphs = page_soup.findAll("p")
for para in paragraphs:
print (para.text.strip())
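Since the goal is a reading-grade-level app, joining the stripped paragraphs into one string may be more useful than printing them one by one. A small sketch on inline HTML:

```python
from bs4 import BeautifulSoup

# Join all <p> text into a single block, ready to feed to a
# readability formula:
html = "<p>First paragraph.</p><p>Second paragraph.</p>"
page_soup = BeautifulSoup(html, "html.parser")
paragraphs = page_soup.find_all("p")
full_text = "\n".join(p.text.strip() for p in paragraphs)
print(full_text)
```

The same join works unchanged on the paragraphs pulled from the Wikipedia page.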