How to extract a certain number from a website in Python


I need to make a bot for a client that gets data from http://backpack.tf/stats/Unique/AWPer%20Hand/Tradable/Craftable and does some research.
At the top of the page you see the recommended price (3 ref), and if you scroll down you can see what people are actually selling them for.
I need to check whether what they're selling them for is less than the recommended price. I have inspected the element and found that each listing uses the class "media listing" followed by a random ID. Where do I go from here?

I would suggest reading the BeautifulSoup documentation, but this should give you a good idea of what you want to do:
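from bs4 import BeautifulSoup
import requests
url = "http://backpack.tf/stats/Unique/AWPer%20Hand/Tradable/Craftable"
r = requests.get(url)
r.raise_for_status()  # stop early if the server returned an error page
soup = BeautifulSoup(r.text, "html.parser")  # name the parser explicitly
# Assumes the recommended price is the text of the first <a> after the first <h2>
curPrice = soup.find('h2').find_next('a').text
print('The current price is: {0}'.format(curPrice))
print('These are the prices they are being sold at: ')
# Listing prices are <span class="label label-black" data-tip="bottom"> elements
print('\n'.join(item.text for item in soup.find_all('span', attrs={'class': 'label label-black', 'data-tip': 'bottom'})))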
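The answer prints both prices but stops short of the comparison the question asks for. A minimal sketch of that step follows, assuming the scraped labels look like "3 ref" or "2.66 ref" (an assumption; check the live markup). `parse_ref` and `cheaper_listings` are hypothetical helper names, and labels in other units (such as keys) are skipped rather than converted.

```python
import re

# Matches a number followed by "ref", e.g. "3 ref" or "2.66 ref".
# Assumption: listing labels on the page are written this way.
_REF_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*ref\b", re.IGNORECASE)


def parse_ref(label):
    """Return the refined-metal amount in a label, or None if it is not a ref price."""
    match = _REF_PATTERN.search(label)
    return float(match.group(1)) if match else None


def cheaper_listings(recommended_label, listing_labels):
    """Return (label, value) pairs for listings priced below the recommended price."""
    recommended = parse_ref(recommended_label)
    if recommended is None:
        raise ValueError(f"could not parse recommended price: {recommended_label!r}")
    cheaper = []
    for label in listing_labels:
        value = parse_ref(label)
        # Labels in other units (e.g. keys) are skipped, not converted.
        if value is not None and value < recommended:
            cheaper.append((label.strip(), value))
    return cheaper


if __name__ == "__main__":
    # Illustrative labels standing in for the scraped item.text values.
    sample = ["2.66 ref", "3 ref", "3.11 ref", "1 key"]
    print(cheaper_listings("3 ref", sample))  # [('2.66 ref', 2.66)]
```

With the answer's variables, the call would be `cheaper_listings(curPrice, [item.text for item in ...])`, passing the same `find_all` result that is printed.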

Related

Price Scraping: Element not visible in html

I am trying to extract the price value from the linked website using BeautifulSoup in Python. I can see where the price is when I use 'Inspect Element', but I do not see it when using 'View Source'.
import requests
from bs4 import BeautifulSoup
r = requests.get('https://www.rolex.com/en-us/watches/air-king/m126900-0001.html')
soup = BeautifulSoup(r.content, 'html.parser')
# class_ matches the element's class attribute
s = soup.find('span', class_='sc-fznKkj sc-fzqNJr sc-qWdEB emvJfj')
When I run this code, s is None instead of the element containing the price.

The price data is loaded from a JSON file. You can see this by opening Dev Tools, searching for the price (7,400 for me) and checking which request loaded it.
The code to get the price is simple :)
import requests
# This JSON endpoint is the request that Dev Tools shows delivering the price
response = requests.get("https://www.rolex.com/content/api/rolex/model-price.US.m126900-0001.en_us.json").json()
price = response['formattedPrice']  # $7,400
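If you need more than one model, the endpoint can be wrapped in a small helper. This sketch uses only the standard library (`urllib.request` and `json`) instead of requests. The URL template simply parameterizes the single URL from the answer; whether other regions, models and locales follow the same pattern is an assumption to verify in the Network tab. `price_url`, `formatted_price` and `fetch_price` are hypothetical names.

```python
import json
from urllib.request import Request, urlopen

# Template built from the single URL in the answer.
# Assumption: other models/regions/locales follow the same pattern.
PRICE_URL = "https://www.rolex.com/content/api/rolex/model-price.{region}.{model}.{locale}.json"


def price_url(model, region="US", locale="en_us"):
    """Build the JSON endpoint URL for one model number."""
    return PRICE_URL.format(region=region, model=model, locale=locale)


def formatted_price(payload):
    """Read 'formattedPrice' from a decoded response; None if the key is absent."""
    return payload.get("formattedPrice")


def fetch_price(model, timeout=10):
    """Download and decode the JSON for one model (performs a network request)."""
    # A browser-like User-Agent is a precaution; the site may not require it.
    request = Request(price_url(model), headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        return formatted_price(json.load(response))


if __name__ == "__main__":
    # Only builds the URL; call fetch_price(...) to actually hit the network.
    print(price_url("m126900-0001"))
```

Using `.get` instead of `['formattedPrice']` means a changed response shape gives `None` rather than a `KeyError`.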

Trying to Get A Link Embedded in an HTML Page with Python

On the webpage https://podcasts.apple.com/us/podcast/id979020229 there is a title that reads "Python at the US Federal Election Commission". When you click on that title, a link opens. What I'm trying to do is, first, find the first title on the page that has an embedded link, and then get that link and print it. I'm not sure how to do this in Python; I've tried several approaches I thought would work, one of which used the BeautifulSoup module. My code is below.
Code:
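import bs4
import requests
link = 'https://podcasts.apple.com/us/podcast/id979020229'
page = requests.get(link)
soup = bs4.BeautifulSoup(page.text, "html.parser")
eps = soup.find_all('a')
found = []  # hrefs collected so far (naming this `open` would shadow the built-in)
i = 0
while len(found) < 1 and i < len(eps):
    s = str(eps[i].get('href'))
    if len(s) > 8 and s[8] == 'q':
        found.append(s)
    i += 1
for url in found:
    print(url)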

You can use find_all with the class of the link.
I used Inspect Element to determine which class to select; in this case, tracks__track__link--block.
Then you can iterate through the links.
from bs4 import BeautifulSoup
import requests
page = requests.get('https://podcasts.apple.com/us/podcast/id979020229')
soup = BeautifulSoup(page.text, "html.parser")
eps = soup.find_all(class_='tracks__track__link--block')
for a in eps:
    # gets the text of the link
    print(a.text)
    # gets the link
    print(a['href'])
Prints:
Python at the US Federal Election Commission
https://podcasts.apple.com/us/podcast/python-at-the-us-federal-election-commission/id979020229?i=1000522628049
Flask 2.0
https://podcasts.apple.com/us/podcast/flask-2-0/id979020229?i=1000521751060
Awesome FastAPI extensions and add ons
https://podcasts.apple.com/us/podcast/awesome-fastapi-extensions-and-add-ons/id979020229?i=1000520763681
Ask us about modern Python projects and tools
https://podcasts.apple.com/us/podcast/ask-us-about-modern-python-projects-and-tools/id979020229?i=1000519506466
Automate your data exchange with PyDantic
https://podcasts.apple.com/us/podcast/automate-your-data-exchange-with-pydantic/id979020229?i=1000518286844
Python Apps that Scale to Billions of Users
https://podcasts.apple.com/us/podcast/python-apps-that-scale-to-billions-of-users/id979020229?i=1000517664109
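The question only wants the first title. With BeautifulSoup, `soup.find(class_='tracks__track__link--block')` returns just the first match instead of a list. For comparison, here is a dependency-free sketch of the same idea using the standard library's `html.parser`; `FirstLinkByClass` and `first_link` are hypothetical names, and the sample HTML in the usage is illustrative, not copied from the page.

```python
from html.parser import HTMLParser


class FirstLinkByClass(HTMLParser):
    """Record the href and text of the first <a> whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.href = None
        self.text_parts = []
        self._inside = False  # True while inside the matching <a>
        self.done = False     # True once the first match has been closed

    def handle_starttag(self, tag, attrs):
        if self.done or tag != "a":
            return
        attrs = dict(attrs)
        # class="a b" is a space-separated list; valueless attributes give None
        classes = (attrs.get("class") or "").split()
        if self.target_class in classes:
            self.href = attrs.get("href")
            self._inside = True

    def handle_data(self, data):
        if self._inside:
            self.text_parts.append(data)

    def handle_endtag(self, tag):
        if self._inside and tag == "a":
            self._inside = False
            self.done = True

    @property
    def text(self):
        # Collapse runs of whitespace, much like a browser renders the title
        return " ".join("".join(self.text_parts).split())


def first_link(html, target_class):
    """Return (text, href) of the first matching link, or ("", None) if none."""
    parser = FirstLinkByClass(target_class)
    parser.feed(html)
    parser.close()
    return parser.text, parser.href
```

For example, `first_link(page.text, 'tracks__track__link--block')` would return the title and href of the first episode, assuming the page still serves those links in its HTML.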

Navigating html tree with beautifulsoup

I'm trying to scrape some data with BeautifulSoup in Python (URL: http://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html).
When the data is the first occurrence there is no problem, for example:
titlebook = soup.find("h1")
titlebook = titlebook.text
but I want to scrape other values further down the page, like the UPC, price (incl. tax), etc.
The UPC value comes first, and I have it working with universal_product_code = soup.find("tr").find("td").text
I tried many solutions to access the other values (I've read the BeautifulSoup documentation and tried a lot of things, but it didn't really help).
So my question is: how do I access specific values in a tree where the tags are the same? I attached a screenshot of the tree to help you understand what I'm talking about.
Thank you for your help.

For example, if you want to find the price (excluding tax), you can use the string= parameter of .find() to locate the matching <th> and then read the text of the next <td>:
import requests
from bs4 import BeautifulSoup
url = "http://books.toscrape.com/catalogue/tipping-the-velvet_999/index.html"
soup = BeautifulSoup(requests.get(url).content, "html.parser")
# get "Price (excl. tax)" from the table:
key = "Price (excl. tax)"
print("{}: {}".format(key, soup.find("th", string=key).find_next("td").text))
Prints:
Price (excl. tax): £53.74
Or use a CSS selector:
print(soup.select_one('th:-soup-contains("Price (excl. tax)") + td').text)
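To grab every value in that product table at once (UPC, prices, availability and so on), you can map each <th> label to the <td> that follows it. With BeautifulSoup that is roughly `{th.get_text(strip=True): th.find_next_sibling('td').get_text(strip=True) for th in soup.find_all('th')}`. Below is a standard-library sketch of the same mapping, assuming each row has the shape <tr><th>label</th><td>value</td></tr> as on the example page; `HeaderCellTable` and `table_values` are hypothetical names and the sample values are illustrative.

```python
from html.parser import HTMLParser


class HeaderCellTable(HTMLParser):
    """Collect {<th> text: <td> text} pairs from rows like <tr><th>..</th><td>..</td></tr>."""

    def __init__(self):
        super().__init__()
        self.rows = {}
        self._cell = None  # "th" or "td" while inside a cell
        self._buf = []
        self._key = None   # label of the current row, set by its <th>

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._cell = tag
            self._buf = []
        elif tag == "tr":
            self._key = None

    def handle_data(self, data):
        if self._cell:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag != self._cell:
            return
        text = "".join(self._buf).strip()
        if tag == "th":
            self._key = text
        elif self._key is not None:
            # A <td> following a <th> in the same row becomes that label's value
            self.rows[self._key] = text
            self._key = None
        self._cell = None


def table_values(html):
    """Parse html and return the label -> value mapping."""
    parser = HeaderCellTable()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Any single value is then a dictionary lookup, e.g. `table_values(html)["Price (excl. tax)"]`.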

How to get the text and URL from a link using beautifulsoup

I have the following code which prints out a list of the links for each team in a table:
import requests
from bs4 import BeautifulSoup
# Get all teams in Big Sky standings table
URL = 'https://www.espn.com/college-football/standings/_/group/20/view/fcs-i-aa'
page = requests.get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
standings = soup.find_all('table', 'Table Table--align-right Table--fixed Table--fixed-left')
for team in standings:
    team_data = team.find_all('span', 'hide-mobile')
    print(team_data)
The code prints out the entire list, and if I pick an index, such as print(team_data[0]), it prints that specific link from the page.
How can I then get the URL string and the text from that link?
For example, my code prints out the following for the first index in the list.
<span class="hide-mobile"><a class="AnchorLink" data-clubhouse-uid="s:20~l:23~t:2692" href="/college-football/team/_/id/2692/weber-state-wildcats" tabindex="0">Weber State Wildcats</a></span>
How can I pull
/college-football/team/_/id/2692/weber-state-wildcats
and
Weber State Wildcats
from the link?
Thank you for your time and if there is anything I can add for clarification, please don't hesitate to ask.

Given HTML like:
<span class="hide-mobile"><a class="AnchorLink" data-clubhouse-uid="s:20~l:23~t:2692" href="/college-football/team/_/id/2692/weber-state-wildcats" tabindex="0">Weber State Wildcats</a></span>
Note that team_data is the list returned by find_all(), so take one element from it first (or loop over it). To get /college-football/team/_/id/2692/weber-state-wildcats:
>>> team_data[0].find('a')['href']
'/college-football/team/_/id/2692/weber-state-wildcats'
To get Weber State Wildcats:
>>> team_data[0].find('a').text
'Weber State Wildcats'

For the href/URL, select the <a> element and read its href attribute (tag['href']).
For the link text, read the same element's .text (or call get_text()).
Both amount to filtering down to the target element and then extracting the desired attribute.
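The href is relative, so to follow it you need to join it with the page URL. A minimal sketch with `urllib.parse.urljoin` follows; the helper name `team_record` is hypothetical, and extracting the numeric team id assumes ESPN team paths keep the `.../id/<number>/<slug>` shape seen in the example.

```python
from urllib.parse import urljoin, urlparse

STANDINGS_URL = "https://www.espn.com/college-football/standings/_/group/20/view/fcs-i-aa"


def team_record(text, href, base=STANDINGS_URL):
    """Turn one link's text and relative href into name, absolute URL and team id."""
    # An href starting with "/" replaces the whole path of the base URL
    url = urljoin(base, href)
    parts = urlparse(url).path.strip("/").split("/")
    # Assumption: team paths look like .../id/<number>/<slug>
    team_id = parts[parts.index("id") + 1] if "id" in parts and parts.index("id") + 1 < len(parts) else None
    return {"name": text.strip(), "url": url, "id": team_id}
```

Inside the question's loop, something like `for span in team_data: a = span.find('a'); print(team_record(a.text, a['href']))` would produce one record per team.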

Scraping page with BS in python only captures first column of splitColumn

I'm trying to scrape the last part of this page with BeautifulSoup in Python.
I want to retrieve all the companies listed at the bottom. The companies are ordered alphabetically: those whose names start with "A-F" appear under the first tab, "G-N" under the second tab, and so on. You have to click the tabs for the names to appear, so I'll loop through the different "name pages" and apply the same code.
However, I'm having trouble retrieving all the names on a single page.
When looking at the companies named "A-F", I can only retrieve the names in the first column of the table.
My code is:
from bs4 import BeautifulSoup as Soup, NavigableString
import requests
incl_page_url = "https://www.triodos.com/en/investment-management/socially-responsible-investment/sustainable-investment-universe/companies-atmf1/"
page = requests.get(incl_page_url)
soup = Soup(page.content, "html.parser")
for header in soup.find("h2").next_siblings:
    try:
        for a in header.children:
            # keep plain text nodes only
            if type(a) is NavigableString:
                print(str(a))
    except AttributeError:
        # skip siblings without child nodes, e.g. bare text
        pass
As can be seen by running this, I only get the names from the first column.
Any help is very much appreciated.

Give this a shot and let me know whether it's what you wanted:
from bs4 import BeautifulSoup
import requests
incl_page_url = "https://www.triodos.com/en/investment-management/socially-responsible-investment/sustainable-investment-universe/companies-atmf1/"
page = requests.get(incl_page_url).text
soup = BeautifulSoup(page, "lxml")  # needs the third-party lxml package; swap in "html.parser" if it is not installed
# Select every <p> inside each .splitColumn block, so all columns are covered
for items in soup.select(".splitColumn p"):
    title = '\n'.join(items.strings)
    print(title)
Result:
3iGroup
8point3 Energy Partners 
A
ABN AMRO
Accell Group
Accsys Technologies
Achmea
Acuity Brands
Adecco
Adidas
Adobe Systems
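The result mixes single-letter index headings (the "A" above) with company names. If you only want the names, a small filter can drop them. This is a sketch assuming every line made of a single uppercase letter is such a heading; `clean_company_names` is a hypothetical helper.

```python
def clean_company_names(lines):
    """Drop blank lines and single-letter index headings (like "A") from scraped lines.

    Assumption: a line consisting of one uppercase letter is an alphabet
    marker, not a company name.
    """
    names = []
    for line in lines:
        name = line.strip()
        if not name:
            continue
        if len(name) == 1 and name.isalpha() and name.isupper():
            continue
        names.append(name)
    return names
```

You could pass it the lines of each `title` from the answer (for example `title.split('\n')`) and call it once per tab page.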
