Extract the username from Facebook with BeautifulSoup - Python

I want to extract the username from Facebook posts without the API. I've already succeeded in extracting the timestamp, but the same approach is not working for the username.
As input I have a list of links like these:
https://www.facebook.com/barackobama/photos/a.10155401589571749/10156901908101749/?type=3&theater
https://www.facebook.com/photo.php?fbid=391679854902607&set=gm.325851774772841&type=1&theater
https://www.facebook.com/FisherHouse/photos/pcb.10157433176029134/10157433170239134/?type=3&theater
I've already tried searching by pageTitle, but it does not work as expected because it returns a lot of extraneous information.
facebook = BeautifulSoup(req.text, "html.parser")
facebookusername = str(facebook.select('[id="pageTitle"]'))
My code now is:
req = requests.get(url)
facebook = BeautifulSoup(req.text, "html.parser")
divs = facebook.find_all('div', class_="_title")
for iteration in range(len(divs)):
    if 'title' in str(divs[iteration]):
        print(divs[iteration])
I need only the username as output.

As WizKid said, you should use the API. But to give you an answer: the name of the page seems to be nested inside an h5 tag. Extract the h5 first and then get the name from the link inside it.
x = facebook.find('h5')
title = x.find('a').getText()
I can't try it at the moment but that should do the trick.
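Putting the pieces together, a minimal sketch might look like the following. Note that Facebook's markup changes frequently and many pages require login, so the h5/link structure here is only an assumption based on the answer above.

import requests
from bs4 import BeautifulSoup

# Sketch only: the h5 > a structure is an assumption; Facebook's markup changes often.
url = "https://www.facebook.com/barackobama/photos/a.10155401589571749/10156901908101749/?type=3&theater"
req = requests.get(url)
facebook = BeautifulSoup(req.text, "html.parser")

h5 = facebook.find("h5")
if h5 is not None and h5.find("a") is not None:
    username = h5.find("a").get_text()
    print(username)
else:
    print("Username element not found - the layout may have changed or the page requires login.")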

Related

How to specify needed fields using Beautiful Soup and properly call upon website elements using HTML tags

I have been trying to create a web scraping program that will return the values of the Title, Company, and Location from job cards on Indeed. I am finally not getting error codes; however, I am only returning one value for each of the desired fields even though there are multiple job cards I am trying to read. Also, the company field is returning the same value as the title field, because that text is used in both places in the HTML. I am unfamiliar with HTML and with how to target what I need using Beautiful Soup. I have tried the documentation and played with a few different methods, but have been unsuccessful.
import requests
from bs4 import BeautifulSoup

page = requests.get("https://au.indeed.com/jobs?q=web%20developer&l=perth&from=searchOnHP&vjk=6c6cd45320143cdf").text
soup = BeautifulSoup(page, "lxml")
results = soup.find(id="mosaic-zone-jobcards")
job_elements = results.find("div", class_="slider_container")
for job_element in job_elements:
    title = job_element.find("h2")
    company = job_element.find("span")
    location = job_element.find("div", class_="companyLocation")
print(title.text)
print(company.text)
print(location.text)
Here is what gets printed to the console:
C:\Users\Admin\PycharmProjects\WebScraper1\venv\Scripts\python.exe
C:/Users/Admin/PycharmProjects/WebScraper1/Indeed.py
Web App Developer
Web App Developer
Perth WA 6000
Process finished with exit code 0
job_elements only contains the first matching element because you used find instead of find_all. For the same reason, company points to the first span found in div.slider_container; the span you want has class="companyName". Also, the prints should be inside the for loop. Here's the improved code:
job_elements = results.find_all("div", class_="slider_container")
for job_element in job_elements:
    title = job_element.find("h2")
    company = job_element.find("span", class_="companyName")
    location = job_element.find("div", class_="companyLocation")
    print(title.text)
    print(company.text)
    print(location.text)
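For reference, a fully self-contained version of the corrected approach might look like this. The URL and the class names (slider_container, companyName, companyLocation) are taken from the question and may well have changed on Indeed's side since.

import requests
from bs4 import BeautifulSoup

# Class names and URL come from the question; Indeed's markup may have changed.
url = ("https://au.indeed.com/jobs?"
       "q=web%20developer&l=perth&from=searchOnHP&vjk=6c6cd45320143cdf")
page = requests.get(url).text
soup = BeautifulSoup(page, "lxml")

results = soup.find(id="mosaic-zone-jobcards")
if results is None:
    raise SystemExit("Job card container not found - the page layout may have changed.")

# find_all returns every matching card, not just the first one.
for job_element in results.find_all("div", class_="slider_container"):
    title = job_element.find("h2")
    company = job_element.find("span", class_="companyName")
    location = job_element.find("div", class_="companyLocation")
    # Guard against cards that are missing one of the fields.
    print(title.text if title else "n/a")
    print(company.text if company else "n/a")
    print(location.text if location else "n/a")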

Scraping stock price data

I'm trying to get the prices from this URL:
https://stockcharts.com/h-hd/?%24GSPCE
Can someone tell me what I'm doing wrong?
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://stockcharts.com/h-hd/?%24GSPCE"
resp = requests.get(BASE_URL).content
soup = BeautifulSoup(resp, 'html.parser')
prices = soup.find('div', {'class': 'historical-data-descrip'})
content = str(prices)
print(content)
StockCharts only provides historical data to StockCharts members, so you probably need to pass some kind of authentication.
Or use an API like this one.
You must be logged in to your StockCharts.com account to see this information.
Are you signed in in your browser?
Dump soup.contents to a text file and search for the message above to see whether you're simply getting a login error.
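To check whether you are just hitting the login wall, and to reuse an authenticated browser session if you have one, a rough sketch with requests could look like this. The cookie name below is a placeholder, not StockCharts' actual session cookie name; check your browser's developer tools for the real one.

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://stockcharts.com/h-hd/?%24GSPCE"

# Sketch: reuse a logged-in browser session by copying its cookie.
# The cookie name/value are placeholders - look them up in your browser's dev tools.
session = requests.Session()
session.cookies.set("SESSIONID", "paste-your-browser-cookie-value-here")

resp = session.get(BASE_URL)
soup = BeautifulSoup(resp.content, "html.parser")

# Quick check for a login wall before trying to parse prices.
if "must be logged in" in soup.get_text().lower():
    print("Still hitting the login page - the cookie is missing or expired.")
else:
    prices = soup.find("div", {"class": "historical-data-descrip"})
    print(prices.get_text(strip=True) if prices else "Price block not found.")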

Retrieving information from a web page in Python?

I have the following page:
http://www.noncode.org/keyword.php
from which I want to extract some information by performing an external search with Python. It may sound simple, but I have not programmed web applications before.
So I would like to put in the search box something like:
NONHSAT146018.2
to perform a search and from the resulting webpage, which is this:
From the results I need to extract the field that says Sequence. I have read some information about the BeautifulSoup library and some examples, but none of them include the PHP form in the address. I would really appreciate your help with this. Thanks.
Update: Following the advice of the users and with the help of #Lukas Newman, I made the following:
data="NONHSAT146018.2"
page = requests.get("http://www.noncode.org/show_rna.php?id=" + data)
soup=BS(page.content,'html.parser')
target = soup.find('h2',text='Sequence')
print(target)
target = soup.find('table',text='table-1')
print(target)
table = soup.find('table', attrs={'class'},text='table-1')
print(table)
When I inspect the results, I see that the sequence is in the following field:
How can I extract that part using Python?
Look at the URL:
http://www.noncode.org/show_rna.php?id=NONHSAT000002
The search term is just passed as a GET parameter, so to access the page just set the start URL to something like:
import requests
from bs4 import BeautifulSoup

rna_id = "NONHSAT146018"
page = requests.get("http://www.noncode.org/show_rna.php?id=" + rna_id)
soup = BeautifulSoup(page.content, "html.parser")

element = soup.find_all('table', class_="table-1")[1]  # second "table-1" table
element2 = element.find_all('tr')[1]                   # its second row
element3 = element2.find_next('td')                    # first data cell in that row
your_data = str(element3.renderContents(), "utf-8")
print(your_data)
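If the table layout ever shifts, a slightly more defensive variant is to locate the cell labelled "Sequence" and read the cell that follows it. This is only a sketch and assumes the label sits in its own td, which may not match the real page exactly.

import requests
from bs4 import BeautifulSoup

rna_id = "NONHSAT146018"
page = requests.get("http://www.noncode.org/show_rna.php?id=" + rna_id)
soup = BeautifulSoup(page.content, "html.parser")

# Look for the cell containing the "Sequence" label instead of fixed indices.
label_cell = soup.find("td", string=lambda s: s and "Sequence" in s)
if label_cell is not None:
    sequence_cell = label_cell.find_next("td")
    print(sequence_cell.get_text(strip=True))
else:
    print("Sequence label not found - fall back to the fixed-index approach above.")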

When I scrape data from a website it only returns a newline

I've tried the code with different websites and elements, but nothing was working.
import requests
from lxml import html
page = requests.get('https://www.instagram.com/username.html')
tree = html.fromstring(page.content)
follow = tree.xpath('//span[@class="g47SY"]/text()')
print(follow)
input()
Above is the code I tried to use to acquire the number of Instagram followers someone has.
One issue with web scraping Instagram is that a lot of content, including tag attribute values, is rendered dynamically. So the class you are using to fetch followers may change.
If you are able to use the Beautiful Soup library in Python, you might have an easier time parsing the page and getting the data. You can install it using pip install bs4. You can then search for the og:description descriptor, which follows the Open Graph protocol, and parse it to get follower counts.
Here's an example script that should get the follower count for a particular user:
import requests
from bs4 import BeautifulSoup
username = 'google'
html = requests.get('https://www.instagram.com/' + username)
bs = BeautifulSoup(html.text, 'lxml')
item = bs.select_one("meta[property='og:description']")
name = item.find_previous_sibling().get("content").split("•")[0]
follower_count = item.get("content").split(",")[0]
print(follower_count)
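If you also want the Following and Posts counts, the og:description content typically looks like "24M Followers, 30 Following, 1,234 Posts - ...". A small hypothetical helper could split out all three; the format is assumed here and may change on Instagram's side.

def parse_counts(description):
    """Split an og:description string such as
    '24M Followers, 30 Following, 1,234 Posts - Google on Instagram'
    into its three counts. The exact format is assumed and may change."""
    stats = description.split(" - ")[0]  # text before the dash holds the counts
    followers, following, posts = [p.strip() for p in stats.split(",", 2)]
    return followers, following, posts

# In the script above this would be called as parse_counts(item.get("content")).
print(parse_counts("24M Followers, 30 Following, 1,234 Posts - Google on Instagram"))
# ('24M Followers', '30 Following', '1,234 Posts')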

Scrape Facebook friends with BeautifulSoup

I've already done some basic web scraping with BeautifulSoup. For my next project I've chosen to scrape the Facebook friends list of a specified user. The problem is, Facebook lets you see friend lists of people only if you are logged in. So my question is: can I somehow bypass this, and if not, can I make BeautifulSoup act as if it were logged in?
Here's my code:
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = input("enter url: ")
try:
    page = urlopen(url)
except:
    print("Error opening the URL")

soup = BeautifulSoup(page, 'html.parser')
content = soup.find('div', {"class": "_3i9"})
friends = ''
for i in content.findAll('a'):
    friends = friends + ' ' + i.text
print(friends)
BeautifulSoup doesn't require that you use a URL. Instead:
Inspect the friends list
Copy the parent tag containing the list to a new file (ParentTag.html)
Open the file as a string, and pass it to BeautifulSoup()
with open("path/to/ParentTag.html", encoding="utf8") as html:
    soup = BeautifulSoup(html, "html.parser")
Then, "you make-a the soup-a."
The problem is, Facebook lets you see friend lists of people only if you are logged in
You can overcome this using Selenium. You'll need it to authenticate yourself; then you can navigate to the user. Once you've found them, you can proceed in two ways:
You can get the HTML source with driver.page_source and from there use Beautiful Soup
Use the methods that Selenium provides to scrape the friends list
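A bare-bones sketch of the first route (log in with Selenium, then hand the rendered HTML to Beautiful Soup) might look like this. The login field IDs and the _3i9 class are placeholders taken from common usage and from the question; Facebook changes its markup regularly, so treat every selector as an assumption.

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

# Sketch only: the selectors ("email", "pass", "login", "_3i9") are assumptions.
driver = webdriver.Chrome()
driver.get("https://www.facebook.com/login")
driver.find_element(By.ID, "email").send_keys("your-email@example.com")
driver.find_element(By.ID, "pass").send_keys("your-password")
driver.find_element(By.NAME, "login").click()

# Navigate to the user's friends page and parse the rendered HTML.
driver.get("https://www.facebook.com/some.user/friends")
soup = BeautifulSoup(driver.page_source, "html.parser")
content = soup.find("div", {"class": "_3i9"})  # class taken from the question
if content:
    print(" ".join(a.text for a in content.find_all("a")))
driver.quit()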
