I have a list of YouTube video links like https://www.youtube.com/watch?v=ywZevdHW5bQ and I need to scrape the view count using the BeautifulSoup and requests libraries.
import requests
from bs4 import BeautifulSoup

url = 'https://www.youtube.com/watch?v=ywZevdHW5bQ'
soup = BeautifulSoup(requests.get(url).text, 'lxml')

# the view count is embedded in the <meta itemprop="interactionCount"> tag
print(soup.select_one('meta[itemprop="interactionCount"][content]')['content'])
Prints:
5186856
An alternative is to get yourself a YouTube API key and call the videos/list API endpoint to get information about the video, then extract viewCount from the response; a sketch follows the links below.
https://developers.google.com/youtube/v3/quickstart/python
https://developers.google.com/youtube/v3/docs/videos/list
https://developers.google.com/youtube/v3/docs/videos#resource
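A minimal sketch of that approach, assuming google-api-python-client is installed and you substitute your own API key (YOUR_API_KEY below is a placeholder; the video ID is the one from the question):

from googleapiclient.discovery import build

# statistics.viewCount is returned by the API as a string
youtube = build('youtube', 'v3', developerKey='YOUR_API_KEY')
response = youtube.videos().list(part='statistics', id='ywZevdHW5bQ').execute()
print(response['items'][0]['statistics']['viewCount'])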
I am trying to scrape some data from a webpage with the requests and BeautifulSoup libraries in Python, using the following code:
import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.century21.com/real-estate/rock-springs-wy/LCWYROCKSPRINGS/?")
c = r.content
soup = BeautifulSoup(c, 'html.parser')

# look for the listing cards by their full class string
all = soup.find_all("div", {'class': "infinite-item property-card clearfix property-card-CBR52611979 initialized visited"})
all
In the output I am getting an empty list:
[]
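For what it's worth: class tokens like initialized and visited in that selector look like state classes added by JavaScript after the page loads, so they are probably not present in the raw HTML that requests downloads. A minimal sketch, assuming the cards carry a stable property-card class in the server-rendered HTML, is to match on that single token instead:

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.century21.com/real-estate/rock-springs-wy/LCWYROCKSPRINGS/?")
soup = BeautifulSoup(r.content, 'html.parser')

# class_ matches any element whose class list contains this one token,
# so the full "infinite-item property-card clearfix ..." string is not required
cards = soup.find_all("div", class_="property-card")
print(len(cards))  # will still be 0 if the cards are rendered client-side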
I was scraping the price, product name and reviews on the e-commerce page https://www.ubaldi.com/electromenager/lavage/lave-linge/candy/lave-linge-frontal-candy--css1410twmre-47--40412978.php, but I could not get past the image captcha: the response just says "Please enable JS and disable ad blocker".
Can anyone tell me how to solve that captcha using Python?
Here is my code:
# load libraries
from bs4 import BeautifulSoup
from requests import get
import pandas as pd

siteUrl = "https://www.ubaldi.com/electromenager/lavage/lave-linge/candy/lave-linge-frontal-candy--css1410twmre-47--40412978.php"
response = get(siteUrl)
main_container = BeautifulSoup(response.text, 'html.parser')
print(main_container)
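A note on that error: "Please enable JS and disable ad blocker" usually means the page is generated or gated by JavaScript, which requests alone never executes, so parsing the returned HTML will not help. A hedged sketch (Selenium is my assumption here, not something the original code uses) that loads the page in a real browser before parsing; it will not solve an actual image captcha if one is still shown:

# sketch assuming a driven Chrome browser gets past the JS check;
# a genuine image captcha would still have to be handled separately
from bs4 import BeautifulSoup
from selenium import webdriver

siteUrl = "https://www.ubaldi.com/electromenager/lavage/lave-linge/candy/lave-linge-frontal-candy--css1410twmre-47--40412978.php"

driver = webdriver.Chrome()  # needs a matching chromedriver / Selenium Manager
driver.get(siteUrl)
main_container = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

print(main_container.title)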
Any idea why the following columns all come back as NaN?
postDate, postUser, postVerifiedUser,
postLikes, postVerifiedTags, postUnverifiedTags,
postComment, postLocation, postAccessibility, postLink
The full code is in this notebook: https://github.com/kitsamho/Instagram_Scraper_Graph/blob/master/InstagramScraper.ipynb
I ran into the same problem a while ago: some sites only fill in their data after their JavaScript runs, and requests + BeautifulSoup never execute that JavaScript. The workaround I came up with was requests-html, which can render the page before parsing.
Use HTMLSession:
from requests_html import HTMLSession
Then create the session:
session = HTMLSession()
Then fetch the page, render its JavaScript, and hand the rendered HTML to BeautifulSoup:
from bs4 import BeautifulSoup

URL = "Link to scrape, https://...."
r = session.get(URL)
r.html.render()  # executes the page's JavaScript (downloads Chromium on first use)
soup = BeautifulSoup(r.html.html, 'html.parser')
data = soup.find_all('p')  # all the <p> tags
When I extract a URL, it displays like this:
https://tv.line.me/v/14985624_%E0%B8%A0%E0%B8%B9%E0%B8%95%E0%B8%A3%E0%B8%B1%E0%B8%95%E0%B8%95%E0%B8%B4%E0%B8%81%E0%B8%B2%E0%B8%A5-ep3-6-6-%E0%B8%8A%E0%B9%88%E0%B8%AD%E0%B8%878
How do I convert it in Python to the more readable form shown in the browser address bar?
You can use the urllib module to decode this URL:
from urllib.parse import unquote
url = unquote('https://tv.line.me/v/14985624_%E0%B8%A0%E0%B8%B9%E0%B8%95%E0%B8%A3%E0%B8%B1%E0%B8%95%E0%B8%95%E0%B8%B4%E0%B8%81%E0%B8%B2%E0%B8%A5-ep3-6-6-%E0%B8%8A%E0%B9%88%E0%B8%AD%E0%B8%878')
print(url)
This will give you the result as follows.
https://tv.line.me/v/14985624_ภูตรัตติกาล-ep3-6-6-ช่อง8
I am trying to download the data from this page https://www.nordpoolgroup.com/Market-data1/Power-system-data/Production1/Wind-Power-Prognosis/DK/Hourly/?view=table
As you can see, there is a button on the right that exports the data to Excel. I want to create something that automatically exports the data on this page to Excel every day - kind of like a scraper - but I am not able to figure it out.
So far this is my code:
import urllib2
from bs4 import BeautifulSoup as bs

nord = 'https://www.nordpoolgroup.com/Market-data1/Power-system-data/Production1/Wind-Power-Prognosis/DK/Hourly/?view=table'
page = urllib2.urlopen(nord)
soup = bs(page, 'html.parser')
pretty = soup.prettify()

all_links = soup.find_all("a")
for link in all_links:
    print link.get("href")

all_tables = soup.find_all('table')
right_table = soup.find('table', class_='ng-scope')
And this is where I am stuck, because it seems that the table class is not defined.
You can use the requests module for this.
Ex:
import requests

url = "https://www.nordpoolgroup.com/api/marketdata/exportxls"
r = requests.post(url)  # POST request to the export endpoint

with open('data_123.xls', 'wb') as f:
    f.write(r.content)
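To make the export happen every day, as the question asks, one option (a sketch, assuming the exportxls endpoint keeps accepting a plain POST) is to write each download to a dated file and schedule the script with cron or Task Scheduler:

import datetime
import requests

url = "https://www.nordpoolgroup.com/api/marketdata/exportxls"
r = requests.post(url)  # same POST request as above

# one file per day, e.g. nordpool_2019-05-01.xls (the date shown is an example)
filename = "nordpool_{}.xls".format(datetime.date.today().isoformat())
with open(filename, 'wb') as f:
    f.write(r.content)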