Automatic download from a webpage with Python [closed]

I am trying to download the data from this page https://www.nordpoolgroup.com/Market-data1/Power-system-data/Production1/Wind-Power-Prognosis/DK/Hourly/?view=table
As you can see, there is a button on the right that exports the data to Excel. I want to create something that automatically exports the data on this page to Excel every day, kind of like a scraper, but I am not able to figure it out.
So far this is my code:
import urllib2
from bs4 import BeautifulSoup as bs

nord = 'https://www.nordpoolgroup.com/Market-data1/Power-system-data/Production1/Wind-Power-Prognosis/DK/Hourly/?view=table'
page = urllib2.urlopen(nord)
soup = bs(page, 'html.parser')
pretty = soup.prettify()

all_links = soup.find_all("a")
for link in all_links:
    print link.get("href")

all_tables = soup.find_all('table')
right_table = soup.find('table', class_='ng-scope')
And this is where I am stuck, because it seems that the table class is not defined.

You can use the requests module for this.
Ex:
import requests
url = "https://www.nordpoolgroup.com/api/marketdata/exportxls"
r = requests.post(url)  # POST request to the export endpoint
with open('data_123.xls', 'wb') as f:
    f.write(r.content)
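If you want a fresh export every day, one option is to date-stamp the output file and let cron (or the Windows Task Scheduler) run the script daily. A minimal sketch, assuming the export endpoint above keeps accepting a plain POST:
import datetime
import requests

url = "https://www.nordpoolgroup.com/api/marketdata/exportxls"

# Name the file after today's date so daily runs do not overwrite each other
filename = "nordpool_{}.xls".format(datetime.date.today().isoformat())

r = requests.post(url)
r.raise_for_status()  # fail loudly if the endpoint rejects the request

with open(filename, 'wb') as f:
    f.write(r.content)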

Related

Hi, I am trying to scrape some data from a webpage through the 'requests' and 'BeautifulSoup' libraries in Python via the following lines of code [closed]

I am trying to scrape some data from a webpage with the requests and BeautifulSoup libraries in Python via the following lines of code:
import requests
from bs4 import BeautifulSoup
r = requests.get("https://www.century21.com/real-estate/rock-springs-wy/LCWYROCKSPRINGS/?")
c = r.content
soup = BeautifulSoup(c, 'html.parser')
all = soup.find_all("div", {'class': "infinite-item property-card clearfix property-card-CBR52611979 initialized visited"})
all
In the output I am getting an empty list:
[]
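A note on how BeautifulSoup class filtering works, since it may explain the empty list: passing a space-separated string asks for an exact match of the whole class attribute, and a listing-specific class like property-card-CBR52611979 will rarely match anything. Here is a sketch that filters on a single, stable class instead (the class name property-card is taken from the question; whether the listing markup is present in the raw HTML at all, rather than rendered by JavaScript, is an assumption):
import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.century21.com/real-estate/rock-springs-wy/LCWYROCKSPRINGS/?")
soup = BeautifulSoup(r.content, 'html.parser')

# class_ with a single name matches any element that carries that class,
# no matter what other classes sit next to it
cards = soup.find_all("div", class_="property-card")
print(len(cards))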

How to Solve Image Captcha Using Python? [closed]

I was scraping the price, product name, and reviews on the e-commerce website https://www.ubaldi.com/electromenager/lavage/lave-linge/candy/lave-linge-frontal-candy--css1410twmre-47--40412978.php, but I could not solve the image captcha. I am getting a "Please enable JS and disable ad blocker" error in return.
Can anyone tell me how to solve that captcha using Python?
Here is my code.
# load libraries
from bs4 import BeautifulSoup
from requests import get
import pandas as pd

siteUrl = "https://www.ubaldi.com/electromenager/lavage/lave-linge/candy/lave-linge-frontal-candy--css1410twmre-47--40412978.php"
response = get(siteUrl)
main_container = BeautifulSoup(response.text, 'html.parser')
print(main_container)
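There is no pure-requests trick that solves a third-party image captcha. The "Please enable JS and disable ad blocker" message means the page is assembled by JavaScript behind bot protection, so a common first step (a sketch only, and it may still land you on the captcha) is to load the page in a real browser with Selenium and hand the rendered HTML to BeautifulSoup:
from bs4 import BeautifulSoup
from selenium import webdriver

siteUrl = "https://www.ubaldi.com/electromenager/lavage/lave-linge/candy/lave-linge-frontal-candy--css1410twmre-47--40412978.php"

driver = webdriver.Chrome()  # assumes a local Chrome install; recent Selenium versions fetch the driver themselves
driver.get(siteUrl)

# Hand the JavaScript-rendered page source to BeautifulSoup
main_container = BeautifulSoup(driver.page_source, 'html.parser')
driver.quit()

print(main_container.title)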

Why does the BeautifulSoup code part return NaN? [closed]

Any idea why the following variables return NaN?
postDate, postUser, postVerifiedUser, postLikes, postVerifiedTags, postUnverifiedTags, postComment, postLocation, postAccessibility, postLink
The full code is here: https://github.com/kitsamho/Instagram_Scraper_Graph/blob/master/InstagramScraper.ipynb
I remember running into this problem a long time ago. The issue was that some websites only produce their data after the JavaScript on the page has run, and BeautifulSoup does not wait for that JavaScript to load. The solution I came up with was this.
Use HTMLSession from requests_html:
from requests_html import HTMLSession
from bs4 import BeautifulSoup
Then you define the session:
session = HTMLSession()
Then you render the page and apply BeautifulSoup:
URL = "Link to scrape, https://...."
r = session.get(URL)
r.html.render()  # runs the page's JavaScript in a headless Chromium (downloaded on first use)
soup = BeautifulSoup(r.html.html, 'html.parser')
data = soup.find_all('p')  # all the <p> tags

Scrape YouTube video views [closed]

I have a list of YouTube video links like this one, https://www.youtube.com/watch?v=ywZevdHW5bQ, and I need to scrape the view count using the BeautifulSoup and requests libraries.
import requests
from bs4 import BeautifulSoup
url = 'https://www.youtube.com/watch?v=ywZevdHW5bQ'
soup = BeautifulSoup(requests.get(url).text, 'lxml')
print(soup.select_one('meta[itemprop="interactionCount"][content]')['content'])
Prints:
5186856
An alternative is to get yourself a YouTube API key, then use the videos.list API endpoint to get information about the video and extract viewCount from the response.
https://developers.google.com/youtube/v3/quickstart/python
https://developers.google.com/youtube/v3/docs/videos/list
https://developers.google.com/youtube/v3/docs/videos#resource
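A minimal sketch of that alternative using the official google-api-python-client, assuming you have already created an API key in the Google Cloud console:
from googleapiclient.discovery import build

API_KEY = "YOUR_API_KEY"  # placeholder, supply your own key
VIDEO_ID = "ywZevdHW5bQ"

youtube = build("youtube", "v3", developerKey=API_KEY)

# videos.list with part=statistics returns viewCount, likeCount, etc.
response = youtube.videos().list(part="statistics", id=VIDEO_ID).execute()
print(response["items"][0]["statistics"]["viewCount"])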

Scraping PHP from popup [closed]

Is there a way to scrape data from a popup? I'd like to import data from the site tennisinsight.com.
For example, http://tennisinsight.com/match-preview/?matchid=191551201
This is a sample data extraction link. When you click "Overview" there is a "Match Stats" button; I'd like to be able to import that data for many links listed in a text or CSV file.
What's the best way to accomplish this? Is Scrapy able to do this? Is there software able to do this?
You want to open the network analyzer in your browser (e.g. the Web Developer tools in Firefox) to see what requests are sent when you click the "Match Stats" button, in order to replicate them using Python.
When I do it, a POST request is sent to http://tennisinsight.com/wp-admin/admin-ajax.php with action and matchID parameters.
You presumably already know the match ID (see URL you posted above), so you just need to set up a POST request for each matchID you have.
import requests
r = requests.post('http://tennisinsight.com/wp-admin/admin-ajax.php', data={'action':'showMatchStats', 'matchID':'191551201'})
print(r.text)  # this is your content of interest
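Since the goal is to pull stats for many matches, here is a minimal sketch that reads match IDs from a plain-text file, one ID per line (the file name matchids.txt and the output file layout are assumptions, not part of the original answer), and fires the same POST for each:
import requests

AJAX_URL = 'http://tennisinsight.com/wp-admin/admin-ajax.php'

# matchids.txt is assumed to hold one match ID per line, e.g. 191551201
with open('matchids.txt') as f:
    match_ids = [line.strip() for line in f if line.strip()]

for match_id in match_ids:
    r = requests.post(AJAX_URL, data={'action': 'showMatchStats', 'matchID': match_id})
    # Save each response so it can be parsed or inspected later
    with open('match_{}.html'.format(match_id), 'w') as out:
        out.write(r.text)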
