Scan a PDF file using online scanner [closed] - python

I want to scan about 1000 PDF files using "wepawet", an online scanner, but it only takes one file at a time. How could I scan all 1000 files? Could I do that using Python?
https://wepawet.iseclab.org/
Could anyone help me, please? Thank you in advance.

You can automate the process with Python tools like Selenium, mechanize, or urllib (I'm not sure about urllib). Fill the form using mechanize (a simple example of filling in a form and submitting it):
import mechanize

br = mechanize.Browser()
response = br.open(url)
print response.read()
br.select_form("form1")  # select the form by name
# or pick the first form on the page: br.form = list(br.forms())[0]
response = br.submit()
print response.read()
and submit it as in the code. For more info on mechanize, visit http://www.pythonforbeginners.com/cheatsheet/python-mechanize-cheat-sheet. Hope it works.
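To cover all 1000 files, you can wrap that in a loop that re-opens the page and uploads each PDF through the form's file field. A minimal sketch, assuming the upload form is the first on the page and its file field is named "file" (check wepawet's page source for the real names):

import glob
import mechanize

br = mechanize.Browser()
for path in glob.glob("pdfs/*.pdf"):  # folder holding the 1000 PDFs
    br.open("https://wepawet.iseclab.org/")
    br.select_form(nr=0)  # assumption: the upload form is the first form
    # "file" is a guessed field name; inspect the form to confirm it
    br.form.add_file(open(path, "rb"), "application/pdf", path, name="file")
    response = br.submit()
    print path, response.code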

Related

What is the best way to get the information from this website with Scrapy? [closed]

I am trying to scrape this website with Scrapy, and so far I have had to follow each link and extract the information from every page. I would like to know if there is an API for the site that I could use instead (I don't know how to find one).
I would also like to know how I can obtain the latitude and longitude. The map is displayed on the page, but I do not know how to get the actual numbers.
I appreciate any suggestions.
The website may be loading the data dynamically using JavaScript. Open your browser's dev tools, look at the Network tab, and watch for any XHR calls that may be hitting an API. Then you can scrape from that endpoint directly.
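For example, once you spot the XHR endpoint, requests can usually fetch it as JSON, and the latitude/longitude often appear as plain fields in that payload. A sketch, with a hypothetical URL and field names:

import requests

# Hypothetical endpoint and parameters; substitute the real XHR URL and
# query string you observe in the Network tab.
resp = requests.get("https://example.com/api/listings", params={"page": 1})
resp.raise_for_status()

for item in resp.json():
    # Coordinates frequently come back as plain JSON fields.
    print(item.get("lat"), item.get("lng"))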

Trying to read the text of an FTP website into a string in Python [closed]

This is the site I can open in Chrome and see text:
ftp://ftp.cmegroup.com/pub/settle/stlags
Any idea how to read this into a string in Python?
Don't know if this helps, but this will read the contents of that URL into a string (urllib.request handles ftp:// URLs as well as http://):
import urllib.request

url = "ftp://ftp.cmegroup.com/pub/settle/stlags"
response = urllib.request.urlopen(url)  # urlopen supports ftp:// URLs
data = response.read()  # raw bytes
text = data.decode()  # decode the bytes into a str
print(text)
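If you would rather talk to the FTP server directly, the standard-library ftplib does the same job; a minimal sketch:

import io
from ftplib import FTP

buf = io.BytesIO()
ftp = FTP("ftp.cmegroup.com")
ftp.login()  # anonymous login
ftp.retrbinary("RETR /pub/settle/stlags", buf.write)  # stream the file into the buffer
ftp.quit()

text = buf.getvalue().decode()
print(text)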

Automatic Download from webpage python [closed]

I am trying to download the data from this page https://www.nordpoolgroup.com/Market-data1/Power-system-data/Production1/Wind-Power-Prognosis/DK/Hourly/?view=table
As you can see, there is a button on the right that exports the data to Excel. I want to create something that automatically exports the data on this page to Excel every day, kind of like a scraper, but I am not able to figure it out.
So far this is my code:
import urllib2
from bs4 import BeautifulSoup as bs

nord = 'https://www.nordpoolgroup.com/Market-data1/Power-system-data/Production1/Wind-Power-Prognosis/DK/Hourly/?view=table'
page = urllib2.urlopen(nord)
soup = bs(page)
pretty = soup.prettify()
all_links = soup.find_all("a")
for link in all_links:
    print link.get("href")
all_tables = soup.find_all('table')
right_table = soup.find('table', class_='ng-scope')
And this is where I am stuck, because it seems that the table class is not defined.
You can use the requests module for this.
Ex:
import requests

url = "https://www.nordpoolgroup.com/api/marketdata/exportxls"
r = requests.post(url)  # POST request to the export endpoint
with open('data_123.xls', 'wb') as f:
    f.write(r.content)
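Since you want the export to happen every day, one approach is to wrap the download in a small function that stamps the filename with the date, then run it daily with cron or Windows Task Scheduler; a sketch:

import datetime
import requests

def export_daily():
    url = "https://www.nordpoolgroup.com/api/marketdata/exportxls"
    r = requests.post(url)
    r.raise_for_status()  # fail loudly if the endpoint is unreachable
    name = "nordpool_%s.xls" % datetime.date.today().isoformat()
    with open(name, "wb") as f:
        f.write(r.content)

export_daily()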

How to send data through HTTPS with python? [closed]

I would like to know how to send data to a website over HTTPS in Python.
It seems simple to do with HTTP, but I could not manage to find the equivalent for HTTPS requests...
It's pretty simple with requests:
import requests
r = requests.get('https://example.com')
print r.status_code
If you want to use urllib2, here is a snippet taken directly from their examples:
>>> import urllib2
>>> req = urllib2.Request(url='https://localhost/cgi-bin/test.cgi',
... data='This data is passed to stdin of the CGI')
>>> f = urllib2.urlopen(req)
>>> print f.read()
Got Data: "This data is passed to stdin of the CGI"
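The requests example above only fetches a page; to actually send data over HTTPS, a POST looks almost identical (a sketch with a placeholder URL and payload):

import requests

# Placeholder URL and form data; requests performs the TLS handshake and
# certificate verification for any https:// URL automatically.
r = requests.post('https://example.com/submit', data={'key': 'value'})
print r.status_code
print r.text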

Scraping PHP from popup [closed]

Is there a way to scrape data from a popup? I'd like to import data from the site tennisinsight.com.
For example, http://tennisinsight.com/match-preview/?matchid=191551201
This is a sample data extraction link. When clicking "Overview" there is a button labeled "Match Stats"; I'd like to be able to import that data from many links listed in a text or CSV file.
What's the best way to accomplish this? Is Scrapy able to do this? Is there software that can do this?
You want to open the network analyzer in your browser (e.g. the Web Developer tools in Firefox) to see what requests are sent when you click the "Match Stats" button, so that you can replicate them using Python.
When I do it, a POST request is sent to http://tennisinsight.com/wp-admin/admin-ajax.php with action and matchID parameters.
You presumably already know the match ID (see URL you posted above), so you just need to set up a POST request for each matchID you have.
import requests

r = requests.post('http://tennisinsight.com/wp-admin/admin-ajax.php',
                  data={'action': 'showMatchStats', 'matchID': '191551201'})
print r.text  # this is your content of interest
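To process many links from a text or CSV file, loop over the match IDs and fire the same POST for each; a sketch (match_ids.txt, one ID per line, is an assumed input file):

import io
import requests

with open('match_ids.txt') as f:
    match_ids = [line.strip() for line in f if line.strip()]

for match_id in match_ids:
    r = requests.post('http://tennisinsight.com/wp-admin/admin-ajax.php',
                      data={'action': 'showMatchStats', 'matchID': match_id})
    # io.open writes unicode safely on both Python 2 and 3
    with io.open('stats_%s.html' % match_id, 'w', encoding='utf-8') as out:
        out.write(r.text)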
