Grab data from a POST request with Python 3

Can anyone help me grab the data from this website?
I want to get the data from the page that appears after I click "Submit Query". That needs a POST request, because a form is being submitted:
https://henke.lbl.gov/optical_constants/pert_form.html
I tried multiple POST-request approaches I found online, but they all failed and I don't know why.
Many thanks!

If you want to grab the text contents of the page, for example, try:
import requests
r = requests.get('https://henke.lbl.gov/optical_constants/pert_form.html')
print(r.text)
For more, see the requests documentation: https://docs.python-requests.org/en/master/
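The snippet above only fetches the form page itself. To get the data that appears after "Submit Query", you would POST to the form's action URL with the form's field names. A minimal sketch of the pattern — the action URL and every field name below are placeholders, to be read off the page's HTML (the <form action=...> and <input name=...> attributes) or the devtools Network tab while submitting:
import requests

# Placeholder action URL and field names -- substitute the real ones from the page.
form_url = 'https://henke.lbl.gov/cgi-bin/EXAMPLE.pl'   # hypothetical action path
payload = {
    'Formula': 'SiO2',   # hypothetical field name/value
    'Min': '30',         # hypothetical field name/value
    'Max': '3000',       # hypothetical field name/value
}
r = requests.post(form_url, data=payload)
print(r.text)   # HTML of the results page you normally see after submitting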

Related

How to perform a POST request just like the browser to get the same results

I have this webpage: https://www.dsbmobile.de. I would like to automate a bot that logs in and gets the newest file. I have an account and login credentials, so I tried this with Python:
import requests
url = "https://www.dsbmobile.de/Login.aspx?ReturnUrl=%2f"
payload = {'txtUser': 'username', 'txtPass': 'password'}
x = requests.post(url, data=payload)
print(x.text)
I get a result, but it's just the login page instead of the new page I should be redirected to.
When looking at the source, I saw that there are hidden input fields such as "__EVENTVALIDATION".
Do I need to send them too? Or maybe I need to set something in the headers, I don't know. It would be very nice if someone could tell me how to write the POST request just like the browser sends it, so that I get the right response.
I am new to this, but it would be extremely useful to me if I could automate that process.
Thank you very much for trying to help me.
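In general for ASP.NET pages, yes: the hidden __VIEWSTATE / __EVENTVALIDATION values from the login page have to be echoed back in the POST, and the whole exchange should happen in one session so the cookies match. A sketch of that idea — the txtUser/txtPass names come from the question, using BeautifulSoup to pull the hidden fields is an assumption, and the form may also expect the submit button's own name/value pair:
import requests
from bs4 import BeautifulSoup   # assumption: any HTML parser would do

url = "https://www.dsbmobile.de/Login.aspx?ReturnUrl=%2f"

with requests.Session() as s:               # one session so cookies carry over
    page = s.get(url)
    soup = BeautifulSoup(page.text, "html.parser")

    # echo back every hidden field (__VIEWSTATE, __EVENTVALIDATION, ...)
    payload = {tag["name"]: tag.get("value", "")
               for tag in soup.select("input[type=hidden]") if tag.get("name")}
    payload["txtUser"] = "username"
    payload["txtPass"] = "password"
    # the page may also expect the submit button's name=value pair -- check devtools

    x = s.post(url, data=payload)
    print(x.url)    # if the login worked, this should no longer be the login page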

How to submit data to a webpage using XPaths and the like

I'm in the process of writing an API for a Craigslist-like website, and I finished the data-retrieval part using lxml.html.
Now I want to submit data (login info, things to be posted, ...) to the website.
Can I do it using lxml, or do I have to use another module?
As @furas mentioned, you can use the requests module for posting the data (the login, in your case).
Here is a simple example that uses the requests module for both GET and POST:
import requests

# get the token
resp = requests.get("https://www.botoxcosmetic.com/sc/api/findclinic/GetFadToken?_=1556315102966")
# print the token
print(resp.json())

# store the input data in dataI; in your case, replace these with the username and password
# (check the API documentation or devtools to make sure the parameter names are correct)
dataI = {'ZipCode': '10022', 'MileRadius': '1', 'PerPage': '5', 'Token': resp.json()}

# post the data (you don't have to click any submit button) and capture the response to the POST
resp = requests.post("https://www.botoxcosmetic.com/sc/api/findclinic/FindSpecialists", data=dataI)
# print the response from the POST call
print(resp.json())

Is it possible to follow only redirect status codes and get redirect links, instead of downloading the webpage, with requests or another Python library?

Here is my scenario.
I have a lot of links. I want to know if any of them redirect to a different site (maybe a particular one) and only get those redirect URLs (I want to preserve them for further scraping).
I don't want to get the contents of the webpage; I only want the link it redirects to. If there are multiple redirects, I may want to follow the URLs only up to, say, the third redirect (so that I don't get stuck in a redirect loop).
How do I achieve this?
Can I do this in requests?
requests seems to have r.status_code, but it is only available after fetching the page.
You can use requests.head(url, allow_redirects=True), which only fetches the headers. If a response has a Location header, it will follow the redirect and issue a HEAD request to the next URL.
import requests

response = requests.head('http://httpbin.org/redirect/3', allow_redirects=True)
# the intermediate redirect responses are kept in response.history
for redirect in response.history:
    print(redirect.url)
# response.url is the final URL after all redirects
print(response.url)
Output:
http://httpbin.org/redirect/3
http://httpbin.org/relative-redirect/2
http://httpbin.org/relative-redirect/1
http://httpbin.org/get
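If you only want to follow up to, say, the third redirect instead of letting requests run the whole chain, one option (a sketch, not the only way) is to turn off automatic redirects and walk the Location headers yourself:
import requests
from urllib.parse import urljoin

url = 'http://httpbin.org/redirect/3'
for _ in range(3):                                   # stop after at most 3 hops
    response = requests.head(url, allow_redirects=False)
    if not response.is_redirect:                     # no redirect -> chain has ended
        break
    # Location may be relative, so resolve it against the current URL
    url = urljoin(url, response.headers['Location'])
    print(url)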

Python: Sending POST request doesn't work

I'm trying to scrape a webpage by sending a POST to fill a form. Generally I use Selenium to scrape pages with Python, but I recently read that sending a POST request is a better way to scrape results. Anyway, I followed some instructions to write my code, but when I post my data I get back the same page with the form filled in (the POST doesn't submit the form). What am I doing wrong? Also, the same page has another form to fill after the first one, so even if I manage to submit the first form, I don't know how to keep that response in order to get the final one. If someone can help with some ideas, I would really appreciate it! Thanks; here are my code and the page I'm trying to scrape (for the final quotation):
https://www.santander.cl/cotizador-web/
import requests, lxml.html
import time
s = requests.session()
login = s.get('https://www.santander.cl/cotizador-web/cotizador/pasosSolicitud.xhtml')
login_html = lxml.html.fromstring(login.text)
hidden_inputs = login_html.xpath(r'//form//input[@type="hidden"]')
form = {x.attrib["name"]: x.attrib["value"] for x in hidden_inputs}
form['pasosForm:marcas']='27'
form['pasosForm:modelos']='1978'
form['pasosForm:ano']='2015'
form['pasosForm:uso']='1'
form['pasosForm:j_id93373712_1a32e354_input']='on'
form['formDialogCotiSelec:j_id216370348_64c01a10_active'] = '1'
form['javax.faces.partial.execute']='pasosForm pasosForm:siguiente1'
response = s.post('https://www.santander.cl/cotizador-web/cotizador/pasosSolicitud.xhtml', data=form)
print(response.text)
I see that all the forms have a hidden field like this:
<input type="hidden" name="javax.faces.ViewState" id="javax.faces.ViewState"
       value="zDmSF7aJ4QSdyqjY5D4dGbfEaQr5OiS6WorNARY6pfHWSXIe/APb5e/wcHsiGvPVaXW4IFpVHFyFHNSSJMPdHt2mhaYm4TQ9WPo+TQgWFTB1ZRE1wwiJtXQfmKuwE2+R+iRmONBAmZCR9E8x" />
It's a CSRF token generated for the current session. You should visit the form page first (creating a session) before you make the POST request.
More info here:
https://www.owasp.org/index.php/Cross-Site_Request_Forgery_%28CSRF%29_Prevention_Cheat_Sheet
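Concretely, that means the POST has to be made in the same session that produced the token, with the token echoed back in the form data. A stripped-down sketch of just that part, reusing the question's URL (the other pasosForm:* fields from the question are omitted here for brevity):
import requests, lxml.html

s = requests.session()
page = s.get('https://www.santander.cl/cotizador-web/cotizador/pasosSolicitud.xhtml')
tree = lxml.html.fromstring(page.text)

# the token is only valid inside the session that generated it
viewstate = tree.xpath('//input[@name="javax.faces.ViewState"]/@value')[0]

form = {'javax.faces.ViewState': viewstate}   # plus the pasosForm:* fields shown above
response = s.post('https://www.santander.cl/cotizador-web/cotizador/pasosSolicitud.xhtml',
                  data=form)
print(response.text)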

Using Python 3.5 to Login, Navigate, and Scrape Without Using a Browser

I'm trying to scrape multiple financial websites (Wells Fargo, etc.) to pull my transaction history for data analysis purposes. I can do the scraping part once I get to the page I need; the problem I'm having is getting there. I don't know how to pass my username and password and then navigate from there. I would like to do this without actually opening a browser.
I found Michael Foord's article "HOWTO Fetch Internet Resources Using The urllib Package" and tried to adapt one of the examples to meet my needs, but I can't get it to work (I've tried adapting several other search results as well). Here's my code:
import bs4
import urllib.request
import urllib.parse

## Navigate to the website.
url = 'https://www.wellsfargo.com/'
values = {'j_username': 'USERNAME', 'j_password': 'PASSWORD'}
data = urllib.parse.urlencode(values)
data = data.encode('ascii')
req = urllib.request.Request(url, data)
with urllib.request.urlopen(req) as response:
    the_page = response.read()
soup = bs4.BeautifulSoup(the_page, "html.parser")
The 'j_username' and 'j_password' both come from inspecting the text boxes on the login page.
I just don't think I'm pointing to the right place or passing my credentials correctly. The URL I'm using is just the login page, so is it actually logging me in? When I print the URL from the response, it returns https://wellsfargo.com/. If I'm ever able to log in successfully, it should take me to a summary page of my accounts; I would then need to follow another link to my checking, savings, etc.
I really appreciate any help you can offer.
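For the "log in, then navigate" part, the usual approach with requests is a Session, which keeps the authentication cookies across follow-up requests. A rough sketch under the assumption that the form really posts j_username/j_password to some login action URL — the URL below is a placeholder, and the real action plus any hidden fields show up in the devtools Network tab during a browser login:
import bs4
import requests

LOGIN_URL = 'https://www.wellsfargo.com/'    # placeholder: use the form's real action URL
values = {'j_username': 'USERNAME', 'j_password': 'PASSWORD'}

with requests.Session() as s:                # the session carries the login cookies onwards
    resp = s.post(LOGIN_URL, data=values)
    print(resp.url)                          # on success this should be the account summary

    # still authenticated, so you can follow links from the summary page
    soup = bs4.BeautifulSoup(resp.text, "html.parser")
    # ... find the checking/savings link in `soup` and fetch it with s.get(...)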
