I am trying to make an HTTP POST request using Python. The specific form I want to submit is on the following page: http://143.137.111.105/Enlace/Resultados2010/Basica2010/R10Folio.aspx
Using Chrome Dev Tools it looks like pushing the button makes an HTTP POST request, but I am trying to figure out the exact request that is made. I currently have the following in Python:
import requests
url = 'http://143.137.111.105/Enlace/Resultados2010/Basica2010/R10Folio.aspx'
values = {
'txtFolioAlumno': '210227489P10',
}
r = requests.post(url, data=values)
print(r.content)
However, when I run this it simply prints out the HTML of the original page instead of the data from the results page (I am interested in getting the number next to 'Matematicas', 422 in this case). I have achieved this task using Selenium, which actually opens a test browser, but I want to query the server directly.
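For context: ASP.NET WebForms pages like this one usually ignore a bare POST, because the server expects the page's hidden fields (__VIEWSTATE, __EVENTVALIDATION and friends) and the submit button's own name/value to be echoed back. A minimal sketch of that flow, assuming BeautifulSoup is available; the button name here is hypothetical, so copy the real one from Dev Tools:

import requests
from bs4 import BeautifulSoup

url = 'http://143.137.111.105/Enlace/Resultados2010/Basica2010/R10Folio.aspx'
session = requests.Session()
soup = BeautifulSoup(session.get(url).text, 'html.parser')

# echo back every hidden input (__VIEWSTATE, __EVENTVALIDATION, ...) the page rendered
values = {tag['name']: tag.get('value', '')
          for tag in soup.find_all('input', type='hidden') if tag.get('name')}
values['txtFolioAlumno'] = '210227489P10'
values['btnBuscar'] = 'Buscar'  # hypothetical button name/value; copy the real ones from Dev Tools

r = session.post(url, data=values)
print(r.text)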
Related
When you open a URL in a normal browser, it can redirect to another website's URL. For example, when you open a shortened link, it redirects you to the main URL.
How do I do this in Python? I need to open a URL in Python, let it redirect to the other website's page, and then copy that page's link.
That's all I want to know, thank you.
I tried it with the python requests and urllib modules, like this:
import requests
a = requests.get("url", allow_redirects=True)
And
import urllib.request
a = urllib.request.urlopen("url")
But it's not working at all; I didn't get the redirected page.
I know 4 types of redirections.
the server sends a response with status 3xx and the new address
HTTP/1.1 302 Found
Location: https://new_domain.com/some/folder
Wikipedia: HTTP 301, HTTP 302, HTTP 303
the server sends the Refresh header with a time in seconds and the new address
Refresh: 0; url=https://new_domain.com/some/folder
the server sends HTML with a meta tag which emulates the Refresh header
<meta http-equiv="refresh" content="0; url=https://new_domain.com/some/folder">
Wikipedia: meta refresh
JavaScript sets new location
location = url
location.href = url
location.replace(url)
location.assign(url)
The same goes for document.location and window.location.
There may also be combinations with open(), document.open(), window.open().
requests automatically follows redirects of the first and (probably) the second type. With urllib you would probably have to check the status, get the URL, and run the next request yourself, but this is easy. You can even run it in a loop, because some pages may have many redirections. You can test it on httpbin.org (even for multi-redirections); see the sketch below.
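A sketch of that manual loop with urllib; the NoRedirect handler switches off urllib's automatic redirect handling so the loop can follow the Location headers itself:

import urllib.request
import urllib.error
from urllib.parse import urljoin

class NoRedirect(urllib.request.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # refuse to redirect automatically; a 3xx then raises HTTPError

opener = urllib.request.build_opener(NoRedirect)
url = 'https://httpbin.org/redirect/3'
while True:
    try:
        response = opener.open(url)
        break  # a 2xx response means the final page was reached
    except urllib.error.HTTPError as e:
        if e.code in (301, 302, 303, 307, 308):
            url = urljoin(url, e.headers['Location'])  # next hop (may be relative)
        else:
            raise
print('final url:', url)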
For the third type it is easy to check whether the HTML has the meta tag and then run the next request with the new URL. Again, you can run it in a loop, because some pages may have many redirections; a sketch follows.
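A minimal sketch for the meta tag; the regex is deliberately simple and only for illustration, since real pages may format the attribute differently:

import re
import requests
from urllib.parse import urljoin

url = 'http://example.com'  # placeholder starting address
while True:
    r = requests.get(url)
    # look for <meta http-equiv="refresh" content="0; url=...">
    m = re.search(r'http-equiv=["\']refresh["\'][^>]*url=([^"\'>\s]+)', r.text, re.IGNORECASE)
    if not m:
        break
    url = urljoin(r.url, m.group(1))  # follow to the new address (may be relative)
print(r.url)  # final page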
But the fourth type is a problem, because requests can't run JavaScript, and there are many different methods to assign a new location. They can also be hidden in code ("obfuscation").
In requests you can check response.history to see the redirections that were executed.
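For example, using httpbin.org's test redirects:

import requests

# httpbin.org offers multi-step redirects for testing
r = requests.get('https://httpbin.org/redirect/3')
for hop in r.history:
    print(hop.status_code, hop.url)  # each intermediate response
print(r.status_code, r.url)  # the final destination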
Hi, I am new to python requests and would like some help.
When I use python requests to get the session cookie, I use the following command:
session_req = requests.session()
result = session_req.get(get_url)
After executing the GET with requests, I use the '.cookies' property and the respective key I want to send in the POST header. I get the value successfully, but the POST action is not working.
session_req.cookies['IFCSHOPSESSID']
But when I make the same API request via POSTMAN and look at the cookie property (exporting the code as python requests), I find some differences, and if I use this same cookie exported from POSTMAN it works.
POSTMAN EXAMPLE
'cookie': 'IFCSHOPSESSID=hrthhiqdeg0dvf4ecooc83lui3; nikega=GA1.4.831513767.1599354095; nikega_gid=GA1.4.1839484382.1599354095; _ga=GA1.3.831513767.1599354095; _gid=GA1.3.733956911.1599354099; chaordic_browserId=0-fv_3j6NdVlbNFFwPRzUGQVse7e1bbqga-3OS1599354098234702; chaordic_anonymousUserId=anon-0-fv_3j6NdVlbNFFwPRzUGQVse7e1bbqga-3OS1599354098234702; chaordic_testGroup=%7B%22experiment%22%3Anull%2C%22group%22%3Anull%2C%22testCode%22%3Anull%2C%22code%22%3Anull%2C%22session%22%3Anull%7D; user_unic_ac_id=bec863cf-4e06-0ab1-d881-b566595d3e8f; _gcl_au=1.1.1305519862.1599354100; _fbp=fb.2.1599354100232.504934336; smeventsclear_16df2784b41e46129645c2417f131191=true; smViewOnSite=true; __pr.cvh=4ftsyf8x16; _gaexp=GAX1.3.tupm6REJTMeD-piAakRDMA.18557.0; blueID=75a502b6-e7c2-4eb3-8442-75aea5d95fdc; _cm_ads_activation_retry=false; sback_client=5816989a58791059954e4c52; sback_partner=false; sb_days=1599356617672; sback_refresh_wp=no; smClickOnSite=true; smClickOnSite_652c0aaee02549a3a6ea89988778d3fc=true; _rtbhouse_source_=socialminer; RKT=false; dedup=socialminer; lmd_cj=socialminer; advcake_url=https%3A%2F%2Fwww.nike.com.br%2Flancamentos%3Futm_source%3Dsocialminer%26utm_medium%3Dsocialminer_onsitedesktop%26utm_campaign%3Dsocialminer_onsitedesktop_lancamentos_desk%26smid%3D3-17; advcake_trackid=dd7e2ef0-dd50-889a-aeea-559a0d8bcd22; advcake_utm_content=socialminer_onsitedesktop_lancamentos_desk; advcake_utm_campaign=socialminer; Campanha=; Parceiro=; Midia=; AMCVS_F0935E09512D2C270A490D4D%40AdobeOrg=1; s_cc=true; lmd_orig=direct; SIZEBAY_SESSION_ID=0AC1A70CB19F4f03610665d04bb088ef3b9af0942fc8; sback_customer_w=true; sback_browser=0-87718800-1599408894bff13e290b9fee5fc2b430382f639b87dd9cf25112334287575f550afed62983-14051381-17920887216,13017640152-1599408894; sback_access_token=eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpc3MiOiJhcGkuc2JhY2sudGVjaCIsImlhdCI6MTU5OTQwODg5NSwiZXhwIjoxNTk5NDk1Mjk1LCJhcGkiOiJ2MiIsImRhdGEiOnsiY2xpZW50X2lkIjoiNTgxNjk4OWE1ODc5MTA1OTk1NGU0YzUyIiwiY2xpZW50X2RvbWFpbiI6Im5pa2UuY29tLmJyIiwiY3VzdG9tZXJfaWQiOiI1ZjU0M2VjODA5ZjFkMDkzMmQzMjQ2OTUiLCJjdXN0b21lcl9hbm9ueW1vdXMiOmZhbHNlLCJjb25uZWN0aW9uX2lkIjoiNWY1NDNlYzgwOWYxZDA5MzJkMzI0Njk2IiwiYWNjZXNzX2xldmVsIjoiY3VzdG9tZXIifX0.K6FYVBasHjMg_PLbT1yZfrnIp97USqijoMObF4eUSms.WrWrDrHeHezRqBiYiYHeDr; sback_customer=$2gSxATWYdVYOVGMI10bUdkW2pWeoZERU1kc1YWWhd1SNR0aMJ0QUVzTHpHdJZERnpVS6FTSkRUTOBjMys2bUdnT2$12; sback_pageview=false; ak_bmsc=B6177778CB59637165F7EC43342C1559C9063147DA220000234E555F8D78F831~plACNrc4cNxoHZNcO7aF4o+U0KQNKjzPECGSfb42NdayPvdNkBWwUT9QOhGjuLJJ3vStuFIRkiI/35wsHEyUE3/h2guphhaEy71BnfekvDtb/6F84hS+fWhPxxVG5RAlph8WzGpYMn6NZESNVcgnZYfH4HoZ/IzBPR6AMG9UGn6W4xm/j/j9kOfef8v/fZf2pXw4mxJuiN5Cxc7g2sV4nCdoEW98Q4AgqplzxWZjpamZk=; bm_sz=6586256DDAFC895D740341E4214D0D40~YAAQRzEGybYT7yN0AQAAfDw5ZQnXjJtKI2SxkwQFV9vLZpF5mACXNUtUFDSkidKuYM2fac5sQgRozU9fA3+017dht/PUtH+wtibATtTmoVOlpKnW+V76+1rySk3HK6q83Q9rtQc/LaaQ8VYtK/tDi0VOc7/0wLyKy/+Z4OLtgUpySYZZcEX4k8/46no8rFD6OQ==; AMCV_F0935E09512D2C270A490D4D%40AdobeOrg=359503849%7CMCIDTS%7C18512%7CMCMID%7C56897587165425478193529762442649463163%7CMCAAMLH-1600030892%7C4%7CMCAAMB-1600030892%7CRKhpRz8krg2tLO6pguXWp5olkAcUniQYPHaMWWgdJ3xzPWQmdj0y%7CMCOPTOUT-1599433292s%7CNONE%7CMCSYNCSOP%7C411-18519%7CvVersion%7C5.0.1; sback_total_sessions=3; sback_session=5f554e3c73a63da56d739d87; lmd_traf=direct-1599402359608&direct-1599408890286&direct-1599414284313&direct-1599427194077; chaordic_realUserId=2962653; chaordic_session=1599429266491-0.4343169041143473; _st_ses=49222273669791505; _st_cart_script=helper_nike.js; 
_st_cart_url=/; _sptid=1592; _spcid=1592; _st_id=cnVkc29ucmFtb25AZ21haWwuY29t; _st_idb=cnVkc29ucmFtb25AZ21haWwuY29t; lx_sales_channel=%5B%222%22%5D; sback_cart=5f555ba24f507d767721c387; CSRFtoken=1ac8a198f88ac1ccc1f8555ab41c8a95; gpv_v70=nikecombr%3Echeckout%3Eaddress; pv_templateName=CHECKOUT; gptype_v60=checkout%3Aaddress; stc119288=env:1599429270%7C20201007215430%7C20200906222939%7C5%7C1088071:20210906215939|uid:1599354102799.1149977977.6705985.119288.1871143352:20210906215939|srchist:1088071%3A1599429270%3A20201007215430:20210906215939|tsa:1599429270805.1898407973.364911.7635034620790062.2:20200906222939; bm_sv=C9C3A8C6B2F6CB232317BB794ADC0497~ZnoksXquh4Yrh4uN87gycXdh+ixzU+xMFsb94sO9uE5JMLyZz9eJPp5odX7vx944KIXG1nvOxuq8pdrQUDjBrchRJLC4yiD1yWX0h4BjWhZwbfHPtnzaT3ASbIZnf2Ts1TRt+ZAescJJwrNPs4oV2If7vyiWi2AYILFvCstCTS8=; _uetsid=a9a0bfd4fe4e4db52bcd4ca66850a785; _uetvid=9ba47ed116a48f496f6b1a9844e21c95; __udf_j=f08aeb668454efbf6ddc83dd9d4b7a8385abde9f9fbd92526f1de0441da2126ec40330dfc36d0b9c3eae98557c94447d; _spl_pv=40; s_sq=lojanike-new-production%252Clojanike-nikebr%3D%2526c.%2526a.%2526activitymap.%2526page%253Dnikecombr%25253Echeckout%25253Eaddress%2526link%253DSeguir%252520para%252520pagamento%2526region%253Didentificacao-form%2526pageIDType%253D1%2526.activitymap%2526.a%2526.c%2526pid%253Dnikecombr%25253Echeckout%25253Eaddress%2526pidt%253D1%2526oid%253DSeguir%252520para%252520pagamento%2526oidt%253D3%2526ot%253DSUBMIT; RT="z=1&dm=nike.com.br&si=92b42534-25ee-4155-aa1a-e7d127581869&ss=kermvxyl&sl=9&tt=17e8&bcn=%2F%2F173e2544.akstat.io%2F"; _abck=F6E1C280C3F9D735A2B1AB62443DB479~-1~YAAQVjEGycno+iJ0AQAAmtRxZQT8kxLFalTup4dkYT5+cq/PavPcY4/0zAeJv4GoSQQwYVj4EWydkfxbJR3Rgaa4k6ma+5O72J/lsiajATrx0oaZJuB5b/FIP6RymanPRVGlb3kLJXpBQDkCmVv62kkxLKxySrlAYDCg0ORCpSXlTCbFBVEchC9ih5t094egSeVdM6VjfQSO9uDKISBoP4923qkJMTpbk9B1nOoiylKK+y+FGFu8pzEpQqZYj7tIMTJVpqe0OpXaQ8m8nPyp0K+PmBcAndIHcBMTZUEqma9/72Enx8yvGbKXrYbAzNDw6ZtKY9OAbNuVeqprza/Af0aUkinm0l3JqxjTH1LpglNxNN4=~-1~-1~-1; CSRFtoken=20a208bad599aa3ead0bbe944b27a368; bm_sv=C9C3A8C6B2F6CB232317BB794ADC0497~ZnoksXquh4Yrh4uN87gycXdh+ixzU+xMFsb94sO9uE5JMLyZz9eJPp5odX7vx944KIXG1nvOxuq8pdrQUDjBrchRJLC4yiD1yWX0h4BjWhbSXhHWWrgkUsOTt9033P5Wxu1qmo5M6w0VAWeAzBaCN7yZC2Ll7DiGq0CwpjxlOW4=; _abck=F6E1C280C3F9D735A2B1AB62443DB479~-1~YAAQVjEGyRKO+iJ0AQAA+4U9ZQSNIWTEz/60Uk5gz2tnzVtbMbX0hpaMbkbeJxSYSMD1xo7TTedXnJ0UuTLxxcHhLVrRRCrZfSjZ+yH00Ld6FLIajmYFefKPehzA6GgwjnLyucI1O6nDw2ZU1CV0WJLeWGgcmX7sinsLr3DVtmoGJyNR1Q9EWpvq71/W1Ys4Bqhq1628YKEz/0Z1Ic1bWMujcG03064ZZYYXTSTz9jrkxHKaEoJQNQgyUg9NXQhv4EFoMSESy/AIKRy+hVCULLJscbkpH8WakuvYQ1raghVfheks/Xra9AmiUoOqAbWAPXOij1nWQ9PSV2hxQZfkibD0+YP14pTXPoCAUA9jCQHRJIw=~0~-1~-1'
session_req.cookies['IFCSHOPSESSID'] EXAMPLE
qnabtagl4pu7gm2jg3sij03cu6
Another curious thing is that when I use the '.cookies' property, my POST call returns success even without updating the cart, where it should be inserting a new record.
As I am trying to develop a site bot, I would like to generate this same cookie via python requests code. Can anyone help me with it?
This is an example with python 3. You can customize it.
import requests
data ="param_1=value_1¶m_2=value_2&.....¶m_n=value_n"; #your request parameters.
cookie = "cookie_name=xxxxxxxx;....." #define cookie
url_endpoint = "htpps://........." # your url endpoint
# add cookies to endpoints
resp = requests.get(url_endpoint, data=data.encode('utf-8'),cookies=cookie)
if(resp.status_code==200):
print("success ")
else:
print("error ")
I am using Python and the requests library to do web scraping. I've got a problem with the loading of a page: I would like requests.get() to wait for the page to finish loading before returning the result.
I saw some people with the same problem who resolved it using Selenium, but I don't want to use another API. I am wondering if it's possible using only urllib, urllib2 or requests.
I have tried to put time.sleep() around the get call, and it didn't work.
It seems that I need to find where the website gets the data from before showing it, but I can't find it.
import requests

def search():
    url = 'https://academic.microsoft.com/search?q=machine%20learning'
    mySession = requests.Session()
    response = mySession.get(url)
    myResponse = response.text
The response is the HTML code of the loading page (you can see it if you go to the link in the code) with the loading blocks, but I need to get the results of the search.
requests cannot get elements loaded via ajax. See this explanation from w3schools.com:
Read data from a web server - after a web page has loaded
The only thing requests does is download the HTML; it does not interpret the JavaScript code, so it cannot load the elements that are normally loaded via ajax in a web browser (or using Selenium).
This site makes another request and uses JavaScript to render the result. You cannot execute JavaScript with requests; that's why some people use Selenium.
https://academic.microsoft.com/search?q=machine%20learning is not meant to be used without a browser.
If you want data specifically from academic.microsoft.com, use their API.
import requests
url = 'https://academic.microsoft.com/api/search'
data = {"query": "machine learning",
"queryExpression": "",
"filters": [],
"orderBy": None,
"skip": 0,
"sortAscending": True,
"take": 10}
r = requests.post(url=url, json=data)
result = r.json()
You will get the data in a nice format that is easy to use.
I am trying to update an already-saved form on a system using HTTP requests. Due to the server configuration of the third-party app we use, updating by POST requires sending a fully filled-out payload every single time.
I want to get around this by recovering the form data already present on the server and converting it into a dictionary, then changing any values I need and re-posting to make the changes server-side.
The application we use sends a POST request when the save button is clicked for a particular form.
Here I send a post request with no payload.
[This simulates pressing the save button and is also the point where Dev Tools shows me the payload I want to capture.]
post_test = self.session.post(url_to_retrieve_from)
I thought that I should now be able to print the output, which should resemble what the Google Dev Tools Form Data pane captures:
print(post_test.text)
This just gives me the HTML found on the webpage.
If Dev Tools can get this from the server then I should also be able to?
Example of Data I am trying to get via requests:
Form Data
If Dev Tools can get this from the server then I should also be able to?
Yes, of course. In requests you pass form data in the data keyword:
import requests
url = 'http://www.example.com'
data = {
    'name': 'value',
}
response = requests.post(url, data=data)
You can get the data you sent with a request from the response in this way:
import requests
response = requests.post('http://your_url', data=data)  # send request
body = response.request.body
parsed_data = dict(pair.split('=') for pair in body.split('&'))  # parse request body
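If the body contains URL-encoded characters (or values with '=' in them), urllib.parse.parse_qsl from the standard library is more robust than splitting by hand:

from urllib.parse import parse_qsl

parsed_data = dict(parse_qsl(body))  # also decodes percent-encoding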
Here you can find more information about the data argument.
In the documentation, in the class requests.Response we can find the attribute:
request = None
The PreparedRequest object to which this is a response.
In requests.PreparedRequest class we can read:
body = None
request body to send to the server.
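One caveat: post_test.text is the server's response, while the Form Data pane in Dev Tools shows what the browser sent. To recover the values already saved on the server, one option is to GET the page that renders the form and read the current input values out of the HTML. A sketch, assuming the form is plain HTML and BeautifulSoup is available (the URL and field name are hypothetical):

import requests
from bs4 import BeautifulSoup

session = requests.Session()
html = session.get('http://www.example.com/form').text  # page that renders the saved form
soup = BeautifulSoup(html, 'html.parser')

# collect the form's current field values into a dict
form_data = {tag['name']: tag.get('value', '')
             for tag in soup.find_all('input') if tag.get('name')}
form_data['name'] = 'new value'  # change what you need

session.post('http://www.example.com/form', data=form_data)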
Hello, I'm trying to get information from a website that requires logging in.
I already get a 200 response from the requestUrl where I POST the ID, passwords and other request data.
The headers dict contains the request headers that can be seen in the Chrome developer Network tab. The form data dict contains the ID and passwords.
login_site = requests.post(requestUrl, headers=headers, data=form_data)
status_code = login_site.status_code
print(status_code)
I got 200
The code below shows the ways I've tried.
1. Session
When I tried to set the cookies with a session, I failed. I've heard that a session can keep the cookies when I scrape other pages that need a log-in.
session = requests.Session()
session.post(requestUrl, headers=headers, data=form_data)
test = session.get('~~') #the website that I want to scrape
print(test.status_code)
I got 403
2. Manually set cookie
I manually made the cookie dict from what I can get:
cookies = {'wcs_bt':'...','_production_session_id':'...'}
r = requests.post('http://engoo.co.kr/dashboard', cookies = cookies)
print(r.status_code)
I also got 403
Actually, I don't know what I should write in the cookies dict. When I get 'wcs_bt=AAA; _production_session_id=BBB; _ga=CCC;', should I change it to the dict {'wcs_bt': 'AAA', ...}?
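That is, something like this small sketch with placeholder values:

# split the raw Cookie header string into the dict requests expects
raw = 'wcs_bt=AAA; _production_session_id=BBB; _ga=CCC'
cookies = dict(part.strip().split('=', 1) for part in raw.split(';') if part.strip())
# -> {'wcs_bt': 'AAA', '_production_session_id': 'BBB', '_ga': 'CCC'}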
When I get the cookies with
login_site = requests.post(requestUrl, headers=headers, data=form_data)
print(login_site.cookies)
in this code, I can only get
RequestsCookieJar[Cookie _production_session_id=BBB]
Somehow, that failed as well.
How can I scrape the site with the cookie?
Scraping a modern (circa 2017 or later) Web site that requires a login can be very tricky, because it's likely that some important portion of the login process is implemented in Javascript.
Unless you execute that Javascript exactly as a browser would, you won't be able to complete the login. Unfortunately, the basic Python libraries won't help.
Consider Selenium with Python, which is used for testing Web sites but can be used to automate any interaction with a Web site. A minimal sketch follows.
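For example, a minimal Selenium login sketch; the URL and element names here are hypothetical, so substitute the real ones from the page you are automating:

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://www.example.com/login')  # hypothetical login page

# fill the form the way a user would; the page's JavaScript runs normally
driver.find_element(By.NAME, 'username').send_keys('my_id')
driver.find_element(By.NAME, 'password').send_keys('my_password')
driver.find_element(By.CSS_SELECTOR, 'button[type=submit]').click()

print(driver.page_source)  # the logged-in page, with scripts executed
driver.quit()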