I am using this code to update some items in my list of products:
# Payload data
payload = {
    "EPrincipal": "888407233616",
    "SiteId": 106
}
# POST request
adicionar_url = "MY URL"
post = session_req.post(
    adicionar_url,
    data=payload
)
When I debug, the status code returned is 200, but when I parse the result with BeautifulSoup, I get
soup = BeautifulSoup(post.text, 'html.parser')
#Return
{"success": false, "site_id":"" }
and the items are not updated in my account. Can someone help me with this?
I got a solution by exporting the request from Postman and replaying it with the Python requests library.
import requests
url = "MY SITE"
payload = "Principal=9999&Site=999&g-recaptcha-response="
headers = {
'authority': 'ROOT-SITE',
'x-sec-clge-req-type': 'ajax',
'accept': 'application/json, text/javascript, */*; q=0.01',
'x-requested-with': 'XMLHttpRequest',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36',
'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
'origin': 'https://ROOT-SITE',
'sec-fetch-site': 'same-origin',
'sec-fetch-mode': 'cors',
'sec-fetch-dest': 'empty',
'referer': 'https://ROOT-SITE',
'accept-language': 'pt-BR,pt;q=0.9,en-US;q=0.8,en;q=0.7,fr;q=0.6,es;q=0.5',
}
response = requests.request("POST", url, headers=headers, data = payload)
print(response.text.encode('utf8'))
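Side note: since the endpoint replies with JSON rather than HTML, checking the result with response.json() instead of BeautifulSoup makes the success flag easy to inspect. A minimal sketch building on the request above:
# The endpoint returns JSON, so parse it directly instead of feeding it to BeautifulSoup.
result = response.json()
if not result.get("success"):
    print("Update failed:", result)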
Related
I'm trying to create a script using Scrapy to grab JSON content from this webpage. I've set the headers in the script accordingly, but when I run it I always end up with a JSONDecodeError. The site sometimes throws a captcha, but not always. However, I've never had any success with the script below, even when using a VPN. How can I fix it?
This is how I've tried:
import scrapy
import urllib.parse

class ImmobilienScoutSpider(scrapy.Spider):
    name = "immobilienscout"
    start_url = "https://www.immobilienscout24.de/Suche/de/nordrhein-westfalen/wohnung-kaufen"

    headers = {
        'accept': 'application/json; charset=utf-8',
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'en-US,en;q=0.9',
        'x-requested-with': 'XMLHttpRequest',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36',
    }
    params = {
        'price': '1000.0-',
        'constructionyear': '-2000',
        'pagenumber': '1'
    }

    def start_requests(self):
        req_url = f'{self.start_url}?{urllib.parse.urlencode(self.params)}'
        yield scrapy.Request(
            url=req_url,
            headers=self.headers,
            callback=self.parse,
        )

    def parse(self, response):
        yield {"response": response.json()}
This is what the output should look like (truncated):
{"searchResponseModel":{"additional":{"lastSearchApiUrl":"/region?realestatetype=apartmentbuy&price=1000.0-&constructionyear=-2000&pagesize=20&geocodes=1276010&pagenumber=1","title":"Eigentumswohnung in Nordrhein-Westfalen - ImmoScout24","sortingOptions":[{"description":"Standardsortierung","code":0},{"description":"Kaufpreis (höchste zuerst)","code":3},{"description":"Kaufpreis (niedrigste zuerst)","code":4},{"description":"Zimmeranzahl (höchste zuerst)","code":5},{"description":"Zimmeranzahl (niedrigste zuerst)","code":6},{"description":"Wohnfläche (größte zuerst)","code":7},{"description":"Wohnfläche (kleinste zuerst)","code":8},{"description":"Neubau-Projekte (Projekte zuerst)","code":31},{"description":"Aktualität (neueste zuerst)","code":2}],"pagerTemplate":"|Suche|de|nordrhein-westfalen|wohnung-kaufen?price=1000.0-&constructionyear=-2000&pagenumber=%page%","sortingTemplate":"|Suche|de|nordrhein-westfalen|wohnung-kaufen?price=1000.0-&constructionyear=-2000&sorting=%sorting%","world":"LIVING","international":false,"device":{"deviceType":"NORMAL","devicePlatform":"UNKNOWN","tablet":false,"mobile":false,"normal":true}
EDIT:
This is what the script built on the requests module looks like:
import requests
link = 'https://www.immobilienscout24.de/Suche/de/nordrhein-westfalen/wohnung-kaufen'
headers = {
'accept': 'application/json; charset=utf-8',
'accept-encoding': 'gzip, deflate, br',
'accept-language': 'en-US,en;q=0.9',
'x-requested-with': 'XMLHttpRequest',
'content-type': 'application/json; charset=utf-8',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/102.0.0.0 Safari/537.36',
'referer': 'https://www.immobilienscout24.de/Suche/de/nordrhein-westfalen/wohnung-kaufen?price=1000.0-&constructionyear=-2000&pagenumber=1',
# 'cookie': 'hardcoded cookies'
}
params = {
'price': '1000.0-',
'constructionyear': '-2000',
'pagenumber': '2'
}
sess = requests.Session()
sess.headers.update(headers)
resp = sess.get(link,params=params)
print(resp.json())
Scrapy's CookiesMiddleware disregards 'cookie' passed in headers.
Reference: scrapy/scrapy#1992
Pass cookies explicitly:
yield scrapy.Request(
    url=req_url,
    headers=self.headers,
    callback=self.parse,
    # Add the following line (requires `import http.cookies` at the top of the file):
    cookies={k: v.value for k, v in http.cookies.SimpleCookie(self.headers.get('cookie', '')).items()},
)
Note: That site uses GeeTest CAPTCHA, which cannot be solved by simply rendering the page or using Selenium, so you still need to periodically update the hardcoded cookie (cookie name: reese84) taken from the browser, or use a service like 2Captcha.
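If parsing a raw Cookie header string feels indirect, an alternative sketch is to pass the hardcoded cookie as a plain dict (the reese84 value below is a placeholder you would refresh from the browser):
# Placeholder value: copy a fresh reese84 cookie from the browser's dev tools.
cookies = {'reese84': '<fresh value from the browser>'}

yield scrapy.Request(
    url=req_url,
    headers=self.headers,
    cookies=cookies,
    callback=self.parse,
)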
I'm getting a 419 (Page Expired) status code when using requests on this site. I gathered the information for the headers and data by monitoring the Network tab of the developer console. How can I use the Python requests module to log in successfully?
import requests
url = 'https://rates.itgtrans.com/login'
headers = {
'authority': 'rates.itgtrans.com',
'cache-control': 'max-age=0',
'sec-ch-ua': '"Chromium";v="94", "Google Chrome";v="94", ";Not A Brand";v="99"',
'sec-ch-ua-mobile': '?0',
'sec-ch-ua-platform': '"Windows"',
'upgrade-insecure-requests': '1',
'origin': 'https://rates.itgtrans.com',
'content-type': 'application/x-www-form-urlencoded',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36',
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'sec-fetch-site': 'same-origin',
'sec-fetch-mode': 'navigate',
'sec-fetch-user': '?1',
'sec-fetch-dest': 'document',
'referer': 'https://rates.itgtrans.com/login',
'accept-language': 'en-US,en;q=0.9',
'cookie': 'XSRF-TOKEN=eyJpdiI6IkEzbi9JQkVwbWloZTM1UVdSdVJtK0E9PSIsInZhbHVlIjoiM1pxQVYxajhPcWdlZ1NlYlVMSUlyQzFISVpPNjNrMVB0UmNYMXZGa0crSmYycURoem1vR0FzRUMrNjB2bXFPbCs4U3ZyeGM4ZVNLZ1NjRGVmditUMldNUUNmYmVzeTY2WS85VC93a1c0M0JUMk1Jek00TTNLVnlPb2VVRXpiN0ciLCJtYWMiOiJkNjQyMTMwMGRmZmQ4YTg0ZTNhZDgzODQ5M2NiMmE2ODdlYjRlOTIyMWE5Yjg4YzEyMTBjNTI2ODQxY2YxMzNkIiwidGFnIjoiIn0%3D; draymaster_session=eyJpdiI6Im9vUDZabmlYSTY0a1lSNGdYZzZHT0E9PSIsInZhbHVlIjoiMGVVcSs2T3RheGhMeDNVVFJUQjRmb212TkoySVY5eWFjeVNHT1lGWE9sRHdtR3JTa0REZFhMTzNJeisyTjNOZ1hrQnNscWY0dXBheFFaRFhIdDAvUlFMOFdvTFdaOXBoejcwb2ZDNFNMdDZ6MUFxT2dHU3hlNVkxZmpiTnd2Z0QiLCJtYWMiOiIwN2RmZTc1ZDUzYzViYTgzYWU1MjFjNjIxZjYzMzY3MDE0YjI4MDhkMWMwMTVkYmYxYWM2MzQ0ODM1YzRkNDY1IiwidGFnIjoiIn0%3D'
}
data = {
'_token': 'o8jJ4tR3PHkuz5TR2kuoHwBAdHd5RczFx2rlul1C',
'email': '****',
'password': '****',
'button': ''
}
with requests.Session() as s:
    cookies = s.cookies
    p = s.post(url='https://rates.itgtrans.com/login', data=data, headers=headers, cookies=cookies)
    print(p)
As I see it, the whole problem is that you always use the same _token.
The server generates a new unique token for every user, valid for only a few minutes, for security reasons (so an attacker can't capture it and reuse it later).
BTW: when I run your code, get the page with status 419, and display p.text, I see HTML with the text Page Expired, which confirms that you are using an expired token.
You should always GET the page first and search for the fresh token in the HTML
<input name="_token" type="hidden" value="Xz0pJ0djGVnfaRMuXNDGMdBmZRbc55Ql2Q2CTPit"/>
and use this value in the POST.
I don't have an account on this page, but using a fresh token from <input name="_token"> I get status 200 instead of 419.
import requests
from bs4 import BeautifulSoup
url = 'https://rates.itgtrans.com/login'
headers = {
'authority': 'rates.itgtrans.com',
'cache-control': 'max-age=0',
'origin': 'https://rates.itgtrans.com',
'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36',
'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'referer': 'https://rates.itgtrans.com/login',
'accept-language': 'en-US,en;q=0.9',
}
data = {
'_token': '-empty-',
'email': '****',
'password': '****',
'button': ''
}
with requests.Session() as s:
    # --- first GET the page ---
    response = s.get(url='https://rates.itgtrans.com/login', headers=headers)
    #print(response.text)

    # --- search for the fresh token in the HTML ---
    soup = BeautifulSoup(response.text, 'html.parser')
    token = soup.find('input', {'name': "_token"})['value']
    print('token:', token)

    # --- run POST with the new token ---
    data['_token'] = token
    response = s.post(url='https://rates.itgtrans.com/login', data=data, headers=headers)
    #print(response.text)
    print('status_code:', response.status_code)
BTW:
I get 200 even if I don't send the headers.
Because the code uses a Session, I don't have to copy cookies from the GET to the POST; the Session carries them over automatically.
I have an issue with the website https://damas.terna.it/DMSPCAS08.
I am trying either to scrape the data or to fetch the Excel file that is included there.
I tried to fetch the Excel file with a POST request.
import requests
from bs4 import BeautifulSoup
import json
import datetime
url = 'https://damas.terna.it/api/Ntc/GetNtc'
headers = {
'Host': 'damas.terna.it',
'Connection': 'keep-alive',
'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="90", "Google Chrome";v="90"',
'sec-ch-ua-mobile': '?0',
'Upgrade-Insecure-Requests': '1',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'Sec-Fetch-Site': 'same-origin',
'Sec-Fetch-Mode': 'navigate',
'Sec-Fetch-User': '?1',
'Sec-Fetch-Dest': 'document',
'Referer': 'https://damas.terna.it/DMSPCAS08',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9',
'Cookie': '__RequestVerificationToken=5mfiSM2dKte532I8fd3MRdn6nnHbSezkQX29r3fyF2tMegXsvnOInpxy8JvFuDyRVS6pZs03y-NL3CsNsItN1yboc128Kk51yEiuUU0mel41; pers_damas_2019=378972352.20480.0000; rxVisitor=1619766836714T8SRPFLUAH62F1R9K6KG3EKK104BSDFE; dtCookie=7$EC34ED7BFB4A503D379D8D8C69242174|846cd19ce7947380|1; rxvt=1619774462088|1619771198468; dtPC=7$369396404_351h-vIDMPHTCURIGKVKBWVVEWOAMRMKDNWCUH-0e1; DamasNetCookie=F-evmYb7kIS_YTMr2mwuClMB1zazemmhl9vzSXynWeuCII_keDb_jQr4VLSYif9t3juDS6LkOuIXKFfe8pydxSzHPfZzGveNB6xryj2Czp9J1qeWFFT9dYFlRXFWAHuaEIyUQQDJmzWfDBrFCWr309mZoE6hkCKzDtoJgIoor9bed1kQgcdeymAH9lrtrKxwsheaQm2qA-vWWqKjCiruO1VkJ6II_bcrAXU2A_ZPQPznE1_8AEC_AwXmBXETubMQwFAnDXsOLDfEYeQ61TGAupF3d-wz3aZfRs5eCA3kw-en-kpEbS0trwFBQzB-098610GIbIPki9ThVitZ2LN2zza6nn1A8qchqfQC_CZEgu6Jt1glfhHceWS6tvWCuyOEqo2jJpxAajMYXPB6mzlHzX13TiV-jgeFSPehugMAgms_exqocw9w27e4lI5laYZu0rkKkznpZ1mJLOhORwny8-bKa3nRUt7erFv7ul3nLLrgd3FP907tHpTh-qXt1Bmr6OqknDZr_EBN8GY_B2YHV-8hC0AjdqQqpS0xOpp7z_CzzgByTOHSNdeKjVgQfZLQ7amnp71lhxgPeJZvOIl_mIWOr_gWRy_iK6UuzrA3udCTV7bAnUXKB8gX89d9ShQf5tZDxPFchrAQBtdmDChQOA; dtLatC=2; dtSa=true%7CC%7C-1%7CExport%20to%20.xls%7C-%7C1619772685174%7C369396404_351%7Chttps%3A%2F%2Fdamas.terna.it%2FDMSPCAS08%7CTerna%20%5Ep%20NTC%7C1619772662568%7C%7C'
}
parameters = {
'busDay': "2021-05-01",
'busDayTill': "2021-05-01",
'borderDirId': '1',
'borderDirName': "TERNA-APG"
}
response = requests.post(url, data=parameters, headers=headers)
soup = BeautifulSoup(response.text, "html.parser")
print(soup.prettify())
I am receiving this error:
The parameters dictionary contains an invalid entry for parameter 'parameters' for method 'System.Web.Mvc.ActionResult GetNtc(Damas.Core.Data.DataSource.Data.ParametersModel)' in 'Terna.Web.Controllers.CapacityManagement.NtcController'. The dictionary contains a value of type 'System.Collections.Generic.Dictionary`2[System.String,System.Object]', but the parameter requires a value of type 'Damas.Core.Data.DataSource.Data.ParametersModel'.
Parameter name: parameters
Please don't post the answer to your question in the question's body; instead, post it in the answer box:
response = requests.post(url, data=json.dumps(parameters), headers=headers) seems to solve the issue.
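For reference, a minimal sketch of the corrected call. requests can also serialize the body itself via the json= keyword, which is equivalent to json.dumps() plus a Content-Type: application/json header:
import requests

url = 'https://damas.terna.it/api/Ntc/GetNtc'
parameters = {
    'busDay': '2021-05-01',
    'busDayTill': '2021-05-01',
    'borderDirId': '1',
    'borderDirName': 'TERNA-APG'
}
# json= serializes the dict to a JSON body and sets Content-Type: application/json
# automatically; in practice you would also pass the cookies/headers shown in the
# question so the session is recognized.
response = requests.post(url, json=parameters)
print(response.status_code)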
In the following code, I am trying to POST to a Microsoft online account, starting with a page that requires posting an email address. This is my attempt so far:
import requests
from bs4 import BeautifulSoup
url = 'https://moe-register.emis.gov.eg/account/login?ReturnUrl=%2Fhome%2FRegistrationForm'
headers ={
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9,ar;q=0.8',
'Cache-Control': 'max-age=0',
'Connection': 'keep-alive',
'Content-Type': 'application/x-www-form-urlencoded',
'Cookie':'__RequestVerificationToken=vdS3aPPg5qQ2bH9ADTppeKIVJfclPsMI6dqB6_Ru11-2XJPpLfs7jBlejK3n0PZuYl-CwuM2hmeCsXzjZ4bVfj2HGLs2KOfBUphZHwO9cOQ1; .AspNet.MOEEXAMREGFORM=ekeG7UWLA6OSbT8ZoOBYpC_qYMrBQMi3YOwrPGsZZ_3XCuCsU1BP4uc5QGGE2gMnFgmiDIbkIk_8h9WtTi-P89V7ME6t_mBls6T3uR2jlllCh0Ob-a-a56NaVNIArqBLovUnLGMWioPYazJ9DVHKZY7nR_SvKVKg2kPkn6KffkpzzHOUQAatzQ2FcStZBYNEGcfHF6F9ZkP3VdKKJJM-3hWC8y62kJ-YWD0sKAgAulbKlqcgL1ml6kFoctt2u66eIWNm3ENnMbryh8565aIk3N3UrSd5lBoO-3Qh8jdqPCCq38w3cURRzCd1Z1rhqYb3V2qYs1ULRT1_SyRXFQLrJs5Y9fsMNkuZVeDp_CKfyzM',
'Host': 'moe-register.emis.gov.eg',
'Origin': 'https://moe-register.emis.gov.eg',
'Referer': 'https://moe-register.emis.gov.eg/account/login?ReturnUrl=%2Fhome%2FRegistrationForm',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'}
with requests.Session() as s:
    # r = s.post(url)
    # soup = BeautifulSoup(r.content, 'lxml')
    data = {'EmailAddress': '476731809#matrouh1.moe.edu.eg'}
    r_post = s.post(url, data=data, headers=headers, verify=False)
    soup = BeautifulSoup(r_post.content, 'lxml')
    print(soup)
What I get is the same page that asks me to post the email again. I expected to get the page that asks for the sign-in password.
This is the starting page,
and this is an example of the email that needs to be posted: 476731809#matrouh1.moe.edu.eg
** I have tried code like the following, but I got the sign-in page again (although the credentials are correct).
Can you please try this code:
import requests
from bs4 import BeautifulSoup
url = 'https://login.microsoftonline.com/common/login'
s = requests.Session()
res = s.get('https://login.microsoftonline.com')
cookies = dict(res.cookies)
res = s.post(url,
             auth=('476731809#matrouh1.moe.edu.eg', 'Std#050202'),
             verify=False,
             cookies=cookies)
soup = BeautifulSoup(res.text, 'html.parser')
print(soup)
I checked out the page, and the following seems to work:
import requests
headers = {
'Connection': 'keep-alive',
'Cache-Control': 'max-age=0',
'Upgrade-Insecure-Requests': '1',
'Origin': 'https://moe-register.emis.gov.eg',
'Content-Type': 'application/x-www-form-urlencoded',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'Sec-Fetch-Site': 'same-origin',
'Sec-Fetch-Mode': 'navigate',
'Sec-Fetch-User': '?1',
'Sec-Fetch-Dest': 'document',
'Referer': 'https://moe-register.emis.gov.eg/account/login',
'Accept-Language': 'en-US,en;q=0.9,gl;q=0.8,fil;q=0.7,hi;q=0.6',
}
data = {
'EmailAddress': '476731809#matrouh1.moe.edu.eg'
}
response = requests.post('https://moe-register.emis.gov.eg/account/authenticate', headers=headers, data=data, verify=False)
Your POST endpoint seems to be wrong: you need to post to /authenticate rather than /login to proceed with the request (I am on a Mac, so my user-agent may differ from yours or from what is required; you can change it in the headers variable).
I have been struggling with a problem for a day. I want to crawl a website that has multiple pages.
I found that I can crawl a site when each page has a different URL
(like page=1, page=2, etc.).
But the website I'm trying to scrape never changes its URL, even when I go to the next page.
Is there any way to scrape this kind of page? Thank you!
The code below is the result of converting the cURL command to Python:
import requests
cookies = {
'WMONID': 'smDC5Ku5TeX',
'userId': 'robin9634',
'UID': 'robin9634',
'JSESSIONID': 'lLqLdHFEk4iEJdQ2HCR5m05tg6ZIxBdegEamDzxeEoTClkvqVDN4xzXeMPtTIN3e.cG9ydGFsX2RvbWFpbi9wZDU=',
}
headers = {
'Connection': 'keep-alive',
'Cache-Control': 'max-age=0',
'Upgrade-Insecure-Requests': '1',
'Origin': 'https://dhlottery.co.kr',
'Content-Type': 'application/x-www-form-urlencoded',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.83 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
'Sec-Fetch-Site': 'same-origin',
'Sec-Fetch-Mode': 'navigate',
'Sec-Fetch-User': '?1',
'Sec-Fetch-Dest': 'document',
'Referer': 'https://dhlottery.co.kr/gameInfo.do?method=powerWinNoList',
'Accept-Language': 'ko-KR,ko;q=0.9,en-US;q=0.8,en;q=0.7',
}
params = (
('method', 'powerWinNoList'),
)
data = {
'nowPage': '7',
'searchDate': '20200909',
'calendar': '2020-09-09',
'sortType': 'num'
}
response = requests.post('https://dhlottery.co.kr/gameInfo.do', headers=headers, params=params, cookies=cookies, data=data)
#NB. Original query string below. It seems impossible to parse and
#reproduce query strings 100% accurately so the one below is given
#in case the reproduced version is not "correct".
# response = requests.post('https://dhlottery.co.kr/gameInfo.do?method=powerWinNoList', headers=headers, cookies=cookies, data=data)
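In this case the page number never appears in the URL because it travels in the POST body ('nowPage'), so one way to paginate is simply to loop over that field. A minimal sketch, reusing the endpoint and field names from the exported request above (the date values are placeholders):
import requests

url = 'https://dhlottery.co.kr/gameInfo.do'
params = {'method': 'powerWinNoList'}
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}

with requests.Session() as s:
    for page in range(1, 6):  # pages 1..5
        data = {
            'nowPage': str(page),   # the "next page" state lives here, not in the URL
            'searchDate': '20200909',
            'calendar': '2020-09-09',
            'sortType': 'num',
        }
        resp = s.post(url, params=params, headers=headers, data=data)
        print(page, resp.status_code, len(resp.text))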