Add item to cart using POST requests - python

How would I add the item whose id I scraped to the cart using POST requests? This is my code:
import requests

post_headers = {
    'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
    'content-type': 'application/x-www-form-urlencoded'
}
post_data = {"utf-8": "%E2%9C%93", "commit": "add to cart"}
url = "https://www.supremenewyork.com/shop/{productid}/add".format(productid=id)
add_to_cart = requests.post(url, headers=post_headers, data=post_data)
print(add_to_cart.content)
The specific product I am trying to add to the cart is: https://www.supremenewyork.com/shop/shirts/zkmt62fz1/lg1ehyx3s
It accurately prints the item id in the console, but when I go to my cart, there is nothing there.

I am guessing you are looking at your cart in your browser. A website usually keeps track of you as a user with cookies (e.g. a session id) and uses that information to display your orders. If you send the order as a request from Python, the cookies from the server come back in the Python response, not to your browser. So when you look for the order in your browser, the browser does not have the cookies from the Python response, and the site will not recognise you as the same user.
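As a minimal sketch of that idea: keep the add-to-cart POST and the cart check in the same requests.Session, so the cookies carry over automatically. (The cart URL and the product id below are assumptions for illustration; check your browser's network tab for the real ones.)
import requests

post_headers = {
    'User-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36',
    'x-requested-with': 'XMLHttpRequest',
    'content-type': 'application/x-www-form-urlencoded'
}
post_data = {"utf-8": "%E2%9C%93", "commit": "add to cart"}
product_id = "lg1ehyx3s"  # placeholder -- use the id you scraped
url = "https://www.supremenewyork.com/shop/{}/add".format(product_id)

with requests.Session() as s:
    # The Session stores any cookies the server sets on this response,
    # so follow-up requests are recognised as coming from the same user.
    add_to_cart = s.post(url, headers=post_headers, data=post_data)
    # Hypothetical cart path -- confirm the real one in your browser's network tab.
    cart = s.get("https://www.supremenewyork.com/shop/cart", headers=post_headers)
    print(cart.content)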

Related

How to use python to simulate a http post request called by a button click event

I am trying to use Python to simulate a request to http://bmfw.haedu.gov.cn/jycx/pthcj_check/78. On that page, only 姓名 (name) and 准考证号 (exam number) are required. You can enter anything in the 姓名 (name) field, like "aaa", but 准考证号 (exam number) must be a valid number, for example 04301022291. After entering the correct image check number and clicking the 查询 (Query) button, the query result appears.
My question is: how do I simulate this request in Python? I used the developer tools to find the backend request, which is http://bmfw.haedu.gov.cn/jycx/pthcj_check, but when I call this URL from Python it still returns the query page. My code is as follows:
import requests

url = 'http://bmfw.haedu.gov.cn/jycx/pthcj_check'
header = {
    "cookie": "HAEDU_SESSION_ID=02c06a8f-fb1e-4c42-80d7-5c17e4d9cdb2; Hm_lvt_6b92f031645868e1e23be9be5938d979=1668318839,1668343665; Hm_lpvt_6b92f031645868e1e23be9be5938d979=1668343665",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/107.0.0.0 Safari/537.36",
    "Content-Type": "application/x-www-form-urlencoded"
}
data = {
    'xxid': '78',
    'trueName': 'hahaha',
    'sfzh': '',
    'zkzh': '04301022266',
    'imageId': '19GJI'
}
response = requests.post(url, data=data, headers=header)
print(response.text)
I am confused, but maybe this is a fundamental front-end question.
BTW, there is no login required for this request.
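One hedged debugging sketch (an assumption about the likely cause, not a verified fix for this site): the hard-coded cookie and imageId were captured from an earlier browser session, and the image check number is typically a captcha tied to that session, so it probably won't validate in a new request. Fetching the query page in a fresh requests.Session and inspecting the form's real action URL and hidden fields is a reasonable first step:
import requests
from bs4 import BeautifulSoup

page_url = 'http://bmfw.haedu.gov.cn/jycx/pthcj_check/78'

with requests.Session() as s:  # lets the server issue its own session cookie
    page = s.get(page_url, headers={'User-Agent': 'Mozilla/5.0'})
    soup = BeautifulSoup(page.text, 'html.parser')
    form = soup.find('form')
    if form is not None:
        # Where the form really posts to, and any hidden fields it expects
        print('action:', form.get('action'))
        for hidden in form.find_all('input', {'type': 'hidden'}):
            print(hidden.get('name'), '=', hidden.get('value'))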

Python / Requests, download file from .aspx website after logging in

Python/requests.
I need to:
log in to a website
change a parameter
download a file according to the change in step 2)
I have attached images of the form/payload captured after the download completes (please feel free to ask for more if you don't find this descriptive enough).
My idea was:
url = 'https://www.sunnyportal.com/Templates/Start.aspx?ReturnUrl=%2f'
protectedurl = 'https://www.sunnyportal.com/FixedPages/Dashboard.aspx'
downloadurl = 'https://www.sunnyportal.com/Redirect/DownloadDiagram'

# your details here to be posted to the login form.
payload = {
    'ctl00$ContentPlaceHolder1$Logincontrol1$txtUserName': user,
    'ctl00$ContentPlaceHolder1$Logincontrol1$txtPassword': pw
}

# ensure the session context is closed after use.
with requests.Session() as s:
    p = s.post(url, data=payload)
    print(p.status_code, p.headers)
    # authorised request
    r = s.get(protectedurl)
    print(r.status_code, r.headers)
    # download request
    d = s.get(downloadurl)
    print(d.status_code, d.headers)
I get a 200 status code for all three requests, but the download doesn't start.
Here you can find the POST payload after logging in: (see attached screenshot)
Thanks, please help me!
I would like to be clear on the following:
Should I add headers to the POST/GET requests? Which headers?
Should I add more to the payload? What exactly?
Should I use just one or two URL(s)? Which one/which ones?
Thanks!
There is a lot to do here, but it should be possible. This is an ASP site, so you need to get the __VIEWSTATE and __VIEWSTATEGENERATOR from every page you navigate from and include them in the payload. I would include everything in the payload, even the blank fields, and replicate the headers as well. See the code below for how to log in.
Then once you have logged in you can replicate the network call to change the date; again, you need to read the __VIEWSTATE and __VIEWSTATEGENERATOR from the page you are moving from and include them in the payload (use a function like the one below and call it with each move).
When you expand the image you will see another network call, which you need to replicate. The response will have HTML you can parse, and you can find the image in this tag:
<img id="UserControlShowEnergyAndPower1$_diagram" src="/chartfx70/temp/CFV0113_101418049AC.png"
If it's not that exact chart you want, right-click the chart, copy the image address, and then look for that image URL in the HTML to see where it is.
Then you can do something like this to save the file:
img_suffix = soup.find('img', {'id': 'UserControlShowEnergyAndPower1$_diagram'})['src']
image_name = img_suffix.split('/')[-1]
image_url = 'https://www.sunnyportal.com/' + img_suffix
image_data = s.get(image_url)  # where s is the requests.Session() variable
print('Saving image')
with open(image_name, 'wb') as file:
    file.write(image_data.content)
Below is how I logged in but you can take it from here to navigate to your image:
import requests
from bs4 import BeautifulSoup

def get_views(resp):
    # Parse the ASP.NET hidden form fields needed for every POST
    soup = BeautifulSoup(resp, 'html.parser')
    viewstate = soup.find('input', {'name': '__VIEWSTATE'})['value']
    viewstate_gen = soup.find('input', {'name': '__VIEWSTATEGENERATOR'})['value']
    return (viewstate, viewstate_gen)

s = requests.Session()
user = 'your_email'
pw = 'your_password'
headers = {
    'accept': '*/*',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36',
}
url = 'https://www.sunnyportal.com/Templates/Start.aspx?ReturnUrl=%2f'
protectedurl = 'https://www.sunnyportal.com/FixedPages/Dashboard.aspx'
downloadurl = 'https://www.sunnyportal.com/Redirect/DownloadDiagram'

landing_page = s.get(url, headers=headers)
print(landing_page)
viewstate, viewstate_gen = get_views(landing_page.text)

# your details here to be posted to the login form.
payload = {
    '__EVENTTARGET': '',
    '__EVENTARGUMENT': '',
    '__VIEWSTATE': viewstate,
    '__VIEWSTATEGENERATOR': viewstate_gen,
    'ctl00$ContentPlaceHolder1$Logincontrol1$txtUserName': user,
    'ctl00$ContentPlaceHolder1$Logincontrol1$txtPassword': pw,
    'ctl00$ContentPlaceHolder1$Logincontrol1$LoginBtn': 'Login',
    'ctl00$ContentPlaceHolder1$Logincontrol1$RedirectURL': '',
    'ctl00$ContentPlaceHolder1$Logincontrol1$RedirectPlant': '',
    'ctl00$ContentPlaceHolder1$Logincontrol1$RedirectPage': '',
    'ctl00$ContentPlaceHolder1$Logincontrol1$RedirectDevice': '',
    'ctl00$ContentPlaceHolder1$Logincontrol1$RedirectOther': '',
    'ctl00$ContentPlaceHolder1$Logincontrol1$PlantIdentifier': '',
    'ctl00$ContentPlaceHolder1$Logincontrol1$ServiceAccess': 'true',
    'ClientScreenWidth': '1920',
    'ClientScreenHeight': '1080',
    'ClientScreenAvailWidth': '1920',
    'ClientScreenAvailHeight': '1050',
    'ClientWindowInnerWidth': '1920',
    'ClientWindowInnerHeight': '979',
    'ClientBrowserVersion': '56',
    'ClientAppVersion': '5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36',
    'ClientAppName': 'Netscape',
    'ClientLanguage': 'en-ZA',
    'ClientPlatform': 'Win32',
    'ClientUserAgent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36',
    'ctl00$ContentPlaceHolder1$hiddenLanguage': 'en-gb'
}
new_headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-ZA,en;q=0.9,en-GB;q=0.8,en-US;q=0.7,de;q=0.6',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
    # note: don't hard-code 'Content-Length'; requests computes it automatically
    'Content-Type': 'application/x-www-form-urlencoded',
    'DNT': '1',
    'Host': 'www.sunnyportal.com',
    'Origin': 'https://www.sunnyportal.com',
    'Pragma': 'no-cache',
    'Referer': 'https://www.sunnyportal.com/Templates/Start.aspx?ReturnUrl=%2f',
    'sec-ch-ua': '" Not;A Brand";v="99", "Google Chrome";v="97", "Chromium";v="97"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-User': '?1',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36'
}
login = s.post(url, headers=new_headers, data=payload)
print(login)
print(login.text)

BeautifulSoup isn't working while web scraping Amazon

I'm new to web scraping and I am trying to use basic skills on Amazon. I want to write code that finds the top 10 'Today's Greatest Deals' with prices, ratings and other information.
Every time I try to find a specific tag using find() with a class specified, it returns None, even though the actual HTML has that tag.
On manual inspection I found that half of the HTML isn't being displayed in the output terminal. The body and html tags do close, but a huge chunk of code inside the body tag is missing.
The last line of code displayed is:
<!--[endif]---->
after which the body tag closes.
Here is the code I'm trying:
from bs4 import BeautifulSoup as bs
import requests

source = requests.get('https://www.amazon.in/gp/goldbox?ref_=nav_topnav_deals')
soup = bs(source.text, 'html.parser')

print(soup.prettify())
# On printing this it misses some portion of the html

article = soup.find('div', class_='a-row dealContainer dealTile')
print(article)
# On printing this it shows 'None'
Ideally, this should give me the code within the div tag so that I can go on to get the name of the product. However, the output just shows None, and printing the whole document shows a huge chunk of HTML missing, which of course contains the information I need.
Is Amazon blocking my request? Please help.
The User-Agent request header contains a characteristic string that allows the network protocol peers to identify the application type, operating system, software vendor or software version of the requesting software user agent. Validating User-Agent header on server side is a common operation so be sure to use valid browser’s User-Agent string to avoid getting blocked.
(Source: http://go-colly.org/articles/scraping_related_http_headers/)
The only thing you need to do is set a legitimate user agent, so add headers to emulate a browser:
# This is a standard user-agent of Chrome browser running on Windows 10
headers = { 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36' }
Example:
from bs4 import BeautifulSoup
import requests
headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36'}
resp = requests.get('https://www.amazon.com', headers=headers).text
soup = BeautifulSoup(resp, 'html.parser')
...
<your code here>
Additionally, you can add another set of headers to look even more like a legitimate browser. Add some more headers like this:
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.98 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip',
    'DNT': '1',  # Do Not Track Request Header
    'Connection': 'close'
}

Python Requests - debugging POST requests

I am trying to scrape a website where I have to get to the right page using a POST request.
Below are the different screenshots showing how I found the headers and payload I needed to use in my request:
1) Here is the page: it is a list of economic indicators:
2) It is possible to select which country's indicators are displayed using the filter on the right-hand side of the screen:
3) Clicking the "apply" button sends a POST request to the site that refreshes the page to show only the information for the ticked boxes. Here is a screen capture showing the elements of the form sent in the POST request:
But if I try to make this POST request with Python requests using the following code (see below), it seems the form is not processed and the page returned is simply the default one.
import lxml.html
import requests

payload = {
    'country[]': 5,
    'limit_from': '0',
    'submitFilters': '1',
    'timeFilter': 'timeRemain',
    'currentTab': 'today',
    'timeZone': '55'
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
    'Accept': '*/*',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.8',
    'Connection': 'keep-alive',
    'Host': 'www.investing.com',
    'Origin': 'https://www.investing.com',
    'Referer': 'https://www.investing.com/economic-calendar/',
    'Content-Length': '94',
    'Content-Type': 'application/x-www-form-urlencoded',
    'Cookie': 'adBlockerNewUserDomains=1505902229; __qca=P0-734073995-1505902265195; __gads=ID=d69b337b0f60d8f0:T=1505902254:S=ALNI_MYlYKXUUbs8WtYTEO2fN9O_q9oykA; cookieConsent=was-set; travelDistance=4; editionPostpone=1507424197769; PHPSESSID=v9q2deffu2n0b9q07t3jkgk4a4; StickySession=id.71595783179.419www.investing.com; geoC=GB; gtmFired=OK; optimizelySegments=%7B%224225444387%22%3A%22gc%22%2C%224226973206%22%3A%22direct%22%2C%224232593061%22%3A%22false%22%2C%225010352657%22%3A%22none%22%7D; optimizelyEndUserId=oeu1505902244597r0.8410692836488942; optimizelyBuckets=%7B%228744291438%22%3A%228731763165%22%2C%228785438042%22%3A%228807365450%22%7D; nyxDorf=OT5hY2M1P2E%2FY24xZTE3YTNoMG9hYmZjPDdlYWFnNz0wNjNvYW5kYWU6PmFvbDM6Y2Y0MDAwYTk1MzdpYGRhPDk2YTNjYT82P2E%3D; billboardCounter_1=1; _ga=GA1.2.1460679521.1505902261; _gid=GA1.2.655434067.1508542678'
}

g = requests.post("https://www.investing.com/economic-calendar/", data=payload, headers=headers)
html = lxml.html.fromstring(g.text)
tr = html.xpath("//table[@id='economicCalendarData']//tr")
for t in tr[4:]:
    print(t.find(".//td[@class='left flagCur noWrap']/span").attrib["title"])
You can see this because if, for instance, I select only country "5" (the USA), post the request, and look for the countries present in the result page, I see other countries as well.
Does anyone know what I am doing wrong with this POST request?
As your own screenshot shows, it appears that the site posts to the URL
https://www.investing.com/economic-calendar/Service/getCalendarFilteredData
whereas you're only posting directly to
https://www.investing.com/economic-calendar/
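A hedged sketch of what that change might look like, reusing the question's payload and a trimmed set of headers (the response shape is an assumption; endpoints like this often return JSON with the rendered table rows under a key such as 'data', so inspect g.text first to confirm):
import requests
import lxml.html

payload = {
    'country[]': 5,
    'limit_from': '0',
    'submitFilters': '1',
    'timeFilter': 'timeRemain',
    'currentTab': 'today',
    'timeZone': '55'
}
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
    'Referer': 'https://www.investing.com/economic-calendar/'
}

service_url = "https://www.investing.com/economic-calendar/Service/getCalendarFilteredData"
g = requests.post(service_url, data=payload, headers=headers)

# Assumption: the endpoint returns JSON with row HTML under 'data';
# fall back to the raw text if it doesn't parse as JSON.
try:
    rows_html = g.json()["data"]
except ValueError:
    rows_html = g.text

table = lxml.html.fromstring('<table>%s</table>' % rows_html)
for span in table.xpath("//td[@class='left flagCur noWrap']/span"):
    print(span.attrib.get("title"))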

Login to website via Python Requests

For a university project I am currently trying to log in to a website and scrape a small detail (a list of news articles) from my user profile.
I am new to Python, but I have done this before on another website. My first two approaches produce different HTTP errors. I have considered problems with the header my request is sending, but my understanding of this site's login process appears to be insufficient.
This is the login page: http://seekingalpha.com/account/login
My first approach looks like this:
import requests

with requests.Session() as c:
    requestUrl = 'http://seekingalpha.com/account/orthodox_login'
    USERNAME = 'XXX'
    PASSWORD = 'XXX'
    userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36'
    login_data = {
        "slugs[]": None,
        "rt": None,
        "user[url_source]": None,
        "user[location_source]": "orthodox_login",
        "user[email]": USERNAME,
        "user[password]": PASSWORD
    }
    c.post(requestUrl, data=login_data, headers={"referer": "http://seekingalpha.com/account/login", 'user-agent': userAgent})
    page = c.get("http://seekingalpha.com/account/email_preferences")
    print(page.content)
This results in "403 Forbidden"
My second approach looks like this:
from requests import Request, Session

requestUrl = 'http://seekingalpha.com/account/orthodox_login'
USERNAME = 'XXX'
PASSWORD = 'XXX'
userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36'
# c.get(requestUrl)
login_data = {
    "slugs[]": None,
    "rt": None,
    "user[url_source]": None,
    "user[location_source]": "orthodox_login",
    "user[email]": USERNAME,
    "user[password]": PASSWORD
}
headers = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language": "de-DE,de;q=0.8,en-US;q=0.6,en;q=0.4",
    "origin": "http://seekingalpha.com",
    "referer": "http://seekingalpha.com/account/login",
    "Cache-Control": "max-age=0",
    "Upgrade-Insecure-Requests": "1",  # header values must be strings, not ints
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36"
}

s = Session()
req = Request('POST', requestUrl, data=login_data, headers=headers)
prepped = s.prepare_request(req)
prepped.body = "slugs%5B%5D=&rt=&user%5Burl_source%5D=&user%5Blocation_source%5D=orthodox_login&user%5Bemail%5D=XXX%40XXX.com&user%5Bpassword%5D=XXX"
resp = s.send(prepped)
print(resp.status_code)
In this approach I was trying to prepare the header exactly as my browser would. Sorry for the redundancy. This results in HTTP error 400.
Does anyone have an idea what went wrong? Probably a lot.
Instead of spending a lot of energy on manually logging in and playing with Session, I suggest you scrape the pages right away using your cookie.
When you log in, a cookie is usually added to your request to identify you, and the site uses it to recognise you and display your data.
Your code will look like this:
import requests

response = requests.get("http://www.example.com", cookies={
    "c_user": "my_cookie_part",
    "xs": "my_other_cookie_part"
})
print(response.content)
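For example, applied to the Seeking Alpha profile page from the question, you could log in once in your browser and copy the session cookie from the developer tools. (The cookie name and value below are placeholders, not Seeking Alpha's real cookie names; check the Application/Storage tab of your browser's developer tools for the actual ones.)
import requests

# Placeholder cookie -- copy the real name/value pair from your browser
# after logging in to seekingalpha.com.
cookies = {"session_id": "value_from_browser"}
page = requests.get("http://seekingalpha.com/account/email_preferences", cookies=cookies)
print(page.content)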
