I'm trying to POST data to a specific URL, get the response, and save it to a file with curl, and that works without any problem.
The page accepts the POSTed data: if the data is correct it shows a response page, and if not it redirects to a URL like this:
http://example.com/url/foo/bar/error
My code is:
curl --fail --user-agent "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36" --data "mydataexample" --referer "http://example.com/url/foo/bar" http://example.com/url/foo/bar --output test.html
But when I code this in Python with requests, status_code is always 200 (OK), even with wrong data, and there is no correct response to save!
Here is my python code:
import requests

data = 'myexampledata'
headers = {'referer': 'http://example.com/url/foo/bar', 'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36'}
url = 'http://example.com/url/foo/bar'
r = requests.post(url, params=data, headers=headers)
# check headers and status_code to test whether the data is wrong...
print(r.headers)
print(r.status_code)
Now, how do I write Python code that solves this problem and works exactly like the curl command? Any advice on fixing it with requests or pycurl?
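Two differences from the curl command are worth checking (a minimal sketch follows, assuming the server signals bad data purely by redirecting to the .../error URL): params= puts the payload in the query string, while curl's --data sends it in the POST body (use data= for that), and requests follows the redirect by default, so you always end up with a 200 from the error page.

import requests

url = 'http://example.com/url/foo/bar'
headers = {
    'Referer': 'http://example.com/url/foo/bar',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1700.107 Safari/537.36',
}

# data= sends the payload in the POST body, like curl --data
r = requests.post(url, data='mydataexample', headers=headers, allow_redirects=False)

# with redirects disabled, wrong data shows up as a 3xx pointing at the error URL
if r.is_redirect and r.headers.get('Location', '').endswith('/error'):
    print('wrong data: server redirected to the error page')
else:
    r.raise_for_status()  # rough equivalent of curl --fail
    with open('test.html', 'wb') as f:
        f.write(r.content)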
Using Python/requests, I need to:
log in to a website
change a parameter
download a file according to the change in 2)
I have attached images of the form/payload after download completion (please feel free to ask me for more if you don't find this descriptive enough).
My idea was:
import requests

url = 'https://www.sunnyportal.com/Templates/Start.aspx?ReturnUrl=%2f'
protectedurl = 'https://www.sunnyportal.com/FixedPages/Dashboard.aspx'
downloadurl = 'https://www.sunnyportal.com/Redirect/DownloadDiagram'

# your details here to be posted to the login form.
user = 'XXX'
pw = 'XXX'
payload = {
    'ctl00$ContentPlaceHolder1$Logincontrol1$txtUserName': user,
    'ctl00$ContentPlaceHolder1$Logincontrol1$txtPassword': pw
}

# ensure the session context is closed after use.
with requests.Session() as s:
    p = s.post(url, data=payload)
    print(p.status_code, p.headers)
    # authorised request
    r = s.get(protectedurl)
    print(r.status_code, r.headers)
    # download request
    d = s.get(downloadurl)
    print(d.status_code, d.headers)
I get status code 200 for all of them, but the download doesn't start.
Here you can find the POST payload after logging in:
Thanks, please help me!
I would like to be clearer on the following:
Should I add headers to the POST/GET requests? Which ones?
Should I add more to the payload? What exactly?
Should I use just one or two of the URLs? Which ones?
Thanks!
There is a lot to do here, but it should be possible. This is an ASP.NET site, so you need to get the __VIEWSTATE and __VIEWSTATEGENERATOR from every page you navigate from and include them in the payload. I would include everything in the payload, even the blank fields, and replicate the headers as well. See the code below for how to log in.
Then, once you are logged in, you can replicate the network call that changes the date; again, you need to pull the __VIEWSTATE and __VIEWSTATEGENERATOR from the page you are moving from and include them in the payload (use a function like the one below and call it on each move).
When you expand the image you will see another network call, which you need to replicate; the response contains HTML you can parse, and you will find the image in this tag:
<img id="UserControlShowEnergyAndPower1$_diagram" src="/chartfx70/temp/CFV0113_101418049AC.png"
If it's not that exact chart you want, right-click the chart, copy the image address, and look for that image URL in the HTML to see where it is.
Then you can do something like this to save the file:
img_suffix = soup.find('img', {'id': 'UserControlShowEnergyAndPower1$_diagram'})['src']
image_name = img_suffix.split('/')[-1]
image_url = 'https://www.sunnyportal.com/' + img_suffix
image_data = s.get(image_url)  # where s is the requests.Session() variable
print('Saving image')
with open(image_name, 'wb') as file:
    file.write(image_data.content)
Below is how I logged in but you can take it from here to navigate to your image:
import requests
from bs4 import BeautifulSoup

def get_views(resp):
    # pull the ASP.NET state fields out of the page you are navigating from
    soup = BeautifulSoup(resp, 'html.parser')
    viewstate = soup.find('input', {'name': '__VIEWSTATE'})['value']
    viewstate_gen = soup.find('input', {'name': '__VIEWSTATEGENERATOR'})['value']
    return (viewstate, viewstate_gen)

s = requests.Session()
user = 'your_email'
pw = 'your_password'

headers = {
    'accept': '*/*',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36',
}

url = 'https://www.sunnyportal.com/Templates/Start.aspx?ReturnUrl=%2f'
protectedurl = 'https://www.sunnyportal.com/FixedPages/Dashboard.aspx'
downloadurl = 'https://www.sunnyportal.com/Redirect/DownloadDiagram'

landing_page = s.get(url, headers=headers)
print(landing_page)

viewstate, viewstate_gen = get_views(landing_page.text)

# your details here to be posted to the login form.
payload = {
    '__EVENTTARGET': '',
    '__EVENTARGUMENT': '',
    '__VIEWSTATE': viewstate,
    '__VIEWSTATEGENERATOR': viewstate_gen,
    'ctl00$ContentPlaceHolder1$Logincontrol1$txtUserName': user,
    'ctl00$ContentPlaceHolder1$Logincontrol1$txtPassword': pw,
    'ctl00$ContentPlaceHolder1$Logincontrol1$LoginBtn': 'Login',
    'ctl00$ContentPlaceHolder1$Logincontrol1$RedirectURL': '',
    'ctl00$ContentPlaceHolder1$Logincontrol1$RedirectPlant': '',
    'ctl00$ContentPlaceHolder1$Logincontrol1$RedirectPage': '',
    'ctl00$ContentPlaceHolder1$Logincontrol1$RedirectDevice': '',
    'ctl00$ContentPlaceHolder1$Logincontrol1$RedirectOther': '',
    'ctl00$ContentPlaceHolder1$Logincontrol1$PlantIdentifier': '',
    'ctl00$ContentPlaceHolder1$Logincontrol1$ServiceAccess': 'true',
    'ClientScreenWidth': '1920',
    'ClientScreenHeight': '1080',
    'ClientScreenAvailWidth': '1920',
    'ClientScreenAvailHeight': '1050',
    'ClientWindowInnerWidth': '1920',
    'ClientWindowInnerHeight': '979',
    'ClientBrowserVersion': '56',
    'ClientAppVersion': '5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36',
    'ClientAppName': 'Netscape',
    'ClientLanguage': 'en-ZA',
    'ClientPlatform': 'Win32',
    'ClientUserAgent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36',
    'ctl00$ContentPlaceHolder1$hiddenLanguage': 'en-gb'
}

# headers copied from the browser's login request; Content-Length is omitted
# because requests computes it from the body automatically.
new_headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-ZA,en;q=0.9,en-GB;q=0.8,en-US;q=0.7,de;q=0.6',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
    'Content-Type': 'application/x-www-form-urlencoded',
    'DNT': '1',
    'Host': 'www.sunnyportal.com',
    'Origin': 'https://www.sunnyportal.com',
    'Pragma': 'no-cache',
    'Referer': 'https://www.sunnyportal.com/Templates/Start.aspx?ReturnUrl=%2f',
    'sec-ch-ua': '" Not;A Brand";v="99", "Google Chrome";v="97", "Chromium";v="97"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'Sec-Fetch-Dest': 'document',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-User': '?1',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36'
}

login = s.post(url, headers=new_headers, data=payload)
print(login)
print(login.text)
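As a quick check that the login actually stuck (a small sketch reusing the protectedurl defined above, not part of the original flow), you can fetch the dashboard with the same session and look at the final URL after redirects:

# if the login worked, the dashboard should not bounce back to the login page
dash = s.get(protectedurl, headers=headers)
print(dash.status_code, dash.url)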
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36'}
url = 'https://www.nseindia.com/api/chart-databyindex?index=ACCEQN'
r = requests.get(url, headers=headers)
data = r.json()
print(data)
prices = data['grapthData']  # 'grapthData' (sic) is the key name the API itself returns
print(prices)
It was working fine, but now it shows "Response [401]".
Well, it's all about the site's authentication requirements. It requires a certain level of authorization to be accessed like this.
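If the 401 is caused by missing session cookies, which is typical for this endpoint, one common workaround is to visit the homepage first with a Session so the server can set its cookies, then call the API with the same session. A sketch, assuming that is what changed:

import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.135 Safari/537.36'}
api_url = 'https://www.nseindia.com/api/chart-databyindex?index=ACCEQN'

with requests.Session() as s:
    s.headers.update(headers)
    s.get('https://www.nseindia.com')  # sets the cookies the API checks for
    r = s.get(api_url)
    print(r.status_code)
    if r.ok:
        print(r.json().get('grapthData'))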
I'm using the requests module in Python to try to make a search on the following website, http://musicpleer.audio/; however, the website appears to be blocking me, as it issues nothing but a 403 when I attempt to access it. I'm wondering how I can get around this. I've tried sending it the user agent of my web browser (Chrome), and it still returns error 403. Any suggestions on how to get around this would be very helpful; an example of downloading a song from the site would be even better. Thanks in advance.
My code:
import requests, os

def funGetList():
    start_path = 'C:/Users/Jordan/Music/'  # directory to scan
    songs = []
    for path, dirs, files in os.walk(start_path):
        for filename in files:
            temp = os.path.join(path, filename)
            # print(len(temp))  # debug
            # strip the leading directory and the file extension
            songs.append(temp[22:len(temp) - 4])
    return songs

def funDownloadMP3(songs):
    for i in songs:
        print(i)
        payload = {'searchQuery': 'meme', 'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'}
        url = 'http://musicpleer.audio/'
        print(requests.post(url, data=payload))
Putting the User-Agent in the headers seems to work:
In []:
import requests
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'}
url = 'http://musicpleer.audio/'
r = requests.get('{}#!{}'.format(url, 'meme'), headers=headers)
r.status_code
Out[]:
200
Note: it looks like the search URL is simply '#!<search-term>'.
HTTP 403 Forbidden error code.
The server might be expecting some more request headers, such as Host or Cookie.
You might want to use Postman to debug it with ease.
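For illustration, a sketch of supplying extra headers and cookies with requests; the cookie name and value here are hypothetical placeholders that you would copy from a real browser session (requests sets Host itself from the URL):

import requests

url = 'http://musicpleer.audio/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Referer': url,
}
# hypothetical cookie copied from a logged-in browser session
cookies = {'session_id': 'value-from-your-browser'}

r = requests.get(url, headers=headers, cookies=cookies)
print(r.status_code)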
For a university project I am currently trying to log in to a website and scrape a small detail (a list of news articles) from my user profile.
I am new to Python, but I have done this before on another website. My first two approaches produce different HTTP errors. I have considered problems with the headers my request is sending, but my understanding of this site's login process appears to be insufficient.
This is the login page: http://seekingalpha.com/account/login
My first approach looks like this:
import requests

with requests.Session() as c:
    requestUrl = 'http://seekingalpha.com/account/orthodox_login'
    USERNAME = 'XXX'
    PASSWORD = 'XXX'
    userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36'
    login_data = {
        "slugs[]": None,
        "rt": None,
        "user[url_source]": None,
        "user[location_source]": "orthodox_login",
        "user[email]": USERNAME,
        "user[password]": PASSWORD
    }
    c.post(requestUrl, data=login_data, headers={"referer": "http://seekingalpha.com/account/login", 'user-agent': userAgent})
    page = c.get("http://seekingalpha.com/account/email_preferences")
    print(page.content)
This results in "403 Forbidden"
My second approach looks like this:
from requests import Request, Session

requestUrl = 'http://seekingalpha.com/account/orthodox_login'
USERNAME = 'XXX'
PASSWORD = 'XXX'
userAgent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36'
# c.get(requestUrl)
login_data = {
    "slugs[]": None,
    "rt": None,
    "user[url_source]": None,
    "user[location_source]": "orthodox_login",
    "user[email]": USERNAME,
    "user[password]": PASSWORD
}
headers = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "Accept-Language": "de-DE,de;q=0.8,en-US;q=0.6,en;q=0.4",
    "origin": "http://seekingalpha.com",
    "referer": "http://seekingalpha.com/account/login",
    "Cache-Control": "max-age=0",
    "Upgrade-Insecure-Requests": "1",  # header values must be strings, not ints
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36"
}
s = Session()
req = Request('POST', requestUrl, data=login_data, headers=headers)
prepped = s.prepare_request(req)
prepped.body = "slugs%5B%5D=&rt=&user%5Burl_source%5D=&user%5Blocation_source%5D=orthodox_login&user%5Bemail%5D=XXX%40XXX.com&user%5Bpassword%5D=XXX"
resp = s.send(prepped)
print(resp.status_code)
In this approach I was trying to prepare the request exactly as my browser would. Sorry for the redundancy. This results in HTTP error 400.
Does someone have an idea of what went wrong? Probably a lot.
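A possible cause of the 400 worth checking: prepare_request computes Content-Length from login_data, so overwriting prepped.body afterwards leaves a stale Content-Length, and a mismatched length alone can produce a 400. If the body is replaced by hand, the header needs recomputing:

# after overwriting prepped.body, bring Content-Length back in sync
prepped.headers['Content-Length'] = str(len(prepped.body))
resp = s.send(prepped)
print(resp.status_code)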
Instead of spending a lot of energy on manually logging in and playing with Session, I suggest you just scrape the pages right away using your existing browser cookies.
When you log in, a cookie is usually added to your requests to identify you; you can copy it out of your browser after logging in manually.
Your code will be like this:
import requests

# cookie names are site-specific; copy them from your browser after logging in
response = requests.get("http://www.example.com", cookies={
    "c_user": "my_cookie_part",
    "xs": "my_other_cookie_part"
})
print(response.content)
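The names c_user and xs here are just the cookies one particular site happens to use. To find yours, log in with a normal browser, open the developer tools, and copy the names and values of the cookies sent with a logged-in request.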
I use Python requests to get images, but in some cases it doesn't work, and that seems to be happening more and more often. An example is:
http://recipes.thetasteofaussie.netdna-cdn.com/wp-content/uploads/2015/07/Leek-and-Sweet-Potato-Gratin.jpg
It loads fine in my browser, but with requests it returns HTML that says "403 Forbidden" and "nginx/1.7.11".
import requests
image_url = "<the_url>"
headers = {'User-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36', 'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8','Accept-Encoding':'gzip,deflate,sdch'}
r = requests.get(image_url, headers=headers)
# r.content is html '403 forbidden', not an image
I have also tried with this header, which has been necessary in some cases. Same result.
headers = {'User-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36', 'Accept':'image/webp,*/*;q=0.8','Accept-Encoding':'gzip,deflate,sdch'}
(I had a similar question a few weeks ago, but this was answered by the particular image file types not being supported by PIL. This is different.)
EDIT: Based on comments:
It seems the link only works if you have already visited the original site, http://aussietaste.recipes/vegetables/leek-vegetables/leek-and-sweet-potato-gratin/, with the image; I suppose the browser then uses the cached version. Is there any workaround?
The site is validating the Referer header. This prevents other sites from including the image in their web pages and using the image host's bandwidth. Set it to the site you mentioned in your post, and it will work.
More info:
https://en.wikipedia.org/wiki/HTTP_referer
import requests

image_url = "http://recipes.thetasteofaussie.netdna-cdn.com/wp-content/uploads/2015/07/Leek-and-Sweet-Potato-Gratin.jpg"
headers = {
    'User-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.76 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Encoding': 'gzip,deflate,sdch',
    'Referer': 'http://aussietaste.recipes/vegetables/leek-vegetables/leek-and-sweet-potato-gratin/'
}
r = requests.get(image_url, headers=headers)
print(r)
For me, this prints
<Response [200]>