Python can't send POST request with image

I'm trying to decode a QR image on a website with Python: https://zxing.org/w/decode.jspx
I don't know why my POST request fails and I don't get any response.
import requests

url = "https://zxing.org/w/decode.jspx"
session = requests.Session()
files = {'f': open("new.png", "rb")}
response = session.post(url, files=files)
f = open("page.html", "w")
f.write(response.text)
f.close()
session.close()
Even when I do it with a GET request it still fails:
url ="https://zxing.org/w/decode.jspx"
session = requests.Session()
data = {'u':'https://www.qrstuff.com/images/default_qrcode.png'}
response = session.post(url,data = data)
f = open("page.html","w")
f.write(response.text)
f.close()
session.close()
Maybe because the website contains two forms?
Thanks for helping.

You can do this:
import urllib.request

url = "https://zxing.org/w/decode?u=https://www.qrstuff.com/images/default_qrcode.png"
response = urllib.request.urlopen(url)
f = open("page.html", "wb")
f.write(response.read())
f.close()
If you want to send a URL, the form action is a GET; if you want to post the data as a file, the action is a POST.
You can check this with the HackBar add-on for Firefox.

Well, I just saw my mistake.
The website is https://zxing.org/w/decode.jspx,
but once you submit a POST or a GET it goes to
https://zxing.org/w/decode, without ".jspx". So I just removed it and everything worked!
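For reference, here is a minimal sketch of the corrected upload, assuming the same new.png file and the f field name from the question:
import requests

# The form submits to /w/decode, not /w/decode.jspx
url = "https://zxing.org/w/decode"

with open("new.png", "rb") as image:
    response = requests.post(url, files={'f': image})

with open("page.html", "w") as f:
    f.write(response.text)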

Related

Seleniumwire get Response text

I'm using Selenium-wire to try and read the request response text of some network traffic. The code I have isn't fully reproducible, as the account is behind a paywall.
The bit of selenium-wire I'm currently using is:
import json

for request in driver.requests:
    if request.method == 'POST' and request.headers['Content-Type'] == 'application/json':
        # The body is in bytes so convert to a string
        body = driver.last_request.body.decode('utf-8')
        # Load the JSON
        data = json.loads(body)
Unfortunately, though, that is reading the payload of the request, and I'm trying to parse the response.
You need to get last_request's response:
body = driver.last_request.response.body.decode('utf-8')
data = json.loads(body)
I usually use these 3 steps:
import json

from seleniumwire.utils import decode

# I define the scopes to avoid other POST requests that are not related;
# we can also use them to select only the required endpoints
driver.scopes = [
    # .* is a regex that stands for any char, 0 or more times
    '.*stackoverflow.*',
    '.*github.*'
]

# visit the page
driver.get('LINK')

# get the last captured request
request = driver.last_request  # or driver.requests[-1]

# decompress the body and parse the JSON
js = json.loads(
    decode(
        request.response.body,
        # get the encoding from the response headers
        request.response.headers.get('Content-Encoding', 'identity'),
    )
)

# it's a good idea to clear all stored requests after each page visit
del driver.requests
For more info, here are the docs.
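If the request you're after only fires some time after the page loads, selenium-wire can also block until it has been captured. A short sketch (the URL pattern here is just a placeholder):
from seleniumwire import webdriver

driver = webdriver.Chrome()
driver.get('LINK')

# Block until a request matching the pattern has been captured, or raise after 10s
request = driver.wait_for_request('.*stackoverflow.*', timeout=10)
print(request.response.status_code)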

Need to download the PDF, NOT the content of the webpage

So as it stands, I am able to get the content of the webpage at the PDF link (EXAMPLE OF THE LINK HERE). BUT I don't want the content of the webpage, I want the content of the PDF, so I can write it into a PDF file in a folder on my computer.
I have been successful in doing this on sites that I don't need to log into and that don't require a proxy server.
Relevant CODE:
import os
import urllib2
import time
import requests
import urllib3
from random import *

s = requests.Session()
data = {"Username": "username", "Password": "password"}
url = "https://login.url.com"
print "doing things"
r2 = s.post(url, data=data, proxies={'https': 'https://PROXYip:PORT'}, verify=False)
# I get a response 200 from printing r2
print r2

download_url = "http://msds.walmartstores.com/client/document?productid=1000527&productguid=54e8aa24-0db4-4973-a81f-87368312069a&DocumentKey=undefined&HazdocumentKey=undefined&MSDS=0&subformat=NAM"
# maxCounter is defined elsewhere in the full script
file = open(r"F:\my_filepath\document" + str(maxCounter) + ".pdf", 'wb')
temp = s.get(download_url, proxies={'https': 'https://PROXYip:PORT'}, verify=False)
# This prints out the response from the proxy server (i.e. 200)
print temp
something = uniform(5, 6)
print something
time.sleep(something)
# This gets me the content of the web page, not the content of the PDF
print temp.content
file.write(temp.content)
file.close()
I need help figuring out how to "download" the content of the PDF
Try this:
import requests

url = 'http://msds.walmartstores.com/client/document?productid=1000527&productguid=54e8aa24-0db4-4973-a81f-87368312069a&DocumentKey=undefined&HazdocumentKey=undefined&MSDS=0&subformat=NAM'
pdf = requests.get(url)
with open('walmart.pdf', 'wb') as file:
    file.write(pdf.content)
Edit
Try again with a requests session to manage cookies (assuming they send you some after login), and maybe a different proxy:
proxy_dict = {'https': 'ip:port'}

with requests.Session() as session:
    # Authentication request, use GET/POST whatever is needed
    # data variable should hold user/password information
    auth = session.get(login_url, data=data, proxies=proxy_dict, verify=False)

    if auth.status_code == 200:
        print(auth.cookies)  # Tell me if you got anything
        # download_url is the PDF link from the question;
        # we're continuing the same session, so the cookies come along
        pdf = session.get(download_url, proxies=proxy_dict, verify=False)
        with open('walmart.pdf', 'wb') as file:
            file.write(pdf.content)
    else:
        print('No go, got {0} response'.format(auth.status_code))
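One way to tell whether the server actually sent a PDF rather than an HTML login page is to check the Content-Type header before writing the file. A small sketch, continuing from the pdf response above (the exact header value is an assumption about this server):
# If the server sent HTML (e.g. a login page), the Content-Type usually reveals it
content_type = pdf.headers.get('Content-Type', '')
if 'pdf' in content_type.lower():
    with open('walmart.pdf', 'wb') as file:
        file.write(pdf.content)
else:
    print('Expected a PDF but got {0}'.format(content_type))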

urllib.error.HTTPError: HTTP Error 401: Unauthorized

I'm new to Python and just tried to write data from an external file. I have no idea where I'm going wrong. Can anyone please help me with this?
Thanks in advance.
from urllib import request

url = r'https://query1.finance.yahoo.com/v7/finance/download/AMD?period1=1497317134&period2=1499909134&interval=1d&events=history&crumb=HwDtuBHqtg0'

def download_csv(csv_url):
    csv = request.urlopen(csv_url)
    csv_data = csv.read()
    csv_str = str(csv_data)
    file = csv_str.split('\\n')
    dest_url = r'appl.csv'
    wr = open(dest_url, 'w')
    for data in file:
        wr.write(data + '\n')
    wr.close()

download_csv(url)
So I ran the URL in the browser, and it states clearly that this API requires a cookie.
So you must provide a proper header. You can manage sessions with urllib, but honestly I would opt for a more user-friendly library, such as the requests library (HTTP for Humans).
Example:
import requests

s = requests.Session()
s.get('http://httpbin.org/cookies/set/sessioncookie/123456789')
r = s.get('http://httpbin.org/cookies')
print(r.text)
# '{"cookies": {"sessioncookie": "123456789"}}'
More: http://docs.python-requests.org/en/master/user/advanced/#session-objects
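Applied to the question's download, this might look like the sketch below. Note that the crumb in the URL is tied to the session cookie, so the hard-coded crumb from the question may still be rejected; the quote-page URL used to pick up the cookie is an assumption:
import requests

session = requests.Session()
# Visiting a regular quote page first lets the session collect the cookie
# that the download endpoint checks (an assumption about this API)
session.get('https://finance.yahoo.com/quote/AMD')

csv_url = ('https://query1.finance.yahoo.com/v7/finance/download/AMD'
           '?period1=1497317134&period2=1499909134&interval=1d'
           '&events=history&crumb=HwDtuBHqtg0')
resp = session.get(csv_url)

with open('appl.csv', 'w') as f:
    f.write(resp.text)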

Mintos.com login with python requests

I'm trying to write a tiny piece of software that logs into mintos.com, and saves the account overview page (which is displayed after a successful login) in a html file. I tried some different approaches, and this is my current version.
import requests
import sys
import codecs
sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
username = 'abc'
password = '123'
loginUrl = 'https://www.mintos.com/en/login'
resp = requests.get(loginUrl, auth=(username, password))
file = codecs.open("mint.html", "w", "UTF-8")
file.write(resp.text)
file.close()
When I run the code, I only save the original page, not the one I should get after logging in. I guess I'm messing up the login (I mean... there's not much else to mess up). I've spent an embarrassing amount of time on this problem already.
Edit:
I also tried something along the lines of:
import requests
import sys
import codecs
sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
loginUrl = "https://www.mintos.com/en/login";
username = "abc"
password = "123"
payload = {"username": username, "password": password}
with requests.session() as s:
resp = s.post(loginUrl, data = payload)
file = codecs.open("mint.html", "w", "UTF-8")
file.write(resp.text)
file.close()
Edit 2: Another non-working version, this time with a _csrf_token:
with requests.session() as s:
    resp = s.get(loginUrl)
    toFind = '_csrf_token" value="'
    splited = resp.text.split(toFind)[1]
    _csrf_token = splited.split('"', 1)[0]
    payload = {"_username": username, "_password": password, "_csrf_token": _csrf_token}
    final = s.post(loginUrl, data=payload)
    file = codecs.open("mint.html", "w", "UTF-8")
    file.write(final.text)
    file.close()
But I still get the same result. The downloaded page has the same token as the one I extract, though.
Final Edit: I made it work, and I feel stupid now. I needed to use 'https://www.mintos.com/en/login/check' as my loginUrl.
The auth parameter is just a shorthand for HTTPBasicAuth, which is not what most websites use. Most of them use cookies or session data to store your login info on your computer, so they can check who you are while you're browsing the pages.
If you want to log in on the website, you'll have to make a POST request to the login form and then store (and send back every time) the cookies they send you. This also assumes they don't have any kind of "anti-bot filter" (which would make you unable to log in without a real browser or, at least, not that easily).
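Putting the asker's final edit together with the token extraction from Edit 2, a minimal sketch (the field names and the /en/login/check endpoint come from the question and may have changed since):
import requests

login_url = 'https://www.mintos.com/en/login'
check_url = 'https://www.mintos.com/en/login/check'  # endpoint from the asker's final edit

with requests.Session() as s:
    # Fetch the login page and pull the CSRF token out of the form, as in Edit 2
    page = s.get(login_url).text
    token = page.split('_csrf_token" value="')[1].split('"', 1)[0]

    payload = {'_username': 'abc', '_password': '123', '_csrf_token': token}
    resp = s.post(check_url, data=payload)  # the session sends the cookies back automatically

    with open('mint.html', 'w', encoding='utf-8') as f:
        f.write(resp.text)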

POST form data containing spaces with Python requests

I'm probably overlooking something spectacularly obvious, but I can't find why the following is happening.
I'm trying to POST a search query to http://www.arcade-museum.com using the requests lib, and whenever the query contains spaces, the resulting page contains no results. Compare the results of these snippets:
import requests

url = "http://www.arcade-museum.com/results.php"
payload = {'type': 'Videogame', 'q': '1942'}
r = requests.post(url, payload)
with open("search_results.html", mode="wb") as f:
    f.write(r.content)
and
import requests

url = "http://www.arcade-museum.com/results.php"
payload = {'type': 'Videogame', 'q': 'Wonder Boy'}
r = requests.post(url, payload)
with open("search_results.html", mode="wb") as f:
    f.write(r.content)
If you try the same query on the website, the latter will result in a list of about 10 games. The same thing happens when posting the form data using the Postman REST client Chrome extension.
Again, it's probably something very obvious I'm overlooking, but I can't find what's causing this issue.
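No answer is recorded here, but a first debugging step is to inspect exactly what requests puts on the wire. A sketch that only shows how the form body gets encoded, not a claimed fix:
import requests

url = "http://www.arcade-museum.com/results.php"
payload = {'type': 'Videogame', 'q': 'Wonder Boy'}

# Build the request without sending it, to see exactly how the body is encoded
prepared = requests.Request('POST', url, data=payload).prepare()
print(prepared.headers['Content-Type'])  # application/x-www-form-urlencoded
print(prepared.body)                     # e.g. type=Videogame&q=Wonder+Boy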
