I'm trying to get a JSON response from this webpage using the following approach, but instead I get {"message": "Must provide valid one of: query_id, query_hash", "status": "fail"}. I printed the response URL (r.url in the second script) to see whether it matched the one I intended to send, and found its structure was different.
If I use the URL directly (taken from dev tools) with requests, I get the required content:
import json
import requests
check_url = 'https://www.instagram.com/graphql/query/?query_hash=7dabc71d3e758b1ec19ffb85639e427b&variables=%7B%22tag_name%22%3A%22instagood%22%2C%22first%22%3A2%2C%22after%22%3A%22QVFDa3djMUFwM1BkRWJNTlEzRmxBYkRGdFBDVzViU2JoNVZPbWNQSmNCTE1HNDlhYWdsdi1EcE5ickhvYjhRWUhqUDhIcXE3YTE4M1JMbmdVN0lMSXM3ZA%3D%3D%22%7D'
r = requests.get(check_url)
print(r.json())
But I can't make this version work:
import json
import requests
url = 'https://www.instagram.com/explore/tags/instagood/'
query_url = 'https://www.instagram.com/graphql/query/?'
payload = {
    "query_hash": "7dabc71d3e758b1ec19ffb85639e427b",
    "variables": {"tag_name": "instagood", "first": "2", "after": "QVFDa3djMUFwM1BkRWJNTlEzRmxBYkRGdFBDVzViU2JoNVZPbWNQSmNCTE1HNDlhYWdsdi1EcE5ickhvYjhRWUhqUDhIcXE3YTE4M1JMbmdVN0lMSXM3ZA=="}
}

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
    r = s.get(query_url, params=json.dumps(payload))
    print(r.content)
How can I make the above script work?
Your problem is connected to how you encode the params.
From the check_url in your first example we can see:
?query_hash=7dabc71d3e758b1ec19ffb85639e427b&variables=%7B%22tag_name%22%3A%22...
This URL has 2 params:
query_hash - a plain string
variables - a URL-encoded string, judging by the escape sequences (%7B%22).
The fragment %7B%22 decodes to {". In other words, the second parameter is a URL-escaped JSON string.
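A quick way to confirm this, as a minimal sketch using only the standard library:
from urllib.parse import quote, unquote

# The escaped fragment decodes back to the start of a JSON object
print(unquote('%7B%22tag_name%22%3A%22instagood%22'))  # {"tag_name":"instagood"
print(quote('{"', safe=''))  # %7B%22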
From this we can get a clue about the new solution:
query_url = 'https://www.instagram.com/graphql/query/?'
variables = {"tag_name": "instagood", "first": "2",
             "after": "QVFDa3djMUFwM1BkRWJNTlEzRmxBYkRGdFBDVzViU2JoNVZPbWNQSmNCTE1HNDlhYWdsdi1EcE5ickhvYjhRWUhqUDhIcXE3YTE4M1JMbmdVN0lMSXM3ZA=="}
payload = {
    "query_hash": "7dabc71d3e758b1ec19ffb85639e427b",
    "variables": json.dumps(variables)
}

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1; ) AppleWebKit/537.36 (KHTML, like Gecko) ' + \
                              'Chrome/81.0.4044.138 Safari/537.36'
    r = s.get(query_url, params=payload)
    print(r.content)
As you can see, the params argument passed to requests.get is a dict with two keys, which gets translated into ?query_hash=value1&variables=value2.
To get the correct value for variables, we just dump the JSON to a string. The requests library takes care of URL-escaping characters like { and " in that string.
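If you want to double-check the encoding without hitting the server, a small sketch is to build a PreparedRequest and inspect its URL:
# Sketch: see the exact URL requests would send, without sending it
req = requests.Request('GET', query_url, params=payload).prepare()
print(req.url)  # should match the structure of the working check_url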
While running your code, the URL formed for the API call contains unnecessary escape characters, which is what's breaking the call.
Sending a JSON-dumped payload as params in a GET request is not recommended. A quick workaround is to use a POST request instead; it worked fine for me:
import json
import requests
url = 'https://www.instagram.com/explore/tags/instagood/'
query_url = 'https://www.instagram.com/graphql/query/?'
payload = {
    "query_hash": "7dabc71d3e758b1ec19ffb85639e427b",
    "variables": {"tag_name": "instagood", "first": "2", "after": "QVFDa3djMUFwM1BkRWJNTlEzRmxBYkRGdFBDVzViU2JoNVZPbWNQSmNCTE1HNDlhYWdsdi1EcE5ickhvYjhRWUhqUDhIcXE3YTE4M1JMbmdVN0lMSXM3ZA=="}
}

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1; ) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'
    r = s.post(query_url, params=json.dumps(payload))
    print(r.content)
I'm using Python and trying to read the metadata from a token on Solscan.
I am looking for the name, image, etc. from the metadata.
I am currently making a JSON request, which seems to work (i.e. it doesn't fail), but it only returns:
{"holder":0}
Process finished with exit code 0
I am making several other requests to the website, so I think my request is correct.
I tried looking at the documentation on https://public-api.solscan.io/docs and I believe I am requesting the correct info, but I don't get it.
Here is my current code:
import requests

headers = {
    'accept': 'application/jsonParsed',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'
}

params = (
    ('tokenAddress', 'EArf8AxBi44QxFVnSab9gZpXTxVGiAX2YCLokccr1UsW'),
)

response = requests.get('https://public-api.solscan.io/token/meta', headers=headers, params=params)
# response = requests.get('https://arweave.net/viPcoBnO9OjXvnzGMXGvqJ2BEgl25BMtqGaj-I1tkCM', headers=headers)
print(response.content.decode())
Any help appreciated!
This code sample works:
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'
}

params = {
    'address': 'EArf8AxBi44QxFVnSab9gZpXTxVGiAX2YCLokccr1UsW',
}

response = requests.get('https://api.solscan.io/account', headers=headers, params=params)
print(response.content.decode())
I use a different URL and parameters in my sample: https://api.solscan.io/account instead of https://public-api.solscan.io/token/meta, and the address param instead of tokenAddress.
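Since the exact response schema isn't documented here, a hedged sketch for inspecting the JSON (the 'data' and 'metadata' keys below are illustrative guesses, not a documented path):
data = response.json()
print(list(data.keys()))  # see the real shape first
metadata = data.get('data', {}).get('metadata')  # hypothetical path - adjust after inspecting
if metadata:
    print(metadata.get('name'), metadata.get('image'))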
I'm trying to log in to Instagram using the requests library. I had succeeded with the following script, but it doesn't work anymore: the password field is now encrypted (checked dev tools while logging in manually).
I've tried:
import re
import requests
from bs4 import BeautifulSoup
link = 'https://www.instagram.com/accounts/login/'
login_url = 'https://www.instagram.com/accounts/login/ajax/'
payload = {
    'username': 'someusername',
    'password': 'somepassword',
    'enc_password': '',
    'queryParams': {},
    'optIntoOneTap': 'false'
}

with requests.Session() as s:
    r = s.get(link)
    csrf = re.findall(r"csrf_token\":\"(.*?)\"", r.text)[0]
    r = s.post(login_url, data=payload, headers={
        "user-agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36",
        "x-requested-with": "XMLHttpRequest",
        "referer": "https://www.instagram.com/accounts/login/",
        "x-csrftoken": csrf
    })
    print(r.status_code)
    print(r.url)
This is what I found using dev tools:
username: someusername
enc_password: #PWD_INSTAGRAM_BROWSER:10:1592421027:ARpQAAm7pp/etjy2dMjVtPRdJFRPu8FAGILBRyupINxLckJ3QO0u0RLmU5NaONYK2G0jQt+78BBDBxR9nrUsufbZgR02YvR8BLcHS4uN8Gu88O2Z2mQU9AH3C0Z2NpDPpS22uqUYhxDKcYS5cA==
queryParams: {"oneTapUsers":"[\"36990119985\"]"}
optIntoOneTap: false
How can I login to Instagram using requests?
You can use authentication version 0 - plain password, no encryption:
import re
import requests
from bs4 import BeautifulSoup
from datetime import datetime
link = 'https://www.instagram.com/accounts/login/'
login_url = 'https://www.instagram.com/accounts/login/ajax/'
time = int(datetime.now().timestamp())
payload = {
    'username': '<USERNAME HERE>',
    'enc_password': f'#PWD_INSTAGRAM_BROWSER:0:{time}:<PLAIN PASSWORD HERE>',  # <-- note the '0' - that means we want to use plain passwords
    'queryParams': {},
    'optIntoOneTap': 'false'
}

with requests.Session() as s:
    r = s.get(link)
    csrf = re.findall(r"csrf_token\":\"(.*?)\"", r.text)[0]
    r = s.post(login_url, data=payload, headers={
        "user-agent": "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.120 Safari/537.36",
        "x-requested-with": "XMLHttpRequest",
        "referer": "https://www.instagram.com/accounts/login/",
        "x-csrftoken": csrf
    })
    print(r.status_code)
    print(r.url)
    print(r.text)
Prints:
200
https://www.instagram.com/accounts/login/ajax/
{"authenticated": true, "user": true, "userId": "XXXXXXXX", "oneTapPrompt": true, "reactivated": true, "status": "ok"}
To replicate the real encryption, you need to do some investigation work on their JavaScript.
After a little research, I found that they use AES-GCM with a 256-bit key. There is a 100-byte prefix whose meaning I still don't know; the password is appended to it, and the whole message of 100 + len(password) bytes is encrypted.
You can read up on AES-GCM, extract the key, IV, and additional data from their code, and complete the job yourself.
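For illustration only, here is a minimal AES-GCM sketch with the cryptography package; the key, nonce, and 100-byte prefix below are random placeholders, not Instagram's actual values, which you would have to extract from their JavaScript:
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=256)  # placeholder - the real key comes from their JS
nonce = os.urandom(12)                     # placeholder IV
prefix = os.urandom(100)                   # placeholder for the unknown 100-byte prefix
password = b'somepassword'
ciphertext = AESGCM(key).encrypt(nonce, prefix + password, None)  # no associated data in this sketch
print(len(ciphertext))  # 100 + len(password) + 16 bytes of GCM tag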
I hope that I have helped, Good Luck :)
The above code provided by @Andrej Kesely failed for me, but I made it work by setting the headers (User-Agent and Referer) on the first s.get(link) request.
If you get the same error as I did:
csrf = re.findall(r"csrf_token\":\"(.*?)\"",r.text)[0]
IndexError: list index out of range
then the changes below should fix it.
import re
import requests
from bs4 import BeautifulSoup
from datetime import datetime
link = 'https://www.instagram.com/accounts/login/'
login_url = 'https://www.instagram.com/accounts/login/ajax/'
userAgent= "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36"
time = int(datetime.now().timestamp())
payload = {
    'username': '<Your_Username>',
    'enc_password': f'#PWD_INSTAGRAM_BROWSER:0:{time}:<Your_Password>',
    'queryParams': {},
    'optIntoOneTap': 'false'
}

with requests.Session() as s:
    s.headers = {"user-agent": userAgent}
    s.headers.update({"Referer": link})
    r = s.get(link)
    print(r)
    csrf = re.findall(r"csrf_token\":\"(.*?)\"", r.text)[0]
    r = s.post(login_url, data=payload, headers={
        "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36",
        "x-requested-with": "XMLHttpRequest",
        "referer": "https://www.instagram.com/accounts/login/",
        "x-csrftoken": csrf
    })
    print(r.status_code)
    print(r.url)
    print(r.text)
Please run this script with Python 3.6 or newer; otherwise you will get additional errors about formatted string literals (f-strings) and "AttributeError: 'datetime.datetime' object has no attribute 'timestamp'".
Run the script like this:
python3 <your_script_name.py>
I hope this will solve your problem.
You can't. Instagram encrypts the password when the request is sent. Unless you can figure out how they encrypt it and do the same with the password you're sending, you can't log in to Instagram with requests.
I want to download Bing search images using Python code.
Example URL: https://www.bing.com/images/search?q=sketch%2520using%20iphone%2520students
My Python code generates a Bing search URL like the example. The next step is to download all images shown at that link to my local desktop.
In my project I generate some words in Python, and my code builds the Bing image search URL. All I need is to download the images shown on that search page using Python.
To download an image, you need to make a request to the image URL itself, the one ending with .png, .jpg, etc.
Bing provides an "m" attribute inside the <a> element that stores the needed data in JSON format, from which you can parse the image URL stored under the "murl" key and download it afterward.
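For example, the "m" attribute holds JSON along these lines (the URLs here are made up):
import json

sample_m = '{"murl": "https://example.com/photo.jpg", "turl": "https://example.com/thumb.jpg"}'
print(json.loads(sample_m)["murl"])  # https://example.com/photo.jpg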
To download all images locally to your computer, you can use 2 methods:
# bs4
for index, url in enumerate(soup.select(".iusc"), start=1):
    img_url = json.loads(url["m"])["murl"]
    image = requests.get(img_url, headers=headers, timeout=30)
    query = query.lower().replace(" ", "_")
    if image.status_code == 200:
        with open(f"images/{query}_image_{index}.jpg", 'wb') as file:
            file.write(image.content)

# urllib
for index, url in enumerate(soup.select(".iusc"), start=1):
    img_url = json.loads(url["m"])["murl"]
    query = query.lower().replace(" ", "_")
    opener = req.build_opener()
    opener.addheaders = [("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36")]
    req.install_opener(opener)
    req.urlretrieve(img_url, f"images/{query}_image_{index}.jpg")
In the first case, you use the with open() context manager to save the image locally. In the second case, you use the urllib.request.urlretrieve method from the urllib.request module.
Also, make sure you pass a user-agent request header to look like a "real" user visit. The default requests user-agent is python-requests, and websites can tell that such a request most likely comes from a script. Check what your user-agent is.
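A quick sketch to see the default:
import requests
print(requests.utils.default_user_agent())  # e.g. 'python-requests/2.x.y'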
Note: an error might occur with the urllib.request.urlretrieve method when a request hits a captcha or something else that returns an unsuccessful status code. The biggest problem is that it's hard to test the response code, whereas requests exposes a status_code attribute for exactly that.
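One hedged workaround is to catch the HTTPError that urlretrieve raises on failure statuses:
import urllib.request as req
import urllib.error

try:
    req.urlretrieve(img_url, "image.jpg")  # img_url as parsed in the loop above
except urllib.error.HTTPError as e:
    print(f"Skipped {img_url}: HTTP {e.code}")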
Code and full example in online IDE:
from bs4 import BeautifulSoup
import requests, lxml, json

query = "sketch using iphone students"

# https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls
params = {
    "q": query,
    "first": 1
}

# https://docs.python-requests.org/en/master/user/quickstart/#custom-headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36"
}

response = requests.get("https://www.bing.com/images/search", params=params, headers=headers, timeout=30)
soup = BeautifulSoup(response.text, "lxml")

for index, url in enumerate(soup.select(".iusc"), start=1):
    img_url = json.loads(url["m"])["murl"]
    image = requests.get(img_url, headers=headers, timeout=30)
    query = query.lower().replace(" ", "_")
    if image.status_code == 200:
        with open(f"images/{query}_image_{index}.jpg", 'wb') as file:
            file.write(image.content)
Using urllib.request.urlretrieve.
from bs4 import BeautifulSoup
import requests, lxml, json
import urllib.request as req

query = "sketch using iphone students"

# https://docs.python-requests.org/en/master/user/quickstart/#passing-parameters-in-urls
params = {
    "q": query,
    "first": 1
}

# https://docs.python-requests.org/en/master/user/quickstart/#custom-headers
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36"
}

response = requests.get("https://www.bing.com/images/search", params=params, headers=headers, timeout=30)
soup = BeautifulSoup(response.text, "lxml")

for index, url in enumerate(soup.select(".iusc"), start=1):
    img_url = json.loads(url["m"])["murl"]
    query = query.lower().replace(" ", "_")
    opener = req.build_opener()
    opener.addheaders = [("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36")]
    req.install_opener(opener)
    req.urlretrieve(img_url, f"images/{query}_image_{index}.jpg")
Edit your code to find the designated image URL, then use urllib.request:
import urllib.request as req
imgurl ="https://i.ytimg.com/vi/Ks-_Mh1QhMc/hqdefault.jpg"
req.urlretrieve(imgurl, "image_name.jpg")
I would like to get the response data for a specific website.
I have this site:
https://enjoy.eni.com/it/milano/map/
If I open the browser debugger console, I can see a POST request that gives a JSON response.
How can I get this response in Python by scraping the website?
Thanks
Apparently the webservice validates a PHPSESSID cookie, so we need to obtain it first using a proper user agent:
import requests
import json

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'
}

r = requests.get('https://enjoy.eni.com/it/milano/map/', headers=headers)
session_id = r.cookies['PHPSESSID']
headers['Cookie'] = 'PHPSESSID={};'.format(session_id)
res = requests.post('https://enjoy.eni.com/ajax/retrieve_vehicles', headers=headers, allow_redirects=False)
json_obj = json.loads(res.content)
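As a variation, a requests.Session carries the cookie automatically, so the manual PHPSESSID handling can be dropped; a minimal sketch, assuming the endpoint still behaves the same:
import requests

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'
    s.get('https://enjoy.eni.com/it/milano/map/')  # response sets PHPSESSID on the session
    res = s.post('https://enjoy.eni.com/ajax/retrieve_vehicles', allow_redirects=False)
    json_obj = res.json()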
import requests
from bs4 import BeautifulSoup

headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "en-US,en;q=0.5",
    "Connection": "keep-alive",
    "Host": "mcfbd.com",
    "Referer": "https://mcfbd.com/mcf/FrmView_PropertyTaxStatus.aspx",
    "User-Agent": "Mozilla/5.0(Windows NT 10.0; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0"}

a = requests.session()
soup = BeautifulSoup(a.get("https://mcfbd.com/mcf/FrmView_PropertyTaxStatus.aspx").content)

payload = {"ctl00$ContentPlaceHolder1$txtSearchHouse": "",
           "ctl00$ContentPlaceHolder1$txtSearchSector": "",
           "ctl00$ContentPlaceHolder1$txtPropertyID": "",
           "ctl00$ContentPlaceHolder1$txtownername": "",
           "ctl00$ContentPlaceHolder1$ddlZone": "1",
           "ctl00$ContentPlaceHolder1$ddlSector": "2",
           "ctl00$ContentPlaceHolder1$ddlBlock": "2",
           "ctl00$ContentPlaceHolder1$btnFind": "Search",
           "__VIEWSTATE": soup.find('input', {'id': '__VIEWSTATE'})["value"],
           "__VIEWSTATEGENERATOR": "14039419",
           "__EVENTVALIDATION": soup.find("input", {"name": "__EVENTVALIDATION"})["value"],
           "__SCROLLPOSITIONX": "0",
           "__SCROLLPOSITIONY": "0"}

b = a.post("https://mcfbd.com/mcf/FrmView_PropertyTaxStatus.aspx", headers=headers, data=payload).text
print(b)
Above is my code for this website:
https://mcfbd.com/mcf/FrmView_PropertyTaxStatus.aspx
I checked Firebug, and these are the values of the form data.
However, doing this:
b = requests.post("https://mcfbd.com/mcf/FrmView_PropertyTaxStatus.aspx",headers = headers,data = payload).text
print(b)
throws this error:
[ArgumentException]: Invalid postback or callback argument
Is my understanding of submitting forms via requests correct?
1. Open Firebug.
2. Submit the form.
3. Go to the NET tab.
4. On the NET tab, choose the POST tab.
5. Copy the form data, as in the code above.
I've always wanted to know how to do this. I could use Selenium, but I thought I'd try something new and use requests.
The error you are receiving is expected, because fields like __VIEWSTATE (and others as well) are not static and cannot be hardcoded. The proper way to do this is as follows:
Create a requests Session object. It is also advisable to update it with headers containing a User-Agent string -
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36"}
s = requests.session()
Navigate to the specified URL -
r = s.get(url)
Use BeautifulSoup4 to parse the returned HTML -
from bs4 import BeautifulSoup
soup = BeautifulSoup(r.content, 'html5lib')
Populate formdata with the hardcoded values and dynamic values -
formdata = {
    '__VIEWSTATE': soup.find('input', attrs={'name': '__VIEWSTATE'})['value'],
    'field1': 'value1'
}
Then send the POST request using the session object itself -
s.post(url, data=formdata)
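Putting the steps together for the page from the question, a minimal sketch (the search fields are illustrative, and __EVENTVALIDATION is included because this page also sends it):
import requests
from bs4 import BeautifulSoup

url = "https://mcfbd.com/mcf/FrmView_PropertyTaxStatus.aspx"
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36"}

with requests.Session() as s:
    s.headers.update(headers)
    r = s.get(url)
    soup = BeautifulSoup(r.content, 'html5lib')
    formdata = {
        # Dynamic ASP.NET fields scraped from the page itself
        '__VIEWSTATE': soup.find('input', attrs={'name': '__VIEWSTATE'})['value'],
        '__EVENTVALIDATION': soup.find('input', attrs={'name': '__EVENTVALIDATION'})['value'],
        # Illustrative search fields, mirroring the question's payload
        'ctl00$ContentPlaceHolder1$ddlZone': '1',
        'ctl00$ContentPlaceHolder1$btnFind': 'Search',
    }
    result = s.post(url, data=formdata)
    print(result.status_code)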