How do I change the Referer header when using the requests library to make a GET request to a web page? I went through the entire manual but couldn't find it.
According to http://docs.python-requests.org/en/latest/user/advanced/#session-objects , you should be able to do:
import requests

s = requests.Session()
s.headers.update({'referer': my_referer})
s.get(url)
Or just:
requests.get(url, headers={'referer': my_referer})
Your headers dict will be merged with the default/session headers. From the docs:
Any dictionaries that you pass to a request method will be merged with
the session-level values that are set. The method-level parameters
override session parameters.
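A quick way to see that merging in action, using httpbin.org (which just echoes the request headers back); the URLs here are placeholders of mine:
import requests

s = requests.Session()
s.headers.update({'referer': 'https://example.com/'})   # session-level value

# The per-request dict is merged in; matching keys override the session value.
r = s.get('https://httpbin.org/headers',
          headers={'referer': 'https://override.example/'})
print(r.json()['headers'].get('Referer'))  # -> https://override.example/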
Here we rotate the User-Agent together with the Referer:
import random

user_agent_list = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/77.0.3865.90 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36",
    "Mozilla/5.0 (iPad; CPU OS 15_6 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/104.0.5112.99 Mobile/15E148 Safari/604.1"
]
referer_list = [
    'https://stackoverflow.com/',
    'https://twitter.com/',
    'https://www.google.co.in/',
    'https://gem.gov.in/'
]
headers = {
    'Connection': 'keep-alive',
    'Cache-Control': 'max-age=0',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': random.choice(user_agent_list),
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-US,en;q=0.9,fr;q=0.8',
    'referer': random.choice(referer_list)
}
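And a usage sketch (my addition) that re-randomizes both values on every request; url_list is a placeholder for whatever you are crawling:
import random
import requests

url_list = ['https://httpbin.org/headers']  # placeholder target(s)

for url in url_list:
    # Pick a fresh User-Agent/Referer pair for each request.
    headers['User-Agent'] = random.choice(user_agent_list)
    headers['referer'] = random.choice(referer_list)
    response = requests.get(url, headers=headers)
    print(response.status_code)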
This is a continuation of my post here.
I have figured out that I require a JSESSIONID in my headers to get the data with Python. But the problem is that the JSESSIONID changes every time. How do I deal with this issue? Below is my code:
import requests

sym_1 = 'NIFTY'
exp_date = '26MAY2022'
headers_1 = {
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36',
    'cookie': 'JSESSIONID=8EB69BB64441BB6906DD7A241B1AAC82;'
}
url_1 = "https://opstra.definedge.com/api/openinterest/optionchain/free/" + sym_1 + "&" + exp_date
text_data = requests.get(url_1, headers=headers_1)
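One common way to deal with a JSESSIONID that changes on every visit is to stop hard-coding it and let a requests.Session collect it for you: hit a page that sets the cookie first, then call the API with the same session. A hedged sketch, reusing headers_1 from above; the assumption that the Opstra landing page issues the cookie is mine and unverified:
import requests

sym_1 = 'NIFTY'
exp_date = '26MAY2022'

s = requests.Session()
s.headers.update(headers_1)      # reuse the headers above
s.headers.pop('cookie', None)    # drop the stale cookie; the session manages its own

# Assumption: loading the landing page sets a fresh JSESSIONID cookie.
s.get('https://opstra.definedge.com/')

url_1 = ('https://opstra.definedge.com/api/openinterest/optionchain/free/'
         + sym_1 + '&' + exp_date)
text_data = s.get(url_1)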
This code posts form data:
headers = {
    'authority': 'ec.ef.com.cn',
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.106 Safari/537.36',
    'accept': '*/*',
    'accept-language': 'en-US,en;q=0.9,ru-RU;q=0.8,ru;q=0.7,uk;q=0.6,en-GB;q=0.5',
}

s = requests.Session()
response = s.post(url, headers=headers)
This seems different from what Chrome sends.
I understand :authority is an HTTP/2 pseudo-header. How do I send it with Python requests?
You could use hyper.contrib.HTTP20Adapter and mount it on the session, like:
from hyper.contrib import HTTP20Adapter
import requests

def getHeaders():
    headers = {
        ":authority": "xxx",
        ":method": "POST",
        ":path": "/login/secure.ashx",
        ":scheme": "https",
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest"
    }
    return headers

sessions = requests.Session()
# Mount the HTTP/2 adapter for this host so the pseudo-headers are accepted.
sessions.mount('https://xxxx.com', HTTP20Adapter())
r = sessions.post(url_search, data=payload, headers=getHeaders())
This approach is adapted from a Chinese blog post.
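As a side note, hyper has not been maintained for some time. An alternative (not what the answer above uses) is the httpx library, which speaks HTTP/2 natively and derives the pseudo-headers from the URL, so you never set :authority by hand. A minimal sketch with a placeholder URL and payload:
# Requires: pip install "httpx[http2]"
import httpx

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
}

# http2=True negotiates HTTP/2 over ALPN; :authority/:method/:path/:scheme
# are generated from the URL, so only regular headers are supplied here.
with httpx.Client(http2=True) as client:
    r = client.post('https://example.com/login/secure.ashx',  # placeholder URL
                    data={'user': 'x'},                       # placeholder payload
                    headers=headers)
    print(r.http_version)  # 'HTTP/2' when negotiation succeeds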
This shouldn't be too hard, although I can't figure it out; I'm betting I'm making a dumb mistake.
Here's the code that works on an individual link and returns the Zestimate (the req_headers variable prevents a captcha from being thrown):
import requests
from bs4 import BeautifulSoup

req_headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.8',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'
}

link = 'https://www.zillow.com/homedetails/1404-Clearwing-Cir-Georgetown-TX-78626/121721750_zpid/'
test_soup = BeautifulSoup(requests.get(link, headers=req_headers).content, 'html.parser')
results = test_soup.select_one('h4:contains("Home value")').find_next('p').get_text(strip=True)
print(results)
Here's the code I'm trying to get to work; it should return the Zestimate for each link and add it to a new dataframe column, but I get AttributeError: 'NoneType' object has no attribute 'find_next'. (Also, imagine I have a dataframe column of different Zillow house links.)
req_headers = {
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.8',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36'
}

for link in df['links']:
    test_soup = BeautifulSoup(requests.get(link, headers=req_headers).content, 'html.parser')
    results = test_soup.select_one('h4:contains("Home value")').find_next('p').get_text(strip=True)
    df['zestimate'] = results
Any help is appreciated.
I had a space before and after the links in my dataframe column. That was it; the code works fine. Just an oversight on my part. Thanks, all.
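For anyone landing here later, a minimal sketch of the fixed loop under that diagnosis (df and req_headers as above; the None guard and the one-value-per-row accumulation are my additions):
import requests
from bs4 import BeautifulSoup

zestimates = []
for link in df['links']:
    link = link.strip()  # the stray spaces around the links were the culprit
    soup = BeautifulSoup(requests.get(link, headers=req_headers).content,
                         'html.parser')
    tag = soup.select_one('h4:contains("Home value")')
    # Guard against pages where the "Home value" section is missing.
    zestimates.append(tag.find_next('p').get_text(strip=True) if tag else None)
df['zestimate'] = zestimates  # assigned once, one value per row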
I got Fiddler to capture a GET request, and I want to re-send the exact same request with Python.
This is the request I captured:
GET https://example.com/api/content/v1/products/search?page=20&page_size=25&q=&type=image HTTP/1.1
Host: example.com
Connection: keep-alive
Search-Version: v3
Accept: application/json
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36
Referer: https://example.com/search/?q=&type=image&page=20
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
You can use the requests module.
The requests module automatically supplies most of the headers for you, so you most likely do not need to include all of them manually.
Since you are sending a GET request, you can use the params parameter to neatly form the query string.
Example:
import requests

BASE_URL = "https://example.com/api/content/v1/products/search"

headers = {
    "Connection": "keep-alive",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36"
}

params = {
    "page": 20,
    "page_size": 25,
    "q": "",  # the capture sends an empty q parameter
    "type": "image"
}

response = requests.get(BASE_URL, headers=headers, params=params)
import requests

headers = {
    'authority': 'stackoverflow.com',
    'cache-control': 'max-age=0',
    'upgrade-insecure-requests': '1',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.121 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'referer': 'https://stackoverflow.com/questions/tagged/python?sort=newest&page=2&pagesize=15',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'en-US,en;q=0.9,tr-TR;q=0.8,tr;q=0.7',
    'cookie': 'prov=6bb44cc9-dfe4-1b95-a65d-5250b3b4c9fb; _ga=GA1.2.1363624981.1550767314; __qca=P0-1074700243-1550767314392; notice-ctt=4%3B1550784035760; _gid=GA1.2.1415061800.1552935051; acct=t=4CnQ70qSwPMzOe6jigQlAR28TSW%2fMxzx&s=32zlYt1%2b3TBwWVaCHxH%2bl5aDhLjmq4Xr',
}

response = requests.get('https://stackoverflow.com/questions/55239787/how-to-send-a-get-request-with-headers-via-python', headers=headers)
This is an example of how to send a GET request to this page with headers.
You may open an SSL socket (https://docs.python.org/3/library/ssl.html) to example.com:443, write your captured request into the socket as raw bytes, and then read the HTTP response from the socket.
You may also try to use the http.client.HTTPResponse class to read and parse the HTTP response from your socket, but this class is not supposed to be instantiated directly, so some unexpected obstacles could emerge.
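A minimal sketch of that raw-socket approach, using a trimmed version of the capture above (the Connection: close header is my addition so the read loop terminates):
import socket
import ssl

HOST = 'example.com'

# The captured request, byte for byte; the blank line terminates the headers.
request = (
    'GET /api/content/v1/products/search?page=20&page_size=25&q=&type=image HTTP/1.1\r\n'
    'Host: example.com\r\n'
    'Accept: application/json\r\n'
    'Connection: close\r\n'
    '\r\n'
)

context = ssl.create_default_context()
with socket.create_connection((HOST, 443)) as sock:
    with context.wrap_socket(sock, server_hostname=HOST) as tls:
        tls.sendall(request.encode('ascii'))
        response = b''
        while chunk := tls.recv(4096):
            response += chunk

print(response.decode('utf-8', errors='replace')[:500])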
Even though I'm passing headers like the ones below, I'm getting a 416 error: HTTP status code is not handled or not allowed.
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, br, sdch',
    'Accept-Language': 'en-US,en;q=0.8',
    'AlexaToolbar-ALX_NS_PH': 'AlexaToolbar/alx-4.0.1',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Host': 'www.links.com',
    'Referer': 'https://www.links.com/',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'
}
Try limiting the headers to the set below and see if it helps:
headers = {
    'Accept-Language': 'en-US,en;q=0.8',
    'Cache-Control': 'max-age=0',
    'Host': 'www.mdlinx.com',
    'Referer': 'https://www.mdlinx.com/',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36'
}
Basically, your site is throwing a 416 and of course Scrapy won't handle that, so you need to work out which headers are causing the issue. The best approach is to use Chrome DevTools: copy the request as cURL and see if that works. This may also be related to there being no cookies.
You need to figure out what works and what doesn't, and then work from there, as in the sketch below.
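For example, a hedged sketch that drops one header at a time and reports the resulting status code (headers is whichever dict you are testing; the URL is a placeholder):
import requests

url = 'https://www.mdlinx.com/'  # replace with the page that returns 416

for name in list(headers):
    trimmed = {k: v for k, v in headers.items() if k != name}
    try:
        status = requests.get(url, headers=trimmed, timeout=10).status_code
    except requests.RequestException as exc:
        status = exc
    print(f'without {name!r}: {status}')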