I'm trying to scrape price information from the Carrefour website with the following code:
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/68.0.3440.106 Safari/537.36',
}
link = 'https://www.carrefour.es/extensor-tp-link-rango-75/366506176/p'
response = requests.get(link, headers=headers, timeout=60)
# The following lines parse the response with BeautifulSoup
This code ends with a "Connection aborted" error. The error message is slightly different on Linux and Windows.
When run on AWS (Ubuntu), it always fails with the message below:
requests.exceptions.ConnectionError: ('Connection aborted.', BadStatusLine("''",))
When run on my PC (Win10), it sometimes works correctly and sometimes fails with the message below:
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
I don't think this is caused by an IP restriction or by the User-Agent: although the script fails to connect, I can still visit the link in a browser on my PC (Win10), with the same IP and User-Agent.
Am I blocked? What are the possible reasons for being blocked?
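As a hedged aside on what the two messages mean: both `BadStatusLine("''")` and `RemoteDisconnected` are raised when the server closes the connection before sending a status line. The difference between the two machines therefore probably reflects the Python version in use rather than the operating system (Python 2's `httplib` reports `BadStatusLine`, while `http.client` in Python 3.5+ reports `RemoteDisconnected`). Below is a minimal sketch that reproduces the condition locally with the standard library only. The throwaway server and the helper name `provoke_remote_disconnect` are made up for illustration and say nothing about how Carrefour's servers actually behave.

```python
import http.client
import socket
import threading


def _accept_and_hang_up(listener):
    """Accept one client, read its request, then close without replying."""
    conn, _ = listener.accept()
    conn.recv(65536)   # consume the request bytes
    conn.close()       # hang up before sending any status line


def provoke_remote_disconnect():
    """Return the name of the exception raised when a server hangs up early."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(
        target=_accept_and_hang_up, args=(listener,), daemon=True
    )
    worker.start()

    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/")
        client.getresponse()
    except (http.client.BadStatusLine, ConnectionResetError) as exc:
        # RemoteDisconnected subclasses both of these in Python 3.5+
        return type(exc).__name__
    finally:
        client.close()
        worker.join(timeout=5)
        listener.close()
    return None
```

On Python 3.5+ calling `provoke_remote_disconnect()` is expected to return `"RemoteDisconnected"`. `requests` wraps that same low-level exception in `ConnectionError: ('Connection aborted.', ...)`. Either way, the message only says that the remote end hung up; it does not say why.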
Related
My code works when I make the request from my local machine.
When I try to make the request from AWS EC2 I get the following error:
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='www1.xyz.com', port=443): Read timed out. (read timeout=20)
I checked the URL, and that was not the issue. I then tried to visit the page through the hidemyass web proxy with the location set to that of the AWS EC2 machine, and it got a 404.
The code:
# Dummy URLs
header = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}
url = 'https://www1.xyz.com/iKeys.jsp?symbol={}&date=31DEC2020'.format(
    symbol)
raw_page = requests.get(url, timeout=10, headers=header).text
I have also tried routing the request through a proxy at another IP address, which I found online:
proxies = {
    "http": "http://125.99.100.193",
    "https": "https://125.99.100.193",
}
raw_page = requests.get(url, timeout=10, headers=header, proxies=proxies).text
I still got the same error.
1- Do I need to specify the port in proxies? Could the missing port be causing the error when the proxy is set?
2- What could be a solution for this?
Thanks
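On the port question: a proxy URL without a port falls back to the scheme's default port, which is rarely where a public proxy listens, so spelling out the port is usually necessary. Here is a small sketch of building the `proxies` mapping. The port 8080 and the helper name `build_proxies` are assumptions for illustration (use whatever port your proxy actually listens on). Pointing both keys at an `http://` proxy URL assumes a plain HTTP forward proxy, which is the common case but worth checking against your proxy.

```python
def build_proxies(host, port, scheme="http"):
    """Build a requests-style proxies mapping with an explicit port.

    The dictionary keys select which outgoing URLs use the proxy; the
    scheme inside each value says how to talk to the proxy itself.
    """
    proxy_url = "{}://{}:{}".format(scheme, host, port)
    return {"http": proxy_url, "https": proxy_url}


# Hypothetical port: 8080 is only a placeholder.
proxies = build_proxies("125.99.100.193", 8080)
```

The result can be passed as `requests.get(url, timeout=10, headers=header, proxies=proxies)`. A proxy address found online may simply be offline, so a timeout through it does not necessarily say anything about the target site.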
Hello, I have a script that I run on my organization's internal network. It was supposed to run on the 1st but didn't, so I backed the data up into my local database so that I can run the script and get the correct data. I changed the URL to point to my local site, but it is not working; I get this error:
HTTPSConnectionPool(host='localhost', port=44345): Max retries exceeded with url: /logon.aspx (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7fc26acb86a0>: Failed to establish a new connection: [Errno 111] Connection refused',)
Here is how I set up my script to access the URL:
from requests import Session
URL = "https://localhost:44345/logon.aspx"
headers = {"user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.109 Safari/537.36"}
username="script"
password="password"
s = Session()
s.verify = False
s.headers.update(headers)
r = s.get(URL)
Why is my connection being refused? I can browse to the site in my web browser, so why does the script get a connection refused error?
Since you are running on localhost, try the http protocol instead of https.
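Before changing the protocol, it may help to confirm that something is actually listening at that address from the environment the script runs in. `[Errno 111] Connection refused` is the Linux error code, so if the script runs inside WSL, a container, or a VM while the site is served by the Windows host, `localhost` may not point at the web server at all (this is an assumption about the setup, not something the question confirms). Here is a minimal standard-library sketch; the helper name `can_connect` is made up for illustration.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False
```

If `can_connect("localhost", 44345)` returns `False` from the script's environment, the problem is at the network level rather than in the `requests` call or its headers.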
I am trying to fix the following error, but I can't find a solution. Can anyone help me with this?
When I run this code it sometimes works, but sometimes it fails with the error below. Here is the code, followed by the error.
import requests
from bs4 import BeautifulSoup
import mysql.connector
mydb = mysql.connector.connect(host="localhost", user="root",passwd="", database="python_db")
mycursor = mydb.cursor()
#url="https://csr.gov.in/companyprofile.php?year=FY%202014-15&CIN=U01224KA1980PLC003802"
#query1 = "INSERT INTO csr_details(average_net_profit,csr_prescribed_expenditure,csr_spent,local_area_spent) VALUES()"
mycursor.execute("SELECT cin_no FROM tn_cin WHERE csr_status=0")
urls=mycursor.fetchall()
#print(urls)
def convertTuple(tup):
    str = ''.join(tup)
    return str
for url in urls:
    str = convertTuple(url[0])
    headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36', "Accept-Language": "en-US,en;q=0.9", "Accept-Encoding": "gzip, deflate"}
    csr_link = 'https://csr.gov.in/companyprofile.php?year=FY%202014-15&CIN='
    link = csr_link+str
    #print(link)
    response=requests.get(link, headers=headers)
    #print(response.status_code)
    bs=BeautifulSoup(response.text,"html.parser")
    div_table=bs.find('div', id = 'colfy4')
    if div_table is not None:
        fy_table = div_table.find_all('table', id = 'employee_data')
        if fy_table is not None:
            for tr in fy_table:
                td=tr.find_all('td')
                if len(td)>0:
                    rows=[i.text for i in td]
                    row1=rows[0]
                    row2=rows[1]
                    row3=rows[2]
                    row4=rows[3]
                    #cin_no=url[1]
                    #cin=convertTuple(url[1])
                    #result=cin_no+rows
                    mycursor.execute("INSERT INTO csr_details(cin_no,average_net_profit,csr_prescribed_expenditure,csr_spent,local_area_spent) VALUES(%s,%s,%s,%s,%s)",(str,row1,row2,row3,row4))
                    #print(cin)
                    #print(str)
                    #var=1
                    status_update="UPDATE tn_cin SET csr_status=%s WHERE cin_no=%s"
                    data = ('1',str)
                    mycursor.execute(status_update,data)
                    #result=mycursor.fetchall()
                    #print(result)
                    mydb.commit()
I get the following error after running the above code:
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
The error
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
is typically caused on the server side. It means the server closed the connection before delivering any response, so no HTTP status code (not even a 5xx) is ever received.
I believe it's likely caused by this line
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36', "Accept-Language": "en-US,en;q=0.9", "Accept-Encoding": "gzip, deflate"}
because in some cases the server has issues with the header values. You could try setting a minimal header instead:
response=requests.get(link, headers={"User-Agent":"Mozilla/5.0"})
and see if that solves your problem.
See this answer for user-agents for a variety of browsers.
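Because the failure is intermittent, another option is to retry the request a few times with a growing delay instead of letting one dropped connection stop the whole loop. Here is a minimal sketch using only the standard library; the helper name `call_with_retries` and its defaults are assumptions for illustration. It relies on `requests` exceptions deriving from `OSError` (via `RequestException`), so the default `exceptions` tuple should also cover `requests.exceptions.ConnectionError`.

```python
import time


def call_with_retries(func, attempts=3, base_delay=1.0, exceptions=(OSError,)):
    """Call func(); on a listed exception, wait and try again.

    The delay doubles after each failed attempt (base_delay, 2x, 4x, ...).
    The last failure is re-raised so the caller still sees the error.
    """
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))
```

In the scraping loop from the question this could be used as `response = call_with_retries(lambda: requests.get(link, headers=headers, timeout=30))`, where `timeout=30` is likewise an assumed value. If the server is deliberately dropping the connection, retries alone will not help, but they do distinguish a flaky server from a consistent block.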
I am sending a POST request through a proxy but keep running into a proxy error.
I have already tried multiple solutions on Stack Overflow for [WinError 10061] No connection could be made because the target machine actively refused it.
I tried changing system settings and verified that the remote server exists and is running; also, no HTTP_PROXY environment variable is set on the system.
import requests
proxy = {IP_ADDRESS:PORT} #proxy
proxy = {'https': 'https://' + proxy}
#standard header
header={
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
    "Referer": "https://tres-bien.com/adidas-yeezy-boost-350-v2-black-fu9006-fw19",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8"
}
#payload to be posted
payload = {
    "form_key":"1UGlG3F69LytBaMF",
    "sku":"adi-fw19-003",
    # above two values are dynamically populating the field; hardcoded the value here to help you replicate.
    "fullname": "myname",
    "email": "myemail#gmail.com",
    "address": "myaddress",
    "zipcode": "areacode",
    "city": "mycity" ,
    "country": "mycountry",
    "phone": "myphonenumber",
    "Size_raffle":"US_11"
}
r = requests.post(url, proxies=proxy, headers=header, verify=False, json=payload)
print(r.status_code)
Expected output: 200, alongside an email verification sent to my email address.
Actual output: requests.exceptions.ProxyError: HTTPSConnectionPool(host='tres-bien.com', port=443): Max retries exceeded with url: /adidas-yeezy-boost-350-v2-black-fu9006-fw19 (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError(': Failed to establish a new connection: [WinError 10061] No connection could be made because the target machine actively refused it',)))
Quite a few things are wrong here... (after looking at the raffle page you're trying to post to, I suspect it is https://tres-bien.com/adidas-yeezy-boost-350-v2-black-fu9006-fw19 based on the exception you posted).
1) I'm not sure what's going on with your first definition of proxy as a dict instead of a string. That said, it's probably good practice to define both http and https proxies. If your proxy can support https then it should be able to support http.
proxy = {
    'http': 'http://{}:{}'.format(IP_ADDRESS, PORT),
    'https': 'https://{}:{}'.format(IP_ADDRESS, PORT)
}
2) The second issue is that the raffle you're trying to submit to takes URL-encoded form data, not JSON. Thus your request should be structured like:
r = requests.post(
    url=url,
    headers=header,
    data=payload
)
3) That page has a ReCaptcha present, which is missing from your form payload. This isn't why your request is getting a connection error, but you're not going to successfully submit a form that has a ReCaptcha field without a proper token.
4) Finally, I suspect the root of your ProxyError is that you are trying to POST to the wrong URL. Looking at Chrome Inspector, you should be submitting this data to https://tres-bien.com/tbscatalog/manage/rafflepost/ whereas your exception output indicates you are POSTing to https://tres-bien.com/adidas-yeezy-boost-350-v2-black-fu9006-fw19
Good luck with the shoes.
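To make point 2 concrete, here is a small standard-library sketch of how the same dictionary looks on the wire in the two encodings. It only approximates what `requests` sends for `data=` (form-encoded) and `json=` (a JSON document); the two-field `payload` is a shortened stand-in for the one in the question.

```python
import json
from urllib.parse import urlencode

payload = {"sku": "adi-fw19-003", "Size_raffle": "US_11"}

# Roughly the body sent by requests.post(url, data=payload)
# with Content-Type: application/x-www-form-urlencoded
form_body = urlencode(payload)

# Roughly the body sent by requests.post(url, json=payload)
# with Content-Type: application/json
json_body = json.dumps(payload)

print(form_body)  # sku=adi-fw19-003&Size_raffle=US_11
print(json_body)  # {"sku": "adi-fw19-003", "Size_raffle": "US_11"}
```

A server-side form handler that expects the first shape will generally not find its fields in the second.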
I was trying to download pictures from URLs like 'http://xxx.jpg'.
The code:
headers={'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3359.139 Safari/537.36'}
url='http://xxx.jpg'
response = requests.get(url,headers=headers)
downloadFunction()
The error reads:
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))
The error occurred on the first request, so request frequency was not the cause. I could still open the websites in a browser, so I just need the code to act more like a browser. How can I achieve that besides setting the User-Agent?
I know it isn't your case, and this is really old, but I stumbled on this while searching Google, so I'll leave what solved my problem here:
test_link = "https://www.bbb.org/washington-dc-eastern-pa/business-reviews/online-education/k12-inc-in-herndon-va-190911943/#sealclick"
page = requests.get(test_link)
I got the error:
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(10054, 'An existing connection was forcibly closed by the remote host', None, 10054, None))
So it isn't caused by multiple connections. I think the problem was the headers: once I added headers, the error disappeared. This is the code afterwards:
test_link = "https://www.bbb.org/washington-dc-eastern-pa/business-reviews/online-education/k12-inc-in-herndon-va-190911943/#sealclick"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0",
    "Accept-Encoding": "*",
    "Connection": "keep-alive"
}
page = requests.get(test_link, headers=headers)
I had this error when the server was hosted on my machine over https and the SSL certificate was not correctly installed.
Following instructions to properly install the server's certificates solved the problem:
https://coderead.wordpress.com/2014/08/07/enabling-ssl-for-self-hosted-nancy/
https://www.cloudinsidr.com/content/how-to-install-the-most-recent-version-of-openssl-on-windows-10-in-64-bit/
For me, I had to add headers with Content-Type and accept (since the API required those two fields), and everything worked fine :).
headers = {
    'Content-Type': 'application/json',
    'accept': 'application/json',
}
result = requests.post(environ.get('customer_api_url'),
                       headers=headers,
                       json=lead)