I configured Tor Browser and Privoxy following https://jarroba.com/anonymous-scraping-by-tor-network/. When I checked my IP with http://icanhazip.com/, it had changed, so the setup works. But when I tried to scrape the desired website, I got:
You are attempting to access "website" using an anonymous private/proxy network. Please disable that and try accessing the site again.
Tor hides your IP address, but it does not hide the fact that you are using Tor, since Tor exit relays are public knowledge. For example, xmyip.com will tell you whether or not your IP is a Tor IP.
Given the error you received, it looks like that website blocks Tor users, which is a fairly common practice. See Tor users being actively blocked on some websites for more details.
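If you want to check this from code, the Tor Project publishes a plain-text list of exit addresses. A sketch, assuming the `torbulkexitlist` endpoint (which may change over time):

```python
# Sketch: check whether an IP appears in the public Tor exit list.
# EXIT_LIST_URL is the Tor Project's bulk exit list endpoint; treat the
# exact URL as an assumption that may change.
from urllib.request import urlopen

EXIT_LIST_URL = "https://check.torproject.org/torbulkexitlist"

def parse_exit_list(text):
    """Parse the plain-text exit list (one IP per line) into a set."""
    return {line.strip() for line in text.splitlines()
            if line.strip() and not line.startswith("#")}

def is_tor_exit(ip, exit_ips=None):
    """Return True if `ip` is a known Tor exit relay."""
    if exit_ips is None:
        with urlopen(EXIT_LIST_URL, timeout=10) as resp:
            exit_ips = parse_exit_list(resp.read().decode())
    return ip in exit_ips
```

Any site can do the same lookup on its side, which is exactly how it knows to show you that error page.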
Related
I'm using a VPN, and it only changes the IP address of Chrome. Sometimes my VPN connection goes down, and I want to know when that happens by comparing my local IP address with Chrome's IP address: if they are equal, I'll know the VPN connection is down.
Is there any way to see the IP address of the Chrome instance that Selenium launches (with or without a Selenium function, it doesn't matter)? I need that IP for more than just this case, which is why I can't simply rely on a try/catch.
Is there any way to find my Chrome's IP address as seen by Selenium, in VB 6.0 and in Python?
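For the Python side, one sketch: load a "what is my IP" service (icanhazip.com here, but any equivalent works) inside the Selenium-driven browser and read the page body. `get_browser_ip`, `get_direct_ip`, and `vpn_is_down` are illustrative names, not a library API:

```python
# Sketch: compare the IP the browser sees with the IP a direct
# (non-browser) request sees. If the VPN only tunnels Chrome, the two
# differing means the VPN is working.

def get_browser_ip(driver):
    """Return the public IP as seen from the Selenium-driven browser."""
    driver.get("https://icanhazip.com/")
    return driver.find_element("tag name", "body").text.strip()

def get_direct_ip():
    """Return the public IP as seen without the browser (direct path)."""
    from urllib.request import urlopen  # stdlib; no Selenium needed here
    with urlopen("https://icanhazip.com/", timeout=10) as resp:
        return resp.read().decode().strip()

def vpn_is_down(direct_ip, browser_ip):
    """If both paths report the same IP, the VPN is no longer in effect."""
    return direct_ip == browser_ip
```

Usage would be `vpn_is_down(get_direct_ip(), get_browser_ip(driver))` on a running WebDriver instance.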
So I created a simple Flask app to automate certain calculations we often have to do in math class. I'm now trying to let my friends use it too but I can't get the local port forwarding right. When I run the app I can access it from my local network but not from outside of it. (I tested that by trying to reach the web app through my phone on mobile data, and it doesn't respond.) I'm aware that ssh tunnelling is probably a better way to do this, but I still want to figure out what I'm doing wrong here.
I am very new to this and used this video as a reference: https://www.youtube.com/watch?v=jfSLxs40sIw. Here's a brief summary of the things I already tried:
I changed app.run() to app.run(host='0.0.0.0', port=5000) to make Flask listen on all interfaces.
When I now run my app I can access it from my computer via:
http://0.0.0.0:5000/
http://127.0.0.1:5000/
http://192.168.1.101:5000/
I then used freedns.afraid.org to create a subdomain, flaskdries.mooo.com. When redirecting the subdomain to the last IP address in the list (192.168.1.101:5000), it would always refuse to connect, even on the PC running the app. Using 127.0.0.1:5000 eventually did the trick for all the devices on my network (image), but still not for devices outside of it.
I guess that's to be expected, since my WAN IP is nowhere to be specified in this method. So, if I'm correct, when someone goes to the subdomain there is no link to my router, and therefore none to the device running the app. The problem is that I have no clue where I should specify my WAN IP or something similar.
I noticed that when I created the subdomain the destination was automatically set to my WAN ip
(image). At first I thought simply adding :5000 would work, but unfortunately it doesn't.
As you might have noticed, I am extremely new to this and don't really have any information I can rely on apart from the internet, so any help is welcome!
Thanks in advance,
Dries
After more research I figured out that the problem was that I have a separate modem and router. For most people one port forward inside the router is enough, but I also had to forward a port from my modem to my router. Kind of annoying that I didn't think of that earlier. Thanks to everybody for responding, though.
Hi, and welcome to Stack Overflow.
In order to access your app from the internet, you will need an external static IP, which you should be able to obtain from your internet provider. You then set your domain to point to that IP. If you don't want to specify the port each time, you can run your Flask app on port 80, or on 443 if you want HTTPS.
It is also advisable to run it behind a web server of some sort, such as nginx, since app.run() is only intended for local development.
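For example, a minimal nginx reverse-proxy sketch (the server name is the subdomain from the question and the upstream port is the Flask default; adjust both to your setup):

```nginx
# nginx listens on port 80 and forwards to the Flask app on 127.0.0.1:5000.
server {
    listen 80;
    server_name flaskdries.mooo.com;

    location / {
        proxy_pass http://127.0.0.1:5000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

With this in place, visitors use plain port 80 and only nginx talks to the dev server.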
You are using the IP addresses of your local network.
If you have port forwarding enabled to your machine, then you have to use your public IP address.
Your router should show the public IP address in its admin interface.
Simplified explanation:
Your domain should point to the external IP of your router.
Your router then forwards the request to your machine (its local network IP address) via port forwarding.
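Once the chain is set up, you can verify that a port is actually reachable with a small stdlib sketch. `port_open` is an illustrative helper; the real test is to run it against your public IP from a machine outside your network:

```python
# Sketch: probe whether a TCP port accepts connections.
import socket

def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Local demonstration: open a listening socket, then probe it.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))        # ephemeral port chosen by the OS
listener.listen(1)
port = listener.getsockname()[1]
print(port_open("127.0.0.1", port))    # True: something is listening
listener.close()
```

If `port_open("your.wan.ip", 5000)` is False from outside, some hop in the modem/router chain is not forwarding.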
I am doing web scraping with Python on some pages, and I have been blocked from some of them. When I tried to access them through the Tor Browser, I found I couldn't reach the pages either, so I think these pages have been able to block all of my IPs, or that I have Tor misconfigured (which I doubt, because I have checked my IP address in Chrome and in Tor and they are different). Does anyone know why?
Also, I am trying to write a function in my Python code to change my IP automatically. From what I have seen, the best approach is to do it through Tor (using it to fetch the pages), but I am not able to make it work. Do you have any recommendations for writing this function?
Thank you!
I would expect anti-scraping protection to also block visits from known Tor exit nodes. I don't think they know it is you; some websites simply hire or implement state-of-the-art scraping-protection services.
You could set up your own proxies at friends' and family's homes and use a very conservative crawl rate, or look for commercial residential proxy offerings.
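For the second part of the question (changing the IP from Python), the usual approach is to drive the Tor daemon itself rather than the Tor Browser: ask it for fresh circuits over its control port. A stdlib sketch, assuming Tor is configured with `ControlPort 9051` and password authentication (the `stem` library wraps this protocol more robustly):

```python
# Sketch: send SIGNAL NEWNYM to a locally running Tor daemon so it
# builds fresh circuits (and usually exits through a different relay).
import socket

def is_ok(reply):
    """Successful Tor control-port replies start with status code 250."""
    return reply.startswith("250")

def renew_tor_circuit(password, host="127.0.0.1", port=9051):
    """Authenticate to the control port and request new circuits."""
    with socket.create_connection((host, port), timeout=10) as s:
        def cmd(line):
            s.sendall(line.encode("ascii") + b"\r\n")
            return s.recv(1024).decode("ascii", "replace")
        if not is_ok(cmd(f'AUTHENTICATE "{password}"')):
            raise RuntimeError("Tor control-port authentication failed")
        if not is_ok(cmd("SIGNAL NEWNYM")):
            raise RuntimeError("NEWNYM signal rejected")
```

Note that Tor rate-limits NEWNYM (roughly one signal every ten seconds), and a new circuit does not guarantee a different exit IP every time.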
I'm trying to test response time of webpages hosted on many backends. These hosts are behind load balancer and above that I have my domain.com.
I want to use Python + Selenium against these backends, but with a spoofed hostname, without messing with /etc/hosts or running fake DNS servers. Is that possible with pure Selenium drivers?
To illustrate problem better, here is what's possible in curl and I'd like to do the same with python+selenium:
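What curl makes possible here is its `--resolve` option: connect to a chosen backend IP while sending the load balancer's hostname in the Host header. The same idea can be sketched in plain Python without Selenium; the throwaway local server below just echoes the Host header back so the example is self-checking, and `domain.com` is a stand-in for the real site:

```python
# Sketch: the Python-stdlib equivalent of `curl --resolve` — connect to
# a specific IP but send an arbitrary Host header.
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class EchoHost(BaseHTTPRequestHandler):
    """Stand-in backend that replies with the Host header it received."""
    def do_GET(self):
        body = self.headers["Host"].encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)
    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), EchoHost)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Connect to the backend's address, but claim to be asking for domain.com.
conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1])
conn.request("GET", "/", headers={"Host": "domain.com"})
seen_host = conn.getresponse().read().decode()
print(seen_host)  # the backend saw "domain.com", not 127.0.0.1
server.shutdown()
```

This works for hand-rolled requests, but, as the answers below explain, a browser driven by Selenium gives you no such hook.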
If you're on a UNIX system, you can try something as explained here:
https://unix.stackexchange.com/questions/10438/can-i-create-a-user-specific-hosts-file-to-complement-etc-hosts
Basically you still use a hosts file, but it applies only to your user: you put it at ~/.hosts and point the HOSTALIASES environment variable at it.
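Concretely, something like the following (hostnames are placeholders; note that HOSTALIASES maps a name to another *name* that is then resolved via DNS, not to a raw IP address, and it is a glibc feature):

```sh
# ~/.hosts — per-user alias file:
#   domain.com  backend1.example.net

export HOSTALIASES=~/.hosts   # processes started from this shell inherit it
```

Whether a given browser honors HOSTALIASES depends on how it performs name resolution, so test it before relying on it for Selenium runs.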
In short, no.
Selenium drives browsers using the WebDriver standard, which is by definition limited to interactions with page content. Even though you can provide Selenium with configuration options for the browser, no browser provides control over Host headers or DNS resolution outside of a proxy.
But even if you could initiate a request for a particular IP address with a custom Host header, subsequent requests triggered by the content (redirection; downloading of page assets; AJAX calls; etc) would still be outside of your control and are prohibited from customizing the Host header, leading the browser to fall back to standard DNS resolution.
Your only options are modifying the local DNS resolution (via /etc/hosts) or providing an alternative (via a proxy).
I'm writing this application where the user can perform a web search to obtain some information from a particular website.
Everything works well except when I'm connected to the Internet via Proxy (it's a corporate proxy).
The thing is, it works sometimes.
By "sometimes" I mean that if it stops working, all I have to do is use any web browser (Chrome, IE, etc.) to surf the internet, and then Python's requests start working as before.
The error I get is:
OSError('Tunnel connection failed: 407 Proxy Authentication Required',)
My guess is that some sort of credentials are validated and the proxy tunnel is up again.
I tried with the proxies handlers but it remains the same.
My doubts are:
How do I know if the proxy needs authentication, and if so, how do I authenticate without hardcoding the username and password, since this application will be used by others?
Is there a way to use the Windows default proxy configuration so it will work like the browsers do?
What do you think that happens when I surf the internet and then the python requests start working again?
I tried with requests and urllib.request
Any help is appreciated.
Thank you!
Check whether there is any proxy setting configured in Chrome.
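Expanding on that: Python can read the same system-wide proxy settings the browsers use, which avoids hardcoding anything and may be related to why requests recovers after a browser visit (the browser re-authenticates the proxy session). A sketch using only the stdlib:

```python
# Sketch: reuse the system proxy configuration instead of hardcoding it.
# urllib.request.getproxies() reads the registry on Windows and the
# *_proxy environment variables elsewhere.
import urllib.request

proxies = urllib.request.getproxies()      # e.g. {'http': 'http://proxy:8080'}
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler(proxies))
# opener.open(url) will now go through the system-configured proxy.
```

The requests library honors the same `*_proxy` environment variables automatically, or the dict can be passed explicitly, e.g. `requests.get(url, proxies=proxies)`. Neither handles NTLM/Kerberos proxy authentication out of the box, which a corporate proxy returning 407 may require.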