Django is sending me an email about "Invalid HTTP_HOST header"

I have a website running Django on DigitalOcean. Django is sending me emails with the title Invalid HTTP_HOST header: 'url.ml'. You may need to add 'url' to ALLOWED_HOSTS. I don't want to add this host; I have already added my own domains and IP address.
I have a list of questions:
Why is Django sending me this email?
Are bots attacking my site?
If so, how do they do that?
How can I prevent this?

As specified in the docs,
When DEBUG is False, Django will email the users listed in the ADMINS
setting whenever your code raises an unhandled exception and results
in an internal server error (HTTP status code 500). This gives the
administrators immediate notification of any errors. The ADMINS will
get a description of the error, a complete Python traceback, and
details about the HTTP request that caused the error.
So I guess the email comes from this. If you are interested in getting more detailed traces for 500 errors, you can have a look at Sentry, which is quite useful.
Now, regarding your error: I can't find any IP address for the domain url.ml, but it's quite easy to add this header manually.
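To illustrate how easily a client can set this header, here is a minimal sketch using only the standard library. The IP address 203.0.113.10 and the helper name build_forged_request are placeholders invented for this example; the request is only built, never sent, and such a request should only ever be tried against a server you own.

```python
import urllib.request


def build_forged_request(target_url, fake_host):
    """Build (but do not send) a request whose Host header differs from the URL.

    urllib keeps a caller-supplied Host header instead of deriving one
    from the URL, so the server would see fake_host in HTTP_HOST.
    """
    return urllib.request.Request(target_url, headers={"Host": fake_host})


# Hypothetical target: a documentation-only IP address standing in for your server.
request = build_forged_request("http://203.0.113.10/", "url.ml")
print(request.host)                # host the TCP connection would go to
print(request.get_header("Host"))  # host name Django would be asked to validate
```

A request like this reaches your server by IP address but claims to be for another domain, which is the situation Django reports when the claimed host is not in ALLOWED_HOSTS.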
If you want to dig deeper, check your NGINX logs for the IP addresses sending these requests. It might be some DigitalOcean monitoring utility or some kind of attack; you will need more details to tell.
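If the requests turn out to be harmless noise, one possible way to stop the emails is to silence the logger Django uses for rejected hosts, django.security.DisallowedHost. The following is a sketch of a settings.py fragment, assuming you do not already define LOGGING; if you do, merge the two entries into your existing dictionary instead.

```python
# settings.py (fragment): a sketch, assuming no other LOGGING configuration exists.
LOGGING = {
    "version": 1,
    # Keep Django's default loggers (including the admin email handler) active.
    "disable_existing_loggers": False,
    "handlers": {
        # NullHandler discards every record it receives.
        "null": {"class": "logging.NullHandler"},
    },
    "loggers": {
        # Rejected Host headers are logged here; route them to the null
        # handler and stop them from propagating to the "django" logger,
        # which is where the admin email handler is attached by default.
        "django.security.DisallowedHost": {
            "handlers": ["null"],
            "propagate": False,
        },
    },
}
```

This only hides the reports; Django still rejects the requests. Rejecting unknown hosts at the web server, before they reach Django, is a complementary option.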

Related

I'm trying to send email with Django

I'm trying to send email with Django but I keep getting SMTPConnectError.
(The question attached screenshots of the error message, views.py, and settings.py.)

You should provide more information about your problem, but I think you should check the SMTP settings of your email provider (Gmail or similar).
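For comparison, here is a sketch of the Django settings involved, assuming Gmail as the provider (the original settings.py is not available, so this is not a diagnosis). The environment variable names are a choice made for this example.

```python
# settings.py (fragment): a sketch assuming Gmail's SMTP service.
import os

EMAIL_BACKEND = "django.core.mail.backends.smtp.EmailBackend"
EMAIL_HOST = "smtp.gmail.com"
EMAIL_PORT = 587       # STARTTLS port; pairs with EMAIL_USE_TLS
EMAIL_USE_TLS = True   # do not also set EMAIL_USE_SSL; the two are mutually exclusive
# Credentials are read from the environment (variable names are assumptions)
# so they stay out of version control.
EMAIL_HOST_USER = os.environ.get("EMAIL_HOST_USER", "")
EMAIL_HOST_PASSWORD = os.environ.get("EMAIL_HOST_PASSWORD", "")
```

SMTPConnectError is raised while the connection to the server is being established, so the host, port, and TLS values are usually the first things to compare. A hosting provider that blocks outbound SMTP ports can cause the same error.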

Access Denied 403 when webscraping; What to do?

I was testing a scraping algorithm that I had built. I made a request to https://www2.hm.com/fi_fi/miesten.html but misspecified the user-agent information. It seems that this triggered an immediate ban (not sure). Scraping their site should be fine, since their robots.txt says:
User-agent: *
Disallow:
Example of making a request to HM and the subsequent server response
I erased the user agent and proxy information due to privacy concerns. However, they are nothing out of the ordinary.
I receive the following as response:
"b'\nAccess Denied\n\n\n \nYou don't have permission to access "http://www2.hm.com/fi_fi/miesten.html" on this server.\nReference #18.2796ef50.1625728417.f9aab80\n\n\n'"
So my question is: is there anything that I can do to lift this ban? Can I contact someone on their end and ask them to lift it? If so, where can this information usually be found?
Although this question concerns this site in particular, it is a much broader question. In the case of a ban, can the user try to contact someone responsible for the server? I thought about contacting customer support, but I strongly suspect that they cannot help with this issue and won't even understand what it is about.
I have googled this issue, but not found anything of help. They usually advise to clear cache, memory etc. This is not the problem here. I can access the site via Chrome or other browsers, but when using requests via python, this problem appears.

I'm pretty sure you need to use a JavaScript-capable scraping bot. You can try this tool: https://docs.python-requests.org/projects/requests-html/en/latest/
To get contact information for the owner of a website, you can use the Unix whois command:
whois hm.com

Selenium - ERR_TOO_MANY_REDIRECTS [duplicate]

I am trying to automate my login to a webpage to download a daily XML file. I understand that I need the actual frame URL, which I think is
http://shop.braintrust.gr/shop/store/customerauthenticateform.asp
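I examined the form and its fields, and I do the following:
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

# email and pwd hold the login credentials and are defined elsewhere
browser = webdriver.Chrome(service=Service('C:\\chromedriver.exe'))
browser.get('http://shop.braintrust.gr/shop/store/customerauthenticateform.asp')
print('Browser Opened')
# find_element(By.NAME, ...) replaces find_element_by_name, removed in Selenium 4.3
username = browser.find_element(By.NAME, 'UserID')
username.send_keys(email)
password = browser.find_element(By.NAME, 'password')
# time.sleep(2)
password.send_keys(pwd)
but I get a blank page saying that the browser was redirected too many times. Does this mean that it is impossible to log in?
How can I log in?
Thank you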

ERR_TOO_MANY_REDIRECTS
ERR_TOO_MANY_REDIRECTS (also known as a redirect loop) is one of the common website errors. Typically this error occurs after a recent change to your website, a misconfiguration of redirects on your server, or wrong settings with third-party services.
This error has no relation to Selenium as such and can be reproduced manually.
The reason for ERR_TOO_MANY_REDIRECTS is that, something is causing your website to go into an infinite redirection loop. Essentially the site is stuck (such as URL 1 points to URL 2 and URL 2 points back to URL 1, or the domain has redirected you too many times) and unlike some other errors, these rarely resolve themselves and will probably need you to take action to fix it. There are a couple different variations of this error depending upon the browser you’re running.
Solution
Some common approaches to check and fix the error are as follows:
Delete Cookies on That Specific Site: Both Google and Mozilla recommend, right below the error, that you try clearing your cookies. Cookies can sometimes contain faulty data that causes the ERR_TOO_MANY_REDIRECTS error. This is one recommendation you can try even if you’re encountering the error on a site you don’t own. Because cookies retain your logged-in status and other settings, delete only the cookie(s) for the site that is having the problem. This way you won’t impact any of your other sessions or websites that you frequently visit.
Clear Browser Cache: If you want to check whether your browser cache is the cause without clearing it, you can open your browser in incognito mode, or test another browser, and see if you still get the ERR_TOO_MANY_REDIRECTS error.
Determine Nature of Redirect Loop: If clearing the cache didn’t work, try to determine the nature of the redirect loop. For example, a site may have a 301 redirect back to itself, which causes a long chain of faulty redirects. You can follow all the redirects and determine whether the chain loops back to itself, or is perhaps an HTTP to HTTPS loop.
Check Your HTTPS Settings: Another thing to check is your HTTPS settings. ERR_TOO_MANY_REDIRECTS often occurs when someone has just migrated their WordPress site to HTTPS and either didn’t finish or set something up incorrectly.
Check Third-Party Services: ERR_TOO_MANY_REDIRECTS is also commonly caused by reverse-proxy services such as Cloudflare. This usually happens when their Flexible SSL option is enabled and you already have an SSL certificate installed with your WordPress host. Why? Because when Flexible is selected, all requests to your hosting server are sent over HTTP. Your host server most likely already has a redirect in place from HTTP to HTTPS, and therefore a redirect loop occurs.
Check Redirects on Your Server: Besides HTTP to HTTPS redirects on your server, it can be good to check that there aren’t any additional redirects set up incorrectly. For example, one bad 301 redirect back to itself could take down your site. Usually, these are found in your server’s config files.
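Following the redirects by hand, as suggested under "Determine Nature of Redirect Loop", can be sketched in Python with the standard library. The function names next_hop and trace_redirects and the max_hops limit are choices made for this example, not part of any library.

```python
import urllib.error
import urllib.parse
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so each hop can be inspected."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # the 3xx response is then raised as an HTTPError


def next_hop(url):
    """Return the URL that url redirects to, or None if it does not redirect."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=10):
            return None  # non-redirect response: end of the chain
    except urllib.error.HTTPError as err:
        if err.code in (301, 302, 303, 307, 308):
            location = err.headers.get("Location")
            return urllib.parse.urljoin(url, location) if location else None
        raise


def trace_redirects(start, resolve=next_hop, max_hops=20):
    """Follow redirects from start and return (chain, looped).

    looped is True when a URL shows up a second time in the chain.
    resolve can be swapped for a stub to try the logic without a network.
    """
    chain = [start]
    seen = {start}
    url = start
    for _ in range(max_hops):
        target = resolve(url)
        if target is None:
            return chain, False
        chain.append(target)
        if target in seen:
            return chain, True
        seen.add(target)
        url = target
    return chain, False  # gave up after max_hops without seeing a repeat
```

Printing the returned chain shows whether the loop alternates between HTTP and HTTPS or between two different paths, which narrows down which of the checks above applies. A site that redirects based on cookies may behave differently here than in a browser.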

Disable SSL for Heroku App (django)

We've decided not to use SSL anymore and unfortunately our server guy has quit and now I need to fix this. I've revoked the certs from Comodo, removed the SSL app from Heroku but that was apparently not enough and now we have serious problems with our site.
When visiting inteokej.nu one gets redirected to the app, but automatically http turns to https and instead of showing the domain (inteokej.nu) the app link is shown https://inteokej.herokuapp.com (I want inteokej.nu to be shown, not the actual app link).
That is a problem, but not the biggest one, which is that it's no longer possible to use the site (e.g. to log in; the static pages work, though). When I try to log in I first get an HTTPS security error, and when I proceed I get to the following page: https://www.inteokej.nu/cgi-sys/defaultwebpage.cgi ("Sorry! If you are the owner of this website, please contact your hosting provider: webmaster#inteokej.nu").
I've now learned the hard way that SSL is a complex thing but I really need to get this site up again as soon as possible. So, where should I start and how could I proceed from this point? I guess there's some back end coding that should be done in the django code as well?
Thanks a lot in advance!

Your issue doesn't seem to be with SSL but with DNS, or at least with how your server guy set things up.
The error page you're seeing isn't a Heroku error. inteokej.nu isn't hosted on Heroku but on a server run by your DNS provider, svenskadomaner.se.
If you use the Firefox Live HTTP Headers plugin you can follow the request/response cycle, and you'll see that there is a 301 redirect from www.inteokej.nu to inteokej.herokuapp.com (probably an .htaccess redirect).
If you check the DNS records for your domain (for example at http://viewdns.info/dnsrecord/?domain=inteokej.nu), you'll see that there is no CNAME record pointing to Heroku, only an A record pointing to 46.22.116.5, an IP address owned by svenskadomaner.se.
So the thing to do is to set up the custom domain as recommended on Heroku's site:
https://devcenter.heroku.com/articles/custom-domains
and set the CNAME to Heroku's recommendation.
One reason your server guy might have set things up like they did is that Heroku doesn't easily allow "naked domains", so people often do .htaccess redirects from example.com to www.example (which does work easily with CNAMEs).
Good luck!
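The DNS check described in this answer can also be approximated from Python. This is a rough sketch using the standard library's socket.gethostbyname_ex, which reports the canonical name and IPv4 addresses but not the raw record types; the function name describe_dns and the follows_cname key are made up for this example.

```python
import socket


def describe_dns(hostname, lookup=socket.gethostbyname_ex):
    """Return the canonical name, aliases, and IPv4 addresses for hostname.

    lookup is injectable so the function can be tried without network access.
    """
    canonical, aliases, addresses = lookup(hostname)
    return {
        "queried": hostname,
        "canonical": canonical,
        "aliases": list(aliases),
        "addresses": list(addresses),
        # Heuristic: a canonical name that differs from the queried name
        # suggests the resolver followed a CNAME to get the addresses.
        "follows_cname": canonical.rstrip(".").lower() != hostname.rstrip(".").lower(),
    }
```

Calling describe_dns("www.inteokej.nu") on a machine with network access shows which addresses the name currently resolves to, and whether the resolver went through an alias. For the authoritative record types, a DNS lookup tool or the registrar's control panel is still the better source.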

On what side is 'HTTP Error 403: request disallowed by robots.txt' generated?

I am trying out Mechanize to simplify a routine task. I have managed to bypass that error by using br.set_handle_robots(False). There is debate about how ethical it is to use it. What I wonder about is where this error is generated: on my side, or on the server side? I mean, does Mechanize throw the exception when it sees some robots.txt rule, or does the server decline the request when it detects that I use an automation tool?

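The check happens on the client. robots.txt is only a file that the server publishes; if your user agent matches an entry in it, the rules are applied by the client (here, mechanize), which raises the error itself.
By default, mechanize identifies itself as "Python-urllib/2.7".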
See http://en.wikipedia.org/wiki/Robots_exclusion_standard
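To show that robots.txt rules can be evaluated entirely on the client, here is a small sketch using the standard library's urllib.robotparser (not mechanize's own code). The robots.txt content and the example.com URLs are invented for the illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, as a server might publish it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Decide locally whether user_agent may fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # no network request is made here
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "Python-urllib/2.7", "http://example.com/private/page"))
print(is_allowed(ROBOTS_TXT, "Python-urllib/2.7", "http://example.com/public/page"))
```

The decision is made from the rules text alone, before any request for the page would be sent, which is why a client can switch the check off.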

The server blocks your activity with such a response.
Is it your site? If not, follow the rules:
Obey the robots.txt file.
Put a delay between requests, even if robots.txt doesn't require it.
Provide some contact information (e-mail or page URL) in the User-Agent header.
Otherwise, be prepared for the site owner to block you based on User-Agent, IP address, or other information they think distinguishes you from legitimate users.
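The delay and contact-information rules above could look roughly like this in Python. This is a minimal sketch; the User-Agent string, the e-mail address, and the Throttle class are placeholders invented for the example.

```python
import time
import urllib.request

# Hypothetical identification string; replace the contact details with your own.
CONTACT_UA = "example-bot/0.1 (+mailto:admin@example.com)"


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_delay, clock=time.monotonic, sleep=time.sleep):
        self.min_delay = min_delay
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Sleep just long enough that min_delay has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_delay - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def build_request(url):
    """Build a request that identifies the bot and how to reach its owner."""
    return urllib.request.Request(url, headers={"User-Agent": CONTACT_UA})
```

Calling wait() on one shared Throttle before each request keeps the request rate bounded no matter how quickly the rest of the script runs.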
