How to get the server info of a website using python requests?

I want to make a web crawler to make a statistic about most popular server software among Bulgarian sites, such as Apache, nginx, etc. Here is what I came up with:
import requests
r = requests.get('http://start.bg')
print(r.headers)
This returns the following:
{'Debug': 'unk',
'Content-Type': 'text/html; charset=utf-8',
'X-Powered-By': 'PHP/5.3.3',
'Content-Length': '29761',
'Connection': 'close',
'Set-Cookie': 'fbnr=1; expires=Sat, 13-Feb-2016 22:00:01 GMT; path=/; domain=.start.bg',
'Date': 'Sat, 13 Feb 2016 13:43:50 GMT',
'Vary': 'Accept-Encoding',
'Server': 'Apache/2.2.15 (CentOS)',
'Content-Encoding': 'gzip'}
Here you can easily see that it runs on Apache/2.2.15 and you can get this result by simply saying r.headers['Server']. I tried that with several Bulgarian websites and they all had the Server key.
However, when I request the header of a more sophisticated website, such as www.teslamotors.com, I get the following info:
{'Content-Type': 'text/html; charset=utf-8',
'X-Cache-Hits': '9',
'Cache-Control': 'max-age=0, no-cache, no-store',
'X-Content-Type-Options': 'nosniff',
'Connection': 'keep-alive',
'X-Varnish-Server': 'sjc04p1wwwvr11.sjc05.teslamotors.com',
'Content-Language': 'en',
'Pragma': 'no-cache',
'Last-Modified': 'Sat, 13 Feb 2016 13:07:50 GMT',
'X-Server': 'web03a',
'Expires': 'Sat, 13 Feb 2016 13:37:55 GMT',
'Content-Length': '10290',
'Date': 'Sat, 13 Feb 2016 13:37:55 GMT',
'Vary': 'Accept-Encoding',
'ETag': '"1455368870-1"',
'X-Frame-Options': 'SAMEORIGIN',
'Accept-Ranges': 'bytes',
'Content-Encoding': 'gzip'}
As you can see, there isn't any ['Server'] key in this dictionary (although there are X-Server and X-Varnish-Server keys, which I'm not sure about, but their values are not a server name like Apache).
So I'm thinking there must be another request I could send that would yield the desired server information, or perhaps they run their own specific server software (which sounds plausible for Facebook).
I also tried other .com websites, such as https://spotify.com, and they do have a ['Server'] key.
So is there a way to find the info about the servers Facebook and Tesla Motors use?

That has nothing to do with Python; most well-configured web servers will not return information in the Server HTTP header due to security implications.
No sane developer would want to let you know that they are running an unpatched version of product xxx.
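If you still want the statistic, one pragmatic option is to treat the header as optional and count sites that withhold it separately. Here is a minimal sketch along those lines (the site list is a placeholder, not a real crawl list):
import requests

# Placeholder sample; substitute your actual list of Bulgarian sites.
sites = ['http://start.bg', 'https://spotify.com', 'https://www.teslamotors.com']

counts = {}
for site in sites:
    try:
        r = requests.get(site, timeout=10)
    except requests.RequestException:
        continue  # skip sites that are down or time out
    # Fall back to 'unknown' when the Server header is withheld.
    server = r.headers.get('Server', 'unknown')
    counts[server] = counts.get(server, 0) + 1

print(counts)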

Related

Find out time stamp of Slack message from Python API

I created a Slack app, added a Bot and an Incoming Webhook to it, and posted some messages with the Bot. Now I would like to find out the timestamp of a Slack message in order to delete it later with the chat.delete method.
I found that I can use the channels.history method.
Here is how I tried to use it. I used the token found under OAuth Access Token, since per the docs I cannot use a Bot token with the channels.history method.
from slackclient import SlackClient

slack_token_user_token = 'xoxp-long_string_of_integers'
sc_user_token = SlackClient(slack_token_user_token)
sc_user_token.api_call(
    "channels.history",
    channel="CHXXXXXXX")
I got back the following error:
{'error': 'missing_scope',
'headers': {'Access-Control-Allow-Headers': 'slack-route, x-slack-version-ts',
'Access-Control-Allow-Origin': '*',
'Access-Control-Expose-Headers': 'x-slack-req-id',
'Cache-Control': 'private, no-cache, no-store, must-revalidate',
'Connection': 'keep-alive',
'Content-Encoding': 'gzip',
'Content-Length': '108',
'Content-Type': 'application/json; charset=utf-8',
'Date': 'Fri, 05 Apr 2019 18:18:11 GMT',
'Expires': 'Mon, 26 Jul 1997 05:00:00 GMT',
'Pragma': 'no-cache',
'Referrer-Policy': 'no-referrer',
'Server': 'Apache',
'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload',
'Vary': 'Accept-Encoding',
'Via': '1.1 f0f1092b2ad1f0e573a4fcbefe4fb621.cloudfront.net (CloudFront)',
'X-Accepted-OAuth-Scopes': 'channels:history',
'X-Amz-Cf-Id': 'fSm6uo2H88E43JCvqd2h5mohnzA6z0B3kmdsG3u9nW0PJNrsrpK7mg==',
'X-Cache': 'Miss from cloudfront',
'X-Content-Type-Options': 'nosniff',
'X-OAuth-Scopes': 'identify,bot,incoming-webhook',
'X-Slack-Req-Id': 'c158668d-ddc9-4bbc-9a7d-6b9a9011d2dc',
'X-Via': 'haproxy-www-yfr6',
'X-XSS-Protection': '0'},
'needed': 'channels:history',
'ok': False,
'provided': 'identify,bot,incoming-webhook'}
If this is a permission issue, how do I find out the proper token to use?
According to the error message you posted, the token used is lacking the required scope:
'needed': 'channels:history'
It looks like you provided the bot token, which cannot work:
'provided': 'identify,bot,incoming-webhook'
Provide the user access token instead, make sure you first add the channels:history scope, and reinstall the app to activate the change.
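For what it's worth, here is a minimal sketch of the flow once the scope is in place, using the same legacy slackclient library as in the question (token and channel ID are placeholders):
from slackclient import SlackClient

sc = SlackClient('xoxp-your-user-token')  # user token with channels:history

history = sc.api_call("channels.history", channel="CHXXXXXXX")
if history.get('ok'):
    for message in history['messages']:
        # 'ts' is the timestamp that chat.delete expects later, e.g.
        # sc.api_call("chat.delete", channel="CHXXXXXXX", ts=message['ts'])
        print(message['ts'], message.get('text', ''))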

Scopus API - Number of requests remaining for the week

I am using the Elsevier API to access citation count data from Scopus, through the scopus-api module (but would be happy to use Elsevier's elsapy module). I can access the data I need, but there is a limit for the number of requests that can be made per week.
How would one obtain the number of remaining requests for the week?
All help is appreciated.
Although this is an old question, the answer might help someone else who stumbles across it. The quota-related information is contained in the headers of the response to your request. Each API endpoint seems to have its own limit.
Here's an example of a response that still has something left of the quota:
{'allow': 'GET', 'Content-Encoding': 'gzip', 'Content-Type': 'application/xml;charset=UTF-8', 'Date': 'Fri, 26 Aug 2019 17:46:46 GMT', 'Server': 'Apache-Coyote/1.1', 'vary': 'Origin', 'X-ELS-APIKey': 'your-api-key-would-be-here', 'X-ELS-ReqId': '16385g19-b193-1308-5817-c5694db5619g', 'X-ELS-ResourceVersion': 'default', 'X-ELS-Status': 'OK', 'X-ELS-TransId': '16385g19-b193-1308-5817-c5694db5619g', 'X-RateLimit-Limit': '20000', 'X-RateLimit-Remaining': '19636', 'X-RateLimit-Reset': '2019-10-03 07:18:17', 'transfer-encoding': 'chunked', 'Connection': 'keep-alive'}
Here's an example for which the quota has been exceeded:
{'Content-Encoding': 'gzip', 'Content-Type': 'text/xml;charset=UTF-8', 'Date': 'Fri, 19 Aug 2019 17:46:46 GMT', 'Server': 'Apache-Coyote/1.1', 'X-ELS-Status': 'QUOTA_EXCEEDED - Quota Exceeded', 'X-RateLimit-Reset': '2019-08-26 05:51:01', 'Content-Length': '191', 'Connection': 'keep-alive'}
An example to get the headers in Python using requests:
import requests

url = 'https://api.elsevier.com/content/abstract/scopus_id/85040730407?apiKey=yourapikey'
response = requests.get(url)
print(response.headers)
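Continuing from the snippet above, the remaining quota can then be read straight from those headers; a minimal sketch, assuming the header names shown in the responses above:
remaining = response.headers.get('X-RateLimit-Remaining')
reset = response.headers.get('X-RateLimit-Reset')
if remaining is not None:
    print('Requests remaining:', remaining, '- resets at', reset)
else:
    # Quota-exceeded responses omit X-RateLimit-Remaining (see above).
    print('Status:', response.headers.get('X-ELS-Status'))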

Python request.get() returns 404 page not found

I have been seeing a bit of funny behavior and would love an explanation as to why it is happening.
I am using the following to grab a page and then parse through it:
r = requests.get(html)
Now when I run this on a Windows computer with Python on webpage A, it gets back the page as you would expect.
However, when I run this same command on my Synology Diskstation (I believe it is Linux-based), it returns a 404 "page not found" page instead of the page at the entered URL.
When I try different URLs, it gives me back the right page on both systems.
Any explanation as to how or why this is happening?
EDIT: Just tried it on my MacBook at home as well and it works just fine. But for some reason it still does not work on the Diskstation :S
EDIT:
Headers from two machines
Mac (Where it is working):
{'Content-Length': '17924', 'X-Content-Type-Options': 'nosniff', 'Content-Encoding': 'gzip', 'Set-Cookie': 'PHPSESSID=q86c56e1e4t1d8jsu0penc488oraladt; path=/', 'Vary': 'Host,Accept-Encoding', 'Keep-Alive': 'timeout=10, max=100', 'Server': 'Apache', 'Connection': 'Keep-Alive', 'Date': 'Tue, 24 Jan 2017 04:31:08 GMT', 'Content-Type': 'text/html'}
Diskstation (Where it is not):
{'X-Content-Type-Options': 'nosniff', 'Transfer-Encoding': 'chunked', 'Vary': 'Host', 'Keep-Alive': 'timeout=10, max=100', 'Server': 'Apache', 'Connection': 'Keep-Alive', 'Date': 'Tue, 24 Jan 2017 04:30:25 GMT', 'Content-Type': 'text/html'}
More than likely, either you are hitting a robots.txt issue or the header info sent is different between the two systems. A basic trace should point you in the right direction.
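One quick way to test the header theory is to send an explicit, browser-like User-Agent from both machines and compare what was actually sent; a minimal sketch (the URL and User-Agent string are placeholders):
import requests

headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'}
r = requests.get('http://example.com/page', headers=headers)
print(r.status_code)
print(r.request.headers)  # exactly what was sent, for comparing machines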

Not able to upload a file through python

After several attempts and repeated failures, I am posting my code excerpt here. I keep getting an authentication failure. Can somebody point out what it is that I am doing wrong here?
import requests
fileToUpload = {'file': open('/home/pinku/Desktop/Test_Upload.odt', 'rb')}
res = requests.post('https://upload.backupgrid.net/add', fileToUpload)
print res.headers
cookie = {'PHPSESSID': 'tobfr5f31voqmtdul11nu6n9q1'}
requests.post('https://upload.backupgrid.net/add', cookie, fileToUpload)
By print res.headers, I get the following:
CaseInsensitiveDict({'content-length': '67',
'access-control-allow-methods': 'OPTIONS, HEAD, GET, POST, PUT, DELETE',
'x-content-type-options': 'nosniff',
'content-encoding': 'gzip',
'set-cookie': 'PHPSESSID=ou8eijalgpss204thu7ht532g1; path=/, B100Serverpoolcookie=4281246842.1.973348976.502419456; path=/',
'expires': 'Thu, 19 Nov 1981 08:52:00 GMT',
'vary': 'Accept-Encoding',
'server': 'Apache/2.2.15 (CentOS)',
'pragma': 'no-cache',
'cache-control': 'no-store, no-cache, must-revalidate',
'date': 'Mon, 09 Sep 2013 09:13:08 GMT',
'access-control-allow-origin': '*',
'access-control-allow-headers': 'X-File-Name, X-File-Type, X-File-Size',
'content-type': 'text/html; charset=UTF-8'})
It contains the cookies also. Am I passing the cookies correctly? Please help!
You are not passing the cookies correctly; it should be:
requests.post('https://upload.backupgrid.net/add',
              files=fileToUpload,
              cookies=cookie)
See also documentation:
Cookies
POST a Multipart-Encoded File
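Alternatively, a requests.Session keeps cookies from earlier responses and sends them automatically, so you do not have to copy the PHPSESSID by hand; a minimal sketch, assuming the session cookie is issued by a prior request to the same host:
import requests

with requests.Session() as session:
    # Any Set-Cookie from this first request is stored on the session...
    session.get('https://upload.backupgrid.net/')
    # ...and sent automatically with the upload that follows.
    with open('/home/pinku/Desktop/Test_Upload.odt', 'rb') as f:
        res = session.post('https://upload.backupgrid.net/add',
                           files={'file': f})
print(res.status_code, res.headers)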

Why is Facebook authentication via Python producing an error?

I am trying to authenticate users of my Django application into Facebook via the oauth2 Python package.
import oauth2
from django.conf import settings
from django.http import HttpResponse

def myView(request):
    consumer = oauth2.Consumer(
        key=settings.FACEBOOK_APP_ID,
        secret=settings.FACEBOOK_APP_SECRET)
    # Request token URL for Facebook.
    request_token_url = "https://www.facebook.com/dialog/oauth/"
    # Create client.
    client = oauth2.Client(consumer)
    # The OAuth Client request works just like httplib2 for the most part.
    resp, content = client.request(request_token_url, "GET")
    # Return a response that prints out the Facebook response and content.
    return HttpResponse(str(resp) + '\n\n ------ \n\n' + content)
However, when I go to this view I am directed to a page that contains an error. The page shows this response from Facebook:
{'status': '200', 'content-length': '16418', 'x-xss-protection': '0',
'content-location': u'https://www.facebook.com/dialog/oauth/?oauth_body_hash=2jmj7l5rSw0yVb%2FvlWAYkK%2FYBwk%3D&oauth_nonce=53865791&oauth_timestamp=1342666292&oauth_consumer_key=117889941688718&oauth_signature_method=HMAC-SHA1&oauth_version=1.0&oauth_signature=XD%2BZKqhJzbOD8YBJoU1WgQ4iqtU%3D',
'x-content-type-options': 'nosniff',
'transfer-encoding': 'chunked',
'expires': 'Sat, 01 Jan 2000 00:00:00 GMT',
'connection': 'keep-alive',
'-content-encoding': 'gzip',
'pragma': 'no-cache',
'cache-control': 'private, no-cache, no-store, must-revalidate',
'date': 'Thu, 19 Jul 2012 02:51:33 GMT',
'x-frame-options': 'DENY',
'content-type': 'text/html; charset=utf-8',
'x-fb-debug': 'yn3XYqMylh3KFcxU9+FA6cQx8+rFtP/9sJICRgj3GOQ='}
Does anyone see anything awry in my code? I have tried concatenating arguments as strings to request_token_url to no avail. I am sure that my Facebook app ID and secret string are correct.
