Find out web traffic source by the referrer header - python

I've honestly tried to find Python libraries that would let me do this easily, or precise instructions on how to parse the referrer header properly, but I've found nothing so far.
Any ideas? I need to do this myself on the backend, not by importing data from Google Analytics or something similar. Thank you.

Use the Flask API. The "request" object has a bunch of important information, such as the IP.
https://flask.palletsprojects.com/en/1.1.x/reqcontext/
You can use external services to check the location based on the IP, such as https://ipstack.com/.
Another, more complete option is using Google Analytics:
https://analytics.google.com/analytics/web/
You just add a tag to your HTML and Google does the rest!
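For the referrer itself, here is a minimal sketch using only the standard library. It assumes you already have the raw header value (in Flask that would be request.referrer or request.headers.get("Referer")). The function name classify_referrer and the two lookup sets are hypothetical and far from complete; they only illustrate the idea of mapping the referring host to a coarse source label.

```python
from urllib.parse import urlsplit

# Hypothetical lookup tables; extend them for the sources you care about.
SEARCH_ENGINES = {"google", "bing", "duckduckgo", "yahoo"}
SOCIAL_NETWORKS = {"facebook", "twitter", "linkedin", "reddit"}


def classify_referrer(referrer, own_host):
    """Return a coarse traffic-source label for a Referer header value."""
    if not referrer:
        # No header at all: typed URL, bookmark, or the browser stripped it.
        return "direct"
    host = (urlsplit(referrer).hostname or "").lower()
    if not host:
        # The header was present but not a parseable absolute URL.
        return "unknown"
    if host == own_host or host.endswith("." + own_host):
        return "internal"
    # Compare each dot-separated label of the host against the tables.
    labels = set(host.split("."))
    if labels & SEARCH_ENGINES:
        return "search"
    if labels & SOCIAL_NETWORKS:
        return "social"
    return "referral"
```

Keep in mind that the header is optional and can be spoofed or removed by the client, so treat the result as a hint rather than a fact.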

Related

Python: interrogate database over http

I want to do automatic searches on a database (in this example www.scopus.com) with a simple python script. I need some place from where to start. For example I would like to do a search and get a list of links and open the links and extract information from the opened pages. Where do I start?
Technically speaking, scopus.com is not "a database", it's a web site that lets you search / consult a database. If you want to programmatically access their service, the obvious way is to use their API, which mostly requires sending HTTP requests and parsing the HTTP responses. You can do this with the standard lib's modules, but you'll certainly save a lot of time using python-requests instead. And you'll certainly want to get some understanding of the HTTP protocol before...
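To give an idea of what "sending HTTP requests and parsing the HTTP response" looks like with the standard library, here is a rough sketch. Everything service-specific in it is an assumption made up for illustration: the URL https://api.example.com/search, the X-API-Key header, and the JSON shape with a "results" list of objects carrying a "link" field. The real service's API documentation defines the actual endpoint, authentication, and response format.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Hypothetical endpoint; replace with the one from the real API docs.
API_URL = "https://api.example.com/search"


def build_search_request(query, api_key):
    """Build (but do not send) a GET request for a search query."""
    params = urlencode({"query": query})
    headers = {
        "Accept": "application/json",
        "X-API-Key": api_key,  # hypothetical header name
    }
    return Request(f"{API_URL}?{params}", headers=headers)


def extract_links(payload):
    """Pull the result links out of a JSON response body (assumed shape)."""
    data = json.loads(payload)
    return [item["link"] for item in data.get("results", [])]


def search(query, api_key):
    """Send the request and return the list of result links."""
    request = build_search_request(query, api_key)
    with urlopen(request, timeout=10) as response:
        return extract_links(response.read().decode("utf-8"))
```

Splitting the request building and the response parsing into separate functions makes both easy to check without touching the network.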

Is it possible to use a proxy rotator such as crawlera with google trends?

Since Google Trends requires you to log in, can I still use an IP rotator such as Crawlera to download the CSV files? If so, is there any example code in Python (i.e. Python + Crawlera to download files from Google)?
Thanks in advance.
No one is going to write code for you.
But I can leave some comments because I have been using Crawlera proxies for the past few months.
With Crawlera you can scrape Google Trends with a new IP each time, or you can use the same IP each time (this is called session management in Crawlera).
You can send the header 'X-Crawlera-Session': 'create' along with your request and Crawlera will create a session on their end; in the response they will return 'X-Crawlera-Session': ['123123123']. If you find that you are not blocked by Google,
you can send 'X-Crawlera-Session': '123123123' with each of your requests so that Crawlera uses the same IP each time.
There are code examples in many languages in the documentation.
See https://doc.scrapinghub.com/crawlera.html#python for Python example.
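The session handling described above can be sketched as two small helpers. Only the header name X-Crawlera-Session and the 'create' value come from the comments above; the function names are hypothetical, and the code only deals with plain dicts of headers, so how you attach them to a request depends on the HTTP client you use.

```python
SESSION_HEADER = "X-Crawlera-Session"


def session_headers(session_id=None):
    """Headers asking Crawlera to create a session or reuse an existing one."""
    # Without an id we ask for a new session; with one we pin to it.
    return {SESSION_HEADER: session_id or "create"}


def read_session_id(response_headers):
    """Return the session id from a dict of response headers, or None."""
    value = response_headers.get(SESSION_HEADER)
    # Some clients expose header values as a list, as in ['123123123'].
    if isinstance(value, (list, tuple)):
        value = value[0] if value else None
    return value
```

The first request would carry session_headers(), and later requests session_headers(read_session_id(...)) with the headers of that first response. Note that a plain dict lookup is case-sensitive, which may not match how your client stores header names.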
Yes, it's possible to use Crawlera as well as other proxy APIs like https://gimmeproxy.com. It provides Google proxies which might work for you.

get icloud web service endpoints to fetch data

My question may look silly, but I am asking it after a lot of searching on Google without finding any clue.
I am using iCloud web services. For that I have converted this Python code to PHP: https://github.com/picklepete/pyicloud
Up to this point, everything works well. When I authenticate using an iCloud username and password, I get a list of web service URLs as part of the response. Now, for example, to use the Contacts web service, I need to take the Contacts web service URL and append a path to it to fetch contacts:
https://p45-contactsws.icloud.com:443/co/startup with some parameters.
The web service URL https://p45-contactsws.icloud.com:443 comes back in the authentication response, but the later part, 'co/startup', appears only in the Python code, and I don't know how they found it. The services covered by the Python code work fine. However, I want to use a few other services, such as https://p45-settingsws.icloud.com:443 and https://p45-keyvalueservice.icloud.com:443, and when I send requests with the correct parameters to these services I get errors like 404 Not Found or unauthorized access. So I believe some URL part must be appended to these, just as with contacts. If someone knows how or where I can get the correct URL parts, I would be really thankful.
Thanks to all in advance for their time reading/answering my question.
I am afraid there doesn't seem to be an official source for these API endpoints, since they seem to be discovered through sniffing the network calls rather than a proper guide from Apple. For example, this presentation, which comes from a forensic tools company, is from 2013 and covers some of the relevant endpoints. Note that iOS was still at versions 5 & 6 then (vs. the current v9.3).
All other code samples on the net basically are using the same set of API endpoints that were originally observed in 2012-2013. (Here's a snippet from another python module with additional URLs you may use.) However, all of them pretty much point to each other as the source.
If you'd like to pursue a different path, Apple now promotes the CloudKit and CloudKit JS solutions for registered apps working with iCloud data.

How can I access SoundCloud-Stream URLs in Python?

Some time ago I wrote a little tool for a friend of mine. I retrieved all stream-links (like this) from a soundlist and downloaded all those with a small python script.
Since the beginning of March, SoundCloud must have changed something, and now my cron job receives 401 Unauthorized errors. I've read through the SoundCloud API docs, but the whole access token approach does not really fit my needs.
Does anyone have an idea how to deal with this problem easily? Thanks.
As Makoto said, a 401 suggests you have lost the privileges to access the API through your OAuth token, so I would double-check that your app is still available and that your tokens are correct. You can check on the Your Apps page.
Also, I noticed that your URL seemed a bit different from what the SC API shows. Once you resolve a proper track id, the convention for a stream URL is:
http://api.soundcloud.com/tracks/{id}/stream
This can be found in their track documentation.
Read the documentation here. You have to add your client_id parameter to the stream URL, and then you will be redirected to the stream link (mp3).
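Putting the two answers together, building the stream URL could look like the sketch below. The URL pattern and the client_id parameter are the ones quoted above; the function name stream_url is just for illustration, and whether the endpoint still accepts a bare client_id is something to verify against the current SoundCloud documentation.

```python
from urllib.parse import urlencode


def stream_url(track_id, client_id):
    """Build the stream URL for a track, with client_id as a query parameter."""
    query = urlencode({"client_id": client_id})
    return f"http://api.soundcloud.com/tracks/{track_id}/stream?{query}"
```

Requesting that URL with an HTTP client that follows redirects should then land on the actual mp3 link, as described above.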

Getting Started with Facebook API

I have a friend that owns a small business and has a Page on Facebook. I want to help her manage it from a marketing perspective, and figure that it may be best to do so through their API.
I have skimmed their API documentation, and have a basic working knowledge of Python. What I can't figure out is if I can access their page's data with Python and grab the data on wall posts, who liked posts, etc. Is this possible? I can't find a decent tutorial for someone who is new to programming.
To provide context, I have been scraping the Twitter Search API for some time now and I am hoping there is something similar (request certain data elements, and have them returned as structured data I can analyze). I find Twitter's API extremely straightforward, but for Facebook, I don't know where to begin.
I don't want to create an application, I simply want to access the data that is related to my friend's page.
I am hoping to find some decent tutorials and help on what I will need to get started. Any help you can provide will be greatly appreciated.
You could try Pyjamas Desktop.
http://pyjs.org/
It runs python in an embedded web browser and gives you access to the html DOM.
This potentially means that you can use the JS api directly from python.
You will need to be running a server locally though.
Basically, to automate posting to the person's profile you need to get their OAuth token and then make API calls with that token.
Here are the steps to get an API token:
Register an app with Facebook and get an app ID
Have your friend click this link https://www.facebook.com/dialog/oauth?
client_id=[your app id here]&
type=user_agent&
scope=email,read_stream,,,user_about_me,offline_access,publish_stream&
redirect_uri=http://www.facebook.com/connect/login_success.html
Then record that token for future use
You can now use any available python FB lib to post and manage that FB page.
This should get you started:
http://eggie5.com/20-getting-started-w-facebook-api
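The authorization link from the steps above can also be assembled in Python, which takes care of the URL encoding. This is only a sketch: the parameter names and the redirect URI are copied from the link above, the function name oauth_dialog_url is hypothetical, and the scope names are the ones listed in this answer, which may no longer be valid permissions.

```python
from urllib.parse import urlencode

DIALOG_URL = "https://www.facebook.com/dialog/oauth"
DEFAULT_REDIRECT = "http://www.facebook.com/connect/login_success.html"


def oauth_dialog_url(app_id, scopes, redirect_uri=DEFAULT_REDIRECT):
    """Build the link your friend opens to grant the app access."""
    params = {
        "client_id": app_id,
        "type": "user_agent",
        # Facebook expects the permissions as one comma-separated value.
        "scope": ",".join(scopes),
        "redirect_uri": redirect_uri,
    }
    return f"{DIALOG_URL}?{urlencode(params)}"
```

For example, oauth_dialog_url("YOUR_APP_ID", ["email", "read_stream"]) gives a link you can send to the page owner; the token then has to be captured after the redirect, as in the steps above.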
