URL fetch: prevent abuse, malicious URLs etc. in python/django


I'm building a web page that works much like the Facebook wall/news feed. Registered users (or users signed in through Facebook Connect or Google auth) can submit URLs. At the moment, I take these URLs and use urllib2 to fetch the content of the URL and search for relevant information such as og: properties, the HTML title tag and perhaps some img tags for images.
Now, I understand that I'm putting my server at risk by letting users feed it URLs to open.
My question is: how high is the risk? What standard security checks can I make?
For now, I am simply opening the URL without any "active" protection because I don't know what to check for.
And what about storing fetched content in the database? Does Django have built-in protection against SQL injection?
Thank you!

One of the obvious risks here is that one could use your website as a vector for spreading malicious URLs.
For example, say I figure out a malformed HTML page that allows arbitrary code execution in WebKit-based browsers, for instance by exploiting a 0-day buffer overflow. If your website becomes popular, it is one of the places I'd definitely try.
Now, you can't possibly inspect the contents of every submitted URL for security flaws; you'd become an anti-virus/security company. Both Chrome and Safari do take care of these to some extent.
For the sake of your users and content, and given the risk I explained, you could build in a flagging system that learns from users' actions. You could train a classifier whenever someone flags a URL, see examples here.
I'm sure there is a variety of such solutions, also in Python.
For a quick overview of security and SQL injection in Django's context, check out this link.
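On the SQL injection part of the question: Django's ORM builds its queries with parameter binding, so fetched content is passed to the database as data rather than spliced into the SQL string. Below is a minimal sketch of that same idea using the standard-library sqlite3 module. It is not Django code, and the table name, column names and store_link function are made up for illustration.

```python
import sqlite3


def store_link(conn, url, title):
    """Insert one fetched link using placeholders instead of string formatting."""
    # The "?" placeholders keep the values out of the SQL text entirely,
    # so quotes or SQL keywords inside a scraped title are stored literally.
    conn.execute("INSERT INTO links (url, title) VALUES (?, ?)", (url, title))
    conn.commit()


# Hypothetical schema, in-memory database for demonstration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE links (url TEXT, title TEXT)")

# A title that would break a query built with string concatenation.
hostile_title = "x'); DROP TABLE links; --"
store_link(conn, "http://example.com/", hostile_title)

rows = conn.execute("SELECT url, title FROM links").fetchall()
```

The hostile title ends up as an ordinary string in the table. The risk comes back if you build SQL strings yourself from fetched content, so keep using placeholders if you ever drop down to raw queries.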

Related

get icloud web service endpoints to fetch data

My question may look silly, but I am asking after a lot of searching on Google without finding any clue.
I am using the iCloud web services. For that I have converted this Python code to PHP: https://github.com/picklepete/pyicloud
Up to this point, everything works well. When I authenticate with an iCloud username and password, I get a list of web service URLs as part of the response. For example, to use the Contacts web service, I need to take the Contacts web service URL and append a path to it to fetch contacts:
https://p45-contactsws.icloud.com:443/co/startup with some parameters.
The web service URL https://p45-contactsws.icloud.com:443 comes back in the authentication response, but the later part, 'co/startup', is hard-coded in the Python code, and I don't know how they found it. The services that are covered by the Python code work well. But I want to use a few other services, such as https://p45-settingsws.icloud.com:443 and https://p45-keyvalueservice.icloud.com:443, and when I send requests with correct parameters to these services, I get errors like 404 not found or unauthorized access. So I believe some path must be appended to these URLs, just as with contacts. If someone knows how or where I can get the correct path, I will be really thankful.
Thanks to all in advance for their time reading/answering my question.

I am afraid there doesn't seem to be an official source for these API endpoints, since they seem to be discovered through sniffing the network calls rather than a proper guide from Apple. For example, this presentation, which comes from a forensic tools company, is from 2013 and covers some of the relevant endpoints. Note that iOS was still at versions 5 & 6 then (vs. the current v9.3).
All other code samples on the net basically are using the same set of API endpoints that were originally observed in 2012-2013. (Here's a snippet from another python module with additional URLs you may use.) However, all of them pretty much point to each other as the source.
If you'd like to pursue a different path, Apple now promotes the CloudKit and CloudKit JS solutions for registered apps working with iCloud data.

How to hide url after domain in web2py?

I am building a website using web2py. For security reasons I would like to hide the URL after the domain from visitors. For example, when a person clicks a link to "domain.com/abc", it will go to that page while the address bar shows "domain.com".
I have played with routes_in and routes_out, but they only seem to map the typed URL to a destination; they do not hide the URL.
How can I do that? Thanks!

Well, I guess you're going to have to build the world's most remarkable single page application :) Security through obscurity is never a good design pattern.
There is absolutely no security "reason" for hiding a URL if your system is designed in such a way that the URLs are useless unless the access control layer grants permission for their use (usually through an authentication and role/object based permission architecture).
Keep in mind that anyone these days can use the Chrome inspector to see whatever you are trying to hide in the address bar.
For example, say you want to load domain.com/adduser.
Sure, you can make an AJAX call to that URL, and the browser address bar would never change from domain.com/, but a quick look at the source will uncover /adduser pretty quickly.
It sounds like you need to think about what these addresses really expose and start locking them down.

Security risks of a link scraping system

I'm implementing a link scraping system like Facebook's link share feature, whereby a user enters a url which is passed to our server via ajax, and our server then does a get request (using the requests library) and parses the response html with Beautiful Soup to capture relevant information about the page.
In this type of system, obviously a person can enter any URL they want. What types of security risk could our server be exposed to in this scenario? Could such a setup be exploited maliciously?

You probably want to make sure that your server doesn't execute any plugins or copy any videos/images.
JavaScript is trickier. If you ignore it you will miss some links; if you execute it, you had better be sure you aren't being used to do something like send spam.
If you are asking on SO you probably aren't sure enough!

You should search for RFI/LFI (Remote/Local File Inclusion) vulnerabilities and iframe attacks. If you are safe from these two attacks, then you're good.
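One way to read the file inclusion advice for a link fetcher is to refuse anything that is not plain http/https and anything that points back at your own machine or internal network. Here is a minimal sketch of such a pre-check, assuming Python 3 and only the standard library. The function name is_fetchable and the exact set of rejected address ranges are my own choices, not an established API.

```python
import ipaddress
import socket
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def _is_public(ip):
    """True only for addresses outside loopback, private, link-local etc. ranges."""
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


def is_fetchable(url):
    """Rough pre-check for a user-submitted URL before the server opens it."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False  # malformed URL
    # Rejects file://, ftp://, etc. and URLs without a host.
    if parts.scheme not in ALLOWED_SCHEMES or not host:
        return False
    try:
        # IP literals are checked directly, without any DNS lookup.
        addresses = [ipaddress.ip_address(host)]
    except ValueError:
        try:
            infos = socket.getaddrinfo(host, None)
            addresses = [ipaddress.ip_address(info[4][0]) for info in infos]
        except (socket.gaierror, ValueError):
            return False  # unresolvable or unparsable: treat as unsafe
    # Every address the host resolves to must be public.
    return all(_is_public(ip) for ip in addresses)
```

This is only a first filter. The host can resolve to a different address when the real request is made, and redirects need the same check, so you would still want timeouts and a response size limit on the fetch itself.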

I have built quite a few small and large crawling systems. I'm actually not sure what kind of security risks you are talking about, and I am not clear on your requirements.
But if all you are doing is fetching the HTML, using BeautifulSoup to extract certain things about the page such as the title tag and meta tag info, and then storing this data, I don't see any problems.
As long as you are not blindly doing some kind of eval on either the response from the URL or on what the user entered, I feel you are safe.
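To illustrate what "just parsing" means here, below is a minimal sketch that pulls the title and og: properties out of an HTML string without executing or evaluating anything from the page. It uses the standard-library html.parser instead of BeautifulSoup so that it runs without extra packages. The class and function names are made up for this example.

```python
from html.parser import HTMLParser


class LinkPreviewParser(HTMLParser):
    """Collects the first <title> text and any <meta property="og:..."> values."""

    def __init__(self):
        super().__init__()
        self.title = None
        self.og = {}
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            prop = attrs.get("property") or ""
            content = attrs.get("content")
            if prop.startswith("og:") and content is not None:
                self.og[prop] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
            if self.title is None:
                # Keep only the first title found on the page.
                self.title = "".join(self._title_parts).strip()

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)


def extract_preview(html_text):
    """Return the page title and og: properties as plain strings."""
    parser = LinkPreviewParser()
    parser.feed(html_text)
    parser.close()
    return {"title": parser.title, "og": parser.og}


sample = (
    "<html><head><title> Hello </title>"
    '<meta property="og:image" content="http://example.com/a.png">'
    "</head><body><script>alert(1)</script></body></html>"
)
preview = extract_preview(sample)
```

The script tag in the sample is never run; it is just text the parser skips over. The extracted strings are still untrusted input, so escape them when you render them back into your own pages.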

Python get data from secured website

I'd like to know if there is a way to get information from my banking website with Python. I'd like to retrieve my card history and display it, and possibly save it into a text document each month.
I have found the URLs etc. to log in and get the information from the website, which work from a browser, but I have been using urllib2 to "open" the web pages from Python and I have a feeling it's not working because of some cookie or session issue.
I can get any information I want with urllib2 from a website that does not require a login, and then save the actual HTML and go through it later, but I can't on my bank's website.
Any help would be appreciated.

This is a part of web scraping:
Web scraping is a standard task that can serve various needs.
Scraping data out of a secure website means HTTPS.
Handling HTTPS is not a problem with mechanize and BeautifulSoup,
although urllib2 with HTTPCookieJar also works fine.
If managing the cookies is the problem, then I would recommend mechanize.
Considering the case of your bank site:
I would recommend not playing with your account.
If you must, then it's not as easy as any normal secure/non-secure site.
These sites are designed to withstand such scripts.
Problems that you would face with this:
Bank sites will surely have a CAPTCHA, which is almost impossible to bypass with a script unless you employ a lot of rocket science and effort.
Another problem that you will definitely face is JavaScript. Standard scripting solutions focus on managing cookies, HTML parsing, etc. To handle JavaScript on links you will have to process JS in your Python script, which again needs a lot of effort.
Then there is AJAX, which also comes from JavaScript and fetches data from the server after page load.
So it will take a lot of effort to do this task.
Also, if you try doing this you risk blocking access to your account, since banking sites are quick to block account access after 3-4 unsuccessful attempts at login, CAPTCHA, etc.
So, think before you do.
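For the cookie part only, here is a minimal sketch of an opener that keeps session cookies between requests. The question uses urllib2 (Python 2); this sketch assumes Python 3, where the equivalents are urllib.request and http.cookiejar. The login URL and form field names in the commented usage are hypothetical, and none of this deals with CAPTCHAs or JavaScript.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session_opener():
    """Build an opener that stores cookies from responses and resends them."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def encode_form(fields):
    """Encode form fields the way a browser submits a simple login form."""
    return urllib.parse.urlencode(fields).encode("utf-8")


def post_form(opener, url, fields, timeout=10):
    """POST the form; any Set-Cookie headers end up in the opener's jar."""
    with opener.open(url, data=encode_form(fields), timeout=timeout) as resp:
        return resp.read()


opener, jar = make_session_opener()

# Hypothetical usage (URL and field names are placeholders, not a real site):
# post_form(opener, "https://bank.example/login", {"user": "me", "pin": "1234"})
# history_html = opener.open("https://bank.example/history", timeout=10).read()
```

Because every request goes through the same opener, the session cookie set at login is sent along with the later page requests, which is what a plain urlopen call does not do.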

Retrieving my own data via FaceBook API

I am building a website for a comedy group which uses Facebook as one of their marketing platforms; one of the requirements for the new site is to display all of their Facebook events on a calendar.
Currently, I am just trying to put together a Python script which can pull some data from my own Facebook account, like a list of all my friends. I presume that once I can accomplish this I can move on to pulling more complicated data out of my client's account (since they have given me access to their account).
I have looked at many of the posts here, and also went through the Facebook API documentation, including Facebook Connect, but am really beating my head against the wall. Everything I have read seems like overkill, as it involves setting up a good deal of infrastructure to allow my app to set up connections to any arbitrary user's account (who authorizes me). Shouldn't it be much simpler, given I only ever need to access 1 account?
I cannot find a way to retrieve data without having to display the Facebook login window. I have a script which will retrieve all my friends, but it includes a redirect where I have to physically log myself in to Facebook.
Would appreciate any advice or links, I just feel like I must be missing something simple.
Thank you!

Just posting up my notes on the successful advice, should others find this post;
Per Daniel and William's advice, I obtained the right permissions using the Connect options. From William, this link explains how the Facebook connection works
https://developers.facebook.com/docs/authentication/
This section on setting up the actual authentication was most helpful to me.
http://developers.facebook.com/docs/api
Basically, it goes as follows:
Post a link to the following URL. A user will need to physically click on it (even if that user is just you, the site admin).
https://graph.facebook.com/oauth/authorize?client_id=YOUR_CLIENT_ID&redirect_uri=http://www.example.com/HANDLER
This will redirect to a Facebook login, which will return to http://www.example.com/HANDLER after the user authenticates. If you wish to do more than basic reads and news feed updates, you will need to include this variable in the above link: scope=offline_access,user_photos. The scope variable is a comma-separated list of values, which Facebook will explicitly show to the authenticating user during the login process and which they will have to approve. Most helpful for me was the offline_access flag (user_photos lets you get at their photos too), so I can pull content without someone logging in regularly (so long as I store the access token obtained later).
Have a script located at http://www.example.com/HANDLER that will take a variable from the request (Facebook will redirect to http://www.example.com/HANDLER?code=YOUR_CODE after authentication). Your handler needs to pull out the code variable and then send the following request:
https://graph.facebook.com/oauth/access_token?
client_id=YOUR_CLIENT_ID&
redirect_uri=http://www.example.com/HANDLER&
client_secret=YOUR_SECRET_KEY&
code=YOUR_CODE
This request will return a string of the form access_token=YOUR_ACCESS_TOKEN.
Just parse off the 'access_token=', and you will have a token that you can use to access the facebook graph API, in requests like
http://graph.facebook.com/me/friends?access_token=YOUR_ACCESS_TOKEN
This will return a JSON object containing all of your friends
Hope this saves someone else some not fun time straining through documentation. Thanks for the help!
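The URL building and token parsing from the steps above can be sketched in Python as follows. The endpoints and parameter names are copied from these notes and may not match the current Graph API. The function names are made up, and the sketch assumes Python 3 and does not make any network requests itself.

```python
from urllib.parse import parse_qs, urlencode

# Endpoints as given in the notes above.
AUTHORIZE_URL = "https://graph.facebook.com/oauth/authorize"
TOKEN_URL = "https://graph.facebook.com/oauth/access_token"


def build_authorize_url(client_id, redirect_uri, scope=()):
    """Link the user has to click to start the login and authorization."""
    params = {"client_id": client_id, "redirect_uri": redirect_uri}
    if scope:
        # scope is a comma-separated list, e.g. offline_access,user_photos
        params["scope"] = ",".join(scope)
    return AUTHORIZE_URL + "?" + urlencode(params)


def build_token_url(client_id, redirect_uri, client_secret, code):
    """URL the handler requests to exchange the code for an access token."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "client_secret": client_secret,
        "code": code,
    }
    return TOKEN_URL + "?" + urlencode(params)


def parse_access_token(body):
    """Extract the token from a response of the form access_token=..."""
    return parse_qs(body)["access_token"][0]


login_link = build_authorize_url(
    "YOUR_CLIENT_ID",
    "http://www.example.com/HANDLER",
    scope=["offline_access", "user_photos"],
)
token = parse_access_token("access_token=YOUR_ACCESS_TOKEN")
```

Using urlencode takes care of escaping the redirect URI, and parse_qs keeps working if the response carries extra fields next to access_token.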

It is true that Facebook's API is targeted at developers who are creating apps that will be used by many users.
Thankfully, the new Graph API is much simpler to use than its predecessor, and shouldn't be terribly difficult for you to work with without using or creating a lot of underlying infrastructure.
You will need to implement authorization, but this is not difficult, and as long as you prompt the user for the offline_access permission, it'll only need to be done once.
The documentation on Desktop Authentication would probably be most relevant to you at this point, though you might want to move to the javascript-based authentication once you've got a web app up and running.
Once the authentication is done, all you're doing is making GET requests to various urls and working with the resulting JSON.
Here's the documentation about Events, and you can get a list of friends from the friends connection of a User.

I'm no expert on Facebook/Facebook Connect, but I've seen it used and have used applications built with it, and it seems there's really only the 'official' way to do it. I'm afraid your best bet would probably be something along the lines of this:
http://wiki.developers.facebook.com/index.php/Connect/Authentication_and_Authorization
Regardless of how you actually 'use' it, you'll still need to authorize the application to connect to the account and this means having a Facebook App as well.

The answer to Facebook application authentication is hard to find but is actually found within the "Analytics" page of the Graph API.
Request the following URL: https://graph.facebook.com/oauth/access_token?client_cred&client_id=yourappid&client_secret=yourappsecret and you will be given an access_token that you may use on all other calls.
The Facebook provided APIs do NOT currently provide this level of functionality.
