What readily available algorithms could I use to data-mine Twitter to find the degrees of separation between two people on Twitter?
How does the problem change when the social graph keeps changing and updating constantly?
And is there any dump of Twitter social graph data I could use, rather than starting over with so many API calls?
From the Twitter API documentation:
What's the Data Mining Feed and can I have access to it?
The Data Mining Feed is an expanded version of our /statuses/public_timeline REST API method. It returns 600 recent public statuses, cached for a minute at a time. You can request it up to once per minute to get a representative sample of the public statuses on Twitter. We offer this for free (and with no quality of service guarantees) to researchers and hobbyists. All we ask is that you provide a brief description of your research or project and the IP address(es) you'll be requesting the feed from; just fill out this form. Note that the Data Mining Feed is not intended to provide a contiguous stream of all public updates on Twitter; please see above for more information on the forthcoming "firehose" solution.
and also see: Streaming API Documentation
There was a company offering a dump of the social graph, but it was taken down and is no longer available. As you already realized, this is hard precisely because the graph changes all the time.
I would recommend checking out their social graph API methods, as they give the most information with the fewest API calls.
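The basic idea, sketched in Python below, is a breadth-first search over the follow graph, where the neighbour lookup stands in for a friends/ids social-graph call (my own illustration; authentication, paging, and rate-limit handling are omitted):

from collections import deque

def degrees_of_separation(source, target, get_neighbors, max_depth=6):
    """Breadth-first search over the follow graph; returns the hop count or None.

    get_neighbors(user) should return the IDs that `user` follows -- in practice
    a wrapper around the friends/ids endpoint (up to 5,000 IDs per call).
    """
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        user, depth = queue.popleft()
        if user == target:
            return depth
        if depth >= max_depth:
            continue
        for nxt in get_neighbors(user):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, depth + 1))
    return None

# Toy usage with an in-memory graph standing in for the API:
follows = {"a": ["b", "c"], "b": ["d"], "c": [], "d": []}
print(degrees_of_separation("a", "d", lambda u: follows.get(u, [])))  # -> 2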
There might be other ways of doing it, but I've just spent the past 10 minutes looking at doing something similar and stumbled upon this question.
I'd use an undirected (and weighted, since I want to factor in location too) graph. JGraphT or a similar library would do; JGraphT is Java-based but comes with a range of prewritten algorithms.
You can then use the Bellman–Ford algorithm: bounded to k relaxation rounds, it finds the shortest paths that use at most k edges, which Dijkstra's algorithm doesn't give you directly.
http://en.wikipedia.org/wiki/Bellman%E2%80%93Ford_algorithm
I used it recently in a flight-routing project, iterating the bound upwards to find the shortest path with the fewest 'hops' (edges).
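For illustration, a minimal hop-bounded Bellman–Ford in Python (my own sketch, not JGraphT's implementation); after k relaxation rounds, dist[v] holds the cheapest path to v using at most k edges:

import math

def bellman_ford_k_hops(edges, n, source, k):
    """Shortest distances from `source` along paths of at most k edges.

    edges is a list of (u, v, weight) tuples over nodes 0..n-1.
    """
    dist = [math.inf] * n
    dist[source] = 0.0
    for _ in range(k):
        prev = dist[:]  # relax against a snapshot so each round adds one edge at most
        for u, v, w in edges:
            if prev[u] + w < dist[v]:
                dist[v] = prev[u] + w
    return dist

# Toy usage: the direct edge 0->2 is expensive, the two-hop route is cheap.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 5.0)]
print(bellman_ford_k_hops(edges, 3, 0, 1))  # [0.0, 1.0, 5.0] -- one hop allowed
print(bellman_ford_k_hops(edges, 3, 0, 2))  # [0.0, 1.0, 2.0] -- two hops allowed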
I want to get the route between two points using the Google Maps API, but I also want to avoid some coordinates between them.
I have been investigating this feature, but I do not know if it is possible. See these threads:
Is there a way to avoid a specific road or coordinate in Google Directions?
Avoid some coordinates in routes using Google Directions API Android
Does anyone know if it is possible?
Thanks
The avoid feature has been introduced in the Google Maps Distance Matrix API; however, it can only be used to avoid tolls, highways, ferries, and indoor routes.
You can check this on its documentation page:
https://developers.google.com/maps/documentation/distance-matrix/intro
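For example, a request with the avoid parameter looks roughly like this in Python (a sketch using the requests library; the key and locations are placeholders):

import requests

API_KEY = "YOUR_API_KEY"  # placeholder

params = {
    "origins": "Seattle,WA",
    "destinations": "San Francisco,CA",
    "avoid": "tolls",  # only tolls, highways, ferries, and indoor are supported
    "key": API_KEY,
}
resp = requests.get(
    "https://maps.googleapis.com/maps/api/distancematrix/json", params=params
)
# Each row/element pair holds the distance and duration for one origin-destination pair.
print(resp.json()["rows"][0]["elements"][0])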
This feature is pretty popular and has previously been requested in the Google Issue Tracker. If you really need it, you can support it by starring the issue:
https://issuetracker.google.com/issues/35816642
I am working on a project for which I want to extract the timelines of around 500 different Twitter users (for historical analysis, so I only need to retrieve each timeline once; there's no need to keep up with incoming tweets).
While I know the Twitter API only allows the last 3,200 tweets to be retrieved, when I use the basic userTimeline method of the R twitteR package, I only seem to fetch about 20 tweets per call (for users with significantly more recent tweets than that). Is this because of rate limiting, or because I am doing something wrong?
Does anyone have tips for doing this efficiently? I realize it might take a lot of time because of rate limiting; is there a way of automating/iterating this process in R?
I am quite stuck, so thank you very much for any help/tips you may have!
(I have some experience using the Twitter API and the twitteR package to extract tweets with a certain hashtag over a couple of days. I also have basic Python skills, if it turns out to be easier/quicker to do this in Python.)
It looks like the twitteR documentation suggests using the maxID argument for pagination: when you get a batch of results, use the minimum ID in that set minus one as the maxID for the next request, and repeat until you get no more results back (meaning you've reached the beginning of the user's available timeline).
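In Python, the same pattern looks roughly like this (a sketch; fetch_batch stands in for whatever timeline call your library exposes, e.g. userTimeline in twitteR or GetUserTimeline in python-twitter):

def fetch_full_timeline(fetch_batch, count=200):
    """Page backwards through a timeline using the max-ID pattern.

    fetch_batch(max_id, count) should return a list of tweets, each with
    an `id` attribute; max_id=None means 'start from the newest tweet'.
    """
    tweets, max_id = [], None
    while True:
        batch = fetch_batch(max_id=max_id, count=count)
        if not batch:
            break  # reached the start of the (up to 3,200-tweet) window
        tweets.extend(batch)
        max_id = min(t.id for t in batch) - 1  # next page: strictly older tweets
    return tweets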
I have a simple piece of Python code that runs a Radar Search with the Places API and returns the result coordinates in a list. I run into three problems. First, the results pulled this way do not match a search on Google Maps itself with the same coordinates and parameters; specifically, I get MANY more results from Radar Search. Within a radius of 1 km, I get more than 200 results for a restaurant chain name.
Second, the results go beyond my specified 1 km radius; the furthest is 1.3 km away, measured with the haversine formula.
Third, the results are wrong. The keyword field has no effect on the results: searching for "McDonalds" or "Car" with the same parameters yields exactly the same results, and one of them points to an Adidas store when I use its place ID to look up the Google description.
These problems are code-independent; they appear even if I just copy and paste this into the URL bar:
https://maps.googleapis.com/maps/api/place/radarsearch/json?location=39.876186,116.439424&radius=1000&keyword=McDonalds&key=KEY
I have seen another recent post about the Places API malfunctioning in a similar way. Any help is appreciated. Thanks.
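For anyone reproducing the radius check mentioned in the question, a haversine sketch in Python (my own illustration; the result coordinate below is hypothetical):

import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6371000.0  # mean Earth radius in metres
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

center = (39.876186, 116.439424)   # the search centre from the question
result = (39.8800, 116.4450)       # hypothetical coordinate from the response
d = haversine_m(*center, *result)
print(f"{d:.0f} m from centre; within the 1 km radius: {d <= 1000}")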
I have a support ticket open with Google about this, as we're Enterprise customers, and they have confirmed there is an issue and they're working on it. From my conversations with them over the last few days:
"There have been a few other reports of this issue and we've reported the problem to the Places API team. I'll get back to you as soon as we have more information from them."

"We've received some other reports of this and the API engineers are looking at the issue with the highest priority. There's no obvious cause yet, but we'll let you know when they're done investigating and have determined a fix."

"I'm sorry to hear about the complaints that you're receiving, but unfortunately the engineers haven't been able to give me an ETA yet. I expect to hear back from them soon but can't give an estimate yet."
I'll post updates here as I get them.
UPDATE 9/8: Google's support is saying this issue will be fixed by end of the week.
UPDATE 9/12: Google fixed it. It was being tracked here: https://code.google.com/p/gmaps-api-issues/issues/detail?id=7082
I tried this API (the insert method):
https://developers.google.com/youtube/v3/docs/playlistItems/insert#try-it
I got a 200 OK and a JSON response reflecting what I had inserted.
What I specified was the video resource (kind, id) and a playlistId, which is my 'watchhistory' playlist.
The strange thing is that I cannot see the new item when I do a list (GET) with the corresponding API call. However, if I actually go to youtube.com, I see the new item (which I never actually watched) in my watch history, in the right position (newest first). My objective is still to get this information through the API calls. Has anybody experienced something similar?
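For reference, the equivalent calls in Python via the google-api-python-client library look roughly like this (a sketch; the OAuth flow that produces creds is omitted, and the playlist and video IDs are placeholders):

from googleapiclient.discovery import build

def insert_then_list(creds, playlist_id, video_id):
    """Insert a video into a playlist, then read the playlist back via the API."""
    youtube = build("youtube", "v3", credentials=creds)
    youtube.playlistItems().insert(
        part="snippet",
        body={
            "snippet": {
                "playlistId": playlist_id,
                "resourceId": {"kind": "youtube#video", "videoId": video_id},
            }
        },
    ).execute()
    # Per the answer below, this listing may lag behind what youtube.com shows.
    response = youtube.playlistItems().list(
        part="snippet", playlistId=playlist_id, maxResults=50
    ).execute()
    return [item["snippet"]["title"] for item in response.get("items", [])]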
I think this is related to an internal engineering issue concerning the API not reporting the full watch history. It's something that they've been working on for a few months now, with limited success (they've pushed a couple of changes, but it hasn't remedied the problem). Basically, until it is fixed, the v3 watch history only returns an older, incomplete subset of the actual history.
Here's the issue if you want to track it or contribute to the data to help resolve it:
https://code.google.com/p/gdata-issues/issues/detail?id=4642
I need to get, by month: the number of people who have followed a certain account, the number of people who have unfollowed that same account, the total number of tweets, and the total number of times something the account tweeted has been retweeted.
I am using Python to do this and have installed python-twitter, but as the documentation is rather sparse, I'm having to do a lot of guesswork. Could anyone point me in the right direction? I was able to get authenticated using OAuth, so that's not an issue; I just need some help with getting those numbers.
Thank you all.
These types of statistical breakdowns are not generally available via the Twitter API. Depending on your sample date range, you may have luck with Twittercounter.com's API (you can sign up for an API key on their site).
The API is rate-limited to 100 calls per hour unless you get whitelisted, and you can get results for the previous 14 days. An example request is below:
http://api.twittercounter.com?twitter_id=813286&apikey=[api_key]
The results, in JSON, look like this:
{"version":"1.1","username":"BarackObama","url":"http:\/\/www.barackobama.com","avatar":"http:\/\/a1.twimg.com\/profile_images\/784227851\/BarackObama_twitter_photo_normal.jpg","followers_current":7420937,"date_updated":"2011-04-16","follow_days":"563","started_followers":"2264457","growth_since":5156480,"average_growth":"9166","tomorrow":"7430103","next_month":"7695917","followers_yesterday":7414507,"rank":"3","followers_2w_ago":7243541,"growth_since_2w":177396,"average_growth_2w":"12671","tomorrow_2w":"7433608","next_month_2w":"7801067","followersperdate":{"date2011-04-16":7420937,"date2011-04-15":7414507,"date2011-04-14":7400522,"date2011-04-13":7385729,"date2011-04-12":7370229,"date2011-04-11":7366548,"date2011-04-10":7349078,"date2011-04-09":7341737,"date2011-04-08":7325918,"date2011-04-07":7309609,"date2011-04-06":7306325,"date2011-04-05":7283591,"date2011-04-04":7269377,"date2011-04-03":7257596},"last_update":1302981230}
The retweet stats aren't available from Twittercounter, but you might be able to obtain those from Favstar (although they don't have a public API currently).
My problem is that I also need unfollow statistics, which Twittercounter does not supply.
My solution was to access the Twitter REST API directly, using the oauth2 library in Python. I found this very simple compared to some of the other Twitter libraries for Python out there. This example was particularly helpful: http://parand.com/say/index.php/2010/06/13/using-python-oauth2-to-access-oauth-protected-resources/
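Roughly, the pattern looks like this (a sketch; keys, tokens, and the endpoint version are placeholders, and deriving unfollows by diffing periodic snapshots of follower IDs is my reading of the approach, since the API doesn't report unfollows directly):

import json
import oauth2 as oauth  # the python-oauth2 library mentioned above

def twitter_get(url, consumer_key, consumer_secret, token_key, token_secret):
    """One signed GET against the Twitter REST API using python-oauth2."""
    consumer = oauth.Consumer(key=consumer_key, secret=consumer_secret)
    token = oauth.Token(key=token_key, secret=token_secret)
    client = oauth.Client(consumer, token)
    resp, content = client.request(url, method="GET")
    return json.loads(content)

def unfollowers(previous_ids, current_ids):
    """People present in the older snapshot but missing from the newer one."""
    return set(previous_ids) - set(current_ids)

# e.g. snapshot the follower list periodically via the followers/ids endpoint:
# ids = twitter_get("https://api.twitter.com/1.1/followers/ids.json?screen_name=NAME",
#                   KEY, SECRET, TOKEN, TOKEN_SECRET)["ids"]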