Using Python to extract LinkedIn information [duplicate]

This question already exists:
Python: visiting random LinkedIn profiles [closed]
I'm trying to visit, say, a set of 8,000 LinkedIn profiles that belong to people who have a certain first name (just for example, let's say "Larry"), and then extract the kinds of jobs each user has held in the past. Is there an efficient way to do this? I need each Larry to be picked independently of the others; basically, traversing someone's network isn't a good way to do this. Is there a way to completely randomize how the Larrys are picked?
I don't even know where to start. Thanks.

To Start:
Trying to crawl the HTML that LinkedIn serves to your browser would be almost suicidal.
Check their APIs (particularly the People Search API) and their code samples.
Important disclaimer found in the People Search API documentation:
People Search API is a part of our Vetted API Access Program. You must
apply here and get LinkedIn's approval before using this API.
MAYBE with that in mind you'll be able to write a script that queries and parses those APIs. For instance, retrieving users with Larry as a first name: http://api.linkedin.com/v1/people-search?first-name=Larry
Once you are approved by LinkedIn, have retrieved some data from their APIs, and have tried some JSON or XML parsing (whatever the APIs return), you will have something more specific to ask.
If you still want to crawl the HTML returned by LinkedIn when you hit https://www.linkedin.com/pub/dir/?first=Larry&last=&search=Search, take a look at BeautifulSoup.
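If you do go the scraping route, a minimal sketch with requests and BeautifulSoup might look like this. The href filtering is an assumption about LinkedIn's markup, which changes often, and scraping may violate their terms of service:

```python
import requests
from bs4 import BeautifulSoup

# Hedged sketch: fetch the public name directory mentioned above and
# list profile links. The "/pub/"//"/in/" filter is an assumption about
# how profile URLs look; adapt it to whatever LinkedIn actually serves.
resp = requests.get(
    "https://www.linkedin.com/pub/dir/",
    params={"first": "Larry", "last": "", "search": "Search"},
)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
for link in soup.find_all("a", href=True):
    if "/pub/" in link["href"] or "/in/" in link["href"]:
        print(link["href"])
```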

Related

How to get list of categories in Google Books API?

I was searching for an already answered question about this but couldn't find one so please forgive me if I somehow missed it.
I'm using Google Books API and I know I can search a book by specific category.
My question is, how can I get all the available categories from the API?
I looked in the API documentation but couldn't find any mention of this.
The Google Books API does not have an endpoint for returning categories that are not associated with a specific book.
The Google Books API is only there to list books. You can:
search and browse through the list of books that match a given query.
view information about a book, including metadata, availability and price, and links to the preview page.
manage your own bookshelves.
You can see the categories of a given book, but you cannot get a list of all available categories in the whole system.
You may be interested to know this has been on their to-do list since 2012 (see the "category list" issue):
We have numerous requests for this and we're investigating how we can properly provide the data. One issue is Google does not own all the category information. "New York Times Bestsellers" is one obvious example. We need to first identify what we can publish through the API.
Workaround
I worked around it by implementing my own category list mechanism, so I can pull all the categories that exist in my app's database.
(Unfortunately, the newly announced ScriptDb deprecation means my whole system will go to waste in a couple of months anyway... but that's another story.)
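As an illustration of that workaround, here is a minimal sketch that accumulates the categories of every volume your app sees into a local JSON file. The file name and function are hypothetical stand-ins for your own datastore:

```python
import json

CATALOG = "categories.json"  # hypothetical stand-in for the app's database

def record_categories(volume_info):
    """Merge one volume's categories into the locally persisted list."""
    try:
        with open(CATALOG) as f:
            known = set(json.load(f))
    except FileNotFoundError:
        known = set()
    known.update(volume_info.get("categories", []))
    with open(CATALOG, "w") as f:
        json.dump(sorted(known), f, indent=2)
```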
https://support.google.com/books/partner/answer/3237055?hl=en
Scroll down to subject/genres and you will see this link.
https://bisg.org/page/bisacedition
This list is apparently a list of subjects, a.k.a. categories, for North American books. I am making various GET requests with an API testing tool and getting, for the most part, perfect matches between whatever subject I choose from the BISG subjects list and what comes back in the JSON response under the "categories" key (you may have to drop a word from the query string, e.g. "criticism" instead of "literary criticism").
Ex: GET https://www.googleapis.com/books/v1/volumes?q=business+subject:juvenile+fiction
Long story short, the BISG link is where I'm pretty sure Google got all the options for their "categories" key from.
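To reproduce that experiment from Python instead of an API testing tool, a sketch like the following should work; it mirrors the GET example above and prints whatever comes back under the "categories" key:

```python
import requests

# Query the public Books API for a subject. Basic volume searches work
# without an API key, though Google may throttle anonymous requests.
resp = requests.get(
    "https://www.googleapis.com/books/v1/volumes",
    params={"q": "business subject:juvenile fiction"},
)
resp.raise_for_status()

for item in resp.json().get("items", []):
    info = item["volumeInfo"]
    print(info.get("title"), "->", info.get("categories"))
```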

Find a Facebook group ID via graph

I'm writing a script to convert Facebook user IDs in Python. To get a user ID I simply go to:
https://graph.facebook.com/USERNAME
Is there any way to access this for groups as well?
I have found other questions like this but none give a definitive answer.
Do I need to use OAuth?
I have seen several websites that do what I need, such as:
http://lookup-id.com/
https://www.wallflux.com/facebook_id/
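For what it's worth, that lookup from Python is just an HTTP GET. Whether it works for groups depends on the Graph API version; newer versions require an access token, so treat this as a hedged sketch with placeholder credentials and a hypothetical group name:

```python
import requests

ACCESS_TOKEN = "YOUR_ACCESS_TOKEN"  # placeholder; obtained via Facebook OAuth

def graph_id(name_or_id):
    """Look up an object (user, page, or group) and return its numeric ID."""
    resp = requests.get(
        f"https://graph.facebook.com/{name_or_id}",
        params={"access_token": ACCESS_TOKEN},
    )
    return resp.json().get("id")

print(graph_id("SomeGroupName"))  # hypothetical group name
```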

How can I scrape Twitter tweets that go 2 years back?

I've been looking into scraping, and I can't manage to scrape Twitter searches that date a long way back using Python code. I can do it with an add-on for Chrome, but that falls short since it will only let me obtain a very limited number of tweets. Can anybody point me in the right direction?
I found a related answer here:
https://stackoverflow.com/a/6082633/1449045
you can get up to 1500 tweets for a search; 3200 for a particular user's timeline
source: https://dev.twitter.com/docs/things-every-developer-should-know
see "There are pagination limits"
Here you can find a list of libraries that simplify use of the APIs in different languages, including Python:
https://dev.twitter.com/docs/twitter-libraries
You can use snscrape. It doesn't require a Twitter developer account, and it can go back many years.
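A minimal sketch of snscrape's Python interface, assuming a recent snscrape release (attribute names have varied across versions, and the library is unofficial, so this may break when Twitter changes things):

```python
import snscrape.modules.twitter as sntwitter

# The since:/until: search operators let you reach back years.
query = "python since:2015-01-01 until:2016-01-01"

for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
    if i >= 100:  # just take a sample; the generator can yield many more
        break
    # `content` holds the tweet text in most snscrape versions
    # (newer releases renamed it to `rawContent`).
    print(tweet.date, tweet.user.username, tweet.content)
```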

Python scripting for XBMC

I am new to programming and to Python itself; I have no programming experience. I have managed to read up on Python and done some fairly basic Python tutorials, and now I am ready for my first project in Python.
I am basing my project around XBMC; I want to develop some addons for this awesome media center.
I have a few websites that I want to scrape and display in XBMC. One is a music website and one is a paid TV website which is only available to people with accounts. I have managed to scrape a website with feedparser, but I have no idea how to output these titles and links to play in XBMC.
My question here is: where do I start? How do I construct the script for these websites? What tools/libraries/modules do I need? And what do I need to do to include it in XBMC?
On the general topic of webpage scraping, which has been asked a ton of times, the common answer is always Mechanize/Beautiful Soup for Python. That would allow you to actually get your data.
Once you have your data, it's then just a matter of formatting it the way you want for your XBMC app: http://wiki.xbmc.org/index.php?title=HOW-TO:Write_Python_Scripts_for_XBMC
It's a two-step process:
Get your data from a source and format it into some common structure
Use the common structure to populate your elements in the xbmc script
What you actually want to do with your script will determine how you use your data. If it's simply providing information, then the link above pretty much explains it.
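As a rough illustration of step 2, here is a minimal sketch of an XBMC/Kodi plugin listing, where scrape_items() is a placeholder for your step-1 scraping code:

```python
import sys
import xbmcgui
import xbmcplugin

# This runs inside XBMC as a plugin, so XBMC passes the plugin handle
# as the first argument.
handle = int(sys.argv[1])

def scrape_items():
    # Placeholder for your step-1 scraping (feedparser, BeautifulSoup, ...).
    return [("Example stream", "http://example.com/stream.mp3")]

# Step 2: turn each (title, url) pair into a playable directory entry.
for title, url in scrape_items():
    item = xbmcgui.ListItem(label=title)
    xbmcplugin.addDirectoryItem(handle, url, item, isFolder=False)

xbmcplugin.endOfDirectory(handle)
```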

Crawler for Twitter social graph via python

I am sorry for asking, but I am new to writing crawlers.
I would like to crawl Twitter for users and the follow relationships among them, using Python.
Any recommendations for starting points, such as tutorials?
Thank you very much in advance.
I'm a big fan of Tweepy myself - https://github.com/tweepy/tweepy
You'll have to refer to the Twitter docs for the API methods that you're going to need. As far as I know, Tweepy wraps all of them, but I recommend looking at Twitter's own docs to find out which ones you need.
To construct a following/follower graph, you're going to need some of these (a short sketch follows the list):
GET followers/ids - grab followers (in IDs) for a user
GET friends/ids - grab followings (in IDs) for a user
GET users/lookup - grab up to 100 users, specified by IDs
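A hedged sketch of the followers/ids piece using Tweepy's v3-era interface (the credentials are placeholders from Twitter's developer portal; method names changed in Tweepy 4):

```python
import tweepy

# Placeholder credentials from Twitter's developer portal.
auth = tweepy.OAuthHandler("CONSUMER_KEY", "CONSUMER_SECRET")
auth.set_access_token("ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")
api = tweepy.API(auth, wait_on_rate_limit=True)

def follower_edges(screen_name):
    """Yield (follower_id, user_id) edges by paging through followers/ids."""
    user = api.get_user(screen_name=screen_name)
    for page in tweepy.Cursor(api.followers_ids, user_id=user.id).pages():
        for follower_id in page:
            yield (follower_id, user.id)

# Usage: collect the follower edges of one seed account.
# edges = list(follower_edges("twitter"))
```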
Besides reading the Twitter API docs?
A good starting point would be the great Python Twitter library by Mike Verdone, which personally I think is the best one (also see the introduction here).
Also see this question on Stack Overflow.
