I'm using PRAW to write a bot for reddit. I already know how to use the get_comments() function to get all the comments in a subreddit. However, I'd also like to get the titles of all the posts in a subreddit, and after going through the PRAW docs I couldn't find a function that does this.
I just want to go into a subreddit, fetch all the post titles, and store them in an object.
Could someone tell me how to go about achieving this?
import praw

r = praw.Reddit('demo')
subreddit = r.get_subreddit('stackoverflow')
for submission in subreddit.get_hot(limit=10):
    print(submission.title)
This information is available in the PRAW documentation.
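If you also want to keep the titles around in one object, as the question asks, here's a minimal sketch. The collect_titles helper is my own name, not a PRAW function, and the commented-out usage assumes the same PRAW 3-era client shown above:

```python
def collect_titles(submissions):
    """Collect the title of each submission-like object into a list."""
    return [s.title for s in submissions]

# Usage with the PRAW 3-era client from the answer above (needs network access):
# import praw
# r = praw.Reddit('demo')
# titles = collect_titles(r.get_subreddit('stackoverflow').get_hot(limit=10))
# print(titles)
```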
Seems pretty late, but anyway: you can go through the official reddit API's JSON response format. There you can see all the attributes that are available for a particular object.
Here's the GitHub link for the reddit API.
Edit: you can also use pprint(vars(object_name)) to dump all of an object's attributes.
Related
I'm using PRAW to scrape some content from reddit. I can get info on a submission (praw.objects.Submission), but I don't see from the documentation how to tell whether the post is flagged as NSFW. Is it possible to figure this out through PRAW, or should I use another API wrapper?
You can figure it out through PRAW by retrieving a submission object and then checking the over_18 attribute on the object itself (as @Kevin suggested).
Here's an example:
if submission.over_18:
    ...
else:
    ...
And for future reference, by using dir(object) you'll be able to see both the attributes and the methods that pertain to the reddit API (which you can use to inspect all the properties that affect a given object). You can most likely ignore everything that starts with an underscore.
Or you can go straight to the source where PRAW gets its data. The variable names are not set by PRAW; they come from that JSON (linked above).
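Building on the over_18 check, here's a small sketch that partitions submissions into safe and NSFW lists. The split_by_nsfw helper name is my own, not part of PRAW; it only assumes each object has the over_18 attribute described above:

```python
def split_by_nsfw(submissions):
    """Split submission-like objects into (safe, nsfw) lists via over_18."""
    safe, nsfw = [], []
    for s in submissions:
        (nsfw if s.over_18 else safe).append(s)
    return safe, nsfw
```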
Issues using the SoundCloud API with Python to get user info
I've downloaded the soundcloud library and followed the tutorials, and I saw on the SoundCloud dev pages that the user syntax is, for example, /users/{id}/favorites.
I just don't know how to use Python to query user information. Specifically, I'd like to print a list of tracks that a given user liked (or favorited, but liked would be better).
Any help would be greatly appreciated. Thanks!
Generally, it's better to mention what you have tried and to show some code; that makes it easier for people on Stack Overflow to help you. Regardless, maybe looking at SoundCloud's Python wrapper will help you.
You can also do the following:

import soundcloud

token = 'user_access_token'
client = soundcloud.Client(access_token=token)
user_info = client.get('/me')
user_favorites = client.get('/me/favorites')
user_tracks = client.get('/me/tracks')

and so on...
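For another user's likes rather than your own, the /users/{id}/favorites path from the question can be built up like this. The favorites_path helper is my own illustration, not part of the soundcloud library; the commented-out usage assumes the authenticated client from above:

```python
def favorites_path(user_id):
    """Build the /users/{id}/favorites path from the SoundCloud dev docs."""
    return '/users/{}/favorites'.format(user_id)

# Usage sketch (needs the authenticated client from above and network access):
# for track in client.get(favorites_path(3207)):
#     print(track.title)
```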
I figured it out; pretty simple, I just didn't know the exact syntax.
users = client.get('/users', q='keyword')
According to their documentation:
This should be enough to get the hottest new reddit submissions:
r = client.get(r'http://www.reddit.com/api/hot/', data=user_pass_dict)
But it doesn't, and I get a 404 error. Am I getting the URL for the data request wrong?
http://www.reddit.com/api/login works though.
Your question specifically asks what you need to do to get the "hottest new" submissions. "Hottest new" doesn't really make sense as there is the "hot" view and a "new" view. The URLs for those two views are http://www.reddit.com/hot and http://www.reddit.com/new respectively.
To make those URLs more code-friendly, you can append .json to the end of the URL (any reddit URL for that matter) to get a json-representation of the data. For instance, to get the list of "hot" submissions make a GET request to http://www.reddit.com/hot.json.
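The .json rule above can be sketched in a couple of lines. The json_listing_url helper name is mine, and the commented-out fetch (written for Python 3's stdlib urllib; urllib2 was the Python 2 equivalent) assumes reddit's requirement of a non-default User-Agent:

```python
def json_listing_url(base, view):
    """Append .json to a reddit listing URL, e.g. /hot -> /hot.json."""
    return '{}/{}.json'.format(base.rstrip('/'), view)

# Usage sketch (needs network access; reddit rejects urllib's default agent):
# from urllib.request import Request, urlopen
# req = Request(json_listing_url('http://www.reddit.com', 'hot'),
#               headers={'User-Agent': 'my-demo-script/0.1'})
# print(urlopen(req).read())
```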
For completeness: in your example you attempt to pass in data=user_pass_dict. That's definitely not going to work the way you expect it to. While logging in is not necessary for what you want to do, if you need to make more complicated use of reddit's API from Python, I strongly suggest using PRAW. With PRAW you can iterate over the "hot" submissions via:
import praw

r = praw.Reddit('<REPLACE WITH A UNIQUE USER AGENT>')
for submission in r.get_frontpage():
    # do something with the submission
    print(vars(submission))
According to the docs, use /hot rather than /api/hot:
r = client.get(r'http://www.reddit.com/hot/', data=user_pass_dict)
I'm sorry for asking, but I'm new to writing crawlers.
I would like to crawl Twitter for users and the follow relationships among them, using Python.
Any recommendations for starting points, such as tutorials?
Thank you very much in advance.
I'm a big fan of Tweepy myself - https://github.com/tweepy/tweepy
You'll have to refer to the Twitter docs for the API methods that you're going to need. As far as I know, Tweepy wraps all of them, but I recommend looking at Twitter's own docs to find out which ones you need.
To construct a following/follower graph, you're going to need some of these:
GET followers/ids - grab followers (in IDs) for a user
GET friends/ids - grab followings (in IDs) for a user
GET users/lookup - grab up to 100 users, specified by IDs
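As a sketch of turning those endpoint results into a graph, here's a tiny edge-builder. The follow_edges helper is my own, and note that Tweepy's method names vary by version (followers_ids in Tweepy 3.x, get_follower_ids in 4.x; check the docs for the version you have):

```python
def follow_edges(user_id, follower_ids):
    """Turn one user's follower-ID list into (follower, followee) edges."""
    return [(follower, user_id) for follower in follower_ids]

# Usage sketch with Tweepy (needs auth and network access):
# import tweepy
# api = tweepy.API(auth)
# edges = follow_edges(12, api.followers_ids(user_id=12))  # Tweepy 3.x name
```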
Besides reading the Twitter API docs?
A good starting point would be the great Python twitter library by Mike Verdone, which I personally think is the best one (there's also an introduction here).
Also see this question on Stack Overflow.
I want to collect old tweets from a specific period. I found out that Topsy provides the Otter API to do that. I'm developing in Python, so I found the python-otter library http://otterapi.googlecode.com/svn/trunk/. However, there is no documentation and I have no idea how to use it! Does anybody know if there is any documentation at all? And by the way, is there another way I can find old tweets programmatically?
Thanks
The documentation can be found in http://code.google.com/p/otterapi/wiki/Resources
Why not directly make GET requests using urllib2 and the like?
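Along those lines, a hedged sketch of building an Otter search URL by hand. The endpoint is taken from Topsy's old docs (the service has since shut down, so treat this purely as an illustration), the helper name is mine, and it uses Python 3's urllib.parse (urllib2/urllib on Python 2):

```python
from urllib.parse import urlencode

def otter_search_url(query, **params):
    """Build an Otter API search URL from a query and optional parameters."""
    params['q'] = query
    return 'http://otter.topsy.com/search.json?' + urlencode(params)

# e.g. otter_search_url('python', page=2), then GET it with urlopen/urllib2
```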