I'm using PRAW to scrape for some content from reddit. I can get info on a submission (praw.objects.Submission), but I don't see from the documentation how to tell if the post is flagged as NSFW or not. Is it possible to figure this out through PRAW or should I use another api wrapper?
You can figure it out through PRAW by retrieving a submission object and then checking its over_18 attribute (as @Kevin suggested).
Here's an example:
if submission.over_18:
    # handle the NSFW submission
    ...
else:
    # handle the safe-for-work submission
    ...
And for future reference, dir(object) will show you both the attributes and the methods on an object (useful for seeing all the properties that apply to the object you're testing). You can most likely ignore everything that starts with an underscore.
Or you can go straight to the source where PRAW is getting its data. The attribute names are not set by PRAW; they come from this JSON (linked above).
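A minimal sketch of both ideas, using a stand-in class (hypothetical; with PRAW you would use a real submission object instead):

```python
class FakeSubmission:
    """Stand-in for a praw submission object."""
    over_18 = True       # the NSFW flag exposed in reddit's JSON
    title = "example"
    _cache = None        # internal detail, underscore-prefixed

submission = FakeSubmission()
label = "NSFW" if submission.over_18 else "SFW"

# dir() lists attributes and methods; skip the underscore-prefixed ones.
public = [name for name in dir(submission) if not name.startswith("_")]
print(label, public)
```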
This is my first time coding, so I think my problem is mostly general confusion and difficulty with terms. I have the login and reply functions working on my bot, but I'm stuck on what command to use to narrow my bot's search-for-keyword range to a specific thread or user, instead of an entire subreddit.
I've tried looking at the PRAW documentation and Build-A-Bot tutorials online, but I can't find any compatible commands in Python/PRAW to search a specific Redditor, comment, or subreddit thread.
This is the original command for PRAW that makes my bot search the subreddit for its key phrase:
for comment in r.subreddit('').comments(limit=25):
But I'm trying to hone it in on searching more specifically, so I tried this:
for comment in r.submission('#portion of the URL that has the submission ID in it').comments(limit=25):
But that just returns "TypeError: 'CommentForest' object is not callable."
I've also tried:
for comment in r.user('#Redditor name').comments(limit=25):
But that just returns "TypeError: 'User' object is not callable."
I have zero coding background and I'm actually having a lot of fun with Python thus far! I'm just stuck at this point. Any help and or suggestions would be appreciated!
I think what you want is redditor rather than user. From the praw docs:
# assume you have a Submission instance bound to variable `submission`
redditor1 = submission.author
print(redditor1.name) # Output: name of the redditor
# assume you have a Reddit instance bound to variable `reddit`
redditor2 = reddit.redditor('bboe')
print(redditor2.link_karma) # Output: bboe's karma
You may have already seen them, but the docs can be found here.
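To connect this back to the asker's goal: the keyword filtering itself is plain Python once you have an iterable of comments (in PRAW 4+ that could be submission.comments.list() for one thread, or reddit.redditor('name').comments.new(limit=25) for one user). A sketch with a stand-in Comment class and a hypothetical trigger phrase:

```python
KEY_PHRASE = "hello bot"   # hypothetical trigger phrase

class Comment:             # stand-in for praw's Comment model
    def __init__(self, body):
        self.body = body

def matching_comments(comments, phrase=KEY_PHRASE):
    """Return the comments whose body contains the key phrase."""
    return [c for c in comments if phrase in c.body.lower()]

hits = matching_comments([Comment("Hello Bot, can you help?"),
                          Comment("unrelated chatter")])
print(len(hits))
```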
I'm trying to iterate over submissions of a certain subreddit from the newest to the oldest using PRAW. I used to do it like this:
subreddit = reddit.subreddit('LandscapePhotography')
for submission in subreddit.submissions(None, time.time()):
    print("Submission Title: {}".format(submission.title))
However, when I try to do it now I get the following error:
AttributeError: 'Subreddit' object has no attribute 'submissions'
From looking at the docs I can't seem to figure out how to do this. The best I can do is:
for submission in subreddit.new(limit=None):
    print("Submission Title: {}".format(submission.title))
However, this is limited to the first 1000 submissions only.
Is there a way to do this with all submissions and not just the first 1000 ?
Unfortunately, Reddit removed this function from their API.
Check out the PRAW changelog. One of the changes in version 6.0.0 is:
Removed
Subreddit.submissions as the API endpoint backing the method is no more. See
https://www.reddit.com/r/changelog/comments/7tus5f/update_to_search_api/.
The linked post says that Reddit is disabling Cloudsearch for all users:
Starting March 15, 2018 we’ll begin to gradually move API users over to the new search system. By end of March we expect to have moved everyone off and finally turn down the old system.
PRAW's Subreddit.submissions() used Cloudsearch to search for posts between the given timestamps. Since Cloudsearch has been removed and the search that replaced it doesn't support timestamp search, it is no longer possible to perform a search based on timestamp with PRAW or any other Reddit API client. This includes trying to get all posts from a subreddit.
For more information, see this thread from /r/redditdev posted by the maintainer of PRAW.
Alternatives
Since Reddit limits all listings to ~1000 entries, it is currently impossible to get all posts in a subreddit using their API. However, third-party datasets with APIs exist, such as pushshift.io. As /u/kungming2 said on Reddit:
You can use Pushshift.io to still return data from defined time
periods by using their API:
https://api.pushshift.io/reddit/submission/search/?after=1334426439&before=1339696839&sort_type=score&sort=desc&subreddit=translator
This, for example, allows you to parse submissions to r/translator
between 2012-04-14 and 2012-06-14.
You can retrieve all the data from pushshift.io with an iterative loop: set the start date to the current epoch time, fetch up to 1000 items, then pass the created_utc of the last item in the list as the before parameter to get the next 1000 items, and keep going until nothing is returned.
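The loop just described can be sketched like this; the paging logic is kept separate from the HTTP call, so the same pattern works with any fetcher (with pushshift, fetch_page would be a GET against the search endpoint quoted above with size=1000, sort=desc, sort_type=created_utc, and the given before value). The demo below uses a canned two-page dataset standing in for the API:

```python
def paginate(fetch_page, before=None):
    """Yield items page by page, moving `before` back each round.

    `fetch_page(before)` must return a list of dicts sorted newest-first,
    each carrying a `created_utc` epoch timestamp; an empty list stops
    the loop.
    """
    while True:
        batch = fetch_page(before)
        if not batch:
            return
        yield from batch
        before = batch[-1]["created_utc"]   # oldest item seen so far

# Canned two-page dataset standing in for the pushshift API.
pages = {None: [{"created_utc": 300}, {"created_utc": 200}],
         200: [{"created_utc": 100}]}
items = list(paginate(lambda before: pages.get(before, [])))
print(len(items))
```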
Below is a useful link for further information:
https://www.reddit.com/r/pushshift/comments/b7onr6/max_number_of_results_returned_per_query/
Pushshift doesn't work for private subreddits. In that case you can create a database 1000 submissions at a time from now on (not retroactive).
If you just need as many submissions as possible, you could try using the different sort methods top, hot, and new, and combine their results.
I have a Django project and I am trying to integrate the SynapsePay API. I sent a request that returned a class instance as a response. I am trying to figure out how to cycle through or parse the response to get to the JSON, so that I can grab certain values from it. I have looked everywhere and can't seem to find a way to get into the class objects and reach the JSON part of the return.
Here is the response that I am getting...
I want to grab the _id from both of the objects returned in the response below. Does anyone know how I can do this?
[
<class 'synapse_pay_rest.models.nodes.ach_us_node.AchUsNode'>( {
'user':"<class 'synapse_pay_rest.models.users.user.User'>(id=...49c04e1)",
'json':{}
} )
]
It would help if you linked the library you are using, which I'm guessing is
https://github.com/SynapseFI/SynapseFI-Python
In this case, there really is no need to grab the JSON - the client library has already wrapped the returned JSON in easy-to-use objects. The class instances (not classes) it returns already have an id attribute, which is what you want. You can also get the raw JSON from the json attribute (based on my reading of the source code).
I'd recommend installing IPython (pip install ipython) and using the REPL to execute commands and play around with the response objects. You will quickly be able to use tab completion to find what attributes are available.
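A quick illustration with a stand-in class (hypothetical; the real wrapper classes live in synapse_pay_rest.models) of why no JSON parsing is needed - the attribute access is all you want:

```python
class AchUsNode:                     # stand-in for the library's wrapper class
    def __init__(self, json):
        self.json = json             # raw payload kept by the wrapper
        self.id = json["_id"]        # the field the asker is after

# Stand-in for the list the library returns.
nodes = [AchUsNode({"_id": "a1"}), AchUsNode({"_id": "b2"})]
ids = [node.id for node in nodes]
print(ids)
```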
I'm using praw to write a bot for reddit. I already know how to use the get_comments() function to get all the comments in a subreddit. However, I would like to get the titles of all the posts in a subreddit, and after going through the praw docs I could not find a function that does so.
I just want to go into a subreddit, fetch all the titles of the posts, and store them in an object.
Could someone tell me how I go around achieving this?
import praw

# PRAW 3-style calls; in PRAW 4+ these became reddit.subreddit(...) and subreddit.hot(...)
r = praw.Reddit('demo')
subreddit = r.get_subreddit('stackoverflow')
for submission in subreddit.get_hot(limit=10):
    print(submission.title)
This information is available in the PRAW documentation.
Seems pretty late, but anyway: you can go through the official reddit API's JSON response format. From there you can see all the attributes that are available for a particular object.
Here's the github link for the reddit API
Edit: You can also use pprint(vars(object_name)) (after from pprint import pprint).
According to their documentation:
This should be enough to get the hottest new reddit submissions:
r = client.get(r'http://www.reddit.com/api/hot/', data=user_pass_dict)
But it doesn't and I get a 404 error. Am I getting the url for data request wrong?
http://www.reddit.com/api/login works though.
Your question specifically asks what you need to do to get the "hottest new" submissions. "Hottest new" doesn't really make sense as there is the "hot" view and a "new" view. The URLs for those two views are http://www.reddit.com/hot and http://www.reddit.com/new respectively.
To make those URLs more code-friendly, you can append .json to the end of the URL (any reddit URL for that matter) to get a json-representation of the data. For instance, to get the list of "hot" submissions make a GET request to http://www.reddit.com/hot.json.
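To illustrate what comes back, here is the (abridged) shape of a reddit listing response and how you would pull titles out of it; the sample values are made up:

```python
# Abridged shape of the JSON returned by e.g. https://www.reddit.com/hot.json
listing = {
    "kind": "Listing",
    "data": {
        "children": [
            {"kind": "t3", "data": {"title": "First post"}},
            {"kind": "t3", "data": {"title": "Second post"}},
        ],
        "after": "t3_abc123",   # cursor for fetching the next page
    },
}

titles = [child["data"]["title"] for child in listing["data"]["children"]]
print(titles)
```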
For completeness: in your example, you attempt to pass in data=user_pass_dict. That's definitely not going to work the way you expect it to. While logging in is not necessary for what you want to do, if you need anything more complicated from reddit's API in Python, I strongly suggest using PRAW. With PRAW you can iterate over the "hot" submissions via:
import praw
r = praw.Reddit('<REPLACE WITH A UNIQUE USER AGENT>')
for submission in r.get_frontpage():
    # do something with the submission
    print(vars(submission))
According to the docs, use /hot rather than /api/hot:
r = client.get(r'http://www.reddit.com/hot/', data=user_pass_dict)