Facebook SDK for Python cannot limit page feed query with 'since'

I'm trying to fetch all posts on a specific page after a certain time, using this facebook-sdk package for Python. The last post on the page was posted on 30th May 2017. The problem with my query is that even if I try to limit it with since='2017-06-06T09:00:00+00:00', it still returns all posts from the page.
My timestamp is in exactly the same format as the one I get back when I query Facebook.
Here's the code:
facebook_api = facebook.GraphAPI(access_token='FACEBOOK_PAGE_ACCESS_TOKEN')
facebook_feed = facebook_api.get_object(
    id=FACEBOOK_PAGE_ID,
    fields='feed',
    since='2017-06-06T09:00:00+0000'
)
I don't have a clue what kind of API query URL the facebook-sdk package builds from this code, and I don't know how to check that.
I also tried converting the timestamp to a Unix timestamp with an online tool, but that didn't work either.

get_object() queries the root node directly, not its connections. Use get_connections() instead.
Read more about objects here: https://developers.facebook.com/docs/graph-api/reference/
Try the code below; it works for me.
import facebook

facebook_api = facebook.GraphAPI(access_token='YOUR_ACCESS_TOKEN')
# `since` takes a Unix timestamp; 1496707200 is 2017-06-06T00:00:00+0000
facebook_feed = facebook_api.get_connections('YOUR_PAGE_ID', 'feed', since=1496707200)
print(facebook_feed['data'])
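If you'd rather keep the ISO-8601 string from the question, here is a minimal sketch that converts it to the Unix value since expects (it assumes the timestamp is UTC, which is why the '+0000' offset is simply stripped):
import calendar
import time

def to_epoch(iso_ts):
    # assumes a UTC timestamp like '2017-06-06T09:00:00+0000'; the offset is dropped
    return calendar.timegm(time.strptime(iso_ts[:19], '%Y-%m-%dT%H:%M:%S'))

facebook_feed = facebook_api.get_connections(
    'YOUR_PAGE_ID', 'feed', since=to_epoch('2017-06-06T09:00:00+0000'))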

Related

PRAW 6: Get all submissions of a subreddit

I'm trying to iterate over submissions of a certain subreddit from the newest to the oldest using PRAW. I used to do it like this:
subreddit = reddit.subreddit('LandscapePhotography')
for submission in subreddit.submissions(None, time.time()):
    print("Submission Title: {}".format(submission.title))
However, when I try to do it now I get the following error:
AttributeError: 'Subreddit' object has no attribute 'submissions'
From looking at the docs I can't seem to figure out how to do this. The best I can do is:
for submission in subreddit.new(limit=None):
    print("Submission Title: {}".format(submission.title))
However, this is limited to the first 1000 submissions only.
Is there a way to do this with all submissions and not just the first 1000?
Unfortunately, Reddit removed this function from their API.
Check out the PRAW changelog. One of the changes in version 6.0.0 is:
Removed
Subreddit.submissions as the API endpoint backing the method is no more. See
https://www.reddit.com/r/changelog/comments/7tus5f/update_to_search_api/.
The linked post says that Reddit is disabling Cloudsearch for all users:
Starting March 15, 2018 we’ll begin to gradually move API users over to the new search system. By end of March we expect to have moved everyone off and finally turn down the old system.
PRAW's Subreddit.submissions() used Cloudsearch to search for posts between the given timestamps. Since Cloudsearch has been removed, and the search that replaced it doesn't support timestamp queries, it is no longer possible to search by timestamp with PRAW or any other Reddit API client. This includes trying to get all posts from a subreddit.
For more information, see this thread from /r/redditdev posted by the maintainer of PRAW.
Alternatives
Since Reddit limits all listings to ~1000 entries, it is currently impossible to get all posts in a subreddit using their API. However, third-party datasets with APIs exist, such as pushshift.io. As /u/kungming2 said on Reddit:
You can use Pushshift.io to still return data from defined time
periods by using their API:
https://api.pushshift.io/reddit/submission/search/?after=1334426439&before=1339696839&sort_type=score&sort=desc&subreddit=translator
This, for example, allows you to parse submissions to r/translator
between 2012-04-14 and 2012-06-14.
You can retrieve all the data from pushshift.io with an iterative loop: set the before parameter to the current epoch time, fetch a batch of items, then use the created_utc of the last item in the list as the next before value, and keep going until nothing is returned.
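A minimal sketch of that loop, using the endpoint quoted above; the per-request size cap and exact parameter names are assumptions based on the Pushshift docs of the time:
import time
import requests

def fetch_submissions(subreddit):
    url = 'https://api.pushshift.io/reddit/submission/search/'
    before = int(time.time())  # start from the current epoch date
    while True:
        resp = requests.get(url, params={
            'subreddit': subreddit,
            'before': before,
            'size': 500,              # per-request cap; see the thread linked below
            'sort': 'desc',
            'sort_type': 'created_utc',
        })
        batch = resp.json().get('data', [])
        if not batch:                 # keep going until it stops returning
            break
        for submission in batch:
            yield submission
        before = batch[-1]['created_utc']  # page backwards in time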
Below is a useful link for further information:
https://www.reddit.com/r/pushshift/comments/b7onr6/max_number_of_results_returned_per_query/
Pushshift doesn't work for private subreddits. In that case you can build your own database, 1000 submissions at a time, from now on (it isn't retroactive).
If you just need as many submissions as possible, you could try the different sort methods top, hot, and new, and combine their results, as sketched below.
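A rough sketch of that combination with PRAW 6; the credentials are placeholders, and deduplication is done by submission id since the listings overlap:
import praw

reddit = praw.Reddit(client_id='CLIENT_ID', client_secret='CLIENT_SECRET',
                     user_agent='my-script/0.1')  # placeholder credentials
subreddit = reddit.subreddit('LandscapePhotography')

seen = set()
# each listing is capped at ~1000 items, but their union can be larger
for listing in (subreddit.new(limit=None),
                subreddit.hot(limit=None),
                subreddit.top(limit=None)):
    for submission in listing:
        if submission.id not in seen:
            seen.add(submission.id)
            print("Submission Title: {}".format(submission.title))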

Determine display columns from Saved Search via SuiteTalk / NetSuite?

I'm using Python 2.7 and Zeep to call SuiteTalk v2017_2_0, the SOAP-based NetSuite web service API. The command I'm running is search, like so:
from zeep import Client

netsuite = Client(WSDL)
TransactionSearchAdvanced = netsuite.get_type(
    'ns19:TransactionSearchAdvanced')
TransactionSearchRow = netsuite.get_type('ns19:TransactionSearchRow')
# login removed for brevity
r = netsuite.service.search(TransactionSearchAdvanced(
    savedSearchId=search, columns=TransactionSearchRow()))
Now, the results of this include all the data I want, but I can't figure out how (if at all) I can determine the display columns that the website would show for this saved search, and the order they go in.
I figure I could probably call netsuite.service.get() and pass the internalId of the saved search, but what type do I specify? Along those lines, has anyone found a decent reference for all the objects, type enumerations, etc.?
https://stackoverflow.com/a/50257412/1807800
Check out the above link regarding Search Preferences. It explains how to limit the columns returned to only those in the search.
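A minimal sketch of what that might look like with Zeep, building on the code in the question. The 'ns4' prefix for SearchPreferences is an assumption (inspect your WSDL, e.g. with python -m zeep WSDL, to find the right one), as is the assumption that the WSDL declares a searchPreferences SOAP header for the search operation:
# hypothetical namespace prefix; check your WSDL for the real one
SearchPreferences = netsuite.get_type('ns4:SearchPreferences')
prefs = SearchPreferences(returnSearchColumns=True)  # only return the search's own columns
r = netsuite.service.search(
    TransactionSearchAdvanced(savedSearchId=search),
    _soapheaders={'searchPreferences': prefs})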

How to get all content posted by a Facebook Group using Graph API

I am very new to the Graph API and am trying to write a simple Python script that first identifies all pages that a user has liked and all groups that he/she is a part of. To do this, I used the following:
To get the groups he has joined:
API: /{user-id}/groups
Permissions req: user_groups
To get the pages he has liked:
API: /{user-id}/likes
Permissions req: user_likes
and
url = 'https://graph.facebook.com/' + userId + '/likes?access_token=' + accessToken + '&limit=' + str(limit)
Now that I can see the ids of the groups in the JSON output, I want to hit them one by one and fetch all content (posts, comments, photos, etc.) posted within each group. Is this possible, and if yes, how can I do it? What API calls do I have to make?
That's quite a broad question; before asking here you should have tried searching on SO.
Anyway, I'll tell you broadly how you can do it.
First of all, go through the official documentation of the Graph API: Graph API Reference.
You'll find every API that can be used to fetch data, for example /group and /page, and you'll get to know what kind of access token, with which permissions, is required for each API call.
Here are some API calls useful to you:
to fetch a group's or page's posts: /{group-id|page-id}/posts
to fetch the comments of a post: /{post-id}/comments
to fetch a group's or page's photos: /{group-id|page-id}/photos
and so on. Once you go through the documentation and test some API calls, things will be much clearer. It's quite easy!
Hope it helps. Good luck!
Here's an example using facepy:
from facepy import GraphAPI
import csv
import json

graph = GraphAPI(APP_TOKEN)
groupIDs = ("[id here]", "[etc]")
outfile_name = "teacher-groups-summary-export-data.csv"
f = csv.writer(open(outfile_name, "wb+"))
for gID in groupIDs:
    groupData = graph.get(gID + "/feed", page=True, retry=3, limit=500)
    for data in groupData:
        # DecimalEncoder is a custom json.JSONEncoder (defined elsewhere
        # in the project) that handles Decimal values in the response
        json_data = json.dumps(data, indent=4, cls=DecimalEncoder)
        decoded_response = json_data.decode("UTF-8")
        data = json.loads(decoded_response)
        print "Paging group data..."
        for item in data["data"]:
            ...  # etc, dealing with items...
Check the API reference. You should use feed.
You can use /{group-id}/feed to get an array of Post objects of the group. Remember to include a user access token for a member of the group.
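A minimal sketch of that call with the requests library, following the paging cursors; the Graph API version in the URL and the token value are placeholders:
import requests

ACCESS_TOKEN = 'USER_ACCESS_TOKEN'  # token for a member of the group
url = 'https://graph.facebook.com/v2.10/YOUR_GROUP_ID/feed'  # placeholder version/id
params = {'access_token': ACCESS_TOKEN}

posts = []
while url:
    payload = requests.get(url, params=params).json()
    posts.extend(payload.get('data', []))
    url = payload.get('paging', {}).get('next')  # follow pagination to the next page
    params = {}  # the 'next' URL already includes the access token

print(len(posts))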

404 error while doing an api call to Reddit

According to their documentation:
This should be enough to get the hottest new reddit submissions:
r = client.get(r'http://www.reddit.com/api/hot/', data=user_pass_dict)
But it doesn't, and I get a 404 error. Am I getting the URL for the data request wrong?
http://www.reddit.com/api/login works though.
Your question specifically asks what you need to do to get the "hottest new" submissions. "Hottest new" doesn't really make sense as there is the "hot" view and a "new" view. The URLs for those two views are http://www.reddit.com/hot and http://www.reddit.com/new respectively.
To make those URLs more code-friendly, you can append .json to the end of the URL (any reddit URL for that matter) to get a json-representation of the data. For instance, to get the list of "hot" submissions make a GET request to http://www.reddit.com/hot.json.
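For example, a minimal sketch with the requests library (the User-Agent string is a placeholder; reddit throttles requests with generic agents):
import requests

headers = {'User-Agent': 'my-script/0.1'}  # placeholder; use something descriptive
resp = requests.get('http://www.reddit.com/hot.json', headers=headers)
for child in resp.json()['data']['children']:
    print(child['data']['title'])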
For completeness, in your example you attempt to pass in data=user_pass_dict. That's definitely not going to work the way you expect it to. While logging in is not necessary for what you want to do, if you need more complicated use of reddit's API from Python, I strongly suggest using PRAW. With PRAW you can iterate over the "hot" submissions via:
import praw

r = praw.Reddit('<REPLACE WITH A UNIQUE USER AGENT>')
for submission in r.get_frontpage():
    # do something with the submission
    print(vars(submission))
According to the docs, use /hot rather than /api/hot:
r = client.get(r'http://www.reddit.com/hot/', data=user_pass_dict)

How can I edit a secondary calendar using the Google Python API

I wonder if it's possible to create and delete events with the Google API in a secondary calendar. I know well how to do it in the main calendar, so I'm only asking how to change calendar_service to read and write to another calendar.
I've tried logging in with the secondary calendar's email, but that's not possible; it fails with a BadAuthentication error. The URL was surely correct, because it was read by the API.
Waiting for your help.
I'm answering my own question so I can finally accept this one. The problem was solved some time ago.
The most important answer is in this documentation.
Each query can be run with a uri as an argument, for example InsertEvent(event, uri). The uri can be set manually (from the Google Calendar settings) or automatically, as written in the post below. Note that CalendarEventQuery takes only the username, not the whole URL.
The construction of both goes this way:
user = "abcd1234#group.calendar.google.com"
uri = "http://www.google.com/calendar/feeds/{{ user }}/private/full-noattendees"
What's useful is that you can run queries with different uris and add/delete events in many different calendars in one script, as sketched below.
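A minimal sketch of that pattern, assuming an already-authenticated CalendarService and an event entry built the usual way; the calendar id is hypothetical:
import gdata.calendar.service

cal_client = gdata.calendar.service.CalendarService()
# ... set email/password and call cal_client.ProgrammaticLogin() here ...

user = 'abcd1234@group.calendar.google.com'  # hypothetical secondary calendar id
uri = 'http://www.google.com/calendar/feeds/%s/private/full' % user

# `event` is a gdata.calendar.CalendarEventEntry built beforehand;
# passing the uri explicitly targets the secondary calendar instead of the default
created = cal_client.InsertEvent(event, uri)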
Hope someone finds it helpful.
I got the same issue, but I found this solution (I don't remember where).
The solution is to extract the secondary calendar's user from the src URL provided by Google.
It is probably not the best one, but it works.
Note: the code is extracted from a real project [some parts have been removed] and must be adapted to your particular case; it is provided only as a sample to support the explanation (it will not work as is).
# 1 - Connect using the main user email address in the classical way
cal_client = gdata.calendar.service.CalendarService()
# insert the usual connection stuff here
# 2 - For each existing calendar
feed = cal_client.GetAllCalendarsFeed()
# a loop to find the calendar by its title (cal_title)
for a_calendar in feed.entry:
    if cal_title in a_calendar.title.text:
        cal_user = a_calendar.content.src.split('/')[5].replace('%40', '@')
        # If you print a_calendar.content.src.split('/') you will see that the URL
        # contains an email-like identifier. This is the one to use to work
        # with the calendar
Then you just have to use cal_user in place of the default user in the API calls to work on the secondary calendar.
The replace() call is required because the Google API functions do internal conversions on special characters like '%'.
I hope this will help you.
