Facebook API using Python. How to print out comments from ALL posts? - python

I am new to the Facebook API. Currently, I am trying to print out ALL the comments that have been posted on this Facebook page called 'leehsienloong'. However, I can only print out a total of 700+ comments, and I'm sure there are more than that in total.
I found out that the problem is that I never request the next page of results, so I only print the comments from the first page. I read about paging in the Facebook API, but I still do not understand how to write the code for paging.
Is there anyone out there who would be able to help/assist me? I really need help. Thank you.
Here is my code, without paging:
import facebook  #sudo pip install facebook-sdk
import itertools
import json
import re
import requests

access_token = "XXX"
user = 'leehsienloong'

graph = facebook.GraphAPI(access_token)
profile = graph.get_object(user)
posts = graph.get_connections(profile['id'], 'posts')

Jstr = json.dumps(posts)
JDict = json.loads(Jstr)

count = 0
for i in JDict['data']:
    allID = i['id']
    try:
        allComments = i['comments']
        for a in allComments['data']:
            count += 1
            print a['message']
    except (UnicodeEncodeError):
        pass

print count

You can use the limit parameter to increase the number of comments to be fetched. The default is 25. You can increase it like this:
posts = graph.get_connections(profile['id'], 'posts', limit=100)
But a more convenient way would be to get the previous and next pages from paging and make multiple requests.
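For example, a minimal paging sketch for the posts connection (assuming the response keeps the usual paging.next link, and reusing the same access_token and graph setup as in the question) could look like this:

import facebook  #sudo pip install facebook-sdk
import requests

access_token = "XXX"  # your token, as in the question
graph = facebook.GraphAPI(access_token)
profile = graph.get_object('leehsienloong')

all_posts = []
posts = graph.get_connections(profile['id'], 'posts', limit=100)
while True:
    all_posts.extend(posts['data'])
    # 'paging' -> 'next' is only present while more pages remain
    next_url = posts.get('paging', {}).get('next')
    if not next_url:
        break
    posts = requests.get(next_url).json()

print(len(all_posts))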

To get all the comments of a post, the logic should be something like this:
comments = []
for post in posts["data"]:
    # First page of comments for this post
    post_comments = graph.get_connections(id=post["id"], connection_name="comments")
    comments.extend(post_comments["data"])
    # Keep following the paging "next" link until there are no more pages
    while True:
        try:
            post_comments = requests.get(post_comments["paging"]["next"]).json()
            comments.extend(post_comments["data"])
        except KeyError:
            break

Related

How can I check a web-scraped page with requests in real time (always auto-updating)? Python

I'm a young programmer and I have a question.
I have code that checks discount percentages on https://shadowpay.com/en?price_from=0.00&price_to=34.00&game=csgo&hot_deal=true
and I want to make it happen in real time.
Questions:
Is there a way to make it check in real time, or only by refreshing the page?
If it is by refreshing the page: how can I make it refresh the page? I saw older answers, but they did not work for me because they only worked in their code.
(I tried to request-get it every time the while loop runs, but it doesn't work; or should it?)
This is the code:
import json
import requests
import time
import plyer
import random
import copy

min_notidication_perc = 26; un = 0; us = ""; biggest_number = 0

r = requests.get('https://api.shadowpay.com/api/market/get_items?types=[]&exteriors=[]&rarities=[]&collections=[]&item_subcategories=[]&float={"from":0,"to":1}&price_from=0.00&price_to=34.00&game=csgo&hot_deal=true&stickers=[]&count_stickers=[]&short_name=&search=&stack=false&sort=desc&sort_column=price_rate&limit=50&offset=0', timeout=3)

while True:
    #Here is the place where I'm thinking of putting it
    time.sleep(5); skin_list = []; perc_list = []
    for i in range(len(r.json()["items"])):
        perc_list.append(r.json()["items"][i]["discount"])
        skin_list.append(r.json()["items"][i]["collection"]["name"])
    skin = skin_list[perc_list.index(max(perc_list))]; print(skin)
    biggest_number = int(max(perc_list))
    if un != biggest_number or us != skin:
        if int(max(perc_list)) >= min_notidication_perc:
            plyer.notification.notify(
                title=f'-{int(max(perc_list))}% ShadowPay',
                message=f'{skin}',
                app_icon="C:\\Users\\<user__name>\\Downloads\\Inipagi-Job-Seeker-Target.ico",
                timeout=120,
            )
        else:
            pass
    else:
        pass
    us = skin; un = biggest_number
    print(f'id: {random.randint(1, 99999999)}')
    print(f'-{int(max(perc_list))}% discount\n')
When you call requests.get() you retrieve the page source of that link once, and then the response is closed. Since requests already waits for the response, you don't need the time.sleep(5) line for that purpose; in your code the single response made before the loop is simply reused on every iteration, so the value never changes.
In order to get the real-time value you'll have to call the page again inside the loop, and this is where you can use time.sleep() so as not to abuse the API.
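As a rough sketch of that change (keeping the same endpoint and JSON keys as your code, with everything else stripped down to the fetch itself), the loop could look like this:

import time
import requests

API_URL = 'https://api.shadowpay.com/api/market/get_items?...'  # the same long URL from your code

while True:
    # Fetch a fresh copy of the data on every iteration
    r = requests.get(API_URL, timeout=3)
    items = r.json()["items"]
    if items:
        best = max(item["discount"] for item in items)
        print(f'best discount right now: -{int(best)}%')
    # Wait between calls so you don't hammer the API
    time.sleep(5)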

How can I quickly get the follower count for a large list of Instagram users?

I have the following program in Python that reads in a list of 1,390,680 URLs of Instagram accounts and gets the follower count for each user. It uses Instaloader. Here's the code:
import pandas as pd
from instaloader import Instaloader, Profile

# 1. Loading in the data
# Reading the data from the csv
data = pd.read_csv('IG_Audience.csv')
# Getting the profile urls
urls = data['Profile URL']

def getFollowerCount(PROFILE):
    # using the instaloader module to get follower counts from this programmer
    # https://stackoverflow.com/questions/52225334/webscraping-instagram-follower-count-beautifulsoup
    try:
        L = Instaloader()
        profile = Profile.from_username(L.context, PROFILE)
        print(PROFILE, 'has', profile.followers, 'followers')
        return(profile.followers)
    except Exception as exception:
        print(exception, False)
        return(0)

# Follower count List
followerCounts = []
# This loop will fetch the follower count for each user
for url in urls:
    # Getting the profile username from the URL by removing the instagram.com
    # portion and the backslash at the end of the url
    url_dirty = url.replace('https://www.instagram.com/', '')
    url_clean = url_dirty[:-1]
    followerCounts.append(getFollowerCount(url_clean))

# Converting the list to a series, adding it to the dataframe, and writing it to
# a csv
data['Follower Count'] = pd.Series(followerCounts)
data.to_csv('IG_Audience.csv')
The main issue I have with this is that it is taking a very long time to read through the entire list. It took 14 hours just to get the follower counts for 3035 users. Is there any way to speed up this process?
First I want to say I'm sorry for being VERY late, but hopefully this can help someone in the future. I'm having a similar issue and I believe I found out why: when you get the followers, Instaloader doesn't just go to the profile page and read the number. It fetches the URL and profile ID for each account, and it can only get so many at a time. The best way I can think of to get around this would be to make a request to the profile page and just read the follower count shown there. The issue with this, however, is that after about 9,999 followers it will start saying "10k" or "10.1k", so you'll be off by up to 100, and it gets worse if the person has over a million followers because then it's off by even more.
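If you want to try that page-scraping route, here is a rough sketch. It assumes the public profile page still exposes the rounded follower count in its og:description meta tag; Instagram changes this markup often and may put it behind a login wall, so treat it as a starting point, not a reliable API:

import re
import requests

def approx_follower_count(username):
    # Best-effort scrape of the rounded count (e.g. "10.1k") shown on a public profile page
    resp = requests.get(f'https://www.instagram.com/{username}/', timeout=10)
    resp.raise_for_status()
    # Looks for something like: content="1,234 Followers, 56 Following, 78 Posts ..."
    match = re.search(r'content="([\d.,]+[KkMm]?) Followers', resp.text)
    return match.group(1) if match else None

print(approx_follower_count('instagram'))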

Facebook Graph API: Download all posts with likes and comments of those posts at the same time (paging)

I am trying to download all the posts and the comments (and replies to comments) for each post on a public Facebook page. Here is the code that I am using:
from facepy import GraphAPI
import json

page_id = "Google"
access_token = "access_token"
graph = GraphAPI(access_token)

data = graph.get(page_id + "/feed", page=True, retry=3, limit=100, fields='message,likes')

i = 0
for p in data:
    print 'Downloading posts', i
    with open('facepydata/content%i.json' % i, 'w') as outfile:
        json.dump(p, outfile, indent=4)
    i += 1
First of all (1) this code is giving me this exception:
facepy.exceptions.FacebookError: [1] Please reduce the amount of data you're asking for, then retry your request
How should I solve this problem?
Second, (2) how can I get all the likes, comments, and replies at the same time as the posts? (Paging is also required on likes, comments, and replies to get all of them.) page=True does not work for these fields.
Thank you!
Facebook's Graph API has rate limiting. I believe @Klaus-D is correct: from the error it is clear the request should have a lower limit parameter set, and you can then page through the results.
I would try limit=10, and then page through with your loop as you have it.
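For example, a sketch along those lines (not tested against your page, and assuming your Graph API version accepts nested field expansion with .summary(true); access_token and page_id as in your code):

from facepy import GraphAPI
import json

graph = GraphAPI(access_token)

# A smaller page size avoids the "reduce the amount of data" error, and the
# nested fields pull likes and comments (with their own paging cursors)
# alongside each post in the same request.
data = graph.get(
    page_id + "/feed",
    page=True,
    retry=3,
    limit=10,
    fields='message,likes.summary(true),comments.summary(true){message,comments{message}}'
)

for i, p in enumerate(data):
    with open('facepydata/content%i.json' % i, 'w') as outfile:
        json.dump(p, outfile, indent=4)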

Use the Google Custom Search API to search the web from Python

I'm a newbie in Python, HTML, and CSS, and am trying to reverse engineer "https://github.com/scraperwiki/google-search-python" to learn the three and to use the Google Custom Search API to search the web from Python. Specifically, I want to search the engine I made through Google Custom Search: "https://cse.google.com/cse/publicurl?cx=000839040200690289140:u2lurwk5tko". I looked through the code, made some minor adjustments, and came up with the following "Search.py":
import os
from google_search import GoogleCustomSearch
#This is for the traceback
import traceback
import sys

#set variables
os.environ["SEARCH_ENGINE_ID"] = "000839... "
os.environ["GOOGLE_CLOUD_API_KEY"] = "AIza... "
SEARCH_ENGINE_ID = os.environ['SEARCH_ENGINE_ID']
API_KEY = os.environ['GOOGLE_CLOUD_API_KEY']

api = GoogleCustomSearch(SEARCH_ENGINE_ID, API_KEY)

print("we got here\n")

#for result in api.search('prayer', 'https://cse.google.com/cse/publicurl?cx=000839040200690289140:u2lurwk5tko'):
for result in api.search('pdf', 'http://scraperwiki.com'):
    print(result['title'])
    print(result['link'])
    print(result['snippet'])

print traceback.format_exc()
And the import (at least the relevant parts, I believe) comes from the following code in google_search.py:
class GoogleCustomSearch(object):
    def __init__(self, search_engine_id, api_key):
        self.search_engine_id = search_engine_id
        self.api_key = api_key

    def search(self, keyword, site=None, max_results=100):
        assert isinstance(keyword, basestring)
        for start_index in range(1, max_results, 10):  # 10 is max page size
            url = self._make_url(start_index, keyword, site)
            logging.info(url)
            response = requests.get(url)
            if response.status_code == 403:
                LOG.info(response.content)
            response.raise_for_status()
            for search_result in _decode_response(response.content):
                yield search_result
                if 'nextPage' not in search_result['meta']['queries']:
                    print("No more pages...")
                    return
However, when I try to run it, I get the following.
So, here's my problem: I can't quite figure out why the following lines of code don't print to the terminal. What am I overlooking?
print(result['title'])
print(result['link'])
print(result['snippet'])
The only thing I can think of is that I didn't use a correct ID or key. I created a Google Custom Search engine and a project on the Google Developers Console as the quick start suggested. Here is where I got my SEARCH_ENGINE_ID and GOOGLE_CLOUD_API_KEY from.
After I added the stack trace suggested in the comments, I got this:
Am I just misunderstanding the code, or is there something else I'm missing? I really appreciate any clues that will help me solve this problem, I'm kind of stumped right now.
Thanks in advance guys!

How to retrieve google URL from search query

So I'm trying to create a Python script that will take a search term or query, then search Google for that term. It should then return 5 URLs from the results of that search.
I spent many hours trying to get PyGoogle to work. But later found out Google no longer supports the SOAP API for search, nor do they provide new license keys. In a nutshell, PyGoogle is pretty much dead at this point.
So my question here is... What would be the most compact/simple way of doing this?
I would like to do this entirely in Python.
Thanks for any help
Use BeautifulSoup and requests to get the links from the Google search results:
import requests
from bs4 import BeautifulSoup
keyword = "Facebook" #enter your keyword here
search = "https://www.google.co.uk/search?sclient=psy-ab&client=ubuntu&hs=k5b&channel=fs&biw=1366&bih=648&noj=1&q=" + keyword
r = requests.get(search)
soup = BeautifulSoup(r.text, "html.parser")
container = soup.find('div',{'id':'search'})
url = container.find("cite").text
print(url)
What issues are you having with pygoogle? I know it is no longer supported, but I've utilized that project on many occasions and it would work fine for the menial task you have described.
Your question did make me curious though, so I went to Google and typed "python google search". Bam, found this repository. Installed it with pip and within 5 minutes of browsing its documentation got what you asked for:
import google

for url in google.search("red sox", num=5, stop=1):
    print(url)
Maybe try a little harder next time, ok?
Here is a link to the xgoogle library, which does the same.
I tried something similar to get the top 10 links, and it also counts how many times the searched word appears in each linked page. I have added the code snippet for your reference:
import operator
import urllib
#This line will import the GoogleSearch, SearchError classes from xgoogle/search.py
from xgoogle.search import GoogleSearch, SearchError

my_dict = {}
print "Enter the word to be searched : "
#read user input
yourword = raw_input()

try:
    #This will perform a google search on our keyword
    gs = GoogleSearch(yourword)
    gs.results_per_page = 80
    #get google search results
    results = gs.get_results()
    source = ''
    #loop through all results to get each link and its content
    for res in results:
        #print res.url.encode('utf8')
        #this will give the url
        parsedurl = res.url.encode("utf8")
        myurl = urllib.urlopen(parsedurl)
        #the above line opens the url; the line below reads the content of that web page
        source = myurl.read()
        #This line counts the occurrences of the entered keyword in our webpage
        count = source.count(yourword)
        #We store our result in a dictionary. For each url, we store its word count.
        my_dict[parsedurl] = count
except SearchError, e:
    print "Search failed: %s" % e

print my_dict
#sorted_x = sorted(my_dict, key=lambda x: x[1])
for key in sorted(my_dict, key=my_dict.get, reverse=True):
    print(key, my_dict[key])
