I am really new to the Twitter API, and I've been trying to get a list of the IDs of everyone who retweeted a specific tweet.
After several attempts I can't get api.get_retweeter_ids to return every ID; it only ever gets a few. I know there is a limit of 100 per request, but the function just ends after returning around 50-90 IDs on a tweet with roughly 30k retweets.
Here is my code
def get_user_ids_by_retweets(tweetid):
    retweeters_ids = []
    for i, _id in enumerate(tweepy.Cursor(api.get_retweeter_ids, id=tweetid).items()):
        retweeters_ids.append(_id)
        print(i, _id)
    df = pd.DataFrame(retweeters_ids)
    # print(df)
    return retweeters_ids
Demo for getting the whole list of retweeters (name, ID, and username) of
https://twitter.com/Nike/status/1582388225049780230/retweets
code
import tweepy
import json
def get_user_ids_by_retweets(tweet_id):
    # get client with token
    bearer_token = "*************************"
    client = tweepy.Client(bearer_token=bearer_token)
    listUser = []
    # get first page of retweet users
    retweeters = client.get_retweeters(id=tweet_id)
    for retweeter in retweeters.data:
        listUser.append({
            "name": retweeter.name,
            "id": retweeter.id,
            "username": retweeter.username
        })
    # 'next_token' is absent on the last page, so use .get() to avoid a KeyError
    next_token = retweeters.meta.get('next_token')
    # keep paging until the end of the retweet users
    while next_token is not None:
        retweeters = client.get_retweeters(id=tweet_id, pagination_token=next_token)
        if retweeters.data is not None:
            for retweeter in retweeters.data:
                listUser.append({
                    "name": retweeter.name,
                    "id": retweeter.id,
                    "username": retweeter.username
                })
            next_token = retweeters.meta.get('next_token')
        else:
            next_token = None
    return listUser
def obj_dict(obj):
    return obj.__dict__
tweet_id="1582388225049780230"
listUser = get_user_ids_by_retweets(tweet_id)
print(json.dumps(listUser, indent=4, default=obj_dict))
Result
[
    {
        "name": "valmig",
        "id": 1594136795905593344,
        "username": "AngelVa00615402"
    },
    {
        "name": "Wyatt Jones",
        "id": 764734669434871808,
        "username": "TheGhostZeus"
    },
    {
        "name": "Prime Projects",
        "id": 1603887705242435584,
        "username": "PrimeProjects4"
    },
    ... removed
    {
        "name": "Ryan Maldonado",
        "id": 1419009007688224768,
        "username": "RyanMal87509518"
    },
    {
        "name": "Jimmy Daugherty",
        "id": 20888017,
        "username": "JimmyDaugherty"
    },
    {
        "name": "Nike Basketball",
        "id": 5885732,
        "username": "nikebasketball"
    }
]
Main Idea
The Twitter API returns a limited number of retweeters per request, together with a next_token in the response meta.
Passing that token as pagination_token fetches the next page of retweeters.
Repeating this until next_token is absent retrieves all retweeters.
So in the demo below, the two small requests that fetch retweeters (#1 and #4) get two retweeters each along with a next_token, and together they are the same four retweeters that the single larger request (#6) returns.
import tweepy

bearer_token = "*************************"
client = tweepy.Client(bearer_token=bearer_token)
tweet_id = "1582388225049780230"

print("#1 -------- Get first two tweeter -------------------------")
retweeters = client.get_retweeters(id=tweet_id, max_results=2)

print("#2 -------- Show Meta --------------------")
print(retweeters.meta)
print(" ")

print("#3 -------- print two -------------------------")
for retweeter in retweeters.data:
    print(retweeter.name, " -> ", retweeter.id, ",", retweeter.username)
print(" ")

print("#4 ---------Get Next two tweeter ---------------------------")
retweeters = client.get_retweeters(id=tweet_id, pagination_token=retweeters.meta['next_token'], max_results=2)
print(retweeters.meta)
print(" ")

print("#5 -------- print two -------------------------")
for retweeter in retweeters.data:
    print(retweeter.name, " -> ", retweeter.id, ",", retweeter.username)
print(" ")

print("#6 --- Get First four tweeter == are same #1 + #2 ---------")
retweeters = client.get_retweeters(id=tweet_id, max_results=4)
print(" ")

print("#7 -------- print four -------------------------")
for retweeter in retweeters.data:
    print(retweeter.name, " -> ", retweeter.id, ",", retweeter.username)
$ python retweet.py
#1 -------- Get first two tweeter -------------------------
#2 -------- Show Meta --------------------
{'result_count': 2, 'next_token': '7140dibdnow9c7btw4827c3yb0pfg7mg4qq12dn59ot9s'}
#3 -------- print two -------------------------
valmig -> 1594136795905593344 , AngelVa00615402
Wyatt Jones -> 764734669434871808 , TheGhostZeus
#4 ---------Get Next two tweeter ---------------------------
{'result_count': 2, 'next_token': '7140dibdnow9c7btw4827c3nilr9nqckqkuxdzj3u7pkn', 'previous_token': '77qpymm88g5h9vqkluxdnrmaxhecakrtbzn80cd5hizht'}
#5 -------- print two -------------------------
Prime Projects -> 1603887705242435584 , PrimeProjects4
Joshua Paul Hudson -> 847275330 , JoshswiftJoshua
#6 --- Get First four tweeter == are same #1 + #2 ---------
#7 -------- print four -------------------------
valmig -> 1594136795905593344 , AngelVa00615402
Wyatt Jones -> 764734669434871808 , TheGhostZeus
Prime Projects -> 1603887705242435584 , PrimeProjects4
Joshua Paul Hudson -> 847275330 , JoshswiftJoshua
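The token mechanics shown above can also be reproduced offline, with a hypothetical in-memory endpoint standing in for client.get_retweeters (all names and the token format below are made up; only the paging logic mirrors the API):

```python
# Simulated paged endpoint: a stand-in for client.get_retweeters.
USERS = ["AngelVa00615402", "TheGhostZeus", "PrimeProjects4", "JoshswiftJoshua"]

def fake_get_retweeters(max_results, pagination_token=None):
    start = int(pagination_token) if pagination_token else 0
    end = start + max_results
    meta = {"result_count": len(USERS[start:end])}
    if end < len(USERS):
        meta["next_token"] = str(end)  # opaque string in the real API
    return {"data": USERS[start:end], "meta": meta}

# Two pages of two...
page1 = fake_get_retweeters(max_results=2)
page2 = fake_get_retweeters(max_results=2, pagination_token=page1["meta"]["next_token"])
# ...contain the same users as one page of four, and the last page has no next_token.
assert page1["data"] + page2["data"] == fake_get_retweeters(max_results=4)["data"]
```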
References
List of objects to JSON with Python
Python – Append to JSON File
Twitter API v2 Retweet
I would avoid managing the tokens manually if not needed. The Paginator is the tool for this (it's the API v2 counterpart of the API v1.1 Cursor that you've tried to use). If you are sure that the number of retweets is covered by the currently available number of requests (the default is 100 retweeters per request), then you could try the following (it's equivalent to the other answer):
def get_user_ids_by_retweets(tweet_id):
    client = tweepy.Client(BEARER_TOKEN, return_type=dict)
    return list(tweepy.Paginator(client.get_retweeters, tweet_id).flatten())
If you're not sure about it but just want to give it a try without losing any already retrieved retweeters, then you could use this variation, which catches the corresponding tweepy.errors.TooManyRequests exception:
def get_user_ids_by_retweets(tweet_id):
    client = tweepy.Client(BEARER_TOKEN, return_type=dict)
    users = []
    try:
        for page in tweepy.Paginator(client.get_retweeters, tweet_id):
            users.extend(page.get("data", []))
    except tweepy.errors.TooManyRequests:
        print("Too many requests, couldn't retrieve all retweeters.")
    return users
If you want to make sure that you get all retweeters, then you could add a waiting period tailored to your access level (with the free tier you should have 75 requests per 15 minutes, i.e. after reaching the limit you need to wait 60 * 15 seconds). Here you need the token to re-enter at the point where you left off in case the rate limit was reached:
from time import sleep

DURATION = 60 * 15 + 5

def get_user_ids_by_retweets(tweet_id):
    client = tweepy.Client(BEARER_TOKEN, return_type=dict)
    users, token = [], None
    while True:
        pages = tweepy.Paginator(
            client.get_retweeters, tweet_id, pagination_token=token
        )
        try:
            for page in pages:
                users.extend(page.get("data", []))
                token = page["meta"].get("next_token", None)
            if token is None:
                break
        except tweepy.errors.TooManyRequests:
            print("Request rate limit reached, taking a nap.")
            sleep(DURATION)
    return users
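The resume-at-token logic can be checked without touching the API by swapping in hypothetical stand-ins: a stub exception class and a fake fetcher that fails once mid-run (everything below is made up for the simulation):

```python
class TooManyRequests(Exception):  # stand-in for tweepy.errors.TooManyRequests
    pass

# token -> (page of users, next token); None next-token marks the last page
PAGES = {None: (["u1", "u2"], "t1"), "t1": (["u3", "u4"], "t2"), "t2": (["u5"], None)}
state = {"failed": False}

def fetch(token):
    # Raise once on the second page to mimic hitting the rate limit.
    if token == "t1" and not state["failed"]:
        state["failed"] = True
        raise TooManyRequests
    return PAGES[token]

def get_all(fetch):
    users, token = [], None
    while True:
        try:
            while True:
                data, next_token = fetch(token)
                users.extend(data)
                if next_token is None:
                    return users
                token = next_token
        except TooManyRequests:
            pass  # real code would sleep(DURATION) here, then re-enter at `token`

users = get_all(fetch)
print(users)  # all five users despite the one simulated rate-limit hit
```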
Related
(Pymongo) several operations wrapped in commit and rollback, but one of them still proceeds when the whole transaction should stop
I read the manual; all the examples for commit and rollback only cover two operations. Is that the limit? Usually a transaction contains 3 or more operations, which should either all apply together or, on error, not apply at all. https://pymongo.readthedocs.io/en/stable/api/pymongo/client_session.html
I tried to put 3 operations inside commit and rollback, but
mycol_two.insert_one() didn't stop proceeding like the other operations did when the error occurred.
Brief description:
I have three collections in the same DB:
collection "10_20_cash_all"
collection "10_20_cash_log"
collection "10_20_cash_info"
The commit and rollback are on lines 39 to 44.
On line 42, print( 3/0 ) intentionally raises an error; I expected all operations to stop proceeding.
import pymongo
import datetime
import json
from bson.objectid import ObjectId
from bson import json_util
import re

myclient = pymongo.MongoClient("mongodb://localhost:27017/")
mydb = myclient["(practice_10_14)-0004444"]
mycol_one = mydb["10_20_cash_all"]
mycol_two = mydb["10_20_cash_log"]
mycol_3rd = mydb["10_20_cash_info"]

# already store 100$ in bank
# doc_two = {"ID" : 100998 , "Cash_log$" : 5 } # withdraw 70$ from bank
doc_two = input("Enter ID and log amount$: ")
doc_3rd = input("Enter extra info: ")
doc_two_dic = json.loads(doc_two)
doc_3rd_dic = json.loads(doc_3rd)
# doc_3rd = {"note" : "today is good" }
ID_input = doc_two_dic['ID']
print("ur id is :" + str(ID_input))
doc_one = {"ID" : ID_input}

with myclient.start_session() as s:
    cash_all_result = mycol_one.find_one(doc_one, session=s)

    def cb(s):
        try:
            while True:
                cash_all_result = mycol_one.find_one(doc_one, session=s)
                mycol_two.insert_one(doc_two_dic, session=s)
                print( 3/0 )
                mycol_3rd.insert_one(doc_3rd_dic, session=s)
                print( "now total is :" + str(cash_all_result['Cash_$']) )
                Cash_total_int = int(cash_all_result['Cash_$'])
                log_int = int(doc_two_dic['Cash_log$'])
                if Cash_total_int < log_int:
                    print("error: withdraw is over ur balance")
                    break
                new_Cash_total = Cash_total_int - log_int
                print("now total is :" + str(new_Cash_total))
                newvalues_json = { "$set" : {"Cash_$" : new_Cash_total } }
                mycol_one.update_one(doc_one , newvalues_json, session=s)
                fail_condition_json = {"ok" : 1 , "fail reason" : "no error "}
                print(fail_condition_json)
                return fail_condition_json
        except Exception as e:
            fail_condition_json = {"ok" : 0 , "fail reason" : "error raise on start_session()"}
            print(fail_condition_json)
            return fail_condition_json

    s.with_transaction(cb)
command prompt:
Enter ID and log amount$: {"ID" : 100998 , "Cash_log$" : 5 }
Enter extra info: {"note" : "today is good" }
ur id is :100998
{'ok': 0, 'fail reason': 'error raise on start_session()'}
The "10_20_cash_log" collection still stores the new value, which should be empty / not run, just as "10_20_cash_info" is empty:
{
    "_id" : ObjectId("635262e502725626c39cbe9e"),
    "ID" : 100998,
    "Cash_log$" : 5
}
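One thing worth noting about the behaviour seen here: with_transaction commits whenever the callback returns normally, and only aborts when the callback raises. Because the except block above swallows the ZeroDivisionError and returns a dict, the insert into "10_20_cash_log" is committed. A minimal stand-in for with_transaction (simulated here rather than run against MongoDB, so the commit/abort model is simplified) shows the control flow:

```python
def with_transaction_stub(callback):
    """Simplified model of session.with_transaction: commit on return, abort on raise."""
    ops = []  # operations buffered inside the transaction
    try:
        callback(ops)
    except Exception:
        return "aborted", []    # callback raised -> rollback, nothing persists
    return "committed", ops     # callback returned -> everything commits

def swallowing_cb(ops):
    try:
        ops.append("insert into 10_20_cash_log")
        3 / 0  # the intentional error
        ops.append("insert into 10_20_cash_info")
    except Exception:
        return {"ok": 0}  # swallows the error -> transaction still commits

def raising_cb(ops):
    ops.append("insert into 10_20_cash_log")
    3 / 0  # error propagates -> transaction aborts
    ops.append("insert into 10_20_cash_info")

assert with_transaction_stub(swallowing_cb) == ("committed", ["insert into 10_20_cash_log"])
assert with_transaction_stub(raising_cb) == ("aborted", [])
```

So to make all three operations roll back together, let the exception propagate out of cb (or re-raise it after logging) instead of returning a value from the except block.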
I have a JSON log file and want to print and count the number of times a URL (requestUrl) has been hit by each IP in the same log file. The output should look like below:
IP(remoteIp): URL1-(Count), URL2-(Count), URL3...
127.0.0.1: http://www.google.com - 12, www.bing.com/servlet-server.jsp - 2, etc..
A sample of the log file is below:
"insertId": "kdkddkdmdkd",
"jsonPayload": {
    "#type": "type.googleapis.com/google.cloud.loadbalancing.type.LoadBalancerLogEntry",
    "enforcedSecurityPolicy": {
        "configuredAction": "DENY",
        "outcome": "DENY",
        "preconfiguredExprIds": [
            "owasp-crs-v030001-id942220-sqli"
        ],
        "name": "shbdbbddjdjdjd",
        "priority": 2000
    },
    "statusDetails": "body_denied_by_security_policy"
},
"httpRequest": {
    "requestMethod": "POST",
    "requestUrl": "https://dknnkkdkddkd/token",
    "requestSize": "3004",
    "status": 403,
    "responseSize": "274",
    "userAgent": "okhttp/3.12.2",
    "remoteIp": "127.0.0.1",
    "serverIp": "123.123.33.31",
    "latency": "0.018728s"
}
The solution that I am using is below. I am able to get the total hits per IP or how many total times a URL has been hit etc.
import json
from collections import Counter

unique_ip = {}
request_url = {}

def getAndSaveValueSafely(freqTable, searchDict, key):
    try:
        tmp = searchDict['httpRequest'][key]
        if tmp in freqTable:
            freqTable[tmp] += 1
        else:
            freqTable[tmp] = 1
    except KeyError:
        if 'not_present' in freqTable:
            freqTable['not_present'] += 1
        else:
            freqTable['not_present'] = 1

with open("threat_intel_1.json") as file:
    data = json.load(file)
    for d2 in data:
        getAndSaveValueSafely(unique_ip, d2, 'remoteIp')
        getAndSaveValueSafely(request_url, d2, 'requestUrl')

mc_unique_ip = dict(Counter(unique_ip).most_common())
mc_request_url = dict(Counter(request_url).most_common())

def printing():
    a = str(len(unique_ip))
    b = str(len(request_url))
    with open("output.txt", "w") as f1:
        print(
            f' Start Time of log = {minTs}'
            f' \n\n End Time of log = {maxTs} \n\n\n {a} Unique IP List = {mc_unique_ip} \n\n\n {b} Unique URL = {mc_request_url}', file=f1)
I don't think you need to use Counter, and you are unlikely to see any benefit from it.
import json
from collections import defaultdict

result = {}  # start empty

with open("threat_intel_1.json") as file:
    data = json.load(file)
    for d2 in data:
        req = d2.get('httpRequest', None)
        if not req:
            continue
        url = req['requestUrl']
        ip = req['remoteIp']
        result.setdefault(url, defaultdict(int))[ip] += 1

print(result)
# {"/endpoint.html": {"127.2.3.4": 15, "222.11.31.22": 2}}
If instead you want it the other way around, that's easy too:
for d2 in data:
    req = d2.get('httpRequest', None)
    if not req:
        continue
    url = req['requestUrl']
    ip = req['remoteIp']
    result.setdefault(ip, defaultdict(int))[url] += 1

# {"127.1.2.3": {"/endpoint1.html": 15, "/endpoint2.php": 1}, "33.44.55.66": {"/endpoint1.html": 5}, ...}
Instead of using defaultdict you could add a couple of lines:

# result.setdefault(ip, defaultdict(int))[url] += 1
result.setdefault(ip, {})
result[ip][url] = result[ip].get(url, 0) + 1

which is arguably more readable anyway.
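To get the exact output format the question asks for (IP: URL - count, ...), the nested dict can be formatted afterwards, e.g. (the result data here is made up for illustration):

```python
# Sample nested counts, as produced by the ip -> url -> count variant above.
result = {
    "127.0.0.1": {"http://www.google.com": 12, "www.bing.com/servlet-server.jsp": 2},
}

# One line per IP, URLs joined with their hit counts.
lines = [
    ip + ": " + ", ".join(f"{url} - {count}" for url, count in urls.items())
    for ip, urls in result.items()
]
print("\n".join(lines))
# 127.0.0.1: http://www.google.com - 12, www.bing.com/servlet-server.jsp - 2
```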
This is the code I have so far:
import json
import requests
import time

endpoint = "https://www.deadstock.ca/collections/new-arrivals/products/nike-air-max-1-cool-grey.json"
req = requests.get(endpoint)
reqJson = json.loads(req.text)

for id in reqJson['product']:
    name = (id['title'])
    print (name)
Feel free to visit the link; I'm trying to grab all the "id" values and print them out. They will be used later to send to my Discord.
I tried with my above code but I have no idea how to actually get those values. I don't know which variable to use in the for ... in reqJson statement.
If anyone could help me out and guide me to get all of the ids to print, that would be awesome.
for product in reqJson['product']['title']:
    ProductTitle = product['title']
    print (title)
I see from the link you provided that the only ids that are in a list are actually part of the variants list under product. All the other ids are not part of a list, so there is no need to iterate over them. Here's an excerpt of the data for clarity:
{
    "product":{
        "id":232418213909,
        "title":"Nike Air Max 1 \/ Cool Grey",
        ...
        "variants":[
            {
                "id":3136193822741,
                "product_id":232418213909,
                "title":"8",
                ...
            },
            {
                "id":3136193855509,
                "product_id":232418213909,
                "title":"8.5",
                ...
            },
            {
                "id":3136193789973,
                "product_id":232418213909,
                "title":"9",
                ...
            },
            ...
        ],
        "image":{
            "id":3773678190677,
            "product_id":232418213909,
            "position":1,
            ...
        }
    }
}
So what you need to do is iterate over the list of variants under product instead:
import json
import requests

endpoint = "https://www.deadstock.ca/collections/new-arrivals/products/nike-air-max-1-cool-grey.json"
req = requests.get(endpoint)
reqJson = json.loads(req.text)

for product in reqJson['product']['variants']:
    print(product['id'], product['title'])
This outputs:
3136193822741 8
3136193855509 8.5
3136193789973 9
3136193757205 9.5
3136193724437 10
3136193691669 10.5
3136193658901 11
3136193626133 12
3136193593365 13
And if you simply want the product id and product name, they would be reqJson['product']['id'] and reqJson['product']['title'], respectively.
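So, collecting the variant ids into a list for sending to Discord later is a one-liner; here sketched against a trimmed-down inline copy of the JSON rather than the live endpoint:

```python
# Trimmed-down stand-in for the response from the .json endpoint.
reqJson = {
    "product": {
        "id": 232418213909,
        "title": "Nike Air Max 1 / Cool Grey",
        "variants": [
            {"id": 3136193822741, "title": "8"},
            {"id": 3136193855509, "title": "8.5"},
        ],
    }
}

# Gather every variant id into a list.
variant_ids = [variant["id"] for variant in reqJson["product"]["variants"]]
print(variant_ids)  # [3136193822741, 3136193855509]
```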
I have this JSON file.
{
    "reviewers":[
        {
            "user":{
                "name":"keyname",
                "emailAddress":"John#email",
                "id":3821,
                "displayName":"John Doe",
                "active":true,
                "slug":"jslug",
                "type":"NORMAL",
                "link":{
                    "url":"/users/John",
                    "rel":"self"
                },
            },
            "role":"REVIEWER",
            "approved":true
        },
        {
            "user":{
                "name":"keyname2",
                "emailAddress":"Harry#email",
                "id":6306,
                "displayName":"Harry Smith",
                "active":true,
                "slug":"slug2",
                "link":{
                    "type":"NORMAL",
                    "url":"/users/Harry",
                    "rel":"self"
                },
            },
            "role":"REVIEWER",
            "approved":false
        }
    ],
}
Initially, I was using a snippet of code that would go through and grab the full names of the reviewers.
def get_reviewers(json):
    reviewers = ""
    for key in json["reviewers"]:
        reviewers += key["user"]["displayName"] + ", "
    reviewers = reviewers[:-2]
    return reviewers
which would return "John Doe, Harry Smith". However, now I'm trying to get the script to put "(A)" next to the name of each user whose "approved" value is true.
So, for example, the code above would get the names, see that John's approved tag is true and Harry's is false, and then return "John Doe(A), Harry Smith". I'm just not sure where to even begin. Can anyone point me in the right direction?
This is what I've been trying so far but obviously it isn't working as I'd like it to.
def get_reviewers(stash_json):
    reviewers = ""
    for key in stash_json["reviewers"]:
        if stash_json["reviewers"][0]["approved"] == true:
            reviewers += key["user"]["displayName"] + "(A)" + ", "
        else:
            reviewers += key["user"]["displayName"] + ", "
    reviewers = reviewers[:-2]
    return reviewers
which outputs Jason Healy(A), Joan Reyes(A)
This is what my stash_json outputs when put through pprint.
You probably want something along the lines of this:
def get_reviewers(stash_json):
    reviewers = ""
    for item in stash_json["reviewers"]:
        if item["approved"]:
            reviewers += item["user"]["displayName"] + "(A)" + ", "
        else:
            reviewers += item["user"]["displayName"] + ", "
    reviewers = reviewers[:-2]
    return reviewers
I think part of your confusion comes from the fact that "reviewers" is a list of dict elements, and each dict element has a key-value pair "approved", but also a key "user" whose value is itself another dict.
Read the JSON file carefully, and for debugging purposes, use plenty of

print(...)
print(type(...))  # whether something is a dict, list, str, bool, etc.

or

from pprint import pprint  # pretty printing
pprint(...)
This looks like a good place to use join and list comprehension:
def get_reviewers(stash_json):
    return ", ".join([item['user']['displayName'] + ('(A)' if item['approved'] else '')
                      for item in stash_json['reviewers']])
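Run against a stripped-down version of the JSON from the question, this produces the expected string:

```python
# Minimal subset of the question's JSON: only the fields the function reads.
stash_json = {
    "reviewers": [
        {"user": {"displayName": "John Doe"}, "approved": True},
        {"user": {"displayName": "Harry Smith"}, "approved": False},
    ]
}

def get_reviewers(stash_json):
    return ", ".join(
        item["user"]["displayName"] + ("(A)" if item["approved"] else "")
        for item in stash_json["reviewers"]
    )

print(get_reviewers(stash_json))  # John Doe(A), Harry Smith
```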
I'm working with PayPal's API, which is really badly documented, and need to ask for some help.
I am extending my site with the PayPal Adaptive Payments API, which allows me to set up preapproved payments.
Along with the details sent, I'd like to add some user information.
It seems like this can be done according to their documentation, but nowhere in the IPN does it get captured.
simple payment
def test_pay():
    response = paypal.pay(
        actionType = 'PAY',
        cancelUrl = cancelUrl,
        currencyCode = currencyCode,
        senderEmail = EMAIL_ACCOUNT,
        feesPayer = 'EACHRECEIVER',
        memo = 'Simple payment example',
        preapprovalKey = 'PA-0HA01893HK6322232',
        receiverList = { 'receiver': [
            { 'amount':"10.0", 'email':API_EMAIL, 'primary':True },
            { 'amount':"5.0", 'email':SECONDARY_EMAIL, 'primary':False }
        ]},
        clientDetailsType = { 'customerId': 1, 'customerType': 'Normal' },
        returnUrl = returnUrl,
        ipnNotificationUrl = notificationUrl
    )

    # if response['responseEnvelope']['ack'] == "Success":
    print response['responseEnvelope']['ack']
    # if response['paymentExecStatus'] == "COMPLETED":
    print response['paymentExecStatus']
    # if response.has_key('payKey'):
    print response['payKey']
    print response

test_pay()
The IPN response
pay_key=AP-8J7165865F7541310&transaction%5B0%5D.id_for_sender_txn=4GL2853573576212V&transaction%5B0%5D.pending_reason=NONE&charset=windows-1252&log_default_shipping_address_in_transaction=false&transaction%5B0%5D.id=6XD76450JV9737605&notify_version=UNVERSIONED&preapproval_key=PA-93P236141R834703C&transaction%5B1%5D.id=9R07347926768733A&test_ipn=1&transaction%5B0%5D.status=Completed&status=COMPLETED&action_type=PAY&memo=Simple+payment+example&transaction%5B0%5D.receiver=a.smit_1329744569_biz%40mac.com&transaction%5B1%5D.status=Completed&payment_request_date=Wed+Feb+22+05%3A30%3A49+PST+2012&transaction%5B1%5D.id_for_sender_txn=2D9633797C888500H&verify_sign=AIDiik4kxSLiNqbMmTDHplFnCnz3A3ORrDVlBVOzrtltyUx-NoxxgSc6&transaction%5B1%5D.pending_reason=NONE&transaction%5B0%5D.status_for_sender_txn=Completed&transaction%5B1%5D.status_for_sender_txn=Completed&transaction%5B0%5D.is_primary_receiver=true&transaction%5B1%5D.receiver=a.smit_1298362298_per%40mac.com&transaction%5B1%5D.amount=USD+5.00&ipn_notification_url=http%3A%2F%2F108.166.107.74%2Fyour-ipn-location%2F&transaction%5B0%5D.amount=USD+10.00&transaction_type=Adaptive+Payment+PAY&cancel_url=http%3A%2F%2F108.166.107.74%2F&reverse_all_parallel_payments_on_error=false&sender_email=a.smit_1329128659_per%40mac.com&transaction%5B1%5D.is_primary_receiver=false&fees_payer=EACHRECEIVER&return_url=http%3A%2F%2F108.166.107.74%2F
Nowhere in the response can I see the customerType or customerId
Any ideas?
customerId and customerType aren't returned in the IPN, according to the PayPal API documentation. I imagine they're fields that will show up in your transaction history on the PayPal site. Why there is no API function to return customer data is beyond me as well.