Get sum of followers from Twitter for 10,000 users - python

I am trying to get the number of followers of each Twitter user for a list of 10,000 users selected via a random number generator. I am using Twitter's GET users/lookup API to do this; I can't use the GET followers endpoint directly because of its limit of 15 requests per 15 minutes. I managed to get responses, but the data is not valid JSON, and I've spent the entire day trying to get jq to process it (it fails on special characters within the envelope). Can anyone suggest how to do this?
I had no issues doing the same with Twitter's streaming API, since that data was JSON and parseable by jq. I need to do this with jq, Python, and shell scripting.
MY CODE TO GET THIS RESPONSE: (Python)
#!/usr/bin/python
import json
import oauth2 as oauth
from random import randint
import sys
import time

ckey = '<insert-key>'
csecret = '<insert-key>'
atoken = '<insert-key>'
asecret = '<insert-key>'

consumer = oauth.Consumer(key=ckey, secret=csecret)
access_token = oauth.Token(key=atoken, secret=asecret)
client = oauth.Client(consumer, access_token)

# Print to a file
sys.stdout = open('TwitterREST.txt', 'w')

for j in range(0, 20):
    for i in range(0, 900):
        user_id = randint(2361391150, 2361416150)
        timeline_endpoint = "https://api.twitter.com/1.1/users/lookup.json?user_id=" + str(user_id)
        response, data = client.request(timeline_endpoint)
        users = json.loads(data)
        for user in users:
            # Emit one JSON object per line so jq can process the output
            print(json.dumps(user))
    # Twitter only allows 900 requests/15 mins
    time.sleep(900)
SAMPLE RESPONSE RETURNED BY TWITTER: (Cannot pretty print it)
b'[ { "id":2361393867, "id_str":"2361393867", "name":"graam a7bab", "screen_name":"bedoo691", "location":"", "description":"\u0627\u0633\u062a\u063a\u0641\u0631\u0627\u0644\u0644\u0647 \u0648\u0627\u062a\u0648\u0628 \u0627\u0644\u064a\u0647\u0647 ..!*", "url":null, "entities": { "description": { "urls":[]} }, "protected":false, "followers_count":1, "friends_count":6, "listed_count":0, "created_at":"Sun Feb 23 19:03:21 +0000 2014", "favourites_count":1, "utc_offset":null, "time_zone":null, "geo_enabled":false, "verified":false, "statuses_count":7, "lang":"ar", "status": { "created_at":"Tue Mar 04 16:07:44 +0000 2014", "id":440881284383256576, "id_str":"440881284383256576", "text":"#Naif8989", "truncated":false, "entities":{ "hashtags":[], "symbols":[], "user_mentions":[ { "screen_name":"Naif8989", "name":"\u200f naif alharbi", "id":540343286, "id_str":"540343286", "indices":[0,9]}], "urls":[]}, "source":"\u003ca href=\"http:\/\/twitter.com\/download\/android\" rel=\"nofollow\"\u003eTwitter for Android\u003c\/a\u003e", "in_reply_to_status_id":437675858485321728, "in_reply_to_status_id_str":"437675858485321728", "in_reply_to_user_id":2361393867, "in_reply_to_user_id_str":"2361393867", "in_reply_to_screen_name":"bedoo691", "geo":null, "coordinates":null, "place":null, "contributors":null, "is_quote_status":false, "retweet_count":0, "favorite_count":0, "favorited":false, "retweeted":false, "lang":"und"}, "contributors_enabled":false, "is_translator":false, "is_translation_enabled":false, "profile_background_color":"C0DEED", "profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png", "profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png", "profile_background_tile":false, "profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/437664693373911040\/ydODsIeh_normal.jpeg", "profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/437664693373911040\/ydODsIeh_normal.jpeg", "profile_link_color":"1DA1F2", "profile_sidebar_border_color":"C0DEED", "profile_sidebar_fill_color":"DDEEF6", "profile_text_color":"333333", "profile_use_background_image":true, "has_extended_profile":false, "default_profile":true, "default_profile_image":false, "following":false, "follow_request_sent":false, "notifications":false, "translator_type":"none" } ]'
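The response body comes back as bytes; once decoded and parsed, you can write one user object per line (JSON Lines) so jq can stream it, and sum followers_count directly in Python. A minimal sketch, using a hypothetical trimmed response in place of what client.request() returns:

```python
import json

def flatten_users(raw: bytes):
    """Decode a users/lookup response body (bytes) into a list of user dicts."""
    return json.loads(raw.decode("utf-8"))

def sum_followers(users):
    """Sum the followers_count field across user objects."""
    return sum(u.get("followers_count", 0) for u in users)

# Hypothetical trimmed response body; a real one comes from client.request().
raw = b'[{"id": 1, "screen_name": "a", "followers_count": 3},' \
      b' {"id": 2, "screen_name": "b", "followers_count": 5}]'

users = flatten_users(raw)

# Write one JSON object per line so jq can process the file line by line.
with open("TwitterREST.jsonl", "w") as out:
    for u in users:
        out.write(json.dumps(u) + "\n")

print(sum_followers(users))  # 8
```

With output in this shape, something like `jq '.followers_count' TwitterREST.jsonl` works without any envelope problems.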

Mongo misses to catch insert triggers

I am trying to build an architecture that avoids HTTPS requests, which take at least ~100 ms to deliver a request to the backend server. I decided to use MongoDB change-stream triggers instead; delivery now takes ~20 ms, but some requests are missed. I have two scripts:
To test the Mongo architecture
To listen to the Mongo trigger and process the request
Architecture Stats

| No. of requests | Time taken | Processed requests | Missed requests | Docker swarm replicas |
| --- | --- | --- | --- | --- |
| 1000 | 949.51 | 996 | 4 | 1 |
| 3000 | 4387.24 | 2948 | 52 | 1 |
| 5000 | 10051.78 | 4878 | 122 | 1 |
Worker Node Code:
import socket
import logging
from datetime import datetime

import pymongo
from pymongo import MongoClient

connectionString = "mongodb://ml1:27017,ml1:27018,ml1:27019/magic_shop?replicaSet=mongodb-replicaset"
client = MongoClient(connectionString)
db = client.magic_shop   # magic_shop is my database
col = db.req             # req is my collection

machine_name = socket.gethostname()

def processing_request(request):
    '''
    Description:
        Takes the request string and runs a busy loop of 10x its length,
        simulating work before the response is set to processed.
    Input:
        request (str): random string
    Output:
        The caller sets the response back in MongoDB:
        response: Processed
    '''
    for i in range(0, len(request) * 10):
        pass

try:
    with db.req.watch([{'$match': {'operationType': 'insert'}}]) as stream:
        for values in stream:
            request_id = values['fullDocument']['request_id']
            request = values['fullDocument']['request']
            myquery = {"request_id": request_id}
            # Checking that it is not already claimed by another replica (blocking system)
            if len(list(col.find({'request_id': request_id, 'CSR_NAME': {"$exists": False}}))):
                newvalues = {"$set": {"CSR_NAME": machine_name, "response": "Processing", "processing_time": datetime.today()}}
                # CSR responding to the client
                col.update_one(myquery, newvalues)
                print(f"Processing {request_id}")
                # Now processing the request
                processing_request(request)
                # Updating that the work is done
                newvalues = {"$set": {"response": "Request Processed Have a nice Day sir!", 'response_time': datetime.today()}}
                # CSR responding to the client
                col.update_one(myquery, newvalues)
except pymongo.errors.PyMongoError:
    # The ChangeStream encountered an unrecoverable error or the
    # resume attempt failed to recreate the cursor.
    logging.error('change stream closed')
Testing Code
import random
from datetime import datetime

from pymongo import MongoClient

connectionString = "mongodb://ml1:27017,ml1:27018,ml1:27019/magic_shop?replicaSet=mongodb-replicaset"
client = MongoClient(connectionString)
db = client.magic_shop   # magic_shop is my database
col = db.req             # req is my collection

# Pool of random request payload strings (definition elided in the original)
requests = ["request-" + str(i) for i in range(100)]

burst = 5000
count = 0
for i in range(0, burst):
    random_id = int(random.random() * 100000)
    image = {
        'request_id': random_id,
        'request': random.choice(requests),
        'request_time': datetime.today()
    }
    image_id = col.insert_one(image).inserted_id
    count += 1
    print(f"Request Done {count}")
You can also get the complete code on Github
https://github.com/SohaibAnwaar/Docker-swarm
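One likely cause of the missed requests: the find-then-update claim in the worker is not atomic, so two replicas can both see CSR_NAME missing and both claim the same request. A hedged sketch of making the claim atomic with a single find_one_and_update (field and collection names follow the worker code above; this only builds the filter/update pair, so it runs without a server):

```python
from datetime import datetime

def build_claim(request_id, machine_name, now=None):
    """Build the filter/update pair for an atomic claim via find_one_and_update.

    Only one replica can match a document where CSR_NAME does not yet exist,
    so the find-then-update race in the worker loop disappears.
    """
    now = now or datetime.today()
    query = {"request_id": request_id, "CSR_NAME": {"$exists": False}}
    update = {"$set": {"CSR_NAME": machine_name,
                       "response": "Processing",
                       "processing_time": now}}
    return query, update

# With a live collection `col` this would be used as (not run here):
#   claimed = col.find_one_and_update(*build_claim(request_id, machine_name))
#   if claimed is not None:
#       processing_request(claimed["request"])

query, update = build_claim(42, "worker-1")
print(query["CSR_NAME"])  # {'$exists': False}
```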

python function to transform data to JSON

How do I convert the string below to a dictionary?
code.py
message = event['Records'][0]['Sns']['Message']
print(message)
# this gives the below and the type is <class 'str'>
{
"created_at":"Sat Jun 26 12:25:21 +0000 2021",
"id":1408763311479345152,
"text":"#test I\'m planning to buy the car today \ud83d\udd25\n\n",
"language":"en",
"author_details":{
"author_id":1384883875822907397,
"author_name":"\u1d04\u0280\u028f\u1d18\u1d1b\u1d0f\u1d04\u1d1c\u0299 x NFTs \ud83d\udc8e",
"author_username":"cryptocurrency_x009",
"author_profile_url":"https://xxxx.com",
"author_created_at":"Wed Apr 21 14:57:11 +0000 2021"
},
"id_displayed":"1",
"counter_emoji":{
}
}
I need to add an additional field, "status": 1, so that it looks like this:
{
"created_at":"Sat Jun 26 12:25:21 +0000 2021",
"id":1408763311479345152,
"text":"#test I\'m planning to buy the car today \ud83d\udd25\n\n",
"language":"en",
"author_details":{
"author_id":1384883875822907397,
"author_name":"\u1d04\u0280\u028f\u1d18\u1d1b\u1d0f\u1d04\u1d1c\u0299 x NFTs \ud83d\udc8e",
"author_username":"cryptocurrency_x009",
"author_profile_url":"https://xxxx.com",
"author_created_at":"Wed Apr 21 14:57:11 +0000 2021"
},
"id_displayed":"1",
"counter_emoji":{
},
"status": 1
}
What is the best way of doing this?
Update: I managed to do it.
I used ast.literal_eval(data) like below.
D2= ast.literal_eval(message)
D2["status"] =1
print(D2)
#This gives the below
{
"created_at":"Sat Jun 26 12:25:21 +0000 2021",
"id":1408763311479345152,
"text":"#test I\'m planning to buy the car today \ud83d\udd25\n\n",
"language":"en",
"author_details":{
"author_id":1384883875822907397,
"author_name":"\u1d04\u0280\u028f\u1d18\u1d1b\u1d0f\u1d04\u1d1c\u0299 x NFTs \ud83d\udc8e",
"author_username":"cryptocurrency_x009",
"author_profile_url":"https://xxxx.com",
"author_created_at":"Wed Apr 21 14:57:11 +0000 2021"
},
"id_displayed":"1",
"counter_emoji":{
},
"status": 1
}
Is there any better way to do this? I'm not sure, so I wanted to check.
How do I convert the string below to a dictionary?
As far as I can tell, the data = { } assigns a dictionary with content to the variable data.
I would need to add an additional field called "status" : 1 such that it looks like this
A simple update should do the trick.
data.update({"status": 1})
I found two issues when trying to deserialise the string as JSON:
the invalid escape I\'m
unescaped newlines
These can be worked around with
data = data.replace("\\'", "'")
data = re.sub('\n\n"', '\\\\n\\\\n"', data)
d = json.loads(data)
There are also surrogate pairs in the data which may cause problems down the line. These can be fixed by doing
data = data.encode('utf-16', 'surrogatepass').decode('utf-16')
before calling json.loads.
Once the data has been deserialised to a dict you can insert the new key/value pair.
d['status'] = 1
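Putting the steps above together end to end, using a hypothetical trimmed message with the same two problems (the invalid \' escape and literal newlines inside a JSON string):

```python
import json
import re

# Hypothetical trimmed SNS message: contains the invalid escape \' and
# two literal newlines inside a JSON string, like the message above.
message = '{"text": "#test I\\\'m planning to buy the car today\n\n", "id": 1}'

# Work around the invalid escape and the unescaped newlines, then parse.
fixed = message.replace("\\'", "'")
fixed = re.sub('\n\n"', '\\\\n\\\\n"', fixed)
d = json.loads(fixed)

# Once it is a real dict, inserting the new field is trivial.
d["status"] = 1
print(d["status"])  # 1
```

After json.loads, d["text"] holds the apostrophe and the two real newlines, and the "status" key can be added like any other dict entry.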

Sending bulk data to Azure ML Endpoint

I have an Azure ML endpoint that returns a score when I supply data as JSON.
import requests
import json
# URL for the web service
scoring_uri = 'http://107a119d-9c23-4792-b5db-065e9d3af1e6.eastus.azurecontainer.io/score'
# If the service is authenticated, set the key or token
key = '##########################'
data = {"data":
[{'Associate_Gender': 'Male', 'Associate_Age': 20, 'Education': 'Under Graduate', 'Source_Hiring': 'Internal Movement', 'Count_of_Incoming_Calls_6_month': None, 'Count_of_Incoming_Calls_6_month_bucket': 'Greater than equal to 0 and less than 4', 'Internal_Quality_IQ_Score_Last_6_Months': '93%', 'Internal_Quality_IQ_Score_Last_6_Months_Bucket': 'Greater than 75%', 'Associate_Tenure_Floor_Bucket': 'Greater than 0 and less than 90', 'Current_Call_Repeat_Yes_No': False, 'Historical_CSAT_score': 'Greater than equal to 7 and less than 9', 'Customer_Age': 54, 'Customer_Age_Bucket': 'Greater than equal to 46', 'Network_Region_Originating_Call': 'East London', 'Length_of_Relationship_with_Customer': 266, 'Length_of_Relationship_with_Customer_bucket': 'Greater than 90', 'Call_Reason_Type_L1': 'Voice', 'Call_Reason_Type_L2': 'Prepaid', 'Call_Reason_Type_L3': 'Request for Reversal Provisioning', 'Number_of_VAS_services_active': 6, 'Customer_Category': 'Mercury', 'Customer_monthly_ARPU_GBP_Bucket': 'More than 30', 'Customer_Location': 'Houslow'}]
}
# Convert to JSON string
input_data = json.dumps(data)
# Set the content type
headers = {'Content-Type': 'application/json'}
# If authentication is enabled, set the authorization header
headers['Authorization'] = f'Bearer {key}'
# Make the request and display the response
resp = requests.post(scoring_uri, input_data, headers=headers)
print(resp.text)
How can I send input data from files in bulk and get the output? Or is it not feasible to send a huge amount of data for scoring to an endpoint?
Any alternative suggestion for scoring on Azure is also welcome.
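One pragmatic option before reaching for batch endpoints is to split your records into fixed-size batches and POST each batch. This is a sketch under the assumption that the endpoint accepts {"data": [...]} with multiple records, as the single-record payload above suggests; scoring_uri and key are the placeholders from the question:

```python
import json
import requests

def chunk(records, size):
    """Yield successive fixed-size batches from a list of records."""
    for i in range(0, len(records), size):
        yield records[i:i + size]

def score_in_batches(scoring_uri, key, records, batch_size=100):
    """POST records to the endpoint one batch at a time; return the raw responses.

    Assumes the scoring script accepts {"data": [...]} with multiple records.
    """
    headers = {'Content-Type': 'application/json',
               'Authorization': f'Bearer {key}'}
    responses = []
    for batch in chunk(records, batch_size):
        resp = requests.post(scoring_uri, json.dumps({"data": batch}), headers=headers)
        responses.append(resp.text)
    return responses

# records would typically be loaded from a file, e.g.:
#   records = json.load(open('records.json'))['data']
print(list(chunk([1, 2, 3, 4, 5], 2)))  # [[1, 2], [3, 4], [5]]
```

Whether this is enough depends on payload size limits and the timeout of the real-time endpoint; for genuinely large datasets the batch endpoint below is the better fit.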
Let's assume you have a folder called json_data where all your JSON files are stored; then you would open these files and post them to your endpoint as follows:
import requests
import json
import os
import glob

your_uri = 'https://jsonplaceholder.typicode.com/'
folder_path = './json_data'

for filename in glob.glob(os.path.join(folder_path, '*.json')):
    with open(filename, 'r') as f:
        json_input_data = json.load(f)
    resp = requests.post(your_uri, json=json_input_data)
    print(resp)
To see a successful HTTP 201 response with jsonplaceholder.typicode.com, create a folder named json_data in the same directory as your Python file, then create a few JSON files inside it and paste some data into them, e.g.:
file1.json:
{
"title": "some title name 1",
"body": "some body content 1"
}
file2.json:
{
"title": "some title name 2",
"body": "some body content 2"
}
etc.
You could easily rewrite this to use your own URI, key, headers, etc.
To send bulk data for inferencing, I recommend creating a Batch Endpoint in Azure ML; the best way to do that is with the Azure CLI:
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-batch-endpoint#create-a-batch-endpoint
You can then start a batch scoring using:
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-batch-endpoint#start-a-batch-scoring-job-using-the-azure-cli
Or using REST:
https://learn.microsoft.com/en-us/azure/machine-learning/how-to-use-batch-endpoint#start-a-batch-scoring-job-using-rest

Importing a JSON file into Firebase with Firebase IDs

I generated a dump from one of my tables and every entity looks like this:
{
"name": "¡Felicidades! Menos cel=más Yumi.",
"promotion": "Tu paciencia será recompensada con un 2x1 en un cono de yogurt de tu elección.",
"establishment_id": 1,
"quantity": 10,
"validity_date": "2017-09-22",
"terms": "aplica restricciones",
"code": "2x1yum",
"points": 500,
"active": 0,
"image": "../uploads/20170511/img-59149a2de8d3b",
"image_file_name": "2X1.jpg",
"game_level_id": 0
},
When I use the import option on Firebase, my database gives the child nodes numerical IDs (1, 2, 3, etc.), but I want the child nodes to have the custom Firebase push IDs you get when you push a new node to the DB, e.g. (HW0vcmYU4MW0zpZX9KlEoVCq1Up2).
I'm trying to write a Python script that reads my table and gives every object in the JSON its unique Firebase ID using the Admin SDK, but I'm stuck trying to iterate through the JSON.
Should I use a CSV instead?
Here is my Python function; if I could get some assistance iterating through it, I'd appreciate it:
import json
import firebase_admin
from firebase_admin import credentials
from firebase_admin import db

cred = credentials.Certificate('./dev.json')
firebase_admin.initialize_app(cred, {
    'databaseURL': 'https://url.com/'
})
ref = db.reference('mobile_user/')

# Load the dump as a JSON array of coupon objects
with open('coupons.json') as f:
    data = json.load(f)

# push() generates a unique Firebase push ID for each child node
for coupon in data:
    new_ref = ref.push(coupon)
    print(new_ref.key)

How to access certain values of a json site via python

This is the code I have so far:
import json
import requests
import time

endpoint = "https://www.deadstock.ca/collections/new-arrivals/products/nike-air-max-1-cool-grey.json"
req = requests.get(endpoint)
reqJson = json.loads(req.text)

for id in reqJson['product']:
    name = (id['title'])
    print(name)
Feel free to visit the link. I'm trying to grab all the "id" values and print them out; they will be used later to send to my Discord.
I tried with my above code, but I have no idea how to actually get those values. I don't know which variable to use in the for ... in reqJson statement.
If anyone could help me out and guide me to get all of the IDs to print, that would be awesome.
ProductTitle = reqJson['product']['title']
print(ProductTitle)
I see from the link you provided that the only ids that appear in a list are part of the variants list under product; all the other ids are not in lists, so there is no need to iterate over them. Here's an excerpt of the data for clarity:
{
"product":{
"id":232418213909,
"title":"Nike Air Max 1 \/ Cool Grey",
...
"variants":[
{
"id":3136193822741,
"product_id":232418213909,
"title":"8",
...
},
{
"id":3136193855509,
"product_id":232418213909,
"title":"8.5",
...
},
{
"id":3136193789973,
"product_id":232418213909,
"title":"9",
...
},
...
],
"image":{
"id":3773678190677,
"product_id":232418213909,
"position":1,
...
}
}
}
So what you need to do is iterate over the list of variants under product instead:
import json
import requests

endpoint = "https://www.deadstock.ca/collections/new-arrivals/products/nike-air-max-1-cool-grey.json"
req = requests.get(endpoint)
reqJson = json.loads(req.text)

for product in reqJson['product']['variants']:
    print(product['id'], product['title'])
This outputs:
3136193822741 8
3136193855509 8.5
3136193789973 9
3136193757205 9.5
3136193724437 10
3136193691669 10.5
3136193658901 11
3136193626133 12
3136193593365 13
And if you simply want the product id and product name, they would be reqJson['product']['id'] and reqJson['product']['title'], respectively.
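Since the goal is to collect all the id values for sending to Discord later, you can gather them into a list instead of printing. A sketch using a trimmed inline stand-in for the product JSON above (a real run would parse req.text):

```python
import json

# Trimmed stand-in for the deadstock.ca product JSON shown above.
raw = json.dumps({
    "product": {
        "id": 232418213909,
        "title": "Nike Air Max 1 / Cool Grey",
        "variants": [
            {"id": 3136193822741, "title": "8"},
            {"id": 3136193855509, "title": "8.5"},
            {"id": 3136193789973, "title": "9"},
        ],
    }
})

reqJson = json.loads(raw)

# Collect every variant id into a list for later use.
variant_ids = [v["id"] for v in reqJson["product"]["variants"]]
print(variant_ids)  # [3136193822741, 3136193855509, 3136193789973]
```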
