I have a function in AWS Lambda that connects to the Twitter API and returns the tweets matching a search query I provide via the event. A simplified version of the function is below. There are a few helper functions I use, like get_secret, which manages API keys, and process_tweet, which limits what data gets sent back and does things like convert the created-at date to a string. The net result is that I should get back a list of dictionaries.
import json

import tweepy

def lambda_handler(event, context):
    twitter_secret = get_secret("twitter")
    auth = tweepy.OAuthHandler(twitter_secret['api-key'],
                               twitter_secret['api-secret'])
    auth.set_access_token(twitter_secret['access-key'],
                          twitter_secret['access-secret'])
    api = tweepy.API(auth)
    cursor = tweepy.Cursor(api.search,
                           q=event['search'],
                           include_entities=True,
                           tweet_mode='extended',
                           lang='en')
    tweets = list(cursor.items())
    tweets = [process_tweet(t) for t in tweets if not t.retweeted]
    return json.dumps({"tweets": tweets})
From my desktop, then, I have code that invokes the Lambda function.
import json

import boto3

aws_lambda = boto3.client('lambda', region_name="us-east-1")
payload = {"search": "paint%20protection%20film filter:safe"}
lambda_response = aws_lambda.invoke(FunctionName="twitter-searcher",
                                    InvocationType="RequestResponse",
                                    Payload=json.dumps(payload))
results = lambda_response['Payload'].read()
tweets = results.decode('utf-8')
The problem is that somewhere between json.dumps-ing the output in Lambda and reading the payload in Python, the data has gotten screwy. For example, a line break which should be \n becomes \\\\n, all of the double quotes are stored as \\" and Unicode characters are all prefixed by \\. In other words, everything that was escaped arrived on my desktop with the escaping character itself escaped. Consider this element of the list that was returned (with manual formatting).
'{\\"userid\\": 190764134,
\\"username\\": \\"CapitalGMC\\",
\\"created\\": \\"2018-09-02 15:00:00\\",
\\"tweetid\\": 1036267504673337344,
\\"text\\": \\"Protect your vehicle\'s paint! Find out how on this week\'s blog.
\\\\ud83d\\\\udc47\\\\n\\\\nhttps://url/XYMxPhVhdH https://url/mFL2Zv8nWW\\"}'
I can use regex to fix some of the problems (\\" and \\\\n), but the Unicode is tricky: even if I match it, how do I replace it with a properly escaped character? When I do this in R, using the aws.lambda package, everything is fine, with no weird escaped escapes.
What am I doing wrong on my desktop with the response from AWS Lambda that's garbling the data?
Update
The process_tweet function is below. It just pulls out the bits I care to keep, formats the datetime object as a string, and returns a dictionary.
def process_tweet(tweet):
    bundle = {
        "userid": tweet.user.id,
        "username": tweet.user.screen_name,
        "created": str(tweet.created_at),
        "tweetid": tweet.id,
        "text": tweet.full_text
    }
    return bundle
Just for reference, in R the code looks like this.
payload = list(search="paint%20protection%20film filter:safe")
results = aws.lambda::invoke_function("twitter-searcher"
                                      ,payload = jsonlite::toJSON(payload
                                                                  ,auto_unbox=TRUE)
                                      ,type = "RequestResponse"
                                      ,key = creds$key
                                      ,secret = creds$secret
                                      ,session_token = creds$session_token
                                      ,region = creds$region)
tweets = jsonlite::fromJSON(results)
str(tweets)
#> 'data.frame': 133 obs. of 5 variables:
#> $ userid : num 2231994854 407106716 33553091 7778772 782310 ...
#> $ username: chr "adaniel_080213" "Prestige_AdamL" "exclusivedetail" "tedhu" ...
#> $ created : chr "2018-09-12 14:07:09" "2018-09-12 11:31:56" "2018-09-12 10:46:55" "2018-09-12 07:27:49" ...
#> $ tweetid : num 1039878080968323072 1039839019989983232 1039827690151444480 1039777586975526912 1039699310382931968 ...
#> $ text : chr "I liked a #YouTube video https://url/97sRShN4pM Tesla Model 3 - Front End Package - Suntek Ultra Paint Protection Film" "Another #Corvette #ZO6 full body clearbra wrap completed using #xpeltech ultimate plus PPF ... Paint protection"| __truncated__ "We recently protected this Tesla Model 3 with Paint Protection Film and Ceramic Coating.#teslamodel3 #charlotte"| __truncated__ "Tesla Model 3 - Front End Package - Suntek Ultra Paint Protection Film https://url/AD1cl5dNX3" ...
tweets[131,]
#> userid username created tweetid
#> 131 190764134 CapitalGMC 2018-09-02 15:00:00 1036267504673337344
#> text
#> 131 Protect your vehicle's paint! Find out how on this week's blog.👇\n\nhttps://url/XYMxPhVhdH https://url/mFL2Zv8nWW
In your Lambda function you should return a response object with a JSON object in the response body.
# Lambda function
def get_json(event, context):
    """Retrieve JSON from server."""
    # Business logic goes here.
    response = {
        "statusCode": 200,
        "headers": {},
        "body": json.dumps({
            "message": "This is the message in a JSON object."
        })
    }
    return response
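With that response shape, note that the desktop client has to decode twice: once for the envelope that Lambda serializes, and once for the body string produced by json.dumps. A minimal sketch, reusing the lambda_response object from the question:

import json

raw = lambda_response['Payload'].read().decode('utf-8')
envelope = json.loads(raw)           # outer JSON written by Lambda itself
body = json.loads(envelope['body'])  # inner JSON string from json.dumps
print(body['message'])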
Don't use json.dumps()
I had a similar issue, and when I returned "body": content instead of "body": json.dumps(content), I could easily access and manipulate my data. Before that, I got that weird form that looks like JSON but isn't.
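Applied to the original question, that means letting Lambda perform the one and only serialization. A sketch of the relevant changes, not a full rewrite:

# In the handler: return the structure itself, not a pre-serialized string.
return {"tweets": tweets}

# On the desktop: a single json.loads now yields clean data, no escaped escapes.
results = json.loads(lambda_response['Payload'].read().decode('utf-8'))
tweets = results['tweets']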
Related
I'm trying to use the Google Cloud Translation API to translate an Excel (or CSV) document that includes text in multiple languages; my target language is English.
I would like to use the "Translate text in batches (Advanced edition only)" code sample (link here: https://cloud.google.com/translate/docs/samples/translate-v3-batch-translate-text), but the sample contains a line that defines the source language, so there can only be one source language.
But I need to detect the language in the document first and then translate the text to English. There is a code sample for detecting the language of a simple text string, "Detecting languages (Advanced)" (link: https://cloud.google.com/translate/docs/advanced/detecting-language-v3), but I need to combine the first code sample, which translates documents but has a single fixed source language, with the ability to detect the language instead.
Is there this type of code sample in the resources? How could this be solved?
Here is the sample code in question:
from google.cloud import translate

def batch_translate_text(
    input_uri="gs://YOUR_BUCKET_ID/path/to/your/file.txt",
    output_uri="gs://YOUR_BUCKET_ID/path/to/save/results/",
    project_id="YOUR_PROJECT_ID",
    timeout=180,
):
    """Translates a batch of texts on GCS and stores the result in a GCS location."""
    client = translate.TranslationServiceClient()
    location = "us-central1"
    # Supported file types: https://cloud.google.com/translate/docs/supported-formats
    gcs_source = {"input_uri": input_uri}
    input_configs_element = {
        "gcs_source": gcs_source,
        "mime_type": "text/plain",  # Can be "text/plain" or "text/html".
    }
    gcs_destination = {"output_uri_prefix": output_uri}
    output_config = {"gcs_destination": gcs_destination}
    parent = f"projects/{project_id}/locations/{location}"
    # Supported language codes: https://cloud.google.com/translate/docs/language
    operation = client.batch_translate_text(
        request={
            "parent": parent,
            "source_language_code": "en",
            "target_language_codes": ["ja"],  # Up to 10 language codes here.
            "input_configs": [input_configs_element],
            "output_config": output_config,
        }
    )
    print("Waiting for operation to complete...")
    response = operation.result(timeout)
    print("Total Characters: {}".format(response.total_characters))
    print("Translated Characters: {}".format(response.translated_characters))
Unfortunately it is not possible to pass an array of values to the source_language_code field of batchTranslateText. What I can suggest is to perform detectLanguage and translateText per file.
What the code below does:
1. Extract the content to be translated. For testing purposes, the CSV files used have only one column; the content of sample1.csv is in tl (Tagalog) and that of sample2.csv is in es (Spanish).
2. Pass the extracted content to detect_language() to get the detected language code.
3. Pass all the required parameters to translate_text() to translate.
NOTE: The code below is only tested with single-column CSV files. Edit the extraction loop in main() to select whichever column you want to pull data from (see the sketch after the sample output below).
from google.cloud import translate
import csv

def listToString(s):
    """Transform list to string."""
    str1 = " "
    return (str1.join(s))

def detect_language(project_id, content):
    """Detecting the language of a text string."""
    client = translate.TranslationServiceClient()
    location = "global"
    parent = f"projects/{project_id}/locations/{location}"
    response = client.detect_language(
        content=content,
        parent=parent,
        mime_type="text/plain",  # mime types: text/plain, text/html
    )
    for language in response.languages:
        return language.language_code

def translate_text(text, project_id, source_lang):
    """Translating Text."""
    client = translate.TranslationServiceClient()
    location = "global"
    parent = f"projects/{project_id}/locations/{location}"
    # Detail on supported types can be found here:
    # https://cloud.google.com/translate/docs/supported-formats
    response = client.translate_text(
        request={
            "parent": parent,
            "contents": [text],
            "mime_type": "text/plain",  # mime types: text/plain, text/html
            "source_language_code": source_lang,
            "target_language_code": "en-US",
        }
    )
    # Display the translation for each input text provided
    for translation in response.translations:
        print("Translated text: {}".format(translation.translated_text))

def main():
    project_id = "your-project-id"
    csv_files = ["sample1.csv", "sample2.csv"]
    # Perform your content extraction here if you have a different file format #
    for csv_file in csv_files:
        csv_file = open(csv_file)
        read_csv = csv.reader(csv_file)
        content_csv = []
        for row in read_csv:
            content_csv.extend(row)
        content = listToString(content_csv)  # convert list to string
        detect = detect_language(project_id=project_id, content=content)
        translate_text(text=content, project_id=project_id, source_lang=detect)

if __name__ == "__main__":
    main()
sample1.csv:
kamusta
ayos
sample2.csv:
cómo estás
okey
Output using the code above:
Translated text: how are you okay
Translated text: how are you ok
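If your files have more than one column, the extraction loop in main() is the place to change. A hypothetical tweak that keeps only the first column of each row:

for row in read_csv:
    if row:                         # skip blank lines
        content_csv.append(row[0])  # keep only the first column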
I am using the Microsoft Graph API to pull my emails in Python and return them as a JSON object. There is a limitation that it only returns 12 emails. The code is:
def get_calendar_events(token):
    graph_client = OAuth2Session(token=token)
    # Configure query parameters to
    # modify the results
    query_params = {
        #'$select': 'subject,organizer,start,end,location',
        #'$orderby': 'createdDateTime DESC'
        '$select': 'sender, subject',
        '$skip': 0,
        '$count': 'true'
    }
    # Send GET to /me/messages
    events = graph_client.get('{0}/me/messages'.format(graph_url), params=query_params)
    events = events.json()
    # Return the JSON result
    return events
The response I get is twelve emails with subject and sender, plus the total count of my emails.
Now I want to iterate over the emails, changing $skip in query_params to get the next 12. Is there a method to iterate using loops or recursion?
I'm thinking something along the lines of this:
def get_calendar_events(token):
    graph_client = OAuth2Session(token=token)
    # Configure query parameters to
    # modify the results
    json_list = []
    ct = 0
    while True:
        query_params = {
            #'$select': 'subject,organizer,start,end,location',
            #'$orderby': 'createdDateTime DESC'
            '$select': 'sender, subject',
            '$skip': ct,
            '$count': 'true'
        }
        # Send GET to /me/messages
        events = graph_client.get('{0}/me/messages'.format(graph_url), params=query_params)
        events = events.json()
        # Stop once a page comes back empty or with an error
        if not events.get('value'):
            break
        json_list.append(events)
        ct += 12
    # Return the accumulated JSON pages
    return json_list
This may require some tweaking, but essentially you're adding 12 to the offset each time, as long as a page still comes back with messages. Each page's JSON is appended to a list, which is returned at the end.
If you know how many emails you have, you could also batch it that way.
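A sturdier variant, sketched below on the assumption that graph_client and graph_url are set up as in the question, is to request a page size with $top and follow the @odata.nextLink that Graph returns until no more pages come back:

def get_all_messages(token):
    graph_client = OAuth2Session(token=token)
    url = '{0}/me/messages'.format(graph_url)
    params = {'$select': 'sender,subject', '$top': 12}
    messages = []
    while url:
        page = graph_client.get(url, params=params).json()
        messages.extend(page.get('value', []))
        url = page.get('@odata.nextLink')  # absent on the last page
        params = None  # nextLink already embeds the query parameters
    return messages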
I'm trying to convert a working Python program that retrieves information from a website with an API key into its R equivalent. Since I don't know much about httr or Python, it's a challenge. The Python code is (somewhat abbreviated and with a dummy X-ApiKey):
import requests

url = 'https://api.clarivate.com/api/woslite'
query = 'ts=((land AND ocean AND climate AND change)) AND PY=2013-2019'
count = 100
firstRecord = 1
parameters = {'databaseId': 'WOK', 'usrQuery': query, 'count': count, 'firstRecord': firstRecord}
headers = {'accept': 'application/json', 'X-ApiKey': '********'}
response = requests.get(url, params=parameters, headers=headers)
My attempt at an R version is
library(httr)
wosliteKey <- Sys.getenv("wosliteKey")
firstRecord <- 1
count <- 100
url <- 'https://api.clarivate.com/api/woslite'
query <- 'ts=(land AND ocean AND climate AND change) AND PY=2013-2019'
r <- GET(url, query = list(api_key = wosliteKey, usrQuery = query, databaseId = 'WOK', count = count, firstRecord = firstRecord))
Running the above returns
Response [https://api.clarivate.com/api/woslite]
Date: 2019-05-09 22:50
Status: 401
Content-Type: application/json; charset=utf-8
Size: 41 B
Status 401 means unauthorized access. The Python code uses X-ApiKey rather than api_key, but I can't figure out (a) what the difference is and (b) how to put it into the query list.
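One quick way to see the difference, reusing the url, parameters, and headers objects from the Python snippet above: requests puts params into the URL's query string but sends headers as HTTP headers, so a key the server expects in X-ApiKey is never seen if it only appears in the query.

import requests

req = requests.Request('GET', url, params=parameters, headers=headers).prepare()
print(req.url)      # the query string carries databaseId, usrQuery, count, firstRecord
print(req.headers)  # X-ApiKey travels here, separate from the URL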
With the help of the comments above I figured out how to make this work. For reference, the Python requests call looks like this:
response = requests.get(url,params=parameters, headers=headers)
For my problem, I have the following from the Python program:
parameters = {'databaseId': 'WOK', 'usrQuery': query, 'count': count, 'firstRecord': firstRecord}
headers={'accept':'application/json','X-ApiKey':'********'}
The equivalent with httr GET is
response <- httr::GET(url,
                      httr::add_headers(accept = 'application/json',
                                        `X-APIKey` = wosliteKey),
                      query = list(databaseId = 'WOK', usrQuery = query,
                                   count = count, firstRecord = firstRecord))
The Python headers info is replaced by httr's add_headers() function, and the Python parameters are passed as a list to the query argument.
I searched for a while and found something that looks like it addresses your problem:
android package - R.attr nested class -- employing an api key
I'm trying to duplicate some subscription Plans in my Stripe account using the Python SDK, so that I can update the plan prices (plans are immutable). I can successfully list all plans, so authentication is not an issue:
import stripe

stripe.api_key = 'sk_test_...'

start_id = None
while True:
    if start_id:
        resp = stripe.Plan.list(starting_after=start_id)
    else:
        resp = stripe.Plan.list()
    plans = resp['data']
    if len(plans) == 0: break
    start_id = plans[-1]['id']
    for plan in plans:
        new_amount = get_new_plan_amount(plan['id'], plan['name'])
        new_plan = {
            "id": "%s-v2" % plan["id"],
            "name": "%s V2" % plan["name"],
            "amount": new_amount,
            "interval": plan["interval"],
            "currency": plan["currency"],
        }
        if plan['interval_count']:
            new_plan["interval_count"] = plan['interval_count']
        if plan['metadata']:
            new_plan["metadata"] = plan['metadata']
        if plan['statement_descriptor']:
            new_plan["statement_descriptor"] = plan['statement_descriptor']
        stripe.Plan.create(new_plan)  ### error
When I try to create the replacement plan I get the following error:
Stripe.error.AuthenticationError: Invalid API Key provided: "{'****': *'*********** ', '********': *'*****', '********_*****': *, '********': '', '******': *****, '**': *'*******-v2'}". This key contains at least one space. Please delete the spaces and try again.
I don't get it. Which field is it that contains a space? I've checked the id field (which it seems to be suggesting) but there are no spaces in that field.
I'm using Python 2.7 and version 2013-08-13 of the Stripe API.
You need to unpack the dictionary containing your parameters when creating the plan:
stripe.Plan.create(**new_plan)
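The baffling error message makes sense once you know that, in stripe-python of that vintage, the first positional argument of create() was the API key. Passing the dict positionally therefore sent the whole stringified dict as the key, which Stripe then echoed back with the complaint about spaces. A small illustration with made-up values:

params = {"id": "gold-v2", "name": "Gold V2", "amount": 2000,
          "interval": "month", "currency": "usd"}

stripe.Plan.create(params)    # dict lands in the api_key slot -> AuthenticationError
stripe.Plan.create(**params)  # each entry becomes a keyword argument -> works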
Additionally, you don't need to manage pagination parameters yourself. The Python library can do it for you using auto-pagination:
plans = stripe.Plan.list()
for plan in plans.auto_paging_iter():
    pass  # do something with plan
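Putting both tips together, the duplication loop from the question could collapse to roughly this sketch (get_new_plan_amount is the question's own helper; the optional fields are omitted for brevity):

for plan in stripe.Plan.list().auto_paging_iter():
    stripe.Plan.create(**{
        "id": "%s-v2" % plan["id"],
        "name": "%s V2" % plan["name"],
        "amount": get_new_plan_amount(plan["id"], plan["name"]),
        "interval": plan["interval"],
        "currency": plan["currency"],
    })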
When I send a request like
f = urllib.urlopen('https://www.googleapis.com/plus/v1/people/103777531434977807649/activities/public?key=***************')
json = f.read()
print json
it returns something like this, not the required JSON:
{
  "kind": "plus#activityFeed",
  "etag": "\"seVFOlIgH91k2i-GrbizYfaw_AM/chWYjTdvKRLG9yxkeAfrCrofGHk\"",
  "nextPageToken": "CAIQ__________9_IAAoAA",
  "title": "Google+ List of Activities for Collection PUBLIC",
  "items": []
}
What do I have to do to get the right response?
This is the code:
import json
import urllib
from datetime import datetime

import MySQLdb

f = urllib.urlopen('https://www.googleapis.com/plus/v1/people/'+id+'/activities/public?key=*****************&maxResults=100')
s = f.read()
f.close()
ss = json.loads(s)
print ss
try:
    nextpagetoken = str(ss['nextPageToken'])
    i = 0
    str_current_datetime = str(datetime.now())
    gp_crawldate = str_current_datetime.split(" ")[0]
    gp_groupid = id
    db = MySQLdb.connect("localhost", "root", "****", "googleplus")
    cursor = db.cursor()
    while i < len(ss['items']):
        gp_kind = str(ss['items'][i]['kind'])
        gp_title = str(ss['items'][i]['title'].encode('utf8'))
        gp_published = str(ss['items'][i]['published'][0:10])
        check = int(cool(str(ss['items'][i]['published'][0:19])))  # this method is defined in the code
        gp_activityid = str(ss['items'][i]['id'])
        gp_actorid = str(ss['items'][i]['actor']['id'])
        gp_verb = str(ss['items'][i]['verb'])
        gp_objecttype = str(ss['items'][i]['object']['objectType'])
        gp_originalcontent = str(ss['items'][i]['object']['content'].encode('utf8'))
        gp_totalreplies = str(ss['items'][i]['object']['replies']['totalItems'])
        gp_totalplusone = str(ss['items'][i]['object']['plusoners']['totalItems'])
        gp_totalreshare = str(ss['items'][i]['object']['resharers']['totalItems'])
        #gp_geocode = str(ss['items'][i]['geocode'])
        #gp_placename = str(ss['items'][i]['placeName'])
        i = i + 1
except Exception:
    pass  # the except clause is omitted in the question's abbreviated code
Is there any change in the G+ API?
The response you posted is a correct response. If the items field is an empty list, then the user whose posts you are fetching has probably never posted anything publicly. In this case, I confirmed that the user has no public posts simply by visiting their profile.
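If the crawler from the question should tolerate such users, a cheap guard before the loop avoids doing any work on an empty feed. A minimal sketch against the response shape shown above:

ss = json.loads(s)
if not ss.get('items'):
    print('No public activities for this user; nothing to store')
else:
    pass  # proceed with the while loop over ss['items'] as in the question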