Python function to transform data to JSON

How can I convert the string below to a dictionary?
code.py
message = event['Records'][0]['Sns']['Message']
print(message)
# this gives the below and the type is <class 'str'>
{
"created_at":"Sat Jun 26 12:25:21 +0000 2021",
"id":1408763311479345152,
"text":"#test I\'m planning to buy the car today \ud83d\udd25\n\n",
"language":"en",
"author_details":{
"author_id":1384883875822907397,
"author_name":"\u1d04\u0280\u028f\u1d18\u1d1b\u1d0f\u1d04\u1d1c\u0299 x NFTs \ud83d\udc8e",
"author_username":"cryptocurrency_x009",
"author_profile_url":"https://xxxx.com",
"author_created_at":"Wed Apr 21 14:57:11 +0000 2021"
},
"id_displayed":"1",
"counter_emoji":{
}
}
I need to add an additional field, "status": 1, so that it looks like this:
{
"created_at":"Sat Jun 26 12:25:21 +0000 2021",
"id":1408763311479345152,
"text":"#test I\'m planning to buy the car today \ud83d\udd25\n\n",
"language":"en",
"author_details":{
"author_id":1384883875822907397,
"author_name":"\u1d04\u0280\u028f\u1d18\u1d1b\u1d0f\u1d04\u1d1c\u0299 x NFTs \ud83d\udc8e",
"author_username":"cryptocurrency_x009",
"author_profile_url":"https://xxxx.com",
"author_created_at":"Wed Apr 21 14:57:11 +0000 2021"
},
"id_displayed":"1",
"counter_emoji":{
},
"status": 1
}
What is the best way of doing this?
Update: I managed to get it working.
I used ast.literal_eval(message), as shown below.
import ast
D2 = ast.literal_eval(message)
D2["status"] = 1
print(D2)
# This gives the following
{
"created_at":"Sat Jun 26 12:25:21 +0000 2021",
"id":1408763311479345152,
"text":"#test I\'m planning to buy the car today \ud83d\udd25\n\n",
"language":"en",
"author_details":{
"author_id":1384883875822907397,
"author_name":"\u1d04\u0280\u028f\u1d18\u1d1b\u1d0f\u1d04\u1d1c\u0299 x NFTs \ud83d\udc8e",
"author_username":"cryptocurrency_x009",
"author_profile_url":"https://xxxx.com",
"author_created_at":"Wed Apr 21 14:57:11 +0000 2021"
},
"id_displayed":"1",
"counter_emoji":{
},
"status": 1
}
Is there a better way to do this? I'm not sure, so I wanted to check.

"Can I check how do we convert the below to a dictionary?"
As far as I can tell, data = { } assigns a dictionary with content to the variable data.
"I would need to add an additional field called "status" : 1 such that it looks like this"
A simple update should do the trick.
data.update({"status": 1})

I found two issues when trying to deserialise the string as JSON
invalid escape I\\'m
unescaped newlines
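These can be worked around with
import json
import re
data = data.replace("\\'", "'")
# flags must be passed by keyword; the fourth positional argument of re.sub is count
data = re.sub('\n\n"', '\\\\n\\\\n"', data, flags=re.MULTILINE)
d = json.loads(data)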
There are also surrogate pairs in the data which may cause problems down the line. These can be fixed by doing
data = data.encode('utf-16', 'surrogatepass').decode('utf-16')
before calling json.loads.
Once the data has been deserialised to a dict you can insert the new key/value pair.
d['status'] = 1
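Putting the steps above together, a minimal sketch might look like the following. It assumes the message is already valid JSON (if it is not, the workarounds above would need to run first). The sample string and the add_status helper are made up for illustration and are not part of the original code.

```python
import json

# Hypothetical stand-in for event['Records'][0]['Sns']['Message'];
# assumed here to be a valid JSON string.
message = '{"id": 1408763311479345152, "language": "en", "counter_emoji": {}}'


def add_status(raw, status=1):
    """Parse a JSON string into a dict and add a "status" key to it."""
    record = json.loads(raw)
    record["status"] = status
    return record


record = add_status(message)
# Serialise back to a JSON string if the consumer expects text.
print(json.dumps(record))
```

Unlike ast.literal_eval, json.loads also accepts the JSON literals true, false and null, which may matter if the real messages contain them.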

Related

Understanding what happens when you use references of arrays and append them to dictionaries

If someone has a better question title, I'm all ears.
I just spent a while on this problem and found that the issue was my understanding. I had the code below (note the comment marked by "## <<--").
It takes a dictionary (summaryTotal, e.g. the data below) that contains a summary of alarms: total counts plus a list of the summarised alarms with some information about each. The alarms themselves are in summaryTotal['Alarms'], which is a list of dictionaries. My function filters this full list by alarm source and produces a result in the same format as summaryTotal, but filtered.
The line of code I was having an issue with was this one:
summaryFiltered['Alarms'].append(alarm)
In this line, alarm is really a reference to an element of the alarmsTotal list. The alarmsTotal list is itself a reference to summaryTotal['Alarms'].
When I used this line and appended alarm, the function at the bottom of the code, system.util.jsonEncode (an external function that changes a Python object into a JSON-encoded Python object; I'm not too sure of the details), always raised a 'too many recursive calls' error. When I instead built a new alarm dictionary from the values of the actual alarm object and appended that, jsonEncode started working and no longer raised recursion exceptions.
I'd like to be able to explain why that is.
When I append alarm to my new list, I think I'm appending a reference to an element of summaryTotal['Alarms'], for example summaryTotal['Alarms'][20]. Is that right?
summaryTotal = {'ActiveUnacked': 0, 'ActiveAcked': 2, 'ClearUnacked': 23,
    'Alarms':
    [
        {
            "name": "Comms Fault",
            "eventTime": "Fri Mar 05 12:25:27 ACDT 2021",
            "label": "Comms Fault",
            "displayPath": "Refrigeration MSB4 MCC PWM Comms Fault",
            "source": "FolderA/FolderB/FolderC/MSB4 MCC PWM Comms OK",
            "state": "Cleared, Unacknowledged"
        },
        {
            "name": "Comms Fault",
            "eventTime": "Fri Mar 05 12:28:46 ACDT 2021",
            "label": "Comms Fault",
            "displayPath": "Refrigeration MSB4 MCC PWM Comms Fault",
            "source": "Folder1/Folder2/Folder3/MSB4 MCC PWM Comms OK",
            "state": "Cleared, Unacknowledged"
        }
    ]
}
alarmsTotal = summaryTotal['Alarms']
summaryFiltered = {'ActiveAcked':0, 'ActiveUnacked':0, 'ClearUnacked':0, 'Alarms':[]}
for alarm in alarmsTotal:
    if pathFilter in alarm['source']:
        alarmInfo = {'name':alarm['name'],
                     'label':alarm['label'],
                     'displayPath':alarm['displayPath'],
                     'source': alarm['source'],
                     'state': alarm['state'],
                     'eventTime': alarm['eventTime']
                     }
        summaryFiltered['Alarms'].append(alarm) ## <<-- to fix the code, I added `alarmInfo` above and appended `alarmInfo` instead of `alarm`
        if alarm['state'] == 'Cleared, Unacknowledged':
            summaryFiltered['ClearUnacked'] += 1
        if alarm['state'] == 'Active, Unacknowledged':
            summaryFiltered['ActiveUnacked'] += 1
        if alarm['state'] == 'Active, Acknowledged':
            summaryFiltered['ActiveAcked'] += 1
ret = system.util.jsonEncode(summaryFiltered)
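The difference between appending alarm and appending a freshly built dictionary can be illustrated in plain Python. This is only a sketch of reference semantics with made-up variable names (alarms_total, by_reference, by_copy). It uses ordinary dicts and does not reproduce the recursion error, whose cause is not shown in the question.

```python
# Simplified stand-in for summaryTotal['Alarms'] (plain dicts, an assumption).
alarms_total = [
    {"name": "Comms Fault", "source": "FolderA/FolderB/FolderC/MSB4 MCC PWM Comms OK"},
    {"name": "Comms Fault", "source": "Folder1/Folder2/Folder3/MSB4 MCC PWM Comms OK"},
]

by_reference = []
by_copy = []
for alarm in alarms_total:
    by_reference.append(alarm)     # stores the very same object
    by_copy.append(dict(alarm))    # stores a new dict with the same keys and values

# Identity checks: the first list shares objects with the source, the second does not.
shares_object = by_reference[0] is alarms_total[0]
copy_is_new = by_copy[0] is not alarms_total[0]

# Changing the original is visible through the reference, not through the copy.
alarms_total[0]["name"] = "Renamed"
print(by_reference[0]["name"], by_copy[0]["name"])
```

So appending alarm does store a reference to the existing element, whereas building alarmInfo produces an independent plain dictionary. If the real alarm objects are something other than plain dicts, that may be what the encoder struggles with, but that is a guess.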

Python - Mongoengine: date range query

I am relatively new to MongoDB in Python, so kindly help.
I have created a collection called Waste:
from datetime import datetime
from mongoengine import Document, DateTimeField, FloatField, IntField

class Waste(Document):
    meta = {'collection': 'Waste'}
    item_id = IntField(required=True)
    date_time_record = DateTimeField(default=datetime.utcnow)
    waste_id = IntField(unique=True, required=True)
    weight = FloatField(required=True)
I want to run a range query for a given start and end date.
I have tried the following query:
start = datetime(start_year, start_month, start_day)
end = datetime(end_year, end_month, end_day)
kwargs['date_time_record'] = {'$lte': end, '$gte': start}
reports = Waste.objects(**kwargs).get()
But I keep getting the error: DoesNotExist: Waste matching query does not exist.
The date values are sent as:
{
"start_year": 2020,
"start_month" : 5,
"start_day" : 10,
"end_year": 2020,
"end_month" : 5,
"end_day" : 20
}
When I try to get the first object from the collection, the output in JSON is:
{"_id": {"$oid": "5ebbcf126fdbb9db9f74d24a"}, "item_id": 96387295, "date_time_record": {"$date": 1589366546870}, "waste_id": 24764942, "weight": 32546.0}
A $date is added, and I am unable to decipher the numbers in the date field. But when I look at the data using MongoDB Compass, it looks just fine.
A record exists in the given date range, so I can't understand where I am going wrong.
I got this working by using Q.
The query I used is:
reports = Waste.objects((Q(date_time_record__gte=start) & Q(date_time_record__lte=end)))
The response is:
[{"_id": {"$oid": "5ebbcf126fdbb9db9f74d24a"}, "item_id": 96387295, "date_time_record": {"$date": 1589366546870}, "waste_id": 24764942, "weight": 32546.0}]
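The number under "$date" in the output above is, as far as I know, a count of milliseconds since the Unix epoch (UTC), which is how MongoDB's extended JSON represents dates. A small sketch of decoding it and checking it against the requested range follows. The variable names are illustrative, and treating the start and end dates as UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone

millis = 1589366546870  # value taken from the "$date" field above

# Add the milliseconds to the epoch; this avoids float rounding.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
recorded = epoch + timedelta(milliseconds=millis)
print(recorded.isoformat())

# The range from the request body, assumed to be UTC.
start = datetime(2020, 5, 10, tzinfo=timezone.utc)
end = datetime(2020, 5, 20, tzinfo=timezone.utc)
in_range = start <= recorded <= end
```

The record falls on 13 May 2020, so it does lie inside the requested range.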

Get sum of followers from Twitter for 10,000 users

I am trying to get the number of followers for each Twitter user in a list of 10,000 users randomly selected via a random number generator. I am using Twitter's GET users/lookup API to do this. I can't use GET followers directly because of its limit of 15 requests per 15 minutes. I managed to get the data, but it is not in JSON format, and I've spent the entire day trying to get jq to process it (which fails because of special characters within the envelope). Can anyone suggest how to do this?
I had no issues doing the same with Twitter's streaming API, as the data was in JSON format and parseable by jq. I need to do this with jq, Python and shell scripting.
MY CODE TO GET THIS RESPONSE: (Python)
#!/usr/bin/python
import json
import oauth2 as oauth
from random import randint
import sys
import time
ckey= '<insert-key>'
csecret= '<insert-key>'
atoken= '<insert-key>'
asecret= '<insert-key>'
consumer = oauth.Consumer(key= ckey, secret = csecret)
access_token = oauth.Token(key=atoken, secret=asecret)
client= oauth.Client(consumer, access_token)
# Print to a file
sys.stdout = open('TwitterREST.txt', 'w')
for j in range(0,20):
    for i in range(0, 900):
        user_id =(randint(2361391150, 2361416150))
        f="https://api.twitter.com/1.1/users/lookup.json?user_id="
        s=str(user_id)
        timeline_endpoint = f+s
        response,data=client.request(timeline_endpoint)
        tweets= json.loads(data)
        for tweet in tweets:
            print (data)
    # Twitter only allows 900 requests/15 mins
    time.sleep(900)
SAMPLE RESPONSE RETURNED BY TWITTER: (Cannot pretty print it)
b'[ { "id":2361393867, "id_str":"2361393867", "name":"graam a7bab", "screen_name":"bedoo691", "location":"", "description":"\u0627\u0633\u062a\u063a\u0641\u0631\u0627\u0644\u0644\u0647 \u0648\u0627\u062a\u0648\u0628 \u0627\u0644\u064a\u0647\u0647 ..!*", "url":null, "entities": { "description": { "urls":[]} }, "protected":false, "followers_count":1, "friends_count":6, "listed_count":0, "created_at":"Sun Feb 23 19:03:21 +0000 2014", "favourites_count":1, "utc_offset":null, "time_zone":null, "geo_enabled":false, "verified":false, "statuses_count":7, "lang":"ar", "status": { "created_at":"Tue Mar 04 16:07:44 +0000 2014", "id":440881284383256576, "id_str":"440881284383256576", "text":"#Naif8989", "truncated":false, "entities":{ "hashtags":[], "symbols":[], "user_mentions":[ { "screen_name":"Naif8989", "name":"\u200f naif alharbi", "id":540343286, "id_str":"540343286", "indices":[0,9]}], "urls":[]}, "source":"\u003ca href=\"http:\/\/twitter.com\/download\/android\" rel=\"nofollow\"\u003eTwitter for Android\u003c\/a\u003e", "in_reply_to_status_id":437675858485321728, "in_reply_to_status_id_str":"437675858485321728", "in_reply_to_user_id":2361393867, "in_reply_to_user_id_str":"2361393867", "in_reply_to_screen_name":"bedoo691", "geo":null, "coordinates":null, "place":null, "contributors":null, "is_quote_status":false, "retweet_count":0, "favorite_count":0, "favorited":false, "retweeted":false, lang":"und"}, contributors_enabled":false, is_translator":false, is_translation_enabled":false, profile_background_color":"C0DEED", profile_background_image_url":"http:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png", profile_background_image_url_https":"https:\/\/abs.twimg.com\/images\/themes\/theme1\/bg.png", profile_background_tile":false, profile_image_url":"http:\/\/pbs.twimg.com\/profile_images\/437664693373911040\/ydODsIeh_normal.jpeg", profile_image_url_https":"https:\/\/pbs.twimg.com\/profile_images\/437664693373911040\/ydODsIeh_normal.jpeg", profile_link_color":"1DA1F2", 
profile_sidebar_border_color":"C0DEED", profile_sidebar_fill_color":"DDEEF6", profile_text_color":"333333", profile_use_background_image":true, has_extended_profile":false, default_profile":true, default_profile_image":false, following":false, follow_request_sent":false, "notifications":false, "translator_type":"none" } ]'
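The b'...' wrapper in the sample above is what Python 3 shows when a bytes object is printed directly, and that wrapper is not valid JSON. One possible approach, sketched below, is to decode the bytes, parse them, and write each user back out with json.dumps so that every output line is a JSON object jq can read. The data value here is a shortened, hypothetical stand-in for what client.request returns.

```python
import json

# Hypothetical, shortened stand-in for the bytes returned by client.request(...).
data = b'[{"id": 2361393867, "screen_name": "bedoo691", "followers_count": 1}]'

# Decode the bytes to text, then parse the JSON array of user objects.
users = json.loads(data.decode("utf-8"))

# One JSON document per line, without the b'...' wrapper.
lines = [json.dumps(user) for user in users]
for line in lines:
    print(line)

# The follower total can also be computed directly in Python.
total_followers = sum(user["followers_count"] for user in users)
```

With output in this shape, a jq filter such as .followers_count should be able to read each line.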

Use python to create a nested json

{"0":{"posted_date":"25 Jun 2015"},"1":{"posted_date":"26 Jun 2015"}}
Note:
'0' and '1' come from a variable, 'count', which is generated in a loop
"posted_date" is a string
"25 Jun 2015" and "26 Jun 2015" also come from a variable, 'date'
How can I create JSON output like the above with Python?
[edit-not working code]
import json
final = []
count = 0
postID = 224
while postID < 1200:
    final.append({count: {"posted_ID":postID}})
    count = count + 1
    postID = postID * 2
print str(json.dumps(final))
import json
dates = ["25 Jun 2015", "26 Jun 2015", "27 Jun 2015"]
result = {}
for each, date in enumerate(dates):
    result.update({each: {"posted_date": date}})
jsoned = json.dumps(result)
You don't need to use the "count" variable
First create the map the way you want it:
outMap = {}
outMap["0"]={}
outMap["0"]["posted_date"]="25 Jun 2015"
outMap["1"]={}
outMap["1"]["posted_date"]="26 Jun 2015"
Then use json.dumps() to get the json
import json
outjson = json.dumps(outMap)
print(outjson)
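Both answers above can be combined into a short sketch that builds the outer dictionary in one pass. The dates list is an assumed input; the keys are converted to strings explicitly, although json.dumps would also turn integer keys into strings.

```python
import json

# Assumed input: the dates produced by the loop in the question.
dates = ["25 Jun 2015", "26 Jun 2015"]

# enumerate supplies the running count, so no separate counter is needed.
result = {str(count): {"posted_date": date} for count, date in enumerate(dates)}

out_json = json.dumps(result)
print(out_json)
# → {"0": {"posted_date": "25 Jun 2015"}, "1": {"posted_date": "26 Jun 2015"}}
```

The key difference from the non-working code is that the result is a single dict rather than a list of one-entry dicts.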

parse JSON values by multilevel keys

Yesterday I started learning Python, and now I want to parse some JSON values. I have read many tutorials and spent a lot of time trying to get values by multilevel key (if I can call it that) in my script, but nothing works for me. Can you help me, please?
This is my JSON output:
{
"future.arte.tv": [
{
"mediaUrl": "http://future.arte.tv/sites/default/files/styles/desktop-span12-940x529/public/berlin.jpg?itok=CvYlNekR",
"micropost": {
"html": "Berlin ",
"plainText": "Berlin"
},
"micropostUrl": "http://future.arte.tv/de/der-erste-weltkrieg-die-rolle-von-wissenschaft-und-technik",
"publicationDate": "Tue Jun 17 20:31:33 CEST 2014",
"relevance": 5.9615083,
"timestamp": 1403029893606,
"type": "image"
}
],
"www.zdf.de": [
{
"mediaUrl": "http://www.zdf.de/ZDFmediathek/contentblob/368/timg94x65blob/9800025",
"micropost": {
"plainText": "Berlin direkt"
},
"micropostUrl": "http://www.zdf.de/ZDFmediathek/hauptnavigation/sendung-a-bis-z",
"publicationDate": "Tue Jun 10 16:25:42 CEST 2014",
"relevance": 3.7259426,
"timestamp": 1402410342400,
"type": "image"
}
]
}
I need to get the values stored under the "mediaUrl" key, so I tried:
j = json.loads(jsonOutput)
keys = j.keys()
for key in keys:
    print key # keys are future.arte.tv and www.zdf.de
    print j[key]["mediaUrl"]
but print j[key]["mediaUrl"] causes this error:
TypeError: list indices must be integers, not str
So I tried print j[key][0], but the result is not what I wanted (I want just the mediaUrl value; by the way, j[key][1] causes a list index out of range error):
{u'micropostUrl': u'http://www.berlin.de/special/gesundheit-und-beauty/ernaehrung/1692726-215-spargelhoefe-in-brandenburg.html', u'mediaUrl': u'http://berlin.de/binaries/asset/image_assets/42859/ratio_4_3/1371638570/170x130/', u'timestamp': 1403862143675, u'micropost': {u'plainText': u'Spargel', u'html': u'Spargel '}, u'publicationDate': u'Fri Jun 27 11:42:23 CEST 2014', u'relevance': 1.6377668, u'type': u'image'}
Can you give me some advice please?
Here is a list comprehension that should do it
>>> [d[i][0].get('mediaUrl') for i in d.keys()]
['http://www.zdf.de/ZDFmediathek/contentblob/368/timg94x65blob/9800025',
'http://future.arte.tv/sites/default/files/styles/desktop-span12-940x529/public/berlin.jpg?itok=CvYlNekR']
How it works
First you can get a list of the top-level keys
>>> d.keys()
['www.zdf.de', 'future.arte.tv']
Get the corresponding values
>>> [d[i] for i in d.keys()]
[[{'micropostUrl': 'http://www.zdf.de/ZDFmediathek/hauptnavigation/sendung-a-bis-z', 'mediaUrl': 'http://www.zdf.de/ZDFmediathek/contentblob/368/timg94x65blob/9800025', 'timestamp': 1402410342400L, 'micropost': {'plainText': 'Berlin direkt'}, 'publicationDate': 'Tue Jun 10 16:25:42 CEST 2014', 'relevance': 3.7259426, 'type': 'image'}], [{'micropostUrl': 'http://future.arte.tv/de/der-erste-weltkrieg-die-rolle-von-wissenschaft-und-technik', 'mediaUrl': 'http://future.arte.tv/sites/default/files/styles/desktop-span12-940x529/public/berlin.jpg?itok=CvYlNekR', 'timestamp': 1403029893606L, 'micropost': {'plainText': 'Berlin', 'html': 'Berlin '}, 'publicationDate': 'Tue Jun 17 20:31:33 CEST 2014', 'relevance': 5.9615083, 'type': 'image'}]]
For each dictionary, grab the value for the 'mediaUrl' key
>>> [d[i][0].get('mediaUrl') for i in d.keys()]
['http://www.zdf.de/ZDFmediathek/contentblob/368/timg94x65blob/9800025',
'http://future.arte.tv/sites/default/files/styles/desktop-span12-940x529/public/berlin.jpg?itok=CvYlNekR']
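The comprehension above reads only the first item of each list. A slightly more general sketch, written for Python 3, walks every item under every top-level key and skips items without a "mediaUrl". The json_output string is a trimmed copy of the question's data, and the variable names are illustrative.

```python
import json

# Trimmed copy of the JSON from the question (only the fields needed here).
json_output = """
{
  "future.arte.tv": [
    {"mediaUrl": "http://future.arte.tv/sites/default/files/styles/desktop-span12-940x529/public/berlin.jpg?itok=CvYlNekR",
     "type": "image"}
  ],
  "www.zdf.de": [
    {"mediaUrl": "http://www.zdf.de/ZDFmediathek/contentblob/368/timg94x65blob/9800025",
     "type": "image"}
  ]
}
"""

data = json.loads(json_output)

# Each top-level value is a list of dicts, so iterate the list before indexing by key.
media_urls = [
    item["mediaUrl"]
    for items in data.values()
    for item in items
    if "mediaUrl" in item
]
print(media_urls)
```

This also explains the original TypeError: j[key] is a list, so it has to be indexed by position (or iterated) before a string key such as "mediaUrl" can be used.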
