Splitting multiple Dictionaries within a Pandas Column - python

I'm trying to split a list of dictionaries stored in a pandas column, but it isn't working for me.
The column looks like this when called:
df.topics[3]
Output
"[{'urlkey': 'webdesign', 'name': 'Web Design', 'id': 659}, {'urlkey': 'productdesign', 'name': 'Product Design', 'id': 2993}, {'urlkey': 'internetpro', 'name': 'Internet Professionals', 'id': 10102}, {'urlkey': 'web', 'name': 'Web Technology', 'id': 10209}, {'urlkey': 'software-product-management', 'name': 'Software Product Management', 'id': 42278}, {'urlkey': 'new-product-development-software-tech', 'name': 'New Product Development: Software & Tech', 'id': 62946}, {'urlkey': 'product-management', 'name': 'Product Management', 'id': 93740}, {'urlkey': 'internet-startups', 'name': 'Internet Startups', 'id': 128595}]"
I want to keep only the 'name' and 'id' and put them into separate columns: topic_1, topic_2, and so forth.
Appreciate any help.

You can give this a try.
import json
df.topics.apply(lambda x : {x['id']:x['name'] for x in json.loads(x.replace("'",'"'))} )
The output for the row you gave is:
{659: 'Web Design',
2993: 'Product Design',
10102: 'Internet Professionals',
10209: 'Web Technology',
42278: 'Software Product Management',
62946: 'New Product Development: Software & Tech',
93740: 'Product Management',
128595: 'Internet Startups'}
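Replacing the quotes works for this row, but it breaks as soon as a topic name contains an apostrophe. A minimal sketch of an alternative, assuming every cell is a string holding a Python-literal list of dicts: parse it with ast.literal_eval and flatten it into numbered columns. The function name split_topics and the topic_N_id / topic_N_name column layout are assumptions made for illustration; the question only says topic_1, topic_2, and so forth.

```python
import ast

def split_topics(cell):
    """Turn one stringified list of topic dicts into flat topic_N columns."""
    # literal_eval parses Python-literal syntax (single quotes included)
    # without executing arbitrary code
    topics = ast.literal_eval(cell) if isinstance(cell, str) else cell
    row = {}
    for n, topic in enumerate(topics, start=1):
        # Keep only 'id' and 'name'; 'urlkey' is dropped
        row[f'topic_{n}_id'] = topic['id']
        row[f'topic_{n}_name'] = topic['name']
    return row

cell = "[{'urlkey': 'webdesign', 'name': 'Web Design', 'id': 659}, {'urlkey': 'productdesign', 'name': 'Product Design', 'id': 2993}]"
print(split_topics(cell))
```

With pandas, something like pd.DataFrame(df['topics'].map(split_topics).tolist(), index=df.index) should then give one column per key; rows with fewer topics end up with NaN in the unused columns.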

You should try a simple method
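import ast

# The cell holds a string, so parse it into a list of dicts first
dt = ast.literal_eval(df.topics[3])
li = []
for topic in dt:
    # Map each topic's id to its name
    li.append({topic['id']: topic['name']})
print(li)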
The output is:
[{659: 'Web Design'},
{2993: 'Product Design'},
{10102: 'Internet Professionals'},
{10209: 'Web Technology'},
{42278: 'Software Product Management'},
{62946: 'New Product Development: Software & Tech'},
{93740: 'Product Management'},
{128595: 'Internet Startups'}]
First we take the value of df.topics[3] into dt, which is a list of dictionaries (if the cell is stored as a string, it has to be parsed into a list first). Then we create a temporary list li to collect the results. The loop visits every entry of dt; for each entry we build a one-item dictionary of the form {id: name}, for example {659: 'Web Design'}, and append it to li.

Related

Create and save a GSheet file with list of dictionaries

I have dict1 {subdict1, subdict2}, dict2 {subdict1, subdict2}, and dict3 (which has no subdicts) in a list called 'insights'. I need to create a GSheet file for each dict in the 'insights' list, with one sheet per subdict. This is what is inside 'insights':
[{'city': {'name': 'follower_demographics',
'period': 'lifetime',
'title': 'Follower demographics',
'type': 'city',
'description': 'The demographic characteristics of followers, including countries, cities and gender distribution.',
'df': ''},
'gender': {'name': 'follower_demographics',
'period': 'lifetime',
'title': 'Follower demographics',
'type': 'gender',
'description': 'The demographic characteristics of followers, including countries, cities and gender distribution.',
'df': ''},
'country': {'name': 'follower_demographics',
'period': 'lifetime',
'title': 'Follower demographics',
'type': 'country',
'description': 'The demographic characteristics of followers, including countries, cities and gender distribution.',
'df': ''},
'age': {'name': 'follower_demographics',
'period': 'lifetime',
'title': 'Follower demographics',
'type': 'age',
'description': 'The demographic characteristics of followers, including countries, cities and gender distribution.',
'df': ''}},
{'city': {'name': 'reached_audience_demographics',
'period': 'lifetime',
'title': 'Reached audience demographics',
'type': 'city',
'description': 'The demographic characteristics of the reached audience, including countries, cities and gender distribution.',
'df': ''},
'gender': {'name': 'reached_audience_demographics',
'period': 'lifetime',
'title': 'Reached audience demographics',
'type': 'gender',
'description': 'The demographic characteristics of the reached audience, including countries, cities and gender distribution.',
'df': ''},
'country': {'name': 'reached_audience_demographics',
'period': 'lifetime',
'title': 'Reached audience demographics',
'type': 'country',
'description': 'The demographic characteristics of the reached audience, including countries, cities and gender distribution.',
'df': ''},
'age': {'name': 'reached_audience_demographics',
'period': 'lifetime',
'title': 'Reached audience demographics',
'type': 'age',
'description': 'The demographic characteristics of the reached audience, including countries, cities and gender distribution.',
'df': ''}},
{'name': 'follower_count',
'period': 'day',
'title': 'Follower Count',
'description': 'Total number of unique accounts following this profile',
'df': ''}]
In summary, the list is the following:
insights = [
follower_demographics,
reached_demographics,
followers_count
]
And this is what each dictionary in the list contains. In the case of 'follower_demographics', it breaks down into a dictionary with the keys ['city', 'gender', 'country', 'age'], each of which holds this:
demographics = {
'name': '',
'period': '',
'title': '',
'type': '',
'description': '',
'df': ''
}
So I wrote the function below to create a file for each of the 3 dictionaries in 'insights'. The problem is that it creates 4 files for 'follower_demographics', each with one respective dataframe.
def create_gsheet(insights, folder_id):
    try:
        # create a list to store the created files
        files = []
        # iterate over the items in the insights dictionary
        for idx, (key, value) in enumerate(insights.items()):
            # check if the value is a dictionary
            if isinstance(value, dict):
                # Create a new file with the name taken from the 'title' key
                file = gc.create(value['title'], folder=folder_id)
                print(f"Creating {value['title']} - {idx}/{len(insights)}")
                # add the file to the list
                files.append(file)
                # Create a new sheet within the file with the name taken from the 'name' key
                sheet = file.add_worksheet(value['type'] + '_' + value['name'])
                # Set the sheet data to the df provided in the dictionary
                sheet.set_dataframe(value['df'], (1,1), encoding='utf-8', fit=True)
                sheet.frozen_rows = 1
        # delete the default sheet1 from all the created files
        for file in files:
            file.del_worksheet(file.sheet1)
    except Exception as error:
        print(F'An error occurred: {error}')
        sheet = None
The result I want is, for example, a single 'follower_demographics' file with sub-sheets 'city_follower_demographics', 'gender_follower_demographics', and so on, each with its respective dataframe.
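One possible way to get a single file per insight is to separate the "which file, which sheets" decision from the API calls. The sketch below only builds that plan and does not touch Google Sheets; the helper names is_grouped and plan_files are hypothetical, and it assumes that a grouped insight is a dict whose values are all dicts sharing the same 'title'.

```python
def is_grouped(insight):
    """True when every value is itself a dict (city/gender/country/age)."""
    return bool(insight) and all(isinstance(v, dict) for v in insight.values())

def plan_files(insights):
    """Return one (file_title, [(sheet_name, df), ...]) pair per insight."""
    plan = []
    for insight in insights:
        if is_grouped(insight):
            subdicts = list(insight.values())
            # Assumption: all subdicts of one insight share the same title
            title = subdicts[0]['title']
            sheets = [(f"{sub['type']}_{sub['name']}", sub['df']) for sub in subdicts]
        else:
            # Flat insight such as follower_count: one file with one sheet
            title = insight['title']
            sheets = [(insight['name'], insight['df'])]
        plan.append((title, sheets))
    return plan
```

The creation loop would then call gc.create(title, folder=folder_id) once per plan entry, and file.add_worksheet(sheet_name) followed by set_dataframe once per sheet, using the same calls as the function in the question.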

How to Match two APIs to update one API dataset using Python

I want to GET information from API 1, match it with API 2, and update API 2's information with the data from API 1. I am trying to figure out the most efficient, automated way to accomplish this, as it also needs to run at an interval of every 10 minutes.
I can query and get the results from API 1. This is what my code looks like:
import json
import requests
myToken = '52c32f6588004cb3ab33b0ff320b8e4f'
myUrl = 'https://api1.com/api/v1/devices.json'
head = {'Authorization': 'Token {}'.format(myToken)}
response = requests.get(myUrl, headers=head)
r = json.loads(response.content)
r
The payload looks like this from API 1
{ "device" : {
"id": 153,
"battery_status" : 61,
"serial_no": "5QBYGKUI05",
"location_lat": "-45.948917",
"location_lng": "29.832179",
"location_address": "800 Laurel Rd, Lansdale, PA 192522,USA"}
}
I want to take this information, match it by "serial_no", and update all the other pieces of information for the corresponding device in API 2.
I query the data for API 2 and this is what my code looks like
params = {
"location":'cf6707e3-f0ae-4040-a184-737b21a4bbd1',
"dateAdded":'ge:11/23/2020'}
url = requests.get('https://api2.com/api/assets',auth=('api2', '123456'), params=params)
r = json.loads(url.content)
r['items']
The JSON payload looks like this
[{'id': '064ca857-3783-460e-a7a2-245e054dcbe3',
'name': 'Apple Laptop 1',
'model': {'id': '50f5993e-2abf-49c8-86e0-8743dd58db6f',
'name': 'MacBook Pro'},
'manufacturer': {'id': 'f56244e2-76e3-46da-97dd-f72f92ca0779',
'name': 'APPLE'},
'room': {'id': '700ff2dc-0118-46c6-936a-01f0fa88c620',
'name': 'Storage Room 1',
'thirdPartyId': ''},
'location': {'id': 'cf6707e3-f0ae-4040-a184-737b21a4bbd1',
'name': 'Iron Mountain',
'thirdPartyId': ''},
'position': 'NonMounted',
'containerAsset': {'id': '00000000-0000-0000-0000-000000000000',
'name': None},
'baseAsset': {'id': '064ca857-3783-460e-a7a2-245e054dcbe3',
'name': 'Apple Laptop 1'},
'description': None,
'status': {'id': 'df9906d8-2856-45e3-9cba-bd7a1ac4971f',
'name': 'Production'},
'serialNumber': '5QBYGKUI06',
'tagNumber': None,
'alternateTagNumber': None,
'verificationStatus': {'id': 'cb3560a9-eef5-47b9-b033-394d3a09db18',
'name': 'Verified'},
'requiresRFID': False,
'requiresHangTag': False,
'bottomPosition': 0.0,
'leftPosition': 0.0,
'rackPosition': 'Front',
'labelX': None,
'labelY': None,
'verifyNameInRear': False,
'verifySerialNumberInRear': False,
'verifyBarcodeInRear': False,
'isNonDataCenter': False,
'rotate': False,
'customer': {'id': '00000000-0000-0000-0000-000000000000', 'name': None},
'thirdPartyId': '',
'temperature': None,
'dateLastScanned': None,
'placement': 'Floor',
'lastScannedLabelX': None,
'lastScannedLabelY': None,
'userDefinedValues': [{'userDefinedKeyId': '79e77a1e-4030-4308-a8ff-9caf40c04fbd',
'userDefinedKeyName': 'Longitude ',
'value': '-75.208917'},
{'userDefinedKeyId': '72c8056e-9b7d-40ac-9270-9f5929097e82',
'userDefinedKeyName': 'Address',
'value': '800 Laurel Rd, New York ,NY 19050, USA'},
{'userDefinedKeyId': '31aeeb91-daef-4364-8dd6-b0e3436d6a51',
'userDefinedKeyName': 'Battery Level',
'value': '67'},
{'userDefinedKeyId': '22b7ce4f-7d3d-4282-9ecb-e8ec2238acf2',
'userDefinedKeyName': 'Latitude',
'value': '35.932179'}]}
The documentation provided by API 2 says they currently support only PUT for updates, but I would also like to know how to do this using PATCH, as it will be available in the future. The data payload that I need to PUT successfully is this:
payload = {'id': '064ca857-3783-460e-a7a2-245e054dcbe3',
'name': 'Apple Laptop 1',
'model': {'id': '50f5993e-2abf-49c8-86e0-8743dd58db6f',
'name': 'MacBook Pro'},
'manufacturer': {'id': 'f56244e2-76e3-46da-97dd-f72f92ca0779',
'name': 'APPLE'},
'room': {'id': '700ff2dc-0118-46c6-936a-01f0fa88c620',
'name': 'Storage Room 1',
'thirdPartyId': ''},
'status': {'id': 'df9906d8-2856-45e3-9cba-bd7a1ac4971f',
'name': 'Production'},
'serialNumber': '5QBYGKUI06',
'verificationStatus': {'id': 'cb3560a9-eef5-47b9-b033-394d3a09db18',
'name': 'Verified'},
'requiresRFID': 'False',
'requiresHangTag': 'False',
'userDefinedValues': [{'userDefinedKeyId': '79e77a1e-4030-4308-a8ff-9caf40c04fbd',
'userDefinedKeyName': 'Longitude ',
'value': '-75.248920'},
{'userDefinedKeyId': '72c8056e-9b7d-40ac-9270-9f5929097e82',
'userDefinedKeyName': 'Address',
'value': '801 Laurel Rd, New York, Ny 192250, USA'},
{'userDefinedKeyId': '31aeeb91-daef-4364-8dd6-b0e3436d6a51',
'userDefinedKeyName': 'Battery Level',
'value': '67'},
{'userDefinedKeyId': '22b7ce4f-7d3d-4282-9ecb-e8ec2238acf2',
'userDefinedKeyName': 'Latitude',
'value': '29.782177'}]}
So part of this is figuring out how I can query the portions of the JSON data that I need for the update.
I am able to update the information using this line
requests.put('https://api2.com/api/assets/064ca857-3783-460e-a7a2-245e054dcbe3',auth=('API2', '123456'), data=json.dumps(payload))
but I need it to update dynamically, so I don't think the hard-coded id in that line will work from an automation standpoint. If anybody has ideas or resources that point me in the right direction to learn more about this process (I don't really know what it is even called), it would be greatly appreciated.
Not entirely sure what you are trying to do here, but if you want to pull information nested in the responses you can do this.
Serial number from API 1
r['device']['serial_no']
Serial number for API 2
either r[0]['serialNumber'] or r['items'][0]['serialNumber'] depending on what you are showing
To modify the payload serial number, for example
payload['serialNumber'] = '123456abcdef'
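To avoid the hard-coded id, the matching step could be sketched as below: index the API 1 devices by serial number, then walk the API 2 assets and build one payload per match. This is only a sketch under assumptions: build_updates and KEY_TO_FIELD are hypothetical names, devices is assumed to be a list of the inner "device" dictionaries from API 1, and the mapping between API 1 fields and API 2 user-defined key names is guessed from the sample payloads.

```python
import copy

# Assumed mapping from API 2 user-defined key names to API 1 field names
KEY_TO_FIELD = {
    'Latitude': 'location_lat',
    'Longitude': 'location_lng',
    'Address': 'location_address',
    'Battery Level': 'battery_status',
}

def build_updates(devices, assets):
    """Return {asset_id: payload} for every asset whose serial matches a device."""
    by_serial = {device['serial_no']: device for device in devices}
    updates = {}
    for asset in assets:
        device = by_serial.get(asset.get('serialNumber'))
        if device is None:
            continue  # no API 1 device with this serial number
        payload = copy.deepcopy(asset)  # leave the original asset untouched
        for entry in payload.get('userDefinedValues', []):
            # strip() because the sample key 'Longitude ' has a trailing space
            field = KEY_TO_FIELD.get(entry['userDefinedKeyName'].strip())
            if field in device:
                entry['value'] = str(device[field])
        updates[asset['id']] = payload
    return updates
```

Each payload could then be sent with the same requests.put call as in the question, building the URL from the key, for example f'https://api2.com/api/assets/{asset_id}'. Whether API 2 accepts the full asset or only the subset of fields shown in the question's payload is something its documentation would have to confirm.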

Convert nested dictionary within JSON from a string

I loaded JSON data that has a somewhat messy structure: nested dictionaries are wrapped in single quotes and recognized as strings, rather than as dictionaries I can loop through. What is the best way to drop the single quotes from the 'value' key-value property?
Provided below is an example of the structure:
for val in json_data:
    print(val)
{'id': 'status6',
'title': 'Estimation',
'text': '> 2 days',
'type': 'color',
'value': '{"index":14,"post_id":null,"changed_at":"2020-06-12T09:04:58.659Z"}',
'name': 'Internal: online course'},
{'id': 'date',
'title': 'Deadline',
'text': '2020-06-26',
'type': 'date',
'value': '{"date":"2020-06-26","changed_at":"2020-06-12T11:33:37.195Z"}',
'name': 'Internal: online course'},
{'id': 'tags',
'title': 'Tags',
'text': 'Internal',
'type': 'tag',
'value': '{"tag_ids":[3223513]}',
'name': 'Internal: online course'},
If I add a nested loop targeting ['value'], it loops by character and not by key-value pair in the dictionary.
Use json.loads to convert the string to a dict:
import json
json_data = [{'id': 'status6',
'title': 'Estimation',
'text': '> 2 days',
'type': 'color',
'value': '{"index":14,"post_id":null,"changed_at":"2020-06-12T09:04:58.659Z"}',
'name': 'Internal: online course'},
{'id': 'date',
'title': 'Deadline',
'text': '2020-06-26',
'type': 'date',
'value': '{"date":"2020-06-26","changed_at":"2020-06-12T11:33:37.195Z"}',
'name': 'Internal: online course'},
{'id': 'tags',
'title': 'Tags',
'text': 'Internal',
'type': 'tag',
'value': '{"tag_ids":[3223513]}',
'name': 'Internal: online course'}]
# the result is a Python dictionary:
for val in json_data:
    print(json.loads(val['value']))
This should work.
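If the goal is to loop over the nested key-value pairs afterwards, the parsed dictionary can be stored back into each row. A minimal sketch, assuming 'value' is the only stringified field; the function name decode_values is made up for this example, and values that are not valid JSON are left unchanged.

```python
import json

def decode_values(rows):
    """Return copies of the rows with 'value' parsed from a JSON string."""
    decoded = []
    for row in rows:
        new_row = dict(row)  # shallow copy so the input rows stay unchanged
        value = row.get('value')
        if isinstance(value, str):
            try:
                new_row['value'] = json.loads(value)
            except json.JSONDecodeError:
                pass  # not valid JSON: keep the original string
        decoded.append(new_row)
    return decoded

rows = [{'id': 'tags', 'value': '{"tag_ids":[3223513]}'}]
for row in decode_values(rows):
    for key, item in row['value'].items():
        print(key, item)
```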

How to access specific text from json data? [python]

I need to access the text attribute from this JSON data so I could end up having:
{'description': {'tags': ['outdoor', 'building', 'street', 'city', 'busy', 'people', 'filled', 'traffic', 'many', 'table', 'car', 'group', 'walking', 'bunch', 'crowded', 'large', 'night', 'light', 'standing', 'man', 'tall', 'umbrella', 'riding', 'sign', 'crowd'], 'captions': [{'text': 'a group of people on a city street filled with traffic at night', 'confidence': 0.8241405091548035}]}, 'requestId': '12fd327f-9b9c-4820-9feb-357a776211d3', 'metadata': {'width': 1826, 'height': 2436, 'format': 'Jpeg'}}
text = "The Text"
I have tried parsed['captions']['text'], but this didn't work. Please let me know if you can help. Thanks!
Two problems here. First, captions is under description, and second, text is a key of a dictionary inside a list (first and only item):
>>> import pprint
>>> pprint.pprint(parsed)
{'description': {'captions': [{'confidence': 0.8241405091548035,
'text': 'a group of people on a city street filled with traffic at night'}],
...
So, you could extract the text like this:
>>> parsed['description']['captions'][0]['text']
'a group of people on a city street filled with traffic at night'
Another option could be to use a 3rd-party library that simplifies traversing such JSON structures, for example plucky (full disclosure: I'm the author). With plucky, you can say:
>>> from plucky import pluckable
>>> pluckable(parsed).description.captions.text
['a group of people on a city street filled with traffic at night']
and not worry about dictionaries inside lists.
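If the response may contain several captions, or none at all, a small standard-library helper avoids the hard-coded [0]. This is only a sketch: caption_texts is a hypothetical name, and it assumes the same 'description' / 'captions' layout as the data in the question.

```python
def caption_texts(parsed):
    """Collect the 'text' of every caption, tolerating missing keys."""
    captions = parsed.get('description', {}).get('captions', [])
    return [caption['text'] for caption in captions if 'text' in caption]

parsed = {
    'description': {
        'tags': ['outdoor', 'building'],
        'captions': [{'text': 'a group of people on a city street filled with traffic at night',
                      'confidence': 0.8241405091548035}],
    },
}
print(caption_texts(parsed))
```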
You can parse the string with Python's ast library, as below. (json.loads would raise an error here, because the string uses single quotes and is therefore not valid JSON.)
import ast
your_json_string = "{'description': {'tags': ['outdoor', 'building', 'street', 'city', 'busy', 'people', 'filled', 'traffic', 'many', 'table', 'car', 'group', 'walking', 'bunch', 'crowded', 'large', 'night', 'light', 'standing', 'man', 'tall', 'umbrella', 'riding', 'sign', 'crowd'], 'captions': [{'text': 'a group of people on a city street filled with traffic at night', 'confidence': 0.8241405091548035}]}, 'requestId': '12fd327f-9b9c-4820-9feb-357a776211d3', 'metadata': {'width': 1826, 'height': 2436, 'format': 'Jpeg'}}"
data_dict = ast.literal_eval(your_json_string)
print(data_dict['description']['captions'][0]['text'])

Dictionaries within a list

Suppose I have the following list of dictionaries:
database = [{'Job title': 'painter', 'Email address': 'xxx#yyy.com', 'Last name': 'Wright', 'First name': 'James', 'Company': 'Swift'},
{'Job title': 'plumber', 'Email address': 'xxx#yyy.com', 'Last name': 'Bright', 'First name': 'James', 'Company': 'ABD Plumbing'},
{'Job title': 'brick layer', 'Email address': 'xxx#yyy.com', 'Last name': 'Smith', 'First name': 'John', 'Company': 'Bricky brick'}]
I'm using the following code to print information about a person given their first name (I will be changing this to search by last name, company, job title, etc., using a variable):
print(next(item for item in database if item['First name'] == 'James'))
The issue arises because two entries have the same first name, James. How do I adjust the code so that it prints information about all the Jameses in the database?
Remove the next().
print([item for item in database if item['First name'] == 'James'])
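Since the question mentions searching by other fields through a variable, the list comprehension can be wrapped in a small function. A sketch, with find_all as a made-up name; it uses .get() so that records missing the field are skipped rather than raising KeyError.

```python
def find_all(records, field, value):
    """Return every record whose `field` equals `value`."""
    return [record for record in records if record.get(field) == value]

database = [
    {'Job title': 'painter', 'Last name': 'Wright', 'First name': 'James', 'Company': 'Swift'},
    {'Job title': 'plumber', 'Last name': 'Bright', 'First name': 'James', 'Company': 'ABD Plumbing'},
    {'Job title': 'brick layer', 'Last name': 'Smith', 'First name': 'John', 'Company': 'Bricky brick'},
]

search_field = 'First name'
print(find_all(database, search_field, 'James'))
```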
