How to pick up data from json objects in python? - python

I am trying to pick Instances in the json objects data which looks like this
[{'Groups': [], 'Instances': [{'AmiLaunchIndex': 0, 'ImageId': 'ami-0ceecbb0f30a902a6', 'InstanceId': 'i-xxxxx', 'InstanceType': 't2.micro', 'KeyName': 'xxxx', 'LaunchTime': {'$date': '2022-12-17T13:07:54Z'}, 'Monitoring': {'State': 'disabled'}, 'Placement': {'AvailabilityZone': 'us-west-2b', 'GroupName': '', 'Tenancy': 'default'}, 'PrivateDnsName': 'ip-zxxxxx.us-west-2.compute.internal', 'PrivateIpAddress': 'xxxxx', 'ProductCodes': [], 'PublicDnsName': 'ec2-xx-xxx-xxx.us-west-2.compute.amazonaws.com', 'PublicIpAddress': 'xxxxxx', 'State': {'Code': 16, 'Name': 'running'}, 'StateTransitionReason': '', 'SubnetId': 'subnet-xxxxx', 'VpcId': 'vpc-xxxxx', 'Architecture': 'x86_64', 'BlockDeviceMappings': [{'DeviceName': '/dev/xvda', 'Ebs': {'AttachTime': {'$date': '2022-12-17T13:07:55Z'}, 'DeleteOnTermination': True, 'Status': 'attached', 'VolumeId': 'vol-xxxx'}}], 'ClientToken': '529fc1ac-bf64-4804-b0b8-7c7778ace68c', 'EbsOptimized': False, 'EnaSupport': True, 'Hypervisor': 'xen', 'NetworkInterfaces': [{'Association': {'IpOwnerId': 'amazon', 'PublicDnsName': 'ec2-35-86-111-31.us-west-2.compute.amazonaws.com', 'PublicIp': 'xxxxx'}, 'Attachment': {'AttachTime': {'$date': '2022-12-17T13:07:54Z'}, 'AttachmentId': 'eni-attach-0cac7d4af20664b23', 'DeleteOnTermination': True, 'DeviceIndex': 0, 'Status': 'attached', 'NetworkCardIndex': 0}, 'Description': '', 'Groups': [{'GroupName': 'launch-wizard-5', 'GroupId': 'sg-xxxxx'}], 'Ipv6Addresses': [], 'MacAddress': 'xxxxx', 'NetworkInterfaceId': 'eni-xxxxx', 'OwnerId': 'xxxx', 'PrivateDnsName': 'ip-xxxxx.us-west-2.compute.internal', 'PrivateIpAddress': 'xxx.xxx.xxx', 'PrivateIpAddresses': [{'Association': {'IpOwnerId': 'amazon', 'PublicDnsName': 'ec2-xx-xx-xx-xxx.us-west-2.compute.amazonaws.com', 'PublicIp': 'xxx.xxx.xxx'}, 'Primary': True, 'PrivateDnsName': 'ip-172-31-20-187.us-west-2.compute.internal', 'PrivateIpAddress': 'xxx.xxx.xxx'}], 'SourceDestCheck': True, 'Status': 'in-use', 'SubnetId': 'subnet-xxxxxxx', 'VpcId': 'vpc-0b09cd4sedxxx', 'InterfaceType': 'interface'}], 'RootDeviceName': '/dev/xvda', 'RootDeviceType': 'ebs', 'SecurityGroups': [{'GroupName': 'launch-wizard-5', 'GroupId': 'sg-0a0d1c79d8076660e'}], 'SourceDestCheck': True, 'Tags': [{'Key': 'Name', 'Value': 'MainServers'}], 'VirtualizationType': 'hvm', 'CpuOptions': {'CoreCount': 1, 'ThreadsPerCore': 1}, 'CapacityReservationSpecification': {'CapacityReservationPreference': 'open'}, 'HibernationOptions': {'Configured': False}, 'MetadataOptions': {'State': 'applied', 'HttpTokens': 'optional', 'HttpPutResponseHopLimit': 1, 'HttpEndpoint': 'enabled', 'HttpProtocolIpv6': 'disabled', 'InstanceMetadataTags': 'disabled'}, 'EnclaveOptions': {'Enabled': False}, 'PlatformDetails': 'Linux/UNIX', 'UsageOperation': 'RunInstances', 'UsageOperationUpdateTime': {'$date': '2022-12-17T13:07:54Z'}, 'PrivateDnsNameOptions': {'HostnameType': 'ip-name', 'EnableResourceNameDnsARecord': True, 'EnableResourceNameDnsAAAARecord': False}, 'MaintenanceOptions': {'AutoRecovery': 'default'}}], 'OwnerId': '76979cfxdsss11', 'ReservationId': 'r-xxxxx'}]
I tired loading data and doing
resp = json.loads(jsonfile)
reqData= resp['Instances']
But getting error
TypeError: list indices must be integers or slices, not str
Is there any way I can fix this and get the data? Help will be extremely appriciated.

It's wrapped inside a list. So simply do:
print(lst[0]["Instances"])

To select only the instances from the data, you can use the json.loads function to parse the JSON data and extract the Instances field as a list.
import json
# Parse the JSON data
data = json.loads(json_data)
# Extract the instances
instances = data['Instances']
You can then iterate over the data with something like this
for instance in instances:
instance_id = instance['InstanceId']
instance_type = instance['InstanceType']
launch_time = instance['LaunchTime']

Related

How to return all dictionary values and keys from function

need some help..
I have data like:
data dict:
{'folder_id': 13, 'type': 'local', 'read': False, 'last_modification_date': 1653588426,
'creation_date': 1653408009, 'status': 'completed', 'uuid': 'ebaacb26-654b-04de-b5b3-
337b5bb66eea9e74cdc5db8168c5', 'shared': False, 'user_permissions': 128, 'owner':
'email#com.com', 'timezone': 'Europe/Brusel', 'rrules': 'FREQ=WEEKLY;INTERVAL=1;BYDAY=TU',
'starttime': '20220211T190000', 'enabled': True, 'control': True, 'live_results': 0, 'name':
'6666-Test-VM01', 'id': 286}
{'folder_id': 2, 'type': 'local', 'read': True, 'last_modification_date': 1653587386,
'creation_date': 1653399006, 'status': 'completed', 'uuid': 'fbbfbca1-ba55-31cf-7de9-
984ab3aa2ce13223a2ed8c1b45a7', 'shared': False, 'user_permissions': 128, 'owner':
'info#test.com', 'timezone': 'Europe/Brusel', 'rrules': 'FREQ=WEEKLY;INTERVAL=1;BYDAY=TU',
'starttime': '20220215T163000', 'enabled': True, 'control': True, 'live_results': 0, 'name':
'testvm-3', 'id': 83}
{'folder_id': 2, 'type': 'local', 'read': False, 'last_modification_date': 1653587306,
'creation_date': 1653348635, 'status': 'completed', 'uuid': '762ba2a8-9e73-957c-4934-
ae919c7148453dbdd9ea1e07b423', 'shared': False, 'user_permissions': 128, 'owner':
'kitty#xelo.lt', 'timezone': 'Europe/Brusel', 'rrules': 'FREQ=WEEKLY;INTERVAL=1;BYDAY=TU',
'starttime': '20220209T023000', 'enabled': True, 'control': True, 'live_results': 0, 'name':
'111-Vm3000test', 'id': 264}
I'm trying to return dictionary with all data inside, but it's returning only first or last value, how to get all data from this dictionary, my code:
class Testclass():
def __init__(self):
self.resources = {}
def get_data(self):
for data in data_dict:
self.resources['folder_id'] = data['folder_id']
self.resources['name'] = data['name']
return self.resources
output of this code:
{'folders_id': 13, 'name': '6666-Test-VM01'}
why i didint get all anothers?
Expected output:
{'folders_id': 13, 'name': '6666-Test-VM01'}
{'folders_id': 2, 'name': 'testvm-3'}
{'folders_id': 2, 'name': '111-Vm3000test'}
Assuming data_dict is a list of dictionaries, let's do the same for resources, you can do something like:
def __init__(self):
self.resources = []
def get_data(self):
for data in data_dict:
new_dict = {}
new_dict['folder_id'] = data['folder_id']
new_dict['name'] = data['name']
self.resources.append(new_dict)
return self.resources

how to retrieve a link from a discord message?

i'm trying to create a program, which needs to read messages from a discord bot and retrieve links from these messages.
here's the code:
import requests
import json
from bs4 import builder
import bs4
def retrieve_messages(channelid):
headers = {
'authorization': 'NTQ5OTM4ODEzOTUxMTQ4MDQ3.YMi7CQ.fOm6F-dmPJPEW0dehLwCkB_ilBU'
}
r = requests.get(f'https://discord.com/api/v9/channels/{channelid}/messages', headers=headers)
jsonn = json.loads(r.text)
for value in jsonn:
print(value, '\n')
retrieve_messages('563699841377763348')
here's the output:
{'id': '908857015412084796', 'type': 0, 'content': '<#&624528614330859520>', 'channel_id': '5636998413777633, 2021.```\n5J53T-BKJK5-CTXBZ-JJJTJ-WW6F3```Redeem on48', 'author': {'id': '749499357761503284', 'username': 'shift', 'avatar': 'de9cd6f3224e660a4b6906a89fc2bc15/shift-code/5j53t-bkjk5-ctxbz-jjjtj-ww6f3/?utm_source', 'discriminator': '6125', 'public_flags': 0, 'bot': True}, 'attachments': [], 'embeds': [], 'mentions': []'pinned': False, 'mention_everyone': False, 'tts': Fa, 'mention_roles': ['624528614330859520'], 'pinned': False, 'mention_everyone': False, 'tts': False, 'timest}amp': '2021-11-12T23:13:18.221000+00:00', 'edited_timestamp': None, 'flags': 0, 'components': []}
{'id': '908857014430629898', 'type': 0, 'content': '', 'channel_id': '563699841377763348', 'author': {'id':
'749499357761503284', 'username': 'shift', 'avatar': 'de9cd6f3224e660a4b6906a89fc2bc15', 'discriminator': '6125', 'public_flags': 0, 'bot': True}, 'attachments': [], 'embeds': [{'type': 'rich', 'title': '<:GoldenKey:273763771929853962> Borderlands 1: 5 gold keys', 'description': 'Platform: Universal\nExpires: 30 November,
2021.```\n5J53T-BKJK5-CTXBZ-JJJTJ-WW6F3```Redeem on the [website](https://shift.gearboxsoftware.com/rewards) or in game.\n\n[Source](https://shift.orcicorn.com/shift-code/5j53t-bkjk5-ctxbz-jjjtj-ww6f3/?utm_source=json&utm_medium=shift&utm_campaign=automation)', 'color': 16040976}], 'mentions': [], 'mention_roles': [], 'pinned': False, 'mention_everyone': False, 'tts': False, 'timestamp': '2021-11-12T23:13:17.987000+00:00', 'edited_timestamp': None, 'flags': 1, 'components': []}
in the output there are 2 links, but I need to save the second link to a variable, and I'm wondering how I can do that
This is easiest done with the response body as a text object that can be scanned with regex to find the URLs
Solution
The variable test_case_data is the response body in TEXT form as a string.
import re
regex = r"(http|ftp|https):\/\/([\w_-]+(?:(?:\.[\w_-]+)+))([\w.,#?^=%&:\/~+#-]*[\w#?^=%&\/~+#-])"
def find_embedded_urls(data):
return re.finditer(regex,data)
test_case_data = """'id': '908857014430629898', 'type': 0, 'content': '', 'channel_id': '563699841377763348', 'author': {'id':
'749499357761503284', 'username': 'shift', 'avatar': 'de9cd6f3224e660a4b6906a89fc2bc15', 'discriminator': '6125', 'public_flags': 0, 'bot': True}, 'attachments': [], 'embeds': [{'type': 'rich', 'title': '<:GoldenKey:273763771929853962> Borderlands 1: 5 gold keys', 'description': 'Platform: Universal\nExpires: 30 November,
2021.```\n5J53T-BKJK5-CTXBZ-JJJTJ-WW6F3```Redeem on the [website](https://shift.gearboxsoftware.com/rewards) or in game.\n\n[Source](https://shift.orcicorn.com/shift-code/5j53t-bkjk5-ctxbz-jjjtj-ww6f3/?utm_source=json&utm_medium=shift&utm_campaign=automation)', 'color': 16040976}], 'mentions': [], 'mention_roles': [], 'pinned': False, 'mention_everyone': False, 'tts': False, 'timestamp': '2021-11-12T23:13:17.987000+00:00', 'edited_timestamp': None, 'flags': 1, 'components': []}"""
# test_case_data = response.text
matches = find_embedded_urls(test_case_data)
matches = [match[0] for match in matches] #convert all urls to strings
print(matches) # List of all the urls! Index for whatever one you need
Output
['https://shift.gearboxsoftware.com/rewards', 'https://shift.orcicorn.com/shift-code/5j53t-bkjk5-ctxbz-jjjtj-ww6f3/?utm_source=json&utm_medium=shift&utm_campaign=automation']
With the URLs as a list index, you can set variables by indexing the list at whatever point you need.

How to get the authors name from Google books api?

When I search for the book using this link https://www.googleapis.com/books/v1/volumes?q=9780310709626 I get the author name in the details.
However when I run my code and print items I don't see the author name. I've been trying to figure out why it doesn't show from the data but I don't see any problem with my code.
print(searchBooks("9780310709626"))
def getBooks(id):
url = "https://www.googleapis.com/books/v1/volumes?q=isbn:"
resp = url(api + id)
data = json.load(resp)
print(data["items"])
My code output:
[{'kind': 'books#volume', 'id': 'JEP3sgEACAAJ', 'etag': '92vdEneJ83g', 'selfLink': 'https://www.googleapis.com/books/v1/volumes/JEP3sgEACAAJ', 'volumeInfo': {'title': "The Beginner's Bible", 'subtitle': 'Timeless Bible Stories', 'publisher': 'Zondervan', 'publishedDate': '2005', 'description': 'Retells familiar Bible stories from the Old and New Testaments for children to enjoy.', 'industryIdentifiers': [{'type': 'ISBN_10', 'identifier': '0310709628'}, {'type': 'ISBN_13', 'identifier': '9780310709626'}], 'readingModes': {'text': False, 'image': False}, 'pageCount': 511, 'printType': 'BOOK', 'categories': ['Juvenile Nonfiction'], 'averageRating': 4.5, 'ratingsCount': 2, 'maturityRating': 'NOT_MATURE', 'allowAnonLogging': False, 'contentVersion': 'preview-1.0.0', 'panelizationSummary': {'containsEpubBubbles': False, 'containsImageBubbles': False}, 'imageLinks': {'smallThumbnail': 'http://books.google.com/books/content?id=JEP3sgEACAAJ&printsec=frontcover&img=1&zoom=5&source=gbs_api', 'thumbnail': 'http://books.google.com/books/content?id=JEP3sgEACAAJ&printsec=frontcover&img=1&zoom=1&source=gbs_api'}, 'language': 'en', 'previewLink': 'http://books.google.com.tw/books?id=JEP3sgEACAAJ&dq=isbn:9780310709626&hl=&cd=1&source=gbs_api', 'infoLink': 'http://books.google.com.tw/books?id=JEP3sgEACAAJ&dq=isbn:9780310709626&hl=&source=gbs_api', 'canonicalVolumeLink': 'https://books.google.com/books/about/The_Beginner_s_Bible.html?hl=&id=JEP3sgEACAAJ'}, 'saleInfo': {'country': 'TW', 'saleability': 'NOT_FOR_SALE', 'isEbook': False}, 'accessInfo': {'country': 'TW', 'viewability': 'NO_PAGES', 'embeddable': False, 'publicDomain': False, 'textToSpeechPermission': 'ALLOWED', 'epub': {'isAvailable': False}, 'pdf': {'isAvailable': False}, 'webReaderLink': 'http://play.google.com/books/reader?id=JEP3sgEACAAJ&hl=&printsec=frontcover&source=gbs_api', 'accessViewStatus': 'NONE', 'quoteSharingAllowed': False}, 'searchInfo': {'textSnippet': 'Retells familiar Bible stories from the Old and New Testaments for children to enjoy.'}}, {'kind': 'books#volume', 'id': 'ZRgnzQEACAAJ', 'etag': 'RXYM4Rbwx+g', 'selfLink': 'https://www.googleapis.com/books/v1/volumes/ZRgnzQEACAAJ', 'volumeInfo': {'title': "The Beginner's Bible", 'authors': ['Catherine DeVries'], 'publishedDate': '2005', 'industryIdentifiers': [{'type': 'ISBN_10', 'identifier': '0310709628'}, {'type': 'ISBN_13', 'identifier': '9780310709626'}], 'readingModes': {'text': False, 'image': False}, 'pageCount': 511, 'printType': 'BOOK', 'averageRating': 4, 'ratingsCount': 1, 'maturityRating': 'NOT_MATURE', 'allowAnonLogging': False, 'contentVersion': 'preview-1.0.0', 'panelizationSummary': {'containsEpubBubbles': False, 'containsImageBubbles': False}, 'language': 'en', 'previewLink': 'http://books.google.com.tw/books?id=ZRgnzQEACAAJ&dq=isbn:9780310709626&hl=&cd=2&source=gbs_api', 'infoLink': 'http://books.google.com.tw/books?id=ZRgnzQEACAAJ&dq=isbn:9780310709626&hl=&source=gbs_api', 'canonicalVolumeLink': 'https://books.google.com/books/about/The_Beginner_s_Bible.html?hl=&id=ZRgnzQEACAAJ'}, 'saleInfo': {'country': 'TW', 'saleability': 'NOT_FOR_SALE', 'isEbook': False}, 'accessInfo': {'country': 'TW', 'viewability': 'NO_PAGES', 'embeddable': False, 'publicDomain': False, 'textToSpeechPermission': 'ALLOWED', 'epub': {'isAvailable': False}, 'pdf': {'isAvailable': False}, 'webReaderLink': 'http://play.google.com/books/reader?id=ZRgnzQEACAAJ&hl=&printsec=frontcover&source=gbs_api', 'accessViewStatus': 'NONE', 'quoteSharingAllowed': False}}]
With the requests library:
import requests
url = 'https://www.googleapis.com/books/v1/volumes?q=9780310709626'
resp = requests.get(url)
json = resp.json()
print(json['items'][0]['volumeInfo']['authors'])
From the response you can see that authors is an array. To reach that array you will need to do json['items'][0]['volumeInfo']['authors'].
As items is also an array, meaning that there could be multiple items in this response. You might want to write extra code to deal with that other than hard-code index=0.
Note that in this case, you probably won't know the schema of the response. You should handle unexpected behaviors. For some certain books maybe some keys are missing, json['items'] could be an empty array, or even items is not in the response at all.

Python: trouble updating metadata in Salesforce report Rest API request to get report over 2000 rows

I want to pull a report which is over 2000 rows from Salesforce via API using python. How do I update the post request to send the updated metadata with the new filters in order to get the next 2000 rows of data? Here is the code I have, but the response of the post-request has the same exact filters as before. What am I doing wrong here?
Excerpt of Code:
headers = {
'Content-type': 'application/json',
'Accept-Encoding': 'gzip',
'Authorization': 'Bearer %s' % access_token
}
parameters={}
descripion = requests.request('get', instance_url+'/services/data/v51.0/analytics/reports/00O4Q000009VEPCUA4/describe',
headers=headers, params=parameters, timeout=30).json()
orig_metadata = descripion['reportMetadata']
id_column='CUST_NAME'
last_load_num='162451'
sf_id_column = descripion['reportExtendedMetadata']['detailColumnInfo'][id_column]['label']
print(sf_id_column)
metadata = {
'reportBooleanFilter': '({}) AND {}'.format(orig_metadata['reportBooleanFilter'],
len(orig_metadata['reportFilters']) + 1),
'reportFilters': orig_metadata['reportFilters']+[{'column':id_column,
'filterType': 'fieldValue',
'isRunPageEditable': True,
'operator': 'greaterThan',
'value': last_load_num}],
'standardDateFilter':[{'column': 'CUST_CREATED_DATE','durationValue': 'CUSTOM',
'endDate': '2021-07-14','startDate': '2021-07-01'}],
'detailColumns': orig_metadata['detailColumns'][:],
'sortBy': [{'sortColumn': id_column, 'sortOrder': 'Asc'}],
}
r=requests.request('post', instance_url+'/services/data/v51.0/analytics/reports/00O4Q000009VEPCUA4',
headers=headers, params={'metadata':metadata}, timeout=30).json()
Here is what's in the original metadata:
{'aggregates': ['s!rtms__Load__c.rtms__Carrier_Quote_Total__c', 'RowCount'],
'chart': None,
'crossFilters': [],
'currency': None,
'dashboardSetting': None,
'description': None,
'detailColumns': ['CUST_NAME',
'CUST_CREATED_DATE',
'rtms__Load__c.rtms__Expected_Ship_Date2__c',
'rtms__Load__c.rtms__Load_Status__c',
'rtms__Load__c.rtms__Total_Weight__c',
'rtms__Load__c.rtms__Equipment_Type__c',
'rtms__Load__c.rtms__Origin__c',
'rtms__Load__c.rtms__Destination__c',
'rtms__Load__c.rtms__Zip3_Lane__c',
'rtms__Load__c.rtms__Zip5_Lane__c',
'rtms__Load__c.rtms__Carrier_Quote_Total__c',
'rtms__Load__c.rtms__Customer_Quote_Total__c'],
'developerName': 'Adel_Past_Shipment_Test_Pricing_Tool',
'division': None,
'folderId': '00l1U000000eXWwQAM',
'groupingsAcross': [],
'groupingsDown': [],
'hasDetailRows': True,
'hasRecordCount': True,
'historicalSnapshotDates': [],
'id': '00O4Q000009VEPCUA4',
'name': 'Adel Past Shipment Test Pricing Tool',
'presentationOptions': {'hasStackedSummaries': True},
'reportBooleanFilter': None,
'reportFilters': [{'column': 'rtms__Load__c.rtms__Customer__c',
'filterType': 'fieldValue',
'isRunPageEditable': True,
'operator': 'contains',
'value': 'adel'},
{'column': 'rtms__Load__c.rtms__Load_Status__c',
'filterType': 'fieldValue',
'isRunPageEditable': True,
'operator': 'notContain',
'value': 'cancelled'}],
'reportFormat': 'TABULAR',
'reportType': {'label': 'Loads', 'type': 'CustomEntity$rtms__Load__c'},
'scope': 'organization',
'showGrandTotal': True,
'showSubtotals': True,
'sortBy': [{'sortColumn': 'CUST_CREATED_DATE', 'sortOrder': 'Desc'}],
'standardDateFilter': {'column': 'CUST_CREATED_DATE',
'durationValue': 'CUSTOM',
'endDate': None,
'startDate': None},
'standardFilters': None,
'supportsRoleHierarchy': False,
'userOrHierarchyFilterId': None}
And here is what's in r['reportMetadata']:
{'aggregates': ['s!rtms__Load__c.rtms__Carrier_Quote_Total__c', 'RowCount'],
'chart': None,
'crossFilters': [],
'currency': None,
'dashboardSetting': None,
'description': None,
'detailColumns': ['CUST_NAME',
'CUST_CREATED_DATE',
'rtms__Load__c.rtms__Expected_Ship_Date2__c',
'rtms__Load__c.rtms__Load_Status__c',
'rtms__Load__c.rtms__Total_Weight__c',
'rtms__Load__c.rtms__Equipment_Type__c',
'rtms__Load__c.rtms__Origin__c',
'rtms__Load__c.rtms__Destination__c',
'rtms__Load__c.rtms__Zip3_Lane__c',
'rtms__Load__c.rtms__Zip5_Lane__c',
'rtms__Load__c.rtms__Carrier_Quote_Total__c',
'rtms__Load__c.rtms__Customer_Quote_Total__c'],
'developerName': 'Adel_Past_Shipment_Test_Pricing_Tool',
'division': None,
'folderId': '00l1U000000eXWwQAM',
'groupingsAcross': [],
'groupingsDown': [],
'hasDetailRows': True,
'hasRecordCount': True,
'historicalSnapshotDates': [],
'id': '00O4Q000009VEPCUA4',
'name': 'Adel Past Shipment Test Pricing Tool',
'presentationOptions': {'hasStackedSummaries': True},
'reportBooleanFilter': None,
'reportFilters': [{'column': 'rtms__Load__c.rtms__Customer__c',
'filterType': 'fieldValue',
'isRunPageEditable': True,
'operator': 'contains',
'value': 'adel'},
{'column': 'rtms__Load__c.rtms__Load_Status__c',
'filterType': 'fieldValue',
'isRunPageEditable': True,
'operator': 'notContain',
'value': 'cancelled'}],
'reportFormat': 'TABULAR',
'reportType': {'label': 'Loads', 'type': 'CustomEntity$rtms__Load__c'},
'scope': 'organization',
'showGrandTotal': True,
'showSubtotals': True,
'sortBy': [{'sortColumn': 'CUST_CREATED_DATE', 'sortOrder': 'Desc'}],
'standardDateFilter': {'column': 'CUST_CREATED_DATE',
'durationValue': 'CUSTOM',
'endDate': None,
'startDate': None},
'standardFilters': None,
'supportsRoleHierarchy': False,
'userOrHierarchyFilterId': None}
code image

TypeError: an integer is required when select subset of rows dataframe pandas

{'contributors': None,
'coordinates': None,
'created_at': 'Tue Aug 02 19:51:58 +0000 2016',
'entities': {'hashtags': [],
'symbols': [],
'urls': [],
'user_mentions': [{'id': 873491544,
'id_str': '873491544',
'indices': [0, 13],
'name': 'Kenel M',
'screen_name': 'KxSweaters13'}]},
'favorite_count': 1,
'favorited': False,
'geo': None,
'id': 760563814450491392,
'id_str': '760563814450491392',
'in_reply_to_screen_name': 'KxSweaters13',
'in_reply_to_status_id': None,
'in_reply_to_status_id_str': None,
'in_reply_to_user_id': 873491544,
'in_reply_to_user_id_str': '873491544',
'is_quote_status': False,
'lang': 'en',
'metadata': {'iso_language_code': 'en', 'result_type': 'recent'},
'place': {'attributes': {},
'bounding_box': {'coordinates': [[[-71.813501, 42.4762],
[-71.702186, 42.4762],
[-71.702186, 42.573956],
[-71.813501, 42.573956]]],
'type': 'Polygon'},
'contained_within': [],
'country': 'Australia',
'country_code': 'AUS',
'full_name': 'Melbourne, V',
'id': 'c4f1830ea4b8caaf',
'name': 'Melbourne',
'place_type': 'city',
'url': 'https://api.twitter.com/1.1/geo/id/c4f1830ea4b8caaf.json'},
'retweet_count': 0,
'retweeted': False,
'source': 'Twitter for Android',
'text': '#KxSweaters13 are you the kenelx13 I see owning leominster for team valor?',
'truncated': False,
'user': {'contributors_enabled': False,
'created_at': 'Thu Apr 21 17:09:52 +0000 2011',
'default_profile': False,
'default_profile_image': False,
'description': "Arbys when it's cold. Kimballs when it's warm. #Ally__09 all year. Comp sci classes sometimes.",
'entities': {'description': {'urls': []}},
'favourites_count': 1106,
'follow_request_sent': None,
'followers_count': 167,
'following': None,
'friends_count': 171,
'geo_enabled': True,
'has_extended_profile': False,
'id': 285715182,
'id_str': '285715182',
'is_translation_enabled': False,
'is_translator': False,
'lang': 'en',
'listed_count': 2,
'location': 'MA',
'name': 'Steve',
'notifications': None,
'profile_background_color': '131516',
'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme14/bg.gif',
'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme14/bg.gif',
'profile_background_tile': True,
'profile_banner_url': 'https://pbs.twimg.com/profile_banners/285715182/1462218226',
'profile_image_url': 'http://pbs.twimg.com/profile_images/727223698332200961/bGPjGjHK_normal.jpg',
'profile_image_url_https': 'https://pbs.twimg.com/profile_images/727223698332200961/bGPjGjHK_normal.jpg',
'profile_link_color': '4A913C',
'profile_sidebar_border_color': 'FFFFFF',
'profile_sidebar_fill_color': 'EFEFEF',
'profile_text_color': '333333',
'profile_use_background_image': True,
'protected': False,
'screen_name': 'StephenBurke_',
'statuses_count': 5913,
'time_zone': 'Eastern Time (US & Canada)',
'url': None,
'utc_offset': -14400,
'verified': False}}
I have a json file which contains a list of json objects (each has the structure like above)
So I read it into a dataframe:
df = pd.read_json('data.json')
and then I try to get all the rows which are the 'city' type by:
df = df[df['place']['place_type'] == 'city']
but then I got the 'TypeError: an integer is required' During handling of the above exception, another exception occurred: KeyError: 'place_type'
Then I tried:
df['place'].head(3)
=>
0 {'id': '01864a8a64df9dc4', 'url': 'https://api...
1 {'id': '01864a8a64df9dc4', 'url': 'https://api...
2 {'id': '0118c71c0ed41109', 'url': 'https://api...
Name: place, dtype: object
So df['place'] return a series where keys are the indexes and that's why I got the TypeError
I've also tried to select the place_type of the first row and it works just fine:
df.iloc[0]['place']['place_type']
=>
city
The question is how can I filter out the rows in this case?
Solution:
Okay, so the problem lies in the fact that the pd.read_json cannot deal with nested JSON structure, so what I have done is to normalize the json object:
with open('data.json') as jsonfile:
data = json.load(jsonfile)
df = pd.io.json.json_normalize(data)
df = df[df['place.place_type'] == 'city']
You can use the a list comprehension to do the filtering you need.
df = [loc for loc in df if d['place']['place_type'] == 'city']
This will give you an array where the elements place_type is 'city'.
I don't know if you have to use the place_type that is the index, to show all the rows that contains city.
"and then I try to get all the rows which are the city type by:"
This way you can get all the rows that contains city in the column place:
df = df[(df['place'] == 'city')]

Categories

Resources