Accessing value boto3 dictionary response - python

I'm getting an error when trying to get a value out of the data returned from boto3. I'm able to print the entire response (see below) but can't figure out what I need to do to get the NetworkInterfaceId out of the response.
I'm running this in Python 2.7.5 because that's what the instances that need to run it have by default. I'm new to Python, so I hope I'm just missing something simple. Thanks for your help!
Error
TypeError: list indices must be integers, not str
Code
#!/usr/bin/python
import boto3
ec2 = boto3.client('ec2')
response = ec2.describe_route_tables(
    RouteTableIds=[
        "rtb-4a1efc23",
    ],
    Filters=[
        {
            'Name': 'route.destination-cidr-block',
            'Values': [
                "172.29.0.0/16",
            ]
        },
    ]
)
#print(response)
print(response["RouteTables"][0]["Routes"]["NetworkInterfaceId"])
Response
{'ResponseMetadata': {'RetryAttempts': 0, 'HTTPStatusCode': 200, 'RequestId': 'a8e7ba60-7599-450a-a708-8d90e429d59e', 'HTTPHeaders': {'transfer-encoding': 'chunked', 'vary': 'Accept-Encoding', 'server': 'AmazonEC2', 'content-type': 'text/xml;charset=UTF-8', 'date': 'Wed, 08 Feb 2017 11:51:47 GMT'}}, u'RouteTables': [{u'Associations': [{u'SubnetId': 'subnet-d7040aaf', u'RouteTableAssociationId': 'rtbassoc-867a94ef', u'Main': False, u'RouteTableId': 'rtb-4a1efc23'}, {u'SubnetId': 'subnet-e0fcd3aa', u'RouteTableAssociationId': 'rtbassoc-9f7a94f6', u'Main': False, u'RouteTableId': 'rtb-4a1efc23'}], u'RouteTableId': 'rtb-4a1efc23', u'VpcId': 'vpc-0d00e264', u'PropagatingVgws': [{u'GatewayId': 'vgw-fcf479cc'}], u'Tags': [{u'Value': 'pub', u'Key': 'Name'}], u'Routes': [{u'GatewayId': 'local', u'DestinationCidrBlock': '172.28.0.0/16', u'State': 'active', u'Origin': 'CreateRouteTable'}, {u'Origin': 'CreateRoute', u'DestinationCidrBlock': '172.29.0.0/16', u'InstanceId': 'i-0b84e502d9dc49443', u'NetworkInterfaceId': 'eni-08f55373', u'State': 'active', u'InstanceOwnerId': '444456106883'}, {u'Origin': 'CreateRoute', u'DestinationCidrBlock': '172.31.0.0/16', u'InstanceId': 'i-0b84e502d9dc49443', u'NetworkInterfaceId': 'eni-08f55373', u'State': 'active', u'InstanceOwnerId': '444456106883'}, {u'GatewayId': 'igw-7b03e012', u'DestinationCidrBlock': '0.0.0.0/0', u'State': 'active', u'Origin': 'CreateRoute'}, {u'GatewayId': 'vgw-fcf479cc', u'DestinationCidrBlock': '10.114.112.192/27', u'State': 'active', u'Origin': 'EnableVgwRoutePropagation'}, {u'GatewayId': 'vgw-fcf479cc', u'DestinationCidrBlock': '10.114.210.160/27', u'State': 'active', u'Origin': 'EnableVgwRoutePropagation'}, {u'GatewayId': 'vgw-fcf479cc', u'DestinationCidrBlock': '10.138.172.32/27', u'State': 'active', u'Origin': 'EnableVgwRoutePropagation'}, {u'GatewayId': 'vgw-fcf479cc', u'DestinationCidrBlock': '10.138.172.96/27', u'State': 'active', u'Origin': 'EnableVgwRoutePropagation'}, {u'GatewayId': 'vgw-fcf479cc', 
u'DestinationCidrBlock': '10.114.105.128/26', u'State': 'active', u'Origin': 'EnableVgwRoutePropagation'}, {u'GatewayId': 'vgw-fcf479cc', u'DestinationCidrBlock': '10.115.80.0/26', u'State': 'active', u'Origin': 'EnableVgwRoutePropagation'}, {u'GatewayId': 'vgw-fcf479cc', u'DestinationCidrBlock': '10.115.131.0/26', u'State': 'active', u'Origin': 'EnableVgwRoutePropagation'}, {u'GatewayId': 'vgw-fcf479cc', u'DestinationCidrBlock': '10.138.17.128/26', u'State': 'active', u'Origin': 'EnableVgwRoutePropagation'}, {u'GatewayId': 'vgw-fcf479cc', u'DestinationCidrBlock': '10.138.83.64/26', u'State': 'active', u'Origin': 'EnableVgwRoutePropagation'}, {u'GatewayId': 'vgw-fcf479cc', u'DestinationCidrBlock': '10.138.180.128/26', u'State': 'active', u'Origin': 'EnableVgwRoutePropagation'}, {u'GatewayId': 'vgw-fcf479cc', u'DestinationCidrBlock': '10.192.0.0/16', u'State': 'active', u'Origin': 'EnableVgwRoutePropagation'}]}]}

From the documentation, you can see that Routes is a list, so it has to be indexed with an integer rather than a string key. To fetch the NetworkInterfaceId, loop through Routes:
for route in response["RouteTables"][0]['Routes']:
    if 'NetworkInterfaceId' in route:
        print(route['NetworkInterfaceId'])
Note that NetworkInterfaceId is not present in every route, as the response you pasted shows, which is why the loop checks for the key first.
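Since the request in the question filters on one destination CIDR block, a possible follow-up is to pick out the route for that block instead of printing every interface ID. The sketch below runs against a trimmed-down copy of the response shown above; the helper name find_interface_id is hypothetical and not part of boto3.

```python
# Trimmed-down copy of the describe_route_tables response from the question.
response = {
    "RouteTables": [
        {
            "RouteTableId": "rtb-4a1efc23",
            "Routes": [
                {"GatewayId": "local", "DestinationCidrBlock": "172.28.0.0/16"},
                {"DestinationCidrBlock": "172.29.0.0/16",
                 "NetworkInterfaceId": "eni-08f55373"},
                {"GatewayId": "igw-7b03e012", "DestinationCidrBlock": "0.0.0.0/0"},
            ],
        }
    ]
}


def find_interface_id(response, cidr_block):
    """Return the NetworkInterfaceId of the first route to cidr_block, or None."""
    for table in response.get("RouteTables", []):
        for route in table.get("Routes", []):
            if route.get("DestinationCidrBlock") == cidr_block:
                # .get() yields None for routes that point at a gateway instead
                return route.get("NetworkInterfaceId")
    return None


print(find_interface_id(response, "172.29.0.0/16"))
```

Using .get() at each level avoids a KeyError when a route table has no routes or a route has no network interface.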

Related

How to pick up data from json objects in python?

I am trying to pick Instances in the json objects data which looks like this
[{'Groups': [], 'Instances': [{'AmiLaunchIndex': 0, 'ImageId': 'ami-0ceecbb0f30a902a6', 'InstanceId': 'i-xxxxx', 'InstanceType': 't2.micro', 'KeyName': 'xxxx', 'LaunchTime': {'$date': '2022-12-17T13:07:54Z'}, 'Monitoring': {'State': 'disabled'}, 'Placement': {'AvailabilityZone': 'us-west-2b', 'GroupName': '', 'Tenancy': 'default'}, 'PrivateDnsName': 'ip-zxxxxx.us-west-2.compute.internal', 'PrivateIpAddress': 'xxxxx', 'ProductCodes': [], 'PublicDnsName': 'ec2-xx-xxx-xxx.us-west-2.compute.amazonaws.com', 'PublicIpAddress': 'xxxxxx', 'State': {'Code': 16, 'Name': 'running'}, 'StateTransitionReason': '', 'SubnetId': 'subnet-xxxxx', 'VpcId': 'vpc-xxxxx', 'Architecture': 'x86_64', 'BlockDeviceMappings': [{'DeviceName': '/dev/xvda', 'Ebs': {'AttachTime': {'$date': '2022-12-17T13:07:55Z'}, 'DeleteOnTermination': True, 'Status': 'attached', 'VolumeId': 'vol-xxxx'}}], 'ClientToken': '529fc1ac-bf64-4804-b0b8-7c7778ace68c', 'EbsOptimized': False, 'EnaSupport': True, 'Hypervisor': 'xen', 'NetworkInterfaces': [{'Association': {'IpOwnerId': 'amazon', 'PublicDnsName': 'ec2-35-86-111-31.us-west-2.compute.amazonaws.com', 'PublicIp': 'xxxxx'}, 'Attachment': {'AttachTime': {'$date': '2022-12-17T13:07:54Z'}, 'AttachmentId': 'eni-attach-0cac7d4af20664b23', 'DeleteOnTermination': True, 'DeviceIndex': 0, 'Status': 'attached', 'NetworkCardIndex': 0}, 'Description': '', 'Groups': [{'GroupName': 'launch-wizard-5', 'GroupId': 'sg-xxxxx'}], 'Ipv6Addresses': [], 'MacAddress': 'xxxxx', 'NetworkInterfaceId': 'eni-xxxxx', 'OwnerId': 'xxxx', 'PrivateDnsName': 'ip-xxxxx.us-west-2.compute.internal', 'PrivateIpAddress': 'xxx.xxx.xxx', 'PrivateIpAddresses': [{'Association': {'IpOwnerId': 'amazon', 'PublicDnsName': 'ec2-xx-xx-xx-xxx.us-west-2.compute.amazonaws.com', 'PublicIp': 'xxx.xxx.xxx'}, 'Primary': True, 'PrivateDnsName': 'ip-172-31-20-187.us-west-2.compute.internal', 'PrivateIpAddress': 'xxx.xxx.xxx'}], 'SourceDestCheck': True, 'Status': 'in-use', 'SubnetId': 'subnet-xxxxxxx', 'VpcId': 
'vpc-0b09cd4sedxxx', 'InterfaceType': 'interface'}], 'RootDeviceName': '/dev/xvda', 'RootDeviceType': 'ebs', 'SecurityGroups': [{'GroupName': 'launch-wizard-5', 'GroupId': 'sg-0a0d1c79d8076660e'}], 'SourceDestCheck': True, 'Tags': [{'Key': 'Name', 'Value': 'MainServers'}], 'VirtualizationType': 'hvm', 'CpuOptions': {'CoreCount': 1, 'ThreadsPerCore': 1}, 'CapacityReservationSpecification': {'CapacityReservationPreference': 'open'}, 'HibernationOptions': {'Configured': False}, 'MetadataOptions': {'State': 'applied', 'HttpTokens': 'optional', 'HttpPutResponseHopLimit': 1, 'HttpEndpoint': 'enabled', 'HttpProtocolIpv6': 'disabled', 'InstanceMetadataTags': 'disabled'}, 'EnclaveOptions': {'Enabled': False}, 'PlatformDetails': 'Linux/UNIX', 'UsageOperation': 'RunInstances', 'UsageOperationUpdateTime': {'$date': '2022-12-17T13:07:54Z'}, 'PrivateDnsNameOptions': {'HostnameType': 'ip-name', 'EnableResourceNameDnsARecord': True, 'EnableResourceNameDnsAAAARecord': False}, 'MaintenanceOptions': {'AutoRecovery': 'default'}}], 'OwnerId': '76979cfxdsss11', 'ReservationId': 'r-xxxxx'}]
I tried loading the data and doing
resp = json.loads(jsonfile)
reqData = resp['Instances']
But I get this error
TypeError: list indices must be integers or slices, not str
Is there any way I can fix this and get the data? Help will be extremely appreciated.
The data is wrapped inside a list, so index into the list first:
print(resp[0]["Instances"])
To select only the instances from the data, you can use the json.loads function to parse the JSON data. The result is a list, so take its first element and then extract the Instances field, which is itself a list.
import json
# Parse the JSON data
data = json.loads(json_data)
# Extract the instances from the first element of the outer list
instances = data[0]['Instances']
You can then iterate over the data with something like this
for instance in instances:
    instance_id = instance['InstanceId']
    instance_type = instance['InstanceType']
    launch_time = instance['LaunchTime']
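Because the outer value is a list (one entry per reservation, judging by the ReservationId key in the data), it may hold more than one element. A minimal sketch for collecting the instances from all of them follows; the sample data is a cut-down stand-in and the second reservation is invented for illustration.

```python
# Trimmed-down stand-in for the parsed data: a list of reservations,
# each holding its own list of instances.
reservations = [
    {
        "ReservationId": "r-xxxxx",
        "Instances": [
            {"InstanceId": "i-xxxxx", "InstanceType": "t2.micro"},
        ],
    },
    {
        "ReservationId": "r-yyyyy",  # hypothetical second reservation
        "Instances": [
            {"InstanceId": "i-yyyyy", "InstanceType": "t3.small"},
        ],
    },
]

# Visit each list element first, then read its "Instances" key.
instances = [
    instance
    for reservation in reservations
    for instance in reservation.get("Instances", [])
]

instance_ids = [instance["InstanceId"] for instance in instances]
print(instance_ids)
```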

How to add fields in Google Cloud Firestore using Python

I have two JSON objects, data and data2, and I want to merge them into one Firestore document. When I try to merge the two dicts, only half of the data shows up.
data = {
    u'name': u'Los Angeles',
    u'state': u'CA',
    u'country': u'USA'
}
data2 = {
    u'name': u'MIAMI',
    u'state': u'CA',
    u'country': u'USA'
}
db.collection(u'cities').document(u'LA').set(data)
city_ref = db.collection(u'cities').document(u'LA')
city_ref.set({
    u'name': u'MIAMI',
    u'state': u'CA',
    u'country': u'USA'
}, merge=True)
# only this part is showing
{
    u'name': u'MIAMI',
    u'state': u'CA',
    u'country': u'USA'
}
# when I do this instead
data = {
    u'name': u'Los Angeles',
    u'state': u'CA',
    u'country': u'USA',
    u'name': u'MIAMI',
    u'state': u'CA',
    u'country': u'USA'
}
# only this much is showing in my document
u'name': u'MIAMI',
u'state': u'CA',
u'country': u'USA'
Is there any way to merge these two in Python?
You can only have one value per field name. Merge is useful when you want to add additional fields to the same document, for example:
data = {
    u'name': u'Los Angeles',
    u'state': u'CA',
    u'country': u'USA'
}
db.collection(u'cities').document(u'LA').set(data)
city_ref = db.collection(u'cities').document(u'LA')
city_ref.set({
    u'name2': u'MIAMI',
    u'state2': u'CA',
    u'country2': u'USA'
}, merge=True)
# Result
data = {
    u'country': u'USA',
    u'country2': u'USA',
    u'name': u'Los Angeles',
    u'name2': u'MIAMI',
    u'state': u'CA',
    u'state2': u'CA',
}
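The "only half is showing" effect can be reproduced with plain Python dicts, without touching Firestore at all. The sketch below shows that colliding keys keep only the last value, and one possible way around it: nesting each record under its own key. The key names los_angeles and miami are made up for illustration.

```python
data = {"name": "Los Angeles", "state": "CA", "country": "USA"}
data2 = {"name": "MIAMI", "state": "CA", "country": "USA"}

# Merging two dicts that share keys: later values overwrite earlier ones,
# which mirrors what set(..., merge=True) does for fields with the same name.
merged = dict(data)
merged.update(data2)
print(merged["name"])

# To keep both records in one mapping, give each its own key
# (hypothetical names) so nothing collides.
combined = {"los_angeles": data, "miami": data2}
print(sorted(combined))
```

A nested mapping like combined could be written with a single set() call, although storing each city in its own document may be the simpler layout, depending on how the data is queried.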
You can use this code (it will add the new fields if they don't exist and update them if they already exist).
doc.reference.update({
    u'newField1': newValue1,
    u'newField2': newValue2
})

TypeError: an integer is required when select subset of rows dataframe pandas

{'contributors': None,
'coordinates': None,
'created_at': 'Tue Aug 02 19:51:58 +0000 2016',
'entities': {'hashtags': [],
'symbols': [],
'urls': [],
'user_mentions': [{'id': 873491544,
'id_str': '873491544',
'indices': [0, 13],
'name': 'Kenel M',
'screen_name': 'KxSweaters13'}]},
'favorite_count': 1,
'favorited': False,
'geo': None,
'id': 760563814450491392,
'id_str': '760563814450491392',
'in_reply_to_screen_name': 'KxSweaters13',
'in_reply_to_status_id': None,
'in_reply_to_status_id_str': None,
'in_reply_to_user_id': 873491544,
'in_reply_to_user_id_str': '873491544',
'is_quote_status': False,
'lang': 'en',
'metadata': {'iso_language_code': 'en', 'result_type': 'recent'},
'place': {'attributes': {},
'bounding_box': {'coordinates': [[[-71.813501, 42.4762],
[-71.702186, 42.4762],
[-71.702186, 42.573956],
[-71.813501, 42.573956]]],
'type': 'Polygon'},
'contained_within': [],
'country': 'Australia',
'country_code': 'AUS',
'full_name': 'Melbourne, V',
'id': 'c4f1830ea4b8caaf',
'name': 'Melbourne',
'place_type': 'city',
'url': 'https://api.twitter.com/1.1/geo/id/c4f1830ea4b8caaf.json'},
'retweet_count': 0,
'retweeted': False,
'source': 'Twitter for Android',
'text': '#KxSweaters13 are you the kenelx13 I see owning leominster for team valor?',
'truncated': False,
'user': {'contributors_enabled': False,
'created_at': 'Thu Apr 21 17:09:52 +0000 2011',
'default_profile': False,
'default_profile_image': False,
'description': "Arbys when it's cold. Kimballs when it's warm. #Ally__09 all year. Comp sci classes sometimes.",
'entities': {'description': {'urls': []}},
'favourites_count': 1106,
'follow_request_sent': None,
'followers_count': 167,
'following': None,
'friends_count': 171,
'geo_enabled': True,
'has_extended_profile': False,
'id': 285715182,
'id_str': '285715182',
'is_translation_enabled': False,
'is_translator': False,
'lang': 'en',
'listed_count': 2,
'location': 'MA',
'name': 'Steve',
'notifications': None,
'profile_background_color': '131516',
'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme14/bg.gif',
'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme14/bg.gif',
'profile_background_tile': True,
'profile_banner_url': 'https://pbs.twimg.com/profile_banners/285715182/1462218226',
'profile_image_url': 'http://pbs.twimg.com/profile_images/727223698332200961/bGPjGjHK_normal.jpg',
'profile_image_url_https': 'https://pbs.twimg.com/profile_images/727223698332200961/bGPjGjHK_normal.jpg',
'profile_link_color': '4A913C',
'profile_sidebar_border_color': 'FFFFFF',
'profile_sidebar_fill_color': 'EFEFEF',
'profile_text_color': '333333',
'profile_use_background_image': True,
'protected': False,
'screen_name': 'StephenBurke_',
'statuses_count': 5913,
'time_zone': 'Eastern Time (US & Canada)',
'url': None,
'utc_offset': -14400,
'verified': False}}
I have a JSON file which contains a list of JSON objects (each with a structure like the one above).
So I read it into a dataframe:
df = pd.read_json('data.json')
and then I try to get all the rows whose place type is 'city' with:
df = df[df['place']['place_type'] == 'city']
but then I get "TypeError: an integer is required", followed by "During handling of the above exception, another exception occurred: KeyError: 'place_type'".
Then I tried:
df['place'].head(3)
=>
0 {'id': '01864a8a64df9dc4', 'url': 'https://api...
1 {'id': '01864a8a64df9dc4', 'url': 'https://api...
2 {'id': '0118c71c0ed41109', 'url': 'https://api...
Name: place, dtype: object
So df['place'] returns a Series whose keys are the row indexes, and that's why I got the TypeError.
I've also tried selecting the place_type of the first row, and it works just fine:
df.iloc[0]['place']['place_type']
=>
city
The question is how can I filter out the rows in this case?
Solution:
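Okay, so the problem is that pd.read_json does not flatten nested JSON: each place dict ends up as a single object in one column. So what I did was normalize the JSON objects first (in pandas 1.0 and later the function is available as pd.json_normalize; older versions expose it as pd.io.json.json_normalize):
import json
import pandas as pd

with open('data.json') as jsonfile:
    data = json.load(jsonfile)
df = pd.json_normalize(data)
df = df[df['place.place_type'] == 'city']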
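To make it concrete what normalizing does, here is a simplified pure-Python sketch of the flattening step: nested dicts become dotted names such as place.place_type. The flatten helper is hypothetical and far less general than the pandas function, and the sample tweets are cut down to the fields that matter.

```python
def flatten(record, prefix=""):
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            # Recurse, carrying the parent key along as a prefix.
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


tweets = [
    {"id": 1, "place": {"name": "Melbourne", "place_type": "city"}},
    {"id": 2, "place": {"name": "Australia", "place_type": "country"}},
    {"id": 3, "place": None},  # tweets without a place are kept as-is
]

rows = [flatten(tweet) for tweet in tweets]
cities = [row for row in rows if row.get("place.place_type") == "city"]
print(cities)
```

Once every row is flat, filtering on a single dotted column is straightforward, which is the same reason the DataFrame filter on 'place.place_type' works after normalizing.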
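You can use a list comprehension to do the filtering you need. Note that it has to run over the list of dicts loaded with json.load (called data here), not over the DataFrame, because iterating over a DataFrame yields its column names:
cities = [loc for loc in data if loc['place'] and loc['place']['place_type'] == 'city']
This will give you a list of the elements whose place_type is 'city'.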
I don't know if you have to use place_type as the index to show all the rows that contain city.
"and then I try to get all the rows which are the city type by:"
This way you can get all the rows whose place has the place_type city, by testing each dict in the place column:
df = df[df['place'].apply(lambda p: isinstance(p, dict) and p.get('place_type') == 'city')]

Python - Parsing Json file and getting multiple values from dictionaries in list

I have a JSON file and I successfully parsed it, as you can see below. What I want is to get the Id of each entry in ['Users'] and the ['Photos']['Url'] values for that Id.
My .json output
{u'Success': True,
u'Total': 172159,
u'Users': [{u'AboutMe': u'U\xe7mak i\xe7in ku\u015f olmak gerekmiyor, k\xfc\xe7\xfck sevin\xe7ler olsun yeter.',
u'Age': 34,
u'Education': None,
u'EyeColor': u'Mavi',
u'Gender': 2,
u'HairColor': u'A\xe7\u0131k kahve',
u'Height': 183,
u'Id': u'19185978',
u'IsHot': False,
u'IsOnline': True,
u'Job': u'Serbest meslek',
u'JobId': None,
u'LastActivityDate': u'2018-03-07T03:43:50.53855Z',
u'Location': u'\u0130zmir - Merkez',
u'LookingFor': None,
u'MaritalStatus': u'Single',
u'MaritalStatusId': None,
u'Photo': None,
u'Photos': [{u'CreateDate': u'0001-01-01T00:00:00',
u'Id': None,
u'PhotoName': None,
u'State': None,
u'Url': u'https://diymyqt2ncnnc.cloudfront.net/s3/traktorumnetphotos/a2f/a2fe1237-e0e1-4456-bd7b-b1d55bc8f00e.jpg.jpg'},
{u'CreateDate': u'0001-01-01T00:00:00',
u'Id': None,
u'PhotoName': None,
u'State': None,
u'Url': u'https://diymyqt2ncnnc.cloudfront.net/s3/traktorumnetphotos/87f/87fba6a5-8555-4b53-968b-678f832fd28f.jpg.jpg'},
{u'CreateDate': u'0001-01-01T00:00:00',
u'Id': None,
u'PhotoName': None,
u'State': None,
u'Url': u'https://diymyqt2ncnnc.cloudfront.net/s3/traktorumnetphotos/18d/18d3d6bc-97ec-49c3-80d4-57d4e58d020f.jpg.jpg'},
{u'CreateDate': u'0001-01-01T00:00:00',
u'Id': None,
u'PhotoName': None,
u'State': None,
u'Url': u'https://diymyqt2ncnnc.cloudfront.net/s3/traktorumnetphotos/eba/eba25a06-4168-49cf-b0cb-e501d0efb965.jpg.jpg'}],
u'RelationshipType': u'E-Posta Arkada\u015fl\u0131\u011f\u0131, , , ',
u'StatusMessage': u'Siz istiyorsunuz ki her \u015fey benim istedi\u011fim gibi olsun, herkes pe\u015fimden ko\u015fsun. Ama her zaman \xf6yle olmuyor.',
u'TownName': None,
u'Username': u'45ahmet35',
u'Weight': 85,
u'Zodiac': u'Ko\xe7',
u'ZodiacId': None},
{u'AboutMe': None,
u'Age': 42,
u'Education': None,
u'EyeColor': u'Kahverengi',
u'Gender': 2,
u'HairColor': u'K\u0131rla\u015fm\u0131\u015f',
u'Height': 175,
u'Id': u'19274893',
u'IsHot': False,
u'IsOnline': True,
u'Job': u'',
u'JobId': None,
u'LastActivityDate': u'2018-03-07T03:43:24.555Z',
u'Location': u'\u0130zmir - Alia\u011fa',
u'LookingFor': None,
u'MaritalStatus': u'Single',
u'MaritalStatusId': None,
u'Photo': None,
u'Photos': [{u'CreateDate': u'0001-01-01T00:00:00',
u'Id': None,
u'PhotoName': None,
u'State': None,
u'Url': u'https://diymyqt2ncnnc.cloudfront.net/s3/traktorumnetphotos/bf0/bf0fbad6-076a-4924-b496-5385044c08bc.jpg.jpg'},
{u'CreateDate': u'0001-01-01T00:00:00',
u'Id': None,
u'PhotoName': None,
u'State': None,
u'Url': u'https://diymyqt2ncnnc.cloudfront.net/s3/traktorumnetphotos/998/99893a8e-5342-441f-bd83-82fa20bdb27a.jpg.jpg'},
],
u'RelationshipType': u'',
u'StatusMessage': u'Yok ',
u'TownName': None,
u'Username': u'kaya3510',
u'Weight': 80,
u'Zodiac': u'Ko\xe7',
u'ZodiacId': None},
And My Python Code
import json
json_obj = json.load(open("13.json"))
for i in json_obj['Users']:
    print i['Id']
    print i['Photos']['Url']
And the Error I get.
19185978
Traceback (most recent call last):
  File "/root/Desktop/siberAlem/parser.py", line 7, in <module>
    print i['Photos']['Url']
TypeError: list indices must be integers, not str
Thanks in advance.
This should help. Photos is a list of dictionaries, so loop over it as well:
for i in json_obj['Users']:
    print(i["Id"])
    for j in i["Photos"]:
        print(j["Url"])
Output:
19185978
https://diymyqt2ncnnc.cloudfront.net/s3/traktorumnetphotos/a2f/a2fe1237-e0e1-4456-bd7b-b1d55bc8f00e.jpg.jpg
https://diymyqt2ncnnc.cloudfront.net/s3/traktorumnetphotos/87f/87fba6a5-8555-4b53-968b-678f832fd28f.jpg.jpg
https://diymyqt2ncnnc.cloudfront.net/s3/traktorumnetphotos/18d/18d3d6bc-97ec-49c3-80d4-57d4e58d020f.jpg.jpg
https://diymyqt2ncnnc.cloudfront.net/s3/traktorumnetphotos/eba/eba25a06-4168-49cf-b0cb-e501d0efb965.jpg.jpg
19274893
https://diymyqt2ncnnc.cloudfront.net/s3/traktorumnetphotos/bf0/bf0fbad6-076a-4924-b496-5385044c08bc.jpg.jpg
https://diymyqt2ncnnc.cloudfront.net/s3/traktorumnetphotos/998/99893a8e-5342-441f-bd83-82fa20bdb27a.jpg.jpg
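If the URLs are needed later rather than just printed, one option is to collect them in a dict keyed by user Id. This is a sketch under a few assumptions: the sample data is cut down, the URLs are shortened placeholders, and the third user (with Photos set to None) is invented to show the guard.

```python
# Trimmed-down stand-in for the parsed file; the URLs are placeholders.
json_obj = {
    "Users": [
        {"Id": "19185978", "Photos": [{"Url": "https://example.com/a.jpg"},
                                      {"Url": "https://example.com/b.jpg"}]},
        {"Id": "19274893", "Photos": [{"Url": "https://example.com/c.jpg"}]},
        {"Id": "19300000", "Photos": None},  # hypothetical user without photos
    ]
}

photos_by_user = {}
for user in json_obj["Users"]:
    # "Photos" is a list (or possibly None), so iterate instead of indexing by key.
    photos = user.get("Photos") or []
    photos_by_user[user["Id"]] = [photo["Url"] for photo in photos]

print(photos_by_user["19185978"])
```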

Reading Json objects from text file into pandas

I have extracted JSON objects from an API library and written them to a text file. I am now stuck on how to take the JSON structure saved in the .txt file and read it back into the Python pandas library.
There are many resources that walk through how to import a JSON file into pandas, but since this is a text file and I'm new to programming and to working with JSON, I'm not sure how to perform this task efficiently.
There are numerous JSON objects in the text file. I would share an example, but it has a bunch of URL shorteners which prevent me from posting this question, so unless someone really needs to see the structure I'll hold off. I already tried pd.read_csv() and pd.read_json(), but since this is a JSON structure in a .txt file, neither has worked properly so far.
Here has been my best guess so far to get the data back into python:
data = []
with open('tweet_json.txt') as f:
    for line in f:
        data.append(json.loads(line))
But I got the following error message when I tried that: JSONDecodeError: Extra data: line 1 column 4626 (char 4625)
Here are two tweets that you can copy and save to a .txt file to replicate:
{'contributors': None,
'coordinates': None,
'created_at': 'Tue Aug 01 16:23:56 +0000 2017',
'display_text_range': [0, 85],
'entities': {'hashtags': [],
'media': [{'display_url': 'pic.twitter.com/MgUWQ76dJU',
'expanded_url': 'https://twitter.com/dog_rates/status/892420643555336193/photo/1',
'id': 892420639486877696,
'id_str': '892420639486877696',
'indices': [86, 109],
'media_url': 'http://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg',
'media_url_https': 'https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg',
'sizes': {'large': {'h': 528, 'resize': 'fit', 'w': 540},
'medium': {'h': 528, 'resize': 'fit', 'w': 540},
'small': {'h': 528, 'resize': 'fit', 'w': 540},
'thumb': {'h': 150, 'resize': 'crop', 'w': 150}},
'type': 'photo',
'url': na}],
'symbols': [],
'urls': [],
'user_mentions': []},
'extended_entities': {'media': [{'display_url': 'pic.twitter.com/MgUWQ76dJU',
'expanded_url': 'https://twitter.com/dog_rates/status/892420643555336193/photo/1',
'id': 892420639486877696,
'id_str': '892420639486877696',
'indices': [86, 109],
'media_url': 'http://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg',
'media_url_https': 'https://pbs.twimg.com/media/DGKD1-bXoAAIAUK.jpg',
'sizes': {'large': {'h': 528, 'resize': 'fit', 'w': 540},
'medium': {'h': 528, 'resize': 'fit', 'w': 540},
'small': {'h': 528, 'resize': 'fit', 'w': 540},
'thumb': {'h': 150, 'resize': 'crop', 'w': 150}},
'type': 'photo',
'url': na}]},
'favorite_count': 39311,
'favorited': False,
'full_text': "This is Phineas. He's a mystical boy. Only ever appears in the hole of a donut. 13/10 na ",
'geo': None,
'id': 892420643555336193,
'id_str': '892420643555336193',
'in_reply_to_screen_name': None,
'in_reply_to_status_id': None,
'in_reply_to_status_id_str': None,
'in_reply_to_user_id': None,
'in_reply_to_user_id_str': None,
'is_quote_status': False,
'lang': 'en',
'place': None,
'possibly_sensitive': False,
'possibly_sensitive_appealable': False,
'retweet_count': 8778,
'retweeted': False,
'source': 'Twitter for iPhone',
'truncated': False,
'user': {'contributors_enabled': False,
'created_at': 'Sun Nov 15 21:41:29 +0000 2015',
'default_profile': False,
'default_profile_image': False,
'description': 'Only Legit Source for Professional Dog Ratings STORE: #ShopWeRateDogs | IG, FB & SC: WeRateDogs | MOBILE APP: #GoodDogsGame Business: dogratingtwitter#gmail.com',
'entities': {'description': {'urls': []},
'url': {'urls': [{'display_url': 'weratedogs.com',
'expanded_url': 'http://weratedogs.com',
'indices': [0, 23],
'url': na }]}},
'favourites_count': 126135,
'follow_request_sent': False,
'followers_count': 4730764,
'following': False,
'friends_count': 109,
'geo_enabled': True,
'has_extended_profile': True,
'id': 4196983835,
'id_str': '4196983835',
'is_translation_enabled': False,
'is_translator': False,
'lang': 'en',
'listed_count': 3700,
'location': 'DM YOUR DOGS. WE WILL RATE',
'name': 'WeRateDogs™',
'notifications': False,
'profile_background_color': '000000',
'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png',
'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png',
'profile_background_tile': False,
'profile_banner_url': 'https://pbs.twimg.com/profile_banners/4196983835/1510812288',
'profile_image_url': 'http://pbs.twimg.com/profile_images/936608706107772929/GwbLQRxf_normal.jpg',
'profile_image_url_https': 'https://pbs.twimg.com/profile_images/936608706107772929/GwbLQRxf_normal.jpg',
'profile_link_color': 'F5ABB5',
'profile_sidebar_border_color': '000000',
'profile_sidebar_fill_color': '000000',
'profile_text_color': '000000',
'profile_use_background_image': False,
'protected': False,
'screen_name': 'dog_rates',
'statuses_count': 6301,
'time_zone': None,
'translator_type': 'none',
'url': n/a,
'utc_offset': None,
'verified': True}}
{'contributors': None,
'coordinates': None,
'created_at': 'Tue Aug 01 00:17:27 +0000 2017',
'display_text_range': [0, 138],
'entities': {'hashtags': [],
'media': [{'display_url': 'pic.twitter.com/0Xxu71qeIV',
'expanded_url': 'https://twitter.com/dog_rates/status/892177421306343426/photo/1',
'id': 892177413194625024,
'id_str': '892177413194625024',
'indices': [139, 162],
'media_url': 'http://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg',
'media_url_https': 'https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg',
'sizes': {'large': {'h': 1600, 'resize': 'fit', 'w': 1407},
'medium': {'h': 1200, 'resize': 'fit', 'w': 1055},
'small': {'h': 680, 'resize': 'fit', 'w': 598},
'thumb': {'h': 150, 'resize': 'crop', 'w': 150}},
'type': 'photo',
'url': na}],
'symbols': [],
'urls': [],
'user_mentions': []},
'extended_entities': {'media': [{'display_url': 'pic.twitter.com/0Xxu71qeIV',
'expanded_url': 'https://twitter.com/dog_rates/status/892177421306343426/photo/1',
'id': 892177413194625024,
'id_str': '892177413194625024',
'indices': [139, 162],
'media_url': 'http://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg',
'media_url_https': 'https://pbs.twimg.com/media/DGGmoV4XsAAUL6n.jpg',
'sizes': {'large': {'h': 1600, 'resize': 'fit', 'w': 1407},
'medium': {'h': 1200, 'resize': 'fit', 'w': 1055},
'small': {'h': 680, 'resize': 'fit', 'w': 598},
'thumb': {'h': 150, 'resize': 'crop', 'w': 150}},
'type': 'photo',
'url': na}]},
'favorite_count': 33662,
'favorited': False,
'full_text': "This is Tilly. She's just checking pup on you. Hopes you're doing ok. If not, she's available for pats, snugs, boops, the whole bit. 13/10 na",
'geo': None,
'id': 892177421306343426,
'id_str': '892177421306343426',
'in_reply_to_screen_name': None,
'in_reply_to_status_id': None,
'in_reply_to_status_id_str': None,
'in_reply_to_user_id': None,
'in_reply_to_user_id_str': None,
'is_quote_status': False,
'lang': 'en',
'place': None,
'possibly_sensitive': False,
'possibly_sensitive_appealable': False,
'retweet_count': 6431,
'retweeted': False,
'source': 'Twitter for iPhone',
'truncated': False,
'user': {'contributors_enabled': False,
'created_at': 'Sun Nov 15 21:41:29 +0000 2015',
'default_profile': False,
'default_profile_image': False,
'description': 'Only Legit Source for Professional Dog Ratings STORE: #ShopWeRateDogs | IG, FB & SC: WeRateDogs | MOBILE APP: #GoodDogsGame Business: dogratingtwitter#gmail.com',
'entities': {'description': {'urls': []},
'url': {'urls': [{'display_url': 'weratedogs.com',
'expanded_url': 'http://weratedogs.com',
'indices': [0, 23],
'url': na}]}},
'favourites_count': 126135,
'follow_request_sent': False,
'followers_count': 4730865,
'following': False,
'friends_count': 109,
'geo_enabled': True,
'has_extended_profile': True,
'id': 4196983835,
'id_str': '4196983835',
'is_translation_enabled': False,
'is_translator': False,
'lang': 'en',
'listed_count': 3728,
'location': 'DM YOUR DOGS. WE WILL RATE',
'name': 'WeRateDogs™',
'notifications': False,
'profile_background_color': '000000',
'profile_background_image_url': 'http://abs.twimg.com/images/themes/theme1/bg.png',
'profile_background_image_url_https': 'https://abs.twimg.com/images/themes/theme1/bg.png',
'profile_background_tile': False,
'profile_banner_url': 'https://pbs.twimg.com/profile_banners/4196983835/1510812288',
'profile_image_url': 'http://pbs.twimg.com/profile_images/936608706107772929/GwbLQRxf_normal.jpg',
'profile_image_url_https': 'https://pbs.twimg.com/profile_images/936608706107772929/GwbLQRxf_normal.jpg',
'profile_link_color': 'F5ABB5',
'profile_sidebar_border_color': '000000',
'profile_sidebar_fill_color': '000000',
'profile_text_color': '000000',
'profile_use_background_image': False,
'protected': False,
'screen_name': 'dog_rates',
'statuses_count': 6301,
'time_zone': None,
'translator_type': 'none',
'url': na,
'utc_offset': None,
'verified': True}}
Update
The following code produces this error: JSONDecodeError: Expecting ',' delimiter: line 1 column 4627 (char 4626)
with open('tweet_json.txt', 'r') as f:
    datastore = json.load(f)
This post is the closest I've found so far to help me solve my issue:
Python json.loads shows ValueError: Expecting , delimiter: line 1
Thanks everyone for the feedback. I had to adjust the code that extracts the data from the API so that each tweet is written as one line of valid JSON; after that it was pretty straightforward to get the data into a list of dictionaries.
with open('tweet_json.txt', 'a+', encoding='utf-8') as file:
    for tweet_id in twitter_archive_df['tweet_id']:
        try:
            tweet = api.get_status(id=tweet_id, tweet_mode='extended')
            file.write(json.dumps(tweet))
            file.write('\n')
        except Exception:
            # skip tweets that cannot be fetched or serialized
            pass
Then I ran the following code to import the JSON objects from the .txt file into a list of dictionaries:
with open('tweet_json.txt') as file:
    status = []
    for line in file:
        status.append(json.loads(line))
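The round trip can be tried out without the Twitter API. The sketch below writes two small dictionaries as one JSON document per line and reads them back; the temporary file location and the sample records are made up. It also illustrates why the first attempt failed: text produced by printing a Python dict (single quotes, None, False) is not valid JSON.

```python
import json
import os
import tempfile

tweets = [
    {"id": 892420643555336193, "place": None, "favorited": False},
    {"id": 892177421306343426, "place": None, "favorited": False},
]

# Write one JSON document per line (the "JSON Lines" layout).
path = os.path.join(tempfile.mkdtemp(), "tweet_json.txt")
with open(path, "w") as f:
    for tweet in tweets:
        f.write(json.dumps(tweet) + "\n")

# Read the file back, parsing each non-empty line on its own.
status = []
with open(path) as f:
    for line in f:
        if line.strip():
            status.append(json.loads(line))

print(status == tweets)

# The repr of a dict is not JSON, so json.loads rejects it.
try:
    json.loads(str(tweets[0]))
    repr_is_json = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    repr_is_json = False
print(repr_is_json)
```

From there, pd.DataFrame(status) or, in recent pandas versions, pd.json_normalize(status) should turn the list of dictionaries into a DataFrame.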
