I have a dict as follows:
{ u'has_more': False,
u'is_limited': True,
u'latest': u'1501149118.071555',
u'messages': [ { u'text': u'--Sharp 1.9 DM\n--Modifying and testing DM script for bypassing existing sonumber validation and add line items',
u'ts': u'1501149054.047400',
u'type': u'message',
u'user': u'U0HN06ZB9'},
{ u'text': u'-- support to engineering on Licensing infra upgrade to 3.6\n - created a new key for qa on current 3.5 ubuntu 12 instance\n - added that key to the instance , created the ami and shared it with QA\n - short discussion with Navin on same',
u'ts': u'1501148934.002719',
u'type': u'message',
u'user': u'U02RRQJG1'},
{ u'inviter': u'U03FE3Z7D',
u'subtype': u'channel_join',
u'text': u'<#U0HN06ZB9|shikhar.rastogi> has joined the channel',
u'ts': u'1501148921.998107',
u'type': u'message',
u'user': u'U0HN06ZB9'},
{ u'inviter': u'U03FE3Z7D',
u'subtype': u'channel_join',
u'text': u'<#U02RRQJG1|himani> has joined the channel',
u'ts': u'1501148328.777625',
u'type': u'message',
u'user': u'U02RRQJG1'},
{ u'text': u'something like ^^^^',
u'ts': u'1501148318.773838',
u'type': u'message',
u'user': u'U03FE3Z7D'},
{ u'text': u'-- This is test \n-- Not\n-- test1\n-- Test b',
u'ts': u'1501148309.770614',
u'type': u'message',
u'user': u'U03FE3Z7D'},
{ u'text': u'<!channel> can all of you start putting some random crap in same format as shift handoff',
u'ts': u'1501148287.762336',
u'type': u'message',
u'user': u'U03FE3Z7D'},
{ u'text': u'<!channel> can all of you start putting some random crap in same format as shift handoff',
u'ts': u'1501148287.762161',
u'type': u'message',
u'user': u'U03FE3Z7D'},
{ u'text': u'sjvnsv',
u'ts': u'1501138569.469475',
u'type': u'message',
u'user': u'U03FE3Z7D'},
{ u'text': u'-- Test1 \n-- Leave this ASAP',
u'ts': u'1501136157.933720',
u'type': u'message',
u'user': u'U03FE3Z7D'},
{ u'bot_id': u'B19LZG1A5',
u'subtype': u'bot_message',
u'text': u'This is crazy',
u'ts': u'1501075281.418010',
u'type': u'message',
u'username': u'TEST_BOT'}],
u'ok': True,
u'oldest': u'1500820472.964970'}
Now I am trying to extract two things: the user and its corresponding text. Somehow I am not able to get them using the following:
json_objects = json.loads(r.text)
for i in json_objects:
    print json_objects['messages'][i]['user']
    print json_objects['messages'][i]['text']
The above throws an error:
Traceback (most recent call last):
File "clean_test.py", line 45, in <module>
get_channel_messages()
File "clean_test.py", line 38, in get_channel_messages
print json_objects['messages'][i]['user']
TypeError: list indices must be integers, not unicode
The above should actually get the user and call a user_detail() method to get the name and come back. Once this is done, I want the content to be dumped into a file in the following manner:
username1:
-- text
username2:
-- text2
You want to iterate over the indices of the list, not over the keys of the outer dict:
json_objects = json.loads(r.text)
for i in range(len(json_objects['messages'])):
    print json_objects['messages'][i]['user']
    print json_objects['messages'][i]['text']
Or another way would be (the Pythonic way):
for i in json_objects['messages']:
    print i['user']
    print i['text']
You are iterating over the keys of the dictionary but using them to index a list. Try this instead:
for i in json_objects['messages']:
    print i['user']
    print i['text']
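Note that the last message in the sample is a bot message with a 'username' key and no 'user' key, so i['user'] would raise a KeyError on it. A minimal sketch of the requested "username: / text" output that skips such messages could look like this. The format_messages helper, the names mapping, and the handoff.txt file name are hypothetical; user_detail stands in for the asker's own lookup function.

```python
def format_messages(messages, user_detail):
    """Build 'name:' / text blocks for every message that has a 'user' key."""
    blocks = []
    for message in messages:
        user_id = message.get('user')  # bot messages carry 'username' instead
        if user_id is None:
            continue
        name = user_detail(user_id)    # assumed: maps a user id to a display name
        blocks.append('{0}:\n{1}'.format(name, message.get('text', '')))
    return '\n'.join(blocks)


# Trimmed-down stand-in for json_objects['messages']
sample_messages = [
    {'text': '-- Test1 \n-- Leave this ASAP', 'user': 'U03FE3Z7D', 'type': 'message'},
    {'text': 'This is crazy', 'username': 'TEST_BOT', 'subtype': 'bot_message'},
]
names = {'U03FE3Z7D': 'alice'}  # hypothetical id-to-name mapping
report = format_messages(sample_messages, names.get)
```

The resulting string can then be written out with something like open('handoff.txt', 'w').write(report).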
Having a stupid moment here and could use some help. I'm just trying to create a simple script in python 2.7 for importing json via an API. The accounts_json is something like:
{u'name': u'admin', u'isOrg': False, u'isImported': False, u'isAdmin': True, u'fullName': u'', u'id': u'8efb2bfd-ae3f-4665-9d2e-13287a4ffe0e', u'isActive': True}
For whatever reason, I'm getting a SyntaxError on one of my variables in a block of code like this:
password_dict = {
    "password":"blahblah"
}
if not accountsJson["isOrg"]:
    accountsJson.update(password_dict)
    to_import = json.dumps(accountsJson)
This results in:
to_import = accountsJson
^
SyntaxError: invalid syntax
If I separate things out in the python interpreter everything takes:
>>> if not accountsJson["isOrg"]:
...     accountsJson.update(password_dict)
...
>>> accountsJson
{u'name': u'admin', u'isOrg': False, u'isImported': False, u'isAdmin': True, u'fullName': u'', 'password': u'blahblah', u'id': u'8efb2bfd-ae3f-4665-9d2e-13287a4ffe0e', u'isActive': True}
>>> to_import = accountsJson
>>> to_import
{u'name': u'admin', u'isOrg': False, u'isImported': False, u'isAdmin': True, u'fullName': u'', 'password': u'blahblah', u'id': u'8efb2bfd-ae3f-4665-9d2e-13287a4ffe0e', u'isActive': True}
>>> to_import["password"]
u'blahblah'
I've tried both with and without json.loads; it doesn't seem to impact things.
What am I doing wrong here?
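For comparison, here is the same sequence of steps as a small standalone script, using a shortened stand-in for the account dict. It is only a sketch, not a diagnosis of the original file: when a SyntaxError points at a line that looks valid on its own, the cause is often an unclosed bracket or inconsistent indentation on a preceding line.

```python
import json

# Shortened, hypothetical stand-in for the dict returned by the API
accounts_json = {'name': 'admin', 'isOrg': False, 'isActive': True}
password_dict = {'password': 'blahblah'}

# Only non-organisation accounts get the password added
if not accounts_json['isOrg']:
    accounts_json.update(password_dict)

# json.dumps turns the dict into a JSON string ready to send
to_import = json.dumps(accounts_json, sort_keys=True)
```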
I have the following string, which I'm trying to load using Python 2.7 json.loads.
{
u'Status': {
u'display_name': u'Status',
u'is_updatable': True,
u'type': u'TEXT',
u'val': u'Paying',
u'source': u'API'
}, u'Create Date': {
u'display_name': u'Create Date',
u'is_updatable': True,
u'type': u'DATE',
u'val': u'2017-09-20',
u'source': u'API'
}, u'Total # of Projects': {
u'display_name': u'Total # of Projects',
u'is_updatable': True,
u'type': u'TEXT',
u'val': u'53',
u'source': u'Pixel'
}
}
I'm getting the error:
ValueError: Expecting property name: line 1 column 2 (char 1)
Any ideas?
The text you have pasted is not in JSON format; it is the printed representation of a Python dict (u'' prefixes, single quotes, True instead of true).
You can always check the validity of your JSON using
https://jsoneditoronline.org/
OR
import json
a= {
u'Status': {
u'display_name': u'Status',
u'is_updatable': True,
u'type': u'TEXT',
u'val': u'Paying',
u'source': u'API'
}, u'Create Date': {
u'display_name': u'Create Date',
u'is_updatable': True,
u'type': u'DATE',
u'val': u'2017-09-20',
u'source': u'API'
}, u'Total # of Projects': {
u'display_name': u'Total # of Projects',
u'is_updatable': True,
u'type': u'TEXT',
u'val': u'53',
u'source': u'Pixel'
}
}
b = json.dumps(a)  # Python object to JSON string
print (b)
c = json.loads(b)  # JSON string back to Python object
print (c)
Note:
json.loads -> returns a Python object from a string representing a JSON object.
json.dumps -> returns a string representing a JSON object from a Python object.
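If what you actually have is a string holding the printed form of a Python dict, one option is ast.literal_eval from the standard library, which parses Python literals without executing code. The sketch below assumes the string comes from a trusted source and contains only literals; raw is a shortened, hypothetical version of the text in the question.

```python
import ast
import json

# Shortened stand-in for the pasted text: a Python dict repr, not JSON
raw = "{u'Status': {u'display_name': u'Status', u'is_updatable': True, u'val': u'Paying'}}"

data = ast.literal_eval(raw)   # parse the Python literal into a dict
as_json = json.dumps(data)     # now it can be serialised as real JSON
round_tripped = json.loads(as_json)
```

After this, as_json uses double quotes and true, so json.loads accepts it.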
I'm having a really hard time getting a track id from the Spotify search endpoint.
It is deeply nested.
So, if I do this:
results = sp.search(q='artist:' + 'Nirvana' + ' track:' + 'Milk it', type='track')
pprint.pprint(results)
I am able to get:
{u'tracks': {u'href': u'https://api.spotify.com/v1/search?query=artist%3ANirvana+track%3AMilk+it&type=track&offset=0&limit=10',
u'items': [{u'album': {u'album_type': u'album',
u'artists': [{u'external_urls': {u'spotify': u'https://open.spotify.com/artist/6olE6TJLqED3rqDCT0FyPh'},
u'href': u'https://api.spotify.com/v1/artists/6olE6TJLqED3rqDCT0FyPh',
u'id': u'6olE6TJLqED3rqDCT0FyPh',
u'name': u'Nirvana',
u'type': u'artist',
u'uri': u'spotify:artist:6olE6TJLqED3rqDCT0FyPh'}],
u'available_markets': [u'CA',
u'MX',
u'US'],
u'external_urls': {u'spotify': u'https://open.spotify.com/album/7wOOA7l306K8HfBKfPoafr'},
u'href': u'https://api.spotify.com/v1/albums/7wOOA7l306K8HfBKfPoafr',
u'id': u'7wOOA7l306K8HfBKfPoafr',
u'images': [{u'height': 640,
u'url': u'https://i.scdn.co/image/3dd2699f0fcf661c35d45745313b64e50f63f91f',
u'width': 640},
{u'height': 300,
u'url': u'https://i.scdn.co/image/a6c604a82d274e4728a8660603ef31ea35e9e1bd',
u'width': 300},
{u'height': 64,
u'url': u'https://i.scdn.co/image/f52728b0ecf5b6bfc998dfd0f6e5b6b5cdfe73f1',
u'width': 64}],
u'name': u'In Utero - 20th Anniversary Remaster',
u'type': u'album',
u'uri': u'spotify:album:7wOOA7l306K8HfBKfPoafr'},
u'artists': [{u'external_urls': {u'spotify': u'https://open.spotify.com/artist/6olE6TJLqED3rqDCT0FyPh'},
u'href': u'https://api.spotify.com/v1/artists/6olE6TJLqED3rqDCT0FyPh',
u'id': u'6olE6TJLqED3rqDCT0FyPh',
u'name': u'Nirvana',
u'type': u'artist',
u'uri': u'spotify:artist:6olE6TJLqED3rqDCT0FyPh'}],
u'available_markets': [u'CA', u'MX', u'US'],
u'disc_number': 1,
u'duration_ms': 234746,
u'explicit': False,
u'external_ids': {u'isrc': u'USGF19960708'},
u'external_urls': {u'spotify': u'https://open.spotify.com/track/4rtZtLpriBscg7zta3TZxp'},
u'href': u'https://api.spotify.com/v1/tracks/4rtZtLpriBscg7zta3TZxp',
u'id': u'4rtZtLpriBscg7zta3TZxp',
u'name': u'Milk It',
u'popularity': 43,
u'preview_url': None,
u'track_number': 8,
u'type': u'track',
-----> u'uri':u'spotify:track:4rtZtLpriBscg7zta3TZxp'},
QUESTION:
Now, how do I fetch the last 'uri' (u'uri': u'spotify:track:4rtZtLpriBscg7zta3TZxp') under the name 'Milk It'?
>>> print results['tracks']['items'][0]['uri']
spotify:track:4rtZtLpriBscg7zta3TZxp
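Indexing with [0] raises an IndexError when the search returns no items. A slightly more defensive sketch is shown below; first_track_uri is a hypothetical helper name, and sample_results is a trimmed stand-in for the response above.

```python
def first_track_uri(results):
    """Return the URI of the first track in a search response, or None."""
    items = results.get('tracks', {}).get('items', [])
    if not items:
        return None          # nothing matched the search
    return items[0].get('uri')


# Trimmed stand-in for the dict returned by sp.search(...)
sample_results = {
    'tracks': {
        'items': [
            {'name': 'Milk It', 'uri': 'spotify:track:4rtZtLpriBscg7zta3TZxp'},
        ]
    }
}
uri = first_track_uri(sample_results)
no_match = first_track_uri({'tracks': {'items': []}})
```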
Just wondering how you would get usable data from between tags in an HTML response. Assume this is in some HTML; how would I extract the dict to use?
<script type="text/javascript">window._sharedData = {"static_root":"\/\/d36xtkk24g8jdx.cloudfront.net\/bluebar\/a1968ef","platform":{"is_touch":false,"app_platform":"web"},"hostname":"instagram.com","entry_data":{"DesktopPPage":[{"canSeePrerelease":false,"viewer":null,"media":{"caption_is_edited":false,"code":"vF25LwCnL8","date":1415348305.0,"video_url":"http:\/\/videos-h-12.ak.instagram.com\/hphotos-ak-xap1\/10753251_876245142395032_328159772_n.mp4","caption":"2014 season teaser! Just a taste of some of the \ud83d\udd28\ud83d\udd28\ud83d\udd28 that got fumbled on \ud83d\udcf9 this season. Edit dropping fall 2017 #m.wilkie #sturhyssmith #snowboarding #springshred #bdpproteam #turoaparks #turoa #mtruapehu #seasonedit #wouldyouratherfightagoatwithahumanheadorahumanwithagoathead?","secure_video_url":"https:\/\/igcdn-videos-h-12-a.akamaihd.net\/hphotos-ak-xap1\/10753251_876245142395032_328159772_n.mp4","usertags":{"nodes":[]},"comments":{"nodes":[{"text":"Where do I buy tickets to the London premiere? #fanboy","viewer_can_delete":false,"id":"848487057151652334","user":{"username":"jamesbutchernz","profile_pic_url":"https:\/\/instagramimages-a.akamaihd.net\/profiles\/profile_1052126311_75sq_1391324963.jpg"}},{"text":"It's invites only #jamesbutchernz #m.wilkie is choosing too so chances are slim unless your smoking hot with low self esteem.","viewer_can_delete":false,"id":"849353938720944684","user":{"username":"bobeykrebner","profile_pic_url":"https:\/\/igcdn-photos-g-a.akamaihd.net\/hphotos-ak-xpf1\/10584664_742398385822158_510451676_a.jpg"}},{"text":"It's lucky we all know I'm both of those. 
#easy","viewer_can_delete":false,"id":"849403857951420829","user":{"username":"jamesbutchernz","profile_pic_url":"https:\/\/instagramimages-a.akamaihd.net\/profiles\/profile_1052126311_75sq_1391324963.jpg"}},{"text":"Last I heard you were smoking hot and had the self esteem of Kanye West #jamesbutchernz what changed?","viewer_can_delete":false,"id":"849671858500038887","user":{"username":"bobeykrebner","profile_pic_url":"https:\/\/igcdn-photos-g-a.akamaihd.net\/hphotos-ak-xpf1\/10584664_742398385822158_510451676_a.jpg"}},{"text":"You know what they say #bobeykrebner. Treat yourself like Kayne treats Kayne.","viewer_can_delete":false,"id":"849966794608898266","user":{"username":"jamesbutchernz","profile_pic_url":"https:\/\/instagramimages-a.akamaihd.net\/profiles\/profile_1052126311_75sq_1391324963.jpg"}}]},"shared_by_author":true,"likes":{"count":41,"viewer_has_liked":false,"nodes":[{"user":{"username":"claytonbenson","profile_pic_url":"https:\/\/instagramimages-a.akamaihd.net\/profiles\/profile_52633025_75sq_1359351765.jpg"}},{"user":{"username":"snowrev","profile_pic_url":"https:\/\/igcdn-photos-d-a.akamaihd.net\/hphotos-ak-xaf1\/10735284_1474262932842435_1018554144_a.jpg"}},{"user":{"username":"shayning_","profile_pic_url":"https:\/\/igcdn-photos-f-a.akamaihd.net\/hphotos-ak-xaf1\/10817775_319647074907693_836092401_a.jpg"}},{"user":{"username":"paused_future","profile_pic_url":"https:\/\/igcdn-photos-f-a.akamaihd.net\/hphotos-ak-xpa1\/10809941_1580815445475533_469492417_a.jpg"}},{"user":{"username":"kris_tayl0r","profile_pic_url":"https:\/\/igcdn-photos-e-a.akamaihd.net\/hphotos-ak-xaf1\/10802916_384369668395220_1244229274_a.jpg"}},{"user":{"username":"crazyshuz","profile_pic_url":"https:\/\/igcdn-photos-h-a.akamaihd.net\/hphotos-ak-xfp1\/10787707_905860216092359_425635869_a.jpg"}},{"user":{"username":"titstatertots","profile_pic_url":"https:\/\/igcdn-photos-b-a.akamaihd.net\/hphotos-ak-xpf1\/10554089_855164584513369_706239607_a.jpg"}}]},"owner":{"username":"bobe
ykrebner","requested_by_viewer":false,"followed_by_viewer":false,"profile_pic_url":"https:\/\/igcdn-photos-g-a.akamaihd.net\/hphotos-ak-xpf1\/10584664_742398385822158_510451676_a.jpg","has_blocked_viewer":false,"id":"1459690667","is_private":false},"is_video":true,"id":"848325528968131324","display_src":"http:\/\/photos-e.ak.instagram.com\/hphotos-ak-xfp1\/10748245_307748359428196_942078105_n.jpg"},"__get_params":{},"staticRoot":"\/\/d36xtkk24g8jdx.cloudfront.net\/bluebar\/a1968ef","__query_string":"?","prerelease":false,"__path":"\/p\/vF25LwCnL8\/","shortcode":"vF25LwCnL8"}]},"country_code":"AU","config":{"viewer":null,"csrf_token":"0bfa16595bdacb5bcfcb94441d0fb7ab"}};</script>
I basically want to know how to get the usable data from within the script tags but after the "window._sharedData =" line.
You'd use a combination of HTML parsing and text manipulation.
BeautifulSoup would help with the parsing, after which you can extract the <script> tag text content and split out the JavaScript object definition:
from bs4 import BeautifulSoup
import re
soup = BeautifulSoup(html_page_source)
script_tag = soup.find('script', text=re.compile(r'window\._sharedData'))
shared_data = script_tag.string.partition('=')[-1].strip(' ;')
The last line takes the string contents of the tag, splits off everything up to the first = then removes all leading and trailing whitespace and semicolons.
Demo, including loading the resulting string as JSON:
>>> from bs4 import BeautifulSoup
>>> import re
>>> soup = BeautifulSoup('''\
... <script type="text/javascript">window._sharedData = {"static_root":"\/\/d36xtkk24g8jdx.cloudfront.net\/bluebar\/a1968ef","platform":{"is_touch":false,"app_platform":"web"},"hostname":"instagram.com","entry_data":{"DesktopPPage":[{"canSeePrerelease":false,"viewer":null,"media":{"caption_is_edited":false,"code":"vF25LwCnL8","date":1415348305.0,"video_url":"http:\/\/videos-h-12.ak.instagram.com\/hphotos-ak-xap1\/10753251_876245142395032_328159772_n.mp4","caption":"2014 season teaser! Just a taste of some of the \ud83d\udd28\ud83d\udd28\ud83d\udd28 that got fumbled on \ud83d\udcf9 this season. Edit dropping fall 2017 #m.wilkie #sturhyssmith #snowboarding #springshred #bdpproteam #turoaparks #turoa #mtruapehu #seasonedit #wouldyouratherfightagoatwithahumanheadorahumanwithagoathead?","secure_video_url":"https:\/\/igcdn-videos-h-12-a.akamaihd.net\/hphotos-ak-xap1\/10753251_876245142395032_328159772_n.mp4","usertags":{"nodes":[]},"comments":{"nodes":[{"text":"Where do I buy tickets to the London premiere? #fanboy","viewer_can_delete":false,"id":"848487057151652334","user":{"username":"jamesbutchernz","profile_pic_url":"https:\/\/instagramimages-a.akamaihd.net\/profiles\/profile_1052126311_75sq_1391324963.jpg"}},{"text":"It's invites only #jamesbutchernz #m.wilkie is choosing too so chances are slim unless your smoking hot with low self esteem.","viewer_can_delete":false,"id":"849353938720944684","user":{"username":"bobeykrebner","profile_pic_url":"https:\/\/igcdn-photos-g-a.akamaihd.net\/hphotos-ak-xpf1\/10584664_742398385822158_510451676_a.jpg"}},{"text":"It's lucky we all know I'm both of those. 
#easy","viewer_can_delete":false,"id":"849403857951420829","user":{"username":"jamesbutchernz","profile_pic_url":"https:\/\/instagramimages-a.akamaihd.net\/profiles\/profile_1052126311_75sq_1391324963.jpg"}},{"text":"Last I heard you were smoking hot and had the self esteem of Kanye West #jamesbutchernz what changed?","viewer_can_delete":false,"id":"849671858500038887","user":{"username":"bobeykrebner","profile_pic_url":"https:\/\/igcdn-photos-g-a.akamaihd.net\/hphotos-ak-xpf1\/10584664_742398385822158_510451676_a.jpg"}},{"text":"You know what they say #bobeykrebner. Treat yourself like Kayne treats Kayne.","viewer_can_delete":false,"id":"849966794608898266","user":{"username":"jamesbutchernz","profile_pic_url":"https:\/\/instagramimages-a.akamaihd.net\/profiles\/profile_1052126311_75sq_1391324963.jpg"}}]},"shared_by_author":true,"likes":{"count":41,"viewer_has_liked":false,"nodes":[{"user":{"username":"claytonbenson","profile_pic_url":"https:\/\/instagramimages-a.akamaihd.net\/profiles\/profile_52633025_75sq_1359351765.jpg"}},{"user":{"username":"snowrev","profile_pic_url":"https:\/\/igcdn-photos-d-a.akamaihd.net\/hphotos-ak-xaf1\/10735284_1474262932842435_1018554144_a.jpg"}},{"user":{"username":"shayning_","profile_pic_url":"https:\/\/igcdn-photos-f-a.akamaihd.net\/hphotos-ak-xaf1\/10817775_319647074907693_836092401_a.jpg"}},{"user":{"username":"paused_future","profile_pic_url":"https:\/\/igcdn-photos-f-a.akamaihd.net\/hphotos-ak-xpa1\/10809941_1580815445475533_469492417_a.jpg"}},{"user":{"username":"kris_tayl0r","profile_pic_url":"https:\/\/igcdn-photos-e-a.akamaihd.net\/hphotos-ak-xaf1\/10802916_384369668395220_1244229274_a.jpg"}},{"user":{"username":"crazyshuz","profile_pic_url":"https:\/\/igcdn-photos-h-a.akamaihd.net\/hphotos-ak-xfp1\/10787707_905860216092359_425635869_a.jpg"}},{"user":{"username":"titstatertots","profile_pic_url":"https:\/\/igcdn-photos-b-a.akamaihd.net\/hphotos-ak-xpf1\/10554089_855164584513369_706239607_a.jpg"}}]},"owner":{"username":"bobe
ykrebner","requested_by_viewer":false,"followed_by_viewer":false,"profile_pic_url":"https:\/\/igcdn-photos-g-a.akamaihd.net\/hphotos-ak-xpf1\/10584664_742398385822158_510451676_a.jpg","has_blocked_viewer":false,"id":"1459690667","is_private":false},"is_video":true,"id":"848325528968131324","display_src":"http:\/\/photos-e.ak.instagram.com\/hphotos-ak-xfp1\/10748245_307748359428196_942078105_n.jpg"},"__get_params":{},"staticRoot":"\/\/d36xtkk24g8jdx.cloudfront.net\/bluebar\/a1968ef","__query_string":"?","prerelease":false,"__path":"\/p\/vF25LwCnL8\/","shortcode":"vF25LwCnL8"}]},"country_code":"AU","config":{"viewer":null,"csrf_token":"0bfa16595bdacb5bcfcb94441d0fb7ab"}};</script>
... ''')
>>> script_tag = soup.find('script', text=re.compile(r'window\._sharedData'))
>>> shared_data = script_tag.string.partition('=')[-1].strip(' ;')
>>> import json
>>> result = json.loads(shared_data)
>>> from pprint import pprint
>>> pprint(result)
{u'config': {u'csrf_token': u'0bfa16595bdacb5bcfcb94441d0fb7ab',
u'viewer': None},
u'country_code': u'AU',
u'entry_data': {u'DesktopPPage': [{u'__get_params': {},
u'__path': u'/p/vF25LwCnL8/',
u'__query_string': u'?',
u'canSeePrerelease': False,
u'media': {u'caption': u'2014 season teaser! Just a taste of some of the \U0001f528\U0001f528\U0001f528 that got fumbled on \U0001f4f9 this season. Edit dropping fall 2017 #m.wilkie #sturhyssmith #snowboarding #springshred #bdpproteam #turoaparks #turoa #mtruapehu #seasonedit #wouldyouratherfightagoatwithahumanheadorahumanwithagoathead?',
u'caption_is_edited': False,
u'code': u'vF25LwCnL8',
u'comments': {u'nodes': [{u'id': u'848487057151652334',
u'text': u'Where do I buy tickets to the London premiere? #fanboy',
u'user': {u'profile_pic_url': u'https://instagramimages-a.akamaihd.net/profiles/profile_1052126311_75sq_1391324963.jpg',
u'username': u'jamesbutchernz'},
u'viewer_can_delete': False},
{u'id': u'849353938720944684',
u'text': u"It's invites only #jamesbutchernz #m.wilkie is choosing too so chances are slim unless your smoking hot with low self esteem.",
u'user': {u'profile_pic_url': u'https://igcdn-photos-g-a.akamaihd.net/hphotos-ak-xpf1/10584664_742398385822158_510451676_a.jpg',
u'username': u'bobeykrebner'},
u'viewer_can_delete': False},
{u'id': u'849403857951420829',
u'text': u"It's lucky we all know I'm both of those. #easy",
u'user': {u'profile_pic_url': u'https://instagramimages-a.akamaihd.net/profiles/profile_1052126311_75sq_1391324963.jpg',
u'username': u'jamesbutchernz'},
u'viewer_can_delete': False},
{u'id': u'849671858500038887',
u'text': u'Last I heard you were smoking hot and had the self esteem of Kanye West #jamesbutchernz what changed?',
u'user': {u'profile_pic_url': u'https://igcdn-photos-g-a.akamaihd.net/hphotos-ak-xpf1/10584664_742398385822158_510451676_a.jpg',
u'username': u'bobeykrebner'},
u'viewer_can_delete': False},
{u'id': u'849966794608898266',
u'text': u'You know what they say #bobeykrebner. Treat yourself like Kayne treats Kayne.',
u'user': {u'profile_pic_url': u'https://instagramimages-a.akamaihd.net/profiles/profile_1052126311_75sq_1391324963.jpg',
u'username': u'jamesbutchernz'},
u'viewer_can_delete': False}]},
u'date': 1415348305.0,
u'display_src': u'http://photos-e.ak.instagram.com/hphotos-ak-xfp1/10748245_307748359428196_942078105_n.jpg',
u'id': u'848325528968131324',
u'is_video': True,
u'likes': {u'count': 41,
u'nodes': [{u'user': {u'profile_pic_url': u'https://instagramimages-a.akamaihd.net/profiles/profile_52633025_75sq_1359351765.jpg',
u'username': u'claytonbenson'}},
{u'user': {u'profile_pic_url': u'https://igcdn-photos-d-a.akamaihd.net/hphotos-ak-xaf1/10735284_1474262932842435_1018554144_a.jpg',
u'username': u'snowrev'}},
{u'user': {u'profile_pic_url': u'https://igcdn-photos-f-a.akamaihd.net/hphotos-ak-xaf1/10817775_319647074907693_836092401_a.jpg',
u'username': u'shayning_'}},
{u'user': {u'profile_pic_url': u'https://igcdn-photos-f-a.akamaihd.net/hphotos-ak-xpa1/10809941_1580815445475533_469492417_a.jpg',
u'username': u'paused_future'}},
{u'user': {u'profile_pic_url': u'https://igcdn-photos-e-a.akamaihd.net/hphotos-ak-xaf1/10802916_384369668395220_1244229274_a.jpg',
u'username': u'kris_tayl0r'}},
{u'user': {u'profile_pic_url': u'https://igcdn-photos-h-a.akamaihd.net/hphotos-ak-xfp1/10787707_905860216092359_425635869_a.jpg',
u'username': u'crazyshuz'}},
{u'user': {u'profile_pic_url': u'https://igcdn-photos-b-a.akamaihd.net/hphotos-ak-xpf1/10554089_855164584513369_706239607_a.jpg',
u'username': u'titstatertots'}}],
u'viewer_has_liked': False},
u'owner': {u'followed_by_viewer': False,
u'has_blocked_viewer': False,
u'id': u'1459690667',
u'is_private': False,
u'profile_pic_url': u'https://igcdn-photos-g-a.akamaihd.net/hphotos-ak-xpf1/10584664_742398385822158_510451676_a.jpg',
u'requested_by_viewer': False,
u'username': u'bobeykrebner'},
u'secure_video_url': u'https://igcdn-videos-h-12-a.akamaihd.net/hphotos-ak-xap1/10753251_876245142395032_328159772_n.mp4',
u'shared_by_author': True,
u'usertags': {u'nodes': []},
u'video_url': u'http://videos-h-12.ak.instagram.com/hphotos-ak-xap1/10753251_876245142395032_328159772_n.mp4'},
u'prerelease': False,
u'shortcode': u'vF25LwCnL8',
u'staticRoot': u'//d36xtkk24g8jdx.cloudfront.net/bluebar/a1968ef',
u'viewer': None}]},
u'hostname': u'instagram.com',
u'platform': {u'app_platform': u'web', u'is_touch': False},
u'static_root': u'//d36xtkk24g8jdx.cloudfront.net/bluebar/a1968ef'}
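If installing BeautifulSoup is not an option, a rougher alternative is to pull the object out with a regular expression and hand it to json.loads. This is a different technique from the one above and is only a sketch: it assumes the assignment ends with "};" immediately before the closing script tag, and extract_shared_data is a hypothetical helper name.

```python
import json
import re

# Capture everything between "window._sharedData =" and the ";" before </script>
SHARED_DATA_RE = re.compile(
    r'window\._sharedData\s*=\s*(\{.*?\})\s*;\s*</script>', re.DOTALL)


def extract_shared_data(html):
    """Return the parsed window._sharedData object, or None if it is absent."""
    match = SHARED_DATA_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


# Shortened stand-in for the page source
page = ('<script type="text/javascript">window._sharedData = '
        '{"hostname":"instagram.com","config":{"viewer":null}};</script>')
shared = extract_shared_data(page)
```

A real HTML parser is still the more robust choice when the page layout may change.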
I have consumed a bunch of tweets in a mongodb database. I would like to query these tweets using pymongo. For example, I would like to query for screen_name. However, when I try to do this, python does not return a tweet but a message about pymongo.cursor.Cursor. Here is my code:
import sys
import pymongo
from pymongo import Connection
connection = Connection()
db = connection.test
tweets = db.tweets
list(tweets.find())[:1]
I get a JSON, which looks like this:
{u'_id': ObjectId('51c8878fadb68a0b96c6ebf1'),
u'contributors': None,
u'coordinates': {u'coordinates': [-75.24692983, 43.06183036],
u'type': u'Point'},
u'created_at': u'Mon Jun 24 17:53:19 +0000 2013',
u'entities': {u'hashtags': [],
u'symbols': [],
u'urls': [],
u'user_mentions': []},
u'favorite_count': 0,
u'favorited': False,
u'filter_level': u'medium',
u'geo': {u'coordinates': [43.06183036, -75.24692983], u'type': u'Point'},
u'id': 349223725943623680L,
u'id_str': u'349223725943623680',
u'in_reply_to_screen_name': None,
u'in_reply_to_status_id': None,
u'in_reply_to_status_id_str': None,
u'in_reply_to_user_id': None,
u'in_reply_to_user_id_str': None,
u'lang': u'en',
u'place': {u'attributes': {},
u'bounding_box': {u'coordinates': [[[-79.76259, 40.477399],
[-79.76259, 45.015865],
[-71.777491, 45.015865],
[-71.777491, 40.477399]]],
u'type': u'Polygon'},
u'country': u'United States',
u'country_code': u'US',
u'full_name': u'New York, US',
u'id': u'94965b2c45386f87',
u'name': u'New York',
u'place_type': u'admin',
u'url': u'http://api.twitter.com/1/geo/id/94965b2c45386f87.json'},
u'retweet_count': 0,
u'retweeted': False,
u'source': u'Twitter for iPhone',
u'text': u'Currently having a heat stroke',
u'truncated': False,
u'user': {u'contributors_enabled': False,
u'created_at': u'Fri Oct 28 02:04:05 +0000 2011',
u'default_profile': False,
u'default_profile_image': False,
u'description': u'young and so mischievious',
u'favourites_count': 1798,
u'follow_request_sent': None,
u'followers_count': 368,
u'following': None,
u'friends_count': 335,
u'geo_enabled': True,
u'id': 399801173,
u'id_str': u'399801173',
u'is_translator': False,
u'lang': u'en',
u'listed_count': 0,
u'location': u'Upstate New York',
u'name': u'Joe Catanzarita',
u'notifications': None,
u'profile_background_color': u'D6640D',
u'profile_background_image_url': u'http://a0.twimg.com/profile_background_images/702001815/f87508e73bbfab8c8c85ebe10b29fcf6.png',
u'profile_background_image_url_https': u'https://si0.twimg.com/profile_background_images/702001815/f87508e73bbfab8c8c85ebe10b29fcf6.png',
u'profile_background_tile': True,
u'profile_banner_url': u'https://pbs.twimg.com/profile_banners/399801173/1367200323',
u'profile_image_url': u'http://a0.twimg.com/profile_images/378800000012256721/d8b5f801fb331de6ead4aed42dc77a46_normal.jpeg',
u'profile_image_url_https': u'https://si0.twimg.com/profile_images/378800000012256721/d8b5f801fb331de6ead4aed42dc77a46_normal.jpeg' ,
u'profile_link_color': u'140DE0',
u'profile_sidebar_border_color': u'FFFFFF',
u'profile_sidebar_fill_color': u'E0F5A6',
u'profile_text_color': u'120212',
u'profile_use_background_image': True,
u'protected': False,
u'screen_name': u'JoeCatanzarita',
u'statuses_count': 6402,
u'time_zone': u'Quito',
u'url': None,
u'utc_offset': -18000,
u'verified': False}}
However, when I try to query for this screen_name, I get:
tweets.find({"screen_name": "JoeCatanzarita"})
<pymongo.cursor.Cursor at 0x52c02f0>
And when I then try to count the number of tweets which have "screen_name": "name", I get:
tweets.find({"screen_name": "name"}).count()
0
Any idea what I am doing wrong/how I can get pymongo to return the tweets I am looking for?
Thanks!
PyMongo's find() method returns a Cursor. To actually execute the query on the server and retrieve results, iterate the cursor with list or a for loop:
for doc in tweets.find({'screen_name': 'name'}):
    print(doc)
# Or:
docs = list(tweets.find({'screen_name': 'name'}))
If tweets.find({"screen_name": "name"}).count() returns 0, it means no documents match your query.
Edit: now that you've posted an example document, I see you want to query like:
list(tweets.find({'user.screen_name': 'name'}))
... since the screen_name field is embedded in the user sub-document.
I think the problem is that "screen_name" is inside a sub-document. If you can provide the document structure, I may be able to help you.
OK, now I see what your problem is:
If you look carefully at your document, you will notice that "screen_name" is inside the subdocument user, so to access it all you have to do is the following:
tweets.find({"user.screen_name": "JoeCatanzarita"}) #for example.
Whenever the element you are trying to find is inside a subdocument, as in this situation, or inside an array, always use this dot-notation syntax.
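To see why the top-level query matches nothing while the dotted one does, here is a small pure-Python sketch of how a dotted path walks into nested dicts. It only illustrates the idea and is not how MongoDB implements matching (the server also handles arrays, for instance); get_by_dotted_path is a hypothetical helper and tweet is a trimmed stand-in for the stored document.

```python
def get_by_dotted_path(document, path):
    """Follow a 'a.b.c' style path through nested dicts; None if any key is missing."""
    current = document
    for key in path.split('.'):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Trimmed stand-in for one stored tweet
tweet = {
    'text': 'Currently having a heat stroke',
    'user': {'screen_name': 'JoeCatanzarita'},
}
top_level = get_by_dotted_path(tweet, 'screen_name')     # key does not exist at the top level
nested = get_by_dotted_path(tweet, 'user.screen_name')   # found inside the 'user' sub-document
```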
I had this same problem with a collection.find() call.
I checked the type of each returned item and it is a Python dict, so I iterated over the result, even though there was only one item, and it's working like a charm.
myResult = db.find({}, {<!-- blah blah blah for the fields you want -->}).sort("_id", 1).limit(1)
for item in myResult:
    print item
I know this was ages ago, but I spent some time searching for this and couldn't find an easy explanation.
Hope this helps.