Extract urls from elements in a list - python

I have a list json_response containing Twitter data, including image URLs. I am trying to extract the URL from the ['includes']['media'] object. However, the majority of elements in the list do not have ['media'], which I believe causes the loop to fail. Running the code, I get KeyError: 'media', even though I thought setting row['image_url'] = None in the loop would account for list elements without ['media'].
I have provided a sample of json_response below. The actual URLs have been replaced with placeholders because of Stack Overflow's restrictions on posting URLs.
print(json.dumps(json_response[10:13], indent=4, sort_keys=True)) # look at json_response object.
[
{
"data": [
{
"author_id": "125700232",
"created_at": "2021-12-31T07:13:04.000Z",
"id": "1476813641265549317",
"text": "You can\u2019t be a democrat or a liberal or progressive & besties with racists who radicalize people like this. \n\nI\u2019ve never publicly named him but since he blocked me years ago for holding him accountable, maybe I will."
},
{
"author_id": "800464894361382912",
"created_at": "2021-12-27T12:17:25.000Z",
"id": "1475440681258737673",
"text": "For $9 an hour, I was told to kill myself over a confusing sale sign, I'd been called worthless and stupid weekly. I've had things thrown at me, been spat on. A customer blocked me from coming on a bus so I couldn't go home. If people were kind to begin with, more would \"show up\""
},
{
"attachments": {
"media_keys": [
"3_1474448924249407490"
]
},
"author_id": "1390363055150845959",
"created_at": "2021-12-24T18:36:32.000Z",
"id": "1474448926891782149",
"text": "Blocked by China boy. Spy banging snowflake #RepSwalwell"
},
{
"author_id": "196428643",
"created_at": "2021-12-21T22:22:15.000Z",
"id": "1473418564430229505",
"text": "I replied to an Eric Swalwell lame tweet with a Fang Fang reference yesterday and he blocked me. Then suddenly my account was hacked and my account linked email was changed from a Manhattan ISP. I don't think it was a coincidence."
},
{
"attachments": {
"media_keys": [
"3_1462187451292819458",
"3_1462187494385065994"
]
},
"author_id": "25871358",
"created_at": "2021-11-20T22:40:05.000Z",
"id": "1462189029919805450",
"text": "Pearl clutch elsewhere about #RepSwalwell unfollowing you, when I told you you were a lying gaslighting jackwagon and you blocked me. Truth hurts."
},
{
"author_id": "1251510910390337536",
"created_at": "2021-10-30T01:40:32.000Z",
"id": "1454261909759406086",
"text": "Eric Swalwell blocked me tonight \ud83d\ude02"
},
{
"author_id": "15790644",
"created_at": "2021-07-23T20:11:58.000Z",
"id": "1418665211221925889",
"text": "Twitter won't allow me to follow anyone.\n\nAlso, tried to retweet Eric Swalwell's tweet and it blocked me. And other tweets...\n\nGuess I'm like a mosquito buzzing around the head of Jack."
},
{
"attachments": {
"media_keys": [
"3_1411309517317586945"
]
},
"author_id": "107575508",
"created_at": "2021-07-03T13:03:05.000Z",
"id": "1411309521251745796",
"text": "This tweet was blocked by Twitter for retweets and quotes. In summary a team member of Eric Swalwell illegally entered Mo Brooks home to serve papers and assaulted Brooks wife. There is security camera footage. Papers being serve claim Brooks caused Jan. 6 \u2018riot\u2019."
},
{
"author_id": "26182604",
"created_at": "2021-06-07T03:58:01.000Z",
"id": "1401750267121524738",
"text": "I can't # him because my words hurt his feeling and he blocked me. LOL! CNN : Democratic Rep. Eric Swalwell's suit seeks to hold Brooks, ex-President Trump and others liable for the January 6 attack."
},
{
"author_id": "258617217",
"created_at": "2021-04-06T20:37:13.000Z",
"id": "1379533675772186630",
"text": "George Webb Blocked me.\nI guess because I pointed out that in one of his books he connected a lady FANG FANG from Wuhan as the same Fang Fang CCP agent that tried to seduce Eric Swalwell, \n\n2 different ladies.\nThat wasn't nice Mr. Webb"
}
],
"includes": {
"media": [
{
"media_key": "3_1474448924249407490",
"type": "photo",
"url": "url here"
},
{
"media_key": "3_1462187451292819458",
"type": "photo",
"url": "url here"
},
{
"media_key": "3_1462187494385065994",
"type": "photo",
"url": "url here"
},
{
"media_key": "3_1411309517317586945",
"type": "photo",
"url": "url here"
}
],
"users": [
{
"created_at": "2010-03-23T15:50:53.000Z",
"description": "founder of Melanated Mingle|licensed psychotherapist|psych prof|Latin\u00e8 Ph.D|chingona",
"id": "125700232",
"name": "Dr. Lisa Xochitl Vallejos, Ph.D., LPC",
"username": "realdocv"
},
{
"created_at": "2016-11-20T22:24:37.000Z",
"description": "31, she/her",
"id": "800464894361382912",
"name": "Fluke \ud83d\udc99",
"username": "flukefancy"
},
{
"created_at": "2021-05-06T17:49:30.000Z",
"description": "",
"id": "1390363055150845959",
"name": "kbark",
"username": "kbark23500486"
},
{
"created_at": "2010-09-29T02:29:17.000Z",
"description": "",
"id": "196428643",
"name": "Bird Dog \u00d3 S\u00failleabh\u00e1in",
"username": "AntiqueSully"
},
{
"created_at": "2009-03-22T20:12:14.000Z",
"description": "I don't know what a Hoosier is, either.",
"id": "25871358",
"name": "Misty",
"username": "mialynneb"
},
{
"created_at": "2020-04-18T14:00:37.000Z",
"description": "#Rachlsbored #KittyKattKeee #Rebeccahansonn #BasedHabits #bxtchbabyy #Sexxcel #lizhomlesvoice #psychoness_xo #VOLTRON4444 #jessiprincey #MartinaMarkota: CEO",
"id": "1251510910390337536",
"name": "\u1587\u15e9\u15ea\u15ea\u01b3\u26a1\ufe0f\u26a1\ufe0f",
"username": "_RadicalReality"
},
{
"created_at": "2008-08-09T17:27:58.000Z",
"description": "Really. There are conservatives in New York! #MAGA",
"id": "15790644",
"name": "SueDinNY",
"username": "SueDinNY"
},
{
"created_at": "2010-01-23T01:24:09.000Z",
"description": "Army Vet, served in M.I. Unit. If you disagree with me, it\u2019s because you haven\u2019t seen what I\u2019ve seen. Proud Supporter of #realDonaldTrump #maga",
"id": "107575508",
"name": "Margaret Briem",
"username": "LivethLifeULove"
},
{
"created_at": "2009-03-24T05:09:49.000Z",
"description": "Dad. IT guy. Linux geek. TTRPG fan. Critter. Drone Pilot. Sometimes I go outside. (he/him)",
"id": "26182604",
"name": "Wayne Edgar",
"username": "zerovertex"
},
{
"created_at": "2011-02-28T03:06:07.000Z",
"description": "Author-Film Maker-Researcher-Artist-Peace Seeker\n",
"id": "258617217",
"name": "F\u04e8\u042fBIDD\u03a3\u041f FI\u1102\u03a3\u01a7 \u01acV",
"username": "TMV_intel"
}
]
},
"meta": {
"newest_id": "1476813641265549317",
"next_token": "b26v89c19zqg8o3fosqt4kos8ff8dfq3on3e08qcqvngd",
"oldest_id": "1379533675772186630",
"result_count": 10
}
},
{
"data": [
{
"attachments": {
"media_keys": [
"3_1311261760222101505"
]
},
"author_id": "395236271",
"created_at": "2020-09-30T11:10:19.000Z",
"id": "1311262093992132610",
"text": "Steve Knight didn't like it when I pointed out that his \"Trump didn't say nazis are fine people\" run against all visual evidence we have on the Unite the Right rally, and so the \"Free Speech Champion\" blocked me.\n\nFucking snowflake."
}
],
"includes": {
"media": [
{
"media_key": "3_1311261760222101505",
"type": "photo",
"url": "url here"
}
],
"users": [
{
"created_at": "2011-10-21T10:49:03.000Z",
"description": "Harmless but a bit insane.",
"id": "395236271",
"name": "Lu\u00eds Dias",
"username": "lmldias"
}
]
},
"meta": {
"newest_id": "1311262093992132610",
"next_token": "b26v89c19zqg8o3fn0mljncu0v5ci7xlbm3agsunyikxp",
"oldest_id": "1311262093992132610",
"result_count": 1
}
},
{
"data": [
{
"attachments": {
"media_keys": [
"3_1471578541368221703"
]
},
"author_id": "1442527297773326344",
"created_at": "2021-12-16T20:30:40.000Z",
"id": "1471578543385677830",
"text": "Hahaha #NancyPelosi #SpeakerPelosi staff has blocked me from tweeting to them! Why are they so afraid of the truth?"
},
{
"attachments": {
"media_keys": [
"3_1469091211038404613"
]
},
"author_id": "864826449601019905",
"created_at": "2021-12-09T23:48:02.000Z",
"id": "1469091500264935424",
"text": "I'm blocked by Elizabeth Warren, Nancy Pelosi and now Karlyn. Interesting pattern. ;)"
},
{
"attachments": {
"media_keys": [
"3_1465403503354990595"
]
},
"author_id": "1083551928821448710",
"created_at": "2021-11-29T19:33:16.000Z",
"id": "1465403505045393418",
"text": "I just realized that I've been blocked by Nancy Pelosi's daughter \ud83d\ude02"
},
{
"attachments": {
"media_keys": [
"3_1462066354568273930"
]
},
"author_id": "844569319409405958",
"created_at": "2021-11-20T14:32:40.000Z",
"id": "1462066368921100293",
"text": "Tried to tag Drunk Nancy Pelosi. She Blocked me. Or shall I say her assistant blocked me. LMAO. They don\u2019t want the truth out. I don\u2019t care who she is. I front her out."
},
{
"author_id": "3921070047",
"created_at": "2021-11-03T04:26:18.000Z",
"id": "1455753176322347009",
"text": "\"Nancy Pelosi is not going to change your lifestyle, I can, but you've blocked me and hald of mules...\""
},
{
"author_id": "345120618",
"created_at": "2021-10-03T02:04:28.000Z",
"id": "1444483458164670467",
"text": "Blocked by Nancy Pelosi? I'm jealous."
},
{
"author_id": "1227381497277095937",
"created_at": "2021-10-03T00:49:28.000Z",
"id": "1444464586506350595",
"text": "Gosh. I can only dream of being blocked by a trash receptacle like Nancy Pelosi. What a badge of honor \ud83c\udf96 it would be. I'll just have to keep trying.\ud83d\ude0e\ud83c\uddfa\ud83c\uddf8"
},
{
"attachments": {
"media_keys": [
"3_1444377454018138115"
]
},
"author_id": "918169011602386944",
"created_at": "2021-10-02T19:03:40.000Z",
"id": "1444377560385658880",
"text": "Anybody else blocked by Nancy Pelosi? \n\nI thought it was illegal for government people to block us?"
},
{
"author_id": "783746267222462464",
"created_at": "2021-08-09T21:23:09.000Z",
"id": "1424843718042001411",
"text": "\" This page has been blocked by Microsoft Edge\"\n--\nSidney Powell Discusses the FBI & Nancy Pelosi\u2019s Role In The January 6th FALSE FLAG"
},
{
"author_id": "158064102",
"created_at": "2021-08-08T12:52:20.000Z",
"id": "1424352780777508864",
"text": "Nancy Pelosi's daughter blocked me?? sweet old little me!!"
},
{
"author_id": "9484732",
"created_at": "2021-08-02T18:30:09.000Z",
"id": "1422263466396733441",
"text": "Democratic leadership didn't have the votes for an extension of the eviction moratorium and were blocked by Republicans from attempting to get around their internal divisions by passing a shorter-term extension through Oct. 18. via #siobhanehughes"
},
{
"author_id": "1381073800624660484",
"created_at": "2021-07-31T22:47:32.000Z",
"id": "1421603462643535873",
"text": "Joe Biden\n> is spending a lot on defense that could be used to create a debt free design\n> is hiding behind Nancy Pelosi and other women in his life \n> can cancel student debt\n> if he's being blocked by the DoD then he actually can't do it"
},
{
"author_id": "1278119139601715201",
"created_at": "2021-07-27T18:41:31.000Z",
"id": "1420091998590099458",
"text": "What?? I got blocked by her because I said victim blaming about Elise stefanik blaming Nancy pelosi for Jan 6th. I went to answer her and I\u2019m blocked. Ppl are seriously reactionary. Geez!"
},
{
"author_id": "1367196589291364357",
"created_at": "2021-07-16T13:58:01.000Z",
"id": "1416034388400353281",
"text": "They were blocked by Nancy Pelosi"
},
{
"author_id": "3001635726",
"created_at": "2021-07-09T04:06:48.000Z",
"id": "1413348887855828997",
"text": "Blocked by Nancy Pelosi who then staged her laptop to be stolen"
},
{
"author_id": "19845473",
"created_at": "2021-07-02T03:54:13.000Z",
"id": "1410809005258256393",
"text": "Fox News #ChadPergram blocked me. Don't worry he didn't fail to ask Nancy Pelosi about 49ers. News."
},
{
"author_id": "979513121541967873",
"created_at": "2021-06-01T01:37:11.000Z",
"id": "1399540499552378881",
"text": "Unarmed Ashli Babbitt... Behind doors that were blocked by furniture.... what threat did she pose\u2049\ufe0f\nZero.... Zero... Zero Threat\u203c\ufe0f A scared, slimy POS backed by Nancy Pelosi took her life & has been protected\u203c\ufe0f"
},
{
"author_id": "1394830598087249924",
"created_at": "2021-05-31T17:56:40.000Z",
"id": "1399424603605323783",
"text": "Nancy Pelosi blocked me. Badge of honor"
},
{
"author_id": "969989169186557953",
"created_at": "2021-05-21T11:15:17.000Z",
"id": "1395699716659286018",
"text": "Nancy Pelosi\u2019s daughter blocked me on Twitter"
},
{
"attachments": {
"media_keys": [
"3_1385809440830484481"
]
},
"author_id": "803311702032850944",
"created_at": "2021-04-24T04:14:52.000Z",
"id": "1385809443015831557",
"text": "Just realized that big #danrodimer blocked me. How can he stand up to Nancy pelosi when he can't even stand up to me posting his old campaign video? #txlege #TXpolitics "
},
{
"author_id": "1213210549732855808",
"created_at": "2021-04-11T08:26:15.000Z",
"id": "1381161660593758212",
"text": "thinking about the fact that on my old account I was blocked by Nancy Pelosi's daughter"
},
{
"author_id": "2836412739",
"created_at": "2021-03-29T21:30:42.000Z",
"id": "1376648033430044679",
"text": "Lol, corrupt scumbag Nancy Pelosi blocked me. #Corruption She doesn\u2019t want her sleepy followers to see the truth."
},
{
"attachments": {
"media_keys": [
"3_1373636857561513987"
]
},
"author_id": "969989169186557953",
"created_at": "2021-03-21T14:05:23.000Z",
"id": "1373636860635983873",
"text": "Nancy Pelosi\u2019s daughter blocked me too. I honestly need to make a Hall of Fame for those who have blocked me"
},
{
"author_id": "1169798579768180736",
"created_at": "2021-03-18T00:50:21.000Z",
"id": "1372349621033443329",
"text": "FellowAMERICANS #BlackLivesMatter #NAACP_LDF #African #Muslim We #UMMAABroadcasting BLOCKED_by #Facebook #Gmail We_DEMAND #HumanRights of Work_Class(80% #USA One_BillionAfrican #Blacks 2.5Billion #Muslims )& #JoeBiden #KamalaHarris #NancyPelosi #POTUS #VP #SpeakerPelosi MUST_ACT"
}
],
"includes": {
"media": [
{
"media_key": "3_1471578541368221703",
"type": "photo",
"url": "url here"
},
{
"media_key": "3_1469091211038404613",
"type": "photo",
"url": "url here"
},
{
"media_key": "3_1465403503354990595",
"type": "photo",
"url": "url here"
},
{
"media_key": "3_1462066354568273930",
"type": "photo",
"url": "url here"
},
{
"media_key": "3_1444377454018138115",
"type": "photo",
"url": "url here"
},
{
"media_key": "3_1385809440830484481",
"type": "photo",
"url": "url here"
},
{
"media_key": "3_1373636857561513987",
"type": "photo",
"url": "url here"
}
],
"users": [
{
"created_at": "2021-09-27T16:31:37.000Z",
"description": "Don't Tread On Me! Trump 2024. Patriot, Anti-Socialist, Pro-1st & 2nd Amendment. Pro-FREEDOM. I AM MAGA! #IamMAGA\nMelting Snowflake Brains with my Salty Tweets!",
"id": "1442527297773326344",
"name": "Patriot USA \ud83c\uddfa\ud83c\uddf8",
"username": "I_am_MAGA_USA"
},
{
"created_at": "2017-05-17T12:54:28.000Z",
"description": "#LetsGoBrandon #FJB #WitchesForTrump #MagicalPersistence #LibertariansForTrump #PeaceLoveLiberty #PatriotPaganPride #Cult45",
"id": "864826449601019905",
"name": "The\u26e4Tower\u26e4Falls",
"username": "Gwenhwyfar7Aine"
},
{
"created_at": "2019-01-11T02:31:26.000Z",
"description": "GETTR - #TheJohnD \n\nGAB - #John_Deplorable",
"id": "1083551928821448710",
"name": "John D \u2022",
"username": "RedWingGrips"
},
{
"created_at": "2015-10-10T19:42:38.000Z",
"description": "\u3134\u3147\u3139 \u3142\u3137\u3145\u314c\u3134\u314c\u3139",
"id": "3921070047",
"name": "\u2728\ud83e\udd88\u2622\ud83d\udd1e",
"username": "Dystar924"
},
{
"created_at": "2011-07-30T02:54:55.000Z",
"description": "Living the good life in sunny Scottsdale, Arizona.",
"id": "345120618",
"name": "Bill Deegan",
"username": "RealBillDeegan"
},
{
"created_at": "2017-10-11T17:38:45.000Z",
"description": "Don't most of us rely on a single strand for happiness?\n\nAfter being a Single Mom & CFO, I was ready to LIVE!\n\nA drunk driver stole that \ud83d\udcaf",
"id": "918169011602386944",
"name": "Caren R \ud83c\uddfa\ud83c\uddf8\ud83c\uddee\ud83c\uddf1\ud83c\uddec\ud83c\udde7",
"username": "BritishCaren"
},
{
"created_at": "2015-01-15T17:32:12.000Z",
"description": "I did stuff in special education. I\u2019ll always defend the public schools. Progressive feminist and registered Democrat since before you were born.",
"id": "2984412230",
"name": "Kay D\u2019Antonio",
"username": "KayDA26"
},
{
"created_at": "2016-10-05T19:10:46.000Z",
"description": "GETTR handle: Murt32_1943 #murt32\n\nForever America First. Always MAGA\n\nAdjectives: Brilliant/Gorgeous \n\nSupports the LGBFJB community\n\nSorry, I don't do DM's.",
"id": "783746267222462464",
"name": "murt32\ud83c\uddfa\ud83c\uddf8 \ud83c\udf40",
"username": "murt32_1943"
},
{
"created_at": "2014-08-04T22:58:39.000Z",
"description": "Finance |Filmmaker\ud83c\udfac| 2A Advocate\ud83e\uddf9| Content Creator \ud83c\udf9e|Political Commentary\ud83d\udce1| Senior Director \u270f|Engineer | Humility is a journey we must all take.",
"id": "2733732880",
"name": "Somebody's Uncle",
"username": "Dariusr0berts"
},
{
"created_at": "2020-01-03T21:28:46.000Z",
"description": "18,Will buy Origami Angel Merch DM me!!!! (He/Him) Private // #SadHammyFan",
"id": "1213210549732855808",
"name": "Mess",
"username": "punk_matthew"
},
{
"created_at": "2014-10-18T19:41:25.000Z",
"description": "Musician, composer, luthier, digital warrior, Patriot, #MAGA\ud83c\uddfa\ud83c\uddf8\ud83c\uddfa\ud83c\uddf8\ud83c\uddfa\ud83c\uddf8 Q, #Trump 2020!, Save the children from the Peds!",
"id": "2836412739",
"name": "Truth Hurts",
"username": "TruthHurtu2"
},
{
"created_at": "2019-09-06T02:26:30.000Z",
"description": "JOURNALIST in MEMPHIS; Our WatsApp & 2Facebooks BLOCKED, by ENEMIES of our US Constitution.",
"id": "1169798579768180736",
"name": "Arshad Khan, UMMAA Broadcasting, Rolla, MO, USA",
"username": "arshad_usa"
}
]
},
"meta": {
"newest_id": "1471578543385677830",
"next_token": "b26v89c19zqg8o3fosqrfh7sqsqc9rs7aukssfoknvuyl",
"oldest_id": "1372349621033443329",
"result_count": 36
}
}
]
Code that should retrieve the URLs from ['includes']['media']:
for each_dict in json_lite:
    row = {}  # empty dict for data
    # 3. loop for user object
    row['image_url'] = None  # assuming user has no image url
    for user in each_dict['includes']['media']:
        # 5. user url
        # check for url of the current user only
        if 'url' in user['url']:
            row['image_url'] = user.get('url')  # if user has url
            break  # break the loop, as url is found
    url_df = url_df.append(row, ignore_index=True)  # append data to empty url_df

Not quite the way you asked for, but you might consider just using regex:
import json
import re
urls = re.findall(r'"url": "([^"]*)"', json.dumps(json_response))
Output:
['url here',
'url here',
'url here',
'url here',
'url here',
'url here',
'url here',
'url here',
'url here',
'url here',
'url here',
'url here']
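The regex relies on json.dumps using its default ": " key separator, so it breaks if the data is ever serialized with compact separators. If you prefer to avoid regex over serialized text, a sketch of the same idea walks the parsed structure and collects every string stored under a "url" key. The function name collect_urls and the tiny sample are illustrative only.

```python
import json


def collect_urls(obj):
    """Recursively collect every string value stored under a "url" key."""
    found = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == "url" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(collect_urls(value))
    elif isinstance(obj, list):
        for item in obj:
            found.extend(collect_urls(item))
    return found


# Tiny hypothetical stand-in for json_response, same shape as the sample above.
json_response = [
    {"meta": {"result_count": 0}},
    {"includes": {"media": [{"media_key": "3_1", "url": "u1"},
                            {"media_key": "3_2", "url": "u2"}]}},
]
print(collect_urls(json_response))  # ['u1', 'u2']
print(json.dumps(collect_urls(json_response)))
```

Unlike the regex, this works on the Python objects directly, so it does not care how the JSON would be formatted.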

The error message KeyError: 'media' indicates that you should check whether each_dict['includes'] contains a 'media' key. You could also use the dict get method to skip those that are missing the 'media' key. Try replacing
for user in each_dict['includes']['media']:
with
for user in each_dict['includes'].get('media',[]):
which should prevent your error.
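Pages that returned no tweets may have no 'includes' key at all, so a slightly more defensive sketch guards both levels with get. It also checks 'url' in media rather than 'url' in user['url'] (the latter tests whether the substring "url" occurs inside the URL string, which happens to be true for the placeholder "url here" but not for real image links). The helper name extract_image_urls and the one-row-per-media-item layout are assumptions; rows are collected as plain dicts because DataFrame.append was deprecated and then removed in pandas 2.0.

```python
def extract_image_urls(json_response):
    """Return one row per media item that has a 'url' (hypothetical helper)."""
    rows = []
    for page in json_response:
        # Pages without results may lack 'includes', and 'includes' may lack 'media'.
        media_items = page.get('includes', {}).get('media', [])
        for media in media_items:
            # Look for the key in the media dict, not a substring of the URL.
            if 'url' in media:
                rows.append({'media_key': media.get('media_key'),
                             'image_url': media['url']})
    return rows


sample = [
    {"meta": {"result_count": 0}},
    {"includes": {"users": []}},
    {"includes": {"media": [{"media_key": "3_1", "type": "photo", "url": "u1"},
                            {"media_key": "3_2", "type": "video"}]}},
]
rows = extract_image_urls(sample)
print(rows)  # [{'media_key': '3_1', 'image_url': 'u1'}]
# With pandas installed, build the frame once: url_df = pd.DataFrame(rows)
```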

Related

Remove list element based on part of a string

I have a long list json_response containing Twitter data. Some of the 293 elements in the list contain no tweets, as indicated by 'result_count': 0, and I want to delete those elements from json_response.
The following should remove all elements containing 'result_count': 0. However, nothing happens when the code is executed.
json_response = [element for element in json_response if element != "'result_count': 0"]
A sample of json_response, where only the second of the four elements contains tweets:
print(json.dumps(json_response[0:4], indent=4, sort_keys=True))
[
{
"meta": {
"next_token": "b26v89c19zqg8o3fo77fw18ex7m9tkxtn5jx8qokz8y2l",
"result_count": 0
}
},
{
"data": [
{
"author_id": "751651375407181824",
"created_at": "2019-12-16T02:10:22.000Z",
"id": "1206396117425852417",
"text": "Tarkanian libel lawsuit against Jacky Rosen, 2016 opponent, blocked by Nevada Supreme Court"
},
{
"author_id": "7568942",
"created_at": "2019-12-15T04:41:00.000Z",
"id": "1206071638166507520",
"text": "Tarkanian libel lawsuit against Jacky Rosen, 2016 opponent, blocked by Nevada Supreme Court Dismissed thanks to NV's anti-SLAPP law"
},
{
"author_id": "2404787642",
"created_at": "2019-12-13T18:40:32.000Z",
"id": "1205558134317568000",
"text": "Tarkanian libel lawsuit against Jacky Rosen, 2016 opponent, blocked by Nevada Supreme Court"
},
{
"author_id": "245630545",
"created_at": "2019-12-13T18:06:29.000Z",
"id": "1205549565513883648",
"text": "Attacks lobbed in the heat of a campaign don't end with the campaign, Part 2: Supreme Court puts an end to Danny Tarkanian's libel lawsuit against Jacky Rosen for ads from a 2016 congressional campaign, also via #RileySnyder:"
},
{
"author_id": "56440142",
"created_at": "2019-12-12T22:26:06.000Z",
"id": "1205252514070839296",
"text": ".#DannyTarkanian libel lawsuit against #SenJackyRosen, 2016 opponent, blocked by Nevada Supreme Court via #RileySnyder\u200b"
},
{
"author_id": "794407888567476224",
"created_at": "2019-12-12T22:08:08.000Z",
"id": "1205247991029755905",
"text": "Tarkanian libel lawsuit against Jacky Rosen, 2016 opponent, blocked by Supreme Court\nVia #RileySnyder\n"
}
],
"includes": {
"users": [
{
"created_at": "2016-07-09T05:37:07.000Z",
"description": "Towanda! from Fried Green Tomatoes",
"id": "751651375407181824",
"name": "Karen Gruber",
"username": "mail4ufromme1"
},
{
"created_at": "2007-07-18T20:09:04.000Z",
"description": "Full-time software engineering manager, part-time educator, constant student, backpacker and disliker of the Oxford comma.",
"id": "7568942",
"name": "Justin Yost",
"username": "justinyost"
},
{
"created_at": "2014-03-22T17:05:36.000Z",
"description": "",
"id": "2404787642",
"name": "James Egan",
"username": "JamesEganLaw"
},
{
"created_at": "2011-02-01T03:39:40.000Z",
"description": "Assistant editor and reporter #TheNVIndy covering statehouse elections and more. Co-host of #nvindyespanol's Cafecito. Email me: michelle#thenvindy.com",
"id": "245630545",
"name": "Michelle Rindels",
"username": "MichelleRindels"
},
{
"created_at": "2009-07-13T17:49:46.000Z",
"description": "Curious about Congress and the beautiful game. Following the Nevada delegation for #TheNVIndy",
"id": "56440142",
"name": "Humberto Sanchez",
"username": "hsanchez128"
},
{
"created_at": "2016-11-04T05:16:14.000Z",
"description": "Nonprofit news outlet reporting on Nevada politics, policy and people since 2017 | Your State. Your News. Your Voice. | ideas#thenvindy.com",
"id": "794407888567476224",
"name": "Nevada Independent",
"username": "TheNVIndy"
}
]
},
"meta": {
"newest_id": "1206396117425852417",
"next_token": "b26v89c19zqg8o3fn0po9zgvw98j7w7sec5wgoh0s0rr1",
"oldest_id": "1205247991029755905",
"result_count": 6
}
},
{
"meta": {
"next_token": "b26v89c19zqg8o3fosns35qj7v5486697crmsdhl6kku5",
"result_count": 0
}
},
{
"meta": {
"next_token": "b26v89c19zqg8o3fo77h5ma6xw9tghoz8z8l6hgq0shod",
"result_count": 0
}
}
]
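Your original comprehension compares each element, which is a dictionary, with the string "'result_count': 0". A dictionary is never equal to a string, so the condition is always true and nothing is removed. Since your input is ultimately just a list of dictionaries, each with a nested 'meta' dictionary holding result_count, you can test the count directly: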
json_response = [element for element in json_response
if element['meta']['result_count'] > 0]
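A quick check of the filter on a cut-down copy of the sample. The get fallbacks are an added safeguard, assuming some pages might lack 'meta' or 'result_count' entirely (the sample never shows that case); such pages are treated as empty and dropped.

```python
json_response = [
    {"meta": {"next_token": "a", "result_count": 0}},
    {"data": [{"id": "1206396117425852417"}],
     "meta": {"next_token": "b", "result_count": 6}},
    {"meta": {"next_token": "c", "result_count": 0}},
]

# Keep only pages that actually returned tweets; a missing 'meta' or
# 'result_count' counts as zero (an assumption, not in the sample).
json_response = [element for element in json_response
                 if element.get('meta', {}).get('result_count', 0) > 0]

print(len(json_response))  # 1
```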

How to access Twitter included data object?

I have extracted the following Twitter data using Tweepy. However, I am not able to fetch data from the includes object. I am specifically trying to fetch the URL and description data. I can see from json_response that both the URL and description data are present.
My data has the following structure:
{
"data": [
{
"attachments": {
"media_keys": [
"3_1376989039262195713"
]
},
"author_id": "964661980551266304",
"created_at": "2021-03-30T20:05:45.000Z",
"id": "1376989044039544836",
"text": "#RichardGrenell I also want to speak out against this FB group who blocked me (after asking me to invite all my friends) for making the point that this recall not be made a MAGA one. \n\nI didn\u2019t stump on the ground for Trump, I did it for my children."
},
{
"attachments": {
"media_keys": [
"3_1376986160963145736",
"3_1376986160988368898",
"3_1376986160963198980",
"3_1376986160954757129"
]
},
"author_id": "1000347213145563136",
"created_at": "2021-03-30T19:54:20.000Z",
"id": "1376986169704071171",
"text": "#Bobbrock8013 #irishson19161 #RandPaul It's ok to question the election of Trump, but if you question Biden's win you are a \"domestic terrorist.\" Does the Biden Admin welcome a discussion of opposing views on policies regarding lockdowns, masks and vaccines? Why is Big Tech censoring conservatives? Fascists censor."
},
{
"attachments": {
"media_keys": [
"3_1376961169450221571"
]
},
"author_id": "328673472",
"created_at": "2021-03-30T18:15:00.000Z",
"id": "1376961171841036291",
"text": "#ByronYork Newsworthy, but Democrats via their minions will likely censor Trump's statement from Twitter, Facebook, CNN, MSNBC, Washington Post, NY Times etc You know our free speech rules now are based on the Democrats' version of what they will ALLOW us Deplorables to say let alone think."
},
{
"author_id": "18774517",
"created_at": "2021-03-30T10:31:58.000Z",
"id": "1376844643837566986",
"text": "RT #BrexitBuster: #EditingMike #LauraHa15799415 I\u2019m old enough to remember when Piers Morgan was Donald J Trump\u2019s number one fanboy. Are yo\u2026"
},
{
"author_id": "52405628",
"created_at": "2021-03-30T10:30:33.000Z",
"id": "1376844286646480899",
"text": "RT #BrexitBuster: #EditingMike #LauraHa15799415 I\u2019m old enough to remember when Piers Morgan was Donald J Trump\u2019s number one fanboy. Are yo\u2026"
},
{
"author_id": "848911132496723969",
"created_at": "2021-03-30T10:30:11.000Z",
"id": "1376844194921250818",
"text": "RT #BrexitBuster: #EditingMike #LauraHa15799415 I\u2019m old enough to remember when Piers Morgan was Donald J Trump\u2019s number one fanboy. Are yo\u2026"
},
{
"attachments": {
"media_keys": [
"3_1376836461601898499",
"3_1376836461614542853"
]
},
"author_id": "848911132496723969",
"created_at": "2021-03-30T09:59:37.000Z",
"id": "1376836504308305921",
"text": "#EditingMike #LauraHa15799415 I\u2019m old enough to remember when Piers Morgan was Donald J Trump\u2019s number one fanboy. Are you?\n\nThen he praised Joe Biden\u2019s speech... until he was offered the chance to pen a vicious hatchet piece for the Daily Mail! Pointing this out earned me a block.\n#shapeshiftingcreep"
},
{
"attachments": {
"media_keys": [
"3_1376821889004363777"
]
},
"author_id": "31308988",
"created_at": "2021-03-30T09:01:34.000Z",
"id": "1376821895811715073",
"text": "A lady sent this to my messenger right before she blocked me because she was mad I typed the names of Trump's sex assault victims"
},
{
"attachments": {
"media_keys": [
"3_1376704749379145731"
]
},
"author_id": "198202008",
"created_at": "2021-03-30T01:16:05.000Z",
"id": "1376704753145643014",
"text": "#moondancer34 #MrCrispyMAGA #lonelymilkshake #EFMoriarty #CBSNews Who is this person who blocked me? A MAGA lover? Guess that\u2019s why. But how ironic that he\u2019s a Trump supporter yet a WA fan when Woody is about as liberal as they get. In fact he donated to Hillary\u2019s campaign so she\u2019d win against Trump. Whatever! \ud83d\ude02\ud83e\udd37\u200d\u2640\ufe0f"
}
],
"includes": {
"media": [
{
"media_key": "3_1376989039262195713",
"type": "photo",
"url": "https://pbs.twimg.com/media/ExwMPFDUYAEHKn0.jpg"
},
{
"media_key": "3_1376986160963145736",
"type": "photo",
"url": "https://pbs.twimg.com/media/ExwJnijWUAgfPlb.jpg"
},
{
"media_key": "3_1376986160988368898",
"type": "photo",
"url": "https://pbs.twimg.com/media/ExwJnipXMAIHmJp.jpg"
},
{
"media_key": "3_1376986160963198980",
"type": "photo",
"url": "https://pbs.twimg.com/media/ExwJnijXIAQ4F_x.jpg"
},
{
"media_key": "3_1376986160954757129",
"type": "photo",
"url": "https://pbs.twimg.com/media/ExwJnihWUAkr8bi.jpg"
},
{
"media_key": "3_1376961169450221571",
"type": "photo",
"url": "https://pbs.twimg.com/media/Exvy416WQAMRlO0.jpg"
},
{
"media_key": "3_1376836461601898499",
"type": "photo",
"url": "https://pbs.twimg.com/media/ExuBd4-WQAMgTTR.jpg"
},
{
"media_key": "3_1376836461614542853",
"type": "photo",
"url": "https://pbs.twimg.com/media/ExuBd5BXMAU2-p_.jpg"
},
{
"media_key": "3_1376821889004363777",
"type": "photo",
"url": "https://pbs.twimg.com/media/Ext0Np0WYAEUBXy.jpg"
},
{
"media_key": "3_1376704749379145731",
"type": "photo",
"url": "https://pbs.twimg.com/media/ExsJrOtWUAMgVxk.jpg"
}
],
"users": [
{
"created_at": "2018-02-17T00:45:13.000Z",
"description": "Congressional Candidate for CA-28 Proud Angeleno/Catholic/Californio by marriage Localist\u2022Centrist\u2022Pragmatist\u2022Realist",
"id": "964661980551266304",
"name": "Beatrice Cardenas",
"username": "RealBetyCardens"
},
{
"created_at": "2018-05-26T12:05:35.000Z",
"description": "Following President Trump .... KAG 2020 \ud83c\uddfa\ud83c\uddf8",
"id": "1000347213145563136",
"name": "Joseph Fong",
"username": "JosephEugeneFo1"
},
{
"created_at": "2011-07-03T20:29:43.000Z",
"description": "Husband, Dad, Granddad, Christian,Army MP Sgt vet, I.U. grad, former banker & retired City Finance Director, Reagan guy. Cancer survivor. \u271d\ufe0f\ud83c\uddfa\ud83c\uddf8",
"id": "328673472",
"name": "Steve B",
"username": "Stevebfrs"
},
{
"created_at": "2009-01-08T19:06:29.000Z",
"description": "a younger Victor Meldrew but interesting - I hope - nice sometimes !",
"id": "18774517",
"name": "NORBET",
"username": "NORBET"
},
{
"created_at": "2009-06-30T14:17:41.000Z",
"description": "Tanglewood and Gretsch",
"id": "52405628",
"name": "FSociety Tom \ud83c\uddea\ud83c\uddfa #FBPE ANTIFA #RESIST #FBPPR #BLM",
"username": "thebdaman"
},
{
"created_at": "2017-04-03T14:52:40.000Z",
"description": "We are the Remain Resistance... popping Brexit bubbles one at a time. Mostly sarcasm, occasionally deadly serious. Love the UK & the EU. Detest racism & Nazis.",
"id": "848911132496723969",
"name": "Brexit Buster",
"username": "BrexitBuster"
},
{
"created_at": "2009-04-15T02:18:58.000Z",
"description": "No DMs !!! \ud83c\udf0a \ud83c\udf0a\nBLM ,Trans lives matter, LGBT \ud83c\udf08\nAlly of all marginalized",
"id": "31308988",
"name": "Stephy Pachuco (Her, She) \ud83c\udf0a\ud83c\udf0a",
"username": "Stephaniespc"
},
{
"created_at": "2010-10-03T16:56:45.000Z",
"description": "How'd you know I was looking at you if you weren't looking at me? \ud83d\udde3Mike Patton \u2615\ufe0fCoffee \ud83d\ude0eWeekends \ud83c\udf0aPolitics \ud83d\ude0dNYC \ud83e\udd96Museum Employee",
"id": "198202008",
"name": "Patti\ud83d\uddfd",
"username": "PattiFromNYC"
}
]
},
"meta": {
"newest_id": "1376989044039544836",
"next_token": "b26v89c19zqg8o3fosqtjm19orv2gber5hh7b0fu7uem5",
"oldest_id": "1376704753145643014",
"result_count": 9
}
}
I can successfully fetch the fields in the data object ('id', 'text', 'created_at', and 'author_id') using the following code. However, the code does not retrieve the 'url' and 'description' data from the includes object, which leaves me with two empty columns.
import csv
import dateutil.parser

# Create file
csvFile = open("data.csv", "a", newline="", encoding='utf-8')
csvWriter = csv.writer(csvFile)
# Create headers for the data
csvWriter.writerow(
    ['author id', 'created_at', 'id', 'tweet', 'bio', 'image_url'])
csvFile.close()

def append_to_csv(json_response, fileName):
    # A counter variable
    counter = 0
    # Open OR create the target CSV file
    csvFile = open(fileName, "a", newline="", encoding='utf-8')
    csvWriter = csv.writer(csvFile)
    # Loop through each tweet
    for tweet in json_response['data']:
        # We will create a variable for each since some of the keys might not exist for some tweets
        # So we will account for that
        # 1. Author ID
        author_id = tweet['author_id']
        # 2. Time created
        created_at = dateutil.parser.parse(tweet['created_at'])
        # 3. Tweet ID
        tweet_id = tweet['id']
        # 4. Tweet text
        text = tweet['text']
        # 5. description
        if ('description' in tweet):
            bio = tweet['users']['description']
        else:
            bio = " "
        # 6. image url
        if ('url' in tweet):
            image_url = tweet['media']['url']
        else:
            image_url = " "
        # Assemble all data in a list
        res = [author_id, created_at, tweet_id, text, bio, image_url]
        # Append the result to the CSV file
        csvWriter.writerow(res)
        counter += 1
    # When done, close the CSV file
    csvFile.close()
    # Print the number of tweets for this iteration
    print("# of Tweets added from this response: ", counter)
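With the fields shown in the sample, the tweet objects under 'data' have no 'description' or 'url' key, so both checks in the loop are always false and the columns stay empty. Those values live in the separate 'includes' object: user bios in includes['users'] (matched on the tweet's 'author_id') and media in includes['media'] (matched through the tweet's attachments['media_keys']). The sketch below builds lookup tables from 'includes' and uses .get() throughout, so tweets without attachments fall back to " " instead of raising KeyError. It assumes the Twitter API v2 response layout shown above and that media.fields=url was requested; build_lookups and extract_bio_and_image are hypothetical helper names.

```python
def build_lookups(json_response):
    """Index the 'includes' objects so each tweet can find its user and media."""
    # .get() avoids KeyError when a response has no 'includes' or no 'media'
    includes = json_response.get("includes", {})
    users = {user["id"]: user for user in includes.get("users", [])}
    media = {item["media_key"]: item for item in includes.get("media", [])}
    return users, media


def extract_bio_and_image(tweet, users, media, missing=" "):
    """Return (bio, image_url) for one tweet, using `missing` when a value is absent."""
    user = users.get(tweet.get("author_id"), {})
    bio = user.get("description", missing)
    image_url = missing
    # Only tweets with attachments carry media_keys; take the first URL found
    for key in tweet.get("attachments", {}).get("media_keys", []):
        url = media.get(key, {}).get("url")
        if url:
            image_url = url
            break
    return bio, image_url
```

Inside append_to_csv, calling users, media = build_lookups(json_response) once before the loop and bio, image_url = extract_bio_and_image(tweet, users, media) inside it would replace steps 5 and 6. As far as I know, only photo media carry a 'url'; videos and GIFs expose 'preview_image_url' instead, so such tweets would still get " ".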

LUIS.ai JSON migrated to Rasa format returns the correct intent but no entities

I migrated the JSON downloaded from my LUIS app to Rasa format and trained a model with the command: python -m rasa_nlu.train -c config_spacy.json
My configuration file looks like this:
{
"path" : "./models",
"data" : "./data/examples/rasa/BookACab.json",
"pipeline" : ["nlp_spacy", "tokenizer_spacy", "intent_featurizer_spacy",
"ner_crf", "ner_synonyms", "intent_classifier_sklearn",
"ner_duckling"]
}
A model was generated from the Rasa-format JSON shown below. When I query this model using
http://localhost:5000/parse?q=book a ride later
the correct, highest-scoring intent for the text I entered and all of its related entities are returned. But when I try another text such as:
http://localhost:5000/parse?q=I want to go ride today 5pm
the returned intent is correct, but its entities array is empty. As the JSON below shows, this utterance also has entities mapped to it, just like the working example.
Is this an issue everyone has with Rasa, or am I making a mistake? Thank you!
{
"rasa_nlu_data": {
"common_examples": [
{
"entities": [
{
"entity": "RideTime",
"value": "later",
"start": 0,
"end": 5
}
],
"intent": "None",
"text": "later"
},
{
"entities": [],
"intent": "ServiceRequestEnquiry",
"text": "wake up"
},
{
"entities": [],
"intent": "ConfirmationNo",
"text": "no not now"
},
{
"entities": [],
"intent": "ConfirmationNo",
"text": "not sure"
},
{
"entities": [],
"intent": "ConfirmationNo",
"text": "no bot"
},
{
"entities": [],
"intent": "ConfirmationNo",
"text": "no goride bot"
},
{
"entities": [
{
"entity": "RideTime",
"value": "later",
"start": 12,
"end": 17
}
],
"intent": "BookCab",
"text": "book a ride later"
},
{
"entities": [
{
"entity": "RideTime",
"value": "now",
"start": 21,
"end": 24
}
],
"intent": "BookCab",
"text": "i want go for a ride now"
},
{
"entities": [
{
"entity": "RideTime",
"value": "today",
"start": 12,
"end": 17
}
],
"intent": "BookCab",
"text": "book a ride today"
},
{
"entities": [
{
"entity": "RideTime",
"value": "today 5pm",
"start": 18,
"end": 27
}
],
"intent": "BookCab",
"text": "I want to go ride today 5pm"
},
{
"entities": [
{
"entity": "RideTime",
"value": "today",
"start": 12,
"end": 17
}
],
"intent": "BookCab",
"text": "book a ride today 5pm"
},
{
"entities": [
{
"entity": "RideTime",
"value": "later",
"start": 13,
"end": 18
}
],
"intent": "BookCab",
"text": "book shuttle later"
},
{
"entities": [
{
"entity": "RideTime",
"value": "now",
"start": 15,
"end": 18
}
],
"intent": "None",
"text": "i want to book now"
},
{
"entities": [
{
"entity": "RideTime",
"value": "booknow",
"start": 10,
"end": 17
}
],
"intent": "None",
"text": "i want to booknow"
},
{
"entities": [
{
"entity": "RideTime",
"value": "book later",
"start": 10,
"end": 20
}
],
"intent": "None",
"text": "i want to book later"
}
],
"regex_features": []
}
}
It would be helpful if you could include the pipeline you are using with Rasa; you can find it in your configuration file. Assuming you haven't changed the default pipeline in config_spacy.json, you're using ner_crf for entity recognition.
It's very likely that, because of library differences, Rasa simply requires more training data than LUIS did. Qualitatively, the MITIE pipeline generally requires less training data, but the trade-off is that it takes longer to train.
So the basic answer to your question is: if you want to use ner_crf, you need to increase the amount of training data you provide for entity recognition.
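Before adding more examples, it can help to check how many annotated examples each entity has and whether each annotation's start/end offsets actually cover its value. A minimal sketch, assuming the rasa_nlu_data/common_examples layout from the question; check_rasa_examples is a hypothetical helper, not part of Rasa. A mismatch is not always an error: with ner_synonyms the value may intentionally differ from the matched text, so treat the output as a list of things to review.

```python
from collections import Counter


def check_rasa_examples(training_data):
    """Count annotations per entity and collect spans whose text differs from the value."""
    counts = Counter()
    mismatches = []
    for example in training_data["rasa_nlu_data"]["common_examples"]:
        text = example["text"]
        for ent in example.get("entities", []):
            counts[ent["entity"]] += 1
            # Slice the utterance with the annotated character offsets
            span = text[ent["start"]:ent["end"]]
            # May be intentional when ner_synonyms maps the span to another value
            if span != ent["value"]:
                mismatches.append((text, ent["value"], span))
    return counts, mismatches
```

For example, load the file named in the configuration's "data" entry with json.load and pass the result in; a low count for RideTime is a hint that ner_crf has too few examples to generalize from.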
That being said, is RideTime your only entity? If so, you should look into adding ner_duckling to your pipeline, since it can recognize dates. This would perform better than trying to train dates yourself.
So using your training data above and the pipeline:
["nlp_spacy", "tokenizer_spacy", "intent_featurizer_spacy", "ner_crf", "ner_synonyms", "intent_classifier_sklearn", "ner_duckling"]
Here is the result:
{
"entities": [
{
"additional_info": {
"grain": "hour",
"others": [
{
"grain": "hour",
"value": "2017-07-26T17:00:00.000Z"
}
],
"value": "2017-07-26T17:00:00.000Z"
},
"end": 27,
"entity": "time",
"extractor": "ner_duckling",
"start": 18,
"text": "today 5pm",
"value": "2017-07-26T17:00:00.000Z"
}
],
"intent": {
"confidence": 0.5469262356494486,
"name": "BookCab"
},
"intent_ranking": [
{
"confidence": 0.5469262356494486,
"name": "BookCab"
},
{
"confidence": 0.2812606328712321,
"name": "None"
},
{
"confidence": 0.08727531874740564,
"name": "ConfirmationNo"
},
{
"confidence": 0.0845378127319134,
"name": "ServiceRequestEnquiry"
}
],
"text": "I want to go ride today 5pm"
}
This complete training set works quite well for me; it was just a matter of adding more training examples. As you test more, whenever you come across an example that doesn't work as expected, add it to the training data and retrain, thereby teaching your model to handle more varied requests.
https://gist.github.com/wrathagom/7f05fbda75c785977bd07cd89e62ddd7
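The query URLs in the question contain raw spaces. When querying the server from code, it is safer to URL-encode the text, and it helps to look at which extractor produced each entity (ner_crf or ner_duckling, as in the result above). A minimal standard-library sketch, assuming the rasa_nlu HTTP server from the question on localhost:5000 and its GET /parse?q=... endpoint; the helper names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

PARSE_URL = "http://localhost:5000/parse"  # server address used in the question


def build_parse_url(text, base_url=PARSE_URL):
    """URL-encode the query text; urlencode turns spaces into '+'."""
    return base_url + "?" + urlencode({"q": text})


def parse(text, base_url=PARSE_URL):
    """Send the text to a running Rasa NLU server and return the decoded JSON."""
    with urlopen(build_parse_url(text, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize_entities(result):
    """Reduce a /parse result to (entity, matched text, value, extractor) tuples."""
    return [
        (ent["entity"], ent.get("text"), ent["value"], ent.get("extractor"))
        for ent in result.get("entities", [])
    ]
```

Only parse() needs the server to be running; summarize_entities works on any saved response, which makes it easy to compare results before and after retraining.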

How to get the ID of the parent comment (Facebook Graph API)?

When sending the request
https://graph.facebook.com/v2.1/123456898765432/comments?access_token=TOKEN&pretty=1&filter=stream&limit=1&summary=1
I get the response
{
"data": [
{
"created_time": "2015-06-17T10:32:04+0000",
"from": {
"name": "First Name",
"id": "12345678987654"
},
"message": "Message",
"can_remove": true,
"like_count": 0,
"user_likes": false,
"id": "123456898765432_123456789765433"
}
],
"paging": {
"cursors": {
"before": "...",
"after": "..."
}
},
"summary": {
"order": "chronological",
"total_count": 2532
}
}
But if the comment is a second-level reply, I do not know the ID of its parent comment, so I cannot reply to it programmatically.
Are there arguments I can specify to get additional comment data?
I found that there is also a metadata=1 argument,
but it only shows additional counter information about the object, and there is still no parent ID.
I just had this problem, and it seems you can get the parent comment by including parent in the requested fields.
request
111791572258237_803009099803144?fields=comments.filter(stream).limit(50){message,id,from,parent}
(the ACCESS_TOKEN was omitted)
response
{
"message": "Sim, está acontecendo com várias pessoas. A Valve vai arrumar logo, provavelmente",
"id": "803009099803144_803075496463171",
"from": {
"name": "Jonathan Gouvea",
"id": "1218897258138073"
},
"parent": {
"created_time": "2016-01-28T19:58:39+0000",
"from": {
"name": "César Rodryguês",
"id": "552640601571460"
},
"message": "Dota ta fechando o de vcs quando vai entra na partida ?",
"id": "803009099803144_803068649797189"
}
},
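To use the parent field from code, here is a minimal standard-library sketch. It builds the same fields expression as the request above (the object id and token are placeholders, with the token passed as access_token like in the question's URL) and reads parent.id from each comment, which is None for a top-level comment. The helper names are hypothetical, and the {"comments": {"data": [...]}} wrapper that comment_parents expects is an assumption about the full response shape, since the answer shows only a single comment.

```python
from urllib.parse import urlencode

# Unversioned host, mirroring the answer's request; add a version prefix such
# as "v2.1/" (used in the question) if your app requires one.
GRAPH_BASE = "https://graph.facebook.com"


def build_comments_url(object_id, access_token, limit=50):
    """Build the comments request from the answer, with 'parent' among the fields."""
    # Doubled braces produce literal { } inside the f-string
    fields = f"comments.filter(stream).limit({limit}){{message,id,from,parent}}"
    query = urlencode({"fields": fields, "access_token": access_token})
    return f"{GRAPH_BASE}/{object_id}?{query}"


def parent_id(comment):
    """Return the parent comment's id, or None for a top-level comment."""
    return comment.get("parent", {}).get("id")


def comment_parents(response):
    """Map each comment id to its parent id (assumed comments.data wrapper)."""
    comments = response.get("comments", {}).get("data", [])
    return {c["id"]: parent_id(c) for c in comments}
```

The returned parent id is the ID the question was missing when trying to respond to a second-level comment.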

Iterating over JSON object individually for processing each post

I have the following JSON object from the Instagram API; it can contain any number of posts (depending on the count parameter provided).
{
"pagination": {
"next_url": "https:\/\/api.instagram.com\/v1\/users\/3\/media\/recent?access_token=184046392.f59def8.c5726b469ad2462f85c7cea5f72083c0&max_id=205140190233104928_3",
"next_max_id": "205140190233104928_3"
},
"meta": {
"code": 200
},
"data": [{
"attribution": null,
"tags": [],
"type": "image",
"location": {
"latitude": 37.798594362,
"name": "Presidio Bowling Center",
"longitude": -122.459878922,
"id": 27052
},
"comments": {
"count": 132,
"data": [{
"created_time": "1342734265",
"text": "Distinguishing!",
"from": {
"username": "naiicamilos",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_53690312_75sq_1336573463.jpg",
"id": "53690312",
"full_name": "Naii Camilos"
},
"id": "239194812924826175"
}, {
"created_time": "1342737428",
"text": "#kevin in Spanish Presidio means Jail",
"from": {
"username": "jm0426",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_25881992_75sq_1342156673.jpg",
"id": "25881992",
"full_name": "Juan Mayen"
},
"id": "239221343768285211"
}, {
"created_time": "1342768120",
"text": "Good imagination",
"from": {
"username": "kidloca",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_193133903_75sq_1342032241.jpg",
"id": "193133903",
"full_name": "Khaleda Noon"
},
"id": "239478811731694145"
}, {
"created_time": "1342775967",
"text": "Cwl!",
"from": {
"username": "awesomeath",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_179252164_75sq_1339745821.jpg",
"id": "179252164",
"full_name": "awesomeath"
},
"id": "239544638740894674"
}, {
"created_time": "1342796153",
"text": "\u597d\u7f8e\u263a",
"from": {
"username": "hidelau",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_47295330_75sq_1342763977.jpg",
"id": "47295330",
"full_name": "Hide Lau"
},
"id": "239713963951001995"
}, {
"created_time": "1343018007",
"text": "#mindfreak",
"from": {
"username": "info2021",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/anonymousUser.jpg",
"id": "27664191",
"full_name": "info2021"
},
"id": "241575017119224582"
}, {
"created_time": "1343068374",
"text": "#kevin please share and promote my last pic. This will be the new hype as instagram",
"from": {
"username": "thansy_mansy",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_189019343_75sq_1342951587.jpg",
"id": "189019343",
"full_name": "thansy_mansy"
},
"id": "241997523093295303"
}, {
"created_time": "1343068382",
"text": "#kevin :P",
"from": {
"username": "thansy_mansy",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_189019343_75sq_1342951587.jpg",
"id": "189019343",
"full_name": "thansy_mansy"
},
"id": "241997589589790922"
}]
},
"filter": "Rise",
"created_time": "1342676212",
"link": "http:\/\/instagr.am\/p\/NQD4KAABKF\/",
"likes": {
"count": 4810,
"data": [{
"username": "caitlyn_hammonds",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/anonymousUser.jpg",
"id": "198322184",
"full_name": "caitlyn_hammonds"
}, {
"username": "sophiafrancis",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_43092892_75sq_1340548333.jpg",
"id": "43092892",
"full_name": "Sophiaaa."
}, {
"username": "amna7861",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_175807260_75sq_1343135903.jpg",
"id": "175807260",
"full_name": "Amna Haroon"
}, {
"username": "yaya0318",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_74056_75sq_1287001004.jpg",
"id": "74056",
"full_name": "Mao Yaya"
}, {
"username": "jay_damage",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_197465040_75sq_1342932411.jpg",
"id": "197465040",
"full_name": "jay_damage"
}, {
"username": "reves",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_671833_75sq_1335966794.jpg",
"id": "671833",
"full_name": "Fernando D. Ramirez"
}, {
"username": "lizray1",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_198450407_75sq_1343144120.jpg",
"id": "198450407",
"full_name": "lizray1"
}, {
"username": "alivewtheglory",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_37907416_75sq_1341561441.jpg",
"id": "37907416",
"full_name": "Marilynn C"
}, {
"username": "mnforever55",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_29255977_75sq_1334833008.jpg",
"id": "29255977",
"full_name": "mnforever55"
}]
},
"images": {
"low_resolution": {
"url": "http:\/\/distilleryimage9.s3.amazonaws.com\/beb7f896d16311e19fe21231380f3636_6.jpg",
"width": 306,
"height": 306
},
"thumbnail": {
"url": "http:\/\/distilleryimage9.s3.amazonaws.com\/beb7f896d16311e19fe21231380f3636_5.jpg",
"width": 150,
"height": 150
},
"standard_resolution": {
"url": "http:\/\/distilleryimage9.s3.amazonaws.com\/beb7f896d16311e19fe21231380f3636_7.jpg",
"width": 612,
"height": 612
}
},
"caption": {
"created_time": "1342676255",
"text": "Happy birthday #amy !",
"from": {
"username": "kevin",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_3_75sq_1325536697.jpg",
"id": "3",
"full_name": "Kevin Systrom"
},
"id": "238708186813567655"
},
"user_has_liked": false,
"id": "238707833418289797_3",
"user": {
"username": "kevin",
"website": "",
"bio": "CEO & Co-founder of Instagram",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_3_75sq_1325536697.jpg",
"full_name": "Kevin Systrom",
"id": "3"
}
}, {
"attribution": null,
"tags": [],
"type": "image",
"location": {
"latitude": 38.503100608,
"name": "Goose & Gander",
"longitude": -122.468387538,
"id": 12059278
},
"comments": {
"count": 85,
"data": [{
"created_time": "1342555499",
"text": "Cheers !!! \ud83d\ude18\ud83d\ude18",
"from": {
"username": "kattiab",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_1345073_75sq_1340495505.jpg",
"id": "1345073",
"full_name": "kattia b"
},
"id": "237695212468572732"
}, {
"created_time": "1342558279",
"text": "happy birthday instagram!",
"from": {
"username": "alanasayshi",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_4235095_75sq_1341960681.jpg",
"id": "4235095",
"full_name": "Alana Boy\u00e9r"
},
"id": "237718535382504383"
}, {
"created_time": "1342567977",
"text": "Happy Natal Day Instagram!",
"from": {
"username": "cynrtst",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_12493918_75sq_1341538462.jpg",
"id": "12493918",
"full_name": "Cynthia L"
},
"id": "237799888639758668"
}, {
"created_time": "1342568896",
"text": "Happy Birthday \ud83c\udf89\ud83c\udf89\ud83c\udf89 was it a long labour \ud83d\ude02\ud83d\ude02\ud83d\ude02",
"from": {
"username": "relzie",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_15718874_75sq_1332290975.jpg",
"id": "15718874",
"full_name": "Relz"
},
"id": "237807595966960050"
}, {
"created_time": "1342579289",
"text": "Cheers #kevin and Happy Birthday #instagram thank you so much Kevin for creating instagram it's truly got me back out there taking more photos and falling in love with photography all over again...",
"from": {
"username": "bpphotographs",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_12149171_75sq_1339457436.jpg",
"id": "12149171",
"full_name": "bpphotographs"
},
"id": "237894779172557723"
}, {
"created_time": "1342652660",
"text": "#suz_h",
"from": {
"username": "ianyorke",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_1041088_75sq_1339844137.jpg",
"id": "1041088",
"full_name": "Ian Yorke"
},
"id": "238510264889118973"
}, {
"created_time": "1342667574",
"text": "Love your app\ud83d\udc97\ud83d\udc97\ud83d\udc97\ud83d\udc97\ud83d\udc97",
"from": {
"username": "gothangel1997",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_186799125_75sq_1342668825.jpg",
"id": "186799125",
"full_name": "Angel Mercado"
},
"id": "238635368570687940"
}, {
"created_time": "1342843274",
"text": "\ud83d\ude09\u2764",
"from": {
"username": "andescu",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_145554839_75sq_1337641309.jpg",
"id": "145554839",
"full_name": "andescu"
},
"id": "240109245637333685"
}]
},
"filter": "Sierra",
"created_time": "1342332400",
"link": "http:\/\/instagr.am\/p\/NF0G6bABA2\/",
"likes": {
"count": 3282,
"data": [{
"username": "caysondesigns",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_23464609_75sq_1329177054.jpg",
"id": "23464609",
"full_name": "Jasmine at Cayson Designs"
}, {
"username": "m_azooz16",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_198179404_75sq_1343096096.jpg",
"id": "198179404",
"full_name": "m_azooz16"
}, {
"username": "shulinghuang",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_144887495_75sq_1342460246.jpg",
"id": "144887495",
"full_name": "H\u3002"
}, {
"username": "caitlyn_hammonds",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/anonymousUser.jpg",
"id": "198322184",
"full_name": "caitlyn_hammonds"
}, {
"username": "sophiafrancis",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_43092892_75sq_1340548333.jpg",
"id": "43092892",
"full_name": "Sophiaaa."
}, {
"username": "beatle1234",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/anonymousUser.jpg",
"id": "197988834",
"full_name": "beatle1234"
}, {
"username": "yaya0318",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_74056_75sq_1287001004.jpg",
"id": "74056",
"full_name": "Mao Yaya"
}, {
"username": "lizray1",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_198450407_75sq_1343144120.jpg",
"id": "198450407",
"full_name": "lizray1"
}, {
"username": "rawr1234321",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_198492630_75sq_1343151765.jpg",
"id": "198492630",
"full_name": "rawr1234321"
}]
},
"images": {
"low_resolution": {
"url": "http:\/\/distilleryimage6.s3.amazonaws.com\/3ec59e18ce4311e1b8031231380702ee_6.jpg",
"width": 306,
"height": 306
},
"thumbnail": {
"url": "http:\/\/distilleryimage6.s3.amazonaws.com\/3ec59e18ce4311e1b8031231380702ee_5.jpg",
"width": 150,
"height": 150
},
"standard_resolution": {
"url": "http:\/\/distilleryimage6.s3.amazonaws.com\/3ec59e18ce4311e1b8031231380702ee_7.jpg",
"width": 612,
"height": 612
}
},
"caption": {
"created_time": "1342332465",
"text": "Mellivora capensis - eagle rare, peat, honey, lemon, pineapple, black cardamom, chili, coconut foam",
"from": {
"username": "kevin",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_3_75sq_1325536697.jpg",
"id": "3",
"full_name": "Kevin Systrom"
},
"id": "235824269324456712"
},
"user_has_liked": false,
"id": "235823728972271670_3",
"user": {
"username": "kevin",
"website": "",
"bio": "CEO & Co-founder of Instagram",
"profile_picture": "http:\/\/images.instagram.com\/profiles\/profile_3_75sq_1325536697.jpg",
"full_name": "Kevin Systrom",
"id": "3"
}
}, .....
So I'm looking to iterate over each post individually and extract the tags, id, and image URLs. I'm having some trouble, since I'm a PHP developer and, as a Python beginner, I'm finding it really hard to work with.
Here's the code I'm using to iterate over each post and process its attributes. I don't want to store them in a list or dict; I just want to search through the tags.
(This is just an attempt, since I couldn't work out which loop to use.)
info = simplejson.load(info)
print type(info['data']) # I get it as a list
for k, v in info['data']:
    print v
In PHP I could have done this easily with a foreach:
foreach($info->data as $i) {
    $tags = $i->tags();
    $id = $i->id();
}
If info['data'] is a list, you should be able to iterate over it like so:
for post in info['data']:
    tags = post['tags']
    id = post['id']
    image_urls = []  # An empty list -- we'll fill it below
    for img_type in ['low_resolution', 'thumbnail', 'standard_resolution']:
        image_urls.append(post['images'][img_type]['url'])
    # Now image_urls has all the image urls in it
I think the part that's rather different from PHP is that where the key is "tags" in the JSON structure, you have to use the string "tags" in Python (post['tags']), whereas in PHP you would use the bare property name ($i->tags).
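The loop above can be extended to the asker's actual goal of searching through the tags without storing everything first. A minimal Python 3 sketch (print is a function there, unlike the Python 2 code in the question), using the standard json module instead of simplejson and assuming each post's 'tags' is a list of plain strings; the function names are hypothetical. It also shows where a "for k, v in ..." loop fits: on a dict's .items(), not on the list in info['data'].

```python
import json


def load_payload(raw):
    """Parse the raw response text; the escaped slashes decode to plain '/'."""
    return json.loads(raw)


def iter_posts(payload):
    """Yield (post id, tags, {resolution: url}) for each post in payload['data']."""
    # payload['data'] is a list, so iterate over it directly: one dict per post
    for post in payload.get("data", []):
        # post['images'] is a dict, so .items() yields (key, value) pairs;
        # this is where a "for k, v in ..." loop does work
        urls = {res: img["url"] for res, img in post.get("images", {}).items()}
        yield post["id"], post.get("tags", []), urls


def posts_with_tag(payload, tag):
    """Return the ids of posts whose tags list contains `tag`."""
    return [post_id for post_id, tags, _ in iter_posts(payload) if tag in tags]
```

Because iter_posts is a generator, each post is processed as it is reached and nothing beyond the current post's values is kept, which matches the wish not to build a separate list or dict.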
