From MongoDB convert from dictionary to row with Pandas

This is a test document coming from MongoDB that I need to convert for MySQL. Sometimes there is more than one entry under "agents"; in that case I need each agent on its own row, and every row should carry the same "display_name". For example, Walter Mosley should have Gloria on one row and Barb on the next, with Walter Mosley under "display_name" on both.
[{'name': 'Loomis, Gloria',
'primaryemail': 'gloria#gmail.com',
'primaryphone': '212-382-1121'},
{'name': 'Hogson, Barb',
'primaryemail': 'bho124#aol.com',
'primaryphone': ''}]
I've tried this but it just splits out the key/values.
a,b,c = [[d[e] for d in test] for e in sorted(test[0].keys())]
print(a,b,c)
This is the original JSON format:
{'_id': ObjectId('58e6ececafb08d6'),
'item_type': 'Contributor',
'role': 0,
'short_bio': 'Walter Mosley (b. 1952)',
'firebrand_id': 1588,
'display_name': 'Walter Mosley',
'first_name': 'Walter',
'last_name': 'Mosley',
'slug': 'walter-mosley',
'updated': datetime.datetime(2020, 1, 7, 8, 17, 11, 926000),
'image': 'https://s3.amazonaws.com/8588-book-contributor.jpg',
'social_media_name': '',
'social_media_link': '',
'website': '',
'agents': [{'name': 'Loomis, Gloria',
'primaryemail': 'gloria#gmail.com',
'primaryphone': '212-382-1121'},
{'name': 'Hogson, Barb',
'primaryemail': 'bho124#aol.com',
'primaryphone': ''}],
'estates': [],
'deleted': False}

If you have a list of dictionaries loaded from your JSON, try this:
JSON input:
inputJSON = [{'item_type': 'Contributor',
'role': 0,
'short_bio': 'Walter Mosley (b. 1952)',
'firebrand_id': 1588,
'display_name': 'Walter Mosley',
'first_name': 'Walter',
'last_name': 'Mosley',
'slug': 'walter-mosley',
'image': 'https://s3.amazonaws.com/8588-book-contributor.jpg',
'social_media_name': '',
'social_media_link': '',
'website': '',
'agents': [{'name': 'Loomis, Gloria',
'primaryemail': 'gloria#gmail.com',
'primaryphone': '212-382-1121'},
{'name': 'Hogson, Barb',
'primaryemail': 'bho124#aol.com',
'primaryphone': ''}],
'estates': [],
'deleted': False}]
Code :
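import copy

finalJSON = []
for each in inputJSON:
    # Emit one copy of the parent record per agent.
    for agnt in each.get('agents', []):
        newObj = copy.deepcopy(each)
        # Replace the list of agents with the single agent for this row.
        newObj['agents'] = agnt
        finalJSON.append(newObj)
print(finalJSON)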
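If the goal is a flat table that can be loaded into MySQL, pandas can do the expansion directly. The following is a minimal sketch, assuming pandas 1.0 or later (where `pd.json_normalize` is available) and a trimmed, hypothetical copy of the record from the question. `record_path` names the list to expand into rows, and `meta` names the parent fields to repeat on each row.

```python
import pandas as pd

# Trimmed, hypothetical copy of the record from the question.
input_json = [{
    'display_name': 'Walter Mosley',
    'firebrand_id': 1588,
    'agents': [
        {'name': 'Loomis, Gloria', 'primaryemail': 'gloria#gmail.com', 'primaryphone': '212-382-1121'},
        {'name': 'Hogson, Barb', 'primaryemail': 'bho124#aol.com', 'primaryphone': ''},
    ],
}]

# One row per agent; display_name and firebrand_id repeat on every row.
df = pd.json_normalize(input_json, record_path='agents', meta=['display_name', 'firebrand_id'])
print(df)
```

Records whose `agents` list is empty produce no rows with this call, so they need separate handling if they must be kept.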

Related

get a part from a dictionary

I'm trying to get the pulse count for a given URL using this code:
from OTXv2 import OTXv2
from OTXv2 import IndicatorTypes
otx = OTXv2("my_key")
test=otx.get_indicator_details_full(IndicatorTypes.DOMAIN, "google.com")
When I print test, I get this output:
{'general': {'sections': ['general', 'geo', 'url_list', 'passive_dns', 'malware', 'whois', 'http_scans'], 'whois': 'http://whois.domaintools.com/google.com', 'alexa': 'http://www.alexa.com/siteinfo/google.com', 'indicator': 'google.com', 'type': 'domain', 'type_title': 'Domain', 'validation': [{'source': 'ad_network', 'message': 'Whitelisted ad network domain www-google-analytics.l.google.com', 'name': 'Whitelisted ad network domain'}, {'source': 'akamai', 'message': 'Akamai rank: #3', 'name': 'Akamai Popular Domain'}, {'source': 'alexa', 'message': 'Alexa rank: #1', 'name': 'Listed on Alexa'}, {'source': 'false_positive', 'message': 'Known False Positive', 'name': 'Known False Positive'}, {'source': 'majestic', 'message': 'Whitelisted domain google.com', 'name': 'Whitelisted domain'}, {'source': 'whitelist', 'message': 'Whitelisted domain google.com', 'name': 'Whitelisted domain'}], 'base_indicator': {'id': 12915, 'indicator': 'google.com', 'type': 'domain', 'title': '', 'description': '', 'content': '', 'access_type': 'public', 'access_reason': ''}, 'pulse_info': {'count': 0, 'pulses': [], 'references': [], 'related': {'alienvault': {'adversary': [], 'malware_families': [], 'industries': []}, 'other': {'adversary': [], 'malware_families': [], 'industries': []}}}, 'false_positive':...
I want to get only the 'count': 0 part inside pulse_info.
I tried using test.values(), but the result is many dictionaries nested together.
Any idea how I can solve that?
Thank you

print(test["general"]["pulse_info"]["count"])
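Direct indexing raises a KeyError as soon as one level is absent. A minimal sketch of a tolerant lookup follows; `get_nested` is a hypothetical helper (not part of OTXv2), and the `test` dictionary is a trimmed stand-in for the response in the question.

```python
def get_nested(mapping, *keys, default=None):
    """Follow keys through nested dictionaries; return default if any level is missing."""
    current = mapping
    for key in keys:
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current

# Trimmed, hypothetical stand-in for the OTX response shown in the question.
test = {'general': {'pulse_info': {'count': 0, 'pulses': []}}}

count = get_nested(test, 'general', 'pulse_info', 'count')
missing = get_nested(test, 'general', 'no_such_key', 'count', default=-1)
print(count, missing)
```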

Flattening deeply nested JSON into pandas data frame

I am trying to import a deeply nested JSON file into a pandas DataFrame. Here is the structure of the JSON file (this is only the first record, retweets[:1]):
[{'lang': 'en',
'author_id': '1076979440372965377',
'reply_settings': 'everyone',
'entities': {'mentions': [{'start': 3,
'end': 17,
'username': 'Terry81987010',
'url': '',
'location': 'Florida',
'entities': {'description': {'hashtags': [{'start': 29,
'end': 32,
'tag': '2A'}]}},
'created_at': '2019-02-01T23:01:11.000Z',
'protected': False,
'public_metrics': {'followers_count': 520,
'following_count': 567,
'tweet_count': 34376,
'listed_count': 1},
'name': "Terry's Take",
'verified': False,
'id': '1091471553437593605',
'description': 'Less government more Freedom #2A is a constitutional right. Trump2020, common sense rules, God bless America! Vet 82nd Airborne F/A, proud Republican',
'profile_image_url': 'https://pbs.twimg.com/profile_images/1289626661911134208/WfztLkr1_normal.jpg'},
{'start': 19,
'end': 32,
'username': 'DineshDSouza',
'location': 'United States',
'entities': {'url': {'urls': [{'start': 0,
'end': 23,
'expanded_url': 'https://podcasts.apple.com/us/podcast/the-dinesh-dsouza-podcast/id1547827376',
'display_url': 'podcasts.apple.com/us/podcast/the…'}]},
'description': {'urls': [{'start': 80,
'end': 103,
'expanded_url': 'https://podcasts.apple.com/us/podcast/the-dinesh-dsouza-podcast/id1547827376',
'display_url': 'podcasts.apple.com/us/podcast/the…'}]}},
'created_at': '2009-11-22T22:32:41.000Z',
'protected': False,
'public_metrics': {'followers_count': 1748832,
'following_count': 5355,
'tweet_count': 65674,
'listed_count': 6966},
'name': "Dinesh D'Souza",
'verified': True,
'pinned_tweet_id': '1393309917239562241',
'id': '91882544',
'description': "I am an author, filmmaker, and host of the Dinesh D'Souza Podcast.\n\nSubscribe: ",
'profile_image_url': 'https://pbs.twimg.com/profile_images/890967538292711424/8puyFbiI_normal.jpg'}]},
'conversation_id': '1253462541881106433',
'created_at': '2020-04-23T23:15:32.000Z',
'id': '1253462541881106433',
'possibly_sensitive': False,
'referenced_tweets': [{'type': 'retweeted',
'id': '1253052684489437184',
'in_reply_to_user_id': '91882544',
'attachments': {'media_keys': ['3_1253052312144293888',
'3_1253052620937277442'],
'media': [{}, {}]},
'entities': {'annotations': [{'start': 126,
'end': 128,
'probability': 0.514,
'type': 'Organization',
'normalized_text': 'CDC'},
{'start': 145,
'end': 146,
'probability': 0.5139,
'type': 'Place',
'normalized_text': 'NY'}],
'mentions': [{'start': 0,
'end': 13,
'username': 'DineshDSouza',
'location': 'United States',
'entities': {'url': {'urls': [{'start': 0,
'end': 23,
'expanded_url': 'https://podcasts.apple.com/us/podcast/the-dinesh-dsouza-podcast/id1547827376',
'display_url': 'podcasts.apple.com/us/podcast/the…'}]},
'description': {'urls': [{'start': 80,
'end': 103,
'expanded_url': 'https://podcasts.apple.com/us/podcast/the-dinesh-dsouza-podcast/id1547827376',
'display_url': 'podcasts.apple.com/us/podcast/the…'}]}},
'created_at': '2009-11-22T22:32:41.000Z',
'protected': False,
'public_metrics': {'followers_count': 1748832,
'following_count': 5355,
'tweet_count': 65674,
'listed_count': 6966},
'name': "Dinesh D'Souza",
'verified': True,
'pinned_tweet_id': '1393309917239562241',
'id': '91882544',
'description': "I am an author, filmmaker, and host of the Dinesh D'Souza Podcast.\n\nSubscribe: ",
'profile_image_url': 'https://pbs.twimg.com/profile_images/890967538292711424/8puyFbiI_normal.jpg'}],
'urls': [{'start': 187,
'end': 210,
'expanded_url': 'https://twitter.com/Terry81987010/status/1253052684489437184/photo/1',
'display_url': 'pic.twitter.com/H4NpN5ZMkW'},
{'start': 187,
'end': 210,
'expanded_url': 'https://twitter.com/Terry81987010/status/1253052684489437184/photo/1',
'display_url': 'pic.twitter.com/H4NpN5ZMkW'}]},
'lang': 'en',
'author_id': '1091471553437593605',
'reply_settings': 'everyone',
'conversation_id': '1253050942716551168',
'created_at': '2020-04-22T20:06:55.000Z',
'possibly_sensitive': False,
'referenced_tweets': [{'type': 'replied_to', 'id': '1253050942716551168'}],
'public_metrics': {'retweet_count': 208,
'reply_count': 57,
'like_count': 402,
'quote_count': 38},
'source': 'Twitter Web App',
'text': "#DineshDSouza Here's some proof of artificially inflating the cv deaths. Noone is dying of pneumonia anymore according to the CDC. And of course NY getting paid for each cv death $60,000",
'context_annotations': [{'domain': {'id': '10',
'name': 'Person',
'description': 'Named people in the world like Nelson Mandela'},
'entity': {'id': '1138120064119369729', 'name': "Dinesh D'Souza"}},
{'domain': {'id': '35',
'name': 'Politician',
'description': 'Politicians in the world, like Joe Biden'},
'entity': {'id': '1138120064119369729', 'name': "Dinesh D'Souza"}}],
'author': {'url': '',
'username': 'Terry81987010',
'location': 'Florida',
'entities': {'description': {'hashtags': [{'start': 29,
'end': 32,
'tag': '2A'}]}},
'created_at': '2019-02-01T23:01:11.000Z',
'protected': False,
'public_metrics': {'followers_count': 520,
'following_count': 567,
'tweet_count': 34376,
'listed_count': 1},
'name': "Terry's Take",
'verified': False,
'id': '1091471553437593605',
'description': 'Less government more Freedom #2A is a constitutional right. Trump2020, common sense rules, God bless America! Vet 82nd Airborne F/A, proud Republican',
'profile_image_url': 'https://pbs.twimg.com/profile_images/1289626661911134208/WfztLkr1_normal.jpg'},
'in_reply_to_user': {'username': 'DineshDSouza',
'location': 'United States',
'entities': {'url': {'urls': [{'start': 0,
'end': 23,
'expanded_url': 'https://podcasts.apple.com/us/podcast/the-dinesh-dsouza-podcast/id1547827376',
'display_url': 'podcasts.apple.com/us/podcast/the…'}]},
'description': {'urls': [{'start': 80,
'end': 103,
'expanded_url': 'https://podcasts.apple.com/us/podcast/the-dinesh-dsouza-podcast/id1547827376',
'display_url': 'podcasts.apple.com/us/podcast/the…'}]}},
'created_at': '2009-11-22T22:32:41.000Z',
'protected': False,
'public_metrics': {'followers_count': 1748832,
'following_count': 5355,
'tweet_count': 65674,
'listed_count': 6966},
'name': "Dinesh D'Souza",
'verified': True,
'pinned_tweet_id': '1393309917239562241',
'id': '91882544',
'description': "I am an author, filmmaker, and host of the Dinesh D'Souza Podcast.\n\nSubscribe: ",
'profile_image_url': 'https://pbs.twimg.com/profile_images/890967538292711424/8puyFbiI_normal.jpg'}}],
'public_metrics': {'retweet_count': 208,
'reply_count': 0,
'like_count': 0,
'quote_count': 0},
'source': 'Twitter for iPhone',
'text': "RT #Terry81987010: #DineshDSouza Here's some proof of artificially inflating the cv deaths. Noone is dying of pneumonia anymore according t…",
'context_annotations': [{'domain': {'id': '10',
'name': 'Person',
'description': 'Named people in the world like Nelson Mandela'},
'entity': {'id': '1138120064119369729', 'name': "Dinesh D'Souza"}},
{'domain': {'id': '35',
'name': 'Politician',
'description': 'Politicians in the world, like Joe Biden'},
'entity': {'id': '1138120064119369729', 'name': "Dinesh D'Souza"}}],
'author': {'url': '',
'username': 'set1952',
'location': 'Etats-Unis',
'created_at': '2018-12-23T23:14:42.000Z',
'protected': False,
'public_metrics': {'followers_count': 103,
'following_count': 44,
'tweet_count': 44803,
'listed_count': 0},
'name': 'SunSet1952',
'verified': False,
'id': '1076979440372965377',
'description': '',
'profile_image_url': 'https://abs.twimg.com/sticky/default_profile_images/default_profile_normal.png'},
'__twarc': {'url': 'https://api.twitter.com/2/tweets/search/all?expansions=author_id%2Cin_reply_to_user_id%2Creferenced_tweets.id%2Creferenced_tweets.id.author_id%2Centities.mentions.username%2Cattachments.poll_ids%2Cattachments.media_keys%2Cgeo.place_id&user.fields=created_at%2Cdescription%2Centities%2Cid%2Clocation%2Cname%2Cpinned_tweet_id%2Cprofile_image_url%2Cprotected%2Cpublic_metrics%2Curl%2Cusername%2Cverified%2Cwithheld&tweet.fields=attachments%2Cauthor_id%2Ccontext_annotations%2Cconversation_id%2Ccreated_at%2Centities%2Cgeo%2Cid%2Cin_reply_to_user_id%2Clang%2Cpublic_metrics%2Ctext%2Cpossibly_sensitive%2Creferenced_tweets%2Creply_settings%2Csource%2Cwithheld&media.fields=duration_ms%2Cheight%2Cmedia_key%2Cpreview_image_url%2Ctype%2Curl%2Cwidth%2Cpublic_metrics&poll.fields=duration_minutes%2Cend_datetime%2Cid%2Coptions%2Cvoting_status&place.fields=contained_within%2Ccountry%2Ccountry_code%2Cfull_name%2Cgeo%2Cid%2Cname%2Cplace_type&max_results=500&query=retweets_of%3ATerry81987010&start_time=2020-03-09T00%3A00%3A00%2B00%3A00&end_time=2020-04-24T00%3A00%3A00%2B00%3A00',
'version': '2.0.8',
'retrieved_at': '2021-05-17T17:13:17+00:00'}},
Here is my code:
import json
from pandas import json_normalize

retweets = []
with open('Data/usersRetweetsFlatten_sample.json', 'r') as f:
    for line in f:
        retweets.append(json.loads(line))

df = json_normalize(
    retweets, 'referenced_tweets', ['referenced_tweets', 'type'],
    meta_prefix=".",
    errors='ignore'
)
df[['author_id', 'type', '.type', 'id', 'in_reply_to_user_id', 'referenced_tweets']].head()
Here is the resulting dataframe:
As you can see, the column referenced_tweets is not flattened yet. (Please note that there are two different referenced_tweets arrays in my JSON file: one sits at a deeper level inside the other "referenced_tweets".) For example, the one at the higher level returns this:
>>> retweets[0]["referenced_tweets"][0]["type"]
"retweeted"
and the one at the deeper level returns this:
>>> retweets[0]["referenced_tweets"][0]["referenced_tweets"][0]["type"]
'replied_to'
QUESTION: How can I flatten the deeper referenced_tweets? I want two separate columns, referenced_tweets.type and referenced_tweets.id, where the value of referenced_tweets.type in the above example should be replied_to.

I think the issue here is that your data is doubly nested: there is a key referenced_tweets within referenced_tweets.
import json
from pandas import json_normalize

with open("flatten.json", "r") as file:
    data = json.load(file)

df = json_normalize(
    data,
    record_path=["referenced_tweets", "referenced_tweets"],
    meta=[
        "author_id",
        # ["author", "username"],  # not possible
        # "author",  # possible but not useful
        ["referenced_tweets", "id"],
        ["referenced_tweets", "type"],
        ["referenced_tweets", "in_reply_to_user_id"],
        ["referenced_tweets", "in_reply_to_user", "username"],
    ]
)
print(df)
See also: https://stackoverflow.com/a/37668569/42659
Note: The code above will fail if the second, nested referenced_tweets key is missing.
Edit: Alternatively, you could further normalize the data (which you already partly normalized with the code in your question) using an additional manual iteration. See the example below. Note: The code is not optimized and may be slow depending on the amount of data.
import pandas as pd
from pandas import json_normalize

# load your `data` with `json.load()` or `json.loads()`
df = json_normalize(
    data,
    record_path="referenced_tweets",
    meta=["referenced_tweets", "type"],
    meta_prefix=".",
    errors="ignore",
)
columns = [*df.columns, "_type", "_id"]
normalized_data = []

def append(row, tweet_type, tweet_id):
    normalized_row = [*row.to_list(), tweet_type, tweet_id]
    normalized_data.append(normalized_row)

for _, row in df.iterrows():
    nested = row["referenced_tweets"]
    # a non-empty list is expected
    if isinstance(nested, list) and nested:
        for tweet in nested:
            append(row, tweet["type"], tweet["id"])
    # empty list, or NaN when the key is missing
    else:
        append(row, None, None)

enhanced_df = pd.DataFrame(data=normalized_data, columns=columns)
# drop() returns a new frame, so keep the result
enhanced_df = enhanced_df.drop(columns="referenced_tweets")
print(enhanced_df)
Edit 2: referenced_tweets should be an array. However, if there is no referenced tweet, the Twitter API seems to omit referenced_tweets completely. In that case, the cell value is NaN (float) instead of an empty list. I updated the code above to take that into account.
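The same flattening can also be done without pandas, which makes the handling of a missing inner key explicit. This is only a sketch under assumptions: `flatten_referenced` and the output column names are hypothetical, and the sample records are trimmed stand-ins for the data in the question.

```python
def flatten_referenced(tweets):
    """Return one flat dict per inner referenced tweet, tolerating missing keys."""
    rows = []
    for tweet in tweets:
        for outer in tweet.get("referenced_tweets") or []:
            # Fall back to one empty placeholder so the outer tweet still yields a row.
            inner_refs = outer.get("referenced_tweets") or [{}]
            for inner in inner_refs:
                rows.append({
                    "author_id": tweet.get("author_id"),
                    "outer.type": outer.get("type"),
                    "outer.id": outer.get("id"),
                    "referenced_tweets.type": inner.get("type"),
                    "referenced_tweets.id": inner.get("id"),
                })
    return rows

# Trimmed, hypothetical records: the second has no inner referenced_tweets key.
sample = [
    {"author_id": "1076979440372965377",
     "referenced_tweets": [{"type": "retweeted", "id": "1253052684489437184",
                            "referenced_tweets": [{"type": "replied_to",
                                                   "id": "1253050942716551168"}]}]},
    {"author_id": "42",
     "referenced_tweets": [{"type": "retweeted", "id": "7"}]},
]

rows = flatten_referenced(sample)
for row in rows:
    print(row)
```

The resulting list of dictionaries can be passed to `pandas.DataFrame(rows)` if a DataFrame is needed afterwards.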

Creating a Python dictionary from other nested list containing dictionary in python

I have this list that contains dictionaries as its elements:
dict_1 = [{'id': '0eb7df70-f319-4562-ab2a-9e641e978b3b', 'first_name': 'Rahx', 'surname': 'Smith ', 'devices': {'os': 'Apple iPhone', 'mac_address': 'f4:af:e7:b7:ab:22', 'manufacturer': 'Apple'}, 'lat': 54.33166199876629, 'lng': -6.277842272724769, 'seenTime': 1582814754000},
{'id': 'a0bb8d38-0d27-4d7f-acc0-1e850a706b6c', 'first_name': 'Lucy', 'surname': 'Pye', 'devices': {'os': 'Apple iPhone', 'mac_address': 'f8:87:f1:72:4c:4d', 'manufacturer': 'Apple'}, 'lat': 54.33166199876629, 'lng': -6.277842272724769, 'seenTime': 1582814754000},
{'id': '0eb7df70-f319-4562-ab2a-9e641e978b3b', 'first_name': 'xyx', 'surname': 'dcsdd', 'devices': {'os': 'NOKIA Phone', 'mac_address': '78:28:ca:a8:56:b9', 'manufacturer': 'NOKIA'}, 'lat': 54.33166199876629, 'lng': -6.277842272724769, 'seenTime': 1582814754000},
{'id': 'a0bb8d38-0d27-4d7f-acc0-1e850a706b6c', 'first_name': 'ddwdw', 'surname': 'sdsds', 'devices': {'os': 'MI Phone', 'mac_address': 'dc:08:0f:3f:57:0c', 'manufacturer': 'MI'}, 'lat': 54.33218267030654, 'lng': -6.27796001203896, 'seenTime': 1582814693000}]
and I want output like this from the dict_1 variable:
{
"f77df8c2-b19d-4341-9021-7beab4b9ebcd":{
"first_name":"anonymous",
"surname":"anonymous",
"lat":57.14913102,
"lng":-2.09987143,
"devices": {'os': 'MI Phone', 'mac_address': 'dc:08:0f:3f:57:0c', 'manufacturer': 'MI'},
"seenTime": 1582814693000
},
"7beab4b9ebcd-b19d-9021-f77df8c2-4341":{
etc.
},
etc.
}
Please help me understand what I should do in this case.

Try this.
dict_1 = {x.pop('id'): x for x in dict_1}

I think this could do the job:
dict_2 = {}
for d in dict_1:
    # pop() removes 'id' from the inner dict and returns its value
    record_id = d.pop('id')
    dict_2[record_id] = d
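Two details are worth noting about both answers: `pop` modifies the original dictionaries, and the sample data contains repeated ids, so later records overwrite earlier ones. A minimal sketch of two alternatives follows, using trimmed, hypothetical records; the names `by_id` and `grouped` are illustrative only.

```python
from collections import defaultdict

# Trimmed, hypothetical records; the first and third share an id.
records = [
    {'id': 'a', 'first_name': 'Rahx', 'lat': 54.33},
    {'id': 'b', 'first_name': 'Lucy', 'lat': 54.33},
    {'id': 'a', 'first_name': 'xyx', 'lat': 54.33},
]

# Non-mutating variant: build new inner dicts without the 'id' key.
# With repeated ids, the last record wins.
by_id = {r['id']: {k: v for k, v in r.items() if k != 'id'} for r in records}

# If every record must survive, collect a list per id instead.
grouped = defaultdict(list)
for r in records:
    grouped[r['id']].append({k: v for k, v in r.items() if k != 'id'})

print(by_id)
print(dict(grouped))
```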

KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported on json file

{'meta': {'code': 200, 'requestId': '5e7c703bb9a389001b7d1e8c'},
'response': {'suggestedFilters': {'header': 'Tap to show:',
'filters': [{'name': 'Open now', 'key': 'openNow'}]},
'headerLocation': 'Lagos',
'headerFullLocation': 'Lagos',
'headerLocationGranularity': 'city',
'totalResults': 39,
'suggestedBounds': {'ne': {'lat': 6.655478745000045,
'lng': 3.355524537252914},
'sw': {'lat': 6.565478654999954, 'lng': 3.2650912627470863}},
'groups': [{'type': 'Recommended Places',
'name': 'recommended',
'items': [{'reasons': {'count': 0,
'items': [{'summary': 'This spot is popular',
'type': 'general',
'reasonName': 'globalInteractionReason'}]},
'venue': {'id': '502806dce4b0f23b021f3b77',
'name': 'KFC',
'location': {'lat': 6.604589745106469,
'lng': 3.3089358809010045,
'labeledLatLngs': [{'label': 'display',
'lat': 6.604589745106469,
'lng': 3.3089358809010045}],
'distance': 672,
'cc': 'NG',
'city': 'Egbeda',
'state': 'Lagos',
'country': 'Nigeria',
'formattedAddress': ['Egbeda', 'Lagos', 'Nigeria']},
'categories': [{'id': '4bf58dd8d48988d16e941735',
'name': 'Fast Food Restaurant',
'pluralName': 'Fast Food Restaurants',
'shortName': 'Fast Food',
'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/fastfood_',
'suffix': '.png'},
'primary': True}],
'photos': {'count': 0, 'groups': []}},
'referralId': 'e-0-502806dce4b0f23b021f3b77-0'},
That is part of my JSON response, stored in a variable called results.
I then run:
def getCAT(row):
    try:
        categories_list = row['categories']
    except:
        categories_list = row['venue.categories']
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

venues = results['response']['groups'][0]['items']
nearby_venues = pd.json_normalize(venues)
filtered_cols = ['venue.name', 'venue.catergories', 'venue.location.lat', 'venue.location.lng']
nearby_venues = nearby_venues.loc[:, filtered_cols]
nearby_venues['venue.categories'] = nearby_venues.apply(getCAT, axis=1)
nearby_venues.columns = [col.split(".")[-1] for col in nearby_venues.columns]
nearby_venues.head()
I get KeyError: 'Passing list-likes to .loc or [] with any missing labels is no longer supported'.
If I comment out that part, it runs fine but with limited results. What am I doing wrong?

pandas.DataFrame.loc
property DataFrame.loc
Access a group of rows and columns by label(s) or a boolean array.
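The error means that at least one label in filtered_cols does not exist as a column. Here 'venue.catergories' is misspelled; json_normalize produces a column named 'venue.categories'. Correct the spelling in the line filtered_cols=['venue.name', 'venue.catergories', 'venue.location.lat', 'venue.location.lng'] and keep the venue. prefix, because the flattened column names include it.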
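One way to find the offending label is to compare the requested names with the columns that actually exist before calling `.loc`. The sketch below assumes pandas 1.0 or later and uses a trimmed, hypothetical stand-in for `results['response']['groups'][0]['items']`; the names `wanted`, `missing`, and `present` are illustrative.

```python
import pandas as pd

# Trimmed, hypothetical stand-in for results['response']['groups'][0]['items'].
venues = [{
    'venue': {
        'name': 'KFC',
        'location': {'lat': 6.604589745106469, 'lng': 3.3089358809010045},
        'categories': [{'name': 'Fast Food Restaurant'}],
    },
}]
nearby_venues = pd.json_normalize(venues)

wanted = ['venue.name', 'venue.catergories', 'venue.location.lat', 'venue.location.lng']

# Labels that .loc would reject because no such column exists.
missing = [col for col in wanted if col not in nearby_venues.columns]
print(missing)

# Select only the labels that are really present.
present = [col for col in wanted if col in nearby_venues.columns]
subset = nearby_venues.loc[:, present]
print(subset)
```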

Removing duplicate entries?

I need to compare values from different rows. Each row is a dictionary, and I need to compare the values in adjacent rows for the key 'flag'. How would I do this? Simply saying:
for row in range(1,len(myjson)):
    if row['flag'] == (row-1)['flag']:
        print("yes")
returns a TypeError: 'int' object is not subscriptable
Even though range returns a list of ints...
RESPONSE TO COMMENTS:
List of rows is a list of dictionaries. Originally, I import a tab-delimited file and read it in with csv.DictReader, so it is a list of dictionaries with keys corresponding to the variable names.
Code: (where myjson is a list of dictionaries)
for row in myjson:
    print(row)
Output:
{'website': '', 'phone': '', 'flag': 0, 'name': 'Diane Grant Albrecht M.S.', 'email': ''}
{'website': 'www.got.com', 'phone': '111-222-3333', 'flag': 1, 'name': 'Lannister G. Cersei M.A.T., CEP', 'email': 'cersei#got.com'}
{'website': '', 'phone': '', 'flag': 2, 'name': 'Argle D. Bargle Ed.M.', 'email': ''}
{'website': 'www.daManWithThePlan.com', 'phone': '000-000-1111', 'flag': 3, 'name': 'Sam D. Man Ed.M.', 'email': 'dman123#gmail.com'}
{'website': '', 'phone': '', 'flag': 3, 'name': 'Sam D. Man Ed.M.', 'email': ''}
{'website': 'www.daManWithThePlan.com', 'phone': '111-222-333', 'flag': 3, 'name': 'Sam D. Man Ed.M.', 'email': 'dman123#gmail.com'}
{'website': '', 'phone': '', 'flag': 4, 'name': 'D G Bamf M.S.', 'email': ''}
{'website': '', 'phone': '', 'flag': 5, 'name': 'Amy Tramy Lamy Ph.D.', 'email': ''}
Also:
type(myjson)
<type 'list'>

For comparing adjacent items you can use zip:
Example:
>>> lis = [1,1,2,3,4,4,5,6,7,7]
>>> for x, y in zip(lis, lis[1:]):
...     if x == y:
...         print(x, y, 'are equal')
...
1 1 are equal
4 4 are equal
7 7 are equal
For your list of dictionaries, you can do something like:
it1 = iter(list_of_dicts)
it2 = iter(list_of_dicts)
next(it2, None)  # advance the second iterator by one item
for x, y in zip(it1, it2):
    if x['flag'] == y['flag']:
        print("yes")
Update:
For more than 2 adjacent items you can use itertools.groupby:
>>> from itertools import groupby
>>> lis = [1,1,1,1,1,2,2,3,4]
>>> for k, group in groupby(lis):
...     print(list(group))
...
[1, 1, 1, 1, 1]
[2, 2]
[3]
[4]
For your code it would be:
>>> for k, group in groupby(list_of_dicts, key=lambda x: x['flag']):
...     print(list(group))
...
[{'website': '', 'phone': '', 'flag': 0, 'name': 'Diane Grant Albrecht M.S.', 'email': ''}]
[{'website': 'www.got.com', 'phone': '111-222-3333', 'flag': 1, 'name': 'Lannister G. Cersei M.A.T., CEP', 'email': 'cersei#got.com'}]
[{'website': '', 'phone': '', 'flag': 2, 'name': 'Argle D. Bargle Ed.M.', 'email': ''}]
[{'website': 'www.daManWithThePlan.com', 'phone': '000-000-1111', 'flag': 3, 'name': 'Sam D. Man Ed.M.', 'email': 'dman123#gmail.com'}, {'website': '', 'phone': '', 'flag': 3, 'name': 'Sam D. Man Ed.M.', 'email': ''}, {'website': 'www.daManWithThePlan.com', 'phone': '111-222-333', 'flag': 3, 'name': 'Sam D. Man Ed.M.', 'email': 'dman123#gmail.com'}]
[{'website': '', 'phone': '', 'flag': 4, 'name': 'D G Bamf M.S.', 'email': ''}]
[{'website': '', 'phone': '', 'flag': 5, 'name': 'Amy Tramy Lamy Ph.D.', 'email': ''}]

Your exception indicates that list_of_rows is not what you think it is.
To look at other, adjacent rows, provided list_of_rows is indeed a list, I'd use enumerate() to include the current index and then use that index to load next and previous rows:
for i, row in enumerate(list_of_rows):
    previous_row = list_of_rows[i - 1] if i else None
    next_row = list_of_rows[i + 1] if i + 1 < len(list_of_rows) else None

Looks like you want to access list elements in batches:
http://code.activestate.com/recipes/303279/

You could try this:
pre_item = list_of_rows[0]['flag']
for row in list_of_rows[1:]:
    if row['flag'] == pre_item:
        print("yes")
    # remember the current flag for the next comparison
    pre_item = row['flag']

list_of_rows = [{'a': 'foo', 'flag': 'bar'},
                {'a': 'blo', 'flag': 'bar'}]
for row, successor_row in zip(list_of_rows, list_of_rows[1:]):
    if row['flag'] == successor_row['flag']:
        print("yes")

If you need to remove the dicts that share a value for the key "flag", as the title of your post suggests (the title is somewhat misleading, because your dictionaries are not strictly duplicates), loop over the whole list of dictionaries and keep track of the flags already seen in a separate list. If an item has a flag that is already in that list, don't add it. It would look something like:
def filterDicts(listOfDicts):
    result = []
    flags = []
    for di in listOfDicts:
        if di["flag"] not in flags:
            result.append(di)
            flags.append(di["flag"])
    return result
When called with the list of dictionaries that you provided, it returns a list with 6 items, each with a unique value of flag.
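For longer lists, a set makes the membership check cheaper than scanning a list of flags. The following is a minimal sketch of that variation, assuming the flag values are hashable; `filter_by_flag` is a hypothetical name, and the rows are trimmed stand-ins that follow the flag sequence in the question's data.

```python
def filter_by_flag(rows):
    """Keep the first row seen for each distinct 'flag' value."""
    seen = set()
    result = []
    for row in rows:
        if row["flag"] not in seen:
            seen.add(row["flag"])
            result.append(row)
    return result

# Trimmed, hypothetical rows with the same flag sequence as the question's data.
rows = [{"flag": flag, "name": name} for flag, name in
        [(0, "Diane"), (1, "Cersei"), (2, "Argle"), (3, "Sam"),
         (3, "Sam"), (3, "Sam"), (4, "Bamf"), (5, "Amy")]]

unique_rows = filter_by_flag(rows)
print([row["flag"] for row in unique_rows])
```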
