Removing duplicate entries? - python

I need to compare values from different rows. Each row is a dictionary, and I need to compare the values in adjacent rows for the key 'flag'. How would I do this? Simply saying:
for row in range(1,len(myjson)):
    if row['flag'] == (row-1)['flag']:
        print yes
returns a TypeError: 'int' object is not subscriptable
Even though range returns a list of ints...
RESPONSE TO COMMENTS:
The list of rows is a list of dictionaries. I import a tab-delimited file and read it with csv.DictReader, so each row is a dictionary whose keys are the variable names.
Code: (where myjson is a list of dictionaries)
for row in myjson:
    print row
Output:
{'website': '', 'phone': '', 'flag': 0, 'name': 'Diane Grant Albrecht M.S.', 'email': ''}
{'website': 'www.got.com', 'phone': '111-222-3333', 'flag': 1, 'name': 'Lannister G. Cersei M.A.T., CEP', 'email': 'cersei#got.com'}
{'website': '', 'phone': '', 'flag': 2, 'name': 'Argle D. Bargle Ed.M.', 'email': ''}
{'website': 'www.daManWithThePlan.com', 'phone': '000-000-1111', 'flag': 3, 'name': 'Sam D. Man Ed.M.', 'email': 'dman123#gmail.com'}
{'website': '', 'phone': '', 'flag': 3, 'name': 'Sam D. Man Ed.M.', 'email': ''}
{'website': 'www.daManWithThePlan.com', 'phone': '111-222-333', 'flag': 3, 'name': 'Sam D. Man Ed.M.', 'email': 'dman123#gmail.com'}
{'website': '', 'phone': '', 'flag': 4, 'name': 'D G Bamf M.S.', 'email': ''}
{'website': '', 'phone': '', 'flag': 5, 'name': 'Amy Tramy Lamy Ph.D.', 'email': ''}
Also:
type(myjson)
<type 'list'>

For comparing adjacent items you can use zip:
Example:
>>> lis = [1,1,2,3,4,4,5,6,7,7]
>>> for x,y in zip(lis, lis[1:]):
...     if x == y:
...         print x,y,'are equal'
...
1 1 are equal
4 4 are equal
7 7 are equal
For your list of dictionaries, you can do something like :
from itertools import izip
it1 = iter(list_of_dicts)
it2 = iter(list_of_dicts)
next(it2)  # advance the second iterator so it runs one row ahead
for x,y in izip(it1, it2):
    if x['flag'] == y['flag']:
        print 'yes'
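In Python 3, itertools.izip no longer exists and the built-in zip is already lazy. A minimal sketch of the same adjacent-pair comparison, assuming Python 3 and a list of dicts that all have a 'flag' key (the sample `rows` and the helper name `adjacent_matches` are made up for illustration):

```python
rows = [
    {'name': 'Argle D. Bargle Ed.M.', 'flag': 2},
    {'name': 'Sam D. Man Ed.M.', 'flag': 3},
    {'name': 'Sam D. Man Ed.M.', 'flag': 3},
    {'name': 'D G Bamf M.S.', 'flag': 4},
]


def adjacent_matches(rows, key='flag'):
    """Return index pairs (i, i + 1) whose rows share the same value for key."""
    # zip stops at the shorter argument, so the last row is only ever
    # used as the second element of a pair.
    return [
        (i, i + 1)
        for i, (current, following) in enumerate(zip(rows, rows[1:]))
        if current[key] == following[key]
    ]


print(adjacent_matches(rows))  # [(1, 2)]
```

Returning index pairs rather than printing keeps the comparison reusable, for example to decide later which of the two rows to drop.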
Update:
For more than 2 adjacent items you can use itertools.groupby:
>>> from itertools import groupby
>>> lis = [1,1,1,1,1,2,2,3,4]
>>> for k,group in groupby(lis):
...     print list(group)
...
[1, 1, 1, 1, 1]
[2, 2]
[3]
[4]
For your code it would be :
>>> for k, group in groupby(list_of_dicts, key = lambda x : x['flag']):
...     print list(group)
...
[{'website': '', 'phone': '', 'flag': 0, 'name': 'Diane Grant Albrecht M.S.', 'email': ''}]
[{'website': 'www.got.com', 'phone': '111-222-3333', 'flag': 1, 'name': 'Lannister G. Cersei M.A.T., CEP', 'email': 'cersei#got.com'}]
[{'website': '', 'phone': '', 'flag': 2, 'name': 'Argle D. Bargle Ed.M.', 'email': ''}]
[{'website': 'www.daManWithThePlan.com', 'phone': '000-000-1111', 'flag': 3, 'name': 'Sam D. Man Ed.M.', 'email': 'dman123#gmail.com'}, {'website': '', 'phone': '', 'flag': 3, 'name': 'Sam D. Man Ed.M.', 'email': ''}, {'website': 'www.daManWithThePlan.com', 'phone': '111-222-333', 'flag': 3, 'name': 'Sam D. Man Ed.M.', 'email': 'dman123#gmail.com'}]
[{'website': '', 'phone': '', 'flag': 4, 'name': 'D G Bamf M.S.', 'email': ''}]
[{'website': '', 'phone': '', 'flag': 5, 'name': 'Amy Tramy Lamy Ph.D.', 'email': ''}]
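If the goal is to keep a single row per run of equal flags, the groups can be collapsed directly. The following is only a sketch, assuming Python 3 and that keeping the first row of each run is acceptable; `first_of_each_run` and the shortened sample rows are hypothetical:

```python
from itertools import groupby
from operator import itemgetter


def first_of_each_run(rows, key='flag'):
    """Keep only the first row of every run of consecutive rows sharing key."""
    # groupby only groups *consecutive* equal keys, so the input must
    # already be ordered so that equal flags sit next to each other.
    return [next(group) for _, group in groupby(rows, key=itemgetter(key))]


rows = [
    {'flag': 0, 'name': 'Diane Grant Albrecht M.S.', 'phone': ''},
    {'flag': 3, 'name': 'Sam D. Man Ed.M.', 'phone': '000-000-1111'},
    {'flag': 3, 'name': 'Sam D. Man Ed.M.', 'phone': ''},
    {'flag': 4, 'name': 'D G Bamf M.S.', 'phone': ''},
]

deduped = first_of_each_run(rows)
print([row['flag'] for row in deduped])  # [0, 3, 4]
```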

Your exception indicates that list_of_rows is not what you think it is.
To look at other, adjacent rows, provided list_of_rows is indeed a list, I'd use enumerate() to include the current index and then use that index to load next and previous rows:
for i, row in enumerate(list_of_rows):
    previous = list_of_rows[i - 1] if i else None
    # named next_row so the built-in next() is not shadowed
    next_row = list_of_rows[i + 1] if i + 1 < len(list_of_rows) else None

Looks like you want to access list elements in batches:
http://code.activestate.com/recipes/303279/

You could try this
pre_item = list_of_rows[0]['flag']
for row in list_of_rows[1:]:
    if row['flag'] == pre_item:
        print 'yes'
    pre_item = row['flag']

list_of_rows = [{'a': 'foo', 'flag': 'bar'},
                {'a': 'blo', 'flag': 'bar'}]
for row, successor_row in zip(list_of_rows, list_of_rows[1:]):
    if row['flag'] == successor_row['flag']:
        print "yes"

It's simple. If you need to remove the dicts that share a value for the key "flag", as the title of your post suggests (the title is somewhat misleading, because your dictionaries are not strictly speaking duplicates), loop over the list of dictionaries and keep track of the flags you have already seen in a separate list. If an item's flag is already in that list, skip it. It would look something like:
def filterDicts(listOfDicts):
    result = []
    flags = []
    for di in listOfDicts:
        if di["flag"] not in flags:
            result.append(di)
            flags.append(di["flag"])
    return result
When called with the list of dictionaries you provided, it returns a list of 6 items (flags 0 through 5), each with a unique flag value.
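A variation on the same idea, sketched here under the assumption that the flag values are hashable (ints or strings): tracking seen flags in a set makes each membership test constant time instead of a scan of a list. The name `filter_dicts` and the reduced sample rows are illustrative only.

```python
def filter_dicts(list_of_dicts, key='flag'):
    """Return the first dict seen for each distinct value of key, in order."""
    seen = set()      # values of key encountered so far
    result = []
    for d in list_of_dicts:
        if d[key] not in seen:
            seen.add(d[key])
            result.append(d)
    return result


rows = [{'flag': f} for f in (0, 1, 2, 3, 3, 3, 4, 5)]
unique = filter_dicts(rows)
print(len(unique))  # 6
```

Unlike the adjacent-row comparisons, this also removes repeats that are not next to each other.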

Related

Merge dictionaries in a list of dictionaries by combining the values

Help me please.
I have a queryset from Model.objects.values('name', 'language__name') and it gives me a list of dictionaries:
list = [{'id': 1, 'name': 'Adel', 'language': 'С#'},
{'id': 1, 'name': 'Adel', 'language': 'Python'},
{'id': 5, 'name': 'Dora', 'language': 'С#'},
{'id': 5, 'name': 'Dora', 'language': 'Java'},
{'id': 6, 'name': 'Dars', 'language': 'Python'}];
How can I build a list of dictionaries in which entries with the same id are merged and their language values combined?
I want to get something like this:
list = [{'id': 1, 'name': 'Adel', 'language':['С#','Python']},
{'id': 5, 'name': 'Dora', 'language': ['С#','Java']},
{'id': 6, 'name': 'Dars', 'language': 'Python'}];
I tried this:
mapping = {}
for d in qs:
    try:
        entry = mapping[d['id']]  # raises KeyError
        entry['language__name'].append(d['language__name'])  # raises AttributeError
    except KeyError:
        mapping[d['id']] = d
    except AttributeError:
        entry['language__name'] = [entry['language__name'], d['language__name']]
print(list(mapping.values()))
You can iterate over lst (your list of dicts) and build a dictionary out whose keys are the "id" values of the dicts in lst and whose values are the dicts themselves. In each iteration, check whether the value under the "language" key is already a list: if it is, append to it; if not, replace it with a two-item list. Finally, pass the values of out to the list constructor for the final outcome.
out = {}
for d in lst:
    if d['id'] in out:
        if isinstance(out[d['id']]['language'], list):
            out[d['id']]['language'].append(d['language'])
        else:
            out[d['id']]['language'] = [out[d['id']]['language'], d['language']]
    else:
        out[d['id']] = d
out = list(out.values())
Output:
[{'id': 1, 'name': 'Adel', 'language': ['С#', 'Python']},
{'id': 5, 'name': 'Dora', 'language': ['С#', 'Java']},
{'id': 6, 'name': 'Dars', 'language': 'Python'}]
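The loop above stores the original dicts in out and later modifies them, and it leaves 'language' as a plain string when an id appears only once. If a uniform shape is preferable, one possible variation (a sketch only, and a deliberate departure from the output requested in the question) is to always store a list and to build fresh dicts so the input is untouched. The function name `merge_languages` is hypothetical.

```python
def merge_languages(rows):
    """Merge rows that share an 'id', collecting 'language' values in a list."""
    merged = {}
    for row in rows:
        # setdefault creates the merged entry the first time an id is seen;
        # a new dict is built so the input rows are never modified.
        entry = merged.setdefault(
            row['id'],
            {'id': row['id'], 'name': row['name'], 'language': []},
        )
        entry['language'].append(row['language'])
    return list(merged.values())


rows = [
    {'id': 1, 'name': 'Adel', 'language': 'C#'},
    {'id': 1, 'name': 'Adel', 'language': 'Python'},
    {'id': 6, 'name': 'Dars', 'language': 'Python'},
]
print(merge_languages(rows))
# [{'id': 1, 'name': 'Adel', 'language': ['C#', 'Python']},
#  {'id': 6, 'name': 'Dars', 'language': ['Python']}]
```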

How to get key values in a json object(Python)

This is my json :
{'1': {'name': 'poulami', 'password': 'paul123', 'profession': 'user', 'uid': 'poulamipaul'}, '2': {'name': 'test', 'password': 'testing', 'profession': 'tester', 'uid': 'jarvistester'}}
I want to get a list of all the values of name.
What should be my code in python
d.values() gives all the values; you can then look up the 'name' key of each value.
d = {'1': {'name': 'poulami', 'password': 'paul123', 'profession': 'user', 'uid': 'poulamipaul'}, '2': {'name': 'test', 'password': 'testing', 'profession': 'tester', 'uid': 'jarvistester'}}
[i['name'] for i in d.values()]
['poulami', 'test']
Also note that in Python 3 d.values() returns a view object and not a list, so to get a list use list(d.values()).
That is not JSON format. It is a Python Dictionary.
Iterate over the values of the dictionary(d.values()) and get the name from each item.
d = {'1': {'name': 'poulami', 'password': 'paul123', 'profession': 'user', 'uid': 'poulamipaul'}, '2': {'name': 'test', 'password': 'testing', 'profession': 'tester', 'uid': 'jarvistester'}}
names_list = []
for i in d.values():
    names_list.append(i['name'])
print(names_list)  # ['poulami', 'test']
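If the data really does arrive as JSON text (a string with double-quoted keys) rather than as a Python dictionary, it has to be parsed first. A minimal sketch using the standard json module, with a shortened, made-up `raw` string standing in for the real payload:

```python
import json

# JSON text uses double quotes; the single-quoted form in the question
# is a Python dict literal, not JSON.
raw = '{"1": {"name": "poulami", "uid": "poulamipaul"}, "2": {"name": "test", "uid": "jarvistester"}}'

users = json.loads(raw)  # parse the text into a dict of dicts
names = [user['name'] for user in users.values()]
print(names)  # ['poulami', 'test']
```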

From MongoDB convert from dictionary to row with Pandas

This is a test document coming from MongoDB that I need to convert to MySQL. Sometimes there is more than one entry under "agents"; in that case I need each agent on its own row, and each of those rows should have the same "display_name". For example, Walter should have Gloria on one row and Barb on the next, and both rows should have Walter Mosley under "display_name".
[{'name': 'Loomis, Gloria',
'primaryemail': 'gloria#gmail.com',
'primaryphone': '212-382-1121'},
{'name': 'Hogson, Barb',
'primaryemail': 'bho124#aol.com',
'primaryphone': ''}]
I've tried this but it just splits out the key/values.
a,b,c = [[d[e] for d in test] for e in sorted(test[0].keys())]
print(a,b,c)
This is the original JSON format:
{'_id': ObjectId('58e6ececafb08d6'),
'item_type': 'Contributor',
'role': 0,
'short_bio': 'Walter Mosley (b. 1952)',
'firebrand_id': 1588,
'display_name': 'Walter Mosley',
'first_name': 'Walter',
'last_name': 'Mosley',
'slug': 'walter-mosley',
'updated': datetime.datetime(2020, 1, 7, 8, 17, 11, 926000),
'image': 'https://s3.amazonaws.com/8588-book-contributor.jpg',
'social_media_name': '',
'social_media_link': '',
'website': '',
'agents': [{'name': 'Loomis, Gloria',
'primaryemail': 'gloria#gmail.com',
'primaryphone': '212-382-1121'},
{'name': 'Hogson, Barb',
'primaryemail': 'bho124#aol.com',
'primaryphone': ''}],
'estates': [],
'deleted': False}
If you've an array of dictionaries from your JSON file, try this :
JSON input :
inputJSON = [{'item_type': 'Contributor',
'role': 0,
'short_bio': 'Walter Mosley (b. 1952)',
'firebrand_id': 1588,
'display_name': 'Walter Mosley',
'first_name': 'Walter',
'last_name': 'Mosley',
'slug': 'walter-mosley',
'image': 'https://s3.amazonaws.com/8588-book-contributor.jpg',
'social_media_name': '',
'social_media_link': '',
'website': '',
'agents': [{'name': 'Loomis, Gloria',
'primaryemail': 'gloria#gmail.com',
'primaryphone': '212-382-1121'},
{'name': 'Hogson, Barb',
'primaryemail': 'bho124#aol.com',
'primaryphone': ''}],
'estates': [],
'deleted': False}]
Code :
import copy
finalJSON = []
for each in inputJSON:
    for agnt in each.get('agents'):
        newObj = copy.deepcopy(each)
        newObj['agents'] = agnt
        finalJSON.append(newObj)
print(finalJSON)
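The code above still leaves each agent as a nested dictionary under 'agents'. For a flat table, a possible alternative is to lift the agent's fields up next to "display_name". This is a sketch in plain Python, not the answer's method; `agent_rows` is a hypothetical helper, the sample is cut down to the relevant keys, and contributors with no agents are simply skipped here, which may or may not be what you want.

```python
def agent_rows(contributors):
    """Return one flat dict per (contributor, agent) pair."""
    rows = []
    for contributor in contributors:
        # 'agents' may be missing or empty; treat both as "no rows".
        for agent in contributor.get('agents') or []:
            row = {'display_name': contributor['display_name']}
            row.update(agent)  # adds name, primaryemail, primaryphone
            rows.append(row)
    return rows


contributors = [{
    'display_name': 'Walter Mosley',
    'agents': [
        {'name': 'Loomis, Gloria', 'primaryemail': 'gloria#gmail.com',
         'primaryphone': '212-382-1121'},
        {'name': 'Hogson, Barb', 'primaryemail': 'bho124#aol.com',
         'primaryphone': ''},
    ],
}]

for row in agent_rows(contributors):
    print(row['display_name'], '|', row['name'])
```

A list of flat dicts like this can be passed to pandas.DataFrame(...) to get one row per agent.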

Deleting a value in an array in Python [duplicate]

I've created an array for my output by doing this:
for i in nameList:
    test_array.append({'Name': i, 'Email': memberMail, 'Department': memberDepartment})
However, later in the code, I must remove designated values in my test_array depending on their email. After that, I can easily print out what I need to my csv file.
How do I delete a specific entry from this sort of dictionary list?
For those curious, when I print the array currently it looks like this:
[{'Department': 'Public Works', 'Email': 'joe#xyz.gov', 'Name': 'Joe'}, {'Department': 'Infrastructure', 'Email': 'bob#xyz.gov', 'Name': 'Bob'}, {'Department': 'IT', 'Email': 'suzanne#xyz.gov', 'Name': 'Suzanne'}]
If you don't want to modify test_array, build a filtered copy instead:
filtered_test_array = list(filter(lambda entry: entry['Email'] != 'email#example.com', test_array))
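Try this (it deletes the first entry with a matching email):
for i in range(len(test_array)):
    if test_array[i]['Email'] == 'abc#pqr.com':
        del test_array[i]
        break  # stop here: the indices have shifted after the deletion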
Try like this.
list_ = [{'Department': 'Public Works', 'Email': 'joe#xyz.gov', 'Name': 'Joe'}, {'Department': 'Infrastructure', 'Email': 'bob#xyz.gov', 'Name': 'Bob'}, {'Department': 'IT', 'Email': 'suzanne#xyz.gov', 'Name': 'Suzanne'}]
for val in list_[:]:  # iterate over a copy; removing from the list being looped over skips items
    if val['Email'] == 'joe#xyz.gov':
        list_.remove(val)
Result
[{'Department': 'Infrastructure', 'Email': 'bob#xyz.gov', 'Name': 'Bob'},
{'Department': 'IT', 'Email': 'suzanne#xyz.gov', 'Name': 'Suzanne'}]
You can iterate over a copy of the list and delete the entries that match a given value for a key. Something like this:
for entry in test_array[:]:
    if entry['Email'] == 'suzanne#xyz.gov':
        test_array.remove(entry)
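Calling remove() on a list while looping over that same list can skip the element right after each removal, which matters when two matching entries are adjacent. One way to avoid the problem, sketched here with a hypothetical helper `remove_by_email`, is to rebuild the list contents with a comprehension and assign them back through a slice so that the same list object is updated:

```python
def remove_by_email(entries, email):
    """Remove, in place, every entry whose 'Email' equals email."""
    # Slice assignment replaces the contents of the existing list object,
    # so other references to the list see the change too.
    entries[:] = [entry for entry in entries if entry['Email'] != email]


test_array = [
    {'Department': 'Public Works', 'Email': 'joe#xyz.gov', 'Name': 'Joe'},
    {'Department': 'Infrastructure', 'Email': 'bob#xyz.gov', 'Name': 'Bob'},
    {'Department': 'IT', 'Email': 'suzanne#xyz.gov', 'Name': 'Suzanne'},
]

remove_by_email(test_array, 'joe#xyz.gov')
print([entry['Name'] for entry in test_array])  # ['Bob', 'Suzanne']
```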

Flagging Entries with the Same Names?

I'm working with data where people have entered their names and some contact information. However, since they were unable to enter multiple entries for some of the fields, some people entered their names multiple times, resulting in 'duplicate' entries...
I'm trying to mark duplicate entries by the same user using a variable 'flag'.
For each row, what I want to happen is that if the name entry in the row is NOT the same as the name entry in the next row, the flag entry should increase by one.
How do I do this?
This is the code I currently have:
# FLAG 2
import csv
myjson = []
with(open("ieca_first_col_fake_text.txt", "rU")) as f:
    sheet = csv.DictReader(f,delimiter="\t")
    sheet.fieldnames.append('flag')
    print sheet.fieldnames
    for row in sheet:
        myjson.append(row)
flag_counter = 0
myjson[0]['flag'] = flag_counter
for i in range(len(myjson)-1):
    if myjson[i]['name'] != myjson[i+1]['name']:
        myjson[i+1]['flag'] = flag_counter + 1
    else:
        myjson[i]['flag'] = flag_counter
for i in range(len(myjson)):
    print myjson[i]
This is example data:
name phone email website area degree
Diane Grant Albrecht M.S.
Lannister G. Cersei M.A.T., CEP 111-222-3333 cersei#got.com www.got.com
Argle D. Bargle Ed.M.
Sam D. Man Ed.M. 000-000-1111 dman123#gmail.com www.daManWithThePlan.com
Sam D. Man Ed.M.
Sam D. Man Ed.M. 111-222-333 dman123#gmail.com www.daManWithThePlan.com
D G Bamf M.S.
Amy Tramy Lamy Ph.D.
And this is the output that results from operating on the example data:
['name', 'phone', 'email', 'website', 'flag']
{'website': '', 'phone': '', 'flag': 0, 'name': 'Diane Grant Albrecht M.S.', 'email': ''}
{'website': 'www.got.com', 'phone': '111-222-3333', 'flag': 1, 'name': 'Lannister G. Cersei M.A.T., CEP', 'email': 'cersei#got.com'}
{'website': '', 'phone': '', 'flag': 1, 'name': 'Argle D. Bargle Ed.M.', 'email': ''}
{'website': 'www.daManWithThePlan.com', 'phone': '000-000-1111', 'flag': 0, 'name': 'Sam D. Man Ed.M.', 'email': 'dman123#gmail.com'}
{'website': None, 'phone': '', 'flag': 0, 'name': 'Sam D. Man Ed.M.', 'email': None}
{'website': 'www.daManWithThePlan.com', 'phone': '111-222-333', 'flag': None, 'name': 'Sam D. Man Ed.M.', 'email': ' dman123#gmail.com'}
{'website': '', 'phone': '', 'flag': 1, 'name': 'D G Bamf M.S.', 'email': ''}
{'website': '', 'phone': '', 'flag': 1, 'name': 'Amy Tramy Lamy Ph.D.', 'email': ''}
Note that the flags do not correspond to the desired pattern.
And here is an ideal output (notice the difference in flag entries):
['name', 'phone', 'email', 'website', 'flag']
{'website': '', 'phone': '', 'flag': 0, 'name': 'Diane Grant Albrecht M.S.', 'email': ''}
{'website': 'www.got.com', 'phone': '111-222-3333', 'flag': 1, 'name': 'Lannister G. Cersei M.A.T., CEP', 'email': 'cersei#got.com'}
{'website': '', 'phone': '', 'flag': 2, 'name': 'Argle D. Bargle Ed.M.', 'email': ''}
{'website': 'www.daManWithThePlan.com', 'phone': '000-000-1111', 'flag': 3, 'name': 'Sam D. Man Ed.M.', 'email': 'dman123#gmail.com'}
{'website': None, 'phone': '', 'flag': 3, 'name': 'Sam D. Man Ed.M.', 'email': None}
{'website': 'www.daManWithThePlan.com', 'phone': '111-222-333', 'flag': 3, 'name': 'Sam D. Man Ed.M.', 'email': ' dman123#gmail.com'}
{'website': '', 'phone': '', 'flag': 4, 'name': 'D G Bamf M.S.', 'email': ''}
{'website': '', 'phone': '', 'flag': 5, 'name': 'Amy Tramy Lamy Ph.D.', 'email': ''}
EDIT:
This loop works for me (output as expected):
for i in range(len(myjson)-1):
    if myjson[i]['name'] != myjson[i+1]['name']:
        print "not same" ,myjson[i]['name'] ,' ', myjson[i+1]['name']
        flag_counter = flag_counter + 1
        myjson[i+1]['flag'] = flag_counter
    else:
        print 'equal', myjson[i]['name'] ,' ', myjson[i+1]['name']
        # set the flag on the next row, otherwise the last row of a run keeps flag None
        myjson[i+1]['flag'] = flag_counter
Note that I had to format the csv file by hand (the tabs weren't tabs, but spaces). Make sure it is correct in your file. The names have to match exactly; no additional spaces are allowed.
But I am not sure if this is the only bug, as there are many dangerous 'off-by-one' traps. If it still doesn't work, just update your output and code and we will see!
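One way to sidestep the off-by-one traps is to assign a flag to every row in a single pass, comparing each row only with the one before it. A minimal sketch, assuming Python 3 and rows already loaded as a list of dicts with a 'name' key (`add_flags` is a hypothetical name, and the sample rows are reduced to the name column):

```python
def add_flags(rows, key='name'):
    """Set row['flag'] so that consecutive rows with the same key share a flag."""
    flag = 0
    for i, row in enumerate(rows):
        # Bump the counter whenever the key differs from the previous row.
        if i > 0 and row[key] != rows[i - 1][key]:
            flag += 1
        row['flag'] = flag  # every row gets a flag, including the first
    return rows


rows = [{'name': n} for n in (
    'Diane Grant Albrecht M.S.',
    'Lannister G. Cersei M.A.T., CEP',
    'Argle D. Bargle Ed.M.',
    'Sam D. Man Ed.M.',
    'Sam D. Man Ed.M.',
    'Sam D. Man Ed.M.',
    'D G Bamf M.S.',
    'Amy Tramy Lamy Ph.D.',
)]

print([row['flag'] for row in add_flags(rows)])  # [0, 1, 2, 3, 3, 3, 4, 5]
```

Because the flag is written on every iteration, no row is left with the None that csv.DictReader fills in for the appended 'flag' field.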
