Python Remove Duplicate Dict

Python Remove Duplicate Dict - python

I am trying to find a way to remove duplicates from a dict list. I don't have to test the entire object contents because the "name" value in a given object is enough to identify duplication (i.e., duplicate name = duplicate object). My current attempt is this;
newResultArray = []
for i in range(0, len(resultArray)):
for j in range(0, len(resultArray)):
if(i != j):
keyI = resultArray[i]['name']
keyJ = resultArray[j]['name']
if(keyI != keyJ):
newResultArray.append(resultArray[i])
, which is wildly incorrect. Grateful for any suggestions. Thank you.

If name is unique, you should just use a dictionary to store your inner dictionaries, with name being the key. Then you won't even have the issue of duplicates, and you can remove from the list in O(1) time.
Since I don't have access to the code that populates resultArray, I'll simply show how you can convert it into a dictionary in linear time. Although the best option would be to use a dictionary instead of resultArray in the first place, if possible.
new_dictionary = {}
for item in resultArray:
new_dictionary[item['name']] = item
If you must have a list in the end, then you can convert back into a dictionary as such:
new_list = [v for k,v in new_dictionary.items()]

Since "name" provides uniqueness... and assuming "name" is a hashable object, you can build an intermediate dictionary keyed by "name". Any like-named dicts will simply overwrite their predecessor in the dict, giving you a list of unique dictionaries.
tmpDict = {result["name"]:result for result in resultArray}
newArray = list(tmpDict.values())
del tmpDict
You could shrink that down to
newArray = list({result["name"]:result for result in resultArray}.values())
which may be a bit obscure.

Related

accelerate comparing dictionary keys and values to strings in list in python

Sorry if this is trivial I'm still learning but I have a list of dictionaries that looks as follow:
[{'1102': ['00576', '00577', '00578', '00579', '00580', '00581']},
{'1102': ['00582', '00583', '00584', '00585', '00586', '00587']},
{'1102': ['00588', '00589', '00590', '00591', '00592', '00593']},
{'1102': ['00594', '00595', '00596', '00597', '00598', '00599']},
{'1102': ['00600', '00601', '00602', '00603', '00604', '00605']}
...]
it contains ~89000 dictionaries. And I have a list containing 4473208 paths. example:
['/****/**/******_1102/00575***...**0CT.csv',
'/****/**/******_1102/00575***...**1CT.csv',
'/****/**/******_1102/00575***...**2CT.csv',
'/****/**/******_1102/00575***...**3CT.csv',
'/****/**/******_1102/00575***...**4CT.csv',
'/****/**/******_1102/00578***...**1CT.csv',
'/****/**/******_1102/00578***...**2CT.csv',
'/****/**/******_1102/00578***...**3CT.csv',
...]
and what I want to do is group each path that contains the grouped values in the dict in the folder containing the key together.
I tried using for loops like this:
grpd_cts = []
for elem in tqdm(dict_list):
temp1 = []
for file in ct_paths:
for key, val in elem.items():
if (file[16:20] == key) and (any(x in file[21:26] for x in val)):
temp1.append(file)
grpd_cts.append(temp1)
but this takes around 30hours. is there a way to make it more efficient? any itertools function or something?
Thanks a lot!

ct_paths is iterated repeatedly in your inner loop, and you're only interested in a little bit of it for testing purposes; pull that out and use it to index the rest of your data, as a dictionary.
What does make your problem complicated is that you're wanting to end up with the original list of filenames, so you need to construct a two-level dictionary where the values are lists of all originals grouped under those two keys.
ct_path_index = {}
for f in ct_paths:
ct_path_index.setdefault(f[16:20], {}).setdefault(f[21:26], []).append(f)
grpd_cts = []
for elem in tqdm(dict_list):
temp1 = []
for key, val in elem.items():
d2 = ct_path_index.get(key)
if d2:
for v in val:
v2 = d2.get(v)
if v2:
temp1 += v2
grpd_cts.append(temp1)
ct_path_index looks like this, using your data:
{'1102': {'00575': ['/****/**/******_1102/00575***...**0CT.csv',
'/****/**/******_1102/00575***...**1CT.csv',
'/****/**/******_1102/00575***...**2CT.csv',
'/****/**/******_1102/00575***...**3CT.csv',
'/****/**/******_1102/00575***...**4CT.csv'],
'00578': ['/****/**/******_1102/00578***...**1CT.csv',
'/****/**/******_1102/00578***...**2CT.csv',
'/****/**/******_1102/00578***...**3CT.csv']}}
The use of setdefault (which can be a little hard to understand the first time you see it) is important when building up collections of collections, and is very common in these kinds of cases: it makes sure that the sub-collections are created on demand and then re-used for a given key.
Now, you've only got two nested loops; the inner checks are done using dictionary lookups, which are close to O(1).
Other optimizations would include turning the lists in dict_list into sets, which would be worthwhile if you made more than one pass through dict_list.

How to replace the first character of all keys in a dictionary?

What is the best way to replace the first character of all keys in a dictionary?
old_cols= {"~desc1":"adjustment1","~desc23":"adjustment3"}
I am trying to get
new_cols= {"desc1":"adjustment1","desc23":"adjustment3"}
I have tried:
for k,v in old_cols.items():
new_cols[k]=old_cols.pop(k[1:])

old_cols = {"~desc1":"adjustment1", "~desc23":"adjustment3"}
new_cols = {}
for k, v in old_cols.items():
new_key = k[1:]
new_cols[new_key] = v

Here it is with a dictionary comprehension:
old_cols= {"~desc1":"adjustment1","~desc23":"adjustment3"}
new_cols = {k.replace('~', ''):old_cols[k] for k in old_cols}
print(new_cols)
#{'desc1': 'adjustment1', 'desc23': 'adjustment3'}

There are many ways to do this with list-comprehension or for-loops. What is important is to understand is that dictionaries are mutable. This basically means that you can either modify the existing dictionary or create a new one.
If you want to create a new one (and I would recommend it! - see 'A word of warning ...' below), both solutions provided in the answers by #Ethem_Turgut and by #pakpe do the trick. I would probably wirte:
old_dict = {"~desc1":"adjustment1","~desc23":"adjustment3"}
# get the id of this dict, just for comparison later.
old_id = id(old_dict)
new_dict = {k[1:]: v for k, v in old_dict.items()}
print(f'Is the result still the same dictionary? Answer: {old_id == id(new_dict)}')
Now, if you want to modify the dictionary in place you might loop over the keys and adapt the key/value to your liking:
old_dict = {"~desc1":"adjustment1","~desc23":"adjustment3"}
# get the id of this dict, just for comparison later.
old_id = id(old_dict)
for k in [*old_dict.keys()]: # note the weird syntax here
old_dict[k[1:]] = old_dict.pop(k)
print(f'Is the result still the same dictionary? Answer: {old_id == id(old_dict)}')
A word of warning for the latter approach:
You should be aware that you are modifying the keys while looping over them. This is in most of the cases problematic and can even lead to a RuntimeError if you loop directly over old_dict. I avoided this by explicitly unpacking the keys into a list and the looping over that list with [*old_dict.keys()].
Why can modifying keys while looping over them be problematic? Imagine for example that you have the keys '~key1' and 'key1' in your old_dict. Now when your loop handles '~key1' it will modify it to 'key1' which already exists in old_dict and thus it will overwrite the value of 'key1' with the value from '~key1'.
So, only use the latter approach if you are certain to not produce issues like the example mentioned here before. If you are uncertain, simply create a new dictionary!

Sorting a list of dict from redis in python

in my current project i generate a list of data, each entry is from a key in redis in a special DB where only one type of key exist.
r = redis.StrictRedis(host=settings.REDIS_AD, port=settings.REDIS_PORT, db='14')
item_list = []
keys = r.keys('*')
for key in keys:
item = r.hgetall(key)
item_list.append(item)
newlist = sorted(item_list, key=operator.itemgetter('Id'))
The code above let me retrieve the data, create a list of dict each containing the information of an entry, problem is i would like to be able to sort them by ID, so they come out in order when displayed on my html tab in the template, but the sorted function doesn't seem to work since the table isn't sorted.
Any idea why the sorted line doesn't work ? i suppose i'm missing something to make it work but i can't find what.
EDIT :
Thanks to the answer in the comments,the problem was that my 'Id' come out of redis as a string and needed to be casted as int to be sorted
key=lambda d: int(d['Id'])

All values returned from redis are apparently strings and strings do not sort numerically ("10" < "2" == True).
Therefore you need to cast it to a numerical value, probably to int (since they seem to be IDs):
newlist = sorted(item_list, key=lambda d: int(d['Id']))

How to add to python dictionary without replacing

the current code I have is category1[name]=(number) however if the same name comes up the value in the dictionary is replaced by the new number how would I make it so instead of the value being replaced the original value is kept and the new value is also added, giving the key two values now, thanks.

You would have to make the dictionary point to lists instead of numbers, for example if you had two numbers for category cat1:
categories["cat1"] = [21, 78]
To make sure you add the new numbers to the list rather than replacing them, check it's in there first before adding it:
cat_val = # Some value
if cat_key in categories:
categories[cat_key].append(cat_val)
else:
# Initialise it to a list containing one item
categories[cat_key] = [cat_val]
To access the values, you simply use categories[cat_key] which would return [12] if there was one key with the value 12, and [12, 95] if there were two values for that key.
Note that if you don't want to store duplicate keys you can use a set rather than a list:
cat_val = # Some value
if cat_key in categories:
categories[cat_key].add(cat_val)
else:
# Initialise it to a set containing one item
categories[cat_key] = set(cat_val)

a key only has one value, you would need to make the value a tuple or list etc
If you know you are going to have multiple values for a key then i suggest you make the values capable of handling this when they are created

It's a little hard to understand your question.
I think you want this:
>>> d[key] = [4]
>>> d[key].append(5)
>>> d[key]
[4, 5]

Depending on what you expect, you could check if name - a key in your dictionary - already exists. If so, you might be able to change its current value to a list, containing both the previous and the new value.
I didn't test this, but maybe you want something like this:
mydict = {'key_1' : 'value_1', 'key_2' : 'value_2'}
another_key = 'key_2'
another_value = 'value_3'
if another_key in mydict.keys():
# another_key does already exist in mydict
mydict[another_key] = [mydict[another_key], another_value]
else:
# another_key doesn't exist in mydict
mydict[another_key] = another_value
Be careful when doing this more than one time! If it could happen that you want to store more than two values, you might want to add another check - to see if mydict[another_key] already is a list. If so, use .append() to add the third, fourth, ... value to it.
Otherwise you would get a collection of nested lists.

You can create a dictionary in which you map a key to a list of values, in which you would want to append a new value to the lists of values stored at each key.
d = dict([])
d["name"] = 1
x = d["name"]
d["name"] = [1] + x

I guess this is the easiest way:
category1 = {}
category1['firstKey'] = [7]
category1['firstKey'] += [9]
category1['firstKey']
should give you:
[7, 9]
So, just use lists of numbers instead of numbers.

Create constants for offsets into a list

I have a list similar to this:
host_info = ['stackoverflow.com', '213.213.214.213', 'comments', 'desriprion', 'age']
Sometimes, I need to remove one or more elements from the list and then return the list to the caller. To remove the comments item as an example I'd do this:
host_info.pop(2)
I'd prefer to use a constant for each item rather than some magic number.
Then I could use:
host_info.pop(comments) # comments is integer 2
What's the best way to enumerate these constants so it's simple to maintain them when the list offsets change?

The easiest approach would be to change your list to a dictionary:
keys_list = ['host',...]
host_info = dict(zip(['stackoverflow.com', '213.213.214.213', 'comments', 'desriprion', 'age'], keys_list))
Now you don't have to worry about the key list and if you remove an item from the host_info dictionary it won't affect the key => value associations
or if you only have one host info list:
host_info = {'host':'stackoverflow.com, 'ip':'213.213.214.213',...}
EDIT:
To remove a key from the host_info use pop:
host_info.pop('host')
If you aren't sure the key is there you can use:
host_info.pop('host',None)
or you can use:
if 'host' in host_info:
del host_info['host']
As a side note AdamKG has a nice comment explaining that you wouldn't want to convert this back to a list. You could instead do this:
host = host_info['host']
This removes the dependence on a "key" mapping list and combines the two lists into a single associative container.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python Remove Duplicate Dict - python

Related

accelerate comparing dictionary keys and values to strings in list in python

How to replace the first character of all keys in a dictionary?

Sorting a list of dict from redis in python

How to add to python dictionary without replacing

Create constants for offsets into a list

Categories

Resources