I am looking to remove duplicates in a Python dictionary, but only where the keys are the same. Here is an example:
original_dict = {'question a' : 'pizza', 'question b' : 'apple', 'question a': 'banana'}
I want to remove the 'question a' item so there would only be one 'question a'. The problem I am facing is that the values are not the same. Any way to do this easily in Python 3.x?
By definition, the dictionary keeps only 1 value per key, so you will not have duplicates. In your example, the last value for the duplicate key is the one that will be kept (that's what "the old value associated with that key is forgotten" means below):
original_dict = {'question a' : 'pizza', 'question b' : 'apple', 'question a': 'banana'}
print(original_dict)
# {'question a': 'banana', 'question b': 'apple'}
From the docs:
It is best to think of a dictionary as a set of key: value pairs, with the requirement that the keys are unique (within one dictionary). [...] If you store using a key that is already in use, the old value associated with that key is forgotten.
A Python dictionary won't allow duplicate keys in the first place. If you create a dictionary literal that contains the same key two or more times, it keeps the last occurrence, irrespective of the value, and drops the other(s).
In this case, 'question a': 'pizza' will be dropped and 'question a': 'banana' will be kept, as it is the last occurrence here.
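The same rule applies when the dictionary is built from a list of pairs rather than a literal. The sketch below uses a made-up `pairs` list for illustration: `dict()` keeps the last value for a repeated key, and if you would rather keep the first one, `setdefault()` only stores a key that is not present yet.

```python
pairs = [('question a', 'pizza'), ('question b', 'apple'), ('question a', 'banana')]

# dict() applies the pairs in order, so the last value for a key wins.
keep_last = dict(pairs)

# setdefault() only stores a value when the key is not present yet,
# so the first value for a key wins.
keep_first = {}
for key, value in pairs:
    keep_first.setdefault(key, value)

print(keep_last)   # {'question a': 'banana', 'question b': 'apple'}
print(keep_first)  # {'question a': 'pizza', 'question b': 'apple'}
```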
I have to filter texts that I process by checking if people's names appear in the text (texts). If they do appear, the texts are appended as nested list of dictionaries to the existing list of dictionaries containing people's names (people). However, since in some texts more than one person's name appears, the child document containing the texts will be repeated and added again. As a result, the child document does not contain a unique ID and this unique ID is very important, regardless of the texts being repeated.
Is there a smarter way of adding a unique ID even if the texts are repeated?
My code:
import uuid
people = [{'id': 1,
           'name': 'Bob',
           'type': 'person',
           '_childDocuments_': [{'text': 'text_replace'}]},
          {'id': 2,
           'name': 'Kate',
           'type': 'person',
           '_childDocuments_': [{'text': 'text_replace'}]},
          {'id': 3,
           'name': 'Joe',
           'type': 'person',
           '_childDocuments_': [{'text': 'text_replace'}]}]
texts = ['this text has the name Bob and Kate',
         'this text has the name Kate only ']
for text in texts:
    childDoc = {'id': str(uuid.uuid1()),  # the id will duplicate when files are repeated
                'text': text}
    for person in people:
        if person['name'] in childDoc['text']:
            person['_childDocuments_'].append(childDoc)
Current output:
[{'id': 1,
'name': 'Bob',
'type': 'person',
'_childDocuments_': [{'text': 'text_replace'},
{'id': '7752597f-410f-11eb-9341-9cb6d0897972', #duplicate ID here
'text': 'this text has the name Bob and Kate'}]},
{'id': 2,
'name': 'Kate',
'type': 'person',
'_childDocuments_': [{'text': 'text_replace'},
{'id': '7752597f-410f-11eb-9341-9cb6d0897972', #duplicate ID here
'text': 'this text has the name Bob and Kate'},
{'id': '77525980-410f-11eb-b667-9cb6d0897972',
'text': 'this text has the name Kate only '}]},
{'id': 3,
'name': 'Joe',
'type': 'person',
'_childDocuments_': [{'text': 'text_replace'}]}]
As you can see in the current output, the text 'this text has the name Bob and Kate' carries the same identifier, '7752597f-410f-11eb-9341-9cb6d0897972', in both places, because the same dictionary is appended twice. But I would like each identifier to be different.
Desired output:
Same as current output, except we want every ID to be different for every appended text even if these texts are the same/duplicates.
Move the generation of the UUID inside the inner loop:
for text in texts:
    for person in people:
        if person['name'] in text:
            childDoc = {'id': str(uuid.uuid1()),
                        'text': text}
            person['_childDocuments_'].append(childDoc)
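A self-contained way to check this change is sketched below. It uses a trimmed-down `people` list (two people, no placeholder child documents), which is a simplification of the question's data, and `child_doc` and `ids` are names chosen for the example.

```python
import uuid

people = [
    {'id': 1, 'name': 'Bob', '_childDocuments_': []},
    {'id': 2, 'name': 'Kate', '_childDocuments_': []},
]
texts = ['this text has the name Bob and Kate',
         'this text has the name Kate only ']

for text in texts:
    for person in people:
        if person['name'] in text:
            # A new dict, with a new id, is built for every match.
            child_doc = {'id': str(uuid.uuid1()), 'text': text}
            person['_childDocuments_'].append(child_doc)

# Collect every child id and compare the count with the number of distinct ids.
ids = [doc['id'] for person in people for doc in person['_childDocuments_']]
print(len(ids), len(set(ids)))
```

If the two printed numbers match, every appended child document received its own id, even though the first text was appended for both Bob and Kate.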
This does not actually ensure that the UUIDs are unique. For that you need to keep a set of used UUIDs; when generating a new one, check whether it is already used and, if it is, generate another. Test that one and repeat until you have either found an unused UUID or exhausted the UUID space.
There is a 1 in 2**61 chance that duplicates are generated. I can't accept collisions as they result in data loss, so when I use UUIDs I put a loop around the generator that looks like this:
used = set()
while True:
    identifier = str(uuid.uuid1())
    if identifier not in used:
        used.add(identifier)
        break
The used set is actually stored persistently. I don't like this code, although I have a program that uses it, because it ends up in an infinite loop when it can't find an unused UUID.
Some document databases provide automatic UUID assignment and they do this for you internally to ensure that a given database instance never ends up with two documents with the same UUID.
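One possible way around the infinite loop mentioned above is to cap the number of retries and fail loudly. This is only a sketch: `new_unique_id` and `max_attempts` are hypothetical names, and the in-memory `used` set stands in for whatever persistent store is actually used.

```python
import uuid


def new_unique_id(used, max_attempts=10):
    """Return a UUID string that is not in `used`, and record it there."""
    for _ in range(max_attempts):
        identifier = str(uuid.uuid1())
        if identifier not in used:
            used.add(identifier)
            return identifier
    # Give up instead of looping forever.
    raise RuntimeError('no unused UUID found after %d attempts' % max_attempts)


used = set()  # in-memory stand-in for the persistent store
first = new_unique_id(used)
second = new_unique_id(used)
print(first != second)
```

Raising an exception keeps the failure visible to the caller, which can then decide whether to retry later or abort.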
I have some complex code which reads values into a nested defaultdict.
Then there is a loop going through the keys in the dictionary and working with them, basically assigning them to another nested defaultdict.
The problem is that when I want to access the values in the dictionary and pass them to a function, I get either an empty {} or something like this: defaultdict(<function tree at 0x2aff774309d8>
I have tried printing the dict so I can see whether it is really empty. Part of my code:
if (not families_data[family]['cell_db']['output']):
    print(rf"Output for {family} is empty.")
    print(dict(families_data[family]['celldb']))
The really fun part is, when this "if" is true, then I get the following output:
Output for adfull is empty.
{'name': 'adfullx05_b', 'family': 'adfull', 'drive_strength': 0.5, 'template': 'adfull', 'category': '', 'pinmap': '', 'output': 'CO S', 'inout': '', 'input': 'A B CI', 'rail_supply': 'VDD VSS', 'well_supply': '', 'description': ''}
If I change the second line in the if to
print(families_data[family]['celldb'])
I get the following output:
defaultdict(<function tree at 0x2b45844059d8>, {'name': 'adfullx05_b', 'family': 'adfull', 'drive_strength': 0.5, 'template': 'adfull', 'category': '', 'pinmap': '', 'output': 'CO S', 'inout': '', 'input': 'A B CI', 'rail_supply': 'VDD VSS', 'well_supply': '', 'description': ''})
Why is the "if" even true, when there is a value 'CO S' in the output key?
Why am I getting {} when trying to access any value like families_data[family]['cell_db']['input'] and passing it to function as a parameter?
What the heck am I doing wrong?
The "cell_db" key in the if statement has an underscore while it does not in the print statement.
This should fix it:
if (not families_data[family]['celldb']['output']):
    print(rf"Output for {family} is empty.")
    print(dict(families_data[family]['celldb']))
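The reason the typo produces an empty {} rather than a KeyError is that a nested defaultdict silently creates any key it is asked for. The sketch below assumes `tree` is defined in the usual recursive way, which is inferred from the repr in the question and not shown there.

```python
from collections import defaultdict


def tree():
    # Assumed definition, inferred from "<function tree at ...>" in the question.
    return defaultdict(tree)


families_data = tree()
families_data['adfull']['celldb']['output'] = 'CO S'

# Misspelled key: no KeyError, a new empty subtree is created instead.
value = families_data['adfull']['cell_db']['output']

print(bool(value))                                  # False
print(dict(value))                                  # {}
print('cell_db' in families_data['adfull'])         # True
print(families_data['adfull']['celldb']['output'])  # CO S
```

Because the lookup itself inserts the misspelled key, `not value` is true and the real data under 'celldb' is never consulted.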
I am trying to loop through two querysets with keys based on dates in the set. Each date has two types of items: Life events and work. The dict should look like this:
Timeline['1980']['event'] = "He was born"
Timeline['1992']['work'] = "Symphony No. 1"
Timeline['1993']['event'] = "He was married"
Timeline['1993']['work'] = "Symphony No. 2"
How do I create this dictionary?
I tried the following:
timeline = defaultdict(list)
for o in opus:
    if o.date_comp_f is not None:
        timeline[o.date]['work'].append(o)
timeline = dict(timeline)
for e in event:
    if e.date_end_y is not None:
        timeline[e.date]['event'].append(e)
timeline = dict(timeline)
I keep getting bad Key errors.
t = {}
t['1980'] = {
    'event': 'He was born',
    'work': 'None'
}
Or
t = {}
t['1980'] = {}
t['1980']['event'] = 'He was born'
t['1980']['work'] = 'None'
I am not sure what you want, but I guess you want to initialize a dictionary where you can make such assignments. You may need something like this:
from collections import defaultdict
# Create empty dict for assignment
Timeline = defaultdict(defaultdict)
# store values
Timeline['1980']['event'] = "He was born"
Timeline['1992']['work'] = "Symphony No. 1"
# Create a regular dict for checking if both are equal
TimelineRegular = {'1980':{'event':"He was born"},'1992':{'work':"Symphony No. 1"}}
# check
print(Timeline==TimelineRegular)
Output:
>>> True
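If several works or events can fall in the same year, as the `.append` calls in the question suggest, each inner value needs to be a list. The following is a minimal sketch under that assumption; the `Item` namedtuple and its `date` and `title` fields are hypothetical stand-ins for the rows of the two querysets.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-ins for the rows of the two querysets.
Item = namedtuple('Item', ['date', 'title'])
opus = [Item('1992', 'Symphony No. 1'), Item('1993', 'Symphony No. 2')]
event = [Item('1980', 'He was born'), Item('1993', 'He was married')]

# A missing year becomes a defaultdict(list); a missing 'work'/'event' becomes [].
timeline = defaultdict(lambda: defaultdict(list))

for o in opus:
    timeline[o.date]['work'].append(o.title)
for e in event:
    timeline[e.date]['event'].append(e.title)

# Convert to plain dicts only after everything has been added.
timeline = {year: dict(kinds) for year, kinds in timeline.items()}

print(timeline['1993'])
# {'work': ['Symphony No. 2'], 'event': ['He was married']}
```

Note that the conversion to a plain dict happens once, at the end. Converting between the two loops, as in the question, turns the structure back into a regular dict, so any year seen only in the second loop raises a KeyError.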
timeline = {'1980':{'event':'He was born', 'work':'None'}, '1992':{'event':'None', 'work':'Symphony No. 1'}, '1993':{'event':'He was married', 'work':'Symphony No. 2'}}
With results:
>>> timeline['1980']['event']
'He was born'
>>> timeline['1992']['work']
'Symphony No. 1'
>>> timeline['1993']['event']
'He was married'
>>> timeline['1993']['work']
'Symphony No. 2'
This is a nested dictionary: the outer dictionary maps dates to inner dictionaries, and each inner dictionary maps 'work' or 'event' to the final value.
And to add more:
>>> timeline['2019'] = {'event':'Asked stackoverflow question', 'work':'unknown'}
>>> timeline
{'1980': {'event': 'He was born', 'work': 'None'}, '1992': {'event': 'None', 'work': 'Symphony No. 1'}, '1993': {'event': 'He was married', 'work': 'Symphony No. 2'}, '2019': {'event': 'Asked stackoverflow question', 'work': 'unknown'}}
When you add a new key, make its value a dictionary with placeholders for each future key:
timeline['year'] = {'work':'', 'event':''}
or just an empty dictionary, though you may end up with missing keys later
timeline['year'] = {}
I have to solve the following task. First of all, I'm using pyscard (a Python module for interacting with smart cards) to query the smart card readers that are connected to the host. This works just fine and gives me a list of the connected readers.
To make this list available to Puppet via Facter, I need the list in key:value form, which I can then convert with json.dumps(list) and use through a custom fact.
The actual question is: how can I add the keys (0..8) to the given values from the pyscard list?
In the end, the output should look something like:
Reader 0: REINER SCT cyberJack ecom_a (0856136421) 00 00
Thanks in advance
Use a dictionary comprehension to convert the list to a dict.
lis = ["reader one", "reader two", "reader three"]
d={'reader '+str(k):v for k,v in enumerate(lis)}
Output:
{'reader 0': 'reader one',
'reader 1': 'reader two',
'reader 2': 'reader three'}
One way is to use zip
>>> l0 = ["reader one", "reader two", "reader three"]
>>> dict(zip(range(len(l0)),l0))
{0: 'reader one', 1: 'reader two', 2: 'reader three'}
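For the `json.dumps` step mentioned in the question, a rough sketch might look like this. The `readers` list is a stand-in for what pyscard returns (the second name is made up), and the `str()` call is there on the assumption that the real list may hold reader objects instead of plain strings.

```python
import json

# Stand-in for the list returned by pyscard; the second name is made up.
readers = ['REINER SCT cyberJack ecom_a (0856136421) 00 00',
           'Example Reader 01 00']

# Number the readers, converting each entry to a string first.
numbered = {index: str(reader) for index, reader in enumerate(readers)}

# JSON object keys are always strings, so json.dumps writes 0 as "0".
as_json = json.dumps(numbered)
print(as_json)
```

Since the integer keys come back as strings after a round trip through JSON, it may be simpler to build string keys such as 'reader 0' from the start, as in the comprehension above.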
I have the following lists in Python: ["john","doe","1","90"] and ["prince","2","95"]. The first numeric field is the id and the second numeric field is the score. I would like to use re in Python to parse out the fields and print them. So far, I only know how to split the fields on commas. Can anyone help?
You would be better off using a dictionary than a regex (I don't see how a regex would help here):
{'name': 'John Doe', 'id': '1', 'score': '90'}
Or better yet, use numbers:
{'name': 'John Doe', 'id': 1, 'score': 90}
You don't really need regular expression here. You can just use isinstance() and slicing.
This should do what you want:
a_list = ['john', 'doe', '1', '90']
for i, elem in enumerate(a_list):
    try:
        elem = int(elem)
    except ValueError:
        pass
    if isinstance(elem, int):
        # first numeric element found: everything before it is the name
        names_part = a_list[:i]
        id_and_score = a_list[i:]
        break
print('name(s): {0}, id: {id}, score: {score}'.format(' '.join(names_part), id=id_and_score[0], score=id_and_score[1]))
That said, this solution could be improved if we knew the source of your data. If there is a way to predict the field positions, you can simply turn your list into a dict as suggested. If you extract the data yourself, consider building a dict instead of a list in the first place, which saves you from having to do the above.
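If the id and score are always the last two items, which is an assumption based on the two example lists, the conversion to a dict can be sketched with extended unpacking. `parse_record` is a hypothetical helper name.

```python
def parse_record(fields):
    """Split a list like ['john', 'doe', '1', '90'] into a dict."""
    # Assumption: id and score are always the last two items,
    # and everything before them belongs to the name.
    *names, id_, score = fields
    return {'name': ' '.join(names), 'id': int(id_), 'score': int(score)}


print(parse_record(['john', 'doe', '1', '90']))
# {'name': 'john doe', 'id': 1, 'score': 90}
print(parse_record(['prince', '2', '95']))
# {'name': 'prince', 'id': 2, 'score': 95}
```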