How to create dictionary with list with regex and defaultdict

I have the list of dictionaries below:
my = [{'Name':'Super', 'Gender':'Male', 'UNNO':111234},
{'Name':'Spider', 'Gender':'Male', 'UNNO':11123},
{'Name':'Bat', 'Gender':'Female', 'UNNO':113456},
{'Name':'pand', 'Gender':'Female', 'UNNO':13456}]
The unique number is the value of the key "UNNO" in each dictionary.
A UNNO is valid only if it has exactly 6 digits and starts with 11.
Expected output:
my_dict_list = {'Male':['Super'], 'Female':['Bat']}
Original code without regex:
d = {}
for i in my:
    if str(i['UNNO']).startswith('11') and len(str(i['UNNO'])) == 6:
        # To get {'Male':['Super'], 'Female':['Bat']}
        d[i['Gender']] = [i['Name']]
How can I write this with the help of a regex? I wrote a regular expression; how do I complete it with the help of defaultdict?
import re
from collections import defaultdict
# regular expression
rx = re.compile(r'^(?=\d{6}$)(?P<Male>11\d+)|(?P<Female>11\d+)')
# output dict
output = defaultdict(list)

To solve this with regex matching, use the following approach:
import re
from collections import defaultdict
my_list = [{'Name': 'Super', 'Gender': 'Male', 'UNNO': 111234},
{'Name': 'Spider', 'Gender': 'Male', 'UNNO': 11123},
{'Name': 'Bat', 'Gender': 'Female', 'UNNO': 113456},
{'Name': 'pand', 'Gender': 'Female', 'UNNO': 13456}]
genders = defaultdict(list)
pat = re.compile(r'^11\d{4}$') # crucial pattern to validate `UNNO` number
for d in my_list:
    if pat.search(str(d['UNNO'])):
        genders[d['Gender']].append(d['Name'])
print(dict(genders)) # {'Male': ['Super'], 'Female': ['Bat']}
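As a variation on the answer above, here is a minimal sketch that wraps the same idea in a function and uses re.fullmatch, so the pattern needs no ^ and $ anchors. The names UNNO_PATTERN and group_names_by_gender and the extra 'Wonder' record are made up for illustration; they are not from the question.

```python
import re
from collections import defaultdict

# Hypothetical constant: "11" followed by exactly four more digits (6 digits total)
UNNO_PATTERN = re.compile(r'11\d{4}')

def group_names_by_gender(records):
    """Group 'Name' values by 'Gender' for records whose UNNO is valid."""
    grouped = defaultdict(list)
    for record in records:
        # fullmatch succeeds only if the entire string matches the pattern
        if UNNO_PATTERN.fullmatch(str(record['UNNO'])):
            grouped[record['Gender']].append(record['Name'])
    return dict(grouped)

sample = [
    {'Name': 'Super', 'Gender': 'Male', 'UNNO': 111234},
    {'Name': 'Spider', 'Gender': 'Male', 'UNNO': 11123},
    {'Name': 'Bat', 'Gender': 'Female', 'UNNO': 113456},
    {'Name': 'pand', 'Gender': 'Female', 'UNNO': 13456},
    {'Name': 'Wonder', 'Gender': 'Female', 'UNNO': 119999},  # made-up extra record
]
result = group_names_by_gender(sample)
print(result)  # {'Male': ['Super'], 'Female': ['Bat', 'Wonder']}
```

The extra record shows why defaultdict(list) with append is used: several valid names of the same gender accumulate in one list instead of overwriting each other.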

Related

Python - Compare lists of dictionaries and return not matches of one of the keys

I want to compare 2 lists (with dictionaries inside) and get values from the dictionaries that don't match.
So I have something like this:
list1 = [{'text': 'dog', 'number': 10},{'text': 'cat', 'number': 40},{'text': 'horse', 'number': 40}]
list2 = [{'text': 'dog'}]
And I want to get the texts that are not on both lists. Texts are the only criteria. It's not relevant if the numbers are the same or not.
The desired result would look like this:
list_notmatch = [{'text': 'cat'},{'text': 'horse'}]
If it's easier or faster, this would be OK too:
list_notmatch = [{'text': 'cat', 'number': 40},{'text': 'horse', 'number': 40}]
I've seen a similar question (Compare two lists of dictionaries in Python. Return non match), but its output is not exactly what I need, and I don't know if it's the best solution for my case.
The real lists are quite long (there could be more than 10.000 dictionaries inside list1), so I guess I need a performant solution (or at least a not very slow one).
Order is not important.
Thanks!

The first form of output:
Build a set of the 'text' values from each list, then use the symmetric_difference method or the ^ (xor) operator:
>>> {d['text'] for d in list1} ^ {d['text'] for d in list2}
{'horse', 'cat'}
>>> {d['text'] for d in list1}.symmetric_difference({d['text'] for d in list2})
{'horse', 'cat'}
>>> [{'text': v} for v in _]
[{'text': 'horse'}, {'text': 'cat'}]
Both approaches can be tuned slightly (the timings below assume from timeit import timeit and from operator import itemgetter). If you use the operator, you can put the shorter set on the left:
>>> timeit(lambda: {d['text'] for d in list1} ^ {d['text'] for d in list2})
0.59890600000017
>>> timeit(lambda: {d['text'] for d in list2} ^ {d['text'] for d in list1})
0.5732289999996283
If you use the symmetric_difference method, you can pass it a generator expression or a map object to avoid explicitly creating a second set:
>>> timeit(lambda: {d['text'] for d in list1}.symmetric_difference({d['text'] for d in list2}))
0.6045051000000967
>>> timeit(lambda: {d['text'] for d in list1}.symmetric_difference(map(itemgetter('text'), list2)))
0.579385199999706
The second form of output:
A simple way to get the original dictionaries from the lists is:
Create a lookup dictionary for each list, where each key is the 'text' of a dictionary and the value is that dictionary.
The views returned by dict.keys() support set operators in Python 3, so subtract them in both directions to get the two difference sets, then fetch the original dictionaries from the two lookup dictionaries according to the results.
>>> dict1 = {d['text']: d for d in list1}
>>> dict2 = {d['text']: d for d in list2}
>>> dict1_keys = dict1.keys() # key views support set operators such as - and ^
>>> dict2_keys = dict2.keys() # ditto
>>> [dict1[k] for k in dict1_keys - dict2_keys] + [dict2[k] for k in dict2_keys - dict1_keys]
[{'text': 'horse', 'number': 40}, {'text': 'cat', 'number': 40}]
Note that using the xor operator to obtain the symmetric difference directly may not be ideal here, because you still need to look up each result in one of the two dictionaries. If you want to use the xor operator anyway, you can merge the two dictionaries (the dict | dict merge operator requires Python 3.9+) and take the values from the merged dictionary:
>>> list(map((dict1 | dict2).__getitem__, dict1_keys ^ dict2_keys))
[{'text': 'horse', 'number': 40}, {'text': 'cat', 'number': 40}]

You can do this in O(N+M) time:
list1 = [{'text': 'dog', 'number': 10},{'text': 'cat', 'number': 40},{'text': 'horse', 'number': 40}]
list2 = [{'text': 'dog'}]
matched = {}
no_match = []
for i in list2:
    matched[i['text']] = []
for i in list1:
    if i['text'] in matched:
        matched[i['text']].append(i)
    else:
        no_match.append(i)
matched = matched.values()
print(matched, no_match)
Output:
dict_values([[{'text': 'dog', 'number': 10}]]) [{'text': 'cat', 'number': 40}, {'text': 'horse', 'number': 40}]

I would use set arithmetic in the following way:
list1 = [{'text': 'dog', 'number': 10},{'text': 'cat', 'number': 40},{'text': 'horse', 'number': 40}]
list2 = [{'text': 'dog'}]
texts1 = set(i['text'] for i in list1)
texts2 = set(i['text'] for i in list2)
texts = texts1.symmetric_difference(texts2)
list_notmatch1 = [{"text":i} for i in texts]
list_notmatch2 = [i for i in list1+list2 if i['text'] in texts]
print(list_notmatch1)
print(list_notmatch2)
output
[{'text': 'horse'}, {'text': 'cat'}]
[{'text': 'cat', 'number': 40}, {'text': 'horse', 'number': 40}]
Explanation: I create a set of texts from each list, then use symmetric_difference, which is documented as:
Return the symmetric difference of two sets as a new set.
(i.e. all elements that are in exactly one of the sets.)
The resulting texts can then be used to build the first format, or to filter the concatenation of list1 and list2 to get the second format.

You can try this:
list1 = [{'text': 'dog', 'number': 10},{'text': 'cat', 'number': 40},{'text': 'horse', 'number': 40}]
list2 = [{'text': 'dog'}]
result = []
for d1 in list1:
    if not any(d2['text'] == d1['text'] for d2 in list2):
        result.append(d1)
print(result)
Output:
[{'text': 'cat', 'number': 40}, {'text': 'horse', 'number': 40}]
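The loop above rescans list2 for every element of list1. Since the question mentions lists with more than 10,000 entries, a possible variation is to collect the texts of list2 into a set first. This is only a sketch of that idea; the name texts_in_list2 is made up here. Like the loop above, it only reports entries of list1 that are missing from list2, not the other direction.

```python
list1 = [{'text': 'dog', 'number': 10},
         {'text': 'cat', 'number': 40},
         {'text': 'horse', 'number': 40}]
list2 = [{'text': 'dog'}]

# Build the lookup set once so each membership test is O(1) on average
texts_in_list2 = {d['text'] for d in list2}

# Keep the dictionaries from list1 whose 'text' never appears in list2
result = [d for d in list1 if d['text'] not in texts_in_list2]
print(result)
# [{'text': 'cat', 'number': 40}, {'text': 'horse', 'number': 40}]
```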

Count frequency of words inside a list in a dictionary

I have a list of common keywords:
common_keywords = ['dog', 'person', 'cat']
And a list of dictionaries, containing keywords and sometimes the common_keywords listed above:
people = [{'name':'Bob', 'keywords': ['dog', 'dog', 'car', 'trampoline']},
{'name':'Kate', 'keywords': ['cat', 'jog', 'tree', 'flower']},
{'name':'Sasha', 'keywords': ['cooking', 'stove', 'person', 'cat']}]
I would like to count the frequency of the common_keywords for each person, so the desired output would look something like:
counts = [{'name': 'Bob', 'counts': [{'dog': 2}]},
{'name': 'Kate', 'counts': [{'cat': 1}]},
{'name': 'Sasha', 'counts': [{'person':1}, {'cat': 1}]}]
I am able to use dict(Counter()) to count the keywords and filter them if they appear in the common_keywords but I am struggling with linking these counts back to the original name as shown in the desired output: counts.
Current code (I think I am slowly getting there):
freq_dict = {}
for p in people:
    name = p['name']
    for c in p['keywords']:
        if c not in freq_dict:
            freq_dict[name] = {c: 1}
        else:
            if c not in freq_dict[name]:
                freq_dict[c] = 1
            else:
                freq_dict[c] +=1

You can use a list comprehension along with collections.Counter, which does the counting for the nested list:
from collections import Counter
[{'name':i.get('name'),
'keywords':[dict(Counter([j for j in i.get('keywords')
if j in common_keywords]))]} for i in people]
[{'name': 'Bob', 'keywords': [{'dog': 2}]},
{'name': 'Kate', 'keywords': [{'cat': 1}]},
{'name': 'Sasha', 'keywords': [{'person': 1, 'cat': 1}]}]
First, the list comprehension rebuilds the original list of dicts, defining each key explicitly and reading its value with i.get('key'). This lets you work with the nested list stored under keywords.
Iterate over that list and keep only the items that are in common_keywords.
Pass the filtered list to collections.Counter and convert the result to a dict.
Return it as a list with a single dict inside. Note that this puts all counts for a person into one dict under the key 'keywords', rather than one dict per keyword under 'counts' as in the question.
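If the exact shape from the question is needed (a 'counts' key holding one single-key dict per keyword), the same Counter idea can be adapted. The following is a sketch under that assumption; the variable names common and tally are made up for illustration.

```python
from collections import Counter

common_keywords = ['dog', 'person', 'cat']
people = [
    {'name': 'Bob', 'keywords': ['dog', 'dog', 'car', 'trampoline']},
    {'name': 'Kate', 'keywords': ['cat', 'jog', 'tree', 'flower']},
    {'name': 'Sasha', 'keywords': ['cooking', 'stove', 'person', 'cat']},
]

common = set(common_keywords)  # set membership avoids rescanning the list

counts = []
for person in people:
    # Count only the keywords that are in the common set
    tally = Counter(k for k in person['keywords'] if k in common)
    # Emit one single-key dict per keyword, in first-seen order (Python 3.7+)
    counts.append({'name': person['name'],
                   'counts': [{word: n} for word, n in tally.items()]})

print(counts)
```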

Python - Use a list containing keys to make a dictionary out of a string

With something like this:
keys = ["Name:", "Date:", "Time:sec:", "Room"]
string = "Name:BobDate:1/3Time:sec:3:00:00RoomA1"
How can I get a dictionary like:
dict1 = {"Name" : "Bob", "Date" : "1/3", "Time:sec" : "3:00:00", "Room" : "A1"}
Removing the colon is optional.
I am able to remove keys from the string entirely using re.split(), .join(), and map() but I want to create a dictionary instead.

Just a regex way...
dict(zip(keys, re.match('(.*)'.join(keys + [""]), string).groups()))
Demo:
>>> if 1:
    import re
    keys = ["Name:", "Date:", "Time:sec:", "Room"]
    string = "Name:BobDate:1/3Time:sec:3:00:00RoomA1"
    dict(zip(keys, re.match('(.*)'.join(keys + [""]), string).groups()))
{'Name:': 'Bob', 'Date:': '1/3', 'Time:sec:': '3:00:00', 'Room': 'A1'}

Given your input:
keys = ["Name:", "Date:", "Time:sec:", "Room"]
string = "Name:BobDate:1/3Time:sec:3:00:00RoomA1"
You can split it using a regular expression that preserves the split keys themselves (this needs import re), e.g.:
split = re.split('({})'.format('|'.join(re.escape(k) for k in keys)), string)
# ['', 'Name:', 'Bob', 'Date:', '1/3', 'Time:sec:', '3:00:00', 'Room', 'A1']
Then, use dict with zipping the appropriate slices (we start from 1 because of the leading empty match), eg:
dct = dict(zip(split[1::2], split[2::2]))
# {'Date:': '1/3', 'Name:': 'Bob', 'Time:sec:': '3:00:00', 'Room': 'A1'}

Using .split() in a loop works:
keys = ["Name:", "Date:", "Time:sec:", "Room"]
s = "Name:BobDate:1/3Time:sec:3:00:00RoomA1"
values = []
temp = s.split(keys[0])[-1]
for key in keys[1:]:
    val, temp = temp.split(key)
    values.append(val)
values.append(temp)
dict1 = dict(zip(keys, values))
print(dict1)
Output:
{'Name:': 'Bob', 'Date:': '1/3', 'Time:sec:': '3:00:00', 'Room': 'A1'}

A one-line approach:
In [40]: dict(zip(keys,[string.split(j)[-1].split(keys[-1])[0] if i == (len(keys) - 1) else string.split(j)[-1].split(keys[i+1])[0] for i,j in enumerate(keys)]))
Out[40]: {'Date:': '1/3', 'Name:': 'Bob', 'Room': 'A1', 'Time:sec:': '3:00:00'}
I know it's a pretty complex approach :), but it shows a different way to answer the question.

We could replace each key in the string with a line break (something to split by) and then build the dictionary with dict(zip(...)):
keys = ["Name:", "Date:", "Time:sec:", "Room"]
string = "Name:BobDate:1/3Time:sec:3:00:00RoomA1"
for key in keys:
    string = string.replace(key,"\n")
d = dict(zip(keys,string.split('\n')[1:])) # 1: to handle first row break
d equals:
{'Date:': '1/3', 'Name:': 'Bob', 'Room': 'A1', 'Time:sec:': '3:00:00'}

You can try this:
import re
keys = ["Name:", "Date:", "Time:sec:", "Room"]
string = "Name:BobDate:1/3Time:sec:3:00:00RoomA1"
new_data = dict(zip(map(lambda x: x.rstrip(':'), keys), filter(None, re.split(r'\*', re.sub('|'.join(keys), '*', string)))))
Output:
{'Name': 'Bob', 'Date': '1/3', 'Time:sec': '3:00:00', 'Room': 'A1'}

Here is my journey on finding the answer
In [17]: import re
In [18]: keys = ["Name:", "Date:", "Time:sec:", "Room"]
In [19]: string = "Name:BobDate:1/3Time:sec:3:00:00RoomA1"
In [20]: separators = '|'.join(keys)
In [21]: separators
Out[21]: 'Name:|Date:|Time:sec:|Room'
In [22]: re.split(separators, string)
Out[22]: ['', 'Bob', '1/3', '3:00:00', 'A1']
In [23]: re.split(separators, string)[1:]
Out[23]: ['Bob', '1/3', '3:00:00', 'A1']
In [24]: values = re.split(separators, string)[1:]
In [25]: dict(zip(keys, values))
Out[25]: {'Date:': '1/3', 'Name:': 'Bob', 'Room': 'A1', 'Time:sec:': '3:00:00'}
In [26]: dict1 = dict(zip(keys, values))
Notes
Lines 20, 21: Join the keys into a single pattern to use in re.split later on. The pipe symbol (|) means "or" in a regular expression.
Line 22: Split the string on that pattern. The result is almost what we want, except that the first element is empty.
Line 23: Drop that empty first element.
Line 24: Assign the result to values, to be used later to construct the dictionary.
Lines 25, 26: Construct the dictionary and assign it to dict1.
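The steps above can be collected into one small function. The sketch below is an illustration only: parse_by_keys is a made-up name, and escaping the keys with re.escape, trying longer keys first, and stripping the trailing colon (which the question calls optional) are assumptions added here.

```python
import re

def parse_by_keys(text, keys):
    """Split `text` on the given keys and return {key without trailing colon: value}."""
    # Escape each key so regex metacharacters are matched literally;
    # try longer keys first so a key that is a prefix of another cannot win.
    ordered = sorted(keys, key=len, reverse=True)
    alternation = '|'.join(re.escape(k) for k in ordered)
    # The capturing group makes re.split keep the matched keys in the result
    parts = re.split('({})'.format(alternation), text)
    # parts[0] is the text before the first key; keys and values then alternate
    return {k.rstrip(':'): v for k, v in zip(parts[1::2], parts[2::2])}

keys = ["Name:", "Date:", "Time:sec:", "Room"]
string = "Name:BobDate:1/3Time:sec:3:00:00RoomA1"
parsed = parse_by_keys(string, keys)
print(parsed)
# {'Name': 'Bob', 'Date': '1/3', 'Time:sec': '3:00:00', 'Room': 'A1'}
```

Like the approaches above, this assumes that no value contains text that looks like one of the keys.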

Python: Increment value of dictionary stored in a list

Simple example here:
I want to have a list which is filled with dictionaries for every type of animal.
The print should look like this:
dictlist_animals = [{'type':'horse','amount':2},
{'type':'monkey','amount':2},
{'type':'cat','amount':1},
{'type':'dog','amount':1}]
Because some animals exist more than once I've added a key named 'amount' which should count how many animals of every type exist.
I am not sure whether the 'if' case is correct, and what should I write in the 'else' case?
dictlist_animals = []
animals = ['horse', 'monkey', 'cat', 'horse', 'dog', 'monkey']
for a in animals:
    if a not in dictlist_animals['type']:
        dictlist_animals.append({'type': a, 'amount' : 1})
    else:
        #increment 'amount' of animal a

It is better to use Counter. It creates a dictionary whose keys are the elements of the animals list and whose values are their counts. You can then use a list comprehension to build the list of dictionaries:
from collections import Counter
animals_dict = [{'type': key, 'amount': value} for key, value in Counter(animals).items()]
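For completeness, here is the same idea as a self-contained script using the animals list from the question. It is a sketch that assumes Python 3.7 or later, where Counter keeps keys in first-seen order; the name animal_counts is made up for illustration.

```python
from collections import Counter

animals = ['horse', 'monkey', 'cat', 'horse', 'dog', 'monkey']

# Counter maps each distinct animal to the number of times it occurs
animal_counts = Counter(animals)

# Turn each (animal, count) pair into the dictionary shape from the question
dictlist_animals = [{'type': animal, 'amount': amount}
                    for animal, amount in animal_counts.items()]
print(dictlist_animals)
# [{'type': 'horse', 'amount': 2}, {'type': 'monkey', 'amount': 2},
#  {'type': 'cat', 'amount': 1}, {'type': 'dog', 'amount': 1}]
```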

Try the code below:
dictlist_animals = []
animals = ['horse', 'monkey', 'cat', 'horse', 'dog', 'monkey']
covered_animals = []
for a in animals:
    if a in covered_animals:
        for dict_animal in dictlist_animals:
            if a == dict_animal['type']:
                dict_animal['amount'] = dict_animal['amount'] + 1
    else:
        covered_animals.append(a)
        dictlist_animals.append({'type': a, 'amount' : 1})
print(dictlist_animals)
[{'type': 'horse', 'amount': 2}, {'type': 'monkey', 'amount': 2}, {'type': 'cat', 'amount': 1}, {'type': 'dog', 'amount': 1}]

You can't index a list with dictlist_animals['type'], because lists are indexed by integers. What you can do is store the counts in an intermediate dictionary and then convert it into the data structure you want:
dictlist_animals = []
animals = ['horse', 'monkey', 'cat', 'horse', 'dog', 'monkey']
animals_count = {}
for a in animals:
    c = animals_count.get(a, 0)
    animals_count[a] = c+1
for animal, amount in animals_count.items():
    dictlist_animals.append({'type': animal, 'amount': amount})
Note that c = animals_count.get(a, 0) gets the current amount for the animal a if it is present, otherwise it returns the default value 0 so that you don't have to use an if/else statement.

You can also use defaultdict.
from collections import defaultdict
d = defaultdict(int)
for animal in animals:
    d[animal] += 1
dictlist_animals = [{'type': key, 'amount': value} for key, value in d.items()]

Use of dictionary in Python

I'm writing a concept learning program, where I need to convert from an index to the name of a category.
For example:
# binary concept learning
# candidate elimination learning algorithm
import numpy as np
import csv
def main():
    d1={0:'0', 1:'Japan', 2: 'USA', 3: 'Korea', 4: 'Germany', 5:'?'}
    d2={0:'0', 1:'Honda', 2: 'Chrysler', 3: 'Toyota', 4:'?'}
    d3={0:'0', 1:'Blue', 2:'Green', 3: 'Red', 4:'White', 5:'?'}
    d4={0:'0', 1:1970,2:1980, 3:1990, 4:2000, 5:'?'}
    d5={0:'0', 1:'Economy', 2:'Sports', 3:'SUV', 4:'?'}
    a=[0,1,2,3,4]
    print(a)
if __name__=="__main__":
    main()
So [0,1,2,3,4] should convert to ['0', 'Honda', 'Green', '1990', '?']. What is the most pythonic way to do this?

I think you need a basic dictionary crash course:
This is a proper dictionary:
>>> d1 = {'tires': 'yoko', 'manufacturer': 'honda', 'vtec': 'no'}
You can access individual items in the dictionary easily:
>>> d1['tires']
'yoko'
>>> d1['vtec'] = 'yes' #mad vtec yo
>>> d1['vtec']
'yes'
Each dictionary entry has two parts, the key and the value:
testDict = {'key':'value'}
You were using a dictionary exactly the same way as a list:
>>> test = {0:"thing0", 1:"thing1"} #dictionary
>>> test[0]
'thing0'
which is pretty much the same as saying
>>> test = ['thing0','thing1'] #list
>>> test[0]
'thing0'
In your particular case, you may want to structure your dictionaries properly. I would suggest something like masterdictionary = {'country': ['germany','france','USA','japan'], 'manufacturer': ['honda','ferrarri','hoopty']} and so on, because you could then access each individual item much more easily.
With that same dictionary:
>>> masterdictionary['country'][1]
'france'
which follows the pattern
dictionaryName['key'][iteminlistindex]
Of course, nothing prevents you from putting dictionaries as values inside of dictionaries, inside values of other dictionaries...

You can do:
data = [d1,d2,d3,d4,d5]
print([d[key] for key, d in zip(a, data)])
The function zip() can be used to combine two iterables, lists in this case.
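Here is the zip() approach as a self-contained Python 3 sketch using the dictionaries from the question. The variable name names is made up for illustration. Note that the fourth result is the integer 1990 rather than the string '1990', because that is how d4 stores it.

```python
d1 = {0: '0', 1: 'Japan', 2: 'USA', 3: 'Korea', 4: 'Germany', 5: '?'}
d2 = {0: '0', 1: 'Honda', 2: 'Chrysler', 3: 'Toyota', 4: '?'}
d3 = {0: '0', 1: 'Blue', 2: 'Green', 3: 'Red', 4: 'White', 5: '?'}
d4 = {0: '0', 1: 1970, 2: 1980, 3: 1990, 4: 2000, 5: '?'}
d5 = {0: '0', 1: 'Economy', 2: 'Sports', 3: 'SUV', 4: '?'}

a = [0, 1, 2, 3, 4]
data = [d1, d2, d3, d4, d5]

# zip pairs the n-th index in `a` with the n-th dictionary in `data`
names = [d[key] for key, d in zip(a, data)]
print(names)  # ['0', 'Honda', 'Green', 1990, '?']
```

If strings are required throughout, wrapping the lookup as str(d[key]) would convert the year as well.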

You've already got the answer to your direct question, but you may wish to consider re-structuring the data. To me, the following makes a lot more sense, and will enable you to more easily index into it for what you asked, and for any possible later queries:
from pprint import pprint
items = [[el.get(i, '?') for el in (d1,d2,d3,d4,d5)] for i in range(6)]
pprint(items)
[['0', '0', '0', '0', '0'],
['Japan', 'Honda', 'Blue', 1970, 'Economy'],
['USA', 'Chrysler', 'Green', 1980, 'Sports'],
['Korea', 'Toyota', 'Red', 1990, 'SUV'],
['Germany', '?', 'White', 2000, '?'],
['?', '?', '?', '?', '?']]

I would use a list of dicts d = [d1, d2, d3, d4, d5], and then a list comprehension:
[d[i][key] for i, key in enumerate(a)]
To make the whole thing more readable, use nested dictionaries - each of your dictionaries seems to represent something you could give a more descriptive name than d1 or d2:
data = {'country': {0: 'Japan', 1: 'USA' ... }, 'brand': {0: 'Honda', ...}, ...}
car = {'country': 1, 'brand': 2 ... }
[data[attribute][key] for attribute, key in car.items()]
Note that before Python 3.7 the result would not necessarily be in order, if that is important; collections.OrderedDict guarantees insertion order on older versions.
As suggested by the comment, a dictionary with contiguous integers as keys can be replaced by a list:
data = {'country': ['Japan', 'USA', ...], 'brand': ['Honda', ...], ...}

If you need to keep d1, d2, etc. as is:
newA = [locals()["d%d"%(i+1)][a_value] for i,a_value in enumerate(a)]
Pretty ugly, and fragile, but it should work with your existing code.

You don't need a dictionary for this at all. Lists in Python already support indexing by position.
def main():
    d1=['0','Japan','USA','Korea','Germany',"?"]
    d2=['0','Honda','Chrysler','Toyota','?']
    d3=['0','Blue','Green','Red','White','?']
    d4=['0', 1970,1980,1990,2000,'?']
    d5=['0','Economy','Sports','SUV','?']
    ds = [d1, d2, d3, d4, d5] #This holds all your lists
    #This is what range is for
    a=range(5)
    #Find the nth index from the nth list, seems to be what you want
    print([ds[n][n] for n in a]) #This is a list comprehension, look it up.
