Suppose I have a set of tuples with people's names. I want to find everyone who shares the same last name, excluding people who don't share their last name with anyone else:
# input
names = set([('John', 'Lee'), ('Mary', 'Miller'), ('Paul', 'Ryan'),
('Bob', 'Ryan'), ('Tina', 'Lee'), ('Bob', 'Smith')])
# expected output
{'Lee': ['Tina', 'John'], 'Ryan': ['Bob', 'Paul']} # or similar
This is what I am using:
def find_family(names):
    result = {}
    try:
        while True:
            name = names.pop()
            if name[1] in result:
                result[name[1]].append(name[0])
            else:
                result[name[1]] = [name[0]]
    except KeyError:
        pass
    return dict(filter(lambda x: len(x[1]) > 1, result.items()))
This looks ugly and inefficient. Is there a better way?
defaultdict can be used to simplify the code:
from collections import defaultdict
def find_family(names):
    d = defaultdict(list)
    for fn, ln in names:
        d[ln].append(fn)
    return dict((k, v) for (k, v) in d.items() if len(v) > 1)
names = set([('John', 'Lee'), ('Mary', 'Miller'), ('Paul', 'Ryan'),
             ('Bob', 'Ryan'), ('Tina', 'Lee'), ('Bob', 'Smith')])
print(find_family(names))
This prints (the order within each list may vary, because the input is a set):
{'Lee': ['Tina', 'John'], 'Ryan': ['Bob', 'Paul']}
Instead of using a while loop, use a for loop (or similar construct) over the set contents (and while you're at it, you can destructure the tuples):
for firstname, surname in names:
    # do your stuff
You might want to use a defaultdict or OrderedDict (http://docs.python.org/library/collections.html) to hold your data in the body of the loop.
>>> names = set([('John', 'Lee'), ('Mary', 'Miller'), ('Paul', 'Ryan'),
... ('Bob', 'Ryan'), ('Tina', 'Lee'), ('Bob', 'Smith')])
You can easily build a dictionary of all the people, keyed by last name, with a for loop:
>>> families = {}
>>> for name, lastname in names:
...     families[lastname] = families.get(lastname, []) + [name]
...
>>> families
{'Miller': ['Mary'], 'Smith': ['Bob'], 'Lee': ['Tina', 'John'], 'Ryan': ['Bob', 'Paul']}
Then, you just need to filter the dictionary with the condition len(names) > 1. This filtering could be done using a "dictionary comprehension":
>>> filtered_families = {lastname: names for lastname, names in families.items() if len(names) > 1}
>>> filtered_families
{'Lee': ['Tina', 'John'], 'Ryan': ['Bob', 'Paul']}
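The grouping loop above builds a new list on every iteration. A minimal sketch of a variant that appends in place with dict.setdefault follows; the variable names first_name, last_name and shared are only illustrative, and the first names are sorted so that the result does not depend on set iteration order:

```python
names = {('John', 'Lee'), ('Mary', 'Miller'), ('Paul', 'Ryan'),
         ('Bob', 'Ryan'), ('Tina', 'Lee'), ('Bob', 'Smith')}

families = {}
for first_name, last_name in names:
    # setdefault returns the list already stored for last_name, or stores
    # and returns a new empty list the first time the key is seen.
    families.setdefault(last_name, []).append(first_name)

# Keep only last names shared by more than one person; sort the first
# names so the output is deterministic.
shared = {last: sorted(firsts) for last, firsts in families.items()
          if len(firsts) > 1}
print(shared)
```

This is close to what defaultdict(list) does, but it needs no import and leaves families as a plain dict.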
Related
I have the following array in Python in the following format:
Array[('John', '123'), ('Alex','456'),('Nate', '789')]
Is there a way I can assign the array variables by field as below?
Name = ['John', 'Alex', 'Nate']
ID = ['123', '456', '789']
In the spirit of "explicit is better than implicit":
data = [('John', '123'), ('Alex', '456'), ('Nate', '789')]
names = [x[0] for x in data]
ids = [x[1] for x in data]
print(names) # prints ['John', 'Alex', 'Nate']
print(ids) # prints ['123', '456', '789']
Or, to be even more explicit:
data = [('John', '123'), ('Alex', '456'), ('Nate', '789')]
NAME_INDEX = 0
ID_INDEX = 1
names = [x[NAME_INDEX] for x in data]
ids = [x[ID_INDEX] for x in data]
This is a compact way to do it using zip:
lst = [('John', '123'), ('Alex','456'),('Nate', '789')]
name, userid = list(zip(*lst))
print(name) # ('John', 'Alex', 'Nate')
print(userid) # ('123', '456', '789')
Note that the results are stored in (immutable) tuples; if you need (mutable) lists, you need to convert them.
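If lists are needed, one way to do the conversion during unpacking is sketched below. It assumes lst is non-empty (unpacking an empty zip would fail); the names names and user_ids, and the appended entry, are only illustrative:

```python
lst = [('John', '123'), ('Alex', '456'), ('Nate', '789')]

# zip(*lst) yields one tuple per field; map(list, ...) turns each
# tuple into a list before the two results are unpacked.
names, user_ids = map(list, zip(*lst))

# Unlike tuples, the lists can be modified in place
# ('Zoe' is a hypothetical extra entry).
names.append('Zoe')
print(names)
print(user_ids)
```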
I'm new to programming and would appreciate if someone can help with the following in Python/Pandas.
I have a dictionary whose values are lists. I'd like to group together the keys that have similar values. I've seen similar questions on here, but the catch in this case is that I want to disregard the order of the values. For example:
classmates={'jack':['20','male','soccer'],'brian':['26','male','tennis'],'charles':['male','soccer','20'],'zulu':['19','basketball','male']}
jack and charles have the same values but in a different order. I'd like an output that groups the keys by value irrespective of order. In this case, the output would be written to a CSV as
['20','male','soccer']: jack, charles
['26','male','tennis']: brian
['19','basketball','male']: zulu
Using frozensets, apply, groupby + agg:
import pandas as pd
s = pd.DataFrame(classmates).T.apply(frozenset, axis=1)
s2 = pd.Series(s.index.values, index=s)\
    .groupby(level=0).agg(lambda x: list(x))
s2
(soccer, 20, male) [charles, jack]
(26, male, tennis) [brian]
(basketball, male, 19) [zulu]
dtype: object
You can invert the dictionary in the way you want with the following code:
classmates={'jack':['20','male','soccer'],'brian':['26','male','tennis'],'charles':['male','soccer','20'],'zulu':['19','basketball','male']}
out_dict = {}
for key, value in classmates.items():
    current_list = out_dict.get(tuple(sorted(value)), [])
    current_list.append(key)
    out_dict[tuple(sorted(value))] = current_list
print(out_dict)
This prints
{('20', 'male', 'soccer'): ['charles', 'jack'], ('26', 'male', 'tennis'): ['brian'], ('19', 'basketball', 'male'): ['zulu']}
from collections import defaultdict
ans = defaultdict(list)
classmates = {'jack': ['20', 'male', 'soccer'],
              'brian': ['26', 'male', 'tennis'],
              'charles': ['male', 'soccer', '20'],
              'zulu': ['19', 'basketball', 'male']
              }
for k, v in classmates.items():
    sorted_tuple = tuple(sorted(v))
    ans[sorted_tuple].append(k)
# ans is now the desired dict:
# defaultdict(<class 'list'>, {('20', 'male', 'soccer'): ['jack', 'charles'],
# ('26', 'male', 'tennis'): ['brian'], ('19', 'basketball', 'male'): ['zulu']})
for k, v in ans.items():
    print(k, ':', v)
# output:
# ('20', 'male', 'soccer') : ['jack', 'charles']
# ('26', 'male', 'tennis') : ['brian']
# ('19', 'basketball', 'male') : ['zulu']
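The question also asks for the result to be written to a CSV. A minimal sketch with the standard csv module might look like the following; the two-column layout, the header names and the in-memory buffer are assumptions made for illustration, not something specified in the question:

```python
import csv
import io
from collections import defaultdict

classmates = {'jack': ['20', 'male', 'soccer'],
              'brian': ['26', 'male', 'tennis'],
              'charles': ['male', 'soccer', '20'],
              'zulu': ['19', 'basketball', 'male']}

# Group the names by their sorted values, as in the answers above.
groups = defaultdict(list)
for name, attributes in classmates.items():
    groups[tuple(sorted(attributes))].append(name)

# Written to an in-memory buffer here; opening a real file, for example
# open("classmates.csv", "w", newline="") (a hypothetical file name),
# would write to disk instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["attributes", "names"])
for attributes, names in groups.items():
    writer.writerow([" ".join(attributes), ", ".join(names)])

print(buffer.getvalue())
```

Joining each group into a single cell keeps one row per group; writing one row per name would be an equally reasonable layout.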
First, convert your dictionary to a pandas DataFrame:
df = pd.DataFrame.from_dict(classmates, orient='index')
Then sort it in ascending order by the first column, which holds the age when every list starts with it:
df = df.sort_values(by=0, ascending=True)
Here 0 is the default column name; you can rename the column.
You could do this in one line (here a is the classmates dictionary):
print({tuple(sorted(v)) : [k for k,vv in a.items() if sorted(vv) == sorted(v)] for v in a.values()})
Or, here is a more detailed solution:
dict_1 = {'jack': ['20', 'male', 'soccer'], 'brian': ['26', 'male', 'tennis'], 'charles': ['male', 'soccer', '20'],
          'zulu': ['19', 'basketball', 'male']}
sorted_dict = {}
for key, value in dict_1.items():
    sorted_1 = sorted(value)
    sorted_dict[key] = sorted_1
tracking_of_duplicate = []
final_dict = {}
for key1, value1 in sorted_dict.items():
    if value1 not in tracking_of_duplicate:
        tracking_of_duplicate.append(value1)
        final_dict[tuple(value1)] = [key1]
    else:
        final_dict[tuple(value1)].append(key1)
print(final_dict)
dictionary = {'0': "Linda", "1": "Anna", "2": 'Theda', "3":'Thelma',"4": 'Thursa',"5" :"Mary"}
dictionary2 = ['Linda', 'Ula', 'Vannie', 'Vertie', 'Mary']
I want to remove from dictionary the values that also appear in dictionary2. I wrote the code like this:
[k for k, v in dictionary.items() if v not in dictionarya]
But it still prints keys whose values appear in both, like this:
['0', '1', '2', '3', '4']
How can I remove all the repeated words so that it prints something like this?
['1', '2', '3', '4']
How can I get just the result of the last loop? Thank you.
To get the keys whose values aren't contained in the other list, you can use the following list comprehension:
>>> [k for k,v in dictionary.items() if v not in dictionary2]
['2', '4', '3', '1']
Just do the following list comprehension:
>>> dictionary = {'0': "Linda", "1": "Anna", "2":"Mary"}
>>> dictionary2 = ['Linda', 'Theda', 'Thelma', 'Thursa', 'Ula', 'Vannie', 'Vertie', 'Mary']
>>> value = [i for i in dictionary2 if i not in dictionary.values()]
>>> value
['Theda', 'Thelma', 'Thursa', 'Ula', 'Vannie', 'Vertie']
>>>
This works:
dictionary = {'0': "Linda", "1": "Anna", "2":"Mary"}
dictionary2 = ['Linda', 'Theda', 'Thelma', 'Thursa', 'Ula', 'Vannie', 'Vertie', 'Mary']
# Keep the names in dictionary2 that are not values of dictionary
output = [i for i in dictionary2 if i not in dictionary.values()]
Result:
['Theda', 'Thelma', 'Thursa', 'Ula', 'Vannie', 'Vertie']
You could convert dictionary2 to a set, and use this syntax:
>>> dictionary = {'0': "Linda", "1": "Anna", "2": 'Theda', "3":'Thelma',"4": 'Thursa',"5" :"Mary"}
>>> dictionary2 = ['Linda', 'Ula', 'Vannie', 'Vertie', 'Mary']
>>> remove_names = set(dictionary2)
>>> [name for name in dictionary.values() if name not in remove_names]
['Anna', 'Theda', 'Thelma', 'Thursa']
>>> [id for id, name in dictionary.items() if name not in remove_names]
['1', '2', '3', '4']
Note that dictionary2 isn't a dict but a list.
Also, before Python 3.7, dicts are not guaranteed to keep their insertion order, so you cannot be sure that the result will always be 1,2,3,4.
Finally, if all your dict keys are integers (or look like integers), you'd be better off using a list:
>>> names = ["Linda","Anna",'Theda','Thelma','Thursa',"Mary"]
>>> remove_names = set(['Linda', 'Ula', 'Vannie', 'Vertie', 'Mary'])
>>> list(enumerate(names))
[(0, 'Linda'), (1, 'Anna'), (2, 'Theda'), (3, 'Thelma'), (4, 'Thursa'), (5, 'Mary')]
>>> [name for name in names if name not in remove_names]
['Anna', 'Theda', 'Thelma', 'Thursa']
>>> [id for id, name in enumerate(names) if name not in remove_names]
[1, 2, 3, 4]
This should be more readable and faster than your code.
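If the order of the resulting keys matters and the data has to stay in a dict, one option is to sort the keys explicitly instead of relying on dict ordering. A small sketch, assuming every key is a string that looks like an integer (kept_keys is an illustrative name):

```python
dictionary = {'0': "Linda", "1": "Anna", "2": 'Theda', "3": 'Thelma',
              "4": 'Thursa', "5": "Mary"}
dictionary2 = ['Linda', 'Ula', 'Vannie', 'Vertie', 'Mary']

remove_names = set(dictionary2)

# Sort numerically with key=int; a plain sorted() on the strings
# would place '10' before '2'.
kept_keys = sorted(
    (key for key, name in dictionary.items() if name not in remove_names),
    key=int,
)
print(kept_keys)  # ['1', '2', '3', '4']
```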
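dictionary = {'0': "Linda", "1": "Anna", "2":"Mary"}
dictionary2 = ['Linda', 'Theda', 'Thelma', 'Thursa', 'Ula','Vannie','Vertie', 'Mary']
# Walk the indices backwards so that deleting an item does not shift
# the positions of the items that still have to be checked.
for i in range(len(dictionary2) - 1, -1, -1):
    if dictionary2[i] in dictionary.values():
        del dictionary2[i]
print(dictionary2)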
Prints this:
['Theda', 'Thelma', 'Thursa', 'Ula', 'Vannie', 'Vertie']
I'm trying to use enumerate to iterate over a list and store its elements, while also using the index to grab the element at the same position in another list of the same size.
Using a silly example:
animal = ['cat', 'dog', 'fish' , 'monkey']
name = ['george', 'steve', 'john', 'james']
x = []
for count, i in enumerate(animal):
    y = zip(name[count], i)
    x = x + y
Instead of producing tuples of each element of both lists, it produces tuples by letter. Is there a way to do this but get the elements of each list rather than each letter? I know there is likely a better, more Pythonic way of accomplishing this same task, but I'm specifically looking to do it this way.
enumerate() is doing no such thing. You are pairing up the letters here:
y = zip(name[count], i)
For example, for the first element in animal, count is 0 and i is set to 'cat'. name[0] is 'george', so you are asking Python to zip() together 'george' and 'cat':
>>> list(zip('george', 'cat'))
[('g', 'c'), ('e', 'a'), ('o', 't')]
The result is capped at the length of the shorter word.
If you wanted a tuple, just use:
y = (name[count], i)
and then append that to your x list:
x.append(y)
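Put together with the loop from the question, the two changes above can be sketched as follows (same variable names as in the question):

```python
animal = ['cat', 'dog', 'fish', 'monkey']
name = ['george', 'steve', 'john', 'james']

x = []
for count, i in enumerate(animal):
    # Build the pair directly; calling zip() on the two strings
    # would pair up their letters instead.
    y = (name[count], i)
    x.append(y)

print(x)
# [('george', 'cat'), ('steve', 'dog'), ('john', 'fish'), ('james', 'monkey')]
```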
You could use zip() instead of enumerate() to create your pairings:
x = list(zip(name, animal))
without any loops required:
>>> animal = ['cat', 'dog', 'fish' , 'monkey']
>>> name = ['george', 'steve', 'john', 'james']
>>> list(zip(name, animal))
[('george', 'cat'), ('steve', 'dog'), ('john', 'fish'), ('james', 'monkey')]
zip() pairs up the corresponding elements at each index into tuples (a list of tuples in Python 2, an iterator in Python 3).
So when you provide strings as the input, you get tuples of the characters at each position. Example -
>>> list(zip('cat','george'))
[('c', 'g'), ('a', 'e'), ('t', 'o')]
This is what happens when you iterate over each element in the list and call zip on it.
Instead, you should call zip directly on the two lists, without iterating over their elements.
Example -
>>> animal = ['cat', 'dog', 'fish' , 'monkey']
>>> name = ['george', 'steve', 'john', 'james']
>>> list(zip(animal,name))
[('cat', 'george'), ('dog', 'steve'), ('fish', 'john'), ('monkey', 'james')]
I'm trying to figure out how to run one list through another list and, whenever the first names match, append the combined entry to a new list.
list1 = [["Ryan","10"],["James","40"],["John","30"],["Jake","15"],["Adam","20"]]
list2 = [["Ryan","Canada"],["John","United States"],["Jake","Spain"]]
So it looks something like this.
list3 = [["Ryan","Canada","10"],["John","United States","30"],["Jake","Spain","15"]]
So far I haven't really been able to even come close, so even the smallest guidance would be much appreciated. Thanks.
You could transform them into dictionaries and then use a list comprehension:
dic1 = dict(list1)
dic2 = dict(list2)
list3 = [[k,dic2[k],dic1[k]] for k in dic2 if k in dic1]
If ordering isn't a concern, the most straightforward way is to convert the lists into more suitable data structures: dictionaries.
ages = dict(list1)
countries = dict(list2)
That'll make it a cinch to combine the pieces of data:
>>> {name: [ages[name], countries[name]] for name in ages.keys() & countries.keys()}
{'Ryan': ['10', 'Canada'], 'Jake': ['15', 'Spain'], 'John': ['30', 'United States']}
Or even better, use nested dicts:
>>> {name: {'age': ages[name], 'country': countries[name]} for name in ages.keys() & countries.keys()}
{'Ryan': {'country': 'Canada', 'age': '10'},
'Jake': {'country': 'Spain', 'age': '15'},
'John': {'country': 'United States', 'age': '30'}}
If the names are unique you can make list1 into a dictionary and then loop through list2 adding items from this dictionary.
list1 = [["Ryan","10"],["James","40"],["John","30"],["Jake","15"],["Adam","20"]]
list2 = [["Ryan","Canada"],["John","United States"],["Jake","Spain"]]
list1_dict = dict(list1)
output = [item + [list1_dict[item[0]]] for item in list2]
If not, then you need to decide how to deal with cases of duplicate names.
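One possible policy for duplicate names, sketched here purely as an illustration, is to keep every value seen for a name and emit one output row per combination. The second "Ryan" entry in list1 is a hypothetical addition to show the effect, and ages_by_name is an illustrative name:

```python
from collections import defaultdict

# "Ryan" appears twice in list1 (hypothetical data for this sketch).
list1 = [["Ryan", "10"], ["James", "40"], ["John", "30"], ["Ryan", "25"]]
list2 = [["Ryan", "Canada"], ["John", "United States"], ["Jake", "Spain"]]

# Collect every value seen for a name instead of letting later
# entries overwrite earlier ones, as dict(list1) would.
ages_by_name = defaultdict(list)
for name, age in list1:
    ages_by_name[name].append(age)

# One row per (name, country, age) combination. Names missing from
# list1 (here "Jake") produce no rows instead of raising a KeyError.
output = [[name, country, age]
          for name, country in list2
          for age in ages_by_name.get(name, [])]
print(output)
```

Other policies, such as keeping only the first or the last value, are just as valid; which one is right depends on what the duplicates mean in the data.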
You can use a set and an OrderedDict to combine the common names and keep order:
list1 = [["Ryan","10"],["James","40"],["John","30"],["Jake","15"],["Adam","20"]]
list2 = [["Ryan","Canada"],["John","United States"],["Jake","Spain"]]
from collections import OrderedDict
# get set of names from list2
names = set(name for name,_ in list2)
# create an OrderedDict using name as key and full sublist as value
# filtering out names that are not also in list2
d = OrderedDict((sub[0], sub) for sub in list1 if sub[0] in names)
for name, country in list2:
    if name in d:
        # add country from each sublist with common name
        d[name].append(country)
print(d.values()) # list(d.values()) for python3
[['Ryan', '10', 'Canada'], ['John', '30', 'United States'], ['Jake', '15', 'Spain']]
If every name in list2 is guaranteed to appear in list1, you can remove the if name in d: check.