Finding the same second elements in nested lists - recursive function

I have a nested list that looks like this:
[['CELTIC AMBASSASDOR', 'Warrenpoint'],['HAV SNAPPER', 'Silloth'],['BONAY', 'Antwerp'],['NINA', 'Antwerp'],['FRI SKIEN', 'Warrenpoint']]
and so on. How can I find the sublists that have the same second element? For example:
['CELTIC AMBASSASDOR', 'Warrenpoint']
['FRI SKIEN', 'Warrenpoint']
['BONAY', 'Antwerp']
['NINA', 'Antwerp']
The list is very long (I'm reading it from a .csv file), and I can't know in advance what to search for (e.g. I can't search for 'Antwerp' to find all the Antwerp entries, because I don't know all the values in the CSV file). So I thought I needed a recursive function that keeps searching until it has found all the nested lists, grouped by their second items. I couldn't figure out how to write the recursive function. If anyone has a better solution, it would be much appreciated.

There's no need to use recursion here. Create a dictionary with a key of the second element and values of the whole sublist, then create a result that only includes the matches you're interested in:
import collections
l = [['CELTIC AMBASSASDOR', 'Warrenpoint'],['HAV SNAPPER', 'Silloth'],['BONAY', 'Antwerp'],['NINA', 'Antwerp'],['FRI SKIEN', 'Warrenpoint']]
# Group the sublists by their second element
d = collections.defaultdict(list)
for item in l:
    d[item[1]].append(item)
# Keep only the groups that contain more than one sublist
result = {key: group for key, group in d.items() if len(group) > 1}
Result:
>>> import pprint
>>> pprint.pprint(result)
{'Antwerp': [['BONAY', 'Antwerp'], ['NINA', 'Antwerp']],
 'Warrenpoint': [['CELTIC AMBASSASDOR', 'Warrenpoint'],
                 ['FRI SKIEN', 'Warrenpoint']]}
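If a flat list of the matching sublists is enough (rather than a dictionary keyed by the second element), a minimal sketch with collections.Counter could look like this. The names rows, counts and matches are illustrative and not part of the original answer.

```python
from collections import Counter

rows = [['CELTIC AMBASSASDOR', 'Warrenpoint'], ['HAV SNAPPER', 'Silloth'],
        ['BONAY', 'Antwerp'], ['NINA', 'Antwerp'], ['FRI SKIEN', 'Warrenpoint']]

# Count how often each second element occurs (one pass over the data).
counts = Counter(row[1] for row in rows)

# Keep only the rows whose second element occurs more than once,
# preserving their original order.
matches = [row for row in rows if counts[row[1]] > 1]
print(matches)
```

This makes two passes over the list, and the rows come back in their original order instead of being grouped.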

seconds = list(zip(*l))[1]  # all second elements; Python 3 needs list() before indexing
list(filter(lambda x: seconds.count(x[1]) > 1, l))

Related

Given two lists of words, return them together as a dictionary

Hey (sorry for my bad English), I'll try to make my question clearer. Say I have a function create_username_dict(name_list, username_list) that takes two lists: name_list holds people's names, and username_list holds usernames made from those names. I want to take those two lists and combine them into a dictionary, like this:
>>> name_list = ["Ola Nordmann", "Kari Olsen", "Roger Jensen"]
>>> username_list = ["alejon", "carli", "hanri"]
>>> create_username_dict(name_list, username_list)
{
    "Ola Nordmann": "alejon",
    "Kari Olsen": "carli",
    "Roger Jensen": "hanri"
}
I have tried looking around for how to combine two different lists into one dictionary, but can't seem to find the right solution.

If both lists are in matching order, i.e. the i-th element of one list corresponds to the i-th element of the other, then you can use this
D = dict(zip(name_list, username_list))
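As a sketch, the one-liner above could be wrapped in the create_username_dict function from the question. The length check is an added assumption (that a mismatch is a mistake), not something the question asks for: zip() silently stops at the shorter list.

```python
def create_username_dict(name_list, username_list):
    """Map each name to the username at the same position."""
    # Assumption: a length mismatch is an error, so fail loudly
    # instead of letting zip() silently drop the extra entries.
    if len(name_list) != len(username_list):
        raise ValueError("name_list and username_list must have the same length")
    return dict(zip(name_list, username_list))


name_list = ["Ola Nordmann", "Kari Olsen", "Roger Jensen"]
username_list = ["alejon", "carli", "hanri"]
usernames = create_username_dict(name_list, username_list)
print(usernames)
```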

Use zip to pair the lists.
d = {key: value for key,value in zip(name_list, username_list)}
print(d)
Output:
{'Ola Nordmann': 'alejon', 'Kari Olsen': 'carli', 'Roger Jensen': 'hanri'}

Assuming both lists have the same length and map one to one:
name_list = ["Ola Nordmann", "Kari Olsen", "Roger Jensen"]
username_list = ["alejon", "carli", "hanri"]
result_stackoverflow = dict()
for index, name in enumerate(name_list):
    result_stackoverflow[name] = username_list[index]
print(result_stackoverflow)
Output:
{'Ola Nordmann': 'alejon', 'Kari Olsen': 'carli', 'Roger Jensen': 'hanri'}
The answer by #alex does the same thing, but it may be too compact for a beginner, so this is the verbose version.

accelerate comparing dictionary keys and values to strings in list in python

Sorry if this is trivial, I'm still learning. I have a list of dictionaries that looks as follows:
[{'1102': ['00576', '00577', '00578', '00579', '00580', '00581']},
{'1102': ['00582', '00583', '00584', '00585', '00586', '00587']},
{'1102': ['00588', '00589', '00590', '00591', '00592', '00593']},
{'1102': ['00594', '00595', '00596', '00597', '00598', '00599']},
{'1102': ['00600', '00601', '00602', '00603', '00604', '00605']}
...]
It contains ~89,000 dictionaries. I also have a list containing 4,473,208 paths, for example:
['/****/**/******_1102/00575***...**0CT.csv',
'/****/**/******_1102/00575***...**1CT.csv',
'/****/**/******_1102/00575***...**2CT.csv',
'/****/**/******_1102/00575***...**3CT.csv',
'/****/**/******_1102/00575***...**4CT.csv',
'/****/**/******_1102/00578***...**1CT.csv',
'/****/**/******_1102/00578***...**2CT.csv',
'/****/**/******_1102/00578***...**3CT.csv',
...]
What I want to do is group the paths: for each dictionary, collect every path whose folder name ends with the dictionary's key and whose file name starts with one of the values in that key's list.
I tried using for loops like this:
grpd_cts = []
for elem in tqdm(dict_list):
    temp1 = []
    for file in ct_paths:
        for key, val in elem.items():
            if (file[16:20] == key) and (any(x in file[21:26] for x in val)):
                temp1.append(file)
    grpd_cts.append(temp1)
But this takes around 30 hours. Is there a way to make it more efficient? Any itertools function or something?
Thanks a lot!

ct_paths is iterated repeatedly in your inner loop, and you're only interested in a little bit of it for testing purposes; pull that out and use it to index the rest of your data, as a dictionary.
What complicates your problem is that you want to end up with the original filenames, so you need to build a two-level dictionary whose values are lists of all the original paths grouped under those two keys.
ct_path_index = {}
for f in ct_paths:
    ct_path_index.setdefault(f[16:20], {}).setdefault(f[21:26], []).append(f)
grpd_cts = []
for elem in tqdm(dict_list):
    temp1 = []
    for key, val in elem.items():
        d2 = ct_path_index.get(key)
        if d2:
            for v in val:
                v2 = d2.get(v)
                if v2:
                    temp1 += v2
    grpd_cts.append(temp1)
ct_path_index looks like this, using your data:
{'1102': {'00575': ['/****/**/******_1102/00575***...**0CT.csv',
                    '/****/**/******_1102/00575***...**1CT.csv',
                    '/****/**/******_1102/00575***...**2CT.csv',
                    '/****/**/******_1102/00575***...**3CT.csv',
                    '/****/**/******_1102/00575***...**4CT.csv'],
          '00578': ['/****/**/******_1102/00578***...**1CT.csv',
                    '/****/**/******_1102/00578***...**2CT.csv',
                    '/****/**/******_1102/00578***...**3CT.csv']}}
The use of setdefault (which can be a little hard to understand the first time you see it) is important when building up collections of collections, and is very common in these kinds of cases: it makes sure that the sub-collections are created on demand and then re-used for a given key.
Now, you've only got two nested loops; the inner checks are done using dictionary lookups, which are close to O(1).
Other optimizations would include turning the lists in dict_list into sets, which would be worthwhile if you made more than one pass through dict_list.
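To make the setdefault pattern more concrete, here is a small sketch on invented data, next to an equivalent built with collections.defaultdict. The records list and its shortened paths are hypothetical stand-ins for the f[16:20] and f[21:26] slices of the real paths.

```python
from collections import defaultdict

# Hypothetical (folder_key, file_prefix, path) triples, for illustration only.
records = [
    ("1102", "00575", "a_1102/00575_0CT.csv"),
    ("1102", "00575", "a_1102/00575_1CT.csv"),
    ("1102", "00578", "a_1102/00578_1CT.csv"),
]

# Variant 1: setdefault creates the inner dict and list on first use,
# then returns the existing one on every later call for the same key.
index_a = {}
for key, prefix, path in records:
    index_a.setdefault(key, {}).setdefault(prefix, []).append(path)

# Variant 2: a defaultdict of defaultdicts creates them implicitly.
index_b = defaultdict(lambda: defaultdict(list))
for key, prefix, path in records:
    index_b[key][prefix].append(path)

# Both variants hold the same grouping.
same = index_a == {k: dict(v) for k, v in index_b.items()}
print(same)
```

One difference to keep in mind: indexing a defaultdict with a missing key inserts that key, so lookups that may miss are better done with .get(), as in the answer's second loop.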

How do I index f.items()?

I could run a for loop as such:
for v in f.items():
But it takes too much time. I know I want the second object in f.items(). How do I directly get the second object and save it?
I'm not sure what the syntax is, e.g. is it f.items(2) or f.items()[2]? Neither of these works, so I was wondering what does.

If you want the values (the second object of each pair) from f.items(), use:
for k, v in f.items():
Or, if you want the second item from f.items(), use:
f = {1:'A',2:'B',3:'C'}
for position, item in enumerate(f.items(), 1):
    if position == 2:
        print(item)
Do you still want to extract the second value from the second item?

You can create a list and then index.
item = list(f.items())[1]
Lists are created often in Python and this operation is relatively inexpensive. If your dictionary is large, you can create an iterator and take its second value.
i = iter(f.items())
next(i)
item = next(i)
But the dict would need to be rather large to make this the better option.
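The iterator approach can be generalized with itertools.islice. This is a sketch only: nth_item is a hypothetical helper name, and it relies on dictionaries preserving insertion order (guaranteed from Python 3.7).

```python
from itertools import islice

f = {1: 'A', 2: 'B', 3: 'C'}


def nth_item(d, n):
    """Return the n-th (0-based) (key, value) pair of d, or None if there is none."""
    # islice lazily skips the first n pairs; next() takes the following one.
    return next(islice(d.items(), n, None), None)


second = nth_item(f, 1)
print(second)
```

Like the iter()/next() version, this avoids copying the whole dictionary into a list, but it still has to step over the first n pairs.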

simplify expression in list comprehension

I am trying to generate a list of strings, and I am looking for a simple expression to do so, but I can't find one.
What I have:
aScanListNames = ["AIN0", "AIN1", "AIN2", "AIN3"]
[[chan+"_NEGATIVE_CH", chan+"_RANGE", chan+"_RESOLUTION_INDEX", chan+"_EF_CONFIG_D", chan+"_EF_CONFIG_E"] for chan in aScanListNames]
Gives :
[['AIN0_NEGATIVE_CH', 'AIN0_RANGE', 'AIN0_RESOLUTION_INDEX', 'AIN0_EF_CONFIG_D', 'AIN0_EF_CONFIG_E'], ['AIN1_NEGATIVE_CH', 'AIN1_RANGE', 'AIN1_RESOLUTION_INDEX', 'AIN1_EF_CONFIG_D', 'AIN1_EF_CONFIG_E'], ['AIN2_NEGATIVE_CH', 'AIN2_RANGE', 'AIN2_RESOLUTION_INDEX', 'AIN2_EF_CONFIG_D', 'AIN2_EF_CONFIG_E'], ['AIN3_NEGATIVE_CH', 'AIN3_RANGE', 'AIN3_RESOLUTION_INDEX', 'AIN3_EF_CONFIG_D', 'AIN3_EF_CONFIG_E']]
which is, as expected, a list of lists. I would like to obtain a flat list, like this:
['AIN0_NEGATIVE_CH','AIN0_RANGE','AIN0_RESOLUTION_INDEX','AIN0_EF_CONFIG_D','AIN0_EF_CONFIG_E','AIN1_NEGATIVE_CH','AIN1_RANGE','AIN1_RESOLUTION_INDEX','AIN1_EF_CONFIG_D','AIN1_EF_CONFIG_E','AIN2_NEGATIVE_CH','AIN2_RANGE','AIN2_RESOLUTION_INDEX','AIN2_EF_CONFIG_D','AIN2_EF_CONFIG_E','AIN3_NEGATIVE_CH','AIN3_RANGE','AIN3_RESOLUTION_INDEX','AIN3_EF_CONFIG_D','AIN3_EF_CONFIG_E']
For my own knowledge, I would like to know whether there is a way to obtain this directly using a list comprehension.
If not, what would be a Pythonic way to do so?
EDIT: I know I can flatten my list of lists, but I want to know if there is a solution that doesn't involve creating a list of lists and flattening it afterwards.

You were almost there. No need for itertools:
aScanListNames = ["AIN0", "AIN1", "AIN2", "AIN3"]
suffixes = ["_NEGATIVE_CH", "_RANGE", "_RESOLUTION_INDEX", "_EF_CONFIG_D", "_EF_CONFIG_E"]
result = [name+suffix for name in aScanListNames for suffix in suffixes]

I would say this one-liner is intuitive enough:
import itertools
aScanListNames = ["AIN0", "AIN1", "AIN2", "AIN3"]
suffixes = ["_NEGATIVE_CH", "_RANGE", "_RESOLUTION_INDEX", "_EF_CONFIG_D", "_EF_CONFIG_E"]
[i[0] + i[1] for i in itertools.product(aScanListNames, suffixes)]
Output:
['AIN0_NEGATIVE_CH', 'AIN0_RANGE', 'AIN0_RESOLUTION_INDEX', 'AIN0_EF_CONFIG_D', 'AIN0_EF_CONFIG_E', 'AIN1_NEGATIVE_CH', 'AIN1_RANGE', 'AIN1_RESOLUTION_INDEX', 'AIN1_EF_CONFIG_D', 'AIN1_EF_CONFIG_E', 'AIN2_NEGATIVE_CH', 'AIN2_RANGE', 'AIN2_RESOLUTION_INDEX', 'AIN2_EF_CONFIG_D', 'AIN2_EF_CONFIG_E', 'AIN3_NEGATIVE_CH', 'AIN3_RANGE', 'AIN3_RESOLUTION_INDEX', 'AIN3_EF_CONFIG_D', 'AIN3_EF_CONFIG_E']
If you really want a one-liner you can of course provide suffixes list inline, but that's just messy.

Comparing lists of dictionaries

I have two lists of test results. The test results are represented as dictionaries:
list1 = [{'testclass': 'classname', 'testname': 'testname', 'testtime': '...'}, ...]
list2 = [{'testclass': 'classname', 'testname': 'testname', ...}, ...]
The dictionary representation is slightly different in the two lists, because for one list I have some more information. But in all cases, every test dictionary in either list will have a classname and testname element, which together uniquely identify the test and give a way to compare it across lists.
I need to figure out all the tests that are in list1 but not in list2, as these represent new test failures.
To do this I do:
def get_new_failures(list1, list2):
    new_failures = []
    for test1 in list1:
        for test2 in list2:
            if test1['classname'] == test2['classname'] and \
               test1['testname'] == test2['testname']:
                break  # Not new, break out of the inner loop
        else:
            # Didn't match anything in list2, so it must be new
            new_failures.append(test1)
    return new_failures
I am wondering whether there is a more Pythonic way of doing this. I looked at filter. The function that filter uses would need a handle to both lists. One is easy, but I am not sure how it would get a handle to both. I do not know the contents of the lists until runtime.
Any help would be appreciated,
Thanks.

Try this:
def get_new_failures(list1, list2):
    check = set([(d['classname'], d['testname']) for d in list2])
    return [d for d in list1 if (d['classname'], d['testname']) not in check]
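A short usage sketch of the set-based approach follows. The sample dictionaries are invented for illustration, and the keys 'classname' and 'testname' are assumed to match the ones used in the question's loop.

```python
def get_new_failures(list1, list2):
    # Build the set of (classname, testname) pairs seen in list2 once,
    # so each membership test below is a hash lookup.
    seen = {(d['classname'], d['testname']) for d in list2}
    return [d for d in list1 if (d['classname'], d['testname']) not in seen]


# Hypothetical sample data, for illustration only.
list1 = [
    {'classname': 'TestLogin', 'testname': 'test_ok', 'testtime': '0.1'},
    {'classname': 'TestLogin', 'testname': 'test_bad_password', 'testtime': '0.2'},
]
list2 = [{'classname': 'TestLogin', 'testname': 'test_ok'}]

new_failures = get_new_failures(list1, list2)
print(new_failures)
```

The extra 'testtime' field in list1 does not matter here, because only the two identifying keys are compared.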

To compare two dict d1 and d2 on a subset of their keys, use:
all(d1[k] == d2[k] for k in ('testclass', 'testname'))
And if your two lists have the same length, you can use zip() to pair them.

If each combination of classname and testname is truly unique, then the more computationally efficient approach would be to use two dictionaries instead of two lists. As key to the dictionary, use a tuple like so: (classname, testname). Then you can simply say if (classname, testname) in d: ....
If you need to preserve insertion order, and are using Python 2.7 or above, you could use an OrderedDict from the collections module.
The code would look something like this:
tests1 = {('classname', 'testname'): {'testclass': 'classname',
                                      'testname': 'testname', ...},
          ...}
tests2 = {('classname', 'testname'): {'testclass': 'classname',
                                      'testname': 'testname', ...},
          ...}
new_failures = [t for t in tests1 if t not in tests2]
If you must use lists for some reason, you could iterate over list2 to generate a set, and then test for membership in that set:
test1_tuples = ((d['classname'], d['testname']) for d in list1)
test2_tuples = set((d['classname'], d['testname']) for d in list2)
new_failures = [t for t in test1_tuples if t not in test2_tuples]
