Comparing elements between two lists of lists in python - python

List A:
[('Harry', 'X', 'A'),
('James', 'Y', 'G'),
('John', 'Z', 'D')]
List B:
[('Helen', '2', '(A; B)', '3'),
('Victor', '9', '(C; D; E)', '4'),
('Alan', '10', '(A)', '57'),
('Paul', '11', '(F; B)', '43'),
('Sandra', '12', '(F)', '31')]
Basically I have to compare the third element (for x in listA -> x[2]) from list A and check if is there any list in list B that has the same element (for y in listB, x[2] == y[2]) but I'm just losing my mind with this.
My idea was to get the third element from each list in list B, put them into a new list, and then remove that ";" so I could access each element way more easily.
for x in listB:
j = x[2]
j = j.strip().split(', ')
for k in j:
FinalB.append(k)
FinalB = [(k[1:-1].split(";")) for k in FinalB]
Then I'd take the third element from each list of list A and compare them with the elements inside each list of FinalB: if there was a match, I'd get the index of the element in FinalB (the one that's matched), use that index to access his list in listB and get the first element of his list inside list B (basically, I have to know the names from the users inside each list that have the same 3rd element)
My code so far:
FinalB= []
DomainsList = []
for x in listA:
j = x[2]
j = j.strip().split(', ')
for k in j:
FinalB.append(k)
FinalB = [(k[1:-1].split(";")) for k in FinalB]
for y in listA:
for z in FinalB:
for k in z:
if y[2] == k:
m = FinalB.index(z)
DomainsList.append([listA[m][0],listB[m][0]])
return DomainsList
Yes, this is not working (no error, I probably just did this in an absolute wrong way) and I can't figure out what and where I'm doing wrong.

First, I think a better way to handle '(C; D; E)' is to change it to 'CDE', so the first loop becomes:
FinalB = [filter(str.isalpha, x[2]) for x in listB]
We take each string and keep only the alpha characters, so we end up with:
In [18]: FinalB
Out[18]: ['AB', 'CDE', 'A', 'FB', 'F']
This means we can use listA[x][2] in FinalB[y] to test if we have a match:
for y in listA:
for z in FinalB:
if y[2] in z:
DomainsList.append([y[0], listB[FinalB.index(z)][0]])
I had to tweak the arguments to the append() to pick the right elements, so we end up with:
In [17]: DomainsList
Out[17]: [['Harry', 'Helen'], ['Harry', 'Alan'], ['John', 'Victor']]
Usefully, if instead of '(C; D; E)' you have '(foo; bar; baz)', then with just one tweak the code can work for that too:
import re
FinalB = [filter(None, re.split("[; \(\)]+", x[2])) for x in listB]
The remaining code works as before.

It will always help to start a question with context and details.
The python version could also come into play.
The data structure you have given for us to work with is very questionable - especially the third element in each of the tuples in listB...why have a string element and then define it like this '(C; D; E)' ??
Even though I don't understand where you are coming from with this or what this is meant to achieve,no context provided in post, this code should get you there.
It will give you a list of tupples ( listC ), with each tuple having two elements. Element one having the name from listA and element 2 having the name from listB where they have a match as described in post.
NOTE: at the moment the match is simply done with a find, which will work perfectly with the provided details, however you may need to change this to be suitable for your needs if you could have data that would cause false positives or if you want to ignore case.
listA = [('Harry', 'X', 'A'), ('James', 'Y', 'G'), ('John', 'Z', 'D')]
listB = [('Helen', '2', '(A; B)', '3'),
('Victor', '9', '(C; D; E)', '4'),
('Alan', '10', '(A)', '57'),
('Paul', '11', '(F; B)', '43'),
('Sandra', '12', '(F)', '31')]
listC = []
for a in listA:
for b in listB:
if b[2].find(a[2]) != -1:
listC.append((a[0], b[0]))
print(listC)
This gives you.
[('Harry', 'Helen'), ('Harry', 'Alan'), ('John', 'Victor')]

Related

How to compare two lists and remove elements in python

I have two lists. I am trying to remove all the elements in the first list occurred prior to any element in the second list. This should be happened only once.
As for an example:
Let,
list1 = ['a','d','k','1','3','a','f','3']
list2=['1','2','3','4','5']
what my resulting list should be:
listResult=['1','3','a','f','3']
in list 1, the element '1' is found in the list2. Hence, all the items prior to '1' has to be removed. This should not be happened with element '3', as the removing process has to be break after one removal.
I have tried with the following but it only considers one given specific element. what i am trying is to not only consider one element and to consider a list of elements.
from itertools import dropwhile
list1 = ['a','d','k','1','3','a','f','3']
element = '1'
list(dropwhile(lambda x: x != element, list1))
The in operator should be used in lambda for containment test:
>>> list(dropwhile(lambda x, s=frozenset(list2): x not in s, list1))
['1', '3', 'a', 'f', '3']
Another interesting but slightly inefficient solution:
>>> from itertools import compress, accumulate
>>> from operator import or_
>>> s = set(list2)
>>> list(compress(list1, accumulate((x in s for x in list1), or_)))
['1', '3', 'a', 'f', '3']
Because the default value of the second parameter of accumulate is equivalent to operator.add, and positive numbers are always regarded as true in Python, the or_ can be omitted:
>>> list(compress(list1, accumulate(x in s for x in list1)))
['1', '3', 'a', 'f', '3']
We can make a set using list1 which can be used to check if item in list2 present in list1.
If it is present, we can take the index of the item and take the subarray from the item index. Then we can break the loop.
list1 = ['a','d','k','1','3','a','f','3']
list2 = ['1','2','3','4','5']
listResult = []
set1 = set(list1)
for i in list2:
if (i in set1):
idx = list1.index(i)
listResult = list1[idx:]
break
print(listResult)

python: Insert elements into tuples inside of list of tuples

I have two lists. One list of tuples containing two elements, another containing only strings.
What is the best way to combine these in this way:
list1 = [('1','2'), ('3','4')]
list2 = ['one','two']
expected_result = [('1','2','one'), ('3','4','two')]
I am stuck on something like:
result = [elt+(list2[0],) for elt in list1]
However, I'm not sure if it's possible to iterate 2 lists inside of one list comprehension at the same time. Having a bit of a brain fart here. any help would be appreciated.
Bonus points if it fits on one line (list comprehension style)!
You can zip() the two lists together. This will give you elements like (('1', '2'), 'one'). This assumes the lists are the same length. Then in a list comprehension make a new tuple from those either by concating them, or spreading the first into tuple as below:
list1 = [('1','2'), ('3','4')]
list2 = ['one','two']
[(*t, word) for t, word in zip(list1, list2)]
# [('1', '2', 'one'), ('3', '4', 'two')]
alternatively:
[t + (word,) for t, word in zip(list1, list2)]
List comprehension style solution (one liner)
list1 = [('1','2'), ('3','4')]
list2 = ['one','two']
new_lst = [ list1[i]+(list2[i],) for i in range(len(list1))]
print(new_lst)
>> [('1', '2', 'one'), ('3', '4', 'two')]

using python 3.6 to slice substring with same char [duplicate]

I am not well experienced with Regex but I have been reading a lot about it. Assume there's a string s = '111234' I want a list with the string split into L = ['111', '2', '3', '4']. My approach was to make a group checking if it's a digit or not and then check for a repetition of the group. Something like this
L = re.findall('\d[\1+]', s)
I think that \d[\1+] will basically check for either "digit" or "digit +" the same repetitions. I think this might do what I want.
Use re.finditer():
>>> s='111234'
>>> [m.group(0) for m in re.finditer(r"(\d)\1*", s)]
['111', '2', '3', '4']
If you want to group all the repeated characters, then you can also use itertools.groupby, like this
from itertools import groupby
print ["".join(grp) for num, grp in groupby('111234')]
# ['111', '2', '3', '4']
If you want to make sure that you want only digits, then
print ["".join(grp) for num, grp in groupby('111aaa234') if num.isdigit()]
# ['111', '2', '3', '4']
Try this one:
s = '111234'
l = re.findall(r'((.)\2*)', s)
## it this stage i have [('111', '1'), ('2', '2'), ('3', '3'), ('4', '4')] in l
## now I am keeping only the first value from the tuple of each list
lst = [x[0] for x in l]
print lst
output:
['111', '2', '3', '4']
If you don't want to use any libraries then here's the code:
s = "AACBCAAB"
L = []
temp = s[0]
for i in range(1,len(s)):
if s[i] == s[i-1]:
temp += s[i]
else:
L.append(temp)
temp = s[i]
if i == len(s)-1:
L.append(temp)
print(L)
Output:
['AA', 'C', 'B', 'C', 'AA', 'B']

Sort inter-dependant elements of a dictionary

I have a dictionary like this
dict = {'name':''xyz','c1':['a','b','r','d','c'],'c2':['21','232','11','212','34']}
Here dict.c1 values and dict.c2 values are inter-dependant. That is, 'a' is related to '21', 'd' is related to '212', 'b' is related to '232'...
I have to sort c1 and c2 should get reflected accordingly. The final output should be
dict = {'name':''xyz','c1':['a','b','c','d','r'],'c2':['21','232','34','212','11']}
What is the most efficient way to do this?
This works:
d = {'name': 'xyz','c1':['a','b','r','d','c'],'c2':['21','232','11','212','34']}
s = sorted(list(zip(d['c1'], d['c2'])))
d['c1'] = [x[0] for x in s]
d['c2'] = [x[1] for x in s]
Result:
{'c1': ['a', 'b', 'c', 'd', 'r'],
'c2': ['21', '232', '34', '212', '11'],
'name': 'xyz'}
UPDATE
The call to list is not needed. Thanks to tzaman for the hint. It was a relict of putting together the solution from separate steps.
d = {'name': 'xyz','c1':['a','b','r','d','c'],'c2':['21','232','11','212','34']}
s = sorted(zip(d['c1'], d['c2']))
d['c1'] = [x[0] for x in s]
d['c2'] = [x[1] for x in s]
Your data structure does not reflect the real relationship between its elements. I would start by merging c1 and c2 into an OrderedDict. It would take care of both the relationship and the order of elements. Like this:
dict = dict(
name='xyz',
c = OrderedDict(sorted(
zip(
['a','b','r','d','c'],
['21','232','11','212','34']
)
))
)
Frankly, the most efficient way is to store the data in a form compatible with its use. In this case, you would use a table (e.g. PANDAS data frame) or simply a list:
xyz_table = [('a', '21'), ('b'. 232'), ...]
Then you simply sort the list on the first element of each tuple:
xyz_table = [('a', '21'), ('b', '232'), ('c', '34'), ('d', '212'), ('r', '11')]
xyz_sort = sorted(xyz_table, key = lambda row: row[0])
print xyz_sort
You could also look up a primer on Python sorting, such as this
Here's one way you could do it, using zip for both joining and unjoining. You can re-cast them as lists instead of tuples when you put them back into the dictionary.
a = ['a','b','r','d','c']
b = ['21','232','11','212','34']
mix = list(zip(a, b))
mix.sort()
a_new, b_new = zip(*mix)
>>> a_new
('a', 'b', 'c', 'd', 'r')
>>> b_new
('21', '232', '34', '212', '11')

The last list elements and conditional loops

mylist="'a','b','c'"
count=0
i=0
while count< len(mylist):
if mylist[i]==mylist[i+1]:
print mylist[i]
count +=1
i +=1
Error:
File "<string>", line 6, in <module>
IndexError: string index out of range
I'm assuming that when it gets to the last (nth) element it can't find an n+1 to compare it to, so it gives me an error.
Interestingly, i think that I've done this before and not had this problem on a larger list: Here is an example (with credit to Raymond Hettinger for fixing it up)
list=['a','a','x','c','e','e','f','f','f']
i=0
count = 0
while count < len(list)-2:
if list[i] == list[i+1]:
if list [i+1] != list [i+2]:
print list[i]
i+=1
count +=1
else:
print "no"
count += 1
else:
i +=1
count += 1
For crawling through a list in the way I've attempted, is there any fix so that I don't go "out of range?" I plan to implement this on a very large list, where I'll have to check if "list[i]==list[i+16]", for example. In the future, I would like to add on conditions like "if int(mylist[i+3])-int(mylist[i+7])>10: newerlist.append[mylist[i]". So it's important that I solve this problem.
I thought about inserting a break statement, but was unsuccessful.
I know this is not the most efficient, but I'm at the point where it's what i understand best.
So it sounds like you are trying to compare elements in your list at various fixed offsets. perhaps something like this could help you:
for old, new in zip(lst, lst[n:]):
if some_cond(old, new):
do_work()
Explanation:
lst[n:] returns a copy of lst, starting from the nth (mind the 0-indexing) element
>>> lst = [1,2,2,3];
>>> lst[1:]
[2,2,3]
zip(l1, l2) creates a new list of tuples, with one element from each list
>>> zip(lst, lst[1:])
[(1, 2), (2, 2), (2, 3)]
Note that it stops as soon as either list runs out. in this case, the offset list runs out first.
for a list of tuples, you can "upack directly" in the loop variable, so
for old, new in zip(lst, lst[1:])
gives loops through the elements you want (pairs of successive elements in your list)
As a general idea, if you are trying to look ahead a certain number of places, you can do a few things:
In the loop check (I.e. count < length), you'll need to check on the max field. So in your example, you wanted to go 16 spaces. This would mean that you would need to check count < (length - 16). The downside is that your last elements (the last 16) won't be iterated over.
Check inside the loop to make sure the index is applicable. That is, on each if statement start with: if(I+16 < length && logic_you_want_to_check). This will allow you to continue through the loop, but when the logic will fail because its out of bounds, you won't error out.
Note- this probably isn't what you want, but ill add it for completeness. Wrap around your logic. This will only work if wrap arounds can be considered. If you literally want to check the 16th index ahead of your current index (I.e like a place in a line perhaps), then wrapping around doesn't really suit well. But if don't need that logic, and want to model your values in a circular pattern, you can modulus your index. That is: if array[i] == array [(i + 16)%length(array)] would check either 16 ahead or wrap around to the front of the array.
Edit:
Right, with the new information in the OP, this becomes much simpler. Use the itertools grouper() recipe to group the data for each person into tuples:
import itertools
def grouper(iterable, n, fillvalue=None):
"""Collect data into fixed-length chunks or blocks"""
# grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx"
args = [iter(iterable)] * n
return itertools.zip_longest(*args, fillvalue=fillvalue)
data = ['John', 'Sally', '5', '10', '11', '4', 'John', 'Sally', '3', '7', '7', '10', 'Bill', 'Hallie', '4', '6', '2', '1']
grouper(data, 6)
Now your data looks like:
[
('John', 'Sally', '5', '10', '11', '4'),
('John', 'Sally', '3', '7', '7', '10'),
('Bill', 'Hallie', '4', '6', '2', '1')
]
Which should be easy to work with, by comparison.
Old Answer:
If you need to make more arbitrary links, rather than just checking continuous values:
def offset_iter(iterable, n):
offset = iter(iterable)
consume(offset, n)
return offset
data = ['a', 'a', 'x', 'c', 'e', 'e', 'f', 'f', 'f']
offset_3 = offset_iter(data, 3)
for item, plus_3 in zip(data, offset_3): #Naturally, itertools.izip() in 2.x
print(item, plus_3) #if memory usage is important.
Naturally, you would want to use semantically valid names. The advantage to this method is it works with arbitrary iterables, not just lists, and is efficient and readable, without any ugly, inefficient iteration by index. If you need to continue checking once the offset values have run out (for other conditions, say) then use itertools.zip_longest() (itertools.izip_longest() in 2.x).
Using the consume() recipe from itertools.
import itertools
import collections
def consume(iterator, n):
"""Advance the iterator n-steps ahead. If n is none, consume entirely."""
# Use functions that consume iterators at C speed.
if n is None:
# feed the entire iterator into a zero-length deque
collections.deque(iterator, maxlen=0)
else:
# advance to the empty slice starting at position n
next(itertools.islice(iterator, n, n), None)
I would, however, greatly question if you need to re-examine your data structure in this case.
Original Answer:
I'm not sure what your aim is, but from what I gather you probably want itertools.groupby():
>>> import itertools
>>> data = ['a', 'a', 'x', 'c', 'e', 'e', 'f', 'f', 'f']
>>> grouped = itertools.groupby(data)
>>> [(key, len(list(items))) for key, items in grouped]
[('a', 2), ('x', 1), ('c', 1), ('e', 2), ('f', 3)]
You can use this to work out when there are (arbitrarily large) runs of repeated items. It's worth noting you can provide itertools.groupby() with a key argument that will group them based on any factor you want, not just equality.
If you adhere to "Practicality beats purity":
for idx, element in enumerate(yourlist[n:]):
if yourlist[idx] == yourlist[idx-n]
...
If you don't care about memory efficiency go for second's answer. If you want the purest answer then go for Lattyware's one.

Categories

Resources