I have two lists with the same length, over 11 rows. I would like df[0] to find a match in any position in df2[0], df[1] to find a match in any position in df2[1], and so on. Instead of me typing them one by one, is there an easier method?
df = [[[1, 5,7,9,12,13,17],
[2,17,18,23,32,34,45],
[3,5,11,33,34,36,45]],
[[6,21,22,50,56,58,72],
[7,5,12,13,55,56,74],
[8,23,24,32,56,58,64]]]
df2 = [[[100,5,12,15,27,32,54],
[120,10,17,18,19,43,55],
[99,21,32,33,34,36,54]],
[[41,16,32,45,66,67,76],
[56,10,11,43,54,55,56],
[77,12,16,18,19,21,23]]]
I would like my output to be like this.
output = [[[[5,12],[17]],
           [[17,18],[32,34,36]]],
          [[[55,56],[32]],[[56]]]]
Judging by your reworked question, it is still not quite clear to me what exactly you want to accomplish. I assume you want element-based matching. With this approach we can find the matching sequences of two lists.
For the presented case, we just need to iterate over all the elements of your array.
The matches function finds all matching sequences. Using it in the nested for loop allows for element-wise comparison. The matching sequences are then written to matched_sequences, which holds all identified matches.
import difflib
df = [
[[1, 5, 7, 9, 12, 13, 17], [2, 17, 18, 23, 32, 34, 45], [3, 5, 11, 33, 34, 36, 45]],
[[6, 21, 22, 50, 56, 58, 72], [7, 5, 12, 13, 55, 56, 74], [8, 23, 24, 32, 56, 58, 64]],
]
df2 = [
[[100, 5, 12, 15, 27, 32, 54], [120, 10, 17, 18, 19, 43, 55], [99, 21, 32, 33, 34, 36, 54]],
[[41, 16, 32, 45, 66, 67, 76], [56, 10, 11, 43, 54, 55, 56], [77, 12, 16, 18, 19, 21, 23]],
]
def matches(list1, list2):
    while True:
        mbs = difflib.SequenceMatcher(None, list1, list2).get_matching_blocks()
        if len(mbs) == 1:
            break
        for i, j, n in mbs[::-1]:
            if n > 0:
                yield list1[i : i + n]
            del list1[i : i + n]
            del list2[j : j + n]

matched_sequences = []
for row_df, row_df2 in zip(df, df2):
    for el1, el2 in zip(row_df, row_df2):
        matched_sequences.extend(list(matches(el1, el2)))

print(matched_sequences)
This produces the following identified matches:
[[12], [5], [17, 18], [33, 34, 36], [55, 56], [23]]
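If you want the matches grouped per inner list, closer to the nested shape of your expected output, a small variation (a sketch of what I assume you intend) appends per-element results instead of flattening them. Run it in place of the flattening loop above, because matches() deletes matched slices from the lists it is given; copies are passed here so df and df2 themselves stay intact:

nested_matches = []
for row_df, row_df2 in zip(df, df2):
    row_result = []
    for el1, el2 in zip(row_df, row_df2):
        # Pass copies, because matches() mutates the lists it receives.
        row_result.append(list(matches(el1[:], el2[:])))
    nested_matches.append(row_result)
print(nested_matches)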
I am trying to process a 3D image in chunks (non-overlapping windows). Once this is done I want to put the chunks back together in the right order.
I have been chunking the image as below:
tens = torch.tensor(range(64))
tens = tens.view((4,4,4))
print(tens)
>>>tensor([[[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11],
[12, 13, 14, 15]],
[[16, 17, 18, 19],
[20, 21, 22, 23],
[24, 25, 26, 27],
[28, 29, 30, 31]],
[[32, 33, 34, 35],
[36, 37, 38, 39],
[40, 41, 42, 43],
[44, 45, 46, 47]],
[[48, 49, 50, 51],
[52, 53, 54, 55],
[56, 57, 58, 59],
[60, 61, 62, 63]]])
tens = torch.chunk(tens, 2, -1)
tens = torch.stack(tens)
tens = torch.chunk(tens, 2, -2)
tens = torch.concat(tens)
tens = torch.chunk(tens, 2, -3)
tens = torch.concat(tens)
print(tens.shape)
>>>torch.Size([8, 2, 2, 2])
Then I want to put it back together in the original order:
tens = tens.view([4,4,2,2])
tens = tens.view([2,4,4,2])
tens = tens.view([4,4,4])
print(tens)
>>>tensor([[[ 0, 1, 4, 5],
[16, 17, 20, 21],
[ 2, 3, 6, 7],
[18, 19, 22, 23]],
[[ 8, 9, 12, 13],
[24, 25, 28, 29],
[10, 11, 14, 15],
[26, 27, 30, 31]],
[[32, 33, 36, 37],
[48, 49, 52, 53],
[34, 35, 38, 39],
[50, 51, 54, 55]],
[[40, 41, 44, 45],
[56, 57, 60, 61],
[42, 43, 46, 47],
[58, 59, 62, 63]]])
and I can't figure out how to get the elements in the right order. I realise I probably missed something in the docs or something else obvious but I can't find it. Any ideas?
The torch.chunk operator doesn't reduce dimensions, so its inverse is torch.cat, not torch.stack.
Here are the transforms with the corresponding inverse operations:
Splitting the last dimension into two:
>>> chunks = tens.chunk(chunks=2, dim=-1)
>>> torch.cat(chunks, dim=-1)
Splitting the second dimension into two:
>>> chunks = tens.chunk(chunks=2, dim=-2)
>>> torch.cat(chunks, dim=-2)
Splitting the first dimension into two:
>>> chunks = tens.chunk(chunks=2, dim=-3)
>>> torch.cat(chunks, dim=-3)
If you want to invert the whole sequence, you just have to keep in mind that torch.cat is the reverse operation of torch.chunk:
>>> tens = torch.cat(tens.chunk(2), dim=-3)
>>> tens = torch.cat(tens.chunk(2), dim=-2)
>>> tens = torch.cat(tens.chunk(2), dim=-1)
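As a sanity check (my sketch, not part of the answer above), running the forward sequence from the question and then these three inverse steps recovers the original values, up to a leading dimension of size 1 left over from the torch.stack call, which can be squeezed away:

import torch

original = torch.arange(64).view(4, 4, 4)

# Forward pass from the question: ends up with shape (8, 2, 2, 2).
t = torch.stack(torch.chunk(original, 2, -1))
t = torch.cat(torch.chunk(t, 2, -2))
t = torch.cat(torch.chunk(t, 2, -3))

# Inverse pass from the answer, then drop the leftover unit dimension.
t = torch.cat(t.chunk(2), dim=-3)
t = torch.cat(t.chunk(2), dim=-2)
t = torch.cat(t.chunk(2), dim=-1)
print(torch.equal(t.squeeze(0), original))  # True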
I have a list of sublists of sublists named dicts that was built by taking some values randomly from a df's index. The values can be repeated within the first level of the list of lists but not within the level of lists[e]. For example:
[[[40, 23, 29, 41, 42], [], [19, 17, 21, 20, 24]],
[[3, 9, 43, 44, 17], [], [20, 9, 23, 3, 27], [3, 30, 43]], #wrong because 9,3 and 43 are repeated in the three sublists
[[2, 26, 42, 29, 44], [], [2, 3, 44, 31, 27]], #2,44 are repeated
[[31, 43, 32, 23, 33], [], [44, 9, 27, 23, 29]], #23 is repeated
[[12, 27, 9, 44, 2], [], [25, 29, 40, 27, 12]]] #27 repeated
As can be seen, it doesn't matter if the number 3 is repeated in both the second and the third sublist of sublists. The empty lists don't matter.
I've built a function that "corrects" the repetition of those values, but apparently it doesn't solve all the cases. It takes three arguments: the mentioned list of lists; the df called matrix from whose index it takes the numbers; and "cuantosamples", a list of lists that indicates how the final result will be partitioned (into unevenly sized lists). It's important to note that the code also contains a segment that prevents a value that replaces a repeated value from being taken again to replace another value in the next sublist:
import itertools
from itertools import islice
# duplicates() is assumed to come from a helper such as iteration_utilities.duplicates

def vigilado(list1, matrix, cuantosamples):
    stored = []
    lists = [[] for e in range(len(dicts))]
    vals = list(matrix.index.values)
    for e, g in zip(list1, lists):
        vig = list(itertools.chain(*e))
        dup = list(duplicates(vig))
        lendup = len(dup)
        if lendup > 0:
            # assign new values
            vals = [e for e in vals if e not in dup and e not in vig and e not in stored]  # if it is repeated in sublist 1, do not take those values again
            sample = matrix.loc[vals].sample(len(dup), weights='weights')
            vls = list(sample.index.values)
            # identify values to be replaced
            dups = [i for i, j in enumerate(vig) if j in dup]
            dups2 = dups[lendup:]
            for i in range(len(dups2)):
                vig[dups2[i]] = vls[i]
        g.extend(vig)
        stored.extend(vig)
    l1 = [[] for e in range(0, 5)]
    for e, g, h in zip(lists, cuantosamples, l1):
        iterate = iter(e)
        l2 = [list(islice(iterate, 0, i)) for i in g]
        h.extend(l2)
    return l1

vigilated = vigilado(dicts, matrix, cuantosamples)
vigilated
This returns the following list of lists which, as can be seen, works in most of the cases but not all of them, and I don't know why:
[[[40, 23, 29, 41, 42], [], [19, 17, 21, 20, 24]],
[[3, 9, 43, 44, 17], [], [20, 9, 23, 16, 27], [33, 30, 14]], #3 and 43 are no longer repeated, BUT 9 IS STILL REPEATED
[[2, 26, 42, 29, 44], [], [22, 3, 5, 31, 27]], #2 and 44 no longer repeated
[[31, 43, 32, 23, 33], [], [44, 9, 27, 6, 29]], #23 no longer repeated
[[12, 27, 9, 44, 2], [], [25, 29, 40, 1, 28]]] #27 no longer repeated
Can someone please help me? I have no idea why the code doesn't handle all the cases, because I thought this approach would solve it. Thanks.
Edit: this would be my desired output:
[[[40, 23, 29, 41, 42], [], [19, 17, 21, 20, 24]],
[[3, 9, 43, 44, 17], [], [20, 10, 23, 16, 27], [33, 30, 14]], #9 that wasn't replaced before is replaced here with a 10
[[2, 26, 42, 29, 44], [], [22, 3, 5, 31, 27]],
[[31, 43, 32, 23, 33], [], [44, 9, 27, 6, 29]],
[[12, 27, 9, 44, 2], [], [25, 29, 40, 1, 28]]]
As you can see it's very similar to my resulting list (because my code somehow replaces almost all values except one or two). The change here is that I replaced the 9 in lists[1][3] with a 10.
My response does not point out where the problem in your code is, but offers two approaches to your goal.
Approach 1
Generate dicts so that it has no repeated indices within each list of dicts. Explanations are in the code.
import numpy as np

index = np.arange(100)
cuantosamples = [[5, 0, 5], [5, 0, 5, 3], [5, 0, 5], [5, 0, 5], [5, 0, 5]]

np.random.seed(0)
dicts = [
    list(map(list,  # convert np.array to list
             np.split(  # split a list into sublists
                 np.random.choice(index, sum(needs), replace=False),  # generate random choices without replacement
                 np.cumsum(needs)[:-1]  # how to split
             )))
    for needs in cuantosamples
]
# print(dicts)
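A quick sanity check (my addition, not part of the approach itself) confirms that no value repeats within any generated row, since np.random.choice is called with replace=False once per row:

for row in dicts:
    flat = [x for sub in row for x in sub]
    assert len(flat) == len(set(flat)), "unexpected repeat within a row"
print("no repeats within any row")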
Approach 2
Replace repeated values with new values. Explanations are in the code.
dicts = [
[[40, 23, 29, 41, 42], [], [19, 17, 21, 20, 24]],
[[3, 9, 43, 44, 17], [], [20, 9, 23, 3, 27], [3, 30, 43]],
[[2, 26, 42, 29, 44], [], [2, 3, 44, 31, 27]],
[[31, 43, 32, 23, 33], [], [44, 9, 27, 23, 29]],
[[12, 27, 9, 44, 2], [], [25, 29, 40, 27, 12]]
]
np.random.seed(0)
new_dicts = []
for lists, needs in zip(dicts, cuantosamples):
    ary = np.array([x for l in lists for x in l])         # flatten lists into an array
    candidates = [x for x in index if x not in ary]       # find out what to be replaced with
    values, counts = np.unique(ary, return_counts=True)   # find out what to replace
    for v, c in zip(values, counts - 1):
        if c:
            ary[ary == v] = np.concatenate([[v], np.random.choice(candidates, c, replace=False)])  # replace
    new_dicts.append(list(map(list, np.split(ary, np.cumsum(needs)[:-1]))))
new_dicts
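To inspect the result, a small reporting sketch (my addition; adjust to taste) lists any value that still repeats within a row of new_dicts:

for n, row in enumerate(new_dicts):
    flat = [x for sub in row for x in sub]
    repeated = sorted({x for x in flat if flat.count(x) > 1})
    print(n, repeated if repeated else "no repeats")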
I want to create a 6x6 numpy matrix, with the first row filled with: 0, 1, ..., 5, the second row filled with 10, 11, ... , 15, and the last row filled with 50, 51, ... , 55.
I thought about using (1) nested (two-layer) list comprehensions and then converting the list of lists into a numpy.matrix object, or (2) using variables inside the range function, i.e. range(x), varying x from 1 to 6. I was not able to get either of these two ideas to work.
Below is my non-vectorized / looping code to construct this matrix. Is there a more Pythonic way of doing this?
a = np.zeros((6, 6))
for i in range(6):
    for j in range(6):
        a[i, j] = 10*i + j
print(a)
(This is one of the examples given at 39:00 in the intro video to NumPy on YouTube: Intro to Numerical Computing with NumPy.)
How about np.ogrid?
np.add(*np.ogrid[:60:10, :6])
# array([[ 0, 1, 2, 3, 4, 5],
# [10, 11, 12, 13, 14, 15],
# [20, 21, 22, 23, 24, 25],
# [30, 31, 32, 33, 34, 35],
# [40, 41, 42, 43, 44, 45],
# [50, 51, 52, 53, 54, 55]])
Details
ogrid returns an open meshgrid:
a, b = np.ogrid[:60:10, :6]
a
# array([[ 0],
# [10],
# [20],
# [30],
# [40],
# [50]])
b
# array([[0, 1, 2, 3, 4, 5]])
You can then perform broadcasted addition:
# a + b
np.add(a, b)
# array([[ 0, 1, 2, 3, 4, 5],
# [10, 11, 12, 13, 14, 15],
# [20, 21, 22, 23, 24, 25],
# [30, 31, 32, 33, 34, 35],
# [40, 41, 42, 43, 44, 45],
# [50, 51, 52, 53, 54, 55]])
Similarly, you can also generate two ranges using np.arange and add them:
np.arange(0, 60, 10)[:,None] + np.arange(6)
# array([[ 0, 1, 2, 3, 4, 5],
# [10, 11, 12, 13, 14, 15],
# [20, 21, 22, 23, 24, 25],
# [30, 31, 32, 33, 34, 35],
# [40, 41, 42, 43, 44, 45],
# [50, 51, 52, 53, 54, 55]])
This can be accomplished with broadcasting:
np.arange(0, 6) + 10*np.arange(0, 6)[:, None]
array([[ 0, 1, 2, 3, 4, 5],
[10, 11, 12, 13, 14, 15],
[20, 21, 22, 23, 24, 25],
[30, 31, 32, 33, 34, 35],
[40, 41, 42, 43, 44, 45],
[50, 51, 52, 53, 54, 55]])
I'd recommend reading https://docs.scipy.org/doc/numpy/user/basics.broadcasting.html and https://docs.scipy.org/doc/numpy/reference/arrays.indexing.html. "Pythonic" doesn't really matter when working with numpy. Sometimes iterating, list comprehensions, and other Pythonic approaches work well with arrays; other times they are terribly inefficient. However, the links given cover some high-level concepts that are very powerful with numpy.
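To make the efficiency point concrete, here is a rough timing sketch (my addition; absolute numbers depend on your machine) comparing the Python-level double loop with the broadcasted version on a larger size:

import timeit
import numpy as np

def loop_version(n):
    # Python-level double loop, filling the array element by element.
    a = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            a[i, j] = 10 * i + j
    return a

def broadcast_version(n):
    # Broadcasted addition of a column vector and a row vector.
    return np.arange(0, 10 * n, 10)[:, None] + np.arange(n)

assert (loop_version(6) == broadcast_version(6)).all()
print(timeit.timeit(lambda: loop_version(100), number=100))
print(timeit.timeit(lambda: broadcast_version(100), number=100))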
I seem to have found a bug when using Python 2.7 with the numpy module:
import numpy as np
x=np.arange(3*4*5).reshape(3,4,5)
x
Here I got the full 'x' array as follows:
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]],
[[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39]],
[[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49],
[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59]]])
Then I try to index a single row's values in sheet [1]:
x[1][0][:]
Result:
array([20, 21, 22, 23, 24])
But something goes wrong when I try to index a single column in sheet [1]:
x[1][:][0]
The result is still the same as before:
array([20, 21, 22, 23, 24])
Shouldn't it be array([20, 25, 30, 35])?
It seems something goes wrong when indexing the middle index with a range?
No, it's not a bug.
When you use [:] you are using slicing notation, and it takes the whole list:
l = ["a", "b", "c"]
l[:]
#output:
["a", "b", "c"]
and in your case:
x[1][:]
#output:
array([[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39]])
What you really want is NumPy indexing notation:
x[1, :, 0]
#output:
array([20, 25, 30, 35])
This is not a bug. x[1][:][0] is not a multiple index ("give me the elements where the first dimension is 1, the second is anything, and the third is 0"). Instead, you are indexing three times, on three objects.
x1 = x[1]   # x1 is the 4x5 subarray at index 1
x2 = x1[:]  # x2 is the same as x1
x3 = x2[0]  # x3 is the first row of x2
To use a multiple index, you want to do it in a single indexing operation:
x[1, :, 0]
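A short check (my addition) makes the difference concrete:

import numpy as np

x = np.arange(3 * 4 * 5).reshape(3, 4, 5)
print(np.array_equal(x[1][:][0], x[1][0]))  # True: the [:] step is just a view of x[1]
print(x[1, :, 0])                           # [20 25 30 35], the column you expected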
I have two files that look like this with some differences between them:
First file:
{16:[3, [-7, 87, 20, 32]]}
{17:[2, [-3, 88, 16, 28], 3, [-6, 84, 20, 32]]}
{18:[2, [-1, 88, 16, 28], 3, [-3, 84, 20, 32]]}
{19:[2, [1, 89, 16, 28], 3, [-2, 85, 20, 32]]}
{20:[2, [9, 94, 16, 28], 3, [1, 85, 20, 32]]}
{21:[2, [12, 96, 16, 28], 3, [2, 76, 19, 31]]}
{22:[2, [15, 97, 16, 28], 3, [4, 73, 19, 29]]}
{23:[2, [18, 96, 16, 28], 3, [6, 71, 19, 29], 10, [-10, 60, 51, 82]]}
{24:[2, [22, 97, 16, 28], 3, [9, 71, 19, 27], 10, [-5, 63, 49, 78]]}
{25:[2, [25, 99, 16, 28], 3, [13, 71, 17, 26], 10, [-1, 64, 46, 77]]}
{26:[2, [29, 101, 16, 28], 3, [17, 70, 16, 25], 10, [-1, 65, 45, 77]]}
Second file:
{16:[3, [-7, 86, 20, 32]]}
{17:[2, [-3, 82, 16, 28], 3, [-6, 84, 20, 32]]}
{18:[2, [-1, 88, 16, 27], 3, [-3, 84, 20, 32]]}
{19:[2, [1, 89, 16, 28], 3, [-2, 84, 20, 32]]}
{20:[2, [9, 94, 15, 28], 3, [1, 85, 20, 32]]}
{21:[2, [12, 96, 16, 28], 3, [1, 76, 19, 31]]}
{22:[2, [15, 97, 17, 28], 3, [4, 73, 19, 29]]}
{23:[2, [18, 96, 18, 28], 3, [6, 71, 19, 29], 10, [-10, 60, 51, 82]]}
{24:[2, [22, 97, 16, 28], 3, [9, 71, 20, 27], 10, [-5, 63, 49, 78]]}
{25:[2, [25, 99, 16, 28], 3, [13, 71, 17, 26], 10, [-1, 64, 46, 77]]}
{26:[2, [29, 101, 17, 28], 3, [17, 70, 16, 25], 10, [-1, 65, 45, 77]]}
I compare them both using difflib and print out the lines that have a difference in them.
What I am trying to do is print out the minimum and maximum frame values that share the same id.
The frame is the key in every line, so the frames in this case range from 16 to 26. The id is the value that precedes every list of 4 values. So the id on the first line is 3; the second line has two ids, 2 and then 3.
So an example of what I'd like to write out is:
17 - 36
given that one of the frames that share the id 3 is different from the file I am comparing with.
For every difference like that, I need to write out a new file that only contains the start frame and the end frame; then I'll work on concatenating additional strings to each file.
This is the current difflib usage that prints out each line that has a difference:
def compare(f1, f2):
    with open(f1+'.txt', 'r') as fin1, open(f2+'.txt', 'r') as fin2:
        diff = difflib.ndiff(fin1.readlines(), fin2.readlines())
        outcome = ''.join(x[2:] for x in diff if x.startswith('- '))
        print outcome
How would I be able to achieve what I described above by tweaking this execution block?
Note that both files share the same frame amount but not the same ids, so I would need to write two different files for each difference, possibly into a folder. So if the two files have 20 differences, I need to have two main folders, one for each original file, that each contain text files for every start and end frame of the same id.
Suppose your list of differences is the file content you give at the beginning of your post. I proceeded in two steps; first, get the list of frames per id:
>>> from collections import defaultdict
>>> diffs = defaultdict(list)
>>> for line in s.split('\n'):
        d = eval(line)  # We have a dict
        for k in d:  # Only one value, k is the frame
            # Only get even values for ids
            for i in range(0, len(d[k]), 2):
                diffs[d[k][i]].append(k)
>>> diffs # We now have a dict with ids as keys :
defaultdict(<type 'list'>, {10: [23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36], 2: [17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33], 3: [16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36], 29: [31, 32, 33, 34, 35, 36]})
Now we get the ranges per id, thanks to this other SO post that helps get the ranges from a list of indexes:
>>> from operator import itemgetter
>>> from itertools import groupby
>>> for id_ in diffs:
        diffs[id_].sort()
        for k, g in groupby(enumerate(diffs[id_]), lambda (i, x): i - x):
            group = map(itemgetter(1), g)
            print 'id {0} : {1} -> {2}'.format(id_, group[0], group[-1])
id 10 : 23 -> 36
id 2 : 17 -> 33
id 3 : 16 -> 36
id 29 : 31 -> 36
You then have, for each id, the range of differences. I guess that with a little adaptation you can get to what you want.
EDIT: here is the final answer with the same kind of block:
>>> def compare(f1, f2):
        # 2 embedded 'with' because I'm on Python 2.5 :-)
        with open(f1+'.txt', 'r') as fin1:
            with open(f2+'.txt', 'r') as fin2:
                lines1 = fin1.readlines()
                lines2 = fin2.readlines()
        # Do not forget the strip function to remove unnecessary '\n'
        diff_lines = [l.strip() for l in lines1 if l not in lines2]
        # Ok, we have our differences (very basic)
        diffs = defaultdict(list)
        for line in diff_lines:
            d = eval(line)  # We have a dict
            for k in d:
                list_ids = d[k]  # Only one value, k is the frame
                for i in range(0, len(d[k]), 2):
                    diffs[d[k][i]].append(k)
        for id_ in diffs:
            diffs[id_].sort()
            for k, g in groupby(enumerate(diffs[id_]), lambda (i, x): i - x):
                group = map(itemgetter(1), g)
                print 'id {0} : {1} -> {2}'.format(id_, group[0], group[-1])
>>> compare(r'E:\CFM\Dev\Python\test\f1', r'E:\CFM\Dev\Python\test\f2')
id 2 : 17 -> 24
id 2 : 26 -> 26
id 3 : 16 -> 24
id 3 : 26 -> 26
id 10 : 23 -> 24
id 10 : 26 -> 26
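If you then want each id's range written to its own file, grouped into one folder per original input as described in the question, a possible follow-up in the same Python 2 style is sketched below. It assumes compare() is changed to return the diffs dict instead of only printing it, and the folder and file naming is my own assumption, not part of the answer:

import os
from itertools import groupby
from operator import itemgetter

def write_ranges(diffs, out_dir):
    # One folder per original file; one small text file per (id, frame range) pair.
    if not os.path.isdir(out_dir):
        os.makedirs(out_dir)
    for id_ in diffs:
        frames = sorted(diffs[id_])
        for k, g in groupby(enumerate(frames), lambda (i, x): i - x):
            group = map(itemgetter(1), g)
            name = 'id_{0}_frames_{1}_{2}.txt'.format(id_, group[0], group[-1])
            with open(os.path.join(out_dir, name), 'w') as fout:
                fout.write('{0} - {1}\n'.format(group[0], group[-1]))

# Example usage (hypothetical output folder name):
# write_ranges(compare(f1, f2), r'E:\CFM\Dev\Python\test\f1_diffs')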