A function I am writing will receive as input a matrix H=A x B x I x I, where each matrix is square and of dimension d, the cross refers to the Kronecker product np.kron and I is the identity np.eye(d). Thus
I = np.eye(d)
H = np.kron(A, B)
H = np.kron(H, I)
H = np.kron(H, I)
Given H and the above form, but without knowledge of A and B, I would like to construct G = I x A x I x B, i.e. the result of
G = np.kron(I, A)
G = np.kron(G, I)
G = np.kron(G, B)
It should be possible to do this by applying some permutation to H. How do I implement that permutation?
Reshaping to 8 axes and transposing with (2,0,3,1,6,4,7,5) appears to do it (note that the names G and H below are swapped relative to the question):
>>> import numpy as np
>>> from functools import reduce
>>>
>>> A = np.random.randint(0,10,(10,10))
>>> B = np.random.randint(0,10,(10,10))
>>> I = np.identity(10, int)
>>> G = reduce(np.kron, (A,B,I,I))
>>> H = reduce(np.kron, (I,A,I,B))
>>>
>>>
>>> (G.reshape(*8*(10,)).transpose(2,0,3,1,6,4,7,5).reshape(10**4,10**4) == H).all()
True
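For a quicker check that does not allocate 10**4 by 10**4 arrays, the same permutation can be tried with small factors. The sketch below uses the question's naming (H is the input, G the target); d = 3 and the random seed are arbitrary choices made here for illustration.

```python
import numpy as np
from functools import reduce

d = 3  # small dimension chosen for illustration
rng = np.random.default_rng(0)
A = rng.integers(0, 10, (d, d))
B = rng.integers(0, 10, (d, d))
I = np.eye(d, dtype=int)

H = reduce(np.kron, (A, B, I, I))           # input, as in the question
G_expected = reduce(np.kron, (I, A, I, B))  # desired output

# Split rows and columns into four axes each, shuffle them, merge again.
G = H.reshape((d,) * 8).transpose(2, 0, 3, 1, 6, 4, 7, 5).reshape(d**4, d**4)
print(np.array_equal(G, G_expected))  # True
```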
Explanation: Let's look at a minimal example to understand how the Kronecker product relates to reshaping and axis shuffling.
Two 1D factors:
>>> A, B = np.arange(1,5), np.array(list("abcd"), dtype=object)
>>> np.kron(A, B)
array(['a', 'b', 'c', 'd', 'aa', 'bb', 'cc', 'dd', 'aaa', 'bbb', 'ccc',
'ddd', 'aaaa', 'bbbb', 'cccc', 'dddd'], dtype=object)
We can observe that the arrangement is row-major-ish, so if we reshape we actually get the outer product:
>>> np.kron(A, B).reshape(4, 4)
array([['a', 'b', 'c', 'd'],
['aa', 'bb', 'cc', 'dd'],
['aaa', 'bbb', 'ccc', 'ddd'],
['aaaa', 'bbbb', 'cccc', 'dddd']], dtype=object)
>>> np.outer(A, B)
array([['a', 'b', 'c', 'd'],
['aa', 'bb', 'cc', 'dd'],
['aaa', 'bbb', 'ccc', 'ddd'],
['aaaa', 'bbbb', 'cccc', 'dddd']], dtype=object)
If we do the same with factors swapped we get the transpose:
>>> np.kron(B, A).reshape(4, 4)
array([['a', 'aa', 'aaa', 'aaaa'],
['b', 'bb', 'bbb', 'bbbb'],
['c', 'cc', 'ccc', 'cccc'],
['d', 'dd', 'ddd', 'dddd']], dtype=object)
With 2D factors things are similar
>>> A2, B2 = A.reshape(2,2), B.reshape(2,2)
>>>
>>> np.kron(A2, B2)
array([['a', 'b', 'aa', 'bb'],
['c', 'd', 'cc', 'dd'],
['aaa', 'bbb', 'aaaa', 'bbbb'],
['ccc', 'ddd', 'cccc', 'dddd']], dtype=object)
>>> np.kron(A2, B2).reshape(2,2,2,2)
array([[[['a', 'b'],
['aa', 'bb']],
[['c', 'd'],
['cc', 'dd']]],
[[['aaa', 'bbb'],
['aaaa', 'bbbb']],
[['ccc', 'ddd'],
['cccc', 'dddd']]]], dtype=object)
But there is a minor complication in that the corresponding outer product has axes arranged differently:
>>> np.multiply.outer(A2, B2)
array([[[['a', 'b'],
['c', 'd']],
[['aa', 'bb'],
['cc', 'dd']]],
[[['aaa', 'bbb'],
['ccc', 'ddd']],
[['aaaa', 'bbbb'],
['cccc', 'dddd']]]], dtype=object)
We need to swap middle axes to get the same result.
>>> np.multiply.outer(A2, B2).swapaxes(1,2)
array([[[['a', 'b'],
['aa', 'bb']],
[['c', 'd'],
['cc', 'dd']]],
[[['aaa', 'bbb'],
['aaaa', 'bbbb']],
[['ccc', 'ddd'],
['cccc', 'dddd']]]], dtype=object)
So if we want to go to the swapped Kronecker product, we can proceed in three steps (each tuple is the cumulative permutation of the original axes):
swap the middle axes: (0,2,1,3); now we have the outer product
swap the factors, which exchanges the first two axes with the second two: (1,3,0,2)
go back to Kronecker form by swapping the middle axes again
=> total axis permutation: (1,0,3,2)
>>> np.all(np.kron(A2, B2).reshape(2,2,2,2).transpose(1,0,3,2).reshape(4,4) == np.kron(B2, A2))
True
Using the same principles leads to the recipe for the four factor original question.
This answer expands on Paul Panzer's correct answer to document how one would solve similar problems more generally.
Suppose we wish to map a matrix string reduce(kron, ABCD) into, for example, reduce(kron, CADB), where each matrix is d by d. Both of the strings are thus (d**4, d**4) matrices. Alternatively, they are [d,]*8 shaped arrays.
The way np.kron arranges data means that the index ordering of ABCD corresponds to that of its constituents as follows: D_0 C_0 B_0 A_0 D_1 C_1 B_1 A_1, where for example D_0 (D_1) is the fastest (slowest) oscillating index in D. For CADB the index ordering is instead B_0 D_0 A_0 C_0 B_1 D_1 A_1 C_1; you just read the string backwards, once for the faster and once for the slower indices. Numbering the positions in the first ordering 0 through 7 and looking up where each index of the second ordering sits gives the permutation (2,0,3,1,6,4,7,5).
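The recipe can be wrapped in a small helper. This is only a sketch: the name permute_kron_factors and its order argument are made up here, and it assumes every factor is a square d by d matrix. It counts axes in NumPy's own order (axis 0 is the row index of the first factor), which for this example yields the same permutation tuple.

```python
import numpy as np
from functools import reduce

def permute_kron_factors(M, d, order):
    """Reorder the factors of a Kronecker product of n square d-by-d matrices.

    order[k] is the position, in the input product, of the factor that
    should end up at position k of the output product (hypothetical helper).
    """
    n = len(order)
    # After reshaping, axes 0..n-1 are the row indices of the factors
    # and axes n..2n-1 are the matching column indices.
    perm = list(order) + [n + k for k in order]
    return M.reshape((d,) * (2 * n)).transpose(perm).reshape(d**n, d**n)

d = 2
rng = np.random.default_rng(0)
A, B, C, D = (rng.integers(0, 10, (d, d)) for _ in range(4))

abcd = reduce(np.kron, (A, B, C, D))
cadb = permute_kron_factors(abcd, d, (2, 0, 3, 1))  # axes (2,0,3,1,6,4,7,5)
print(np.array_equal(cadb, reduce(np.kron, (C, A, D, B))))  # True
```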
df
col1 col2
['aa', 'bb', 'cc', 'dd'] [['ee', 'ff', 'gg', 'hh'], ['qq', 'ww', 'ee', 'rr']]
['ss', 'dd', 'ff', 'gg'] [['mm', 'nn', 'vv', 'cc'], ['zz', 'aa', 'jj', 'kk']]
['ss', 'dd'] [['mm', 'nn', 'vv', 'cc'], ['zz', 'aa', 'jj', 'kk']]
I'd like to be able to run a function that concats the first list element in col1 to the first sublist elements (there are multiple sublists) in col2, then concats the second list element in col1 to the second sublist elements in col2.
Results would be like this column:
results
[['aaee', 'bbff', 'ccgg', 'ddhh'],['aaqq', 'bbww', 'ccee', 'ddrr']]
[['ssmm', 'ddnn', 'ffvv', 'ggcc'],['sszz', 'ddaa', 'ffjj', 'ggkk']]
[['ssmm', 'ddnn'],['sszz', 'ddaa']]
I'm thinking it would have something to do with looping through the first elements in col1 and somehow loop and match them to the corresponding items in each sublist in col2 - how can I do this?
Converted code
[[[df1.agg(lambda x: get_top_matches(u,w), axis=1) for u,w in zip(x,v)]\
for v in y] for x,y in zip(df1['parent_org_name_list'], df1['children_org_name_sublists'])]
Results:
You can just use zip here:
[[[u+w for u,w in zip(x,v)] for v in y] for x,y in zip(df['col1'], df['col2'])]
Output:
[[['aaee', 'bbff', 'ccgg', 'ddhh'], ['aaqq', 'bbww', 'ccee', 'ddrr']],
[['ssmm', 'ddnn', 'ffvv', 'ggcc'], ['sszz', 'ddaa', 'ffjj', 'ggkk']],
[['ssmm', 'ddnn'], ['sszz', 'ddaa']]]
To assign back to your dataframe, you can do:
df['results'] = [[[u+w for u,w in zip(x,v)] for v in y]
for x,y in zip(df['col1'], df['col2'])]
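To see why the third row comes out with two-element sublists, it may help to run the same comprehension on plain lists. In this sketch the helper name concat_pairwise is made up for illustration; the point is that zip() stops at the shorter of its inputs.

```python
def concat_pairwise(prefixes, sublists):
    """Prepend prefixes[k] to the k-th item of every sublist (hypothetical helper)."""
    # zip() stops at the shorter input, so surplus items in a sublist are dropped.
    return [[p + s for p, s in zip(prefixes, sub)] for sub in sublists]

col1 = [["aa", "bb", "cc", "dd"], ["ss", "dd"]]
col2 = [
    [["ee", "ff", "gg", "hh"], ["qq", "ww", "ee", "rr"]],
    [["mm", "nn", "vv", "cc"], ["zz", "aa", "jj", "kk"]],
]

results = [concat_pairwise(x, y) for x, y in zip(col1, col2)]
print(results[1])  # [['ssmm', 'ddnn'], ['sszz', 'ddaa']]
```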
Max, try this solution with a loop. It allows finer control over the transformation, including dealing with uneven lengths (see len_limit in the example):
import pandas as pd
df = pd.DataFrame({'c1': [['aa', 'bb', 'cc', 'dd'], ['ss', 'dd', 'ff', 'gg']],
                   'c2': [[['ee', 'ff', 'gg', 'hh'], ['qq', 'ww', 'ee', 'rr']],
                          [['mm', 'nn', 'vv', 'cc'], ['zz', 'aa', 'jj', 'kk']]]})
df['c3'] = 'empty'  # send string to 'c3' so it is object data type
print(df)
c1 c2 c3
0 [aa, bb, cc, dd] [[ee, ff, gg, hh], [qq, ww, ee, rr]] empty
1 [ss, dd, ff, gg] [[mm, nn, vv, cc], [zz, aa, jj, kk]] empty
for i, row in df.iterrows():
    c3_list = []
    len_limit = len(row['c1'])  # truncate each sublist to the length of c1
    for c2_sublist in row['c2']:
        c3_list.append([j1 + j2 for j1, j2 in zip(row['c1'], c2_sublist[:len_limit])])
    df.at[i, 'c3'] = c3_list
print(df['c3'])
0 [[aaee, bbff, ccgg, ddhh], [aaqq, bbww, ccee, ...
1 [[ssmm, ddnn, ffvv, ggcc], [sszz, ddaa, ffjj, ...
Name: c3, dtype: object
Try:
df["results"] = df[["col1", "col2"]].apply(lambda x: [list(map(''.join, zip(x["col1"], el))) for el in x["col2"]], axis=1)
Outputs:
>>> df["results"]
0 [[aaee, bbff, ccgg, ddhh], [aaqq, bbww, ccee, ...
1 [[ssmm, ddnn, ffvv, ggcc], [sszz, ddaa, ffjj, ...
2 [[ssmm, ddnn], [sszz, ddaa]]
I need a list of unique 6-character elements, like 000001, 000002, 000003 etc. They don't necessarily have to be digits; they can be strings, like AAAAAA, AAAAAB, ABCDEF etc.
If I generate a list with np.arange() I won't get 6-character elements. So far I have only come up with nested 'for' loops like
but I think there are a lot of more convenient ways to do this.
You need the Cartesian product of the string "ABCDEF" with itself, taken five times (in other words, the product of six identical strings). It can be calculated using the product() function from the itertools module. The result of the product is an iterator of 6-tuples of individual characters. The tuples are converted to strings with join().
from itertools import product
symbols = "ABCDEF"
[''.join(x) for x in product(*([symbols] * len(symbols)))]
#['AAAAAA', 'AAAAAB', 'AAAAAC', 'AAAAAD', 'AAAAAE',
# 'AAAAAF', 'AAAABA', 'AAAABB', 'AAAABC', 'AAAABD',...
# 'FFFFFA', 'FFFFFB', 'FFFFFC', 'FFFFFD', 'FFFFFE', 'FFFFFF']
You can change the value of symbols to any other combination of distinct characters.
You can use the function combinations_with_replacement():
from itertools import combinations_with_replacement
list(map(''.join, combinations_with_replacement('ABC', r=3)))
# ['AAA', 'AAB', 'AAC', 'ABB', 'ABC', 'ACC', 'BBB', 'BBC', 'BCC', 'CCC']
If you need all possible combinations use the function product():
from itertools import product
list(map(''.join, product('ABC', repeat=3)))
# ['AAA', 'AAB', 'AAC', 'ABA', 'ABB', 'ABC', 'ACA', 'ACB', 'ACC', 'BAA', 'BAB', 'BAC', 'BBA', 'BBB', 'BBC', 'BCA', 'BCB', 'BCC', 'CAA', 'CAB', 'CAC', 'CBA', 'CBB', 'CBC', 'CCA', 'CCB', 'CCC']
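Since product() is lazy, it is not necessary to build all 6**6 strings when only the first few are needed. The sketch below combines it with itertools.islice; the helper name first_codes is made up for illustration. For the purely numeric variant from the question, zero-padded formatting is enough.

```python
from itertools import islice, product

def first_codes(symbols, width, count):
    """Return the first `count` strings of length `width` over `symbols` (hypothetical helper)."""
    return ["".join(t) for t in islice(product(symbols, repeat=width), count)]

codes = first_codes("ABCDEF", 6, 5)
print(codes)    # ['AAAAAA', 'AAAAAB', 'AAAAAC', 'AAAAAD', 'AAAAAE']

# Numeric codes such as 000001, 000002, ... via zero-padded formatting.
numeric = [f"{i:06d}" for i in range(1, 4)]
print(numeric)  # ['000001', '000002', '000003']
```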
You can use np.unravel_index to get an index array:
idx = np.array(np.unravel_index(np.arange(30000), 6*(6,)), order='F').T
idx
# array([[0, 0, 0, 0, 0, 0],
# [0, 0, 0, 0, 0, 1],
# [0, 0, 0, 0, 0, 2],
# ...,
# [3, 5, 0, 5, 1, 3],
# [3, 5, 0, 5, 1, 4],
# [3, 5, 0, 5, 1, 5]])
You can replace the indices with more or less anything you like afterwards:
symbols = np.fromiter('ABCDEF', 'U1')
symbols
# array(['A', 'B', 'C', 'D', 'E', 'F'], dtype='<U1')
symbols[idx]
# array([['A', 'A', 'A', 'A', 'A', 'A'],
# ['A', 'A', 'A', 'A', 'A', 'B'],
# ['A', 'A', 'A', 'A', 'A', 'C'],
# ...,
# ['D', 'F', 'A', 'F', 'B', 'D'],
# ['D', 'F', 'A', 'F', 'B', 'E'],
# ['D', 'F', 'A', 'F', 'B', 'F']], dtype='<U1')
If you need the result as a list of words:
final = symbols[idx].view('U6').ravel().tolist()
final[:20]
# ['AAAAAA', 'AAAAAB', 'AAAAAC', 'AAAAAD', 'AAAAAE', 'AAAAAF', 'AAAABA', 'AAAABB', 'AAAABC', 'AAAABD', 'AAAABE', 'AAAABF', 'AAAACA', 'AAAACB', 'AAAACC', 'AAAACD', 'AAAACE', 'AAAACF', 'AAAADA', 'AAAADB']
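As a sanity check, the index-array approach can be compared with itertools.product on a case small enough to inspect by hand. This sketch assumes three symbols and words of length two, both picked only for illustration.

```python
import numpy as np
from itertools import product

n_symbols, width = 3, 2  # small sizes chosen for illustration
symbols = np.array(list("ABC"))

# Row k of idx holds the base-3 digits of k, most significant digit first.
flat = np.arange(n_symbols**width)
idx = np.array(np.unravel_index(flat, width * (n_symbols,))).T

words = ["".join(row) for row in symbols[idx]]
expected = ["".join(t) for t in product("ABC", repeat=width)]
print(words == expected)  # True
```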
My question is how to get the indices of an array of strings that would sort another array.
I have these two arrays of strings:
A = np.array([ 'a', 'b', 'c', 'd' ])
B = np.array([ 'd', 'b', 'a', 'c' ])
I would like to get the indices that would reorder the second one so that it matches the first.
I have tried the np.argsort function, giving the second array (converted to a list) as order, but it doesn't seem to work.
Any help would be much appreciated.
Thanks and best regards,
Bradipo
edit:
def sortedIndxs(arr):
???
such that
sortedIndxs([ 'd', 'b', 'a', 'c' ]) = [2,1,3,0]
A vectorised approach is possible via numpy.searchsorted together with numpy.argsort:
import numpy as np
A = np.array(['a', 'b', 'c', 'd'])
B = np.array(['d', 'b', 'a', 'c'])
xsorted = np.argsort(B)
res = xsorted[np.searchsorted(B[xsorted], A)]
print(res)
[2 1 3 0]
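A variant of the same idea passes the argsort result through the sorter argument of np.searchsorted, which avoids building the sorted copy B[xsorted]. This is a sketch that assumes every element of A occurs in B exactly once; otherwise the indices are not meaningful.

```python
import numpy as np

A = np.array(['a', 'b', 'c', 'd'])
B = np.array(['d', 'b', 'a', 'c'])

order = np.argsort(B)
# With sorter=order, searchsorted treats B[order] as the sorted array.
res = order[np.searchsorted(B, A, sorter=order)]

print(res)     # [2 1 3 0]
print(B[res])  # ['a' 'b' 'c' 'd']
```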
This code builds a lookup table that maps one arbitrary permutation onto another.
creating indexTable: O(n)
looking up in indexTable: O(n)
Total: O(n)
A = [ 'a', 'b', 'c', 'd' ]
B = [ 'd', 'b', 'a', 'c' ]
indexTable = {k: v for v, k in enumerate(B)}
# {'d': 0, 'b': 1, 'a': 2, 'c': 3}
result = [indexTable[k] for k in A]
# [2, 1, 3, 0]
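Wrapped as a function in the spirit of the question's sortedIndxs, the lookup-table idea might look like the sketch below. The name sorted_indxs and the explicit target argument are choices made here; the duplicate check is an added assumption, since a dict can only remember one position per value.

```python
def sorted_indxs(arr, target):
    """Indices i such that [arr[k] for k in i] equals target (hypothetical helper)."""
    position = {}
    for i, value in enumerate(arr):
        if value in position:
            # A repeated value would make the mapping ambiguous.
            raise ValueError(f"duplicate value {value!r} in arr")
        position[value] = i
    # Raises KeyError if target contains a value that is missing from arr.
    return [position[value] for value in target]

A = ['a', 'b', 'c', 'd']
B = ['d', 'b', 'a', 'c']
idx = sorted_indxs(B, A)
print(idx)                  # [2, 1, 3, 0]
print([B[k] for k in idx])  # ['a', 'b', 'c', 'd']
```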
I want to sort this list:
>>> L = ['A', 'B', 'C', ... 'Z', 'AA', 'AB', 'AC', ... 'AZ', 'BA' ...]
Exactly the way it is, regardless of the contents (assuming all CAPS alpha).
>>> L.sort()
>>> L
['A', 'AA', 'AB', 'AC'...]
How can I make this:
>>> L.parkinglot_sort()
>>> L
['A', 'B', 'C', ... ]
I was thinking of testing for length, and sorting each length, and mashing all the separate 1-length, 2-length, n-length elements of L into the new L.
Thanks!
What about this?
l.sort(key=lambda element: (len(element), element))
It will sort the list taking into account not only each element, but also its length.
>>> l = ['A', 'AA', 'B', 'BB', 'C', 'CC']
>>> l.sort(key=lambda element: (len(element), element))
>>> print(l)
['A', 'B', 'C', 'AA', 'BB', 'CC']
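The same key also works with sorted(), which returns a new list instead of sorting in place. A short sketch on spreadsheet-style column names (the sample data is made up for illustration): tuples compare element by element, so length decides first and the string itself breaks ties.

```python
cols = ['AA', 'B', 'Z', 'A', 'AB', 'BA', 'AZ']

# Shorter names come first; names of equal length are ordered alphabetically.
ordered = sorted(cols, key=lambda s: (len(s), s))
print(ordered)  # ['A', 'B', 'Z', 'AA', 'AB', 'AZ', 'BA']
```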
Given a character array:
In [21]: x = np.array(['a ','bb ','cccc '])
One can remove the whitespace using:
In [22]: np.char.strip(x)
Out[22]:
array(['a', 'bb', 'cccc'],
dtype='|S8')
but is there a way to also shrink the width of the column to the minimum required size, in the above case |S4?
Do you just want to change the data type?
import numpy as NP
a = NP.array(["a", "bb", "ccc"])
a
# returns array(['a', 'bb', 'ccc'], dtype='|S3')
a = NP.array(a, dtype="|S8") # change dtype
# returns array(['a', 'bb', 'ccc'], dtype='|S8')
a = NP.array(a, dtype="|S3") # change it back
# returns array(['a', 'bb', 'ccc'], dtype='|S3')
>>> x = np.array(['a ','bb ','cccc '])
>>> x = np.array([s.strip() for s in x])
>>> x
array(['a', 'bb', 'cccc'],
dtype='|S4')
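If the list comprehension is too slow for large arrays, one possible alternative is to strip with np.char.strip, measure the longest remaining string with np.char.str_len, and cast to that width. This is a sketch for a unicode array (dtype 'U' rather than the byte strings shown above), and it assumes the array is not empty, since max() of an empty array raises.

```python
import numpy as np

x = np.array(['a   ', 'bb  ', 'cccc    '])

stripped = np.char.strip(x)
# Longest string after stripping decides the minimal item width.
width = int(np.char.str_len(stripped).max())
compact = stripped.astype(f'U{width}')

print(compact.dtype == np.dtype('U4'))  # True
print(compact.tolist())                 # ['a', 'bb', 'cccc']
```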