I'm trying to compare a list lst with another list lst2 to see whether the values in lst correspond to a portion of lst2 in the same order. If they do not, I want to return the values that are not in the right position.
These are the examples:
lst = ['a', 'b', 'd', 'c', 'e']
lst2 = ['DD', 'OO', 'CC' ,'WW', 'GG', 'a', 'b', 'c', 'd', 'e', 'AA' 'QQ', 'EE', 'ZZ', 'XX', 'YY', 'UU', 'II', 'OO', 'HH']
Suppose that the values of lst will change: it will have a different length, with other string values added in the near future. The positions of 'GG' and 'AA' in lst2 will not change; only the values between them (here 'a' to 'e') will change to match lst, but the process will be the same.
Is it better to use pandas DataFrames, "\n".join() as string columns, or just lists?
The question is not entirely clear, but as I understand it, you want to find the elements of lst2 that do not appear in lst.
I checked both of the approaches below with the example inputs you provided:
lst = ['a', 'b', 'd', 'c', 'e']
lst2 = ['DD', 'OO', 'CC' ,'WW', 'GG', 'a', 'b', 'c', 'd', 'e', 'AA' 'QQ', 'EE', 'ZZ', 'XX', 'YY', 'UU', 'II', 'OO', 'HH']
If the order of appearance must be the same in both lists, you can use a trick based on string manipulation, like this:
lst_str = ' '.join(lst)
lst2_str = ' '.join(lst2)
index_found = lst2_str.find(lst_str)
lst4 = lst2_str.split()
if index_found != -1:
    lst4 = (lst2_str[0:index_found] + lst2_str[index_found+len(lst_str):]).split()
output:
['DD', 'OO', 'CC', 'WW', 'GG', 'a', 'b', 'c', 'd', 'e', 'AAQQ', 'EE', 'ZZ', 'XX', 'YY', 'UU', 'II', 'OO', 'HH']
This approach assumes that the elements of the lists contain no whitespace (you can of course use another separator).
Otherwise if the order does not matter, a simple list comprehension will do the work:
lst3 = [item for item in lst2 if item not in lst]
output:
['DD', 'OO', 'CC', 'WW', 'GG', 'AAQQ', 'EE', 'ZZ', 'XX', 'YY', 'UU', 'II', 'OO', 'HH']
The outputs differ because the order of the elements in lst is different from their order in lst2: the order-sensitive approach finds no match and removes nothing, while the order-insensitive one removes every element of lst.
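One caveat with the joined-string trick is that str.find can match across element boundaries (for example, 'a b' is found inside 'xa b'). A minimal sketch of the same order-sensitive idea done directly on lists, assuming a contiguous run is what is wanted; find_sublist and remove_sublist are hypothetical helper names, not part of the answer above:

```python
def find_sublist(haystack, needle):
    """Return the start index of the first contiguous run of needle in haystack, or -1."""
    n = len(needle)
    for start in range(len(haystack) - n + 1):
        # Compare whole elements, so partial string matches cannot occur.
        if haystack[start:start + n] == needle:
            return start
    return -1


def remove_sublist(haystack, needle):
    """Return a copy of haystack with the first contiguous run of needle removed."""
    start = find_sublist(haystack, needle)
    if start == -1:
        return list(haystack)
    return haystack[:start] + haystack[start + len(needle):]


sample = ['GG', 'a', 'b', 'c', 'd', 'e', 'AA']
print(remove_sublist(sample, ['a', 'b', 'c', 'd', 'e']))  # ['GG', 'AA']
print(remove_sublist(sample, ['a', 'b', 'd', 'c', 'e']))  # order differs, nothing removed
```

This avoids choosing a separator at all, at the cost of a slower scan on very long lists.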
A function I am writing will receive as input a matrix H=A x B x I x I, where each matrix is square and of dimension d, the cross refers to the Kronecker product np.kron and I is the identity np.eye(d). Thus
I = np.eye(d)
H = np.kron(A, B)
H = np.kron(H, I)
H = np.kron(H, I)
Given H and the above form, but without knowledge of A and B, I would like to construct G = I x A x I x B, i.e. the result of
G = np.kron(I, A)
G = np.kron(G, I)
G = np.kron(G, B)
It should be possible to do this by applying some permutation to H. How do I implement that permutation?
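Transposing with (2,0,3,1,6,4,7,5) (after expanding to 8 axes) appears to do it. Note that in the snippet below G is the input and H the target, i.e. the names are swapped relative to the question:
>>> from functools import reduce
>>> import numpy as np
>>>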
>>> A = np.random.randint(0,10,(10,10))
>>> B = np.random.randint(0,10,(10,10))
>>> I = np.identity(10, int)
>>> G = reduce(np.kron, (A,B,I,I))
>>> H = reduce(np.kron, (I,A,I,B))
>>>
>>>
>>> (G.reshape(*8*(10,)).transpose(2,0,3,1,6,4,7,5).reshape(10**4,10**4) == H).all()
True
Explanation: Let's look at a minimal example to understand how the Kronecker product relates to reshaping and axis shuffling.
Two 1D factors:
>>> A, B = np.arange(1,5), np.array(list("abcd"), dtype=object)
>>> np.kron(A, B)
array(['a', 'b', 'c', 'd', 'aa', 'bb', 'cc', 'dd', 'aaa', 'bbb', 'ccc',
       'ddd', 'aaaa', 'bbbb', 'cccc', 'dddd'], dtype=object)
We can observe that the arrangement is row-major-ish, so if we reshape we actually get the outer product:
>>> np.kron(A, B).reshape(4, 4)
array([['a', 'b', 'c', 'd'],
       ['aa', 'bb', 'cc', 'dd'],
       ['aaa', 'bbb', 'ccc', 'ddd'],
       ['aaaa', 'bbbb', 'cccc', 'dddd']], dtype=object)
>>> np.outer(A, B)
array([['a', 'b', 'c', 'd'],
       ['aa', 'bb', 'cc', 'dd'],
       ['aaa', 'bbb', 'ccc', 'ddd'],
       ['aaaa', 'bbbb', 'cccc', 'dddd']], dtype=object)
If we do the same with factors swapped we get the transpose:
>>> np.kron(B, A).reshape(4, 4)
array([['a', 'aa', 'aaa', 'aaaa'],
       ['b', 'bb', 'bbb', 'bbbb'],
       ['c', 'cc', 'ccc', 'cccc'],
       ['d', 'dd', 'ddd', 'dddd']], dtype=object)
With 2D factors things are similar:
>>> A2, B2 = A.reshape(2,2), B.reshape(2,2)
>>>
>>> np.kron(A2, B2)
array([['a', 'b', 'aa', 'bb'],
       ['c', 'd', 'cc', 'dd'],
       ['aaa', 'bbb', 'aaaa', 'bbbb'],
       ['ccc', 'ddd', 'cccc', 'dddd']], dtype=object)
>>> np.kron(A2, B2).reshape(2,2,2,2)
array([[[['a', 'b'],
         ['aa', 'bb']],
        [['c', 'd'],
         ['cc', 'dd']]],
       [[['aaa', 'bbb'],
         ['aaaa', 'bbbb']],
        [['ccc', 'ddd'],
         ['cccc', 'dddd']]]], dtype=object)
But there is a minor complication in that the corresponding outer product has axes arranged differently:
>>> np.multiply.outer(A2, B2)
array([[[['a', 'b'],
         ['c', 'd']],
        [['aa', 'bb'],
         ['cc', 'dd']]],
       [[['aaa', 'bbb'],
         ['ccc', 'ddd']],
        [['aaaa', 'bbbb'],
         ['cccc', 'dddd']]]], dtype=object)
We need to swap middle axes to get the same result.
>>> np.multiply.outer(A2, B2).swapaxes(1,2)
array([[[['a', 'b'],
         ['aa', 'bb']],
        [['c', 'd'],
         ['cc', 'dd']]],
       [[['aaa', 'bbb'],
         ['aaaa', 'bbbb']],
        [['ccc', 'ddd'],
         ['cccc', 'dddd']]]], dtype=object)
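So to get from a Kronecker product to the one with the factors swapped, we can proceed in three steps (each permutation below is cumulative, relative to the reshaped Kronecker product):
Swap the middle axes, (0,2,1,3); now we have the outer product.
Swap the factors, which exchanges the first two axes with the last two: (1,3,0,2).
Swap the middle axes again to go back to Kronecker form.
=> total axis permutation: (1,0,3,2)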
>>> np.all(np.kron(A2, B2).reshape(2,2,2,2).transpose(1,0,3,2).reshape(4,4) == np.kron(B2, A2))
True
Using the same principles leads to the recipe for the four factor original question.
This answer expands on Paul Panzer's correct answer to document how one would solve similar problems more generally.
Suppose we wish to map a matrix string reduce(kron, ABCD) into, for example, reduce(kron, CADB), where each matrix is d x d. Both of the strings are thus (d**4, d**4) matrices. Alternatively, they are [d,]*8 shaped arrays.
The way np.kron arranges data means that the index ordering of ABCD corresponds to that of its constituents as follows: (D_0 C_0 B_0 A_0 D_1 C_1 B_1 A_1), where for example D_0 (D_1) is the fastest (slowest) oscillating index in D. For CADB the index ordering is instead (B_0 D_0 A_0 C_0 B_1 D_1 A_1 C_1); you just read the string backwards, once for the faster and once for the slower indices. The appropriate permutation in this case is thus (2,0,3,1,6,4,7,5).
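The recipe above can be wrapped in a small helper. This is only a sketch under the stated assumptions (all factors are square with the same dimension d); reorder_kron and its order argument are hypothetical names introduced here for illustration:

```python
from functools import reduce

import numpy as np


def reorder_kron(M, d, order):
    """Reorder the factors of M = kron(F_0, ..., F_{n-1}), each d x d.

    order[k] is the index of the source factor that should end up at
    position k of the result.
    """
    n = len(order)
    # Row axes come first, column axes second; apply the same shuffle to both.
    axes = list(order) + [n + p for p in order]
    return M.reshape((d,) * (2 * n)).transpose(axes).reshape(d**n, d**n)


rng = np.random.default_rng(0)
d = 2
A, B, C, D = (rng.integers(0, 10, (d, d)) for _ in range(4))
ABCD = reduce(np.kron, (A, B, C, D))
CADB = reduce(np.kron, (C, A, D, B))
# order (2, 0, 3, 1) expands to the axis permutation (2,0,3,1,6,4,7,5)
print(np.array_equal(reorder_kron(ABCD, d, (2, 0, 3, 1)), CADB))
```

A small d keeps the check cheap; the same call with d = 10 reproduces the 10**4 x 10**4 case from the question.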
I have the following letters:
Letters = ["a", "b", "c", "d", "e"]
What I would like is to write a generator function that will create strings that can be formed by taking a combination of any of the letters, preferably in some deterministic order like from smallest to biggest.
So for example if I were to run the generator 20 times I would get
a
b
c
d
e
aa
ab
ac
ad
ae
ba
bb
bc
bd
be
ca
cb
cc
cd
ce
da
How would I write this generator?
Generator function:
from itertools import count, product
def wordgen(letters):
    # Words of length 1, then length 2, and so on, without end.
    for n in count(1):
        yield from map(''.join, product(letters, repeat=n))
Usage:
for word in wordgen('abcde'):
    print(word)
Output:
a
b
c
d
e
aa
ab
ac
ad
ae
ba
bb
bc
bd
be
ca
...
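Because the generator never terminates, the loop above runs forever. One way to take only the first few words, sketched here with itertools.islice (the count of 20 is taken from the question):

```python
from itertools import count, islice, product


def wordgen(letters):
    # Same generator as above: all words of length 1, then 2, and so on.
    for n in count(1):
        yield from map(''.join, product(letters, repeat=n))


# islice stops after 20 items without exhausting the infinite generator.
first_words = list(islice(wordgen('abcde'), 20))
print(first_words[:6])   # ['a', 'b', 'c', 'd', 'e', 'aa']
print(first_words[-1])   # ce
```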
A self-made alternative without using itertools:
def wordgen(letters):
    yield from letters
    for word in wordgen(letters):
        for letter in letters:
            yield word + letter
Golf-version (admittedly starts with the empty string):
def w(s):yield'';yield from(w+c for w in w(s)for c in s)
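Use the combinations functions from the itertools library. There are both combinations with replacement and without replacement. Note that combinations() ignores order, so it yields 'ab' but not 'ba' or 'aa'.
import itertools
for item in itertools.combinations(Letters, 2):
    print("".join(item))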
https://docs.python.org/3.4/library/itertools.html
Use itertools.product():
from itertools import product
letters = ["a", "b", "c", "d", "e"]
letters += map(''.join, product(letters, repeat=2))
print(letters)
['a', 'b', 'c', 'd', 'e', 'aa', 'ab', 'ac', 'ad', 'ae', 'ba', 'bb', 'bc', 'bd', 'be', 'ca', 'cb', 'cc', 'cd', 'ce', 'da', 'db', 'dc', 'dd', 'de', 'ea', 'eb', 'ec', 'ed', 'ee']
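I use a recursive generator function (without itertools):
Letters = ["a", "b", "c", "d", "e"]
def my_generator(letters, first=""):
    # First yield every word made of the prefix plus one letter.
    for letter in letters:
        yield first + letter
    # One sub-generator per letter produces the longer words.
    my_generators = []
    for letter in letters:
        my_generators.append(my_generator(letters, first + letter))
    i = 0
    while True:
        # Round i // len(letters) takes len(letters) ** (round + 1) words from each sub-generator in turn.
        for j in range(len(letters) ** (i // len(letters) + 1)):
            yield next(my_generators[i % len(letters)])
        i += 1
gen = my_generator(Letters)
[next(gen) for c in range(160)]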
you get
['a', 'b', 'c', 'd', 'e', 'aa', 'ab', 'ac', 'ad', 'ae', 'ba', 'bb',
'bc', 'bd', 'be', 'ca', 'cb', 'cc', 'cd', 'ce', 'da', 'db', 'dc',
'dd', 'de', 'ea', 'eb', 'ec', 'ed', 'ee', 'aaa', 'aab', 'aac', 'aad',
'aae', 'aba', 'abb', 'abc', 'abd', 'abe', 'aca', 'acb', 'acc', 'acd',
'ace', 'ada', 'adb', 'adc', 'add', 'ade', 'aea', 'aeb', 'aec', 'aed',
'aee', 'baa', 'bab', 'bac', 'bad', 'bae', 'bba', 'bbb', 'bbc', 'bbd',
'bbe', 'bca', 'bcb', 'bcc', 'bcd', 'bce', 'bda', 'bdb', 'bdc', 'bdd',
'bde', 'bea', 'beb', 'bec', 'bed', 'bee', 'caa', 'cab', 'cac', 'cad',
'cae', 'cba', 'cbb', 'cbc', 'cbd', 'cbe', 'cca', 'ccb', 'ccc', 'ccd',
'cce', 'cda', 'cdb', 'cdc', 'cdd', 'cde', 'cea', 'ceb', 'cec', 'ced',
'cee', 'daa', 'dab', 'dac', 'dad', 'dae', 'dba', 'dbb', 'dbc', 'dbd',
'dbe', 'dca', 'dcb', 'dcc', 'dcd', 'dce', 'dda', 'ddb', 'ddc', 'ddd',
'dde', 'dea', 'deb', 'dec', 'ded', 'dee', 'eaa', 'eab', 'eac', 'ead',
'eae', 'eba', 'ebb', 'ebc', 'ebd', 'ebe', 'eca', 'ecb', 'ecc', 'ecd',
'ece', 'eda', 'edb', 'edc', 'edd', 'ede', 'eea', 'eeb', 'eec', 'eed',
'eee', 'aaaa', 'aaab', 'aaac', 'aaad', 'aaae']
I want to sort this list:
>>> L = ['A', 'B', 'C', ... 'Z', 'AA', 'AB', 'AC', ... 'AZ', 'BA' ...]
Exactly the way it is, regardless of the contents (assuming all CAPS alpha).
>>> L.sort()
>>> L
['A', 'AA', 'AB', 'AC'...]
How can I make this:
>>> L.parkinglot_sort()
>>> L
['A', 'B', 'C', ... ]
I was thinking of testing for length, and sorting each length, and mashing all the separate 1-length, 2-length, n-length elements of L into the new L.
Thanks!
What about this?
l.sort(key=lambda element: (len(element), element))
It will sort the list taking into account not only each element, but also its length.
>>> l = ['A', 'AA', 'B', 'BB', 'C', 'CC']
>>> l.sort(key=lambda element: (len(element), element))
>>> print(l)
['A', 'B', 'C', 'AA', 'BB', 'CC']
I don't know of a better way to word what I'm looking for, so please bear with me.
Let's say that I have a list of 17 elements. For the sake of brevity we'll represent this list as ABCDEFGHIJKLMNOPQ. If I wanted to divide this into 7 sufficiently "even" sub-lists, it might look like this:
ABC DE FGH IJ KL MNO PQ
Here, the lengths of each sub-list are 3, 2, 3, 2, 2, 3, 2. The maximum length is only one more than the minimum length: ABC DE FGH I JKL MN OPQ has seven sub-lists as well, but the range of lengths is two here.
Furthermore, examine how many 2's separate each pair of 3's: this follows the same rule of RANGE ≤ 1. The range of lengths in ABC DEF GH IJ KLM NO PQ is 1 as well, but they are imbalanced: 3, 3, 2, 2, 3, 2, 2. Ideally, if one were to keep reducing the sub-list in such a fashion, the numbers would never deviate from one another by more than one.
Of course, there is more than one way to "evenly" divide a list into sub-lists in this fashion. I'm not looking for an exhaustive set of solutions - if I can get one solution in Python for a list of any length and any number of sub-lists, that's good enough for me. The problem is that I don't even know where to begin when solving such a problem. Does anyone know what I'm looking for?
>>> s='ABCDEFGHIJKLMNOPQ'
>>> parts=7
>>> [s[i*len(s)//parts:(i+1)*len(s)//parts] for i in range(parts)]
['AB', 'CD', 'EFG', 'HI', 'JKL', 'MN', 'OPQ']
>>> import string
>>> for j in range(26):
...     print([string.ascii_uppercase[i*j//parts:(i+1)*j//parts] for i in range(parts)])
...
['', '', '', '', '', '', '']
['', '', '', '', '', '', 'A']
['', '', '', 'A', '', '', 'B']
['', '', 'A', '', 'B', '', 'C']
['', 'A', '', 'B', '', 'C', 'D']
['', 'A', 'B', '', 'C', 'D', 'E']
['', 'A', 'B', 'C', 'D', 'E', 'F']
['A', 'B', 'C', 'D', 'E', 'F', 'G']
['A', 'B', 'C', 'D', 'E', 'F', 'GH']
['A', 'B', 'C', 'DE', 'F', 'G', 'HI']
['A', 'B', 'CD', 'E', 'FG', 'H', 'IJ']
['A', 'BC', 'D', 'EF', 'G', 'HI', 'JK']
['A', 'BC', 'DE', 'F', 'GH', 'IJ', 'KL']
['A', 'BC', 'DE', 'FG', 'HI', 'JK', 'LM']
['AB', 'CD', 'EF', 'GH', 'IJ', 'KL', 'MN']
['AB', 'CD', 'EF', 'GH', 'IJ', 'KL', 'MNO']
['AB', 'CD', 'EF', 'GHI', 'JK', 'LM', 'NOP']
['AB', 'CD', 'EFG', 'HI', 'JKL', 'MN', 'OPQ']
['AB', 'CDE', 'FG', 'HIJ', 'KL', 'MNO', 'PQR']
['AB', 'CDE', 'FGH', 'IJ', 'KLM', 'NOP', 'QRS']
['AB', 'CDE', 'FGH', 'IJK', 'LMN', 'OPQ', 'RST']
['ABC', 'DEF', 'GHI', 'JKL', 'MNO', 'PQR', 'STU']
['ABC', 'DEF', 'GHI', 'JKL', 'MNO', 'PQR', 'STUV']
['ABC', 'DEF', 'GHI', 'JKLM', 'NOP', 'QRS', 'TUVW']
['ABC', 'DEF', 'GHIJ', 'KLM', 'NOPQ', 'RST', 'UVWX']
['ABC', 'DEFG', 'HIJ', 'KLMN', 'OPQ', 'RSTU', 'VWXY']
If you have a list of length N, and you want some number of sub-lists S, it seems to me that you should start with a division with remainder. For N == 17 and S == 7, you have 17 // 7 == 2 and 17 % 7 == 3. So you can start with 7 length values of 2, but know that you need to increment 3 of the length values by 1 to handle the remainder. Since your list of length values is length 7, and you have 3 values to increment, you could compute X = 7 / 3 and use that as a stride: increment the 0th item, then the int(X) item, the int(2*X) item, and so on.
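The division-with-remainder idea above can be sketched as follows. This is one possible reading of the description, not a definitive implementation; split_evenly is a hypothetical name, and the stride uses integer arithmetic (k * parts // extra) in place of int(k * X):

```python
def split_evenly(seq, parts):
    """Split seq into `parts` slices whose lengths differ by at most one."""
    base, extra = divmod(len(seq), parts)
    lengths = [base] * parts
    # Spread the `extra` longer slices with a stride of parts / extra.
    for k in range(extra):
        lengths[k * parts // extra] += 1
    result, start = [], 0
    for length in lengths:
        result.append(seq[start:start + length])
        start += length
    return result


print(split_evenly('ABCDEFGHIJKLMNOPQ', 7))
# ['ABC', 'DE', 'FGH', 'IJ', 'KLM', 'NO', 'PQ']
```

For N == 17 and S == 7 the incremented positions are 0, 2 and 4, giving the lengths 3, 2, 3, 2, 3, 2, 2.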
If that doesn't work for you, I suggest you get a book called The Algorithm Design Manual by Skiena, and look through the set and tree algorithms.
http://www.algorist.com/
See the "grouper" example at http://docs.python.org/library/itertools.html