Subtraction between two nested lists of strings in Python - python

I am trying to follow the structure used in this question for nested lists, but I'm confused and can't figure it out. To subtract the two lists a = ['5', '35.1', 'FFD'] and b = ['8.5', '11.3', 'AMM'], the following code computes c = b - a:
diffs = []
for i, j in zip(a, b):
    try:
        diffs.append(str(float(j) - float(i)))
    except ValueError:
        diffs.append('-'.join([j, i]))
>>> print(diffs)
['3.5', '-23.8', 'AMM-FFD']
My question is, how do I get C = B - A by considering the following structure:
A = [['X1','X2'],['52.3','119.4'],['45.1','111']]
B = [['Y1','Y2'],['66.9','65'],['99','115.5']]
C = [['Y1-X1','Y2-X2'],['14.6','-54.4'],['53.9','4.5']]
and how do I get the first and second elements of each inner list, something like:
Array 1 = ['Y1-X1', '14.6', '53.9']
Array 2 = ['Y2-X2', '-54.4', '4.5']
I appreciate any kind of help.

Well, if the lists are guaranteed to always be nested exactly two levels deep, you can simply add one more loop:
diffs_lists = []
for i, j in zip(A, B):
    diffs = []
    for k, l in zip(i, j):
        try:
            # B - A, as asked in the question
            diffs.append(str(float(l) - float(k)))
        except ValueError:
            diffs.append('-'.join([l, k]))
    diffs_lists.append(diffs)
To separate the result in two as you asked, simply use zip:
list(zip(*diffs_lists))

You just need another level of looping:
res = []
for a, b in zip(A, B):
    diffs = []
    res.append(diffs)
    for i, j in zip(a, b):
        try:
            diffs.append(str(float(j) - float(i)))
        except ValueError:
            diffs.append('-'.join([j, i]))
print(res)
#[['Y1-X1', 'Y2-X2'], ['14.600000000000009', '-54.400000000000006'], ['53.9', '4.5']]
print(list(zip(*res)))
#[('Y1-X1', '14.600000000000009', '53.9'), ('Y2-X2', '-54.400000000000006', '4.5')]
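The long strings such as '14.600000000000009' come from binary floating-point rounding. If the exact decimal strings from the question are wanted, one option is the standard-library decimal module instead of float. This is only a minimal sketch: the helper name subtract_cell is made up for illustration, and it assumes every numeric cell is a plain decimal literal.

```python
from decimal import Decimal, InvalidOperation

A = [['X1', 'X2'], ['52.3', '119.4'], ['45.1', '111']]
B = [['Y1', 'Y2'], ['66.9', '65'], ['99', '115.5']]

def subtract_cell(b, a):
    """Hypothetical helper: return b - a as a string, or 'b-a' for non-numeric cells."""
    try:
        # Decimal parses the strings exactly, so no binary rounding noise appears
        return str(Decimal(b) - Decimal(a))
    except InvalidOperation:
        # Raised by Decimal() for strings such as 'X1' that are not numbers
        return '-'.join([b, a])

# One row of C per pair of rows from B and A
C = [[subtract_cell(b, a) for b, a in zip(row_b, row_a)]
     for row_b, row_a in zip(B, A)]

# Transpose C to get the first and second elements of every inner list
array1, array2 = (list(col) for col in zip(*C))

print(C)
print(array1)
print(array2)
```

With the sample data, C equals the list given in the question.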

diffs = []
for sub_b, sub_a in zip(B, A):
    curr = []
    for atom_b, atom_a in zip(sub_b, sub_a):
        try:
            curr.append(float(atom_b) - float(atom_a))
        except ValueError:
            curr.append('-'.join([atom_b, atom_a]))
    diffs.append(curr)
ans1, ans2 = zip(*diffs)
The zip function can also be used to unzip iterables.
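As a small illustration of that remark, here is a sketch in which zip(*rows) turns a list of rows into a tuple per column, and applying it a second time gets the rows back. The variable names are chosen only for this example.

```python
rows = [['Y1-X1', 'Y2-X2'], ['14.6', '-54.4'], ['53.9', '4.5']]

# Unpacking rows with * passes each inner list as a separate argument,
# so zip pairs up the first items, then the second items, and so on.
columns = list(zip(*rows))
first, second = columns

# Transposing again restores the original row layout (as lists here).
restored = [list(r) for r in zip(*columns)]

print(first)
print(second)
print(restored == rows)
```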

Suppose you have a list_diffs function, that is basically the code you provided:
def list_diffs(a, b):
    diffs = []
    for i, j in zip(a, b):
        try:
            diffs.append(str(float(j) - float(i)))
        except ValueError:
            diffs.append('-'.join([j, i]))
    return diffs
Then, the C you want is just a list whose elements are diffs between elements of A and elements of B. So the following gives you C:
C = []
for i in range(len(A)):
    C.append(list_diffs(A[i], B[i]))
To get the lists of the first and of the second elements:
array1 = [c[0] for c in C]
array2 = [c[1] for c in C]

In case you need this to work with arbitrary amount of nesting you could use recursion:
def subtract(x, y):
    diffs = []
    for a, b in zip(x, y):
        try:
            if isinstance(a, list):
                diffs.append(subtract(a, b))
            else:
                diffs.append(str(float(b) - float(a)))
        except ValueError:
            diffs.append('-'.join([b, a]))
    return diffs
As others have pointed out zip can be used for unzipping:
res = subtract(A, B)
t1, t2 = zip(*res)
print(t1)
print(t2)
Output:
('Y1-X1', '14.600000000000009', '53.9')
('Y2-X2', '-54.400000000000006', '4.5')

I tried it with a recursive method:
A = [['X1','X2'],['52.3','119.4'],['45.1','111']]
B = [['Y1','Y2'],['66.9','65'],['99','115.5']]
C = [['Y1-X1','Y2-X2'],['14.6','-54.4'],['53.9','4.5']]
Array_a, Array_b = [[] for __ in range(2)]
def diff(B, A):
    _a = 0
    for b, a in zip(B, A):
        if isinstance(b, list):
            diff(b, a)
        else:
            try:
                Array_b.append(float(b)-float(a)) if _a else Array_a.append(float(b)-float(a))
                _a = True
            except (ValueError, TypeError) as e:
                Array_b.append("{0}-{1}".format(b, a)) if _a else Array_a.append("{0}-{1}".format(b, a))
                _a = True
    return (Array_a, Array_b)
print(diff(B, A))
>>>(['Y1-X1', 14.600000000000009, 53.9], ['Y2-X2', -54.400000000000006, 4.5])

Related

Changing a list (passed as a function parameter) changes the list with the same name in the previous function call

Recently a friend of mine asked me to explain a strange behavior in a piece of code originally intended to count permutations using recursion. There were many improvements that could be made to the code, which I noted, but these seemed to not have any real impact.
I simplified the code down to the following, which reproduces only the problem, and not the permutations.
def foo(bar, lst):
    if(bar == 1):
        lst.append(0)
        return
    print(lst)
    foo(1, lst)
    print(lst)
foo(2, [])
The output is
[]
[0]
I tried lst += [0] or deleting lst after appending 0, but these did not help. Doing lst2 = lst.copy() followed by lst2.append(0) gave the expected result of two []s, however. I am confused as to why appending 0 (or any value) to lst where bar == 1 would have an effect on the lst where bar == 2. I do not consider myself a total beginner to Python, and I usually can determine the behavior of local variables. This has baffled me though. An explanation would be really appreciated.
In case you want the original code, though I don't think it'll give much more info, here it is:
A = 0
A2 = 0
NN = 3
def P(N, C):
    global A
    Temp = [X for X in range(1, NN + 1) if X not in C]
    if(N == 1):
        C.append(None)
        A += 1
        return
    for e in Temp:
        C.append(e)
        P(N - 1, C)
        del C[-1]
def P2(N, C):
    global A2
    Temp = [X for X in range(1, NN + 1) if X not in C]
    if(N == 1):
        A2 += 1
        return
    for e in Temp:
        C.append(e)
        P2(N - 1, C)
        del C[-1]
P(NN, [])
P2(NN, [])
print(A, A2, sep = " ")
print(A == A2)
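One way to read the behavior described above: a Python function receives a reference to the same list object the caller holds, so append (and, for lists, +=) changes that shared object, while lst.copy() creates a separate list. The sketch below only illustrates this; the function names are made up for the example.

```python
def append_in_place(lst):
    # lst refers to the very same list object as the caller's variable
    lst.append(0)

def plus_equals(lst):
    # For lists, += extends the existing object in place as well
    lst += [0]

def append_to_copy(lst):
    # Rebinding the local name to a copy leaves the caller's list alone
    lst = lst.copy()
    lst.append(0)
    return lst

shared = []
append_in_place(shared)
plus_equals(shared)
print(shared)             # the caller's list was modified twice

untouched = []
result = append_to_copy(untouched)
print(untouched, result)  # the caller's list stays empty
```

This matches the observation in the question that only the lst.copy() variant printed two empty lists.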

How to tell if one generator is a subsequence or a prefix of another generator?

I have two generators, A and B, of unknown length.
I want to know if B is a subsequence (contiguous) of A, so I do the following:
def subseq(A, B):
    b0 = next(B)
    for a in A:
        if a == b0:
            break
    else: # no-break
        # b0 not found in A so B is definitely not a subseq of A
        return False
    # is the remainder of B a prefix of the remainder of A?
    return prefix(A, B)
def prefix(A, B):
    return all(a == b for a, b in zip(A, B))
However, prefix(A, B) is not exactly correct, as if what remains of A is shorter than what remains of B, then I might get a false positive:
E.g. with A = 'abc' and B = 'abcd' (imagine they are generators), then return all(a == b for a, b in zip(A, B)) would return True.
But if I use zip_longest instead, then I have the complementary problem: I would get false negatives.
E.g. with A = 'abcd' and B = 'abc', then return all(a == b for a, b in zip_longest(A, B)) would return False.
What's a sensible way to do this? Specifically, I want to zip to the length of the second argument. I basically want something like zip_(A, B, ziplengthselect=1)
where ziplengthselect=i tells the function that it should zip to the length of the ith argument.
Then the expression all(a == b for a, b in zip_(A, B, fillvalue=sentinel, ziplengthselect=1)) where sentinel is something not found in B, would have the following behavior. If the expression
reaches end of B, then it would evaluate to True
reaches end of A, then it would use the fillvalue, check sentinel == b, fail the check since sentinel was chosen to be something not found in B, and return False
fails an a == b check, then it would evaluate to False
I can think of solutions with try, except blocks, but was wondering if there's a better way.
# Whether generator B is a prefix of generator A.
def prefix(A, B):
    for b in B:
        try:
            a = next(A)
            if a != b:
                return False
        except StopIteration:
            # reached end of A
            return False
    return True
OR
# Whether generator B is a prefix of generator A.
def prefix(A, B):
    prefix = all(a == b for a, b in zip(A, B))
    if not prefix:
        return False
    try:
        next(B)
        # B still has items left, so A ran out first
        return False
    except StopIteration:
        # end of B was reached
        return True
The above code works when A has no duplicates. However if A has duplicates, then we have to tee the generators as follows:
from itertools import tee
def subseq(A, B):
    try:
        b0 = next(B)
    except StopIteration:
        return True
    while True:
        try:
            a = next(A)
            if a == b0:
                A, Acop = tee(A)
                B, Bcop = tee(B)
                if prefix(Acop, Bcop):
                    return True
                del Acop, Bcop
        except StopIteration:
            return False
def prefix(A, B):
    for b in B:
        try:
            a = next(A)
            if a != b:
                return False
        except StopIteration:
            # reached end of A
            return False
    return True
# Some tests
A = (i for i in range(10))
B = (i for i in range(5,8))
print(subseq(A, B)) # True
A = (i for i in range(10))
B = (i for i in range(5,11))
print(subseq(A, B)) # False
A = (i for i in [1,2,3]*10 + [1,2,3,4])
B = (i for i in [1,2,3])
print(subseq(A, B)) # True
A = (i for i in [1,1,2,1,1,2]*8 + [3])
B = (i for i in [1,1,2,3])
print(subseq(A, B)) # True
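The "zip to the length of the second argument" behavior asked for in the question can be approximated with itertools.zip_longest and a sentinel fill value. The following is a minimal sketch rather than an existing zip option: the names is_prefix and _MISSING are made up, and it assumes the comparison should stop as soon as B is exhausted.

```python
from itertools import zip_longest

_MISSING = object()  # hypothetical sentinel; a fresh object equals no real item

def is_prefix(A, B):
    """Return True if iterable B is a prefix of iterable A."""
    for a, b in zip_longest(A, B, fillvalue=_MISSING):
        if b is _MISSING:
            # B ended first and everything so far matched
            return True
        if a is _MISSING or a != b:
            # A ended before B did, or the items differ
            return False
    # Both ended at the same time (including the case where both are empty)
    return True

print(is_prefix(iter('abcd'), iter('abc')))
print(is_prefix(iter('abc'), iter('abcd')))
```

Note that zip_longest pulls from A before B, so when B ends first one extra item of A has already been consumed. That matters if A is reused afterwards.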
Here's how I solved the analogous subsequence problem for lists. Lists are easier because you can know their length:
def isSublist(lst, sublst):
    N, M = len(lst), len(sublst)
    starts = (i for i in range(N - M + 1) if lst[i] == sublst[0])
    for i in starts:
        # i <= N - M so N - i >= M
        j = 0
        while j < M and lst[i] == sublst[j]:
            i += 1
            j += 1
        if j == M:
            return True
    return False
I might use deques (although this assumes B is finite):
from collections import deque
from itertools import islice
def subseq(A, B):
    B = deque(B)
    if not B:
        return True
    n = len(B)
    Asub = deque(islice(A, n-1), n)
    for a in A:
        Asub.append(a)
        if Asub == B:
            return True
    return False
Might take more or less time/memory than yours. Depends on the input.
A note about yours: For an input like A = iter('a'+'b'*10**7), B = iter('ac') you waste a lot of memory (90 MB on 64-bit Python), since your Acop from the very beginning causes the underlying tee storage to never let go of anything. You'd better do del Acop, Bcop after an unsuccessful prefix check.
It’s possible to build KMP’s partial match table lazily.
from itertools import islice
def has_substring(sup, sub):
    sub = LazySequence(sub)
    if not sub:
        return True
    t = kmp_table(sub)
    k = 0
    for x in sup:
        while x != sub[k]:
            k = t[k]
            if k == -1:
                break
        if k == -1:
            k = 0
            continue
        k += 1
        try:
            sub[k]
        except IndexError:
            return True
    return False
class LazySequence:
    def __init__(self, iterator):
        self.consumed = []
        self.iterator = None if iterator is None else iter(iterator)
    def __getitem__(self, index):
        if index >= len(self.consumed):
            self.consumed.extend(islice(self.iterator, index - len(self.consumed) + 1))
        return self.consumed[index]
    def __iter__(self):
        consumed = self.consumed
        yield from consumed
        for x in self.iterator:
            consumed.append(x)
            yield x
    def __bool__(self):
        for _ in self:
            return True
        return False
def lazy_sequence(g):
    def wrap_generator(*args, **kwargs):
        ls = LazySequence(None)
        ls.iterator = g(ls.consumed, *args, **kwargs)
        return ls
    return wrap_generator
@lazy_sequence
def kmp_table(t, w):
    yield -1
    cnd = 0
    for x in islice(w, 1, None):
        if x == w[cnd]:
            yield t[cnd]
        else:
            yield cnd
            while cnd != -1 and x != w[cnd]:
                cnd = t[cnd]
        cnd += 1
This search is fast (asymptotically optimal time of O(|sub| + |sup|)) and doesn’t use unnecessary time/space when one generator is much longer than the other – including being able to return True when sup is infinite and being able to return False when sub is infinite.
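For comparison, here is a non-lazy sketch of the same KMP idea. It is not the lazy version above: it copies sub into a list first (so it assumes sub is finite), and it uses the common prefix-function form of the table rather than the -1-based one. The names failure_table and contains_run are made up for illustration.

```python
def failure_table(pattern):
    """table[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        # Fall back to shorter borders until the next item can extend one
        while k and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table

def contains_run(stream, pattern):
    """Return True if pattern occurs contiguously in the iterable stream."""
    pattern = list(pattern)  # assumption: the pattern is finite
    if not pattern:
        return True
    table = failure_table(pattern)
    k = 0  # number of pattern items currently matched
    for x in stream:
        while k and x != pattern[k]:
            k = table[k - 1]
        if x == pattern[k]:
            k += 1
        if k == len(pattern):
            return True
    return False

print(contains_run(iter([1, 1, 2, 1, 1, 2] * 8 + [3]), [1, 1, 2, 3]))
print(contains_run(iter(range(10)), range(5, 11)))
```

The stream is read once and never buffered, so only the pattern and its table are held in memory.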

How to judge if a string contains a given substring (have gap)

e.g.
a = 'abc123def'
b = 'abcdef'
I want a function that can judge whether b is contained in a:
contains(a,b)=True
P.S. A gap is also allowed in the representation of b, e.g.
b='abc_def'
but regular expressions are not allowed.
If what you want to do is to check whether b is a subsequence of a, you can write:
def contains(a, b):
    n, m = len(a), len(b)
    j = 0
    for i in range(n):
        if j < m and a[i] == b[j]:
            j += 1
    return j == m
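The same subsequence test can also be written with a single iterator over a, since the in operator on an iterator consumes items until it finds a match. This is only a sketch: the function name and the gap_marker parameter are assumptions, and treating '_' in b as "any gap" is one possible reading of the question.

```python
def contains_subsequence(a, b, gap_marker=None):
    """Return True if the characters of b appear in a in the same order,
    with any number of other characters in between."""
    if gap_marker is not None:
        # Assumed reading: the marker only stands for a gap, so drop it
        b = b.replace(gap_marker, '')
    remaining = iter(a)
    # Each 'ch in remaining' advances the iterator past the match,
    # so later characters of b must be found further along in a.
    return all(ch in remaining for ch in b)

print(contains_subsequence('abc123def', 'abcdef'))
print(contains_subsequence('abc123def', 'abc_def', gap_marker='_'))
print(contains_subsequence('abc123def', 'fed'))
```

Unlike a plain membership check per character, this respects the order of the characters in b.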
Try using list comprehension:
def contains(main_string, sub_string):
    return all([i in main_string for i in sub_string])
NOTE: 'all' is a builtin function which takes an iterable of booleans and returns True if all are True.
def new_contained(a,b):
    boo = False
    c = [c for c in a]
    d = [i for i in b]
    if len(c)<=len(d):
        for i in c:
            if i in d:
                boo = True
    return boo

Dask delayed: pass combination of two lists

I have a feeling this should be easily possible, but I fail to pass combinations of (lazy) lists to a delayed function:
import dask
def test(a,b):
    return(str(a)+','+str(b))
a = [1,2] #not lazy for example
b = [3,4] #not lazy
c = dask.delayed(test)(a,b)
c = c.compute()
out:
'[1,2][3,4]'
desired output:
['1,3','1,4','2,3','2,4']
Also tried:
def test(c):
    a = c[0]
    b = c[1]
    return(str(a)+','+str(b))
def combine_a_b(a,b):
    return([(i,j) for i in a for j in b])
c = dask.delayed(combine_a_b)(a,b)
c = dask.delayed(test)(c)
c = c.compute()
out:
'(1,3)(1,4)'
What am I doing wrong here?

How to test all possible values for all variables to get the maximum result for the function

I have three variables called a, b and c, each of these can assume a different value defined in a range. I'd like to create a function that tests every possible variable value and gives me their best combination for the output 'f'.
a = list(range(1, 10, 2))
b = list(range(5, 8, 1))
c = list(range(1, 3, 1))
def all_combinations (a, b, c):
    #something
    f = a + (b * a) - (c*(a ^ b))
    return BEST a, b, c for my f
Is it possible to do this? What is the best way to do it?
You can use itertools.product() to get all the possible combinations of a, b, and c.
Then calculate your formula for each unique combination of a b c, keep track of the result, and if the result is better than the previous best, save the current values of a b c.
import itertools
def all_combinations (alist, blist, clist):
    best_a = 0
    best_b = 0
    best_c = 0
    best_f = 0
    for a,b,c in itertools.product(alist, blist, clist):
        f = a + (b * a) - (c*(a ^ b))
        if f > best_f: # use your own definition of "better"
            best_a = a
            best_b = b
            best_c = c
            best_f = f
    return best_a, best_b, best_c
First of all, you said "I have three variables called a, b and c, each of these can assume a different value defined in a range." Note that the variables in your code are actually three lists of integers, not three integers.
The naive algorithm to test all possible combinations is 3 nested for loops. Here I assume that by "best" you mean "maximum value":
def all_combinations (list1, list2, list3):
    best_f, best_a, best_b, best_c = None, None, None, None
    for a in list1:
        for b in list2:
            for c in list3:
                f = a + (b * a) - (c*(a ^ b))
                # here you have to define what f being "better" than best_f means:
                # (check best_f, not f: comparing a number with None raises TypeError)
                if best_f is None or f > best_f:
                    best_f = f
                    best_a = a
                    best_b = b
                    best_c = c
    return best_a, best_b, best_c
If you're sure those are the only values you want to test, then the following will work. Otherwise you might want to look into scipy.optimize.
from itertools import product
import numpy as np
parameters = list(product(a, b, c))
results = [my_fun(*x) for x in parameters]
print(parameters[np.argmax(results)])
Replace np.argmax with np.argmin if you want to minimize the function.
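The same search can be written without NumPy by handing the combinations to the built-in max with a key function. This is a minimal sketch assuming "best" means the largest f; the names f and best_combination are chosen for the example. Note that ^ in the question's formula is bitwise XOR in Python, not exponentiation (that would be **).

```python
from itertools import product

def f(a, b, c):
    # Formula copied from the question; ^ is bitwise XOR here
    return a + (b * a) - (c * (a ^ b))

def best_combination(alist, blist, clist):
    """Return the (a, b, c) tuple with the largest f among all combinations."""
    return max(product(alist, blist, clist), key=lambda abc: f(*abc))

a = list(range(1, 10, 2))
b = list(range(5, 8, 1))
c = list(range(1, 3, 1))

best = best_combination(a, b, c)
print(best, f(*best))
```

Swapping max for min gives the minimizing combination instead. If several combinations tie, max returns the first one produced by product.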
