Does zip operation conserve order? - python

I was reading the following example from geeksforgeeks:
# Python code to demonstrate the working of
# zip()
# initializing lists
name = [ "Manjeet", "Nikhil", "Shambhavi", "Astha" ]
roll_no = [ 4, 1, 3, 2 ]
marks = [ 40, 50, 60, 70 ]
# using zip() to map values
mapped = zip(name, roll_no, marks)
# converting values to print as set
mapped = set(mapped)
# printing resultant values
print ("The zipped result is : ",end="")
print (mapped)
But look at the result:
The zipped result is : {('Shambhavi', 3, 60), ('Astha', 2, 70),
('Manjeet', 4, 40), ('Nikhil', 1, 50)}
I would have expected to see {('Manjeet', 4, 40), ('Nikhil', 1, 50), ('Shambhavi', 3, 60), ('Astha', 2, 70)}. This made me wonder: if I want to do a mathematical operation between two lists using zip, will zip itself change the order? I tried the short snippet below and the order seems to be preserved, but I still have doubts. Did I just get lucky this time, or do I have to worry about it? I really need the position of the pairs in (A, B) not to change.
A = range(1,14)
B = range(2,15)
data = [x + y for x, y in zip(A, B)]
print(data)

zip simply draws items from the underlying iterators in turn, so it doesn't change the order.
The documentation says:
The left-to-right evaluation order of the iterables is guaranteed. This makes possible an idiom for clustering a data series into n-length groups using zip(*[iter(s)]*n)
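As a small illustration of the grouping idiom quoted from the documentation, here is a sketch with a made-up list `s` and group size `n` (both names are only for this example). Because the same iterator object is repeated `n` times, zip pulls consecutive items from it, left to right:

```python
s = [1, 2, 3, 4, 5, 6]
n = 2

# [iter(s)] * n is a list holding the SAME iterator n times,
# so each tuple is filled with n consecutive items of s.
pairs = list(zip(*[iter(s)] * n))
print(pairs)  # [(1, 2), (3, 4), (5, 6)]
```

If len(s) is not a multiple of n, the leftover items are silently dropped, since zip stops at the shortest input.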

Zip does not change the order of the objects passed to it. However set does, as it forms an unordered set from the zip results.
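To see the difference for yourself, here is a minimal sketch that reuses the lists from the question and collects the zip result once into a list and once into a set (the names `as_list` and `as_set` are just for illustration):

```python
name = ["Manjeet", "Nikhil", "Shambhavi", "Astha"]
roll_no = [4, 1, 3, 2]
marks = [40, 50, 60, 70]

# A list keeps the tuples in the order zip produced them.
as_list = list(zip(name, roll_no, marks))
print(as_list)

# A set holds the same tuples, but its iteration order is not guaranteed.
as_set = set(zip(name, roll_no, marks))
print(as_set == set(as_list))  # True: same elements, only the display order may differ
```

So for element-wise arithmetic such as [x + y for x, y in zip(A, B)], the pairs always come out in input order.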

Related

Change value of portion of a string within a list get sum of string

I am working on using itertools to get a list of combinations, but am stuck with manipulating those combinations once I have them. Here is what I have:
k_instances = 3 #Instances of lysine
k_modifications = {'Hydroxylation', 'Carboxylation', #Modifications applicable to lysine
}
k_combinations = itertools.combinations_with_replacement(k_modifications, k_instances) #Possible modifications assigned
k_comb_list = list(k_combinations) #Convert combinations to a list
k_comb_list_str = [k_comb_list[i:i+k_instances] for i in range(0, len(k_comb_list), k_instances)]
for i in range(len(k_comb_list_str)):
    k_comb_list_str[i] = 16 if k_comb_list_str[i] == 'Hydroxylation' else k_comb_list_str[i]
print(k_comb_list_str)
When running this, I get:
[[('Carboxylation', 'Carboxylation', 'Carboxylation'), ('Carboxylation', 'Carboxylation', 'Hydroxylation'), ('Carboxylation', 'Hydroxylation', 'Hydroxylation')], [('Hydroxylation', 'Hydroxylation', 'Hydroxylation')]]
My idea is to replace each of these variables with their mass, for instance replace all occurrences of Carboxylation with 16. Doing this I would like to end up with a list of strings, something like this:
[[(16,16,16),(16,16,2),(16,2,2)...]]
I would then get the sum of each tuple:
[[(48),(34),(20)]]
And then essentially have a list of values possible based on the combinations.
I'm sure there is a simpler way about carrying this out, so any suggestions for how to execute this would be appreciated. I have tried replacing each value using else if statements, but it doesn't work because I can't figure out how to manipulate within the string, so I can only search for the string, which defeats the purpose.
The easiest option is to make the various molecules variables, and use those, rather than trying to do string replacement later. For example:
import itertools
Hydroxylation = 2
Carboxylation = 16
k_instances = 3
k_modifications = [Carboxylation, Hydroxylation]
k_combinations = itertools.combinations_with_replacement(k_modifications, k_instances)
k_comb_l = list(k_combinations)
print(k_comb_l)
# [(16, 16, 16), (16, 16, 2), (16, 2, 2), (2, 2, 2)]
print([sum(x) for x in k_comb_l])
# [48, 34, 20, 6]
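If you would rather keep the modification names in the combinations and only look up the masses when summing, a dictionary works too. This is a sketch, not the only way to do it: the dict name `masses` is made up for the example, and the mass values are the ones used in the answer above.

```python
import itertools

# Hypothetical lookup table; values taken from the answer above.
masses = {'Hydroxylation': 2, 'Carboxylation': 16}
k_instances = 3

# sorted() gives a stable, repeatable order for the names.
combos = list(itertools.combinations_with_replacement(sorted(masses), k_instances))

# Sum the mass of every modification in each combination.
totals = [sum(masses[m] for m in combo) for combo in combos]

for combo, total in zip(combos, totals):
    print(combo, total)
```

This way each total stays paired with the named combination it came from.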

Merging two lists of tuples of different size into a list of dictionaries

Now I have two lists of tuples of different size like this:
a = [('NC', 0, 'Eyes'),('NC', 3, 'organs'),('NC', 19, 'neurons'),...]
b = [(0, 'Hypernym', 3),(19, 'Holonym', 0),...]
The common value between the two lists is the integer id, and the expected result should look like:
result = [
{'s_type':'NC', 's':'Eyes', 'predicate':'Hypernym', 'o_type':'NC', 'o':'organs'},
{'s_type':'NC', 's':'neurons', 'predicate':'Holonym', 'o_type':'NC', 'o':'Eyes'},
...]
I have converted the above two lists into dictionaries and tried nested loop but failed to get this output. Can somebody kindly help me out?
I managed to get this one working. Let me know if there are any other specifics that need fixing.
a = [('NC', 0, 'Eyes'), ('NC', 3, 'organs'), ('NC', 19, 'neurons')]
b = [(0, 'Hypernym', 3), (19, 'Holonym', 0)]
result = []
for s_type, s_id, s in a:
    # relations in b whose subject id matches this entry
    for _, predicate, o_id in (r for r in b if r[0] == s_id):
        # entries in a whose id matches the relation's object id
        for o_type, _, o in (e for e in a if e[1] == o_id):
            result.append({'s_type': s_type, 's': s,
                           'predicate': predicate, 'o_type': o_type, 'o': o})
print(result)
I hope this is what you were looking for.
There are many other ways to do this, but given your description of the problem, this should do.
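One of those other ways, sketched here under two assumptions that the question does not confirm: every id appears only once in a, and every id used in b exists in a. Under those assumptions you can index a by id once and then walk b directly, which avoids rescanning the lists (the name `by_id` is just for this example):

```python
a = [('NC', 0, 'Eyes'), ('NC', 3, 'organs'), ('NC', 19, 'neurons')]
b = [(0, 'Hypernym', 3), (19, 'Holonym', 0)]

# Assumption: ids are unique in a, so a plain dict lookup is enough.
by_id = {idx: (typ, label) for typ, idx, label in a}

result = []
for s_id, predicate, o_id in b:
    s_type, s = by_id[s_id]   # subject entry
    o_type, o = by_id[o_id]   # object entry
    result.append({'s_type': s_type, 's': s,
                   'predicate': predicate, 'o_type': o_type, 'o': o})

print(result)
```

If an id in b might be missing from a, by_id.get(...) with a check would be needed instead of direct indexing.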

Find gaps in list of range values

I found numerous similar questions in other programming languages (ruby, C++, JS, etc) but not for Python. Since Python has e.g. itertools I wonder whether we can do the same more elegantly in Python.
Let's say we have a "complete range", [1,100] and then a subset of ranges within/matching the "complete range":
[10,50]
[90,100]
How can we extract the not covered positions, in this case [1,9], [51,89]?
This is a toy example, in my real dataset I have ranges up to thousands.
Here is a neat solution using itertools.chain: I've assumed the input ranges don't overlap. If they do, they need to be simplified first using a union-of-ranges algorithm.
from itertools import chain
def range_gaps(a, b, ranges):
    ranges = sorted(ranges)
    flat = chain((a-1,), chain.from_iterable(ranges), (b+1,))
    return [[x+1, y-1] for x, y in zip(flat, flat) if x+1 < y]
Taking range_gaps(1, 100, [[10, 50], [90, 100]]) as an example:
First sort the ranges in case they aren't already in order. If they are guaranteed to be in order, this step is not needed.
Then flat is an iterable which will give the sequence 0, 10, 50, 90, 100, 101.
Since flat is lazily evaluated and is consumed by iterating over it, zip(flat, flat) gives a sequence of pairs like (0, 10), (50, 90), (100, 101).
The ranges required are then like (1, 9), (51, 89) and the case of (100, 101) should give an empty range so it is discarded.
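For inputs that may overlap, here is one possible sketch of the simplification step mentioned above, followed by the same chain/zip approach. The helper `merge_ranges` is hypothetical (it is not part of the original answer), and the sketch assumes inclusive integer ranges that all lie within [a, b]:

```python
from itertools import chain

def merge_ranges(ranges):
    """Merge overlapping or adjacent inclusive integer ranges (hypothetical helper)."""
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            merged[-1][1] = max(merged[-1][1], hi)
        else:
            merged.append([lo, hi])
    return merged

def range_gaps(a, b, ranges):
    # Flatten to a-1, lo1, hi1, lo2, hi2, ..., b+1 and pair the items up.
    flat = chain((a - 1,), chain.from_iterable(merge_ranges(ranges)), (b + 1,))
    return [[x + 1, y - 1] for x, y in zip(flat, flat) if x + 1 < y]

print(range_gaps(1, 100, [[10, 50], [90, 100]]))            # [[1, 9], [51, 89]]
print(range_gaps(1, 100, [[10, 50], [40, 60], [90, 100]]))  # [[1, 9], [61, 89]]
```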
Assuming the list contains only integers and the sub-ranges are in increasing order and do not overlap, you can use the code below.
It takes the sub-ranges one by one and compares each with the complete range and with the sub-range before it to find the missing ranges.
[start,end]=[1,100]
chunks=[[7,15],[25,31],[74,83]]
print([r for r in [[start,chunks[0][0]-1] if start!=chunks[0][0] else []] + [[chunks[i-1][1]+1, chunks[i][0]-1] for i in range(1,len(chunks))]+[[chunks[-1][1]+1,end] if end!=chunks[-1][1] else []] if r])
Input
[1,100]
[[7,15],[25,31],[74,83]]
Output
[[1, 6], [16, 24], [32, 73], [84, 100]]
If the sub-ranges are not guaranteed to be in increasing order, sort chunks first with the line below.
chunks.sort(key=lambda x: x[0])
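This is a generic solution based on sets. It also works when the ranges overlap, but it materializes every position, so it is only practical when N is moderate:
def gap(N, ranges):
    # ranges = [(min1, max1), (min2, max2), ..., (minn, maxn)]
    # Positions are 0..N-1 and each range is half-open (max excluded), as with range().
    original = set(range(N))
    for i in ranges:
        original = original - set(range(i[0], i[1]))
    return original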
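A set-based approach leaves you with individual positions rather than ranges. One way to collapse them back into [start, end] pairs is itertools.groupby, sketched below. The function name `to_ranges` is made up for this example, and the demo data uses the inclusive ranges from the question:

```python
from itertools import groupby

def to_ranges(positions):
    """Collapse a collection of integers into inclusive [start, end] runs."""
    runs = []
    # Within a run of consecutive integers, value - index stays constant.
    for _, group in groupby(enumerate(sorted(positions)), key=lambda p: p[1] - p[0]):
        run = [value for _, value in group]
        runs.append([run[0], run[-1]])
    return runs

# Complete range 1..100 minus [10, 50] and [90, 100], all inclusive.
missing = set(range(1, 101)) - set(range(10, 51)) - set(range(90, 101))
print(to_ranges(missing))  # [[1, 9], [51, 89]]
```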

Adding list python

Help! What do I have to change so that it comes out like this?
[('Mavis', 3), ('Ethel', 1), ('Rick', 2), ('Joseph', 5), ('Louis', 4)]
Right now, with my code, it comes out like this.
bots_status = [(bot_one_info) + (bot_two_info) + (bot_three_info) + (bot_four_info) + (bot_five_info)]
[('Mavis', 3, 'Ethel', 1, 'Rick', 2, 'Joseph', 5, 'Louis', 4)]
Place commas instead of + signs between your bots.
If working with a variable number of entries, initialize an empty list and add to it using append.
bots_status = []
for bot_info in bot_infos:
    bots_status.append(bot_info)
Replace the plusses (+) by commas (,) to make this a list of tuples instead of a list of one concatenated tuple:
bots_status = [bot_one_info, bot_two_info, bot_three_info, bot_four_info, bot_five_info]
Since your bot_x_info variables already are tuples, you also don’t need to use parentheses around the names (those don’t do anything).
The problem with your code was that you were using + on the tuples. The add operator concatenates tuples to a single one:
>>> (1, 2) + (3, 4)
(1, 2, 3, 4)
That’s why you ended up with one giant tuple in your list.
What you wanted is have each tuple as a separate item in the list, so you just need to create a list from those. Just like you would do [1, 2, 3] to create a list with three items, using a comma to separate each item, you also do this with other values, e.g. tuples in your case.
Let's say:
bot_one_info = ('Mavis', 3)
bot_two_info = ('Mavi', 3)
If you use +:
lis = [bot_one_info + bot_two_info]
print(lis)
# Output
[('Mavis', 3, 'Mavi', 3)]
But if you use a comma:
lis = [bot_one_info, bot_two_info]
print(lis)
# Output
[('Mavis', 3), ('Mavi', 3)]
So use , instead of + here.

How do you construct an array suitable for numpy sorting?

I need to sort two arrays simultaneously, or rather I need to sort one of the arrays and bring the corresponding element of its associated array with it as I sort. That is if the array is [(5, 33), (4, 44), (3, 55)] and I sort by the first axis (labeled below dtype='alpha') then I want: [(3.0, 55.0) (4.0, 44.0) (5.0, 33.0)]. These are really big data sets and I need to sort first ( for nlog(n) speed ) before I do some other operations. I don't know how to merge my two separate arrays though in the proper manner to get the sort algorithm working. I think my problem is rather simple. I tried three different methods:
import numpy
x=numpy.asarray([5,4,3])
y=numpy.asarray([33,44,55])
dtype=[('alpha',float), ('beta',float)]
values=numpy.array([(x),(y)])
values=numpy.rollaxis(values,1)
#values = numpy.array(values, dtype=dtype)
#a=numpy.array(values,dtype=dtype)
#q=numpy.sort(a,order='alpha')
print("Try 1:\n", values)
values=numpy.empty((len(x),2))
for n in range (len(x)):
    values[n][0]=y[n]
    values[n][1]=x[n]
print("Try 2:\n", values)
#values = numpy.array(values, dtype=dtype)
#a=numpy.array(values,dtype=dtype)
#q=numpy.sort(a,order='alpha')
###
values = [(x[0], y[0]), (x[1],y[1]) , (x[2],y[2])]
print("Try 3:\n", values)
values = numpy.array(values, dtype=dtype)
a=numpy.array(values,dtype=dtype)
q=numpy.sort(a,order='alpha')
print("Result:\n",q)
I commented out the first and second tries because they raise errors; I knew the third one would work because it mirrors what I saw when I was RTFM. Given the arrays x and y (which are very large; only small examples are shown), how do I construct the array (called values) so that numpy.sort can be called on it properly?
*** Zip works great, thanks. Bonus question: How can I later unzip the sorted data into two arrays again?
I think what you want is the zip function. If you have
x = [1,2,3]
y = [4,5,6]
then list(zip(x,y)) == [(1,4),(2,5),(3,6)] (in Python 3, zip returns an iterator, so wrap it in list)
So your array could be constructed using
a = numpy.array(list(zip(x,y)), dtype=dtype)
For your bonus question: zip actually unzips too:
In [1]: a = range(10)
In [2]: b = range(10, 20)
In [3]: c = list(zip(a, b))
In [4]: c
Out[4]:
[(0, 10),
(1, 11),
(2, 12),
(3, 13),
(4, 14),
(5, 15),
(6, 16),
(7, 17),
(8, 18),
(9, 19)]
In [5]: d, e = zip(*c)
In [6]: d, e
Out[6]: ((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (10, 11, 12, 13, 14, 15, 16, 17, 18, 19))
Simon suggested argsort as an alternative approach; I'd recommend it as the way to go. No messy merging, zipping, or unzipping: just access by index.
idx = numpy.argsort(x)
ans = [(x[i], y[i]) for i in idx]
zip() might be inefficient for large arrays. numpy.dstack() could be used instead of zip (note that for 1-D inputs the result has shape (1, N, 2)):
ndx = numpy.argsort(x)
values = numpy.dstack((x[ndx], y[ndx]))
I think you just need to specify the axis that you are sorting on when you have made your final ndarray. Alternatively argsort one of the original arrays and you'll have an index array that you can use to look up in both x and y, which might mean you don't need values at all.
(scipy.org seems to be unreachable right now or I would post you a link to some docs)
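The argsort idea mentioned above can be sketched as follows, using the small x and y from the question. This is only an illustration: numpy.column_stack is my choice for building the final (N, 2) array, and `order`, `x_sorted`, `y_sorted` and `pairs` are names made up for the example.

```python
import numpy

x = numpy.asarray([5, 4, 3])
y = numpy.asarray([33, 44, 55])

# Indices that would sort x; applying them to y keeps the pairs aligned.
order = numpy.argsort(x)
x_sorted = x[order]
y_sorted = y[order]

# Optional: combine into one (N, 2) array of sorted pairs.
pairs = numpy.column_stack((x_sorted, y_sorted))
print(pairs)
```

Since x_sorted and y_sorted are already separate arrays, nothing needs to be unzipped afterwards.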
Given that your description doesn't quite match your code snippet it's hard to say with certainty, but I think you have over-complicated the creation of your numpy array.
I couldn't get a working solution using Numpy's sort function, but here's something else that works:
import numpy
x = [5,4,3]
y = [33,44,55]
r = numpy.asarray([(x[i],y[i]) for i in numpy.lexsort([x])])
lexsort returns the permutation of the array indices which puts the rows in sorted order. If you wanted your results sorted on multiple keys, e.g. by x and then by y, use numpy.lexsort([y, x]) instead; lexsort treats the last key in the sequence as the primary one.
