I'm trying to figure out how to manually enumerate a list, but I'm stuck: I can't figure out how to split up the data list. This is the code I have so far:
enumerated_list = []
data = [5, 10, 15]
for x in data:
    print(x)
for i in range(len(data)):
    enumerate_rule = (i, x)
    enumerated_list.append(enumerate_rule)
print(enumerated_list)
This prints out:
5
10
15
[(0, 15), (1, 15), (2, 15)]
When what I'm after is [(0, 5), (1, 10), (2, 15)]. How would I go about this?
Use the enumerate() built-in:
>>> list(enumerate([5, 10, 15]))
[(0, 5), (1, 10), (2, 15)]
The fault in your original code is that you use x in the second loop, but x doesn't change there; it's simply left over from the previous loop where you printed the values.
Fixing this would require looping by index, which Python isn't designed around: it's slow and hard to read. Instead, we loop by value. The enumerate() built-in is there to do this job for us, since pairing values with their indices is a reasonably common task.
If you really don't want to use enumerate() (which rarely makes sense, except perhaps as an arbitrary restriction meant to teach you something else), there are still better ways:
>>> from itertools import count
>>> list(zip(count(), [5, 10, 15]))
[(0, 5), (1, 10), (2, 15)]
Here we use zip(), the Python built-in for looping over two iterables at once. It returns tuples of the first value from each iterable, then the second from each, and so on. Combined with itertools.count(), which does what it says on the tin, this gives us the result we want.
If you really feel the need to build a list manually, the more pythonic way of doing something rather unpythonic would be:
enumerated_list = []
count = 0
for item in data:
    enumerated_list.append((count, item))
    count += 1
Note, however, that one would generally use a list comprehension to build a list like this - and as soon as one does that, it makes more sense to use one of the earlier methods. Building a list element by element like this is inefficient and hard to read.
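For completeness, here is the list-comprehension version of that loop (a sketch; it just re-derives what enumerate() already gives you):

```python
data = [5, 10, 15]
# Pair each index with its value in a single comprehension.
enumerated_list = [(i, item) for i, item in enumerate(data)]
print(enumerated_list)  # [(0, 5), (1, 10), (2, 15)]
```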
Since x goes through every element in data, at the end of:
for x in data:
    print(x)
x will be the last element, which is why you get 15 as the second element in each tuple:
[(0, 15), (1, 15), (2, 15)]
You only need one loop:
for i in range(len(data)):
    enumerate_rule = (i, data[i])  # data[i] gets the ith element of data
    enumerated_list.append(enumerate_rule)
enumerate_rule = (i, x) is the problem. You are using the same value (x, the last item in the list) each time. Change it to enumerate_rule = (i, data[i]).
I would use a normal for loop, but with enumerate(), so you can use an index i in the loop:
enumerated_list = []
data = [5, 10, 15]
for i, f in enumerate(data):
    enumerated_list.append((i, f))
print(enumerated_list)
Result:
[(0, 5), (1, 10), (2, 15)]
This is my code:
def width2colspec(widths):
    tupleback = []
    a = 0
    for w in widths:
        b = a + w
        tupleback.append((a, a + w))
        a = b
    return tupleback
e.g.:
widths = [15, 9, 50, 10]
width2colspec(widths)
Result:
[(0, 15), (15, 24), (24, 74), (74, 84)]
(The first one always has to be a zero.)
It works and all (maybe not very elegant, though).
To practice, I tried to convert it into a one-line list comprehension, but I couldn't make it work; the closest I got was this:
widths = [15, 9, 50, 10]
colspecs = list((widths[i], widths[i] + widths[i + 1]) for i in range(len(widths) - 1))
result:
[(15, 24), (9, 59), (50, 60)]
(It's not maintaining the sum through the loop, and it drops the first pair.)
So my question is, is it possible?
You can do this as a pure list comprehension, but it involves a lot of re-computation, so I wouldn't recommend actually doing it this way.
Start by building a list of all the sums (note that this is re-summing the same numbers over and over, so it's less efficient than your original code that keeps a running sum):
>>> [sum(widths[:i]) for i in range(len(widths)+1)]
[0, 15, 24, 74, 84]
and then iterate over that to produce your list of tuples:
>>> [tuple([sum(widths[:i]) for i in range(len(widths)+1)][i:i+2]) for i in range(len(widths))]
[(0, 15), (15, 24), (24, 74), (74, 84)]
Note that we're now re-computing all those sums again because we didn't assign the original list to a variable.
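As an aside (my addition, not part of the original answer): itertools.accumulate computes the running sums exactly once, avoiding the repeated re-summing entirely:

```python
from itertools import accumulate

widths = [15, 9, 50, 10]
edges = [0] + list(accumulate(widths))  # [0, 15, 24, 74, 84]
# Pair each edge with the next one to get the column spans.
colspecs = list(zip(edges, edges[1:]))
print(colspecs)  # [(0, 15), (15, 24), (24, 74), (74, 84)]
```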
If you're using Python 3.8 or later, you can use an assignment expression (utilizing the walrus operator :=) to do this in a single list comprehension. I'm "cheating" a little by initializing last via a default argument, but this isn't strictly necessary (I just wanted to make it a "one-liner").
def width2colspec(widths, last=0):
    return [(last, last := last + width) for width in widths]
I don't find it that confusing, to be honest.
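A quick check of the walrus version against the example data from the question:

```python
def width2colspec(widths, last=0):
    # (last, last := last + width) evaluates left to right:
    # first the old running total, then the updated one.
    return [(last, last := last + width) for width in widths]

print(width2colspec([15, 9, 50, 10]))  # [(0, 15), (15, 24), (24, 74), (74, 84)]
```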
I have a large list myList containing tuples.
I need to remove the duplicates in this list (that is, the tuples with the same elements in the same order). I also need to keep track of this list's indices in a separate list, indexList. If I remove a duplicate, I need to change its entry in indexList to the first identical value's index.
To demonstrate what I mean, if myList looks like this:
myList = [(6, 2), (4, 3), (6, 2), (8, 1), (5, 4), (4, 3), (2, 1)]
Then I need to construct indexList like this:
indexList = [0, 1, 0, 2, 3, 1, 4]
Here the third value is identical to first, so it (third value) gets index 0. Also the subsequent value gets an updated index of 2 and so on.
Here is how I achieved this:
unique = set()
indexList = []
i = 0
for v in myList[:]:
    if v not in unique:
        unique.add(v)
        indexList.append(i)
        i = i + 1
    else:
        myList.pop(i)
        indexList.append(myList.index(v))
This does what I need. However, the index() method makes the script very slow when myList contains hundreds of thousands of elements; as I understand it, this is because index() is an O(n) operation.
So what changes could I make to achieve the same result but make it faster?
If you make a dict to store the first index of each value, you can do the lookup in O(1) instead of O(n). So in this case, before the for loop, do indexes = {}, and then in the if block, do indexes[v] = i and in the else block use indexes[v] instead of myList.index(v).
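A sketch of that idea. Note that this rebuilds the deduplicated list instead of popping from myList in place (an assumption on my part, but it avoids mutating a list while iterating over it):

```python
myList = [(6, 2), (4, 3), (6, 2), (8, 1), (5, 4), (4, 3), (2, 1)]

indexes = {}    # value -> index of its first occurrence in the deduplicated list
unique = []     # deduplicated values, in first-seen order
indexList = []
for v in myList:
    if v not in indexes:
        indexes[v] = len(unique)
        unique.append(v)
    indexList.append(indexes[v])  # O(1) dict lookup instead of myList.index(v)

print(indexList)  # [0, 1, 0, 2, 3, 1, 4]
print(unique)     # [(6, 2), (4, 3), (8, 1), (5, 4), (2, 1)]
```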
So, I wanted a set of XY positions that would all differ from each other. To do this, I used a list to store the randomly generated XY positions. If a position was not in the list, it would be added; if it was already in the list, a new position would be generated for it.
I am unsure whether this will work in all instances, and wonder if there is a better way of doing this.
import random
positionList = []
for i in range(6):
    position = [random.randint(0, 5), random.randint(0, 5)]
    print("Original", position)
    while position in positionList:
        position = [random.randint(0, 5), random.randint(0, 5)]
    positionList.append(position)
    print(position)
Could the remade position be the same as other positions in the list?
Could the remade position be the same as other positions in the list?
Yes, because you are using random. If you want to be sure the items are unique, you can use a set object, which preserves uniqueness for you. But note that since lists are not hashable, you should use a hashable container for the pairs (such as a tuple):
>>> position_set = set()
>>>
>>> while len(position_set) != 6:
...     position = (random.randint(0, 5), random.randint(0, 5))
...     position_set.add(position)
...
>>> position_set
{(3, 2), (5, 0), (2, 5), (5, 2), (1, 0), (3, 5)}
If you really need lists, you can convert; if not, just leave the code as is:
import random
position_set = set()
for i in range(6):
    position = random.randint(0, 5), random.randint(0, 5)
    print("Original", position)
    while position in position_set:
        position = random.randint(0, 5), random.randint(0, 5)
    position_set.add(position)
    print(position)
print(position_set)
A set lookup is O(1) vs. O(n) for a list; since order seems irrelevant here, just using a set throughout is probably sufficient.
To be sure of 6 different elements, you can use random.shuffle:
from random import shuffle
positions = [(x, y) for x in range(6) for y in range(6)]  # range(6) covers 0-5, matching randint(0, 5)
shuffle(positions)
print(positions[:6])
"""
[(0, 1), (3, 4), (1, 1), (1, 3), (4, 3), (0, 0)]
"""
I just ran your code, and it seems to work fine; I believe it is correct. Let's consider your while loop.
You check whether the randomly generated position is already in the positionList list. The mere statement:
position in positionList
returns either True or False. If the position already appears in your list, the while loop body gets executed, and you simply generate another random position.
The only advice I could give is to add a loop counter: when you run out of possible XY positions, the loop runs forever.
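A sketch of that safeguard (max_tries is an arbitrary cap of my own choosing, not part of the original code):

```python
import random

positionList = []
max_tries = 1000  # arbitrary cap; raise it for larger grids
for i in range(6):
    position = [random.randint(0, 5), random.randint(0, 5)]
    tries = 0
    while position in positionList:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("no free position found; grid may be full")
        position = [random.randint(0, 5), random.randint(0, 5)]
    positionList.append(position)

print(positionList)
```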
This is one way of doing it:
import random
my_list = []
num_of_points = 6
while True:
    position = [random.randint(0, 5), random.randint(0, 5)]
    if position not in my_list:
        my_list.append(position)
        print(num_of_points)
        num_of_points -= 1
        if num_of_points == 0:
            break
print(my_list)
Of course, you just need to make sure that the number of possible random pairs exceeds the num_of_points value.
In the list of tuples called mixed_sets, three separate sets exist. Each set contains tuples with values that intersect. A tuple from one set will not intersect with a tuple from another set.
I've come up with the following code to sort out the sets. I found that Python's set functionality is limited when tuples are involved: it would be nice if the set intersection operation could look inside each tuple rather than stopping at the enclosing tuple object.
Here's the code:
mixed_sets= [(1,15),(2,22),(2,23),(3,13),(3,15),
(3,17),(4,22),(4,23),(5,15),(5,17),
(6,21),(6,22),(6,23),(7,15),(8,12),
(8,15),(9,19),(9,20),(10,19),(10,20),
(11,14),(11,16),(11,18),(11,19)]
def sort_sets(a_set):
    idx = 0
    idx2 = 0
    while len(mixed_sets) > idx and len(a_set) > idx2:
        if a_set[idx2][0] == mixed_sets[idx][0] or a_set[idx2][1] == mixed_sets[idx][1]:
            a_set.append(mixed_sets[idx])
            mixed_sets.pop(idx)
            idx = 0
        else:
            idx += 1
            if idx == len(mixed_sets):
                idx2 += 1
                idx = 0
    a_set.pop(0)  # remove first item; duplicate
    print(a_set, 'a returned set')
    return a_set

sorted_sets = []
for new_set in mixed_sets:
    sorted_sets.append(sort_sets([new_set]))
print(mixed_sets)  # Now empty.
OUTPUT:
[(1, 15), (3, 15), (5, 15), (7, 15), (8, 15), (3, 13), (3, 17), (5, 17), (8, 12)] a returned set
[(2, 22), (2, 23), (4, 23), (6, 23), (4, 22), (6, 22), (6, 21)] a returned set
[(9, 19), (10, 19), (10, 20), (11, 19), (9, 20), (11, 14), (11, 16), (11, 18)] a returned set
Now, this doesn't look like the most Pythonic way of doing this task. The code is intended for large lists of tuples (approx. 2e6 elements), and I felt the program would run quicker if it didn't have to re-check tuples already sorted; therefore I used pop() to shrink the mixed_sets list. I found that using pop() made list comprehensions, for loops, and other iterators problematic, so I used a while loop instead.
It does work, but is there a more Pythonic way of carrying out this task that doesn't use while loops and the idx and idx2 counters?
You could probably increase the speed by first computing a set of all the first elements of the tuples in mixed_sets, and a set of all the second elements. Then in your iteration you can check whether the first or the second element is in one of these sets, and find the complete tuple using binary search.
Actually, you'd need multisets, which you can simulate using dictionaries.
Something like this (currently not tested):
from collections import defaultdict

# define the mixed_sets list.
mixed_sets.sort()
first_els = defaultdict(int)
second_els = defaultdict(int)
for first, second in mixed_sets:
    first_els[first] += 1
    second_els[second] += 1

def sort_sets(a_set):
    index = 0
    while mixed_sets and len(a_set) > index:
        first, second = a_set[index]
        if first in first_els or second in second_els:
            if first in first_els:
                element = find_tuple(mixed_sets, first, index=0)
                first_els[first] -= 1
                if first_els[first] <= 0:
                    del first_els[first]
            else:
                element = find_tuple(mixed_sets, second, index=1)
                second_els[second] -= 1
                if second_els[second] <= 0:
                    del second_els[second]
            a_set.append(element)
            mixed_sets.remove(element)
        index += 1
    a_set.pop(0)  # remove first item; duplicate
    print(a_set, 'a returned set')
    return a_set
Here find_tuple(mixed_sets, value, index) would return the tuple in mixed_sets that has value at the given index (0 or 1).
You'll probably also have to duplicate mixed_sets and order one copy by the first element and the other by the second.
Or maybe you could play with dictionaries again, adding to the values in first_els and second_els a sorted list of tuples.
I don't know how the performance will scale, but if the data is on the order of 2 million elements, I don't think you have too much to worry about.
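A different route (my suggestion, not the dict-and-binary-search approach above): treat each tuple as an edge connecting its first element to its second element, and group connected components with a small union-find structure. This avoids the quadratic rescans and runs in near-linear time:

```python
from collections import defaultdict

mixed_sets = [(1, 15), (2, 22), (2, 23), (3, 13), (3, 15),
              (3, 17), (4, 22), (4, 23), (5, 15), (5, 17),
              (6, 21), (6, 22), (6, 23), (7, 15), (8, 12),
              (8, 15), (9, 19), (9, 20), (10, 19), (10, 20),
              (11, 14), (11, 16), (11, 18), (11, 19)]

parent = {}

def find(node):
    # Find the root of node's set, with path halving for speed.
    parent.setdefault(node, node)
    while parent[node] != node:
        parent[node] = parent[parent[node]]
        node = parent[node]
    return node

# Tag first and second elements so that, e.g., a first-element 15 and a
# second-element 15 are distinct graph nodes.
for first, second in mixed_sets:
    parent[find(('first', first))] = find(('second', second))

# Group the tuples by the root of their first element.
groups = defaultdict(list)
for t in mixed_sets:
    groups[find(('first', t[0]))].append(t)

for group in groups.values():
    print(group)
```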
I need to sort two arrays simultaneously, or rather I need to sort one array and bring the corresponding element of its associated array along as I sort. That is, if the array is [(5, 33), (4, 44), (3, 55)] and I sort by the first axis (labeled 'alpha' in the dtype below), then I want [(3.0, 55.0), (4.0, 44.0), (5.0, 33.0)]. These are really big data sets, and I need to sort first (for n log n speed) before I do some other operations. I don't know how to merge my two separate arrays in the proper manner to get the sort algorithm working, though. I think my problem is rather simple. I tried three different methods:
import numpy
x = numpy.asarray([5, 4, 3])
y = numpy.asarray([33, 44, 55])
dtype = [('alpha', float), ('beta', float)]
values = numpy.array([(x), (y)])
values = numpy.rollaxis(values, 1)
#values = numpy.array(values, dtype=dtype)
#a = numpy.array(values, dtype=dtype)
#q = numpy.sort(a, order='alpha')
print("Try 1:\n", values)
values = numpy.empty((len(x), 2))
for n in range(len(x)):
    values[n][0] = y[n]
    values[n][1] = x[n]
print("Try 2:\n", values)
#values = numpy.array(values, dtype=dtype)
#a = numpy.array(values, dtype=dtype)
#q = numpy.sort(a, order='alpha')
###
values = [(x[0], y[0]), (x[1], y[1]), (x[2], y[2])]
print("Try 3:\n", values)
values = numpy.array(values, dtype=dtype)
a = numpy.array(values, dtype=dtype)
q = numpy.sort(a, order='alpha')
print("Result:\n", q)
I commented out the first and second tries because they create errors. I knew the third one would work, because it mirrors what I saw when reading the manual. Given the arrays x and y (which are very large; just examples shown), how do I construct the array (called values) that can be passed to numpy.sort properly?
*** Zip works great, thanks. Bonus question: how can I later unzip the sorted data into two arrays again?
I think what you want is the zip function. If you have
x = [1, 2, 3]
y = [4, 5, 6]
then list(zip(x, y)) == [(1, 4), (2, 5), (3, 6)]
So your array could be constructed using
a = numpy.array(list(zip(x, y)), dtype=dtype)
for your bonus question -- zip actually unzips too:
In [1]: a = list(range(10))
In [2]: b = list(range(10, 20))
In [3]: c = list(zip(a, b))
In [4]: c
Out[4]:
[(0, 10),
(1, 11),
(2, 12),
(3, 13),
(4, 14),
(5, 15),
(6, 16),
(7, 17),
(8, 18),
(9, 19)]
In [5]: d, e = zip(*c)
In [6]: d, e
Out[6]: ((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (10, 11, 12, 13, 14, 15, 16, 17, 18, 19))
Simon suggested argsort as an alternative approach; I'd recommend it as the way to go. No messy merging, zipping, or unzipping: just access by index.
idx = numpy.argsort(x)
ans = [(x[i], y[i]) for i in idx]
zip() might be inefficient for large arrays. numpy.dstack() could be used instead of zip:
ndx = numpy.argsort(x)
values = numpy.dstack((x[ndx], y[ndx]))
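One caveat (my note, not part of the answer above): for 1-D inputs, dstack stacks along a third axis and produces shape (1, n, 2); numpy.column_stack gives the flatter (n, 2) layout most people expect here:

```python
import numpy

x = numpy.asarray([5, 4, 3])
y = numpy.asarray([33, 44, 55])
ndx = numpy.argsort(x)

stacked = numpy.dstack((x[ndx], y[ndx]))
print(stacked.shape)  # (1, 3, 2)

pairs = numpy.column_stack((x[ndx], y[ndx]))
print(pairs.shape)    # (3, 2)
print(pairs)          # rows sorted by x
```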
I think you just need to specify the axis you are sorting on once you have made your final ndarray. Alternatively, argsort one of the original arrays and you'll have an index array that you can use to look up values in both x and y, which might mean you don't need values at all.
(scipy.org seems to be unreachable right now, or I would post a link to some docs.)
Given that your description doesn't quite match your code snippet, it's hard to say with certainty, but I think you have over-complicated the creation of your numpy array.
I couldn't get a working solution using Numpy's sort function, but here's something else that works:
import numpy
x = [5, 4, 3]
y = [33, 44, 55]
r = numpy.asarray([(x[i], y[i]) for i in numpy.lexsort([x])])
lexsort returns the permutation of the array indices that puts the rows in sorted order. If you want your results sorted on multiple keys, e.g. by x and then by y, use numpy.lexsort([y, x]) instead; note that lexsort treats the last key in the sequence as the primary sort key.
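A small illustration of that key order (my example, not from the answer):

```python
import numpy

x = [1, 1, 2]
y = [9, 3, 5]
# lexsort treats the LAST key as the primary one, so to sort
# by x first and break ties with y, pass [y, x].
order = numpy.lexsort([y, x])
print([(x[i], y[i]) for i in order])  # [(1, 3), (1, 9), (2, 5)]
```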