Windowed selection from a list in Python

Suppose I have a list
a=[0,1,2,3,4,5,6,7,8,9]
I randomly choose an index using,
i = np.random.choice(np.arange(len(a)))
and now, I want to select a symmetric interval around i of some size, say 2. So for example for the given list, if the index i = 5 is selected, I get something like
print(a[i-2:i+2+1])
>> [3, 4, 5, 6, 7]
This works fine.
However, if i happens to be near one of the end points, say i = 1, then using what I have I get
print(a[i-2:i+2+1])
>> []
Instead I want it to print an asymmetric interval, like [0, 1, 2, 3].
If i = 8, I get
print(a[i-2:i+2+1])
>> [6, 7, 8, 9]
which is what I want, so being near the upper end point doesn't seem to matter. The closest I have gotten to a solution is (say for i = 1)
print([a[0:i+3] if a[i-2:i+2+1] == [] else a[i-2:i+2+1] ])
>> [[0, 1, 2, 3]]
But this returns, [[0,1,2,3]] instead of [0,1,2,3]
Is there a nice way to do this in python/numpy using list comprehension or something else?

You just need to clip the lower index at zero:
>>> print(a[max(i-2,0):i+2+1])
[0, 1, 2, 3]
Without this, the lower index can go negative. That has a special meaning in slicing: negative indices count from the end of the list.

You've tripped across the right-end numbering of Python. You gave it the limits [-1:4], but -1 denotes the right-hand element. Since the "first" element is past the "last" element, the resulting slice is empty. You won't have this problem on the high indices, because there's no "wrap-around" on that end.
Simply drive the lower index to a rail of 0, using max.
print(a[max(i-2, 0):i+2+1])

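The problem is that when the lower bound goes negative, it counts from the end of the list, so the slice comes back empty. Try:
a=[0,1,2,3,4,5,6,7,8,9]
i = 1
interval=2
print( a[ max(i-interval, 0) : min(i+interval+1, len(a)) ] )
I just put max/min bounds on it so it doesn't escape the list. Strictly, only the max is needed, since slicing already clamps an upper bound that runs past the end. Not very pythonic, but it's a quick fix.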
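The clamping idea from the answers above can be wrapped in a small helper. This is only a sketch: the name window and the parameter half_width are made up for illustration and do not come from the question.

```python
def window(seq, i, half_width):
    """Return the items within half_width of index i, truncated at the ends."""
    start = max(i - half_width, 0)  # clamp: a negative start would count from the end
    stop = i + half_width + 1       # no clamp needed: slicing tolerates stop > len(seq)
    return seq[start:stop]

a = list(range(10))
print(window(a, 5, 2))  # [3, 4, 5, 6, 7]
print(window(a, 1, 2))  # [0, 1, 2, 3]
print(window(a, 8, 2))  # [6, 7, 8, 9]
```

The same function works unchanged on a NumPy array, since the clamped bounds are ordinary slice bounds.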

Related

How to find values and slicing a list? [duplicate]

Consider the following simple python code
>>> L = range(3)
>>> L
[0, 1, 2]
We can take slices of this array as follows:
>>> L[1:3]
[1, 2]
Is there any way to wrap around the above array by shifting to the left
[1, 2, 0]
by simply using slice operations?
Rotate left n elements (or right for negative n):
L = L[n:] + L[:n]
Note that collections.deque has support for rotations. It might be better to use that instead of lists.
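A minimal sketch of the deque route, assuming a helper named rotate_left (a made-up name). Note that deque.rotate uses the opposite sign convention to the slice expression above: a positive argument rotates to the right.

```python
from collections import deque

def rotate_left(items, n):
    """Return a new list rotated left by n positions (right if n is negative)."""
    d = deque(items)
    d.rotate(-n)  # deque.rotate(k) rotates right for positive k, so negate
    return list(d)

print(rotate_left([0, 1, 2], 1))   # [1, 2, 0]
print(rotate_left([0, 1, 2], -1))  # [2, 0, 1]
```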
Left:
L[-1:], L[:-1] = L[:1], L[1:]
Right:
L[:1], L[1:] = L[-1:], L[:-1]
To my mind, there's no way, unless you agree to cut and concatenate lists as shown above.
To make the wrapping you describe you need to alter both starting and finishing index.
A positive starting index cuts away some of initial items.
A negative starting index gives you some of the tail items, cutting initial items again.
A positive finishing index cuts away some of the tail items.
A negative finishing index gives you some of the initial items, cutting tail items again.
No combination of these can provide the wrapping point where tail items are followed by initial items. So the entire thing can't be created.
Numerous workarounds exist. See answers above, see also itertools.islice and .chain for a no-copy sequential approach if sequential access is what you need (e.g. in a loop).
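One way the islice/chain suggestion might look, as a sketch only. The function name rotated is hypothetical, and n is assumed to satisfy 0 <= n <= len(seq) because islice does not accept negative bounds.

```python
from itertools import chain, islice

def rotated(seq, n):
    """Iterate over seq rotated left by n without building a new list."""
    # First the tail seq[n:], then the head seq[:n], both produced lazily.
    return chain(islice(seq, n, None), islice(seq, n))

L = [0, 1, 2]
for item in rotated(L, 1):
    print(item)  # prints 1, then 2, then 0
```

Because islice skips items by iterating over them, this saves memory rather than time; for random access into a rotated sequence, index arithmetic with % is the better fit.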
If you are not overly attached to the exact slicing syntax, you can write a function that produces the desired output including the wrapping behavior.
E.g., like this:
def wrapping_slice(lst, *args):
    return [lst[i%len(lst)] for i in range(*args)]
Example output:
>>> L = range(3)
>>> wrapping_slice(L, 1, 4)
[1, 2, 0]
>>> wrapping_slice(L, -1, 4)
[2, 0, 1, 2, 0]
>>> wrapping_slice(L, -1, 4, 2)
[2, 1, 0]
Caveat: You can't use this on the left-hand side of a slice assignment.

Sorting Function. Explanation

def my_sort(array):
    length_of_array = range(1, len(array))
    for i in length_of_array:
        value = array[i]
        last_value = array[i-1]
        if value<last_value:
            array[i]=last_value
            array[i-1]=value
            my_sort(array)
    return array
I know what the function does in general. It's a sorting algorithm... But I don't know what each individual part/section does.
Well, I have to say that the best way to understand this is to experiment with it, learn what it is using, and, basically, learn Python. :)
However, I'll go through the lines one-by-one to help:
1. Define a function named my_sort that accepts one argument named array. The rest of the lines are contained in this function.
2. Create a range of numbers using range that spans from 1 inclusive to the length of array non-inclusive. Then, assign this range to the variable length_of_array.
3. Start a for-loop that iterates through the range defined in the preceding line. Furthermore, assign each number returned to the variable i. This for-loop encloses lines 4 through 9.
4. Create a variable value that is equal to the item returned by indexing array at position i.
5. Create a variable last_value that is equal to the item returned by indexing array at position i-1.
6. Test if value is less than last_value. If so, run lines 7 through 9.
7. Make the i index of array equal last_value.
8. Make the i-1 index of array equal value.
9. Rerun my_sort recursively, passing in the argument array.
10. Return array for this iteration of the recursive function.
When array is finally sorted, the recursion will end and you will be left with array all nice and sorted.
I hope this shed some light on the subject!
I'll see what I can do for you. The code, for reference:
def my_sort(array):
    length_of_array = range(1, len(array))
    for i in length_of_array:
        value = array[i]
        last_value = array[i-1]
        if value<last_value:
            array[i]=last_value
            array[i-1]=value
            my_sort(array)
    return array
def my_sort(array):
A function that takes an array as an argument.
length_of_array = range(1, len(array))
We set the variable length_of_array to a range of numbers that we can iterate over, based on the number of items in array. I assume you know what range does, but if you don't, in short you can iterate over it in the same way you'd iterate over a list. (You could also use xrange() here.)
for i in length_of_array:
    value = array[i]
    last_value = array[i-1]
What we're doing is using the range to indirectly traverse the array because there's the same total of items in each. If we look closely, though, value uses the i as its index, which starts off at 1, so value is actually array[1], and last_value is array[1-1] or array[0].
if value<last_value:
    array[i]=last_value
    array[i-1]=value
So now we're comparing the values. Let's say we passed in [3, 1, 3, 2, 6, 4]. We're at the first iteration of the loop, so we're essentially saying, if array[1], which is 1, is less than array[0], which is 3, swap them. Of course 1 is less than 3, so swap them we do. But since the code can only compare each item to the previous item, there's no guarantee that array will be properly sorted from lowest to highest. Each iteration could unswap a properly swapped item if the item following it is larger (e.g. [2,5,6,4] will remain the same on the first two iterations -- they will be skipped over by the if test -- but when it hits the third, 6 will swap with 4, which is still wrong). In fact, if we were to finish this out without the call to my_sort(array) directly below it, our original array would evaluate to [1, 3, 2, 3, 4, 6]. Not quite right.
my_sort(array)
So we call my_sort() recursively. What we're basically saying is, if on the first iteration something is wrong, correct it, then pass the new array back to my_sort(). This sounds weird at first, but it works. If the if test was never satisfied at all, that would mean no item in our original list was smaller than the one before it, which is another way (the computer's way, really) of saying it was sorted in ascending order to begin with. That's the key. So if any list item is smaller than the preceding item, we jerk it one index left. But we don't really know if that's correct; maybe it needs to go further still. So we have to go back to the beginning (i.e., call my_sort() again on our newly-minted list) and recheck to see if we should pull it left again. If we can't, the if test fails (no item is smaller than the one before it) until it hits the next error. On each iteration, this teases the same smaller number leftward by one index until it's in its correct position. This sounds more confusing than it is, so let's just look at the output for each iteration:
[3, 1, 3, 2, 6, 4]
[1, 3, 3, 2, 6, 4]
[1, 3, 2, 3, 6, 4]
[1, 2, 3, 3, 6, 4]
[1, 2, 3, 3, 4, 6]
Are you seeing what's going on? How about if we only look at what's changing on each iteration:
[3, 1, ... # Wrong; swap. Further work ceases; recur (return to beginning with a fresh call to my_sort()).
[1, 3, 3, 2, ... # Wrong; swap. Further work ceases; recur
[1, 3, 2, ... # Wrong; swap. Further work ceases; recur
[1, 2, 3, 3, 6, 4] # Wrong; swap. Further work ceases; recur
[1, 2, 3, 3, 4, 6] # No number is smaller than the one before it; correct.
This allows the function to call itself as many times as it needs to pull a number from the back to the front. Again, each time it's called, it focuses on the first wrong instance, pulling it one left until it puts it in its proper position. Hope that helps! Let me know if you're still having trouble.
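To reproduce the trace above, here is a sketch that replaces the recursive call with a loop that restarts from the beginning after every swap. It is not the original function: the name my_sort_trace and the returned list of states are additions for illustration.

```python
def my_sort_trace(array):
    """Swap-and-restart sort; returns every intermediate state of the list."""
    states = [list(array)]  # record the starting order
    i = 1
    while i < len(array):
        if array[i] < array[i - 1]:
            # Out of order: swap the neighbours, record the result ...
            array[i], array[i - 1] = array[i - 1], array[i]
            states.append(list(array))
            i = 1  # ... and start over, which is what the recursive call does
        else:
            i += 1
    return states

for state in my_sort_trace([3, 1, 3, 2, 6, 4]):
    print(state)
```

The five printed lists match the five states listed in the answer.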

Selecting unique random values from the third column of an array in python

I have a 41000x3 numpy array that I call "sortedlist" in the function below. The third column has a bunch of values, some of which are duplicates, others which are not. I'd like to take a sample of unique values (no duplicates) from the third column, which is sortedlist[:,2]. I think I can do this easily with numpy.random.sample(sortedlist[:,2], sample_size). The problem is I'd like to return, not only those values, but all three columns where, in the last column, there are the randomly chosen values that I get from numpy.random.sample.
EDIT: By unique values I mean I want to choose random values which appear only once. So If I had an array:
array = [[0, 6, 2],
         [5, 3, 9],
         [3, 7, 1],
         [5, 3, 2],
         [3, 1, 1],
         [5, 2, 8]]
And I wanted to choose 4 values of the third column, I want to get something like new_array_1 out:
new_array_1 = [[5, 3, 9],
               [3, 7, 1],
               [5, 3, 2],
               [5, 2, 8]]
But I don't want something like new_array_2, where two values in the 3rd column are the same:
new_array_2 = [[5, 3, 9],
               [3, 7, 1],
               [5, 3, 2],
               [3, 1, 1]]
I have the code to choose random values but without the criterion that they shouldn't be duplicates in the third column.
sample_size = 100
rand_sortedlist = sortedlist[np.random.randint(len(sortedlist), size = sample_size),:]
I'm trying to enforce this criterion by doing something like this
array_index = where( array[:,2] == sample(SelectionWeight, sample_size) )
But I'm not sure if I'm on the right track. Any help would be greatly appreciated!
I can't think of a clever numpythonic way to do this that doesn't involve multiple passes over the data. (Sometimes numpy is so much faster than pure Python that's still the fastest way to go, but it never feels right.)
In pure Python, I'd do something like
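import random
def draw_unique(vec, n):
    # group indices by value
    d = {}
    for i, x in enumerate(vec):
        d.setdefault(x, []).append(i)
    # pick n distinct values, then one random index for each of them
    # (list(d) because random.sample needs a sequence, not a dict)
    drawn = [random.choice(d[k]) for k in random.sample(list(d), n)]
    return drawn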
which would give
>>> a = np.random.randint(0, 10, (41000, 3))
>>> drawn = draw_unique(a[:,2], 3)
>>> drawn
[4219, 6745, 25670]
>>> a[drawn]
array([[5, 6, 0],
[8, 8, 1],
[5, 8, 3]])
I can think of some tricks with np.bincount and scipy.stats.rankdata but they hurt my head, and there always winds up being one step at the end I can't see how to vectorize.. and if I'm not vectorizing the whole thing I might as well use the above which at least is simple.
I believe this will do what you want. Note that the running time will almost certainly be dominated by whatever method you use to generate your random numbers. (An exception is if the dataset is gigantic but you only need a small number of rows, in which case very few random numbers need to be drawn.) So I'm not sure this will run much faster than a pure python method would.
import numpy as np
# arrayify your list of lists
# please don't use `array` as a variable name!
a = np.asarray(arry)
# sort the rows by the third column ... always the first step for efficiency
a2 = a[np.argsort(a[:, 2])]
# identify rows that are duplicates (difference in the sorted 3rd column is zero)
# Note this has length one less than a2
duplicate_rows = np.diff(a2[:, 2]) == 0
# if duplicate_rows[N], then we want to remove row N and N+1
keep_mask = np.ones(len(a2), dtype=bool) # all True
keep_mask[:-1][duplicate_rows] = False # remove row N
keep_mask[1:][duplicate_rows] = False # remove row N + 1
# now actually slice the array
a3 = a2[keep_mask]
# select rows from a3 using your preferred random number generator
# I actually prefer `random` over numpy.random for sampling w/o replacement
import random
result = a3[random.sample(range(len(a3)), DESIRED_NUMBER_OF_ROWS)]
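The masking steps above drop every row whose third-column value occurs more than once. As a cross-check, here is the same filter in plain Python on the six-row example from the question. This is a sketch, and the helper name rows_with_unrepeated_last_value is invented for illustration.

```python
import random
from collections import Counter

rows = [[0, 6, 2], [5, 3, 9], [3, 7, 1], [5, 3, 2], [3, 1, 1], [5, 2, 8]]

def rows_with_unrepeated_last_value(rows):
    """Keep only rows whose third entry appears exactly once in the data."""
    counts = Counter(row[2] for row in rows)
    return [row for row in rows if counts[row[2]] == 1]

candidates = rows_with_unrepeated_last_value(rows)
print(candidates)  # [[5, 3, 9], [5, 2, 8]]

# Sample without replacement from the surviving rows.
sample = random.sample(candidates, 2)
```

Under this reading of "unique", only two rows of the example survive, so at most two can be sampled. The first answer's draw_unique reads it differently and allows one row per distinct value.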


average of the list in Python

I have a problem: I need to find an average of the list using this scheme:
First of all, we find the average of the first two elements, the first three elements, ..., the first len(list) elements, and form a new list from those averages. Then use .pop() and find all averages again. The function should stop when len(list) == 2. Recursion should be used.
Example:
list: [-1, 4, 8, 1]
1 step:
find an average of [-1, 4], [-1, 4, 8], [-1, 4, 8, 1]
Then we form a new list: [1.5, 3.66..., 3] (averages)
Then find averages of new list: [1.5, 3.66...], [1.5, 3.66..., 3]
Then we form a new list: [2.5833.., 2.7222...] (averages)
When len(list) == 2, find an average of this two elements.
Answer is 2.652777.
What should i write:
jada = []
while True:
    print('Lst elements:')
    a = input()
    if a == '':
        break
    jada.append(float(a))
print('Lst is:' + str(jada))
def keskmine(jada):
    for i in range(len(jada) - 1):
        ...
    jada.pop()
    return keskmine(jada)
Actually, this is a part of a homework, but i don't know how to solve it.
Accept the list as the function argument. If the list has one item, return that. Create two iterators from the list. Pop one item off one of the lists, zip them together, then find the averages of the zip results. Recurse.
In short, you're finding the "running average" from a list of numbers.
Using recursion would be helpful here. Return the only element when "len(lst) == 1" otherwise, compute the running average and recurse.
There are two parts in this assignment. First, you need to transform lists like [-1, 4, 8, 1] to lists like [1.5, 3.66, 3] (find the running averages). Second, you need to repeat this process with the result of the running averages until your list's length is 2 (or 1).
You can tackle the first problem (find the running averages) independently from the second. Finding the running average is simple: first keep track of the running sum (e.g. if the list is [-1, 4, 8, 1] the running sum is [-1, 3, 11, 12]) and divide each element by its respective running index (i.e. just [1, 2, 3, 4]), to get [-1/1, 3/2, 11/3, 12/4] = [-1, 1.5, 3.66, 3]. Then you can discard the first element to get [1.5, 3.66, 3].
The second problem can be easily solved using recursion. Recursion is just another form of looping, all recursive code can be transformed to a regular for/while-loops code and all looping code can be transformed to recursive code. However, some problems have a tendency towards a more "natural" solution in either recursion or looping. In my opinion, the second problem (repeating the process of taking running averages) is more naturally solved using recursion. Let's assume you have solved the first problem (of finding the running average) and we have a function runavg(lst) to solve the first problem. We want to write a function which repeatedly find the running average of lst, or return the average when the lst's length is 2.
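The two parts described in this answer could be put together roughly as follows. This is a sketch, not the assignment's reference solution: runavg is the name used in the text, repeated_average is a made-up name, and a non-empty list of numbers is assumed.

```python
from itertools import accumulate

def runavg(lst):
    """Averages of the first 2, 3, ..., len(lst) elements of lst."""
    # Pair each running sum with how many items it covers, then drop
    # the first entry (the "average" of a single element).
    return [s / n for n, s in enumerate(accumulate(lst), start=1)][1:]

def repeated_average(lst):
    """Apply runavg until two elements (or one) remain, then average them."""
    if len(lst) <= 2:
        return sum(lst) / len(lst)
    return repeated_average(runavg(lst))

print(repeated_average([-1, 4, 8, 1]))  # about 2.652777
```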
First I'll give you an explanation, and then some pseudo code, which you'll have to rewrite in Python. The main idea is to have one function that calls itself passing a lesser problem with each iteration. In this case you would like to decrease the number of items by 1.
You can either make a new list with every call, or reuse the same one if you'd like. Before passing on the list to the next iteration, you will need to calculate the averages thus creating a shorter list.
The idea is that you sum the numbers in a parameter and divide by the number of items you've added so far into the appropriate index in the list. Once you are done, you can pop the last item out.
The code should look something like this: (indexes in sample are zero based)
average(list[])
    if(list.length == 0) // Check input and handle errors
        exit
    if(list.length == 1) // Recursion should stop
        return list[0] // The one item is its own average!
    // calculate the averages into the list in indices 0 to length - 2
    list.pop() // remove the last value
    return average(list) // the recursion happens here
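One possible Python rendering of the pseudocode above, offered as a sketch. It overwrites the list in place as the pseudocode suggests, so the caller's list is modified; raising ValueError for empty input is an assumption about how errors should be handled.

```python
def average(values):
    """Repeated running average, computed in place as in the pseudocode."""
    if not values:
        raise ValueError("cannot average an empty list")
    if len(values) == 1:  # recursion stops: one item is its own average
        return values[0]
    total = values[0]
    for k in range(1, len(values)):
        total += values[k]
        # Average of the first k+1 items goes into slot k-1, which has
        # already been added to total and is no longer needed.
        values[k - 1] = total / (k + 1)
    values.pop()  # the last slot still holds an old value; remove it
    return average(values)

print(average([-1, 4, 8, 1]))  # about 2.652777
```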
This is also an opportunity to use python 3.x itertools.accumulate:
From docs:
>>> list(accumulate([8, 2, 50]))
[8, 10, 60]
Then, you only need to divide each item by its index increased by 1, eliminate the first element and repeat until finished
For example, this works for any list of any length, doing most of the above-indicated steps inside a list comprehension:
>>> from itertools import accumulate
>>> a = [-1, 4, 8, 1]
>>> while len(a) > 1:
...     a = [item / (index + 1) for (index, item) in enumerate(accumulate(a)) if index > 0]
...
>>> print(a)
[2.6527777777777777]
