I have a question about tuple indexing and slicing in Python; I want to write better and clearer code. This is a simplified version of my problem:
I have a tuple a = (1, 2, 3, 4, 5) and I want to index into it so that I get b = (1, 2, 4).
Is it possible to do this in one operation, or do I have to do b = a[0:2] + (a[3],)? I have thought about indexing with another tuple, which is not possible, and I have also searched for a way to combine a slice and an index. It just seems to me that there must be a better way to do it.
Thank you very much :)
Just for fun, you can implement an indexer and use simple syntax to get the results:
from collections.abc import Sequence

class SequenceIndexer:
    def __init__(self, seq: Sequence):
        self.seq = seq

    def __getitem__(self, item):
        # a single index or slice: defer to the wrapped sequence
        if not isinstance(item, tuple):
            return self.seq[item]
        # a tuple of indices/slices: collect the pieces in order
        result = []
        for i in item:
            if isinstance(i, slice):
                result.extend(self.seq[i])
            else:
                result.append(self.seq[i])
        return result
a = 1, 2, 3, 4, 5
indexer = SequenceIndexer(a)
print(indexer[:2, 3]) # [1, 2, 4]
print(indexer[:3, 2, 4, -3:]) # [1, 2, 3, 3, 5, 3, 4, 5]
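One possible refinement (my addition, not part of the original snippet): have __getitem__ return the same sequence type it wraps, so a tuple in gives a tuple out. Replacing the final return line does it:

        # assumes type(self.seq)(iterable) is a valid constructor call,
        # which holds for list and tuple
        return type(self.seq)(result)

With that change, indexer[:2, 3] returns (1, 2, 4) instead of [1, 2, 4].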
As stated in the comments, you can use operator.itemgetter:
import operator
a = (1,2,3,4)
result = operator.itemgetter(0, 1, 3)(a)
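itemgetter called with several indices returns a tuple, so result here is exactly the shape the question asks for:

print(result)  # (1, 2, 4)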
If you do this kind of indexing often and your arrays are big, you might also have a look at numpy, as it supports more complex indexing schemes:
import numpy as np
a = (1,2,3,4)
b = np.array(a)
result = b[[0,1,3]]
If the array is very large and/or you have a huge list of single points you want to include, it can be convenient to construct the index list in advance. For the example above it looks like this:
singlePoints = [3,]
idx = [*range(0,2), *singlePoints]
result = b[idx]
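NumPy's np.r_ helper goes one step further and concatenates slices and scalar indices into a single index array, which directly answers the "combine a slice and an index" part of the question:

result = b[np.r_[0:2, 3]]  # array([1, 2, 4])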
You can also use unpacking; don't name the parameters you don't need, just name them as underscore (_):
one, two, _, four, _ = (1, 2, 3, 4, 5)
Now you have your numbers.
This is one simple way:
b = (one, two, four)
Or you can use numpy as well.
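If the tuple is longer and you only care about a few leading items, starred unpacking (standard Python 3) can absorb the rest:

one, two, _, four, *rest = (1, 2, 3, 4, 5, 6, 7)
b = (one, two, four)  # rest == [5, 6, 7]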
Tricky to word the title well.
I want to create a list of values that correspond to an attribute of a list of objects. It can be done inelegantly like this:
class Example:
    def __init__(self, x):
        self.x = x

objlist = [Example(i) for i in range(10)]
DESIRED_OUTCOME = [obj.x for obj in objlist]
But this seems unpythonic and cumbersome, so I was wondering if there is a way of indexing all the values out at one time.
I'm wondering if there is a syntax that allows me to take all the attributes out at once, like pulling a first-axis array from a 2D array:
ex = example2darray[:,1] #2d array syntax
OUTCOME = objlist[:, objlist.x] #Is there something like this that exists?
>>> [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
I hope this question makes sense
Nothing unpythonic about that, IMO, but if you really want to iterate over the x values of your instances 'directly' instead of obtaining them from the objects themselves, you can map operator.attrgetter over them:
import operator
objlist = [Example(i) for i in range(10)]
DESIRED_OUTCOME = map(operator.attrgetter("x"), objlist)
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
Beware that on Python 3.x map() returns an iterator, so if you want a list result make sure to turn it into one. Also, unless you construct Example in a special way, pretty much anything will be slower than the good old list comprehension, which you consider 'inelegant'.
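For example, to materialize the iterator (same names as the snippet above):

DESIRED_OUTCOME = list(map(operator.attrgetter("x"), objlist))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]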
Edited to clear up the confusion in the problem, thanks for the answers!
My original problem is that I have a list [1,2,3,4,5,6,7,8], and I want to select every chunk of size x with a gap of one between chunks. So if I select chunks of size 2, the outcome would be [1,2,4,5,7,8]. A chunk size of three would give me [1,2,3,5,6,7].
I've searched a lot on slicing and couldn't find a way to select chunks instead of single elements. Doing multiple slice operations and then joining and sorting seems a little too expensive. The input can be either a Python list or a numpy ndarray. Thanks in advance.
To me it seems you want to skip one element between chunks until the end of the input list or array.
Here's one approach based on np.delete that deletes those single elements squeezed between chunks (note the integer division //, needed on Python 3) -
out = np.delete(A,np.arange(len(A)//(x+1))*(x+1)+x)
Here's another approach based on boolean-indexing -
L = len(A)
avoid_idx = np.arange(L//(x+1))*(x+1)+x
out = np.array(A)[~np.in1d(np.arange(L),avoid_idx)]
Sample run -
In [98]: A = [51,42,13,34,25,68,667,18,55,32] # Input list
In [99]: x = 2
# Thus, [51,42,13,34,25,68,667,18,55,32]
#              ^        ^         ^        # Skip these
In [100]: np.delete(A,np.arange(len(A)//(x+1))*(x+1)+x)
Out[100]: array([ 51, 42, 34, 25, 667, 18, 32])
In [101]: L = len(A)
...: avoid_idx = np.arange(L//(x+1))*(x+1)+x
...: out = np.array(A)[~np.in1d(np.arange(L),avoid_idx)]
...:
In [102]: out
Out[102]: array([ 51, 42, 34, 25, 667, 18, 32])
First off, you can create an array of indices, then use the np.in1d() function to find the indices that should be omitted; the not operator (~) then gives the indices that must be preserved. Finally, pick them out using simple boolean indexing:
>>> a = np.array([1,2,3,4,5,6,7,8])
>>> range_arr = np.arange(a.size)
>>>
>>> a[~np.in1d(range_arr,range_arr[2::3])]
array([1, 2, 4, 5, 7, 8])
General approach:
>>> range_arr = np.arange(np_array.size)
>>> np_array[~np.in1d(range_arr,range_arr[chunk::chunk+1])]
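Recent NumPy versions (1.13+) also provide np.isin, which the np.in1d documentation recommends for new code; the same approach reads:
>>> np_array[~np.isin(range_arr,range_arr[chunk::chunk+1])]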
Here are pure Python solutions:
This assumes the desired items are: [yes, yes, no, yes, yes, no, ...]
Quicker to code, slower to run:
data = [1, 2, 3, 4, 5, 6, 7, 8]
filtered = [item for i, item in enumerate(data) if i % 3 != 2]
assert filtered == [1, 2, 4, 5, 7, 8]
Slightly slower to write, but faster to run:
from itertools import cycle, compress
data = [1, 2, 3, 4, 5, 6, 7, 8]
selection_criteria = [True, True, False]
filtered = list(compress(data, cycle(selection_criteria)))
assert filtered == [1, 2, 4, 5, 7, 8]
The second example runs in 66% of the time of the first, and it's also clearer and easier to change the selection criteria.
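To generalize to chunks of size x with a gap of one (variable names here are mine, for illustration), build the mask instead of spelling it out, reusing the imports above:

x = 3  # chunk size
selection_criteria = [True] * x + [False]
filtered = list(compress(data, cycle(selection_criteria)))
assert filtered == [1, 2, 3, 5, 6, 7]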
A simple list solution:
>>> import itertools
>>> ll = [1,2,3,4,5,6,7,8]
>>> list(itertools.chain(*zip(ll[::3],ll[1::3])))
[1, 2, 4, 5, 7, 8]
At least that works for this case: chunks of size 2, skipping one value between chunks. The number of ll[...] slicings determines the chunk size, and the slicing step determines the chunk spacing.
As I commented there is some ambiguity in the problem description, so I hesitate to generalize this solution more until that is cleared up.
It may be easier to generalize the numpy solutions, but they aren't necessarily faster. Conversion to arrays has a time overhead.
list(itertools.chain(*zip(*[ll[i::6] for i in range(3)])))
produces chunks of length 3, skipping 3 elements.
zip(*) is an idiomatic way of 'transposing' a list of lists
itertools.chain(*...) is an idiomatic way of flattening a list of lists.
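For example:

>>> list(zip(*[[1, 2], [3, 4], [5, 6]]))
[(1, 3, 5), (2, 4, 6)]
>>> list(itertools.chain(*[[1, 2], [3, 4], [5, 6]]))
[1, 2, 3, 4, 5, 6]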
Another option is a list comprehension with a condition based on the item's index:
[v for i,v in enumerate(ll) if i%3 != 2]
handily skips every 3rd item, same as your example. Similarly, (i%6)<3 keeps 3, then skips 3.
This should do the trick, using reshape on a truncated copy. Note two tweaks versus the original sketch: the result is flattened with ravel(), and any trailing partial chunk (the final [7, 8] in the example above) is dropped, so handle the tail separately if you need it:

import numpy as np

step = 3               # chunk size + gap
size = 2               # chunk size
arr = np.asarray(arr)  # renamed from `input`, which shadows a builtin
chunks = len(arr) // step
result = arr[:chunks*step].reshape(chunks, step)[:, :size].ravel()
A simple list comprehension can do the job:
[ L[i] for i in range(len(L)) if i%3 != 2 ]
For chunks of size n:
[ L[i] for i in range(len(L)) if i%(n+1) != n ]
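A quick check against the question's examples:

>>> L = [1,2,3,4,5,6,7,8]
>>> [ L[i] for i in range(len(L)) if i%3 != 2 ]
[1, 2, 4, 5, 7, 8]
>>> n = 3
>>> [ L[i] for i in range(len(L)) if i%(n+1) != n ]
[1, 2, 3, 5, 6, 7]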
I have a 41000x3 numpy array that I call "sortedlist" in the function below. The third column has a bunch of values, some of which are duplicates, others which are not. I'd like to take a sample of unique values (no duplicates) from the third column, which is sortedlist[:,2]. I think I can do this easily with numpy.random.sample(sortedlist[:,2], sample_size). The problem is I'd like to return not only those values, but all three columns of the rows where the last column contains the randomly chosen values.
EDIT: By unique values I mean I want to choose random values which appear only once. So if I had an array:
array = [[0, 6, 2],
         [5, 3, 9],
         [3, 7, 1],
         [5, 3, 2],
         [3, 1, 1],
         [5, 2, 8]]
And I wanted to choose 4 values from the third column, I'd want to get something like new_array_1 out:
new_array_1 = [[5, 3, 9],
               [3, 7, 1],
               [5, 3, 2],
               [5, 2, 8]]
But I don't want something like new_array_2, where two values in the 3rd column are the same:
new_array_2 = [[5, 3, 9],
               [3, 7, 1],
               [5, 3, 2],
               [3, 1, 1]]
I have the code to choose random values but without the criterion that they shouldn't be duplicates in the third column.
sample_size = 100
rand_sortedlist = sortedlist[np.random.randint(len(sortedlist), size=sample_size), :]
I'm trying to enforce this criterion by doing something like this
array_index = where( array[:,2] == sample(SelectionWeight, sample_size) )
But I'm not sure if I'm on the right track. Any help would be greatly appreciated!
I can't think of a clever numpythonic way to do this that doesn't involve multiple passes over the data. (Sometimes numpy is so much faster than pure Python that's still the fastest way to go, but it never feels right.)
In pure Python, I'd do something like:

import random

def draw_unique(vec, n):
    # group indices by value
    d = {}
    for i, x in enumerate(vec):
        d.setdefault(x, []).append(i)
    # pick n distinct values, then one occurrence of each
    # (random.sample needs a sequence, hence list(d))
    drawn = [random.choice(d[k]) for k in random.sample(list(d), n)]
    return drawn
which would give
>>> a = np.random.randint(0, 10, (41000, 3))
>>> drawn = draw_unique(a[:,2], 3)
>>> drawn
[4219, 6745, 25670]
>>> a[drawn]
array([[5, 6, 0],
[8, 8, 1],
[5, 8, 3]])
I can think of some tricks with np.bincount and scipy.stats.rankdata, but they hurt my head, and there always winds up being one step at the end I can't see how to vectorize. And if I'm not vectorizing the whole thing, I might as well use the above, which at least is simple.
I believe this will do what you want. Note that the running time will almost certainly be dominated by whatever method you use to generate your random numbers. (An exception is if the dataset is gigantic but you only need a small number of rows, in which case very few random numbers need to be drawn.) So I'm not sure this will run much faster than a pure python method would.
# arrayify your list of lists
# please don't use `array` as a variable name!
a = np.asarray(arry)

# sort by the 3rd column ... always the first step for efficiency
a2 = a[np.argsort(a[:, 2])]

# identify rows that are duplicates (3rd column equals the next row's)
# Note this has length one less than a2
duplicate_rows = np.diff(a2[:, 2]) == 0

# if duplicate_rows[N], then we want to remove rows N and N+1
keep_mask = np.ones(len(a2), dtype=bool)  # all True
keep_mask[:-1][duplicate_rows] = False    # remove row N
keep_mask[1:][duplicate_rows] = False     # remove row N + 1

# now actually slice the array
a3 = a2[keep_mask]

# select rows from a3 using your preferred random number generator
# I actually prefer `random` over numpy.random for sampling w/o replacement
import random
result = a3[random.sample(range(len(a3)), DESIRED_NUMBER_OF_ROWS)]
Say I have an array with a couple hundred elements. I need to iterate over the array and replace one or more items with some other item. Which strategy is more efficient in Python in terms of speed (I'm not worried about memory)?
For example: I have an array
my_array = [1,2,3,4,5,6]
I want to replace the first 3 elements with one element with the value 123.
Option 1 (inline):
my_array = [1,2,3,4,5,6]
my_array.remove(0,3)
my_array.insert(0,123)
Option 2 (new array creation):
my_array = [1,2,3,4,5,6]
my_array = my_array[3:]
my_array.insert(0,123)
Both of the above options will give a result of:
>>> [123,4,5,6]
Any comments would be appreciated, especially if there are options I have missed.
If you want to replace an item or a set of items in a list, you should never use your first option. Removing from and adding to the middle of a list is slow (reference). Your second option is also fairly inefficient, since it performs two operations for a single replacement.
Instead, just do slice assignment, as eiben's answer instructs. This will be significantly faster and more efficient than either of your methods:
>>> my_array = [1,2,3,4,5,6]
>>> my_array[:3] = [123]
>>> my_array
[123, 4, 5, 6]
arr[0] = x
replaces the 0th element with x. You can also replace whole slices:
>>> arr = [1, 2, 3, 4, 5, 6]
>>> arr[0:3] = [8, 9, 99]
>>> arr
[8, 9, 99, 4, 5, 6]
>>>
And generally it's unclear what you're trying to achieve. Please provide more information or an example.
OK, as for your update. The remove method doesn't work (remove needs one argument). But the slicing I presented works for your case too:
>>> arr
[8, 9, 99, 4, 5, 6]
>>> arr[0:3] = [4]
>>> arr
[4, 4, 5, 6]
I would guess it's the fastest method, but do try it with timeit. According to my tests it's twice as fast as your "new array" approach.
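A minimal timeit sketch (the two statements are my reconstruction of the approaches above; absolute numbers vary by machine):

import timeit

slice_assign = timeit.timeit("a = [1,2,3,4,5,6]; a[:3] = [123]", number=1_000_000)
new_list     = timeit.timeit("a = [1,2,3,4,5,6]; a = a[3:]; a.insert(0, 123)", number=1_000_000)
print(slice_assign, new_list)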
If you're looking for speed efficiency and you're manipulating series of integers, you should use the standard array module instead:
>>> import array
>>> my_array = array.array('i', [1,2,3,4,5,6])
>>> my_array = my_array[3:]
>>> my_array.insert(0,123)
>>> my_array
array('i', [123, 4, 5, 6])
The key thing is to avoid moving large numbers of list items more than you absolutely have to. Slice assignment, as far as I'm aware, still involves shifting all the items after the slice, which is bad news.
How do you recognise when you have a sequence of items which needs to be replaced? I'll assume you have a function like:

def replacement(objects, startIndex):
    "returns a pair (numberOfObjectsToReplace, replacementObject), or None if there should be no replacement"
I'd then do:

def replaceAll(objects):
    src = 0
    dst = 0
    while src < len(objects):
        replacementInfo = replacement(objects, src)
        if replacementInfo is not None:
            numberOfObjectsToReplace, replacementObject = replacementInfo
        else:
            numberOfObjectsToReplace = 1
            replacementObject = objects[src]
        objects[dst] = replacementObject
        src = src + numberOfObjectsToReplace
        dst = dst + 1
    del objects[dst:]
This code still does a few more loads and stores than it absolutely has to, but not many.
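For instance, with a hypothetical replacement() mirroring the question (collapse the run 1, 2, 3 into 123; the rule is mine, for illustration only):

def replacement(objects, startIndex):
    if objects[startIndex:startIndex + 3] == [1, 2, 3]:
        return (3, 123)
    return None

my_array = [1, 2, 3, 4, 5, 6]
replaceAll(my_array)
print(my_array)  # [123, 4, 5, 6]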