How to subset a list based on an array in Python

I have the following column names in a list:
vars = ['age','balance','day','duration','campaign','pdays','previous','job_admin.','job_blue-collar']
I have one array which consists of array indexes:
(array([1, 5, 7], dtype=int64),)
I want to subset the list based on the array indexes.
The desired output is:
vars = ['balance','pdays','job_admin.']
I have tried something like this in Python:
for i, a in enumerate(X):
    if i in new_L:
        print(i)
But it does not work.

Simply use a loop to do that:
result = []
for i in your_array:
    result.append(vars[i])
or as a one-liner:
[vars[i] for i in your_array]
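Note that the index shown in the question is a one-element tuple wrapping the array, so the array has to be unpacked before looping over it. A minimal sketch, assuming the tuple came from something like np.where on 1-D input (the names columns and index_tuple are illustrative, not from the question):

```python
import numpy as np

columns = ['age', 'balance', 'day', 'duration', 'campaign', 'pdays',
           'previous', 'job_admin.', 'job_blue-collar']

# Assumed shape of the index: a tuple holding a single integer array.
index_tuple = (np.array([1, 5, 7]),)

# Unpack the array from the tuple, then index the list element by element.
positions = index_tuple[0]
subset = [columns[i] for i in positions]
```

Looping over index_tuple directly would yield the whole array as a single item, which a plain list cannot use as an index.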

If you're using numpy anyway, use its advanced indexing:
import numpy as np
vars = ['age','balance','day','duration','campaign','pdays',
'previous','job_admin.','job_blue-collar']
indices = (np.array([1, 5, 7]),)
sub_array = np.asarray(vars)[indices]
# --> array(['balance', 'pdays', 'job_admin.'], dtype='<U15')
or, if you want a list:
sub_list = np.asarray(vars)[indices].tolist()
# --> ['balance', 'pdays', 'job_admin.']

index = [1, 5, 7]
vars = [vars[i] for i in index]

If I understand correctly, your data are:
vars = ['age','balance','day','duration','campaign','pdays','previous','job_admin.','job_blue-collar']
and indexes are:
idx = [1, 5, 7]
Then you can do:
>>> [vars[i] for i in idx]
['balance', 'pdays', 'job_admin.']

You can use operator.itemgetter:
>>> import numpy as np
>>> import operator
>>> vars = ['age','balance','day','duration','campaign','pdays','previous','job_admin.','job_blue-collar']
>>> idx = np.array([1,5,7])
>>> operator.itemgetter(*idx)(vars)
('balance', 'pdays', 'job_admin.')
This is actually the fastest solution posted so far.
>>> from timeit import repeat
>>> kwds = dict(globals=globals(), number=1000000)
>>>
>>> repeat("np.asarray(vars)[idx]", **kwds)
[2.2382465780247003, 2.225632123881951, 2.1969433058984578]
>>> repeat("[vars[i] for i in idx]", **kwds)
[0.9384958958253264, 0.9366465201601386, 0.9373494561295956]
>>> repeat("operator.itemgetter(*idx)(vars)", **kwds)
[0.9045725339092314, 0.9015877249184996, 0.9032398068811744]
Interestingly, it becomes more than twice as fast if we convert idx to a list first, and that's including the cost of conversion:
>>> repeat("operator.itemgetter(*idx.tolist())(vars)", **kwds)
[0.4062491739168763, 0.4086623480543494, 0.4049343201331794]
We can also afford to convert the result to a list and still be much faster than all the other solutions:
>>> repeat("list(operator.itemgetter(*idx.tolist())(vars))", **kwds)
[0.561687784967944, 0.5593925788998604, 0.5586365279741585]
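One caveat with operator.itemgetter: called with a single index it returns the bare item rather than a tuple, and called with no indexes it raises TypeError. A small sketch of a wrapper that always returns a list (the helper name pick is hypothetical):

```python
from operator import itemgetter

def pick(items, positions):
    """Return items at the given positions as a list, for any number of positions."""
    positions = list(positions)
    if not positions:
        # itemgetter() with no arguments raises TypeError, so handle it here.
        return []
    if len(positions) == 1:
        # itemgetter(i)(items) returns a single item, not a 1-tuple.
        return [items[positions[0]]]
    return list(itemgetter(*positions)(items))

names = ['age', 'balance', 'day', 'duration', 'campaign', 'pdays',
         'previous', 'job_admin.', 'job_blue-collar']
several = pick(names, [1, 5, 7])
single = pick(names, [1])
none = pick(names, [])
```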

Related

How to sort a list based on the output of numpy's argsort function

I have a list like this:
myList = [10,30,40,20,50]
Now I use numpy's argsort function to get the indices for the sorted list:
import numpy as np
so = np.argsort(myList)
which gives me the output:
array([0, 3, 1, 2, 4])
When I want to sort an array using so it works fine:
myArray = np.array([1,2,3,4,5])
myArray[so]
array([1, 4, 2, 3, 5])
But when I apply it to another list, it does not work but throws an error
myList2 = [1,2,3,4,5]
myList2[so]
TypeError: only integer arrays with one element can be converted to an index
How can I now use so to sort another list without using a for-loop and without converting this list to an array first?
myList2 is a normal python list, and it does not support that kind of indexing.
You would either need to convert it to a numpy.array, for example:
In [8]: np.array(myList2)[so]
Out[8]: array([1, 4, 2, 3, 5])
Or you can use a list comprehension:
In [7]: [myList2[i] for i in so]
Out[7]: [1, 4, 2, 3, 5]
You can't. You have to convert it to an array then back.
myListSorted = list(np.array(myList)[so])
Edit: I ran some benchmarks comparing the NumPy way to the list comprehension. NumPy is ~29x faster in this run:
>>> from timeit import timeit
>>> import numpy as np
>>> myList = list(np.random.rand(100))
>>> so = np.argsort(myList) #converts list to NumPy internally
>>> timeit(lambda: [myList[i] for i in so])
12.29590070003178
>>> myArray = np.random.rand(100)
>>> so = np.argsort(myArray)
>>> timeit(lambda: myArray[so])
0.42915570305194706
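If NumPy is only being used for argsort, the same ordering can also be computed in pure Python and applied to any list. A sketch, assuming distinct values so that tie-breaking does not matter (the name order is illustrative):

```python
my_list = [10, 30, 40, 20, 50]
my_list2 = [1, 2, 3, 4, 5]

# Indices that would sort my_list, analogous to np.argsort(my_list).
order = sorted(range(len(my_list)), key=my_list.__getitem__)

# Apply that ordering to another plain list.
reordered = [my_list2[i] for i in order]
```

This still iterates internally, but it avoids converting either list to an array.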

Using the reduce function on a multidimensional array

So I have a particular array that contains 2 separate arrays. What I am looking to do is average each of those 2 separate arrays, so for instance, if my original array is [(2,3,4),(4,5,6)], I want an output array like [3,5]. How would I do this? My attempt is as follows:
averages = reduce(sum(array)/len(array), [array])
>>> map(lambda x: sum(x)/len(x), [(2,3,4),(4,5,6)])
[3, 5]
reduce is not a good choice here. Just use a list comprehension:
>>> a = [(2,3,4),(4,5,6)]
>>> [sum(t)/len(t) for t in a]
[3, 5]
Note that / is integer division by default in python2.
If you have numpy available, you have a nicer option:
>>> import numpy as np
>>> a = np.array(a)
>>> a.mean(axis=1)
array([ 3., 5.])
You can do this with a list comprehension:
data = [(2,3,4),(4,5,6)]
averages = [ sum(tup)/len(tup) for tup in data ]
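The outputs shown above are from Python 2. In Python 3, / is true division and map returns an iterator, so the same expressions behave slightly differently. A short sketch of the Python 3 equivalents (variable names are illustrative):

```python
data = [(2, 3, 4), (4, 5, 6)]

# True division gives floats in Python 3.
true_averages = [sum(t) / len(t) for t in data]

# Floor division reproduces the integer results from Python 2.
floor_averages = [sum(t) // len(t) for t in data]

# map returns a lazy iterator in Python 3, so wrap it in list() to see the values.
mapped = list(map(lambda t: sum(t) / len(t), data))
```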

How can I convert a list of strings into numerical values?

I want to find out how to convert a list of strings into a list of numbers.
I have a php form through which user enters values for x and y like this:
X: [1,3,4]
Y: [2,4,5]
These values are stored into database as varchars. From there, these are called by a python program which is supposed to use them as numerical (numpy) arrays. However, these are called as plain strings, which means that calculation can not be performed over them. Is there a way to convert them into numerical arrays before processing or is there something else which is wrong?
You can use a list comprehension along with the strip() and split() methods to turn this into numeric values.
x = '[1,3,4]'
new_x = [int(i) for i in x.strip('[]').split(',')]
new_x
[1, 3, 4]
Use this list of ints as you see fit, e.g., passing them on to numpy etc.
from numpy import array
a = array(new_x)
a
array([1, 3, 4])
a * 4
array([ 4, 12, 16])
Here's one way:
>>> import numpy
>>> block = "[1,3,4]"
>>> block = block.strip("[]")
>>> a = numpy.fromstring(block, sep=",", dtype=int)
>>> a
array([1, 3, 4])
>>> a*2
array([2, 6, 8])
If I understand your question correctly, you can use the eval() and compile() built-in functions to achieve your aim:
>>> lst_str = '[1,2,3]'
>>> lst_obj = compile(lst_str, '<string>', 'eval')
>>> eval(lst_obj)
[1, 2, 3]
Keep in mind that using eval() in this manner is potentially unsafe, however, unless you can validate the input.
import ast
import numpy as np
def parse_array(s):
    # literal_eval only accepts Python literals, so it is safer than eval()
    return np.array(ast.literal_eval(s))
s = '[1,2,3]'
data = parse_array(s) # -> numpy.array([1,2,3])

How to define an array in Python?

I want to define an array in Python. How would I do that? Do I have to use a list?
Normally you would use a list. If you really want an array you can import array:
import array
a = array.array('i', [5, 6]) # array of signed ints
If you want to work with multidimensional arrays, you could try numpy.
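To illustrate the practical difference, here is a small sketch showing that array.array enforces a single element type while a list accepts anything (variable names are illustrative):

```python
from array import array

numbers = array('i', [5, 6])   # typed array of signed ints
numbers.append(7)              # fine: another int

try:
    numbers.append(1.5)        # a float does not fit the 'i' typecode
    rejected = False
except TypeError:
    rejected = True

mixed = [5, 6]
mixed.append(1.5)              # a list happily stores mixed types
```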
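A list is better, but you can use array like this:
from array import array
array('l')
array('c', 'hello world')      # Python 2 only: the 'c' typecode was removed in Python 3
array('u', u'hello \u2641')
array('l', [1, 2, 3, 4, 5])
array('d', [1.0, 2.0, 3.14])
More information is available in the array module documentation.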
Why do you want to use an array over a list? Here is a comparison of the two that clearly states the advantages of lists.
There are several types of arrays in Python, if you want a classic array it would be with the array module:
import array
a = array.array('i', [1,2,3])
But you can also use tuples without needing to import other modules:
t = (4,5,6)
Or lists:
l = [7,8,9]
A tuple is more efficient to use, but it has a fixed size, while you can easily add new elements to a list:
>>> l.append(10)
>>> l
[7, 8, 9, 10]
>>> t[1]
5
>>> l[1]
8
If you need an array because you're working with other low-level constructs (such as you would in C), you can use ctypes.
import ctypes
UINT_ARRAY_30 = ctypes.c_uint*30 # create a type of array of uint, length 30
my_array = UINT_ARRAY_30()
my_array[0] = 1
my_array[3] == 0 # True: ctypes arrays are zero-initialized

Convert NumPy array to Python list

How do I convert a NumPy array into a Python List?
Use tolist():
>>> import numpy as np
>>> np.array([[1,2,3],[4,5,6]]).tolist()
[[1, 2, 3], [4, 5, 6]]
Note that this converts the values from whatever numpy type they may have (e.g. np.int32 or np.float32) to the "nearest compatible Python type" (in a list). If you want to preserve the numpy data types, you could call list() on your array instead, and you'll end up with a list of numpy scalars. (Thanks to Mr_and_Mrs_D for pointing that out in a comment.)
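The difference between the two conversions can be checked directly. A short sketch (the dtype is chosen here only for illustration):

```python
import numpy as np

arr = np.array([1, 2, 3], dtype=np.int32)

as_python = arr.tolist()   # elements become plain Python ints
as_numpy = list(arr)       # elements stay numpy scalars

python_type = type(as_python[0])
numpy_type = type(as_numpy[0])
```

Here python_type is the built-in int, while numpy_type is np.int32.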
c = np.array([[1,2,3],[4,5,6]])
list(c.flatten())
The numpy .tolist method produces nested lists if the numpy array is 2D.
If a flat list is desired, the method below works.
import numpy as np
from itertools import chain
a = [1,2,3,4,5,6,7,8,9]
print(type(a), len(a), a)
npa = np.asarray(a)
print(type(npa), npa.shape, "\n", npa)
npa = npa.reshape((3, 3))
print(type(npa), npa.shape, "\n", npa)
a = list(chain.from_iterable(npa))
print(type(a), len(a), a)
tolist() also works fine on a nested array, say the values of a pandas DataFrame:
import pandas as pd
my_list = [0,1,2,3,4,5,4,3,2,1,0]
my_dt = pd.DataFrame(my_list)
new_list = [i[0] for i in my_dt.values.tolist()]
print(type(my_list),type(my_dt),type(new_list))
Another option, ravel(), also works:
c = np.array([[1,2,3],[4,5,6]])
c.ravel()
#>> array([1, 2, 3, 4, 5, 6])
# or
c.ravel().tolist()
#>> [1, 2, 3, 4, 5, 6]
The easiest way to convert an array to a list is using the numpy package:
import numpy as np
# 2D array to list (identifiers cannot start with a digit)
array_2d = np.array([[1,2,3],[8,9,10]])
list_2d = array_2d.tolist()
To check the data type, you can use the following:
type(object)
