I have an array a of shape (1800, 144), where the first half, a[0:900, :], holds real numbers and the second half, a[900:1800, :], is all zeros. I want to take the second half of the array and place it next to the first half horizontally, so that the new shape of a is (900, 288) and the array looks like this:
[[1,2,3,......,0,0,0],
[1,2,3,......,0,0,0],
...
]
if that makes sense.
When I try np.reshape(a, (900, 288)) it doesn't do what I want: it produces real numbers in a[0:450, :] and zeros in a[450:900, :]. I want all of the zeros appended along the second dimension, so that a[0:900, 0:144] is all real numbers and a[0:900, 144:288] is all zeros.
Is there an easy way to do this?
You can use numpy.hstack() to concatenate the two arrays:
import numpy as np
np.hstack([a[0:900, :], a[900:1800, :]])
If you'd like to split the array into more than two sub-arrays, you can combine np.split and np.hstack, as @HM14 has commented:
np.hstack(np.split(a, n)) # assuming len(a) % n == 0 here
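To see that this produces the layout asked for, here is a minimal sketch on a smaller array with the same structure (data in the top half, zeros in the bottom half):

```python
import numpy as np

# Smaller stand-in for the (1800, 144) array: 3 rows of data over 3 rows of zeros.
a = np.vstack([np.arange(1, 13).reshape(3, 4),   # top half: real numbers
               np.zeros((3, 4), dtype=int)])     # bottom half: zeros

# (6, 4) -> (3, 8): the bottom half moves next to the top half.
result = np.hstack([a[:3, :], a[3:, :]])
print(result.shape)   # (3, 8)
print(result[0])      # [1 2 3 4 0 0 0 0]
```

Each row now holds its data first and its zeros after, which is exactly what a plain reshape cannot do here.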
sorry, this is too big for a comment, so I will post it here.
If you have a long array and you need to split it and reassemble it, there are other methods that can accomplish this. This example shows how to split a sequence into equal-sized pieces and assemble them back into a single array.
>>> a = np.arange(100)
>>> b = np.split(a,10)
>>> c = np.c_[b]
>>> c
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
[80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
[90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
So you can split a sequence easily and reassemble it easily, and you can reorder the pieces when stacking if you want. Perhaps that is easier to show with this sequence:
>>> d = np.r_[b[5:], b[:5]].ravel()
>>> d
array([50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67,
68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85,
86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 0, 1, 2, 3,
4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,
22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39,
40, 41, 42, 43, 44, 45, 46, 47, 48, 49])
This example simply shows that you can take the last five split sequences and throw them into the front of the pile.
It shouldn't take long to see that if you have a series of values, even of unequal length, you can place them in a list and reassemble them using the np.c_ and np.r_ convenience functions (np.c_ normally expects equal-sized arrays).
So not a solution to your specific case perhaps, but some suggestions on how to reassemble samples in various ways.
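To illustrate the unequal-length case mentioned above, here is a minimal sketch: np.r_ happily concatenates 1-D pieces of different lengths back into one array.

```python
import numpy as np

# Pieces of unequal length, kept in a plain list.
pieces = [np.arange(3), np.arange(5), np.arange(2)]

# np.r_ concatenates them along the first axis into a single 1-D array.
flat = np.r_[tuple(pieces)]
print(flat)   # [0 1 2 0 1 2 3 4 0 1]
```

(np.concatenate(pieces) would do the same thing; np.r_ is just the index-trick spelling used in this answer.)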
I am using the following function to find the subsets of a list L. However, converting the output of powerset into a list takes far too long. Any suggestions?
For clarification: by design, this powerset function outputs neither the empty subset nor the full list L itself.
My list L:
L = [0, 3, 5, 6, 8, 9, 11, 13, 16, 18, 19, 20, 23, 25, 28, 29, 30, 32, 33, 35, 36, 38, 42, 43, 44, 45, 49, 50, 51, 53, 54, 56, 57, 62, 63, 64, 65, 66, 67, 71, 76, 78, 79, 81, 82, 84, 86, 87, 90, 92, 96, 97, 98, 100, 107]
The code:
def powerset(s):
    x = len(s)
    masks = [1 << i for i in range(x)]
    for i in range(1, (1 << x) - 1):
        yield [ss for mask, ss in zip(masks, s) if i & mask]
my_Subsets = list(powerset(L)) # <--- THIS TAKES WAY TOO LONG
Your list has 55 elements, meaning there are 2^55 = 36028797018963968 subsets (minus the two excluded ones).
There's no way, in any language, for any algorithm to make that fast, because each subset needs at least one allocation, and that single operation repeated 2^55 times will run practically forever. Even at one allocation per nanosecond (in reality it is orders of magnitude slower), you are looking at over a year, if my calculations are correct. In Python, probably closer to a century. :P
Not to mention that the final result is unlikely to fit in a single machine's storage (RAM plus hard drives), so the final list(...) conversion is guaranteed to fail even if you wait those years.
Whatever you are trying to achieve (this is likely an XY problem) you are doing it the wrong way.
What you could do is create a class that will behave like a list but would only compute the items as needed and not actually store them:
class Powerset:
    def __init__(self, base):
        self.base = base
    def __len__(self):
        return 2 ** len(self.base) - 2  # -2: you're excluding the empty and full sets
    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self.__getitem__(i) for i in range(len(self))[index]]
        else:
            return [ss for bit, ss in enumerate(self.base) if (1 << bit) & (index + 1)]
L = [0, 3, 5, 6, 8, 9, 11, 13, 16, 18, 19, 20, 23, 25, 28, 29, 30, 32, 33, 35, 36, 38, 42, 43, 44, 45, 49, 50, 51, 53, 54, 56, 57, 62, 63, 64, 65, 66, 67, 71, 76, 78, 79, 81, 82, 84, 86, 87, 90, 92, 96, 97, 98, 100, 107]
P = Powerset(L)
print(len(P)) # 36028797018963966
print(P[:10]) # [[0], [3], [0, 3], [5], [0, 5], [3, 5], [0, 3, 5], [6], [0, 6], [3, 6]]
print(P[3:6]) # [[5], [0, 5], [3, 5]]
print(P[-3:]) # [[5, 6, 8, 9, 11, 13, 16, 18, 19, 20, 23, 25, 28, 29, 30, 32, 33, 35, 36, 38, 42, 43, 44, 45, 49, 50, 51, 53, 54, 56, 57, 62, 63, 64, 65, 66, 67, 71, 76, 78, 79, 81, 82, 84, 86, 87, 90, 92, 96, 97, 98, 100, 107], [0, 5, 6, 8, 9, 11, 13, 16, 18, 19, 20, 23, 25, 28, 29, 30, 32, 33, 35, 36, 38, 42, 43, 44, 45, 49, 50, 51, 53, 54, 56, 57, 62, 63, 64, 65, 66, 67, 71, 76, 78, 79, 81, 82, 84, 86, 87, 90, 92, 96, 97, 98, 100, 107], [3, 5, 6, 8, 9, 11, 13, 16, 18, 19, 20, 23, 25, 28, 29, 30, 32, 33, 35, 36, 38, 42, 43, 44, 45, 49, 50, 51, 53, 54, 56, 57, 62, 63, 64, 65, 66, 67, 71, 76, 78, 79, 81, 82, 84, 86, 87, 90, 92, 96, 97, 98, 100, 107]]
Obviously, if the next thing you do is a sequential search or traversal of the powerset, it will still take forever.
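If you only ever need a handful of subsets at a time, you can also keep the original generator and pull items from it lazily, e.g. with itertools.islice. A sketch, reusing the powerset generator from the question on a short list:

```python
from itertools import islice

def powerset(s):
    # Same generator as in the question: yields all non-empty, non-full subsets.
    masks = [1 << i for i in range(len(s))]
    for i in range(1, (1 << len(s)) - 1):
        yield [ss for mask, ss in zip(masks, s) if i & mask]

L = [0, 3, 5, 6, 8]
# Materialize only the first four subsets; nothing else is ever computed.
first_four = list(islice(powerset(L), 4))
print(first_four)  # [[0], [3], [0, 3], [5]]
```

Unlike the Powerset class above this gives no random access, but it avoids ever building the full list.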
I am completely stuck. In my original dataframe I have one column of interest (fluorescence), and I want to take a fixed number of elements (3, highlighted yellow) at a fixed interval (every 5) and average them. The output should be saved into NewList.
fluorescence = df.iloc[1:20, 0]
fluorescence = pd.to_numeric(fluorescence)
## add a list to count
fluorescence['time'] = list(range(1, 20, 1))
## create a list with interval
interval = list(range(1, 20, 5))
NewList = []
for i in range(len(fluorescence)):
    if fluorescence['time'][i] == interval[i]:
        NewList.append(fluorescence[fluorescence.tail(3).mean()])
print(NewList)
Any input is welcome!!
Thank you in advance
Here, I'm taking a subset of the dataframe for every 5 consecutive rows and taking the mean of its last 3 rows:
import pandas as pd

fluorescence = pd.DataFrame([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])
NewList = []
j = 0
for i1 in range(4, len(fluorescence), 5):
    NewList.append(fluorescence.loc[j:i1, 0].tail(3).mean())
    j = i1
print(NewList)
If you have a list of data and you want to grab 3 entries out of every 5 you can segment your list as follows:
from statistics import mean
data = [63, 64, 43, 91, 44, 84, 14, 43, 87, 53, 81, 98, 34, 33, 60, 82, 86, 6, 81, 96, 99, 10, 76, 73, 63, 89, 70, 29, 32, 3, 98, 52, 37, 8, 2, 80, 50, 99, 71, 5, 7, 35, 56, 47, 40, 2, 8, 56, 69, 15, 76, 52, 24, 56, 89, 52, 30, 70, 68, 71, 17, 4, 39, 39, 85, 29, 18, 71, 92, 8, 1, 95, 52, 94, 71, 88, 59, 64, 100, 96, 65, 15, 89, 19, 63, 38, 50, 65, 52, 26, 46, 79, 85, 32, 12, 67, 35, 22, 54, 81]
new_data = []
for i in range(0, len(data), 5):
    every_five = data[i:i+5]
    three_out_of_five = every_five[2:5]
    new_data.append(mean(three_out_of_five))
print(new_data)
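The same "mean of the last 3 in every block of 5" can also be done in one vectorized step with NumPy. A sketch; the reshape assumes len(data) is a multiple of 5:

```python
import numpy as np

data = [63, 64, 43, 91, 44, 84, 14, 43, 87, 53]  # length must be a multiple of 5

# One row per block of 5, then average columns 2..4 (the last 3) of each row.
blocks = np.asarray(data, dtype=float).reshape(-1, 5)
means = blocks[:, 2:].mean(axis=1)
print(means)  # [59.333... 61.]
```

For ragged lengths the loop version above is simpler; this one trades that flexibility for speed on large arrays.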
I have a numpy array
test_array = np.arange(100).reshape((4,25))
and I want to merge the following cols to form a new array
1:3, 2:4, 3:15, 2:24, 6:8, 12:13
I know this code will work
np.hstack((test_array[:,1:3],test_array[:,2:4],test_array[:,3:15],test_array[:,2:24],test_array[:,6:8],test_array[:,12:13]))
But is there a better way that avoids repeating 'test_array' so many times? Something like:
np.hstack((test_array[:,[1:3 2:4 3:15 2:24 6:8 12:13]]))
You can use np.r_ to build the combined array of indices from your slices; it accepts multiple slices at once.
In [25]: test_array[:, np.r_[1:3, 2:4, 3:15, 2:24, 6:8, 12:13]]
Out[25]:
array([[ 1, 2, 2, 3, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 2,
3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
20, 21, 22, 23, 6, 7, 12],
[26, 27, 27, 28, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 27,
28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44,
45, 46, 47, 48, 31, 32, 37],
[51, 52, 52, 53, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 52,
53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69,
70, 71, 72, 73, 56, 57, 62],
[76, 77, 77, 78, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 77,
78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94,
95, 96, 97, 98, 81, 82, 87]])
Note that, as mentioned in the comments, using r_ is nicer to read and write but doesn't avoid copying data. That's because advanced indexing always returns a copy, unlike basic slicing, which returns views into the array.
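You can verify the copy-vs-view distinction yourself with np.shares_memory:

```python
import numpy as np

test_array = np.arange(100).reshape((4, 25))

basic = test_array[:, 1:3]          # basic slicing  -> a view into test_array
fancy = test_array[:, np.r_[1:3]]   # advanced indexing -> an independent copy

print(np.shares_memory(test_array, basic))  # True
print(np.shares_memory(test_array, fancy))  # False
```

So writing into `basic` would modify `test_array`, while writing into `fancy` would not.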
What is the most efficient and reliable way in Python to split sectors up like this:
number: 101 (may vary of course)
chunk1: 1 to 30
chunk2: 31 to 61
chunk3: 62 to 92
chunk4: 93 to 101
Flow:
copy sectors 1 to 30
skip sectors in chunk 1 and copy 30 sectors starting from sector 31.
and so on...
I have solved this in a "manual" way using modulo and basic math, but surely there's a function for this?
Thank you.
I assume your numbers are in a list. If you want a specific chunking of a number sequence and you know where it should be split, indexing is the best approach, since it has low time complexity. You can wrap it in a small function for reuse, something like this:
import numpy as np

def sectors(num_seq, chunk_size=30):
    n_sectors = int(np.ceil(len(num_seq) / float(chunk_size)))  # number of sectors
    for i in range(n_sectors):
        if i < (n_sectors - 1):
            print(num_seq[(chunk_size * i):(chunk_size * (i + 1))])  # equal-sized chunks
        else:
            print(num_seq[(chunk_size * i):])  # whatever is left at the end
Now you can reuse it whenever you need it, and it is efficient because it indexes the list directly instead of searching through it.
Here is the output:
x = list(range(1, 101))
sectors(x)
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
[31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60]
[61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90]
[91, 92, 93, 94, 95, 96, 97, 98, 99, 100]
Please let me know if this meets your requirement.
Easy and fast(single iteration):
>>> data = list(range(1, 102))
>>> n = 30
>>> output = [data[i:i+n] for i in range(0, len(data), n)]
>>> output
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30], [31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60], [61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90], [91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101]]
Another simple, compact way, as a lambda:
>>> f = lambda x, y: [x[i:i+y] for i in range(0, len(x), y)]
>>> f(list(range(1, 102)), 30)
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30], [31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60], [61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90], [91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101]]
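NumPy also ships a ready-made function for this: np.array_split accepts either a number of chunks or explicit split points, and tolerates a shorter final chunk. A sketch for fixed-size chunks of 30:

```python
import numpy as np

numbers = np.arange(1, 102)  # 1 .. 101

# Split before indices 30, 60, 90 -> chunks of 30, with the remainder last.
chunks = np.array_split(numbers, range(30, len(numbers), 30))
print([len(c) for c in chunks])  # [30, 30, 30, 11]
print(chunks[-1])                # [ 91  92 ... 101]
```

Unlike np.split, np.array_split does not require the pieces to be equal-sized.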
You can try numpy.histogram if you're looking to split a range of numbers into equal-sized bins (sectors).
It returns the bin counts together with an array of numbers demarcating each bin boundary:
import numpy as np
number = 101
values = np.arange(number, dtype=int)
counts, bin_edges = np.histogram(values, bins='auto')
print(bin_edges)
Is there an efficient Numpy mechanism to retrieve the integer indexes of locations in an array based on a condition is true as opposed to the Boolean mask array?
For example:
x = np.arange(100, 1, -1)
# generate a mask to find all values that are a power of 2
mask = (x & (x - 1)) == 0
# This will tell me those values
print(x[mask])
In this case, I'd like to know the indexes i of mask where mask[i]==True. Is it possible to generate these without looping?
Another option:
In [13]: numpy.where(mask)
Out[13]: (array([36, 68, 84, 92, 96, 98]),)
which is the same thing as numpy.where(mask==True).
You should be able to use numpy.nonzero() to find this information.
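For example, numpy.nonzero returns the same tuple of index arrays that np.where produces when called with a single boolean argument:

```python
import numpy as np

x = np.arange(100, 1, -1)
mask = (x & (x - 1)) == 0        # True where x is a power of two

print(np.nonzero(mask))          # (array([36, 68, 84, 92, 96, 98]),)
print(mask.nonzero()[0])         # [36 68 84 92 96 98]
```

The `[0]` unpacks the one-element tuple for a 1-D array; for an n-D array you get one index array per dimension.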
If you prefer the indexer way, you can convert your boolean mask to a numpy array:
print(x[np.array(mask)])
np.arange(100,1,-1)
array([100, 99, 98, 97, 96, 95, 94, 93, 92, 91, 90, 89, 88,
87, 86, 85, 84, 83, 82, 81, 80, 79, 78, 77, 76, 75,
74, 73, 72, 71, 70, 69, 68, 67, 66, 65, 64, 63, 62,
61, 60, 59, 58, 57, 56, 55, 54, 53, 52, 51, 50, 49,
48, 47, 46, 45, 44, 43, 42, 41, 40, 39, 38, 37, 36,
35, 34, 33, 32, 31, 30, 29, 28, 27, 26, 25, 24, 23,
22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11, 10,
9, 8, 7, 6, 5, 4, 3, 2])
x=np.arange(100,1,-1)
np.where(x&(x-1) == 0)
(array([36, 68, 84, 92, 96, 98]),)
Or, to get the values rather than the indices:
x[x&(x-1) == 0]