I have a two-dimensional NumPy array and I need to filter it efficiently by the values given in a list.
import numpy as np

b = np.array([['a', 'b', 'c', 'd'], ['b', 'a', 'c', 'd'], ['c', 'b', 'a', 'd'], ['a', 'd', 'c', 'b']])
values_to_stay_in_b = ['a', 'b']
I found a solution using set difference, but the positions of the elements in array b matter, so that doesn't work.
Is there a better solution than a simple list comprehension, as below?
output = []
for l in b:
    output.append([a for a in l if a in values_to_stay_in_b])
np.array(output)
Result:
array([['a', 'b'],
       ['b', 'a'],
       ['b', 'a'],
       ['a', 'b']], dtype='<U1')
Using pure NumPy, so at least removing the for loops:
For each entry in b, compare it with each entry in values_to_stay_in_b to get a boolean mask. This is done by adding an extra axis and comparing using broadcasting.
Any one of those per-value comparisons needs to be true, hence the .any().
Since you clarified that after filtering each row has the same number of columns, I reshape the result based on the number of rows:
b[(b[..., None] == values_to_stay_in_b).any(axis=2)].reshape(b.shape[0], -1)
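For clarity, here is the same one-liner broken into steps (a sketch of the broadcasting described above; the intermediate names comparison, mask and result are my own):
comparison = b[..., None] == values_to_stay_in_b  # shape (4, 4, 2): every element vs every value
mask = comparison.any(axis=2)                     # shape (4, 4): True where the element should stay
result = b[mask].reshape(b.shape[0], -1)          # boolean indexing flattens, so restore the rows
result
# array([['a', 'b'],
#        ['b', 'a'],
#        ['b', 'a'],
#        ['a', 'b']], dtype='<U1')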
I have a df with columns a-h, and I wish to create a list of these column values, but in the order given by another list (list1). The values in list1 correspond to column positions in df.
df
a b c d e f g h
list1
[3,1,0,5,2,7,4,6]
Desired list
['d', 'b', 'a', 'f', 'c', 'h', 'e', 'g']
You can just do df.columns[list1]:
import pandas as pd
df = pd.DataFrame([], columns=list('abcdefgh'))
list1 = [3,1,0,5,2,7,4,6]
print(df.columns[list1])
# Index(['d', 'b', 'a', 'f', 'c', 'h', 'e', 'g'], dtype='object')
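If you need a plain Python list rather than an Index, .tolist() converts it:
print(df.columns[list1].tolist())
# ['d', 'b', 'a', 'f', 'c', 'h', 'e', 'g']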
First get an np.array of the letters:
import numpy as np

arr = np.array(list('abcdefgh'))
Or in your case, a list of your df columns
arr = np.array(df.columns)
Then use your indices as an index array:
arr[[3,1,0]]
Out:
array(['d', 'b', 'a'], dtype='<U1')
Check
df.columns.to_series()[list1].tolist()
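Since that series is labeled by the column names, the integer lookup above is positional; .iloc spells that out explicitly (an equivalent, slightly more verbose form):
df.columns.to_series().iloc[list1].tolist()
# ['d', 'b', 'a', 'f', 'c', 'h', 'e', 'g']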
I have seen similar questions to mine, but nothing I researched really fixed my issue.
So, basically I want to split a list, in order to remove some items and concatenate it back. Those items correspond to indexes that are given by a list of tuples.
import numpy as np
arr = ['x','y','z','a','b','c','d','e','f','g',2,3,4]
indices = [(2,4),(7,9)] #INDEXES THAT NEED TO BE CUT OUT
print([arr[0:s] + arr[s+1:e] for s, e in indices])
#Returns: [['x', 'y', 'a'], ['x', 'y', 'z', 'a', 'b', 'c', 'd', 'f']]
This code, which I adapted from one of the answers in this post, nearly does what I need, but it loops over the tuples in indices independently, so it produces two partial lists instead of one, and it doesn't include the rest of the list.
I want my final list to go from index zero up to the first item of the first tuple, and so on, using a for loop or some iterator.
Something like this,
final_arr = arr[0:indices[0][0]] + arr[indices[0][1]:indices[1][0]] + arr[indices[1][1]:]
#Returns: ['x', 'y', 'a', 'b', 'c', 'f', 'g', 2, 3, 4]
If someone could do it using for loops, it would be easier for me to see how you understand the problem; after that I can try to adapt it to shorter code.
Sort the indices using sorted and del the slices. You need reverse=True, otherwise deleting an earlier slice shifts the later slices and their indices become incorrect.
for x, y in sorted(indices, reverse=True):
    del arr[x:y]
print(arr)
>>> ['x', 'y', 'b', 'c', 'd', 'g', 2, 3, 4]
This is the same result as you get with
print(arr[0:indices[0][0]] + arr[indices[0][1]:indices[1][0]] + arr[indices[1][1]:])
>>> ['x', 'y', 'b', 'c', 'd', 'g', 2, 3, 4]
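For any number of tuples, the same slice-and-concatenate idea can be written with a loop; a minimal sketch assuming indices is sorted and non-overlapping (prev is a name I introduce to track where the previous cut ended):
arr = ['x','y','z','a','b','c','d','e','f','g',2,3,4]
indices = [(2,4),(7,9)]

final_arr = []
prev = 0
for start, end in indices:
    final_arr += arr[prev:start]  # keep everything before this cut
    prev = end                    # skip over the cut-out slice
final_arr += arr[prev:]           # keep the tail after the last cut
print(final_arr)
# ['x', 'y', 'b', 'c', 'd', 'g', 2, 3, 4]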
arr = ['x','y','z','a','b','c','d','e','f','g',2,3,4]
indices = [(2,4),(7,9)] #INDEXES THAT NEED TO BE CUT OUT
import itertools

# collect every index covered by the (start, end) ranges
ignore = set(itertools.chain.from_iterable(range(s, e) for s, e in indices))
# keep only the elements whose index is not cut out
out = [c for idx, c in enumerate(arr) if idx not in ignore]
print(out)
print(arr[0:indices[0][0]] + arr[indices[0][1]:indices[1][0]] + arr[indices[1][1]:])
Output:
['x', 'y', 'b', 'c', 'd', 'g', 2, 3, 4]
['x', 'y', 'b', 'c', 'd', 'g', 2, 3, 4]
Like this:
arr = ['x','y','z','a','b','c','d','e','f','g',2,3,4]
indices = [(2,4),(7,9)] #INDEXES THAT NEED TO BE CUT OUT
print([v for t in indices for i, v in enumerate(arr) if i not in range(t[0], t[1])])
Output:
['x', 'y', 'b', 'c', 'd', 'e', 'f', 'g', 2, 3, 4, 'x', 'y', 'z', 'a', 'b', 'c', 'd', 'g', 2, 3, 4]
Note that this iterates arr once per tuple and concatenates the filtered copies, so it does not produce a single combined list.
1- If you can remove the list items:
I'm using the example from JimithyPicker. I changed the index list to single items to remove, because every time one index is removed the size of the list changes.
arr = ['x','y','z','a','b','c','d','e','f','g',2,3,4]
indices = [2,5,5] #INDEXES THAT NEED TO BE CUT OUT
for index in indices:
    arr.pop(index)
final_arr = [arr]
print(final_arr)
Output:
[['x', 'y', 'a', 'b', 'c', 'f', 'g', 2, 3, 4]]
2- If you can't remove items:
In this case it is necessary to change the second index of each tuple: the numbers don't match the output you want.
With indices = [(2,4),(7,9)] the output is: ['x', 'y', 'a', 'b', 'c', 'd', 'f', 'g', 2, 3, 4]
arr = ['x','y','z','a','b','c','d','e','f','g',2,3,4]
indices = [(2,4),(6,9)] #INDEXES THAT NEED TO BE CUT OUT
final_arr = arr[0:indices[0][0]] + arr[indices[0][1]-1:indices[1][0]] + arr[indices[1][1]-1:]
print(final_arr)
Output:
['x','y','a','b','c','f','g',2,3,4]
Is there any pandas method to unfactor a dataframe column? I could not find any in the documentation, but was expecting something similar to unfactor in R language.
I managed to come up with the following code for reconstructing the column (assuming none of the column values are missing), by using the labels array values as indices into uniques.
import numpy as np
import pandas as pd

orig_col = ['b', 'b', 'a', 'c', 'b']
labels, uniques = pd.factorize(orig_col)
recon_col = np.array([uniques[label] for label in labels]).tolist()
orig_col == recon_col
# True
orig_col = ['b', 'b', 'a', 'c', 'b']
labels, uniques = pd.factorize(orig_col)
# To get original list back
uniques[labels]
# array(['b', 'b', 'a', 'c', 'b'], dtype=object)
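Equivalently, since the labels are positional codes into uniques, take performs the same reconstruction (this is the uniques.take(codes) round trip described in the pandas factorize docs):
uniques.take(labels)
# array(['b', 'b', 'a', 'c', 'b'], dtype=object)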
Yes, we can do it via np.vectorize after creating a dict:
np.vectorize(dict(zip(range(len(uniques)),uniques)).get)(labels)
array(['b', 'b', 'a', 'c', 'b'], dtype='<U1')
Say I have 3 different items: A, B, and C. I want to create a combined list containing NA copies of A, NB copies of B, and NC copies of C in random order. So the result should look like this:
finalList = [A, C, A, A, B, C, A, C,...]
Is there a clean, Pythonic way to do this using np.random.rand? If not, any other packages besides numpy?
I don't think you need numpy for that. You can use the random builtin package:
import random
na = nb = nc = 5
l = ['A'] * na + ['B'] * nb + ['C'] * nc
random.shuffle(l)
list l will look something like:
['A', 'C', 'A', 'B', 'C', 'A', 'C', 'B', 'B', 'B', 'A', 'C', 'B', 'C', 'A']
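If you would rather not shuffle in place, random.sample with k equal to the full length returns a shuffled copy and leaves l unchanged:
shuffled = random.sample(l, k=len(l))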
You can define a list of tuples, each containing a character and its desired frequency. Then create a list where each element is repeated with the specified frequency, and finally shuffle it using random.shuffle:
>>> import random
>>> l = [('A',3),('B',5),('C',10)]
>>> a = [val for val, freq in l for i in range(freq)]
>>> random.shuffle(a)
>>> a
['A', 'B', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'A', 'C', 'B', 'C']
Yes, this is very much possible (and simple) with numpy. You'll have to create an array with your unique elements, repeat each element a specified number of times using np.repeat (using an axis argument makes this possible), and then shuffle with np.random.shuffle.
Here's an example with NA as 1, NB as 2, and NC as 3.
import numpy as np

a = np.array([['A', 'B', 'C']]).repeat([1, 2, 3], axis=1).squeeze()
np.random.shuffle(a)
print(a)
# ['B' 'C' 'A' 'C' 'B' 'C']
Note that when you have a large number of unique elements to repeat, specifying an array of unique elements and repeat counts in numpy is simpler than a pure Python implementation.
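As a side note (my own addition, not part of the answer above): np.repeat also accepts per-element counts directly on a 1-D array, which avoids the extra axis and the squeeze:
import numpy as np

a = np.repeat(['A', 'B', 'C'], [1, 2, 3])  # array(['A', 'B', 'B', 'C', 'C', 'C'], dtype='<U1')
np.random.shuffle(a)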