Keep a single element in dataframe of lists - python

Given the following dataframe:
Movement Distance Speed Delay Loss
0 [1, 1] [1, 1] [25, 25] [0, 0] [0, 0]
1 [1, 1] [1, 1] [25, 25] [0, 0] [0, 0]
2 [1, 1] [1, 1] [25, 25] [0, 0] [0, 0]
3 [1, 1] [1, 1] [25, 25] [0, 0] [0, 0]
4 [1, 1] [1, 1] [25, 25] [0, 0] [0, 0]
How can I remove all but the first element in each column and then unlist so the dataframe becomes like this:
Movement Distance Speed Delay Loss
0 1 1 25 0 0
1 1 1 25 0 0
2 1 1 25 0 0
3 1 1 25 0 0
4 1 1 25 0 0

You can use apply with .str[0] indexing (equivalently str.slice) and coerce back to numbers:
df.apply(lambda x: pd.to_numeric(x.str[0], downcast='integer', errors='ignore'))
Or, if the data is already clean, there is convert_dtypes, new in pandas 1.0 (thanks cs95):
df.apply(lambda x: x.str[0]).convert_dtypes()
Movement Distance Speed Delay Loss
0 1 1 25 0 0
1 1 1 25 0 0
2 1 1 25 0 0
3 1 1 25 0 0
4 1 1 25 0 0
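Put together, a runnable sketch of the second approach, rebuilding the example frame inline:

```python
import pandas as pd

# Rebuild the example frame of lists.
df = pd.DataFrame({
    "Movement": [[1, 1]] * 5,
    "Distance": [[1, 1]] * 5,
    "Speed":    [[25, 25]] * 5,
    "Delay":    [[0, 0]] * 5,
    "Loss":     [[0, 0]] * 5,
})

# .str[0] also works element-wise on lists, so this keeps only
# the first element of each list; convert_dtypes fixes the dtypes.
out = df.apply(lambda x: x.str[0]).convert_dtypes()
print(out)
```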

Related

Is it possible to switch rows or columns between numpy arrays?

I have the following numpy arrays
[[[0 0 1 0 0]
[1 0 0 0 0]
[0 0 1 0 0]]
[[1 0 0 0 0]
[0 0 1 0 0]
[0 0 0 1 0]]]
I am trying to switch rows between them; one row or two rows, it doesn't matter, I am just trying to see if it is possible.
The output can be, for example, the third rows, the second rows, or the first two rows swapped, respectively (swapped rows marked with x):
[[[0 0 1 0 0]
  [1 0 0 0 0]
  [0 0 0 1 0]]x
 [[1 0 0 0 0]
  [0 0 1 0 0]
  [0 0 1 0 0]]]x

[[[0 0 1 0 0]
  [0 0 1 0 0]x
  [0 0 1 0 0]]
 [[1 0 0 0 0]
  [1 0 0 0 0]x
  [0 0 0 1 0]]]

[[[1 0 0 0 0]x
  [0 0 1 0 0]x
  [0 0 1 0 0]]
 [[0 0 1 0 0]x
  [1 0 0 0 0]x
  [0 0 0 1 0]]]
Is it possible? If so How?
You can switch rows (or columns) between NumPy arrays with Python's swap assignment, as long as you copy the right-hand sides:
import numpy as np
m = np.array([[0, 0, 1, 0, 0],
              [1, 0, 0, 0, 0],
              [0, 0, 1, 0, 0]])
n = np.array([[1, 0, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 1, 0]])
# m[:, 0], n[:, 0] = n[:, 0].copy(), m[:, 0].copy()  # only for columns
m[0], n[0] = n[0].copy(), m[0].copy()  # for rows
print(m, n)
Output:
[[1 0 0 0 0]
[1 0 0 0 0]
[0 0 1 0 0]]
[[0 0 1 0 0]
[0 0 1 0 0]
[0 0 0 1 0]]
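The same pattern extends to slices, so two rows can be swapped in one statement; a minimal sketch using the same m and n as above:

```python
import numpy as np

m = np.array([[0, 0, 1, 0, 0],
              [1, 0, 0, 0, 0],
              [0, 0, 1, 0, 0]])
n = np.array([[1, 0, 0, 0, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 1, 0]])

# Swap the first two rows of both arrays in one assignment.
# The .copy() calls matter: without them the right-hand sides
# are views that mutate while being assigned.
m[:2], n[:2] = n[:2].copy(), m[:2].copy()
print(m)
print(n)
```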

Numpy way to integer-mask an array

I have a multi-class segmentation mask
eg.
[1 1 1 2 2 2 2 3 3 3 3 3 3 2 2 2 2 4 4 4 4 4 4]
and I need to get a binary segmentation mask for each value,
i.e.
[1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 1 1 1 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0]
[0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1]
Any elegant numpy way to do this?
Ideally an example, where I can set 0 and 1 to other values, if I have to.
Just use an == comparison, like this:
import numpy as np
a = np.array([1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 4, 4, 4, 4, 4, 4])
mask1 = (a == 1) * 5
mask2 = (a == 2) * 5
mask3 = (a == 3) * 5
mask4 = (a == 4) * 5
for mask in [mask1, mask2, mask3, mask4]:
    print(mask)
This gives
[5 5 5 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 5 5 5 5 0 0 0 0 0 0 5 5 5 5 0 0 0 0 0 0]
[0 0 0 0 0 0 0 5 5 5 5 5 5 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 5 5 5 5 5]
You can manipulate the masks further in the same manner, e.g.
mask1[mask1==0] = 3
Native python approach:
You can use a dict comprehension: for each unique value from set(<sequence>), compare it against every element and convert the booleans to int to get 0/1 values.
>>> ls =[1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 4, 4, 4, 4, 4, 4]
>>> {v:[int(v==i) for i in ls] for v in set(ls)}
{1: [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
2: [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
3: [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
4: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]}
Numpy approach:
Get the unique values of the list with np.unique, expand the dims and transpose so they form a column vector, then expand the dims of the list as well and repeat it n times (where n is the number of unique values), and finally do the equality comparison and cast to integer type:
import numpy as np
ls = [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 4, 4, 4, 4, 4, 4]
uniques = np.expand_dims(np.unique(ls), 0).T
result = (np.repeat(np.expand_dims(ls, 0), uniques.shape[0], 0)==uniques).astype(int)
OUTPUT:
print(result)
[[1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
[0 0 0 1 1 1 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0]
[0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1]]
You can build the mask using np.arange and .repeat(), and then use broadcasting and the == operator to generate the arrays:
a = np.array([1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2, 4, 4, 4, 4, 4, 4])
mask = np.arange(a.min(), a.max()+1).repeat(a.size).reshape(-1, a.size)
a_masked = (a == mask).astype(int)
print(a_masked.shape)  # (4, 23)
print(a_masked)
# [[1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 1 1 1 1 0 0 0 0 0 0 1 1 1 1 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1]]
Setting 0 and 1 to other values can be done via normal indexing:
a_masked[a_masked == 0] = 7
a_masked[a_masked == 1] = 42
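Broadcasting against np.unique gives the same result without materializing the repeated mask; a minimal sketch of that variant:

```python
import numpy as np

a = np.array([1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3,
              2, 2, 2, 2, 4, 4, 4, 4, 4, 4])

# Compare the 1-D array against a column of its unique values;
# broadcasting yields one 0/1 row per unique value.
masks = (a == np.unique(a)[:, None]).astype(int)
print(masks.shape)  # (4, 23)
```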

Iteration over a Pandas DataFrame to extract data

I have a DataFrame that contains hour intervals in the columns, and employee ID's in rows.
I want to iterate over each column (hourly interval) and extract it to a list ONLY if the column contains the number 1 (1 means they are available in that hour, 0 means they are not).
I've tried iterrows() and iteritems() and neither are giving me what I want to see from this DataFrame
Which is a new list called
available = [0800, 0900, 1000, 1100]
Which I can then extract the min and max values to create a schedule.
Apologies if this is somewhat vague; I'm pretty new to Python 3 and Pandas.
You don't need to iterate.
Suppose you have a dataframe like this:
0 1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 1 0 1 1 0
1 1 0 1 1 1 1 1 1 0 1
2 1 1 1 0 0 0 0 0 0 0
3 0 1 1 0 1 1 0 0 1 1
4 1 0 1 0 1 0 1 0 0 0
5 0 1 1 0 0 0 0 0 0 0
6 1 0 0 0 1 1 1 1 0 0
7 0 1 0 1 0 1 1 1 1 1
8 0 0 1 0 1 1 1 0 0 0
9 1 0 0 1 0 0 1 1 1 1
You can just use this code to get the column names of all the columns where value is 1
df['available'] = df.apply(lambda row: row[row == 1].index.tolist(), axis=1)
0 1 2 3 4 5 6 7 8 9 available
0 0 0 0 0 0 1 0 1 1 0 [5, 7, 8]
1 1 0 1 1 1 1 1 1 0 1 [0, 2, 3, 4, 5, 6, 7, 9]
2 1 1 1 0 0 0 0 0 0 0 [0, 1, 2]
3 0 1 1 0 1 1 0 0 1 1 [1, 2, 4, 5, 8, 9]
4 1 0 1 0 1 0 1 0 0 0 [0, 2, 4, 6]
5 0 1 1 0 0 0 0 0 0 0 [1, 2]
6 1 0 0 0 1 1 1 1 0 0 [0, 4, 5, 6, 7]
7 0 1 0 1 0 1 1 1 1 1 [1, 3, 5, 6, 7, 8, 9]
8 0 0 1 0 1 1 1 0 0 0 [2, 4, 5, 6]
9 1 0 0 1 0 0 1 1 1 1 [0, 3, 6, 7, 8, 9]
And if you want min/max from this you can use
df['min_max'] = df['available'].apply(lambda x: (min(x), max(x)))
available min_max
0 [5, 7, 8] (5, 8)
1 [0, 2, 3, 4, 5, 6, 7, 9] (0, 9)
2 [0, 1, 2] (0, 2)
3 [1, 2, 4, 5, 8, 9] (1, 9)
4 [0, 2, 4, 6] (0, 6)
5 [1, 2] (1, 2)
6 [0, 4, 5, 6, 7] (0, 7)
7 [1, 3, 5, 6, 7, 8, 9] (1, 9)
8 [2, 4, 5, 6] (2, 6)
9 [0, 3, 6, 7, 8, 9] (0, 9)
You can simply do
available = df.columns[df.T.any(axis=1)].tolist()
In general it is not advisable to iterate over Pandas DataFrames unless they are small, as AFAIK this does not use vectorized functions and is thus slower.
Can you show the rest of your code?
Assuming only 0s and 1s are in the dataframe, the following conditional selection should work (if I'm correctly interpreting what you want; it seems more likely that you want what Shubham Periwal posted):
filtered_df = df[df != 0]
lists = filtered_df.values.tolist()
Or in 1 line:
lists = df[df != 0].values.tolist()
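If the columns are hour labels like '0800', the available list for one employee can be read straight off that row; a minimal sketch with made-up column names and employee IDs:

```python
import pandas as pd

# Hypothetical schedule: columns are hour labels, rows are employees.
df = pd.DataFrame(
    {"0800": [1, 0], "0900": [1, 1], "1000": [1, 0], "1100": [1, 1]},
    index=["emp1", "emp2"],
)

# Column labels where this employee's row equals 1.
row = df.loc["emp1"]
available = row.index[row == 1].tolist()
print(available)            # ['0800', '0900', '1000', '1100']
print(min(available), max(available))
```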

Is it possible to turn a 3D array to coordinate system?

Is it possible to take a 3D array and turn it into a coordinate system? My array consists of 0s and 1s. If the value is 1, I want to take the xyz coordinate. In the end I want to output all coordinates to a csv file.
import nibabel as nib
coord = []
img = nib.load('test.nii').get_fdata().astype(int)
test.nii array:
[[[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 1 ... 1 1 0]
...
[0 0 0 ... 0 0 0]
[0 0 0 ... 1 1 1]
[0 1 0 ... 0 0 0]]
[[1 0 0 ... 0 0 0]
[0 0 1 ... 0 0 0]
[0 1 0 ... 0 0 0]
...
[0 1 0 ... 0 0 0]
[0 1 0 ... 0 0 0]
[0 0 0 ... 1 0 0]]
[[0 0 0 ... 0 0 0]
[0 0 0 ... 0 1 0]
[0 0 0 ... 0 0 0]
...
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 1 0 ... 0 1 1]]
...
[[0 0 0 ... 1 0 0]
[0 0 1 ... 0 0 0]
[0 0 1 ... 0 0 0]
...
[0 0 0 ... 1 0 0]
[0 0 0 ... 1 0 0]
[0 0 0 ... 1 0 0]]
[[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 1]
...
[0 1 0 ... 0 0 0]
[1 0 0 ... 0 0 0]
[1 0 0 ... 0 0 0]]
[[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 0 0]
...
[0 0 0 ... 0 0 0]
[0 0 0 ... 0 1 0]
[0 1 0 ... 0 0 0]]]
That might not necessarily be the best solution, but let's keep it simple (it would be great if the framework did that for us, but... well):
data = [[[0, 0, 0, 0, 0, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 1, 1, 1, 0],
         [0, 0, 0, 0, 0, 0],
         [0, 0, 0, 1, 1, 1],
         [0, 1, 0, 0, 0, 0]],
        [[1, 0, 0, 0, 0, 0],
         [0, 0, 1, 0, 0, 0],
         [0, 1, 0, 0, 0, 0],
         [0, 1, 0, 0, 0, 0],
         [0, 1, 0, 0, 0, 0],
         [0, 0, 0, 1, 0, 0]]]
for x in range(len(data)):
    for y in range(len(data[x])):
        for z in range(len(data[x][y])):
            if data[x][y][z] == 1:
                print(f"{x} {y} {z}")
yields:
0 2 2
0 2 3
0 2 4
0 4 3
0 4 4
0 4 5
0 5 1
1 0 0
1 1 2
1 2 1
1 3 1
1 4 1
1 5 3
Using np.where() you can get the row, column and depth indices of elements that satisfy your condition.
Try this:
row_idx, col_idx, depth_idx = np.where(img==1)
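To get the coordinates straight into a CSV file (as the question asks), np.argwhere plus np.savetxt is a compact option; a sketch with a small stand-in volume and a made-up filename:

```python
import numpy as np

# Small stand-in for the loaded image volume.
img = np.zeros((2, 3, 4), dtype=int)
img[0, 1, 2] = 1
img[1, 2, 3] = 1

# argwhere returns one (x, y, z) row per matching element.
coords = np.argwhere(img == 1)
np.savetxt("coords.csv", coords, fmt="%d", delimiter=",")
print(coords)  # [[0 1 2]
               #  [1 2 3]]
```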

python Input column values from lists

Consider I have the following data.
import pandas as pd
age = [[1,2,3],[2,1],[4,2,3,1],[2,1,3]]
frame = {'age': age }
result = pd.DataFrame(frame)
ver=pd.DataFrame(result.age.values.tolist(), index= result.index)
listado=pd.unique(ver.values.ravel('K'))
cleanedList = [x for x in listado if str(x) != 'nan']
for col in cleanedList:
    result[col] = 0
#Return values
age 1.0 2.0 4.0 3.0
[1, 2, 3] 0 0 0 0
[2, 1] 0 0 0 0
[4, 2, 3, 1] 0 0 0 0
[2, 1, 3] 0 0 0 0
How can I impute 1 in the columns corresponding to each list in the age column, so the final output becomes:
age 1.0 2.0 4.0 3.0
[1, 2, 3] 1 1 0 1
[2, 1] 1 1 0 0
[4, 2, 3, 1] 1 1 1 1
[2, 1, 3] 1 1 1 0
Consider that the amount of elements in the age column is dynamic (as an example I put 4 numbers, but in reality they can be many more).
Check with sklearn's MultiLabelBinarizer:
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
s = pd.DataFrame(mlb.fit_transform(result['age']),
                 columns=mlb.classes_, index=result.index)
s
1 2 3 4
0 1 1 1 0
1 1 1 0 0
2 1 1 1 1
3 1 1 1 0
#df = df.join(s)
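A pure-pandas alternative, if you'd rather not pull in sklearn, is to explode the lists and collapse the dummies back per row; a sketch assuming the same result dataframe:

```python
import pandas as pd

age = [[1, 2, 3], [2, 1], [4, 2, 3, 1], [2, 1, 3]]
result = pd.DataFrame({"age": age})

# One row per list element, then indicator columns,
# then collapse back to one 0/1 row per original index.
s = (pd.get_dummies(result["age"].explode())
       .groupby(level=0).max()
       .astype(int))
print(s)
```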
