obtain values in dataframe instead of boolean - python

I have a dataframe of calculated distances, built as follows:
import pandas as pd
import numpy as np

x_y_data = pd.read_csv("x_y_points400_labeled_20pnts_csv.csv")
x = x_y_data.loc[:, 'x']
y = x_y_data.loc[:, 'y']
xs = x.to_numpy()
ys = y.to_numpy()
result = pd.DataFrame(np.sqrt((xs[:, None] - xs)**2 + (ys[:, None] - ys)**2))
I get the results for all distances:
0 1 2 ... 10 11 12
0 0.000000 16.132750 33.039985 ... 17.628989 27.273213 20.898938
1 16.132750 0.000000 16.912458 ... 16.658800 17.480346 25.375308
2 33.039985 16.912458 0.000000 ... 27.985766 19.625398 37.343842
3 10.140420 25.301309 41.896450 ... 20.173079 32.241763 18.523634
4 9.368331 9.228014 25.210365 ... 10.518585 18.039020 17.464249
Now, when I want to obtain only the values of the dataframe that are less than 12
(by simply adding result2 = result < 12), I obtain a table of booleans,
result2:
0 1 2 3 4 ... 8 9 10 11 12
0 True False False True True ... False False False False False
1 False True False False True ... False False False False False
2 False False True False False ... True False False False False
3 True False False True False ... False True False False False
4 True True False False True ... False False True False False
whereas I want just the values that are less than 12 and not equal to zero. Can you please help?

Please try:
result[result < 12].fillna('Morethan12')
or
result[result < 12].unstack().fillna('Morethan12')

Just apply a condition that keeps the values where 0 < result < 12, as sketched below.
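A minimal sketch of that condition (assuming you want the surviving distances as a flat (row, column) -> value listing rather than a full matrix):
result2 = result[(result > 0) & (result < 12)]   # everything outside (0, 12) becomes NaN
pairs = result2.stack()                          # Series of (row, column) -> distance; the NaNs are dropped
# depending on your pandas version you may need an explicit pairs.dropna()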

Related

How to satisfy the condition of 2 columns different rows at the same time

My logic is like this:
if the cond2 column is True somewhere before the expected row, and the cond1 column is True somewhere before that cond2, then the expected column can be True.
Input:
import pandas as pd
import numpy as np
d={'cond1':[False,False,True,False,False,False,False,True,False,False],'cond2':[False,True,False,True,True,False,False,False,True,False]}
df = pd.DataFrame(d)
Expected result table:
cond1 cond2 expected
0 FALSE FALSE
1 FALSE TRUE
2 TRUE FALSE
3 FALSE TRUE
4 FALSE TRUE
5 FALSE FALSE TRUE
6 FALSE FALSE TRUE
7 TRUE FALSE
8 FALSE TRUE
9 FALSE FALSE TRUE
I have an idea:
get the number of rows from the row where cond1 is True up to the present row, and then use cumsum to check whether the number of rows where cond2 is True in that span is greater than 0.
But how do I get the number of rows from the row where cond1 is True up to the present row?
The description is not fully clear. It looks like you need a cummax per group starting with True in cond1:
# each True in cond1 starts a new group; within each group, carry any True in cond2 forward (cummax)
m = df.groupby(df['cond1'].cumsum())['cond2'].cummax()
# expected is True on the rows where cond2 disagrees with that carried-forward mask
df['expected'] = df['cond2'].ne(m)
Output:
cond1 cond2 expected
0 False False False
1 False True False
2 True False False
3 False True False
4 False True False
5 False False True
6 False False True
7 True False False
8 False True False
9 False False True
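For illustration, the intermediate values under the sample df (a walk-through of the same code, not part of the original answer):
df['cond1'].cumsum().tolist()
# [0, 0, 1, 1, 1, 1, 1, 2, 2, 2]   -> a new group starts at each True in cond1
m.tolist()
# [False, True, False, True, True, True, True, False, True, True]
# 'expected' is True exactly where cond2 (False on those rows) differs from m: rows 5, 6 and 9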
It's not very clear what you're looking for~
# idxmax() gives the index of the first True in each column; .max() takes the later of the two.
# Rows strictly after that point where neither column is True get expected = True.
df['expected'] = ((df.index > df.idxmax().max())
                  & ~df.any(axis=1))
# Output:
cond1 cond2 expected
0 False False False
1 False True False
2 True False False
3 False True False
4 False True False
5 False False True
6 False False True
7 True False False
8 False True False
9 False False True

Selecting Dataframe row from a condition to another condition

I have a dataframe with two columns:
A B
0 False False
1 False False
2 False False
3 True False
4 False False
5 False False
6 False True
7 False False
8 False False
9 False False
10 True False
11 False False
12 False False
I would like to create a new column "C" with Boolean values that turns on (=True) each time B turns on and turns off each time A turns on (e.g. here between index 6 and index 10).
For example, for this df, the output will be:
A B C
0 False False False
1 False False False
2 False False False
3 True False False
4 False False False
5 False False False
6 False True True
7 False False True
8 False False True
9 False False True
10 True False True
11 False False False
12 False False False
I wrote this code with a for loop and a "switch", but I'm pretty sure there is a faster and easier solution to do the same thing for large dataframes. I appreciate your help.
import pandas as pd
import numpy as np
df = pd.DataFrame({
    'A': [False,False,False,True,False,False,False,False,False,False,True,False,False],
    'B': [False,False,False,False,False,False,True,False,False,False,False,False,False]
})
df["C"] = 0
switch = False
for i in df.index:
    if df.B.iloc[i]:
        switch = True
    if switch:
        df.loc[i, "C"] = True
    else:
        df.loc[i, "C"] = False
    if df.A.iloc[i]:
        switch = False
print(df)
Alternative approach using ffill:
df.loc[df['A'], 'C'] = False               # A turning on switches C off
df.loc[df['B'], 'C'] = True                # B turning on switches C on
df['C'] = df['C'].ffill().fillna(False)    # carry the last state forward; start "off"
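A small follow-up (an assumption about the resulting dtype, not part of the original answer): the column built this way may not come out with a clean boolean dtype, since it mixes NaN with the assigned values before the fill, so you may want to cast it afterwards:
df['C'] = df['C'].astype(bool)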
Combine the two columns, subtract 1, filter out negative and even numbers:
x = (df['A'] | df['B']).cumsum().sub(1)
df['C'] = (x >= 0) & (x % 2 == 1)
Output:
>>> df
A B C
0 False False False
1 False False False
2 False False False
3 True False False
4 False False False
5 False False False
6 False True True <
7 False False True <
8 False False True <
9 False False True <
10 True False False
11 False False False
12 False False False
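To see why the parity trick works, these are the intermediate values for this df (an illustrative breakdown, not part of the original answer):
events = df['A'] | df['B']        # True on every row where either switch fires: rows 3, 6 and 10
x = events.cumsum().sub(1)
# x: [-1, -1, -1, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2]
# the non-negative odd stretch (x == 1, rows 6-9) is exactly the span after a B and before the next A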

Is there a way to select interior True values for portions of a DataFrame?

I have a DataFrame that looks like the following:
df = pd.DataFrame({'a':[True]*5+[False]*5+[True]*5,'b':[False]+[True]*3+[False]+[True]*5+[False]*4+[True]})
a b
0 True False
1 True True
2 True True
3 True True
4 True False
5 False True
6 False True
7 False True
8 False True
9 False True
10 True False
11 True False
12 True False
13 True False
14 True True
How can I select blocks where column a is True only when the interior values over the same rows for column b are True?
I know that I could break apart the DataFrame into consecutive True regions and apply a function to each DataFrame chunk, but this is for a much larger problem with 10 million+ rows, and I don't think such a solution would scale up very well.
My expected output would be the following:
a b c
0 True False True
1 True True True
2 True True True
3 True True True
4 True False True
5 False True False
6 False True False
7 False True False
8 False True False
9 False True False
10 True False False
11 True False False
12 True False False
13 True False False
14 True True False
You can do a groupby on the a values and then look at the b values in a function, like this:
groupby_consec_a = df.groupby(df.a.diff().ne(0).cumsum())
all_interior = lambda x: x.iloc[1:-1].all()
df['c'] = df.a & groupby_consec_a.b.transform(all_interior)
Try out whether it's fast enough on your data. If not, the lambda will have to be replaced by pandas functions, but that will be more code.
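One possible lambda-free variant (a sketch assuming "interior" means every row of a run except its first and last):
g = df.a.ne(df.a.shift()).cumsum()                               # label consecutive runs of a
first = g.ne(g.shift())                                          # first row of each run
last = g.ne(g.shift(-1))                                         # last row of each run
interior_ok = (df.b | first | last).groupby(g).transform('all')  # every interior b in the run is True
df['c'] = df.a & interior_ok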

Pandas get one hot encodings from a column as booleans

I'm considering a Pandas DataFrame. I would like to find an efficient way in which the second DataFrame below is created from the first.
import pandas as pd
data = {"column":[0,1,2,0,1,2,0]}
df = pd.DataFrame(data)
Input:
column
0
1
2
0
1
2
0
Expected output:
column0 column1 column2
true false false
false true false
false false true
true false false
false true false
false false true
true false false
This is a get_dummies problem, but you will additionally need to specify dtype=bool to get columns of bools:
pd.get_dummies(df['column'], dtype=bool)
0 1 2
0 True False False
1 False True False
2 False False True
3 True False False
4 False True False
5 False False True
6 True False False
pd.get_dummies(df['column'], dtype=bool).dtypes
0 bool
1 bool
2 bool
dtype: object
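If you also want the column0/column1/column2 names from the expected output, add_prefix will rename the dummy columns (a small addition to the same approach):
pd.get_dummies(df['column'], dtype=bool).add_prefix('column')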
# carbon copy of the expected output (string 'true'/'false' rather than real booleans)
import numpy as np
dummies = pd.get_dummies(df['column'], dtype=bool)
dummies[:] = np.where(dummies, 'true', 'false')
dummies.add_prefix('column')
column0 column1 column2
0 true false false
1 false true false
2 false false true
3 true false false
4 false true false
5 false false true
6 true false false
I also use get_dummies, like cs95. However, I prepend the word 'column' to the values and use str.get_dummies, and finally replace 1/0 with 'true'/'false':
('column'+df.column.astype(str)).str.get_dummies().replace({1:'true', 0:'false'})
Out[2164]:
column0 column1 column2
0 true false false
1 false true false
2 false false true
3 true false false
4 false true false
5 false false true
6 true false false
factorize and slice assignment:
import numpy as np
i, u = pd.factorize(df.column)              # integer codes and the unique values
a = np.empty((len(i), len(u)), '<U5')       # string array: one row per value, one column per unique value
a.fill('false')
a[np.arange(len(i)), i] = 'true'            # mark each row's own code
pd.DataFrame(a).add_prefix('column')
column0 column1 column2
0 true false false
1 false true false
2 false false true
3 true false false
4 false true false
5 false false true
6 true false false

Groupby with boolean condition True in one of the columns in Pandas

This is my dataframe, on which I want to use groupby:
Value Boolean1 Boolean2
5.175603 False False
5.415855 False False
5.046997 False False
4.607749 True False
5.140482 False False
1.796552 False False
0.139924 False True
4.157981 False True
4.893860 False False
5.091573 False False
6 True False
6.05 False False
I want to use groupby with the Boolean1 and Boolean2 columns. A group runs through the False rows until it finds a True in either column; the next group then starts from the following False row and again runs until a True. If there are no more Trues, the remaining False rows (and their values) can either be ignored or kept as a trailing group.
I want to achieve something similar to this:
Value Boolean1 Boolean2
This is one group
5.175603 False False
5.415855 False False
5.046997 False False
4.607749 True False
This is another one
5.140482 False False
1.796552 False False
0.139924 False True
4.157981 False True
And this is another one
4.893860 False False
5.091573 False False
6 True False
My idea is to check for the rows that are False in both columns before at least one True:
# chain the two boolean columns together with OR and invert
m = ~(df['Boolean1'] | df['Boolean2'])
# mark the first row of each run where both columns are False, then number the groups
# with cumsum; the rows containing a True inherit the current group number
s = (m.ne(m.shift()) & m).cumsum()
for i, x in df.groupby(s):
    print(x)
Value Boolean1 Boolean2
0 5.175603 False False
1 5.415855 False False
2 5.046997 False False
3 4.607749 True False
Value Boolean1 Boolean2
4 5.140482 False False
5 1.796552 False False
6 0.139924 False True
7 4.157981 False True
Value Boolean1 Boolean2
8 4.893860 False False
9 5.091573 False False
10 6.000000 True False
Value Boolean1 Boolean2
11 6.05 False False
Detail:
print (m)
0 True
1 True
2 True
3 False
4 True
5 True
6 False
7 False
8 True
9 True
10 False
11 True
dtype: bool
print (s)
0 1
1 1
2 1
3 1
4 2
5 2
6 2
7 2
8 3
9 3
10 3
11 4
dtype: int32
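If you need more than printing, the same grouper s works for any aggregation, for example a per-group summary of Value (an illustrative extra, not part of the original answer):
df.groupby(s)['Value'].agg(['size', 'mean'])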
