Python: how to get unique values over 2 different columns? - python

I have a dataframe like the following
df
idA idB yA yB
0 3 2 0 1
1 0 1 0 0
2 0 4 0 1
3 0 2 0 1
4 0 3 0 0
I would like to have a single y value for each id, so the result should be:
df
id y
0 0 0
1 1 0
2 2 1
3 3 0
4 4 1

First create a new DataFrame by flattening the columns selected with iloc using numpy.ravel, then apply sort_values and drop_duplicates on the id column:
df2 = (pd.DataFrame({'id':df.iloc[:,:2].values.ravel(),
'y': df.iloc[:,2:4].values.ravel()})
.sort_values('id')
.drop_duplicates(subset=['id'])
.reset_index(drop=True))
print (df2)
id y
0 0 0
1 1 0
2 2 1
3 3 0
4 4 1
Detail:
print (pd.DataFrame({'id':df.iloc[:,:2].values.ravel(),
'y': df.iloc[:,2:4].values.ravel()}))
id y
0 3 0
1 2 1
2 0 0
3 1 0
4 0 0
5 4 1
6 0 0
7 2 1
8 0 0
9 3 0
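The steps above can be put together as a self-contained sketch. It rebuilds the sample frame from the question, and selects the columns by name rather than by position, which is an assumption made here for readability. The stable sort is also an addition, so that the first occurrence of each id is the one that survives drop_duplicates.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "idA": [3, 0, 0, 0, 0],
    "idB": [2, 1, 4, 2, 3],
    "yA": [0, 0, 0, 0, 0],
    "yB": [1, 0, 1, 1, 0],
})

# Flatten the id columns and the y columns row by row so that
# each id stays paired with its own y value
long = pd.DataFrame({
    "id": df[["idA", "idB"]].to_numpy().ravel(),
    "y": df[["yA", "yB"]].to_numpy().ravel(),
})

# Sort by id (stable, so the original order is kept within each id),
# then keep the first row seen for every id
df2 = (long.sort_values("id", kind="stable")
           .drop_duplicates(subset=["id"])
           .reset_index(drop=True))

print(df2)
```

If an id could carry different y values in different rows, drop_duplicates would silently keep only the first one, so it may be worth checking for that case before relying on the result.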

Related

Is there any way to convert the columns of a pandas DataFrame into their mirror image?

The df I have is:
0 1 2
0 0 0 0
1 0 0 1
2 0 1 0
3 0 1 1
4 1 0 0
5 1 0 1
6 1 1 0
7 1 1 1
I want to obtain a DataFrame with the column contents reversed (a mirror image), while the column labels stay in place:
0 1 2
0 0 0 0
1 1 0 0
2 0 1 0
3 1 1 0
4 0 0 1
5 1 0 1
6 0 1 1
7 1 1 1
Is there any way to do that?
You can try assigning the reversed columns back into the frame:
df[:] = df.iloc[:,::-1]
df
Out[959]:
0 1 2
0 0 0 0
1 1 0 0
2 0 1 0
3 1 1 0
4 0 0 1
5 1 0 1
6 0 1 1
7 1 1 1
Here is a slightly more verbose, but likely more efficient, solution, since it does not need to rewrite the data. It only renames and reorders the columns:
cols = df.columns
df.columns = df.columns[::-1]
df = df.loc[:,cols]
Or a shorter variant:
df = df.iloc[:,::-1].set_axis(df.columns, axis=1)
Output:
0 1 2
0 0 0 0
1 1 0 0
2 0 1 0
3 1 1 0
4 0 0 1
5 1 0 1
6 0 1 1
7 1 1 1
There are other ways, but here's one solution:
df[df.columns] = df[reversed(df.columns)]
Output:
0 1 2
0 0 0 0
1 1 0 0
2 0 1 0
3 1 1 0
4 0 0 1
5 1 0 1
6 0 1 1
7 1 1 1
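For reference, the set_axis variant can be run end to end as follows. This is a minimal sketch that rebuilds the question's frame; the variable name mirrored is only illustrative.

```python
import pandas as pd

# The eight 3-bit patterns from the question, one per row
df = pd.DataFrame([
    [0, 0, 0],
    [0, 0, 1],
    [0, 1, 0],
    [0, 1, 1],
    [1, 0, 0],
    [1, 0, 1],
    [1, 1, 0],
    [1, 1, 1],
])

# Reverse the column order, then put the original labels back on top
mirrored = df.iloc[:, ::-1].set_axis(df.columns, axis=1)

print(mirrored)
```

Because the result is a new object, the original df is left unchanged, which can be convenient when both orientations are needed.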

replacing the value of one column conditional on two other columns in pandas

I have a data-frame df:
year ID category
1 1 0
2 1 1
3 1 1
4 1 0
1 2 0
2 2 0
3 2 1
4 2 0
I want to create a new column such that, for each 'ID', once 'category' is 1 in some 'year', 'new_category' is always 1 for that year and all following years:
year ID category new_category
1 1 0 0
2 1 1 1
3 1 1 1
4 1 0 1
1 2 0 0
2 2 0 0
3 2 1 1
4 2 0 1
I have tried an if-else condition, but I just get the same values as the 'category' column:
for row in range(1,df.category[i-1]):
df['new_category'] = df['category'].replace('0',df['category'].shift(1))
But I am not getting the desired column
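Try a cumulative maximum within each ID. Once category reaches 1, the running maximum stays at 1 for the remaining rows of that ID: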
df['new_category'] = df.groupby('ID')['category'].cummax()
OUTPUT:
year ID category new_category
0 1 1 0 0
1 2 1 1 1
2 3 1 1 1
3 4 1 0 1
4 1 2 0 0
5 2 2 0 0
6 3 2 1 1
7 4 2 0 1
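A runnable version of the same idea is sketched below. The explicit sort by ID and year is an assumption added here: cummax follows row order, so the rows need to be in chronological order within each ID for the flag to carry forward correctly.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "year": [1, 2, 3, 4, 1, 2, 3, 4],
    "ID": [1, 1, 1, 1, 2, 2, 2, 2],
    "category": [0, 1, 1, 0, 0, 0, 1, 0],
})

# Make sure rows are in chronological order inside each ID
df = df.sort_values(["ID", "year"]).reset_index(drop=True)

# Running maximum per ID: stays at 1 once a 1 has been seen
df["new_category"] = df.groupby("ID")["category"].cummax()

print(df)
```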

Is there a way to break a pandas column with categories into separate true/false columns with the category name as the column name?

I have a dataframe with the following column:
df = pd.DataFrame({"A": [1,2,1,2,2,2,0,1,0]})
and I want:
df2 = pd.DataFrame({"0": [0,0,0,0,0,0,1,0,1],"1": [1,0,1,0,0,0,0,1,0],"2": [0,1,0,1,1,1,0,0,0]})
Is there an elegant way of doing this in a one-liner?
NOTE
I can do this using df['0'] = df['A'].apply(find_zeros)
I don't mind if 'A' is included in the final result.
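Use get_dummies (in pandas 2.0 and later the result is boolean by default; pass dtype=int to get the 0/1 values shown here):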
df2 = pd.get_dummies(df.A)
print (df2)
0 1 2
0 0 1 0
1 0 0 1
2 0 1 0
3 0 0 1
4 0 0 1
5 0 0 1
6 1 0 0
7 0 1 0
8 1 0 0
In [50]: df.A.astype(str).str.get_dummies()
Out[50]:
0 1 2
0 0 1 0
1 0 0 1
2 0 1 0
3 0 0 1
4 0 0 1
5 0 0 1
6 1 0 0
7 0 1 0
8 1 0 0
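The two approaches above differ mainly in the column labels they produce. The sketch below runs both on the question's data; the names dummies and dummies_str are illustrative, and dtype=int is passed explicitly so the output is 0/1 regardless of the pandas version.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 1, 2, 2, 2, 0, 1, 0]})

# One indicator column per distinct value; labels keep the original ints
dummies = pd.get_dummies(df["A"], dtype=int)

# String-based variant; labels become the strings "0", "1", "2",
# which matches the string keys used in the question's df2
dummies_str = df["A"].astype(str).str.get_dummies()

print(dummies)
print(dummies_str.columns.tolist())
```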

Merge multiple group ids to form a single consolidated group id?

I have the following dataset in a pandas DataFrame.
group_id sub_group_id
0 0
0 1
1 0
2 0
2 1
2 2
3 0
3 0
But I want to merge those group ids to form a consolidated group id:
group_id sub_group_id consolidated_group_id
0 0 0
0 1 1
1 0 2
2 0 3
2 1 4
2 2 5
2 2 5
3 0 6
3 0 6
Is there any generic or mathematical way to do it?
cols = ['group_id', 'sub_group_id']
df.assign(
consolidated_group_id=pd.factorize(
pd.Series(list(zip(*df[cols].values.T.tolist())))
)[0]
)
group_id sub_group_id consolidated_group_id
0 0 0 0
1 0 1 1
2 1 0 2
3 2 0 3
4 2 1 4
5 2 2 5
6 3 0 6
7 3 0 6
You need to convert each row's values to a tuple and then use factorize:
df['consolidated_group_id'] = pd.factorize(df.apply(tuple,axis=1))[0]
print (df)
group_id sub_group_id consolidated_group_id
0 0 0 0
1 0 1 1
2 1 0 2
3 2 0 3
4 2 1 4
5 2 2 5
6 3 0 6
7 3 0 6
The NumPy solutions below are slightly modified from another answer: the tuple returned by numpy.unique with return_inverse is reversed with [::-1] and then indexed with [0], which selects the inverse array:
import numpy as np

a = df.values

def unique_return_inverse_2D(a):  # a is a 2D array of non-negative integers
    # Encode each row as a single number (mixed radix), then take the inverse index
    a1D = a.dot(np.append((a.max(0)+1)[:0:-1].cumprod()[::-1],1))
    return np.unique(a1D, return_inverse=True)[::-1][0]

def unique_return_inverse_2D_viewbased(a):  # a is a 2D array
    # View each row as one opaque void element so rows compare as a whole
    a = np.ascontiguousarray(a)
    void_dt = np.dtype((np.void, a.dtype.itemsize * np.prod(a.shape[1:])))
    return np.unique(a.view(void_dt).ravel(), return_inverse=True)[::-1][0]

df['consolidated_group_id'] = unique_return_inverse_2D(a)
df['consolidated_group_id1'] = unique_return_inverse_2D_viewbased(a)
print (df)
group_id sub_group_id consolidated_group_id consolidated_group_id1
0 0 0 0 0
1 0 1 1 1
2 1 0 2 2
3 2 0 3 3
4 2 1 4 4
5 2 2 5 5
6 3 0 6 6
7 3 0 6 6
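As a further alternative that is not part of the answers above, pandas can number the groups directly with groupby(...).ngroup(). The sketch below rebuilds the question's input and is offered only as one possible shortcut. With the default sort=True the groups are numbered in sorted key order, which coincides with order of appearance here because the sample data is already sorted.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "group_id": [0, 0, 1, 2, 2, 2, 3, 3],
    "sub_group_id": [0, 1, 0, 0, 1, 2, 0, 0],
})

cols = ["group_id", "sub_group_id"]

# Each distinct (group_id, sub_group_id) pair gets one integer label
df["consolidated_group_id"] = df.groupby(cols).ngroup()

print(df)
```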

Convert a pandas data frame to a pandas data frame with another style

I have a data frame containing the IDs of animals and the classes they belong to, as given below:
ID Class
1 1
2 1
3 0
4 4
5 3
6 2
7 1
8 0
I want to convert it to a new layout with the classes in the header row, as follows:
ID  0  1  2  3  4
1      1
2      1
3   1
4               1
5            1
6         1
7      1
8   1
Can you help me do this with Python?
See get_dummies():
>>> print(df)
ID Class
0 1 1
1 2 1
2 3 0
3 4 4
4 5 3
5 6 2
6 7 1
7 8 0
>>> df2 = pd.get_dummies(df, columns=['Class'])
>>> print(df2)
ID Class_0 Class_1 Class_2 Class_3 Class_4
0 1 0 1 0 0 0
1 2 0 1 0 0 0
2 3 1 0 0 0 0
3 4 0 0 0 0 1
4 5 0 0 0 1 0
5 6 0 0 1 0 0
6 7 0 1 0 0 0
7 8 1 0 0 0 0
And if you want to get rid of "Class_" in the column headers, set both prefix and prefix_sep to the empty string:
df2 = pd.get_dummies(df, columns=['Class'], prefix='', prefix_sep='')
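The prefix-free call can be checked with a short, self-contained sketch. It rebuilds the question's data, and dtype=int is an addition made here so that the indicators print as 0/1 on recent pandas versions, where the default is boolean.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "Class": [1, 1, 0, 4, 3, 2, 1, 0],
})

# Expand Class into one indicator column per class, with bare labels
df2 = pd.get_dummies(df, columns=["Class"], prefix="", prefix_sep="", dtype=int)

print(df2)
```

Note that the new column labels are strings ("0" to "4"), since they are built by joining the prefix, the separator and the class value.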
