Python: how to get unique values over 2 different columns?

I have a dataframe like the following
df
idA idB yA yB
0 3 2 0 1
1 0 1 0 0
2 0 4 0 1
3 0 2 0 1
4 0 3 0 0
I would like to have a single y for each id, like this:
df
id y
0 0 0
1 1 0
2 2 1
3 3 0
4 4 1

First create a new DataFrame by flattening the id columns and the y columns (selected with iloc) using numpy.ravel, then call sort_values and drop_duplicates on the id column:
df2 = (pd.DataFrame({'id': df.iloc[:, :2].values.ravel(),
                     'y': df.iloc[:, 2:4].values.ravel()})
         .sort_values('id')
         .drop_duplicates(subset=['id'])
         .reset_index(drop=True))
print(df2)
id y
0 0 0
1 1 0
2 2 1
3 3 0
4 4 1
Detail:
print(pd.DataFrame({'id': df.iloc[:, :2].values.ravel(),
                    'y': df.iloc[:, 2:4].values.ravel()}))
id y
0 3 0
1 2 1
2 0 0
3 1 0
4 0 0
5 4 1
6 0 0
7 2 1
8 0 0
9 3 0
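
For reference, here is a self-contained sketch of the same approach, assuming the sample frame from the question. Passing kind="stable" to sort_values is an addition that is not in the answer above: it guarantees that rows with the same id stay in their flattened order, so drop_duplicates keeps the first occurrence of each id.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "idA": [3, 0, 0, 0, 0],
    "idB": [2, 1, 4, 2, 3],
    "yA": [0, 0, 0, 0, 0],
    "yB": [1, 0, 1, 1, 0],
})

# Flatten row by row so that idA pairs with yA and idB pairs with yB.
long = pd.DataFrame({
    "id": df[["idA", "idB"]].to_numpy().ravel(),
    "y": df[["yA", "yB"]].to_numpy().ravel(),
})

# A stable sort keeps ties in flattened order; drop_duplicates keeps the first.
df2 = (
    long.sort_values("id", kind="stable")
        .drop_duplicates(subset="id")
        .reset_index(drop=True)
)
print(df2)
```

If an id could appear with conflicting y values, this keeps whichever comes first in the flattened order, so it may be worth checking for conflicts beforehand.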

Related

is there any way to convert the columns in Pandas Dataframe using its mirror image Dataframe structure

The df I have is:
0 1 2
0 0 0 0
1 0 0 1
2 0 1 0
3 0 1 1
4 1 0 0
5 1 0 1
6 1 1 0
7 1 1 1
I want to obtain a DataFrame with the column values reversed (a mirror image), keeping the column labels in place:
0 1 2
0 0 0 0
1 1 0 0
2 0 1 0
3 1 1 0
4 0 0 1
5 1 0 1
6 0 1 1
7 1 1 1
Is there any way to do that?

You can check
df[:] = df.iloc[:,::-1]
df
Out[959]:
0 1 2
0 0 0 0
1 1 0 0
2 0 1 0
3 1 1 0
4 0 0 1
5 1 0 1
6 0 1 1
7 1 1 1

Here is a slightly more verbose but likely more efficient solution, since it does not need to rewrite the data. It only renames and reorders the columns:
cols = df.columns
df.columns = df.columns[::-1]
df = df.loc[:,cols]
Or shorter variant:
df = df.iloc[:,::-1].set_axis(df.columns, axis=1)
Output:
0 1 2
0 0 0 0
1 1 0 0
2 0 1 0
3 1 1 0
4 0 0 1
5 1 0 1
6 0 1 1
7 1 1 1
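
A runnable sketch of the shorter variant, assuming the frame from the question (rebuilt here with itertools.product, which is only a convenience for this example):

```python
from itertools import product

import pandas as pd

# Rebuild the 3-column frame from the question: all 3-bit combinations.
df = pd.DataFrame(list(product([0, 1], repeat=3)))

# Reverse the column order positionally, then restore the original labels.
mirrored = df.iloc[:, ::-1].set_axis(df.columns, axis=1)
print(mirrored)
```

This returns a new frame and leaves df itself unchanged.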

There are other ways, but here's one solution:
df[df.columns] = df[reversed(df.columns)]
Output:
0 1 2
0 0 0 0
1 1 0 0
2 0 1 0
3 1 1 0
4 0 0 1
5 1 0 1
6 0 1 1
7 1 1 1

replacing the value of one column conditional on two other columns in pandas

I have a data-frame df:
year ID category
1 1 0
2 1 1
3 1 1
4 1 0
1 2 0
2 2 0
3 2 1
4 2 0
I want to create a new column such that, for each 'ID', once 'category' is 1 in some 'year', 'new_category' is 1 for that year and all the following years:
year ID category new_category
1 1 0 0
2 1 1 1
3 1 1 1
4 1 0 1
1 2 0 0
2 2 0 0
3 2 1 1
4 2 0 1
I have tried if-else condition but I am getting the same 'category' column
for row in range(1,df.category[i-1]):
df['new_category'] = df['category'].replace('0',df['category'].shift(1))
But I am not getting the desired column

TRY:
df['new_category'] = df.groupby('ID')['category'].cummax()
OUTPUT:
year ID category new_category
0 1 1 0 0
1 2 1 1 1
2 3 1 1 1
3 4 1 0 1
4 1 2 0 0
5 2 2 0 0
6 3 2 1 1
7 4 2 0 1
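
To spell out why this works: cummax is a running maximum, so within each ID the value switches to 1 at the first 1 and never drops back. The sketch below assumes the sample data from the question; the sort_values call is an extra safeguard for data that is not already ordered by year.

```python
import pandas as pd

df = pd.DataFrame({
    "year": [1, 2, 3, 4, 1, 2, 3, 4],
    "ID": [1, 1, 1, 1, 2, 2, 2, 2],
    "category": [0, 1, 1, 0, 0, 0, 1, 0],
})

# cummax runs in row order, so sort by year within each ID first.
df = df.sort_values(["ID", "year"])

# Running maximum per ID: once a 1 appears, every later row stays 1.
df["new_category"] = df.groupby("ID")["category"].cummax()
print(df)
```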

Is there a way to break a pandas column with categories into separate true or false columns with the category name as the column name

I have a dataframe with the following column:
df = pd.DataFrame({"A": [1,2,1,2,2,2,0,1,0]})
and I want:
df2 = pd.DataFrame({"0": [0,0,0,0,0,0,1,0,1],"1": [1,0,1,0,0,0,0,1,0],"2": [0,1,0,1,1,1,0,0,0]})
Is there an elegant way of doing this in a one-liner?
NOTE
I can do this using df['0'] = df['A'].apply(find_zeros)
I don't mind if 'A' is included in the final result.

Use get_dummies:
df2 = pd.get_dummies(df.A)
print (df2)
0 1 2
0 0 1 0
1 0 0 1
2 0 1 0
3 0 0 1
4 0 0 1
5 0 0 1
6 1 0 0
7 0 1 0
8 1 0 0

In [50]: df.A.astype(str).str.get_dummies()
Out[50]:
0 1 2
0 0 1 0
1 0 0 1
2 0 1 0
3 0 0 1
4 0 0 1
5 0 0 1
6 1 0 0
7 0 1 0
8 1 0 0
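
A note for newer pandas versions: from pandas 2.0 on, get_dummies returns boolean columns by default, so the output prints True/False rather than 1/0. The sketch below, which assumes the frame from the question, passes dtype=int to get integers and converts the column labels to strings so the result matches the df2 that was asked for.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 1, 2, 2, 2, 0, 1, 0]})

# dtype=int forces 0/1 integers instead of booleans.
df2 = pd.get_dummies(df["A"], dtype=int)

# The column labels are the category values themselves (integers here);
# convert them to strings to match the frame asked for in the question.
df2.columns = df2.columns.astype(str)
print(df2)
```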

Merge multiple group ids to form a single consolidated group id?

I have following dataset in pandas Dataframe.
group_id sub_group_id
0 0
0 1
1 0
2 0
2 1
2 2
3 0
3 0
But I want to merge those group ids into a single consolidated group id:
group_id sub_group_id consolidated_group_id
0 0 0
0 1 1
1 0 2
2 0 3
2 1 4
2 2 5
2 2 5
3 0 6
3 0 6
Is there any generic or mathematical way to do it?

cols = ['group_id', 'sub_group_id']
df.assign(
consolidated_group_id=pd.factorize(
pd.Series(list(zip(*df[cols].values.T.tolist())))
)[0]
)
group_id sub_group_id consolidated_group_id
0 0 0 0
1 0 1 1
2 1 0 2
3 2 0 3
4 2 1 4
5 2 2 5
6 3 0 6
7 3 0 6

You need to convert each row to a tuple and then use factorize:
df['consolidated_group_id'] = pd.factorize(df.apply(tuple,axis=1))[0]
print (df)
group_id sub_group_id consolidated_group_id
0 0 0 0
1 0 1 1
2 1 0 2
3 2 0 3
4 2 1 4
5 2 2 5
6 3 0 6
7 3 0 6
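The NumPy solutions below are slightly modified from another answer: numpy.unique is called with return_inverse, the returned tuple is reversed with [::-1], and [0] then selects the inverse array, which holds the group ids. Note that numpy.unique numbers the groups following its own sort order of the keys rather than order of first appearance; both give the same result here because the rows are already sorted: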
import numpy as np

a = df.values

def unique_return_inverse_2D(a):  # a is a 2D integer array
    # Collapse each row to a single linear index, then take the inverse from np.unique.
    a1D = a.dot(np.append((a.max(0)+1)[:0:-1].cumprod()[::-1],1))
    return np.unique(a1D, return_inverse=1)[::-1][0]

def unique_return_inverse_2D_viewbased(a):  # a is a 2D array
    # View each row as one opaque void element so np.unique compares whole rows.
    a = np.ascontiguousarray(a)
    void_dt = np.dtype((np.void, a.dtype.itemsize * np.prod(a.shape[1:])))
    return np.unique(a.view(void_dt).ravel(), return_inverse=1)[::-1][0]

df['consolidated_group_id'] = unique_return_inverse_2D(a)
df['consolidated_group_id1'] = unique_return_inverse_2D_viewbased(a)
print (df)
group_id sub_group_id consolidated_group_id consolidated_group_id1
0 0 0 0 0
1 0 1 1 1
2 1 0 2 2
3 2 0 3 3
4 2 1 4 4
5 2 2 5 5
6 3 0 6 6
7 3 0 6 6
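
The factorize and numpy.unique approaches only differ in how the ids are numbered. The sketch below uses a small hypothetical frame (not the data from the question) whose rows are deliberately unsorted, and it uses numpy.unique with axis=0 instead of the helper functions above to keep the example short.

```python
import numpy as np
import pandas as pd

# Hypothetical unsorted data, chosen so the two numbering schemes differ.
df = pd.DataFrame({"group_id": [2, 0, 2, 0], "sub_group_id": [1, 0, 1, 1]})

# factorize: ids follow the order in which each pair first appears.
pairs = pd.Series(list(zip(df["group_id"], df["sub_group_id"])))
by_appearance = pd.factorize(pairs)[0]

# np.unique over rows: ids follow the sorted order of the unique pairs.
_, inverse = np.unique(df.to_numpy(), axis=0, return_inverse=True)
by_sorted_order = np.asarray(inverse).ravel()

print(by_appearance.tolist())    # [0, 1, 0, 2]
print(by_sorted_order.tolist())  # [2, 0, 2, 1]
```

Both results identify the same groups; pick whichever numbering suits the downstream use.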

Convert a pandas data frame to a pandas data frame with another style

I have a data frame containing the IDs of animals and the classes they belong to, as given below:
ID Class
1 1
2 1
3 0
4 4
5 3
6 2
7 1
8 0
I want to convert it to a new layout with the classes in the header row, as follows:
ID  0  1  2  3  4
1      1
2      1
3   1
4               1
5            1
6         1
7      1
8   1
Can you help me do this with Python?

See get_dummies():
>>> print(df)
ID Class
0 1 1
1 2 1
2 3 0
3 4 4
4 5 3
5 6 2
6 7 1
7 8 0
>>> df2 = pd.get_dummies(df, columns=['Class'])
>>> print(df2)
ID Class_0 Class_1 Class_2 Class_3 Class_4
0 1 0 1 0 0 0
1 2 0 1 0 0 0
2 3 1 0 0 0 0
3 4 0 0 0 0 1
4 5 0 0 0 1 0
5 6 0 0 1 0 0
6 7 0 1 0 0 0
7 8 1 0 0 0 0
And if you want to get rid of "Class_" in the column headers, set both prefix and prefix_sep to the empty string:
df2 = pd.get_dummies(df, columns=['Class'], prefix='', prefix_sep='')
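
A self-contained sketch of that last call, assuming the frame from the question. The dtype=int argument is an addition: in pandas 2.0 and later get_dummies returns booleans by default, and dtype=int gives the 0/1 values shown above.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 2, 3, 4, 5, 6, 7, 8],
    "Class": [1, 1, 0, 4, 3, 2, 1, 0],
})

# Empty prefix and separator leave just the class value as the column name.
df2 = pd.get_dummies(df, columns=["Class"], prefix="", prefix_sep="", dtype=int)
print(df2)
print(list(df2.columns))  # the new labels are the strings '0' to '4'
```

The new column labels are strings, so select them with df2["1"] rather than df2[1].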
