I have a sample dataset that has a set list of column names. In shifting data around, I have each row printing letters in each row as seen below.
I am trying to shift the values of each row to match either respective column. I have tried doing pd.shift() to do so but have not had much success. I am trying to get what is seen below. Any thoughts?
import pandas as pd
df = pd.DataFrame({'A': list('AAAAA'),
'B': list('CBBDE'),
'C': list('DDCEG'),
'D': list('EEDF '),
'E': list('FFE '),
'F': list('GGF '),
'G': list(' G ')})
A B C D E F G
0 A C D E F G
1 A B D E F G
2 A B C D E F G
3 A D E F
4 A E G
After:
A B C D E F G
0 A C D E F G
1 A B D E F G
2 A B C D E F G
3 A D E F
4 A E G
Here's a broadcasted comparison approach. This will be quite fast, but does have a higher memory complexity.
a = df.to_numpy()
b = df.columns.to_numpy()
pd.DataFrame(np.equal.outer(a, b).any(1) * b, columns=b)
A B C D E F G
0 A C D E F G
1 A B D E F G
2 A B C D E F G
3 A D E F
4 A E G
This is more list pivot problem
s=df.mask(df=='').stack().reset_index()
s.pivot(index='level_0',columns=0,values=0)
Out[34]:
0 A B C D
level_0
0 A B C D
1 A NaN C NaN
2 A NaN C D
Here's one way with stack, merge, pivot:
new_df = df.stack().reset_index()
(new_df.merge(new_df, left_on=['level_0', 'level_1'],
right_on=['level_0',0],
how='left')
.pivot('level_0', 'level_1', '0_y')
)
Output:
level_1 A B C D E F G
level_0
0 A NaN C D E F G
1 A B NaN D E F G
2 A B C D E F G
3 A NaN NaN D E F NaN
4 A NaN NaN NaN E NaN G
Related
I have a DataFrame that looks like this:
A B C D E
1 a a a a a
2 b b b b b
3 c c c c c
4 d d d d d
5 e e e e e
6 f f f f f
Anyone knows how to reorder it using Pandas to make it look like this:
A B C D E F G H I J
1 a a a a a b b b b b
3 c c c c c d d d d d
5 e e e e e f f f f f
I tried reading the documentation https://pandas.pydata.org/docs/user_guide/reshaping.html. Its difficult to understand as a beginner, appreciate any help.
Assuming that you want to group every two rows in a single row, use the underlying numpy array and reshape it:
from string import ascii_uppercase
out = (pd.DataFrame(df.to_numpy().reshape(len(df)//2, -1),
index=df.index[::2])
.rename(columns=dict(enumerate(ascii_uppercase)))
)
Output:
A B C D E F G H I J
1 a a a a a b b b b b
3 c c c c c d d d d d
5 e e e e e f f f f f
I have a pandas dataframe like this below:
A B C
a b c
d e f
where A B and C are column names. Now i have a list:
mylist = [1,2,3]
I want to replace the c in column C with list such as dataframe expands for all value of list, like below:
A B C
a b 1
a b 2
a b 3
d e f
Any help would be appreciated!
I tried this,
mylist = [1,2,3]
x=pd.DataFrame({'mylist':mylist})
x['C']='c'
res= pd.merge(df,x,on=['C'],how='left')
res['mylist']=res['mylist'].fillna(res['C'])
For further,
del res['C']
res.rename(columns={"mylist":"C"},inplace=True)
print res
Output:
A B C
0 a b 1
1 a b 2
2 a b 3
3 d e f
You can use:
print (df)
A B C
0 a b c
1 d e f
2 a b c
3 t e w
mylist = [1,2,3]
idx1 = df.index[df.C == 'c']
df = df.loc[idx1.repeat(len(mylist))].assign(C=mylist * len(idx1)).append(df[df.C != 'c'])
print (df)
A B C
0 a b 1
0 a b 2
0 a b 3
2 a b 1
2 a b 2
2 a b 3
1 d e f
3 t e w
I am trying to convert a dataframe to long form.
The dataframe I am starting with:
df = pd.DataFrame([['a', 'b'],
['d', 'e'],
['f', 'g', 'h'],
['q', 'r', 'e', 't']])
df = df.rename(columns={0: "Key"})
Key 1 2 3
0 a b None None
1 d e None None
2 f g h None
3 q r e t
The number of columns is not specified, there may be more than 4. There should be a new row for each value after the key
This gets what I need, however, it seems there should be a way to do this without having to drop null values:
new_df = pd.melt(df, id_vars=['Key'])[['Key', 'value']]
new_df = new_df.dropna()
Key value
0 a b
1 d e
2 f g
3 q r
6 f h
7 q e
11 q t​
Option 1
You should be able to do this with set_index + stack:
df.set_index('Key').stack().reset_index(level=0, name='value').reset_index(drop=True)
Key value
0 a b
1 d e
2 f g
3 f h
4 q r
5 q s
6 q t
If you don't want to keep resetting the index, then use an intermediate variable and create a new DataFrame:
v = df.set_index('Key').stack()
pd.DataFrame({'Key' : v.index.get_level_values(0), 'value' : v.values})
Key value
0 a b
1 d e
2 f g
3 f h
4 q r
5 q s
6 q t
The essence here is that stack automatically gets rid of NaNs by default (you can disable that by setting dropna=False).
Option 2
More performance with np.repeat and numpy's version of pd.DataFrame.stack:
i = df.pop('Key').values
j = df.values.ravel()
pd.DataFrame({'Key' : v.repeat(df.count(axis=1)), 'value' : j[pd.notnull(j)]
})
Key value
0 a b
1 d e
2 f g
3 f h
4 q r
5 q s
6 q t
By using melt(I do not think dropna create more 'trouble' here)
df.melt('Key').dropna().drop('variable',1)
Out[809]:
Key value
0 a b
1 d e
2 f g
3 q r
6 f h
7 q s
11 q t
And if without dropna
s=df.fillna('').set_index('Key').sum(1).apply(list)
pd.DataFrame({'Key': s.reindex(s.index.repeat(s.str.len())).index,'value':s.sum()})
Out[862]:
Key value
0 a b
1 d e
2 f g
3 f h
4 q r
5 q s
6 q t
With a comprehension
This assumes the key is the first element of the row.
pd.DataFrame(
[[k, v] for k, *r in df.values for v in r if pd.notna(v)],
columns=['Key', 'value']
)
Key value
0 a b
1 d e
2 f g
3 f h
4 q r
5 q s
6 q t
I want to select to new dataframe, columns that have 'C' in value
protein 1 2 3 4 5
prot1 C M D F A
prot2 C D A M A
prot3 C C D F A
prot4 S D F C L
prot5 S D A I L
So i want to have this:
protein 1 2 4
prot1 C M F
prot2 C D M
prot3 C C F
prot4 S D C
prot5 S D I
Number of colums can be n, i found examples only which i must specify column name... i cant do this here. The script should check column by colummn.
In [22]: df[['protein']].join(df[df.columns[df.eq('C').any()]])
Out[22]:
protein 1 2 4
0 prot1 C M F
1 prot2 C D M
2 prot3 C C F
3 prot4 S D C
4 prot5 S D I
Use:
np.random.seed(123)
n = np.random.choice(['C','M','D', '-'], size=(3,10))
n[:,0] = ['a','b','w']
foo = pd.DataFrame(n)
print (foo)
0 1 2 3 4 5 6 7 8 9
0 a M D D C D D M - D
1 b M D M C M D - M C
2 w C - M - D M C C C
mask = foo.eq('C').any()
#set columns which need in output
mask.loc[0] = True
#filter
print (foo.loc[:,mask])
0 1 4 7 8 9
0 a M C M - D
1 b M C - M C
2 w C - C C C
I have a dictionary and want to use .replace to only replace the indices that are in the dictionary key withe values.
dicts = {"certain index element1": "changed element1",
"certain index element2": "changed element2",
"certain index element3": "changed element3",
}
This does not work:
df.replace(dicts,regex=False,inplace=True)
The df is huge so I can not reassign all of the index from scratch. I only need to change certain elements and everything else remains the same.
If I wanted to replace certain elements within the df (not indices) it would work but for indices it does not.
Use rename(index=dicts)
example
df = pd.DataFrame(
np.random.choice(list('abcd'), (10, 10)),
list('ABCDEFGHIJ')
)
dc = {'A': '_A_', 'B': '_B_'}
df.rename(index=dc, inplace=True)
print(df)
0 1 2 3 4 5 6 7 8 9
_A_ a a a a b a d c c b
_B_ b c c a a a b b a b
C b b d b d c c a a b
D d d d d c b b a a d
E d c c c a d a d d a
F d c d c d d d d b d
G b c d c c b c a a b
H c c c b b a a b c a
I c a a b a d c c a a
J a a a c a b d c c c