Replace value in column by value in list by index - python

A column in my dataframe contains indices into a list, like:
id | idx
A | 0
B | 0
C | 2
D | 1
list = ['a', 'b', 'c', 'd']
I want to replace each value in the idx column greater than 0 with the list value at the corresponding index, so that:
id | idx
A | 0
B | 0
C | c # list[2]
D | b # list[1]
I tried to do this with a loop, but it does nothing... although if I move the ['idx'] it will replace all values in that row:
for index in df.idx.values:
    if index >= 1:
        df[df.idx == index]['idx'] = list[index]

Don't use list as a variable name, because it shadows the built-in list.
Then use Series.map with enumerate inside Series.mask:
L = ['a', 'b', 'c', 'd']
df['idx'] = df['idx'].mask(df['idx'] >=1, df['idx'].map(dict(enumerate(L))))
print (df)
id idx
0 A 0
1 B 0
2 C c
3 D b
Similar idea is processing only matched rows by mask:
L = ['a', 'b', 'c', 'd']
m = df['idx'] >=1
df.loc[m,'idx'] = df.loc[m,'idx'].map(dict(enumerate(L)))
print (df)
id idx
0 A 0
1 B 0
2 C c
3 D b

Create a dictionary for items where the index is greater than 0, then use the mapping with replace to get your output:
mapping = {key: val for key, val in enumerate(l) if key > 0}
print(mapping)
{1: 'b', 2: 'c', 3: 'd'}
df.replace(mapping)
id idx
0 A 0
1 B 0
2 C c
3 D b
Note: I changed the list variable name to l, since list is a built-in.
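For reference, a self-contained sketch combining the question's setup with the mask approach above (column and list names follow the question):

```python
import pandas as pd

df = pd.DataFrame({'id': ['A', 'B', 'C', 'D'], 'idx': [0, 0, 2, 1]})
L = ['a', 'b', 'c', 'd']

# keep 0 as-is, map positive indices through the list
df['idx'] = df['idx'].mask(df['idx'] >= 1, df['idx'].map(dict(enumerate(L))))
print(df)
```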

Related

count word frequency with groupby

I have a CSV file with only one tag column:
tag
A
B
B
C
C
C
C
When I run groupby to count the word frequencies, the output does not have the frequency numbers:
#!/usr/bin/env python3
import pandas as pd

def count(fname):
    df = pd.read_csv(fname)
    print(df)
    dfg = df.groupby('tag').count().reset_index()
    print(dfg)

count("save.txt")
Output (no frequency column):
tag
0 A
1 B
2 B
3 C
4 C
5 C
6 C
tag
0 A
1 B
2 C
Expected output:
tag freq
0 A 1
1 B 2
2 C 4
Looks close to me, per my comment:
df = pd.DataFrame({'tag': ['A', 'B', 'B', 'C', 'C', 'C', 'C']})
df.groupby(['tag'], as_index=False).agg(freq=('tag', 'count'))
You could count the values directly with value_counts (note the result is sorted by frequency, descending):
Input:
df = df['tag'].value_counts().rename_axis('tag').reset_index(name='freq')
Output:
tag freq
0 C 4
1 B 2
2 A 1
count() here counts the non-null values of the other columns, and there are none left after grouping; use size() instead:
df.groupby("tag").size().reset_index(name="freq")
outputs:
tag freq
0 A 1
1 B 2
2 C 4
To sort in descending order:
df.groupby("tag").size().reset_index(name="freq").sort_values(
    by="freq", ascending=False
)
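Putting it together, a sketch of the asker's function fixed with size(), which counts rows per group (the original groupby('tag').count() produced no columns because tag became the index and no other columns remained):

```python
import pandas as pd

def count_tags(df):
    # size() counts rows per group, including the grouping column itself
    return df.groupby('tag').size().reset_index(name='freq')

df = pd.DataFrame({'tag': ['A', 'B', 'B', 'C', 'C', 'C', 'C']})
print(count_tags(df))
```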

How to split a dataframe having a list of column values and counts?

I have a CSV-based dataframe:
name value
A 5
B 5
C 5
D 1
E 2
F 1
and a value-count dictionary like this:
{5: 2, 1: 1}
How can I split the original dataframe into two:
name value
A 5
B 5
D 1
name value
C 5
E 2
F 1
So how can I split a dataframe in pandas, given a list of column values and counts?
This worked for me:
def target_indices(df, value_count):
    indices = []
    for index, row in df.iterrows():
        for key in value_count:
            if key == row['value'] and value_count[key] > 0:
                indices.append(index)
                value_count[key] -= 1
    return indices
df = pd.DataFrame({'name': ['A', 'B', 'C', 'D', 'E', 'F'], 'value': [5, 5, 5, 1, 2, 1]})
value_count = {5: 2, 1: 1}
indices = target_indices(df, value_count)
df1 = df.iloc[indices]
print(df1)
df2 = df.drop(indices)
print(df2)
Output:
name value
0 A 5
1 B 5
3 D 1
name value
2 C 5
4 E 2
5 F 1
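A vectorized sketch of the same split (an alternative, not from the answer above): number each occurrence of a value with groupby.cumcount and keep a row in the first frame while its occurrence number is below the requested count.

```python
import pandas as pd

df = pd.DataFrame({'name': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'value': [5, 5, 5, 1, 2, 1]})
value_count = {5: 2, 1: 1}

# cumcount numbers each occurrence of a value 0, 1, 2, ...;
# values absent from the dictionary get a requested count of 0
m = df.groupby('value').cumcount() < df['value'].map(value_count).fillna(0)
df1, df2 = df[m], df[~m]
print(df1)
print(df2)
```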

Pandas new column from indexing list by row value

I am looking to create a new column in a Pandas dataframe, taking its values from a list indexed by another column's row value.
df = pd.DataFrame({'Index': [0,1,3,2], 'OtherColumn': ['a', 'b', 'c', 'd']})
Index OtherColumn
0 a
1 b
3 c
2 d
l = [1000, 1001, 1002, 1003]
Desired output:
Index OtherColumn Value
0 a -
1 b -
3 c 1003
2 d -
My code:
df.loc[df.OtherColumn == 'c', 'Value'] = l[df.Index]
Which returns an error, since df.Index is not a single int but a whole column (and the assignment is not filtered by OtherColumn == 'c').
For R users, I'm looking for:
df[OtherColumn == 'c', Value := l[Index]]
Thanks.
Convert the list to a numpy array for indexing, and then filter by the mask on both sides:
m = df.OtherColumn == 'c'
df.loc[m, 'Value'] = np.array(l)[df.Index][m]
print (df)
Index OtherColumn Value
0 0 a NaN
1 1 b NaN
2 3 c 1003.0
3 2 d NaN
Or use numpy.where:
m = df.OtherColumn == 'c'
df['Value'] = np.where(m, np.array(l)[df.Index], '-')
print (df)
Index OtherColumn Value
0 0 a -
1 1 b -
2 3 c 1003
3 2 d -
Or:
df['value'] = np.where(m, df['Index'].map(dict(enumerate(l))), '-')
Use Series.where + Series.map:
df['value']=df['Index'].map(dict(enumerate(l))).where(df['OtherColumn']=='c','-')
print(df)
Index OtherColumn value
0 0 a -
1 1 b -
2 3 c 1003
3 2 d -
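If some Index values might fall outside the list's range, a hedged variant (assuming df and l as in the question) is Series.reindex, which yields NaN instead of raising for missing positions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Index': [0, 1, 3, 2], 'OtherColumn': ['a', 'b', 'c', 'd']})
l = [1000, 1001, 1002, 1003]

# reindex returns NaN for positions not present in the list
looked_up = pd.Series(l).reindex(df['Index']).to_numpy()
df['Value'] = np.where(df['OtherColumn'] == 'c', looked_up, '-')
print(df)
```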

Replacing values in DataFrame column based on values in another column

To try, I have:
test = pd.DataFrame([[1,'A', 'B', 'A B r'], [0,'A', 'B', 'A A A'], [2,'B', 'C', 'B a c'], [1,'A', 'B', 's A B'], [1,'A', 'B', 'A'], [0,'B', 'C', 'x']])
replace = [['x', 'y', 'z'], ['r', 's', 't'], ['a', 'b', 'c']]
I would like to replace parts of the values in the last column with 0, but only if they exist in the replace list at the position corresponding to the number in the first column of that row.
For example, looking at the first three rows:
since 'r' is in replace[1], the first cell becomes A B 0;
'A' is not in replace[0], so the second stays A A A;
'a' and 'c' are both in replace[2], so the third becomes B 0 0;
etc.
I tried something like
test[3] = test[3].apply(lambda x: ' '.join([n if n not in replace[test[0]] else 0 for n in test.split()]))
but it's not changing anything.
IIUC, use zip and a list comprehension to accomplish this.
I've simplified and created a custom replace_ function, but feel free to use regex to perform the replacement if needed.
def replace_(st, reps):
    for old, new in reps:
        st = st.replace(old, new)
    return st

test['new'] = [replace_(b, zip(replace[a], ['0'] * 3)) for a, b in zip(test[0], test[3])]
Outputs
0 1 2 3 new
0 1 A B A B r A B 0
1 0 A B A A A A A A
2 2 B C B a c B 0 0
3 1 A B s A B 0 A B
4 1 A B A A
5 0 B C x 0
Use list comprehension with lookup in sets:
test[3] = [' '.join('0' if i in set(replace[a]) else i for i in b.split())
for a,b in zip(test[0], test[3])]
print (test)
0 1 2 3
0 1 A B A B 0
1 0 A B A A A
2 2 B C B 0 0
3 1 A B 0 A B
4 1 A B A
5 0 B C 0
Or convert to sets beforehand to improve performance:
r = [set(x) for x in replace]
test[3]=[' '.join('0' if i in r[a] else i for i in b.split()) for a,b in zip(test[0], test[3])]
Finally, I know what you need:
s=pd.Series(replace).reindex(test[0])
[ "".join([dict.fromkeys(y,'0').get(c, c) for c in x]) for x,y in zip(test[3],s)]
['A B 0', 'A A A', 'B 0 0', '0 A B', 'A', '0']
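If whole-token matching matters (so that, say, an 'a' inside a longer token is never touched), a regex sketch with word boundaries is another option (names assumed from the question):

```python
import re

import pandas as pd

test = pd.DataFrame([[1, 'A', 'B', 'A B r'], [0, 'A', 'B', 'A A A'],
                     [2, 'B', 'C', 'B a c'], [1, 'A', 'B', 's A B'],
                     [1, 'A', 'B', 'A'], [0, 'B', 'C', 'x']])
replace = [['x', 'y', 'z'], ['r', 's', 't'], ['a', 'b', 'c']]

def zero_out(row_id, text):
    # \b ensures only whole tokens from the row's replace list are hit
    pattern = r'\b(?:' + '|'.join(map(re.escape, replace[row_id])) + r')\b'
    return re.sub(pattern, '0', text)

test[3] = [zero_out(a, b) for a, b in zip(test[0], test[3])]
print(test)
```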

Mapping dictionary onto dataframe when dictionary key is a list

I have a dictionary where the values are lists:
D = {1: ['a', 'b'], 2: ['c', 'd']}
I want to map the dictionary onto col1 of my dataframe.
col1
a
c
If the value of col1 is IN one of the values of my dictionary, then I want to replace the value of col1 with the value of the dictionary key.
Like this, my dataframe will become:
col1
1
2
thanks in advance
I would invert the dictionary:
mapping = {}
for key, values in D.items():
    for item in values:
        mapping[item] = key
and then
df['col1'] = df['col1'].map(mapping)
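The same inversion fits in a single dict comprehension, sketched here with the question's dictionary:

```python
D = {1: ['a', 'b'], 2: ['c', 'd']}

# invert: each list element becomes a key pointing at its dictionary key
mapping = {item: key for key, values in D.items() for item in values}
print(mapping)  # → {'a': 1, 'b': 1, 'c': 2, 'd': 2}
```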
You can also try using stack + reset_index and set_index with map.
d = pd.DataFrame({1: ['a','b'], 2:['c', 'd']})
mapping = d.stack().reset_index().set_index(0)["level_1"]
s = pd.Series(['a', 'c'], name="col1")
s.map(mapping)
0 1
1 2
Name: col1, dtype: int64
Step by step demo
d.stack()
0 1 a
2 c
1 1 b
2 d
dtype: object
d.stack().reset_index()
level_0 level_1 0
0 0 1 a
1 0 2 c
2 1 1 b
3 1 2 d
d.stack().reset_index().set_index(0)
level_0 level_1
0
a 0 1
c 0 2
b 1 1
d 1 2
Finally, we select the level_1 column as our mapping to pass in map function.
do you mean something like this???
D = {1: ['a', 'b'], 2: ['c', 'd']}
for key, value in D.items():
    for each in value:
        if each in D[key]:
            print(each, "is in D[%s]" % key)
Output:
a is in D[1]
b is in D[1]
c is in D[2]
d is in D[2]
