Replace value in column by value in list by index - python

A column in my dataframe contains indices into a list, like:
id | idx
A | 0
B | 0
C | 2
D | 1
list = ['a', 'b', 'c', 'd']
I want to replace each value in the idx column greater than 0 with the list value at the corresponding index, so that:
id | idx
A | 0
B | 0
C | c # list[2]
D | b # list[1]
I tried to do this with a loop, but it does nothing... although if I move the ['idx'] it will replace all values in that row:
for index in df.idx.values:
    if index >= 1:
        df[df.idx == index]['idx'] = list[index]

Don't use list as a variable name, because it shadows the built-in list.
Then use Series.map with enumerate inside Series.mask:
L = ['a', 'b', 'c', 'd']
df['idx'] = df['idx'].mask(df['idx'] >=1, df['idx'].map(dict(enumerate(L))))
print (df)
id idx
0 A 0
1 B 0
2 C c
3 D b
Similar idea is processing only matched rows by mask:
L = ['a', 'b', 'c', 'd']
m = df['idx'] >=1
df.loc[m,'idx'] = df.loc[m,'idx'].map(dict(enumerate(L)))
print (df)
id idx
0 A 0
1 B 0
2 C c
3 D b

Create a dictionary for items where the index is greater than 0, then use the mapping with replace to get your output:
mapping = {key: val for key, val in enumerate(l) if key > 0}
print(mapping)
{1: 'b', 2: 'c', 3: 'd'}
df.replace(mapping)
id idx
0 A 0
1 B 0
2 C c
3 D b
Note: I changed the list variable name to l, since list is a built-in.
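For reference, a self-contained sketch combining the question's setup with the mask approach above (column and list names follow the question):

```python
import pandas as pd

df = pd.DataFrame({'id': ['A', 'B', 'C', 'D'], 'idx': [0, 0, 2, 1]})
L = ['a', 'b', 'c', 'd']

# keep 0 as-is, map positive indices through the list
df['idx'] = df['idx'].mask(df['idx'] >= 1, df['idx'].map(dict(enumerate(L))))
print(df)
```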

Related

count word frequency with groupby

I have a CSV file with only one tag column:
tag
A
B
B
C
C
C
C
When I run groupby to count the word frequencies, the output does not have the frequency numbers:
#!/usr/bin/env python3
import pandas as pd

def count(fname):
    df = pd.read_csv(fname)
    print(df)
    dfg = df.groupby('tag').count().reset_index()
    print(dfg)

count("save.txt")
Output (no frequency column):
tag
0 A
1 B
2 B
3 C
4 C
5 C
6 C
tag
0 A
1 B
2 C
Expected output:
tag freq
0 A 1
1 B 2
2 C 4
Looks close to me, per my comment:
df = pd.DataFrame({'tag': ['A', 'B', 'B', 'C', 'C', 'C', 'C']})
df.groupby(['tag'], as_index=False).agg(freq=('tag', 'count'))
You could count the values directly with value_counts (note the result is sorted by frequency, descending):
Input:
df = df['tag'].value_counts().rename_axis('tag').reset_index(name='freq')
Output:
tag freq
0 C 4
1 B 2
2 A 1
count() here counts the non-null values of the other columns, and there are none left after grouping; use size() instead:
df.groupby("tag").size().reset_index(name="freq")
outputs:
tag freq
0 A 1
1 B 2
2 C 4
To sort in descending order:
df.groupby("tag").size().reset_index(name="freq").sort_values(
    by="freq", ascending=False
)
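Putting it together, a sketch of the asker's function fixed with size(), which counts rows per group (the original groupby('tag').count() produced no columns because tag became the index and no other columns remained):

```python
import pandas as pd

def count_tags(df):
    # size() counts rows per group, including the grouping column itself
    return df.groupby('tag').size().reset_index(name='freq')

df = pd.DataFrame({'tag': ['A', 'B', 'B', 'C', 'C', 'C', 'C']})
print(count_tags(df))
```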

How to split a dataframe having a list of column values and counts?

I have a CSV-based dataframe:
name value
A 5
B 5
C 5
D 1
E 2
F 1
and a value-count dictionary like this:
{5: 2, 1: 1}
How can I split the original dataframe into two:
name value
A 5
B 5
D 1
name value
C 5
E 2
F 1
So how can I split a dataframe in pandas, given a list of column values and counts?
This worked for me:
def target_indices(df, value_count):
    indices = []
    for index, row in df.iterrows():
        for key in value_count:
            if key == row['value'] and value_count[key] > 0:
                indices.append(index)
                value_count[key] -= 1
    return indices
df = pd.DataFrame({'name': ['A', 'B', 'C', 'D', 'E', 'F'], 'value': [5, 5, 5, 1, 2, 1]})
value_count = {5: 2, 1: 1}
indices = target_indices(df, value_count)
df1 = df.iloc[indices]
print(df1)
df2 = df.drop(indices)
print(df2)
Output:
name value
0 A 5
1 B 5
3 D 1
name value
2 C 5
4 E 2
5 F 1
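A vectorized sketch of the same split (an alternative, not from the answer above): number each occurrence of a value with groupby.cumcount and keep a row in the first frame while its occurrence number is below the requested count.

```python
import pandas as pd

df = pd.DataFrame({'name': ['A', 'B', 'C', 'D', 'E', 'F'],
                   'value': [5, 5, 5, 1, 2, 1]})
value_count = {5: 2, 1: 1}

# cumcount numbers each occurrence of a value 0, 1, 2, ...;
# values absent from the dictionary get a requested count of 0
m = df.groupby('value').cumcount() < df['value'].map(value_count).fillna(0)
df1, df2 = df[m], df[~m]
print(df1)
print(df2)
```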

Pandas new column from indexing list by row value

I am looking to create a new column in a Pandas dataframe, taking its values from a list indexed by another column's row value.
df = pd.DataFrame({'Index': [0,1,3,2], 'OtherColumn': ['a', 'b', 'c', 'd']})
Index OtherColumn
0 a
1 b
3 c
2 d
l = [1000, 1001, 1002, 1003]
Desired output:
Index OtherColumn Value
0 a -
1 b -
3 c 1003
2 d -
My code:
df.loc[df.OtherColumn == 'c', 'Value'] = l[df.Index]
Which returns an error, since df.Index is not a single int but a whole column (and the assignment is not filtered by OtherColumn == 'c').
For R users, I'm looking for:
df[OtherColumn == 'c', Value := l[Index]]
Thanks.
Convert the list to a numpy array for indexing, and then filter by the mask on both sides:
m = df.OtherColumn == 'c'
df.loc[m, 'Value'] = np.array(l)[df.Index][m]
print (df)
Index OtherColumn Value
0 0 a NaN
1 1 b NaN
2 3 c 1003.0
3 2 d NaN
Or use numpy.where:
m = df.OtherColumn == 'c'
df['Value'] = np.where(m, np.array(l)[df.Index], '-')
print (df)
Index OtherColumn Value
0 0 a -
1 1 b -
2 3 c 1003
3 2 d -
Or:
df['value'] = np.where(m, df['Index'].map(dict(enumerate(l))), '-')
Use Series.where + Series.map:
df['value']=df['Index'].map(dict(enumerate(l))).where(df['OtherColumn']=='c','-')
print(df)
Index OtherColumn value
0 0 a -
1 1 b -
2 3 c 1003
3 2 d -
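If some Index values might fall outside the list's range, a hedged variant (assuming df and l as in the question) is Series.reindex, which yields NaN instead of raising for missing positions:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Index': [0, 1, 3, 2], 'OtherColumn': ['a', 'b', 'c', 'd']})
l = [1000, 1001, 1002, 1003]

# reindex returns NaN for positions not present in the list
looked_up = pd.Series(l).reindex(df['Index']).to_numpy()
df['Value'] = np.where(df['OtherColumn'] == 'c', looked_up, '-')
print(df)
```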

Replacing values in DataFrame column based on values in another column

To try, I have:
test = pd.DataFrame([[1,'A', 'B', 'A B r'], [0,'A', 'B', 'A A A'], [2,'B', 'C', 'B a c'], [1,'A', 'B', 's A B'], [1,'A', 'B', 'A'], [0,'B', 'C', 'x']])
replace = [['x', 'y', 'z'], ['r', 's', 't'], ['a', 'b', 'c']]
I would like to replace parts of the values in the last column with 0, but only if they exist in the replace list at the position corresponding to the number in the first column of that row.
For example, looking at the first three rows:
since 'r' is in replace[1], the first cell becomes A B 0;
'A' is not in replace[0], so the second stays A A A;
'a' and 'c' are both in replace[2], so the third becomes B 0 0;
etc.
I tried something like
test[3] = test[3].apply(lambda x: ' '.join([n if n not in replace[test[0]] else 0 for n in test.split()]))
but it's not changing anything.
IIUC, use zip and a list comprehension to accomplish this.
I've simplified and created a custom replace_ function, but feel free to use regex to perform the replacement if needed.
def replace_(st, reps):
    for old, new in reps:
        st = st.replace(old, new)
    return st

test['new'] = [replace_(b, zip(replace[a], ['0'] * 3)) for a, b in zip(test[0], test[3])]
Outputs
0 1 2 3 new
0 1 A B A B r A B 0
1 0 A B A A A A A A
2 2 B C B a c B 0 0
3 1 A B s A B 0 A B
4 1 A B A A
5 0 B C x 0
Use list comprehension with lookup in sets:
test[3] = [' '.join('0' if i in set(replace[a]) else i for i in b.split())
for a,b in zip(test[0], test[3])]
print (test)
0 1 2 3
0 1 A B A B 0
1 0 A B A A A
2 2 B C B 0 0
3 1 A B 0 A B
4 1 A B A
5 0 B C 0
Or convert to sets beforehand to improve performance:
r = [set(x) for x in replace]
test[3]=[' '.join('0' if i in r[a] else i for i in b.split()) for a,b in zip(test[0], test[3])]
Finally, I know what you need:
s=pd.Series(replace).reindex(test[0])
[ "".join([dict.fromkeys(y,'0').get(c, c) for c in x]) for x,y in zip(test[3],s)]
['A B 0', 'A A A', 'B 0 0', '0 A B', 'A', '0']
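If whole-token matching matters (so that, say, an 'a' inside a longer token is never touched), a regex sketch with word boundaries is another option (names assumed from the question):

```python
import re

import pandas as pd

test = pd.DataFrame([[1, 'A', 'B', 'A B r'], [0, 'A', 'B', 'A A A'],
                     [2, 'B', 'C', 'B a c'], [1, 'A', 'B', 's A B'],
                     [1, 'A', 'B', 'A'], [0, 'B', 'C', 'x']])
replace = [['x', 'y', 'z'], ['r', 's', 't'], ['a', 'b', 'c']]

def zero_out(row_id, text):
    # \b ensures only whole tokens from the row's replace list are hit
    pattern = r'\b(?:' + '|'.join(map(re.escape, replace[row_id])) + r')\b'
    return re.sub(pattern, '0', text)

test[3] = [zero_out(a, b) for a, b in zip(test[0], test[3])]
print(test)
```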

Mapping dictionary onto dataframe when dictionary key is a list

I have a dictionary where the values are lists:
D = {1: ['a', 'b'], 2: ['c', 'd']}
I want to map the dictionary onto col1 of my dataframe.
col1
a
c
If the value of col1 is IN one of the values of my dictionary, then I want to replace the value of col1 with the value of the dictionary key.
Like this, my dataframe will become:
col1
1
2
thanks in advance
I would invert the dictionary:
mapping = {}
for key, values in D.items():
    for item in values:
        mapping[item] = key
and then
df['col1'] = df['col1'].map(mapping)
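The same inversion fits in a single dict comprehension, sketched here with the question's dictionary:

```python
D = {1: ['a', 'b'], 2: ['c', 'd']}

# invert: each list element becomes a key pointing at its dictionary key
mapping = {item: key for key, values in D.items() for item in values}
print(mapping)  # → {'a': 1, 'b': 1, 'c': 2, 'd': 2}
```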
You can also try using stack + reset_index and set_index with map.
d = pd.DataFrame({1: ['a','b'], 2:['c', 'd']})
mapping = d.stack().reset_index().set_index(0)["level_1"]
s = pd.Series(['a', 'c'], name="col1")
s.map(mapping)
0 1
1 2
Name: col1, dtype: int64
Step by step demo
d.stack()
0 1 a
2 c
1 1 b
2 d
dtype: object
d.stack().reset_index()
level_0 level_1 0
0 0 1 a
1 0 2 c
2 1 1 b
3 1 2 d
d.stack().reset_index().set_index(0)
level_0 level_1
0
a 0 1
c 0 2
b 1 1
d 1 2
Finally, we select the level_1 column as our mapping to pass in map function.
do you mean something like this???
D = {1: ['a', 'b'], 2: ['c', 'd']}
for key, value in D.items():
    for each in value:
        if each in D[key]:
            print(each, "is in D[%s]" % key)
Output:
a is in D[1]
b is in D[1]
c is in D[2]
d is in D[2]
