Replace values of empty dictionaries in a dataframe column - python

Given the following:
data = pd.DataFrame({"a": [{}, 1, 2]})
How best to replace {} with a particular value?
The following works:
rep = 0
data.apply(lambda x: [y if not isinstance(y, dict) else rep for y in x])
but I'm wondering if there's something more idiomatic.

Try casting to bool; an empty object evaluates to False:
data.loc[~data.a.astype(bool),'a'] = 0
data
Out[103]:
a
0 0
1 1
2 2

You can use pd.to_numeric with errors='coerce':
In [24]: data['a'] = pd.to_numeric(data['a'], errors='coerce').fillna(0).astype(int)
In [25]: data
Out[25]:
a
0 0
1 1
2 2
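Both answers above replace more than empty dictionaries: the boolean mask also matches other falsy values (0, empty strings, None), and to_numeric with errors='coerce' turns every non-numeric value into NaN. If only empty dictionaries should be replaced, a minimal sketch could look like the following. The extra rows and the replacement value -1 are assumptions added for illustration and are not part of the original question.

```python
import pandas as pd

# Sample data extended with a non-empty dict and a 0 (illustrative assumption)
data = pd.DataFrame({"a": [{}, 1, 2, {"k": "v"}, 0]})
rep = -1  # hypothetical replacement value

# True only where the cell is a dict AND that dict is empty
is_empty_dict = data["a"].map(lambda v: isinstance(v, dict) and not v)

# Overwrite just those cells; everything else is left untouched
data.loc[is_empty_dict, "a"] = rep
print(data)
```

Here the non-empty dictionary and the 0 are both preserved, which the bool-based mask would not guarantee.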

Related

Python : How do you filter out columns from a dataset based on substring match in Column names

df_train = pd.read_csv('../xyz.csv')
headers = df_train.columns
I want to filter out the columns in headers whose names contain the substring _pct.
Use df.filter
df = pd.DataFrame({'a':[1,2,3], 'b_pct':[1,2,3],'c_pct':[1,2,3],'d':[1]*3})
print(df.filter(items=[i for i in df.columns if '_pct' not in i]))
## or as jezrael suggested
# print(df[[i for i in df.columns if '_pct' not in i]])
Output:
a d
0 1 1
1 2 1
2 3 1
Use:
#data from AkshayNevrekar answer
df = df.loc[:, ~df.columns.str.contains('_pct')]
print (df)
Filter solution is not trivial:
df = df.filter(regex=r'^(?!.*_pct).*$')
a d
0 1 1
1 2 1
2 3 1
Thank you, @IanS, for some other solutions:
df[df.columns.difference(df.filter(like='_pct').columns).tolist()]
df.drop(df.filter(like='_pct').columns, axis=1)
Since df.columns is an iterable Index of the column names, you can use a list comprehension to build the new list with a simple condition:
new_headers = [x for x in headers if '_pct' not in x]
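To see that the approaches above agree, here is a small side-by-side sketch on the sample frame from the first answer. The variable names are illustrative, and regex=False is an assumption added so that _pct is matched as a literal substring. Note that the columns.difference variant sorts the resulting labels, so it may not keep the original column order on other data.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b_pct": [1, 2, 3],
                   "c_pct": [1, 2, 3], "d": [1] * 3})

# Boolean mask over the column labels (literal substring match)
by_mask = df.loc[:, ~df.columns.str.contains("_pct", regex=False)]

# Negative lookahead: keep labels that do not contain "_pct" anywhere
by_regex = df.filter(regex=r"^(?!.*_pct).*$")

# Select the unwanted columns with filter(like=...) and drop them
by_drop = df.drop(columns=df.filter(like="_pct").columns)

# Plain list comprehension over the labels
by_list = df[[c for c in df.columns if "_pct" not in c]]

for result in (by_mask, by_regex, by_drop, by_list):
    print(list(result.columns))
```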

How to reorder columns based on regex?

Let's say I have a dataframe like this:
df = pd.DataFrame({'foo':[1, 2], 'bar': [3, 4], 'xyz': [5, 6]})
bar foo xyz
0 3 1 5
1 4 2 6
I now want to put the column that contains oo at the first position (i.e. at 0th index); there is always only one column with this pattern.
I currently solve this using filter twice and a concat:
pd.concat([df.filter(like='oo'), df.filter(regex='^((?!(oo)).)*$')], axis=1)
which gives the desired output:
foo bar xyz
0 1 3 5
1 2 4 6
I am wondering whether there is a more efficient way of doing this.
Use list comprehensions only, join lists together and select by subset:
a = [x for x in df.columns if 'oo' in x]
b = [x for x in df.columns if 'oo' not in x]
df = df[a + b]
print (df)
foo bar xyz
0 1 3 5
1 2 4 6
What about:
df[sorted(df, key = lambda x: x not in df.filter(like="oo").columns)]
Using pop:
cols = list(df)
col_oo = [col for col in df.columns if 'oo' in col]
cols.insert(0, cols.pop(cols.index(col_oo[0])))
df = df.loc[:, cols]
Or using regex:
import re
col_oo = [col for col in cols if re.search('oo', col)]
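The sorted approach above works because False sorts before True and Python's sort is stable, so the matching column moves to the front while the others keep their relative order. A minimal sketch of that idea, using a plain substring test in the key instead of a second filter call (a simplification, not the original answer's exact code):

```python
import pandas as pd

# Columns deliberately start in the order bar, foo, xyz
df = pd.DataFrame({"bar": [3, 4], "foo": [1, 2], "xyz": [5, 6]})

# Key is False for names containing "oo" and True otherwise.
# False < True, and sorted() is stable, so "foo" comes first
# and "bar", "xyz" keep their original relative order.
ordered = sorted(df.columns, key=lambda name: "oo" not in name)

reordered = df[ordered]
print(reordered)
```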

What is the dataset return from dataframe.stack()

I am trying to work with a dataframe on which I have used the .stack() function:
df = pd.read_csv('test.csv', usecols =['firstround','secondround','thirdround','fourthround','fifthround'])
sortedArray = df.stack().value_counts()
sortedArray = sortedArray.sort_index()
I need to retrieve the values of the first and second index columns from sortedArray, meaning I need the x and y values from the sorted array.
Any idea how I can do it?
I think you need Series.iloc, because the output of stack (and of value_counts) is a Series:
x = sortedArray.iloc[0]
y = sortedArray.iloc[1]
Sample:
df = pd.DataFrame({'A':['a','a','s'],
'B':['a','s','a'],
'C':['s','d','a']})
print (df)
A B C
0 a a s
1 a s d
2 s a a
sortedArray = df.stack().value_counts()
print (sortedArray)
a 5
s 3
d 1
dtype: int64
sortedArray = sortedArray.sort_index()
print (sortedArray)
a 5
d 1
s 3
dtype: int64
x = sortedArray.iloc[0]
y = sortedArray.iloc[1]
print (x)
5
print (y)
1
print (sortedArray.tolist())
[5, 1, 3]
print (sortedArray.index.tolist())
['a', 'd', 's']
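To make the return types explicit: stack() gives a Series whose index is a two-level MultiIndex (row label, column name), and value_counts() gives another Series indexed by the distinct values. If "x and y" means the labels and their counts (an assumption about the question's intent), a sketch using the same sample data could be:

```python
import pandas as pd

df = pd.DataFrame({"A": ["a", "a", "s"],
                   "B": ["a", "s", "a"],
                   "C": ["s", "d", "a"]})

# stack() flattens the frame into one Series, indexed by (row, column)
stacked = df.stack()
print(type(stacked).__name__, stacked.index.nlevels)

# Count occurrences of each value and order by the value itself
counts = stacked.value_counts().sort_index()

x = counts.index.tolist()  # the distinct values
y = counts.tolist()        # how often each value occurs
print(x, y)
```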

Pandas - Modify string values in each cell

I have a pandas dataframe and I need to modify all values in a given string column. Each column contains string values of the same length. The user provides the index they want to be replaced for each value
for example: [1:3] and the replacement value "AAA".
This would replace the string from values 1 to 3 with the value AAA.
How can I use the applymap(), map() or apply() function to get this done?
SOLUTION: Here is the final solution I settled on, based on the answer marked below:
import pandas as pd
df = pd.DataFrame({'A':['ffgghh','ffrtss','ffrtds'],
#'B':['ffrtss','ssgghh','d'],
'C':['qqttss',' 44','f']})
print(df)
old = ['g', 'r', 'z']
new = ['y', 'b', 'c']
vals = dict(zip(old, new))
pos = 2
# For each mapping, replace the character at pos only in rows where it matches
for old_char, new_char in vals.items():
    mask = df['A'].str[pos] == old_char
    df.loc[mask, 'A'] = df['A'].str.slice_replace(pos, pos + len(new_char), new_char)
print(df)
Use str.slice_replace:
df['B'] = df['B'].str.slice_replace(1, 3, 'AAA')
Sample Input:
A B
0 w abcdefg
1 x bbbbbbb
2 y ccccccc
3 z zzzzzzzz
Sample Output:
A B
0 w aAAAdefg
1 x bAAAbbbb
2 y cAAAcccc
3 z zAAAzzzzz
IMO the most straightforward solution:
In [7]: df
Out[7]:
col
0 abcdefg
1 bbbbbbb
2 ccccccc
3 zzzzzzzz
In [9]: df.col = df.col.str[:1] + 'AAA' + df.col.str[4:]
In [10]: df
Out[10]:
col
0 aAAAefg
1 bAAAbbb
2 cAAAccc
3 zAAAzzzz
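The two answers above give different results because the second one hard-codes str[4:], which also drops the character at index 3. When the user supplies the slice bounds, both forms can be written against the same start and stop so that they agree. A minimal sketch, where start, stop and repl are illustrative names for the user input:

```python
import pandas as pd

df = pd.DataFrame({"B": ["abcdefg", "bbbbbbb", "ccccccc", "zzzzzzzz"]})

# User-provided slice [1:3] and replacement value
start, stop, repl = 1, 3, "AAA"

# Vectorised replacement of the slice [start:stop] in every string
via_slice_replace = df["B"].str.slice_replace(start, stop, repl)

# Equivalent form built from string slices and concatenation
via_concat = df["B"].str[:start] + repl + df["B"].str[stop:]

print(via_slice_replace.tolist())
print(via_concat.tolist())
```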

Python Compare rows in two columns and write a result conditionally

I've been searching for quite a while and not getting anywhere close to what I want to do...
I have a pandas dataframe in which I want to compare the value of column A to B and write a 1 or 0 in a new column if A and B are equal.
I could write an ugly for loop, but I know this is not very Pythonic.
I'm pretty sure there is a way to do this with apply() but I'm not getting anywhere.
I'd like to be able to compare columns that contain integers as well as columns containing strings.
Thanks in advance for your help.
If df is a Pandas DataFrame, then
df['newcol'] = (df['A'] == df['B']).astype('int')
For example,
In [20]: df = pd.DataFrame({'A': [1,2,'foo'], 'B': [1,99,'foo']})
In [21]: df
Out[21]:
A B
0 1 1
1 2 99
2 foo foo
In [22]: df['newcol'] = (df['A'] == df['B']).astype('int')
In [23]: df
Out[23]:
A B newcol
0 1 1 1
1 2 99 0
2 foo foo 1
df['A'] == df['B'] returns a boolean Series:
In [24]: df['A'] == df['B']
Out[24]:
0 True
1 False
2 True
dtype: bool
astype('int') converts the True/False values to integers -- 0 for False and 1 for True.
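If values other than 1 and 0 are ever needed, numpy.where is a common alternative that picks one of two values per row based on the same boolean comparison. A small sketch on the sample data above (this is a variation, not part of the original answer):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, "foo"], "B": [1, 99, "foo"]})

# Element-wise comparison yields a boolean Series
equal = df["A"] == df["B"]

# np.where(condition, value_if_true, value_if_false)
df["newcol"] = np.where(equal, 1, 0)
print(df)
```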
