I want to change existing dataframe values to NaN. What is the way to do it when you need to change several columns at once?
dataframe['A', 'B'....] = np.nan
I tried this, but nothing changed.
Double brackets are required in this case to pass a list of columns to the dataframe indexing operator []. For the OP's case it would be:
dataframe[['A', 'B'....]] = np.nan
Reproducible example:
import numpy as np
import pandas as pd
data = {'ColA': [1, 2, 3], 'ColB': [4, 5, 6], 'ColC': [7, 8, 9], 'ColD': [-1, -2, -3]}
df = pd.DataFrame(data, index=['I1', 'I2', 'I3'])
print(df)
df[['ColA', 'ColD']] = np.nan
print(df)
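For reference, the second print should show ColA and ColD replaced by NaN, while ColB and ColC are untouched:
    ColA  ColB  ColC  ColD
I1   NaN     4     7   NaN
I2   NaN     5     8   NaN
I3   NaN     6     9   NaN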
Note:
This solution was originally suggested in a comment; it is included here as an answer with a reproducible example for future reference.
Related
I have the following sample df from which I want to drop rows with missing values in the period1 column. I use the dropna method with the subset parameter as a list (test1) and as a string (test2); both return the same result.
I am trying to find out:
Would specifying the subset parameter as a string (since I only have one column) cause any issue?
If yes, could it be due to a different version of Python (currently I am using 3.7)?
If yes, could it be due to running the code on a different OS (currently I am running on Windows, not on a server)?
What would be the potential issue?
Any suggestion would be greatly appreciated.
import numpy as np
import pandas as pd
data1 = {'item': ['item1', 'item2', 'item3', 'item4'],
         'period1': [np.nan, 222, 5555, 123],
         'period2': [4567, 3333, 123, 123],
         'period3': [1234, 254, 9993, 321],
         'period4': [999, 525, 2345, 963]}
df1 = pd.DataFrame(data1)
test1 = df1.copy()
test2 = df1.copy()
# period1 in a list
test1.dropna(subset=['period1'], inplace=True)
# period1 as a string
test2.dropna(subset='period1', inplace=True)
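A quick way to confirm the observation (both calls drop the same row, so the two frames are identical):
print(test1.equals(test2))  # prints True: list and string subset behave the same here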
I'm looking to convert a column to lower case. The issue is that there are some instances where the string within the column contains only numbers. In my real-life case this is due to poor data entry. Instead of having these values converted to NaN, I would like to keep the numeric string as is. What is the best approach to achieving this?
Below are my current code and output.
import pandas as pd
df = pd.DataFrame({'col':['G5051', 'G5052', 5053, 'G5054']})
df['col'].str.lower()
Current Output:
0    g5051
1    g5052
2      NaN
3    g5054
Name: col, dtype: object
Desired Output:
0    g5051
1    g5052
2     5053
3    g5054
Name: col, dtype: object
Just convert the column to strings first:
import pandas as pd
df = pd.DataFrame({'col':['G5051', 'G5052', 5053, 'G5054']})
print(df['col'].astype(str).str.lower())
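For reference, this prints the desired output, since 5053 is first converted to the string '5053' and then lowercased (a no-op for digits):
0    g5051
1    g5052
2     5053
3    g5054
Name: col, dtype: object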
Pre-define the data as str format when creating the DataFrame:
import pandas as pd
df = pd.DataFrame({'col':['G5051', 'G5052', 5053, 'G5054']}, dtype=str)
print(df['col'].str.lower())
To add a slight variation to Tim Roberts' solution, without using the .str accessor:
import pandas as pd
df = pd.DataFrame({'col':['G5051', 'G5052', 5053, 'G5054']})
print(df['col'].astype(str).apply(lambda x: x.lower()))
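This produces the same result as the astype-based answer above; the only difference is that the lowercasing happens through a Python-level function call per element instead of the .str accessor.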
I am quite new to Python programming.
I am working with the following dataframe:
Before
Note that in column "FBgn" there is a mix of FBgn and FBtr string values. I would like to replace the FBtr-containing values with the FBgn values provided in the adjacent column called "## FlyBase_FBgn", while keeping the existing FBgn values in column "FBgn". Keep in mind that I am showing only a portion of the dataframe (it actually has 1432 rows). How would I do that? I tried the replace() method from Pandas, but it did not work.
This is actually what I would like to have:
After
Thanks a lot!
With Pandas, you could try:
df.loc[df["FBgn"].str.contains("FBtr"), "FBgn"] = df["## FlyBase_FBgn"]
Welcome to Stack Overflow. Next time, please provide more info, including your code; it is always helpful.
Please see the code below; I think you need something similar.
import pandas as pd
# ignore dict1, I just wanted to recreate your df
dict1 = {"FBgn": ['FBtr389394949', 'FBgn3093840', 'FBtr000025'],
         "FBtr": ['FBgn546466646', '', 'FBgn15565555']}
df = pd.DataFrame(dict1)  # recreating your dataframe
print(df)
# function to replace the values
def replace_values(df):
    for i in range(len(df)):  # iterate over the rows
        if 'tr' in df.loc[i, 'FBgn']:
            # use .loc for the assignment to avoid chained indexing
            df.loc[i, 'FBgn'] = df.loc[i, 'FBtr']
    return df
df = replace_values(df)
# print the new df
print(df)
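With the sample dict above, the FBgn column after the replacement reads FBgn546466646, FBgn3093840, FBgn15565555, which matches what the vectorized .loc answer produces.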
My DataFrame has a complex128 value in one column. When I access another value via the .loc method, it returns a complex128 instead of the stored dtype.
I encountered the problem when I was using some values from a DataFrame inside a class in a function.
Here is a minimal example:
import pandas as pd
arrays = [["f","i","c"],["float","int","complex"]]
ind = pd.MultiIndex.from_arrays(arrays,names=("varname","intended dtype"))
a = pd.DataFrame(columns=ind)
m1 = 1.33+1e-9j
parms1 = [1.,2,None]
a.loc["aa"] = parms1
a.loc["aa","c"] = m1
print(a.dtypes)
print(a.loc["aa","f"])
print("-----------------------------")
print(a.loc["aa",("f","float")])
print("-----------------------------")
print(a["f"])
If the MultiIndex is taken away, this does not happen, so it seems to play a role. But accessing the value the MultiIndex way (with the full column tuple) does not help either.
I noticed that the dtype assignment happens because I have not specified any index at DataFrame creation. That is necessary, because I don't know in advance what will be filled in.
Is this normal behavior, or can I get rid of it?
pandas version is 0.24.2; the behavior is also reproducible in 0.25.3.
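As a side note, the upcasting itself comes from row selection: a row cross-section is a single Series, so mixed numeric columns are upcast to one common dtype. A minimal sketch of that mechanism, with plain columns and made-up values (not the exact setup above):
import pandas as pd
df = pd.DataFrame({"f": [1.5], "i": [2], "c": [1.33 + 1e-9j]})
print(df.dtypes)  # per-column dtypes: float64, int64, complex128
row = df.loc[0]   # selecting a whole row builds one Series
print(row.dtype)  # complex128: every value upcast to the common dtype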
import pandas as pd
import numpy as np
test_df = pd.DataFrame([[1, 2]] * 4, columns=['x', 'y'])
test_df.iloc[0, 0] = '1'
test_df.iloc[0, 0] = 1
test_df.select_dtypes(include=['number'])
I want to know why column x is not included in this case.
I can reproduce this on Pandas v0.19.2. The issue is when, if at all, Pandas chooses to check and recast a series. You first define the series as dtype object with this assignment:
test_df.iloc[0, 0] = '1'
Pandas stores any series with strings as object dtype. You then overwrite a value in the next line without explicitly changing the dtype of the series:
test_df.iloc[0, 0] = 1
But you should not assume this automatically triggers conversion to a numeric dtype for the entire series. As far as I am aware, this is not a documented behaviour. While it may work in more recent versions, it is not a behaviour you should assume for a production workflow.
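If you need column x to be picked up as numeric, a safer route is to recast it explicitly instead of relying on implicit conversion. A minimal sketch, assuming the same setup as the question (to_numeric, or alternatively infer_objects, does the recast):
import pandas as pd
test_df = pd.DataFrame([[1, 2]] * 4, columns=['x', 'y'])
test_df.iloc[0, 0] = '1'  # column x becomes object dtype
test_df.iloc[0, 0] = 1    # x stays object dtype even though all values are now ints
# recast explicitly so select_dtypes sees a numeric column again
test_df['x'] = pd.to_numeric(test_df['x'])
print(test_df.select_dtypes(include=['number']).columns.tolist())  # ['x', 'y']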