Replace NaN values from DataFrame with values from series - python

I am trying to implement code which will do the following with pandas.
def fill_in_capabilities(df):
    capacity_means = df.groupby("LV_Name").mean(["LEO_Capa", "GTO_Capa"])
    for row in df:
        if np.isnan(row["LEO_Capa"]):
            row["LEO_Capa"] = capacity_means[row["LV_Name"]]
    return df
Basically, for the rows in df where the value in the column "LEO_Capa" is NaN, I would like to replace it with the value from the series capacity_means, looked up by that row's "LV_Name". How would one do this with pandas? The code above does not work. Thanks.

You can use fillna together with a group-wise transform, wrapped in a function:
def fill_in_capabilities(df: pd.DataFrame) -> pd.DataFrame:
    df[["LEO_Capa", "GTO_Capa"]] = df[["LEO_Capa", "GTO_Capa"]].fillna(
        df.groupby("LV_Name")[["LEO_Capa", "GTO_Capa"]].transform("mean")
    )
    return df
df = fill_in_capabilities(df)
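To see what the transform-based fill does, here is a small self-contained sketch. The sample rows and the launcher names "A" and "B" are made up for illustration; only the column names come from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "LV_Name": ["A", "A", "B", "B", "B"],
    "LEO_Capa": [10.0, np.nan, 4.0, 6.0, np.nan],
    "GTO_Capa": [np.nan, 2.0, 1.0, np.nan, 3.0],
})

cols = ["LEO_Capa", "GTO_Capa"]

# transform("mean") returns a frame with the same index as df, where each row
# holds the mean of its own LV_Name group (NaN values are skipped by mean).
group_means = df.groupby("LV_Name")[cols].transform("mean")

# fillna aligns on index and columns, so only the NaN cells are replaced.
filled = df.copy()
filled[cols] = filled[cols].fillna(group_means)
print(filled)
```

Working on a copy here is a choice made for the sketch; the function in the answer modifies the frame it receives and also returns it.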

Related

Sorting DataFrame and getting a value of column from it

I have the following dataframe:
newItem = pd.DataFrame({'c1': range(10), 'c2': (1,90,100,50,30,10,50,30,90,1000)})
Which looks like this:
I want to sort the columns by descending order, and extract the i'th row to a new pandas series.
So my function looks like this:
def getLargestRow(dataFrame, indexAfterSort):
    numRows, numCols = dataFrame.shape
    seriesToReturn = pd.Series()
    dataFrame = dataFrame.sort_values(by=list(df.columns), ascending=False)
My problem is that I can't manage to extract row number indexAfterSort of dataFrame into the series.
I've tried to use:
seriesToReturn = seriesToReturn.add(df.iloc[indexAfterSort])
But confusingly I got NaN values, instead of the row values.
The dataframe after sort:
The output I receive (no matter what's the input for row index):
What am I missing here?
Thanks in advance.
It's a good idea to use built-in pandas functions for simple operations like sorting, and sort_values is a good option here. This sorts the rows of the dataframe by the c1 column in descending order:
seriesToReturn = newItem.sort_values('c1', ascending=False)
This returns a dataframe with both the c1 and c2 columns. To get the c2 column as a series, use seriesToReturn = seriesToReturn['c2'].
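The question also asks for the i'th row after sorting. A minimal sketch of that, assuming positional selection with iloc is what is wanted, follows. The name get_largest_row is hypothetical. The last line reproduces the NaN result from the question: Series.add aligns on index labels, and an empty series shares no labels with the row.

```python
import pandas as pd

new_item = pd.DataFrame({
    "c1": range(10),
    "c2": (1, 90, 100, 50, 30, 10, 50, 30, 90, 1000),
})

def get_largest_row(data_frame: pd.DataFrame, index_after_sort: int) -> pd.Series:
    """Sort by all columns in descending order and return one row as a Series."""
    sorted_frame = data_frame.sort_values(by=list(data_frame.columns), ascending=False)
    # iloc selects by position in the sorted frame, not by the original label.
    return sorted_frame.iloc[index_after_sort]

row = get_largest_row(new_item, 0)
print(row)

# Adding to an empty Series aligns on labels; none match, so every value is NaN.
empty_sum = pd.Series(dtype="float64").add(row)
```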

How to match column titles with column contents and reflect the values?

I have a data frame like the following:
I need to keep some values matching the column's title which contains one extra column.
Could you please suggest the solution?
Here is a solution:
# Build a sample dataframe
df = pd.DataFrame({"index": ["B", "D", "C", "A"]}).groupby(["index"]).count()
df["value"] = None
# Function to fill the matching column
def match(index, column):
    if index == column:
        return 1
    else:
        return ""
# Create the matching columns and fill them with the right values
for index in df.index.array:
    df[index] = df.apply(lambda row: match(row.name, index), axis=1)
# Print the resulting dataframe
print(df)

How can I update the value in a pandas dataframe

I have a pandas DataFrame df which consists of three columns: doc1, doc2, value
I set value to 0 in all the rows. I want to update the value using the Jaccard similarity function (suppose it is defined).
I do the following:
df['value'] = 0
for index, row in df.iterrows():
    sim = jaccardSim(row['doc1'], row['doc2'])
    df.at[index, 'value'] = sim
Unfortunately, it does not work. When I print df, I get the value 0 in df['value'].
How can I solve that?
You can try:
df['value'] = [jaccardSim(x, y) for x, y in zip(df['doc1'], df['doc2'])]
You can also wrap the function with np.vectorize and pass the two columns to it. Note that np.vectorize is a convenience wrapper around a Python loop, so it is not faster than the list comprehension:
vect_jaccardSim = np.vectorize(jaccardSim)
df['value'] = vect_jaccardSim(df['doc1'], df['doc2'])
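Since the question leaves the similarity function undefined, the sketch below uses a hypothetical stand-in, jaccard_sim, that compares whitespace-separated tokens. The sample documents are made up as well. It only illustrates assigning the whole column at once.

```python
import pandas as pd

def jaccard_sim(a: str, b: str) -> float:
    """Hypothetical stand-in: Jaccard similarity of whitespace-separated tokens."""
    set_a, set_b = set(a.split()), set(b.split())
    union = set_a | set_b
    # Define the similarity of two empty documents as 0.0 to avoid dividing by zero.
    return len(set_a & set_b) / len(union) if union else 0.0

# Made-up sample documents.
df = pd.DataFrame({
    "doc1": ["a b c", "x y", "p q"],
    "doc2": ["a b d", "x y", "r s"],
})

# Assign the whole column at once instead of writing cell by cell.
df["value"] = [jaccard_sim(x, y) for x, y in zip(df["doc1"], df["doc2"])]
print(df)
```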

fillna does not fill the NaN cells in the DataFrame

What am I missing? fillna doesn't fill NaN values:
# filling a multi-column df with values
df.fillna(method='ffill', inplace=True)
df.fillna(method='bfill', inplace=True)
# just for kicks
df = df.fillna(method='ffill')
df = df.fillna(method='bfill')
# returns True
print(df.isnull().values.any())
I verified it: I still see NaN values in some of the first cells.
Edit
So I'm trying to write it myself:
def bfill(df):
    for column in df:
        for cell in df[column]:
            if cell is not None:
                tmpValue = cell
                break
        for cell in df[column]:
            if cell is not None:
                break
            cell = tmpValue
However, it doesn't work... Isn't the cell passed by reference?
ffill fills each NaN with the last non-NaN value above it in the same column, and bfill fills it with the next non-NaN value below it. So ffill alone cannot fill NaNs at the top of a column, and bfill alone cannot fill NaNs at the bottom. Try doing both, one after the other. If any column is entirely NaN, you will need to fill again with axis=1 (although I get a NotImplementedError when I try to do this with inplace=True on Python 3.6, which is super annoying).
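The difference between the two directions can be seen on a tiny frame. This is a sketch with made-up values. It uses DataFrame.ffill and DataFrame.bfill, the method forms of fillna(method=...), which newer pandas releases deprecate.

```python
import numpy as np
import pandas as pd

# Made-up values: "a" starts with NaN, "b" ends with NaN.
df = pd.DataFrame({
    "a": [np.nan, 1.0, np.nan, 3.0],
    "b": [5.0, np.nan, 7.0, np.nan],
})

forward = df.ffill()       # the leading NaN in "a" has nothing above it to copy
backward = df.bfill()      # the trailing NaN in "b" has nothing below it to copy
both = df.ffill().bfill()  # chaining the two fills both ends

print(forward, backward, both, sep="\n\n")
```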
So, I don't know why, but moving the fillna calls outside the function fixed it.
Original:
def doWork(df):
    ...
    df = df.fillna(method='ffill')
    df = df.fillna(method='bfill')
def main():
    ..
    doWork(df)
    print(df.head(5))  # shows NaN
Solution:
def doWork(df):
    ...
def main():
    ..
    doWork(df)
    df = df.fillna(method='ffill')
    df = df.fillna(method='bfill')
    print(df.head(5))  # no NaN
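A likely explanation for why moving the calls helped: inside the function, df = df.fillna(...) binds the local name df to a new DataFrame, so the caller's object is never changed. The sketch below contrasts that with returning the result. The function names and sample data are hypothetical.

```python
import numpy as np
import pandas as pd

def do_work_rebind(df: pd.DataFrame) -> None:
    # Rebinds the local name only; the caller's DataFrame is not modified.
    df = df.ffill().bfill()

def do_work_return(df: pd.DataFrame) -> pd.DataFrame:
    # Returns the filled copy so the caller can keep it.
    return df.ffill().bfill()

# Made-up sample data.
data = pd.DataFrame({"a": [np.nan, 1.0, np.nan]})

do_work_rebind(data)
still_has_nan = bool(data.isnull().values.any())

data = do_work_return(data)
has_nan_after_return = bool(data.isnull().values.any())

print(still_has_nan, has_nan_after_return)
```

Returning the frame keeps the fill logic inside the function while still letting the caller see the result.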

How do I overwrite the value of a specific index/column in a DataFrame?

I have a dataframe Exposure with zeros constructed as follows:
Exposure = pd.DataFrame(0, index=dates, columns=tickers)
and a DataFrame df with data.
I want to fill some of the data from df to Exposure:
for index, column in df.iterrows():
    # value of df(index,column) to be filled at Exposure(index,column)
How do I overwrite the value at (index,column) of Exposure with the value of df(index,column)?
The best way is:
df.loc[index, column] = value
You can try this:
for index, column in df.iterrows():
    Exposure.loc[index, column.index] = column.values
This will create new index and column labels in Exposure if they don't exist. If you want to avoid this, construct the common index and columns first, then do the assignment in a vectorized way (avoiding the for loop):
common_index = Exposure.index.intersection(df.index)
common_columns = Exposure.columns.intersection(df.columns)
Exposure.loc[common_index, common_columns] = df.loc[common_index, common_columns]
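Here is a small runnable sketch of the intersection approach. The dates, tickers and values are made up, and the frame is initialised with 0.0 rather than 0 (an assumption of this sketch) so that the columns are float from the start.

```python
import pandas as pd

dates = pd.date_range("2020-01-01", periods=4)  # hypothetical dates
tickers = ["AAA", "BBB", "CCC"]                 # hypothetical tickers

exposure = pd.DataFrame(0.0, index=dates, columns=tickers)

# df shares one date and one ticker with exposure; the rest must be ignored.
df = pd.DataFrame(
    [[1.5, 2.5], [3.5, 4.5]],
    index=pd.to_datetime(["2020-01-02", "2020-01-09"]),
    columns=["BBB", "ZZZ"],
)

common_index = exposure.index.intersection(df.index)
common_columns = exposure.columns.intersection(df.columns)

# Only the overlapping cells are overwritten; no rows or columns are added.
exposure.loc[common_index, common_columns] = df.loc[common_index, common_columns]
print(exposure)
```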
