Pandas: Fill new column by condition row-wise - python

import pandas as pd
import numpy as np
df = pd.DataFrame([np.random.rand(100),100*[0.1],100*[0.3]]).T
df.columns = ["value","lower","upper"]
df.head()
How can I create a new column which indicates whether value is between lower and upper?

You can use between for this purpose.
df['new_col'] = df['value'].between(df['lower'], df['upper'])
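Note that between is inclusive of both endpoints by default; in pandas 1.3+ the inclusive argument controls this. A small sketch reusing the columns above:
# use strict inequalities instead of the default inclusive check
df['new_col_strict'] = df['value'].between(df['lower'], df['upper'], inclusive='neither')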

Related

DataFrame returns empty after .update()

I am trying to create a new DataFrame which contains a calculation based on an original DataFrame.
To that end, I run a for loop with the calculation for each column, but I keep getting the empty DataFrame back and I don't see where the error comes from.
May I ask for some help here?
import yfinance as yf
import pandas as pd
df = yf.download(["YPFD.BA", "GGAL.BA"], period='6mo')
df2 = pd.DataFrame()
for i in ["YPFD.BA", "GGAL.BA"]:
    df2.update(df["Volume"][i] * df["Close"][i])
df2
I expected to get a new DataFrame that keeps the original index but contains the calculation obtained from the original DataFrame.
I think this is what you are looking to do:
import yfinance as yf
import pandas as pd
df = yf.download(["YPFD.BA", "GGAL.BA"], period='6mo')
df2 = pd.DataFrame()
for i in ["YPFD.BA", "GGAL.BA"]:
    df2[i] = df["Volume"][i] * df["Close"][i]
df2
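The reason the original attempt stays empty is that DataFrame.update only overwrites labels that already exist in the calling frame and never adds new columns, so calling it on an empty DataFrame silently does nothing. A minimal alternative sketch (same tickers as above) that builds the result in one constructor call:
import yfinance as yf
import pandas as pd

tickers = ["YPFD.BA", "GGAL.BA"]
df = yf.download(tickers, period='6mo')

# build the Volume * Close frame column by column in a single step
df2 = pd.DataFrame({t: df["Volume"][t] * df["Close"][t] for t in tickers})
df2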

Set a list as the index of a pandas DataFrame

I have a list called: sens_fac = [0.8, 1, 1.2], and a dataframe df defined this way:
import pandas as pd
df = pd.DataFrame(index=range(len(sens_fac)),columns=range(len(factors)))
However, I want to modify the index. I know I can do this in the definition, and it works.
import pandas as pd
df = pd.DataFrame(index=sens_fac,columns=range(len(factors)))
But what if I want to modify the index after it was created? I tried doing this
df.set_index(sens_fac)
But I get this error:
KeyError: 'None of [0.8, 1.2] are in the columns'
You can only call the set_index method on an existing pandas.DataFrame of the same length as your index (so it must not be empty). But there is an index argument in the constructor:
import pandas as pd
import numpy as np
sens_fact = np.random.rand(5)
df = pd.DataFrame(index=sens_fact)
If you want to manipulate an existing pandas.DataFrame, then pandas.DataFrame.set_index() is the correct method, but it expects the name of a column of the table. So you go with:
df = pd.DataFrame(sens_fact, columns=['sens_fact'])
print(df) # dataframe with standard enumerated indices
df.set_index('sens_fact', inplace=True)
print(df) # dataframe with no columns and a non-standard index
Or you can just manipulate the index directly:
df = pd.DataFrame(np.random.rand(len(sens_fact)))
df.index = sens_fact
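set_index also accepts an array-like (Index, Series or ndarray) of the right length directly instead of a column label, so wrapping the list in an Index avoids the KeyError. A sketch, assuming a small three-column frame in place of the undefined factors:
import pandas as pd

sens_fac = [0.8, 1, 1.2]
df = pd.DataFrame(0, index=range(len(sens_fac)), columns=range(3))

# an Index (rather than a plain list) is set directly as the new index
df = df.set_index(pd.Index(sens_fac))
print(df)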

Delete duplicate by using python pandas

I want to delete all records that match a condition.
import pandas as pd
import numpy as np
# Create a DataFrame
d = {
    'Name': ['Alisa', 'Bobby', 'jodha', 'jack', 'raghu', 'Cathrine',
             'Alisa', 'Bobby', 'kumar', 'Alisa', 'Alex', 'Cathrine'],
    'Age': [26, 24, 23, 22, 23, 24, 26, 24, 22, 23, 24, 24],
    'Score': [85, 63, 55, 74, 31, 77, 85, 63, 42, 62, 89, 77]}
df = pd.DataFrame(d, columns=['Name', 'Age', 'Score'])
df
I want to remove the duplicate record of "Alisa", who has Score = 85.
I have tried the code below, but it still displays "Alisa":
df1 = df[df['Score']==85]
df.drop_duplicates(['Name'])
If you want to drop all duplicates where 'Score' is equal to 85, you can use the following solution:
df1 = df[df['Score'] == 85].drop_duplicates(keep='last')
df.drop(df1.index, inplace=True)
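An equivalent approach with a boolean mask, in case you prefer to avoid inplace operations (a sketch that, like the answer above, keeps the first occurrence):
# mark later occurrences of the same Name/Score pair and drop the ones scoring 85
df = df[~(df.duplicated(subset=['Name', 'Score']) & (df['Score'] == 85))]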

Pandas: need to remove the row that contains a string. BUT my condition is not working

from chainer import datasets
from chainer.datasets import tuple_dataset
import numpy as np
import matplotlib.pyplot as plt
import chainer
import pandas as pd
import math
I have a CSV file that contains 40300 records.
df =pd.read_csv("Myfile.csv", header = None)
In this part I am removing the rows and columns that should be ignored:
columns = [0,1]
rows = [0,1,2]
df.drop(columns, axis = 1, inplace = True) # drop the first two columns, which the code does not need
df.drop(rows, axis = 0, inplace = True) # drop the first three rows, which the code does not need
In this part I want to remove a row whenever it contains a certain string, BUT it is not working:
df[~df.E.str.contains("Intf Shut")] # this part is not working for me
df.to_csv('summary.csv', index = False, header = False)
df.head()
You have to assign the result back to df:
df = df[~df.E.str.contains("Intf Shut")]
Also, because the file was read with header = None, the columns have integer labels, so you have to refer to the column by its position (here the third column, df[2]) rather than by a name such as E. First define a variable to_drop with the exact strings you want to remove, then filter with isin:
to_drop = ['My text 1', 'My text 2']
df = df[~df[2].isin(to_drop)]
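If a substring match is wanted (as in the original str.contains attempt) rather than an exact match, a sketch under the same assumptions (third column, integer labels from header = None):
# na=False treats non-string cells as non-matching instead of propagating NaN
df = df[~df[2].astype(str).str.contains("Intf Shut", na=False)]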

Python Pandas DataFrame Difference Between Masking Left side only and Both Side

Let's say we have a pandas DataFrame like the one below:
import pandas as pd
df = pd.DataFrame([{'col1':'a', 'col2':'b'}, {'col1':None, 'col2':'d'}, {'col1':'e', 'col2':'f'}, {'col1':None, 'col2':'1'}])
Is there any difference between these two pieces of code?
df.loc[~df['col1'].isnull(), 'col1'] = df['col1'].str.upper()
print(df)
vs
mask = ~df['col1'].isnull()
df.loc[mask, 'col1'] = df[mask]['col1'].str.upper()
print(df)
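For what it's worth, both versions produce the same result on this frame: the .loc assignment aligns the right-hand side on the index, so whether .str.upper() is computed over the whole column or only over the masked subset, only the masked rows get written. A quick sketch to verify (a check, not an authoritative answer):
import pandas as pd

df = pd.DataFrame([{'col1': 'a', 'col2': 'b'}, {'col1': None, 'col2': 'd'},
                   {'col1': 'e', 'col2': 'f'}, {'col1': None, 'col2': '1'}])
mask = ~df['col1'].isnull()

left_only = df.copy()
left_only.loc[mask, 'col1'] = left_only['col1'].str.upper()   # full column on the right

both_sides = df.copy()
both_sides.loc[mask, 'col1'] = both_sides[mask]['col1'].str.upper()  # masked subset on the right

print(left_only.equals(both_sides))  # True for this example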
