DataFrame returns empty after .update() - python

I am trying to create a new DataFrame that contains a calculation based on an original DataFrame.
To do this, I run a for loop that performs the calculation for each column, but I still get an empty DataFrame and I can't see where the error comes from.
Could I ask for some help here?
import yfinance as yf
import pandas as pd
df = yf.download(["YPFD.BA", "GGAL.BA"], period='6mo')
df2 = pd.DataFrame()
for i in ["YPFD.BA", "GGAL.BA"]:
    df2.update(df["Volume"][i] * df["Close"][i])
df2
I expected to get a new DataFrame that keeps the original index and holds the values calculated from the original DataFrame.

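DataFrame.update() only overwrites values at index and column labels that already exist in the calling DataFrame; it never adds rows or columns, so an empty df2 has nothing to update. Assign each result as a new column instead. I think this is what you are looking to do: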
import yfinance as yf
import pandas as pd
df = yf.download(["YPFD.BA", "GGAL.BA"], period='6mo')
df2 = pd.DataFrame()
for i in ["YPFD.BA", "GGAL.BA"]:
    df2[i] = df["Volume"][i] * df["Close"][i]
df2
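The difference between update() and column assignment can be checked without downloading anything. The sketch below is only an illustration: the small frame with made-up prices and volumes is an assumption that stands in for the yfinance result, and the name target is hypothetical. It also shows that, assuming the same two-level column layout, the loop may not be needed at all.

```python
import pandas as pd

# Hypothetical stand-in for the yfinance result: two-level columns
# (field, ticker) filled with made-up numbers.
columns = pd.MultiIndex.from_product(
    [["Close", "Volume"], ["YPFD.BA", "GGAL.BA"]]
)
df = pd.DataFrame(
    [[10.0, 20.0, 100, 200],
     [11.0, 21.0, 110, 210]],
    index=pd.to_datetime(["2024-01-02", "2024-01-03"]),
    columns=columns,
)

# update() only overwrites labels that already exist, so it does not
# add the "YPFD.BA" column to a frame that lacks it.
target = pd.DataFrame({"existing": [0.0, 0.0]}, index=df.index)
target.update(df["Volume"]["YPFD.BA"] * df["Close"]["YPFD.BA"])

# Column assignment creates the column if it is missing.
df2 = pd.DataFrame()
for ticker in ["YPFD.BA", "GGAL.BA"]:
    df2[ticker] = df["Volume"][ticker] * df["Close"][ticker]

# Multiplying the two sub-frames aligns on ticker and avoids the loop.
df3 = df["Volume"] * df["Close"]

print(target.columns.tolist())
print(df2)
print(df3)
```

The last form relies on both sub-frames having the same ticker labels, which is the case for the layout assumed here.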

Related

How to assign new column to existing DataFrame in pandas

I'm new to pandas. I'm trying to add a new column to my existing DataFrame, but it is not getting assigned and I don't know why. Can anyone explain what I'm missing? This is what I tried:
import pandas as pd
df = pd.DataFrame(data = {"test":["mkt1","mkt2","mkt3"],
                          "test2":["cty1","cty2","cty3"]})
print("Before",df.columns)
df.assign(test3="Hello")
print("After",df.columns)
Output
Before Index(['test', 'test2'], dtype='object')
After Index(['test', 'test2'], dtype='object')
The pandas assign method returns a new DataFrame that includes the new column; it does not modify the original in place. Assign the result back to the variable:
import pandas as pd
df = pd.DataFrame(data = {"test":["mkt1","mkt2","mkt3"],
                          "test2":["cty1","cty2","cty3"]})
print("Before",df.columns)
df = df.assign(test3="Hello") # <--- Note the variable reassignment
print("After",df.columns)
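As a small side-by-side sketch of the two ways to add a column (the names with_new and in_place are only illustrative), assign() can be compared with plain bracket assignment:

```python
import pandas as pd

df = pd.DataFrame({"test": ["mkt1", "mkt2", "mkt3"],
                   "test2": ["cty1", "cty2", "cty3"]})

# assign() leaves df untouched and returns a new frame with the extra column.
with_new = df.assign(test3="Hello")

# Bracket assignment adds the column to the frame it is called on.
in_place = df.copy()
in_place["test3"] = "Hello"

print(df.columns.tolist())
print(with_new.columns.tolist())
print(in_place.columns.tolist())
```

assign() is convenient in method chains, while bracket assignment is the shorter choice when the frame is meant to be changed directly.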

DataFrame values converted to 'nan' after applying df.iloc()

I ran into a problem after calling pd.DataFrame(): the whole DataFrame became NaN (empty), and I could not reverse this. I also assigned column names to the DataFrame, but the values disappeared there as well:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv('PuntaCapi.csv', header=None, sep='\n')
df = df[0].str.split(',', expand=True)
df.to_csv("PuntaCapi.tab",sep="\t",header=None, index=False)
print(df)
Akim =df.iloc[:,0:1]
A= pd.DataFrame(data =Akim ,columns=['Akim'])
veriler2 = pd.DataFrame(data = df, columns=['Akim','Kuvvet','Zaman','Soguma','Yaklasma','Baski','SacKalinliği','PuntaCapi'])
print(veriler2)
Please view the following results from the above mentioned code:
[image: Spyder view of the DataFrame]
There are no NaN values in the CSV file, but after .iloc[] the entire DataFrame became NaN. I have tried to solve the problem but could not. I need help solving it.
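I am not sure I fully understand the question, but the NaN values come from how veriler2 is built.
After df[0].str.split(',', expand=True), df has integer column labels (0, 1, 2, ...). When a DataFrame is passed as data together with columns=, pandas selects the columns with those labels instead of renaming them, and since none of the requested names exist, every column comes back as NaN.
Pass the underlying array (df.values) instead, so that the names are assigned by position: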
df = pd.read_csv('PuntaCapi.csv', header=None, sep='\n')
df = df[0].str.split(',', expand=True)  # keep the split so there are eight columns to name
veriler2 = pd.DataFrame(data = df.values, columns=['Akim','Kuvvet','Zaman','Soguma','Yaklasma','Baski','SacKalinliği','PuntaCapi'])
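The effect can be reproduced without the CSV file. In this sketch the two text rows are made up and only three of the column names are used, so it is an illustration under those assumptions rather than the asker's actual data:

```python
import pandas as pd

# Hypothetical rows standing in for the lines of PuntaCapi.csv.
raw = pd.Series(["1,2,3", "4,5,6"])
df = raw.str.split(",", expand=True)  # columns are labelled 0, 1, 2

names = ["Akim", "Kuvvet", "Zaman"]

# Passing a DataFrame plus columns= looks the names up as labels.
# None of them exist in df, so every cell is NaN.
all_nan = pd.DataFrame(data=df, columns=names)

# Passing the raw array assigns the names by position.
by_position = pd.DataFrame(data=df.values, columns=names)

# Renaming the existing columns works as well.
renamed = df.copy()
renamed.columns = names

print(all_nan)
print(by_position)
print(renamed)
```

The same reasoning applies to the Akim line in the question: df.iloc[:, 0:1] is still a DataFrame whose only column is labelled 0, so asking for a column named 'Akim' yields NaN.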

Delete duplicate by using python pandas

I want to delete all records that match a condition.
import pandas as pd
import numpy as np
# Create a DataFrame
d = {
    'Name':['Alisa','Bobby','jodha','jack','raghu','Cathrine',
            'Alisa','Bobby','kumar','Alisa','Alex','Cathrine'],
    'Age':[26,24,23,22,23,24,26,24,22,23,24,24],
    'Score':[85,63,55,74,31,77,85,63,42,62,89,77]}
df = pd.DataFrame(d,columns=['Name','Age','Score'])
df
I want to remove the duplicate record of "Alisa", the one with Score = 85.
I have tried the code below, but the duplicate "Alisa" row is still displayed:
df1 = df[df['Score']==85]
df.drop_duplicates(['Name'])
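If you want to drop the duplicates where 'Score' is equal to 85, you can use the following solution. df1 holds the last occurrence of each distinct row with Score 85, and dropping its index labels from df leaves the earlier copy in place: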
df1 = df[df['Score'] == 85].drop_duplicates(keep='last')
df.drop(df1.index, inplace=True)
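An alternative sketch, not taken from the answer above, builds a boolean mask with duplicated() and combines it with the score condition. The names is_repeat, to_drop and result are illustrative only:

```python
import pandas as pd

d = {
    'Name': ['Alisa', 'Bobby', 'jodha', 'jack', 'raghu', 'Cathrine',
             'Alisa', 'Bobby', 'kumar', 'Alisa', 'Alex', 'Cathrine'],
    'Age': [26, 24, 23, 22, 23, 24, 26, 24, 22, 23, 24, 24],
    'Score': [85, 63, 55, 74, 31, 77, 85, 63, 42, 62, 89, 77]}
df = pd.DataFrame(d, columns=['Name', 'Age', 'Score'])

# True for every row that repeats an earlier, fully identical row.
is_repeat = df.duplicated(keep="first")

# Restrict the removal to repeated rows whose Score is 85.
to_drop = is_repeat & (df["Score"] == 85)

# Boolean indexing returns a new frame; df itself is unchanged.
result = df[~to_drop]
print(result)
```

With this mask, rows that are unique are never removed, and duplicates with other scores (such as the second "Bobby" row) are kept. Using keep=False in duplicated() would instead mark every copy, including the first one.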

Trying to iterate over DataFrame created from excel

I have an Excel file that I open with pandas and load into a DataFrame. It all works well until I try to iterate over a column in the DataFrame with a for loop. I get either "df does not exist" or "iterrows() missing 1 required positional argument: 'self'".
I tried adding "from pandas import dataframe" and "import dataframe as df" to the code, but neither works.
import pandas as pd
from pandas import DataFrame as df
def getFunc():
    df = pd.read_excel('filename.xlsx')
for index, row in df.iterrows(): # this throws the exception
    some_list = row[ColName] * some_val
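The df created inside getFunc() is local to the function and does not exist outside it. At module level, the name df refers to the DataFrame class imported by "from pandas import DataFrame as df", which is why df.iterrows() complains about a missing 'self'. Drop that import and return the DataFrame from the function: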
import pandas as pd
def getFunc():
    df = pd.read_excel('filename.xlsx')
    return df
df1 = getFunc()
for index, row in df1.iterrows():
    # ColName and some_val are placeholders taken from the question
    some_list = row[ColName] * some_val
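A runnable sketch of the same pattern follows. Since the Excel file is not available, load_frame() returns a small made-up frame in place of pd.read_excel(); the column name "price" and the multiplier factor are assumptions chosen only for illustration:

```python
import pandas as pd

def load_frame():
    # Stand-in for pd.read_excel('filename.xlsx'); the data is made up.
    return pd.DataFrame({"price": [1.5, 2.0, 4.0]})

# The returned frame is bound to a name at module level,
# so the loop below can see it.
frame = load_frame()
factor = 10  # hypothetical multiplier

scaled_rows = []
for index, row in frame.iterrows():
    scaled_rows.append(row["price"] * factor)

# The same result without an explicit loop.
scaled_vectorised = (frame["price"] * factor).tolist()

print(scaled_rows)
print(scaled_vectorised)
```

For simple arithmetic on a column, the vectorised form is usually shorter and faster than iterrows().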

Pandas: Fill new column by condition row-wise

import pandas as pd
import numpy as np
df = pd.DataFrame([np.random.rand(100),100*[0.1],100*[0.3]]).T
df.columns = ["value","lower","upper"]
df.head()
How can I create a new column that indicates whether value is between lower and upper?
You can use between for this purpose. It compares element-wise and includes both bounds by default.
df['new_col'] = df['value'].between(df['lower'], df['upper'])
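To make the boundary behaviour visible, here is a small sketch with fixed values instead of random ones (the numbers and the column names inside and strictly_inside are illustrative). It assumes a pandas version in which the inclusive parameter accepts strings such as "neither":

```python
import pandas as pd

df = pd.DataFrame({
    "value": [0.05, 0.1, 0.2, 0.3, 0.9],
    "lower": 0.1,
    "upper": 0.3,
})

# Default: both bounds count as "between".
df["inside"] = df["value"].between(df["lower"], df["upper"])

# Exclude the bounds themselves.
df["strictly_inside"] = df["value"].between(
    df["lower"], df["upper"], inclusive="neither"
)

print(df)
```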
