I have an Excel file which I open with pandas and put into a DataFrame. It all works well until I try to iterate over a column in the DataFrame using a for loop. I get either "df does not exist" or "iterrows() missing 1 required positional argument: 'self'".
I tried adding the line from pandas import DataFrame as df to my code (and also import dataframe as df); neither works.
import pandas as pd
from pandas import DataFrame as df
def getFunc():
    df = pd.read_excel('filename.xlsx')
for index, row in df.iterrows():  # this throws an exception
    some_list = row[ColName] * some_val
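The DataFrame created inside getFunc() is a local variable, so it does not exist outside the function. At module level, df is the DataFrame class from your import, and calling iterrows() on the class rather than on an instance is what causes the "missing 1 required positional argument: 'self'" error. Return the DataFrame from the function instead (and drop the from pandas import DataFrame as df line):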
import pandas as pd
def getFunc():
    df = pd.read_excel('filename.xlsx')
    return df
df1 = getFunc()
for index, row in df1.iterrows():
    some_list = row[ColName] * some_val
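A minimal, self-contained sketch of the same fix, assuming a hypothetical get_frame() that builds a small DataFrame in place of read_excel (so it runs without filename.xlsx) and a hypothetical "price" column standing in for ColName. It also reproduces the 'self' error by calling iterrows() on the DataFrame class, and it appends each result to a list, assuming the intent is to collect one value per row (the original overwrote some_list on every pass):

```python
import pandas as pd

def get_frame():
    # Hypothetical stand-in for pd.read_excel('filename.xlsx').
    df = pd.DataFrame({"price": [1.0, 2.5, 4.0]})
    return df  # df is local; returning it is how the caller gets hold of it

# What went wrong: at module level `df` was the DataFrame *class*, and calling
# iterrows() on the class (not on an instance) leaves `self` unfilled.
try:
    pd.DataFrame.iterrows()
except TypeError as exc:
    print("TypeError:", exc)

some_val = 2        # hypothetical multiplier
df1 = get_frame()   # a real DataFrame instance

some_list = []
for index, row in df1.iterrows():
    some_list.append(row["price"] * some_val)

# The same computation without an explicit loop:
vectorized = (df1["price"] * some_val).tolist()
print(vectorized)  # [2.0, 5.0, 8.0]
```

For simple column arithmetic like this, the vectorized form is usually preferable to iterrows().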
I am trying to create a new DataFrame which contains a calculation from an original DF.
To that end, I run a for loop that does the calculation for each ticker, but I still end up with an empty DataFrame and I can't see where the error is.
May I ask for some help here?
import yfinance as yf
import pandas as pd
df = yf.download(["YPFD.BA", "GGAL.BA"], period='6mo')
df2 = pd.DataFrame()
for i in ["YPFD.BA", "GGAL.BA"]:
    df2.update(df["Volume"][i] * df["Close"][i])
df2
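I expected to get a new DataFrame that keeps the original index but contains the values calculated from the original DataFrame.
DataFrame.update only overwrites cells that already exist in df2; it never adds rows or columns, and since df2 is empty there is nothing to update. Assign each result as a new column instead: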
import yfinance as yf
import pandas as pd
df = yf.download(["YPFD.BA", "GGAL.BA"], period='6mo')
df2 = pd.DataFrame()
for i in ["YPFD.BA", "GGAL.BA"]:
    df2[i] = df["Volume"][i] * df["Close"][i]
df2
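A sketch that runs without downloading anything: it builds a small synthetic frame assumed to have the same (field, ticker) MultiIndex column layout that yf.download returns for several tickers, using made-up tickers AAA and BBB and made-up prices. It shows the loop from the answer and a vectorized alternative that multiplies the Volume and Close sub-frames in one step:

```python
import pandas as pd

# Synthetic stand-in for yf.download([...]): assumed (field, ticker) column layout,
# which is what df["Volume"][ticker] relies on.
dates = pd.date_range("2024-01-01", periods=3, freq="D")
columns = pd.MultiIndex.from_product([["Close", "Volume"], ["AAA", "BBB"]])
df = pd.DataFrame(
    [[10.0, 20.0, 100, 200],
     [11.0, 21.0, 110, 210],
     [12.0, 22.0, 120, 220]],
    index=dates,
    columns=columns,
)
tickers = ["AAA", "BBB"]

# Loop version: give df2 the original index, then create one column per ticker.
df2 = pd.DataFrame(index=df.index)
for ticker in tickers:
    df2[ticker] = df["Volume"][ticker] * df["Close"][ticker]

# Vectorized version: both sub-frames share the index and the ticker columns,
# so element-wise multiplication aligns them automatically.
df3 = df["Volume"] * df["Close"]

print(df2)
print(df3)
```

Both produce one column per ticker on the original date index; the vectorized form avoids maintaining the ticker list by hand.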
Say I have multiple pandas DataFrames and I want to write a function (def) that checks their number of columns. How can I do that? I already tried on my own, but when I run the following code it raises a TypeError:
import pandas as pd
def load_csv(filename):
    filename = pd.read_csv(filename)
    return filename
def columns_count(f):
    f = load_csv(f)
    columns = f.shape[1]
    return columns
Here is code that loops through the files, loads each one into a DataFrame, and passes the DataFrame itself (not its shape) to the counting function:
import pandas as pd
def count_num_cols(df):
    return len(df.columns)  # or df.shape[1]
list_of_paths = ["C://Users//file.txt", ]
for a_path in list_of_paths:
    df = pd.read_csv(a_path)
    print(count_num_cols(df))
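A self-contained sketch of the same idea, using io.StringIO with made-up CSV text in place of real file paths (the names first.csv and second.csv are hypothetical). Loading and counting are kept separate, so the counting function only ever receives a DataFrame:

```python
import io
import pandas as pd

def count_num_cols(df: pd.DataFrame) -> int:
    """Return the number of columns of an already-loaded DataFrame."""
    return df.shape[1]  # equivalently len(df.columns)

# Hypothetical in-memory CSV contents standing in for files on disk.
csv_sources = {
    "first.csv": "a,b\n1,2\n",
    "second.csv": "a,b,c\n1,2,3\n4,5,6\n",
}

counts = {}
for name, text in csv_sources.items():
    df = pd.read_csv(io.StringIO(text))  # with real files: pd.read_csv(path)
    counts[name] = count_num_cols(df)

print(counts)  # {'first.csv': 2, 'second.csv': 3}
```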
I have a list sens_fac = [0.8, 1, 1.2] and a DataFrame df defined this way:
import pandas as pd
df = pd.DataFrame(index=range(len(sens_fac)),columns=range(len(factors)))
However, I want to modify the index. I know I can do this in the definition, and it works.
import pandas as pd
df = pd.DataFrame(index=sens_fac,columns=range(len(factors)))
But what if I want to modify the index after the DataFrame has been created? I tried this:
df.set_index(sens_fac)
But I get this error:
KeyError: 'None of [0.8, 1.2] are in the columns'
DataFrame.set_index interprets a plain list as a list of column labels, which is why you get a KeyError for the values that are not column names. If you are creating the DataFrame anyway, you can pass the index to the constructor:
import pandas as pd
import numpy as np
sens_fact = np.random.rand(5)
df = pd.DataFrame(index=sens_fact)
If you want to change the index of an existing pandas.DataFrame, then pandas.DataFrame.set_index() is the right method, but it expects the name of a column of the table (or an array, such as a pd.Index, with the same length as the DataFrame). So you can go with:
df = pd.DataFrame(sens_fact, columns=['sens_fact'])
print(df) # dataframe with standard enumerated indices
df.set_index('sens_fact', inplace=True)
print(df)  # dataframe with no columns and a non-standard index
Or you can just manipulate the index directly:
df = pd.DataFrame(np.random.rand(len(sens_fact)))
df.index = sens_fact
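A short sketch of two ways to change the index after creation, applied to the question's sens_fac. The question never shows factors, so a two-element list is assumed here. Wrapping the labels in pd.Index makes set_index treat them as values rather than as column names:

```python
import pandas as pd

sens_fac = [0.8, 1, 1.2]
factors = ["f1", "f2"]  # hypothetical; not shown in the question

# Option 1: assign the labels directly (length must equal the number of rows).
df = pd.DataFrame(index=range(len(sens_fac)), columns=range(len(factors)))
df.index = sens_fac

# Option 2: set_index with a pd.Index, so the list is used as values, not keys.
# set_index returns a new DataFrame unless inplace=True is passed.
df2 = pd.DataFrame(index=range(len(sens_fac)), columns=range(len(factors)))
df2 = df2.set_index(pd.Index(sens_fac))

print(df.index.tolist())   # [0.8, 1.0, 1.2]
print(df2.index.tolist())  # [0.8, 1.0, 1.2]
```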
I'm new to pandas. I'm trying to add a new column to my existing DataFrame, but it isn't getting assigned and I don't know why. Can anyone explain what I'm missing? This is what I tried:
import pandas as pd
df = pd.DataFrame(data = {"test":["mkt1","mkt2","mkt3"],
"test2":["cty1","cty2","cty3"]})
print("Before",df.columns)
df.assign(test3="Hello")
print("After",df.columns)
Output
Before Index(['test', 'test2'], dtype='object')
After Index(['test', 'test2'], dtype='object')
The pandas assign method returns a new DataFrame with the added column; it does not modify the original in place, so you need to reassign the result:
import pandas as pd
df = pd.DataFrame(data = {"test":["mkt1","mkt2","mkt3"],
"test2":["cty1","cty2","cty3"]})
print("Before",df.columns)
df = df.assign(test3="Hello")  # <--- Note the variable reassignment
print("After",df.columns)
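To make the difference concrete, here is a small sketch comparing assign() with plain item assignment, which does modify the DataFrame it is applied to:

```python
import pandas as pd

df = pd.DataFrame({"test": ["mkt1", "mkt2", "mkt3"],
                   "test2": ["cty1", "cty2", "cty3"]})

# assign() builds and returns a new DataFrame; df itself is left untouched.
result = df.assign(test3="Hello")
print(list(df.columns))      # ['test', 'test2']
print(list(result.columns))  # ['test', 'test2', 'test3']

# Item assignment, by contrast, adds the column to df in place.
df["test3"] = "Hello"
print(list(df.columns))      # ['test', 'test2', 'test3']
```

assign() is convenient in method chains precisely because it returns a new object; item assignment is the direct way to mutate an existing frame.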
Let me first outline the overall context of the problem at hand with the following code snippet:
import pandas as pd
df = pd.read_csv("abc.csv")
df.as_matrix
The desired matrix [100 rows x 785 columns] is output.
I am having difficulty printing (using print()) a row of the above matrix.
I tried the following, but in vain:
print(df[0])
print(df[:, 0])
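as_matrix() returns a 2-D NumPy array (an array of row arrays), so indexing it with [i] gives row i. Note that as_matrix() was deprecated and then removed in pandas 1.0; to_numpy() is its replacement. The following code should work:
import pandas as pd
df = pd.read_csv("abc.csv")
matrix = df.to_numpy()
print(matrix[0])  # output the first row
print(matrix[3:5])  # output rows 3 and 4 (0-based)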
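A runnable sketch of the same row access on current pandas, using a small synthetic 5x4 frame in place of abc.csv (whose contents are not shown). It also shows df.iloc, which returns the row as a labeled Series instead of a bare array:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for abc.csv: 5 rows x 4 columns holding 0..19.
df = pd.DataFrame(np.arange(20).reshape(5, 4), columns=["a", "b", "c", "d"])

matrix = df.to_numpy()      # 2-D NumPy array, shape (5, 4)
first_row = matrix[0]       # first row as a 1-D array
rows_3_and_4 = matrix[3:5]  # rows at positions 3 and 4, shape (2, 4)

# The same first row without leaving pandas (keeps the column labels):
first_row_series = df.iloc[0]

print(first_row)
print(rows_3_and_4)
print(first_row_series)
```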
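I think you are missing the parentheses: without them, df.as_matrix refers to the method itself instead of calling it. On pandas 1.0 and later, where as_matrix() no longer exists, use to_numpy():
df.to_numpy()[0]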
Or you can use DataFrame.head, documented at:
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.head.html
.head(n=5) returns the first n rows, so the first row is:
df.head(1).to_numpy()
To get the row at integer position A (iloc is position-based, not label-based):
df.iloc[A, :785]
In general, you can use df.iloc to slice a pandas DataFrame by integer position. For example, to keep the first 100 rows and the first 785 columns:
import pandas as pd
df = pd.read_csv("abc.csv")
df = df.iloc[:100, :785]
df.to_numpy()  # as_matrix() was removed in pandas 1.0
After converting to a NumPy array you are working with a 2-D array (not a list of lists), so you can index it as [row, column]. The first row is:
print(df.to_numpy()[0, :])
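Because iloc is positional, it only matches "the row whose index is A" when the index is the default 0..n-1. This sketch uses a hypothetical frame with index labels 10, 20 and 30 to contrast iloc (position) with loc (label) and with positional NumPy indexing:

```python
import pandas as pd

# Hypothetical frame whose index labels are not 0..n-1.
df = pd.DataFrame({"col1": [1, 4, 7], "col2": [2, 5, 8]}, index=[10, 20, 30])

by_position = df.iloc[0]   # first row, whatever its label is
by_label = df.loc[20]      # row whose index label is 20 (the second row)
block = df.iloc[:2, :1]    # first two rows, first column, still a DataFrame

arr = df.to_numpy()
first_row = arr[0, :]      # NumPy indexing is always positional

print(by_position.tolist(), by_label.tolist(), first_row.tolist())
```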
Here is a working example:
from io import StringIO
import pandas as pd
st = """
col1|col2|col3
1|2|3
4|5|6
7|8|9
"""
df = pd.read_csv(StringIO(st), sep="|")
print("print first row from a matrix")
print(df.to_numpy()[0, :])
print("print the first two rows of one column")
print(df.iloc[:2,1])
print("print a slice")
print(df.iloc[:2,:])
print("print one row")
print(df.iloc[1,:])