This question already has answers here:
Python, Functions changing values
(2 answers)
df.append() is not appending to the DataFrame
(2 answers)
Closed 3 years ago.
This is a portion of my code:
df3 = pd.DataFrame(columns=["colA","colB"])

def someFunction(filePath, file, df):
    df2 = pd.DataFrame(columns=["colA","colB"])
    df1 = pd.read_csv(filePath, header=0)
    var = ["foo1","bar1"]
    df2['colA'] = ["foo2","bar2"]
    df2['colB'] = var
    df = df.append(df2)
    df1 = pd.DataFrame()
    df2 = pd.DataFrame()
    return df
for root, dirs, fileList in os.walk(someDir):
    for file in fileList:
        if <someCondition>:
            someFunction(filePath, file, df3)

print(df3)
On running the code, I don't see df2 getting appended to df3: inside the function the appended frame exists, but outside the function df3 is still an empty dataframe.
How do I consistently append df2 to df3 so that df3 grows for every file processed?
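A minimal sketch of the usual fix (names and data are illustrative, not from the original code): DataFrame.append never mutates its argument, it returns a new frame, so the caller has to capture the return value. On current pandas, append has been removed and pd.concat is the replacement:

```python
import pandas as pd

def some_function(df):
    # rows for one "file" (hard-coded here to keep the sketch self-contained)
    df2 = pd.DataFrame({"colA": ["foo2", "bar2"], "colB": ["foo1", "bar1"]})
    # pd.concat returns a NEW frame; it does not modify df in place
    return pd.concat([df, df2], ignore_index=True)

df3 = pd.DataFrame(columns=["colA", "colB"])
for _ in range(2):            # stand-in for the os.walk loop
    df3 = some_function(df3)  # reassign: this is the step the question is missing

print(len(df3))  # 4 rows after two "files"
```

The key line is `df3 = some_function(df3)`; calling the function without reassigning, as in the question, throws the grown frame away.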
This question already has answers here:
How do I combine two dataframes?
(8 answers)
Closed 8 months ago.
I'm working on my first data project and I'm new to Stack Overflow. All the other examples I have found use append, but whenever I try append the data ends up organized wrong, since I want to concatenate the data vertically. This is what I have so far:
import pandas as pd
import os

input_file_path = "C:/Users/laura/Downloads/excel files/"
output_file_path = "C:/Users/laura/OneDrive/Desktop/master excel/"
excel_file_list = os.listdir(input_file_path)

df = pd.DataFrame()
for excel_files in excel_file_list:
    if excel_files.endswith('.csv'):
        df1 = pd.read_csv(input_file_path + excel_files)
        df = pd.concat(df1, axis=1, ignore_index=True)
And this is the error I am getting:
TypeError: first argument must be an iterable of pandas objects, you passed an object of type "DataFrame"
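For reference, a sketch of what the loop was likely meant to do (the folder and data here are made up so the example runs anywhere): pd.concat wants a list of DataFrames, which is exactly what the TypeError is complaining about, and vertical stacking is axis=0 (the default), so collect the frames first and concatenate once:

```python
import os
import tempfile
import pandas as pd

# create two tiny CSV files in a temp folder so the sketch is self-contained
folder = tempfile.mkdtemp()
for name, text in [("a.csv", "x,y\n1,2\n"), ("b.csv", "x,y\n3,4\n")]:
    with open(os.path.join(folder, name), "w") as f:
        f.write(text)

frames = []
for fname in sorted(os.listdir(folder)):
    if fname.endswith(".csv"):
        frames.append(pd.read_csv(os.path.join(folder, fname)))

# one concat over the whole list, stacked vertically (axis=0 is the default)
df = pd.concat(frames, ignore_index=True)
print(df.shape)  # (2, 2)
```

Building the list first and concatenating once is also much faster than concatenating inside the loop, which copies the accumulated frame on every iteration.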
Simply do this (file1 and file2 are the paths to the .csv files):
dataFrame = pd.concat(map(pd.read_csv, [file1, file2]), ignore_index=True)
Make sure your paths are something like this:
C:\username\folder\1.csv
Hi Laura, try this:

df1 = pd.read_csv("Directory/file1.csv", sep=';')
df2 = pd.read_csv("Directory/file2.csv", sep=';')
df = pd.concat([df1, df2])

I usually don't use
df = pd.DataFrame()
Instead, I directly write
df = pd.concat([df1, df2])
Be careful, because for some Excel-exported files you have to adjust the 'sep' argument of pd.read_csv.
I hope this helps.
This question already has answers here:
Why can pandas DataFrames change each other?
(3 answers)
How do I clone a list so that it doesn't change unexpectedly after assignment?
(24 answers)
Closed 1 year ago.
I am new to loops in Python and just came across some weird behaviour. I was doing some calculations on multiple dataframes; to simplify the question, here is an illustration.
Suppose I have 3 dataframes filled with NaN:
import numpy as np
import pandas as pd

# generate NaN entries
data = np.empty((15, 10))
data[:] = np.nan

# create dataframe
dfnan = pd.DataFrame(data)

df1 = dfnan
df2 = dfnan
df3 = dfnan
After this step, all the three dataframes give me NaN as expected.
But then, if I add two for loops in one block like below:
for i in range(0, 15, 1):
    df1.iloc[i] = 0

for j in range(0, 15, 1):
    df2.iloc[j] = df1.iloc[j].transform(lambda x: x + 1)
Then all of df1, df2, and df3 end up filled with 1. But shouldn't df1 be filled with 0, df2 with 1, and df3 with NaN (since I didn't make any change to it)?
Why is that, and how can I change it to get the wanted result?
Assignment never copies in Python. df1, df2, df3 and dfnan are all references to the same object (the pd.DataFrame(data) you created once). This means that changes made through one name are visible through all the others, as they all point to the same object.
This is a great read: https://nedbatchelder.com/text/names.html
To create independent copies, use the copy method:
dfnan = pd.DataFrame(data)
df1 = dfnan.copy()
df2 = dfnan.copy()
df3 = dfnan.copy()
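A quick way to see the aliasing for yourself (this sketch is mine, not from the answer): `is` tests object identity, and .copy() is what breaks the link:

```python
import numpy as np
import pandas as pd

data = np.full((3, 2), np.nan)
dfnan = pd.DataFrame(data)

alias = dfnan          # plain assignment: same object, new name
clone = dfnan.copy()   # copy: an independent object

print(alias is dfnan)  # True
print(clone is dfnan)  # False

alias.iloc[0] = 0            # writes through to dfnan as well
print(dfnan.iloc[0, 0])      # 0.0
print(clone.iloc[0, 0])      # nan (the copy is untouched)
```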
This question already has answers here:
How to change variables fed into a for loop in list form
(4 answers)
Closed 5 months ago.
I'm trying to reindex the columns of a set of dataframes inside a loop, but this only seems to work outside the loop. See the sample code below:
import pandas as pd

data1 = [[1,2,3],[4,5,6],[7,8,9]]
data2 = [[10,11,12],[13,14,15],[16,17,18]]
data3 = [[19,20,21],[22,23,24],[25,26,27]]
index = ['a','b','c']
columns = ['d','e','f']

df1 = pd.DataFrame(data=data1, index=index, columns=columns)
df2 = pd.DataFrame(data=data2, index=index, columns=columns)
df3 = pd.DataFrame(data=data3, index=index, columns=columns)

columns2 = ['f','e','d']

for i in [df1, df2, df3]:
    i = i.reindex(columns=columns2)

print(df1)

df2 = df2.reindex(columns=columns2)
print(df2)
df1 is not reindexed as desired; however, if I reindex df2 outside of the loop, it works. Why is that?
Thanks
Andrew
That happens for the same reason this happens:

a = 5
b = 6
for i in [a, b]:
    i = 4

>>> a
5
Why? See this accepted answer.
Concerning your problem, one way to go about it is to create a list of reindexed dataframes, like so:
reindexed_dfs = [df.reindex(columns=columns2) for df in [df1, df2, df3]]
and then reassign df1, df2 and df3. But it's better to just keep using your newly created list anyway.
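If the three separate names really are needed, the reindexed frames can be unpacked straight back out of such a comprehension (a sketch with toy one-row frames; reindex returns a new object, so rebinding the names is what does the work):

```python
import pandas as pd

columns = ['d', 'e', 'f']
columns2 = ['f', 'e', 'd']
df1 = pd.DataFrame([[1, 2, 3]], columns=columns)
df2 = pd.DataFrame([[4, 5, 6]], columns=columns)
df3 = pd.DataFrame([[7, 8, 9]], columns=columns)

# rebind all three names to the newly reindexed frames in one statement
df1, df2, df3 = (df.reindex(columns=columns2) for df in [df1, df2, df3])

print(list(df1.columns))     # ['f', 'e', 'd']
print(df1.iloc[0].tolist())  # [3, 2, 1]
```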
This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 2 years ago.
I have two dataframes: df1 holds data and df2 is kind of like a map for that data. (They are both the same size and are 2D.)
I would like to use pandas.where (or any method that isn't too convoluted) to replace values in df1 based on the condition of the same cell in df2.
For instance, if a cell in df2 is equal to 0, I want to set the same cell in df1 to 0 as well. How do I do this?
When I try the following I get an error:
df3 = df1.where(df2 == 0, other = 0)
import numpy as np
import pandas as pd

df = pd.DataFrame()
df_1 = pd.DataFrame()
df['a'] = [1, 2, 3, 4, 5]
df_1['b'] = [5, 6, 7, 8, 0]
This sets up sample frames df and df_1:
Now implement a loop, using range (or len(df.index)); assigning through .loc avoids pandas' chained-assignment pitfall:

for i in range(0, 5):
    df.loc[i, 'a'] = np.where(df_1['b'][i] == 0, 0, df['a'][i])
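The loop can be dropped entirely; a vectorized sketch (my own, with made-up data): DataFrame.where keeps values where the condition is True, so "zero out wherever the map is 0" means the condition is df2 != 0 — the question's attempt had the condition inverted:

```python
import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3, 4, 5]})   # the data
df2 = pd.DataFrame({'a': [5, 6, 0, 8, 0]})   # the "map"

# keep df1's value where df2 != 0; substitute 0 everywhere else
df3 = df1.where(df2 != 0, other=0)
print(df3['a'].tolist())  # [1, 2, 0, 4, 0]
```

This relies on df1 and df2 sharing the same index and column labels, which the question says they do.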
Generally you shouldn't need to handle multiple dataframes separately like this; if df1 and df2 have the same shape and either the same index or some common column they can be joined on (say it's named 'id'), then merge them:

df = pd.merge(df1, df2, on='id')
See Pandas Merging 101
This question already has answers here:
Pandas, group by count and add count to original dataframe?
(3 answers)
Closed 3 years ago.
I have a dataframe containing a column of values (X).
df = pd.DataFrame({'X' : [2,3,5,2,2,3,7,2,2,7,5,2]})
For each row, I would like to find how many times its value of X appears, as a new column A.
My expected output is:

    X  A
0   2  6
1   3  2
2   5  2
3   2  6
4   2  6
5   3  2
6   7  2
7   2  6
8   2  6
9   7  2
10  5  2
11  2  6
Create a temp column of 1s, then groupby and count it; transform broadcasts each group's count back to every row, which gives your desired answer:

df = pd.DataFrame({'X': [2, 3, 5, 2, 2, 3, 7, 2, 2, 7, 5, 2]})
df['temp'] = 1
df['count'] = df.groupby(['X'], as_index=False).transform(pd.Series.count)
del df['temp']
print(df)
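A shorter equivalent (a sketch of mine, not from the answer above) that skips the temp column entirely: transform('count') on the X column itself broadcasts each group's size back to its rows:

```python
import pandas as pd

df = pd.DataFrame({'X': [2, 3, 5, 2, 2, 3, 7, 2, 2, 7, 5, 2]})

# each row gets the count of its own X value
df['A'] = df.groupby('X')['X'].transform('count')
print(df['A'].tolist())  # [6, 2, 2, 6, 6, 2, 2, 6, 6, 2, 2, 6]
```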