List Objects into Individual CSV - python

I have a list of dataframes which I wish to convert to multiple csv.
Example:
List_Df = [df1,df2,df3,df4]
for i in List_Df:
    i.to_csv("C:\\Users\\Public\\Downloads\\"+i+".csv")
Expected output: Having 4 csv files with the names df1.csv,df2.csv ...
But I am facing two problems:
First problem:
AttributeError: 'list' object has no attribute 'to_csv'
Second problem:
("C:\\Users\\Public\\Downloads\\"+ i +".csv") <- i is the object itself,
as it is supposed to be, but I want Python to automatically take the
object's name and use it with .csv
Any help will be greatly appreciated as I am new to Python and SOF.
Thank you :)

Try this:
import pandas as pd
List_Df = [df1,df2,df3,df4]
for i,e in enumerate(List_Df, start=1):  # start=1 so the files are df1.csv ... df4.csv
    df = pd.DataFrame(e)
    df.to_csv("C:\\Users\\Public\\Downloads\\"+"df"+str(i)+".csv")

For your second problem you would have to e.g. name the dataframes first:
for j,df in enumerate(List_Df):
    df.name = 'df'+str(j)
    df.to_csv("C:\\Users\\Public\\Downloads\\%s.csv" %(df.name))
or even just take a string and add the index without naming the dataframes first:
for j,df in enumerate(List_Df):
    name = 'df'+str(j)
    df.to_csv("C:\\Users\\Public\\Downloads\\%s.csv" %(name))
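
Python cannot recover a variable's name from the object itself, so another option is to keep each name next to its DataFrame in a dict. The following is only a minimal sketch: the sample frames, the `frames` dict, and the temporary output folder are assumptions made for illustration, so substitute your own data and directory.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data standing in for the question's df1, df2, ...
df1 = pd.DataFrame({"x": [1, 2]})
df2 = pd.DataFrame({"x": [3, 4]})

# Map each desired file stem to its DataFrame.
frames = {"df1": df1, "df2": df2}

# A temporary folder is used here so the sketch runs anywhere.
out_dir = Path(tempfile.mkdtemp())

written = []
for name, df in frames.items():
    path = out_dir / f"{name}.csv"
    df.to_csv(path, index=False)
    written.append(path.name)

print(sorted(written))
```

The dict key doubles as the file name, so nothing has to be attached to the DataFrame objects themselves.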

Related

Pandas - Concatenating Dataframes

I have a script with if statements that has 14 possible dataframes
['result_14', 'result_13', 'result_12', 'result_11', 'result_10', 'result_9', 'result_8', 'result_7', 'result_6', 'result_5', 'result_4', 'result_3', 'result_2', 'result_1']
Not all dataframes are created every time I run the script; which ones exist depends on a secondary input variable. I am now attempting to concatenate the dataframes but run into issues with those that do not exist.
pd.concat(([result_14, result_13, result_12, result_11, result_10, result_9, result_8, result_7, result_6, result_5, result_4, result_3, result_2, result_1]), ignore_index=True)
NameError: name 'result_13' is not defined
I have tried finding all dfs that exist in my Python session and filtering the results, but this creates a list of names (strings) rather than a list of dataframes
alldfs = [var for var in dir() if isinstance(eval(var), pd.core.frame.DataFrame)]
SelectDFs = [s for s in alldfs if "result" in s]
SelectDFs
['result_14', 'result_15', 'result_12', 'result_11', 'result_10', 'result_9', 'result_8', 'result_7', 'result_6', 'result_5', 'result_4', 'result_3', 'result_2', 'result_1']
pd.concat(([SelectDFs]), ignore_index=True)
TypeError: cannot concatenate object of type '<class 'list'>'; only Series and DataFrame objs are valid
You can try
%who_ls DataFrame
# %whos DataFrame
In your case
l = %who_ls DataFrame
pd.concat([eval(dfn) for dfn in l if dfn.startswith('result')], ignore_index=True)
You are passing a list of strings, not DataFrame objects.
Once SelectDFs holds the actual DataFrame objects, you can pass it without the extra brackets:
pd.concat(SelectDFs, ignore_index=True)
Have you tried converting them into DataFrames? The error raised by concat says the inputs need to be DataFrames rather than lists, so have you tried converting your lists into DataFrames?
This link may help you:
Convert List to Pandas Dataframe Column
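
An alternative that avoids `eval` and notebook magics is to collect each result in a dict at the moment it is created, then concatenate whatever is there. This is a rough sketch only: the `wanted` list, the `results` dict, and the column names are assumptions standing in for the script's real conditions and data.

```python
import pandas as pd

# Hypothetical stand-in for the script's secondary input variable.
wanted = [1, 3]

results = {}
for n in range(1, 15):
    if n in wanted:  # placeholder for the script's real if-conditions
        results[f"result_{n}"] = pd.DataFrame({"source": [n], "value": [n * 10]})

# Only the frames that were actually created are in the dict.
if results:
    combined = pd.concat(list(results.values()), ignore_index=True)
else:
    combined = pd.DataFrame()

print(combined)
```

Because missing results never enter the dict, no name lookup can fail with NameError.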

Indexing by row name

Can someone please help me with this. I want to call rows by name, so I used set_index on the 1st column in the dataframe to index the rows by name instead of using integers for indexing.
# Set 'Name' column as index on a Dataframe
df1 = df1.set_index("Name", inplace = True)
df1
Output:
AttributeError: 'NoneType' object has no attribute 'set_index'
Then I run the following code:
result = df1.loc["ABC4"]
result
Output:
AttributeError: 'NoneType' object has no attribute 'loc'
I don't usually run code that depends on an earlier step before fixing that step's error, but originally I ran both together in one Jupyter notebook cell. Now I see that both code cells have problems.
Please let me know where I went wrong. Thank you!
Maybe you should define your dataframe?
import pandas as pd
df1 = pd.DataFrame("here's your dataframe")
df1 = df1.set_index("Name")
or just
import pandas as pd
df1 = pd.DataFrame("here's your dataframe").set_index("Name")
df1
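Your variable "df1" is either not defined before you use it, or it has already been overwritten with None.
Try this:
# Set 'Name' column as index on a Dataframe
df1 = df1.set_index("Name")
If it was defined before, its value is now None: set_index(..., inplace=True) returns None, so assigning the result back to df1 replaces the DataFrame. Re-create df1 first, then either drop inplace=True or drop the assignment.
The rest of the code should work afterwards.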
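
A likely cause of both errors in the question is that `set_index(..., inplace=True)` modifies the DataFrame and returns None, so `df1 = df1.set_index(..., inplace=True)` leaves df1 set to None. The short sketch below illustrates this; the sample data is made up, and only the "Name" column and the "ABC4" label come from the question.

```python
import pandas as pd

# Hypothetical data; only "Name" and "ABC4" are taken from the question.
data = {"Name": ["ABC3", "ABC4"], "Score": [1, 2]}

df1 = pd.DataFrame(data)
returned = df1.set_index("Name", inplace=True)  # modifies df1, returns None
print(returned)

# Either keep inplace=True without assigning, or assign without inplace.
df2 = pd.DataFrame(data).set_index("Name")
result = df2.loc["ABC4"]
print(result["Score"])
```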

Create multiple empty DataFrames named from a list using a loop

I'm trying to create multiple empty DataFrames with a for loop where each DataFrame has a unique name stored in a list. Per the sample code below, I would like three empty DataFrames, one called A[], another B[] and the last one C[]. Thank you.
import pandas as pd
report=['A','B','C']
for i in report:
    report[i]=pd.DataFrame()
It would be best to use a dictionary
import pandas as pd
report=['A','B','C']
df_dict = {}
for i in report:
    df_dict[i]=pd.DataFrame()
print(df_dict['A'])
print(df_dict['B'])
print(df_dict['C'])
You should use a dictionary for that:
import pandas as pd
report={'A': pd.DataFrame(),'B': pd.DataFrame(),'C': pd.DataFrame()}
If you have a list of strings containing the names, which I think is what you are really trying to do:
name_dataframe = ['A', 'B', 'C']
dict_dataframe = {}
for name in name_dataframe:
    dict_dataframe[name] = pd.DataFrame()
It is not good practice, and you should probably use a dictionary instead, but the code below gets the job done if you still need it. It creates DataFrames in memory with the names in the list report:
for i in report:
    exec(i + ' = pd.DataFrame()')
And if you want to store the empty DataFrames in a list:
df_list = []
for i in report:
    exec(i + ' = pd.DataFrame() \ndf_list.append(' + i+ ')')

DataFrame object has no attribute 'name'

I currently have a list of Pandas DataFrames. I'm trying to perform an operation on each list element (i.e. each DataFrame contained in the list) and then save that DataFrame to a CSV file.
I assigned a name attribute to each DataFrame, but I realized that in some cases the program throws an error AttributeError: 'DataFrame' object has no attribute 'name'.
Here's the code that I have.
# raw_og contains the file names for each CSV file.
# df_og is the list containing the DataFrame of each file.
for idx, file in enumerate(raw_og):
    df_og.append(pd.read_csv(os.path.join(data_og_dir, 'raw', file)))
    df_og[idx].name = file
# I'm basically checking if the DataFrame is in reverse-chronological order using the
# check_reverse function. If it is then I simply reverse the order and save the file.
for df in df_og:
    if (check_reverse(df)):
        df = df[::-1]
        df.to_csv(os.path.join(data_og_dir, 'raw_new', df.name), index=False)
    else:
        continue
The program is throwing an error in the second for loop where I used df.name.
This is especially strange because when I run print(df.name) it prints out the file name. Would anybody happen to know what I'm doing wrong?
Thank you.
The solution is to use iloc to set the values in place, rather than creating a copy.
Creating a copy of df loses the name:
df = df[::-1] # creates a copy
Setting the values keeps the original object intact, along with its name:
df.iloc[:] = df.iloc[:, ::-1] # reversal maintaining the original object
Example code that reverses values along the column axis:
df = pd.DataFrame([[6,10]], columns=['a','b'])
df.name='t'
print(df.name)
print(df)
df.iloc[:] = df.iloc[:,::-1]
print(df)
print(df.name)
outputs:
t
   a   b
0  6  10
    a  b
0  10  6
t
A workaround is to set a columns.name and use it when needed.
Example:
df = pd.DataFrame()
df.columns.name = 'name'
print(df.columns.name)
name
I suspect it's the reversal that loses the custom .name attribute.
In [11]: df = pd.DataFrame()
In [12]: df.name = 'empty'
In [13]: df.name
Out[13]: 'empty'
In [14]: df[::-1].name
AttributeError: 'DataFrame' object has no attribute 'name'
You'll be better off storing a dict of dataframes rather than using .name:
df_og = {fn: pd.read_csv(os.path.join(data_og_dir, 'raw', fn)) for fn in raw_og}
Then you could iterate through this and reverse the values that need reversing...
for fn, df in df_og.items():
    if (check_reverse(df)):
        df = df[::-1]
        df.to_csv(os.path.join(data_og_dir, 'raw_new', fn), index=False)
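
To make the dict-based approach concrete, here is a small self-contained sketch. It is not the asker's real pipeline: `check_reverse` below is a hypothetical stand-in, the frames are built in memory rather than read from disk, the column `t` is invented, and a temporary folder replaces the `raw_new` directory.

```python
import tempfile
from pathlib import Path

import pandas as pd


def check_reverse(df):
    """Hypothetical stand-in: True when column 't' is in descending order."""
    return df["t"].is_monotonic_decreasing


# Hypothetical frames keyed by file name, instead of using a .name attribute.
df_og = {
    "a.csv": pd.DataFrame({"t": [3, 2, 1]}),
    "b.csv": pd.DataFrame({"t": [1, 2, 3]}),
}

out_dir = Path(tempfile.mkdtemp())  # stands in for the raw_new directory

for fn, df in df_og.items():
    if check_reverse(df):
        # The reversed copy carries no custom attributes, but fn is still in scope.
        df[::-1].to_csv(out_dir / fn, index=False)

print(sorted(p.name for p in out_dir.iterdir()))
```

Since the file name travels as the dict key, it does not matter whether the reversal returns a new object.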

loop over names of several pandas DataFrames

I have a couple of DataFrames from different files, which are named for example df001, df002 and so on.
Now I want to loop over those DataFrames to execute similar tasks. But I can't figure out how to address them.
This failed (AttributeError: 'str' object has no attribute 'iloc'):
names = ['df001', 'df002']
for name in names:
    name.iloc[1,1]
Can you try this?
names = [df001, df002]
for name in names:
    name.iloc[1,1]
If you use the string name for purposes other than looping, you can always store the dataframes in a dictionary:
d = {'df001': df001, 'df002': df002}
for name in d:
    d[name].iloc[1, 1]
