I have a dataframe where, in one column (we'll call it info), every cell contains another dataframe. The nested dataframes all have the same columns, and I want to loop through the rows of this column and stack them on top of each other.
How would I go about this?
You could try as follows:
import pandas as pd

length = 5

# some dfs
nested_dfs = [pd.DataFrame({'a': [*range(length)],
                            'b': [*range(length)]})
              for x in range(length)]
print(nested_dfs[0])
   a  b
0  0  0
1  1  1
2  2  2
3  3  3
4  4  4
# df with nested_dfs in info
df = pd.DataFrame({'info_col': nested_dfs})
# code to be implemented
lst_dfs = df['info_col'].values.tolist()
df_final = pd.concat(lst_dfs, axis=0, ignore_index=True)
df_final.tail()
    a  b
20  0  0
21  1  1
22  2  2
23  3  3
24  4  4
This method should be a bit faster than the solution offered by nandoquintana, which also works.
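As a side note, pd.concat also copes with nested frames whose columns don't match exactly; it aligns on column names and fills the gaps with NaN. A minimal sketch (made-up frames, not from the question):

```python
import pandas as pd

# concat aligns frames on column names; missing columns become NaN
a = pd.DataFrame({'a': [1], 'b': [2]})
b = pd.DataFrame({'a': [3], 'c': [4]})
out = pd.concat([a, b], ignore_index=True)
print(out.columns.tolist())
```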
Incidentally, it is ill-advised to name a df column info, because df.info is actually a method. E.g., normally df['col_name'].values.tolist() can also be written as df.col_name.values.tolist(). However, if you try this with df.info.values.tolist(), you will run into an error:
AttributeError: 'function' object has no attribute 'values'
You also run the risk of shadowing the method if you start assigning values through attribute access, which you probably don't want to do. E.g.:
print(type(df.info))
<class 'method'>
df.info=1
# column is unaffected, you just create an int variable
print(type(df.info))
<class 'int'>
# but:
df['info']=1
# your column now has all 1's
print(type(df['info']))
<class 'pandas.core.series.Series'>
This is the solution that I came up with, although it's not the fastest, which is why I am still leaving the question unanswered:
df1 = pd.DataFrame()
for frame in df['Info'].tolist():
    df1 = pd.concat([df1, frame], axis=0).reset_index(drop=True)
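For comparison, concatenating inside the loop copies the accumulated frame on every iteration; collecting everything first and calling pd.concat once avoids that. A sketch with made-up nested frames standing in for the 'Info' column:

```python
import pandas as pd

# hypothetical 'Info' column holding nested DataFrames
nested = [pd.DataFrame({'a': [i], 'b': [i * 10]}) for i in range(3)]
df = pd.DataFrame({'Info': nested})

# one concat at the end instead of one per iteration
df1 = pd.concat(df['Info'].tolist(), ignore_index=True)
print(df1)
```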
Our dataframe has three columns (col1, col2 and info).
In info, each row has a nested df as value.
import pandas as pd
nested_d1 = {'coln1': [11, 12], 'coln2': [13, 14]}
nested_df1 = pd.DataFrame(data=nested_d1)
nested_d2 = {'coln1': [15, 16], 'coln2': [17, 18]}
nested_df2 = pd.DataFrame(data=nested_d2)
d = {'col1': [1, 2], 'col2': [3, 4], 'info': [nested_df1, nested_df2]}
df = pd.DataFrame(data=d)
We can collect all the nested dfs in a list (as the nested dfs' schema is constant) and concatenate them afterwards.
nested_dfs = []
for index, row in df.iterrows():
    nested_dfs.append(row['info'])
result = pd.concat(nested_dfs, sort=False).reset_index(drop=True)
print(result)
This would be the result:
   coln1  coln2
0     11     13
1     12     14
2     15     17
3     16     18
I'm having trouble with some pandas dataframes.
It's very simple: I have 4 columns, and I want to reshape them into 2...
For 'practical' reasons, I don't want to use the column header names, but their index positions instead.
I have :
df = pd.DataFrame({'a': [1,2,3],'b': [4,5,6],'c': [7,8,9],'d':[10,11,12]})
I want as a result :
df_res = pd.DataFrame({'NewName1': [1,2,3,4,5,6],'NewName2': [7,8,9,10,11,12]})
(in fact NewName1 doesn't matter, it can stay a or whatever the name...)
I tried for loops, append, and concat, but couldn't figure it out...
Any suggestions ?
Thanks for your help !
Bina
You can extract the desired columns and create a new pandas.DataFrame like so:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1,2,3],'b': [4,5,6],'c': [7,8,9],'d':[10,11,12]})
first_col = np.concatenate((df.a.to_numpy(), df.b.to_numpy()))
second_col = np.concatenate((df.c.to_numpy(), df.d.to_numpy()))
df2 = pd.DataFrame({"NewName1": first_col, "NewName2": second_col})
>>> df2
NewName1 NewName2
0 1 7
1 2 8
2 3 9
3 4 10
4 5 11
5 6 12
This is probably not the most elegant solution, but I would isolate the two dataframes and then concatenate them. I needed to rename the column axis so that the four columns could be aligned correctly.
import pandas as pd
df = pd.DataFrame({'a': [1,2,3],'b': [4,5,6],'c': [7,8,9],'d':[10,11,12]})
af = df[['a', 'c']]
bf = df[['b', 'd']]
frames = (
    af.rename({'a': 'NewName1', 'c': 'NewName2'}, axis=1),
    bf.rename({'b': 'NewName1', 'd': 'NewName2'}, axis=1),
)
# ignore_index gives the fresh 0..5 index from the desired output
out = pd.concat(frames, ignore_index=True)
[EDIT] Replying to the comment.
I'm not that familiar with indexing but this might be one solution. You could avoid column names by using .iloc. Replace the af, and bf frames above with these lines.
af = df.iloc[:, ::2]
bf = df.iloc[:, 1::2]
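If you want to avoid header names entirely, a purely positional variant using NumPy is also possible. This is a sketch, assuming the column order a, b, c, d from the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6],
                   'c': [7, 8, 9], 'd': [10, 11, 12]})

# positional access only: stack columns 0 and 1, and columns 2 and 3
arr = df.to_numpy()
df_res = pd.DataFrame({'NewName1': np.concatenate([arr[:, 0], arr[:, 1]]),
                       'NewName2': np.concatenate([arr[:, 2], arr[:, 3]])})
print(df_res)
```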
I want to save a pandas pivot table for human reading, but DataFrame.to_csv doesn't include the DataFrame.columns.name. How can I do that?
Example:
For the following pivot table:
>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2, 3], [6, 7, 8]])
>>> df.columns = list("ABC")
>>> df.index = list("XY")
>>> df
A B C
X 1 2 3
Y 6 7 8
>>> p = pd.pivot_table(data=df, index="A", columns="B", values="C")
When viewing the pivot table, we have both the index name ("A"), and the columns name ("B").
>>> p
B 2 7
A
1 3.0 NaN
6 NaN 8.0
But when exporting as a csv we lose the columns name:
>>> p.to_csv("temp.csv")
===temp.csv===
A,2,7
1,3.0,
6,,8.0
How can I get some kind of human-readable output format which contains the whole of the pivot table, including the .columns.name ("B")?
Something like this would be fine:
B,2,7
A,,
1,3.0,
6,,8.0
Yes, it is possible by prepending a helper DataFrame, though reading the file back is a bit complicated:
# DataFrame.append was removed in pandas 2.0, so build the helper row with concat
p1 = pd.concat([pd.DataFrame(columns=p.columns, index=[p.index.name]), p])
p1.to_csv('temp.csv', index_label=p.columns.name)
B,2,7
A,,
1,3.0,
6,,8.0
# set first column to index
df = pd.read_csv('temp.csv', index_col=0)
# set columns and index names
df.columns.name = df.index.name
df.index.name = df.index[0]
# remove first row of data
df = df.iloc[1:]
print(df)
B 2 7
A
1 3.0 NaN
6 NaN 8.0
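An alternative sketch (not from the answer above): write the columns name as its own first line and the index name as a padded second line, then append the table body without a header. The exact file layout is an assumption about what "human-readable" should look like here:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [6, 7, 8]], columns=list("ABC"), index=list("XY"))
p = pd.pivot_table(data=df, index="A", columns="B", values="C")

with open("temp.csv", "w") as f:
    # first line: columns name followed by the column labels
    f.write(p.columns.name + "," + ",".join(map(str, p.columns)) + "\n")
    # second line: index name, padded to the same number of fields
    f.write(p.index.name + "," * len(p.columns) + "\n")
    # body rows without a header
    p.to_csv(f, header=False)
```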
I have an object whose type is Pandas, and print(object) gives the output below:
print(type(recomen_total))
print(recomen_total)
Output is
<class 'pandas.core.frame.Pandas'>
Pandas(Index=12, instrument_1='XXXXXX', instrument_2='XXXX', trade_strategy='XXX', earliest_timestamp='2016-08-02T10:00:00+0530', latest_timestamp='2016-08-02T10:00:00+0530', xy_signal_count=1)
I want to convert this object to a pd.DataFrame. How can I do it?
I tried pd.DataFrame(object) and from_dict as well; both throw errors.
Interestingly, it will not convert to a dataframe directly, but to a series. Once it has been converted to a series, use the to_frame method of the series to convert it to a DataFrame.
import pandas as pd
df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]},
index=['a', 'b'])
for row in df.itertuples():
    print(pd.Series(row).to_frame())
Hope this helps!!
EDIT
In case you want to save the column names use the _asdict() method like this:
import pandas as pd
df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]},
index=['a', 'b'])
for row in df.itertuples():
    d = dict(row._asdict())
    print(pd.Series(d).to_frame())
Output:
0
Index a
col1 1
col2 0.1
0
Index b
col1 2
col2 0.2
To create new DataFrame from itertuples namedtuple you can use list() or Series too:
import pandas as pd
# source DataFrame
df = pd.DataFrame({'a': [1,2], 'b':[3,4]})
# empty DataFrame
df_new_fromAppend = pd.DataFrame(columns=['x','y'], data=None)
for r in df.itertuples():
    # create new DataFrame from itertuples() via list() ([1:] for skipping the index):
    df_new_fromList = pd.DataFrame([list(r)[1:]], columns=['c', 'd'])
    # or create new DataFrame from itertuples() via Series (drop(0) to remove index, T to transpose column to row)
    df_new_fromSeries = pd.DataFrame(pd.Series(r).drop(0)).T
    # or use append() to insert row into existing DataFrame ([1:] for skipping the index):
    df_new_fromAppend.loc[df_new_fromAppend.shape[0]] = list(r)[1:]
print('df_new_fromList:')
print(df_new_fromList, '\n')
print('df_new_fromSeries:')
print(df_new_fromSeries, '\n')
print('df_new_fromAppend:')
print(df_new_fromAppend, '\n')
Output:
df_new_fromList:
c d
0 2 4
df_new_fromSeries:
1 2
0 2 4
df_new_fromAppend:
x y
0 1 3
1 2 4
To omit the index, use the param index=False (but I mostly need the index for the iteration):
for r in df.itertuples(index=False):
    # the [1:] needn't be used, for example:
    df_new_fromAppend.loc[df_new_fromAppend.shape[0]] = list(r)
The following works for me:
import pandas as pd
df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]}, index=['a', 'b'])
for row in df.itertuples():
    row_as_df = pd.DataFrame.from_records([row], columns=row._fields)
    print(row_as_df)
The result is:
Index col1 col2
0 a 1 0.1
Index col1 col2
0 b 2 0.2
Sadly, AFAIU, there's no simple way to keep column names, without explicitly utilizing "protected attributes" such as _fields.
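Building on the same from_records idea, all rows can be converted in one go rather than one DataFrame per row (a small sketch):

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]}, index=['a', 'b'])

# collect every namedtuple first, then convert them all at once
rows = list(df.itertuples())
all_rows_df = pd.DataFrame.from_records(rows, columns=rows[0]._fields)
print(all_rows_df)
```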
With some tweaks to #Igor's answer, I ended up with this satisfactory code, which preserves the column names and uses as little pandas code as possible.
import pandas as pd
df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]})
# Or initialize another dataframe above
# Get list of column names
column_names = df.columns.values.tolist()
filtered_rows = []
for row in df.itertuples(index=False):
    # Some code logic to filter rows
    filtered_rows.append(row)
# Convert pandas.core.frame.Pandas to pandas.core.frame.Dataframe
# Combine filtered rows into a single dataframe
concatenated_df = pd.DataFrame.from_records(filtered_rows, columns=column_names)
concatenated_df.to_csv("path_to_csv", index=False)
The result is a csv containing:
col1 col2
1 0.1
2 0.2
To convert a list of objects returned by Pandas .itertuples to a DataFrame, while preserving the column names:
# Example source DF
data = [['cheetah', 120], ['human', 44.72], ['dragonfly', 54]]
source_df = pd.DataFrame(data, columns=['animal', 'top_speed'])
animal top_speed
0 cheetah 120.00
1 human 44.72
2 dragonfly 54.00
Since Pandas does not recommend building DataFrames by adding single rows in a for loop, we iterate first and build the DataFrame at the end:
WOW_THAT_IS_FAST = 50
list_ = list()
for animal in source_df.itertuples(index=False, name='animal'):
    if animal.top_speed > WOW_THAT_IS_FAST:
        list_.append(animal)
Now build the DF in a single command and without manually recreating the column names.
filtered_df = pd.DataFrame(list_)
animal top_speed
0 cheetah 120.00
2 dragonfly 54.00
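When the filter is a simple condition like this one, the loop can be skipped entirely with a boolean mask, which is the vectorized style pandas recommends. A sketch on the same data:

```python
import pandas as pd

data = [['cheetah', 120], ['human', 44.72], ['dragonfly', 54]]
source_df = pd.DataFrame(data, columns=['animal', 'top_speed'])

# boolean indexing: no Python-level loop, no rebuilding of rows
filtered_df = source_df[source_df['top_speed'] > 50]
print(filtered_df)
```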
Is it possible to append to an empty data frame that doesn't contain any indices or columns?
I have tried to do this, but keep getting an empty dataframe at the end.
e.g.
import pandas as pd
df = pd.DataFrame()
data = ['some kind of data here' --> I have checked the type already, and it is a dataframe]
df.append(data)
The result looks like this:
Empty DataFrame
Columns: []
Index: []
This should work:
>>> df = pd.DataFrame()
>>> data = pd.DataFrame({"A": range(3)})
>>> df = df.append(data)
>>> df
A
0 0
1 1
2 2
The append doesn't happen in-place, so you'll have to store the output if you want it:
>>> df = pd.DataFrame()
>>> data = pd.DataFrame({"A": range(3)})
>>> df.append(data) # without storing
>>> df
Empty DataFrame
Columns: []
Index: []
>>> df = df.append(data)
>>> df
A
0 0
1 1
2 2
And if you want to add a row, you can use a dictionary:
df = pd.DataFrame()
df = df.append({'name': 'Zed', 'age': 9, 'height': 2}, ignore_index=True)
which gives you:
age height name
0 9 2 Zed
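Since DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, the same single-row append can be written with concat (a sketch of the equivalent; note the column order may differ from the append output above):

```python
import pandas as pd

df = pd.DataFrame()
# wrap the dict in a one-row DataFrame, then concat
new_row = pd.DataFrame([{'name': 'Zed', 'age': 9, 'height': 2}])
df = pd.concat([df, new_row], ignore_index=True)
print(df)
```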
You can concat the data in this way:
InfoDF = pd.DataFrame()
tempDF = pd.DataFrame(rows, columns=['id', 'min_date'])
InfoDF = pd.concat([InfoDF, tempDF])
The answers are very useful, but since pandas.DataFrame.append was deprecated (as various users have already mentioned) and the answers using pandas.concat are not runnable code snippets, I would like to add the following:
import pandas as pd
df = pd.DataFrame(columns=['name', 'age'])
row_to_append = pd.DataFrame([{'name': "Alice", 'age': "25"},
                              {'name': "Bob", 'age': "32"}])
df = pd.concat([df, row_to_append])
So df is now:
name age
0 Alice 25
1 Bob 32
pandas.DataFrame.append is deprecated since version 1.4.0: use concat() instead.
Therefore:
df = pd.DataFrame()        # empty dataframe
df2 = pd.DataFrame(...)    # some dataframe with data
df = pd.concat([df, df2])