I want to add an array to an existing pandas DataFrame as a row. Below is my code:
import pandas as pd
import numpy as np
data = [['tom', 10]]
df = pd.DataFrame(data, columns = ['Name', 'Age'])
print(df)
Y = [10, 100]
df.loc[0] = list(Y)
print(df)
Basically, I want to add Y to df as a row without disturbing the existing rows, and I want the new columns of the final df to be named 'Y1' and 'Y2'.
Clearly, with the above code the existing information in df gets replaced with Y.
Could you please help me with the right code?
Use loc, adding the values to the existing row under additional columns:
df.loc[0, ['Y1', 'Y2']] = Y
df
Name Age Y1 Y2
0 tom 10 10.0 100.0
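If the goal were instead to append Y as a genuinely new row (keeping the existing row intact), one small sketch is to assign to the next integer label with .loc; this assumes the frame has the default 0..n-1 RangeIndex:

```python
import pandas as pd

# Recreate the question's frame
df = pd.DataFrame([['tom', 10]], columns=['Name', 'Age'])
Y = [10, 100]

# Assign to the next integer label; assumes the default 0..n-1 RangeIndex
df.loc[len(df)] = Y
print(df)
```

Note this keeps the original columns 'Name' and 'Age', so the values of Y land under those headers rather than under new 'Y1'/'Y2' columns.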
I want to replace a row in a csv file with a variable. The row itself also has to be a variable. The following code is an example:
import pandas as pd
# sample dataframe
df = pd.DataFrame({'A': ['a','b','c'], 'B':['b','c','d']})
print("Original DataFrame:\n", df)
x = 1
y = 12698
df_rep = df.replace([int(x),1], y)
print("\nAfter replacing:\n", df_rep)
This can be done with pandas positional indexing, e.g. df.iloc[row_num, col_num]:
# update the value at row x of column 'B'
df.iloc[x, 1] = y
print(df)
A B
0 a b
1 b 12698
2 c d
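For a single-cell update, .iat (positional) and .at (label-based) are lighter-weight scalar accessors than .iloc; a small sketch on the same sample frame:

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': ['b', 'c', 'd']})
x, y = 1, 12698

# .iat is the positional scalar accessor; df.at[x, 'B'] would be the label-based form
df.iat[x, 1] = y
print(df)
```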
I have 2 similar dataframes that I would like to compare each row of the 1st dataframe with the 2nd based on condition. The dataframe looks like this:
Based on this comparison I would like to generate a similar dataframe with a new column 'change' containing the changes based on the following conditions:
if the rows have identical values, then 'change' = 'identical'; otherwise, if the date changed, then 'change' = 'new date'.
Here is an easy workaround.
# Import pandas library
import pandas as pd
# One dataframe
data = [['foo', 10], ['bar', 15], ['foobar', 14]]
df = pd.DataFrame(data, columns = ['Name', 'Age'])
# Another similar dataframe but foo age is 13 this time
data = [['foo', 13], ['bar', 15], ['foobar', 14]]
df2 = pd.DataFrame(data, columns = ['Name', 'Age'])
df3 = df2.copy()
for index, row in df.iterrows():
    if df.at[index, 'Age'] != df2.at[index, 'Age']:
        df3.at[index, 'Change'] = "Changed"
df3["Change"] = df3["Change"].fillna("Not Changed")
print(df3)
Here is the output
Name Age Change
0 foo 13 Changed
1 bar 15 Not Changed
2 foobar 14 Not Changed
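Since both frames share the same index, the comparison can also be done without a loop; a sketch of the same check vectorized with numpy.where, reusing the 'Changed'/'Not Changed' labels from above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([['foo', 10], ['bar', 15], ['foobar', 14]], columns=['Name', 'Age'])
df2 = pd.DataFrame([['foo', 13], ['bar', 15], ['foobar', 14]], columns=['Name', 'Age'])

df3 = df2.copy()
# Compare the whole Age columns at once instead of looping row by row
df3['Change'] = np.where(df['Age'] != df2['Age'], 'Changed', 'Not Changed')
print(df3)
```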
I have a pandas DataFrame, one of whose columns is a column of lists. I want to extract the rows that have a specific element in the corresponding list. (For example, DF is a dataframe and DF['a'] is a Series of lists; I want to find the rows where there is an 'X' element in the corresponding DF['a'] list.) How can I do it?
Is this what you mean?
import pandas as pd
d = {'a': ['X', 'Y', 'Z', 'X', 'Y', 'Z', 'X']}
df = pd.DataFrame(data=d)
df = df[df.a == 'X']
print(df)
a
0 X
3 X
6 X
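The sample above stores scalar strings, not lists. If the column genuinely holds lists, as the question describes, one sketch (with hypothetical sample data) is to build a boolean mask by testing membership in each row's list:

```python
import pandas as pd

# Hypothetical sample data: column 'a' holds Python lists, as in the question
df = pd.DataFrame({'a': [['X', 'Y'], ['Z'], ['Y', 'X'], ['W']]})

# Build a boolean mask by testing membership in each row's list
mask = df['a'].apply(lambda lst: 'X' in lst)
print(df[mask])
```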
Try this code, please:
import numpy as np
import pandas as pd
df1 = pd.DataFrame(np.random.randn(3,4),columns=['a','b','c','d'])
df2 = pd.DataFrame(np.random.randn(2,4),columns=['a','b','c','d'])
newdf = pd.concat([df1,df2] , axis = 0)
print(type(newdf.loc[0]))
The result is 'pandas.core.frame.DataFrame', but I think it should be a 'Series'.
Is that a bug, or am I wrong?
It should be a DataFrame: after concatenating, you have two rows with index 0, so newdf.loc[0] returns a 2x4 DataFrame.
Specifically, in my case it returns a DataFrame like this:
Out[50]:
a b c d
0 1.302054 -0.274331 -1.131744 -1.736018
0 0.811842 -1.225765 1.258529 0.647977
To get a Series, you can use the ignore_index parameter of pd.concat; then the index values will be 0 to 4, not 0, 1, 2, 0, 1:
newdf = pd.concat([df1,df2] , axis = 0, ignore_index=True)
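A quick check of that claim, as a minimal sketch:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.random.randn(3, 4), columns=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame(np.random.randn(2, 4), columns=['a', 'b', 'c', 'd'])

# With ignore_index=True the labels 0..4 are unique, so .loc[0] is a single row
newdf = pd.concat([df1, df2], axis=0, ignore_index=True)
print(type(newdf.loc[0]))
```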
I have an object whose type is Pandas, and print(object) gives the output below:
print(type(recomen_total))
print(recomen_total)
Output is
<class 'pandas.core.frame.Pandas'>
Pandas(Index=12, instrument_1='XXXXXX', instrument_2='XXXX', trade_strategy='XXX', earliest_timestamp='2016-08-02T10:00:00+0530', latest_timestamp='2016-08-02T10:00:00+0530', xy_signal_count=1)
I want to convert this object to a pd.DataFrame; how can I do it?
I tried pd.DataFrame(object) and from_dict too; they throw errors.
Interestingly, it will not convert to a DataFrame directly, but to a Series. Once it is converted to a Series, use the to_frame method of the Series to convert it to a DataFrame:
import pandas as pd
df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]},
                  index=['a', 'b'])
for row in df.itertuples():
    print(pd.Series(row).to_frame())
Hope this helps!!
EDIT
In case you want to keep the column names, use the _asdict() method like this:
import pandas as pd
df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]},
                  index=['a', 'b'])
for row in df.itertuples():
    d = dict(row._asdict())
    print(pd.Series(d).to_frame())
Output:
0
Index a
col1 1
col2 0.1
0
Index b
col1 2
col2 0.2
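Building on the _asdict() idea: rather than producing one frame per row, the rows can be gathered first and converted in a single call; a small sketch:

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]}, index=['a', 'b'])

# Gather every row's mapping first, then build one DataFrame in a single call
rows = [dict(row._asdict()) for row in df.itertuples()]
rebuilt = pd.DataFrame(rows).set_index('Index')
print(rebuilt)
```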
To create a new DataFrame from itertuples namedtuples, you can use list() or Series too:
import pandas as pd
# source DataFrame
df = pd.DataFrame({'a': [1,2], 'b':[3,4]})
# empty DataFrame
df_new_fromAppend = pd.DataFrame(columns=['x','y'], data=None)
for r in df.itertuples():
    # create a new DataFrame from itertuples() via list() ([1:] skips the index):
    df_new_fromList = pd.DataFrame([list(r)[1:]], columns=['c', 'd'])
    # or create a new DataFrame via Series (drop(0) removes the index, T transposes the column to a row):
    df_new_fromSeries = pd.DataFrame(pd.Series(r).drop(0)).T
    # or insert the row into an existing DataFrame via .loc ([1:] skips the index):
    df_new_fromAppend.loc[df_new_fromAppend.shape[0]] = list(r)[1:]

print('df_new_fromList:')
print(df_new_fromList, '\n')
print('df_new_fromSeries:')
print(df_new_fromSeries, '\n')
print('df_new_fromAppend:')
print(df_new_fromAppend, '\n')
Output:
df_new_fromList:
c d
0 2 4
df_new_fromSeries:
1 2
0 2 4
df_new_fromAppend:
x y
0 1 3
1 2 4
To omit the index, use the parameter index=False (though I mostly need the index for the iteration):
for r in df.itertuples(index=False):
    # the [1:] isn't needed now, for example:
    df_new_fromAppend.loc[df_new_fromAppend.shape[0]] = list(r)
The following works for me:
import pandas as pd
df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]}, index=['a', 'b'])
for row in df.itertuples():
    row_as_df = pd.DataFrame.from_records([row], columns=row._fields)
    print(row_as_df)
The result is:
Index col1 col2
0 a 1 0.1
Index col1 col2
0 b 2 0.2
Sadly, AFAIU, there's no simple way to keep the column names without explicitly using "protected attributes" such as _fields.
With some tweaks to #Igor's answer,
I arrived at this satisfactory code, which preserves the column names and uses as little pandas code as possible.
import pandas as pd
df = pd.DataFrame({'col1': [1, 2], 'col2': [0.1, 0.2]})
# Or initialize another dataframe above
# Get list of column names
column_names = df.columns.values.tolist()
filtered_rows = []
for row in df.itertuples(index=False):
    # Some code logic to filter rows
    filtered_rows.append(row)
# Combine the filtered rows (pandas.core.frame.Pandas namedtuples) into a single DataFrame
concatenated_df = pd.DataFrame.from_records(filtered_rows, columns=column_names)
concatenated_df.to_csv("path_to_csv", index=False)
The result is a csv containing:
col1 col2
1 0.1
2 0.2
To convert a list of objects returned by Pandas .itertuples to a DataFrame, while preserving the column names:
# Example source DF
import pandas as pd

data = [['cheetah', 120], ['human', 44.72], ['dragonfly', 54]]
source_df = pd.DataFrame(data, columns=['animal', 'top_speed'])
animal top_speed
0 cheetah 120.00
1 human 44.72
2 dragonfly 54.00
Since pandas does not recommend building DataFrames by adding single rows in a for loop, we will iterate and build the DataFrame at the end:
WOW_THAT_IS_FAST = 50
list_ = list()
for animal in source_df.itertuples(index=False, name='animal'):
    if animal.top_speed > WOW_THAT_IS_FAST:
        list_.append(animal)
Now build the DF in a single command and without manually recreating the column names.
filtered_df = pd.DataFrame(list_)
animal top_speed
0 cheetah 120.00
1 dragonfly 54.00
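For completeness, when the filter condition can be expressed on whole columns, plain boolean indexing avoids itertuples entirely; a sketch of the same filter:

```python
import pandas as pd

data = [['cheetah', 120], ['human', 44.72], ['dragonfly', 54]]
source_df = pd.DataFrame(data, columns=['animal', 'top_speed'])

# Boolean indexing applies the same filter without a Python-level loop
filtered_df = source_df[source_df['top_speed'] > 50]
print(filtered_df)
```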