Python, unlist array in DataFrame - python

I have a dataframe df with 1 column
inside each row of this column I have a np.array object with 3 elements
Is there a way to unlist this array and create a DataFrame with 3 columns for easy manipulation.

Your question was already asked and answered here:
enter link description here
Anyway, let's say your dataframe looks like this
import pandas as pd, numpy as np
df = pd.DataFrame({'A': [ np.array([1, 2, 3]), np.array([4, 5, 6]), np.array([7, 8, 9])]})
df
Create New dataframe df1, converting column A of original df as a list
df1 = pd.DataFrame(columns = ['A1', 'A2', 'A3'], data = list(df['A']))
df1

Related

How to create a dataframe from a single row of another dataframe?

I'd like to create a dataframe from a select row of another dataframe, e.g.,
import pandas as pd
df = pd.DataFrame({"col1": [1, 2], "col2": [0, 1]})
# I want to create df1 such that df1 is a dataframe from the first row of df
df1 = df.iloc[0] # produces a pandas Series which is not what i want
This one seems to work but seems to involve a lot of (unnecessary?) ops?
df1 = pd.DataFrame(df.iloc[0]).T

Mapping a multiindex dataframe to another using row ID

I have two dataframes of different shape
The 'ANTENNA1' and 'ANTENNA2' in the bigger dataframe correspond to the ID columns in the smaller dataframe. I want to create merge the smaller dataframe to the bigger one so that the bigger dataframe will have '(POSITION, col1)', '(POSITION, col2)', '(POSITION, col3)' according to ANTENNA1 == ID
Edit: I tried with pd.merge but it is changing the original dataframe column values
Original:
df = pd.merge(df_main, df_sub, left_on='ANTENNA1', right_on ='id', how = 'left')
Result:
I want to keep the original dataframe columns as it is.
Assuming your first dataframe (with positions) is called df1, and the second is called df2, with your loaded data, you could just use pandas.DataFrame.merge: ( -> pd.merge(...) )
df = pd.merge(df1,df2,left_on='id', right_on='ANTENNA1')
Than you might select the df on your needed columns(col1,col2,..) to get the desired result df[["col1","col2",..]].
simple example:
# import pandas as pd
import pandas as pd
# creating dataframes as df1 and df2
df1 = pd.DataFrame({'ID': [1, 2, 3, 5, 7, 8],
'Name': ['Sam', 'John', 'Bridge',
'Edge', 'Joe', 'Hope']})
df2 = pd.DataFrame({'id': [1, 2, 4, 5, 6, 8, 9],
'Marks': [67, 92, 75, 83, 69, 56, 81]})
# merging df1 and df2 by ID
# i.e. the rows with common ID's get
# merged i.e. {1,2,5,8}
df = pd.merge(df1, df2, left_on="ID", right_on="id")
print(df)

How to match two dataframe columns and return matching values on a separate column in Python?

I am a newbie in Python and I need some help.
I have 2 data frame containing a list of users with a list of recommended friends from two tables.
I would like to achieve the following:
Sort the list of recommended friends by ascending order from 2 data frame for each user.
Match the list of matching recommended friends from dataframe2 to dataframe1 for each user. Return only the matched values.
I have tried my code but it didn't achieve the desired results.
import pandas as pd
import numpy as np
///load data from csv
df1 = pd.read_csv('CommonFriend.csv')
df2 = pd.read_csv('InfluenceFriend.csv')
print(df1)
print(df2)
///convert values to list to sort by recommended friends ID
df1.values.tolist()
df1.sort_values(by=['User','RecommendedFriends'])
df2.values.tolist()
df2.sort_values(by=['User','RecommendedFriends'])
///obtain only matched values from list of recommended friends from df1 and df2.
df3 = df1.merge(df2, how='inner', on='User')
/// return dataframe with user, matched recommendedfriends ID
print(df3)
Problem encountered:
The elements in each list are not sorted in ascending order.
While matching each elements through pandas merge with "inner-join". It seems that it is not able to read certain elements.
Updates: Below are the data frame header which cause some error in the code.
This should be the solution to your problem. You might have to change a few variables but you get the idea: you merge the two dataframes on the users, so you get a dataframe with both lists for each user. You then take the intersection of both lists and store that in a new column.
df1 = pd.DataFrame(np.array([[1, [5, 7, 10, 11]], [2, [3, 8, 5, 12]]]),
columns=['User', 'Recommended friends'])
df2 = pd.DataFrame(np.array([[1, [5, 7, 9]], [2, [4, 7, 10]], [3, [15, 7, 9]]]),
columns=['User', 'Recommended friends'])
df3 = pd.merge(df1, df2, on='User')
df3['intersection'] = [list(set(a).intersection(set(b))) for a, b in zip(df3['Recommended friends_x'], df3['Recommended friends_y'])]
Output df3:
User Recommended friends_x Recommended friends_y intersection
0 1 [5, 7, 10, 11] [5, 7, 9] [5, 7]
1 2 [3, 8, 5, 12] [4, 7, 10] []
I do not quite understand what exactly your problem is, but in general you will have to assign the dataframe to itself again.
import pandas as pd
import numpy as np
df1 = pd.read_csv('CommonFriend.csv')
df2 = pd.read_csv('InfluenceFriend.csv')
print(df1)
print(df2)
df1 = df1.values.tolist()
df1 = df1.sort_values(by=['User','RecommendedFriends'])
df2 = df2.values.tolist()
df2 = df2.sort_values(by=['User','RecommendedFriends'])
df3 = df1.merge(df2, how='inner', on='User')
print(df3)

Picking out certain indexes from a pandas data frame

I have a pandas data frame with hundreds of entries and an array of random entries in the array. For example:
import pandas as pd
list1 = [13,2,32,34,15,7,19]
list2 = [15,65,95,9,90,88,10]
df1 = pd.DataFrame(list1)
df2 = pd.DataFrame(list2)
cols = [df1, df2]
df1.loc[:, cols]
and I have another array called
M =[1, 2, 5, 6, 9]
where these are the indexes of the pandas data frame I want, is there a way to create a new table that picks out only the rows that match the index given by the array M?
import pandas as pd
list1 = [13,2,32,34,15,7,19]
df1 = pd.DataFrame(list1)
M =[1, 2, 5, 6]
df1[df1.index.isin(M)]
Note that in your problem statement, cols is a list of dataframes, not a two-column dataframe. I am not sure if that was not clear from your code and question.

Assign constant numpy array value to pandas dataframe column

I would like to assign constant numpy array value to pandas dataframe column.
Here is what I tried:
import pandas as pd
import numpy as np
my_df = pd.DataFrame({'col_1': [1,2,3], 'col_2': [4,5,6]})
my_df['new'] = np.array([]) # did not work
my_df['new'] = np.array([])*len(df) # did not work
Here is what worked:
my_df['new'] = my_df['new'].apply(lambda x: np.array([]))
I am curious why it works with simple scalar, but does not work with numpy array. Is there simpler way to assign numpy array value?
Your "new" column will contains arrays, so it must be a object type column.
The simplest way to initialize it is :
my_df = pd.DataFrame({'col_1': [1,2,3], 'col_2': [4,5,6]})
my_df['new']=None
You can then fill it as you want. For example :
for index,(a,b,_) in my_df.iterrows():
my_df.loc[index,'new']=np.arange(a,b)
#
# col_1 col_2 new
# 0 1 4 [1, 2, 3]
# 1 2 5 [2, 3, 4]
# 2 3 6 [3, 4, 5]

Categories

Resources