Building another dataframe using minimum values of the columns using Pandas - python

I have a problem where I have a pandas DataFrame named df_x whose index is the names of persons and whose columns are the names of products. The values are the distances between these persons and the products.
I want to build another table containing the columns of df_x and, as values, the name of the person that has the minimum distance to each product.
Is there a simple way to do this using pandas or numpy? Do I need to use a for loop?
Example:
(index) Banana Apple
Mike 7 2
Kevin 2 4
James 3 6
so the final table should be
(index) Banana Apple
Name Kevin Mike

IIUC, DataFrame.idxmin (not idxmax — the question asks for the minimum distance, and idxmax would return the farthest person instead)
df_x.idxmin().to_frame('Name').T
Output
Banana Apple
Name Kevin Mike
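A self-contained sketch of this approach, reconstructing the example frame from the question (names and values taken from it):

```python
import pandas as pd

# Rows are people, columns are products, values are distances.
df_x = pd.DataFrame(
    {"Banana": [7, 2, 3], "Apple": [2, 4, 6]},
    index=["Mike", "Kevin", "James"],
)

# idxmin() returns, for each column, the index label of the minimum value.
result = df_x.idxmin().to_frame("Name").T
print(result)
#      Banana Apple
# Name  Kevin  Mike
```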

Related

Pandas return column data as list without duplicates

This is an oversimplified example, but I have large categorical data like this.
Name Age Gender
John 12 Male
Ana 24 Female
Dave 16 Female
Cynthia 17 Non-Binary
Wayne 26 Male
Hebrew 29 Non-Binary
Suppose that it is assigned as df and I want it to return as a list with non-duplicate values:
'Male','Female','Non-Binary'
I tried it with this code, but it returns the genders with duplicates:
list(df['Gender'])
How can I code it in pandas so that it can return values without duplicates?
In these cases you have to remember that df["Gender"] is a Pandas Series, so you can use .drop_duplicates() to retrieve another Pandas Series with the duplicate values removed, or use .unique() to retrieve a NumPy array containing the unique values.
>>> df["Gender"].drop_duplicates()
0          Male
1        Female
3    Non-Binary
Name: Gender, dtype: object
>>> df["Gender"].unique()
['Male' 'Female' 'Non-Binary']
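A runnable sketch of both options, reconstructing the example frame from the question:

```python
import pandas as pd

# Reconstruct the example frame from the question.
df = pd.DataFrame({
    "Name": ["John", "Ana", "Dave", "Cynthia", "Wayne", "Hebrew"],
    "Age": [12, 24, 16, 17, 26, 29],
    "Gender": ["Male", "Female", "Female", "Non-Binary", "Male", "Non-Binary"],
})

# .drop_duplicates() keeps the first occurrence of each value (still a Series,
# with the original index labels).
deduped = df["Gender"].drop_duplicates()

# .unique() returns a NumPy array in order of appearance; list() turns it
# into a plain Python list.
genders = list(df["Gender"].unique())
print(genders)  # ['Male', 'Female', 'Non-Binary']
```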

Compare two data-frames with different column names and update first data-frame with the column from second data-frame

I am working with two data-frames that have different column names and dimensions.
The first data-frame, "df1", contains a single column "name" holding the names to be located in the second data-frame. If matched, the value from the first column of df2 (df2[0]) needs to be returned and added to result_df.
The second data-frame, "df2", has multiple columns and no header. It contains all the possible diminutive names and full names. Any of the columns can contain the "name" that needs to be matched.
Goal: Locate each name from "df1" in "df2"; if it is matched, return the value from the first column of df2 and add it to the respective row of df1.
df1
name
ab
alex
bob
robert
bill
df2
0          1     2    3
abram      ab
robert     rob   bob  robbie
alexander  alex  al
william    bill
result_df
name    matched_name
ab      abram
alex    alexander
bob     robert
robert  robert
bill    william
The code I have written so far gives an error. I need it to be efficient, as it will check millions of entries in df1 against df2:
def process_name(df1, df2):
    for elem in df2.values:
        if elem in df1['name']:
            df1["matched_name"] = df2[0]

result_df = process_name(df1, df2)
Try via concat(), merge(), drop(), rename() and reset_index(). Note that since df2 has no header, its column labels are the integers 0-3, so they must be dropped and renamed as integers, not strings:
df = (pd.concat(df1.merge(df2, left_on='name', right_on=x) for x in df2.columns)
        .drop([1, 2, 3], axis=1)
        .rename(columns={0: 'matched_name'})
        .reset_index(drop=True))
Output of df:
name matched_name
0 robert robert
1 ab abram
2 alex alexander
3 bill william
4 bob robert
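An alternative sketch that avoids one merge per column: build a single (full name, variant) lookup from df2 and merge once. This assumes df2 keeps the default integer column labels 0-3, with the full name in column 0:

```python
import pandas as pd

# Reconstruct the example frames; None marks the empty cells of df2.
df1 = pd.DataFrame({"name": ["ab", "alex", "bob", "robert", "bill"]})
df2 = pd.DataFrame([
    ["abram", "ab", None, None],
    ["robert", "rob", "bob", "robbie"],
    ["alexander", "alex", "al", None],
    ["william", "bill", None, None],
])

# Pair every column (including column 0 itself, so a full name also matches
# itself) with the full name in column 0, then drop the empty cells.
lookup = pd.concat(
    pd.DataFrame({"matched_name": df2[0], "name": df2[c]}) for c in df2.columns
).dropna()

# One left merge keeps every df1 row and attaches the matched full name.
result_df = df1.merge(lookup, on="name", how="left")
print(result_df)
```

A left merge also keeps unmatched names (as NaN) instead of silently dropping them, which the per-column inner merges would do.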

How to take only index 0 of a list of a pandas dataframe and create a new dataframe?

In summary:
I have a DataFrame like this in python pandas:
Code Name
13 [Robert, RoBert, robert, robert man]
2 [Barbie, BarBie, barbie, barbie womam]
5 [ShibA, Shiba, Shibba, shiba dog]
100 [HusKYE, huskye, Huskye, huskye dog]
I want to transform it into this:
Code Name
13 Robert
2 Barbie
5 ShibA
100 HusKYE
How can I do it?
You can use pandas.Series.apply: df['Name'] = df['Name'].apply(lambda names: names[0])
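Equivalently, .str[0] indexes into each element and also works on columns of lists; a sketch reconstructing the example frame:

```python
import pandas as pd

df = pd.DataFrame({
    "Code": [13, 2, 5, 100],
    "Name": [
        ["Robert", "RoBert", "robert", "robert man"],
        ["Barbie", "BarBie", "barbie", "barbie womam"],
        ["ShibA", "Shiba", "Shibba", "shiba dog"],
        ["HusKYE", "huskye", "Huskye", "huskye dog"],
    ],
})

# .str[0] takes element 0 of each list, replacing the lists in place.
df["Name"] = df["Name"].str[0]
print(df)
#    Code    Name
# 0    13  Robert
# 1     2  Barbie
# 2     5   ShibA
# 3   100  HusKYE
```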

Add data from multiple dataframe by its index using pandas [duplicate]

This question already has answers here:
replace column values in one dataframe by values of another dataframe
(5 answers)
Closed 5 years ago.
So I have my dataframe (df1):
Number Name Gender Hobby
122 John Male -
123 Patrick Male -
124 Rudy Male -
I want to add data to Hobby based on the Number column. Assume I have lists of hobbies, keyed by number, in different dataframes. For example (df2):
Number Hobby
124 Soccer
... ...
... ...
and df3 :
Number Hobby
122 Basketball
... ...
... ...
How can I achieve this dataframe:
Number Name Gender Hobby
122 John Male Basketball
123 Patrick Male -
124 Rudy Male Soccer
So far I've already tried the following solution:
Select rows from a DataFrame based on values in a column in pandas
but it only selects some data. How can I update the 'Hobby' column?
Thanks in advance.
You can use map; merge and join would also achieve it. With the question's names, df1 is the main frame and df2 is the lookup:
df1['Hobby'] = df1.Number.map(df2.set_index('Number').Hobby)
df1
Out[155]:
Number Name Gender Hobby
0 122 John Male NaN
1 123 Patrick Male NaN
2 124 Rudy Male Soccer
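To fold several lookup frames (df2, df3, ...) into a single pass and keep the '-' placeholder where no hobby is found, one option is to concat them into one mapping first. A minimal sketch reconstructing the example data:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Number": [122, 123, 124],
    "Name": ["John", "Patrick", "Rudy"],
    "Gender": ["Male", "Male", "Male"],
    "Hobby": ["-", "-", "-"],
})
df2 = pd.DataFrame({"Number": [124], "Hobby": ["Soccer"]})
df3 = pd.DataFrame({"Number": [122], "Hobby": ["Basketball"]})

# Stack all lookup frames into one Number -> Hobby Series, map it onto df1,
# and restore the '-' placeholder where no hobby was found.
mapping = pd.concat([df2, df3]).set_index("Number")["Hobby"]
df1["Hobby"] = df1["Number"].map(mapping).fillna("-")
print(df1)
#    Number     Name Gender       Hobby
# 0     122     John   Male  Basketball
# 1     123  Patrick   Male           -
# 2     124     Rudy   Male      Soccer
```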

Fill Missing Dates in DataFrame with Duplicate Dates in Groupby

I am trying to get a daily status count from the following DataFrame (it's a subset; the real data set is ~14k jobs with overlapping dates, and only one status at any given time within a job):
Job Status User
Date / Time
1/24/2011 10:58:04 1 A Ted
1/24/2011 10:59:20 1 C Bill
2/11/2011 6:53:14 1 A Ted
2/11/2011 6:53:23 1 B Max
2/15/2011 9:43:13 1 C Bill
2/21/2011 15:24:42 1 F Jim
3/2/2011 15:55:22 1 G Phil Jr.
3/4/2011 14:57:45 1 H Ted
3/7/2011 14:11:02 1 I Jim
3/9/2011 9:57:34 1 J Tim
8/18/2014 11:59:35 2 A Ted
8/18/2014 13:56:21 2 F Bill
5/21/2015 9:30:30 2 G Jim
6/5/2015 13:17:54 2 H Jim
6/5/2015 14:40:38 2 I Ted
6/9/2015 10:39:15 2 J Tom
1/16/2015 7:45:58 3 A Phil Jr.
1/16/2015 7:48:23 3 C Jim
3/6/2015 14:09:42 3 A Bill
3/11/2015 11:16:04 3 K Jim
My initial thought (from the following link) was to group by the Job column, fill in the missing dates for each group, and then ffill the statuses down.
Pandas reindex dates in Groupby
I was able to make this work...kinda...if two statuses occurred on the same date, one would not be included in the output, and consequently some statuses were missing.
I then found the following; it supposedly handles the duplicate issue, but I am unable to get it to work with my data.
Efficiently re-indexing one level with "forward-fill" in a multi-index dataframe
Am I on the right path in thinking that filling in the missing dates and then ffilling the statuses down is the correct way to ultimately capture daily counts of individual statuses? Is there another method that might better use pandas features that I'm missing?
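The described approach (daily reindex per job, then forward-fill) does work once same-day duplicates are collapsed first, e.g. to the day's last status via resample. A minimal sketch on a trimmed stand-in for the data (column names from the question, everything else assumed):

```python
import pandas as pd

# A small stand-in for the job/status log; the index is the event timestamp.
df = pd.DataFrame(
    {"Job": [1, 1, 1, 2, 2], "Status": ["A", "C", "B", "A", "F"]},
    index=pd.to_datetime([
        "2011-01-24 10:58:04",
        "2011-01-24 10:59:20",
        "2011-02-11 06:53:23",
        "2014-08-18 11:59:35",
        "2014-08-18 13:56:21",
    ]),
)

# Per job: collapse same-day duplicates to that day's last status (so none
# are silently dropped), put the series on a daily grid, and forward-fill.
daily = df.groupby("Job")["Status"].apply(
    lambda s: s.resample("D").last().ffill()
)

# Daily count of each status across all jobs (index levels: Job, date).
counts = daily.groupby([daily.index.get_level_values(1), daily]).size()
print(counts.head())
```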
