I have a dataframe that looks like this:
Date Name
01-2021 Mark 714.53
Chris 112681.49
Ashley 3127.07
Brad 16875.00
Michelle 429520.04
...
12-2021 Mark 429520.04
Chris 975261.29
Ashley 377449.79
Brad 53391.73
Michelle 4286.00
But I need to reshape it so that each date becomes its own column, like this:
Name 01-2021 12-2021
Mark 714.53 429520.04
Chris 112681.49 975261.29
Ashley 3127.07 377449.79
Brad 16875.00 53391.73
Michelle 429520.04 4286.00
Does anyone have a solution, please?
Use DataFrame.pivot:
# the value column is unnamed in the question; it is assumed to be called 'amt'
df.pivot(index='Name', columns='Date', values='amt').reset_index().rename_axis(columns=None)
Name 01-2021 12-2021
0 Ashley 3127.07 377449.79
1 Brad 16875.00 53391.73
2 Chris 112681.49 975261.29
3 Mark 714.53 429520.04
4 Michelle 429520.04 4286.00
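For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes the data is in long form with the date repeated on every row, and that the value column is called `amt` (the question does not name it). Because `pivot` sorts the index, the sketch also reindexes to bring back the row order requested in the question; that step is optional.

```python
import pandas as pd

# Long-form data; the column name 'amt' is an assumption, as in the answer above.
df = pd.DataFrame({
    "Date": ["01-2021"] * 5 + ["12-2021"] * 5,
    "Name": ["Mark", "Chris", "Ashley", "Brad", "Michelle"] * 2,
    "amt": [714.53, 112681.49, 3127.07, 16875.00, 429520.04,
            429520.04, 975261.29, 377449.79, 53391.73, 4286.00],
})

# Names in their order of first appearance, used to undo pivot's sorting.
original_order = pd.Index(df["Name"].unique(), name="Name")

wide = (
    df.pivot(index="Name", columns="Date", values="amt")
      .reindex(original_order)
      .reset_index()
      .rename_axis(columns=None)
)
print(wide)
```

Note that `pivot` raises an error if a (Name, Date) pair occurs more than once; in that case `pivot_table` with an aggregation function is the usual alternative.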
Let's say I have a pandas dataframe that looks like this:
import pandas as pd
data = {'name': ['Tom, Jeffrey, Henry', 'Nick, James', 'Chris', 'David, Oscar']}
df = pd.DataFrame(data)
df
name
0 Tom, Jeffrey, Henry
1 Nick, James
2 Chris
3 David, Oscar
I know I can split the names into separate columns using the comma as separator, like so:
df[["name1", "name2", "name3"]] = df["name"].str.split(", ", expand=True)
df
name name1 name2 name3
0 Tom, Jeffrey, Henry Tom Jeffrey Henry
1 Nick, James Nick James None
2 Chris Chris None None
3 David, Oscar David Oscar None
However, if the name column has a row that contains 4 names, as below, the code above raises ValueError: Columns must be same length as key
data = {'name': ['Tom, Jeffrey, Henry', 'Nick, James', 'Chris', 'David, Oscar', 'Jim, Jones, William, Oliver']}
# Create DataFrame
df = pd.DataFrame(data)
df
name
0 Tom, Jeffrey, Henry
1 Nick, James
2 Chris
3 David, Oscar
4 Jim, Jones, William, Oliver
How can I automatically split the name column into n separate columns based on the ',' separator? The desired output would be this:
name name1 name2 name3 name4
0 Tom, Jeffrey, Henry Tom Jeffrey Henry None
1 Nick, James Nick James None None
2 Chris Chris None None None
3 David, Oscar David Oscar None None
4 Jim, Jones, William, Oliver Jim Jones William Oliver
Use DataFrame.join to attach the DataFrame produced by the split, with rename to set the new column names:
f = lambda x: f'name{x+1}'
df = df.join(df["name"].str.split(", ", expand=True).rename(columns=f))
print (df)
name name1 name2 name3 name4
0 Tom, Jeffrey, Henry Tom Jeffrey Henry None
1 Nick, James Nick James None None
2 Chris Chris None None None
3 David, Oscar David Oscar None None
4 Jim, Jones, William, Oliver Jim Jones William Oliver
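The same idea can be written without the lambda. This is a minimal sketch using the question's sample data; the intermediate names `parts` and `out` are only for illustration. With `expand=True`, the split returns one column per piece, labelled 0 to n-1, so the number of new columns follows the longest row automatically.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Tom, Jeffrey, Henry", "Nick, James", "Chris",
                            "David, Oscar", "Jim, Jones, William, Oliver"]})

# One column per piece; rows with fewer names are padded with missing values.
parts = df["name"].str.split(", ", expand=True)

# Relabel 0, 1, 2, ... as name1, name2, name3, ...
parts.columns = [f"name{i + 1}" for i in range(parts.shape[1])]

out = df.join(parts)
print(out)
```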
I have Excel files that contain two columns. I want to check whether the value in each cell of column 1 appears anywhere in column 2.
If the value in a column 1 cell is present in column 2, the output should be 1; if not, 0.
Here's the dataframe:
COLUMN 1 COLUMN 2
ZUBEDA SALIBOKO JUMANNE REDEMPTHA MATINDI
STEPHEN STAFFORD MIHUNGO PETER G. DATTAN
JUMANNE MWALIMU JOANES PETER LUGAZIA
HUWAIDA IDRISSA JUMBE HAMIS JUMA IDD ISAKA
AIDANIA LUAMBANO EDWIN MARTIN MUHONDEZI
KESSY BONIFAS FULANO RICHARD THOMAS MLIWA
KENEDY STEPHEN MSHOMI JUMANNE MWALIMU
JOANES PETER LUGAZIA ISAAC RUGEMALILA ABRAHAM
MWANAISHA MOHAMED MUNGIA ZAITUN SALUM MGAZA
PETRO ZACHARIA MAGANGA STEPHEN STAFFORD MIHUNGO
Desired output
COLUMN 1 COLUMN 2 RESULTS
ZUBEDA SALIBOKO JUMANNE REDEMPTHA MATINDI 0
STEPHEN STAFFORD MIHUNGO PETER G. DATTAN 1
JUMANNE MWALIMU JOANES PETER LUGAZIA 1
HUWAIDA IDRISSA JUMBE HAMIS JUMA IDD ISAKA 0
AIDANIA LUAMBANO EDWIN MARTIN MUHONDEZI 0
KESSY BONIFAS FULANO PETRO ZACHARIA MAGANGA 0
KENEDY STEPHEN MSHOMI JUMANNE MWALIMU 0
JOANES PETER LUGAZIA ISAAC RUGEMALILA ABRAHAM 0
MWANAISHA MOHAMED MUNGIA ZAITUN SALUM MGAZA 0
PETRO ZACHARIA MAGANGA STEPHEN STAFFORD MIHUNGO 1
df['RESULTS'] = df['COLUMN 1'] isin df['COLUMN 2']
You almost had it:
df["RESULTS"] = df["COLUMN 1"].isin(df["COLUMN 2"]).astype(int)
>>> df
COLUMN 1 COLUMN 2 RESULTS
0 ZUBEDA SALIBOKO JUMANNE REDEMPTHA MATINDI 0
1 STEPHEN STAFFORD MIHUNGO PETER G. DATTAN 1
2 JUMANNE MWALIMU JOANES PETER LUGAZIA 1
3 HUWAIDA IDRISSA JUMBE HAMIS JUMA IDD ISAKA 0
4 AIDANIA LUAMBANO EDWIN MARTIN MUHONDEZI 0
5 KESSY BONIFAS FULANO RICHARD THOMAS MLIWA 0
6 KENEDY STEPHEN MSHOMI JUMANNE MWALIMU 0
7 JOANES PETER LUGAZIA ISAAC RUGEMALILA ABRAHAM 1
8 MWANAISHA MOHAMED MUNGIA ZAITUN SALUM MGAZA 0
9 PETRO ZACHARIA MAGANGA STEPHEN STAFFORD MIHUNGO 0
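Use np.where together with isin:
import numpy as np
# isin tests membership in the whole column; a plain == would only compare the two columns row by row
df["RESULTS"] = np.where(df["COLUMN 1"].isin(df["COLUMN 2"]), 1, 0)
https://numpy.org/doc/stable/reference/generated/numpy.where.html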
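To make the difference between a membership test and a row-by-row comparison concrete, here is a small sketch. The three-row frame is hypothetical, using shortened versions of the names from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical three-row sample, shortened from the question's names.
df = pd.DataFrame({
    "COLUMN 1": ["STEPHEN", "ZUBEDA", "JOANES"],
    "COLUMN 2": ["JOANES", "PETER", "STEPHEN"],
})

# Membership: is the COLUMN 1 value found anywhere in COLUMN 2?
member = np.where(df["COLUMN 1"].isin(df["COLUMN 2"]), 1, 0)

# Row-wise equality: does COLUMN 1 equal COLUMN 2 on the same row?
same_row = np.where(df["COLUMN 1"] == df["COLUMN 2"], 1, 0)

print(member.tolist())    # [1, 0, 1]
print(same_row.tolist())  # [0, 0, 0]
```

Only the membership version matches what the question asks for, since the matching names sit on different rows.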
I have two dataframes; snippets of both are below. I am trying to find the artist names in the second dataframe and replace them with the ids from the first dataframe. Is there a good way to do this?
id fullName
0 1 Colin McCahon
1 2 Robert Henry Dickerson
2 3 Arthur Dagley
Artists
0 Arthur Dagley, Colin McCahon, Maria Cruz
1 Fiona Gilmore, Peter Madden, Nicholas Spratt, ...
2 Robert Henry Dickerson
3 Steve Carr
Desired output:
Artists
0 3, 1, Maria Cruz
1 Fiona Gilmore, Peter Madden, Nicholas Spratt, ...
2 2
3 Steve Carr
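You can do this with replace, passing a dictionary that maps each name to its id (here df is the id table and df1 holds the Artists column):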
df1.Artists.replace(dict(zip(df.fullName,df.id.astype(str))),regex=True)
0 3, 1, Maria Cruz
1 Fiona Gilmore, Peter Madden, Nicholas Spratt, ...
2 2
3 Steve Carr
Name: Artists, dtype: object
Convert your first dataframe into a dictionary:
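# use .values so the ids are not re-aligned against the new index (which would give NaN)
d = pd.Series(name_df.id.astype(str).values, index=name_df.fullName).to_dict()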
Then use .replace():
artists_df["Artists"] = artists_df["Artists"].replace(d, regex=True)
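One caveat with `regex=True` is that the dictionary keys are treated as regular expressions, so a name containing a character such as `.` or `(` would not be matched literally. A minimal sketch of one way to guard against that, assuming the frames are called `name_df` and `artists_df` as above and using a shortened copy of the question's data:

```python
import re
import pandas as pd

name_df = pd.DataFrame({
    "id": [1, 2, 3],
    "fullName": ["Colin McCahon", "Robert Henry Dickerson", "Arthur Dagley"],
})
artists_df = pd.DataFrame({
    "Artists": ["Arthur Dagley, Colin McCahon, Maria Cruz",
                "Robert Henry Dickerson",
                "Steve Carr"],
})

# Escape each name so that regex metacharacters are matched literally.
mapping = {re.escape(name): str(i)
           for name, i in zip(name_df["fullName"], name_df["id"])}

artists_df["Artists"] = artists_df["Artists"].replace(mapping, regex=True)
print(artists_df["Artists"].tolist())
# ['3, 1, Maria Cruz', '2', 'Steve Carr']
```

This still replaces substrings, so a name that is contained in a longer name would also be replaced; whether that matters depends on the data.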
Let's say I have a pandas DataFrame named df that looks like this:
father_name child_name
Robert Julian
Robert Emily
Robert Dan
Carl Jack
Carl Rose
John Lucy
John Mark
John Alysha
Paul Christopher
Paul Thomas
Robert Kevin
Carl Elisabeth
where I know for sure that each father has at least 2 children.
I would like to obtain a DataFrame where each father has exactly 2 of his children, and those two children are selected at random. An example output would be
father_name child_name
Robert Emily
Robert Kevin
Carl Jack
Carl Elisabeth
John Alysha
John Mark
Paul Thomas
Paul Christopher
How can I do that?
You can apply sample to each group of the grouped data. It takes the parameter n, which you can set to 2:
df.groupby('father_name').child_name.apply(lambda x: x.sample(n=2))\
.reset_index(level=1, drop=True).reset_index()
father_name child_name
0 Carl Elisabeth
1 Carl Jack
2 John Mark
3 John Lucy
4 Paul Thomas
5 Paul Christopher
6 Robert Emily
7 Robert Julian
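In newer pandas versions (1.1 and later, as far as I know) the groupby object has its own `sample` method, which avoids the `apply` and the index juggling. A minimal sketch with the question's data; `random_state` is optional and is only set here so the draw is reproducible, and the name `picked` is just for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "father_name": ["Robert", "Robert", "Robert", "Carl", "Carl", "John",
                    "John", "John", "Paul", "Paul", "Robert", "Carl"],
    "child_name": ["Julian", "Emily", "Dan", "Jack", "Rose", "Lucy",
                   "Mark", "Alysha", "Christopher", "Thomas", "Kevin", "Elisabeth"],
})

# Draw two rows at random from each father's group.
picked = (
    df.groupby("father_name")
      .sample(n=2, random_state=0)
      .reset_index(drop=True)
)
print(picked)
```

If some father had fewer than two children, `sample(n=2)` would raise an error, so this relies on the guarantee stated in the question.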
I have a pandas dataframe with a few labeled series in it, say Name and Villain.
Say the dataframe has values such as:
Name: {'Batman', 'Batman', 'Spiderman', 'Spiderman', 'Spiderman', 'Spiderman'}
Villain: {'Joker', 'Bane', 'Green Goblin', 'Electro', 'Venom', 'Dr Octopus'}
In total, the above dataframe has 2 series (or columns), each with six data points.
Now, based on the Name, I want to add 3 more columns, FirstName, LastName and LoveInterest, to each row.
The result should add 'Bruce; Wayne; Catwoman' to every row where Name is Batman, and 'Peter; Parker; MaryJane' to every row where Name is Spiderman.
The final result should be a dataframe with 5 columns (series) and 6 rows.
This is a classic inner-join scenario. In pandas, use the merge module-level function:
In [13]: df1
Out[13]:
Name Villain
0 Batman Joker
1 Batman Bane
2 Spiderman Green Goblin
3 Spiderman Electro
4 Spiderman Venom
5 Spiderman Dr. Octopus
In [14]: df2
Out[14]:
FirstName LastName LoveInterest Name
0 Bruce Wayne Catwoman Batman
1 Peter Parker MaryJane Spiderman
In [15]: pd.merge(df1, df2, on='Name')
Out[15]:
Name Villain FirstName LastName LoveInterest
0 Batman Joker Bruce Wayne Catwoman
1 Batman Bane Bruce Wayne Catwoman
2 Spiderman Green Goblin Peter Parker MaryJane
3 Spiderman Electro Peter Parker MaryJane
4 Spiderman Venom Peter Parker MaryJane
5 Spiderman Dr. Octopus Peter Parker MaryJane
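For reference, here is a self-contained sketch of the same join that can be run as is. It deviates from the answer in two optional ways, both of which are my additions: `how="left"` keeps every row of `df1` even if a hero has no entry in `df2`, and `validate="many_to_one"` makes pandas raise an error if `df2` lists the same Name twice.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["Batman", "Batman", "Spiderman", "Spiderman", "Spiderman", "Spiderman"],
    "Villain": ["Joker", "Bane", "Green Goblin", "Electro", "Venom", "Dr. Octopus"],
})
df2 = pd.DataFrame({
    "FirstName": ["Bruce", "Peter"],
    "LastName": ["Wayne", "Parker"],
    "LoveInterest": ["Catwoman", "MaryJane"],
    "Name": ["Batman", "Spiderman"],
})

# Many villain rows map to one hero row, hence many_to_one.
result = df1.merge(df2, on="Name", how="left", validate="many_to_one")
print(result.shape)  # (6, 5)
```

With this sample data the left join and the default inner join give the same result, because every Name in `df1` has a match in `df2`.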