reshaping dataframe two columns into one column and two rows - python

Suppose I have the following df that I would like to reshape:
df6 = pd.DataFrame({'name':['Sara', 'John', 'Jack'],
'trip places': ['UK,UK,UK', 'US,US,US', 'AUS,AUS,AUS'],
'Trip code': ['UK322,UK454,UK4441', 'US664,US4544,US44', 'AUS11,AUS11,AUS11']
})
df6
Looks like:
name trip places Trip code
0 Sara UK,UK,UK UK322,UK454,UK4441
1 John US,US,US US664,US4544,US44
2 Jack AUS,AUS,AUS AUS11,AUS11,AUS11
I want to add a new column, let's say df6['total-info'], that merges the two current columns, trip places and Trip code, into two rows per name, so the output looks like this:
name total-info
0 Sara UK,UK,UK
UK322,UK454,UK4441
1 John US,US,US
US664,US4544,US44
2 Jack AUS,AUS,AUS
AUS11,AUS11,AUS11
I tried many methods (groupby, stack/unstack, pivot, etc.), but none of them generated the output I need, and I am not familiar enough with pandas to know which function is best for this. I also tried concatenation, but that produced a single column with the comma-separated values of both columns joined together.

Use set_index, stack, droplevel then reset_index and specify the new column name:
df7 = (
df6
.set_index('name') # Preserve during reshaping
.stack() # Reshape
.droplevel(1) # Remove column names
.reset_index(name='total-info') # reset_index and name new column
)
df7:
name total-info
0 Sara UK,UK,UK
1 Sara UK322,UK454,UK4441
2 John US,US,US
3 John US664,US4544,US44
4 Jack AUS,AUS,AUS
5 Jack AUS11,AUS11,AUS11
Or, if name should be part of the MultiIndex, pass append=True to set_index and call to_frame after stack and droplevel instead:
df7 = (
df6
.set_index('name', append=True) # Preserve during reshaping
.stack() # Reshape
.droplevel(2) # Remove column names
.to_frame(name='total-info') # Make DataFrame and name new column
)
total-info
name
0 Sara UK,UK,UK
Sara UK322,UK454,UK4441
1 John US,US,US
John US664,US4544,US44
2 Jack AUS,AUS,AUS
Jack AUS11,AUS11,AUS11
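For comparison, here is a minimal sketch of the same reshape using melt instead of stack. This is an alternative that the answer above does not use; it assumes pandas 1.1 or later (for the ignore_index parameter), and the variable name out is only illustrative.

```python
import pandas as pd

df6 = pd.DataFrame({
    'name': ['Sara', 'John', 'Jack'],
    'trip places': ['UK,UK,UK', 'US,US,US', 'AUS,AUS,AUS'],
    'Trip code': ['UK322,UK454,UK4441', 'US664,US4544,US44', 'AUS11,AUS11,AUS11'],
})

out = (
    df6
    # One row per (name, original column); keep the original row labels
    .melt(id_vars='name', value_name='total-info', ignore_index=False)
    # A stable sort on the row labels puts each name's two rows next to each other
    .sort_index(kind='stable')
    # 'variable' holds the old column names, which are not wanted here
    .drop(columns='variable')
    .reset_index(drop=True)
)
print(out)
```

Whether melt or stack reads better is mostly a matter of taste; stack keeps the rows interleaved without the extra sort.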

Related

String slice of a column in a dataframe [duplicate]

This question already has answers here:
Pandas make new column from string slice of another column
(3 answers)
Closed 4 months ago.
data = [['Tom', '5-123g'], ['Max', '6-745.0d'], ['Bob', '5-900.0e'], ['Ben', '2-345',], ['Eva', '9-712.x']]
df = pd.DataFrame(data, columns=['Person', 'Action'])
I want to shorten the "Action" column to a length of 5. My current df has two columns:
['Person'] and ['Action']
I need it to look like this:
Person Action Action_short
0 Tom 5-123g 5-123
1 Max 6-745.0d 6-745
2 Bob 5-900.0e 5-900
3 Ben 2-345 2-345
4 Eva 9-712.x 9-712
What I've tried:
Checking the type of the column:
df['Action'].dtypes
The output is:
dtype('O')
Then I tried:
df['Action'] = df['Action'].map(str)
df['Action_short'] = df.Action.str.slice(start=0, stop=5)
I also tried it with:
df['Action'] = df['Action'].astype(str)
df['Action'] = df['Action'].values.astype(str)
df['Action'] = df['Action'].map(str)
df['Action'] = df['Action'].apply(str)
and with:
df['Action_short'] = df.Action.str.slice(0:5)
df['Action_short'] = df.Action.apply(lambda x: x[:5])
df['pos'] = df['Action'].str.find('.')
df['new_var'] = df.apply(lambda x: x['Action'][0:x['pos']],axis=1)
The output from all my versions was:
Person Action Action_short
0 Tom 5-123g 5-12
1 Max 6-745.0d 6-745
2 Bob 5-900.0e 5-90
3 Ben 2-345 2-34
4 Eva 9-712.x 9-712
The lambda function does not work with 3-222; it slices it to 3-22.
I don't understand why it works for some rows and not for others.
Try this:
df['Action_short'] = df['Action'].str.slice(0, 5)
By using .str on a pd.Series (such as a single column of a DataFrame), you can access pandas string methods that are designed to look like the string operations on standard Python strings.
# slice by specifying the length you need
df['Action_short']=df['Action'].str[:5]
df
Person Action Action_short
0 Tom 5-123g 5-123
1 Max 6-745.0d 6-745
2 Bob 5-900.0e 5-900
3 Ben 2-345 2-345
4 Eva 9-712.x 9-712
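The two answers above can be checked side by side with the sketch below. The second half is only a guess at why the question's output was one character short: it assumes, without confirmation from the question, that some real values carried a leading space. The padded series is hypothetical data made up for illustration.

```python
import pandas as pd

data = [['Tom', '5-123g'], ['Max', '6-745.0d'], ['Bob', '5-900.0e'],
        ['Ben', '2-345'], ['Eva', '9-712.x']]
df = pd.DataFrame(data, columns=['Person', 'Action'])

# Both spellings take the first five characters of every string
df['Action_short'] = df['Action'].str.slice(0, 5)
assert df['Action_short'].equals(df['Action'].str[:5])

# Hypothetical value with a leading space (an assumption, not from the question)
padded = pd.Series([' 5-123g'])
print(padded.str[:5].tolist())   # the space counts as one of the five characters

# Stripping surrounding whitespace before slicing avoids that
cleaned = padded.str.strip().str[:5]
print(cleaned.tolist())
```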

Count visitor with same id but different name and show it

I have a dataframe:
df1 = pd.DataFrame({'id': ['1','2','2','3','3','4','4'],
'name': ['James','Jim','jimy','Daniel','Dane','Ash','Ash'],
'event': ['Basket','Soccer','Soccer','Basket','Soccer','Basket','Soccer']})
I want to count the values for each id, together with the names. The result I expect is:
id name count
1 James 1
2 Jim, jimy 2
3 Daniel, Dane 2
4 Ash 2
I tried to group by id and name, but it doesn't count as I expected.
You could try:
df1.groupby('id').agg(
name=('name', lambda x: ', '.join(x.unique())),
count=('name', 'count')
)
We are basically grouping by id and then joining the unique names to a comma separated list!
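As a runnable sketch of the named aggregation above, the snippet below also adds an n_names column (a name introduced here only for illustration) to show how 'count' differs from 'nunique'. The expected table in the question has a count of 2 for Ash, which matches the row count rather than the number of distinct names.

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': ['1', '2', '2', '3', '3', '4', '4'],
    'name': ['James', 'Jim', 'jimy', 'Daniel', 'Dane', 'Ash', 'Ash'],
    'event': ['Basket', 'Soccer', 'Soccer', 'Basket', 'Soccer', 'Basket', 'Soccer'],
})

out = (
    df1.groupby('id')
    .agg(
        name=('name', lambda x: ', '.join(x.unique())),  # distinct names, in order of appearance
        count=('name', 'count'),                         # non-null rows per id
        n_names=('name', 'nunique'),                     # distinct names per id (illustrative extra)
    )
    .reset_index()
)
print(out)
```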
Here is a solution:
groups = df1[["id", "name"]].groupby("id")
a = groups.agg(lambda x: ", ".join( set(x) ))
b = groups.size().rename("count")
c = pd.concat([a,b], axis=1)
I'm not an expert when it comes to pandas but I thought I might as well post my solution because I think that it's straightforward and readable.
In your example, the groupby is done on the id column and not by id and name. The name column you see in your expected DataFrame is the result of an aggregation done after a groupby.
Here, it is obvious that the groupby was done on the id column.
My solution is maybe not the most straightforward but I still find it to be more readable:
Create a groupby object groups by grouping by id
Create a DataFrame a from groups by aggregating it using commas (you also need to remove the duplicates using set(...) ): lambda x: ", ".join( set(x) )
The DataFrame a will thus have the following data:
name
id
1 James
2 Jim, jimy
3 Daniel, Dane
4 Ash
Create a Series b by computing the size of each group in groups: groups.size() (you should also rename it so that it becomes the count column)
id
1 1
2 2
3 2
4 2
Name: count, dtype: int64
Concat a and b horizontally and you get what you wanted
name count
id
1 James 1
2 Jim, jimy 2
3 Daniel, Dane 2
4 Ash 2
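One caveat with the steps above: a Python set has no guaranteed order, so the joined names could come out as "jimy, Jim" rather than "Jim, jimy". A minimal variation, assuming the order of first appearance is what is wanted, swaps set(x) for dict.fromkeys(x), which drops duplicates while keeping order.

```python
import pandas as pd

df1 = pd.DataFrame({
    'id': ['1', '2', '2', '3', '3', '4', '4'],
    'name': ['James', 'Jim', 'jimy', 'Daniel', 'Dane', 'Ash', 'Ash'],
    'event': ['Basket', 'Soccer', 'Soccer', 'Basket', 'Soccer', 'Basket', 'Soccer'],
})

groups = df1[['id', 'name']].groupby('id')

# dict.fromkeys removes duplicates but, unlike set, preserves insertion order
a = groups['name'].agg(lambda x: ', '.join(dict.fromkeys(x)))
b = groups.size().rename('count')
c = pd.concat([a, b], axis=1)
print(c)
```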

Creating new rows from the elements in a column

I have a dataframe with two fields, name and alias. If one or more aliases appear, I must create a new row for each alias, with the alias value in the name field. I have something like the first table, and it should look like the second table, using pandas in Python. Thanks in advance.
IIUC, you want to add rows to get all possible names and aliases as rows.
You could get all unique names as set and reindex:
out = (
df.set_index('name')
.reindex(set(df['name']).union(*df['alias']))
.reset_index()
)
Output:
name alias
0 Juan [Perez, Juancho]
1 Perez NaN
2 Juancho NaN
Or transform "alias" to a renamed DataFrame and concat:
out = (
pd.concat([df,
df['alias'].explode().rename('name').to_frame()
])
.sort_index(kind='stable')
)
Output:
name alias
0 Juan [Perez, Juancho]
0 Perez NaN
0 Juancho NaN
Used input:
df = pd.DataFrame({'name': ['Juan'],
'alias': [['Perez', 'Juancho']]})
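The concat approach can be sketched on a slightly larger input to show what happens when a row has no aliases. The second person ('Ana', with an empty list) is hypothetical data added for illustration. Since explode turns an empty list into NaN, the sketch drops those entries before building the new rows.

```python
import pandas as pd

# 'Ana' with an empty alias list is made-up data, not from the question
df = pd.DataFrame({
    'name': ['Juan', 'Ana'],
    'alias': [['Perez', 'Juancho'], []],
})

# One entry per alias, keeping the row label of the person it came from
alias_rows = (
    df['alias']
    .explode()
    .dropna()          # empty lists explode to NaN; skip them
    .rename('name')
    .to_frame()
)

# A stable sort on the row labels places each alias right after its owner
out = pd.concat([df, alias_rows]).sort_index(kind='stable')
print(out)
```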

Copy values only to a new empty dataframe with column names - Pandas

I have two dataframes.
df1= pd.DataFrame({'person_id':[1,2,3],'gender': ['Male','Female','Not disclosed'],'ethnicity': ['Chinese','Indian','European']})
df2 = pd.DataFrame(columns=['pid','gen','ethn'])
As you can see, the second dataframe (df2) is empty, but it may also contain a few rows of data at times.
What I would like to do is copy only the values from df1 to df2, with the column names of df2 remaining unchanged.
I tried the following, but neither worked:
df2 = df1.copy(deep=False)
df2 = df1.copy(deep=True)
How can I achieve my output to be like this? Note that I don't want the column names of df1. I only want the data
Do:
df1.columns = df2.columns.tolist()
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df2 = pd.concat([df2, df1])
Output:
pid gen ethn
0 1 Male Chinese
1 2 Female Indian
2 3 Not disclosed European
Edit based on OP's comment about the nature of the dataframes:
df1= pd.DataFrame({'person_id':[1,2,3],'gender': ['Male','Female','Not disclosed'],'ethn': ['Chinese','Indian','European']})
df2= pd.DataFrame({'pers_id':[4,5,6],'gen': ['Male','Female','Not disclosed'],'ethnicity': ['Chinese','Indian','European']})
df3= pd.DataFrame({'son_id':[7,8,9],'sex': ['Male','Female','Not disclosed'],'ethnici': ['Chinese','Indian','European']})
final_df = pd.DataFrame(columns=['pid','gen','ethn'])
Now do:
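frame = [df1, df2, df3]
for i in range(len(frame)):
    # Give every frame the target column names, then add its rows
    frame[i].columns = final_df.columns.tolist()
    # DataFrame.append was removed in pandas 2.0, so use pd.concat
    final_df = pd.concat([final_df, frame[i]])
print(final_df)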
Output:
pid gen ethn
0 1 Male Chinese
1 2 Female Indian
2 3 Not disclosed European
0 4 Male Chinese
1 5 Female Indian
2 6 Not disclosed European
0 7 Male Chinese
1 8 Female Indian
2 9 Not disclosed European
The cleanest solution, I think, is to rebuild df1 with df2's column names and then concatenate it onto df2 (DataFrame.append was removed in pandas 2.0):
df2 = pd.concat([df2, pd.DataFrame(df1.values, columns=df2.columns)])
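If df1 should keep its own column names, one possible variation is to relabel a copy with set_axis and concatenate that. This is a sketch rather than what any of the answers above does; the name renamed is illustrative, and ignore_index=True is an optional choice that renumbers the rows.

```python
import pandas as pd

df1 = pd.DataFrame({
    'person_id': [1, 2, 3],
    'gender': ['Male', 'Female', 'Not disclosed'],
    'ethnicity': ['Chinese', 'Indian', 'European'],
})
df2 = pd.DataFrame(columns=['pid', 'gen', 'ethn'])

# set_axis returns a relabelled frame; df1 itself keeps its column names
renamed = df1.set_axis(df2.columns, axis=1)

# Columns are matched by label, so the values land under df2's names
df2 = pd.concat([df2, renamed], ignore_index=True)
print(df2)
```

Recent pandas versions may emit a FutureWarning when one of the concatenated frames is empty; that does not change the values produced here.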

transforming single dimension dataframe into form of tables

I have a dataframe with four records:
Name: Bob
College Name:Boston
Name:Ready
College Name:IIT KGP
I want to transform that into a table with two columns in Python, like:
Name College
Bob Boston
Ready IIT
The separator should be ":".
First split the column values on the first ":", add a counter with cumcount, and reshape with unstack:
df = df['col'].str.split(':', expand=True, n=1)
df.columns = ['a','b']
df1 = (df.set_index(['a',df.groupby('a').cumcount()])['b']
.unstack()
.T
.rename_axis(None, axis=1)
.reindex(columns=df['a'].unique()))
print (df1)
Name College Name
0 Bob Boston
1 Ready IIT KGP
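Since the answer does not show its input, here is a self-contained sketch of the same idea. It assumes the records sit in a single column named col (as in the answer) and uses pivot in place of set_index/unstack/T; the names parts, key, value, row and table are illustrative. It also strips the space that "Name: Bob" leaves in front of the value.

```python
import pandas as pd

# Assumed input layout: one record per row in a column called 'col'
df = pd.DataFrame({'col': ['Name: Bob', 'College Name:Boston',
                           'Name:Ready', 'College Name:IIT KGP']})

# Split on the first ':' only, so values containing ':' stay whole
parts = df['col'].str.split(':', n=1, expand=True)
parts.columns = ['key', 'value']
parts['value'] = parts['value'].str.strip()

# Number the occurrences of each key: 0 for the first person, 1 for the second
parts['row'] = parts.groupby('key').cumcount()

table = parts.pivot(index='row', columns='key', values='value')
# pivot sorts the columns; restore the order in which the keys first appear
table = table[parts['key'].unique()].rename_axis(index=None, columns=None)
print(table)
```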
