Converting 2 columns of names into 4 columns of names using Pandas - python

I have an Excel file that consists of two columns: last_name, first_name. The list is sorted by years of experience. I would like to create a new Excel file (or text file) that prints the names two-by-two. For example, it takes
Last First
Smith Joe
Jones Mary
Johnson Ken
etc
and converts it to
Smith Joe Jones Mary
Johnson Ken etc.
effectively printing every other name on the same row as the name above.
I have reached the point where the names can be printed into a single set of columns, but I can't move every other name to adjacent columns.
Thanks

TRY:
result = pd.concat([df.iloc[::2].reset_index(drop=True),
                    df.iloc[1::2].reset_index(drop=True)], axis=1)
OUTPUT:
Last First Last First
0 Smith Joe Jones Mary
1 Johnson Ken etc None
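The approach above can be sketched end to end as follows. This is a minimal example under assumptions: the sample frame is made up to mirror the question, the numbered column names (Last_1, First_1, ...) are hypothetical and only there to avoid duplicate headers, and the final write step is left as a comment because the real file names are not given.

```python
import pandas as pd

# Hypothetical sample data; the real frame would come from pd.read_excel(...)
df = pd.DataFrame({
    "Last": ["Smith", "Jones", "Johnson"],
    "First": ["Joe", "Mary", "Ken"],
})

# Rows at even positions form the left pair, rows at odd positions the right pair
left = df.iloc[::2].reset_index(drop=True)
right = df.iloc[1::2].reset_index(drop=True)

# axis=1 places the halves side by side; a missing partner is padded with NaN
result = pd.concat([left, right], axis=1)

# Hypothetical column names, chosen only to keep the headers unique
result.columns = ["Last_1", "First_1", "Last_2", "First_2"]

# Replace the NaN padding with empty strings so an odd final row looks blank
result = result.fillna("")
print(result)

# result.to_excel("paired_names.xlsx", index=False)  # assumed output file name
```

With an odd number of names, the last row simply has empty cells in the second pair of columns.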

Related

Pivot - Transpose Vertical Data with repeated rows into Horizontal Data with one row per ID [duplicate]

This question already has answers here:
How can I pivot a dataframe?
(5 answers)
Closed 8 months ago.
I have survey data that was exported as a vertical (long-format) dataframe. Every time a person responded to the 3 questions in the survey, their row was repeated 3 times, with only the question text and their answer differing. I am trying to transpose/pivot this data so that each of the 3 questions gets its own column holding the responses, alongside the person's details like ID, Full Name, Location, etc.
Here is what it looks like currently:
ID Full Name Location Question Multiple Choice Question Answers
12345 John Smith UK 1. It was easy to report my sickness. Agree
12345 John Smith UK 2. I felt ready to return from Quarantine. Neutral
12345 John Smith UK 3. I am satisfied with the adjustments made. Disagree
.. ... ... ... ...
67891 Jane Smith UK 1. It was easy to report my sickness. Agree
67891 Jane Smith UK 2. I felt ready to return from Quarantine. Agree
67891 Jane Smith UK 3. I am satisfied with the adjustments made. Agree
and this is how I want it:
ID Full Name Location 1. It was easy to report my sickness. 2. I was satisfied with the support I received. 3. I felt ready to return from Quarantine.
12345 John Smith UK Agree Neutral Disagree
67891 Jane Smith UK Agree Agree Disagree
Currently I'm trying to use this code to get my desired output but I can only get the IDs and Full Names to isolate without duplicating and the other columns just show up as individual rows.
column_indices1 = [2,3,4]
df5 = df4.pivot_table(index = ['ID', 'Full Name'], columns = df4.iloc[:, column_indices1], \
values = 'Multiple Choice Question Answer', \
fill_value = 0)
Concept
In this scenario, we should consider using:
pivot(): Pivot without aggregation that can handle non-numeric data.
Practice
Prepare data
import pandas as pd
data = {'ID': [12345, 12345, 12345, 67891, 67891, 67891],
        'Full Name': ['John Smith', 'John Smith', 'John Smith', 'Jane Smith', 'Jane Smith', 'Jane Smith'],
        'Location': ['UK', 'UK', 'UK', 'UK', 'UK', 'UK'],
        'Question': ['Q1', 'Q2', 'Q3', 'Q1', 'Q2', 'Q3'],
        'Answers': ['Agree', 'Neutral', 'Disagree', 'Agree', 'Agree', 'Agree']}
df = pd.DataFrame(data=data)
df
Output
   ID     Full Name   Location  Question  Answers
0  12345  John Smith  UK        Q1        Agree
1  12345  John Smith  UK        Q2        Neutral
2  12345  John Smith  UK        Q3        Disagree
3  67891  Jane Smith  UK        Q1        Agree
4  67891  Jane Smith  UK        Q2        Agree
5  67891  Jane Smith  UK        Q3        Agree
Use pivot()
questionnaire = df.pivot(index=['ID','Full Name','Location'], columns='Question', values='Answers')
questionnaire
Output
Add reset_index() and rename_axis() to turn the index levels back into ordinary columns and drop the "Question" axis label, which gives the format you want:
questionnaire = questionnaire.reset_index().rename_axis(None, axis=1)
questionnaire
Output
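The two steps can also be chained into one expression. The sketch below reuses the sample data from this answer; the variable name wide is only illustrative. It assumes each (ID, Full Name, Location) group has at most one row per question, because pivot() raises a ValueError on duplicates. In that case pivot_table(..., aggfunc="first") is one possible fallback.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [12345, 12345, 12345, 67891, 67891, 67891],
    "Full Name": ["John Smith"] * 3 + ["Jane Smith"] * 3,
    "Location": ["UK"] * 6,
    "Question": ["Q1", "Q2", "Q3", "Q1", "Q2", "Q3"],
    "Answers": ["Agree", "Neutral", "Disagree", "Agree", "Agree", "Agree"],
})

wide = (
    # One row per respondent, one column per question, answers as cell values
    df.pivot(index=["ID", "Full Name", "Location"], columns="Question", values="Answers")
      # Move ID / Full Name / Location from the index back into regular columns
      .reset_index()
      # Remove the leftover "Question" label from the column axis
      .rename_axis(None, axis=1)
)
print(wide)
```

Passing a list to index requires pandas 1.1 or later.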

Pandas Number of Unique Values from 2 Fields

I am trying to find the number of unique combinations of values across 2 fields. A typical example would be last name and first name. I have a data frame.
When I do the following, I just get the number of unique values in each column separately (in this case, Last and First), not for the composite.
df[['Last Name','First Name']].nunique()
Thanks!
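Group by both columns first, then count the groups with ngroups (calling nunique() on the grouped frame leaves no columns to count here, because both columns are used as keys):
>>> df.groupby(['First Name', 'Last Name']).ngroups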
IIUC, you could use value_counts() for that:
df[['Last Name','First Name']].value_counts().size
3
For another example, if you start with this extended data frame that contains some dups:
Last Name First Name
0 Smith Bill
1 Johnson Bill
2 Smith John
3 Curtis Tony
4 Taylor Elizabeth
5 Smith Bill
6 Johnson Bill
7 Smith Bill
Then value_counts() gives you the counts by unique composite last-first name:
df[['Last Name','First Name']].value_counts()
Last Name First Name
Smith Bill 3
Johnson Bill 2
Curtis Tony 1
Smith John 1
Taylor Elizabeth 1
Then the size of that Series gives you the number of unique composite last-first names:
df[['Last Name','First Name']].value_counts().size
5
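For comparison, here is a small sketch that counts the unique last-first combinations in three ways on the extended data frame above. The variable names are only illustrative. All three agree on this data; they can differ if either column contains missing values, since groupby and value_counts drop NaN keys by default while drop_duplicates keeps them.

```python
import pandas as pd

df = pd.DataFrame({
    "Last Name": ["Smith", "Johnson", "Smith", "Curtis", "Taylor", "Smith", "Johnson", "Smith"],
    "First Name": ["Bill", "Bill", "John", "Tony", "Elizabeth", "Bill", "Bill", "Bill"],
})
cols = ["Last Name", "First Name"]

# 1) Drop repeated (last, first) pairs and count the rows that remain
n_drop = len(df.drop_duplicates(subset=cols))

# 2) Group on both columns and ask how many groups were formed
n_groups = df.groupby(cols).ngroups

# 3) Count occurrences per pair, then take the number of distinct pairs
n_counts = df[cols].value_counts().size

print(n_drop, n_groups, n_counts)
```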

Python/Pandas - If Column A equals X or Y, then assign value from Col B. If not, then assign Col C. How to write in Python?

I'm having trouble formulating this statement in Pandas that would be very simple in excel. I have a dataframe sample as follows:
colA colB colC
10 0 27:15 John Doe
11 0 24:33 John Doe
12 1 29:43 John Doe
13 Inactive John Doe None
14 N/A John Doe None
Obviously the dataframe is much larger than this, with 10,000+ rows, so I'm trying to find an easier way to do this. I want to create a column that checks if colA is equal to 0 or 1. If so, the new column equals colC. If not, it equals colB. In Excel, I would simply create a new column (new_col) and write
=IF(OR(A2=0,A2=1),C2,B2)
And then drag fill the entire sheet.
I'm sure this is fairly simple but I cannot for the life of me figure this out.
Result should look like this
colA colB colC new_col
10 0 27:15 John Doe John Doe
11 0 24:33 John Doe John Doe
12 1 29:43 John Doe John Doe
13 Inactive John Doe None John Doe
14 N/A John Doe None John Doe
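np.where should do the trick. It takes colC where colA is 0 or 1 and colB everywhere else, which matches the expected result above:
import numpy as np
df['new_col'] = np.where(df['colA'].isin([0, 1]), df['colC'], df['colB'])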
Here is a solution that collects the results in a list according to your conditions, then adds the list back to the dataframe as a new column, colD. It iterates over the values with zip rather than looking rows up by position, because the index in your sample starts at 10.
your_results = []
for a, b, c in zip(df["colA"], df["colB"], df["colC"]):
    if a == 0 or a == 1:
        your_results.append(c)
    else:
        your_results.append(b)
df["colD"] = your_results
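A self-contained sketch of the vectorized version, using the sample rows from the question. One assumption is labeled here: since colA mixes numbers with text such as "Inactive", the values 0 and 1 may arrive as either integers or strings depending on how the file was read, so the comparison is done on the string form. Floats such as 0.0 would need extra handling.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "colA": [0, 0, 1, "Inactive", "N/A"],
        "colB": ["27:15", "24:33", "29:43", "John Doe", "John Doe"],
        "colC": ["John Doe", "John Doe", "John Doe", None, None],
    },
    index=[10, 11, 12, 13, 14],
)

# Assumption: compare as strings so that 0, "0", 1 and "1" are treated alike
is_zero_or_one = df["colA"].astype(str).isin(["0", "1"])

# Where colA is 0 or 1 take colC, otherwise take colB
df["new_col"] = np.where(is_zero_or_one, df["colC"], df["colB"])
print(df)
```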

Break up a data-set into separate excel files based on a certain row value in a given column in Pandas?

I have a fairly large dataset that I would like to split into separate excel files based on the names in column A ("Agent" column in the example provided below). I've provided a rough example of what this data-set looks like in Ex1 below.
Using pandas, what is the most efficient way to create a new excel file for each of the names in column A, or the Agent column in this example, preferably with the name found in column A used in the file title?
For example, in the given example, I would like separate files for John Doe, Jane Doe, and Steve Smith containing the information that follows their names (Business Name, Business ID, etc.).
Ex1
Agent Business Name Business ID Revenue
John Doe Bobs Ice Cream 12234 $400
John Doe Car Repair 445848 $2331
John Doe Corner Store 243123 $213
John Doe Cool Taco Stand 2141244 $8912
Jane Doe Fresh Ice Cream 9271499 $2143
Jane Doe Breezy Air 0123801 $3412
Steve Smith Big Golf Range 12938192 $9912
Steve Smith Iron Gyms 1231233 $4133
Steve Smith Tims Tires 82489233 $781
I believe python / pandas would be an efficient tool for this, but I'm still fairly new to pandas, so I'm having trouble getting started.
I would loop over the groups of names, then save each group to its own Excel file (.xlsx, since recent pandas versions can no longer write the old .xls format):
s = df.groupby('Agent')
for name, group in s:
    group.to_excel(f"{name}.xlsx")
Use a list comprehension with groupby on the Agent column:
dfs = [d for _, d in df.groupby('Agent')]
for d in dfs:
    print(d, '\n')
Output
Agent Business Name Business ID Revenue
4 Jane Doe Fresh Ice Cream 9271499 $2143
5 Jane Doe Breezy Air 123801 $3412
Agent Business Name Business ID Revenue
0 John Doe Bobs Ice Cream 12234 $400
1 John Doe Car Repair 445848 $2331
2 John Doe Corner Store 243123 $213
3 John Doe Cool Taco Stand 2141244 $8912
Agent Business Name Business ID Revenue
6 Steve Smith Big Golf Range 12938192 $9912
7 Steve Smith Iron Gyms 1231233 $4133
8 Steve Smith Tims Tires 82489233 $781
Grouping is what you are looking for here. You can iterate over the groups, which gives you the grouping attributes and the data associated with that group. In your case, the Agent name and the associated business columns.
Code:
import pandas as pd
# make up some data
ex1 = pd.DataFrame([['A', 1], ['A', 2], ['B', 3], ['B', 4]], columns=['letter', 'number'])
# iterate over the grouped data and export the data frames to Excel workbooks
for group_name, data in ex1.groupby('letter'):
    # you probably have more complicated naming logic
    # use index=False if you have not set an index on the dataframe, to avoid an extra column of indices
    data.to_excel(group_name + '.xlsx', index=False)
Use the unique values in the column to subset the data and write each subset to CSV, using the value as the file name:
import pandas as pd
for unique_val in df['Agent'].unique():
    df[df['Agent'] == unique_val].to_csv(f"{unique_val}.csv")
If you need Excel:
for unique_val in df['Agent'].unique():
    df[df['Agent'] == unique_val].to_excel(f"{unique_val}.xlsx")
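The groupby approach can be sketched with two practical additions that are assumptions rather than part of the answers above: a hypothetical safe_filename helper, in case agent names contain characters that are awkward in file names, and a temporary output folder standing in for a real one. The sketch writes CSV so that it runs without an Excel engine installed. The sample rows are a shortened, made-up subset of Ex1.

```python
import re
import tempfile
from pathlib import Path

import pandas as pd

# Business ID is kept as text so leading zeros (e.g. 0123801) are not lost
df = pd.DataFrame({
    "Agent": ["John Doe", "John Doe", "Jane Doe", "Steve Smith"],
    "Business Name": ["Bobs Ice Cream", "Car Repair", "Breezy Air", "Iron Gyms"],
    "Business ID": ["12234", "445848", "0123801", "1231233"],
    "Revenue": ["$400", "$2331", "$3412", "$4133"],
})


def safe_filename(name: str) -> str:
    """Hypothetical helper: replace runs of unusual characters with '_'."""
    return re.sub(r"[^A-Za-z0-9_-]+", "_", str(name)).strip("_")


out_dir = Path(tempfile.mkdtemp())  # assumption: replace with a real folder

written = []
for agent, group in df.groupby("Agent"):
    path = out_dir / f"{safe_filename(agent)}.csv"
    # index=False avoids writing the original row numbers as an extra column
    group.to_csv(path, index=False)
    written.append(path.name)

print(written)
```

For Excel output, the to_csv call can be swapped for group.to_excel(path.with_suffix(".xlsx"), index=False), which needs an engine such as openpyxl to be installed.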

Merging two columns with different information, python

I have a dataframe with one column of last names, and one column of first names. How do I merge these columns so that I have one column with first and last names?
Here is what I have:
First Name (Column 1)
John
Lisa
Jim
Last Name (Column 2)
Smith
Brown
Dandy
This is what I want:
Full Name
John Smith
Lisa Brown
Jim Dandy.
Thank you!
Try
df.assign(name = df.apply(' '.join, axis = 1)).drop(['first name', 'last name'], axis = 1)
You get
name
0 bob smith
1 john smith
2 bill smith
Here's a sample df:
df
first name last name
0 bob smith
1 john smith
2 bill smith
You can do the following to combine columns:
df['combined']= df['first name'] + ' ' + df['last name']
df
first name last name combined
0 bob smith bob smith
1 john smith john smith
2 bill smith bill smith
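Plain string addition returns NaN for any row where one of the names is missing. A minimal sketch of one alternative uses Series.str.cat with na_rep, on the column names from the question. The fourth row, with no last name, is a made-up case added only to show the behaviour.

```python
import pandas as pd

df = pd.DataFrame({
    "First Name": ["John", "Lisa", "Jim", "Cher"],
    "Last Name": ["Smith", "Brown", "Dandy", None],  # last row is hypothetical
})

# Join the columns with a space; missing parts are treated as empty strings,
# and the final strip removes the stray space that leaves behind
df["Full Name"] = (
    df["First Name"].str.cat(df["Last Name"], sep=" ", na_rep="").str.strip()
)
print(df)
```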
