Create sorting dictionary - python

Write the function dataframe that takes a
dictionary as input, creates a dataframe
from the dictionary, and sorts it.
Instructions
1. Create a dataframe with the input dictionary
2. Columns should be Name and Age
3. Print "Before Sorting"
4. Print a Newline
5. Print the dataframe before sorting.
Note: The printed dataframe must not include the index.
6. Print a Newline
7. Sort the dataframe in ascending order based on Age column
8. Print "After Sorting"
9. Print a Newline
10. Print the dataframe after sorting. Note: The printed dataframe must not include the index.
Sample Input (it may change between test cases, so the input below cannot be hard-coded)
{'William': 42, 'George': 10, 'Joseph': 22, 'Henry': 15, 'Samuel': 32, 'David': 18}
Sample Output
Before Sorting
Name Age
William 42
George 10
Joseph 22
Henry 15
Samuel 32
David 18
After Sorting
Name Age
George 10
Henry 15
David 18
Joseph 22
Samuel 32
William 42
import pandas
import ast
# Enter your code here. Read input from STDIN. Print output to STDOUT
def dataframe(key, value):
    STDIN = {key: value}

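import pandas as pd

def dataframe(data):
    # Build a two-column frame from the {name: age} dictionary
    df = pd.DataFrame(list(data.items()), columns=["Name", "Age"])
    print("Before Sorting")
    print(df)
    # Sort the rows by Age in ascending order
    df.sort_values(by=["Age"], inplace=True)
    print("After Sorting")
    print(df)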
Output:
Before Sorting
Name Age
0 William 42
1 George 10
2 Joseph 22
3 Henry 15
4 Samuel 32
5 David 18
After Sorting
Name Age
1 George 10
3 Henry 15
5 David 18
2 Joseph 22
4 Samuel 32
0 William 42
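The output above still shows the index column, while the task asks for it to be left out. A minimal sketch of one way to meet that requirement is below, assuming the input is a {name: age} dictionary as in the sample. The helper format_report is a hypothetical name introduced here, and using blank lines for the "Print a Newline" steps is a guess at what the grader expects.

```python
import pandas as pd


def format_report(data):
    """Return the full report text for a {name: age} dictionary."""
    # One row per dictionary item, with the column names the task asks for
    df = pd.DataFrame(list(data.items()), columns=["Name", "Age"])
    before = df.to_string(index=False)  # index=False omits the row labels
    after = df.sort_values(by="Age").to_string(index=False)
    # Blank lines stand in for the "Print a Newline" steps (an assumption)
    return f"Before Sorting\n\n{before}\n\nAfter Sorting\n\n{after}"


def dataframe(data):
    print(format_report(data))


sample = {"William": 42, "George": 10, "Joseph": 22,
          "Henry": 15, "Samuel": 32, "David": 18}
dataframe(sample)
```

On the judging platform the dictionary would presumably be parsed from STDIN, for example with ast.literal_eval(input()). That is an assumption based on the import ast line in the template.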

Related

How to leave certain values (which have a comma in them) intact when separating list-values in strings in pandas?

From the dataframe, I create a new dataframe, in which the values from the "Select activity" column contain lists, which I will split and transform into new rows. But there is a value: "Nothing, just walking", which I need to leave unchanged. Tell me, please, how can I do this?
The original dataframe looks like this:
   Name  Age             Select activity  Profession
0   Ann   25            Cycling, Running  Saleswoman
1  Mark   30       Nothing, just walking     Manager
2  John   41  Cycling, Running, Swimming  Accountant
My code looks like this:
df_new = df.loc[:, ['Name', 'Age']]
df_new['Activity'] = df['Select activity'].str.split(', ')
df_new = df_new.explode('Activity').reset_index(drop=True)
I get this result:
Name Age Activity
0 Ann 25 Cycling
1 Ann 25 Running
2 Mark 30 Nothing
3 Mark 30 just walking
4 John 41 Cycling
5 John 41 Running
6 John 41 Swimming
So that the value "Nothing, just walking" is not split into two values, I added the following line:
if df['Select activity'].isin(['Nothing, just walking']) is False:
But it throws an error.
You can use a lookahead after the comma so that the split happens only when a capital letter follows. So instead of ", " the separator becomes ", (?=[A-Z])"
df_new = df.loc[:, ["Name", "Age"]]
df_new["Activity"] = df["Select activity"].str.split(", (?=[A-Z])")
df_new = df_new.explode("Activity", ignore_index=True)
I only changed the separator and passed ignore_index=True to explode instead of resetting the index afterwards (and switched to double quotes). This gives:
>>> df_new
Name Age Activity
0 Ann 25 Cycling
1 Ann 25 Running
2 Mark 30 Nothing, just walking
3 John 41 Cycling
4 John 41 Running
5 John 41 Swimming
Or as a single chained expression:
df_new = (df.loc[:, ["Name", "Age"]]
            .assign(Activity=df["Select activity"].str.split(", (?=[A-Z])"))
            .explode("Activity", ignore_index=True))
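To see why the lookahead leaves "Nothing, just walking" intact, here is a small sketch with the standard re module on plain strings. split_activities and SEPARATOR are illustrative names, not part of the answer above.

```python
import re

# Split only on a comma+space that is followed by an uppercase letter.
# (?=[A-Z]) is a zero-width lookahead: it checks the next character
# without consuming it, so the capital letter stays in the next piece.
SEPARATOR = re.compile(r", (?=[A-Z])")


def split_activities(text):
    return SEPARATOR.split(text)


print(split_activities("Cycling, Running, Swimming"))  # ['Cycling', 'Running', 'Swimming']
print(split_activities("Nothing, just walking"))       # ['Nothing, just walking']
```

Note that this relies on every real activity starting with a capital letter. A value such as "Nothing, Just walking" would still be split.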

Pandas return column data as list without duplicates

This is a simplified example, but I have a large categorical dataset.
Name Age Gender
John 12 Male
Ana 24 Female
Dave 16 Female
Cynthia 17 Non-Binary
Wayne 26 Male
Hebrew 29 Non-Binary
Suppose that it is assigned as df and I want it to return as a list with non-duplicate values:
'Male','Female','Non-Binary'
I tried it with this code, but this returns the gender with duplicates
list(df['Gender'])
How can I code it in pandas so that it can return values without duplicates?
In these cases, remember that df["Gender"] is a pandas Series, so you can use .drop_duplicates() to get another Series with the duplicated values removed, or .unique() to get a NumPy array containing the unique values.
>>> df["Gender"].drop_duplicates()
0          Male
1        Female
3    Non-Binary
Name: Gender, dtype: object
>>> df["Gender"].unique()
array(['Male', 'Female', 'Non-Binary'], dtype=object)
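Since the question asks for a plain Python list, a short sketch of the last step may help. The frame is rebuilt from the table in the question. The messy Series is a hypothetical input, added only to show that stray whitespace makes otherwise equal values count as different.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Ana", "Dave", "Cynthia", "Wayne", "Hebrew"],
    "Age": [12, 24, 16, 17, 26, 29],
    "Gender": ["Male", "Female", "Female", "Non-Binary", "Male", "Non-Binary"],
})

# drop_duplicates() keeps the first occurrence of each value, in order
genders = df["Gender"].drop_duplicates().tolist()
print(genders)  # ['Male', 'Female', 'Non-Binary']

# Hypothetical messy input: 'Male ' (trailing space) differs from 'Male'
messy = pd.Series(["Male ", "Female", "Male"])
cleaned = messy.str.strip().drop_duplicates().tolist()
print(cleaned)  # ['Male', 'Female']
```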

How do I increase the values of a column in Pandas?

Hello, I have this pandas code (see below), but it gives me this error: TypeError: can only concatenate str (not "int") to str
import pandas as pd
import numpy as np
import os
_data0 = pd.read_excel("C:\\Users\\HP\\Documents\\DataScience task\\Gender_Age.xlsx")
_data0['Age' + 1]
I want to change the values in the 'Age' column. For example, if I wanted to increase every value in 'Age' by 1, how would I do that? (The same goes for Number of Children.)
The output I wanted:
First Name Last Name Age Number of Children
0 Kimberly Watson 36 2
1 Victor Wilson 35 6
2 Adrian Elliott 35 2
3 Richard Bailey 36 5
4 Blake Roberts 35 6
Original output:
First Name Last Name Age Number of Children
0 Kimberly Watson 24 1
1 Victor Wilson 23 5
2 Adrian Elliott 23 1
3 Richard Bailey 24 4
4 Blake Roberts 23 5
Try:
df['Age'] = df['Age'] + 12
df['Number of Children'] = df['Number of Children'] + 1
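The error in the question comes from 'Age' + 1 being evaluated before the column lookup, which tries to add a string and an integer. A small sketch of the column-wise arithmetic follows. The frame is a stand-in built from the values shown in the question rather than the Excel file, and the increments of 12 and 1 are read off the desired output.

```python
import pandas as pd

# Stand-in for the spreadsheet in the question (values copied from it)
df = pd.DataFrame({
    "First Name": ["Kimberly", "Victor", "Adrian", "Richard", "Blake"],
    "Last Name": ["Watson", "Wilson", "Elliott", "Bailey", "Roberts"],
    "Age": [24, 23, 23, 24, 23],
    "Number of Children": [1, 5, 1, 4, 5],
})

# Select the column first, then add to the resulting Series;
# the addition is applied to every row at once.
df["Age"] = df["Age"] + 12
df["Number of Children"] += 1  # in-place form of the same operation

print(df)
```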

Pandas dataframe - How to sort (alphabetically) column values with value_counts

I am trying to sort a dataframe column's values in conjunction with value_counts.
Below is a code snippet of my algorithm:
with open(f_out_txt_2, 'w', encoding='utf-8') as f_txt_out_2:
    f_txt_out_2.write("SORTED First Names w/SORTED value counts:\n")
    # Series.iteritems() was removed in pandas 2.0; items() replaces it
    for val, cnt in df['First Name'].value_counts(sort=True).items():
        f_txt_out_2.write("\n{0:9s} {1:2d}".format(val, cnt))
Below are the first few lines of output. Note that the "First Name" values are not in alphabetical order.
How can I get the "First Name" values sorted while keeping value counts sorted?
Output:
SORTED First Names w/SORTED value counts:
Marilyn 11
Todd 10
Jeremy 10
Barbara 10
Sarah 9
Rose 9
Kathy 9
Steven 9
Irene 9
Cynthia 9
Carl 8
Alice 8
Justin 8
Bobby 8
Ruby 8
Gloria 8
Julie 8
Clarence 8
Harry 8
Andrea 8
....
Unfortunately I can't find the original source link of where I downloaded the "employee.csv" file from, but here is a sample of it to give an idea of what it contained:
I believe you would use the following code to sort by first name, then by value counts.
dfg = df.groupby('First Name').agg(value_count = ('First Name','count')).sort_values(by = ['First Name','value_count'], ascending = [True,False])
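If the goal is instead to keep the counts in descending order and sort names alphabetically only within equal counts, the sort keys can be swapped. The sketch below assumes that reading of the question and uses a hypothetical miniature of the employee data, since the original file is not available.

```python
import pandas as pd

# Hypothetical miniature of the employee data
df = pd.DataFrame({"First Name": ["Todd", "Marilyn", "Todd", "Barbara",
                                   "Marilyn", "Barbara", "Alice"]})

counts = (
    df.groupby("First Name").size()
      .reset_index(name="value_count")
      # Highest count first; ties broken alphabetically by name
      .sort_values(by=["value_count", "First Name"],
                   ascending=[False, True], ignore_index=True)
)

for name, cnt in zip(counts["First Name"], counts["value_count"]):
    print("{0:9s} {1:2d}".format(name, cnt))
```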

cleaning a column of strings in a pandas dataframe with str comprehension

I have a dataframe (df1) constructed from a survey in which participants entered their gender as a string and so there is a gender column that looks like:
id gender age
1 Male 19
2 F 22
3 male 20
4 Woman 32
5 female 26
6 Male 22
7 make 24
etc.
I've been using
df1.replace('male', 'Male')
for example, but this is really clunky and involves knowing the exact format of each response to fix it.
I've been trying to use various string comprehensions and string operations in Pandas, such as .split(), .replace(), and .capitalize(), with np.where() to try to get:
id gender age
1 Male 19
2 Female 22
3 Male 20
4 Female 32
5 Female 26
6 Male 22
7 Male 24
I'm sure there must be a way to use regex to do this but I can't seem to get the code right.
I know that it is probably a multi-step process of removing " ", then capitalising the entry, then replacing the capitalised values.
Any guidance would be much appreciated pythonistas!
Kev
Adapt the code in my comment to replace every record that starts with an f with the word Female:
import re

df1["gender"] = df1.gender.apply(lambda s: re.sub(
    "(^F)([A-Za-z]+)*",  # pattern
    "Female",            # replacement
    s.strip().title())   # string, trimmed and title-cased first
)
Similarly, swap F for M in the pattern and use Male as the replacement to handle the male entries.
Relevant regex docs
Regex help
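The pattern above does not cover "Woman", which the desired output maps to Female. A minimal sketch that handles the sample rows in one pass is below. normalise_gender is a hypothetical helper, and treating every entry starting with F or W as Female and every entry starting with M (including the typo "make") as Male is an assumption that may not suit real survey data.

```python
import re

import pandas as pd

df1 = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7],
    "gender": ["Male", "F", "male", "Woman", "female", "Male", "make"],
    "age": [19, 22, 20, 32, 26, 22, 24],
})


def normalise_gender(raw):
    # Trim whitespace and title-case so "male", " Male" and "MALE" look alike
    s = raw.strip().title()
    if re.match(r"[FW]", s):  # assumption: F... or W... means Female
        return "Female"
    if re.match(r"M", s):     # assumption: M... (even "Make") means Male
        return "Male"
    return s                  # leave anything unrecognised unchanged


df1["gender"] = df1["gender"].apply(normalise_gender)
print(df1)
```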
