Python String Padding - python

I am trying to output a string from a tuple with different widths between each element.
Here is the code I am using at the moment:
b = (tuple3[3] + ', ' + tuple3[4] + ' ' + tuple3[0] + ' '
     + tuple3[2] + ' ' + '£' + tuple3[1])
print(b)
Say for example I input these lines of text:
12345 1312 Teso Billy Jones
12344 30000 Test John M Smith
The output will be this:
Smith, John M 12344 Test £30000
Jones, Billy 12345 Teso £1312
How can I keep the padding consistent with larger spacing between the 3 parts?
Also, when I read these strings straight from a text file, this is the output I receive:
Smith
, John M 12344 Test £30000
Jones, Billy 12345 Teso £1312
How can I resolve this?
Thanks a lot.

String formatting to the rescue!
lines_of_text = [
    (12345, 1312, 'Teso', 'Billy', 'Jones'),
    (12344, 30000, 'Test', 'John M', 'Smith')
]
for mytuple in lines_of_text:
    name = '{}, {}'.format(mytuple[4], mytuple[3])
    value = '£' + str(mytuple[1])
    print('{name:<20} {id:>8} {test:<12} {value:>8}'.format(
        name=name, id=mytuple[0], test=mytuple[2], value=value)
    )
This results in:
Jones, Billy            12345 Teso            £1312
Smith, John M           12344 Test           £30000
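The stray line break after "Smith" in the question is most likely the trailing newline that each line keeps when it is read from a file. Stripping the line before splitting it should avoid that. Below is a minimal sketch, assuming every line follows the layout `id amount code first-name(s) surname` shown in the sample input. The helper name `format_record` and the column widths are illustrative, not part of the original code.

```python
def format_record(line):
    """Format one 'id amount code first... last' line into padded columns."""
    # strip() removes the trailing newline that file reads keep on each line
    parts = line.strip().split()
    ident, amount, code = parts[0], parts[1], parts[2]
    # Everything between the code and the final word is treated as first name(s)
    first, last = ' '.join(parts[3:-1]), parts[-1]
    name = '{}, {}'.format(last, first)
    return '{:<20} {:>8} {:<12} {:>8}'.format(name, ident, code, '£' + amount)


# Lines as they would arrive when iterating over a file, newline included
lines = ['12345 1312 Teso Billy Jones\n', '12344 30000 Test John M Smith\n']
for line in lines:
    print(format_record(line))
```

Because the widths are fixed in the format string, every row has the same length regardless of how long the name or amount is, as long as each value fits within its column.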

Related

In Python, trying to remove duplicate words in a dataframe, but getting an error

I'm trying to remove a duplicate word in a cell
   Current        Desired
0  John and Jane  John and Jane
1  John and John  John
2  John           John
3  Jane and Jane  Jane
I have tried the following, but the Desired column gets filled with o d i c t _ k e y s ( [ ' n a n ' ] ):
from collections import OrderedDict
df['Current'] = (df['Desired'].astype(str).str.split()
.apply(lambda x: OrderedDict.fromkeys(x).keys())
.astype(str).str.join(' '))
I have also tried this, but the Desired column gets filled with NaN:
df['Desired'] = df['Current'].str.replace(r'\b(\w+)(\s+\1)+\b', r'\1')
Split on ' and ', deduplicate with set, then join back (note that set does not preserve the original order):
df['out'] = df.Current.str.split(' and ').map(lambda x : ' and '.join(set(x)))
df
Out[876]:
         Current            out
0  John and Jane  Jane and John
1  John and John           John
2           John           John
3  Jane and Jane           Jane
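Row 0 above comes back as "Jane and John" rather than the desired "John and Jane", because a set has no guaranteed order. If the original order matters, one possible variant is to deduplicate with `dict.fromkeys`, which keeps the first occurrence of each key in insertion order (Python 3.7+). This is a sketch that assumes the Current column holds only strings, with no missing values.

```python
import pandas as pd

df = pd.DataFrame({'Current': ['John and Jane', 'John and John',
                               'John', 'Jane and Jane']})

# dict.fromkeys keeps the first occurrence of each name, in original order
df['out'] = (df['Current'].str.split(' and ')
             .map(lambda names: ' and '.join(dict.fromkeys(names))))
print(df)
```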

Split text only by comma or (comma and space) not only with space

I want a regular expression that extracts values separated by commas.
Sample input: "John Doe, Jack, , Henry,Harry,,Rob"
Expected output:
John Doe
Jack
Henry
Harry
Rob
I tried [\w ]+, but blank values and extra spaces are included.
Given:
>>> import re
>>> s = "John Doe, Jack, , Henry,Harry,,Rob"
You can do:
>>> [e for e in re.split(r'\s*,\s*',s) if e]
['John Doe', 'Jack', 'Henry', 'Harry', 'Rob']
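For comparison, the same result can probably be reached without a regular expression by splitting on the comma alone and trimming each piece. This is only a sketch of an alternative, using the same sample string. It also drops pieces that consist solely of whitespace.

```python
s = "John Doe, Jack, , Henry,Harry,,Rob"

# Split on commas, trim each piece, and drop pieces that are empty after trimming
names = [part.strip() for part in s.split(',') if part.strip()]
print(names)
```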

Python : Merge several columns of a dataframe without having duplicates of data

Let's say that I have this dataframe :
Name = ['Lolo', 'Mike', 'Tobias','Luke','Sam']
Age = [19, 34, 13, 45, 52]
Info_1 = ['Tall', 'Large', 'Small', 'Small','']
Info_2 = ['New York', 'Paris', 'Lisbon', '', 'Berlin']
Info_3 = ['Tall', 'Paris', 'Hi', 'Small', 'Thanks']
Data = [123,268,76,909,87]
Sex = ['F', 'M', 'M','M','M']
df = pd.DataFrame({'Name' : Name, 'Age' : Age, 'Info_1' : Info_1, 'Info_2' : Info_2, 'Info_3' : Info_3, 'Data' : Data, 'Sex' : Sex})
print(df)
     Name  Age  Info_1    Info_2  Info_3  Data  Sex
0    Lolo   19    Tall  New York    Tall   123    F
1    Mike   34   Large     Paris   Paris   268    M
2  Tobias   13   Small    Lisbon      Hi    76    M
3    Luke   45   Small             Small   909    M
4     Sam   52            Berlin  Thanks    87    M
I want to merge the data of four columns of this dataframe : Info_1, Info_2, Info_3, Data.
I want to merge them without duplicating data within a row. For row 0, for example, I do not want "Tall" to appear twice. So in the end I would like to get something like this:
     Name  Age                Info  Sex
0    Lolo   19   Tall New York 123    F
1    Mike   34     Large Paris 268    M
2  Tobias   13  Small Lisbon Hi 76    M
3    Luke   45           Small 909    M
4     Sam   52    Berlin Thanks 87    M
I tried this function to merge the data :
df['period'] = df[['Info_1', 'Info_2', 'Info_3', 'Data']].agg('-'.join, axis=1)
However, I get an error because it expects a string. How can I merge the data of the column "Data"? And how can I make sure that I do not create duplicates?
Thank you
Your Data column seems to be of int type. Convert it to strings first:
df['Data'] = df['Data'].astype(str)
df['period'] = (df[['Info_1','Info_2','Info_3','Data']]
.apply(lambda x: ' '.join(x[x!=''].unique()), axis=1)
)
Output:
     Name  Age  Info_1    Info_2  Info_3  Data  Sex              period
0    Lolo   19    Tall  New York    Tall   123    F   Tall New York 123
1    Mike   34   Large     Paris   Paris   268    M     Large Paris 268
2  Tobias   13   Small    Lisbon      Hi    76    M  Small Lisbon Hi 76
3    Luke   45   Small             Small   909    M           Small 909
4     Sam   52            Berlin  Thanks    87    M    Berlin Thanks 87
I think it's probably easiest to first just concatenate all the fields you want with a space in between:
df['Info'] = df.Info_1 + ' ' + df.Info_2 + ' ' + df.Info_3 + ' ' + df.Data.astype(str)
Then you can write a function to remove the duplicate words from a string, something like this:
def remove_dup_words(s):
    # split() with no argument also drops the empty strings left by blank cells
    words = s.split()
    unique_words = pd.Series(words).drop_duplicates().tolist()
    return ' '.join(unique_words)
and apply that function to the Info field:
df['Info'] = df.Info.apply(remove_dup_words)
All the code together:
import pandas as pd

def remove_dup_words(s):
    # split() with no argument also drops the empty strings left by blank cells
    words = s.split()
    unique_words = pd.Series(words).drop_duplicates().tolist()
    return ' '.join(unique_words)

Name = ['Lolo', 'Mike', 'Tobias','Luke','Sam']
Age = [19, 34, 13, 45, 52]
Info_1 = ['Tall', 'Large', 'Small', 'Small','']
Info_2 = ['New York', 'Paris', 'Lisbon', '', 'Berlin']
Info_3 = ['Tall', 'Paris', 'Hi', 'Small', 'Thanks']
Data = [123,268,76,909,87]
Sex = ['F', 'M', 'M','M','M']
df = pd.DataFrame({'Name' : Name, 'Age' : Age, 'Info_1' : Info_1, 'Info_2' : Info_2, 'Info_3' : Info_3, 'Data' : Data, 'Sex' : Sex})
df['Info'] = df.Info_1 + ' ' + df.Info_2 + ' ' + df.Info_3 + ' ' + df.Data.astype(str)
df['Info'] = df.Info.apply(remove_dup_words)
print(df)
     Name  Age  Info_1    Info_2  Info_3  Data  Sex                Info
0    Lolo   19    Tall  New York    Tall   123    F   Tall New York 123
1    Mike   34   Large     Paris   Paris   268    M     Large Paris 268
2  Tobias   13   Small    Lisbon      Hi    76    M  Small Lisbon Hi 76
3    Luke   45   Small             Small   909    M           Small 909
4     Sam   52            Berlin  Thanks    87    M    Berlin Thanks 87

How to create conditional clause if column in dataframe is empty?

I have a df that looks like this:
fname          lname
joe            smith
john smith
jane#jane.com
jacky /jax     jack
a#a.com        non
john (jack)    smith
Bob J. Smith
I want to create logic that says: if lname is empty and fname contains two or three words, split off the last word (the second or third) and move it into the lname column. If fname contains an email address, leave it as is. If fname contains slashes or parentheses and lname has no value, also leave it as is.
new df:
fname          lname
joe            smith
john           smith
jane#jane.com
jacky /jax     jack
a#a.com        non
john (jack)    smith
Bob J.         smith
Code so far to separate two strings:
df[['lname']] = df['name'].loc[df['fname'].str.split().str.len() == 2].str.split(expand=True)
With the following sample dataframe:
df = pd.DataFrame({'fname': ['joe', 'john smith', 'jane#jane.com', 'jacky /jax', 'a#a.com', 'john (jack)', 'Bob J. Smith'],
'lname': ['smith', '', '', 'jack', 'non', 'smith', '']})
You can use np.where():
conditions = (df['lname']=='') & (df['fname'].str.split().str.len()>1)
df['lname'] = np.where(conditions, df['fname'].str.split().str[-1].str.lower(), df['lname'])
Yields:
           fname  lname
0            joe  smith
1     john smith  smith
2  jane#jane.com
3     jacky /jax   jack
4        a#a.com    non
5    john (jack)  smith
6   Bob J. Smith  smith
To remove the last string from the fname column of the rows that had their lname column populated:
df['fname'] = np.where(conditions, df['fname'].str.split().str[:-1].str.join(' '), df['fname'])
Yields:
           fname  lname
0            joe  smith
1           john  smith
2  jane#jane.com
3     jacky /jax   jack
4        a#a.com    non
5    john (jack)  smith
6         Bob J.  smith
If I understand correctly you have a dataframe with columns fname and lname. If so then you can modify empty rows in column lname with:
condition = (df.loc[:, 'lname'] == '') & (df.loc[:, 'fname'].str.contains(' '))
df.loc[condition, 'lname'] = df.loc[condition, 'fname'].str.split().str[-1]
The code works for the sample data provided in the question, but it should be made more robust before being used in the general case.
To modify column fname you may use:
df.loc[condition, 'fname'] = df.loc[condition, 'fname'].str.split().str[:-1].str.join(sep=' ')
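Both answers rely on the sample data never having a multi-word fname that is also an email address or contains slashes or parentheses while lname is empty. One way to make the "leave as is" rules from the question explicit is to add a guard to the condition. The sketch below assumes that any of the characters `#`, `@`, `/`, `(` or `)` marks such a value. That character list is an assumption for illustration, not something stated in the question.

```python
import pandas as pd

df = pd.DataFrame({
    'fname': ['joe', 'john smith', 'jane#jane.com', 'jacky /jax',
              'a#a.com', 'john (jack)', 'Bob J. Smith'],
    'lname': ['smith', '', '', 'jack', 'non', 'smith', ''],
})

words = df['fname'].str.split()
# Assumption: '#', '@', '/', '(' or ')' marks a value that must be left alone
special = df['fname'].str.contains(r'[#@/()]')
condition = (df['lname'] == '') & (words.str.len() > 1) & ~special

# Move the last word into lname (lower-cased, as in the desired output)
df.loc[condition, 'lname'] = words[condition].str[-1].str.lower()
# Keep everything except the last word in fname
df.loc[condition, 'fname'] = words[condition].str[:-1].str.join(' ')
print(df)
```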

Filter duplicates and append a character to each item

I am working on the following Dataframe:
print (df)
      LN      FN
0  Smith   Jason
1  Smith     Pat
2  Smith     Liz
3    Kim     Jim
4  Hazel  Vickie
5    Sun  Sandra
I would like to find the duplicated names in ['LN'] and append the first character of the corresponding name from ['FN']. In this example, I would like to append 'J', 'P', and 'L', each preceded by a space, to the 'Smith' entries in ['LN'].
Desired output would be:
print (df)
        LN      FN
0  Smith J   Jason
1  Smith P     Pat
2  Smith L     Liz
3      Kim     Jim
4    Hazel  Vickie
5      Sun  Sandra
My attempt:
My code below achieves the desired output, but there should be a cleaner, more pandas-like way of doing this.
df1 = df.loc[df.duplicated('LN', False)]
df2 = pd.DataFrame(df1.LN + ' '+ df1.FN.str.get(0))
df3 = pd.concat([df1,df2], axis=1)
df3 = df3[[0, 'FN']]
df3.columns = ['LN', 'FN']
df.update(df3)
Thank you for your help on this!
You can do it this way:
In [41]: df.loc[df.LN.duplicated(keep=False), 'LN'] += ' ' + df.FN.str[0]
In [42]: df
Out[42]:
        LN      FN
0  Smith J   Jason
1  Smith P     Pat
2  Smith L     Liz
3      Kim     Jim
4    Hazel  Vickie
5      Sun  Sandra
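For anyone who wants to run this outside an interactive session, here is a self-contained sketch of the same idea. Instead of the in-place `+=`, it uses `Series.mask`, which replaces values only where the condition is True. The behaviour should be equivalent for this sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'LN': ['Smith', 'Smith', 'Smith', 'Kim', 'Hazel', 'Sun'],
    'FN': ['Jason', 'Pat', 'Liz', 'Jim', 'Vickie', 'Sandra'],
})

# keep=False flags every row whose surname occurs more than once
dup = df['LN'].duplicated(keep=False)
# Series.mask replaces values only where the condition is True
df['LN'] = df['LN'].mask(dup, df['LN'] + ' ' + df['FN'].str[0])
print(df)
```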
