Extracting specific string after specific character - python

new = ['mary 2jay 3ken +', 'mary 2jay 3ken +', 'steven +john ']
print(new):
mary 2jay 3ken +
mary 2jay 3ken +
steven +john -
How could I get the sign/number after each person's name? I'm wondering whether dict would work in this case as my expected output is:
mary:2
jay:3
ken:+
steven:+
john:-

To get the index of "+" in a string, you can use:
index = a_string.index("+")
To check whether "+" exists in a string, use:
if "+" in a_string:
    # ...
To iterate over a list of strings, you can do:
for text in new:
    # ...
There are fifty ways to do what you want. I suggest you read the Python tutorial.
Edit:
You can use a regex to extract the name/number fields:
import re
for text in new:
    couples = re.findall(r"(\S+)\s+(\d+|\+|\-|$)", text)
    for name, num in couples:
        print(name, num)
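Since the question asks about a dict, here is a minimal sketch that collects the pairs into one. It assumes the third string ends with "-" (as in the printed output), that names are purely alphabetic, and that each mark is either a run of digits or a single + or -. The helper name extract_marks is made up for this example.

```python
import re

# Assumption: the third string ends with "-", as in the printed output.
new = ['mary 2jay 3ken +', 'mary 2jay 3ken +', 'steven +john -']

# A name is a run of letters; the mark is a run of digits or a single + or -.
pattern = re.compile(r"([A-Za-z]+)\s*(\d+|[+-])")

def extract_marks(strings):
    """Return a dict mapping each name to the sign/number that follows it."""
    marks = {}
    for text in strings:
        for name, mark in pattern.findall(text):
            marks[name] = mark  # a later duplicate overwrites an earlier one
    return marks

marks = extract_marks(new)
for name, mark in marks.items():
    print(f"{name}:{mark}")
```

A dict keeps only one value per name, so this only fits if a repeated name always carries the same mark (as "mary" does here).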

Related

Parsing complicated list of strings using regex, loops, enumerate, to produce a pandas dataframe

I have a long list of many elements, each element is a string. See below sample:
data = ['BAT.A.100', 'Regulation 2020-1233', 'this is the core text of', 'the regulation referenced ',
'MOC to BAT.A.100', 'this', 'is', 'one method of demonstrating compliance to BAT.A.100',
'BAT.A.120', 'Regulation 2020-1599', 'core text of the regulation ...', ' more free text','more free text',
'BAT.A.145', 'Regulation 2019-3333', 'core text of' ,'the regulation1111',
'MOC to BAT.A.145', 'here is how you can show compliance to BAT.A.145','more free text',
'MOC2 to BAT.A.145', ' here is yet another way of achieving compliance']
My desired output is ultimately a pandas DataFrame (the example table is not preserved here).
As the strings may have to be concatenated, I first join all the elements into a single string, using ## to separate the pieces that have been joined.
I am going all-regex because there would otherwise be a lot of conditions to check.
import re
import pandas as pd
re_req = re.compile(r'##(?P<Short_ref>BAT\.A\.\d{3})'
                    r'##(?P<Full_Reg_ref>Regulation\s\d{4}-\d{4})'
                    r'##(?P<Reg_text>.*?MOC to \1|.*?(?=##BAT\.A\.\d{3})(?!\1))'
                    r'(?:##)?(?:(?P<Moc_text>.*?MOC2 to \1)(?P<MOC2>(?:##)?.*?(?=##BAT\.A\.\d{3})(?!\1)|.+)'
                    r'|(?P<Moc_text_temp>.*?(?=##BAT\.A\.\d{3})(?!\1)))')
final_list = []
for match in re_req.finditer("##" + "##".join(data)):
    inner_list = [match.group('Short_ref').replace("##", " "),
                  match.group('Full_Reg_ref').replace("##", " "),
                  match.group('Reg_text').replace("##", " ")]
    if match.group('Moc_text_temp'):  # just Moc_text is present
        inner_list += [match.group('Moc_text_temp').replace("##", " "), ""]
    elif match.group('Moc_text') and match.group('MOC2'):  # both Moc_text and MOC2 are present
        inner_list += [match.group('Moc_text').replace("##", " "), match.group('MOC2').replace("##", " ")]
    else:  # neither Moc_text nor MOC2 is present
        inner_list += ["", ""]
    final_list.append(inner_list)
final_df = pd.DataFrame(final_list, columns=['Short_ref', 'Full_Reg_ref', 'Reg_text', 'Moc_text', 'MOC2'])
The first and second lines of the regex are the same as the ones you posted earlier and identify the first two columns.
In the third line of the regex, r'##(?P<Reg_text>.*?MOC to \1|.*?(?=##BAT\.A\.\d{3})(?!\1))' matches all text up to "MOC to" followed by the current Short_ref, or otherwise all the text before the next regulation entry. The (?=##BAT\.A\.\d{3})(?!\1) part takes the text up to the next Short_ref pattern, provided that Short_ref is not the current one.
The fourth line handles the case where Moc_text and MOC2 are both present; it is OR-ed with the fifth line, which handles the case where only Moc_text is present. This part of the regex is similar to the third line.
Finally, loop over all the matches using finditer and construct the rows of the DataFrame.
final_df: [output table not preserved]
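To make the join-then-match idea easier to follow, here is a much simplified sketch that only handles the reference, the regulation number, and the free text up to the next entry. It is not the answer's full pattern: the shortened sample list, the group name Body, and the lookahead used here are assumptions for illustration, and the MOC handling is left out entirely.

```python
import re

# Shortened, hypothetical sample in the same shape as the question's data.
data = ['BAT.A.100', 'Regulation 2020-1233', 'core text', 'more text',
        'BAT.A.120', 'Regulation 2020-1599', 'other text']

# Join everything into one string so that text spread over several
# list elements can be captured by a single group.
joined = "##" + "##".join(data)

pattern = re.compile(
    r'##(?P<Short_ref>BAT\.A\.\d{3})'
    r'##(?P<Full_Reg_ref>Regulation\s\d{4}-\d{4})'
    # Lazily take everything up to the next "reference + regulation" pair
    # or the end of the string; the lookahead does not consume it.
    r'##(?P<Body>.*?)(?=##BAT\.A\.\d{3}##Regulation|$)'
)

rows = [
    {name: value.replace("##", " ") for name, value in m.groupdict().items()}
    for m in pattern.finditer(joined)
]

for row in rows:
    print(row)
```

Each dict in rows could then be passed to pd.DataFrame(rows) to get one row per regulation.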

How to split list elements to a line separated by space

I have a list in Python:
values = ['Maths\n', 'English\n', 'Hindi\n', 'Science\n', 'Physical_Edu\n', 'Accounts\n', '\n']
print("".join(values))
I want the output to be:
Subjects: Maths English Hindi Science Physical_Edu Accounts
I am new to Python. I used the join() method but was unable to get the expected output.
You could map the str.strip function over every element in the list and join them afterwards.
values = ['Maths\n', 'English\n', 'Hindi\n', 'Science\n', 'Physical_Edu\n', 'Accounts\n', '\n']
print("Subjects:", " ".join(map(str.strip, values)))
Using a regular expression approach:
import re
lst = ['Maths\n', 'English\n', 'Hindi\n', 'Science\n', 'Physical_Edu\n', 'Accounts\n', '\n']
rx = re.compile(r'.*')
print("Subjects: {}".format(" ".join(match.group(0) for item in lst for match in [rx.match(item)])))
# Subjects: Maths English Hindi Science Physical_Edu Accounts
But it is better to use strip() (or, even better, rstrip()), as provided in the other answers, like:
string = "Subjects: {}".format(" ".join(map(str.rstrip, lst)))
print(string)
strip() each element of the list and then join() them with a space in between.
a = ['Maths\n', 'English\n', 'Hindi\n', 'Science\n', 'Physical_Edu\n', 'Accounts\n', '\n']
print("Subjects: " +" ".join(map(lambda x:x.strip(), a)))
Output:
Subjects: Maths English Hindi Science Physical_Edu Accounts
As pointed out by @miindlek, you can also achieve the same thing by using map(str.strip, a) in place of map(lambda x:x.strip(), a).
You can strip the newlines from each element and then join the results with a space:
stripped_array = [value.strip() for value in values]
joined_string = " ".join(stripped_array)
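One detail the strip-and-join approaches share: the last element of the list is just '\n', which strips to an empty string, so the joined result ends with a trailing space. A minimal sketch that drops empty entries before joining (the variable names subjects and line are only for this example):

```python
values = ['Maths\n', 'English\n', 'Hindi\n', 'Science\n',
          'Physical_Edu\n', 'Accounts\n', '\n']

# Strip each element and keep only the non-empty results,
# so the lone '\n' at the end does not add a trailing space.
subjects = [v.strip() for v in values if v.strip()]

line = "Subjects: " + " ".join(subjects)
print(line)
```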

List within a string and print formatting

I am creating something which takes a tuple, converts it into a string and then reorganises the string using print formatting. 'other' can sometimes have 2 names, hence why I have used * and the " ".join(other) in this function:
def strFormat(x):
    #Convert to string
    s=' '
    s = s.join(x)
    print(s)
    #Split string into different parts
    payR, dep, sal, *other, surn = s.split()
    payR, dep, sal, " ".join(other), surn
    #Print formatting!
    print (surn , other, payR, dep, sal)
The problem with this is that it prints a list of 'other' within the string like this:
Jones ['David', 'Peter'] 84921 Python 63120
But I want it more like this:
Jones David Peter 84921 Python 63120
So that it is ready for formatting into something like this:
Jones, David Peter 84921 Python £63120
Am I going about this the right way and how do I stop the list appearing within the string?
You're close. Change this line (which does nothing):
payR, dep, sal, " ".join(other), surn
to
other = " ".join(other)
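Putting that fix into the whole function, a sketch could look like the following. The input tuple is an assumption reconstructed from the output shown in the question, and format_record is a made-up name; the function returns the formatted string instead of printing it so it is easier to reuse.

```python
def format_record(fields):
    """Reorder 'payroll dept salary given-name(s) surname' for display."""
    # Join and re-split so every whitespace-separated word is its own item.
    pay_ref, dept, salary, *other, surname = " ".join(fields).split()
    # The starred target is always a list, so join it back into one string.
    given_names = " ".join(other)
    return f"{surname}, {given_names} {pay_ref} {dept} £{salary}"

# Assumed input, based on the output shown in the question.
record = ('84921', 'Python', '63120', 'David', 'Peter', 'Jones')
print(format_record(record))
```

The same function also works when there is only one given name, because the starred target then holds a one-element list.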

SQL / Python - String replacement pattern with commas

I need to achieve the following task, preferably using SQL string functions (e.g. CHARINDEX, LEFT, TRIM, etc.) or Python.
Here's the problem:
Example string: BOB 3A, ALICE 6M
Required output: 3aB, 6mA
As you can see I need to get the last two characters for each word preceding a comma, then append the first character of each item to the end. Preferably this should work for any number of items with commas separating them but the likely case is two.
Any hints / direction would be great. Thanks.
Here's a Python solution:
def thesplit(s):
    result = []
    for each in s.split(', '):
        name, chars = each.split(' ')
        result.append(chars.lower() + name[0])
    return ', '.join(result)
You can use it like this: thesplit('BOB 3A, ALICE 6M')
You may try this:
>>> s = "BOB 3A, ALICE 6M"
>>> ', '.join([i[-2]+i[-1].lower()+i[0] for i in s.split(', ')])
'3aB, 6mA'
Try this:
s = 'BOB 3A, ALICE 6M'
print(', '.join(map(lambda x: x[-2:].lower() + x[0], s.split(", "))))
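The answers above all split on ', ' exactly, so they depend on a single space after each comma. Here is a slightly more tolerant sketch that splits on the comma alone and trims each item; the function name abbreviate and the three-item input are made up for illustration.

```python
def abbreviate(s):
    """For each comma-separated item: last two chars (lowercased) + first char."""
    parts = []
    for item in s.split(","):
        item = item.strip()   # tolerate missing or extra spaces around commas
        if not item:          # skip empty items, e.g. from a trailing comma
            continue
        parts.append(item[-2:].lower() + item[0])
    return ", ".join(parts)

print(abbreviate('BOB 3A, ALICE 6M'))
print(abbreviate('BOB 3A,ALICE 6M , CAROL 9Z'))
```

This works for any number of items, as the question asks, but it still assumes every item ends with the two characters of interest.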

Tuple conversion to a string

I have the following list:
[('Steve Buscemi', 'Mr. Pink'), ('Chris Penn', 'Nice Guy Eddie'), ...]
I need to convert it to a string in the following format:
"(Steve Buscemi, Mr. Pink), (Chris Penn, Nice Guy Eddie), ..."
I tried doing
str = ', '.join(item for item in items)
but run into the following error:
TypeError: sequence item 0: expected string, tuple found
How would I do the above formatting?
', '.join('(' + ', '.join(i) + ')' for i in L)
Output:
'(Steve Buscemi, Mr. Pink), (Chris Penn, Nice Guy Eddie)'
You're close.
str = '(' + '), ('.join(', '.join(names) for names in items) + ')'
Output:
'(Steve Buscemi, Mr. Pink), (Chris Penn, Nice Guy Eddie)'
Breaking it down: the outer parentheses are added separately, while the inner ones are generated by the first '), ('.join. The list of names inside each pair of parentheses is created with a separate ', '.join.
s = ', '.join( '(%s)'%(', '.join(item)) for item in items )
You can simply use:
print(str(items)[1:-1].replace("'", ''))  # Removes all apostrophes in the string
You want to omit the first and last characters, which are the square brackets of your list. As mentioned in many comments, this leaves single quotes around the strings. You can remove them with a replace.
NB: As noted by @ovgolovin, this will remove all apostrophes, even those in the names.
You were close...
print(",".join(str(i) for i in items))
or
print(str(items)[1:-1])
or
print(",".join(map(str, items)))
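For comparison, here is a small self-contained sketch of the join-based approach. It avoids shadowing the built-in str and shows that apostrophes inside names survive, unlike with the str(items) plus replace approach. The helper name format_pairs and the second sample list are made up for this example.

```python
def format_pairs(pairs):
    """Render [('a', 'b'), ...] as '(a, b), ...' without quotes."""
    # Inner join builds "a, b"; the format call wraps it in parentheses;
    # the outer join separates the groups with ", ".
    return ", ".join("({})".format(", ".join(pair)) for pair in pairs)

items = [('Steve Buscemi', 'Mr. Pink'), ('Chris Penn', 'Nice Guy Eddie')]
print(format_pairs(items))

# Hypothetical data containing an apostrophe, which is kept intact.
print(format_pairs([("Catherine O'Hara", "Kate")]))
```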
