Python, sort with key - python

I am having trouble understanding key parameter in sorted function in Python.
Let us say the following list is given: sample_list = ['Date ', 'of', 'birth', 'year 1990', 'month 10', 'day 15'] and I would like to sort only the strings that contain a number.
Expected output: ['Date ', 'of', 'birth', 'month 10', 'day 15', 'year 1990']
So far, I have only managed to print the strings that contain a number:
import re

def sort_with_key(element_of_list):
    if re.search(r'\d+', element_of_list):
        print(element_of_list)
    return element_of_list

sorted(sample_list, key=sort_with_key)
But how do I actually sort those elements?
Thank you!

We can try sorting with a lambda:
import re
sample_list = ['Date ', 'of', 'birth', 'year 1990', 'month 10', 'day 15']
sample_list = sorted(sample_list, key=lambda x: int(re.findall(r'\d+', x)[0]) if re.search(r'\d+', x) else 0)
print(sample_list)
This prints:
['Date ', 'of', 'birth', 'month 10', 'day 15', 'year 1990']
The logic used in the lambda is to sort by the number in each list entry, if the entry has a number. Otherwise, it assigns a value of zero to other entries, placing them first in the sort.
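The same logic may be easier to follow as a named key function. This is only a sketch: the name number_key is made up for illustration, and it searches each string once instead of twice.

```python
import re

def number_key(text):
    """Return the first integer found in text, or 0 if there is none."""
    match = re.search(r'\d+', text)
    return int(match.group()) if match else 0

sample_list = ['Date ', 'of', 'birth', 'year 1990', 'month 10', 'day 15']
result = sorted(sample_list, key=number_key)
print(result)
# ['Date ', 'of', 'birth', 'month 10', 'day 15', 'year 1990']
```

Because sorted is stable, the entries that all get the key 0 keep their original relative order.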

If I understand correctly, you want strings with a number to be sorted with this number as key, and strings without a number to be at the beginning?
You need a key that extracts the number from the string. We can use str.isdigit() to extract digits from a string, ''.join() to put these digits back together, and int() to convert to an integer. If there are no digits in the string, we'll return -1 instead, so it comes before all nonnegative numbers.
sample_list = ['Date ', 'of', 'birth', 'year 1990', 'month 10', 'day 15', 'answer 42', 'small number 0', 'large number 8676965', 'no number here']
sample_list.sort(key=lambda s: int(''.join(c for c in s if c.isdigit()) or -1))
print(sample_list)
# ['Date ', 'of', 'birth', 'no number here', 'small number 0', 'month 10', 'day 15', 'answer 42', 'year 1990', 'large number 8676965']
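One thing to keep in mind: joining every digit behaves differently from taking the first run of digits when a string contains more than one number. A small comparison, with hypothetical helper names first_number and all_digits and a made-up test string:

```python
import re

def first_number(s):
    # First run of digits only, or -1 when there are no digits.
    match = re.search(r'\d+', s)
    return int(match.group()) if match else -1

def all_digits(s):
    # Every digit in the string glued together, or -1 when there are none.
    return int(''.join(c for c in s if c.isdigit()) or -1)

text = 'room 12 floor 3'
print(first_number(text))  # 12
print(all_digits(text))    # 123
```

For strings with a single number, as in the question, both keys give the same ordering.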

Related

Create a list with string + range

I need your help:
I want to create a list looking like this ['Unnamed: 16', 'Unnamed: 17', 'Unnamed: 18'] for a range (16, 60). How can I proceed?
I don't know if my question is clear, but it's like doing list(range(16, 60)) with a string before each number.
Thank you very much for your help!!
You can use f-strings to do so:
my_list = [f"Unnamed: {i}" for i in range(16, 60)]
# Output
['Unnamed: 16', 'Unnamed: 17', 'Unnamed: 18', 'Unnamed: 19', ...]
I would do it the following way:
prefix = "Unnamed: "
lst = [prefix + str(i) for i in range(16,25)]
print(lst)
output
['Unnamed: 16', 'Unnamed: 17', 'Unnamed: 18', 'Unnamed: 19', 'Unnamed: 20', 'Unnamed: 21', 'Unnamed: 22', 'Unnamed: 23', 'Unnamed: 24']
Note: I used a shorter range for brevity's sake. You might elect to use one of the string formatting methods instead.
You can do it using map:
list(map(lambda x: f'Unnamed: {x}', range(16, 60)))
You can use f-strings:
prefix = 'Unnamed:'
my_list = [f"{prefix} {i}" for i in range(16, 60)]
print(my_list)
my_list = []
for i in range(16, 60):
    my_list.append("Unnamed: " + str(i))
print(my_list)

How to get correct output from regex.split()?

import re
number_with_both_parantheses = r"(\(*([\d+\.]+)\))"
def process_numerals(text):
    k = re.split(number_with_both_parantheses, text)
    k = list(filter(None, k))
    for elem in k:
        print(elem)
INPUT = 'Statement 1 (1) Statement 2 (1.1) Statement 3'
expected_output = ['Statement 1', '(1)' , 'Statement 2', '(1.1)', 'Statement 3']
current_output = ['Statement 1', '(1)' , '1', 'Statement 2', '(1.1)', '1.1' , 'Statement 3']
My input is INPUT. I am getting current_output when I call process_numerals with the input text. How do I get the expected output instead?
Your regex seems off. You realize that \(* checks for zero or more left parentheses?
>>> import re
>>> INPUT = 'Statement 1 (1) Statement 2 (1.1) Statement 3'
>>> re.split(r'\((\d+(?:\.\d+)?)\)', INPUT)
['Statement 1 ', '1', ' Statement 2 ', '1.1', ' Statement 3']
If you really want the literal parentheses to be included, put them inside the capturing parentheses.
The non-capturing parentheses (?:...) allow you to group without capturing. I guess that's what you are mainly looking for.
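To get the parentheses back in the result, one option is to move the literal parentheses inside the capture group and then strip the whitespace from each piece. A sketch of that idea, using the INPUT from the question:

```python
import re

INPUT = 'Statement 1 (1) Statement 2 (1.1) Statement 3'

# The capture group now includes the literal parentheses, so re.split keeps them.
parts = re.split(r'(\(\d+(?:\.\d+)?\))', INPUT)

# Strip surrounding whitespace and drop any empty pieces.
parts = [p.strip() for p in parts if p.strip()]
print(parts)
# ['Statement 1', '(1)', 'Statement 2', '(1.1)', 'Statement 3']
```

Only one capture group is left in the pattern, which is why each number now shows up once instead of twice.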

match strings in list and DF column and put into new DF column

using python, pandas
I have a dataframe with three columns and about a million rows. The third column contains strings. I want to select a subset of these strings that match the strings in a list and put them in a fourth column.
Here is an example of a string from the dataframe:
"BW - Jl 8 '79 - pE2 CCB-B -vl9-Ja '66-p83 LJ - v91 - Ja 15 -66 - p426
NYRB - v5 - D 9 '65 - p39 NYTBR - v70 - N 21 '65 - p60 Nat R - vl7 -
D14 '65-pll65 y"
Here is a sample of my list:
['AAA', 'A Anth', 'AAPSS-A', 'A Anth', 'A Arch', 'A Art', 'AB', 'ABA
Jour', 'ABC', 'ABR', 'AC', 'ACSB', 'Adult L', 'Advocate', 'AE', 'AER',
'AF', 'Africa T', 'Afterimage', 'Aging', 'AH', 'AHR', 'A Hy R', 'AIQ',
'AJA', 'AJES', 'AJMD', 'AJMR', 'AJP', 'A J Psy', 'AJS', 'AL', 'A Lead',
'A Lib', 'Am', 'Am Ant', 'Am Arts', 'Am Craft', 'Amer R', 'Am Ethol',
'Am Film', 'Am Mus Teach', 'Am Q', 'Ams', 'Am Sci', 'Am Spect', 'Am
Threat', 'Analog', 'ANQ', 'ANQ:QJ', 'Ant & Col Hob', 'Antiq', 'Antiq
J', 'Ant R', 'Apo', 'APR', 'APSR', 'AR', 'ARBA', 'Arch', 'Archt R',
'ARG', 'Armchair Det', 'Art Am', 'Art Bull', 'Art Dir', 'Art J', 'Art
N', 'AS', 'ASBYP', 'Aspen A', 'Aspen J', 'ASR', 'Astron', 'Ath J',
'Atl', 'Atl Pro Bk R', 'Atl PBR', 'Aud', 'AW', 'BALF', 'Ballet N',
"Barron's", 'BAS', 'BB', 'B&B', 'BC', 'BCM', 'B Ent', 'Belles Let',
'BF', 'BFYC', 'B Hor', 'BHR', 'BIC', 'Biography', 'BksW', 'Bks for
Keeps', 'Bks for YP', 'BL', 'Bloom Rev']
From the string in the dataframe, I want to select 'BW', 'CCB-B', 'LJ', 'NYRB', 'NYTBR', and 'Nat R' (all of which are in the list) and put them in a new column in the same row.
My code looks like this:
s = df65['Review'].str.extractall(reviews_list).squeeze()
s = s.unstack(level=-1)
df65['Reviews'] = s
But extractall doesn't take lists as arguments in this way.
Help?
str.extractall expects a regex pattern with at least one capture group as its parameter. You can build the alternation with
'|'.join(reviews_list)
But some characters need to be escaped to be used in a regex, so import re and use re.escape like this:
[re.escape(item) for item in reviews_list]
Wrap the joined alternation in parentheses so that it forms a capture group. Your new call will then be
s = df65['Review'].str.extractall('(' + '|'.join([re.escape(item) for item in reviews_list]) + ')').squeeze()
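The pattern can be tried out with the plain re module before it goes anywhere near the dataframe. The sketch below makes a few assumptions that are not in the question: the helper name build_pattern is made up, the list is a small subset, names are sorted longest first so that 'Am Ant' wins over 'Am', and lookarounds are added so that a name is not matched inside a longer word.

```python
import re

def build_pattern(names):
    # Longest names first, so that 'Am Ant' is tried before 'Am'.
    ordered = sorted(names, key=len, reverse=True)
    alternation = '|'.join(re.escape(name) for name in ordered)
    # One capture group; the lookarounds reject matches inside longer words.
    return r'(?<!\w)(' + alternation + r')(?!\w)'

# Small subset of abbreviations, for illustration only.
reviews_list = ['BW', 'CCB-B', 'LJ', 'NYRB', 'NYTBR', 'Nat R', 'Am', 'Am Ant']
text = ("BW - Jl 8 '79 - pE2 CCB-B -vl9-Ja '66-p83 LJ - v91 - Ja 15 -66 - p426 "
        "NYRB - v5 - D 9 '65 - p39 NYTBR - v70 - N 21 '65 - p60 Nat R - vl7")

pattern = build_pattern(reviews_list)
found = re.findall(pattern, text)
print(found)
# ['BW', 'CCB-B', 'LJ', 'NYRB', 'NYTBR', 'Nat R']
```

Since the pattern contains exactly one capture group, it should also be usable as the argument to str.extractall in the call above.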

Tuples and List Manipulation with Python. Cutting Tuple generation short

Really stuck with this question in my homework assignment.
Everything works, except that when there is a space (' ') in p, I need to stop the process of creating can.
For example, if I submit:
rankedVote("21 4", [('AB', '132'), ('C D', ''), ('EFG', ''), ('HJ K', '2 1')])
I would like to have:
['C D', 'AB']
returned, rather than just [] like it is now.
Code as below:
def rankedVote(p,cs):
    candsplit = zip(*cs)
    cand = candsplit[0]
    vote = list(p)
    ppl = vote
    can = list(p)
    for i in range(len(vote)):
        if ' ' in vote[i-1]:
            return []
        else:
            vote[i] = int(vote[i])
            can[vote[i]-1] = cand[i]
    for i in range(len(vote)):
        for j in range(len(vote)):
            if i != j:
                if vote[i] == vote[j]:
                    return []
    return can
EDIT:
In the example:
rankedVote("21 4", [('AB', '132'), ('C D', ''), ('EFG', ''), ('HJ K', '2 1')])
This means that the 1st, AB becomes 2nd,
and the 2nd one C D becomes 1st,
and it should stop because 3rd does not exist.
Let's say that instead of 21 4, it was 2143.
It would mean that the 3rd one EFG would be 4th,
and the 4th HJ K would be 3rd.
The code is doing as you instructed I would say. Look at the code block below:
if ' ' in vote[i-1]:
    return []
I know this question is old, but I found it interesting.
As the previous answer said, you aren't returning the list built up to that point; you are returning [].
What you should do is:
if ' ' in vote[i]:
    return can[:i]
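Put together, the whole function with that change might look like the sketch below. It is written for Python 3, so it reads the candidate names with a list comprehension instead of indexing zip, and it replaces the nested duplicate check with a set comparison. Those two choices are mine, not the asker's.

```python
def rankedVote(p, cs):
    cand = [name for name, _ in cs]   # candidate names in ballot order
    vote = list(p)                    # one character per ranking
    can = list(p)                     # placeholder list, filled in by rank
    for i in range(len(vote)):
        if vote[i] == ' ':
            # Stop at the first space and return what has been placed so far.
            return can[:i]
        vote[i] = int(vote[i])
        can[vote[i] - 1] = cand[i]
    # A ballot that uses the same rank twice is invalid.
    if len(set(vote)) != len(vote):
        return []
    return can

cs = [('AB', '132'), ('C D', ''), ('EFG', ''), ('HJ K', '2 1')]
print(rankedVote("21 4", cs))  # ['C D', 'AB']
print(rankedVote("2143", cs))  # ['C D', 'AB', 'HJ K', 'EFG']
```

Note that this inherits an assumption from the fix: the ranks before the space must be exactly 1 up to the number of votes cast, otherwise placeholder characters from p can end up in the returned slice.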
Also, since you seemed to know how to use zip, you could have also done it this way:
def rankedVote(p,cs):
    # list() is needed in Python 3, where zip returns an iterator
    cand = list(zip(*cs))[0]
    # get elements before ' '
    votes = p.split()[0] # '21'
    # map votes index order with corresponding list order
    # (number of `cands` is determined by length of `votes`)
    cans = zip(votes, cand) # [('2', 'AB'), ('1', 'C D')]
    # Sort the results and return only the cands
    result = [can for vote, can in sorted(cans)] # ['C D', 'AB']
    return result
Output:
>>> rankedVote("21 4", [('AB', '132'), ('C D', ''), ('EFG', ''), ('HJ K', '2 1')])
['C D', 'AB']
>>> rankedVote("2143", [('AB', '132'), ('C D', ''), ('EFG', ''), ('HJ K', '2 1')])
['C D', 'AB', 'HJ K', 'EFG']

How to find unique starts of strings?

If I have a list of strings (e.g. 'blah 1', 'blah 2', 'xyz fg', 'xyz penguin'), what would be the best way of finding the unique starts of strings ('xyz' and 'blah' in this case)? The starts of strings can be multiple words.
Your question is confusing, as it is not clear what you really want. So I'll give three answers and hope that one of them at least partially answers your question.
To get all unique prefixes of a given list of string, you can do:
>>> l = ['blah 1', 'blah 2', 'xyz fg', 'xyz penguin']
>>> set(s[:i] for s in l for i in range(len(s) + 1))
{'', 'xyz pe', 'xyz penguin', 'b', 'xyz fg', 'xyz peng', 'xyz pengui', 'bl', 'blah 2', 'blah 1', 'blah', 'xyz f', 'xy', 'xyz pengu', 'xyz p', 'x', 'blah ', 'xyz pen', 'bla', 'xyz', 'xyz '}
This code generates all initial slices of every string in the list and passes these to a set to remove duplicates.
To get the longest initial word sequence of each string that is shorter than the full string, you could go with:
>>> l = ['a b', 'a c', 'a b c', 'b c']
>>> set(s.rsplit(' ', 1)[0] for s in l)
{'a', 'a b', 'b'}
This code creates a set by splitting all strings at their rightmost space, if available (otherwise the whole string will be returned).
On the other hand, to get all unique initial word sequences without considering full strings, you could go for:
>>> l = ['a b', 'a c', 'a b c', 'b c']
>>> set(' '.join(w[:i]) for s in l for w in (s.split(),) for i in range(len(w)))
{'', 'a', 'b', 'a b'}
This code splits each string at any whitespace and concatenates all initial slices of the resulting word list, except the longest one. This code has a pitfall: it will e.g. convert tabs to spaces. This may or may not be an issue in your case.
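The whitespace pitfall is easy to see with a small test. In this sketch the function name initial_word_sequences and the two input strings are made up for illustration; one string contains a tab and the other a double space.

```python
def initial_word_sequences(strings):
    # All proper initial word sequences, re-joined with single spaces.
    return set(' '.join(words[:i])
               for s in strings
               for words in (s.split(),)
               for i in range(len(words)))

result = initial_word_sequences(['a\tb c', 'a  b d'])
print(result)
```

Both inputs contribute the same prefix 'a b', even though neither original string contains that exact text.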
If you mean unique first words of strings (words being separated by space), this would be:
arr=['blah 1', 'blah 2', 'xyz fg', 'xyz penguin']
unique=list(set([x.split(' ')[0] for x in arr]))
