sorting using python for complex strings [closed] - python

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 7 years ago.
I have an array that contains numbers and characters, e.g. ['A 3', 'C 1', 'B 2'], and I want to sort it using the numbers in each element.
I tried the code below, but it did not work:
def getKey(item):
    item.split(' ')
    return item[1]
x = ['A 3', 'C 1', 'B 2']
print(sorted(x, key=getKey(x)))

To be safe, I'd recommend stripping everything but the digits. An element with no digits, such as 'E', then falls back to 0 and sorts first.
>>> import re
>>> x = ['A 3', 'C 1', 'B 2', 'E']
>>> print(sorted(x, key=lambda n: int(re.sub(r'\D', '', n) or 0)))
['E', 'C 1', 'B 2', 'A 3']
With your method:
>>> def getKey(item):
...     return int(re.sub(r'\D', '', item) or 0)
...
>>> print(sorted(x, key=getKey))
['E', 'C 1', 'B 2', 'A 3']

What you have, plus comments on what's not working :P
def getKey(item):
    item.split(' ')  # The result isn't assigned to anything, so item is unchanged.
                     # Also, split() with no argument already splits on whitespace.
    return item[1]   # Returns a string (here the second character, a space), which will not sort numerically.
x = ['A 3', 'C 1', 'B 2']
print(sorted(x, key=getKey(x)))  # This passes the result of calling getKey(x) as the key, not the function itself.
What it should be:
print(sorted(x, key=lambda i: int(i.split()[1])))
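For reference, here is a minimal Python 3 sketch of the same fix with a named key function. The helper name `number_key` and the extra element 'D 10' are made up for illustration, and the sketch assumes every element ends with a whitespace-separated integer:

```python
def number_key(item):
    # Hypothetical helper: use the last whitespace-separated field as an int.
    # Assumes every element ends in an integer, e.g. 'A 3'.
    return int(item.split()[-1])

x = ['A 3', 'C 1', 'B 2', 'D 10']

# Pass the function itself (key=number_key), not the result of calling it.
result = sorted(x, key=number_key)
print(result)  # ['C 1', 'B 2', 'A 3', 'D 10']
```

Converting to int matters as soon as a number has more than one digit: compared as strings, '10' would sort before '2'.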

This is one way to do it:
>>> x = ['A 3', 'C 1', 'B 2']
>>> y = [i[::-1] for i in sorted(x)]
>>> y.sort()
>>> y = [i[::-1] for i in y]
>>> y
['C 1', 'B 2', 'A 3']
>>>
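Note that the reversal trick compares the strings character by character starting from the end, so it is only safe when every number is a single digit. A small sketch comparing it with a numeric key (the element 'D 10' is made up for illustration):

```python
x = ['A 3', 'C 1', 'B 2', 'D 10']

# Reversal trick: sort the reversed strings, then reverse them back.
by_reversal = [s[::-1] for s in sorted(s[::-1] for s in x)]

# Numeric key: compare the field after the last space as an integer.
by_number = sorted(x, key=lambda s: int(s.rsplit(' ', 1)[1]))

print(by_reversal)  # ['D 10', 'C 1', 'B 2', 'A 3']
print(by_number)    # ['C 1', 'B 2', 'A 3', 'D 10']
```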

Related

Working with nested loops in python by keeping the inner list element as it is

I have a nested list as shown below:
arr = [[' item 1 ', 'item 2 ', 'item 3'], ['item 4 ', 'item 5', 'item 6'], ['item 7 ', 'item 8', 'item 9' ]]
I am trying to loop through arr with two for loops to strip the spaces around each item in the inner lists. With the following code I can get rid of the spaces, but the final result is a single combined list, and the inner lists are not kept intact.
clean_arr = []
for i in arr:
    for j in i:
        clean_arr.append(j.strip(' '))
The result I get is a single flat list without any nested lists, but I want to keep the exact nested structure.
How can I achieve that? Could you please add some explanation as well? Thanks.
Try a list comprehension as follows:
clean_arr = [[y.strip() for y in x] for x in arr]
print(clean_arr)
Output:
[['item 1', 'item 2', 'item 3'], ['item 4', 'item 5', 'item 6'], ['item 7', 'item 8', 'item 9']]
If you want to use a for loop, try the code below:
clean_arr = []
for i in arr:
    l = []
    for j in i:
        l.append(j.strip())
    clean_arr.append(l)
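As for why the original loop flattens the result: it appends every stripped string to the same outer list, so no inner lists are ever created. A minimal sketch contrasting the two shapes (the names `flat` and `nested` are illustrative):

```python
arr = [[' item 1 ', 'item 2 ', 'item 3'],
       ['item 4 ', 'item 5', 'item 6'],
       ['item 7 ', 'item 8', 'item 9']]

# One shared list: every item lands at the top level.
flat = [item.strip() for row in arr for item in row]

# One new inner list per row: the nesting is preserved.
nested = [[item.strip() for item in row] for row in arr]

print(len(flat))    # 9
print(len(nested))  # 3
```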

I cannot understand this list comprehension

a = [x+y for x in ['Python ','C '] for y in ['Language','Programming']]
print(a)
The output is ['Python Language', 'Python Programming', 'C Language', 'C Programming'].
I thought that two lists added together would give ['Python ','C ','Language','Programming'].
Simply "deconstruct" the comprehension from left to right: it is the same as nesting for loops, which gives you the Cartesian product of the two lists:
a = []
for x in ['Python ','C ']:
    for y in ['Language','Programming']:
        a.append(x+y)
# ['Python Language', 'Python Programming', 'C Language', 'C Programming']
What you had in mind as expected output is the result of a list concatenation like
a = ['Python ','C '] + ['Language','Programming']
# ['Python ', 'C ', 'Language', 'Programming']
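The same point can be checked with itertools.product from the standard library. This is only a sketch; the variable names are illustrative:

```python
from itertools import product

langs = ['Python ', 'C ']
words = ['Language', 'Programming']

# Two `for` clauses in one comprehension: every x is paired with every y.
pairs = [x + y for x in langs for y in words]

# itertools.product yields the same pairs in the same order.
via_product = [x + y for x, y in product(langs, words)]

# Plain concatenation just joins the two lists end to end.
joined = langs + words

print(pairs == via_product)  # True
print(joined)                # ['Python ', 'C ', 'Language', 'Programming']
```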

How to sort items in a list by a comma delimited number and remove the number afterwards in Python [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 5 years ago.
I have text in a list like this:
something , 3
something else , 1
something this , 2
And this code correctly sorts the list by the number to the right of the comma:
data_list= sorted(data_list, key=lambda line: int(line.rsplit(' ,', 1)[1]))
After the sort, how can I remove the comma delimited numbers, leaving just the part before the comma?
I tried this and it removes the numbers, but the sorting is lost. Somehow this changes the order.
data_list= [sen.split(' ,')[0] for sen in data_list]
Rather than sort while in string form and parse afterwards, it would be better to parse first, then sort.
>>> import csv
>>> from operator import itemgetter
>>> text = ['something , 3', 'something else , 1', 'something this , 2']
>>> list(map(itemgetter(0), sorted(csv.reader(text), key=itemgetter(1))))
['something else ', 'something this ', 'something ']
Here are the steps explained:
# First parse the comma-separated lines into separate fields
csv.reader(text)
--> [['something ', ' 3'], ['something else ', ' 1'], ['something this ', ' 2']]
# Then sort using the second field as the key
sorted(_, key=itemgetter(1))
--> [['something else ', ' 1'], ['something this ', ' 2'], ['something ', ' 3']]
# Then extract just the first field
list(map(itemgetter(0), _))
--> ['something else ', 'something this ', 'something ']
As @donkopotamus mentioned, you can replace the key function with key=lambda x: int(x[1]) to sort numerically rather than alphabetically.
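The difference between the two keys only shows up once a number has more than one digit. A small sketch, with the sample value 10 made up for illustration:

```python
import csv

text = ['something , 3', 'something else , 10', 'something this , 2']
rows = list(csv.reader(text))

# String key: ' 10' sorts before ' 2' because '1' < '2'.
alphabetical = [row[0] for row in sorted(rows, key=lambda row: row[1])]

# Integer key: 2 < 3 < 10 (int() tolerates the leading space).
numeric = [row[0] for row in sorted(rows, key=lambda row: int(row[1]))]

print(alphabetical)  # ['something else ', 'something this ', 'something ']
print(numeric)       # ['something this ', 'something ', 'something else ']
```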
This will do:
data_list = ['something , 3',
             'something else , 1',
             'something this , 2']
data_list = [i.split(',')[0] for i in sorted(data_list, key=lambda line: int(line.split(',')[1]))]
which results in:
['something else ', 'something this ', 'something ']
Something like:
[x.strip() for x, y in sorted([x.split(",") for x in data_list],
                              key=lambda row: int(row[1]))]
Gives:
['something else', 'something this', 'something']
Here we:
split each row, then
sort the data according to the second item as an integer, then
take the first items (stripping them for whitespace)
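The three steps can also be written out one at a time. This is a sketch, assuming every line contains a comma with an integer after the last one; the name `pairs` is illustrative:

```python
data_list = ['something , 3', 'something else , 1', 'something this , 2']

# 1. Parse: split on the last comma into (text, number) pairs.
pairs = []
for line in data_list:
    text, _, number = line.rpartition(',')
    pairs.append((text.strip(), int(number)))

# 2. Sort numerically on the second field.
pairs.sort(key=lambda pair: pair[1])

# 3. Extract: keep only the text.
result = [text for text, _ in pairs]
print(result)  # ['something else', 'something this', 'something']
```

Keeping the number next to the text until after the sort means the order cannot be lost when the number is finally dropped.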

How to print part of item in a list? [closed]

Closed. This question needs debugging details. It is not currently accepting answers.
Closed 6 years ago.
I have a list that looks like this:
['Blake 4', 'Bolt 1', 'De Grasse 3', 'Gatlin 2', 'Simbine 5', 'Youssef Meite 6']
I am trying to print this:
Blake
Bolt
De Grasse
Gatlin
Simbine
Youssef Meite
How do I go about writing a list comprehension that handles this scenario? I tried using split and indexing but nothing I have used has worked.
Assuming all the values keep that pattern, split each string and join it back together, dropping the last element of the split list:
l = ['Blake 4', 'Bolt 1', 'De Grasse 3', 'Gatlin 2', 'Simbine 5', 'Youssef Meite 6']
for x in l:
    print(' '.join(x.split()[:-1]))
Otherwise, use a regex to remove the numerals:
import re
l = ['Blake 4', 'Bolt 1', 'De Grasse 3', 'Gatlin 2', 'Simbine 5', 'Youssef Meite 6']
for x in l:
    print(re.sub(r' \d+', '', x))
A list comprehension is the wrong tool for printing (or for any operation where you don't need the return value).
In your case, you could use str.rpartition in a loop to print only the part to the left of the rightmost space in the string:
l = ['Blake 4', 'Bolt 1', 'De Grasse 3', 'Gatlin 2', 'Simbine 5', 'Youssef Meite 6']
for s in l:
    print(s.rpartition(" ")[0])
Building a new list with the trailing number stripped off, on the other hand, is a good use of a list comprehension:
newl = [s.rpartition(" ")[0] for s in l]
Try this:
l = ['Blake 4', 'Bolt 1', 'De Grasse 3', 'Gatlin 2', 'Simbine 5', 'Youssef Meite 6']
for i in l:
    print(i[:-2])
Indexing is sufficient for solving your problem, as long as every number is a single digit.
Based on @Jean-François's comment, if you are trying to remove all the characters after the last space, you can do this instead:
l = ['Blake 4', 'Bolt 1', 'De Grasse 3', 'Gatlin 2', 'Simbine 5', 'Youssef Meite 6']
for i in l:
    print(i[:-i[::-1].index(' ')-1])
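A quick sketch of how the fixed-width slice and the last-space split differ once a number has more than one digit (the entry 'Youssef Meite 12' is made up for illustration):

```python
names = ['Blake 4', 'De Grasse 3', 'Youssef Meite 12']

# Fixed-width slice: always drops exactly two characters,
# so a two-digit number leaves a trailing space behind.
by_slice = [s[:-2] for s in names]

# rpartition splits at the last space, whatever follows it.
by_rpartition = [s.rpartition(' ')[0] for s in names]

print(by_slice)       # ['Blake', 'De Grasse', 'Youssef Meite ']
print(by_rpartition)  # ['Blake', 'De Grasse', 'Youssef Meite']
```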

How to find unique starts of strings?

If I have a list of strings (e.g. 'blah 1', 'blah 2', 'xyz fg', 'xyz penguin'), what would be the best way of finding the unique starts of strings ('xyz' and 'blah' in this case)? The starts of strings can be multiple words.
Your question is confusing, as it is not clear what you really want. So I'll give three answers and hope that one of them at least partially answers your question.
To get all unique prefixes of a given list of strings, you can do:
>>> l = ['blah 1', 'blah 2', 'xyz fg', 'xyz penguin']
>>> set(s[:i] for s in l for i in range(len(s) + 1))
{'', 'xyz pe', 'xyz penguin', 'b', 'xyz fg', 'xyz peng', 'xyz pengui', 'bl', 'blah 2', 'blah 1', 'blah', 'xyz f', 'xy', 'xyz pengu', 'xyz p', 'x', 'blah ', 'xyz pen', 'bla', 'xyz', 'xyz '}
This code generates all initial slices of every string in the list and passes these to a set to remove duplicates.
To get, for each string, the longest initial word sequence that is shorter than the full string, you could go with:
>>> l = ['a b', 'a c', 'a b c', 'b c']
>>> set(s.rsplit(' ', 1)[0] for s in l)
{'a', 'a b', 'b'}
This code creates a set by splitting all strings at their rightmost space, if available (otherwise the whole string will be returned).
On the other hand, to get all unique initial word sequences without considering full strings, you could go for:
>>> l = ['a b', 'a c', 'a b c', 'b c']
>>> set(' '.join(w[:i]) for s in l for w in (s.split(),) for i in range(len(w)))
{'', 'a', 'b', 'a b'}
This code splits each string at any whitespace and joins all initial slices of the resulting list, except the largest one. This code has a pitfall: it will, for example, convert tabs to spaces. This may or may not be an issue in your case.
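A small sketch of that pitfall, using a made-up string that contains a tab. Splitting on a literal space instead is one possible way to keep the tab, assuming single spaces are the only word separators you care about:

```python
s = 'a\tb c'

# split() with no argument splits on any whitespace, so the tab is lost
# and comes back as a space when the words are joined again.
words = s.split()
prefixes = {' '.join(words[:i]) for i in range(len(words))}
print(prefixes == {'', 'a', 'a b'})  # True

# split(' ') only splits on spaces, so 'a\tb' stays one word.
words_kept = s.split(' ')
prefixes_kept = {' '.join(words_kept[:i]) for i in range(len(words_kept))}
print(prefixes_kept == {'', 'a\tb'})  # True
```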
If you mean the unique first words of the strings (words being separated by spaces), this would be:
arr = ['blah 1', 'blah 2', 'xyz fg', 'xyz penguin']
unique = list(set([x.split(' ')[0] for x in arr]))
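A set does not keep the order in which the first words were seen. If that order matters, one option is dict.fromkeys, which preserves insertion order in Python 3.7 and later. This is a sketch; the name `unique_starts` is illustrative:

```python
arr = ['blah 1', 'blah 2', 'xyz fg', 'xyz penguin']

# dict.fromkeys keeps the first occurrence of each key, in insertion order.
unique_starts = list(dict.fromkeys(s.split(' ', 1)[0] for s in arr))
print(unique_starts)  # ['blah', 'xyz']
```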
