sorting filenames in numerical order in python [duplicate]

This question already has answers here:
Python analog of PHP's natsort function (sort a list using a "natural order" algorithm) [duplicate]
(3 answers)
Is there a built in function for string natural sort?
(23 answers)
Closed 8 years ago.
Hello all. I recently started learning Python.
I am having a problem sorting files in numerical order. I have the files in a list:
["1card.txt", "card.txt" , "3card.txt", "52card.txt", "badcard.txt"]
When I simply print the list, it isn't in numerical order; instead it prints 1card.txt, 10card.txt, and so on. How do I fix the following code?
file=glob.glob('/directory/*.txt')
sorted(file, key=int)

How about:
import re
def tryint(s):
    try:
        return int(s)
    except ValueError:
        return s
def alphanum_key(s):
    return [tryint(c) for c in re.split('([0-9]+)', s)]
def sort_nicely(l):
    return sorted(l, key=alphanum_key)
Then you could do:
>>> file = ["1card.txt", "card.txt" , "3card.txt", "52card.txt", "badcard.txt"]
>>> sort_nicely(file)
['1card.txt', '3card.txt', '52card.txt', 'badcard.txt', 'card.txt']
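To connect this back to the glob call in the question, here is a minimal sketch that applies the same split-on-digit-runs idea to glob results. It assumes you want to sort on the file name only, and it creates a few empty files in a temporary directory as a stand-in for the question's '/directory'; the helper name natural_key and the file names are made up for the demo.

```python
import glob
import os
import re
import tempfile


def natural_key(path):
    """Sort key: text runs compare as strings, digit runs as integers."""
    name = os.path.basename(path)
    parts = re.split(r'(\d+)', name)
    # With a capturing group, odd indices hold the digit runs, even indices the text
    return [int(part) if i % 2 else part for i, part in enumerate(parts)]


with tempfile.TemporaryDirectory() as directory:
    # Create some empty example files (hypothetical names for the demo)
    for name in ["1card.txt", "10card.txt", "3card.txt", "card.txt", "2card.txt"]:
        open(os.path.join(directory, name), "w").close()

    files = sorted(glob.glob(os.path.join(directory, "*.txt")), key=natural_key)
    names = [os.path.basename(f) for f in files]
    print(names)  # ['1card.txt', '2card.txt', '3card.txt', '10card.txt', 'card.txt']
```

Because the digit runs always land at odd indices, two keys never compare an int against a str at the same position, so this also works in Python 3.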

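A simple solution without regular expressions could be:
def sort_int(examp):
    # Count the leading digits, stopping at the end of the string
    pos = 0
    while pos < len(examp) and examp[pos].isdigit():
        pos += 1
    # Names with a numeric prefix come first, ordered by the number's value;
    # the remaining names follow in alphabetical order
    if pos > 0:
        return (0, int(examp[:pos]), examp)
    return (1, 0, examp)
files = ["1card.txt", "card.txt" , "3card.txt", "52card.txt", "badcard.txt"]
sorted(files, key=sort_int)
['1card.txt', '3card.txt', '52card.txt', 'badcard.txt', 'card.txt']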

files = ["1card.txt", "card.txt" , "3card.txt", "52card.txt", "badcard.txt"]
def nat_sort(s):
    '''
    provides a sort mechanism for strings that may or
    may not lead with an integer
    '''
    # i ends up as the number of leading digits (0 if none);
    # this also handles empty and all-digit strings
    i = 0
    while i < len(s) and s[i].isdigit():
        i += 1
    if not i:
        return 0, s
    else:
        return int(s[:i]), s[i:]
files.sort(key=nat_sort)
And now files is a sorted list:
['badcard.txt', 'card.txt', '1card.txt', '3card.txt', '52card.txt']
To keep names that share the same text after the number together, do something similar:
def nat_sort(s):
    '''
    provides a sort mechanism for strings that may or
    may not lead with an integer, but groups by strings
    starting after integers, if any
    '''
    # count the leading digits (0 if none)
    i = 0
    while i < len(s) and s[i].isdigit():
        i += 1
    if not i:
        return s, 0
    else:
        return s[i:], int(s[:i])
files.sort(key=nat_sort)
And now files is:
['badcard.txt', 'card.txt', '1card.txt', '3card.txt', '52card.txt']

Related

How to find maximum value of two numbers in python? [duplicate]

This question already has answers here:
How do I compare version numbers in Python?
(16 answers)
Closed 6 years ago.
I want to get the maximum value from a list.
List = ['1.23','1.8.1.1']
print max(List)
If I print this I'm getting 1.8.1.1 instead of 1.23.
What am I doing wrong?
The easiest way is to use tuple comparison.
Say:
versions = ['1.23','1.8.1.1']
def getVersionTuple(v):
    return tuple(map(int, v.strip().split('.')))
Now you can use print(max(map(getVersionTuple, versions))) to get the maximum.
EDIT:
You can use '.'.join(map(str, m)) to get the original string (given m holds the max tuple).
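As a small addition, max() accepts the same key function, so a sketch like the following (repeating getVersionTuple from above so it runs on its own) returns the original string directly instead of rebuilding it with join. It also shows the plain string comparison that picked '1.8.1.1'.

```python
def getVersionTuple(v):
    # '1.23' -> (1, 23), '1.8.1.1' -> (1, 8, 1, 1)
    return tuple(map(int, v.strip().split('.')))


versions = ['1.23', '1.8.1.1']

# Plain strings compare character by character: '8' > '2', so this is True
print('1.8.1.1' > '1.23')

# With key=, max() compares the tuples but returns the original string
latest = max(versions, key=getVersionTuple)
print(latest)  # 1.23
```

Returning the original string also sidesteps a subtle issue with rejoining: a version such as '1.08' would come back from the tuple as '1.8'.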
These aren't numbers; they are strings, and as such they are compared lexicographically. Since the character 8 comes after 2, 1.8.1.1 is returned as the maximum.
One way to solve this is to write your own comparison function, which takes each part of the string as an int and compares them numerically:
from functools import cmp_to_key
def version_compare(a, b):
    a_parts = a.split('.')
    b_parts = b.split('.')
    a_len = len(a_parts)
    b_len = len(b_parts)
    length = min(a_len, b_len)
    # Compare the parts one by one
    for i in range(length):
        a_int = int(a_parts[i])
        b_int = int(b_parts[i])
        # And return on the first nonequal part
        if a_int != b_int:
            return a_int - b_int
    # If we got here, the longest list is the "biggest"
    return a_len - b_len
# sorted() has no cmp= argument in Python 3; cmp_to_key wraps the function instead
print(sorted(['1.23','1.8.1.1'], key=cmp_to_key(version_compare), reverse=True)[0])
A similar approach, assuming these strings are version numbers, is to turn each version string into a list of integers:
vsn_list=['1.23', '1.8.1.1']
print(sorted( [ [int(v) for v in x.split(".")] for x in vsn_list ] ))
When you compare strings, they are compared character by character, so any string starting with '2' will sort before a string starting with '8', for example.

Sorting a List in Python by file name [duplicate]

This question already has answers here:
Is there a built in function for string natural sort?
(23 answers)
Closed 7 years ago.
I want to sort a list of strings by name.
I want the following:
a = ["Datei", "Datei-1", "Datei-2", "Datei-3", "Datei-4", "Datei-5", "Datei-6", "Datei-7", "Datei-8", "Datei-9", "Datei-10", "Datei-11", "Datei-12", "Datei-13", "Datei-14", "Datei-15", "Datei-16"]
Instead, I get the following:
a = ["Datei", "Datei-1", "Datei-10", "Datei-11", "Datei-12", "Datei-13", "Datei-14", "Datei-15", "Datei-16" , "and so on"]
I have tried:
sorted(a)
In [1896]: a = ["Datei", "Datei-1","Datei-2", "Datei-10", "Datei-11", "Datei-12", "Datei-13", "Datei-14", "Datei-15", "Datei-16" , ]
In [1897]: sorted(a, key=lambda v:int(v.split('-')[-1]) if '-' in v else 0)
Out[1897]:
['Datei',
'Datei-1',
'Datei-2',
'Datei-10',
'Datei-11',
'Datei-12',
'Datei-13',
'Datei-14',
'Datei-15',
'Datei-16']
We can sort by splitting the string and sorting by the numeric value. However, your first element has no number, so we can put it first by giving it the value 0:
def sort_func(entry):
    try:
        return int(entry.split('-')[1])
    except IndexError:
        return 0
new_a = sorted(a, key=sort_func)
returns
['Datei', 'Datei-1', 'Datei-2', ..., 'Datei-9', 'Datei-10', 'Datei-11', ...]
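Both keys above look only at the number, so names with different prefixes would be interleaved. A minimal sketch that sorts by the text before the last '-' first and then by the number, assuming names look like 'Name' or 'Name-N'; the helper name_number_key and the 'Bild' entries are made up for illustration:

```python
def name_number_key(entry):
    # 'Datei-10' -> ('Datei', '-', '10'); 'Datei' -> ('', '', 'Datei')
    prefix, sep, suffix = entry.rpartition('-')
    if sep and suffix.isdecimal():
        return (prefix, int(suffix))
    # No numeric suffix: use the whole name as the prefix, with number 0
    return (entry, 0)


a = ["Datei-10", "Bild-2", "Datei", "Datei-2", "Bild-10", "Datei-1"]
result = sorted(a, key=name_number_key)
print(result)  # ['Bild-2', 'Bild-10', 'Datei', 'Datei-1', 'Datei-2', 'Datei-10']
```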

python printing common items in a string without duplicating [duplicate]

This question already has answers here:
How can I find all common letters in a set of strings?
(2 answers)
Closed 8 years ago.
I need to make a function that takes two string arguments and returns a string with only the characters that are in both of the argument strings. There should be no duplicate characters in the return value.
This is what I have, but I need it to print each character only once, even if it appears more than once:
def letter(x,z):
    for i in x:
        for f in z:
            if i == f:
                s = str(i)
                print(s)
If the order is not important, you can take the intersection (&) of the sets of characters in each word, then join that set into a single string and return it. Because sets are unordered, the characters in the result can come out in any order:
def makeString(a, b):
    return ''.join(set(a) & set(b))
>>> makeString('sentence', 'santa')
'nts'
Try this:
def letter(x,z):
    s = set()
    for i in x:
        for f in z:
            if i == f:
                s.add(i)
    return s
print("".join(letter("hello","world")))
It will print the letters o and l once each (for example 'ol'); their order is arbitrary because s is a set.
If sets aren't your bag for some reason (perhaps you want to maintain the order of one or other of the strings), try:
def common_letters(s1, s2):
    unique_letters = []
    for letter in s1:
        if letter in s2 and letter not in unique_letters:
            unique_letters.append(letter)
    return ''.join(unique_letters)
print(common_letters('spam', 'arthuprs'))
(Assuming Python 3 for the print()).
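A sketch of an order-preserving variant that avoids the repeated list lookups: dict.fromkeys drops duplicates while keeping insertion order (guaranteed for dicts since Python 3.7). The function name common_letters_ordered is hypothetical.

```python
def common_letters_ordered(s1, s2):
    # Membership tests against a set are O(1) on average
    in_s2 = set(s2)
    # dict.fromkeys keeps only the first occurrence of each key, in order
    return ''.join(dict.fromkeys(ch for ch in s1 if ch in in_s2))


print(common_letters_ordered('spam', 'arthuprs'))   # spa
print(common_letters_ordered('sentence', 'santa'))  # snt
```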

Finding substring in python [duplicate]

This question already has answers here:
Python: Find a substring in a string and returning the index of the substring
(7 answers)
Closed 2 years ago.
I'm new to Python and I'm trying different methods to accomplish the same task. Right now I'm trying to figure out how to get a substring out of a string using a for loop and a while loop. I quickly found that this is a really easy task to accomplish using regex. For example, if I have the string "ABCDEFGHIJKLMNOP" and I want to find whether "CDE" exists and then print "CDE" plus the rest of the string, how would I do that using loops? Right now I'm using:
for i, c in enumerate(myString):
which returns each index and character, which I feel is a start, but I can't figure out what to do after that. I also know there are a lot of built-in functions to find substrings, called as myString.(Function), but I would still like to know if it's possible to do this with loops.
Given:
s = 'ABCDEFGHIJKLMNOP'
targets = 'CDE','XYZ','JKL'
With loops:
for t in targets:
    for i in range(len(s) - len(t) + 1):
        for j in range(len(t)):
            if s[i + j] != t[j]:
                break
        else:
            print(s[i:])
            break
    else:
        print(t,'does not exist')
Pythonic way:
for t in targets:
    i = s.find(t)
    if i != -1:
        print(s[i:])
    else:
        print(t,'does not exist')
Output (in both cases):
CDEFGHIJKLMNOP
XYZ does not exist
JKLMNOP
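Since the question started from enumerate(myString), here is a minimal sketch that continues from there: at each index it checks the first character and then compares a slice of the same length as the target. It assumes a non-empty target, and the function name find_rest is hypothetical.

```python
def find_rest(my_string, target):
    """Return target plus the rest of my_string, or None if target is absent.

    Assumes target is non-empty.
    """
    for i, c in enumerate(my_string):
        # Cheap first-character check, then compare the whole slice
        if c == target[0] and my_string[i:i + len(target)] == target:
            return my_string[i:]
    return None


s = 'ABCDEFGHIJKLMNOP'
for t in ('CDE', 'XYZ', 'JKL'):
    rest = find_rest(s, t)
    print(rest if rest is not None else f'{t} does not exist')
```

This prints the same three lines as the two versions above.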
Here's a concise way to do so:
s = "ABCDEFGHIJKLMNOP"
if "CDE" in s:
    print(s[s.find("CDE")+len("CDE"):])
else:
    print(s)
Prints:
FGHIJKLMNOP
The caveat, of course, is that if the substring is not found, the original string is returned.
Why do this? Returning the original string unchanged lets you check whether the substring was found. As such, this can be wrapped in a simple function (warning: no type checks are enforced, for brevity; it is left to the reader to implement them as necessary):
def remainder(string, substring):
    if substring in string:
        return string[string.find(substring)+len(substring):]
    else:
        return string
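The remainder() function above scans the string twice, once for `in` and once for find(). A sketch of the same behavior built on a single str.find call, which returns -1 when the substring is absent:

```python
def remainder(string, substring):
    # str.find returns the index of the first match, or -1 if there is none
    i = string.find(substring)
    if i == -1:
        return string
    return string[i + len(substring):]


print(remainder("ABCDEFGHIJKLMNOP", "CDE"))  # FGHIJKLMNOP
print(remainder("ABCDEFGHIJKLMNOP", "XYZ"))  # ABCDEFGHIJKLMNOP
```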
Getting the remainder of the string using a for-loop:
n = len(substr)
rest = next((s[i+n:] for i in range(len(s) - n + 1) if s[i:i+n] == substr),
None) # return None if substr not in s
For a non-empty substr, it is equivalent to:
_, sep, rest = s.partition(substr)
if substr and not sep: # substr not in s
    rest = None
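To see what partition returns in each case, here is a short sketch using the string from the question. Note that an empty separator raises ValueError, which is why the equivalence above only holds for a non-empty substr.

```python
s = 'ABCDEFGHIJKLMNOP'

# Found: (text before, the separator itself, text after)
print(s.partition('CDE'))  # ('AB', 'CDE', 'FGHIJKLMNOP')

# Not found: the whole string, then two empty strings
print(s.partition('XYZ'))  # ('ABCDEFGHIJKLMNOP', '', '')

# An empty separator is rejected outright
try:
    s.partition('')
except ValueError as exc:
    print('ValueError:', exc)
```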

How to remove duplicates only if consecutive in a string? [duplicate]

This question already has answers here:
Removing elements that have consecutive duplicates
(9 answers)
Closed 3 years ago.
For a string such as '12233322155552', by removing the duplicates, I can get '1235'.
But what I want is '1232152', removing only the consecutive duplicates.
import re
# Only repeated numbers
answer = re.sub(r'(\d)\1+', r'\1', '12233322155552')
# Any repeated character
answer = re.sub(r'(.)\1+', r'\1', '12233322155552')
You can use itertools; here is a one-liner:
>>> import itertools
>>> s = '12233322155552'
>>> ''.join(i for i, _ in itertools.groupby(s))
'1232152'
This is a Microsoft/Amazon job-interview type of question.
Here is the pseudocode; the actual code is left as an exercise.
for each char in the string do:
    while the current char is equal to the next char:
        delete next char
return string
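One way to turn that pseudocode into runnable Python is sketched below. Because Python strings are immutable, it edits a list of characters instead, and after each deletion it re-checks the same position so runs longer than two collapse completely. The function name is hypothetical.

```python
def remove_consecutive_dupes(text):
    chars = list(text)  # strings are immutable, so edit a list copy
    i = 0
    while i < len(chars) - 1:
        if chars[i] == chars[i + 1]:
            # Delete the next char and compare again at the same position
            del chars[i + 1]
        else:
            i += 1
    return ''.join(chars)


print(remove_consecutive_dupes('12233322155552'))  # 1232152
```

Deleting from the middle of a list is O(n), so this is quadratic in the worst case; building a new list (as in the answers below) avoids that.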
At a higher level, try (this is not an actual implementation):
for s in string:
    if s == s+1: ## check until the end of the string
        delete s+1
Hint: the itertools module is super-useful. One function in particular, itertools.groupby, might come in really handy here:
itertools.groupby(iterable[, key])
Make an iterator that returns consecutive keys and groups from
the iterable. The key is a function computing a key value for each
element. If not specified or is None, key defaults to an identity
function and returns the element unchanged. Generally, the iterable
needs to already be sorted on the same key function.
So since strings are iterable, what you could do is:
use groupby to collect neighbouring elements
extract the keys from the iterator returned by groupby
join the keys together
which can all be done in one clean line.
First of all, you can't remove anything from a string in Python (google "Python immutable string" if this is not clear).
My first approach would be:
foo = '12233322155552'
bar = ''
for ch in foo:
    # append ch unless it repeats the last character kept
    if not bar or ch != bar[-1]:
        bar += ch
or, using the itertools hint from above:
from itertools import groupby
''.join([ k[0] for k in groupby(foo) ])
+1 for groupby. Off the cuff, something like:
from itertools import groupby
def remove_dupes(arg):
    # generator of the first char of each run; the grouper objects are ignored
    unique = (i[0] for i in groupby(arg))
    return ''.join(unique)
Works for me in Python 2.7.2.
number = '12233322155552'
temp_list = []
for item in number:
    if len(temp_list) == 0:
        temp_list.append(item)
    elif len(temp_list) > 0:
        if temp_list[-1] != item:
            temp_list.append(item)
print(''.join(temp_list))
This would be one way:
def fix(a):
    chars = []
    for element in a:
        # fill the list if the list is empty
        if len(chars) == 0: chars.append(element)
        # check with the last element of the list
        if chars[-1] != element: chars.append(element)
    print(''.join(chars))
a= 'GGGGiiiiniiiGinnaaaaaProtijayi'
fix(a)
# output => GiniGinaProtijayi
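import re
t = '12233322155552'
for i in t:
    dup = i+i
    t = re.sub(dup, i, t)
The final value of t is 1232152. (This works here because the characters are digits; characters that are special in regular expressions would need escaping.)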
