Formatting a string where a variable does not always exist in Python

In one of my projects, I'm trying to parse parcel numbers that sometimes do and sometimes don't have lot extensions (a three-digit code at the end). I could obviously write an if/elif structure to handle the cases where lot extensions aren't present, but I was hoping to satisfy my curiosity and get some feedback on more efficient ways to write the code.
In its current state, I end up with an unwanted trailing dash on parcels without a lot extension: '00-000-0000-'
Final parcel number formats should be:
00-000-0000
00-000-0000-000
and the input pins look like:
pin_that_wont_work1 = '00000000'
pin_that_wont_work2 = '000000000'
pin_that_works1 = '00000000000'
pin_that_works2 = '000000000000'
import re
pattern = r'^(\d{1,2})(\d{3})(\d{4})(\d{3})?$'
def parse_pins(pattern, pin):
    L = [x for x in re.search(pattern, pin).groups()]
    return '{dist}-{map_sheet}-{lot}-{lot_ext}'.format(dist=L[0] if len(L[0]) == 2 else '0'+L[0],
                                                       map_sheet=L[1],
                                                       lot=L[2],
                                                       lot_ext=L[3] if L[3] else '')

import re
pin_pattern = re.compile(r'^(\d{1,2})(\d{3})(\d{4})(\d{3})?$')
pin_formats = {
    3: '{0:02d}-{1:03d}-{2:04d}',
    4: '{0:02d}-{1:03d}-{2:04d}-{3:03d}'
}
def parse_pin(s):
    groups = [int(d) for d in pin_pattern.search(s).groups() if d is not None]
    return pin_formats[len(groups)].format(*groups)

Maybe I'm missing something, but couldn't you just put the dash inside the format call?
def parse_pins(pattern, pin):
    L = [x for x in re.search(pattern, pin).groups()]
    return '{dist}-{map_sheet}-{lot}{lot_ext}'.format(dist=L[0] if len(L[0]) == 2 else '0'+L[0],
                                                      map_sheet=L[1],
                                                      lot=L[2],
                                                      lot_ext='-{0}'.format(L[3]) if L[3] else '')

Throw the parts in a list and call '-'.join(list_). The list should have 3 or 4 values.
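A minimal sketch of the join approach, reusing the question's regex. The helper name format_pin is hypothetical. An optional group that did not participate in the match comes back as None from groups(), so it is filtered out before joining, and zfill(2) pads a one-digit district:

```python
import re

# Same pattern as in the question: 1-2 digit district, 3-digit map sheet,
# 4-digit lot, optional 3-digit lot extension.
PIN_RE = re.compile(r'^(\d{1,2})(\d{3})(\d{4})(\d{3})?$')

def format_pin(pin):
    """Format a raw PIN as 00-000-0000 or 00-000-0000-000 (hypothetical helper)."""
    match = PIN_RE.search(pin)
    if match is None:
        raise ValueError(f'unrecognised PIN: {pin!r}')
    # Drop the lot extension if it was absent (groups() reports it as None).
    parts = [g for g in match.groups() if g is not None]
    parts[0] = parts[0].zfill(2)  # pad a one-digit district to two digits
    return '-'.join(parts)

for pin in ['00000000', '000000000', '00000000000', '000000000000']:
    print(format_pin(pin))
```

Because '-'.join() only puts separators between existing parts, no trailing dash can appear when the extension is missing.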

Related

Simple way to accept only one form of delimiter and reject multiple types?

I was wondering how I could go about making something that accepts a string with only one type of delimiter, something like this:
car:bus:boat
and rejecting something like:
car:bus-boat
I am not really sure about how to go about creating something like this.
Well, first you have to define what the invalid delimiters are. A hyphen could well be part of a valid hyphenated word or name, and the algorithm wouldn't be able to tell those apart. Supposing you have a list of invalid delimiters, you could just do:
def string_is_valid(s):
    invalid_delimiters = ['-', ';']
    for d in invalid_delimiters:
        if d in s:
            return False
    return True
s1 = 'car:bus-boat'
print(string_is_valid(s1)) # False
s2 = 'car:bus:boat'
print(string_is_valid(s2)) # True
If, on the other hand, you have a list of delimiters and you want to make sure that only one type is present on the string, you could do this:
def string_is_valid(s):
    valid_delimiters = [',', ':', ';']
    # For each delimiter in our list...
    for d in valid_delimiters:
        # If the delimiter is present in the string...
        if d in s:
            # If any of the other delimiters is in s (and the other delimiter isn't the same one we're currently looking at), return False (it's invalid)
            if any([other_d in s and other_d != d for other_d in valid_delimiters]):
                return False
    return True
s1 = 'car:bus:boat'
print(string_is_valid(s1)) # True
s2 = 'car,bus,boat'
print(string_is_valid(s2)) # True
s3 = 'car,bus;boat'
print(string_is_valid(s3)) # False
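If the only question is whether more than one of the known delimiters appears, a shorter sketch counts how many of them occur in the string. The name only_one_delimiter and the default delimiter tuple are illustrative assumptions:

```python
def only_one_delimiter(s, delimiters=(',', ':', ';')):
    # Each `d in s` is a bool (True == 1), so the sum is the number of
    # distinct delimiters present in s.
    return sum(d in s for d in delimiters) <= 1

print(only_one_delimiter('car:bus:boat'))  # True
print(only_one_delimiter('car,bus;boat'))  # False
```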
You can define an alphabet of "allowed" characters and count the distinct characters that are not in it (interpreting each of them as a separator).
e.g.
allowed = list('abcdefghijklmnopqrstuvxwyz')
def validate(string):
    if len(set([k for k in string if k not in allowed])) > 1:
        return False
    return True
Of course, you can extend allowed to cover capital letters etc.
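Use a regular expression (this one accepts only ':' as the delimiter between words):
import re
data = re.compile(r'^[a-zA-Z]+(?::[a-zA-Z]+)*$')
print(bool(data.match('car:bus:boat')))  # True
print(bool(data.match('car:bus-boat')))  # False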
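A regex like the one above is tied to ':'. To accept whichever single delimiter a string uses, one sketch captures the first non-letter character and requires every later separator to be the same character via the backreference \1. It assumes words consist of ASCII letters only, so (as noted earlier) a hyphen inside a word counts as a delimiter; has_single_delimiter is a hypothetical name:

```python
import re

# A word, then optionally: one delimiter captured in group 1 and more words,
# where every further separator must equal that captured character (\1).
SINGLE_DELIM_RE = re.compile(r'[A-Za-z]+(?:([^A-Za-z])[A-Za-z]+(?:\1[A-Za-z]+)*)?')

def has_single_delimiter(s):
    # fullmatch() requires the pattern to cover the whole string
    return SINGLE_DELIM_RE.fullmatch(s) is not None

print(has_single_delimiter('car:bus:boat'))  # True
print(has_single_delimiter('car:bus-boat'))  # False
```

Empty fields such as 'car::bus' are rejected too, because every delimiter must be followed by at least one letter.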

Find common characters between two strings

I am trying to print the common letters from two different user inputs using a for loop. (I need to do it using a for loop.) I am running into two problems: 1. My statement "If char not in output..." is not pulling unique values. 2. The output is giving me a list of individual letters rather than a single string. I tried to split the output, but split ran into a type error.
wrd = 'one'
sec_wrd = 'toe'
def unique_letters(x):
    output =[]
    for char in x:
        if char not in output and char != " ":
            output.append(char)
    return output
final_output = (unique_letters(wrd) + unique_letters(sec_wrd))
print(sorted(final_output))
You are trying to compute a set intersection. Python has the set.intersection method for exactly this. You can use it for your use case as:
>>> word_1 = 'one'
>>> word_2 = 'toe'
# ''.join() turns the intersection of the sets back into a string.
# There is no need to convert word_2 to a set: intersection() accepts any iterable.
# Set order is arbitrary, so the result may also come out as 'eo'.
>>> ''.join(set(word_1).intersection(word_2))
'oe'
set() returns the unique characters in your string, and the set.intersection() method returns the characters that are common to both sets.
If a for loop is a must for you, you can use a list comprehension:
>>> unique_1 = [w for w in set(word_1) if w in word_2]
# OR
# >>> unique_2 = [w for w in set(word_2) if w in word_1]
>>> ''.join(unique_1) # Or, ''.join(unique_2)
'oe'
The same result can be achieved with an explicit for loop:
my_str = ''
for w in set(word_1):
    if w in word_2:
        my_str += w
# where `my_str` will hold `'oe'` (or `'eo'`, since set order is arbitrary)
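Since the question asks for a for loop and a single string, here is a sketch that addresses both reported problems directly: it builds a string rather than a list (so no join or split is needed), and it walks the first word in order, so the result is deterministic, unlike set iteration. common_letters is an illustrative name:

```python
def common_letters(first, second):
    result = ''
    for char in first:
        # keep characters that appear in both words, skipping spaces
        # and characters already collected
        if char in second and char != ' ' and char not in result:
            result += char
    return result

print(common_letters('one', 'toe'))  # oe
```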
For this kind of problem, you're probably better off using sets:
wrd = 'one'
sec_wrd = 'toe'
wrd = set(wrd)
sec_wrd = set(sec_wrd)
print(''.join(sorted(wrd.intersection(sec_wrd))))
I solved this today on CodeSignal, and it passed all the tests. Note that it returns the number of common characters, counting repeats, rather than the characters themselves:
def solution(s1, s2):
    common_char = ""
    for i in s1:
        if i not in common_char:
            i_in_s1 = s1.count(i)
            i_in_s2 = s2.count(i)
            comm_num = []
            comm_num.append(i_in_s1)
            comm_num.append(i_in_s2)
            comm_i = min(comm_num)
            new_char = i * comm_i
            common_char += new_char
    return len(common_char)
Function to solve the problem
def find_common_characters(msg1,msg2):
    #to remove duplication set() is used.
    set1=set(msg1)
    set2=set(msg2)
    #if you wish to exclude space
    remove={" "}
    set3=(set1&set2)-remove
    msg=''.join(set3)
    return msg
Provide input and call the function (try different values for msg1 and msg2 to test it):
msg1="python"
msg2="Python"
common_characters=find_common_characters(msg1,msg2)
print(common_characters)
If you want the number of common characters between them (counting repeats), here is a one-liner:
def solution(s1,s2):
    return sum(min(s1.count(x),s2.count(x)) for x in set(s1))
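For the counting variant, collections.Counter supports & as a multiset intersection: it keeps each character with the minimum of its two counts, which is what the one-liner above computes explicitly. A sketch:

```python
from collections import Counter

def solution(s1, s2):
    # Counter & Counter keeps min(count in s1, count in s2) for each character
    return sum((Counter(s1) & Counter(s2)).values())

print(solution('aabcc', 'adcaa'))  # 3
```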

Python - making a function that would add "-" between letters

I'm trying to make a function, f(x), that would add a "-" between each letter:
For example:
f("James")
should output as:
J-a-m-e-s-
I would love it if you could use simple python functions as I am new to programming. Thanks in advance. Also, please use the "for" function because it is what I'm trying to learn.
Edit:
yes, I do want the "-" after the "s".
You could try this:
>>> def f(n):
... return '-'.join(n)
...
>>> f('james')
'j-a-m-e-s'
>>>
I'm not sure whether you need the trailing hyphen.
Edit:
If you do want a trailing '-', you can do:
def f(n):
    return '-'.join(n) + '-'
As a learner, it is worth understanding that str.join(iterable) is the better way to concatenate more than two strings in Python, whereas the + operator is fine for appending one string to another.
Please read following posts to explore further:
Any reason not to use + to concatenate two strings?
which is better to concat string in python?
How slow is Python's string concatenation vs. str.join?
Also, please use the "for" function because it is what I'm trying to learn
>>> def f(s):
...     m = s[0]
...     for i in s[1:]:
...         m += '-' + i
...     return m
...
>>> f("James")
'J-a-m-e-s'
m = s[0]: the character at index 0 is assigned to the variable m.
for i in s[1:]: iterate from the second character onward, and
m += '-' + i: append '-' plus the character to m.
Finally, return the value of m.
If you want a '-' at the end as well, you could do this:
>>> def f(s):
...     m = ""
...     for i in s:
...         m += i + '-'
...     return m
...
>>> f("James")
'J-a-m-e-s-'
text_list = [c+"-" for c in text]
text_strung = "".join(text_list)
As a function that takes a string as input:
def dashify(input):
    output = ""
    for ch in input:
        output = output + ch + "-"
    return output
Given that you asked for a solution that uses for and a final '-', simply iterate over the message, add each character and a '-' to an intermediate list, then join it up. This avoids repeated string concatenation:
>>> def f(message):
...     l = []
...     for c in message:
...         l.append(c)
...         l.append('-')
...     return "".join(l)
...
>>> print(f('James'))
J-a-m-e-s-
I'm sorry, but I just have to take Alexander Ravikovich's answer a step further:
f = lambda text: "".join([c+"-" for c in text])
print(f('James')) # J-a-m-e-s-
It is never too early to learn about list comprehensions.
"".join(a_list) is self-explanatory: it glues the elements of a list together with a string (an empty string in this example).
lambda... well, that's just a way to define a function in one line. Think of:
square = lambda x: x**2
square(2) # returns 4
square(3) # returns 9
Python is fun, it's not {enter-a-boring-programming-language-here}.

Replace numbers in a string by the respective result of a subtraction

I have a string like this:
"foo 15 bar -2hello 4 asdf+2"
I'd like to get:
"foo 14 bar -3hello 3 asdf+1"
I would like to replace every number (sequence of digits as signed base-10 integers) with the result of a subtraction executed on each of them, one for each number.
I've written a ~50 LOC function that iterates over the characters, separating signs, digits and other text, applying the function and recombining the parts. Although it has one issue, my intent with this question is not to get it reviewed. Instead, I'm asking: what is the Pythonic way to solve this? Is there an easier way?
For reference, here is my function with the known issue, but my intention is not asking for a review but finding the most pythonic way instead.
Edit, to answer the wise comment of Janne Karila:
preferred: retain sign if given: +2 should become +1
preferred: zero has no sign: +1 should become 0
preferred: no spaces: asdf - 4 becomes asdf - 3
required: only one sign: -+-2 becomes -+-3
Edit: by popular demand, here is my buggy code :)
DISCLAIMER: Please note I'm not interested in fixing this code. I'm asking if there is a better approach than something like mine.
def apply_to_digits(some_str,handler):
    sign = "+"
    started = 0
    number = []
    tmp = []
    result = []
    for idx,char in enumerate(some_str):
        if started:
            if not char.isdigit():
                if number:
                    ss = sign + "".join(number)
                    rewritten = str(handler(int(ss)))
                    result.append(rewritten)
                elif tmp:
                    result.append("".join(tmp))
                number = []
                tmp = []
                sign = "+"
                started = 0
                # char will be dealt later
            else:
                number.append(char)
                continue
        if char in "-+":
            sign = char
            started = 1
            if tmp:
                result.append("".join(tmp))
                tmp = []
            tmp.append(char)
            continue
        elif char.isdigit():
            started = 1
            if tmp:
                result.append("".join(tmp))
                tmp = []
            number.append(char)
        else:
            tmp.append(char)
    if number:
        ss = sign + "".join(number)
        rewritten = str(handler(int(ss)))
        result.append(rewritten)
    if tmp:
        result.append("".join(tmp)), tmp
    return "".join(result)
You could try using a regex with re.sub:
>>> import re
>>> pattern = r"(-?\d+)|(\+1)"
>>> def sub_one(match):
...     return str(int(match.group(0)) - 1)
...
>>> text = "foo 15 bar -2hello 4 asdf+2"
>>> re.sub(pattern, sub_one, text)
'foo 14 bar -3hello 3 asdf+1'
The regex (-?\d+)|(\+1) will either capture an optional - sign followed by one or more digits, OR the literal sequence +1. That way, the regex handles the conversion requirements listed in the question.
The regex (-?\d+) by itself does the right thing most of the time, but the (\+1) branch exists to make sure that the string +1 always converts to zero, without a sign. If you change your mind and want +1 to convert to +0, you can use just the first part of the regex: (-?\d+).
You could probably compress this all into a one-liner if you wanted:
def replace_digits(text):
    return re.sub(r"(-?\d+)|(\+1)", lambda m: str(int(m.group(0)) - 1), text)
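One caveat with the (\+1) branch, worked out by hand: on "+10" it matches the first two characters, producing "0", and the remaining "0" then becomes "-1", giving "0-1". A sketch that adds a negative lookahead (?!\d), so the special case only fires when +1 is not followed by another digit:

```python
import re

# Same idea as above; the "+1" special case is skipped when more digits follow,
# so "+10" is handled by the general branch (the "+" is left as plain text).
PATTERN = re.compile(r'-?\d+|\+1(?!\d)')

def replace_digits(text):
    return PATTERN.sub(lambda m: str(int(m.group(0)) - 1), text)

print(replace_digits("foo 15 bar -2hello 4 asdf+2"))  # foo 14 bar -3hello 3 asdf+1
print(replace_digits("+10"))                          # +9
```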

Python - packing/unpacking by letters

I'm just starting to learn python and I have this exercise that's puzzling me:
Create a function that can pack or unpack a string of letters.
So aaabb would be packed a3b2 and vice versa.
For the packing part of the function, I wrote the following
def packer(s):
    if s.isalpha():  # Defines if unpacked
        stack = []
        for i in s:
            if s.count(i) > 1:
                if (i + str(s.count(i))) not in stack:
                    stack.append(i + str(s.count(i)))
            else:
                stack.append(i)
        print("".join(stack))
    else:
        print("Something's not quite right..")
        return False
packer("aaaaaaaaaaaabbbccccd")
This seems to work properly. But the assignment says that if the input has (for example) the letter a after b or c, then it should later be unpacked into its original form. So "aaabbkka" should become a3b2k2a, not a4b2k2.
I therefore figured that I cannot use count(), since it counts all occurrences of the item in the whole string, correct? What would be my options here?
On to the unpacking. I've thought through the basics of what my code needs to do:
1. Between the "if s.isalpha():" and the else, I should add an elif that checks whether or not the string has digits in it. (I figured this would be enough to determine whether it's the packed or the unpacked version.)
2. Create a for loop with an if statement inside it, which checks every element:
2.1. If a number follows it: return (or add to an empty stack) the element repeated that many times.
2.2. If no number follows it: return just the element.
Big question number 2: how do I check whether an element in the list is followed by a number or by just another alphabetical element? I guess this must be done with slicing, but slices only take integers. Could this be achieved with the index command?
Also, if this is of any relevance: so far I've basically covered lists, strings, if and for, and I've been told this exercise is doable with just those (...so if you wouldn't mind keeping this really basic).
All help appreciated for the newbie enthusiast!
SOLVED:
def packer(s):
    if s.isalpha():  # Defines if unpacked
        groups = []
        last_char = None
        for c in s:
            if c == last_char:
                groups[-1].append(c)
            else:
                groups.append([c])
            last_char = c
        return ''.join('%s%s' % (g[0], len(g)>1 and len(g) or '') for g in groups)
    else:  # Seems to be packed
        stack = ""
        for i in range(len(s)):
            if s[i].isalpha():
                if i+1 < len(s) and s[i+1].isdigit():
                    digit = s[i+1]
                    char = s[i]
                    i += 2
                    while i < len(s) and s[i].isdigit():
                        digit += s[i]
                        i += 1
                    stack += char * int(digit)
                else:
                    stack += s[i]
            else:
                ""
        return "".join(stack)

print(packer("aaaaaaaaaaaabbbccccd"))
print(packer("a4b19am4nmba22"))
So this is my final code. Almost managed to pull it all off with just for loops and if statements.
In the end though I had to bring in the while loop to solve reading the multiple-digit numbers issue. I think I still managed to keep it simple enough. Thanks a ton millimoose and everyone else for chipping in!
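For reference, the while loop is not strictly required for multi-digit counts. A sketch using only for and if (the tools the exercise allows) carries state between iterations instead: packing tracks the current run, and unpacking accumulates digits into a string until the next letter arrives. The names pack and unpack are illustrative, and the sketch assumes the unpacked form contains letters only:

```python
def pack(s):
    result = ''
    prev = ''
    count = 0
    for c in s:
        if c == prev:
            count += 1  # still inside the same run
        else:
            if prev:  # flush the previous run
                result += prev + (str(count) if count > 1 else '')
            prev = c
            count = 1
    if prev:  # flush the last run
        result += prev + (str(count) if count > 1 else '')
    return result

def unpack(s):
    result = ''
    char = ''
    digits = ''
    for c in s:
        if c.isdigit():
            digits += c  # collect multi-digit counts such as "19"
        else:
            if char:  # emit the previous letter, repeated
                result += char * (int(digits) if digits else 1)
            char = c
            digits = ''
    if char:  # emit the last letter
        result += char * (int(digits) if digits else 1)
    return result

print(pack('aaabbkka'))   # a3b2k2a
print(unpack('a3b2k2a'))  # aaabbkka
```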
A straightforward solution:
If a char is different from the previous one, start a new group; otherwise, append it to the last group. Finally, count each group and join them.
def packer(s):
    groups = []
    last_char = None
    for c in s:
        if c == last_char:
            groups[-1].append(c)
        else:
            groups.append([c])
        last_char = c
    # omit the count for runs of length 1, so 'aaabbkka' packs to 'a3b2k2a'
    return ''.join('%s%s' % (g[0], len(g) if len(g) > 1 else '') for g in groups)
Another approach uses re.
The regex r'(.)\1+' matches runs of two or more identical consecutive characters, and with re.sub you can easily encode them:
import re
regex = re.compile(r'(.)\1+')
def replacer(match):
    return match.group(1) + str(len(match.group(0)))
regex.sub(replacer, 'aaabbkka')
#=> 'a3b2k2a'
You can use the itertools.groupby function, for example:
import itertools
data = 'aaassaaasssddee'
grouped_data = ((c, len(list(g))) for c, g in itertools.groupby(data))
result = ''.join(c + (str(n) if n > 1 else '') for c, n in grouped_data)
Of course, one could make this code more readable by using a generator function instead of a generator expression.
This is an implementation of the algorithm I outlined in the comments:
from itertools import takewhile, count, islice
from collections import deque

def consume(items):
    deque(items, maxlen=0)

def ilen(items):
    result = count()
    consume(zip(items, result))
    return next(result)

def pack_or_unpack(data):
    start = 0
    result = []
    while start < len(data):
        if data[start].isdigit():
            # `data` is packed, bail
            return unpack(data)
        run = run_len(data, start)
        # append the character that might repeat
        result.append(data[start])
        if run > 1:
            # append the length of the run of characters
            result.append(str(run))
        start += run
    return ''.join(result)

def run_len(data, start):
    """Return the length of the run of identical characters starting at
    `start`"""
    return ilen(takewhile(lambda c: c == data[start],
                          islice(data, start, None)))

def unpack(data):
    result = []
    for i in range(len(data)):
        if data[i].isdigit():
            # skip digits, we'll look for them below
            continue
        # packed character
        c = data[i]
        # number of repetitions
        n = 1
        if (i+1) < len(data) and data[i+1].isdigit():
            # if the next character is a digit, grab all the digits in the
            # substring starting at i+1
            n = int(''.join(takewhile(str.isdigit, data[i+1:])))
        # append the repeated character
        result.append(c*n)  # multiplying a string with a number repeats it
    return ''.join(result)

print(pack_or_unpack('aaabbc'))
print(pack_or_unpack('a3b2c'))
print(pack_or_unpack('a10'))
print(pack_or_unpack('b5c5'))
print(pack_or_unpack('abc'))
A regex-flavoured version of unpack() would be:
import re
UNPACK_RE = re.compile(r'(?P<char> [a-zA-Z]) (?P<count> \d+)?', re.VERBOSE)
def unpack_re(data):
    matches = UNPACK_RE.finditer(data)
    pairs = ((m.group('char'), m.group('count')) for m in matches)
    return ''.join(char * (int(count) if count else 1)
                   for char, count in pairs)
This code demonstrates the most straightforward (or "basic") approach to implementing that algorithm. It's not particularly elegant or idiomatic, or necessarily efficient. (It would be if written in C, but Python has caveats: indexing a string copies the character into a new string, and algorithms that seem to copy data excessively might be faster than ones that try to avoid it, if the copying is done in C and the workaround is implemented with a Python loop.)
