Python: Get number from a non-separated string. Using regex?

I am working on converting a chemical formula to the proportion of elements using Python.
For example:
I have a list ["Ti5Cu3", "TiCu2", "Ti2Cu3"] as input, and want to convert it to [5/(5+3), 1/(1+2), 2/(2+3)].
How can I get the number after each element symbol? I think the re library might be useful, but how do I use it to solve my problem?
My solution now is:
import re

def formula2por(s):
    if s == "Ti":
        return 1
    elif s == "Cu":
        return 0
    else:
        t = re.match(r'Ti(.*)Cu(.*)', s).groups()
        # pdb.set_trace()
        if t[0] != '':
            x = int(t[0])
        else:
            x = 1
        if t[1] != '':
            y = int(t[1])
        else:
            y = 1
        return round(x/(x+y), 4)
However, I think it is messy and not a Pythonic way to solve this.
Thanks.

You can use Ti(\d*)Cu(\d*) to capture the digits and pass the match object to a replacement function, where the digits can be accessed as the first and second captured groups respectively:
import re

lst = ["Ti5Cu3", "TiCu2", "Ti2Cu3"]

def div_sub(match):
    x, y = match.group(1), match.group(2)
    x = 1 if x == '' else int(x)
    y = 1 if y == '' else int(y)
    return str(x/(x+y))

[float(re.sub(r"Ti(\d*)Cu(\d*)", div_sub, s)) for s in lst]
# [0.625, 0.3333333333333333, 0.4]

You can handle this easily if we assume that none of the element symbols has three letters. Then,
import re

def calculate(match):
    # Groups alternate: symbol, count, symbol, count, ...
    groups = match.groups()
    # An empty count means the element appears once.
    counts = [float(n) if n else 1.0 for n in groups[1::2]]
    if not counts:
        return 0
    return counts[0] / sum(counts)

lst = ["Ti5Cu3", "TiCu2", "Ti2Cu3"]
required_list = []
pattern = re.compile(r"^([A-Z][a-z]?)(\d*\.?\d*)([A-Z][a-z]?)(\d*\.?\d*)")
for compound in lst:
    required_list.append(calculate(pattern.match(compound)))
As you can see, this code can be adapted to multi-element compounds like potassium permanganate by adding more symbol/count groups to the pattern, and it handles floating point indices.
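For formulas with an arbitrary number of elements, a variant of the same idea is to let re.findall collect every symbol/count pair instead of hard-coding two of them. The following is only a sketch: element_fractions is a hypothetical helper name, and it assumes one- or two-letter symbols with no parentheses or hydrates in the formula.

```python
import re

def element_fractions(formula):
    """Return {symbol: fraction} for a flat formula such as 'Ti5Cu3' (hypothetical helper)."""
    # Each pair is (symbol, count); a missing count comes back as ''.
    pairs = re.findall(r"([A-Z][a-z]?)(\d*\.?\d*)", formula)
    counts = {}
    for symbol, number in pairs:
        counts[symbol] = counts.get(symbol, 0.0) + (float(number) if number else 1.0)
    total = sum(counts.values())
    return {symbol: n / total for symbol, n in counts.items()}

fractions = [element_fractions(f)["Ti"] for f in ["Ti5Cu3", "TiCu2", "Ti2Cu3"]]
print(fractions)
```

The dictionary form makes it possible to ask for any element's share, not just the first one.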

Related

Python Inserting a string

I need to insert a string (character by character) into another string at every 3rd position.
For example: string_1: wwwaabkccgkll
string_2: toadhp
Now I need to insert string_2 char by char into string_1 at every third position,
so the output must be wwtaaobkaccdgkhllp
I need this in Python, but even Java is OK.
So I tried this:
test_str = "hiimdumbiknow"
challenge = "toadh"
k = 0
new_st = challenge[k]
last = list(test_str)
for i in range(len(test_str)):
    if i % 3 == 0:
        last.insert(i, new_st)
        k += 1
print(''.join(last))
and the output I get is
thitimtdutmbtiknow
You can split test_str into substrings of length 2, and then iterate, merging them with challenge:
def concat3(test_str, challenge):
    chunks = [test_str[i:i+2] for i in range(0,len(test_str),2)]
    result = []
    i = j = 0
    while i<len(chunks) or j<len(challenge):
        if i<len(chunks):
            result.append(chunks[i])
            i += 1
        if j<len(challenge):
            result.append(challenge[j])
            j += 1
    return ''.join(result)
test_str = "hiimdumbiknow"
challenge = "toadh"
print(concat3(test_str, challenge))
# hitimoduambdikhnow
This method works even if the lengths of test_str and challenge are mismatching. (The remaining characters in the longest string will be appended at the end.)
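The same chunk-and-merge idea can be generalized to any chunk size. This is a sketch only: insert_every and its parameter n are names made up for illustration, and leftover characters of the inserted string are appended at the end, as in the answer above.

```python
def insert_every(text, extra, n=2):
    """Insert one character of `extra` after every `n` characters of `text`."""
    chunks = [text[i:i + n] for i in range(0, len(text), n)]
    pieces = []
    for idx, chunk in enumerate(chunks):
        pieces.append(chunk)
        if idx < len(extra):
            pieces.append(extra[idx])
    # Characters of `extra` that were not used yet go at the end.
    pieces.append(extra[len(chunks):])
    return "".join(pieces)

print(insert_every("hiimdumbiknow", "toadh"))
# hitimoduambdikhnow
```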
You can split test_str into groups of two letters and then re-join them with each letter from challenge in between, as follows:
import itertools
print(''.join(f'{two}{letter}' for two, letter in itertools.zip_longest([test_str[i:i+2] for i in range(0,len(test_str),2)], challenge, fillvalue='')))
Output:
hitimoduambdikhnow
*edited to split into groups of two rather than three as originally posted
You can try this: create an iterator over the second string, iterate over the first one, and choose by position which character goes into the final string. Note that this replaces every third character of s1 with the next character of s2 rather than inserting it.
def add3(s1, s2):
    def n():
        k = iter(s2)
        for i, j in enumerate(s1):
            if i == 0 or (i + 1) % 3:
                yield j
            else:
                try:
                    yield next(k)
                except StopIteration:
                    # s2 is used up: emit the rest of s1 and stop
                    yield s1[i+1:]
                    return
    return ''.join(n())
def insertstring(test_str,challenge):
    result = ''
    x = [x for x in test_str]
    y = [y for y in challenge]
    j = 0
    for i in range(len(x)):
        if i % 2 != 0 or i == 0:
            result += x[i]
        else:
            if j < 5:
                result += y[j]
                result += x[i]
                j += 1
    get_last_element = x[-1]
    return result + get_last_element
print(insertstring(test_str,challenge))
#output: hitimoduambdikhnow

Longest Common Prefix from list elements in Python

I have a list as below:
strs = ["flowers", "flow", "flight"]
Now, I want to find the longest prefix of the elements from the list. If there is no match then it should return "". I am trying to use the 'Divide and Conquer' rule for solving the problem. Below is my code:
strs = ["flowers", "flow", "flight"]
firstHalf = ""
secondHalf = ""
def longestCommonPrefix(strs) -> str:
    minValue = min(len(i) for i in strs)
    length = len(strs)
    middle_index = length // 2
    firstHalf = strs[:middle_index]
    secondHalf = strs[middle_index:]
    minSecondHalfValue = min(len(i) for i in secondHalf)
    matchingString=[] #Creating a stack to append the matching characters
    for i in range(minSecondHalfValue):
        secondHalf[0][i] == secondHalf[1][i]
    return secondHalf
print(longestCommonPrefix(strs))
I was able to find the middle and divide the list into two parts. Now I am trying to use the second half to get the longest prefix, but I am unable to do so. I have created a stack to which I would add the consecutive matching characters, and then I would compare it with firstHalf. But how can I get the consecutive matching characters from the start?
Expected output:
"fl"
Just a suggestion would also help. I can give it a try.
No matter what, you need to look at each character from each string in turn (until you find a set of corresponding characters that doesn't match), so there's no benefit to splitting the list up. Just iterate through and break when the common prefix stops being common:
def common_prefix(strs) -> str:
    prefix = ""
    for chars in zip(*strs):
        if len(set(chars)) > 1:
            break
        prefix += chars[0]
    return prefix
print(common_prefix(["flowers", "flow", "flight"])) # fl
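As a cross-check for the loop above, the standard library already has a character-by-character prefix function, os.path.commonprefix. Despite living in os.path it works on any list of strings, not just paths. A minimal sketch:

```python
import os.path

words = ["flowers", "flow", "flight"]
# commonprefix compares character by character and returns '' for an empty list.
prefix = os.path.commonprefix(words)
print(prefix)  # fl
```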
Even if this problem has already found its solution, I would like to post my approach (I considered the problem interesting, so started playing around with it).
So, your divide-and-conquer solution would split one very big task into many smaller subtasks, whose results are combined by further small tasks, and so on until you reach the final solution. The typical example is a sum of numbers (let's take 1 to 8), which can be done sequentially (1 + 2 = 3, then 3 + 3 = 6, then 6 + 4 = 10... until the end) or by splitting the problem (1 + 2 = 3, 3 + 4 = 7, 5 + 6 = 11, 7 + 8 = 15, then 3 + 7 = 10 and 11 + 15 = 26...). The second approach has the clear advantage that it can be parallelized, which can improve run time dramatically in the right setup; that is why it generally goes hand in hand with topics like multithreading.
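To make the pairwise sum concrete, here is a small sketch of that reduction pattern. pairwise_reduce is a hypothetical helper written for this illustration; it runs sequentially, and each pass over the pairs is the part that could be handed to separate workers.

```python
def pairwise_reduce(items, combine):
    """Combine neighbours pair by pair, pass after pass, until one value is left."""
    items = list(items)
    if not items:
        raise ValueError("pairwise_reduce() needs at least one item")
    while len(items) > 1:
        next_pass = []
        for i in range(0, len(items), 2):
            pair = items[i:i + 2]
            # An unpaired last item is carried over to the next pass unchanged.
            next_pass.append(combine(*pair) if len(pair) == 2 else pair[0])
        items = next_pass
    return items[0]

total = pairwise_reduce(range(1, 9), lambda a, b: a + b)
print(total)  # 36
```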
So my approach:
import math

def run(lst):
    if len(lst) > 1:
        lst_split = [lst[2 * (i-1) : min(len(lst) + 1, 2 * i)] for i in range(1, math.ceil(len(lst)/2.0) + 1)]
        lst = [Processor().process(*x) for x in lst_split]
        if any([len(x) == 0 for x in lst]):
            return ''
        return run(lst)
    else:
        return lst[0]

class Processor:
    def process(self, w1, w2 = None):
        if w2 is not None:
            zipped = list(zip(w1, w2))
            for i, (x, y) in enumerate(zipped):
                if x != y:
                    return w1[:i]
                if i + 1 == len(zipped):
                    return w1[:i+1]
        else:
            return w1
        return ''

lst = ["flowers", "flow", "flight", "flask", "flock"]
print(run(lst))
OUTPUT
fl
If you look at the run method, the passed lst gets split in couples, which then get processed (this is where you could start multiple threads, but let's not focus on that). The resulting list gets reprocessed until the end.
An interesting aspect of this problem is: if, after a pass, you get one empty match (two words with no common start), you can stop the reduction, given that you know the solution already! Hence the introduction of
        if any([len(x) == 0 for x in lst]):
            return ''
I don't think functools.reduce offers a way to stop the iteration early when a specific condition is met.
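If the early exit matters, a plain loop gives the same left-to-right folding as reduce while still allowing a break. This is a sketch with hypothetical helper names (common_prefix_pair, common_prefix_early_exit), not part of the code above:

```python
def common_prefix_pair(a, b):
    """Longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]

def common_prefix_early_exit(words):
    """Fold the words left to right, stopping once the prefix is empty."""
    if not words:
        return ""
    prefix = words[0]
    for word in words[1:]:
        prefix = common_prefix_pair(prefix, word)
        if not prefix:
            break  # nothing in common any more, later words cannot change that
    return prefix

print(common_prefix_early_exit(["flowers", "flow", "flight", "flask", "flock"]))  # fl
```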
Out of curiosity: another solution could take advantage of regex:
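import re
from functools import reduce
# ^ anchors the match so the shared part must start at the beginning of x
pattern = re.compile(r"^(\w+)\w* \1\w*")
def find(x, y):
    v = pattern.findall(f'{x} {y}')
    return v[0] if len(v) else ''
reduce(find, lst)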
OUTPUT
'fl'
Sort of "divide and conquer":
- solve for 2 strings
- solve for the other strings
def common_prefix2_(s1: str, s2: str)-> str:
    if not s1 or not s2: return ""
    for i, z in enumerate(zip(s1,s2)):
        if z[0] != z[1]:
            break
    else:
        i += 1
    return s1[:i]
from functools import reduce
def common_prefix(l:list):
    return reduce(common_prefix2_, l[1:], l[0]) if len(l) else ''
Tests
for l in [["flowers", "flow", "flight"],
          ["flowers", "flow", ""],
          ["flowers", "flow"],
          ["flowers", "xxx"],
          ["flowers" ],
          []]:
    print(f"{l if l else '[]'}: '{common_prefix(l)}'")
# output
# ['flowers', 'flow', 'flight']: 'fl'
# ['flowers', 'flow', '']: ''
# ['flowers', 'flow']: 'flow'
# ['flowers', 'xxx']: ''
# ['flowers']: 'flowers'
# []: ''

combinations with python

I am trying to generate combinations of IDs.
Input: cid = SPARK
Output: a list of all the combinations as below; the position of each element should stay constant. I am a beginner in Python, so any help here is much appreciated.
'S****'
'S***K'
'S**R*'
'S**RK'
'S*A**'
'S*A*K'
'S*AR*'
'S*ARK'
'SP***'
'SP**K'
'SP*R*'
'SP*RK'
'SPA**'
'SPA*K'
'SPAR*'
'SPARK'
I tried the code below, but I need something dynamic:
cid = 'SPARK'
# print(cid.replace(cid[1],'*'))
# cu_len = length of cid [SPARK] here which is 5
# com_stars = how many stars i.e '*' or '**'
def cubiod_combo_gen(cu_len, com_stars, j_ite, i_ite):
    cubiodList = []
    crange = cu_len
    i = i_ite #2 #3
    j = j_ite #1
    # com_stars = ['*','**','***','****']
    while( i <= crange):
        # print(j,i)
        if len(com_stars) == 1:
            x = len(com_stars)
            n_cid = cid.replace(cid[j:i],com_stars)
            i += x
            j += x
            cubiodList.append(n_cid)
        elif len(com_stars) == 2:
            x = len(com_stars)
            n_cid = cid.replace(cid[j:i],com_stars)
            i += x
            j += x
            cubiodList.append(n_cid)
        elif len(com_stars) == 3:
            x = len(com_stars)
            n_cid = cid.replace(cid[j:i],com_stars)
            i += x
            j += x
            cubiodList.append(n_cid)
    return cubiodList
    #print(i)
    #print(n_cid)
    # for item in cubiodList:
    #     print(item)
print(cubiod_combo_gen(5,'*',1,2))
print(cubiod_combo_gen(5,'**',1,3))
For every character in your given string, you can represent it as a binary string, using a 1 for a character that stays the same and a 0 for a character to replace with an asterisk.
def cubiod_combo_gen(string, count_star):
    str_list = [char0 for char0 in string] # a list with the characters of the string
    itercount = 2 ** (len(str_list)) # 2 to the power of the length of the input string
    results = []
    for config in range(itercount):
        # binary representation of config as a string
        binary_repr = bin(config)[2:]
        while len(binary_repr) < len(str_list):
            binary_repr = '0' + binary_repr # add padding
        # construct a list with asterisks
        i = -1
        result_list = str_list.copy() # shallow copy, this made me spend like 10 minutes debugging lol
        for char in binary_repr:
            i += 1
            if char == '0':
                result_list[i] = '*'
            if char == '1':
                result_list[i] = str_list[i]
        # now we have a possible string value
        if result_list.count('*') == count_star:
            # convert back to string and add to list of accepted strings
            result = ''
            for i in result_list:
                result = result + i
            results.append(result)
    return results
# this function returns the value, so you have to use `print(cubiod_combo_gen(args))`
# comment this stuff out if you don't want an interactive user prompt
string = input('Enter a string : ')
count_star = input('Enter number of stars : ')
print(cubiod_combo_gen(string, int(count_star)))
It gets through a 16-character string in about 4 seconds and an 18-character string in about 17 seconds. You also misspelled "cuboid", but I kept the original spelling.
Enter a string : DPSCT
Enter number of stars : 2
['**SCT', '*P*CT', '*PS*T', '*PSC*', 'D**CT', 'D*S*T', 'D*SC*', 'DP**T', 'DP*C*', 'DPS**']
As a side effect of this binary counting, the list is ordered by the asterisks, where the earliest asterisk takes precedence, with next earliest asterisks breaking ties.
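When only one star count is needed, it is not necessary to visit all 2**n masks. A possible alternative, sketched here with the hypothetical name variants_with_stars, is to choose the star positions directly with itertools.combinations; for this input it produces the same order as the list above.

```python
from itertools import combinations

def variants_with_stars(text, star_count):
    """All variants of `text` with exactly `star_count` characters replaced by '*'."""
    results = []
    # combinations() yields the position tuples in lexicographic order.
    for positions in combinations(range(len(text)), star_count):
        chars = list(text)
        for pos in positions:
            chars[pos] = "*"
        results.append("".join(chars))
    return results

print(variants_with_stars("DPSCT", 2))
```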
If you want several star counts at once, such as 1, 4, 5, and 6 asterisks for "ABCDEFG", you can use something like
star_counts = (1, 4, 5, 6)
string = 'ABCDEFG'
for i in star_counts:
    print(cubiod_combo_gen(string, i))
If you want the nice formatting you have in your question, try adding this block at the end of your code:
def formatted_cuboid(string, count_star):
    values = cubiod_combo_gen(string, count_star)
    for i in values:
        print(i)
I honestly do not know what your j_ite and i_ite are, but it seems like they have no use so this should work. If you still want to pass these arguments, change the first line to def cubiod_combo_gen(string, count_star, *args, **kwargs):
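The keep-or-replace idea can also be written without building binary strings by hand, since itertools.product enumerates every keep/replace choice. This is a sketch: masked_variants and its keep_first flag are names invented for the example, with keep_first=True matching the question's sample output where the first character is never replaced.

```python
from itertools import product

def masked_variants(text, keep_first=True):
    """All variants of `text` where each character is either kept or replaced by '*'."""
    head, tail = (text[:1], text[1:]) if keep_first else ("", text)
    variants = []
    # Each mask is a tuple of booleans: True keeps the character, False stars it.
    for mask in product((False, True), repeat=len(tail)):
        body = "".join(ch if keep else "*" for ch, keep in zip(tail, mask))
        variants.append(head + body)
    return variants

for item in masked_variants("SPARK"):
    print(item)
```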
I am not sure what com_stars does, but to produce your sample output, the following code does.
def cuboid_combo(cid):
    fill_len = len(cid)-1
    items = []
    for i in range(2 ** fill_len):
        binary = f'{i:0{fill_len}b}'
        #print(binary, 'binary', 'num', i)
        s = cid[0]
        for idx, bit in enumerate(binary,start=1):
            if bit == '0':
                s += '*'
            else: # 'bit' == 1
                s += cid[idx]
        items.append(s)
    return items
#cid = 'ABCDEFGHI'
cid = 'DPSCT'
result = cuboid_combo(cid)
for item in result:
    print(item)
Prints:
D****
D***T
D**C*
D**CT
D*S**
D*S*T
D*SC*
D*SCT
DP***
DP**T
DP*C*
DP*CT
DPS**
DPS*T
DPSC*
DPSCT

How to change uppercase & lowercase alternatively in a string?

I want to create a new string from a given string with alternate uppercase and lowercase.
I have tried iterating over the string and changing first to uppercase into a new string and then to lower case into another new string again.
def myfunc(x):
    even = x.upper()
    lst = list(even)
    for itemno in lst:
        if (itemno % 2) !=0:
            even1=lst[1::2].lowercase()
            itemno=itemno+1
            even2=str(even1)
            print(even2)
Since I can't change the given string, I need a good way of creating a new string with alternating caps.
Here's a one-liner:
"".join([x.upper() if i%2 else x.lower() for i,x in enumerate(mystring)])
You can simply randomly choose for each letter in the old string if you should lowercase or uppercase it, like this:
import random
def myfunc2(old):
    new = ''
    for c in old:
        lower = random.randint(0, 1)
        if lower:
            new += c.lower()
        else:
            new += c.upper()
    return new
Here's one that returns a new string with alternating caps:
def myfunc(x):
    seq = []
    for i, v in enumerate(x):
        seq.append(v.upper() if i % 2 == 0 else v.lower())
    return ''.join(seq)
This does the job also
def foo(input_message):
    c = 0
    output_message = ""
    for m in input_message:
        if (c%2==0):
            output_message = output_message + m.lower()
        else:
            output_message = output_message + m.upper()
        c = c + 1
    return output_message
Here's a solution using itertools which utilizes string slicing:
from itertools import chain, zip_longest
x = 'inputstring'
zipper = zip_longest(x[::2].lower(), x[1::2].upper(), fillvalue='')
res = ''.join(chain.from_iterable(zipper))
# 'iNpUtStRiNg'
Using string slicing:
from itertools import zip_longest
s = 'example'
new_s = ''.join(x.upper() + y.lower()
                for x, y in zip_longest(s[::2], s[1::2], fillvalue=''))
# ExAmPlE
Using an iterator:
s_iter = iter(s)
new_s = ''.join(x.upper() + y.lower()
                for x, y in zip_longest(s_iter, s_iter, fillvalue=''))
# ExAmPlE
Using the function reduce():
from functools import reduce
def func(x, y):
    if x[-1].islower():
        return x + y.upper()
    else:
        return x + y.lower()
new_s = reduce(func, s) # eXaMpLe
This code also produces an alternating-caps string (it prints the characters as it goes and returns an empty string):
def alternative_strings(strings):
    for i,x in enumerate(strings):
        if i % 2 == 0:
            print(x.upper(), end="")
        else:
            print(x.lower(), end= "")
    return ''
print(alternative_strings("Testing String"))
def myfunc(string):
    # Un-hash print statements to watch python build out the string.
    # Script is an elementary example of using an enumerate function.
    # An enumerate function tracks an index integer and its associated value as it moves along the string.
    # In this example we use arithmetic to determine odd and even index counts, then modify the associated variable.
    # After modifying the upper/lower case of the character, it starts adding the string back together.
    # The end of the function then returns back with the new modified string.
    #print(string)
    retval = ''
    for space, letter in enumerate(string):
        if space %2==0:
            retval = retval + letter.upper()
            #print(retval)
        else:
            retval = retval + letter.lower()
            #print(retval)
    print(retval)
    return retval
myfunc('Thisisanamazingscript')

Add a start index to a string index generator

I'm currently learning to create generators and to use itertools. So I decided to make a string index generator, but I'd like to add some parameters such as a "start index" allowing to define where to start generating the indexes.
I came up with this ugly solution which can be very long and not efficient with large indexes:
import itertools
import string

class StringIndex(object):
    '''
    Generator that creates string indexes of the form:
    A, B, C ... Z, AA, AB, AC ... ZZ, AAA, AAB, etc.
    Arguments:
    - startIndex = string; default = ''; start increment for the generator.
    - mode = 'lower' or 'upper'; default = 'upper'; is the output index in
      lower or upper case.
    '''
    def __init__(self, startIndex = '', mode = 'upper'):
        if mode == 'lower':
            self.letters = string.ascii_lowercase
        elif mode == 'upper':
            self.letters = string.ascii_uppercase
        else:
            cmds.error ('Wrong output mode, expected "lower" or "upper", ' +
                        'got {}'.format(mode))
        if startIndex != '':
            if not all(i in self.letters for i in startIndex):
                cmds.error ('Illegal characters in start index; allowed ' +
                            'characters are: {}'.format(self.letters))
        self.startIndex = startIndex

    def getIndex(self):
        '''
        Returns:
        - string; current string index
        '''
        startIndexOk = False
        x = 1
        while True:
            strIdMaker = itertools.product(self.letters, repeat = x)
            for stringList in strIdMaker:
                index = ''.join([s for s in stringList])
                # Here is the part to simplify
                if self.startIndex:
                    if index == self.startIndex:
                        startIndexOk = True
                    if not startIndexOk:
                        continue
                ###
                yield index
            x += 1
Any advice or improvement is welcome. Thank you!
EDIT:
The start index must be a string!
You would have to do the arithmetic (in base 26) yourself to avoid looping over itertools.product. But you can at least set x=len(self.startIndex) or 1!
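The base-26 arithmetic mentioned above could look roughly like this. It is a sketch under assumptions: the helper names (index_to_number, number_to_index, string_indexes) are invented here, and the numbering is the "bijective" kind where A is 1, Z is 26 and AA is 27, so there is no zero digit.

```python
import string
from itertools import count, islice

def index_to_number(index, letters=string.ascii_uppercase):
    """'A' -> 1, 'Z' -> 26, 'AA' -> 27 (bijective base 26)."""
    number = 0
    for ch in index:
        number = number * len(letters) + letters.index(ch) + 1
    return number

def number_to_index(number, letters=string.ascii_uppercase):
    """Inverse of index_to_number, valid for number >= 1."""
    chars = []
    while number > 0:
        # Subtracting 1 first shifts the digits from 1..26 to 0..25.
        number, rem = divmod(number - 1, len(letters))
        chars.append(letters[rem])
    return "".join(reversed(chars))

def string_indexes(start="A", letters=string.ascii_uppercase):
    """Yield `start` and every following index without scanning from 'A'."""
    for number in count(index_to_number(start, letters)):
        yield number_to_index(number, letters)

print(list(islice(string_indexes("ZY"), 4)))
# ['ZY', 'ZZ', 'AAA', 'AAB']
```

Because the start index is converted to a number once, the cost of starting at a large index no longer depends on how many indexes come before it.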
Old (incorrect) answer
If you would do it without itertools (assuming you start with a single letter), you could do the following:
letters = 'abcdefghijklmnopqrstuvwxyz'
def getIndex(start, case):
    lets = list(letters.lower()) if case == 'lower' else list(letters.upper())
    # default is 'upper', but can also be an elif
    for r in range(0,10):
        for l in lets[start:]:
            if l.lower() == 'z':
                start = 0
            yield ''.join(lets[:r])+l
I stop after at most 10 rows of letters, but you could of course use an infinite while loop so that it can be called forever.
Correct answer
I found the solution in a different way: I used a base 26 number translator (based on, and fixed since it didn't work perfectly: http://quora.com/How-do-I-write-a-program-in-Python-that-can-convert-an-integer-from-one-base-to-another).
It uses itertools.count() to count and just loops over all the possibilities.
The code:
import time
from itertools import count

def toAlph(x, letters):
    div = 26
    r = '' if x > 0 else letters[0]
    while x > 0:
        r = letters[x % div] + r
        if (x // div == 1) and (x % div == 0):
            r = letters[0] + r
            break
        else:
            x //= div
    return r

def getIndex(start, case='upper'):
    alphabet = 'abcdefghijklmnopqrstuvwxyz'
    letters = alphabet.upper() if case == 'upper' else alphabet
    started = False
    for num in count(0,1):
        l = toAlph(num, letters)
        if l == start:
            started = True
        if started:
            yield l

iterator = getIndex('AA')
for i in iterator:
    print(i)
    time.sleep(0.1)
