regex error: bad character range 8-1 at position 6 - python

I am trying to map a new value based on characters in a column. These are digits stored as a string.
If the value of the first character is 1 and the second character is 2-10, then label this as "Lost" etc.
print(x[['Segment']].head(15))
Segment
0 12
1 12
2 22
3 14
4 54
5 12
6 12
7 56
8 12
9 12
10 22
11 12
12 310
13 22
14 53
The mapping I will use:
segt_map = {
r'[4-5][8-10]': 'Champion',
r'[4-5][4-7]': 'Loyal',
r'[4-5][2-3]': 'Recent',
r'3[6-10]': 'High Potential',
r'3[2-5]': 'Need Nurturing',
r'2[6-10]': 'Cannot Lose',
r'2[2-5]': 'At Risk',
r'1[2-10]': 'Lost',
}
And trying to implement it:
x['Label'] = x['Segment'].replace(segt_map, regex=True)
error: bad character range 8-1 at position 6
I am not sure what my error is. I've checked the related questions, but they're not similar to mine. I looked at position 6 of the patterns above and I can't find a range 8-1. So what is happening here?
The full error trace is quite long, but if it's needed I can post it.

Character classes match single characters, not numbers. [8-10] is parsed as the range 8-1 plus the character 0, and 8-1 is an invalid range because 8 sorts after 1. You want this:
segt_map = {
r'[4-5]([8-9]|10)': 'Champion',
r'[4-5][4-7]': 'Loyal',
r'[4-5][2-3]': 'Recent',
r'3([6-9]|10)': 'High Potential',
r'3[2-5]': 'Need Nurturing',
r'2([6-9]|10)': 'Cannot Lose',
r'2[2-5]': 'At Risk',
r'1([2-9]|10)': 'Lost',
}
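A quick way to see the difference, as a minimal sketch using only the standard re module (the sample strings are made up for illustration): compiling the original pattern fails, while the alternation form accepts both one-character and two-character second values.

```python
import re

# The original pattern: "8-1" inside the class is read as a character range,
# and it is invalid because "8" sorts after "1".
try:
    re.compile(r'[4-5][8-10]')
    compiled = True
except re.error as err:
    compiled = False
    print("re.error:", err)

# The corrected pattern uses alternation for the two-character value "10".
champion = re.compile(r'[4-5]([8-9]|10)')
for text in ('48', '510', '47'):
    print(text, bool(champion.fullmatch(text)))
```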

You are trying to use regex to detect number ranges, but regex is a tool for processing text and knows nothing about numbers. You cannot use the range 8-10, because ranges apply to single characters. [1-9] is fine because it is a range of characters, but [1-10] does not mean 1 through 10. Instead, you should parse the text to numbers and then compare them to the required ranges.
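One way the parse-then-compare idea could look, as a rough sketch rather than a definitive implementation. It assumes the first character is one score (1-5) and the remaining characters are a second score (2-10), which is how the question's mapping reads; RULES and label_segment are names made up for this example.

```python
# Assumption for illustration: the first character is one score (1-5) and the
# remaining characters are a second score (2-10), as in the question's mapping.
RULES = [
    # (first-score range, second-score range, label), both ranges inclusive
    ((4, 5), (8, 10), 'Champion'),
    ((4, 5), (4, 7), 'Loyal'),
    ((4, 5), (2, 3), 'Recent'),
    ((3, 3), (6, 10), 'High Potential'),
    ((3, 3), (2, 5), 'Need Nurturing'),
    ((2, 2), (6, 10), 'Cannot Lose'),
    ((2, 2), (2, 5), 'At Risk'),
    ((1, 1), (2, 10), 'Lost'),
]


def label_segment(segment):
    """Return the label for a segment string such as '12' or '310'."""
    first, second = int(segment[0]), int(segment[1:])
    for (f_lo, f_hi), (s_lo, s_hi), label in RULES:
        if f_lo <= first <= f_hi and s_lo <= second <= s_hi:
            return label
    return segment  # no rule matched: leave the value unchanged


print([label_segment(s) for s in ('12', '22', '54', '310')])
```

With the DataFrame from the question, this could be applied as x['Label'] = x['Segment'].map(label_segment).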

Related

iterate over a list of chemical names using ChemSpiPy to get canonical smiles

I have list of chemical names called Phenolics
Phenolics
0 Dihydroquercetin 7,30-dimethyl ether
1 Artelin
2 Esculin 7- methylether (methylesculin)
3 Esculin
4 Scopoletin (7- hydroxy-6- methoxycoumarin)
5 Axillarin
6 Esculetin
7 Isoscopoletin
8 6-Beta-D-glucosyl-7- methoxycoumarin
9 5,40Dihydroxy- 3,6,7,30- tetramethoxyflavone
10 Apigenin
11 Luteolin-7-O- glucoside
12 Magnoloside
13 Penduletin
14 Quercetagetin
15 Quercetagetin-3,6,7- trimethyl ether
16 Quercetin
17 Quercetin 7,30- dimethyl ether (Rhamnazine)
18 Scoparone
19 Skimmin
20 Umbelliferone
21 Apigenin 40-methyl ether
and I would like to run a search on chemspipy to obtain the canonical smiles of these chemical names.
I tried
for result in cs.search(Phenolics):
    print(result.smiles)
and it doesn't work, I get no results.
I can not test it because I have no API key, but this should search for the name and give you a result about it. How to then get a canonical SMILES from that result is another question I can't answer:
from chemspipy import ChemSpider
cs = ChemSpider('<YOUR-API-KEY>')
Phenolics = ['Artelin', 'Esculin', 'Axillarin']
for name in Phenolics:
    result = cs.search(name)
    print(name, result)

add str to start of each row value

I have a pandas dataframe
df = pd.DataFrame({'num_legs': [1, 34, 34, 104 , 6542, 6542 , 48383]})
I want to prepend a string to each row's value.
The string is ZZ00000
The catch is that the row data must always be 7 characters long in total
so the desired output will be
df = num_legs
0 ZZ00001
1 ZZ00034
2 ZZ00034
3 ZZ00104
4 ZZ06542
5 ZZ06542
6 ZZ48383
As the column is of type int I was thinking of changing to a str type and then possibly using regex and some str manipulation to achieve my desired outcome..
Is there a more streamlined way possibly using a function with pandas?
Use
df['num_legs'] = "ZZ" + df['num_legs'].astype(str).str.rjust(5, "0")
You could use string concatenation here:
df["num_legs"] = 'ZZ' + ('00000' + df["num_legs"].astype(str)).str[-5:]
The idea here is that, given a num_legs integer value of say 6542, we first form the following string:
000006542
Then we retain the right 5 characters, leaving 06542.
You could also pad using the following:
'ZZ' + df['num_legs'].astype(str).str.pad(width=5, side='left', fillchar='0')
Here you pad your current number (converted to string) on the left with zeros up to a width of 5 and concatenate that to your 'ZZ' string.
Use the pandas string method .str.zfill(), which mirrors Python's str.zfill()
df['num_legs'] = 'ZZ' + df['num_legs'].astype(str).str.zfill(5)
You could try this, using a regex and a list comprehension. For strings, a plain Python loop is often at least as fast as the pandas string methods:
import re
variable = "ZZ00000"
df["new_val"] = [re.sub(r"\d" + f"{{{len(num)}}}$", num, variable)
                 for num in df.num_legs.astype(str)]
df
num_legs new_val
0 1 ZZ00001
1 34 ZZ00034
2 34 ZZ00034
3 104 ZZ00104
4 6542 ZZ06542
5 6542 ZZ06542
6 48383 ZZ48383
out = []
for nl in df["num_legs"]:
    out.append(f'ZZ{nl:05d}')
The rest is up to your output manipulation

Encode data to HEX and get an L at the end in Python 2.7. Why?

I ask a measurement device to give me some data. First it tells me how many bytes of data are in its storage; it is always 14. Then it gives me the data, which I have to encode into hex. This is Python 2.7; I can't use newer versions. Lines 6 to 10 tell the device to give me the measured data.
Lines 12 to 14 do the encoding to hex. In other programs this works, but when I print result (line 14) I get a hex number with 13 bytes plus 1, which cannot be correct because it has an L at the end. I guess it marks a long or something similar, and I don't need the last byte. But I think it also changes the data that is picked out from line 15 onward, first as hex and then converted to int.
Is it possible that the L has an effect on the data or not?
How can I fix it?
1 ap.write(b"ML\0")
rmemb = ap.read(2)
print(rmemb)
rmemb = int(rmemb)+1
5 rmem = rmemb #must be and is 14 Bytes
addmem = ("MR:%s\0" % rmem)
# addmem = ("MR:14\0")
ap.write(addmem.encode())
10 time.sleep(1)
test = ap.read(rmem)
result = hex(int(test.encode('hex'), 16))
print(result)
15 ftflash = result[12:20]
ftbg = result[20:28]
print(ftflash)
print(ftbg)
ftflash = int(ftflash, 16)
20 # print(ftflash)
ftbg = int(ftbg, 16)
# print(ftbg)
OUTPUT:
14
0x11bd5084c0b000001ce00000093L
b000001c
e0000009
Python 2 has two built-in integer types, int and long. hex returns a string representing a Python hexadecimal literal, and in Python 2, that means that longs get an L at the end, to signify that it's a long.
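The L is only part of the text that hex() returns; it does not change the value. What does affect the slicing is the "0x" prefix and the fact that hex(int(...)) drops leading zeros: the printed number has 27 hex digits, so for 14 bytes the first digit must have been a dropped 0. A minimal sketch of one way around this, assuming the reply was the 14 bytes implied by the output above: binascii.hexlify, which is also available in Python 2.7, returns exactly two hex digits per byte with no prefix or suffix.

```python
import binascii

# Hypothetical reply, reconstructed from the output printed in the question.
data = bytes(bytearray.fromhex("011bd5084c0b000001ce00000093"))

# Two hex digits per byte: no "0x" prefix, no trailing "L", leading zeros kept.
hex_text = binascii.hexlify(data).decode("ascii")
print(len(data), len(hex_text), hex_text)

# Going through int() loses the leading zero, so offsets shift by one digit.
via_int = "%x" % int(hex_text, 16)
print(len(via_int), via_int)
```

With hex_text, byte n always sits at hex_text[2*n:2*n+2], so slice offsets can be derived directly from byte positions.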

How to calculate the verification digit of the Tax ID in the country of Paraguay (calcular digito verificador del RUC)

In the country of Paraguay (South America) each taxpayer has a Tax ID (called RUC: Registro Único del Contribuyente) assigned by the government (Ministerio de Hacienda, Secretaría de Tributación).
This RUC is a number followed by a verification digit (dígito verificador), for example 123456-0. The government tells you the verification digit when you request your RUC.
Is there a way for me to calculate the verification digit based on the RUC? Is it a known formula?
In my case, I have a database of suppliers and customers, collected over the years by several employees of the company.
Now I need to run checks to see if all the RUCs were entered correctly or if there are typing mistakes.
My preference would be a Python solution, but I'll take whatever solutions I get to point me in the right direction.
Edit: This is a self-answer to share knowledge that took me hours/days to find. I marked this question as "answer your own question" (don't know if that changes anything).
The verification digit of the RUC is calculated using a formula very similar (but not equal) to a method called Modulo 11; that is at least the info I got from reading the following tech sites (content is in Spanish):
https://www.yoelprogramador.com/funncion-para-calcular-el-digito-verificador-del-ruc/
http://groovypy.wikidot.com/blog:02
https://es.wikipedia.org/wiki/C%C3%B3digo_de_control#M.C3.B3dulo_11
I analyzed the solutions provided in the mentioned pages and ran my own tests against a list of RUCs and their known verification digits, which led me to a final formula that returns the expected output, but which is DIFFERENT from the solutions in the mentioned links.
The final formula I got to calculate the verification digit of the RUC is shown in this example (80009735-1):
Multiply each digit of the RUC (without considering the verification digit) by a factor based on the position of the digit within the RUC (starting from the right side of the RUC) and sum all the results of these multiplications:
RUC:              8        0        0        0        9        7        3        5
Position:         7        6        5        4        3        2        1        0
Multiplications:  8x(7+2)  0x(6+2)  0x(5+2)  0x(4+2)  9x(3+2)  7x(2+2)  3x(1+2)  5x(0+2)
Results:          72       0        0        0        45       28       9        10
Sum of results:   164
Divide the sum by 11 and use the remainder of the division to determine the verification digit:
If the remainder is greater than 1, then the verification digit is 11 - remainder
If the remainder is 0 or 1, then the verification digit is 0
In our example:
Sum of results: 164
Division: 164 / 11 ==> quotient 14, remainder 10
Verification digit: 11 - 10 ==> 1
Here is my Python version of the formula:
def calculate_dv_of_ruc(input_str):
    # assure that we have a string
    if not isinstance(input_str, str):
        input_str = str(input_str)
    # try to convert to 'int' to validate that it contains only digits.
    # I suspect that this is faster than checking each char independently
    int(input_str)
    the_sum = 0
    for i, c in enumerate(reversed(input_str)):
        the_sum += (i + 2) * int(c)
    base = 11
    _, rem = divmod(the_sum, base)
    if rem > 1:
        dv = base - rem
    else:
        dv = 0
    return dv
Testing this function it returns the expected results, raising errors when the input has other characters than digits:
>>> calculate_dv_of_ruc(80009735)
1
>>> calculate_dv_of_ruc('80009735')
1
>>> calculate_dv_of_ruc('80009735A')
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "<input>", line 8, in calculate_dv_of_ruc
ValueError: invalid literal for int() with base 10: '80009735A'
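For the database check described in the question, the formula could be wrapped in a validator along these lines. This is only a sketch: is_valid_ruc and RUC_PATTERN are names made up for this example, and it assumes every entry is stored as "number-digit" text such as 80009735-1.

```python
import re

# Assumed storage format: one or more digits, a hyphen, one verification digit.
RUC_PATTERN = re.compile(r'([0-9]+)-([0-9])')


def calculate_dv_of_ruc(ruc_digits):
    """Verification digit for a RUC given as a string of digits."""
    total = sum((i + 2) * int(c) for i, c in enumerate(reversed(ruc_digits)))
    rem = total % 11
    return 11 - rem if rem > 1 else 0


def is_valid_ruc(full_ruc):
    """Return True if the stored digit matches the calculated one."""
    match = RUC_PATTERN.fullmatch(full_ruc.strip())
    if match is None:
        return False  # wrong shape, e.g. letters or a missing hyphen
    number, dv = match.groups()
    return calculate_dv_of_ruc(number) == int(dv)


for candidate in ('80009735-1', '80009735-2', '80009735A-1'):
    print(candidate, is_valid_ruc(candidate))
```

Returning False for malformed entries instead of raising makes it easier to run the check over a whole table and list the rows that need attention.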

How can I create a decremented numberPyramid(num) in Python?

I'm trying to create a pyramid that looks like the picture below (numberPyramid(6)), where the pyramid isn't made of numbers but is actually a blank space with the numbers around it. The function takes a parameter called "num", which is the number of rows in the pyramid. How would I go about doing this? I need to use a for loop, but I'm not sure how to implement it. Thanks!
666666666666
55555  55555
4444    4444
333      333
22        22
1          1
def pyramid(num_rows, block=' ', left='', right=''):
    for idx in range(num_rows):
        print '{py_layer:{num_fill}{align}{width}}'.format(
            py_layer='{left}{blocks}{right}'.format(
                left=left,
                blocks=block * (idx*2),
                right=right),
            num_fill=format((num_rows - idx) % 16, 'x'),
            align='^',
            width=num_rows * 2)
This works by using python's string format method in an interesting way. The spaces are the string to be printed, and the number used as the character to fill in the rest of the row.
Using the built-in format() function to chop off the leading 0x in the hex string lets you build pyramids up to 15.
Sample:
In [45]: pyramid(9)
999999999999999999
88888888  88888888
7777777    7777777
666666      666666
55555        55555
4444          4444
333            333
22              22
1                1
Other pyramid "blocks" could be interesting:
In [52]: pyramid(9, '_')
999999999999999999
88888888__88888888
7777777____7777777
666666______666666
55555________55555
4444__________4444
333____________333
22______________22
1________________1
With the added left and right options and showing hex support:
In [57]: pyramid(15, '_', '/', '\\')
ffffffffffffff/\ffffffffffffff
eeeeeeeeeeeee/__\eeeeeeeeeeeee
dddddddddddd/____\dddddddddddd
ccccccccccc/______\ccccccccccc
bbbbbbbbbb/________\bbbbbbbbbb
aaaaaaaaa/__________\aaaaaaaaa
99999999/____________\99999999
8888888/______________\8888888
777777/________________\777777
66666/__________________\66666
5555/____________________\5555
444/______________________\444
33/________________________\33
2/__________________________\2
/____________________________\
First the code:
max_depth = int(raw_input("Enter max depth of pyramid (2 - 9): "))
for i in range(max_depth, 0, -1):
    print str(i)*i + " "*((max_depth-i)*2) + str(i)*i
Output:
(numpyramid)macbook:numpyramid joeyoung$ python numpyramid.py
Enter max depth of pyramid (2 - 9): 6
666666666666
55555  55555
4444    4444
333      333
22        22
1          1
How this works:
Python has a built-in function named range() which can help you build the iterator for your for-loop. You can make it decrement instead of increment by passing in -1 as the 3rd argument.
Our for loop will start at the user supplied max_depth (6 for our example) and i will decrement by 1 for each iteration of the loop.
Now the output line should do the following:
Print out the current iterator number (i) and repeat it i times.
Figure out how much white space to add in the middle.
This will be the max_depth minus the current iterator number, then multiply that result by 2 because you'll need to double the whitespace for each iteration
Attach the whitespace to the first set of repeated numbers.
Attach a second set of repeated numbers: the current iterator number (i) repeated i times
When you print characters, they can be repeated by following the character with an asterisk * and the number of times you want the character to be repeated.
For example:
>>> # Repeats the character 'A' 5 times
... print "A"*5
AAAAA
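For Python 3, where print is a function and raw_input no longer exists, the same loop could be sketched as follows. The function name number_pyramid_rows is made up for this example; it returns the rows instead of printing them so they are easy to check.

```python
def number_pyramid_rows(num):
    """Build the rows of the pyramid as strings, widest row first."""
    rows = []
    for i in range(num, 0, -1):
        digits = str(i) * i          # the number i repeated i times
        gap = ' ' * ((num - i) * 2)  # gap grows by two spaces per row
        rows.append(digits + gap + digits)
    return rows


for row in number_pyramid_rows(6):
    print(row)
```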
