One-hot encoding of a custom-built vocabulary - python

I have charset as follows.
charset =set([ '$', '^', '#', '(', ')', '-', '.', '/', '1', '2', '3', '4', '5', '6', '7', '=', 'Br',
'C', 'Cl', 'F', 'I', 'N', 'O', 'P', 'S', '[2H]', '[Br-]', '[C##H]', '[C##]', '[C#H]', '[C#]',
'[Cl-]', '[H]', '[I-]', '[N+]', '[N-]', '[N#+]', '[N##+]', '[NH+]', '[NH2+]', '[NH3+]', '[N]',
'[Na+]', '[O-]', '[P+]', '[S+]', '[S-]', '[S#+]', '[S##+]', '[SH]', '[Si]', '[n+]', '[n-]',
'[nH+]', '[nH]', '[o+]', '[se]', '\\', 'c', 'n', 'o', 's', '!', 'E'])
On the basis of this charset, I create char_to_int as follows.
char_to_int = dict((c,i) for i,c in enumerate(charset))
{'[nH]': 0, '[2H]': 1, '2': 2, 'N': 3, 'Cl': 4, 'c': 5, '$': 6,
'O': 7, '(': 8, '6': 9, 's': 10, '[S#+]': 11, '[C##H]': 12, 'C':
13, '[nH+]': 14, '/': 15, '[NH+]': 16, '[Br-]': 17, '[Si]': 18,
'4': 19, '[N#+]': 20, '[se]': 21, 'P': 22, '[SH]': 23, '[N+]':
24, '[N]': 25, '^': 26, '5': 27, '7': 28, 'n': 29, '!': 30,
'\\': 31, '[n-]': 32, 'S': 33, '[NH3+]': 34, '#': 35, 'I': 36,
'[O-]': 37, '1': 38, '[NH2+]': 39, '[S##+]': 40, 'Br': 41, 'F':
42, '[Na+]': 43, 'E': 44, '[S-]': 45, '.': 46, ')': 47, '[C#]':
48, '=': 49, '3': 50, '-': 51, '[C#H]': 52, '[Cl-]': 53, '[I-]':
54, '[H]': 55, '[P+]': 56, '[S+]': 57, 'o': 58, '[N##+]': 59,
'[N-]': 60, '[n+]': 61, '[o+]': 62, '[C##]': 63}
and int_to_char as follows.
int_to_char = dict((i,c) for i,c in enumerate(charset))
{0: '[nH]', 1: '[2H]', 2: '2', 3: 'N', 4: 'Cl', 5: 'c', 6: '$',
7: 'O', 8: '(', 9: '6', 10: 's', 11: '[S#+]', 12: '[C##H]', 13:
'C', 14: '[nH+]', 15: '/', 16: '[NH+]', 17: '[Br-]', 18: '[Si]',
19: '4', 20: '[N#+]', 21: '[se]', 22: 'P', 23: '[SH]', 24:
'[N+]', 25: '[N]', 26: '^', 27: '5', 28: '7', 29: 'n', 30: '!',
31: '\\', 32: '[n-]', 33: 'S', 34: '[NH3+]', 35: '#', 36: 'I',
37: '[O-]', 38: '1', 39: '[NH2+]', 40: '[S##+]', 41: 'Br', 42:
'F', 43: '[Na+]', 44: 'E', 45: '[S-]', 46: '.', 47: ')', 48:
'[C#]', 49: '=', 50: '3', 51: '-', 52: '[C#H]', 53: '[Cl-]', 54:
'[I-]', 55: '[H]', 56: '[P+]', 57: '[S+]', 58: 'o', 59: '[N##+]',
60: '[N-]', 61: '[n+]', 62: '[o+]', 63: '[C##]'}
I have a string which I want to convert to one hot encoding on the basis of char_to_int and int_to_char.
string = 'N[C#H]1C[C##H](N2Cc3nn4cccnc4c3C2)CC[C##H]1c1cc(F)c(F)cc1F'
Is there an efficient way to convert a string to one-hot vectors using the self-defined char_to_int and int_to_char?

import re
string = 'N[C#H]1C[C##H](N2Cc3nn4cccnc4c3C2)CC[C##H]1c1cc(F)c(F)cc1F'
items_list=[ '$', '^', '#', '(', ')', '-', '.', '/', '1', '2', '3', '4', '5', '6', '7', '=', 'Br',
'C', 'Cl', 'F', 'I', 'N', 'O', 'P', 'S', '[2H]', '[Br-]', '[C##H]', '[C##]', '[C#H]', '[C#]',
'[Cl-]', '[H]', '[I-]', '[N+]', '[N-]', '[N#+]', '[N##+]', '[NH+]', '[NH2+]', '[NH3+]', '[N]',
'[Na+]', '[O-]', '[P+]', '[S+]', '[S-]', '[S#+]', '[S##+]', '[SH]', '[Si]', '[n+]', '[n-]',
'[nH+]', '[nH]', '[o+]', '[se]', '\\', 'c', 'n', 'o', 's', '!', 'E']
charset = set(items_list)
char_to_int = dict((c,i) for i,c in enumerate(charset))
# Try longer tokens first so that 'Cl' is matched before 'C'.
pattern = '|'.join(re.escape(item) for item in sorted(items_list, key=len, reverse=True))
tokens = re.findall(pattern, string)
x=[char_to_int[k] for k in tokens]
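Here, x is the integer-encoded sequence: each token is replaced by its index in char_to_int, and each index can then be expanded into a one-hot vector of length len(charset). The exact indices depend on the iteration order of the set, so they can differ between runs: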
x=[3, 52, 38, 13, 12, 8, 3, 2, 13, 5, 50, 29, 29, 19, 5, 5, 5, 29, 5, 19, 5, 50, 13, 2, 47, 13, 13, 12, 38, 5, 38, 5, 5, 8, 42, 47, 5, 8, 42, 47, 5, 5, 38, 42]
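The step from integer indices to actual one-hot vectors can be sketched as follows. This is a minimal pure-Python sketch, not the original answer's code: the shortened vocab list, the sample string, and the helper names one_hot_encode and one_hot_decode are assumptions made for illustration. The vocabulary is sorted so that the indices are reproducible.

```python
import re

# Hypothetical, shortened vocabulary (a subset of the charset above).
vocab = sorted(['C', 'Cl', 'N', 'F', '(', ')', '1', 'c', '[C#H]'])
char_to_int = {tok: i for i, tok in enumerate(vocab)}
int_to_char = {i: tok for tok, i in char_to_int.items()}

# Longest tokens first, so that 'Cl' wins over 'C'.
pattern = '|'.join(re.escape(t) for t in sorted(vocab, key=len, reverse=True))


def one_hot_encode(text):
    """Return one row per token; each row has a single 1 at the token's index."""
    rows = []
    for tok in re.findall(pattern, text):
        row = [0] * len(vocab)
        row[char_to_int[tok]] = 1
        rows.append(row)
    return rows


def one_hot_decode(rows):
    """Invert one_hot_encode by looking up the position of the 1 in each row."""
    return ''.join(int_to_char[row.index(1)] for row in rows)


encoded = one_hot_encode('N[C#H]1CCl')
decoded = one_hot_decode(encoded)
```

If NumPy is available, indexing an identity matrix with the list of indices (for example np.eye(len(vocab))[indices]) should give the same matrix without the explicit loop.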

Related

Python KeyError, qr code barcode reader on raspberry pi

I'm a Korean. English translation may be wrong.
I am writing a Python program that outputs data read from a QR code reader connected by USB to a Raspberry Pi 4.
The line below raises KeyError: 74. What's the workaround?
ss += hid[int(ord(c))]
Below is the full code.
import sys
hid = {4: 'a', 5: 'b', 6: 'c', 7: 'd', 8: 'e', 9: 'f', 10: 'g', 11: 'h', 12: 'i', 13: 'j', 14: 'k', 15: 'l', 16: 'm',
17: 'n', 18: 'o', 19: 'p', 20: 'q', 21: 'r', 22: 's', 23: 't', 24: 'u', 25: 'v', 26: 'w', 27: 'x', 28: 'y',
29: 'z', 30: '1', 31: '2', 32: '3', 33: '4', 34: '5', 35: '6', 36: '7', 37: '8', 38: '9', 39: '0', 44: ' ',
45: '-', 46: '=', 47: '[', 48: ']', 49: '\\', 51: ';', 52: '\'', 53: '~', 54: ',', 55: '.', 56: '/'}
hid2 = {4: 'A', 5: 'B', 6: 'C', 7: 'D', 8: 'E', 9: 'F', 10: 'G', 11: 'H', 12: 'I', 13: 'J', 14: 'K', 15: 'L', 16: 'M',
17: 'N', 18: 'O', 19: 'P', 20: 'Q', 21: 'R', 22: 'S', 23: 'T', 24: 'U', 25: 'V', 26: 'W', 27: 'X', 28: 'Y',
29: 'Z', 30: '!', 31: '#', 32: '#', 33: '$', 34: '%', 35: '^', 36: '&', 37: '*', 38: '(', 39: ')', 44: ' ',
45: '_', 46: '+', 47: '{', 48: '}', 49: '|', 51: ':', 52: '"', 53: '~', 54: '<', 55: '>', 56: '?'}
fp = open('/dev/hidraw4', 'rb')
ss = ""
shift = False
done = False
while not done:
    ## Get the character from the HID
    buffer = fp.read(8)
    for c in buffer:
        if ord(c) > 0:
            ## 40 is carriage return which signifies
            ## we are done looking for characters
            if int(ord(c)) == 40:
                done = True
                break
            ## If we are shifted then we have to
            ## use the hid2 characters.
            if shift:
                ## If it is a '2' then it is the shift key
                if int(ord(c)) == 2:
                    shift = True
                ## if not a 2 then lookup the mapping
                else:
                    ss += hid2[int(ord(c))]
                    shift = False
            ## If we are not shifted then use
            ## the hid characters
            else:
                ## If it is a '2' then it is the shift key
                if int(ord(c)) == 2:
                    shift = True
                ## if not a 2 then lookup the mapping
                else:
                    ss += hid[int(ord(c))]
print(ss)
A KeyError is raised when you look up a key that the dict does not contain. You probably want to re-check and update your mapping to contain the correct (ASCII) values as keys. The 74 comes from int(ord("J")).
You can avoid the KeyError by changing hid[int(ord(c))] to hid.get(int(ord(c)), ''), which returns an empty string when the key does not exist. A plain hid.get(int(ord(c))) would return None instead, and None cannot be appended to ss.
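A small sketch of a lookup that tolerates unknown codes is shown here. The shortened hid mapping, the decode helper, and the '?' placeholder are hypothetical and only illustrate dict.get with a default value.

```python
# Hypothetical, shortened mapping from key codes to characters.
hid = {4: 'a', 5: 'b', 6: 'c', 30: '1', 31: '2'}


def decode(codes, mapping, placeholder='?'):
    """Translate key codes to text, substituting a placeholder for unknown codes."""
    return ''.join(mapping.get(code, placeholder) for code in codes)


# 74 is not in the mapping, so it becomes the placeholder instead of raising KeyError.
text = decode([4, 74, 5, 30], hid)
```

Using a visible placeholder rather than an empty string makes it easier to spot which codes are still missing from the mapping.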

Are multiple iterations possible in a dictionary comprehension in Python, as they are in list comprehensions?

In some special cases, can we use multiple iterations in a dictionary comprehension?
For example, we have a string in the format below:
"6: 14, 11: 28, 17: 74, 22: 7, 38: 59, 49: 12, 57: 76, 61: 54, 81: 98, 88: 4"
I want to set 6, 11, 17, 22, 38, ... as the keys
and 14, 28, 74, 7, ... as the corresponding values.
How can this be achieved with a dictionary comprehension?
You can use ast.literal_eval to convert a string to a dictionary:
import ast
my_string = "6: 14, 11: 28, 17: 74, 22: 7, 38: 59, 49: 12, 57: 76, 61: 54, 81: 98, 88: 4"
my_dict = ast.literal_eval(f"{{{my_string}}}")
A dictionary comprehension combined with split() on : should be good enough:
dic = {elt.split(':')[0].strip(): elt.split(':')[1].strip() for elt in my_string.split(',')}
Output:
{'6': '14', '11': '28', '17': '74', '22': '7', '38': '59', '49': '12', '57': '76', '61': '54', '81': '98', '88': '4'}
If you want the keys and values to be int objects:
dic = {int(elt.split(':')[0].strip()): int(elt.split(':')[1].strip()) for elt in my_string.split(',')}
Output:
{6: 14, 11: 28, 17: 74, 22: 7, 38: 59, 49: 12, 57: 76, 61: 54, 81: 98, 88: 4}
Use the dict constructor (here s is the input string):
dict(x.replace(' ', '').split(':') for x in s.split(','))
Or use dict constructor and map function
dict(map(lambda x: x.replace(' ', '').split(':'), s.split(',')))
Output
{'6': '14', '11': '28', '17': '74', '22': '7', '38': '59', '49': '12', '57': '76', '61': '54', '81': '98', '88': '4'}
There are multiple possibilities:
With Regex:
import re
sample = "6: 14, 11: 28, 17: 74, 22: 7, 38: 59, 49: 12, 57: 76, 61: 54, 81: 98, 88: 4"
my_dict = {key: value for key, value in re.findall(r'(\d+): (\d+)', sample)}
print(my_dict)
output:
{'6': '14', '11': '28', '17': '74', '22': '7', '38': '59', '49': '12', '57': '76', '61': '54', '81': '98', '88': '4'}
With Split():
sample = "6: 14, 11: 28, 17: 74, 22: 7, 38: 59, 49: 12, 57: 76, 61: 54, 81: 98, 88: 4"
my_dict = {elem.split(": ")[0]: elem.split(": ")[1] for elem in sample.split(", ")}
print(my_dict)
output:
{'6': '14', '11': '28', '17': '74', '22': '7', '38': '59', '49': '12', '57': '76', '61': '54', '81': '98', '88': '4'}
The required output can be achieved with the comprehension below:
>>> s1 = "6: 14, 11: 28, 17: 74, 22: 7, 38: 59, 49: 12, 57: 76, 61: 54, 81: 98, 88: 4"
>>> dict((s[0].strip(),int(s[1].strip())) for s in [s.split(":") for s in s1.split(",")])
{'11': 28, '38': 59, '17': 74, '22': 7, '49': 12, '57': 76, '61': 54, '88': 4, '6': 14, '81': 98}
>>>
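To address the title question directly: a dictionary comprehension accepts several for clauses, just like a list comprehension. The sketch below uses a shortened sample string and the made-up names pairs and grid; it is one possible illustration, not taken from the answers above.

```python
s1 = "6: 14, 11: 28, 17: 74, 22: 7"

# Two for-clauses: the outer one walks the comma-separated items, the inner
# one iterates over a one-element list so the split result is unpacked once.
pairs = {int(k): int(v) for item in s1.split(',') for k, v in [item.split(':')]}

# A more conventional nested iteration: one entry per (row, col) combination.
grid = {(r, c): r * c for r in range(2) for c in range(3)}
```

int() ignores surrounding whitespace, so no explicit strip() is needed in the first comprehension.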

How to return previous element from group where some column is True

My data frame is as follows:
ex = {'group': {0: '0', 1: '0', 2: '0', 3: '0', 4: '0', 5: '0', 6: '0', 7: '0', 8: '0', 9: '0', 10: '0', 11: '0', 12: '0', 13: '0', 14: '0', 15: '0', 16: '0', 17: '0', 18: '0', 19: '0', 20: '0', 21: '1', 22: '1', 23: '1', 24: '1', 25: '1', 26: '1', 27: '1', 28: '1', 29: '1', 30: '1', 31: '1', 32: '1', 33: '1', 34: '1', 35: '1', 36: '1', 37: '1', 38: '1', 39: '1'}, 'order': {0: 0, 1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10, 11: 11, 12: 12, 13: 13, 14: 14, 15: 15, 16: 16, 17: 17, 18: 18, 19: 19, 20: 20, 21: 0, 22: 1, 23: 2, 24: 3, 25: 4, 26: 5, 27: 6, 28: 7, 29: 8, 30: 9, 31: 10, 32: 11, 33: 12, 34: 13, 35: 14, 36: 15, 37: 16, 38: 17, 39: 18}, 'id': {0: '102', 1: '302', 2: '302', 3: '302', 4: '102', 5: '302', 6: '302', 7: '302', 8: '302', 9: '302', 10: '102', 11: '308', 12: '308', 13: '308', 14: '308', 15: '302', 16: '102', 17: '302', 18: '102', 19: '302', 20: '102', 21: '102', 22: '102', 23: '308', 24: '312', 25: '312', 26: '312', 27: '308', 28: '102', 29: '302', 30: '312', 31: '302', 32: '302', 33: '102', 34: '102', 35: '302', 36: '312', 37: '308', 38: '102', 39: '302'}, 'type': {0: 'A', 1: 'B', 2: 'C', 3: 'A', 4: 'D', 5: 'E', 6: 'D', 7: 'E', 8: 'A', 9: 'E', 10: 'E', 11: 'D', 12: 'A', 13: 'A', 14: 'A', 15: 'D', 16: 'D', 17: 'D', 18: 'A', 19: 'D', 20: 'A', 21: 'D', 22: 'F', 23: 'A', 24: 'D', 25: 'A', 26: 'E', 27: 'A', 28: 'E', 29: 'D', 30: 'E', 31: 'E', 32: 'G', 33: 'A', 34: 'D', 35: 'D', 36: 'H', 37: 'I', 38: 'A', 39: 'E'}, 'of_interest': {0: False, 1: False, 2: True, 3: False, 4: False, 5: True, 6: False, 7: True, 8: True, 9: True, 10: True, 11: True, 12: True, 13: False, 14: True, 15: True, 16: True, 17: True, 18: False, 19: False, 20: True, 21: False, 22: False, 23: False, 24: True, 25: False, 26: True, 27: True, 28: False, 29: True, 30: True, 31: False, 32: True, 33: True, 34: True, 35: True, 36: True, 37: False, 38: True, 39: False}}
import pandas as pd
ex = pd.DataFrame(ex)
ex.head()
group order id type of_interest
0 0 0 102 A False
1 0 1 302 B False
2 0 2 302 C True
3 0 3 302 A False
4 0 4 102 D False
I want to create a column that, for each combination of group and id, returns the previous type where of_interest == True.
My first attempt filtered on of_interest == True first, and therefore returned values only for those rows:
ex['prev_type_of_interest'] = ex \
    .query('of_interest == True') \
    .groupby(['group', 'id'])['type'] \
    .shift(1)
How can I return previous type of interest for every row?
I believe you need to shift all rows per group, then use Series.where to keep the shifted type only where the shifted of_interest is True, and finally fill the missing values with the previous non-missing value per group using GroupBy.ffill:
ex1 = ex.groupby(['group', 'id']).shift()
ex['prev_type_of_interest'] = ex1['type'].where(ex1['of_interest'] == True)
ex['prev_type_of_interest'] = ex.groupby(['group', 'id'])['prev_type_of_interest'].ffill()
print (ex.head(10))
group order id type of_interest prev_type_of_interest
0 0 0 102 A False NaN
1 0 1 302 B False NaN
2 0 2 302 C True NaN
3 0 3 302 A False C
4 0 4 102 D False NaN
5 0 5 302 E True C
6 0 6 302 D False E
7 0 7 302 E True E
8 0 8 302 A True E
9 0 9 302 E True A

How do I print my barcode output to a file?

I have set up a Raspberry Pi with a USB barcode scanner for a little project. It works with my generated barcodes and prints the scanned code in the terminal. I want to save this output to a text file without overwriting what is already there. I have tried changing all the functions and I just can't get it to work. I'm a novice in Python, I have been stuck on this for a long time, and I have looked all over the internet. If you can point me to the specific place in the code I need to change in order to write the output to a file, I would be very appreciative.
Source: Instructables
#!/usr/bin/python
import sys
import requests
import json
api_key = "" #https://upcdatabase.org/
def barcode_reader():
    hid = {4: 'a', 5: 'b', 6: 'c', 7: 'd', 8: 'e', 9: 'f', 10: 'g', 11: 'h', 12: 'i', 13: 'j', 14: 'k', 15: 'l', 16: 'm',
           17: 'n', 18: 'o', 19: 'p', 20: 'q', 21: 'r', 22: 's', 23: 't', 24: 'u', 25: 'v', 26: 'w', 27: 'x', 28: 'y',
           29: 'z', 30: '1', 31: '2', 32: '3', 33: '4', 34: '5', 35: '6', 36: '7', 37: '8', 38: '9', 39: '0', 44: ' ',
           45: '-', 46: '=', 47: '[', 48: ']', 49: '\\', 51: ';', 52: '\'', 53: '~', 54: ',', 55: '.', 56: '/'}
    hid2 = {4: 'A', 5: 'B', 6: 'C', 7: 'D', 8: 'E', 9: 'F', 10: 'G', 11: 'H', 12: 'I', 13: 'J', 14: 'K', 15: 'L', 16: 'M',
            17: 'N', 18: 'O', 19: 'P', 20: 'Q', 21: 'R', 22: 'S', 23: 'T', 24: 'U', 25: 'V', 26: 'W', 27: 'X', 28: 'Y',
            29: 'Z', 30: '!', 31: '#', 32: '#', 33: '$', 34: '%', 35: '^', 36: '&', 37: '*', 38: '(', 39: ')', 44: ' ',
            45: '_', 46: '+', 47: '{', 48: '}', 49: '|', 51: ':', 52: '"', 53: '~', 54: '<', 55: '>', 56: '?'}
    fp = open('/dev/hidraw0', 'rb')
    ss = ""
    shift = False
    done = False
    while not done:
        ## Get the character from the HID
        buffer = fp.read(8)
        for c in buffer:
            if ord(c) > 0:
                ## 40 is carriage return which signifies
                ## we are done looking for characters
                if int(ord(c)) == 40:
                    done = True
                    break
                ## If we are shifted then we have to
                ## use the hid2 characters.
                if shift:
                    ## If it is a '2' then it is the shift key
                    if int(ord(c)) == 2:
                        shift = True
                    ## if not a 2 then lookup the mapping
                    else:
                        ss += hid2[int(ord(c))]
                        shift = False
                ## If we are not shifted then use
                ## the hid characters
                else:
                    ## If it is a '2' then it is the shift key
                    if int(ord(c)) == 2:
                        shift = True
                    ## if not a 2 then lookup the mapping
                    else:
                        ss += hid[int(ord(c))]
    return ss
def UPC_lookup(api_key,upc):
    '''V3 API'''
    url = "https://api.upcdatabase.org/product/%s/%s" % (upc, api_key)
    headers = {
        'cache-control': "no-cache",
    }
    response = requests.request("GET", url, headers=headers)
    print("-----" * 5)
    print(upc)
    print(json.dumps(response.json(), indent=2))
    print("-----" * 5 + "\n")
if __name__ == '__main__':
    try:
        while True:
            UPC_lookup(api_key,barcode_reader())
    except KeyboardInterrupt:
        pass
If it is already printing to the console it means it's coming from this part of the code:
print("-----" * 5)
print(upc)
print(json.dumps(response.json(), indent=2))
print("-----" * 5 + "\n")
To save it to a file instead, open the file in append mode ('a') so that earlier output is not overwritten:
with open('FILENAME.txt', 'a', encoding='utf-8') as file:
    file.write('CONTENT THAT YOU WANT TO WRITE!\n')
Or in your particular case (unlike print, write does not add a newline, so each line needs an explicit "\n"):
with open('FILENAME.txt', 'a', encoding='utf-8') as file:
    file.write("-----" * 5 + "\n")
    file.write(upc + "\n")
    file.write(json.dumps(response.json(), indent=2) + "\n")
    file.write("-----" * 5 + "\n\n")
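An alternative that keeps the original print calls almost unchanged is to pass the open file through print's file argument. The sketch below is only an illustration: log_scan, the temporary file path, and the sample UPC values and payloads are made up, and the real script would pass response.json() as the payload.

```python
import json
import os
import tempfile


def log_scan(path, upc, payload):
    """Append one scan record to a text file; mode 'a' never truncates the file."""
    with open(path, 'a', encoding='utf-8') as log_file:
        print("-----" * 5, file=log_file)
        print(upc, file=log_file)
        print(json.dumps(payload, indent=2), file=log_file)
        print("-----" * 5 + "\n", file=log_file)


# Usage with a temporary file and made-up data.
log_path = os.path.join(tempfile.mkdtemp(), 'scans.txt')
log_scan(log_path, '0123456789012', {'title': 'example'})
log_scan(log_path, '0987654321098', {'title': 'another'})

with open(log_path, encoding='utf-8') as f:
    contents = f.read()
```

Because print adds the trailing newline itself, the file ends up with the same layout that appears in the terminal.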

How can you convert a Python identifier into a number?

Reference: Is there a faster way of converting a number to a name?
In the question referenced above, a solution was found for turning a number into a name. This question asks just the opposite. How can you convert a name back into a number? So far, this is what I have:
>>> import string
>>> HEAD_CHAR = ''.join(sorted(string.ascii_letters + '_'))
>>> TAIL_CHAR = ''.join(sorted(string.digits + HEAD_CHAR))
>>> HEAD_BASE, TAIL_BASE = len(HEAD_CHAR), len(TAIL_CHAR)
>>> def number_to_name(number):
    "Convert a number into a valid identifier."
    if number < HEAD_BASE:
        return HEAD_CHAR[number]
    q, r = divmod(number - HEAD_BASE, TAIL_BASE)
    return number_to_name(q) + TAIL_CHAR[r]
>>> [number_to_name(n) for n in range(117)]
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', '_', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A0', 'A1', 'A2', 'A3', 'A4', 'A5', 'A6', 'A7', 'A8', 'A9', 'AA', 'AB', 'AC', 'AD', 'AE', 'AF', 'AG', 'AH', 'AI', 'AJ', 'AK', 'AL', 'AM', 'AN', 'AO', 'AP', 'AQ', 'AR', 'AS', 'AT', 'AU', 'AV', 'AW', 'AX', 'AY', 'AZ', 'A_', 'Aa', 'Ab', 'Ac', 'Ad', 'Ae', 'Af', 'Ag', 'Ah', 'Ai', 'Aj', 'Ak', 'Al', 'Am', 'An', 'Ao', 'Ap', 'Aq', 'Ar', 'As', 'At', 'Au', 'Av', 'Aw', 'Ax', 'Ay', 'Az', 'B0']
>>> def name_to_number(name):
    assert name, 'Name must exist!'
    head, *tail = name
    number = HEAD_CHAR.index(head)
    for position, char in enumerate(tail):
        if position:
            number *= TAIL_BASE
        else:
            number += HEAD_BASE
        number += TAIL_CHAR.index(char)
    return number
>>> [name_to_number(number_to_name(n)) for n in range(117)]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 54]
The function number_to_name works perfectly, and name_to_number works up until it gets to number 116. At that point, the function returns 54 instead. Does anyone see the code's problem?
Solution based on recursive's answer:
import string
HEAD_CHAR = ''.join(sorted(string.ascii_letters + '_'))
TAIL_CHAR = ''.join(sorted(string.digits + HEAD_CHAR))
HEAD_BASE, TAIL_BASE = len(HEAD_CHAR), len(TAIL_CHAR)
def name_to_number(name):
    if not name.isidentifier():
        raise ValueError('Name must be a Python identifier!')
    head, *tail = name
    number = HEAD_CHAR.index(head)
    for char in tail:
        number *= TAIL_BASE
        number += TAIL_CHAR.index(char)
    return number + sum(HEAD_BASE * TAIL_BASE ** p for p in range(len(tail)))
Unfortunately, these identifiers don't yield to traditional constant base encoding techniques. For example "A" acts like a zero, but leading "A"s change the value. In normal number systems leading zeroes do not. There could be multiple approaches, but I settled on one that calculates the total number of identifiers with fewer digits, and starts from that.
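from functools import reduce  # reduce is not a builtin in Python 3

def name_to_number(name):
    assert name, 'Name must exist!'
    # Count of all identifiers that are shorter than this one.
    skipped = sum(HEAD_BASE * TAIL_BASE ** i for i in range(len(name) - 1))
    val = reduce(
        lambda a,b: a * TAIL_BASE + TAIL_CHAR.index(b),
        name[1:],
        HEAD_CHAR.index(name[0]))
    return val + skipped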
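A round-trip check is a quick way to confirm that the offset-based conversion inverts number_to_name. The sketch below restates both functions from this thread so that it runs on its own; the range of 10000 and the name round_trip_ok are arbitrary choices for illustration, picked so that one-, two-, and three-character names are all covered.

```python
import string
from functools import reduce

HEAD_CHAR = ''.join(sorted(string.ascii_letters + '_'))
TAIL_CHAR = ''.join(sorted(string.digits + HEAD_CHAR))
HEAD_BASE, TAIL_BASE = len(HEAD_CHAR), len(TAIL_CHAR)


def number_to_name(number):
    """Convert a number into a valid identifier."""
    if number < HEAD_BASE:
        return HEAD_CHAR[number]
    q, r = divmod(number - HEAD_BASE, TAIL_BASE)
    return number_to_name(q) + TAIL_CHAR[r]


def name_to_number(name):
    """Invert number_to_name by adding the count of all shorter identifiers."""
    skipped = sum(HEAD_BASE * TAIL_BASE ** i for i in range(len(name) - 1))
    val = reduce(lambda a, b: a * TAIL_BASE + TAIL_CHAR.index(b),
                 name[1:], HEAD_CHAR.index(name[0]))
    return val + skipped


# Every number in the range should survive the conversion in both directions.
round_trip_ok = all(name_to_number(number_to_name(n)) == n for n in range(10000))
```

The value 116 from the question maps to 'B0' and back to 116 here, whereas the first attempt returned 54.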
