How to encode text in AL32UTF8 with Python

We are trying to match a hash that has gone through Oracle's MD5 hash algorithm using Python. According to their forums, everything is encoded in AL32UTF8 prior to hashing:
-- Prior to encryption, hashing or keyed hashing, CLOB datatype is
-- converted to AL32UTF8. This allows cryptographic data to be
-- transferred and understood between databases with different
-- character sets, across character set changes and between
-- separate processes (for example, Java programs).
--
I thought at first that UTF-8 was good enough, but if I do that, my hashes still don't match. After some additional digging, I found this article, which quotes Oracle's Database Companion CD Installation Guide:
AL32UTF8 is the Oracle Database character set that is appropriate for XMLType data. It is equivalent to the IANA registered standard UTF-8 encoding, which supports all valid XML characters.
Do not confuse the Oracle Database database character set UTF8 (no hyphen) with the database character set AL32UTF8 or with character encoding UTF-8. Database character set UTF8 has been superseded by AL32UTF8. Do not use UTF8 for XML data. UTF8 supports only Unicode version 3.1 and earlier; it does not support all valid XML characters. AL32UTF8 has no such limitation.
So I can't use UTF-8 and I can't figure out how to get Python's codecs module to differentiate between utf-8 and utf8. If I try AL32UTF8, it throws an error. Has anyone else ever encoded in AL32UTF8 in Python?
My codecs code looks like this:
import codecs
sourceFmt = "ascii"
targetFmt = "utf8"
utfFile = "kesa_utf8.dat"
with codecs.open(old, "rU", sourceFmt) as sourceFile:
    with codecs.open(utfFile, "w", targetFmt) as targetFile:
        targetFile.write(sourceFile.read())
The file itself looks like this:
WC000|IC |KESA |KESA | | | |2012-07-31-15.12.36 |0090| | |\c\n
WC001|100534 |W.47212-0100534 |2012-07-31-15.12.36 | 00000000001270.00|USD|\c\n
WC002|100534 |W.47212-0100534 |Sally |H |Klass |1235 14th St. W. || |Palma Sola ||FL |USA |34209 | | | | | | | | |9412587545 | | |O | | ||20800426|645858741 |SSN | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | || | | | || | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |KESAPC | | | | | |N| | | || | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |\c\n
WC999|1000000000|1000000000|4000000000|
The hash should be 86D993FA7121E3B9EE1657A23345FE21
Anyway, I hash it using hashlib:
import hashlib
with open(path) as f:
    data = f.read()
mdhash = hashlib.md5(data)
mdhash = mdhash.hexdigest()
print mdhash
which results in 8421877dd9cdf7235eec47765821998c

It turns out that whatever the client was doing caused the data itself to be changed in such a way that it had "\c\n" line endings and it also would make the lines in the file all the same size via padding (of spaces on the end) AFTER they hashed it. Once we got the client to stop feeding us bad data, we were able to replicate the hash. Thanks for the help though!
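Since the Oracle documentation quoted in the question describes AL32UTF8 as equivalent to standard UTF-8, Python's own "utf-8" codec should produce the same bytes. Below is a minimal Python 3 sketch of hashing text that way. The helper names and the optional cleanup step (dropping trailing space padding and normalizing line endings) are assumptions for illustration, not the client's actual process:

```python
import hashlib


def md5_utf8_hex(text):
    """Return the uppercase hex MD5 digest of text encoded as UTF-8."""
    return hashlib.md5(text.encode("utf-8")).hexdigest().upper()


def strip_padding(text):
    """Drop trailing spaces from every line and rejoin the lines with LF."""
    return "\n".join(line.rstrip(" ") for line in text.splitlines())


# Uppercase output makes it easy to compare with the hash format in the question.
print(md5_utf8_hex("abc"))  # 900150983CD24FB0D6963F7D28E17F72
```

If the hash still differs, comparing the raw bytes of both inputs (line endings, padding, trailing newline) is usually more productive than trying other codecs.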

Related

Button Not Interacting (Custom Type) - Pywinauto

I am working on automating a process that uses our ancient HRIS system that unfortunately doesn't have API Access.
I am fairly new to Python, so I have been taking this task bit by bit. I've managed to connect to the app and input my username and password to sign in. However, I am stuck on selecting a menu item. I've tried everything that I know to do and have Googled until I've gone cross-eyed.
Dialog - 'City of Conway LIVE Springbrook V7' (L0, T0, R1032, B1039)
['Dialog', 'City of Conway LIVE Springbrook V7Dialog', 'City of Conway LIVE Springbrook V7', 'Dialog0', 'Dialog1']
child_window(title="City of Conway LIVE Springbrook V7", auto_id="MainMenu", control_type="Window")
|
| Pane - '' (L231, T87, R1024, B118)
| ['Pane', 'Pane0', 'Pane1']
| child_window(auto_id="_panelExWorkArea", control_type="Pane")
| |
| | Pane - 'Desktop' (L231, T90, R1021, B115)
| | ['DesktopPane', 'Desktop', 'Pane2']
| | child_window(title="Desktop", auto_id="_ssiGroupHeaderWorkArea", control_type="Pane")
|
| Pane - '' (L228, T87, R231, B1005)
| ['Pane3']
| child_window(auto_id="_ssiExpandableSplitter1", control_type="Pane")
|
| Pane - '' (L8, T87, R228, B1005)
| ['Pane4']
| child_window(auto_id="_panelTaskArea", control_type="Pane")
| |
| | Pane - '' (L11, T90, R228, B1002)
| | ['Pane5']
| | child_window(auto_id="328582", control_type="Pane")
| | |
| | | TreeView - '' (L11, T115, R228, B1002)
| | | ['TreeView', 'TreeView0', 'TreeView1']
| | | child_window(auto_id="1775914", control_type="Tree")
| | | |
| | | | Pane - '' (L28, T269, R194, B609)
| | | | ['Pane6']
| | | | child_window(auto_id="726924", control_type="Pane")
| | | | |
| | | | | Pane - '' (L28, T269, R194, B609)
| | | | | ['Pane7']
| | | | | child_window(auto_id="1317216", control_type="Pane")
| | | | | |
| | | | | | TreeView - '' (L28, T269, R194, B609)
| | | | | | ['TreeView2']
| | | | | | child_window(auto_id="2101028", control_type="Tree")
| | | | | | |
| | | | | | | Custom - 'Maintenance' (L0, T0, R0, B0)
| | | | | | | ['Custom', 'Maintenance', 'MaintenanceCustom', 'Custom0', 'Custom1', 'Maintenance0', 'Maintenance1', 'MaintenanceCustom0', 'MaintenanceCustom1']
| | | | | | | child_window(title="Maintenance", control_type="Custom")
I'm using a few tools to inspect the GUI, and this one specifically allows me to do the desired task by selecting "do it". It allows me to expand and collapse the section, so surely I've got to be missing something somewhere?
Here is my code:
from pywinauto import Application
app=Application(backend="uia").connect(path=r"C:\Users\skywalker\AppData\Local\Apps\2.0\C38DNYDP.PZ6\07BV1NGN.8G6\spri..ons1_b443b3e57637483a_0007.000f_52ec298e739bfebb", timeout = 30)
maintenance = app.CityofConwayLIVESpringbrookV7.WindowsForms10.Window.8.app.0.a0f91b_r8_ad1, 263022
maintenance.click()
I would also like to mention that I CAN get it to work with click_input(), but I would like to avoid that if at all possible.

Consume data from KAFKA topic and extract fields from it and store in MySQL using python

I want to consume data from a Kafka topic with the following command:
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic myTestTopic --from-beginning
Then this will output the following (just pasting top 2 lines output, but it will be many lines...):
&time=1561768216000&gameCategory=PINPOINT&category=ONE&uniqueId=2518Z-0892A-0030O-16H70&transactionType=CRD&familyId=000-222-115-11119&realTs=1561768319000&sortId=1&msg=SET-UP+PRAYER+%26+intercession+begins+in+just+30+minutes.&remoteIpAddress=127.0.0.1&userAgent=HTTP&
&uniqueId=872541806296826880&time=1571988786000&gameCategory=NOTIFY&category=TWO&transactionType=CRD&familyId=401-222-115-89387&sortId=1&realTs=1571988989000&msg=This-is+a+reminder.&remoteIpAddress=127.0.0.1&userAgent=HTTPS&
I want to consume the following from the output:
realTs
familyId
msg
uniqueId
and you can see that each element is separated by an ampersand ('&'). They are not always in the same position, so I'm not sure if I need a regex. Eventually, when I query a locally running MySQL instance, I'd like to see this:
describe testTable;
+----------+--------------+------+-----+---------+-------+
| Field    | Type         | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+-------+
| realTs   | bigint(20)   | YES  |     | NULL    |       |
| familyId | varchar(255) | YES  |     | NULL    |       |
| msg      | text         | YES  |     | NULL    |       |
| uniqueId | varchar(255) | YES  |     | NULL    |       |
+----------+--------------+------+-----+---------+-------+
4 rows in set (0.00 sec)
SELECT * FROM testTable;
+---------------+-------------------+-----------------------------------------------------------+-------------------------+
| realTs        | familyId          | msg                                                       | uniqueId                |
+---------------+-------------------+-----------------------------------------------------------+-------------------------+
| 1561768319000 | 000-222-115-11119 | SET-UP+PRAYER+%26+intercession+begins+in+just+30+minutes. | 2518Z-0892A-0030O-16H70 |
| 1571988989000 | 401-222-115-89387 | This-is+a+reminder.                                       | 872541806296826880      |
+---------------+-------------------+-----------------------------------------------------------+-------------------------+
What do I have so far?
I have mysql-connector set up with Python and can connect to a local MySQL instance, but I'm struggling with parsing this output and inserting it.

With Python, you can use urllib.parse.parse_qs to retrieve URL query string components in a Python dictionary that you can iterate later to insert data in your MySQL database.
For instance:
from urllib.parse import parse_qs
line = "&time=1561768216000&gameCategory=PINPOINT&category=ONE&uniqueId=2518Z-0892A-0030O-16H70&transactionType=CRD&familyId=000-222-115-11119&realTs=1561768319000&sortId=1&msg=SET-UP+PRAYER+%26+intercession+begins+in+just+30+minutes.&remoteIpAddress=127.0.0.1&userAgent=HTTP&uniqueId=872541806296826880&time=1571988786000&gameCategory=NOTIFY&category=TWO&transactionType=CRD&familyId=401-222-115-89387&sortId=1&realTs=1571988989000&msg=This-is+a+reminder.&remoteIpAddress=127.0.0.1&userAgent=HTTPS&"
o = parse_qs(line)
print(o)
Result:
{'time': ['1561768216000', '1571988786000'], 'gameCategory': ['PINPOINT', 'NOTIFY'], 'category': ['ONE', 'TWO'], 'uniqueId': ['2518Z-0892A-0030O-16H70', '872541806296826880'], 'transactionType': ['CRD', 'CRD'], 'familyId': ['000-222-115-11119', '401-222-115-89387'], 'realTs': ['1561768319000', '1571988989000'], 'sortId': ['1', '1'], 'msg': ['SET-UP PRAYER & intercession begins in just 30 minutes.', 'This-is a reminder.'], 'remoteIpAddress': ['127.0.0.1', '127.0.0.1'], 'userAgent': ['HTTP', 'HTTPS']}
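The example above parses two messages joined into one string, which is why every key maps to two values. For inserting rows, it may be simpler to parse each consumed line on its own. Below is a minimal sketch; the names `FIELDS`, `INSERT_SQL` and `extract_row` are hypothetical, and the open mysql-connector cursor is assumed rather than shown:

```python
from urllib.parse import parse_qs

# Columns of testTable, in the order used by the INSERT statement below.
FIELDS = ("realTs", "familyId", "msg", "uniqueId")

# Parameterized statement; mysql-connector uses %s placeholders.
INSERT_SQL = (
    "INSERT INTO testTable (realTs, familyId, msg, uniqueId) "
    "VALUES (%s, %s, %s, %s)"
)


def extract_row(line):
    """Parse one consumed message and return the four wanted values as a tuple."""
    parsed = parse_qs(line)
    # parse_qs maps each key to a list; take the first value, or None if absent.
    values = {name: parsed.get(name, [None])[0] for name in FIELDS}
    real_ts = int(values["realTs"]) if values["realTs"] is not None else None
    return (real_ts, values["familyId"], values["msg"], values["uniqueId"])


sample = (
    "&uniqueId=872541806296826880&time=1571988786000&gameCategory=NOTIFY"
    "&category=TWO&transactionType=CRD&familyId=401-222-115-89387&sortId=1"
    "&realTs=1571988989000&msg=This-is+a+reminder."
    "&remoteIpAddress=127.0.0.1&userAgent=HTTPS&"
)
print(extract_row(sample))
# (1571988989000, '401-222-115-89387', 'This-is a reminder.', '872541806296826880')

# With an open mysql-connector cursor (assumed, not created here):
#     cursor.execute(INSERT_SQL, extract_row(sample))
```

Note that parse_qs decodes `+` and `%26`, so `msg` is stored in decoded form. If the raw text shown in the question's expected table is wanted instead, the line would have to be split on `&` and `=` by hand.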

SQLAlchemy - pretty print SQL query results

In Ruby console, it is possible to display SQL query results in a very human-friendly way (ActiveRecord + Hirb):
>> Tag.all :limit=>3, :order=>"id DESC"
+-----+-------------------------+-------------+-------------------+-----------+-----------+----------+
| id  | created_at              | description | name              | namespace | predicate | value    |
+-----+-------------------------+-------------+-------------------+-----------+-----------+----------+
| 907 | 2009-03-06 21:10:41 UTC |             | gem:tags=yaml     | gem       | tags      | yaml     |
| 906 | 2009-03-06 08:47:04 UTC |             | gem:tags=nomonkey | gem       | tags      | nomonkey |
| 905 | 2009-03-04 00:30:10 UTC |             | article:tags=ruby | article   | tags      | ruby     |
+-----+-------------------------+-------------+-------------------+-----------+-----------+----------+
3 rows in set
Is there a module that will allow me to display SQLAlchemy result sets in a similar way in IPython?
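No answer is recorded here, but the layout itself is simple to produce with the standard library alone. Below is a rough sketch, not a replacement for a dedicated table-printing module. The name `format_table` is hypothetical, and the code assumes the column names and rows have already been pulled out of the result set:

```python
def format_table(headers, rows):
    """Render column headers and rows as a bordered ASCII table."""
    # Convert everything to strings; show None as an empty cell.
    cells = [[str(h) for h in headers]]
    cells += [["" if v is None else str(v) for v in row] for row in rows]
    # Each column is as wide as its longest cell.
    widths = [max(len(r[i]) for r in cells) for i in range(len(headers))]
    border = "+" + "+".join("-" * (w + 2) for w in widths) + "+"

    def render(values):
        padded = (v.ljust(w) for v, w in zip(values, widths))
        return "| " + " | ".join(padded) + " |"

    lines = [border, render(cells[0]), border]
    lines.extend(render(r) for r in cells[1:])
    lines.append(border)
    return "\n".join(lines)


# Assumption: with SQLAlchemy, headers could come from result.keys()
# and rows from result.fetchall(); plain tuples are used here instead.
print(format_table(["id", "name"], [(907, "gem:tags=yaml"), (906, None)]))
```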

Stemming for Polish language using Google App Engine Python Search Api

I'm trying to use the Python Search API in Google App Engine to search through a set of Polish documents, and I found that the stemming feature is not working as expected.
The word "red" in English has only one form, although there are different forms of it in Polish, based on: gender, plurality and case:
Non-plural:
|              | masculine  | feminine  | neuter     |
|--------------|------------|-----------|------------|
| Nominative   | czerwony   | czerwona  | czerwone   |
| Genitive     | czerwonego | czerwonej | czerwonego |
| Dative       | czerwonemu | czerwonej | czerwonemu |
| Accusative   | czerwony   | czerwoną  | czerwone   |
| Instrumental | czerwonym  | czerwoną  | czerwonym  |
| Locative     | czerwonym  | czerwonej | czerwonym  |
| Vocative     | czerwony   | czerwona  | czerwone   |
Plural (neuter is the same as feminine):
|              | masculine  | feminine   |
|--------------|------------|------------|
| Nominative   | czerwoni   | czerwone   |
| Genitive     | czerwonych | czerwonych |
| Dative       | czerwonym  | czerwonym  |
| Accusative   | czerwonych | czerwone   |
| Instrumental | czerwonymi | czerwonymi |
| Locative     | czerwonych | czerwonych |
| Vocative     | czerwoni   | czerwone   |
As you can see, there are 11 unique forms of "red" in Polish in total: 'czerwony', 'czerwonym', 'czerwonego', 'czerwonemu', 'czerwona', 'czerwoną', 'czerwonej', 'czerwone', 'czerwoni', 'czerwonymi', 'czerwonych'
What I'd expect from the Google App Engine stemmer is to treat all of them as being the same (as being "red"). Let's test it by adding an endpoint to the App Engine app, which does the following:
import json
import uuid
from google.appengine.api import search
def test_me():
    forms = {'czerwony', 'czerwonym', 'czerwonego', 'czerwonemu',
             'czerwona', 'czerwoną', 'czerwonej',
             'czerwone', 'czerwoni', 'czerwonymi', 'czerwonych',
             'czerwonym'}
    # turn each form into a document and insert it into the index
    index = search.Index(name=str(uuid.uuid4()))
    index.put([search.Document(language='pl',
                               fields=[
                                   search.TextField(name='color', value=form, language='pl')
                               ])
               for form in forms])
    missing = {}
    for form in forms:
        # find out which forms we can match to 'form' using the ~ stemming operator
        results = index.search(query="~" + form).results
        matching_forms = set([doc.field('color').value for doc in results])
        # and see which ones we missed
        missing[form] = list(forms - matching_forms)
    return json.dumps(missing)
It turns out there's a bunch of items that were not matched correctly:
"czerwonym": [
"czerwona",
"czerwoną",
"czerwoni",
"czerwonych",
"czerwonej",
"czerwonymi",
"czerwonemu"
],
"czerwonemu": [
"czerwona",
"czerwoną",
"czerwone",
"czerwoni",
"czerwonych",
"czerwonej",
"czerwonego",
"czerwony",
"czerwonym",
"czerwonymi"
],
...
Am I doing something wrong here? Or maybe I have the wrong expectations for the GAE stemmer?
Please note that there's an open-source Polish stemmer (https://github.com/morfologik/morfologik-stemming), which handles all of these forms without any problems. This leads me to believe that my expectations for the GAE stemmer are not outrageous.

MySQL Encoding 4 byte in 3 byte utf-8 - Incorrect string value

According to the MySQL documentation, the utf8 character set supports only up to 3-byte UTF-8 encoding.
My question is, how can I replace characters that require 4 byte utf-8 encoding in my database? And how do I decode those characters in order to display exactly what the user wrote?
Part of the integration test:
description = u'baaam á ✓ ✌ ❤'
print description
test_convention = Blog.objects.create(title="test title",
                                      description=description,
                                      login=self.user,
                                      tag=self.tag)
Error:
Creating test database for alias 'default'...
baaam á ✓ ✌ ❤
E..
======================================================================
ERROR: test_post_blog (blogs.tests.PostTestCase)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/admin/Developer/project/pro/blogs/tests.py", line 64, in test_post_blog
tag=self.tag)
File "build/bdist.macosx-10.9-intel/egg/MySQLdb/cursors.py", line 201, in execute
self.errorhandler(self, exc, value)
File "build/bdist.macosx-10.9-intel/egg/MySQLdb/connections.py", line 36, in defaulterrorhandler
raise errorclass, errorvalue
DatabaseError: (1366, "Incorrect string value: '\\xE2\\x9C\\x93 \\xE2\\x9C...' for column 'description' at row 1")
----------------------------------------------------------------------
Ran 3 tests in 1.383s
FAILED (errors=1)
Destroying test database for alias 'default'...
Table's configuration:
+----------------------------------+--------+---------+-------------------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+----------+----------------+---------+
| Name | Engine | Version | Collation | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Checksum | Create_options | Comment |
+----------------------------------+--------+---------+-------------------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+----------+----------------+---------+
| blogs_blog | InnoDB | 10 | utf8_general_ci | Compact | 25 | 1966 | 49152 | 0 | 32768 | 0 | 35 | 2014-02-09 00:57:59 | NULL | NULL | NULL | | |
+----------------------------------+--------+---------+-------------------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+----------+----------------+---------+
Update: I already changed the table and column configurations from utf8 to utf8mb4 and I am still getting the same error. Any ideas?
+----------------------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------+---------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |
+----------------------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------+---------+
| blogs_blog | InnoDB | 10 | Compact | 5 | 3276 | 16384 | 0 | 32768 | 0 | 36 | 2014-02-17 22:24:18 | NULL | NULL | utf8mb4_general_ci | NULL | | |
+----------------------------------+--------+---------+------------+------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+--------------------+----------+----------------+---------+
and:
+---------------+--------------+--------------------+------+-----+---------+----------------+---------------------------------+---------+
| Field | Type | Collation | Null | Key | Default | Extra | Privileges | Comment |
+---------------+--------------+--------------------+------+-----+---------+----------------+---------------------------------+---------+
| id | int(11) | NULL | NO | PRI | NULL | auto_increment | select,insert,update,references | |
| title | varchar(500) | latin1_swedish_ci | NO | | NULL | | select,insert,update,references | |
| description | longtext | utf8mb4_general_ci | YES | | NULL | | select,insert,update,references | |
| creation_date | datetime | NULL | NO | | NULL | | select,insert,update,references | |
| login_id | int(11) | NULL | NO | MUL | NULL | | select,insert,update,references | |
| tag_id | int(11) | NULL | NO | MUL | NULL | | select,insert,update,references | |
+---------------+--------------+--------------------+------+-----+---------+----------------+---------------------------------+---------+

It is supported, but not as utf8. Add the following to the [mysqld] section of my.cnf:
character-set-server=utf8mb4
collation-server=utf8mb4_unicode_ci
When creating a database, use:
CREATE DATABASE xxxxx DEFAULT CHARACTER SET utf8mb4 DEFAULT COLLATE utf8mb4_unicode_ci;
At the end of a CREATE TABLE command, add:
ENGINE=InnoDB ROW_FORMAT=COMPRESSED DEFAULT CHARSET=utf8mb4;
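The question also asks how to find or replace characters that need 4 bytes in UTF-8. Those are the code points above U+FFFF, so a regular expression can detect them. Below is a rough Python 3 sketch; the function names and the U+FFFD replacement character are illustrative choices, not part of the answer above:

```python
import re

# Any code point outside the Basic Multilingual Plane (above U+FFFF)
# takes 4 bytes in UTF-8, which MySQL's 3-byte utf8 cannot store.
_NON_BMP = re.compile("[\U00010000-\U0010FFFF]")


def needs_utf8mb4(text):
    """Return True if text contains a character that needs 4 bytes in UTF-8."""
    return _NON_BMP.search(text) is not None


def replace_non_bmp(text, replacement="\ufffd"):
    """Replace every 4-byte character with the given replacement string."""
    return _NON_BMP.sub(replacement, text)


print(needs_utf8mb4("baaam á ✓ ✌ ❤"))   # False
print(needs_utf8mb4("smile \U0001F600"))  # True
```

Note that the characters in the question's sample description (á ✓ ✌ ❤) each fit in 3 bytes or fewer, so this check does not flag them. In that case the character set used by the client connection may also be worth checking.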
