How to encode to ASCII, base64, or hex from a file in Python

In the following code, I am converting an image file into a string depending on which radio button is selected:
import base64
def convert_now(self):
    self.img_data = ""
    self.img_data_encoded = ""
    # Read the image as raw bytes; the with-block closes the file afterwards.
    with open(self.filedict, 'rb') as file1:
        self.img_data = file1.read()
    # Radio button choices, convert to: 0-ascii, 1-base64, 2-hex
    v = self.rvar.get()
    if v == 0:
        self.img_data_encoded = self.img_data
    elif v == 1:
        self.img_data_encoded = base64.b64encode(self.img_data)  # (!)
    elif v == 2:
        self.img_data_encoded = base64.b16encode(self.img_data)  # (!!)
I got the base64 string from an image file using line (!) and saved it in a string named "st".
Then I tried getting the hex string using line (!!).
The problem is that when I compared the results from the code above with those from this website, "https://www.branah.com/ascii-converter", where I used "st" (the base64 string from the code),
they don't match at all.
Did I code something wrong?

No, it doesn't seem like you did something wrong. Your code looks fine. Also, for me, both b64encode and b16encode produce the same output as the webpage you linked to. Here is an example.
>>> import base64
>>> base64.b64encode(b"test")
b'dGVzdA=='
>>> base64.b16encode(b"test")
b'74657374'
You can compare this to the results on the webpage. They match. So, there must be something else wrong. b64encode and b16encode work fine.
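One way to check the three conversions outside the GUI is to run them on a known input and decode the results back. The sketch below is only an illustration: the helper name encode_bytes is made up, its mode numbers mirror the radio button values in the question, and a small temporary file stands in for the image file.

```python
import base64
import os
import tempfile


def encode_bytes(data, mode):
    """Hypothetical helper. mode: 0 = unchanged, 1 = base64, 2 = hex."""
    if mode == 0:
        return data
    if mode == 1:
        return base64.b64encode(data)
    if mode == 2:
        return base64.b16encode(data)
    raise ValueError("unknown mode: %r" % (mode,))


# A small temporary file stands in for the image file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"test")
    path = tmp.name

with open(path, "rb") as f:
    raw = f.read()
os.remove(path)

as_base64 = encode_bytes(raw, 1)
as_hex = encode_bytes(raw, 2)

# Decoding each result should give back the original bytes.
assert base64.b64decode(as_base64) == raw
assert base64.b16decode(as_hex) == raw
print(as_base64, as_hex)
```

If the base64 string is pasted into a converter's text field, the hex shown there is the hex of the base64 characters, which differs from b16encode of the original bytes. That may be one reason for a mismatch, although the question does not say exactly what was compared.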

Related

Scrapy isn't showing the Arabic letters

I have spent the whole day looking for a way to display Arabic letters with Scrapy, and nothing worked for me! I am scraping an Arabic website, but I am not getting the Arabic text in the right format.
Here is what I actually get when I save the results in a CSV file:
"بطل ليÙربول القديم" يرد على أنصار "الريدز"
Here is my function:
def parse_details(self, response):
    vars = ArticlesItem()
    vars["title"] = response.css("h1.sna_content_heading::text").extract_first().strip()
    vars["article_summary"] = response.css("span.article-summary").extract_first().strip()
    vars["article_content"] = [i.strip() for i in response.css("div.article-body p::text").extract()]
    vars["tags"] = [i.strip() for i in response.css("div.article-tags h2.tags::text").extract()]
    yield vars
I tried adding encode("utf-8"), but I am still not getting the right format:
vars["title"] = ...extract_first().strip().encode("utf-8")
I am getting something like this:
b'\xd8\xa8\xd8\xb1\xd9\x82\xd9\x85 "\xd9\x85\xd8\xb0\xd9\x87'
b'\xd9\x84".. \xd8\xa8\xd9\x86\xd8\xb2\xd9\x8a\xd9\x85\xd8\xa9 \xd9'
b'\x8a\xd8\xaa\xd9\x81\xd9\x88\xd9\x82 \xd8\xb9\xd9\x84\xd9\x89'
b' \xd9\x85\xd9\x8a\xd8\xb3\xd9\x8a \xd9\x88\xd8\xb1\xd9\x88'
b'\xd9\x86\xd8\xa7\xd9\x84\xd8\xaf\xd9\x88 \xd9\x88\xd8\xb5\xd9'
b'\x84\xd8\xa7\xd8\xad'
If I take the data you report in your question, and assign it to a variable, like this:
>>> a = (b'\xd8\xa8\xd8\xb1\xd9\x82\xd9\x85 "\xd9\x85\xd8\xb0\xd9\x87'
...      b'\xd9\x84".. \xd8\xa8\xd9\x86\xd8\xb2\xd9\x8a\xd9\x85\xd8\xa9 \xd9'
...      b'\x8a\xd8\xaa\xd9\x81\xd9\x88\xd9\x82 \xd8\xb9\xd9\x84\xd9\x89'
...      b' \xd9\x85\xd9\x8a\xd8\xb3\xd9\x8a \xd9\x88\xd8\xb1\xd9\x88'
...      b'\xd9\x86\xd8\xa7\xd9\x84\xd8\xaf\xd9\x88 \xd9\x88\xd8\xb5\xd9'
...      b'\x84\xd8\xa7\xd8\xad')
and I then decode these bytes on the (reasonable) assumption that they are UTF-8:
>>> a.decode()
'برقم "مذهل".. بنزيمة يتفوق على ميسي ورونالدو وصلاح'
It seems to me that you are getting back the data you expect, just not in the form you expect: encode("utf-8") turns the text into bytes, and decoding those bytes gives the readable Arabic string again.
Since @gallaecio asked me to write an answer to my own question,
here is what I did:
1- Open an empty Excel sheet
2- Go to Data
3- Choose From Text/CSV
4- Under File Origin, change it from 1252: Western European (Windows) to 65001: Unicode (UTF-8); now I can read the Arabic text!
5- Load!
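An alternative to changing the file origin by hand is to write the CSV with a UTF-8 byte order mark, which Excel generally uses to detect UTF-8. The sketch below uses Python's csv module rather than Scrapy's exporter, and the file name and row are made up for illustration.

```python
import csv
import os
import tempfile

rows = [["title"], ['برقم "مذهل".. بنزيمة يتفوق على ميسي ورونالدو وصلاح']]

path = os.path.join(tempfile.mkdtemp(), "articles.csv")

# "utf-8-sig" writes a BOM (EF BB BF) before the UTF-8 data.
with open(path, "w", encoding="utf-8-sig", newline="") as f:
    csv.writer(f).writerows(rows)

with open(path, "rb") as f:
    raw = f.read()

has_bom = raw.startswith(b"\xef\xbb\xbf")

# Reading back with the same codec strips the BOM again.
with open(path, "r", encoding="utf-8-sig", newline="") as f:
    read_back = list(csv.reader(f))
```

Scrapy has a FEED_EXPORT_ENCODING setting for exported feeds; whether setting it to "utf-8-sig" gives the same result in this project is an assumption that has not been checked here.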

Python - Decode Text (German) from Hexadecimal Value and write it to a file

I know there are a lot of encoding/decoding topics here and I tried it for hours, but I'm still not able to solve it. Hence I want to raise a question:
I have a string with hexadecimal values, which in the end is a text that I want to write to a text file with the correct encoding.
hexvalues = "476572E2A47465646574656B746F72"
In the end, the (German) result should be "Gerätedetektor".
Currently I am using binascii.unhexlify() for the decoding, but it still doesn't show me the "ä" like it's supposed to, and instead I get:
>> result = binascii.unhexlify(hexvalues)
Gerâ¤tedetektor
I tried to do result.decode("utf-8") and a lot of other things but either the script crashes or it also doesn't return what I want to see.
In the end, I want to write the word in the correct way to a file.
Any help would be highly appreciated!
Edit:
As I wrote before, I tried many things so it is kind of hard to give the ONE code that I'm using but here is an excerpt from the current version:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import binascii
resultfile = "text_GER.txt"
fpx = open(resultfile, 'wb')
hexvalues = "476572E2A47465646574656B746F72"
result= binascii.unhexlify(hexvalues )
result= result.decode("utf-8")
print(result)
fpx.write(result)
This one makes the script crash, with no further indication of why.
If I skip
result= result.decode("utf-8")
then the printed result looks like this:
b'Ger\xe2\xa4tedetektor'
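A quick look at the bytes may explain the crash. In the sketch below (Python 3 assumed, variable names are illustrative), the given hex string turns out not to be valid UTF-8: "ä" encodes to C3 A4 in UTF-8, whereas the string contains E2 A4. Whether the source data is damaged or uses some other encoding cannot be determined from the question, so the corrected hex value here is derived from the expected word and is only an assumption.

```python
import binascii
import os
import tempfile

hexvalues = "476572E2A47465646574656B746F72"
raw = binascii.unhexlify(hexvalues)

# The bytes E2 A4 74 do not form a valid UTF-8 sequence, so decoding fails.
try:
    raw.decode("utf-8")
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

# Hex that the expected word would have if it were UTF-8 encoded.
expected_hex = binascii.hexlify("Gerätedetektor".encode("utf-8")).decode("ascii").upper()

# With valid UTF-8 input, decode to str and write in text mode with an explicit encoding.
text = binascii.unhexlify(expected_hex).decode("utf-8")
path = os.path.join(tempfile.mkdtemp(), "text_GER.txt")
with open(path, "w", encoding="utf-8") as fpx:
    fpx.write(text)
```

Separately, opening the file with 'wb' and then writing a str raises a TypeError in Python 3, so the excerpt would fail at the write even with valid input.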

String comparison does not work in python

I'm writing a script that works with tesseract-ocr. I get text from the screen and then I need to compare it with a string. The problem is that the comparison fails even though I'm sure that the strings are the same.
How can I make my code work?
Here is my code:
import pyscreenshot as pss
import time
from pytesser import image_to_string
buy = str("VENDI")
buyNow = str("VENDI ADESSO")
if __name__ == '__main__':
    while 1:
        c = 0
        time.sleep(2)
        # Grab the screen region that contains the button text.
        image = pss.grab(bbox=(1104, 422, (1104+206), (422+30)))
        text = str(image_to_string(image))
        print text
        if text == buy or text == buyNow:
            print 'ok'
For example as input:
And as output I get:
VENDI ADESSO
This is the same string I need to compare against, but during execution I don't get ok on the console.
As it turns out, your string has new-lines (\n\n) at the end.
You can use
text = text.strip()
to remove any surrounding whitespace from your string.
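A small sketch of the effect, written for Python 3 and using a hard-coded string in place of real OCR output (the trailing newlines are simulated, and the variable names are illustrative):

```python
buy = "VENDI"
buy_now = "VENDI ADESSO"

# Simulated OCR output with trailing newlines (an assumption for this demo).
ocr_text = "VENDI ADESSO\n\n"

# repr() makes invisible characters such as \n visible when debugging.
debug_view = repr(ocr_text)

matches_raw = ocr_text in (buy, buy_now)
matches_stripped = ocr_text.strip() in (buy, buy_now)
print(debug_view, matches_raw, matches_stripped)
```

Printing repr(text) instead of text is a quick way to spot this kind of problem, since a plain print shows the trailing newlines only as blank lines.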

pygame image to base64

I capture the screen of my pygame program like this:
data = pygame.image.tostring(pygame.display.get_surface(),"RGB")
How can I convert it into a base64 string WITHOUT having to save it to the HDD? It's important that nothing is saved to the HDD. I know I can save it to a file and then encode the file to base64, but I can't seem to encode "on the fly".
thanks
If you want, you can save it to a StringIO, which is basically a virtual file stored as a string.
However, I'd really recommend using the base64 module, which has a method called base64.b64encode. It handles your 'on the fly' requirement well.
Code example:
import base64
data = pygame.image.tostring(pygame.display.get_surface(),"RGB")
base64data = base64.b64encode(data)
Happy coding!
Actually, pygame.image.tostring() is a pretty strange function (I really don't understand the binary string it returns, and I can't find anything that can process it correctly).
There seems to be an enhancement issue on this at the pygame Bitbucket:
(https://bitbucket.org/pygame/pygame/issue/48/add-optional-format-argument-to)
I got around it like this:
import base64
import cStringIO  # Python 2 module
import pygame
# Save the surface into an in-memory file object instead of a file on disk.
data = cStringIO.StringIO()
pygame.image.save(pygame.display.get_surface(), data)
data = base64.b64encode(data.getvalue())
So in the end you get a valid and correct base64 string, and it seems to work. I'm not sure about the format yet, though; I will add more info tomorrow.
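For context, the string returned by pygame.image.tostring() is a raw pixel buffer with no file header, which is likely why generic image tools do not recognise it. The sketch below does not use pygame: a hand-built 2x2 RGB buffer stands in for the captured surface, and all names are illustrative. It shows that base64-encoding such a buffer works entirely in memory, but that the receiver needs the width and height to interpret it.

```python
import base64

width, height = 2, 2
# Raw RGB data: 3 bytes per pixel, row by row (red, green, blue, white).
raw_rgb = bytes([255, 0, 0,   0, 255, 0,
                 0, 0, 255,   255, 255, 255])
assert len(raw_rgb) == width * height * 3

encoded = base64.b64encode(raw_rgb)      # bytes containing only ASCII characters
encoded_text = encoded.decode("ascii")   # str, e.g. for embedding in JSON

# Decoding restores the exact pixel buffer.
decoded = base64.b64decode(encoded_text)
first_pixel = tuple(decoded[0:3])
```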

Python - writing lines from file into IRC buffer

Ok, so I am trying to write a Python script for XCHAT that will allow me to type "/hookcommand filename" and then will print that file line by line into my irc buffer.
EDIT: Here is what I have now
__module_name__ = "scroll.py"
__module_version__ = "1.0"
__module_description__ = "script to scroll contents of txt file on irc"
import xchat, random, os, glob, string
def gg(ascii):
    # Raw string, so that \a in the path is not read as an escape sequence.
    ascii = glob.glob(r"F:\irc\as\*.txt")
    for textfile in ascii:
        f = open(textfile, 'r')
def gg_cb(word, word_eol, userdata):
    ascii = gg(word[0])
    xchat.command("msg %s %s"%(xchat.get_info('channel'), ascii))
    return xchat.EAT_ALL
xchat.hook_command("gg", gg_cb, help="/gg filename to use")
Well, your first problem is that you're referring to a variable ascii before you define it:
ascii = gg(ascii)
Try making that:
ascii = gg(word[0])
Next, you're opening each file returned by glob... only to do absolutely nothing with them. I'm not going to give you the code for this: please try to work out what it's doing or not doing for yourself. One tip: the xchat interface is an extra complication. Try to get it working in plain Python first, then connect it to xchat.
There may well be other problems - I don't know the xchat api.
When you say "not working", try to specify exactly how it's not working. Is there an error message? Does it do the wrong thing? What have you tried?
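As a starting point for the plain-Python step suggested above, the sketch below reads a text file and returns its lines; it deliberately leaves the xchat part out. The function name read_lines and the temporary file are made up for illustration.

```python
import os
import tempfile


def read_lines(path):
    """Hypothetical helper: return the lines of a text file without trailing newlines."""
    with open(path, "r") as f:
        return [line.rstrip("\n") for line in f]


# Demo with a temporary file standing in for one of the .txt files.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("first line\nsecond line\n")

lines = read_lines(path)
for line in lines:
    print(line)  # in the xchat script, each line would be sent to the channel instead
```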
