Strange behaviour when converting base64 string to png in Python

Hello, I'm new to the concept of base64 images. I was trying to convert base64 "links" in an HTML page to PNG files in Python, but the generated PNG seems to be damaged and I don't know why. Here is my code (Python 3.6):
import base64
encoded = (string2[0].split(",")[1]).encode("utf-8")
with open(r"myDirectory\example1.png", "wb") as fh:
    fh.write(base64.decodebytes(encoded))
string2[0] is the full base64 string which I copied from the HTML, i.e. something like
data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAA0gAA...K5C%0AYII=
The problem is essentially the following: a PNG file is generated, but when I open it, Windows says "the file appears to be damaged, corrupted". Strangely, when I open the same base64 string in Google Chrome, the image is displayed.
Has anyone encountered a similar situation before?
P.S. I was thinking of providing the full base64 string, but it is very long. Does anyone know how to paste such a long string into the question, e.g. in a "draggable box of code" similar to what the OP has done in this question?
Edit: The base64 string can be found here. This is my first time sharing documents on Google Drive, so let me know if you can access it.
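One possible cause, judging only from the sample string above: it contains %0A, which is a percent-encoded newline. A browser undoes percent-encoding before decoding a data URI, whereas base64.decodebytes drops the "%" but treats "0" and "A" as real base64 characters, which would corrupt the output. A minimal sketch under that assumption (the helper name data_uri_to_bytes is made up for illustration):

```python
import base64
from urllib.parse import unquote


def data_uri_to_bytes(data_uri):
    """Decode a base64 data URI whose payload may be percent-encoded."""
    # Keep only the payload after the first comma:
    # "data:image/png;base64,<payload>"
    payload = data_uri.split(",", 1)[1]
    # Undo percent-encoding first, so "%0A" becomes a newline character.
    payload = unquote(payload)
    # By default b64decode discards characters outside the base64
    # alphabet, such as the newlines restored above.
    return base64.b64decode(payload)
```

The returned bytes could then be written with open(..., "wb") as in the question.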

Related

Problem loading text from searchable pdfs ("PSKeyword" error)

I have a problem extracting text with pdfplumber. The PDF is searchable, and other examples work fine, but one invoice cannot be loaded correctly. I get this error:
cannot convert 'PSKeyword' object to bytearray
From what I've noticed, this can be fixed by opening the file in any PDF program and saving it again.
Has anyone had a similar problem? I would like to apply this fix in Python somehow. Does anyone have any ideas?
Thanks,
Norbert.

Python decoding, base64, nbt, gzip? what is it?

I am trying to get information from a Minecraft API. From the API you can read players' inventories, but this is what it returns: here is a link to pastebin
I tried to run base64 decoding on it in Python, but it gave me output like this (only a few lines):
b'\xad\xa9\xc0d\x85\xe4\xe0\x87`\xcess\x00\x9b]e~c\xea\xaa\xb8\x9a\xa4\xdd\x958"\x8f\x0f\x10\xb9\xea\x9f2v\xdd\xcc#N\xe8x\xb4\xdd\x18\xa9\xee>\xcfM
I read a bit about it on their forums, and a few comments mentioned "base64, gzip, nbt".
Now, I haven't really worked with decoding before, and I am trying to understand what it all means.
Thanks

NBT is a Minecraft-specific format: Named Binary Tag.
So you get an NBT file that is compressed in the gzip format and then Base64 encoded.
After Base64 decoding, you need to decompress the gzip data to get the NBT.
There's also an NBT parser for Python.
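The two decoding steps described above can be sketched with the standard library alone. This is only an illustration: the function name decode_inventory is hypothetical, and parsing the resulting NBT bytes is left to whichever NBT parser you choose.

```python
import base64
import gzip


def decode_inventory(encoded):
    """Return raw NBT bytes from a base64-encoded, gzip-compressed string."""
    # Step 1: Base64 text -> gzip-compressed bytes.
    compressed = base64.b64decode(encoded)
    # Step 2: gzip-compressed bytes -> raw NBT bytes.
    return gzip.decompress(compressed)
```

The output is still binary NBT data, so it will not be human-readable until a parser turns it into tags.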

Does python have font face for strings?

I recently used the Google Vision API to extract text from a PDF. Now I am searching for a keyword in the response text (from the API). When I compare the given string with the found string, they do not match even though they appear to have the same characters.
The only reason I can see is that the given and found strings use different font types, which leads to different ASCII/UTF-8 codes for the characters in the strings. (I have never come across such a problem.)
How can I solve this? How can I bring these two strings to the same characters? I am using Jupyter Notebook, but I also pasted the comparison into a terminal and it still evaluates to False.
Here are the strings I am trying to match:
'КА Р5259' == 'KA P5259'
But they look the same on Stack Overflow so here's a screenshot:

Thanks, everyone, for your comments.
I found the solution and am posting it here in case it helps someone. It's correct that Python does not support font faces. So if one copies a font-faced character and pastes it into the Python console or a Jupyter notebook (which renders font faces because it uses HTML to display information), it is treated as a different Unicode character.
So the idea is to first bring the text response into a plain-text format, which I achieved by storing the response in a .txt file (or, more precisely, a .pkl file). I had to do this anyway to preserve the response objects for later data analysis. Once the response is stored in a plain-text file, you can read it back without the font-face problem I faced above.
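A quick way to check what two look-alike strings really contain is to print the Unicode name of each character. The sketch below assumes the found string begins with Cyrillic letters that resemble Latin K, A and P. That is an assumption about the strings above, and the helper name describe is made up for illustration.

```python
import unicodedata


def describe(text):
    """Return one 'U+XXXX NAME' entry per character in text."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in text]


# Assumption: the string from the API starts with Cyrillic letters,
# written here as escapes so they cannot be confused with Latin ones.
found = "\u041a\u0410 \u04205259"
given = "KA P5259"

print(found == given)
# Report every position where the two strings differ.
for a, b in zip(found, given):
    if a != b:
        print(describe(a)[0], "vs", describe(b)[0])
```

If the names differ (for example a Cyrillic letter against a Latin one), the strings contain different code points regardless of how they are rendered.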

Can't enter txt file contents as query string using python

I can’t get python to open a link that uses the contents of a .txt file as a query string. I’m working on Python 3.7.0 and was able to write code that opens the website and checks a string that I’ve input directly, as well as open my text file and print the contents, but when I try to make the text file’s contents a query it throws an error.
I added lines that print the link that I would need to open to make sure it comes out correctly and that works fine, I can copy and paste it into my browser and get a correct result.
Here's the code I used
And a screenshot of the error I get
I'm a total beginner at this so any suggestions or explanations would be lifesavers!

The error is with the string being passed to urlopen(). When it tries to open the link, you get an HTTP 400: Bad Request error, which means that something is wrong with the link you provided. The text possibly contains spaces, and you aren't escaping the characters properly. Here is the link which could help you.
Alternatively, you could use the Python Requests library.
(Please include the code in the question rather than a screenshot.)
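As an illustration of the escaping mentioned above, here is a minimal sketch using urllib.parse.urlencode from the standard library. The base URL, the parameter name "q" and the helper name build_url are placeholders, since the original code is not shown.

```python
from urllib.parse import urlencode


def build_url(base_url, text, param="q"):
    """Build a URL whose query string safely carries the given text."""
    # strip() removes the trailing newline that file contents often end with;
    # urlencode escapes spaces and other characters that are unsafe in a URL.
    query = urlencode({param: text.strip()})
    return f"{base_url}?{query}"
```

The text read from the .txt file would be passed as the second argument, and the result handed to urlopen().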

Check that the URL you’re requesting actually exists. Also, I’m not sure what your .txt file looks like, but re-examine the code (the .read() part) to make sure the data you want to add as a query is being handled correctly.

Uncompress and save zlib data in PDF with python

We get PDF files delivered to us daily and we need to get the images out. For example, I want to get the image back out of this PDF file I have, using Python. Most PDF files we get are multipage, and we want to export each embedded image to a separate file. Most have JPEG files in them, but this one does not.
Object 5 is embedded as a zlib compressed stream. I am pretty sure it is zlib compressed because it is marked as FlateDecode and the start of the stream is \x78\x9c which is typical for zlib. You can see (part of) the hex dump here
The question is, how do I 'deflate' it and save the resulting file.
Thank you for sharing your wisdom.

I searched everywhere and tried many things but couldn't get it to work. I managed to decompress the data like this:
import zlib
with open("MDL1703140088.pdf", "rb") as f:
    pdf = f.read()
image = zlib.decompress(pdf[640:69307])
640 is the position of the zlib header (b'x\x9c') and 69307 is the position of what looks like the stream footer from the PDF spec: b'\nendstream\n' is there. Details are in the spec, and some helpful Q&A can be found here. Omitting the end position is allowed in this case because decompress() seems to ignore trailing non-compressed data. You can validate this with:
decomp = zlib.decompressobj()
image = decomp.decompress(pdf[640:])
print(decomp.unused_data)  # starts with b'\nendstream\n'
So far so good. But when I write image to a PNG file, no image viewer can read it. The decompressed data actually looks quite empty here and there. I tried prepending a PNG header, but no luck. Hey, it's too much...
As I said earlier (strangely, my comment was removed by someone), you'd be better off using an existing tool. If Acrobat is not an option, what about pdftopng (part of Xpdf)? pdftopng MDL1703140088.pdf . gave me a valid PNG file flawlessly. Command-line tools can of course be executed from Python, as you may know.
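The decompression step described above can be tried on synthetic data. This is a sketch only: fake_pdf is a made-up byte string, not a real PDF, and inflate_stream is a hypothetical helper. Note also that the inflated bytes of a FlateDecode image stream are typically raw pixel samples rather than a complete PNG file, which would explain why image viewers reject them.

```python
import zlib


def inflate_stream(data, start):
    """Inflate a zlib stream beginning at `start`; return (payload, leftover)."""
    decomp = zlib.decompressobj()
    payload = decomp.decompress(data[start:])
    # Bytes that follow the end of the compressed stream land in unused_data.
    return payload, decomp.unused_data


# Synthetic stand-in for a PDF stream object (an assumption, not a real file).
body = b"example image bytes " * 10
prefix = b"stream\n"
fake_pdf = prefix + zlib.compress(body) + b"\nendstream\n"

payload, leftover = inflate_stream(fake_pdf, len(prefix))
print(len(payload), leftover)
```

With a real file, the start offset would have to be located first, for example by searching for the stream keyword of the object of interest.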
