Base64 encoding of a file in Python

I want to use the function base64.encode() to directly encode the contents of a file in Python.
The documentation states:
base64.encode(input, output)
Encode the contents of the binary input file and write the resulting base64 encoded data to the output file. input and output must be file objects.
So I do this:
base64.encode(open('existing_file.dat','r'), open('output.dat','w'))
and get the following error:
>>> import base64
>>> base64.encode(open('existing_file.dat','r'), open('output.dat','w'))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.6/base64.py", line 502, in encode
line = binascii.b2a_base64(s)
TypeError: a bytes-like object is required, not 'str'
To my eye that looks like a bug in /usr/lib/python3.6/base64.py but a big part of me does not want to believe that...

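From the docs:
When opening a binary file, you should append 'b' to the mode value to open the file in binary mode.
So changing
base64.encode(open('existing_file.dat','r'), open('output.dat','w'))
to
base64.encode(open('existing_file.dat','rb'), open('output.dat','wb'))
should work. In text mode, read() returns str rather than bytes, and binascii.b2a_base64 only accepts bytes-like objects, which is what the TypeError is telling you. It is not a bug in base64.py.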
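A minimal sketch of the same fix using with blocks, so both files are closed even if encoding fails. The helper name encode_file and the temporary files are made up for the demo; only base64.encode and the 'rb'/'wb' modes come from the answer above.

```python
import base64
import os
import tempfile


def encode_file(src_path, dst_path):
    """Base64-encode src_path into dst_path (hypothetical helper)."""
    # Both files must be opened in binary mode for base64.encode().
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        base64.encode(src, dst)


# Demo with throwaway files in a temporary directory.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "existing_file.dat")
dst_path = os.path.join(workdir, "output.dat")

with open(src_path, "wb") as f:
    f.write(b"\x00\x01binary data\xff")

encode_file(src_path, dst_path)

with open(dst_path, "rb") as f:
    encoded = f.read()

print(encoded)
```

Decoding the output file with base64.b64decode should give back the original bytes.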

Related

Error when reading a SPSS file that is in Spanish with Pandas (Python)

Good morning!
I am trying to work with a SPSS file (.sav) in Python.
This is my code:
import pandas as pd
df=pd.read_spss('C:/Users/bonif/Documents/CSALUD01.sav')
df.head()
I get this error:
df=pd.read_spss('C:/Users/bonif/Documents/CSALUD01.sav')
File "C:\Users\bonif\AppData\Local\Programs\Python\Python39\lib\site-packages\pandas\io\spss.py", line 44, in read_spss
df, _ = pyreadstat.read_sav(
File "pyreadstat\pyreadstat.pyx", line 342, in pyreadstat.pyreadstat.read_sav
File "pyreadstat\_readstat_parser.pyx", line 1034, in pyreadstat._readstat_parser.run_conversion
File "pyreadstat\_readstat_parser.pyx", line 845, in pyreadstat._readstat_parser.run_readstat_parser
File "pyreadstat\_readstat_parser.pyx", line 775, in pyreadstat._readstat_parser.check_exit_status
pyreadstat._readstat_parser.ReadstatError: Unable to convert string to the requested encoding (invalid byte sequence)
I figured that the error may be caused by words containing the letter "ñ", or maybe by words with accented characters such as "á". How can I solve this?
The database is in this Google Drive folder: https://drive.google.com/drive/folders/1P8v5NWE-GdAEJRZdmrp5KiL-DODClmfU?usp=sharing
Thank you so much!
As ti7 suggests, use pyreadstat directly and specify the encoding. In this case latin1 does the trick:
>>> import pyreadstat
# This raises an error
>>> df, meta = pyreadstat.read_sav("CSALUD01.sav")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "pyreadstat/pyreadstat.pyx", line 342, in pyreadstat.pyreadstat.read_sav
File "pyreadstat/_readstat_parser.pyx", line 1034, in pyreadstat._readstat_parser.run_conversion
File "pyreadstat/_readstat_parser.pyx", line 845, in pyreadstat._readstat_parser.run_readstat_parser
File "pyreadstat/_readstat_parser.pyx", line 775, in pyreadstat._readstat_parser.check_exit_status
pyreadstat._readstat_parser.ReadstatError: Unable to convert string to the requested encoding (invalid byte sequence)
# This is fine
>>> df, meta = pyreadstat.read_sav("CSALUD01.sav", encoding="latin1")
>>>
Pandas calls pyreadstat to read SPSS files.
You may have more luck using it directly, since it has an option to set the encoding.
From the docs: https://github.com/Roche/pyreadstat#other-options
You can set the encoding of the original file manually. The encoding must be a iconv-compatible encoding. This is absolutely necessary if you are handling old xport files with non-ascii characters. Those files do not have stamped the encoding in the file itself, therefore the encoding must be set manually.
import pyreadstat
df, meta = pyreadstat.read_sav(path, encoding=my_encoding)
It could also be that you simply don't have iconv installed (pyreadstat relies on it for encodings), but I doubt it; you would get some other error.
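To see why picking the encoding matters, here is a small standard-library illustration. It assumes, as the latin1 answer suggests, that the file stores "ñ" as a single Latin-1 byte; it does not touch pyreadstat itself, and the sample word is made up.

```python
# "año" stored as Latin-1: the ñ is the single byte 0xF1.
raw = "año".encode("latin1")

# A decoder that expects UTF-8 rejects that byte sequence...
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# ...while decoding with the encoding the data was written in works.
as_latin1 = raw.decode("latin1")

print(raw, utf8_ok, as_latin1)
```

This is the same kind of "invalid byte sequence" failure, just reproduced on three bytes instead of a whole .sav file.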

Removing certain strings from base64 output python

Here is my code.
import base64
encoded = base64.b64encode(b"data to be encoded")
print(encoded)
print(encoded.replace("b", ""))
Here is my output
b'ZGF0YSB0byBiZSBlbmNvZGVk'
Traceback (most recent call last):
File "C:\Users\user\Desktop\base64_obfuscation.py", line 8, in <module>
print(decoded.replace("b", ""))
TypeError: a bytes-like object is required, not 'str'
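My overall task is to remove the single quotes and the "b" character from the output, but I'm unsure how to do so.
b64encode returns a bytes object, and the b'...' is only how bytes are displayed, not part of the data. Calling str(encoded).replace("b", "") would also strip every "b" inside the encoded text and leave the quotes in place. Decode the bytes to a str instead:
print(encoded.decode("ascii"))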
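A short round trip showing that decoding to str loses nothing, using the same input as the question. The variable names text and roundtrip are just for illustration.

```python
import base64

encoded = base64.b64encode(b"data to be encoded")  # bytes object

# Base64 output is pure ASCII, so decoding it to str is safe.
text = encoded.decode("ascii")
print(text)

# The str can be passed straight back to b64decode.
roundtrip = base64.b64decode(text)
print(roundtrip)
```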

Trying to decode internationalized url

I currently have a list of domains, some of which are internationalized.
For example, one ends with xn--nqv7f.com, but I want it to display as 机构.com.
I've tried encoding it to ASCII and UTF-8, but I can't get the console or my website to print it like this. I'm using Python 3.5.
'xn--nqv7f.com'.decode('utf-8')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'
And when I try this, I only get the same name back as bytes:
'xn--nqv7f.com'.encode("idna")
b'xn--nqv7f.com'
I had to encode and then decode:
'xn--nqv7f.com'.encode("idna").decode('idna')
'机构.com'
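The encode-then-decode step can be wrapped in two small helpers. The function names are hypothetical, and note that Python's built-in idna codec follows the older IDNA 2003 rules, so treat this as a sketch rather than a full IDN implementation.

```python
def to_unicode(domain):
    """Turn a punycode (xn--) domain into its Unicode form."""
    # str has no .decode() in Python 3, so go through bytes first.
    return domain.encode("ascii").decode("idna")


def to_punycode(domain):
    """Turn a Unicode domain into its ASCII (xn--) form."""
    return domain.encode("idna").decode("ascii")


print(to_unicode("xn--nqv7f.com"))
print(to_punycode("机构.com"))
```

Plain ASCII domains pass through both helpers unchanged.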

TypeError: Non-Hexadecimal digit found when trying to convert from hex to string

I have a hex-encoded file that I'm trying to decode, but I keep getting a TypeError. I have only been using Python on and off for a few weeks, so I apologize if this seems like a basic question.
The file contents are as follows:
4647525137454353554e54544b5831375a42524d345742473246563639554e4a36495a3359304f35394843554637564d4d464f32354143574f495a4f4a4a565849373259544f46335a4358494b424e335047545a51534b47465259475956584d44594f473536494553373653455932574b33574431435a314d35545957594d4e57434444344948324d375858544f4c564f31444a45304947394c32375a584f4845535a534f43353859594c55594e4239363759393738313557475859345a474448434e4f5a5744544d696c6c656e69756d323030303a3035303233626566343737386639343461626439346334653364623062326166
Here is the code I ran:
"received_files/documents/cache/OCAGS0WFYO57JVFGUI4Z437.txt".decode("hex")
This is what I got back:
Traceback (most recent call last):
File "converter.py", line 1, in <module>
"received_files/documents/cache/OCAGS0WFYO57JVFGUI4Z437.txt".decode("hex")
File "/usr/lib/python2.7/encodings/hex_codec.py", line 42, in hex_decode
output = binascii.a2b_hex(input)
TypeError: Non-hexadecimal digit found
You're giving it a filename rather than the contents of that file:
"received_files/documents/cache/OCAGS0WFYO57JVFGUI4Z437.txt".decode("hex")
Try this:
open("received_files/documents/cache/OCAGS0WFYO57JVFGUI4Z437.txt").read().decode("hex")
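The line above is for Python 2.7, as in the traceback. On Python 3 there is no "hex" codec on str; a rough equivalent, assuming the file holds only hex digits plus maybe a trailing newline, uses bytes.fromhex. The helper name and the short sample file are made up for the demo.

```python
import os
import tempfile


def read_hex_file(path):
    """Read a text file of hex digits and return the decoded bytes."""
    with open(path, "r") as f:
        hex_text = f.read().strip()  # drop a trailing newline, if any
    return bytes.fromhex(hex_text)


# Demo: a tiny stand-in for the real file (first bytes of the question's data).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as f:
    f.write("4647525137\n")

decoded = read_hex_file(path)
os.remove(path)

print(decoded)
```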

where is the call to encode the string or force the string to need to be encoded in this file?

I know this may seem rude or impolite, but I need some help figuring out why I can't call window.loadPvmFile("f:\games\#DD.ATC3.Root\common\models\a300\amu\dummy.pvm") with the path written exactly like that as a string. When I do, I get this traceback:
Traceback (most recent call last):
File "F:\Python Apps\pvmViewer_v1_1.py", line 415, in <module>
window.loadPvmFile("f:\games\#DD.ATC3.Root\common\models\a300\amu\dummy.pvm")
File "F:\Python Apps\pvmViewer_v1_1.py", line 392, in loadPvmFile
file1 = open(path, "rb")
IOError: [Errno 22] invalid mode ('rb') or filename:
'f:\\games\\#DD.ATC3.Root\\common\\models\x07300\x07mu\\dummy.pvm'
Also notice that the file path in the traceback is different from the one I passed in. When I try a path that has no letters in it except for the drive letter and filename, I get this error instead:
Traceback (most recent call last):
File "F:\Python Apps\pvmViewer_v1_1.py", line 416, in <module>
loadPvmFile('f:\0\0\dummy.pvm')
File "F:\Python Apps\pvmViewer_v1_1.py", line 393, in loadPvmFile
file1 = open(path, "r")
TypeError: file() argument 1 must be encoded string without NULL bytes, not str
I have searched for the place where the encode function is called or where the argument is encoded, and I can't find it. I am out of ideas, frustrated, and have nowhere else to go. The source code can be found here: PVM VIEWER
Also note that you will not be able to run this code and load a pvm file, and that I am using Portable Python 2.7.3. Thanks for everyone's time and effort!
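\a and \0 are escape sequences (BEL and NUL), which is why the path in the traceback contains \x07 and why the second call complains about NULL bytes. Nothing is calling encode. Use r'' (or R'') around the string to mark it as a raw string:
window.loadPvmFile(r"f:\games\#DD.ATC3.Root\common\models\a300\amu\dummy.pvm")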
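A quick way to see the difference, using just a fragment of the path from the question (the variable names are for illustration only):

```python
# In a normal string literal, \a is one character (BEL, 0x07).
escaped = "models\a300\amu"

# In a raw string literal, the backslash and the letter are both kept.
raw = r"models\a300\amu"

# \0 is likewise a single NUL character in a normal string literal.
nul = "\0"

print(repr(escaped))
print(repr(raw))
print(repr(nul))
```

Doubling the backslashes ("models\\a300\\amu") or using forward slashes in the path are other common ways around the same problem.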
