Python: pickle: No code suggestion after extracting string object from pickle file

for example, this is my code:
# extract the object from "lastringa.pickle" and save it
import pickle

extracted = ""
with open("lastringa.pickle", "rb") as f:
    extracted = pickle.load(f)
Where "lastringa.pickle" contains a string object with some text.
So if I type extracted. before the opening of the file, I'm able to get code suggestions, as shown in the picture:
But then, after the extracted = pickle.load(f) operation, if I type extracted. I don't get code suggestions anymore.
Can somebody explain why that is and how to solve it?

Pickle reads and writes objects as binary data. You can confirm this from the open('lastringa.pickle', 'rb') call, where you are using the rb option, i.e. read binary.
Your IDE doesn't know the type of the object that pickle is going to read, so it can't suggest the string methods (e.g. .split(), .strip()).
On the other hand, in the first photo, your IDE knows that extracted is a string, so it knows what to suggest.
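If you want the suggestions back after pickle.load, one option is to tell the IDE the type yourself. This is a minimal sketch, assuming Python 3.6+ so variable annotations are available; the annotation (and the isinstance alternative) are my additions, not part of the original code:

import pickle

with open("lastringa.pickle", "rb") as f:
    extracted: str = pickle.load(f)  # the annotation tells the IDE the expected type

# Alternatively, narrow the type at runtime:
# assert isinstance(extracted, str)
print(extracted.split())  # str method suggestions are available again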

Related

How can I input the image as byte data instead of string?

I'm new to python and was playing around with how to change my Instagram profile picture. The part I just can't get past is how I can put my image into the program. This is my code:
from instagram_private_api import Client, ClientCompatPatch
user_name = 'my_username'
password = 'my_password'
api = Client(user_name, password)
api.change_profile_picture('image.png')
Now, from what I read on the API Documentation, I can't just put in an image. It needs to be byte data. On the API documentation, the parameter is described like this:
photo_data – byte string of image
I converted the image on an encoding website and now I have the file image.txt with the byte data of the image. So I changed the last line to this:
api.change_profile_picture('image.txt')
But this still doesn't work. The program doesn't read it as byte data. I get the following error:
Exception has occurred: TypeError
a bytes-like object is required, not 'str'
What is the right way to put in the picture?
The error is telling you that "image.txt" (or "image.png") is a string, and it's always going to say that as long as you pass in a filename, because filenames are always strings. It doesn't matter what's in the file, because the API doesn't read the file.
It doesn't want the filename of the image, it wants the actual image data that's in that file. That's why the parameter is named photo_data and not photo_filename. So read it (in binary mode, so you get bytes rather than text) and pass that instead.
with open("image.png", "rb") as imgfile:
api.change_profile_picture(imgfile.read())
The with statement ensures that the file is closed after you're done with it.
If you have a .png or .jpeg (or similar) image file, then use this:
with open("image.png", "rb") as f:
    api.change_profile_picture(f.read())
And if you have a .txt file, then use this:
with open("image.txt", "rb") as f:
    api.change_profile_picture(f.read())
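One caveat, based on my reading of the question (the base64 assumption is mine, since an "encoding website" was used): if image.txt holds base64 text rather than the raw image bytes, reading it in binary mode still won't produce valid photo data; decode it first, along these lines:

import base64

with open("image.txt", "rb") as f:
    photo_data = base64.b64decode(f.read())  # turn the base64 text back into raw image bytes
api.change_profile_picture(photo_data)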

Python ijson - parse error: trailing garbage // bz2.decompress()

I have come across an error while parsing json with ijson.
Background:
I have a series (approx. 1000) of large files of Twitter data that are compressed in '.bz2' format. I need to get elements from the files into a pd.DataFrame for further analysis. I have identified the keys I need to extract. I am cautious about putting Twitter data up.
Attempt:
I have managed to decompress the files using bz2.decompress with the following code:
import bz2
import ijson

## Code in loop, specific to decompressing and parsing -
with open(file, 'rb') as source:
    # Decompress the file
    json_r = bz2.decompress(source.read())
    json_decom = json_r.decode('utf-8')  # decompresses one file at a time rather than a stream

# Parse the JSON with ijson
parser = ijson.parse(json_decom)
for prefix, event, value in parser:
    # Print selected items as part of testing
    if prefix == "created_at":
        print(value)
    if prefix == "text":
        print(value)
    if prefix == "user.id_str":
        print(value)
This gives the following error:
IncompleteJSONError: parse error: trailing garbage
estamp_ms":"1609466366680"} {"created_at":"Fri Jan 01 01:59
(right here) ------^
Two things:
Is my decompression method correct and giving the right type of input for ijson to parse (ijson takes both bytes and str)?
Is it a JSON error? // If it is a JSON error, is it possible to develop some kind of error handler to move on to the next file? If so, any suggestions would be appreciated.
Any assistance would be greatly appreciated.
Thank you, James
To directly answer your two questions:
The decompression method is correct in the sense that it yields JSON data that you then feed to ijson. As you point out, ijson works both with str and bytes inputs (although the latter is preferred); if you were giving ijson some non-JSON input you wouldn't see an error showing JSON data in it.
This is a very common error that is described in ijson's FAQ. It basically means your JSON document has more than one top-level value, which is not standard JSON, but is supported by ijson by using the multiple_values option (see docs for details).
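As a quick sketch of what that looks like (reusing the json_decom variable from the question's code):

import ijson

# multiple_values=True lets ijson accept several top-level JSON values
# (here, the concatenated tweet objects) in a single document.
parser = ijson.parse(json_decom, multiple_values=True)
for prefix, event, value in parser:
    ...  # handle prefixes as before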
About the code as a whole: while it's working correctly, it could be improved on: the whole point of using ijson is that you can avoid loading the full JSON contents in memory. The code you posted doesn't use this to its advantage though: it first opens the bz-compressed file, reads it as a whole, decompresses that as a whole, (unnecessarily) decodes that as a whole, and then gives the decoded data as input to ijson. If your input file is small, and the decompressed data is also small you won't see any impact, but if your files are big then you'll definitely start noticing it.
A better approach is to stream the data through all the operations so that everything happens incrementally: decompression, no decoding and JSON parsing. Something along the lines of:
import bz2
import ijson

with bz2.BZ2File(filename, mode='r') as f:
    for prefix, event, value in ijson.parse(f):
        # ...
As the cherry on the cake, if you want to build a DataFrame from that you can use DataFrame's data argument to build the DataFrame directly with the results from the above. data can be an iterable, so you can, for example, make the code above a generator and use it as data. Again, something along the lines of:
import bz2
import ijson
import pandas

def json_input():
    with bz2.BZ2File(filename, mode='r') as f:
        for prefix, event, value in ijson.parse(f):
            ...  # yield your results here

df = pandas.DataFrame(data=json_input())
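Putting both ideas together, here is one possible sketch (the filename and the end-of-object check are my additions for illustration; multiple_values=True deals with the concatenated tweet objects discussed above, and the prefixes are the ones printed in the question):

import bz2
import ijson
import pandas as pd

def tweet_rows(filename):
    row = {}
    with bz2.BZ2File(filename, mode='r') as f:
        for prefix, event, value in ijson.parse(f, multiple_values=True):
            if prefix in ('created_at', 'text', 'user.id_str'):
                row[prefix] = value
            elif (prefix, event) == ('', 'end_map'):  # one top-level tweet object finished
                yield row
                row = {}

df = pd.DataFrame(data=tweet_rows('tweets.json.bz2'))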

how to convert string into byte array in this particular scenario

I am following a tutorial that was designed for Python 2.x, using Python 3.5. I have made updates to the code in order to make it compatible, but I am falling at the last hurdle.
Python 2.x switches between text and binary as and when needed, whereas Python 3.x draws a clear line between the two.
I am trying to produce a CSV file for submission, but it requires that the data be in int32 format.
import csv

test_file = open('/Users/williamneal/Scratch/Titanic/test.csv', 'rt')
test_file_object = csv.reader(test_file)
header = test_file_object.__next__()
This opens a file object before making it a reader object. I modified the original code, 'wb' --> 'wt', to account for the fact that by default Python 3 returns a string.
prediction_file = open("/Users/williamneal/Scratch/Titanic/genderbasedmodel.csv", "wt")
prediction_file_object = csv.writer(prediction_file)
This opens a file object before making it a writer object. Like previously, I modified the mode.
prediction_file_object.writerow([bytearray(b"PassengerId"), bytearray(b"Survived")])
for row in test_file_object:
    if row[3] == 'female':
        prediction_file_object.writerow([row[0], int(1)])
    else:
        prediction_file_object.writerow([row[0], int(0)])
test_file.close()
prediction_file.close()
I changed the values written in the for-loop into integers, but as I attempt to cast the strings into binary I get the following error at the point of submission:
I am flummoxed; I do not see how I can submit my file, which needs the headers "PassengerId" and "Survived" and also needs to be of type int32.
Any suggestions?
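For what it's worth, a minimal sketch of the usual Python 3 approach (my assumption about what the submission checker accepts, not part of the tutorial): csv.writer expects ordinary strings, so the headers can be written as plain text and the survival values as integers, without any bytearray:

import csv

with open('/Users/williamneal/Scratch/Titanic/test.csv', 'rt', newline='') as test_file, \
     open('/Users/williamneal/Scratch/Titanic/genderbasedmodel.csv', 'wt', newline='') as prediction_file:
    reader = csv.reader(test_file)
    writer = csv.writer(prediction_file)
    next(reader)  # skip the header row of test.csv
    writer.writerow(["PassengerId", "Survived"])  # plain string headers
    for row in reader:
        writer.writerow([row[0], 1 if row[3] == 'female' else 0])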

Open a file for RawIOBase python

I need to read in the binary contents of a file for a function, and from this link https://docs.python.org/2/library/io.html, it looks like I should be using a RawIOBase object to read it in. But I can't find anywhere how to open a file to use with RawIOBase. Right now I have tried this to read the binary into a string:
with (open(documentFileName+".bin", "rb")) as binFile:
    document = binFile.RawIOBase.read()
    print document
but that throws the error AttributeError: 'file' object has no attribute 'RawIOBase'
So with no open attribute in RawIOBase, how do I open the file for it to read from?
Don't delve into the implementation details of the io thicket unless you need to code your own peculiar file-oid-like types! In your case,
with open(documentFileName+".bin", "rb") as binFile:
    document = binFile.read()
will be perfectly fine!
Note in passing that I've killed the superfluous parentheses you were using -- "no unneeded pixels!!!" -- but, while important!, that's a side issue to your goal here.
Now, assuming Python 2, document is a str -- an immutable array of bytes. It may be confusing that displaying document shows it as a string of characters, but that's just Py2's confusion between text and byte strings (in Py3, the returned type would be bytes).
If you prefer to work with (e.g.) a mutable array of ints, use e.g.
theints = map(ord, document)
or, for an immutable array of bytes that displays numerically,
import array
thearray = array.array('b', document)
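As a side note of mine (not part of the original answer): under Python 3 the same read() already returns bytes, so the conversions are simpler:

with open(documentFileName + ".bin", "rb") as binFile:
    document = binFile.read()   # bytes in Python 3

theints = list(document)        # iterating over bytes yields ints directly
thearray = bytearray(document)  # mutable sequence of bytes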

Cpickle invalid load key error with a weird key at the end

I just tried to update a program I wrote and I needed to add another pickle file. So I created the blank .pkl and then used this command to open it (just as I did with all my others):
with open('tryagain.pkl', 'r') as input:
    self.open_multi_clock = pickle.load(input)
Only this time around I keep getting this really weird error for no obvious reason:
cPickle.UnpicklingError: invalid load key, 'Γ'.
The pickle file does contain the necessary information to be loaded; it is an exact match to other blank .pkl's that I have, and they load fine. I don't know what that last key in the error is, but I suspect it could give me some insight if I knew what it meant.
So I have figured out the solution to this problem, and I thought I'd take the time to list some examples of what to do and what not to do when using pickle files. Firstly, the solution was simply to make a plain old .txt file and dump the pickle data to it.
If you are under the impression that you have to actually make a new file and save it with a .pkl ending, you would be wrong. I was creating my .pkl's with Notepad++ and saving them as .pkl's. From my experience this works sometimes and sometimes it doesn't; if you're semi-new to programming this may cause a fair amount of confusion, as it did for me. All that being said, I recommend just using plain old .txt files. It's the information stored inside the file, not necessarily the extension, that is important here.
# Notice the file hasn't been pickled.
# What not to do. No need to name the file .pkl yourself.
with open('tryagain.pkl', 'r') as input:
    self.open_multi_clock = pickle.load(input)
The proper way:
# Pickle your new file
with open(filename, 'wb') as output:
    pickle.dump(obj, output, -1)

# Now open with the original .txt ext. DON'T RENAME.
with open('tryagain.txt', 'r') as input:
    self.open_multi_clock = pickle.load(input)
Gonna guess the pickled data is throwing off portability because of the output characters. I'd suggest base64-encoding the pickled data before writing it to the file. Here's what I ran:
import base64
import pickle

value_p = pickle.dumps("abdfg")
value_p_b64 = base64.b64encode(value_p)

f = open("output.pkl", "w+")
f.write(value_p_b64)
f.close()

readable = ""
for line in open("output.pkl", 'r'):
    readable += pickle.loads(base64.b64decode(line))

>>> readable
'abdfg'
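A simpler alternative worth mentioning (my note, not from either answer above): if the pickle file is opened in binary mode for both writing and reading, binary pickle protocols load cleanly and no base64 step is needed:

import pickle

obj = {"example": 123}  # placeholder object, purely for illustration

with open('tryagain.pkl', 'wb') as output:
    pickle.dump(obj, output, -1)      # binary protocol written to a binary-mode file

with open('tryagain.pkl', 'rb') as input_file:
    loaded = pickle.load(input_file)  # read back with the matching mode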
