Dump (pickle) in new line Python

So I want to write each element of a list on a new line in a binary file using pickle, and I want to be able to access these dictionaries later as well.
import pickle
with open(r'student.dat','w+b') as file:
    for i in [{1:11},{2:22},{3:33},{4:44}]:
        pickle.dump(i,file)
    file.seek(0)
    print(pickle.load(file))
Output:
{1: 11}
Could someone explain why the rest of the elements aren't being dumped, or suggest another way to write on a new line?
I'm using Python 3

They're all being dumped, but each dump is separate; to load them all, you need to match each dump with its own load call. Change:
import pickle
with open(r'student.dat','w+b') as file:
    for i in [{1:11},{2:22},{3:33},{4:44}]:
        pickle.dump(i,file)
    file.seek(0)
    print(pickle.load(file))
to:
import pickle
with open(r'student.dat','w+b') as file:
    for i in [{1:11},{2:22},{3:33},{4:44}]:
        pickle.dump(i,file)
    file.seek(0)
    for _ in range(4):
        print(pickle.load(file))
If you don't want to perform multiple loads, pickle them as a single data structure (e.g. the original list of dicts all at once).
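For instance, a minimal sketch of that single-structure approach, reusing the question's data:

import pickle

records = [{1: 11}, {2: 22}, {3: 33}, {4: 44}]
with open('student.dat', 'w+b') as file:
    pickle.dump(records, file)    # one dump writes the whole list
    file.seek(0)
    print(pickle.load(file))      # one load reads it all back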
In none of these cases are you writing newlines, nor should you be: pickle is a binary protocol, so a newline is just another byte with its own meaning. Injecting extra newlines into the stream would interfere with loading the data, and reading a line at a time would risk splitting a record in two.

Related

Convert bytes to a file object in python

I have a small application that reads local files using:
open(diefile_path, 'r') as csv_file
open(diefile_path, 'r') as file
and also uses the linecache module.
I need to extend this to files sent from a remote server.
The content received from the server is of type bytes.
I couldn't find much information about handling the BytesIO type, and I was wondering whether there is a way to convert the bytes chunk into a file-like object.
My goal is to use the APIs specified above (open, linecache).
I was able to convert the bytes into a string using data.decode("utf-8"), but I can't use the methods above (open and linecache).
A small example to illustrate what I'm after:
data = b'First line\nSecond line\nThird line\n'
with open(data) as file:
    line = file.readline()
    print(line)
Output:
First line
Second line
Third line
can it be done?
open is used to open actual files, returning a file-like object. Here, you already have the data in memory, not in a file, so you can instantiate the file-like object directly.
import io

data = b'First line\nSecond line\nThird line\n'
file = io.StringIO(data.decode())
for line in file:
    print(line.strip())
However, if what you are getting is really just a newline-separated string, you can simply split it into a list directly.
lines = data.decode().strip().split('\n')
The main difference is that the StringIO version is slightly lazier: it has a smaller memory footprint than the list, because it splits off strings only as the iterator requests them.
The StringIO approach above needs to decode with a specific encoding, which may produce a wrong conversion. From the Python documentation, using BytesIO avoids that:
from io import BytesIO
f = BytesIO(b"some initial binary data: \x00\x01")

Is it possible to iterate over a serialized Python file?

I have a reasonably long array of events (each one being an array line), stored as numpy.ndarray with pickle.dump. Is there any way to efficiently iterate over a serialized object? The idea is very similar to iterating over a text file:
with open('my_file.txt', 'r') as FILE:
    for line in FILE:
        do_something(line)
I'm just concerned about not loading the entire object into memory at once.
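This excerpt carries no answer, but the usual pattern is worth sketching: if each event was written with its own pickle.dump call, you can load objects back one at a time until the stream is exhausted. (A single dump of one big ndarray can only be loaded whole; for that case, numpy.save plus numpy.load with mmap_mode='r' is the lazy option.) A sketch, with 'my_file.pkl' as a placeholder filename:

import pickle

def iter_pickled(path):
    # Yield one unpickled object at a time; assumes the file holds a
    # sequence of separate pickle.dump records, not one monolithic dump.
    with open(path, 'rb') as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return

for event in iter_pickled('my_file.pkl'):
    do_something(event)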

Searching for and manipulating the content of a keyword in a huge file

I have a huge HTML file that I have converted to a text file. (The file is the Facebook home page's source.) Assume the text file has a specific keyword in some places. For example: "some_keyword: [bla bla]". How would I print all the different bla blas that follow some_keyword?
{id:"1126830890",name:"Hillary Clinton",firstName:"Hillary"}
Imagine there are 50 different names with this format in the page. How would I print all the names following "name:", considering the text is very large and the program crashes when you read() it all or try to search through its lines?
Sample File:
shortProfiles:{"100000094503825":{id:"100000094503825",name:"Bla blah",firstName:"Blah",vanity:"blah",thumbSrc:"https://scontent-lax3-1.xx.fbcdn.net/v/t1.0-1/c19.0.64.64/p64x64/10354686_10150004552801856_220367501106153455_n.jpg?oh=3b26bb13129d4f9a482d9c4115b9eeb2&oe=5883062B",uri:"https://www.facebook.com/blah",gender:2,i18nGender:16777216,type:"friend",is_friend:true,mThumbSrcSmall:null,mThumbSrcLarge:null,dir:null,searchTokens:["Bla"],alternateName:"",is_nonfriend_messenger_contact:false},"1347968857":
Based on your comment, since you are the person responsible for writing the data to the file, write the data in JSON format and read it back from the file using json.loads():
import json

with open('/path/to/your_file') as json_file:
    json_str = json_file.read()
json_data = json.loads(json_str)
for item in json_data:
    print(item['name'])
Explanation:
Let's say data is the variable storing
{id:"1126830890",name:"Hillary Clinton",firstName:"Hillary"}
which changes dynamically within the part of your code that writes to the file. Instead of writing each record directly, append it to a list:
a = []
for item in page_content:
    # data = some extraction logic on the HTML file
    a.append(data)
Now write this list to the file using json.dump():
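A minimal sketch of that final write step (the path is a placeholder):

import json

with open('/path/to/your_file', 'w') as f:
    json.dump(a, f)   # 'a' is the list built in the loop above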
I just wanted to throw this out there, even though I agree with all the comments about dealing with the HTML directly or using Facebook's API (probably the safest way). Open file objects in Python can be used as generators that yield lines without reading the entire file into memory, and the re module can be used to extract information from text.
This can be done like so:
import re

regex = re.compile(r"(?:some_keyword:\s\[)(.*?)\]")
with open("filename.txt", "r") as fp:
    for line in fp:
        for match in regex.findall(line):
            print(match)
Of course this only works if the file is in a "line-based" format, but the end effect is that only the line you are on is loaded into memory at any one time.
The re module is documented in both the Python 2 and Python 3 standard library references.
I cannot find documentation that details the generator capabilities of file objects in Python; it seems to be one of those well-known secrets. Please feel free to edit and remove this paragraph if you know where in the Python docs this is detailed.

Python: Text Replacement In Large Files

I'm trying to insert text at very specific locations in a text file. This text file can be fairly large (>> 10 GB)
The approach I am currently using to read it:
with open("my_text_file.txt") as f:
while True:
result = f.read(set_number_of_bytes)
x = process_result(result)
if x:
replace_some_characters_that_i_just_read_and write_it_back_to_same_file
However, I am unsure as to how to implement
replace_some_characters_that_i_just_read_and write_it_back_to_same_file
Is there some method I can use to determine where I have read up to in the current file, which I could then use to write back to the file?
Performance-wise, if I were to use the approach above to write to the original file at specific locations, would there be efficiency issues in having to find the write location before writing?
Or would you recommend creating an entirely different file, appending to that file on each loop above, then deleting the original file after the operation is completed? Assume space is not a large concern but performance is.
Use the fileinput module, which handles files correctly when replacing data, with the inplace flag set:
import sys
import fileinput

for line in fileinput.input('my_text_file.txt', inplace=True):
    x = process_result(line)
    if x:
        line = line.replace('something', x)
    sys.stdout.write(line)
When you use the inplace flag, the original file is moved to a backup, and anything you write to sys.stdout is written to the original filename (so, as a new file). Make sure you include all lines, altered or not.
You have to rewrite the complete file whenever your replacement data is not exactly the same number of bytes as the parts that you are replacing.
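If the replacement really is byte-for-byte the same length, you can overwrite in place with tell() and seek(); a sketch under that assumption (it also ignores matches that straddle chunk boundaries):

with open("my_text_file.txt", "r+b") as f:
    while True:
        pos = f.tell()                           # start of this chunk
        chunk = f.read(4096)
        if not chunk:
            break
        fixed = chunk.replace(b"old!", b"new!")  # same length on both sides
        if fixed != chunk:
            f.seek(pos)                          # jump back and overwrite
            f.write(fixed)                       # position ends up where the read left it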

Pickle problem writing to file

I have a problem writing a file with pickle in Python.
Here is my code:
test = "TEST"
f1 = open(path+filename, "wb", 0)
pickle.dump(test,f1,0)
f1.close()
return
This gives me the output in the .txt file as VTESTp0, and I'm not sure why. Shouldn't it just have been saved as TEST?
I'm very new to pickle and I didn't even know it existed until today so sorry if I'm asking a silly question.
No, pickle does not write strings just as strings. Pickle is a serialization protocol: it turns objects into strings of bytes so that you can later recreate them. The actual format depends on which version of the protocol you use, but you should really treat pickle data as an opaque type.
If you want to write the string "TEST" to the file, just write the string itself. Don't bother with pickle.
Think of pickling as saving binary data to disk. This is interesting if you have data structures in your program like a big dict or array, which took some time to create. You can save them to a file with pickle and read them in with pickle the next time your program runs, thus saving you the time it took to build the data structure. The downside is that other, non-Python programs will not be able to understand the pickle files.
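A minimal round trip illustrating that idea ('cache.pkl' is a placeholder filename):

import pickle

big = {i: i * i for i in range(100000)}   # stand-in for an expensive structure

with open('cache.pkl', 'wb') as f:
    pickle.dump(big, f)                   # save it once

with open('cache.pkl', 'rb') as f:
    restored = pickle.load(f)             # rebuild it on the next run

assert restored == big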
As pickle is quite versatile you can of course also write simple text strings to a pickle file. But if you want to process them further, e.g. in a text editor or by another program, you need to store them verbatim, as Thomas Wouters suggests:
test = "TEST"
f1 = open(path+filename, "wb", 0)
f1.write(test)
f1.close()
return
