Loading an unknown amount of pickled objects in python - python

I have a small, simple movie registration app that lets a user register a new movie in the registry. It currently stores everything as pickled objects. Saving the objects is not a problem, but reading an unknown number of pickled objects back from the file seems more complicated, since I can't find any sequence of objects to iterate over when reading the file.
Is there any way to read an unknown number of pickled objects from a file in Python (preferably into a list rather than an unknown number of variables)?
Since the volume of data is so low, I don't see the need for a fancier storage solution than a simple file.
When trying to use a list with this code:
film = Film(title, description, length)
film_list.append(film)
open_file = open(file, "ab")
try:
    save_movies = pickle.dump(film_list, open_file)
except pickle.PickleError:
    print "Error: Could not save film to file."
it works fine, and when I load it I get a list back, but no matter how many movies I register I still get only one element in the list. len(film_list) reports 1, and the list holds only the first movie that was saved to the file. Looking at the file, it does contain the other movies that were added, but for some reason they are not included in the list.
I'm using this code for loading the movies:
open_file = open(file, "rb")
try:
    film_list = pickle.load(open_file)
    print type(film_list)  # displays a type of list
    print len(film_list)  # displays that only 1 element is in the list
    for film in film_list:  # only prints out one list item
        print film.name
except pickle.PickleError:
    print "Error: Unable to load one or more movies."

You can get an unknown number of pickled objects from a file by repeatedly calling pickle.load on the open file object until it raises EOFError.
>>> import string
>>> # make a sequence of stuff to pickle
>>> stuff = string.ascii_letters
>>> # iterate over the sequence, pickling one object at a time
>>> import pickle
>>> with open('foo.pkl', 'wb') as f:
...     for thing in stuff:
...         pickle.dump(thing, f)
...
>>>
>>> things = []
>>> f = open('foo.pkl', 'rb')
>>> # load the first two objects
>>> things.append(pickle.load(f))
>>> things.append(pickle.load(f))
>>> # get the remaining pickled items
>>> while True:
...     try:
...         things.append(pickle.load(f))
...     except EOFError:
...         break
...
>>> stuff
'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
>>> things
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
>>> f.close()
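In the question's code, every save appends another pickle to the file, while a single pickle.load call returns only the first one, which would explain the one-element list. The loop above can be wrapped in a small generator. This is only a sketch: the names append_object and load_all are made up for illustration, and it assumes the file contains nothing but pickles written back to back.

```python
import pickle


def append_object(path, obj):
    """Append one pickled object to the end of the file (hypothetical helper)."""
    with open(path, "ab") as f:
        pickle.dump(obj, f)


def load_all(path):
    """Yield every object that was pickled into the file, in the order written."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                # pickle.load raises EOFError once no data is left to read.
                return
```

With this, saving one film per call to append_object and later calling list(load_all(path)) should give back a list with one entry per saved film.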

Related

How do I stop Python from escaping the backslashes when reading text from a file?

I'm having trouble getting text from a file to be in the same format as the text in a string. Example:
>>> a = 'Hello\tWorld'
>>> list(a)
['H', 'e', 'l', 'l', 'o', '\t', 'W', 'o', 'r', 'l', 'd']
That's fine.
Now when I read the same characters from a file, I get a different result...
with open('file.txt', 'r') as f:
    a = f.read()
>>> list(a)
['H', 'e', 'l', 'l', 'o', '\\', 't', 'W', 'o', 'r', 'l', 'd']
The tab is gone. Now I have an escaped backslash and a t instead of a tab, and the number of elements in the list is different.
How do I read a file, and keep the tab?
BTW, I'm working on character counting and would like to be counting tabs as one thing, not two.
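As far as I can tell, the file contains a literal backslash followed by a t rather than an actual tab character, so reading it gives you those two characters, just like a raw string would. To turn such escape sequences into the characters they stand for, decode the string with the 'string_escape' codec (this codec exists in Python 2 only):
with open('file.txt','r') as f:
    a = f.read().decode('string_escape')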
I wasn't able to recreate the same txt file but this works when testing with a raw string:
a = r'hello\tworld'
list(a.decode('string_escape')) #outputs ['h', 'e', 'l', 'l', 'o', '\t', 'w', 'o', 'r', 'l', 'd']
hope this helps!
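The 'string_escape' codec and str.decode are not available in Python 3. A rough Python 3 equivalent, offered as a sketch rather than a drop-in replacement, is codecs.decode with the 'unicode_escape' codec. The variable names here are illustrative, and text containing non-ASCII characters may not survive this codec unchanged.

```python
import codecs

# Backslash followed by 't', as it would arrive when read from the file.
raw_text = r"Hello\tWorld"

# Interpret escape sequences such as \t and \n as the characters they name.
decoded = codecs.decode(raw_text, "unicode_escape")

print(len(raw_text))   # 12: the backslash and the 't' are separate characters
print(len(decoded))    # 11: the pair has become a single tab
print(list(decoded))
```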

Why does Python's map() function swap values position in each return?

I'm studying Python and trying to learn how to use the map() function.
I had the idea of replacing each letter of a string with the next letter of the alphabet, e.g. abc -> bcd.
I wrote the following code:
m = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']
def func(s):
    return m[m.index(s) + 1]
l = "abc"
print(set(map(func, l)))
But every execution returns the letters in a different order.
I got the expected answer by using:
l2 = [func(i) for i in l]
print(l2)
But I wanted to understand the map() function and how it works. I tried to read the documentation but could not understand much.
Sorry about my bad English and my lack of experience in Python :/
This happens because you are converting the result to a set in set(map(func, l)), and a set is an unordered collection in Python. map() itself yields its results in the same order as its input.
From the docs:
A set object is an unordered collection of distinct hashable objects....Being an unordered collection, sets do not record element position or order of insertion. Accordingly, sets do not support indexing, slicing, or other sequence-like behavior.
If you replace print(set(map(func, l))) with print(list(map(func, l))), the letters come out in input order.
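As a small illustration of the point about ordering, the sketch below collects the results of map() in a list and joins them back into a string. The name next_letter and the wrap-around from 'z' to 'a' are additions for the example; the question's func would raise an IndexError on 'z'.

```python
import string

ALPHABET = string.ascii_lowercase


def next_letter(ch):
    """Return the letter after `ch`, wrapping from 'z' back to 'a'."""
    return ALPHABET[(ALPHABET.index(ch) + 1) % len(ALPHABET)]


word = "abc"
shifted = list(map(next_letter, word))  # a list keeps the order map() produced
print(shifted)           # ['b', 'c', 'd']
print("".join(shifted))  # bcd
```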

Unique elements of sublists depending on specific value in sublist

I am trying to select unique datasets from a very large, quite inconsistent list.
My dataset RawData consists of lists of strings of different lengths.
Some items occur many times, for example: ['a','b','x','15/30']
The key for comparing items is always the last string, for example '15/30'.
The goal is to get a list UniqueData in which each key occurs only once, keeping the first item for each key (I want to keep the order).
Dataset:
RawData = [['a','b','x','15/30'],['d','e','f','g','h','20/30'],['w','x','y','z','10/10'],['a','x','c','15/30'],['i','j','k','l','m','n','o','p','20/60'],['x','b','c','15/30']]
My desired solution Dataset:
UniqueData = [['a','b','x','15/30'],['d','e','f','g','h','20/30'],['w','x','y','z','10/10'],['i','j','k','l','m','n','o','p','20/60']]
I tried many possible solutions, for instance:
for index, elem in enumerate(RawData): and appending to a new list if.....
A plain "for element in list" membership check does not work, because the items are not exactly the same.
Can you help me find a solution to my problem?
Thanks!
The best way to remove duplicates is to track the keys you have already seen in a set. Here the key is the last element of each sublist. If the key is already in the set unique, do nothing; if it is not, add it to unique and append the sublist lst to the result list, here called new.
Try this.
new = []
unique = set()
for lst in RawData:
    if lst[-1] not in unique:
        unique.add(lst[-1])
        new.append(lst)
print(new)
# [['a', 'b', 'x', '15/30'],
#  ['d', 'e', 'f', 'g', 'h', '20/30'],
#  ['w', 'x', 'y', 'z', '10/10'],
#  ['i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', '20/60']]
You could set up a new list for the unique data and another to track the keys you have seen so far. Then, as you loop through the data, if you have not seen the last element of a sublist before, append the sublist to the unique data and add its last element to the seen list.
RawData = [['a', 'b', 'x', '15/30'], ['d', 'e', 'f', 'g', 'h', '20/30'], ['w', 'x', 'y', 'z', '10/10'],
['a', 'x', 'c', '15/30'], ['i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', '20/60'], ['x', 'b', 'c', '15/30']]
seen = []
UniqueData = []
for data in RawData:
    if data[-1] not in seen:
        UniqueData.append(data)
        seen.append(data[-1])
print(UniqueData)
OUTPUT
[['a', 'b', 'x', '15/30'], ['d', 'e', 'f', 'g', 'h', '20/30'], ['w', 'x', 'y', 'z', '10/10'], ['i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', '20/60']]
Alternatively, record the indices of the duplicates and delete them from RawData in place:
RawData = [['a','b','x','15/30'],['d','e','f','g','h','20/30'],['w','x','y','z','10/10'],['a','x','c','15/30'],['i','j','k','l','m','n','o','p','20/60'],['x','b','c','15/30']]
seen = []
seen_indices = []
for index, row in enumerate(RawData):
    # index -> position of the sublist in RawData
    # row -> the individual sublist
    if row[-1] not in seen:
        seen.append(row[-1])
    else:
        seen_indices.append(index)
# Delete from the end so that earlier indices stay valid.
for index in sorted(seen_indices, reverse=True):
    del RawData[index]
print(RawData)
Using a set to filter out entries whose key has already been seen is the most efficient way to go.
Here's a one-liner using a list comprehension with an internal side effect (seen.add() returns None, so the condition is true only the first time a key appears):
UniqueData = [rd for seen in [set()] for rd in RawData if not (rd[-1] in seen or seen.add(rd[-1]))]
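Another way to express the same idea, shown here only as a sketch, is a dict keyed on the last element. It relies on dicts preserving insertion order (guaranteed from Python 3.7), and the name first_by_key is made up for the example.

```python
RawData = [
    ['a', 'b', 'x', '15/30'],
    ['d', 'e', 'f', 'g', 'h', '20/30'],
    ['w', 'x', 'y', 'z', '10/10'],
    ['a', 'x', 'c', '15/30'],
    ['i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', '20/60'],
    ['x', 'b', 'c', '15/30'],
]

first_by_key = {}
for row in RawData:
    # setdefault stores the row only if its key (the last element) is new,
    # so the first sublist seen for each key wins.
    first_by_key.setdefault(row[-1], row)

UniqueData = list(first_by_key.values())
print(UniqueData)
```

Like the set-based versions, this does one hash lookup per sublist instead of scanning a list of seen keys.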

How can I store a method's return statement in a variable? (python)

I am quite new to OOP (and to Python as well).
I am building something similar to the Enigma machine.
My problem is this:
I don't know how to access the value returned by the randomizer method.
I need it in order to get a randomized list of characters.
class generators():
    global static_alphabet,backup_alphabet,encripting_dict,decripting_dict, mutable_alphabet
    static_alphabet=['a','á', 'b', 'c', 'd', 'e','é', 'f', 'g', 'h', 'i','í', 'j', 'k', 'l', 'm', 'n', 'o','ó','ö','ő', 'p', 'q', 'r', 's', 't', 'u','ú','ü','ű','v', 'w', 'x', 'y', 'z',"'",'"','#',':','_','.','-',',','!']
    backup_alphabet=['a','á', 'b', 'c', 'd', 'e','é', 'f', 'g', 'h', 'i','í', 'j', 'k', 'l', 'm', 'n', 'o','ó','ö','ő', 'p', 'q', 'r', 's', 't', 'u','ú','ü','ű', 'v', 'w', 'x', 'y', 'z',"'",'"','#',':','_','.','-',',','!']
    mutable_alphabet=[]
    def randomizer(self):
        mutable_alphabet=[]
        import random
        for element in static_alphabet:
            randletter=backup_alphabet[random.randint(0,len(backup_alphabet)-1)]
            while randletter==element:
                randletter=backup_alphabet[random.randint(0,len(backup_alphabet)-1)]
                if randletter!=element:
                    break
            mutable_alphabet.append(randletter)
            backup_alphabet.remove(randletter)
        return mutable_alphabet
Basically, you are trying to get the value returned by a method (a def inside a class). While your code does not currently appear to take advantage of OOP, here is how you would call that method and store its result within the same file:
class generators():
    ...your code...
gen = generators()
outputArray = gen.randomizer()
# confirm output by printing it out
print( outputArray )
The lines after the class definition need to be placed outside of the class, as I tried to illustrate above.
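For a version that can be run as is, here is a minimal sketch of the same pattern: create an instance, call the method, and assign what it returns to a variable. The class name Scrambler and its use of random.shuffle are stand-ins for illustration, not the question's algorithm; in particular, a plain shuffle does not guarantee that every letter ends up in a different position.

```python
import random


class Scrambler:
    """Hypothetical stand-in for the question's `generators` class."""

    def __init__(self, alphabet):
        # Instance attribute instead of module-level globals.
        self.alphabet = list(alphabet)

    def randomizer(self):
        """Return a shuffled copy of the alphabet; the original is left untouched."""
        shuffled = self.alphabet[:]
        random.shuffle(shuffled)
        return shuffled


scrambler = Scrambler("abcdef")
mixed = scrambler.randomizer()  # the returned list is now stored in `mixed`
print(mixed)
```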

Python loops are missing results

I am reading a file with about 13,000 names on it into a list.
Then, I look at each character of each item on that list and if there is a match I remove that line from the list of 13,000.
If I run it once, it removes about half of the list. On the 11th run it seems to cut it down to 9%. Why is this script missing results? Why does it catch them with successive runs?
Using Python 3.
with open(fname) as f:
    lines = f.read().splitlines()
bad_letters = ['B', 'C', 'F', 'G', 'H', 'J', 'L', 'O', 'P', 'Q', 'U', 'W', 'X']
def clean(callsigns, bad):
    removeline = 0
    for line in callsigns:
        for character in line:
            if character in bad:
                removeline = 1
        if removeline == 1:
            lines.remove(line)
            removeline = 0
    return callsigns
for x in range (0, 11):
    lines = clean(lines, bad_letters)
    print (len(lines))
You are changing (i.e., mutating) the lines list while you are looping (i.e., iterating) over it. Note that clean() calls lines.remove(line) on the global lines, which is the same list object as callsigns. When an item is removed, every later item shifts one position to the left, but the loop still advances to the next index, so the item that slid into the removed item's place is never examined. That is why each run misses some lines and later runs catch more of them.
There are many ways of fixing this. In the example below, we keep track of which lines to remove and delete them in a separate loop, in reverse order so that the remaining indices do not change.
with open(fname) as f:
    lines = f.read().splitlines()
bad_letters = ['B', 'C', 'F', 'G', 'H', 'J', 'L', 'O', 'P', 'Q', 'U', 'W', 'X']
def clean(callsigns, bad):
    to_remove = []
    for line_i, line in enumerate(callsigns):
        for b in bad:
            if b in line:
                # We're removing this line, take note of it.
                to_remove.append(line_i)
                break
    # Remove the lines in a second step. Reverse it so the indices don't change.
    for r in reversed(to_remove):
        del callsigns[r]
    return callsigns
# A single call is now enough; no repeated passes are needed.
lines = clean(lines, bad_letters)
Save the names you want to keep in a separate list, for example like this:
with open(fname) as f:
    lines = f.read().splitlines()
bad_letters = ['B', 'C', 'F', 'G', 'H', 'J', 'L', 'O', 'P', 'Q', 'U', 'W', 'X']
def clean(callsigns, bad):
    valid = [i for i in callsigns if not any(j in i for j in bad)]
    return valid
valid_names = clean(lines,bad_letters)
print (len(valid_names))
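To see the skipping effect from the question on a tiny input, the sketch below runs both approaches on four made-up names. The function names and the sample data are illustrative only.

```python
def remove_while_iterating(names, bad_letter):
    """Reproduce the bug: remove items from the list being looped over."""
    names = list(names)  # work on a copy so the caller's list is untouched
    for name in names:
        if bad_letter in name:
            # Later items shift left, so the loop skips the next one.
            names.remove(name)
    return names


def filter_into_new_list(names, bad_letter):
    """Build a new list of the names to keep, leaving the input alone."""
    return [name for name in names if bad_letter not in name]


sample = ["AB", "BC", "CD", "DE"]
print(remove_while_iterating(sample, "B"))  # ['BC', 'CD', 'DE']: 'BC' was skipped
print(filter_into_new_list(sample, "B"))    # ['CD', 'DE']
```

After "AB" is removed, "BC" moves into position 0, but the loop has already moved on to position 1, so "BC" is never checked.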
