How to pickle a dictionary with class instances as keys? - python

So I have a program with a list of class instances (TestEVAL), and I use a QMdiArea to present each instance's data in a subwindow. I am trying to enhance my pickle to include the QSubwindow data, which is stored in its subwindow widget, with the following code:
with open(fname, 'wb') as fout:
    for dataset in self.datasets:
        # Temp save chdlg - do not want saved
        ochdlg = dataset.chdlg
        dataset.chdlg = None
        pickle.dump(dataset, fout)
        # Restore chdlg
        dataset.chdlg = ochdlg
    pickle.dump('plots', fout)
    for sub in self.mdi.subWindowList():
        mw = sub.widget()
        old_plot = dict()
        old_plot['title'] = sub.windowTitle()
        old_plot['setup'] = mw.setup
        pickle.dump(old_plot, fout)
        pickle.dump(mw.channels, fout)
At which point I get a:
Traceback (most recent call last):
  File "TestEVAL.pyw", line 1448, in saveas_session
    pickle.dump(mw.channels, fout)
  File "AppData\Local\Continuum\Anaconda\lib\copy_reg.py", line 71, in _reduce_ex
    state = base(self)
TypeError: the sip.wrapper type cannot be instantiated or sub-classed
Since mw.channels is a dict keyed by instances from self.datasets (not simple objects), I think I need to do something more. Maybe I need to implement __setstate__() and __getstate__(), but I do not understand what __getstate__ should return. The dataset class is large, so I do not want to duplicate its pickle data. I could add a dataset.id and key the dict as mw.channels[dataset.id], but how to do that is not clear to me. Furthermore, mw.channels[dataset] = values, where values is a list of another class (headers) with meta state attached (not too big data-wise). What do I do?
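For what it's worth, a minimal sketch of the dataset.id idea (assuming each dataset carries a unique id attribute; dump_channels and load_channels are hypothetical helper names, not part of the program):

import pickle

def dump_channels(channels, fout):
    # Save channels keyed by dataset.id so the big dataset objects
    # (and anything Qt-related they hold) are not pickled a second time.
    by_id = {dataset.id: values for dataset, values in channels.items()}
    pickle.dump(by_id, fout)

def load_channels(fin, datasets):
    # Rebuild the original {dataset: values} mapping from the saved ids.
    by_id = pickle.load(fin)
    lookup = {dataset.id: dataset for dataset in datasets}
    return {lookup[_id]: values for _id, values in by_id.items()}

The TypeError itself means pickle reached a sip-wrapped Qt object somewhere inside mw.channels, so the values lists (the headers instances) must also contain only plain-Python state for this to work.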

Related

How do I process numerous files using numerous regex conditions?

I want to process the cna and linear_cna files by reading only lines that do not contain either Hugo_Symbol or -01.
import os
import re

class DataProcessing:
    def __init__(self, data):
        self.line = [line.rstrip('\n') for line in data]
        self.data = data

    def read_data(self):
        with open(self.data):
            pass
        return self.line

    def read_cna(self):
        # In cna and linear_cna files, skip lines that either begin with "Hugo_Symbol" or "-01"
        for lines in self.line:
            cna_lines = [lines for l in cna if not re.findall(r"^(Hugo_Symbol|[-01])", l)]
            return cna_lines
    ...continue...

dp_cna = DataProcessing("data_cna.txt")
dp_linear_cna = DataProcessing("data_linear_cna.txt")
dp_cna.read_data()
dp_linear_cna.read_data()
Traceback:
Traceback (most recent call last):
  File "C:/Users/User/PycharmProjects/testing/main.py", line 24, in <module>
    cna = DataProcessing.read_data("data_cna.txt")
  File "C:/Users/User/PycharmProjects/testing/main.py", line 14, in read_data
    with open(self.data) as f:
AttributeError: 'str' object has no attribute 'data'
The right way to use your class consists of two steps.
Step 1: Create an instance of DataProcessing, which invokes __init__. You do this by writing dp = DataProcessing("data_cna.txt"). You can replace dp with any name you want.
Now dp is an instance of DataProcessing. Its data field is set to "data_cna.txt". In other words, dp remembers the name of the file.
Step 2: Call read_data on dp. Note that read_data's only parameter is self, which is never passed explicitly, so the call takes no arguments. To call read_data on dp you write dp.read_data().
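A minimal sketch of those two steps, with __init__ reworked to actually read the file it is given (an assumption about the intended behavior, since the original iterates over the filename string):

class DataProcessing:
    def __init__(self, data):
        self.data = data  # remember the filename
        with open(self.data) as f:
            self.line = [line.rstrip('\n') for line in f]

    def read_data(self):
        return self.line

dp_cna = DataProcessing("data_cna.txt")   # Step 1: create the instance
lines = dp_cna.read_data()                # Step 2: call the method on it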

How to open a csv file for reading purpose using mmap in python?

I want to open a csv file for reading, but I'm facing some exceptions when I do.
I'm using Python 2.7.
main.py:
if __name__ == "__main__":
    f = open('input.csv', 'r+b')
    m = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    reader = csv.DictReader(iter(m.readline, ""))
    for read in reader:
        num = read['time']
        print num
Output:
Traceback (most recent call last):
  File "/home/PycharmProjects/time_gap_Task/main.py", line 22, in <module>
    for read in reader:
  File "/usr/lib/python3.4/csv.py", line 109, in __next__
    self.fieldnames
  File "/usr/lib/python3.4/csv.py", line 96, in fieldnames
    self._fieldnames = next(self.reader)
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
How do I resolve this error, and what is the right way to open a csv file with mmap and csv so that the code works correctly?
I know you asked this a while ago, but I actually created a module for myself that does this, because I do a lot of work with large CSV files and sometimes need to convert them into dictionaries keyed on one of the columns. Below is the code I've been using. Please feel free to modify it as needed.
import csv
import mmap

def MmapCsvFileIntoDict(csvFilePath, skipHeader = True, transform = lambda row: row, keySelector = lambda o: o):
    """
    Takes a CSV file path and uses mmap to open the file and return a dictionary of the contents keyed
    on the results of the keySelector. The default key is the transformed object itself. Mmap is used because it is
    a more efficient way to process large files.
    The transform method is used to convert the line (converted into a list) into something else. Hence 'transform'.
    If you don't pass it in, the transform returns the list itself.
    """
    contents = {}
    firstline = False
    try:
        with open(csvFilePath, "r+b") as f:
            # memory-map the file, size 0 means whole file
            mm = mmap.mmap(f.fileno(), 0)
            for line in iter(mm.readline, b''):
                if firstline == False:
                    firstline = True
                    if skipHeader == True:
                        continue
                row = ''
                line = line.decode('utf-8')
                line = line.strip()
                row = next(csv.reader([line]), '')
                if transform != None and callable(transform):
                    if row == None or row == '':
                        continue
                    value = transform(row)
                else:
                    value = row
                if callable(keySelector):
                    key = keySelector(value)
                else:
                    key = keySelector
                contents[key] = value
    except IOError as ie:
        # PrintWithTs is the author's own timestamped-logging helper (not defined here)
        PrintWithTs('Error decomposing the companies: {0}'.format(ie))
        return {}
    except:
        raise
    return contents
When you call this method, you have some options.
Assume you have a file that looks like:
Id, Name, PhoneNumber
1, Joe, 7175551212
2, Mary, 4125551212
3, Vince, 2155551212
4, Jane, 8145551212
The easiest way to call it is like this:
dict = MmapCsvFileIntoDict('/path/to/file.csv', keySelector = lambda row: row[0])
What you get back is a dict looking like this:
{ '1' : ['1', 'Joe', '7175551212'], '2' : ['2', 'Mary', '4125551212'] ...
One thing I like to do is create a class or a namedtuple to represent my data:
class CsvData:
def __init__(self, row):
self.Id = int(row[0])
self.Name = row[1].upper()
self.Phone = int(row[2])
And then when I call the method, I pass in a second lambda to transform each row in the file to an object I can work with:
dict = MmapCsvFileIntoDict('/path/to/file.csv', transform = lambda row: CsvData(row), keySelector = lambda o: o.Id)
What I get back that time looks like:
{ 1 : <object instance>, 2 : <object instance>...
I hope this helps! Best of luck
When you open a file with the b flag, like this:
f = open('input.csv','r+b')
you read the file as bytes rather than as strings.
So try changing the mode to r:
f = open('input.csv','r')
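Note, though, that mmap itself needs a binary-mode file object, so if you want to keep mmap you can instead decode each line before it reaches csv. A minimal sketch, assuming UTF-8 input:

import csv
import mmap

with open('input.csv', 'r+b') as f:
    m = mmap.mmap(f.fileno(), 0, prot=mmap.PROT_READ)
    # Decode each mmap line so csv receives strings, not bytes.
    lines = (line.decode('utf-8') for line in iter(m.readline, b''))
    reader = csv.DictReader(lines)
    for row in reader:
        print(row['time'])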
If you just want to read data from specific columns of the csv file, just try:
import csv

with open('input.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print row['time']

Python, write json / dictionary objects to a file iteratively (one at a time)

I have a large for loop in which I create json objects, and I would like to stream-write each object to a file as it is created. I would also like to be able to read the file back later in a similar fashion (one object at a time).
My json objects contain newlines and I can't just dump each object as a line in a file.
How can I achieve this?
To make it more concrete, consider the following:
for _id in collection:
    dict_obj = build_dict(_id)  # build a dictionary object
    with open('file.json', 'a') as f:
        stream_dump(dict_obj, f)
stream_dump is the function that I want.
Note that I don't want to create a large list and dump the whole list using something like json.dump(obj, file). I want to be able to append the object to the file in each iteration.
Thanks.
You need to work with a subclass of JSONEncoder and then proxy the build_dict function:
from __future__ import (absolute_import, division, print_function,)
#                        unicode_literals)
import collections
import json

mycollection = [1, 2, 3, 4]

def build_dict(_id):
    d = dict()
    d['my_' + str(_id)] = _id
    return d

class SeqProxy(collections.Sequence):
    def __init__(self, func, coll, *args, **kwargs):
        super(SeqProxy, self).__init__(*args, **kwargs)
        self.func = func
        self.coll = coll

    def __len__(self):
        return len(self.coll)

    def __getitem__(self, key):
        return self.func(self.coll[key])

class JsonEncoderProxy(json.JSONEncoder):
    def default(self, o):
        try:
            iterable = iter(o)
        except TypeError:
            pass
        else:
            return list(iterable)
        # Let the base class default method raise the TypeError
        return json.JSONEncoder.default(self, o)

jsonencoder = JsonEncoderProxy()
collproxy = SeqProxy(build_dict, mycollection)
for chunk in jsonencoder.iterencode(collproxy):
    print(chunk)
Output:
[
{
"my_1"
:
1
}
,
{
"my_2"
:
2
}
,
{
"my_3"
:
3
}
,
{
"my_4"
:
4
}
]
To read it back chunk by chunk you need to use JSONDecoder and pass a callable as object_hook. This hook will be called with each new decoded object (each dict in your list) when you call JSONDecoder.decode(json_string).
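A minimal sketch of that read-back side (handle_obj is a hypothetical name, not part of the original answer):

import json

def handle_obj(obj):
    # Called once for every decoded JSON object (innermost objects first).
    print('decoded:', obj)
    return obj

decoder = json.JSONDecoder(object_hook=handle_obj)
with open('file.json') as f:
    decoder.decode(f.read())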
Since you are generating the files yourself, you can simply write out one JSON object per line:
for _id in collection:
    dict_obj = build_dict(_id)  # build a dictionary object
    with open('file.json', 'a') as f:
        f.write(json.dumps(dict_obj))
        f.write('\n')
And then read them in by iterating over lines:
with open('file.json', 'r') as f:
    for line in f:
        dict_obj = json.loads(line)
This isn't a great general solution, but it's a simple one if you are both the generator and consumer.
Simplest solution:
Remove all whitespace characters from your json document:
import string

def remove_whitespaces(txt):
    """We shall remove all whitespaces."""
    # Note: this also strips spaces inside JSON string values.
    for chr in string.whitespace:
        txt = txt.replace(chr, '')
    return txt
Obviously you could also json.dumps(json.loads(json_txt)) (BTW this also verifies that the text is valid json).
Now you could write your documents to a file, one per line.
Second solution:
Create an in-memory text or bytes stream (io.StringIO / io.BytesIO), write a valid document into it (your documents being part of an object or list), and then write the stream out to a file (or upload it to the cloud).
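A short sketch of this second idea, assuming the documents are wrapped in a single JSON list:

import io
import json

docs = [{'id': 1}, {'id': 2}, {'id': 3}]  # stand-ins for the real documents

# Build one valid JSON document (a list) in an in-memory stream.
buf = io.StringIO()
buf.write('[')
for i, obj in enumerate(docs):
    if i:
        buf.write(',')
    buf.write(json.dumps(obj))
buf.write(']')

with open('file.json', 'w') as f:
    f.write(buf.getvalue())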

Objects for words in a Python flashcard program

I am making a flashcard program in which I take a text file that contains several columns, such as English word, French equivalent, gender, type of word, etc. My idea was to create a loop that reads each line of the text file, splits it on tabs, and makes an instance of a user-defined Word object for each line.
In the following block of code I import the text file, process it into a list, then attempt to create an instance of the previously defined Word class. I would like the instance to be stored under the second item of the list as its name, so that it is easily searchable, but it's not letting me do this. Please can somebody help me with the code:
file = (open('dictionary.txt', 'r')).readline()
import re
line_list = re.split(r'\t', file.rstrip('\n'))
line_list[1] = Word(line_list[0], line_list[1], line_list[2], line_list[3])
Create a dict of instances and use the second item of each list as the key. It's a bad idea to create dynamic variables.
import re

instance_dict = {}
with open('dictionary.txt') as f:
    for line in f:
        line_list = re.split(r'\t', line.rstrip('\n'))
        instance_dict[line_list[1]] = Word(*line_list[:4])
Why the with statement?
It is good practice to use the with keyword when dealing with file
objects. This has the advantage that the file is properly closed after
its suite finishes, even if an exception is raised on the way.
You can also use the csv module:
import csv

instances = {}
with open('dictionary.txt', 'rb') as f:
    reader = csv.reader(f, delimiter='\t')
    instances = {line[1]: Word(*line) for line in reader}
Here's a cleaner solution using a namedtuple. You'll end up with a dict called "words" which you use to lookup each by name.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import pprint
from collections import namedtuple

Word = namedtuple('Word', ['name', 'french', 'gender', 'type_'])

words = {}
with open('dictionary.txt', 'rU') as fin:
    for word in (Word(*r.rstrip('\n').split('\t')) for r in fin):
        words[word.name] = word

pprint.pprint(words)
Firstly, it's better to use with ... as statements when reading from files, as the closing procedures are automatically taken care of. Secondly, to read ALL of the lines from a file, you must use readlines() rather than readline(). Try something like this:
import re

with open('dictionary.txt', 'r') as file:
    line_list = file.readlines()

splitLineList = []
for lines in line_list:
    splitLineList.append(re.split(r'\t', lines.strip('\n')))
You may arrive at an appropriate solution depending on a few clarifications of your requirements.
"My idea was to create a loop that read each line of the text file,
separating by tabs, and"
If the text file is already pre-validated, or reliable enough that error handling (e.g. for lines not evenly separated by single tabs) can be skipped:
with open('dictionary.txt', 'r') as f:
    [line.strip().split("\t")
     for line in f.read().split("\n")
     if line.strip()]
will get you the list (built with a list comprehension) required to create Word object instances, without using re
"then attempt to create an instance of a previously defined object:
Word."
with open('dictionary.txt', 'r') as f:
    [Word(line.strip().split("\t"))
     for line in f.read().split("\n")
     if line.strip()]
"I would like the object to have the second item on the list for it's
name so that it is easily searchable,"
Can you rewrite this with an example?
but it's not letting me do this,
line_list[1] = Word(line_list[0], line_list[1], line_list[2], line_list[3])
Sorry, I am losing you here: why are you using line_list[1] to refer to the newly created Word instance when line_list[1] itself is an argument?
With your clarification, I would have something like this
Reworked Code:
from pprint import pprint
My assumption on your Class definition:
class Word():
    def __init__(self, **kwargs):
        self.set_attrs(**kwargs)

    def __call__(self):
        return self.get_attr("swedish_word")

    def set_attrs(self, **kwargs):
        for k, v in kwargs.iteritems():
            setattr(self, k, v)

    def get_attr(self, attr):
        return getattr(self, attr)

    def get_attrs(self):
        return ({attr.upper(): getattr(self, attr) for attr in self.__dict__.keys()})

    def print_attrs(self):
        pprint(self.get_attrs())
if __name__ == '__main__':
    # sample entries in dictionary.txt
    # swedish_word english_word article word_type
    # hund sleep ett noun
    with open('dictionary.txt', 'r') as f:
        header = f.readline().strip().split("\t")
        instances = [Word(**dict(zip(header, line.strip().split("\t"))))
                     for line in f.read().split("\n")
                     if line.strip()]
        # for line in f.read().split("\n"):
        #     data = dict(zip(header, line.strip().split("\t")))
        #     w = Word(**data)
You can get instance properties for a given swedish_word like this
def print_swedish_word_properties(swedish_word):
    for instance in instances:
        if instance() == swedish_word:
            print "Properties for Swedish Word:", swedish_word
            instance.print_attrs()

print_swedish_word_properties("hund")
to have output like this
Properties for Swedish Word: hund
{'ARTICLE': 'ett',
'ENGLISH_WORD': 'dog',
'SWEDISH_WORD': 'hund',
'WORD_TYPE': 'noun'}
or you can use any other class methods to search instances on various attributes
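For example, a small sketch (find_by_attr is a hypothetical helper) that reuses get_attr to filter on any attribute:

def find_by_attr(instances, attr, value):
    # Return every Word instance whose given attribute matches the value.
    return [inst for inst in instances if inst.get_attr(attr) == value]

for noun in find_by_attr(instances, "word_type", "noun"):
    noun.print_attrs()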

csv2json.py error

I am trying to run the script csv2json.py in the Command Prompt, but I get this error:
C:\Users\A\Documents\PROJECTS\Django\sw2>csv2json.py csvtest1.csv wkw1.Lawyer
Converting C:\Users\A\Documents\PROJECTS\Django\sw2csvtest1.csv from CSV to JSON as C:\Users\A\Documents\PROJECTS\Django\sw2csvtest1.csv.json
Traceback (most recent call last):
  File "C:\Users\A\Documents\PROJECTS\Django\sw2\csv2json.py", line 37, in <module>
    f = open(in_file, 'r' )
IOError: [Errno 2] No such file or directory: 'C:\\Users\\A\\Documents\\PROJECTS\\Django\\sw2csvtest1.csv'
Here are the relevant lines from the snippet:
31 in_file = dirname(__file__) + input_file_name
32 out_file = dirname(__file__) + input_file_name + ".json"
34 print "Converting %s from CSV to JSON as %s" % (in_file, out_file)
36 f = open(in_file, 'r' )
37 fo = open(out_file, 'w')
It seems that the directory name and file name are combined without a path separator. How can I make this script run?
Thanks.
Edit:
Altering lines 31 and 32 as answered by Denis Otkidach worked fine. But I realized that the first column name needs to be pk and each row needs to start with an integer:
for row in reader:
    if not header_row:
        header_row = row
        continue
    pk = row[0]
    model = model_name
    fields = {}
    for i in range(len(row)-1):
        active_field = row[i+1]
So my csv row now looks like this (including the header row):
pk, firm_url, firm_name, first, last, school, year_graduated
1, http://www.graychase.com/aabbas, Gray & Chase, Amr A, Babas, The George Washington University Law School, 2005
Is this a requirement of the Django fixture or the JSON format? If so, I need to find a way to add the pk numbers to each row. Can I delete this pk column? Any suggestions?
Edit 2
I keep getting this ValidationError: "This value must be an integer". There is only one integer field and that's the pk. Is there a way to find out from the traceback what the line numbers refer to?
Problem installing fixture 'C:\Users\A\Documents\Projects\Django\sw2\wkw2\fixtures\csvtest1.csv.json': Traceback (most recent call last):
  File "C:\Python26\Lib\site-packages\django\core\management\commands\loaddata.py", line 150, in handle
    for obj in objects:
  File "C:\Python26\lib\site-packages\django\core\serializers\json.py", line 41, in Deserializer
    for obj in PythonDeserializer(simplejson.load(stream)):
  File "C:\Python26\lib\site-packages\django\core\serializers\python.py", line 95, in Deserializer
    data[field.attname] = field.rel.to._meta.get_field(field.rel.field_name).to_python(field_value)
  File "C:\Python26\lib\site-packages\django\db\models\fields\__init__.py", line 356, in to_python
    _("This value must be an integer."))
ValidationError: This value must be an integer.
+ is used incorrectly here; the proper way to combine a directory name and a file name is os.path.join(). But there is no need to combine the directory where the script is located with the file name, since it's common to pass a path relative to the current working directory. So, change lines 31-32 to the following:
in_file = input_file_name
out_file = in_file + '.json'
from os import path
in_file = path.join(dirname(__file__), input_file_name )
out_file = path.join(dirname(__file__), input_file_name + ".json" )
[...]
You should be using os.path.join rather than just concatenating dirname() and filenames.
import os.path
in_file = os.path.join(dirname(__file__), input_file_name)
out_file = os.path.join(dirname(__file__), input_file_name + ".json")
will fix your problem, though depending on what exactly you're doing, there's probably a more elegant way to do it.
