Load csv file in processing.py - python

I am trying to load a csv file in processing.py as a table. The Java environment allows me to use the loadTable() function, however, I'm unable to find an equivalent function in the python environment.

The missing functionality could be added as follows:
import csv

class Row(object):
    def __init__(self, dict_row):
        self.dict_row = dict_row

    def getFloat(self, key):
        return float(self.dict_row[key])

    def getString(self, key):
        return self.dict_row[key]

class loadTable(object):
    def __init__(self, csv_filename, header):
        with open(csv_filename, "rb") as f_input:
            csv_input = csv.DictReader(f_input)
            self.data = [Row(row) for row in csv_input]

    def rows(self):
        return self.data
This reads the csv file into memory using Python's csv.DictReader class, which treats each row in the csv file as a dictionary. For each row, it creates an instance of a Row class, which then lets you retrieve entries in the format required. Currently I have only coded getFloat() and getString() (strings being the default type of every csv value).
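With those two classes in place, a sketch can use them much like the Java loadTable() call. A minimal usage sketch, assuming a hypothetical data/positions.csv with "x" and "y" columns:

table = loadTable("data/positions.csv", "header")  # "header" arg mirrors Processing's loadTable() signature
for row in table.rows():
    x = row.getFloat("x")          # numeric column, converted to float
    y = row.getFloat("y")
    raw = row.getString("x")       # the same field as its raw string
    ellipse(x, y, 5, 5)            # ellipse() assumes this runs inside a Processing.py sketch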

You could create an empty Table object with this:
from processing.data import Table
t = Table()
And then populate it as discussed at https://discourse.processing.org/t/creating-an-empty-table-object-in-python-mode-and-some-other-hidden-data-classes/25121
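For example, something like the following should work, assuming the Java Table API (addColumn(), addRow(), setString(), setInt(), getRowCount()) is exposed unchanged in Python mode; the column names here are made up:

from processing.data import Table

t = Table()
t.addColumn("name")                 # column names are hypothetical
t.addColumn("hp")
row = t.addRow()                    # addRow() returns a TableRow
row.setString("name", "Bulbasaur")
row.setInt("hp", 45)
print(t.getRowCount())              # -> 1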
But I think a Python dict as proposed by @martin-evans would be nice. You load it like this:
import csv
from codecs import open  # optional, to get the 'encoding="utf-8"' argument in Python 2

with open("data/pokemon.csv", encoding="utf-8") as f:
    data = list(csv.DictReader(f))  # a list of dicts, col-headers as keys
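Each entry in data is then a plain dict keyed by the column headers, so fields can be read by name (the column name below is hypothetical):

first = data[0]
print(first["Name"])     # access a column by its header
print(len(data))         # number of data rows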

Serializing a list of class instances in python

In python, I am trying to store a list to a file. I've tried pickle, json, etc, but none of them support classes being inside those lists. I can't sacrifice the lists or the classes, I must maintain both. How can I do it?
My current attempt:
try:
    with open('file.json', 'r') as file:
        allcards = json.load(file)
except:
    allcards = []

def saveData(list):
    with open('file.json', 'w') as file:
        print(list)
        json.dump(list, file, indent=2)
saveData is called elsewhere, and I've done all the testing I can and have determined the error comes from trying to save the list due to its inclusion of classes. It throws me the error
Object of type Card is not JSON serializable
whenever I do the JSON method, and any other method doesn't even give errors but doesn't load the list when I reload the program.
Edit: As for the pickle method, here is what it looks like:
try:
    with open('allcards.dat', 'rb') as file:
        allcards = pickle.load(file)
        print(allcards)
except:
    allcards = []

class Card():
    def __init__(self, owner, name, rarity, img, pack):
        self.owner = str(owner)
        self.name = str(name)
        self.rarity = str(rarity)
        self.img = img
        self.pack = str(pack)

def saveData(list):
    with open('allcards.dat', 'wb') as file:
        pickle.dump(list, file)
When I do this, all that happens is the code runs as normal, but the list is not saved. And the print(allcards) does not trigger either, which makes me believe it's somehow not detecting the file or hitting some other error, leading it straight to the exception. Also, img is supposed to always be a link, in case that changes anything.
I don't know what else I can provide to help solve this issue, but I can post more code if need be.
Please help, and thanks in advance.
Python's built-in pickle module has limits when serializing Python classes (for example, classes defined interactively or nested inside functions), but there are libraries that extend the pickle module and remove those limits. Dill and cloudpickle both support serializing a Python class and have the exact same interface as the pickle module.
Dill: https://github.com/uqfoundation/dill
Cloudpickle: https://github.com/cloudpipe/cloudpickle
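For example, a minimal sketch of the question's save/load logic using dill instead of pickle (dill.dump() and dill.load() have the same signatures as their pickle counterparts; the file name follows the question):

import dill

def saveData(cards):
    with open('allcards.dat', 'wb') as f:
        dill.dump(cards, f)           # same call as pickle.dump

def loadData():
    try:
        with open('allcards.dat', 'rb') as f:
            return dill.load(f)       # same call as pickle.load
    except (IOError, OSError):
        return []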
EDIT:
The article linked below (see "Source") is good, but I'd written a bad example.
This time I've created a new snippet from scratch -- sorry for making it more complicated than it needed to be earlier.
import json

class Card(object):
    @classmethod
    def from_json(cls, data):
        return cls(**data)

    def __init__(self, figure, color):
        self.figure = figure
        self.color = color

    def __repr__(self):
        return f"<Card: [{self.figure} of {self.color}]>"

def save(cards):
    with open('file.json', 'w') as f:
        json.dump(cards, f, indent=4, default=lambda c: c.__dict__)

def load():
    with open('file.json', 'r') as f:
        obj_list = json.load(f)
    return [Card.from_json(obj) for obj in obj_list]

cards = []
cards.append(Card("1", "clubs"))
cards.append(Card("K", "spades"))

save(cards)
cards_from_file = load()
print(cards_from_file)
Source

Loading multiple files with bonobo-etl

I'm new to bonobo-etl and I'm trying to write a job that loads multiple files at once, but I can't get the CsvReader to work with the @use_context_processor annotation. A snippet of my code:
def input_file(self, context):
    yield 'test1.csv'
    yield 'test2.csv'
    yield 'test3.csv'

@use_context_processor(input_file)
def extract(f):
    return bonobo.CsvReader(path=f, delimiter='|')

def load(*args):
    print(*args)

def get_graph(**options):
    graph = bonobo.Graph()
    graph.add_chain(extract, load)
    return graph
When I run the job I get something like <bonobo.nodes.io.csv.CsvReader object at 0x7f849678dc88> rather than the lines of the CSV.
If I hardcode the reader like graph.add_chain(bonobo.CsvReader(path='test1.csv',delimiter='|'),load), it works.
Any help would be appreciated.
Thank you.
As bonobo.CsvReader does not yet support reading file names from the input stream, you need to use a custom reader for that.
Here is a solution that works for me on a set of csvs I have:
import bonobo
import bonobo.config
import bonobo.util
import glob
import csv

@bonobo.config.use_context
def read_multi_csv(context, name):
    with open(name) as f:
        reader = csv.reader(f, delimiter=';')
        headers = next(reader)
        if not context.output_type:
            context.set_output_fields(headers)
        for row in reader:
            yield tuple(row)

def get_graph(**options):
    graph = bonobo.Graph()
    graph.add_chain(
        glob.glob('prenoms_*.csv'),
        read_multi_csv,
        bonobo.PrettyPrinter(),
    )
    return graph

if __name__ == '__main__':
    with bonobo.parse_args() as options:
        bonobo.run(get_graph(**options))
A few comments on this snippet, in reading order:
The use_context decorator injects the node execution context into the transformation call, which lets us call .set_output_fields(...) with the first csv's headers.
The other csvs' headers are ignored; in my case they're all the same. You may need slightly more complex logic for your own case.
Then, we just generate the filenames in a bonobo.Graph instance using glob.glob (in my case, the stream will contain: prenoms_2004.csv prenoms_2005.csv ... prenoms_2011.csv prenoms_2012.csv) and pass it to our custom reader, which will be called once for each file, open it, and yield its lines.
Hope that helps!

How to store file locally to a class?

I have a class that is supposed to be able to read data from .csv files. In the __init__ of the class I read the file and store it locally to the class as self.csv_table. The problem is that when I try to access this variable in another function I get a ValueError: I/O operation on closed file. How can I avoid this error and instead print the file?
import csv

class CsvFile(object):
    """
    A class that allows the user to read data from a csv file. Can read columns, rows, specific fields
    """
    def __init__(self, file, delimiter="'", quotechar='"'):
        """
        file: A string. The full path to the file and the file. /home/user/Documents/table.csv
        delimter & quotechar: Strings that define how the table's rows and columns are constructed

        return: the file in a way use-able to other functions

        Initializes the csv file
        """
        with open(file, 'r') as csv_file:
            self.csv_table = csv.reader(csv_file, delimiter=delimiter, quotechar=quotechar)  # local copy of csv file

    def read_csv(self):
        """
        Prints the csv file in a simple manner. Not much can be done with this.
        """
        for row in self.csv_table:
            print(', '.join(row))

my_file = CsvFile(file)
my_file.read_csv()  # this one causes an I/O error
Here, your problem is that self.csv_table holds a reader tied to the file object itself, not the file content. Once you're out of the with statement, the file is closed, and you can no longer read from it.
Since you care about the content, you need to store the content in csv_table by iterating the csv reader; for instance, in your __init__ you can do something like this:
def __init__(self, file, delimiter="'", quotechar='"'):
    """
    file: A string. The full path to the file and the file. /home/user/Documents/table.csv
    delimter & quotechar: Strings that define how the table's rows and columns are constructed

    return: the file in a way use-able to other functions

    Initializes the csv file
    """
    self.csv_table = []
    with open(file, 'r') as csv_file:
        csv_reader = csv.reader(csv_file, delimiter=delimiter, quotechar=quotechar)  # local copy of csv file
        for data_entry in csv_reader:
            self.csv_table.append(data_entry)
Then you'll be able to access the content in self.csv_table as a list of lists.
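A short usage sketch of the fixed class (the path and delimiter are hypothetical):

my_file = CsvFile("/home/user/Documents/table.csv", delimiter=",")
print(my_file.csv_table[0])      # first row, as a list of strings
print(len(my_file.csv_table))    # number of rows read in __init__
my_file.read_csv()               # no I/O error: the data is already in memory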
Or, if you really care about the file itself, you need to reopen it any time you want to access it =>
Change self.csv_table to self.csv_filename, and in your read_csv function, reopen the file and create the reader each time you need it =>
import csv

class CsvFile(object):
    """
    A class that allows the user to read data from a csv file. Can read columns, rows, specific fields
    """
    def __init__(self, filename, delimiter="'", quotechar='"'):
        """
        filename: A string. The full path to the file and the file. /home/user/Documents/table.csv
        delimter & quotechar: Strings that define how the table's rows and columns are constructed

        return: the file in a way use-able to other functions

        Initializes the csv file
        """
        self.filename = filename
        self.delimiter = delimiter
        self.quotechar = quotechar

    def read_csv(self):
        """
        Prints the csv file in a simple manner. Not much can be done with this.
        """
        with open(self.filename, 'r') as csv_file:
            csv_table = csv.reader(csv_file, delimiter=self.delimiter, quotechar=self.quotechar)
            for row in csv_table:
                print(', '.join(row))

my_file = CsvFile(file)
my_file.read_csv()  # no I/O error now: the file is reopened on each call

Transferring CSV data into different Functions in Python

I need some help. Basically, I have to create a function to read a csv file, then I have to transfer this data into another function that uses the data to generate an xml file.
Here is my code:
import csv
from xml.etree.ElementTree import Element, SubElement, Comment, tostring
from xml.etree.ElementTree import ElementTree
import xml.etree.ElementTree as etree

def read_csv():
    with open('1250_12.csv', 'r') as data:
        reader = csv.reader(data)
        return reader

def generate_xml(reader):
    root = Element('Solution')
    root.set('version', '1.0')
    tree = ElementTree(root)
    head = SubElement(root, 'DrillHoles')
    head.set('total_holes', '238')
    description = SubElement(head, 'description')
    current_group = None
    i = 0
    for row in reader:
        if i > 0:
            x1, y1, z1, x2, y2, z2, cost = row
            if current_group is None or i != current_group.text:
                current_group = SubElement(description, 'hole', {'hole_id': "%s" % i})
                information = SubElement(current_group, 'hole', {'collar': ', '.join((x1, y1, z1)),
                                                                 'toe': ', '.join((x2, y2, z2)),
                                                                 'cost': cost})
        i += 1

def main():
    reader = read_csv()
    generate_xml(reader)

if __name__ == '__main__':
    main()
but I get an error when I try to pass reader; the error is: ValueError: I/O operation on closed file
Turning the reader into a list should work:
def read_csv():
    with open('1250_12.csv', 'r') as data:
        return list(csv.reader(data))
You tried to read from a closed file. list will trigger the reader to read the whole file.
The with statement tells Python to clean up the context manager (in this case, a file) once control exits its body. Since functions exit when they return, there's no way to hand the reader back to the caller with the file still open.
Other answers suggest reading the whole thing into a list, and returning that; this works, but may be awkward if the file is very large.
Fortunately, we can use generators:
def read_csv():
    with open('1250_12.csv', 'r') as data:
        reader = csv.reader(data)
        for row in reader:
            yield row
Since we yield from inside the with, we don't have to clean up the file before getting some rows. Once the data is consumed (or the generator is itself cleaned up), the file will be closed.
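Consuming the generator looks the same as consuming a list; for instance, building on the question's own functions:

# iterate the rows lazily; the file stays open only while rows are being read
for row in read_csv():
    print(row)

# or pass the generator straight to generate_xml(), which iterates it once
generate_xml(read_csv())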
When you read a csv file, it is important to put its contents into a list. This is because most operations cannot be performed on the csv.reader object repeatedly: once you have looped through it and it has reached the end of the file, you can no longer do anything with it unless you open and read the file again. So let's just change your read_csv function:
def read_csv():
    with open('1250_12.csv', 'r') as data:
        reader = csv.reader(data)
        x = [row for row in reader]
        return x
Now you are manipulating a list and everything should work perfectly!

Python CSV DictReader with UTF-8 data

AFAIK, the Python (v2.6) csv module can't handle unicode data by default, correct? In the Python docs there's an example on how to read from a UTF-8 encoded file. But this example only returns the CSV rows as a list.
I'd like to access the row columns by name as it is done by csv.DictReader but with UTF-8 encoded CSV input file.
Can anyone tell me how to do this in an efficient way? I will have to process CSV files that are hundreds of MB in size.
I came up with an answer myself:
def UnicodeDictReader(utf8_data, **kwargs):
    csv_reader = csv.DictReader(utf8_data, **kwargs)
    for row in csv_reader:
        yield {unicode(key, 'utf-8'): unicode(value, 'utf-8') for key, value in row.iteritems()}
Note: This has been updated so keys are decoded per the suggestion in the comments
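Usage mirrors csv.DictReader; for example (Python 2, with a hypothetical file name):

with open('data.csv', 'rb') as f:                  # binary mode, as the Python 2 csv module expects
    for row in UnicodeDictReader(f, delimiter=','):
        print(row)                                 # dict with unicode keys and values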
For me, the key was not in manipulating the csv DictReader args, but the file opener itself. This did the trick:
with open(filepath, mode="r", encoding="utf-8-sig") as csv_file:
    csv_reader = csv.DictReader(csv_file)
No special class required. Now I can open files either with or without BOM without crashing.
First of all, use the 2.6 version of the documentation. It can change for each release. It says clearly that it doesn't support Unicode but it does support UTF-8. Technically, these are not the same thing. As the docs say:
The csv module doesn’t directly support reading and writing Unicode, but it is 8-bit-clean save for some problems with ASCII NUL characters. So you can write functions or classes that handle the encoding and decoding for you as long as you avoid encodings like UTF-16 that use NULs. UTF-8 is recommended.
The example below (adapted from the docs) shows how to create two functions that correctly read UTF-8 text as CSV. Note that csv.reader() returns each row as a plain list of cells, not a dict, so you would still need to wrap it for DictReader-style access by column name.
import csv

def unicode_csv_reader(unicode_csv_data, dialect=csv.excel, **kwargs):
    # csv.py doesn't do Unicode; encode temporarily as UTF-8:
    csv_reader = csv.reader(utf_8_encoder(unicode_csv_data),
                            dialect=dialect, **kwargs)
    for row in csv_reader:
        # decode UTF-8 back to Unicode, cell by cell:
        yield [unicode(cell, 'utf-8') for cell in row]

def utf_8_encoder(unicode_csv_data):
    for line in unicode_csv_data:
        yield line.encode('utf-8')
A class-based approach to @LMatter's answer; with this approach you still get all the benefits of DictReader, such as getting the fieldnames and getting the line number, plus it handles UTF-8:
import csv

class UnicodeDictReader(csv.DictReader, object):
    def next(self):
        row = super(UnicodeDictReader, self).next()
        return {unicode(key, 'utf-8'): unicode(value, 'utf-8') for key, value in row.iteritems()}
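Usage is then the same as plain csv.DictReader; a short sketch with a hypothetical file name (Python 2):

with open('data.csv', 'rb') as f:
    reader = UnicodeDictReader(f)
    print(reader.fieldnames)      # still available from csv.DictReader
    for row in reader:
        print(row)                # unicode keys and values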
That's easy with the unicodecsv package.
# pip install unicodecsv
import unicodecsv as csv

with open('your_file.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        print(row)
The csvw package has other functionality as well (for metadata-enriched CSV for the Web), but it defines a UnicodeDictReader class wrapping around its UnicodeReader class, which at its core does exactly that:
class UnicodeReader(Iterator):
    """Read Unicode data from a csv file."""
    […]

    def _next_row(self):
        self.lineno += 1
        return [
            s if isinstance(s, text_type) else s.decode(self._reader_encoding)
            for s in next(self.reader)]
It did catch me off guard a few times: csvw.UnicodeDictReader really, really needs to be used in a with block and breaks otherwise. Other than that, the module is nicely generic and compatible with both py2 and py3.
The answer doesn't have the DictWriter methods, so here is the updated class:
import csv
import codecs
import cStringIO

class DictUnicodeWriter(object):
    def __init__(self, f, fieldnames, dialect=csv.excel, encoding="utf-8", **kwds):
        self.fieldnames = fieldnames  # list of keys for the dict
        # Redirect output to a queue
        self.queue = cStringIO.StringIO()
        self.writer = csv.DictWriter(self.queue, fieldnames, dialect=dialect, **kwds)
        self.stream = f
        self.encoder = codecs.getincrementalencoder(encoding)()

    def writerow(self, row):
        self.writer.writerow({k: v.encode("utf-8") for k, v in row.items()})
        # Fetch UTF-8 output from the queue ...
        data = self.queue.getvalue()
        data = data.decode("utf-8")
        # ... and reencode it into the target encoding
        data = self.encoder.encode(data)
        # write to the target stream
        self.stream.write(data)
        # empty queue
        self.queue.truncate(0)

    def writerows(self, rows):
        for row in rows:
            self.writerow(row)

    def writeheader(self):
        header = dict(zip(self.fieldnames, self.fieldnames))
        self.writerow(header)
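A hypothetical usage example (Python 2; fieldnames and values must be unicode strings so the .encode("utf-8") call in writerow works):

with open('out.csv', 'wb') as f:
    writer = DictUnicodeWriter(f, fieldnames=[u'name', u'city'])
    writer.writeheader()
    writer.writerow({u'name': u'Jos\xe9', u'city': u'S\xe3o Paulo'})   # "José", "São Paulo"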
