I am working on an assignment where I create "instances" of cities using rows in a .csv, then use these instances in methods to calculate distance and population change. Creating the instances works fine (using steps 1-4 below), until I try to call printDistance:
##Step 1. Open and read CityPop.csv
with open('CityPop.csv', 'r', newline='') as f:
    try:
        reader = csv.DictReader(f)
        ##Step 2. Create "City" class
        class City:
            ##Step 3. Use __init__ method to assign attribute values
            def __init__(self, row, header):
                self.__dict__ = dict(zip(header, row))
                ##Step 4. Create "Cities" list
                data = list(csv.reader(open('CityPop.csv')))
                instances = [City(i, data[0]) for i in data[1:]]
                ##Step 5. Create printDistance method within "Cities" class
                def printDistance(self, othercity, instances):
                    dist=math.acos((math.sin(math.radians(self.lat)))*(math.sin(math.radians(othercity.lat)))+(math.cos(math.radians(self.lat)))*(math.cos(math.radians(othercity.lat)))*(math.cos(math.radians(self.lon-othercity.lon)))) * 6300 (self.lat, self.lon, othercity.lat, othercity.lon)
When I enter `instances[0].printDistance(instances[1])` in the shell, I get the error:
`NameError: name 'instances' is not defined`
Is this an indentation problem? Should I be calling the function from within the code, not the shell?
Nested functions should not take self as a parameter, because they are not member functions; the class cannot pass instance variables to them. You are in fact just passing the same self from the parent function on to the child function.
Also, you must not nest other definitions inside the constructor; __init__ is only for initialization. Create a separate method instead.
And do create the instance variable inside the constructor; that is what __init__ is for:
self.instances = [self.getInstance(i, data[0]) for i in data[1:]]
Also, create a separate function for instantiation:
@classmethod
def getInstance(cls, d1, d2):
    return cls(d1, d2)
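For illustration, a minimal sketch of how those pieces could fit together, using the names from the question; note that the instance list is built after the class definition here, rather than inside __init__, so the sketch stays runnable:

import csv

class City:
    def __init__(self, row, header):
        self.__dict__ = dict(zip(header, row))

    @classmethod
    def getInstance(cls, row, header):
        # alternate constructor: simply delegates to __init__
        return cls(row, header)

with open('CityPop.csv', newline='') as f:
    data = list(csv.reader(f))

instances = [City.getInstance(row, data[0]) for row in data[1:]]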
This is not so much an indentation problem, but more of a general code structure problem. You're nesting a lot:
All the actual work on an incredibly long line (with errors)
Inside of function (correctly) printDistance
Inside of a constructor __init__
Inside of a class definition (correctly) City
Inside of a try block
Inside of a with block
I think this is what you are trying to do:
create a class City, which can print the distance of itself to other cities
generate a list of these City objects from a .csv that somehow has both coordinates and population data (you should probably provide an example of the data)
do so in a fault-tolerant and clean way (hence the try and the with)
The reason your instances isn't working is that, contrary to what you think, it's probably not being created correctly, or at least not in the correct context. And it certainly won't be available to you in the shell, due to all of the nesting.
There are a number of blatant bugs in your code:
What's the (self.lat, self.lon, othercity.lat, othercity.lon) at the end of the last line?
Why are you opening the file for reading twice? You're not even using the first reader
You are bluntly assigning column headers from a .csv as object attributes, but are misspelling their use (lat instead of latitude and lon instead of longitude)
It looks a bit like a lot of code found in various places got pasted together into one clump - this is what it looks like when cleaned up:
import csv
import math


class City:
    def print_distance(self, other_city):
        print(f'{self.city} to {other_city.city}')
        # what a mess...
        print(math.acos(
            (math.sin(math.radians(float(self.latitude)))) * (math.sin(math.radians(float(other_city.latitude)))) + (
                math.cos(math.radians(float(self.latitude)))) * (math.cos(math.radians(float(other_city.latitude)))) * (
                math.cos(math.radians(float(self.longitude) - float(other_city.longitude))))) * 6300)

    def __init__(self, values, attribute_names):
        # this is *nasty* - much better to add the attributes explicitly, but left as original
        # also, note that you're reading strings and floats here, but they are all stored as str
        self.__dict__ = dict(zip(attribute_names, values))


with open('CityPop.csv', 'r', newline='') as f:
    try:
        reader = csv.reader(f)
        header = next(reader)
        cities = [City(row, header) for row in reader]
        for city_1 in cities:
            for city_2 in cities:
                city_1.print_distance(city_2)
    except Exception as e:
        print(f"Apparently we're doing something with this error: {e}")
Note how print_distance is now a method of City, which is called on each instance of City in cities (which is what I renamed instances to).
Now, if you are really trying, this makes more sense:
import csv
import math


class City:
    def print_distance(self, other_city):
        print(f'{self.name} to {other_city.name}')
        # not a lot better, but some at least
        print(
            math.acos(
                math.sin(math.radians(self.lat)) *
                math.sin(math.radians(other_city.lat))
                +
                math.cos(math.radians(self.lat)) *
                math.cos(math.radians(other_city.lat)) *
                math.cos(math.radians(self.lon - other_city.lon))
            ) * 6300
        )

    def __init__(self, lat, lon, name):
        self.lat = float(lat)
        self.lon = float(lon)
        self.name = str(name)


try:
    with open('CityPop.csv', 'r', newline='') as f:
        reader = csv.reader(f)
        header = next(reader)
        cities = [City(lat=row[1], lon=row[2], name=row[4]) for row in reader]
        for city_1 in cities:
            for city_2 in cities:
                city_1.print_distance(city_2)
except FileNotFoundError:
    print('Could not find the input file.')
Note the cleaned-up computation, the catching of an error that could be expected to occur (with the with block inside the try block), and a proper constructor that assigns what it needs with the correct type, while the reader decides which fields go where.
Finally, as a bonus: nobody should be writing distance calculations like this. Plenty of libraries exist that do a much better job of it, like GeoPy. All you need to do is pip install geopy to get it, and then you can use this:
import csv

import geopy.distance


class City:
    def calc_distance(self, other_city):
        return geopy.distance.geodesic(
            (self.lat, self.lon),
            (other_city.lat, other_city.lon)
        ).km

    def __init__(self, lat, lon, name):
        self.lat = float(lat)
        self.lon = float(lon)
        self.name = str(name)


try:
    with open('CityPop.csv', 'r', newline='') as f:
        reader = csv.reader(f)
        header = next(reader)
        cities = [City(lat=row[1], lon=row[2], name=row[4]) for row in reader]
        for city_1 in cities:
            for city_2 in cities:
                print(city_1.calc_distance(city_2))
except FileNotFoundError:
    print('Could not find the input file.')
Note that I moved the print out of the method as well, since it makes more sense to calculate in the object and print outside it. The nice thing about all this is that the calculation now uses a proper geodesic (WGS-84) to do the calculation and the odds of math errors are drastically reduced. If you must use a simple sphere, the library has functions for that as well.
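For reference, a minimal sketch of that spherical variant; the coordinates here are arbitrary example values, not taken from the question's data:

import geopy.distance

# great-circle distance on a simple sphere, as opposed to the WGS-84 geodesic above
d = geopy.distance.great_circle((52.37, 4.90), (48.86, 2.35))
print(d.km)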
What I'm trying to do is allow any number of attributes to be supplied to a function. This function will handle creating a class based on those attributes. Then, I've got another function that will handle importing data from a text file, applying the generated class to each item, and adding it to a list. Below is what I have.
def create_class(attributes):
    class classObject:
        def __init__(self, **attributes):
            for attr in attributes.keys():
                self.__dict__[attr] = attributes[attr]
    return classObject


def file_to_list(file, attributes):
    classObject = create_class(attributes)
    with open(file, "r") as f:
        var = []
        for line in f.readlines():
            var.append(classObject(line))
    return var


data = file_to_list("file.txt", ["propA", "propB"])
The issue is with how I'm trying to add the item to the list. Normally, I wouldn't have any issue, but I believe the way in which I'm creating the class is causing issues with how I usually do it.
File "file.py", line 17, in file_to_list
var.append(classObject(line))
TypeError: init() takes 1 positional argument but 2 were given
How do I loop through each of the attributes of the class, so that I can set the value for each and add it to the list?
UPDATE:
Below is an example of what file.txt looks like.
1A,1B
2A,2B
3A,3B
It looks like your class generation is wrong. You appear to want to be able to do:
Cls = create_class(["some", "attributes", "go", "here"])
and end up with a class object that looks like:
class Cls(object):
    def __init__(self, some, attributes, go, here):
        self.some = some
        self.attributes = attributes
        self.go = go
        self.here = here
but what you're actually doing is creating a class that takes a dictionary, and gives that dictionary dot-syntax.
>>> obj = Cls({"different": "attributes", "go": "here"})
>>> obj.different
"attributes"
>>> obj.go
"here"
You can implement the former with:
import typing

def create_class(attributes: typing.List[str]):
    class gen_class(object):
        def __init__(self, *args):
            if len(args) != len(attributes):
                # how do you handle the case where the caller specifies fewer or more
                # arguments than the generated class expects? I would throw a...
                raise ValueError(f"Wrong number of arguments (expected {len(attributes)}, got {len(args)}).")
            for attr, value in zip(attributes, args):
                setattr(self, attr, value)
    return gen_class
Then you should be able to use csv.reader to read in your file and instantiate those classes.
import csv

CSV_Cls = create_class(["propA", "propB"])

with open(file) as f:
    reader = csv.reader(f)
    data = [CSV_Cls(*row) for row in reader]
However, it does seem that writing your own code generator to make that class is the wrong choice here. Why not use a collections.namedtuple instead?
from collections import namedtuple

CSV_Cls = namedtuple("CSV_Cls", "propA propB")

with open(file) as f:
    reader = csv.reader(f)
    data = [CSV_Cls(*row) for row in reader]
This stdlib codegen is already written, known to work (and heavily tested), and won't accidentally introduce errors. The only reason to prefer a hand-rolled class is if you need to tightly couple some behavior to the data, or if you need a mutable data structure.
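If mutability is the deciding factor, the stdlib covers that case too; here is a short sketch, not part of the original answer, using dataclasses.make_dataclass with the question's field names:

from dataclasses import make_dataclass

CSV_Cls = make_dataclass("CSV_Cls", ["propA", "propB"])

row = CSV_Cls("1A", "1B")
row.propB = "2B"  # unlike a namedtuple, fields can be reassigned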
First, why not use type for this instead? It's the default metaclass, i.e. a callable that creates class objects. The class dict will be the third argument, which makes it easy to create programmatically.
type(name, (), attributes)
(You probably don't need any base classes, but that's what the second argument is for.)
Second, your __init__ doesn't appear to accept a str, which is the only thing you can get from readlines(). It takes only self (implied) and keyword arguments.
You could perhaps convert the line str to a dict (but that depends on what's in it), and then use the dict as your kwargs, like classObject(**kwargs), but then there's probably no point in declaring it with stars in the __init__ method in the first place.
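Putting those two points together, a hedged sketch of what a type()-based generator could look like, assuming the comma-separated file from the question's update:

import csv

def create_class(name, attributes):
    def __init__(self, *values):
        for attr, value in zip(attributes, values):
            setattr(self, attr, value)
    # type(name, bases, namespace) builds the class object programmatically
    return type(name, (), {"__init__": __init__})

classObject = create_class("classObject", ["propA", "propB"])

with open("file.txt") as f:
    data = [classObject(*row) for row in csv.reader(f)]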
I find it difficult to combine smoothly chained iterators and resource management in Python.
It will probably be clearer with a concrete example:
I have this little program that works on a bunch of similar, yet different csv files. As they are shared with other co-workers, I need to open and close them frequently. Moreover, I need to transform and filter their content. So I have a lot of different functions of this kind:
def doSomething(fpath):
    with open(fpath) as fh:
        r = csv.reader(fh, delimiter=';')
        s = imap(lambda row: fn(row), r)
        t = ifilter(lambda row: test(row), s)
        for row in t:
            doTheThing(row)
That's nice and readable, but, as I said, I have a lot of those and I end up copy-pasting a lot more than I'd wish. But of course I can't refactor the common code into a function returning an iterator:
def iteratorOver(fpath):
    with open(fpath) as fh:
        r = csv.reader(fh, delimiter=';')
        return r  # oops! fh is closed by the time I use it
A first step to refactor the code would be to create another 'with-enabled' class:
def openCsv(fpath):
    class CsvManager(object):
        def __init__(self, fpath):
            self.fh = open(fpath)

        def __enter__(self):
            return csv.reader(self.fh, delimiter=';')

        def __exit__(self, type, value, traceback):
            self.fh.close()

    return CsvManager(fpath)
and then:
with openCsv('a_path') as r:
    s = imap(lambda row: fn(row), r)
    t = ifilter(lambda row: test(row), s)
    for row in t:
        doTheThing(row)
But I only reduced the boilerplate of each function by one step.
So what is the Pythonic way to refactor such code? My C++ background is getting in the way, I think.
You can use generators; these produce an iterable you can then pass to other objects. For example, a generator yielding all the rows in a CSV file:
def iteratorOver(fpath):
    with open(fpath) as fh:
        r = csv.reader(fh, delimiter=';')
        for row in r:
            yield row
Because a generator function pauses whenever you are not iterating over it, the function doesn't exit until the loop is complete and the with statement won't close the file.
You can now use that generator in a filter:
rows = iteratorOver('some path')
filtered = ifilter(test, rows)
etc.
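For completeness, a sketch of how the original doSomething could then shrink, assuming the question's fn, test and doTheThing helpers; in Python 3 the built-in map and filter play the role of imap and ifilter:

def doSomething(fpath):
    rows = iteratorOver(fpath)  # the generator keeps the file open while we iterate
    for row in filter(test, map(fn, rows)):
        doTheThing(row)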
If I call the company_at_node method (shown below) twice, it will only print a row for the first call. I thought maybe that I needed to seek back to the beginning of the reader for the next call, so I added
self.companies.seek(0)
to the end of the company_at_node method, but DictReader has no attribute seek. Since the file is never closed (and since I didn't get an error message to that effect), I didn't think this was a ValueError: I/O operation on closed file (about which there are numerous questions on SO).
Is there a way to return to the beginning of a DictReader to iterate through a second time (i.e. a second function call)?
class CSVReader:
    def __init__(self):
        f = open('myfile.csv')
        self.companies = csv.DictReader(f)

    def company_at_node(self, node):
        for row in self.companies:
            if row['nodeid'] == node:
                print row
        self.companies.seek(0)
You need to call seek(0) on the underlying file object f, not on the DictReader. Then you can modify your code so the method can access the file object. This should work:
class CSVReader:
    def __init__(self):
        self.f = open('myfile.csv')
        self.companies = csv.DictReader(self.f)

    def company_at_node(self, node):
        for row in self.companies:
            if row['nodeid'] == node:
                print row
        self.f.seek(0)
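One caveat worth knowing, which the answer above does not mention: after seek(0) the DictReader keeps its cached fieldnames, so on the next pass the header line comes back as an ordinary data row. A sketch of one way around that is to rewind the file and rebuild the reader on each call:

import csv

class CSVReader:
    def __init__(self):
        self.f = open('myfile.csv')

    def company_at_node(self, node):
        self.f.seek(0)                      # rewind the underlying file...
        for row in csv.DictReader(self.f):  # ...and build a fresh reader, so the header is parsed as a header
            if row['nodeid'] == node:
                print(row)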
In reader = csv.DictReader(f), the instance reader is an iterator. An iterator emits one unit of data on each explicit or implicit invocation of __next__ on it. That process is called consuming the iterator, and it can happen only once; this is how the iterator construct achieves its memory efficiency. So if you want random indexing, make a sequence out of it, like:
rows = list(reader)
I am developing an application where I have defined a "variable" object containing data in the form of a numpy array. These variables are linked to (netcdf) data files, and I would like to dynamically load the variable values when needed instead of loading all data from the sometimes huge files at the start.
The following snippet demonstrates the principle and works well, including access to data portions with slices. For example, you can write:
a = var() # empty variable
print a.values[7] # values have been automatically "loaded"
or even:
a = var()
a[7] = 0
However, this code still forces me to load the entire variable data at once. Netcdf (with the netCDF4 library) would allow me to directly access data slices from the file. Example:
f = netCDF4.Dataset(filename, "r")
print f.variables["a"][7]
I cannot use the netcdf variable objects directly, because my application is tied to a web service which cannot remember the netcdf file handler, and also because the variable data don't always come from netcdf files, but may originate from other sources such as OGC web services.
Is there a way to "capture" the slicing expression in the property or setter methods and use them? The idea would be to write something like:
@property
def values(self):
    if self._values is None:
        self._values = np.arange(10.)[slice]  # load from file ...
    return self._values
instead of the code below.
Working demo:
import numpy as np

class var(object):
    def __init__(self, values=None, metadata=None):
        if values is None:
            self._values = None
        else:
            self._values = np.array(values)
        self.metadata = metadata  # just to demonstrate that var has more than just values

    @property
    def values(self):
        if self._values is None:
            self._values = np.arange(10.)  # load from file ...
        return self._values

    @values.setter
    def values(self, values):
        self._values = values
First thought: Should I perhaps create values as a separate class and then use __getitem__? See In python, how do I create two index slicing for my own matrix class?
No, you cannot detect what will be done to the object after returning from .values. The result could be stored in a variable and only (much later on) be sliced, or sliced in different places, or used in its entirety, etc.
You indeed should instead return a wrapper object and hook into object.__getitem__; it would let you detect slicing and load data as needed. When slicing, Python passes in a slice() object.
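A minimal sketch of that wrapper idea (the names here are illustrative, not from the question's code): __getitem__ receives whatever appears between the brackets, including slice() objects, and can simply forward it to a loader:

import numpy as np

class LazyValues(object):
    def __init__(self, loader):
        self._loader = loader  # callable that fetches data for a given index or slice

    def __getitem__(self, index):
        # index is an int, a slice(), or a tuple of these for multidimensional access
        return self._loader(index)

vals = LazyValues(lambda idx: np.arange(10.)[idx])  # stand-in for the actual file read
print(vals[7])    # loads a single element
print(vals[2:5])  # loads a slice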
Thanks to the guidance of Martijn Pieters and with a bit more reading, I came up with the following code as demonstration. Note that the Reader class uses a netcdf file and the netCDF4 library. If you want to try out this code yourself you will either need a netcdf file with variables "a" and "b", or replace Reader with something else that will return a data array or a slice from a data array.
This solution defines three classes: Reader does the actual file I/O handling, Values manages the data access part and invokes a Reader instance if no data have been stored in memory, and var is the final "variable" which in real life will contain a lot more metadata. The code contains a couple of extra print statements for educational purposes.
"""Implementation of a dynamic variable class which can read data from file when needed or
return the data values from memory if they were read already. This concepts supports
slicing for both memory and file access."""
import numpy as np
import netCDF4 as nc
FILENAME = r"C:\Users\m.schultz\Downloads\data\tmp\MACC_20141224_0001.nc"
VARNAME = "a"
class Reader(object):
"""Implements the actual data access to variable values. Here reading a
slice from a netcdf file.
"""
def __init__(self, filename, varname):
"""Final implementation will also have to take groups into account...
"""
self.filename = filename
self.varname = varname
def read(self, args=slice(None, None, None)):
"""Read a data slice. Args is a tuple of slice objects (e.g.
numpy.index_exp). The default corresponds to [:], i.e. all data
will be read.
"""
with nc.Dataset(self.filename, "r") as f:
values = f.variables[self.varname][args]
return values
class Values(object):
def __init__(self, values=None, reader=None):
"""Initialize Values. You can either pass numerical (or other) values,
preferrably as numpy array, or a reader instance which will read the
values on demand. The reader must have a read(args) method, where
args is a tuple of slices. If no args are given, all data should be
returned.
"""
if values is not None:
self._values = np.array(values)
self.reader = reader
def __getattr__(self, name):
"""This is only be called if attribute name is not present.
Here, the only attribute we care about is _values.
Self.reader should always be defined.
This method is necessary to allow access to variable.values without
a slicing index. If only __getitem__ were defined, one would always
have to write variable.values[:] in order to make sure that something
is returned.
"""
print ">>> in __getattr__, trying to access ", name
if name == "_values":
print ">>> calling reader and reading all values..."
self._values = self.reader.read()
return self._values
def __getitem__(self, args):
print "in __getitem__"
if not "_values" in self.__dict__:
values = self.reader.read(args)
print ">>> read from file. Shape = ", values.shape
if args == slice(None, None, None):
self._values = values # all data read, store in memory
return values
else:
print ">>> read from memory. Shape = ", self._values[args].shape
return self._values[args]
def __repr__(self):
return self._values.__repr__()
def __str__(self):
return self._values.__str__()
class var(object):
def __init__(self, name=VARNAME, filename=FILENAME, values=None):
self.name = name
self.values = Values(values, Reader(filename, name))
if __name__ == "__main__":
# define a variable and access all data first
# this will read the entire array and save it in memory, so that
# subsequent access with or without index returns data from memory
a = var("a", filename=FILENAME)
print "1: a.values = ", a.values
print "2: a.values[-1] = ", a.values[-1]
print "3: a.values = ", a.values
# define a second variable, where we access a data slice first
# In this case the Reader only reads the slice and no data are stored
# in memory. The second access indexes the complete array, so Reader
# will read everything and the data will be stored in memory.
# The last access will then use the data from memory.
b = var("b", filename=FILENAME)
print "4: b.values[0:3] = ", b.values[0:3]
print "5: b.values[:] = ", b.values[:]
print "6: b.values[5:8] = ",b.values[5:8]
I'm working on a game project.
I've created an object, Star(Object).
I want to assign the name of the variables, dynamically, from a text file.
If I have a text file with:
Sol
Centauri
Vega
I want the program to create the Star(Object) with variable names from the text file. I want the process automated, because I'm looking to create hundreds of stars.
I could write the code out by hand:
Sol = Star(Sol)
Centauri = Star(Centauri)
Vega = Star(Vega)
But isn't there a way to automate this?
Essentially, what I eventually want is a tuple with the list of stars, as their own objects. Then, when I am doing game maintenance, I can just iterate over all the objects in the tuple.
The name of a star should not be the name of the variable. Variable names should reflect the context in which the variable is used, e.g. destinationStar or homeStar.
A star's name should be a property of the Star object, accessed via Star.name:
class Star(object):
    """Keeps track of a star."""
    def __init__(self, starName):
        self.name = starName
    # other methods...


def read_stars(filename):
    # oversimplified:
    stars = {}
    starfile = open(filename, "r")
    for line in starfile:
        words = line.split()
        if len(words) == 2 and words[0] == 'star':
            name = words[1]
            stars[name] = Star(name)
    return stars
By storing in a dictionary, you can search for a particular Star with stars[name] or iterate over all the stars with for s in stars.values(), for example.
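A hypothetical usage example, assuming a stars.txt whose lines look like star Sol:

stars = read_stars("stars.txt")

print(stars["Sol"].name)    # look up one star by name
for s in stars.values():    # or iterate over every star
    print(s.name)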
"I want to assign the name of the variables, dynamically": this is a very good indication that your design is completely wrong.
It's hard to know exactly what your design is, but I'm going to guess you want to use a dictionary instead.
class BadStar(Exception): pass


class Star(object):
    def __init__(self, name, mass, mag, color, x, y, z):
        self.name = name
        self.mass = float(mass)
        self.mag = float(mag)
        self.color = color
        self.pos = (float(x), float(y), float(z))

    @classmethod
    def fromstr(cls, s):
        "Alternate constructor from string"
        stardata = [i.strip() for i in s.split(',')]
        if len(stardata) == 7:
            return cls(*stardata)
        else:
            raise BadStar("wrong number of arguments in string constructor")

    def __str__(self):
        x, y, z = self.pos
        return "{0} is at ({1}, {2}, {3})".format(self.name, x, y, z)


class StarIndex(dict):
    def load(self, fname):
        "Load stars from text file"
        with open(fname, "r") as f:
            for line in f:
                line = line.split('#')[0]  # discard comments
                line = line.strip()        # kill excess whitespace
                if len(line):              # anything left?
                    try:
                        star = Star.fromstr(line)
                        self[star.name] = star
                    except BadStar:
                        pass  # discard lines that don't parse
        return self
and some sample data:
# Name, Mass, Absolute Magnitude, Color, x, y, z
#
# Mass is kg
# Color is rgb hex
# x, y, z are lightyears from earth, with +x to galactic center and +z to galactic north
Sol, 2.0e30, 4.67, 0xff88ee, 0.0, 0.0, 0.0
Alpha Centauri A, 2.2e30, 4.35, 0xfff5f1, -1.676, -1.360, -3.835
then you can load your file like:
s = StarIndex().load("stars.txt")
and
print s["Sol"]
results in
Sol is at (0.0, 0.0, 0.0)
Your question isn't clear. It's clouded by the fact that you're using syntax 'Star(Centauri)', which, in Python, means that you want to create a class called 'Star' that inherits from Centauri. I think what you want is probably a factory object that creates different stars, but then you don't say anything about how the stars might differ. Presumably, the difference is location, but you don't say how that's being handled either.
Best bet, based on guesses, might be to put your star configurations in a YAML file and use pyYAML to load it, which returns a Python data structure ready to go for you.
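A rough sketch of that idea, with a made-up stars.yaml mapping star names to their attributes; the Star constructor is assumed to take just a name, as in the question:

import yaml  # pip install pyyaml

with open("stars.yaml") as f:
    star_configs = yaml.safe_load(f)  # e.g. {"Sol": {...}, "Centauri": {...}, "Vega": {...}}

stars = {name: Star(name) for name in star_configs}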
def makeStar(starName):
    globals()[starName] = Star(globals()[starName])
makeStar("Sol")
is the same as
Sol = Star(Sol)
except "Sol" can be replaced with any string (eg the values read in from that file).
Also, you may want to rethink making these global variables - it prevents you from being able to iterate through all the stars, should you need to, and could possibly cause naming conflicts. If you want these to be in a dictionary, then just replace "globals()" with the name of your dictionary.
You probably should use a dictionary for that. It is possible to create dynamic variable names, but it would make no sense, since to access them you would need an indirect reference anyway.
stars = {}

with open("stars.txt") as stars_file:
    for star_name in stars_file:
        star_name = star_name.strip()
        stars[star_name] = Star(star_name)
You can use the types module to create class objects at run time:
import types

def make_class(name):
    cls = types.ClassType(name, (), {})
    return cls

cls = make_class("Star")
obj = cls()
In the above example, cls becomes your class Star
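Note, not part of the original answer: types.ClassType only exists in Python 2; in Python 3 you call type directly to the same effect:

def make_class(name):
    # type(name, bases, namespace) replaces types.ClassType in Python 3
    return type(name, (), {})

cls = make_class("Star")
obj = cls()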