I'm a novice programmer, so please pardon me for not using Python-specific vocabulary.
Suppose I define a class CarSpecs with attributes CarReg, Make, Model and Color, create several instances of this class (call them records) and one by one append them to a text file named SuperCars. What I want my program to do is to read the whole file and return the number of cars which are Red (i.e. by looking up the attribute Color of each instance).
Here's what I've done so far:
Defined a class:
class Carspecs(object):
def __init__(self, carreg, make, model, color):
self.CarReg = carreg
self.Make = make
self.Model = model
self.Color = color
Then I created several instances and defined a function to add the instances (or, if you like, "records") to SuperCars:
def addCar(CarRecord):
import pickle
CarFile = open('Supercars', 'ab')
pickle.dump(CarRecord, CarFile)
CarFile.close()
What do I do next to output the number of Red cars?
You'll have to open that file again, read all of the records, and then check which cars' Color attribute equals 'Red'. Because you're pickling each instance separately, you will have to do something like the following:
>>> with open('Supercars', 'rb') as f:
... data = []
... while True:
... try:
... data.append(pickle.load(f))
... except EOFError:
... break
...
>>>
>>> print(sum(1 for x in data if x.Color == 'Red'))
I suggest you store the data in a list and pickle that list; this way you don't have to use that hacky loop to get all the items. Storing such a list is easy. Assume you've created a list of CarSpecs objects and stored them in a list named records:
>>> with open('Supercars', 'wb') as f:
... pickle.dump(records, f)
...
>>>
and then reading it is as simple as:
>>> with open('Supercars', 'rb') as f:
... data = pickle.load(f)
...
>>>
And you can even filter it easily:
>>> with open('Supercars', 'rb') as f:
... data = [x for x in pickle.load(f) if x.Color == 'Red']
...
>>>
If you want to display the cars before pickling them, you can simply iterate over the records list and print the ones whose Color is red.
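For example, a small end-to-end sketch (assuming the Carspecs class defined above; the sample car data is made up purely for illustration) that pickles the whole records list and then counts the red cars:
import pickle

# a few sample Carspecs instances (values invented for illustration)
records = [
    Carspecs('AB12CDE', 'Ferrari', '488', 'Red'),
    Carspecs('FG34HIJ', 'Ford', 'Focus', 'Blue'),
    Carspecs('KL56MNO', 'Lamborghini', 'Huracan', 'Red'),
]

# store the whole list in one pickle
with open('Supercars', 'wb') as f:
    pickle.dump(records, f)

# read it back and count the red cars
with open('Supercars', 'rb') as f:
    cars = pickle.load(f)

print(sum(1 for car in cars if car.Color == 'Red'))  # -> 2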
Related
I defined a simple function and pickled it. However, when I deserialised it in another file, I couldn't load it back; I got an error.
Here is an example:
import pickle
def fnc(c=0):
a = 1
b = 2
return a,b,c
f = open('example', 'ab')
pickle.dump(fnc, f)
f.close()
f = open('example', 'rb')
fnc = pickle.load(f)
print(fnc)
print(fnc())
print(fnc(1))
Output:
<function fnc at 0x7f06345d7598>
(1, 2, 0)
(1, 2, 1)
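Note that pickle serialises a function by reference (its module and qualified name), not by its code, so it can only be unpickled where that module is importable; loading it from a different script therefore fails. A minimal sketch of that failure (the file name other_file.py is hypothetical):
# other_file.py -- run as a separate script
import pickle

with open('example', 'rb') as f:
    fnc = pickle.load(f)  # typically raises AttributeError: can't get attribute 'fnc' on module '__main__',
                          # because fnc was defined in the other script's __main__, which isn't importable here
print(fnc())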
You can also do it using the shelve module. I believe it still uses pickle to store data under the hood, but a very convenient feature is that it stores data as key-value pairs. For example, if you store an ML model, you can store the training data and/or feature column names along with the model itself, which makes it more convenient.
import shelve
def func(a, b):
return a+b
# Now store function
with shelve.open('foo.shlv', 'c') as shlv:  # 'c' creates the file if it doesn't exist
shlv['function'] = func
# Load function
with shelve.open('foo.shlv', 'r') as shlv:
x = shlv['function']
print(x(2, 3))
I am working on an assignment where I create "instances" of cities using rows in a .csv, then use these instances in methods to calculate distance and population change. Creating the instances works fine (using steps 1-4 below), until I try to call printDistance:
##Step 1. Open and read CityPop.csv
with open('CityPop.csv', 'r', newline='') as f:
try:
reader = csv.DictReader(f)
##Step 2. Create "City" class
class City:
##Step 3. Use _init method to assign attribute values
def __init__(self, row, header):
self.__dict__ = dict(zip(header, row))
##Step 4. Create "Cities" list
data = list(csv.reader(open('CityPop.csv')))
instances = [City(i, data[0]) for i in data[1:]]
##Step 5. Create printDistance method within "Cities" class
def printDistance(self, othercity, instances):
dist=math.acos((math.sin(math.radians(self.lat)))*(math.sin(math.radians(othercity.lat)))+(math.cos(math.radians(self.lat)))*(math.cos(math.radians(othercity.lat)))*(math.cos(math.radians(self.lon-othercity.lon)))) * 6300 (self.lat, self.lon, othercity.lat, othercity.lon)
When I enter instances[0].printDistance(instances[1]) in the shell, I get the error:
`NameError: name 'instances' is not defined`
Is this an indentation problem? Should I be calling the function from within the code, not the shell?
Nested functions should not take self as a parameter, because they are not methods of the class; the class cannot pass the instance to them. In fact you are just reusing the same self from the enclosing function inside the nested one.
Also, don't nest things inside the constructor; it is only there to initialise the instance. Make printDistance a separate method instead.
And create the instance variable inside the constructor; that is what __init__ is for:
self.instances = [self.getInstance(i, data[0]) for i in data[1:]]
Also, create a separate classmethod for instantiation:
@classmethod
def getInstance(cls,d1,d2):
return cls(d1,d2)
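Putting that advice together, a rough sketch (one possible arrangement; the Cities container class and the attribute access below are illustrative, not your exact code):
import csv

class City:
    def __init__(self, row, header):
        self.__dict__ = dict(zip(header, row))

    @classmethod
    def getInstance(cls, row, header):
        return cls(row, header)

class Cities:
    def __init__(self, filename):
        with open(filename, newline='') as f:
            data = list(csv.reader(f))
        # instance variable created inside the constructor, as suggested above
        self.instances = [City.getInstance(row, data[0]) for row in data[1:]]

cities = Cities('CityPop.csv')
print(cities.instances[0].__dict__)  # attributes taken from the header row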
This is not so much an indentation problem, but more of a general code structure problem. You're nesting a lot:
All the actual work on an incredibly long line (with errors)
Inside of function (correctly) printDistance
Inside of a constructor __init__
Inside of a class definition (correctly) City
Inside of a try block
Inside of a with block
I think this is what you are trying to do:
create a class City, which can print the distance of itself to other cities
generate a list of these City objects from a .csv that somehow has both distances and population (you should probably provide an example of data)
do so in a fault-tolerant and clean way (hence the try and the with)
The reason your instances isn't working is because, unlike you think, it's probably not being created correctly, or at least not in the correct context. And it certainly won't be available to you on the CLI due to all of the nesting.
There's a number of blatant bugs in your code:
What's the (self.lat, self.lon, othercity.lat, othercity.lon) at the end of the last line?
Why are you opening the file for reading twice? You're not even using the first reader
You are bluntly assigning column headers from a .csv as object attributes, but are misspelling their use (lat instead of latitude and lon instead of longitude)
It looks a bit like a lot of code found in various places got pasted together into one clump - this is what it looks like when cleaned up:
import csv
import math
class City:
def print_distance(self, other_city):
print(f'{self.city} to {other_city.city}')
# what a mess...
print(math.acos(
(math.sin(math.radians(float(self.latitude)))) * (math.sin(math.radians(float(other_city.latitude)))) + (
math.cos(math.radians(float(self.latitude)))) * (math.cos(math.radians(float(other_city.latitude)))) * (
math.cos(math.radians(float(self.longitude) - float(other_city.longitude))))) * 6300)
def __init__(self, values, attribute_names):
# this is *nasty* - much better to add the attributes explicitly, but left as original
# also, note that you're reading strings and floats here, but they are all stored as str
self.__dict__ = dict(zip(attribute_names, values))
with open('CityPop.csv', 'r', newline='') as f:
try:
reader = csv.reader(f)
header = next(reader)
cities = [City(row, header) for row in reader]
for city_1 in cities:
for city_2 in cities:
city_1.print_distance(city_2)
except Exception as e:
        print(f"Apparently we're doing something with this error: {e}")
Note how print_distance is now a method of City, which is called on each instance of City in cities (which is what I renamed instances to).
Now, if you are really trying, this makes more sense:
import csv
import math
class City:
def print_distance(self, other_city):
print(f'{self.name} to {other_city.name}')
# not a lot better, but some at least
print(
math.acos(
math.sin(math.radians(self.lat)) *
math.sin(math.radians(other_city.lat))
+
math.cos(math.radians(self.lat)) *
math.cos(math.radians(other_city.lat)) *
math.cos(math.radians(self.lon - other_city.lon))
) * 6300
)
def __init__(self, lat, lon, name):
self.lat = float(lat)
self.lon = float(lon)
self.name = str(name)
try:
with open('CityPop.csv', 'r', newline='') as f:
reader = csv.reader(f)
header = next(reader)
cities = [City(lat=row[1], lon=row[2], name=row[4]) for row in reader]
for city_1 in cities:
for city_2 in cities:
city_1.print_distance(city_2)
except FileNotFoundError:
print(f'Could not find the input file.')
Note the cleaned-up computation, the catching of an error that could be expected to occur (with the with block inside the try block), and a proper constructor that assigns what it needs with the correct type, while the reader decides which fields go where.
Finally, as a bonus: nobody should be writing distance calculations like this. Plenty of libraries exist that do a much better job of this, like GeoPy. All you need to do is pip install geopy to get it and then you can use this:
import csv
import geopy.distance
class City:
def calc_distance(self, other_city):
return geopy.distance.geodesic(
(self.lat, self.lon),
(other_city.lat, other_city.lon)
).km
def __init__(self, lat, lon, name):
self.lat = float(lat)
self.lon = float(lon)
self.name = str(name)
try:
with open('CityPop.csv', 'r', newline='') as f:
reader = csv.reader(f)
header = next(reader)
cities = [City(lat=row[1], lon=row[2], name=row[4]) for row in reader]
for city_1 in cities:
for city_2 in cities:
print(city_1.calc_distance(city_2))
except FileNotFoundError:
print(f'Could not find the input file.')
Note that I moved the print out of the method as well, since it makes more sense to calculate in the object and print outside it. The nice thing about all this is that the calculation now uses a proper geodesic (WGS-84) to do the calculation and the odds of math errors are drastically reduced. If you must use a simple sphere, the library has functions for that as well.
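For example, a quick sketch of the spherical option using geopy's great_circle (the helper name great_circle_km is just made up for illustration):
import geopy.distance

def great_circle_km(lat1, lon1, lat2, lon2):
    # great-circle distance on a simple sphere, instead of the WGS-84 geodesic
    return geopy.distance.great_circle((lat1, lon1), (lat2, lon2)).km

print(great_circle_km(52.37, 4.90, 48.86, 2.35))  # Amsterdam to Paris, roughly 430 km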
Say I have an object called obj that is initialized and has 50 member variables. I want to write this to a row in a csv file without having to type out every member variable of the object. The long winded way to do this I want to avoid would be:
writerPosts.writerow([obj.var1, obj.var2, obj.var3 .........................])
I want to do something instead that looks like this and achieves the same result:
writerPosts.writerow(obj)
How can this be done?
One way of getting all of an object's fields/variables is as follows:
members = [attr for attr in dir(obj) if not callable(getattr(obj, attr)) and not attr.startswith("__")]
Here members is a list of all the instance's variables. The list comprehension filters out any callable attributes (i.e. the methods of the class) and any names starting with a double underscore (the built-in special attributes).
Then it is simple to write them to your csv file.
with open("output.csv",'wb') as resultFile:
wr = csv.writer(resultFile, dialect='excel')
wr.writerows(members)
This will write the variable names (not their values) to the csv file as a single row.
EDIT:
If you want to write the values of the variables instead, you can do the following:
values = [getattr(obj, member) for member in members]
The values list will hold the value of each of those fields.
Then you can write this list to the csv file in the same way; a combined sketch follows.
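Putting both steps together, a minimal sketch (assuming obj is an instance of your class with its 50 fields):
import csv

members = [attr for attr in dir(obj)
           if not callable(getattr(obj, attr)) and not attr.startswith("__")]

with open("output.csv", 'w', newline='') as resultFile:
    wr = csv.writer(resultFile, dialect='excel')
    wr.writerow(members)                                       # header row: the field names
    wr.writerow([getattr(obj, member) for member in members])  # data row: the field values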
I'm going to assume you have a number of these objects and write a little example. vars() can be used to get the member variables of a class instance:
import csv
class Class:
def __init__(self,a,b,c,d,e):
self.var1 = a
self.var2 = b
self.var3 = c
self.var4 = d
self.var5 = e
# Create a few objects with different member values...
L = [Class(1,2,3,4,5),Class(2,3,4,5,6),Class(3,4,5,6,7)]
with open('out.csv','w',newline='') as f:
# fieldnames lists the headers for the csv.
w = csv.DictWriter(f,fieldnames=sorted(vars(L[0])))
w.writeheader()
for obj in L:
# Build a dictionary of the member names and values...
        w.writerow({k: getattr(obj, k) for k in vars(obj)})
Output:
var1,var2,var3,var4,var5
1,2,3,4,5
2,3,4,5,6
3,4,5,6,7
Here is my code; I use it to open an Excel sheet and then return each row as a list of strings (where each cell is a string). The class returns one list that is filled with as many lists as there are rows in the file, so 50 rows will return 50 lists.
from xlrd import open_workbook
class ExcelReadLines(object):
def __init__(self,path_to_file):
'''Accepts the Excel File'''
self.path_to_file = path_to_file
self.__work__()
def __work__(self):
self.full_file_as_read_lines = []
self.book = open_workbook(self.path_to_file)
self.sheet = self.book.sheet_by_index(0)
for row_index in range(self.sheet.nrows):
single_read_lines = []
for col_index in range(self.sheet.ncols):
cell_value_as_string = str(self.sheet.cell(row_index,col_index).value)
cell_value_stripped = cell_value_as_string.strip('u')
single_read_lines.append(cell_value_stripped)
self.full_file_as_read_lines.append(single_read_lines)
return self.full_file_as_read_lines
But when I run:
for x in ExcelReadLines('excel_sheet'): print x
I get the error message:
class is not iterable
In order for a class to be iterable, it needs to have an __iter__ method.
Consider:
class Foo(object):
def __init__(self,lst):
self.lst = lst
def __iter__(self):
return iter(self.lst)
example:
>>> class Foo(object):
... def __init__(self,lst):
... self.lst = lst
... def __iter__(self):
... return iter(self.lst)
...
>>> Foo([1,2,3])
<__main__.Foo object at 0xe9890>
>>> for x in Foo([1,2,3]): print x
...
1
2
3
Your example seems like it would be a good bit better as a generator -- I don't really understand what the need is for a class here:
def excel_reader(path_to_file):
book = open_workbook(path_to_file)
sheet = book.sheet_by_index(0)
for row_index in range(sheet.nrows):
single_read_lines = []
for col_index in range(sheet.ncols):
            cell_value_as_string = str(sheet.cell(row_index, col_index).value)
cell_value_stripped = cell_value_as_string.strip('u')
single_read_lines.append(cell_value_stripped)
yield single_read_lines
You should look into implementing Python's special iterator methods.
Also, note that you shouldn't name a method __work__ since it uses magic method syntax but isn't actually a real magic method.
You have a few problems here.
Your code doesn't return anything. You call __work__ but don't return the value.
Even if it did, that wouldn't help, because returning something from __init__ doesn't make the object be that thing.
You don't want your object to be a list anyway, you just want to iterate over it.
See this question for a simple example of how to write an iterator in Python.
In addition, you shouldn't use double-underscore-sandwich names like __work__ in your code. That sort of name is by convention reserved for Python-internal use.
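Applied to your class, a rough sketch (keeping your xlrd-based reading, renaming __work__ and adding __iter__) might look like this:
from xlrd import open_workbook

class ExcelReadLines(object):
    def __init__(self, path_to_file):
        self.path_to_file = path_to_file
        self.full_file_as_read_lines = []
        self._read_rows()  # renamed from __work__; dunder names are reserved for Python

    def _read_rows(self):
        book = open_workbook(self.path_to_file)
        sheet = book.sheet_by_index(0)
        for row_index in range(sheet.nrows):
            row = [str(sheet.cell(row_index, col_index).value)
                   for col_index in range(sheet.ncols)]
            self.full_file_as_read_lines.append(row)

    def __iter__(self):
        # makes `for x in ExcelReadLines(...)` iterate over the stored rows
        return iter(self.full_file_as_read_lines)

for x in ExcelReadLines('excel_sheet'):
    print(x)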
Unless I'm mistaken, what you're really after is
def first_sheet(fname):
wb = xlrd.open_workbook(fname)
ws = wb.sheet_by_index(0)
for i in xrange(ws.nrows):
yield ws.row_values(i) # maybe strip 'u''s - but that looks a bit sus... (probably something to do with your `str`)
list_of_rows = list(first_sheet('somefile.xls'))
Then do any transposition using zip if needs be...
What is the easiest way to save and load data in python, preferably in a human-readable output format?
The data I am saving/loading consists of two vectors of floats. Ideally, these vectors would be named in the file (e.g. X and Y).
My current save() and load() functions use file.readline(), file.write() and string-to-float conversion. There must be something better.
The simplest way to get human-readable output is by using a serialisation format such as JSON. Python ships with a json library you can use to serialise data to and from a string. Like pickle, you can use it with an IO object to write the data to a file.
import json
file = open('/usr/data/application/json-dump.json', 'w+')
data = { "x": 12153535.232321, "y": 35234531.232322 }
json.dump(data, file)
file.close()
If you want to get a simple string back instead of dumping it to a file, you can use json.dumps() instead:
import json
print json.dumps({ "x": 12153535.232321, "y": 35234531.232322 })
Reading back from a file is just as easy:
import json
file = open('/usr/data/application/json-dump.json', 'r')
print json.load(file)
The json library is full-featured, so I'd recommend checking out the documentation to see what sorts of things you can do with it.
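Applied to the question's two named vectors, a short sketch (the file name vectors.json is just an example):
import json

X = [1.0, 1.5, 2.0, 2.5, 3.0]
Y = [1.0, 2.25, 4.0, 6.25, 9.0]

# save both vectors under their names
with open('vectors.json', 'w') as f:
    json.dump({"X": X, "Y": Y}, f, indent=2)

# load them back
with open('vectors.json') as f:
    loaded = json.load(f)
X, Y = loaded["X"], loaded["Y"]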
There are several options -- I don't exactly know what you like. If the two vectors have the same length, you could use numpy.savetxt() to save your vectors, say x and y, as columns:
import numpy

# saving:
f = open("data", "w")
f.write("# x y\n")  # column names
numpy.savetxt(f, numpy.array([x, y]).T)
f.close()

# loading:
x, y = numpy.loadtxt("data", unpack=True)
If you are dealing with larger vectors of floats, you should probably use NumPy anyway.
If it should be human-readable, I'd also go with JSON. Unless you need to exchange it with enterprise-type people, they like XML better. :-)
If it should be human-editable and isn't too complex, I'd probably go with some sort of INI-like format, for example via configparser (see the sketch after this list).
If it is complex and doesn't need to be exchanged, I'd go with just pickling the data, unless it's very complex, in which case I'd use ZODB.
If it's a LOT of data and it needs to be exchanged, I'd use SQL.
That pretty much covers it, I think.
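For instance, a minimal sketch of the INI-style option with configparser, storing the two vectors as comma-separated values (the section and key names here are made up):
import configparser

x = [1.0, 1.5, 2.0]
y = [1.0, 2.25, 4.0]

# save
config = configparser.ConfigParser()
config['vectors'] = {
    'X': ', '.join(str(v) for v in x),
    'Y': ', '.join(str(v) for v in y),
}
with open('data.ini', 'w') as f:
    config.write(f)

# load
config = configparser.ConfigParser()
config.read('data.ini')
x = [float(v) for v in config['vectors']['X'].split(',')]
y = [float(v) for v in config['vectors']['Y'].split(',')]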
A simple serialization format that is easy for both humans and computers to read is JSON.
You can use the json Python module.
Here is an example of the encoder you would probably want to write for the Body class:
# add this to your code
import json
import numpy as np

class BodyEncoder(json.JSONEncoder):
def default(self, obj):
if isinstance(obj, np.ndarray):
return obj.tolist()
if hasattr(obj, '__jsonencode__'):
return obj.__jsonencode__()
if isinstance(obj, set):
return list(obj)
return obj.__dict__
# Here you construct your way to load (reconstruct) your data for each instance
# you need to customize this function
def deserialize(data):
bodies = [Body(d["name"],d["mass"],np.array(d["p"]),np.array(d["v"])) for d in data["bodies"]]
axis_range = data["axis_range"]
timescale = data["timescale"]
return bodies, axis_range, timescale
# Here you construct your way to dump (save) your data for each instance
# you need to customize this function
def serialize(data):
    with open(FILE_NAME, 'w+') as file:
        json.dump(data, file, cls=BodyEncoder, indent=4)
print("Dumping Parameters of the Latest Run")
print(json.dumps(data, cls=BodyEncoder, indent=4))
Here is an example of the class I want to serialize:
class Body(object):
# you do not need to change your class structure
def __init__(self, name, mass, p, v=(0.0, 0.0, 0.0)):
# init variables like normal
self.name = name
self.mass = mass
self.p = p
self.v = v
self.f = np.array([0.0, 0.0, 0.0])
def attraction(self, other):
# not important functions that I wrote...
Here is how to serialize:
# you need to customize this function
def serialize_everything():
bodies, axis_range, timescale = generate_data_to_serialize()
data = {"bodies": bodies, "axis_range": axis_range, "timescale": timescale}
    serialize(data)
Here is how to load it back:
def load_everything():
    with open(FILE_NAME, "r") as file:
        data = json.loads(file.read())
    return deserialize(data)
Since we're talking about a human editing the file, I assume we're talking about relatively little data.
How about the following skeleton implementation? It simply saves the data as key=value pairs and works with lists, tuples and many other things.
def save(fname, **kwargs):
f = open(fname, "wt")
for k, v in kwargs.items():
print >>f, "%s=%s" % (k, repr(v))
f.close()
def load(fname):
ret = {}
for line in open(fname, "rt"):
k, v = line.strip().split("=", 1)
ret[k] = eval(v)
return ret
x = [1, 2, 3]
y = [2.0, 1e15, -10.3]
save("data.txt", x=x, y=y)
d = load("data.txt")
print d["x"]
print d["y"]
As I commented in the accepted answer, using numpy this can be done with a simple one-liner:
Assuming you have numpy imported as np (which is common practice),
np.savetxt('xy.txt', np.array([x, y]).T, fmt="%.3f", header="x y")
will save the data in the (optional) format and
x, y = np.loadtxt('xy.txt', unpack=True)
will load it.
The file xy.txt will then look like:
# x y
1.000 1.000
1.500 2.250
2.000 4.000
2.500 6.250
3.000 9.000
Note that the format string fmt=... is optional, but if the goal is human-readability it may prove quite useful. If used, it is specified using the usual printf-like codes (In my example: floating-point number with 3 decimals).