I am new to Python and currently searching for some internship or a job. I am currently working on a program in Python which reads a file that contains data in this shape:
Id;name;surname;age;gender;friends;
Id and age are the positive integers,
gender can be "male" or "female",
and friends is an array of numbers, separated by comma, which represent the Id's of persons who are friends with the current person. If Person1 is a friend to a Person2, it must work vice versa.
As you can see in the example above, the attributes of a "Person" are separated by semicolons. The trick is that not every person has every attribute and, of course, the number of friends differs from person to person. So the first part of the task is to write a program that reads the file and creates a structure representing a list of persons with the attributes mentioned above. I have to be able to search for those persons by Id.
The second part is to make a function with two arguments (Id1, Id2) which returns True if a person with Id2 is a friend to a person with Id1. Otherwise, it returns false.
I have some ideas in mind, but I am not sure how to realize them, since I don't know enough about Python yet. I guess the best structure for this would be a dictionary, but I am not sure how to load a file into it, since the attributes of all persons are different. I would be grateful for any help you can offer me.
Here is my attempt to write the code:
people = open(r"data.txt")

class People:
    id = None
    name = ''
    surname = ''
    age = None
    gender = ['male', 'female']
    friends = []

    #def people(self):
    #    person = {'id': None,
    #              'name': '',
    #              'surname': '',
    #              'age': None,
    #              'gender': ['male', 'female'],
    #              'friends': []
    #              }
    #    return person

    def community(self):
        comm = [People()]
        return comm

def is_friend(id1, id2):
    if (id1 in People.friends) & (id2 in People.friends):
        return True

people.close()
Your question is too broad imho, but I'll give you a few hints:
The simplest data structure for O(1) (average-case) key access is indeed a dict. Note that a dict needs hashable (in practice, immutable) objects as keys, which is fine since your Ids are integers, but it can take anything as values. That only works for (relatively) small datasets, though, since it's all in memory. If you need bigger datasets and/or persistence, you want a database (key:value, relational, document, the choice is up to you).
Python has classes and computed attributes
In Python, the absence of a value is the None object
There's a CSV file parser (the csv module) in the standard lib.
Now you just have to read the doc and start coding.
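As a starting point, here is a minimal sketch of how those hints fit together. It is written for Python 3 and assumes that a missing attribute shows up as an empty field between semicolons and that each line ends with a trailing semicolon, as in the format shown in the question. The sample data and the `load_people` name are made up for illustration.

```python
import csv
import io

# Hypothetical sample data in the Id;name;surname;age;gender;friends; format.
SAMPLE = """1;John;Smith;30;male;2,3;
2;Ann;;25;female;1;
3;Bob;Jones;;male;1;
"""

def load_people(lines):
    """Return a dict mapping each Id to a dict of that person's attributes."""
    people = {}
    for row in csv.reader(lines, delimiter=";"):
        if not row:
            continue  # skip blank lines
        # Pad short rows so unpacking always finds six fields.
        pid, name, surname, age, gender, friends = (row + [""] * 6)[:6]
        people[int(pid)] = {
            "name": name or None,
            "surname": surname or None,
            "age": int(age) if age else None,
            "gender": gender or None,
            "friends": {int(f) for f in friends.split(",") if f.strip()},
        }
    return people

people = load_people(io.StringIO(SAMPLE))
print(people[2]["surname"])   # None, because the field was empty
print(people[1]["friends"])
```

With a real file, `load_people(open("data.txt", newline=""))` inside a `with` block would play the role of the `io.StringIO` object.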
[edit] wrt/ your code snippet
class People:
    id = None
    name = ''
    surname = ''
    age = None
    gender = ['male', 'female']
    friends = []
Python is not Java or PHP. What you defined above are class attributes (shared by all instances of the class); you want instance attributes (defined in the __init__() method). You should really read the FineManual.
Also if you're using Python 2.7.x, you want your classes to inherit from object (historical reasons).
So your Person class should look something like this:
class Person(object):
    def __init__(self, id, name, surname, age, gender, friends=None):
        self.id = id
        self.name = name
        self.surname = surname
        self.age = age
        self.gender = gender
        self.friends = friends or []
And then to create a Person instance:
person = Person(42, "John Cleese", "Archie Leach", 77, "male", [11, 1337])
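To see why a class-level `friends = []` like the one in the original snippet causes trouble, the following sketch compares the two kinds of attribute. The class names `Shared` and `PerInstance` are invented for the demonstration.

```python
class Shared(object):
    friends = []  # class attribute: one list shared by every instance

class PerInstance(object):
    def __init__(self):
        self.friends = []  # instance attribute: a fresh list per object

a, b = Shared(), Shared()
a.friends.append(2)
print(b.friends)  # b sees the change made through a

c, d = PerInstance(), PerInstance()
c.friends.append(2)
print(d.friends)  # d is unaffected
```

With the class attribute, every person would end up with the same list of friends, which is almost never what is wanted.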
def is_friend(id1, id2):
    if (id1 in People.friends) & (id2 in People.friends):
        return True
A few points here:
First: you either want to rename this function are_friends or make it a method of the Person class and then only pass a (single) Person instance (not an 'id') as argument.
Second: in Python, & is the bitwise AND operator. The logical "and" operator is spelled, well, and.
Third: an expression has a truth value by itself, so your if statement is redundant. Whenever you see something like:
def func():
    if <some expression>:
        return True
    else:
        return False
you can just rewrite it as:
def func():
    return <some expression>
Or if you want to ensure func returns a proper boolean (True or False):
def func():
    return bool(<some expression>)
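Putting these points together, a friendship check could look like the sketch below. It assumes the people are stored in a dict keyed by Id and that each person has a `friends` collection of Ids. The `are_friends` name, the cut-down `Person` class and the sample data are illustrative only.

```python
class Person(object):
    def __init__(self, id, friends=None):
        self.id = id
        self.friends = set(friends or [])

def are_friends(people, id1, id2):
    """Return True if id2 is listed as a friend of id1 (False for unknown Ids)."""
    person = people.get(id1)
    # A single boolean expression using the logical `and`, no if/else needed.
    return person is not None and id2 in person.friends

people = {1: Person(1, [2]), 2: Person(2, [1]), 3: Person(3)}
print(are_friends(people, 1, 2))
print(are_friends(people, 1, 3))
```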
I'll stop here because I don't intend to teach you how to program. You obviously need to do at least the full official Python tutorial, and possibly some complete beginner tutorial too.
Related
I am attempting to find the number of unique customers for each worker from a .json file. transactions["transactions"][a]["worker"] will return either Ben or David; these are the only workers and have previously been defined as objects of a class called Worker. In the for loop, I want the worker's name to be assigned to the variable wrkr, and the customer's name assigned to the variable cust. I then want to check if the customer is already in that worker's list of customers; if not, I will append the name of the customer to the list. If they are already in the list, I want the loop to move on to the next transaction.
Ben.customers gives the list of customers (initially empty), but if I set the variable wrkr from the JSON data and then do wrkr.customers, it gives me the error "AttributeError: 'unicode' object has no attribute 'customers'". I can see why: wrkr is just a string holding the name, not the Worker object. But I don't know what I should do instead.
import json

with open("transactions.json", "r") as f:
    transactions = json.load(f)

class Worker:
    def __init__(self, name, customers):
        self.name = name
        self.customers = customers

David = Worker("David", [])
Ben = Worker("Ben", [])

# Find the number of unique customers for each worker
for a in range(len(transactions["transactions"])):
    cust = transactions["transactions"][a]["customer"]
    wrkr = transactions["transactions"][a]["worker"]
    if cust in wrkr.customers:
        continue
    else:
        wrkr.customers.append(cust)
Gives me the error "AttributeError: 'unicode' object has no attribute 'customers'"
I want to find a workers name within the for loop and then load that worker's customer list.
I'm really sorry if my question doesn't make much sense or I'm using the wrong terminology. I'm self taught and don't really know what I'm doing.
You should create a dictionary whose keys are the worker names expected in the JSON and whose values are the Worker objects:
class Worker:
    def __init__(self, name, customers=None):
        self.name = name
        # If you want an empty list as a default parameter you can follow this pattern
        self.customers = customers or []

workers = {
    'David': Worker("David"),
    'Ben': Worker("Ben")
}

# transactions["transactions"] is a list (the question indexes it by position)
for transaction_details in transactions["transactions"]:
    cust = transaction_details["customer"]
    # Here you can get the Worker object from the dictionary using the worker name
    wrkr = workers.get(transaction_details["worker"])
    # Handle the case where the worker is not expected
    if wrkr is None:
        continue
    if cust not in wrkr.customers:
        wrkr.customers.append(cust)
Looking at your code, I am trying to guess what your data looks like, which is not easy, so if my solution does not work, please post what your data looks like. Here is what I have in mind:
import json
from pprint import pprint

class Worker:
    def __init__(self, name, customers):
        self.name = name
        self.customers = customers

    def __repr__(self):
        return 'Worker(name={}, customers={})'.format(self.name,
                                                      self.customers)

with open('transactions.json') as file_handle:
    data = json.load(file_handle)

#workers is a dictionary where key=worker name, value=Worker object
workers = {}
for transaction in data['transactions']:
    worker_name = transaction['worker']
    customer = transaction['customer']

    #Create a new worker object if needed
    workers.setdefault(worker_name, Worker(worker_name, set()))

    #Build the customers set
    workers[worker_name].customers.add(customer)

pprint(workers)
Output:
{'Ben': Worker(name=Ben, customers={'Lisa', 'Janet', 'Alex'}),
'David': Worker(name=David, customers={'Jason', 'Anna'})}
Notes
Instead of having two variables Ben and David, I created a dictionary named workers for easy look up. The keys are the name of the workers and the values the Worker objects.
From your code, you have a test to make sure not to add the same name to the customers list. This tells me that you want a set, not a list. Using a set will simplify your logic because you don't have to deal with the if/else statement.
The workers.setdefault() call deserves some explanation if you are not familiar with it. Here is what the documentation says:
setdefault(key[, default])
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
What this means is: say the key 'Ben' is not in the dictionary; the setdefault method will then add a new key/value pair to the dictionary. If the key is already in the dictionary, setdefault does not change anything and just returns the current value. Thus the line:
workers.setdefault(worker_name, Worker(worker_name, set()))
is equivalent to:
if worker_name not in workers:
    workers[worker_name] = Worker(worker_name, set())
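One detail worth knowing: `setdefault` evaluates its second argument on every call, so a `Worker` object is constructed for each transaction even when the key already exists. If only the sets of customers are needed, `collections.defaultdict` avoids that. The sketch below uses made-up transaction data in the shape this answer assumes.

```python
from collections import defaultdict

# Hypothetical data in the shape the answer assumes.
data = {"transactions": [
    {"worker": "Ben", "customer": "Lisa"},
    {"worker": "Ben", "customer": "Lisa"},
    {"worker": "David", "customer": "Anna"},
    {"worker": "Ben", "customer": "Alex"},
]}

customers_by_worker = defaultdict(set)  # missing keys start as an empty set
for transaction in data["transactions"]:
    customers_by_worker[transaction["worker"]].add(transaction["customer"])

# Number of unique customers per worker.
for name, customers in sorted(customers_by_worker.items()):
    print(name, len(customers))
```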
I have an array of objects of class Person like the one below, with thisRate first set to None:
class Person(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name
        self.thisRate = None
I loaded around 21K Person objects into an array, name not sorted.
Then I loaded another array from data in a file which has data for thisRate, about 13K of them, name is not sorted as well:
person_data = []
# read from file
row['name'] = 'Peter'
row['thisRate'] = '0.12334'
person_data.append(row)
Now with these 2 sets of arrays, when the name is matched between them, I will assign thisRate from person_data into Person.thisRate.
What I am doing is a loop is like this:
for person in persons:
    data = None
    try:
        data = next(personData for personData in person_data
                    if personData['name'] == person.name)
    except StopIteration:
        print("No rate for this person: {}".format(person.name))
    if data:
        person.thisRate = float( data['thisRate'] )
This loop
data = next(personData for personData in person_data
            if personData['name'] == person.name)
is running fine and uses 21 seconds on my machine with Python 2.7.13.
My question is, is there a faster or better way to achieve the same thing with the 2 arrays I have?
Yes. Make a dictionary from name to thisRate:
import csv

nd = {}
with open(<whatever>) as f:
    reader = csv.DictReader(<whatever>)
    for row in reader:
        # convert once here, as the original loop did with float(...)
        nd[row['name']] = float(row['thisRate'])
Now, use this dictionary to do a single pass over your Person list:
for person in persons:
    thisRate = nd.get(person.name, None)
    person.thisRate = thisRate
    if thisRate is None:
        print("No rate for this person: {}".format(person.name))
Dictionaries have a .get method which allows you to provide a default value in case the key is not in the dict. I used None (which is actually what is the default default value) but you can use whatever you want.
This is a linear-time solution. Your solution was quadratic time, because you are essentially doing:
for person in persons:
    for data in person_data:
        if data['name'] == person.name:
            person.thisRate = data['thisRate']
            break
    else:
        print("No rate for this person: {}".format(person.name))
It is just written in a way that hides this fundamentally nested for-loop inside a generator expression. That is not really a good use case for a generator expression; you should have just used a for-loop to begin with, and then you wouldn't have to deal with catching StopIteration.
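The dictionary approach can be tried end to end without a file. The sketch below uses made-up in-memory rows in place of the CSV data and the `Person` class from the question; the `rate_by_name` and `missing` names are invented for the example.

```python
class Person(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name
        self.thisRate = None

# Hypothetical stand-ins for the two arrays in the question.
persons = [Person(1, "Peter"), Person(2, "Mary"), Person(3, "Zed")]
person_data = [
    {"name": "Mary", "thisRate": "0.5"},
    {"name": "Peter", "thisRate": "0.12334"},
]

# One pass to build the lookup table: name -> rate.
rate_by_name = {row["name"]: float(row["thisRate"]) for row in person_data}

# One pass over the persons; each lookup takes constant time on average.
missing = []
for person in persons:
    person.thisRate = rate_by_name.get(person.name)
    if person.thisRate is None:
        missing.append(person.name)

print(missing)
```

The total work is one pass over each array, instead of one scan of `person_data` per person.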
I'm new in python and I'm trying to dynamically create new instances in a class. So let me give you an example, if I have a class like this:
class Person(object):
    def __init__(self, name, age, job):
        self.name = name
        self.age = age
        self.job = job
As far as I know, for each new instance I have to insert, I would have to declare a variable and attach it to the person object, something like this:
variable = Person(name, age, job)
Is there a way in which I can dynamically do this? Lets suppose that I have a dictionary like this:
persons_database = {
    'id' : ['name', age, 'job'], .....
}
Can I create a piece of code that can iterate over this db and automatically create new instances in the Person class?
Just iterate over the dictionary using a for loop.
people = []
for id in persons_database:
    info = persons_database[id]
    people.append(Person(info[0], info[1], info[2]))
Then the List people will have Person objects with the data from your persons_database dictionary
If you need to get the Person object from the original id you can use a dictionary to store the Person objects and can quickly find the correct Person.
people = {}
for id, data in persons_database.items():
    people[id] = Person(data[0], data[1], data[2])
Then you can get the person you want from his/her id by doing people[id]. So to increment the age of the person with id = 1 you would do people[1].increment_age() (assuming Person defines such a method; the class in the question does not).
------ Slightly more advanced material below ----------------
Some people have mentioned using list/dictionary comprehensions to achieve what you want. Comprehensions would be slightly more efficient and more pythonic, but a little more difficult to understand if you are new to programming/python
As a dictionary comprehension the second piece of code would be people = {id: Person(*data) for id, data in persons_database.items()}
And just so nothing here goes unexplained... The * before a List in python unpacks the List as separate items in the sequential order of the list, so for a List l of length n, *l would evaluate to l[0], l[1], ... , l[n-2], l[n-1]
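A short, self-contained sketch of the unpacking in action, using the `Person` class from the question and a made-up `persons_database`:

```python
class Person(object):
    def __init__(self, name, age, job):
        self.name = name
        self.age = age
        self.job = job

# Hypothetical database in the shape shown in the question.
persons_database = {
    1: ["Alice", 30, "engineer"],
    2: ["Bob", 25, "chef"],
}

# Person(*data) is the same call as Person(data[0], data[1], data[2]).
people = {pid: Person(*data) for pid, data in persons_database.items()}
print(people[1].name, people[2].job)
```

Note that the unpacking only works if every list has exactly as many items as the constructor expects; otherwise Python raises a TypeError.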
Sure, a simple list comprehension should do the trick:
people = [Person(*persons_database[pid]) for pid in persons_database]
This just loops through each key (id) in the person database and creates a person instance by passing through the list of attributes for that id directly as args to the Person() constructor.
I have a number of chemicals with corresponding data held within a database, how do I go about returning a specific chemical, and its data, via its formula, eg o2.
class SourceNotDefinedException(Exception):
    def __init__(self, message):
        super(SourceNotDefinedException, self).__init__(message)

class TVorEchoObject(object):
    """The class stores a pair of objects, "tv" objects, and "echo" objects. They are accessed
    simply by doing .tv, or .echo. If it does not exist, it will fall back to the other variable.
    If neither are present, it returns None."""
    def __init__(self, echo=None, tv=None):
        self.tv = tv
        self.echo = echo

    def __repr__(self):
        return str({"echo": self.echo, "tv": self.tv}) # Returns the respective strings

    def __getattribute__(self, item):
        """Altered __getattribute__() function to return the alternative of .echo / .tv if the requested
        attribute is None."""
        if item in ["echo", "tv"]:
            if object.__getattribute__(self,"echo") is None: # Echo data not present
                return object.__getattribute__(self,"tv") # Select TV data
            elif object.__getattribute__(self,"tv") is None: # TV data not present
                return object.__getattribute__(self,"echo") # Select Echo data
            else:
                return object.__getattribute__(self,item) # Return all data
        else:
            return object.__getattribute__(self,item) # Return all data

class Chemical(object):
    def __init__(self, inputLine, sourceType=None):
        self.chemicalName = TVorEchoObject()
        self.mass = TVorEchoObject()
        self.charge = TVorEchoObject()
        self.readIn(inputLine, sourceType=sourceType)

    def readIn(self, inputLine, sourceType=None):
        if sourceType.lower() == "echo": # Parsed chemical line for Echo format
            chemicalName = inputLine.split(":")[0].strip()
            mass = inputLine.split(":")[1].split(";")[0].strip()
            charge = inputLine.split(";")[1].split("]")[0].strip()
            # Store the objects
            self.chemicalName.echo = chemicalName
            self.mass.echo = mass
            self.charge.echo = charge
        elif sourceType.lower() == "tv": # Parsed chemical line for TV format
            chemicalName = inputLine.split(":")[0].strip()
            charge = inputLine.split(":")[1].split(";")[0].strip()
            mass = inputLine.split(";")[1].split("&")[0].strip()
            # Store the objects
            self.chemicalName.tv = chemicalName
            self.charge.tv = charge
            self.mass.tv = mass
        else:
            raise SourceNotDefinedException(sourceType + " is not a valid `sourceType`") # Otherwise print

    def toDict(self, priority="echo"):
        """Returns a dictionary of all the variables, in the form {"mass":<>, "charge":<>, ...}.
        Design used is to be passed into the Echo and TV style line format statements."""
        if priority in ["echo", "tv"]:
            # Creating the dictionary by a large, to avoid repeated text
            return dict([(attributeName, self.__getattribute__(attributeName).__getattribute__(priority))
                         for attributeName in ["chemicalName", "mass", "charge"]])
        else:
            raise SourceNotDefinedException("{0} source type not recognised.".format(priority)) # Otherwise print
from ParseClasses import Chemical

allChemical = []
chemicalFiles = ("/home/temp.txt")
for fileName in chemicalFiles:
    with open(fileName) as sourceFile:
        for line in sourceFile:
            allChemical.append(Chemical(line, sourceType=sourceType))

for chemical in allChemical:
    print chemical.chemicalName #Prints all chemicals and their data in list format

for chemical in allChemical(["o2"]):
    print chemical.chemicalName
outputs the following error which I have tried to remedy with no luck;
TypeError: 'list' object is not callable
The issue is the two lines
for chemical in allChemical(["o2"]):
    print chemical.chemicalName
allChemical is a list, and you can't just do a_list(). It looks like you're trying to find either ['o2'] or just 'o2' in a list. To do that, you can get the index of the item and then get that index from the list.
allChemical[allChemical.index("o2")]
Try this function:
def chemByString(chemName,chemicals,priority="echo"):
    for chemical in chemicals:
        chemDict = chemical.toDict(priority)
        if chemDict["chemicalName"] == chemName:
            return chemical
    return None
This function is using the toDict() method found in the Chemical class. The code you pasted from the Chemical class explains that this method returns a dictionary from the chemical object:
def toDict(self, priority="echo"):
    """Returns a dictionary of all the variables, in the form {"mass":<>, "charge":<>, ...}.
    Design used is to be passed into the Echo and TV style line format statements."""
    if priority in ["echo", "tv"]:
        # Creating the dictionary by a large, to avoid repeated text
        return dict([(attributeName, self.__getattribute__(attributeName).__getattribute__(priority))
                     for attributeName in ["chemicalName", "mass", "charge"]])
    else:
        raise SourceNotDefinedException("{0} source type not recognised.".format(priority)) # Otherwise print
This dictionary looks like this:
"chemicalName" : <the chemical name>
"mass" : <the mass>
"charge" : <the charge>
What the function I created above does is iterate through all of the chemicals in the list, finds the first one with a name equal to "o2", and returns that chemical. Here's how to use it:
chemByString("o2",allChemical).chemicalName
If the above does not work, you may want to try using the alternative priority ("tv"), though I'm unsure if this will have any effect:
chemByString("o2",allChemical,"tv").chemicalName
If the chemical isn't found, the function returns None, so check the result before accessing .chemicalName (otherwise you get an AttributeError):
chemByString("myPretendChemical",allChemical) is None
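If many lookups are needed, scanning the list each time can be avoided by building a dictionary keyed by name once. The sketch below uses a hypothetical `SimpleChemical` stand-in with sample values rather than the real `Chemical` class, so the key expression would need adapting (for instance to `chem.toDict()["chemicalName"]`).

```python
class SimpleChemical(object):
    """Hypothetical stand-in for the Chemical class in the question."""
    def __init__(self, name, mass, charge):
        self.name = name
        self.mass = mass
        self.charge = charge

# Sample values for illustration only.
all_chemicals = [
    SimpleChemical("o2", "32", "0"),
    SimpleChemical("n2", "28", "0"),
]

# Build the index once; later lookups do not scan the list.
chemicals_by_name = {chem.name: chem for chem in all_chemicals}

found = chemicals_by_name.get("o2")            # the matching object
absent = chemicals_by_name.get("unobtainium")  # None when there is no match
print(found.mass, absent)
```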
EDIT: See my new answer. Leaving this one here since it might still be helpful info.
In python, a list object is a structure holding other objects with an index for each object it contains. Like this:
Index Object
0 "hello"
1 "world"
2 "spam"
If you want to get to one of those objects, you have to know its index:
objList[0] #returns "hello" string object
If you don't know the index, you can find it using the index method:
objList.index("hello") #returns 0
Then you can get the object out of the list using the found index:
objList[objList.index("hello")]
However this is kind of silly, since you can just do:
"hello"
Which in this case will produce the same result.
Your allChemical object is a list. It looks like the line chemicalFiles = ("/home/temp.txt") is filling your list with some type of object. In order to answer your question, you have to provide more information about the objects which the list contains. I assume that information is in the ParseClasses module you are using.
If you can provide more information about the Chemical object you are importing, that may go a long way to helping solve your problem.
IF the objects contained in your list are subclassed from str, this MAY work:
allChemical[allChemical.index("o2")].chemicalName
"o2" is a str object, so index is going to look for a str object (or an object subclassed from str) in your list to find its index. However, if the object isn't a string, it will not find it.
As a learning exercise, try this:
class Chemical(str):
    '''A class which is a subclass of string but has additional attributes such as chemicalName'''
    def __init__(self,chemicalName):
        self.chemicalName = chemicalName

someChemicals = [Chemical('o2'),Chemical('n2'),Chemical('h2')]
for chemical in someChemicals: print(chemical.chemicalName)
#prints all the chemical names
print(someChemicals[0].chemicalName)
#prints "o2"; notice you have to know the index ahead of time
print(someChemicals[someChemicals.index("o2")].chemicalName)
#prints "o2" again; this time index found it for you, but
#you already knew the object ahead of time anyway, so it's a little silly
This works because index is able to find what you are looking for. If the object isn't a string, index can't find it. And if you don't know what index 'o2' is at, you're going to have to learn more about those objects in order to get to a specific chemical in your list of chemicals.
Is there some easy way to access an object in a list, without using an index or iterating through the list?
In brief:
I'm reading in lines from a text file, splitting up the lines, and creating objects from the info. I do not know what information will be in the text file. So for example:
roomsfile.txt
0\bedroom\A bedroom with king size bed.\A door to the east.
1\kitchen\A modern kitchen with steel and chrome.\A door to the west.
2\familyRoom\A huge family room with a tv and couch.\A door to the south.
Some Python Code:
class Rooms:
    def __init__(self, roomNum, roomName, roomDesc, roomExits):
        self.roomNum = roomNum
        self.roomName = roomName
        self.roomDesc = roomDesc
        self.roomExits = roomExits

    def getRoomNum(self):
        return self.roomNum

    def getRoomName(self):
        return self.roomName

    def getRoomDesc(self):
        return self.roomDesc

    def getRoomExits(self):
        return self.roomExits

def roomSetup():
    roomsfile = "roomsfile.txt"
    infile = open(roomsfile, 'r')
    rooms = []
    for line in infile:
        rooms.append(makeRooms(line))
    infile.close()
    return rooms

def makeRooms(infoStr):
    # the backslash separator must be escaped inside a string literal
    roomNum, roomName, roomDesc, roomExits = infoStr.split("\\")
    return Rooms(roomNum, roomName, roomDesc, roomExits)
When I want to know what exits the bedroom has, I have to iterate through the list with something like the below (where "noun" is passed along by the user as "bedroom"):
def printRoomExits(rooms, noun):
    numRooms = len(rooms)
    for n in range(numRooms):
        checkRoom = rooms[n].getRoomName()
        if checkRoom == noun:
            print(rooms[n].getRoomExits())
        else:
            pass
This works, but it feels like I am missing some easier approach...especially since I have a piece of the puzzle (ie, "bedroom" in this case)...and especially since the rooms list could have thousands of objects in it.
I could create an assignment:
bedroom = makeRooms(0, bedroom, etc, etc)
and then do:
bedroom.getRoomExits()
but again, I won't know what info will be in the text file, and don't know what assignments to make. This StackOverFlow answer argues against "dynamically created variables", and argues in favor of using a dictionary. I tried this approach, but I could not find a way to access the methods (and thus the info) of the named objects I added to the dictionary.
So in sum: am I missing something dumb?
Thanks in advance! And sorry for the book-length post - I wanted to give enough details.
chris
At least one dictionary is the right answer here. The way you want to set it up is at least to index by name:
def roomSetup():
    roomsfile = "roomsfile.txt"
    infile = open(roomsfile, 'r')
    rooms = {}
    for line in infile:
        newroom = makeRooms(line)
        rooms[newroom.roomName] = newroom
    infile.close()
    return rooms
Then, given a name, you can access the Rooms instance directly:
exits = rooms['bedroom'].roomExits
There is a reason I'm not using your getRoomName and getRoomExits methods - getter and setter methods are unnecessary in Python. You can just track your instance data directly, and if you later need to change the implementation refactor them into properties. It gives you all the flexibility of getters and setters without needing the boilerplate code up front.
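As an illustration of that refactoring path, the sketch below keeps plain attribute access for callers while a property adds logic behind it. The `Room` class and the whitespace-stripping behavior are invented for the example.

```python
class Room(object):
    def __init__(self, name, exits):
        self.name = name
        self.exits = exits  # goes through the property setter below

    @property
    def exits(self):
        return self._exits

    @exits.setter
    def exits(self, value):
        # Logic added later, e.g. trimming the newline read from a file.
        self._exits = value.strip()

room = Room("bedroom", "A door to the east.\n")
print(room.exits)  # callers still use plain attribute access
```

Code that was written against a plain `exits` attribute keeps working unchanged after the property is introduced.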
Depending on what information is present in your definitions file and what your needs are, you can get fancier - for instance, I would probably want to have my exits information stored in a dictionary mapping a canonical name for each exit (probably starting with 'east', 'west', 'north' and 'south', and expanding to things like 'up', 'down' and 'dennis' as necessary) to a tuple of a longer description and the related Rooms instance.
I would also name the class Room rather than Rooms, but that's a style issue rather than important behavior.
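The exits dictionary described above might be sketched as follows. The direction names, descriptions and this version of the `Room` class are hypothetical; the real layout depends on what the definitions file contains.

```python
class Room(object):
    def __init__(self, name):
        self.name = name
        self.exits = {}  # direction -> (description, destination Room)

bedroom = Room("bedroom")
kitchen = Room("kitchen")

bedroom.exits["east"] = ("A door to the east.", kitchen)
kitchen.exits["west"] = ("A door to the west.", bedroom)

# Following an exit is a dictionary lookup plus tuple unpacking.
description, destination = bedroom.exits["east"]
print(description, "->", destination.name)
```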
You can use in to check for membership (literally, if something is in a container). This works for lists, strings, and other iterables.
>>> li = ['a','b','c']
>>> 'a' in li
True
>>> 'x' in li
False
After you've read your rooms, you can create a dictionary:
rooms = roomSetup()
exits_of_each_room = {}
for room in rooms:
    exits_of_each_room[room.getRoomName()] = room.getRoomExits()
Then your function is simply:
def printRoomExits(exits_of_each_room, noun):
    print(exits_of_each_room[noun])