Faster or better way than looping to find data? - python

I have an array of objects of class Person like the one below, with thisRate initially set to None:
class Person(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name
        self.thisRate = None
I loaded around 21K Person objects into an array, name not sorted.
Then I loaded another array from data in a file which has data for thisRate, about 13K of them, name is not sorted as well:
person_data = []
# read from file
row['name'] = 'Peter'
row['thisRate'] = '0.12334'
person_data.append(row)
Now with these 2 sets of arrays, when the name is matched between them, I will assign thisRate from person_data into Person.thisRate.
What I am doing is a loop like this:
for person in persons:
    data = None
    try:
        data = next(personData for personData in person_data
                    if personData['name'] == person.name)
    except StopIteration:
        print("No rate for this person: {}".format(person.name))
    if data:
        person.thisRate = float( data['thisRate'] )
This loop
data = next(personData for personData in person_data
            if personData['name'] == person.name)
is running fine and uses 21 seconds on my machine with Python 2.7.13.
My question is, is there a faster or better way to achieve the same thing with the 2 arrays I have?

Yes. Make a dictionary from name to thisRate:
import csv
nd = {}
with open(<whatever>) as f:
    reader = csv.DictReader(f)
    for row in reader:
        nd[row['name']] = row['thisRate']
Now, use this dictionary to do a single pass over your Person list:
for person in persons:
    thisRate = nd.get(person.name, None)
    person.thisRate = thisRate
    if thisRate is None:
        print("No rate for this person: {}".format(person.name))
Dictionaries have a .get method which allows you to provide a default value in case the key is not in the dict. I used None (which is already the default when no second argument is given), but you can use whatever you want.
This is a linear-time solution. Your solution was quadratic time, because you are essentially doing:
for person in persons:
    for data in person_data:
        if data['name'] == person.name:
            person.thisRate = data['thisRate']
            break
    else:
        print("No rate for this person: {}".format(person.name))
Your version just obscures this fundamentally nested for-loop inside a generator expression. That is not really a good use case for a generator expression; you should have used a plain for-loop with an else clause to begin with, so that you don't have to wrap the lookup in try/except to catch StopIteration.
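To make the dictionary approach concrete, here is a minimal, self-contained sketch. The sample names and rates are made up for illustration, and the rates are converted to float as in the question's loop. One caveat worth stating: if a name appears more than once in person_data, the dictionary keeps the last row, whereas the original next(...) version picked the first.

```python
class Person(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name
        self.thisRate = None

# Made-up sample data standing in for the ~21K persons and ~13K rows.
persons = [Person(1, "Peter"), Person(2, "Mary"), Person(3, "Zed")]
person_data = [
    {"name": "Peter", "thisRate": "0.12334"},
    {"name": "Mary", "thisRate": "0.5"},
]

# Build the lookup table once: one pass over person_data.
rate_by_name = {row["name"]: float(row["thisRate"]) for row in person_data}

# One pass over persons; each lookup is a constant-time dict access on average.
missing = []
for person in persons:
    person.thisRate = rate_by_name.get(person.name)
    if person.thisRate is None:
        missing.append(person.name)

print("No rate for: {}".format(missing))
```

Collecting the unmatched names in a list instead of printing inside the loop is only a convenience for this sketch.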

Related

Get a dictionary from a class?

I want to:
Take a list of lists
Make a frequency table in a dictionary
Do things with the resulting dictionary
The class works, the code works, the frequency table is correct.
I want to get a class that returns a dictionary, but I actually get a class that returns a class type.
I can see that it has the right content in there, but I just can't get it out.
Can someone show me how to turn the output of the class to a dictionary type?
I am working with HN post data. Columns, a few thousand rows.
freq_pph = {}
freq_cph = {}
freq_uph = {}
# Creates a binned frequency table:
# - key is bin_minutes (size of bin in minutes).
# - value is freq_value which sums/counts the number of things in that column.
class BinFreq:
    def __init__(self, dataset, bin_minutes, freq_value, dict_name):
        self.dataset = dataset
        self.bin_minutes = bin_minutes
        self.freq_value = freq_value
        self.dict_name = dict_name
    def make_table(self):
        # Sets bin size
        # Counts how many posts fall in that timedelta
        if (self.bin_minutes == 60) and (self.freq_value == "None"):
            for post in self.dataset:
                hour_dt = post[-1]
                hour_str = hour_dt.strftime("%H")
                if hour_str in self.dict_name:
                    self.dict_name[hour_str] += 1
                else:
                    self.dict_name[hour_str] = 1
        # Sets bin size
        # Sums the values of a given index/column
        if (self.bin_minutes == 60) and (self.freq_value != "None"):
            for post in self.dataset:
                hour_dt = post[-1]
                hour_str = hour_dt.strftime("%H")
                if hour_str in self.dict_name:
                    self.dict_name[hour_str] += int(row[self.freq_value])
                else:
                    self.dict_name[hour_str] = int(row[self.freq_value])
Instantiate:
pph = BinFreq(ask_posts, 60, "None", freq_pph)
pph.make_table()
How can pph be turned into a real dictionary?
If you want the make_table function to return a dictionary, then you have to add a return statement at the end of it, for example: return self.dict_name.
If you then want to use it outside of the class, you have to assign it to a variable, so in the second snippet do: my_dict = pph.make_table().
Classes can't return things – functions in classes could. However, the function in your class doesn't; it just modifies self.dict_name (which is a misnomer; it's really just a reference to a dict, not a name (which one might imagine is a string)), which the caller then reads (or should, anyway).
In addition, there seems to be a bug; the second if block (which is never reached anyway) refers to row, an undefined name.
Anyway, your class doesn't need to be a class at all, and is easiest implemented with the built-in collections.Counter() class:
from collections import Counter
def bin_by_hour(dataset, value_key=None):
    counter = Counter()
    for post in dataset:
        hour = post[-1].hour  # assuming it's a `datetime` object
        if value_key:  # count using `post[value_key]`
            counter[hour] += post[value_key]
        else:  # just count
            counter[hour] += 1
    return dict(counter.items())  # make the Counter a regular dict
freq_pph = bin_by_hour(ask_posts)
freq_cph = bin_by_hour(ask_posts, value_key="num_comments") # or whatever
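As a runnable illustration of the Counter approach, the sketch below uses three invented rows. It assumes each post is a list whose last element is a datetime and whose summed column is addressed by integer index (the value_index parameter is a name chosen here, not taken from the question). It keeps the question's "%H" string keys and converts values with int(), since data read from CSV usually arrives as strings.

```python
from collections import Counter
from datetime import datetime

def bin_by_hour(dataset, value_index=None):
    """Return a plain dict mapping hour string -> count (or sum of a column)."""
    counter = Counter()
    for post in dataset:
        hour = post[-1].strftime("%H")  # last column is assumed to be a datetime
        if value_index is None:
            counter[hour] += 1  # just count posts
        else:
            counter[hour] += int(post[value_index])  # sum the chosen column
    return dict(counter)

# Invented sample rows: [title, num_comments, created_at]
sample_posts = [
    ["Ask HN: a", "3", datetime(2016, 8, 16, 9, 55)],
    ["Ask HN: b", "10", datetime(2016, 8, 16, 9, 5)],
    ["Ask HN: c", "1", datetime(2016, 8, 17, 14, 30)],
]

freq_pph = bin_by_hour(sample_posts)                 # posts per hour
freq_cph = bin_by_hour(sample_posts, value_index=1)  # comments per hour
print(freq_pph, freq_cph)
```

Testing value_index against None rather than relying on truthiness matters here, because column index 0 is a valid index but is falsy.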

How to use a variable as object in a class in python

I am attempting to find the number of unique customers for each worker from a .json file. transactions["transactions"][a]["worker"] will return either Ben or David; these are the only workers and have previously been defined as objects of a class called Worker. In the for loop, I want the worker's name to be assigned to the variable wrkr, and the customer's name assigned to the variable cust. I then want to check if the customer is already in that worker's list of customers; if it isn't, I will append the name of the customer to the list. If they are already in the list, I want the loop to move on to the next transaction.
Ben.customers gives the list of customers (initially none), but if I set the variable wrkr = Ben and then do wrkr.customers, it doesn't work; it gives me the error "AttributeError: 'unicode' object has no attribute 'customers'". I can see why, as it just sees wrkr as a name and looks for it within the class. But I don't know what I should do instead.
import json
with open("transactions.json", "r") as f:
    transactions = json.load(f)
class Worker:
    def __init__(self, name, customers):
        self.name = name
        self.customers = customers
David = Worker("David", [])
Ben = Worker("Ben", [])
# Find the number of unique customers for each worker
for a in range(len(transactions["transactions"])):
    cust = transactions["transactions"][a]["customer"]
    wrkr = transactions["transactions"][a]["worker"]
    if cust in wrkr.customers:
        continue
    else:
        wrkr.customers.append(cust)
Gives me the error "AttributeError: 'unicode' object has no attribute 'customers'"
I want to find a workers name within the for loop and then load that worker's customer list.
I'm really sorry if my question doesn't make much sense or I'm using the wrong terminology. I'm self taught and don't really know what I'm doing.
You should create a dictionary whose keys are the worker names expected in the JSON and whose values are the Worker objects:
class Worker:
    def __init__(self, name, customers=None):
        self.name = name
        # If you want an empty list as a default parameter you can follow this pattern
        self.customers = customers or []
workers = {
    'David': Worker("David"),
    'Ben': Worker("Ben")
}
# transactions["transactions"] is a list (the question indexes it by position), so iterate over it directly
for transaction_details in transactions["transactions"]:
    cust = transaction_details["customer"]
    # Here you can get the Worker object from the dictionary using the worker name
    wrkr = workers.get(transaction_details["worker"])
    # You should handle the case where the worker is not expected
    if wrkr is None:
        continue
    if cust not in wrkr.customers:
        wrkr.customers.append(cust)
Looking at your code, I am trying to guess what your data looks like, which is not easy, so if my solution does not work, please post what your data looks like. Here is what I have in mind:
import itertools
import json
from pprint import pprint
class Worker:
    def __init__(self, name, customers):
        self.name = name
        self.customers = customers
    def __repr__(self):
        return 'Worker(name={}, customers={})'.format(self.name,
                                                      self.customers)
with open('transactions.json') as file_handle:
    data = json.load(file_handle)
#workers is a dictionary where key=worker name, value=Worker object
workers = {}
for transaction in data['transactions']:
    worker_name = transaction['worker']
    customer = transaction['customer']
    #Create a new worker object if needed
    workers.setdefault(worker_name, Worker(worker_name, set()))
    #Build the customers list
    workers[worker_name].customers.add(customer)
pprint(workers)
Output:
{'Ben': Worker(name=Ben, customers={'Lisa', 'Janet', 'Alex'}),
'David': Worker(name=David, customers={'Jason', 'Anna'})}
Notes
Instead of having two variables Ben and David, I created a dictionary named workers for easy look up. The keys are the name of the workers and the values the Worker objects.
From your code, you have a test to make sure not to add the same name to the customers list. This tells me that you want a set, not a list. Using a set will simplify your logic because you don't have to deal with the if/else statement.
The workers.setdefault() call deserves some explanation if you are not familiar with it. Here is what the documentation says:
setdefault(key[, default])
If key is in the dictionary, return its value. If not, insert key with a value of default and return default. default defaults to None.
What this means is: let's say that the key 'Ben' is not in the dictionary; the setdefault method will then add a new key/value pair to the dictionary. If the key is already in the dictionary, setdefault does not do anything but return the current value. Thus the line:
workers.setdefault(worker_name, Worker(worker_name, set()))
is equivalent to:
if worker_name not in workers:
    workers[worker_name] = Worker(worker_name, set())
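One subtlety about that equivalence: the resulting dictionary is the same, but the argument to setdefault() is evaluated on every call, so a Worker is constructed even when the key already exists and is then thrown away. The sketch below counts constructions to show the difference; the simplified Worker class, the created list and the sample transactions are hypothetical and exist only for this demonstration.

```python
created = []  # records the name of every Worker that gets constructed

class Worker(object):
    def __init__(self, name):
        self.name = name
        self.customers = set()
        created.append(name)

# Hypothetical (worker, customer) pairs
transactions = [("Ben", "Lisa"), ("Ben", "Alex"), ("David", "Anna"), ("Ben", "Lisa")]

# Variant 1: setdefault builds a Worker for every transaction.
workers = {}
for worker_name, customer in transactions:
    workers.setdefault(worker_name, Worker(worker_name)).customers.add(customer)
eager_count = len(created)

# Variant 2: the explicit membership test builds a Worker only when needed.
del created[:]
workers2 = {}
for worker_name, customer in transactions:
    if worker_name not in workers2:
        workers2[worker_name] = Worker(worker_name)
    workers2[worker_name].customers.add(customer)
lazy_count = len(created)

print(eager_count, lazy_count)
```

For a cheap constructor this hardly matters; it becomes relevant when building the default object is expensive or has side effects.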

Making a search by ID in friends list in Python

I am new to Python and currently searching for some internship or a job. I am currently working on a program in Python which reads a file that contains data in this shape:
Id;name;surname;age;gender;friends;
Id and age are the positive integers,
gender can be "male" or "female",
and friends is an array of numbers, separated by comma, which represent the Id's of persons who are friends with the current person. If Person1 is a friend to a Person2, it must work vice versa.
As you can see in the above example, attributes of a "Person" are separated by semicolons, and the trick is that not every person has every attribute, and of course, they differ in the number of friends. So, the first part of the task is to make a program which reads a file and creates a structure which represents a list of persons with the attributes mentioned above. I have to be able to search for those persons by Id.
The second part is to make a function with two arguments (Id1, Id2) which returns True if a person with Id2 is a friend to a person with Id1. Otherwise, it returns false.
I have some ideas in mind, but I am not sure how to realize this, since I don't know enough about Python yet. I guess the best structure for this would be a dictionary, but I am not sure how to load a file into it, since the attributes of all persons are different. I would be grateful for any help you can offer me.
Here is my attempt to write the code:
people = open(r"data.txt")
class People:
    id = None
    name = ''
    surname = ''
    age = None
    gender = ['male', 'female']
    friends = []
    #def people(self):
    #    person = {'id': None,
    #              'name': '',
    #              'surname': '',
    #              'age': None,
    #              'gender': ['male', 'female'],
    #              'friends': []
    #              }
    #    return person
    def community(self):
        comm = [People()]
        return comm
def is_friend(id1, id2):
    if (id1 in People.friends) & (id2 in People.friends):
        return True
people.close()
Your question is too broad imho, but I'll give you a few hints:
the simplest data structure for O(1) (average-case) key access is indeed a dict. Note that a dict needs hashable (typically immutable) values as keys (that's fine, since your Ids are integers), but can take anything as values. That only works for (relatively) small datasets, though, since it's all in memory. If you need bigger datasets and/or persistence, you want a database (key:value, relational, document; the choice is up to you).
Python has classes and computed attributes
In Python, the absence of a value is the None object
there's a csv files parser in the standard lib.
Now you just have to read the doc and start coding.
[edit] wrt/ your code snippet
class People:
    id = None
    name = ''
    surname = ''
    age = None
    gender = ['male', 'female']
    friends = []
Python is not Java or PHP. What you defined above are class attributes (shared by all instances of the class); you want instance attributes (defined in the __init__() method). You should really read the FineManual.
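The practical consequence of a mutable class attribute can be seen in a short sketch. The two class names below are invented for the demonstration; they are not from the question.

```python
class SharedFriends(object):
    friends = []  # class attribute: one list shared by every instance

class OwnFriends(object):
    def __init__(self):
        self.friends = []  # instance attribute: a new list per instance

a, b = SharedFriends(), SharedFriends()
a.friends.append(42)
shared_result = b.friends  # b sees the friend that was added through a

c, d = OwnFriends(), OwnFriends()
c.friends.append(42)
own_result = d.friends  # d is unaffected

print(shared_result, own_result)
```

With the class attribute, appending a friend to one person silently adds it to everybody, which is almost certainly not what a friends list should do.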
Also if you're using Python 2.7.x, you want your classes to inherit from object (historical reasons).
So your Person class should look something like this:
class Person(object):
    def __init__(self, id, name, surname, age, gender, friends=None):
        self.id = id
        self.name = name
        self.surname = surname
        self.age = age
        self.gender = gender
        self.friends = friends or []
And then to create a Person instance:
person = Person(42, "John Cleese", "Archie Leach", 77, "male", [11, 1337])
def is_friend(id1, id2):
    if (id1 in People.friends) & (id2 in People.friends):
        return True
A few points here:
First: you either want to rename this function are_friends or make it a method of the Person class and then only pass a (single) Person instance (not an 'id') as argument.
Second: in Python, & is the bitwise AND operator. The logical "and" operator is spelled, well, and.
Third: an expression has a truth value by itself, so your if statement is redundant. Whenever you see something like:
def func():
    if <some expression>:
        return True
    else:
        return False
you can just rewrite it as:
def func():
    return <some expression>
Or if you want to ensure func returns a proper boolean (True or False):
def func():
    return bool(<some expression>)
I'll stop here because I don't intend to teach you how to program. You obviously need to do at least the full official Python tutorial, and possibly some complete beginner tutorial too.
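For readers who want to see the hints above assembled, here is one possible sketch, not a definitive solution. It assumes that a missing attribute shows up as an empty field (so every line still has its semicolons), which the question does not confirm; parse_line, load_people, are_friends and the sample lines are all names and data invented for this example.

```python
class Person(object):
    def __init__(self, id, name=None, surname=None, age=None, gender=None, friends=None):
        self.id = id
        self.name = name
        self.surname = surname
        self.age = age
        self.gender = gender
        self.friends = set(friends or [])  # set: fast membership tests

def parse_line(line):
    # Assumption: missing attributes appear as empty fields between semicolons.
    fields = [f.strip() for f in line.strip().split(";")]
    fields += [""] * (6 - len(fields))  # pad short lines
    id_, name, surname, age, gender, friends = fields[:6]
    return Person(
        int(id_),
        name or None,
        surname or None,
        int(age) if age else None,
        gender or None,
        [int(x) for x in friends.split(",") if x.strip()],
    )

def load_people(lines):
    """Return a dict mapping Id -> Person, so search by Id is a dict lookup."""
    return {p.id: p for p in (parse_line(l) for l in lines if l.strip())}

def are_friends(people, id1, id2):
    """True if the person with id2 is in the friends of the person with id1."""
    person = people.get(id1)
    return person is not None and id2 in person.friends

# Invented sample lines in the Id;name;surname;age;gender;friends; shape
sample = [
    "1;John;Smith;30;male;2,3;",
    "2;Ann;;25;female;1;",
    "3;;Lee;;;1;",
]
people = load_people(sample)
print(are_friends(people, 1, 2), are_friends(people, 2, 3))
```

In real use, load_people would be given an open file object instead of the sample list, since iterating over a file yields its lines.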

Replacing value of variable in list of named tuples

I'm loading data about phone calls into a list of namedtuples called 'records'. Each phone call has information on the length of the call in the variable 'call_duration'. However, some have the variable set to None. I would like to replace None with zero in all of the records, but the following code doesn't seem to work:
for r in records:
    if r.call_duration is None:
        r = r._replace(call_duration=0)
How can I replace the value in the list? I guess the problem is that the new 'r' isn't stored in the list. What would be the best way to capture the change in the list?
You can replace the old record by using its index in the records list. You can get that index using enumerate():
for i, rec in enumerate(records):
    if rec.call_duration is None:
        records[i] = rec._replace(call_duration=0)
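A small runnable sketch of why the original loop has no effect, together with a list-comprehension alternative that builds a new list. The Record field names other than call_duration and the sample values are made up.

```python
from collections import namedtuple

# Hypothetical record layout; only call_duration comes from the question.
Record = namedtuple("Record", ["number", "call_duration"])
records = [Record("555-0100", 12), Record("555-0101", None)]

# The question's loop: _replace returns a NEW tuple, and assigning it to r
# only rebinds the loop variable, so the list is left untouched.
for r in records:
    if r.call_duration is None:
        r = r._replace(call_duration=0)
unchanged = records[1].call_duration  # still None

# Alternative to enumerate(): rebuild the list in one expression.
records = [
    r._replace(call_duration=0) if r.call_duration is None else r
    for r in records
]
print(unchanged, records)
```

The enumerate() version modifies the existing list in place, which matters if other code holds a reference to it; the comprehension creates a new list object.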
I suggest you create your own class, it will benefit you in the future as far as object management goes. When you want to create methods later on for a record, you'll be able to easily do so in a class:
class Record:
    def __init__(self, number = None, length = None):
        self.number = number
        self.length = length
    def replace(self, **kwargs):
        self.__dict__.update(kwargs)
Now you can easily manage your records and replace object attributes as you deem necessary.
for r in records:
    if r.length is None:
        r.replace(length = 0)

Using Class, Methods to define variables

I have a number of chemicals with corresponding data held within a database. How do I go about returning a specific chemical and its data via its formula, e.g. o2?
class SourceNotDefinedException(Exception):
    def __init__(self, message):
        super(SourceNotDefinedException, self).__init__(message)
class TVorEchoObject(object):
    """The class stores a pair of objects, "tv" objects, and "echo" objects. They are accessed
    simply by doing .tv, or .echo. If it does not exist, it will fall back to the other variable.
    If neither are present, it returns None."""
    def __init__(self, echo=None, tv=None):
        self.tv = tv
        self.echo = echo
    def __repr__(self):
        return str({"echo": self.echo, "tv": self.tv}) # Returns the respective strings
    def __getattribute__(self, item):
        """Altered __getattribute__() function to return the alternative of .echo / .tv if the requested
        attribute is None."""
        if item in ["echo", "tv"]:
            if object.__getattribute__(self,"echo") is None: # Echo data not present
                return object.__getattribute__(self,"tv") # Select TV data
            elif object.__getattribute__(self,"tv") is None: # TV data not present
                return object.__getattribute__(self,"echo") # Select Echo data
            else:
                return object.__getattribute__(self,item) # Return all data
        else:
            return object.__getattribute__(self,item) # Return all data
class Chemical(object):
    def __init__(self, inputLine, sourceType=None):
        self.chemicalName = TVorEchoObject()
        self.mass = TVorEchoObject()
        self.charge = TVorEchoObject()
        self.readIn(inputLine, sourceType=sourceType)
    def readIn(self, inputLine, sourceType=None):
        if sourceType.lower() == "echo": # Parsed chemical line for Echo format
            chemicalName = inputLine.split(":")[0].strip()
            mass = inputLine.split(":")[1].split(";")[0].strip()
            charge = inputLine.split(";")[1].split("]")[0].strip()
            # Store the objects
            self.chemicalName.echo = chemicalName
            self.mass.echo = mass
            self.charge.echo = charge
        elif sourceType.lower() == "tv": # Parsed chemical line for TV format
            chemicalName = inputLine.split(":")[0].strip()
            charge = inputLine.split(":")[1].split(";")[0].strip()
            mass = inputLine.split(";")[1].split("&")[0].strip()
            # Store the objects
            self.chemicalName.tv = chemicalName
            self.charge.tv = charge
            self.mass.tv = mass
        else:
            raise SourceNotDefinedException(sourceType + " is not a valid `sourceType`") # Otherwise print
    def toDict(self, priority="echo"):
        """Returns a dictionary of all the variables, in the form {"mass":<>, "charge":<>, ...}.
        Design used is to be passed into the Echo and TV style line format statements."""
        if priority in ["echo", "tv"]:
            # Creating the dictionary by a large, to avoid repeated text
            return dict([(attributeName, self.__getattribute__(attributeName).__getattribute__(priority))
                         for attributeName in ["chemicalName", "mass", "charge"]])
        else:
            raise SourceNotDefinedException("{0} source type not recognised.".format(priority)) # Otherwise print
from ParseClasses import Chemical
allChemical = []
chemicalFiles = ("/home/temp.txt")
for fileName in chemicalFiles:
    with open(fileName) as sourceFile:
        for line in sourceFile:
            allChemical.append(Chemical(line, sourceType=sourceType))
for chemical in allChemical:
    print chemical.chemicalName #Prints all chemicals and their data in list format
for chemical in allChemical(["o2"]):
    print chemical.chemicalName
outputs the following error which I have tried to remedy with no luck;
TypeError: 'list' object is not callable
The issue is the two lines
for chemical in allChemical(["o2"]):
    print chemical.chemicalName
allChemical is a list, and you can't just do a_list(). It looks like you're trying to find either ['o2'] or just 'o2' in a list. To do that, you can get the index of the item and then get that index from the list.
allChemical[allChemical.index("o2")]
Try this function:
def chemByString(chemName,chemicals,priority="echo"):
    for chemical in chemicals:
        chemDict = chemical.toDict(priority)
        if chemDict["chemicalName"] == chemName:
            return chemical
    return None
This function is using the toDict() method found in the Chemical class. The code you pasted from the Chemical class explains that this method returns a dictionary from the chemical object:
def toDict(self, priority="echo"):
    """Returns a dictionary of all the variables, in the form {"mass":<>, "charge":<>, ...}.
    Design used is to be passed into the Echo and TV style line format statements."""
    if priority in ["echo", "tv"]:
        # Creating the dictionary by a large, to avoid repeated text
        return dict([(attributeName, self.__getattribute__(attributeName).__getattribute__(priority))
                     for attributeName in ["chemicalName", "mass", "charge"]])
    else:
        raise SourceNotDefinedException("{0} source type not recognised.".format(priority)) # Otherwise print
This dictionary looks like this:
"chemicalName" : <the chemical name>
"mass" : <the mass>
"charge" : <the charge>
What the function I created above does is iterate through all of the chemicals in the list, find the first one whose name equals the one you pass in (here "o2"), and return that chemical. Here's how to use it:
chemByString("o2",allChemical).chemicalName
If the above does not work, you may want to try using the alternative priority ("tv"), though I'm unsure if this will have any effect:
chemByString("o2",allChemical,"tv").chemicalName
If the chemical isn't found, the function returns None, so check the result before accessing .chemicalName on it:
chemByString("myPretendChemical",allChemical) # returns None
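If many lookups by formula are needed, scanning the list each time can be replaced by a dictionary built once. The sketch below uses a simplified stand-in for the Chemical class (plain string attributes rather than the TV/Echo pairs from the question) and invented sample values, so it only illustrates the idea.

```python
class Chemical(object):
    # Simplified stand-in for the question's class: plain attributes only.
    def __init__(self, chemicalName, mass, charge):
        self.chemicalName = chemicalName
        self.mass = mass
        self.charge = charge

# Invented sample data
allChemical = [Chemical("o2", "31.998", "0"), Chemical("n2", "28.014", "0")]

# Build the index once; later lookups do not scan the list.
chemicalByName = {c.chemicalName: c for c in allChemical}

found = chemicalByName.get("o2")
missing = chemicalByName.get("myPretendChemical")  # None when absent

if found is not None:
    print(found.chemicalName, found.mass, found.charge)
```

With the real class, the dictionary key would have to come from something like toDict(priority)["chemicalName"], since chemicalName there is an object holding both source variants.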
EDIT: See my new answer. Leaving this one here since it might still be helpful info.
In python, a list object is a structure holding other objects with an index for each object it contains. Like this:
Index Object
0 "hello"
1 "world"
2 "spam"
If you want to get to one of those objects, you have to know its index:
objList[0] #returns "hello" string object
If you don't know the index, you can find it using the index method:
objList.index("hello") #returns 0
Then you can get the object out of the list using the found index:
objList[objList.index("hello")]
However this is kind of silly, since you can just do:
"hello"
Which in this case will produce the same result.
Your allChemical object is a list. It looks like the line chemicalFiles = ("/home/temp.txt") is filling your list with some type of object. In order to answer your question, you have to provide more information about the objects which the list contains. I assume that information is in the ParseClasses module you are using.
If you can provide more information about the Chemical object you are importing, that may go a long way to helping solve your problem.
IF the objects contained in your list are subclassed from str, this MAY work:
allChemical[allChemical.index("o2")].chemicalName
"o2" is a str object, so index is going to look for a str object (or an object subclassed from str) in your list to find its index. However, if the object isn't a string, it will not find it.
As a learning exercise, try this:
class Chemical(str):
    '''A class which is a subclass of string but has additional attributes such as chemicalName'''
    def __init__(self,chemicalName):
        self.chemicalName = chemicalName
someChemicals = [Chemical('o2'),Chemical('n2'),Chemical('h2')]
for chemical in someChemicals: print(chemical.chemicalName)
#prints all the chemical names
print(someChemicals[0].chemicalName)
#prints "o2"; notice you have to know the index ahead of time
print(someChemicals[someChemicals.index("o2")].chemicalName)
#prints "o2" again; this time index found it for you, but
#you already knew the object ahead of time anyway, so it's a little silly
This works because index is able to find what you are looking for. If the object isn't a string, index can't find it. So if you don't know what index 'o2' is at and you want to get to a specific chemical in your list of chemicals, you're going to have to learn more about those objects.
