I want to:
Take a list of lists
Make a frequency table in a dictionary
Do things with the resulting dictionary
The class works, the code works, the frequency table is correct.
I want to get a class that returns a dictionary, but I actually get a class that returns a class type.
I can see that it has the right content in there, but I just can't get it out.
Can someone show me how to turn the output of the class to a dictionary type?
I am working with HN post data. Columns, a few thousand rows.
freq_pph = {}
freq_cph = {}
freq_uph = {}
# Creates a binned frequency table:
# - key is bin_minutes (size of bin in minutes).
# - value is freq_value which sums/counts the number of things in that column.
class BinFreq:
    def __init__(self, dataset, bin_minutes, freq_value, dict_name):
        self.dataset = dataset
        self.bin_minutes = bin_minutes
        self.freq_value = freq_value
        self.dict_name = dict_name
    def make_table(self):
        # Sets bin size
        # Counts the number of posts in that timedelta
        if (self.bin_minutes == 60) and (self.freq_value == "None"):
            for post in self.dataset:
                hour_dt = post[-1]
                hour_str = hour_dt.strftime("%H")
                if hour_str in self.dict_name:
                    self.dict_name[hour_str] += 1
                else:
                    self.dict_name[hour_str] = 1
        # Sets bin size
        # Sums the values of a given index/column
        if (self.bin_minutes == 60) and (self.freq_value != "None"):
            for post in self.dataset:
                hour_dt = post[-1]
                hour_str = hour_dt.strftime("%H")
                if hour_str in self.dict_name:
                    self.dict_name[hour_str] += int(row[self.freq_value])
                else:
                    self.dict_name[hour_str] = int(row[self.freq_value])
Instantiate:
pph = BinFreq(ask_posts, 60, "None", freq_pph)
pph.make_table()
How can pph be turned into a real dictionary?
If you want the make_table function to return a dictionary, then you have to add a return statement at the end of it, for example: return self.dict_name.
If you then want to use it outside of the class, you have to assign it to a variable, so in the second snippet do: my_dict = pph.make_table().
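A minimal sketch of that change, assuming a cut-down version of the class (the name HourCounter, the attribute name table and the sample rows are made up for illustration and are not from the question):

```python
from datetime import datetime


class HourCounter:
    """Hypothetical, cut-down stand-in for BinFreq: counts posts per hour."""

    def __init__(self, dataset):
        self.dataset = dataset
        self.table = {}

    def make_table(self):
        for post in self.dataset:
            hour_str = post[-1].strftime("%H")  # assumes the last column is a datetime
            self.table[hour_str] = self.table.get(hour_str, 0) + 1
        return self.table  # hand the dict back to the caller


# Made-up sample rows: [title, created_at]
posts = [
    ["Ask HN: a", datetime(2016, 8, 16, 9, 55)],
    ["Ask HN: b", datetime(2016, 8, 16, 9, 10)],
    ["Ask HN: c", datetime(2016, 8, 17, 14, 0)],
]

my_dict = HourCounter(posts).make_table()
print(type(my_dict).__name__, my_dict)
```

Here my_dict is a plain dict, while the HourCounter instance itself is still an object of the class.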
Classes don't return things; their methods can. However, the method in your class doesn't return anything: it just modifies self.dict_name, which the caller is then expected to read. (The name is a misnomer, by the way: it holds a reference to a dict, not a name, which one would expect to be a string.)
In addition, there seems to be a bug; the second if block (which is never reached anyway) refers to row, an undefined name.
Anyway, your class doesn't need to be a class at all, and is most easily implemented with the built-in collections.Counter class:
from collections import Counter

def bin_by_hour(dataset, value_key=None):
    counter = Counter()
    for post in dataset:
        hour = post[-1].hour  # assuming it's a `datetime` object
        if value_key:  # count using `post[value_key]`
            counter[hour] += post[value_key]
        else:  # just count
            counter[hour] += 1
    return dict(counter.items())  # make the Counter a regular dict

freq_pph = bin_by_hour(ask_posts)
freq_cph = bin_by_hour(ask_posts, value_key="num_comments")  # or whatever
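As a quick check of the idea, here is a self-contained sketch with made-up rows. The column layout ([title, num_comments, created_at]) and the integer column index used in place of a string key are assumptions, since the real dataset's layout isn't shown:

```python
from collections import Counter
from datetime import datetime


def bin_by_hour(dataset, value_index=None):
    """Count rows per hour, or sum column `value_index` per hour."""
    counter = Counter()
    for post in dataset:
        hour = post[-1].hour  # assumes the last column is a datetime
        if value_index is None:
            counter[hour] += 1  # just count the row
        else:
            counter[hour] += int(post[value_index])  # sum the chosen column
    return dict(counter)


# Made-up sample rows: [title, num_comments, created_at]
rows = [
    ["Ask HN: a", "3", datetime(2016, 8, 16, 9, 55)],
    ["Ask HN: b", "4", datetime(2016, 8, 16, 9, 10)],
    ["Ask HN: c", "10", datetime(2016, 8, 17, 14, 0)],
]

posts_per_hour = bin_by_hour(rows)
comments_per_hour = bin_by_hour(rows, value_index=1)
print(posts_per_hour)
print(comments_per_hour)
```

The sketch tests value_index against None rather than for truthiness, so that column index 0 would still be treated as a column to sum.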
Hello, sorry if my question has already been answered, I didn't find it...
I have a list of objects called structList
Here is the structure of my objects:
class ResponseStructure():
    def __init__(self, isSucceededRequest: bool, latency: float, date: datetime, comment: str) -> None:
        self.isSucceededRequest = isSucceededRequest
        self.latency = latency
        self.date = date
        self.comment = comment
I'm trying to write a function that iterates over structList and returns a dictionary containing some data.
Here is an example of the output I'd like to have:
{
# The minimum value for all object 'lat' attributes in the object list and the corresponding date
'minLat': (0.2, "12/24/2018-04:59:31"),
# The maximum value for all object 'lat' attributes in the object list and the corresponding date
'maxLat': (4.2, "6/21/2019-05:56:32"),
# The mean value for all object 'lat' attributes in the object list
'meanLat': (0.6),
# Total number of isSuccess==True attributes in the object list
'isSuccessNumber' : 1234,
# Total number of com=="Error" attributes in the object list
'numberOfError' : 123
}
I'm able to write this function; however, I don't know how to do it by iterating only ONCE over the object list.
# HERE IS MY ORIGINAL CODE, I'VE PASTED IT BECAUSE SOMEONE ASKED FOR IT
import statistics
from typing import Dict, List

def generateReport(structList: List[ResponseStructure]) -> Dict:
    report = {}
    # Minimum latency
    minLatStruct = min(structList, key=lambda struct: struct.latency)
    report['minLat'] = (minLatStruct.latency, f"{minLatStruct.date.strftime('%m/%d/%Y-%H:%M:%S')}")
    # Maximum latency
    maxLatStruct = max(structList, key=lambda struct: struct.latency)
    report['maxLat'] = (maxLatStruct.latency, f"{maxLatStruct.date.strftime('%m/%d/%Y-%H:%M:%S')}")
    # Mean latency
    report['meanLat'] = statistics.mean([struct.latency for struct in structList])
    # Number of code 200
    report['isSuccessNumber'] = sum(struct.isSucceededRequest == True for struct in structList)
    # Number of Error
    report['numberOfError'] = sum(struct.comment == "Error" for struct in structList)
    return report
I'm iterating 5 times over the list, and I want to iterate only once, if possible.
Is there a way to do it? Thanks for your answers!
One way to do it is
dataDict = {'minLat': (90, None), 'maxLat':(-90, None), 'isSuccessNumber':0,'numberOfError':0}
# this assumes lat ranges from -90 to 90, and that you have at least one item
count = 0
totalLat = 0
for item in testList:
    count +=1
    totalLat += item.lat
    dataDict['isSuccessNumber'] += item.isSuccess
    dataDict['numberOfError'] += (item.com == 'Error')
    if item.lat > dataDict['maxLat'][0]:
        dataDict['maxLat'] = (item.lat, item.date)
    if item.lat < dataDict['minLat'][0]:
        dataDict['minLat'] = (item.lat, item.date)
dataDict['meanLat'] = totalLat/count
However, I'm not sure this will be faster than doing max([(item.lat, item.date) for item in testList], key = lambda x: x[0]), etc.
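For reference, the same single-pass idea can be sketched with the attribute names from the question. The function name generate_report_single_pass and the sample data are made up for illustration. Seeding the minimum and maximum from the first item, rather than from fixed bounds such as 90 and -90, is a choice made here so that no assumption about the latency range is needed:

```python
from datetime import datetime
from typing import Dict, List


class ResponseStructure:
    def __init__(self, isSucceededRequest: bool, latency: float,
                 date: datetime, comment: str) -> None:
        self.isSucceededRequest = isSucceededRequest
        self.latency = latency
        self.date = date
        self.comment = comment


def generate_report_single_pass(structList: List[ResponseStructure]) -> Dict:
    """Hypothetical single-loop variant of generateReport."""
    if not structList:
        raise ValueError("structList must contain at least one item")
    fmt = "%m/%d/%Y-%H:%M:%S"
    min_item = max_item = structList[0]  # seed from real data, no magic bounds
    total = 0.0
    successes = errors = 0
    for item in structList:
        total += item.latency
        successes += bool(item.isSucceededRequest)
        errors += item.comment == "Error"  # True counts as 1
        if item.latency < min_item.latency:
            min_item = item
        if item.latency > max_item.latency:
            max_item = item
    return {
        "minLat": (min_item.latency, min_item.date.strftime(fmt)),
        "maxLat": (max_item.latency, max_item.date.strftime(fmt)),
        "meanLat": total / len(structList),
        "isSuccessNumber": successes,
        "numberOfError": errors,
    }


# Made-up sample data
sample = [
    ResponseStructure(True, 0.5, datetime(2018, 12, 24, 4, 59, 31), "OK"),
    ResponseStructure(False, 1.5, datetime(2019, 6, 21, 5, 56, 32), "Error"),
    ResponseStructure(True, 0.25, datetime(2019, 1, 2, 3, 4, 5), "OK"),
]
report = generate_report_single_pass(sample)
print(report)
```

Whether this beats five separate passes with min, max and sum is worth measuring on real data, since those built-ins run their loops in C.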
I'm having trouble understanding what I did wrong in my simple code. I've found a "solution" but I would like to understand "why" :)
In general I'm doing a simple "group by" using a dictionary and lists.
When I assign a 2-element list as the value for a key, the first value stored under that key is split into two separate values, while all the other values behave properly and stay 2-element lists.
So I had 4 values when the dictionary was printed:
1301
425
['979', '340']
['1301', '977']
But I was able to fix this to have 3 pair values - the desired outcome:
['1301', '425']
['979', '340']
['1301', '977']
As I said, I was able to fix this issue by using the .setdefault() method to assign the first element instead of a simple assignment, but I don't understand why it makes a difference.
Here is my code; all other aspects of the class are working properly or will work properly in the "future" :)
import os
import copy
from pathlib import Path
class ManageSettings:
    def __init__(self, record_or_not = False,config_path = ".\\settings\\"):
        self.to_record = record_or_not
        self.ini_path = config_path + "\\position_settings.ini" # file path for settings
        path_to_save = Path(config_path)
        path_to_save.mkdir(exist_ok=True)
        self.__position_list = {}
    def save_or_load (self, sequence = 0):
        if(self.to_record):
            pass
        else:
            self.load_settings()
    def save_settings(self, sequence = 0):
        with open(self.ini_path, 'w') as file_writter:
            for posXY in self.position_list:
                file_writter.write("{};{};{}\n".format(sequence, posXY[0], posXY[1]))
    def load_settings(self):
        with open(self.ini_path, 'r') as file_reader:
            sequence_dict = {}
            for line_XY in file_reader:
                sequence, posX, posY = line_XY.strip().split(";")
                XYposList = list((posX, posY))
                print(XYposList)
                if not sequence in sequence_dict:
                    # Here is the part that, if changed, will print a different result
                    # sequence_dict.setdefault(sequence,[]).append( XYposList)
                    sequence_dict[sequence] = XYposList
                else:
                    sequence_dict[sequence].append( XYposList) # self.position_list = sequence_dict
        print ('in class')
        for values in sequence_dict.values():
            print (values)
        self.position_list = sequence_dict
    @property
    def position_list(self):
        return self.__position_list
    @position_list.setter
    def position_list(self, position_list):
        self.__position_list = position_list
This is how you can simply test the class, provided you have the ini file :)
testClas = ManageSettings(False)
testClas.save_or_load()
print("outside the class")
for value in testClas.position_list.values():
    # print (len(value))
    for test in value:
        print (test)
Edit: Hi, here is the content of the ini file
0;1301;425
0;979;340
0;1301;977
This line:
sequence_dict[sequence] = XYposList
should be:
sequence_dict[sequence] = [XYposList]
The values in sequence_dict are supposed to be 2-dimensional lists. But when you're creating the first value, you're not putting it in a list, you're just using XYposList itself as the list. Then subsequent append() calls append to that list. So you start with
['1301', '425']
and the next line appends ['979', '340'] to that, resulting in
['1301', '425', ['979', '340']]
But what you want is for the initial value to be:
[['1301', '425']]
and then it becomes
[['1301', '425'], ['979', '340']]
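The difference can be reproduced in a few lines. This is a sketch using the three ini lines from the question as an in-memory list; the variable names flat and grouped are made up, and collections.defaultdict(list) is used as one alternative to setdefault, not something the original code used:

```python
from collections import defaultdict

lines = ["0;1301;425", "0;979;340", "0;1301;977"]

# Behaviour from the question: the first pair is stored as-is, so later
# pairs are appended *into* that pair.
flat = {}
for line in lines:
    sequence, pos_x, pos_y = line.split(";")
    pair = [pos_x, pos_y]
    if sequence not in flat:
        flat[sequence] = pair
    else:
        flat[sequence].append(pair)

# Grouping behaviour: every key starts as an empty list of pairs,
# and each pair is appended to it.
grouped = defaultdict(list)
for line in lines:
    sequence, pos_x, pos_y = line.split(";")
    grouped[sequence].append([pos_x, pos_y])

print(flat["0"])
print(grouped["0"])
```

Iterating over flat["0"] yields four items (two strings and two lists), which matches the four printed values in the question, while grouped["0"] yields the three pairs.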
I have an array of object of class Person like the below, with thisRate first set to None:
class Person(object):
def __init__(self, id, name):
self.id = id
self.name = name
self.thisRate= None
I loaded around 21K Person objects into an array, name not sorted.
Then I loaded another array from data in a file which has data for thisRate, about 13K of them, name is not sorted as well:
person_data = []
# read from file
row['name'] = 'Peter'
row['thisRate'] = '0.12334'
person_data.append(row)
Now, with these 2 arrays, whenever a name matches between them, I want to assign thisRate from person_data to Person.thisRate.
What I am doing is a loop like this:
for person in persons:
    data = None
    try:
        data = next(personData for personData in person_data
                    if personData['name'] == person.name)
    except StopIteration:
        print("No rate for this person: {}".format(person.name))
    if data:
        person.thisRate = float( data['thisRate'] )
This loop
data = next(personData for personData in person_data
            if personData['name'] == person.name)
is running fine and uses 21 seconds on my machine with Python 2.7.13.
My question is, is there a faster or better way to achieve the same thing with the 2 arrays I have?
Yes. Make a dictionary mapping name to thisRate:
import csv

nd = {}
with open(<whatever>) as f:
    reader = csv.DictReader(<whatever>)
    for row in reader:
        nd[row['name']] = row['thisRate']
Now, use this dictionary to do a single pass over your Person list:
for person in persons:
    thisRate = nd.get(person.name, None)
    person.thisRate = thisRate
    if thisRate is None:
        print("No rate for this person: {}".format(person.name))
Dictionaries have a .get method which lets you provide a default value in case the key is not in the dict. I used None (which is already the default when no second argument is given), but you can use whatever you want.
This is a linear-time solution. Your solution was quadratic time, because you are essentially doing:
for person in persons:
    for data in person_data:
        if data['name'] == person.name:
            person.thisRate = data['thisRate']
            break
    else:
        print("No rate for this person: {}".format(person.name))
Your version just obscures this fundamentally nested for-loop inside a generator expression. That is not really a good use case for a generator expression; if you had used a plain for-loop to begin with, you wouldn't have to deal with catching StopIteration.
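Put together as something runnable, the dictionary approach might look like the sketch below. The sample persons and rows are made up, the rows are held in memory instead of being read from a file, and the float conversion from the question's loop is applied while building the dictionary:

```python
class Person(object):
    def __init__(self, id, name):
        self.id = id
        self.name = name
        self.thisRate = None


# Made-up sample data standing in for the ~21K persons and ~13K rows
persons = [Person(1, "Peter"), Person(2, "Mary"), Person(3, "Paul")]
person_data = [
    {"name": "Mary", "thisRate": "0.5"},
    {"name": "Peter", "thisRate": "0.12334"},
]

# One pass over person_data to build the lookup table ...
rate_by_name = {row["name"]: float(row["thisRate"]) for row in person_data}

# ... and one pass over persons, with a constant-time lookup per person.
missing = []
for person in persons:
    person.thisRate = rate_by_name.get(person.name)
    if person.thisRate is None:
        missing.append(person.name)
        print("No rate for this person: {}".format(person.name))
```

One behavioural difference to keep in mind: if a name appears more than once in person_data, the dictionary keeps the last row, whereas the next(...) version picks the first.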
I'm loading data about phone calls into a list of namedtuples called 'records'. Each phone call has information on the length of the call in the variable 'call_duration'. However, some have the variable set to None. I would like to replace None with zero in all of the records, but the following code doesn't seem to work:
for r in records:
    if r.call_duration is None:
        r = r._replace(call_duration=0)
How can I replace the value in the list? I guess the problem is that the new 'r' isn't stored in the list. What would be the best way to capture the change in the list?
You can replace the old record by using its index in the records list. You can get that index using enumerate():
for i, rec in enumerate(records):
    if rec.call_duration is None:
        records[i] = rec._replace(call_duration=0)
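The reason the original loop had no effect is that namedtuples are immutable: _replace() returns a new tuple, and assigning it to the loop variable does not touch the list. A self-contained sketch follows; the CallRecord type and its fields are made up, since the question doesn't show the real definition. It builds a new list with a list comprehension as an alternative to index assignment:

```python
from collections import namedtuple

# Hypothetical record type; only call_duration comes from the question
CallRecord = namedtuple("CallRecord", ["number", "call_duration"])

records = [
    CallRecord("555-0100", 42),
    CallRecord("555-0101", None),
    CallRecord("555-0102", None),
]

# _replace returns a *new* namedtuple, so collect the results in a new list
records = [
    r._replace(call_duration=0) if r.call_duration is None else r
    for r in records
]

durations = [r.call_duration for r in records]
print(durations)
```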
I suggest you create your own class; it will benefit you in the future as far as object management goes. When you later want to add methods to a record, you'll be able to do so easily in a class:
class Record:
    def __init__(self, number = None, length = None):
        self.number = number
        self.length = length
    def replace(self, **kwargs):
        self.__dict__.update(kwargs)
Now you can easily manage your records and replace object attributes as you deem necessary.
for r in records:
    if r.length is None:
        r.replace(length = 0)
I have a number of chemicals with corresponding data held within a database. How do I go about returning a specific chemical and its data via its formula, e.g. o2?
class SourceNotDefinedException(Exception):
    def __init__(self, message):
        super(SourceNotDefinedException, self).__init__(message)
class TVorEchoObject(object):
    """The class stores a pair of objects, "tv" objects, and "echo" objects. They are accessed
    simply by doing .tv, or .echo. If it does not exist, it will fall back to the other variable.
    If neither are present, it returns None."""
    def __init__(self, echo=None, tv=None):
        self.tv = tv
        self.echo = echo
    def __repr__(self):
        return str({"echo": self.echo, "tv": self.tv}) # Returns the respective strings
    def __getattribute__(self, item):
        """Altered __getattribute__() function to return the alternative of .echo / .tv if the requested
        attribute is None."""
        if item in ["echo", "tv"]:
            if object.__getattribute__(self,"echo") is None: # Echo data not present
                return object.__getattribute__(self,"tv") # Select TV data
            elif object.__getattribute__(self,"tv") is None: # TV data not present
                return object.__getattribute__(self,"echo") # Select Echo data
            else:
                return object.__getattribute__(self,item) # Return all data
        else:
            return object.__getattribute__(self,item) # Return all data
class Chemical(object):
    def __init__(self, inputLine, sourceType=None):
        self.chemicalName = TVorEchoObject()
        self.mass = TVorEchoObject()
        self.charge = TVorEchoObject()
        self.readIn(inputLine, sourceType=sourceType)
    def readIn(self, inputLine, sourceType=None):
        if sourceType.lower() == "echo": # Parsed chemical line for Echo format
            chemicalName = inputLine.split(":")[0].strip()
            mass = inputLine.split(":")[1].split(";")[0].strip()
            charge = inputLine.split(";")[1].split("]")[0].strip()
            # Store the objects
            self.chemicalName.echo = chemicalName
            self.mass.echo = mass
            self.charge.echo = charge
        elif sourceType.lower() == "tv": # Parsed chemical line for TV format
            chemicalName = inputLine.split(":")[0].strip()
            charge = inputLine.split(":")[1].split(";")[0].strip()
            mass = inputLine.split(";")[1].split("&")[0].strip()
            # Store the objects
            self.chemicalName.tv = chemicalName
            self.charge.tv = charge
            self.mass.tv = mass
        else:
            raise SourceNotDefinedException(sourceType + " is not a valid `sourceType`") # Otherwise print
    def toDict(self, priority="echo"):
        """Returns a dictionary of all the variables, in the form {"mass":<>, "charge":<>, ...}.
        Design used is to be passed into the Echo and TV style line format statements."""
        if priority in ["echo", "tv"]:
            # Creating the dictionary by a large, to avoid repeated text
            return dict([(attributeName, self.__getattribute__(attributeName).__getattribute__(priority))
                         for attributeName in ["chemicalName", "mass", "charge"]])
        else:
            raise SourceNotDefinedException("{0} source type not recognised.".format(priority)) # Otherwise print
from ParseClasses import Chemical
allChemical = []
chemicalFiles = ("/home/temp.txt",)  # trailing comma makes this a tuple, not a plain string
for fileName in chemicalFiles:
    with open(fileName) as sourceFile:
        for line in sourceFile:
            allChemical.append(Chemical(line, sourceType=sourceType))
for chemical in allChemical:
    print chemical.chemicalName #Prints all chemicals and their data in list format
for chemical in allChemical(["o2"]):
    print chemical.chemicalName
This outputs the following error, which I have tried to remedy with no luck:
TypeError: 'list' object is not callable
The issue is the two lines
for chemical in allChemical(["o2"]):
    print chemical.chemicalName
allChemical is a list, and you can't just do a_list(). It looks like you're trying to find either ['o2'] or just 'o2' in a list. To do that, you can get the index of the item and then get that index from the list.
allChemical[allChemical.index("o2")]
Try this function:
def chemByString(chemName,chemicals,priority="echo"):
    for chemical in chemicals:
        chemDict = chemical.toDict(priority)
        if chemDict["chemicalName"] == chemName:
            return chemical
    return None
This function is using the toDict() method found in the Chemical class. The code you pasted from the Chemical class explains that this method returns a dictionary from the chemical object:
def toDict(self, priority="echo"):
    """Returns a dictionary of all the variables, in the form {"mass":<>, "charge":<>, ...}.
    Design used is to be passed into the Echo and TV style line format statements."""
    if priority in ["echo", "tv"]:
        # Creating the dictionary by a large, to avoid repeated text
        return dict([(attributeName, self.__getattribute__(attributeName).__getattribute__(priority))
                     for attributeName in ["chemicalName", "mass", "charge"]])
    else:
        raise SourceNotDefinedException("{0} source type not recognised.".format(priority)) # Otherwise print
This dictionary looks like this:
"chemicalName" : <the chemical name>
"mass" : <the mass>
"charge" : <the charge>
What the function above does is iterate through all of the chemicals in the list, find the first one with a name equal to "o2", and return that chemical. Here's how to use it:
chemByString("o2",allChemical).chemicalName
If the above does not work, you may want to try the alternative priority ("tv"), though I'm unsure if this will have any effect:
chemByString("o2",allChemical,"tv").chemicalName
If the chemical isn't found, the function returns None, so check the result before accessing .chemicalName on it:
chemByString("myPretendChemical",allChemical) # returns None
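The same lookup can be sketched in a self-contained way. SimpleChemical below is a made-up stand-in for the question's Chemical class, with plain string attributes instead of TV/echo pairs, and the sample values are invented; the point is only to show searching by attribute and guarding against a missing result:

```python
class SimpleChemical(object):
    """Hypothetical stand-in for the question's Chemical class."""

    def __init__(self, chemicalName, mass, charge):
        self.chemicalName = chemicalName
        self.mass = mass
        self.charge = charge


# Made-up sample data
allChemical = [
    SimpleChemical("n2", "28.0", "0"),
    SimpleChemical("o2", "32.0", "0"),
]


def find_chemical(name, chemicals):
    """Return the first chemical whose chemicalName equals name, else None."""
    return next((c for c in chemicals if c.chemicalName == name), None)


found = find_chemical("o2", allChemical)
if found is not None:  # guard: a missing chemical gives None
    print(found.chemicalName, found.mass, found.charge)

not_found = find_chemical("myPretendChemical", allChemical)

# For many lookups, an index built once avoids rescanning the list each time
by_name = {c.chemicalName: c for c in allChemical}
```

With the index, by_name.get("o2") returns the matching object directly and by_name.get("myPretendChemical") returns None.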
EDIT: See my new answer. Leaving this one here since it might still be helpful info.
In python, a list object is a structure holding other objects with an index for each object it contains. Like this:
Index Object
0 "hello"
1 "world"
2 "spam"
If you want to get to one of those objects, you have to know its index:
objList[0] #returns "hello" string object
If you don't know the index, you can find it using the index method:
objList.index("hello") #returns 0
Then you can get the object out of the list using the found index:
objList[objList.index("hello")]
However this is kind of silly, since you can just do:
"hello"
Which in this case will produce the same result.
Your allChemical object is a list. It looks like the line chemicalFiles = ("/home/temp.txt") is filling your list with some type of object. In order to answer your question, you have to provide more information about the objects which the list contains. I assume that information is in the ParseClasses module you are using.
If you can provide more information about the Chemical object you are importing, that may go a long way to helping solve your problem.
IF the objects contained in your list are subclassed from str, this MAY work:
allChemical[allChemical.index("o2")].chemicalName
"o2" is a str object, so index is going to look for a str object (or an object subclassed from str) in your list to find its index. However, if the object isn't a string, it will not find it.
As a learning exercise, try this:
class Chemical(str):
    '''A class which is a subclass of string but has additional attributes such as chemicalName'''
    def __init__(self,chemicalName):
        self.chemicalName = chemicalName

someChemicals = [Chemical('o2'),Chemical('n2'),Chemical('h2')]
for chemical in someChemicals: print(chemical.chemicalName)
#prints all the chemical names
print(someChemicals[0].chemicalName)
#prints "o2"; notice you have to know the index ahead of time
print(someChemicals[someChemicals.index("o2")].chemicalName)
#prints "o2" again; this time index found it for you, but
#you already knew the object ahead of time anyway, so it's a little silly
This works because index is able to find what you are looking for. If the object isn't a string, index can't find it, and if you don't know which index 'o2' is at, you'll have to learn more about the objects in your list before you can get to a specific chemical.