All instances of maximum - python

I have a function that gets a set of data from a server, sorts and displays it:
def load_data(dateStr):
    data = get_date(dateStr).splitlines()
    result = []
    for c in data:
        a = c.split(',')
        time = a[0]
        temp = float(a[1])
        solar = float(a[2])
        kwH = a[3:]
        # convert the remaining power readings to int
        i = 0
        while i < len(kwH):
            kwH[i] = int(kwH[i])
            i = i+1
        result.append((time, temp, solar, tuple(kwH)))
    return result
This is what the function returns when you enter a particular date (only 3 entries out of a long list are shown). The first number in each entry is the time, the second is the temperature.
>>> load_data('20-01-2014')
[('05:00', 19.9, 0.0, (0, 0, 0, 0, 0, 0, 0, 18, 34)), ('05:01', 19.9, 0.0, (0, 0, 0, 0, 0, 0, 0, 20, 26)), ('05:02', 19.9, 0.0, (0, 0, 0, 0, 0, 0, 0, 17, 35))
I need to write a function to find the maximum temperature of a date, and show all of the times in the day that the maximum occurred. Something like this:
>>> data = load_data('07-10-2011')
>>> max_temp(data)
(18.9, ['13:08', '13:09', '13:10'])
How would I go about this? Or can you point me to anywhere that might have answers?

This is one way to do it (this loops over the data twice):
>>> data = [('05:00', 19.9, 0.0, (0, 0, 0, 0, 0, 0, 0, 18, 34)), ('05:01', 19.9, 0.0, (0, 0, 0, 0, 0, 0, 0, 20, 26)), ('05:02', 19.9, 0.0, (0, 0, 0, 0, 0, 0, 0, 17, 35))]
>>> max_temp = max(data, key=lambda x: x[1])[1]
>>> max_temp
19.9
>>> result = [item for item in data if item[1] == max_temp]
>>> result
[('05:00', 19.9, 0.0, (0, 0, 0, 0, 0, 0, 0, 18, 34)), ('05:01', 19.9, 0.0, (0, 0, 0, 0, 0, 0, 0, 20, 26)), ('05:02', 19.9, 0.0, (0, 0, 0, 0, 0, 0, 0, 17, 35))]

The most efficient way to get all matching times for the maximum temperature is to loop over the values once and track the maximum found so far:
def max_temp(data):
    maximum = float('-inf')
    times = []
    for entry in data:
        time, temp = entry[:2]
        if temp == maximum:
            # another occurrence of the current maximum
            times.append(time)
        elif temp > maximum:
            # new maximum: start the list of times over
            maximum = temp
            times = [time]
    return maximum, times
This loops over the data just once.
The convenient way (which is probably going to be close in performance anyway) is to use the max() function to find the maximum temperature first, then a list comprehension to return all times with that temperature:
def max_temp(data):
    maximum = max(data, key=lambda e: e[1])[1]
    return maximum, [e[0] for e in data if e[1] == maximum]
This loops twice over the data, but the max() loop is implemented mostly in C code.
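As a quick check of the two approaches in this answer, here is a small sketch that runs both on made-up sample data. The function names max_temp_single_pass and max_temp_two_pass and the sample entries are assumptions for illustration only; the entries merely follow the (time, temp, solar, power) layout from the question.

```python
def max_temp_single_pass(data):
    # Track the running maximum and the times at which it occurred
    maximum = float('-inf')
    times = []
    for entry in data:
        time, temp = entry[:2]
        if temp == maximum:
            times.append(time)
        elif temp > maximum:
            maximum = temp
            times = [time]
    return maximum, times


def max_temp_two_pass(data):
    # First pass finds the maximum, second pass collects matching times
    maximum = max(data, key=lambda e: e[1])[1]
    return maximum, [e[0] for e in data if e[1] == maximum]


# Made-up sample entries in the same layout as load_data() output
sample = [
    ('13:07', 18.7, 0.0, ()),
    ('13:08', 18.9, 0.0, ()),
    ('13:09', 18.9, 0.0, ()),
    ('13:10', 18.9, 0.0, ()),
    ('13:11', 18.5, 0.0, ()),
]

print(max_temp_single_pass(sample))  # (18.9, ['13:08', '13:09', '13:10'])
print(max_temp_two_pass(sample))     # (18.9, ['13:08', '13:09', '13:10'])
```

One difference worth knowing: for an empty list the single-pass version returns (-inf, []), while the max()-based version raises ValueError.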

def max_temp(data):
    maxt = max([d[1] for d in data])
    return (maxt, [d[0] for d in data if d[1] == maxt])

Related

Run for loop on 2 variables from dataframe column

I need to run a for loop over 2 columns of a dataframe and return a dict. But when I use zip, I get only part of the string the loop is running on.
import pandas as pd
def split(owner, cost):
    split_bill = {'ads': 0, 'qaweb': 0, 'ovt': 0, 'cs': 0, 'edu': 0, 'xms': 0, 'cc': 0}
    for owner_in, cost in zip(owner, cost):  # need to know what type of loop can work here
        split_bill[owner_in] += cost
        continue
    return split_bill
data = {
"owner": ['ads', 'cs', 'edu'],
"cost": [2.3, 4.30, 45]
}
df = pd.DataFrame(data)
df['metric'] = df.apply(lambda x: split(x.owner, {x.cost}), axis=1)
Expected output
df['metric'] =
metric
{'ads': 2.3, 'qaweb': 0, 'ovt': 0, 'cs': 0, 'edu': 0, 'xms': 0, 'cc': 0}
{'ads': 2.3, 'qaweb': 0, 'ovt': 0, 'cs': 4.3, 'edu': 0, 'xms': 0, 'cc': 0}
{'ads': 2.3, 'qaweb': 0, 'ovt': 0, 'cs': 0, 'edu': 45, 'xms': 0, 'cc': 0}
In the for loop, owner_in takes only the a of ads, when it should take the whole string ads.
Can you help with what type of loop could work?
zip combines several iterables into a sequence of tuples. The length of the result is determined by the shortest of those iterables.
In your example, owner is the string ads and cost is a set with one float value. In zip(owner, cost), the string is treated as a sequence of three characters, so the result has length 1, determined by the set, which holds only one float value. That is why owner_in is a.
I guess you may want to do df.groupby('owner')['cost'].apply(sum).
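To make the zip behaviour concrete, here is a minimal sketch using only plain Python. The first part reproduces what happens inside split for a single row; the second part is one possible rewrite without the loop, assuming the goal is one dict per row. The names TEMPLATE_KEYS and rows are hypothetical and not from the question.

```python
# What split() actually receives for the first row: a string and a one-element set
owner = 'ads'
cost = {2.3}
pairs = list(zip(owner, cost))
print(pairs)  # [('a', 2.3)] -- zip stops after the shortest input

# A possible loop-free variant, assuming owner is one key and cost one number
TEMPLATE_KEYS = ('ads', 'qaweb', 'ovt', 'cs', 'edu', 'xms', 'cc')


def split(owner, cost):
    bill = dict.fromkeys(TEMPLATE_KEYS, 0)  # fresh dict for every row
    bill[owner] += cost
    return bill


# Hypothetical (owner, cost) rows standing in for the dataframe columns
rows = [('ads', 2.3), ('cs', 4.30), ('edu', 45)]
metrics = [split(o, c) for o, c in rows]
for m in metrics:
    print(m)
```

With pandas, passing x.cost instead of {x.cost} to a function like this inside df.apply(..., axis=1) should presumably give the same per-row dicts.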

Plotting by ignoring missing data in matplotlib

I have been trying to make a program that plots how often a word is used in WhatsApp chats between 2 people. The word night, for example, has been used a couple of times on a few days, and 0 times on most days. The graph I have is as follows
Here is the code
word_occurances = [0 for i in range(len(just_dates))]
for i in range(len(just_dates)):
    for j in range(len(df_word)):
        if just_dates[i].date() == word_date[j].date():
            word_occurances[i] += 1
title = person2.rstrip(':') + ' with ' + person1.rstrip(':') + ' usage of the word - ' + word
plt.plot(just_dates, word_occurances, color = 'purple')
plt.gcf().autofmt_xdate()
plt.xlabel('Time')
plt.ylabel('number of times used')
plt.title(title)
plt.savefig('Graphs/Words/' + title + '.jpg', dpi = 200)
plt.show()
word_occurances is a list
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 2, 0, 0, 0, 1, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
What I want is for the graph to connect only the points where the word has been used, while showing the entire timeline on the x axis. I don't want the graph to touch 0. How can I do this? I have searched and found similar answers, but none have worked the way I wanted them to.
You simply have to find the indices of word_occurances on which the corresponding value is greater than zero. With this you can index just_dates to get the corresponding dates.
word_counts = [] # Only word counts > 0
dates = [] # Date of > 0 word count
for i, val in enumerate(word_occurances):
    if val > 0:
        word_counts.append(val)
        dates.append(just_dates[i])
You may want to plot with an underlying bar plot in order to maintain the original scale.
plt.bar(just_dates, word_occurances)
plt.plot(dates, word_counts, 'r--')
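The filtering step in this answer can also be written by pairing the two lists with zip. The following is a minimal sketch on made-up data; the start date and the short word_occurances list are assumptions standing in for the question's real lists.

```python
from datetime import date, timedelta

# Hypothetical stand-ins for the question's just_dates and word_occurances
start = date(2021, 1, 1)
word_occurances = [0, 0, 1, 0, 3, 0, 0, 2]
just_dates = [start + timedelta(days=i) for i in range(len(word_occurances))]

# Keep only the (date, count) pairs whose count is positive
pairs = [(d, n) for d, n in zip(just_dates, word_occurances) if n > 0]

# Split the pairs back into two parallel lists for plotting
dates = [d for d, n in pairs]
word_counts = [n for d, n in pairs]

print(dates)
print(word_counts)  # [1, 3, 2]
```

The resulting dates and word_counts can then be passed to plt.plot in the same way as in the answer.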
One way to address this is to plot only data that contain entries but label all dates where a conversation took place to indicate the zero values in your graph:
from matplotlib import pyplot as plt
import matplotlib.dates as mdates
from matplotlib.ticker import FixedLocator
#fake data generation, this block just imitates your unknown data and can be deleted
import numpy as np
import pandas as pd
np.random.seed(12345)
n = 30
just_dates = pd.to_datetime(np.random.randint(1, 100, n)+18500, unit="D").sort_values().to_list()
word_occurances = [0]*n
for i in range(10):
    word_occurances[np.random.randint(n)] = np.random.randint(1, 10)
fig, ax = plt.subplots(figsize=(15,5))
#generate data to plot by filtering out zero values
plot_data = [(just_dates[i], word_occurances[i]) for i, num in enumerate(word_occurances) if num > 0]
#plot these data with a marker to indicate each point
#(with lines only, a run such as 1-1-1-1-1 would show as a flat segment with no visible points)
ax.plot(*zip(*plot_data), color = 'purple', marker="o")
#label all dates where conversations took place
ax.xaxis.set_major_locator(FixedLocator(mdates.date2num(just_dates)))
#prevent that matplotlib autoscales the y-axis
ax.set_ylim(0, )
ax.tick_params(axis="x", labelrotation= 90)
plt.xlabel('Time')
plt.ylabel('number of times used')
plt.title("Conversations at night")
plt.tight_layout()
plt.show()
This can get quite busy soon with all these date labels (and might or might not work with your datetime objects in just_dates, which might differ in structure from my sample data). Another way would be to indicate each conversation with vlines:
...
fig, ax = plt.subplots(figsize=(15,5))
plot_data = [(just_dates[i], word_occurances[i]) for i, num in enumerate(word_occurances) if num > 0]
ax.plot(*zip(*plot_data), color = 'purple', marker="o")
ax.vlines((just_dates), 0, max(word_occurances), color="red", ls="--")
ax.set_ylim(0, )
plt.gcf().autofmt_xdate()
plt.xlabel('Time')
plt.ylabel('number of times used')
plt.title("Conversations at night")
plt.tight_layout()
plt.show()

Convert complex list to flat list

I have a long list composed of numpy arrays and integers, below is an example:
[array([[2218.67288865]]), array([[1736.90215229]]), array([[1255.13141592]]), array([[773.36067956]]), array([[291.58994319]]), 0, 0, 0, 0, 0, 0, 0, 0, 0]
and I'd like to convert it to a regular list like so:
[2218.67288865, 1736.90215229, 1255.13141592, 773.36067956, 291.58994319, 0, 0, 0, 0, 0, 0, 0, 0, 0]
How can I do that efficiently?
You can use a generator for flattening the nested list:
def convert(obj):
    try:
        for item in obj:
            yield from convert(item)
    except TypeError:
        yield obj
result = list(convert(data))
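Here is a self-contained sketch of this recursive generator, with plain nested lists standing in for the numpy arrays (an assumption made so the example runs without numpy). It also adds a guard for strings, which is not in the original answer: iterating a one-character string yields the same string again, so without the guard the recursion would never terminate.

```python
def convert(obj):
    # Strings are iterable but yield strings again, so treat them as leaves
    if isinstance(obj, (str, bytes)):
        yield obj
        return
    try:
        for item in obj:
            yield from convert(item)
    except TypeError:
        # Not iterable: a scalar leaf such as an int or float
        yield obj


# Nested lists standing in for arrays of shape (1, 1), mixed with bare ints
data = [[[2218.67]], [[1736.9]], [[1255.13]], 0, 0, 0]
result = list(convert(data))
print(result)  # [2218.67, 1736.9, 1255.13, 0, 0, 0]
```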
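list(itertools.chain.from_iterable(itertools.chain.from_iterable(...))) should work for removing 2 levels of nesting, provided every element is itself iterable at each level (bare integers such as the 0 entries here would first need to be wrapped): just add or remove copies of itertools.chain.from_iterable(...) as needed.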
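A minimal sketch of this chaining approach, assuming the function meant is itertools.chain.from_iterable from the standard library. Nested lists stand in for the numpy arrays, and the variable names are made up. The second half shows that a bare integer in the list makes this approach fail, because an int cannot be iterated.

```python
from itertools import chain

# Two levels of nesting around each value, like arrays of shape (1, 1)
nested = [[[2218.67]], [[1736.9]], [[1255.13]]]
flat = list(chain.from_iterable(chain.from_iterable(nested)))
print(flat)  # [2218.67, 1736.9, 1255.13]

# A bare int at the top level is not iterable, so chaining raises TypeError
mixed = [[[2218.67]], 0]
failed = False
try:
    list(chain.from_iterable(chain.from_iterable(mixed)))
except TypeError as exc:
    failed = True
    print("cannot flatten a bare int:", exc)
```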
Here the simplest seems to also be the fastest:
x = [array([[2218.67288865]]), array([[1736.90215229]]), array([[1255.13141592]]), array([[773.36067956]]), array([[291.58994319]]), 0, 0, 0, 0, 0, 0, 0, 0, 0]
[y if y.__class__==int else y.item(0) for y in x]
# [2218.67288865, 1736.90215229, 1255.13141592, 773.36067956, 291.58994319, 0, 0, 0, 0, 0, 0, 0, 0, 0]
timeit(lambda:[y if y.__class__==int else y.item(0) for y in x])
# 2.198630048893392
You can stick to numpy by using np.ravel:
np.hstack([np.ravel(i) for i in l]).tolist()
Output:
[2218.67288865,
1736.90215229,
1255.13141592,
773.36067956,
291.58994319,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0]

how to call class methods inside list comprehension

This is a general question, but I am providing an example from my case. I have a class named "Descriptors", which I import as follows:
from rdkit.Chem import Descriptors
Descriptors has a number of Methods; for example:
Descriptors.MolWt()
Descriptors.HeavyAtomCount()
I can get a list of methods for Descriptors as follows:
names=[ x[0] for x in Descriptors._descList]
names
['MaxEStateIndex',
'MinEStateIndex',
'MaxAbsEStateIndex',
'MinAbsEStateIndex',
'qed',
'MolWt',
'HeavyAtomMolWt',
'ExactMolWt',
'NumValenceElectrons',
'NumRadicalElectrons',
'MaxPartialCharge',
'MinPartialCharge',
'MaxAbsPartialCharge',
'MinAbsPartialCharge',
'FpDensityMorgan1',
'FpDensityMorgan2',
'FpDensityMorgan3',
'BalabanJ',
'BertzCT',
'Chi0',
'Chi0n',
'Chi0v',
'Chi1',
'Chi1n',
'Chi1v',
'Chi2n',
'Chi2v',
'Chi3n',
'Chi3v',
'Chi4n',
'Chi4v',
'HallKierAlpha',
'Ipc',
'Kappa1',
'Kappa2',
'Kappa3',
'LabuteASA',
'PEOE_VSA1',
'PEOE_VSA10',
'PEOE_VSA11',
'PEOE_VSA12',
'PEOE_VSA13',
'PEOE_VSA14',
'PEOE_VSA2',
'PEOE_VSA3',
'PEOE_VSA4',
'PEOE_VSA5',
'PEOE_VSA6',
'PEOE_VSA7',
'PEOE_VSA8',
'PEOE_VSA9',
'SMR_VSA1',
'SMR_VSA10',
'SMR_VSA2',
'SMR_VSA3',
'SMR_VSA4',
'SMR_VSA5',
'SMR_VSA6',
'SMR_VSA7',
'SMR_VSA8',
'SMR_VSA9',
'SlogP_VSA1',
'SlogP_VSA10',
'SlogP_VSA11',
'SlogP_VSA12',
'SlogP_VSA2',
'SlogP_VSA3',
'SlogP_VSA4',
'SlogP_VSA5',
'SlogP_VSA6',
'SlogP_VSA7',
'SlogP_VSA8',
'SlogP_VSA9',
'TPSA',
'EState_VSA1',
'EState_VSA10',
'EState_VSA11',
'EState_VSA2',
'EState_VSA3',
'EState_VSA4',
'EState_VSA5',
'EState_VSA6',
'EState_VSA7',
'EState_VSA8',
'EState_VSA9',
'VSA_EState1',
'VSA_EState10',
'VSA_EState2',
'VSA_EState3',
'VSA_EState4',
'VSA_EState5',
'VSA_EState6',
'VSA_EState7',
'VSA_EState8',
'VSA_EState9',
'FractionCSP3',
'HeavyAtomCount',
'NHOHCount',
'NOCount',
'NumAliphaticCarbocycles',
'NumAliphaticHeterocycles',
'NumAliphaticRings',
'NumAromaticCarbocycles',
'NumAromaticHeterocycles',
'NumAromaticRings',
'NumHAcceptors',
'NumHDonors',
'NumHeteroatoms',
'NumRotatableBonds',
'NumSaturatedCarbocycles',
'NumSaturatedHeterocycles',
'NumSaturatedRings',
'RingCount',
'MolLogP',
'MolMR',
'fr_Al_COO',
'fr_Al_OH',
'fr_Al_OH_noTert',
'fr_ArN',
'fr_Ar_COO',
'fr_Ar_N',
'fr_Ar_NH',
'fr_Ar_OH',
'fr_COO',
'fr_COO2',
'fr_C_O',
'fr_C_O_noCOO',
'fr_C_S',
'fr_HOCCN',
'fr_Imine',
'fr_NH0',
'fr_NH1',
'fr_NH2',
'fr_N_O',
'fr_Ndealkylation1',
'fr_Ndealkylation2',
'fr_Nhpyrrole',
'fr_SH',
'fr_aldehyde',
'fr_alkyl_carbamate',
'fr_alkyl_halide',
'fr_allylic_oxid',
'fr_amide',
'fr_amidine',
'fr_aniline',
'fr_aryl_methyl',
'fr_azide',
'fr_azo',
'fr_barbitur',
'fr_benzene',
'fr_benzodiazepine',
'fr_bicyclic',
'fr_diazo',
'fr_dihydropyridine',
'fr_epoxide',
'fr_ester',
'fr_ether',
'fr_furan',
'fr_guanido',
'fr_halogen',
'fr_hdrzine',
'fr_hdrzone',
'fr_imidazole',
'fr_imide',
'fr_isocyan',
'fr_isothiocyan',
'fr_ketone',
'fr_ketone_Topliss',
'fr_lactam',
'fr_lactone',
'fr_methoxy',
'fr_morpholine',
'fr_nitrile',
'fr_nitro',
'fr_nitro_arom',
'fr_nitro_arom_nonortho',
'fr_nitroso',
'fr_oxazole',
'fr_oxime',
'fr_para_hydroxylation',
'fr_phenol',
'fr_phenol_noOrthoHbond',
'fr_phos_acid',
'fr_phos_ester',
'fr_piperdine',
'fr_piperzine',
'fr_priamide',
'fr_prisulfonamd',
'fr_pyridine',
'fr_quatN',
'fr_sulfide',
'fr_sulfonamd',
'fr_sulfone',
'fr_term_acetylene',
'fr_tetrazole',
'fr_thiazole',
'fr_thiocyan',
'fr_thiophene',
'fr_unbrch_alkane',
'fr_urea']
Now, I want to define a function to return all the Descriptors methods as a list and I am trying the following:
def fingerprint_all():
    names=[ x[0] for x in Descriptors._descList]
    features=[Descriptors.name() for name in names]
    return features
However, when I call the function, it raises an error:
print (fingerprint_all())
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-16-a36092bb806c> in <module>()
23 return features
24
---> 25 print (fingerprint_all())
<ipython-input-16-a36092bb806c> in fingerprint_all()
20 def fingerprint_all():
21 names=[ x[0] for x in Descriptors._descList]
---> 22 features=[Descriptors.name() for name in names]
23 return features
24
<ipython-input-16-a36092bb806c> in <listcomp>(.0)
20 def fingerprint_all():
21 names=[ x[0] for x in Descriptors._descList]
---> 22 features=[Descriptors.name() for name in names]
23 return features
24
AttributeError: module 'rdkit.Chem.Descriptors' has no attribute 'name'
I am not familiar with OO and classes and I really appreciate your help!
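What you are trying to do does not work in Python: Descriptors.name does not use the value of the variable name, it looks for an attribute literally called name, which is why you get the AttributeError. Use getattr instead: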
features = [getattr(Descriptors, name) for name in names]
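To show the getattr lookup in isolation, here is a small sketch that uses the standard library's math module as a stand-in for Descriptors (an assumption, so that it runs without rdkit). Note that getattr only returns the function object; it still has to be called with an argument to get a value.

```python
import math

# Names of functions to look up, analogous to the names list in the question
names = ['sqrt', 'floor', 'ceil']

# getattr(module, name) returns the function object for each name
functions = [getattr(math, name) for name in names]

# Call each looked-up function on the same input
values = [func(6.25) for func in functions]
print(values)  # [2.5, 6, 7]
```

For the rdkit case this would presumably look like getattr(Descriptors, name)(mol) for some molecule mol, since the descriptor functions take a molecule as their argument.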
If I see it right, you want to calculate all descriptors for a mol at once.
from rdkit import Chem
from rdkit.Chem import Descriptors
from rdkit.ML.Descriptors import MoleculeDescriptors
mol = Chem.MolFromSmiles('c1ccccc1O')
allDes = [d[0] for d in Descriptors._descList]
calc = MoleculeDescriptors.MolecularDescriptorCalculator(allDes)
c = calc.CalcDescriptors(mol)
print(c)
And you will get all calculated descriptors for the mol.
(8.632222222222222, 0.3217592592592595, 8.632222222222222, 0.3217592592592595, 0.514729544768675, 94.11299999999999, 88.06499999999998, 94.041864812, 36, 0, 0.11507481947527982, -0.5079669948663066, 0.5079669948663066, 0.11507481947527982, 1.0, 1.5714285714285714, 1.8571428571428572, 3.0214653097240864, 134.10736969541455, 5.112884175122364, 3.833964941448087, 3.833964941448087, 3.393846850117352, 2.1342904002729384, 2.1342904002729384, 1.3355491589367874, 1.3355491589367874, 0.756193600181959, 0.756193600181959, 0.42799410427012347, 0.42799410427012347, -0.98, 47.19725257297226, 4.18611295681063, 1.6461962159398054, 0.9290591797144502, 42.22563687169298, 5.106527394840706, 5.749511833283905, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 18.19910120538483, 12.13273413692322, 0.0, 0.0, 5.106527394840706, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 30.33183534230805, 0.0, 5.749511833283905, 0.0, 0.0, 5.749511833283905, 0.0, 5.106527394840706, 0.0, 0.0, 0.0, 30.33183534230805, 0.0, 0.0, 0.0, 20.23, 0.0, 0.0, 0.0, 0.0, 5.749511833283905, 0.0, 0.0, 24.26546827384644, 6.06636706846161, 0.0, 5.106527394840706, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 17.666666666666664, 0.0, 7, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1.3922, 28.106799999999993, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
Are you confusing the class with objects of that class? If thing is an object of type Descriptors, you can call thing.MolWt() and it will return a result. If you call Descriptors.MolWt(), I imagine you will get an error.
If you want to call each of the Descriptors methods named in _descList on a thing, then, using your list of names, you may want something like operator.methodcaller:
import operator
for name in names:
    desc = operator.methodcaller(name)
    print(name, desc(thing))
I hope this is what you are asking; it's not very clear.
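As an illustration of operator.methodcaller on its own, here is a short sketch that calls several methods by name on a plain string. The string and the method names are made up for the example and have nothing to do with rdkit.

```python
import operator

thing = "Hello"
names = ['upper', 'lower', 'swapcase']

# methodcaller(name) builds a callable that invokes thing.<name>() when applied
results = {name: operator.methodcaller(name)(thing) for name in names}
print(results)  # {'upper': 'HELLO', 'lower': 'hello', 'swapcase': 'hELLO'}
```

This only works for methods of the object itself; for functions that live in a module and take the object as an argument, getattr on the module is the closer fit.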

count objects created in django application in past X days, for each day

I have following unsorted dict (dates are keys):
{"23-09-2014": 0, "11-10-2014": 0, "30-09-2014": 0, "26-09-2014": 0,
"03-10-2014": 0, "19-10-2014": 0, "15-10-2014": 0, "22-09-2014": 0,
"17-10-2014": 0, "29-09-2014": 0, "13-10-2014": 0, "16-10-2014": 0,
"12-10-2014": 0, "25-09-2014": 0, "14-10-2014": 0, "08-10-2014": 0,
"02-10-2014": 0, "09-10-2014": 0, "18-10-2014": 0, "24-09-2014": 0,
"28-09-2014": 0, "10-10-2014": 0, "21-10-2014": 0, "20-10-2014": 0,
"06-10-2014": 0, "04-10-2014": 0, "27-09-2014": 0, "05-10-2014": 0,
"01-10-2014": 0, "07-10-2014": 0}
I am trying to sort it from oldest to newest.
I've tried this code:
mydict = OrderedDict(sorted(mydict.items(), key=lambda t: t[0], reverse=True))
to sort it, and it almost worked. It produced a sorted dict, but it ignored the months:
{"01-10-2014": 0, "02-10-2014": 0, "03-10-2014": 0, "04-10-2014": 0,
"05-10-2014": 0, "06-10-2014": 0, "07-10-2014": 0, "08-10-2014": 0,
"09-10-2014": 0, "10-10-2014": 0, "11-10-2014": 0, "12-10-2014": 0,
"13-10-2014": 0, "14-10-2014": 0, "15-10-2014": 0, "16-10-2014": 0,
"17-10-2014": 0, "18-10-2014": 0, "19-10-2014": 0, "20-10-2014": 0,
"21-10-2014": 0, "22-09-2014": 0, "23-09-2014": 0, "24-09-2014": 0,
"25-09-2014": 0, "26-09-2014": 0, "27-09-2014": 0, "28-09-2014": 0,
"29-09-2014": 0, "30-09-2014": 0}
How can I fix this?
EDIT:
I need this to count objects created in django application in past X days, for each day.
event_chart = {}
date_list = [datetime.datetime.today() - datetime.timedelta(days=x) for x in range(0, 30)]
for date in date_list:
    event_chart[formats.date_format(date, "SHORT_DATE_FORMAT")] = Event.objects.filter(project=project_name, created=date).count()
event_chart = OrderedDict(sorted(event_chart.items(), key=lambda t: t[0]))
return HttpResponse(json.dumps(event_chart))
You can use the datetime module to parse the strings into actual dates:
>>> from datetime import datetime
>>> sorted(mydict.items(), key=lambda t: datetime.strptime(t[0], '%d-%m-%Y'))  # oldest first; add reverse=True for newest first
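A self-contained sketch of sorting by parsed dates, using a shortened version of the dict from the question. The helper name parse is an assumption for readability. Sorting on the parsed datetime puts September before October, which plain string comparison of DD-MM-YYYY keys does not.

```python
from collections import OrderedDict
from datetime import datetime

# A few of the keys from the question, in arbitrary order
mydict = {"23-09-2014": 0, "11-10-2014": 0, "30-09-2014": 0, "01-10-2014": 0}


def parse(key):
    # Turn a 'DD-MM-YYYY' string into a datetime so it compares chronologically
    return datetime.strptime(key, '%d-%m-%Y')


# Oldest to newest
ordered = OrderedDict(sorted(mydict.items(), key=lambda t: parse(t[0])))
print(list(ordered))
# ['23-09-2014', '30-09-2014', '01-10-2014', '11-10-2014']
```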
If you want to create a JSON response in the format {"22-09-2014": 0, "23-09-2014": 0, "localized date": count_for_that_date}, so that the oldest dates appear earlier in the output, you could make event_chart an OrderedDict:
# assumes: import datetime as DT, from collections import OrderedDict
event_chart = OrderedDict()
today = DT.date.today() # use DT.datetime.combine(date, DT.time()) if needed
for day in range(29, -1, -1): # last 30 days
    date = today - DT.timedelta(days=day)
    localized_date = formats.date_format(date, "SHORT_DATE_FORMAT")
    day_count = Event.objects.filter(project=name, created=date).count()
    event_chart[localized_date] = day_count
return HttpResponse(json.dumps(event_chart))
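The loop above can be tried outside Django with a stand-in for the database query. In this sketch build_chart and count_for_day are hypothetical names, the fixed today value is made up, and strftime replaces Django's formats.date_format; the point is only that inserting the days oldest first keeps that order in the JSON output.

```python
import datetime as DT
import json
from collections import OrderedDict


def build_chart(count_for_day, today, days=30):
    # Insert the oldest day first so the mapping is already in date order
    chart = OrderedDict()
    for offset in range(days - 1, -1, -1):
        day = today - DT.timedelta(days=offset)
        chart[day.strftime('%d-%m-%Y')] = count_for_day(day)
    return chart


# Hypothetical counter standing in for Event.objects.filter(...).count()
def count_for_day(day):
    return 1 if day.weekday() == 0 else 0  # pretend events happen on Mondays


chart = build_chart(count_for_day, today=DT.date(2014, 10, 21), days=5)
print(json.dumps(chart))
```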
