how to call class methods inside list comprehension - python

This is a general question, but I am providing the example from my case. I have a class named "Descriptors", which I import as follows:
from rdkit.Chem import Descriptors
Descriptors has a number of methods; for example:
Descriptors.MolWt()
Descriptors.HeavyAtomCount()
I can get a list of the methods of Descriptors as follows:
names=[ x[0] for x in Descriptors._descList]
names
['MaxEStateIndex',
'MinEStateIndex',
'MaxAbsEStateIndex',
'MinAbsEStateIndex',
'qed',
'MolWt',
'HeavyAtomMolWt',
'ExactMolWt',
'NumValenceElectrons',
'NumRadicalElectrons',
'MaxPartialCharge',
'MinPartialCharge',
'MaxAbsPartialCharge',
'MinAbsPartialCharge',
'FpDensityMorgan1',
'FpDensityMorgan2',
'FpDensityMorgan3',
'BalabanJ',
'BertzCT',
'Chi0',
'Chi0n',
'Chi0v',
'Chi1',
'Chi1n',
'Chi1v',
'Chi2n',
'Chi2v',
'Chi3n',
'Chi3v',
'Chi4n',
'Chi4v',
'HallKierAlpha',
'Ipc',
'Kappa1',
'Kappa2',
'Kappa3',
'LabuteASA',
'PEOE_VSA1',
'PEOE_VSA10',
'PEOE_VSA11',
'PEOE_VSA12',
'PEOE_VSA13',
'PEOE_VSA14',
'PEOE_VSA2',
'PEOE_VSA3',
'PEOE_VSA4',
'PEOE_VSA5',
'PEOE_VSA6',
'PEOE_VSA7',
'PEOE_VSA8',
'PEOE_VSA9',
'SMR_VSA1',
'SMR_VSA10',
'SMR_VSA2',
'SMR_VSA3',
'SMR_VSA4',
'SMR_VSA5',
'SMR_VSA6',
'SMR_VSA7',
'SMR_VSA8',
'SMR_VSA9',
'SlogP_VSA1',
'SlogP_VSA10',
'SlogP_VSA11',
'SlogP_VSA12',
'SlogP_VSA2',
'SlogP_VSA3',
'SlogP_VSA4',
'SlogP_VSA5',
'SlogP_VSA6',
'SlogP_VSA7',
'SlogP_VSA8',
'SlogP_VSA9',
'TPSA',
'EState_VSA1',
'EState_VSA10',
'EState_VSA11',
'EState_VSA2',
'EState_VSA3',
'EState_VSA4',
'EState_VSA5',
'EState_VSA6',
'EState_VSA7',
'EState_VSA8',
'EState_VSA9',
'VSA_EState1',
'VSA_EState10',
'VSA_EState2',
'VSA_EState3',
'VSA_EState4',
'VSA_EState5',
'VSA_EState6',
'VSA_EState7',
'VSA_EState8',
'VSA_EState9',
'FractionCSP3',
'HeavyAtomCount',
'NHOHCount',
'NOCount',
'NumAliphaticCarbocycles',
'NumAliphaticHeterocycles',
'NumAliphaticRings',
'NumAromaticCarbocycles',
'NumAromaticHeterocycles',
'NumAromaticRings',
'NumHAcceptors',
'NumHDonors',
'NumHeteroatoms',
'NumRotatableBonds',
'NumSaturatedCarbocycles',
'NumSaturatedHeterocycles',
'NumSaturatedRings',
'RingCount',
'MolLogP',
'MolMR',
'fr_Al_COO',
'fr_Al_OH',
'fr_Al_OH_noTert',
'fr_ArN',
'fr_Ar_COO',
'fr_Ar_N',
'fr_Ar_NH',
'fr_Ar_OH',
'fr_COO',
'fr_COO2',
'fr_C_O',
'fr_C_O_noCOO',
'fr_C_S',
'fr_HOCCN',
'fr_Imine',
'fr_NH0',
'fr_NH1',
'fr_NH2',
'fr_N_O',
'fr_Ndealkylation1',
'fr_Ndealkylation2',
'fr_Nhpyrrole',
'fr_SH',
'fr_aldehyde',
'fr_alkyl_carbamate',
'fr_alkyl_halide',
'fr_allylic_oxid',
'fr_amide',
'fr_amidine',
'fr_aniline',
'fr_aryl_methyl',
'fr_azide',
'fr_azo',
'fr_barbitur',
'fr_benzene',
'fr_benzodiazepine',
'fr_bicyclic',
'fr_diazo',
'fr_dihydropyridine',
'fr_epoxide',
'fr_ester',
'fr_ether',
'fr_furan',
'fr_guanido',
'fr_halogen',
'fr_hdrzine',
'fr_hdrzone',
'fr_imidazole',
'fr_imide',
'fr_isocyan',
'fr_isothiocyan',
'fr_ketone',
'fr_ketone_Topliss',
'fr_lactam',
'fr_lactone',
'fr_methoxy',
'fr_morpholine',
'fr_nitrile',
'fr_nitro',
'fr_nitro_arom',
'fr_nitro_arom_nonortho',
'fr_nitroso',
'fr_oxazole',
'fr_oxime',
'fr_para_hydroxylation',
'fr_phenol',
'fr_phenol_noOrthoHbond',
'fr_phos_acid',
'fr_phos_ester',
'fr_piperdine',
'fr_piperzine',
'fr_priamide',
'fr_prisulfonamd',
'fr_pyridine',
'fr_quatN',
'fr_sulfide',
'fr_sulfonamd',
'fr_sulfone',
'fr_term_acetylene',
'fr_tetrazole',
'fr_thiazole',
'fr_thiocyan',
'fr_thiophene',
'fr_unbrch_alkane',
'fr_urea']
Now, I want to define a function to return all the Descriptors methods as a list and I am trying the following:
def fingerprint_all():
    names=[ x[0] for x in Descriptors._descList]
    features=[Descriptors.name() for name in names]
    return features
However, when I call the function, it raises an error:
print (fingerprint_all())
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-16-a36092bb806c> in <module>()
23 return features
24
---> 25 print (fingerprint_all())
<ipython-input-16-a36092bb806c> in fingerprint_all()
20 def fingerprint_all():
21 names=[ x[0] for x in Descriptors._descList]
---> 22 features=[Descriptors.name() for name in names]
23 return features
24
<ipython-input-16-a36092bb806c> in <listcomp>(.0)
20 def fingerprint_all():
21 names=[ x[0] for x in Descriptors._descList]
---> 22 features=[Descriptors.name() for name in names]
23 return features
24
AttributeError: module 'rdkit.Chem.Descriptors' has no attribute 'name'
I am not familiar with OO and classes and I really appreciate your help!

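Descriptors.name does not substitute the value of the variable name; it looks up an attribute that is literally called "name", which is why you get the AttributeError. To look up an attribute from a string, use getattr instead: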
features = [getattr(Descriptors, name) for name in names]
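Note that getattr only returns the function objects; they still have to be called. A minimal sketch of the lookup-then-call pattern, using the standard-library math module as a stand-in for Descriptors (an assumption made so the snippet runs without RDKit; with RDKit the argument would presumably be a molecule object rather than a number):

```python
import math

# Names of functions to look up on the module, held as plain strings.
names = ["sqrt", "floor", "ceil"]
value = 6.25

# getattr(module, name) returns the function; the trailing (value) calls it.
features = [getattr(math, name)(value) for name in names]
print(features)  # [2.5, 6, 7]
```

The same shape applied to the question would be getattr(Descriptors, name)(mol) inside the comprehension, where mol is assumed to be an RDKit molecule.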

If I see it right, you want to calculate all descriptors for a mol at once.
from rdkit import Chem
from rdkit.Chem import Descriptors
from rdkit.ML.Descriptors import MoleculeDescriptors
mol = Chem.MolFromSmiles('c1ccccc1O')
allDes = [d[0] for d in Descriptors._descList]
calc = MoleculeDescriptors.MolecularDescriptorCalculator(allDes)
c = calc.CalcDescriptors(mol)
print(c)
And you will get all calculated descriptors for the mol.
(8.632222222222222, 0.3217592592592595, 8.632222222222222, 0.3217592592592595, 0.514729544768675, 94.11299999999999, 88.06499999999998, 94.041864812, 36, 0, 0.11507481947527982, -0.5079669948663066, 0.5079669948663066, 0.11507481947527982, 1.0, 1.5714285714285714, 1.8571428571428572, 3.0214653097240864, 134.10736969541455, 5.112884175122364, 3.833964941448087, 3.833964941448087, 3.393846850117352, 2.1342904002729384, 2.1342904002729384, 1.3355491589367874, 1.3355491589367874, 0.756193600181959, 0.756193600181959, 0.42799410427012347, 0.42799410427012347, -0.98, 47.19725257297226, 4.18611295681063, 1.6461962159398054, 0.9290591797144502, 42.22563687169298, 5.106527394840706, 5.749511833283905, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 18.19910120538483, 12.13273413692322, 0.0, 0.0, 5.106527394840706, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 30.33183534230805, 0.0, 5.749511833283905, 0.0, 0.0, 5.749511833283905, 0.0, 5.106527394840706, 0.0, 0.0, 0.0, 30.33183534230805, 0.0, 0.0, 0.0, 20.23, 0.0, 0.0, 0.0, 0.0, 5.749511833283905, 0.0, 0.0, 24.26546827384644, 6.06636706846161, 0.0, 5.106527394840706, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 17.666666666666664, 0.0, 7, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1.3922, 28.106799999999993, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)

Are you confusing the class with objects of that class? If thing is an object of type Descriptors, you can call thing.MolWt() and it will return a result. If you call Descriptors.MolWt(), I imagine you will get an error.
If you want to call each of the Descriptors methods named in _descList on a thing, then, using your list of names, you may want something like operator.methodcaller:
import operator
for name in names:
    desc = operator.methodcaller(name)
    print(name, desc(thing))
I hope this is what you are asking; it's not very clear.

Related

How to construct a minimum bound box tuple for each geometry in a GeoDataFrame

I have a geopandas GeoDataFrame of lakes. I am trying to create a new column named 'MBB' with the bounding box for each lake.
I am using the bounds function from GeoPandas. However, this function exports minx, miny, maxx, and maxy in four separate columns.
# Preview the Use of the .bounds method to ensure it is exporting properly
lakes_a['geometry'].bounds
minx    miny   maxx    maxy
-69.37  44.19  -69.36  44.20
-69.33  44.19  -69.33  44.19
My desired output would look like the following and could be inserted back into the GeoDataFrame:
MBB
(-69.37, 44.19, -69.36, 44.20)
(-69.33, 44.19, -69.33, 44.19)
My gut tells me that I need to use either shapely.geometry.Polygon or shapely.geometry.box.
The Polygon data used to create these is as follows.
Note: This is my first time working with GeoPandas (and new to Python as well); please forgive me if I made any mistakes :)
POLYGON Z ((-69.37232840276027 44.202966598054786 0, -69.37216940276056 44.202966598054786 0, -69.37181966942774 44.20276073138842 0, -69.37156540276146 44.20154879805699 0, -69.37092960276249 44.20138873139058 0, -69.370580002763 44.20111433139101 0, -69.37051640276309 44.20049693139197 0, -69.37042106942994 44.20042833139206 0, -69.37038926942995 44.20015393139249 0, -69.37013506943038 44.19976513139312 0, -69.36969020276439 44.19939919806035 0, -69.36838700276638 44.19903333139422 0, -69.36800546943368 44.198827531394556 0, -69.36787826943385 44.19864459806149 0, -69.3678466694339 44.19784419806274 0, -69.36797380276704 44.1973183313969 0, -69.36876860276584 44.19663233139795 0, -69.36759246943433 44.19658639806471 0, -69.3667658694356 44.1971809980638 0, -69.36641646943616 44.19722673139705 0, -69.36597146943683 44.19695219806414 0, -69.36549480277091 44.196403398065 0, -69.36470006943881 44.19583173139921 0, -69.36425520277282 44.19562593139955 0, -69.3618714694432 44.19500819806717 0, -69.36158546944364 44.19471099806759 0, -69.36152220277705 44.193887798068886 0, -69.36066406944508 44.19363613140263 0, -69.3604098027788 44.19345319806956 0, -69.3604098027788 44.193270198069854 0, -69.36066420277837 44.192995798070285 0, -69.36069540277833 44.19279379807057 0, -69.36069600277835 44.19278999807062 0, -69.36082306944479 44.19276719807061 0, -69.36098206944456 44.19237839807124 0, -69.3623808694424 44.19091499807348 0, -69.36288200277494 44.19074539807377 0, -69.36292126944159 44.19073213140712 0, -69.36342966944079 44.19084653140692 0, -69.36371580277364 44.191029531406684 0, -69.3639380027733 44.19198999807185 0, -69.36419220277293 44.19217279807157 0, -69.36451000277242 44.192195731404865 0, -69.36520940277131 44.191784131405484 0, -69.36587680277029 44.19157833140582 0, -69.3665442694359 44.19157853140581 0, -69.36733886943472 44.191761398072174 0, -69.36772020276743 44.19199013140519 0, -69.36791080276714 44.192516131404375 0, -69.368006002767 
44.19256193140427 0, -69.36803786943364 44.19281339807054 0, -69.36845100276634 44.192767598070645 0, -69.36861000276605 44.19210453140499 0, -69.3694046027648 44.19155559807251 0, -69.36997680276392 44.1913039980729 0, -69.37058060276303 44.19118973140644 0, -69.37340926942528 44.19130413140624 0, -69.37448980275695 44.191601331405764 0, -69.37506200275607 44.19155559807251 0, -69.37541146942215 44.191326931406195 0, -69.37579286942156 44.19137273140615 0, -69.3759200027547 44.19146413140601 0, -69.37588826942141 44.19208153140505 0, -69.37534800275563 44.19322493140328 0, -69.37525260275572 44.19397959806872 0, -69.37541166942219 44.19436839806815 0, -69.37582466942155 44.19489433140069 0, -69.37633326942074 44.19521439806681 0, -69.37671466942015 44.19532873139997 0, -69.37798606941817 44.19532859806668 0, -69.37817680275123 44.19542013139983 0, -69.37801800275145 44.19578599806596 0, -69.37757286941883 44.19601473139892 0, -69.3765240027538 44.19601473139892 0, -69.37601546942125 44.19628913139849 0, -69.37557046942192 44.196723598064466 0, -69.37531620275564 44.1972039313971 0, -69.37528446942235 44.198598798061596 0, -69.37544340275548 44.19921619806064 0, -69.37582486942154 44.199970931392784 0, -69.37588846942145 44.20049679805862 0, -69.37607920275445 44.2009541980579 0, -69.37607926942115 44.20184593138987 0, -69.37582486942154 44.20223473138924 0, -69.37493486942293 44.2030807980546 0, -69.3744898694236 44.20337813138747 0, -69.37394946942442 44.20351539805392 0, -69.37340920275864 44.20351539805392 0, -69.37293226942603 44.2031037980546 0, -69.37232840276027 44.202966598054786 0))
POLYGON Z ((-69.33154920282357 44.19536753139994 0, -69.33170806948999 44.195504798066395 0, -69.3318348694898 44.19584779806587 0, -69.33212086948936 44.196076598065474 0, -69.33224780282251 44.196396798064995 0, -69.3329150028215 44.19676293139776 0, -69.33291466948816 44.19706019806398 0, -69.33278746948832 44.19726599806364 0, -69.33211986948936 44.19733433139686 0, -69.33103926949104 44.19719673139707 0, -69.3307216028249 44.19701373139736 0, -69.33069020282494 44.19653339806479 0, -69.33046780282524 44.19630473139847 0, -69.33046800282528 44.1960073980656 0, -69.33094520282452 44.195458798066454 0, -69.33154920282357 44.19536753139994 0))
You could use pandas.DataFrame.to_records:
pd.Series(
    lakes_a['geometry'].bounds.to_records(index=False),
    index=lakes_a.index,
)
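To make explicit what the four numbers in each bounds row mean, here is a minimal, dependency-free sketch that computes a (minx, miny, maxx, maxy) tuple directly from coordinate lists. The function name bounding_box and the sample rings are hypothetical, not taken from the lakes data; any third coordinate (as in POLYGON Z) is ignored.

```python
def bounding_box(coords):
    """Return (minx, miny, maxx, maxy) for an iterable of (x, y[, z]) points."""
    # Keep only x and y so that 3D points (x, y, z) are accepted too.
    xs, ys = zip(*((c[0], c[1]) for c in coords))
    return (min(xs), min(ys), max(xs), max(ys))

# Hypothetical rings, rounded for readability.
lakes = [
    [(-69.37, 44.20, 0), (-69.36, 44.19, 0), (-69.38, 44.19, 0)],
    [(-69.33, 44.19), (-69.33, 44.20), (-69.34, 44.19)],
]

mbb = [bounding_box(ring) for ring in lakes]
print(mbb)
```

With geopandas itself, a row-wise tuple can presumably also be obtained with lakes_a['geometry'].bounds.apply(tuple, axis=1); that variant is not tested here.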

Convert complex list to flat list

I have a long list composed of numpy arrays and integers; below is an example:
[array([[2218.67288865]]), array([[1736.90215229]]), array([[1255.13141592]]), array([[773.36067956]]), array([[291.58994319]]), 0, 0, 0, 0, 0, 0, 0, 0, 0]
and I'd like to convert it to a regular list like so:
[2218.67288865, 1736.90215229, 1255.13141592, 773.36067956, 291.58994319, 0, 0, 0, 0, 0, 0, 0, 0, 0]
How can I do that efficiently?
You can use a generator for flattening the nested list:
def convert(obj):
    try:
        for item in obj:
            yield from convert(item)
    except TypeError:
        yield obj
result = list(convert(data))
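One caveat with a recursive generator like this: a string is iterable and each of its characters is again a string, so a string element would recurse until Python raises RecursionError. A hedged variant that treats strings as leaves, shown on nested lists standing in for the numpy arrays (the name flatten and the sample data are illustrative assumptions):

```python
def flatten(obj):
    """Yield the leaves of arbitrarily nested iterables, keeping strings whole."""
    if isinstance(obj, (str, bytes)):
        yield obj
        return
    try:
        iterator = iter(obj)
    except TypeError:
        # Not iterable: obj is a leaf value such as an int or float.
        yield obj
        return
    for item in iterator:
        yield from flatten(item)

# Nested lists stand in for the 1x1 arrays from the question.
data = [[[2218.67]], [[1736.90]], [[1255.13]], 0, 0, "n/a"]
result = list(flatten(data))
print(result)  # [2218.67, 1736.9, 1255.13, 0, 0, 'n/a']
```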
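list(itertools.chain.from_iterable(itertools.chain.from_iterable(...))) should work for removing 2 levels of nesting: just add or remove copies of itertools.chain.from_iterable(...) as needed. Note that every element must actually be nested that deeply; a bare integer such as the trailing zeros in the example is not iterable and will raise a TypeError.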
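A small sketch of this approach, assuming the function meant is itertools.chain.from_iterable from the standard library. It shows both the uniformly nested case, where it works, and a mixed list like the one in the question, where it fails (the sample values are illustrative):

```python
from itertools import chain

# Every leaf sits exactly two levels deep, so two passes flatten it.
nested = [[[1.5], [2.5]], [[3.5]]]
flat = list(chain.from_iterable(chain.from_iterable(nested)))
print(flat)  # [1.5, 2.5, 3.5]

# A bare 0 at the top level is not iterable, so the same call fails.
mixed = [[[1.5]], 0]
failed = False
try:
    list(chain.from_iterable(chain.from_iterable(mixed)))
except TypeError:
    failed = True
print(failed)  # True
```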
Here the simplest seems to also be the fastest:
x = [array([[2218.67288865]]), array([[1736.90215229]]), array([[1255.13141592]]), array([[773.36067956]]), array([[291.58994319]]), 0, 0, 0, 0, 0, 0, 0, 0, 0]
[y if y.__class__==int else y.item(0) for y in x]
# [2218.67288865, 1736.90215229, 1255.13141592, 773.36067956, 291.58994319, 0, 0, 0, 0, 0, 0, 0, 0, 0]
timeit(lambda:[y if y.__class__==int else y.item(0) for y in x])
# 2.198630048893392
You can stick to numpy by using np.ravel:
np.hstack([np.ravel(i) for i in l]).tolist()
Output:
[2218.67288865,
1736.90215229,
1255.13141592,
773.36067956,
291.58994319,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0]

Discrete data plots in matplotlib

I have two data arrays and I am looking to plot them in a single plot using matplotlib
The data arrays are:
date_array=['2018-03-26', '2018-03-27', '2018-03-28', '2018-03-29', '2018-04-02', '2018-04-03', '2018-04-04', '2018-04-05', '2018-04-06', '2018-04-09', '2018-04-10', '2018-04-11', '2018-04-12', '2018-04-13', '2018-04-16', '2018-04-17', '2018-04-18', '2018-04-19', '2018-04-20', '2018-04-23', '2018-04-24', '2018-04-25', '2018-04-26', '2018-04-27', '2018-04-30', '2018-05-01', '2018-05-02', '2018-05-03', '2018-05-04', '2018-05-07', '2018-05-08', '2018-05-09', '2018-05-10', '2018-05-11', '2018-05-14', '2018-05-15', '2018-05-16', '2018-05-17', '2018-05-18', '2018-05-21', '2018-05-22', '2018-05-23', '2018-05-24', '2018-05-25', '2018-05-29', '2018-05-30', '2018-05-31', '2018-06-01', '2018-06-04', '2018-06-05', '2018-06-06', '2018-06-07', '2018-06-08', '2018-06-11', '2018-06-12', '2018-06-13', '2018-06-14', '2018-06-15', '2018-06-18', '2018-06-19', '2018-06-20', '2018-06-21', '2018-06-22', '2018-06-25', '2018-06-26', '2018-06-27', '2018-06-28', '2018-06-29', '2018-07-02', '2018-07-03', '2018-07-05', '2018-07-06', '2018-07-09', '2018-07-10', '2018-07-11', '2018-07-12', '2018-07-13', '2018-07-16', '2018-07-17', '2018-07-18', '2018-07-19', '2018-07-20', '2018-07-23', '2018-07-24', '2018-07-25', '2018-07-26', '2018-07-27', '2018-07-30', '2018-07-31', '2018-08-01', '2018-08-02', '2018-08-03', '2018-08-06', '2018-08-07', '2018-08-08', '2018-08-09', '2018-08-10', '2018-08-13', '2018-08-14', '2018-08-15']
value_1 = [45.27, 44.53, 44.68, 45.29, 44.43, 44.88, 45.85, 45.7, 44.76, 44.22, 44.81, 44.54, 44.13, 44.0, 43.41, 43.68, 43.29, 42.33, 42.18, 41.8, 41.78, 42.46, 43.67, 43.92, 44.75, 44.33, 44.41, 45.7, 43.8, 44.16, 44.9, 45.07, 46.24, 48.3, 49.21, 49.84, 50.34, 50.4, 49.98, 50.7, 49.15, 48.5, 48.53, 47.65, 48.52, 47.36, 46.13, 46.01, 47.27, 48.04, 49.48, 49.96, 50.48, 51.3, 52.29, 51.86, 50.2, 49.42, 50.0, 52.42, 52.32, 52.62, 52.13, 51.13, 50.24, 48.66, 48.99, 48.05, 48.33, 49.22, 50.62, 51.39, 51.87, 47.37, 49.53, 49.54, 51.82, 51.65, 52.98, 52.09, 54.24, 53.98, 52.72, 51.09, 49.99, 48.55, 47.98, 48.67, 48.87, 48.45, 48.65, 50.06, 52.64, 54.6, 56.61, 55.77, 55.59, 56.5, 56.31, 54.0]
value_2 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 95.39398869716304, 95.39398869716304, 0, 0, 95.39398869716304, 95.39398869716304, 0, 0, 0, 0, 0, 0, 0, 95.39398869716304]
The thing is that I have data points available for value_1 for all dates in date_array but not for value_2, so wherever a value is not available I have filled in a zero (that is one of my questions, as you'll see later).
When I plot it using this code:
x = date_array
y1 = value_1
y2 = value_2
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.scatter(x, y1, s=10, c='b', marker="s", label='fig 1')
ax1.scatter(x,y2, s=10, c='r', marker="o", label='fig 2')
plt.legend(loc='upper left');
plt.show()
I get this:
My questions:
How do I work around the fact that I don't have all values available for value_2 and still get the plot? I don't want the red dots that have value 0 to show in the plot, but I am not sure how to do that. Note: an entry in value_2 can't legitimately be 0, so a 0 means the value is not present.
How do I fix the messed-up tick labels on the x-axis? It would look neater with only 10-12 markers on the x-axis.
Thanks!
You can convert the zeros to NaN and they won't be plotted:
value_2 = [np.nan if x==0 else x for x in value_2]
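An alternative to NaN placeholders is to drop the missing points before plotting, so that only dates with a real value are passed to the second scatter call. A minimal sketch with a shortened, illustrative subset of the data (the variable names dates, values, x2 and y2 are assumptions, and at least one non-zero value is assumed to exist):

```python
# Shortened stand-in for date_array and value_2 from the question.
dates = ['2018-07-26', '2018-07-27', '2018-07-30', '2018-07-31']
values = [0, 95.39, 95.39, 0]

# Keep only the (date, value) pairs where a value is actually present.
present = [(d, v) for d, v in zip(dates, values) if v != 0]

# Split back into x and y sequences for plotting.
x2, y2 = zip(*present)
print(x2)  # ('2018-07-27', '2018-07-30')
print(y2)  # (95.39, 95.39)
```

x2 and y2 could then be passed to ax1.scatter in place of x and y2 for the second series.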
For the second question, I would convert the strings to datetime objects so that the tick spacing is adjusted automatically, and then rotate the labels:
from datetime import datetime
date_array = [datetime.strptime(i, '%Y-%m-%d').date() for i in date_array]
plt.xticks(rotation=70)
Complete code:
import matplotlib.pyplot as plt
import numpy as np
from datetime import datetime
date_array = [datetime.strptime(i, '%Y-%m-%d').date() for i in date_array]
value_2 = [np.nan if x==0 else x for x in value_2]
x = date_array
y1 = value_1
y2 = value_2
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax1.plot_date(x, y1, c='b', label='fig 1')
ax1.plot_date(x, y2, c='r', label='fig 2')
plt.legend(loc='upper left')
plt.xticks(rotation=70)
plt.show()

Passing array arguments to my own 2D function applied on Pandas groupby

I am given the following pandas dataframe
df
                         long       lat  weekday  hour
dttm
2015-07-03 00:00:38  1.114318  0.709553        6     0
2015-08-04 00:19:18  0.797157  0.086720        3     0
2015-08-04 00:19:46  0.797157  0.086720        3     0
2015-08-04 13:24:02  0.786688  0.059632        3    13
2015-08-04 13:24:34  0.786688  0.059632        3    13
2015-08-04 18:46:36  0.859795  0.330385        3    18
2015-08-04 18:47:02  0.859795  0.330385        3    18
2015-08-04 19:46:41  0.755008  0.041488        3    19
2015-08-04 19:47:45  0.755008  0.041488        3    19
I also have a function that receives as input 2 arrays:
import pandas as pd
import numpy as np
def time_hist(weekday, hour):
    # range replaces the Python 2 xrange so this also runs on Python 3
    hist_2d = np.histogram2d(weekday, hour, bins=[range(0, 8), range(0, 25)])
    return hist_2d[0].astype(int)
I wish to apply my 2D function to each and every group of the following groupby:
df.groupby(['long', 'lat'])
I tried passing *args to .apply():
df.groupby(['long', 'lat']).apply(time_hist, [df.weekday, df.hour])
but I get an error: "The dimension of bins must be equal to the dimension of the sample x."
Of course the dimensions mismatch. The whole idea is that I don't know in advance which mini [weekday, hour] arrays to send to each and every group.
How do I do that?
Do:
import pandas as pd
import numpy as np
df = pd.read_csv('file.csv', index_col=0)
def time_hist(x):
    # x is the sub-DataFrame for one (long, lat) group
    hour = x.hour
    weekday = x.weekday
    hist_2d = np.histogram2d(weekday, hour, bins=[range(0, 8), range(0, 25)])
    return hist_2d[0].astype(int)
print(df.groupby(['long', 'lat']).apply(time_hist))
Output:
long lat
0.755008 0.041488 [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
0.786688 0.059632 [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
0.797157 0.086720 [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
0.859795 0.330385 [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
1.114318 0.709553 [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,...
dtype: object
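The key idea is that the function is called once per group and receives only that group's rows. A dependency-free sketch of the same computation, using a few rows from the question as plain tuples (the names rows, empty_hist and hists are illustrative; weekday is assumed to lie in 0..6 and hour in 0..23, and anything outside would raise an IndexError here):

```python
from collections import defaultdict

# (long, lat, weekday, hour) rows taken from the example DataFrame.
rows = [
    (1.114318, 0.709553, 6, 0),
    (0.797157, 0.086720, 3, 0),
    (0.797157, 0.086720, 3, 0),
    (0.786688, 0.059632, 3, 13),
    (0.786688, 0.059632, 3, 13),
]

def empty_hist():
    """Return a 7 x 24 grid of zero counts (weekday x hour)."""
    return [[0] * 24 for _ in range(7)]

# One histogram per (long, lat) group, created on first use.
hists = defaultdict(empty_hist)
for lon, lat, weekday, hour in rows:
    hists[(lon, lat)][weekday][hour] += 1

print(hists[(0.797157, 0.086720)][3][0])  # 2
```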

All instances of maximum

I have a function that gets a set of data from a server, sorts and displays it
def load_data(dateStr):
    data = get_date(dateStr).splitlines()
    result = []
    for c in data:
        a = c.split(',')
        time = a[0]
        temp = float(a[1])
        solar = float(a[2])
        power = a[3:]
        # convert the remaining readings to integers in place
        i = 0
        while i < len(power):
            power[i] = int(power[i])
            i = i+1
        result.append((time, temp, solar, tuple(power)))
    return result
This is what the function returns when you enter a particular date (only 3 entries out of a long list); the first item in each entry is the time, the second is the temperature.
>>> load_data('20-01-2014')
[('05:00', 19.9, 0.0, (0, 0, 0, 0, 0, 0, 0, 18, 34)), ('05:01', 19.9, 0.0, (0, 0, 0, 0, 0, 0, 0, 20, 26)), ('05:02', 19.9, 0.0, (0, 0, 0, 0, 0, 0, 0, 17, 35))
I need to write a function to find the maximum temperature of a date, and show all of the times in the day at which the maximum occurred. Something like this:
>>> data = load_data('07-10-2011')
>>> max_temp(data)
(18.9, ['13:08', '13:09', '13:10'])
How would I go about this? Or can you point me to anywhere that might have answers?
This is one way to do it (this loops over the data twice):
>>> data = [('05:00', 19.9, 0.0, (0, 0, 0, 0, 0, 0, 0, 18, 34)), ('05:01', 19.9, 0.0, (0, 0, 0, 0, 0, 0, 0, 20, 26)), ('05:02', 19.9, 0.0, (0, 0, 0, 0, 0, 0, 0, 17, 35))]
>>> max_temp = max(data, key=lambda x: x[1])[1]
>>> max_temp
19.9
>>> result = [item for item in data if item[1] == max_temp]
>>> result
[('05:00', 19.9, 0.0, (0, 0, 0, 0, 0, 0, 0, 18, 34)), ('05:01', 19.9, 0.0, (0, 0, 0, 0, 0, 0, 0, 20, 26)), ('05:02', 19.9, 0.0, (0, 0, 0, 0, 0, 0, 0, 17, 35))]
The most efficient way to get all matching times for the maximum temperature is to simply loop over the values and track the maximum found so far:
def max_temp(data):
    maximum = float('-inf')
    times = []
    for entry in data:
        time, temp = entry[:2]
        if temp == maximum:
            times.append(time)
        elif temp > maximum:
            maximum = temp
            times = [time]
    return maximum, times
This loops over the data just once.
The convenient way (which is probably going to be close in performance anyway) is to use the max() function to find the maximum temperature first, then a list comprehension to return all times with that temperature:
def max_temp(data):
    maximum = max(data, key=lambda e: e[1])[1]
    return maximum, [e[0] for e in data if e[1] == maximum]
This loops twice over the data, but the max() loop is implemented mostly in C code.
def max_temp(data):
    maxt = max([d[1] for d in data])
    return (maxt, [d[0] for d in data if d[1] == maxt])
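As a quick check that the one-pass and two-pass approaches agree, here is a small self-contained sketch run on made-up entries in the same shape as the load_data output. The function names and sample values are hypothetical and chosen to reproduce the example result from the question.

```python
def max_temp_one_pass(data):
    """Track the running maximum and the times at which it occurred."""
    maximum = float('-inf')
    times = []
    for time, temp, *_ in data:
        if temp == maximum:
            times.append(time)
        elif temp > maximum:
            # New maximum: discard the times collected for the old one.
            maximum = temp
            times = [time]
    return maximum, times

def max_temp_two_pass(data):
    """Find the maximum first, then collect every time that matches it."""
    maximum = max(entry[1] for entry in data)
    return maximum, [entry[0] for entry in data if entry[1] == maximum]

# Made-up sample entries: (time, temperature, solar, power readings).
sample = [
    ('13:07', 18.7, 0.0, (0, 1)),
    ('13:08', 18.9, 0.0, (0, 2)),
    ('13:09', 18.9, 0.0, (0, 3)),
    ('13:10', 18.9, 0.0, (0, 4)),
    ('13:11', 18.6, 0.0, (0, 5)),
]

print(max_temp_one_pass(sample))  # (18.9, ['13:08', '13:09', '13:10'])
print(max_temp_two_pass(sample))  # (18.9, ['13:08', '13:09', '13:10'])
```

The two differ on empty input: the one-pass version returns (-inf, []), while max() raises a ValueError.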
