I have been trying to make a program that plots how often a word is used in WhatsApp chats between two people. The word "night", for example, was used a couple of times on a few days and 0 times on most days. The graph I have is as follows:
Here is the code
word_occurances = [0 for i in range(len(just_dates))]
for i in range(len(just_dates)):
    for j in range(len(df_word)):
        if just_dates[i].date() == word_date[j].date():
            word_occurances[i] += 1
title = person2.rstrip(':') + ' with ' + person1.rstrip(':') + ' usage of the word - ' + word
plt.plot(just_dates, word_occurances, color = 'purple')
plt.gcf().autofmt_xdate()
plt.xlabel('Time')
plt.ylabel('number of times used')
plt.title(title)
plt.savefig('Graphs/Words/' + title + '.jpg', dpi = 200)
plt.show()
word_occurances is a list
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 2, 0, 0, 0, 1, 0, 5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
What I want is for the graph to connect only the points where the word has been used, while still showing the entire timeline on the x axis. I don't want the line to touch 0. How can I do this? I have searched and found similar answers, but none have worked the way I want them to.
You simply have to find the indices of word_occurances at which the value is greater than zero. With these you can index just_dates to get the corresponding dates.
word_counts = [] # Only word counts > 0
dates = [] # Date of > 0 word count
for i, val in enumerate(word_occurances):
    if val > 0:
        word_counts.append(val)
        dates.append(just_dates[i])
You may want to plot with an underlying bar plot in order to maintain the original scale.
plt.bar(just_dates, word_occurances)
plt.plot(dates, word_counts, 'r--')
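The filtering step can also be written as a small helper that is easy to check without plotting anything. This is only a sketch: `nonzero_points` and the sample data are made-up names for illustration, not part of the original code.

```python
from datetime import date, timedelta

def nonzero_points(dates, counts):
    """Return (dates, counts) restricted to entries whose count is > 0."""
    kept = [(d, c) for d, c in zip(dates, counts) if c > 0]
    if not kept:
        return [], []
    kept_dates, kept_counts = zip(*kept)
    return list(kept_dates), list(kept_counts)

# Hypothetical sample data: five consecutive days, word used on two of them.
start = date(2021, 1, 1)
sample_dates = [start + timedelta(days=i) for i in range(5)]
sample_counts = [0, 2, 0, 0, 1]

dates, word_counts = nonzero_points(sample_dates, sample_counts)
print(dates)
print(word_counts)  # [2, 1]
```

The two returned lists can be passed to plt.plot in place of dates and word_counts.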
One way to address this is to plot only data that contain entries but label all dates where a conversation took place to indicate the zero values in your graph:
from matplotlib import pyplot as plt
import matplotlib.dates as mdates
from matplotlib.ticker import FixedLocator
#fake data generation, this block just imitates your unknown data and can be deleted
import numpy as np
import pandas as pd
np.random.seed(12345)
n = 30
just_dates = pd.to_datetime(np.random.randint(1, 100, n)+18500, unit="D").sort_values().to_list()
word_occurances = [0]*n
for i in range(10):
    word_occurances[np.random.randint(n)] = np.random.randint(1, 10)
fig, ax = plt.subplots(figsize=(15,5))
#generate data to plot by filtering out zero values
plot_data = [(just_dates[i], word_occurances[i]) for i, num in enumerate(word_occurances) if num > 0]
#plot these data with marker to indicate each point
#without markers, a run like 1-1-1-1-1 would show only as a flat line between its two end points
ax.plot(*zip(*plot_data), color = 'purple', marker="o")
#label all dates where conversations took place
ax.xaxis.set_major_locator(FixedLocator(mdates.date2num(just_dates)))
#prevent that matplotlib autoscales the y-axis
ax.set_ylim(0, )
ax.tick_params(axis="x", labelrotation= 90)
plt.xlabel('Time')
plt.ylabel('number of times used')
plt.title("Conversations at night")
plt.tight_layout()
plt.show()
This can get quite busy with all these date labels (and it might or might not work with the datetime objects in your just_dates, which may differ in structure from my sample data). Another way would be to indicate each conversation with vlines:
...
fig, ax = plt.subplots(figsize=(15,5))
plot_data = [(just_dates[i], word_occurances[i]) for i, num in enumerate(word_occurances) if num > 0]
ax.plot(*zip(*plot_data), color = 'purple', marker="o")
ax.vlines((just_dates), 0, max(word_occurances), color="red", ls="--")
ax.set_ylim(0, )
plt.gcf().autofmt_xdate()
plt.xlabel('Time')
plt.ylabel('number of times used')
plt.title("Conversations at night")
plt.tight_layout()
plt.show()
I have a long list composed of numpy arrays and integers; below is an example:
[array([[2218.67288865]]), array([[1736.90215229]]), array([[1255.13141592]]), array([[773.36067956]]), array([[291.58994319]]), 0, 0, 0, 0, 0, 0, 0, 0, 0]
and I'd like to convert it to a regular list, like so:
[2218.67288865, 1736.90215229, 1255.13141592, 773.36067956, 291.58994319, 0, 0, 0, 0, 0, 0, 0, 0, 0]
How can I do that efficiently?
You can use a generator for flattening the nested list:
def convert(obj):
    try:
        for item in obj:
            yield from convert(item)
    except TypeError:
        yield obj
result = list(convert(data))
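One caveat with the generator above: strings are iterable, and iterating a one-character string yields that same string, so a string anywhere in the data would recurse without end. The variant below is a sketch that treats strings as leaves. Plain nested lists stand in for the numpy arrays so that it runs without numpy, and the sample values are shortened.

```python
def convert(obj):
    """Yield the leaves of an arbitrarily nested iterable, depth first."""
    if isinstance(obj, (str, bytes)):
        # Strings are iterable too, and iterating a 1-character string
        # yields itself, so treat them as leaves to avoid endless recursion.
        yield obj
        return
    try:
        iterator = iter(obj)
    except TypeError:
        # Not iterable (int, float, ...): it is a leaf.
        yield obj
        return
    for item in iterator:
        yield from convert(item)

# Nested lists used here as a stand-in for array([[...]]) entries.
data = [[[2218.67]], [[1736.90]], 0, 0, "n/a"]
result = list(convert(data))
print(result)  # [2218.67, 1736.9, 0, 0, 'n/a']
```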
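list(itertools.chain.from_iterable(itertools.chain.from_iterable(...))) should work for removing 2 levels of nesting: just add or remove copies of itertools.chain.from_iterable(...) as needed. Note that this requires every element to be nested to the same depth, so the bare integers in the example list would have to be handled separately.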
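In the standard library the flattening helper is spelled itertools.chain.from_iterable. The sketch below uses made-up nested lists rather than numpy arrays, and shows both the two-level case and why a list that mixes nested items with bare integers fails.

```python
from itertools import chain

# Every element is nested exactly two levels deep.
nested = [[[1.5]], [[2.5]], [[3.5]]]
flat = list(chain.from_iterable(chain.from_iterable(nested)))
print(flat)  # [1.5, 2.5, 3.5]

# A bare integer is not iterable, so mixed nesting raises TypeError.
mixed = [[[1.5]], 0]
failed = False
try:
    list(chain.from_iterable(chain.from_iterable(mixed)))
except TypeError as exc:
    failed = True
    print("mixed nesting fails:", exc)
```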
Here the simplest seems to also be the fastest:
x = [array([[2218.67288865]]), array([[1736.90215229]]), array([[1255.13141592]]), array([[773.36067956]]), array([[291.58994319]]), 0, 0, 0, 0, 0, 0, 0, 0, 0]
[y if y.__class__==int else y.item(0) for y in x]
# [2218.67288865, 1736.90215229, 1255.13141592, 773.36067956, 291.58994319, 0, 0, 0, 0, 0, 0, 0, 0, 0]
timeit(lambda:[y if y.__class__==int else y.item(0) for y in x])
# 2.198630048893392
You can stick to numpy by using np.ravel:
np.hstack([np.ravel(i) for i in l]).tolist()
Output:
[2218.67288865,
1736.90215229,
1255.13141592,
773.36067956,
291.58994319,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0,
0.0]
This is a general question, but I am providing the example for my case. I have a class named "Descriptors", which I import as follows:
from rdkit.Chem import Descriptors
Descriptors has a number of Methods; for example:
Descriptors.MolWt()
Descriptors.HeavyAtomCount()
I can get a list of the methods of Descriptors as follows:
names=[ x[0] for x in Descriptors._descList]
names
['MaxEStateIndex',
'MinEStateIndex',
'MaxAbsEStateIndex',
'MinAbsEStateIndex',
'qed',
'MolWt',
'HeavyAtomMolWt',
'ExactMolWt',
'NumValenceElectrons',
'NumRadicalElectrons',
'MaxPartialCharge',
'MinPartialCharge',
'MaxAbsPartialCharge',
'MinAbsPartialCharge',
'FpDensityMorgan1',
'FpDensityMorgan2',
'FpDensityMorgan3',
'BalabanJ',
'BertzCT',
'Chi0',
'Chi0n',
'Chi0v',
'Chi1',
'Chi1n',
'Chi1v',
'Chi2n',
'Chi2v',
'Chi3n',
'Chi3v',
'Chi4n',
'Chi4v',
'HallKierAlpha',
'Ipc',
'Kappa1',
'Kappa2',
'Kappa3',
'LabuteASA',
'PEOE_VSA1',
'PEOE_VSA10',
'PEOE_VSA11',
'PEOE_VSA12',
'PEOE_VSA13',
'PEOE_VSA14',
'PEOE_VSA2',
'PEOE_VSA3',
'PEOE_VSA4',
'PEOE_VSA5',
'PEOE_VSA6',
'PEOE_VSA7',
'PEOE_VSA8',
'PEOE_VSA9',
'SMR_VSA1',
'SMR_VSA10',
'SMR_VSA2',
'SMR_VSA3',
'SMR_VSA4',
'SMR_VSA5',
'SMR_VSA6',
'SMR_VSA7',
'SMR_VSA8',
'SMR_VSA9',
'SlogP_VSA1',
'SlogP_VSA10',
'SlogP_VSA11',
'SlogP_VSA12',
'SlogP_VSA2',
'SlogP_VSA3',
'SlogP_VSA4',
'SlogP_VSA5',
'SlogP_VSA6',
'SlogP_VSA7',
'SlogP_VSA8',
'SlogP_VSA9',
'TPSA',
'EState_VSA1',
'EState_VSA10',
'EState_VSA11',
'EState_VSA2',
'EState_VSA3',
'EState_VSA4',
'EState_VSA5',
'EState_VSA6',
'EState_VSA7',
'EState_VSA8',
'EState_VSA9',
'VSA_EState1',
'VSA_EState10',
'VSA_EState2',
'VSA_EState3',
'VSA_EState4',
'VSA_EState5',
'VSA_EState6',
'VSA_EState7',
'VSA_EState8',
'VSA_EState9',
'FractionCSP3',
'HeavyAtomCount',
'NHOHCount',
'NOCount',
'NumAliphaticCarbocycles',
'NumAliphaticHeterocycles',
'NumAliphaticRings',
'NumAromaticCarbocycles',
'NumAromaticHeterocycles',
'NumAromaticRings',
'NumHAcceptors',
'NumHDonors',
'NumHeteroatoms',
'NumRotatableBonds',
'NumSaturatedCarbocycles',
'NumSaturatedHeterocycles',
'NumSaturatedRings',
'RingCount',
'MolLogP',
'MolMR',
'fr_Al_COO',
'fr_Al_OH',
'fr_Al_OH_noTert',
'fr_ArN',
'fr_Ar_COO',
'fr_Ar_N',
'fr_Ar_NH',
'fr_Ar_OH',
'fr_COO',
'fr_COO2',
'fr_C_O',
'fr_C_O_noCOO',
'fr_C_S',
'fr_HOCCN',
'fr_Imine',
'fr_NH0',
'fr_NH1',
'fr_NH2',
'fr_N_O',
'fr_Ndealkylation1',
'fr_Ndealkylation2',
'fr_Nhpyrrole',
'fr_SH',
'fr_aldehyde',
'fr_alkyl_carbamate',
'fr_alkyl_halide',
'fr_allylic_oxid',
'fr_amide',
'fr_amidine',
'fr_aniline',
'fr_aryl_methyl',
'fr_azide',
'fr_azo',
'fr_barbitur',
'fr_benzene',
'fr_benzodiazepine',
'fr_bicyclic',
'fr_diazo',
'fr_dihydropyridine',
'fr_epoxide',
'fr_ester',
'fr_ether',
'fr_furan',
'fr_guanido',
'fr_halogen',
'fr_hdrzine',
'fr_hdrzone',
'fr_imidazole',
'fr_imide',
'fr_isocyan',
'fr_isothiocyan',
'fr_ketone',
'fr_ketone_Topliss',
'fr_lactam',
'fr_lactone',
'fr_methoxy',
'fr_morpholine',
'fr_nitrile',
'fr_nitro',
'fr_nitro_arom',
'fr_nitro_arom_nonortho',
'fr_nitroso',
'fr_oxazole',
'fr_oxime',
'fr_para_hydroxylation',
'fr_phenol',
'fr_phenol_noOrthoHbond',
'fr_phos_acid',
'fr_phos_ester',
'fr_piperdine',
'fr_piperzine',
'fr_priamide',
'fr_prisulfonamd',
'fr_pyridine',
'fr_quatN',
'fr_sulfide',
'fr_sulfonamd',
'fr_sulfone',
'fr_term_acetylene',
'fr_tetrazole',
'fr_thiazole',
'fr_thiocyan',
'fr_thiophene',
'fr_unbrch_alkane',
'fr_urea']
Now, I want to define a function to return all the Descriptors methods as a list and I am trying the following:
def fingerprint_all():
    names=[ x[0] for x in Descriptors._descList]
    features=[Descriptors.name() for name in names]
    return features
However, when I call the function, it raises an error:
print (fingerprint_all())
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-16-a36092bb806c> in <module>()
23 return features
24
---> 25 print (fingerprint_all())
<ipython-input-16-a36092bb806c> in fingerprint_all()
20 def fingerprint_all():
21 names=[ x[0] for x in Descriptors._descList]
---> 22 features=[Descriptors.name() for name in names]
23 return features
24
<ipython-input-16-a36092bb806c> in <listcomp>(.0)
20 def fingerprint_all():
21 names=[ x[0] for x in Descriptors._descList]
---> 22 features=[Descriptors.name() for name in names]
23 return features
24
AttributeError: module 'rdkit.Chem.Descriptors' has no attribute 'name'
I am not familiar with OO and classes and I really appreciate your help!
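What you are trying to do does not work in Python: Descriptors.name looks up an attribute literally called "name"; it does not substitute the value of the variable name. Use getattr instead: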
features = [getattr(Descriptors, name) for name in names]
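A minimal sketch of the same lookup that runs without rdkit, using the standard math module as a stand-in for Descriptors. Note that getattr returns the function itself; it still has to be called (in the rdkit case, presumably with a molecule as the argument).

```python
import math

names = ["sqrt", "floor", "ceil"]   # attribute names held as strings

# getattr(module, name) looks up the attribute whose name is the *value* of `name`.
functions = [getattr(math, name) for name in names]

# Each entry is the function itself; call it to get a value.
results = {name: func(6.25) for name, func in zip(names, functions)}
print(results)  # {'sqrt': 2.5, 'floor': 6, 'ceil': 7}
```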
If I see it right, you want to calculate all descriptors for a mol at once.
from rdkit import Chem
from rdkit.Chem import Descriptors
from rdkit.ML.Descriptors import MoleculeDescriptors
mol = Chem.MolFromSmiles('c1ccccc1O')
allDes = [d[0] for d in Descriptors._descList]
calc = MoleculeDescriptors.MolecularDescriptorCalculator(allDes)
c = calc.CalcDescriptors(mol)
print(c)
And you will get all calculated descriptors for the mol.
(8.632222222222222, 0.3217592592592595, 8.632222222222222, 0.3217592592592595, 0.514729544768675, 94.11299999999999, 88.06499999999998, 94.041864812, 36, 0, 0.11507481947527982, -0.5079669948663066, 0.5079669948663066, 0.11507481947527982, 1.0, 1.5714285714285714, 1.8571428571428572, 3.0214653097240864, 134.10736969541455, 5.112884175122364, 3.833964941448087, 3.833964941448087, 3.393846850117352, 2.1342904002729384, 2.1342904002729384, 1.3355491589367874, 1.3355491589367874, 0.756193600181959, 0.756193600181959, 0.42799410427012347, 0.42799410427012347, -0.98, 47.19725257297226, 4.18611295681063, 1.6461962159398054, 0.9290591797144502, 42.22563687169298, 5.106527394840706, 5.749511833283905, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 18.19910120538483, 12.13273413692322, 0.0, 0.0, 5.106527394840706, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 30.33183534230805, 0.0, 5.749511833283905, 0.0, 0.0, 5.749511833283905, 0.0, 5.106527394840706, 0.0, 0.0, 0.0, 30.33183534230805, 0.0, 0.0, 0.0, 20.23, 0.0, 0.0, 0.0, 0.0, 5.749511833283905, 0.0, 0.0, 24.26546827384644, 6.06636706846161, 0.0, 5.106527394840706, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 17.666666666666664, 0.0, 7, 1, 1, 0, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 1.3922, 28.106799999999993, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
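A bare tuple of numbers is hard to read, so it may help to pair each value with its descriptor name. The sketch below assumes the calculator returns values in the same order as the names it was given, and it uses only the first three names and values copied from the listings above so that it runs without rdkit.

```python
# First three names and values copied from the listings above; with rdkit
# installed these would be `allDes` and `c` from the snippet.
names = ["MaxEStateIndex", "MinEStateIndex", "MaxAbsEStateIndex"]
values = (8.632222222222222, 0.3217592592592595, 8.632222222222222)

# Assumption: values come back in the same order as the names.
by_name = dict(zip(names, values))
print(by_name["MinEStateIndex"])  # 0.3217592592592595
```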
Are you confusing the class with objects of that class? If thing is an object of type Descriptors, you can call thing.MolWt() and it will return a result. If you call Descriptors.MolWt(), I imagine you will get an error.
If you want to call each of the Descriptors methods named in _descList on a thing, then, using your list of names, you may want something like operator.methodcaller:
import operator
for name in names:
    desc = operator.methodcaller(name)
    print(name, desc(thing))
I hope this is what you are asking; it's not very clear.
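For reference, this is how operator.methodcaller behaves on an ordinary object. The string used as thing is just a stand-in for illustration; methodcaller only helps when the names are methods of the object, which may not match how the rdkit descriptors are exposed.

```python
import operator

thing = "night"                       # stand-in object with ordinary methods
names = ["upper", "title", "isalpha"]

results = {}
for name in names:
    call = operator.methodcaller(name)   # call(obj) is the same as obj.<name>()
    results[name] = call(thing)
    print(name, results[name])
```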
I have following unsorted dict (dates are keys):
{"23-09-2014": 0, "11-10-2014": 0, "30-09-2014": 0, "26-09-2014": 0,
"03-10-2014": 0, "19-10-2014": 0, "15-10-2014": 0, "22-09-2014": 0,
"17-10-2014": 0, "29-09-2014": 0, "13-10-2014": 0, "16-10-2014": 0,
"12-10-2014": 0, "25-09-2014": 0, "14-10-2014": 0, "08-10-2014": 0,
"02-10-2014": 0, "09-10-2014": 0, "18-10-2014": 0, "24-09-2014": 0,
"28-09-2014": 0, "10-10-2014": 0, "21-10-2014": 0, "20-10-2014": 0,
"06-10-2014": 0, "04-10-2014": 0, "27-09-2014": 0, "05-10-2014": 0,
"01-10-2014": 0, "07-10-2014": 0}
I am trying to sort it from oldest to newest.
I've tried code:
mydict = OrderedDict(sorted(mydict .items(), key=lambda t: t[0], reverse=True))
to sort it, and it almost worked. It produced sorted dict, but it has ignored months:
{"01-10-2014": 0, "02-10-2014": 0, "03-10-2014": 0, "04-10-2014": 0,
"05-10-2014": 0, "06-10-2014": 0, "07-10-2014": 0, "08-10-2014": 0,
"09-10-2014": 0, "10-10-2014": 0, "11-10-2014": 0, "12-10-2014": 0,
"13-10-2014": 0, "14-10-2014": 0, "15-10-2014": 0, "16-10-2014": 0,
"17-10-2014": 0, "18-10-2014": 0, "19-10-2014": 0, "20-10-2014": 0,
"21-10-2014": 0, "22-09-2014": 0, "23-09-2014": 0, "24-09-2014": 0,
"25-09-2014": 0, "26-09-2014": 0, "27-09-2014": 0, "28-09-2014": 0,
"29-09-2014": 0, "30-09-2014": 0}
How can I fix this?
EDIT:
I need this to count objects created in django application in past X days, for each day.
event_chart = {}
date_list = [datetime.datetime.today() - datetime.timedelta(days=x) for x in range(0, 30)]
for date in date_list:
    event_chart[formats.date_format(date, "SHORT_DATE_FORMAT")] = Event.objects.filter(project=project_name, created=date).count()
event_chart = OrderedDict(sorted(event_chart.items(), key=lambda t: t[0]))
return HttpResponse(json.dumps(event_chart))
You can use the datetime module to parse the strings into actual dates, so that they sort chronologically (oldest first) rather than as text:
>>> from datetime import datetime
>>> sorted(mydict.items(), key=lambda t: datetime.strptime(t[0], '%d-%m-%Y'))
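A small self-contained sketch of the strptime key, applied to a few of the dates from the question. It sorts ascending (oldest first), which is what the question asks for; pass reverse=True to sorted for newest first. The helper name parse_key is made up for illustration.

```python
from collections import OrderedDict
from datetime import datetime

mydict = {"23-09-2014": 0, "11-10-2014": 0, "30-09-2014": 0, "01-10-2014": 0}

def parse_key(item):
    """Turn the 'dd-mm-YYYY' key of a (key, value) pair into a datetime."""
    return datetime.strptime(item[0], "%d-%m-%Y")

ordered = OrderedDict(sorted(mydict.items(), key=parse_key))
print(list(ordered))
# ['23-09-2014', '30-09-2014', '01-10-2014', '11-10-2014']
```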
If you want to create a JSON response in the format {"22-09-2014": 0, "23-09-2014": 0, "localized date": count_for_that_date}, so that the oldest dates appear first in the output, then you could make event_chart an OrderedDict and fill it in chronological order:
import datetime as DT  # alias assumed by this snippet
event_chart = OrderedDict()
today = DT.date.today() # use DT.datetime.combine(date, DT.time()) if needed
for day in range(29, -1, -1): # last 30 days
    date = today - DT.timedelta(days=day)
    localized_date = formats.date_format(date, "SHORT_DATE_FORMAT")
    day_count = Event.objects.filter(project=name, created=date).count()
    event_chart[localized_date] = day_count
return HttpResponse(json.dumps(event_chart))
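The insertion-order idea can be tried without Django. In this sketch, empty_chart is a hypothetical helper, strftime stands in for formats.date_format, and the count is a placeholder zero where the real code would run the Event query.

```python
import datetime
import json
from collections import OrderedDict

def empty_chart(today, days=30):
    """Map each of the last `days` dates (oldest first) to a zero count."""
    chart = OrderedDict()
    for offset in range(days - 1, -1, -1):
        day = today - datetime.timedelta(days=offset)
        chart[day.strftime("%d-%m-%Y")] = 0   # placeholder for the real count
    return chart

chart = empty_chart(datetime.date(2014, 10, 21))
payload = json.dumps(chart)   # json.dumps keeps the insertion order
print(list(chart)[0], list(chart)[-1])  # 22-09-2014 21-10-2014
```

Because the keys are inserted oldest first, no sorting step is needed afterwards.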