I have a Python list ['a1', 'b1', 'a2', 'b2','a3', 'b3']. With m=3, I want to generate this list using loops, because m could be a larger number such as m=100.
Since we can have
m = 3
['a' + str(i) for i in np.arange(1,m+1)]
# ['a1', 'a2', 'a3']
['b' + str(i) for i in np.arange(1,m+1)]
# ['b1', 'b2', 'b3']
then I try to get ['a1', 'b1', 'a2', 'b2','a3', 'b3'] using
[ ['a','b'] + str(i) for i in np.arange(1,m+1)]
and have TypeError: can only concatenate list (not "str") to list
Then I try
[ np.array(['a','b']) + str(i) for i in np.arange(1,m+1)]
and I still get an error: UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U1'), dtype('<U1')) -> None.
How can I fix the problem? And further, how can I get something like ['a1', 'b1', 'c1', 'a2', 'b2','c2','a3', 'b3', 'c3'] in a similar way?
A simple combined list comprehension would work, as pointed out in @j1-lee's answer (and later in other answers).
import string
def letter_number_loop(n, m):
    letters = string.ascii_letters[:n]
    numbers = range(1, m + 1)
    return [f"{letter}{number}" for number in numbers for letter in letters]
Similarly, one could use itertools.product(), as evidenced in Nick's answer, to obtain substantially the same:
import itertools
def letter_number_it(n, m):
    letters = string.ascii_letters[:n]
    numbers = range(1, m + 1)
    return [
        f"{letter}{number}"
        for number, letter in itertools.product(numbers, letters)]
However, it is possible to write a NumPy-vectorized approach, making use of the fact that if the dtype is object, the operations do follow the Python semantics.
import numpy as np
def letter_number_np(n, m):
    letters = np.array(list(string.ascii_letters[:n]), dtype=object)
    numbers = np.array([f"{i}" for i in range(1, m + 1)], dtype=object)
    return (letters[None, :] + numbers[:, None]).ravel().tolist()
Note that the final numpy.ndarray.tolist() could be avoided if whatever will consume the output is capable of dealing with the NumPy array itself, thus saving some relatively small but definitely appreciable time.
Inspecting Output
The following do indicate that the functions are equivalent:
funcs = letter_number_loop, letter_number_it, letter_number_np
n, m = 2, 3
for func in funcs:
    print(f"{func.__name__!s:>32}  {func(n, m)}")
letter_number_loop ['a1', 'b1', 'a2', 'b2', 'a3', 'b3']
letter_number_it ['a1', 'b1', 'a2', 'b2', 'a3', 'b3']
letter_number_np ['a1', 'b1', 'a2', 'b2', 'a3', 'b3']
Benchmarks
For larger inputs, the NumPy-vectorized approach is substantially faster, as evidenced by these benchmarks:
# Run in IPython/Jupyter: %timeit is an IPython magic, not plain Python.
timings = {}
for n in (2, 20):
    for k in range(1, 10):
        m = 2 ** k
        print(f"n = {n}, m = {m}")
        timings[n, m] = []
        base = funcs[0](n, m)  # reference output to validate the others against
        for func in funcs:
            res = func(n, m)
            is_good = base == res
            timed = %timeit -r 64 -n 64 -q -o func(n, m)
            timing = timed.best * 1e6  # seconds -> microseconds
            timings[n, m].append(timing if is_good else None)
            print(f"{func.__name__:>24}  {is_good}  {timing:10.3f} µs")
to be plotted with:
import matplotlib.pyplot as plt
import pandas as pd
n_s = (2, 20)
fig, axs = plt.subplots(1, len(n_s), figsize=(12, 4))
for i, n in enumerate(n_s):
    partial_timings = {k[1]: v for k, v in timings.items() if k[0] == n}
    df = pd.DataFrame(data=partial_timings, index=[func.__name__ for func in funcs]).transpose()
    df.plot(marker='o', xlabel='Input size / #', ylabel='Best timing / µs', ax=axs[i], title=f"n = {n}")
These show that the explicitly looped versions (letter_number_loop() and letter_number_it()) are roughly comparable, while the NumPy-vectorized version (letter_number_np()) pulls ahead as the input grows, reaching up to a ~2x speed-up.
You need to iterate over both the range of numbers and the list of strings:
In [106]: [s+str(i) for i in range(1,4) for s in ['a','b']]
Out[106]: ['a1', 'b1', 'a2', 'b2', 'a3', 'b3']
You can have more than one for in a list comprehension:
prefixes = ['a', 'b', 'c']
m = 3
output = [f"{prefix}{num}" for num in range(1, m+1) for prefix in prefixes]
print(output) # ['a1', 'b1', 'c1', 'a2', 'b2', 'c2', 'a3', 'b3', 'c3']
If you have multiple fors, those will be nested, as in
for num in range(1, m+1):
    for prefix in prefixes:
        ...
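As a quick check of that equivalence, the sketch below builds the list both ways and compares the results. The variable names `from_comprehension` and `from_loops` are made up for this illustration.

```python
prefixes = ['a', 'b', 'c']
m = 3

# Comprehension: the leftmost `for` is the outermost loop.
from_comprehension = [
    f"{prefix}{num}" for num in range(1, m + 1) for prefix in prefixes
]

# The same logic written as explicit nested loops.
from_loops = []
for num in range(1, m + 1):
    for prefix in prefixes:
        from_loops.append(f"{prefix}{num}")

print(from_comprehension == from_loops)  # True
```

Swapping the two `for` clauses in the comprehension would swap the nesting, giving all the `a` entries first, then all the `b` entries, and so on.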
You could use itertools.product to get all combinations of the letters and the m range and then join them in an f-string (rather than using join, since one element is an integer and would need converting to a string first):
[f'{letter}{num}' for num, letter in itertools.product(range(1, m+1), ['a', 'b'])]
Output:
['a1', 'b1', 'a2', 'b2', 'a3', 'b3']
nNumbers = [1,2,3]
baseVariables = ['a','b','c','d','e']
arr = np.empty(0)
for i in nNumbers:
    x = np.empty(0)
    for v in baseVariables:
        x = np.append(x, y['result'][i][v])
    print(x)
    arr = np.concatenate((arr, x))
I have JSON input stored in y and need to filter some variables out of it. The above code works in that it gives me the output in an array, but only as a one-dimensional array. I want the output in a two-dimensional array like:
[['q','qr','qe','qw','etc'], ['','','','',''], ['','','','','']]
I have tried various different ways but am not able to figure it out. Any feedback on how to get it to the desired output format would be greatly appreciated.
A correct basic Python way of making a nested list of strings:
In [57]: nNumbers = [1,2,3]
...: baseVariables = ['a','b','c','d','e']
In [58]: alist = []
    ...: for i in nNumbers:
    ...:     blist = []
    ...:     for v in baseVariables:
    ...:         blist.append(v+str(i))
    ...:     alist.append(blist)
    ...:
In [59]: alist
Out[59]:
[['a1', 'b1', 'c1', 'd1', 'e1'],
['a2', 'b2', 'c2', 'd2', 'e2'],
['a3', 'b3', 'c3', 'd3', 'e3']]
That can be turned into an array if necessary - though numpy doesn't provide much added utility for strings:
In [60]: np.array(alist)
Out[60]:
array([['a1', 'b1', 'c1', 'd1', 'e1'],
['a2', 'b2', 'c2', 'd2', 'e2'],
['a3', 'b3', 'c3', 'd3', 'e3']], dtype='<U2')
Or in a compact list comprehension form:
In [61]: [[v+str(i) for v in baseVariables] for i in nNumbers]
Out[61]:
[['a1', 'b1', 'c1', 'd1', 'e1'],
['a2', 'b2', 'c2', 'd2', 'e2'],
['a3', 'b3', 'c3', 'd3', 'e3']]
You are starting with lists! And making strings! And selecting items from a JSON, with y['result'][i][v]. None of that benefits from using numpy, especially not the repeated use of np.append and np.concatenate.
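For illustration, here is a sketch of the same nested-comprehension idea applied to the JSON lookup itself. The shape of `y` below is a guess, since the question does not show it: a dict whose `'result'` entry maps each number to a dict of variables. The values and the extra `'skip'` key are invented.

```python
# Hypothetical stand-in for the parsed JSON; the real structure is not shown
# in the question, so adjust the lookups to match the actual data.
y = {
    'result': {
        1: {'a': 'q1', 'b': 'r1', 'c': 's1', 'd': 't1', 'e': 'u1', 'skip': 'x'},
        2: {'a': 'q2', 'b': 'r2', 'c': 's2', 'd': 't2', 'e': 'u2', 'skip': 'x'},
        3: {'a': 'q3', 'b': 'r3', 'c': 's3', 'd': 't3', 'e': 'u3', 'skip': 'x'},
    }
}

nNumbers = [1, 2, 3]
baseVariables = ['a', 'b', 'c', 'd', 'e']

# One inner list per number, keeping only the wanted variables.
rows = [[y['result'][i][v] for v in baseVariables] for i in nNumbers]

for row in rows:
    print(row)
```

The result is a plain list of lists, which can still be passed to np.array afterwards if an array is really needed.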
Could you provide an example of the JSON? It sounds like you basically want to filter the JSON and flatten it.
Depending on what your output example means, you might want to not filter, but replace certain values with empty values, is that correct?
Please note that Pandas has very powerful out-of-the-box options to handle and, in particular, flatten JSON: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-json-reader. An approach could be to first load it into Pandas and filter it from there. Flattening JSON can also be done by iterating over it like so:
def flatten_json(y):
    out = {}
    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x
    flatten(y)
    return out
I got this code from: https://towardsdatascience.com/flattening-json-objects-in-python-f5343c794b10. The author explains some challenges of flattening JSON. Of course, you can put some if statement into the function for your filtering need. I hope this can get you started at least!
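As a rough sketch of what such a filter might look like, the variant below takes a `keep` predicate that decides which flattened keys end up in the output. The function name `flatten_json_filtered`, the `keep` parameter, and the sample data are all hypothetical and not from the linked article.

```python
def flatten_json_filtered(y, keep=lambda key: True):
    """Flatten nested dicts/lists, keeping only leaves whose flat key passes `keep`."""
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key, value in x.items():
                flatten(value, f"{name}{key}_")
        elif isinstance(x, list):
            for index, value in enumerate(x):
                flatten(value, f"{name}{index}_")
        else:
            flat_key = name[:-1]  # drop the trailing underscore
            if keep(flat_key):
                out[flat_key] = x

    flatten(y)
    return out


# Invented sample data, just to exercise the function.
sample = {'result': [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]}
only_a = flatten_json_filtered(sample, keep=lambda key: key.endswith('_a'))
print(only_a)  # {'result_0_a': 1, 'result_1_a': 3}
```

Filtering on the flattened key keeps the traversal generic; filtering on values or on depth would work the same way with a different predicate.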
I have a Dataframe with 1 column (+the index) containing lists of sublists or elements.
I would like to detect common elements in the lists/sublists and group the lists with at least 1 common element in order to have only lists of elements without any common elements.
The lists/sublists are currently like this (example for 4 rows):
Num_ID
Row1 [['A1','A2','A3'],['A1','B1','B2','C3','D1']]
Row2 ['A1','E2','E3']
Row3 [['B4','B5','G4'],['B6','B4']]
Row4 ['B4','C9']
The expected output is n lists with no common elements (example for the first 2):
['A1','A2','A3','B1','B2','C3','D1','E2','E3']
['B4','B5','B6','C9','G4']
You can use NetworkX's connected_components method for this. Here's how I'd approach this adapting this solution:
import networkx as nx
import pandas as pd
from itertools import combinations, chain
df = pd.DataFrame({'Num_ID': [[['A1','A2','A3'],['A1','B1','B2','C3','D1']],
                              ['A1','E2','E3'],
                              [['B4','B5','G4'],['B6','B4']],
                              ['B4','C9']]})
Start by flattening the sublists in each list:
L = [[*chain.from_iterable(i)] if isinstance(i[0], list) else i
for i in df.Num_ID.values.tolist()]
[['A1', 'A2', 'A3', 'A1', 'B1', 'B2', 'C3', 'D1'],
['A1', 'E2', 'E3'],
['B4', 'B5', 'G4', 'B6', 'B4'],
['B4', 'C9']]
Given that the lists/sublists have more than 2 elements, you can get all the length 2 combinations from each sublist and use these as the network edges (note that edges can only connect two nodes):
L2_nested = [list(combinations(l,2)) for l in L]
L2 = list(chain.from_iterable(L2_nested))
Generate a graph, and add your list as the graph edges using add_edges_from. Then use connected_components, which will precisely give you a list of sets of the connected components in the graph:
G=nx.Graph()
G.add_edges_from(L2)
list(nx.connected_components(G))
[{'A1', 'A2', 'A3', 'B1', 'B2', 'C3', 'D1', 'E2', 'E3'},
{'B4', 'B5', 'B6', 'C9', 'G4'}]
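If installing NetworkX is not an option, the same grouping can be sketched with a small union-find (disjoint-set) structure from plain Python. This is a different technique from the connected_components call above, not what NetworkX does internally, and the function name `group_connected` is made up here. It assumes the lists have already been flattened as in `L`.

```python
def group_connected(lists):
    """Merge lists that share at least one element, using union-find."""
    parent = {}

    def find(item):
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # path halving
            item = parent[item]
        return item

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for items in lists:
        if not items:
            continue
        first = items[0]
        find(first)  # registers single-element lists too
        for other in items[1:]:
            union(first, other)

    # Collect every element under the root of its group.
    groups = {}
    for item in list(parent):
        groups.setdefault(find(item), set()).add(item)
    return list(groups.values())


L = [['A1', 'A2', 'A3', 'A1', 'B1', 'B2', 'C3', 'D1'],
     ['A1', 'E2', 'E3'],
     ['B4', 'B5', 'G4', 'B6', 'B4'],
     ['B4', 'C9']]

for group in group_connected(L):
    print(sorted(group))
```

Linking each element only to the first element of its list is enough to connect the whole list, so the length-2 combinations are not needed in this variant.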
So I have a dataframe which I have plotted and it looks great using the code below. The issue is that only ~10 colors are being used, and they repeat across multiple entries in the dataframe, so it is impossible to distinguish one dataframe entry from 20 others.
I figure there must be some way to state 'use X color scheme and allow for 60 colors' but I don't know how when using a dataframe.
dataframe = pd.DataFrame(new_vals)
dataframe.plot(kind='bar',stacked=True,legend=False, ylim=(0,100),title='Taxonomic analysis of samples')
N = 29
ind = np.arange(N)
width = 0.35
plt.xlabel('Sample')
plt.ylabel('Percentage of assembly attributed to each taxonomic group (%)')
plt.legend(loc="best", bbox_to_anchor=(1.0, 1.00))
plt.xticks(ind+width/2. - 0.2,('B11', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B9', 'C10', 'C11', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7', 'C8', 'C9', 'D10', 'D11', 'D2', 'D3', 'D4', 'D5', 'D6', 'D7', 'D8', 'D9'))
Here is an example of my dataframe;
Other actinomyces alcanivorax alkaliphilus bacillus bacteroides candidatus phytoplasma cyanothece enterobacter escherichia ... neisseria paenibacillus porphyromonas prevotella pseudoalteromonas rothia staphylococcus streptococcus streptomyces veillonella
0 26.229808 5.198240 4.694513 0.047974 3.691476 0.792203 2.782495 2.018697 2.180294 0.453228 ... 1.677198 4.944483 0.458910 5.496815 2.910004 0.372430 0.599676 8.276785 1.992817 0.595257
1 24.395006 11.615767 1.995668 0.069200 5.750399 0.921047 1.248692 0.740260 0.967860 1.904479 ... 0.873587 1.316648 0.261579 4.954371 1.348089 2.405995 1.061885 18.200302 1.660959 5.382657
2 22.078940 5.772762 3.107776 0.070983 5.523827 1.428608 1.846615 1.218850 1.542251 1.656823 ... 0.986514 2.414715 0.617899 6.893698 2.014352 0.496304 1.056452 22.272679 1.470803 2.696270
3 33.438669 5.210649 0.043170 0.136277 7.108181 2.148167 0.071589 0.034340 0.073281 2.719497 ... 1.922939 0.111153 3.898990 6.426144 0.045960 4.365727 1.545480 17.170125 0.870670 3.480261
4 20.831026 3.001972 4.746576 0.034374 2.198009 0.677926 3.264413 2.524014 2.720162 0.563074 ... 1.167616 9.110402 0.358742 3.323339 3.180934 0.420669 0.408120 8.948355 1.856454 1.086865
IIUC you can generate a colormap of N colors with a function like the following:
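from matplotlib.colors import LinearSegmentedColormap
def colormapgenerator(N, cm=None):
    base = plt.get_cmap(cm)  # plt.cm.get_cmap was removed in newer Matplotlib releases
    color_list = base(np.linspace(0, 1, N))
    cm_name = base.name + str(N)
    return LinearSegmentedColormap.from_list(cm_name, color_list, N)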
where cm is your desired cmap (e.g. Blues, Reds etc.), and N is the number of colors you need. Then try adding the following to your dataframe.plot() call:
cmap=colormapgenerator(60, 'Reds')
Hope that helps.
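A different option, sketched here with only the standard library rather than a Matplotlib colormap, is to spread N hues evenly around the color wheel and pass the resulting list of hex strings to the plot. The function name `distinct_hex_colors` and the default saturation and value are arbitrary choices for this illustration.

```python
import colorsys


def distinct_hex_colors(n, saturation=0.65, value=0.9):
    """Return n hex color strings with evenly spaced hues."""
    colors = []
    for k in range(n):
        r, g, b = colorsys.hsv_to_rgb(k / n, saturation, value)
        colors.append('#{:02x}{:02x}{:02x}'.format(
            round(r * 255), round(g * 255), round(b * 255)))
    return colors


palette = distinct_hex_colors(60)
print(len(palette), len(set(palette)))
print(palette[:3])
```

The list could then presumably be handed over as `color=palette` in the `dataframe.plot(...)` call. With 60 entries, neighbouring hues will still look similar, so shuffling the palette or varying saturation may help tell adjacent bars apart.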
Ok, so I'm really bad at writing Python list comprehensions with more than one "for," but I want to get better at it. I want to know for sure whether or not the line
>>> [S[j]+str(i) for i in range(1,11) for j in range(3) for S in "ABCD"]
can be amended to return something like ["A1","B1","C1","D1","A2","B2","C2","D2"...(etc.)]
and if not, if there is a list comprehension that can return the same list, namely, a list of strings of all of the combinations of "ABCD" and the numbers from 1 to 10.
You have too many loops there. You don't need j at all.
This does the trick:
[S+str(i) for i in range(1,11) for S in "ABCD"]
I like to read multiple for clauses in a list comprehension as nested loops: treat each subsequent for as a loop nested inside the previous one, and it becomes a whole lot easier. To add to Daniel's answer:
[S+str(i) for i in range(1,11) for S in "ABCD"]
is nothing more than:
new_loop = []
for i in range(1, 11):
    for S in "ABCD":
        new_loop.append(S + str(i))
You may use itertools.product like this
import itertools
print([item[1] + str(item[0]) for item in itertools.product(range(1, 11), "ABCD")])
Output
['A1', 'B1', 'C1', 'D1', 'A2', 'B2', 'C2', 'D2', 'A3', 'B3', 'C3', 'D3', 'A4',
'B4', 'C4', 'D4', 'A5', 'B5', 'C5', 'D5', 'A6', 'B6', 'C6', 'D6', 'A7', 'B7',
'C7', 'D7', 'A8', 'B8', 'C8', 'D8', 'A9', 'B9', 'C9', 'D9', 'A10', 'B10', 'C10',
'D10']
Every time you think of combining all the elements of one iterable with all the elements of another iterable, think itertools.product. It is the Cartesian product of two sets (or lists).
I've found a solution that is slightly faster than the ones presented here so far, and more than 2x faster than @daniel's solution (although his solution looks far more elegant):
import itertools
[x + y for (x,y) in (itertools.product('ABCD', map(str,range(1,5))))]
The difference here is that I cast the ints to strings using map. Applying a function across a whole iterable with map is often faster than calling it on each item individually inside the comprehension.
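Speed differences like this depend on the Python version and the input size, so it may be worth measuring locally. The sketch below uses the standard timeit module; the wrapper names `with_comprehension` and `with_product_and_map` are invented for this example, and no particular timing result is implied.

```python
import itertools
import timeit


def with_comprehension():
    # Plain double-for comprehension, converting each int with str() in place.
    return [S + str(i) for i in range(1, 11) for S in "ABCD"]


def with_product_and_map():
    # itertools.product over letters and pre-converted number strings.
    return [x + y for x, y in itertools.product("ABCD", map(str, range(1, 11)))]


# Both build the same 40 strings, only in a different order.
assert sorted(with_comprehension()) == sorted(with_product_and_map())

for func in (with_comprehension, with_product_and_map):
    seconds = timeit.timeit(func, number=10_000)
    print(f"{func.__name__:>24}: {seconds:.4f} s for 10,000 calls")
```

Note that the two versions order their output differently, because the letters are the outer loop in one and the inner loop in the other.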
And a general tip when dealing with complex comprehensions:
When you have lots of for and lots of conditionals inside your comprehension, break it into several lines, like this:
[S[j]+str(i) for i in range(1,11)
for j in range(3)
for S in "ABCD"]
In this case the gain in readability wasn't so big, but when you have lots of conditionals and lots of fors, it makes a big difference. It's exactly like writing nested for loops and if statements, but without the ":" and the indentation.
See the code using regular fors:
ans = []
for i in range(1,11):
    for j in range(3):
        for S in "ABCD":
            ans.append(S[j] + str(i))
Almost the same thing :)
Why not use itertools.product?
>>> import itertools
>>> [ i[0] + str(i[1]) for i in itertools.product('ABCD', range(1,5))]
['A1', 'A2', 'A3', 'A4', 'B1', 'B2', 'B3', 'B4', 'C1', 'C2', 'C3', 'C4', 'D1', 'D2', 'D3', 'D4']
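Note that this output is grouped by letter, whereas the question asked for it grouped by number. With itertools.product the first argument varies slowest, so the argument order controls the grouping. A small sketch (the names `letter_major` and `number_major` are made up here):

```python
import itertools

letters = "ABCD"
numbers = range(1, 4)

# Letters first: all numbers for 'A', then all numbers for 'B', ...
letter_major = [l + str(n) for l, n in itertools.product(letters, numbers)]

# Numbers first: all letters for 1, then all letters for 2, ...
number_major = [l + str(n) for n, l in itertools.product(numbers, letters)]

print(letter_major)
print(number_major)
```

The second form matches the ordering requested in the question ("A1", "B1", "C1", "D1", "A2", ...).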