Creating a list of every possible 6-element combination - python

How would I create lists of 6 columns from a dataframe of 21 columns? I need to create every single possible combination and store these combinations in a dataframe.
Suppose
lst = ['c1', 'c2', 'c3', 'c4','c5', 'c6', 'c7','c8', 'c9', 'c10', 'c11','c12', 'c13', 'c14','c15', 'c16', 'c17', 'c18','c19', 'c20', 'c21']
# Calling DataFrame constructor on list
df = pd.DataFrame(lst)
Then some list generator that
adds each new list to the cdf dataframe.
The final df should be something like this (not sure if I wrote it in the right syntax): a dataframe with 1 column, each row holding a list of 6 elements.
cdf = [['c1', 'c2', 'c3', 'c4', 'c5', 'c6'], ['c7', 'c8', 'c9', 'c10', 'c11', 'c12'], ['c13', 'c14', 'c15', 'c16', 'c17', 'c18']...]
Thank you!

I would do the combinations work in pure Python, using the itertools.combinations function (see the itertools documentation). You can then import the resulting list of tuples into pandas if desired.
Example code, which generates the combinations (emitted in the order of the input list, with no repeated elements), prints how many combinations there are, and shows the first 2 and last 2 as examples:
import itertools
lst = ['c1', 'c2', 'c3', 'c4','c5', 'c6', 'c7','c8', 'c9', 'c10', 'c11','c12', 'c13', 'c14','c15', 'c16', 'c17', 'c18','c19', 'c20', 'c21']
combos = itertools.combinations(lst, 6)
combos_list = list(combos)
print(f'{len(combos_list)} Combinations')
print(combos_list[0])
print(combos_list[1])
print(combos_list[-2])
print(combos_list[-1])
This generates output:
54264 Combinations
('c1', 'c2', 'c3', 'c4', 'c5', 'c6')
('c1', 'c2', 'c3', 'c4', 'c5', 'c7')
('c15', 'c17', 'c18', 'c19', 'c20', 'c21')
('c16', 'c17', 'c18', 'c19', 'c20', 'c21')
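Since the question asks for a dataframe, here is a minimal sketch of loading the combinations into pandas. The column names `combo` and `col1` ... `col6` are illustrative choices, not anything from the question.

```python
import itertools

import pandas as pd

lst = [f'c{i}' for i in range(1, 22)]  # the same 21 names as above: 'c1' ... 'c21'
combos = list(itertools.combinations(lst, 6))

# One column whose cells are 6-element lists, as sketched in the question
cdf = pd.DataFrame({'combo': [list(c) for c in combos]})

# Alternatively, six ordinary columns, one per position in the combination
wide = pd.DataFrame(combos, columns=[f'col{k}' for k in range(1, 7)])

print(cdf.shape, wide.shape)  # (54264, 1) (54264, 6)
```

The six-column layout is usually easier to filter and group in pandas than a column of Python lists.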
Happy Coding!

Related

Concatenating one-dimensional NumPy arrays with a variable-size NumPy array in a loop

import numpy as np

nNumbers = [1,2,3]
baseVariables = ['a','b','c','d','e']
arr = np.empty(0)
for i in nNumbers:
    x = np.empty(0)
    for v in baseVariables:
        x = np.append(x, y['result'][i][v])
    print(x)
    arr = np.concatenate((arr, x))
I have one JSON input stored in y, and I need to filter some variables out of it. The code above works in that it gives me the output in an array, but only as a one-dimensional array. I want the output as a two-dimensional array like:
[['q','qr','qe','qw','etc'], ['','','','',''], ['','','','','']]
I have tried various different ways but am not able to figure it out. Any feedback on how to get it to the desired output format would be greatly appreciated.
A correct basic Python way of making a nested list of strings:
In [57]: nNumbers = [1,2,3]
    ...: baseVariables = ['a','b','c','d','e']
In [58]: alist = []
    ...: for i in nNumbers:
    ...:     blist = []
    ...:     for v in baseVariables:
    ...:         blist.append(v+str(i))
    ...:     alist.append(blist)
    ...:
In [59]: alist
Out[59]:
[['a1', 'b1', 'c1', 'd1', 'e1'],
['a2', 'b2', 'c2', 'd2', 'e2'],
['a3', 'b3', 'c3', 'd3', 'e3']]
That can be turned into an array if necessary - though numpy doesn't provide much added utility for strings:
In [60]: np.array(alist)
Out[60]:
array([['a1', 'b1', 'c1', 'd1', 'e1'],
['a2', 'b2', 'c2', 'd2', 'e2'],
['a3', 'b3', 'c3', 'd3', 'e3']], dtype='<U2')
Or in a compact list comprehension form:
In [61]: [[v+str(i) for v in baseVariables] for i in nNumbers]
Out[61]:
[['a1', 'b1', 'c1', 'd1', 'e1'],
['a2', 'b2', 'c2', 'd2', 'e2'],
['a3', 'b3', 'c3', 'd3', 'e3']]
You are starting with lists! And making strings! And selecting items from a JSON, with y['result'][i][v]. None of that benefits from using numpy, especially not the repeated use of np.append and np.concatenate.
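Applied to the original loop, the same nested-list pattern might look like the sketch below. The JSON in `y` is a hypothetical stand-in, since its real structure is not shown; `y['result']` is assumed to be a list that can be indexed by the numbers in `nNumbers`.

```python
import numpy as np

nNumbers = [1, 2, 3]
baseVariables = ['a', 'b', 'c', 'd', 'e']

# Hypothetical stand-in for the JSON in y; its real structure is not shown.
# y['result'] is assumed to be a list indexable by the numbers in nNumbers.
y = {'result': [
    {},  # index 0 is unused because nNumbers starts at 1
    {'a': 'q', 'b': 'qr', 'c': 'qe', 'd': 'qw', 'e': 'etc'},
    {'a': '', 'b': '', 'c': '', 'd': '', 'e': ''},
    {'a': '', 'b': '', 'c': '', 'd': '', 'e': ''},
]}

# One inner list per i keeps the rows separate instead of flattening them
rows = [[y['result'][i][v] for v in baseVariables] for i in nNumbers]
arr = np.array(rows)  # only if an array is really needed
print(arr.shape)  # (3, 5)
```

If you keep the original loop, reshaping the flat result, e.g. `arr.reshape(len(nNumbers), len(baseVariables))`, also gives a 2-D array, but building the nested list directly avoids the copy that every `np.append` and `np.concatenate` call makes.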
Could you provide an example of the JSON? It sounds like you basically want to:
1. Filter the JSON
2. Flatten the JSON
Depending on what your output example means, you might not want to filter at all, but rather replace certain values with empty values. Is that correct?
Please note that pandas has very powerful out-of-the-box options to handle, and in particular flatten, JSON: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-json-reader. One approach would be to load the data into pandas first and filter it from there. Flattening JSON can also be done by iterating over it, like so:
def flatten_json(y):
    out = {}

    def flatten(x, name=''):
        if type(x) is dict:
            for a in x:
                flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(y)
    return out
I got this code from: https://towardsdatascience.com/flattening-json-objects-in-python-f5343c794b10. The author explains some challenges of flattening JSON. Of course, you can put some if statement into the function for your filtering need. I hope this can get you started at least!
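One way to add the filtering mentioned above is an optional predicate on the flattened key. In this sketch `keep` is a hypothetical parameter, and the sample data is made up; it also swaps the original's type checks and manual counter for `isinstance` (which additionally accepts dict/list subclasses) and `enumerate`.

```python
def flatten_json(y, keep=None):
    """Flatten nested dicts/lists into one dict with '_'-joined keys.

    keep is a hypothetical optional predicate on the flattened key;
    leaf values whose key fails it are skipped.
    """
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key, value in x.items():
                flatten(value, name + key + '_')
        elif isinstance(x, list):
            for i, value in enumerate(x):
                flatten(value, name + str(i) + '_')
        else:
            key = name[:-1]  # drop the trailing '_'
            if keep is None or keep(key):
                out[key] = x

    flatten(y)
    return out


data = {'result': [{'a': 'q', 'b': 'qr', 'z': 0}, {'a': 'x', 'b': 'y', 'z': 1}]}
flat = flatten_json(data, keep=lambda k: not k.endswith('_z'))
print(flat)
# {'result_0_a': 'q', 'result_0_b': 'qr', 'result_1_a': 'x', 'result_1_b': 'y'}
```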

How can I detect common elements in lists and group the lists that share at least 1 common element?

I have a Dataframe with 1 column (+the index) containing lists of sublists or elements.
I would like to detect common elements in the lists/sublists and group the lists with at least 1 common element in order to have only lists of elements without any common elements.
The lists/sublists currently look like this (example for 4 rows):
Num_ID
Row1 [['A1','A2','A3'],['A1','B1','B2','C3','D1']]
Row2 ['A1','E2','E3']
Row3 [['B4','B5','G4'],['B6','B4']]
Row4 ['B4','C9']
The result I want is n lists with no common elements (for the 4 example rows above, these 2 lists):
['A1','A2','A3','B1','B2','C3','D1','E2','E3']
['B4','B5','B6','C9','G4']
You can use NetworkX's connected_components function for this. Here's how I'd approach it, adapting an existing solution:
import networkx as nx
import pandas as pd
from itertools import combinations, chain
df= pd.DataFrame({'Num_ID':[[['A1','A2','A3'],['A1','B1','B2','C3','D1']],
                            ['A1','E2','E3'],
                            [['B4','B5','G4'],['B6','B4']],
                            ['B4','C9']]})
Start by flattening the sublists in each list:
L = [[*chain.from_iterable(i)] if isinstance(i[0], list) else i
for i in df.Num_ID.values.tolist()]
[['A1', 'A2', 'A3', 'A1', 'B1', 'B2', 'C3', 'D1'],
['A1', 'E2', 'E3'],
['B4', 'B5', 'G4', 'B6', 'B4'],
['B4', 'C9']]
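Given that each list/sublist has at least 2 elements, you can get all the length-2 combinations from each flattened list and use these as the network edges (note that an edge can only connect two nodes, so a list with a single element would produce no edges and its element would not appear in the graph):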
L2_nested = [list(combinations(l,2)) for l in L]
L2 = list(chain.from_iterable(L2_nested))
Generate a graph, and add your list as the graph edges using add_edges_from. Then use connected_components, which will precisely give you a list of sets of the connected components in the graph:
G=nx.Graph()
G.add_edges_from(L2)
list(nx.connected_components(G))
[{'A1', 'A2', 'A3', 'B1', 'B2', 'C3', 'D1', 'E2', 'E3'},
{'B4', 'B5', 'B6', 'C9', 'G4'}]
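If you would rather not depend on NetworkX, the same grouping can be computed with a small union-find (disjoint-set) structure. This is a swapped-in technique, not the answer's method. It links only consecutive items in each flattened row, which is enough to connect them all, and it registers every element as a node, so single-element rows are not lost. The data is the question's 4 rows as plain lists rather than a DataFrame column.

```python
from itertools import chain

# The question's 4 rows, as plain Python lists instead of a DataFrame column
rows = [
    [['A1', 'A2', 'A3'], ['A1', 'B1', 'B2', 'C3', 'D1']],
    ['A1', 'E2', 'E3'],
    [['B4', 'B5', 'G4'], ['B6', 'B4']],
    ['B4', 'C9'],
]

parent = {}  # maps each element to its parent in the disjoint-set forest


def find(x):
    """Return the representative of x's group, adding x if it is new."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
        x = parent[x]
    return x


def union(a, b):
    """Merge the groups containing a and b."""
    ra, rb = find(a), find(b)
    if ra != rb:
        parent[rb] = ra


for row in rows:
    # flatten rows made of sublists, as in the NetworkX answer
    items = list(chain.from_iterable(row)) if isinstance(row[0], list) else row
    for item in items:
        find(item)  # registers every element, even in single-element rows
    for a, b in zip(items, items[1:]):
        union(a, b)  # linking neighbours is enough to connect the whole row

groups = {}
for item in list(parent):
    groups.setdefault(find(item), set()).add(item)

result = [sorted(g) for g in groups.values()]
print(result)
# [['A1', 'A2', 'A3', 'B1', 'B2', 'C3', 'D1', 'E2', 'E3'], ['B4', 'B5', 'B6', 'C9', 'G4']]
```

The consecutive-pair idea also works with NetworkX: `G.add_edges_from(zip(l, l[1:]))` for each flattened list `l`, plus `G.add_nodes_from(l)` so single-element lists still appear, gives the same components with far fewer edges than all 2-combinations.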

Changing colors in graph using Matplotlib to plot Pandas dataframe

So I have a dataframe which I have plotted, and it looks great using the code below. The issue is that only ~10 colors are used and they repeat across multiple entries in the dataframe, so it is impossible to distinguish one dataframe entry from 20 others.
I figure there must be some way to say 'use color scheme X and allow for 60 colors', but I don't know how to do that when plotting a dataframe.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

dataframe = pd.DataFrame(new_vals)
dataframe.plot(kind='bar',stacked=True,legend=False, ylim=(0,100),title='Taxonomic analysis of samples')
N = 29
ind = np.arange(N)
width = 0.35
plt.xlabel('Sample')
plt.ylabel('Percentage of assembly attributed to each taxonomic group (%)')
plt.legend(loc="best", bbox_to_anchor=(1.0, 1.00))
plt.xticks(ind+width/2. - 0.2,('B11', 'B2', 'B3', 'B4', 'B5', 'B6', 'B7', 'B8', 'B9', 'C10', 'C11', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7', 'C8', 'C9', 'D10', 'D11', 'D2', 'D3', 'D4', 'D5', 'D6', 'D7', 'D8', 'D9'))
Here is an example of my dataframe:
Other actinomyces alcanivorax alkaliphilus bacillus bacteroides candidatus phytoplasma cyanothece enterobacter escherichia ... neisseria paenibacillus porphyromonas prevotella pseudoalteromonas rothia staphylococcus streptococcus streptomyces veillonella
0 26.229808 5.198240 4.694513 0.047974 3.691476 0.792203 2.782495 2.018697 2.180294 0.453228 ... 1.677198 4.944483 0.458910 5.496815 2.910004 0.372430 0.599676 8.276785 1.992817 0.595257
1 24.395006 11.615767 1.995668 0.069200 5.750399 0.921047 1.248692 0.740260 0.967860 1.904479 ... 0.873587 1.316648 0.261579 4.954371 1.348089 2.405995 1.061885 18.200302 1.660959 5.382657
2 22.078940 5.772762 3.107776 0.070983 5.523827 1.428608 1.846615 1.218850 1.542251 1.656823 ... 0.986514 2.414715 0.617899 6.893698 2.014352 0.496304 1.056452 22.272679 1.470803 2.696270
3 33.438669 5.210649 0.043170 0.136277 7.108181 2.148167 0.071589 0.034340 0.073281 2.719497 ... 1.922939 0.111153 3.898990 6.426144 0.045960 4.365727 1.545480 17.170125 0.870670 3.480261
4 20.831026 3.001972 4.746576 0.034374 2.198009 0.677926 3.264413 2.524014 2.720162 0.563074 ... 1.167616 9.110402 0.358742 3.323339 3.180934 0.420669 0.408120 8.948355 1.856454 1.086865
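IIUC, you can generate a colormap with N colors using a function like the following:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import LinearSegmentedColormap
def colormapgenerator(N, cm=None):
    # sample N evenly spaced colors from the named colormap (the default one if cm is None)
    base = plt.get_cmap(cm)
    color_list = base(np.linspace(0, 1, N))
    cm_name = base.name + str(N)
    return LinearSegmentedColormap.from_list(cm_name, color_list, N)
where cm is your desired colormap name (e.g. 'Blues', 'Reds', etc.) and N is the number of colors you need. Then try adding the following to your dataframe.plot() call: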
cmap=colormapgenerator(60, 'Reds')
Hope that helps.
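Putting it together, here is a minimal sketch with made-up random data standing in for `new_vals` (29 samples by 60 groups, each row scaled to sum to 100). It passes the colormap through pandas' `colormap` parameter and uses 'nipy_spectral', a multi-hue matplotlib colormap, on the assumption that with 60 entries a single-hue map such as 'Reds' leaves neighbouring colors very hard to tell apart; any registered colormap name works.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import LinearSegmentedColormap


def colormapgenerator(N, cm=None):
    # sample N evenly spaced colors from the named colormap
    base = plt.get_cmap(cm)
    color_list = base(np.linspace(0, 1, N))
    return LinearSegmentedColormap.from_list(base.name + str(N), color_list, N)


# Hypothetical data standing in for new_vals: 29 samples x 60 groups
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((29, 60)), columns=[f'group{k}' for k in range(60)])
df = df.div(df.sum(axis=1), axis=0) * 100  # each row now sums to 100 (%)

cmap = colormapgenerator(60, 'nipy_spectral')
ax = df.plot(kind='bar', stacked=True, legend=False, ylim=(0, 100), colormap=cmap)
```

Even with a multi-hue map, 60 stacked segments remain hard to read; grouping rare taxa into 'Other' is another common way to cut the number of colors needed.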

Sort a multi-dimensional list by another single list in Python [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Closed 9 years ago.
I have lists like this.
first : (apple, durian, cherry, egg, banana)
second : ((banana,b1,b2,b3,b4),
(durian,d1,d2,d3,d4),
(apple,a1,a2,a3,a4),
(egg,e1,e2,e3,e4),
(cherry,c1,c2,c3,c4))
I want to arrange second list using first list.
So I expect this.
((apple,a1,a2,a3,a4),
(durian,d1,d2,d3,d4),
(cherry,c1,c2,c3,c4),
(egg,e1,e2,e3,e4),
(banana,b1,b2,b3,b4))
Please let me know how to do this.
Thanks.
First of all, those are tuples. Secondly, none of the items in your samples are actually strings, so I quoted them for you.
Now let's convert the data to a dictionary:
data = [('banana','b1','b2','b3','b4'),
('durian','d1','d2','d3','d4'),
('apple','a1','a2','a3','a4'),
('egg','e1','e2','e3','e4'),
('cherry','c1','c2','c3','c4')]
data = {t[0]:t for t in data} # make dictionary with dictionary comprehension.
Now we have our selector:
selector = ['apple', 'durian', 'cherry', 'egg', 'banana']
Then we order and create the list:
results = [data[key] for key in selector] # order result by selector
Answer:
[('apple', 'a1', 'a2', 'a3', 'a4'),
('durian', 'd1', 'd2', 'd3', 'd4'),
('cherry', 'c1', 'c2', 'c3', 'c4'),
('egg', 'e1', 'e2', 'e3', 'e4'),
('banana', 'b1', 'b2', 'b3', 'b4')]
What about using a dictionary? You could try this:
# first : (apple, durian, cherry, egg, banana)
# second : ((banana,b1,b2,b3,b4), (durian,d1,d2,d3,d4), (apple,a1,a2,a3,a4), (egg,e1,e2,e3,e4), (cherry,c1,c2,c3,c4))
d = {}
for lst in second:
    # map each row's first element (the name) to the whole row
    d[lst[0]] = lst
result = []
for item in first:
    # make sure that key `item` exists in `d`, otherwise this raises KeyError
    result.append(d[item])
In [25]: d = {L[0]:list(L[1:]) for L in second}
In [26]: answer = [[k]+d[k] for k in first]
In [27]: answer
Out[27]:
[['apple', 'a1', 'a2', 'a3', 'a4'],
['durian', 'd1', 'd2', 'd3', 'd4'],
['cherry', 'c1', 'c2', 'c3', 'c4'],
['egg', 'e1', 'e2', 'e3', 'e4'],
['banana', 'b1', 'b2', 'b3', 'b4']]
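An alternative that keeps the original rows and sorts them directly is to map each name in `first` to its position and use that as the sort key. This sketch assumes every name at the start of a row in `second` also appears in `first`; otherwise the `rank` lookup raises KeyError.

```python
first = ['apple', 'durian', 'cherry', 'egg', 'banana']
second = [('banana', 'b1', 'b2', 'b3', 'b4'),
          ('durian', 'd1', 'd2', 'd3', 'd4'),
          ('apple', 'a1', 'a2', 'a3', 'a4'),
          ('egg', 'e1', 'e2', 'e3', 'e4'),
          ('cherry', 'c1', 'c2', 'c3', 'c4')]

# position of each name in `first`, e.g. {'apple': 0, 'durian': 1, ...}
rank = {name: pos for pos, name in enumerate(first)}

# sort the rows of `second` by the position of their first element
result = sorted(second, key=lambda row: rank[row[0]])
print(result)
```

Unlike the dictionary-lookup answers, this also works when `second` contains several rows with the same name: `sorted` is stable, so such rows keep their relative order.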

Using a Python list comprehension a bit like a zip

Ok, so I'm really bad at writing Python list comprehensions with more than one "for," but I want to get better at it. I want to know for sure whether or not the line
>>> [S[j]+str(i) for i in range(1,11) for j in range(3) for S in "ABCD"]
can be amended to return something like ["A1","B1","C1","D1","A2","B2","C2","D2"...(etc.)]
and if not, if there is a list comprehension that can return the same list, namely, a list of strings of all of the combinations of "ABCD" and the numbers from 1 to 10.
You have too many loops there. You don't need j at all.
This does the trick:
[S+str(i) for i in range(1,11) for S in "ABCD"]
The way I like to read more than one for loop in a list comprehension is as nested loops: treat each following for as a loop nested inside the previous one, and that makes it a whole lot easier. To add to Daniel's answer:
[S+str(i) for i in range(1,11) for S in "ABCD"]
is nothing more than:
new_loop = []
for i in range(1, 11):
    for S in "ABCD":
        new_loop.append(S + str(i))
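A quick way to convince yourself of this equivalence is to build the list both ways and compare them; the leftmost for in the comprehension is the outer loop, which is why the letters cycle fastest.

```python
# The comprehension: the leftmost 'for' is the outer loop
comp = [S + str(i) for i in range(1, 11) for S in "ABCD"]

# The same thing as explicit nested loops
new_loop = []
for i in range(1, 11):
    for S in "ABCD":
        new_loop.append(S + str(i))

print(comp == new_loop)  # True
print(comp[:6])  # ['A1', 'B1', 'C1', 'D1', 'A2', 'B2']
```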
You may use itertools.product like this
import itertools
print([item[1] + str(item[0]) for item in itertools.product(range(1, 11), "ABCD")])
Output
['A1', 'B1', 'C1', 'D1', 'A2', 'B2', 'C2', 'D2', 'A3', 'B3', 'C3', 'D3', 'A4',
'B4', 'C4', 'D4', 'A5', 'B5', 'C5', 'D5', 'A6', 'B6', 'C6', 'D6', 'A7', 'B7',
'C7', 'D7', 'A8', 'B8', 'C8', 'D8', 'A9', 'B9', 'C9', 'D9', 'A10', 'B10', 'C10',
'D10']
Every time you think of combining all the elements of one iterable with all the elements of another, think itertools.product. It computes the Cartesian product of the input iterables (sets or lists).
I've found a solution that is slightly faster than the ones presented so far, and more than 2x faster than Daniel's solution (although his solution looks far more elegant):
import itertools
[x + y for (x,y) in (itertools.product('ABCD', map(str,range(1,5))))]
The difference here is that I cast the ints to strings with map before taking the product. Because itertools.product consumes each input iterable once up front, str() is called once per number rather than once per combination.
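A rough way to check the str-once effect on your own machine is sketched below. Both functions use range(1, 11) and number-major order so that they produce the list asked for in the question; these are adjustments, not the answer's original code, and timings will vary, so no particular speedup is claimed here.

```python
import itertools
import timeit


def str_per_item():
    # str(i) runs once for every (number, letter) pair: 40 calls
    return [S + str(i) for i in range(1, 11) for S in "ABCD"]


def str_once():
    # product() materializes map(str, ...) up front, so str runs 10 times;
    # the numbers are the outer loop to keep the A1, B1, C1, D1, A2, ... order
    return [s + n for n, s in itertools.product(map(str, range(1, 11)), "ABCD")]


assert str_per_item() == str_once()
for fn in (str_per_item, str_once):
    print(fn.__name__, timeit.timeit(fn, number=10_000))
```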
And a general tip when dealing with complex comprehensions:
When you have lots of for and lots of conditionals inside your comprehension, break it into several lines, like this:
[S[j]+str(i) for i in range(1,11)
for j in range(3)
for S in "ABCD"]
In this case the gain in readability isn't very big, but when you have lots of conditionals and lots of fors, it makes a big difference. It's exactly like writing nested for loops and if statements, but without the ":" and the indentation.
See the code using regular fors:
ans = []
for i in range(1,11):
    for j in range(3):
        for S in "ABCD":
            ans.append(S[j] + str(i))
Almost the same thing :)
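To see the tip with conditionals as well, here is a small sketch; the filters (even numbers only, skip the letter C) are made up for illustration. Each for or if in the multi-line comprehension nests inside the one before it, exactly as in the explicit loops.

```python
# A comprehension with conditionals, broken across lines like nested loops
picked = [S + str(i)
          for i in range(1, 11)
          if i % 2 == 0        # keep even numbers only
          for S in "ABCD"
          if S != "C"]         # skip the letter C

# The equivalent explicit loops: each 'for'/'if' nests inside the previous one
ans = []
for i in range(1, 11):
    if i % 2 == 0:
        for S in "ABCD":
            if S != "C":
                ans.append(S + str(i))

print(picked == ans)  # True
print(picked[:3])  # ['A2', 'B2', 'D2']
```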
Why not use itertools.product?
>>> import itertools
>>> [ i[0] + str(i[1]) for i in itertools.product('ABCD', range(1,5))]
['A1', 'A2', 'A3', 'A4', 'B1', 'B2', 'B3', 'B4', 'C1', 'C2', 'C3', 'C4', 'D1', 'D2', 'D3', 'D4']
