Python concatenate (multiple) strings to number - python

I have a Python list ['a1', 'b1', 'a2', 'b2', 'a3', 'b3']. Here m=3, and I want to build this list with a loop, because m could be a much larger number, such as m=100.
Since we can write
import numpy as np
m = 3
['a' + str(i) for i in np.arange(1,m+1)]
# ['a1', 'a2', 'a3']
['b' + str(i) for i in np.arange(1,m+1)]
# ['b1', 'b2', 'b3']
then I try to get ['a1', 'b1', 'a2', 'b2','a3', 'b3'] using
[ ['a','b'] + str(i) for i in np.arange(1,m+1)]
and have TypeError: can only concatenate list (not "str") to list
Then I try
[ np.array(['a','b']) + str(i) for i in np.arange(1,m+1)]
and I still get an error: UFuncTypeError: ufunc 'add' did not contain a loop with signature matching types (dtype('<U1'), dtype('<U1')) -> None.
How can I fix this? And beyond that, how can I get something like ['a1', 'b1', 'c1', 'a2', 'b2', 'c2', 'a3', 'b3', 'c3'] in a similar way?

A simple list comprehension with two for clauses would work, as pointed out in j1-lee's answer (and later in other answers).
import string
def letter_number_loop(n, m):
    letters = string.ascii_letters[:n]
    numbers = range(1, m + 1)
    return [f"{letter}{number}" for number in numbers for letter in letters]
Similarly, one could use itertools.product(), as in Nick's answer, to obtain essentially the same result:
import itertools
def letter_number_it(n, m):
    letters = string.ascii_letters[:n]
    numbers = range(1, m + 1)
    return [
        f"{letter}{number}"
        for number, letter in itertools.product(numbers, letters)]
However, it is also possible to write a NumPy-vectorized approach, using the fact that with dtype=object, element-wise operations fall back to Python semantics (here, + on two str objects concatenates them):
import numpy as np
def letter_number_np(n, m):
    letters = np.array(list(string.ascii_letters[:n]), dtype=object)
    numbers = np.array([f"{i}" for i in range(1, m + 1)], dtype=object)
    return (letters[None, :] + numbers[:, None]).ravel().tolist()
Note that the final numpy.ndarray.tolist() call could be skipped if whatever consumes the output can deal with the NumPy array itself, saving a relatively small but measurable amount of time.
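The error in the question comes from NumPy's fixed-width string dtype ('<U1'), which (at least in the NumPy version used there) has no loop for +. Switching the question's own per-number attempt to dtype=object is enough to make it work. A minimal sketch; interleave_object is a hypothetical helper name, not from any answer:

```python
import numpy as np


def interleave_object(prefixes, m):
    """Build ['a1', 'b1', 'a2', 'b2', ...] from the question's per-number attempt."""
    # dtype=object stores plain Python str objects, so + falls back to str.__add__
    base = np.array(prefixes, dtype=object)
    # one small object array per number, e.g. array(['a1', 'b1'], dtype=object)
    rows = [base + str(i) for i in range(1, m + 1)]
    # flatten row by row to keep the number-major order
    return [s for row in rows for s in row]


print(interleave_object(['a', 'b'], 3))
print(interleave_object(['a', 'b', 'c'], 3))
```

This keeps one Python-level loop over the numbers, so it is closer to the question's code than to the fully broadcast letter_number_np().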
Inspecting Output
The following indicates that the functions are equivalent:
funcs = letter_number_loop, letter_number_it, letter_number_np
n, m = 2, 3
for func in funcs:
    print(f"{func.__name__!s:>32} {func(n, m)}")
letter_number_loop ['a1', 'b1', 'a2', 'b2', 'a3', 'b3']
letter_number_it ['a1', 'b1', 'a2', 'b2', 'a3', 'b3']
letter_number_np ['a1', 'b1', 'a2', 'b2', 'a3', 'b3']
Benchmarks
For larger inputs, the NumPy version is substantially faster, as these benchmarks show:
# %timeit is an IPython magic: run this in IPython/Jupyter
timings = {}
for n in (2, 20):
    for k in range(1, 10):
        m = 2 ** k
        print(f"n = {n}, m = {m}")
        timings[n, m] = []
        base = funcs[0](n, m)
        for func in funcs:
            res = func(n, m)
            is_good = base == res
            timed = %timeit -r 64 -n 64 -q -o func(n, m)
            timing = timed.best * 1e6
            timings[n, m].append(timing if is_good else None)
            print(f"{func.__name__:>24} {is_good} {timing:10.3f} µs")
to be plotted with:
import matplotlib.pyplot as plt
import pandas as pd
n_s = (2, 20)
fig, axs = plt.subplots(1, len(n_s), figsize=(12, 4))
for i, n in enumerate(n_s):
    partial_timings = {k[1]: v for k, v in timings.items() if k[0] == n}
    df = pd.DataFrame(data=partial_timings, index=[func.__name__ for func in funcs]).transpose()
    df.plot(marker='o', xlabel='Input size / #', ylabel='Best timing / µs', ax=axs[i], title=f"n = {n}")
These show that the explicitly looped versions (letter_number_loop() and letter_number_it()) perform comparably, while the NumPy-vectorized version (letter_number_np()) pulls ahead fairly quickly as the input grows, reaching up to a ~2x speed-up.
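Because %timeit is IPython-only, the benchmark above will not run as a plain script. A rough stand-in using the standard-library timeit module might look like this sketch; best_time_us is a hypothetical helper, the input sizes are arbitrary, and the absolute numbers will depend on the machine and NumPy version:

```python
import string
import timeit

import numpy as np


def letter_number_loop(n, m):
    letters = string.ascii_letters[:n]
    return [f"{letter}{number}" for number in range(1, m + 1) for letter in letters]


def letter_number_np(n, m):
    letters = np.array(list(string.ascii_letters[:n]), dtype=object)
    numbers = np.array([f"{i}" for i in range(1, m + 1)], dtype=object)
    return (letters[None, :] + numbers[:, None]).ravel().tolist()


def best_time_us(func, n, m, repeat=5, number=20):
    # timeit.repeat returns `repeat` totals, each for `number` calls;
    # keep the best total and convert it to microseconds per call
    totals = timeit.repeat(lambda: func(n, m), repeat=repeat, number=number)
    return min(totals) / number * 1e6


results = {}
for n in (2, 20):
    for m in (16, 512):
        # sanity check: both versions must agree before timing them
        assert letter_number_loop(n, m) == letter_number_np(n, m)
        results[n, m] = {
            func.__name__: best_time_us(func, n, m)
            for func in (letter_number_loop, letter_number_np)
        }
        print(f"n = {n}, m = {m}: {results[n, m]}")
```

Taking the minimum over repeats, as timed.best does above, reduces noise from other processes.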

You need to iterate over both the range of numbers and the list of strings:
In [106]: [s+str(i) for i in range(1,4) for s in ['a','b']]
Out[106]: ['a1', 'b1', 'a2', 'b2', 'a3', 'b3']

You can have more than one for in a list comprehension:
prefixes = ['a', 'b', 'c']
m = 3
output = [f"{prefix}{num}" for num in range(1, m+1) for prefix in prefixes]
print(output) # ['a1', 'b1', 'c1', 'a2', 'b2', 'c2', 'a3', 'b3', 'c3']
With multiple for clauses, the loops are nested, the first clause being the outermost, as in:
for num in range(1, m+1):
    for prefix in prefixes:
        ...
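The nesting can be checked by writing the loops out explicitly, and swapping the clauses shows how the order changes. A small sketch using the same prefixes and m as above:

```python
prefixes = ['a', 'b', 'c']
m = 3

# two for clauses: the first is the outer loop, the second the inner loop
comprehension = [f"{prefix}{num}" for num in range(1, m + 1) for prefix in prefixes]

# the same thing written out as explicit nested loops
explicit = []
for num in range(1, m + 1):
    for prefix in prefixes:
        explicit.append(f"{prefix}{num}")

# swapping the clauses swaps the nesting, so the output is grouped by prefix instead
by_prefix = [f"{prefix}{num}" for prefix in prefixes for num in range(1, m + 1)]

print(comprehension == explicit)  # True
print(by_prefix)  # ['a1', 'a2', 'a3', 'b1', 'b2', 'b3', 'c1', 'c2', 'c3']
```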

You could use itertools.product to get all combinations of the m range and the letters, and then join each pair in an f-string (rather than using join, since one element is an integer and would first need converting to a string):
import itertools
[f'{letter}{num}' for num, letter in itertools.product(range(1, m+1), ['a', 'b'])]
Output:
['a1', 'b1', 'a2', 'b2', 'a3', 'b3']
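For comparison, the join variant mentioned above works too, as long as the integer is converted first. A short sketch:

```python
import itertools

m = 3
letters = ['a', 'b']

# product(numbers, letters) varies the letter fastest, giving the a1, b1, a2, ... order;
# str.join() only accepts strings, so the integer goes through str() first
result = ["".join((letter, str(num))) for num, letter in itertools.product(range(1, m + 1), letters)]
print(result)  # ['a1', 'b1', 'a2', 'b2', 'a3', 'b3']
```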

Related

Concatenating one dimensional numpy arrays with variable size numpy array in loop

nNumbers = [1,2,3]
baseVariables = ['a','b','c','d','e']
arr = np.empty(0)
for i in nNumbers:
    x = np.empty(0)
    for v in baseVariables:
        x = np.append(x, y['result'][i][v])
    print(x)
    arr = np.concatenate((arr, x))
I have a JSON input stored in y, and I need to filter some variables out of it. The code above works in that it gives me the output as an array, but only as a one-dimensional array. I want the output as a two-dimensional array like:
[['q','qr','qe','qw','etc'], ['','','','',''], ['','','','','']]
I have tried various different ways but am not able to figure it out. Any feedback on how to get it to the desired output format would be greatly appreciated.
A correct basic Python way of making a nested list of strings:
In [57]: nNumbers = [1,2,3]
    ...: baseVariables = ['a','b','c','d','e']
In [58]: alist = []
    ...: for i in nNumbers:
    ...:     blist = []
    ...:     for v in baseVariables:
    ...:         blist.append(v+str(i))
    ...:     alist.append(blist)
    ...:
In [59]: alist
Out[59]:
[['a1', 'b1', 'c1', 'd1', 'e1'],
['a2', 'b2', 'c2', 'd2', 'e2'],
['a3', 'b3', 'c3', 'd3', 'e3']]
That can be turned into an array if necessary - though numpy doesn't provide much added utility for strings:
In [60]: np.array(alist)
Out[60]:
array([['a1', 'b1', 'c1', 'd1', 'e1'],
['a2', 'b2', 'c2', 'd2', 'e2'],
['a3', 'b3', 'c3', 'd3', 'e3']], dtype='<U2')
Or in a compact list comprehension form:
In [61]: [[v+str(i) for v in baseVariables] for i in nNumbers]
Out[61]:
[['a1', 'b1', 'c1', 'd1', 'e1'],
['a2', 'b2', 'c2', 'd2', 'e2'],
['a3', 'b3', 'c3', 'd3', 'e3']]
You are starting with lists! And making strings! And selecting items from a JSON, with y['result'][i][v]. None of that benefits from using numpy, especially not the repeated use of np.append and np.concatenate.
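Applied to the question's own lookup y['result'][i][v], the nested comprehension builds the 2-D structure directly, with no np.append or np.concatenate. This is a sketch under assumptions: the sample y below is invented (the real JSON was not shown), and filling missing variables with '' via dict.get() is a guess at what the empty strings in the desired output mean.

```python
# Hypothetical stand-in for the question's JSON: y["result"] is assumed to map
# each number to a dict of variables; the real structure was not shown.
y = {
    "result": {
        1: {"a": "q", "b": "qr", "c": "qe", "d": "qw", "e": "etc"},
        2: {"a": "x"},
        3: {},
    }
}

nNumbers = [1, 2, 3]
baseVariables = ['a', 'b', 'c', 'd', 'e']

# one inner list per number, one entry per variable; .get() fills gaps with ''
rows = [[y["result"][i].get(v, "") for v in baseVariables] for i in nNumbers]
print(rows)
```

If a real 2-D array is needed afterwards, np.array(rows) converts the nested list in one step.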
Could you provide an example of the JSON? It sounds like you basically want to:
Filter the JSON
Flatten the JSON
Depending on what your output example means, you may not want to filter at all, but rather replace certain values with empty ones; is that correct?
Please note that pandas has very powerful out-of-the-box options for handling, and in particular flattening, JSON: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-json-reader. One approach is to load the data into pandas first and filter it there. Flattening JSON can also be done by iterating over it, like so:
def flatten_json(y):
    out = {}
    def flatten(x, name=''):
        if isinstance(x, dict):
            for a in x:
                flatten(x[a], name + a + '_')
        elif isinstance(x, list):
            for i, a in enumerate(x):
                flatten(a, name + str(i) + '_')
        else:
            out[name[:-1]] = x
    flatten(y)
    return out
I got this code from: https://towardsdatascience.com/flattening-json-objects-in-python-f5343c794b10, where the author explains some of the challenges of flattening JSON. Of course, you can add an if statement to the function for your filtering needs. I hope this at least gets you started!
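One way to add that filtering if statement is a predicate checked just before each leaf is stored. A sketch: the keep parameter and the sample data are hypothetical, not part of the linked article.

```python
def flatten_json(y, keep=None):
    """Flatten nested dicts/lists into {'a_b_0': value, ...}.

    `keep` is a hypothetical filter hook: a function taking the flattened key
    and the value, returning True for entries that should be kept.
    """
    out = {}

    def flatten(x, name=''):
        if isinstance(x, dict):
            for key, value in x.items():
                flatten(value, name + key + '_')
        elif isinstance(x, list):
            for i, item in enumerate(x):
                flatten(item, name + str(i) + '_')
        else:
            key = name[:-1]  # drop the trailing '_'
            if keep is None or keep(key, x):
                out[key] = x

    flatten(y)
    return out


data = {"result": [{"a": "q", "b": "qr", "debug": 1}, {"a": "x", "debug": 2}]}
print(flatten_json(data))
print(flatten_json(data, keep=lambda key, value: not key.endswith("debug")))
```

Passing the filter in as a function keeps the traversal generic, so the same helper can serve different filtering rules.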

Create list of lists from list based on criteria

I have a list of strings that I am trying to convert into a list of lists based on when a specific character appears in the list. Below is an example:
I am starting with the following list:
lst = ['ab', 'c1', 'cd', 'd2', 'a1', 'b1', 'c1', 'ax', 'by', 'cz', 'dzz']
I want to convert lst into a list of lists where each list begins where there is a string that starts with "a" and ends one element before the next string that starts with "a". The result should look like this:
new_lst = [['ab', 'c1', 'cd', 'd2'], ['a1', 'b1', 'c1'], ['ax', 'by', 'cz', 'dzz']]
What I have tried is to find the index of every element that begins with "a", using the following code: indices = [idx for idx, x in enumerate(lst) if x.startswith('a')]. This gives me the position of each string that matches that criterion, which yields [0, 4, 7].
Then I looked into splitting the list using the ranges created from the indices, i.e. splitting at the ranges (0,3), (4,6), and (7,10). I've been at it for hours and can't figure out how to do this dynamically, and I couldn't find any solutions online either. Could anyone help me with this? Or perhaps my approach wasn't ideal from the start.
import numpy as np
lst = ["ab", "c1", "cd", "d2", "a1", "b1", "c1", "ax", "by", "cz", "dzz"]
indices = [idx for idx, x in enumerate(lst) if x.startswith("a")]
print([each_split.tolist() for each_split in np.split(lst, indices) if len(each_split)])
NumPy does the job, but your approach was also good! It is also worth working from your own code rather than just handing you a different solution.
As you said, you just have to iterate through your indices list and create ranges. To do this, add the end of the list (its length) to the indices:
idx = [i for i, x in enumerate(lst) if x.startswith('a')]  # [0, 4, 7]
max_idx = len(lst)
idx.append(max_idx)
print(idx)
>> [0, 4, 7, 11]
Then, you just have to construct your ranges:
new_lst = []
# the idea is to only iterate on [0, 4, 7]
# to create then the ranges [(0,4), (4,7), (7,11)]
# in python list[0:4] will take indexes 0,1,2,3 but not 4
for i in range(len(idx)-1):
    new_lst.append(lst[idx[i]:idx[i+1]])
print(new_lst)
>> [['ab', 'c1', 'cd', 'd2'], ['a1', 'b1', 'c1'], ['ax', 'by', 'cz', 'dzz']]
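The same ranges can be built by pairing each start index with the next one via zip, without an explicit index loop. A sketch; the extra step that keeps leading non-"a" items as their own chunk is an added design choice, not part of the answer above:

```python
lst = ['ab', 'c1', 'cd', 'd2', 'a1', 'b1', 'c1', 'ax', 'by', 'cz', 'dzz']

# start of every chunk: each index whose string begins with "a"
starts = [i for i, x in enumerate(lst) if x.startswith('a')]
# if the list does not begin with an "a" item, keep the leading items as their own chunk
if lst and (not starts or starts[0] != 0):
    starts.insert(0, 0)
# each chunk ends where the next one starts; the last one ends at len(lst)
ends = starts[1:] + [len(lst)]
new_lst = [lst[s:e] for s, e in zip(starts, ends)]
print(new_lst)
```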
Try numpy.split:
import numpy as np
idx = [i for i, a in enumerate(lst) if a.lower()[0] == 'a']
new_lst = np.split(lst, idx)
[array([], dtype='<U3'),
array(['ab', 'c1', 'cd', 'd2'], dtype='<U3'),
array(['a1', 'b1', 'c1'], dtype='<U3'),
array(['ax', 'by', 'cz', 'dzz'], dtype='<U3')]
def listoflists(l):
    new_lst = []
    a = []
    c = 0
    for i in l:
        if i[0] == 'a':
            c += 1
            if c < 2:
                a.append(i)
            else:
                new_lst.append(a)
                a = []
                a.append(i)
        else:
            a.append(i)
    new_lst.append(a)
    return new_lst
l = ['ab', 'c1', 'cd', 'd2', 'a1', 'b1', 'c1', 'ax', 'by', 'cz', 'dzz','an','bw','ey']
print(listoflists(l))
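The same grouping can also be expressed with itertools: a running count of "a" items labels each element with its chunk, and groupby collects consecutive equal labels. This is a swapped-in technique, not from the answers above, and split_on_prefix is a hypothetical name:

```python
from itertools import accumulate, groupby


def split_on_prefix(items, prefix='a'):
    # running count of items that start with `prefix`; it increases by one at
    # every chunk boundary, so equal counts mark items of the same chunk
    labels = accumulate(int(item.startswith(prefix)) for item in items)
    return [
        [item for _, item in group]
        for _, group in groupby(zip(labels, items), key=lambda pair: pair[0])
    ]


l = ['ab', 'c1', 'cd', 'd2', 'a1', 'b1', 'c1', 'ax', 'by', 'cz', 'dzz', 'an', 'bw', 'ey']
print(split_on_prefix(l))
```

Using str.startswith() instead of i[0] also avoids an IndexError on empty strings.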

How can I detect common elements in lists and group lists with at least 1 common element?

I have a Dataframe with 1 column (+the index) containing lists of sublists or elements.
I would like to detect common elements in the lists/sublists and group the lists with at least 1 common element in order to have only lists of elements without any common elements.
The lists/sublists currently look like this (example for 4 rows):
Num_ID
Row1 [['A1','A2','A3'],['A1','B1','B2','C3','D1']]
Row2 ['A1','E2','E3']
Row3 [['B4','B5','G4'],['B6','B4']]
Row4 ['B4','C9']
The result I want is n lists with no common elements (example for the first 2):
['A1','A2','A3','B1','B2','C3','D1','E2','E3']
['B4','B5','B6','C9','G4']
You can use NetworkX's connected_components function for this. Here's how I'd approach it, adapting this solution:
import networkx as nx
import pandas as pd
from itertools import combinations, chain
df = pd.DataFrame({'Num_ID': [[['A1','A2','A3'],['A1','B1','B2','C3','D1']],
                              ['A1','E2','E3'],
                              [['B4','B5','G4'],['B6','B4']],
                              ['B4','C9']]})
Start by flattening the sublists in each list:
L = [[*chain.from_iterable(i)] if isinstance(i[0], list) else i
for i in df.Num_ID.values.tolist()]
[['A1', 'A2', 'A3', 'A1', 'B1', 'B2', 'C3', 'D1'],
['A1', 'E2', 'E3'],
['B4', 'B5', 'G4', 'B6', 'B4'],
['B4', 'C9']]
Given that each flattened list has at least 2 elements, you can take all length-2 combinations from each list and use them as the network edges (note that an edge can only connect two nodes):
L2_nested = [list(combinations(l,2)) for l in L]
L2 = list(chain.from_iterable(L2_nested))
Generate a graph and add your list as the graph edges using add_edges_from. Then use connected_components, which yields exactly the connected components of the graph, each as a set:
G=nx.Graph()
G.add_edges_from(L2)
list(nx.connected_components(G))
[{'A1', 'A2', 'A3', 'B1', 'B2', 'C3', 'D1', 'E2', 'E3'},
{'B4', 'B5', 'B6', 'C9', 'G4'}]
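If pulling in NetworkX is not an option, the same grouping can be done with a small disjoint-set (union-find) structure in plain Python. This is a swapped-in technique rather than part of the answer above, and merge_overlapping is a hypothetical helper name. Unlike the pairwise-combinations edges, it also keeps elements that only appear in single-element lists.

```python
from itertools import chain


def merge_overlapping(groups):
    """Merge lists that share at least one element (union-find / disjoint sets)."""
    parent = {}

    def find(x):
        # register unseen elements as their own root, then walk up to the root
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for group in groups:
        if not group:
            continue
        root = find(group[0])  # also registers single-element groups
        for other in group[1:]:
            other_root = find(other)
            if other_root != root:
                parent[other_root] = root

    # collect every element under its final root
    components = {}
    for x in parent:
        components.setdefault(find(x), set()).add(x)
    return list(components.values())


rows = [
    [['A1', 'A2', 'A3'], ['A1', 'B1', 'B2', 'C3', 'D1']],
    ['A1', 'E2', 'E3'],
    [['B4', 'B5', 'G4'], ['B6', 'B4']],
    ['B4', 'C9'],
]
# flatten the sublists the same way as above
flat = [list(chain.from_iterable(r)) if isinstance(r[0], list) else r for r in rows]
print(merge_overlapping(flat))
```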

Create all possible char combinations from an unknown amount of lists in python

I want to create all possible character combinations from lists. The first char needs to be from the first array, the second char from the second array, etc.
If I have the following lists:
char1 = ['a','b','c']
char2 = ['1','2']
The possible strings, would be: a1, a2, b1, b2, c1 and c2.
How do I write code that makes all the combinations from an unknown number of lists of unknown size?
The problem is that I don't know how many lists there will be: the number of lists is decided by the user while the code is running.
As mentioned above, you can use itertools.product().
And since you don't know the number of lists, you can pass a list of lists and unpack it with *:
import itertools
lists = [
['a','b','c'],
['1','2']
]
["".join(x) for x in itertools.product(*lists)]
Result:
['a1', 'a2', 'b1', 'b2', 'c1', 'c2']
That's a task for itertools.product()! Check out the docs: https://docs.python.org/2/library/itertools.html#itertools.product
>>> ["%s%s" % (c1,c2) for (c1,c2) in itertools.product(char1, char2)]
['a1', 'a2', 'b1', 'b2', 'c1', 'c2']
And yeah, it extends to a variable number of lists of unknown size.
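Since the number of results is the product of all the list lengths, it can grow quickly as the user adds lists. itertools.product produces its tuples lazily, so one option is to yield the strings one at a time instead of building the whole list. A sketch; iter_strings is a hypothetical name:

```python
import itertools
import math


def iter_strings(lists):
    # product(*lists) takes one list per position; it yields tuples lazily,
    # so strings are produced one at a time instead of all up front
    for chars in itertools.product(*lists):
        yield "".join(chars)


lists = [['a', 'b', 'c'], ['1', '2'], ['x', 'y']]
# the number of results is the product of the list lengths: 3 * 2 * 2 = 12
print(math.prod(len(chars) for chars in lists))
print(list(itertools.islice(iter_strings(lists), 4)))  # ['a1x', 'a1y', 'a2x', 'a2y']
```

math.prod requires Python 3.8 or later.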

lists permutation in python

I have the following lists:
list1 = [ 'A','B','C']
list2 = [ '1', '2' ]
Trying to generate a new list of tuples with the following desired result:
[(A1),(A2),(A1,B1),(A1,B2),(A2,B1),(A2,B2),(A1,B1,C1),(A2,B1,C1)...]
Each tuple will eventually be used to write a single line in an output file.
Note that:
In each tuple, each letter from list1, if defined, must be defined after the preceding letters. for example, if 'B' is defined in a tuple then 'A' must be in the tuple as well and prior to 'B'. tuple (A1,C1) is not desired since 'B' is not defined as well.
Tuples must be unique.
list1 & list2 are just an example and may vary in length.
I tried playing around with itertools for quite some time, specifically with product, permutations, and combinations, but I can't seem to pull it off and don't even have any code worth sharing.
Take successively larger slices of list1, and use products of products:
from itertools import product
elements = []
for letter in list1:
    elements.append([''.join(c) for c in product(letter, list2)])
    for combo in product(*elements):
        print(combo)
The elements list grows on each iteration, adding another letter+number list to take products from.
This produces:
>>> elements = []
>>> for letter in list1:
...     elements.append([''.join(c) for c in product(letter, list2)])
...     for combo in product(*elements):
...         print(combo)
...
('A1',)
('A2',)
('A1', 'B1')
('A1', 'B2')
('A2', 'B1')
('A2', 'B2')
('A1', 'B1', 'C1')
('A1', 'B1', 'C2')
('A1', 'B2', 'C1')
('A1', 'B2', 'C2')
('A2', 'B1', 'C1')
('A2', 'B1', 'C2')
('A2', 'B2', 'C1')
('A2', 'B2', 'C2')
What about this:
from itertools import product
output = []
for z in [list1[:n+1] for n in range(len(list1))]:
    for y in product(list2, repeat=len(z)):
        output.append(tuple(''.join(u) for u in zip(z, y)))
print(output)
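Both answers produce the same tuples in the same order, which is easy to confirm, and writing one line per tuple (as the question intends) is a short loop. A sketch: io.StringIO stands in for a real output file, and the comma separator is an assumption.

```python
import io
from itertools import product

list1 = ['A', 'B', 'C']
list2 = ['1', '2']

# first answer: grow a list of "letter+digit" options and take products of it
first = []
elements = []
for letter in list1:
    elements.append([''.join(c) for c in product(letter, list2)])
    first.extend(product(*elements))

# second answer: pair each prefix slice of list1 with every digit choice
second = []
for z in [list1[:n + 1] for n in range(len(list1))]:
    for y in product(list2, repeat=len(z)):
        second.append(tuple(''.join(u) for u in zip(z, y)))

# both yield the same 2 + 4 + 8 = 14 tuples, in the same order
print(first == second, len(first))

# one line per tuple; io.StringIO stands in for a real output file
out = io.StringIO()
for combo in first:
    out.write(",".join(combo) + "\n")
print(out.getvalue().splitlines()[:3])  # ['A1', 'A2', 'A1,B1']
```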
