Python - Split array into multiple arrays - python

I have an array containing file names like the ones below:
['001_1.png', '001_2.png', '001_3.png', '002_1.png','002_2.png', '003_1.png', '003_2.png', '003_3.png', '003_4.png', ....]
I want to quickly group these files into multiple arrays like this:
[['001_1.png', '001_2.png', '001_3.png'], ['002_1.png', '002_2.png'], ['003_1.png', '003_2.png', '003_3.png', '003_4.png'], ...]
Could anyone tell me how to do it in a few lines of Python?

If your data is already sorted by the file name, you can use itertools.groupby:
files = ['001_1.png', '001_2.png', '001_3.png', '002_1.png','002_2.png',
'003_1.png', '003_2.png', '003_3.png']
import itertools
keyfunc = lambda filename: filename[:3]
# this creates an iterator that yields `(group, filenames)` tuples,
# but `filenames` is another iterator
grouper = itertools.groupby(files, keyfunc)
# to get the result as a nested list, we iterate over the grouper to
# discard the groups and turn the `filenames` iterators into lists
result = [list(filenames) for _, filenames in grouper]
print(result)
# [['001_1.png', '001_2.png', '001_3.png'],
# ['002_1.png', '002_2.png'],
# ['003_1.png', '003_2.png', '003_3.png']]
Otherwise, you can base your code on this recipe, which is more efficient than sorting the list and then using groupby.
Input: Your input is a flat list, so use a regular ol' loop to iterate over it:
for filename in files:
Group identifier: The files are grouped by the first 3 letters:
group = filename[:3]
Output: The output should be a nested list rather than a dict, which can be done with
result = list(groupdict.values())
Putting it together:
files = ['001_1.png', '001_2.png', '001_3.png', '002_1.png','002_2.png',
'003_1.png', '003_2.png', '003_3.png']
import collections
groupdict = collections.defaultdict(list)
for filename in files:
    group = filename[:3]
    groupdict[group].append(filename)
result = list(groupdict.values())
print(result)
# [['001_1.png', '001_2.png', '001_3.png'],
# ['002_1.png', '002_2.png'],
# ['003_1.png', '003_2.png', '003_3.png']]
Read the recipe answer for more details.
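To illustrate why the groupby approach above needs sorted input, here is a minimal sketch. The names unsorted_files and prefix are made up for this example. groupby only merges neighbouring items with the same key, so an unsorted list yields fragmented groups unless it is sorted first:

```python
from itertools import groupby

# Hypothetical sample: the two '001' files are not adjacent.
unsorted_files = ['001_1.png', '002_1.png', '001_2.png']
prefix = lambda name: name[:3]

# Without sorting, groupby starts a new group every time the key changes.
fragmented = [list(g) for _, g in groupby(unsorted_files, prefix)]
print(fragmented)
# [['001_1.png'], ['002_1.png'], ['001_2.png']]

# Sorting by the same key first brings equal prefixes together.
grouped = [list(g) for _, g in groupby(sorted(unsorted_files, key=prefix), prefix)]
print(grouped)
# [['001_1.png', '001_2.png'], ['002_1.png']]
```

The defaultdict version does not have this restriction, because it looks up the group by key instead of relying on adjacency.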

Something like this should work:
import itertools
mylist = [...]
[list(v) for k,v in itertools.groupby(mylist, key=lambda x: x[:3])]
If the input list isn't sorted, then use something like this:
import itertools
mylist = [...]
keyfunc = lambda x:x[:3]
mylist = sorted(mylist, key=keyfunc)
[list(v) for k,v in itertools.groupby(mylist, key=keyfunc)]

You can do it using a dictionary.
files = ['001_1.png', '001_2.png', '003_3.png', '002_1.png', '002_2.png', '003_1.png', '003_2.png', '003_3.png', '003_4.png']
groups = {}  # avoid naming variables list or dict, which would shadow the built-ins
for item in files:
    if item[:3] not in groups:
        groups[item[:3]] = []
    groups[item[:3]].append(item)
Then you have to sort the dictionary by key value.
groups = {k: v for k, v in sorted(groups.items())}
The last step is to use a list comprehension in order to achieve your requirement.
result = [v for k, v in groups.items()]
print(result)
Output
[['001_1.png', '001_2.png'], ['002_1.png', '002_2.png'], ['003_3.png', '003_1.png', '003_2.png', '003_3.png', '003_4.png']]

Using a simple iteration and dictionary.
Ex:
l = ['001_1.png', ' 001_2.png', ' 003_3.png', ' 002_1.png', ' 002_2.png', ' 003_1.png', ' 003_2.png', ' 003_3.png', ' 003_4.png']
r = {}
for i in l:
    v = i.strip().split("_")[0]  # group by the full numeric prefix, e.g. '001'
    if v not in r:
        r[v] = []
    r[v].append(i)
print(list(r.values()))
Output:
[['001_1.png', ' 001_2.png'], [' 003_3.png', ' 003_1.png', ' 003_2.png', ' 003_3.png', ' 003_4.png'], [' 002_1.png', ' 002_2.png']]

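If your list is ordered like this, here is a short script for the task. It assumes every group starts with a file whose name ends in _1 before the extension.
myList = []
for i in a:  # a is the input list of file names
    if i[:-4].endswith('_1'):
        myList.append([i])
    else:
        myList[-1].append(i)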
# [['001_1.png', '001_2.png', '003_3.png'], ['002_1.png', '002_2.png'], ...]

@IYN
mini_list = []
p = ['001_1.png', '001_2.png', '001_3.png', '002_1.png','002_2.png', '003_1.png', '003_2.png', '003_3.png', '003_4.png']
new_p = []
for index, element in enumerate(p):
    if index == len(p)-1:
        mini_list.append(element)
        new_p.append(mini_list)
        break
    if element[0:3]==p[index+1][0:3]:
        mini_list.append(element)
    else:
        mini_list.append(element)
        new_p.append(mini_list)
        mini_list = []
print (new_p)
The code above will cut the initial list into sub lists and append them as individual lists into a resulting, larger list.
Note: not a few lines, but you can convert this to a function.
def list_cutter(ls):
    mini_list = []
    new_list = []
    for index, element in enumerate(ls):
        if index == len(ls)-1:
            mini_list.append(element)
            new_list.append(mini_list)
            break
        if element[0:3]==ls[index+1][0:3]:
            mini_list.append(element)
        else:
            mini_list.append(element)
            new_list.append(mini_list)
            mini_list = []
    return new_list
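As a possible shorter variant of the same neighbour-comparison idea, the sketch below compares each name with the last name of the current group instead of looking ahead. The function name cut_by_prefix and the prefix_len parameter are hypothetical and not part of the answer above. Like the original, it assumes the input is already ordered by prefix.

```python
def cut_by_prefix(names, prefix_len=3):
    """Split an ordered list of names into runs sharing the same prefix."""
    groups = []
    for name in names:
        # Extend the current group if its last name has the same prefix,
        # otherwise start a new group.
        if groups and groups[-1][-1][:prefix_len] == name[:prefix_len]:
            groups[-1].append(name)
        else:
            groups.append([name])
    return groups


p = ['001_1.png', '001_2.png', '001_3.png', '002_1.png', '002_2.png',
     '003_1.png', '003_2.png', '003_3.png', '003_4.png']
cut = cut_by_prefix(p)
print(cut)
# [['001_1.png', '001_2.png', '001_3.png'], ['002_1.png', '002_2.png'],
#  ['003_1.png', '003_2.png', '003_3.png', '003_4.png']]
```

Because it never indexes past the end of the list, this version needs no special case for the last element and also returns an empty list for empty input.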

Related

String Compression in Python

I have the following input :
my_list = ["x d1","y d1","z d2","t d2"]
And would like to transform it into :
Expected_result = ["d1(x,y)","d2(z,t)"]
I had to use brute force, and also had to call pandas to my rescue, since I didn't find any way to do it in plain/vanilla python. Do you have any other way to solve this?
import pandas as pd
my_list = ["x d1","y d1","z d2","t d2"]
df = pd.DataFrame(my_list,columns=["col1"])
df2 = df["col1"].str.split(" ",expand = True)
df2.columns = ["col1","col2"]
grp = df2.groupby("col2")  # a plain string key, so grp_name is a string rather than a 1-tuple
result = []
for grp_name, data in grp:
    res = grp_name + "(" + ",".join(list(data["col1"])) + ")"
    result.append(res)
print(result)
The code below starts with an empty dictionary.
It then iterates over each item in your list and uses the split() method to split the item into a key (e.g. x) and a value (e.g. d1).
Then it uses the setdefault() method to collect the keys under their value. If the value already exists as a key in the dictionary, the key is appended to that value's existing list. If it does not, a new entry is created with the value as the dictionary key and the key as the first element of a new list.
Finally, the list comprehension iterates over the items in the dictionary and builds a string for each entry, using the join() method to concatenate the collected keys into a single string.
result = {}
for item in my_list:
    key, value = item.split()
    result.setdefault(value, []).append(key)
output = [f"{k}({', '.join(v)})" for k, v in result.items()]
print(output)
['d1(x, y)', 'd2(z, t)']
If your values are already sorted by key (d1, d2), you can use itertools.groupby:
from itertools import groupby
out = [f"{k}({','.join(x[0] for x in g)})"
for k, g in groupby(map(str.split, my_list), lambda x: x[1])]
Output:
['d1(x,y)', 'd2(z,t)']
Otherwise you should use a dictionary as shown by @Jamiu.
A variant of your pandas solution:
out = (df['col1'].str.split(n=1, expand=True)
.groupby(1)[0]
.apply(lambda g: f"{g.name}({','.join(g)})")
.tolist()
)
my_list = ["x d1","y d1","z d2","t d2"]
res = []
for item in my_list:
    a, b, *_ = item.split()
    # extend the last entry only if it belongs to the same dN
    if res and res[-1].startswith(f'{b}('):
        res[-1] = res[-1].replace(')', f',{a})')
    else:
        res.append(f'{b}({a})')
print(res)
['d1(x,y)', 'd2(z,t)']
Let N be the number that follows d. This code works for any number of elements within each dN, as long as the dN are ordered, that is, all d1 items come before the d2 items, which come before the d3 items, and so on. It works with any value of N, and the first token can be any string, as long as every item keeps the form "val_in_dN dN".
If you need something that works even if the dN are not in sequence, just say the word, but it will cost a little more.
Another possible solution, which is based on pandas:
import numpy as np
import pandas as pd
(pd.DataFrame(np.array([str.split(x, ' ') for x in my_list]), columns=['b', 'a'])
.groupby('a')['b'].apply(lambda x: f'({x.values[0]}, {x.values[1]})')
.reset_index().sum(axis=1).tolist())
Output:
['d1(x, y)', 'd2(z, t)']
EDIT
The OP, @ShckTchamna, would like to see the above solution modified to be more general. This edit provides a solution that works with the example the OP gives in his comment below.
my_list = ["x d1","y d1","z d2","t d2","kk d2","m d3", "n d3", "s d4"]
(pd.DataFrame(np.array([str.split(x, ' ') for x in my_list]), columns=['b', 'a'])
.groupby('a')['b'].apply(lambda x: f'({",".join(x.values)})')
.reset_index().sum(axis=1).tolist())
Output:
['d1(x,y)', 'd2(z,t,kk)', 'd3(m,n)', 'd4(s)']
import pandas as pd
df = pd.DataFrame(data=[e.split(' ') for e in ["x d1","y d1","z d2","t d2"]])
r = (df.groupby(1)
.apply(lambda r:"{0}({1},{2})".format(r.name, r.iloc[0,0], r.iloc[1,0]))
.reset_index()
.rename({1:"points", 0:"coordinates"}, axis=1)
)
print(r.coordinates.tolist())
# ['d1(x,y)', 'd2(z,t)']
print(r)
#   points coordinates
# 0     d1     d1(x,y)
# 1     d2     d2(z,t)
As a replacement for my previous answer (which also works):
import itertools as it
my_list = [e.split(' ') for e in ["x d1","y d1","z d2","t d2"]]
r=[]
for key, group in it.groupby(my_list, lambda x: x[1]):
    l=[e[0] for e in list(group)]
    r.append("{0}({1},{2})".format(key, l[0], l[1]))
print(r)
Output :
['d1(x,y)', 'd2(z,t)']

get the file names list by another list

I am new to Python and a bit confused...
I have a List like:
List_all = ["aawoobbcc", "aawoobbca", "aabbcskindd","asakindsbbss","wooedakse","sdadakindwsd","xxxxsdsd"]
and also a keyword list:
Key = ["woo","kind"]
and I want to get something like this:
[
["aawoobbcc", "aawoobbca","wooedakse"],
["aabbcskindd","asakindsbbss","sdadakindwsd"]
]
I have tried list_sub = [file for file in List_all if Key in file]
or list_sub = [file for file in List_all if k for k in Key in file]
neither of which is right.
How can I check each element of Key as a substring of the elements in List_all?
Thanks a lot!
One approach (O(n^2), since every keyword is checked against every string) is the following:
res = [[e for e in List_all if k in e] for k in Key]
print(res)
Output
[['aawoobbcc', 'aawoobbca', 'wooedakse'], ['aabbcskindd', 'asakindsbbss', 'sdadakindwsd']]
A simpler to understand solution (for newbies) is to use nested for loops:
res = []
for k in Key:
    filtered = []
    for e in List_all:
        if k in e:
            filtered.append(e)
    res.append(filtered)
A more advanced solution, but a more performant one for really long lists, is to use a regular expression in conjunction with a defaultdict:
import re
from collections import defaultdict
List_all = ["aawoobbcc", "aawoobbca", "aabbcskindd", "asakindsbbss", "wooedakse", "sdadakindwsd", "xxxxsdsd"]
Key = ["woo", "kind"]
# escape the keys so they are matched literally, not as regex syntax
extract_key = re.compile("|".join(map(re.escape, Key)))
table = defaultdict(list)
for word in List_all:
    if match := extract_key.search(word):
        table[match.group()].append(word)
res = [table[k] for k in Key if k in table]
print(res)
Output
[['aawoobbcc', 'aawoobbca', 'wooedakse'], ['aabbcskindd', 'asakindsbbss', 'sdadakindwsd']]
Note that this solution assumes that each string contains at most one key. If a string contains several, only the first match is used.
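If a string may contain more than one keyword, one possible adjustment is to collect every match instead of only the first. The sketch below is an assumption-laden variant, not the answer's code: the function name group_by_keywords and the sample word "wookind" are made up for illustration. It also relies on the keywords not overlapping inside a string, because findall() only reports non-overlapping matches.

```python
import re
from collections import defaultdict


def group_by_keywords(words, keys):
    """Return one list of matching words per keyword, in keyword order."""
    pattern = re.compile("|".join(re.escape(k) for k in keys))
    table = defaultdict(list)
    for word in words:
        # set() so a word is listed once per keyword even if the keyword repeats
        for key in set(pattern.findall(word)):
            table[key].append(word)
    return [table[k] for k in keys if k in table]


# Hypothetical sample: "wookind" contains both keywords.
words = ["aawoobbcc", "wookind", "asakindsbbss", "xxxxsdsd"]
groups = group_by_keywords(words, ["woo", "kind"])
print(groups)
# [['aawoobbcc', 'wookind'], ['wookind', 'asakindsbbss']]
```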

use python to combine data lists

I have two separate lists:
list1 = [['2021-05-24', '31220'],....., ['2021-05-24', '6640'],['2021-05-10', '8830'].....]
list2 = [['2021-05-24', '77860'],.....,['2021-05-24', '438000'],['2021-05-10', '9990'].....]
How could I combine them to
[['2021-05-24', 'add all numbers in '2021-05-24' together'],['2021-05-10', 'add all numbers in '2021-05-10' together']]
where '.....' means there are many more [date, number] pairs.
I am considering deleting the duplicated dates in each list and then adding the two lists up:
import networkx as nx
G = nx.Graph()
G.add_nodes_from(sum(list2, []))
q = [[(s[i],s[i+1]) for i in range(len(s)-1)] for s in list2]
for i in q:
    G.add_edges_from(i)
print([list(i) for i in nx.connected_components(G)])
but my code removed not only the duplicated dates but also the duplicated numbers.
Thanks in advance.
I'd recommend using a defaultdict:
from collections import defaultdict
result = defaultdict(int)
for k,v in (list1 + list2):
    result[k] += int(v)  # the numbers are stored as strings, so convert before adding
Then you can convert the dict back to a list. Of course, if you have several lists, you may want to use itertools.chain instead of list1 + list2.
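A minimal sketch of the itertools.chain variant mentioned above, including the conversion back to a list. The function name sum_by_date is hypothetical, and the sample lists are shortened stand-ins for the elided data in the question:

```python
from collections import defaultdict
from itertools import chain


def sum_by_date(*lists):
    """Sum the numeric strings per date across any number of [date, number] lists."""
    totals = defaultdict(int)
    # chain() walks all lists in turn without building a combined copy
    for date, amount in chain(*lists):
        totals[date] += int(amount)
    return [[date, total] for date, total in totals.items()]


# Shortened sample data (assumption: the real lists are much longer).
list1 = [['2021-05-24', '31220'], ['2021-05-24', '6640'], ['2021-05-10', '8830']]
list2 = [['2021-05-24', '77860'], ['2021-05-10', '9990']]
combined = sum_by_date(list1, list2)
print(combined)
# [['2021-05-24', 115720], ['2021-05-10', 18820]]
```

The dates come out in the order they first appear, since dictionaries preserve insertion order on Python 3.7 and later.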
You can create a new dictionary, do the calculation in it, and then build the list from it. Here is the code for that:
result={}
for i in list1+list2: #creating dict and doing calculation
    if i[0] in result:
        result[i[0]] += int(i[1])
    else:
        result[i[0]] = int(i[1])
result_list = [[key, result[key]] for key in result] #converting dict to expected list
print(result_list)

While loop within for loop for list of lists

I'm trying to create a big list that will contain lists of strings. I iterate over the input list of strings and create a temporary list.
Input:
['Mike','Angela','Bill','\n','Robert','Pam','\n',...]
My desired output:
[['Mike','Angela','Bill'],['Robert','Pam']...]
What I get:
[['Mike','Angela','Bill'],['Angela','Bill'],['Bill']...]
Code:
for i in range(0,len(temp)):
    temporary = []
    while(temp[i] != '\n' and i<len(temp)-1):
        temporary.append(temp[i])
        i+=1
    bigList.append(temporary)
Use itertools.groupby
from itertools import groupby
names = ['Mike','Angela','Bill','\n','Robert','Pam']
[list(g) for k,g in groupby(names, lambda x:x=='\n') if not k]
#[['Mike', 'Angela', 'Bill'], ['Robert', 'Pam']]
Fixing your code, I'd recommend iterating over each element directly, appending to a nested list -
r = [[]]
for i in temp:
    if i.strip():
        r[-1].append(i)
    else:
        r.append([])
Note that if temp ends with a newline, r will have a trailing empty [] list. You can get rid of that though:
if not r[-1]:
    del r[-1]
Another option would be using itertools.groupby, which the other answerer has already mentioned, although your method is more performant.
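Your for loop was scanning over the temp array just fine, but the while loop on the inside was advancing that same index. On its next iteration the for loop then reset the index to its own next value, so the elements the while loop had already consumed were visited again. This caused the repetition.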
temp = ['mike','angela','bill','\n','robert','pam','\n','liz','anya','\n']
# !make sure to include this '\n' at the end of temp!
bigList = []
temporary = []
for i in range(0,len(temp)):
    if(temp[i] != '\n'):
        temporary.append(temp[i])
        print(temporary)
    else:
        print(temporary)
        bigList.append(temporary)
        temporary = []
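The index behaviour described in this answer can be checked with a small sketch. The list visited and the shortened sample data are assumptions made for illustration. Changing the loop variable inside a for loop over range() only lasts until the next iteration, when range() supplies its own next value:

```python
temp = ['Mike', 'Angela', '\n', 'Robert']
visited = []
for i in range(len(temp)):
    visited.append(i)  # the index the for loop handed us
    while i < len(temp) - 1 and temp[i] != '\n':
        i += 1         # advances only the local name i, not the for loop
print(visited)
# [0, 1, 2, 3]
```

Every index is still visited exactly once by the for loop, which is why the inner while loop in the question re-read names it had already collected.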
You could try:
a_list = ['Mike','Angela','Bill','\n','Robert','Pam','\n']
result = []
start = 0
end = 0
for indx, name in enumerate(a_list):
    if name == '\n':
        end = indx
        sublist = a_list[start:end]
        if sublist:
            result.append(sublist)
        start = indx + 1
>>> result
[['Mike', 'Angela', 'Bill'], ['Robert', 'Pam']]

Collect parts of duplicated elements in a list and merge them in just one element

I'm trying to parse a huge Excel file which contains properties and their values.
The problem is as follows: Some properties are able to contain multiple values.
Example:
list = ['a=1', 'b=2', 'c=3', 'd=4', 'd=5', 'd=6', 'e=7']
Should be:
list2 = ['a=1', 'b=2', 'c=3', 'd=4,5,6', 'e=7']
The elements are strings of variable length, and they are separated by a '='.
This is how i generate the list out of the Excel file:
#for each row in the excel file.
for rows in range(DATA_ROW, sheet.nrows):
    #generate a list with all properties.
    for cols in range(sheet.ncols):
        #if the property is not empty
        if str(sheet.cell(PROPERTIE_ROW,cols).value) != '':
            proplist.append(sheet.cell(PROPERTIE_ROW,cols).value + '=' + str(sheet.cell(rows,cols).value) + '\n')
I gave it a try but that didn't work very well ...
last_item = ''
new_list = []
#find and collect multiple values.
for i, item in enumerate(proplist):
    #if the property is already in the list
    if str(item).find(last_item) is not -1:
        #just copy the value and append it to the property
        new_list.insert(i, propertie);
    else:
        #slice the string into property and value
        pos = item.find('=')
        propertie = item[0:pos+1]
        value = item[pos+1:len(item)]
        #save the property
        last_item = propertie
        #append item
        new_list.append(item)
Any help would be greatly appreciated!
If the order doesn't matter, you could probably use a defaultdict for this sort of thing:
from collections import defaultdict
orig = ['a=1', 'b=2', 'c=3', 'd=4', 'd=5', 'd=6', 'e=7']
d = defaultdict(list)
for item in orig:
    k,v = item.split('=',1)
    d[k].append(v)
new = ['{0}={1}'.format(k,','.join(v)) for k,v in d.items()]
print(new) # on Python versions before 3.7 the order may vary, e.g. ['a=1', 'c=3', 'b=2', 'e=7', 'd=4,5,6']
I suppose that if order does matter, you could use an OrderedDict + setdefault but it really isn't as pretty:
from collections import OrderedDict
orig = ['a=1', 'b=2', 'c=3', 'd=4', 'd=5', 'd=6', 'e=7']
d = OrderedDict()
for item in orig:
    k,v = item.split('=',1)
    d.setdefault(k,[]).append(v)
new = ['{0}={1}'.format(k,','.join(v)) for k,v in d.items()]
print(new) # ['a=1', 'b=2', 'c=3', 'd=4,5,6', 'e=7']
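On Python 3.7 and later, plain dictionaries preserve insertion order, so the same result can be sketched without OrderedDict. The function name merge_properties and its sep parameter are hypothetical and only serve this example:

```python
def merge_properties(items, sep='='):
    """Merge 'key=value' strings that share a key, keeping first-seen key order."""
    merged = {}  # a regular dict keeps insertion order on Python 3.7+
    for item in items:
        key, value = item.split(sep, 1)
        merged.setdefault(key, []).append(value)
    return [f"{key}{sep}{','.join(values)}" for key, values in merged.items()]


orig = ['a=1', 'b=2', 'c=3', 'd=4', 'd=5', 'd=6', 'e=7']
merged_list = merge_properties(orig)
print(merged_list)
# ['a=1', 'b=2', 'c=3', 'd=4,5,6', 'e=7']
```

Unlike the attempt in the question, this does not require duplicated properties to be adjacent in the list.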
