For example, given the original list:
['k','a','b','c','a','d','e','a','b','e','f','j','a','c','a','b']
We want to split the list into sublists that start with 'a' and end with 'a', like the following:
['a','b','c','a']
['a','d','e','a']
['a','b','e','f','j','a']
['a','c','a']
The final output can also be a list of lists. I have tried a double for loop approach with 'a' as the condition, but this is inefficient and not Pythonic.
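One possible solution uses re (regular expressions). Note that joining the list into a string only works when every item is a single character: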
import re
l = ['k','a','b','c','a','d','e','a','b','e','f','j','a','c','a','b']
r = [list(f"a{_}a") for _ in re.findall("(?<=a)[^a]+(?=a)", "".join(l))]
print(r)
# [['a', 'b', 'c', 'a'], ['a', 'd', 'e', 'a'], ['a', 'b', 'e', 'f', 'j', 'a'], ['a', 'c', 'a']]
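The lookbehind and lookahead in that pattern matter because each inner 'a' both closes one sublist and opens the next. A minimal sketch (the variable names are illustrative, not part of the answer above) comparing it with a plain a[^a]+a pattern, which consumes the closing 'a' and therefore skips every other segment:

```python
import re

s = "".join(['k','a','b','c','a','d','e','a','b','e','f','j','a','c','a','b'])

# Plain pattern: each match consumes its closing 'a', so that 'a' cannot
# also open the next match.
consuming = re.findall("a[^a]+a", s)

# Lookarounds only check for the surrounding 'a's without consuming them.
non_consuming = re.findall("(?<=a)[^a]+(?=a)", s)

print(consuming)      # ['abca', 'abefja']
print(non_consuming)  # ['bc', 'de', 'befj', 'c']
```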
You can do this in one loop:
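lst = ['k','a','b','c','a','d','e','a','b','e','f','j','a','c','a','b']
out = [[]]
for i in lst:
    if i == 'a':
        out[-1].append(i)  # close the current sublist with this 'a'
        out.append([])     # and start a new one with the same 'a'
    out[-1].append(i)
# The first sublist is the prefix before the first 'a' and the last one is
# never closed by a later 'a', so drop both.
out = out[1:-1]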
Also using numpy.split:
import numpy as np
out = [ary.tolist() + ['a'] for ary in np.split(lst, np.where(np.array(lst) == 'a')[0])[1:-1]]
Output:
[['a', 'b', 'c', 'a'], ['a', 'd', 'e', 'a'], ['a', 'b', 'e', 'f', 'j', 'a'], ['a', 'c', 'a']]
First, store the indices of every 'a' in the list.
oList = ['k','a','b','c','a','d','e','a','b','e','f','j','a','c','a','b']
idx_a = list()
for idx, char in enumerate(oList):
    if char == 'a':
        idx_a.append(idx)
Then, for every pair of consecutive indices, take the sub-list between them (inclusive) and store it in a list:
ans = [oList[idx_a[x]:idx_a[x + 1] + 1] for x in range(len(idx_a) - 1)]
Pairing non-adjacent indices as well gives further sub-lists that start and end with 'a' but also contain an 'a' in between.
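As a rough sketch of the index-based idea, the two variants (adjacent pairs only, or every pair of indices) could be written as below. The function names split_between and split_between_any are made up for illustration and the sample list is shortened.

```python
from itertools import combinations

def split_between(items, marker='a'):
    """Sub-lists between each pair of consecutive markers (inclusive)."""
    positions = [i for i, item in enumerate(items) if item == marker]
    return [items[start:end + 1] for start, end in zip(positions, positions[1:])]

def split_between_any(items, marker='a'):
    """Sub-lists between every pair of markers, adjacent or not."""
    positions = [i for i, item in enumerate(items) if item == marker]
    return [items[start:end + 1] for start, end in combinations(positions, 2)]

data = ['k', 'a', 'b', 'c', 'a', 'd', 'e', 'a', 'b']
print(split_between(data))
# [['a', 'b', 'c', 'a'], ['a', 'd', 'e', 'a']]
print(split_between_any(data))
# [['a', 'b', 'c', 'a'], ['a', 'b', 'c', 'a', 'd', 'e', 'a'], ['a', 'd', 'e', 'a']]
```

Unlike the string-based answers, this works for items of any type, since it never joins the list into a string.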
You can do this with a single iteration and a simple state machine:
original_list = list('kabcadeabefjacab')
multiple_lists = []
for c in original_list:
    if multiple_lists:
        multiple_lists[-1].append(c)
    if c == 'a':
        multiple_lists.append([c])
# The last sublist is never closed by a later 'a', so discard it.
if multiple_lists:
    multiple_lists.pop()
print(multiple_lists)
[['a', 'b', 'c', 'a'], ['a', 'd', 'e', 'a'], ['a', 'b', 'e', 'f', 'j', 'a'], ['a', 'c', 'a']]
We can str.join() the list into a string, split it with str.split(), and then use an f-string to add back the stripped "a"s. Note that even if the list starts or ends with an "a", the split result will contain an empty string for the (empty) substring before or after it, so the unpacking logic that discards the first and last subsequences still works as intended.
def split(data):
    _, *subseqs, _ = "".join(data).split("a")
    return [list(f"a{seq}a") for seq in subseqs]
Output:
>>> from pprint import pprint
>>> testdata = ['k','a','b','c','a','d','e','a','b','e','f','j','a','c','a','b']
>>> pprint(split(testdata))
[['a', 'b', 'c', 'a'],
 ['a', 'd', 'e', 'a'],
 ['a', 'b', 'e', 'f', 'j', 'a'],
 ['a', 'c', 'a']]
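The claim about inputs that start or end with "a" can be checked with a short sketch. It repeats the split function from above so that it runs on its own. The last case, an input without any "a", is an edge case the answer does not discuss: the unpacking then has too few values and raises ValueError.

```python
def split(data):
    _, *subseqs, _ = "".join(data).split("a")
    return [list(f"a{seq}a") for seq in subseqs]

# Starts and ends with 'a': "abca".split("a") == ['', 'bc', '']
print(split(list("abca")))  # [['a', 'b', 'c', 'a']]

# Only one 'a': nothing is enclosed, so the result is empty
print(split(list("kab")))   # []

# No 'a' at all: "kb".split("a") == ['kb'], too few values to unpack
try:
    split(list("kb"))
except ValueError as exc:
    print("no 'a' in input:", exc)
```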
I have my data in txt file.
1 B F 2019-03-10
1 C G 2019-03-11
1 B H 2019-03-10
1 C I 2019-03-10
1 B J 2019-03-10
2 A K 2019-03-10
1 D L 2019-03-10
2 D M 2019-03-10
2 E N 2019-03-11
1 E O 2019-03-10
What I need to do is to split the data according to the first column.
So all rows with number 1 in the first column go to one list (or dictionary, or whatever) and all rows with number 2 in the first column go to another list. This is sample data; in the original data we do not know how many different numbers are in the first column.
What I have to do next is to sort the data for each key (in my case for numbers 1 and 2) by date and time. I could do that with the data.txt, but not with the dictionary.
import csv
from operator import itemgetter
with open("data.txt") as file:
    reader = csv.reader(file, delimiter="\t")
    data = sorted(reader, key=itemgetter(0))
lines = sorted(data, key=itemgetter(3))
lines
OUTPUT:
[['1', 'B', 'F', '2019-03-10'],
['2', 'D', 'M', '2019-03-10'],
['1', 'B', 'H', '2019-03-10'],
['1', 'C', 'I', '2019-03-10'],
['1', 'B', 'J', '2019-03-10'],
['1', 'D', 'L', '2019-03-10'],
['2', 'A', 'K', '2019-03-10'],
['1', 'E', 'O', '2019-03-10'],
['1', 'C', 'G', '2019-03-11'],
['2', 'E', 'N', '2019-03-11']]
So what I need is to group the data by the number in the first column as well as to sort this by the date and time. Could anyone please help me to combine these two codes somehow? I am not sure if I had to use a dictionary, maybe there is another way to do that.
You can sort the corresponding list for each key after splitting the data according to the first column:
from operator import itemgetter
def sort_by_time(key_items):
    return sorted(key_items, key=itemgetter(3))
d = {k: sort_by_time(v) for k, v in d.items()}
If the rows in d have separate elements for the date and for the time (assuming the date is at index 3 and the time at index 4), then you can sort by several columns:
sorted(key_items, key=itemgetter(3, 4))
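The snippet above assumes that d already exists. One possible way to build it, sketched here with a few rows copied from the question and with hypothetical names rows and groups, is to collect rows per key with a defaultdict and then sort each list:

```python
from collections import defaultdict
from operator import itemgetter

# A few rows from the question, already split into columns
rows = [
    ['1', 'B', 'F', '2019-03-10'],
    ['1', 'C', 'G', '2019-03-11'],
    ['2', 'A', 'K', '2019-03-10'],
    ['2', 'E', 'N', '2019-03-11'],
    ['1', 'E', 'O', '2019-03-10'],
]

# Step 1: split the rows according to the first column
groups = defaultdict(list)
for row in rows:
    groups[row[0]].append(row)

# Step 2: sort each group by the date column (ISO dates sort correctly as strings)
d = {k: sorted(v, key=itemgetter(3)) for k, v in groups.items()}

for key, items in d.items():
    print(key, items)
```

Because sorted is stable, rows with the same date keep their original relative order within each group.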
itertools.groupby can help build the lists:
from operator import itemgetter
from itertools import groupby
from pprint import pprint
# Read all the data splitting on whitespace
with open('data.txt') as f:
    data = [line.split() for line in f]
# Sort by indicated columns
data.sort(key=itemgetter(0,3,4))
# Build a dictionary keyed on the first column
# Note: data must be pre-sorted by the groupby key for groupby to work correctly.
d = {group:list(items) for group,items in groupby(data,key=itemgetter(0))}
pprint(d)
Output:
{'1': [['1', 'B', 'F', '2019-03-10', '16:13:38.935'],
['1', 'B', 'H', '2019-03-10', '16:13:59.045'],
['1', 'C', 'I', '2019-03-10', '16:14:07.561'],
['1', 'B', 'J', '2019-03-10', '16:14:35.371'],
['1', 'D', 'L', '2019-03-10', '16:14:40.854'],
['1', 'E', 'O', '2019-03-10', '16:15:05.878'],
['1', 'C', 'G', '2019-03-11', '16:14:39.999']],
'2': [['2', 'D', 'M', '2019-03-10', '16:13:58.641'],
['2', 'A', 'K', '2019-03-10', '16:14:43.224'],
['2', 'E', 'N', '2019-03-11', '16:15:01.807']]}
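To see why the data must be sorted by the grouping key first, here is a small sketch with made-up keys. groupby only merges adjacent equal keys, so an unsorted input yields the same key more than once, and a dict comprehension would then silently keep only the last of those groups.

```python
from itertools import groupby

keys = ['1', '1', '2', '1']

# Unsorted input: the key '1' shows up as two separate groups
unsorted_groups = [(k, list(g)) for k, g in groupby(keys)]
print(unsorted_groups)  # [('1', ['1', '1']), ('2', ['2']), ('1', ['1'])]

# Sorted input: one group per key
sorted_groups = [(k, list(g)) for k, g in groupby(sorted(keys))]
print(sorted_groups)    # [('1', ['1', '1', '1']), ('2', ['2'])]
```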
I have a list of numbers and values for each number. What I want to do is check if the number already exists in a dictionary and if it does, append the value to that list of values for the specific key.
For example:
0 a
2 b
3 c
0 d
7 e
What I want to achieve is to populate a dictionary where the numbers are the keys and the letters are the values. However, if the number 0 comes up again, I want to take the value of the second 0 and append it to the list of values for that key.
Basically the outcome would be
"0" : [a,d]
"2" : [b]
"3" : [c]
"7" : [e]
Right now I'm working with the following:
num_letter_dict = {}
num = ['0', '2', '3', '0','7']
letters = ['a', 'b', 'c', 'd','e']
for line in num:
    if line in num_letter_dict:
        num_letter_dict[line].append(letters)
    else:
        num_letter_dict[line] = [letters]
    print(num_letter_dict)
This is the result I am getting
{'0': [['a', 'b', 'c', 'd', 'e']]}
{'0': [['a', 'b', 'c', 'd', 'e']], '2': [['a', 'b', 'c', 'd', 'e']]}
{'0': [['a', 'b', 'c', 'd', 'e'], ['a', 'b', 'c', 'd', 'e']], '2': [['a', 'b', 'c', 'd', 'e']]}
{'0': [['a', 'b', 'c', 'd', 'e'], ['a', 'b', 'c', 'd', 'e']], '2': [['a', 'b', 'c', 'd', 'e']], '3': [['a', 'b', 'c', 'd', 'e']]}
{'0': [['a', 'b', 'c', 'd', 'e'], ['a', 'b', 'c', 'd', 'e']], '2': [['a', 'b', 'c', 'd', 'e']], '3': [['a', 'b', 'c', 'd', 'e']], '7': [['a', 'b', 'c', 'd', 'e']]}
You're appending letters, which is the full list. Instead, you want to append the element of letters at the same index as the key you're currently looking at in the num list.
for idx, line in enumerate(num):
    if line in num_letter_dict:
        num_letter_dict[line].append(letters[idx])  # append the element
    else:
        num_letter_dict[line] = [letters[idx]]
Result:
>>> print(num_letter_dict)
{'0': ['a', 'd'], '2': ['b'], '3': ['c'], '7': ['e']}
You just need to append the individual letter to the relevant list inside the dictionary, and not the whole list of letters. The zip function will loop over corresponding values from the two input lists as shown:
num_letter_dict = {}
num = ['0', '2', '3', '0','7']
letters = ['a', 'b', 'c', 'd','e']
for n, letter in zip(num, letters):
    if n not in num_letter_dict:
        num_letter_dict[n] = []
    num_letter_dict[n].append(letter)
print(num_letter_dict)
Gives:
{'0': ['a', 'd'], '2': ['b'], '3': ['c'], '7': ['e']}
I think this can help you.
num_letter_dict = {}
num = ['0', '2', '0', '3','7']
letters = ['a', 'b', 'c', 'd','e']
for value, line in zip(letters, num):
    if line in num_letter_dict:
        num_letter_dict[line].append(value)  # append the element
    else:
        num_letter_dict[line] = [value]
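Try the following code. I have modified the for loop and used extend, which works here only because each letter is a single-character string (extend adds the characters of a string one by one):
for i, letter in zip(num, letters):
    if i not in num_letter_dict:
        num_letter_dict[i] = []
    num_letter_dict[i].extend(letter)  # change over here
print(num_letter_dict)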
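A small sketch of the difference between extend and append, using made-up values: extend iterates over its argument, so a longer string would be broken into characters, whereas append always adds the value as one element.

```python
single = []
single.extend('a')         # a one-character string adds one element
print(single)              # ['a']

multi_extend = []
multi_extend.extend('ab')  # a longer string is split into characters
print(multi_extend)        # ['a', 'b']

multi_append = []
multi_append.append('ab')  # append keeps the string whole
print(multi_append)        # ['ab']
```

For values longer than one character, append is therefore the safer choice.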
Ahh, I had to deal with this same issue yesterday. While the other submitted answers would work, might I recommend defaultdict from the collections module?
from collections import defaultdict
num = ['0', '2', '3', '0','7']
letters = ['a', 'b', 'c', 'd','e']
num_letter_dict = defaultdict(list)
for n, letter in zip(num, letters):
    num_letter_dict[n].append(letter)
print(num_letter_dict)
I like this approach because it allows the defaultdict class to do the construction of the list internally, rather than muddying up one's own source code with if-statements.
You can test-execute this solution on IDEOne: https://ideone.com/9Dg9Rk
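Two practical details of defaultdict, shown in a short sketch that repeats the code above so that it runs on its own (the name plain is just for illustration). Converting to a regular dict gives the usual printed form, and merely looking up a missing key in a defaultdict inserts it.

```python
from collections import defaultdict

num = ['0', '2', '3', '0', '7']
letters = ['a', 'b', 'c', 'd', 'e']

num_letter_dict = defaultdict(list)
for n, letter in zip(num, letters):
    num_letter_dict[n].append(letter)

# A plain dict prints without the defaultdict(<class 'list'>, ...) wrapper
plain = dict(num_letter_dict)
print(plain)  # {'0': ['a', 'd'], '2': ['b'], '3': ['c'], '7': ['e']}

# Side effect: a lookup of a missing key creates an empty list for it
_ = num_letter_dict['9']
print('9' in num_letter_dict)  # True
print('9' in plain)            # False
```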
I have a 2D list:
#      #                #          ^      #    ^           #         ^
l = [['A', '1', '2'], ['B', 'xx', 'A'], ['C', 'B', 's'], ['D', 'd', 'B']]
The first element in each list can be treated as an #ID string (in the example: A, B, C, D). Wherever one of the IDs (A, B, C, D) occurs inside another list, I would like to replace it with the content of the list that has that ID. Example: ['B', 'xx', 'A'] should become ['B', 'xx', ['A', '1', '2']] because A is an #ID (first string of a list) and it occurs in the second list. Output should be:
n = [['A', '1', '2'], ['B', 'xx', ['A', '1', '2']], ['C', ['B', 'xx', ['A', '1', '2']], 's'],
['D', 'd', ['B', 'xx', ['A', '1', '2']]]]
The problem I am facing is that there can be longer lists and more branches, so it gets complicated. In the end I am trying to build a tree diagram. I was thinking of first calculating the deepest level of branching, but I don't have a solution in mind yet.
l = [['A', '1', '2'], ['B', 'xx', 'A'], ['C', 'B', 's'], ['D', 'd', 'B']]
dic = {i[0]:i for i in l}
for i in l:
    fv = i[0]
    for j, v in enumerate(i):
        if v in dic and j != 0:
            dic[fv][j] = dic[v]
res = [v for i,v in dic.items()]
print(res)
output
[['A', '1', '2'],
['B', 'xx', ['A', '1', '2']],
['C', ['B', 'xx', ['A', '1', '2']], 's'],
['D', 'd', ['B', 'xx', ['A', '1', '2']]]]
Have you tried using a dictionary? If you have the IDs, you can look them up while looping through the list and replace the matching entries. Below is what I had:
l = [['A', '1', '2'], ['B', 'xx', 'A'], ['C', 'B', 's'], ['D', 'd', 'B'], ['E', 'C', 'b']]
dt = {}
for i in l:
    dt[i[0]] = i
for i in range(len(l)):
    for j in range(1, len(l[i])):
        if l[i][j] in dt:
            l[i][j] = dt.get(l[i][j])
print(l)
Another more succinct version:
d = {item[0]: item for item in l}
for item in l:
    item[1:] = [d.get(element, element) for element in item[1:]]
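One thing worth knowing about these dictionary-based approaches is that the nested entries are references to the original lists, not copies. That is what makes the nesting resolve transitively, but it also means a later change to one list shows up everywhere it is embedded. A minimal sketch (the name snapshot is illustrative) that uses copy.deepcopy when independent copies are wanted:

```python
import copy

l = [['A', '1', '2'], ['B', 'xx', 'A'], ['C', 'B', 's'], ['D', 'd', 'B']]
d = {item[0]: item for item in l}
for item in l:
    # Slice assignment mutates each list in place, so identities are kept
    item[1:] = [d.get(element, element) for element in item[1:]]

# The nested 'A' entry is the very same list object as l[0]
print(l[1][2] is l[0])    # True

# Take an independent copy before changing anything
snapshot = copy.deepcopy(l)
l[0][1] = 'changed'
print(l[1][2][1])         # changed
print(snapshot[1][2][1])  # 1
```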
I would like to write Python 3 code using csv.reader.
This is an example file to read.
#hoge.txt
a b c d e f g
a b c d e f g
a b c d e f g
a b c d e f g
I want to have arrays like this
[[a,a,a,a],[b,b,b,b],[c,c,c,c]...[g,g,g,g]]
(The number of elements is fixed.)
My current code is
from csv import reader
with open('hoge.txt') as f:
    data = reader(f, delimiter=' ')
But, apparently, it doesn't work.
How can I make it behave as if I had written
data = reader(f, delimiter='\s+')
with open('hoge.txt', 'r') as fin:
    data = [line.split() for line in fin]
This will give output like
[['a', 'b', 'c', 'd', 'e', 'f', 'g'], ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
['a', 'b', 'c', 'd', 'e', 'f', 'g'], ['a', 'b', 'c', 'd', 'e', 'f', 'g']]
But since your desired output is grouped by column, loop over the column indices:
list1 = []
for i in range(len(data[0])):
    list1.append([x[i] for x in data])
This will produce
[['a', 'a', 'a', 'a'], ['b', 'b', 'b', 'b'], ['c', 'c', 'c', 'c'], ['d', 'd', 'd', 'd'],
['e', 'e', 'e', 'e'], ['f', 'f', 'f', 'f'], ['g', 'g', 'g', 'g']]
I hope it solves your issue.
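The row-to-column step can also be sketched with zip(*rows), a common transpose idiom. Here io.StringIO stands in for the file so the example runs without hoge.txt, and the names sample, rows and columns are illustrative.

```python
import io

# Stand-in for open('hoge.txt'): four identical whitespace-separated lines
sample = io.StringIO("a b c d e f g\n" * 4)

rows = [line.split() for line in sample]

# zip(*rows) pairs up the i-th element of every row, i.e. yields the columns
columns = [list(col) for col in zip(*rows)]

print(columns[0])    # ['a', 'a', 'a', 'a']
print(len(columns))  # 7
```

Note that zip stops at the shortest row, so this assumes every line has the same number of fields, as stated in the question.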
Are you sure you've got CSV? Your example file is space-delimited, so my first approach would be to use split(). Something like this:
allcols = []
with open("hoge.txt", "r") as f:
    for line in f:
        for i, el in enumerate(line.split()):
            if i == len(allcols):
                allcols.append([])  # first row: create one list per column
            allcols[i].append(el)
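For context on why split() fits here: called without arguments it splits on any run of whitespace, which is the behaviour the '\s+' delimiter in the question was aiming for, while csv.reader only accepts a single-character delimiter. A small sketch with a made-up line:

```python
import csv
import re

line = "a  b\tc   d\n"

# str.split() with no argument: any run of whitespace is one separator
by_split = line.split()

# The explicit regex equivalent
by_regex = re.split(r"\s+", line.strip())

print(by_split)  # ['a', 'b', 'c', 'd']
print(by_regex)  # ['a', 'b', 'c', 'd']

# csv.reader rejects a multi-character delimiter such as '\s+'
try:
    csv.reader([], delimiter=r"\s+")
    delimiter_rejected = False
except TypeError:
    delimiter_rejected = True
print(delimiter_rejected)  # True
```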
If you really do have CSV but with extraneous spaces, then I'd still go with per-line processing, but like this:
from csv import reader
with open("hoge.txt", "r", newline="") as f:
    # strip stray whitespace from each line; skipinitialspace drops spaces after each comma
    data = list(reader((line.strip() for line in f), skipinitialspace=True))
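For comma-separated data with stray spaces after the commas, csv.reader has a skipinitialspace option that ignores whitespace immediately following the delimiter. A minimal sketch with made-up sample text, again using io.StringIO in place of a real file:

```python
import csv
import io

# Made-up CSV text with extra spaces after some commas
sample = io.StringIO("a, b,  c\nd,e, f\n")

rows = list(csv.reader(sample, skipinitialspace=True))
print(rows)  # [['a', 'b', 'c'], ['d', 'e', 'f']]
```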
hth