Python 3: how to read a file delimited by a varying number of spaces - python

I would like to write Python 3 code using csv.reader.
This is an example file to read.
#hoge.txt
a b c d e f g
a b c d e f g
a b c d e f g
a b c d e f g
I want to have arrays like this
[[a,a,a,a],[b,b,b,b],[c,c,c,c]...[g,g,g,g]]
(The number of elements is fixed.)
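My current code is
from csv import reader
with open('hoge.txt') as f:
    data = reader(f, delimiter=' ')
But apparently it doesn't work, because the number of spaces between fields varies.
How can I make it behave as if the delimiter were a regular expression, like this?
data = reader(f, delimiter='\s+')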

with open('hoge.txt', 'r') as fin:
    data = [line.split() for line in fin]
This will give output like
[['a', 'b', 'c', 'd', 'e', 'f', 'g'], ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
['a', 'b', 'c', 'd', 'e', 'f', 'g'], ['a', 'b', 'c', 'd', 'e', 'f', 'g']]
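but since your desired output is organised by column, transpose it:
list1 = []
for i in range(len(data[0])):  # one pass per column, not per row
    list1.append([x[i] for x in data])
This will produce
[['a', 'a', 'a', 'a'], ['b', 'b', 'b', 'b'], ['c', 'c', 'c', 'c'], ['d', 'd', 'd', 'd'],
['e', 'e', 'e', 'e'], ['f', 'f', 'f', 'f'], ['g', 'g', 'g', 'g']]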
I hope it solves your issue.
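The row-to-column step can also be expressed with zip. The following is a minimal sketch, not the answerer's code; read_columns is a hypothetical helper name, and it assumes every non-blank line has the same number of fields (as the question states).

```python
def read_columns(lines):
    """Split each line on runs of whitespace, then transpose rows into columns."""
    # str.split() with no argument collapses any run of spaces or tabs
    rows = [line.split() for line in lines if line.strip()]
    # zip(*rows) pairs up the i-th field of every row
    return [list(col) for col in zip(*rows)]


# In-memory lines stand in for the file object here
sample = ["a b  c d e f g\n"] * 4
columns = read_columns(sample)
print(columns[0])    # ['a', 'a', 'a', 'a']
print(len(columns))  # 7
```

With a real file, passing the open file object (for example read_columns(fin)) should behave the same way, since iterating a file yields its lines.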

Are you sure you've got CSV? Your example file is space-delimited, so my first approach would be to use split(). Something like this:
allcols = []
with open("hoge.txt", "r") as f:
    for line in f:
        # split() with no argument treats any run of whitespace as one delimiter
        for i, el in enumerate(line.split()):
            if i == len(allcols):
                allcols.append([])  # first row: create the column list
            allcols[i].append(el)
If you really do have CSV but with extraneous spaces, then I'd still go with per-line processing, but like this:
from csv import reader
data = []
with open("hoge.txt", "r", newline="") as f:
    for row in reader(f):
        # strip the extraneous spaces around each field
        data.append([field.strip() for field in row])
hth
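csv.reader only accepts a single-character delimiter, so the '\s+' behaviour the question asks for has to come from somewhere else. One option is re.split, sketched below under the assumption that fields never contain whitespace themselves; split_on_whitespace is a hypothetical name.

```python
import re


def split_on_whitespace(line):
    """Split one line on runs of spaces or tabs, ignoring leading and trailing blanks."""
    stripped = line.strip()
    # Guard the empty case: re.split on '' would return [''] rather than []
    return re.split(r"\s+", stripped) if stripped else []


rows = [split_on_whitespace(s) for s in ["a  b c\n", " d\te   f \n", "\n"]]
print(rows)  # [['a', 'b', 'c'], ['d', 'e', 'f'], []]
```

For plain whitespace this gives the same fields as line.split(); the regex form is mainly useful if the separator pattern later needs to be something more specific.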

Related

How to efficiently split a list that has a certain periodicity, into multiple lists?

For example the original list:
['k','a','b','c','a','d','e','a','b','e','f','j','a','c','a','b']
We want to split the list into lists started with 'a' and ended with 'a', like the following:
['a','b','c','a']
['a','d','e','a']
['a','b','e','f','j','a']
['a','c','a']
The final output can also be a list of lists. I have tried a double for loop approach with 'a' as the condition, but this is inefficient and not Pythonic.
One possible solution uses re (regex):
import re
l = ['k','a','b','c','a','d','e','a','b','e','f','j','a','c','a','b']
r = [list(f"a{_}a") for _ in re.findall("(?<=a)[^a]+(?=a)", "".join(l))]
print(r)
# [['a', 'b', 'c', 'a'], ['a', 'd', 'e', 'a'], ['a', 'b', 'e', 'f', 'j', 'a'], ['a', 'c', 'a']]
You can do this in one loop:
lst = ['k','a','b','c','a','d','e','a','b','e','f','j','a','c','a','b']
out = [[]]
for i in lst:
    if i == 'a':
        out[-1].append(i)  # close the current sub-list
        out.append([])     # and open the next one
    out[-1].append(i)
# drop the fragment before the first 'a' and the unclosed trailing one
out = out[1:-1]
Also using numpy.split:
import numpy as np
out = [ary.tolist() + ['a'] for ary in np.split(lst, np.where(np.array(lst) == 'a')[0])[1:-1]]
Output:
[['a', 'b', 'c', 'a'], ['a', 'd', 'e', 'a'], ['a', 'b', 'e', 'f', 'j', 'a'], ['a', 'c', 'a']]
Firstly, you can store the indices of 'a' from the list.
oList = ['k','a','b','c','a','d','e','a','b','e','f','j','a','c','a','b']
idx_a = list()
for idx, char in enumerate(oList):
    if char == 'a':
        idx_a.append(idx)
Then, for every pair of consecutive indices, you can take the sub-list between them and store it in a list:
ans = [oList[idx_a[x]:idx_a[x + 1] + 1] for x in range(len(idx_a) - 1)]
You can also get more such lists if you pair non-adjacent indices as well.
You can do this with a single iteration and a simple state machine:
original_list = list('kabcadeabefjacab')
multiple_lists = []
for c in original_list:
    if multiple_lists:
        multiple_lists[-1].append(c)
    if c == 'a':
        multiple_lists.append([c])
if multiple_lists:
    multiple_lists.pop()  # the last list was opened by an 'a' but never closed
print(multiple_lists)
[['a', 'b', 'c', 'a'], ['a', 'd', 'e', 'a'], ['a', 'b', 'e', 'f', 'j', 'a'], ['a', 'c', 'a']]
We can use str.split() to split the list once we str.join() it to a string, and then use an f-string to add back the stripped "a"s. Note that even if the list starts or ends with an "a", the split list will have an empty string representing the substring before or after it, so our unpacking logic that discards the first and last subsequences will still work as intended.
def split(data):
    _, *subseqs, _ = "".join(data).split("a")
    return [list(f"a{seq}a") for seq in subseqs]
Output:
>>> from pprint import pprint
>>> testdata = ['k','a','b','c','a','d','e','a','b','e','f','j','a','c','a','b']
>>> pprint(split(testdata))
[['a', 'b', 'c', 'a'],
 ['a', 'd', 'e', 'a'],
 ['a', 'b', 'e', 'f', 'j', 'a'],
 ['a', 'c', 'a']]
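The approaches that join the list into a string (the regex and str.split versions) rely on every element being a single character. For lists whose items may be longer strings or other objects, slicing between consecutive marker positions avoids that restriction. This is a sketch only; split_between and marker are hypothetical names.

```python
def split_between(items, marker):
    """Return the sub-lists that start and end at consecutive occurrences of marker."""
    positions = [i for i, item in enumerate(items) if item == marker]
    # Pair each marker position with the next one; the end index is inclusive
    return [items[start:end + 1] for start, end in zip(positions, positions[1:])]


data = list("kabcadeabefjacab")
parts = split_between(data, "a")
print(parts)
# [['a', 'b', 'c', 'a'], ['a', 'd', 'e', 'a'], ['a', 'b', 'e', 'f', 'j', 'a'], ['a', 'c', 'a']]

# Works for multi-character items too
print(split_between(["x", "go", "y", "go", "go"], "go"))
# [['go', 'y', 'go'], ['go', 'go']]
```

Because zip stops at the shorter sequence, a list with zero or one marker simply yields an empty result.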

Merge 3 lists defined by 1 variable

I have a text (simplified):
a b c d
e f g
Then I have this code (MapReduce):
import sys
for line in sys.stdin:
    words = line.strip().split()
    print(words)
The variable words gives me:
['a', 'b', 'c', 'd']
[]
['e', 'f', 'g']
How to get:
['a', 'b', 'c', 'd', 'e', 'f', 'g']
Thanks!
Create words as an empty list before the loop and add each line's values to it, as in the code below. Also, print() should be called outside the loop.
import sys
words = []
for line in sys.stdin:
    words += line.strip().split()
print(words)
The words variable will be built up like this:
words = [] + ['a', 'b', 'c', 'd'] + [] + ['e', 'f', 'g']
Output:
['a', 'b', 'c', 'd', 'e', 'f', 'g']
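The same accumulation can be written without an explicit loop using itertools.chain.from_iterable. This is a sketch with a hypothetical helper name, flatten_words; a list of strings stands in for sys.stdin.

```python
from itertools import chain


def flatten_words(lines):
    """Split every line on whitespace and concatenate the pieces into one list."""
    # Blank lines split to [], so they contribute nothing to the result
    return list(chain.from_iterable(line.split() for line in lines))


merged = flatten_words(["a b c d\n", "\n", "e f g\n"])
print(merged)  # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
```

In the MapReduce script, flatten_words(sys.stdin) should work the same way, since iterating sys.stdin yields lines.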

How to remove floats of a list of lists?

For example:
[['D', 'D', '-', '1', '.', '0'],['+', '2', '.', '0', 'D', 'D'],['D', 'D', 'D']]
This is:
D D -1.0
+2.0 D D
D D D
I want to extract the values, put them in different variables, and know the line and column where each value was (so I can put a symbol there that corresponds to the old value).
D D x
y D D
D D D
[['D', 'D', '-1.0'],['+2.0', 'D', 'D'],['D', 'D', 'D']]
Don't create a list of lists. Take the lines directly from your file and split them with the help of regular expressions:
import re
maze = []
for line in arq:
    maze.append(re.findall(r'[-+][0-9.]+|\S', line))
import itertools
merged = list(itertools.chain(*list2d))
print([x for x in merged if not (x.isdigit() or x in '-+.')])
Use re.findall. The pattern [-+]?\d*\.\d+|\d+ is used to extract float values from a string.
import re
list2d = [['D', 'D', '-', '1', '.', '0'],['+', '2', '.', '0', 'D', 'D'],['D', 'D', 'D']]
lists = list()
for l in list2d:
    s = ''.join(l)
    matches = re.findall(r"D|[-+]?\d*\.\d+|\d+", s)
    lists.append(matches)
print(lists)
# Output
[['D', 'D', '-1.0'], ['+2.0', 'D', 'D'], ['D', 'D', 'D']]
I'm not sure if this is what you want; could you add more information to your description?
import csv
csv_file = open("maze.txt")
csv_reader = csv.reader(csv_file)
maze = []
for line in csv_reader:
    for char in line:
        maze.append(char.split())
print(maze)
# Output
[['D', 'D', '-1.0'], ['+2.0', 'D', 'D'], ['D', 'D', 'D']]
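The question also asks for the line and column of each extracted number. A possible sketch follows: it re-tokenises each row and records a (row, column, value) triple for every numeric token. TOKEN and extract_numbers are hypothetical names, and the sketch assumes two numbers never sit directly next to each other in a row, since the joined characters would then be ambiguous.

```python
import re

# A signed decimal, a signed integer, or any other single non-space character
TOKEN = re.compile(r"[-+]?\d*\.\d+|[-+]?\d+|\S")


def extract_numbers(rows):
    """Rebuild tokens from character rows and record where the numbers sit."""
    grid, found = [], []
    for r, chars in enumerate(rows):
        tokens = TOKEN.findall("".join(chars))
        for c, tok in enumerate(tokens):
            try:
                found.append((r, c, float(tok)))
            except ValueError:
                pass  # not a number, e.g. 'D'
        grid.append(tokens)
    return grid, found


list2d = [['D', 'D', '-', '1', '.', '0'], ['+', '2', '.', '0', 'D', 'D'], ['D', 'D', 'D']]
grid, found = extract_numbers(list2d)
print(grid)   # [['D', 'D', '-1.0'], ['+2.0', 'D', 'D'], ['D', 'D', 'D']]
print(found)  # [(0, 2, -1.0), (1, 0, 2.0)]
```

Each triple in found gives the position to overwrite, for example grid[0][2] = 'x', once the value has been stored elsewhere.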

Find the same elements from two lists and print the elements from both lists

There are two lists:
k = ['a', 'a', 'b', 'b', 'c', 'c', 'd', 'e']
l = ['a', 'c', 'e']
I want to find the same elements from these two lists, that is:
['a', 'c', 'e']
then I want to print out the element we found, for example, 'a' from both lists, that is: ['a', 'a', 'a'].
The result I want is as follows:
['a', 'a', 'a', 'c', 'c', 'c', 'e', 'e']
I tried doing it this way:
c = []
for item_k in k:
    for item_j in l:
        if item_k == item_j:
            c.append(item_k)
            c.append(item_j)
However, the result is ['a', 'a', 'c', 'c', 'e', 'e']
I also tried this way:
c = []
for item_k in k:
    if item_k in l:
        c.append(item_k)
        d = l.count(item_k)
        c.append(item_k*d)
print(c)
But it does not work. Can anybody tell me how to do it? I really appreciate your help in advance.
result = [x for x in sorted(k + l) if x in k and x in l]
print(result)
results:
['a', 'a', 'a', 'c', 'c', 'c', 'e', 'e']
Since you want to pick up elements from both lists, the most straightforward way is probably to iterate over both while checking the other one (this can be optimized considerably if speed matters):
merged = []
for el in list1:
    if el in list2:
        merged.append(el)
for el in list2:
    if el in list1:
        merged.append(el)
If the order of the elements is important, you'll have to define an iteration order (in what order do you look at which element from which list?).
If the lists are sorted and you want the result to be sorted:
sorted([x for x in list1 if x in set(list2)] + [x for x in list2 if x in set(list1)] )
You can use set operations to intersect the lists, then loop through both, appending any item found in the intersection to a new list:
k = ['a', 'a', 'b', 'b', 'c', 'c', 'd', 'e']
l = ['a', 'c', 'e']
common_list = list(set(k).intersection(set(l)))
all_results = []
for item in k:
    if item in common_list:
        all_results.append(item)
for item in l:
    if item in common_list:
        all_results.append(item)
print(sorted(all_results))
output:
['a', 'a', 'a', 'c', 'c', 'c', 'e', 'e']
Here's a compact way. Readability might suffer a little, but what fun are comprehensions without a little deciphering?
import itertools
k = ['a', 'a', 'b', 'b', 'c', 'c', 'd', 'e']
l = ['a', 'c', 'e']
combined = [letter for letter in itertools.chain(k,l) if letter in l and letter in k]
Here is an implementation that matches your initial algorithm:
k = ['a', 'a', 'b', 'b', 'c', 'c', 'd', 'e']
l=['a', 'c', 'e']
c=[]
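for x in l:
    count = 0
    for y in k:
        if x == y:
            count += 1
    if count == 0:
        continue  # x is not in k, so it is not a shared element
    while count >= 0:  # count copies from k, plus one for the copy in l
        c.append(x)
        count = count - 1
print(c)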
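The repeated "x in k" checks in the answers above scan a list on every test. A variant that counts each list once with collections.Counter is sketched below; common_with_multiplicity is a hypothetical name, and the sketch assumes the items are hashable and sortable and that sorted output is wanted, as in the question.

```python
from collections import Counter


def common_with_multiplicity(first, second):
    """Return shared items, each repeated as often as it appears in both lists combined."""
    counts = Counter(first) + Counter(second)  # total occurrences across both lists
    shared = set(first) & set(second)          # items present in both
    return [item for item in sorted(shared) for _ in range(counts[item])]


k = ['a', 'a', 'b', 'b', 'c', 'c', 'd', 'e']
l = ['a', 'c', 'e']
print(common_with_multiplicity(k, l))
# ['a', 'a', 'a', 'c', 'c', 'c', 'e', 'e']
```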

Move Beginning of List Up To Index To Back in Python

Let's say I had a list:
[a, b, c, d, e, f]
Given an index, say 3, what is a pythonic way to remove everything before
that index from the front of the list, and then add it to the back.
So if I was given index 3, I would want to reorder the list as
[d, e, f, a, b, c]
>>> l = ['a', 'b', 'c', 'd', 'e', 'f']
>>>
>>> l[3:] + l[:3]
['d', 'e', 'f', 'a', 'b', 'c']
>>>
or bring it into a function:
>>> def swap_at_index(l, i):
...     return l[i:] + l[:i]
...
>>> the_list = ['a', 'b', 'c', 'd', 'e', 'f']
>>> swap_at_index(the_list, 3)
['d', 'e', 'f', 'a', 'b', 'c']
Use the slice operation, e.g.,
myList = ['a', 'b','c', 'd', 'e', 'f']
myList[3:] + myList[:3]
gives
['d', 'e', 'f', 'a', 'b', 'c']
def foo(myList, x):
    return myList[x:] + myList[:x]
Should do the trick.
Call it like this:
>>> aList = ['a', 'b' ,'c', 'd', 'e', 'f']
>>> print(foo(aList, 3))
['d', 'e', 'f', 'a', 'b', 'c']
EDIT Haha all answers are the same...
The Pythonic way is what sdolan said; I can only add the inline version:
>>> f = lambda l, q: l[q:] + l[:q]
so you can use it like:
>>> f([1,2,3,4,5,6], 3)
[4, 5, 6, 1, 2, 3]
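An alternative to slicing, not mentioned in the answers above, is collections.deque, whose rotate method shifts elements in place. A sketch with a hypothetical wrapper name, rotate_left:

```python
from collections import deque


def rotate_left(items, index):
    """Move the first `index` items to the back, returning a new list."""
    d = deque(items)
    d.rotate(-index)  # a negative argument rotates to the left
    return list(d)


print(rotate_left(['a', 'b', 'c', 'd', 'e', 'f'], 3))
# ['d', 'e', 'f', 'a', 'b', 'c']
```

For a one-off reorder the slice expression is simpler; a deque may be the better fit when the same sequence is rotated repeatedly.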
