I'm trying to read items from a .txt file that has the following:
294.nii.gz [[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]
296.nii.gz [[10, 13, 62], [40, 1, 64], [34, 0, 49], [27, 0, 49]]
312.nii.gz [[0, 27, 57], [25, 25, 63], [0, 42, 38], [0, 11, 21]]
The way I want to extract the data is:
Get the item name: 294.nii.gz
Item's coordinates serially: [9, 46, 54] [36, 48, 44] ...
Get the next item:
N.B. all the items have the same number of 3D coordinates.
So far I can read the data by following codes:
coortxt = os.path.join(coordir, 'coor_downsampled.txt')
with open(coortxt) as f:
    content = f.readlines()
content = [x.strip() for x in content]
for item in content:
    print(item.split(' ')[0])
This only prints the item names:
294.nii.gz
296.nii.gz
312.nii.gz
How do I get the rest of the data in the format I need?
So you have the fun task of converting a string representation of a list to a list.
To do this, you can use the ast library, specifically the ast.literal_eval function.
Disclaimer:
According to documentation:
Warning It is possible to crash the Python interpreter with a sufficiently large/complex string due to stack depth limitations in Python’s AST compiler.
This is NOT the same as using eval. From the docs:
Safely evaluate an expression node or a string containing a Python expression. The string or node provided may only consist of the following Python literal structures: strings, numbers, tuples, lists, dicts, booleans, and None.
This can be used for safely evaluating strings containing Python expressions from untrusted sources without the need to parse the values oneself.
You get the first part of the data with item.split(' ')[0].
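Then, you'll use item.split(' ', 1)[1] to get (for example) a string with contents "[[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]". Note that item.split(' ')[1:] would give you a list of fragments rather than a single string, because the coordinate list itself contains spaces.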
If this is a risk you're willing to accept:
A demonstration using ast:
import ast
list_str = "[[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]"
list_list = ast.literal_eval(list_str)
print(isinstance(list_list, list))
#Outputs True
print(list_list)
#Outputs [[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]
Tying it together with your code:
import os
import ast
coortxt = os.path.join(coordir, 'coor_downsampled.txt')
with open(coortxt) as f:
    content = f.readlines()
content = [x.strip() for x in content]
for item in content:
    # split only at the first space so the coordinate list stays one string
    name, coords_str = item.split(' ', 1)
    coords = ast.literal_eval(coords_str)
    # name, coords now contain your required data
    # use as needed
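A self-contained sketch of the same idea, assuming every line has the form `name [[...], ...]` with a single space before the list. The sample lines are inlined so it runs without the file, and `parse_line` is a hypothetical helper name, not something from the question.

```python
import ast

# Sample lines copied from the question, inlined so the sketch runs without the file.
SAMPLE_LINES = [
    "294.nii.gz [[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]",
    "296.nii.gz [[10, 13, 62], [40, 1, 64], [34, 0, 49], [27, 0, 49]]",
]


def parse_line(line):
    """Split one line into (name, list of [x, y, z] coordinates)."""
    # maxsplit=1 keeps everything after the first space as one string.
    name, coords_str = line.strip().split(" ", 1)
    return name, ast.literal_eval(coords_str)


parsed = [parse_line(line) for line in SAMPLE_LINES]

for name, coords in parsed:
    print(name)
    for coord in coords:  # visit the coordinates one at a time, in order
        print("   ", coord)
```

Each `coord` is an ordinary list of three ints, so it can be unpacked with `x, y, z = coord` if that is more convenient.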
Relevant posts:
https://stackoverflow.com/a/10775909/5763413
How to convert string representation of list to a list?
Others have suggested using Python's dynamic evaluator eval, or ast.literal_eval (which definitely works), but there are ways to do this kind of parsing without either.
Given that the formatting of the coordinate list in the coor_downsampled.txt file is very json-esque, we can parse it using the very cool json module instead.
NOTE:
There are sources claiming that json.loads is about 4x faster than eval and almost 7x faster than ast.literal_eval, so if speed matters to you, I'd recommend the faster option.
Complete example
import os
import json
coortxt = 'coor_downsampled.txt'
with open(coortxt) as f:
    content = f.readlines()
content = [x.strip() for x in content]
for item in content:
    # split the line just like you did in your own example
    split_line = item.split(" ")
    # the "name" is the first element
    name = split_line[0]
    # here's the tricky part.
    coords = json.loads("".join(split_line[1:]))
    print(name)
    print(coords)
Explanation
Let's break down this tricky line coords = json.loads("".join(split_line[1:]))
split_line[1:] will give you everything past the first space, so something like this:
['[[9,', '46,', '54],', '[36,', '48,', '44],', '[24,', '19,', '46],', '[15,', '0,', '22]]']
But by wrapping it with a "".join(), we can turn it into
'[[9,46,54],[36,48,44],[24,19,46],[15,0,22]]' as a string instead.
Once we have it like that, we simply do json.loads() to get the actual list object
[[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]].
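As a side note, JSON allows whitespace between tokens, so the join step can be skipped if the line is split only once. A minimal sketch, assuming the name never contains a space; the `coords_by_name` dict is just one possible way to keep the results and is not part of the original answer.

```python
import json

line = "312.nii.gz [[0, 27, 57], [25, 25, 63], [0, 42, 38], [0, 11, 21]]"

# Split only at the first space; the rest of the line is already valid JSON.
name, coords_text = line.split(" ", 1)
coords = json.loads(coords_text)

# Keying the results by item name makes later lookups straightforward.
coords_by_name = {name: coords}
print(coords_by_name["312.nii.gz"][2])  # [0, 42, 38]
```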
I do realize this has already been addressed here (e.g., Removing duplicates in the lists, Accessing the index in 'for' loops?, Append indices to duplicate strings in Python efficiently, and many more). Nevertheless, I hope this question is different.
Pretty much I need to write a program that checks if a list has any duplicates and if it does, returns the duplicate element along with the indices.
The sample list sample_list
sample = """An article is any member of a class of dedicated words that are used with noun phrases to
mark the identifiability of the referents of the noun phrases. The category of articles constitutes a
part of speech. In English, both "the" and "a" are articles, which combine with a noun to form a noun
phrase."""
sample_list = sample.split()
my_list = [x.lower() for x in sample_list]
len(my_list)
output: 55
The common approach to get a unique collection of items is to use a set, which removes the duplicates.
unique_list = list(set(my_list))
len(unique_list)
output: 38
This is what I have tried but honestly, I don't know what to do next...
from functools import partial
def list_duplicates_of(seq, item):
    start_at = -1
    locs = []
    while True:
        try:
            loc = seq.index(item, start_at + 1)
        except ValueError:
            break
        else:
            locs.append(loc)
            start_at = loc
    return locs
dups_in_source = partial(list_duplicates_of, my_list)
for i in my_list:
    print(i, dups_in_source(i))
This returns all the elements with indices and duplicate indices
an [0]
article [1]
.
.
.
form [51]
a [6, 33, 48, 52]
noun [15, 26, 49, 53]
phrase. [54]
Here I want to return only duplicate elements along with their indices like below
of [5, 8, 21, 24, 30, 35]
a [6, 33, 48, 52]
are [12, 43]
with [14, 47]
.
.
.
noun [15, 26, 49, 53]
You could do something along these lines:
from collections import defaultdict
indices = defaultdict(list)
# record every position at which each word occurs
for i, w in enumerate(my_list):
    indices[w].append(i)
# print only the words that occur more than once
for k, v in indices.items():
    if len(v) > 1:
        print(k, v)
of [5, 8, 21, 24, 30, 35]
a [6, 33, 48, 52]
are [12, 43]
with [14, 47]
noun [15, 26, 49, 53]
to [17, 50]
the [19, 22, 25, 28]
This uses collections.defaultdict and enumerate to collect the indices of each word in a single pass over the list. Filtering out the words that occur only once is then a simple conditional comprehension or a loop with an if statement.
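For the comprehension variant, here is a minimal sketch wrapped in a function. The name `duplicate_indices` and the short sample sentence are made up for illustration.

```python
from collections import defaultdict


def duplicate_indices(items):
    """Map each item that occurs more than once to the list of its indices."""
    positions = defaultdict(list)
    for index, item in enumerate(items):
        positions[item].append(index)
    # Keep only the entries seen at more than one position.
    return {item: idxs for item, idxs in positions.items() if len(idxs) > 1}


words = "the cat and the dog and the bird".split()
dupes = duplicate_indices(words)
print(dupes)  # {'the': [0, 3, 6], 'and': [2, 5]}
```

Returning a plain dict rather than printing lets the caller decide what to do with the result, for example `duplicate_indices(my_list)` for the list in the question.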
Here is the code:
a = [0, 11, 22, 33, 44, 55]
a[1:4][1] = 666
print(a)
The output is [0, 11, 22, 33, 44, 55]
So list a is not updated, then what is the effect of that assignment?
[UPDATE]
Thanks @Amadan for the explanation, it makes sense. But I am still puzzled: the following slice assignment updates the list directly:
a[1:4] = [111, 222, 333]
Intuitively I expected a[1:4][1] to still operate on the list, but it does not.
Is my intuition wrong?
a[1:4] creates a new list, whose elements are [11, 22, 33]. Then you replace its element at index 1 with 666, which results in a list [11, 666, 33]. Then, because this list is not referred to by any variable, it is forgotten and garbage collected.
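On the follow-up: a slice on the left of `=` is a slice assignment, which Python performs on the original list in a single call, while `a[1:4][1] = 666` first evaluates `a[1:4]` into a temporary list and then assigns into that. A minimal sketch of the difference (the names `tmp`, `b` and `c` are just for illustration):

```python
a = [0, 11, 22, 33, 44, 55]

# a[1:4][1] = 666 is two steps: build a temporary copy, then modify the copy.
tmp = a[1:4]        # same as a.__getitem__(slice(1, 4)): a new list
tmp[1] = 666        # changes tmp only
print(tmp)          # [11, 666, 33]
print(a)            # [0, 11, 22, 33, 44, 55]

# b[1:4] = [...] is one step: b.__setitem__(slice(1, 4), [...]) on b itself.
b = [0, 11, 22, 33, 44, 55]
b[1:4] = [111, 222, 333]
print(b)            # [0, 111, 222, 333, 44, 55]

# To change one element in place, index the original list directly.
c = [0, 11, 22, 33, 44, 55]
c[1 + 1] = 666      # element 1 of the slice that starts at index 1
print(c)            # [0, 11, 666, 33, 44, 55]
```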
Note that the result is very different if you have a numpy array instead of the list, since slicing of a numpy array creates a view, not a new array, if at all possible:
import numpy as np
a = np.array([0, 11, 22, 33, 44])
a[1:4][1] = 666
a
# => array([  0,  11, 666,  33,  44])
Here, a[1:4] is not an independent [11, 22, 33], but a view into the original array, so changing an element of a[1:4] actually changes a.
Just another solution to think of, in case you don't know the position of 22 (and want to replace it with 666) or don't care about removing other items from the list.
a = [0, 11, 22, 33, 44, 55]
# use enumerate to track the index of each item; when the item is 22,
# replace the element at that index with 666.
for count, item in enumerate(a):
    if item == 22:
        a[count] = 666
print(a)
Output:
>>>[0, 11, 666, 33, 44, 55]
Hope that helps, cheers!
I know sum(list) works to add ALL the elements in a list, but it doesn't allow you to select a range.
ex:
l = [11, 22, 33, 44, 55, 66, 77]
x = 4
In this case I want to add l[0 : 4] together.
I know I can do:
short_l = l[0 : x]
sum(short_l)
But is there a function that allows me to select the range of elements within a list to add together?
If you don't want to create a sublist, you can use itertools.islice:
>>> import itertools
>>> l = [11, 22, 33, 44, 55, 66, 77]
>>> sum(itertools.islice(l, 0, 4))
110
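If this comes up often, the call can be wrapped in a small helper. A minimal sketch; `sum_slice` is a hypothetical name, and it keeps the usual half-open convention (start included, stop excluded). Note that islice does not accept negative indices, unlike normal slicing.

```python
from itertools import islice


def sum_slice(values, start, stop):
    """Sum values[start:stop] without building the intermediate sublist."""
    return sum(islice(values, start, stop))


l = [11, 22, 33, 44, 55, 66, 77]
x = 4
print(sum_slice(l, 0, x))   # 11 + 22 + 33 + 44 = 110
print(sum_slice(l, 2, 5))   # 33 + 44 + 55 = 132
```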
You can use the builtin slice function to get the range of items, like this
l, x = [11, 22, 33, 44, 55, 66, 77], 4
print(sum(l[slice(0, 4)]))
# 110
The parameters to slice are the same as the slicing syntax.
Why do you need a new function anyway? Just do sum(l[0:x]). If you really want a function, you can define one yourself:
def sum_range(lst, end, start=0):
    return sum(lst[start : end + 1])
which adds from index start to end, including end. start defaults to index 0 if not specified.
There is a matter that I can't resolve in Python.
I'm trying to get lists by reading file (like .xml or .txt).
I've put my lists in a big list in my file, like this:
[[48,49,39,7,13,1,11],[46,27,19,15,24,8,4],[35,5,41,10,31,5,9],[12,9,22,2,36,9,2],[50,47,25,6,42,3,1]]
Now I'm looking for code to get this big list as a list, not as a string. Indeed, I've already tried some code with the open(), write() and read() functions, but Python returned:
'[[48,49,39,7,13,1,11],[46,27,19,15,24,8,4],[35,5,41,10,31,5,9],[12,9,22,2,36,9,2],[50,47,25,6,42,3,1]]'
And it isn't a list, just a string. So I can't use list's functions to modify it.
Thanks to those who will answer my question.
well, a simple way is to parse it as a json string:
>>> import json
>>> l_str = '[[48,49,39,7,13,1,11],[46,27,19,15,24,8,4],[35,5,41,10,31,5,9],[12,9,22,2,36,9,2],[50,47,25,6,42,3,1]]'
>>> l = json.loads(l_str)
>>> print(l)
[[48, 49, 39, 7, 13, 1, 11], [46, 27, 19, 15, 24, 8, 4], [35, 5, 41, 10, 31, 5, 9], [12, 9, 22, 2, 36, 9, 2], [50, 47, 25, 6, 42, 3, 1]]
if you want to load a file that only contains that string, you can simply do it using the following:
>>> import json
>>> with open('myfile') as f:
...     l = json.load(f)
...
>>> print(l)
[[48, 49, 39, 7, 13, 1, 11], [46, 27, 19, 15, 24, 8, 4], [35, 5, 41, 10, 31, 5, 9], [12, 9, 22, 2, 36, 9, 2], [50, 47, 25, 6, 42, 3, 1]]
But if what you want is to serialize python objects, then you should instead use pickle that's more powerful at that task…
Of course, there are other ways that others may give you to parse your string through an eval()-like function, but I strongly advise against that, as it is dangerous and leads to insecure code. Edit: after reading @kamikai's answer, I discovered ast.literal_eval(), which looks like a decent option as well, though json.loads() is more efficient.
If your example is truly representative of your data (i.e., your text file contains only a list of lists of integers), you can parse it as JSON:
import json
data = read_the_contents_of_the_file()
decoded = json.loads(data)
Replace data = read_the_contents_of_the_file() with your existing code for reading the contents as string.
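A self-contained sketch of the whole round trip, assuming the file holds nothing but the list text. A temporary file stands in for your real file so the example runs anywhere, and the sample is shortened to two inner lists.

```python
import json
import os
import tempfile

text = "[[48,49,39,7,13,1,11],[46,27,19,15,24,8,4]]"

# Write the sample text to a temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(text)
    path = tmp.name

try:
    with open(path) as f:
        data = f.read()          # still a plain string here
    decoded = json.loads(data)   # now a list of lists of ints
finally:
    os.remove(path)

decoded[0].append(99)            # list methods work on the result
print(type(data).__name__, type(decoded).__name__)  # str list
print(decoded[0])                # [48, 49, 39, 7, 13, 1, 11, 99]
```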
As seen here, the inbuilt ast module is probably your best bet, assuming the text is still valid python.
import ast
ast.literal_eval("[[1,2,3], [4,5,6], [7,8,9]]") # Returns nested lists
Use json to load and parse the file:
import json
with open(my_file_path, "rb") as f:
    # pass the open file object, not the path, to json.load
    my_list = json.load(f)
i have a string like this
sample="[2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50]"
how do i convert that to list? I am expecting the output to be list, like this
output=[2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50]
I am aware of split() function but in this case if i use
sample.split(',')
it will take in the [ and ] symbols. Is there any easy way to do it?
EDIT Sorry for the duplicate post..I didn't see this post until now
Converting a string that represents a list, into an actual list object
If you're going to be dealing with Python-esque types (such as tuples for instance), you can use ast.literal_eval:
from ast import literal_eval
sample="[2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50]"
sample_list = literal_eval(sample)
print(type(sample_list), type(sample_list[0]), sample_list)
# <class 'list'> <class 'int'> [2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50]
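To illustrate the "Python-esque types" point, here is a small sketch comparing the two parsers on a string that is valid Python but not valid JSON. The sample string and the flag names `json_ok` and `call_ok` are made up for the demonstration.

```python
import json
from ast import literal_eval

python_style = "[(1, 2), (3, 4), 'five', None]"

# literal_eval understands Python literals such as tuples, single-quoted
# strings and None.
print(literal_eval(python_style))   # [(1, 2), (3, 4), 'five', None]

# json.loads only accepts JSON, so the same text is rejected.
try:
    json.loads(python_style)
    json_ok = True
except json.JSONDecodeError:
    json_ok = False
print(json_ok)                      # False

# literal_eval refuses anything that is not a literal, e.g. a function call.
try:
    literal_eval("__import__('os').getcwd()")
    call_ok = True
except ValueError:
    call_ok = False
print(call_ok)                      # False
```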
you can use standard string methods with python:
output = sample.lstrip('[').rstrip(']').split(', ')
if you use .split(',') instead of .split(', ') you will get the spaces along with the values!
you can convert all values to int using:
output = list(map(int, output))
or load your string as json:
import json
output = json.loads(sample)
as a happy coincidence, json lists have the same notation as python lists! :-)