There is a problem I can't resolve in Python.
I'm trying to get lists by reading a file (such as .xml or .txt).
I've put my lists inside one big list in the file, like this:
[[48,49,39,7,13,1,11],[46,27,19,15,24,8,4],[35,5,41,10,31,5,9],[12,9,22,2,36,9,2],[50,47,25,6,42,3,1]]
Now I'm looking for code to read this big list as a list, not as a string. I've already tried some code using the open(), write() and read() functions, but Python returned:
'[[48,49,39,7,13,1,11],[46,27,19,15,24,8,4],[35,5,41,10,31,5,9],[12,9,22,2,36,9,2],[50,47,25,6,42,3,1]]'
And it isn't a list, just a string, so I can't use list methods to modify it.
Thanks to anyone who can answer my problem.
Well, a simple way is to parse it as a JSON string:
>>> import json
>>> l_str = '[[48,49,39,7,13,1,11],[46,27,19,15,24,8,4],[35,5,41,10,31,5,9],[12,9,22,2,36,9,2],[50,47,25,6,42,3,1]]'
>>> l = json.loads(l_str)
>>> print(l)
[[48, 49, 39, 7, 13, 1, 11], [46, 27, 19, 15, 24, 8, 4], [35, 5, 41, 10, 31, 5, 9], [12, 9, 22, 2, 36, 9, 2], [50, 47, 25, 6, 42, 3, 1]]
If you want to load a file that contains only that string, you can do it like this:
>>> import json
>>> with open('myfile') as f:
...     l = json.load(f)
...
>>> print(l)
[[48, 49, 39, 7, 13, 1, 11], [46, 27, 19, 15, 24, 8, 4], [35, 5, 41, 10, 31, 5, 9], [12, 9, 22, 2, 36, 9, 2], [50, 47, 25, 6, 42, 3, 1]]
But if what you want is to serialize Python objects, then you should use pickle instead, which is more powerful for that task…
Of course, others may suggest parsing your string with an eval()-like function, but I strongly advise against that, as it is dangerous and leads to insecure code. Edit: after reading @kamikai's answer, I've discovered ast.literal_eval(), which looks like a decent option as well, though json.loads() is more efficient.
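To make the JSON suggestion concrete, here is a minimal round-trip sketch. The file name lists.json and the temporary directory are assumptions made up for the demo; the point is only that writing with json.dump (rather than writing str(my_list)) lets json.load hand back real lists.

```python
import json
import os
import tempfile

nested = [[48, 49, 39], [46, 27, 19]]

# Hypothetical file name inside a temporary directory, used only for this demo.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "lists.json")

    # Write the list as JSON text.
    with open(path, "w") as f:
        json.dump(nested, f)

    # Read it back; json.load returns nested Python lists, not a string.
    with open(path) as f:
        loaded = json.load(f)

print(type(loaded).__name__)  # list
loaded[0].append(7)           # list methods work on the result
print(loaded)                 # [[48, 49, 39, 7], [46, 27, 19]]
```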
If your example is truly representative of your data (i.e., your text file contains only a list of lists of integers), you can parse it as JSON:
import json
data = read_the_contents_of_the_file()
decoded = json.loads(data)
Replace data = read_the_contents_of_the_file() with your existing code for reading the contents as a string.
As seen here, the built-in ast module is probably your best bet, assuming the text is still valid Python.
import ast
ast.literal_eval("[[1,2,3], [4,5,6], [7,8,9]]") # Returns nested lists
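As a rough illustration of why ast.literal_eval is considered safer than eval, the sketch below wraps it in a small helper (parse_literal is a hypothetical name, not part of ast). Plain literals are parsed, while anything containing a function call is rejected with an exception instead of being executed.

```python
import ast

def parse_literal(text):
    """Hypothetical helper: return the parsed literal, or None if rejected.

    Note that the literal "None" itself also yields None here.
    """
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return None

ok = parse_literal("[[1,2,3], [4,5,6], [7,8,9]]")
rejected = parse_literal("__import__('os').getcwd()")  # a call, not a literal

print(ok)        # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(rejected)  # None
```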
Use json to load and parse the file:
import json
with open(my_file_path, "rb") as f:
    my_list = json.load(f)
I'm trying to read items from a .txt file that has the following:
294.nii.gz [[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]
296.nii.gz [[10, 13, 62], [40, 1, 64], [34, 0, 49], [27, 0, 49]]
312.nii.gz [[0, 27, 57], [25, 25, 63], [0, 42, 38], [0, 11, 21]]
The way I want to extract the data is:
Get the item name: 294.nii.gz
Item's coordinates serially: [9, 46, 54] [36, 48, 44] ...
Get the next item:
N.B. all the items have the same number of 3D coordinates.
So far I can read the data with the following code:
import os
coortxt = os.path.join(coordir, 'coor_downsampled.txt')
with open(coortxt) as f:
    content = f.readlines()
content = [x.strip() for x in content]
for item in content:
    print(item.split(' ')[0])
This only prints the item names:
294.nii.gz
296.nii.gz
312.nii.gz
How do I get the rest of the data in the format I need?
So you have the fun task of converting a string representation of a list to a list.
To do this, you can use the ast library, specifically the ast.literal_eval function.
Disclaimer:
According to documentation:
Warning It is possible to crash the Python interpreter with a sufficiently large/complex string due to stack depth limitations in Python’s AST compiler.
This is NOT the same as using eval. From the docs:
Safely evaluate an expression node or a string containing a Python expression. The string or node provided may only consist of the following Python literal structures: strings, numbers, tuples, lists, dicts, booleans, and None.
This can be used for safely evaluating strings containing Python expressions from untrusted sources without the need to parse the values oneself.
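You get the first part of the data with item.split(' ')[0].
Then you can use item.split(' ', 1)[1] to get (for example) a string with contents "[[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]". Note that item.split(' ')[1:] would give a list of fragments instead, because the coordinate text itself contains spaces.
If the stack-depth risk from the warning above is one you're willing to accept, here is a demonstration using ast: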
import ast
list_str = "[[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]"
list_list = ast.literal_eval(list_str)
print(isinstance(list_list, list))
#Outputs True
print(list_list)
#Outputs [[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]
Tying it together with your code:
import os
import ast
coortxt = os.path.join(coordir, 'coor_downsampled.txt')
with open(coortxt) as f:
    content = f.readlines()
content = [x.strip() for x in content]
for item in content:
    # split only on the first space; the coordinate text contains spaces too
    name, coords_str = item.split(' ', 1)
    coords = ast.literal_eval(coords_str)
    # name and coords now contain your required data
    # use as needed
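For a version that can be run without the original file, here is a small sketch of the same idea on in-memory lines. The helper name parse_line and the sample_lines list are assumptions for the demo; the lines themselves are copied from the question.

```python
import ast

def parse_line(line):
    """Split 'name [[...], ...]' into (name, list of coordinates)."""
    # maxsplit=1 keeps the whole coordinate text together as one string.
    name, coords_text = line.strip().split(" ", 1)
    return name, ast.literal_eval(coords_text)

sample_lines = [
    "294.nii.gz [[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]",
    "296.nii.gz [[10, 13, 62], [40, 1, 64], [34, 0, 49], [27, 0, 49]]",
]

parsed = dict(parse_line(line) for line in sample_lines)
for name, coords in parsed.items():
    print(name)
    for point in coords:
        print(point)  # one [x, y, z] list per line
```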
Relevant posts:
https://stackoverflow.com/a/10775909/5763413
How to convert string representation of list to a list?
Others have suggested using the dynamic evaluator eval in Python (and even ast.literal_eval, which definitely works), but there are still ways to do this kind of parsing without it.
Given that the formatting of the coordinate list in the coor_downsampled.txt file is very JSON-esque, we can parse it using the very cool json module instead.
NOTE:
There are sources claiming that json.loads is 4x faster than eval and almost 7x faster than ast.literal_eval, so if you need the speed, I'd recommend using the faster option.
Complete example
import os
import json
coortxt = 'coor_downsampled.txt'
with open(coortxt) as f:
    content = f.readlines()
content = [x.strip() for x in content]
for item in content:
    # split the line just like you did in your own example
    split_line = item.split(" ")
    # the "name" is the first element
    name = split_line[0]
    # here's the tricky part.
    coords = json.loads("".join(split_line[1:]))
    print(name)
    print(coords)
Explanation
Let's break down this tricky line coords = json.loads("".join(split_line[1:]))
split_line[1:] will give you everything past the first space, so something like this:
['[[9,', '46,', '54],', '[36,', '48,', '44],', '[24,', '19,', '46],', '[15,', '0,', '22]]']
But by wrapping it with a "".join(), we can turn it into
'[[9,46,54],[36,48,44],[24,19,46],[15,0,22]]' as a string instead.
Once we have it like that, we simply do json.loads() to get the actual list object
[[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]].
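As a possible simplification (a sketch, not the only way): JSON allows whitespace between elements, so the split-then-join step can be skipped by cutting the line once at the first space with str.partition. The sample line is taken from the question.

```python
import json

line = "312.nii.gz [[0, 27, 57], [25, 25, 63], [0, 42, 38], [0, 11, 21]]"

# partition cuts at the first space only: (before, separator, after)
name, _, coords_text = line.partition(" ")

# json.loads accepts the spaces inside the list text as-is
coords = json.loads(coords_text)

print(name)    # 312.nii.gz
print(coords)  # [[0, 27, 57], [25, 25, 63], [0, 42, 38], [0, 11, 21]]
```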
Here is the code:
a = [0, 11, 22, 33, 44, 55]
a[1:4][1] = 666
print(a)
The output is [0, 11, 22, 33, 44, 55]
So list a is not updated, then what is the effect of that assignment?
[UPDATE]
Thanks @Amadan for the explanation, it makes sense. But I am still puzzled, because the following slice assignment updates the list directly:
a[1:4] = [111, 222, 333]
Intuitively I expected a[1:4][1] to still operate on the list, but it does not.
Is my intuition wrong?
a[1:4] creates a new list whose elements are [11, 22, 33]. You then replace the element at index 1 of that new list with 666, giving [11, 666, 33]. Because no variable refers to this list, it is discarded and garbage collected, and a itself is never touched.
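The difference between reading a slice and assigning to one can be seen in a short sketch like the following (the names tmp and snapshot are only for the demo). Reading a[1:4] builds a separate list, while a[1:4] = ... on the left-hand side of an assignment modifies a in place.

```python
a = [0, 11, 22, 33, 44, 55]

# Reading a slice builds a new, independent list.
tmp = a[1:4]
tmp[1] = 666
snapshot = list(a)   # copy of a taken after modifying tmp
print(tmp)           # [11, 666, 33]
print(snapshot)      # [0, 11, 22, 33, 44, 55]  (a is unchanged)

# A slice on the left-hand side is an assignment into a itself.
a[1:4] = [111, 222, 333]
print(a)             # [0, 111, 222, 333, 44, 55]

# To change one element, index the original list directly:
# a[1:4][1] refers to the same position as a[2].
a[2] = 666
print(a)             # [0, 111, 666, 333, 44, 55]
```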
Note that the result is very different if you have a numpy array instead of the list, since slicing of a numpy array creates a view, not a new array, if at all possible:
import numpy as np
a = np.array([0, 11, 22, 33, 44])
a[1:4][1] = 666
a
# => array([ 0, 11, 666, 33, 44])
Here, a[1:4] is not an independent [11, 22, 33], but a view into the original array, so changing a[1:4] actually changes a.
Just another solution to think of, in case you don't know the position of 22 (and want to replace it with 666) or don't care about removing other items from the list.
a = [0, 11, 22, 33, 44, 55]
# make use of enumerate to keep track of when the item is 22 and replace that
# with the help of indexing count i.e the position at which 22 is and replace it
# with 666.
for count, item in enumerate(a):
    if item == 22:
        a[count] = 666
print(a)
Output:
[0, 11, 666, 33, 44, 55]
Hope that helps, cheers!
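Two other common ways to replace by value, sketched under the assumption that the list may contain 22 more than once (the sample list here has an extra 22 added for that reason): a list comprehension builds a new list with every match replaced, while list.index only finds the first match.

```python
a = [0, 11, 22, 33, 22, 55]

# New list with every 22 replaced; a itself is left alone.
replaced = [666 if item == 22 else item for item in a]
print(replaced)  # [0, 11, 666, 33, 666, 55]

# In-place replacement of the first 22 only.
try:
    a[a.index(22)] = 666
except ValueError:
    pass  # 22 was not in the list
print(a)         # [0, 11, 666, 33, 22, 55]
```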
I am attempting to take an RDD containing pairs of integer ranges, and transform it so that each pair has a third term which iterates through the possible values in the range. Basically, I've got this:
[[1,10], [11,20], [21,30]]
And I'd like to end up with this:
[[1,1,10], [2,1,10], [3,1,10], [4,1,10], [5,1,10]...]
The file I'd like to transform is very large, which is why I'm looking to do this with PySpark rather than just Python on a local machine (I've got a way to do it locally on a CSV file, but the process takes several hours given the file's size). So far, I've got this:
a = [[1,10], [11,20], [21,30]]
b = sc.parallelize(a)
c = b.map(lambda x: [range(x[0], x[1]+1), x[0], x[1]])
c.collect()
Which yields:
>>> c.collect()
[[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 1, 10], [[11, 12, 13, 14, 15, 16, 17, 18, 19, 20], 11, 20], [[21, 22, 23, 24, 25, 26, 27, 28, 29, 30], 21, 30]]
I can't figure out what the next step needs to be from here, to iterate over the expanded range, and pair each of those with the range delimiters.
Any ideas?
EDIT 5/8/2017 3:00PM
The local Python technique that works on a CSV input is:
import csv
import gzip
csvfile_expanded = gzip.open(r'C:\output.csv', 'wb')
ranges_expanded = csv.writer(csvfile_expanded, delimiter=',', quotechar='"')
csvfile = open(r'C:\input.csv', 'rb')
ranges = csv.reader(csvfile, delimiter=',', quotechar='"')
for row in ranges:
    for i in range(int(row[0]), int(row[1]) + 1):
        ranges_expanded.writerow([i, row[0], row[1]])
The PySpark script I'm questioning begins with the CSV file already having been loaded into HDFS and cast as an RDD.
Try this:
c = b.flatMap(lambda x: ([y, x[0], x[1]] for y in xrange(x[0], x[1]+1)))
The flatMap() ensures that you get one output record per element of the range. Note also the outer ( ) in conjunction with the xrange -- this is a generator expression that avoids materialising the entire range in memory of the executor.
Note: xrange() exists only in Python 2. If you are running Python 3, use range() instead.
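To see what the flatMap produces without a Spark cluster, here is a pure-Python stand-in. It is not PySpark code: itertools.chain.from_iterable plays the role of the flattening step, and the helper name expand and the small pairs list are assumptions for the demo. A function like expand could presumably be passed to b.flatMap in place of the lambda.

```python
from itertools import chain

def expand(pair):
    """Yield one [value, low, high] row per integer in the inclusive range."""
    low, high = pair
    return ([value, low, high] for value in range(low, high + 1))

pairs = [[1, 3], [11, 12]]

# Local equivalent of "map each pair to many rows, then flatten".
rows = list(chain.from_iterable(expand(pair) for pair in pairs))
print(rows)
# [[1, 1, 3], [2, 1, 3], [3, 1, 3], [11, 11, 12], [12, 11, 12]]
```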
I have created a text file in one program, which outputted the numbers 1 to 25 in a pseudo-random order, for example like so:
[21, 19, 14, 22, 18, 23, 25, 10, 6, 9, 1, 13, 2, 7, 5, 12, 8, 20, 24, 15, 17, 4, 11, 3, 16]
Now I have another python file which is supposed to read the file I created earlier and use a sorting algorithm to sort the numbers.
The problem is that I can't figure out how to read the list I created earlier from the file as a list.
Is there actually a way to do this? Or would I be better off rewriting my output program somehow, so that I can cast the input into a list?
If your file looks like:
21
19
14
22
18
23
...
use this:
with open('file') as f:
    mylist = [int(i.strip()) for i in f]
If it really looks like a list like [21, 19, 14, 22...], here is a simple way:
with open('file') as f:
    mylist = list(map(int, f.read().lstrip('[').rstrip(']\n').split(', ')))
And if your file doesn't strictly conform to that format, for example it looks like [ 21,19, 14 , 22...], here is another way that uses a regex:
import re
with open('file') as f:
    mylist = list(map(int, re.findall(r'\d+', f.read())))
If you don't want to change the output of your current script, you may use ast.literal_eval()
import ast
with open("output.txt", "r") as f:
    array = ast.literal_eval(f.read())
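Putting the ast.literal_eval approach end to end, the sketch below writes a shortened version of the list to a temporary file, reads it back, and sorts it. The file name and temporary directory are assumptions for the demo, and sorted() stands in for whatever sorting algorithm you plan to write yourself.

```python
import ast
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical path; in practice this is the file your first program wrote.
    path = os.path.join(tmp_dir, "output.txt")
    with open(path, "w") as f:
        f.write("[21, 19, 14, 22, 18]\n")

    with open(path) as f:
        # strip() drops the trailing newline before parsing
        numbers = ast.literal_eval(f.read().strip())

print(numbers)          # [21, 19, 14, 22, 18]
print(sorted(numbers))  # [14, 18, 19, 21, 22]
```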
I have a string like this:
sample="[2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50]"
How do I convert that to a list? I am expecting the output to be a list, like this:
output=[2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50]
I am aware of the split() function, but in this case if I use
sample.split(',')
it will include the [ and ] symbols. Is there an easy way to do it?
EDIT: Sorry for the duplicate post. I didn't see this post until now:
Converting a string that represents a list, into an actual list object
If you're going to be dealing with Python-esque types (such as tuples for instance), you can use ast.literal_eval:
from ast import literal_eval
sample="[2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50]"
sample_list = literal_eval(sample)
print type(sample_list), type(sample_list[0]), sample_list
# <type 'list'> <type 'int'> [2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50]
You can use standard string methods in Python:
output = sample.lstrip('[').rstrip(']').split(', ')
If you use .split(',') instead of .split(', '), you will get the spaces along with the values!
You can convert all the values to int using:
output = list(map(int, output))
Or load your string as JSON:
import json
output = json.loads(sample)
As a happy coincidence, JSON lists of numbers have the same notation as Python lists! :-)
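The coincidence only goes so far, which is worth knowing when choosing between the two parsers. The sketch below (variable names are made up for the demo) shows a Python-style string that ast.literal_eval accepts but json.loads rejects, and a JSON-style string where it is the other way round.

```python
import ast
import json

python_style = "[(1, 2), 'three', None]"   # tuple, single quotes, None
json_style = '[[1, 2], "three", null]'     # JSON spellings of similar data

from_literal = ast.literal_eval(python_style)
from_json = json.loads(json_style)

# json.loads rejects Python-only syntax (JSONDecodeError is a ValueError).
try:
    json.loads(python_style)
    json_accepts_python = True
except ValueError:
    json_accepts_python = False

# literal_eval rejects JSON-only names such as null.
try:
    ast.literal_eval(json_style)
    literal_accepts_json = True
except (ValueError, SyntaxError):
    literal_accepts_json = False

print(from_literal)          # [(1, 2), 'three', None]
print(from_json)             # [[1, 2], 'three', None]
print(json_accepts_python)   # False
print(literal_accepts_json)  # False
```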