I'm wondering if there's a clean way of doing what I'm trying to do :).
(I'm sure there is)
My function receives a list of strings, each of which can contain numbers in two formats:
"12,13,14,15" or "12 to 15"
The goal is to parse the second type and replace the "to" by the numbers in the interval, so that I can easily parse the numbers with a regex afterwards.
The delimiters between numbers don't matter; a regex will do the job afterwards.
Here is pseudocode and an ugly implementation:
# The list is really inconsistent: separators may change, and it's filled in by hand,
# so comments like the one in the last example might be present
l = ["12,13,14,15",
     "12 to 18",
     "10,21,22 to 42",
     "14,48,52",
     "12,14,22;45 and also 24 to 32"
     ]
def process_list(l):
    for x in l:
        if "to" in x:
            # Find the 2 numbers around the "to" and replace the "to" by ",".join(list([interval of number]))
    final_list = numero_regex.findall(num)
    return final_list
I think you don't need regex:
def process_list(l):
    final_list = []
    for s in l:
        l2 = []
        for n in s.split(','):
            params = n.split(' to ')
            nums = list(range(int(params[0]), int(params[-1])+1))
            l2.extend(nums)
        final_list.append(l2)
    return final_list
Output:
>>> process_list(l)
[[12, 13, 14, 15],
[12, 13, 14, 15, 16, 17, 18],
[10, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42],
[14, 48, 52]]
Update:
I wanted an output for this case like this ["12,21,32;14, and the 12,13,14,15,[...],40"]. Which I can really easily parse with a regex
If you just want to replace 'number1 to number2', you can do:
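import re
# Assumed pattern (its definition is not shown here): two integers around "to",
# captured as the named groups 'start' and 'stop' that to_range() reads
pat = re.compile(r'(?P<start>\d+)\s+to\s+(?P<stop>\d+)')
def process_list(l):
    def to_range(m):
        return ','.join([str(i) for i in range(int(m.group('start')),
                                               int(m.group('stop'))+1)])
    return [re.sub(pat, to_range, s) for s in l]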
Output:
# l = ["12,21,18 to 20;32;14, and the 12 to 16"]
>>> process_list(l)
['12,21,18,19,20;32;14, and the 12,13,14,15,16']
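Putting the two steps from the question together, here is a minimal sketch: expand each range in place, then pull every integer out with a second regex. The names RANGE_PAT, NUMBER_PAT, expand_ranges and extract_numbers are made up for illustration, and the patterns assume non-negative integers with start <= stop.

```python
import re

# Assumed pattern: two non-negative integers separated by the word "to".
RANGE_PAT = re.compile(r'(?P<start>\d+)\s+to\s+(?P<stop>\d+)')
NUMBER_PAT = re.compile(r'\d+')


def expand_ranges(s):
    """Replace every 'a to b' in s with 'a,a+1,...,b'; leave the rest untouched."""
    def to_range(m):
        start, stop = int(m.group('start')), int(m.group('stop'))
        return ','.join(str(i) for i in range(start, stop + 1))
    return RANGE_PAT.sub(to_range, s)


def extract_numbers(s):
    """Expand ranges, then collect every integer regardless of the separator."""
    return [int(n) for n in NUMBER_PAT.findall(expand_ranges(s))]


print(extract_numbers("12,14,22;45 and also 24 to 27"))
# [12, 14, 22, 45, 24, 25, 26, 27]
```

Because the second regex only looks for digits, stray separators and hand-written comments in the input are simply skipped.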
Here is one solution:
from itertools import chain
def split(s):
    return list(chain(*(list(range(*list(map(int, x.split(' to ')))))+[int(x.split(' to ')[1])]
                        if ' to ' in x else
                        [int(x)]
                        for x in s.split(',')
                        )))
[split(e) for e in l]
output:
[[12, 13, 14, 15],
[12, 13, 14, 15, 16, 17, 18],
[10, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42],
[14, 48, 52]]
edit: I adapted the above solution to be used with regexes:
import re
from itertools import chain
def split(s):
    regex = re.compile(r'(\d+\s*to\s*\d+|\d+)')
    # split on the same optional-whitespace "to" that the pattern above matches
    return list(chain(*([int(x)] if x.isnumeric() else
                        list(range(*map(int, re.split(r'\s*to\s*', x))))
                        + [int(re.split(r'\s*to\s*', x)[-1])]
                        for x in regex.findall(s)
                        )))
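The dense one-liner can be hard to follow, so here is the same idea written as an explicit loop, offered only as a sketch. One pattern matches either a lone number or a "start to stop" pair, and each match is expanded as it is found. TOKEN_PAT and split_numbers are hypothetical names, and the sample list is shortened from the question.

```python
import re

# A number, optionally followed by "to <number>".
TOKEN_PAT = re.compile(r'(\d+)(?:\s*to\s*(\d+))?')


def split_numbers(s):
    """Return all integers in s, expanding 'a to b' into the inclusive range."""
    result = []
    for m in TOKEN_PAT.finditer(s):
        start = int(m.group(1))
        # group(2) is None when the match is a lone number
        stop = int(m.group(2)) if m.group(2) is not None else start
        result.extend(range(start, stop + 1))
    return result


samples = ["12,13,14,15", "12 to 18", "10,21,22 to 25", "12,14,22;45 and also 24 to 26"]
for s in samples:
    print(split_numbers(s))
```

Unlike splitting on ',', this version does not care which separator sits between the numbers.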
I'm trying to read items from a .txt file that has the following:
294.nii.gz [[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]
296.nii.gz [[10, 13, 62], [40, 1, 64], [34, 0, 49], [27, 0, 49]]
312.nii.gz [[0, 27, 57], [25, 25, 63], [0, 42, 38], [0, 11, 21]]
The way I want to extract the data is:
Get the item name: 294.nii.gz
Item's coordinates serially: [9, 46, 54] [36, 48, 44] ...
Get the next item:
N.B. all the items have the same number of 3D coordinates.
So far I can read the data with the following code:
import os
coortxt = os.path.join(coordir, 'coor_downsampled.txt')
with open(coortxt) as f:
    content = f.readlines()
content = [x.strip() for x in content]
for item in content:
    print(item.split(' ')[0])
This only prints the item names:
294.nii.gz
296.nii.gz
312.nii.gz
How do I get the rest of the data in the format I need?
So you have the fun task of converting a string representation of a list to a list.
To do this, you can use the ast library, specifically the ast.literal_eval function.
Disclaimer:
According to documentation:
Warning It is possible to crash the Python interpreter with a sufficiently large/complex string due to stack depth limitations in Python’s AST compiler.
This is NOT the same as using eval. From the docs:
Safely evaluate an expression node or a string containing a Python expression. The string or node provided may only consist of the following Python literal structures: strings, numbers, tuples, lists, dicts, booleans, and None.
This can be used for safely evaluating strings containing Python expressions from untrusted sources without the need to parse the values oneself.
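You get the first part of the data with item.split(' ')[0].
Then you'll use item.split(' ', 1)[1] to get the rest of the line as a single string with contents like "[[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]". (A plain item.split(' ')[1:] returns a list of fragments, which ast.literal_eval cannot parse.)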
If this is a risk you're willing to accept:
A demonstration using ast:
import ast
list_str = "[[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]"
list_list = ast.literal_eval(list_str)
print(isinstance(list_list, list))
#Outputs True
print(list_list)
#Outputs [[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]
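To make the difference from eval concrete, here is a small sketch showing that ast.literal_eval refuses anything that is not a plain literal instead of executing it. The helper name is_literal is made up for this example.

```python
import ast


def is_literal(text):
    """Return True if text parses as a pure Python literal, False otherwise."""
    try:
        ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return False
    return True


# A nested list of numbers is a literal and is accepted.
print(is_literal("[[9, 46, 54], [36, 48, 44]]"))

# A function call is not a literal: it is rejected, not run.
print(is_literal("__import__('os').getcwd()"))

# Malformed input is rejected as well.
print(is_literal("[1, 2"))
```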
Tying it together with your code:
import os
import ast
coortxt = os.path.join(coordir, 'coor_downsampled.txt')
with open(coortxt) as f:
    content = f.readlines()
content = [x.strip() for x in content]
for item in content:
    # split only on the first space so the coordinate list stays one string
    name, coords_str = item.split(' ', 1)
    coords = ast.literal_eval(coords_str)
    # name, coords now contain your required data
    # use as needed
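For the "name first, then its coordinates one by one" output asked for in the question, one possible sketch is to collect everything into a dict. The function name read_coordinates is hypothetical, and an in-memory io.StringIO stands in for the real file so the example runs on its own.

```python
import ast
import io

# Stand-in for coor_downsampled.txt so the sketch is self-contained.
sample = io.StringIO(
    "294.nii.gz [[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]\n"
    "296.nii.gz [[10, 13, 62], [40, 1, 64], [34, 0, 49], [27, 0, 49]]\n"
)


def read_coordinates(f):
    """Map each item name to its list of [x, y, z] coordinates."""
    items = {}
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # maxsplit=1 keeps the bracketed part in one piece
        name, coords_str = line.split(' ', 1)
        items[name] = ast.literal_eval(coords_str)
    return items


for name, coords in read_coordinates(sample).items():
    print(name)
    for point in coords:
        print(point)
```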
Relevant posts:
https://stackoverflow.com/a/10775909/5763413
How to convert string representation of list to a list?
Others have suggested using the dynamic evaluator eval in Python (and even ast.literal_eval, which definitely works), but there are still ways to perform this kind of parsing without either.
Given that the formatting of the coordinate list in the coor_downsampled.txt file is very json-esque, we can parse it using the very cool json module instead.
NOTE:
There are sources claiming that json.loads is 4x faster than eval and almost 7x faster than ast.literal_eval, so if you need the speed, I'd recommend the faster option.
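Rather than taking those numbers on faith, you can measure on your own machine. The following is a rough timeit sketch, not a rigorous benchmark; the sample string and the call count n are arbitrary choices, and the ratio will vary with Python version and input size.

```python
import ast
import json
import timeit

list_str = "[[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]"

# Both parsers must agree on the result before timing them.
assert json.loads(list_str) == ast.literal_eval(list_str)

n = 10_000  # arbitrary number of repetitions
json_time = timeit.timeit(lambda: json.loads(list_str), number=n)
ast_time = timeit.timeit(lambda: ast.literal_eval(list_str), number=n)

print(f"json.loads:       {json_time:.4f} s for {n} calls")
print(f"ast.literal_eval: {ast_time:.4f} s for {n} calls")
```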
Complete example
import os
import json
coortxt = 'coor_downsampled.txt'
with open(coortxt) as f:
    content = f.readlines()
content = [x.strip() for x in content]
for item in content:
    # split the line just like you did in your own example
    split_line = item.split(" ")
    # the "name" is the first element
    name = split_line[0]
    # here's the tricky part.
    coords = json.loads("".join(split_line[1:]))
    print(name)
    print(coords)
Explanation
Let's break down this tricky line coords = json.loads("".join(split_line[1:]))
split_line[1:] will give you everything past the first space, so something like this:
['[[9,', '46,', '54],', '[36,', '48,', '44],', '[24,', '19,', '46],', '[15,', '0,', '22]]']
But by wrapping it with a "".join(), we can turn it into
'[[9,46,54],[36,48,44],[24,19,46],[15,0,22]]' as a string instead.
Once we have it like that, we simply do json.loads() to get the actual list object
[[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]].
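As a side note, JSON ignores whitespace between tokens, so the join step can be avoided by splitting the line only once. A minimal sketch of that variant, using one line from the question as sample input:

```python
import json

line = "294.nii.gz [[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]"

# Split once: everything after the first space is the coordinate list.
name, coords_str = line.split(" ", 1)
coords = json.loads(coords_str)

print(name)    # 294.nii.gz
print(coords)  # [[9, 46, 54], [36, 48, 44], [24, 19, 46], [15, 0, 22]]
```

Both variants give the same list; this one just leaves the string untouched before parsing.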
What I have is like this:
(('3177000000000053', '8018000000000498', datetime.datetime(2016, 9, 29, 21, 36, 42)),
('3177000000000035', '8018000000000498', datetime.datetime(2016, 9, 29, 21, 37, 6 )))
It is exactly the way it looks in mysql database.
What I want is like this:
[[1,2,3],
[4,5,6]]
It's okay to have a Series, DataFrame, array, list, etc. I just want it to be manageable for further analysis.
I've tried several ways to deal with this, such as DataFrame(), list(), and even the pandas replace (which gives me an error saying a tuple can't be replaced).
I'm new to Python, thanks for your answers! :)
If you have a tuple of tuples, you can try the following list comprehension:
import datetime
input_tuple = (('3177000000000053', '8018000000000498', datetime.datetime(2016, 9, 29, 21, 36, 42)),
               ('3177000000000035', '8018000000000498', datetime.datetime(2016, 9, 29, 21, 37, 6)))
output_list = [[i for i in j] for j in input_tuple]
print(output_list)
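A slightly shorter spelling of the same conversion, plus a column-wise view, might look like the sketch below. The variable names col_a, col_b and timestamps are invented here, since the question does not name the database columns.

```python
import datetime

rows = (
    ('3177000000000053', '8018000000000498', datetime.datetime(2016, 9, 29, 21, 36, 42)),
    ('3177000000000035', '8018000000000498', datetime.datetime(2016, 9, 29, 21, 37, 6)),
)

# Row-wise: one inner list per database row.
as_lists = [list(row) for row in rows]

# Column-wise: zip(*rows) transposes rows into columns.
col_a, col_b, timestamps = (list(col) for col in zip(*rows))

print(as_lists[0][0])             # 3177000000000053
print(timestamps[1].isoformat())  # 2016-09-29T21:37:06
```

If pandas is installed, passing as_lists to pandas.DataFrame should also work, since it accepts a list of lists.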
I am a bit new to python and programming. In my code, I have developed a feature (which is a 1-D array of 39 elements) for each audio file. I want to write the name of the file, the feature and its target value {0,1} into a CSV file to train my SVM classifier. I used the CSV writer as follows.
with open('train.csv', 'a') as csvfile:
    albumwriter = csv.writer(csvfile, delimiter=' ')
    albumwriter.writerow(['1.03 I Want To Hold Your Hand'] + Final_feature + [0] )
I want to write the details of around 180 audio files to this CSV file and feed it to the SVM classifier. The code that I use to read the file is:
with open('train.csv', 'rb') as csvfile:
    albumreader = csv.reader(csvfile, delimiter=' ')
    data = list()
    for row in albumreader:
        data.append(row[0:])
data = np.array(data)
I can access the name of the file in the first row as data[0][1] and the feature as data[0][2], but both of them are of <type 'numpy.string_'>. I want to convert the feature into a list of floats. The main problem seems to be the ',' that separates the elements in the list. I tried using .astype(np.float), but to no avail.
Can anyone suggest a good method to convert the strings from the CSV file back to floats? Your help is very much appreciated, as I have very little time to complete this project. Thanks in advance.
Edit: As per the comment, this is what my train.csv looks like:
"1.01 I saw her standing there" "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38]" 0
"1.02 I saw her" "[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40]" 0
"1.03 I want to hold your hand" "[3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41]" 1
I don't get exactly what you want to achieve, but assuming Final_feature is a Python list of floats, and going by your code snippets for writing the CSV file, you get the list back as a string (in data[0][2]) which probably looks like this:
feature = '[3.14, 2.12, 4.5]' # 3 elements only for clarity
You asked how to convert this string to floats; you can use:
list(map(float, feature[1:-1].split(',')))
For reference, map applies its first argument to every element of its second argument, thus transforming every string into a float. On Python 2 map already returns a list of floats; on Python 3 it returns a lazy iterator, which is why the call is wrapped in list().
Another solution would be to write each element of your Final_feature in a separate column.
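A sketch of that separate-column layout, assuming the feature is a plain list of floats. An io.StringIO buffer stands in for train.csv so the example runs by itself, and final_feature is shortened to three values.

```python
import csv
import io

final_feature = [3.14, 2.12, 4.5]  # 3 elements only, for brevity

# Write: name, then one column per feature value, then the target.
buf = io.StringIO()  # stands in for train.csv
writer = csv.writer(buf, delimiter=' ')
writer.writerow(['1.03 I Want To Hold Your Hand'] + final_feature + [0])

# Read back: every field is a string, so convert explicitly.
buf.seek(0)
for row in csv.reader(buf, delimiter=' '):
    name = row[0]
    features = [float(x) for x in row[1:-1]]
    target = int(row[-1])
    print(name, features, target)
```

With this layout there are no brackets or commas to strip when reading the file back.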
To convert a string like "[1.0, 2.0, 3.0]" to the list [1.0, 2.0, 3.0]:
# string to convert
s = '[1.0, 2.0, 3.0]'
lst = [float(x) for x in s[1: -1].split(',')]
# and result will be
[1.0, 2.0, 3.0]
This works both with standard python string type and with numpy.string type.
From what I can see, the variable Final_feature is a list of floats? In that case, based on how you wrote the file, the following will import the data:
import csv
# Python 2 style; on Python 3 use open('train.csv', newline='') instead
with open('train.csv', 'rb') as csvfile:
    albumreader = csv.reader(csvfile, delimiter=' ')
    audio_file_names = []
    final_features = []
    target_values = []
    for row in albumreader:
        audio_file_names.append(row[0])
        final_features.append([float(s) for s in row[1:-1]])
        target_values.append(int(row[-1]))
The list comprehension converts the feature columns into floats, and int() converts the target value into an integer.
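For the layout shown in the question's edit, where the whole feature list sits in one quoted field, a possible sketch is to let csv handle the quoting and then parse the bracketed text. It assumes every row has exactly three fields; the sample rows are shortened versions of the excerpt.

```python
import csv
import io
import json

# Two shortened rows in the same layout as the train.csv excerpt.
sample = io.StringIO(
    '"1.01 I saw her standing there" "[0, 1, 2, 3]" 0\n'
    '"1.03 I want to hold your hand" "[3, 4, 5, 6]" 1\n'
)

names, features, targets = [], [], []
for name, feature_str, target in csv.reader(sample, delimiter=' '):
    names.append(name)
    # The feature column is the text "[0, 1, 2, 3]"; parse it, then cast.
    features.append([float(x) for x in json.loads(feature_str)])
    targets.append(int(target))

print(features)  # [[0.0, 1.0, 2.0, 3.0], [3.0, 4.0, 5.0, 6.0]]
print(targets)   # [0, 1]
```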
There is a problem that I can't solve in Python.
I'm trying to get lists by reading a file (such as .xml or .txt).
I've put my lists inside one big list in my file, like this:
[[48,49,39,7,13,1,11],[46,27,19,15,24,8,4],[35,5,41,10,31,5,9],[12,9,22,2,36,9,2],[50,47,25,6,42,3,1]]
Now I'm looking for code to read this big list as a list, not as a string. Indeed, I've already tried some code with the open(), write() and read() functions, but Python returned:
'[[48,49,39,7,13,1,11],[46,27,19,15,24,8,4],[35,5,41,10,31,5,9],[12,9,22,2,36,9,2],[50,47,25,6,42,3,1]]'
And it isn't a list, just a string, so I can't use list methods to modify it.
Thanks to those who will answer my question.
Well, a simple way is to parse it as a JSON string:
>>> import json
>>> l_str = '[[48,49,39,7,13,1,11],[46,27,19,15,24,8,4],[35,5,41,10,31,5,9],[12,9,22,2,36,9,2],[50,47,25,6,42,3,1]]'
>>> l = json.loads(l_str)
>>> print(l)
[[48, 49, 39, 7, 13, 1, 11], [46, 27, 19, 15, 24, 8, 4], [35, 5, 41, 10, 31, 5, 9], [12, 9, 22, 2, 36, 9, 2], [50, 47, 25, 6, 42, 3, 1]]
If you want to load a file that only contains that string, you can simply do it using the following:
>>> import json
>>> with open('myfile') as f:
...     l = json.load(f)
...
>>> print(l)
[[48, 49, 39, 7, 13, 1, 11], [46, 27, 19, 15, 24, 8, 4], [35, 5, 41, 10, 31, 5, 9], [12, 9, 22, 2, 36, 9, 2], [50, 47, 25, 6, 42, 3, 1]]
But if what you want is to serialize Python objects, then you should instead use pickle, which is more powerful at that task…
Of course, there are other ways that others may give you to parse your string through an eval()-like function, but I strongly advise you against that, as it is dangerous and leads to insecure code. Edit: after reading @kamikai's answer, I discovered ast.literal_eval(), which looks like a decent option as well, though json.loads() is more efficient.
If your example is truly representative of your data (i.e., your text file contains only a list of lists of integers), you can parse it as JSON:
import json
data = read_the_contents_of_the_file()
decoded = json.loads(data)
Replace data = read_the_contents_of_the_file() with your existing code for reading the contents as string.
As seen here, the inbuilt ast module is probably your best bet, assuming the text is still valid python.
import ast
ast.literal_eval("[[1,2,3], [4,5,6], [7,8,9]]") # Returns nested lists
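One practical difference between the two suggestions: JSON has no tuple syntax, so text that is a valid Python literal is not always valid JSON. A small sketch of that case (the helper parses_as_json is a made-up name):

```python
import ast
import json


def parses_as_json(text):
    """Return True if json.loads accepts text, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


python_style = "[(1, 2), (3, 4)]"  # tuples are Python literals, not JSON

print(ast.literal_eval(python_style))       # [(1, 2), (3, 4)]
print(parses_as_json(python_style))         # False
print(parses_as_json("[[1, 2], [3, 4]]"))   # True
```

For a file that holds only nested lists of integers, as in the question, either parser should do.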
Use json to load and parse the file:
import json
with open(my_file_path, "r") as f:
    # json.load takes the open file object, not the path
    my_list = json.load(f)
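If you also control how the file is written, a full round trip with json might look like the sketch below. The file name lists.json and the temporary directory are placeholders chosen for this example, and the list is shortened from the question.

```python
import json
import os
import tempfile

big_list = [[48, 49, 39, 7, 13, 1, 11], [46, 27, 19, 15, 24, 8, 4]]

# Hypothetical file location inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "lists.json")

    # json.dump writes text, so open the file in text mode.
    with open(path, "w") as f:
        json.dump(big_list, f)

    # json.load takes the open file object, not the path.
    with open(path) as f:
        loaded = json.load(f)

print(loaded == big_list)  # True
print(loaded[0][1])        # 49
```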