Writing and reading floats and strings in a CSV file - python

I am a bit new to python and programming. In my code, I have developed a feature (which is a 1-D array of 39 elements) for each audio file. I want to write the name of the file, the feature and its target value {0,1} into a CSV file to train my SVM classifier. I used the CSV writer as follows.
with open('train.csv', 'a') as csvfile:
    albumwriter = csv.writer(csvfile, delimiter=' ')
    albumwriter.writerow(['1.03 I Want To Hold Your Hand'] + Final_feature + [0])
I want to write the details of around 180 audio files to this CSV file and feed it to the SVM classifier. The code that I use to read the file is:
with open('train.csv', 'rb') as csvfile:
    albumreader = csv.reader(csvfile, delimiter=' ')
    data = list()
    for row in albumreader:
        data.append(row[0:])
data = np.array(data)
I can access the name of the file in the first row as data[0][1] and the feature as data[0][2], but both of them are of <type 'numpy.string_'>. I want to convert the feature into a list of floats. The main problem seems to be the ',' that separates the elements in the list. I tried using .astype(np.float), but in vain.
Can anyone suggest a good method to convert the strings from the CSV file back to floats? Your help is very much appreciated, as I have very little time to complete this project. Thanks in advance.
Edit: As per the comment, this is what my train.csv looks like:
"1.01 I saw her standing there" "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38]" 0
"1.02 I saw her" "[2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40]" 0
"1.03 I want to hold your hand" "[3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41]" 1

I don't understand exactly what you want to achieve, but assuming Final_feature is a Python list of floats, and going by your code snippets for writing the CSV file, you get the list back as a string (in data[0][2]) which probably looks like this:
feature = '[3.14, 2.12, 4.5]' # 3 elements only for clarity
To convert this string to floats, you can use:
list(map(float, feature[1:-1].split(',')))
For reference, map applies its first argument to every element of its second argument, thus turning every string into a float. In Python 2 map already returns a list of floats; in Python 3 it returns an iterator, which is why the call is wrapped in list() here.
Another solution would be to write each element of your Final_feature in a separate column.
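A minimal sketch of that second approach, assuming Final_feature is a plain list of floats and Python 3 is used. The helper names write_row and read_rows are made up for illustration and are not part of the original code:

```python
import csv

def write_row(path, name, feature, target):
    # Hypothetical helper: one column for the name, one column per
    # feature value, and a final column for the target.
    with open(path, 'a', newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=' ')
        writer.writerow([name] + list(feature) + [target])

def read_rows(path):
    # Hypothetical helper: reverse of write_row, converting the
    # feature columns back to floats and the target back to int.
    names, features, targets = [], [], []
    with open(path, newline='') as csvfile:
        for row in csv.reader(csvfile, delimiter=' '):
            names.append(row[0])
            features.append([float(x) for x in row[1:-1]])
            targets.append(int(row[-1]))
    return names, features, targets
```

Because every value sits in its own column, no bracket or comma stripping is needed when reading the file back.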

To convert a string like "[1.0, 2.0, 3.0]" to the list [1.0, 2.0, 3.0]:
# string to convert
s = '[1.0, 2.0, 3.0]'
lst = [float(x) for x in s[1:-1].split(',')]
# and the result will be
[1.0, 2.0, 3.0]
This works with both the standard Python string type and the numpy.string_ type.

From what I can see, the variable Final_feature is a list of floats? In that case, based
on how you wrote the file, the following will import the data:
with open('train.csv', 'rb') as csvfile:
    albumreader = csv.reader(csvfile, delimiter=' ')
    audio_file_names = []
    final_features = []
    target_values = []
    for row in albumreader:
        audio_file_names.append(row[0])
        final_features.append([float(s) for s in row[1:-1]])
        target_values.append([int(s) for s in row[-1]])
There are two list comprehensions to convert the data into floats and integers.

Related

Python: Finding strings in a list containing an interval and replacing this interval by every number in it

I'm wondering if there's any beautiful/clean way of doing what I'm trying to do :).
(I'm sure there is)
My function receives a list of strings, each of which can be in one of 2 formats:
"12,13,14,15" or "12 to 15"
The goal is to parse the second type and replace the "to" by the numbers in the interval.
The delimiters between numbers don't matter; a regex will do the job afterwards.
Here is pseudocode and an ugly implementation.
The idea is to replace "to" in the list by the numbers in the interval so that I can easily parse the numbers with a regex afterwards.
# The list is really inconsistent, separators may change and it's hand filled so some comments like in the last example might be present
l = ["12,13,14,15",
     "12 to 18",
     "10,21,22 to 42",
     "14,48,52",
     "12,14,22;45 and also 24 to 32"
     ]
def process_list(l):
    for x in l:
        if "to" in x:
            # Find the 2 numbers around the to and replace the "to" by ",".join(list([interval of number]))
            final_list = numero_regex.findall(num)
    return final_list
I think you don't need regex:
def process_list(l):
    final_list = []
    for s in l:
        l2 = []
        for n in s.split(','):
            params = n.split(' to ')
            nums = list(range(int(params[0]), int(params[-1])+1))
            l2.extend(nums)
        final_list.append(l2)
    return final_list
Output:
>>> process_list(l)
[[12, 13, 14, 15],
[12, 13, 14, 15, 16, 17, 18],
[10, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42],
[14, 48, 52]]
Update:
I wanted an output for this case like this ["12,21,32;14, and the 12,13,14,15,[...],40"]. Which I can really easily parse with a regex
If you just want to replace 'number1 to number2', you can do:
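import re
# Assumed pattern: the answer did not show how pat was defined
pat = re.compile(r'(?P<start>\d+)\s*to\s*(?P<stop>\d+)')
def process_list(l):
    def to_range(m):
        return ','.join([str(i) for i in range(int(m.group('start')),
                                               int(m.group('stop'))+1)])
    return [re.sub(pat, to_range, s) for s in l]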
Output:
# l = ["12,21,18 to 20;32;14, and the 12 to 16"]
>>> process_list(l)
['12,21,18,19,20;32;14, and the 12,13,14,15,16']
Here is one solution:
from itertools import chain
def split(s):
    return list(chain(*(list(range(*list(map(int, x.split(' to ')))))+[int(x.split(' to ')[1])]
                        if ' to ' in x else
                        [int(x)]
                        for x in s.split(',')
                        )))
[split(e) for e in l]
output:
[[12, 13, 14, 15],
[12, 13, 14, 15, 16, 17, 18],
[10, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42],
[14, 48, 52]]
edit: I adapted the above solution to be used with regexes:
import re
from itertools import chain
def split(s):
    regex = re.compile(r'(\d+\s*to\s*\d+|\d+)')
    return list(chain(*([int(x)] if x.isnumeric() else
                        list(range(*map(int, re.split(r'\s+to\s+', x))))
                        +[int(re.split(r'\s+to\s+', x)[-1])]
                        for x in regex.findall(s)
                        )))

filter input string to array python

I am trying to make an array out of the following string:
'25\r\n35\r\n5\r\n15\r\n25\r\n35\r\n5\r\n15\r\n25\r\n35\r\n5\r\n15\r\n25\r\n35\r\n5\r\n15\r\n25\r\n35\r\n5\r\n15\r\n25\r\n35\r\n5\r\n15\r\n25\r\n35\r\n5\r\n15\r\n25\r\n35\r\n5\r\n15\r\n25\r\n'
Where only the numbers need to be added.
I have tried the following:
MyString.decode().strip('\r\n')
But that only removed the '\r\n'.
Question: Is there a way to filter on only the numbers and put it in an array?
EDIT:
array = [int(x) for x in data.split('\r\n')]
This seems to work, but not in my case.
I am working with Bluetooth, so I am trying to read the output stream.
Here is my code:
def bluetooth_connect(self):
    bd_addr = "98:D3:31:FB:14:C8" # MAC-address of our bluetooth-module
    port = 1
    sock = bluetooth.BluetoothSocket(bluetooth.RFCOMM)
    sock.connect((bd_addr, port))
    data = ""
    while 1:
        try:
            data += sock.recv(1024)
            data_end = data.find('\n')
            array = []
            if data_end != -1:
                self.move_all_servos(data)
                data = data[data_end + 1:]
                array = [int(x) for x in data.split('\r\n')]
                for i in range(0, len(array)):
                    print(i)
        except KeyboardInterrupt:
            break
    sock.close()
At first I get the correct array, but after a while it crashes with this error:
array = [int(x) for x in data.split('\r\n')]
ValueError: invalid literal for int() with base 10: ''
The problem with [int(x) for x in data.split('\r\n')] is that the result will contain an empty string '' at the end, after the final \r\n. You could use a filter condition to remove it...
>>> [int(x) for x in data.split('\r\n') if x]
[25, 35, 5, 15, 25, 35, 5, 15, 25, 35, 5, 15, 25, 35, 5, 15, 25, 35, 5, 15, 25, 35, 5, 15, 25, 35, 5, 15, 25, 35, 5, 15, 25]
... or just use data.split() without parameter:
>>> [int(x) for x in data.split()]
[25, 35, 5, 15, 25, 35, 5, 15, 25, 35, 5, 15, 25, 35, 5, 15, 25, 35, 5, 15, 25, 35, 5, 15, 25, 35, 5, 15, 25, 35, 5, 15, 25]
From the documentation (emphasis mine):
S.split(sep=None, maxsplit=-1) -> list of strings
Return a list of the words in S, using sep as the
delimiter string. If maxsplit is given, at most maxsplit
splits are done. If sep is not specified or is None, any
whitespace string is a separator and empty strings are
removed from the result.
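To make the difference concrete, here is a small sketch on a shortened version of the input string (the variable names are only for illustration):

```python
data = '25\r\n35\r\n5\r\n'

# Splitting on the explicit separator keeps the empty string
# that follows the final '\r\n'.
parts_explicit = data.split('\r\n')   # ['25', '35', '5', '']

# Splitting on any whitespace drops empty strings,
# so int() never receives ''.
parts_default = data.split()          # ['25', '35', '5']

numbers = [int(x) for x in parts_default]
```

Calling int() on the last element of parts_explicit is what raises the ValueError from the question.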
Use split.
s = '25\r\n35\r\n5\r\n15\r\n25\r\n35\r\n5\r\n15\r\n25\r\n35\r\n5\r\n15\r\n25\r\n35\r\n5\r\n15\r\n25\r\n35\r\n5\r\n15\r\n25\r\n35\r\n5\r\n15\r\n25\r\n35\r\n5\r\n15\r\n25\r\n35\r\n5\r\n15\r\n25\r\n'
s.split('\r\n')

How can I convert a tuple of tuples (taken raw from SQL) into an array, Series, or DataFrame?

What I have is like this:
(('3177000000000053', '8018000000000498', datetime.datetime(2016, 9, 29, 21, 36, 42)),
('3177000000000035', '8018000000000498', datetime.datetime(2016, 9, 29, 21, 37, 6 )))
This is exactly how it looks in the MySQL database.
What I want is like this:
[[1,2,3],
[4,5,6]]
It's okay to have a Series, DataFrame, array, list, etc. I just want it to be manageable for further analysis.
I've tried several ways to deal with this, such as dataframe(), list(), and even pandas.replace (which gives me a tuple-can't-be-replaced error).
I'm new to Python, thanks for your answers! :)
In case you have a tuple of tuples, you may try the following list comprehension:
import datetime
input_tuple = (('3177000000000053', '8018000000000498', datetime.datetime(2016, 9, 29, 21, 36, 42)),
               ('3177000000000035', '8018000000000498', datetime.datetime(2016, 9, 29, 21, 37, 6)))
output_list = [[i for i in j] for j in input_tuple]
print(output_list)
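As a rough sketch of two related conversions using only the standard library (the names rows, as_lists and columns are chosen here for illustration): list(row) is an equivalent shorthand for the inner comprehension, and zip(*rows) gives the data column by column instead of row by row.

```python
import datetime

rows = (('3177000000000053', '8018000000000498', datetime.datetime(2016, 9, 29, 21, 36, 42)),
        ('3177000000000035', '8018000000000498', datetime.datetime(2016, 9, 29, 21, 37, 6)))

# Each inner tuple becomes a mutable list (one list per database row).
as_lists = [list(row) for row in rows]

# zip(*rows) transposes the data, giving one tuple per column.
columns = list(zip(*rows))
```

If pandas is installed, a list of lists like as_lists can be passed to pandas.DataFrame to get a table for further analysis.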

Python string to list conversion [duplicate]

I have a string like this:
sample="[2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50]"
How do I convert that to a list? I am expecting the output to be a list, like this:
output=[2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50]
I am aware of the split() function, but in this case if I use
sample.split(',')
it will take in the [ and ] symbols. Is there any easy way to do it?
EDIT: Sorry for the duplicate post. I didn't see this post until now:
Converting a string that represents a list, into an actual list object
If you're going to be dealing with Python-esque types (such as tuples for instance), you can use ast.literal_eval:
from ast import literal_eval
sample="[2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50]"
sample_list = literal_eval(sample)
print(type(sample_list), type(sample_list[0]), sample_list)
# <class 'list'> <class 'int'> [2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50]
You can use standard string methods in Python:
output = sample.lstrip('[').rstrip(']').split(', ')
If you use .split(',') instead of .split(', ') you will get the spaces along with the values!
You can convert all values to int using:
output = list(map(int, output))
Or load your string as JSON:
import json
output = json.loads(sample)
As a happy coincidence, JSON lists have the same notation as Python lists! :-)
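For comparison, a short sketch that runs the three approaches from the answers above on the same input; the variable names are illustrative only:

```python
import json
from ast import literal_eval

sample = "[2, 6, 10, 14, 18, 22, 26, 30, 34, 38, 42, 46, 50]"

# Parses any Python literal (lists, tuples, dicts, numbers, strings).
via_literal_eval = literal_eval(sample)

# Works because this particular list is also valid JSON.
via_json = json.loads(sample)

# Plain string handling; int() tolerates the leading spaces.
via_split = [int(x) for x in sample.strip('[]').split(',')]
```

All three give the same list of integers for this input. They differ on other inputs: JSON does not accept tuples or single-quoted strings, for example, while literal_eval does.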

Python "Value Error: cannot delete array elements" -- Why am I getting this?

I haven't been able to find anything about this value error online and I am at a complete loss as to why my code is eliciting this response.
I have a large dictionary of around 50 keys. The value associated with each key is a 2D array of many elements of the form [datetime object, some other info]. A sample would look like this:
{'some_random_key': array([[datetime(2010, 10, 26, 11, 5, 28, 157404), 14.1],
[datetime(2010, 10, 26, 11, 5, 38, 613066), 17.2]],
dtype=object),
'some_other_key': array([[datetime(2010, 10, 26, 11, 5, 28, 157404), 'true'],
[datetime(2010, 10, 26, 11, 5, 38, 613066), 'false']],
dtype=object)}
What I want my code to do is to allow a user to select a start and stop date and remove all of the array elements (for all of the keys) that are not within that range.
Placing print statements throughout the code I was able to deduce that it can find the dates that are out of range, but for some reason, the error occurs when it attempts to remove the element from the array.
Here is my code:
def selectDateRange(dictionary, start, stop):
    #Make a clone dictionary to delete values from
    theClone = dict(dictionary)
    starting = datetime.strptime(start, '%d-%m-%Y') #put in datetime format
    ending = datetime.strptime(stop+' '+ '23:59', '%d-%m-%Y %H:%M') #put in datetime format
    #Get a list of all the keys in the dictionary
    listOfKeys = theClone.keys()
    #Go through each key in the list
    for key in listOfKeys:
        print key
        #The value associate with each key is an array
        innerAry = theClone[key]
        #Loop through the array and . . .
        for j, value in enumerate(reversed(innerAry)):
            if (value[0] <= starting) or (value[0] >= ending):
                #. . . delete anything that is not in the specified dateRange
                del innerAry[j]
    return theClone
This is the error message that I get:
ValueError: cannot delete array elements
and it occurs at the line: del innerAry[j]
Please help - perhaps you have the eye to see the problem where I cannot.
Thanks!
If you use numpy arrays, then use them as arrays and not as lists.
numpy does comparison elementwise for the entire array, which can then be used to select the relevant subarray. This also removes the need for the inner loop.
>>> a = np.array([[datetime(2010, 10, 26, 11, 5, 28, 157404), 14.1],
...               [datetime(2010, 10, 26, 11, 5, 30, 613066), 17.2],
...               [datetime(2010, 10, 26, 11, 5, 31, 613066), 17.2],
...               [datetime(2010, 10, 26, 11, 5, 32, 613066), 17.2],
...               [datetime(2010, 10, 26, 11, 5, 33, 613066), 17.2],
...               [datetime(2010, 10, 26, 11, 5, 38, 613066), 17.2]],
...              dtype=object)
>>> start = datetime(2010, 10, 26, 11, 5, 28, 157405)
>>> end = datetime(2010, 10, 26, 11, 5, 33, 613066)
>>> (a[:,0] > start)&(a[:,0] < end)
array([False, True, True, True, False, False], dtype=bool)
>>> a[(a[:,0] > start)&(a[:,0] < end)]
array([[2010-10-26 11:05:30.613066, 17.2],
[2010-10-26 11:05:31.613066, 17.2],
[2010-10-26 11:05:32.613066, 17.2]], dtype=object)
Just to make sure we still have datetimes in there:
>>> b = a[(a[:,0] > start)&(a[:,0] < end)]
>>> b[0,0]
datetime.datetime(2010, 10, 26, 11, 5, 30, 613066)
NumPy arrays are fixed in size. Use lists instead.
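A minimal sketch of the list-based route, assuming each dictionary value is a plain list of [datetime, value] rows rather than a NumPy array. The function name select_date_range and the sample data are hypothetical:

```python
from datetime import datetime

def select_date_range(records_by_key, start, stop):
    # Build new lists instead of deleting in place, so nothing is
    # mutated while it is being iterated over.
    return {
        key: [row for row in rows if start < row[0] < stop]
        for key, rows in records_by_key.items()
    }

# Made-up sample data in the shape described in the question.
records = {
    'some_random_key': [[datetime(2010, 10, 26, 11, 5, 28), 14.1],
                        [datetime(2010, 10, 27, 11, 5, 38), 17.2]],
    'some_other_key': [[datetime(2010, 10, 26, 11, 5, 28), 'true'],
                       [datetime(2010, 10, 28, 11, 5, 38), 'false']],
}

selected = select_date_range(records,
                             datetime(2010, 10, 26),
                             datetime(2010, 10, 27, 23, 59))
```

The original dictionary is left untouched; rows on or outside the start and stop bounds are dropped from the returned copy, matching the condition in the question's code.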
