Reading arrays from .txt file as numbers instead of strings - python

I'm using automatic data acquisition software that exports the data as .txt files. I then imported a file into Python (using the pandas package and turning the columns into arrays), but I'm facing a problem. Python can't "read" the data because the acquisition software exported the numbers with a comma as the decimal separator (e.g. 7,025985E-36), so Python treats each entry of the array as a string instead of a number.
Is there any way I can "teach" python to read my data? Or to automatically rewrite the entries in the array so they're read as numbers?

You can simply replace the comma in the string with a dot and then use float() to parse it.
number = float('7,025985E-36'.replace(',', '.'))
print(number)
print(type(number))
The above code would print:
7.025985e-36
<class 'float'>
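If you are loading the file with pandas anyway, read_csv can do this conversion during parsing: its decimal parameter tells the parser which character to treat as the decimal point. A minimal sketch, assuming a tab-separated export named data.txt (both are assumptions; adjust them to your format):
import pandas as pd

# decimal="," makes the parser treat the comma as the decimal point, so
# columns containing values like 7,025985E-36 are read as float64
# instead of strings.
df = pd.read_csv("data.txt", sep="\t", decimal=",")
print(df.dtypes)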

You can try it this way:
>>> value = '7,025985e-36'
>>> value2 = value.replace(',', '.')
>>> float(value2)
7.025985e-36
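If the values have already been read into a pandas column as strings, the same replace-and-convert idea works on the whole column at once. A minimal sketch, assuming a DataFrame df with a string column named "value" (both names are placeholders):
# Vectorized replace/float conversion for an entire column.
df["value"] = df["value"].str.replace(",", ".", regex=False).astype(float)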

Related

Pandas: Read CSV: ValueError: could not convert string to float

I'm trying to read a large and complex CSV file with pandas.read_csv.
The exact command is
pd.read_csv(filename, quotechar='"', low_memory=True, dtype=data_types, usecols= columns, true_values=['T'], false_values=['F'])
I am pretty sure that the data types are correct. I can read the first 16 million lines (setting nrows=16000000) without problems, but somewhere after this point I get the following error:
ValueError: could not convert string to float: '1,123'
It seems that, for some reason, pandas is treating two columns as one.
What could be the problem? How can I fix it?
I found the mistake. The problem was a thousands separator.
When writing the CSV file, most numbers were below a thousand and were correctly written to the CSV file. However, this one value was greater than a thousand and was written as "1,123", which pandas did not recognize as a number but as a string.
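pandas can handle this during parsing: read_csv has a thousands parameter that names the thousands-separator character. A minimal sketch, reusing the call from the question:
pd.read_csv(filename, quotechar='"', low_memory=True, dtype=data_types,
            usecols=columns, true_values=['T'], false_values=['F'],
            thousands=',')
# With thousands=',' the field "1,123" is parsed as the number 1123
# instead of raising ValueError.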

Read in binary data with python

I am very new to Python and I am trying to read in a file that partially contains binary data. There is a header with some information about the data, and after the header the binary data follow. If one opens the file in a text editor, it looks like this:
>>> Begin of header <<<
value1: 5
value2: 7
...
value65: 9
>>> End of header <<<
���ÄI›C¿���†¨¨v#���ÄW]c¿��� U⁄z#���#¬P\¿����∂:q#���#Ò˚U¿���†÷Us#���`ªw4¿��� :‘m#���#À›9#���ÄAs#���¿‹ ¿����ır#���¿#&%#���†„bq#����*˙-#��� [q#����ÚN8#����
Òo#���#√·T#���†‰zm#����9\#����ÃÜq#����€dZ#���`Ëäs#���†∏8I#���¿¬Ot#���†�6
An additional problem is that I did not create the file myself and do not know whether the values are doubles or floats.
So how can I interpret this data?
First, thanks to all for the help. Basically the problem is the header: I can read in the data quite well when I remove the header from the file. This can be done with
x = numpy.fromfile(f, dtype = numpy.complex128 , count = -1)
quite easily. The problem is that I cannot find any option for the fromfile function that skips lines (one can skip bytes, but the header size may differ from file to file).
In this great thread I found out how to convert a binary string to a numpy array:
convert binary string to numpy array
With this I could overcome the problem by reading the data file line by line and merging every line after the end-of-header line into one string. This string was then converted into a nice array, exactly as I wanted.
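A minimal sketch of that approach, assuming the header always ends with the literal line ">>> End of header <<<" and that the payload is complex128 as above (both are assumptions; adjust them to your files):
import numpy as np

with open("data.bin", "rb") as f:
    # Consume the header line by line until the end marker...
    for line in f:
        if line.strip() == b">>> End of header <<<":
            break
    # ...then interpret everything after it as raw binary data.
    data = np.frombuffer(f.read(), dtype=np.complex128)

print(data.shape)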

Writing Integers to a File

I'm having a really difficult time writing integers out to a file. Here's my situation. I have a file, let's call it 'idlist.txt'. It has multiple columns and is fairly long (10,000 rows), but I only care about the first column of data.
I'm loading it into python using:
import numpy as np
FH = np.loadtxt('idlist.txt',delimiter=',',comments='#')
# Testing initial data type
print FH[0,0],type(FH[0,0])
>>> 85000370342.0 <type 'numpy.float64'>
# Converting to integers
F = [int(FH[i,0]) for i in range(len(FH))]
print F[0],type(F[0])
>>> 85000370342 <type 'long'>
As you can see, the data must be made into integers. What I now would like to do is to write the entries of this list out as the first column of another file (really the only column in the entire file), we can rename it 'idonly.txt'. Here is how I'm trying to do it:
with open('idonly.txt','a') as f:
    for i in range(len(F)):
        f.write('%d\n' % (F[i]))
This is clearly not producing the desired output - when I open the file 'idonly.txt', each entry is actually a float (i.e. 85000370342.0). What exactly is going on here, and why is writing integers to a file such a complicated task? I found the string formatting idea here: How to write integers to a file, but it didn't fix my issue.
Okay, well it appears that this is completely my fault. When I'm opening the file I'm using the mode 'a', which means append. It turns out that the first time I wrote this out to a file I did it incorrectly, and ever since I've been appending the correct answer onto that and simply not looking down as far as I should since it's a really long file.
For reference, here are all of the modes you can use when handling files in Python: http://www.tutorialspoint.com/python/python_files_io.htm. Choose carefully.
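A more compact route to the same output, assuming the goal is just the first column written as integers (numpy's savetxt overwrites the target file, so no stale appended data can linger):
import numpy as np

FH = np.loadtxt('idlist.txt', delimiter=',', comments='#')

# fmt='%d' formats each value as an integer, one value per line, and
# the target file is overwritten rather than appended to.
np.savetxt('idonly.txt', FH[:, 0], fmt='%d')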
Try using:
f.write('{}\n'.format(F[i]))

Using numpy.loadtxt, how does one convert strings in the .txt file into integer values/floats?

So, I have a .txt file that I want to read into Pylab. The problem is that, when I try to do so using numpy.loadtxt("filename.txt"), Pylab cannot read the numbers in my file as float values (it returns the error: could not convert string to float).
I am not sure if there is something wrong with my syntax as above; when I remove the quotation marks inside the parentheses, numpy.loadtxt(filename.txt), Pylab returns the error: filename is not defined.
Any suggestions on how to read a series of numbers saved in a .txt file into Pylab as an array of floats?
You would need to provide sample lines from your filename.txt file; I guess you may need to read the documentation for numpy.loadtxt here. There are some good examples on the documentation page.
BTW, the second command, numpy.loadtxt(filename.txt), is wrong since you have not defined a variable named filename.
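A minimal sketch of typical loadtxt usage on a whitespace-separated file of plain numbers (the file contents in the comment are hypothetical); if a column uses decimal commas like the original question, a converter can fix it up during loading:
import numpy as np

# Suppose filename.txt contains, e.g.:
# 1.0  2.0  3.0
# 4.0  5.0  6.0
data = np.loadtxt("filename.txt")
print(data.dtype)  # float64: plain numbers are parsed as floats

def decomma(s):
    # loadtxt converters receive bytes or str depending on NumPy version
    if isinstance(s, bytes):
        s = s.decode()
    return float(s.replace(",", "."))

# Apply the converter to column 0 only.
data = np.loadtxt("filename.txt", converters={0: decomma})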

write numpy array with its size to binary file

I need to write a 2D numpy array to a file, including its dimensions so I can read it from a C++ program and create the corresponding array.
I have written some simple code that saves the array and it can be read from C++, but if I try to write the array's size first it always gives me an error.
Here's my simple python code:
1 file = open("V.bin","wb")
2 file.write(V.shape)
3 file.write(V)
4 file.close()
The second line gives the error; I've also tried:
n1, n2 = V.shape
file.write(n1)
file.write(n2)
But it doesn't work either.
I'm adding the error it shows:
Traceback (most recent call last):
file.write(V.shape[0])
TypeError: must be string or buffer, not int
Thanks!
You can use numpy.save(), which saves in binary.
You can use numpy.savetxt if you want to save it as ASCII.
Alternatively (since it looks like you're dealing with binary data), if you want to save the raw data stream, you could use ndarray.tostring (renamed ndarray.tobytes in newer NumPy) to get a string of bytes that you can dump to the file directly.
The advantage of this approach is that you can create your own file format. The downside is that you need to create a string in order to actually write it to the file.
And since you say that you're getting an error on the second line, it's an error because write expects a string (or bytes), and you're trying to pass it a tuple (or individual ints). You could use struct.pack to solve this problem:
file.write(struct.pack('2i', *V.shape))  # requires: import struct
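Putting it together, a minimal sketch of the whole write (the array name V and the file name V.bin come from the question; the rest is an assumption, e.g. that the C++ side expects two 32-bit ints followed by raw float64 data):
import struct
import numpy as np

V = np.arange(12, dtype=np.float64).reshape(3, 4)  # placeholder array

with open("V.bin", "wb") as f:
    # Two 32-bit ints for the dimensions, then the raw array bytes
    # (row-major order, native endianness).
    f.write(struct.pack('2i', *V.shape))
    f.write(V.tobytes())

# The C++ side can then read two ints followed by rows*cols doubles.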
