Convert .xsf (or .dat) file to array with np.loadtxt in Python

I have been checking some guides, and also some of the questions posted here, but nothing has worked for me so far. I have an .xsf file whose first 57 lines are general instructions, followed by ca. 3*10^6 numbers. I want to load those numbers into a np.array and figured that the command
data = np.loadtxt('filename.xsf', skiprows = 57)
would do the trick.
That does not actually work, because the data between lines 58 and 531509 are organised as follows:
0.362077E+02 0.960500E+00 0.600950E+00 0.901461E-01 0.478295E-02 0.710280E-01
whereas the last line only contains one element. The error I get is
ValueError: Wrong number of columns at line 531510
I then figured I would specify a delimiter (the double space):
data = np.loadtxt('filename.xsf', delimiter=' ', skiprows=57)
but this just results in the file failing to be read at all.
From my understanding, my first attempt yields not an array of floats but an array where every element is a list of floats (taken from each line as a whole). Since the last line of the file is a single number, it does not match the format of the rest of the array. In the second scenario I am struggling with the definition of the delimiter.
I know this is an often-asked and often-answered question, but none of the methods I tried has worked. I am hence trying to provide as much context as possible about my problem. Thanks to everyone who is willing to contribute.

It took me some time, but I seem to have found an answer to my question, which I am posting in order to get possible corrections:
1- I have converted my file to a csv (probably not necessary)
2-
import itertools

data = []
with open('filename.csv') as f:
    for line in f:
        data.append(line.strip().split(','))
# this returns a list of lists, each of which is a line from the file
data = list(itertools.chain.from_iterable(data))
# this merges the list of lists into a single list
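For what it's worth, the CSV detour is probably unnecessary: since only the ragged last line trips up np.loadtxt, one alternative (a sketch, using an inline stand-in for the file contents) is to treat everything after the header as one flat stream of whitespace-separated tokens:

```python
import numpy as np

# Hypothetical stand-in for the numeric part of the .xsf file:
# full rows of six values, then a short final line
text = """0.362077E+02 0.960500E+00 0.600950E+00 0.901461E-01 0.478295E-02 0.710280E-01
0.362077E+02 0.960500E+00 0.600950E+00 0.901461E-01 0.478295E-02 0.710280E-01
0.123456E+00"""

# str.split() splits on any whitespace (spaces and newlines alike),
# so ragged rows are no problem
data = np.array(text.split(), dtype=float)
```

For the real file you would read it, drop the first 57 header lines, and split the rest, e.g. `np.array(''.join(open('filename.xsf').readlines()[57:]).split(), dtype=float)`.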

Related

TypeError: list indices must be integers or slices, not tuple

fvecs = []
for line in open(filename):
    stats = line.split(',')
    labels.append(int(stats[0]))
    fvecs.append([float(x) for x in stats[5,6,12,27,29,37,39,41]])
I have a big .csv file that I am using as a dataset, containing 43 columns and hundreds of rows. I am attempting to extract specific columns to be used as individual records, and I can't seem to work this out. The error is caused by the final line of code and produces the error message in the title; it works perfectly when the range is set to, for example, stats[30:38].
I have tried storing the required columns in a separate array and calling it like stats[requiredcolumns], but this produces the same error.
I have considered using pandas but this is just a small snippet of code from a much larger program, which all functions correctly, and the implementation of pandas would require a complete overhaul of the full program which is not possible due to time constraints.
Any help would be greatly appreciated
If you have only a few columns, you can try this:
for line in open(filename):
    stats = line.split(',')
    labels.append(int(stats[0]))
    fvecs.append([float(x) for x in (stats[5], stats[6], stats[12], stats[27], stats[29], stats[37], stats[39], stats[41])])
This code will return a list of lists; otherwise, the first comment is right about indexing and NumPy.
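To spell out the underlying issue: stats[5,6,12,27,29,37,39,41] fails because a plain Python list only accepts a single integer or a slice as an index; fancy indexing with a tuple of indices is a NumPy-array feature. A compact list-based variant (a sketch, with a hypothetical 43-column row standing in for the real data) loops over the wanted indices instead:

```python
wanted = (5, 6, 12, 27, 29, 37, 39, 41)

# hypothetical row: 43 fields already split on commas; here each
# field is just its own column number, so the result is easy to check
stats = [str(i) for i in range(43)]

# index the plain list one position at a time
fvec = [float(stats[i]) for i in wanted]
```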

Read complex numbers from a csv file using python

I am having a problem reading complex numbers from a csv file.
The format of the file is the following:
( -353.10438 +j1.72317617 ),( -23.16000 +j0.72512251 )
I tried importing the data using numpy.genfromtxt:
data=genfromtxt(fname, dtype=complex, skip_header=10, skip_footer=212, delimiter=',')
But every time I have a complex entry it returns me nan+0.j.
I also tried removing the brackets before and after the number, and replacing the j with 1j* but it didn't work.
Any suggestions?
Thanks
I moved each 'j' to the position immediately behind the imaginary part of the complex number and squeezed out all the blanks to get a sample file like this.
(-353.10438+1.72317617j),(-23.16000+0.72512251j)
(-353.10438+1.72317617j),(-23.16000+0.72512251j)
(-353.10438+1.72317617j),(-23.16000+0.72512251j)
(-353.10438+1.72317617j),(-23.16000+0.72512251j)
Then I ran code similar to yours with a result similar to what follows.
>>> np.genfromtxt('fname.txt', dtype=complex, delimiter=',')
array([[-353.10438+1.72317617j, -23.16000+0.72512251j],
       [-353.10438+1.72317617j, -23.16000+0.72512251j],
       [-353.10438+1.72317617j, -23.16000+0.72512251j],
       [-353.10438+1.72317617j, -23.16000+0.72512251j]])
I don't know exactly what you might have to do to get similar results, if indeed this approach will work for you at all.
Best of luck!
You can use
complex(str(a).replace('j', '') + 'j')
to first cast to a string, then shift the 'j' to the end, and cast back to a complex number.
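Putting those two ideas together, the `( -353.10438 +j1.72317617 )` tokens can also be normalised in code before handing them to complex(). This is a sketch with a hypothetical parse_complex helper, assuming the format shown in the question (parentheses, optional spaces, and the j prefixed to the imaginary part):

```python
def parse_complex(token):
    # "( -353.10438 +j1.72317617 )" -> (-353.10438+1.72317617j)
    s = token.strip().strip('()').replace(' ', '')
    # move the 'j' from in front of the imaginary part to the end,
    # which is where Python's complex() constructor expects it
    if 'j' in s:
        s = s.replace('j', '') + 'j'
    return complex(s)

line = '( -353.10438 +j1.72317617 ),( -23.16000 +j0.72512251 )'
# split between the closing and opening parentheses of adjacent numbers
values = [parse_complex(tok) for tok in line.split('),(')]
```

With np.genfromtxt you could wire such a helper in via the converters argument, though given the `),(` separator a plain line-by-line read as above may be simpler.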

Reading multidimensional array data into Python

I have data in the format of a 10000x500 matrix contained in a .txt file. In each row, data points are separated by a single space, and each row ends with a newline.
Normally I was able to read this kind of multidimensional array data into Python by using the following snippet of code:
with open("position.txt") as f:
    data = [line.split() for line in f]

# Get the data and convert to floats
ytemp = np.array(data)
y = ytemp.astype(np.float)
This code worked until now. When I try to use the exact same code with another set of data formatted in the same way, I get the following error:
setting an array element with a sequence.
When I try to get the 'shape' of ytemp, it gives me the following:
(10001,)
So it converts the rows to array, but not the columns.
I thought about what other information to include, but nothing came to mind. Basically I'm trying to convert my data from a .txt file to a multidimensional array in Python. The code worked before, but now, for some reason that is unclear to me, it doesn't. I tried to compare the two data sets; of course they're huge, but everything seems quite similar between the data that works and the data that doesn't.
I would be more than happy to provide any other information you may need. Thanks in advance.
Use NumPy's built-in function:
data = numpy.loadtxt('position.txt')
Check out the documentation to explore other available options.
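As a side note, the "setting an array element with a sequence" symptom usually means the rows are ragged: if even one line has a different number of tokens (a stray blank or truncated line), np.array cannot build a 2-D float array and in older NumPy versions silently fell back to a 1-D object array, which explains the (10001,) shape. A quick way to find the culprit (a sketch, with inline data standing in for position.txt) is to count tokens per line:

```python
from collections import defaultdict

# two good rows and one short row, standing in for the file contents
lines = ["1.0 2.0 3.0", "4.0 5.0 6.0", "7.0 8.0"]

# map each distinct token count to the (1-based) line numbers that have it
counts = defaultdict(list)
for i, line in enumerate(lines, start=1):
    counts[len(line.split())].append(i)
```

Here the minority count pinpoints line 3 as the bad row; for the real file you would iterate over open('position.txt') instead of the inline list.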

How to 'flatten' lines from text file if they meet certain criteria using Python?

To start, I am a complete newcomer to Python and to programming anything other than web languages.
So, I have developed a script using Python as an interface between a piece of software called Spendmap and an online app called Freeagent. This script works perfectly. It imports and parses the text file and pushes it through the API to the web app.
What I am struggling with is that Spendmap exports multiple lines per order, whereas Freeagent wants one line per order. So I need to add the cost values from any orders spread across multiple lines and then 'flatten' the lines into one so it can be sent through the API. The 'key' field is the 'PO' field: if the script sees any matching PO numbers, I want it to flatten them as described above.
This is a 'dummy' example of the text file produced by Spendmap:
5090071648,2013-06-05,2013-09-05,P000001,1133997,223.010,20,2013-09-10,104,xxxxxx,AP
COMMENT,002091
301067,2013-09-06,2013-09-11,P000002,1133919,42.000,20,2013-10-31,103,xxxxxx,AP
COMMENT,002143
301067,2013-09-06,2013-09-11,P000002,1133919,359.400,20,2013-10-31,103,xxxxxx,AP
COMMENT,002143
301067,2013-09-06,2013-09-11,P000003,1133910,23.690,20,2013-10-31,103,xxxxxx,AP
COMMENT,002143
The above has been formatted for easier reading and normally is just one line after the next with no text formatting.
The 'key' or PO field is the first bold item, and the second bold/italic item is the cost to be totalled. So if this example were passed through the script, I'd expect the first row to be left alone, the second and third rows' costs to be added (as they're both from the same PO number), and the fourth line to be left alone.
Expected result:
5090071648,2013-06-05,2013-09-05,P000001,1133997,223.010,20,2013-09-10,104,xxxxxx,AP
COMMENT,002091
301067,2013-09-06,2013-09-11,P000002,1133919,401.400,20,2013-10-31,103,xxxxxx,AP
COMMENT,002143
301067,2013-09-06,2013-09-11,P000003,1133910,23.690,20,2013-10-31,103,xxxxxx,AP
COMMENT,002143
Any help with this would be greatly appreciated and if you need any further details just say.
Thanks in advance for looking!
I won't give you the solution. But you should:
Write and test a regular expression that breaks the line down into its parts, or use the CSV library.
Parse the numbers out so they're decimal numbers rather than strings
Collect the lines up by ID. Perhaps you could use a dict that maps IDs to lists of orders?
When all the input is finished, iterate over that dict and add up all orders stored in that list.
Make a string format function that outputs the line in the expected format.
Maybe feed the output back into the input to test that you get the same result. Second time round there should be no changes, if I understood the problem.
Good luck!
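The steps above can be sketched roughly like this (a sketch, not the asker's actual program, using the csv module and a defaultdict for the totals; the sample data is the dummy extract from the question, with each record joined onto a single line as described):

```python
import csv
from collections import defaultdict
from io import StringIO

raw = """5090071648,2013-06-05,2013-09-05,P000001,1133997,223.010,20,2013-09-10,104,xxxxxx,AP COMMENT,002091
301067,2013-09-06,2013-09-11,P000002,1133919,42.000,20,2013-10-31,103,xxxxxx,AP COMMENT,002143
301067,2013-09-06,2013-09-11,P000002,1133919,359.400,20,2013-10-31,103,xxxxxx,AP COMMENT,002143
301067,2013-09-06,2013-09-11,P000003,1133910,23.690,20,2013-10-31,103,xxxxxx,AP COMMENT,002143"""

totals = defaultdict(float)   # PO number -> summed cost
rows = {}                     # PO number -> last row seen for that PO
for row in csv.reader(StringIO(raw)):
    po = row[3]               # the 'key' PO field
    totals[po] += float(row[5])
    rows[po] = row

# rebuild one line per PO with the summed cost in the cost field
flattened = []
for po, row in rows.items():
    row[5] = "%.3f" % totals[po]
    flattened.append(",".join(row))
```

The two P000002 lines collapse into one with cost 401.400, while the other two records pass through unchanged.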
I would use a dictionary to compile the lines, using get(key,0.0) to sum values if they exist already, or start with zero if not:
InputData = """5090071648,2013-06-05,2013-09-05,P000001,1133997,223.010,20,2013-09-10,104,xxxxxx,AP COMMENT,002091
301067,2013-09-06,2013-09-11,P000002,1133919,42.000,20,2013-10-31,103,xxxxxx,AP COMMENT,002143
301067,2013-09-06,2013-09-11,P000002,1133919,359.400,20,2013-10-31,103,xxxxxx,AP COMMENT,002143
301067,2013-09-06,2013-09-11,P000003,1133910,23.690,20,2013-10-31,103,xxxxxx,AP COMMENT,002143"""
OutD = {}
ValueD = {}
for Line in InputData.split('\n'):
    # commas in comments won't matter because we are joining after anyway
    Fields = Line.split(',')
    PO = Fields[3]
    Value = float(Fields[5])
    # set up the output string with a placeholder for .format()
    OutD[PO] = ",".join(Fields[:5] + ["{0:.3f}"] + Fields[6:])
    # add the value to the old value, or to zero if it is not found
    ValueD[PO] = ValueD.get(PO, 0.0) + Value

# the output is unsorted by default, but you could sort or preserve original order
for POKey in ValueD:
    print(OutD[POKey].format(ValueD[POKey]))
P.S. Yes, I know Capitals are for Classes, but this makes it easier to tell what variables I have defined...

Using Python to write a CSV file with delimiter

I'm new to programming, and also to this site, so my apologies in advance for anything silly or "newbish" I may say or ask.
I'm currently trying to write a script in Python that will take a list of items and write them into a csv file, among other things. Each item in the list is really a list of two strings, if that makes sense. In essence, the format is [[Google, http://google.com], [BBC, http://bbc.co.uk]], but with different values of course.
Within the CSV, I want this to show up as the first item of each list in the first column and the second item of each list in the second column.
This is the part of my code that I need help with:
with open('integration.csv', 'wb') as f:
    writer = csv.writer(f, delimiter=',', dialect='excel')
    writer.writerows(w for w in foundInstances)
For whatever reason, it seems that the delimiter is being ignored. When I open the file in Excel, each cell has one list. Using the old example, each cell would have "Google, http://google.com". I want Google in the first column and http://google.com in the second. So basically "Google" and "http://google.com", and then below that "BBC" and "http://bbc.co.uk". Is this possible?
Within my code, foundInstances is the list in which all the items are contained. As a whole, the script works fine, but I cannot seem to get this last step. I've done a lot of looking around within stackoverflow and the rest of the Internet, but I haven't found anything that has helped me with this last step.
Any advice is greatly appreciated. If you need more information, I'd be happy to provide you with it.
Thanks!
In your code on pastebin, the problem is here:
foundInstances.append(['http://' + str(num) + 'endofsite' + ', ' + desc])
Here, for each row in your data, you create one string that already has a comma in it. That is not what the csv module needs: it makes the comma-delimited strings out of your data itself. You need to give it each row as a simple list of items, [col1, col2, col3]. What you are doing is ["col1, col2, col3"], which has already packed the data into a single string. Try this:
foundInstances.append(['http://' + str(num) + 'endofsite', desc])
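For comparison, here is a minimal sketch of what the writer should receive (using io.StringIO in place of a real file so it runs standalone; in Python 3 you would open the actual file with newline='' rather than 'wb'):

```python
import csv
import io

# each inner two-item list becomes one row with two cells
found_instances = [["Google", "http://google.com"], ["BBC", "http://bbc.co.uk"]]

buf = io.StringIO()
writer = csv.writer(buf, delimiter=',', dialect='excel')
writer.writerows(found_instances)
```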
I just tested the code you posted with
foundInstances = [[1,2],[3,4]]
and it worked fine. It definitely produces the output csv in the format
1,2
3,4
So I assume that your foundInstances has the wrong format. If you construct the variable in a complex manner, you could try to add
import pdb; pdb.set_trace()
before the actual variable usage in the csv code. This lets you inspect the variable at runtime with the python debugger. See the Python Debugger Reference for usage details.
As a side note, according to the PEP-8 Style Guide, the name of the variable should be found_instances in Python.

Categories

Resources