How to save a large array in txt file in python - python

I have 2D array:
import numpy as np
output = np.array([1,1,6])*np.arange(6)[:,None]+1
output
Out[32]:
array([[ 1, 1, 1],
[ 2, 2, 7],
[ 3, 3, 13],
[ 4, 4, 19],
[ 5, 5, 25],
[ 6, 6, 31]])
I tried np.savetxt('file1.txt', output, fmt='%10d'),
but I got the result on one line only.
How can I save it in a txt file similar to:
x y z
1 1 1
2 2 7
3 3 13
4 4 19
5 5 25
6 6 31
That is, 3 separate columns, each with a name (x, y, z).
Please note: the original array is very large (40000000 rows and 3 columns). I am using Python 3.6.
I have tried the solutions here and here, but they do not work for me.

Noor, let me guess: are you using Windows Notepad to view the file?
I use Notepad++, which is smart enough to understand the Unix-style line endings that np.savetxt() uses by default, even when run under Windows.
You might want to explicitly specify newline="\r\n" when calling savetxt:
np.savetxt('file1.txt', output, fmt='%10d', header=" x y z", newline="\r\n")
Docs: https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.savetxt.html
I am not sure about your data, but this:
import numpy as np
output = np.array([1,1,6])*np.arange(60)[:,None]+1
print(output)
np.savetxt('file1.txt', output, fmt='%10d' ,header= " x y z")
Produces this output:
# x y z
1 1 1
2 2 7
3 3 13
=== snipped a few lines ===
58 58 343
59 59 349
60 60 355
for me.
For np.arange(1000000) the file is about 32 MB and similarly formatted.
For np.arange(10000000) it is about 322 MB and similarly formatted.
willem-van-onsem's estimate of 1+ GB was far closer.
I did not account for the fixed width of 10 characters per number, my bad.
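The header in the output above starts with "# " because savetxt prefixes header lines with its comments string. A minimal sketch of how the plain "x y z" header from the question could be produced, assuming a reasonably recent NumPy; the temporary path is only for illustration:

```python
import os
import tempfile

import numpy as np

output = np.array([1, 1, 6]) * np.arange(6)[:, None] + 1

# Illustrative location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "file1.txt")

# comments="" drops the default "# " prefix in front of the header line.
# fmt="%d" writes plain integers without padding, which keeps the file small.
np.savetxt(path, output, fmt="%d", delimiter=" ", header="x y z", comments="")

with open(path) as fh:
    lines = fh.read().splitlines()

print(lines[0])  # header line
print(lines[1])  # first data row
```

Dropping the fixed width of 10 characters per number also reduces the file size noticeably for an array with tens of millions of rows.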

Related

how to randomize a matrix keeping the rows fixed in python

I'm trying to randomize all the rows of my DataFrame but with no success.
What I want to do is from this matrix
A= [ 1 2 3
4 5 6
7 8 9 ]
to this
A_random=[ 4 5 6
7 8 9
1 2 3 ]
I've tried np.random.shuffle, but it doesn't work.
I'm working in Google Colaboratory environment.
If you want to make this work with np.random.shuffle, one way is to extract the rows into a list, shuffle them in place, and then recreate the DataFrame:
import numpy as np
import pandas
A = pandas.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
extracted_rows = A.values.tolist()  # Each row becomes one list element, so rows stay intact and only their order is shuffled
np.random.shuffle(extracted_rows)
A_random = pandas.DataFrame(extracted_rows)
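An alternative that avoids the round trip through a list is to shuffle the row positions rather than the rows themselves. The sketch below uses a plain NumPy array and a seeded generator (the seed is an assumption, added only for reproducibility); for a DataFrame, the same permutation could be applied with A.iloc[perm].

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])

rng = np.random.default_rng(0)   # seed chosen only to make the run repeatable
perm = rng.permutation(len(A))   # a random ordering of the row positions 0..2

# Fancy indexing with the permutation reorders whole rows;
# the values inside each row are left untouched.
A_random = A[perm]
print(A_random)
```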

Assign values from small matrix to specified places in larger matrix

I would like to know if there is a similar way of doing this (Mathematica) in Python:
[Image: Mathematica code]
I have tried it in Python and it does not work. I have also tried numpy.put() and two simple for loops. These two ways work properly, but I find them very time-consuming with larger matrices (3000×3000 elements, for example).
The problem described in Python:
import numpy as np
a = np.arange(0, 25, 1).reshape(5, 5)
b = np.arange(100, 500, 100).reshape(2, 2)
p = np.array([0, 3])
a[p][:, p] = b
which leaves matrix a unchanged:
[Image: Python output]
Perhaps you are looking for this:
a[p[...,None], p] = b
Array a after the above assignment looks like this:
[[100 1 2 200 4]
[ 5 6 7 8 9]
[ 10 11 12 13 14]
[300 16 17 400 19]
[ 20 21 22 23 24]]
As documented in Integer Array Indexing, the two integer index arrays are broadcast together and iterated together. Here p[...,None] has shape (2, 1) and p has shape (2,), so they broadcast to shape (2, 2), which effectively indexes the locations a[0,0], a[0,3], a[3,0], and a[3,3]. The assignment then performs an element-wise assignment at these locations of a, using the corresponding values from the right-hand side.
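The same index pair can also be built with np.ix_, which some readers find easier to read. This is a small sketch of that variant, not a different technique; the names rows and cols are introduced here for illustration.

```python
import numpy as np

a = np.arange(0, 25, 1).reshape(5, 5)
b = np.arange(100, 500, 100).reshape(2, 2)
p = np.array([0, 3])

# a[p] is fancy indexing and returns a copy, so a[p][:, p] = b
# writes into that temporary copy and never reaches a itself.

# np.ix_ builds an open mesh: rows has shape (2, 1), cols has shape (1, 2).
rows, cols = np.ix_(p, p)

# A single indexing operation on a, so the assignment modifies a in place.
a[rows, cols] = b
print(a)
```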

How to implement fast numpy array computation with multiple occuring slice indices?

I was recently wondering how I could bypass the following numpy behavior.
Starting with a simple example:
import numpy as np
a = np.array([[1,2,3,4,5,6,7,8,9,0], [11, 12, 13, 14, 15, 16, 17, 18, 19, 10]])
then:
b = a.copy()
b[:, [0,1,4,8]] = b[:, [0,1,4,8]] + 50
print(b)
...results in printing:
[[51 52 3 4 55 6 7 8 59 0]
[61 62 13 14 65 16 17 18 69 10]]
but when one index appears twice in the index list:
c = a.copy()
c[:, [0,1,4,4,8]] = c[:, [0,1,4,4,8]] + 50
print(c)
giving:
[[51 52 3 4 55 6 7 8 59 0]
[61 62 13 14 65 16 17 18 69 10]]
(in short, they do the same thing)
Could I also have it executed 2 times for index 4?
Or, more practically: if index i appears r times, can the above expression be applied r times, instead of numpy taking it into account just once? And what if we replace "50" with something that differs for every occurrence of i?
For my current code, I used:
w[p1] = w[p1] + D[pix]
where "pix" and "p1" are numpy arrays with dtype int and the same length, in which some integers may appear multiple times.
(So one may have pix = [..., 1,1,1,2,2,3,...] together with p1 = [..., 21,32,13,23,11,78,...]; on its own, this results in only the first 1 and its corresponding 21 being used for index 1, with the remaining ones discarded.)
Of course, using a for loop would solve the problem easily. The point is that both the integers and the sizes of the arrays are huge, so for loops would cost a lot of computational resources compared with efficient numpy array routines. Any ideas, links to existing documentation, etc.?
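One routine that may fit here is the unbuffered ufunc method np.add.at, which applies the addition once per occurrence of an index. The sketch below is not taken from the thread; the small arrays w, D, pix and p1 are made-up stand-ins for the asker's data.

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5, 6, 7, 8, 9, 0],
              [11, 12, 13, 14, 15, 16, 17, 18, 19, 10]])

# Column 4 is listed twice, so it receives +50 twice.
c = a.copy()
np.add.at(c, (slice(None), [0, 1, 4, 4, 8]), 50)
print(c)

# 1-D analogue of w[p1] = w[p1] + D[pix] with a repeated target index.
# These small arrays are hypothetical placeholders.
w = np.zeros(5)
D = np.array([10.0, 20.0, 30.0])
pix = np.array([0, 1, 2])
p1 = np.array([1, 1, 3])      # index 1 occurs twice

np.add.at(w, p1, D[pix])      # w[1] accumulates 10 + 20, w[3] gets 30
print(w)
```

For purely one-dimensional accumulation, np.bincount(p1, weights=D[pix], minlength=len(w)) computes the same per-index sums and is often reported to be faster on large inputs, so it may be worth timing both.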

Modifying Array Giving Back Wrong Output

I'm new to numpy and am trying to do some slicing and indexing with arrays. My goal is to take an array, and use slicing and indexing to square the last column, and then subtract the first column from that result. I then want to put the new column back into the old array.
I've been able to figure out how to slice and index the column to get the result I want for the last column. My problem however is that when I try to put it back into my original array, I get the wrong output (as seen below).
theNumbers = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
sliceColumnOne = theNumbers[:,0]
sliceColumnThree = theNumbers[:,3]**2
editColumnThree = sliceColumnThree - sliceColumnOne
newArray = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[editColumnThree]])
print("nums:\n{}".format(newArray))
I want the output to be
[[ 1 2 3 15]
[ 5 6 7 59]
[ 9 10 11 135]
[ 13 14 15 243]]
However mine becomes:
[list([1, 2, 3, 4]) list([5, 6, 7, 8]) list([9, 10, 11, 12])
list([array([ 15, 59, 135, 243])])]
Any suggestions on how to fix this?
Just assign the new array to the last row of the original with "theNumbers[3] = editColumnThree" (note that this writes the values into the last row, not the last column).
Code:
import numpy as np
theNumbers = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
sliceColumnOne = theNumbers[:,0]
sliceColumnThree = theNumbers[:,3]**2
editColumnThree = sliceColumnThree - sliceColumnOne
theNumbers[3] = editColumnThree
print("nums:\n{}".format(theNumbers))
Output:
[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]
[ 15 59 135 243]]
newArray = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[editColumnThree]])
print("nums:\n{}".format(newArray))
This way, editColumnThree becomes the last row, not the last column. Instead, you can use
newArray = theNumbers.copy() # if a copy is needed
newArray[:,-1] = editColumnThree # replace last (-1) column
If you just want to stack the vectors on top of each other, use vstack:
import numpy as np
theNumbers = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
newNumbers = np.vstack(theNumbers)
print(newNumbers)
>>>[[ 1 2 3 4]
[ 5 6 7 8]
[ 9 10 11 12]
[13 14 15 16]]
But the issue here isn't just that you need to stack these numbers, you are mixing up columns and rows. You are changing a row instead of a column. To change the column, update the last element in each row:
import numpy as np
theNumbers = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]])
LastColumn = theNumbers[:,3]**2
FirstColumn = theNumbers[:,0]
editColumnThree = LastColumn - FirstColumn
for i in range(4):
    theNumbers[i,3] = editColumnThree[i]
print(theNumbers)
>>>[[ 1 2 3 15]
[ 5 6 7 59]
[ 9 10 11 135]
[ 13 14 15 243]]
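The loop above can also be written as a single slice assignment, which is the usual NumPy idiom. A minimal sketch using the same data; the copy is optional and only keeps the original array intact.

```python
import numpy as np

theNumbers = np.array([[1, 2, 3, 4],
                       [5, 6, 7, 8],
                       [9, 10, 11, 12],
                       [13, 14, 15, 16]])

newArray = theNumbers.copy()  # leave theNumbers unchanged

# Square the last column, subtract the first column,
# and write the result back into the last column in one step.
newArray[:, -1] = theNumbers[:, -1] ** 2 - theNumbers[:, 0]
print(newArray)
```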

Creating columns with numpy Python

I have some elements stored in a numpy array. I wish to store them in a ".txt" file. The catch is that the file needs to fit a certain standard, which means each element needs to be placed at a fixed position in the file.
Example:
numpy.array[0] needs to start in line 1, col 26.
numpy.array[1] needs to start in line 1, col 34.
I use numpy.savetxt() to save the arrays to file.
Later I will implement this in a loop to create a large ".txt" file with coordinates.
Edit: This good example was provided below; it points out my struggle:
In [117]: np.savetxt('test.txt',A.T,'%20d %10d')
In [118]: cat test.txt
                   0          6
                   1          7
                   2          8
                   3          9
                   4         10
                   5         11
The fmt option '%20d %10d' gives spacing that depends on the preceding integer. What I need is an option that lets me set the spacing from the left side regardless of the other integers.
The template I need to fit the values into is:
XXXXXXXX.XXX YYYYYYY.YYY ZZZZ.ZZZ
Final Edit:
I solved it by creating a test which checks how many spaces the last float used. I was then able to predict the number of spaces the next float needed to fit the template.
Have you played with the fmt of np.savetxt?
Let me illustrate with a concrete example (the sort that you should have given us)
Make a 2 row array:
In [111]: A=np.arange((12)).reshape(2,6)
In [112]: A
Out[112]:
array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11]])
Save it, and get 2 rows, 6 columns
In [113]: np.savetxt('test.txt',A,'%d')
In [114]: cat test.txt
0 1 2 3 4 5
6 7 8 9 10 11
save its transpose, and get 6 rows, 2 columns
In [115]: np.savetxt('test.txt',A.T,'%d')
In [116]: cat test.txt
0 6
1 7
2 8
3 9
4 10
5 11
Put more detail into fmt to space out the columns
In [117]: np.savetxt('test.txt',A.T,'%20d %10d')
In [118]: cat test.txt
                   0          6
                   1          7
                   2          8
                   3          9
                   4         10
                   5         11
I think you can figure out how to make a fmt string that puts your numbers in the correct columns (join 26 spaces etc, or use left and right justification - the usual Python formatting issues).
savetxt also takes an opened file. So you can open a file for writing, write one array, add some filler lines, and write another. Also, savetxt doesn't do anything fancy. It just iterates through the rows of the array, and writes each row to a line, e.g.
for row in A:
    file.write(fmt % tuple(row) + '\n')
So if you don't like the control that savetxt gives you, write the file directly.
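As a sketch of the left-justification idea, the format below pads with 25 spaces so the first value starts in column 26, and uses a left-justified field of width 8 so the second value starts in column 34 regardless of how many digits the first one has. The temporary path and the sample floats are assumptions for illustration only.

```python
import os
import tempfile

import numpy as np

A = np.arange(12).reshape(2, 6)

# 25 leading spaces -> first value starts at column 26 (1-based).
# "%-8d" is left-justified in 8 characters -> second value starts at column 34.
fmt = " " * 25 + "%-8d%d"

path = os.path.join(tempfile.mkdtemp(), "test.txt")  # illustrative location
np.savetxt(path, A.T, fmt=fmt)

with open(path) as fh:
    lines = fh.read().splitlines()
print("\n".join(lines))

# For the template XXXXXXXX.XXX YYYYYYY.YYY ZZZZ.ZZZ, fixed-width float
# fields of 12, 11 and 8 characters give a constant line layout.
row = "%12.3f %11.3f %8.3f" % (1.5, 22.25, 3.125)
print(repr(row))
```

The same fmt string could be passed to np.savetxt for a three-column float array, since a single string with one format specifier per column is used as-is for every row.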
