Putting a small 3D numpy array into a larger 3D numpy array - python

I want to do essentially what this person is doing but in 3D:
How can I add small 2D array to larger array?
Currently I am generating a large empty numpy array:
BigArray = np.zeros((2048,2048,1000), np.float16)
Then I am trying to add data into this array using the method shown in the stackoverflow question I linked.
The array I want to add is a smaller numpy array with dimensions 100,100,100, and the locations are given as a list of X,Y,Z coords that define the centre. Therefore my code reads as follows:
SmallArray = np.random.random((100,100,100))
X = 1000
Y = 1000
Z = 500
BigArray([X-49:Y+50, Y-49:Y+50, Z-49:Z+50]) = SmallArray
However I am getting an error that the first colon is a syntax error and I am not sure why.
Any and all help is much appreciated, thank you.
edit: typos.

Just remove the parentheses (and use X, not Y, in the first slice):
BigArray[X-49:X+50, Y-49:Y+50, Z-49:Z+50] = SmallArray
Also you will need to fix the dimensions, since as-is they do not match (each slice above covers only 99 elements, not 100), i.e.:
BigArray[X-50:X+50, Y-50:Y+50, Z-50:Z+50] = SmallArray
Edit:
Also, you have a couple of typos in the code (like using zeroes instead of zeros and np.random instead of np.random.random), but I assume that you are not interested in those.
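If it helps, here is a small generic sketch (the helper name paste_centered is just illustrative, and it assumes the small block lies fully inside the big array) that builds the slices from the centre coordinates:
import numpy as np

def paste_centered(big, small, center):
    # Build one slice per axis so that `small` ends up centred on `center`.
    slices = tuple(
        slice(c - s // 2, c - s // 2 + s)   # for s=100 this is c-50:c+50
        for c, s in zip(center, small.shape)
    )
    big[slices] = small

BigArray = np.zeros((2048, 2048, 1000), np.float16)
SmallArray = np.random.random((100, 100, 100))
paste_centered(BigArray, SmallArray, (1000, 1000, 500))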

I found it interesting to write a very simple, similar program:
#!/usr/bin/env python3
# encoding: utf-8
# If the two lines above are swapped and the script is run with ./, the program enters an infinite loop
# This line is a comment
# Template defined: 2019/7/5 v0.2
import sys
print(f'Python version={sys.version}')
# --- imports ---
import numpy as np
# --- class definitions ---
# --- function definitions ---
print('sam says: --- main program start ---')
if __name__ == '__main__':
    big_arr1 = np.random.rand(5, 5, 5)
    small_arr1 = np.zeros((3, 3, 3), np.float16)
    center_x = center_y = center_z = 2
    # center_x-1:center_x+2 covers the three indices center_x-1, center_x, center_x+1
    # center_y-1:center_y+2 covers the three indices center_y-1, center_y, center_y+1
    # center_z-1:center_z+2 covers the three indices center_z-1, center_z, center_z+1
    big_arr1[center_x-1:center_x+2, center_y-1:center_y+2, center_z-1:center_z+2] = small_arr1
    print(f'big_arr1={big_arr1}')
print('sam says: --- program end ---')
And it shows the result:
C:\Users\sam_lin\PycharmProjects\untitled\venv\Scripts\python.exe C:/Users/sam_lin/PycharmProjects/untitled/t1.py
Python version=3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)]
sam says: --- main program start ---
big_arr1=[[[0.97776624 0.67305297 0.46804029 0.49918114 0.5827188 ]
[0.13722586 0.74721955 0.4252602 0.0834001 0.46595479]
[0.81273576 0.93105509 0.3550111 0.86397551 0.43013513]
[0.9162833 0.73321085 0.30299702 0.88372877 0.08543336]
[0.28145705 0.21005119 0.66785858 0.46601126 0.08057115]]
[[0.96049458 0.92891106 0.59779765 0.24660834 0.28381255]
[0.78940735 0. 0. 0. 0.79491548]
[0.65879618 0. 0. 0. 0.22152894]
[0.56392902 0. 0. 0. 0.543028 ]
[0.14708571 0.16269789 0.37162561 0.19991273 0.65671269]]
[[0.78414023 0.77322729 0.93637718 0.5180622 0.18265001]
[0.68605786 0. 0. 0. 0.50407885]
[0.98517971 0. 0. 0. 0.04644972]
[0.93118566 0. 0. 0. 0.79937072]
[0.97311219 0.47284717 0.82679315 0.76999545 0.82846718]]
[[0.0707382 0.46587191 0.89547718 0.85520751 0.62759826]
[0.4973629 0. 0. 0. 0.97606185]
[0.99327387 0. 0. 0. 0.28557434]
[0.31332736 0. 0. 0. 0.34476194]
[0.12757201 0.90843887 0.68510409 0.98262188 0.48204498]]
[[0.93682369 0.88788666 0.17488394 0.68499998 0.09336082]
[0.39821201 0.39974341 0.14220433 0.55088245 0.09949949]
[0.1348445 0.11631945 0.9976837 0.21023914 0.73043136]
[0.91621389 0.4786832 0.66673538 0.37927733 0.92164015]
[0.9259155 0.91651767 0.98076657 0.83919033 0.95723923]]]
sam says: --- program end ---
Process finished with exit code 0

Related

Loading the binary data to a NumPy array

I am having trouble reading the binary file. I have a NumPy array as,
data = array([[ 0. , 0. , 7.821725 ],
[ 0.05050505, 0. , 7.6358337 ],
[ 0.1010101 , 0. , 7.453858 ],
...,
[ 4.8989897 , 5. , 16.63227 ],
[ 4.949495 , 5. , 16.88153 ],
[ 5. , 5. , 17.130795 ]], dtype=float32)
I wrote this array to a file in binary format.
file = open('model_binary', 'wb')
data.tofile(file)
Now, I am unable to get back the data from the saved binary file. I tried using numpy.fromfile() but it didn't work out for me.
file = open('model_binary', 'rb')
data = np.fromfile(file)
When I printed the data I got [0.00000000e+00 2.19335211e-13 8.33400000e+04 ... 2.04800049e+03 2.04800050e+03 5.25260241e+07] which is absolutely not what I want.
I ran the following code to check what was in the file,
for line in file:
print(line)
break
I got the output as b'\x00\x00\x00\x00\......\c1\x07#\x00\x00\x00\x00S\xc5{#j\xfd\n' which I suppose is in binary format.
I would like to get the array back from the binary file as it was saved. Any help will be appreciated.
As Kevin noted, adding the dtype is required. You might also need to reshape (you have 3 columns in your example). So
file = open('model_binary', 'rb')
data = np.fromfile(file, dtype=np.float32).reshape((-1, 3))
should work for you.
As an aside, np.save also writes a binary format (.npy) that records the dtype and shape, so saving with np.save and loading with np.load avoids these issues.
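For example, a minimal sketch of that round trip (the .npy filename is just illustrative):
import numpy as np

data = np.array([[0.0, 0.0, 7.821725]], dtype=np.float32)
np.save('model_binary.npy', data)        # the .npy header records dtype and shape
restored = np.load('model_binary.npy')   # comes back as float32 with shape (1, 3)
assert restored.dtype == np.float32 and restored.shape == (1, 3)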

Dot product for correlation with complex numbers

OK, this question probably has a very simple answer, but I've been searching for quite a while with no luck...
I want to get the dot product of 2 complex numbers in complex-plane-space. However, np.dot and np.vdot both give the wrong result.
Example of what I WANT to do:
a = 1+1j
b = 1-1j
dot(a,b) == 0
What I actually get:
np.dot(a,b) == 2+0j
np.vdot(a,b) == 0-2j
np.conj(a)*b == 0-2j
I am able to get what I want using this rather clumsy expression (edit for readability):
a.real*b.real + a.imag*b.imag
But I am very surprised not to find a nice ufunc to do this. Does it not exist? I was not expecting to have to write my own ufunc to vectorize such a common operation.
Part of my concern here is that it seems like my expression is doing a lot of extra work extracting the real/imaginary parts when they should already be in adjacent memory locations (considering a, b are actually already combined in a data type like complex64). This has the potential to cause a pretty severe slowdown.
EDIT:
Using Numba I ended up defining a ufunc:
from numba import vectorize

@vectorize
def cdot(a, b):
    return a.real*b.real + a.imag*b.imag
This allowed me to correlate complex data properly. Here's a correlation image for the guys who helped me!
For arrays and np.complex scalars (but not plain Python complex numbers) you can view-cast to float. For example:
a = np.exp(1j*np.arange(4))
b = np.exp(-1j*np.arange(4))
a
# array([ 1. +0.j , 0.54030231+0.84147098j,
# -0.41614684+0.90929743j, -0.9899925 +0.14112001j])
b
# array([ 1. -0.j , 0.54030231-0.84147098j,
# -0.41614684-0.90929743j, -0.9899925 -0.14112001j])
ar = a[...,None].view(float)
br = b[...,None].view(float)
ar
# array([[ 1. , 0. ],
# [ 0.54030231, 0.84147098],
# [-0.41614684, 0.90929743],
# [-0.9899925 , 0.14112001]])
br
# array([[ 1. , -0. ],
# [ 0.54030231, -0.84147098],
# [-0.41614684, -0.90929743],
# [-0.9899925 , -0.14112001]])
Now, for example, all pairwise dot products:
np.inner(ar,br)
# array([[ 1. , 0.54030231, -0.41614684, -0.9899925 ],
# [ 0.54030231, -0.41614684, -0.9899925 , -0.65364362],
# [-0.41614684, -0.9899925 , -0.65364362, 0.28366219],
# [-0.9899925 , -0.65364362, 0.28366219, 0.96017029]])
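If you only want the element-wise dot products a[i]·b[i] rather than the full pairwise matrix, a small follow-up sketch using the ar/br views from above:
# Sum over the last (real/imag) axis for each matching pair of rows.
elementwise = np.einsum('id,id->i', ar, br)
# Equivalently, without the view cast: (np.conj(a) * b).real gives the same values.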

Trying to separate real and imaginary but it gets added up

I have been given an array that contains real and complex values. I need to separate them and print them out, but it's not working. The output gives:
[3. +0.j 4.5+0.j 0. +0.j]
import numpy as np
array = np.array([3,4.5,3+5j,0])
real = np.isreal(array)
print(array[real])
img = np.iscomplex(array)
print(array[img])
Referring to the numpy documentation, you should do the following:
print(array.real)
print(array.imag)
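For the example array in the question, this prints (a minimal illustration):
import numpy as np

array = np.array([3, 4.5, 3+5j, 0])
print(array.real)   # [3.  4.5 3.  0. ]
print(array.imag)   # [0. 0. 5. 0.]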
I guess what you are looking for is to print the real numbers when a value has no imaginary part, and to print just the imaginary part when it does.
import numpy as np
array = np.array([3,4.5,3+5j,0])
real = np.isreal(array)
print(array[real].real)
img = np.iscomplex(array)
print(array[img].imag)
# output
[ 3. 4.5 0. ]
[5.]
Is this right?
import numpy as np
array = np.array([3,4.5,3+5j,0, 12.5])
real = np.isreal(array)
# here I check whether the float is a whole number, and cast values like 12.0 or 1.0 to int
print([int(i) if str(i).rsplit(".", 1)[-1] == '0' else i for i in array[real].real ])
img = np.iscomplex(array)
print([complex(int(i.real),i.imag) for i in array[img]])
output:
[3, 4.5, 0, 12.5]
[(3+5j)]
I just appended 12.5 as a test to see how it works!

Using numpy's function loadtxt() to read a two column file then using polyfit() to return an array

afile is a given file, and z is the degree of the polynomial. I've been breaking my head over this for a while; it's frustrating that I'm given basically zero instructions on how to proceed.
This is what I thought it should be like:
import numpy as np
def newfile(afile, z):
    x, y = np.loadtxt(afile)
    d = np.polyfit(x, y, z)
    return d
I have attempted to do it as
data = np.loadtxt(afile)
x = data[0:]
By printing "data" I'm given this format:
[[ 2. 888.8425]
[ 6. 888.975 ]
[ 14. 888.1026]
[ 17. 888.2071]
[ 23. 886.0479]
[ 26. 883.3316]
[ 48. 877.04 ]
[ 99. 854.3665]]
Printing "x" in this case just gives me the whole array (I'm thinking the issue lies in the lack of a comma). In this case I'd want x to be an array of the left column.
I suppose you are getting an error when unpacking in this statement:
x,y = np.loadtxt(afile)
you should replace it for this:
x, y = zip(*np.loadtxt(afile))
The rest should work.
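As an alternative sketch, loadtxt's unpack=True option transposes the result so that each column comes back as its own array, which matches what the question asks for:
import numpy as np

def newfile(afile, z):
    # unpack=True returns one array per column, so x gets the left column and y the right
    x, y = np.loadtxt(afile, unpack=True)
    return np.polyfit(x, y, z)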

Is shared readonly data copied to different processes for multiprocessing?

The piece of code that I have looks some what like this:
glbl_array = # a 3 Gb array
def my_func(args, def_param=glbl_array):
    # do stuff on args and def_param

if __name__ == '__main__':
    pool = Pool(processes=4)
    pool.map(my_func, range(1000))
Is there a way to make sure (or encourage) that the different processes do not get a copy of glbl_array but share it? If there is no way to stop the copy, I will go with a memmapped array, but my access patterns are not very regular, so I expect memmapped arrays to be slower. The above seemed like the first thing to try. This is on Linux. I just wanted some advice from Stack Overflow and do not want to annoy the sysadmin. Do you think it will help if the second parameter is a genuine immutable object like glbl_array.tostring()?
You can use the shared memory stuff from multiprocessing together with Numpy fairly easily:
import multiprocessing
import ctypes
import numpy as np

shared_array_base = multiprocessing.Array(ctypes.c_double, 10*10)
shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
shared_array = shared_array.reshape(10, 10)

#-- edited 2015-05-01: the assert check below checks the wrong thing
# with recent versions of Numpy/multiprocessing. That no copy is made
# is indicated by the fact that the program prints the output shown below.
## No copy was made
##assert shared_array.base.base is shared_array_base.get_obj()

# Parallel processing
def my_func(i, def_param=shared_array):
    shared_array[i, :] = i

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    pool.map(my_func, range(10))
    print(shared_array)
which prints
[[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[ 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]
[ 3. 3. 3. 3. 3. 3. 3. 3. 3. 3.]
[ 4. 4. 4. 4. 4. 4. 4. 4. 4. 4.]
[ 5. 5. 5. 5. 5. 5. 5. 5. 5. 5.]
[ 6. 6. 6. 6. 6. 6. 6. 6. 6. 6.]
[ 7. 7. 7. 7. 7. 7. 7. 7. 7. 7.]
[ 8. 8. 8. 8. 8. 8. 8. 8. 8. 8.]
[ 9. 9. 9. 9. 9. 9. 9. 9. 9. 9.]]
However, Linux has copy-on-write semantics on fork(), so even without using multiprocessing.Array, the data will not be copied unless it is written to.
The following code works on Win7 and Mac (maybe on Linux, but not tested).
import multiprocessing
import ctypes
import numpy as np

#-- edited 2015-05-01: the assert check below checks the wrong thing
# with recent versions of Numpy/multiprocessing. That no copy is made
# is indicated by the fact that the program prints the output shown below.
## No copy was made
##assert shared_array.base.base is shared_array_base.get_obj()

shared_array = None

def init(shared_array_base):
    global shared_array
    shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
    shared_array = shared_array.reshape(10, 10)

# Parallel processing
def my_func(i):
    shared_array[i, :] = i

if __name__ == '__main__':
    shared_array_base = multiprocessing.Array(ctypes.c_double, 10*10)
    pool = multiprocessing.Pool(processes=4, initializer=init, initargs=(shared_array_base,))
    pool.map(my_func, range(10))
    shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
    shared_array = shared_array.reshape(10, 10)
    print(shared_array)
For those stuck using Windows, which does not support fork() (unless using Cygwin), pv's answer does not work: globals are not made available to child processes.
Instead, you must pass the shared memory during the initializer of the Pool/Process as such:
#! /usr/bin/python
import time
from multiprocessing import Process, Queue, Array

def f(q, a):
    m = q.get()
    print(m)
    print(a[0], a[1], a[2])
    m = q.get()
    print(m)
    print(a[0], a[1], a[2])

if __name__ == '__main__':
    a = Array('B', (1, 2, 3), lock=False)
    q = Queue()
    p = Process(target=f, args=(q, a))
    p.start()
    q.put([1, 2, 3])
    time.sleep(1)
    a[0:3] = (4, 5, 6)
    q.put([4, 5, 6])
    p.join()
(it's not numpy and it's not good code but it illustrates the point ;-)
If you are looking for an option that works efficiently on Windows, and works well for irregular access patterns, branching, and other scenarios where you might need to analyze different matrices based on a combination of a shared-memory matrix and process-local data, the mathDict toolkit in the ParallelRegression package was designed to handle this exact situation.
I know I am answering a very old question, but this approach does not work on Windows. The above answers are misleading without substantial proof, so I tried the following code.
# -*- coding: utf-8 -*-
from __future__ import annotations

import ctypes
import itertools
import multiprocessing
import os
import time
from concurrent.futures import ProcessPoolExecutor

import numpy as np
import numpy.typing as npt

shared_np_array_for_subprocess: npt.NDArray[np.double]

def init_processing(shared_raw_array_obj: ctypes.Array[ctypes.c_double]):
    global shared_np_array_for_subprocess
    #shared_np_array_for_subprocess = np.frombuffer(shared_raw_array_obj, dtype=np.double)
    shared_np_array_for_subprocess = np.ctypeslib.as_array(shared_raw_array_obj)

def do_processing(i: int) -> int:
    print("\n--------------->>>>>>")
    print(f"[P{i}] input is {i} in process id {os.getpid()}")
    print(f"[P{i}] 0th element via np access: ", shared_np_array_for_subprocess[0])
    print(f"[P{i}] 1st element via np access: ", shared_np_array_for_subprocess[1])
    print(f"[P{i}] NP array's base memory is: ", shared_np_array_for_subprocess.base)
    np_array_addr, _ = shared_np_array_for_subprocess.__array_interface__["data"]
    print(f"[P{i}] NP array obj pointing memory address is: ", hex(np_array_addr))
    print("\n--------------->>>>>>")
    time.sleep(3.0)
    return i

if __name__ == "__main__":
    shared_raw_array_obj: ctypes.Array[ctypes.c_double] = multiprocessing.RawArray(ctypes.c_double, 128)  # 128 doubles * 8 B = 1 KB
    # This array is malloced, 0 filled.
    print("Shared Allocated Raw array: ", shared_raw_array_obj)
    shared_raw_array_ptr = ctypes.addressof(shared_raw_array_obj)
    print("Shared Raw Array memory address: ", hex(shared_raw_array_ptr))

    # Assign data
    print("Assign 0, 1 element data in Shared Raw array.")
    shared_raw_array_obj[0] = 10.2346
    shared_raw_array_obj[1] = 11.9876
    print("0th element via ptr access: ", (ctypes.c_double).from_address(shared_raw_array_ptr).value)
    print("1st element via ptr access: ", (ctypes.c_double).from_address(shared_raw_array_ptr + ctypes.sizeof(ctypes.c_double)).value)

    print("Create NP array from the Shared Raw array memory")
    shared_np_array: npt.NDArray[np.double] = np.frombuffer(shared_raw_array_obj, dtype=np.double)
    print("0th element via np access: ", shared_np_array[0])
    print("1st element via np access: ", shared_np_array[1])
    print("NP array's base memory is: ", shared_np_array.base)
    np_array_addr, _ = shared_np_array.__array_interface__["data"]
    print("NP array obj pointing memory address is: ", hex(np_array_addr))
    print("NP array , Raw array points to same memory , No copies? : ", np_array_addr == shared_raw_array_ptr)

    print("Now that we have native memory based NP array , Send for multi processing.")
    # results = []
    with ProcessPoolExecutor(max_workers=4, initializer=init_processing, initargs=(shared_raw_array_obj,)) as process_executor:
        results = process_executor.map(do_processing, range(0, 2))
    print("All jobs submitted.")
    for result in results:
        print(result)
    print("Main process is going to shutdown.")
    exit(0)
Here is the sample output:
Shared Allocated Raw array: <multiprocessing.sharedctypes.c_double_Array_128 object at 0x000001B8042A9E40>
Shared Raw Array memory address: 0x1b804300000
Assign 0, 1 element data in Shared Raw array.
0th element via ptr access: 10.2346
1st element via ptr access: 11.9876
Create NP array from the Shared Raw array memory
0th element via np access: 10.2346
1st element via np access: 11.9876
NP array's base memory is: <multiprocessing.sharedctypes.c_double_Array_128 object at 0x000001B8042A9E40>
NP array obj pointing memory address is: 0x1b804300000
NP array , Raw array points to same memory , No copies? : True
Now that we have native memory based NP array , Send for multi processing.
--------------->>>>>>
[P0] input is 0 in process id 21852
[P0] 0th element via np access: 10.2346
[P0] 1st element via np access: 11.9876
[P0] NP array's base memory is: <memory at 0x0000021C7ACAFF40>
[P0] NP array obj pointing memory address is: 0x21c7ad60000
--------------->>>>>>
--------------->>>>>>
[P1] input is 1 in process id 11232
[P1] 0th element via np access: 10.2346
[P1] 1st element via np access: 11.9876
[P1] NP array's base memory is: <memory at 0x0000022C7FF3FF40>
[P1] NP array obj pointing memory address is: 0x22c7fff0000
--------------->>>>>>
All jobs submitted.
0
1
Main process is going to shutdown.
The above output is from following environment:
OS: Windows 10 20H2
Python: Python 3.9.9 (tags/v3.9.9:ccb0e6a, Nov 15 2021, 18:08:50) [MSC v.1929 64 bit (AMD64)]
You can clearly see that the memory address the numpy array points to is different in every subprocess, meaning memory copies are made. So on Windows, subprocesses do not share the underlying memory. I think this is due to OS protection: processes cannot refer to arbitrary pointer addresses in memory, as that would lead to memory access violations.
