I have been given an array that contains real and complex values. I need to separate them and print them out, but it's not working. The output gives:
[3. +0.j 4.5+0.j 0. +0.j]
import numpy as np
array = np.array([3,4.5,3+5j,0])
real = np.isreal(array)
print(array[real])
img = np.iscomplex(array)
print(array[img])
Referring to the numpy documentation, you should do the following:
print(array.real)
print(array.imag)
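For example, with the array above (a quick sketch; the output I would expect is shown in the comments):
import numpy as np

array = np.array([3, 4.5, 3 + 5j, 0])  # one complex element makes the dtype complex128
print(array.real)  # [3.  4.5 3.  0. ]
print(array.imag)  # [0. 0. 5. 0.]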
I guess what you are looking for is to print the real numbers if they do not have an imaginary part, and if a number has an imaginary part, to print just the imaginary part.
import numpy as np
array = np.array([3,4.5,3+5j,0])
real = np.isreal(array)
print(array[real].real)
img = np.iscomplex(array)
print(array[img].imag)
# output
[ 3. 4.5 0. ]
[5.]
Is this right?
import numpy as np
array = np.array([3,4.5,3+5j,0, 12.5])
real = np.isreal(array)
# here I check for round numbers and cast values like 12.0 or 1.0 to int
print([int(i) if str(i).rsplit(".", 1)[-1] == '0' else i for i in array[real].real])
img = np.iscomplex(array)
print([complex(int(i.real),i.imag) for i in array[img]])
output:
[3, 4.5, 0, 12.5]
[(3+5j)]
I just appended 12.5 as a test to see how it works!
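For what it's worth, a slightly cleaner way to do that round-number check is float.is_integer(), which avoids the string round-trip (a sketch; the output should match the one above):
import numpy as np

array = np.array([3, 4.5, 3 + 5j, 0, 12.5])
real = np.isreal(array)
# float.is_integer() replaces the str(i).rsplit(...) check
print([int(v) if float(v).is_integer() else float(v) for v in array[real].real])
img = np.iscomplex(array)
print([complex(int(v.real), v.imag) for v in array[img]])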
I have a numpy matrix a defined as:
>>> a
array([[ 1.920941165 , 0.9518795607, 1.5358781432],
[-0.2418292026, 0.0851087409, -0.2760766872],
[-0.4161812806, 0.7409229185, -0.3248560283],
[-0.3439163186, 1.4052927665, -1.612850871 ],
[ 1.5810794171, 1.1820622504, 1.8063415367]])
If I typecast it to float32, it gives:
>>> a.astype(np.float32)
array([[ 1.9209411 , 0.95187956, 1.5358782 ],
[-0.2418292 , 0.08510874, -0.27607667],
[-0.41618127, 0.7409229 , -0.32485604],
[-0.34391633, 1.4052927 , -1.6128509 ],
[ 1.5810794 , 1.1820623 , 1.8063415 ]], dtype=float32)
When I convert original a matrix to a tensor, I get:
>>> torch.tensor(a)
tensor([[ 1.9209411650, 0.9518795607, 1.5358781432],
[-0.2418292026, 0.0851087409, -0.2760766872],
[-0.4161812806, 0.7409229185, -0.3248560283],
[-0.3439163186, 1.4052927665, -1.6128508710],
[ 1.5810794171, 1.1820622504, 1.8063415367]], dtype=torch.float64)
which looks correct, as it retains the original values from matrix a.
But when I convert the float32-typecast matrix to a tensor, I get different floating point numbers.
>>> torch.tensor(a.astype(np.float32))
tensor([[ 1.9209411144, 0.9518795609, 1.5358781815],
[-0.2418292016, 0.0851087421, -0.2760766745],
[-0.4161812663, 0.7409229279, -0.3248560429],
[-0.3439163268, 1.4052927494, -1.6128509045],
[ 1.5810793638, 1.1820622683, 1.8063415289]])
Why isn't the second tensor (the tensor from the type-cast matrix) equal to the second matrix (the type-cast one) shown above?
float32 has a 24-bit significand (roughly 7.2 decimal digits of precision); anything you see beyond that is not meaningful, e.g. 1.920941165 (9 digits).
This means that if you want to retain all the digits, you should keep the representation as float64. However, once you convert to float32, NumPy and torch hold exactly the same values; only the printing differs: torch prints as many digits as your print precision is set to, while NumPy truncates to the significant digits.
for example:
import numpy as np
import torch

np.set_printoptions(precision=10)
torch.set_printoptions(precision=10)
a = np.array([1.920941165], dtype=np.float32)
# array([1.9209411], dtype=float32)
t = torch.tensor(a, dtype=torch.float32)
# tensor([1.9209411144])
However, if you look at the underlying memory of both (as a uint32), they are the same:
np.ndarray(1, dtype=np.uint32, buffer=a)
# array([1073078630], dtype=uint32)

import ctypes
ptr = ctypes.cast(t.data_ptr(), ctypes.POINTER(ctypes.c_uint32))
ptr[0]
# 1073078630
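A simpler cross-check, if you do not want to poke at raw pointers: round-trip the float32 values through torch and compare (a sketch, assuming a is the original float64 matrix from the question):
import numpy as np
import torch

a32 = a.astype(np.float32)               # the type-cast matrix
t32 = torch.tensor(a32)                  # this copies, but preserves every bit
print(np.array_equal(a32, t32.numpy()))  # True: the stored values are identical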
My code
import numpy as np
from numpy import loadtxt
s = loadtxt("sest.txt", delimiter=" ", unpack=False)
b = loadtxt("base.txt", delimiter=" ", unpack=False)
d=b-s
e = np.absolute(d)
me = e.argsort()[-100:][::-1]
print(me)
I got
[400600 401600 399600 400601 401601 399601 401599 400599 399599 399602
401602 400602 399598 401598 400598 400603 401603 399603 401597 399597
401604 400597 399604 400604 400605 399605 401605 401596 399596 400596
399606 401606 400606 399595 401595 400595 399607 401607 400607 400608
400594 401608 399608 401594 399594 400609 401609 399609 401593 400593
399593 401610 400610 399610 400592 401592 399592 399611 400611 401611
399591 401612 401591 400612 400591 399612 399613 401613 400613 399590
400590 401590 400614 399614 401614 399589 400589 401589 401615 399615
400615 401616 399616 400616 400588 399588 401588 400617 401617 399617
401587 400587 399587 400618 399618 401618 399586 400586 401586 400619]
This works fine, but I want to select all elements in d that are larger than 2.5. I do not care whether there are 100 or 200; I just want everything above this threshold level. Is it possible to extend argsort to do this?
If you are just seeking array elements above a certain threshold value, you can use x[x > a], where a is the threshold. For purposes of illustration, here is an IPython session; let us assume x is some numpy array:
In [9]: x = np.random.rand(1, 10)  # an array with random elements
In [10]: print(x[x > 0.6])  # select elements above 0.6
[ 0.71733906 0.74028607 0.66293195 0.86649922 0.7423685 0.71807904
 0.8215429 ]
In [11]: print(x)
[[ 0.36655557 0.71733906 0.74028607 0.66293195 0.86649922 0.21478604
 0.7423685 0.71807904 0.30482062 0.8215429 ]]
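Applied to your case: that selects the values, and if you also still want the indices (which is what argsort gave you), np.nonzero on the boolean mask does it. A sketch, assuming e is the absolute-difference array from your code:
import numpy as np

mask = e > 2.5
print(e[mask])             # every value above the threshold
idx = np.nonzero(mask)[0]  # their positions in e
# to keep them ordered from largest to smallest, as argsort did:
print(idx[np.argsort(e[idx])[::-1]])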
afile is a given file, and z is the degree of the polynomial. I've been breaking my head over this for a while; it's frustrating how I'm basically given zero instructions on how to proceed.
This is what I thought it should be like:
import numpy as np

def newfile(afile, z):
    x, y = np.loadtxt(afile)
    d = np.polyfit(x, y, z)
    return d
I have attempted to do it as
data = np.loadtxt(afile)
x = data[0:]
By printing "data" I'm given this format:
[[ 2. 888.8425]
[ 6. 888.975 ]
[ 14. 888.1026]
[ 17. 888.2071]
[ 23. 886.0479]
[ 26. 883.3316]
[ 48. 877.04 ]
[ 99. 854.3665]]
Printing "x" in this case just gives me the whole array (I'm thinking the issue lies in the missing comma). I want x to be an array of the left column.
I suppose you are getting an error when unpacking in this statement:
x, y = np.loadtxt(afile)
You should replace it with this:
x, y = zip(*np.loadtxt(afile))
The rest should work.
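Alternatively, loadtxt can split the columns for you: unpack=True transposes the result so each column comes back as its own array, and explicit column slicing works too. A sketch, with afile and z as in your function:
import numpy as np

x, y = np.loadtxt(afile, unpack=True)  # x = left column, y = right column
d = np.polyfit(x, y, z)

# or slice the columns explicitly; note the comma (rows, columns):
data = np.loadtxt(afile)
x = data[:, 0]
y = data[:, 1]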
I have an array that looks like this:
A = [  # columns: id, r, d
    [47., 223.25190261, 58.0551391 ],
    [49., 223.25102751, 58.05662719],
    [57., 223.25013049, 58.05139459]]
The first column isn't important. The following two are, though: they are coordinates.
I have to compare EACH set of coordinates (column 2 and 3 together) against these coordinates:
(223.251, 58.05) by using this equation: B=sin(D)sin(d) + cos(D)cos(d)cos(R-r).
Where (R,D) are the original coordinates (223.251, 58.05) and (r,d) are the coordinates in the array.
How do I do this for each set of coordinates in the array without having to type in the numbers myself or redefine each value for the next row? I want the program to keep (R, D) fixed, substitute (r, d) from each row, and make the calculation; after it has done the calculation for each row, I want the results output. I really have no idea how to do this. I'm thinking maybe something with a loop, but I'm seriously lost.
The end of the code is this:
B = sin(dec0)*sin(dec1) + cos(dec0)*cos(dec1)*cos(ra0 - ra1)
print(B)
0.540302302454
But this only does the first row of coordinates; I want it done automatically for every row.
I'm not sure the formula is correct and the data representative, because your values are really close to each other. Anyway, to print the B value for each item in your data, you can use:
from math import radians, sin, cos

orig = (223.251, 58.05)
R = radians(orig[0])
D = radians(orig[1])
A = [[47., 223.25190261, 58.0551391 ],
     [49., 223.25102751, 58.05662719],
     [57., 223.25013049, 58.05139459]]
for item in A:
    r = radians(item[1])
    d = radians(item[2])
    B = sin(D)*sin(d) + cos(D)*cos(d)*cos(R - r)
    print(B)
If you have a numpy array as input, use the numpy module instead of math, of course.
If you are willing to use NumPy, the operations can be vectorized, avoiding the for loop:
from numpy import array, deg2rad, sin, cos

orig = (223.251, 58.05)
R = deg2rad(orig[0])
D = deg2rad(orig[1])
A = array([[47., 223.25190261, 58.0551391 ],
           [49., 223.25102751, 58.05662719],
           [57., 223.25013049, 58.05139459]])
r = deg2rad(A[:, 1])
d = deg2rad(A[:, 2])
B = sin(D)*sin(d) + cos(D)*cos(d)*cos(R - r)
where B is a numpy.ndarray containing the result for each line of A.
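Since B is just the cosine of the angular separation between the two points (this is the spherical law of cosines), you can make the closeness of your coordinates visible by converting it to an angle. A sketch that continues from the vectorized code above:
from numpy import arccos, rad2deg, clip

# clip guards against floating-point values a hair outside [-1, 1]
sep_arcsec = rad2deg(arccos(clip(B, -1.0, 1.0))) * 3600.0
print(sep_arcsec)  # angular separation of each row, in arcseconds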
The piece of code that I have looks somewhat like this:
from multiprocessing import Pool

glbl_array = ...  # a 3 GB array

def my_func(args, def_param=glbl_array):
    # do stuff on args and def_param
    ...

if __name__ == '__main__':
    pool = Pool(processes=4)
    pool.map(my_func, range(1000))
Is there a way to make sure (or encourage) that the different processes do not get a copy of glbl_array but share it? If there is no way to stop the copy I will go with a memmapped array, but my access patterns are not very regular, so I expect memmapped arrays to be slower. The above seemed like the first thing to try. This is on Linux. I just wanted some advice from Stack Overflow and do not want to annoy the sysadmin. Do you think it will help if the second parameter is a genuine immutable object like glbl_array.tostring()?
You can use the shared memory stuff from multiprocessing together with Numpy fairly easily:
import multiprocessing
import ctypes
import numpy as np

shared_array_base = multiprocessing.Array(ctypes.c_double, 10*10)
shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
shared_array = shared_array.reshape(10, 10)

# -- edited 2015-05-01: the assert check below checks the wrong thing
# with recent versions of Numpy/multiprocessing. That no copy is made
# is indicated by the fact that the program prints the output shown below.
## No copy was made
##assert shared_array.base.base is shared_array_base.get_obj()

# Parallel processing
def my_func(i, def_param=shared_array):
    shared_array[i, :] = i

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    pool.map(my_func, range(10))
    print(shared_array)
which prints
[[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[ 2. 2. 2. 2. 2. 2. 2. 2. 2. 2.]
[ 3. 3. 3. 3. 3. 3. 3. 3. 3. 3.]
[ 4. 4. 4. 4. 4. 4. 4. 4. 4. 4.]
[ 5. 5. 5. 5. 5. 5. 5. 5. 5. 5.]
[ 6. 6. 6. 6. 6. 6. 6. 6. 6. 6.]
[ 7. 7. 7. 7. 7. 7. 7. 7. 7. 7.]
[ 8. 8. 8. 8. 8. 8. 8. 8. 8. 8.]
[ 9. 9. 9. 9. 9. 9. 9. 9. 9. 9.]]
However, Linux has copy-on-write semantics on fork(), so even without using multiprocessing.Array, the data will not be copied unless it is written to.
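A minimal sketch of relying on that copy-on-write behaviour (Linux only, since it depends on fork): workers read a module-level array that is never passed, pickled, or copied:
import multiprocessing
import numpy as np

big = np.zeros(10**6)  # stand-in for the 3 GB array; children inherit it via fork

def reader(i):
    # read-only access: under copy-on-write, no page is duplicated
    return float(big[i])

if __name__ == '__main__':
    with multiprocessing.Pool(processes=4) as pool:
        print(pool.map(reader, range(10)))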
The following code works on Win7 and Mac (maybe on Linux, but not tested).
import multiprocessing
import ctypes
import numpy as np

# -- edited 2015-05-01: the assert check below checks the wrong thing
# with recent versions of Numpy/multiprocessing. That no copy is made
# is indicated by the fact that the program prints the output shown below.
## No copy was made
##assert shared_array.base.base is shared_array_base.get_obj()

shared_array = None

def init(shared_array_base):
    global shared_array
    shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
    shared_array = shared_array.reshape(10, 10)

# Parallel processing
def my_func(i):
    shared_array[i, :] = i

if __name__ == '__main__':
    shared_array_base = multiprocessing.Array(ctypes.c_double, 10*10)
    pool = multiprocessing.Pool(processes=4, initializer=init, initargs=(shared_array_base,))
    pool.map(my_func, range(10))
    shared_array = np.ctypeslib.as_array(shared_array_base.get_obj())
    shared_array = shared_array.reshape(10, 10)
    print(shared_array)
For those stuck using Windows, which does not support fork() (unless using Cygwin), pv's answer does not work. Globals are not made available to child processes.
Instead, you must pass the shared memory during the initializer of the Pool/Process as such:
#!/usr/bin/python
import time
from multiprocessing import Process, Queue, Array

def f(q, a):
    m = q.get()
    print(m)
    print(a[0], a[1], a[2])
    m = q.get()
    print(m)
    print(a[0], a[1], a[2])

if __name__ == '__main__':
    a = Array('B', (1, 2, 3), lock=False)
    q = Queue()
    p = Process(target=f, args=(q, a))
    p.start()
    q.put([1, 2, 3])
    time.sleep(1)
    a[0:3] = (4, 5, 6)
    q.put([4, 5, 6])
    p.join()
(it's not numpy and it's not good code but it illustrates the point ;-)
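On Python 3.8+ there is also multiprocessing.shared_memory, which works on Windows too; a minimal sketch of backing a NumPy array with a named shared block:
from multiprocessing import shared_memory
import numpy as np

# create a block big enough for a 10x10 float64 array
shm = shared_memory.SharedMemory(create=True, size=10 * 10 * 8)
arr = np.ndarray((10, 10), dtype=np.float64, buffer=shm.buf)
arr[:] = 0.0

# another process attaches by name instead of copying:
#   existing = shared_memory.SharedMemory(name=shm.name)
#   view = np.ndarray((10, 10), dtype=np.float64, buffer=existing.buf)

shm.close()
shm.unlink()  # release the block once every process has closed it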
If you are looking for an option that works efficiently on Windows and works well for irregular access patterns, branching, and other scenarios where you might need to analyze different matrices based on a combination of a shared-memory matrix and process-local data, the mathDict toolkit in the ParallelRegression package was designed to handle exactly this situation.
I know I am answering a very old question, but this approach does not work on Windows. The above answers were misleading without providing substantial proof, so I tried the following code.
# -*- coding: utf-8 -*-
from __future__ import annotations

import ctypes
import multiprocessing
import os
import time
from concurrent.futures import ProcessPoolExecutor

import numpy as np
import numpy.typing as npt

shared_np_array_for_subprocess: npt.NDArray[np.double]

def init_processing(shared_raw_array_obj: ctypes.Array[ctypes.c_double]):
    global shared_np_array_for_subprocess
    # shared_np_array_for_subprocess = np.frombuffer(shared_raw_array_obj, dtype=np.double)
    shared_np_array_for_subprocess = np.ctypeslib.as_array(shared_raw_array_obj)

def do_processing(i: int) -> int:
    print("\n--------------->>>>>>")
    print(f"[P{i}] input is {i} in process id {os.getpid()}")
    print(f"[P{i}] 0th element via np access: ", shared_np_array_for_subprocess[0])
    print(f"[P{i}] 1st element via np access: ", shared_np_array_for_subprocess[1])
    print(f"[P{i}] NP array's base memory is: ", shared_np_array_for_subprocess.base)
    np_array_addr, _ = shared_np_array_for_subprocess.__array_interface__["data"]
    print(f"[P{i}] NP array obj pointing memory address is: ", hex(np_array_addr))
    print("\n--------------->>>>>>")
    time.sleep(3.0)
    return i

if __name__ == "__main__":
    shared_raw_array_obj: ctypes.Array[ctypes.c_double] = multiprocessing.RawArray(ctypes.c_double, 128)  # 128 doubles * 8 B = 1 KiB
    # This array is malloced and zero-filled.
    print("Shared Allocated Raw array: ", shared_raw_array_obj)
    shared_raw_array_ptr = ctypes.addressof(shared_raw_array_obj)
    print("Shared Raw Array memory address: ", hex(shared_raw_array_ptr))
    # Assign data
    print("Assign 0, 1 element data in Shared Raw array.")
    shared_raw_array_obj[0] = 10.2346
    shared_raw_array_obj[1] = 11.9876
    print("0th element via ptr access: ", (ctypes.c_double).from_address(shared_raw_array_ptr).value)
    print("1st element via ptr access: ", (ctypes.c_double).from_address(shared_raw_array_ptr + ctypes.sizeof(ctypes.c_double)).value)
    print("Create NP array from the Shared Raw array memory")
    shared_np_array: npt.NDArray[np.double] = np.frombuffer(shared_raw_array_obj, dtype=np.double)
    print("0th element via np access: ", shared_np_array[0])
    print("1st element via np access: ", shared_np_array[1])
    print("NP array's base memory is: ", shared_np_array.base)
    np_array_addr, _ = shared_np_array.__array_interface__["data"]
    print("NP array obj pointing memory address is: ", hex(np_array_addr))
    print("NP array, Raw array point to same memory, no copies?: ", np_array_addr == shared_raw_array_ptr)
    print("Now that we have a native-memory-backed NP array, send it for multiprocessing.")
    with ProcessPoolExecutor(max_workers=4, initializer=init_processing, initargs=(shared_raw_array_obj,)) as process_executor:
        results = process_executor.map(do_processing, range(0, 2))
    print("All jobs submitted.")
    for result in results:
        print(result)
    print("Main process is going to shut down.")
    exit(0)
Here is the sample output:
Shared Allocated Raw array: <multiprocessing.sharedctypes.c_double_Array_128 object at 0x000001B8042A9E40>
Shared Raw Array memory address: 0x1b804300000
Assign 0, 1 element data in Shared Raw array.
0th element via ptr access: 10.2346
1st element via ptr access: 11.9876
Create NP array from the Shared Raw array memory
0th element via np access: 10.2346
1st element via np access: 11.9876
NP array's base memory is: <multiprocessing.sharedctypes.c_double_Array_128 object at 0x000001B8042A9E40>
NP array obj pointing memory address is: 0x1b804300000
NP array, Raw array point to same memory, no copies?:  True
Now that we have a native-memory-backed NP array, send it for multiprocessing.
--------------->>>>>>
[P0] input is 0 in process id 21852
[P0] 0th element via np access: 10.2346
[P0] 1st element via np access: 11.9876
[P0] NP array's base memory is: <memory at 0x0000021C7ACAFF40>
[P0] NP array obj pointing memory address is: 0x21c7ad60000
--------------->>>>>>
--------------->>>>>>
[P1] input is 1 in process id 11232
[P1] 0th element via np access: 10.2346
[P1] 1st element via np access: 11.9876
[P1] NP array's base memory is: <memory at 0x0000022C7FF3FF40>
[P1] NP array obj pointing memory address is: 0x22c7fff0000
--------------->>>>>>
All jobs submitted.
0
1
Main process is going to shut down.
The above output is from the following environment:
OS: Windows 10 20H2
Python: Python 3.9.9 (tags/v3.9.9:ccb0e6a, Nov 15 2021, 18:08:50) [MSC v.1929 64 bit (AMD64)]
You can clearly see that the numpy array's pointed-to memory address is different in every subprocess, which I take to mean memory copies are made. So on Windows, a subprocess does not share the underlying memory. I think this is due to OS protection: processes cannot refer to arbitrary pointer addresses in memory, as that would lead to memory access violations.