I am generating some arrays for a simulation and I want to save them in a JSON file, using the jsonpickle library.
The problem is that the arrays I need to save can be very large (hundreds of MB, up to a few GB). Thus, I need to save each array to the JSON file immediately after it is generated.
Currently, I am generating multiple independent large arrays, storing them in another array, and saving everything to the JSON file only after all of them have been generated:
N = 1000  # Number of arrays
z_NM = np.zeros((64000, 1), dtype=complex)  # Large array
z_NM_array = np.zeros((N, 64000, 1), dtype=complex)  # Array of z_NM arrays
for i in range(N):
    z_NM[:, 0] = GenerateArray()  # Generate array and store it in z_NM_array
    z_NM_array[i] = z_NM
# Write data to JSON file
data = {"z_NM_array": z_NM_array}
outdata = jsonpickle.encode(data)
with open(filename, "wb+") as f:
    f.write(outdata.encode("utf-8"))
I was wondering if it is instead possible to append the new data to the existing JSON file, by writing each array to the file immediately after its generation, inside the for loop? If so, how? And how can it be read back? Maybe using a library different from jsonpickle?
I know I could save each array in a separate file, but I'm wondering if there's a solution that lets me use a single file. I also have some settings in the dict which I want to save along with the array.
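One possible approach (a sketch, not from the original post) is to use a line-delimited "JSON Lines" layout: the settings dict goes on the first line, then each array is appended as its own JSON document as soon as it is generated, and reading back is one json.loads per line. The filename, the settings, and the small zero array standing in for GenerateArray() are all hypothetical; complex values are not JSON-native, so this sketch stores real and imaginary parts separately.

```python
import json
import numpy as np

filename = "arrays.jsonl"  # hypothetical filename
settings = {"N": 3}        # hypothetical settings dict

with open(filename, "w") as f:
    # Write the settings first, on their own line
    f.write(json.dumps({"settings": settings}) + "\n")
    for i in range(settings["N"]):
        z = np.zeros((4, 1), dtype=complex)  # stands in for GenerateArray()
        # complex is not JSON-native: store real/imag parts as plain lists
        record = {"i": i,
                  "re": z.real.ravel().tolist(),
                  "im": z.imag.ravel().tolist()}
        f.write(json.dumps(record) + "\n")  # one array per line, appended

# Reading back: parse one line at a time (no need to load the whole file)
with open(filename) as f:
    rows = [json.loads(line) for line in f]
```

Because each record is a complete JSON document on its own line, opening the file in append mode inside the loop would work just as well as the single `with` block shown here.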
I'm trying to save data from an array called Sevol. This matrix has 100 rows and 1000 columns, so each row Sevol[i] has 1000 elements and Sevol[0][0] is the first element of the first row.
I tried to save this array with the commands
np.savetxt(path + '/data_Sevol.txt', Sevol[i], delimiter=" ")
It works fine. However, I would like the file itself to be organized as an array. For example, currently the file looks like this when opened in Notepad (screenshot omitted), and I would like the data to remain organized, as in this example file (screenshot omitted).
Is there an argument in the np.savetxt function or something I can do to better organize the text file?
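Two things may help here (a sketch under the assumption that Sevol is a 2-D float array; the small 3x4 matrix below is a stand-in for the 100x1000 one): passing the whole 2-D array keeps one matrix row per text line, whereas a 1-D slice such as Sevol[i] is written one element per line, and the fmt argument gives every number the same width so columns line up in a text editor.

```python
import numpy as np

Sevol = np.arange(12.0).reshape(3, 4)  # small stand-in for the 100x1000 matrix

# fmt="%8.3f" renders every value in a fixed-width field, so the columns
# stay visually aligned when the file is opened in Notepad.
np.savetxt("data_Sevol.txt", Sevol, fmt="%8.3f", delimiter=" ")

with open("data_Sevol.txt") as f:
    lines = f.read().splitlines()  # one line per matrix row
```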
I have some image datasets and I want to convert them to a CSV file using np.savetxt, but I couldn't find any way to combine them into one CSV file. When I combine the dataset vectors with np.array, the result is not what I want (screenshot omitted). And when I try to merge multiple CSV files, even though they have different header names, they get combined under the same headers, which I don't want. Is there any way to combine them, or to save them as one file with np.savetxt?
(btw really sorry for my English and my question, I'm new at stackoverflow)
For example, I have these two CSV files (screenshots omitted), and I want something like this (screenshot omitted), but for multiple files.
here is my code:
import numpy as np
from PIL import Image

# x is the index of the first image, y is one past the last (set elsewhere)
while x != y:
    img = Image.open(f"0_resized/{x}.jpg").convert("L")
    arr = np.array(img)
    shape = arr.shape
    flat_arr = arr.ravel()
    np.savetxt(f"{x}.csv", flat_arr, fmt="%d")
    x += 1
Instead of creating several .csv files and combining them, we can build a list with the images and save it to one .csv file. To do so, we can make some small modifications to your code, as shown below:
list_arrays = []
while x != y:
    img = Image.open(f"0_resized/{x}.jpg").convert("L")
    arr = np.array(img)
    shape = arr.shape
    flat_arr = arr.ravel().tolist()
    list_arrays.append(flat_arr)
    x += 1
final_arrays = np.asarray(list_arrays)
np.savetxt("images.csv", final_arrays.T, delimiter=",")
In the code above, we created a list called list_arrays in which we store the flat arrays produced in the while loop. After reading all images and appending their flattened versions to the list, we convert it to an array using np.asarray.
The key point here is to save not the array itself but its transpose (final_arrays.T), which puts each image in a column.
I'm probably trying to reinvent the wheel here, but numpy has a fromfile() function that can read - I imagine - CSV files.
It appears to be incredibly fast - even compared to Pandas read_csv(), but I'm unclear on how it works.
Here's some test code:
import pandas as pd
import numpy as np
# Create the file here, two columns, one million rows of random numbers.
filename = 'my_file.csv'
df = pd.DataFrame({'a':np.random.randint(100,10000,1000000), 'b':np.random.randint(100,10000,1000000)})
df.to_csv(filename, index = False)
# Now read the file into memory.
arr = np.fromfile(filename)
print(len(arr))
I included the len() at the end there to make sure it wasn't reading just a single line. But curiously, the length for me (will vary based on your random number generation) was 1,352,244. Huh?
The docs show an optional sep parameter. But when that is used:
arr = np.fromfile(filename, sep = ',')
...we get a length of 0?!
Ideally I'd be able to load a 2D array of arrays from this CSV file, but I'd settle for a single array from this CSV.
What am I missing here?
numpy.fromfile is not made to read .csv files, instead, it is made for reading data written with the numpy.ndarray.tofile method.
From the docs:
A highly efficient way of reading binary data with a known data-type, as well as parsing simply formatted text files. Data written using the tofile method can be read using this function.
By using it without a sep parameter, numpy assumes you are reading a binary file, hence the different lengths. When you specify a separator, I guess the function just breaks.
To read a .csv file using numpy, I think you can use numpy.genfromtxt or numpy.loadtxt (from this question).
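The contrast described above can be sketched as follows (the file names and the tiny CSV are made up for illustration): fromfile round-trips raw binary written with ndarray.tofile, where the caller must supply the dtype because it is not stored in the file, while loadtxt does the text parsing that a CSV actually needs.

```python
import numpy as np

# fromfile round-trips raw binary written with ndarray.tofile; the dtype
# (and any shape) must be supplied by the caller, they are not in the file.
a = np.arange(6, dtype=np.int64)
a.tofile("raw.bin")
b = np.fromfile("raw.bin", dtype=np.int64)

# For CSV text, loadtxt (or genfromtxt) does the parsing; skiprows skips
# the header line that pandas' to_csv writes.
with open("tiny.csv", "w") as f:
    f.write("a,b\n1,2\n3,4\n")
arr = np.loadtxt("tiny.csv", delimiter=",", skiprows=1)
```

This also explains the length of 1,352,244 in the question: without a dtype, fromfile reads the CSV's raw bytes as float64, eight bytes at a time, so the count has nothing to do with the number of rows.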
I am trying to add the features of multiple images to a CSV file, converting them from raw data format to .csv.
I have read and displayed the features of two images via the print function, but when adding the contents to the CSV I am only able to add a single numpy array. I want to add a few thousand images to the same CSV.
Below is the printed output, but the CSV only shows one array (the features of a single image).
(Screenshot of code and output omitted.)
I have done this using the following code:
with open("output.csv", "a") as f:
    writer = csv.writer(f)
    writer.writerows(data_read[2])
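A possible fix (a sketch; data_read's real shape is not shown in the question, so the sample list below is hypothetical): instead of writing only data_read[2], open the file once and write one row per image, so every image's feature vector lands in the same CSV.

```python
import csv

# Hypothetical stand-in for the per-image feature vectors
data_read = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

with open("output.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for features in data_read:   # one row per image, all in the same file
        writer.writerow(features)

# Read the file back to check that all images were written
with open("output.csv", newline="") as f:
    rows = list(csv.reader(f))
```

Opening with mode "a" instead of "w" would let separate runs of the script keep appending to the same file, which matches the append mode used in the question.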
I need to read a file into a numpy array. The program only has access to the binary data from the file, and the original file extension if needed. The data the program receives would look something like the "data" shown below.
data = open('file.csv', 'rb').read()
I need to generate an array from this binary data. I do not have permission to write the data to a file so doing that then sending the file to numpy won't work.
Is there some way I can treat the binary data like a file so I can use the numpy function below?
my_data = genfromtxt(data, delimiter=',')
Thanks.
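One way to do this (a sketch; the byte string below stands in for the data read from file.csv) is to wrap the in-memory bytes in io.BytesIO, which gives genfromtxt the file-like object it expects, with no write to disk needed.

```python
import io
import numpy as np

# Stands in for: data = open('file.csv', 'rb').read()
data = b"1,2,3\n4,5,6\n"

# BytesIO makes the bytes look like an open file to genfromtxt
my_data = np.genfromtxt(io.BytesIO(data), delimiter=",")
```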