loading csv files - SyntaxError: invalid syntax (python 3.8) - python

I was working on a project that requires me to add a csv file in two places of the code. I have seen a similar problem here on Stack Overflow, but that one was caused by an old Python version (2.5). My Python version is 3.8.
import csv
from tensorflow.keras.datasets import mnist
import numpy as np

def load_az_dataset("C:\A_Z_Handwritten_Data\A_Z_Handwritten_Data.csv"):
    # initialize the list of data and labels
    data = []
    labels = []
    # loop over the rows of the A-Z handwritten digit dataset
    for row in open("C:\A_Z_Handwritten_Data\A_Z_Handwritten_Data.csv"):
        # parse the label and image from the row
        row = row.split(",")
        label = int(row[0])
        image = np.array([int(x) for x in row[1:]], dtype="uint8")
        # images are represented as single channel (grayscale) images
        # that are 28x28=784 pixels -- we need to take this flattened
        # 784-d list of numbers and reshape them into a 28x28 matrix
        image = image.reshape((28, 28))
        # update the list of data and labels
        data.append(image)
        labels.append(label)
    # convert the data and labels to NumPy arrays
    data = np.array(data, dtype="float32")
    labels = np.array(labels, dtype="int")
    # return a 2-tuple of the A-Z data and labels
    return (data, labels)
It's showing a syntax error.

The syntax error is caused by the fact that the file path is in the parameter list of the function definition. This is the culprit:
def load_az_dataset("C:\A_Z_Handwritten_Data\A_Z_Handwritten_Data.csv"):
You have no parameters listed in the function definition. You just have a literal string.
Furthermore, you should either use raw strings (r"...") or escape your backslashes, as others have mentioned.
Finally, you should be using the with open(file_path) as f: pattern to open your file.
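Putting those points together, a corrected version might look like this (a sketch that reuses the question's own parsing logic):
def load_az_dataset(dataset_path):
    data = []
    labels = []
    # with open(...) closes the file automatically, even on errors
    with open(dataset_path) as f:
        for row in f:
            row = row.split(",")
            labels.append(int(row[0]))
            image = np.array([int(x) for x in row[1:]], dtype="uint8")
            data.append(image.reshape((28, 28)))
    return np.array(data, dtype="float32"), np.array(labels, dtype="int")

# a raw string avoids the backslash-escaping problem
data, labels = load_az_dataset(r"C:\A_Z_Handwritten_Data\A_Z_Handwritten_Data.csv")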

The syntax error is caused because you are passing a literal string in the function definition of load_az_dataset.
You need to define the parameter to the function as:
def load_az_dataset(fileName):
Further, if you want to add that file as the default value for the parameter then use:
def load_az_dataset(fileName="C:\\A_Z_Handwritten_Data\\A_Z_Handwritten_Data.csv"):
Also, separate from the syntax error, you need to escape each \ with another \.
Try:
open("C:\\A_Z_Handwritten_Data\\A_Z_Handwritten_Data.csv")

Related

Strange behaviour when passing a function into a Tensorflow dataset map method

This was working perfectly fine for me earlier today, but it suddenly started behaving very strangely when I restarted my notebook.
I have a tf dataset that takes in numpy files and their corresponding labels as input, like so: tf.data.Dataset.from_tensor_slices((specgram_files, labels)).
When I take 1 item using for item in ds.take(1): print(item) I get the expected output, which is a tuple of tensors, where the first tensor contains the name of the numpy file as a bytes string and the second tensor contains the encoded label.
I then have a function that reads the file using np.load() and produces a numpy array, which is then returned. This function is passed into the map() method, and it looks like this:
ds = ds.map(
    lambda file, label: tuple([tf.numpy_function(read_npy_file, [file], [tf.float32]), label]),
    num_parallel_calls=tf.data.AUTOTUNE)
where read_npy_file looks like this:
def read_npy_file(data):
    # 'data' stores the file name of the numpy binary file storing the features of a particular sound file
    # as a bytes string.
    # decode() is called on the bytes string to decode it from a bytes string to a regular string
    # so that it can be passed as a parameter into np.load()
    data = np.load(data.decode())
    return data.astype(np.float32)
As you can see, the mapping should create another tuple of tensors, where the first tensor is the numpy array and the second tensor is the label, untouched. This worked perfectly earlier, but now it gives the most bizarre behaviour. I placed print statements in the read_npy_file() function to see if the correct data was being passed in. I expected it to pass a single bytes string, but it instead produces this output when I call print(data) in the read_npy_file() function and take 1 item from the dataset to trigger one mapping using ds.take(1):
b'./challengeA_data/log_spectrogram/2603ebb3-3cd3-43cc-98ef-0c128c515863.npy'b'./challengeA_data/log_spectrogram/fab6a266-e97a-4935-a0c3-444fc4426fc5.npy'b'./challengeA_data/log_spectrogram/93014682-60a2-45bd-9c9e-7f3c97b83be9.npy'b'./challengeA_data/log_spectrogram/710f2430-5da3-4822-a252-6ad3601b92d9.npy'b'./challengeA_data/log_spectrogram/e757058c-91de-4381-8184-65f001c95647.npy'
b'./challengeA_data/log_spectrogram/38b12689-04ba-422b-a972-5856b05ca868.npy'
b'./challengeA_data/log_spectrogram/7c9ccc04-a2d2-4eec-bafd-0c97b3658c26.npy'b'./challengeA_data/log_spectrogram/c7cc3520-7218-4d07-9f0a-6bd7bb90a551.npy'
b'./challengeA_data/log_spectrogram/21f6060a-9766-4810-bd7c-0437f47ccb98.npy'
I didn't modify any formatting of the output.
I'd greatly appreciate any help. TFDS has been an absolute nightmare to work with haha.
Here's the full code:
def read_npy_file(data):
    # 'data' stores the file name of the numpy binary file storing the features of a particular sound file
    # as a bytes string.
    # decode() is called on the bytes string to decode it from a bytes string to a regular string
    # so that it can be passed as a parameter into np.load()
    print(data)
    data = np.load(data.decode())
    return data.astype(np.float32)

specgram_ds = tf.data.Dataset.from_tensor_slices((specgram_files, labels))
specgram_ds = specgram_ds.map(
    lambda file, label: tuple([tf.numpy_function(read_npy_file, [file], [tf.float32]), label]),
    num_parallel_calls=tf.data.AUTOTUNE)

num_files = len(train_df)
num_train = int(0.8 * num_files)
num_val = int(0.1 * num_files)
num_test = int(0.1 * num_files)

specgram_ds = specgram_ds.shuffle(buffer_size=1000)
specgram_train_ds = specgram_ds.take(num_train)
specgram_test_ds = specgram_ds.skip(num_train)
specgram_val_ds = specgram_test_ds.take(num_val)
specgram_test_ds = specgram_test_ds.skip(num_val)

# iterating over one item to trigger the mapping function
for item in specgram_ds.take(1):
    pass
Thanks!
Your logic seems to be fine. You are actually just observing the behavior of tf.data.AUTOTUNE in combination with print(). According to the docs:
If the value tf.data.AUTOTUNE is used, then the number of parallel calls is set dynamically based on available CPU.
You can run the following code a few times to observe the changes:
import tensorflow as tf
import numpy as np

def read_npy_file(data):
    # 'data' stores the file name of the numpy binary file storing the features of a particular sound file
    # as a bytes string.
    # decode() is called on the bytes string to decode it from a bytes string to a regular string
    # so that it can be passed as a parameter into np.load()
    print(data)
    data = np.load(data.decode())
    return data.astype(np.float32)

# Create dummy data
for i in range(4):
    np.save('{}-array'.format(i), np.random.random((5,5)))

specgram_files = ['/content/0-array.npy', '/content/1-array.npy', '/content/2-array.npy', '/content/3-array.npy']
labels = [1, 0, 0, 1]
specgram_ds = tf.data.Dataset.from_tensor_slices((specgram_files, labels))
specgram_ds = specgram_ds.map(
    lambda file, label: tuple([tf.numpy_function(read_npy_file, [file], [tf.float32]), label]),
    num_parallel_calls=tf.data.AUTOTUNE)

num_files = len(specgram_files)
num_train = int(0.8 * num_files)
num_val = int(0.1 * num_files)
num_test = int(0.1 * num_files)

specgram_ds = specgram_ds.shuffle(buffer_size=1000)
specgram_train_ds = specgram_ds.take(num_train)
specgram_test_ds = specgram_ds.skip(num_train)
specgram_val_ds = specgram_test_ds.take(num_val)
specgram_test_ds = specgram_test_ds.skip(num_val)

for item in specgram_ds.take(1):
    pass
Also see this. Finally, note that using tf.print instead of print should get rid of any side effects.
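For example, that one change inside the mapped function would look like this (a sketch only):
def read_npy_file(data):
    # tf.print routes output through the TensorFlow runtime rather than
    # Python's stdout buffering, which avoids the interleaved output seen above
    tf.print(data)
    data = np.load(data.decode())
    return data.astype(np.float32)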

Converting pixels into wavelength using 2 FITS files

I am new to Python and FITS image files, so I am running into issues. I have two FITS files: the first is pixels/counts and the second (a calibration file) is pixels/wavelength. I need to convert pixels/counts into wavelength/counts. Once this is done, I need to output wavelength/counts as a new FITS file for further analysis. So far I have managed to load the required data into arrays, as shown in the code below.
import numpy as np
from astropy.io import fits
# read the images
image_file = ("run_1.fits")
image_calibration = ("cali_1.fits")
hdr = fits.getheader(image_file)
hdr_c = fits.getheader(image_calibration)
# print headers
sp = fits.open(image_file)
print('\n\nHeader of the spectrum :\n\n', sp[0].header, '\n\n')
sp_c = fits.open(image_calibration)
print('\n\nHeader of the spectrum :\n\n', sp_c[0].header, '\n\n')
# generation of arrays with the wavelengths and counts
count = np.array(sp[0].data)
wave = np.array(sp_c[0].data)
I do not understand how to save two separate arrays into one FITS file. I tried an alternative approach by creating a list, as shown in this code:
file_list = fits.open(image_file)
calibration_list = fits.open(image_calibration)
image_data = file_list[0].data
calibration_data = calibration_list[0].data
# make a list to hold images
img_list = []
img_list.append(image_data)
img_list.append(calibration_data)
# list to numpy array
img_array = np.array(img_list)
# save the array as fits - image cube
fits.writeto('mycube.fits', img_array)
However, I could only save the data as a cube, which is not correct because I just need the wavelength and counts data. Also, I lost all the headers in the newly created FITS file. To say I am lost is an understatement! Could someone point me in the right direction please? Thank you.
I am still working on this problem. I have now managed (I think) to produce a FITS file containing the wavelength and counts using this website:
https://www.mubdirahman.com/assets/lecture-3---numerical-manipulation-ii.pdf
This is my code:
# Making a Primary HDU (required); this makes a header.
primaryhdu = fits.PrimaryHDU(flux)
# Or, if you have a header that you've created:
# primaryhdu = fits.PrimaryHDU(arr1, header=head1)

# If you have additional extensions:
secondhdu = fits.ImageHDU(wave)

# Making a new HDU List:
hdulist1 = fits.HDUList([primaryhdu, secondhdu])

# Writing the file:
hdulist1.writeto("filename.fits", overwrite=True)

image = ("filename.fits")
hdr = fits.open(image)
image_data = hdr[0].data
wave_data = hdr[1].data
I am sure this is not the correct format for wavelength/counts. I need both wavelength and counts to be contained in hdr[0].data
If you are working with spectral data, it might be useful to look into specutils, which is designed for common tasks associated with reading/writing/manipulating spectra.
It's common to store spectral data in FITS files using tables, rather than images. For example you can create a table containing wavelength, flux, and counts columns, and include the associated units in the column metadata.
The docs include an example on how to create a generic "FITS table" writer with wavelength and flux columns. You could start from this example and modify it to suit your exact needs (which can vary quite a bit from case to case, which is probably why a "generic" FITS writer is not built-in).
You might also be able to use the fits-wcs1d format.
If you prefer not to use specutils, that example still might be useful as it demonstrates how to create an Astropy Table from your data and output it to a well-formatted FITS file.
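For a rough idea of the table route, a minimal sketch (the wave and count arrays come from the question; the Angstrom unit is an assumption, so substitute whatever your calibration file actually provides):
from astropy.table import Table
import astropy.units as u

# Build a two-column table: one row per pixel
t = Table()
t['wavelength'] = wave.flatten() * u.AA  # unit is an assumption
t['counts'] = count.flatten()

# Writes a FITS binary table extension; the units end up in the column metadata
t.write('spectrum_table.fits', format='fits', overwrite=True)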

Reading Dataset from files where some might be missing

I'm trying to load files into a TensorFlow Dataset where some files might be missing (in which case I want to replace them with zeroes).
The structure of directories that I'm trying to read data from is as follows:
|-data
|---sensor_A
|-----1.dat
|-----2.dat
|-----3.dat
|---sensor_B
|-----1.dat
|-----2.dat
|-----3.dat
.dat files are .csv files with a space as the separator. The content of every file is a single, multi-row observation where the number of columns is constant (say 4) and the number of rows is unknown (time-series data).
I've successfully managed to read every sensor data to a separate TensorFlow Dataset with the following code:
import os
import tensorflow as tf

tf.enable_eager_execution()

data_root_dir = "data"
modalities_to_use = ["sensor_A", "sensor_B"]
timestamps = [1, 2, 3]

for mod_idx, modality in enumerate(modalities_to_use):
    # Will produce: ['data/sensor_A/1.dat', 'data/sensor_A/2.dat', 'data/sensor_A/3.dat']
    filenames = [os.path.join(data_root_dir, modality, str(timestamp) + ".dat") for timestamp in timestamps]
    dataset = tf.data.Dataset.from_tensor_slices((filenames,))

    def _parse_function_internal(filename):
        number_of_columns = 4
        single_observation = tf.read_file(filename)
        # Tokenise every value so we can cast these to floats later.
        single_observation = tf.string_split([single_observation], sep='\r\n ').values
        single_observation = tf.reshape(single_observation, (-1, number_of_columns))
        single_observation = tf.strings.to_number(single_observation, tf.float32)
        return filename, single_observation

    dataset = dataset.map(_parse_function_internal)

    print('Result:')
    for el in dataset:
        try:
            # Filename
            print(el[0])
            # Parsed file content
            print(el[1])
        except tf.errors.OutOfRangeError:
            break
which successfully prints out the content of all three files for every sensor.
My problem is that some timestamps in the dataset might be missing. For instance, if the file 1.dat in the sensor_A directory is missing, I get this error:
tensorflow.python.framework.errors_impl.NotFoundError: NewRandomAccessFile failed to Create/Open: mock_data\sensor_A\1.dat : The system cannot find the file specified.
; No such file or directory
[[{{node ReadFile}}]] [Op:IteratorGetNextSync]
which is thrown in this line:
for el in dataset:
What I've tried to do is surround the call to the tf.read_file() function with a try block, but obviously it doesn't work, as the error is not thrown when tf.read_file() is called, but when the value is fetched from the dataset. Later I want to pass this dataset to a Keras model, so I can't just surround it with a try block. Is there any workaround? Is this even supported?
Thanks!
I managed to solve the problem; sharing the solution in case someone else struggles with it as well. I had to use an additional list of booleans specifying whether each file actually exists and pass it into the mapper. Then, using the tf.cond() function, we decide whether to read the file or mock the data with zeroes (or any other logic).
import os
import tensorflow as tf

tf.enable_eager_execution()

data_root_dir = "data"
modalities_to_use = ["sensor_A", "sensor_B"]
timestamps = [1, 2, 3]

for mod_idx, modality in enumerate(modalities_to_use):
    # Will produce: ['data/sensor_A/1.dat', 'data/sensor_A/2.dat', 'data/sensor_A/3.dat']
    filenames = [os.path.join(data_root_dir, modality, str(timestamp) + ".dat") for timestamp in timestamps]
    files_exist = [os.path.isfile(filename) for filename in filenames]
    dataset = tf.data.Dataset.from_tensor_slices((filenames, files_exist))

    def _parse_function_internal(filename, file_exist):
        number_of_columns = 4
        single_observation = tf.cond(file_exist, lambda: tf.read_file(filename), lambda: ' '.join(['0.0'] * number_of_columns))
        # Tokenise every value so we can cast these to floats later.
        single_observation = tf.string_split([single_observation], sep='\r\n ').values
        single_observation = tf.reshape(single_observation, (-1, number_of_columns))
        single_observation = tf.strings.to_number(single_observation, tf.float32)
        return filename, single_observation

    dataset = dataset.map(_parse_function_internal)

    print('Result:')
    for el in dataset:
        try:
            # Filename
            print(el[0])
            # Parsed file content
            print(el[1])
        except tf.errors.OutOfRangeError:
            break
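Note that this code targets TF 1.x (hence tf.enable_eager_execution()). On TF 2.x, the same conditional read should work through the tf.io namespace; a sketch of just that line, assuming TF >= 2.0 and not tested against the original data:
single_observation = tf.cond(
    file_exist,
    lambda: tf.io.read_file(filename),                           # read the file when it exists
    lambda: tf.constant(' '.join(['0.0'] * number_of_columns)))  # otherwise substitute one row of zeroes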

Creating image files that are named in numerical sequence

I have a script that's supposed to open a png image, resize it, and then save it as a jpg in numerical sequence. But the code for the number sequencing, which I copied from the internet, isn't working with PIL. It gives me the exception "KeyError: 'W'".
import os
from PIL import Image

os.chdir('C:\\Users\\paul\\Downloads')
# open canvas.png
original = Image.open('canvas.png')
# resize image height to 2160
size = (3000, 2160)
original.thumbnail(size)
# convert to RGB
RGB = original.convert('RGB')
# save image as sequence
i = 0
while os.path.exists("image%s.jpg" % i):
    i += 1
RGB.save("image%s.jpg" % i, "w")
Is there another way to do this?
Edit based on Haken Lid's comment
The PIL documentation says that the save function accepts these arguments:
Image.save(fp, format=None, **params)
The parameter w you passed is not among the accepted file formats.
Here you can see which formats are accepted. To make it work, just drop the w argument and substitute %s with %d (i is an integer, not a string):
RGB.save("image%d.jpg" % i)
Note: from your tags it is not clear whether you're using Python 2 or Python 3. If you are using Python 3, I suggest the newer method of formatting strings:
RGB.save("image{}.jpg".format(i))
You can even specify a padding so that you can sort your file by name later on:
RGB.save("image{:04d}.jpg".format(i))
where 4 means that your number will be zero-padded to a length of at least 4.
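On Python 3.6+, an f-string is an equivalent spelling of the same format (just the modern form, not from the original answer):
RGB.save(f"image{i:04d}.jpg")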

Value error: numpy.loadtxt could not convert string to float

I am trying to plot a graph, but the data in text form cannot be read correctly: I receive the message "ValueError: could not convert string to float".
from matplotlib import pyplot as plt
import numpy as np
y,x = np.loadtxt('C:\\Users\\Sarah\\Downloads\\XRDdata.txt', unpack = True, delimiter = ';')
plt.plot(x,y)
plt.title('Diffractogramme de la substance KNO3')
plt.ylabel('Intensité (u.a.)')
plt.xlabel('Angle 2θ (°)')
plt.show()
My data looks like this:
19.04;24.5
19.37;11.57
23.57;100
23.84;55.4
27.74;1.7
29.06;5.69
29.44;65.53
32.38;33.95
32.44;7.54
33.09;19.97
33.68;36.61
33.87;48.58
34.06;24.59
37.35;8.61
38.01;4.57
38.63;9.22
39.32;2.83
40.74;1.59
41.2;52.8
41.85;25.27
43.71;11.8
44.18;21.33
45.22;2.31
46.64;21.82
46.79;9.81
47.02;7.97
48.22;2.96
48.8;1.97
51.62;1.67
51.86;3.78
etc.
How can I make it work?
Thank you for your consideration of my troubles with this program.
Just to keep this question from going unanswered:
The problem comes from the fact that you have a line containing a lot of whitespace (line 64 in this case) in your data.
One option is of course to manually delete them.
The other option is to use np.genfromtxt() instead of np.loadtxt().
x,y = np.genfromtxt('XRDdata.txt', unpack = True, delimiter = ';' )
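If you'd rather keep np.loadtxt, you can automate the first option by filtering out the whitespace-only lines yourself (a small sketch; np.loadtxt also accepts a list of strings in place of a filename):
# Keep only lines that contain actual data
with open('XRDdata.txt') as f:
    lines = [line for line in f if line.strip()]
y, x = np.loadtxt(lines, unpack=True, delimiter=';')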
