How to load images from a folder into pandas with class name - python

I have a folder containing different images like dell_01.png, hp_01.png, toshiba_01.png and would like to create a dataframe from it that looks like this:
If the filename starts with hp, it should be assigned to class 0. If it starts with toshiba, it should be assigned to class 1. If it starts with dell, it should be assigned to class 2, as seen in the expected dataframe output below.
filename        class
hp_01.png       0
toshiba_01.png  1
dell_01.png     2

Break the problem up:
I have a folder... different images
So you need to get the filenames from the folder. Use pathlib.Path.glob:
from pathlib import Path

for fn in Path("/path/to/folder/").glob("*.png"):
    ...
if the file starts with hp... class 0, toshiba... class 1, dell... class 2
so you have a condition.
if fn.stem.startswith("hp"):
    label = 0  # "class" is a reserved word in Python, so use a different name
elif ...
Now you can solve the two parts individually.
in a dataframe
Use the two constructs above to make a dictionary for every file. Then make a dataframe from that list of dictionaries. Your code will look something like this:
files = []
for fn in Path("/path/to/pngs").glob("*.png"):
    if fn.stem.startswith("hp"):
        label = 0
    ...
    files.append({"filename": fn.name, "class": label})
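The finishing step the sketch leaves out is turning that list of dicts into a dataframe; a minimal version, assuming the files list built above, is just:
import pandas as pd

df = pd.DataFrame(files)  # columns: "filename" and "class"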
(Yes, there are more direct routes to getting this into a dataframe, but I was trying to make what was going on clearer.)
Does that help? I've deliberately avoided just writing the answer for you, but tried to get just close enough you can fill in the rest.

Related

Can I import a python file and change a variable and save it permanently?

I created a file 'user.py' and gave it a variable coin with the value 100:
coin = 100
I created another file and imported it:
import user
print(user.coin) # Output 100
user.coin = 50
This variable is not updated in the 'user.py' file, so the change from 100 to 50 does not persist.
I want 'user.py' itself to end up containing:
coin = 50
That's not how programming works. You almost certainly don't want to change the actual source code during execution.
What you are planning is really a persistence topic. You could create a user object that has a coins attribute and then store it somewhere - a file or a database, for example. On the next execution you continue from that stored state, but your code itself should only be modified by you opening the file, editing it and saving it again.
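A minimal sketch of that persistence idea, assuming a hypothetical user_data.json file (the filename and structure are made up for illustration):
import json
from pathlib import Path

state_file = Path("user_data.json")
# load the previous state, or fall back to a default on the first run
state = json.loads(state_file.read_text()) if state_file.exists() else {"coin": 100}

state["coin"] = 50                        # change the value in memory
state_file.write_text(json.dumps(state))  # persist it for the next run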
The variable 'coin' is assigned statically in user.py. You cannot change that at runtime. To change the assignment you would need to open user.py as a text file and edit it accordingly.
See here.
Assuming you know what you are doing, and there is really no way to persist this data in JSON/YAML/XML instead, do this:
import re
from inspect import getsource

import user

# new values to write back into user.py
patterns = {
    'coin': 200
}

text = getsource(user)
for key, value in patterns.items():
    # find the assignment line for this variable and replace it
    found = re.search(f'{key}.*', text).group()
    text = text.replace(found, f'{key} = {value}')

with open('user.py', 'w') as arq:
    arq.write(text)

How to generate auto increment in a folder in python?

I am new to Python. Can anyone help with how to generate an auto-incrementing name like B00001, B00002, B00003, ... so that clicking a button automatically saves the Excel file under that name in a specific folder?
I have tried with
global numXlsx
numXlsx = 1
wb.save(f'invoice/B{numXlsx}.xlsx')
numXlsx += 1
But when I click the button a few times with different data, it still keeps overwriting the B1.xlsx file. Can anyone help with this? :)
It sounds like the biggest problem you're having is that each button click is re-starting the execution of your python script, so using a global variable won't work since that doesn't persist across executions. In this case, I'd suggest using something like the pickle module to store and reload your counter value each time you execute your script. Using that module, your solution could look something like this:
import pickle
from pathlib import Path

# creates file if it doesn't exist
myfile = Path("save.p")
myfile.touch(exist_ok=True)

persisted = {}
with open(myfile, "rb") as f:
    try:
        persisted = pickle.load(f)
    except EOFError:
        print("file was empty, nothing to load")

# use get() to avoid KeyError if key doesn't exist
if persisted.get('counter') is None:
    persisted['counter'] = 1

wb.save(f"invoice/B{persisted.get('counter')}.xlsx")
persisted['counter'] += 1

# save everything back into the same file to be used next execution
with open(myfile, "wb") as f:
    pickle.dump(persisted, f)
BONUS: If you want the count to be zero-padded in the filename, use {persisted.get('counter'):05d} in the curly braces when saving the file. The 5 means the resulting value will be at least 5 digits long, so for example 2 becomes 00002 and 111 becomes 00111.
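For example, the save line above would then become:
wb.save(f"invoice/B{persisted.get('counter'):05d}.xlsx")  # e.g. invoice/B00002.xlsx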
You can try using a global variable and incrementing it every time.
Try something like this (initialize it to 0 first):
global numXlsx  # this is your counter variable
wb.save(f'folder/B{numXlsx}.xlsx')
numXlsx += 1  # increment the variable so it does not overwrite the file as your code is doing
Have a nice day!

How do I import images with filenames corresponding to column values in a dataframe?

I'm a doctor trying to learn some code for work, and was hoping you could help me solve a problem I have with regards to importing multiple images into python.
I am working in Jupyter Notebook, where I have created a dataframe (named df_1) using pandas. In this dataframe each row represents a patient, and the first column shows the case number for each patient (e.g. 85).
Now, what I want to do is import multiple images (.bmp) from a given folder (same location as the .ipynb file). There are many images in this folder, and I do not want all of them - only the ones whose filenames correspond to the "case_number" column in my dataframe (e.g. 85.bmp).
I already read this post, but I must admit it was way too complicated for me to understand.
Is there some simple loop (or something else) I could create to import all images with filenames corresponding to the values of the "case number" column in the dataframe?
I was imagining something like the below would be possible, I just do not know how to write it.
for i=[(df_1['case_number'()]
cv2.imread('[i].bmp')
The images don't really need to be put into the dataframe, but I would like to be able to view them in my notebook afterwards by using e.g. plt.imshow(85).
Here is an image of the head of my dataframe
Thank you for helping!
You can access all of your files using this:
import cv2
import matplotlib.pyplot as plt

imageList = []
for i in range(len(df_1)):
    # build a path like ./85.bmp from the case_number column
    imageList.append('./' + str(df_1['case_number'][i]) + '.bmp')

plt.imshow(cv2.imread(imageList[x]))  # x is the index of the image you want to see
This loops through every item in the case_number column. The ./ means the file is in the current directory, and converting the case number to a string and joining the pieces produces a readable file path such as ./85.bmp, which should point at your desired file. The filenames are appended to the list so they can be looked up later, read with cv2.imread() and displayed with plt.imshow().
If you would like to access the files based on their name, you can use another variable (which could be set as an input) and implement the code below
fileName = input('Enter Your Value: ')
inputFile = imageList.index('./' + fileName + '.bmp')
and from here you could use the same display call as above, but replace x with the inputFile variable.
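Putting those two pieces together, a minimal sketch (reusing the imageList built above) could look like:
fileName = input('Enter Your Value: ')  # e.g. 85
inputFile = imageList.index('./' + fileName + '.bmp')
plt.imshow(cv2.imread(imageList[inputFile]))
plt.show()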

Pythonic way to modify a for loop

I'm using python in the lab to control measurements. I often find myself looping over a value (let's say voltage), measuring another (current) and repeating that measurement a couple of times to be able to average the results later. Since I want to keep all the measured data, I like to write it to disk immediately and to keep things organized I use the hdf5 file format. This file format is hierarchical, meaning it has some sort of directory structure inside that uses Unix style names (e.g. / is the root of the file). Groups are the equivalent of directories and datasets are more or less equivalent to files and contain the actual data. The code resulting from such an approach looks something like:
import h5py

hdf_file = h5py.File('data.h5', 'w')
for v in range(5):
    group = hdf_file.create_group('/' + str(v))
    v_source.voltage = v
    for i in range(3):
        group2 = group.create_group(str(i))
        current = i_probe.current
        group2.create_dataset('current', data=current)
hdf_file.close()
I've written a small library to handle the communication with instruments in the lab and I want this library to automatically store the data to file, without explicitly instructing to do so in the script. The problem I run into when doing this is that the groups (or directories if you prefer) still need to be explicitly created at the start of the for loop. I want to get rid of all the file handling code in the script and therefore would like some way to automatically write to a new group on each iteration of the for loop. One way of achieving this would be to somehow modify the for statement itself, but I'm not sure how to do this. The for loop can of course be nested in more elaborate experiments.
Ideally I would be left with something along the lines of:
import h5py

hdf_file = h5py.File('data.h5', 'w')
for v_source.voltage in range(5):  # v_source.voltage = x sets the voltage of a physical device to x
    for i in range(3):
        current = i_probe.current  # i_probe.current reads the current from a physical device
        current_group.create_dataset('current', data=current)
hdf_file.close()
Any pointers to implement this solution or something equally readable would be very welcome.
Edit:
The code below includes all class definitions etc and might give a better idea of my intentions. I'm looking for a way to move all the file IO to a library (e.g. the Instrument class).
import h5py

class Instrument(object):
    def __init__(self, address):
        self.address = address

    @property
    def value(self):
        print('getting value from {}'.format(self.address))
        return 2  # dummy value instead of value read from instrument

    @value.setter
    def value(self, value):
        print('setting value of {} to {}'.format(self.address, value))

source1 = Instrument('source1')
source2 = Instrument('source2')
probe = Instrument('probe')

hdf_file = h5py.File('data.h5', 'w')
for v in range(5):
    source1.value = v
    group = hdf_file.create_group('/' + str(v))
    group.attrs['address'] = source1.address
    for i in range(4):
        source2.value = i
        group2 = group.create_group(str(i))
        group2.attrs['address'] = source2.address
        group2.create_dataset('current', data=probe.value)
hdf_file.close()
Without seeing the code it is hard to say, but essentially the Pythonic way to do this is: every time you add a new dataset, check whether the group (directory) already exists; if it does, append the new dataset to it, and if it doesn't, create it first - i.e. this question might help:
Writing to a new file if not exist, append to file if it do exist
Instead of writing a new file, use the same idea to create a group. Another helpful one might be:
How to check if a directory exists and create it if necessary?
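As a hedged sketch of that check-or-create idea in h5py terms (reusing source1, source2 and probe from the edited example above), require_group() returns the group if it already exists and creates it otherwise, so group creation no longer has to be tracked explicitly:
import h5py

with h5py.File('data.h5', 'a') as hdf_file:
    for v in range(5):
        source1.value = v
        group = hdf_file.require_group(str(v))  # created only if it does not exist yet
        for i in range(4):
            source2.value = i
            group2 = group.require_group(str(i))
            group2.create_dataset('current', data=probe.value)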

Storing data globally in Python

Django and Python newbie here. Ok, so I want to make a webpage where the user can enter a number between 1 and 10. Then, I want to display an image corresponding to that number. Each number is associated with an image filename, and these 10 pairs are stored in a list in a .txt file.
One way to retrieve the appropriate filename is to create a NumToImage model, which has an integer field and a string field, and store all 10 NumToImage objects in the SQL database. I could then retrieve the filename for any query number. However, this does not seem like such a great solution for storing a simple .txt file which I know is not going to change.
So, what is the way to do this in Python, without using a database? I am used to C++, where I would create an array of strings, one for each of the numbers, and load these from the .txt file when the application starts. This vector would then lie within a static object such that I can access it from anywhere in my application.
How can a similar thing be done in Python? I don't know how to instantiate a Python object and then enable it to be accessible from other Python scripts. The only way I can think of doing this is to pass the object instance as an argument for every single function that I call, which is just silly.
What's the standard solution to this?
Thank you.
The Python way is quite similar: you run code at the module level, and create objects in the module namespace that can be imported by other modules.
In your case it might look something like this:
myimage.py
imagemap = {}
# Now read the (image_num, image_path) pairs from the
# file one line at a time and do:
# imagemap[image_num] = image_path
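A hedged way to fill in that loop, assuming a hypothetical images.txt in which each line holds a number and an image path separated by whitespace:
with open('images.txt') as f:
    for line in f:
        image_num, image_path = line.split()
        imagemap[int(image_num)] = image_path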
views.py
from myimage import imagemap
def my_view(image_num):
    image_path = imagemap[image_num]
    # do something with image_path
