Sort images (using JSON) to folders with Python

I've got a large set of images (in one folder) and I've labeled them using software. The output is a JSON file that contains the labels. I want to write a script that creates folders and moves the images into them according to the label in the JSON file. So far I've got Python to read the JSON file and display the required label.
CODE1:
import json
import os
with open('filedirectory.json') as json_data:
    data = json.load(json_data)
    for i, r in enumerate(data):
        if r['label'] != 'tag':
            print(i)
            print(r['label']['tag1'])
CODE2:
import json
import os
import shutil
path = "filedirectory//samplefolder"
try:
    os.mkdir(path)
except OSError:
    print ("Creation of the directory %s failed" % path)
else:
    print ("Successfully created the directory %s " % path)
source = "filedirectory//images"
dest1 = "filedrectory//tag1"
dest1 = "filedrectory//tag2"
files = os.listdir(source)
with open('filedirectory.json') as json_data:
    data = json.load(json_data)
    for i, r in enumerate(data):
        if r['label']['tag1'] = 'tag1'
            shutil.move(f, tag1)
The first script displays the label output.
The second script is what I want to try, but I'm not sure it will work. Any help?

I see the following issues with CODE2:
is: if r['label']['tag1'] = 'tag1'
should be: if r['label']['tag1'] == 'tag1':
In the line shutil.move(f, tag1) you use f and tag1, but neither name is defined earlier in the script.
I do not know whether there are other problems, but if you are worried that it will make a mess of your files, I suggest creating a function (at the beginning of CODE2) as follows:
def mock_move(a,b):
    print('moving from:',a,'to',b)
Then replace shutil.move with mock_move and run CODE2; this lets you check that it does what it should without moving any files. Once you are sure it works as intended, replace mock_move with shutil.move and run CODE2 again.
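Putting these points together, here is a minimal sketch of the whole sorting step. The layout of the JSON file is not shown in the question, so the record keys "filename" and "label" are assumptions made for illustration, as is the function name sort_by_label; adapt them to whatever your labeling software actually writes.

```python
import os
import shutil

def mock_move(a, b):
    # Dry-run stand-in for shutil.move: only reports what would happen.
    print('moving from:', a, 'to', b)

def sort_by_label(records, source, dest_root, move=shutil.move):
    """Move each image into dest_root/<label>/ and return the (src, dst) pairs.

    Assumes every record looks like {"filename": "a.jpg", "label": "tag1"};
    both keys are hypothetical and not taken from the question.
    """
    moves = []
    for record in records:
        src = os.path.join(source, record["filename"])
        folder = os.path.join(dest_root, record["label"])
        # The label folders are created even during a dry run.
        os.makedirs(folder, exist_ok=True)
        dst = os.path.join(folder, record["filename"])
        move(src, dst)
        moves.append((src, dst))
    return moves
```

Calling sort_by_label(data, source, dest_root, move=mock_move) first prints the planned moves without touching the images; dropping the move argument performs them with shutil.move.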

Related

files not deletes when using os module

I tried to write a program that deletes all empty files (those whose size is zero). I ran the program by dragging the script file into the command prompt and running it.
However, no empty files were deleted (and I do have some).
Please help me find the error in my code.
import os
a = os.listdir('C:\\Python27')
for folder in a :
    sizes = os.stat('C:\\Python27')
    b = sizes.st_size
    s = folder
    if b == 0 :
        remove('C:\\Python27\s')
You assign each name returned by os.listdir to folder, yet you never use it in the os.stat or remove calls; instead you pass both of them fixed paths. The delete call also has to be written as os.remove, because only the os module itself is imported.
You should do something like this:
import os
dir = 'C:\\Python27'
for file_name in os.listdir(dir):
    file_path = os.path.join(dir, file_name)
    if os.stat(file_path).st_size == 0:
        os.remove(file_path)
You can delete the files with something like the following code, and you should add some exception handling. Unlike os.listdir, os.walk also descends into subfolders. I have used a test folder name to demonstrate.
import os
import sys
dir = 'c:/temp/testfolder'
for root, dirs, files in os.walk(dir):
    for file in files:
        fname = os.path.join(root, file)
        try:
            if os.path.getsize(fname) == 0:
                print("Removing file %s" %(fname))
                os.remove(fname)
        except:
            print("error: unable to remove 0 byte file")
            raise
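Since deleting files is hard to undo, it may help to separate finding the empty files from removing them. The sketch below is one possible arrangement rather than a definitive one; the function names and the dry_run flag are made up for illustration.

```python
import os

def find_empty_files(root):
    """Return the paths of all zero-byte files under root, including subfolders."""
    empty = []
    for folder, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            if os.path.getsize(path) == 0:
                empty.append(path)
    return empty

def remove_empty_files(root, dry_run=True):
    """Delete zero-byte files under root; with dry_run=True only report them."""
    handled = []
    for path in find_empty_files(root):
        if dry_run:
            print("would remove", path)
        else:
            try:
                os.remove(path)
            except OSError as error:
                # For example, the file is locked or permissions are missing.
                print("could not remove", path, "-", error)
                continue
        handled.append(path)
    return handled
```

Run it once with the default dry_run=True to see which files would go, then again with dry_run=False to delete them.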

Unzip and Rename

I have written (pieced together) the following code to compile a list of links and download each one. The links are downloaded as .zip archives, each containing one .tif image that needs to be extracted and given the same name as its parent zip archive. Everything about the script works correctly except for the portion below that extracts and renames the contents of the zip.
The script still executes, but when you view the output, the .tif is in the correct directory but has not been renamed.
What is the correct way to get the script to rename the extracted .tif?
Any other suggestions for improvement would also be welcome.
FULL SCRIPT
import pandas as pd
import urllib
import os
import zipfile
data = pd.read_csv('StateRaster.csv')
links = data.SCAN_URL
file_names = data.API_NUMBER +"_"+ data.TOOL
dir = data.FOLDER +"/"+ data.SECTION2
root='g:/Data/Wells'
n=0
e=0
for link, file_name,dir in zip(links, file_names,dir):
    try:
        u = urllib.request.urlopen(link)
        udata = u.read()
        os.makedirs(os.path.join(root,dir), exist_ok=True)
        f = open(os.path.join(root,dir,file_name+".zip"), "wb+")
        f.write(udata)
        f.close()
        u.close()
        zip_ref = zipfile.ZipFile((os.path.join(root,dir,file_name+".zip")), 'r')
        #for filename in (os.path.join(root,dir,file_name+".zip")):
        zip_ref.extractall((os.path.join(root, dir)))
        for filename in ((os.path.join(root,dir))):
            if filename.endswith(".tif"):
                os.makedirs(os.path.join(root,dir,file_name+".tif"), exist_ok=True)
                os.rename(filename,file_name+".tif")
        zip_ref.close()
        n += 1
        print ('Pass',n,'Fail',e,'Total',n+e)
    except:
        e+=1
        print ('Error-Pass',n,'Fail',e,'Total',n+e)
print("Done!!!!!")
The problem is around
for filename in ((os.path.join(root,dir))):
There is no need to put (( and )) around the expression, and because os.path.join returns a string, this line simply iterates over its characters. You need os.walk or glob.glob (after import glob), like this:
for filename in glob.glob(os.path.join(root, dir, "*.tif")):
    print(filename)
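An alternative that avoids searching the folder afterwards is to ask the archive itself which member is the .tif and rename exactly that file. This is a sketch under the question's stated assumption that each zip holds one .tif; the function name is hypothetical.

```python
import os
import zipfile

def extract_tif_named_after_zip(zip_path, out_dir):
    """Extract the first .tif in zip_path into out_dir, named after the archive."""
    stem = os.path.splitext(os.path.basename(zip_path))[0]
    target = os.path.join(out_dir, stem + ".tif")
    with zipfile.ZipFile(zip_path) as archive:
        tif_members = [m for m in archive.namelist()
                       if m.lower().endswith(".tif")]
        if not tif_members:
            raise ValueError("no .tif file in " + zip_path)
        # extract() returns the path of the file it just wrote.
        extracted = archive.extract(tif_members[0], out_dir)
    # os.replace overwrites an existing target instead of failing.
    os.replace(extracted, target)
    return target
```

In the download loop this would be called with the path of the saved zip and os.path.join(root, dir) as out_dir. Note that the os.makedirs call on the ".tif" path in the original loop would create a directory with that name, which would then block the rename.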

Removing elements from list and importing list into another code

I'm trying to list all the HTML files in a single directory, which works fine. However, I'm also trying to remove everything that is not an HTML file from that list.
I have 8 files called 1, 2, 3, etc., plus 6.htmxl (to remove) and pipe.sh.save (to remove).
The code I wrote removes the .htmxl file but does not remove the .sh.save file from the list.
from os import listdir
from os.path import isfile, join
import pyimport
import time
def main():
    onlyfiles = [f for f in listdir('/home/pi/keyboard/html') if isfile(join('/home/pi/keyboard/html',f)) ]
    j = 0
    print len(onlyfiles)
    for i in onlyfiles:
        if i.find(".html") == -1:
            print i
            print "not a html"
            j = onlyfiles.index(i)
            print j
            del onlyfiles[j]
        else:
            print i
            print "html found"
        time.sleep(0.5)
    outfiles = onlyfiles
    print outfiles
    return outfiles
if __name__ == "__main__":
    main()
I also have another script that is supposed to get the "outfiles" list:
import server_handler
files = server_handler.main()
fileList = server_handler.outfiles
But when I run it I get:
AttributeError: 'module' object has no attribute 'outfiles'
I'm running nearly the same code in another script that creates 'output', and I import it in exactly the same way, so I'm not sure why it's not importing correctly.
You might find the following approach more suitable; it uses Python's glob module:
import glob
print glob.glob('/home/pi/keyboard/html/*.html')
This returns a list of all files in that folder whose names end in .html.
Replace i.find(".html") == -1 with not i.endswith(".html"). Now you should get only HTML files (or, more precisely, only files whose names end in .html).
For the second part, remove:
fileList = server_handler.outfiles
server_handler.main() already returns the list, so you have it in files. outfiles is a local variable inside main(), not an attribute of the module, which is why the last line raises the AttributeError.
EDIT:
Martin's answer has a better way to get the files, so I recommend you use his advice and not mine :).
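This would work if main were a class that stored the list on the instance, and you read it as server_handler.main().outfiles:
class main():
    def __init__(self):
        ....
        self.outfiles = onlyfiles
Note that __init__ cannot return a value, so the list is read from the attribute rather than returned.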
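None of the answers says why pipe.sh.save survived. One likely explanation, assuming the two unwanted names sit next to each other in the listing, is that deleting from a list while looping over it shifts the remaining items left, so the item right after a deleted one is never examined. The sketch below reproduces that with made-up file names and shows a filter that builds a new list instead.

```python
def html_only(names):
    """Return only the names that end in .html, leaving the input untouched."""
    return [name for name in names if name.endswith(".html")]

def delete_while_iterating(names):
    """Reproduce the pattern from the question to show that it skips items."""
    names = list(names)  # work on a copy
    for name in names:
        if not name.endswith(".html"):
            del names[names.index(name)]
    return names

# Hypothetical listing with the two unwanted files adjacent.
sample = ["1.html", "6.htmxl", "pipe.sh.save", "2.html"]
print(html_only(sample))
print(delete_while_iterating(sample))
```

The first call keeps just the two .html names. The second removes 6.htmxl but leaves pipe.sh.save in place, which matches the behaviour described in the question.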

shutil.move doesn't delete source files

Newish Python guy here. I've written what I thought would be a fairly simple script to extract the creation date metadata from photos and video and move them to a new folder based on year and month. I'm using PIL for picture and hachoir for video metadata.
For the most part I've got it working until I actually use shutil.move. At that point all the JPGs move to the new folders just fine, but all the videos are being COPIED: the original files are left in the source folder.
My assumption is that some process that I invoke during the script is still accessing the video file and not letting it be deleted. Can anyone tell me what I'm messing up, and how I can release these video files to be moved?
========================
import os.path, time, sys, shutil
from PIL import Image
from PIL.ExifTags import TAGS
from hachoir_core.error import HachoirError
from hachoir_core.cmd_line import unicodeFilename
from hachoir_parser import createParser
from hachoir_core.tools import makePrintable
from hachoir_metadata import extractMetadata
from hachoir_core.i18n import getTerminalCharset
def get_field (exif,field) :
    for (k,v) in exif.iteritems():
        if TAGS.get(k) == field:
            return v
for picture in os.listdir(os.getcwd()):
    if picture.endswith(".jpg") or picture.endswith(".JPG"):
        print picture
        rawMetadata = Image.open(picture)._getexif()
        datetime = get_field(rawMetadata, 'DateTime')
        datedict = {'year' : datetime[0:4], 'month' : datetime[5:7]}
        target = datedict['year']+'-'+ datedict['month']
        if not os.path.isdir(target):
            newdir = os.mkdir(target)
        if picture not in target:
            shutil.move(picture, target)
    if picture.endswith('.mov') or picture.endswith('.MOV') or \
       picture.endswith('mp4') or picture.endswith('.MP4'):
        picture, realname = unicodeFilename(picture), picture
        parser = createParser(picture, realname)
        rawMetadata = extractMetadata(parser)
        text = rawMetadata.exportPlaintext()
        datedict = {'year' : text[4][17:21], 'month' : text[4][22:24]}
        target = datedict['year']+'-'+ datedict['month']
        dest = os.path.join(target, picture)
        if not os.path.isdir(target):
            newdir = os.mkdir(target)
        if picture not in target:
            try:
                shutil.move(picture, dest)
            except WindowsError:
                pass
The in operator tells you whether an item is in a collection (e.g. an element in a list) or whether one string is a substring of another. It doesn't know that your string variable target is the name of a directory, and it does not check directories to see whether files are in them. To skip files that are already at the destination, use this instead:
if not os.path.exists(dest):
It's hard to tell what exactly is failing without a decent error message, and the pass in your except block throws that message away. Use this in your except block to get more answers:
except WindowsError as e:
    print("There was an error moving {picture} to {target}".format(
        picture=picture, target=target))
    print("The error thrown was {e}".format(e=e))
    print("{picture} exists? {exist}".format(
        picture=picture, exist=os.path.exists(picture)))
    print("{target} exists? {exist}".format(
        target=target, exist=os.path.exists(target)))
Note that much can be said for the logging module for errors like these. It's rather outside the scope of this question, though.
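For readers who do want the logging route, a minimal sketch might look like the following. The helper name move_with_logging is hypothetical, and the code is written for Python 3, where WindowsError is an alias of OSError on Windows, so catching OSError covers the same failures on any platform.

```python
import logging
import os
import shutil

logger = logging.getLogger(__name__)

def move_with_logging(src, dest):
    """Try to move src to dest; log the reason and return False if it fails."""
    try:
        shutil.move(src, dest)
    except OSError:
        # logger.exception records the message together with the traceback,
        # so the underlying error (e.g. a file still in use) is not lost.
        logger.exception("could not move %s to %s (source exists: %s)",
                         src, dest, os.path.exists(src))
        return False
    return True
```

Replacing the try/except/pass around shutil.move with a call to this helper keeps the loop going on failure while still recording why each video could not be moved.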

Opening multiple python files from folder

I am trying to take a folder that contains 9 files, each containing FASTA records for a separate gene, and remove duplicate records. I want the script to be called with the folder that contains the gene files as the first parameter and, as the second, a new folder name to which the files without duplicates are written. However, if the files are stored in a folder called results within the current directory, I cannot open any of the gene files in that folder to process them for duplicates. I have searched around, and it seems that I should be able to call Python's open() function with a string of the file name like this:
input_handle = open(f, "r")
This line does not let me open the file to read its contents, and I think it may have something to do with the type of f, which is 'str' according to type(f).
Also, if I use the full path:
input_handle = open('~/Documents/Research/Scala/hiv-biojava-scala/results/rev.fa', "r")
It says that no such file exists. I have checked my spelling and I am sure that the file does exist. I get the same error if I pass the name as a raw string:
input_handle = open(r'~/Documents/Research/Scala/hiv-biojava-scala/results/rev.fa', "r")
Or, if I call it as follows, it says that no global name results exists:
input_handle = open(os.path.join(os.curdir,results/f), "r")
Here is the full code. If anybody knows what the problem is I would really appreciate any help that you could offer.
#!/usr/bin/python
import os
import os.path
import sys
import re
from Bio import SeqIO
def processFiles(files) :
    for f in files:
        process(f)
def process(f):
    input_handle = open(f, "r")
    records = list(SeqIO.parse(input_handle, "fasta"))
    print records
    i = 0
    while i < len(records)-1:
        temp = records[i]
        next = records[i+1]
        if (next.id == temp.id) :
            print "duplicate found at " + next.id
            if (len(next.seq) < len(temp.seq)) :
                records.pop(i+1)
            else :
                records.pop(i)
        i = i + 1
    output_handle = open("out.fa", "w")
    for record in records:
        SeqIO.write(records, output_handle, "fasta")
    input_handle.close()
def main():
    input_folder = sys.argv[1]
    out_folder = sys.argv[2]
    if os.path.exists(out_folder):
        print("Folder %s exists; please specify empty folder or new one" % out_folder)
        sys.exit(1)
    os.makedirs(out_folder)
    files = os.listdir(input_folder)
    print files
    processFiles(files)
main()
Try input_handle = open(os.path.join(os.getcwd(), "results", f), "r"). os.curdir is just the string ".", while os.getcwd() returns the full path of the current directory. See mail.python.org/pipermail/python-list/2012-September/631864.html.
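Two further details are probably behind the errors in the question. os.listdir returns bare file names, so each one has to be joined with its folder before open() can find it, and open() does not expand a leading ~, which is why the '~/Documents/...' path is reported as missing. A small sketch of both points, with a hypothetical helper name:

```python
import os

def list_file_paths(folder):
    """Return paths that open() can use for every regular file in folder."""
    # expanduser turns a leading ~ into the user's home directory.
    folder = os.path.expanduser(folder)
    paths = []
    for name in sorted(os.listdir(folder)):
        # listdir yields names only, so join each one with the folder.
        path = os.path.join(folder, name)
        if os.path.isfile(path):
            paths.append(path)
    return paths
```

In the script above, passing list_file_paths(input_folder) to processFiles instead of the raw os.listdir result would let process(f) open each file regardless of the current directory.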
