Python - Strip all drive letters from csv file and replace with Z: - python

Here is the code example. Basically output.csv needs to remove any drive letter A:-Y: and replace it with Z: I tried to do this with a list (not complete yet) but it generates the error: TypeError: expected a character buffer object
#!/usr/bin/python
import os.path
import os
import shutil
import csv
import re
# Create the videos directory in the current directory
# If the directory exists ignore it.
#
# Moves all files with the .wmv extenstion to the
# videos folder for file structure
#
#Crawl the videos directory then change to videos directory
# create the videos.csv file in the videos directory
# replace any drive letter A:-Y: with Z:
def createCSV():
directory = "videos"
if not os.path.isdir("." + directory + "/"):
os.mkdir("./" + directory + "/")
for file in os.listdir("./"):
if os.path.splitext(file)[1] == ".wmv":
shutil.move(file, os.path.join("videos", file))
listDirectory = os.listdir("videos")
os.chdir(directory)
f = open("videos.csv", "w")
f.writelines(os.path.join(os.getcwd(), f + '\n') for f in listDirectory)
f = open('videos.csv', 'r')
w = open('output.csv', 'w')
f_cont = f.readlines()
for line in f_cont:
regex = re.compile("\b[GHI]:")
re.sub(regex, "Z:", line)
w.write(line)
f.close()
createCSV()
EDIT:
I think my flow/logic is wrong, the output.csv file that gets created still G: in the .csv it was not renamed to Z:\ from the re.sub line.

I can see you use some pythonic snippets, with smart uses of path.join and a commented code. This can get even better, let's rewrite a few things so we can solve your drive letters issue, and gain a more pythonic code on the way :
#!/usr/bin/env python
# -*- coding= UTF-8 -*-
# Firstly, modules can be documented using docstring, so drop the comments
"""
Create the videos directory in the current directory
If the directory exists ignore it.
Moves all files with the .wmv extension to the
videos folder for file structure
Crawl the videos directory then change to videos directory
create the videos.csv file in the videos directory
create output.csv replace any drive letter A:-Y: with Z:
"""
# not useful to import os and os.path as the second is contain in the first one
import os
import shutil
import csv
# import glob, it will be handy
import glob
import ntpath # this is to split the drive
# don't really need to use a function
# Here, don't bother checking if the directory exists
# and you don't need add any slash either
directory = "videos"
ext = "*.wmv"
try :
os.mkdir(directory)
except OSError :
pass
listDirectory = [] # creating a buffer so no need to list the dir twice
for file in glob.glob(ext): # much easier this way, isn't it ?
shutil.move(file, os.path.join(directory, file)) # good catch for shutil :-)
listDirectory.append(file)
os.chdir(directory)
# you've smartly imported the csv module, so let's use it !
f = open("videos.csv", "w")
vid_csv = csv.writer(f)
w = open('output.csv', 'w')
out_csv = csv.writer(w)
# let's do everything in one loop
for file in listDirectory :
file_path = os.path.abspath(file)
# Python includes functions to deal with drive letters :-D
# I use ntpath because I am under linux but you can use
# normal os.path functions on windows with the same names
file_path_with_new_letter = ntpath.join("Z:", ntpath.splitdrive(file_path)[1])
# let's write the csv, using tuples
vid_csv.writerow((file_path, ))
out_csv.writerow((file_path_with_new_letter, ))

It seems like the problem is in the loop at the bottom of your code. The string's replace method doesn't receive a list as its first arguments, but another string. You need to loop through your removeDrives list and call line.remove with every item in that list.

You could use
for driveletter in removedrives:
line = line.replace(driveletter, 'Z:')
thereby iterating over your list and replacing one of the possible drive letters after the other. As abyx wrote, replace expects a string, not a list, so you need this extra step.
Or use a regular expression like
import re
regex = re.compile(r"\b[FGH]:")
re.sub(regex, "Z:", line)
Additional bonus: Regex can check that it's really a drive letter and not, for example, a part of something bigger like OH: hydrogen group.
Apart from that, I suggest you use os.path's own path manipulation functions instead of trying to implement them yourself.
And of course, if you do anything further with the CSV file, take a look at the csv module.
A commentator above has already mentioned that you should close all the files you've opened. Or use with with statement:
with open("videos.csv", "w") as f:
do_stuff()

Related

Shutil find and remove files

I am trying to automate some work which is currently done by hand.
The aim is to find all the documents which have, for example, the number 408710 in their file name. Please note that the file name does also include other letters or figures. An example could be 2rsgf54087105f85sfr. The program should now search for all the files which own the combination 408710 and then move them into the right path.
I do know how to move the files, but so far I am only able to move the files by entering the exact file name. In that case I do only have one file and not all the files with the mentioned combination. Of course I do not know the exact file name in advance anyway.
Here the code for the stuff which is working:
import shutil
src = "C:/Users/Startklar/Desktop/Ausgangsordner"
dst = "C:/Users/Startklar/Desktop/Empfangsordner/Sven"
dst2 = "C:/Users/Startklar/Desktop/Empfangsordner/Gerald"
# remove files
shutil.move(src=src + "/AA023300408710LFVI.docx", dst=dst)
shutil.move(src=src + "/BB023310187105ADIK.docx", dst=dst2)
If you just want to remove the files you can do it like this using regexp:
import os
import re
regexp = r'yourPattern.*\.docx$'
res = [f for f in os.listdir(path) if re.search(regexp , f)]
for f in res:
print('Remove: '+f)
os.remove(f)
You will need to find a regular expression which only finds all the files you would like to remove.
If you want infact move the files, like in your example, this looks like this (just guessing the regexp from your example)
import os
import re
src = "C:/Users/Startklar/Desktop/Ausgangsordner"
filters = [["C:/Users/Startklar/Desktop/Empfangsordner/Sven", r'.*LFVI\.docx$'],
["C:/Users/Startklar/Desktop/Empfangsordner/Gerald", r'.*ADIK\.docx$']]
for f in os.listdir(src):
for dst,regexp in filters:
if re.search(regexp , f):
shutil.move(src=f, dst=dst)

Edit multiple text files, and save as new files

My first post on StackOverflow, so please be nice. In other words, a super beginner to Python.
So I want to read multiple files from a folder, divide the text and save the output as a new file. I currently have figured out this part of the code, but it only works on one file at a time. I have tried googling but can't figure out a way to use this code on multiple text files in a folder and save it as "output" + a number, for each file in the folder. Is this something that's doable?
with open("file_path") as fReader:
corpus = fReader.read()
loc = corpus.find("\n\n")
print(corpus[:loc], file=open("output.txt","a"))
Possibly work with a list, like:
from pathlib import Path
source_dir = Path("./") # path to the directory
files = list(x for x in filePath.iterdir() if x.is_file())
for i in range(len(files)):
file = Path(files[i])
outfile = "output_" + str(i) + file.suffix
with open(file) as fReader, open(outfile, "w") as fOut:
corpus = fReader.read()
loc = corpus.find("\n\n")
fOut.write(corpus[:loc])
** sorry for multiple editting....
welcome to the site. Yes, what you are asking above is completely doable and you are on the right track. You will need to do a little research/practice with the os module which is highly useful when working with files. The two commands that you will want to research a bit are:
os.path.join()
os.listdir()
I would suggest you put two folders within your python file, one called data and the other called output to catch the results. Start and see if you can just make the code to list all the files in your data directory, and just keep building that loop. Something like this should list all the files:
# folder file lister/test writer
import os
source_folder_name = 'data' # the folder to be read that is in the SAME directory as this file
output_folder_name = 'output' # will be used later...
files = os.listdir(source_folder_name)
# get this working first
for f in files:
print(f)
# make output folder names and just write a 1-liner into each file...
for f in files:
output_filename = f.split('.')[0] # the part before the period
output_filename += '_output.csv'
output_path = os.path.join(output_folder_name, output_filename)
with open(output_path, 'w') as writer:
writer.write('some data')

How can I run a python script on many files to get many output files?

I am new at programming and I have written a script to extract text from a vcf file. I am using a Linux virtual machine and running Ubuntu. I have run this script through the command line by changing my directory to the file with the vcf file in and then entering python script.py.
My script knows which file to process because the beginning of my script is:
my_file = open("inputfile1.vcf", "r+")
outputfile = open("outputfile.txt", "w")
The script puts the information I need into a list and then I write it to outputfile. However, I have many input files (all .vcf) and want to write them to different output files with a similar name to the input (such as input_processed.txt).
Do I need to run a shell script to iterate over the files in the folder? If so how would I change the python script to accommodate this? I.e writing the list to an outputfile?
I would integrate it within the Python script, which will allow you to easily run it on other platforms too and doesn't add much code anyway.
import glob
import os
# Find all files ending in 'vcf'
for vcf_filename in glob.glob('*.vcf'):
vcf_file = open(vcf_filename, 'r+')
# Similar name with a different extension
output_filename = os.path.splitext(vcf_filename)[0] + '.txt'
outputfile = open(output_filename, 'w')
# Process the data
...
To output the resulting files in a separate directory I would:
import glob
import os
output_dir = 'processed'
os.makedirs(output_dir, exist_ok=True)
# Find all files ending in 'vcf'
for vcf_filename in glob.glob('*.vcf'):
vcf_file = open(vcf_filename, 'r+')
# Similar name with a different extension
output_filename = os.path.splitext(vcf_filename)[0] + '.txt'
outputfile = open(os.path.join(output_dir, output_filename), 'w')
# Process the data
...
You don't need write shell script,
maybe this question will help you?
How to list all files of a directory?
It depends on how you implement the iteration logic.
If you want to implement it in python, just do it;
If you want to implement it in a shell script, just change your python script to accept parameters, and then use shell script to call the python script with your suitable parameters.
I have a script I frequently use which includes using PyQt5 to pop up a window that prompts the user to select a file... then it walks the directory to find all of the files in the directory:
pathname = first_fname[:(first_fname.rfind('/') + 1)] #figures out the pathname by finding the last '/'
new_pathname = pathname + 'for release/' #makes a new pathname to be added to the names of new files so that they're put in another directory...but their names will be altered
file_list = [f for f in os.listdir(pathname) if f.lower().endswith('.xls') and not 'map' in f.lower() and not 'check' in f.lower()] #makes a list of the files in the directory that end in .xls and don't have key words in the names that would indicate they're not the kind of file I want
You need to import os to use the os.listdir command.
You can use listdir(you need to write condition to filter the particular extension) or glob. I generally prefer glob. For example
import os
import glob
for file in glob.glob('*.py'):
data = open(file, 'r+')
output_name = os.path.splitext(file)[0]
output = open(output_name+'.txt', 'w')
output.write(data.read())
This code will read the content from input and store it in outputfile.

Removing numbers and spaces in multiple file names with Python

I am trying to rename multiple mp3 files I have in a folder. They start with something like "1 Hotel California - The Eagles" and so on. I would like it to be just "Hotel California - The Eagles".
Also, there could be a "05 Hotel California - The Eagles" as well, which means removing the number from a different files would create duplicates, which is the problem I am facing. I want it to replace existing files/overwrite/delete one of them or whatever a solution might be.
P.S, Adding "3" to the "1234567890 " would remove the "3" from the .mp3 extension
I am new to python, but here is the code I am using to implement this
import os
def renamefiles():
list = os.listdir(r"E:\NEW")
print(list)
path = os.getcwd()
print(path)
os.chdir(r"E:\NEW")
for name in list:
os.rename(name, name.translate(None, "124567890 "))
os.chdir(path)
renamefiles()
And here is the error I get
WindowsError: [Error 183] Cannot create a file when that file already exists
Any help on how I could rename the files correctly would be highly appreciated!
You need to verify that the names being changed actually changed. If the name doesn't have digits or spaces in it, the translate will return the same string, and you'll try to rename name to name, which Windows rejects. Try:
for name in list:
newname = name.translate(None, "124567890 ")
if name != newname:
os.rename(name, newname)
Note, this will still fail if the file target exists, which you'd probably want if you were accidentally collapsing two names into one. But if you want silent replace behavior, if you're on Python 3.3 or higher, you can change os.rename to os.replace to silently overwrite; on earlier Python, you can explicitly os.remove before calling os.rename.
You can catch an OSError and also use glob to find the .mp3 files:
import os
from glob import iglob
def renamefiles(pth):
os.chdir(pth)
for name in iglob("*.mp3"):
try:
os.rename(name, name.translate(None, "124567890").lstrip())
except OSError:
print("Caught error for {}".format(name))
# os.remove(name) ?
What you do when you catch the error is up to you, you could keep some record of names found and increment a count for each or leave as is.
If the numbers are always at the start you can also just lstrip then away so you can then use 3 safely:
os.rename(name, name.lstrip("0123456789 "))
using one of your example strings:
In [2]: "05 Hotel California - The Eagles.mp3".lstrip("01234567890 ")
Out[2]: 'Hotel California - The Eagles.mp3'
Using your original approach could never work as desired as you would remove all spaces:
In [3]: "05 Hotel California - The Eagles.mp3".translate(None,"0124567890 ")
Out[3]: 'HotelCalifornia-TheEagles.mp3'
If you don't care what file gets overwritten you can use shutil.move:
import os
from glob import iglob
from shutil import move
def renamefiles(pth):
os.chdir(pth)
for name in iglob("*.mp3"):
move(name, name.translate(None, "124567890").lstrip())
On another note, don't use list as a variable name.
instead of using name.translate, import the re lib (regular expressions) and use something like
"(?:\d*)?\s*(.+?).mp3"
as your pattern. You can then use
Match.group(1)
as your rename.
For dealing with multiple files, add an if statement that checks if the file already exists in the library like this:
os.path.exists(dirpath)
where dirpath is the directory that you want to check in
I was unable to easily get any of the answers to work with Python 3.5, so here's one that works under that condition:
import os
import re
def rename_files():
path = os.getcwd()
file_names = os.listdir(path)
for name in file_names:
os.rename(name, re.sub("[0-9](?!\d*$)", "", name))
rename_files()
This should work for a list of files like "1 Hotel California - The Eagles.mp3", renaming them to "Hotel California - The Eagles.mp3" (so the extension is untouched).
Ok so what you want is:
create a new filename removing leading numbers
if that new filename exists, remove it
rename the file to that new filename
The following code should work (not tested).
import os
import string
class FileExists(Exception):
pass
def rename_files(path, ext, remove_existing=True):
for fname in os.listdir(path):
# test if the file name ends with the expected
# extension else skip it
if not fname.endswith(ext):
continue
# chdir is not a good idea, better to work
# with absolute path whenever possible
oldpath = os.path.join(path, fname)
# remove _leading_ digits then remove all whitespaces
newname = fname.lstrip(string.digits).strip()
newpath = os.path.join(path, newname)
# check if the file already exists
if os.path.exists(newpath):
if remove_existing:
# it exists and we were told to
# remove existing file:
os.remove(newpath)
else:
# it exists and we were told to
# NOT remove existing file:
raise FileExists(newpath)
# ok now we should be safe
os.rename(oldpath, newpath)
# only execute the function if we are called directly
# we dont want to do anything if we are just imported
# from the Python shell or another script or module
if __name__ == "__main__":
# exercice left to the reader:
# add command line options / arguments handling
# to specify the path to browse, the target
# extension and whether to remove existing files
# or not
rename_files(r"E:\NEW", ".mp3", True)
You just need to change directory to where *.mp3 files are located and execute 2 lines of below with python:
import os,re
for filename in os.listdir():
os.rename(filename, filname.strip(re.search("[0-9]{2}", filename).group(0)))

Python: How to create a unique file name?

I have a python web form with two options - File upload and textarea. I need to take the values from each and pass them to another command-line program. I can easily pass the file name with file upload options, but I am not sure how to pass the value of the textarea.
I think what I need to do is:
Generate a unique file name
Create a temporary file with that name in the working directory
Save the values passed from textarea into the temporary file
Execute the commandline program from inside my python module and pass it the name of the temporary file
I am not sure how to generate a unique file name. Can anybody give me some tips on how to generate a unique file name? Any algorithms, suggestions, and lines of code are appreciated.
Thanks for your concern
I didn't think your question was very clear, but if all you need is a unique file name...
import uuid
unique_filename = str(uuid.uuid4())
If you want to make temporary files in Python, there's a module called tempfile in Python's standard libraries. If you want to launch other programs to operate on the file, use tempfile.mkstemp() to create files, and os.fdopen() to access the file descriptors that mkstemp() gives you.
Incidentally, you say you're running commands from a Python program? You should almost certainly be using the subprocess module.
So you can quite merrily write code that looks like:
import subprocess
import tempfile
import os
(fd, filename) = tempfile.mkstemp()
try:
tfile = os.fdopen(fd, "w")
tfile.write("Hello, world!\n")
tfile.close()
subprocess.Popen(["/bin/cat", filename]).wait()
finally:
os.remove(filename)
Running that, you should find that the cat command worked perfectly well, but the temporary file was deleted in the finally block. Be aware that you have to delete the temporary file that mkstemp() returns yourself - the library has no way of knowing when you're done with it!
(Edit: I had presumed that NamedTemporaryFile did exactly what you're after, but that might not be so convenient - the file gets deleted immediately when the temp file object is closed, and having other processes open the file before you've closed it won't work on some platforms, notably Windows. Sorry, fail on my part.)
The uuid module would be a good choice, I prefer to use uuid.uuid4().hex as random filename because it will return a hex string without dashes.
import uuid
filename = uuid.uuid4().hex
The outputs should like this:
>>> import uuid
>>> uuid.uuid()
UUID('20818854-3564-415c-9edc-9262fbb54c82')
>>> str(uuid.uuid4())
'f705a69a-8e98-442b-bd2e-9de010132dc4'
>>> uuid.uuid4().hex
'5ad02dfb08a04d889e3aa9545985e304' # <-- this one
Maybe you need unique temporary file?
import tempfile
f = tempfile.NamedTemporaryFile(mode='w+b', delete=False)
print f.name
f.close()
f is opened file. delete=False means do not delete file after closing.
If you need control over the name of the file, there are optional prefix=... and suffix=... arguments that take strings. See https://docs.python.org/3/library/tempfile.html.
You can use the datetime module
import datetime
uniq_filename = str(datetime.datetime.now().date()) + '_' + str(datetime.datetime.now().time()).replace(':', '.')
Note that:
I am using replace since the colons are not allowed in filenames in many operating systems.
That's it, this will give you a unique filename every single time.
In case you need short unique IDs as your filename, try shortuuid, shortuuid uses lowercase and uppercase letters and digits, and removing similar-looking characters such as l, 1, I, O and 0.
>>> import shortuuid
>>> shortuuid.uuid()
'Tw8VgM47kSS5iX2m8NExNa'
>>> len(ui)
22
compared to
>>> import uuid
>>> unique_filename = str(uuid.uuid4())
>>> len(unique_filename)
36
>>> unique_filename
'2d303ad1-79a1-4c1a-81f3-beea761b5fdf'
I came across this question, and I will add my solution for those who may be looking for something similar. My approach was just to make a random file name from ascii characters. It will be unique with a good probability.
from random import sample
from string import digits, ascii_uppercase, ascii_lowercase
from tempfile import gettempdir
from os import path
def rand_fname(suffix, length=8):
chars = ascii_lowercase + ascii_uppercase + digits
fname = path.join(gettempdir(), 'tmp-'
+ ''.join(sample(chars, length)) + suffix)
return fname if not path.exists(fname) \
else rand_fname(suffix, length)
This can be done using the unique function in ufp.path module.
import ufp.path
ufp.path.unique('./test.ext')
if current path exists 'test.ext' file. ufp.path.unique function return './test (d1).ext'.
To create a unique file path if its exist, use random package to generate a new string name for file. You may refer below code for same.
import os
import random
import string
def getUniquePath(folder, filename):
path = os.path.join(folder, filename)
while os.path.exists(path):
path = path.split('.')[0] + ''.join(random.choice(string.ascii_lowercase) for i in range(10)) + '.' + path.split('.')[1]
return path
Now you can use this path to create file accordingly.

Categories

Resources