How to rename a file to an Arabic name in Python

I have a script that renames folders and files from English to another language. So far this script works well for languages that read from left to right. However, when I run it on a language that reads from right to left (e.g. Arabic), I run into an error:
FileExistsError: [WinError 183] Cannot create a file when that file already exists
My folder structure looks like this:
C:\Users\ABC\Desktop\Template\Report Element Snippets\Review.
Inside the Review folder, I have a file called Blue Review Note.xml. The full path of this xml file should be
C:\Users\ABC\Desktop\Template\Report Element Snippets\Review\Blue Review Note.xml
I need to rename the Report Element Snippets and Review folders first, then run another loop to rename the xml file to Arabic.
The code to rename the xml file is:
os.rename(os.path.join(dirpath, file),
          os.path.join(dirpath, newfname))
The problem I can see from tracing the printed paths is that os.path.join(dirpath, file) gives me:
C:\Users\ABC\Desktop\Template\تقرير قصاصات العنصر\إعادة النظر\Blue Review Note.xml
where إعادة النظر is Review and تقرير قصاصات العنصر is Report Element Snippets,
but os.path.join(dirpath, newfname) gives me:
C:\Users\ABC\Desktop\Template\تقرير قصاصات العنصر\إعادة النظر\ملاحظة مراجعة باللون الأزرق.xml
where ملاحظة مراجعة باللون الأزرق.xml is the Arabic name for Blue Review Note.xml.
As you can see, the join appears to have split ملاحظة مراجعة باللون الأزرق.xml into two parts in the full path. The ملاحظة مراجعة part now appears at the start of the Arabic path, leaving باللون الأزرق.xml as the file name, and there is no \ separating the xml file name from the تقرير قصاصات العنصر folder. To me, the paths before and after the rename of the xml file look different, so Python cannot apply the rename.
Has anyone run into this issue before when working with Arabic file names?
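One way to narrow this down: os.path.join only concatenates strings and does not reorder characters, so a path that looks "split" on screen is most likely the console reordering right-to-left text for display. WinError 183 itself means the destination path already exists. The following is a minimal sketch, not the original script: safe_rename is a hypothetical helper that prints both paths in escaped form (so the stored character order is visible) and skips the rename when the target is already there.

```python
import os


def safe_rename(dirpath, old_name, new_name):
    """Rename old_name to new_name inside dirpath; return True if renamed."""
    src = os.path.join(dirpath, old_name)
    dst = os.path.join(dirpath, new_name)
    # ascii() escapes non-ASCII characters, so the output shows the stored
    # (logical) character order without any right-to-left display reordering.
    print(ascii(src), "->", ascii(dst))
    if os.path.exists(dst):
        # On Windows, os.rename raises FileExistsError (WinError 183) here.
        print("target already exists, skipping")
        return False
    os.rename(src, dst)
    return True
```

If the escaped output shows the backslash in the expected position, the path is fine and the thing to investigate is why the target already exists, for example two English names mapping to the same Arabic name, or the script having been run twice.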

Related

Python: Iterate through directory to find specific text file, open it, write some of its contents to a new file, close it, and move on to the next dir

I have a script that takes an input text file then finds data in it, puts that data as a variable, then later I call that variable to write to a new file. This snippet of code is just for reading the txt file and storing the data from it as variables.
import re

searchfile = open('C://Users//Me//DynamicFolder//report//summary.txt', 'r', encoding='utf-8')
slab_count = 0
slab_number = []
for line in searchfile:
    if "Slab" in line:
        # Pull every number out of the line and keep the last one
        slab_num = [float(s) for s in re.findall(r'[-+]?(?:\d*\.\d+|\d+)', line)]
        slab_percent = slab_num[-1]
        slab_number.append(slab_percent)
        slab_count = slab_count + 1
# Sum the collected percentages
slab_total = 0
for slab_percent in slab_number:
    slab_total += slab_percent
searchfile.close()
I am using xlsxwriter to write the variables to an excel doc.
My question is: how do I iterate this to search through a given directory's subdirectories for summary.txt when there is a dynamic folder?
So C://Users//Me//DynamicFolder//report//summary.txt is the path to one of the files. There are several folders, which I call DynamicFolder, that are put there by another process, and their names change all the time. I need this script to go into each of those dynamic folders and into a subdirectory called report; this name is static and is always the same. So each of those dynamic folders has a subdirectory called report, and in the report folder is a file called summary.txt. I am trying to go through each dynamic folder into report > summary.txt, then open those txt files and write out data from them.
How do I iterate or loop this? Right now I have 18 folders with those DynamicFolder names, which will change when they are overwritten. How can I make this snippet of code iterate through them?
for path in Path('C://Users//Me//DynamicFolder//report//summary.txt').rglob('summary.txt'):
The report folder is not the only folder with a summary.txt file, but it's the only folder with the file I want. The code above pulls ALL summary.txt files from all subdirectories under the DynamicFolder (not just the report folder). I am wondering if I can make this search JUST the report subdirectories under the DynamicFolders, and somehow use this to iterate the rest of my code?
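A possible approach, sketched under the assumption that all the dynamic folders sit directly under one parent folder: use Path.glob with a pattern that fixes the report level instead of rglob. The function name find_report_summaries and the root argument are made up for illustration.

```python
from pathlib import Path


def find_report_summaries(root):
    """Return summary.txt files located at <root>/<any folder>/report/."""
    # '*' matches exactly one directory level (the changing folder name),
    # so summary.txt files in other subfolders are not picked up.
    return sorted(Path(root).glob('*/report/summary.txt'))
```

Each returned path can then be opened in a loop, e.g. `for path in find_report_summaries('C:/Users/Me'):` followed by `with path.open(encoding='utf-8') as searchfile:` and the existing parsing code indented underneath.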

os.listdir() adds characters to the beginning of file name?

I had a quick google of this but couldn't find anything. I'm using os to get a list of all the file names in the current working directory using the following code:
path = os.getcwd()
files = os.listdir(path)
The list of files returns fine, but the last element has an extra '~$' that isn't in the actual file name. For example:
files
['File1.xlsx', 'File2.xlsx', '~$File3.xlsx']
This is then causing an issue when I iterate through these files to try and import them, as I get the error of:
[Errno 2] No such file or directory: 'C:\\Users\\$File3.xlsx'
If anyone knows why this happens and how I can fix/prevent it, that would be great!
Just thought I'd answer in case anyone else has this issue.
It has nothing to do with os. It happened because I had File3 open in Excel while pulling the list of file names. I've found out that opening a Microsoft document creates a temporary 'lock' file, which is denoted by '~$' (this is how it can re-open unsaved data if it crashes, etc.).
I found the below from here:
The files you are describing are so-called owner files (sometimes
referred to as "lock" files). An owner file is created when you work
with a document ... and it should be deleted when you save your
document and exit.
There's also a SO question about this within Microsoft files, which can be found here
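To prevent the import loop from tripping over these owner files, one option is to filter them out of the listing. This is only a sketch; list_real_files is a hypothetical helper name, and it assumes every name starting with '~$' can safely be ignored.

```python
import os


def list_real_files(path):
    """List file names in path, skipping Office owner ('~$') files."""
    return [name for name in os.listdir(path) if not name.startswith('~$')]
```

Calling `list_real_files(os.getcwd())` in place of `os.listdir(path)` leaves the rest of the code unchanged.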

Why Does a Strange File Show Up in Directory When Using os.walk()?

The project is written in Pycharm on Windows 10.
I wrote a program that grabs .docx files from a directory and searches for information. At the end of the list of file names I get this file: "~$640188.docx"
I get this error when it hits this file:
raise BadZipfile, "File is not a zip file"
zipfile.BadZipfile: File is not a zip file
This error happens when I try to pass the file '~$640188.docx' to the docx2txt process method:
text = docx2txt.process(r'C:\path\to\folder\~$640188.docx')
From what I can see, this file does not exist in the directory I'm searching nor anywhere on my computer. The other strange part is that yesterday I wasn't getting this error.
I know there are sometimes "hidden" files in directories and I ran into those before on my mac (specifically '.DS_Store') but this is a .docx file.
I currently have an ugly solution, which says "don't run the code if you run into '~$640188.docx'". My concern is that this will become more of a problem when I dump 11000 files into the directory.
Where does this file come from?
Below is the code for reference
import docx2txt
import os

check_files = []
# Collect the names of all files found under the folder
for dir, subdir, files in os.walk(r'C:\path\to\folder'):
    for file in files:
        check_files.append(file)

for file in check_files:
    print("file: {0}".format(file))
    text = docx2txt.process(r'C:\path\to\folder\{0}'.format(file))
Hidden .docx files starting with ~$ are simply temporary files created by Word while a file is actively open and being edited; the first two characters of the respective parent file's name are replaced with ~$. They are usually deleted once you save and close a document, but sometimes they manage to stick around after you quit anyway. Since they are designed to be temporary complements to a proper .docx file, they do not necessarily have the correct zip package structure at all times.
You will do well to skip those. Checking whether the file name starts with '~' should be good enough. Just add the following filter:
check_files2 = [fl for fl in check_files if fl[0] != '~']
for file in check_files2:
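The filter and the directory walk can also be combined in one place. The sketch below is an illustration rather than the asker's code: iter_docx_paths is a hypothetical generator, and it additionally joins each name with the folder it was found in, which matters once files live in subfolders.

```python
import os


def iter_docx_paths(root):
    """Yield full paths of .docx files under root, skipping '~$' temp files."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith('~$') or not name.lower().endswith('.docx'):
                continue
            # Join with dirpath so files in subfolders get the right path.
            yield os.path.join(dirpath, name)
```

Each yielded path can be handed straight to docx2txt.process.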

Read a file name list and perform code on each file in that list.

I seem to be encountering a problem every time I run this program. I am new to programming, and with some help my partner and I have developed this code. Essentially there is a text file called "Files" which contains the names of 80,000+ files to be analysed. A brief outline: the code needs to run over all the data files and write a set of file names to the output file if R_1 or R_2 has a value in the range of 0.9 to 1.1.
I have made an attempt at reading the names of the files, but it does not seem to work. Every time, it simply reports an indent error, expected indent; then, once I have added an indent, it says unexpected indent at that same location. I have made a replica directory on SkyDrive of the data files and code, and included the 3 data files I am using to develop this code. SkyDrive did not let me add the suitable data text file, which was essentially a blank file.
https://skydrive.live.com/redir?resid=E5A0B5D5F1A45A4D!231&authkey=!AJQqmATbrTr2Rko&ithint=folder%2c.txt
I know this probably isn't the most efficient code for this, but any help with the file input and output will be greatly appreciated. Also, in the file-name file I have simply put the data file names because they are in the same directory, so I assume the code will run in this directory and therefore the full file paths are not needed... is that a valid assumption, or will I need to include the full path to each file?
Thank you!
You need to change your editor settings to:
Use 4 spaces instead of each tab
Strip trailing spaces off each line
This is needed because the line after your for statement has more spaces/tabs than the next line.
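If changing the editor settings is not convenient, the same clean-up can be applied to the existing source text. The following is a rough sketch only: normalize_indentation is a made-up helper that converts tabs in the leading whitespace to spaces (4 per tab stop, an assumption matching the advice above) and strips trailing whitespace.

```python
def normalize_indentation(source, tab_width=4):
    """Replace tabs in leading whitespace with spaces and strip trailing whitespace."""
    cleaned = []
    for line in source.splitlines():
        body = line.lstrip(' \t')
        # Only the leading whitespace is expanded; tabs inside the line stay.
        indent = line[:len(line) - len(body)].expandtabs(tab_width)
        cleaned.append((indent + body).rstrip())
    return '\n'.join(cleaned) + '\n'
```

This only makes the whitespace consistent; lines that are indented to the wrong depth still have to be fixed by hand.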

I don't understand how to specify a path for a file I want to open in python

I am new to Python and I tried to import GenBank and FASTA format files.
The Biopython documentation provides an example that illustrates how to import datasets into Python.
Specifically, the Biopython Tutorial and Cookbook gives the following example on page 16:
from Bio import SeqIO
for seq_record in SeqIO.parse("ls_orchid.gbk", "genbank"):
    print(seq_record.id)
    print(repr(seq_record.seq))
    print(len(seq_record))
Now, they mention on page 14 that the Biopython source code contains this file, which is true. However, how does Python know, from the line from Bio import SeqIO, where exactly the file is?
Note that I tried the above code after installing Biopython and its components, but it never worked.
Also, can I just specify the path to the GenBank file and open it somehow?
Thank you
According to http://biopython.org/DIST/docs/tutorial/Tutorial.html#htoc10
You need to copy the files to your local directory
When this tutorial was originally written, this search gave us only 94
hits, which we saved as a FASTA formatted text file and as a GenBank
formatted text file (files ls_orchid.fasta and ls_orchid.gbk, also
included with the Biopython source code under
docs/tutorial/examples/).
If you run the search today, you’ll get hundreds of results! When
following the tutorial, if you want to see the same list of genes,
just download the two files above or copy them from docs/examples/ in
the Biopython source code. In Section 2.5 we will look at how to do a
search like this from within Python.
It seems that you need to save the ls_orchid.gbk file in the directory you run your Python script from; otherwise you need to specify the full path to the file. You can also just download any GenBank file from NCBI and add it to that directory, or specify its location like this:
for seq_record in SeqIO.parse("~/File Location/File Name", "genbank"):
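One caveat when spelling out a location: a relative name is looked up in the current working directory, and Python's file opening does not expand "~" by itself. A minimal sketch of handling both follows; resolve_data_file and print_genbank_ids are hypothetical helper names, and the second one assumes Biopython is installed.

```python
import os


def resolve_data_file(path):
    """Expand '~' and return an absolute path, or raise if the file is missing."""
    full_path = os.path.abspath(os.path.expanduser(path))
    if not os.path.isfile(full_path):
        raise FileNotFoundError(full_path)
    return full_path


def print_genbank_ids(path):
    """Print the id of every record in a GenBank file (needs Biopython)."""
    from Bio import SeqIO  # imported here so the helper above works without it
    for seq_record in SeqIO.parse(resolve_data_file(path), "genbank"):
        print(seq_record.id)
```

With this, `print_genbank_ids("ls_orchid.gbk")` either parses the file or fails with a FileNotFoundError that shows the full path Python actually tried.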
I put the GenBank and FASTA files in C:\Python27.
I could parse all kinds of other file formats, such as Newick, PhyloXML, etc.
You should contact the developers for further information.
