I have a Python script that prompts for text input, searches an online Korean dictionary, and then downloads MP3 audio files for the words found. I use the script to help me make Anki flashcards with audio. The script is originally from this post on reddit.
I can execute the script from the terminal while in the directory that the script is stored in. However, when I am in a different directory and execute the script by calling its full path, the script appears to run but does not find any words or download any MP3s. I cannot figure out why the script fails to execute correctly when I call it from a different directory.
The script is stored in the Downloads folder on my Mac: /Users/matt/Downloads
So, when I run the following commands, it works:
cd Downloads
python3 naver.py
However, when I run the following, the script executes, but doesn't download any MP3s:
python3 /Users/matt/Downloads/naver.py
The full Python script is here:
import urllib.request, urllib.parse, json, codecs, math, time

def searchWords(koreanWords):
    url = ('https://ko.dict.naver.com/api3/koko/search?' + urllib.parse.urlencode({'query': koreanWords}) + '&range=word&page=1')
    response = urllib.request.urlopen(url)
    reader = codecs.getreader("utf-8")
    jsonInfo = json.load(reader(response))
    pageCount = jsonInfo["pagerInfo"]["totalPages"]
    searchData = jsonInfo["searchResultMap"]["searchResultListMap"]["WORD"]["items"]

    for pageCountInc in range(0, pageCount):
        if pageCountInc != 0:
            url = ('https://ko.dict.naver.com/api3/koko/search?' + urllib.parse.urlencode({'query': koreanWords}) + '&range=word&page=' + str(pageCountInc+1))
            response = urllib.request.urlopen(url)
            reader = codecs.getreader("utf-8")
            jsonInfo = json.load(reader(response))
            searchData = jsonInfo["searchResultMap"]["searchResultListMap"]["WORD"]["items"]
        for z in range (0, len(searchData)):
            if searchData[z]["handleEntry"] in unchangedWordList:
                if searchData[z]["searchPhoneticSymbolList"]:
                    if searchData[z]["searchPhoneticSymbolList"][0]["phoneticSymbolPath"] != "":
                        timesDownloaded[unchangedWordList.index(searchData[z]["handleEntry"])] += 1
                        mp3Link = searchData[z]["searchPhoneticSymbolList"][0]["phoneticSymbolPath"]
                        if mp3Link not in mp3Links:
                            mp3Links.append(mp3Link)
                            # Relative file name: saved in the current working directory
                            urllib.request.urlretrieve(mp3Link, searchData[z]["handleEntry"] + str(timesDownloaded[unchangedWordList.index(searchData[z]["handleEntry"])]) + ".mp3")
                            time.sleep(.3)

def parseWords(listOfWords):
    for x in range(0, math.floor(len(listOfWords)/10)):
        tempWords = []
        for y in range(0, 10):
            tempWords.append(listOfWords[x*10+y])
        print("Searching: " + str(x+1) + "/" + str(math.ceil(len(listOfWords)/10)))
        searchWords(tempWords)
    tempWords = []
    # The leftover words start right after the last full batch of 10
    for y in range(math.floor(len(listOfWords)/10)*10, len(listOfWords)):
        tempWords.append(listOfWords[y])
    print("Searching: " + str((math.ceil(len(listOfWords)/10))) + "/" + str(math.ceil(len(listOfWords)/10)))
    searchWords(tempWords)

unfoundWords = []
unchangedWordList = []
timesDownloaded = []
mp3Links = []
wordInputs = unchangedWordList = input('Enter Words: ').split()
timesDownloaded = [0] * len(unchangedWordList)

parseWords(wordInputs)

for z in range(0, len(timesDownloaded)):
    if(timesDownloaded[z] == 0):
        unfoundWords.append(unchangedWordList[z])

if unfoundWords:
    print(",".join(str(x) for x in unfoundWords) + " could not be found.")
    print("Rerunning individual searches for unfound words.")
    print(unfoundWords)
    oldUnfoundWords = unfoundWords
    unfoundWords = []
    for x in range(0, len(oldUnfoundWords)):
        print("Searching: " + str(x+1) + "/" + str(len(oldUnfoundWords)))
        searchWords(oldUnfoundWords[x])
    for z in range(0, len(timesDownloaded)):
        if(timesDownloaded[z] == 0):
            unfoundWords.append(unchangedWordList[z])
    if unfoundWords:
        print(",".join(str(x) for x in unfoundWords) + " could not be found.")
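As a side note, the batching that parseWords does by hand with math.floor and math.ceil index arithmetic can be expressed with list slicing, which handles the final, shorter batch without any extra arithmetic. A minimal sketch; `batches` is a hypothetical helper name, not part of the original script:

```python
def batches(words, size=10):
    """Yield consecutive slices of `words` with at most `size` items each.

    Slicing takes care of the last, shorter batch, so no word is skipped
    and no empty batch is produced when len(words) is a multiple of size.
    """
    for start in range(0, len(words), size):
        yield words[start:start + size]
```

parseWords could then loop over `enumerate(batches(listOfWords), start=1)` and call `searchWords` on each group, using `math.ceil(len(listOfWords) / 10)` for the "Searching: n/total" message.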
To answer the question of how to save to a specific folder: use pathlib to construct the path to the MP3 folder.
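import os
import urllib.request
from pathlib import Path

# Build the path to the parent folder (~/Music) and create it if it is missing
mp3Dir = Path.home() / 'Music'
mp3Dir.mkdir(parents=True, exist_ok=True)

# Inside searchWords, build the file name and save it into mp3Dir
basename = (searchData[z]["handleEntry"]
            + str(timesDownloaded[unchangedWordList.index(searchData[z]["handleEntry"])]) + ".mp3")
urllib.request.urlretrieve(mp3Link, os.path.join(mp3Dir, basename))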
The reason is the following:
Relative file names in your script are resolved against the current working directory (the directory you ran the command from), not against the folder that contains the script. So when you run python3 /Users/matt/Downloads/naver.py, the MP3 files are saved in whatever directory you are currently in, or nothing is saved at all if you don't have permission to write there.
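To make the output location independent of where the command is run, the file name can be anchored on the script's own location instead of the current directory. A minimal sketch using pathlib; `output_path` is a hypothetical helper, and the script path is passed in explicitly so the function is easy to test:

```python
from pathlib import Path

def output_path(filename, script_file):
    """Return an absolute path for `filename` in the folder containing `script_file`.

    A bare name such as "word1.mp3" is resolved against the current working
    directory (wherever the command was run from). Anchoring on the script's
    own location gives the same result no matter where you run it from.
    """
    script_dir = Path(script_file).resolve().parent
    return script_dir / filename
```

In naver.py you would pass `output_path(basename, __file__)` as the second argument to `urlretrieve`. A blunter alternative is to put `os.chdir(os.path.dirname(os.path.abspath(__file__)))` near the top of the script, so every relative path resolves against the script's folder.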
Related
I'm looking to use win32com to compare multiple Word documents. The naming convention is the same, except that the modified document has "test.docx" added to the file name. Below is the code I have, but it fails with "pywintypes.com_error: (-2147023170, 'The remote procedure call failed.', None, None)". Any ideas on how I can get this to work? I have around 200 documents to compare, so Python seems to be the way to do it.
import win32com.client
from docx import Document
import os

def get_docx_list(dir_path):
    '''
    :param dir_path:
    :return: List of docx files in the current directory
    '''
    file_list = []
    for path,dir,files in os.walk(dir_path):
        for file in files:
            if file.endswith("docx") == True and str(file[0]) != "~": #Locate the docx document and exclude temporary files
                file_root = path+"\\"+file
                file_list.append(file_root)
    print("The directory found a total of {0} related files!".format(len(file_list)))
    return file_list

def main():
    modified_path = r"C:\...\Replaced\SWI\\"
    original_path = r"C:\...\Replaced\SWI original\\"
    for i, file in enumerate(get_docx_list(modified_path), start=1):
        print(f"{i}、Files in progress:{file}")
        for i, files in enumerate(get_docx_list(original_path), start=1):
            Application = win32com.client.gencache.EnsureDispatch("Word.Application")
            Application.CompareDocuments(
                Application.Documents.Open(modified_path + file),
                Application.Documents.Open(str(original_path) + files))
            Application.ActiveDocument.SaveAs(FileName=modified_path + files + "Comparison.docx")
            Application.Quit()

if __name__ == '__main__':
    main()
For anyone looking for a way to do bulk Word comparisons, below is the code I successfully ran on a few hundred documents. Delete the print statements once you have the naming convention sorted.
import win32com.client
import os

def main():
    #path directories
    modified_path = r"C:\Users\Admin\Desktop\Replaced\SOP- Plant and Equipment\\"
    original_path = r"C:\Users\Admin\Desktop\Replaced\SOP - Plant and Equipment Original\\"
    save_path = r"C:\Users\Admin\Desktop\Replaced\TEST\\"
    file_list1 = os.listdir(r"C:\Users\Admin\Desktop\Replaced\SOP- Plant and Equipment\\")
    file_list2 = os.listdir(r"C:\Users\Admin\Desktop\Replaced\SOP - Plant and Equipment Original\\")
    #text counter
    Number = 0
    #loop through files and compare
    for file in file_list1:
        for files in file_list2:
            #if files match do comparison, naming convention to be changed
            if files[:-5] + " test.docx" == file:
                Number += 1
                print(f"The program has completed {Number} of a total of {len(file_list1)} related files!")
                try:
                    Application = win32com.client.gencache.EnsureDispatch("Word.Application")
                    Application.CompareDocuments(
                        Application.Documents.Open(modified_path + file),
                        Application.Documents.Open(str(original_path) + files))
                    Application.ActiveDocument.ActiveWindow.View.Type = 3
                    Application.ActiveDocument.SaveAs(FileName=save_path + files[:-5] + " Comparison.docx")
                except:
                    Application.Quit()
                    pass

if __name__ == '__main__':
    main()
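The nested loops above compare every modified file name against every original file name. A dictionary keyed on the original's name without ".docx" finds each partner directly. A sketch of just the pairing step, assuming the " test.docx" suffix convention from the question; `pair_documents` is a hypothetical helper and leaves the Word automation itself unchanged:

```python
from pathlib import Path

def pair_documents(modified_names, original_names, suffix=" test.docx"):
    """Match 'Name test.docx' (modified) with 'Name.docx' (original).

    Returns a list of (modified_name, original_name) tuples; names without
    a counterpart are skipped. Word's '~$' lock files are ignored.
    """
    # Index the originals by their name without the .docx extension.
    originals = {
        Path(name).stem: name
        for name in original_names
        if name.lower().endswith(".docx") and not name.startswith("~$")
    }
    pairs = []
    for name in modified_names:
        if name.endswith(suffix):
            stem = name[: -len(suffix)]
            if stem in originals:
                pairs.append((name, originals[stem]))
    return pairs
```

Each returned pair can then go through the same EnsureDispatch / CompareDocuments / SaveAs steps as in the answer.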
Good day.
I wrote a little Python program to help me easily create .cbc files for Calibre, which is just a renamed .zip file with a text file called comics.txt for TOC purposes. Each chapter is another zip file.
The issue is that the last zip file created always gives the error "Unexpected end of data". The file itself is not corrupt: if I unzip it and rezip it, it works perfectly. From experimenting, the problem seems to be that Python doesn't close the last zip file after writing it, since I can't delete that zip while the program is still running; it is still open in Python. Needless to say, Calibre doesn't like the file and fails to convert it unless I manually rezip the affected chapters.
The code is as follows: it checks the folders for non-image files, zips each folder, zips those zips together while creating the text file, and "changes" the extension.
import re, glob, os, zipfile, shutil, pathlib, gzip, itertools

Folders = glob.glob("*/")
items = len(Folders)
cn_list = []
cn_list_filtered = []
dirs_filtered = []
ch_id = ["c", "Ch. "]
subdir_im = []
total = 0
Dirs = next(os.walk('.'))[1]

for i in range(0, len(Dirs)):
    for items in os.listdir("./" + Dirs[i]):
        if items.__contains__('.png') or items.__contains__('.jpg'):
            total+=1
        else:
            print(items + " not an accepted format.")
    subdir_im.append(total)
    total = 0

for fname in Folders:
    if re.search(ch_id[0] + r'\d+' + r'[\S]' + r'\d+', fname):
        cn = re.findall(ch_id[0] + "(\d+[\S]\d+)", fname)[0]
        cn_list.append(cn)
    elif re.search(ch_id[0] + r'\d+', fname):
        cn = re.findall(ch_id[0] + "(\d+)", fname)[0]
        cn_list.append(cn)
    elif re.search(ch_id[1] + r'\d+' + '[\S]' + r'\d+', fname):
        cn = re.findall(ch_id[1] + "(\d+[\S]\d+)", fname)[0]
        cn_list.append(cn)
    elif re.search(ch_id[1] + r'\d+', fname):
        cn = re.findall(ch_id[1] + "(\d+)", fname)[0]
        cn_list.append(cn)
    else:
        print('Warning: File found without proper filename format.')

cn_list_filtered = set(cn_list)
cn_list_filtered = sorted(cn_list_filtered)
cwd = os.getcwd()
Dirs = Folders
subdir_zi = []
total = 0

for i in range(0, len(cn_list_filtered)):
    for folders in Dirs:
        if folders.__contains__(ch_id[0] + cn_list_filtered[i] + " ") \
                or folders.__contains__(ch_id[1] + cn_list_filtered[i] + " "):
            print('Zipping folder ', folders)
            namezip = "Chapter " + cn_list_filtered[i] + ".zip"
            current_zip = zipfile.ZipFile(namezip, "a")
            for items in os.listdir(folders):
                if items.__contains__('.png') or items.__contains__('.jpg'):
                    current_zip.write(folders + "/" + items, items)
                    total+=1
            subdir_zi.append(total)
            total = 0

print('Folder contents in order:', subdir_im, ' Total:', sum(subdir_im))
print("Number of items per zip: ", subdir_zi, ' Total:', sum(subdir_zi))
if subdir_im == subdir_zi:
    print("All items in folders have been successfully zipped")
else:
    print("Warning: File count in folders and zips do not match. Please check the affected chapters")

zips = glob.glob("*.zip")
namezip2 = os.path.basename(os.getcwd()) + ".zip"
zipfinal = zipfile.ZipFile(namezip2, "a")
for i in range(0, len(zips), 1):
    zipfinal.write(zips[i],zips[i])
Data = []
for i in range (0,len(cn_list_filtered),1):
    Datai = ("Chapter " + cn_list_filtered[i] + ".zip" + ":Chapter " + cn_list_filtered[i] + "\r\n")
    Data.append(Datai)
Dataok = ''.join(Data)
with zipfile.ZipFile(namezip2, 'a') as myzip:
    myzip.writestr("comics.txt", Dataok)
zipfinal.close()
os.rename(namezip2, namezip2 + ".cbc")

os.system("pause")
I am by no means a programmer; this is just a Frankenstein's monster of code I eventually managed to put together by reading threads, but this last issue has me stumped.
Some solutions I tried are:
for i in range(0, len(zips), 1):
    zipfinal.write(zips[i],zips[i])
    zips[i].close()
Fails with:
zips[i].close()
AttributeError: 'str' object has no attribute 'close'
and:
for i in range(0, len(zips), 1):
    zipfinal.write(zips[i],zips[i])
zips[len(zips)].close()
Fails with:
zips[len(zips)].close()
IndexError: list index out of range
Thanks for the help.
This solved my issue:
import io
import zipfile

def generate_zip(file_list, file_name=None):
    zip_buffer = io.BytesIO()
    zf = zipfile.ZipFile(zip_buffer, mode="w", compression=zipfile.ZIP_DEFLATED)
    for file in file_list:
        print(f"Filename: {file[0]}\nData: {file[1]}")
        zf.writestr(file[0], file[1])
    zf.close()  # the key fix: close() writes the zip's central directory
    with open(file_name, 'wb') as f:  # the with block closes f
        f.write(zip_buffer.getvalue())
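The underlying rule is that a ZipFile only becomes a complete archive once close() has run, because close() writes the central directory at the end of the file. A sketch of the question's two-level packing using `with` blocks, so every chapter zip is closed before it is added to the outer archive. The function name `pack_chapters` and its input structure (a dict of chapter label to image paths) are hypothetical; the comics.txt line format follows the question's code:

```python
import os
import zipfile

def pack_chapters(chapters, outer_name):
    """Zip each chapter's images, then zip those zips into one archive.

    `chapters` maps a chapter label to a list of image paths. Each inner
    ZipFile is closed by its `with` block before the outer archive reads
    it, so no chapter zip is left half-written.
    """
    inner_names = []
    for label, image_paths in sorted(chapters.items()):
        inner_name = f"Chapter {label}.zip"
        with zipfile.ZipFile(inner_name, "w") as inner:
            for path in image_paths:
                inner.write(path, os.path.basename(path))
        # Leaving the with block closed the file, so it is now complete.
        inner_names.append(inner_name)

    # One 'Chapter N.zip:Chapter N' line per chapter, as in the question.
    toc = "".join(f"{name}:{name[:-4]}\r\n" for name in inner_names)
    with zipfile.ZipFile(outer_name, "w") as outer:
        for name in inner_names:
            outer.write(name, name)
        outer.writestr("comics.txt", toc)
    return inner_names
```

Mode "w" is used instead of "a" so that rerunning the script replaces old archives rather than appending duplicate entries to them; the outer file can then be renamed to .cbc as before.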
I'm trying to download an export of a space as a zip file, but somehow Python downloads an empty, corrupted zip file. When I download the file manually through the browser, everything is OK.
I use Python 2.7.13
#!/usr/bin/python
import xmlrpclib
import time
import urllib

confluencesite = "https://confluence.com"
server = xmlrpclib.ServerProxy(confluencesite + '/rpc/xmlrpc')
username = '*'
password = '*'
token = server.confluence2.login(username, password)
loginString = "?os_username=" + username + "&os_password=" + password
filelist = ""
start = True
spacesummary = server.confluence2.getSpaces(token)

for space in spacesummary:
    #if space['name'] == "24-codING":
    #    start = True
    #    continue
    if start:
        if space['type'] == 'global':
            print "Exporting space " + space['name']
            spaceDownloadUrl = server.confluence2.exportSpace(token, space['key'],
                                                              "TYPE_XML",
                                                              exportAll['true'])
            filename = spaceDownloadUrl.split('/')[-1].split('#')[0].split('?')[0]
            time.sleep(0.5)
            urllib.urlretrieve(spaceDownloadUrl + loginString, filename)
            print filename + " saved."
            f = open("exportedspaces.txt", 'a')
            f.write(filename + "\n")
            f.close()
It was solved by Coldspeed's answer, changing the following:
loginString = "?os_username=" + username + "&os_password=" + password
to
loginString = "?os_username=" + username + "&os_password=" + password
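A common pitfall when adding credentials to a download URL is concatenating "?os_username=..." onto a URL that may already carry a query string or a #fragment (the filename parsing above strips both); whether that was the cause here is not stated. A sketch that merges parameters into whatever query the URL already has. It is written for Python 3's urllib.parse (in Python 2.7 the same functions live in the urlparse module, with urlencode in urllib); `add_query_params` is a hypothetical helper:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def add_query_params(url, params):
    """Return `url` with `params` merged into its query string.

    Works whether or not `url` already contains a '?', keeps any
    #fragment at the end, and percent-encodes the values (so a password
    containing '&' cannot break the query).
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.extend(params.items())
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For example, `add_query_params(spaceDownloadUrl, {"os_username": username, "os_password": password})` would replace the string concatenation with loginString.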
I have a Python script that does what I want, without any errors, when I run it from Eclipse.
Now I want to create a batch file that will run my script in a loop (infinitely).
The first problem is that when I run the .bat file, I get a second cmd window that shows the logging from my Python script (which shows me that it is running), but when the main process of the script starts (which can take anywhere from a minute to several hours), it exits within a few seconds without actually running the whole script. I have used start /wait, but it doesn't seem to work. Here is the simple batch file I have created:
@echo off
:start
start /wait C:\Python32\python.exe C:\Users\some_user\workspace\DMS_GUI\CheckCreateAdtf\NewTest.py
goto start
So I want the .bat file to run my script, wait for it to finish (even if that takes hours), and then run it again.
I have also tried creating a second .bat file that calls the one above with start /wait, with no success.
Ideally I would like the window to stay open with all the logging from my script, but that is another issue that can be solved later.
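The wait-then-repeat loop described above can also be driven from Python instead of a batch file. A minimal sketch, assuming Python 3.6+ (subprocess.run and f-strings); `run_forever`, `max_runs` and `pause_seconds` are hypothetical names, with `max_runs` added only so the loop can be tested:

```python
import subprocess
import sys
import time

def run_forever(script_path, working_dir, max_runs=None, pause_seconds=0):
    """Run a Python script over and over, waiting for each run to finish.

    subprocess.run blocks until the child process exits, so a run that takes
    hours is waited for in full before the next one starts. `working_dir`
    becomes the child's current directory, so anything in the script that
    depends on the current directory sees the same folder on every run.
    max_runs=None loops forever; a number is handy for testing.
    """
    exit_codes = []
    while max_runs is None or len(exit_codes) < max_runs:
        result = subprocess.run([sys.executable, script_path], cwd=working_dir)
        exit_codes.append(result.returncode)
        # A non-zero exit code means the script stopped on an error.
        print(f"Run {len(exit_codes)} finished with exit code {result.returncode}")
        time.sleep(pause_seconds)
    return exit_codes
```

Passing an absolute `script_path` avoids any ambiguity, since a relative one is looked up from `working_dir`. The child's output goes to the same console window, which also keeps the logging visible.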
def _open_read_file(self):
    logging.debug("Checking txt file with OLD DB-folder sizes")
    content = []
    with open(self._pathToFileWithDBsize) as f:
        content = f.read().splitlines()
    for p in content:
        name,size = (p.split(","))
        self._folder_sizes_dic[name] = size

def _check_DB(self):
    logging.debug("Checking current DB size")
    skippaths = ['OtherData','Aa','Sss','asss','dss','dddd']
    dirlist = [ item for item in os.listdir(self._pathToDBparentFolder) if os.path.isdir(os.path.join(self._pathToDBparentFolder, item)) ]
    for skip in skippaths:
        if skip in dirlist:
            dirlist.remove(skip)
    MB=1024*1024.0
    for dir in dirlist:
        folderPath = self._pathToDBparentFolder +"\\"+str(dir)
        fso = com.Dispatch("Scripting.FileSystemObject")
        folder = fso.GetFolder(folderPath)
        size = str("%.5f"%(folder.Size/MB))
        self._DB_folder_sizes_dic[dir] = size

def _compare_fsizes(self):
    logging.debug("Comparing sizes between DB and txt file")
    for (key, value) in self._DB_folder_sizes_dic.items():
        if key in self._folder_sizes_dic:
            if (float(self._DB_folder_sizes_dic.get(key)) - float(self._folder_sizes_dic.get(key)) < 100.0 and float(self._DB_folder_sizes_dic.get(key)) - float(self._folder_sizes_dic.get(key)) > -30.0):
                pass
            else:
                self._changed_folders.append(key)
        else:
            self._changed_folders.append(key)

def _update_file_with_new_folder_sizes(self):
    logging.debug("Updating txt file with new DB sizes")
    # with closes (and flushes) the file once all sizes are written
    with open(self._pathToFileWithDBsize,'w') as file:
        for key,value in self._DB_folder_sizes_dic.items():
            file.write(str(key)+","+str(value)+"\n")

def _create_paths_for_changed_folders(self):
    logging.debug("Creating paths to parse for the changed folders")
    full_changed_folder_parent_paths = []
    for folder in self._changed_folders:
        full_changed_folder_parent_paths.append(self._pathToDBparentFolder +"\\"+str(folder))
    for p in full_changed_folder_parent_paths:
        for path, dirs, files in os.walk(p):
            if not dirs:
                self._full_paths_to_check_for_adtfs.append(path)

def _find_dat_files_with_no_adtf(self):
    logging.debug("Finding files with no adtf txt")
    for path in self._full_paths_to_check_for_adtfs:
        for path, dirs, files in os.walk(path):
            for f in files:
                if f.endswith('_AdtfInfo.txt'):
                    hasAdtfFilename = f.replace('_AdtfInfo.txt', '.dat')
                    self.hasADTFinfos.add(path + "\\" + hasAdtfFilename)
                    self.adtf_files = self.adtf_files + 1
                elif f.endswith('.dat'):
                    self.dat_files = self.dat_files + 1
                    self._dat_file_paths.append(path + "\\" + f)
    logging.debug("Checking which files have AdtfInfo.txt, This will take some time depending on the number of .dat files ")
    for file in self._dat_file_paths:
        if file not in self.hasADTFinfos:
            self._dat_with_no_adtf.append(file)
    self.files_with_no_adtf = len(self._dat_with_no_adtf)
    #self.unique_paths_from_log = set(full_paths_to_check_for_adtfs)
    logging.debug("Files found with no adtf " + str(self.files_with_no_adtf))

def _create_adtf_info(self):
    logging.debug("Creating Adtf txt for dat files")
    files_numbering = 0
    for file in self._dat_with_no_adtf:
        file_name = str(file)
        adtf_file_name_path = file.replace('.dat','_AdtfInfo.txt')
        exe_path = r"C:\Users\some_user\Desktop\some.exe "
        path_to_dat_file = file_name
        path_to_adtf_file = adtf_file_name_path
        command_to_subprocess = exe_path + path_to_dat_file + " -d "+ path_to_adtf_file
        #Call VisionAdtfInfoToCsv
        # check_output runs the command once and waits for it; a separate
        # subprocess.Popen call here would start the same command a second time
        process_response = subprocess.check_output(command_to_subprocess)
        #if index0 in response, adtf could not be created because .dat file is probably corrupted
        if "index0" in str(process_response):
            self._corrupted_files_paths.append(path_to_dat_file)
            self._files_corrupted = self._files_corrupted + 1
            self._corrupted_file_exist_flag = True
        else:
            self._files_processed_successfully = self._files_processed_successfully + 1
        files_numbering = files_numbering + 1
The functions are called in this order:
self._open_read_file()
self._check_DB()
self._compare_fsizes()
self._create_paths_for_changed_folders()
self._find_dat_files_with_no_adtf()
self._create_adtf_info()
self._check_DB()
self._update_file_with_new_folder_sizes()
OK, it seems that the .exe called by the script was returning an error, and that is why the script was finishing so fast; I had thought the .bat file was not waiting. I should have placed the .bat file in the .exe's folder, and now the whole thing runs perfectly.
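In the question's _create_adtf_info, subprocess.check_output raises CalledProcessError whenever the command exits with a non-zero code, which ends the script unless it is caught; that fits the fast exit described above. A sketch that records the exit code instead of raising, assuming Python 3.5+ (subprocess.run); `run_tool` is a hypothetical helper:

```python
import subprocess

def run_tool(args, cwd=None):
    """Run an external program and return (exit_code, output_text).

    Unlike subprocess.check_output, a non-zero exit code does not raise
    CalledProcessError, so one failing file does not end the whole batch.
    stderr is merged into stdout so error messages show up in the output.
    `args` is a list, so paths containing spaces are passed intact.
    """
    result = subprocess.run(args, cwd=cwd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, universal_newlines=True)
    return result.returncode, result.stdout
```

In _create_adtf_info this could replace the check_output call, for example `code, output = run_tool([exe, dat_path, "-d", adtf_path], cwd=exe_folder)` (all three names hypothetical), treating a non-zero `code` or "index0" in `output` as a corrupted file. Setting `cwd` to the .exe's folder mirrors the fix described above.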
I usually get songs with the artist and the song name in the file name (for example, "britney spears - oops i did it again").
My script has two purposes:
1. Add the artist name and song title to the MP3's attributes (using eyed3).
2. Create a new folder for the artist in my main music folder (if I don't already have one).
My problem is that if the MP3 file has no attributes, I can't add new ones.
Here is my code (it's my first one :-)). Thanks!
# This is version 0.2 of my code
import os
import shutil
import eyed3.id3

songs_path = raw_input("Please insert the path of your Songs: ")
music_path = raw_input("Please insert the path of your music folders location: ")

# This function is supposed to list the files in a path
def files_in_folder(m):
    Files = os.listdir(m)
    return Files

# songs_path is the folder entered above (downloads_path was never defined)
mp3_files_list = files_in_folder(songs_path)
artist_list = files_in_folder(music_path)

for i in mp3_files_list:
    song_artist, song_title = i.split(' - ')
    if not os.path.exists(music_path + '\\' + song_artist):
        os.mkdir(music_path + '\\' + song_artist, 0777 )
    src_file = os.path.join(songs_path, i)
    dst_file = os.path.join(music_path + '\\' + song_artist + '\\' + song_title)
    print src_file
    print dst_file
    shutil.move(src_file, dst_file)
    track_mp3_file = eyed3.load(dst_file)
    if track_mp3_file.tag is None:
        track_mp3_file.tag = eyed3.id3.Tag()
        track_mp3_file.tag.file_info = eyed3.id3.FileInfo(dst_file)
    track_mp3_file.tag.artist = unicode(song_artist, "UTF-8")
    print track_mp3_file.tag.artist
    track_mp3_file.tag.title = unicode(song_title, "UTF-8")
    track_mp3_file.tag.save()
Try saving the tag with a different ID3 version:
track_mp3_file.tag.save(version=(2, 3, 0))
From Wikipedia:
Windows Explorer and Windows Media Player cannot handle ID3v2.4 tags in any version, up to and including Windows 8 / Windows Media Player 12. Windows can understand ID3v2 up to and including version 2.3.
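Separately from the tag version, the question's `i.split(' - ')` keeps the ".mp3" extension in the title and raises ValueError for any file name with zero or several " - " separators. A sketch of a more forgiving parser; `parse_song_filename` is a hypothetical helper, written for Python 3:

```python
import os

def parse_song_filename(filename):
    """Split 'Artist - Title.mp3' into ('Artist', 'Title').

    The extension is removed first so it does not end up in the title tag,
    and only the first ' - ' is treated as the separator, so titles that
    themselves contain ' - ' stay intact. Returns None if the name does not
    follow the 'Artist - Title' convention.
    """
    stem, _ext = os.path.splitext(os.path.basename(filename))
    if " - " not in stem:
        return None
    artist, title = stem.split(" - ", 1)
    return artist.strip(), title.strip()
```

In the question's loop, the result would replace `i.split(' - ')`, skipping any file for which it returns None.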