Faster way to unzip JSON files with Python

I have a program that downloads directories, unzips them, and finally decompresses all the JSON files inside each directory.
I have to download 1260 directories; each directory holds 1000 files and weighs in at 300 MB.
So it looks like this:
dir1.zip
|_file1.json.gz
...
|_file1000.json.gz
dir2.zip
|_file1.json.gz
...
|_file1000.json.gz
...
dir1260.zip
|_file1.json.gz
...
|_file1000.json.gz
This is my code:
import glob
import subprocess
import zipfile

def ProcesssDir(dirs_links_file):
    with open(dirs_links_file, 'r') as inputFile:
        lines = inputFile.readlines()
    for line in lines:
        # Download
        directory = subprocess.Popen("wget -c " + line, shell=True).wait()
        # Unzip:
        for nameDirZip in glob.glob('*.zip'):
            UnzipDir = zipfile.ZipFile(nameDirZip)
            UnzipDir.extractall()
            nameDir = nameDirZip[:-4] + "/"  # This is just to get the name of the new dir.
            subprocess.Popen("gunzip -d " + nameDir + "*.gz", shell=True).wait()
This works, but it is very, very slow: about 20 minutes per directory.
How can I do this faster?
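One way to attack the decompression step (a sketch, not a drop-in replacement for the code above; the function names are mine): skip the per-file `gunzip` shell call and decompress the `.gz` files with Python's own `gzip` module from a pool of workers. zlib releases the GIL while inflating, so even threads overlap the CPU work and the disk I/O.

```python
import glob
import gzip
import os
import shutil
from concurrent.futures import ThreadPoolExecutor

def gunzip_one(gz_path):
    """Decompress one .gz file next to the original, then delete the .gz."""
    out_path = gz_path[:-3]  # strip the ".gz" suffix
    with gzip.open(gz_path, 'rb') as src, open(out_path, 'wb') as dst:
        shutil.copyfileobj(src, dst)
    os.remove(gz_path)
    return out_path

def gunzip_dir(dir_name, workers=8):
    """Decompress every .gz file in dir_name using a pool of worker threads."""
    gz_files = glob.glob(os.path.join(dir_name, '*.gz'))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(gunzip_one, gz_files))
```

A further win, if the network allows it, is to start decompressing one directory while the next one is still downloading, instead of doing the two phases strictly in sequence.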

Related

Is there a way to stop ffmpeg from combining mp4s at max mp4 size?

I am merging 64 GB of mp4s together, but ffmpeg goes past the file size limit and corrupts the output. Is there a way to stop ffmpeg at the 100-hour mark, create another file, resume, and repeat until finished?
This is my Python code, along with the ffmpeg command I used to generate the mp4. It works fine with fewer files.
from moviepy.editor import *
import os
from natsort import natsorted

L = []
total = 0
for root, dirs, files in os.walk(r"F:\door"):
    #files.sort()
    files = natsorted(files)
    with open("list.txt", "a") as filer:
        for file in files:
            if os.path.splitext(file)[1] == '.mp4':
                filePath = os.path.join(root, file)
                head, tail = os.path.split(filePath)
                filePath = "file '" + str(tail) + "'\n"
                print(filePath)
                filer.write(filePath)
# run in cmd: ffmpeg -f concat -i list.txt -c copy output.mp4
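One possibility worth trying (a sketch only, not tested against the files above, and `build_segment_command` is my own name): instead of stopping and resuming manually, ffmpeg's segment muxer can split the stream-copied output into chunks of a fixed maximum duration, e.g. 100 hours each.

```python
def build_segment_command(list_file, out_pattern, segment_seconds):
    """Build an ffmpeg command that concatenates the clips named in
    list_file and cuts the stream-copied output into chunks of at
    most segment_seconds each, instead of one giant mp4."""
    return [
        "ffmpeg",
        "-f", "concat", "-safe", "0", "-i", list_file,
        "-c", "copy",
        "-f", "segment",
        "-segment_time", str(segment_seconds),
        "-reset_timestamps", "1",
        out_pattern,  # e.g. "output_%03d.mp4"
    ]
```

The resulting list can be passed to `subprocess.run(...)`; note that with `-c copy` the cut points land on keyframes, so segment lengths are approximate.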

OS Directory File Search Not Showing Files Containing Name

I have a bunch of downloaded files that I am trying to move from my Downloads folder to a specific folder, but for some reason when I download the files a random number gets prepended to the filename (not consistently). So I am searching for any file that contains the original file name as a part of it, but it's not returning anything. Example below.
import os
import shutil

files = ["testing_123.pptx", "othertype.doc"]
for filename in files:
    downloads = r"C:\Users\xx\Downloads"
    ls = os.listdir(downloads)
    ls2 = [s for s in ls if filename in s]
    f = ls2[0]
    original = os.path.join(downloads, f)
    target = os.path.join(dest, filename)  # dest is defined earlier in my script
    shutil.move(original, target)
Directory shows:
ls = ["2467231_testing_123.pptx","4234_othertype.doc", .....]
Why is this not pulling anything?
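For what it's worth, the substring/suffix test itself does match prefixed names; a minimal sketch (`find_by_suffix` is my own helper name) reproduces the matching in isolation, which can help rule out the comprehension and point the search at the `downloads` path or the actual directory contents instead:

```python
def find_by_suffix(entries, wanted):
    """Return the directory entries that end with the wanted original
    name, ignoring any random numeric prefix added on download."""
    return [entry for entry in entries if entry.endswith(wanted)]
```

If this returns matches for the listed names but the real script finds nothing, the likely culprits are `os.listdir` pointing at the wrong folder or a case mismatch in the extension.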

Recursively convert DOS files to UNIX format in Python

I have a function that I use to organize files into folders. However, I would like to be able to have this function also recursively convert the files from DOS to UNIX as part of the routine. I know about the dos2unix command, but I keep getting a syntax error when I try to use it (dos2unix file SyntaxError: invalid syntax). I'm not sure why though.
Here's the function I'm running
import glob
import os
import shutil
from os import path
from natsort import natsort_keygen, ns

def listFiles(aname, nAtoms):  # This function organizes the .run files for all atoms into folders
    iniPath = os.getcwd()
    runfiles = []
    folders = []
    filenames = os.listdir(".")
    # This populates the list runfiles with any files that have a .run extension
    for file in glob.glob("*.run"):
        dos2unix file  # <-- this is the line that raises the SyntaxError
        runfiles.append(file)
    # This populates the list folders with any folders that are present in the current working directory
    for file in filenames:
        if os.path.isdir(os.path.join(os.path.abspath("."), file)):
            folders.append(file)
    # Perform a natural sort of the files and folders (i.e. C1,C2,C3...C10,C11,etc, instead of C1,C10,C11...C2,C3)
    natSortKey = natsort_keygen(key=lambda y: y.lower(), alg=ns.IGNORECASE)
    runfiles.sort(key=natSortKey)
    folders.sort(key=natSortKey)
    # This loop moves the files to their respective atom folders and deletes the version in the original directory
    i = 1
    nf = 0
    for i in range(0, nAtoms + 1):
        atomDir = aname + str(i)
        for item in runfiles:
            if item.startswith(atomDir):
                if nf >= i * 5:
                    break
                else:
                    shutil.copy(path.join(iniPath, item), atomDir)
                    os.remove(item)
                    nf += 1
    print("The files are:")
    print(runfiles, "\n")
    print("The folders are:")
    print(folders, "\n")
Any suggestions?
Thanks!
This runs dos2unix recursively on all files in the dosdir directory:
import os
os.system(r"find dosdir -type f -exec dos2unix -u {} \;")
It requires the find and dos2unix system commands, which are usually available by default.
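If shelling out is undesirable, the same conversion can be done in pure Python (a sketch; the function names are mine), which also avoids depending on find and dos2unix being installed:

```python
import os

def dos2unix_file(path):
    """Rewrite a single file in place, turning CRLF line endings into LF."""
    with open(path, 'rb') as f:
        data = f.read()
    converted = data.replace(b'\r\n', b'\n')
    if converted != data:  # only rewrite files that actually changed
        with open(path, 'wb') as f:
            f.write(converted)

def dos2unix_tree(top):
    """Apply the conversion to every file under the top directory."""
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            dos2unix_file(os.path.join(dirpath, name))
```

Working on bytes sidesteps any encoding guesswork, though binary files under the tree would be touched too, so in a mixed directory it may be worth filtering by extension first.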

Uncompyle6 convert pyc to py file python 3 (Whole directory)

I have 200 pyc files I need to convert in a folder. I am aware of converting pyc to py files one at a time through uncompyle6 -o . 31.pyc, but as I have so many pyc files, this would take a long time. I've found lots of documentation, but not much on bulk converting to py files. uncompyle6 -o . *.pyc was not supported.
Any idea on how I can achieve this?
Might not be perfect but it worked great for me.
import os
import uncompyle6

your_directory = ''
for dirpath, b, filenames in os.walk(your_directory):
    for filename in filenames:
        if not filename.endswith('.pyc'):
            continue
        filepath = dirpath + '/' + filename
        original_filename = filename.split('.')[0]
        original_filepath = dirpath + '/' + original_filename + '.py'
        with open(original_filepath, 'w') as f:
            uncompyle6.decompile_file(filepath, f)
This is natively supported by uncompyle6:
uncompyle6 -ro <output_directory> <python_directory>
-r tells the tool to recurse into subdirectories.
-o tells the tool to output to the given directory.
In operating systems with shell filename expansion, you might be able to use the shell's file expansion ability. For example:
uncompyle6 -o /tmp/unc6 myfiles/*.pyc
If you need something fancier or more control, you could always write some code that does the fancier expansion. Here is the above done in POSIX shell, filtering out the single file myfiles/huge.pyc:
cd myfiles
for pyc in *.pyc; do
    if [ "$pyc" != huge.pyc ]; then
        uncompyle6 -o /tmp/unc "$pyc"
    fi
done
Note: It seems this question was also asked in Issue on output directory while executing commands with windows batch command "FOR /R"
Thank you for the code; I extended it to recurse into nested subdirectories. Save it as uncompile.py in the directory to be converted and run "python uncompile.py" from a command prompt: it converts the pyc files to py in the current working directory, with error handling, and on reruns it skips (recovers) files that already have a matching .py.
import os
import uncompyle6

# Use current working directory
your_directory = os.getcwd()

# Function processing current dir
def uncompilepath(mydir):
    for dirpath, b, filenames in os.walk(mydir):
        for d in b:
            folderpath = dirpath + '/' + d
            print(folderpath)
            # Recursive sub-dir call
            uncompilepath(folderpath)
        for filename in filenames:
            if not filename.endswith('.pyc'):
                continue
            filepath = dirpath + '/' + filename
            original_filename = filename.split('.')[0]
            original_filepath = dirpath + '/' + original_filename + '.py'
            # Ignore if already uncompiled
            if os.path.exists(original_filepath):
                continue
            with open(original_filepath, 'w') as f:
                print(filepath)
                # Error handling
                try:
                    uncompyle6.decompile_file(filepath, f)
                except Exception:
                    print("Error")

uncompilepath(your_directory)

Read all files in directory and subdirectories in Python

I'm trying to translate this bash line into Python:
find /usr/share/applications/ -name "*.desktop" -exec grep -il "player" {} \; | sort | while IFS=$'\n' read APPLI ; do grep -ilqw "video" "$APPLI" && echo "$APPLI" ; done | while IFS=$'\n' read APPLI ; do grep -iql "nodisplay=true" "$APPLI" || echo "$(basename "${APPLI%.*}")" ; done
The result should be all the video apps installed on an Ubuntu system:
-> read all the .desktop files in the /usr/share/applications/ directory
-> filter on the strings "video" and "player" to find the video applications
-> filter out "nodisplay=true" and "audio" so that audio players and no-GUI apps are not shown
The result I would like to have is (for example):
kmplayer
smplayer
vlc
xbmc
So, I've tried this code:
import os
import fnmatch
apps = []
for root, dirnames, filenames in os.walk('/usr/share/applications/'):
    for dirname in dirnames:
        for filename in filenames:
            with open('/usr/share/applications/' + dirname + "/" + filename, "r") as auto:
                a = auto.read(50000)
                if "Player" in a or "Video" in a or "video" in a or "player" in a:
                    if "NoDisplay=true" not in a or "audio" not in a:
                        print "OK: ", filename
                        filename = filename.replace(".desktop", "")
                        apps.append(filename)
print apps
But I have a problem with the recursive walk over the files...
How can I fix it?
Thanks
Looks like you are writing the os.walk() loop incorrectly. There is no need for a nested dir loop.
Please refer to the Python manual for the correct example:
https://docs.python.org/2/library/os.html?highlight=walk#os.walk
for root, dirs, files in os.walk('python/Lib/email'):
    for file in files:
        with open(os.path.join(root, file), "r") as auto:
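Putting the corrected loop together with the original filters, a complete Python 3 sketch might look like this (`list_video_apps` is my own name, and the string tests are a simplification of the bash pipeline, not an exact translation):

```python
import os

def list_video_apps(app_dir='/usr/share/applications/'):
    """Walk app_dir, read each .desktop file, and keep the ones that
    mention a player/video but are neither hidden nor audio-related."""
    apps = []
    for root, dirs, files in os.walk(app_dir):
        for name in files:
            if not name.endswith('.desktop'):
                continue
            with open(os.path.join(root, name), errors='ignore') as f:
                text = f.read().lower()  # case-insensitive, like grep -i
            if ('player' in text or 'video' in text) and \
               'nodisplay=true' not in text and 'audio' not in text:
                apps.append(name[:-len('.desktop')])
    return sorted(apps)
```

Lower-casing the contents once replaces the separate "Player"/"player" checks, and note the exclusion uses `and`, because a file should be dropped if *either* NoDisplay or audio matches; the `or` in the original code keeps almost everything.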
