How to find and change all json.gz files with Python

I have a folder with many subfolders, and each subfolder contains a json.gz file with the same name and the same JSON structure. For example:
...\a\0\b.json.gz
...\a\1\b.json.gz
...\a\2\b.json.gz
What I want to do is give the path to ...\a and then alter a certain value in each JSON file.
My code is like this:
def getAllSourceFile(folder):
    arrSource = []
    for root, dirs, files in os.walk(folder):
        for file in files:
            if file == 'b.json.gz':
                allSourceFile = os.path.join(root, file)
                arrSource.append(allSourceFile)
    return arrSource
def ModRootFile(rootM_file, Scale):
    f = gzip.open(rootM_file, 'r')
    file_content = f.read()
    f.close()
    d = file_content.decode('utf8').split(',')
    for i in d:
        doc = d.get(i)
        jsondoc = json.dumps(doc)
        jsondict = json.loads(jsondoc)
        print(i)
        for k in jsondict["2"]:
            k["atlas"] = "false"
    d.save(jsondict)
I'm trying to change "atlas" to "false". The script finishes without any error, or at least without an error message, but nothing changes.
Could someone please tell me what is wrong? Thank you.
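The likely culprits (under some assumptions, since the file format is only hinted at): d is a plain list of strings after split(','), so d.get(i) would actually raise an AttributeError and d.save(jsondict) is not a real method, which suggests ModRootFile may never have been called at all; json.dumps followed by json.loads just round-trips a fragment; and nothing is ever written back to the .gz file, so the data on disk cannot change. A minimal sketch of a working version, assuming each b.json.gz holds a single JSON object whose "2" key maps to a list of objects with an "atlas" field (inferred from the loop above):

import os
import gzip
import json

def mod_root_file(path):
    # Parse the whole gzipped document at once; no split(',') needed.
    with gzip.open(path, 'rt', encoding='utf8') as f:
        doc = json.load(f)
    # Flip the "atlas" flag on every entry under the "2" key.
    for k in doc["2"]:
        k["atlas"] = "false"
    # Write the modified document back, otherwise nothing changes on disk.
    with gzip.open(path, 'wt', encoding='utf8') as f:
        json.dump(doc, f)

for path in getAllSourceFile(r'...\a'):  # the elided path from the question
    mod_root_file(path)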

Related

Looping over an unknown number of files to be put in a list

The first part of my project is to create a loop that reads an unknown number of text files. I am not sure how to approach this, since in my past projects I have always named the file explicitly, e.g.
for line in open('text1.txt'):
How do I make it so that if, say, 10 files are generated while my code is checked, it will actually read through all 10 files? I was thinking of something like
for line in range(0, _input_ + 1):
for line in open(???):
But I have had no luck figuring out what to do. Help would be highly appreciated, thanks :D
You can use something like this, where ROOTDIR is the parent directory of your files:
import os

# path to parent folder of your files
ROOTDIR = '/home/your_name/parent_folder'

for subdir, dirs, files in os.walk(ROOTDIR):
    name = str(subdir).split('/')[-1]
    print(subdir)
    for f in files:
        print('Working on file:', f)
        SOURCE = os.path.join(subdir, f)
        # load text (the with-block closes the file for you)
        with open(SOURCE, encoding='utf-8') as fh:
            text = fh.readlines()
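If all the files sit in a single directory and you don't need to recurse, the glob module is a common alternative; a minimal sketch (the path here is just a placeholder):

import glob

# glob expands the wildcard at run time, so the number of
# matching files never needs to be known in advance.
for path in glob.glob('/home/your_name/parent_folder/*.txt'):
    with open(path, encoding='utf-8') as fh:
        for line in fh:
            print(line.rstrip())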

Limit the number of file paths returned by os.walk in a directory with a huge number of files

I have an application with a method that takes a directory path and returns the list of file paths under that directory using os.walk.
I would like to read a certain number of files at a time (some threshold value, say 20 file paths) from a directory that contains a huge number of files, and store them in a Queue. There I can check each file path against its status in the database.
The next time I call the same method with the same directory, it should return the next set of file paths, excluding the ones already returned.
Scenario:
Let's assume D:/Sample_folder contains 1000 files.
my_dir = "D:/Sample_folder"

def read_files(directory):
    file_paths = []
    for root, directories, files in os.walk(directory):
        for filename in files:
            file_path = os.path.join(root, filename)
            file_paths.append(file_path)
    return file_paths

read_files(my_dir) should give the first 100 files on the first call. On the next call it should give the next set of 100 files, and so on.
Any ideas or sample scripts for this?
Assuming you already have files populated, this should do:
import Queue

paths = Queue.Queue()
current_list = []
for i, path in enumerate(files):
    # Second condition makes sure we don't add a blank list
    if i % 100 == 0 and i != 0:
        paths.put(current_list)
        current_list = []
    current_list.append(path)
# Flush the final, possibly partial batch as well
if current_list:
    paths.put(current_list)
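To drain the queue batch by batch, a sketch (Queue is the Python 2 module, renamed queue in Python 3):

# Each get() returns one list of up to 100 paths.
while not paths.empty():
    batch = paths.get()
    for path in batch:
        print path  # check the path against its database status here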
EDIT:
Here is a possible solution using a class, and it doesn't add much code. The main idea is to pop elements off the list each time they are accessed. So the workflow is: make a FileListIter object, then call .next() on it to get a list of the next 100 files to work on; the object then forgets them. You can call .has_next() to check whether you're out of files. If you pass an argument to next, like .next(2), it will instead give back the next 2 files in a list.
CODE:
import os

class FileListIter(object):
    # Initialize the files
    def __init__(self, directory):
        file_paths = []
        for root, directories, files in os.walk(directory):
            for filename in files:
                file_path = os.path.join(root, filename)
                file_paths.append(file_path)
        self.files = file_paths

    # When called w/out args give back the first 100 files, otherwise the first n
    def next(self, n=100):
        ret, self.files = self.files[:n], self.files[n:]
        return ret

    # Check if there are any files left
    def has_next(self):
        return len(self.files) > 0

d = '/home/rob/stack_overflow'
files_gen = FileListIter(d)  # <-- this makes an object
while files_gen.has_next():
    file_subset = files_gen.next(2)
    print file_subset
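Note that FileListIter only remembers its position within one process. If "next time" means a separate run of the program, the already-returned paths have to be persisted somewhere: the database mentioned in the question, or a simple state file. A rough sketch of the state-file variant (the seen.txt name is made up for illustration):

import os

SEEN_FILE = 'seen.txt'  # hypothetical state file listing already-returned paths

def read_next_files(directory, n=100):
    # Load the paths handed out by previous runs.
    seen = set()
    if os.path.exists(SEEN_FILE):
        with open(SEEN_FILE) as fh:
            seen = set(line.rstrip('\n') for line in fh)
    # Collect up to n paths we have not returned before.
    batch = []
    for root, dirs, files in os.walk(directory):
        for filename in files:
            path = os.path.join(root, filename)
            if path not in seen and len(batch) < n:
                batch.append(path)
    # Remember them for the next run.
    with open(SEEN_FILE, 'a') as fh:
        for path in batch:
            fh.write(path + '\n')
    return batch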

Python: Loop through directories, check if a certain number of files is in there; if not, copy 2 files from another directory and one file based on name

I'm still in the learning process of Python.
I'm trying to make a script that does the following:
Loop through directories based on today's date (so if I run it tomorrow, it'll look for the folders with tomorrow's date on them).
Check if there are .pdf files in them.
If there aren't any .pdf files in them: copy 2 standard ones from another directory, plus one based on the name of the Excel file. (So let's say the Excel file is called Excelfile45, then it should copy the PDF file called "45".) EDIT: It can also be based on the directory name if that is an easier way of doing things.
This is how far I got:
import os, fnmatch

def findDir(path, filter):
    for root, dirs, files in os.walk(path):
        for file in fnmatch.filter(files, filter):
            yield os.path.join(root, file)

for pdfFile in findDir(r'C:\new', '*.pdf'):
    print(pdfFile)
It runs through the directories and looks for PDFs in them. But now I've got no clue how to continue.
Any help is greatly appreciated!
Also, my apologies for any grammar / spelling errors.
Your specs are pretty vague, so I had to assume a lot of things. I think this code achieves what you want, but you may have to tweak it a bit (for example, the date format in the directory name).
I assumed a directory structure like this:
c:\new (base dir)
    daily_2014_12_14
    daily_2014_12_15
    ...
    standard
And the code:
import os
import fnmatch
import time
import shutil
import re

# directories (raw string, so \n in C:\new is not read as a newline)
base_dir = r"C:\new"
standard_dir = os.path.join(base_dir, "standard")

# find files in directory. based on yours, but modified to return a list.
def find_dir(path, name_filter):
    result = []
    for root, dirs, files in os.walk(path):
        for filename in fnmatch.filter(files, name_filter):
            result.append(os.path.join(root, filename))
    return result

# getting today's directory. you can rearrange year-month-day as you want.
def todays_dir():
    date_str = time.strftime("%Y_%m_%d")
    return os.path.join(base_dir, "daily_" + date_str)

# copy a file from one directory to another
def copy(filename, from_dir, to_dir):
    from_file = os.path.join(from_dir, filename)
    to_file = os.path.join(to_dir, filename)
    shutil.copyfile(from_file, to_file)

# main logic
today_dir = todays_dir()
pdfs = find_dir(today_dir, "*.pdf")
excels = find_dir(today_dir, "*.xls")
if len(pdfs) == 0:
    copy("standard1.pdf", standard_dir, today_dir)
    copy("standard2.pdf", standard_dir, today_dir)
    if len(excels) == 1:
        excel_name = os.path.splitext(excels[0])[0]
        excel_num = re.findall(r"\d+", excel_name)[-1]
        copy(excel_num + ".pdf", standard_dir, today_dir)
Also: I agree with Iplodman's comment. Show us a bit more effort next time.
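One more caveat (an assumption on my part, since the specs don't say whether the daily folders always exist): shutil.copyfile raises an error if the destination folder is missing, so you may want to create today's folder first:

import os

# Make sure today's folder exists before copying into it.
if not os.path.isdir(today_dir):
    os.makedirs(today_dir)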

Concatenating multiple files through multiple folders

I'm trying to create a single file out of multiple text files I have across multiple folders. This is my code for concatenating. It only works if the program file is placed in each folder:
import os

file_list = [each for each in cur_folder if each.endswith(".txt")]
print file_list
align_file = open("all_the_files.txt", "w")
seq_list = []
for each_file in file_list:
    f_o = open(file_path, "r")
    seq = (f_o.read().replace("\n", ""))
    lnth = len(seq)
    wholeseq = ">" + each_file + " | " + str(lnth) + " nt\n" + seq + "\n"
    align_file.write(wholeseq)
print "done"
Now I tried to edit it so that it automatically runs through the entire Data folder, enters the subdirectories and concatenates all the files, without me having to paste the program file into each folder. This is the edit:
import os

dir_folder = os.listdir("C:\\Users\\GAMER\\Desktop\\Data")
for each in dir_folder:
    cur_folder = os.listdir("C:\\Users\\GAMER\\Desktop\\Data\\" + each)
    file_list = [each for each in cur_folder if each.endswith(".txt")]
    print file_list
    align_file = open("all_the_files.txt", "w")
    seq_list = []
    for each_file in file_list:
        f_o = open(file_path, "r")
        seq = (f_o.read().replace("\n", ""))
        lnth = len(seq)
        wholeseq = ">" + each_file + " | " + str(lnth) + " nt\n" + seq + "\n"
        align_file.write(wholeseq)
    print "done", cur_folder
However, when I run this, I get an error on the first file of the folder saying that no such file exists. I can't seem to understand why, especially since it names a file which is not "hardcoded". Any help will be appreciated.
If the code looks ugly to you, feel free to suggest better ways to do it.
Jamie is correct - os.walk is most likely the function you need.
An example based on your use case:
for root, dirs, files in os.walk(r"C:\Users\GAMER\Desktop\Data"):
    for f in files:
        if f.endswith('.txt'):
            print(f)
This will print the name of every file in every folder under the root directory passed to os.walk, as long as the filename ends in .txt; the sketch after the link below extends it to the actual concatenation.
Python's documentation is here: https://docs.python.org/2/library/os.html#os.walk
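To go from printing names to actually concatenating, the key fix over the question's code is opening each file by its full path: os.listdir (and os.walk) hand back bare filenames, so open(each_file) looks in the current working directory instead of the folder being scanned. A sketch building on the walk above (output filename kept from the question):

import os

root_dir = r"C:\Users\GAMER\Desktop\Data"
# Open the output once, outside the loop, so it isn't overwritten per folder.
align_file = open("all_the_files.txt", "w")
for root, dirs, files in os.walk(root_dir):
    for name in files:
        if name.endswith(".txt"):
            full_path = os.path.join(root, name)  # full path, not just the name
            f_o = open(full_path, "r")
            seq = f_o.read().replace("\n", "")
            f_o.close()
            align_file.write(">" + name + " | " + str(len(seq)) + " nt\n" + seq + "\n")
align_file.close()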

Python: Files stop being opened at a certain point

I've written the following program in Python:
import re
import os
import string

folder = 'C:\\Users\\Jodie\\Documents\\Uni\\Final Year Project\\All Data'
folderlisting = os.listdir(folder)
for eachfolder in folderlisting:
    print eachfolder
    if os.path.isdir(folder + '\\' + eachfolder):
        filelisting = os.listdir(folder + '\\' + eachfolder)
        print filelisting
        for eachfile in filelisting:
            if re.search('.genbank.txt$', eachfile):
                genbankfile = open(eachfile, 'r')
                print genbankfile
            if re.search('.alleles.txt$', eachfile):
                allelesfile = open(eachfile, 'r')
                print allelesfile
It looks through a lot of folders, and prints the following:
The name of each folder, without the path
A list of all files in each folder
Two specific files in each folder (any files ending in ".genbank.txt" or ".alleles.txt").
The code works until it reaches a certain directory, and then fails with the following error:
Traceback (most recent call last):
File "C:\Users\Jodie\Documents\Uni\Final Year Project\Altering Frequency Practice\Change_freq_data.py", line 16, in <module>
genbankfile = open(eachfile, 'r')
IOError: [Errno 2] No such file or directory: 'ABP1.genbank.txt'
The problem is:
That file most definitely exists, since the program lists it just before trying to open it.
Even if I take that directory out of the original group of directories, the program throws the same error for the next folder it iterates to. And the next, if that one is removed. And so on.
This makes me think it's not the folder or any files in it, but some limitation of Python? I have no idea; it has stumped me.
Any help would be appreciated.
You should use os.walk(): http://docs.python.org/library/os.html#os.walk
The direct cause of the error, by the way, is that os.listdir returns bare filenames, so open(eachfile, 'r') looks for the file in the current working directory rather than in the folder being listed; you need to join the directory path onto the name.
Also, you need to read the contents of the file; you don't want to print the file object. And you need to close the file when you're done, or use a context manager to close it for you.
It would look something like this:
for root, dirs, files in os.walk(folder):
    for file_name in files:
        if re.search('.genbank.txt$', file_name) or \
           re.search('.alleles.txt$', file_name):
            with open(os.path.join(root, file_name), 'r') as f:
                print f.read()
Keep in mind this is not exactly what you're doing: it walks the entire tree, whereas you may want to walk just a single level, as you are already doing; a sketch of that follows.
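If you do want only the immediate subfolders, os.walk lets you prune its recursion by clearing the dirs list in place; a minimal sketch:

import os
import re

for root, dirs, files in os.walk(folder):
    if root != folder:
        dirs[:] = []  # already one level down: don't descend any further
    for file_name in files:
        if re.search('.genbank.txt$', file_name) or \
           re.search('.alleles.txt$', file_name):
            with open(os.path.join(root, file_name), 'r') as f:
                print f.read()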
