I have multiple pickle files with the same format in one folder called pickle_files:
1_my_work.pkl
2_my_work.pkl
...
125_my_work.pkl
How would I go about loading those files into the workspace, without having to do it one file at a time?
Thank you!
Loop over the files and save the data in a structure, for example a dictionary:
# Imports
import pickle
import os

# Folder containing your files
directory = 'C://folder'

# Create an empty dictionary to hold the data
data = {}

# Loop over the files and read each pickle
for file in os.listdir(directory):
    if file.endswith('.pkl'):
        # join the folder path and the filename, so open() works
        # regardless of the current working directory
        with open(os.path.join(directory, file), 'rb') as f:
            data[file.split('.')[0]] = pickle.load(f)

# Now you can print 1_my_work
print(data['1_my_work'])
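Note that os.listdir returns names in arbitrary order, so if the numeric order of 1_my_work.pkl through 125_my_work.pkl matters, a small sketch like this (same placeholder folder as above) sorts on the numeric prefix before loading:
import os
import pickle

directory = 'C://folder'
data = {}

# keep only the .pkl files, then sort on the integer before the first underscore
pkl_files = [name for name in os.listdir(directory) if name.endswith('.pkl')]
pkl_files.sort(key=lambda name: int(name.split('_')[0]))

for file in pkl_files:
    with open(os.path.join(directory, file), 'rb') as fh:
        data[file.split('.')[0]] = pickle.load(fh)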
Consider my folder structure having files in this fashion:
abc.csv
abc.json
bcd.csv
bcd.json
efg.csv
efg.json
and so on, i.e. pairs of CSV and JSON files sharing the same name. I have to read each same-named pair, do some operation, and proceed to the next pair of files. How do I go about this?
Basically, what I have in mind as pseudocode is:
for files in folder_name:
    df_csv = pd.read_csv('abc.csv')
    df_json = pd.read_json('abc.json')
    # some script to execute
    # now read the next pair and repeat for all files
Did you think of something like this?
import os

# collect the filenames in the folder
filelist = os.listdir()

# iterate through the filenames
for file in filelist:
    # pair each .csv file with the .json file of the same name
    if file.endswith(".csv"):
        with open(file) as csv_file:
            pre, ext = os.path.splitext(file)
            secondfile = pre + ".json"
            with open(secondfile) as json_file:
                ...  # do something with csv_file and json_file
You can use the glob module to extract the file names matching a pattern:
import glob
import os.path

for csvfile in glob.iglob('*.csv'):
    jsonfile = csvfile[:-3] + 'json'
    # optionally check that the paired file exists
    if not os.path.exists(jsonfile):
        # show an error message
        ...
        continue
    # do something with csvfile
    # do something else with jsonfile
    # and proceed to the next pair
If the directory structure is consistent you could do the following:
import os
import pandas as pd

directory = './path/to/dir'
# take the unique base names, then rebuild each pair of paths
for f_name in {x.split('.')[0] for x in os.listdir(directory)}:
    df_csv = pd.read_csv(os.path.join(directory, f"{f_name}.csv"))
    df_json = pd.read_json(os.path.join(directory, f"{f_name}.json"))
    # execute the rest
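For what it's worth, the same pairing can also be written with pathlib, which makes the suffix swap and the existence check explicit (a sketch, reusing the placeholder directory from above):
from pathlib import Path
import pandas as pd

folder = Path('./path/to/dir')  # placeholder, as above
for csv_path in sorted(folder.glob('*.csv')):
    json_path = csv_path.with_suffix('.json')
    if not json_path.exists():
        continue  # skip CSV files without a JSON partner
    df_csv = pd.read_csv(csv_path)
    df_json = pd.read_json(json_path)
    # execute the rest on the pair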
I have 7 vcf files present in 2 directories:
I want to concatenate all the files present in both folders and then read them with Python.
I am trying this code:
# Import Modules
import os
import pandas as pd
import vcf

# Folder Path
path1 = "C://Users//USER//Desktop//Anas/VCFs_1/"
path2 = "C://Users//USER//Desktop//Anas/VCFs_2/"
#os.chdir(path1)

def read(f1, f2):
    reader = vcf.Reader(open(f1, f2))
    df = pd.DataFrame([vars(r) for r in reader])
    out = df.merge(pd.DataFrame(df.INFO.tolist()),
                   left_index=True, right_index=True)
    return out

# Read text File
def read_text_file(file_path1, file_path2):
    with open(file_path1, 'r') as f:
        with open(file_path2, 'r') as f:
            print(read(path1, path2))

# iterate through all files
for file in os.listdir():
    # Check whether file is in text format or not
    if file.endswith(".vcf"):
        file_path1 = f"{path1}\{file}"
        file_path2 = f"{path2}\{file}"
        print(file_path1, "\n\n", file_path2)
        # call read text file function
        #data = read_text_file(path1,path2)
        print(read_text_file(path1, path2))
But it's giving me a permission error. I know we get this error when we try to read folders instead of files, but how can I read the files inside the folders? Any suggestions?
You may need to run your Python code with Administrator privileges, if you are trying to access another user's files.
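That said, the more likely culprit here is that the code passes the directory paths (path1, path2) to open() instead of full file paths. A minimal sketch, assuming the two folder paths from the question, that collects every .vcf file from both folders before reading:
import os

path1 = "C://Users//USER//Desktop//Anas/VCFs_1/"
path2 = "C://Users//USER//Desktop//Anas/VCFs_2/"

vcf_files = []
for directory in (path1, path2):
    for name in os.listdir(directory):
        if name.endswith(".vcf"):
            # join the folder with the filename so open() receives a file, not a folder
            vcf_files.append(os.path.join(directory, name))

for file_path in vcf_files:
    with open(file_path, 'r') as f:
        ...  # parse each VCF here, e.g. hand f to vcf.Reader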
I have a path like this: '/mnt/extract'. Inside this extract folder, I have the 3 subfolders below:
subfolder1
subfolder2
subfolder3 (it has one .json file inside it)
The JSON in subfolder3 looks like this:
{
    "x": "/mnt/extract/p",
    "y": "/mnt/extract/r"
}
I want to read the above JSON file from subfolder3 and concatenate the value /mnt/extract/p for the key 'x' with the string 'data', so that the final path becomes '/mnt/extract/p/data', where I want to finally export some data. I tried the approach below, but it's not working.
import os

for root, dirs, files in os.walk(path):
    for name in files:
        print(os.path.join(root, name))
Using the built-in Python glob module, you can read files in folders and sub-folders.
Try this:
import glob
files = glob.glob('/mnt/extract/**/*.json', recursive=True)
The files list will contain the paths to all JSON files under the extract directory.
Try this:
import glob
import json

final_paths = []
extract_path = '/mnt/extract'
files = glob.glob(extract_path + '/**/*.json', recursive=True)
for file in files:
    with open(file, 'r') as f:
        json_file = json.load(f)
        output_path = json_file['x'] + '/' + 'data'
        final_paths.append(output_path)
The final_paths list will contain the output paths built from all the JSON files in the folder structure.
import glob
import json

extract_path = '/mnt/extract'
files = glob.glob(extract_path + '/**/*.json', recursive=True)
if len(files) != 0:
    with open(files[0], 'r') as f:
        json_data = json.load(f)
        final_output_path = json_data['x'] + '/' + 'data'
In the above code, glob returns a list containing the JSON file as its only element. To make sure we pass a file path to the open method and not a list, I took files[0], which picks the JSON file out of the list, and it was then parsed easily. If anyone has another suggestion for handling the list returned by glob, feel free to answer with a cleaner way to handle it.
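One slightly cleaner option, as a sketch: iterate the list directly, so the same code works whether glob finds one JSON file or several, and let os.path.join build the final path:
import glob
import json
import os

for file in glob.glob('/mnt/extract/**/*.json', recursive=True):
    with open(file, 'r') as f:
        json_data = json.load(f)
    # os.path.join inserts the separator, instead of concatenating with '/'
    final_output_path = os.path.join(json_data['x'], 'data')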
For a data challenge at school, we need to open a lot of JSON files with Python. There are too many to open manually. Is there a way to open them with a for loop?
This is how I open one of the JSON files and turn it into a DataFrame (it works):
file_2016091718 = '/Users/thijseekelaar/Downloads/airlines_complete/airlines-1474121577751.json'
json_2016091718 = pd.read_json(file_2016091718, lines=True)
Yes. You can use os.listdir to list all the JSON files in your directory, build the full path for each of them with os.path.join, and open each file from its full path:
import os
import pandas as pd

base_dir = '/Users/thijseekelaar/Downloads/airlines_complete'

# Get all files in the directory
data_list = []
for file in os.listdir(base_dir):
    # If the file is a JSON file, construct its full path, read it,
    # and append the resulting DataFrame to the list
    if file.endswith('.json'):
        json_path = os.path.join(base_dir, file)
        json_data = pd.read_json(json_path, lines=True)
        data_list.append(json_data)
print(data_list)
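If the goal is one combined DataFrame rather than a list of them, the pieces can then presumably be concatenated:
df = pd.concat(data_list, ignore_index=True)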
Try this:
import os

# not sure about the order
for root, subdirs, files in os.walk('your/json/dir/'):
    for file in files:
        # join root and file, since os.walk yields bare filenames
        with open(os.path.join(root, file), 'r') as f:
            ...  # your stuff here
How can I extract files from a directory that exists inside a zip archive? I uploaded the zip archive from a form (written in HTML). Now, if the zip archive contains folders, I can't extract the files inside those folders. This is a snippet from my code:
form = cgi.FieldStorage()
file_upload = form['file']
zfile = zipfile.ZipFile(file_upload.file, "r")
files_zip = zfile.namelist()
for name in files_zip:
    print name
    if name.endswith('/'):
        print "yes"
        l = list()
        l = os.listdir(name)
        print l
EDIT:
I tried to use StringIO() as:
s = StringIO(file_upload)
f = s.getvalue()
with zipfile.ZipFile(f, 'r') as z:
    for d in z.namelist():
        print "%s: %s" % (d, z.read(d))
but the problem with the second snippet of code is:
No such file or directory: "FieldStorage('file', 'test.zip')
I want to extract these files to add them to the GAE Blobstore.
Thanks in advance.
There's a working example of how to do this in appengine-mapreduce.
Look at input_readers.py for BlobstoreZipInputReader (which starts at line 898 at the moment).
I don't understand why you are using os.listdir to list files inside zip data; you should just go through the names and extract the data. Here is an example where I create an in-memory zip file and extract its files, even inside a folder:
from zipfile import ZipFile
from StringIO import StringIO

# first let's create a zip file with folders, to simulate data coming from the user
f = StringIO()
with ZipFile(f, 'w') as z:
    z.writestr('1.txt', "data of file 1")
    z.writestr('folder1/2.txt', "data of file 2")
zipdata = f.getvalue()

# now try to read the zipped data containing folders
f = StringIO(zipdata)
with ZipFile(f, 'r') as z:
    for name in z.namelist():
        print "%s: %s" % (name, z.read(name))
output:
1.txt: data of file 1
folder1/2.txt: data of file 2
As App Engine doesn't allow writing to the file system, you will need to read the file data (explained above) and dump it to blobs; a simple structure of name and data is enough. On your local OS, though, you can try z.extractall() and it will create the whole folder structure and files.
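For the local case, a minimal sketch reusing the zipdata from the example above (the target folder name is made up):
f = StringIO(zipdata)
with ZipFile(f, 'r') as z:
    z.extractall('extracted')  # hypothetical output folder; omit the argument for the current directory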