I have 7 vcf files present in 2 directories:
I want to concatenate all the files present in both folders and then read them through Python.
I am trying this code:
# Import Modules
import os
import pandas as pd
import vcf
# Folder Path
path1 = "C:/Users/USER/Desktop/Anas/VCFs_1/"
path2 = "C:/Users/USER/Desktop/Anas/VCFs_2/"
#os.chdir(path1)
def read(f1, f2):
    reader = vcf.Reader(open(f1, f2))
    df = pd.DataFrame([vars(r) for r in reader])
    out = df.merge(pd.DataFrame(df.INFO.tolist()),
                   left_index=True, right_index=True)
    return out
# Read text File
def read_text_file(file_path1, file_path2):
    with open(file_path1, 'r') as f:
        with open(file_path2, 'r') as f:
            print(read(path1, path2))

# iterate through all files
for file in os.listdir():
    # Check whether the file is a VCF
    if file.endswith(".vcf"):
        file_path1 = f"{path1}\{file}"
        file_path2 = f"{path2}\{file}"
        print(file_path1, "\n\n", file_path2)
        # call the read text file function
        #data = read_text_file(path1,path2)
        print(read_text_file(path1, path2))
But it's giving me a permission error. I know we get this error when we try to read folders instead of files, but how can I read the files present inside the folders? Any suggestions?
You may need to run your Python code with Administrator privileges if you are trying to access another user's files. Here, though, the error more likely comes from read(path1, path2): it passes the directory paths themselves to open(), and opening a directory raises a PermissionError on Windows. You need to open the individual files inside the folders.
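One way to concatenate the VCFs is to treat them as plain text: keep the header lines (those starting with `#`) from the first file only and append the data lines from every file. A minimal sketch, assuming all seven files share the same VCF header (the `concat_vcfs` name is mine, not from the question):

```python
import glob
import os

def concat_vcfs(folders, out_path):
    """Concatenate the .vcf files found in the given folders into one file.

    Header lines (starting with '#') are written once, from the first
    file encountered; data lines are appended from every file.
    """
    wrote_header = False
    with open(out_path, "w") as out:
        for folder in folders:
            for vcf_path in sorted(glob.glob(os.path.join(folder, "*.vcf"))):
                with open(vcf_path) as fh:
                    for line in fh:
                        if line.startswith("#"):
                            if not wrote_header:
                                out.write(line)
                        else:
                            out.write(line)
                wrote_header = True
    return out_path
```

The combined file can then be parsed once, e.g. `vcf.Reader(open(concat_vcfs([path1, path2], "combined.vcf")))`, and turned into a dataframe as in the `read` function above.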
Consider my folder structure, with files arranged in this fashion:
abc.csv
abc.json
bcd.csv
bcd.json
efg.csv
efg.json
and so on, i.e. pairs of CSV and JSON files sharing the same name. I have to read each same-named pair, do some operation, and proceed to the next pair of files. How do I go about this?
Basically, what I have in mind as pseudocode is:
for files in folder_name:
    df_csv = pd.read_csv('abc.csv')
    df_json = pd.read_json('abc.json')
    # some script to execute
    # now read the next pair and repeat for all files
Did you think of something like this?
import os

# collect the filenames in the folder
filelist = os.listdir()

# iterate through the filenames in the folder
for file in filelist:
    # pair each .csv file with the .json file of the same name
    if file.endswith(".csv"):
        with open(file) as csv_file:
            pre, ext = os.path.splitext(file)
            secondfile = pre + ".json"
            with open(secondfile) as json_file:
                # do something
                pass
You can use the glob module to extract the file names matching a pattern:
import glob
import os.path

for csvfile in glob.iglob('*.csv'):
    jsonfile = csvfile[:-3] + 'json'
    # optionally check that the JSON file exists
    if not os.path.exists(jsonfile):
        # show an error message
        ...
        continue
    # do something with csvfile
    # do something else with jsonfile
    # and proceed to the next pair
If the directory structure is consistent you could do the following:
import os
import pandas as pd

base = './path/to/dir'
for f_name in {x.split('.')[0] for x in os.listdir(base)}:
    df_csv = pd.read_csv(os.path.join(base, f"{f_name}.csv"))
    df_json = pd.read_json(os.path.join(base, f"{f_name}.json"))
    # execute the rest
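One caveat with `x.split('.')[0]`: it truncates file names that contain extra dots (`data.v1.csv` becomes `data`). A sketch of the same pairing idea using `os.path.splitext`, which strips only the final extension, and which also skips names that don't have both files (the `paired_basenames` helper name is mine):

```python
import os

def paired_basenames(folder):
    """Return the base names that have both a .csv and a .json file in folder.

    Uses os.path.splitext so names containing extra dots are kept intact.
    """
    names = os.listdir(folder)
    csvs = {os.path.splitext(n)[0] for n in names if n.endswith(".csv")}
    jsons = {os.path.splitext(n)[0] for n in names if n.endswith(".json")}
    return sorted(csvs & jsons)
```

You would then loop `for base in paired_basenames('your_folder'):` and read `base + '.csv'` and `base + '.json'` as in the answers above.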
I am working on code that takes as input a zip file containing Excel files, extracts them into a folder, converts them into dataframes, and loads all of these dataframes into a list. I would like to create a new folder, convert those dataframes into CSV files, and save them in that folder. The goal is to be able to download the folder of CSV files as a zip.
The main problem for me is making sure that every CSV file keeps the name of the Excel file it originated from.
I'm adding my code: the first block contains the first part, while the second block contains the part I have a problem with.
Running the last part of the code, I get this error:
"XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\xef\xbb\xbf;data'"
%%capture
import os
import numpy as np
import pandas as pd
import glob
import os.path
!pip install xlrd==1.2.0
from google.colab import files
uploaded = files.upload()
%%capture
zipname = list(uploaded.keys())[0]
destination_path = 'files'
infolder = os.path.join('/content/', destination_path)
!unzip -o $zipname -d $destination_path
# Load an excel file, return events dataframe + file header dataframe
def load_xlsx(fullpath):
    ...  # body omitted in the question
    return events, meta

tasks = [os.path.join(dp, fname) for dp, dn, filenames in os.walk(infolder) for fname in filenames if fname.lower().endswith('.xls')]

dfs = []
metas = []
for fname in tasks:
    df, meta = load_xlsx(fname)
    dfs.append(df)
    metas.append(meta)

newpath = 'csv2021'
if not os.path.exists(newpath):
    os.makedirs(newpath)
filepath = os.path.join('/content/files/', newpath)

for fname in tasks:
    filename = load_xlsx(fname)
    my_csv = filename.to_csv(os.path.join(filepath, filename), encoding="utf-8-sig", sep=';')
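Two observations, offered as a sketch rather than a definitive fix. First, the bytes `\xef\xbb\xbf` in the error are a UTF-8 byte-order mark followed by `;data`, which suggests these ".xls" files are really semicolon-separated text, so `pd.read_csv(fullpath, sep=';', encoding='utf-8-sig')` may parse them where xlrd cannot. Second, for naming each CSV after its source Excel file, a small helper (`save_as_csv` is a hypothetical name) can derive the base name from the path:

```python
import os
import pandas as pd

def save_as_csv(df, source_path, out_dir):
    """Write df to out_dir as a CSV named after the source file's base name."""
    base = os.path.splitext(os.path.basename(source_path))[0]
    out_path = os.path.join(out_dir, base + ".csv")
    df.to_csv(out_path, encoding="utf-8-sig", sep=";", index=False)
    return out_path
```

The last loop could then become `for fname in tasks: df, meta = load_xlsx(fname); save_as_csv(df, fname, filepath)`. Note that `load_xlsx` returns a tuple, so the original `filename.to_csv(...)` call would fail even without the XLRDError.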
I am trying to open a directory containing multiple JSON files, and then make a dataframe from the data in each of them. I try this:
for file in os.listdir('Datasets/'):
    json_read = pd.read_json(file)
However, it gives me an error:
ValueError: Expected object or value
As I inspect the type of the files, it says they are of class str. When opening a single file in the directory with read_json, it works correctly, as the file is recognized as JSON. I am not quite sure why the files are treated as strings, nor how to solve it. Do you have any tips?
Thanks in advance!
import os
import pandas as pd

base_dir = '/path/to/dir'

# Get all files in the directory
data_list = []
for file in os.listdir(base_dir):
    # If the file is a JSON, construct its full path, read it,
    # and append the dataframe to the list
    if file.endswith('json'):
        json_path = os.path.join(base_dir, file)
        # note: lines=True expects newline-delimited JSON;
        # drop it for ordinary JSON documents
        json_data = pd.read_json(json_path, lines=True)
        data_list.append(json_data)
print(data_list)
You probably need to build a list of DataFrames. You may not be able to process every file in the given directory so try this:
import pandas as pd
from glob import glob
from os.path import join

BASEDIR = 'Datasets'
dataframes = []
for file in glob(join(BASEDIR, '*.json')):
    try:
        dataframes.append(pd.read_json(file))
    except ValueError:
        print(f'Unable to process {file}')
print(f'Successfully constructed {len(dataframes)} dataframes')
import os
import json

# here's some information about getting the list of files in a folder: https://www.geeksforgeeks.org/python-list-files-in-a-directory/
# here's some information about how to open a json file: https://www.geeksforgeeks.org/read-json-file-using-python/

path = "./data"
file_list = os.listdir(path)  # gets the names of the files in the directory

for i in range(len(file_list)):  # loop over the list by index
    current = open(path + "/" + file_list[i])  # open the file at the current index
    data = json.load(current)  # load the data from the file
    for k in data['01']:
        print(k)

Output:

main.json:
{'name': 'Nava', 'Surname': 'No need for to that'}
data.json:
{'name': 'Nava', 'watchs': 'Anime'}

Here's a link to run the code online: https://replit.com/#Nava10Y/opening-open-multiple-json-files-in-a-directory
I have an Excel file named F_Path.xlsx listing folder paths like below:
path = "C:/Users/Axel/Documents/Work/F_Path.xlsx"
df_input = pd.read_excel(path, sheet_name=0)  # read the excel file
folder_path = list(df_input['Folder Path'])
path_csv = ...  # 1st csv file from C:/Users/Axel/Documents/Work/Folder_1, then the 2nd read in a loop, but I don't know how.. once all the csvs are read from Folder_1, it has to read folder_path[1 to n], read all the csv files, and process each separately.
...
df = pd.read_csv(path_csv)  # read all the *.csv files one by one and process each df separately
# process the data
Try the following:
# you'll need to import os
import os

# loop over your folders
for folder in folder_path:
    # get the files in that folder
    csv_files = os.listdir(folder)
    # loop over the csvs
    for csvfile in csv_files:
        df = pd.read_csv(os.path.join(folder, csvfile))
        # do your processing here
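If the folders may also contain non-CSV files, os.listdir will pick those up and pd.read_csv will choke on them. A variant of the same loop, sketched with a glob pattern so only `*.csv` paths are yielded (`csv_paths` is my name for the helper; `folder_path` is the list read from the Excel file above):

```python
import glob
import os

def csv_paths(folders):
    """Yield the .csv file paths found in each folder, in sorted order.

    Unlike os.listdir, the glob pattern skips any non-CSV files
    that happen to live in the same folders.
    """
    for folder in folders:
        for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
            yield path
```

Usage: `for path in csv_paths(folder_path): df = pd.read_csv(path)` followed by your per-file processing.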
I currently have several hundred pdf files with file names that I would like to change.
The current names of the files don't really follow a pattern; however, I have an Excel file that maps each current file name to the new file name I want for that file.
I am looking for a way in Python to rename all of my files (about 500) according to my Excel index.
What I tried:
import os
path = 'C:\\Users\\Desktop\\Project\\'
files = os.listdir(path)
for file in files:
    os.rename(os.path.join(path, file), os.path.join(path, '00' + file + '.pdf'))
Thanks.
If you can save the Excel file as a CSV, this should work:
import os
import csv
path = 'C:\\Users\\Desktop\\Project\\'
with open('my_csv.csv') as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for line in reader:
        src = os.path.join(path, line[0])
        dest = os.path.join(path, line[1])
        os.rename(src, dest)
You are really close!
You need to iterate over the names in your xlsx file. One simple way is to load the data using pandas.read_excel and then iterate over the source and destination columns, renaming each file.
You can use os.path.join to create the full path from a given folder and a given file name.
Here the code:
# Import module
import os # Rename file
import pandas as pd # read csv
# Your different folders
path_folder = r'C:\Users\Desktop\Project'
path_csv = r'C:\Users\Desktop\Project\csv_file.xlsx'
# Load data
df = pd.read_excel(path_csv)
print(df)
# Current file name Desired file name
# 0 a.pdf 001.pdf
# 1 b.pdf 002.pdf
# 2 c.pdf 003.pdf
# Iterate over each row of the dataframe
for old_name, new_name in zip(df["Current file name"], df["Desired file name"]):
    # Create the source path and destination path
    source_file = os.path.join(path_folder, old_name)
    dest_file = os.path.join(path_folder, new_name)
    # Rename the current file using the source path (old name)
    # and the destination path (new name)
    os.rename(source_file, dest_file)
Hope that helps!
Provided you have a table with the names, you can use the following code:
import os
names = '''a.pdf 001.pdf
b.pdf 002.pdf
c.pdf 003.pdf'''
os.chdir(r'C:\Users\Desktop\Project')
for line in names.splitlines(False):
    old, new = line.split()
    os.rename(old, new)
You can copy the table from Excel into this piece of code.
If you don't care about the table, you can simply number the files:
import os
from itertools import count
numbers = count(1)
os.chdir(r'C:\Users\Desktop\Project')
for old in os.listdir('.'):
    if not old.endswith('.pdf'):
        continue
    new = '%03d.pdf' % next(numbers)
    os.rename(old, new)