I have a folder named "Photos" that contains several images. I am using glob to list all of these images along with their full directory paths. I can print the list and see the full paths, but I am struggling to export the list to a CSV with a single column. My code is as follows:
import glob
for file in glob.glob(r"C:\Users\myself\Photos\*"):
    print(file)
Normally I would use pandas to read CSVs by putting them into a dataframe, but I am struggling to do the same with a glob list.
I would appreciate any guidance or help.
You're close. Use this:
import glob
import pandas as pd
list_of_pictures = []
for file in glob.glob(r"C:\Users\myself\Photos\*"):
    list_of_pictures.append(file)
pd.DataFrame(list_of_pictures).to_csv(r'path&name_of_your_csvfile.csv', index=False, header=False)
Or with pathlib:
from pathlib import Path
import pandas as pd
list_of_pictures = []
# '**/*' also descends into subfolders
for file in Path(r'C:\Users\myself\Photos').glob('**/*'):
    list_of_pictures.append(str(file.absolute()))
pd.DataFrame(list_of_pictures).to_csv(r'path&name_of_your_csvfile.csv', index=False, header=False)
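If pandas is not needed for anything else, the standard-library csv module can write the same single-column file. This is only a minimal sketch, not the code from the answers above: the function name write_paths_csv and the output file name in the usage note are made up for illustration, and it only looks at files directly inside the folder.

```python
import csv
from pathlib import Path


def write_paths_csv(folder, out_csv):
    """Write the absolute path of every file in `folder` to a one-column CSV."""
    # sorted() gives a stable order; the order returned by glob is unspecified.
    paths = sorted(p.resolve() for p in Path(folder).glob("*") if p.is_file())
    # newline="" is what the csv module expects for files it writes.
    with open(out_csv, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        for path in paths:
            writer.writerow([str(path)])  # one path per row, one column
    return len(paths)
```

It could be called as write_paths_csv(r"C:\Users\myself\Photos", "photo_paths.csv"), and it returns the number of paths written.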
I have a folder that contains files such as ANGOSTURA_U1_20220901.csv, ANGOSTURA_U1_20220902.csv and ANGOSTURA_U1_20220903.csv.
I want to read all of the files, concatenate them into one dataframe, and write that dataframe to ANGOSTURA_U1_202209_month.csv.
The files may instead be called Colbun_U1_20220801.csv, Colbun_U1_20220802.csv, Colbun_U1_20220803.csv, and so on. The output file name should always be the name prefix plus the year and month: Colbun_U1_202208_month.csv in this case. In general, files named ANGOSTURA_U1_XXXX01.csv should produce ANGOSTURA_U1_XXXX_month.csv, and files named Colbun_U2_XXXX01.csv should produce Colbun_U2_XXXX_month.csv. A given folder only ever contains Colbun files or Angostura files, never both.
This is my code (I tried both os.listdir and glob.glob):
import pandas as pd
import numpy as np
import glob
import os
import csv
all_files = glob.glob("C:/Users/ep_irojaso/Desktop/PROGRAMA DESEMPEÑO/saturnmensual/*.csv")
file_list = []
for f in all_files:
    data = pd.read_csv(f, usecols=["t", "f"])
    file_list.append(data)
df=pd.concat(file_list,ignore_index=True)
df.to_csv(f'C:/Users/ep_irojaso/Desktop/PROGRAMA DESEMPEÑO/Saturn2mensual/{os.path.basename(f).split(".")[0]}_mensual.csv')
You could try the following:
from itertools import groupby
from pathlib import Path
import pandas as pd
def month_key(file_path):
    # File name without the extension and without the two day digits
    return file_path.stem[:-2]
base = Path("C:/Users/ep_irojaso/Desktop/PROGRAMA DESEMPEÑO/saturnmensual/")
all_files = sorted(base.glob("*.csv"))
for key, files in groupby(all_files, key=month_key):
    pd.concat(
        [pd.read_csv(file, usecols=["t", "f"]) for file in files]
    ).to_csv(base / f"{key}_month.csv", index=False)
Use pathlib from the standard library instead of os: set base to the folder that contains the CSV files.
Glob all CSV files in it and sort them into a list, all_files.
Now group the files into monthly buckets with groupby from the standard library module itertools. The grouping key is the file name without the extension and without the last two characters (the day, according to your specification). groupby only groups adjacent items, which is why the list is sorted first.
Then concatenate all the dataframes from one month and write the new dataframe to a new CSV file.
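To see what the grouping step does without touching any files, here is a small sketch that works on plain file names. The helper names month_key and group_by_month and the sample list are made up for illustration; the key rule (stem minus the last two characters) is the one described above.

```python
from itertools import groupby
from pathlib import Path


def month_key(file_name):
    """Return the stem without its last two characters (the day)."""
    return Path(file_name).stem[:-2]


def group_by_month(file_names):
    """Map each month key to the list of file names that share it."""
    # groupby only merges adjacent items, so sort by key (then by name) first.
    ordered = sorted(file_names, key=lambda name: (month_key(name), name))
    return {key: list(group) for key, group in groupby(ordered, key=month_key)}


names = [
    "ANGOSTURA_U1_20220902.csv",
    "ANGOSTURA_U1_20220801.csv",
    "ANGOSTURA_U1_20220901.csv",
]
groups = group_by_month(names)
print(groups)
```

Each dictionary key is what would go in front of _month.csv in the output file name.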
I currently have several csv files in a folder. I want to use Python to loop over the files in the folder and make small changes to each csv file. Please see my code below, which is not currently working:
import os
import pandas as pd
folder_to_view = "C:/path"
for file in os.listdir(folder_to_view):
    df = pd.read_csv(file)
    df.columns = ['Location','Subscriber','Speed','IP','Start','End','Bytes','Test Status','Comment']
    df.to_csv(file, index=False)
I imagine that the issue is not forming the path correctly as the renaming of columns should be fine. os.listdir() returns a list of the files within that directory without the directory name prepended, so try this:
import os
import pandas as pd
folder_to_view = "C:/path"
for file in os.listdir(folder_to_view):
    full_path = f'{folder_to_view}/{file}'
    df = pd.read_csv(full_path)
    df.columns = ['Location','Subscriber','Speed','IP','Start','End','Bytes','Test Status','Comment']
    df.to_csv(full_path, index=False)
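One more thing to watch for: os.listdir also returns subfolders and non-CSV files, which read_csv would not handle well. A possible way around both problems, sketched with pathlib (the helper name csv_paths is made up here):

```python
from pathlib import Path


def csv_paths(folder):
    """Return the full paths of the CSV files directly inside `folder`."""
    # Path.glob yields paths that already include the folder, unlike
    # os.listdir, which returns bare names.
    return sorted(p for p in Path(folder).glob("*.csv") if p.is_file())
```

Looping over csv_paths(folder_to_view) then gives paths that can be passed straight to pd.read_csv and df.to_csv.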
I found here how to import multiple text files into one data frame. However, it gives an error. The files are named footballseason1, footballseason2, footballseason3, ... (up to footballseason5000).
import pandas as pd
import datetime as dt
import os, glob
os.chdir("~/Downloads/data")
filenames = [i for i in glob.glob("*.txt")]
FileNotFoundError: [Errno 2] No such file or directory: '~/Downloads/data'
However, if I import a single file, everything works and the directory is found:
df = pd.read_csv("~/Downloads/data/footballseason1.txt", sep=",")
Could you help me fix the problem? Also, is there a way to do this without changing directory, doing all the steps with the path where the files are located?
Python's os does not understand ~ by default, so it needs to be expanded manually:
filenames = [i for i in glob.glob(os.path.expanduser("~/Downloads/data/*.txt"))]
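As a quick illustration of the expansion, here is a short sketch that uses the path from the question; the variable names are made up. As far as I know, pandas expands a leading ~ itself in the path given to read_csv, which would explain why reading the single file worked while os.chdir failed.

```python
import os
from pathlib import Path

raw = "~/Downloads/data"

# Both calls replace the leading "~" with the current user's home directory.
expanded_os = os.path.expanduser(raw)
expanded_pathlib = Path(raw).expanduser()

print(expanded_os)
print(expanded_pathlib)
```

The printed paths depend on the machine, so no expected output is shown here.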
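You can use a list comprehension and pd.concat as shown below. Note that glob does not expand ~ either, so the pattern is passed through os.path.expanduser first:
df = pd.concat([pd.read_csv(i, sep=',') for i in glob.glob(os.path.expanduser("~/Downloads/data/*.txt"))])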
Via pathlib:
import pandas as pd
from pathlib import Path
# expanduser() is needed because Path does not expand ~ on its own
inp_path = Path("~/Downloads/data").expanduser()
df = pd.concat([
    pd.read_csv(txt_file, sep=',') for txt_file in inp_path.glob('*.txt')
])
With an added check:
import pandas as pd
from pathlib import Path
inp_path = Path("~/Downloads/data").expanduser()
if inp_path.exists():
    df = pd.concat([
        pd.read_csv(txt_file, sep=',') for txt_file in inp_path.glob('*.txt')
    ])
else:
    print("input dir doesn't exist, please check path")
Importing Data from Multiple files
Now let’s see how we can import data from multiple files in a specific directory. There are many ways to do so, but I personally believe this one is the easiest to use and to understand, especially for beginners.
1) First, we import the os and glob libraries. We need them to navigate between working directories and to get file paths.
import os
import glob
2) We also import the pandas library, since we need to work with data frames.
import pandas as pd
3) Let’s change our working directory to the directory that holds all the data files.
os.chdir(r"C:\Users\HARISH\Path_for_our_files")
4) Now we build a list, using a list comprehension, of all the .csv files in the current working directory.
filenames = [i for i in glob.glob("*.csv")]
I would like to automatically import all csv files that are in one folder as dataframes and set the dataframe's variable name to the respective filename.
For example, in the folder are the following three files: data1.csv, data2.csv and data3.csv
How can I automatically import all three files having three dataframes (data1, data2 and data3) as the result?
This saves each dataframe in a variable named after its file. Be aware that it is not secure: calling exec on file names can lead to code injection.
import pandas
import os
path = "path_of_directory"
files = os.listdir(path)  # Returns the list of files in the folder at the specified path
for file in files:
    if file.endswith(".csv"):  # Check whether the file name ends with .csv
        # os.path.join inserts the operating system's path separator
        full_path = os.path.join(path, file)
        # !r quotes the path so that the generated statement is valid Python
        exec(f"{file[:-4]} = pandas.read_csv({full_path!r})")
You can loop over the directory using pathlib and build a dictionary of name -> DataFrame, e.g.:
import pathlib
import pandas as pd
dfs = {path.stem: pd.read_csv(path) for path in pathlib.Path('thepath/').glob('*.csv')}
Then access them as dfs['data1'] etc.
The answer that was given includes an exec command, and munir.aygun already warned you what could go wrong with that approach. So I want to show you how to do it the way Justin Ezequiel or munir.aygun already suggested:
import os
import glob
import pandas as pd
# Path to your data
path = r'D:\This\is\your\path'
# Get all .csv files at your path
allFiles = glob.glob(path + "/*.csv")
# Read in the data from the files and save it to a dictionary
dataStorage = {}
for filename in allFiles:
    name = os.path.basename(filename).split(".")[0]
    dataStorage[name] = pd.read_csv(filename)
# Can then be used like this (printing here)
if "data1" in dataStorage:
    print(dataStorage["data1"])
Hope this can still be helpful.
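Another reason to prefer a dictionary over generated variable names: not every file name is a legal Python variable name, while any string works as a dictionary key. A small sketch of the check (the function name and the sample file names are made up for illustration):

```python
import keyword
from pathlib import Path


def usable_as_variable(file_name):
    """True if the file's stem could legally be used as a Python variable name."""
    stem = Path(file_name).stem
    return stem.isidentifier() and not keyword.iskeyword(stem)


for name in ["data1.csv", "2023-sales.csv", "class.csv"]:
    print(name, usable_as_variable(name))
```

Only data1.csv passes: 2023-sales starts with a digit and contains a hyphen, and class is a reserved word.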
I hope this is not trivial but I am wondering the following:
If I have a specific folder with n csv files, how could I iteratively read all of them, one at a time, and perform some calculations on their values?
For a single file, for example, I do something like this and perform some calculations on the x array:
import csv
import os
import numpy
directoryPath = input('Directory path for native csv file: ')
csvfile = numpy.genfromtxt(directoryPath, delimiter=",")
x = csvfile[:, 2]  # Creates the array that will undergo a set of calculations
I know that I can list the csv files in a given folder (check here):
import glob
for files in glob.glob("*.csv"):
    print(files)
But I could not figure out how to nest the numpy.genfromtxt() function in a for loop so that I read in all the csv files of a directory that I specify.
EDIT
The folder I have only has jpg and csv files. The latter are named eventX.csv, where X ranges from 1 to 50. The for loop I am referring to should therefore consider the file names the way they are.
That's how I'd do it:
import os
directory = os.path.join("c:\\", "path")
for root, dirs, files in os.walk(directory):
    for file in files:
        if file.endswith(".csv"):
            # os.walk yields bare file names, so join them with root
            with open(os.path.join(root, file), 'r') as f:
                pass  # perform calculation
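Regarding the eventX.csv names from the edit in the question: if the processing order matters, keep in mind that plain string sorting puts event10.csv before event2.csv. A possible fix is to sort on the number in the name. This is only a sketch, and the helper name event_number is made up:

```python
import re
from pathlib import Path


def event_number(path):
    """Extract X from a name like eventX.csv (hypothetical helper)."""
    match = re.fullmatch(r"event(\d+)", Path(path).stem)
    if match is None:
        raise ValueError(f"unexpected file name: {path}")
    return int(match.group(1))


names = ["event10.csv", "event2.csv", "event1.csv"]
ordered = sorted(names, key=event_number)
print(ordered)  # ['event1.csv', 'event2.csv', 'event10.csv']
```

The same key function can be passed to sorted() on whatever list of paths glob or os.walk produced.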
Using pandas and glob as the base packages:
import glob
import pandas as pd
glued_data = pd.DataFrame()
# directoryPath must end with a path separator for this pattern to work
for file_name in glob.glob(directoryPath+'*.csv'):
    x = pd.read_csv(file_name, low_memory=False)
    glued_data = pd.concat([glued_data,x],axis=0)
I think you are looking for something like this:
import glob
import numpy as np
for file_name in glob.glob(directoryPath+'*.csv'):
    x = np.genfromtxt(file_name,delimiter=',')[:,2]
    # do your calculations
Edit
If you want to get all csv files from a folder (including subfolders), you could use subprocess instead of glob (note that this code only works on Linux systems):
import subprocess
import numpy as np
# text=True makes check_output return str instead of bytes (Python 3.7+)
file_list = subprocess.check_output(['find', directoryPath, '-name', '*.csv'], text=True).split('\n')[:-1]
for i, file_name in enumerate(file_list):
    x = np.genfromtxt(file_name, delimiter=',')[:, 2]
    # do your calculations
    # now you can use i as an index
It first searches the folder and its subfolders for all file names using the shell's find command, and then applies your calculations.
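A portable alternative to calling find would be pathlib's rglob, which also walks subfolders and works on Windows too. A minimal sketch (the function name find_csv_files is made up):

```python
from pathlib import Path


def find_csv_files(directory):
    """Return every .csv file under `directory`, including subfolders."""
    # rglob("*.csv") is shorthand for glob("**/*.csv").
    return sorted(p for p in Path(directory).rglob("*.csv") if p.is_file())
```

The returned Path objects can be passed to np.genfromtxt or pd.read_csv in the same kind of loop as above.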
According to the documentation of numpy.genfromtxt(), the first argument can be a
File, filename, or generator to read.
That means you could write a generator that yields the lines of all the files, like this:
import glob
import numpy
def csv_merge_generator(pattern):
    for file_name in glob.glob(pattern):
        # glob returns file names, so each file has to be opened to get its lines
        with open(file_name) as file:
            for line in file:
                yield line
# then using it like this
numpy.genfromtxt(csv_merge_generator('*.csv'), delimiter=',')
This should work. (I do not have numpy installed, so I cannot test it easily.)
Here's a more succinct way to do this, given some path = "/path/to/dir/".
import glob
import pandas as pd
pd.concat([pd.read_csv(f) for f in glob.glob(path+'*.csv')])
Then you can apply your calculation to the whole dataset, or, if you want to apply it one by one:
pd.concat([process(pd.read_csv(f)) for f in glob.glob(path+'*.csv')])
The function below will return a dictionary containing a dataframe for each .csv file in the folder within your defined path.
import pandas as pd
import glob
import os
import ntpath
def panda_read_csv(path):
    pd_csv_dict = {}
    csv_files = glob.glob(os.path.join(path, "*.csv"))
    for csv_file in csv_files:
        file_name = ntpath.basename(csv_file)
        # sep and encoding match the author's files; adjust them for yours
        pd_csv_dict['pd_' + file_name] = pd.read_csv(csv_file, sep=";", encoding='mac_roman')
    return pd_csv_dict
You can use pathlib glob functionality to list all .csv in a path, and pandas to read them.
Then it's only a matter of applying whatever function you want (which, if systematic, can also be done within the list comprehension)
import pandas as pd
from pathlib import Path
path2csv = Path("/your/path/")
csvlist = path2csv.glob("*.csv")
csvs = [pd.read_csv(g) for g in csvlist]
Another answer using list comprehension:
from os import listdir
files= [f for f in listdir("./") if f.endswith(".csv")]
You need to import the glob library and then use it as follows:
import glob
path='C:\\Users\\Admin\\PycharmProjects\\db_conection_screenshot\\seclectors_absent_images'
filenames = glob.glob(path + "\\*.png")
print(len(filenames))