I need to read CSV files whose names I don't know in advance. They are already in a folder which I define in glob.glob. I want to select two columns which I specify and print these CSVs one by one.
This is my code:
import pandas as pd
import numpy as np
import glob
import os

all_files = glob.glob("C:/Users/Gamer/Documents/Colbun/Saturn/*.csv")
file_list = []
for f in all_files:
    df = pd.read_csv(f, header=0, usecols=["t", "f"])
    df.to_csv(file_list)
Read 3 (or more) CSVs and print 3 (or more) CSVs.
This will save to a file not print.
file_list = []
for i, f in enumerate(all_files):
    df = pd.read_csv(f, header=0, usecols=["t", "f"])
    df.to_csv(f'filename{i}.csv')
    file_list.append(f'filename{i}.csv')  # keep track of the files written
This is an incorrect use of df.to_csv: the function expects a path (something implementing os.PathLike[str]), not a list. I would suggest appending the df to file_list inside the loop instead. Refer to the df.to_csv documentation for how to save a CSV.
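As a sketch of the intended "read and print one by one" flow (a temporary folder with a sample file stands in for the real path, C:/Users/Gamer/Documents/Colbun/Saturn/):

```python
import glob
import os
import tempfile

import pandas as pd

# Stand-in folder with one sample file; substitute the real path here.
folder = tempfile.mkdtemp()
pd.DataFrame({"t": [0, 1], "f": [49.9, 50.1], "x": [7, 8]}).to_csv(
    os.path.join(folder, "sample.csv"), index=False
)

all_files = glob.glob(os.path.join(folder, "*.csv"))
frames = []
for f in all_files:
    # usecols keeps only the two requested columns
    df = pd.read_csv(f, header=0, usecols=["t", "f"])
    print(os.path.basename(f))
    print(df)
    frames.append(df)
```

Each file is printed as it is read; the frames list keeps the two-column DataFrames around for later use.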
I have a folder which contains files like: ANGOSTURA_U1_20220901.csv, ANGOSTURA_U1_20220902.csv, ANGOSTURA_U1_20220903.csv.
I want to read all the files, concatenate them into one CSV, and print this concatenated df as ANGOSTURA_U1_202209_month.csv.
Take into consideration that these files can also be called Colbun_U1_20220801.csv, Colbun_U1_20220802.csv, Colbun_U1_20220803.csv, but I want the output file name to always be the leading name plus the date. In that case it would be Colbun_U1_202208_month.csv. If the files are ANGOSTURA_U1_XXXX01.csv, the output file name is ANGOSTURA_U1_XXXX_month.csv; if the files are Colbun_U2_XXXX01.csv, the output file name is Colbun_U2_XXXX_month.csv. The folder always contains either Colbun or Angostura files, never both.
This is my code (I tried both os.listdir and glob.glob):
import pandas as pd
import numpy as np
import glob
import os
import csv
all_files = glob.glob("C:/Users/ep_irojaso/Desktop/PROGRAMA DESEMPEÑO/saturnmensual/*.csv")
file_list = []
for f in all_files:
    data = pd.read_csv(f, usecols=["t", "f"])
    file_list.append(data)
df = pd.concat(file_list, ignore_index=True)
df.to_csv(f'C:/Users/ep_irojaso/Desktop/PROGRAMA DESEMPEÑO/Saturn2mensual/{os.path.basename(f).split(".")[0]}_mensual.csv')
You could try the following:
import pandas as pd
from itertools import groupby
from pathlib import Path

def key(file_path):
    # file name without the extension and the last two (day) digits
    return file_path.stem[:-2]

base = Path("C:/Users/ep_irojaso/Desktop/PROGRAMA DESEMPEÑO/saturnmensual/")
all_files = sorted(base.glob("*.csv"))

for prefix, files in groupby(all_files, key=key):
    pd.concat(
        [pd.read_csv(file, usecols=["t", "f"]) for file in files]
    ).to_csv(base / f"{prefix}_month.csv", index=False)
Use pathlib from the standard library instead of os: set the base path to the folder that contains the CSV files.
Glob all CSV files in it and sort them into a list all_files.
Now group the files into monthly buckets with groupby from the standard-library module itertools. The grouping key is the file name without the extension and the last two characters (the days, according to your specification).
Then concat all the DataFrames from one month and write the new DataFrame to a new CSV file.
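The grouping key can be checked in isolation. A minimal sketch, using made-up file names that follow the question's pattern:

```python
from itertools import groupby
from pathlib import Path

def key(file_path):
    # .stem drops the ".csv"; [:-2] drops the two day digits
    return file_path.stem[:-2]

names = [
    Path("ANGOSTURA_U1_20220901.csv"),
    Path("ANGOSTURA_U1_20220902.csv"),
    Path("Colbun_U1_20220801.csv"),
]
# groupby needs sorted input to produce one group per month
groups = {k: [p.name for p in g] for k, g in groupby(sorted(names), key=key)}
print(groups)
```

The two ANGOSTURA files fall into the "ANGOSTURA_U1_202209" bucket and the Colbun file into "Colbun_U1_202208", which is exactly the prefix used for the output file name.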
I need to read in about 100 excel files under the same folder. Each file name contains a company name starting with an identifying number (e.g. 1CompanyA, 2CompanyB) and each of them has 10 same tab names (sheet1, sheet2, ....sheet10).
This website shows how to read all files (with a single sheet) under a directory as a Pandas DataFrame. However, I wonder if there's a way to read in multiple sheets under the same situation. It would also be nice if the read-in dataframes can be named as starting number of the company+sheet_name. For example, sheet1 of 1CompanyA.xlsx would be assigned to dataframe c1_sheet1 (c stands for company)
Here's the code to read all files (with a single sheet) under a directory as a Pandas DataFrame.
# import necessary libraries
import pandas as pd
import os
import glob

# use glob to get all the Excel files in the folder
path = os.getcwd()
excel_files = glob.glob(os.path.join(path, "*.xlsx"))

# loop over the list of Excel files
for f in excel_files:
    # read the Excel file
    df = pd.read_excel(f)
    # print the location and filename
    print('Location:', f)
    print('File Name:', f.split("\\")[-1])
    # print the content
    print('Content:')
    display(df)  # display() works in Jupyter/IPython; use print(df) in a plain script
    print()
It's easier to read and access if you build a dictionary to store your DataFrames in, imo. By creating ExcelFile objects first, we can get the sheet names separately, so that we can read all sheets even if we don't know how many there are and what their names are.
import pandas as pd
import os
import glob
import re

path = os.getcwd()
csv_files = glob.glob(os.path.join(path, "*.xlsx"))  # despite the name, these are the Excel files

files = {}
for f in csv_files:
    file = pd.ExcelFile(f)
    # pull the leading company number out of the Windows file path
    c_id = re.findall(r'.\\(\d+)\w+', f)[0]
    for sn in file.sheet_names:
        key = "c{}_{}".format(c_id, sn)
        files[key] = file.parse(sheet_name=sn)
This creates a dictionary of 1000 DataFrames. Then to access sheet5 of 1CompanyA.xlsx, you can use files['c1_sheet5'], etc.
Now if you insist on creating a variable named c1_sheet5 etc., one way is to use globals():
for f in csv_files:
    file = pd.ExcelFile(f)
    c_id = re.findall(r'.\\(\d+)\w+', f)[0]
    for sn in file.sheet_names:
        key = "c{}_{}".format(c_id, sn)
        globals()[key] = file.parse(sheet_name=sn)
This creates 1000 DataFrames. Then to access sheet5 of 1CompanyA.xlsx, you can use c1_sheet5, etc. Note that using globals() is considered a bad idea (I can't find a link, but there are numerous discussions about it on SO). The gist of it is that globals() implicitly creates global variables whose names your code can't know in advance; it's better to build an explicit dictionary like files above instead.
I would like to automatically import all csv files that are in one folder as dataframes and set the dataframe's variable name to the respective filename.
For example, in the folder are the following three files: data1.csv, data2.csv and data3.csv
How can I automatically import all three files having three dataframes (data1, data2 and data3) as the result?
If you want to save each DataFrame as a variable named after its file name, you can use exec. But it is not secure: it could lead to code injection.
import pandas
import os

path = "path_of_directory"
files = os.listdir(path)  # returns the list of files in the folder at the specified path
for file in files:
    if file.endswith(".csv"):  # checking whether the file ends with .csv
        # builds and runs e.g.: data1 = pandas.read_csv(r'path_of_directory/data1.csv')
        exec(f"{file[:-4]} = pandas.read_csv(r'{os.path.join(path, file)}')")
You can loop over the directory using pathlib and build a dictionary of name->DataFrame, eg:
import pathlib
import pandas as pd
dfs = {path.stem: pd.read_csv(path) for path in pathlib.Path('thepath/').glob('*.csv')}
Then access as dfs['test1'] etc...
Since the answer that was given uses an exec command, and munir.aygun already warned you what could go wrong with that approach, I want to show you the way to do it as Justin Ezequiel or munir.aygun already suggested:
import os
import glob
import pandas as pd
# Path to your data
path = r'D:\This\is\your\path'
# Get all .csv files at your path
allFiles = glob.glob(path + "/*.csv")
# Read in the data from the files and save it to a dictionary
dataStorage = {}
for filename in allFiles:
    name = os.path.basename(filename).split(".")[0]
    dataStorage[name] = pd.read_csv(filename)

# Can be used then like this (for printing here)
if "data1" in dataStorage:
    print(dataStorage["data1"])
Hope this can still be helpful.
I want to open multiple csv files in python, collate them and have python create a new file with the data from the multiple files reorganised...
Is there a way for me to read all the files from a single directory on my desktop and read them in python like this?
Thanks a lot
If you a have a directory containing your csv files, and they all have the extension .csv, then you could use, for example, glob and pandas to read them all in and concatenate them into one csv file. For example, say you have a directory, like this:
csvfiles/one.csv
csvfiles/two.csv
where one.csv contains:
name,age
Keith,23
Jane,25
and two.csv contains:
name,age
Kylie,35
Jake,42
Then you could do the following in Python (you will need to install pandas with, e.g., pip install pandas):
import glob
import os
import pandas as pd
# the path to your csv file directory
mycsvdir = 'csvdir'
# get all the csv files in that directory (assuming they have the extension .csv)
csvfiles = glob.glob(os.path.join(mycsvdir, '*.csv'))
# loop through the files and read them in with pandas
dataframes = [] # a list to hold all the individual pandas DataFrames
for csvfile in csvfiles:
    df = pd.read_csv(csvfile)
    dataframes.append(df)
# concatenate them all together
result = pd.concat(dataframes, ignore_index=True)
# print out to a new csv file
result.to_csv('all.csv')
Note that the output csv file will have an additional column at the front containing the index of the row. To avoid this you could instead use:
result.to_csv('all.csv', index=False)
You can see the documentation for the to_csv() method here.
Hope that helps.
Here is a very simple way to do what you want to do.
import pandas as pd
import glob, os

os.chdir("C:\\your_path\\")
frames = []
for counter, file in enumerate(glob.glob("1*")):
    namedf = pd.read_csv(file, skiprows=0, usecols=[1, 2, 3])
    frames.append(namedf)
results = pd.concat(frames)  # DataFrame.append was removed in pandas 2.0
results.to_csv('C:\\your_path\\combinedfile.csv')
Notice this part: glob("1*")
This will look only for files that start with '1' in the name (1, 10, 100, etc). If you want everything, change it to this: glob("*")
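glob patterns follow the same matching rules as the standard-library fnmatch module, so a pattern can be sanity-checked without touching the filesystem:

```python
import fnmatch

names = ["1.csv", "10.csv", "100.csv", "2.csv", "summary.csv"]
# "1*" matches anything whose name starts with '1'
matches = [n for n in names if fnmatch.fnmatch(n, "1*")]
print(matches)
```

Only the names starting with '1' survive, which is exactly what glob("1*") would return for files on disk.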
Sometimes it's necessary to merge all CSV files into a single CSV file, and sometimes you just want to merge some files that match a certain naming convention. It's nice to have this feature!
I know that the post is a little bit old, but using glob can be quite expensive in terms of memory if you are trying to read large CSV files: you store all that data in a list, and then you still need enough memory to concatenate the DataFrames in that list into one DataFrame with all the data. Sometimes this is not possible.
import pandas as pd

directory = 'directory path'  # 'dir' would shadow the built-in
df = pd.DataFrame()
for i in range(0, 24):
    csvfile = pd.read_csv(f'{directory}/file name{i}.csv', encoding='utf8')
    df = pd.concat([df, csvfile])  # DataFrame.append was removed in pandas 2.0
    del csvfile
So, in case your csv files have the same name except for some kind of number or string that differentiates them, you can just loop through the files and delete each one after it has been folded into the combined DataFrame variable. In my case all the csv files have the same name except that they are numbered in a range that goes from 0 to 23.
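If memory really is the constraint, note that growing an in-memory DataFrame still keeps all rows in RAM at once. An alternative sketch (my suggestion, not part of the answer above, assuming all files share the same columns) streams each file straight into the output CSV with mode='a', so only one file is in memory at a time:

```python
import glob
import os
import tempfile

import pandas as pd

# Stand-in folder with two numbered files (the original loop reads
# 'file name0.csv' ... 'file name23.csv' from a real directory).
folder = tempfile.mkdtemp()
for i in range(2):
    pd.DataFrame({"a": [i], "b": [i * 10]}).to_csv(
        os.path.join(folder, f"part{i}.csv"), index=False
    )

out = os.path.join(folder, "combined.csv")
for i, path in enumerate(sorted(glob.glob(os.path.join(folder, "part*.csv")))):
    chunk = pd.read_csv(path, encoding="utf8")
    # write the header only for the first chunk, then append rows
    chunk.to_csv(out, mode="a", index=False, header=(i == 0))
```

The combined file on disk ends up with one header row followed by the rows of every input file, without any full concatenation in memory.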
I hope this is not trivial but I am wondering the following:
If I have a specific folder with n csv files, how could I iteratively read all of them, one at a time, and perform some calculations on their values?
For a single file, for example, I do something like this and perform some calculations on the x array:
import numpy

directoryPath = input('Directory path for native csv file: ')  # use raw_input on Python 2
csvfile = numpy.genfromtxt(directoryPath, delimiter=",")
x = csvfile[:, 2]  # creates the array that will undergo a set of calculations
I know that I can check how many csv files there are in a given folder (check here):
import glob

for file_name in glob.glob("*.csv"):
    print(file_name)
But I failed to figure out how to possibly nest the numpy.genfromtxt() function in a for loop, so that I read in all the csv files of a directory that it is up to me to specify.
EDIT
The folder I have only has jpg and csv files. The latter are named eventX.csv, where X ranges from 1 to 50. The for loop I am referring to should therefore consider the file names the way they are.
That's how I'd do it:
import os

directory = os.path.join("c:\\", "path")
for root, dirs, files in os.walk(directory):
    for file in files:
        if file.endswith(".csv"):
            # join with root: os.walk yields names relative to each folder
            f = open(os.path.join(root, file), 'r')
            # perform calculation
            f.close()
Using pandas and glob as the base packages
import glob
import pandas as pd

glued_data = pd.DataFrame()
for file_name in glob.glob(directoryPath + '*.csv'):
    x = pd.read_csv(file_name, low_memory=False)
    glued_data = pd.concat([glued_data, x], axis=0)
I think you look for something like this
import glob

import numpy as np

for file_name in glob.glob(directoryPath + '*.csv'):
    x = np.genfromtxt(file_name, delimiter=',')[:, 2]
    # do your calculations
Edit
If you want to get all csv files from a folder (including subfolder) you could use subprocess instead of glob (note that this code only works on linux systems)
import subprocess

import numpy as np

# check_output returns bytes on Python 3, hence the decode()
file_list = subprocess.check_output(
    ['find', directoryPath, '-name', '*.csv']
).decode().split('\n')[:-1]
for i, file_name in enumerate(file_list):
    x = np.genfromtxt(file_name, delimiter=',')[:, 2]
    # do your calculations
    # now you can use i as an index
It first searches the folder and sub-folders for all file_names using the find command from the shell and applies your calculations afterwards.
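A portable alternative (my suggestion, not part of the original answer) is glob's recursive ** pattern, which also works on Windows:

```python
import glob
import os
import tempfile

# Build a small tree: one CSV at the top level, one in a subfolder.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
for rel in ("a.csv", os.path.join("sub", "b.csv")):
    with open(os.path.join(root, rel), "w") as fh:
        fh.write("1,2,3\n")

# recursive=True lets '**' match any number of subdirectories (including none)
file_list = sorted(glob.glob(os.path.join(root, "**", "*.csv"), recursive=True))
print([os.path.relpath(p, root) for p in file_list])
```

Both the top-level file and the one in the subfolder are found, without shelling out to an external command.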
According to the documentation of numpy.genfromtxt(), the first argument can be a
File, filename, or generator to read.
That would mean that you could write a generator that yields the lines of all the files like this:
import glob

def csv_merge_generator(pattern):
    for file_name in glob.glob(pattern):
        # open each file and yield its lines; iterating over the name
        # itself would yield single characters, not lines
        with open(file_name) as f:
            yield from f

# then using it like this
numpy.genfromtxt(csv_merge_generator('*.csv'), delimiter=',')
should work. (I do not have numpy installed, so cannot test easily)
Here's a more succinct way to do this, given some path = "/path/to/dir/".
import glob
import pandas as pd
pd.concat([pd.read_csv(f) for f in glob.glob(path+'*.csv')])
Then you can apply your calculation to the whole dataset, or, if you want to apply it one by one:
pd.concat([process(pd.read_csv(f)) for f in glob.glob(path+'*.csv')])
The function below will return a dictionary containing a dataframe for each .csv file in the folder within your defined path.
import pandas as pd
import glob
import os
import ntpath
def panda_read_csv(path):
    pd_csv_dict = {}
    csv_files = glob.glob(os.path.join(path, "*.csv"))
    for csv_file in csv_files:
        file_name = ntpath.basename(csv_file)
        pd_csv_dict['pd_' + file_name] = pd.read_csv(csv_file, sep=";", encoding='mac_roman')
    return pd_csv_dict
You can use pathlib glob functionality to list all .csv in a path, and pandas to read them.
Then it's only a matter of applying whatever function you want (which, if systematic, can also be done within the list comprehension)
import pandas as pd
from pathlib import Path

path2csv = Path("/your/path/")
csvlist = path2csv.glob("*.csv")
csvs = [pd.read_csv(g) for g in csvlist]
Another answer using list comprehension:
from os import listdir
files= [f for f in listdir("./") if f.endswith(".csv")]
You need to import the glob library and then use it like the following:
import glob

path = 'C:\\Users\\Admin\\PycharmProjects\\db_conection_screenshot\\seclectors_absent_images'
filenames = glob.glob(path + "\\*.png")
print(len(filenames))