I wrote a loop to read in every Excel file within a directory and append it to a DataFrame. It works:
file_list = glob.glob(path + "/*.xls")
for file in file_list:
    excl_list.append(pd.read_excel(file))
excl_merged = pd.concat(excl_list, ignore_index=True)
And if I read in one file and pass in all the arguments I want, that also works. For example:
df = pd.read_excel('sample.xlsx', usecoles=['code','name','date','hours'],skiprows= [0,1,2,3,4,5,6,7])
But if I add those same arguments to my loop, it doesn't work. Any suggestions?
file_list = glob.glob(path + "/*.xls")
for file in file_list:
    excl_list.append(pd.read_excel(file,usecoles=['code','name','date','hours'],skiprows= [0,1,2,3,4,5,6,7]))
excl_merged = pd.concat(excl_list, ignore_index=True)
file_list = glob.glob(path + "/*.xls")
excl_merged = pd.concat(
    [
        pd.read_excel(
            file,
            # usecols, not usecoles
            usecols=['code', 'name', 'date', 'hours'],
            skiprows=[0, 1, 2, 3, 4, 5, 6, 7],
        )
        for file in file_list
    ],
    ignore_index=True,
)
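One way to keep the reader options identical for every file is to define them once in a dictionary and unpack it in each call, so a keyword only has to be spelled correctly in one place. The following is a minimal sketch, not a drop-in replacement: it uses pd.read_csv on generated CSV files as a stand-in for pd.read_excel (so it runs without an Excel engine installed), and the sample files, the two skipped rows, and the read_options name are all made up for illustration.

```python
import glob
import os
import tempfile

import pandas as pd

# Build two small sample files: two junk lines, then a header and one data row.
tmp_dir = tempfile.mkdtemp()
for i in range(2):
    with open(os.path.join(tmp_dir, f"sample{i}.csv"), "w") as fh:
        fh.write("junk line\nanother junk line\n")
        fh.write("code,name,date,hours,extra\n")
        fh.write(f"{i},worker{i},2020-01-0{i + 1},8,x\n")

# Hypothetical shared options, defined once and reused for every file.
read_options = {
    "usecols": ["code", "name", "date", "hours"],
    "skiprows": [0, 1],
}

file_list = sorted(glob.glob(os.path.join(tmp_dir, "*.csv")))
merged = pd.concat(
    [pd.read_csv(file, **read_options) for file in file_list],
    ignore_index=True,
)

# In this sketch, a misspelled keyword is rejected by read_csv with a TypeError.
try:
    pd.read_csv(file_list[0], usecoles=["code"])
    misspelled_keyword_rejected = False
except TypeError:
    misspelled_keyword_rejected = True
```

If a call inside a loop "doesn't work", printing the exception type and message for a single file is usually the quickest way to spot this kind of typo.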
How can I select the last rows of each text file in a for loop?
This is my first attempt:
import glob
import pandas as pd
path = input("Insert location:")
file_list = glob.glob(path + "/*.txt")
txt_list = []
for file in file_list:
    txt_list.append(pd.read_csv(file))
for file in file_list:
    txt_list[-7::3]
excl_merged = pd.concat(txt_list, ignore_index=True)
excl_merged.to_excel('Total.xlsx', index=False)
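Your code is incorrect: txt_list[-7::3] slices the list of dataframes rather than the rows of each file, and its result is discarded. Here is a version that should work: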
import glob
import pandas as pd
path = input("Insert location:")
file_list = glob.glob(path + "/*.txt")
df_list = []
for file in file_list:
    df = pd.read_csv(file)
    df_list.append(df.tail(3))  # last 3 rows from each file dataframe
excl_merged = pd.concat(df_list, ignore_index=True)
excl_merged.to_excel('Total.xlsx', index=False)
Explanation: the tail() method returns the last n rows of a dataframe, where n is passed as an argument (5 if omitted).
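As a quick illustration of tail() on its own, here is a small sketch with a made-up dataframe (the column name and values are invented):

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30, 40, 50]})

last_three = df.tail(3)   # rows with index 2, 3, 4
last_one = df.tail(1)     # only the final row
default_tail = df.tail()  # no argument: up to the last 5 rows
```

Note that tail() keeps the original index labels, which is why ignore_index=True is useful when concatenating the pieces afterwards.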
I have 100 CSV files. I want to print particular columns from each CSV file along with its file name. With the code below I can print all of the CSV files:
path = r'F:\11 semister\TPC_MEMBER'
all_files = glob.glob(path + "/*.csv")
dataStorage = {}
for filename in all_files:
    name = os.path.basename(filename).split(".csv")[0]
    dataStorage[name] = pd.read_csv(filename)
    print(name)
dataStorage
Maybe you want this:
import glob
import pandas as pd
path = r'folderpath'  # provide the folder path where your csv files are stored
all_csv = glob.glob(path + "/*.csv")
li = []
for filename in all_csv:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)
data_frame = pd.concat(li, axis=0, ignore_index=True)
print(data_frame['columnname'])  # enter the name of your dataframe's column
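If the file name should travel with the selected columns, one option is to read only those columns and tag each row with its source file before concatenating. This is a rough sketch, assuming every file contains the columns listed in wanted_columns; the column names, the generated sample files, and the source_file column are invented for illustration.

```python
import glob
import os
import tempfile

import pandas as pd

# Create two small CSV files to stand in for the real folder.
folder = tempfile.mkdtemp()
for name in ("alpha", "beta"):
    with open(os.path.join(folder, name + ".csv"), "w") as fh:
        fh.write("member,score,notes\n")
        fh.write(f"{name}_1,1,a\n{name}_2,2,b\n")

wanted_columns = ["member", "score"]  # hypothetical column names

frames = []
for filename in sorted(glob.glob(os.path.join(folder, "*.csv"))):
    df = pd.read_csv(filename, usecols=wanted_columns)
    # Record which file each row came from (name without the extension).
    df["source_file"] = os.path.splitext(os.path.basename(filename))[0]
    frames.append(df)

combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Keeping the file name as a column, rather than as a dictionary key, means a single print or groupby on source_file can show the data per file.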
I am trying to merge different CSV files in Python. The files are in the same folder. All files have one column in common, 'client_ID'. I tried this code:
path = r'/folder_path/'
allfiles = glob.glob(path + "/*.csv")
df = pd.DataFrame()
for file in allfiles:
    df_file = pd.read_csv(file)
    df_file = pd.merge(df, df_file, on='partner_id')
df
You can read the first csv file first so that you don't start with an empty dataframe. I would edit your code like this:
path = r'/folder_path/'
allfiles = glob.glob(path + "/*.csv")
for i, file in enumerate(allfiles):
    if i < 1:
        df = pd.read_csv(file)
    else:
        df_file = pd.read_csv(file)
        df = pd.merge(df, df_file, on='partner_id')
df
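The same chain of merges can also be written with functools.reduce, which folds pd.merge over a list of frames. Below is a minimal sketch using in-memory frames instead of files; the data is made up, and the key is assumed to be partner_id as in the code above (the question text mentions client_ID, so use whichever column the files really share).

```python
from functools import reduce

import pandas as pd

frames = [
    pd.DataFrame({"partner_id": [1, 2, 3], "name": ["a", "b", "c"]}),
    pd.DataFrame({"partner_id": [1, 2, 3], "city": ["x", "y", "z"]}),
    pd.DataFrame({"partner_id": [1, 2], "orders": [5, 7]}),
]

# Merge the frames pairwise, left to right, on the shared key.
# The default how="inner" keeps only keys present in every frame.
merged = reduce(lambda left, right: pd.merge(left, right, on="partner_id"), frames)
```

With files, the frames list would come from something like a list comprehension over the glob results; passing how="outer" to pd.merge would keep keys that are missing from some files.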
I am trying to parse a list of .txt files within a zip archive, but only one file from that list gets parsed.
Code:
def custom_parse(self, response):
    self.logger.info(response.url)
    links = response.xpath("//a[contains(@href, '.zip')]/@href").getall()
    for link in list(set(links)):
        print(link)
        local_path = self.download_file("https://www.sec.gov" + link)
        zip_file = zipfile.ZipFile(local_path)
        zip_csv_files = [file_name for file_name in zip_file.namelist() if file_name.endswith(".txt") and "pre" not in file_name]
        zip_csv_file = zip_csv_files[0]
        with zip_file.open(zip_csv_file, "r") as zip:
            # df = pd.read_csv(BytesIO(zip.read()), dtype=object)
            df = pd.read_csv(zip, dtype=object, header=None, sep='delimiter')
            df = self.standardized(df)
            for k, row in df.iterrows():
                yield dict(row)

def standardized(self, df):
    # df.columns = [col.lower().strip().replace(" ", "_") for col in df.columns]
    df = df.fillna('')
    return df
I am going to assume it's due to zip_csv_file = zip_csv_files[0] but I am unsure how I can modify my current code to parse all the .txt files in a given zip folder.
You already pull out all the .txt files with your list comprehension, so just read those in a loop and concatenate them. This is untested, but should be close.
Replace the appropriate section of your code with this:
UPDATE:
zip_file = zipfile.ZipFile(local_path)
text_files = zip_file.infolist()
df_list = []
for file_name in text_files:
    if file_name.filename.endswith(".txt") and "pre" not in file_name.filename:
        # open each member through the archive object
        df_list.append(pd.read_csv(zip_file.open(file_name.filename), dtype=object, header=None, sep='delimiter'))
df = pd.concat(df_list)
df = self.standardized(df)
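A self-contained way to try the loop-and-concatenate idea is to build a small archive in memory and read every matching member through ZipFile.open. This is only a sketch: the member names and contents are invented, and the files are assumed to be plain comma-separated text with a header row, which may not match the real data.

```python
import io
import zipfile

import pandas as pd

# Build an in-memory archive with two data files and one "pre" file to skip.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("num.txt", "a,b\n1,2\n")
    archive.writestr("sub.txt", "a,b\n3,4\n")
    archive.writestr("pre.txt", "a,b\n9,9\n")
buffer.seek(0)

df_list = []
with zipfile.ZipFile(buffer) as zip_file:
    for member in zip_file.namelist():
        if member.endswith(".txt") and "pre" not in member:
            # ZipFile.open returns a binary file-like object pandas can read.
            with zip_file.open(member) as handle:
                df_list.append(pd.read_csv(handle, dtype=object))

df = pd.concat(df_list, ignore_index=True)
```

Opening each member in its own with block closes the handle as soon as pandas has finished reading it.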
I have a set of files that do not have any extension. They are currently stored in a folder that is referenced by this variable "allFiles".
allFiles = glob.glob(base2 + "/*")
I am trying to add an extension to each of the files in allFiles, appending .csv to the file name. I do it using the code below:
for file in allFiles:
    os.rename(os.path.join(base2, file), os.path.join(base2, file+'.csv'))
Next, I try to combine these CSV files into one, as in the code below.
list_ = []
for file_ in allFiles:
    try:
        df = pd.read_csv(file_, index_col=None, header=None,delim_whitespace = True, error_bad_lines=False)
        list_.append(df)
    except pd.errors.EmptyDataError:
        continue
When I run the above code, I get an error stating that one of the files does not exist.
Error : FileNotFoundError: File b'/Users/base2/file1' does not exist
But file1 has now been renamed to file1.csv
Could anyone advise where I am going wrong in the above? Thanks.
Update:
allFiles = glob.glob(base2 + "/*")
print(allFiles)
list_ = []
print(list_)
allFiles = [x + '.csv' for x in allFiles]
print(allFiles)
for file_ in allFiles:
    try:
        df = pd.read_csv(file_, index_col=None, header=None)
        list_.append(df)
    except pd.errors.EmptyDataError:
        continue
Error : FileNotFoundError: File b'/Users/base2/file1.csv' does not exist
Before running your loop, do:
EDIT for clarity:
for file in allFiles:
    os.rename(os.path.join(base2, file), os.path.join(base2, file+'.csv'))
###What you're adding###
allFiles = [x+'.csv' for x in allFiles]
########################
for file_ in allFiles:
    try:
Basically, the problem is that you're changing the file names, but you're not changing the strings in your list to reflect the new file names. You can see this if you print allFiles. The above will make the necessary change for you.
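To keep the list and the disk from getting out of sync, the rename and the list update can happen in the same step. Here is a small sketch that uses a temporary directory in place of base2; the file names and contents are invented. It passes the paths returned by glob.glob straight to os.rename, on the assumption that they already include the directory.

```python
import glob
import os
import tempfile

# Temporary folder with two extension-less files standing in for base2.
base2 = tempfile.mkdtemp()
for name in ("file1", "file2"):
    with open(os.path.join(base2, name), "w") as fh:
        fh.write("1 2 3\n")

all_files = sorted(glob.glob(os.path.join(base2, "*")))

renamed = []
for old_path in all_files:
    new_path = old_path + ".csv"
    os.rename(old_path, new_path)  # change the name on disk
    renamed.append(new_path)       # and remember the new name

all_files = renamed  # the list now matches what is on disk
```

Running the rename a second time on the same folder would append .csv again, so a check such as skipping paths that already end in .csv may be worth adding.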