I am trying to merge different CSV files in Python. The files are in the same folder, and all of them share one column, 'client_ID'. I tried this code:
import glob
import pandas as pd

path = r'/folder_path/'
allfiles = glob.glob(path + "/*.csv")

df = pd.DataFrame()
for file in allfiles:
    df_file = pd.read_csv(file)
    df_file = pd.merge(df, df_file, on='client_ID')
df
You can read the first CSV file before the loop so that you don't start with an empty dataframe. I would edit your code like this:
path = r'/folder_path/'
allfiles = glob.glob(path + "/*.csv")

for i, file in enumerate(allfiles):
    if i < 1:
        df = pd.read_csv(file)  # the first file seeds the dataframe
    else:
        df_file = pd.read_csv(file)
        df = pd.merge(df, df_file, on='client_ID')
df
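If you'd rather not special-case the first file, a minimal alternative sketch using functools.reduce (assuming every CSV really does contain the 'client_ID' column) would be:

import glob
from functools import reduce

import pandas as pd

path = r'/folder_path/'
allfiles = glob.glob(path + "/*.csv")

# Read each file once, then fold the list into a single dataframe,
# merging every frame with the running result on 'client_ID'.
frames = [pd.read_csv(file) for file in allfiles]
df = reduce(lambda left, right: pd.merge(left, right, on='client_ID'), frames)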
How can I select the last rows in text files with a for loop?
This is my first attempt at the code:
import glob
import pandas as pd

path = input("Insert location:")
file_list = glob.glob(path + "/*.txt")

txt_list = []
for file in file_list:
    txt_list.append(pd.read_csv(file))
for file in file_list:
    txt_list[-7::3]
excl_merged = pd.concat(txt_list, ignore_index=True)
excl_merged.to_excel('Total.xlsx', index=False)
Your code is incorrect. Here is a version that should work:
import glob
import pandas as pd

path = input("Insert location:")
file_list = glob.glob(path + "/*.txt")

df_list = []
for file in file_list:
    df = pd.read_csv(file)
    df_list.append(df.tail(3))  # last 3 rows from each file's dataframe

excl_merged = pd.concat(df_list, ignore_index=True)
excl_merged.to_excel('Total.xlsx', index=False)
Explanation: the tail(n) method returns the last n rows of a dataframe, where n is provided as an argument.
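For example, a quick illustration of tail() on a toy dataframe:

import pandas as pd

df = pd.DataFrame({'value': [1, 2, 3, 4, 5]})
print(df.tail(3))
#    value
# 2      3
# 3      4
# 4      5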
I have 100 CSV files. I want to print particular columns from every CSV file along with the file name. With the code below I can print all of the CSV files:
import os
import glob
import pandas as pd

path = r'F:\11 semister\TPC_MEMBER'
all_files = glob.glob(path + "/*.csv")

dataStorage = {}
for filename in all_files:
    name = os.path.basename(filename).split(".csv")[0]
    dataStorage[name] = pd.read_csv(filename)
    print(name)
dataStorage
Maybe this is what you want:
import pandas as pd
import numpy as np
import glob

path = r'folderpath'  # provide the folder path where your CSV files are stored
all_csv = glob.glob(path + "/*.csv")

li = []
for filename in all_csv:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

data_frame = pd.concat(li, axis=0, ignore_index=True)
data_frame['columnname']  # enter the name of your dataframe's column
print(data_frame)
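Since the question also asks to keep the file name next to the columns, here is a hedged variation (the 'source_file' column is my own naming, not part of the original code) that tags each row with the file it came from:

import os
import glob
import pandas as pd

path = r'folderpath'
all_csv = glob.glob(path + "/*.csv")

frames = []
for filename in all_csv:
    df = pd.read_csv(filename)
    # Hypothetical 'source_file' column records which file each row came from.
    df['source_file'] = os.path.basename(filename)
    frames.append(df)

data_frame = pd.concat(frames, ignore_index=True)
print(data_frame[['source_file', 'columnname']])  # replace 'columnname' with your column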
I wrote a loop that reads every Excel file within a directory and concatenates them into one dataframe. It works:
excl_list = []
file_list = glob.glob(path + "/*.xls")
for file in file_list:
    excl_list.append(pd.read_excel(file))
excl_merged = pd.concat(excl_list, ignore_index=True)
And if I read in one file and pass in all the arguments I want, that also works. For example:
df = pd.read_excel('sample.xlsx', usecoles=['code','name','date','hours'],skiprows= [0,1,2,3,4,5,6,7])
But if I try to add those same arguments into my loop, it doesn't work. Any suggestions?
file_list = glob.glob(path + "/*.xls")
for file in file_list:
    excl_list.append(pd.read_excel(file, usecoles=['code','name','date','hours'], skiprows=[0,1,2,3,4,5,6,7]))
excl_merged = pd.concat(excl_list, ignore_index=True)
file_list = glob.glob(path + "/*.xls")
excl_merged = pd.concat(
    [pd.read_excel(file,
                   # usecols, not usecoles
                   usecols=['code','name','date','hours'],
                   skiprows=[0,1,2,3,4,5,6,7])
     for file in file_list],
    ignore_index=True,
)
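If the loop still fails after the spelling fix, a small untested debugging sketch (the try/except wrapper and the placeholder path are my additions, not part of the answer) can show which file is the culprit:

import glob
import pandas as pd

path = r'/your_folder'  # hypothetical; use your own directory
file_list = glob.glob(path + "/*.xls")

excl_list = []
for file in file_list:
    try:
        excl_list.append(pd.read_excel(file,
                                       usecols=['code', 'name', 'date', 'hours'],
                                       skiprows=[0, 1, 2, 3, 4, 5, 6, 7]))
    except Exception as exc:
        # Report the offending file instead of losing the whole loop.
        print(f"failed on {file}: {exc}")

excl_merged = pd.concat(excl_list, ignore_index=True)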
I am trying to parse a list of .txt files within a zip folder, but it's only parsing one file from that list.
Code:
def custom_parse(self, response):
    self.logger.info(response.url)
    links = response.xpath("//a[contains(@href, '.zip')]/@href").getall()
    for link in list(set(links)):
        print(link)
        local_path = self.download_file("https://www.sec.gov" + link)
        zip_file = zipfile.ZipFile(local_path)
        zip_csv_files = [file_name for file_name in zip_file.namelist() if file_name.endswith(".txt") and "pre" not in file_name]
        zip_csv_file = zip_csv_files[0]
        with zip_file.open(zip_csv_file, "r") as zip:
            # df = pd.read_csv(BytesIO(zip.read()), dtype=object)
            df = pd.read_csv(zip, dtype=object, header=None, sep='delimiter')
        df = self.standardized(df)
        for k, row in df.iterrows():
            yield dict(row)

def standardized(self, df):
    # df.columns = [col.lower().strip().replace(" ", "_") for col in df.columns]
    df = df.fillna('')
    return df
I assume it's due to zip_csv_file = zip_csv_files[0], but I am unsure how to modify my current code to parse all the .txt files in a given zip folder.
You already pull out all the .txt files with your list comprehension, so just read those in a loop and concatenate them. This is untested, but should be close; replace the appropriate section of your code with this:
UPDATE:
zip_file = zipfile.ZipFile(local_path)
text_files = zip_file.infolist()

df_list = []
for file_name in text_files:
    if file_name.filename.endswith(".txt") and "pre" not in file_name.filename:
        df_list.append(pd.read_csv(zip_file.open(file_name.filename),
                                   dtype=object, header=None, sep='delimiter'))
df = pd.concat(df_list)
df = self.standardized(df)
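For reference, a self-contained sketch of the same idea outside Scrapy (the archive name is hypothetical), reading every matching .txt member of a zip into one dataframe:

import zipfile
import pandas as pd

local_path = "archive.zip"  # hypothetical path to the downloaded zip
zip_file = zipfile.ZipFile(local_path)

df_list = []
for info in zip_file.infolist():
    if info.filename.endswith(".txt") and "pre" not in info.filename:
        # Open each member as a file-like object; sep='delimiter' keeps every
        # line in a single column, as in the question's read_csv call.
        with zip_file.open(info.filename) as member:
            df_list.append(pd.read_csv(member, dtype=object, header=None, sep='delimiter'))

df = pd.concat(df_list, ignore_index=True)
print(df.shape)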
This is what I have so far. I need to combine three files from my Google Drive into one. I do not get an error with this code, but it only imports one file.
import pandas as pd
import glob

path = '/content/gdrive/My Drive/Colab Datasets/'
all_files = glob.glob(path + "/*.csv")  # this is new

li = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    li.append(df)

frame = pd.concat(li, axis=0, ignore_index=True)
Simply follow this code:
import os
import glob
import pandas as pd

os.chdir("/content/gdrive/My Drive/Colab Datasets")

extension = 'csv'
all_filenames = [i for i in glob.glob('*.{}'.format(extension))]

# combine all files in the list
combined_csv = pd.concat([pd.read_csv(f) for f in all_filenames])
# export to csv
combined_csv.to_csv("combined_csv.csv", index=False, encoding='utf-8-sig')
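If you'd rather not change the working directory, a roughly equivalent sketch with pathlib (assuming Drive is already mounted at /content/gdrive) looks like this:

from pathlib import Path

import pandas as pd

# Glob against the folder directly instead of calling os.chdir.
folder = Path("/content/gdrive/My Drive/Colab Datasets")
all_files = sorted(folder.glob("*.csv"))

combined_csv = pd.concat([pd.read_csv(f) for f in all_files], ignore_index=True)
combined_csv.to_csv(folder / "combined_csv.csv", index=False, encoding='utf-8-sig')

Note that on a second run the freshly written combined_csv.csv would itself match the glob (the same caveat applies to the os.chdir version), so you may want to write the output to a different folder.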