Import multiple csv files using loops - python

I'm using google colab and I'm trying to import multiple csv files from google drive to the program.
I know how to import the datasets one by one but I'm not sure how to create a loop that reads in all of the csv files so that I can just have one line of code that imports all of the datasets for me.

You can create a dictionary with all dataframes like this:
from glob import glob
import pandas as pd
filepaths = glob('/content/drive/My Drive/location_of_the_files/*.csv')
dfs = {f'df{n}': pd.read_csv(i) for n, i in enumerate(filepaths)}
Individual dataframes can then be accessed with dfs['df0'], dfs['df1'], etc.

Related

How to automate the process of converting a list of .dat files, with their dictionaries (in separate .dct format), to pandas data frames?

The following code converts a .dat file into a data frame using its dictionary file in .dct format. It works well, but I was unable to automate this process: creating a loop that takes the pairs of these files from lists is a little bit tricky, at least for me. I could really use some help with that.
try:
    from statadict import parse_stata_dict
except ImportError:
    !pip install statadict
    from statadict import parse_stata_dict

import pandas as pd

dict_file = '2015_2017_FemPregSetup.dct'
data_file = '2015_2017_FemPregData.dat'

stata_dict = parse_stata_dict(dict_file)
nsfg = pd.read_fwf(data_file,
                   names=stata_dict.names,
                   colspecs=stata_dict.colspecs)
# nsfg is now a pandas DataFrame
These are the lists of files that I would like to convert into data frames. Every .dat file has its own dictionary file:
dat_name = ['2002FemResp.dat',
            '2002Male.dat'...
dct_name = ['2002FemResp.dct',
            '2002Male.dct'...
Assuming both lists have the same length and that you want to save each dataframe as a CSV, you could try:
for c, (dat, dct) in enumerate(zip(dat_name, dct_name), start=1):
    stata_dict = parse_stata_dict(dct)
    df = pd.read_fwf(dat, names=stata_dict.names, colspecs=stata_dict.colspecs)
    df.to_csv(r'path_name\file_name_{}.csv'.format(c))  # don't forget the '.csv'!
Also note that if you are not on Windows you need to use '/' rather than '\' in your path (or you can use os.path.join() to avoid this issue).
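For reference, a tiny sketch of what os.path.join does ('path_name' here is just the placeholder folder from the answer above): it inserts the separator that is correct for the OS the code runs on.

```python
import os

# Builds 'path_name/file_name_1.csv' on Unix, 'path_name\file_name_1.csv' on Windows.
for c in range(1, 3):
    print(os.path.join('path_name', 'file_name_{}.csv'.format(c)))
```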

I need to capture date from multiple csv filenames and add that date in each file as a new column using Python

I need to capture the date from multiple csv filenames and add that date in each file as a new column using Python. I have code that works well with Excel files, and I am trying to do exactly the same with CSV files. If someone could help me, that would be much appreciated.
Filenames are as following...
Scan_05-22-2021.csv
Scan_05-23-2021.csv
Scan_05-24-2021.csv and so on..
Excel code that works..
import os
from openpyexcel import load_workbook

path_to_xls = os.getcwd()  # or r'<path>'
for xls in os.listdir(r'C:\Python'):
    if xls.endswith(".csv") or xls.endswith(".xlsx"):
        f = load_workbook(filename=xls)
        sheet = f.active
        # Change here the name of the new column
        sheet.cell(row=1, column=25).value = "DateTest"
        for i in range(sheet.max_row - 1):
            # takes the date part of the filename and writes it into column 25
            sheet.cell(row=i + 2, column=25).value = xls.split('_')[1][:-5]
        f.save(xls)
        f.close()
You should be able to do this with pandas:
use pd.read_csv to load each file as a DataFrame,
use the iterrows method to go over the rows,
and simply append to the new file.
This cheatsheet could be of use.
Good luck!

How to import multiple csv files at once

I have 30 csv files of wind speed data on my computer; each file represents data at a different location. I have written code to calculate the statistics I need for each site; however, I am currently pulling in each csv file individually to do so (see code below):
from google.colab import files
data_to_load = files.upload()
import io
df = pd.read_csv(io.BytesIO(data_to_load['Downtown.csv']))
Is there a way to pull in all 30 csv files at once so each file is run through my statistical analysis code block and spits out an array with the file name and the statistic calculated?
Use a loop:
https://intellipaat.com/community/17913/import-multiple-csv-files-into-pandas-and-concatenate-into-one-dataframe
import glob
import pandas as pd

# get data file names
local_path = r'/my_files'
filenames = glob.glob(local_path + "/*.csv")
dfs = [pd.read_csv(filename) for filename in filenames]

# if needed, concatenate all data into one DataFrame
big_frame = pd.concat(dfs, ignore_index=True)
You can also try putting the data online (GitHub or Google Drive) and reading it from there:
https://towardsdatascience.com/3-ways-to-load-csv-files-into-colab-7c14fcbdcb92

How to extract the name of the file uploaded on a jupyter file using python?

My first question here.
I have been working with Python in a Jupyter notebook for a personal project. I use some code to dynamically allow users to select a csv file on which they wish to test my code. However, I am not sure how to extract the name of this file once it has been uploaded. The code goes on as follows:
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
import io
from google.colab import files
from scipy import stats

uploaded = files.upload()
df = pd.read_csv(io.BytesIO(uploaded['TestData.csv']))
df.head()
...
As you can see, after the upload I have to type the file's name manually into the code in order to read it. Is there a way to automatically capture the name of the file in a variable that I can then use when calling the pandas read function?
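For what it's worth, the dict that files.upload() returns is keyed by the uploaded filenames, so the name can be captured without typing it. A minimal sketch, assuming a single file is uploaded; the upload is simulated here with a literal dict so the snippet runs outside Colab (in Colab you would keep uploaded = files.upload()):

```python
import io
import pandas as pd

# files.upload() in Colab returns a dict mapping filename -> raw bytes.
# Simulated here so the snippet runs outside Colab:
uploaded = {'TestData.csv': b'a,b\n1,2\n'}

filename = next(iter(uploaded))  # capture the name automatically
df = pd.read_csv(io.BytesIO(uploaded[filename]))
```

With several uploads you would loop over uploaded.items() instead of taking just the first key.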

Renaming all the excel files as per the list in DataFrame in Python

I have approximately 300 files which are to be renamed as per the excel sheet mentioned below
The folder looks something like this :
I have tried writing the following code. I think a loop will be needed as well, but as it stands it is not able to rename even one file. Any clue how this can be corrected?
import os
import pandas as pd

os.path.abspath('C:\\Users\\Home\\Desktop')
master = pd.read_excel('C:\\Users\\Home\\Desktop\\Test_folder\\master.xlsx')
master['old'] = 'C:\\Users\\Home\\Desktop\\Test_folder\\' + master['oldname'] + '.xlsx'
master['new'] = 'C:\\Users\\Home\\Desktop\\Test_folder\\' + master['newname'] + '.xlsx'
newmaster = master[['old', 'new']]
os.rename(newmaster['old'], newmaster['new'])
Load stuff.
import os
import pandas as pd
master = pd.read_excel('C:\\Users\\Home\\Desktop\\Test_folder\\master.xlsx')
Set your current directory to the folder.
os.chdir('C:\\Users\\Home\\Desktop\\Test_folder\\')
Rename things one at a time. While it would be cool, os.rename is not designed to work with pandas.
for row in master.iterrows():
    oldname, newname = row[1]
    os.rename(oldname + '.xlsx', newname + '.xlsx')
Basically, you are passing two pandas Series into os.rename(), which expects two strings. Consider passing each row's values elementwise using apply(), and use the os-agnostic os.path.join to concatenate folder and file names:
import os
import pandas as pd

cd = r'C:\Users\Home\Desktop\Test_folder'
master = pd.read_excel(os.path.join(cd, 'master.xlsx'))

def change_names(row):
    os.rename(os.path.join(cd, row[0] + '.xlsx'),
              os.path.join(cd, row[1] + '.xlsx'))

master[['oldname', 'newname']].apply(change_names, axis=1)
