Python: error reading files into a data frame

I'm trying to import multiple CSV files from one folder into one data frame. This is my code. It can iterate through the files and print their names successfully, and it can read one file into a data frame, but combining them raises an error. I have seen many similar questions, but the responses are complex; I thought the 'pythonic' way was supposed to be simple, and I am new to this. Thanks in advance for any help. The error message is always: No such file or directory: 'some file name', which makes no sense because the file name was printed successfully in the print step.
import os
import pandas as pd

# this works
df = pd.read_csv("headlines/2017-1.csv")
print(df)

path = 'C:/.../... /.../headlines/'  # full path, shortened here
files = os.listdir(path)
print(files)  # <-- prints all file names successfully

for filename in files:
    print(filename)  # <-- successfully prints all file names
    df = pd.read_csv(filename)  # <-- error here
    df2.append(df)  # append to data frame

It seems like your current working directory is different from your path: os.listdir returns bare file names without the folder, so pd.read_csv looks for them in the current working directory. Please use
os.chdir(path) before attempting to read your CSVs.
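A minimal sketch of that fix, assuming the folder contains only CSV files. The pd.concat at the end also replaces the df2.append call, which would fail anyway because df2 is never defined and DataFrame.append does not modify a frame in place:

import os
import pandas as pd

path = 'C:/.../... /.../headlines/'  # same folder as above
os.chdir(path)  # read_csv now resolves bare file names against this folder

dfs = [pd.read_csv(filename) for filename in os.listdir(path)]
df2 = pd.concat(dfs, ignore_index=True)  # one combined data frame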

Related

Excel file not recognised

I have been trying to read some Excel files using pandas, but when I use a for loop to go through all the files I get an error.
I have checked that the file names are stored in week, and they are. In fact, if I try to read the files individually I can read them, but as soon as I use the for loop I get this error.
import pandas as pd
import os

week = [file for file in os.listdir("./data_excel")]
week_concatenated = pd.DataFrame()
for file in week:
    df = pd.read_excel('data_excel/'+file, 'DIGITAL_TASKS')
    week_concatenated = pd.concat([week_concatenated, df])
Well, there was actually a file inside the folder that the system created and I hadn't seen before. That's why the loop threw the error when it started: it was reading that file, and the file was not an xlsx.
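A small sketch of a guard against that, assuming everything you actually want to read ends in .xlsx: filter the listing by extension so stray system files are skipped:

week = [f for f in os.listdir("./data_excel") if f.endswith(".xlsx")]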

Read CSV files from multiple zip files, using one CSV as a template, in a loop

I have multiple zip files in a folder, and within the zip files are multiple CSV files.
Not all the CSV files have all the columns; only a few have all of them.
How can I use a file that has all the columns as a template, then loop over the rest to extract all the data into one dataframe and save it as one CSV for further use?
The code I am following right now is below:
import glob
import zipfile
import pandas as pd

dfs = []
for zip_file in glob.glob(r"C:\Users\harsh\Desktop\Temp\*.zip"):
    zf = zipfile.ZipFile(zip_file)
    dfs += [pd.read_csv(zf.open(f), sep=";", encoding='latin1') for f in zf.namelist()]
df = pd.concat(dfs, ignore_index=True)
print(df)
However, I am not getting the columns and headers at all. I am stuck at this stage.
If you'd like to know the file structure, please find the output of the code here and the example CSV file here.
If you would like to see my project files for this code, please find the shared Google Drive link here.
Also, at the risk of sounding redundant: why am I required to use the sep=";", encoding='latin1' part? The code gives me an error without it.
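For what it's worth, pd.concat already aligns columns by name and fills missing ones with NaN, so a template file is mainly useful for fixing the column set and order. A sketch under that assumption, where "template.csv" is a hypothetical stand-in for the one file known to contain every column, and every archive member is assumed to be a CSV:

import glob
import zipfile
import pandas as pd

# hypothetical: the CSV known to have all columns; read only its header row
template_cols = pd.read_csv("template.csv", sep=";", encoding="latin1", nrows=0).columns

dfs = []
for zip_file in glob.glob(r"C:\Users\harsh\Desktop\Temp\*.zip"):
    with zipfile.ZipFile(zip_file) as zf:
        for f in zf.namelist():
            part = pd.read_csv(zf.open(f), sep=";", encoding="latin1")
            dfs.append(part.reindex(columns=template_cols))  # align to the template

df = pd.concat(dfs, ignore_index=True)
df.to_csv("combined.csv", sep=";", index=False)

As for the last question: the files are presumably semicolon-delimited (common for European locales), so pandas must be told not to assume commas, and encoding='latin1' matches how their bytes are encoded.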

'EmptyDataError: No columns to parse from file' in Pandas when concatenating all files in a directory into single CSV

So I'm working on a project that analyzes Covid-19 data from this entire year. I have multiple CSV files in a given directory, and I am trying to merge the contents from each month into a single, comprehensive CSV file. Here's what I have so far, shown below. Specifically, the error message that appears is 'EmptyDataError: No columns to parse from file.' If I delete df = pd.read_csv('./csse_covid_19_daily_reports_us/' + file) and simply run print(file), it lists all the correct files that I am trying to merge. However, when trying to merge all the data into one I get that error message. What gives?
import pandas as pd
import os

df = pd.read_csv('./csse_covid_19_daily_reports_us/09-04-2020.csv')
files = [file for file in os.listdir('./csse_covid_19_daily_reports_us')]
all_data = pd.DataFrame()
for file in files:
    df = pd.read_csv('./csse_covid_19_daily_reports_us/' + file)
    all_data = pd.concat([all_data, df])
all_data.head()
Folks, I have resolved this issue. Instead of picking up every entry with files = [file for file in os.listdir('./csse_covid_19_daily_reports_us')], I used files = [f for f in os.listdir("./") if f.endswith('.csv')]. This filtered out some garbage files that were not .csv, allowing me to compile all the data into a single CSV.
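A compact sketch of the whole fixed loop, assuming the folder holds the daily-report CSVs plus the stray non-CSV files being filtered out:

import os
import pandas as pd

folder = './csse_covid_19_daily_reports_us'
files = [f for f in os.listdir(folder) if f.endswith('.csv')]  # skip non-CSV entries

# concatenate in one call instead of growing a DataFrame inside the loop
all_data = pd.concat(
    (pd.read_csv(os.path.join(folder, f)) for f in files),
    ignore_index=True,
)
print(all_data.head())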

Saving multiple Excel Files to a Specific Path with Unique Filenames

In a loop I adjust the CSV structure of each file.
Now I want to save them into the assigned folder with unique file names.
I can save to a CSV file, but then the CSV file gets overwritten, leaving only the final modified result of the test5 file. I want to save each CSV under its own filename plus a _modified suffix.
I have 5 csv files:
Test1.csv
test2.csv
test3.csv
test4.csv
test5.csv
I import them:
for x in allFiles:
    print(x)
    with open(x, 'r') as thecsv:
        base = os.path.basename(x)
        filename = os.path.splitext(base)[0]
        print(filename)
Now I loop through the files, manipulate them, and save the result as a DataFrame.
This is working fine.
Now I want to save each file separately in the output folder with a unique name (filename + _modified):
output = r'J:\Temp\Output'  # raw string so the backslashes are kept literally
This is what I tried:
df2.to_csv(output+filename+'//_modified.csv'),sep=';',header=False,index=False)
also tried:
df2.to_csv(output(os.path.join(name+'//_modified.csv'),sep=';',header=False,index=False)
Hoping the output folder will look like this:
test1_modified.csv
test2_modified.csv
test3_modified.csv
test4_modified.csv
test5_modified.csv
I would do something like this, making a new name before the call to write it out:
testFiles = ["test1.csv", "test2.csv", "test3.csv",
             "test4.csv", "test5.csv"]
# iterate over each one
for f in testFiles:
    # strip the old extension, replacing it with nothing
    f = f.replace(".csv", "")
    # I'd use join, but you can use + too
    newName = "".join([f, "_modified.csv"])
    print(newName)
    # make your call to write it out here
I would also check the pandas docs for writing out; it's simpler than what you're trying: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html
import pandas as pd
# read data
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
# write data to local
iris.to_csv("iris.csv")
I found the solution to my problem:
df.to_csv(output+'\\'+filename+'.csv', sep=';', header=False, index=False)
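An equivalent sketch using os.path.join, which sidesteps the backslash-escaping problem entirely (the _modified suffix from the question is included here):

import os

out_path = os.path.join(output, filename + '_modified.csv')
df.to_csv(out_path, sep=';', header=False, index=False)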

How to write script that edits every excel file inside a folder

I want to make a script that writes a certain text to the A1 cell of every Excel file inside a folder. I'm not sure how to get Python to open each file one by one, change A1, and then save over the original file.
import os
import openpyxl
os.chdir('C:/Users/jdal/Downloads/excelWorkdir')
folderList = os.listdir()
for file in in os.walk():
for name in file:
if name.endswith(".xlsx" and ".xls")
wb = openpyxl.load_workbook()
sheet = wb.get_sheet_by_name('Sheet1')
sheet['A1'] = 'input whatever here!'
sheet['A1'].value
wb.save()
I see the following errors in your code:
You have an error in .endswith; it should be
name.endswith((".xlsx", ".xls"))
i.e. it needs to be fed a tuple of allowed endings.
Your if lacks a : at the end, and your indentation seems to be broken.
You should pass one argument to .load_workbook and one argument to .save, i.e. the name of the file to read/write.
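Putting those fixes together, a corrected sketch, assuming each workbook has a sheet named 'Sheet1' (note that openpyxl only opens .xlsx-family files, so legacy .xls workbooks would need a different library):

import os
import openpyxl

os.chdir('C:/Users/jdal/Downloads/excelWorkdir')
for name in os.listdir():
    if name.endswith('.xlsx'):
        wb = openpyxl.load_workbook(name)  # pass the file name to read
        sheet = wb['Sheet1']
        sheet['A1'] = 'input whatever here!'
        wb.save(name)  # pass the file name to overwrite the original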
I would iterate through the folder and use pandas to read the files as temporary data frames. From there, they are easily manipulable.
Assuming you are in the relevant directory:
import pandas as pd
import os
files = os.listdir()
for i in range(len(files)):
    if files[i].endswith('.csv'):
        # Store file name for future replacement
        name = str(files[i])
        # Save file as dataframe and edit cell
        df = pd.read_csv(files[i])
        df.iloc[0, 0] = 'changed value'
        # Replace file with df
        df.to_csv(name, index=False)
