I have been trying to read some Excel files using pandas, but when I use a for loop to go through all the files I get an error.
I have checked that the file names are stored in week, and they are. In fact, if I try to read the files individually I can read them, but as soon as I use the for loop I get this error.
import pandas as pd
import os
week = [file for file in os.listdir("./data_excel")]
week_concatenated = pd.DataFrame()
for file in week:
    df = pd.read_excel('data_excel/' + file, sheet_name='DIGITAL_TASKS')
    week_concatenated = pd.concat([week_concatenated, df])
Well, there was actually a file inside the folder that the system had created and I hadn't seen before. That's why the loop was reading that file and throwing that error: the file was not an .xlsx.
I am trying to read a csv file in python, but I keep getting a "FileNotFoundError". No such file or directory
I have it written as:
file = open('case_study2.csv')
I used:
import os
os.getcwd()
to get the current directory for my python file which came back as:
runfile('/Users/natestewart/casestudy2/Task4_CaseStudy2', wdir='/Users/natestewart/casestudy2')
To read CSV files, one way is to use the pandas library. Note that the path should be the directory (the wdir above, not the script itself), and it must be joined to the file name rather than embedded in a literal string:
import pandas as pd
import os
path = '/Users/natestewart/casestudy2'
file = pd.read_csv(os.path.join(path, 'case_study2.csv'))
I'm trying to import multiple CSV files in one folder into one data frame. This is my code. It can iterate through the files and print them successfully, and it can read one file into a data frame, but combining them raises an error. I saw many similar questions, but the responses are complex; I thought the 'pythonic' way is to be simple, because I am new to this. Thanks in advance for any help. The error message is always: No such file or directory: 'some file name', which makes no sense because the file name was printed successfully in the print step.
import os
import pandas as pd

# this works
df = pd.read_csv("headlines/2017-1.csv")
print(df)

path = 'C:/.../... /.../headlines/'  # full path, shortened here
files = os.listdir(path)
print(files)  # prints all file names successfully

for filename in files:
    print(filename)  # successfully prints all file names
    df = pd.read_csv(filename)  # error here
    df2.append(df)  # append to data frame
It seems like your current working directory is different from your path. Use
os.chdir(path) before attempting to read your CSVs.
I have multiple zip files in a folder and within the zip files are multiple csv files.
Not all of the CSV files have every column; only a few have the full set.
How can I use the file that has all the columns as an example and then loop it to extract all the data into one dataframe and save it into one csv for further use?
The code I am following right now is as below:
import glob
import zipfile
import pandas as pd
dfs = []
for zip_file in glob.glob(r"C:\Users\harsh\Desktop\Temp\*.zip"):
    zf = zipfile.ZipFile(zip_file)
    dfs += [pd.read_csv(zf.open(f), sep=";", encoding='latin1') for f in zf.namelist()]

df = pd.concat(dfs, ignore_index=True)
print(df)
However, I am not getting the columns and headers at all. I am stuck at this stage.
If you'd like to know the file structure, please find the output of the code here and the example CSV file here. If you would like to see my project files for this code, please find the shared Google Drive link here.
Also, at the risk of sounding redundant: why am I required to use the sep=";", encoding='latin1' part? The code gives me an error without it.
I am coding a python program to read a data set file writing this line:
df = pd.read_csv(r'C:\Users\user118\Desktop\StudentsPerformance.csv')
This line works, but I have to upload this project as an assignment, so the computer path must be changed. I thought about putting the CSV file in the project folder; I did, and wrote this line:
df = pd.read_csv("StudentsPerformance.csv")
but it gave me an error saying that the file isn't found. Where in the project folder should I put the file? Or what should I do?
To read your CSV file with a df = pd.read_csv("StudentsPerformance.csv") line, you should put it right next to the executing .py file.
To do that, you can first read your CSV file using the full path:
df = pd.read_csv(r'C:\Users\user118\Desktop\StudentsPerformance.csv')
and then write df.to_csv('StudentsPerformance.csv', index=False)
After that you would be able to read your CSV as you wanted, using
df = pd.read_csv('StudentsPerformance.csv')
I want to make a script that writes a certain text to the A1 cell of every Excel file inside a folder. I'm not sure how to get Python to open every file one by one, make changes to A1, and then save over the original file.
import os
import openpyxl
os.chdir('C:/Users/jdal/Downloads/excelWorkdir')
folderList = os.listdir()
for file in in os.walk():
for name in file:
if name.endswith(".xlsx" and ".xls")
wb = openpyxl.load_workbook()
sheet = wb.get_sheet_by_name('Sheet1')
sheet['A1'] = 'input whatever here!'
sheet['A1'].value
wb.save()
I see the following errors in your code:
You have an error in .endswith; it should be
name.endswith((".xlsx", ".xls"))
i.e. it needs to be fed a tuple of allowed endings.
Your if lacks a : at the end, and your indentation seems to be broken. Also, for file in in has a doubled in, and os.walk() requires a path argument.
You should pass one argument to .load_workbook and one argument to .save, i.e. the name of the file to read/write.
I would iterate through the folder and use pandas to read the files as temporary data frames. From there, they are easily manipulable.
Assuming you are in the relevant directory:
import pandas as pd
import os
files = os.listdir()
for i in range(len(files)):
    if files[i].endswith('.csv'):
        # Store file name for future replacement
        name = str(files[i])
        # Save file as dataframe and edit cell
        df = pd.read_csv(files[i])
        df.iloc[0, 0] = 'changed value'
        # Replace file with df
        df.to_csv(name, index=False)