Read CSV from zipped file [duplicate]


This question already has answers here:
reading multiple files contained in a zip file with pandas
(3 answers)
Closed 2 years ago.
I want to read all the CSV files from a zipped file, but the archive contains more than one CSV. The URL of the zip file is in the code below. Any answer would be appreciated.
I have tried the following, but I get an error saying that there is more than one CSV file in the zip file.
df = pd.read_csv('http://dados.cvm.gov.br/dados/FIDC/DOC/INF_MENSAL/DADOS/inf_mensal_fidc_202006.zip', compression='zip', header=1, sep=';', quotechar='"')
print(df)

import os
import urllib.request
import zipfile
import pandas as pd
# first, download the zip file
url = 'http://dados.cvm.gov.br/dados/FIDC/DOC/INF_MENSAL/DADOS/inf_mensal_fidc_202006.zip'
file_name = "dados.zip"
path = os.getcwd()
destination = os.path.join(path, file_name)
urllib.request.urlretrieve(url, destination)
# second, extract everything from the zip into your destination folder
directory_name = "dados_folder"
zip_destination = os.path.join(path, directory_name)
os.makedirs(zip_destination, exist_ok=True)  # does not fail if the folder already exists
with zipfile.ZipFile(destination, 'r') as zip_ref:
    zip_ref.extractall(zip_destination)
# now read each CSV one by one into its own dataframe
roman_numerals = ["I", "II", "III", "IV", "IX", "V", "VI", "VII", "X_1", "X_2", "X_3", "X_4", "X_5", "X_6", "X_7", "X_1_1"]
dfs = {}  # maps each table suffix to its dataframe
for x in roman_numerals:
    # the CSVs were extracted into zip_destination, so read them from there
    name_csv = os.path.join(zip_destination, "inf_mensal_fidc_tab_" + x + "_202006.csv")
    # no need to open() the file first; pd.read_csv opens and closes it itself
    dfs[x] = pd.read_csv(name_csv, sep=';')

Related

read csv in a for loop using pandas

inp_file=os.getcwd()
files_comp = pd.read_csv(inp_file,"B00234*.csv", na_values = missing_values, nrows=10)
for f in files_comp:
    df_calculated = pd.read_csv(f, na_values = missing_values, nrows=10)
    col_length=len(df.columns)-1
Hi folks, how can I read 4 CSV files in a for loop? I am getting an error when reading the CSV files with the code above. Kindly help me.

You basically need this:
Get a list of all target files: use files = os.listdir(path), then keep only the file names that start with your pattern and end with .csv.
You could also do the matching with regular expressions (the re module) for more sophistication, or use glob.glob.
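With glob.glob, the filtering and the path joining happen in one step. A minimal sketch, assuming the files live directly in `folder` and that the pattern is "B00234*.csv" as in the question; `read_matching_csvs` is a hypothetical helper name. Note that glob matching is case-sensitive on Linux and macOS but not on Windows.

```python
import glob
import os

import pandas as pd


def read_matching_csvs(folder, pattern="B00234*.csv"):
    """Read every file in `folder` whose name matches `pattern` into a DataFrame."""
    # glob returns paths that include `folder`, so pd.read_csv finds the
    # files regardless of the current working directory
    paths = sorted(glob.glob(os.path.join(folder, pattern)))
    return [pd.read_csv(p) for p in paths]
```

Sorting the paths is optional; it just makes the order of the returned list predictable, which os.listdir and glob do not guarantee.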
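import os
import pandas as pd
path = "."  # folder that contains the CSV files (adjust as needed)
filenames = os.listdir(path)
filenames = [f for f in filenames if (f.startswith("B00234") and f.lower().endswith(".csv"))]
Read in files using a for loop (join each name with path, because os.listdir returns bare file names):
dfs = list()
for filename in filenames:
    df = pd.read_csv(os.path.join(path, filename))
    dfs.append(df)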
Complete Example
We will first make some dummy data and save it to several .csv and .txt files. Some of these files have names beginning with "B00234" and some do not. Then we selectively read only the matching .csv files into a list of dataframes, dfs.
import os
import shutil
import numpy as np
import pandas as pd
from IPython.display import display
# Define temporary output folder
path = './temp_output'
# Clean temporary output folder
reset = True
if os.path.exists(path) and reset:
    shutil.rmtree(path, ignore_errors=True)
# Create content
df0 = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]),
                   columns=['a', 'b', 'c'])
display(df0)
# Make path
if not os.path.exists(path):
    os.makedirs(path)
else:
    print('Path Exists: {}'.format(path))
# Make filenames
filenames = list()
for i in range(10):
    if i < 5:
        # Create files starting with "B00234"
        filenames.append("B00234_{}.csv".format(i))
        filenames.append("B00234_{}.txt".format(i))
    else:
        # Create files starting with "B00678"
        filenames.append("B00678_{}.csv".format(i))
        filenames.append("B00678_{}.txt".format(i))
# Create files with extensions .csv and .txt,
# with names starting with and without "B00234"
for filename in filenames:
    fpath = os.path.join(path, filename)
    if filename.lower().endswith(".csv"):
        df0.to_csv(fpath, index=False)
    else:
        with open(fpath, 'w') as f:
            f.write(df0.to_string())
# Get list of target files (os.listdir returns them in arbitrary order)
files = os.listdir(path)
files = [f for f in files if (f.startswith("B00234") and f.lower().endswith(".csv"))]
print('\nList of target files: \n\t{}\n'.format(files))
# Read each csv file into a dataframe
dfs = list()  # a list of dataframes
for csvfile in files:
    fpath = os.path.join(path, csvfile)
    print("Reading file: {}".format(csvfile))
    df = pd.read_csv(fpath)
    dfs.append(df)
The list dfs should have five elements, each a dataframe read from one of the files.
Output:
   a  b  c
0  1  2  3
1  4  5  6
2  7  8  9
List of target files:
['B00234_3.csv', 'B00234_4.csv', 'B00234_0.csv', 'B00234_2.csv', 'B00234_1.csv']
Reading file: B00234_3.csv
Reading file: B00234_4.csv
Reading file: B00234_0.csv
Reading file: B00234_2.csv
Reading file: B00234_1.csv

Problems reading CSV files via numpy and pandas

I have a folder at a certain path and I want to go through all the subfolders within it. Each subfolder contains multiple files, of which I only want one specific ".csv" file. I succeed in reading the different folders and selecting the correct file, but when I try to open it (both with pandas and numpy), I get an IOError stating that the corresponding file doesn't exist.
import pandas as pd
import numpy as np
path = "some_path"
data_list = os.listdir(path)
array = []
for file in data_list:
    if file.endswith("name_requirement"):
        array.append(function(file))
file_filter = "second_name_requirement"
def function(second_path):
    file_list = os.listdir(path + "\\" + str(second_path))
    average = 0
    for file in file_list:
        if str(file)[-20:-4] == file_filter:
            measurements = pd.read_csv(file, delimiter = ";", skiprows = 1, usecols = [1])
            # measurements = np.loadtxt(file, delimiter =";", skiprows =1)
            .....
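A likely cause, given the code above: os.listdir returns bare file names without the folder they came from, so pd.read_csv(file) and np.loadtxt(file) look for the file in the current working directory. A minimal sketch that keeps the full path, reusing the question's folder-suffix test and its [-20:-4] name slice; `collect_csv_paths` is a hypothetical helper name, and the suffix and filter values are placeholders like the question's own.

```python
import os


def collect_csv_paths(path, folder_suffix, file_filter):
    """Return full paths of files matching `file_filter` inside each
    subfolder of `path` whose name ends with `folder_suffix`."""
    matches = []
    for folder in os.listdir(path):
        folder_path = os.path.join(path, folder)
        # only descend into subfolders that meet the first name requirement
        if not (os.path.isdir(folder_path) and folder.endswith(folder_suffix)):
            continue
        for file in os.listdir(folder_path):
            # same test as the question: characters [-20:-4] of the name
            if file[-20:-4] == file_filter:
                # join the subfolder path with the bare name from listdir
                matches.append(os.path.join(folder_path, file))
    return matches
```

Each returned path can then be passed to pd.read_csv(p, delimiter=";", skiprows=1, usecols=[1]) or np.loadtxt as in the question.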

FileNotFoundError: [duplicate]

This question already has answers here:
How to do a recursive sub-folder search and return files in a list?
(13 answers)
Closed 7 months ago.
There are a lot of CSV files (Excel sheets) in a folder. Each sheet contains data only in the first 3 columns. I want to select the corresponding CSV file from the list and then plot it. Here is the code:
import os
path = "F:\\Users\\Desktop\\Data\\Summary"
files = []
folder_data = os.listdir(path)
folder_data = [i+"\\" for i in folder_data]
# r=root, d=directories, f = files
for r, d, f in os.walk(path):
    for file in f:
        if '.csv' in file:
            files.append(file)
for i, f in enumerate(files):
    print(( i,f))
print('\n'.join(f'{i}-{v}' for i,v in enumerate(files)))
csv_code = str(int(input("Enter corresponding code to plot: ")))
csv_path = path + "\\" + folder_data[csv_code]
df = pd.read_csv(csv_path, header=None)
df1 = df.iloc[:,0:2]
plt.plot(df1[0], df1[1])
When I run the code, I want the output to be displayed as follows (that is, I want all CSV files in the folder to be listed so that I can select the one I want):
0-Test_Summary_1.csv
1-Test_Summary_2.csv
2-Test_Summary_3.csv
3-Test_Summary_4.csv
4-Test_Summary_5.csv
5-Test_Summary_6.csv etc
The error I am getting is:
FileNotFoundError:
I get this FileNotFoundError even though the CSV file is in the folder.

If I understand your question correctly, then you could try something like this:
import os
import pandas as pd
# see this answer about absolute paths in windows
# https://stackoverflow.com/a/7767925/9225671
base_path = os.path.join('f:', os.sep, 'Users', 'Desktop', 'Data', 'Summary')
# collect all CSV files in 'base_path' and its subfolders
csv_file_list = []
for dir_path, _, file_name_list in os.walk(base_path):
    for file_name in file_name_list:
        if file_name.endswith('.csv'):
            # add full path to the list, not just 'file_name'
            csv_file_list.append(
                os.path.join(dir_path, file_name))
print('CSV files that were found:')
for i, file_path in enumerate(csv_file_list):
    print('  {:3d} {}'.format(i, file_path))
selected_i = int(input('Enter corresponding number of the file to plot: '))
selected_file_path = csv_file_list[selected_i]
print('selected_file_path:', selected_file_path)
df = pd.read_csv(selected_file_path, header=None)
...
Does this work for you?
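The same recursive search can also be written with pathlib, if you prefer it to os.walk. A minimal sketch, assuming the files use a lowercase ".csv" extension as in the answer's endswith check; `list_csv_files` and `print_choices` are hypothetical helper names.

```python
from pathlib import Path


def list_csv_files(base_path):
    """Return the full paths of all .csv files under base_path, including subfolders."""
    # rglob("*.csv") searches base_path and every folder below it
    return sorted(p for p in Path(base_path).rglob("*.csv") if p.is_file())


def print_choices(csv_files):
    """Print each file with the index to type, e.g. '0-Test_Summary_1.csv'."""
    for i, p in enumerate(csv_files):
        print(f"{i}-{p.name}")
```

Because the list holds full paths, csv_files[selected_i] can be passed straight to pd.read_csv(..., header=None), just like selected_file_path above.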

Plot graph from the bunch of selected excel sheet

There are a lot of CSV files (Excel sheets) in a folder. Each sheet contains data only in the first 3 columns. I want to select the corresponding CSV file from the list and then plot it. Here is the code:
import os
path = "F:\\Users\\Desktop\\Data\\Summary"
files = []
test_folders = os.listdir(path)
folder_data = os.listdir(path)
# r=root, d=directories, f = files
for r, d, f in os.walk(path):
    for file in f:
        if '.csv' in file:
            files.append(file)
for i, f in enumerate(files):
    print("%d-%s"%( i,f))
csv_code = int(input("Enter corresponding code to plot: "))
csv_path = folder_data + "\\" + folder_data[csv_code]
df = pd.read_csv(csv_path, header=None)
df1 = df.iloc[:,0:2]
plt.plot(df1[0], df1[1])
When I run the code, I want the output to be displayed as follows (that is, I want all CSV files in the folder to be listed so that I can select the one I want):
0-Test_Summary_1.csv
1-Test_Summary_2.csv
2-Test_Summary_3.csv
3-Test_Summary_4.csv
4-Test_Summary_5.csv
5-Test_Summary_6.csv etc
so that I can select the corresponding code, like 1, 2 or 3, to plot.
Here is the error:
csv_path = folder_data + "\\" + folder_data[csv_code]
TypeError: can only concatenate list (not "str") to list

folder_data is not a defined variable in the code you have provided? By extension, folder_data[csv_code] isn't either.
Is the line you're receiving an error on supposed to be:
csv_path = path + "\\" + csv_code

Regarding your actual error message:
folder_data is a list, and you are trying to add the string "\\" to it, which does not work. If you want to append "\\" to each element in the list, you would have to do the following: folder_data = [i+"\\" for i in folder_data]. What you probably want instead is path + "\\" + folder_data[csv_code], which gives the full path to a single CSV file.
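A small sketch reproducing the error and both fixes described above. The two file names are sample values taken from the output the question wants to print; the path is the question's. Keep in mind that the printed codes in the question come from enumerate(files), not from folder_data, so indexing the same list you printed is probably safer, and os.path.join(path, name) is a more portable way to build the path than adding "\\".

```python
# sample names taken from the listing the question wants to print
folder_data = ["Test_Summary_1.csv", "Test_Summary_2.csv"]

# list + str raises the TypeError from the question
try:
    _ = folder_data + "\\"
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print("TypeError:", error_message)

# Option 1: append "\\" to every element with a list comprehension
folder_data_with_sep = [i + "\\" for i in folder_data]

# Option 2: index into the list first, then concatenate strings only
path = "F:\\Users\\Desktop\\Data\\Summary"
csv_code = 1
csv_path = path + "\\" + folder_data[csv_code]
print(csv_path)
```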

Delete the first column from a csv file in Python [duplicate]

This question already has an answer here:
Delete specific columns from csv file in python3
(1 answer)
Closed 4 years ago.
I have the following code in Python to delete the first row from csv files in the folder in_folder and then save them in the folder out_folder. Now I need to delete the first column of a csv file. Does anyone know how to do this?
import csv
import glob
import os
import shutil
path = 'in_folder/*.csv'
files=glob.glob(path)
#Read every file in the directory
x = 0 #counter
for filename in files:
    with open(filename, 'r') as fin:
        data = fin.read().splitlines(True)
    with open(filename, 'w') as fout:
        fout.writelines(data[1:])
    x+=1
print(x)
dir_src = "in_folder"
dir_dst = "out_folder"
for file in os.listdir(dir_src):
    if x>0:
        src_file = os.path.join(dir_src, file)
        dst_file = os.path.join(dir_dst, file)
        shutil.move(src_file, dst_file)

You can use pandas, since it is built for this kind of DataFrame manipulation.
file.csv
1,2,3,4,5
1,2,3,4,5
1,2,3,4,5
Your code should look like
import pandas as pd
df = pd.read_csv('file.csv')
# If you know the name of the column skip this
first_column = df.columns[0]
# Delete the first column
df = df.drop([first_column], axis=1)
df.to_csv('file.csv', index=False)
file.csv
2,3,4,5
2,3,4,5
2,3,4,5
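Since the question's code already imports the csv module, the same result can be had without pandas. A minimal sketch, assuming a comma-separated file and that writing to a separate output file (for example one in out_folder) is acceptable; `drop_first_column` is a hypothetical helper name.

```python
import csv


def drop_first_column(src, dst):
    """Copy the CSV at `src` to `dst`, leaving out the first column of every row."""
    # newline="" is what the csv module expects for both reading and writing
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.reader(fin)
        writer = csv.writer(fout)
        for row in reader:
            # row[1:] is every field except the first; empty rows stay empty
            writer.writerow(row[1:])
```

Unlike pd.read_csv with its default header=0, this treats every line the same way, so it makes no assumption about whether the first row is a header.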
