There are a lot of CSV files in a folder. Each file contains data only in its first 3 columns. I want to select one CSV file from the list and then plot it. Here is the code:
import os
import pandas as pd
import matplotlib.pyplot as plt

path = "F:\\Users\\Desktop\\Data\\Summary"
files = []
test_folders = os.listdir(path)
folder_data = os.listdir(path)
# r=root, d=directories, f = files
for r, d, f in os.walk(path):
    for file in f:
        if '.csv' in file:
            files.append(file)
for i, f in enumerate(files):
    print("%d-%s"%( i,f))
csv_code = int(input("Enter corresponding code to plot: "))
csv_path = folder_data + "\\" + folder_data[csv_code]
df = pd.read_csv(csv_path, header=None)
df1 = df.iloc[:,0:2]
plt.plot(df1[0], df1[1])
When I run the code, I want the output to be displayed as follows (that is, I want all CSV files in the folder to be listed so that I can select the one I want):
0-Test_Summary_1.csv
1-Test_Summary_2.csv
2-Test_Summary_3.csv
3-Test_Summary_4.csv
4-Test_Summary_5.csv
5-Test_Summary_6.csv etc
so that I can select the corresponding code, like 1, 2, or 3, to plot.
Here is the error:
csv_path = folder_data + "\\" + folder_data[csv_code]
TypeError: can only concatenate list (not "str") to list
folder_data is not a defined variable in the code you have provided? By extension, folder_data[csv_code] isn't either.
Is the line you're receiving an error on supposed to be:
csv_path = path + "\\" + csv_code
Regarding your actual error message:
folder_data is a list, and you are trying to add the string "\\" to it, which does not work. If you want to append "\\" to each element in the list, you would have to do the following: folder_data = [i+"\\" for i in folder_data]. What you probably want instead is path + "\\" + folder_data[csv_code], which gives the full path to a single CSV file.
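As a minimal sketch of that last suggestion, the snippet below picks one name out of the list before joining it to the folder. The helper name build_csv_path and the sample file names are assumptions made up for illustration; os.path.join is used in place of manual "\\" concatenation.

```python
import os


def build_csv_path(folder, names, index):
    """Return the full path of names[index] inside folder (hypothetical helper)."""
    # names is a list, so select ONE element (a str) before joining.
    # Writing names + "\\" + names[index] would try to add a str to a list
    # and raise TypeError.
    return os.path.join(folder, names[index])


# Sample data standing in for os.listdir(path); these names are assumptions.
names = ["Test_Summary_1.csv", "Test_Summary_2.csv"]
full_path = build_csv_path("Summary", names, 1)
print(full_path)
```

os.path.join inserts the separator that is right for the current operating system, so the same code works unchanged on Windows and elsewhere.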
Related
I'm trying to merge my 119 CSV files into one file with Python. The only issue is that, even though I apply sorted(), the files are not processed in numeric order, which leaves the date column unordered. The code is below. When I run it and open the new file "call.sms.merged", the data from the 1st CSV file is followed by the data from the 10th, then the 100th through the 109th, and only then the 11th. I'm attaching images for better understanding.
import os
import pandas as pd

file_path = "C:\\Users\\PROJECT\\Data Set\\SMS Data\\"
file_list = [file_path + f for f in os.listdir(file_path) if f.startswith('call. sms ')]
csv_list = []
for file in sorted(file_list):
    csv_list.append(pd.read_csv(file).assign(File_Name = os.path.basename(file)))
csv_merged = pd.concat(csv_list, ignore_index=True)
csv_merged.to_csv(file_path + 'calls.sms.merged.csv', index=False)
[Images: unsorted data showing the incorrect order of the CSV files; screenshots of the Python code and of the error]
You can extract the number of each call/file with pandas.Series.str.extract then use pandas.DataFrame.sort_values to make an ascending sort along this column/number.
Try this:
import os
import pandas as pd

file_path = "C:\\Users\\PROJECT\\Data Set\\SMS Data\\"
file_list = [file_path + f for f in os.listdir(file_path) if f.startswith('call. sms ')]
csv_list = []
for file in file_list:
    csv_list.append(pd.read_csv(file).assign(File_Name = os.path.basename(file)))
csv_merged = (
    pd.concat(csv_list, ignore_index=True)
    # raw string, so that \d reaches the regex engine unchanged
    .assign(num_call= lambda x: x["File_Name"].str.extract(r"(\d{1,})", expand=False).astype(int))
    # mergesort is stable, so rows keep their original order within each file
    .sort_values(by="num_call", ignore_index=True, kind="mergesort")
    .drop(columns= "num_call")
)
csv_merged.to_csv(file_path + 'calls.sms.merged.csv', index=False)
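An alternative is to sort the file names numerically before reading them, so that no helper column is needed. The sketch below uses only the standard library; the function name file_number and the sample names (modelled on the 'call. sms ' prefix from the question) are assumptions for illustration.

```python
import re


def file_number(name):
    """Return the first run of digits in name as an int, or -1 if there is none."""
    match = re.search(r"\d+", name)
    return int(match.group()) if match else -1


# Hypothetical file names following the 'call. sms ' prefix from the question.
names = ["call. sms 10.csv", "call. sms 2.csv", "call. sms 1.csv", "call. sms 100.csv"]

plain = sorted(names)                      # text order: 1, 10, 100, 2
ordered = sorted(names, key=file_number)   # numeric order: 1, 2, 10, 100
print(plain)
print(ordered)
```

Plain sorted() compares the names character by character, which is why file 10 lands before file 2. Passing key=file_number to sorted(file_list) in the loop would make it compare the numbers instead.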
I am developing code that processes multiple .csv files in a for loop and then extracts (into new .csv files) only the rows that have non-empty string cells in a specific column named "20210-2.0". The non-empty string cells are named the same (i.e. 20210-2.0). Here is a screenshot showing part of the CSV file:
https://uoe-my.sharepoint.com/:i:/g/personal/gpapanas_ed_ac_uk/EayBblFTHmVJvRfsB6h8Vr4B09IfjQ2L1I5OQKUN2p5wzw?e=2gXW61
import pandas as pd
import glob
import os

path = './'
all_files = glob.glob(path + "/*.csv")
li = []
for filename in all_files:
    df = pd.read_csv(filename, index_col=None)
    li.append(df)
df = li[li['20201-2.0'].notnull()]
print('extracting info from cvs...')
print(df)
# You can now export all outcomes in new csv files
file_name = filename + 'new' + '.csv'
save_path = os.path.abspath(
    os.path.join(
        path, file_name
    )
)
print('saving ...')
export_csv = df.to_csv(save_path, index=None)
I get the following error:
df = li[li['20201-2.0'].notnull()]
TypeError: list indices must be integers or slices, not str
Inside your loop, filter each DataFrame right after you read the file and before you store it in the list. li is a plain Python list, so it cannot be indexed with a column name; only the individual DataFrames can.
# Read the file stored under the variable filename. header=0 tells pandas that the first row holds the column names. Your '20201-2.0' value is a column name, right?
df = pd.read_csv(filename, index_col=None, header=0)
# Keep only the rows in which the column named '20201-2.0' is populated.
df = df[df['20201-2.0'].notnull()]
# Store the filtered DataFrame in the list called `li`.
li.append(df)
I also noticed that when you save the new CSV file, you append the strings "new" and ".csv" to each filename, which already ends in ".csv".
Have you run this code? Does it not save your file as "something.csvnew.csv"?
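One way to avoid the doubled extension is to split the extension off before adding the marker. This is a minimal sketch; the helper name filtered_name and the "_new" suffix are assumptions, not part of the original code.

```python
import os


def filtered_name(filename, suffix="_new"):
    """Insert suffix between the base name and the extension (hypothetical helper)."""
    root, ext = os.path.splitext(filename)  # "something.csv" -> ("something", ".csv")
    return root + suffix + ext


print(filtered_name("something.csv"))
print(filtered_name("./data.csv"))
```

Inside the loop, file_name = filtered_name(filename) could then replace filename + 'new' + '.csv'.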
This question already has answers here:
How to do a recursive sub-folder search and return files in a list?
(13 answers)
Closed 7 months ago.
There are a lot of CSV files in a folder. Each file contains data only in its first 3 columns. I want to select one CSV file from the list and then plot it. Here is the code:
import os
import pandas as pd
import matplotlib.pyplot as plt

path = "F:\\Users\\Desktop\\Data\\Summary"
files = []
folder_data = os.listdir(path)
folder_data = [i+"\\" for i in folder_data]
# r=root, d=directories, f = files
for r, d, f in os.walk(path):
    for file in f:
        if '.csv' in file:
            files.append(file)
for i, f in enumerate(files):
    print(( i,f))
print('\n'.join(f'{i}-{v}' for i,v in enumerate(files)))
csv_code = str(int(input("Enter corresponding code to plot: ")))
csv_path = path + "\\" + folder_data[csv_code]
df = pd.read_csv(csv_path, header=None)
df1 = df.iloc[:,0:2]
plt.plot(df1[0], df1[1])
When I run the code, I want the output to be displayed as follows (that is, I want all CSV files in the folder to be listed so that I can select the one I want):
0-Test_Summary_1.csv
1-Test_Summary_2.csv
2-Test_Summary_3.csv
3-Test_Summary_4.csv
4-Test_Summary_5.csv
5-Test_Summary_6.csv etc
The error I am getting is:
FileNotFoundError:
Even though the CSV file is in the folder, I get a FileNotFoundError.
If I understand your question correctly, then you could try something like this:
import os
import pandas as pd

# see this answer about absolute paths in windows
# https://stackoverflow.com/a/7767925/9225671
base_path = os.path.join('f:', os.sep, 'Users', 'Desktop', 'Data', 'Summary')

# collect all CSV files in 'base_path' and its subfolders
csv_file_list = []
for dir_path, _, file_name_list in os.walk(base_path):
    for file_name in file_name_list:
        if file_name.endswith('.csv'):
            # add full path to the list, not just 'file_name'
            csv_file_list.append(
                os.path.join(dir_path, file_name))

print('CSV files that were found:')
for i, file_path in enumerate(csv_file_list):
    print(' {:3d} {}'.format(i, file_path))

selected_i = int(input('Enter corresponding number of the file to plot: '))
selected_file_path = csv_file_list[selected_i]
print('selected_file_path:', selected_file_path)

df = pd.read_csv(selected_file_path, header=None)
...
Does this work for you?
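The same recursive collection can also be sketched with pathlib instead of os.walk. This is an alternative under assumptions, not the approach used above: the function name find_csv_files is made up, and the temporary folder only stands in for the real Summary directory so the snippet runs anywhere.

```python
import tempfile
from pathlib import Path


def find_csv_files(base_dir):
    """Return the full paths of every .csv file under base_dir, searched recursively."""
    return sorted(p for p in Path(base_dir).rglob("*.csv") if p.is_file())


# Build a small throwaway folder tree so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "sub").mkdir()
    (base / "Test_Summary_1.csv").write_text("1,2,3\n")
    (base / "sub" / "Test_Summary_2.csv").write_text("4,5,6\n")
    (base / "notes.txt").write_text("ignore me\n")
    found = [p.relative_to(base).as_posix() for p in find_csv_files(base)]

print(found)
```

Because every entry is a full path, the selected item can be passed straight to pd.read_csv, which avoids the FileNotFoundError that appears when only the bare file name is kept.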
There are a lot of CSV files in a folder. Each file contains data only in its first 3 columns. I want to select one CSV file from the list and then plot it. Here is the code:
import os
import pandas as pd
import matplotlib.pyplot as plt

path = "F:\\Users\\Desktop\\Data\\Summary"
files = []
folder_data = os.listdir(path)
folder_data = [i+"\\" for i in folder_data]
# r=root, d=directories, f = files
for r, d, f in os.walk(path):
    for file in f:
        if '.csv' in file:
            files.append(file)
for i, f in enumerate(files):
    print(( i,f))
print('\n'.join(f'{i}-{v}' for i,v in enumerate(files)))
csv_code = str(int(input("Enter corresponding code to plot: ")))
csv_path = path + "\\" + folder_data[csv_code]
df = pd.read_csv(csv_path, header=None)
df1 = df.iloc[:,0:2]
plt.plot(df1[0], df1[1])
Currently, only 1 CSV file is displayed when I run the code. I want the output to be displayed as follows (that is, I want all CSV files in the folder to be listed so that I can select the one I want):
0-Test_Summary_1.csv
1-Test_Summary_2.csv
2-Test_Summary_3.csv
3-Test_Summary_4.csv
4-Test_Summary_5.csv
5-Test_Summary_6.csv etc
so that I can select the corresponding code, like 1, 2, or 3, to plot. This is the error I am getting:
csv_path = path + "\\" + folder_data[csv_code]
TypeError: list indices must be integers or slices, not str
Your input call shouldn't be inside the for loop, because you want to print out all the items first. You can fix it as follows:
print('\n'.join(f'{i}-{v}' for i,v in enumerate(files))) # a shorter version of the loop
csv_code = int(input("Enter corresponding code to plot: "))
csv_path = os.path.join(path, files[csv_code])
df = pd.read_csv(csv_path, header=None)
df1 = df.iloc[:,0:2]  # first two columns
plt.plot(df1[0], df1[1])
Update:
In your new code, you are casting csv_code to string for some reason. Firstly, input() will return a string, so there is no need to do str(int(input())), and secondly, you require it to be a list index. So remove the str cast and leave it at int(input()) as before.
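To make the index handling explicit, here is a small sketch that turns the typed text into a checked list index. The function name parse_selection is hypothetical, and the string "1" stands in for what input() would return so the snippet runs without prompting.

```python
def parse_selection(raw, count):
    """Convert the text typed by the user into a valid index for a list of length count.

    Raises ValueError if raw is not an integer in range(count).
    """
    index = int(raw.strip())  # int() raises ValueError for non-numeric text
    if not 0 <= index < count:
        raise ValueError(f"choose a number from 0 to {count - 1}")
    return index


# Sample list standing in for the collected file names.
files = ["Test_Summary_1.csv", "Test_Summary_2.csv", "Test_Summary_3.csv"]
chosen = files[parse_selection("1", len(files))]
print(chosen)
```

In the real script the call would be parse_selection(input("Enter corresponding code to plot: "), len(files)), and the result stays an int so it can index the list.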
I have the following structure of text files in folders and subfolders.
I want to read them all and create a DataFrame. I am using this code, but it doesn't work well for me: the text is not what I expect, and the number of files read does not match my count.
l = [pd.read_csv(filename,header=None, encoding='iso-8859-1') for filename in glob.glob("2018_01_01/*.txt")]
main_df = pd.concat(l, axis=1)
main_df = main_df.T
for i in range(2):
    l = [pd.read_csv(filename, header=None, encoding='iso-8859-1',quoting=csv.QUOTE_NONE) for filename in glob.glob(str(foldernames[i+1])+ '/' + '*.txt')]
    df = pd.concat(l, axis=1)
    df = df.T
    main_df = pd.merge(main_df, df)
Assuming those directories contain txt files whose contents all have the same structure:
import os
import pandas as pd

path = '/path/to/directory/of/directories/'
observations = []
for directory in os.listdir(path):
    # os.listdir returns bare names, so build the full path before testing it
    dir_path = os.path.join(path, directory)
    if os.path.isdir(dir_path):
        for filename in os.listdir(dir_path):
            with open(os.path.join(dir_path, filename)) as f:
                observations.append(f.read())
# build the DataFrame once; DataFrame.append was removed in pandas 2.0
df = pd.DataFrame({'observation': observations})
Once all your files have been iterated, df should be the DataFrame containing all the information in your different txt files.
You can do that using a for loop. But before that, you need to give a sequenced name to all the files like 'fil_0' within 'fol_0', 'fil_1' within 'fol_1', 'fil_2' within 'fol_2' and so on. That would facilitate the use of a for loop:
import pandas as pd

dataframes = []
for var in range(1000):
    name = "fol_" + str(var) + "/fil_" + str(var) + ".txt"
    dataframes.append(pd.read_csv(name)) # if you need to use all the files at once
    #otherwise
    df = pd.read_csv(name) # you can use file one by one
It will automatically create dataframes for each file.
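If the number of folders is not known in advance, the loop over range(1000) can be replaced by discovering the folders that actually exist. The sketch below assumes the fol_N/fil_N.txt naming described above; the function name numbered_files is made up, and the temporary directory only simulates such a layout.

```python
import glob
import os
import re
import tempfile


def numbered_files(base_dir):
    """Return existing fol_N/fil_*.txt paths under base_dir, ordered by the number N."""
    numbered = []
    for path in glob.glob(os.path.join(base_dir, "fol_*", "fil_*.txt")):
        folder = os.path.basename(os.path.dirname(path))
        match = re.fullmatch(r"fol_(\d+)", folder)
        if match:  # skip folders such as "fol_backup" that carry no number
            numbered.append((int(match.group(1)), path))
    return [path for _, path in sorted(numbered)]


# Simulate three numbered folders so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    for n in (10, 0, 2):
        folder = os.path.join(tmp, f"fol_{n}")
        os.makedirs(folder)
        with open(os.path.join(folder, f"fil_{n}.txt"), "w") as handle:
            handle.write("a,b\n1,2\n")
    found = [os.path.relpath(p, tmp).replace(os.sep, "/") for p in numbered_files(tmp)]

print(found)
```

Each returned path can then be passed to pd.read_csv in a loop, and no error is raised for folder numbers that do not exist.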