Plot graph from the selected excel sheet - python

There are a lot of CSV sheets in a folder. Each sheet contains data only in its first 3 columns. I want to select one CSV sheet from the folder and then plot it. Here is the code:
import os
import pandas as pd
import matplotlib.pyplot as plt

path = "F:\\Users\\Desktop\\Data\\Summary"
files = []
folder_data = os.listdir(path)
folder_data = [i+"\\" for i in folder_data]
# r=root, d=directories, f=files
for r, d, f in os.walk(path):
    for file in f:
        if '.csv' in file:
            files.append(file)
for i, f in enumerate(files):
    print((i, f))
print('\n'.join(f'{i}-{v}' for i, v in enumerate(files)))
csv_code = str(int(input("Enter corresponding code to plot: ")))
csv_path = path + "\\" + folder_data[csv_code]
df = pd.read_csv(csv_path, header=None)
df1 = df.iloc[:, 0:2]
plt.plot(df1[0], df1[1])
Currently only 1 CSV sheet is displayed when I run the code. I want the output to list all CSV files in the folder so that I can select the one I want, like this:
0-Test_Summary_1.csv
1-Test_Summary_2.csv
2-Test_Summary_3.csv
3-Test_Summary_4.csv
4-Test_Summary_5.csv
5-Test_Summary_6.csv etc
so that I can enter the corresponding code, like 1 or 2 or 3, to plot. This is the error I am getting:
csv_path = path + "\\" + folder_data[csv_code]
TypeError: list indices must be integers or slices, not str

Your input call shouldn't be inside the for loop, because you want to print out all the items first. You can fix it as follows:
print('\n'.join(f'{i}-{v}' for i, v in enumerate(files)))  # a shorter version of the loop
csv_code = int(input("Enter corresponding code to plot: "))
csv_path = os.path.join(path, files[csv_code])  # index into files, the list of csv names
df = pd.read_csv(csv_path, header=None)
df1 = df.iloc[:, 0:2]  # first two columns
plt.plot(df1[0], df1[1])
Update:
In your new code you are casting csv_code to a string for some reason. Firstly, input() already returns a string, so there is no need for str(int(input())); secondly, you need it to be an integer to use it as a list index. So remove the str cast and leave it at int(input()) as before.
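For reference, here is a minimal end-to-end sketch of the whole script with both fixes applied. It assumes pandas and matplotlib are installed, and it keeps the full path returned by os.walk, a small departure from the original code so that CSVs in subfolders also load correctly:
import os
import pandas as pd
import matplotlib.pyplot as plt

path = "F:\\Users\\Desktop\\Data\\Summary"

# collect every .csv found under path (full paths, so nested folders work too)
files = []
for r, d, f in os.walk(path):
    for file in f:
        if file.endswith('.csv'):
            files.append(os.path.join(r, file))

# print the numbered menu, then ask for a selection
print('\n'.join(f'{i}-{os.path.basename(v)}' for i, v in enumerate(files)))
csv_code = int(input("Enter corresponding code to plot: "))  # keep it an int, it is a list index

df = pd.read_csv(files[csv_code], header=None)
df1 = df.iloc[:, 0:2]  # first two columns
plt.plot(df1[0], df1[1])
plt.show()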

Related

Merging multiple CSV files into 1 in Python but files are appearing unsorted which affects data in date column

I'm trying to merge my 119 CSV files into one file with a Python script. The only issue I'm facing is that even though I've applied sorted(), the files are not merged in the order I expect, which leaves the date column out of order. Below is the code; when I run it and open the new file "calls.sms.merged.csv", the data from my 1st CSV is followed by the 10th, then the 100th through the 109th, and only then the 11th. I'm attaching an image for better understanding.
import os
import pandas as pd

file_path = "C:\\Users\\PROJECT\\Data Set\\SMS Data\\"
file_list = [file_path + f for f in os.listdir(file_path) if f.startswith('call. sms ')]
csv_list = []
for file in sorted(file_list):
    csv_list.append(pd.read_csv(file).assign(File_Name=os.path.basename(file)))
csv_merged = pd.concat(csv_list, ignore_index=True)
csv_merged.to_csv(file_path + 'calls.sms.merged.csv', index=False)
[Screenshot: merged output showing the CSVs in the wrong (lexicographic) order]
[Screenshots: Python code and error]
You can extract the number of each call/file with pandas.Series.str.extract, then use pandas.DataFrame.sort_values to sort ascending on that number.
Try this :
import os
import pandas as pd

file_path = "C:\\Users\\PROJECT\\Data Set\\SMS Data\\"
file_list = [file_path + f for f in os.listdir(file_path) if f.startswith('call. sms ')]
csv_list = []
for file in file_list:
    csv_list.append(pd.read_csv(file).assign(File_Name=os.path.basename(file)))
csv_merged = (
    pd.concat(csv_list, ignore_index=True)
      .assign(num_call=lambda x: x["File_Name"].str.extract(r"(\d+)", expand=False).astype(int))
      .sort_values(by="num_call", ignore_index=True)
      .drop(columns="num_call")
)
csv_merged.to_csv(file_path + 'calls.sms.merged.csv', index=False)
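An alternative sketch, assuming each filename contains exactly one run of digits, is to sort the file list numerically before reading, so the frames are already concatenated in the right order:
import os
import re
import pandas as pd

file_path = "C:\\Users\\PROJECT\\Data Set\\SMS Data\\"
file_list = [file_path + f for f in os.listdir(file_path) if f.startswith('call. sms ')]

def file_number(name):
    # pull the run of digits out of a name like 'call. sms 12.csv' -> 12
    match = re.search(r"\d+", os.path.basename(name))
    return int(match.group()) if match else -1

csv_list = [pd.read_csv(f).assign(File_Name=os.path.basename(f))
            for f in sorted(file_list, key=file_number)]
pd.concat(csv_list, ignore_index=True).to_csv(file_path + 'calls.sms.merged.csv', index=False)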

How to write the filename and rowcount in a csv in python?

I've been trying to build a CSV from a big list of other CSVs: I want to get the names of those CSV files and put them in the CSV I'm creating, along with the row count of each file. Here's what I've tried so far:
import os
import csv
import pandas as pd

def getRegisters(file):
    results = pd.read_csv(file, header=None, error_bad_lines=False, sep='\t', low_memory=False)
    print(len(results))
    return len(results)

path = "C:/Users/gdldieca/Desktop/TESTSFORPANW/New folder"
dirs = os.listdir(path)

with open("C:/Users/gdldieca/Desktop/TESTSFORPANW/New folder/FilesNames.csv", 'w', newline='') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerow(("File", "Rows"))
    for names in dirs:
        sfile = getRegisters("C:/Users/gdldieca/Desktop/TESTSFORPANW/New folder/" + str(names))
        writer.writerow((names, sfile))
However, I can't seem to write each file's row count even though pandas actually returns it. I'm getting this error:
_csv.Error: iterable expected, not int
The final result would be something like this written into the CSV
File1 90
File2 10
If you are using pandas, you can also use it to build the CSV with all the values you need. Here is an alternative:
import os
import pandas as pd

directory = 'D:\\MY\\PATH\\ALLCSVFILE\\'
# create a list to collect one row per file
rows_list = []
for filename in os.listdir(directory):
    if filename.endswith(".csv"):
        file = os.path.join(directory, filename)
        df = pd.read_csv(file)
        # count rows
        rowcount = len(df.index)
        new_row = {'namefile': filename, 'count': rowcount}
        rows_list.append(new_row)
# pass the list to a dataframe
df1 = pd.DataFrame(rows_list)
print(df1)
df1.to_csv('test.csv', sep=',')
Result: a two-column DataFrame (namefile, count) with one row per CSV file.
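If loading every file into pandas just to count rows feels heavy, a lighter sketch (assuming plain CSVs with no embedded newlines inside quoted fields, and reusing the question's folder path) counts rows with the csv module directly:
import os
import csv

path = "C:/Users/gdldieca/Desktop/TESTSFORPANW/New folder"

with open(os.path.join(path, "FilesNames.csv"), 'w', newline='') as out:
    writer = csv.writer(out, delimiter='\t')
    writer.writerow(("File", "Rows"))
    for name in os.listdir(path):
        if not name.endswith(".csv") or name == "FilesNames.csv":
            continue  # skip non-CSV files and the output file itself
        with open(os.path.join(path, name), newline='') as src:
            rowcount = sum(1 for _ in csv.reader(src))  # counts every row, header included
        writer.writerow((name, rowcount))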

Process .csv files successively and extract rows whenever a specific column has non-empty cells

I am developing code to process multiple .csv files in a for loop and then extract (into new .csv files) only the rows whose cells are non-empty in a specific column named "20201-2.0". The non-empty cells all hold the same string (i.e. 20201-2.0). Here is a screenshot showing part of the csv file:
https://uoe-my.sharepoint.com/:i:/g/personal/gpapanas_ed_ac_uk/EayBblFTHmVJvRfsB6h8Vr4B09IfjQ2L1I5OQKUN2p5wzw?e=2gXW61
import pandas as pd
import glob
import os

path = './'
all_files = glob.glob(path + "/*.csv")
li = []

for filename in all_files:
    df = pd.read_csv(filename, index_col=None)
    li.append(df)

    df = li[li['20201-2.0'].notnull()]

    print('extracting info from cvs...')
    print(df)

    # You can now export all outcomes in new csv files
    file_name = filename + 'new' + '.csv'
    save_path = os.path.abspath(
        os.path.join(
            path, file_name
        )
    )
    print('saving ...')
    export_csv = df.to_csv(save_path, index=None)
I get the following error:
df = li[li['20201-2.0'].notnull()]
TypeError: list indices must be integers or slices, not str
Inside your loop, filter the dataframe right after you read the file, before storing it in the list:
df = pd.read_csv(filename, index_col=None, header=0)  # header=0 tells pandas the first row holds column names ('20201-2.0' is a column name, right?)
df = df[df['20201-2.0'].notnull()]  # keep only the rows where the column named '20201-2.0' is populated
li.append(df)  # store that filtered dataframe in the list `li`
I also noticed that when saving the new csv file, you append the strings "new" and ".csv" to each filename variable, which already ends in ".csv".
Have you run this code? Doesn't it save your file as "something.csvnew.csv"?
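A minimal corrected sketch of the whole loop, assuming the goal is one filtered output file per input file and that '20201-2.0' really is a column header:
import pandas as pd
import glob
import os

path = './'
all_files = glob.glob(os.path.join(path, "*.csv"))

for filename in all_files:
    df = pd.read_csv(filename, index_col=None, header=0)
    filtered = df[df['20201-2.0'].notnull()]  # rows where that column is populated

    print('extracting info from csv...')
    print(filtered)

    # e.g. 'data.csv' -> 'data_new.csv' instead of 'data.csvnew.csv'
    base, ext = os.path.splitext(os.path.basename(filename))
    save_path = os.path.abspath(os.path.join(path, f"{base}_new{ext}"))

    print('saving ...')
    filtered.to_csv(save_path, index=False)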

Plot graph from the bunch of selected excel sheet

There are a lot of CSV sheets in a folder. Each sheet contains data only in its first 3 columns. I want to select one CSV sheet from the folder and then plot it. Here is the code:
import os
import pandas as pd
import matplotlib.pyplot as plt

path = "F:\\Users\\Desktop\\Data\\Summary"
files = []
test_folders = os.listdir(path)
folder_data = os.listdir(path)
# r=root, d=directories, f=files
for r, d, f in os.walk(path):
    for file in f:
        if '.csv' in file:
            files.append(file)
for i, f in enumerate(files):
    print("%d-%s" % (i, f))
csv_code = int(input("Enter corresponding code to plot: "))
csv_path = folder_data + "\\" + folder_data[csv_code]
df = pd.read_csv(csv_path, header=None)
df1 = df.iloc[:, 0:2]
plt.plot(df1[0], df1[1])
When I run the code I want the output to list all CSV files in the folder so that I can select the one I want, like this:
0-Test_Summary_1.csv
1-Test_Summary_2.csv
2-Test_Summary_3.csv
3-Test_Summary_4.csv
4-Test_Summary_5.csv
5-Test_Summary_6.csv etc
so that I can enter the corresponding code, like 1 or 2 or 3, to plot.
Here is the error:
csv_path = folder_data + "\\" + folder_data[csv_code]
TypeError: can only concatenate list (not "str") to list
folder_data does not look like a defined variable in the code you originally provided, and by extension neither does folder_data[csv_code].
Is the line you're receiving an error on supposed to be:
csv_path = path + "\\" + files[csv_code]
Regarding your actual error message:
folder_data is a list, and you are trying to add the string "\\" to it, which does not work. If you want to append "\\" to each element in the list, you would have to do the following: folder_data = [i+"\\" for i in folder_data]. What you probably want instead is path + "\\" + folder_data[csv_code] to get the full path to a single csv file.
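Putting the two answers together, a small sketch of the selection step might look like this. It assumes path, files, pandas and matplotlib come from the question's code above, and it uses os.path.join instead of manual "\\" concatenation:
# print the numbered menu built from the csv files found earlier
for i, name in enumerate(files):
    print("%d-%s" % (i, name))

csv_code = int(input("Enter corresponding code to plot: "))
csv_path = os.path.join(path, files[csv_code])  # join the folder with the chosen file name

df = pd.read_csv(csv_path, header=None)
df1 = df.iloc[:, 0:2]
plt.plot(df1[0], df1[1])
plt.show()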

Reading text files from subfolders and folders and creating a dataframe in pandas for each file text as one observation

I have the following architecture of text files in folders and subfolders.
I want to read them all and create a dataframe. I am using this code, but it doesn't work well for me: the text I get is not what I expected, and the number of files read doesn't match my count.
import csv
import glob
import pandas as pd

l = [pd.read_csv(filename, header=None, encoding='iso-8859-1') for filename in glob.glob("2018_01_01/*.txt")]
main_df = pd.concat(l, axis=1)
main_df = main_df.T
for i in range(2):
    l = [pd.read_csv(filename, header=None, encoding='iso-8859-1', quoting=csv.QUOTE_NONE)
         for filename in glob.glob(str(foldernames[i+1]) + '/' + '*.txt')]
    df = pd.concat(l, axis=1)
    df = df.T
    main_df = pd.merge(main_df, df)
Assuming those directories contain txt files whose information has the same structure in all of them:
import os
import pandas as pd

df = pd.DataFrame(columns=['observation'])
path = '/path/to/directory/of/directories/'

for directory in os.listdir(path):
    full_dir = os.path.join(path, directory)  # os.listdir returns bare names, so join them with path
    if os.path.isdir(full_dir):
        for filename in os.listdir(full_dir):
            with open(os.path.join(full_dir, filename)) as f:
                observation = f.read()
            current_df = pd.DataFrame({'observation': [observation]})
            df = df.append(current_df, ignore_index=True)  # note: DataFrame.append was removed in pandas 2.x
Once all your files have been iterated, df should be the DataFrame containing all the information in your different txt files.
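Since DataFrame.append is gone in recent pandas versions, an equivalent sketch (assuming the same directory layout) collects the texts in a list and builds the frame once:
import os
import pandas as pd

path = '/path/to/directory/of/directories/'
observations = []

for directory in os.listdir(path):
    full_dir = os.path.join(path, directory)
    if not os.path.isdir(full_dir):
        continue
    for filename in os.listdir(full_dir):
        with open(os.path.join(full_dir, filename)) as f:
            observations.append(f.read())  # one observation per text file

df = pd.DataFrame({'observation': observations})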
You can also do it with a plain for loop, but first you would need to give every file a sequential name, e.g. 'fil_0' inside 'fol_0', 'fil_1' inside 'fol_1', 'fil_2' inside 'fol_2', and so on. That makes the loop straightforward:
import pandas as pd

dataframes = []
for var in range(1000):
    name = "fol_" + str(var) + "/fil_" + str(var) + ".txt"
    dataframes.append(pd.read_csv(name))  # if you need to use all the files at once
    # otherwise
    df = pd.read_csv(name)  # you can use the files one by one
It will automatically create dataframes for each file.
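If renaming every file is not practical, a small sketch using glob's recursive pattern avoids the naming requirement entirely; 'root_folder' is a placeholder for the top-level directory:
import glob
import pandas as pd

# '**/*.txt' with recursive=True walks every subfolder under root_folder (a placeholder path)
dataframes = [pd.read_csv(name, header=None, encoding='iso-8859-1')
              for name in sorted(glob.glob('root_folder/**/*.txt', recursive=True))]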
