In a loop I adjust the CSV structure of each file.
Now I want to save them to the assigned folder with unique file names.
I can save to a CSV file, but then the CSV file gets overwritten, leaving only the final modified result of the test5 file. I want to save each CSV under its own filename plus the string _modified.
I have 5 csv files:
Test1.csv
test2.csv
test3.csv
test4.csv
test5.csv
I import them:
for x in allFiles:
    print(x)
    stop = 1
    with open(x, 'r') as thecsv:
        base = os.path.basename(ROT)
        filename = os.path.splitext(base)[0]
        print(filename)
Now I loop through the files, manipulate them, and save each one as a DataFrame.
This is working fine.
Now I want to save each file separately in the output folder with a unique name (filename + _modified).
Output='J:\Temp\Output'
This is what I tried:
df2.to_csv(output+filename+'//_modified.csv'),sep=';',header=False,index=False)
also tried:
df2.to_csv(output(os.path.join(name+'//_modified.csv'),sep=';',header=False,index=False)
Hoping the output folder ends up looking like this:
test1_modified.csv
test2_modified.csv
test3_modified.csv
test4_modified.csv
test5_modified.csv
I would do something like this, making a new name before the call to write it out:
testFiles = ["test1.csv", "test2.csv", "test3.csv",
             "test4.csv", "test5.csv"]
# iterate over each one
for f in testFiles:
    # strip the old extension, replacing it with nothing
    f = f.replace(".csv", "")
    # I'd use join, but you can use + as well
    newName = "".join([f, "_modified.csv"])
    print(newName)
    # make your call to write it out
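For that last step, a minimal sketch of the write call (assuming, as in the question, that df2 holds the modified DataFrame and output points at the output folder):

import os
# hypothetical: df2 is the modified DataFrame, output = r'J:\Temp\Output'
df2.to_csv(os.path.join(output, newName), sep=';', header=False, index=False)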
I would also check the pandas docs for writing out; it's simpler than what you're trying: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_csv.html
import pandas as pd
# read data
iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
# write data to local
iris.to_csv("iris.csv")
I found the solution to my problem:
df.to_csv(output + '\\' + filename + '.csv', sep=';', header=False, index=False)
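A slightly more portable sketch of the same idea, using os.path.join and the _modified suffix from the desired output (assuming output and filename are set as above):

import os
df.to_csv(os.path.join(output, filename + '_modified.csv'),
          sep=';', header=False, index=False)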
I'm quite new to Python and encountered a problem: I want to write a script that starts in a base directory containing several folders, which all have the same subdirectory structure and are numbered with a control variable (scan00, scan01, ...).
I read out the names of the folders in the directory and store them in a variable called foldernames.
Then, the script should go into a subdirectory of each of these folders where multiple txt files are stored. I store them in a variable called "myFiles".
These txt files consist of 3 columns of float values separated by tabs, and each txt file has 3371 rows (they are all the same in terms of rows and columns).
Now my issue: I want the script to copy only the third column of all txt files and put it into a new txt or csv file. The only exception is the first txt file, where all three columns should be copied to the new file.
For the other files, the third column of each txt file should be copied into an adjacent column of the new txt/csv file.
So I would like to end up with x columns in the generated txt/csv file, where x is the number of original txt files. If possible, I would like to write the corresponding file names in the first line of the new txt/csv file (here defined as column_names).
At the end, each folder should contain a txt/csv file which combines all of the individual (297) txt files.
import os
import glob

foldernames1 = []
for foldernames in os.listdir("W:/certaindirectory/"):
    if foldernames.startswith("scan"):
        # print(foldernames)
        foldernames1.append(foldernames)

for i in range(1, len(foldernames1)):
    workingpath = "W:/certaindirectory/" + foldernames1[i] + "/.../"
    os.chdir(workingpath)
    myFiles = glob.glob('*.txt')
    column_names = ['X', 'Y'] + myFiles[1:len(myFiles)]
    files = [open(f) for f in glob.glob('*.txt')]
    fout = open("ResultsCombined.txt", 'w')
    for row in range(1, 3371):  # len(files)):
        for f in files:
            fout.write(f.readline().strip().split('\t')[2])
            fout.write('\t')
        fout.write('\t')
    fout.close()
As an alternative, I also tried to do it via the csv module, but I wasn't able to fix my problem:
import os
import glob
import csv

foldernames1 = []
for foldernames in os.listdir("W:/certain directory/"):
    if foldernames.startswith("scan"):
        # print(foldernames)
        foldernames1.append(foldernames)

for i in range(1, len(foldernames1)):
    workingpath = "W:/certain directory/" + foldernames1[i] + "/.../"
    os.chdir(workingpath)
    myFiles = glob.glob('*.txt')
    column_names = ['X', 'Y'] + myFiles[0:len(myFiles)]
    # print(column_names)
    with open("" + foldernames1[i] + ".csv", 'w', newline='') as target:
        writer = csv.DictWriter(target, fieldnames=column_names)
        writer.writeheader()  # if you want a header
        for path in glob.glob('*.txt'):
            with open(path, newline='') as source:
                reader = csv.DictReader(source, delimiter='\t', fieldnames=column_names)
                writer.writerows(reader)
Can anyone help me? Neither version delivers what I want. They read out something, but not the values I am interested in. I also have the feeling my code has some issues with float numbers.
Many thanks and best regards,
quester
pathlib and pandas should make the solution here relatively simple even without knowing the specific file names:
import pandas as pd
from pathlib import Path

p = Path("W:/certain directory/")
# recursively search for .txt files inside all subdirectories
txt_files = [txt_file for txt_file in p.rglob("*.txt")]  # use p.glob("*.txt") instead for non-recursive iteration
df = pd.DataFrame()
for path in txt_files:
    # use tab separator, read only the 3rd column, name the column after the file, read as floats
    current = pd.read_csv(path,
                          sep="\t",
                          usecols=[2],
                          names=[path.name],
                          dtype="float64")
    # add header=0 to pd.read_csv if there's a header row in the .txt files
    df = pd.concat([df, current], axis=1)
df.to_csv("W:/certain directory/floats_third_column.csv", index=False)
Hope this helps!
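One detail from the question that the code above does not cover is keeping all three columns of the first file. A hedged variation, assuming the same tab-separated layout and that the first file returned is the one whose X and Y columns should be kept:

import pandas as pd
from pathlib import Path

p = Path("W:/certain directory/")
txt_files = sorted(p.rglob("*.txt"))
parts = []
for i, path in enumerate(txt_files):
    if i == 0:
        # first file: keep all three columns, named X, Y and the file name
        part = pd.read_csv(path, sep="\t", usecols=[0, 1, 2],
                           names=["X", "Y", path.name], dtype="float64")
    else:
        # remaining files: keep only the third column
        part = pd.read_csv(path, sep="\t", usecols=[2],
                           names=[path.name], dtype="float64")
    parts.append(part)
pd.concat(parts, axis=1).to_csv(p / "floats_third_column.csv", index=False)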
I am trying to zip files. I used the example from https://thispointer.com/python-how-to-create-a-zip-archive-from-multiple-files-or-directory/
with ZipFile('sample2.zip', 'w') as zipObj2:
    # Add multiple files to the zip
    zipObj2.write('sample_file.csv')
sample2.zip is created, but it is empty. Of course, the csv file exists and is not empty.
I run this code from Jupyter Notebook
Edit: I'm using relative paths:
input_dir = "../data/example/"
with zipfile.ZipFile(os.path.join(input_dir, 'f.zip'), 'a') as zipObj2:
    zipObj2.write(os.path.join(input_dir, 'f.tif'))
Did you try closing the zip file to save it?
from zipfile import ZipFile

with ZipFile('sample2.zip', 'w') as zipObj2:
    zipObj2.write('sample_file.csv')
zipObj2.close()
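The with statement already closes the archive on exit, so another thing worth checking, given the relative paths in the edit, is where the zip actually ends up and what path gets stored inside it; ZipFile.write accepts an arcname for the stored name. A rough sketch using the file names from the question:

import os
import zipfile

input_dir = "../data/example/"
zip_path = os.path.join(input_dir, 'f.zip')
with zipfile.ZipFile(zip_path, 'w') as zf:
    # arcname stores the file as plain 'f.tif' inside the archive,
    # instead of embedding the '../data/example/' prefix
    zf.write(os.path.join(input_dir, 'f.tif'), arcname='f.tif')
print(os.path.abspath(zip_path))  # confirm where the zip was actually created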
I'm a little confused by your question, but if I'm correct it sounds like you're trying to place multiple CSV files within a single zipped file? If so, this is what you're looking for:
import os
import pandas as pd

# initiate a files variable that contains the directory from which you wish to zip csv files
files = [f for f in os.listdir("./your_directory") if f.endswith('.csv')]
# initialize an empty DataFrame
all_data = pd.DataFrame()
# iterate through the files variable and concatenate them to all_data
for file in files:
    df = pd.read_csv('./your_directory/' + file)
    all_data = pd.concat([all_data, df])
Then inspect your new DataFrame (all_data) to verify that the contents were transferred.
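If the end goal really is a zipped CSV, pandas can also write compressed output directly; a minimal follow-up using the all_data frame from above (the output file name is hypothetical):

all_data.to_csv('combined_csvs.zip', index=False, compression='zip')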
I want to make a script that writes a certain text to the A1 cell of every Excel file inside a folder. I'm not sure how to get Python to open every file one by one, make changes to A1, and then save over the original file.
import os
import openpyxl
os.chdir('C:/Users/jdal/Downloads/excelWorkdir')
folderList = os.listdir()
for file in in os.walk():
for name in file:
if name.endswith(".xlsx" and ".xls")
wb = openpyxl.load_workbook()
sheet = wb.get_sheet_by_name('Sheet1')
sheet['A1'] = 'input whatever here!'
sheet['A1'].value
wb.save()
I see the following errors in your code:
You have an error in .endswith; it should be
name.endswith((".xlsx", ".xls"))
i.e. it needs to be fed a tuple of allowed endings.
Your if lacks a : at the end, and your indentation seems to be broken.
You should pass one argument to .load_workbook and one argument to .save, i.e. the name of the file to read/write.
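Putting those fixes together, a hedged sketch of the corrected loop (folder and cell text taken from the question; note that openpyxl only reads .xlsx/.xlsm, so legacy .xls files would need a different library):

import os
import openpyxl

folder = 'C:/Users/jdal/Downloads/excelWorkdir'
for root, dirs, filenames in os.walk(folder):
    for name in filenames:
        if name.endswith(".xlsx"):
            path = os.path.join(root, name)
            wb = openpyxl.load_workbook(path)   # one argument: the file to read
            sheet = wb['Sheet1']                # assumes each workbook has a sheet named 'Sheet1'
            sheet['A1'] = 'input whatever here!'
            wb.save(path)                       # one argument: the file to write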
I would iterate through the folder and use pandas to read the files as temporary data frames. From there, they are easily manipulable.
Assuming you are in the relevant directory:
import pandas as pd
import os

files = os.listdir()
for i in range(len(files)):
    if files[i].endswith('.csv'):
        # Store file name for future replacement
        name = str(files[i])
        # Save file as dataframe and edit cell
        df = pd.read_csv(files[i])
        df.iloc[0, 0] = 'changed value'
        # Replace file with df
        df.to_csv(name, index=False)
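Since the question is about Excel files rather than CSVs, the same pattern could be sketched with pandas' read_excel/to_excel instead (this needs openpyxl installed and, like the CSV version above, it edits the first data cell below the header rather than the literal A1 header cell):

import os
import pandas as pd

for filename in os.listdir():
    if filename.endswith('.xlsx'):
        df = pd.read_excel(filename)
        df.iloc[0, 0] = 'changed value'
        df.to_excel(filename, index=False)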
I want to open multiple csv files in python, collate them and have python create a new file with the data from the multiple files reorganised...
Is there a way for me to read all the files from a single directory on my desktop and read them in python like this?
Thanks a lot
If you have a directory containing your csv files, and they all have the extension .csv, then you could use, for example, glob and pandas to read them all in and concatenate them into one csv file. Say you have a directory laid out like this:
csvfiles/one.csv
csvfiles/two.csv
where one.csv contains:
name,age
Keith,23
Jane,25
and two.csv contains:
name,age
Kylie,35
Jake,42
Then you could do the following in Python (you will need to install pandas with, e.g., pip install pandas):
import glob
import os
import pandas as pd

# the path to your csv file directory
mycsvdir = 'csvfiles'
# get all the csv files in that directory (assuming they have the extension .csv)
csvfiles = glob.glob(os.path.join(mycsvdir, '*.csv'))
# loop through the files and read them in with pandas
dataframes = []  # a list to hold all the individual pandas DataFrames
for csvfile in csvfiles:
    df = pd.read_csv(csvfile)
    dataframes.append(df)
# concatenate them all together
result = pd.concat(dataframes, ignore_index=True)
# print out to a new csv file
result.to_csv('all.csv')
Note that the output csv file will have an additional column at the front containing the index of the row. To avoid this you could instead use:
result.to_csv('all.csv', index=False)
You can see the documentation for the to_csv() method in the pandas docs.
Hope that helps.
Here is a very simple way to do what you want to do.
import pandas as pd
import glob, os

os.chdir("C:\\your_path\\")
results = pd.DataFrame([])
for counter, file in enumerate(glob.glob("1*")):
    namedf = pd.read_csv(file, skiprows=0, usecols=[1, 2, 3])
    results = results.append(namedf)
results.to_csv('C:\\your_path\\combinedfile.csv')
Notice this part: glob("1*")
This will look only for files that start with '1' in the name (1, 10, 100, etc). If you want everything, change it to this: glob("*")
Sometimes it's necessary to merge all CSV files into a single CSV file, and sometimes you just want to merge some files that match a certain naming convention. It's nice to have this feature!
I know that the post is a little bit old, but using glob can be quite expensive in terms of memory if you are trying to read large csv files, because you store all of that data in a list and then still need enough memory to concatenate the dataframes in that list into one dataframe with all the data. Sometimes this is not possible.
import pandas as pd

dir = 'directory path'
df = pd.DataFrame()
for i in range(0, 24):
    csvfile = pd.read_csv(dir + '/file name{}.csv'.format(i), encoding='utf8')
    df = df.append(csvfile)
    del csvfile
So, in case your csv files have the same name apart from some number or string that differentiates them, you can just loop through the files and delete each one from memory after it has been appended to the dataframe with DataFrame.append. In this case all my csv files have the same name except that they are numbered in a range that goes from 0 to 23.
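If even the single combined dataframe is too large to hold in memory, a further option is to stream each file straight into the output csv without accumulating anything; a rough sketch, assuming the files share the same columns and the same naming pattern as above:

import os
import pandas as pd

folder = 'directory path'                        # same placeholder path as above
out_path = os.path.join(folder, 'combined.csv')  # hypothetical output file
for i in range(0, 24):
    part = pd.read_csv(folder + '/file name{}.csv'.format(i), encoding='utf8')
    # write the header only for the first file, then append without it
    part.to_csv(out_path, mode='w' if i == 0 else 'a', header=(i == 0), index=False)
    del part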
In one of my directories, I have multiple CSV files. I want to read the contents of all the CSV files through Python code and print the data, but so far I have not been able to do so.
All the CSV files have the same number of columns and the same column names as well.
I know a way to list all the CSV files in the directory and iterate over them using the "os" module and a "for" loop.
for files in os.listdir("C:\\Users\\AmiteshSahay\\Desktop\\test_csv"):
Now I use the "csv" module to read the file names:
reader = csv.reader(files)
till here I expect the output to be the names of the CSV files. which happens to be sorted. for example, names are 1.csv, 2.csv so on. But the output is as below
<_csv.reader object at 0x0000019F97E0E730>
<_csv.reader object at 0x0000019F97E0E528>
<_csv.reader object at 0x0000019F97E0E730>
<_csv.reader object at 0x0000019F97E0E528>
<_csv.reader object at 0x0000019F97E0E730>
<_csv.reader object at 0x0000019F97E0E528>
if I add next() function after the csv.reader(), I get below output
['1']
['2']
['3']
['4']
['5']
['6']
These happen to be the first characters of my CSV file names, which is partially what I expect but not fully.
Apart from this once I have the files iterated, how to see the contents of the CSV files on the screen? Today I have 6 files. Later on, I could have 100 files. So, it's not possible to use the file handling method in my scenario.
Any suggestions?
The easiest way I found while developing my project is to use DataFrame, read_csv, and glob:
import glob
import os
import pandas as pd

folder_name = 'train_dataset'
file_type = 'csv'
separator = ','
dataframe = pd.concat([pd.read_csv(f, sep=separator)
                       for f in glob.glob(folder_name + "/*." + file_type)],
                      ignore_index=True)
Here, all the csv files are loaded into 1 big dataframe.
I would recommend reading your CSVs using the pandas library.
Check this answer here: Import multiple csv files into pandas and concatenate into one DataFrame
Although you asked for python in general, pandas does a great job at data I/O and would help you here in my opinion.
till here I expect the output to be the names of the CSV files
This is the problem. csv.reader objects do not represent filenames. They represent lazy objects which may be iterated to yield rows from a CSV file. Or, if you wish to print the entire CSV file, you can call list on the csv.reader object:
directory = "C:\\Users\\AmiteshSahay\\Desktop\\test_csv"
for files in os.listdir(directory):
    with open(os.path.join(directory, files), newline='') as f:
        reader = csv.reader(f)
        print(list(reader))
if I add next() function after the csv.reader(), I get below output
Yes, this is what you should expect. Calling next on an iterator will give you the next value that comes out of that iterator; for a csv.reader over an open file, that is the first row of the file. For example:
from io import StringIO
import csv
some_file = StringIO("""1
2
3""")
with some_file as fin:
    reader = csv.reader(fin)
    print(next(reader))
['1']
which happens to be sorted. for example, names are 1.csv, 2.csv so on.
What you are seeing is actually the filenames themselves: because you passed the filename string to csv.reader, the reader iterates over the characters of that string, so next(reader) returns the first character of each filename (['1'] for 1.csv) rather than the first row of the file. Once you open the file and pass the file object to csv.reader, next(reader) will return actual rows.
Apart from this once I have the files iterated, how to see the
contents of the csv files on the screen?
Use the print command, as in the examples above.
Today I have 6 files. Later on, I could have 100 files. So, it's not
possible to use the file handling method in my scenario.
This is not true. You can define a function to print all or part of your csv file, then call that function in a for loop with the filename as input.
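A minimal sketch of such a helper, using the directory from the question (max_rows is just an illustrative parameter):

import csv
import os

def print_csv(path, max_rows=None):
    # print every row of one CSV file, or only the first max_rows rows
    with open(path, newline='') as f:
        for i, row in enumerate(csv.reader(f)):
            if max_rows is not None and i >= max_rows:
                break
            print(row)

directory = "C:\\Users\\AmiteshSahay\\Desktop\\test_csv"
for name in os.listdir(directory):
    if name.endswith('.csv'):
        print_csv(os.path.join(directory, name))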
If you want to import your files as separate dataframes, you can try this:
import pandas as pd
import os

filenames = os.listdir("../data/")  # lists all csv files in your directory

def extract_name_files(text):  # removes .csv from the name of each file
    name_file = os.path.splitext(text)[0].lower()
    return name_file

names_of_files = list(map(extract_name_files, filenames))  # creates a list that will be used to name your dataframes

for i in range(0, len(names_of_files)):  # saves each csv in a dataframe structure
    exec(names_of_files[i] + " = pd.read_csv('../data/' + filenames[i])")
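As a design note, a hedged alternative that avoids exec is to keep the frames in a dictionary keyed by those same cleaned-up names:

import os
import pandas as pd

dataframes = {}
for filename in os.listdir("../data/"):
    if filename.endswith(".csv"):
        key = os.path.splitext(filename)[0].lower()
        dataframes[key] = pd.read_csv(os.path.join("../data/", filename))
# an individual frame is then available as e.g. dataframes['somefile']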
You can read and store several dataframes into separate variables using two lines of code.
import pandas as pd
datasets_list = ['users', 'calls', 'messages', 'internet', 'plans']
users, calls, messages, internet, plans = [(pd.read_csv(f'datasets/{dataset_name}.csv')) for dataset_name in datasets_list]