Looking to break up and check individual cells from a CSV file that was pulled from Excel with Python 3.8. For example, I have a CSV file with the information Honda 1, Toyota 2, Nissan 3... I want to check each cell (not sure what to call the data before the comma delimiter) for an integer and then I want to remove it but also put it in its own cell. So the CSV would then read Honda, 1, Toyota, 2, Nissan, 3... The main goal would be to get those integers in a column next to the manufacturers in Excel.
I am pretty new to Python but have some coding background. The logic I was thinking of would be something along the lines of: if char is int, then add it to the new file, else add N/A. My main problem is using the data in a csv file to do it. I thought about putting the data from the csv into a variable, but the real csv file has over 20,000 cells, so I'm not sure that would be very efficient.
So far my code looks like this:
import csv
path = '/Users/testFolder/Test.csv'
new_path = '/Users/testFolder/Test2.csv'
test_file = open(path,'r')
data = test_file.read()
write_file = open(new_path,'w')
write_file.write(data)
print(data)
file = csv.reader(open(path), delimiter = ',')
for line in file:
    print(line)
test_file.close()
write_file.close()
Assuming the parts of each item are separated by one or more spaces, you can do it a row at a time (instead of reading the whole file into memory) like this:
import csv
path = 'remove_test.csv'
new_path = 'remove_test2.csv'
with open(path, 'r', newline='') as test_file, \
        open(new_path, 'w', newline='') as write_file:
    reader = csv.reader(test_file, delimiter=',')
    writer = csv.writer(write_file, delimiter=',')
    for row in reader:
        new_row = [part for item in row for part in item.split()]
        writer.writerow(new_row)
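The question also asks for an N/A placeholder when an item has no number. Here is a minimal sketch of that, assuming each cell is a name optionally followed by whitespace and a trailing integer. The helper names `split_cell` and `split_rows` are made up for illustration, and in-memory streams stand in for the real files.

```python
import csv
import io
import re

# Assumed cell layout: any text, then whitespace, then a trailing integer.
TRAILING_INT = re.compile(r"^(.*?)\s+(\d+)$")

def split_cell(cell):
    """Return (name, number) for 'Honda 1'; number is 'N/A' if none is found."""
    match = TRAILING_INT.match(cell.strip())
    if match:
        return match.group(1), match.group(2)
    return cell.strip(), "N/A"

def split_rows(reader, writer):
    """Stream rows from reader to writer, expanding each cell into two cells."""
    for row in reader:
        new_row = []
        for cell in row:
            new_row.extend(split_cell(cell))
        writer.writerow(new_row)

# In-memory stand-ins for the input and output files.
source = io.StringIO("Honda 1,Toyota 2,Nissan\n")
target = io.StringIO()
split_rows(csv.reader(source), csv.writer(target, lineterminator="\n"))
print(target.getvalue())
```

Because rows are processed one at a time, the same loop should work on real file objects opened with newline='' without loading the whole file.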
I am a beginner of Python and would like to have your opinion..
I wrote this code that reads the only column in a file on my pc and puts it in a list.
I have difficulties understanding how I could modify the same code with a file that has multiple columns and select only the column of my interest.
Can you help me?
list = []
with open(r'C:\Users\Desktop\mydoc.csv') as file:
    for line in file:
        item = int(line)
        list.append(item)
results = []
for i in range(0,1086):
    a = list[i-1]
    b = list[i]
    c = list[i+1]
    results.append(b)
print(results)
You can use pandas.read_csv() method very simply like this:
import pandas as pd
my_data_frame = pd.read_csv('path/to/your/data')
results = my_data_frame['name_of_your_wanted_column'].values.tolist()
A useful module for the kind of work you are doing is the imaginatively named csv module.
Many csv files have a "header" at the top; by convention, this is a useful way of labeling the columns of your file. Assuming you can insert a line at the top of your csv file with comma-delimited field names, you could replace your program with something like:
import csv
with open(r'C:\Users\Desktop\mydoc.csv') as myfile:
    csv_reader = csv.DictReader(myfile)
    for row in csv_reader:
        print(row['column_name_of_interest'])
The above will print to the terminal all the values that match your specific 'column_name_of_interest' after you edit it to match your particular file.
It's normal to work with lots of columns at once, so that dictionary method of packing a whole row into a single object, addressable by column-name can be very convenient later on.
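To get the column into a list (as in the question) rather than printing it, a small sketch along the same lines could look like this. The header names `site` and `value` and the sample data are made up for illustration, and an in-memory stream stands in for the real file.

```python
import csv
import io

# Stand-in for a real file; the header names here are invented for the example.
sample = io.StringIO("site,value\nA,10\nB,20\nC,30\n")

reader = csv.DictReader(sample)
# Each row is a dict keyed by header name, so one column is a simple lookup.
values = [int(row["value"]) for row in reader]
print(values)
```

With a real file you would pass the open file object to csv.DictReader instead of the StringIO.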
For a pure Python implementation, you should use the csv module.
data.csv
Project1,folder1/file1,data
Project1,folder1/file2,data
Project1,folder1/file3,data
Project1,folder1/file4,data
Project1,folder2/file11,data
Project1,folder2/file42a,data
Project1,folder2/file42b,data
Project1,folder2/file42c,data
Project1,folder2/file42d,data
Project1,folder3/filec,data
Project1,folder3/fileb,data
Project1,folder3/filea,data
Your python program should read it by line
import csv
a = []
with open('data.csv') as csv_file:
    reader = csv.reader(csv_file, delimiter=',')
    for row in reader:
        print(row)
        # ['Project1', 'folder1/file1', 'data']
If you print the row element you will see it is a list like that
['Project1', 'folder1/file1', 'data']
If I would like to put in my list all elements in column 1, I need to put that element in my list, doing:
a.append(row[1])
Now in list a I will have a list like:
['folder1/file1', 'folder1/file2', 'folder1/file3', 'folder1/file4', 'folder2/file11', 'folder2/file42a', 'folder2/file42b', 'folder2/file42c', 'folder2/file42d', 'folder3/filec', 'folder3/fileb', 'folder3/filea']
Here is the complete code:
import csv
a = []
with open('data.csv') as csv_file:
    reader = csv.reader(csv_file, delimiter=',')
    for row in reader:
        a.append(row[1])
I am trying to combine multiple rows in a csv file together. I could easily do it in Excel but I want to do this for hundreds of files so I need it to be as a code. I have tried to store rows in arrays but it doesn't seem to work. I am using Python to do it.
So lets say I have a csv file;
1,2,3
4,5,6
7,8,9
All I want to do is to have a csv file as this;
1,2,3,4,5,6,7,8,9
The code I have tried is this;
fin = open("C:\\1.csv", 'r+')
fout = open("C:\\2.csv",'w')
for line in fin.xreadlines():
    new = line.replace(',', ' ', 1)
    fout.write (new)
fin.close()
fout.close()
Could you please help?
You should be using the csv module for this as splitting CSV manually on commas is very error-prone (single columns can contain strings with commas, but you would incorrectly end up splitting this into multiple columns). The CSV module uses lists of values to represent single rows.
import csv
def return_contents(file_name):
    with open(file_name) as infile:
        reader = csv.reader(infile)
        return list(reader)
data1 = return_contents('csv1.csv')
data2 = return_contents('csv2.csv')
print(data1)
print(data2)
combined = []
for row in data1:
    combined.extend(row)
for row in data2:
    combined.extend(row)
with open('csv_out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(combined)
That code gives you the basis of the approach but it would be ugly to extend this for hundreds of files. Instead, you probably want os.listdir to pull all the files in a single directory, one by one, and add them to your output. This is the reason that I packed the reading code into the return_contents function; we can repeat the same process millions of times on different files with only one set of code to do the actual reading. Something like this:
import csv
import os
def return_contents(file_name):
    with open(file_name) as infile:
        reader = csv.reader(infile)
        return list(reader)
all_files = os.listdir('my_csvs')
combined_output = []
for file in all_files:
    data = return_contents('my_csvs/{}'.format(file))
    for row in data:
        combined_output.extend(row)
with open('csv_out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(combined_output)
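One thing to keep in mind is that os.listdir returns names in arbitrary order and may include files that are not CSVs. A possible variation, sketched here with a hypothetical helper name `flatten_csv_files` and a throwaway temporary directory in place of the real folder, sorts the names and skips anything that does not end in .csv:

```python
import csv
import os
import tempfile

def flatten_csv_files(folder):
    """Return every cell from every .csv file in folder as one flat list."""
    combined = []
    # Sort the names so the output order is predictable.
    for name in sorted(os.listdir(folder)):
        if not name.lower().endswith(".csv"):
            continue  # skip anything that is not a CSV file
        with open(os.path.join(folder, name), newline="") as infile:
            for row in csv.reader(infile):
                combined.extend(row)
    return combined

# Demonstration in a throwaway directory with two small files.
with tempfile.TemporaryDirectory() as folder:
    for name, text in [("a.csv", "1,2,3\n4,5,6\n"), ("b.csv", "7,8,9\n")]:
        with open(os.path.join(folder, name), "w", newline="") as f:
            f.write(text)
    flat = flatten_csv_files(folder)

print(flat)
```

The returned list can then be written out with a single writer.writerow call, as in the code above.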
If you are dealing specifically with the csv file format, I recommend using the csv package for the file operations. If you also use the with...as statement, you don't need to worry about closing the file. You just need to define PATH, and the program will iterate over all the .csv files in it.
Here is what you can do:
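import csv
import os

PATH = "your folder path"

def order_list():
    data_list = []
    for filename in os.listdir(PATH):
        if filename.endswith(".csv"):
            # Open each file found in PATH rather than a fixed file name.
            with open(os.path.join(PATH, filename), newline='') as csvfile:
                # QUOTE_NONNUMERIC converts every unquoted field to a float.
                read_csv = csv.reader(csvfile, delimiter=',', quoting=csv.QUOTE_NONNUMERIC)
                for row in read_csv:
                    data_list.extend(row)
    print(data_list)

if __name__ == '__main__':
    order_list()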
Store your data in pandas df
import pandas as pd
df = pd.read_csv('file.csv')
Store the modified dataframe into new one
df_2 = df.groupby('Column_Name').agg(lambda x: ' '.join(x)).reset_index() ## Write Name of your column
Write the df to new csv
df_2.to_csv("file_modified.csv")
You could do it also like this:
fIn = open("test.csv", "r")
fOut = open("output.csv", "w")
fOut.write(",".join([line for line in fIn]).replace("\n",""))
fIn.close()
fOut.close()
If you now want to run it on multiple files, you can run it as a script with arguments:
import sys
fIn = open(sys.argv[1], "r")
fOut = open(sys.argv[2], "w")
fOut.write(",".join([line for line in fIn]).replace("\n",""))
fIn.close()
fOut.close()
Assuming you are on a Linux system and the script is called csvOnliner.py, you could call it with:
for i in *.csv; do python csvOnliner.py $i changed_$i; done
With windows you could do it in a way like this:
FOR %i IN (*.csv) DO csvOnliner.py %i changed_%i
I have a .csv file (see image):
In the image there is a time column with datetime strings. My program reads this column and keeps only the time (H:M:S). I am also trying to overwrite the time column, so that a new .csv file contains only the H:M:S time stamp in that column, with the following code.
CODE:
import csv
import datetime as dt
import os
File = 'C:/Users/Alan Cedeno/Desktop/Test_Folder/HiSAM1_data_160215_164858.csv'
root, ext = os.path.splitext(File)
output = root + '-new.csv'
with open(File,'r') as csvinput,open(output, 'w') as csvoutput:
    writer = csv.writer(csvoutput, lineterminator='\n')
    reader = csv.reader(csvinput)
    all = []
    row = next(reader)
    for line in reader:
        row.append(dt.datetime.strptime(line[0],'%m/%d/%Y %H:%M:%S').time())
        all.append(row)
    for row in reader:
        row.append(row[0])
        all.append(row)
    writer.writerows(all)
The program runs and writes the H:M:S time stamps to a new .csv file. However, here is the problem: instead of replacing only the time column, the output file has every column replaced, and it looks like this. See 2nd image:
At this point I don't really know how to make the new output file look like the file in the first image, with the H:M:S format in the first column ONLY, not all scrambled like in the second image. Any suggestions?
SCREENSHOT FOR BAH:
See the K column, it should be column A of the first image, and columns B,C,D,E,F,G,I,and J should stay the same like in image 1.
Download LInk of .csv file: http://www.speedyshare.com/z2jwq/HiSAM1-data-160215-164858.csv
The main problem with your code seems to be that you keep appending the time from each line of the csv to the first row, which results in the second image posted in the question.
The idea is to keep track of the different lines and modify just the first element of each line. Also, if you want, you should keep the first line, which indicates the labels of the column. For solving the issue, the code would look like:
import csv
import datetime as dt
import os
File = 'C:/Users/Alan Cedeno/Desktop/Test_Folder/HiSAM1_data_160215_164858.csv'
root, ext = os.path.splitext(File)
output = root + '-new.csv'
with open(File,'r') as csvinput,open(output, 'w') as csvoutput:
    writer = csv.writer(csvoutput, lineterminator='\n')
    reader = csv.reader(csvinput)
    rows = [next(reader)]
    for line in reader:
        line[0] = str(dt.datetime.strptime(line[0],'%m/%d/%Y %H:%M:%S').time())
        rows.append(line)
    writer.writerows(rows)
Note the list rows has the modified lines from the csvinput.
The resulting output csv file (tested with the first line in the question duplicated) would be
With some simplified data:
#!python3
import csv
import datetime as dt
import os
File = 'data.csv'
root, ext = os.path.splitext(File)
output = root + '-new.csv'
# csv module documents opening with `newline=''` mode in Python 3.
with open(File,'r',newline='') as csvinput,open(output, 'w',newline='') as csvoutput:
    writer = csv.writer(csvoutput)
    reader = csv.reader(csvinput)
    # Copy the header
    row = next(reader)
    writer.writerow(row)
    # Edit the first column of each row.
    for row in reader:
        row[0] = dt.datetime.strptime(row[0],'%m/%d/%Y %H:%M:%S').time()
        writer.writerow(row)
Input:
Time,0.3(/L),0.5(/L)
02/15/2016 13:44:01,88452,16563
02/15/2016 13:44:02,88296,16282
Output:
Time,0.3(/L),0.5(/L)
13:44:01,88452,16563
13:44:02,88296,16282
If actually on Python 2, the csv module documents using binary mode. Replace the with line with:
with open(File,'rb') as csvinput,open(output, 'wb') as csvoutput:
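If some rows might not contain a valid datetime, strptime raises ValueError and the whole run stops. One way to guard against that is sketched below; the helper name `time_only` is made up, and passing bad values through unchanged is an assumption about what you would want, not something from the question. In-memory streams stand in for the files.

```python
import csv
import datetime as dt
import io

def time_only(stamp, fmt="%m/%d/%Y %H:%M:%S"):
    """Return the H:M:S part of a datetime string.

    Assumption: values that do not match fmt are returned unchanged.
    """
    try:
        return dt.datetime.strptime(stamp, fmt).strftime("%H:%M:%S")
    except ValueError:
        return stamp

# Simplified sample data, including one row with an unparseable time.
source = io.StringIO("Time,0.3(/L)\n02/15/2016 13:44:01,88452\nn/a,88296\n")
target = io.StringIO()
reader = csv.reader(source)
writer = csv.writer(target, lineterminator="\n")
writer.writerow(next(reader))  # copy the header unchanged
for row in reader:
    row[0] = time_only(row[0])  # only the first column is edited
    writer.writerow(row)
print(target.getvalue())
```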
You cannot overwrite a single row in the CSV file. You'll have to write all the rows you want to a new file and then rename it back to the original file name.
Your pattern of usage may fit a database better than a CSV file. Look into the sqlite3 module for a lightweight database.
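The "write to a new file, then rename" step can be sketched roughly as follows. The function name `rewrite_csv` and the `transform` callback are hypothetical; the sketch writes to a temporary file in the same folder and then uses os.replace to swap it over the original.

```python
import csv
import os
import tempfile

def rewrite_csv(path, transform):
    """Rewrite path by writing transformed rows to a temporary file in the
    same folder, then swapping it over the original."""
    folder = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(suffix=".csv", dir=folder)
    try:
        with os.fdopen(fd, "w", newline="") as dst, \
                open(path, newline="") as src:
            writer = csv.writer(dst)
            for row in csv.reader(src):
                writer.writerow(transform(row))
        os.replace(tmp_path, path)  # overwrites the original file
    except BaseException:
        os.remove(tmp_path)  # clean up the partial temporary file
        raise

# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "data.csv")
    with open(path, "w", newline="") as f:
        f.write("a,1\nb,2\n")
    # Example transform: upper-case the first column of every row.
    rewrite_csv(path, lambda row: [row[0].upper()] + row[1:])
    with open(path, newline="") as f:
        result = f.read()

print(result)
```

Keeping the temporary file in the same folder means the final swap stays on one filesystem, and the original is left untouched if something fails partway through.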
I have a csv file with 5 columns and I want to add data in a 6th column. The data I have is in an array.
Right now, the code that I have will insert the data I would want in the 6th column only AFTER all the data that already exists in the csv file.
For instance I have:
wind, site, date, time, value
10, 01, 01-01-2013, 00:00, 5.1
89.6 ---> this is the value I want to add in a 6th column but it puts it after all the data from the csv file
Here is the code I am using:
csvfile = 'filename'
with open(csvfile, 'a') as output:
    writer = csv.writer(output, lineterminator='\n')
    for val in data:
        writer.writerow([val])
I thought using 'a' would append the data in a new column, but instead it just puts it after ('under') all the other data... I don't know what to do!
Appending writes data to the end of a file, not to the end of each row.
Instead, create a new file and append the new value to each row.
import csv

csvfile = 'filename'
# newline='' belongs on open(), not on csv.reader()/csv.writer().
with open(csvfile, 'r', newline='') as fin, open('new_'+csvfile, 'w', newline='') as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout, lineterminator='\n')
    if you_have_headers:
        writer.writerow(next(reader) + [new_heading])
    # Pair each existing row with its new value.
    for row, val in zip(reader, data):
        writer.writerow(row + [val])
On Python 2.x, remove the newline='' arguments and change the filemodes from 'r' and 'w' to 'rb' and 'wb', respectively.
Once you are sure this is working correctly, you can replace the original file with the new one:
import os
os.remove(csvfile) # not needed on unix
os.rename('new_'+csvfile, csvfile)
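One caveat with zip() is that it stops at the shorter input, so if the array and the file have different lengths, the extra rows or values are dropped without warning. A sketch of a stricter version follows; the function name `append_column` and the sample data are made up, and in-memory streams stand in for the files.

```python
import csv
import io
from itertools import zip_longest

_MISSING = object()  # sentinel marking whichever input ran out first

def append_column(reader, writer, values, heading=None):
    """Write each row from reader with one extra value appended.

    Raises ValueError if the number of data rows and values differ.
    """
    if heading is not None:
        writer.writerow(next(reader) + [heading])
    for row, val in zip_longest(reader, values, fillvalue=_MISSING):
        if row is _MISSING or val is _MISSING:
            raise ValueError("row count and value count differ")
        writer.writerow(row + [val])

source = io.StringIO("wind,site\n10,01\n12,02\n")
target = io.StringIO()
append_column(csv.reader(source),
              csv.writer(target, lineterminator="\n"),
              [89.6, 90.1],
              heading="extra")
print(target.getvalue())
```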
The csv module does not support writing or appending a column in place. So the only thing you can do is read from one file, append the 6th-column data, and write to another file, as shown below:
with open('in.txt') as fin, open('out.txt', 'w') as fout:
    index = 0
    for line in fin:
        fout.write(line.replace('\n', ', ' + str(data[index]) + '\n'))
        index += 1
data is a list of ints.
I tested this code in Python and it runs fine.
We have a CSV file i.e. data.csv and its contents are:
#data.csv
1,Joi,Python
2,Mark,Laravel
3,Elon,Wordpress
4,Emily,PHP
5,Sam,HTML
Now we want to add a column in this csv file and all the entries in this column should contain the same value i.e. Something text.
Example
from csv import writer
from csv import reader
new_column_text = 'Something text'
with open('data.csv', 'r') as read_object, \
        open('data_output.csv', 'w', newline='') as write_object:
    csv_reader = reader(read_object)
    csv_writer = writer(write_object)
    for row in csv_reader:
        row.append(new_column_text)
        csv_writer.writerow(row)
Output
#data_output.csv
1,Joi,Python,Something text
2,Mark,Laravel,Something text
3,Elon,Wordpress,Something text
4,Emily,PHP,Something text
5,Sam,HTML,Something text
The append mode for opening files is meant to add data to the end of a file. What you need instead is random access when writing to the file, which the seek() method provides.
You can see an example here:
http://www.tutorialspoint.com/python/file_seek.htm
Or read the Python docs on it here: https://docs.python.org/2.4/lib/bltin-file-objects.html (which isn't terribly useful).
If you want to add to the end of a column, you may want to open the file, read a line to figure out its length, and then seek to the end.
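Before relying on seek() for this, it may help to see what writing at a seeked position actually does. In this small sketch an in-memory io.BytesIO object stands in for a real file opened in 'r+b' mode; the write replaces the bytes already at that position and nothing is shifted to make room, so in practice adding a column still tends to mean rewriting the file.

```python
import io

# An in-memory binary file stands in for a real file opened with 'r+b'.
f = io.BytesIO(b"1,2,3\n4,5,6\n")

f.seek(5)        # position of the first newline
f.write(b",9")   # overwrites the newline and the "4"; nothing is shifted
contents = f.getvalue()
print(contents)  # b'1,2,3,9,5,6\n'
```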
So I have a text file that looks like this:
1,989785345,"something 1",,234.34,254.123
2,234823423,"something 2",,224.4,254.123
3,732847233,"something 3",,266.2,254.123
4,876234234,"something 4",,34.4,254.123
...
I'm running this code right here:
file = open("file.txt", 'r')
readFile = file.readline()
lineID = readFile.split(",")
print lineID[1]
This lets me break up the content in my text file by "," but what I want to do is separate it into columns because I have a massive number of IDs and other things in each line. How would I go about splitting the text file into columns and call each individual row in the column one by one?
You have a CSV file, use the csv module to read it:
import csv
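with open('file.txt', 'rb') as csvfile:  # Python 2; on Python 3 use open('file.txt', newline='')
    reader = csv.reader(csvfile)
    for row in reader:
        print(row)  # each row is a list of column values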
This still gives you data by row, but with the zip() function you can transpose this to columns instead:
import csv
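with open('file.txt', 'rb') as csvfile:  # Python 2; on Python 3 use open('file.txt', newline='')
    reader = csv.reader(csvfile)
    for column in zip(*reader):
        print(column)  # each column is a tuple with one value per row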
Do be careful with the latter; the whole file will be read into memory in one go, and a large CSV file could eat up all your available memory in the process.
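As an illustration of both points, here is a small Python 3 sketch using two of the sample lines from the question, with an in-memory stream in place of file.txt. The first part transposes everything with zip(); the second reads only one column row by row, which avoids holding every column in memory when you only need the IDs.

```python
import csv
import io

sample = io.StringIO(
    '1,989785345,"something 1",,234.34,254.123\n'
    '2,234823423,"something 2",,224.4,254.123\n'
)

# Transpose: this reads every row into memory before producing columns.
columns = list(zip(*csv.reader(sample)))
print(columns[1])

# Lower-memory alternative when only one column is needed.
sample.seek(0)
ids = [row[1] for row in csv.reader(sample)]
print(ids)
```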