Adding a column to existing csv file as an array (Python)

I have a csv file called 'data.csv' in the following format:
test1,test2,test3
1,2,3
4,5,6
7,8,9
Given a list in the format ['test4', 4, 7, 10], how can I create a new csv file 'adjusted.csv' with all the data from data.csv and the added column like so:
test1,test2,test3,test4
1,2,3,4
4,5,6,7
7,8,9,10

Read the lines in:

    with open('data.csv', 'r') as fi:
        lines = [[i.strip() for i in line.strip().split(',')]
                 for line in fi.readlines()]
    col = ['test4', 4, 7, 10]

Concatenate each row with the corresponding element of col, using enumerate to keep track of which list index to use:

    new_lines = [line + [str(col[i])] for i, line in enumerate(lines)]

Output to file:

    with open('adjusted.csv', 'w') as fo:
        for line in new_lines:
            fo.write(','.join(line) + '\n')

I would just treat the csv like the raw text it is. Load in each line, strip off the line break, append the new entry, then put the line break back. This only works if the entries in test4 are guaranteed to be in the same order as the rows in data.csv.
If instead test4 needs to be added to rows based on meeting certain conditions, that would change things a lot. In that case you would probably want to turn both into Pandas dataframes, then perform a proper merge on the required conditions.
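A sketch of that Pandas route, with hypothetical tables and a hypothetical 'key' column (the question's data has no shared key, so this only illustrates the shape of a merge):

```python
import pandas as pd

# Hypothetical tables: the original rows plus a lookup table for test4,
# joined on a shared 'key' column instead of relying on row order.
data = pd.DataFrame({'key': ['a', 'b', 'c'], 'test1': [1, 4, 7]})
extra = pd.DataFrame({'key': ['c', 'a', 'b'], 'test4': [10, 4, 7]})

# A left merge keeps every row of data and aligns test4 by key, not position
merged = data.merge(extra, on='key', how='left')
print(merged)
```

Rows in extra can be in any order; the merge lines them up by key.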
    test4 = ['test4', 4, 7, 10]
    with open('data.csv', 'r') as ifile:
        with open('adjusted.csv', 'w') as ofile:
            for line, new in zip(ifile, test4):
                new_line = line.rstrip('\n') + ',' + str(new) + '\n'
                ofile.write(new_line)
You can also condense the first two lines into this:
    with open('data.csv', 'r') as ifile, open('adjusted.csv', 'w') as ofile:
Do whichever reads more clearly.
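One caveat worth knowing about zip: it stops at the shorter iterable, so if data.csv has more rows than test4 has entries, the extra rows are silently dropped. A tiny illustration:

```python
rows = ['test1,test2,test3\n', '1,2,3\n', '4,5,6\n', '7,8,9\n']
extra = ['test4', 4]  # deliberately too short

# zip truncates to the shorter input: only the first two rows get paired
paired = list(zip(rows, extra))
print(len(paired))
```

If you need to keep unmatched rows, itertools.zip_longest with a fillvalue is the usual alternative.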

Since you're working with csv files use the csv readers and writers to improve readability:
    import csv

    new_data = ['test4', 4, 7, 10]
    # newline='' stops csv.writer inserting extra blank lines on Windows
    with open('data.csv', 'r') as in_csv, open('adj_data.csv', 'w', newline='') as out_csv:
        reader = csv.reader(in_csv)
        writer = csv.writer(out_csv)
        for row, new_col in zip(reader, new_data):
            row.append(new_col)
            writer.writerow(row)
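The same reader/writer logic can be exercised in memory with io.StringIO, which is a handy way to sanity-check it without touching the filesystem (the sample data mirrors the question):

```python
import csv
import io

# In-memory stand-ins for data.csv and the output file
src = io.StringIO("test1,test2,test3\n1,2,3\n4,5,6\n7,8,9\n")
out = io.StringIO()
new_data = ['test4', 4, 7, 10]

reader = csv.reader(src)
writer = csv.writer(out)
for row, new_col in zip(reader, new_data):
    row.append(new_col)   # csv.writer stringifies the int for us
    writer.writerow(row)

print(out.getvalue().splitlines()[0])  # test1,test2,test3,test4
```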

Related

Checking CSV data for integer then removing that int

Looking to break up and check individual cells from a CSV file that was pulled from Excel with Python 3.8. For example, I have a CSV file with the information Honda 1, Toyota 2, Nissan 3... I want to check each cell (not sure what to call the data before the comma delimiter) for an integer and then I want to remove it but also put it in its own cell. So the CSV would then read Honda, 1, Toyota, 2, Nissan, 3... The main goal would be to get those integers in a column next to the manufacturers in Excel.
I am pretty new to python but have some coding background. The logic I was thinking of would be something along the lines of, if char is int then add to new file else add N/A. My main problem is using the data in a csv file to do it. I thought about putting the data from the csv into a variable but the real csv file has over 20,000 cells so I'm not sure if that would be very efficient.
So far my code looks like this:
    import csv

    path = '/Users/testFolder/Test.csv'
    new_path = '/Users/testFolder/Test2.csv'
    test_file = open(path, 'r')
    data = test_file.read()
    write_file = open(new_path, 'w')
    write_file.write(data)
    print(data)
    file = csv.reader(open(path), delimiter=',')
    for line in file:
        print(line)
    test_file.close()
    write_file.close()
Assuming the parts of each item are separated by one or more spaces, you can do it a row at a time (instead of reading the whole file into memory) like this:

    import csv

    path = 'remove_test.csv'
    new_path = 'remove_test2.csv'
    with open(path, 'r', newline='') as test_file, \
            open(new_path, 'w', newline='') as write_file:
        reader = csv.reader(test_file, delimiter=',')
        writer = csv.writer(write_file, delimiter=',')
        for row in reader:
            new_row = [part for item in row for part in item.split()]
            writer.writerow(new_row)
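The nested comprehension is doing both the splitting and the flattening; on a single row from the question it behaves like this:

```python
# One row as csv.reader would deliver it
row = ['Honda 1', 'Toyota 2', 'Nissan 3']

# item.split() breaks each cell on whitespace; the outer loop flattens
# the resulting sublists into one row of alternating names and numbers
new_row = [part for item in row for part in item.split()]
print(new_row)  # ['Honda', '1', 'Toyota', '2', 'Nissan', '3']
```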

How to use with open to filter datafiles in python and create new file?

I have a huge csv and I tried to filter the data using with open.
I know I can use FINDSTR on the command line, but I would like to use Python to create a new filtered file, or to create a Pandas dataframe as output.
Here is my code:

    outfile = open('my_file2.csv', 'a')
    with open('my_file1.csv', 'r') as f:
        for lines in f:
            if '31/10/2018' in lines:
                print(lines)
            outfile.write(lines)

The problem is that the generated output file is identical to the input file: nothing is filtered out, and the file size is the same.
Thanks to all
The problem with your code is the indentation of the last line. It should be within the if-statement, so only lines that contain '31/10/2018' get written.
    outfile = open('my_file2.csv', 'a')
    with open('my_file1.csv', 'r') as f:
        for lines in f:
            if '31/10/2018' in lines:
                print(lines)
                outfile.write(lines)
To filter using Pandas and creating a DataFrame, do something along the lines of:
    import pandas as pd
    import datetime

    # I assume here that the date is in a separate column, named 'Date'
    df = pd.read_csv('my_file1.csv', parse_dates=['Date'])
    # Filter on October 31st 2018
    df_filter = df[df['Date'].dt.date == datetime.date(2018, 10, 31)]
    # Output to csv
    df_filter.to_csv('my_file2.csv', index=False)
(For very large csv's, look at the pd.read_csv() argument 'chunksize')
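A rough sketch of the chunksize idea, filtering one chunk at a time so the whole file never sits in memory; io.StringIO stands in for the real my_file1.csv, and the 'Date' column name is the same assumption as above:

```python
import datetime
import io

import pandas as pd

# Small in-memory stand-in for the huge input file
raw = io.StringIO("Date,Value\n30/10/2018,1\n31/10/2018,2\n31/10/2018,3\n")
target = datetime.date(2018, 10, 31)

parts = []
# chunksize=2 yields DataFrames of at most 2 rows each
for chunk in pd.read_csv(raw, parse_dates=['Date'], dayfirst=True, chunksize=2):
    parts.append(chunk[chunk['Date'].dt.date == target])
filtered = pd.concat(parts)
print(len(filtered))  # 2 matching rows
```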
To use with open(....) as f:, you could do something like:
    import pandas as pd

    filtered_list = []
    with open('my_file1.csv', 'r') as f:
        for lines in f:
            if '31/10/2018' in lines:
                print(lines)
                # Split line by comma into list
                line_data = lines.split(',')
                filtered_list.append(line_data)
    # Convert to dataframe and export as csv
    df = pd.DataFrame(filtered_list)
    df.to_csv('my_file2.csv', index=False)

writing a text file to a csv file

I have a text file that contains a sentence in each line. Some lines are also empty.
sentence 1
sentence 2
empty line
I want to write the content of this file in a csv file in a way that the csv file has only one column and in each row the corresponding sentence is written. This is what I have tried:
    import csv

    f = open('data 2.csv', 'w')
    with f:
        writer = csv.writer(f)
        for row in open('data.txt', 'r'):
            writer.writerow(row)

    import pandas as pd
    df = pd.read_csv('data 2.csv')
Supposing that I have three sentences in my text file, I want a csv file to have one column with 3 rows. However, when I run the code above, I will get the output below:
[1 rows x 55 columns]
It seems that each character in the sentences is written in one cell and all sentences are written in one row. How should I fix this problem?
So you want to load a text file into a single column of a dataframe, one line per dataframe row. It can be done directly:

    import pandas as pd

    with open('data.txt') as file:
        df = pd.DataFrame((line.strip() for line in file), columns=['text'])

You can even filter out empty lines at read time with filter:

    with open('data.txt') as file:
        df = pd.DataFrame(filter(lambda x: len(x) > 0, (line.strip() for line in file)),
                          columns=['text'])
In your code, writer.writerow(row) treats the string row as a sequence, so every character is written to its own cell. writerow expects a sequence of fields; wrap each line in a one-element list:

    import csv

    f = open('data 2.csv', 'w')
    with f:
        writer = csv.writer(f)
        text_file = open('data.txt', 'r')
        for row in text_file.readlines():
            writer.writerow([row.strip()])
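The difference is easy to see in isolation, since writerow expects a sequence of fields and a string is a sequence of characters:

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow("sentence 1")    # string: every character becomes a cell
writer.writerow(["sentence 1"])  # one-element list: a single cell
print(buf.getvalue())
```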

How to get data with python in certain rows and columns

I have data, for example:
2017/06/07 10:42:35,THREAT,url,192.168.1.100,52.25.xxx.xxx,Rule-VWIRE-03,13423523,,web-browsing,80,tcp,block-url
2017/06/07 10:43:35,THREAT,url,192.168.1.101,52.25.xxx.xxx,Rule-VWIRE-03,13423047,,web-browsing,80,tcp,allow
2017/06/07 10:43:36,THREAT,end,192.168.1.100,52.25.xxx.xxx,Rule-VWIRE-03,13423047,,web-browsing,80,tcp,block-url
2017/06/07 10:44:09,TRAFFIC,end,192.168.1.101,52.25.xxx.xxx,Rule-VWIRE-03,13423111,,web-browsing,80,tcp,allow
2017/06/07 10:44:09,TRAFFIC,end,192.168.1.103,52.25.xxx.xxx,Rule-VWIRE-03,13423111,,web-browsing,80,tcp,block-url
How can I parse this to get only columns 4, 5, 7, and 12 from every row?
This is my code:

    import csv

    file=open('filename.log', 'r')
    f=open('fileoutput', 'w')
    lines = file.readlines()
    for line in lines:
        result.append(line.split(' ')[4,5,7,12])
        f.write(line)
    f.close()
    file.close()
The right way, with csv.reader and csv.writer objects:

    import csv

    with open('filename.log', 'r') as fr, open('filoutput.csv', 'w', newline='') as fw:
        reader = csv.reader(fr)
        writer = csv.writer(fw)
        for l in reader:
            writer.writerow(v for k, v in enumerate(l, 1) if k in (4, 5, 7, 12))
filoutput.csv contents:
192.168.1.100,52.25.xxx.xxx,13423523,block-url
192.168.1.101,52.25.xxx.xxx,13423047,allow
192.168.1.100,52.25.xxx.xxx,13423047,block-url
192.168.1.101,52.25.xxx.xxx,13423111,allow
192.168.1.103,52.25.xxx.xxx,13423111,block-url
This is wrong:

    line.split(' ')[4,5,7,12]

You can't index a list with a tuple, and the log is comma-separated rather than space-separated. You want this (note that list indices start at 0):

    fields = line.split(',')
    fields[3], fields[4], fields[6], fields[11]
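As an aside, operator.itemgetter can pick several indices in one call; shown here on dummy single-letter fields rather than the question's log:

```python
from operator import itemgetter

# Dummy comma-separated record with 12 fields
fields = 'a,b,c,d,e,f,g,h,i,j,k,l'.split(',')

# itemgetter(3, 4, 6, 11) returns a callable that picks those indices
picked = itemgetter(3, 4, 6, 11)(fields)
print(picked)  # ('d', 'e', 'g', 'l')
```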
A solution using pandas:

    import pandas as pd

    df = pd.read_csv('filename.log', sep=',', header=None, index_col=False)
    df[[3, 4, 6, 11]].to_csv('fileoutput.csv', header=False, index=False)

Note the use of [3, 4, 6, 11] instead of [4, 5, 7, 12] to account for 0-indexing in the dataframe's columns.
Content of fileoutput.csv:
192.168.1.100,52.25.xxx.xxx,13423523,block-url
192.168.1.101,52.25.xxx.xxx,13423047,allow
192.168.1.100,52.25.xxx.xxx,13423047,block-url
192.168.1.101,52.25.xxx.xxx,13423111,allow
192.168.1.103,52.25.xxx.xxx,13423111,block-url
You're on the right path, but your syntax is off. Here's an example using the csv module:

    import csv

    log = open('filename.log')
    # newline='' prevents csv.writer from inserting an extra blank line between rows
    log_write = open('fileoutput', 'w', newline='')
    csv_log = csv.reader(log, delimiter=',')
    csv_writer = csv.writer(log_write, delimiter=',')
    for line in csv_log:
        csv_writer.writerow([line[0], line[1], line[2], line[3]])  # output first 4 columns
    log.close()
    log_write.close()
Looking at list comprehensions, you could have something like this without necessarily using the csv module:

    file=open('filename.log', 'r')
    f=open('fileoutput', 'w')
    lines = file.readlines()
    for line in lines:
        f.write(','.join(line.split(',')[i] for i in [3, 4, 6, 11]))
    f.close()
    file.close()

Notice the indices are 3, 4, 6, 11 for our zero-indexed list.
output:

    cat fileoutput
192.168.1.100,52.25.xxx.xxx,13423523,block-url
192.168.1.101,52.25.xxx.xxx,13423047,allow
192.168.1.100,52.25.xxx.xxx,13423047,block-url
192.168.1.101,52.25.xxx.xxx,13423111,allow
192.168.1.103,52.25.xxx.xxx,13423111,block-url

improve my python program to fetch the desired rows using an if condition

The file unique.txt contains 2 columns separated by a tab; total.txt contains 3 columns, each separated by a tab.
I take each row from unique.txt and look for it in total.txt. If it is present, I extract the entire row from total.txt and save it in a new output file.
###Total.txt
column a column b column c
interaction1 mitochondria_205000_225000 mitochondria_195000_215000
interaction2 mitochondria_345000_365000 mitochondria_335000_355000
interaction3 mitochondria_345000_365000 mitochondria_5000_25000
interaction4 chloroplast_115000_128207 chloroplast_35000_55000
interaction5 chloroplast_115000_128207 chloroplast_15000_35000
interaction15 2_10515000_10535000 2_10505000_10525000
###Unique.txt
column a column b
mitochondria_205000_225000 mitochondria_195000_215000
mitochondria_345000_365000 mitochondria_335000_355000
mitochondria_345000_365000 mitochondria_5000_25000
chloroplast_115000_128207 chloroplast_35000_55000
chloroplast_115000_128207 chloroplast_15000_35000
mitochondria_185000_205000 mitochondria_25000_45000
2_16595000_16615000 2_16585000_16605000
4_2785000_2805000 4_2775000_2795000
4_11395000_11415000 4_11385000_11405000
4_2875000_2895000 4_2865000_2885000
4_13745000_13765000 4_13735000_13755000
My program:

    file=open('total.txt')
    file2 = open('unique.txt')
    all_content=file.readlines()
    all_content2=file2.readlines()
    store_id_lines = []
    ff = open('match.dat', 'w')
    for i in range(len(all_content)):
        line=all_content[i].split('\t')
        seq=line[1]+'\t'+line[2]
        for j in range(len(all_content2)):
            if all_content2[j]==seq:
                ff.write(seq)
                break
Problem:
But instead of giving the desired output (the values of the first column that fulfil the if condition), I need something like: if the jth row of unique.txt equals the ith row of total.txt, then write the ith row of total.txt into the new file.
    import csv

    with open('unique.txt') as uniques, open('total.txt') as total:
        uniques = list(tuple(line) for line in csv.reader(uniques, delimiter='\t'))
        totals = {}
        for line in csv.reader(total, delimiter='\t'):
            totals[tuple(line[1:])] = line

    with open('output.txt', 'w') as outfile:
        writer = csv.writer(outfile, delimiter='\t')
        for line in uniques:
            writer.writerow(totals.get(line, []))
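The point of the dict is to turn the question's nested quadratic scan into one O(1) lookup per row. Stripped to its essentials (with shortened, made-up values standing in for the real columns):

```python
# Toy rows standing in for total.txt, keyed by their last two columns
totals = {
    ('mito_205', 'mito_195'): ['interaction1', 'mito_205', 'mito_195'],
    ('chloro_115', 'chloro_35'): ['interaction4', 'chloro_115', 'chloro_35'],
}

# Pairs standing in for unique.txt; the second one has no match
uniques = [('chloro_115', 'chloro_35'), ('no_match', 'x')]

# Each membership test and lookup is O(1) on average
matches = [totals[u] for u in uniques if u in totals]
print(matches)  # [['interaction4', 'chloro_115', 'chloro_35']]
```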
I would write your code this way:

    file = open('total.txt')
    list_file = list(file)
    file2 = open('unique.txt')
    list_file2 = list(file2)
    store_id_lines = []
    ff = open('match.dat', 'w')
    for curr_line_total in list_file:
        line = curr_line_total.split('\t')
        seq = line[1] + '\t' + line[2]
        if seq in list_file2:
            ff.write(curr_line_total)
Please avoid readlines() and use the with syntax when you open your files.
Here is an explanation of why you don't need to use readlines().
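For completeness: a file object is itself an iterator over lines, so readlines() is rarely needed; io.StringIO stands in for a real file here:

```python
import io

# In-memory stand-in for an open text file
fake_file = io.StringIO("colA\tcolB\ncolC\tcolD\n")

# Iterating the file object streams one line at a time, so the whole
# file never has to sit in memory at once (unlike readlines()).
lines = [line.rstrip('\n') for line in fake_file]
print(lines)
```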
