Create new CSV that excludes rows from old CSV


I need guidance on code to write a CSV file that drops rows with specific numbers in the first column [0]. My script writes a file, but it contains the rows that I am working to delete. I suspect that I may have an issue with the spreadsheet being read as one long string rather than ~150 rows.
import csv
Property_ID_To_Delete = {4472738, 4905985, 4905998, 4678278, 4919702, 4472936, 2874431, 4949190, 4949189, 4472759, 4905977, 4905995, 4472934, 4905982, 4906002, 4472933, 4905985, 4472779, 4472767, 4472927, 4472782, 4472768, 4472750, 4472769, 4472752, 4472748, 4472751, 4905989, 4472929, 4472930, 4472753, 4933246, 4472754, 4472772, 4472739, 4472761, 4472778}
with open('2015v1.csv', 'rt') as infile:
    with open('2015v1_edit.csv', 'wt') as outfile:
        writer = csv.writer(outfile)
        for row in csv.reader(infile):
            if row[0] != Property_ID_To_Delete:
                writer.writerow(row)
Here is the data:
https://docs.google.com/spreadsheets/d/19zEMRcir_Impfw3CuexDhj8PBcKPDP46URZ9OA3uV9w/edit?usp=sharing

You need to check whether the ID in the first column is contained in the set of IDs to delete, and write the row only if it is not.
Because the set holds integers, convert the ID to an integer before the check.
Your current condition instead compares the ID in the first column with the whole set of IDs to delete.
A string is never equal to a set, so the comparison is always true:
>>> '1' != {1}
True
Therefore, you get all rows in your output.
Change:
if row[0] != Property_ID_To_Delete:
into:
if int(row[0]) not in Property_ID_To_Delete:
EDIT
You need to write the header row of your input file first, before trying to convert the first-column entry into an integer (the header text is not a number, so int() would fail on it):
with open('2015v1.csv', 'rt') as infile:
    with open('2015v1_edit.csv', 'wt') as outfile:
        writer = csv.writer(outfile)
        reader = csv.reader(infile)
        writer.writerow(next(reader))
        for row in reader:
            if int(row[0]) not in Property_ID_To_Delete:
                writer.writerow(row)
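
For reference, here is a minimal, self-contained sketch of the same idea that runs without any files. The function name drop_rows_by_id, the column names and the sample rows are illustrative assumptions, not taken from the spreadsheet in the question. It copies the header, then skips every row whose first column is an ID in the set:

```python
import csv
import io


def drop_rows_by_id(infile, outfile, ids_to_delete):
    """Copy CSV rows from infile to outfile, skipping rows whose
    first column is an integer contained in ids_to_delete.
    Returns the number of rows that were dropped."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    header = next(reader, None)
    if header is None:  # empty input: nothing to copy
        return 0
    writer.writerow(header)
    dropped = 0
    for row in reader:
        # Only convert values that look like integers, so a stray
        # non-numeric cell does not raise ValueError.
        if row and row[0].strip().isdigit() and int(row[0]) in ids_to_delete:
            dropped += 1
            continue
        writer.writerow(row)
    return dropped


# Hypothetical sample data standing in for 2015v1.csv
sample = io.StringIO("Property_ID,Name\n4472738,A\n1000001,B\n4905985,C\n")
result = io.StringIO()
dropped_count = drop_rows_by_id(sample, result, {4472738, 4905985})
print(result.getvalue())
```

With real files you would pass the objects returned by open(..., newline='') instead of the StringIO buffers.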

Related

python csv file add to field based off another field

I have a CSV file that looks like this:
I have a column called “Inventory”. I pulled its data from another source, which put it in a dictionary format, as you can see.
What I need to do is iterate through the 1000+ lines. If a line contains the keywords comforter, sheets and pillow, write “bedding” to the “Location” column for that row; otherwise write “home-fashions”.
I have gotten as far as the if statement, which tells me whether a row goes into bedding or “home-fashions”. I just do not know how to write the corresponding result to the “Location” field for that line.
In my script I'm only printing to see my results, but in the end I want to write to the same CSV file.
from csv import DictReader
with open('test.csv', 'r') as read_obj:
    csv_dict_reader = DictReader(read_obj)
    for line in csv_dict_reader:
        if 'comforter' in line['Inventory'] and 'sheets' in line['Inventory'] and 'pillow' in line['Inventory']:
            print('Bedding')
            print(line['Inventory'])
        else:
            print('home-fashions')
            print(line['Inventory'])

The last column of your csv contains commas. You cannot read it using DictReader.
import re
data = []
with open('test.csv', 'r') as f:
    # Get the header row
    header = next(f).strip().split(',')
    for line in f:
        # Parse 4 columns
        row = re.findall('([^,]*),([^,]*),([^,]*),(.*)', line)[0]
        # Create a dictionary of one row
        item = {header[0]: row[0], header[1]: row[1], header[2]: row[2],
                header[3]: row[3]}
        # Add each row to the list
        data.append(item)
After preparing your data, you can check with your conditions.
for item in data:
    if all([x in item['Inventory'] for x in ['comforter', 'sheets', 'pillow']]):
        item['Location'] = 'Bedding'
    else:
        item['Location'] = 'home-fashions'
Write output to a file.
import csv
# newline='' keeps the csv module from writing blank lines between rows on Windows
with open('output.csv', 'w', newline='') as f:
    dict_writer = csv.DictWriter(f, data[0].keys())
    dict_writer.writeheader()
    dict_writer.writerows(data)

csv.DictReader yields each row as a dict, so just assign the new value to the column:
if 'comforter' in line['Inventory'] and ...:
    line['Location'] = 'Bedding'
else:
    line['Location'] = 'home-fashions'
print(line['Inventory'])
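
Assigning to the dict only changes it in memory, so the rows still have to be written back out. A rough sketch of the full round trip, assuming the Inventory field is quoted in the file so that DictReader can parse it; the column layout, the sample rows and the name assign_locations are made up for illustration:

```python
import csv
import io

KEYWORDS = ("comforter", "sheets", "pillow")


def assign_locations(infile, outfile):
    """Read rows with DictReader, fill in 'Location', write with DictWriter."""
    reader = csv.DictReader(infile)
    # reader.fieldnames reads the header row, so the output keeps the same columns
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames,
                            lineterminator="\n")
    writer.writeheader()
    for line in reader:
        inventory = line["Inventory"]
        if all(word in inventory for word in KEYWORDS):
            line["Location"] = "Bedding"
        else:
            line["Location"] = "home-fashions"
        writer.writerow(line)


# Hypothetical sample data; the real file layout is not shown in the question
sample = io.StringIO('''Item,Location,Inventory
1,,"{'comforter': 2, 'sheets': 4, 'pillow': 1}"
2,,"{'towel': 3}"
''')
result = io.StringIO()
assign_locations(sample, result)
print(result.getvalue())
```

Writing to a second file and renaming it afterwards is usually safer than overwriting the input file while it is still being read.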

How to print specific rows in a CSV files which have a specific value in a specific column?

I am new to Python. I have a CSV file and want to print specific rows from it; I'd appreciate any guidance. For example, from the table below I want to print a row if its record number is 2:
This image shows an example of my case
I have the code below as a starter, which prints out the headers:
with open(filename, "r") as f:
    reader = csv.reader(f, delimiter="\t")
    first = next(reader)
    print(first[0].split(','))
    for row in filename:
        print()
Thanks!

Your example code seems somewhat confused. I presume the file is actually comma separated, not tab delimited; otherwise you wouldn't need to do the first[0].split(',').
Assuming that's the case, maybe something like this would work:
with open(filename, "r") as f:
    reader = csv.reader(f)
    # skip header row
    header = next(reader)
    for row in reader:
        if int(row[0]) == 2:
            print(row)
If you're after a specific row number rather than a value in a column, you could use enumerate to count rows and print when you get to the correct one.
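
A small sketch of the enumerate variant, under the assumption that the header row should not be counted and that positions start at 1. The helper name get_data_row and the sample data are hypothetical:

```python
import csv
import io


def get_data_row(infile, wanted_position):
    """Return the data row at 1-based position wanted_position
    (the header row is skipped and not counted), or None if absent."""
    reader = csv.reader(infile)
    next(reader, None)  # skip header row
    for position, row in enumerate(reader, start=1):
        if position == wanted_position:
            return row
    return None


# Hypothetical sample data; the record numbers deliberately differ from positions
sample = io.StringIO("Record,Name\n10,alpha\n20,beta\n30,gamma\n")
second_row = get_data_row(sample, 2)
print(second_row)
```

Here the lookup is by position in the file, so it returns the second data row regardless of what its first column contains.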

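In your for loop, iterate over the reader and check whether the record number, which is column 0, equals 2. The reader returns every field as a string, so compare against the string '2':
for row in reader:
    if row[0] == '2':
        print(row)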

Need help in finding the row of CSV which contains the values in array

I have an array LiveTick = ['ted3m index','US0003m index','USGG3m index'] and I am reading a CSV file book1.csv. I have to find the row which contains the values in csv.
For example, 15th row will contain ted3m index 500 | 600 and 20th row will contain US0003m index 800 | 900 and likewise.
I then have to get the values contained in the row and parse it for each value contained in array LiveTick. How do I proceed? Below is my sample code:
with open('C:\\blp\\book1.csv', 'r') as f:
    reader = csv.reader(f, delimiter=',')
    writer = csv.writer(outf)
    for row in reader:
        for list in LiveTick:
            if list in row:
                print ('Found: {}'.format(row))

You can use pandas. It's pretty fast and handles the reading, writing and filtering for you out of the box:
import pandas as pd
df = pd.read_csv('C:\\blp\\book1.csv')
filtered_df = df[df['your_column_name'].isin(LiveTick)]
# now you can save it
filtered_df.to_csv('C:\\blp\\book_filtered.csv')

You have the right idea, but there are a few improvements you can make:
Instead of a nested for loop which doesn't short-circuit, use any to compare the first column to multiple values.
Write to your csv as you go along instead of just print. This is memory-efficient, as you hold in memory only one line at any one time.
Define outf as an open object in your with statement.
Do not shadow built-in list. Use another identifier, e.g. i, for elements in LiveTick.
Here's a demo:
# open the output in text mode: binary mode ('wb') does not accept newline=''
with open('in.csv', 'r') as f, open('out.csv', 'w', newline='') as outf:
    reader = csv.reader(f, delimiter=',')
    writer = csv.writer(outf, delimiter=',')
    for row in reader:
        if any(i in row[0] for i in LiveTick):
            writer.writerow(row)

Replace column in csv with modified column

I got a csv file with a couple of columns and a header containing 4 rows. The first column contains the timestamp. Unfortunately it also gives milliseconds, but whenever those are at 00, they are not given in the file. It looks like that:
"TOA5","CR1000","CR1000","E9048"
"TIMESTAMP","RECORD","BattV_Avg","PTemp_C_Avg"
"TS","RN","Volts","Deg C"
"","","Avg","Avg"
"2015-08-28 12:40:23.51",1,12.91,32.13
"2015-08-28 12:50:43.23",2,12.9,32.34
"2015-08-28 13:12:22",3,12.91,32.54
As I don't need the milliseconds, I want to get rid of those, as this makes further calculations containing time a bit complicated. My approach so far:
Extract first 20 digits in each row to get a format such as 2015-08-28 12:40:23
timestamp = []
with open(filepath) as f:
    for _ in xrange(4): #skip 4 header rows
        next(f)
    for line in f:
        time = line[1:20] #Get values for the current line
        timestamp.append(time) #Add values to list
From here on I'm struggling with how to proceed. I want to replace the first column in the CSV file with the newly created timestamp list.
I tried creating a dictionary, but I don't know how to use the header caption in row 2 as the key:
d = {}
with open(filepath, 'rb') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    for col in csv_reader:
        #use header info from row 2 as key here
This would import the whole csv file into a dict and I'd then change the TIMESTAMP entry in the dict with the timestamp list above. Is this even possible?
Or is there an easier approach on how to just change the first column in the csv with my new list so that my csv file in the end contains the timestamp just without the millisecond information?
So the first column in my csv should look like this:
"TOA5"
"TIMESTAMP"
"TS"
""
2015-08-28 12:40:23
2015-08-28 12:50:43
2015-08-28 13:12:22

This should do it and preserve the quoting:
with open(filepath1, 'rb') as fin, open(filepath2, 'wb') as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout, quoting=csv.QUOTE_NONNUMERIC)
    for _ in xrange(4):  # copy first 4 header rows
        writer.writerow(next(reader))
    for row in reader:  # process data lines
        row[0] = row[0][:19]  # strip fractional seconds from first column
        writer.writerow([row[0], int(row[1])] + map(float, row[2:]))
Since a csv.reader returns the columns of each row as a list of strings, it's necessary to convert any which contain numeric values into their actual int or float numeric value before they're written out to prevent them from being quoted.
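
The code above is written for Python 2 (xrange, binary file modes, map returning a list). A rough Python 3 equivalent might look like the sketch below; it uses in-memory buffers and a shortened copy of the sample data so it runs on its own, and the function name strip_fractional_seconds is an assumption for illustration:

```python
import csv
import io


def strip_fractional_seconds(infile, outfile, header_rows=4):
    """Copy the header rows unchanged, then truncate the first column
    of every data row to 'YYYY-MM-DD HH:MM:SS' (19 characters)."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, quoting=csv.QUOTE_NONNUMERIC,
                        lineterminator="\n")
    for _ in range(header_rows):
        writer.writerow(next(reader))
    for row in reader:
        timestamp = row[0][:19]
        # Convert numeric columns so QUOTE_NONNUMERIC leaves them unquoted
        writer.writerow([timestamp, int(row[1])] + [float(v) for v in row[2:]])


sample = io.StringIO(
    '"TOA5","CR1000","CR1000","E9048"\n'
    '"TIMESTAMP","RECORD","BattV_Avg","PTemp_C_Avg"\n'
    '"TS","RN","Volts","Deg C"\n'
    '"","","Avg","Avg"\n'
    '"2015-08-28 12:40:23.51",1,12.91,32.13\n'
    '"2015-08-28 13:12:22",3,12.91,32.54\n'
)
result = io.StringIO()
strip_fractional_seconds(sample, result)
print(result.getvalue())
```

With real files in Python 3, open both in text mode with newline='' rather than 'rb' and 'wb'.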

I believe you can easily create a new CSV by iterating over the original one and replacing the timestamp as you go.
Example:
with open(filepath, 'rb') as csv_file, open('<new file>','wb') as outfile:
    csv_reader = csv.reader(csv_file, delimiter=',')
    csv_writer = csv.writer(outfile, delimiter=',')
    for i, row in enumerate(csv_reader): #Enumerating as we only need to change rows after 3rd index.
        if i <= 3:
            csv_writer.writerow(row)
        else:
            # csv.reader has already removed the surrounding quotes, so slice from 0
            csv_writer.writerow([row[0][:19]] + row[1:])

I'm not entirely sure how you parse your CSV, but I would do something like this:
time = time.split(".")[0]
If the timestamp has a fractional part, it gets removed; if it doesn't, nothing happens.
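
A quick sketch of how the split behaves for both kinds of timestamp (drop_fraction is a hypothetical helper name; the values come from the sample data in the question):

```python
def drop_fraction(timestamp):
    """Return the text before the first '.', or the whole string if there is none."""
    return timestamp.split(".")[0]


with_fraction = drop_fraction("2015-08-28 12:40:23.51")
without_fraction = drop_fraction("2015-08-28 13:12:22")
print(with_fraction)
print(without_fraction)
```

This relies on the timestamp containing no other period, which holds for the format shown in the question.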

Parsing CSV files using Python 2.7

I'm trying to write a script that will open a CSV file and write rows from that file to a new CSV file based on the match criteria of a unique telephone number in column 4 of csv.csv. The phone numbers are always in column 4, and are often duplicated in the file, however the other columns are often unique, thus each row is inherently unique.
A row from the csv file I'm reading looks like this: (the TN is 9259991234)
2,PPS,2015-09-17T15:44,9259991234,9DF51758-A2BD-4F65-AAA2
I hit an error with the code below saying that '_csv.writer' is not iterable and I'm not sure how to modify my code to solve the problem.
import csv
import sys
import os
os.chdir(r'C:\pTest')
with open(r'csv.csv', 'rb') as f:
    reader = csv.reader(f, delimiter=',')
    with open (r'new_csv.csv', 'ab') as new_f:
        writer = csv.writer(new_f, delimiter=',')
        for row in reader:
            if row[3] not in writer:
                writer.writerow(new_f)

Your error stems from this expression:
row[3] not in writer
You cannot test for membership against a csv.writer() object. If you want to know whether you have already processed a phone number, use a separate set() object to track the numbers you have seen:
with open(r'csv.csv', 'rb') as f:
    reader = csv.reader(f, delimiter=',')
    with open (r'new_csv.csv', 'ab') as new_f:
        writer = csv.writer(new_f, delimiter=',')
        seen = set()
        for row in reader:
            if row[3] not in seen:
                seen.add(row[3])
                writer.writerow(row)
Note that I also changed your writer.writerow() call; you want to write the row, not the file object.
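
As a self-contained illustration of the seen-set pattern (written for Python 3 rather than 2.7, with in-memory buffers instead of files), the sketch below keeps only the first row for each phone number. The function name keep_first_per_key and the second and third sample rows are invented for the example:

```python
import csv
import io


def keep_first_per_key(infile, outfile, key_index=3):
    """Write only the first row seen for each distinct value in column key_index.
    Returns the set of distinct key values encountered."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    seen = set()
    for row in reader:
        key = row[key_index]
        if key not in seen:
            seen.add(key)
            writer.writerow(row)
    return seen


# First row is from the question; the other two are hypothetical
sample = io.StringIO(
    "2,PPS,2015-09-17T15:44,9259991234,9DF51758-A2BD-4F65-AAA2\n"
    "3,PPS,2015-09-17T15:45,9259991234,HYPOTHETICAL-ID-0002\n"
    "4,PPS,2015-09-17T15:46,9255550000,HYPOTHETICAL-ID-0003\n"
)
result = io.StringIO()
numbers = keep_first_per_key(sample, result)
print(result.getvalue())
```

Set membership tests take roughly constant time, so this stays fast even when the file contains many duplicated numbers.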
