removing items from current csv and saving it into another csv file - python

I have a CSV file with 1000 entries (it is tab-delimited). I've only listed the first few.
Unique ID Name
0 60ff3ads Keith
1 C6LSI545 Shawn
2 O87SI523 Baoru
3 OM022SSI Naomi
4 3LLS34SI Alex
5 Z7423dSI blahblah
I want to remove some of these entries by their index number from this CSV file and save the result into another CSV file.
I haven't started writing any code for this yet because I'm not sure how I should go about doing it. Please kindly advise.

A one-liner to solve your problem:
import pandas as pd
indexes_to_drop = [1, 7, ...]
pd.read_csv('original_file.csv', sep='\t').drop(indexes_to_drop, axis=0).to_csv('new_file.csv')
Check the read_csv documentation to accommodate your particular CSV flavor if needed.

The sample data suggests a tab-delimited file. You could open the input file with a csv.reader, and open an output file with a csv.writer. It will be slightly simpler, however, if you use split() to grab the first field (the index) and compare it with the indices that you want to filter out.
indices_to_delete = ['0', '3', '5']
with open('input.csv') as infile, open('output.csv', 'w') as outfile:
    for line in infile:
        if line.split()[0] not in indices_to_delete:
            outfile.write(line)
This could be reduced to this:
with open('input.csv') as infile, open('output.csv', 'w') as outfile:
    outfile.writelines(line for line in infile
                       if line.split()[0] not in indices_to_delete)
And that should do the trick in this case for the sort of data that you posted. If you find that you need to compare values in other fields containing whitespace, you should consider the csv module.
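For comparison, the csv.reader/csv.writer route mentioned above might look roughly like the following. This is only a sketch: the function name drop_rows_by_index is made up for illustration, the sample text is a shortened in-memory stand-in for the real file, and it assumes the first line is a header that should always be kept.

```python
import csv
import io

def drop_rows_by_index(infile, outfile, indices_to_delete):
    """Copy tab-delimited rows, skipping rows whose first field is listed.

    Hypothetical helper; assumes the first line is a header to keep.
    """
    reader = csv.reader(infile, delimiter='\t')
    writer = csv.writer(outfile, delimiter='\t', lineterminator='\n')
    header = next(reader, None)
    if header is not None:
        writer.writerow(header)
    for row in reader:
        # skip blank lines and rows whose index is in the delete set
        if row and row[0] not in indices_to_delete:
            writer.writerow(row)

# In-memory demonstration; with real files, open them with newline=''.
sample = "\tUnique ID\tName\n0\t60ff3ads\tKeith\n1\tC6LSI545\tShawn\n2\tO87SI523\tBaoru\n"
source = io.StringIO(sample, newline='')
result = io.StringIO(newline='')
drop_rows_by_index(source, result, {'0', '2'})
filtered_text = result.getvalue()
```

Because the rows are split on tabs only, a field such as "Unique ID" that contains a space stays in one piece, which is the case where split() would start to misbehave.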

I don't think it is possible to remove lines in place. However, you could write two new files. Go over each row of the original CSV and save it either to csv-A or to csv-B. That way you end up with two separate CSV files.
More info here: How to Delete Rows CSV in python

Related

How to replace characters in a csv file

I'm doing some measurements in the lab and want to transform them into some nice Python plots. The problem is the way the software exports CSV files, as I can't find a way to properly read the numbers. It looks like this:
-10;-0,0000026
-8;-0,00000139
-6;-0,000000546
-4;-0,000000112
-2;-5,11E-09
0,0000048;6,21E-09
2;0,000000318
4;0,00000304
6;0,0000129
8;0,0000724
10;0,000268
Separation by ; is fine, but I need every , to be ..
Ideally I would like Python to be able to read numbers such as 6.21E-09 as well, but I should be able to fix that in excel...
My main issue: Change every , to . so Python can read them as a float.
The simplest way would be to treat each value as a string and then use the .replace() method. For example:
txt = "0,0000048;6,21E-09"
txt = txt.replace(',', '.')
You could also read the CSV file with a library (I don't know how you are reading the file); depending on the library, you can change the delimiter (to ; for example). CSV stands for comma-separated values and, as the name implies, it separates columns with ',' by default.
You can do whatever you want in Python, for example:
import csv
with open('path_to_csv_file', 'r', newline='') as csv_file:
    data = list(csv.reader(csv_file, delimiter=';'))
# both columns may use a decimal comma, so convert every field to float
data = [[float(field.replace(',', '.')) for field in row] for row in data]
with open('path_to_csv_file', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file, delimiter=';')
    writer.writerows(data)
You could also consider a regex that matches every ',' in the text, then replace each match with '.'.
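A minimal sketch of that regex idea, assuming every comma that sits between two digits is a decimal comma (which holds for the sample in the question, since ; separates the columns). The names DECIMAL_COMMA and parse_line are illustrative only.

```python
import re

# A comma with a digit on both sides is treated as a decimal comma.
DECIMAL_COMMA = re.compile(r'(?<=\d),(?=\d)')

def parse_line(line):
    """Turn one 'x;y' line with decimal commas into a list of floats."""
    fixed = DECIMAL_COMMA.sub('.', line.strip())
    return [float(field) for field in fixed.split(';')]

sample_lines = ["-2;-5,11E-09", "0,0000048;6,21E-09", "10;0,000268"]
rows = [parse_line(line) for line in sample_lines]
```

Note that float() already understands exponent notation such as 6.21E-09, so no extra clean-up in a spreadsheet should be needed for those values.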

How to print certain specific character from row to column?

I have an input file that contains data in the same format repeated in blocks of 5 rows. I need to format this data into one row per block (in a CSV file), keeping only the few fields that are relevant to me. How do I achieve the output shown for the input file provided?
Note - I'm very new to programming and haven't yet learned enough to write this on my own. I have already written code that imports the input file, finds a specific word and then prints the rest of the data (this is where I need help: I don't need all the information in the input, and using a space as the delimiter does not put the output in the correct columns). I have also written the code to write the output to a CSV file.
Note 2 - I'm very new to this forum as well, so kindly excuse me if I have made any mistakes in posting my query.
Input -
Input File
Output -
Output File
import itertools, csv
You should read in the file and parse it manually, then use the csv module to write it to a .csv file:
import csv
import re
with open('myfile.txt', 'r') as f:
    lines = f.readlines()
# divide on runs of two or more whitespace characters, but not single spaces
lines = [re.split(r"\s\s+", line.strip()) for line in lines]
with open('output.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)
    for line in lines:
        writer.writerow(line)
But this will include every piece of data. You can iterate through lines and remove the fields you don't want to keep. So before you do the csv writing, you could do:
def filter_line(line):
    # see how the input file was parsed
    print(line)
    # for example, only keep the first 2 columns
    return [line[0], line[1]]
lines = [filter_line(line) for line in lines]

How to read CSV with column with more than one element in Python

I have the following CSV file:
id;name;duration;predecessors;
10;A;7;;
20;B;10;10;
25;B2;3;10;
30;C;5;10;
40;D;5;20,30, 25;
That is, the last row, in the fourth column I have three elements (20,30,25) separated by comma.
I have the following code:
csv_file = open(path_to_csv, 'r')
csv_file_reader = csv.reader(csv_file, delimiter=',')
first_row = True
for row in csv_file_reader:
    if not first_row:
        print(row)
    else:
        first_row = False
but I get a weird output:
['10;A;7;;']
['20;B;10;10;']
['25;B2;3;10;']
['30;C;5;10;']
['40;D;5;20', '30', ' 25;']
Any ideas?
Thanks in advance
You have specified CSV in your description, which stands for Comma Separated Values. However, your data uses semicolons.
Consider specifying the delimiter as ; for the CSV library:
with open(path_to_csv, 'r') as csv_file:
    csv_file_reader = csv.reader(csv_file, delimiter=';')
    ...
And while we're here, note the change to using the with statement to open the file. The with statement allows you to open the file in a language-robust manner. No matter what happens (exception, quit, etc.), Python guarantees that the file will be closed and all resources accounted for. You don't need to close the file, just exit the block (unindent). It's "Pythonic" and a good habit to get into.
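Putting the ; delimiter together with the comma-separated fourth column, a sketch along these lines could read each row and split the predecessors into a list. The function name read_tasks and the in-memory sample are assumptions for illustration, not part of the original code.

```python
import csv
import io

# Shortened copy of the data from the question, held in memory for the demo.
data = "id;name;duration;predecessors;\n10;A;7;;\n40;D;5;20,30, 25;\n"

def read_tasks(csv_file):
    """Read ;-delimited rows and split the predecessors column on commas."""
    reader = csv.DictReader(csv_file, delimiter=';')
    tasks = []
    for row in reader:
        # '20,30, 25' -> [20, 30, 25]; an empty column gives an empty list
        predecessors = [int(p) for p in row['predecessors'].split(',') if p.strip()]
        tasks.append((int(row['id']), row['name'], int(row['duration']), predecessors))
    return tasks

tasks = read_tasks(io.StringIO(data, newline=''))
```

The trailing ; on every line produces an extra empty column, which DictReader stores under an empty-string key and which the sketch simply ignores.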
✓ @Antonio, I appreciate the above answer. As we know, CSV is a file with comma-separated values, and Python's csv module works based on this by default.
✓ No problem, you can still read from it without using the csv module.
✓ Based on the input provided in your question, I have written another simple solution without using any Python module to read CSVs (it's OK for simple tasks).
Please read, try and comment if you are not satisfied with the code or if it fails for some of your test cases. I will modify it and make it work.
» Data.csv
id;name;duration;predecessors;
10;A;7;;
20;B;10;10;
25;B2;3;10;
30;C;5;10;
40;D;5;20,30, 25;
Now, have a look at the code below (it finds and prints all the lines whose 4th column has more than one element):
with open("Data.csv") as csv_file:
    for line in csv_file.readlines()[1:]:
        arr = line.strip().split(";")
        if len(arr[3].split(",")) > 1:
            print(line)  # 40;D;5;20,30, 25;

Compare CSV Values Against Another CSV And Output Results

I have a csv containing various columns (full_log.csv). One of the columns is labeled "HASH" and contains the hash value of the file shown in that row. For Example, my columns would have the following headers:
Filename - Hash - Hostname - Date
I need my Python script to take another CSV (hashes.csv) containing only 1 column of multiple hash values, and compare the hash values against the HASH column in my full_log.csv.
Anytime it finds a match I want it to output the entire row that contains the hash to an additional CSV (output.csv). So my output.csv will contain only the rows of full_log.csv that contain any of the hash values found in hashes.csv, if that makes sense.
So far I have the following. It works for the hash value that I manually enter in the script, but now I need it to look at hashes.csv to compare instead of manually putting the hash in the script, and instead of printing the results I need to export them to output.csv.
import csv
with open('full_log.csv', 'rb') as input_file1:
    reader = csv.DictReader(input_file1)
    rows = [row for row in reader if row['HASH'] == 'FB7D9605D1A38E38AA4C14C6F3622E5C3C832683']
    for row in rows:
        print row
I would generate a set from the hashes.csv file. Using membership in that set as a filter, I would iterate over the full_log.csv file, outputting only those lines that match.
import csv
with open('hashes.csv') as hashes:
    hashes = csv.reader(hashes)
    hashes = set(row[0] for row in hashes)
with open('full_log.csv') as input_file:
    reader = csv.DictReader(input_file)
    with open('output.csv', 'w') as output_file:
        writer = csv.DictWriter(output_file, reader.fieldnames)
        writer.writeheader()
        writer.writerows(row for row in reader if row['Hash'] in hashes)
Look at the pandas library for Python:
http://pandas.pydata.org/pandas-docs/stable/
It has various helpful functions for your question, and makes it easy to read, transform and write CSV files.
Iterating through the rows of both files and using a filter with any to return the rows whose hash appears in the collection of hashes:
import csv
with open('full_log.csv', newline='') as file1, open('hashes.csv', newline='') as file2:
    reader = csv.DictReader(file1)
    # read the hashes into a list first: a reader can only be iterated once
    # (this assumes hashes.csv has a 'Hash' header row)
    hash_rows = list(csv.DictReader(file2))
    matching_rows = [row for row in reader if any(row['Hash'] == r['Hash'] for r in hash_rows)]
with open('output.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(matching_rows)
I am a bit unclear as to exactly how much help that you require in solving this. I will assume that you do not need a full solution, but rather, simply tips on how to craft your solution.
First question: which file is larger? If you know that hashes.csv is not too large, meaning it will fit in memory with no problem, then I would simply read that file one line at a time and store each hash entry in a set. I won't provide full code, but the general structure is as follows:
import csv
hashes = set()
with open('hashes.csv', newline='') as hash_file:
    for row in csv.reader(hash_file):
        hashes.add(row[0])  # the hash from the line
Now, I believe you already know how to read a CSV file, since you have an example above, but what you want to do now is iterate through each row in the full log CSV file. For each of those rows, do not check whether the hash is a specific value; instead, check whether that value is contained in the hashes variable. If it is, then use the CSV writer to write that single row to a file.
The biggest gotcha, I think, is knowing if the hashes will always be in a particular case so that you can perform the compare. For example, if one file uses uppercase for the HASH and the other uses lowercase, then you need to be sure to convert to use the same case.
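One way to guard against the case mismatch is to normalise both sides before comparing. The sketch below is an illustration under several assumptions: filter_by_hashes is a made-up name, the column is labelled HASH as in the question, hashes.csv has no header row, and the sample rows are invented purely for the demonstration.

```python
import csv
import io

def filter_by_hashes(log_file, hash_file, out_file, column='HASH'):
    """Write the log rows whose hash appears in hash_file, ignoring case."""
    # Normalise every wanted hash to upper case (assumes no header row).
    wanted = {row[0].strip().upper() for row in csv.reader(hash_file) if row}
    reader = csv.DictReader(log_file)
    writer = csv.DictWriter(out_file, fieldnames=reader.fieldnames,
                            lineterminator='\n')
    writer.writeheader()
    for row in reader:
        if row[column].strip().upper() in wanted:
            writer.writerow(row)

# Invented sample data, only to show the case-insensitive match.
log_text = ("Filename,HASH,Hostname,Date\n"
            "a.exe,fb7d,host1,2020-01-01\n"
            "b.exe,00AA,host2,2020-01-02\n")
out = io.StringIO(newline='')
filter_by_hashes(io.StringIO(log_text, newline=''),
                 io.StringIO("FB7D\n", newline=''), out)
output_text = out.getvalue()
```

The rows are written back unchanged, so the output keeps whatever case the log file used; only the comparison is case-insensitive.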

How to import data from a CSV file and store it in a variable?

I am extremely new to python 3 and I am learning as I go here. I figured someone could help me with a basic question: how to store text from a CSV file as a variable to be used later in the code. So the idea here would be to import a CSV file into the python interpreter:
import csv
with open('some.csv', 'rb') as f:
    reader = csv.reader(f)
    for row in reader:
        ...
and then extract the text from that file and store it as a variable (i.e. w = ["csv file text"]) to then be used later in the code to create permutations:
print (list(itertools.permutations(["w"], 2)))
If someone could please help and explain the process, it would be very much appreciated as I am really trying to learn. Please let me know if any more explanation is needed!
itertools.permutations() wants an iterable (e.g. a list) and a length as its arguments, so your data structure needs to reflect that, but you also need to define what you are trying to achieve here. For example, if you wanted to read a CSV file and produce permutations on every individual CSV field you could try this:
import csv
import itertools
with open('some.csv', newline='') as f:
    reader = csv.reader(f)
    w = []
    for row in reader:
        w.extend(row)
print(list(itertools.permutations(w, 2)))
The key thing here is to create a flat list that can be passed to itertools.permutations() - this is done by intialising w to an empty list, and then extending its elements with the elements/fields from each row of the CSV file.
Note: As pointed out by @martineau, for the reasons explained here, the file should be opened with newline='' when used with the Python 3 csv module.
If you want to use Python 3 (as you state in the question) and to process the CSV file using the standard csv module, you should be careful about how you open the file. So far, your code and the answers use the Python 2 way of opening the CSV file. Things have changed in Python 3.
As shengy wrote, the CSV file is just a text file, and the csv module gets the elements as strings. Strings in Python 3 are unicode strings. Because of that, you should open the file in the text mode, and you should supply the encoding. Because of the nature of CSV file processing, you should also use the newline='' when opening the file.
Now, extending the explanation of Burhan Khalid... When reading the CSV file, you get the rows as lists of strings. If you want to read all the content of the CSV file into memory and store it in a variable, you probably want to use the list of rows (i.e. a list of lists where the nested lists are the rows). The for loop iterates through the rows. In the same way, the list() function iterates through the sequence (here, the sequence of rows) and builds the list of the items. To combine that with the wish to store everything in the content variable, you can write:
import csv
with open('some.csv', newline='', encoding='utf_8') as f:
    reader = csv.reader(f)
    content = list(reader)
Now you can do your permutation as you wish. The itertools is the correct way to do the permutations.
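As a rough illustration of that last step: content is a list of rows, so if the permutations should be over individual fields rather than whole rows, the rows need to be flattened first. The sketch below assumes that is the goal and uses a tiny in-memory sample in place of some.csv.

```python
import csv
import io
import itertools

# Tiny in-memory stand-in for some.csv (an assumption for the demo).
text = "a,b\nc\n"
content = list(csv.reader(io.StringIO(text, newline='')))

# Flatten the list of rows into one list of fields.
fields = list(itertools.chain.from_iterable(content))

# All ordered pairs of distinct positions in the flat list.
pairs = list(itertools.permutations(fields, 2))
```

If whole rows should be permuted instead, content can be passed to itertools.permutations() directly, since each row is just another element of the list.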
import csv
with open('FileName.csv', newline='') as f:
    data = csv.DictReader(f)
    print(data.fieldnames)
    output = []
    for each_row in data:
        row = {}
        try:
            # strip whitespace from the keys and drop 'null' values
            p = {k.strip(): v for k, v in each_row.items() if v.lower() != 'null'}
        except AttributeError as e:
            print(e)
            print(each_row)
            raise
        # repeat for each column you need
        if p.get('col1'):
            row['col1'] = p['col1']
        if p.get('col2'):
            row['col2'] = p['col2']
        output.append(row)
Finally, all the data is stored in the output variable.
Is this what you need?
import csv
with open('some.csv', newline='') as f:
    reader = csv.reader(f, delimiter=',')
    rows = list(reader)
print('The csv file had {} rows'.format(len(rows)))
for row in rows:
    do_stuff(row)
do_stuff_to_all_rows(rows)
The interesting line is rows = list(reader), which converts each row from the csv file (which will be a list), into another list rows, in effect giving you a list of lists.
If you had a csv file with three rows, rows would be a list with three elements, each element a row representing each line in the original csv file.
If all you care about is to read the raw text in the file (csv or not) then:
with open('some.csv') as f:
    w = f.read()
will be a simple solution to having w="csv, file, text\nwithout, caring, about columns\n"
You should try pandas, which works with both Python 2.7 and Python 3.2+:
import pandas as pd
csv = pd.read_csv("your_file.csv")
Then you can handle your data easily.
More fun here
First, a CSV file is a text file too, so everything you can do with a file, you can do with a CSV file. That means f.read(), f.readline() and f.readlines() can all be used. See detailed information on these functions here.
But, as your file is a csv file, you can utilize the csv module.
# input.csv
# 1,david,enterprise
# 2,jeff,personal
import csv
with open('input.csv', newline='') as f:
    reader = csv.reader(f)
    for serial, name, version in reader:
        # The csv module already extracts the information for you
        print(serial, name, version)
More details about the csv module are here.
