Remove rows that contain empty cells from a CSV using Python

I am splitting a CSV file based on a column with dates into separate files. However, some rows contain a date while the other cells are empty. I want to remove these rows with empty cells from the CSV, but I'm not sure how to do this.
Here is my code:
csv.field_size_limit(sys.maxsize)
with open(main_file, "r") as fp:
    root = csv.reader(fp, delimiter='\t', quotechar='"')
    result = collections.defaultdict(list)
    next(root)
    for row in root:
        year = row[0].split("-")[0]
        result[year].append(row)
for i, j in result.items():
    row_count = sum(1 for row in j)
    print(row_count)
    file_path = "%s%s-%s.csv" % (src_path, i, row_count)
    with open(file_path, 'w') as fp:
        writer = csv.writer(fp, delimiter='\t', quotechar='"')
        writer.writerows(j)

Pandas is perfect for this, especially if you want the solution to be easily adjusted to, say, other file formats. Of course, one could consider it overkill.
To just remove rows with empty cells:
>>> import pandas as pd
>>> data = pd.read_csv('example.csv', sep='\t')
>>> print data
A B C
0 1 2 5
1 NaN 1 9
2 3 4 4
>>> data.dropna()
A B C
0 1 2 5
2 3 4 4
>>> data.dropna().to_csv('example_clean.csv')
I leave performing the splitting and saving into separate files using pandas as an exercise to start learning this great package if you want :)
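If you do want a head start, below is a minimal sketch of the split-by-year step, assuming the same tab-delimited layout as the question, with the date in the first column (main_file and src_path are the question's own names):
import pandas as pd

data = pd.read_csv(main_file, sep='\t')
data = data.dropna()  # drop rows that contain empty cells
# group on the year prefix of the first (date) column, one output file per year
year = data.iloc[:, 0].str.split('-').str[0]
for y, group in data.groupby(year):
    group.to_csv('%s%s-%d.csv' % (src_path, y, len(group)), sep='\t', index=False)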

This would skip all rows with at least one empty cell:
with open(main_file, "r") as fp:
    ....
    for row in root:
        if not all(map(len, row)):
            continue
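For context, a minimal sketch of how that check slots into the splitting loop from the question (main_file and result are the question's own names):
import collections
import csv

result = collections.defaultdict(list)
with open(main_file, "r") as fp:
    root = csv.reader(fp, delimiter='\t', quotechar='"')
    next(root)  # skip the header row
    for row in root:
        if not all(map(len, row)):  # at least one cell is empty
            continue
        result[row[0].split("-")[0]].append(row)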

Pandas is the best tool in Python for handling any type of data processing. For help, you can go through this link: http://pandas.pydata.org/pandas-docs/stable/10min.html

Related

Remove blank cells from CSV file with Python

I have a text file that I am converting to CSV using Python. The text file has columns that are separated by several spaces. My code strips each line, converts 2 spaces in a row to commas, and then splits the lines again. When I do this, the columns don't line up, because some columns have more blank spaces than others. How can I add something to my code that will remove the blank cells in my CSV file?
I have tried converting the CSV file to a pandas DataFrame, but when I run
import pandas as pd
df = pd.read_csv('old.Csv')
delim_whitespace=True
df.to_csv("New.Csv", index=False)
it returns the error ParserError: Error tokenizing data. C error: Expected 40 fields in line 10, saw 42.
The code that is stripping the lines and splitting them is
import csv
txtfile = r"Old.txt"
csvfile = r"Old.Csv"
with open(txtfile, 'r') as infile, open(csvfile, 'w', newline='') as outfile:
stripped = (line.strip() for line in infile)
replace = (line.replace("  ", ",") for line in stripped if line)
lines = (line.split(",") for line in replace if infile)
writer = csv.writer(outfile)
writer.writerows(lines)
One solution is to declare column names beforehand, so as to force pandas to accept data with a different number of columns. Something like this should work:
df = pd.read_csv('myfilepath', names = ['col1', 'col2', 'col3'])
You will have to adapt the separator and the column names / number of columns yourself.
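Alternatively, since the columns are separated by runs of spaces, a regex separator may avoid the manual replace step entirely. A sketch, assuming the original Old.txt layout and that no cell is completely empty (empty cells would still shift the columns):
import pandas as pd

# treat any run of whitespace as a single delimiter
df = pd.read_csv('Old.txt', sep=r'\s+', header=0)
df.to_csv('New.Csv', index=False)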
(Edited) The code below should work for your text file:
a b c d e
=============================
1 qwerty 3 4 5 6
2 ewer e r y i
3 asdfghjkutrehg c v b n
you can try:
import pandas as pd
df = pd.read_fwf('textfile.txt', delimiter=' ', header=0, skiprows=[1])
df.to_csv("New.csv", index=False)
print(df)
Unnamed: 0 a b c d e
0 1 qwerty 3 4 5 6
1 2 ewer e r y i
2 3 asdfghjkutrehg c v b n

Replace a particular string in a particular row and column of a CSV in Python

I have looked around but didn't quite find anything addressing my problem.
My question is about the CSV file called registro_usuarios.csv, which has the following data, with 6 rows and 6 columns.
registro_usuarios.csv:
RUT,Name,LastName,E-mail,ISBN1,ISBN2
111,Pablo1,Alar1,mail1,0,0
222,Pablo2,Alar2,mail2,0,0
333,Pablo3,Alar3,mail3,0,0
444,Pablo4,Alar4,mail4,0,0
555,Pablo5,Alar5,mail5,0,0
Now, how can I make a def that allows me to replace a 0 below ISBN1 or ISBN2 for a given RUT? For example, I want to replace the ISBN1 of RUT 333 with 777. After using the def, it should change the data in the CSV like this. Since rows and columns start at 0,0, I believe the ISBN1 of RUT 333 is row 3, column 4, if I'm not mistaken.
registro_usuarios.csv:
RUT,Nombre,Apellido,E-mail,ISBN1,ISBN2
111,Pablo1,Alar1,mail1,0,0
222,Pablo2,Alar2,mail2,0,0
333,Pablo3,Alar3,mail3,777,0
444,Pablo4,Alar4,mail4,0,0
555,Pablo5,Alar5,mail5,0,0
This is the simplest way I could think of:
>>> import csv
>>> with open('data.csv') as f:
... data = [r for r in csv.reader(f)]
...
>>> data[3][4] = '777'
>>> with open('data.csv', 'w') as f:
... csv.writer(f).writerows(data)
...
As you mentioned, you want to alter a row with a specific RUT. In that case I would use DictReader/DictWriter and a function to change the row based on the RUT:
import csv

def change_by_rut(csv_data, rut, new_isbn1):
    for row in csv_data:
        if row['RUT'] == rut:
            row['ISBN1'] = new_isbn1

with open('data.csv') as f:
    data = [r for r in csv.DictReader(f)]

change_by_rut(data, '333', '777')

with open('data.csv', 'w') as f:
    writer_obj = csv.DictWriter(f, fieldnames=data[0].keys())
    writer_obj.writeheader()
    writer_obj.writerows(data)
You could use pandas too:
# recreate file
from pathlib import Path
Path('temp.txt').write_text("""RUT,Name,LastName,E-mail,ISBN1,ISBN2
111,Pablo1,Alar1,mail1,0,0
222,Pablo2,Alar2,mail2,0,0
333,Pablo3,Alar3,mail3,0,0
444,Pablo4,Alar4,mail4,0,0
555,Pablo5,Alar5,mail5,0,0""")
import pandas as pd
df = pd.read_csv('temp.txt', index_col=0)
df.loc[333,'ISBN1'] = 777
print(df)
# export example
df.to_csv('temp2.txt')

How to sort the nth column in a text file

So my data looks like this:
1 3456542 5 may 2014
2 1245678 4 may 2014
3 4256876 2 may 2014
4 5643156 6 may 2014
.....
I want to sort the 2nd column of 7-digit ID numbers from greatest to least. Also, depending on the first digit of the ID number, I'd like to send each row to a different text file (i.e. for all the ID numbers that start with 3, send that entire row into one text file; for all the ID numbers that start with 1, send that entire row to another text file; and so on). What is the easiest way to accomplish something like this?
You could try using pandas. That makes it really easy.
import pandas as pd
import sys
if sys.version_info[0] < 3:
    from StringIO import StringIO
else:
    from io import StringIO
txt = StringIO('''
a b c d e
1 3456542 5 may 2014
2 1245678 4 may 2014
3 4256876 2 may 2014
4 5643156 6 may 2014
''')
df = pd.read_csv(txt, delim_whitespace=True)
df.sort_values('b', ascending=False)
Assuming that your input data is text, I would start by separating lines from each other and columns within lines. See the str.split() function for this.
The result should be a list of lists. You can then sort by the second column with the sort() or sorted() function if you provide the keyword argument key=. You might have to convert the number columns to int so that they will be sorted from small to large (and not alphabetically).
For the last part of your question, you could use itertools.groupby() which provides you with a grouping functionality like you requested.
This should get you started. Another option would be to use pandas.
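A minimal sketch of that outline (the file name data.txt is hypothetical):
import itertools

with open('data.txt') as f:
    rows = [line.split() for line in f if line.strip()]

# sort by the 7-digit ID in the second column, greatest to least
rows.sort(key=lambda r: int(r[1]), reverse=True)

# groupby needs its input sorted by the grouping key (the ID's first digit)
by_digit = sorted(rows, key=lambda r: r[1][0])
for digit, group in itertools.groupby(by_digit, key=lambda r: r[1][0]):
    with open('ids_starting_%s.txt' % digit, 'w') as out:
        for r in group:
            out.write(' '.join(r) + '\n')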
"I wasn't asking for an answer, I was asking where to start conceptually."
Start reading the text file using file.readlines, then split the data using line.strip().split(" ", 2), which will give you data in the following format:
['1', '3456542', ' 5 may 2014']
Now you should be able to complete your task.
Hint: look up the builtin functions int() and sorted().
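For instance, a three-line sketch of that hint (data.txt is a hypothetical file name):
with open('data.txt') as f:
    rows = [line.strip().split(" ", 2) for line in f if line.strip()]
rows = sorted(rows, key=lambda r: int(r[1]), reverse=True)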
Here's my way of doing it:
import csv

# read in file
file_lines = []
with open("test.txt", "r") as csv_file:
    reader = csv.reader(csv_file, delimiter=" ")
    for row in reader:
        file_lines.append(row)

# sort by the ID column as integers, greatest to least
file_lines.sort(key=lambda row: int(row[1]), reverse=True)

# write sorted file
with open("test_sorted.txt", "w") as csv_file:
    writer = csv.writer(csv_file, delimiter=" ")
    for row in file_lines:
        writer.writerow(row)

# separate files; append so rows that share a leading digit land in the same file
for row in file_lines:
    file_num = row[1][0]
    with open("file_{0}.txt".format(file_num), "a") as f:
        writer = csv.writer(f, delimiter=" ")
        writer.writerow(row)

How to add a new column at the end of a CSV/txt file in Python

I am looking for a script to add a new data column to an existing CSV file using Python. I have a file (e.g. file.csv) which will have many rows and a few columns. From a for loop calculation, I get a new value (A in my code here). I want to append that new value from each iteration as the last column of the existing CSV file. I used the code below.
for xxx in xxxx:
    A = xxx
    f = open("file.csv")
    data = [item for item in csv.reader(f)]
    f.close()
    new_column = [A]
    new_data = []
    for i, item in enumerate(data):
        try:
            item.append(new_column[i])
        except IndexError, e:
            item.append(A)
        new_data.append(item)
    f = open('outfilefinal1.csv', 'w')
    csv.writer(f).writerows(new_data)
    f.close()
It did append a new column as the last column, but the problem is that the whole column got the same value (the A value from the last loop). How can I get the A value from each loop iteration into the last column? Thanks.
Example input file
1 2
2 4
0 9
4 8
A value from each loop
3
4
0
9
So the final file should show
1 2 3
2 4 4
0 9 0
4 8 9
but in my case it shows as
1 2 9
2 4 9
0 9 9
4 8 9
The problem with your code is that you overwrite your output file on every pass of the outer for loop.
If A really is an array that does not depend on file.csv, then you could do something like this:
import csv

A = compute_new_values_list()

with open('file.csv') as fpi, open('out.csv', 'w') as fpo:
    reader = csv.reader(fpi)
    writer = csv.writer(fpo)
    # optionally handle the CSV header
    headers = next(reader)
    headers.append('new_column')
    writer.writerow(headers)
    for index, row in enumerate(reader):
        row.append(A[index])
        writer.writerow(row)
EDIT:
If you need a row of file.csv to compute your new value, you can use the same code; just compute your new value inside the for loop:
import csv

with open('file.csv') as fpi, open('out.csv', 'w') as fpo:
    reader = csv.reader(fpi)
    writer = csv.writer(fpo)
    # optionally handle the CSV header
    headers = next(reader)
    headers.append('new_column')
    writer.writerow(headers)
    for row in reader:
        new_value = compute_from_row(row)
        row.append(new_value)
        writer.writerow(row)

Merge 2 CSV files with one unique column but different headers [duplicate]

This question already has answers here:
Merging two CSV files using Python
(2 answers)
Closed 7 years ago.
I want to merge 2 CSV files using some scripting language (like bash script or Python).
1st.csv (this data is from mysql query)
member_id,name,email,desc
03141,ej,ej#domain.com,cool
00002,jes,jes#domain.com,good
00002,charmie,charm#domain.com,sweet
2nd.csv (from mongodb query)
id,address,create_date
00002,someCity,20150825
00003,newCity,20140102
11111,,20150808
The examples are not the actual data, though I know that some of the member_id values from MySQL and the id values from MongoDB are the same.
(And I wish my output to be something like this:)
desiredoutput.csv
meber_id,name,email,desc,address,create_date
03141,ej,ej#domain.com,cool,,
00002,jes,jes#domain.com,good,someCity,20150825
00002,charmie,charm#domain.com,sweet,
11111,,,,20150808
Help will be much appreciated. Thanks in advance.
#########################################################################
#!/usr/bin/python
import csv
import itertools as IT

filenames = ['1st.csv', '2nd.csv']
handles = [open(filename, 'rb') for filename in filenames]
readers = [csv.reader(f, delimiter=',') for f in handles]

with open('desiredoutput.csv', 'wb') as h:
    writer = csv.writer(h, delimiter=',', lineterminator='\n', )
    for rows in IT.izip_longest(*readers, fillvalue=['']*2):
        combined_row = []
        for row in rows:
            row = row[:1]  # column where I know there are identical data
            if len(row) == 1:
                combined_row.extend(row)
            else:
                combined_row.extend(['']*1)
        writer.writerow(combined_row)

for f in handles:
    f.close()
#########################################################################
I just read and tried (manipulated) this code from this site too.
Since you haven't posted an attempt, I'll give you a general answer (using Python) to get you started.
Create a dict, d
Iterate over all the rows of the first file, convert each row into a list and store it in d using meber_id as the key and the list as the value.
Iterate over all the rows of the second file, convert each row into a list leaving out the id column and update the list under d[id] with the new list if d[id] exists, otherwise store the new list under d[id].
Finally, iterate over the values in d and print them out comma separated to a file.
Edit
In your attempt, you are trying to use izip_longest to iterate over the rows of both files at the same time. But this would work only if there were an equal number of rows in both files and they were in the same order.
Anyhow, here is one way of doing it.
Note: This is using the Python 3.4+ csv module. For 2.7 it might look a little different.
import csv

d = {}
with open("file1.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header
    for row in reader:
        # pad each row with two blank cells for address and create_date
        d.setdefault(row[0], []).append(row + [""] * 2)

with open("file2.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header
    for row in reader:
        # make a stub row if this id never appeared in file1
        rows = d.setdefault(row[0], [[row[0], "", "", ""]])
        for old_row in rows:
            old_row[4:] = row[1:]

with open("out.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["member_id", "name", "email", "desc", "address", "create_date"])
    for rows in d.values():
        writer.writerows(rows)
Here is a suggestion using pandas that I got from this answer and the pandas docs about merging.
import pandas as pd
first = pd.read_csv('1st.csv')
second = pd.read_csv('2nd.csv')
merged = pd.concat([first, second], axis=1)
This will output:
  meber_id     name             email   desc     id   address  create_date
0     3141       ej     ej#domain.com   cool      2  someCity     20150825
1        2      jes    jes#domain.com   good      3   newCity     20140102
2        2  charmie  charm#domain.com  sweet  11111       NaN     20150808
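Note that concat with axis=1 pastes the two frames together positionally by row number rather than matching on the id values. If the rows must actually be matched on member_id/id, an outer merge is the usual tool; a sketch, not part of the original suggestion:
import pandas as pd

first = pd.read_csv('1st.csv')
second = pd.read_csv('2nd.csv')
# match rows on the key columns, keeping rows that appear in only one file
merged = first.merge(second, left_on='member_id', right_on='id', how='outer')
merged.to_csv('desiredoutput.csv', index=False)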
