I have a CSV file containing class names and code smell types; for each class I calculated the number of occurrences of each code smell. The final count for a class is on its last line, so many class names are repeated.
I need just the last line for each class name.
This is part of my CSV file, because it's too long to show in full:
NameOfClass,LazyClass,ComplexClass,LongParameterList,FeatureEnvy,LongMethod,BlobClass,MessageChain,RefusedBequest,SpaghettiCode,SpeculativeGenerality
com.nirhart.shortrain.MainActivity,NaN,NaN,NaN,NaN,NaN,NaN,1,NaN,NaN,NaN
com.nirhart.shortrain.path.PathParser,NaN,1,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.PathParser,NaN,1,NaN,1,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.PathParser,NaN,1,1,1,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.PathParser,NaN,1,2,1,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.PathParser,NaN,1,2,1,1,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.PathPoint,1,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.PathPoint,1,NaN,1,NaN,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.TrainPath,NaN,NaN,NaN,1,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.rail.RailActionActivity,NaN,NaN,NaN,1,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.rail.RailActionActivity,NaN,NaN,NaN,1,1,NaN,NaN,NaN,NaN,NaN
To keep only the last entry for each group of NameOfClass, you can use Python's itertools.groupby() function, which returns runs of consecutive rows sharing the same NameOfClass. The last entry from each run can then be written to a file. Note that groupby() only groups adjacent rows, so this relies on the rows for each class being contiguous, as they are in your sample.
from itertools import groupby
import csv
with open('data_in.csv', newline='') as f_input, open('data_out.csv', 'w', newline='') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    # The header row forms its own group, so it is copied through unchanged
    for key, rows in groupby(csv_input, key=lambda x: x[0]):
        csv_output.writerow(list(rows)[-1])
For the data you have given, this would give you the following output:
NameOfClass,LazyClass,ComplexClass,LongParameterList,FeatureEnvy,LongMethod,BlobClass,MessageChain,RefusedBequest,SpaghettiCode,SpeculativeGenerality
com.nirhart.shortrain.MainActivity,NaN,NaN,NaN,NaN,NaN,NaN,1,NaN,NaN,NaN
com.nirhart.shortrain.path.PathParser,NaN,1,2,1,1,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.PathPoint,1,NaN,1,NaN,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.TrainPath,NaN,NaN,NaN,1,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.rail.RailActionActivity,NaN,NaN,NaN,1,1,NaN,NaN,NaN,NaN,NaN
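If the rows for a class might not be adjacent in the file, groupby() would treat each separate run as its own group. A minimal sketch of an alternative, assuming the whole file fits in memory: keep the most recent row per class name in a dict. The helper name last_row_per_key and the sample data are made up for illustration.

```python
import csv
import io


def last_row_per_key(rows):
    """Return the last row seen for each first-column value, in first-seen key order."""
    latest = {}
    for row in rows:
        # A later row replaces the earlier one; the key keeps its original position
        latest[row[0]] = row
    return list(latest.values())


# Hypothetical sample in which the rows for class "a.B" are not adjacent
sample = io.StringIO(
    "NameOfClass,LazyClass,LongMethod\n"
    "a.B,NaN,1\n"
    "a.C,1,NaN\n"
    "a.B,NaN,2\n"
)
reader = csv.reader(sample)
header = next(reader)  # keep the header separate from the data rows
result = last_row_per_key(reader)
print(header)
print(result)
```

With a real file, the same function can be fed a csv.reader over the opened file, and header plus result can be written back with csv.writer.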
To get just the unique class names (ignoring repeated rows, not deleting them), you can do this:
import csv
with open('my_file.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    next(reader, None)  # skip the header row so 'NameOfClass' is not counted as a class
    classNames = set(row[0] for row in reader)
print(classNames)
# {'com.nirhart.shortrain.MainActivity', 'com.nirhart.shortrain.path.PathParser', 'com.nirhart.shortrain.path.PathPoint', ...}
This is just using the csv module to open a file, getting the first value in each row, and then taking only the unique values of those. You can then manipulate the resulting set of strings (you might want to cast it back to a list via list(classNames)) however you need to.
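One thing to keep in mind is that a set does not preserve the order in which the names appear in the file. If order matters, a small sketch using dict.fromkeys may help; the helper name unique_in_order and the sample rows are hypothetical.

```python
import csv
import io


def unique_in_order(values):
    """Return the distinct values, keeping the order of first appearance."""
    # dict keys are unique and keep insertion order (guaranteed since Python 3.7)
    return list(dict.fromkeys(values))


# Hypothetical sample standing in for the real CSV file
sample = io.StringIO("NameOfClass,LazyClass\nb.Z,1\na.A,NaN\nb.Z,2\n")
reader = csv.reader(sample)
next(reader)  # skip the header row
names = unique_in_order(row[0] for row in reader)
print(names)
```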
If you intend to later process the data in pandas, filtering duplicates is trivial:
import pandas as pd
df = pd.read_csv('file.csv')
df = df.loc[~df.NameOfClass.duplicated(keep='last')]
If you just want to build a new csv file with only the expected lines, pandas is overkill and the csv module is enough:
import csv
with open('file.csv', newline='') as fdin, open('new_file.csv', 'w', newline='') as fdout:
    rd = csv.reader(fdin)
    wr = csv.writer(fdout)
    wr.writerow(next(rd))  # copy the header line
    old = None
    for row in rd:
        if old is not None and old[0] != row[0]:
            wr.writerow(old)  # the class name changed, so old was the last row of its group
        old = row
    if old is not None:
        wr.writerow(old)  # last row of the final group
I have a CSV file as follows:
lat,lon,date,data1,data2
1,2,3,4,5
6,7,8,9,10
From this CSV file I want to extract the date and data1 columns into another CSV file. I have the following code:
import csv
os.chdir(mydir)
column_names = ["date", "data1"]
index=[]
with open("my.csv", "r") as f:
    mycsv = csv.DictReader(f)
    for row in mycsv:
        for col in column_names:
            try:
                data=print(row[col])
                with open("test2.txt", "w") as f:
                    print(data, file=f)
            except KeyError:
                pass
Unfortunately, the output is a file containing only "None". Does anyone know how to retrieve the data I want and write it to another file?
There are a few issues with your code:
Every time you call open("test2.txt", "w"), the "w" mode truncates the file and deletes all its contents.
You are storing the return value of print, which is None, and then printing that into your file.
Read your CSV into a list of dicts, as below:
import csv
with open('your_csv.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    read_l = [{key: value for key, value in row.items() if key in ('date', 'data1')}
              for row in reader]
and then use DictWriter to write to a new CSV:
with open('new.csv', 'w', newline='') as csvfile:
    fieldnames = list(read_l[0].keys())
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for row in read_l:  # DictReader already consumed the header, so write every row
        writer.writerow(row)
The steps below may help, but they require the pandas library, so install it first. input.csv contains the data you mentioned.
import pandas as pd
df=pd.read_csv('input.csv')
df_new=df.iloc[0:,2:4]
df_new.to_csv("output.csv",index=False)
You see None in your file because you're assigning the result of print(row[col]) to your data variable:
data=print(row[col])
print() always returns None, so the content of data is None. If you remove the print() call and just write data = row[col], data will hold the actual value.
There is one more issue in your code that you probably want to fix:
You're reopening the output file in "w" mode on every iteration of the loop, so each value overwrites the entire file and only the last one survives. If you want the entire column, you have to open the file once, before the loop.
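Putting both fixes together, a minimal sketch could look like the following. It assumes the output should be one comma-separated line per input row; the function name extract_columns is made up, and io.StringIO stands in for the real files so the example runs on its own.

```python
import csv
import io


def extract_columns(src, dst, column_names):
    """Write the chosen columns of CSV `src` to `dst`, one line per input row."""
    reader = csv.DictReader(src)
    for row in reader:
        # Take the value itself rather than the return value of print()
        values = [row[col] for col in column_names if col in row]
        print(",".join(values), file=dst)


# In-memory stand-ins for my.csv and test2.txt
src = io.StringIO("lat,lon,date,data1,data2\n1,2,3,4,5\n6,7,8,9,10\n")
dst = io.StringIO()  # opened once, outside the loop
extract_columns(src, dst, ["date", "data1"])
output = dst.getvalue()
print(output)
```

With real files, both can be opened in a single with statement before calling the function, so the output file is opened exactly once.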
I recommend using pandas. I haven't run this script, but something like this should work.
import pandas as pd
import csv
frame = pd.read_csv('my.csv')
df = frame[['date', 'data1']]
with open('test2.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',',
                        quotechar='|', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(df.columns)           # header row
    writer.writerows(df.values.tolist())  # data rows
import pandas as pd
df = pd.read_csv("my.csv")  # header=0 (first row holds the column names) is the default
new_df = df[["date", "data1"]]
new_df.to_csv("new_csv_name.csv")
# if you don't need the index column
new_df.to_csv('new_csv_name.csv', index=False)
I have a file "TAB.csv" with many columns. I would like to pick one column (its index is 3) from the CSV file, without its header, and then create a new text file "NEW.txt" and write that column there (without the header).
The code below reads that column, but includes the header. How can I omit the header and save the column to a new text file?
import csv
with open('TAB.csv', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(row[3])
This is the solution #tmrlvi was talking about: it skips the first row (the header) via the next function:
import csv
with open('TAB.csv', newline='') as input_file, open('output.csv', 'w') as output_file:
    reader = csv.reader(input_file)
    next(reader, None)  # skip the header row; the default avoids an error on an empty file
    for row in reader:
        row_str = row[3]
        output_file.write(row_str + '\n')
Try this:
import csv
with open('TAB.csv', newline='') as f, open('out.txt', 'w') as g:
    reader = csv.reader(f)
    next(reader)  # skip header
    g.writelines(row[3] + '\n' for row in reader)
enumerate is a handy function that yields (index, item) tuples, so you can see the index while running over an iterator.
import csv
with open('NEW.txt', 'w') as outfile:
    with open('TAB.csv', newline='') as f:
        reader = csv.reader(f)
        for index, row in enumerate(reader):
            if index > 0:  # index 0 is the header row
                outfile.write(row[3])
                outfile.write("\n")
Another solution would be to read one line from the file (in order to skip the header).
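That alternative can be sketched as follows, assuming the header occupies exactly one physical line (no quoted newlines inside it). The function name column_without_header is made up, and io.StringIO stands in for TAB.csv.

```python
import csv
import io


def column_without_header(f, index):
    """Discard the first line of `f`, then return column `index` of the remaining rows."""
    next(f, None)  # read one raw line from the file to skip the header
    return [row[index] for row in csv.reader(f)]


# In-memory stand-in for TAB.csv
sample = io.StringIO("a,b,c,d\n1,2,3,4\n5,6,7,8\n")
column = column_without_header(sample, 3)
print(column)
```

Skipping the line on the file object rather than on the reader means the header is never parsed as CSV at all.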
This is an old question, but I would like to add an answer using the pandas library. For tasks like this, it is better to use pandas than to write your own parsing code. The simple code with pandas looks like this:
import pandas as pd
reader = pd.read_csv('TAB.csv', header = None)
I'm trying to parse a pipe-delimited file and pass the values into a list, so that later I can print selective values from the list.
The file looks like:
name|age|address|phone|||||||||||..etc
It has more than 100 columns.
Use the 'csv' library.
First, register your dialect:
import csv
csv.register_dialect('piper', delimiter='|', quoting=csv.QUOTE_NONE)
Then, use your dialect on the file:
with open(myfile, newline='') as csvfile:
    for row in csv.DictReader(csvfile, dialect='piper'):
        print(row['name'])
Use Pandas:
import pandas as pd
df = pd.read_csv(filename, sep="|")
This stores the file in a DataFrame. For each column, you can apply conditions to select the values you want to print. It ran quickly when I tried it on a file with 111,047 rows.
If you're parsing a very simple file that won't contain any | characters in the actual field values, you can use split:
with open('file', 'r') as fileHandle:
    for line in fileHandle:
        fields = line.rstrip('\n').split('|')  # drop the trailing newline before splitting
        print(fields[0])  # prints the first field's value
        print(fields[1])  # prints the second field's value
A more robust way to parse tabular data would be to use the csv library as mentioned in Spencer Rathbun's answer.
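To illustrate the difference, here is a small sketch with a made-up line in which one field is wrapped in double quotes and contains a | character. It assumes the file follows the usual CSV quoting convention, which csv.reader handles by default.

```python
import csv
import io

# Hypothetical record: the address field is quoted and contains a pipe
line = 'Ann|30|"12 Main St | Apt 4"|555-0100\n'

# Plain split cuts the quoted field in two and keeps the quote characters
naive = line.rstrip("\n").split("|")

# csv.reader respects the quotes and returns the field intact
parsed = next(csv.reader(io.StringIO(line), delimiter="|"))

print(len(naive), naive)
print(len(parsed), parsed)
```

The split version yields five pieces, while csv.reader returns the four intended fields.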
In 2022, with Python 3.8 or above, you can simply do:
import csv
with open(file_path, "r", newline="") as csvfile:
    reader = csv.reader(csvfile, delimiter='|')
    for row in reader:
        print(row[0], row[1])