I have a CSV file that contains class names and types of code smell; for each class I calculated the number of occurrences of each code smell. The final count for a class is on its last line, so many class names are repeated.
I need just the last line for each class name.
This is part of my CSV file, because it's too long:
NameOfClass,LazyClass,ComplexClass,LongParameterList,FeatureEnvy,LongMethod,BlobClass,MessageChain,RefusedBequest,SpaghettiCode,SpeculativeGenerality
com.nirhart.shortrain.MainActivity,NaN,NaN,NaN,NaN,NaN,NaN,1,NaN,NaN,NaN
com.nirhart.shortrain.path.PathParser,NaN,1,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.PathParser,NaN,1,NaN,1,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.PathParser,NaN,1,1,1,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.PathParser,NaN,1,2,1,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.PathParser,NaN,1,2,1,1,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.PathPoint,1,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.PathPoint,1,NaN,1,NaN,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.TrainPath,NaN,NaN,NaN,1,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.rail.RailActionActivity,NaN,NaN,NaN,1,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.rail.RailActionActivity,NaN,NaN,NaN,1,1,NaN,NaN,NaN,NaN,NaN
To filter out the last entry for groups of NameOfClass, you can make use of Python's groupby() function to return lists of rows with the same NameOfClass. The last entry from each can then be written to a file.
from itertools import groupby
import csv
with open('data_in.csv', newline='') as f_input, open('data_out.csv', 'w', newline='') as f_output:
    csv_input = csv.reader(f_input)
    csv_output = csv.writer(f_output)
    for key, rows in groupby(csv_input, key=lambda x: x[0]):
        csv_output.writerow(list(rows)[-1])
For the data you have given, this would give you the following output:
NameOfClass,LazyClass,ComplexClass,LongParameterList,FeatureEnvy,LongMethod,BlobClass,MessageChain,RefusedBequest,SpaghettiCode,SpeculativeGenerality
com.nirhart.shortrain.MainActivity,NaN,NaN,NaN,NaN,NaN,NaN,1,NaN,NaN,NaN
com.nirhart.shortrain.path.PathParser,NaN,1,2,1,1,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.PathPoint,1,NaN,1,NaN,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.path.TrainPath,NaN,NaN,NaN,1,NaN,NaN,NaN,NaN,NaN,NaN
com.nirhart.shortrain.rail.RailActionActivity,NaN,NaN,NaN,1,1,NaN,NaN,NaN,NaN,NaN
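One caveat: groupby() only merges consecutive rows, so it relies on all rows for a class being adjacent, as they are in the sample. If the file might not be grouped that way, a dictionary keyed on the class name is a possible alternative. A minimal sketch, where last_row_per_key is a hypothetical helper name and the shortened sample data is made up for illustration:

```python
import csv
import io


def last_row_per_key(rows, key_index=0):
    """Return the last row seen for each key, in order of first appearance."""
    latest = {}
    for row in rows:
        # Re-assigning an existing key keeps its original position in the dict
        # (dicts preserve insertion order in Python 3.7+) but updates the row.
        latest[row[key_index]] = row
    return list(latest.values())


# In-memory stand-in for a file whose repeated names are NOT adjacent.
sample = io.StringIO(
    "NameOfClass,LazyClass,ComplexClass\n"
    "a.PathParser,NaN,1\n"
    "a.PathPoint,1,NaN\n"
    "a.PathParser,NaN,2\n"
)
reader = csv.reader(sample)
header = next(reader)  # keep the header apart from the data rows
result = last_row_per_key(reader)
print(result)
```

The trade-off is memory: this keeps one row per class in memory, whereas groupby() only holds the current group.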
To get just the unique class names (ignoring repeated rows, not deleting them), you can do this:
import csv
with open('my_file.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    classNames = set(row[0] for row in reader)
print(classNames)
# {'com.nirhart.shortrain.MainActivity', 'com.nirhart.shortrain.path.PathParser', 'com.nirhart.shortrain.path.PathPoint', ...}
This is just using the csv module to open a file, getting the first value in each row, and then taking only the unique values of those. You can then manipulate the resulting set of strings (you might want to cast it back to a list via list(classNames)) however you need to.
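Note that a set does not keep the order of the file, and the header cell (NameOfClass) ends up in it too. If either matters, something along these lines may help. This is only a sketch: unique_names is a hypothetical helper name, and it assumes the first row is a header.

```python
import csv
import io


def unique_names(file_obj):
    """Return the distinct values of the first column, in file order."""
    reader = csv.reader(file_obj)
    next(reader, None)  # skip the header row, if there is one
    # dict.fromkeys keeps the first occurrence of each key, in order.
    return list(dict.fromkeys(row[0] for row in reader if row))


# In-memory stand-in for the real file.
sample = io.StringIO(
    "NameOfClass,LazyClass\n"
    "pkg.B,NaN\n"
    "pkg.A,1\n"
    "pkg.B,1\n"
)
names = unique_names(sample)
print(names)
```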
If you intend to later process the data in pandas, filtering duplicates is trivial:
import pandas as pd
df = pd.read_csv('file.csv')
df = df.loc[~df.NameOfClass.duplicated(keep='last')]
If you just want to build a new csv file with only the expected lines, pandas is overkill and the csv module is enough:
import csv
with open('file.csv', newline='') as fdin, open('new_file.csv', 'w', newline='') as fdout:
    rd = csv.reader(fdin)
    wr = csv.writer(fdout)
    wr.writerow(next(rd))  # copy the header line
    old = None
    for row in rd:
        # a new class name means `old` was the last row of the previous class
        if old is not None and old[0] != row[0]:
            wr.writerow(old)
        old = row
    if old is not None:
        wr.writerow(old)  # last row of the final class
I want to take the column from 'b.csv' and append it to 'a.csv', but my code only adds a single letter instead of the whole string. I searched on Google but found no answer. I want to put the column under the heading "number". This is my code:
from csv import reader, writer
f = open('b.csv')
default_text = f.read()
with open('a.csv', 'r') as read_obj, \
        open('output_1.csv', 'w', newline='') as write_obj:
    csv_reader = reader(read_obj)
    csv_writer = writer(write_obj)
    for row in csv_reader:
        row.append(default_text[8])
        csv_writer.writerow(row)
This is the info in 'a.csv'
name,age,course,school,number
Leo,18,BSIT,STI
Rommel,23,BSIT,STI
Gaby,33,BSIT,STI
Ranel,31,BSIT,STI
This is the info in 'b.csv'
1212121
1094534
1345684
1093245
You can concatenate the rows read from both CSV files and pass them straight to the writer:
import csv
from operator import concat
with open(r'a.csv') as f1, \
        open(r'b.csv') as f2, \
        open(r'output_1.csv', 'w', newline='') as out:
    f1_reader = csv.reader(f1)
    f2_reader = csv.reader(f2)
    writer = csv.writer(out)
    writer.writerow(next(f1_reader))  # write column names
    writer.writerows(map(concat, f1_reader, f2_reader))
So we initialize csv.reader() for both CSV files and csv.writer() for the output. As the first file (a.csv) contains the column names, we read them using next() and pass them to .writerow() to write them to the output unmodified. Then, using map(), we iterate over both readers simultaneously, applying operator.concat(), which concatenates the rows returned from both readers. We can pass the result directly to .writerows() and let it consume the iterator returned by map().
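One thing to keep in mind is that map() with two iterables stops at the shorter one, so if b.csv has fewer lines than a.csv the remaining rows are dropped silently. If that can happen, a variant with itertools.zip_longest could pad the missing values instead. A rough sketch, where append_column is a hypothetical helper name and the shortened data is made up:

```python
import csv
import io
from itertools import zip_longest


def append_column(rows_a, rows_b):
    """Yield each row of rows_a extended with the matching row of rows_b."""
    for left, right in zip_longest(rows_a, rows_b, fillvalue=None):
        if left is None:
            break  # rows_b is longer than rows_a: ignore the extra values
        # A missing right-hand row becomes a single empty cell.
        yield left + (right if right is not None else [""])


# In-memory stand-ins: a.csv has two data rows, b.csv only one value.
a = io.StringIO("name,age,course,school,number\nLeo,18,BSIT,STI\nRommel,23,BSIT,STI\n")
b = io.StringIO("1212121\n")
reader_a = csv.reader(a)
reader_b = csv.reader(b)
header = next(reader_a)  # column names come from the first file only
merged = list(append_column(reader_a, reader_b))
print(merged)
```

With real files, the generator can be passed to writer.writerows() in the same way as the map() call.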
If pandas cannot be used, the Table helper from the convtools library (github) is convenient.
from convtools.contrib.tables import Table
from convtools import conversion as c
(
    Table.from_csv("tmp/1.csv", header=True)
    # this step wouldn't be needed if your first file didn't have the
    # empty "number" column
    .drop("number")
    .zip(Table.from_csv("tmp/2.csv", header=["number"]))
    .into_csv("tmp/results.csv")
)
I'm a complete newb with Python so please excuse my ignorance. I'm trying to read items in a csv file and output the values with a comma in between (and no comma at the end).
My csv file (test.csv) is as follows:
Test
AAA
BBB
CCC
and the code I'm currently using is:
from csv import DictReader
with open('test.csv', 'r') as read_obj:
    csv_dict_reader = DictReader(read_obj)
    for row in csv_dict_reader:
        print('`'+row['Test']+'`,')
This returns the following:
`AAA`,
`BBB`,
`CCC`,
Is there any way to have the comma remain after AAA and BBB, but not after CCC?
Thanks in advance.
from csv import DictReader
with open("test.csv") as read_obj:
    csv_dict_reader = DictReader(read_obj)
    for row in csv_dict_reader:
        print('`'+row['Test']+'`,'[:-1])
You can try this as well. The slice [:-1] drops the last character of the '`,' literal, which is the comma. Note that this removes the comma from every row, not only from the last one.
The easiest way to modify what you have to do what you want is to store the values and then you can use ",\n".join() on the list.
from csv import DictReader
data = []
with open('test.csv', 'r') as read_obj:
    csv_dict_reader = DictReader(read_obj)
    for row in csv_dict_reader:
        data.append(row['Test'])
print(",\n".join([f"`{item}`" for item in data]))
If you had a long list or a more complicated CSV, then you could take advantage of print("sometext", end="") to avoid the new line character. Then you can start each subsequent row with print(',') which will also provide you with the \n character.
from csv import DictReader
first = True
with open('test.csv', 'r') as read_obj:
    csv_dict_reader = DictReader(read_obj)
    for row in csv_dict_reader:
        if not first:
            print(',')
        print(f"`{row['Test']}`", end='')
        first = False
If your CSV is just a single column, you could use pathlib to grab the lines, slice off the first line with [1:] and then use the same join pattern on the remainder of the data.
from pathlib import Path
data = Path("test.csv").read_text().splitlines()[1:]
print(",\n".join([f"`{item}`" for item in data]))
Finally, as previously mentioned, pandas can be used to read the CSV and then you don't have to worry about the first line.
import pandas as pd
df = pd.read_csv('test.csv')
print(",\n".join([f"`{item}`" for item in df['Test'].values]))
All of these methods yield the following:
`AAA`,
`BBB`,
`CCC`
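Another way to get the same separator behaviour, without calling join() yourself, might be to let print() place the separator via its sep argument. A small sketch, assuming the same single-column file; quoted is a hypothetical helper name:

```python
import csv
import io


def quoted(file_obj, column):
    """Yield each value of `column` wrapped in backticks."""
    for row in csv.DictReader(file_obj):
        yield f"`{row[column]}`"


# In-memory stand-in for test.csv.
sample = io.StringIO("Test\nAAA\nBBB\nCCC\n")
# sep is written only BETWEEN items, so the last line gets no comma.
print(*quoted(sample, "Test"), sep=",\n")
```

Unpacking with * still collects all values before printing, so for a very large file the first-flag loop shown earlier is the lighter option.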
from csv import DictReader
with open('test.csv', 'r') as read_obj:
    csv_dict_reader = DictReader(read_obj)
    for row in csv_dict_reader:
        temp = '`'+row['Test']+'`,'
        print(temp[:-1])  # strips the trailing comma (from every row)
Or, if you prefer to read the file in as a DataFrame:
import pandas as pd
df = pd.read_csv('test.csv')
Now you can work with df.
Please refer to: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
I have a CSV file and when I read it by importing the CSV library I get as the output:
['exam', 'id_student', 'grade']
['maths', '573834', '7']
['biology', '573834', '8']
['biology', '578833', '4']
['english', '581775', '7']
# goes on...
I need to edit it by creating a 4th column called 'Passed' with two possible values: True or False depending on whether the grade of the row is >= 7 (True) or not (False), and then count how many times each student passed an exam.
If it's not possible to edit the CSV file that way, I would need to just read the CSV file and then create a dictionary of lists with the following output:
dict = {'id_student':[573834, 578833, 581775], 'passed_count': [2,0,1]}
# goes on...
Thanks
Try importing the CSV as a pandas DataFrame:
import pandas as pd
data = pd.read_csv('data.csv')
And then use:
data['Passed'] = data['grade'] >= 7
And then save the DataFrame to CSV:
data.to_csv('final.csv', index=False)
It is totally possible to "edit" CSV.
Assuming you have a file students.csv with the following content:
exam,id_student,grade
maths,573834,7
biology,573834,8
biology,578833,4
english,581775,7
Iterate over input rows, augment the field list of each row with an additional item, and save it back to another CSV:
import csv
with open('students.csv', 'r', newline='') as source, open('result.csv', 'w', newline='') as result:
    csvreader = csv.reader(source)
    csvwriter = csv.writer(result)
    # Deal with the header
    header = next(csvreader)
    header.append('Passed')
    csvwriter.writerow(header)
    # Process data rows
    for row in csvreader:
        row.append(str(int(row[2]) >= 7))
        csvwriter.writerow(row)
Now result.csv has the content you need.
If you need to replace the original content, use os.remove() and os.rename() to do that:
import os
os.remove('students.csv')
os.rename('result.csv', 'students.csv')
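As a possible alternative to the remove/rename pair, os.replace() (Python 3.3+) overwrites the destination in a single call. The sketch below combines that with a temporary file in the same directory; add_passed_column and its threshold parameter are hypothetical names, and it assumes the grade is in the third column as in the sample.

```python
import csv
import os
import tempfile


def add_passed_column(path, threshold=7):
    """Rewrite the CSV at `path` in place, appending a 'Passed' column."""
    directory = os.path.dirname(os.path.abspath(path))
    with open(path, newline="") as source, tempfile.NamedTemporaryFile(
        "w", newline="", delete=False, dir=directory, suffix=".tmp"
    ) as tmp:
        reader = csv.reader(source)
        writer = csv.writer(tmp)
        writer.writerow(next(reader) + ["Passed"])  # extended header
        for row in reader:
            writer.writerow(row + [str(int(row[2]) >= threshold)])
    # Both files are closed here; os.replace overwrites the original file.
    os.replace(tmp.name, path)
```

Calling add_passed_column('students.csv') would then update the file without leaving result.csv behind.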
As for counting, it might be an independent thing, you don't need to modify CSV for that:
import csv
from collections import defaultdict
with open('students.csv', 'r', newline='') as source:
    csvreader = csv.reader(source)
    next(csvreader)  # Skip header
    stats = defaultdict(int)
    for row in csvreader:
        if int(row[2]) >= 7:
            stats[row[1]] += 1
print(stats)
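You can fold the counting into the code above and have both pieces in one place. defaultdict (stats) has the same interface as dict if you need to access it. Note that students who never passed an exam do not appear in stats, because a key is only created when a pass is counted.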
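If you need the exact dictionary-of-lists shape from the question, including students with zero passes, a variation like the following might do. It is only a sketch: passed_counts is a hypothetical helper name, and it assumes the column names from the sample file.

```python
import csv
import io


def passed_counts(file_obj, threshold=7):
    """Count passed exams per student, keeping students with no passes."""
    counts = {}
    for row in csv.DictReader(file_obj):
        student = int(row["id_student"])
        counts.setdefault(student, 0)  # registers the student even at 0 passes
        if int(row["grade"]) >= threshold:
            counts[student] += 1
    return {"id_student": list(counts), "passed_count": list(counts.values())}


# In-memory stand-in for students.csv.
sample = io.StringIO(
    "exam,id_student,grade\n"
    "maths,573834,7\n"
    "biology,573834,8\n"
    "biology,578833,4\n"
    "english,581775,7\n"
)
summary = passed_counts(sample)
print(summary)
# {'id_student': [573834, 578833, 581775], 'passed_count': [2, 0, 1]}
```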
For some reason the pandas module does not work for me, so I have to find another way to read a (large) CSV file and output specific columns within a certain range (e.g. the first 1000 lines). I have code that reads the entire CSV file, but I haven't found a way to display just specific columns.
Any help is much appreciated!
import csv
fileObj = open('apartment-data-all-4-xaver.2018.csv')
csvReader = csv.reader(fileObj)
for row in csvReader:
    print(row)
fileObj.close()
I created a small csv file with the following contents:
first,second,third
11,12,13
21,22,23
31,32,33
41,42,43
You can use the following helper function, which uses namedtuple from the collections module and generates objects that let you access your columns like attributes:
import csv
from collections import namedtuple
def get_first_n_lines(file_name, n):
    with open(file_name) as file_obj:
        csv_reader = csv.reader(file_obj)
        header = next(csv_reader)
        Tuple = namedtuple('Tuple', header)
        for i, row in enumerate(csv_reader, start=1):
            yield Tuple(*row)
            if i >= n:
                break
If you want to print the first and third columns of the first n=3 lines, call the function like this (Python 3.6+):
for line in get_first_n_lines(file_name='csv_file.csv', n=3):
    print(f'{line.first}, {line.third}')
Or like this (Python 3.0 - 3.5):
for line in get_first_n_lines(file_name='csv_file.csv', n=3):
    print('{}, {}'.format(line.first, line.third))
Outputs:
11, 13
21, 23
31, 33
Use csv.DictReader and then pick out the rows and columns you need:
import csv
data = []
with open('names.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        data.append(row)
colnames = ['col1', 'col2']
for row in data[:1000]:  # slicing is safe even if there are fewer than 1000 rows
    print(row[colnames[0]], row[colnames[1]])
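Since the file is large, reading every row into a list first costs memory that is not really needed. itertools.islice can stop the reader after the first rows instead. A minimal sketch, where first_rows is a hypothetical helper name and the small in-memory sample stands in for the real file:

```python
import csv
import io
from itertools import islice


def first_rows(file_obj, columns, limit):
    """Yield the chosen columns of at most `limit` data rows."""
    reader = csv.DictReader(file_obj)
    # islice stops pulling from the reader once `limit` rows were produced,
    # so the rest of the file is never parsed.
    for row in islice(reader, limit):
        yield [row[name] for name in columns]


sample = io.StringIO(
    "first,second,third\n"
    "11,12,13\n"
    "21,22,23\n"
    "31,32,33\n"
    "41,42,43\n"
)
selected = list(first_rows(sample, ["first", "third"], limit=3))
print(selected)
```

With a real file you would pass the object returned by open(..., newline='') and use limit=1000.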
I have a CSV file as follows:
lat,lon,date,data1,data2
1,2,3,4,5
6,7,8,9,10
From this CSV file I want to extract the columns date and data1 into another CSV file. I have the following code:
import csv
import os
os.chdir(mydir)
column_names = ["date", "data1"]
index=[]
with open("my.csv", "r") as f:
    mycsv = csv.DictReader(f)
    for row in mycsv:
        for col in column_names:
            try:
                data=print(row[col])
                with open("test2.txt", "w") as f:
                    print(data, file=f)
            except KeyError:
                pass
Unfortunately, the output is a file containing just "None"... Does anyone know how to retrieve the data I want and write it to another file?
There are a few issues with your code:
Every time you call open("test2.txt", "w"), the "w" mode truncates the file, deleting all of its contents.
You are storing the return value of print(), which is None, and then printing that to your file.
Read your CSV into a list of dicts, as below:
import csv
with open('your_csv.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    read_l = [{key: value for key, value in row.items() if key in ('date', 'data1')}
              for row in reader]
and then use DictWriter to write to a new CSV.
with open('new.csv', 'w', newline='') as csvfile:
    fieldnames = list(read_l[0])
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    # DictReader has already consumed the header, so write every row
    for row in read_l:
        writer.writerow(row)
The steps below may help, but they require the pandas library, so install pandas first. input.csv contains the data you posted.
import pandas as pd
df=pd.read_csv('input.csv')
df_new=df.iloc[0:,2:4]
df_new.to_csv("output.csv",index=False)
The reason why you see None in your file is because you're assigning the result of print(row[col]) to your data variable:
data=print(row[col])
print() doesn't return anything, therefore the content of data is None. If you remove the print() and just have data = row[col], you will get something valuable.
There is one more issue that I see in your code, which you probably want to get fixed:
You're opening the file again on every iteration of the loop, so each value overwrites the entire file. If you want the entire column, you'd have to open the file once, before the loop.
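Putting both fixes together, a corrected version could look roughly like this. It is a sketch rather than the only way to do it: extract_columns is a hypothetical helper name, and the in-memory buffers stand in for my.csv and the output file.

```python
import csv
import io


def extract_columns(source, target, column_names):
    """Copy only `column_names` from the CSV in `source` to `target`."""
    reader = csv.DictReader(source)
    # extrasaction="ignore" makes DictWriter drop the columns we did not list.
    writer = csv.DictWriter(target, fieldnames=column_names, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # the target is opened once, outside this loop


source = io.StringIO("lat,lon,date,data1,data2\n1,2,3,4,5\n6,7,8,9,10\n")
target = io.StringIO()
extract_columns(source, target, ["date", "data1"])
print(target.getvalue())
```

With real files, open both with newline='' and pass the file objects in place of the StringIO buffers.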
I would recommend using pandas. I haven't run this script, but something like this should work.
import pandas as pd
import csv
frame = pd.read_csv('my.csv')
df = frame[['date', 'data1']]
with open('test2.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, delimiter=',',
                        quotechar='|', quoting=csv.QUOTE_MINIMAL)
    writer.writerow(df.columns)  # header row
    writer.writerows(df.itertuples(index=False))  # data rows
import pandas as pd
df = pd.read_csv("my.csv")  # the first row is used as the header by default
new_df = df[["date", "data1"]]
new_df.to_csv("new_csv_name.csv")  # also writes the index as the first column
# if you don't need the index
new_df.to_csv('new_csv_name.csv', index=False)