Nested List:
AIStable = [['Unknown,Unknown,-,-,9127057/-,-,-,-,0.0°/0.0kn,0.0S/0.0W,-'], ['Tanker,MarshallIslands,USHOU>DOSPM,Jan3,19:00,9683984/538005270,V7CJ5,183/32m,11.4m,112.0°/12.8kn,18.26069N/75.77137W,Jan2,202006:47UTC'], ['Productisnotfound!'], ['Productisnotfound!'], ['Productisnotfound!'], ['Productisnotfound!'], ['Cargoship,Russia,VLADIVOSTOK,RUSSIA,Dec31,00:00,9015785/273213710,UBCT2,119/18m,5.6m,7.9°/0.0kn,43.09932N/131.88229E,Jan1,202009:06UTC'], ['Tanker,Singapore,YEOSU,SK,Jan1,03:00,9370991/566376000,9V3383,100/16m,4.3m,188.0°/0.1kn,34.72847N/127.81362E,Jan2,202007:41UTC'], ['Cargoship,Italy,QINGDAO,Jan1,02:00,9511454/247283400,ICAH,292/45m,18.2m,324.0°/5.4kn,27.80572N/125.07295E,Jan2,202007:20UTC']]
Each inner list signifies a new row to be added to the CSV.
The desired output in the CSV file is:
AIS_Type Flag Destination ETA IMO/MMSI Callsign Length/Beam Current_Draught Course/Speed Coordinates Last_Report
Unknown Unknown - - 9127057/- - - - - 0.0°/0.0kn 0.0S/0.0W -
Tanker MarshallIslands USHOU>DOSPM Jan3 19:00 9683984/538005270 V7CJ5 183/32m 11.4m 112.0°/12.8kn 18.26069N/75.77137W Jan2 202006:47UTC
Product is not found!
I tried the following:
AISUpdated = [[''.join(','.join(i).split())] for i in AIStable]
print(AISUpdated)

filename = "vesselsUpdated.csv"
with open(filename, 'w', newline='') as f:
    writer = csv.writer(f, delimiter=",")
    headers = "AIS_Type,Flag,Destination,ETA,IMO/MMSI,Callsign,Length/Beam,Current_Draught,Course/Speed,Coordinates,Last_Report"
    f.write(headers)
    writer.writerows("\n")
    for i in AISUpdated:
        writer.writerows([i])
The output I obtain does not put the records into separate columns; instead, every field of each record gets squeezed into a single column under AIS_Type. I want the fields separated by , with each value mapped to the right column.
The problem is that each of your inner lists only has one value: the whole record as a single string.
You should either make them actual lists, e.g. ['Unknown','Unknown','-','-','9127057'],
or do it like this:
AISUpdated = [i[0].split(',') for i in AIStable]

with open('file.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerow(headers.split(','))
    writer.writerows(AISUpdated)
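To see the difference the split makes, here is a minimal, self-contained sketch; the shortened stand-in data and the in-memory buffer (instead of a real file) are assumptions for illustration only:

```python
import csv
import io

# shortened stand-in for AIStable
AIStable = [['Unknown,Unknown,-,-'], ['Tanker,Singapore,YEOSU,Jan1']]

# split each single-string record into real columns
AISUpdated = [i[0].split(',') for i in AIStable]

# write to an in-memory buffer so the sketch is self-contained
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(['AIS_Type', 'Flag', 'Destination', 'ETA'])
writer.writerows(AISUpdated)
print(buf.getvalue())
```

Each record now lands in its own columns instead of one cell under AIS_Type.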
Related
Old.csv:
name,department
leona,IT
New.csv:
name,department
leona,IT
lewis,Tax
With the same two columns, finding the new rows in New.csv and updating Old.csv with them works fine with the code below:
feed = []
headers = []
with open("Old.csv", 'r') as t1, open("New.csv", 'r') as t2:
    for header in t1.readline().split(','):
        headers.append(header.rstrip())
    fileone = t1.readlines()
    filetwo = t2.readlines()[1:]  # Skip csv fieldnames
    for line in filetwo:
        if line not in fileone:
            lineItems = {}
            feed.append(line.strip())  # For old file update
New problem:
1/ Add a 3rd column to store timestamp values
2/ Skip the 3rd column (timestamp) in both files and still need to compare two files for differences based on the 1st and 2nd columns
3/ Old file will be updated with the new values on all 3 columns
I tried the slicing method split(',')[0:2], but it didn't seem to work at all. I feel only small updates to the existing code are needed, but I'm not sure how to achieve that.
Expected outcome:
Old.csv:
name,department,timestamp
leona,IT,07/20/2020 <--- Existing value
lewis,Tax,08/25/2020 <--- New value from New.csv
New.csv:
name,department,timestamp
leona,IT,07/20/2020
leona,IT,07/25/2020
lewis,Tax,08/25/2020
You can do it all yourself, but why not use the tools built in to Python?
from csv import reader

feed = []
with open('Old.csv', 'r') as t1, open('New.csv', 'r') as t2:
    old = reader(t1)
    new = reader(t2)
    headers = next(old)
    # skip header in new
    next(new)
    # relevant data is only the first two columns
    old_data = [rec[:2] for rec in old]
    for rec in new:
        if rec[:2] not in old_data:
            feed.append(rec)
print(headers)
print(feed)
Result:
['name', 'department']
[['lewis', 'Tax']]
Note that you'll get this result with the data you provided, but if you add a third column, the code still works as expected and will add that data to the feed result.
To get feed to be a list of dictionaries, which you can easily turn into JSON, you could do something like:
feed.append(dict(zip(headers, rec)))
Turning feed into json is as simple as:
import json
print(json.dumps(feed))
The whole solution:
import json
from csv import reader

feed = []
with open('Old.csv', 'r') as t1, open('New.csv', 'r') as t2:
    old = reader(t1)
    new = reader(t2)
    headers = next(old)
    # skip header in new
    next(new)
    # relevant data is only the first two columns
    old_data = [rec[:2] for rec in old]
    for rec in new:
        if rec[:2] not in old_data:
            feed.append(dict(zip(headers, rec)))
print(json.dumps(feed))
With outputs like:
[{"name": "lewis", "department": "Tax", "timestamp": "08/25/2020"}]
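One hypothetical way to finish step 3 (writing the new rows back to Old.csv) is to open the old file in append mode; this sketch assumes feed holds plain row lists rather than dicts:

```python
import csv

# hypothetical: the new rows detected in New.csv, as plain lists
feed = [['lewis', 'Tax', '08/25/2020']]

# append them to the old file; newline='' avoids blank lines on Windows
with open('Old.csv', 'a', newline='') as f:
    csv.writer(f).writerows(feed)
```

Append mode leaves the existing rows untouched and adds only the new ones at the end.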
I have two CSV files. The first file (state_abbreviations.csv) has only state abbreviations and their full state names side by side (like the image below); the second file (test.csv) has the state abbreviations with additional info.
I want to replace each state abbreviation in test.csv with its associated full state name from the first file.
My approach was to read each file, build a dict from the first file (state_abbreviations.csv), read the second file (test.csv), then, if an abbreviation matches one from the first file, replace it with the full name.
Any help is appreciated.
import csv

state_initials = ("state_abbr")
state_names = ("state_name")
state_file = open("state_abbreviations.csv", "r")
state_reader = csv.reader(state_file)

headers = None
final_state_initial = []
for row in state_reader:
    if not headers:
        headers = []
        for i, col in enumerate(row):
            if col in state_initials:
                headers.append(i)
    else:
        final_state_initial.append(row[0])
print final_state_initial

headers = None
final_state_abbre = []
for row in state_reader:
    if not headers:
        headers = []
        for i, col in enumerate(row):
            if col in state_initials:
                headers.append(i)
    else:
        final_state_abbre.append(row[1])
print final_state_abbre

state_dictionary = dict(zip(final_state_initial, final_state_abbre))
print state_dictionary
You almost got it (the approach, that is): building a dict out of the abbreviations is the easiest way to do this:
with open("state_abbreviations.csv", "r") as f:
    # you can use csv.DictReader() instead, but let's strive for performance
    reader = csv.reader(f)
    next(reader)  # skip the header
    # assuming the first column holds the abbreviation, the second the full state name
    state_map = {state[0]: state[1] for state in reader}
Now you have state_map containing a map of all your state abbreviations, for example: state_map["FL"] contains Florida.
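Run against a small in-memory sample (so you don't need the actual file; the two sample states are made up for illustration), the same comprehension looks like this:

```python
import csv
import io

# in-memory stand-in for state_abbreviations.csv
sample = "state_abbr,state_name\nFL,Florida\nCA,California\n"

reader = csv.reader(io.StringIO(sample))
next(reader)  # skip the header
state_map = {row[0]: row[1] for row in reader}

print(state_map["FL"])  # Florida
```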
To replace the values in your test.csv, though, you'll either have to load the whole file into memory, parse it, do the replacement and save it, or create a temporary file, stream-write the changes to it, then overwrite the original file with the temporary one. Assuming test.csv is not too big to fit into memory, the first approach is much simpler:
with open("test.csv", "r+") as f:  # open the file in read-write mode
    # again, you can use csv.DictReader() for convenience, but this is significantly faster
    reader = csv.reader(f)
    header = next(reader)  # get the header
    rows = []  # hold our rows
    if "state" in header:  # proceed only if a `state` column is found in the header
        state_index = header.index("state")  # find the state column index
        for row in reader:  # read the CSV row by row
            current_state = row[state_index]  # get the abbreviated state value
            # replace the abbreviation if it exists in our state_map
            row[state_index] = state_map.get(current_state, current_state)
            rows.append(row)  # append the processed row to our `rows` list
        # now let's overwrite the file with the updated data
        f.seek(0)  # seek to the file beginning
        f.truncate()  # truncate the rest of the content
        writer = csv.writer(f)  # create a CSV writer
        writer.writerow(header)  # write back the header
        writer.writerows(rows)  # write our modified rows
It seems like you are trying to go through the file twice? This is not necessary: the first time through, you are already reading all the lines, so you can create your dictionary items directly.
In addition, comprehensions can be very useful when creating lists or dictionaries. In this case it might be a bit less readable, though. The alternative would be to create an empty dictionary, start a "real" for-loop and add all the key:value pairs manually (i.e., with state_dict[row[abbr]] = row[name]).
Finally, I used the with statement when opening the file to ensure it is safely closed when we're done with it. This is good practice when opening files.
import csv

with open("state_abbreviations.csv") as state_file:
    state_reader = csv.DictReader(state_file)
    state_dict = {row['state_abbr']: row['state_name'] for row in state_reader}
print(state_dict)
Edit: note that, like the code you showed, this only creates the dictionary that maps abbreviations to state names. Actually replacing them in the second file would be the next step.
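A minimal in-memory sketch of that next step might look like this; the column name 'state' and the sample rows are assumptions for illustration, not taken from the original files:

```python
import csv
import io

state_dict = {'FL': 'Florida', 'CA': 'California'}

# hypothetical stand-in for test.csv
test_data = "name,state\nalice,FL\nbob,CA\n"

reader = csv.DictReader(io.StringIO(test_data))
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
writer.writeheader()
for row in reader:
    # swap the abbreviation for the full name, leaving unknown values alone
    row['state'] = state_dict.get(row['state'], row['state'])
    writer.writerow(row)

print(out.getvalue())
```

Using dict.get with the original value as the default means rows with unrecognized abbreviations pass through unchanged.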
Step 1: Ask Python to remember the abbreviations and their full names; we use a dictionary for that:
with open('state_abbreviations.csv', 'r') as f:
    csvreader = csv.reader(f)
    next(csvreader)
    abs = {r[0]: r[1] for r in csvreader}
Step 2: Replace the abbreviations with full names and write to an output file; I used "test_output.csv":
with open('test.csv', 'r') as reading:
    csvreader = csv.reader(reading)
    next(csvreader)
    header = ['name', 'gender', 'birthdate', 'address', 'city', 'state']
    with open('test_output.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(header)
        for a in csvreader:
            # writerow takes a single list, not separate arguments
            writer.writerow([a[0], a[1], a[2], a[3], a[4], abs[a[5]]])
I am currently writing a script that creates a CSV file ('tableau_input.csv') composed both of columns from other CSV files and columns created by myself. I tried the following code:
def make_tableau_file(mp, current_season=2016):
    # Produces a csv file containing predicted and actual game results for the current season
    # Tableau uses the contents of the file to produce visualization
    game_data_filename = 'game_data' + str(current_season) + '.csv'
    datetime_filename = 'datetime' + str(current_season) + '.csv'
    with open('tableau_input.csv', 'wb') as writefile:
        tableau_write = csv.writer(writefile)
        tableau_write.writerow(['Visitor_Team', 'V_Team_PTS', 'Home_Team', 'H_Team_PTS', 'True_Result', 'Predicted_Results', 'Confidence', 'Date'])
        with open(game_data_filename, 'rb') as readfile:
            scores = csv.reader(readfile)
            scores.next()
            for score in scores:
                tableau_content = score[1::]
                # Append True_Result
                if int(tableau_content[3]) > int(tableau_content[1]):
                    tableau_content.append(1)
                else:
                    tableau_content.append(0)
                # Append 'Predicted_Result' and 'Confidence'
                prediction_results = mp.make_predictions(tableau_content[0], tableau_content[2])
                tableau_content += list(prediction_results)
                tableau_write.writerow(tableau_content)
        with open(datetime_filename, 'rb') as readfile2:
            days = csv.reader(readfile2)
            days.next()
            for day in days:
                tableau_write.writerow(day)
'tableau_input.csv' is the file I am creating. The columns 'Visitor_Team', 'V_Team_PTS', 'Home_Team', 'H_Team_PTS' come from 'game_data_filename'(e.g tableau_content = score[1::]). The columns 'True_Result', 'Predicted_Results', 'Confidence' are columns created in the first for loop.
So far, everything works, but when I finally tried to add data from 'datetime_filename' to the 'Date' column using the same structure as above, the 'Date' column in my 'tableau_input' file stayed empty. Can someone solve this problem?
For info, below are screenshots of csv files respectively for 'game_data_filename' and 'datetime_filename' (nb: datetime values are in datetime format)
It's hard to test this as I don't really know what the input should look like, but try something like this:
def make_tableau_file(mp, current_season=2016):
    # Produces a csv file containing predicted and actual game results for the current season
    # Tableau uses the contents of the file to produce visualization
    game_data_filename = 'game_data' + str(current_season) + '.csv'
    datetime_filename = 'datetime' + str(current_season) + '.csv'
    with open('tableau_input.csv', 'wb') as writefile:
        tableau_write = csv.writer(writefile)
        tableau_write.writerow(
            ['Visitor_Team', 'V_Team_PTS', 'Home_Team', 'H_Team_PTS', 'True_Result', 'Predicted_Results', 'Confidence', 'Date'])
        with open(game_data_filename, 'rb') as readfile, open(datetime_filename, 'rb') as readfile2:
            scoreReader = csv.reader(readfile)
            scores = [row for row in scoreReader]
            scores = scores[1::]
            daysReader = csv.reader(readfile2)
            days = [day for day in daysReader]
            days = days[1::]  # drop the header row here too
            if len(scores) != len(days):
                print("File lengths do not match")
            else:
                for i in range(len(days)):
                    tableau_content = scores[i][1::]
                    tableau_date = days[i]
                    # Append True_Result
                    if int(tableau_content[3]) > int(tableau_content[1]):
                        tableau_content.append(1)
                    else:
                        tableau_content.append(0)
                    # Append 'Predicted_Result' and 'Confidence'
                    prediction_results = mp.make_predictions(tableau_content[0], tableau_content[2])
                    tableau_content += list(prediction_results)
                    tableau_content += tableau_date
                    tableau_write.writerow(tableau_content)
This combines both of the file reading parts into one.
As per your questions below:
scoreReader = csv.reader(readfile)
scores = [row for row in scoreReader]
scores = scores[1::]
This uses a list comprehension to create a list called scores, with every element being one of the rows from scoreReader. As scoreReader is an iterator, every time we ask it for a row, it gives one out to us, until there are no more.
The second line scores = scores[1::] just chops off the first element of the list, as you don't want the header.
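A tiny illustration of both ideas (the comprehension draining the reader, then a slice dropping the header), with made-up sample data:

```python
import csv
import io

sample = io.StringIO("h1,h2\na,1\nb,2\n")
reader = csv.reader(sample)

rows = [row for row in reader]  # the comprehension drains the iterator
data = rows[1:]                 # drop the header row

print(data)  # [['a', '1'], ['b', '2']]
```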
For more info try these:
Generators on Wiki
List Comprehensions
Good luck!
Even though this might sound like a repeated question, I have not found a solution. Well, I have a large .csv file that looks like:
prot_hit_num,prot_acc,prot_desc,pep_res_before,pep_seq,pep_res_after,ident,country
1,gi|21909,21 kDa seed protein [Theobroma cacao],A,ANSPV,L,F40,EB
1,gi|21909,21 kDa seed protein [Theobroma cacao],A,ANSPVL,D,F40,EB
1,gi|21909,21 kDa seed protein [Theobroma cacao],L,SSISGAGGGGLA,L,F40,EB
1,gi|21909,21 kDa seed protein [Theobroma cacao],D,NYDNSAGKW,W,F40,EB
....
The aim is to slice this .csv file into multiple smaller .csv files according to the last two columns ('ident' and 'country').
I have used a code from an answer in a previous post and is the following:
csv_contents = []
with open(outfile_path4, 'rb') as fin:
    dict_reader = csv.DictReader(fin)  # default delimiter is comma
    fieldnames = dict_reader.fieldnames  # save for writing
    for line in dict_reader:  # read in all of your data
        csv_contents.append(line)  # gather data into a list (of dicts)

# input to itertools.groupby must be sorted by the grouping value
sorted_csv_contents = sorted(csv_contents, key=op.itemgetter('prot_desc', 'ident', 'country'))

for groupkey, groupdata in it.groupby(sorted_csv_contents,
                                      key=op.itemgetter('prot_desc', 'ident', 'country')):
    with open(outfile_path5 + 'slice_{:s}.csv'.format(groupkey), 'wb') as fou:
        dict_writer = csv.DictWriter(fou, fieldnames=fieldnames)
        dict_writer.writerows(groupdata)
However, I need my output .csv files to contain only the column 'pep_seq'; the desired output looks like:
pep_seq
ANSPV
ANSPVL
SSISGAGGGGLA
NYDNSAGKW
What can I do?
Your code was almost correct; it just needed the fieldnames to be set correctly and extrasaction='ignore' to be set. This tells the DictWriter to only write the fields you specify:
import itertools
import operator
import csv

outfile_path4 = 'input.csv'
outfile_path5 = r'my_output_folder\output.csv'

csv_contents = []
with open(outfile_path4, 'rb') as fin:
    dict_reader = csv.DictReader(fin)  # default delimiter is comma
    fieldnames = dict_reader.fieldnames  # save for writing
    for line in dict_reader:  # read in all of your data
        csv_contents.append(line)  # gather data into a list (of dicts)

group = ['prot_desc', 'ident', 'country']
# input to itertools.groupby must be sorted by the grouping value
sorted_csv_contents = sorted(csv_contents, key=operator.itemgetter(*group))

for groupkey, groupdata in itertools.groupby(sorted_csv_contents, key=operator.itemgetter(*group)):
    # groupkey is a tuple, so use a plain '{}' placeholder
    with open(outfile_path5 + 'slice_{}.csv'.format(groupkey), 'wb') as fou:
        dict_writer = csv.DictWriter(fou, fieldnames=['pep_seq'], extrasaction='ignore')
        dict_writer.writeheader()
        dict_writer.writerows(groupdata)
This will give you an output csv file containing:
pep_seq
ANSPV
ANSPVL
SSISGAGGGGLA
NYDNSAGKW
The following would output a csv file per country containing only the field you need.
You could always add another step to group by the second field you need, I think.
import csv

# use a dict so you can store the list of pep_seqs found for each country;
# the country value will be the dict key
csv_rows_by_country = {}
with open('in.csv', 'rb') as csv_in:
    csv_reader = csv.reader(csv_in)
    next(csv_reader)  # skip the header row
    for row in csv_reader:
        if row[7] in csv_rows_by_country:
            # add this pep_seq to the list we already found for this country
            csv_rows_by_country[row[7]].append(row[4])
        else:
            # start a new list for this country - we haven't seen it before
            csv_rows_by_country[row[7]] = [row[4], ]

for country in csv_rows_by_country:
    # create a csv output file for each country and write the pep_seqs into it
    with open('out_%s.csv' % (country, ), 'wb') as csv_out:
        csv_writer = csv.writer(csv_out)
        for pep_seq in csv_rows_by_country[country]:
            csv_writer.writerow([pep_seq, ])
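As an aside, the same accumulate-by-key pattern can be written without the if/else by using collections.defaultdict; a sketch with made-up (pep_seq, country) pairs:

```python
from collections import defaultdict

# hypothetical (pep_seq, country) pairs
rows = [
    ('ANSPV', 'EB'),
    ('ANSPVL', 'EB'),
    ('NYDNSAGKW', 'PE'),
]

pep_by_country = defaultdict(list)
for pep_seq, country in rows:
    # missing keys automatically start as an empty list
    pep_by_country[country].append(pep_seq)

print(dict(pep_by_country))  # {'EB': ['ANSPV', 'ANSPVL'], 'PE': ['NYDNSAGKW']}
```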
I'm trying to learn Python but I'm stuck here, any help appreciated.
I have 2 files.
1 is a .dat file with no column headers that is fixed width containing multiple rows of data
1 is a .fmt file that contains the column headers, column length, and datatype
.dat example:
10IFKDHGHS34
12IFKDHGHH35
53IFHDHGDF33
.fmt example:
ID,2,n
NAME,8,c
CODE,2,n
Desired Output as .csv:
ID,NAME,CODE
10,IFKDHGHS,34
12,IFKDHGHH,35
53,IFHDHGDF,33
First, I'd parse the format file.
import csv

with open("format_file.fmt") as f:
    # csv.reader parses away the commas for us
    # and we get rows as nice lists
    reader = csv.reader(f)
    # this will give us a list of lists that looks like
    # [["ID", "2", "n"], ...]
    row_definitions = list(reader)

# iterate and just unpack the headers
# this gives us ["ID", "NAME", "CODE"]
header = [name for name, length, dtype in row_definitions]

# [2, 8, 2]
lengths = [int(length) for name, length, dtype in row_definitions]

# define a generator (possibly as a closure) that simply slices
# into the string bit by bit -- this yields, for example,
# characters 0 - 2, then characters 2 - 10, then 10 - 12
def parse_line(line):
    start = 0
    for length in lengths:
        yield line[start: start + length]
        start = start + length

with open(data_file) as f:
    # iterating over a file object gives us each line separately;
    # we call list on each use of parse_line to force the generator to a list
    parsed_lines = [list(parse_line(line)) for line in f]

# prepend the header
table = [header] + parsed_lines

# use the csv module again to easily output to csv
with open("my_output_file.csv", 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(table)
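To check the slicing logic on one of the sample lines without touching any files, parse_line can be run standalone with the lengths from the .fmt example:

```python
# standalone check of the slicing generator, using the sample lengths [2, 8, 2]
lengths = [2, 8, 2]

def parse_line(line):
    start = 0
    for length in lengths:
        yield line[start:start + length]
        start += length

print(list(parse_line("10IFKDHGHS34")))  # ['10', 'IFKDHGHS', '34']
```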