I need to convert a .dat file that's in a specific format into a .csv file.
The .dat file has multiple rows with a repeating structure. The data is held in braces and tagged with field names. Below is a sample record; it repeats throughout the data file:
{"name":"ABSDSDSRF","ID":"AFJDKGFGHF","lat":37,"lng":-122,"type":0,"HAC":5,"verticalAccuracy":4,"course":266.8359375,"area":"san_francisco"}
Can anyone provide a starting point for the script?
This will create a CSV, assuming each line in your .DAT file is JSON. Just order the header list to your liking:
import csv, json

header = ['ID', 'name', 'type', 'area', 'HAC', 'verticalAccuracy', 'course', 'lat', 'lng']
with open('file.DAT') as datfile:
    # In Python 3, open the CSV in text mode with newline='' (not 'wb')
    with open('output.csv', 'w', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=header)
        writer.writeheader()
        for line in datfile:
            writer.writerow(json.loads(line))
Each row is in JSON format, so you can use:
import json

data = json.loads('{"name":"ABSDSDSRF","ID":"AFJDKGFGHF","lat":37,"lng":-122,"type":0,"HAC":5,"verticalAccuracy":4,"course":266.8359375,"area":"san_francisco"}')
print(data.get('name'))
print(data.get('ID'))
This is only a starting point. You still have to iterate over the whole .dat file, and at the end write an exporter to save the data into the CSV file.
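A minimal sketch of that loop-plus-exporter, assuming each line of the .dat file is one JSON object (the demo writes the question's sample record to file.dat first so the snippet is self-contained; point it at your real file instead):

```python
import csv
import json

# Demo input: the sample record from the question, one JSON object per line
sample = ('{"name":"ABSDSDSRF","ID":"AFJDKGFGHF","lat":37,"lng":-122,'
          '"type":0,"HAC":5,"verticalAccuracy":4,"course":266.8359375,'
          '"area":"san_francisco"}\n')
with open('file.dat', 'w') as f:
    f.write(sample)

with open('file.dat') as datfile, open('output.csv', 'w', newline='') as csvfile:
    writer = None
    for line in datfile:
        record = json.loads(line)
        if writer is None:
            # Take the CSV header from the keys of the first record
            writer = csv.DictWriter(csvfile, fieldnames=list(record))
            writer.writeheader()
        writer.writerow(record)
```

This keeps the column order of the first record; reorder `fieldnames` if you want a different layout.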
Use a regex to find all of the data items. Use ast.literal_eval to convert each data item into a dictionary. Collect the items in a list.
import re, ast

result = []
s = '''{"name":"ABSDSDSRF","ID":"AFJDKGFGHF","lat":37,"lng":-122,"type":0,"HAC":5,"verticalAccuracy":4,"course":266.8359375,"area":"san_francisco"}'''
item = re.compile(r'{[^}]*?}')
for match in item.finditer(s):
    d = ast.literal_eval(match.group())
    result.append(d)
If each data item is on a separate line in the file, you don't need the regex; you can just iterate over the file.
with open('file.dat') as f:
    for line in f:
        line = line.strip()
        line = ast.literal_eval(line)
        result.append(line)
Use json.load:

import json

with open(filename) as fh:
    data = json.load(fh)
    ...
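For instance, assuming the file holds a single JSON document (json.load parses the whole file at once; for one-object-per-line files, use json.loads per line as shown above), a minimal self-contained sketch:

```python
import json

# Demo: write one JSON object to a file, then load it back.
# record.json is a made-up name for illustration.
with open('record.json', 'w') as f:
    f.write('{"name":"ABSDSDSRF","ID":"AFJDKGFGHF","lat":37}')

with open('record.json') as fh:
    data = json.load(fh)

print(data['name'])
```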
Related
I am writing and testing a function that opens a .csv file and saves the information in a dictionary, instead of using import csv.
The csv file is like:
16SAP,12/02/24,sapin-9,MATEYIJDNS
FAS1,01/02/21,fasiculata,MTYEUSOLD
EDS5,10/20/20,epsilon,MHGSJDKDLSKDKDJS
etc....
The .csv file is comma-separated and has 4 fields: identifier, date, name and sequence, respectively.
My code is:
def dicfromcsv(csv_file):
    with open('csv_file', 'r') as f:
        d = {}
        l = f.read().split(',')
        for i in l:
            values = i.split(':')
            d[values[0]] = values[1], values[2], values[3], values[4]

dicfromcsv('PDB.csv')
But it doesn't work.
Thanks in advance.
Don't quote csv_file; you want to use the value of the variable, not the literal string 'csv_file'.
Use the csv module to parse the file.
Loop over the records in the file, rather than splitting the entire file at , characters.
You can use a list slice to get all the fields of the record after the first field.
Your file only has 4 fields, so there's no values[4].
import csv

def dicfromcsv(csv_file):
    d = {}
    with open(csv_file, 'r') as f:
        csvf = csv.reader(f)
        for values in csvf:
            d[values[0]] = tuple(values[1:])
    return d
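As a quick sanity check, here is a self-contained run of that fixed function against the sample rows from the question (the demo writes PDB.csv itself so it can be executed anywhere):

```python
import csv

def dicfromcsv(csv_file):
    d = {}
    with open(csv_file, 'r') as f:
        for values in csv.reader(f):
            # First field is the key; the rest become the value tuple
            d[values[0]] = tuple(values[1:])
    return d

# Demo input matching the question's sample rows
with open('PDB.csv', 'w') as f:
    f.write('16SAP,12/02/24,sapin-9,MATEYIJDNS\n'
            'FAS1,01/02/21,fasiculata,MTYEUSOLD\n')

d = dicfromcsv('PDB.csv')
print(d['16SAP'])  # ('12/02/24', 'sapin-9', 'MATEYIJDNS')
```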
I have a csv file that looks like this:
I have a column called "Inventory"; within that column I pulled data from another source, and it landed in a dictionary-like format as you see.
What I need to do is iterate through the 1000+ lines; if the keywords comforter, sheets and pillow all exist, then write "bedding" to the "Location" column for that row, else write "home-fashions".
I have been able to get the if statement to tell me whether a row goes into bedding or "home-fashions"; I just do not know how to write the corresponding result to the "Location" field for that line.
In my script, I'm printing just to see my results, but in the end I want to write to the same CSV file.
from csv import DictReader

with open('test.csv', 'r') as read_obj:
    csv_dict_reader = DictReader(read_obj)
    for line in csv_dict_reader:
        if 'comforter' in line['Inventory'] and 'sheets' in line['Inventory'] and 'pillow' in line['Inventory']:
            print('Bedding')
            print(line['Inventory'])
        else:
            print('home-fashions')
            print(line['Inventory'])
The last column of your csv contains commas. You cannot read it using DictReader.
import re

data = []
with open('test.csv', 'r') as f:
    # Get the header row
    header = next(f).strip().split(',')
    for line in f:
        # Parse 4 columns; everything after the third comma stays in the last column
        row = re.findall('([^,]*),([^,]*),([^,]*),(.*)', line)[0]
        # Create a dictionary of one row
        item = {header[0]: row[0], header[1]: row[1], header[2]: row[2],
                header[3]: row[3]}
        # Add each row to the list
        data.append(item)
After preparing your data, you can apply your conditions.
for item in data:
    if all(x in item['Inventory'] for x in ['comforter', 'sheets', 'pillow']):
        item['Location'] = 'Bedding'
    else:
        item['Location'] = 'home-fashions'
Write output to a file.
import csv

with open('output.csv', 'w', newline='') as f:
    dict_writer = csv.DictWriter(f, data[0].keys())
    dict_writer.writeheader()
    dict_writer.writerows(data)
csv.DictReader returns a dict for each row, so just assign the new value to the column:
if 'comforter' in line['Inventory'] and ...:
    line['Location'] = 'Bedding'
else:
    line['Location'] = 'home-fashions'
print(line['Inventory'])
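To go one step further and write the updated rows back out, as the question asks, one possible pattern (not the only one) is read-all, modify, rewrite. The file name and column names below are assumed from the question; the demo creates its own test.csv so the sketch is runnable:

```python
import csv

# Demo CSV mirroring the question's assumed columns
with open('test.csv', 'w', newline='') as f:
    f.write('SKU,Location,Inventory\n'
            'A1,,"comforter, sheets, pillow"\n'
            'A2,,"lamp, rug"\n')

# Read everything into memory first, since we rewrite the same file
with open('test.csv', newline='') as f:
    rows = list(csv.DictReader(f))

for row in rows:
    keywords = ('comforter', 'sheets', 'pillow')
    if all(k in row['Inventory'] for k in keywords):
        row['Location'] = 'Bedding'
    else:
        row['Location'] = 'home-fashions'

# Rewrite the same file with the updated Location column
with open('test.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
```

For 1000+ rows this is perfectly fine; only for very large files would you want to stream to a temporary file instead.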
I am trying to combine multiple rows in a csv file together. I could easily do it in Excel, but I want to do this for hundreds of files, so I need it as code. I have tried storing rows in arrays, but it doesn't seem to work. I am using Python.
So lets say I have a csv file;
1,2,3
4,5,6
7,8,9
All I want to do is to have a csv file as this;
1,2,3,4,5,6,7,8,9
The code I have tried is this;
fin = open("C:\\1.csv", 'r+')
fout = open("C:\\2.csv", 'w')
for line in fin.xreadlines():
    new = line.replace(',', ' ', 1)
    fout.write(new)
fin.close()
fout.close()
Could you please help?
You should be using the csv module for this, as splitting CSV manually on commas is very error-prone (single columns can contain strings with commas, and you would incorrectly end up splitting them into multiple columns). The csv module uses lists of values to represent single rows.
import csv

def return_contents(file_name):
    with open(file_name) as infile:
        reader = csv.reader(infile)
        return list(reader)

data1 = return_contents('csv1.csv')
data2 = return_contents('csv2.csv')

print(data1)
print(data2)

combined = []
for row in data1:
    combined.extend(row)
for row in data2:
    combined.extend(row)

with open('csv_out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(combined)
That code gives you the basis of the approach, but it would be ugly to extend it for hundreds of files. Instead, you probably want os.listdir to pull all the files in a single directory, one by one, and add them to your output. This is why I packed the reading code into the return_contents function: we can repeat the same reading process on any number of files with only one set of code. Something like this:
import csv
import os

def return_contents(file_name):
    with open(file_name) as infile:
        reader = csv.reader(infile)
        return list(reader)

all_files = os.listdir('my_csvs')
combined_output = []

for file in all_files:
    data = return_contents('my_csvs/{}'.format(file))
    for row in data:
        combined_output.extend(row)

with open('csv_out.csv', 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(combined_output)
If you are dealing specifically with the csv file format, I recommend using the csv package for the file operations. If you also use the with...as statement, you don't need to worry about closing the file. You just need to define PATH, and the program will iterate over all .csv files.
Here is what you can do:
import csv
import os

PATH = "your folder path"

def order_list():
    data_list = []
    for filename in os.listdir(PATH):
        if filename.endswith(".csv"):
            # Open the file that was found, not a hard-coded name
            with open(os.path.join(PATH, filename)) as csvfile:
                read_csv = csv.reader(csvfile, delimiter=',', quoting=csv.QUOTE_NONNUMERIC)
                for row in read_csv:
                    data_list.extend(row)
    print(data_list)

if __name__ == '__main__':
    order_list()
Store your data in a pandas DataFrame:
import pandas as pd
df = pd.read_csv('file.csv')
Store the modified dataframe in a new one:
df_2 = df.groupby('Column_Name').agg(lambda x: ' '.join(x)).reset_index() ## Write Name of your column
Write the df to a new csv:
df_2.to_csv("file_modified.csv")
You could do it also like this:
fIn = open("test.csv", "r")
fOut = open("output.csv", "w")
fOut.write(",".join([line for line in fIn]).replace("\n",""))
fIn.close()
fOut.close()
If you now want to run it on multiple files, you can run it as a script with arguments:
import sys
fIn = open(sys.argv[1], "r")
fOut = open(sys.argv[2], "w")
fOut.write(",".join([line for line in fIn]).replace("\n",""))
fIn.close()
fOut.close()
Now, assuming you use some Linux system and the script is called csvOnliner.py, you could call it with:
for i in *.csv; do python csvOnliner.py $i changed_$i; done
On Windows you could do it like this:
FOR %i IN (*.csv) DO csvOnliner.py %i changed_%i
I have a data file in csv format that consists of Pokemon names and statistics. I want to read it into Python as a matrix. The column headers are the first row of the data table, columns are separated by commas, and rows are separated by "\n".
pokedex_file = 'pokedex_basic.csv'
with open(pokedex_file, 'r') as f:
raw_pd = f.read()
is the code I have, but I am running out of memory when using line.strip(). Any suggestions?
Python has a package called csv which makes it very easy to parse csv files.
If your CSV file has headers, like
Name,Type
Charizard,Fire/Dragon
Pikachu,Electric
then you can use the DictReader tool from csv to parse your file.
import csv
with open('pokemon.csv', 'r') as pokedex:
    reader = csv.DictReader(pokedex)
    for line in reader:  # line is a dict representing this line of data
        print(line)
        current_name = line['Name']
        current_type = line['Type']
        print("The pokemon {:s} has type {:s}".format(current_name, current_type))
Output:
{'Name': 'Charizard', 'Type': 'Fire/Dragon'}
The pokemon Charizard has type Fire/Dragon
{'Name': 'Pikachu', 'Type': 'Electric'}
The pokemon Pikachu has type Electric
Depending on how it is stored, you may be able to read it using DictReader.
import csv

with open('/path-name.csv', 'r') as input_file:  # don't shadow the builtin input
    reader = csv.DictReader(input_file)
    for dataDict in reader:
        # do stuff with dataDict
        stats = dataDict['pokemon_name']
I'm trying to create a list of dictionaries from my .csv file. I want to make the first row of the file the dictionary keys, with the corresponding values in the columns under them as their values. This worked perfectly with a .txt file. When I try it with the .csv format, I get issues calling a specific key, so I don't think it's working properly.
newqstars = [meteor['M_P'] for meteor in kept2]
>>>KeyError: 'M_P'
I've been trying other methods all day, such as DictReader() and csv.reader(), but they don't work, so I'll just ask how I can modify what I have below to handle a .csv:
def example_05(filename):
    with open(filename, 'r') as file:
        data = file.readlines()
    header, data = data[0].split(), data[1:]
    # convert each line to a dict, using header words as keys
    global kept2
    kept2 = []
    for line in data:
        line = [to_float(term) for term in line.split()]
        kept2.append(dict(zip(header, line)))

if __name__ == '__main__':
    example_05('Geminids.csv')
DictReader is the way to go here:
import csv
with open('summ.csv') as csvfile:
    reader = csv.DictReader(csvfile)
    kept2 = [row for row in reader]
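A quick self-contained check that the keys are then usable the way the question wants (the M_P column name comes from the question's KeyError; the demo data is made up):

```python
import csv

# Demo file with an M_P column, as referenced in the question's traceback
with open('summ.csv', 'w', newline='') as f:
    f.write('M_P,vel\n1.2,35\n3.4,72\n')

with open('summ.csv', newline='') as csvfile:
    kept2 = list(csv.DictReader(csvfile))

# This is the lookup that raised KeyError with the .txt-style parser
newqstars = [meteor['M_P'] for meteor in kept2]
print(newqstars)  # ['1.2', '3.4'] -- DictReader returns strings
```

Note that DictReader gives you strings; convert with float() if you need numbers.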