Parsing a CSV data file in Python

I have a data file in CSV format that consists of Pokemon names and statistics. I want to read it into Python as a matrix. The column headers are the first row of the data table, columns are separated by commas, and rows are separated by "\n".
pokedex_file = 'pokedex_basic.csv'
with open(pokedex_file, 'r') as f:
    raw_pd = f.read()
is the code I have, but I run out of memory when I use line.strip(). Any suggestions?

Python's standard library has a module called csv, which makes it very easy to parse CSV files.
If your CSV file has headers, like
Name,Type
Charizard,Fire/Dragon
Pikachu,Electric
then you can use the DictReader tool from csv to parse your file.
import csv
with open('pokemon.csv', 'r') as pokedex:
    reader = csv.DictReader(pokedex)
    for line in reader:  # line is a dict representing this row of data
        print(line)
        current_name = line['Name']
        current_type = line['Type']
        print("The pokemon {:s} has type {:s}".format(current_name, current_type))
Output:
{'Name': 'Charizard', 'Type': 'Fire/Dragon'}
The pokemon Charizard has type Fire/Dragon
{'Name': 'Pikachu', 'Type': 'Electric'}
The pokemon Pikachu has type Electric
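If you want the "matrix" the question asks for (a list of rows rather than one dict per row), a minimal sketch with csv.reader could look like the following. The helper name read_matrix is made up for illustration, and io.StringIO stands in for the real file so that the snippet runs on its own.

```python
import csv
import io

def read_matrix(f):
    """Return (header, rows) read from an open CSV file object."""
    reader = csv.reader(f)
    header = next(reader)  # the first row holds the column names
    rows = list(reader)    # every remaining row, each a list of strings
    return header, rows

# Stand-in for open('pokedex_basic.csv', newline=''); the two rows are
# the sample data from the answer above, not the real pokedex file.
sample = io.StringIO("Name,Type\nCharizard,Fire/Dragon\nPikachu,Electric\n")
header, rows = read_matrix(sample)
print(header)
print(rows)
```

All values come back as strings, so numeric statistics would still need int() or float(). csv.reader reads lazily, so if memory is tight you can loop over the reader instead of calling list() on it.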

Depending on how it is stored, you may be able to read it using csv.DictReader.
import csv
with open('/path-name.csv', 'r') as infile:  # "infile" avoids shadowing the built-in input()
    reader = csv.DictReader(infile)
    for dataDict in reader:
        # do stuff with dataDict; 'pokemon_name' stands for one of your column headers
        stats = dataDict['pokemon_name']

Related

Failing to write specific data from a csv file into a json file

I want to get only the values of 3 columns from a CSV file and write those values to a JSON file. I also want to remove all rows where the latitude is empty.
I am trying to avoid Pandas, if possible, so that I don't have to load more libraries.
I manage to read the CSV file and print the values from the columns I want (Latitude, Longitude and (LastUL)SNR). But when I print the data, the order is not the one I have written in the code.
Here is an example of the output:
[{'-8.881253', '-110', '38.569244417'}]
[{'-8.881253', '-110', '38.569244417'}, {'-8.910678', '-122', '38.6256140816'}]
[{'-8.881253', '-110', '38.569244417'}, {'-8.910678', '-122', '38.6256140816'}, {'38.6256782222', '-127', '-8.913913'}]
I am also failing to dump the data into a JSON file, although I have used the code as is in this link on other occasions and it worked fine.
As I am new to Python, I don't understand why this is happening.
Insights would be much appreciated.
The CSV file example is the following:
MAC LORA,Nº Série,Modelo,Type,Freguesia,PT,Latitude-Instalação,Longitude-Instalação,Latitude - API,Longitude - API,Latitude,Longitude,Data Instalação,Semana,Instalador,total instaladas,TYPO,Instaladas,Registadas no NS,Registadas Arquiled,Registadas no Máximo,Último UL,Último JoinRequest,Último Join Accept,(LastUL)RSSI,(LastUL)SNR,JR(RSSI),JR(SNR),Mesmo Poste,Substituidas,Notas,Issue,AVARIAS
0004A30B00FB82F0,202103000002777,OCTANS 40,Luminária,Freguesia de PALMELA,PT1508D2052900,38.569244417,-8.88123655,38.569244,-8.881253,38.569244417,-8.88123655,2022-04-11,2022W15,,,,1.0,1,1,1,2022-07-25 06:16:47,2022-08-10 06:18:45,2022-07-25 21:33:41,-110,"7,2","-115,00","-0,38",,,,Sem JA,
0004A30B00FA89D1,PF_A0000421451,I-TRON Zero 2Z8 4.30-3M,Luminária,Freguesia de PINHAL NOVO,PT1508D2069100,38.6256140816,-8.9107094238,38.625622,-8.910678,38.6256140816,-8.9107094238,2022-03-10,2022W10,,,,1.0,1,1,1,2022-08-10 06:31:29,2022-08-09 22:18:17,2022-08-09 22:18:17,-122,0,"-121,60","-3,00",,,,Ok,
0004A30B00FAB0D9,PF_A0000421452,I-TRON Zero 2Z8 4.30-3M,Luminária,Freguesia de PINHAL NOVO,PT1508D2026300,38.6256782222,-8.91389057769,38.625687,-8.913913,38.6256782222,-8.91389057769,2022-03-10,2022W10,,,,1.0,1,1,1,2022-07-22 06:16:25,00:00:00,2022-07-27 06:29:46,-127,"-15,5",0,0,,,,Sem JR,
The Python code is as follows:
import json
import csv
from csv import DictReader
from json import dumps
csvFilePath = "csv_files/test.csv"
jsonFilePath = "rssi.json"
try:
    with open(csvFilePath, 'r') as csvFile:
        reader = csv.reader(csvFile)
        next(reader)
        data = []
        for row in reader:
            data.append({row[9], row[10], row[24]})
            print(data)
    with open(jsonFilePath, "w") as outfile:
        json.dump(data, outfile, indent=4, ensure_ascii=False)
except:
    print("Something didn't go as planned!")
else:
    print("Successfully exported the files to json!")
It prints the right columns but in the wrong order (I want Latitude, Longitude and then lastULSNR), and after that it doesn't write to the JSON file.
Curly braces in {row[9], row[10], row[24]} create a set in Python. A set doesn't preserve order; it only keeps the unique values. A set is also not serializable to JSON, so json.dump raises a TypeError, which the bare except in your code hides.
Try using tuples or lists instead, e.g. (row[9], row[10], row[24]).
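To see the failure on its own, here is a small sketch. The three values are copied from the first line of the output in the question, and the variable names are only illustrative.

```python
import json

# Three string fields, as csv.reader would yield them.
row = ["-8.881253", "38.569244417", "-110"]

as_set = {row[0], row[1], row[2]}   # unordered, not JSON serializable
as_list = [row[0], row[1], row[2]]  # keeps the order you wrote

try:
    json.dumps(as_set)
    set_error = None
except TypeError as exc:
    # json cannot encode a set; a bare "except:" would swallow this silently
    set_error = str(exc)

print(set_error)
print(json.dumps(as_list))
```

Catching a specific exception and printing it, instead of using a bare except, would have shown this error message right away.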
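You could also use a dict to make your code and output more readable:
entry = {
    "Latitude": row[9],
    "Longitude": row[10],
    "lastULSNR": row[24]
}
if entry["Latitude"]:
    # latitude is not empty, so add the entry to the output
    data.append(entry)
print(data)
# With the question's indices and its first sample row this prints:
# [{'Latitude': '-8.881253', 'Longitude': '38.569244417', 'lastULSNR': '-110'}]
# The values are strings. In the sample header, indices 9, 10 and 24 are
# 'Longitude - API', 'Latitude' and '(LastUL)RSSI'; the columns 'Latitude',
# 'Longitude' and '(LastUL)SNR' are at indices 10, 11 and 25.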
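Counting column positions by hand is easy to get wrong with 33 columns, so one option is to let csv.DictReader look the columns up by header name. The sketch below assumes the header names from the question's sample. The sample here is trimmed to five columns, and its second row (with an empty latitude) is invented to show the filter; extract_points is a hypothetical helper name.

```python
import csv
import io
import json

def extract_points(f):
    """Collect Latitude, Longitude and (LastUL)SNR from rows that have a latitude."""
    points = []
    for row in csv.DictReader(f):
        if not row["Latitude"]:
            continue  # skip rows where the latitude is empty
        points.append({
            "Latitude": row["Latitude"],
            "Longitude": row["Longitude"],
            "lastULSNR": row["(LastUL)SNR"],
        })
    return points

# Trimmed stand-in for csv_files/test.csv; the second row is made up.
sample = io.StringIO(
    'MAC LORA,Latitude,Longitude,(LastUL)RSSI,(LastUL)SNR\n'
    '0004A30B00FB82F0,38.569244417,-8.88123655,-110,"7,2"\n'
    '0004A30B00FA89D1,,,-122,0\n'
)
points = extract_points(sample)
print(json.dumps(points, indent=4, ensure_ascii=False))
```

With a real file you would pass open(csvFilePath, newline='') to the helper and use json.dump(points, outfile, indent=4, ensure_ascii=False) to write the result.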

Checking CSV data for integer then removing that int

I'm looking to break up and check individual cells from a CSV file that was exported from Excel, using Python 3.8. For example, I have a CSV file with the information Honda 1, Toyota 2, Nissan 3... I want to check each cell (I'm not sure what to call the data between the comma delimiters) for an integer, then remove it from the cell and put it in its own cell. The CSV would then read Honda, 1, Toyota, 2, Nissan, 3... The main goal is to get those integers into a column next to the manufacturers in Excel.
I am pretty new to Python but have some coding background. The logic I was thinking of would be something along the lines of: if a character is an int, add it to the new file, else add N/A. My main problem is using the data in a CSV file to do it. I thought about putting the data from the CSV into a variable, but the real CSV file has over 20,000 cells, so I'm not sure that would be very efficient.
So far my code looks like this:
import csv
path = '/Users/testFolder/Test.csv'
new_path = '/Users/testFolder/Test2.csv'
test_file = open(path,'r')
data = test_file.read()
write_file = open(new_path,'w')
write_file.write(data)
print(data)
file = csv.reader(open(path), delimiter = ',')
for line in file:
    print(line)
test_file.close()
write_file.close()
Assuming the parts of each item are separated by one or more spaces, you can process the file a row at a time (instead of reading the whole file into memory) like this:
import csv
path = 'remove_test.csv'
new_path = 'remove_test2.csv'
with open(path, 'r', newline='') as test_file, \
     open(new_path, 'w', newline='') as write_file:
    reader = csv.reader(test_file, delimiter=',')
    writer = csv.writer(write_file, delimiter=',')
    for row in reader:
        # split every cell on whitespace and flatten the parts into one row
        new_row = [part for item in row for part in item.split()]
        writer.writerow(new_row)
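The question also asks for N/A when a cell has no integer, and a plain split() would break up names that contain spaces. A possible variation, under the assumption that the integer is always the last space-separated part of the cell, is sketched below. split_trailing_int and expand_row are hypothetical helper names, and io.StringIO stands in for the two files.

```python
import csv
import io

def split_trailing_int(cell):
    """Split 'Honda 1' into ('Honda', '1'); use 'N/A' when there is no trailing integer."""
    text = cell.strip()
    name, _, last = text.rpartition(" ")
    if name and last.isdigit():
        return name.rstrip(), last
    return text, "N/A"

def expand_row(row):
    """Replace every cell with two cells: the name and its integer (or 'N/A')."""
    out = []
    for cell in row:
        out.extend(split_trailing_int(cell))
    return out

source = io.StringIO("Honda 1,Toyota 2,Nissan\n")  # made-up sample row
target = io.StringIO()
writer = csv.writer(target, lineterminator="\n")
for row in csv.reader(source):
    writer.writerow(expand_row(row))
print(target.getvalue())
```

Because the reader and writer work one row at a time, this stays cheap even for files much larger than 20,000 cells.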

Parsing .DAT file with Python

I need to convert a .dat file that's in a specific format into a .csv file.
The .dat file has multiple rows with a repeating structure. The data is held in braces and has tags. Below is the sample data; it repeats throughout the data file:
{"name":"ABSDSDSRF","ID":"AFJDKGFGHF","lat":37,"lng":-122,"type":0,"HAC":5,"verticalAccuracy":4,"course":266.8359375,"area":"san_francisco"}
Can anyone provide a starting point for the script?
This will create a CSV file, assuming each line in your .DAT file is JSON. Just order the header list to your liking:
import csv, json
header = ['ID', 'name', 'type', 'area', 'HAC', 'verticalAccuracy', 'course', 'lat', 'lng']
with open('file.DAT') as datfile:
    # in Python 3 the csv module expects a text-mode file opened with newline=''
    with open('output.csv', 'w', newline='') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=header)
        writer.writeheader()
        for line in datfile:
            writer.writerow(json.loads(line))
Your row is in JSON format, so you can use:
import json
data = json.loads('{"name":"ABSDSDSRF","ID":"AFJDKGFGHF","lat":37,"lng":-122,"type":0,"HAC":5,"verticalAccuracy":4,"course":266.8359375,"area":"san_francisco"}')
print(data.get('name'))
print(data.get('ID'))
This is only a starting point. You have to iterate over the whole .dat file and, at the end, write an exporter that saves the data to the CSV file.
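One way the "iterate, then export" steps could fit together is sketched below, assuming one JSON object per line. Instead of a hard-coded header it takes the column names from the keys of the first record, which is a design choice of this sketch and not something the question specifies. dat_to_csv is a hypothetical helper name, and io.StringIO stands in for the input and output files.

```python
import csv
import io
import json

def dat_to_csv(datfile, csvfile):
    """Write one CSV row per JSON line; columns come from the first record's keys."""
    writer = None
    for line in datfile:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        record = json.loads(line)
        if writer is None:
            writer = csv.DictWriter(csvfile, fieldnames=list(record), lineterminator="\n")
            writer.writeheader()
        writer.writerow(record)

dat = io.StringIO(
    '{"name":"ABSDSDSRF","ID":"AFJDKGFGHF","lat":37,"lng":-122,"type":0,'
    '"HAC":5,"verticalAccuracy":4,"course":266.8359375,"area":"san_francisco"}\n'
)
out = io.StringIO()
dat_to_csv(dat, out)
print(out.getvalue())
```

Note that DictWriter raises a ValueError by default if a later record has a key that was not in the first one; passing extrasaction='ignore' would drop such keys instead.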
Use a regex to find all of the data items. Use ast.literal_eval to convert each data item into a dictionary. Collect the items in a list.
import re, ast
result = []
s = '''{"name":"ABSDSDSRF","ID":"AFJDKGFGHF","lat":37,"lng":-122,"type":0,"HAC":5,"verticalAccuracy":4,"course":266.8359375,"area":"san_francisco"}'''
item = re.compile(r'{[^}]*?}')
for match in item.finditer(s):
    d = ast.literal_eval(match.group())
    result.append(d)
If each data item is on a separate line in the file, you don't need the regex; you can just iterate over the file.
with open('file.dat') as f:
    for line in f:
        line = line.strip()
        line = ast.literal_eval(line)
        result.append(line)
Use json.load (this works when the whole file is a single JSON document):
import json
with open(filename) as fh:
    data = json.load(fh)
...
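Since the question says the structure repeats throughout the file, json.load on the whole file may not be enough. The sketch below shows the difference under the assumption that there is one object per line. The records are shortened, the second one is invented, and load_json_lines is a hypothetical helper name.

```python
import io
import json

def load_json_lines(f):
    """Parse a file that holds one JSON object per line into a list of dicts."""
    records = []
    for line in f:
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records

# Shortened stand-in for the .dat file; the second record is made up.
sample = io.StringIO(
    '{"name":"ABSDSDSRF","ID":"AFJDKGFGHF","lat":37,"lng":-122}\n'
    '{"name":"HYPOTHETICAL","ID":"SECONDROW","lat":38,"lng":-121}\n'
)

try:
    json.load(sample)  # expects exactly one JSON document in the file
    whole_file_ok = True
except json.JSONDecodeError:
    whole_file_ok = False

sample.seek(0)
records = load_json_lines(sample)
print(whole_file_ok, len(records))
```

If your file really contains a single JSON document (for example one big array), json.load is the simpler choice.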

Python Read Text File Column by Column

So I have a text file that looks like this:
1,989785345,"something 1",,234.34,254.123
2,234823423,"something 2",,224.4,254.123
3,732847233,"something 3",,266.2,254.123
4,876234234,"something 4",,34.4,254.123
...
I'm running this code right here:
file = open("file.txt", 'r')
readFile = file.readline()
lineID = readFile.split(",")
print(lineID[1])
This lets me break up the content in my text file by "," but what I want to do is separate it into columns because I have a massive number of IDs and other things in each line. How would I go about splitting the text file into columns and call each individual row in the column one by one?
You have a CSV file; use the csv module to read it:
import csv
with open('file.txt', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        ...  # process each row here
This still gives you data by row, but with the zip() function you can transpose this to columns instead:
import csv
with open('file.txt', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for column in zip(*reader):
        ...  # process each column here
Do be careful with the latter; the whole file will be read into memory in one go, and a large CSV file could eat up all your available memory in the process.
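If you only need one or two columns, a generator that picks a field out of each row avoids holding the whole file in memory. The sketch below compares it with the zip() transpose on the first two sample lines from the question. The helper name column is made up, and io.StringIO stands in for file.txt.

```python
import csv
import io

def column(f, index):
    """Yield the values of one column, reading the file a row at a time."""
    for row in csv.reader(f):
        yield row[index]

sample = io.StringIO(
    '1,989785345,"something 1",,234.34,254.123\n'
    '2,234823423,"something 2",,224.4,254.123\n'
)

ids = list(column(sample, 1))  # second column only

sample.seek(0)
columns = list(zip(*csv.reader(sample)))  # full transpose, reads every row

print(ids)
print(columns[2])
```

The generator only ever keeps one row in memory, so you can loop over column(f, 1) directly instead of building a list.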

Parsing a pipe-delimited file in Python

I'm trying to parse a pipe-delimited file and pass the values into a list, so that later I can print selective values from the list.
The file looks like:
name|age|address|phone|||||||||||..etc
It has more than 100 columns.
Use the 'csv' library.
First, register your dialect:
import csv
csv.register_dialect('piper', delimiter='|', quoting=csv.QUOTE_NONE)
Then, use your dialect on the file:
with open(myfile, newline='') as csvfile:
    for row in csv.DictReader(csvfile, dialect='piper'):
        print(row['name'])
Use Pandas:
import pandas as pd
df = pd.read_csv(filename, sep="|")
This will store the file in a DataFrame. For each column, you can apply conditions to select the required values to print. It takes a very short time to execute. I tried with 111,047 rows.
If you're parsing a very simple file that won't contain any | characters in the actual field values, you can use split:
fileHandle = open('file', 'r')
for line in fileHandle:
    fields = line.rstrip('\n').split('|')  # drop the newline so the last field is clean
    print(fields[0])  # prints the first field's value
    print(fields[1])  # prints the second field's value
fileHandle.close()
A more robust way to parse tabular data would be to use the csv library as mentioned in Spencer Rathbun's answer.
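To illustrate where split() and the csv module part ways, here is a small sketch with an invented record whose first field contains a pipe inside quotes. It uses csv.reader with its default quoting, which is an assumption about how your file escapes such values; a dialect registered with quoting=csv.QUOTE_NONE would not treat the quotes specially.

```python
import csv
import io

# Made-up data: the first field of the second line contains a quoted pipe.
text = 'name|age|address\n"Smith | Sons"|40|"1 Main St"\n'

# Naive approach: split every line on the pipe character.
naive = [line.split('|') for line in text.splitlines()]

# csv approach: the reader understands the quoted field.
parsed = list(csv.reader(io.StringIO(text), delimiter='|'))

print(naive[1])
print(parsed[1])
```

The naive version returns four fields for the second line and keeps the quote characters, while the csv reader returns the three intended values.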
In 2022, with Python 3.8 or above, you can simply do:
import csv
with open(file_path, "r", newline="") as csvfile:
    reader = csv.reader(csvfile, delimiter='|')
    for row in reader:
        print(row[0], row[1])
