first time poster - long time lurker.
I'm a little rough around the edges when it comes to Python and I'm running into an issue that I imagine has an easy fix.
I have a CSV file that I'm looking to read and perform a semi-advanced lookup.
When I inspect the CSV in VS Code, it isn't really comma-delimited except for the last "column".
Example (a screenshot of the raw file format was attached here).
The code that runs into the issue is:
import csv
import sys
from util import Node, StackFrontier, QueueFrontier
names = {}
people = {}
titles = {}
def load_data(directory):
    with open(f"{directory}/file.csv", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        for row in reader:
            people[row["id"]] = {
                "primaryName": row["primaryName"],
                "birthYear": row["birthYear"],
                "titles": set()
            }
            if row["primaryName"].lower() not in names:
                names[row["primaryName"].lower()] = {row["nconst"]}
            else:
                names[row["primaryName"].lower()].add(row["nconst"])
The error I receive is:
File "C:\GitHub\Project\data-test.py", line 24, in load_data
"primaryName": row["primaryName"],
~~~^^^^^^^^^^^^^^^
KeyError: 'primaryName'
I've tried this with other CSV files that are comma-delimited (a screenshot example was attached here).
And that works perfectly fine. I noticed the problematic CSV file has the names wrapped in double quotes, which I imagine could be part of the solution.
Ultimately, if I can get it to work with the code above, that would be great. Otherwise, is there an easy way to automatically reformat the CSV file so that the names are quoted and the values are comma-separated, like the CSV that works above?
Thanks in advance for any help.
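For what it's worth, one possible direction, sketched here under the assumption that the problematic file is actually tab-delimited (the primaryName/birthYear/nconst columns look like a tab-separated IMDB-style export): tell DictReader which delimiter to use instead of reformatting the file.
import csv

def load_data(directory):
    # Assumption: the file is tab-separated rather than comma-separated;
    # adjust the delimiter to match whatever the raw file actually uses.
    with open(f"{directory}/file.csv", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter="\t")
        for row in reader:
            print(row["primaryName"], row["birthYear"], row["nconst"])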
I am new here and trying to solve an interesting question about World of Tanks. I heard that every battle's data is stored on the client's disk in the Wargaming.net folder, and I want to do some batch analysis of our clan's battle performance.
(a screenshot of the .dat files in the Wargaming.net folder was attached here)
It is said that these .dat files are some kind of JSON file, so I tried a couple of lines of Python to read one, but it failed.
import json
f = open('ex.dat', 'r', encoding='unicode_escape')
content = f.read()
a = json.loads(content)
print(type(a))
print(a)
f.close()
The code is very simple and clearly doesn't work. Could anyone tell me what these files actually contain?
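One quick way to see what the file really contains is to look at the first few raw bytes before assuming it is JSON (a small sketch; 'ex.dat' is just the example file from above):
# Peek at the start of the file: JSON would begin with '{' or '[',
# while pickled or compressed data will look like arbitrary bytes.
with open('ex.dat', 'rb') as f:
    head = f.read(16)
print(head)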
Added on Feb. 9th, 2022
After trying another snippet via a Jupyter Notebook, it seems that something can be read from the .dat files:
import struct
import numpy as np
import matplotlib.pyplot as plt
import io

with open('C:/Users/xukun/Desktop/br/ex.dat', 'rb') as f:
    fbuff = io.BufferedReader(f)
    N = len(fbuff.read())
    print('byte length: ', N)

with open('C:/Users/xukun/Desktop/br/ex.dat', 'rb') as f:
    data = struct.unpack('b' * N, f.read(1 * N))
The result is a tuple of signed byte values, but I have no idea how to deal with it now.
Here's how you can parse some parts of it.
import pickle
import zlib
file = '4402905758116487.dat'
cache_file = open(file, 'rb')  # This could be improved so the file isn't kept open.
# When loading pickles written by Python 2 into Python 3, you need to use the "bytes" or "latin1" encoding.
legacyBattleResultVersion, brAllDataRaw = pickle.load(cache_file, encoding='bytes', errors='ignore')
arenaUniqueID, brAccount, brVehicleRaw, brOtherDataRaw = brAllDataRaw
# The data stored inside the pickled file will be a compressed pickle again.
vehicle_data = pickle.loads(zlib.decompress(brVehicleRaw), encoding='latin1')
account_data = pickle.loads(zlib.decompress(brAccount), encoding='latin1')
brCommon, brPlayersInfo, brPlayersVehicle, brPlayersResult = pickle.loads(zlib.decompress(brOtherDataRaw), encoding='latin1')
# Lastly you can print all of these and see a lot of data inside.
The decoded result contains a mixture of more binary blobs as well as some data captured from the replays.
This is not a complete solution but it's a decent start to parsing these files.
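As a rough follow-up sketch (the exact structure of these objects varies by game version, so this only inspects what came back rather than assuming field positions):
# Inspect the decoded pieces before trying to map them to named fields.
print('arenaUniqueID:', arenaUniqueID)
print('account_data:', type(account_data), len(account_data))
print('vehicle_data:', type(vehicle_data), len(vehicle_data))
print('players info sample:', list(brPlayersInfo)[:5])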
First you can look at the replay file itself in a text editor, although there is code at the beginning of the file that has to be cleaned out. Then there is a ton of info that you have to read in and figure out, which turns out to be the stats for each player in the game. Only after that does it get to the part that deals with the actual replay, and you don't need that stuff.
You can grab the player IDs and tank IDs from the WoT developer area API if you want.
After loading the pickle files like gabzo mentioned, you will see that it is simply a list of values, and without knowing what each value refers to, it's hard to make sense of it. The identifiers for the values can be extracted from your game installation:
import zipfile
WOT_PKG_PATH = "Your/Game/Path/res/packages/scripts.pkg"
BATTLE_RESULTS_PATH = "scripts/common/battle_results/"
archive = zipfile.ZipFile(WOT_PKG_PATH, 'r')

for file in archive.namelist():
    if file.startswith(BATTLE_RESULTS_PATH):
        archive.extract(file)
You can then decompile the Python files (for example with uncompyle6) and go through the code to see the identifiers for the values.
One thing to note is that the list of values for the main pickle objects (like brAccount from gabzo's code) always has a checksum as the first value. You can use this to check whether you have the right order and the correct identifiers for the values. The way these checksums are generated can be seen in the decompiled python files.
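As a hedged sketch of what that mapping can look like once you have the identifiers (FIELD_NAMES below is a placeholder for whatever list you extract from the decompiled battle_results code):
# FIELD_NAMES is assumed to come from the decompiled scripts/common/battle_results sources.
FIELD_NAMES = ["fieldA", "fieldB", "fieldC"]  # placeholder names, not the real identifiers

checksum, *values = account_data              # the first element is the checksum
record = dict(zip(FIELD_NAMES, values))       # pair each identifier with its value
print(record)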
I have been tackling this problem for some time (albeit in Rust): https://github.com/dacite/wot-battle-results-parser/tree/main/datfile_parser.
I am currently stuck on a Python project where I want to get the information out of an online CSV file and let the user search it via an input function. At the moment I am able to get the information from the online CSV file via a link, but I cannot make the connection so that it searches for the exact word in that CSV file.
I have tried multiple tutorials, but most of them don't solve my issue. So, with a lot of pain, I am writing this message here, hoping someone can help me out.
The code I have so far is:
import csv
import urllib.request
metar_search = input('Enter the ICAO station\n')
url = 'https://www.aviationweather.gov/adds/dataserver_current/current/metars.cache.csv'
response = urllib.request.urlopen(url)
lines = [l.decode('utf-8') for l in response.readlines()]
cr = csv.reader(lines)
for row in cr:
    if metar_search == row[0]:
        print(row)
In the CSV file, the first field of each row is what I am looking for: it holds the METAR information of an airport. So, I want the user to type the ICAO code (for example KJFK), and then I want to print the line of text with the weather information of that station (for example: KJFK 051851Z 15010KT 10SM FEW017 FEW035 FEW250 27/19 A3006 RMK AO2 SLP177 T02670194).
When I currently type KJFK, it does not return any information.
The current code is probably a bit messy because I have tried several things; I also tried to turn it into a function, but without luck. What am I doing wrong?
I hope someone is able to help me out with this question.
Thank you so much in advance.
Try
...
for row in cr:
    if row[0].startswith(metar_search):
        print(row)
or
...
lines = [l.decode('utf-8') for l in response.readlines()[5:]]
cr = csv.reader(lines)

for row in cr:
    if metar_search == row[1]:
        print(row)
Hint: Take a closer look at the data.
If you know that there's only one result then you could stop searching after you found the row:
...
        print(row)
        break
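Putting those hints together, a complete sketch might look like the following; the five skipped preamble lines and the column positions are assumptions about the current layout of metars.cache.csv, so double-check them against the actual file:
import csv
import urllib.request

metar_search = input('Enter the ICAO station\n').upper()
url = 'https://www.aviationweather.gov/adds/dataserver_current/current/metars.cache.csv'

response = urllib.request.urlopen(url)
lines = [l.decode('utf-8') for l in response.readlines()[5:]]  # skip the preamble lines

for row in csv.reader(lines):
    if len(row) > 1 and row[1] == metar_search:  # column 1 appears to hold the station ID
        print(row[0])                            # column 0 appears to hold the raw METAR text
        break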
Objective
I'm trying to extract the GPS "Latitude" and "Longitude" data from a bunch of JPGs, and I have been successful so far, but my main problem is that when I try to write the coordinates to a text file, only one set of coordinates gets written, even though my console output shows that every image was processed. Here is an example: (console output screenshot), and here is my text file, which is supposed to mirror my console output: (text file screenshot).
I don't fully understand what the problem is and why it won't just write all of them instead of one. I believe the data is being overwritten somehow, or the GPSPhoto module is causing some issues.
Code
from glob import glob
from GPSPhoto import gpsphoto
# Scan jpg's that are located in the same directory.
data = glob("*.jpg")
# Scan contents of images and GPS values.
for x in data:
    data = gpsphoto.getGPSData(x)
    data = [data.get("Latitude"), data.get("Longitude")]
    print("\nsource: {}".format(x), "\n ↪ {}".format(data))

# Write coordinates to a text file.
with open('output.txt', 'w') as f:
    print('Coordinates:', data, file=f)
I have tried pretty much everything that I can think of including: changing the write permissions, not using glob, no loops, loops, lists, no lists, different ways to write to the file, etc.
Any help is appreciated because I am completely lost at this point. Thank you.
You're replacing the data variable each time through the loop, not appending to a list.
all_coords = []

for x in data:
    gps = gpsphoto.getGPSData(x)  # use a new name so the list of filenames isn't overwritten
    all_coords.append([gps.get("Latitude"), gps.get("Longitude")])

with open('output.txt', 'w') as f:
    print('Coordinates:', all_coords, file=f)
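If you would rather have one line per image in the text file, mirroring the console output, a small variation on the same idea is to open the file once and write inside the loop:
all_coords = []

with open('output.txt', 'w') as f:
    for x in data:
        gps = gpsphoto.getGPSData(x)
        coords = [gps.get("Latitude"), gps.get("Longitude")]
        all_coords.append(coords)
        print("source: {} -> {}".format(x, coords), file=f)  # one line per image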
I'm a new coder, currently trying to create a sample piece of code for something bigger. I googled most of my problems, but I could not find any answers for this "final" problem, so I decided to post it.
This code basically opens an Excel file, gets a specific column's data, edits the data, and saves it to a text file called "Saved.txt". The code works fine up to that point. My problem is when I try to upload all the data, line by line, into rows of another Excel file. Please help me out!
import openpyxl
from openpyxl import Workbook
Code = "0507"
Save = open("Saved.txt","a")
#Reading from XLSX and writing into a TEXT FILE after appending the data.
fname = 'Wekanda 2.xlsx'
wb = openpyxl.load_workbook(fname)
sheet = wb.get_sheet_by_name('Wekanda 2')
for rowOfCellObjects in sheet['R2':'R282']:
    for cellObj in rowOfCellObjects:
        line_w = cellObj.value
        line_w = str(line_w)
        line_w = line_w.replace(" ", "#")
        Save.write("\n" + Code + line_w)
        test = str(line_w)

Save.close()
This code works perfectly up to this point.
#Storing into Excel!
book = Workbook()
sheet = book.active
with open("Saved.txt","r") as f:
for line in f:
for i in range(1,281):
Pointer = "A"+str(i)
sheet[Pointer] = line
book.save("Next.xlsx")
OUTPUT
- EXCEL FILE | TEXT FILE | OUTPUT FILE
50 B2-3/3 | 050750#B2-3/3 | 050715
50 B2-3/4 | 050750#B2-3/4 | 050715
50 B2-3/5 | 050750#B2-3/5 | 050715
I want the content in the TEXT FILE to exactly be there on the OUTPUT FILE.
Content in Text FILE.
050730
050740
050740A
050740B
050740-1/1
050740-1/2
050740-2/1
050740-2/2
050740-3/1
050740-3/2
050740-4/1
050740-4/2
I assume that you found the issue by now, but I thought I'd pitch in with the answer here anyway (so we can start emptying the forum of unanswered questions).
You are creating a lot of cells that are then assigned the same value, i.e. line; because the inner range loop writes the current line into every cell, each pass overwrites the previous one, so only the last value written survives. I assume that the value you see in the output is part of the complete list of values? As a side note, I might add that you are using a hard-coded value in the range, which should be avoided when possible.
Being a beginner myself I venture to suggest a solution that might work better.
with open("Saved.txt","r") as f:
for i, line in enumerate(f):
Pointer = "A{}".format(i+1)
sheet[Pointer] = line
For more details on the enumerate function, check out the official documentation. Basically, it provides an index value for each element, starting at zero.
Edit: I had accidentally kept the str() around the i in an earlier version.
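On the side note about hard-coded ranges, one possible sketch is to let openpyxl find the last used row itself; the column letter and sheet name here are taken from the question:
import openpyxl

wb = openpyxl.load_workbook('Wekanda 2.xlsx')
sheet = wb['Wekanda 2']

# Column R is column 18; iter_rows stops at the sheet's last used row,
# so there is no need to hard-code 282.
for (cell,) in sheet.iter_rows(min_row=2, min_col=18, max_col=18):
    print(cell.value)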
I collected some tweets from the Twitter API and stored them in MongoDB. I didn't have any issues exporting the data to a JSON file, until I tried to write a Python script to read the JSON and convert it to a CSV. I get this traceback error with my code:
json.decoder.JSONDecodeError: Extra data: line 367 column 1 (char 9745)
So, after digging around the internet I was pointed to check the actual JSON data in an online validator, which I did. This gave me the error of:
Multiple JSON root elements
from the site https://jsonformatter.curiousconcept.com/
Here are pictures of the beginning and end of the first and second objects in the file (screenshots were attached), along with a link to the data.
Now, the problem is, I haven't found anything on the internet about how to handle that error. I'm not sure if it's an error with the data I've collected or exported, or if I just don't know how to work with it.
My end game with these tweets is to make a network graph. I was looking at either NetworkX or Gephi, which is why I'd like to get a CSV file.
Robert Moskal is right. If you can address the issue at the source and use the --jsonArray flag when you run mongoexport, then that will make the problem easier, I guess. If you can't address it at the source, then read the points below.
The code below will extract the individual JSON objects from the given file and convert them to Python dictionaries.
You can then apply your CSV logic to each individual dictionary.
If you are using the csv module, then I would suggest the unicodecsv module, as it handles the Unicode data in your JSON objects.
import json

with open('path_to_your_json_file', 'r') as infile:
    json_block = []
    for line in infile:
        json_block.append(line)
        if line.startswith('}'):
            json_dict = json.loads(''.join(json_block))
            json_block = []
            print(json_dict)
If you want to convert it to CSV using pandas you can use the below code:
import json
import pandas as pd

with open('path_to_your_json_file', 'r') as infile:
    json_block = []
    dictlist = []
    for line in infile:
        json_block.append(line)
        if line.startswith('}'):
            json_dict = json.loads(''.join(json_block))
            dictlist.append(json_dict)
            json_block = []

df = pd.DataFrame(dictlist)
df.to_csv('out.csv', encoding='utf-8')
If you want to flatten out the JSON objects, you can use the pandas.io.json.json_normalize() method.
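A small sketch of that flattening, reusing dictlist from the block above (newer pandas versions expose the same function as pandas.json_normalize):
import pandas as pd

flat_df = pd.json_normalize(dictlist)  # nested fields become dotted column names
flat_df.to_csv('out_flat.csv', encoding='utf-8')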
Elaborating on MYGz's suggestion to use --jsonArray:
Your post doesn't show how you exported the data from mongo. If you use the following via the terminal, you will get valid JSON from mongodb:
mongoexport --collection=somecollection --db=somedb --jsonArray --out=validfile.json
Replace somecollection, somedb and validfile.json with your target collection, target database, and desired output filename respectively.
The following: mongoexport --collection=somecollection --db=somedb --out=validfile.json will NOT give you the results you are looking for, because by default mongoexport writes data using one JSON document for every MongoDB document (see the mongoexport reference).
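In other words, the default output is one JSON document per line (often called JSON Lines), which you can also read without pandas; the filename below is a placeholder:
import json

records = []
with open('tweets.json') as infile:
    for line in infile:
        line = line.strip()
        if line:  # skip blank lines
            records.append(json.loads(line))

print(len(records), 'documents loaded')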
A bit of a late reply, and I am not sure this option was available at the time the question was posted. Anyway, there is now a simple way to import the mongoexport JSON data:
import pandas as pd

df = pd.read_json(filename, lines=True)
mongoexport writes each line as a JSON object in itself, instead of the whole file being one JSON document.
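Since the end goal here is a CSV for NetworkX or Gephi, a minimal follow-up sketch (the filenames are placeholders):
import pandas as pd

df = pd.read_json('tweets.json', lines=True)  # one JSON document per line, as mongoexport writes by default
df.to_csv('tweets.csv', index=False, encoding='utf-8')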