I've got a bunch of .dcm files (DICOM files), and I would like to extract the header from each one and save that information in a CSV file.
As you can see in the following picture, I've got a problem with the delimiters:
For example when looking at the second line in the picture: I'd like to split it like this:
0002 | 0000 | File Meta Information Group Length | UL | 174
But as you can see, there are not only multiple delimiters, but ' ' is sometimes a delimiter and sometimes not. The length of the third column also varies, so sometimes the text there is shorter, e.g. Image Type further down in the picture.
Does anyone have a clever idea for how to write this to a CSV file?
I use pydicom to read and display the files in my IDE.
I'd be very thankful for any advice :)
I would suggest going back to the data elements themselves and working from there, rather than from a string output (which is really meant for exploring in interactive sessions).
The following code should work for a dataset with no sequences; it would need some modification to work with sequences:
import csv
import pydicom
from pydicom.data import get_testdata_file
filename = get_testdata_file("CT_small.dcm")  # substitute your own filename here
ds = pydicom.dcmread(filename)
with open('my.csv', 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow("Group Elem Description VR value".split())
    for elem in ds:
        writer.writerow([
            f"{elem.tag.group:04X}", f"{elem.tag.element:04X}",
            elem.description(), elem.VR, str(elem.value)
        ])
It may also require a bit of change to make the elem.value part look how you want it, or you may want to set the CSV writer to use quotes around items, etc.
Output looks like:
Group,Elem,Description,VR,value
0008,0005,Specific Character Set,CS,ISO_IR 100
0008,0008,Image Type,CS,"['ORIGINAL', 'PRIMARY', 'AXIAL']"
0008,0012,Instance Creation Date,DA,20040119
0008,0013,Instance Creation Time,TM,072731
...
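The answer notes that sequences need extra handling. Here is a minimal sketch of one way to do that, assuming a sequence element has VR "SQ" and that its value is an iterable of nested datasets (which is how pydicom represents sequences). The Depth column and the function names iter_elements and write_header_csv are illustrative additions, not part of pydicom.

```python
import csv


def iter_elements(dataset, depth=0):
    """Yield (depth, element) for every data element, descending into sequences."""
    for elem in dataset:
        yield depth, elem
        if elem.VR == "SQ":
            # A sequence holds zero or more nested datasets.
            for item in elem.value:
                yield from iter_elements(item, depth + 1)


def write_header_csv(dataset, csvfile):
    """Write one CSV row per data element, including nested ones."""
    writer = csv.writer(csvfile)
    writer.writerow(["Depth", "Group", "Elem", "Description", "VR", "value"])
    for depth, elem in iter_elements(dataset):
        # The items of a sequence get their own rows, so leave its value empty.
        value = "" if elem.VR == "SQ" else str(elem.value)
        writer.writerow([
            depth, f"{elem.tag.group:04X}", f"{elem.tag.element:04X}",
            elem.description(), elem.VR, value,
        ])
```

With a real file this would be called as write_header_csv(pydicom.dcmread(filename), csvfile) inside a with open('my.csv', 'w', newline='') as csvfile: block.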
I'm just getting started with Python programming.
Here is an example of my CSV file:
Name | tag. | description
Cool | cool,fun | cool ...
Cell | Cell,phone | Cell ...
Rang | first,third | rang ...
Printing with the csv module gives me a list for each row, i.e.:
['cool',''cool,fun'','cool...']
['cell',''cell,phone'','cell...']
What I want to do is print just cool, or cell,phone.
I'm also new to programming, but I think I know what you're asking.
How to use CSV module in python
The answer for your question
What you asked for (printing just cool, or cell,phone) is easy to implement. You can try the code below in a terminal:
import csv
with open('your_file_path', 'r', newline='', encoding='utf-8') as f:
    reader = csv.reader(f)
    rows = list(reader)
print(rows[1][0])
print(rows[2][1])
My thoughts
Actually, you should consider the following two points when approaching this problem:
Content: first decide what you want to print in your terminal, that is, whether you want a specific row, a specific column, or a specific cell;
Format: the list brackets and the quotation marks you see reflect the types of the data read from the file (lists and strings), and they must be carefully distinguished from the content itself.
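To make the two points concrete, here is a small sketch. The RAW text is an assumed reconstruction of the file from the question (the middle column contains a comma, so it must be quoted in the raw text); the real file may differ.

```python
import csv
import io

# Assumed reconstruction of the question's file, held in memory for the demo.
RAW = '''Name,tag.,description
Cool,"cool,fun",cool ...
Cell,"Cell,phone",Cell ...
Rang,"first,third",rang ...
'''

rows = list(csv.reader(io.StringIO(RAW)))

row = rows[1]                      # content: one whole row (a list of strings)
cell = rows[2][1]                  # content: one cell (a single string)
column = [r[1] for r in rows[1:]]  # content: one column, header skipped

print(row)   # format: a list prints with brackets and quotes
             # ['Cool', 'cool,fun', 'cool ...']
print(cell)  # format: a string prints as plain text
             # Cell,phone
```

The brackets and quotes appear only because a list is being printed; indexing down to a single cell gives the plain text.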
In addition, it would be worth reading some articles or other material about the csv module, such as the following:
https://docs.python.org/3/library/csv.html#reader-objects
https://www.geeksforgeeks.org/reading-rows-from-a-csv-file-in-python
https://www.tutorialspoint.com/working-with-csv-files-in-python-programming
I am also unskilled in many places, please forgive me if there are mistakes or omissions.
I am new here and am trying to answer an interesting question about World of Tanks. I want to do a batch data analysis of our clan's battle performance, and I heard that the data for every battle is stored on the client's disk in the Wargaming.net folder.
It is said that these .dat files are a kind of JSON file, so I tried to read one with a couple of lines of Python code, but that failed.
import json
f = open('ex.dat', 'r', encoding='unicode_escape')
content = f.read()
a = json.loads(content)
print(type(a))
print(a)
f.close()
The code is very simple, and it fails. Could anyone tell me what format these files are really in?
Added on Feb. 9th, 2022
After I tried another piece of code in a Jupyter Notebook, it seems that something can be read from the .dat files:
import struct
import numpy as np
import matplotlib.pyplot as plt
import io
with open('C:/Users/xukun/Desktop/br/ex.dat', 'rb') as f:
    fbuff = io.BufferedReader(f)
    N = len(fbuff.read())
print('byte length: ', N)
with open('C:/Users/xukun/Desktop/br/ex.dat', 'rb') as f:
    data = struct.unpack('b'*N, f.read(1*N))
The result is a tuple of signed byte values, but I have no idea how to deal with it now.
Here's how you can parse some parts of it.
import pickle
import zlib
file = '4402905758116487.dat'
cache_file = open(file, 'rb') # This can be improved to not keep the file opened.
# Converting pickle items from python2 to python3 you need to use the "bytes" encoding or "latin1".
legacyBattleResultVersion, brAllDataRaw = pickle.load(cache_file, encoding='bytes', errors='ignore')
arenaUniqueID, brAccount, brVehicleRaw, brOtherDataRaw = brAllDataRaw
# The data stored inside the pickled file will be a compressed pickle again.
vehicle_data = pickle.loads(zlib.decompress(brVehicleRaw), encoding='latin1')
account_data = pickle.loads(zlib.decompress(brAccount), encoding='latin1')
brCommon, brPlayersInfo, brPlayersVehicle, brPlayersResult = pickle.loads(zlib.decompress(brOtherDataRaw), encoding='latin1')
# Lastly you can print all of these and see a lot of data inside.
The result contains a mixture of more binary data as well as some data captured from the replays.
This is not a complete solution but it's a decent start to parsing these files.
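The nesting described above (an outer pickle whose members are zlib-compressed pickles) can be tried out without a game file. The sketch below builds a synthetic stand-in with the same four-part layout and decodes it; every value in it is made up, and the helper names unpack_blob, pack_blob and read_battle_result are illustrative only.

```python
import io
import pickle
import zlib


def unpack_blob(blob):
    """Decompress a zlib blob and unpickle what is inside it."""
    return pickle.loads(zlib.decompress(blob), encoding="latin1")


def read_battle_result(fileobj):
    """Decode the outer pickle, then each compressed inner pickle."""
    version, (arena_id, account_raw, vehicle_raw, other_raw) = pickle.load(
        fileobj, encoding="bytes"
    )
    return {
        "version": version,
        "arena_id": arena_id,
        "account": unpack_blob(account_raw),
        "vehicle": unpack_blob(vehicle_raw),
        "other": unpack_blob(other_raw),
    }


def pack_blob(obj):
    """Inverse of unpack_blob, used only to build the synthetic sample."""
    return zlib.compress(pickle.dumps(obj))


# Synthetic stand-in for a .dat file; none of these values come from the game.
sample = pickle.dumps(
    (1, (42, pack_blob([111, 5]), pack_blob({"xp": 900}), pack_blob(("common", "players"))))
)
result = read_battle_result(io.BytesIO(sample))
print(result["vehicle"])
```

For a real file, opening it with a with open(file, 'rb') as cache_file: block and passing cache_file to read_battle_result also avoids keeping the file open.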
First you can look at the replay file itself in a text editor. But it won't show the code at the beginning of the file that has to be cleaned out. Then there is a ton of info that you have to read in and figure out but it is the stats for each player in the game. THEN it comes to the part that has to do with the actual replay. You don't need that stuff.
You can grab the player IDs and tank IDs from WoT developer area API if you want.
After loading the pickle files as gabzo mentioned, you will see that each one is simply a list of values, and without knowing what each value refers to, it's hard to make sense of it. The identifiers for the values can be extracted from your game installation:
import zipfile
WOT_PKG_PATH = "Your/Game/Path/res/packages/scripts.pkg"
BATTLE_RESULTS_PATH = "scripts/common/battle_results/"
archive = zipfile.ZipFile(WOT_PKG_PATH, 'r')
for file in archive.namelist():
    if file.startswith(BATTLE_RESULTS_PATH):
        archive.extract(file)
You can then decompile the Python files (for example with uncompyle6) and go through the code to see the identifiers for the values.
One thing to note is that the list of values for the main pickle objects (like brAccount from gabzo's code) always has a checksum as the first value. You can use this to check whether you have the right order and the correct identifiers for the values. The way these checksums are generated can be seen in the decompiled python files.
I have been tackling this problem for some time (albeit in Rust): https://github.com/dacite/wot-battle-results-parser/tree/main/datfile_parser.
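Once the identifiers are known, attaching them to a checksum-prefixed list of values could look like the sketch below. The field names and numbers are hypothetical placeholders; the real names come from the decompiled battle_results scripts. The checksum is only split off here, not verified, because its algorithm has to be read from those scripts.

```python
def label_values(values, field_names):
    """Split off the leading checksum and pair the remaining values with names."""
    checksum, *payload = values
    if len(payload) != len(field_names):
        # A length mismatch suggests the wrong identifier list for this object.
        raise ValueError(
            f"expected {len(field_names)} values, got {len(payload)}"
        )
    return checksum, dict(zip(field_names, payload))


# Hypothetical names and values, for illustration only.
FIELD_NAMES = ["credits", "xp", "damageDealt"]
checksum, labelled = label_values([123456789, 25000, 900, 1800], FIELD_NAMES)
print(labelled)
```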
Using Python, I'm trying to create a summary from existing CSV data, and I'm having difficulty extracting data from one of the cells.
The input CSV file:
I want to include only the city name and file path from the info 4 column, and I expect a summary like AlexxxxxyyyyzzzzzNewyork\Folder1\Folder2\Test.txt
The code:
csv_data_out[csv_line_out].append(conten['Name'])
csv_data_out[csv_line_out].append(conten['info 1'])
csv_data_out[csv_line_out].append(conten['info 2'])
csv_data_out[csv_line_out].append(conten['info 3'])
csv_data_out[csv_line_out].append(conten['info 4'])
csv_summary = ("".join(csv_data_out[csv_line_out]))
with open(outputfile, 'wb') as newfile:
    writer = csv.writer(newfile, delimiter = ';')
    writer.writerow(csv_columns_out[:])
    writer.writerows(csv_data_out)
    newfile.close()
Any idea how to fetch only the required details from the info 4 column?
Essentially you have a CSV inside a CSV. There's not enough info posted to give a complete answer, but here's most of it.
You can take a string and process it as a csv using io.StringIO (or io.BytesIO if a byte string).
#! /usr/bin/env python
# -*- coding: utf-8 -*-
import csv
from io import StringIO
# Create somewhere to put the inputs in case needed later
stored_items = []
with open('data.csv', 'r') as csvfile:
    inputs = csv.reader(csvfile)
    # skip the header row
    next(inputs)
    for row in inputs:
        # Extract the Info 4 column for processing
        f = StringIO(row[4])
        string_file = csv.reader(f,quotechar='"')
        build_string = ""
        for string_row in string_file:
            build_string = f"{string_row[0]}{string_row[1]}"
        # Merge everything into a summary
        summary_string = f"{row[0]}{row[1]}{row[2]}{row[3]}{build_string}"
        # Add all the data back to storage
        stored_items.append((row[0],row[1],row[2],row[3],row[4],summary_string))
        print(summary_string)
The reason I say there's not enough information posted is that, for example, we don't know whether the location will always be (a), which could be handled with a fixed text replacement, or whether it is conditional, e.g. it could be (a) or (b), in which case it would possibly require a regex. (My preference is not to use regex unless absolutely necessary.) Also, is it always the first two terms you are after from Info 4, or will the terms be found in different places in the text? Without seeing more samples of the data it's impossible to answer definitively.
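For experimenting without a data.csv on disk, the same CSV-inside-a-CSV idea can be run against an in-memory sample. The SAMPLE text below is an assumption about what the input might look like (the question only shows the expected summary), and summarise and its wanted parameter are illustrative names.

```python
import csv
import io

# Assumed input: the fifth column is itself a comma-separated string.
SAMPLE = r'''Name,info 1,info 2,info 3,info 4
Alex,xxxx,yyyy,zzzzz,"Newyork,\Folder1\Folder2\Test.txt,more text"
'''


def summarise(row, wanted=2):
    """Join the first four cells with the first `wanted` fields of the fifth."""
    # Parse the fifth cell as its own one-line CSV; an empty cell gives [].
    inner = next(csv.reader(io.StringIO(row[4])), [])
    return "".join(row[:4]) + "".join(inner[:wanted])


reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip the header row
summaries = [summarise(row) for row in reader]
print(summaries[0])
```

Slicing with inner[:wanted] rather than indexing inner[0] and inner[1] means a cell with fewer fields than expected produces a shorter summary instead of an IndexError.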
Objective
I'm trying to extract the GPS "Latitude" and "Longitude" data from a bunch of JPGs, and I have been successful so far. My main problem is that when I try to write the coordinates to a text file, only one set of coordinates is written, whereas my console output shows that every image was processed. Here is an example: Console Output, and here is my text file, which is supposed to mirror the console output: Text file
I don't fully understand what the problem is and why it won't just write all of them instead of one. I believe the data is being overwritten somehow, or the 'GPSPhoto' module is causing some issues.
Code
from glob import glob
from GPSPhoto import gpsphoto
# Scan jpg's that are located in the same directory.
data = glob("*.jpg")
# Scan contents of images and GPS values.
for x in data:
    data = gpsphoto.getGPSData(x)
    data = [data.get("Latitude"), data.get("Longitude")]
    print("\nsource: {}".format(x), "\n ↪ {}".format(data))
# Write coordinates to a text file.
with open('output.txt', 'w') as f:
    print('Coordinates:', data, file=f)
I have tried pretty much everything that I can think of including: changing the write permissions, not using glob, no loops, loops, lists, no lists, different ways to write to the file, etc.
Any help is appreciated because I am completely lost at this point. Thank you.
You're replacing the data variable each time through the loop, not appending to a list.
all_coords = []
for x in data:
    data = gpsphoto.getGPSData(x)
    all_coords.append([data.get("Latitude"), data.get("Longitude")])
with open('output.txt', 'w') as f:
    print('Coordinates:', all_coords, file=f)
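If one line per image is preferred in the text file, a variation along these lines may help. It is only a sketch: the GPS reader is passed in as a parameter (read_gps) so the example does not depend on GPSPhoto being installed, and write_coords is a made-up name.

```python
def write_coords(paths, read_gps, out_path):
    """Write one 'filename: lat, lon' line per image and return what was written."""
    all_coords = []
    for path in paths:
        # Use a separate name here so the list of paths is never overwritten.
        gps = read_gps(path)
        all_coords.append((path, gps.get("Latitude"), gps.get("Longitude")))
    # Open the file once, after the loop, so earlier lines are not replaced.
    with open(out_path, "w", encoding="utf-8") as f:
        for path, lat, lon in all_coords:
            print(f"{path}: {lat}, {lon}", file=f)
    return all_coords
```

With the modules from the question this would be called as write_coords(glob("*.jpg"), gpsphoto.getGPSData, "output.txt").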
I have a log record like this (millions of rows):
previous_status>SERVICE</previous_status><reason>1</>device_id>SENSORS</device_id><DEVICE>ISCS</device_type><status>OK
I would like to extract all the words in capitals into individual columns in Excel using Python, so that it looks like this:
SERVICE SENSORS DEVICE
As per the comments from #peter-wood, it isn't clear what your input is. However, assuming that your input is as you posted, then here is a minimal solution that works off the given structure. If it is not quite right, you should be able to easily change it to search on whatever is really your structure.
import csv
# You need to change this path.
with open('/path/to/log.txt') as log:
    lines = [row.strip() for row in log]
# You need to change this path to where you want to write the file.
# newline='' stops the csv module from writing blank lines between rows on Windows.
with open('/path/to/write/to/mydata.csv', 'w', newline='') as fh:
    # If you want a different delimiter, like tabs '\t', change it here.
    writer = csv.writer(fh, delimiter=',')
    for l in lines:
        # You can cut and paste the tokens that start and stop the pieces you are looking for here.
        service = l[l.find('previous_status>')+len('previous_status>'):l.find('</previous_status')]
        sensor = l[l.find('device_id>')+len('device_id>'):l.find('</device_id>')]
        device = l[l.find('<DEVICE>')+len('<DEVICE>'):l.find('</device_type>')]
        writer.writerow([service, sensor, device])
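The three slicing expressions repeat the same pattern, so they could be folded into a small helper. This is a sketch rather than a drop-in replacement: between is a made-up name, and it returns an empty string when a token is missing, whereas the slices above would silently return the wrong text when find gives -1.

```python
def between(text, start, stop):
    """Return the text between the first `start` and the next `stop`, or ''."""
    begin = text.find(start)
    if begin == -1:
        return ""
    begin += len(start)
    end = text.find(stop, begin)
    return text[begin:end] if end != -1 else ""


# The sample record from the question.
LINE = ("previous_status>SERVICE</previous_status><reason>1</>"
        "device_id>SENSORS</device_id><DEVICE>ISCS</device_type><status>OK")

row = [
    between(LINE, "previous_status>", "</previous_status"),
    between(LINE, "device_id>", "</device_id>"),
    between(LINE, "<DEVICE>", "</device_type>"),
]
print(row)  # ['SERVICE', 'SENSORS', 'ISCS']
```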