Mykrobe predictor JSON to TSV Converter - python

I wanted to ask a question regarding file conversion.
I have a JSON file (produced by an AMR prediction run) that I want to convert to a TSV file using the Mykrobe-predictor script json_to_tsv.py. My JSON output is result_TB.json.
./json_to_tsv.py /path/to/JSON_file
When I ran this command in the terminal, I got an IndexError at line 78:
https://github.com/iqbal-lab/Mykrobe-predictor/blob/master/scripts/json_to_tsv.py#L78
def get_sample_name(f):
    return f.split('/')[-2]
And here is the error I get:
mykrobe_version file plate_name sample drug phylo_group species lineage phylo_group_per_covg species_per_covg lineage_per_covg phylo_group_depth species_depth lineage_depth susceptibility variants (gene:alt_depth:wt_depth:conf) genes (prot_mut-ref_mut:percent_covg:depth)
Traceback (most recent call last):
  File "./json_to_tsv.py", line 157, in <module>
    sample_name = get_sample_name(f)
  File "./json_to_tsv.py", line 78, in get_sample_name
    return f.split('/')[-2]
IndexError: list index out of range
Any suggestions would be appreciated.

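Looking at the code, get_sample_name takes the second-to-last component of the path (the directory that contains the JSON file) as the sample name, so a bare file name with no directory part has no such component. I guess they expect the converter to be called with something like: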
python json_to_tsv.py plate/sample1/sample1.json
Try copying your JSON file to a directory called sample1 inside a directory called plate and see if you get the same error when you call it like in the example above.
Update
The problem is indeed as described above.
Doesn't work:
python json_to_tsv.py result_TB.json
mykrobe_version file plate_name sample drug phylo_group species lineage phylo_group_per_covg species_per_covg lineage_per_covg phylo_group_depth species_depth lineage_depth susceptibility variants (gene:alt_depth:wt_depth:conf) genes (prot_mut-ref_mut:percent_covg:depth)
Traceback (most recent call last):
  File "json_to_tsv.py", line 157, in <module>
    sample_name = get_sample_name(f)
  File "json_to_tsv.py", line 78, in get_sample_name
    return f.split('/')[-2]
IndexError: list index out of range
Works:
python json_to_tsv.py plate/sample/result_TB.json
mykrobe_version file plate_name sample drug phylo_group species lineage phylo_group_per_covg species_per_covg lineage_per_covg phylo_group_depth species_depth lineage_depth susceptibility variants (gene:alt_depth:wt_depth:conf) genes (prot_mut-ref_mut:percent_covg:depth)
-1 result_TB plate sample NA
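The difference between the two calls can be reproduced without the script. The sketch below repeats the f.split('/')[-2] lookup from line 78 and adds a tolerant variant; safe_sample_name and its "NA" default are hypothetical names used for illustration, not part of Mykrobe-predictor.

```python
def get_sample_name(f):
    # Same lookup as line 78 of json_to_tsv.py: the parent directory name.
    return f.split('/')[-2]


def safe_sample_name(f, default="NA"):
    # Hypothetical variant: fall back to a default when the path
    # has no parent directory component.
    parts = f.split('/')
    return parts[-2] if len(parts) >= 2 else default


print(get_sample_name("plate/sample/result_TB.json"))  # parent directory name
print(safe_sample_name("result_TB.json"))               # falls back to the default

try:
    get_sample_name("result_TB.json")
except IndexError as err:
    print("IndexError:", err)
```

Whether a fallback value is acceptable in the TSV output depends on how the sample column is used downstream; moving the file into a plate/sample directory, as above, avoids touching the script at all.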

Related

Key Error in Bioinformatics Program Using Pandas

I'll try to keep this as short as possible. I'm trying to create a bioinformatics program for our patient 'reporting' team. To preface this, examples I will be giving are just examples and not actual patient information.
The script I'm writing takes the results of a patient's genetic test, specifically their nucleotide calls at the SNPs we test for (organized by rsID from NCBI). This patient information is merged with a reference library I've made and compared against it. The goal is to (1) merge these files, (2) compare the patient's nucleotide results with the nucleotides in the reference library, and (3) create a "flag" if the patient's nucleotide is rare, i.e. has a low population frequency.
The issue I'm having is that when I run the script, after uploading the patient file and population data, I get a KeyError because it can't find the rsID column in the patient .csv.
I'll add 2 photos of what each .csv file looks like
[Image: population data]
[Image: patient data]
Here is a short excerpt of the code
onClick('Upload Patient Files First')
patient_data = pd.read_csv(ask_path(),)
###patient_genotype = patient_data.loc[patient_data['rsID'] == rsID]['NCBI SNP Reference']
##Not using
onClick('Upload Population Frequency Data Next')
pop_ref_data = pd.read_csv(ask_path())
#Creating a dictionary of the population reference data
def pop_dict(pop_ref_data):
    pop_ref_dict = {}
    for _, row in pop_ref_data.iterrows():
        variant_data = {}
        rsID = row['rsID']
        dominant_nucleotide = row['DomNucl']
        recessive_nucleotide = row['RecNucl']
        dominant_freq = row['DomAllele']
        recessive_freq = row['RecessiveAllele']
        variant_data[dominant_nucleotide] = dominant_freq
        variant_data[recessive_nucleotide] = recessive_freq
        pop_ref_dict[rsID] = variant_data
    return pop_ref_dict
The population data is pretty straightforward, but I'm getting stuck on the first check: the KeyError is raised for the "rsID" column.
The patient data is further down in its respective CSV. I'm trying to get the script to find the information under the columns 'NCBI SNP Reference' and 'Call'.
Quick edit: these are my traceback calls. Also, to answer another question: yes, I'm trying to bypass all of the header info in the CSV so that I can use just the bulk information I actually need once the genotyping run is finished.
Traceback (most recent call last):
  File "C:\Users\rcthu\PycharmProjects\WorkStuff\venv\lib\site-packages\pandas\core\indexes\base.py", line 3802, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'rsID'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "C:\Users\rcthu\AppData\Roaming\JetBrains\PyCharmCE2022.2\scratches\Flag Process 2.12.py", line 61, in <module>
    pop_ref_row = pop_dict(pop_ref_data)
  File "C:\Users\rcthu\AppData\Roaming\JetBrains\PyCharmCE2022.2\scratches\Flag Process 2.12.py", line 41, in pop_dict
    rsID = row['rsID']
  File "C:\Users\rcthu\PycharmProjects\WorkStuff\venv\lib\site-packages\pandas\core\series.py", line 981, in __getitem__
    return self._get_value(key)
  File "C:\Users\rcthu\PycharmProjects\WorkStuff\venv\lib\site-packages\pandas\core\series.py", line 1089, in _get_value
    loc = self.index.get_loc(label)
  File "C:\Users\rcthu\PycharmProjects\WorkStuff\venv\lib\site-packages\pandas\core\indexes\base.py", line 3804, in get_loc
    raise KeyError(key) from err
KeyError: 'rsID'
Process finished with exit code 1
The first thing to notice is that 'rsID' is the first key you look up, so the failure means the row has no label with exactly that name. Looking at your data, rsID may not be what you expect, since it appears to sit over the index.
You should be able to set a breakpoint before the line that breaks and run your code in debug mode. Once you're at the breakpoint, you can see what 'row' really is and which keys it has.
You could also just print(row) and then return inside the loop, so that only the first row is printed.
Hope this helps.
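If the file starts with header information before the actual table, as the question mentions, one way to check is to look for the line that really contains the column names. The sketch below is a minimal standard-library example; find_header_row and the sample text are made up for illustration, and the layout of the real export is an assumption.

```python
import csv
import io


def find_header_row(text, column_name):
    """Return the 0-based index of the first row containing column_name, or None."""
    for index, row in enumerate(csv.reader(io.StringIO(text))):
        # Strip whitespace so ' rsID ' still matches 'rsID'.
        if column_name in (cell.strip() for cell in row):
            return index
    return None


# Made-up export: two lines of run information, then the real table.
sample = (
    "Run name,Example run\n"
    "Operator,Example\n"
    "rsID,DomNucl,RecNucl\n"
    "rs0000001,A,G\n"
)

header_index = find_header_row(sample, "rsID")
print(header_index)
```

The returned index could then be passed to pandas as pd.read_csv(path, skiprows=header_index), so that the column labels are taken from the right line.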

pd.to_datetime error after saving csv file without doing anything

When I use pd.to_datetime, my code looks like this:
rate = pd.read_csv('P2training.csv', header=0)
rate['Date'] = pd.to_datetime(rate['Date'], format='%Y-%m-%d')
rate.set_index('Date', inplace=True, drop=True)
rate.tail(10)
print(rate)
In P2training.csv the first column is 'Date', and this code ran well when I first downloaded the P2training dataset. However, after I opened the CSV file and saved it without changing anything, the code started to report the errors below. If I replace the saved file with the originally downloaded one, the code runs properly again.
C:\Users\yaojia\AppData\Local\Continuum\Anaconda3\lib\site-packages\statsmodels\compat\pandas.py:56: FutureWarning: The pandas.core.datetools module is deprecated and will be removed in a future version. Please use the pandas.tseries module instead.
  from pandas.core import datetools
Traceback (most recent call last):
  File "C:\Users\yaojia\AppData\Roaming\Python\Python36\site-packages\pandas\core\tools\datetimes.py", line 444, in _convert_listlike
    values, tz = tslib.datetime_to_datetime64(arg)
  File "pandas\_libs\tslib.pyx", line 1810, in pandas._libs.tslib.datetime_to_datetime64 (pandas\_libs\tslib.c:33275)
TypeError: Unrecognized value type:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "C:/Users/yaojia/.PyCharmEdu4.0/config/scratches/scratch_7.py", line 23, in <module>
    rate['Date'] = pd.to_datetime(rate['Date'], format='%Y-%m-%d')
  File "C:\Users\yaojia\AppData\Roaming\Python\Python36\site-packages\pandas\core\tools\datetimes.py", line 509, in to_datetime
    values = _convert_listlike(arg._values, False, format)
  File "C:\Users\yaojia\AppData\Roaming\Python\Python36\site-packages\pandas\core\tools\datetimes.py", line 447, in _convert_listlike
    raise e
  File "C:\Users\yaojia\AppData\Roaming\Python\Python36\site-packages\pandas\core\tools\datetimes.py", line 435, in _convert_listlike
    require_iso8601=require_iso8601
  File "pandas\_libs\tslib.pyx", line 2355, in pandas._libs.tslib.array_to_datetime (pandas\_libs\tslib.c:46617)
  File "pandas\_libs\tslib.pyx", line 2484, in pandas._libs.tslib.array_to_datetime (pandas\_libs\tslib.c:44616)
ValueError: time data '12/31/1979' doesn't match format specified
Process finished with exit code 1
Could anyone give any hint what's going wrong?
I guess you opened the CSV with Excel? If so, Excel recognizes that the 'Date' column contains dates, parses it, and saves it in its own date format (in your case month/day/year, as the '12/31/1979' in the error shows), while your code expects year-month-day.
I suggest opening and saving your CSV with a text editor, or changing the default Excel date format.
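To check which layout the re-saved file actually uses, each candidate format can be tried against the Date values. This is a minimal sketch with the standard library; detect_date_format and the candidate list are illustrative assumptions, and only the value '12/31/1979' comes from the traceback in the question (the other dates are made up).

```python
from datetime import datetime


def detect_date_format(values, candidates=("%Y-%m-%d", "%m/%d/%Y", "%d/%m/%Y")):
    """Return the first candidate format that parses every value, or None."""
    for fmt in candidates:
        try:
            for value in values:
                datetime.strptime(value, fmt)
        except ValueError:
            # At least one value does not fit this format; try the next one.
            continue
        return fmt
    return None


original = ["1979-12-31", "1980-01-02"]  # layout of the downloaded file
resaved = ["12/31/1979", "01/02/1980"]   # layout after the spreadsheet re-save

print(detect_date_format(original))
print(detect_date_format(resaved))
```

The detected string can then be passed as the format argument of pd.to_datetime in place of the hard-coded '%Y-%m-%d'.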

How to merge or concatenate two MIDI files in Python

I am working on a project that produces MIDI files as output. Those files are very short, and I want to merge or concatenate them into a single long MIDI file. I am using the python-midi library, for which there is little information or documentation. I tried the following program, but it gives an error:
import midi
pattern=midi.read_midifile("kl.mid")
track=midi.Track()
pattern2=midi.read_midifile("oi.mid")
pattern.append(pattern2)
midi.write_midifile("aaka.mid",pattern)
Error:
Traceback (most recent call last):
  File "lp.py", line 6, in <module>
    midi.write_midifile("aaka.mid",pattern)
  File "/home/userdf/.local/lib/python2.7/site-packages/midi/fileio.py", line 152, in write_midifile
    return writer.write(midifile, pattern)
  File "/home/userdf/.local/lib/python2.7/site-packages/midi/fileio.py", line 102, in write
    self.write_track(midifile, track)
  File "/home/userdf/.local/lib/python2.7/site-packages/midi/fileio.py", line 116, in write_track
    buf += self.encode_midi_event(event)
  File "/home/userdf/.local/lib/python2.7/site-packages/midi/fileio.py", line 125, in encode_midi_event
    ret += write_varlen(event.tick)
AttributeError: 'Track' object has no attribute 'tick'
I googled a lot but have not found a way to join two MIDI files in Python.
How can I do it?
Thanks in advance.
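The error occurs because pattern.append(pattern2) adds the whole second pattern as if it were a single track, so the writer finds Track objects where it expects events. Instead, read both MIDI files and keep their patterns:
pattern1 = midi.read_midifile(file1)
pattern2 = midi.read_midifile(file2)
Then copy each track from each pattern into a new pattern:
pattern = midi.Pattern()
for track in pattern1:
    pattern.append(track)
for track in pattern2:
    pattern.append(track)
Finally, save the file with the new pattern:
midi.write_midifile('sound.mid', pattern)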

Unexpected EOF while parsing using Python tool cif2cell

I get an error when trying to use the Python tool cif2cell, which takes a .cif file as input and returns a .cell file.
An introduction to the tool can be found elsewhere, but it is basically a materials modelling tool. I'm trying to input a .cif file that represents a 'tile' of atoms and get back a .cell file containing a structure made of many of these tiles: the supercell.
In my case I'm going for a 5x5x1 supercell, as can be seen in the command.
Here's the terminal command:
$ ./cif2cell -p castep -f 9000046.cif -o structure1.cell --supercell = [5 5 1]
This yields the following error:
Traceback (most recent call last):
  File "./cif2cell", line 354, in <module>
    supercellmap = safe_matheval(options.supercellmap)
  File "/Users/ 'my name' /Desktop/castep-8.0-macosx-intel/utils.py", line 525, in safe_matheval
    return eval(sexpr,{"__builtins__":None},safe_dict)
  File "<string>", line 1
    =
    ^
SyntaxError: unexpected EOF while parsing
I have seen that many people hit this error when writing simple programs, but I have not found any resolutions or examples where it comes from calling a tool like this.
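One way to see where the stray = in the traceback may come from is to look at how the shell splits the command into arguments. The sketch below uses shlex.split to mimic that splitting; the second command, with the option written as a single token and comma-separated values, is an assumption about the syntax cif2cell expects, not something confirmed in this thread.

```python
import shlex

as_typed = "./cif2cell -p castep -f 9000046.cif -o structure1.cell --supercell = [5 5 1]"
# Assumed alternative: no spaces around '=' and the list quoted as one token.
single_token = "./cif2cell -p castep -f 9000046.cif -o structure1.cell --supercell='[5,5,1]'"

typed_args = shlex.split(as_typed)
joined_args = shlex.split(single_token)

# With spaces, '--supercell', '=', '[5', '5' and '1]' become separate arguments,
# so the value that follows the option is presumably just '='.
print(typed_args[-5:])
print(joined_args[-1])


def is_expression(text):
    """Report whether text compiles as a Python expression, as eval requires."""
    try:
        compile(text, "<string>", "eval")
    except SyntaxError:
        return False
    return True


print(is_expression("="))
print(is_expression("[5,5,1]"))
```

A lone = is not a valid expression, which matches the SyntaxError raised inside safe_matheval, whereas a bracketed list with no spaces is.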

working around an error in a python module

So I have found an error in a module I am using (which one is unimportant, but if you must know it's geopy.distance)
So I know what I have to do to fix the code, but when I open the .py file and edit it, it acts as if it was not edited!
Here is the error traceback before the edit:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/geopy-0.95.1-py2.7.egg/geopy/distance.py", line 37, in __init__
    kilometers += self.measure(a, b)
  File "/Library/Python/2.7/site-packages/geopy-0.95.1-py2.7.egg/geopy/distance.py", line 72, in measure
    raise NotImplementedError
NotImplementedError
Here is the error traceback after I edit the file:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/geopy-0.95.1-py2.7.egg/geopy/distance.py", line 37, in __init__
    kilometers += self.measure(a, b)
  File "/Library/Python/2.7/site-packages/geopy-0.95.1-py2.7.egg/geopy/distance.py", line 72, in measure
    a, b = Point(a), Point(b)
NotImplementedError
As you can see, I changed it so that it would not raise NotImplementedError, but it is still raising it! How is this possible?
It looks like you've edited the .py file, but the user process doesn't have permission to overwrite the .pyc file. According to the .pyc file, the NotImplementedError is still raised on line 72, but the traceback displays the current line 72 from the .py file.
Aside: it looks like Distance is an abstract class. You're not supposed to instantiate it directly, but rather one of its subclasses, e.g. GreatCircleDistance or VincentyDistance.
Also notice the last two lines in the file:
# Set the default distance formula to the most generally accurate.
distance = VincentyDistance
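The symptom, a traceback that shows the edited line while the old exception is still raised, can be reproduced with a throwaway module. The sketch below assumes Python 3 and uses an already-imported module in place of a stale .pyc file; in both cases the code being run is older than the file on disk. The module name stale_demo is made up for the example.

```python
import importlib
import linecache
import os
import sys
import tempfile

# Write a throwaway module (hypothetical name) that raises, then import it.
workdir = tempfile.mkdtemp()
module_path = os.path.join(workdir, "stale_demo.py")
with open(module_path, "w") as handle:
    handle.write("def measure():\n    raise NotImplementedError\n")

sys.path.insert(0, workdir)
stale_demo = importlib.import_module("stale_demo")

# Edit the file on disk without reloading the module.
with open(module_path, "w") as handle:
    handle.write("def measure():\n    return 'fixed'\n")

# Tracebacks take their source text from the file on disk via linecache ...
linecache.checkcache(module_path)
displayed_line = linecache.getline(module_path, 2).strip()

# ... while the interpreter still runs the code that was loaded earlier.
try:
    stale_demo.measure()
    still_raises = False
except NotImplementedError:
    still_raises = True

print(displayed_line, still_raises)

# Reloading recompiles the module from the edited source.
stale_demo = importlib.reload(stale_demo)
print(stale_demo.measure())
```

In the situation from the question, the equivalent step would be to make sure the compiled file can actually be regenerated (or to remove the stale one) and to restart the interpreter session.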
