I have a tab-separated file that I am trying to parse.
The header and first rows of my file:
chrom coord ref_base var_base A C G T
17 26695663 G A 1 0 1934 0
17 26695664 T A 1 0 1 1935
My code is:
counts = pd.read_csv(args.counts_file, sep='\t')
toto = counts[(counts['chrom'].astype(str) == "17") & (counts['coord'].astype(str) == "26695663")]
print toto["G"].values[0]
This code prints the number I want, which is 1934.
Now I want a function that takes the dataframe read from the file as an argument, so I wrote this:
def get_foreground_counts(chrom, coord, counts, ref_base, var_base):
    foreground_counts = counts[(counts['chrom'] == chrom) & (counts['coord'] == coord)]
    foreground_ref_counts = foreground_counts[ref_base].values[0]
    foreground_var_counts = foreground_counts[var_base].values[0]
    return foreground_ref_counts, foreground_var_counts
I get the following error, and I still can't see why:
Traceback (most recent call last):
File "test.py", line 203, in <module>
main(args)
File "test.py", line 71, in main
foreground_ref_counts, foreground_var_counts = get_foreground_counts(chrom, coord, counts, ref_base, var_base)
File "test.py", line 137, in get_foreground_counts
foreground_ref_counts = foreground_counts[ref_base].values[0]
IndexError: index out of bounds
Any idea why ?
Thanks
UPDATE
When I print foreground_counts[ref_base].values, I get [].
What I am passing to the function is chrom (string), coord (string), counts (pandas dataframe), ref_base (string) and var_base (string).
In your function, the filter returns zero rows, which is why you get the error. It seems you forgot the .astype(str) in the first line of your function: read_csv presumably parsed chrom and coord as integers, so comparing them to the strings you pass in matches nothing.
You could either cast the column types before calling the function or modify that line. The former is the better approach if you really need string types; otherwise, why not use integer values for the comparison?
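The integer-comparison option could look roughly like the sketch below. It assumes every chromosome name in the file is numeric (a file containing "X" or "Y" would make the column a string column and int() would fail), and the LookupError for a missing position is an addition for illustration, not part of the original function.

```python
import io
import pandas as pd

# The two sample rows from the question; read_csv infers integer columns here.
data = (
    "chrom\tcoord\tref_base\tvar_base\tA\tC\tG\tT\n"
    "17\t26695663\tG\tA\t1\t0\t1934\t0\n"
    "17\t26695664\tT\tA\t1\t0\t1\t1935\n"
)
counts = pd.read_csv(io.StringIO(data), sep="\t")


def get_foreground_counts(chrom, coord, counts, ref_base, var_base):
    # Cast the string arguments to int so they match the integer columns.
    rows = counts[(counts["chrom"] == int(chrom)) & (counts["coord"] == int(coord))]
    if rows.empty:
        # Assumption: failing loudly is preferable to an IndexError further down.
        raise LookupError("no row for %s:%s" % (chrom, coord))
    return rows[ref_base].values[0], rows[var_base].values[0]


ref_count, var_count = get_foreground_counts("17", "26695663", counts, "G", "A")
print(ref_count, var_count)  # 1934 1
```

Checking rows.empty before indexing also makes it obvious when a filter matched nothing, which is the situation that produced the IndexError.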
For a university assignment I was asked to convert a 1 line text file into a 2d array. However, when I run the program, I get this error:
(venv) D:\Uni Stuff\Year 2\AIGP\Assignment\PYTHONASSIGNMEN>python astar.py
Input file name: Lab9TerrainFile1.txt
Traceback (most recent call last):
File "D:\Uni Stuff\Year 2\AIGP\Assignment\PYTHONASSIGNMEN\astar.py", line 129, in <module>
main()
File "D:\Uni Stuff\Year 2\AIGP\Assignment\PYTHONASSIGNMEN\astar.py", line 110, in main
number_of_rows = maze_file[1]
IndexError: index 1 is out of bounds for axis 0 with size 1
This is the code for generating the maze:
def main():
    maze_file = open(input("Input file name: "), "r").readlines()
    maze_file = np.array([maze_file])
    number_of_columns = maze_file[0]
    number_of_rows = maze_file[1]
    maze_column = np.array_split(maze_file[2:8], number_of_columns)
    maze_row = np.array_split(maze_file[2:8], number_of_rows)
    maze = np.concatenate([maze_column][maze_row])
    start = np.where(maze == 2)
    end = np.where(maze == 3)
    maze_file.close()
    path = astar(maze, start, end)
    print(path)
Any help would be appreciated and thank you!
The error means that maze_file contains only one element along its first axis. You can check this by printing the length of the array with the code below.
print(len(maze_file))
If it prints 1, the array has only one element.
maze_file[0] gets the first element, hence the index 0 between the square brackets. When you write maze_file[1], it tries to get the second element, which doesn't exist. Hence the "index out of bounds" error.
Reviewing your code, it looks like you are trying to get the number of columns and rows of the array. You can use the following code.
number_of_columns = len(maze_file)
number_of_rows = len(maze_file[0])
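To see where the size-1 axis comes from, and one way the single line might be turned into a 2D array, here is a minimal sketch. The file content is hypothetical, and it assumes (based on the slicing in the question) that the first two numbers on the line are the column and row counts and the rest are the cell values.

```python
import numpy as np

# Stand-in for open(...).readlines() on a one-line file (hypothetical content).
lines = ["3 2 2 1 1 1 1 3\n"]

# Wrapping the list of lines in another list gives a 1x1 array of strings,
# so index 1 on axis 0 does not exist.
wrapped = np.array([lines])
print(wrapped.shape)  # (1, 1)

# Split the line into numbers first, then read the dimensions from it.
values = np.array(lines[0].split(), dtype=int)
number_of_columns, number_of_rows = values[0], values[1]
maze = values[2:].reshape(number_of_rows, number_of_columns)

start = np.where(maze == 2)  # position of the cell holding 2
end = np.where(maze == 3)    # position of the cell holding 3
```

reshape raises a ValueError if the number of cell values does not equal rows times columns, which is a useful early check on the file format.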
I have a data frame with a column named price, and I want to draw a distribution plot for that column. I want to assign the graph to a variable named after the column, so that I can reuse it in multiple places: even with many distributions, I could then call up the required graph separately. The column names are dynamic.
x = 'price'
y = sns.distplot(df[x])
exec("%s = %s" % (x,y))
print(price)
I tried this code, but it throws an error like this:
Traceback (most recent call last):
File "/home/mahesh/.local/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3287, in run_code
last_expr = (yield from self._async_exec(code_obj, self.user_ns))
File "<ipython-input-36-f28fdca73b33>", line 8, in async-def-wrapper
File "<string>", line 1
price = AxesSubplot(0.125,0.125;0.775x0.755)
^
SyntaxError: invalid syntax
One way is using a function
x = df.price
def displot(j):
    sns.distplot(j)
displot(x)
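Another option, if the goal is to look plots up by a dynamic column name, is to keep them in a dictionary rather than creating variables with exec. The sketch below uses a hypothetical FakeAxes class as a stand-in for the object a plotting call returns, so that it runs without any plotting library; the second column name is also made up.

```python
class FakeAxes:
    """Hypothetical stand-in for the Axes object returned by a plotting call."""

    def __init__(self, column):
        self.column = column

    def __repr__(self):
        return "AxesSubplot(0.125,0.125;0.775x0.755)"


def make_plot(column):
    # In real code this would be something like sns.distplot(df[column]).
    return FakeAxes(column)


columns = ["price", "quantity"]  # hypothetical dynamic column names
plots = {name: make_plot(name) for name in columns}

# Look a plot up by column name instead of by variable name.
price_plot = plots["price"]
print(price_plot.column)

# exec fails because "%s" pastes the text form of the object into source code,
# and that text is not a valid Python expression.
source = "%s = %s" % ("price", plots["price"])
try:
    compile(source, "<string>", "exec")
except SyntaxError:
    print("not valid Python:", source)
```

With a real plot, the dictionary values would be the Axes objects themselves, which can then be passed around or modified later.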
I have a dictionary with a Ref_ID for a street link as the key and the sequenced stops on that street as the values. I want to determine whether the stops are out of sequence. The dict consists of items like 1234567:[5,10,15,35,], where the array represents the sequences on a given block.
I am using a while loop within a for loop that iterates through each value until the count = 2, appending the values to a tuple and then subtracting the first value from the second. If the difference is greater than 40, I want the program to store the link in another list under the route it's associated with.
I am presently getting a memory error when running the script:
eCheck = []
oCheck = []
for key, value in eLinks.items():
    for k in value:
        eValues.append(k)
    eList = sorted(eValues)
    for i in eList:
        eValuescount = 0
        while eValuescount < 2:
            eCheck.append(k)
            eItemscount += 1
        x = eValues[1] - eValues[0]
        print x
        if x > 40:
            eCheckStreet.append(key)
print "Route ", route, " even side"
for link in eCheckStreet:
    print link
Here is the error:
Traceback (most recent call last):
File "C:\Python27\ArcGIS10.2\Lib\site-packages\pythonwin\pywin\framework\scriptutils.py", line 323, in RunScript
debugger.run(codeObject, __main__.__dict__, start_stepping=0)
File "C:\Python27\ArcGIS10.2\Lib\site-packages\pythonwin\pywin\debugger\__init__.py", line 60, in run
_GetCurrentDebugger().run(cmd, globals,locals, start_stepping)
File "C:\Python27\ArcGIS10.2\Lib\site-packages\pythonwin\pywin\debugger\debugger.py", line 654, in run
exec cmd in globals, locals
File "N:\Python\Completed scripts\Check_Sequences.py", line 1, in <module>
import arcpy
MemoryError
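One possible reading of the MemoryError: the while loop tests eValuescount but the body increments eItemscount, so the condition never changes and eCheck keeps growing until memory runs out. A minimal sketch of the gap check without a counter, comparing each sorted value with its successor, might look like this. The function name, the max_gap parameter and the second dictionary entry are hypothetical.

```python
def links_with_gaps(links, max_gap=40):
    """Return the keys whose sorted values contain a neighbouring gap above max_gap."""
    flagged = []
    for ref_id, stops in links.items():
        ordered = sorted(stops)
        # zip pairs each value with its successor: (5, 10), (10, 15), ...
        if any(b - a > max_gap for a, b in zip(ordered, ordered[1:])):
            flagged.append(ref_id)
    return flagged


# First entry is from the question; the second is made up to trigger the check.
e_links = {1234567: [5, 10, 15, 35], 7654321: [5, 10, 60]}
print(links_with_gaps(e_links))  # [7654321]
```

Because each key is processed on its own, values from one street link never leak into the comparison for the next one.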
Here is my question. I have a "list" of objects and a data frame, whose heads are shown below:
0
0 hsa-let-7f-2-3p
1 hsa-let-7f-2-5p
2 hsa-miR-105-3p
3 hsa-miR-105-5p
6 hsa-miR-106a-3p
And
Gene_ID miRNA_family_ID
1452449 NM_001038707 hsa-let-7f-2-3p
14537388 NM_058241 hsa-let-7f-2-3p
14540512 NM_078467 hsa-let-7f-2-3p
15618969 NM_153051 hsa-let-7f-2-3p
5500627 NM_001184880 hsa-let-7f-2-3p
Their index lengths are different.
For the "list":
>>> len(miRNAs.index)
175
>>> len(Alvos_Mir.index)
18744
Both have dtype object.
What I really need to do is compare the content of the list with the column miRNA_family_ID and save to a .csv file all the Gene_IDs that have it as their miRNA.
What I tried to do was:
for i in range(len(miRNAs)):
    GenesAlvo_miRNA = [Alvos_Mir['miRNA_family_ID'] == miRNAs[i]];
    colunas_interesse_to_save = GenesAlvo_miRNA.ix[:, ['Gene_ID']];
    # here I use values.tolist() because I need the output formatted like (1,2,3,4,5) and not as a column
    colunas_interesse_to_save = colunas_interesse_to_save.values.tolist()
    # the name of the output file must be the item currently being compared
    colunas_interesse_to_save.to_csv(miRNAs[i], index=False)
I'm getting the error:
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
File "/home/beatriz/anaconda/lib/python2.7/site-packages/pandas/core/ops.py", line 582, in wrapper
raise ValueError('Series lengths must match to compare')
ValueError: Series lengths must match to compare
Any suggestions?
Thanks in advance
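I'm not sure whether you want to save things in the same file or in different files, but if you want different files, you could try something like the following:
for k, v in Alvos_Mir.groupby('miRNA_family_ID'):
    # miRNAs looks like a one-column DataFrame, so test against the values of
    # column 0; a plain `k in miRNAs` would check the column labels instead.
    if k in miRNAs[0].values:
        v.to_csv(k + '.csv')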
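If the output should be one comma-separated line of Gene_IDs per miRNA rather than a column, a sketch along these lines may help. It assumes miRNAs is a one-column DataFrame whose column label is 0, as the printed head suggests; the last two Gene_IDs, the "hsa-miR-999" entry and the temporary output directory are made up for the example.

```python
import os
import tempfile
import pandas as pd

miRNAs = pd.DataFrame({0: ["hsa-let-7f-2-3p", "hsa-miR-105-3p"]})
Alvos_Mir = pd.DataFrame({
    "Gene_ID": ["NM_001038707", "NM_058241", "NM_000001", "NM_000002"],
    "miRNA_family_ID": ["hsa-let-7f-2-3p", "hsa-let-7f-2-3p",
                        "hsa-miR-105-3p", "hsa-miR-999"],
})

# isin compares each row against the whole list, so the lengths need not match.
wanted = Alvos_Mir[Alvos_Mir["miRNA_family_ID"].isin(miRNAs[0])]

out_dir = tempfile.mkdtemp()  # hypothetical output location
for name, group in wanted.groupby("miRNA_family_ID"):
    path = os.path.join(out_dir, name + ".csv")
    # One line per file: the gene IDs joined by commas.
    with open(path, "w") as handle:
        handle.write(",".join(group["Gene_ID"]) + "\n")
```

The original comparison fails because == between two Series of different lengths is element-wise, whereas isin tests membership and returns one boolean per row of Alvos_Mir.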
Each line represents a single student and consists of a student number, a name, a section code and a midterm grade, all separated by whitespace.
The first parameter is the file, which is already open.
The second parameter is a section code.
This is the link to the file: http://www.cdf.toronto.edu/~csc108h/fall/exercises/e3/grade_file.txt
My code:
def average_by_section(the_file, section_code):
    '''(io.TextIOWrapper, str) -> float
    Return the average midterm mark for all students in that section
    '''
    score = 0
    n = 0
    for element in the_file:
        line = element.split()
        if section_code == line[-2]:
            mark = mark + float(line[-1])
            n += 1
    lecture_avg = mark / n
    return lecture_avg
I'm getting an index out of range error. Is this code correct, or am I just opening the wrong file?
Can someone download that file and test this code? I'm pretty sure it should work, but it doesn't for me.
Well, you can troubleshoot the index out of range error with a print line or print(line) to explore the number of items in "line" (i.e. the effect of split()). I'd suggest looking closer at your split() statement...
It looks like you are omitting parts of your code where you define some of those variables (section_code, mark, etc.), but adjusting for some of those things seems to work properly. Assuming that the error you got was IndexError: list index out of range, that happens when you try to access an element of a list by index where that index doesn't exist. For instance:
>>> l = ['one']
>>> l[0]
'one'
>>> l[1]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range
>>> l[-1]
'one'
>>> l[-2]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: list index out of range
Therefore, in your code you will get that error if line ever has fewer than two items; a blank line in the file, for example, makes split() return an empty list. I would check what you are actually getting for line to make sure it is what you expect.
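A minimal sketch of the function with that guard added is shown below. The sample data is made up, and both skipping short lines and returning 0.0 when no student matches are assumptions about the desired behaviour, not something the exercise states.

```python
import io


def average_by_section(the_file, section_code):
    """(io.TextIOWrapper, str) -> float

    Return the average midterm mark for all students in that section.
    """
    total = 0.0
    n = 0
    for element in the_file:
        line = element.split()
        if len(line) < 2:
            # Blank or malformed line: there is no line[-2] to look at.
            continue
        if section_code == line[-2]:
            total += float(line[-1])
            n += 1
    # Assumption: report 0.0 rather than dividing by zero when nothing matches.
    return total / n if n else 0.0


# Hypothetical file content, including a blank line.
sample = io.StringIO("1001 Ann Lee L0101 80\n\n1002 Bob L0201 60\n1003 Cy L0101 90\n")
print(average_by_section(sample, "L0101"))  # 85.0
```

Indexing from the end with line[-2] and line[-1] keeps working when a name contains spaces, since the section code and grade are always the last two fields.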