PYTHON problem with negative decimals - python

I have a list of negative floats. I want to make a histogram with them. As far as I know, Python can't do operations with negative numbers. Is this correct? The list is like [-0.2923998, -1.2394875, -0.23086493, etc.]. I'm trying to find the maximum and minimum number so I can find out what the range is. My code is giving an error:
setrange = float(maxv) - float(minv)
TypeError: float() argument must be a string or a number
And this is the code:
f = open('clusters_scores.out','r')
#first, extract all of the sim values
val = []
for line in f:
    lineval = line.split()
    print lineval
    val.append(lineval)
print val
#val = map(float,val)
maxv = max(val)
minv = min(val)
setrange = float(maxv) - float(minv)
All the values that are being put into the 'val' list are negative decimals. What is the error referring to, and how do I fix it?
The input file looks like:
-0.0783532095182 -0.99415440702 -0.692972552716 -0.639273674023 -0.733029194040.765257900121 -0.755438339963
-0.144140594077 -1.06533353638 -0.366278118372 -0.746931508538 -1.02549039392 -0.296715961215
-0.0915937502791 -1.68680560936 -0.955147543358
-0.0488457137771 -0.0943080192383 -0.747534412969 -1.00491121699
-1.43973471463
-0.0642611118901 -0.0910684525497
-1.19327387414 -0.0794696449245
-1.00791366035 -0.0509749096549
-1.08046507281 -0.957339914505 -0.861495748259

The results of split() are a list of split values, which is probably why you are getting that error.
For example, if you do '-0.2'.split(), you get back a list with a single value ['-0.2'].
EDIT: Aha! With your input file provided, it looks like this is the problem: -0.733029194040.765257900121. I think you mean to make that two separate floats?
Assuming a corrected file like this:
-0.0783532095182 -0.99415440702 -0.692972552716 -0.639273674023 -0.733029194040 -0.765257900121 -0.755438339963
-0.144140594077 -1.06533353638 -0.366278118372 -0.746931508538 -1.02549039392 -0.296715961215
-0.0915937502791 -1.68680560936 -0.955147543358
-0.0488457137771 -0.0943080192383 -0.747534412969 -1.00491121699
-1.43973471463
-0.0642611118901 -0.0910684525497
-1.19327387414 -0.0794696449245
-1.00791366035 -0.0509749096549
-1.08046507281 -0.957339914505 -0.861495748259
The following code will no longer throw that exception:
f = open('clusters_scores.out','r')
#first, extract all of the sim values
val = []
for line in f:
    linevals = line.split()
    print linevals
    val += linevals
print val
val = map(float, val)
maxv = max(val)
minv = min(val)
setrange = float(maxv) - float(minv)
I have changed it to take the list result from split() and concatenate it to the list, rather than append it, which will work provided there are valid inputs in your file.
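The difference between appending and concatenating can be seen in a minimal sketch. The two sample lines are copied from the input file above; the names nested and flat are only illustrative.

```python
# Two lines as they would come out of the file (illustrative sample).
lines = ["-0.0915937502791 -1.68680560936 -0.955147543358\n",
         "-1.43973471463\n"]

nested = []
flat = []
for line in lines:
    tokens = line.split()   # split() always returns a list of strings
    nested.append(tokens)   # adds the whole list as ONE element
    flat += tokens          # adds each string as its own element

print(nested)  # a list of lists of strings
print(flat)    # a flat list of strings

# Only the flat list can be converted element by element.
values = [float(tok) for tok in flat]
setrange = max(values) - min(values)
print(setrange)
```

With append, max() and min() end up comparing whole lists, which is why float() later receives a list instead of a string.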

All the values that are being put into the 'val' list are negative decimals.
No, they aren't; they're lists of strings that represent negative decimals, since the .split() call produces a list. maxv and minv are lists of strings, which can't be fed to float().
What is the error referring to, and how do I fix it?
It's referring to the fact that the contents of val aren't what you think they are. The first step in debugging is to verify your assumptions. If you try this code out at the REPL, then you could inspect the contents of maxv and minv and notice that you have lists of strings rather than the expected strings.
I assume you want to put all the lists of strings (from each line of the file) together into a single list of strings. Use val.extend(lineval) rather than val.append(lineval).
That said, you'll still want to map the strings into floats before calling max or min, because otherwise you will be comparing the strings as strings rather than as floats. String comparison is lexicographic, so it can give the wrong answer: '-1.5' > '-0.5' is True for strings, even though -1.5 < -0.5 numerically.
Simpler yet, just read the entire file at once and split it; .split() without arguments splits on whitespace, and a newline is whitespace. You can also do the mapping at the same point as the reading, with careful application of a list comprehension. I would write:
with open('clusters_scores.out') as f:
    val = [float(x) for x in f.read().split()]
result = max(val) - min(val)
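If the file may contain a malformed token, such as the fused value -0.733029194040.765257900121 in the question's input, float() will raise ValueError on it. One possible way to handle that, sketched here with a hypothetical helper named parse_floats, is to collect the bad tokens instead of stopping at the first one:

```python
def parse_floats(text):
    """Split text on whitespace; return (values, bad_tokens).

    Hypothetical helper for illustration only.
    """
    values, bad = [], []
    for tok in text.split():
        try:
            values.append(float(tok))
        except ValueError:
            bad.append(tok)   # remember tokens that are not valid floats
    return values, bad

# A short sample in the question's format, including the fused token.
sample = ("-0.639273674023 -0.733029194040.765257900121 -0.755438339963\n"
          "-1.43973471463\n")
values, bad = parse_floats(sample)
print(bad)
print(max(values) - min(values))
```

Whether bad tokens should be skipped, repaired, or treated as a fatal error depends on the data; this sketch only reports them.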

Related

Sort list for date in Python

import re
arr1 = ['2018.07.17 11:30:00,-0.19', '2018.07.17 17:55:00,0.86']
arr2 = ['2018.07.17 11:34:00,-0.39', '2018.07.17 17:59:01,0.85']
def combine_strats_lambda(*strats):
    """
    Takes *strats in date,return format
    combines infinite amount of strats with date, return and packs them into
    one
    single sorted array
    >> RETURN: combined list
    """
    temp = []
    # create combined list
    for v in enumerate(strats):
        i = 0
        while i < len(v[1]):
            temp.append(v[1][i])
            #k = re.findall(r"[\w']+", temp)[:6]
            i += 1
    temp2 = sorted(timestamps, key=lambda d: tuple(map(int, re.findall(r"[\w']+", d[0]))))
    return temp2
Hi,
I've been trying to finish this function, which should combine multiple lists of dates,percentage returns and sort them.
I've come across a solution with lambda but all I get is this message:
return _compile(pattern, flags).findall(string)
TypeError: expected string or bytes-like object
Do you know an easier solution to the problem or what the error is caused by? I can't seem to figure it out.
Anything appreciated :)
One basic problem in your code is in this line:
for v in enumerate(strats):
enumerate(...) yields (index, value) pairs. Because you bind each pair to the single name v, v is a tuple and the actual list is v[1]; the index is never used, so enumerate is not needed here. Note also that sorted() is called on timestamps, which is never defined in the code shown, rather than on temp.
Another important point is that if the datetime strings are written as yyyy.MM.dd hh:mm:ss (zero-padded, most significant field first), you can sort them using a plain string sort.
So, to gather the strings, you need a list comprehension with 2 nested loops.
And to sort them, you should use the sorted function, specifying as the sort key the "initial" (date / time) part, before the comma.
To sum up, to get the sorted list of strings, taken from the arguments of your function and sorted on the date / time part, you can use the following program, written using version 3.6 of Python:
arr1 = ['2018.07.17 11:30:00,-0.19', '2018.07.17 17:55:00,0.86']
arr2 = ['2018.07.17 11:34:00,-0.39', '2018.07.17 17:59:01,0.85']
def combine_strats_lambda(*strats):
    temp = [ v2 for v1 in strats for v2 in v1 ]
    return sorted(temp, key = lambda v: v.split(',')[0])
res = combine_strats_lambda(arr1, arr2)
for x in res:
    parts = x.split(',')
    print("{:20s} {:>6s}".format(parts[0], parts[1]))
It does not even use the re module.
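The string sort relies on every timestamp being zero-padded. If that cannot be guaranteed, a possible alternative is to parse the date part with datetime.strptime and sort on the parsed value. This is only a sketch: the function name combine_sorted is hypothetical, and the format string assumes the yyyy.MM.dd hh:mm:ss layout shown in the question.

```python
from datetime import datetime

def combine_sorted(*strats):
    """Merge any number of 'date,return' string lists, sorted by timestamp."""
    merged = [row for strat in strats for row in strat]
    return sorted(
        merged,
        # Parse only the part before the comma into a datetime object.
        key=lambda row: datetime.strptime(row.split(',')[0], "%Y.%m.%d %H:%M:%S"),
    )

arr1 = ['2018.07.17 11:30:00,-0.19', '2018.07.17 17:55:00,0.86']
arr2 = ['2018.07.17 11:34:00,-0.39', '2018.07.17 17:59:01,0.85']
combined = combine_sorted(arr1, arr2)
for row in combined:
    print(row)
```

strptime raises ValueError on a row that does not match the format, which also makes malformed input visible early.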

Python List comprehension and JSON parsing

I'm new to Python and trying to figure out the best way to parse the values of a JSON object into an array, using a list comprehension.
Here is my code. I'm querying the publicly available iNaturalist API and would like to extract specific parts of the JSON object it returns into a NumPy array:
import json
import urllib2
#Set Observations URL request for Resplendent Quetzal of Costa Rica
query = urllib2.urlopen("http://api.inaturalist.org/v1/observations?place_id=6924&taxon_id=20856&per_page=200&order=desc&order_by=created_at")
obSet = json.load(query)
#Print out Lat Long of observation
n = obSet['total_results']
for i in range(n) :
    print obSet['results'][i]['location']
This all works fine and gives the following output:
9.5142456535,-83.8011438905
10.2335478381,-84.8517773638
10.3358965682,-84.9964271008
10.3744851815,-84.9871494128
10.2468720343,-84.9298072822
...
What I'd like to do next is replace the for loop with a list comprehension, and store the location value in a tuple. I'm struggling with the syntax in that I'm guessing it's something like this:
[(long,lat) for i in range(n) for (long,lat) in obSet['results'][i]['location']]
But this doesn't work...thanks for any help.
obSet['results'] is a list, no need to use range to iterate over it:
for item in obSet['results']:
    print(item['location'])
To make this into list comprehension you can write:
[item['location'] for item in obSet['results']]
But, each location is coded as a string, instead of list or tuple of floats. To get it to the proper format, use
[tuple(float(coord) for coord in item['location'].split(','))
for item in obSet['results']]
That is, split the item['location'] string into parts using , as the delimiter, then convert each part into a float, and make a tuple of these float coordinates.
The direct translation of your code into a list comprehension is:
positions = [obSet['results'][i]['location'] for i in range(obSet['total_results'])]
The obSet['total_results'] is informative but not needed, you could just loop over obSet['results'] directly and use each resulting dictionary:
positions = [res['location'] for res in obSet['results']]
Now you have a list of strings, however, as each 'location' is still the comma-separated string you printed before.
Split that string and convert the result into a sequence of floats:
positions = [map(float, res['location'].split(',')) for res in obSet['results']]
Now you have a list of lists with floating point values:
>>> [map(float, res['location'].split(',')) for res in obSet['results']]
[[9.5142456535, -83.8011438905], [10.2335478381, -84.8517773638], [10.3358965682, -84.9964271008], [10.3744851815, -84.9871494128], [10.2468720343, -84.9298072822], [10.3456659939, -84.9451804822], [10.3611732346, -84.9450302597], [10.3174360636, -84.8798676791], [10.325110706, -84.939710318], [9.4098152454, -83.9255607577], [9.4907141714, -83.9240819199], [9.562637289, -83.8170178428], [9.4373885911, -83.8312881263], [9.4766746409, -83.8120952573], [10.2651190176, -84.6360466565], [9.6572995298, -83.8322965118], [9.6997991784, -83.9076919066], [9.6811177044, -83.8487647156], [9.7416717045, -83.929327673], [9.4885099275, -83.9583968683], [10.1233252667, -84.5751029683], [9.4411815757, -83.824401543], [9.4202687169, -83.9550344212], [9.4620656621, -83.665183105], [9.5861809119, -83.8358881552], [9.4508914243, -83.9054016165], [9.4798058284, -83.9362558497], [9.5970449879, -83.8969131893], [9.5855562829, -83.8354434596], [10.2366179555, -84.854847472], [9.718459702, -83.8910277016], [9.4424384874, -83.8880459793], [9.5535916157, -83.9578166199], [10.4124554163, -84.9796942349], [10.0476688795, -84.298227929], [10.2129436252, -84.8384097435], [10.2052632717, -84.6053701877], [10.3835784147, -84.8677930134], [9.6079669672, -83.9084281155], [10.3583643315, -84.8069762134], [10.3975986735, -84.9196996767], [10.2060835381, -84.9698814407], [10.3322929317, -84.8805587129], [9.4756504472, -83.963818143], [10.3997876964, -84.9127311339], [10.1777433853, -84.0673088686], [10.3346128571, -84.9306278215], [9.5193346195, -83.9404786293], [9.421538224, -83.7689452093], [9.430427837, -83.9532672942], [10.3243212895, -84.9653175843], [10.021698503, -83.885674888]]
If you must have tuples rather than lists, add a tuple() call:
positions = [tuple(map(float, res['location'].split(',')))
for res in obSet['results']]
The latter also makes sure the expression works in Python 3 (where map() returns an iterator, not a list); you'd otherwise have to use a nested list comprehension:
# produce a list of lists in Python 3
positions = [[float(p) for p in res['location'].split(',')] for res in obSet['results']]
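To try the comprehension without calling the API, here is a minimal Python 3 sketch. The JSON text is a hand-written stand-in: only the 'results' and 'location' keys are taken from the question, and the entry with a null location is an assumption added to show how missing coordinates could be skipped.

```python
import json

# Hypothetical stand-in for the API response (not real API output).
raw = '''
{"total_results": 3,
 "results": [{"location": "9.5142456535,-83.8011438905"},
             {"location": "10.2335478381,-84.8517773638"},
             {"location": null}]}
'''
obSet = json.loads(raw)

positions = [
    tuple(float(p) for p in res['location'].split(','))
    for res in obSet['results']
    if res.get('location')          # skip entries with no coordinates
]
print(positions)
```

Looping over obSet['results'] directly also avoids an IndexError if 'total_results' is larger than the number of results returned on one page.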
Another way to get a list of coordinate pairs without a list comprehension:
In [14]: map(lambda x: obSet['results'][x]['location'].split(','), range(obSet['total_results']))
Out[14]:
[[u'9.5142456535', u'-83.8011438905'],
[u'10.2335478381', u'-84.8517773638'],
[u'10.3358965682', u'-84.9964271008'],
[u'10.3744851815', u'-84.9871494128'],
...
If you would like a list of tuples instead:
In [14]: map(lambda x: tuple(obSet['results'][x]['location'].split(',')), range(obSet['total_results']))
Out[14]:
[(u'9.5142456535', u'-83.8011438905'),
(u'10.2335478381', u'-84.8517773638'),
(u'10.3358965682', u'-84.9964271008'),
(u'10.3744851815', u'-84.9871494128'),
...
If you want to convert to floats too:
In [17]: map(lambda x: tuple(map(float, obSet['results'][x]['location'].split(','))), range(obSet['total_results']))
Out[17]:
[(9.5142456535, -83.8011438905),
(10.2335478381, -84.8517773638),
(10.3358965682, -84.9964271008),
(10.3744851815, -84.9871494128),
(10.2468720343, -84.9298072822),
(10.3456659939, -84.9451804822),
...
You can iterate over the list of results directly:
print([tuple(result['location'].split(',')) for result in obSet['results']])
>> [('9.5142456535', '-83.8011438905'), ('10.2335478381', '-84.8517773638'), ... ]
[tuple(obSet['results'][i]['location'].split(',')) for i in range(n)]
This will return a list of tuples whose elements are unicode strings.
If you want the elements of the tuples to be floats, do the following:
[tuple(map(float,obSet['results'][i]['location'].split(','))) for i in range(n)]
The correct way to get a list of tuples using list comprehensions would be:
def to_tuple(coords_str):
    return tuple(coords_str.split(','))
output_list = [to_tuple(obSet['results'][i]['location']) for i in range(obSet['total_results'])]
You can of course replace to_tuple() with a lambda function, I just wanted to make the example clear. Moreover, you could use map() to have a tuple with floats instead of string: return tuple(map(float,coords_str.split(','))).
Let's try to give this a shot, starting with just 1 location:
>>> (long, lat) = obSet['results'][0]['location']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: too many values to unpack
Alright, so that didn't work, but why? It's because the longitude and latitude coordinates are just 1 string, so you can't unpack it immediately as a tuple. We must first separate it into two different strings.
>>> (long, lat) = obSet['results'][0]['location'].split(",")
From here we will want to iterate through the whole set of results, which we know are indexed from 0 to n - 1. tuple(obSet['results'][i]['location'].split(",")) will give us the tuple of coordinates for the result at index i, so:
>>> [tuple(obSet['results'][i]['location'].split(",")) for i in range(n)]
ought to give us the set of tuples we want.

Error in sorting operation on dictionary

I am trying to sort a file of sequences according to a certain parameter. The data looks as follows:
ID1 ID2 32
MVKVYAPASSANMSVGFDVLGAAVTP ...
ID1 ID2 18
MKLYNLKDHNEQVSFAQAVTQGLGKN ...
....
There are about 3000 sequences like this, i.e. the first line contains two ID field and one rank field (the sorting key) while the second one contains the sequence. My approach is to open the file, convert the file object to a list object, separate the annotation line (ID1, ID2, rank) from the actual sequence (annotation lines always occur on even indices, while sequence lines always occur on odd indices), merge them into a dictionary and sort the dictionary using the rank field. The code reads like so:
#!/usr/bin/python
with open("unsorted.out","rb") as f:
    f = f.readlines()
    assert type(f) == list, "ERROR: file object not converted to list"
annot=[]
seq=[]
for i in range(len(f)):
    # IDs
    if i%2 == 0:
        annot.append(f[i])
    # Sequences
    elif i%2 != 0:
        seq.append(f[i])
# Make dictionary
ids_seqs = {}
ids_seqs = dict(zip(annot,seq))
# Solub rankings are the third field of the annot list, i.e. annot[i].split()[2]
# Use this index notation to rank sequences according to solubility measurements
sorted_niwa = sorted(ids_seqs.items(), key = lambda val: val[0].split()[2], reverse=False)
# Save to file
with open("sorted.out","wb") as out:
    out.write("".join("%s %s" % i for i in sorted_niwa))
The problem I have encountered is that when I open the sorted file to inspect manually, as I scroll down I notice that some sequences have been wrongly sorted. For example, I see the rank 9 placed after rank 89. Up until a certain point the sorting is correct, but I don't understand why it hasn't worked throughout.
Many thanks for any help!
Sounds like you're comparing strings instead of numbers. "9" > "89" because the character '9' comes lexicographically after the character '8'. Try converting to integers in your key.
sorted_niwa = sorted(ids_seqs.items(), key = lambda val: int(val[0].split()[2]), reverse=False)
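The effect is easy to reproduce with a few records. The sketch below uses made-up lines in the question's layout (the third sequence is a placeholder) and pairs annotation and sequence lines with zip instead of a dictionary, which is an illustrative choice rather than the asker's method.

```python
# Illustrative lines: annotation line, then sequence line.
lines = [
    "ID1 ID2 89\n", "MVKVYAPASSANMSVGFDVLGAAVTP\n",
    "ID1 ID2 9\n",  "MKLYNLKDHNEQVSFAQAVTQGLGKN\n",
    "ID1 ID2 18\n", "AAAAAAAAAAAAAAAAAAAAAAAAAA\n",
]

# Pair even-indexed annotation lines with odd-indexed sequence lines.
records = list(zip(lines[0::2], lines[1::2]))

# Key is the rank as a string: compared character by character.
by_text = sorted(records, key=lambda rec: rec[0].split()[2])
# Key is the rank as an int: compared numerically.
by_number = sorted(records, key=lambda rec: int(rec[0].split()[2]))

print([rec[0].split()[2] for rec in by_text])    # ['18', '89', '9']
print([rec[0].split()[2] for rec in by_number])  # ['9', '18', '89']
```

Keeping the records as a list of pairs also preserves entries whose annotation lines are identical, which a dictionary keyed on the annotation line would silently merge.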

TypeError: list indices must be integers, not str - loading strings from a list

I have a list of strings that I want to set to 999:
with open ("Inventory.txt", 'r') as i:
    for items in i:
        item = GetItemID(items)
        PositionOfItems[items]=HERO_INVENTORY_POS
HERO_INVENTORY_POS is 999, but I get the error shown in the title. If I'm missing anything else that's required, please tell me.
This is the code I used for spawning an item, so I just tried to recreate that:
ItemRequest = input("Which item would you like?").upper()
for i in ItemList:
    if i == ItemRequest:
        ItemRequest = GetItemID(ItemRequest)
        PositionOfItems[ItemRequest]=HERO_INVENTORY_POS
If PositionOfItems is a list, then items needs to be an integer. Right now, it's a string, because you're reading it from a file.
Try:
PositionOfItems[int(items)]=HERO_INVENTORY_POS
Alternatively, maybe you intended to index the list with item and not items? In that case you should do
PositionOfItems[item]=HERO_INVENTORY_POS
depending on how you defined PositionOfItems.
In your line of code
PositionOfItems[items]=HERO_INVENTORY_POS
You are treating it as a dictionary instead of a list, where items is the key and HERO_INVENTORY_POS is the value. When I tried reproducing your code snippet(below), my error was that the dictionary was not defined as empty before its use, and if defined as a list I received the TypeError: list indices must be integers, not str.
with open("test.txt", 'r') as f:
    dict = {} #This line
    for item in f:
        dict[item] = 999
        print item,
If you have assigned PositionOfItems as a list, then the issue is that you would be referring to indexes that have not been defined (or at least are not shown in your code here) and are attempting to reference them with a string (items) instead of an integer, which gives you the TypeError.
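The two cases can be compared side by side in a small sketch. The item names and the in-memory lines list are hypothetical stand-ins for the contents of Inventory.txt, and position_of_items is an illustrative name, not the asker's variable.

```python
HERO_INVENTORY_POS = 999

# Stand-in for the lines of Inventory.txt (hypothetical item names).
lines = ["SWORD\n", "SHIELD\n"]

# Case 1: a list can only be indexed with integers (or slices).
as_list = [0, 0]
try:
    as_list["SWORD"] = HERO_INVENTORY_POS
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Case 2: a dictionary accepts strings as keys.
position_of_items = {}
for line in lines:
    name = line.strip()      # drop the trailing newline before using it as a key
    position_of_items[name] = HERO_INVENTORY_POS
print(position_of_items)
```

Stripping the newline matters when the same key is later looked up from user input, which will not contain it.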

Python: argument conversion during string format error /w dictionary/list reads

new to these boards and understand there is protocol and any critique is appreciated. I have begun python programming a few days ago and am trying to play catch-up. The basis of the program is to read a file, convert a specific occurrence of a string into a dictionary of positions within the document. Issues abound, I'll take all responses.
Here is my code:
f = open('C:\CodeDoc\Mm9\sampleCpG.txt', 'r')
cpglist = f.read()
def buildcpg(cpg):
    return "\t".join(["%d" % (k) for k in cpg.items()])
lookingFor = 'CG'
i = 0
index = 0
cpgdic = {}
try:
    while i < len(cpglist):
        index = cpglist.index(lookingFor, i)
        i = index + 1
        for index in range(len(cpglist)):
            if index not in cpgdic:
                cpgdic[index] = index
        print (buildcpg(cpgdic))
except ValueError:
    pass
f.close()
The cpgdic is supposed to act as a dictionary of the position reference obtained in the index. Each read of index should be entering cpgdic as a new value, and the print (buildcpg(cpgdic)) is my hunch of where the logic fails. I believe(??) it is passing cpgdic into the buildcpg function, where it should be returned as an output of all the positions of 'CG', however the error "TypeError:not all arguments converted during string formatting" shows up. Your turn!
ps. this destroys my 2GB memory; I need to improve with much more reading
cpg.items is yielding tuples. As such, k is a tuple (length 2) and then you're trying to format that as a single integer.
As a side note, you'll probably be a bit more memory efficient if you leave off the [ and ] in the join line. This will turn your list comprehension to a generator expression which is a bit nicer. If you're on python2.x, you could use cpg.iteritems() instead of cpg.items() as well to save a little memory.
It also makes little sense to store a dictionary where the keys and the values are the same. In this case, a simple list is probably more elegant. I would probably write the code this way:
with open(r'C:\CodeDoc\Mm9\sampleCpG.txt') as fin:
    cpgtxt = fin.read()
indices = [i for i,_ in enumerate(cpgtxt) if cpgtxt[i:i+2] == 'CG']
print '\t'.join(str(i) for i in indices)  # join needs strings, not ints
Here it is in action:
>>> s = "CGFOOCGBARCGBAZ"
>>> indices = [i for i,_ in enumerate(s) if s[i:i+2] == 'CG']
>>> print indices
[0, 5, 10]
Note that
i for i,_ in enumerate(s)
is roughly the same thing as
i for i in range(len(s))
except that I don't like range(len(s)) and the former version will work with any iterable -- Not just sequences.
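Another possible approach, closer to the asker's original loop, is to let str.find jump from one match to the next instead of testing every position. This is a sketch only; find_all is a hypothetical helper name, and it is written as a generator so that positions are produced one at a time.

```python
def find_all(text, pattern):
    """Yield the start index of every (possibly overlapping) match."""
    start = text.find(pattern)
    while start != -1:               # find() returns -1 when nothing is left
        yield start
        start = text.find(pattern, start + 1)

s = "CGFOOCGBARCGBAZ"
indices = list(find_all(s, "CG"))
print(indices)                                  # [0, 5, 10]
print("\t".join(str(i) for i in indices))       # join needs strings
```

Unlike str.index, str.find does not raise ValueError when the pattern is absent, so no try/except is needed around the loop.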
