I'm trying to read in a text file and format it correctly in a numpy array.
The Input.txt file contains:
point_load, 3, -300
point_load, 6.5, 500
point_moment, 6.5, 3000
I want to produce this array:
point_load = [3, -300, 6.5, 500]
My code is:
a = []
for line in open("Input.txt"):
    li = line.strip()
    if li.startswith("point_load"):
        a.append(li.split(","))
#np.flatten(a)
My code prints:
[['point_load', ' 3', ' -300'], ['point_load', ' 6.5', ' 500']]
Any help would be appreciated. Thank you.
Change this line:
a.append(li.split(","))
to this:
a.append(li.split(",")[1:])
li.split(",")
returns a list, so you appended a list, obtaining a nested list.
You wanted to append individual elements to the list a, namely the 2nd and 3rd, i.e. those with indices 1 and 2. So instead of
a.append(li.split(","))
use
temp = li.split(",")
second = temp[1]
third = temp[2]
a.append(float(second))
a.append(float(third))
Note the use of the float() function as the .split() method returns a list of strings.
(For the last .append(), int() might be more appropriate for you than float().)
To end up with a list of numbers instead of strings, I recommend the following:
a = []
for line in open("Input.txt"):
    li = line.strip()
    if li.startswith("point_load"):
        l = li.split(',')
        for num in l[1:]:
            try:
                num = float(num.strip())
                a.append(num)
            except ValueError:
                print('num was not a number')
The difference here is the list slice, which takes everything on the line from the second comma-separated element onward (more here: "Understanding Python's slice notation"):
l[1:]
The other change is stripping each string and then converting it to a float (since you have decimals):
num = float(num.strip())
Resulting array:
a = [3.0, -300.0, 6.5, 500.0]
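The steps from the answers above (filter by prefix, drop the label, convert the rest to floats) can be combined into one small function. This is a minimal sketch, assuming the lines look like the sample in the question. The name parse_values is made up for illustration, and the lines are passed in as a list so the example runs without Input.txt:

```python
def parse_values(lines, label="point_load"):
    """Collect the numeric fields of every line whose first field is `label`."""
    values = []
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        if fields[0] != label:
            continue  # skip other records such as point_moment
        values.extend(float(field) for field in fields[1:])
    return values


sample = [
    "point_load, 3, -300",
    "point_load, 6.5, 500",
    "point_moment, 6.5, 3000",
]
point_load = parse_values(sample)
print(point_load)  # [3.0, -300.0, 6.5, 500.0]
```

With a real file, parse_values(open("Input.txt")) should behave the same way, and numpy.array(point_load) would turn the list into an array.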
When I use:
import csv

with open("test.txt") as file:
    read = csv.reader(file)
    for i in read:
        print(i)
I get something like this:
['[0', ' 0', ' 1', ' 0]']
and I need:
[0, 0, 1, 0]
Any advice, please?
ad = ['[0', ' 0', ' 1', ' 0]']
for i in range(len(ad)):
    ad[i] = int(ad[i].replace("[", "").replace("]", ""))
print(ad)  # [0, 0, 1, 0]
Normally, the easiest solution would be a list comprehension where new = [int(a) for a in old]. However, in your case, the first and last elements of your list actually have brackets inside of them too.
Instead, you need to do something like:
new = [int("".join(filter(str.isdigit, a))) for a in old]
This is a pretty big one-liner, so let's break it down.
The list comprehension iterates through each element in your list (I called it old) and names it a.
a is passed to filter together with the function str.isdigit. This removes any character that isn't a digit. The catch is that filter returns an iterator rather than a simple value.
To fix the iterator problem, I wrapped the filter call in "".join() to convert it to a plain string. This string contains only the digits.
Finally, we wrap the whole thing in int(), which turns the value into an int.
If you don't like the one-liner, it can also be done this way:
new = []
for a in old:
    filtered = filter(str.isdigit, a)
    num_str = "".join(filtered)
    new.append(int(num_str))
It's the same thing but a bit more verbose.
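One caveat with filtering on str.isdigit is that it also discards a minus sign, so '-3' would come out as 3. If negative values can occur, stripping only the brackets and spaces is one alternative. A small sketch, where strip_to_int is a hypothetical helper name:

```python
def strip_to_int(token):
    """Convert a token such as '[0' or ' -3]' to an int."""
    # str.strip with an argument removes any of these characters from both ends.
    return int(token.strip(" []"))


old = ['[0', ' 0', ' 1', ' 0]']
new = [strip_to_int(a) for a in old]
print(new)                   # [0, 0, 1, 0]
print(strip_to_int(' -3]'))  # -3
```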
def parseToInt(s):
    return int((s.replace('[', '')).replace(']', ''))

list(map(lambda x: parseToInt(x), a))
Is it possible to remove the strings and just have the list?
data = [
    "50,bird,corn,105.4,"
    "75,cat,meat,10.3,"
    "100,dog,eggs,1000.5,"
]
I would like it to look like this:
data = [
    50,'bird','corn',105.4,
    75,'cat','meat',10.3,
    100,'dog','eggs',1000.5,
]
out = []
for x in data:
    for e in x.split(","):
        out.append(e)
What does this do? It splits each element (x) in data on the comma, takes each of the resulting tokens (e), and appends it to the list out (out.append).
new_data = []
for i in data:
    new_data.extend(i.split(','))
new_data
Do note that there might be issues (for example, you have one last comma with nothing after it, so it generates a '' string as the last element in the new array).
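To see where the empty string comes from: splitting on ',' always produces one more piece than there are commas, so a trailing comma leaves an empty last piece. A short sketch of the effect and of one way to drop empty pieces (the sample row is taken from the question):

```python
row = "50,bird,corn,105.4,"

pieces = row.split(",")
print(pieces)  # ['50', 'bird', 'corn', '105.4', '']

# Keep only the non-empty pieces; an empty string is falsy.
non_empty = [piece for piece in pieces if piece]
print(non_empty)  # ['50', 'bird', 'corn', '105.4']
```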
If you want to specifically convert the numbers to ints and floats, maybe there is a more elegant way, but this will work (it also removes empty cells if you have excess commas):
new_data = []
for i in data:
    strings = i.split(',')
    for s in strings:
        if len(s) > 0:
            try:
                num = int(s)
            except ValueError:
                try:
                    num = float(s)
                except ValueError:
                    num = s
            new_data.append(num)
new_data
Split each string (this gives you a list of the segments between the commas in each string):
s.split(",")  # s is one string from data
and then concatenate the resulting lists.
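That description could be written out as follows. This is only a sketch of one way to do it, using itertools.chain.from_iterable from the standard library to join the per-string lists. Here data is written as three separate strings, which is an assumption about the intended input:

```python
from itertools import chain

# Assumption: three separate strings, one per record.
data = [
    "50,bird,corn,105.4,",
    "75,cat,meat,10.3,",
    "100,dog,eggs,1000.5,",
]

# Split every string on commas, then chain the resulting lists into one.
parts = [s.split(",") for s in data]
flat = [item for item in chain.from_iterable(parts) if item]  # drop empty pieces
print(flat)
```

The items in flat are still strings at this point; converting the numeric ones is a separate step.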
Because each string in the list has a trailing comma, you can simply put it back together as a single string and split it again on commas. In order to get actual numeric items in the resulting list, you could do this:
import re
data = [
    "50,bird,corn,105.4,"
    "75,cat,meat,10.3,"
    "100,dog,eggs,1000.5,"
]
# Matches integers and decimals such as 50, -3 or 105.4 (the decimal part is optional).
numeric = re.compile(r"-?\d+(\.\d*)?$")
data = [eval(s) if numeric.match(s) else s for s in "".join(data).split(",")][:-1]
data # [50, 'bird', 'corn', 105.4, 75, 'cat', 'meat', 10.3, 100, 'dog', 'eggs', 1000.5]
I have this code written in Python:
with open('textfile.txt') as f:
    list = []
    for line in f:
        line = line.split()
        if line:
            line = [int(i) for i in line]
            list.append(line)
print(list)
This reads integers from a text file and puts them in a list, but the result is:
[[10,20,34]]
However, I would like it to be displayed like this:
10 20 34
How do I do this? Thanks for your help!
You probably just want to add the items to the list, rather than appending them:
with open('textfile.txt') as f:
    list = []
    for line in f:
        line = line.split()
        if line:
            list += [int(i) for i in line]
print(" ".join([str(i) for i in list]))
If you append a list to a list, you create a sub list:
a = [1]
a.append([2,3])
print(a)  # [1, [2, 3]]
If you add it you get:
a = [1]
a += [2,3]
print(a)  # [1, 2, 3]
with open('textfile.txt') as f:
    lines = [x.strip() for x in f.readlines()]
print(' '.join(lines))
With an input file 'textfile.txt' that contains:
10
20
30
prints:
10 20 30
It sounds like you are trying to print a list of lists. The easiest way to do that is to iterate over it and print each list.
for line in list:
    print(" ".join(str(i) for i in line))
Also, list is the name of a built-in type in Python (not a keyword, but reusing it hides the built-in), so try to avoid naming your variables that.
If you know that the file is not extremely long and you want a list of integers, you can build it in one go (two lines, one of which is the with open(...) line). If you then want to print it your way, convert the elements to strings and join them with ' '.join(...), like this:
#!python3
# Load the content of the text file as one list of integers.
with open('textfile.txt') as f:
    lst = [int(element) for element in f.read().split()]
# Print the formatted result.
print(' '.join(str(element) for element in lst))
Do not use the list identifier for your variables as it masks the name of the list type.
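To illustrate what masking means here, the sketch below binds the name list to a value inside one function and then tries to use it as the type. The function names are made up, and this is a demonstration only:

```python
def shadowed():
    list = [10, 20, 34]  # inside this function, list now refers to this value
    try:
        return list("abc")  # no longer the built-in type, so the call fails
    except TypeError as exc:
        return "TypeError: " + str(exc)


def not_shadowed():
    lst = [10, 20, 34]  # a different name leaves the built-in alone
    return list("abc")


print(shadowed())      # a TypeError message: the list object is not callable
print(not_shadowed())  # ['a', 'b', 'c']
```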
I am working on a csv file using python.
I wrote the following script to treat the file:
import pickle
import numpy as np
from csv import reader, writer
dic1 = {'a': 2, 'b': 2, 'c': 2}
dic2 = {'a': 2,'b': 2,'c': 0}
number = dict()
for k in dic1:
    number[k] = dic1[k] + dic2[k]
ctVar = {'a': [0.093323751331788565, -1.0872670058072453, '', 8.3574590513050264], 'b': [0.053169909627947334, -1.0825742255395172, '', 8.0033788558001984], 'c': [-0.44681777279768059, 2.2380488442495348]}
Var = {}
for k in number:
    Var[k] = number[k]

def findIndex(myList, number):
    n = str(number)
    m = len(n)
    for elt in myList:
        e = str(elt)
        l = len(e)
        mi = min(m,l)
        if e[:mi-1] == n[:mi-1]:
            return myList.index(elt)

def sortContent(myList):
    if '' in myList:
        result = ['']
        myList.remove('')
    else:
        result = []
    myList.sort()
    result = myList + result
    return result
An extract of the csv file follows. (Note: the blanks are important. For readability I have written them as BL, but they are really empty cells.)
The columns contain few elements (including '') repeated many times.
First column:
a
0.0933237513
-1.0872670058
0.0933237513
BL
BL
0.0933237513
0.0933237513
0.0933237513
BL
Second column:
b
0.0531699096
-1.0825742255
0.0531699096
BL
BL
0.0531699096
0.0531699096
0.0531699096
BL
Third column:
c
-0.4468177728
2.2380488443
-0.4468177728
-0.4468177728
-0.4468177728
-0.4468177728
-0.4468177728
2.2380488443
2.2380488443
I have only posted an extract of the code (the part where I am facing a problem), so its purpose may not be obvious. Basically, it is part of a larger program that I use to modify this csv file and encode it differently.
In this extract, I am trying at some point (line 68) to sort elements of a list that contains numbers and ''.
When I remove the line that does this, the elements printed are those of each column (without any repetition).
The problem is that, when I try to sort them, the '' are no longer taken into account. Yet, when I tested my function sortContent with lists that have '', it worked perfectly.
I thought this problem was related to the use of numpy.float64 elements in my list. So I converted all these elements into floats, but the problem remains.
Any help would be greatly appreciated!
I assume you mean to use sortContent on something else (as obviously if you want the values in your predefined lists in ctVar in a certain order, you can just put them in order in your code rather than sorting them at runtime).
Let's go through your sortContent piece by piece.
if '' in myList:
    result = ['']
    myList.remove('')
If the list object passed in (let's call this List 1) contains '', create a new list object (let's call it List 2) holding just '', and remove the first instance of '' from List 1.
myList.sort()
Now, sort the contents of list 1.
result = myList + result
Now create a new list object (call it list 3) with the contents of list 1 and list 2.
return result
Keep in mind that list 1 (the list object that was passed in) still has the '' removed.
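If that side effect is not wanted, one option is to build the result from new lists instead of modifying the argument. A minimal sketch, under the assumption that every '' entry should end up at the end; sort_content_copy is a hypothetical name, not part of the original code:

```python
def sort_content_copy(values):
    """Return the numbers in ascending order followed by the '' entries.

    Unlike sortContent, this leaves `values` unchanged.
    """
    numbers = sorted(v for v in values if v != '')
    blanks = [v for v in values if v == '']
    return numbers + blanks


column = [0.0933237513, -1.0872670058, '', 8.3574590513]
result = sort_content_copy(column)
print(result)  # numbers first, then ''
print(column)  # same order as before the call
```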
fh=open('asd.txt')
data=fh.read()
fh.close()
name=data.split('\n')[0][1:]
seq=''.join(data.split('\n')[1:])
print(name)
print(seq)
In this code, the 3rd line means "take only first line with first character removed" while the 4th line means "leave the first line and join the next remaining lines".
I cannot get the logic of these two lines.
Can anyone explain to me how these two operations ([0][1:]) are used together?
Thanks
Edited: renamed the file variable (file is also a built-in name in Python 2) to data.
Think of it like this: file.split('\n') gives you a list of strings. So the first indexing operation, [0], gives you the first string in the list. Now, that string itself is a "list" of characters, so you can then do [1:] to get every character after the first. It's just like starting with a two-dimensional list (a list of lists) and indexing it twice.
When confused by a complex expression, do it in steps.
>>> data.split('\n')[0][1:]
>>> data
>>> data.split('\n')
>>> data.split('\n')[0]
>>> data.split('\n')[0][1:]
That should help.
Let's do it by steps (I think I know what name and seq are):
>>> file = ">Protein kinase\nADVTDADTSCVIN\nASHRGDTYERPLK"  # what you get from reading your (FASTA) file
>>> lines = file.split('\n')  # make a list of lines
>>> line_0 = lines[0]  # take the first line (line numbers start at 0)
>>> name = line_0[1:]  # the line's characters from index 1 to the end
>>> name
'Protein kinase'
>>>
>>> file = ">Protein kinase\nADVTDADTSCVIN\nASHRGDTYERPLK"
>>> lines = file.split('\n')
>>> seqs = lines[1:]  # the lines from index 1 to the end
>>> seq = ''.join(seqs)
>>> seq
'ADVTDADTSCVINASHRGDTYERPLK'
>>>
In a slice [x:y], x is included and y is not. To go all the way to the end of the list, just leave out y: [x:] means from the item at index x to the end.
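A quick way to check these slice rules is to try them on a short list. This sketch uses made-up values:

```python
lines = [">name", "AAA", "BBB"]

one_line = lines[1:2]    # index 1 is included, index 2 is not
rest = lines[1:]         # from index 1 to the end
name = lines[0][1:]      # index first, then slice the resulting string

print(one_line)  # ['AAA']
print(rest)      # ['AAA', 'BBB']
print(name)      # name
```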
Each set of [] just operates on the list that split returns, and the resulting
list or string is then used without being assigned to another variable first.
Break down the third line like this:
lines = file.split('\n')
first_line = lines[0]
name = first_line[1:]
Break down the fourth line like this:
lines = file.split('\n')
all_but_first_line = lines[1:]
seq = ''.join(all_but_first_line)
Take this as an example:
myl = [["hello","world","of","python"],["python","is","good"]]
Here myl is a list of lists. myl[0] is the first element of the list, which is ['hello', 'world', 'of', 'python']. myl[0][1:] first selects that element (myl[0]) and then, from the resulting list, selects every element except the first one. So the output is ['world', 'of', 'python'].