Writing list to text file not making new lines python - python

I'm having trouble writing to text file. Here's my code snippet.
ram_array= map(str, ram_value)
cpu_array= map(str, cpu_value)
iperf_ba_array= map(str, iperf_ba)
iperf_tr_array= map(str, iperf_tr)
#with open(ram, 'w') as f:
#    for s in ram_array:
#        f.write(s + '\n')
#with open(cpu,'w') as f:
#    for s in cpu_array:
#        f.write(s + '\n')
with open(iperf_b,'w') as f:
    for s in iperf_ba_array:
        f.write(s+'\n')
    f.close()
with open(iperf_t,'w') as f:
    for s in iperf_tr_array:
        f.write(s+'\n')
    f.close()
The ram and cpu files both work flawlessly. However, the files written for iperf_ba and iperf_tr always come out looking like this:
[45947383.0, 47097609.0, 46576113.0, 47041787.0, 47297394.0]
Instead of
1
2
3
They're both reading from global lists. The cpu and ram have values appended 1 by 1, but otherwise they look exactly the same pre processing.
Here's how they're made
filename= "iperfLog_2015_03_12_20:45:18_123_____tag_33120L06.csv"
write_location= self.tempLocation()
location=(str(write_location) + str(filename));
df = pd.read_csv(location, names=list('abcdefghi'))
transfer = df.h
transfer=transfer[~transfer.isnull()]#uses pandas to remove nan
transfer=transfer.tolist()
length= int(len(transfer))
extra= length-1
del transfer[extra]
bandwidth= df.i
bandwidth=bandwidth[~bandwidth.isnull()]
bandwidth=bandwidth.tolist()
del bandwidth[extra]
iperf_tran.append(transfer)
iperf_band.append(bandwidth)

[from comment]
you need to use .extend(list) if you want to add a list to a list - and don't worry: we're all spending hours debugging/chasing classy-stupid-me mistakes sometimes ;)
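To illustrate the comment above, here is a minimal sketch using made-up values shaped like the ones in the question. It shows that append stores the whole list as a single element, while extend adds its items one by one. The names nested, flat and lines are hypothetical and not taken from the original script.

```python
# Hypothetical stand-in for the `transfer` list built from the CSV.
transfer = [45947383.0, 47097609.0, 46576113.0]

# append() adds the list itself as ONE element, so the global list is nested.
nested = []
nested.append(transfer)
# map(str, nested) therefore yields a single string for the whole inner list.
nested_as_text = [str(item) for item in nested]

# extend() adds each value separately, so there is one element per value.
flat = []
flat.extend(transfer)
lines = [str(value) + '\n' for value in flat]

print(nested_as_text)  # one string containing the entire list
print(lines)           # one newline-terminated string per value
```

With the extend version, the existing write loop in the question would put each value on its own line.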

Related

I am getting an empty file (Subset_soil_param.txt file) in Python 3.7

I am new to Python and to this forum, so I need some help with the following code:
original_soil_parameter_file = open('D:\Spring 2020\VIC\Parameter_files\original_soil_param.txt', "r")
Grid_Cell_id = open('D:\Spring 2020\VIC\Parameter_files\Grid_Cells.txt', "r")
Subset_soil_param = open('D:\Spring 2020\VIC\Parameter_files\subset_soil_param.txt', "w")
with open('D:\Spring 2020\VIC\Parameter_files\original_soil_param.txt') as f:
    for line in f:
        a = line.split(' ')
        if a[1] == Grid_Cell_id:
            Subset_soil_param.write(line)
Subset_soil_param.close()
Basically, I have an original file (opened as original_soil_parameter_file) that covers the whole northwestern United States, and I want to subset it to my study area. The original file contains rows of values separated by spaces. To subset it, I provide a second text file, opened as Grid_Cell_id. The for loop compares the second value of each row (a[1]) with the values in Grid_Cell_id, so that whenever a grid cell ID appears in both files, the row is saved to a new file named subset_soil_param.txt. When I run the code, subset_soil_param.txt is created, but it is empty. The console shows only the following output and nothing else:
runfile('D:/Spring 2020/VIC/Parameter_files/subset_soil_param.py', wdir='D:/Spring 2020/VIC/Parameter_files')
A sample from the original file:
1 240493 41.21875 -116.21875 0.1000 0.767791 0.400832 0.673064 2 13.6030 13.6030 13.6030 473.0640 473.0640 473.0640 -99 -99 -99 21.4270 64.2820 214.2750 1821.3800 0.1000 0.3000 0.7118 6.0880 4.0000 11.1500 11.1500 11.1500 0.4100 0.4100 0.4100 1485.7000 1485.7000 1485.7000 2620.2800 2620.2800 2620.2800 -8 0.3920 0.3920 0.3920 0.2560 0.2560 0.2560 0.0100 0.0300 458.8940 0 0 0 0 19.0384
Sample from the Grid_Cells.txt file:
288832
287904
287909
240493
You're hitting a roadblock when checking for a match. You're effectively doing this:
if '240493' == Grid_Cell_id:
The issue is that Grid_Cell_id is a file object, not the values you're trying to compare against, so the comparison is always False.
type(Grid_Cell_id)
#<class '_io.TextIOWrapper'>
#<_io.TextIOWrapper name='C:\\users\\admin\\desktop\\test.txt' mode='r' encoding='cp1252'>
We can solve this by storing the contents of Grid_Cell_id as a list.
Grid_Cell_id = open('D:\Spring 2020\VIC\Parameter_files\Grid_Cells.txt', "r")
Grid_Cell_id = list(Grid_Cell_id)
#You could also do;
Grid_Cell_id = list(open('D:\Spring 2020\VIC\Parameter_files\Grid_Cells.txt', "r"))
Now that you have your grid cells stored in a list, you can call;
if a[1] in Grid_Cell_id:
This will return True if the value is found and False if not found.
EDIT:
This code is working on my system;
Grid_Cell_id = list(open('Grid_Cell_id.txt', "r"))
with open('original_soil_parameter_file.txt') as f:
    for line in f:
        a = line.split(' ')
        if a[1] in Grid_Cell_id:
            with open('subset_soil_param.txt', "w") as output:
                output.write(line)
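Two caveats may be worth checking with the approach above. First, list(open(...)) keeps the trailing newline on each element, so '240493' will not match '240493\n'. Second, opening the output file in "w" mode inside the loop truncates it on every match, so only the last matching line survives. The following is a minimal sketch of one way around both, assuming the grid cell ID is the second whitespace-separated field. The function name subset_lines and the sample rows are hypothetical.

```python
def subset_lines(param_lines, grid_id_lines):
    """Return the parameter lines whose second field is a wanted grid cell ID."""
    # Strip newlines and skip blank lines; a set gives fast membership tests.
    wanted = {line.strip() for line in grid_id_lines if line.strip()}
    kept = []
    for line in param_lines:
        fields = line.split()
        if len(fields) > 1 and fields[1] in wanted:
            kept.append(line)
    return kept


# Small in-memory demonstration with made-up rows.
sample_params = ["1 240493 41.21875 -116.21875\n", "1 999999 40.0 -115.0\n"]
sample_ids = ["288832\n", "287904\n", "240493\n"]
matches = subset_lines(sample_params, sample_ids)
print(matches)
```

With real files, both inputs can be passed as open file objects, and the output file can be opened once before writing, for example with output.writelines(matches).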

IndexError: list index out of range in Python Script

I'm new to Python, so I apologize if this question has already been answered. I've used this script before and it worked, so I'm not at all sure what is wrong.
I'm trying to transform a MALLET output document into a long list of topic, weight, value rather than a wide list of topics, documents and weights.
Here's what the original CSV I'm trying to convert looks like, except that there are 30 topics in it (it's a text file called mb_composition.txt):
0 file:/Users/mandyregan/Dropbox/CPH-DH/MiningtheSurge/txt/Abizaid.txt 6.509147794508226E-6 1.8463345214533957E-5 3.301298069640119E-6 0.003825178550032757 0.15240841618294929 0.03903974304065183 0.10454783676528623 0.1316719812119471 1.8018057013225344E-5 4.869261713020613E-6 0.0956868156114931 1.3521101623203115E-5 9.514591058923748E-6 1.822741355900598E-5 4.932324961835634E-4 2.756817586271138E-4 4.039186874601744E-5 1.0503346606335033E-5 1.1466132458804392E-5 0.007003443189848799 6.7094360963952E-6 0.2651753488982284 0.011727025879070194 0.11306132549594633 4.463460490946615E-6 0.0032751230536005056 1.1887304822238514E-5 7.382714572306351E-6 3.538808652077042E-5 0.07158823129977483
1 file:/Users/mandyregan/Dropbox/CPH-DH/MiningtheSurge/txt/Jeffrey,%20Jim%20-%20Chk5-%20ASC%20-%20FINAL%20-%20Sept%202017.docx.txt 4.296636200313062E-6 1.218750594272488E-5 1.5556725986514498E-4 0.043172816021532695 0.04645757277949794 0.01963429696910822 0.1328206370818606 0.116826297071711 1.1893574776047563E-5 3.2141605637859693E-6 0.10242945223692496 0.010439315937573735 0.2478814493196687 1.2031769351093548E-5 0.010142417179693447 2.858721603853616E-5 2.6662348272204834E-5 6.9331747684835E-6 7.745091995495631E-4 0.04235638910274044 4.428844900369446E-6 0.0175105406405736 0.05314379308820005 0.11788631730736487 2.9462944350793084E-6 4.746133386282654E-4 7.846714475661223E-6 4.873270616886766E-6 0.008919869163605806 0.02884824479155971
And here's the python script I'm trying to use to convert it:
infile = open('mallet_output_files/mb_composition.txt', 'r')
outfile = open('mallet_output_files/weights.csv', 'w+')
outfile.write('file,topicnum,weight\n')
for line in infile:
    tokens = line.split('\t')
    fn = tokens[1]
    topics = tokens[2:]
    #outfile.write(fn[46:] + ",")
    for i in range(0,59):
        outfile.write(fn[46:] + ",")
        outfile.write(topics[i*2]+','+topics[i*2+1]+'\n')
I'm running this in the terminal with python reshape.py and I get this error:
Traceback (most recent call last):
File "reshape.py", line 12, in <module>
outfile.write(topics[i*2]+','+topics[i*2+1]+'\n')
IndexError: list index out of range
Any idea what I'm doing wrong here? I can't seem to figure it out and am frustrated because I know I've used this script many times before with success! If it helps, I'm on Mac OS X with Python 2.7.10.
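The problem is that the loop assumes more values per line than your file contains: range(0,59) runs i from 0 to 58, and topics[i*2+1] then reaches index 117, so every line would need 118 values (59 pairs) after the file name.
If you just want to write out the topics that are actually present on each line, define the range by the actual number of values per line: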
for i in range(len(topics) // 2):
    outfile.write(fn[46:] + ",")
    outfile.write(topics[i*2]+','+topics[i*2+1]+'\n')
Stated more pythonically, it would look something like this:
# Group the topics into tuple-pairs for easier management
paired_topics = [tuple(topics[i:i+2]) for i in range(0, len(topics), 2)]
# Iterate the paired topics and print them each on a line of output
for topic in paired_topics:
    outfile.write(fn[46:] + ',' + ','.join(topic) + '\n')
You need to debug your code. Try printing out variables.
infile = open('mallet_output_files/mb_composition.txt', 'r')
outfile = open('mallet_output_files/weights.csv', 'w+')
outfile.write('file,topicnum,weight\n')
for line in infile:
    tokens = line.split('\t')
    fn = tokens[1]
    topics = tokens[2:]
    # outfile.write(fn[46:] + ",")
    for i in range(0,59):
        # Add a print statement like this (str.format also works on Python 2.7,
        # where f-strings are not available)
        print('Topics {}: {} and {}'.format(i, i*2, i*2+1))
        outfile.write(fn[46:] + ",")
        outfile.write(topics[i*2]+','+topics[i*2+1]+'\n')
Does your 'topics' list only have 30 elements? It looks like you're trying to access items far outside the available range, i.e., you're trying to access topics[x] where x >= 30.
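If the file really holds one weight per topic after the file name, as the sample rows in the question appear to, then pairing the values up is not needed at all. The following is a minimal sketch under that assumption, which the thread does not confirm. The function name wide_to_long is hypothetical, and the topic number is simply the column position.

```python
def wide_to_long(line, delimiter="\t"):
    """Turn one wide row (id, file name, weight, weight, ...) into long rows."""
    tokens = line.rstrip("\n").split(delimiter)
    filename = tokens[1]
    weights = tokens[2:]
    # enumerate() numbers the weights 0, 1, 2, ... so no index can run past the end.
    return [(filename, topic_num, weight) for topic_num, weight in enumerate(weights)]


# Made-up row with three topic weights.
rows = wide_to_long("0\tfile:/txt/Abizaid.txt\t0.1\t0.7\t0.2\n")
for filename, topic_num, weight in rows:
    print("{},{},{}".format(filename, topic_num, weight))
```

Because the loop is driven by the values that are present, the same code works whether a line has 30 weights or some other number.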

Python sum certain values from multiple text files

I have multiple text files that contain multiple lines of floats and each line has two floats separated by white space, like this: 1.123 456.789123. My task is to sum floats after white space from each text file. This has to be done for all lines. For example, if I have 3 text files:
1.213 1.1
23.33 1
0.123 2.2
23139 0
30.3123 3.3
44.4444 444
Now the sum of numbers on the first lines should be 1.1 + 2.2 + 3.3 = 6.6. And the sum of numbers on second lines should be 1 + 0 + 444 = 445. I tried something like this:
def foo(folder_path):
    contents = os.listdir(folder_path)
    for file in contents:
        path = os.path.join(folder_path, file)
        with open(path, "r") as data:
            rows = data.readlines()
        for row in rows:
            value = row.split()
            second_float = float(value[1])
    return sum(second_float)
When I run my code I get this error: TypeError: 'float' object is not iterable. I've been pulling my hair out over this and don't know what to do. Can anyone help?
Here is how I would do it:
def open_file(file_name):
    with open(file_name) as f:
        for line in f:
            yield line.strip().split() # Remove the newlines and split on spaces
files = ('text1.txt', 'text2.txt', 'text3.txt')
result = list(zip(*(open_file(f) for f in files)))
print(*result, sep='\n')
# result is now equal to:
# [
#     (['1.213', '1.1'], ['0.123', '2.2'], ['30.3123', '3.3']),
#     (['23.33', '1'], ['23139', '0'], ['44.4444', '444'])
# ]
for lst in result:
    print(sum(float(x[1]) for x in lst)) # 6.6 and 445.0
It may be more logical to cast the values to float inside open_file, such as:
yield [float(x) for x in line.strip().split()]
but that is up to you.
-- Edit --
Note that the above solution loads all the files into memory before doing the math (I do this so I can print the result). Because of how the open_file generator works, you don't need to do that. Here is a more memory-friendly version:
# More memory friendly solution:
# Note that the `result` iterator will be consumed by the `for` loop.
files = ('text1.txt', 'text2.txt', 'text3.txt')
result = zip(*(open_file(f) for f in files))
for lst in result:
    print(sum(float(x[1]) for x in lst))
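As for the original error: sum() expects an iterable, and second_float is a single float, which is why Python raises TypeError. The following is a small sketch of the same zip idea wrapped in a function that returns one total per line number. The name sum_second_column is hypothetical, and the inputs are in-memory lists of lines so the example runs without any files.

```python
def sum_second_column(line_sources):
    """Given several iterables of lines, sum the second float of line 1 across
    all sources, then of line 2, and so on. Stops at the shortest source."""
    totals = []
    for same_numbered_rows in zip(*line_sources):
        # Each row looks like "1.213 1.1"; split() handles any whitespace.
        totals.append(sum(float(row.split()[1]) for row in same_numbered_rows))
    return totals


# The three example files from the question, as lists of lines.
file_a = ["1.213 1.1\n", "23.33 1\n"]
file_b = ["0.123 2.2\n", "23139 0\n"]
file_c = ["30.3123 3.3\n", "44.4444 444\n"]
totals = sum_second_column([file_a, file_b, file_c])
print(totals)  # roughly 6.6 for the first lines, 445.0 for the second lines
```

Open file objects can be passed in place of the lists, since iterating a file yields its lines.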

Concatenate multiple text files of DNA sequences in Python or R?

I was wondering how to concatenate exon/DNA fasta files using Python or R.
So far I really liked using R ape package for the cbind method, solely because of the fill.with.gaps=TRUE attribute. I really need gaps inserted when a species is missing an exon.
My code:
ex1 <- read.dna("exon1.txt", format="fasta")
ex2 <- read.dna("exon2.txt", format="fasta")
output <- cbind(ex1, ex2, fill.with.gaps=TRUE)
write.dna(output, "Output.txt", format="fasta")
Example:
exon1.txt
>sp1
AAAA
>sp2
CCCC
exon2.txt
>sp1
AGG-G
>sp2
CTGAT
>sp3
CTTTT
Output file:
>sp1
AAAAAGG-G
>sp2
CCCCCTGAT
>sp3
----CTTTT
So far I am having trouble trying to apply this technique when I have multiple exon files (trying to figure out a loop to open and execute the cbind method for all files ending with .fa in the directory), and sometimes not all files have exons that are all identical in length - hence DNAbin stops working.
So far I have:
file_list <- list.files(pattern=".fa")
myFunc <- function(x) {
  for (file in file_list) {
    x <- read.dna(file, format="fasta")
    out <- cbind(x, fill.with.gaps=TRUE)
    write.dna(out, "Output.txt", format="fasta")
  }
}
However, when I run this and check my output text file, it misses many exons. I think that is because not all files have the same exon length, or my script is failing somewhere and I can't figure it out :(
Any ideas? I can also try Python.
If you prefer using Linux one liners you have
cat exon1.txt exon2.txt > outfile
if you want only the unique records from the outfile use
awk '/^>/{f=!d[$1];d[$1]=1}f' outfile > sorted_outfile
I just came out with this answer in Python 3:
def read_fasta(fasta): # Function that reads the files
    output = {}
    for line in fasta.split("\n"):
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            active_sequence_name = line[1:]
            if active_sequence_name not in output:
                output[active_sequence_name] = []
            continue
        sequence = line
        output[active_sequence_name].append(sequence)
    return output
with open("exon1.txt", 'r') as file: # read exon1.txt
    file1 = read_fasta(file.read())
with open("exon2.txt", 'r') as file: # read exon2.txt
    file2 = read_fasta(file.read())
finaldict = {}
# Concatenate the content of both files
for i in list(file1.keys()) + list(file2.keys()):
    if i not in file1.keys():
        file1[i] = ["-" * len(file2[i][0])]
    if i not in file2.keys():
        file2[i] = ["-" * len(file1[i][0])]
    finaldict[i] = file1[i] + file2[i]
with open("output.txt", 'w') as file: # write the result to output.txt
    for k, i in finaldict.items():
        file.write(">{}\n{}\n".format(k, "".join(i))) # proper formatting
It's pretty hard to comment and explain it completely, and it might not help you, but this is better than nothing :P
I used Łukasz Rogalski's code from answer to Reading a fasta file format into Python dict.
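The two-file answer above can be extended to any number of exon files. The following is a rough sketch, not a replacement for ape's cbind: it assumes that a missing species should be filled with gaps as wide as the longest sequence in that exon, and that shorter sequences are padded on the right. The function names parse_fasta and concatenate_exons are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA text into {species_name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records.setdefault(name, [])
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def concatenate_exons(exons, gap="-"):
    """Join a list of parsed exons per species, padding with gaps where needed."""
    species = []
    for exon in exons:
        for name in exon:
            if name not in species:
                species.append(name)  # keep first-seen order
    combined = {name: "" for name in species}
    for exon in exons:
        width = max((len(seq) for seq in exon.values()), default=0)
        for name in species:
            # A missing species becomes all gaps; a short sequence is right-padded.
            combined[name] += exon.get(name, "").ljust(width, gap)
    return combined


exon1 = parse_fasta(">sp1\nAAAA\n>sp2\nCCCC\n")
exon2 = parse_fasta(">sp1\nAGG-G\n>sp2\nCTGAT\n>sp3\nCTTTT\n")
merged = concatenate_exons([exon1, exon2])
for name, sequence in merged.items():
    print(">{}\n{}".format(name, sequence))
```

To process every file ending in .fa, the list passed to concatenate_exons could be built by reading each path returned by sorted(glob.glob("*.fa")) and parsing its contents with parse_fasta.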

Csv Writer - Trying to write variable in each Column

I am trying to write some data to a CSV file, but I can't get the values into different columns.
car=["car 11"]
finish=["Landhaus , Nord"]
time=["['05:36']", "['06:06']", "['06:36']", "['07:06']", "['07:36']", "['08:06']", "['08:36']", "['09:06']", "['09:36']", "['10:06']", "['10:36']", "['11:06']", "['11:36']", "['12:06']", "['12:36']", "['13:06']", "['13:36']", "['14:06']", "['14:36']", "['15:06']", "['15:36']", "['16:06']", "['16:36']", "['17:06']", "['17:36']", "['18:06']", "['18:36']", "['19:06']", "['19:36']", "['20:06']", "['20:36']"]
myfile = open("Informationen.csv", "wb")
writer = csv.writer(myfile,dialect='excel',delimiter=' ')
bla =[car,finish,time]
writer.writerow(bla)
Output:
car 11 Landhaus , Nord "['05:36']", "['06:06']", [..]
Everything ends up in row 1, column 1.
But I want it like this:
car 11 (in row 1, column 1) | "Landhaus , Nord" (in row 1, column 2) | ['05:36'] (in row 1, column 3) | ['06:06'] (in row 1, column 4) and so on up to column n
Thanks for any help !
Edit
One more example of how it should look:
Line 1: car 11 (column 1) Landhaus, Nord (column 2) ['05:36'] (column 3) ['05:36'] (column 4) [...]
example http://img13.imageshack.us/img13/4964/unbenanntvilw.png
Solution so far (I still have problems with the time list):
car=["car 11"]
trenn=[';']
finish=['Landhaus , Nord']
time=["['05:36']", "['06:06']", "['06:36']", "['07:06']", "['07:36']", "['08:06']", "['08:36']", "['09:06']", "['09:36']", "['10:06']", "['10:36']", "['11:06']", "['11:36']", "['12:06']", "['12:36']", "['13:06']", "['13:36']", "['14:06']", "['14:36']", "['15:06']", "['15:36']", "['16:06']", "['16:36']", "['17:06']", "['17:36']", "['18:06']", "['18:36']", "['19:06']", "['19:36']", "['20:06']", "['20:36']"]
myfile = open("Informationen2.csv", "wb")
writer = csv.writer(myfile,delimiter=' ')
bla = car + trenn + finish + trenn + time
writer.writerow(bla)
myfile.close()
The Python documentation states that for the csv.writer() function the ...
"optional dialect parameter can be given which is used to define
a set of parameters specific to a particular CSV dialect".
... and that ...
"the other optional fmtparams keyword arguments can be given to override individual formatting parameters in the current dialect".
The problem you are experiencing is a consequence of writing a string representation of the time list to file and writing to file using a whitespace delimiter. If you were to view Informationen.csv as a plain text file the problem becomes apparent.
Firstly, writing to file having passed a whitespace delimiter as an argument in csv.writer(myfile, dialect='excel', delimiter=' ') overrides the default delimiter as defined in the Excel dialect and results in the elements of list bla being written to file with the format element1 element2 element3 as opposed to element1,element2,element3.
Secondly, although the majority of the elements in the time list are allocated their own columns in a spreadsheet as desired, writing the list to file as a string representation of itself has contributed to the overall undesired formatting.
When you open the file created with your script as an Excel file, Excel reads in two initial values based on the first two commas it finds which happen to be in the center of the string 'Landhaus , Nord' and within the string representation of the time list.
You can achieve the column separation you require firstly by appending the elements of the time list to the bla list, as opposed to nesting the former within the latter. You then need to omit delimiter=' ' in csv.writer(myfile, dialect='excel', delimiter=' '), thus avoiding the delimiter overriding effect when writing to file:
import csv
car = ['car 11']
finish = ['Landhaus , Nord']
time = ["['05:36']", "['06:06']", "['06:36']", "['07:06']", "['07:36']"]
try:
    with open('Informationen.csv', 'w') as myfile:
        writer = csv.writer(myfile, dialect='excel')
        # Concatenate the lists rather than nesting them,
        # so that each value ends up in its own column
        bla = car + finish
        for each_time in time:
            bla.append(each_time)
        writer.writerow(bla)
except IOError as ioe:
    print('Error: ' + str(ioe))
producing the following output in Excel:
http://imageshack.us/a/img839/8061/screenshotkn.png
import csv
car=["car 11"]
finish=['Landhaus , Nord']
time=["['05:36']", "['06:06']", "['06:36']", "['07:06']", "['07:36']", "['08:06']", "['08:36']", "['09:06']", "['09:36']", "['10:06']", "['10:36']", "['11:06']", "['11:36']", "['12:06']", "['12:36']", "['13:06']", "['13:36']", "['14:06']", "['14:36']", "['15:06']", "['15:36']", "['16:06']", "['16:36']", "['17:06']", "['17:36']", "['18:06']", "['18:36']", "['19:06']", "['19:36']", "['20:06']", "['20:36']"]
myfile = open("derp.csv", "wb")
writer = csv.writer(myfile)
bla = car + finish + time
writer.writerow(bla)
myfile.close()
Here is the output I get from excel
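For readers on Python 3, the following small sketch shows what the flat row looks like once the default comma delimiter is used. It writes to an in-memory buffer so the resulting text can be inspected directly. The helper name row_to_csv is hypothetical, and only two of the time values are included to keep it short.

```python
import csv
import io


def row_to_csv(row):
    """Write one row with the default Excel dialect and return the CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, dialect="excel")
    writer.writerow(row)
    return buffer.getvalue()


car = ["car 11"]
finish = ["Landhaus , Nord"]
time = ["['05:36']", "['06:06']"]

# Concatenating the lists gives one flat row: one element per column.
row = car + finish + time
csv_text = row_to_csv(row)
print(csv_text)

# Reading the text back shows that the comma inside the second value
# does not split the column, because the writer quoted that field.
parsed = next(csv.reader(io.StringIO(csv_text)))
print(parsed)
```

When writing to a real file in Python 3, the csv documentation recommends opening it with newline='' so that the module controls the line endings.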
