I have multiple text files, each containing multiple lines of floats, and each line has two floats separated by whitespace, like this: 1.123 456.789123. My task is to sum the second float (the one after the whitespace) across the text files, line by line. For example, if I have 3 text files:
text1.txt:
1.213 1.1
23.33 1

text2.txt:
0.123 2.2
23139 0

text3.txt:
30.3123 3.3
44.4444 444
Now the sum of the numbers on the first lines should be 1.1 + 2.2 + 3.3 = 6.6, and the sum of the numbers on the second lines should be 1 + 0 + 444 = 445. I tried something like this:
import os

def foo(folder_path):
    contents = os.listdir(folder_path)
    for file in contents:
        path = os.path.join(folder_path, file)
        with open(path, "r") as data:
            rows = data.readlines()
            for row in rows:
                value = row.split()
                second_float = float(value[1])
    return sum(second_float)
When I run my code I get this error: TypeError: 'float' object is not iterable. I've been pulling my hair out over this and don't know what to do. Can anyone help?
Here is how I would do it:
def open_file(file_name):
    with open(file_name) as f:
        for line in f:
            yield line.strip().split()  # Remove the newlines and split on spaces
files = ('text1.txt', 'text2.txt', 'text3.txt')
result = list(zip(*(open_file(f) for f in files)))
print(*result, sep='\n')
# result is now equal to:
# [
# (['1.213', '1.1'], ['0.123', '2.2'], ['30.3123', '3.3']),
# (['23.33', '1'], ['23139', '0'], ['44.4444', '444'])
# ]
for lst in result:
    print(sum(float(x[1]) for x in lst))  # 6.6 and 445.0
It may be more logical to cast the values to float inside open_file, such as:
yield [float(x) for x in line.strip().split()]
but that is up to you and how you want to change it.
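For reference, the full generator with the cast moved inside would then look like this:

def open_file(file_name):
    with open(file_name) as f:
        for line in f:
            # Convert both columns to float as each line is read
            yield [float(x) for x in line.strip().split()]

With that change, the later loops could use sum(x[1] for x in lst) directly.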
-- Edit --
Note that the above solution loads all the files into memory before doing the math (I do this so I can print the result). Because of how the open_file generator works, you don't need to do that; here is a more memory-friendly version:
# More memory friendly solution:
# Note that the `result` iterator will be consumed by the `for` loop.
files = ('text1.txt', 'text2.txt', 'text3.txt')
result = zip(*(open_file(f) for f in files))
for lst in result:
    print(sum(float(x[1]) for x in lst))
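As for the TypeError in the original code: sum() expects an iterable, but second_float is a single float by the time sum(second_float) runs. For completeness, here is one sketch of foo reworked to return the per-line totals the question asks for, keeping a running total per line index (it assumes every file has the same two-column layout):

import os

def foo(folder_path):
    totals = []  # one running total per line index
    for file in os.listdir(folder_path):
        path = os.path.join(folder_path, file)
        with open(path, "r") as data:
            for i, row in enumerate(data):
                second_float = float(row.split()[1])
                if i >= len(totals):
                    totals.append(0.0)
                totals[i] += second_float
    return totals  # roughly [6.6, 445.0] for the sample files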
[screenshot of the CSV file]
Hi (sorry if this is a dumb question). I have a dataset as a CSV file. Every row contains 44 columns, and every cell contains 44 float numbers separated by two spaces, as in the screenshot above. I tried the csv module's readline/s plus numpy, and none of them worked.
I want to read every row as a list with 1936 values (44*44), and then combine the whole dataset into a 2D array: my_data[n_of_samples][1936].
So, as stated by user ybl, this is not a CSV. It's not even close to being a CSV.
This means that you have to implement some processing to turn it into something usable. I put the screenshot through an OCR to extract the actual text values, but next time please provide the input file; screenshots of data are annoying to work with.
The processing you need to do is to find the start and end of each row, using the [ and ] characters respectively. Then you split this data with the basic string.split(), which doesn't care about the number of spaces.
Try the code below and see if that works for the input file.
rows = []
current_row = ""
with open("somefile.txt") as infile:
    for line in infile.readlines():
        cleaned = line.replace('"', '').replace("\n", " ")
        if "]" in cleaned:
            current_row = f"{current_row} {cleaned.split(']')[0]}"
            rows.append(current_row.split())
            current_row = ""
            cleaned = cleaned.split(']')[1]
        if "[" in cleaned:
            cleaned = cleaned.split("[")[1]
        current_row = f"{current_row} {cleaned}"

for row in rows:
    print(len(row))
output:
44
44
44
input file:
"[ 1.79619717e+04 1.09988207e+02 4.13270009e+01 1.72227906e+01
1.06178751e+01 5.20957856e+00 7.50891645e+00 4.57943370e+00
2.65572713e+00 2.96725867e-01 2.43040664e+00 1.32822091e+00
4.09853169e-01 1.18412873e+00 6.43398990e-01 1.23796528e+00
9.63975374e-02 2.95295579e-01 7.68998970e-01 4.98040980e-01
2.84036936e-01 1.76004564e-01 1.43527613e-01 1.64765236e-01
1.51171075e-01 1.02586637e-01 3.27835810e-02 1.21872869e-02
-7.59824907e-02 8.48217334e-02 7.29953754e-02 4.89750588e-02
5.89426950e-02 5.05485266e-02 2.34761263e-02 -2.41095452e-02
5.15952510e-02 1.39933210e-02 2.12354074e-02 3.40820680e-03
-2.57466949e-03 -1.06481222e-02 -8.35155410e-03 1.21653512e-12]","[-6.12189619e+02 1.03584744e+04 2.34417495e+02 7.01761526e+01
3.92495170e+01 1.81609738e+01 2.58114624e+01 1.52275550e+01
8.59676934e+00 9.45036161e-01 7.71943506e+00 4.17516432e+00
1.27920413e+00 3.68862368e+00 1.99582544e+00 3.82999035e+00
2.96068511e-01 9.06341796e-01 2.35621065e+00 1.52094079e+00
8.64565916e-01 5.34605108e-01 4.35456793e-01 4.99450615e-01
4.57778770e-01 3.10324997e-01 9.90860520e-02 3.68281889e-02
-2.29532895e-01 2.56108491e-01 2.20284123e-01 1.47727878e-01
1.77724506e-01 1.52350751e-01 7.07318164e-02 -7.26252404e-02
1.55364050e-01 4.21222079e-02 6.39113311e-02 1.02558665e-02
-7.74736016e-03 -3.20368093e-02 -2.51241082e-02 1.21653512e-12]","[-5.03959282e+02 -5.64452044e+02 7.90433958e+03 1.94146598e+02
1.06178751e+01 5.20957856e+00 7.50891645e+00 4.57943370e+00
2.65572713e+00 2.96725867e-01 2.43040664e+00 1.32822091e+00
4.09853169e-01 1.18412873e+00 6.43398990e-01 1.23796528e+00
9.63975374e-02 2.95295579e-01 7.68998970e-01 4.98040980e-01
2.84036936e-01 1.76004564e-01 1.43527613e-01 1.64765236e-01
1.51171075e-01 1.02586637e-01 3.27835810e-02 1.21872869e-02
-7.59824907e-02 8.48217334e-02 7.29953754e-02 4.89750588e-02
5.89426950e-02 5.05485266e-02 2.34761263e-02 -2.41095452e-02
5.15952510e-02 1.39933210e-02 2.12354074e-02 3.40820680e-03
-2.57466949e-03 -1.06481222e-02 -8.35155410e-03 1.21653512e-12]"
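To get from those parsed rows to the my_data[n_of_samples][1936] array the question asks for, one option (a sketch, assuming each sample consists of 44 consecutive bracketed cells of 44 values each) is to cast and reshape with numpy:

import numpy as np

# `rows` comes from the parsing loop above: one entry per bracketed cell, 44 strings each
cells = np.array(rows, dtype=float)   # shape: (n_of_samples * 44, 44)
my_data = cells.reshape(-1, 44 * 44)  # shape: (n_of_samples, 1936)
print(my_data.shape)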
Another option is this:
import numpy as np
import csv

c = np.array([n_of_samples])
with open('cocacola_sick.csv') as f:
    p = csv.reader(f)  # read file as csv
    for s in p:
        a = ','.join(s)  # concatenate all lines into one line
        a = a.replace("\n", "")  # remove line breaks
        b = np.array(np.mat(a))
        my_data = np.vstack((c, b))
print(my_data)
I am new to Python and to this forum, so I need some help with the following code:
original_soil_parameter_file = open('D:\Spring 2020\VIC\Parameter_files\original_soil_param.txt', "r")
Grid_Cell_id = open('D:\Spring 2020\VIC\Parameter_files\Grid_Cells.txt', "r")
Subset_soil_param = open('D:\Spring 2020\VIC\Parameter_files\subset_soil_param.txt', "w")

with open('D:\Spring 2020\VIC\Parameter_files\original_soil_param.txt') as f:
    for line in f:
        a = line.split(' ')
        if a[1] == Grid_Cell_id:
            Subset_soil_param.write(line)

Subset_soil_param.close()
Basically, I have an original file (variable original_soil_parameter_file) which covers the whole North Western United States, and I want to subset it to my area. The original file contains rows of values, each value separated by a space. To subset it, I provided another text file to the code and called it Grid_Cell_id. Then I used the for loop to match the second value (a[1]) against the values in Grid_Cell_id, so that on finding an identical grid cell id in both files, the code saves the line to a new file named subset_soil_param.txt. After I run the code, subset_soil_param.txt is created but it's empty, and the console shows the following output and nothing else:
runfile('D:/Spring 2020/VIC/Parameter_files/subset_soil_param.py', wdir='D:/Spring 2020/VIC/Parameter_files')
A sample from the original file:
1 240493 41.21875 -116.21875 0.1000 0.767791 0.400832 0.673064 2 13.6030 13.6030 13.6030 473.0640 473.0640 473.0640 -99 -99 -99 21.4270 64.2820 214.2750 1821.3800 0.1000 0.3000 0.7118 6.0880 4.0000 11.1500 11.1500 11.1500 0.4100 0.4100 0.4100 1485.7000 1485.7000 1485.7000 2620.2800 2620.2800 2620.2800 -8 0.3920 0.3920 0.3920 0.2560 0.2560 0.2560 0.0100 0.0300 458.8940 0 0 0 0 19.0384
Sample from the Grid_Cells.txt file:
288832
287904
287909
240493
You're hitting a roadblock when checking for a match. You're doing this:
if '240493' == Grid_Cell_id:
The issue here is that Grid_Cell_id is a file object; it isn't actually the values you're trying to compare, so the check always evaluates to False.
type(Grid_Cell_id)
#<class '_io.TextIOWrapper'>
#<_io.TextIOWrapper name='C:\\users\\admin\\desktop\\test.txt' mode='r' encoding='cp1252'>
We can solve this by storing the contents of Grid_Cell_id as a list (stripping the trailing newlines, so the values compare equal).
Grid_Cell_id = open('D:\Spring 2020\VIC\Parameter_files\Grid_Cells.txt', "r")
Grid_Cell_id = [line.strip() for line in Grid_Cell_id]  # strip the newlines so '240493' matches
# You could also do:
Grid_Cell_id = [line.strip() for line in open('D:\Spring 2020\VIC\Parameter_files\Grid_Cells.txt', "r")]
Now that you have your grid cells stored in a list, you can use:
if a[1] in Grid_Cell_id:
This will return True if the value is found and False if not found.
EDIT:
This code is working on my system:
Grid_Cell_id = [line.strip() for line in open('Grid_Cell_id.txt', "r")]

with open('original_soil_parameter_file.txt') as f:
    with open('subset_soil_param.txt', "w") as output:  # open once; reopening with "w" per match would overwrite earlier lines
        for line in f:
            a = line.split(' ')
            if a[1] in Grid_Cell_id:
                output.write(line)
I have a .txt file with 3 different columns. The first one is just numbers. The second one is numbers from 0 to 7. The final one is sentence-like text. I want to keep the lines in different lists, matched by their numbers, and I want to write a function. How can I separate them into different lists without disrupting them?
An example of the .txt file:
1234 0 my name is
6789 2 I am coming
2346 1 are you new?
1234 2 Who are you?
1234 1 how's going on?
And I have to keep them like this:
----1----
1234 0 my name is
1234 1 how's going on?
1234 2 Who are you?
----2----
2346 1 are you new?
----3----
6789 2 I am coming
What I've tried so far:
inputfile=open('input.txt','r').read()
m_id=[]
p_id=[]
packet_mes=[]
input_file=inputfile.split(" ")
print(input_file)
input_file=line.split()
m_id=[int(x) for x in input_file if x.isdigit()]
p_id=[x for x in input_file if not x.isdigit()]
With your current approach, you are reading the entire file as a single string and splitting on a space (you'd much rather split on newlines, because each line is separated by a newline). Furthermore, you're not segregating your data into separate columns properly.
You have 3 columns. You can split each line into 3 parts using str.split(None, 2). The None means splitting on any run of whitespace. Each group will be stored as a key-list pair inside a dictionary. Here I use an OrderedDict in case you need to maintain order, but you can just as easily declare o = {} as a normal dictionary with the same grouping (but no guaranteed order!).
from collections import OrderedDict

o = OrderedDict()
with open('input.txt', 'r') as f:
    for line in f:
        i, j, k = line.strip().split(None, 2)
        o.setdefault(i, []).append([int(i), int(j), k])

print(dict(o))
{'1234': [[1234, 0, 'my name is'],
[1234, 2, 'Who are you?'],
[1234, 1, "how's going on?"]],
'6789': [[6789, 2, 'I am coming']],
'2346': [[2346, 1, 'are you new?']]}
Always use the with...as context manager when working with file I/O - it makes for clean code. Also, note that for larger files, iterating over each line is more memory efficient.
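If you also want the numbered, grouped layout shown in the question, a small sketch on top of o (sorting groups by id and each group by the second column) could be:

for n, (key, rows) in enumerate(sorted(o.items()), start=1):
    print("----{}----".format(n))
    for _, j, k in sorted(rows, key=lambda r: r[1]):
        print(key, j, k)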
Maybe you want something like this:
import re

# Collect data from input file
h = {}
with open('input.txt', 'r') as f:
    for line in f:
        res = re.match(r"^(\d+)\s+(\d+)\s+(.*)$", line)
        if res:
            if res.group(1) not in h:
                h[res.group(1)] = []
            h[res.group(1)].append((res.group(2), res.group(3)))

# Output result
for i, x in enumerate(sorted(h.keys())):
    print("-------- %s -----------" % (i + 1))
    for y in sorted(h[x]):
        print("%s %s %s" % (x, y[0], y[1]))
The result is as follows (add more newlines if you like):
-------- 1 -----------
1234 0 my name is
1234 1 how's going on?
1234 2 Who are you?
-------- 2 -----------
2346 1 are you new?
-------- 3 -----------
6789 2 I am coming
It's based on regexes (module re in Python). This is a good tool when you want to match simple line-based patterns.
Here it relies on spaces as column separators, but it can just as easily be adapted for fixed-width columns.
The result is collected in a dictionary of lists, each list containing tuples (pairs) of position and text.
The program defers sorting the items until output time.
It's quite ugly code, but it's easy to understand.
raw = []
with open("input.txt", "r") as file:
    for x in file:
        raw.append(x.strip().split(None, 2))

raw = sorted(raw)
title = raw[0][0]
refined = []
cluster = []
for x in raw:
    if x[0] == title:
        cluster.append(x)
    else:
        refined.append(cluster)
        cluster = []
        title = x[0]
        cluster.append(x)
refined.append(cluster)

for number, group in enumerate(refined, start=1):  # start=1 so headers match the requested ----1---- numbering
    print("-"*10 + str(number) + "-"*10)
    for line in group:
        print(*line)
I have a data.dat file that has 3 columns; the 3rd column is just the numbers 1 to 6 repeated again and again:
( In reality, column 3 has numbers from 1 to 1917, but for a minimal working example, let's stick to 1 to 6 )
# Title
127.26 134.85 1
127.26 135.76 2
127.26 135.76 3
127.26 160.97 4
127.26 160.97 5
127.26 201.49 6
125.88 132.67 1
125.88 140.07 2
125.88 140.07 3
125.88 165.05 4
125.88 165.05 5
125.88 203.06 6
137.20 140.97 1
137.20 140.97 2
137.20 148.21 3
137.20 155.37 4
137.20 155.37 5
137.20 184.07 6
I would like to:
1) extract the lines that contain 1 in the 3rd column and save them to a file called mode_1.dat.
2) extract the lines that contain 2 in the 3rd column and save them to a file called mode_2.dat.
3) extract the lines that contain 3 in the 3rd column and save them to a file called mode_3.dat.
.
.
.
6) extract the lines that contain 6 in the 3rd column and save them to a file called mode_6.dat.
In order to accomplish this, I have:
a) defined a variable factor = 6
b) created a one_to_factor list that has numbers 1 to 6
c) used a re.search statement to extract the lines for each value of one_to_factor; the %s is the i from the one_to_factor list
d) appended these results to an empty LINES list.
However, this does not work. I cannot manage to extract the lines that contain i in the 3rd column and save them to a file called mode_i.dat.
I would appreciate it if you could help me.
import re

factor = 6
one_to_factor = range(1, factor+1)
LINES = []

f_2 = open('data.dat', 'r')
for line in f_2:
    for i in one_to_factor:
        if re.search(r' \b%s$' % i, line):
            print 'line = ', line
            LINES.append(line)
print 'LINES =', LINES
I would do it like this:
no regexes, just use str.split() to split according to whitespace
use last item (the digit) of the current line to generate the filename
use a dictionary to open the file the first time, and reuse the handle for subsequent matches (write title line at file open)
close all handles in the end
code:
title_line = "# Vol \t Freq \t Mod \n"
handles = dict()

next(f_2)  # skip title (f_2 is the open file from the question)
for line in f_2:
    toks = line.split()
    filename = "mode_{}.dat".format(toks[-1])
    # create the file the first time its id is encountered
    if filename not in handles:
        handles[filename] = open(filename, "w")
        handles[filename].write(title_line)  # write title
    handles[filename].write(line)

# close all files
for v in handles.values():
    v.close()
EDIT: that's the fastest way, but the problem is that if you have too many suffixes (like in your real example), you'll get a "too many open files" exception. For that case there's a slightly less efficient method which works too:
import glob, os

# pre-processing: clean up old files if any
for f in glob.glob("mode_*.dat"):
    os.remove(f)

next(f_2)  # skip title
s = set()
title_line = "# Vol \t Freq \t Mod \n"
for line in f_2:
    toks = line.split()
    filename = "mode_{}.dat".format(toks[-1])
    with open(filename, "a") as f:
        if filename not in s:
            s.add(filename)
            f.write(title_line)
        f.write(line)
It basically opens the file in append mode, writes the line, and closes it again.
(The set is used to detect the first write to each file, so the title can be written before the data.)
There's a directory cleanup first to ensure that no data is left over from a previous run: append mode expects that no file exists, and if the input data set changes, an identifier present in the old dataset but not in the new one would leave an "orphan" file behind.
First, instead of looping over your one_to_factor, you can get the index in one step:
index = line.strip()[-1]  # last character on the line (strip the newline first)
Then, you can check if index is in your one_to_factor list.
You should create a dictionary of lists to store your lines.
Something like:
{ "1" : [line1, line7, ...],
  "2" : ....
}
And then you can use the keys of the dictionary to create the files and populate them with lines.
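A minimal sketch of that approach, assuming the same data.dat layout as above (and using the whole last column rather than a single character, so the real ids up to 1917 also work):

from collections import defaultdict

groups = defaultdict(list)  # {"1": [line, ...], "2": [line, ...], ...}
with open('data.dat') as f_2:
    next(f_2)  # skip the title line
    for line in f_2:
        index = line.split()[-1]  # whole last column, not just the last character
        groups[index].append(line)

for index, lines in groups.items():
    with open('mode_{}.dat'.format(index), 'w') as out:
        out.writelines(lines)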
I'm having trouble writing to a text file. Here's my code snippet.
ram_array = map(str, ram_value)
cpu_array = map(str, cpu_value)
iperf_ba_array = map(str, iperf_ba)
iperf_tr_array = map(str, iperf_tr)

#with open(ram, 'w') as f:
#    for s in ram_array:
#        f.write(s + '\n')

#with open(cpu, 'w') as f:
#    for s in cpu_array:
#        f.write(s + '\n')

with open(iperf_b, 'w') as f:
    for s in iperf_ba_array:
        f.write(s + '\n')
    f.close()

with open(iperf_t, 'w') as f:
    for s in iperf_tr_array:
        f.write(s + '\n')
    f.close()
The ram and cpu both work flawlessly; however, when writing to a file for iperf_ba and iperf_tr, the lines always come out looking like this:
[45947383.0, 47097609.0, 46576113.0, 47041787.0, 47297394.0]
Instead of
1
2
3
They're both reading from global lists. The cpu and ram lists have values appended one by one, but otherwise they look exactly the same before processing.
Here's how they're made:
filename = "iperfLog_2015_03_12_20:45:18_123_____tag_33120L06.csv"
write_location = self.tempLocation()
location = str(write_location) + str(filename)
df = pd.read_csv(location, names=list('abcdefghi'))

transfer = df.h
transfer = transfer[~transfer.isnull()]  # uses pandas to remove nan
transfer = transfer.tolist()
length = int(len(transfer))
extra = length - 1
del transfer[extra]

bandwidth = df.i
bandwidth = bandwidth[~bandwidth.isnull()]
bandwidth = bandwidth.tolist()
del bandwidth[extra]

iperf_tran.append(transfer)
iperf_band.append(bandwidth)
[from comment]
You need to use .extend(list) if you want to add a list's items to another list - and don't worry: we all spend hours debugging/chasing classy-stupid-me mistakes sometimes ;)
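A quick illustration of the difference, with made-up values:

bandwidth = [45947383.0, 47097609.0]

iperf_band = []
iperf_band.append(bandwidth)  # nested: [[45947383.0, 47097609.0]] - the inner list is written out as one line
print(iperf_band)

iperf_band = []
iperf_band.extend(bandwidth)  # flat: [45947383.0, 47097609.0] - each float gets its own line when written
print(iperf_band)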