Using 'with open' and csv 'reader' functions - python

I've got a problem with this code:
import csv

with open('gios-pjp-data.csv', 'r') as data:
    l = []
    reader = csv.reader(data, delimiter=';')
    next(reader)
    next(reader)  # I need to skip 2 lines here and don't know another way to do it
    l.append(# here is my problem, which I describe below)
So this file contains about 350 lines with 4 columns and
each one is built like this:
Date ; float number ; float number ; float number
Something like this:
2017-01-01;56.7;167.2;236.9
Now, I don't know how to build a function that would append the first float number and the third float number to the list, on the condition that the value is > 200.
Do you have any suggestions?

A list comprehension, if you don't have too many items in the file (note the float() calls: csv gives you strings, so comparing them to 200 directly would fail):
l = [(float(x[1]), float(x[3])) for x in reader if float(x[1]) > 200]
Or a similar generator function that yields each pair, if you have a huge number of entries.
def getitems():
    for x in reader:
        if float(x[1]) > 200:
            yield float(x[1]), float(x[3])

l = getitems()  # this is now a generator, more memory efficient
l = list(l)     # now it's a list
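Putting the pieces together (a minimal sketch, assuming the semicolon-separated layout from the question with two header lines to skip):
import csv

with open('gios-pjp-data.csv', 'r') as data:
    reader = csv.reader(data, delimiter=';')
    next(reader)  # skip the first header line
    next(reader)  # skip the second header line
    # keep the first and third float columns where the first exceeds 200
    l = [(float(x[1]), float(x[3])) for x in reader if float(x[1]) > 200]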

Related

Find max and extract data from a list

I have a text file with car prices and their serial numbers; there are 50 lines in this file. I would like to find the max car price and its serial number for every 10 lines.
priceandserial.txt
102030 4000.30
102040 5000.40
102080 5500.40
102130 4000.30
102140 5000.50
102180 6000.50
102230 2000.60
102240 4000.30
102280 6000.30
102330 9000.70
102340 1000.30
102380 3000.30
102430 4000.80
102440 5000.30
102480 7000.30
When I tried Python's builtin max function, I got 102480 as the max value.
import numpy as np

x = np.loadtxt('carserial.txt', unpack=True)
print('Max:', np.max(x))
Desired result:
102330 9000.70
102480 7000.30
There are 50 lines in the file, so I should get a 5-line result with the serial and max price for each group of 10 lines.
Respectfully, I think the first solution is over-engineered. You don't need numpy or math for this task, just a dictionary. As you loop through, you update the dictionary if the latest value is greater than the current value, and do nothing if it isn't. Every 10th item, you append the values from the dictionary to an output list and reset the buffer.
with open('filename.txt', 'r') as opened_file:
    data = opened_file.read()

rowsplitdata = data.split('\n')
# skip empty lines (e.g. a trailing newline at the end of the file)
colsplitdata = [u.split(' ') for u in rowsplitdata if u]
x = [[int(j[0]), float(j[1])] for j in colsplitdata]

output = []
buffer = {"max": 0, "index": 0}
count = 0
# this assumes x is a list of lists, not a numpy array
for u in x:
    count += 1
    if u[1] > buffer["max"]:
        buffer["max"] = u[1]
        buffer["index"] = u[0]
    if count == 10:
        output.append([buffer["index"], buffer["max"]])
        buffer = {"max": 0, "index": 0}
        count = 0

# append the remainder of the buffer in case you didn't get to ten in the final pass
if count:
    output.append([buffer["index"], buffer["max"]])

output
[[102330, 9000.7], [102480, 7000.3]]
You should iterate over it and, for every 10 lines, extract the maximum:
import numpy as np

# new empty list for collecting the results
max_list = []
# iterate through x in chunks of 10 elements
for i in range(0, len(x), 10):
    # np.max over the chunk; the final chunk may hold fewer than 10 elements
    max_list.append(np.max(x[i:i+10]))
This should do your job.
number_list = [[], []]
with open('filename.txt', 'r') as opened_file:
    for line in opened_file:
        if len(line.split()) == 0:
            continue
        a, b = line.split()
        number_list[0].append(int(a))    # serial as a number, not a string
        number_list[1].append(float(b))  # price as a number, not a string

col1_max, col2_max = max(number_list[0]), max(number_list[1])
col1_max, col2_max
Just change the filename; col1_max and col2_max hold the respective column's max value. You can edit the code to accommodate more columns.
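For instance, a sketch of that generalization (my own illustration: it grows one list per column from the first data line, whatever the column count):
with open('filename.txt', 'r') as opened_file:
    columns = []  # one list per column, created on the first data line
    for line in opened_file:
        parts = line.split()
        if not parts:
            continue
        if not columns:
            columns = [[] for _ in parts]
        for col, value in zip(columns, parts):
            col.append(float(value))

col_maxes = [max(col) for col in columns]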
You can transpose your input first, then use np.split, and for each submatrix calculate its max:
import numpy as np

x = np.genfromtxt('carserial.txt', unpack=True).T
print(x)
for submatrix in np.split(x, len(x)//10):
    print(max(submatrix, key=lambda l: l[1]))

Removing quotes from 2D array python

I am currently trying to execute code that evaluates powers with big exponents without calculating them, working with their logs instead. I have a file containing 1000 lines. Each line contains two integers separated by a comma. I got stuck at the point where I tried to remove the quotes from the array. I tried many ways, none of which worked. Here is my code:
The function split() from myLib takes two arguments: a list, and the number of elements to split the original list into. It splits the list accordingly and appends the smaller lists to a new one.
import math
import myLib

i = 0
record = 0
cmpr = 0
with open("base_exp.txt", "r") as f:
    fArr = f.readlines()
    fArr = myLib.split(fArr, 1)
    # place to get rid of quotes
    print(fArr)
    while i < len(fArr):
        cmpr = int(fArr[i][1]) * math.log(int(fArr[i][0]))
        if cmpr > record:
            record = cmpr
            print(record)
        i = i + 1
This is what my array looks like:
[['519432,525806\n'], ['632382,518061\n'], ... ['172115,573985\n'], ['13846,725685\n']]
I tried to find a way around the 2D array and tried:
i = 0
record = 0
cmpr = 0
with open("base_exp.txt", "r") as f:
    fArr = f.readlines()
    #fArr = myLib.split(fArr, 1)
    fArr = [x.replace("'", '') for x in fArr]
    print(fArr)
    while i < len(fArr):
        cmpr = int(fArr[i][1]) * math.log(int(fArr[i][0]))
        if cmpr > record:
            record = cmpr
            print(i)
        i = i + 1
But the output looked like this:
['519432,525806\n', '632382,518061\n', '78864,613712\n', ...
And the numbers in their current state cannot be treated as integers or floats, so this isn't working either:
[int(i) for i in lst]
Expected output for the array itself would look like this, so that I can pick one of the numbers and work with it:
[[519432,525806], [632382,518061], [78864,613712]...
I would really appreciate your help, since I'm still very new to Python and programming in general.
Thank you for your time.
You can avoid all of your problems by simply using numpy's convenient loadtxt function:
import numpy as np
arr = np.loadtxt('p099_base_exp.txt', delimiter=',')
arr
array([[519432., 525806.],
[632382., 518061.],
[ 78864., 613712.],
...,
[325361., 545187.],
[172115., 573985.],
[ 13846., 725685.]])
If you need a one-dimensional array:
arr.flatten()
# array([519432., 525806., 632382., ..., 573985., 13846., 725685.])
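With the data loaded this way, the comparison from the question can also be done without an explicit loop (a short sketch, assuming arr as loaded above):
cmpr = arr[:, 1] * np.log(arr[:, 0])  # exponent * log(base) for every line at once
print(cmpr.argmax(), cmpr.max())      # index of the record line and its value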
This is your missing piece:
fArr = [[int(num) for num in line.rstrip("\n").split(",")] for line in fArr]
Here, rstrip("\n") removes the trailing \n character from the line, and the string is then split on ,, so each line becomes a list whose elements are the integers of that line, though still as strings. Calling int() on each list element then converts them to the int data type.
The code below should do the job if you don't want to import an additional library.
import math

i = 0
record = 0
cmpr = 0
with open("base_exp.txt", "r") as f:
    fArr = f.readlines()
    fArr = [[int(num) for num in line.rstrip("\n").split(",")] for line in fArr]
    print(fArr)
    while i < len(fArr):
        cmpr = fArr[i][1] * math.log(fArr[i][0])
        if cmpr > record:
            record = cmpr
            print(i)
        i = i + 1
This snippet will transform your array into a flat list of integers:
from itertools import chain
arr = [['519432,525806\n'], ['632382,518061\n']]
new_arr = [int(i.strip()) for i in chain.from_iterable(i[0].split(',') for i in arr)]
print(new_arr)
Prints:
[519432, 525806, 632382, 518061]
For 2D output you can use this:
arr = [['519432,525806\n'], ['632382,518061\n']]
new_arr = [[int(i) for i in v] for v in (i[0].split(',') for i in arr)]
print(new_arr)
This prints:
[[519432, 525806], [632382, 518061]]
new_list = []
a = ['519432,525806\n', '632382,518061\n', '78864,613712\n']
for i in a:
    new_list.append(list(map(int, i.split(","))))
print(new_list)
Output:
[[519432, 525806], [632382, 518061], [78864, 613712]]
In order to flatten new_list:
from functools import reduce

flat_list = reduce(lambda x, y: x + y, new_list)
print(flat_list)
Output:
[519432, 525806, 632382, 518061, 78864, 613712]

python: multiple .dat's in multiple arrays

I'm trying to sort some data into (np.)arrays and am stuck on a problem.
I have 1000 .dat files and need to put the data from them into 1000 different arrays. Furthermore, every array should contain data that depends on the coordinates [i][j][k] (this part I've done already, and the code looks like this (a kind of "short" version):
with open('177500.dat', newline='') as csvfile:
    f = csv.reader(csvfile, delimiter=' ')
    for row in f:
        <some code which works pretty well>

cV = [[[[] for k in range(kMax)] for j in range(jMax)] for i in range(iMax)]

with open('177500.dat', newline='') as csvfile:
    f = csv.reader(csvfile, delimiter=' ')
    <some code which also works well>
    values = np.array([np.float64(row[i]) for i in range(3, rowLen)])
    cV[int(row[0])][int(row[1])][int(row[2])] = values
After this, I can print cV[i][j][k] and get all the data contained in one .dat file at the coordinates [i][j][k].
Now I need to create cV[i][j][k][n] to get the data from the specific file number n at the coordinates [i][j][k]. And I absolutely don't know how to tell Python to put the data into the "right" place.
I tried some things like this:
for m in range(160000, 182501, 2500):
    with open('output/%d.dat' % m, newline='') as csvfile:
        <bla bla code>

cV = [[[[[] for k in range(kMax)] for j in range(jMax)] for i in range(iMax)] for n in range(tMax)]

if len(row) == rowLen:
    values = [np.array([np.float64(row[i]) for i in range(3, rowLen)]) for n in range(tMax)]
    for n in range(tMax):
        cV[int(row[0])][int(row[1])][int(row[2])][int(n)] = values[n]
But this surely didn't work, because Python doesn't know what this [n] after the values is supposed to be.
So, how can I tell Python to put the [i][j][k] data from file number n into the array cV[i][j][k][n]?
Thanks in advance
C.
P.S. I didn't post the whole code because I don't think it is necessary. All arrays are created properly, but the thing that isn't working is the data in them.
I think building arrays like this is going to make things more complicated for you. It would be easier to build a dictionary using tuples as keys. In the example file you sent me, each (x, y, z) pair was repeated twice, making me think that each file contains data on two iterations of a total solution of 2000 iterations. Dictionaries must have unique keys, so for each file I have implemented another counter, timestep, that can increment when collating data from a single file.
Now, if I wanted coords (1, 2, 3) on the 3rd timestep, I could do simulation[(1, 2, 3, 3)].
import csv
import numpy as np

'''
Made the assumptions that:
- Each file contains two iterations from a simulation of 2000 iterations
- Each file is numbered sequentially. Each time the same (x, y, z) coords are
  discovered, it represents the next timestep in the simulation
Accessing data is via a tuple key (x, y, z, n), with n being the timestep
'''

simulation = {}
file_count = 1
timestep = 1
num_files = 2

for x in range(1, num_files + 1):
    with open('sim_file_{}.dat'.format(file_count), 'r') as infile:
        second_read = False
        reader = csv.reader(infile, delimiter=' ')
        for row in reader:
            item = [float(value) for value in row]
            if row:
                if (not second_read and not
                        any(simulation.get((item[0], item[1], item[2], timestep), []))):
                    timestep += 1
                    second_read = True
                simulation[(item[0], item[1], item[2], timestep)] = np.array(item[3:])
    file_count += 1
    timestep += 1
    second_read = False
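One usage caveat worth noting (my observation, not from the original answer): every row field goes through float(), so the coordinate parts of the keys are floats and a lookup must match:
# hypothetical lookup: data at grid point (1, 2, 3) on timestep 3;
# the coordinates must be given as floats to match the stored keys
values = simulation.get((1.0, 2.0, 3.0, 3))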

Python script: Removing tabs, separating columns with comma, listing max/min

I am trying to write a script in Python that removes the tabs/blank spaces between two columns (one with x coordinates, the other with y coordinates), separates the columns with a comma instead, and lists the maximum and minimum values of each column (two values each for the x and y coordinates) at the end, like this:
10000000 6000000
20000000 6100000
30000000 6200000
40000000 6300000
50000000 6400000
to appear like:
10000000,6000000
20000000,6100000
30000000,6200000
40000000,6300000
50000000,6400000
10000000 50000000 6000000 6400000
I'm a novice, so any help is very much appreciated. Many thanks!
You can use the csv module for the output; simply loop over the input file and use str.split() to split lines into rows:
import csv

minimum = [float('inf'), float('inf')]
maximum = [float('-inf'), float('-inf')]

with open(inputfilename, 'r') as infile:
    with open(outputfilename, 'w', newline='') as outfile:
        writer = csv.writer(outfile)
        for line in infile:
            row = [int(v) for v in line.split()]
            minimum = [min(m, v) for m, v in zip(minimum, row)]
            maximum = [max(m, v) for m, v in zip(maximum, row)]
            writer.writerow(row)

x_extremes, y_extremes = zip(minimum, maximum)
print(' '.join(map(str, x_extremes)), ' '.join(map(str, y_extremes)))
The float('inf') and float('-inf') starter values make it easier to calculate the minimum and maximum coordinates later on.
The last line prints the x minimum and maximum followed by the y minimum and maximum, which is the summary line from the desired output.
This would work:
li = []
with open(inputfilename, 'r') as f:
    for line in f:
        li.append(','.join(line.split()))

first_column = [int(x.split(',')[0]) for x in li]
second_column = [int(x.split(',')[1]) for x in li]

for x in li:
    print(x)
print(min(first_column), max(first_column), min(second_column), max(second_column))

nested for loop in python not working

We have a large Excel file, and what I'm trying to do is create a list that has the maximum and minimum values of each column. There are 13 columns, which is why the while loop should stop once it hits 14. The problem is that once the counter is increased, the code does not seem to iterate through the for loop even once. More explicitly, the while loop only goes through the for loop once, yet it does seem to loop, in that it increases the counter by 1 and stops at 14. It should be noted that the rows in the input file are strings of numbers, which is why I convert them to tuples and then check whether the value in the given position is greater than column_max or smaller than column_min; if so, I reassign column_max or column_min. Once this is completed, column_max and column_min are appended to a list (l) and the counter (position) is increased to repeat for the next column. Any help will be appreciated.
input_file = open('names.csv', 'r')
l = []
column_max = 0
column_min = 0
counter = 0
while counter < 14:
    for row in input_file:
        row = row.strip()
        row = row.split(',')
        row = tuple(row)
        if (float(row[counter])) > column_max:
            column_max = float(row[counter])
        elif (float(row[counter])) < column_min:
            column_min = float(row[counter])
        else:
            column_min = column_min
            column_max = column_max
    l.append((column_max, column_min))
    counter = counter + 1
I think you want to switch the order of your for and while loops.
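A minimal sketch of that reordering (my illustration, not the poster's code): read the file once in the outer loop and walk the 13 columns in the inner loop.
maxima = [float('-inf')] * 13
minima = [float('inf')] * 13
with open('names.csv') as input_file:
    for row in input_file:
        values = [float(v) for v in row.strip().split(',')]
        for i in range(13):
            maxima[i] = max(maxima[i], values[i])
            minima[i] = min(minima[i], values[i])
l = list(zip(maxima, minima))  # (max, min) pairs, as in the original l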
Note that there is a slightly better way to do this:
with open('yourfile') as infile:
    # read first row; set column min and max to the values in the first row
    data = [float(x) for x in infile.readline().split(',')]
    column_maxs = data[:]
    column_mins = data[:]

    # read subsequent rows, updating the min/max
    for line in infile:
        data = [float(x) for x in line.split(',')]
        for i, d in enumerate(data):
            column_maxs[i] = max(d, column_maxs[i])
            column_mins[i] = min(d, column_mins[i])
If you have enough memory to hold the file in memory at once, this becomes even easier:
with open('yourfile') as infile:
    data = [[float(x) for x in line.split(',')] for line in infile]

data_transpose = list(zip(*data))  # list() so it can be iterated twice in Python 3
col_mins = [min(x) for x in data_transpose]
col_maxs = [max(x) for x in data_transpose]
Once you have consumed the file, it stays consumed; iterating over it again won't produce anything.
>>> for row in input_file:
...     print(row)
1,2,3,...
4,5,6,...
etc.
>>> for row in input_file:
...     print(row)
>>> # Nothing gets printed; the file is consumed
That is the reason why your code is not working.
You then have three main approaches:
Read the file each time, rewinding between passes (inefficient in I/O operations; see the sketch after this list);
Load it into a list (inefficient for large files, as it stores the whole file in memory);
Rework the logic to operate line by line (quite feasible and efficient, though not as brief as loading it all into a two-dimensional structure, transposing it, and using min and max).
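For the first approach, the rewind is just a seek(0) on the file object; a tiny sketch (my illustration, not from the original answer):
with open('names.csv') as input_file:
    for counter in range(13):
        input_file.seek(0)  # rewind so the file can be iterated again
        column = [float(row.split(',')[counter]) for row in input_file]
        # ... take max(column) and min(column) here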
Here is my technique for the third approach:
maxima = [float('-inf')] * 13
minima = [float('inf')] * 13

with open('names.csv') as input_file:
    for row in input_file:
        for col, value in enumerate(row.split(',')):
            value = float(value)
            maxima[col] = max(maxima[col], value)
            minima[col] = min(minima[col], value)

# This gets the value you called ``l``
combined_max_and_min = list(zip(maxima, minima))
