I'm new to programming and Python, and I'm looking for a way to distinguish between two input formats in the same input text file. For example, let's say I have an input file like this, where values are comma-separated:
5
Washington,A,10
New York,B,20
Seattle,C,30
Boston,B,20
Atlanta,D,50
2
New York,5
Boston,10
The format is N followed by N lines of Data1, then M followed by M lines of Data2. I tried opening the file, reading it line by line, and storing it in a single list, but I'm not sure how to produce two lists for Data1 and Data2, such that I would get:
Data1 = ["Washington,A,10", "New York,B,20", "Seattle,C,30", "Boston,B,20", "Atlanta,D,50"]
Data2 = ["New York,5", "Boston,10"]
My initial idea was to iterate through the list until I found an integer i, remove the integer from the list and continue for the next i iterations all while storing the subsequent values in a separate list, until I found the next integer and then repeat. However, this would destroy my initial list. Is there a better way to separate the two data formats in different lists?
You could use itertools.islice and a list comprehension:
from itertools import islice
string = """
5
Washington,A,10
New York,B,20
Seattle,C,30
Boston,B,20
Atlanta,D,50
2
New York,5
Boston,10
"""
result = [[x for x in islice(parts, idx + 1, idx + 1 + int(line))]
          for parts in [string.split("\n")]
          for idx, line in enumerate(parts)
          if line.isdigit()]
print(result)
This yields
[['Washington,A,10', 'New York,B,20', 'Seattle,C,30', 'Boston,B,20', 'Atlanta,D,50'], ['New York,5', 'Boston,10']]
For a file, you need to change it to:
with open("testfile.txt", "r") as f:
    result = [[x for x in islice(parts, idx + 1, idx + 1 + int(line))]
              for parts in [f.read().split("\n")]
              for idx, line in enumerate(parts)
              if line.isdigit()]
print(result)
You're definitely on the right track.
If you want to preserve the original list here, you don't actually have to remove integer i; you can just go on to the next item.
Code:
originalData = []
formattedData = []
with open("data.txt", "r") as f:
    f = list(f)
originalData = f
i = 0
while i < len(f):  # Iterate through every line
    try:
        n = int(f[i])  # See if line can be cast to an integer
        originalData[i] = n  # Change string to int in original
        formattedData.append([])
        for j in range(n):
            i += 1
            item = f[i].replace('\n', '')
            originalData[i] = item  # Remove newline char in original
            formattedData[-1].append(item)
    except ValueError:
        print("File has incorrect format")
    i += 1
print(originalData)
print(formattedData)
The following code will produce a list results which is equal to [Data1, Data2].
The code assumes that each count matches the number of entries that actually follow it. For a file like this one, where the count is 2 but three entries follow, it will not work:
2
New York,5
Boston,10
Seattle,30
The code:
# get the data from the text file
with open('filename.txt', 'r') as file:
    lines = file.read().splitlines()
results = []
index = 0
while index < len(lines):
    # Find the start and end values.
    start = index + 1
    end = start + int(lines[index])
    # Everything from the start up to and excluding the end index gets added
    results.append(lines[start:end])
    # Update the index
    index = end
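If the counts in the file cannot be trusted, the same idea can be written with an iterator so that a short or malformed section is reported instead of being parsed incorrectly. This is only a minimal sketch: the function name read_sections and the decision to raise ValueError are assumptions made for illustration, not part of the answers above.

```python
from itertools import islice


def read_sections(lines):
    """Split lines of the form N, N records, M, M records, ... into lists."""
    it = iter(lines)
    sections = []
    for header in it:
        header = header.strip()
        if not header:
            continue  # skip blank lines between sections
        if not header.isdigit():
            raise ValueError(f"expected a count, got {header!r}")
        count = int(header)
        # Take the next `count` lines from the same iterator.
        records = [line.strip() for line in islice(it, count)]
        if len(records) != count:
            raise ValueError(f"expected {count} records, got {len(records)}")
        sections.append(records)
    return sections


sample = """5
Washington,A,10
New York,B,20
Seattle,C,30
Boston,B,20
Atlanta,D,50
2
New York,5
Boston,10
""".splitlines()

data1, data2 = read_sections(sample)
print(data1)
print(data2)
```

With the three-entry file shown earlier (a count of 2 followed by three lines), this version raises ValueError when it reaches the extra line instead of failing inside int().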
For starters I've programmed in C++ for the past year and a half, and this is the first time I'm using Python.
The objects have two int attributes, say i_ and j_.
The text file is as follows:
1,0
2,0
3,1
4,0
...
What I want to do is have the list filled with objects with correct attributes. For example,
print(myList[2].i_, myList[2].j_, end = ' ')
would return
3 1
Here's my attempt after reading a little online.
class myClass:
    def __init__(self, i, j):
        self.i_ = i
        self.j_ = j
with open("myFile.txt") as f:
    myList = [list(map(int, line.strip().split(','))) for line in f]
    for line in f:
        i = 0
        while (i < 28):
            myList.append(myClass(line.split(","), line.split(",")))
            i +=1
But it doesn't work obviously.
Thanks in advance!
Since you're working with a CSV file you might want to use the csv module. First you would pass the file object to the csv.reader function and it will return an iterable of rows from the file. From there you can cast it to a list and slice it to the 29 rows you are required to have. Finally, you can iterate over the rows (e.g. [1,0]) and simply unpack them in the class constructor.
import csv

class MyClass:
    def __init__(self, i, j):
        self.i = int(i)
        self.j = int(j)

    def __repr__(self):
        return f"MyClass(i={self.i}, j={self.j})"

with open('test.txt', newline='') as f:
    # csv.reader yields each row as a list of strings, e.g. ['1', '0']
    rows = list(csv.reader(f))[:29]
my_list = [MyClass(*row) for row in rows]
for obj in my_list:
    print(obj.i, obj.j)
print(len(my_list))
I'm not sure you really want to stick with this format:
print(myList[2].i_, myList[2].j_, end = ' ')
My solution is coded quite manually, and I am using a dictionary to store i and j:
result = {'i':[],
'j':[]}
and below is my code
result = {'i': [],
          'j': []}
with open('a.txt', 'r') as myfile:
    data = myfile.read().replace('\n', ',')
print(data)
a = data.split(",")
print(a)
b = [x for x in a if x]
print(b)
for i in range(0, len(b)):
    if i % 2 == 0:
        result['i'].append(b[i])
    else:
        result['j'].append(b[i])
print(result['i'])
print(result['j'])
print(str(result['i'][2]) + "," + str(result['j'][2]))
The result: 3,1
I'm not sure what you're trying to do with myList = [list(map(int, line.strip().split(','))) for line in f]. This will give you a list of lists with those pairs converted to ints. But you really want objects from those numbers. So let's do that directly as we iterate through the lines in the file and do away with the next while loop:
my_list = []
with open("myFile.txt") as f:
    for line in f:
        nums = [int(i) for i in line.strip().split(',') if i]
        if len(nums) >= 2:
            my_list.append(myClass(nums[0], nums[1]))
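The same approach can be tried without a file on disk. The sketch below is only an illustration: it uses io.StringIO as a stand-in for myFile.txt, and the Pair dataclass and load_pairs function are hypothetical names chosen here, not something from the question.

```python
import io
from dataclasses import dataclass


@dataclass
class Pair:
    """Hypothetical stand-in for the question's class with two int attributes."""
    i_: int
    j_: int


def load_pairs(f):
    """Build one Pair per non-empty 'i,j' line of an open text file."""
    pairs = []
    for line in f:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        i, j = (int(part) for part in line.split(","))
        pairs.append(Pair(i, j))
    return pairs


# io.StringIO behaves like an open text file, so no real file is needed here.
sample_file = io.StringIO("1,0\n2,0\n3,1\n4,0\n")
my_list = load_pairs(sample_file)
print(my_list[2].i_, my_list[2].j_)
```

A dataclass is used here only to avoid writing __init__ and __repr__ by hand; a regular class with the same two attributes would work the same way.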
I am currently trying to write code that evaluates powers with big exponents without calculating them, working with their logarithms instead. I have a file containing 1000 lines. Each line contains two integers separated by a comma. I got stuck at the point where I tried to remove the quotes from the array. I tried many ways, none of which worked. Here is my code:
The function split() from myLib takes two arguments: a list, and the number of elements by which to split the original list. It does so and appends the smaller lists to a new one.
import math
import myLib
i = 0
record = 0
cmpr = 0
with open("base_exp.txt", "r") as f:
    fArr = f.readlines()
fArr = myLib.split(fArr, 1)
# place to get rid of quotes
print(fArr)
while i < len(fArr):
    cmpr = int(fArr[i][1]) * math.log(int(fArr[i][0]))
    if cmpr > record:
        record = cmpr
        print(record)
    i = i + 1
This is what my array looks like:
[['519432,525806\n'], ['632382,518061\n'], ... ['172115,573985\n'], ['13846,725685\n']]
I tried to find a way around the 2d array and tried:
i = 0
record = 0
cmpr = 0
with open("base_exp.txt", "r") as f:
    fArr = f.readlines()
#fArr = myLib.split(fArr, 1)
fArr = [x.replace("'", '') for x in fArr]
print(fArr)
while i < len(fArr):
    cmpr = int(fArr[i][1]) * math.log(int(fArr[i][0]))
    if cmpr > record:
        record = cmpr
        print(i)
    i = i + 1
But output looked like this:
['519432,525806\n', '632382,518061\n', '78864,613712\n', ...
And the numbers in their current state cannot be treated as integers or floats, so this isn't working either:
[int(i) for i in lst]
The expected output for the array itself would look like this, so I can pick one of the numbers and work with it:
[[519432,525806], [632382,518061], [78864,613712]...
I would really appreciate your help since I'm still very new to Python and programming in general.
Thank you for your time.
You can avoid all of your problems by simply using numpy's convenient loadtxt function:
import numpy as np
arr = np.loadtxt('p099_base_exp.txt', delimiter=',')
arr
array([[519432., 525806.],
[632382., 518061.],
[ 78864., 613712.],
...,
[325361., 545187.],
[172115., 573985.],
[ 13846., 725685.]])
If you need a one-dimensional array:
arr.flatten()
# array([519432., 525806., 632382., ..., 573985., 13846., 725685.])
This is your missing piece:
fArr = [[int(num) for num in line.rstrip("\n").split(",")] for line in fArr]
Here, rstrip("\n") removes the trailing \n character from the line, and split(",") then splits the string on the comma, so each line becomes a list of number strings. Calling int() on each element converts those strings to integers.
The code below should do the job if you don't want to import an additional library.
import math
i = 0
record = 0
cmpr = 0
with open("base_exp.txt", "r") as f:
    fArr = f.readlines()
fArr = [[int(num) for num in line.rstrip("\n").split(",")] for line in fArr]
print(fArr)
while i < len(fArr):
    cmpr = fArr[i][1] * math.log(fArr[i][0])
    if cmpr > record:
        record = cmpr
        print(i)
    i = i + 1
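The comparison relies on the fact that, for bases greater than 1, a**b > c**d exactly when b*log(a) > d*log(c), so the huge powers never need to be computed. A minimal sketch of the same search using max() with a key function follows; the helper names parse_pairs and largest_power_index are made up for this illustration, and io.StringIO stands in for base_exp.txt.

```python
import io
import math


def parse_pairs(f):
    """Turn lines like '519432,525806' into [base, exponent] integer lists."""
    return [[int(num) for num in line.strip().split(",")]
            for line in f if line.strip()]


def largest_power_index(pairs):
    """Return the index of the pair whose base**exponent is largest.

    Compares exponent * log(base) instead of computing the power itself.
    """
    return max(range(len(pairs)),
               key=lambda k: pairs[k][1] * math.log(pairs[k][0]))


# Two lines taken from the question's data, read from an in-memory file.
sample_file = io.StringIO("519432,525806\n632382,518061\n")
pairs = parse_pairs(sample_file)
print(pairs)
print(largest_power_index(pairs))
```

The returned index is zero-based, so add 1 if a line number is needed.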
This snippet will transform your array to 1D array of integers:
from itertools import chain
arr = [['519432,525806\n'], ['632382,518061\n']]
new_arr = [int(i.strip()) for i in chain.from_iterable(i[0].split(',') for i in arr)]
print(new_arr)
Prints:
[519432, 525806, 632382, 518061]
For 2D output you can use this:
arr = [['519432,525806\n'], ['632382,518061\n']]
new_arr = [[int(i) for i in v] for v in (i[0].split(',') for i in arr)]
print(new_arr)
This prints:
[[519432, 525806], [632382, 518061]]
new_list = []
a = ['519432,525806\n', '632382,518061\n', '78864,613712\n']
for i in a:
    new_list.append(list(map(int, i.split(","))))
print(new_list)
Output:
[[519432, 525806], [632382, 518061], [78864, 613712]]
To flatten new_list, assign the result of reduce to a new name, since reduce does not modify new_list in place:
from functools import reduce
flat_list = reduce(lambda x, y: x + y, new_list)
print(flat_list)
Output:
[519432, 525806, 632382, 518061, 78864, 613712]
The code uses the matrix and arrpow functions to calculate the fibonacci numbers for the elements in my list, num. Oddly, right after a.append(float(row[0])) is completed, the error I get is
IndexError: list index out of range
Which is obviously coming from b.append.
Here's the file I want to pull from
import time
import math
import csv
import matplotlib.pyplot as plt
def arrpow(arr, n):
    yarr=arr
    if n<1:
        pass
    if n==1:
        return arr
    yarr = arrpow(arr, n//2)
    yarr = [[yarr[0][0]*yarr[0][0]+yarr[0][1]*yarr[1][0],yarr[0][0]*yarr[0][1]+yarr[0][1]*yarr[1][1]],
            [yarr[1][0]*yarr[0][0]+yarr[1][1]*yarr[1][0],yarr[1][0]*yarr[0][1]+yarr[1][1]*yarr[1][1]]]
    if n%2:
        yarr=[[yarr[0][0]*arr[0][0]+yarr[0][1]*arr[1][0],yarr[0][0]*arr[0][1]+yarr[0][1]*arr[1][1]],
              [yarr[1][0]*arr[0][0]+yarr[1][1]*arr[1][0],yarr[1][0]*arr[0][1]+yarr[1][1]*arr[1][1]]]
    return yarr
def matrix(n):
    arr= [[1,1],[1,0]]
    f=arrpow(arr,n-1)[0][0]
    return f
num = [10,100,1000,10000,100000,1000000]
with open('matrix.dat', 'w') as h:
    for i in num:
        start_time = 0
        start_time = time.time()
        run = matrix(i)
        h.write(str(math.log10(i)))
        h.write('\n')
        h.write((str(math.log10(time.time()-start_time))))
        h.write('\n')
a = []
b = []
with open('matrix.dat','r+') as csvfile:
    plots = csv.reader(csvfile, delimiter=',')
    for row in plots:
        a.append(float(row[0]))
        b.append(float(row[1]))
plt.plot(a,b,label = " ")
row = ['1.0']
So row is a list with 1 value. row[1] is trying to access the second index of a list with 1 value. That is why you are getting an error.
When you are constructing matrix.dat, you do not add a comma for the CSV reader to separate the data. So when it tries to read the file, the whole thing is converted into a 1-element array. Attempting to access the second element throws an error because it doesn't exist.
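Solution: Replace the first h.write('\n') (the one written between the two values) with h.write(',') so that each line of matrix.dat holds a comma-separated pair.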
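Another way to guarantee one comma-separated pair per line is to let csv.writer produce the file, so the writer and the reader agree on the format. The sketch below is an illustration under assumptions: the (x, y) values are made-up placeholders rather than real timings, and the file is written to a temporary directory.

```python
import csv
import os
import tempfile

# Hypothetical (x, y) pairs standing in for the log10 values in the question.
pairs = [(1.0, -4.5), (2.0, -4.2), (3.0, -3.9)]

path = os.path.join(tempfile.mkdtemp(), "matrix.dat")

# Write one "x,y" row per pair.
with open(path, "w", newline="") as h:
    writer = csv.writer(h)
    for x, y in pairs:
        writer.writerow([x, y])

# Read the rows back; each row now has two fields, so row[1] exists.
a, b = [], []
with open(path, newline="") as csvfile:
    for row in csv.reader(csvfile):
        a.append(float(row[0]))
        b.append(float(row[1]))

print(a)
print(b)
```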
I want to do the NumPy equivalent of appending elements to a Python list one at a time in a loop, as in the following code:
matrix = open('workfile', 'w')
A = []
for row in matrix:
    A.append(row)
print A
I have tried the following:
matrix = open('workfile', 'w')
A = np.array([])
for row in matrix:
    A = numpy.append(row)
print A
It does not return the desired output, as in the list.
Edit: this is the sample code:
mat = scipy.io.loadmat('file.mat')
var1 = mat['data1']
A = np.array([])
for row in var1:
    np.append(A, row)
print A
This is just the simplest case of what I want to do, but there is more data processing in the loop, I am putting it this way so the example is clear.
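You need to pass the array, A, to numpy.append as its first argument and assign the result back to A, because append returns a new array rather than modifying A in place. The file also has to be opened for reading ('r'), not writing ('w'), before you can iterate over its rows.
import numpy as np
matrix = open('workfile', 'r')
A = np.array([])
for row in matrix:
    A = np.append(A, row)
print(A)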
However, loading from the files directly is probably a nicer solution.
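As a rough sketch of what loading directly could look like, the example below assumes the file holds whitespace-separated numbers, one row per line. That layout is an assumption, since the question does not show the contents of workfile, and io.StringIO is used in place of a real file.

```python
import io

import numpy as np

# Hypothetical file contents: whitespace-separated numbers, one row per line.
workfile = io.StringIO("1 2 3\n4 5 6\n")

# Option 1: let NumPy parse the whole file at once.
A = np.loadtxt(workfile)

# Option 2: collect rows in a plain list and convert once at the end,
# which avoids reallocating the array on every np.append call.
workfile.seek(0)
rows = []
for line in workfile:
    rows.append([float(x) for x in line.split()])
B = np.array(rows)

print(A)
print(B)
```

The second option keeps room for per-row processing inside the loop while still building the array only once.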