Creating a variable for each line in a txt file

I am building a program that will eventually be able to construct finite automata and am in the early stages of reading in the txt file to set up my variables (states, alphabet, starting position, final positions, and rules, respectively). Here is the code I have thus far:
def read_dfa(dfa_filename):
    dfa = open(dfa_filename, 'r')
    count = 0
    for line in dfa:
        count += 1
        states, alphabet, initial, final, rules = (line.format(count, line.strip()))
The issue I am currently facing is that lines 1 to 4 always provide the data I need, but the rules could be just line 5 or could span multiple lines (5 to n). How do I store the right lines in each variable?
This is the error I am faced with: 'ValueError: Too many values to unpack (expected 5)'
This is an example of what dfa_filename could be:
q1,q2 #states
a,b #alphabet
q1 #initial
q2 #final
q1,a,q2 #rules
q1,b,q1 #rules
q2,a,q2 #rules
q2,b,q2 #rules

You could use splitlines and assign the first 4 lines to states, alphabet, initial, and final. Then, assign the remaining lines to rules.
with open('textfile.txt') as f:
    data = f.read()
lines = data.splitlines()
states, alphabet, initial, final = lines[0:4]
rules = lines[4:]

If I understood correctly, the first 4 lines are always in this order and are followed by a variable number of rules?
If so, try unpacking with the * operator. It collects any remaining values into one list:
a, *b = 1, 2, 3
results in a=1 and b=[2, 3]
Note that the unpacking has to be applied to all lines of the file at once, not to a single line inside the loop. So it should be something like
states, alphabet, initial, final, *rules = (line.strip() for line in dfa)
Stay safe
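A minimal sketch of the starred-unpacking idea applied to the whole file, in case it helps. It assumes the layout shown in the question (four header lines, then one rule per line) and that everything after a `#` is a comment. The helper name `parse_dfa` and the dictionary it returns are illustrative choices, not part of the original code.

```python
def parse_dfa(text):
    """Parse DFA text: 4 header lines followed by one or more rule lines."""
    # Drop trailing '#...' comments and surrounding whitespace; skip blank lines.
    lines = [line.split('#', 1)[0].strip() for line in text.splitlines()]
    lines = [line for line in lines if line]

    # The first four lines are fixed; *rules collects whatever remains.
    states, alphabet, initial, final, *rules = lines

    return {
        "states": states.split(','),
        "alphabet": alphabet.split(','),
        "initial": initial,
        "final": final.split(','),
        # Each rule becomes (current_state, symbol, next_state).
        "rules": [tuple(rule.split(',')) for rule in rules],
    }


sample = """q1,q2 #states
a,b #alphabet
q1 #initial
q2 #final
q1,a,q2 #rules
q1,b,q1 #rules
q2,a,q2 #rules
q2,b,q2 #rules
"""
dfa = parse_dfa(sample)
print(dfa["states"], dfa["rules"][0])
```

To read from disk instead, the text could come from `open(dfa_filename).read()` inside a `with` block.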

You can use list to read the file into a list of lines and then get the needed lines by index. Note that each line keeps its trailing newline.
file_lines = list(open('textfile.txt'))
states, alphabet, initial, final = file_lines[0:4]
rules = file_lines[4:]

Related

opening txt file and creating list and getting basic stats

I am trying to open a textfile called state_meet.txt file; the info is formatted as
gymnastics_school,participant_name,all-around_points_earned
see example:
Lanier City Gymnastics,Ben W.,55.301
Lanier City Gymnastics,Alex W.,54.801
Lanier City Gymnastics,Sky T.,51.2
Lanier City Gymnastics,William G.,47.3 etc..
and create functions to get info such as:
The total count of gymnasts that participated in the state meet.
The first place score.
The last place score.
The score differential between the first and last place.
The average score for all gymnasts.
The median score. (The median is the grade at the mid-point of a sorted list. If there is an even number of elements in the list, the median is the average of the 2 middle elements.)
The average of all scores above the median (not including the median).
The average of all scores below the median (not including the median).
The output should look as such
Summary of Data:
Number of gymnasts: 103
First place score: 143.94
Here's the code I have so far:
with open('state_meet.txt','r') as f:
    for line in f:
        allt = []
        values = line.split()
        print(values[3])
#first
max_val = max(values[3])
int(max_val)
print(max_val)
#last
min_val = min(values[3])
int(min_val)
print(min_val)
#Mean
total = sum(input_list)
length = len(input_list)
for nums in [input_list]:
    mean_val = total / length
    float(mean_val)
#Median
sorted(input_list)
med_val = sorted(lst)
lstLen = len(lst)
index = (lstLen - 1) // 2
This is what I have so far, but my code reads the value as W.,55.301 instead of 55.301 and gives me errors.

You have a comma-separated values (csv) file. Use the csv module.
import csv
data = []
with open("state_meet.txt") as f:
    reader = csv.DictReader(f, fieldnames=["school", "participant", "score"])
    for row in reader:
        row["score"] = float(row["score"])  # csv yields strings; convert once
        data.append(row)
# first place
record = max(data, key=lambda d: d["score"])
best_score = record["score"]
# last place
record = min(data, key=lambda d: d["score"])
worst_score = record["score"]
# Mean score
mean = sum(d["score"] for d in data) / len(data)
# Median score (lower middle element; for an even count, average the two middle ones)
median = sorted(d["score"] for d in data)[(len(data) - 1) // 2]
csv.DictReader reads the lines of your csv file and automatically converts each one to a dictionary, keyed by whatever you like. This is perhaps easier to read than the collections.namedtuple suggestion in dokelung's answer, though namedtuple is equally valid. The key here is that we can keep the entire record around instead of throwing away everything but the score.
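For the remaining statistics the question asks for (differential, a true median, and the averages above and below it), a sketch along these lines might serve as a starting point. It leans on the standard `statistics` module. The function name `summarize`, the dictionary keys, and the four sample rows fed through `io.StringIO` are assumptions made for illustration; with a real file you would pass the open file object instead.

```python
import csv
import io
import statistics


def summarize(lines):
    """Return summary statistics for rows of school,participant,score."""
    # The score is the third comma-separated field; skip empty rows.
    scores = sorted(float(row[2]) for row in csv.reader(lines) if row)
    median = statistics.median(scores)  # averages the two middle values if needed
    above = [s for s in scores if s > median]  # strictly above the median
    below = [s for s in scores if s < median]  # strictly below the median
    return {
        "count": len(scores),
        "first": scores[-1],
        "last": scores[0],
        "differential": scores[-1] - scores[0],
        "mean": statistics.mean(scores),
        "median": median,
        "mean_above": statistics.mean(above) if above else None,
        "mean_below": statistics.mean(below) if below else None,
    }


# Sample rows copied from the question; a real run would use open('state_meet.txt').
sample = io.StringIO(
    "Lanier City Gymnastics,Ben W.,55.301\n"
    "Lanier City Gymnastics,Alex W.,54.801\n"
    "Lanier City Gymnastics,Sky T.,51.2\n"
    "Lanier City Gymnastics,William G.,47.3\n"
)
summary = summarize(sample)
print("Number of gymnasts:", summary["count"])
print("First place score:", summary["first"])
```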

You should use split(',') instead of split().
Use values[2] to get the third item of the list values.
The list allt seems to be unused.
No matter how many lines there are in state_meet.txt, values only ever holds the data from the last line once the loop has finished.
I guess this is what you want to do:
import collections
names = ["gymnastics_school", "participant_name", "all_around_points_earned"]
Data = collections.namedtuple("Data", names)
data = []
with open('state_meet.txt','r') as f:
    for line in f:
        line = line.strip()
        items = line.split(',')
        items[2] = float(items[2])
        data.append(Data(*items))
# max value
max_one = max(data, key=lambda d:d.all_around_points_earned)
print(max_one.all_around_points_earned)
# min value
min_one = min(data, key=lambda d:d.all_around_points_earned)
print(min_one.all_around_points_earned)
# mean value
total = sum(d.all_around_points_earned for d in data)
mean_val = total/len(data)
print(mean_val)
# median value
sorted_data = sorted(data, key=lambda d:d.all_around_points_earned)
if len(data)%2==0:
    a = sorted_data[len(data)//2].all_around_points_earned
    b = sorted_data[len(data)//2-1].all_around_points_earned
    median_val = (a+b)/2
else:
    median_val = sorted_data[(len(data)-1)//2].all_around_points_earned
print(median_val)
Let me explain more:
First, I define a namedtuple type called "Data" with the item names (gymnastics_school...). Then I can use d = Data('school', 'name', '50.0') to create a namedtuple d. We can easily fetch the item values by using . to access attributes.
>>> names = ["gymnastics_school", "participant_name", "all_around_points_earned"]
>>> Data = collections.namedtuple("Data", names)
>>> d = Data('school', 'name', '50.0')
>>> d.gymnastics_school
'school'
>>> d.participant_name
'name'
>>> d.all_around_points_earned
'50.0'
Next, when we iterate over the lines of the file object, the string method strip removes surrounding whitespace and the trailing newline. Then split(',') splits the line on the specified delimiter ,.
Here, we convert the third item of the split list items with float, because it is a number but is stored as a string in the file. Finally, we use the namedtuple Data to create a record and append it to the list data.
Next, the builtin functions max and min help us find the max/min item. Because each item in data is a namedtuple, we pass a lambda function as key that fetches the points, so that the points are used to pick the max/min one.
Also, the function sum lets us compute the total without an explicit loop. We have to extract the points first, so we pass the generator expression d.all_around_points_earned for d in data to sum.
I get the median value by sorting data and then taking the middle item. When the number of items is odd, we just pick the center one. If it is even, we pick the middle two and compute their mean.
Hope my answer can help you!

split() by default splits on whitespace, did you mean
values = line.split(',')
to split on commas?
https://docs.python.org/2/library/stdtypes.html#str.split

Arranging distinct number of floats in an 2d array

First of all I am quite a newbie on python, so please forgive me if I don't see the wood for the trees. My question is on reading a huge file of float numbers and storing them in an array for fast mathematical postprocessing.
Let's assume the file looks similar to this:
!!
-3.2297390 0.4474691 3.5690145 3.5976372 6.9002712 7.7787466 14.2159269 14.3291490
16.7660723 17.1258704 18.9469059 19.1716808 20.0700721 21.4088414
-3.2045361 0.4123081 3.5625981 3.5936954 6.8901539 7.7543415 14.2764611 14.3623976
16.7955934 17.1560337 18.9527369 19.1251184 20.0700709 21.3515145
-3.2317597 0.4494166 3.5799182 3.6005429 6.8838705 7.7661897 14.2576455 14.3295731
16.7550357 17.0986678 19.0187779 19.1687722 20.0288587 21.3818250
-3.1921346 0.3949598 3.5636878 3.5892085 6.8833690 7.7404542 14.3061281 14.3855389
16.8063645 17.1697110 18.9549920 19.1134580 20.0613223 21.3196066
Here there are 4 (nb) blocks of 14 (nk) float numbers each. I want them arranged in an array elements[nb][nk] so that I can easily loop over certain floats of the blocks.
Here is what I thought it should look like, but it doesn't work at all:
nb=4
nk=14
with open("datafile") as file:
    elements = []
    n = 0
    while '!!' not in file:
        while n <= (nb-1):
            elements.append([])
            current = map(float,file.read().split()) # here I would need something to assure only 14 (nk) floats are read in
            elements[n].append(current)
            n += 1
print(elements[0][1])
It would be great if had some ideas and suggestions. Thanks!
EDIT:
Here is a datafile where the numbers follow one another with no clear separator after a block nb. Here nb=2 and nk=160. How do I split the floats that are read in after every 160th number?
!!
-7.2578105433 -7.2578105433 -6.7774609392 -6.7774609392 -6.3343986693 -6.3343986693 -5.8537216826 -5.8537216826
-5.6031029888 -5.6031029888 -2.9103190893 -2.9103190893 -1.7962279174 -1.7962279174 -0.8136720023 -0.8136720023
-0.1418500769 -0.1418500769 2.9923464558 2.9923464558 3.5797768050 3.5797768050 3.8793240270 3.8793240270
4.0774192689 4.0774192689 4.2378755781 4.2378755781 4.2707165126 4.2707165126 4.3290523910 4.3290523910
4.4487102661 4.4487102661 4.5341883539 4.5341883539 4.7946098470 4.7946098470 4.9518205998 4.9518205998
4.9592549825 4.9592549825 5.1648268937 5.1648268937 5.2372127454 5.2372127454 5.9377062691 5.9377062691
6.2971992823 6.2971992823 6.6324702419 6.6324702419 6.7948808733 6.7948808733 7.0835270703 7.0835270703
7.6252686579 7.6252686579 7.7886279100 7.7886279100 7.8514022664 7.8514022664 7.9188180854 7.9188180854
7.9661386138 7.9661386138 8.2830991934 8.2830991934 8.4581462733 8.4581462733 8.5537201519 8.5537201519
10.2738010533 10.2738010533 11.4495306517 11.4495306517 11.4819579346 11.4819579346 11.5788238984 11.5788238984
11.9411469341 11.9411469341 12.5006172267 12.5006172267 12.5055546075 12.5055546075 12.6659410418 12.6659410418
12.8741094000 12.8741094000 12.9560279595 12.9560279595 12.9780521671 12.9780521671 13.2195973082 13.2195973082
13.2339969658 13.2339969658 13.3594047155 13.3594047155 13.4530024795 13.4530024795 13.4556342387 13.4556342387
13.5784994631 13.5784994631 14.6887369915 14.6887369915 14.9019726334 14.9019726334 15.1279383300 15.1279383300
15.1953349879 15.1953349879 15.3209538297 15.3209538297 15.4042612992 15.4042612992 15.4528348692 15.4528348692
15.4542742538 15.4542742538 15.5291462589 15.5291462589 15.5415591416 15.5415591416 16.0741610117 16.0741610117
16.1117432607 16.1117432607 16.3566675522 16.3566675522 17.7569123657 17.7569123657 18.4416346230 18.4416346230
18.9525843134 18.9525843134 19.0591624486 19.0591624486 19.1069867477 19.1069867477 19.1853525353 19.1853525353
19.4020021909 19.4020021909 19.4718240723 19.4718240723 19.6384650104 19.6384650104 19.6919638323 19.6919638323
19.7044699790 19.7044699790 19.8851141335 19.8851141335 20.6132283388 20.6132283388 21.4074471478 21.4074471478
-7.2568288331 -7.2568280628 -6.7765483088 -6.7765429702 -6.3336003082 -6.3334841531 -5.8529872639 -5.8528369047
-5.6024822566 -5.6024743589 -2.9101060346 -2.9100930470 -1.7964872791 -1.7959333994 -0.8153333579 -0.8144924713
-0.1440078470 -0.1421444935 2.9869228390 2.9935342026 3.5661875018 3.5733148387 3.8777649741 3.8828300867
4.0569348321 4.0745074351 4.2152251981 4.2276050415 4.2620483420 4.2649182323 4.3401804124 4.3402590222
4.4446178512 4.4509411587 4.5139270348 4.5526439516 4.7788285567 4.7810706248 4.9282976775 4.9397807768
4.9737752749 4.9900180286 5.1456209436 5.1507667583 5.2528363215 5.2835144984 5.9252188817 5.9670441193
6.2699491148 6.3270140700 6.5912060019 6.6576016532 6.7976670773 6.7982056614 7.0789050974 7.1023337244
7.6182108739 7.6309688587 7.7678148773 7.7874194913 7.8544608005 7.8594983757 7.9019395451 7.9100447766
7.9872550937 7.9902791771 8.2617740182 8.3147140843 8.4533756827 8.4672364683 8.5556163680 8.5558640539
10.2756173692 10.2760227976 11.4344757209 11.4355375519 11.4737803653 11.4760186102 11.5914333288 11.5953932241
11.9369518613 11.9380900159 12.4973099542 12.5002401499 12.5030167542 12.5031963862 12.6629548222 12.6634150863
12.8719844312 12.8728126622 12.9541436501 12.9568445777 12.9762780998 12.9764840239 13.2074024551 13.2108294169
13.2279146175 13.2308902307 13.3780648962 13.3839050348 13.4634576072 13.4650575047 13.4701414823 13.4718238883
13.5901622459 13.5971076111 14.6735704782 14.6840793519 14.8963924604 14.8968395615 15.1163287408 15.1219631271
15.1791724308 15.1817299995 15.2628531102 15.3027136606 15.3755066968 15.3802521520 15.3969012144 15.4139294088
15.5131322524 15.5315039463 15.5465532500 15.5629105034 15.5927166831 15.5966393750 16.0841067052 16.0883417123
16.1224821534 16.1226510159 16.3646268213 16.3665839987 17.7654543366 17.7657216551 18.4305335335 18.4342292730
18.9110142692 18.9215889808 18.9821593138 18.9838270736 19.1633959849 19.1637558341 19.2040877093 19.2056062802
19.3760597529 19.3846323861 19.4323552578 19.4329488797 19.6494790293 19.6813374885 19.6943820824 19.7202356536
19.7381237231 19.7414645409 19.9056461663 19.9197428869 20.6239183178 20.6285756411 21.4127637743 21.4128909767

This should work:
elements = []
with open("datafile") as file:
    next(file)
    for line in file:
        elements.append([float(x) for x in line.split()])
next(file) skips the first line (the !! marker). Then for line in file: iterates over all other lines. The list comprehension [float(x) for x in line.split()] goes through all entries in the line split by whitespace. Finally, elements.append() appends this list to elements, which becomes a list of lists that you can call a 2D array. This assumes that each block of nk values sits on a single line of the file.
Access the first entry in the first line:
>>> elements[0][0]
-3.229739
or the last entry in the last line:
>>> elements[3][13]
21.3196066
alternatively:
>>> elements[-1][-1]
21.3196066
Update
This reads the file into a list of lists without taking line breaks as special:
nb = 2
nk = 160
with open("datafile") as fobj:
    all_values = iter(fobj.read().split())
next(all_values)  # skip the leading '!!' marker
elements = []
for x in range(nb):
    elements.append([float(next(all_values)) for counter in range(nk)])
If you like nested list comprehensions:
with open("datafile") as fobj:
    all_values = iter(fobj.read().split())
next(all_values)  # skip the leading '!!' marker
elements = [[float(next(all_values)) for counter in range(nk)] for x in range(nb)]
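An alternative sketch that derives the number of blocks from the data instead of hard-coding nb, using plain list slicing. The function name `read_blocks` and the small inline sample (with nk=6) are made up for illustration; the only assumptions about the format are that values are whitespace-separated and that a leading `!!` token, if present, is not data.

```python
def read_blocks(text, nk):
    """Split whitespace-separated floats into consecutive blocks of nk values."""
    tokens = text.split()
    # Assumption: an optional leading '!!' marker is a header, not a number.
    if tokens and tokens[0] == '!!':
        tokens = tokens[1:]
    values = [float(tok) for tok in tokens]
    if len(values) % nk != 0:
        raise ValueError("number of values is not a multiple of nk")
    # Slice the flat list every nk items; line breaks play no role here.
    return [values[i:i + nk] for i in range(0, len(values), nk)]


sample = """!!
1.0 2.0 3.0 4.0
5.0 6.0
7.0 8.0 9.0 10.0
11.0 12.0
"""
elements = read_blocks(sample, nk=6)
print(len(elements), elements[1][0])
```

With a real file, the text would come from `fobj.read()` as in the answer above.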

read multi-line list from file

I have a file with data like:
POTENTIAL
TYPE 1
-5.19998150116627E+07 -5.09571848744513E+07 -4.99354600752570E+07 -4.89342214499422E+07 -4.79530582388520E+07
-4.69915679183017E+07 -4.60493560354389E+07 -4.51260360464197E+07 -4.42212291578282E+07 -4.33345641712756E+07
-4.24656773311163E+07 -4.16142121752159E+07 -4.07798193887125E+07 -3.99621566607090E+07 -3.91608885438409E+07
-3.83756863166569E+07
-8.99995987594328E+07 -8.81884626368405E+07 -8.64137733336537E+07 -8.46747974037847E+07 -8.29708161608188E+07
-8.13011253809965E+07 -7.96650350121689E+07 -7.80618688886128E+07 -7.64909644515842E+07 -7.49516724754953E+07
-7.34433567996002E+07 -7.19653940650832E+07 -7.05171734574350E+07 -6.90980964540154E+07 -6.77075765766936E+07
-6.63450391494693E+07
Note, as per Nsh's comment, that each list is not on a single line. There are always 5 values per line and, in this example, 4 rows, with only one value in the 4th row. So I have 16 floats spread over 4 lines. I always know the total number (16 in this case).
My aim is to read them as a list (please let me know if there is a better approach). The row with the single entry denotes the end of a list (e.g. list[1] ends with -3.83756863166569E+07).
I tried to read it as:
if line.startswith("POTENTIAL"):
    lines = f.readline()
    if lines.startswith("TYPE "):
        lines=f.readline()
        lines=lines.split()
        lines = [float(i) for i in lines]
        pots.append(lines)
        print(pots)
which gives result:
[[-51999815.0116627, -50957184.8744513, -49935460.075257, -48934221.4499422, -47953058.238852]]
i.e. just the first line from the list, and not going any further.
My aim is to get them as different list (possibly) as:
pots[1]=[-5.19998150116627E+07....-3.83756863166569E+07]
pots[2]=[-8.99995987594328E+07....-6.63450391494693E+07]
I have searched Google extensively (the present state itself is from another SO question), but due to my inexperience, I can't solve my problem.
Kindly help.

Use + instead of append.
It adds the elements of lines to pots instead of nesting the whole list inside it.
pots = pots + lines
I didn't see this at the start:
pots = []
It is needed in this case...

ITEMS_PER_LIST = 16
lists = [[]] # list of lists with initialized first sublist
with open('data.txt') as f:
    for line in f:
        if line.startswith(("POTENTIAL", "TYPE")):
            continue
        if len(lists[-1]) == ITEMS_PER_LIST:
            lists.append([]) # create new list
        lists[-1].extend([float(i) for i in line.split()])
Additional tweaks are required to validate headers.
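One possible shape for those header checks, offered as a sketch rather than a finished parser. It assumes data only counts after a line starting with POTENTIAL, that TYPE lines are skipped, and that every list ends on a line boundary as in the question. The function name `read_potentials` and the shortened sample (3 items per list) are hypothetical.

```python
def read_potentials(lines, items_per_list=16):
    """Collect floats that follow a POTENTIAL header into fixed-size lists."""
    lists = []
    in_section = False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("POTENTIAL"):
            in_section = True
            continue
        # Ignore anything before the header, blank lines, and TYPE lines.
        if not in_section or not stripped or stripped.startswith("TYPE"):
            continue
        # Start a new sublist when there is none yet or the last one is full.
        if not lists or len(lists[-1]) == items_per_list:
            lists.append([])
        lists[-1].extend(float(tok) for tok in stripped.split())
    if not in_section:
        raise ValueError("no POTENTIAL header found")
    if lists and len(lists[-1]) != items_per_list:
        raise ValueError("last list is incomplete")
    return lists


sample = [
    "POTENTIAL\n",
    "TYPE 1\n",
    "-1.0E+01 -2.0E+01\n",
    "-3.0E+01\n",
    "4.0 5.0\n",
    "6.0\n",
]
pots = read_potentials(sample, items_per_list=3)
print(pots)
```

Passing an open file object instead of the sample list should work the same way, since both yield lines.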

Extraction and processing the data from txt file

I am a beginner in Python (and in programming). I have a large file containing a repeating pattern: 3 lines with numbers, 1 empty line, and so on.
If I print the file it looks like:
1.93202838
1.81608154
1.50676177
2.35787777
1.51866227
1.19643624
...
I want to take each group of three numbers as one vector, do some math operations with them, write them to a new file, and then move on to the next three lines, i.e. the next vector. Here is my code (it doesn't work):
import math
inF = open("data.txt", "r+")
outF = open("blabla.txt", "w")
a = []
fin = []
b = []
for line in inF:
    a.append(line)
    if line.startswith(" \n"):
        fin.append(b)
        h1 = float(fin[0])
        k2 = float(fin[1])
        l3 = float(fin[2])
        h = h1/(math.sqrt(h1*h1+k1*k1+l1*l1)+1)
        k = k1/(math.sqrt(h1*h1+k1*k1+l1*l1)+1)
        l = l1/(math.sqrt(h1*h1+k1*k1+l1*l1)+1)
        vector = [str(h), str(k), str(l)]
        outF.write('\n'.join(vector)
        b = a
        a = []
inF.close()
outF.close()
print "done!"
I want to get "vector" from each 3 lines in my file and put it into blabla.txt output file. Thanks a lot!

My 'code comment' answer:
Take care to close all parentheses so that they match the opened ones! (Otherwise this is very likely to raise a SyntaxError ;-) )
fin is created as an empty list and never receives any numbers. Calling float(fin[n]) is therefore very likely to break with an IndexError or a TypeError;
k2 and l3 are created but never used;
k1 and l1 are not created but used, this is very likely to break with a NameError;
b is created as a copy of a, so is a list. But you do a fin.append(b): what do you expect in this case by appending (not extending) a list?
Hope this helps!
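Putting those comments together, one way the loop could be restructured is sketched below. It keeps the formula from the question (each component divided by the vector length plus 1) and assumes every vector is exactly three non-blank lines, with blank or whitespace-only lines in between. The generator name `normalize_groups` and the sample values are illustrative only.

```python
import math


def normalize_groups(lines):
    """Yield (h, k, l) for each group of three numbers in the input lines."""
    group = []
    for line in lines:
        text = line.strip()
        if text:  # blank separator lines are simply skipped
            group.append(float(text))
        if len(group) == 3:
            h1, k1, l1 = group
            # Same formula as in the question: divide by (vector length + 1).
            denom = math.sqrt(h1 * h1 + k1 * k1 + l1 * l1) + 1
            yield (h1 / denom, k1 / denom, l1 / denom)
            group = []


sample = ["3.0\n", "4.0\n", "0.0\n", " \n", "0.0\n", "0.0\n", "2.0\n"]
vectors = list(normalize_groups(sample))
for vec in vectors:
    print('\n'.join(str(x) for x in vec))
    print()
```

To write to a file, the two print calls could be replaced by `outF.write(...)` inside a `with open("blabla.txt", "w") as outF:` block, and `sample` by the open input file.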

This is only in the answers section for length and formatting.
Input and output.
Control flow
I know nothing of vectors, you might want to look into the Math module or NumPy.
Those links should hopefully give you all the information you need to at least get started with this problem, as yuvi said, the code won't be written for you but you can come back when you have something that isn't working as you expected or you don't fully understand.

Append Rows of Different Lengths to the Same Variable

I am trying to append a lengthy list of rows to the same variable. It works great for the first thousand or so iterations in the loop (all of which have the same lengths), but then, near the end of the file, the rows get a bit shorter, and while I still want to append them, I am not sure how to handle it.
The script gives me an out of range error, as expected.
Here is what the part of code in question looks like:
ii = 0
NNCat = []
NNCatelogue = []
while ii <= len(lines):
    NNCat = (ev_id[ii], nn1[ii], nn2[ii], nn3[ii], nn4[ii], nn5[ii], nn6[ii], nn7[ii], nn8[ii], nn9[ii], nn10[ii], nn11[ii])
    NNCatelogue.append(NNCat)
    ii = ii + 1
print NNCatelogue, ii
Any help on this would be greatly appreciated!

I'll answer the question you didn't ask first ;) : how can this code be more pythonic?
Instead of
ii = 0
NNCat = []
NNCatelogue = []
while ii <= len(lines):
    NNCat = (ev_id[ii], nn1[ii], nn2[ii], nn3[ii], nn4[ii], nn5[ii], nn6[ii], nn7[ii], nn8[ii], nn9[ii], nn10[ii], nn11[ii])
    NNCatelogue.append(NNCat)
    ii = ii + 1
you should do
NNCat = []
NNCatelogue = []
for ii, line in enumerate(lines):
    NNCat = (ev_id[ii], nn1[ii], nn2[ii], nn3[ii], nn4[ii], nn5[ii], nn6[ii],
             nn7[ii], nn8[ii], nn9[ii], nn10[ii], nn11[ii])
    NNCatelogue.append(NNCat)
During each pass ii will be incremented by one for you and line will be the current line.
As for your short lines, you have two choices
Use a special value (such as None) to fill in when you don't have a real value
check the length of nn1, nn2, ..., nn11 to see if they are large enough
The second solution will be much more verbose, hard to maintain, and confusing. I strongly recommend using None (or another special value you create yourself) as a placeholder when there is no data.

def gvop(vals,indx): #get values or padding
    return vals[indx] if indx<len(vals) else None
NNCatelogue = [(gvop(ev_id,ii), gvop(nn1,ii), gvop(nn2,ii), gvop(nn3,ii), gvop(nn4,ii),
                gvop(nn5,ii), gvop(nn6,ii), gvop(nn7,ii), gvop(nn8,ii), gvop(nn9,ii),
                gvop(nn10,ii), gvop(nn11,ii)) for ii in xrange(0, len(lines))]
By defining this other function to return either the correct value or padding, you can ensure rows are the same length. You can change the padding to anything, if None is not what you want.
Then the list comp creates a list of tuples as before, except containing padding in cases where some of the lines in the input are shorter.

from itertools import izip_longest
NNCatelogue = list(izip_longest(ev_id, nn1, nn2, ... nn11, fillvalue=None))
See here for documentation of izip. Do yourself a favour and skip the list around the iterator, if you don't need it. In many cases you can use the iterator as well as the list, and you save a lot of memory. Especially if you have long lists, that you're grouping together here.
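In Python 3 the same function is called `itertools.zip_longest`. A small sketch of how it pads shorter columns follows; the three short lists are invented stand-ins for ev_id, nn1, and nn2, and the `columns` list is just a convenient way to avoid spelling out every argument.

```python
from itertools import zip_longest

# Hypothetical short columns standing in for ev_id, nn1, nn2, ...
ev_id = [101, 102, 103]
nn1 = [0.1, 0.2, 0.3]
nn2 = [1.5, 2.5]  # one value short, like the rows near the end of the file

columns = [ev_id, nn1, nn2]
# Missing entries are filled with fillvalue instead of raising IndexError.
catalogue = list(zip_longest(*columns, fillvalue=None))
print(catalogue)
```

If the rows are only iterated once, the `list(...)` call can be dropped and the iterator consumed directly.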
