Using text file data, classification and make other text file in python - python

using python, i want to seperated some data file.
file form is text file and there are no tabs only one space between inside data.
here is example file,
//test.txt
Class name age room fund.
13 A 25 B101 300
12 B 21 B102 200
9 C 22 B103 200
13 D 25 B102 100
20 E 23 B105 100
13 F 25 B103 300
11 G 25 B104 100
13 H 22 B101 300
I want to take only line containing specific data,
class : 13 , fund 300
,and save another text file.
if this code was worked, making text file is that
//new_test.txt
Class name age room fund.
13 A 25 B101 300
13 F 25 B103 300
13 H 22 B101 300
thanks.
Hk

This should do.
with open('new_test.txt','w') as new_file:
with open('test.txt') as file:
print(file.readline(),end='',file=new_file)
for line in file:
arr=line.strip().split()
if arr[0]=='13' and arr[-1]=='300':
print(line,end='',file=new_file)
However, you should include your code when asking a question. It ensures that the purpose of this site is served.

If you want to filter your data:
def filter_data(src_file, dest_file, filters):
data = []
with open(src_file) as read_file:
header = [h.lower().strip('.') for h in read_file.readline().split()]
for line in read_file:
values = line.split()
row = dict(zip(header, values))
data.append(row)
for k, v in filters.items():
if data and row.get(k, None) != v:
data.pop()
break
with open(dest_file, 'w') as write_file:
write_file.write(' '.join(header) + '\n')
for row in data:
write_file.write(' '.join(row.values()) + '\n')
my_filters = {
"class": "13",
"fund": "300"
}
filter_data(src_file='test.txt', dest_file='new_test.txt', filters=my_filters)

Related

Clean from .txt, write new variable as a csv delimited string (not list) into a csv file

my .txt file looks like the following:
Page 1 of 49
<="">
View Full Profile
S.S. Anne
Oil Tanker
42 miles offshore
Anchor length is 50 feet
<="">
View Full Profile
S.S. Minnow
Passenger Ship
1502.2 miles offshore
Anchor length is 12 feet
<="">
View Full Profile
S.S. Virginia
Passenger Ship
2 km offshore
Anchor length is 25 feet
<="">
View Full Profile
S.S. Chesapeake
Naval Ship
10 miles offshore
Anchor length is 75 feet
<="">
I've worked out the cleaning part so that following 'View Full Profile' I take the next 4 line items and put them into their own new line item, I do this for each instance of 'View Full Profile'.
Code:
import csv
data = []
with open('ship.txt','r',encoding='utf8') as f:
lines = f.readlines()
for i, line in enumerate(lines):
if 'View Full Profile' in line:
x = [lines[i+1],lines[i+2],lines[i+3],lines[i+4]]
data.append(x)
for line in data:
y = line
print(y)
with open('ship_test.csv', 'w') as csv_file:
writer = csv.writer(csv_file, delimiter=',')
for line in data:
writer.writerow(line)
And the output of printing y to see what will be written into the new file is:
['S.S. Anne\n', 'Oil Tanker\n', '42 miles offshore\n', 'Anchor length is 50 feet\n']
['S.S. Minnow\n', 'Passenger Ship\n', '1502.2 miles offshore\n', 'Anchor length is 12 feet\n']
['S.S. Virginia\n', 'Passenger Ship\n', '2 km offshore\n', 'Anchor length is 25 feet\n']
['S.S. Chesapeake\n', 'Naval Ship\n', '10 miles offshore\n', 'Anchor length is 75 feet\n']
[Finished in 0.1s]
When it copies into the new file I get the following:
"S.S. Anne
","Oil Tanker
","42 miles offshore
","Anchor length is 50 feet
"
"S.S. Minnow
","Passenger Ship
","1502.2 miles offshore
","Anchor length is 12 feet
"
"S.S. Virginia
","Passenger Ship
","2 km offshore
","Anchor length is 25 feet
"
"S.S. Chesapeake
","Naval Ship
","10 miles offshore
","Anchor length is 75 feet
"
I am looking to write into the new file the output of y. I believe it writes each item as its own line due to the '/n' attached to each element of x. How do I remove this (I tried splitting and received an error) so that x is written as a single line item in the new file as a csv string?
line = line.replace('\n', '')
Use the replace method
If there is a list of strings:
line = [l.replace('\n', '') for l in line]
You should strip the terminating newline as soon as you read the line. The first part of your code could become:
data = []
with open('ship.txt','r',encoding='utf8') as f:
for line in lines:
if 'View Full Profile' in line:
x = [ next(f).strip(), next(f).strip(),
next(f).strip(), next(f).strip() ]
data.append(x)
You could even write the csv file on the fly:
with open('ship.txt','r',encoding='utf8') as f, open('ship_test.csv', 'w') as csv_file:
writer = csv.writer(csv_file, delimiter=',')
for line in f:
if 'View Full Profile' in line:
x = [ next(f).strip(), next(f).strip(),
next(f).strip(), next(f).strip() ]
writer.writerow(x)

Not sure why my python output is looping

I wrote a little bit of code to read a number in a file. Append it to a variable, then increment the number so the next time it runs the number in the file will be number +1. It looks like its working except it seems to increment twice.. For example here is my code :
11 def mcIPNumber():
12 with open('mcIPlatest.txt', 'r+') as file:
13 NameNumber= file.read().replace('\n','')
14 NameNumber=int(NameNumber)
15 NewNumber= NameNumber+1
16 print "newnumber = %s" % NewNumber
17 file.seek(0)
18 file.write(str(NewNumber))
19 file.truncate()
20 return NameNumber
21
22 def makeNameMCTag():
23 NameNumber = mcIPNumber()
24 NameTag = "varName" + str(NameNumber)
25 print "Name Tag: %s" % NameTag
26 mcGroup = "varTagmc"
27 #IPNumber = 1
28 mcIP = "172.16.0.%s" % NameNumber
29 print ( "Multicast Tag: %s, %s" % (mcGroup,mcIP))
30
31
32 mcIPNumber()
33 makeNameMCTag()
But here is my output.. notice that "NewNumber" gets printed out twice.. for some reason"
newnumber = 2
newnumber = 3
Name Tag: varName2
Multicast Tag: varTagmc, 172.16.0.2
So it correctly made my varName2 and my IP 172.16.0.2 (incremented my initial number in the file by 1) but this means the 2nd time I run it.. I get this:
newnumber = 4
newnumber = 5
Name Tag: varName
Multicast Tag: varTagmc, 172.16.0.4
My expected result is this:
newnumber = 3
Name Tag: varName3
Multicast Tag: varTagmc, 172.16.0.3
Any idea why its looping?
Thanks!
(by the way if you're curious I'm trying to write some code which will eventually write the tf file for my TerraForm lab)
Because of this:
def makeNameMCTag():
NameNumber = mcIPNumber()
You are calling mcIPNumber from inside makeNameMCTag, so you don't excplicitly need to call that method in line 32.
Alternatively
def make_name_mc_tag(name_number):
NameTag = "varName" + str(name_number)
print "Name Tag: %s" % NameTag
...
make_name_mc_tag(mcIPNumber())
here you are passing the required data as a parameter.

Data Analysis using Python

I have 2 CSV files. One with city name, population and humidity. In second cities are mapped to states. I want to get state-wise total population and average humidity. Can someone help? Here is the example:
CSV 1:
CityName,population,humidity
Austin,1000,20
Sanjose,2200,10
Sacramento,500,5
CSV 2:
State,city name
Ca,Sanjose
Ca,Sacramento
Texas,Austin
Would like to get output(sum population and average humidity for state):
Ca,2700,7.5
Texas,1000,20
The above solution doesn't work because dictionary will contain one one key value. i gave up and finally used a loop. below code is working, mentioned input too
csv1
state_name,city_name
CA,sacramento
utah,saltlake
CA,san jose
Utah,provo
CA,sanfrancisco
TX,austin
TX,dallas
OR,portland
CSV2
city_name population humidity
sacramento 1000 1
saltlake 300 5
san jose 500 2
provo 100 7
sanfrancisco 700 3
austin 2000 4
dallas 2500 5
portland 300 6
def mapping_within_dataframe(self, file1,file2,file3):
self.csv1 = file1
self.csv2 = file2
self.outcsv = file3
one_state_data = 0
outfile = csv.writer(open('self.outcsv', 'w'), delimiter=',')
state_city = read_csv(self.csv1)
city_data = read_csv(self.csv2)
all_state = list(set(state_city.state_name))
for one_state in all_state:
one_state_cities = list(state_city.loc[state_city.state_name == one_state, "city_name"])
one_state_data = 0
for one_city in one_state_cities:
one_city_data = city_data.loc[city_data.city_name == one_city, "population"].sum()
one_state_data = one_state_data + one_city_data
print one_state, one_state_data
outfile.writerows(whatever)
def output(file1, file2):
f = lambda x: x.strip() #strips newline and white space characters
with open(file1) as cities:
with open(file2) as states:
states_dict = {}
cities_dict = {}
for line in states:
line = line.split(',')
states_dict[f(line[0])] = f(line[1])
for line in cities:
line = line.split(',')
cities_dict[f(line[0])] = (int(f(line[1])) , int(f(line[2])))
for state , city in states_dict.iteritems():
try:
print state, cities_dict[city]
except KeyError:
pass
output(CSV1,CSV2) #these are the names of the files
This gives the output you wanted. Just make sure the names of cities in both files are the same in terms of capitalization.

Handling word index in text files

wordlist A: book jesus christ son david son abraham jacob judah his brothers perez amminadab
wordlist B: akwụkwọ jizọs kraịst nwa devid nwa ebreham jekọb juda ya ụmụnne pirez aminadab
file.txt A:
the book of the history of jesus christ , son of david , son of abraham :
abraham became father to isaac ; isaac became father to jacob ; jacob became father to judah and his brothers ;
file.txt B:
akwụkwọ nke kọrọ akụkọ banyere jizọs kraịst , nwa devid , nwa ebreham :
ebreham mụrụ aịzik ; aịzik amụọ jekọb ; jekọb amụọ juda na ụmụnne ya ndị ikom ;
I have 2 above word-lists (say A & B) of 2 diff. languages. Both contain word translation of each other in order. My task is to run these word-lists through 2 separate files.txt of both languages like word-list A through file.txt A and vice versa, then return a line for both txt files, each will contain the index numbers of both word-list where they were found on each line of the txt paired like:
2:1 7:6 8:7 10:9 12:10 14:12 16:13 [ 2:1 = 2 index of book in txt.file A and 1-akwụkwọ in txt.file B and so on]
1:1 11:6 13:8 17:10 19:12 20:13 [ 1:1 = 1 index of abraham in txt.file A and 1- ebreham in txt.file B and so on].
see codes below:
import sys
def wordlist(filename):
wordlist = []
with open(filename, 'rb') as f:
for line in f:
wordlist.append(line)
return wordlist
eng = []
for lines in open('eng_try.txt', 'rb'):
line = lines.strip()
eng.append(line)
igb = []
for lines in open('igb_try.txt', 'rb'):
line = lines.strip()
igb.append(line)
i = 0
while i < len(eng):
eng_igb_verse_pair = eng[i] + " " + igb[i]
line = eng_igb_verse_pair.strip().split()
for n in range(0, len(wordlist('eng_wordlist.txt'))):
eng_word = wordlist('eng_wordlist.txt').pop(n)
igb_word = wordlist('igb_wordlist.txt').pop(n)
if eng_word in line and igb_word in line:
print '{0} {1}:{2}'.format(i, line.index[eng_word], line.index[igb_word])
i += 1
This actually prints empty. I know my problem is in the last segment of the program. Can someone help. I am not that experienced python programmer. Apologies if I didn't construct my explanation well.
You mean something like this:
import sys
def checkLine(line_eng, line_igb):
eng_words = line_eng.split()
igb_words = line_igb.split()
for word in eng_words:
if word in eng:
igb_word = igb[eng.index(word)]
print "%d:%d" % ( eng_words.index(word)+1, igb_words.index(igb_word)+1),
def linelist(filename):
lineslist = []
for line in open(filename, 'rb'):
lineslist.append(line)
return lineslist
eng = []
for lines in open('eng_try.txt', 'rb'):
line = lines.strip()
for w in line.split():
eng.append(w)
igb = []
for lines in open('igb_try.txt', 'rb'):
line = lines.strip()
for w in line.split():
igb.append(w)
eng_lines = linelist("eng_wordlist.txt")
igb_lines = linelist("igb_wordlist.txt")
for n in range(0, len(eng_lines)):
print "%d. " % (n+1),
checkLine(eng_lines[n],igb_lines[n])
print
For your files i got result:
1. 2:1 7:6 8:7 10:9 12:10 10:9 16:13
2. 1:1 11:7 11:7 17:11 19:14 20:13
BR
Parasit Hendersson

Python: compare column in two files

I'm just trying to solving this text-processing task using python, but I'm not able to compare column.
What I have tried :
#!/usr/bin/env python
import sys
def Main():
print "This is your input Files %s,%s" % ( file1,file2 )
f1 = open(file1, 'r')
f2 = open(file2, 'r')
for line in f1:
column1_f1 = line.split()[:1]
#print column1_f1
for check in f2:
column2_f2 = check.split()[:1]
print column1_f1,column2_f2
if column1_f1 == column2_f2:
print "Match",line
else:
print line,check
f1.close()
f2.close()
if __name__ == '__main__':
if len(sys.argv) != 3:
print >> sys.stderr, "This Script need exact 2 argument, aborting"
exit(1)
else:
ThisScript, file1, file2 = sys.argv
Main()
I'm new in Python, Please help me to learn and understand this..
I would resolve it in similar way in python3 that user46911 did with awk. Read second file and save its keys in a dictionary. Later check if exists for each line of first file:
import sys
codes = {}
with open(sys.argv[2], 'r') as f2:
for line in f2:
fields = line.split()
codes[fields[0]] = fields[1]
with open(sys.argv[1], 'r') as f1:
for line in f1:
fields = line.split(None, 1)
if fields[0] in codes:
print('{0:4s}{1:s}'.format(codes[fields[0]], line[4:]), end='')
else:
print(line, end='')
Run it like:
python3 script.py file1 file2
That yields:
060090 AKRABERG FYR DN 6138 -666 101
EKVG 060100 VAGA FLOGHAVN DN 6205 -728 88
060110 TORSHAVN DN 6201 -675 55
060120 KIRKJA DN 6231 -631 55
060130 KLAKSVIK HELIPORT DN 6221 -656 75
060160 HORNS REV A DN 5550 786 21
060170 HORNS REV B DN 5558 761 10
060190 SILSTRUP DN 5691 863 0
060210 HANSTHOLM DN 5711 858 0
EKGF 060220 TYRA OEST DN 5571 480 43
EKTS 060240 THISTED LUFTHAVN DN 5706 870 8
060290 GROENLANDSHAVNEN DN 5703 1005 0
EKYT 060300 FLYVESTATION AALBORG DN 5708 985 13
060310 TYLSTRUP DN 5718 995 0
060320 STENHOEJ DN 5736 1033 56
060330 HIRTSHALS DN 5758 995 0
EKSN 060340 SINDAL FLYVEPLADS DN 5750 1021 28

Categories

Resources