Appending the length of sentences to file - python

I have found the length and index of each sentence, and I want to save all of them to a new file.
Example: index sentence length
My code:
file = open("testing_for_tools.txt", "r")
lines_ = file.readlines()
for line in lines_:
    lenght=len(line)-1
    print(lenght)
for item in lines_:
    print(lines_.index(item)+1,item)
output:
64
18
31
31
23
36
21
9
1
1 i went to city center, and i bought xbox5 , and some other stuff
2 i will go to gym !
3 tomorrow i, sill start my diet!
4 i achive some and i need more ?
5 i lost lots of weights؟
6 i have to , g,o home,, then sleep ؟
7 i have things to do )
8 i hope so
9 o
Desired output, saved to a new file:
1 i went to city center, and i bought xbox5 , and some other stuff 64
2 i will go to gym ! 18

This can be achieved using the following code. Note the use of with ... as f, which means we don't have to worry about closing the file after using it. In addition, I've used f-strings (which require Python 3.6 or later) and enumerate to get the line number, and concatenated everything into one string, which is written to the output file.
with open("test.txt", "r") as f:
    lines_ = f.readlines()
with open("out.txt", "w") as f:
    for i, line in enumerate(lines_, start=1):
        line = line.strip()
        f.write(f"{i} {line} {len(line)}\n")
Output:
1 i went to city center, and i bought xbox5 , and some other stuff 64
2 i will go to gym ! 18
If you wanted to sort the lines based on length, you could just put the following line after the first with block:
lines_.sort(key=len)
This would then give output:
1 i will go to gym ! 18
2 i went to city center, and i bought xbox5 , and some other stuff 64
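The numbering, length and optional sorting steps above can be wrapped in one function. This is only a minimal sketch: the helper name number_with_lengths and its sort_by_length flag are made up for illustration, and it works on any iterable of lines so it can be tried without a file on disk.

```python
def number_with_lengths(lines, sort_by_length=False):
    """Return 'index sentence length' strings for an iterable of lines."""
    # Strip only the trailing newline so len() counts what is visible on the line.
    cleaned = [line.rstrip("\n") for line in lines]
    if sort_by_length:
        cleaned.sort(key=len)
    return [f"{i} {text} {len(text)}" for i, text in enumerate(cleaned, start=1)]


# Hypothetical sample taken from the question's input.
sample = ["i will go to gym !\n", "i hope so\n", "o\n"]
for row in number_with_lengths(sample):
    print(row)
```

To save the result, the returned rows could be joined with "\n" and written to the output file inside a with block, as in the answer above.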

Related

How to print out nicely formatted tables from a dictionary

For the sake of practicing how to be more comfortable and fluent in working with dictionaries, I have written a little program that reads the content of a file and adds it to a dictionary as a key: value pair. This is no problem, but when I got curious about how to print the content out again in the same format as the table in the datafile using for-loops, I ran into trouble.
My question is: How can I print out the content of the dictionary onto the terminal using for-loops?
The datafile is:
Name     Age  School
Anne     10   Eiksmarka
Tom      15   Marienlyst
Vidar    18   Persbråten
Ayla     18   Kongshavn
Johanne  17   Wang
Silje    16   Eikeli
Per      19   UiO
Ali      25   NTNU
My code is:
infile = open("table.dat", "r")
data = {}
headers = infile.readline().split()
for i in range(len(headers)):
    data[headers[i]] = []
for line in infile:
    words = line.split()
    for i in range(len(headers)):
        data[headers[i]].append(words[i])
infile.close()
I would like to print the data back onto the terminal. Ideally, the printout should look something like this:
Name     Age  School
Anne     10   Eiksmarka
Tom      15   Marienlyst
Vidar    18   Persbråten
Ayla     18   Kongshavn
Johanne  17   Wang
Silje    16   Eikeli
Per      19   UiO
Ali      25   NTNU
If someone can help me with this, I would be grateful.
The easiest solution is to use a library such as tabulate. Here is an example of its output (you can customize it further):
>>> from tabulate import tabulate
>>> table = [["Sun",696000,1989100000],["Earth",6371,5973.6],
... ["Moon",1737,73.5],["Mars",3390,641.85]]
>>> print(tabulate(table))
-----  ------  -------------
Sun    696000     1.9891e+09
Earth    6371  5973.6
Moon     1737    73.5
Mars     3390   641.85
-----  ------  -------------
Otherwise, if you must use your own for-loop, you can add tabs to line the columns up, as in:
print(a+"\t"), where \t is the horizontal tab escape character.
Edit: An example of how this can be used is below:
infile = open("table.dat", "r")
data = {}
headers = infile.readline().split()
for i in range(len(headers)):
    data[headers[i]] = []
for line in infile:
    words = line.split()
    for i in range(len(headers)):
        data[headers[i]].append(words[i])
        print(words[i],end= '\t')
    print()
infile.close()
Things to note:
1- For each field we use print(..., end='\t'), which ends the output with a tab instead of a newline. We might also consider adding more tabs (e.g. end='\t\t'), spaces, or any other formatting such as a separator character (e.g. end='\t|\t').
2- After each line we use print(), which prints only a newline and moves the output to the next row.
Take a look at the .ljust, .rjust and .center methods of str. Consider the following simple example:
d = {"Alpha": 1, "Beta": 10, "Gamma": 100, "ExcessivelyLongName": 1}
for key, value in d.items():
    print(key.ljust(5), str(value).rjust(3))
output
Alpha   1
Beta   10
Gamma 100
ExcessivelyLongName   1
Note that ljust pads with spaces (by default) to reach the specified width and does nothing if the string is already longer than that. Also, since the values are integers, they must first be converted to str before you can use any of these methods.
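The ljust idea can be extended to the dictionary from the question by computing each column's width from its longest cell. This is a minimal sketch, assuming data maps each header to a list of strings of equal length (as the question's code builds it); the helper name format_table and the two-space gap are illustrative choices, not part of any library.

```python
def format_table(data):
    """Return the rows of a column dictionary as aligned text lines."""
    headers = list(data)
    # Each column is as wide as its longest cell, header included.
    widths = {h: max([len(h)] + [len(v) for v in data[h]]) for h in headers}
    lines = ["  ".join(h.ljust(widths[h]) for h in headers).rstrip()]
    n_rows = len(data[headers[0]]) if headers else 0
    for i in range(n_rows):
        cells = (data[h][i].ljust(widths[h]) for h in headers)
        lines.append("  ".join(cells).rstrip())
    return lines


# Hypothetical sample using a few rows from the question's data file.
data = {
    "Name": ["Anne", "Tom", "Vidar"],
    "Age": ["10", "15", "18"],
    "School": ["Eiksmarka", "Marienlyst", "Persbråten"],
}
for line in format_table(data):
    print(line)
```

Because the widths come from the data, an unusually long name widens its own column instead of breaking the alignment.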
You can do this using pandas, although the styling isn't exactly the same as yours:
import pandas as pd
with open('filename.csv') as f:
    headers, *data = map(str.split, f.readlines())
df = pd.DataFrame(dict(zip(headers, zip(*data))))
print(df.to_string(index=False))
Name Age School
Anne 10 Eiksmarka
Tom 15 Marienlyst
Vidar 18 Persbråten
Ayla 18 Kongshavn
Johanne 17 Wang
Silje 16 Eikeli
Per 19 UiO
Ali 25 NTNU

Edit last element of each line of text file

I've got a text file with some entries like this:
0,The Hitchhiker's Guide to Python,Kenneth Reitz,2/4/2012,0
1,Harry Potter,JK Rowling,1/1/2010,8137
2,The Great Gatsby,F. Scott Fitzgerald,1/2/2010,0
3,To Kill a Mockingbird,Harper Lee,1/3/2010,1828
The last element of each line determines which user has taken out the given book. If it is 0, nobody has it.
I want code that replaces the ',0' at the end of any given line with a 4-digit number taken from input, to show that someone has taken out the book.
I've used .replace to change it from e.g. 1828 to 0, but I don't know how to change the last element of a specific line from 0 to something else.
I cannot use csv due to work/education restrictions, so I have to leave the file in .txt format.
I can also only use the Python standard library, so no pandas.
You can read the text file into a DataFrame using read_csv:
import pandas as pd
df = pd.read_csv("text_file.txt", header=None)
df.columns = ["Serial", "Book", "Author", "Date", "Issued"]
print(df)
df.loc[3, "Issued"] = 0
print(df)
df.to_csv('text_file.txt', header=None, index=None, sep=',', mode='w+')
This sets the Issued value of the row with index 3 (the fourth book) to 0.
Serial Book Author Date \
0 0 The Hitchhiker's Guide to Python Kenneth Reitz 2/4/2012
1 1 Harry Potter JK Rowling 1/1/2010
2 2 The Great Gatsby F. Scott Fitzgerald 1/2/2010
3 3 To Kill a Mockingbird Harper Lee 1/3/2010
Issued
0 0
1 8137
2 0
3 1828
Serial Book Author Date \
0 0 The Hitchhiker's Guide to Python Kenneth Reitz 2/4/2012
1 1 Harry Potter JK Rowling 1/1/2010
2 2 The Great Gatsby F. Scott Fitzgerald 1/2/2010
3 3 To Kill a Mockingbird Harper Lee 1/3/2010
Issued
0 0
1 8137
2 0
3 0
Edit after comment:
In case you can only use the Python standard library, you can do something like this by reading the file:
i = 0
a = 5  # line to change, with 1 being the first line in the file
b = '8371'
to_write = []
with open("text_file.txt", "r") as file:
    for line in file:
        i += 1
        if (i == a):
            print('line before')
            print(line)
            # keep everything up to the last comma and append the new value
            line = line[:line.rfind(',')] + ',' + b + '\n'
            to_write.append(line)
            print('line after edit')
            print(line)
        else:
            to_write.append(line)
print(to_write)
with open("text_file.txt", "w") as f:
    for line in to_write:
        f.write(line)
File content
0,The Hitchhiker's Guide to Python,Kenneth Reitz,2/4/2012,0
1,Harry Potter,JK Rowling,1/1/2010,8137
2,The Great Gatsby,F. Scott Fitzgerald,1/2/2010,84
3,To Kill a Mockingbird,Harper Lee,1/3/2010,7895
4,XYZ,Harper,1/3/2018,258
5,PQR,Lee,1/3/2019,16
gives this as output
line before
4,XYZ,Harper,1/3/2018,258
line after edit
4,XYZ,Harper,1/3/2018,8371
["0,The Hitchhiker's Guide to Python,Kenneth Reitz,2/4/2012,0\n", '1,Harry Potter,JK Rowling,1/1/2010,8137\n', '2,The Great Gatsby,F. Scott Fitzgerald,1/2/2010,84\n', '3,To Kill a Mockingbird,Harper Lee,1/3/2010,7895\n', '4,XYZ,Harper,1/3/2018,8371\n', '5,PQR,Lee,1/3/2019,16\n', '\n']
You can try this:
to_write = []
with open('read.txt') as f:  # read input file
    for line in f:
        # split at the last comma so only the final field is examined
        head, _, last = line.rstrip('\n').rpartition(',')
        if last == '0':
            to_write.append(head + ',1234\n')  # any 4 digit number
        else:
            to_write.append(line)
with open('output.txt', 'w') as f:  # name of output file
    for _list in to_write:
        f.write(_list)
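Since the question asks about one specific line, another option is to select the record by its book id rather than by line number. This is a minimal sketch using only the standard library; the function name check_out and its parameters are hypothetical, and it assumes the id is the first comma-separated field and the holder is the last.

```python
def check_out(lines, book_id, user_id):
    """Return a copy of lines where book_id is marked as borrowed by user_id."""
    updated = []
    for line in lines:
        record = line.rstrip("\n")
        # rpartition splits at the LAST comma, so earlier commas are untouched.
        head, _, holder = record.rpartition(",")
        # Only change a book that matches the id and is currently free (holder 0).
        if head.split(",", 1)[0] == book_id and holder == "0":
            record = f"{head},{user_id}"
        updated.append(record + "\n")
    return updated


# Sample records copied from the question.
books = [
    "0,The Hitchhiker's Guide to Python,Kenneth Reitz,2/4/2012,0\n",
    "1,Harry Potter,JK Rowling,1/1/2010,8137\n",
    "2,The Great Gatsby,F. Scott Fitzgerald,1/2/2010,0\n",
]
print("".join(check_out(books, "2", "1234")), end="")
```

A book that is already taken out is left unchanged, so the date fields containing 0 are never touched.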

Text files help(Python 3)

Students.txt
64 Mary Ryan
89 Michael Murphy
22 Pepe
78 Jenny Smith
57 Patrick James McMahon
89 John Kelly
22 Pepe
74 John C. Reilly
My code
f = open("students.txt","r")
for line in f:
    words = line.strip().split()
    mark = (words[0])
    name = " ".join(words[1:])
    for i in (mark):
        print(i)
The output I'm getting is
6
4
8
9
2
2
7
8
etc...
My expected output is
64
80
22
78
etc..
Just curious to know how I would print the whole integer, not just a single digit at a time.
Any help would be more than appreciated.
As far as I can see, each line of your text file holds an integer followed by a string, and you want your code to print the whole integer.
You can use this code:
f = open("Students.txt","r")
for line in f:
    l = line.split(" ")
    print(l[0])
In Python, when you do this:
for i in (mark):
    print(i)
and mark is of type string, you are asking Python to iterate over each character in the string. So if mark is "64", you get "6" and then "4", one character at a time.
I believe in your code the line
mark = (words[0])name = " ".join(words[1:])
is a typo. If you fix that we can help you with what's missing (it's most likely a statement like mark = something.split(), but not sure what something is based on the code).
You should be using context managers when you open files so that they are automatically closed for you when the scope ends. Also mark should be a list to which you append the first element of the line split. All together it will look like this:
with open("students.txt","r") as f:
    mark = []
    for line in f:
        mark.append(line.strip().split()[0])
for i in mark:
    print(i)
The line
for i in (mark):
is the same as this, because parentheses alone do not create a tuple and mark is a string:
for i in mark:
I believe you want mark to be an element of some iterable. You can create a tuple with a single item like this:
for i in (mark,):
and this should give what you want.
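The difference between the three spellings can be checked directly. This is a small illustrative sketch; the value "64" stands in for the first word of a line from students.txt.

```python
mark = "64"  # stand-in for words[0] from one line of the file

chars = list(mark)         # iterating a str visits its characters
same = list((mark))        # extra parentheses change nothing
whole = list((mark,))      # the trailing comma makes a one-item tuple

print(chars)
print(same)
print(whole)
print(int(mark))           # convert when a real integer is needed
```

In practice there is no need to loop at all: printing mark itself (or int(mark)) already gives the whole number.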
In your line:
line.strip().split()
you are not passing a separator, so the string is split on any run of whitespace. To split on single spaces explicitly, try the following:
str(line).strip().split(" ")
A quick one with list comprehensions:
with open("students.txt","r") as f:
    mark = [line.strip().split()[0] for line in f]
for i in mark:
    print(i)
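If the marks are needed as numbers and the names should be kept as well, the same split can feed a small generator. This is a sketch under the assumption that every non-blank line starts with an integer mark; the name parse_students is made up for illustration.

```python
def parse_students(lines):
    """Yield (mark, name) pairs from lines such as '64 Mary Ryan'."""
    for line in lines:
        words = line.split()
        if not words:
            continue  # skip blank lines
        # The first word is the mark; the rest, re-joined, is the name.
        yield int(words[0]), " ".join(words[1:])


# Sample lines copied from Students.txt in the question.
sample = ["64 Mary Ryan\n", "22 Pepe\n", "57 Patrick James McMahon\n"]
for mark, name in parse_students(sample):
    print(mark, name)
```

An open file object can be passed in place of the sample list, since iterating a file also yields lines.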

Python print .psl format without quotes and commas

I am working on a Linux system using python3 with a file in .psl format, which is common in genetics. This is a tab-separated file in which some cells contain comma-separated values. A small example file with some of the features of a .psl is below.
input.psl
1 2 3 x read1 8,9, 2001,2002,
1 2 3 mt read2 8,9,10 3001,3002,3003
1 2 3 9 read3 8,9,10,11 4001,4002,4003,4004
1 2 3 9 read4 8,9,10,11 4001,4002,4003,4004
I need to filter this file to extract only regions of interest. Here, I extract only rows with a value of 9 in the fourth column.
import csv
def read_psl_transcripts():
    psl_transcripts = []
    with open("input.psl") as input_psl:
        csv_reader = csv.reader(input_psl, delimiter='\t')
        for line in csv_reader:
            #Extract only rows matching chromosome of interest
            if '9' == line[3]:
                psl_transcripts.append(line)
    return psl_transcripts
I then need to be able to print or write these selected lines in a tab-delimited format matching the format of the input file, with no additional quotes or commas added. I can't seem to get this part right: additional brackets, quotes and commas are always added. Below is an attempt using print().
outF = open("output.psl", "w")
for line in read_psl_transcripts():
    print(str(line).strip('"\''), sep='\t')
Any help is much appreciated. Below is the desired output.
1 2 3 9 read3 8,9,10,11 4001,4002,4003,4004
1 2 3 9 read4 8,9,10,11 4001,4002,4003,4004
You might be able to solve your problem with a simple awk statement.
awk '$4 == 9' input.psl > output.psl
But with Python you could solve it like this:
write_psl = open("output.psl", "w")
with open("input.psl") as file:
    for line in file:
        splitted_line = line.split()
        if splitted_line[3] == '9':
            out_line = '\t'.join(splitted_line)
            write_psl.write(out_line + "\n")
write_psl.close()
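If you prefer to stay with the csv module from the question, csv.writer with a tab delimiter writes each row back without brackets or quotes. This is a minimal sketch: the function name filter_psl and its column and value parameters are illustrative, and io.StringIO stands in for real files so the example is self-contained.

```python
import csv
import io


def filter_psl(infile, outfile, column=3, value="9"):
    """Copy tab-separated rows whose given column equals value."""
    reader = csv.reader(infile, delimiter="\t")
    # lineterminator="\n" avoids the "\r\n" that csv.writer emits by default.
    writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
    for row in reader:
        if len(row) > column and row[column] == value:
            writer.writerow(row)


# In-memory stand-ins for input.psl and output.psl.
source = io.StringIO(
    "1\t2\t3\tx\tread1\t8,9,\t2001,2002,\n"
    "1\t2\t3\t9\tread3\t8,9,10,11\t4001,4002,4003,4004\n"
)
target = io.StringIO()
filter_psl(source, target)
print(target.getvalue(), end="")
```

With a tab as the delimiter, the commas inside a cell are ordinary characters, so the writer does not quote them. For real files, the csv documentation recommends opening them with newline="".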

Adding in-between columns, skipping and keeping some rows/columns

I am new to programming but I have started looking into both Python and Perl.
I am looking for data in two input files that are partly CSV, selecting some of them and putting into a new output file.
Maybe Python CSV or Pandas can help here, but I'm a bit stuck when it comes to skipping/keeping rows and columns.
Also, I don't have any headers for my columns.
Input file 1:
-- Some comments
KW1
'Z1' 'F' 30 26 'S'
KW2
'Z1' 30 26 1 1 5 7 /
'Z1' 30 26 2 2 6 8 /
'Z1' 29 27 4 4 12 13 /
Input file 2:
-- Some comments
-- Some more comments
KW1
'Z2' 'F' 40 45 'S'
KW2
'Z2' 40 45 1 1 10 10 /
'Z2' 41 45 2 2 14 15 /
'Z2' 41 46 4 4 16 17 /
Desired output file:
KW_NEW
'Z_NEW' 1000 30 26 1 /
'Z_NEW' 1000 30 26 2 /
'Z_NEW' 1000 29 27 4 /
'Z_NEW' 1000 40 45 1 /
'Z_NEW' 1000 41 45 2 /
'Z_NEW' 1000 41 46 4 /
So what I want to do is:
Do not include anything in either of my two input files before I reach KW2
Replace KW2 with KW_NEW
Replace either 'Z1' or 'Z2' with 'Z_NEW' in the first column
Add a new second column with a constant value e.g. 1000
Copy the next three columns as they are
Leave out any remaining columns before printing the slash / at the end
Could anyone give me at least some general hints/tips how to approach this?
Your files are not "partly csv" (there is not a comma in sight); they are (partly) space delimited. You can read the files line-by-line, use Python's .split() method to convert the relevant strings into lists of substrings, and then re-arrange the pieces as you please. The splitting and re-assembly might look something like this:
input_line = "'Z1' 30 26 1 1 5 7 /" # test data
input_items = input_line.split()
output_items = ["'Z_NEW'", '1000']
output_items.append(input_items[1])
output_items.append(input_items[2])
output_items.append(input_items[3])
output_items.append('/')
output_line = ' '.join(output_items)
print(output_line)
The final print() statement shows that the resulting string is
'Z_NEW' 1000 30 26 1 /
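The same splitting and re-assembly can be applied to whole files by skipping everything up to the KW2 line first. This is a minimal sketch, not a definitive implementation: the function name convert and its default arguments are assumptions, and the lists of strings stand in for the lines of the two input files.

```python
def convert(lines, keyword="KW2", new_name="'Z_NEW'", constant="1000"):
    """Return converted data rows found after the keyword line."""
    rows = []
    seen_keyword = False
    for line in lines:
        items = line.split()
        if not seen_keyword:
            # Ignore everything up to and including the keyword line.
            seen_keyword = items == [keyword]
            continue
        if len(items) < 4:
            continue  # blank or malformed line
        # New name, constant column, then the next three original columns.
        rows.append(" ".join([new_name, constant, *items[1:4], "/"]))
    return rows


# Shortened stand-ins for the two input files from the question.
file1 = ["-- Some comments", "KW1", "'Z1' 'F' 30 26 'S'", "KW2",
         "'Z1' 30 26 1 1 5 7 /", "'Z1' 29 27 4 4 12 13 /"]
file2 = ["KW1", "'Z2' 'F' 40 45 'S'", "KW2", "'Z2' 40 45 1 1 10 10 /"]
output = ["KW_NEW"] + convert(file1) + convert(file2)
print("\n".join(output))
```

Open file objects can be passed instead of the lists, and the joined output can then be written to the new file.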
Is your file format static? (this is not actually csv by the way :P) You might want to investigate a standardized file format like JSON or strict CSV to store your data, so that you can use already-existing tools to parse your input files. Python has great JSON and CSV libraries that can do all the hard stuff for you.
If you're stuck with this file format, I would try something along these lines.
path = '<input_path>'
kws = ['KW1', 'KW2']
desired_kw = kws[1]
def parse_columns(line):
    array = line.split()
    if array and array[-1] == '/':
        # get rid of trailing slash
        array = array[:-1]
    return array
def is_kw(cols):
    # return the keyword if this row is a keyword line, otherwise None
    if len(cols) > 0 and cols[0] in kws:
        return cols[0]
# to parse the section denoted by desired keyword
with open(path, 'r') as input_fp:
    matrix = []
    reading_file = False
    for line in input_fp:
        cols = parse_columns(line)
        line_is_kw = is_kw(cols)
        if line_is_kw:
            if not reading_file:
                if line_is_kw == desired_kw:
                    reading_file = True
                # skip the keyword line itself
                continue
            else:
                # the next keyword ends the desired section
                break
        if reading_file:
            matrix.append(cols)
print(matrix)
From there you can use stuff like slice notation and basic list manipulation to get your desired array. Good luck!
Here is a way to do it with Perl:
#!/usr/bin/perl
use strict;
use warnings;
# initialize output array
my @output = ('KW_NEW');
# process first file
open my $fh1, '<', 'in1.txt' or die "unable to open file1: $!";
while(<$fh1>) {
    # consider only lines after KW2
    if (/KW2/ .. eof) {
        # Don't treat KW2 line
        next if /KW2/;
        # split the current line on space and keep only the first four elements
        my @l = (split ' ', $_)[0..3];
        # change the first element
        $l[0] = "'Z_NEW'";
        # insert 1000 at second position
        splice @l,1,0,1000;
        # push into output array, with the trailing slash
        push @output, "@l /";
    }
}
# process second file
open my $fh2, '<', 'in2.txt' or die "unable to open file2: $!";
while(<$fh2>) {
    if (/KW2/ .. eof) {
        next if /KW2/;
        my @l = (split ' ', $_)[0..3];
        $l[0] = "'Z_NEW'";
        splice @l,1,0,1000;
        push @output, "@l /";
    }
}
# write array to output file
open my $fh3, '>', 'out.txt' or die "unable to open file3: $!";
print $fh3 $_,"\n" for @output;
