How do you split a string by spaces? The code below reads a file with 4 lines of 7 numbers separated by spaces. When it takes a line and splits it, it seems to split by character, so if I print the first item, 5 will print instead of 50. Here is the code:
def main():
    filename = input("Enter the name of the file: ")
    infile = open(filename, "r")
    for i in range(4):
        data = infile.readline()
        print(data)
        item = data.split()
        print(data[0])

main()
the file looks like this
50 60 15 100 60 15 40 /n
100 145 20 150 145 20 45 /n
50 245 25 120 245 25 50 /n
100 360 30 180 360 30 55 /n
split takes as an optional argument the character you want to split your string on; with no argument it splits on any run of whitespace, which is what you want here.
I invite you to read the documentation of the methods you are using. :)
EDIT: By the way, readline returns a string, not a **list**.
However, split does return a list.
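For example, using one of the lines from the question's file, the difference between indexing the string and indexing the split list shows up immediately:

```python
line = "50 60 15 100 60 15 40"
print(line[0])        # indexing the string gives one character: '5'

parts = line.split()  # no argument: split on any run of whitespace
print(parts[0])       # indexing the resulting list gives one token: '50'
```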
import nltk
tokens = nltk.word_tokenize(TextInTheFile)
Try this once you have opened the file.
TextInTheFile is a variable holding the file's contents.
There's not a lot wrong with what you are doing, except that you are printing the wrong thing.
Instead of
print(data[0])
use
print(item[0])
data[0] is the first character of the string you read from the file. You split this string into a variable called item, so that's what you should print.
The input I have is a large string of characters in a .TXT file (over 18,000 characters) and I need to add a space after every two characters. How can I write code that produces the output in a .TXT file again?
Like so;
Input:
123456789
Output:
12 34 56 78 9
The enumerate() function is going to do the heavy lifting for you here: all we need to do is iterate over the string of characters and use modulo to add a space after every two characters. A worked example is below!
string_of_chars = "123456789"
spaced_chars = ""
for i, c in enumerate(string_of_chars):
    if i % 2 == 1:
        spaced_chars += c + " "
    else:
        spaced_chars += c
print(spaced_chars)
This will produce 12 34 56 78 9
t = '123456789'
' '.join(t[i:i+2] for i in range(0, len(t), 2))
If the file is very large, you won't want to read it all into memory, and instead read a block of characters, and write them to an output handle, and loop that.
To include read/write:
write_handle = open('./output.txt', 'w')
with open('./input.txt') as read_handle:
    for line in read_handle:
        write_handle.write(' '.join(line[i:i+2] for i in range(0, len(line), 2)))
write_handle.close()
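If the input really is one long line (as with the question's 18,000-character file), reading it line by line won't limit memory use. A block-based sketch is below; the function name and block size are illustrative, and a leftover character is carried between blocks so pairs never straddle a read:

```python
def respace(in_path, out_path, block_size=4096):
    # Read fixed-size blocks, pair up characters, and carry an odd
    # trailing character over to the next block.
    carry = ""
    with open(in_path) as src, open(out_path, "w") as dst:
        while True:
            chunk = src.read(block_size)
            if not chunk:
                break
            block = carry + chunk
            if len(block) % 2:
                carry, block = block[-1], block[:-1]
            else:
                carry = ""
            pairs = [block[i:i+2] for i in range(0, len(block), 2)]
            if pairs:
                dst.write(" ".join(pairs) + " ")
        if carry:  # odd-length input: write the final lone character
            dst.write(carry)
```

Note this leaves a trailing space when the input length is even, just like the loop-based answer above.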
Try the following
txt = '123456789'
print(*[txt[x:x+2] for x in range(0, len(txt), 2)])
output
12 34 56 78 9
Students.txt
64 Mary Ryan
89 Michael Murphy
22 Pepe
78 Jenny Smith
57 Patrick James McMahon
89 John Kelly
22 Pepe
74 John C. Reilly
My code
f = open("students.txt","r")
for line in f:
    words = line.strip().split()
    mark = (words[0])
    name = " ".join(words[1:])
    for i in (mark):
        print(i)
The output im getting is
6
4
8
9
2
2
7
8
etc...
My expected output is
64
89
22
78
etc..
Just curious to know how I would print the whole integer, not just a single integer at a time.
Any help would be more than appreciative.
As I can see, each line of the text file has an integer followed by a string, and you want your code to output the full integer.
You can use the code
f = open("Students.txt","r")
for line in f:
    l = line.split(" ")
    print(l[0])
In Python, when you do this:
for i in (mark):
    print(i)
and mark is of type string, you are asking Python to iterate over each character in the string. So if your string holds the digits of an integer and you iterate over the string, you'll get one digit at a time.
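The difference is easy to see in isolation:

```python
mark = "64"
print(list(mark))      # ['6', '4']: iterating a string yields characters

marks = ["64", "89", "22"]
print(list(marks))     # ['64', '89', '22']: iterating a list yields whole items
```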
I believe in your code the line
mark = (words[0])name = " ".join(words[1:])
is a typo. If you fix that we can help you with what's missing (it's most likely a statement like mark = something.split(), but not sure what something is based on the code).
You should be using context managers when you open files so that they are automatically closed for you when the scope ends. Also mark should be a list to which you append the first element of the line split. All together it will look like this:
with open("students.txt","r") as f:
    mark = []
    for line in f:
        mark.append(line.strip().split()[0])
    for i in mark:
        print(i)
The line
for i in (mark):
is the same as the following, because mark is a string and the parentheses are just grouping:
for i in mark:
I believe you want to make mark an element of some iterable, which you can do by creating a tuple with a single item:
for i in (mark,):
and this should give what you want.
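A quick demonstration of why the trailing comma matters (sample value taken from the question's file):

```python
mark = "64"
print([i for i in (mark)])   # ['6', '4']: plain parentheses are just grouping
print([i for i in (mark,)])  # ['64']: the trailing comma makes a one-item tuple
```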
In your line:
line.strip().split()
you're not telling the string to split based on a space. Try the following:
str(line).strip().split(" ")
A quick one with list comprehensions:
with open("students.txt","r") as f:
    mark = [line.strip().split()[0] for line in f]
for i in mark:
    print(i)
I want to read some files with Python that contain certain data I need.
The structure of the files is like this:
NAME : a280
COMMENT : drilling problem (Ludwig)
TYPE : TSP
DIMENSION: 280
EDGE_WEIGHT_TYPE : EUC_2D
NODE_COORD_SECTION
1 288 149
2 288 129
3 270 133
4 256 141
5 256 157
6 246 157
7 236 169
8 228 169
9 228 161
So, the file starts with a few lines that contain data I need, then there are some random lines I do NOT need, and then there are lines with numerical data that I do need. I read everything that I need to read just fine.
However, my problem is that I cannot find a way to bypass the random number of lines that are sandwiched between the data I need. The number of such lines varies from file to file: it can be 1, 2 or more. It would be silly to hardcode some f.readline() commands in there to bypass this.
I have thought of some regular expression to check if the line is starting with a string, in order to bypass it, but I'm failing.
In other words, there can be more lines like "NODE_COORD_SECTION" that I don't need in my data.
Any help is highly appreciated.
Well you can simply check whether every line is valid (stuff you need) and, if it is not, just skip it. For example, inside your reading loop:
line_list = line.split()
keywords = ['NAME', 'COMMENT', 'TYPE', 'DIMENSION', 'EDGE_WEIGHT_TYPE']
is_keyword_line = bool(line_list) and line_list[0].rstrip(':') in keywords
is_coord_line = len(line_list) == 3 and all(p.isdigit() for p in line_list)
if not (is_keyword_line or is_coord_line):
    continue  # skip lines such as NODE_COORD_SECTION
It would be nice if you added some formatting to the "lines of your file" and showed some code, but here's what I would try.
I would first define a list of strings indicating a valid line, then split the current line into a list of strings and check whether the first element corresponds to any of the valid strings.
In case the first string doesn't correspond to any string in the list, I would check whether the first element is an integer, and so on...
current_line = 'LINE OF TEXT FROM FILE'
VALID_WORDS = ['VALID_STR1', 'VALID_STR2', 'VALID_STR3']

elems = current_line.split(' ')
valid_line = False
if elems[0] in VALID_WORDS:
    # If the first str is in the list of valid words, continue...
    valid_line = True
elif len(elems) == 3:
    # If it's not in the list of valid words BUT has 3 elements,
    # check if it's an int
    try:
        valid_line = isinstance(int(elems[0]), int)
    except ValueError:
        valid_line = False

if valid_line:
    # Your thing
    pass
else:
    # Not a valid line (inside a reading loop you would `continue` here)
    pass
I'm new to Python, hopefully someone can help me in this.
I want to grep data from multiple files then combine the data I grep into a single log.
My input files as such:
Input file1 (200MHz)
Cell_a freq_100 50
Cell_a freq_200 6.8
Cell_b freq_100 70
Input file2 (100MHz)
Cell_a freq_100 100
Cell_a freq_200 10.5
Cell_b freq_100 60
This is my expected output
[cell] [freq] [value_frm_file1] [value_frm_file2] [value_frm_file3] [etc...]
Example expected output:-
Cell_a freq_100 50 100 #50 is taken from file1, 100 from file2
Cell_a freq_200 6.8 10.5
Cell_b freq_100 70 60
I guess the best way is to store them in a Python dictionary? Can you give me an example or show me how to do this? Here is my code, but I'm only able to get one value at a time. How do I combine them according to their freq type?
import re

for i in cmaxFreqList:  # list of files based on freq type, e.g. 200MHz, 100MHz etc
    file = path + freqfile
    with open(file) as f:
        data = f.readlines()
        for line in data:
            line = line.rstrip('\n')
            freqlength = len(line.split())
            if freqlength == 3:
                searchFreqValue = re.search(r"(\S+)\s+(\S+)\s+(\S+)", line)
                cell = searchFreqValue.group(1)
                freq = searchFreqValue.group(2)
                value = searchFreqValue.group(3)
                print(cell + ' ' + freq + ' ' + value)  # only one value at a time
Thank you for your help!
That's a relatively simple task, provided your files are not extremely huge (i.e. their combined data fits into working memory while concatenating them). All you need is to create a (cell_name, freq) map (you can use a dict for that) and append the matching values to it. Once you've gone through all your files, just write the map->value elements to the combined output file and Bob's your uncle:
import os
import collections

path = "."  # current folder
freq_list = ["100.dat", "200.dat"]  # a list of files to concatenate
result = collections.defaultdict(list)  # a map to hold a list of our results

for file_name in freq_list:  # go through each file name
    with open(os.path.join(path, file_name), "r") as f:  # open the file
        for line in f:  # go through it line by line
            try:
                cell, freq, value = line.split()  # split it by whitespace into 3 elements
            except ValueError:  # invalid line - it didn't have exactly 3 elements
                continue  # ignore the current line and continue with the next
            result[(cell, freq)].append(value)  # append the value to our result map

with open(os.path.join(path, "combined.dat"), "w") as f:  # open our output file for writing
    # Python dictionaries are unsorted (<v3.6), sort the keys when looping through them
    for element in sorted(result):  # loop through each key in our result map
        # write the key (cell name and frequency) separated by space, add space,
        # write the values separated by space and finally add a new line:
        f.write("{} {}\n".format(" ".join(element), " ".join(result[element])))
It's unclear from your code what cmaxFreqList contains, but in my example it (freq_list) holds the actual file names - you can of course construct your input file names any way you want (just make sure that os.path.join(path, file_name) constructs a valid path). For example, if the above-listed 100.dat contained:
Cell_a freq_100 50
Cell_a freq_200 6.8
Cell_b freq_100 70
and the 200.dat contained:
Cell_a freq_100 100
Cell_a freq_200 10.5
Cell_b freq_100 60
the "combined.dat" file will end up as:
Cell_a freq_100 50 100
Cell_a freq_200 6.8 10.5
Cell_b freq_100 70 60
I don't fully understand the question due to the readability of your expected output, but here are some tips you could use to iterate through the parameters and the values.
For searching for a type of value (e.g. cell, freq, etc.) you could use the list index method:
parameters = ['Cell_', 'freq_', 'etc']  # names of the parameters you are looking for
for parameter in parameters:
    for line in data:
        new_list = line.split()
        position_of_the_value = new_list.index(parameter) + 1
if you
print(new_list[position_of_the_value])
you get the value for that parameter in that line, and you can then store it in a list:
parameter1_list = list()
parameter1_list.append(new_list[position_of_the_value])
Finally, you construct the string you want to print:
print('Parameter_1 ' + ' '.join(parameter1_list))
and this will print something like
Parameter_1 100 50 200 300
You just have to construct loops to iterate over every parameter and every list in order to get them all printed.
I am new to programming but I have started looking into both Python and Perl.
I am looking for data in two input files that are partly CSV, selecting some of them and putting into a new output file.
Maybe Python CSV or Pandas can help here, but I'm a bit stuck when it comes to skipping/keeping rows and columns.
Also, I don't have any headers for my columns.
Input file 1:
-- Some comments
KW1
'Z1' 'F' 30 26 'S'
KW2
'Z1' 30 26 1 1 5 7 /
'Z1' 30 26 2 2 6 8 /
'Z1' 29 27 4 4 12 13 /
Input file 2:
-- Some comments
-- Some more comments
KW1
'Z2' 'F' 40 45 'S'
KW2
'Z2' 40 45 1 1 10 10 /
'Z2' 41 45 2 2 14 15 /
'Z2' 41 46 4 4 16 17 /
Desired output file:
KW_NEW
'Z_NEW' 1000 30 26 1 /
'Z_NEW' 1000 30 26 2 /
'Z_NEW' 1000 29 27 4 /
'Z_NEW' 1000 40 45 1 /
'Z_NEW' 1000 41 45 2 /
'Z_NEW' 1000 41 46 4 /
So what I want to do is:
Do not include anything in either of my two input files before I reach KW2
Replace KW2 with KW_NEW
Replace either 'Z1' or 'Z2' with 'Z_NEW' in the first column
Add a new second column with a constant value e.g. 1000
Copy the next three columns as they are
Leave out any remaining columns before printing the slash / at the end
Could anyone give me at least some general hints/tips how to approach this?
Your files are not "partly csv" (there is not a comma in sight); they are (partly) space delimited. You can read the files line-by-line, use Python's .split() method to convert the relevant strings into lists of substrings, and then re-arrange the pieces as you please. The splitting and re-assembly might look something like this:
input_line = "'Z1' 30 26 1 1 5 7 /" # test data
input_items = input_line.split()
output_items = ["'Z_NEW'", '1000']
output_items.append(input_items[1])
output_items.append(input_items[2])
output_items.append(input_items[3])
output_items.append('/')
output_line = ' '.join(output_items)
print(output_line)
The final print() statement shows that the resulting string is
'Z_NEW' 1000 30 26 1 /
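Extending that per-line snippet to whole files is mostly bookkeeping. Here is a sketch under the assumption that everything up to and including each file's KW2 line should be skipped; the function name and the malformed-row guard are illustrative, not part of the original answer:

```python
def transform(in_paths, out_path):
    # Rewrite rows found after each file's KW2 marker into the new format.
    with open(out_path, "w") as out:
        out.write("KW_NEW\n")
        for path in in_paths:
            with open(path) as f:
                seen_kw2 = False
                for line in f:
                    if not seen_kw2:
                        seen_kw2 = line.strip() == "KW2"
                        continue  # skip everything up to and including KW2
                    items = line.split()
                    if len(items) < 4:
                        continue  # ignore comments and malformed rows
                    # keep columns 2-4, prefix the new label and the constant
                    out.write("'Z_NEW' 1000 {} {} {} /\n".format(*items[1:4]))
```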
Is your file format static? (this is not actually csv by the way :P) You might want to investigate a standardized file format like JSON or strict CSV to store your data, so that you can use already-existing tools to parse your input files. python has great JSON and CSV libraries that can do all the hard stuff for you.
If you're stuck with this file format, I would try something along these lines.
path = '<input_path>'
kws = ['KW1', 'KW2']
desired_kw = kws[1]

def parse_columns(line):
    array = line.split()
    if array and array[-1] == '/':
        # get rid of the trailing slash
        array = array[:-1]
    return array

def is_kw(cols):
    if len(cols) > 0 and cols[0] in kws:
        return cols[0]

# parse the section denoted by the desired keyword
with open(path, 'r') as input_fp:
    matrix = []
    reading_file = False
    for line in input_fp:
        cols = parse_columns(line)
        line_is_kw = is_kw(cols)
        if line_is_kw:
            if reading_file:
                break  # reached the next keyword: stop
            if line_is_kw == desired_kw:
                reading_file = True
            continue
        if reading_file:
            matrix.append(cols)
print(matrix)
From there you can use stuff like slice notation and basic list manipulation to get your desired array. Good luck!
Here is a way to do it with Perl:
#!/usr/bin/perl
use strict;
use warnings;

# initialize output array
my @output = ('KW_NEW');

# process first file
open my $fh1, '<', 'in1.txt' or die "unable to open file1: $!";
while (<$fh1>) {
    # consider only lines after KW2
    if (/KW2/ .. eof) {
        # Don't treat the KW2 line itself
        next if /KW2/;
        # split the current line on space and keep only the first five elements
        my @l = (split ' ', $_)[0..4];
        # change the first element
        $l[0] = 'Z_NEW';
        # insert 1000 at second position
        splice @l, 1, 0, 1000;
        # push into output array
        push @output, "@l";
    }
}

# process second file
open my $fh2, '<', 'in2.txt' or die "unable to open file2: $!";
while (<$fh2>) {
    if (/KW2/ .. eof) {
        next if /KW2/;
        my @l = (split ' ', $_)[0..4];
        $l[0] = 'Z_NEW';
        splice @l, 1, 0, 1000;
        push @output, "@l";
    }
}

# write array to output file
open my $fh3, '>', 'out.txt' or die "unable to open file3: $!";
print $fh3 $_, "\n" for @output;