Python String Manipulation


I'm having a hard time understanding what is happening in this snippet of code, particularly the second line.
for line in infile:
    data = line.strip('\n').split(':')
    user_dict[data[0]] = data[1]

The line strips any leading and trailing newline characters ('\n') from the string in the variable line, splits the result wherever a : occurs, and assigns the resulting list of strings to the variable data.

It parses a file having this structure:
a:52
b:hi
key:value
for line in infile: loops over each line in the file. Each line (except possibly the last) ends with the newline character \n.
line.strip('\n') removes the newline character from the end of the line.
.split(':') splits the string into the substrings that were separated by :. For example: "qwe:rty:uio".split(':') -> ["qwe", "rty", "uio"]
user_dict[data[0]] = data[1] saves the data into the dictionary user_dict, taking the first string as the key and the second one as the value.
For the file shown above, this code creates the following dictionary:
{"a": "52", "b": "hi", "key": "value"}
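As a minimal, self-contained sketch of the steps above: the file is simulated with io.StringIO (an assumption made only so the example runs without a real file on disk), while the names infile and user_dict follow the question.

```python
import io

# Stand-in for the opened file; io.StringIO is used only so the sketch is runnable.
infile = io.StringIO("a:52\nb:hi\nkey:value\n")

user_dict = {}
for line in infile:
    # Drop the trailing newline, then split "key:value" into ["key", "value"].
    data = line.strip('\n').split(':')
    user_dict[data[0]] = data[1]

print(user_dict)
```

If a value could itself contain a colon, split(':', 1) would keep everything after the first colon together as the value.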

line.strip('\n') removes any \n (newline) characters from the beginning and end of the string, and split(':') then splits the string, using : as the delimiter, into a list of strings.

The code above stores the contents of the file in a dictionary. The content of the file looks like this:
key1:value1
key2:value2
.
.
.
key3:value3
The second line strips the \n character from the end of the line and then splits the line on the : character. You should still try to understand and debug the code line by line.

line.strip('\n') will remove the newlines from the beginning and end of the string,
and
split(':') will split your string on ':' into a list of strings.

data = line.strip('\n').split(':')
There are two string method calls chained on one line. You can also separate the calls; this is equivalent:
my_line = line.strip('\n')
my_line1 = my_line.split(':')
line.strip('\n') --> removes the newline character at the end of a line
my_line.split(':') --> splits the line at each colon and returns a list of the pieces
It is easier to understand with concrete values.
Your file looks like this, and you loop through each line.
Name: Paul
Age: 18
Gender: Male
At the end of each line there is a "new line" character, which line.strip('\n') removes.
Then you split the values at ":".
Finally, you create a dictionary entry (line 3) where the key is the left side and the value is the right side. Because this sample has a space after each colon, the values keep that leading space:
dict['Name'] = ' Paul'
dict['Age'] = ' 18'

Basically, line.strip('\n') removes consecutive leading newlines and consecutive trailing newlines from line, but leaves embedded newlines alone; split(':') then separates the result anywhere a ":" occurs. The result is stored as a list in the variable named data.
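To make the difference between trimming the ends and removing every newline concrete, here is a small sketch. The sample string is made up for illustration.

```python
line = "\n\nName:Paul\nSmith\n"

# strip('\n') only trims newlines at the two ends of the string.
stripped = line.strip('\n')
print(repr(stripped))

# rstrip('\n') trims only the trailing end and leaves leading newlines in place.
print(repr(line.rstrip('\n')))

# replace() is what removes every newline, including embedded ones.
print(repr(line.replace('\n', '')))
```

When reading a file line by line, each line has at most one newline and it sits at the end, so strip('\n') and rstrip('\n') give the same result in that situation.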

Related

How to read file with lines of integers separated by a comma to a list without "\n"

I am new to the os module.
I've been stuck for hours on this question and I really would appreciate any type of help:)
I have a file that contains 6 lines where each line has 6 numbers separated by a comma.
What I want to do is to get all of these numbers into a list so that I can later convert them from str to int. My problem is that I can't get rid of "\n".
Here is my code
Thank you

def read_integers(path):
    content_lst=[]
    with open(path,'r') as file:
        content=file.read()
        for i in content.split('\n'):
            content_lst+=i.split(', ')
    return content_lst

Your file actually has multiple lines (in addition to the comma separated numbers). If you want all the numbers in a single "flat" list you'll need to process both line and comma separators. One way to do it simply is to replace all the end of line characters with commas so that you can use split(',') on a single string:
with open(path,'r') as file:
    content_lst = file.read().replace("\n",", ").split(", ")
You could also do it the other way around and change the commas to end of lines. Converting the values to integers can be done using the map function:
with open(path,'r') as file:
    content_lst = file.read().replace(",","\n").split()
content_lst = list(map(int,content_lst))
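As a rough sketch of an alternative that handles both separators explicitly, the function below iterates over the lines and splits each one on commas. The sample data and the temporary file are assumptions made only so the example runs on its own; the function name read_integers follows the question.

```python
import os
import tempfile

def read_integers(path):
    """Return every comma-separated number in the file as one flat list of ints."""
    numbers = []
    with open(path, 'r') as file:
        for line in file:
            for token in line.split(','):
                # Skip empty tokens from blank lines or trailing commas.
                if token.strip():
                    numbers.append(int(token))  # int() ignores surrounding whitespace
    return numbers

# Hypothetical sample data written to a temporary file for the demonstration.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("1, 2, 3\n4, 5, 6\n")
    sample_path = tmp.name

result = read_integers(sample_path)
os.remove(sample_path)
print(result)
```

Because int() tolerates surrounding whitespace, the "\n" at the end of each line does not need to be stripped separately here.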

Reading a multiline file with "&" as separator

I have a file which contains the following content:
This is the first line.
&
This is the second line
but without separator.
&
This is the third line.
...
Each line terminates with a \n. I want to convert this file input into the following list:
['This is the first line.', 'This is the second line but without separator.', 'This is the third line.', ...]
My actual code looks like:
file = open("/path/to/file", "r")
list = [line.rstrip() for line in file if not line.rstrip() is "&"]
The problem is that the multi-line section gets split into separate list items, but I want it kept together as one item, with or without a \n in it.
I hope someone can give me a hint. Thanks!

Just split the whole file on & and strip the surrounding whitespace (assuming the entries are separated only by &):
l = [s.strip().replace('\n', ' ') for s in file.read().split('&')]

Here is a working example. You already know how to read the file, here is how you might parse the contents.
file_contents = """This is the first line.
&
This is the second line
but without separator.
&
This is the third line."""
all_lines = []
for l in file_contents.split('&'):
    all_lines.append(" ".join(l.split('\n')).rstrip())
print(all_lines)
Prints:
['This is the first line.', ' This is the second line but without separator.', ' This is the third line.']

How about reading the whole file into a single string and then using str.split("&")?
with open("test.txt") as file:
    lines = file.read()
    print(lines.split("&"))
    # to remove the \n
    print(lines.replace("\n", "").split("&"))
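One possible way to get exactly the list the question asks for, with no stray leading spaces and the two-line record joined by a single space, is to normalize the whitespace inside each chunk. This is a sketch, not the only approach; the variable names are illustrative and the text is held in a string instead of being read from a file.

```python
file_contents = """This is the first line.
&
This is the second line
but without separator.
&
This is the third line.
"""

# split('&') yields one chunk per record; chunk.split() with no argument breaks
# on any whitespace (including '\n'), and ' '.join() glues the words back together.
records = [" ".join(chunk.split()) for chunk in file_contents.split("&")]
# Drop chunks that were empty after normalizing, e.g. from a trailing '&'.
records = [record for record in records if record]
print(records)
```

Note that this also collapses runs of spaces inside a record and treats any & inside the text as a separator, which may or may not be acceptable for the real data.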

Python printing lines from a file

Salutations, I am trying to write a function that prints data from a text file line by line. The output needs to have the number of the line followed by a colon and a space. I came up with the following code:
def print_numbered_lines(filename):
    """Function to print numbered lines from a list"""
    data = open(filename)
    line_number = 1
    for line in data:
        print(str(line_number)+": "+line, end=' ')
        line_number += 1
The issue is that when I run this function on test text files I created, the first line is not at the same indentation level as the rest of the lines in the output, i.e. the output looks something like this:
1: 9874234,12.5,23.0,50.0
 2: 7840231,70,60,85.4
 3: 3845913,55.5,60.5,80.0
 4: 3849511,20,60,50
Where am I going wrong? Thanks

Replace the value of the end argument with an empty string instead of a space. Because end is a space, print writes a space after every line, so the later lines start with a space.
def print_numbered_lines(filename):
    """Function to print numbered lines from a list"""
    data = open(filename)
    line_number = 1
    for line in data:
        print(str(line_number) + ": " + line, end='')
        line_number += 1
Another way to do this is to strip the newlines and print without passing any value for end. This removes the \n at the end of each line, and print adds a newline itself because end defaults to "\n".
def print_numbered_lines(filename):
    """Function to print numbered lines from a list"""
    data = open(filename)
    line_number = 1
    for line in data:
        print(str(line_number) + ": " + line.strip("\n"))
        line_number += 1

This has to do with your print statement.
print(str(line_number)+": "+line, end=' ')
You probably saw that when printing your lines there was an extra line between them and then you tried to work around this by using end=' '.
If you want to remove the 'empty' lines you should use line.strip(). This removes them.
Use this:
print(str(line_number)+": "+line.strip())
strip can also take an argument. This is from the documentation:
str.strip([chars])
Return a copy of the string with the leading and trailing characters removed. The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped.
What's up with that?
The lines in your file are not separated by nothing: each line break is an actual character. On Linux, a newline is represented by \n. Editors display it by moving the following text down to a new line.
When reading a file, Python splits lines on exactly these \n characters but doesn't throw them away. When you print a line, its \n is interpreted again, and combined with the newline that print adds, you end up with one newline too many.
The end parameter in your print statement simply changes what print will use after printing a line. Default is \n.
Check what it does when you use end=" !":
1: aaa
!2: bbb
!3: ccc
You can see the \n after 'aaa' causing a newline (which is part of the string) and after that print adds the contents of end. So it adds a !. The next line is printed in the same line because there is no other newline that would cause a line break before printing it.
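The interaction between the newline already in each line and the end parameter can be checked by capturing what print writes. The sketch below uses io.StringIO as the output target and end="!" (without a space); the sample lines are assumed to each end with \n, as lines read from a file normally do.

```python
import io

lines = ["aaa\n", "bbb\n", "ccc\n"]  # lines as read from a file, newline included

buffer = io.StringIO()
for number, line in enumerate(lines, start=1):
    # Each line already ends with '\n'; print then appends the value of end.
    print(str(number) + ": " + line, end="!", file=buffer)

output = buffer.getvalue()
print(repr(output))
```

The repr shows each "!" landing right after a "\n", which is why it appears at the start of the following line when the text is displayed.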

You specified the end argument as a space, so every line after the first starts with that extra space.
A line that you read from the file looks something like this:
'9874234,12.5,23.0,50.0\n'
Look at the ending: the line break in the output comes from the \n in the original line.
So to get what you want, you just need to change the end argument of print to an empty string (not a space).
Moreover, I advise you to change the implementation of the function and use enumerate for line numbering.
def print_numbered_lines(filename):
    data = open(filename)
    for i, line in enumerate(data):
        print(str(i+1)+": "+line, end='')
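A slightly tidier variant, offered as a sketch: enumerate accepts a start argument, so the +1 is not needed, and a with block closes the file when the loop ends. The helper numbered_lines is a hypothetical name introduced here so the formatting can be tried on a plain list without a file.

```python
def numbered_lines(lines):
    """Yield each line prefixed with its 1-based number, without a trailing newline."""
    for number, line in enumerate(lines, start=1):
        yield "{}: {}".format(number, line.rstrip("\n"))

def print_numbered_lines(filename):
    """Print the numbered lines of a file; 'with' closes the file afterwards."""
    with open(filename) as data:
        for text in numbered_lines(data):
            print(text)

# Sample lines taken from the question, used in place of a real file.
sample = ["9874234,12.5,23.0,50.0\n", "7840231,70,60,85.4\n"]
formatted = list(numbered_lines(sample))
print(formatted)
```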

Slice strings in .txt and return only one of the new strings

I want to use lines of strings of a .txt file as search queries in other .txt files. But before this, I need to slice those strings of the lines of my original text data. Is there a simple way to do this?
This is my original .txt data:
CHEMBL2057820|MUBD_HDAC2_ligandset|mol2|42|dock12
CHEMBL1957458|MUBD_HDAC2_ligandset|mol2|58|dock10
CHEMBL251144|MUBD_HDAC2_ligandset|mol2|41|dock98
CHEMBL269935|MUBD_HDAC2_ligandset|mol2|30|dock58
... (over thousands)
And I need to have a new file where the new lines contain only part of those strings, like:
CHEMBL2057820
CHEMBL1957458
CHEMBL251144
CHEMBL269935

Open the file, read in the lines, split each line at the | character, and then take the first element of the result:
with open("test.txt") as f:
    parts = (line.lstrip().split('|', 1)[0] for line in f)
    with open('dest.txt', 'w') as dest:
        dest.write("\n".join(parts))
Explanation:
lstrip() removes whitespace from the start of the line
split("|") returns a list like: ['CHEMBL2057820', 'MUBD_HDAC2_ligandset', 'mol2', '42', 'dock12'] for each line
Since we're only concerned with the first section, it's redundant to split the rest of the line on the | character, so we can specify a maxsplit argument, which stops splitting the string after it has encountered that many separators
So split("|", 1)
gives ['CHEMBL2057820', 'MUBD_HDAC2_ligandset|mol2|42|dock12']
Since we're only interested in the first part split("|", 1)[0] returns
the "CHEMBL..." section
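The effect of the maxsplit argument can be seen on a single line from the question. The sketch also shows str.partition, a related built-in method that is not part of the answer above but returns the same first section.

```python
line = "CHEMBL2057820|MUBD_HDAC2_ligandset|mol2|42|dock12\n"

all_fields = line.strip().split("|")        # splits at every '|'
first_two = line.strip().split("|", 1)      # stops after the first '|'
identifier = first_two[0]

# partition() is a related str method that returns (head, separator, tail).
head, _, tail = line.strip().partition("|")

print(all_fields)
print(first_two)
print(identifier, head)
```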

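Use split and readlines:
with open('foo.txt') as f:
    g = open('bar.txt', 'w')  # open the output file for writing
    lines = f.readlines()
    for line in lines:
        l = line.strip().split('|')[0]
        g.write(l + '\n')  # strip() removed the newline, so add it back
    g.close()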

CSV File to list

I have a CSV file which is made of words in the first column. (1 word per row)
I need to print a list of these words, i.e.
CSV File:
a
and
because
have
Output wanted:
"a","and","because","have"
I am using Python and so far I have the following code:
text=open('/Users/jessieinchauspe/Dropbox/Smesh/TMT/zipf.csv')
text1 = ''.join(ch for ch in text)
for word in text1:
    print '"' + word + '"' +','
This is returning:
"a",
"",
"a",
"n",
...
Whereas I need everything on one line, and split by word rather than by character.
Thank you for your help!

Just loop over the file directly:
with open('/Users/jessieinchauspe/Dropbox/Smesh/TMT/zipf.csv') as text:
    print ','.join('"{0}"'.format(word.strip()) for word in text)
The above code:
Loops over the file; this gives you a line (including the newline \n character).
Uses .strip() to remove whitespace around the word (including the newline).
Uses .format() to put the word in quotes ('word' becomes '"word"')
Uses ','.join() to join all quoted words together into one list with commas in between.
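The same four steps can be sketched with Python 3's print() function (the answer above uses the Python 2 print statement). The file is simulated with io.StringIO, and skipping blank lines is an extra assumption not present in the original answer.

```python
import io

# Stand-in for the opened CSV file (one word per line).
text = io.StringIO("a\nand\nbecause\nhave\n")

# Strip each line, skip blank ones, wrap each word in double quotes, join with commas.
quoted = ",".join('"{0}"'.format(word.strip()) for word in text if word.strip())
print(quoted)
```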

When you do:
text=open('/Users/jessieinchauspe/Dropbox/Smesh/TMT/zipf.csv')
that basically returns an iterator with each line as an element. If you want a list out of that and you're sure that there is only one word per line, then all you need to do is the following (note that each element keeps its trailing \n):
result=list(text)
print result
Otherwise you can get the first words only like so:
result = list(x.split(',')[0] for x in text)
print result

You could also use the CSV module:
import csv
input_f = '/Users/jessieinchauspe/Dropbox/Smesh/TMT/zipf.csv'
output_f = '/Users/jessieinchauspe/Dropbox/Smesh/TMT/output.csv'
with open(input_f, 'r') as input_handle, open(output_f, 'w') as output_handle:
    writer = csv.writer(output_handle)
    writer.writerow(list(input_handle))

If you put a comma at the end of the print statement it suppresses the newline.
print '"' + word + '"' +',',
Will give you the output on one line.

print ','.join('"%s"' % line.strip() for line in open('/tmp/test'))
