Python split rows in txt file - python

I have a .txt file with many rows in the following format:
3AD544532F-272|5SD32332S-F72|5FD2124L-Y21|4WA32332P-A26|6DW3224C-I72
(...)
How can I split these rows on the vertical bar | so that I get an output .txt file with one item per line, like this:
3AD544532F-272-14
5SD32332S-F72-12
5FD2124L-Y21-41
4WA32332P-A26-17
6DW3224C-I72-41
I tried this script, but it does not produce the correct result:
import sys
output = open("output_list.txt","w")
print("read line and put it into list or array as you like to call it")
list = open("list.txt").read().splitlines()
for i in list:
    re.compiler()
input()

If you have a string like this: 3AD544532F-272|5SD32332S-F72|5FD2124L-Y21|4WA32332P-A26|6DW3224C-I72
you can split it on | with the built-in str.split() method:
>>> line = "3AD544532F-272|5SD32332S-F72|5FD2124L-Y21|4WA32332P-A26|6DW3224C-I72"
>>> your_splitted_line = line.split("|")
>>> print(your_splitted_line)
['3AD544532F-272', '5SD32332S-F72', '5FD2124L-Y21', '4WA32332P-A26', '6DW3224C-I72']
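To go from a single string to the whole file, the same split can be applied row by row. This is only a minimal sketch: split_rows is a hypothetical helper name, and io.StringIO objects stand in for list.txt and output_list.txt so the example runs without any files. The numeric suffixes shown in the question's desired output (-14, -12, ...) do not appear in the input, so the sketch does not try to reproduce them.

```python
import io

def split_rows(rows, sep="|"):
    """Yield every non-empty item from an iterable of sep-delimited rows."""
    for row in rows:
        for item in row.strip().split(sep):
            if item:
                yield item

# Stand-ins for open("list.txt") and open("output_list.txt", "w").
source = io.StringIO(
    "3AD544532F-272|5SD32332S-F72|5FD2124L-Y21\n"
    "4WA32332P-A26|6DW3224C-I72\n"
)
target = io.StringIO()

# Write one item per line to the output.
for item in split_rows(source):
    target.write(item + "\n")

print(target.getvalue(), end="")
```

With real files, the two StringIO objects would be replaced by file objects opened in a with block.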

split() is indeed the best solution. An alternative using a regular expression is:
>>> import re
>>> text = "3AD544532F-272|5SD32332S-F72|5FD2124L-Y21|4WA32332P-A26|6DW3224C-I72"
>>> for i in re.split(r'\|', text):
...     print(i)
...
3AD544532F-272
5SD32332S-F72
5FD2124L-Y21
4WA32332P-A26
6DW3224C-I72

Related

Get the full word(s) by knowing only just a part of it

I am searching through a text file line by line, and I want to get back all strings that contain the prefix AAAXX1234. For example, my text file has these lines:
Hello my ID is [123423819::AAAXX1234_3412] #I want that(AAAXX1234_3412)
Hello my ID is [738281937::AAAXX1234_3413:AAAXX1234_4212] #I want both of them(AAAXX1234_3413, AAAXX1234_4212)
Hello my ID is [123423819::XXWWF1234_3098] #I don't care about that
The code I have only checks whether the line starts with "Hello my ID is":
with open(file_hrd,'r',encoding='utf-8') as hrd:
    hrd=hrd.readlines()
    for line in hrd:
        if line.startswith("Hello my ID is"):
            pass  # do something
Try this:
import re
with open(file_hrd,'r',encoding='utf-8') as hrd:
    res = []
    for line in hrd:
        res += re.findall(r'AAAXX1234_\d+', line)
print(res)
Output:
['AAAXX1234_3412', 'AAAXX1234_3413', 'AAAXX1234_4212']
I'd suggest parsing your lines and extracting the information into meaningful parts. That way, you can use a simple startswith on the ID part of the line. This also lets you control where you look for these prefixes, e.g. in case the lines contain additional data that could theoretically also contain something that looks like an ID.
Something like this:
if line.startswith('Hello my ID is '):
    idx_start = line.index('[')
    idx_end = line.index(']', idx_start)
    idx_separator = line.index(':', idx_start, idx_end)
    num = line[idx_start + 1:idx_separator]
    ids = line[idx_separator + 2:idx_end].split(':')
    print(num, ids)
This would give you the following output for your three example lines:
123423819 ['AAAXX1234_3412']
738281937 ['AAAXX1234_3413', 'AAAXX1234_4212']
123423819 ['XXWWF1234_3098']
With that information, you can then check the ids for a prefix:
if any(x.startswith('AAAXX1234') for x in ids):
    print('do something')
Using regular expressions through the re module and its findall() function should be enough:
import re
with open('file.txt') as file:
    prefix = 'AAAXX1234'
    lines = file.read().splitlines()
    output = list()
    for line in lines:
        output.extend(re.findall(rf'{prefix}_\d+', line))
You can do it with findall and the regex r'AAAXX1234_[0-9]+'. It finds every part of the string that starts with AAAXX1234_ and then grabs all of the digits after it. Change + to * if you want it to match 'AAAXX1234_' on its own as well.
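The difference between the two quantifiers is easiest to see on a line where one ID has no digits after the underscore. The line below is a made-up example, not taken from the question's file:

```python
import re

# Hypothetical line: the second ID has no digits after the underscore.
line = "Hello my ID is [738281937::AAAXX1234_3413:AAAXX1234_]"

with_plus = re.findall(r"AAAXX1234_[0-9]+", line)  # requires at least one digit
with_star = re.findall(r"AAAXX1234_[0-9]*", line)  # digits are optional

print(with_plus)  # ['AAAXX1234_3413']
print(with_star)  # ['AAAXX1234_3413', 'AAAXX1234_']
```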

How to delete everything from string up to the specific character in Python

I want to extract only the date from the following string. Here is the variable:
file = '62-201809.csv'
I used rsplit to get rid of the .csv extension like this:
splitf = file.rsplit('.', 1)[0]
I got 62-201809, so that's okay, but now I need to get rid of everything up to the '-' and store only 201809 in a variable. How do I do it?
Try using:
>>> file = '62-201809.csv'
>>> file.split('-', 1)[1].split('.')[0]
'201809'
>>>
Or use regex:
>>> import re
>>> file = '62-201809.csv'
>>> re.search(r'-(\d+)', file).group(1)
'201809'
>>>
If you want to use only split, you can do this:
filen = '62-201809.csv'
number = filen.split('.')[0]
number2 = number.split('-')[1]
print(number2)
The first split drops the extension and leaves 62-201809; the second split keeps only the part after the hyphen, 201809.
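As a further variation, pathlib and str.partition can do the same job without indexing into split results. This is a minimal sketch; date_part is a hypothetical helper name, and it assumes the date is whatever follows the first hyphen in the file name.

```python
from pathlib import PurePath

def date_part(filename: str) -> str:
    """Return the text after the first '-' in the name, without the extension."""
    stem = PurePath(filename).stem        # '62-201809.csv' -> '62-201809'
    _, _, after = stem.partition("-")     # split once at the first hyphen
    return after                          # empty string if there is no hyphen

print(date_part("62-201809.csv"))  # 201809
```

Unlike split('-')[1], partition does not raise an IndexError when the name has no hyphen; it returns an empty string instead.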

Get list from string with exec in python

I have:
"[15765,22832,15289,15016,15017]"
I want:
[15765,22832,15289,15016,15017]
What should I do to convert this string to a list?
P.S. The post was edited without my permission and lost an important part: the line that looks like a list has type 'bytes'. It is not a string.
P.S. №2. My initial code was:
import urllib.request, re
f = urllib.request.urlopen("http://www.finam.ru/cache/icharts/icharts.js")
lines = f.readlines()
for line in lines:
    m = re.match(r'var\s+(\w+)\s*=\s*\[\s*(.+)\s*\];', line.decode('windows-1251'))
    if m is not None:
        varname = m.group(1)
        if varname == "aEmitentIds":
            aEmitentIds = line #its type is 'bytes', not 'string'
I need to get a list from line.
The line from the web page looks like this:
[15765, 22832, 15289, 15016, 15017]
Assuming s is your string, you can just use split and then cast each number to integer:
s = [int(number) for number in s[1:-1].split(',')]
For details on the split function, see the Python 3 split documentation.
What you have is a stringified list. You could use a json parser to parse that information into the corresponding list
import json
test_str = "[15765,22832,15289,15016,15017]"
l = json.loads(test_str) # List that you need.
Or another way to do this would be to use ast
import ast
test_str = "[15765,22832,15289,15016,15017]"
data = ast.literal_eval(test_str)
The result is
[15765, 22832, 15289, 15016, 15017]
To understand why using eval() is bad practice you could refer to this answer
You can also use a regex to pull the numeric values out of the string as follows:
import re
lst = "[15765,22832,15289,15016,15017]"
lst = [int(number) for number in re.findall(r'\d+', lst)]
The output of the above code is:
[15765, 22832, 15289, 15016, 15017]
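Since the question notes that the value is actually bytes, one option is to decode it first and then hand the bracketed part to json.loads. This is only a sketch: list_from_bytes is a hypothetical helper, the windows-1251 encoding is taken from the question's own decode call, and it assumes the array contains only JSON-compatible values such as plain numbers.

```python
import json

def list_from_bytes(line: bytes, encoding: str = "windows-1251") -> list:
    """Decode a bytes line and parse the first [...] span it contains."""
    text = line.decode(encoding)
    start = text.index("[")           # first opening bracket
    end = text.rindex("]") + 1        # last closing bracket, inclusive
    return json.loads(text[start:end])

# Works on the bare list and on a full "var name = [...];" line.
print(list_from_bytes(b"[15765,22832,15289,15016,15017]"))
print(list_from_bytes(b"var aEmitentIds = [15765, 22832, 15289];\n"))
```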

How to extract a floating number from a string and add it using simple operation on python

I have a file named ping.txt that holds the times taken to ping an IP address n times.
My ping.txt contains:
time=35.9
time=32.4
I have written Python code that uses a regular expression to extract just these floating-point numbers and add them up. But I feel that the code below is an indirect way of completing my task. The findall regex I am using here outputs a list, which is then joined, converted, and added.
import re
add,tmp=0,0
with open("ping.txt","r+") as pingfile:
    for i in pingfile.readlines():
        tmp=re.findall(r'\d+\.\d+',i)
        add=add+float("".join(tmp))
print("The sum of the times is :",add)
My question is: how can I solve this problem without regex, or otherwise reduce the number of lines in my code to make it more efficient?
In other words, can I use a different regex or some other method to do this operation?
You can use the following:
with open('ping.txt', 'r') as f:
    s = sum(float(line.split('=')[1]) for line in f)
Output:
>>> with open('ping.txt', 'r') as f:
...     s = sum(float(line.split('=')[1]) for line in f)
...
>>> s
68.3
Note: I assume each line of your file contains time=some_float_number.
You could do it like this:
import re
with open("ping.txt") as pingfile:
    total = sum(float(s) for s in re.findall(r'\d+(?:\.\d+)?', pingfile.read()))
If you have the string:
>>> s='time=35.9'
Then to get the value, you just need:
>>> float(s.split('=')[1])
35.9
You don't need regular expressions for something with a simple delimiter.
You can use str.split to split each line at '=' and append the values to a list. At the end, call the sum function to add up the elements of the list:
temp = []
with open("ping.txt","r+") as pingfile:
    for i in pingfile.readlines():
        temp.append(float(i.split('=')[1]))
print("The sum of the times is :",sum(temp))
Use this regular expression:
tmp = re.findall(r"[0-9]+\.[0-9]+", i)
Then loop over the matches:
total = 0
for each in tmp:
    total = total + float(each)
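If the file may also contain blank lines or lines that are not of the form time=value, a split-based sum can skip them instead of raising. The following is a minimal sketch under that assumption; total_time is a hypothetical helper, and io.StringIO stands in for the opened ping.txt so the example runs on its own.

```python
import io

def total_time(lines):
    """Add up the values of all 'time=<float>' lines, ignoring anything else."""
    total = 0.0
    for line in lines:
        key, sep, value = line.partition("=")
        if sep and key.strip() == "time":
            total += float(value)   # float() tolerates the trailing newline
    return total

# Stand-in for open("ping.txt"), with a blank line and an unrelated line.
sample = io.StringIO("time=35.9\n\nttl=64\ntime=32.4\n")
print(total_time(sample))
```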

Extraction from python over multiple lines

I'm working with python and am trying to extract numbers from a .txt file and then group them into multiple categories. The .txt file looks like this:
IF 92007<=ZIPCODE<=92011 OR ZIPCODE=92014 OR ZIPCODE=92024
OR 92054<=ZIPCODE<=92058 OR ZIPCODE=92067 OR ZIPCODE=92075
OR ZIPCODE=92083 OR ZIPCODE=92084 OR ZIPCODE=92091 OR ZIPCODE=92672
OR ZIPCODE=92081 THEN REGION=1; ** N COASTAL **;
This code was used to extract numbers from the first line:
import re
TXTPATH = 'C:/zipcode_mapping.txt'
f = open(TXTPATH,'r')
expr= "IF 92007<=ZIPCODE<=92011 OR ZIPCODE=92014 OR ZIPCODE=92024"
for line in f:
    L = line
    print(L)
    matches = re.findall("([0-9]{5})",expr)
    for match in matches:
        print(match)
I can't seem to pull out the numbers from the other lines though. Any suggestions?
Just do:
matches = re.findall("([0-9]{5})",f.read())
You can extract them all at once - no need to loop over lines.
Don't you just need to change 'expr' to 'L'?
matches = re.findall("([0-9]{5})",L)
Maybe I'm being naive, but shouldn't you search for numbers in L, instead of in expr?
matches = re.findall("([0-9]{5})", L)
                                   ^
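Either fix can be checked on the sample rule from the question without touching the file. In the sketch below, the text variable stands in for what f.read() would return; the \b word boundaries and the REGION lookup are additions for illustration, not part of the answers above.

```python
import re

# The sample rule from the question, as f.read() would return it.
text = """IF 92007<=ZIPCODE<=92011 OR ZIPCODE=92014 OR ZIPCODE=92024
OR 92054<=ZIPCODE<=92058 OR ZIPCODE=92067 OR ZIPCODE=92075
OR ZIPCODE=92083 OR ZIPCODE=92084 OR ZIPCODE=92091 OR ZIPCODE=92672
OR ZIPCODE=92081 THEN REGION=1; ** N COASTAL **;"""

# \b keeps the pattern from matching five digits inside a longer number.
zipcodes = re.findall(r"\b[0-9]{5}\b", text)
region = re.search(r"REGION=([0-9]+)", text)

print(len(zipcodes), zipcodes[0], zipcodes[-1])  # 13 92007 92081
print(region.group(1))                           # 1
```

Note that a range such as 92007<=ZIPCODE<=92011 contributes only its two endpoints to the list; expanding ranges into every ZIP code in between would need an extra step.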
