Using regex to find multiple occurrences in a file in Python

I'm trying the following:
myfile.txt has the following content. I want to extract the data between each 'abc start' and 'abc end' using a regular expression in Python. Thanks for the help.
abc start
1
2
3
4
abc end
5
6
7
abc start
8
9
10
abc end
Expected output: 1 2 3 4 8 9 10

import re
with open('myfile.txt') as f:
    txt = f.read()
# re.DOTALL lets '.' match newlines, so each match can span several lines
strings = re.findall(r'abc start\n(.+?)\nabc end', txt, re.DOTALL)
# to transform to your output..
result = []
for s in strings:
    result += s.split('\n')
print(result)
# ['1', '2', '3', '4', '8', '9', '10']

Using regex:
import re
string = ''
with open('file.txt', 'r') as f:
    for i in f.readlines():
        string += i.strip() + ' '
# no f.close() needed: the with block closes the file
exp = re.compile(r'abc start(.+?)abc end')
result = [[int(j) for j in i.split()] for i in exp.findall(string)]
print(result)
# [[1, 2, 3, 4], [8, 9, 10]]
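A possible variation on the answers above, as a rough sketch: anchoring the markers to whole lines with re.MULTILINE, so that a line such as 'xabc start' would not be picked up by accident. The sample text is inlined here in place of myfile.txt, and the names text, block_pattern and values are made up for this illustration.

```python
import re

# Sample text mirroring the question's myfile.txt (inlined so the sketch runs as-is)
text = "abc start\n1\n2\n3\n4\nabc end\n5\n6\n7\nabc start\n8\n9\n10\nabc end\n"

# ^ and $ anchor to line boundaries under re.MULTILINE;
# re.DOTALL lets the lazy group span several lines.
block_pattern = re.compile(r"^abc start\n(.*?)^abc end$", re.MULTILINE | re.DOTALL)

values = []
for match in block_pattern.finditer(text):
    # Each captured block is a run of lines; split() breaks it on whitespace
    values.extend(match.group(1).split())

print(" ".join(values))
```

With the sample text this prints 1 2 3 4 8 9 10, matching the output the question asks for.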

Related

Getting a nested list with "" in the outer list

After opening and reading an input file, I'm trying to split the input on different characters. This works well, although I seem to be getting a nested list which I don't want. My list does not look like [[list]], but like ["[list]"]. What did I do wrong here?
The input looks like this:
name1___1 2 3 4 5
5=20=22=10=2=0=0=1=0=1something,something
name2___1 2 3 4
2=30=15=8=4=3=2=0=0=0;
The output looks like this:
["['name1", '', '', "1 2 3 4 5', 'name2", '', '', "1 2 3 4']"]
Here is my code:
file = open("file.txt")
input_of_this_file = file.read()
a = input_of_this_file.split("\n")
b = a[0::2] # so i get only the even lines
c = str(b) # to make it a string so the .strip() works
d = c.strip() # because there were whitespaces
e = d.split("_")
print e
If I then do:
x = e[0]
I get:
['name1
This removes the outer list, but also removes the last ].
I would like it to look like: name1, name2
so that I only get the names.
Use itertools.islice and a list comprehension.
>>> from itertools import islice
>>> with open("tmp.txt") as f:
...     [line.rstrip("\n").split("_") for line in islice(f, None, None, 2)]
...
[['name1', '', '', '1 2 3 4 5'], ['name2', '', '', '1 2 3 4']]
Keeping your code syntax without imports:
c=[]
input_of_file = '''name1___1 2 3 4 5
5=20=22=10=2=0=0=1=0=1something,something
name2___1 2 3 4
2=30=15=8=4=3=2=0=0=0;'''
a = input_of_file.split("\n")
b = a[::2]
for item in b:
    new_item = item.split('__')
    c.append(new_item)
Results:
c = [['name1', '_1 2 3 4 5'], ['name2', '_1 2 3 4']]
c[0][0] = 'name1'
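If only the names are wanted, as the question says, one option is to keep just the text before the first underscore on every other line. This is a minimal sketch rather than either answerer's method; the input is inlined, and the variable names raw_text, lines and names are invented for the example.

```python
# Sample input copied from the question, inlined so the sketch is self-contained
raw_text = """name1___1 2 3 4 5
5=20=22=10=2=0=0=1=0=1something,something
name2___1 2 3 4
2=30=15=8=4=3=2=0=0=0;"""

lines = raw_text.split("\n")

# Every other line starts with a name; partition() keeps the text before the first "_"
names = [line.partition("_")[0] for line in lines[::2]]

print(", ".join(names))
```

Working on the list of lines directly avoids the str(b) step, which is what turned the whole list into one string in the original code.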

read txt file input and add values to two arrays

A B C D
2 4 5 6
4 5 3 7
3 6 7 8
I want to put the A, B, C column values into one array (3 x 3) and the D column into another array (3 x 1).
Simple brute-force method:
a33 = [[], [], []]
a31 = []
with open('dat.txt') as f:
    for ln in f:
        a, b, c, d = ln.split()
        # append keeps each value whole; += would split a multi-character string into characters
        a33[0].append(a)
        a33[1].append(b)
        a33[2].append(c)
        a31.append(d)
print a33
print a31
[['2', '4', '3'], ['4', '5', '6'], ['5', '3', '7']]
['6', '7', '8']
import numpy as np
# Read the data from a file
with open('data.txt') as file:
    lines = file.readlines()
# Chop off the header row
raw_data = lines[1:]
# Now fetch all the data
data_abc = []
data_d = []
for line in raw_data:
    values = line.split()
    data_abc.append(values[:3])
    data_d.append(values[3])
# Convert to matrix
data_abc = np.asmatrix(data_abc)
data_d = np.asmatrix(data_d)
# Display the result
print('Data A B C:', data_abc)
print('Data D:', data_d)
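Both answers above keep the values as strings. A plain-Python sketch that skips the header and converts to integers might look like the following; it assumes the file starts with the "A B C D" header row and that every data row has exactly four whitespace-separated integers. The io.StringIO object stands in for the real file, and abc_rows and d_column are names chosen for this example.

```python
import io

# Stand-in for the data file; io.StringIO behaves like an open text file
sample_file = io.StringIO("A B C D\n2 4 5 6\n4 5 3 7\n3 6 7 8\n")

next(sample_file)  # skip the "A B C D" header row

abc_rows = []  # one [A, B, C] list per data row (3 x 3 here)
d_column = []  # one [D] list per data row (3 x 1 here)
for line in sample_file:
    a, b, c, d = (int(value) for value in line.split())
    abc_rows.append([a, b, c])
    d_column.append([d])

print(abc_rows)
print(d_column)
```

Note that this stores the data row by row, whereas the brute-force answer collects one list per column.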

How to edit the *.txt or *.dat file information in Python?

I am a complete beginner in Python and have the following problem. I would be glad if you could help me.
I have a *.dat file (let's call it file-1; the first row is just a header which I use only here to mark the columns) which looks like:
1 2 3 4 5 6
6 5 -1000 "" "" ""
6 5 -1000 "" "" ""
6 5 -1000 "" "" ""
6 5 -1000 "" "" ""
6 5 -1000 "" "" ""
6 5 -1000 "" "" ""
6 5 -1000 "" "" ""
I need it to be like (file-1 (converted)):
6 5 1 -1000
6 5 1 -1000
6 5 2 -1000
6 5 3 -1000
6 5 3 -1000
6 5 3 -1000
6 5 3 -1000
So, file-1 has 9 rows (7 with information and 2 empty) and 6 columns, and I have to do the following:
Delete the last 3 columns in the file-1.
Add 1 new column that will take place between the columns 2 and 3.
The value of this new column should be increased by 1 unit (like '+= 1') after passing the empty line.
Delete all the empty lines. The result is represented as the 'file-1 (converted)'.
I've tried to do this but got stuck. So far I have:
import sys
import csv
with open("file-1.dat", "r", newline="") as f:
    sys.stdout = open('%s2 (converted).txt' % f.name, 'a')
    incsv = csv.reader(f, delimiter="\t")
    for row in incsv:
        if len(row) == 6:
            i = 0
            row = row[0:3]
            row.insert(2, i)
            print(row)
and it looks like:
['6', '5', 0, '-1000']
['6', '5', 0, '-1000']
['6', '5', 0, '-1000']
['6', '5', 0, '-1000']
['6', '5', 0, '-1000']
['6', '5', 0, '-1000']
['6', '5', 0, '-1000']
I don't yet know how to change 0 to 1, 2 and so on, so that it looks like:
['6', '5', 0, '-1000']
['6', '5', 0, '-1000']
['6', '5', 1, '-1000']
['6', '5', 2, '-1000']
['6', '5', 2, '-1000']
['6', '5', 2, '-1000']
['6', '5', 2, '-1000']
And the result should be like the 'file-1 (converted)' file.
P.S. All the examples are simplified, real file has a lot of rows and I don't know where the empty lines appear.
P.P.S. Sorry for such a long post, hope, it makes sense. Ask, suggest - I would be really glad to see other opinions) Thank you.
Seems like you're almost there. You're just inserting i = 0 every time instead of the count of empty rows. Try something like:
import sys
import csv
with open("file-1.dat", "r", newline="") as f:
    sys.stdout = open('%s2 (converted).txt' % f.name, 'a')
    incsv = csv.reader(f, delimiter="\t")
    empties = 0  # init empty row counter
    for row in incsv:
        if len(row) == 6:
            row = row[0:3]
            row.insert(2, empties)  # insert number of empty rows
            print(row)
        else:
            empties += 1  # if row is empty, increase counter
This is a bit different, without using the csv module. Hope this helps. :)
import sys
count = 0
with open("file-1.dat", "r") as f:
    sys.stdout = open('%s2 (converted).txt' % f.name, 'a')
    for line in f:
        converted_line = line.split()[:-3]  # split each line and remove the last 3 columns
        if not converted_line:  # if list/line is empty
            count += 1  # increase count but DO NOT PRINT / WRITE TO FILE
        else:
            converted_line.insert(2, str(count))  # insert between 2nd and 3rd column
            print('\t'.join(converted_line))  # join them and print them with tab delimiter
You need to increment i on every empty line:
import sys
import csv
with open("file-1.dat", "r") as f:
    sys.stdout = open('%s2 (converted).txt' % f.name, 'a')
    incsv = csv.reader(f, delimiter="\t")
    incsv.next()  # ignore first line (Python 2; use next(incsv) in Python 3)
    i = 0
    for row in incsv:
        if len(row) == 6:
            row = row[0:3]
            row.insert(2, i)
            print(row)
        elif len(row) == 0:
            i += 1
Also, I couldn't execute your code on my machine (with Python 2.7.6), so I changed the code to run with Python 2.x.
Edit: I see your code runs with Python 3.x.
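All three answers redirect sys.stdout to produce the output file. A sketch of the same counting idea that writes through csv.writer instead is shown below. It is an illustration under assumptions, not a drop-in replacement: the io.StringIO objects stand in for file-1.dat and the converted file, the position of the blank line in the sample is invented, and the counter starts at 0 as in the answers above.

```python
import csv
import io

# Stand-in for file-1.dat: tab-separated rows, with a blank line between groups.
# The placement of the blank line here is an assumption for illustration.
sample = io.StringIO(
    '6\t5\t-1000\t""\t""\t""\n'
    '\n'
    '6\t5\t-1000\t""\t""\t""\n'
    '6\t5\t-1000\t""\t""\t""\n'
)
output = io.StringIO()  # stand-in for the converted output file

reader = csv.reader(sample, delimiter="\t")
writer = csv.writer(output, delimiter="\t", lineterminator="\n")

empties = 0  # number of blank lines seen so far
for row in reader:
    if not row:  # csv.reader yields an empty list for a blank line
        empties += 1
        continue
    new_row = row[:3]           # drop the last three columns
    new_row.insert(2, empties)  # new column between columns 2 and 3
    writer.writerow(new_row)

print(output.getvalue(), end="")
```

Writing through a writer object leaves sys.stdout untouched, so later print calls in the same program still go to the console.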

How to add inputted numbers in a string (Python)

How do I add up all the numbers entered in a string?
Ex:
input:
5 5 3 5
output
18
and it must support negative numbers ('-')
Ex.
input
-5 5 3 5
output
8
I wrote something like this:
x = raw_input()
print sum(map(int,str(x)))
and it adds up correctly if x > 0.
But what do I do about the '-'?
I understand that I need to use split(), but I don't know enough to make it work.
You're close, you just need to split the string on spaces. Splitting will produce the list of strings ['-5', '5', '3', '5']. Then you can do the rest of the map and sum as you intended.
>>> s = '-5 5 3 5'
>>> sum(map(int, s.split()))
8
It's simple:
>>> input = raw_input('Enter your input: ')
Enter your input: 5 5 10 -10
>>> list_numbers = [int(item) for item in input.split(' ')]
>>> print list_numbers
[5, 5, 10, -10]
After that, do whatever you want with the list :)
You can use the following line:
sum(map(int, raw_input().split()))
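The answers above use raw_input and the print statement, which are Python 2. A small Python 3 sketch of the same idea, wrapped in a helper so it can be tried without typing input, could look like this. The function name sum_numbers is made up for the example; in a real script the argument would come from input().

```python
def sum_numbers(text):
    """Sum whitespace-separated integers; int() handles a leading '-'."""
    return sum(int(token) for token in text.split())


print(sum_numbers("5 5 3 5"))
print(sum_numbers("-5 5 3 5"))
```

The two calls print 18 and 8, the outputs given in the question. Calling split() with no argument also tolerates repeated spaces, which split(' ') does not.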

Python, How to use line from file as key in dictionary, use next line for value?

I have a file like this:
Ben
0 1 5 2 0 1 0 1
Tim
3 2 1 5 4 0 0 1
I would like to make a dictionary that looks like this:
{Ben: 0 1 5 2 0 1 0 1, Tim : 3 2 1 5 4 0 0 1}
so I was thinking something like:
for line in file:
    dict[line] = line + 1
but you can't iterate through a file like that, so how would I go about
doing this?
This might be what you want:
dict_data = {}
with open('data.txt') as f:
    for key in f:
        dict_data[key.strip()] = next(f).split()
print dict_data
Output:
{'Tim': ['3', '2', '1', '5', '4', '0', '0', '1'], 'Ben': ['0', '1', '5', '2', '0', '1', '0', '1']}
Discussion
The for loop assumes each line it sees is a key; the following line is read inside the body of the loop.
key.strip() will turn 'Tim\n' into 'Tim'.
next(f) reads and returns the next line, that is, the line after the key line.
next(f).split() therefore splits that line into a list.
dict_data[key.strip()] = ... will do something like: dict_data['Tim'] = [ ... ]
Update
Thanks to Blckknght for the pointer. I changed f.next() to next(f).
Update 2
If you want to turn the list into a list of integers instead of string, then instead of:
dict_data[key.strip()] = next(f).split()
Do this:
dict_data[key.strip()] = [int(i) for i in next(f).split()]
state = 0
d = {}
for line in file:
    if state == 0:
        key = line.strip()
        state = 1
    elif state == 1:
        d[key] = line.split()
        state = 0
I think the easiest approach is to first load the full file with file.readlines(), which reads the whole file and returns a list of the lines. Then you can create your dictionary by passing a generator expression to dict():
lines = my_file.readlines()
my_dict = dict(lines[i:i+2] for i in range(0, len(lines), 2))
For your example file, this will give my_dict the contents:
{"Ben\n": "0 1 5 2 0 1 0 1\n", "Tim\n": "3 2 1 5 4 0 0 1\n"}
An alternative approach would be to use a while loop that reads two lines at a time:
my_dict = {}
while True:
    name = file.readline().strip()
    if not name:  # detect the end of the file, where readline returns ""
        break
    numbers = [int(n) for n in file.readline().split()]
    my_dict[name] = numbers
This approach lets you process the lines more easily than the one-line version above, for example by stripping newlines and splitting the line of numbers into a list of actual int objects.
The result for the example file would be:
{"Ben": [0, 1, 5, 2, 0, 1, 0, 1], "Tim": [3, 2, 1, 5, 4, 0, 0, 1]}
