I have two lists of lists as follows:
bounding_time = [['58'], ['68']]
v = [['-0.00162439495203'], ['-0.000178892778126'],]
where 58 in bounding_time corresponds to the first item in v, and 68 to the second. I am trying to write to a file in such a way that I get
58 -0.00162439495203
68 -0.000178892778126
However, with my code, which is:
for bt_new in bounding_time:
    bt = ''.join(map(str, bt_new))
    print bt
for v_new in v[0]:
    print v_new
I am getting
58
68
['-0.00162439495203']['-0.000178892778126']
Is there a way to format these lists to the desired output?
First set up the data:
>>> bounding_time = [['58'], ['68']]
>>> v = [['-0.00162439495203'], ['-0.000178892778126'],]
Now, use zip to iterate over both lists in parallel, accessing the first item of each sublist, and concatenating those strings with a space between them:
>>> for i, j in zip(bounding_time, v):
...     print i[0] + ' ' + j[0]
...
58 -0.00162439495203
68 -0.000178892778126
Or you can ensure you have the same width for the first column with str.ljust.
>>> for i, j in zip(bounding_time, v):
...     print i[0].ljust(6) + j[0]
...
58    -0.00162439495203
68    -0.000178892778126
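Since the question asks about writing to a file, here is a minimal Python 3 sketch of the same zip approach that writes to a file-like object. The helper name write_pairs is hypothetical, and an in-memory io.StringIO buffer stands in for the real output file:

```python
import io

bounding_time = [['58'], ['68']]
v = [['-0.00162439495203'], ['-0.000178892778126']]

def write_pairs(out, left, right):
    """Write one 'left right' line per pair of single-item sublists."""
    # Each sublist holds exactly one string, so unpack it directly.
    for (a,), (b,) in zip(left, right):
        out.write("{} {}\n".format(a, b))

# io.StringIO stands in for a real file opened with open(path, 'w').
buf = io.StringIO()
write_pairs(buf, bounding_time, v)
print(buf.getvalue(), end="")
```

With a real file, the same function could be called inside a with open(...) block.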
When I write
a1 = list([b'1,690569\n1,315892\n1,189226\n2,834328\n2,1615927\n2,1194519\n'])
print(a1)
for edge_ in a1:
    print('edge =' + str(edge_))
    z[(edge_[0], edge_[1])] = 1
    print('edge_0 =' + str(edge_[0]))
    print('edge_1 =' + str(edge_[1]))
print(z)
I get the output as
[b'1,690569\n1,315892\n1,189226\n2,834328\n2,1615927\n2,1194519\n']
edge =b'1,690569\n1,315892\n1,189226\n2,834328\n2,1615927\n2,1194519\n'
edge_0 =49
edge_1 =44
{(49, 44): 1}
Can anyone explain why it is 49 and 44? These values are coming irrespective of the element inside the list.
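Firstly, as others have already mentioned, the single element of your list is a bytes object, as shown by the 'b' prefix. Iterating over the list therefore yields that whole bytes object once, and indexing a bytes object in Python 3 returns integers (the byte values), not characters. You don't need to call 'list()' on a list literal, by the way.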
a1 = [b'1,690569\n1,315892\n1,189226\n2,834328\n2,1615927\n2,1194519\n']
Given that z is an empty dictionary (i.e. z = dict())
Below is just adding a tuple as a key and an integer as value:
z[(edge_[0], edge_[1])] = 1
We can see the following:
edge_ = a1[0] = b'1,690569\n1,315892\n1,189226\n2,834328\n2,1615927\n2,1194519\n'
edge_[0] = a1[0][0] = ord(b'1') = 49
edge_[1] = a1[0][1] = ord(b',') = 44
Hence z[(edge_[0], edge_[1])] = 1 becomes:
z[(49, 44)] = 1
z = {(49, 44): 1}
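If the goal was to use each "source,target" line as a dictionary key, one possible approach (a sketch, assuming Python 3 and that every line really has the form "int,int") is to decode the bytes and split them into lines first. The names raw and edges are illustrative only:

```python
raw = b'1,690569\n1,315892\n1,189226\n2,834328\n2,1615927\n2,1194519\n'

# Indexing bytes gives integer byte values: ord('1') and ord(',').
first_two = (raw[0], raw[1])
print(first_two)

# Decode to text, then split into lines and fields to get the actual pairs.
edges = {}
for line in raw.decode('ascii').splitlines():
    src, dst = line.split(',')
    edges[(int(src), int(dst))] = 1

print(edges)
```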
The following regex does exactly what I want, except that it also outputs the index as a digit (I think it's the index). This messes up my output. How can I tell it not to include the index?
import re
import pandas as pd
df = pd.read_excel("tstfile.xlsx", names=["col1"])
for index, row in df.iterrows():
    # print(index)
    if str(row[0]).split():
        if not re.findall("(.[A-Z]\d+\-\d+)", str(row)):
            for i in re.findall("(\d+)", str(row)):
                print(i)
Input data would look like:
123, 456
111 * 222
LL123-456
35
I get an output that looks like this:
123
0
456
1
111
2
222
3
35
4
The final desired output should be:
123
456
111
222
35
So only the data that is actually given in as input.
You can change your code like this:
for row in df.values.astype(str):
    for word in row:
        # raw strings keep the backslashes in the patterns intact
        if not re.findall(r"(.[A-Z]\d+\-\d+)", word):
            for num in re.findall(r"(\d+)", word):
                print(num)
Alternatively, here is a one-liner that joins the dataframe values into a single string and uses re.findall to extract the numbers as strings. Cells in which an uppercase letter is followed, anywhere later in the cell, by a minus sign are excluded.
all_numbers = re.findall(r'(\d+)', ' '.join([j for i in df.values.astype(str) for j in i if not re.search(r'[A-Z].+\-', j)]))
for item in all_numbers:
    print(item)
If you need integers instead of strings, you can map int over the list, which in Python 3 gives a lazy iterator:
all_integers = map(int, all_numbers)
for i in all_integers:
    print(i)
But remember that such an iterator can only be consumed once; wrap it in list() if you need the values again.
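As for why the index shows up in the question's output: str(row) renders the whole pandas Series, including its "Name: <index>" footer, so the digit pattern also picks up the index label. Applying the regex to each cell value avoids that. The sketch below illustrates the per-cell filtering without pandas, using a plain list of strings as a stand-in for the cell values; the skip pattern is a simplified assumption about what codes like LL123-456 look like:

```python
import re

# Stand-in for the cell values of the single column in the spreadsheet.
cells = ["123, 456", "111 * 222", "LL123-456", "35"]

# Assumed shape of the codes to skip: uppercase letters, digits, hyphen, digits.
skip = re.compile(r"[A-Z]+\d+-\d+")

numbers = []
for cell in cells:
    if skip.search(cell):
        continue  # ignore cells such as 'LL123-456'
    numbers.extend(re.findall(r"\d+", cell))

for num in numbers:
    print(num)
```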
You can try this:
>>> data = """123, 456
... 111 * 222
... LL123-456
... 35"""
>>> data = data.replace(',', '')
>>> data = data.split()
>>> x = [int(i) for i in data if i.isdigit()]
>>> print(x)
The output is
[123, 456, 111, 222, 35]
How does this code, involving assignment and the yield operator, work? The results are rather confounding.
def test1(x):
    for i in x:
        _ = yield i
        yield _
def test2(x):
    for i in x:
        _ = yield i
r1 = test1([1,2,3])
r2 = test2([1,2,3])
print list(r1)
print list(r2)
Output:
[1, None, 2, None, 3, None]
[1, 2, 3]
The assignment syntax ("yield expression") allows you to treat the generator as a rudimentary coroutine.
First proposed in PEP 342 and documented here: https://docs.python.org/2/reference/expressions.html#yield-expressions
The client code that is working with the generator can communicate data back into the generator using its send() method. That data is accessible via the assignment syntax.
send() will also iterate - so it actually includes a next() call.
Using your example, this is what it would be like to use the coroutine functionality:
>>> def test1(x):
...     for i in x:
...         _ = yield i
...         yield _
...
>>> l = [1,2,3]
>>> gen_instance = test1(l)
>>> #First send has to be a None
>>> print gen_instance.send(None)
1
>>> print gen_instance.send("A")
A
>>> print gen_instance.send("B")
2
>>> print gen_instance.send("C")
C
>>> print gen_instance.send("D")
3
>>> print gen_instance.send("E")
E
>>> print gen_instance.send("F")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Note that the values sent while the generator is paused at the second yield ("B", "D" and "F" here) are lost, because the result of that yield is never captured.
EDIT:
Forgot to explain the Nones yielded in your example.
From https://docs.python.org/2/reference/expressions.html#generator.next:
When a generator function is resumed with a next() method, the current
yield expression always evaluates to None.
next() is used when using the iteration syntax.
    _ = yield i
    yield _
First it yields the value referenced by i, e.g. 1. Then it yields the value returned by the yield operation, which is None. It does this on each iteration of the loop.
for i in x:
    _ = yield i
This simply yields the value referenced by i, e.g. 1, then proceeds to the next iteration of the loop, producing 2, then 3.
Unlike return, the yield keyword can be used in an expression:
x = return 0 # SyntaxError
x = yield 0 # perfectly fine
Now, when the interpreter sees a yield, it will generate the indicated value. However, when it does so, that operation returns the value None, just like mylist.append(0) or print('hello') will return the value None. When you assign that result to a reference like _, you're saving that None.
So, in the first snippet, you're yielding an object, then you save the "result" of that yield operation, which is None, and then you yield that None. In the second snippet, you yield an object, then you save the "result" of that yield operation, but you never yield that result, so None does not appear in the output.
Note that yield won't always return None - this is just what you sent to the generator with send(). Since that was nothing in this case, you get None. See this answer for more on send().
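A small Python 3 sketch of that point: resuming with next() behaves like send(None), so the yield expression evaluates to None unless something else is sent. The generator name echo is made up for illustration:

```python
def echo(values):
    for value in values:
        received = yield value   # evaluates to whatever was sent, or None
        yield received

gen = echo([1, 2])
first = next(gen)         # runs to the first yield
second = next(gen)        # nothing was sent, so received is None
third = gen.send(None)    # same effect as next(gen): moves on to the next value
fourth = gen.send("hi")   # this time the yield expression evaluates to "hi"

print(first, second, third, fourth)
print(list(echo([1, 2, 3])))
```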
To expand on TigerhawkT3's answer, the reason that the yield operation is returning None in your code is because list(r1) isn't sending anything into the generator. Try this:
def test1(x):
    for i in x:
        _ = yield i
        yield _
r1 = test1([1, 2, 3])
for x in r1:
    print(' x', x)
    print('send', r1.send('hello!'))
Output:
x 1
send hello!
x 2
send hello!
x 3
send hello!
Here's a somewhat manufactured example where sending values into a generator could be useful:
def changeable_count(start=0):
    current = start
    while True:
        changed_current = yield current
        if changed_current:
            current = changed_current
        else:
            current += 1
counter = changeable_count(10)
for x in range(20):
    print(next(counter), end=' ')
print()
print()
print('Sending 51, printing return value:', counter.send(51))
print()
for x in range(20):
    print(next(counter), end=' ')
print()
print()
print('Sending 42, NOT printing return value')
print()
counter.send(42)
for x in range(20):
    print(next(counter), end=' ')
print()
Output:
10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29
Sending 51, printing return value: 51
52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
Sending 42, NOT printing return value
43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62
I am new to Python and am trying to write my dictionary values to a file using Python 2.7. Each value in my dictionary D is a list with at least 2 items.
The dictionary has TERM_ID as its key, and each value has the format [[DOC42, POS10, POS22], [DOC32, POS45]].
This means the TERM_ID (key) occurs in DOC42 at positions POS10 and POS22, and also in DOC32 at POS45.
So I have to write to a new file in the following format, with a new line for each TERM_ID:
TERM_ID (tab) DOC42:POS10 (tab) 0:POS22 (tab) DOC32:POS45
The following code should help you understand what exactly I am trying to do.
for key,valuelist in D.items():
    #first value in each list is an ID
    docID = valuelist[0][0]
    for lst in valuelist:
        file.write('\t' + lst[0] + ':' + lst[1])
        lst.pop(0)
        lst.pop(0)
        for n in range(len(lst)):
            file.write('\t0:' + lst[0])
            lst.pop(0)
The output I get is :
TERM_ID (tab) DOC42:POS10 (tab) 0:POS22
DOC32:POS45
I tried using the newline character, as well as commas to continue writing on the same line, in a number of places, but it did not work. I fail to understand how file writing really works.
Any kind of inputs will be helpful. Thanks!
@Falko I could not find a way to attach the text file, so here is my sample data:
879\t3\t1
162\t3\t1
405\t4\t1455
409\t5\t1
13\t6\t15
417\t6\t13
422\t57\t1
436\t4\t1
141\t8\t1
142\t4\t145
170\t8\t1
11\t4\t1
184\t4\t1
186\t8\t14
My sample running code is -
with open('sampledata.txt','r') as sample,open('result.txt','w') as file:
    d = {}
    #term= ''
    #docIndexLines = docIndex.readlines()
    #form a d with format [[doc a, pos 1, pos 2], [doc b, pos 3, pos 8]]
    for l in sample:
        tID = -1
        someLst = l.split('\\t')
        #if len(someLst) >= 2:
        tID = someLst[1]
        someLst.pop(1)
        #if term not in d:
        if not d.has_key(tID):
            d[tID] = [someLst]
        else:
            d[tID].append(someLst)
    #read the dictionary to generate result file
    docID = 0
    for key,valuelist in d.items():
        file.write(str(key))
        for lst in valuelist:
            file.write('\t' + lst[0] + ':' + lst[1])
            lst.pop(0)
            lst.pop(0)
            for n in range(len(lst)):
                file.write('\t0:' + lst[0])
                lst.pop(0)
My Output:
57 422:1
3 879:1
162:1
5 409:1
4 405:1455
436:1
142:145
11:1
184:1
6 13:15
417:13
8 141:1
170:1
186:14
Expected output:
57 422:1
3 879:1 162:1
5 409:1
4 405:1455 436:1 142:145 11:1 184:1
6 13:15 417:13
8 141:1 170:1 186:14
You probably don't get the result you're expecting because you didn't strip the newline characters \n while reading the input data. Try replacing
someLst = l.split('\\t')
with
someLst = l.strip().split('\\t')
To enforce the mentioned line breaks in your output file, add a
file.write('\n')
at the very end of your second outer for loop:
for key,valuelist in d.items():
    # ...
    file.write('\n')
Bottom line: write never adds a line break. If you do see one in your output file, it's in your data.
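Putting both fixes together, a minimal Python 3 sketch might look like the following. It assumes, as the question's split('\\t') call suggests, that the fields are separated by a literal backslash followed by t, and that each line holds a document ID, a term ID and one or more positions. The function name format_postings is hypothetical:

```python
from collections import defaultdict

def format_postings(lines, sep="\\t"):
    """Group 'doc<sep>term<sep>pos...' lines by term, one output line per term."""
    grouped = defaultdict(list)
    for line in lines:
        fields = line.strip().split(sep)   # strip() removes the trailing newline
        if len(fields) < 3:
            continue                       # skip blank or malformed lines
        doc_id, term_id, positions = fields[0], fields[1], fields[2:]
        grouped[term_id].append((doc_id, positions))

    out = []
    for term_id, postings in grouped.items():
        parts = [term_id]
        for doc_id, positions in postings:
            parts.append(doc_id + ":" + positions[0])
            parts.extend("0:" + p for p in positions[1:])
        out.append("\t".join(parts) + "\n")  # the only newline is added here
    return out

sample = ["879\\t3\\t1\n", "162\\t3\\t1\n", "405\\t4\\t1455\n"]
result = format_postings(sample)
print("".join(result), end="")
```

The returned lines could then be passed to writelines() on the output file. Terms come out in first-seen order here, since dictionaries preserve insertion order in Python 3.7 and later.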
I have a minor problem while checking for elements in a list:
I have two files with contents something like this
file 1:    file 2:
47         358 47
48         450 49
49         56 50
I parsed both files into two lists and used the following code to check
for i in file_1:
    for j in file_2:
        j = j.split()
        if i == j[1]:
            x=' '.join(j)
            print >> write_in, x
I am now trying to get a "0" when a value from file_1 is not present in file_2. For example, the value "48" is not in file_2, so the output should look like the following (with only one space between the two numbers). Both conditions should write to a single output file:
output_file:
358 47
0 48
450 49
56 50
I tried using the dictionary approach but I didn't quite get what I wanted (actually I don't know how to use dictionary in python correctly ;)). Any help will be great.
r1=open('file1').read().split()
r2=open('file2').read().split()
d=dict(zip(r2[1::2],r2[::2]))
output='\n'.join(x in d and d[x]+' '+x or '0 '+x for x in r1)
open('output_file','wb').write(output)
Test
>>> file1='47\n48\n49\n50'
>>> file2='358 47\n450 49\n56 50'
>>>
>>> r1=file1.split()
>>> r2=file2.split()
>>>
>>> d=dict(zip(r2[1::2],r2[::2])) #
>>> d
{'47': '358', '50': '56', '49': '450'}
>>>
>>> print '\n'.join(x in d and d[x]+' '+x or '0 '+x for x in r1)
358 47
0 48
450 49
56 50
>>>
You could modify your code quite easily:
for i in file_1:
    x = None
    for j in file_2:
        j = j.split()
        if i == j[1]:
            x = ' '.join(j)
    if x is None:
        x = ' '.join(['0', i])
    print >> write_in, x
Depending on your inputs, the whole task might of course be simplified even further. At the moment, your code has O(n**2) complexity.
Here's a readable solution using a dictionary:
d = {}
for k in file1:
    d[k] = 0
for line in file2:
    v, k = line.split()
    d[k] = v
for k in sorted(d):
    print d[k], k
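A Python 3 variant of the same dictionary idea, as a rough sketch: build the lookup from the second file only and fall back to "0" with dict.get, which keeps the keys in the order of the first file instead of sorting them. The function name merge_with_default is made up, and the inputs are assumed to be already-stripped lines:

```python
def merge_with_default(keys, pairs, default="0"):
    """Return 'value key' lines for keys, using default when a key has no value."""
    lookup = {}
    for line in pairs:
        value, key = line.split()   # file 2 lines are 'value key'
        lookup[key] = value
    return ["%s %s" % (lookup.get(key, default), key) for key in keys]

file1_lines = ["47", "48", "49", "50"]
file2_lines = ["358 47", "450 49", "56 50"]
merged = merge_with_default(file1_lines, file2_lines)
print("\n".join(merged))
```

Each lookup is a constant-time dictionary access on average, so the whole merge is linear in the size of the two inputs.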
You can try something like:
l1 = open('file1').read().split()
l2 = [line.split() for line in open('file2')]
for x, y in zip(l1, l2):
    if x not in y:
        print 0, x
    print ' '.join(y)
but if you follow your logic, the output should be
358 47
0 48
450 49
0 49
56 50
and not
358 47
0 48
450 49
56 50
def file_process(filename1, filename2):
    # read first file with zeroes as values
    with open(filename1) as fp:
        adict= dict( (line.rstrip(), 0) for line in fp)
    # read second file as "value key"
    with open(filename2) as fp:
        adict.update(
            line.rstrip().partition(" ")[2::-2] # tricky, read notes
            for line in fp)
    for key in sorted(adict):
        yield adict[key], key
fp= open("output_file", "w")
fp.writelines("%s %s\n" % items for items in file_process("file1", "file2"))
fp.close()
str.partition(" ") returns a tuple of (pre-space, space, post-space). By slicing the tuple, starting at item 2 (post-space) and moving by a step of -2, we return a tuple of (post-space, pre-space), which are (key, value) for the dictionary that describes the solution.
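The slicing trick can be checked in isolation. This short sketch uses one sample line from the question; the variable names are illustrative:

```python
line = "358 47"

# partition returns (text before the separator, separator, text after it)
pre, sep, post = line.partition(" ")

# start at index 2 and step by -2: picks index 2, then index 0
swapped = line.partition(" ")[2::-2]
print(swapped)

# dict.update accepts an iterable of (key, value) pairs
lookup = {"48": 0}
lookup.update([swapped])
print(lookup)
```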
PS Um :) I just noticed that my answer is essentially the same as Daniel Stutzbach's.