Python string extraction from array of strings

I am having trouble figuring out the following:
Suppose I have a list of strings
strings = ["and","the","woah"]
I want the output to be a list of strings where the ith position of every string becomes a new string item in the array like so
["atw","nho","dea","h"]
I am playing with the following list comprehension
u = [[]]*4
c = [u[i].append(stuff[i]) for i in range(0,4) for stuff in strings]
but it's not working out. Can anyone help? I know other tools can accomplish this, but I am particularly interested in making this happen with for loops and list comprehensions. This may be asking a lot; let me know if it is.

Using just list comprehensions and for loops, you can do this:
strings = ["and","the","woah"]
# Start with a list of empty strings, one per character position
new = ["" for x in range(max([len(m) for m in strings]))]
# Cycle through the new list
for index, item in enumerate(new):
    for w in strings:
        try:
            item += w[index]
            new[index] = item
        except IndexError:
            # this string has no character at this position
            pass
print(new)
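
For a version that uses only comprehensions, something like the following sketch may work. It also shows a likely reason the attempt in the question misbehaves: [[]]*4 repeats a single list object four times, so every append lands in the same list. The names aliased, longest and columns are made up for illustration.

```python
strings = ["and", "the", "woah"]

# Pitfall: [[]] * 4 holds four references to ONE list, not four lists
aliased = [[]] * 4
aliased[0].append("x")
print(aliased)  # all four slots show the appended item

# Comprehension-only approach: for each character position, join the
# characters of every string that is long enough to have that position
longest = max(len(s) for s in strings)
columns = ["".join(s[i] for s in strings if i < len(s)) for i in range(longest)]
print(columns)
```

If separate inner lists are really needed, [[] for _ in range(4)] creates four distinct ones.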

My idea would be to use itertools.izip_longest and a list comprehension.
>>> from itertools import izip_longest
>>> strings = ["and","the","woah"]
>>> [''.join(x) for x in izip_longest(*strings, fillvalue='')]
['atw', 'nho', 'dea', 'h']
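
On Python 3, izip_longest was renamed to zip_longest. A minimal sketch of the same idea, assuming Python 3 (the name result is just for illustration):

```python
from itertools import zip_longest

strings = ["and", "the", "woah"]

# zip_longest pads the shorter strings with fillvalue, so joining each
# tuple of characters gives one output string per character position
result = ["".join(chars) for chars in zip_longest(*strings, fillvalue="")]
print(result)
```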

Try
array = ["and","the","woah"]
array1 = []
longest_item = 0
for i in range(len(array)):
    if len(array[i]) > longest_item:
        longest_item = len(array[i])  # find the longest string
for i in range(longest_item):
    column = ""
    for i1 in range(len(array)):
        # slicing past the end gives "", so short strings add nothing
        column += array[i1][i:i+1]
    array1.append(column)
I didn't actually try this code out, I just improvised it. Please leave a comment ASAP if you find a bug.

Related

Getting value from a list corresponding to another list

I have a list containing:
NewL = [(1.1,[01,02]),(1.2,[03,04]),(1.3,[05,06])]
I used enumerate to obtain the list above, where the bracketed values [01,02], [03,04] and [05,06] come from another list. I'll show it just in case:
L = [[01,02],[03,04],[05,06]]
and initially the output list is just:
OutputList = [1.1,1.2,1.3]
I used enumerate on both of these lists to get the first list I've written above.
The problem I'm facing now is this: say I want to output only the value for [05,06], which is 1.3, from NewL. How would I do that? I was thinking of something like:
for val in NewL:
    if NewL[1] == [05,06]:
        print NewL[0]
but it's totally wrong, and the target may change: it won't always be [05,06], since I may need the value for [03,04] or [01,02] too. I'm pretty new to using enumerate, so I'd appreciate any help with this.

The for loop should look like this:
for val in NewL:
    if val[1] == [5,6]:
        print(val[0])
It will print 1.3.

I'm not sure I understand the question, so I will extrapolate what you need:
Given your two initial lists:
L = [[01,02],[03,04],[05,06]]
OutputList = [1.1,1.2,1.3]
you can generate your transformed list using:
NewL = list(zip(OutputList, L))
then, given 1 item from L, if you want to retrieve the value from OutputList:
val = [x for x, y in NewL if y == [05,06]][0]
But it would be a lot easier to just do:
val = OutputList[L.index([05,06])]
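Note that both expressions raise an exception if the searched item is not found: the first raises an IndexError (indexing an empty list), and the second raises a ValueError (from list.index).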
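
If a missing item should not raise, the lookup can be wrapped in a small helper. This is only a sketch: the function name lookup and its default parameter are made up for illustration, and the lists use plain integers instead of the 01-style literals from the question.

```python
def lookup(key, keys, values, default=None):
    """Return the value paired with key, or default if key is absent."""
    try:
        return values[keys.index(key)]
    except ValueError:
        # list.index raises ValueError when the item is not in the list
        return default

L = [[1, 2], [3, 4], [5, 6]]
OutputList = [1.1, 1.2, 1.3]

found = lookup([5, 6], L, OutputList)
missing = lookup([9, 9], L, OutputList)
print(found, missing)
```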

How do you concatenate the same string to every element in a list? [duplicate]

I have a list of strings in Python - elements. I would like to edit each element in elements. See the code below (it doesn't work, but you'll get the idea):
for element in elements:
    element = "%" + element + "%"
Is there a way to do this?

elements = ['%{0}%'.format(element) for element in elements]

You can use list comprehension:
elements = ["%" + e + "%" for e in elements]

Python 3.6+ version (f-strings):
elements = [f'%{e}%' for e in elements]

You can use list comprehensions:
elements = ["%{}%".format(element) for element in elements]

There are basically two ways you can do what you want: either edit the list you have, or else create a new list that has the changes you want. All the answers currently up there show how to use a list comprehension (or a map()) to build the new list, and I agree that is probably the way to go.
The other possible way would be to iterate over the list and edit it in place. You might do this if the list were big and you only needed to change a few.
for i, e in enumerate(elements):
    if want_to_change_this_element(e):
        elements[i] = "%{}%".format(e)
But as I said, I recommend you use one of the list comprehension answers.
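
To make the difference concrete, here is a rough sketch. The predicate want_to_change_this_element is hypothetical (here it simply picks strings longer than one character), and alias is an extra name added only to show that the same list object is modified.

```python
def want_to_change_this_element(e):
    # hypothetical rule: only wrap strings longer than one character
    return len(e) > 1

elements = ["a", "bb", "ccc"]
alias = elements  # second name for the same list object

# Rebinding the loop variable does not touch the list itself
for element in elements:
    element = "%" + element + "%"
unchanged = list(elements)

# Assigning through the index does change the list in place
for i, e in enumerate(elements):
    if want_to_change_this_element(e):
        elements[i] = "%{}%".format(e)

print(unchanged)
print(alias)
```

A list comprehension, by contrast, builds a new list and rebinds the name, so other references to the old list would not see the change.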

elements = map(lambda e : "%" + e + "%", elements)
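
One caveat, assuming Python 3: there map returns a lazy iterator rather than a list, so wrapping it in list() is needed if you want to index it or loop over it more than once. A small sketch (the name wrapped is just for illustration):

```python
elements = ["a", "b", "c"]

# list() forces the lazy map object into a real list
wrapped = list(map(lambda e: "%" + e + "%", elements))
print(wrapped)
```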

Here are some more examples:
char = ['g:', 'l:', 'q:']
Using replace
for i in range(len(char)):
    char[i] = char[i].replace(':', '')
Using strip
for i in range(len(char)):
    char[i] = char[i].strip(':')
Using a function
def list_cleaner(list_):
    return [char_.strip(':') for char_ in list_]
new_char = list_cleaner(char)
print(new_char)
Using a generator function (advised if you have a large amount of data)
def generator_cleaner(list_):
    yield from (char_.strip(':') for char_ in list_)
# Collect all results at once
gen_char = list(generator_cleaner(char))
# Take only as many as you need, in this case 8 items
# The list is made a bit longer here so the example makes more sense
char = ['g:', 'l:', 'q:', 'g:', 'l:', 'q:', 'g:', 'l:', 'q:']
# Call the generator function
gen_char_ = generator_cleaner(char)
# Pull only 8 items from it
gen_char_ = [next(gen_char_) for _ in range(8)]
print(gen_char_)
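
As an alternative to calling next() in a loop, itertools.islice can take the first n items from a generator and simply stops early if fewer are available. A sketch under that assumption (first_two and first_eight are illustrative names, and generator_cleaner is rewritten here as a plain loop so the block stands alone):

```python
from itertools import islice

def generator_cleaner(list_):
    # yield one cleaned string at a time
    for char_ in list_:
        yield char_.strip(':')

char = ['g:', 'l:', 'q:']

first_two = list(islice(generator_cleaner(char), 2))
# asking for 8 when only 3 exist just returns the 3 that are there
first_eight = list(islice(generator_cleaner(char), 8))
print(first_two, first_eight)
```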

Find + Find next in Python

Let L be a list of strings.
Here is the code I use for finding a string texttofind in the list L.
texttofind = 'Bonjour'
for s in L:
    if texttofind in s:
        print 'Found!'
        print s
        break
How would you implement a Find next feature? Do I need to store the index of the previously found string?

One approach for huge lists would be to use a generator. Suppose you do not know whether the user will need the next match.
def string_in_list(s, entities):
    """Return elements of entities that contain given string."""
    for e in entities:
        if s in e:
            yield e
huge_list = ['you', 'say', 'hello', 'I', 'say', 'goodbye'] # ...
matches = string_in_list('y', huge_list) # look for strings with letter 'y'
next(matches) # first match
next(matches) # second match
The other answers suggesting list comprehensions are great for short lists when you want all results immediately. The nice thing about this approach is that if you never need the third result no time is wasted finding it. Again, it would really only matter for big lists.
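
One detail worth handling in a Find next feature is what happens when the matches run out: a bare next() raises StopIteration, while next(matches, None) returns a default instead. A minimal sketch, with a made-up sample list:

```python
def string_in_list(s, entities):
    """Yield elements of entities that contain the given string."""
    for e in entities:
        if s in e:
            yield e

# sample data, invented for this example
L = ['Bonjour tout le monde', 'Hello', 'Bonjour encore']

matches = string_in_list('Bonjour', L)
first = next(matches, None)   # Find
second = next(matches, None)  # Find next
third = next(matches, None)   # nothing left, so the default comes back
print(first, second, third)
```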
Update: If you want the cycle to restart at the first match, you could do something like this...
def string_in_list(s, entities):
    # note: if nothing in a non-empty list matches, this loops forever
    idx = 0
    while idx < len(entities):
        if s in entities[idx]:
            yield entities[idx]
        idx += 1
        if idx >= len(entities):
            # restart from the beginning
            idx = 0
huge_list = ['you', 'say', 'hello']
m = string_in_list('y', huge_list)
next(m) # you
next(m) # say
next(m) # you, again
See How to make a repeating generator for other ideas.
Another Update
It's been years since I first wrote this. Here's a better approach using itertools.cycle:
from itertools import cycle # will repeat after end
# look for s in items of huge_list
matches = cycle(i for i in huge_list if s in i)
next(matches)

To find all strings in L that have s as a substring:
[f for f in L if s in f]

If you want to find all indexes of strings in L which have s as a substring,
[i for i in range(0, len(L)) if L[i].find(s) >= 0]
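
If you would rather store the index of the previous hit, as the question suggests, a helper along these lines could work. The name find_next and the -1 return value for "not found" are choices made for this sketch, not an established API.

```python
def find_next(items, text, start=0):
    """Return the index of the first item at or after start that
    contains text, or -1 if there is none."""
    for i in range(start, len(items)):
        if text in items[i]:
            return i
    return -1

# sample data, invented for this example
L = ['Bonjour', 'Hello', 'Hola', 'Bonjour encore']

first = find_next(L, 'Bonjour')              # Find
second = find_next(L, 'Bonjour', first + 1)  # Find next
third = find_next(L, 'Bonjour', second + 1)  # no further match
print(first, second, third)
```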

This prints the match and the element that follows it, if one exists. You can wrap it in a function and return None or an empty string if it doesn't.
L = ['Hello', 'Hola', 'Bonjour', 'Salam']
for i, l in enumerate(L):
    if l == texttofind:
        print(l)
        if i + 1 < len(L):  # guard against running past the end
            print(L[i + 1])

ValueError while Merging Lists in Python

I'm trying to get one array out of several arrays in python 2.7
I found on the internet that this is done simply by adding both lists:
lista = [1,2,3]
listb = [3,4,5]
listc = lista + listb
In my case, my first list is empty and the next list has 99 elements.
My code looks like this:
data_complete = []
for i in range(1, numberOfFiles+1):
    data = getDataFromFile(i)
    data_complete = data_complete + data
The last line of code does not work, it returns the error:
data_complete = data_complete + data
ValueError: operands could not be broadcast together with shapes (0) (99)
I would be glad if someone can solve this.
Kind Regards

You can use the append method if data is a single item:
data_complete.append(data)
You can use the extend method if data is itself a list:
data_complete.extend(data)
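
The difference between the two is easy to see with plain lists. A quick sketch (the variable names are invented for the example):

```python
data = [1, 2, 3]

appended = []
appended.append(data)   # adds the whole list as ONE nested element

extended = []
extended.extend(data)   # adds each item of data individually

print(appended)
print(extended)
```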

It looks like getDataFromFile is returning a numpy array rather than a list. In that case, + does not concatenate: it performs element-wise addition, which requires the two operands to have compatible shapes (and returns another array). That is why an empty list and a 99-element array cannot be added. You can use the list extend method instead to get around this:
data_complete = []
for i in range(1, numberOfFiles+1):
    data = getDataFromFile(i)
    data_complete.extend(data)

Just append the data to your list.
For example:
evens = []
for i in xrange(10):
    if i%2 == 0:
        evens.append(i)
At the end of this program, evens will equal [0,2,4,6,8].

Removing duplicates from the list of unicode strings

I am trying to remove duplicates from a list of unicode strings without changing the order in which the elements appear (so I don't want to use set).
Program:
result = [u'http://google.com', u'http://www.catb.org/esr/faqs/hacker-howto.html', u'http://www.catb.org/~esr/faqs/hacker-howto.html',u'http://amazon.com', u'http://www.catb.org/esr/faqs/hacker-howto.html', u'http://yahoo.com']
result.reverse()
for e in result:
    count_e = result.count(e)
    if count_e > 1:
        for i in range(0, count_e - 1):
            result.remove(e)
result.reverse()
print result
Output:
[u'http://google.com', u'http://www.catb.org/esr/faqs/hacker-howto.html', u'http://www.catb.org/~esr/faqs/hacker-howto.html', u'http://amazon.com', u'http://yahoo.com']
Expected Output:
[u'http://google.com', u'http://catb.org/~esr/faqs/hacker-howto.html', u'http://amazon.com', u'http://yahoo.com']
So, is there a way to do this as simply as possible?

You actually don't have duplicates in your list. One time you have http://catb.org while another time you have http://www.catb.org.
You'll have to figure a way to determine whether the URL has www. in front or not.

You can create a new list and add items to it if they're not already in it.
result = [ /some list items/]
uniq = []
for item in result:
    if item not in uniq:
        uniq.append(item)

You could use a set and then sort it by the original index:
sorted(set(result), key=result.index)
This works because index returns the first occurrence (so it keeps them in order according to first appearance in the original list)
I also notice that one of the strings in your original isn't a unicode string. So you might want to do something like:
u = [unicode(s) for s in result]
return sorted(set(u), key=u.index)
EDIT: 'http://google.com' and 'http://www.google.com' are not string duplicates. If you want to treat them as such, you could do something like:
def remove_www(s):
    s = unicode(s)
    prefix = u'http://'
    # 'http://www.' is 11 characters long, 'http://' is 7
    suffix = s[11:] if s.startswith(u'http://www.') else s[7:]
    return prefix+suffix
And then replace the earlier code with
u = [remove_www(s) for s in result]
return sorted(set(u), key=u.index)

Here is a method that modifies result in place:
result = [u'http://google.com', u'http://catb.org/~esr/faqs/hacker-howto.html', u'http://www.catb.org/~esr/faqs/hacker-howto.html',u'http://amazon.com', 'http://www.catb.org/esr/faqs/hacker-howto.html', u'http://yahoo.com']
seen = set()
i = 0
while i < len(result):
    if result[i] not in seen:
        seen.add(result[i])
        i += 1
    else:
        del result[i]
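
The same seen-set idea can be combined with a key function, so that URLs differing only by a leading www. count as duplicates. This is a sketch assuming Python 3; dedupe, strip_www and the sample URLs are made up for illustration.

```python
def strip_www(url):
    # hypothetical normalizer: drop a "www." right after the scheme
    return url.replace('://www.', '://', 1)

def dedupe(items, key=lambda x: x):
    """Return items without duplicates, keeping first occurrences in order."""
    seen = set()
    out = []
    for item in items:
        k = key(item)
        if k not in seen:
            seen.add(k)
            out.append(item)
    return out

urls = ['http://google.com', 'http://www.google.com',
        'http://amazon.com', 'http://google.com']

exact = dedupe(urls)                  # only exact string duplicates removed
loose = dedupe(urls, key=strip_www)   # www. variants treated as duplicates
print(exact)
print(loose)
```

The set lookup keeps this roughly linear in the length of the list, whereas the "item not in uniq" check scans the growing list each time.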
