Parsing string with Python correct way

I have some problems parsing a string the correct way. I want to split the complete string into two separate strings, and then remove the "=" signs from the first string and the "," sign from the second string. From my output I can conclude that I did something wrong, but I can't see where the problem lies. I also want the first part converted to integers, and I've already tried map(int, split()).
If anyone has a tip, I would appreciate it.
This is my output:
('5=20=22=10=2=0=0=1=0=1', 'Vincent Appel,Johannes Mondriaan')
This is my program:
mystring = "5=20=22=10=2=0=0=1=0=1;Vincent Appel,Johannes Mondriaan"
def split_string(mystring):
    strings = mystring.split(";")
    x = strings[0]
    y = strings[-1]
    print(x,y)
def split_scores(x):
    scores = x.split("=")
    score = scores[0]
    names = scores[-1]
    stnames(names)
    print score
def stnames(y):
    studentname = y.split(",")
    name = studentname[1]
    print name
split_string(mystring)

split_string(mystring) runs the first function, which prints the tuple of two strings. But nothing calls the other functions, which are intended to perform the further splitting.
Try:
    x, y = split_string(mystring)
    x1 = split_scores(x)
    y1 = stnames(y)
    (x1, y1)
Oops, your functions print their results instead of returning them. So you also need:
def split_string(mystring):
    # split mystring on ";"
    strings = mystring.split(";")
    return strings[0],strings[1]
def split_string(mystring):
    # alternative version: raises an error if mystring does not have exactly 2 parts
    x, y = mystring.split(";")
    return x,y
def split_scores(x):
    # return a list with all the scores
    return x.split("=")
def stnames(y):
    # return a list with all names
    return y.split(",")
def lastname(y):
    # return the last name (, delimited string)
    return y.split(",")[-1]
If you are going to split the task among functions, it is better to have them return the results rather than print them. That way they can be used in various combinations. Use print inside a function only for debugging.
Or a compact, script version:
x, y = mystring.split(';')
x = x.split('=')
y = y.split(',')[-1]
print y, x
If you want the scores as numbers, add:
x = [int(s) for s in x]
to the processing.
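Putting the pieces of this answer together, a minimal sketch that runs on both Python 2 and Python 3 might look like the following. The function name parse_record is an assumption made for illustration; it is not part of the original program.

```python
def parse_record(record):
    """Split 'n=n=...;name,name' into a list of ints and a list of names."""
    # Unpacking raises ValueError if there is not exactly one ';'
    scores_part, names_part = record.split(";")
    scores = [int(s) for s in scores_part.split("=")]
    names = names_part.split(",")
    return scores, names


mystring = "5=20=22=10=2=0=0=1=0=1;Vincent Appel,Johannes Mondriaan"
scores, names = parse_record(mystring)
print(scores)
print(names)
```

Because the function returns its results instead of printing them, the caller decides what to do with the scores and the names.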

Try this:
def split_string(mystring):
    strings = mystring.split(";")
    x = int(strings[0].replace("=",""))
    y = strings[-1].replace(","," ")
    print x,y

My two cents.
If I understood what you want to achieve, this code could help:
mystring = "5=20=22=10=2=0=0=1=0=1;Vincent Appel,Johannes Mondriaan"
def assignGradesToStudents(grades_and_indexes, students):
    list_length = len(grades_and_indexes)
    if list_length%2 == 0:
        grades = grades_and_indexes[:list_length/2]
        indexes = grades_and_indexes[list_length/2:]
        return zip([students[int(x)] for x in indexes], grades)
grades_and_indexes, students = mystring.split(';')
students = students.split(',')
grades_and_indexes = grades_and_indexes.split('=')
results = assignGradesToStudents(grades_and_indexes, students)
for result in results:
    print "[*] {} got a {}".format(result[0], result[1])
Output:
[*] Vincent Appel got a 5
[*] Vincent Appel got a 20
[*] Johannes Mondriaan got a 22
[*] Vincent Appel got a 10
[*] Johannes Mondriaan got a 2

Related

How can I split a string on the Nth character before a given parameter (Python)

Usually I use the .split() method to split strings on a dataframe, but this time I have a string that looks like this:
"3017381ª Série - EM"
or
"3017381º Ano - Iniciais"
And I'd like to find "º" or "ª" and then split it 2 characters before. The string should look like this after the split:
["301738","1ª Série - EM"]

You could use the str.find function and adjust the position it returns.
In a function it would look like this (note that it splits the string on the first entry of charList that is found, but you could extend the approach if you need multiple splits):
def splitString(s,offset=0,charList=[]):
    for opt in charList:
        x = s.find(opt)
        if x != -1: # find returns -1 when not found
            return [s[:x+offset],s[x+offset:]]
    return [s] # input char not found
You can then call the function:
splitString("3017381ª Série - EM",offset=-1,charList=[ "º","ª"])
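As a variation on the same idea, here is a hedged sketch that uses re.search to locate whichever ordinal marker comes first in the string, rather than trying the markers in list order. The name split_before_ordinal and its default offset are assumptions for illustration; this assumes Python 3, where str is Unicode.

```python
import re


def split_before_ordinal(s, offset=-1):
    """Split s at `offset` characters relative to the first 'º' or 'ª'."""
    match = re.search("[ºª]", s)
    if match is None:
        return [s]  # no ordinal marker found, nothing to split
    cut = match.start() + offset
    return [s[:cut], s[cut:]]


print(split_before_ordinal("3017381ª Série - EM"))
print(split_before_ordinal("3017381º Ano - Iniciais"))
```

On a dataframe column, the same function could be applied row by row, for example with a column's apply method.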

Maybe it's a little bit confusing, but the result is the one you want.
a = "3017381ª Série - EM"
print(a[7])
print(a[5])
b = a[7]
c = a[5]
print(b)
print(c)
x = a.split(b)
y = a.split(c)
newY = y[1]
newX = x[0]
print(newY)
print(newX)
minusX = newX[:-1]
print(minusX)
z =[]
z.append(minusX)
z.append(y[1])
print("update:", z)

Simplifying Vigenere cipher program in Python

I have the program below, which is passed on to another function which simply prints out the original and encrypted messages. I want to know how I can simplify this program, specifically the "match = zip" and "change = (reduce(lambda" lines. If possible to do this without using lambda, how can I?
from itertools import cycle
alphabet = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
def vigenereencrypt(message,keyword):
    output = ""
    match = zip(message.lower(),cycle(keyword.lower()))
    for i in match:
        change = (reduce(lambda x, y: alphabet.index(x) + alphabet.index(y), i)) % 26
        output = output + alphabet[change]
    return output.lower()

Two things:
You don't need the local variable match; just loop over zip directly.
You can unpack your two items x and y in the for loop header rather than using reduce; reduce is normally used for larger iterables, and since you only have 2 items in i, it adds unnecessary complexity.
That is, you can change your for loop header to:
for x, y in zip(...):
and your definition of change to:
change = (alphabet.index(x) + alphabet.index(y)) % 26

Starting with what R Nar said:
def vigenereencrypt(message,keyword):
    output = ""
    for x, y in zip(message.lower(), cycle(keyword.lower())):
        change = (alphabet.index(x) + alphabet.index(y)) % 26
        output = output + alphabet[change]
    return output.lower()
We can be more efficient by using a list and then joining it, instead of adding to a string, and also by noticing that the output is already lowercase:
def vigenereencrypt(message,keyword):
    output = []
    for x, y in zip(message.lower(), cycle(keyword.lower())):
        change = (alphabet.index(x) + alphabet.index(y)) % 26
        output.append(alphabet[change])
    return "".join(output)
Then we can reduce the body of the loop to one line...
def vigenereencrypt(message,keyword):
    output = []
    for x, y in zip(message.lower(), cycle(keyword.lower())):
        output.append(alphabet[(alphabet.index(x) + alphabet.index(y)) % 26])
    return "".join(output)
... so we can turn it into a generator expression:
def vigenereencrypt(message,keyword):
    output = (
        alphabet[(alphabet.index(x) + alphabet.index(y)) % 26]
        for x, y in zip(message.lower(), cycle(keyword.lower()))
    )
    return "".join(output)
I feel like there's something we could do with map(alphabet.index, ...) but I can't think of a way that's any better than the generator expression.
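For what it's worth, the map(alphabet.index, ...) idea could be sketched roughly as follows. This is only one possible reading of that remark, not necessarily an improvement; the name vigenere_encrypt and the use of string.ascii_lowercase in place of the hand-written alphabet list are assumptions. It still expects a message and keyword made up of letters a-z only.

```python
from itertools import cycle
from string import ascii_lowercase


def vigenere_encrypt(message, keyword):
    # Convert both strings to alphabet positions up front
    message_idx = map(ascii_lowercase.index, message.lower())
    keyword_idx = cycle(map(ascii_lowercase.index, keyword.lower()))
    # Add each message position to the matching key position, modulo 26
    return "".join(
        ascii_lowercase[(m + k) % 26] for m, k in zip(message_idx, keyword_idx)
    )


print(vigenere_encrypt("attackatdawn", "lemon"))
```

Mapping the keyword once before cycling means each key letter is looked up a single time instead of once per message character.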

You could do it with a bunch of indexing instead of zip:
alphabet = ["a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z"]
alphaSort = {k:n for n,k in enumerate(alphabet)}
alphaDex = {n:k for n,k in enumerate(alphabet)}
def vigenereencrypt(message,keyword):
    # zip(a, cycle(b)) creates [(a[k], b[k % len(b)]) for k in range(len(a))],
    # so let's start with for k in range(len(a))
    op = ""
    for k in range(len(message)):
        op += alphaDex[(alphaSort[message.lower()[k]]+alphaSort[keyword.lower()[k%len(keyword)]])%len(alphabet)]
    return(op)

Split list based on first character - Python

I am new to Python and can't quite figure out a solution to my problem. I would like to split a list into two lists, based on what each list item starts with. My list looks like this, where each line represents an item (yes, this is not the correct list notation, but for a better overview I'll leave it like this):
***
**
.param
+foo = bar
+foofoo = barbar
+foofoofoo = barbarbar
.model
+spam = eggs
+spamspam = eggseggs
+spamspamspam = eggseggseggs
So I want a list that contains all lines starting with a '+' between .param and .model and another list that contains all lines starting with a '+' after model until the end.
I have looked at enumerate() and split(), but since I have a list and not a string and am not trying to match whole items in the list, I'm not sure how to implement them.
What I have is this:
paramList = []
for line in newContent:
    while line.startswith('+'):
        paramList.append(line)
        if line.startswith('.'):
            break
This is just my attempt to create the first list. The problem is that the code reads the second block of '+' lines as well, because break just exits the while loop, not the for loop.
I hope you can understand my question, and thanks in advance for any pointers!

What you want is really a simple task that can be accomplished using list slices and a list comprehension:
data = ['**','***','.param','+foo = bar','+foofoo = barbar','+foofoofoo = barbarbar',
        '.model','+spam = eggs','+spamspam = eggseggs','+spamspamspam = eggseggseggs']
# First get the interesting positions.
param_tag_pos = data.index('.param')
model_tag_pos = data.index('.model')
# Get all elements between tags.
params = [param for param in data[param_tag_pos + 1: model_tag_pos] if param.startswith('+')]
# Slice to the end of the list so the last line is included.
models = [model for model in data[model_tag_pos + 1:] if model.startswith('+')]
print(params)
print(models)
Output
['+foo = bar', '+foofoo = barbar', '+foofoofoo = barbarbar']
['+spam = eggs', '+spamspam = eggseggs', '+spamspamspam = eggseggseggs']
Answer to comment:
Suppose you have a list containing the numbers from 0 up to 5.
l = [0, 1, 2, 3, 4, 5]
Then using list slices you can select a subset of l:
another = l[2:5] # another is [2, 3, 4]
That is what we are doing here:
data[param_tag_pos + 1: model_tag_pos]
And for your last question: "...how does python know param are the lines in data it should iterate over and what exactly does the first param in param for param do?"
Python doesn't know; you have to tell it.
The first param is just a variable name I'm using here; it could be x, list_items, whatever you want.
Here is the line of code translated to plain English:
# Pythonian
params = [param for param in data[param_tag_pos + 1: model_tag_pos] if param.startswith('+')]
# English
params is a list of "things", one for each "thing" we can see in the list `data`
from position `param_tag_pos + 1` up to (but not including) position `model_tag_pos`, but only if that "thing" starts with the character '+'.
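The same translation can also be written out as an explicit loop, which may be easier to follow for someone new to comprehensions. This is a sketch of the equivalent logic using the data from the question; the names start and stop are chosen here for illustration.

```python
data = ['***', '**', '.param', '+foo = bar', '+foofoo = barbar',
        '+foofoofoo = barbarbar', '.model', '+spam = eggs',
        '+spamspam = eggseggs', '+spamspamspam = eggseggseggs']

start = data.index('.param') + 1  # first item after the '.param' tag
stop = data.index('.model')       # the slice stops just before '.model'

params = []
for item in data[start:stop]:     # visit each "thing" in the slice
    if item.startswith('+'):      # keep it only if it starts with '+'
        params.append(item)

print(params)
```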

data = {}
for line in newContent:
    if line.startswith('.'):
        cur_dict = {}
        data[line[1:]] = cur_dict
    elif line.startswith('+'):
        key, value = line[1:].split(' = ', 1)
        cur_dict[key] = value
This creates a dict of dicts:
{'model': {'spam': 'eggs',
           'spamspam': 'eggseggs',
           'spamspamspam': 'eggseggseggs'},
 'param': {'foo': 'bar',
           'foofoo': 'barbar',
           'foofoofoo': 'barbarbar'}}

I am new to Python
Whoops. Don't bother with my answer then.
I want a list that contains all lines starting with a '+' between
.param and .model and another list that contains all lines starting
with a '+' after model until the end.
import itertools as it
import pprint
data = [
    '***',
    '**',
    '.param',
    '+foo = bar',
    '+foofoo = barbar',
    '+foofoofoo = barbarbar',
    '.model',
    '+spam = eggs',
    '+spamspam = eggseggs',
    '+spamspamspam = eggseggseggs',
]
results = [
    list(group) for key, group in it.groupby(data, lambda s: s.startswith('+'))
    if key
]
pprint.pprint(results)
print '-' * 20
print results[0]
print '-' * 20
pprint.pprint(results[1])
--output:--
[['+foo = bar', '+foofoo = barbar', '+foofoofoo = barbarbar'],
['+spam = eggs', '+spamspam = eggseggs', '+spamspamspam = eggseggseggs']]
--------------------
['+foo = bar', '+foofoo = barbar', '+foofoofoo = barbarbar']
--------------------
['+spam = eggs', '+spamspam = eggseggs', '+spamspamspam = eggseggseggs']
This thing here:
it.groupby(data, lambda x: x.startswith('+'))
...tells python to create groups from the strings according to their first character. If the first character is a '+', then the string gets put into a True group. If the first character is not a '+', then the string gets put into a False group. However, there are more than two groups because consecutive False strings will form a group, and consecutive True strings will form a group.
Based on your data, the first three strings:
***
**
.param
will create one False group. Then, the next strings:
+foo = bar
+foofoo = barbar
+foofoofoo = barbarbar
will create one True group. Then the next string:
'.model'
will create another False group. Then the next strings:
'+spam = eggs'
'+spamspam = eggseggs'
'+spamspamspam = eggseggseggs'
will create another True group. The result will be something like:
[
    (False, [strs here]),
    (True, [strs here]),
    (False, [strs here]),
    (True, [strs here]),
]
Then it's just a matter of picking out each True group: if key, and then converting the corresponding group to a list: list(group).
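To see the grouping described above for yourself, a small sketch like the following prints each key together with its run of consecutive strings. It uses the same data as the question; the names runs and plus_groups are chosen here for illustration.

```python
from itertools import groupby

data = ['***', '**', '.param', '+foo = bar', '+foofoo = barbar',
        '+foofoofoo = barbarbar', '.model', '+spam = eggs',
        '+spamspam = eggseggs', '+spamspamspam = eggseggseggs']

# Materialize every (key, group) pair so the runs can be inspected
runs = [(key, list(group))
        for key, group in groupby(data, lambda s: s.startswith('+'))]
for key, group in runs:
    print(key, group)

# Keep only the runs whose key is True, i.e. the '+' lines
plus_groups = [group for key, group in runs if key]
```

Each group must be turned into a list before moving on to the next one, because groupby hands out iterators that share the underlying data.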
Response to comment:
where exactly does python go through data, like how does it know s is
the data it's iterating over?
groupby() works like do_stuff() below:
def do_stuff(items, func):
    for item in items:
        print func(item)
#Create the arguments for do_stuff():
data = [1, 2, 3]
def my_func(x):
    return x + 100
#Call do_stuff() with the proper argument types:
do_stuff(data, my_func) #Just like when calling groupby(), you provide some data
                        #and a function that you want applied to each item in data
--output:--
101
102
103
Which can also be written like this:
do_stuff(data, lambda x: x + 100)
lambda creates an anonymous function, which is convenient for simple functions which you don't need to refer to by name.
This list comprehension:
[
list(group)
for key, group in it.groupby(data, lambda s: s.startswith('+'))
if key
]
is equivalent to this:
results = []
for key, group in it.groupby(data, lambda s: s.startswith('+') ):
    if key:
        results.append(list(group))
It can be clearer to write the for loop explicitly; list comprehensions are more compact and usually run somewhat faster. Here is some detail:
[
    list(group) #The item you want to be in the results list for the current iteration of the loop here:
    for key, group in it.groupby(data, lambda s: s.startswith('+')) #A for loop
    if key #Only include the item for the current loop iteration in the results list if key is True
]

I would suggest doing things step by step.
1) Grab every word from the array separately.
2) Grab the first letter of the word.
3) Look if that is a '+' or '.'
Example code:
import re
class Dark():
    def __init__(self):
        # Array
        x = ['+Hello', '.World', '+Hobbits', '+Dwarves', '.Orcs']
        xPlus = []
        xDot = []
        # Values
        i = 0
        # Look through every word in the array one by one.
        while (i != len(x)):
            # Grab every word (s), and convert to string (y).
            s = x[i:i+1]
            y = '\n'.join(s)
            # Print word
            print(y)
            # Grab the first letter.
            letter = y[:1]
            if (letter == '+'):
                xPlus.append(y)
            elif (letter == '.'):
                xDot.append(y)
            else:
                pass
            # Add +1
            i = i + 1
        # Print lists
        print(xPlus)
        print(xDot)
#Run class
Dark()

Python 'for' loop issue, why are these two variables not adding together properly in my 'for' loop?

I am writing a code snippet for a random algebraic equation generator for a larger project. Up to this point, everything has worked well. The main issue is simple. I combined the contents of a dictionary in sequential order. So for the sake of argument, say the dictionary is exdict = {a:1 , b:2 , c:3 , d:4}; I append its keys and values to a list like this: exlist = [a, b, c, d, 1, 2, 3, 4]. The length of my list is 8, half of which is obviously 4. The algorithm is quite simple: whatever random number is generated between 1 and 4 (or, as Python knows it, index 0-3), if you add half the length of the list to that index, you get the index of the matching value.
I have done research online and on Stack Overflow but cannot find any answer that I can apply to my situation...
Below is the bug-check version of my code. It prints out each variable as it happens. The issue I am having is towards the bottom, under the ### ITERATIONS & SETUP comment. The rest of the code is there so it can be run properly. The primary issue is that a + x should be m, but a + x never equals m; m is always tragically lower.
Bug check code:
from random import randint as ri
from random import shuffle as sh
#def randomassortment():
letterss = ['a','b','x','d','x','f','u','h','i','x','k','l','m','z','y','x']
rndmletters = letterss[ri(1,15)]
global newdict
newdict = {}
numberss = []
for x in range(1,20):
    #range defines max number in equation
    numberss.append(ri(1,20))
for x in range(1,20):
    rndmnumber = numberss[ri(1,18)]
    rndmletters = letterss[ri(1,15)]
    newdict[rndmletters] = rndmnumber
#x = randomassortment()
#print x[]
z = []
# set variable letter : values in list
for a in newdict.keys():
    z.append(a)
for b in newdict.values():
    z.append(b)
x = len(z)/2
test = len(z)
print 'x is value %d' % (x)
### ITERATIONS & SETUP
iteration = ri(2,6)
for x in range(1,iteration):
    a = ri(1,x)
    m = a + x
    print 'a is value: %d' % (a)
    print 'm is value %d' %(m)
    print
    variableletter = z[a]
    variablevalue = z[m]
    # variableletter , variablevalue
Edit: My question is ultimately, why is a + x returning a value that isn't a + x? If you run this code, it will print x, a, and m. m is supposed to be the value of a + x, but for some reason it isn't.

The reason this isn't working as you expect is that your variable x originally holds half the length of the list, but it is reassigned by your for x in range loop, and then you expect it to still equal half the length of the list. You could just change the line to
for i in range(iteration):
instead.
Also note that you could replace all the code in the for loop with the following (after import random; in Python 3, wrap newdict.items() in list()):
variableletter, variablevalue = random.choice(newdict.items())
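To make the fix concrete, here is a minimal sketch of the keys-then-values lookup with a separate name for the half length, so the loop variable cannot overwrite it. The function name pick_pair, the variable half, and the sample dictionary are assumptions for illustration and do not come from the original code.

```python
import random


def pick_pair(mapping):
    """Pick a random key and its value via the keys-then-values list."""
    z = list(mapping.keys()) + list(mapping.values())
    half = len(z) // 2           # kept under its own name, never reused
    a = random.randrange(half)   # index of a key: 0 .. half - 1
    m = a + half                 # index of the matching value
    return z[a], z[m]


newdict = {'a': 1, 'b': 2, 'c': 3, 'd': 4}
for _ in range(3):               # '_' signals the loop variable is unused
    letter, value = pick_pair(newdict)
    print(letter, value)
```

This relies on keys() and values() listing their items in matching order, which holds as long as the dictionary is not modified in between.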

Your problem is that the name x is reused.
Which x are you looking for here?
x = len(z)/2 # This is the first x
print 'x is value %d' % (x)
### ITERATIONS & SETUP
iteration = ri(2,6)
# the for loop rebinds x to each value from range(...), overwriting the first x
for x in range(1,iteration):
    a = ri(1,x)
    m = a + x

Better looping, for string manipulation (python)

If I have this code
s = 'abcdefghi'
for grp in (s[:3],s[3:6],s[6:]):
    print "'%s'"%(grp)
    total = calc_total(grp)
    if (grp==s[:3]):
        # more code than this
        p = total + random_value
        x1 = my_function(p)
    if (grp==s[3:6]):
        # more code than this
        p = total + x1
        x2 = my_function(p)
    if (grp==s[6:]):
        # more code than this
        p = total + x2
        x3 = my_function(p)
If the group is the first group, perform the code for this group; if the group is the second group, perform code using a value generated by the code for the first group; the same applies to the third group, which uses a value generated by the code for the second group.
How can I tidy this up to use better looping?
Thanks

I may have misunderstood what you're doing, but it appears that you want to do something to s[:3] on the first iteration, something different to s[3:6] on the second, and something else again to s[6:] on the third. In other words, that isn't a loop at all! Just write those three blocks of code out one after another, with s[:3] and so on in place of grp.

I must say I agree with Peter in that the loop is redundant. If you are afraid of duplicating code, then just move the repeating code into a function and call it multiple times:
s = 'abcdefghi'
def foo(grp):
    # Anything more you would like to happen over and over again
    print "'%s'"%(grp)
    return calc_total(grp)
def bar(grp, value):
    total = foo(grp)
    # more code than this
    return my_function(total + value)
x1 = bar(s[:3], random_value)
x2 = bar(s[3:6], x1)
x3 = bar(s[6:], x2)
If
# more code than this
contains non-duplicate code, then you must of course move that out of "bar" (which together with "foo" should be given a more descriptive name).

I'd code something like this as follows:
for i, grp in enumerate((s[:3],s[3:6],s[6:])):
    print "'%s'"%(grp)
    total = calc_total(grp)
    # more code that needs to happen every time
    if i == 0:
        # code that needs to happen only the first time
        pass
    elif i == 1:
        # code that needs to happen only the second time
        pass
and so on. The == checks in the question can be misleading if one of the groups "just happens" to be the same as another one, while the enumerate approach runs no such risk.

x = reduce(lambda x, grp: my_function(calc_total(list(grp)) + x),
map(None, *[iter(s)] * 3), random_value)
At the end, you'll have the last x.
Or, if you want to keep the intermediary results around,
x = []
for grp in map(None, *[iter(s)] * 3):
    x.append(my_function(calc_total(list(grp)) + (x or [random_value])[-1]))
Then you have x[0], x[1], x[2].

Get your data into the list you want, then try the following:
output = 0
seed = get_random_number()
for group in input_list:
    total = get_total(group)
    p = total + seed
    seed = my_function(p)
input_list will need to look like ['abc', 'def', 'ghi']. But if you want to extend it to ['abc','def','ghi','jkl','mno','pqr'], this should still work.
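A self-contained sketch of this seed-chaining loop is shown below. The bodies of calc_total and my_function are hypothetical stand-ins (the question never shows them), and the helper names chunks and chain_groups are likewise assumptions made for illustration.

```python
def chunks(s, size):
    """Cut s into consecutive pieces of `size` characters."""
    return [s[i:i + size] for i in range(0, len(s), size)]


def chain_groups(groups, seed, calc_total, my_function):
    """Feed each group's result into the next group's calculation."""
    results = []
    value = seed
    for grp in groups:
        value = my_function(calc_total(grp) + value)
        results.append(value)
    return results


# Hypothetical stand-ins for the question's undefined helpers
def calc_total(grp):
    return sum(ord(ch) for ch in grp)


def my_function(p):
    return p % 100


groups = chunks('abcdefghi', 3)
print(groups)
print(chain_groups(groups, 7, calc_total, my_function))
```

Because the previous result is carried in a single variable, the loop body is the same for every group and works for any number of groups.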
