I'm not sure how to go about this in Python. In searching for this, I have come across itertools but I'm not sure how I might apply it in this case.
What I am trying to do is create a script that can take a string input containing question marks (like AB?D?) and a set of options (ABC, DEF) and output all of the possible combinations, like below.
ABADD, ABADE, ABADF
ABBDD, ABBDE, ABBDF
ABCDD, ABCDE, ABCDF
In searching, I also found this but I'm not entirely sure how I might be able to implement this around my input.
Would it be most efficient to break down the input string into multiple substrings around the question marks (so the above example becomes AB + ? + D + ?)? Would something like list(s) be suitable for this?
Thanks in advance for any help offered.
You can use itertools.product to get the combinations and str.format to merge them into the template string. (First, replace each ? with {} to get format-string syntax.)
import itertools

def combine(template, options):
    template = template.replace('?', '{}')  # 'AB?D?' -> 'AB{}D{}'
    for opts in itertools.product(*options):
        yield template.format(*opts)
Example:
>>> list(combine('AB?D?', ['ABC', 'DEF']))
['ABADD', 'ABADE', 'ABADF', 'ABBDD', 'ABBDE', 'ABBDF', 'ABCDD', 'ABCDE', 'ABCDF']
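The approach suggested in the question, splitting the template around the question marks, also works and skips str.format entirely, so a template that happens to contain literal { or } characters is not misinterpreted. This is a minimal sketch; the name combine_split is hypothetical, and it assumes one option string per ? in left-to-right order.

```python
import itertools


def combine_split(template, options):
    """Yield every filling of the '?' slots in template.

    options holds one string of candidate characters per '?', in order.
    """
    pieces = template.split('?')  # 'AB?D?' -> ['AB', 'D', '']
    if len(pieces) - 1 != len(options):
        raise ValueError('need exactly one option string per "?"')
    for choice in itertools.product(*options):
        # Interleave the fixed pieces with the chosen characters.
        out = [pieces[0]]
        for ch, piece in zip(choice, pieces[1:]):
            out.append(ch)
            out.append(piece)
        yield ''.join(out)


print(list(combine_split('AB?D?', ['ABC', 'DEF'])))
```

It produces the same nine strings in the same order as combine above, because both iterate over itertools.product(*options).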
I have recently stumbled upon a task utilizing some CSV files that are, to say the least, very poorly organized, with one cell containing what should be multiple separate columns. I would like to use this data in a Python script but want to know if it is possible to delete a portion of the row (all of it after a certain point) then write that to a dictionary.
Although I can't show the exact contents of the CSV, it looks like this:
useful. useless useless useless useless
I understand that this will most likely require either a regular expression or an endswith check, but applying that to a whole CSV file is beyond me. Also, the period after "useful" in the CSV is really there (it is not a typo) and should be removed as well.
If you know the character you want to split on, you can use this simple method:
good_data = bad_data.split(".")[0]
good_data = good_data.strip() # remove excess whitespace at start and end
This method always works: split returns a list that always has at least one entry (the full string, if the separator is not found). Using index, on the other hand, raises a ValueError when the separator is missing.
You can also limit the number of splits if necessary using split(".", N).
https://docs.python.org/2/library/stdtypes.html#str.split
>>> "good.bad.ugly".split(".", 1)
['good', 'bad.ugly']
>>> "nothing bad".split(".")
['nothing bad']
>>> stuff = "useful useless"
>>> stuff = stuff[:stuff.index(".")]
ValueError: substring not found
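To go from a single string to the CSV file in the question, one option is the csv module combined with str.partition, which, like split, never raises when the separator is missing. This is a sketch under assumptions: the useful text is the first field of each row, the dictionary is keyed by row number, and "data.csv" is a hypothetical filename.

```python
import csv
import io


def useful_by_row(lines):
    """Map row number -> the text before the first '.' in the row's first cell.

    lines is any iterable of CSV text lines, e.g. an open file object.
    """
    result = {}
    for row_number, row in enumerate(csv.reader(lines)):
        if not row:
            continue  # skip blank lines
        # partition never raises: with no '.', head is the whole cell.
        head, _sep, _rest = row[0].partition(".")
        result[row_number] = head.strip()
    return result


# Stand-in for open("data.csv", newline="") (hypothetical filename).
sample = io.StringIO("useful. useless useless useless useless\nnothing bad\n")
print(useful_by_row(sample))
```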
Actual Answer
OK, then notice that you can index strings just as you index lists; e.g. "this is a very long string but we only want the first 4 letters"[:4] gives "this". If we knew the index of the dot, we could get what you want the same way, and that is exactly what the string method index is for. So in total you do:
stuff = "useful. useless useless useless useless"
stuff = stuff[:stuff.index(".")]
Now stuff is very useful :).
If we are talking about a file containing multiple lines like that, you could do it for each line: split each line at the commas and put everything in a dictionary.
data = {}
with open("./test.txt") as f:
    for i, line in enumerate(f):
        line = line.rstrip("\n")
        if "." in line:  # index would raise ValueError on lines without a dot
            line = line[:line.index(".")]
        for j, col in enumerate(line.split(",")):
            data[(i, j)] = col
How one would do this
Notice that most people would not want to do this by hand. Working with tabular data is a common task, and there is a library called pandas for it. Maybe it would be a good idea to familiarise yourself a bit more with Python before you dive into pandas, though. I think a good point to start is this. Using pandas, your task would look like this:
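import pandas as pd
# comment=".": ignore everything after a "." on each line
# header=None: the file has no header row
df = pd.read_csv("./test.txt", comment=".", header=None)
giving you what is called a DataFrame.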
I have a CSV file with 100,000 rows.
Each row in column A is a sentence made up of both letters and integers.
I want column B to contain only the integers.
I want the new column to be in the same CSV file.
How can I accomplish this?
If I'm understanding your question correctly, I would use .isdigit() to parse the data in column A. I'm frankly not sure what the format of column A is, so I don't know exactly what you would do with this (if you gave more information I could give a more specific answer). Your solution will likely come in a similar form to this:
def find(lines):
    B = []
    for line in lines:
        numbers = [c for c in line if c.isdigit()]
        # current is the concatenation of all digits found in
        # column A from left to right (None if there are none,
        # since int('') would raise ValueError)
        current = int(''.join(numbers)) if numbers else None
        B.append(current)
    return B
Let me know if this makes sense or is even on the right track for your solution. Once again, without knowing what you're trying to do and what A looks like, I'm not sure what your actual goals are.
EDIT
I'm not going to explain the CSV part for you, mainly because there is a fantastic resource and library for it included in Python here. If you have specific questions related to writing CSV files, definitely post them.
It sounds like you essentially want to pull int values out of column A and then add them to a new column B. There are definitely many ways to solve this, but the general form is: for each row, extract the integer, then add it to the new column. I'll list a couple:
Regex: You could use a pattern such as [0-9]+ to pull the digits out of A, use int() to convert that match to an integer, then store those values in B. I'm a sucker for a good regular expression, and this one is fairly straightforward. Regexr is a great resource to learn about regular expressions and test your pattern.
Use an algorithm similar to the one above: it worked before, but I've updated it slightly, and it now returns a list with one number per row of A, each the concatenation of that row's digits from left to right. This is relatively sound, but it doesn't guarantee you get the right integer: if the title contains a number, that will mess things up. It is likely one of the clearer ways of doing this, though.
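A rough sketch of the regex option, combined with the csv module to write column B back into the same file. It assumes a header row, column A first, column B (if it already exists) second, and that the first run of digits in A is the integer you want; the function names are hypothetical.

```python
import csv
import re

DIGITS = re.compile(r"[0-9]+")


def first_int(text):
    """Return the first run of digits in text as an int, or None if there is none."""
    match = DIGITS.search(text)
    return int(match.group()) if match else None


def with_column_b(rows):
    """Return new rows where column B holds the first integer found in column A."""
    header, body = rows[0], rows[1:]
    out = [[header[0], "B"] + header[2:]]
    for row in body:
        if not row:
            continue  # drop blank lines
        value = first_int(row[0])
        out.append([row[0], "" if value is None else str(value)] + row[2:])
    return out


def add_column_b(path):
    """Rewrite the CSV at path in place with the new column B."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))  # 100,000 rows fit comfortably in memory
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(with_column_b(rows))
```

Reading everything into memory before reopening the file for writing is what makes it safe to overwrite the same CSV; for much larger files, writing to a temporary file and renaming it would be the usual alternative.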
Is there a way to write the following expression so that I don't repeat the variables twice? It's very annoying when you have to write it multiple times.
['hello {} {}'.format(A,B) for A,B in product(As,Bs)]
That's exactly what you would do in a for loop anyway, isn't it? Anyway, you can use map:
list(map(lambda x: 'hello {} {}'.format(*x), product(As, Bs)))
I realize now that the argument-unpacking syntax *x might shorten what you wrote above as well (one variable only). If you always intend to apply someFormatString.format(*x), you can shorten this by using a function that generates your formatter:
def mkStr(formatStr):
    return lambda x, f=formatStr: f.format(*x)
Then you can map with
list(map(mkStr('hello {} {}'), product(As, Bs)))
and switch it up with different format strings.
With only a minor change you could get there:
['hello {} {}'.format(*ab) for ab in product(As, Bs)]
using *args magic (argument unpacking).
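Another option, not mentioned in either answer, is itertools.starmap, which does the unpacking for you, so no lambda or loop variable is needed at all. As and Bs below are hypothetical sample inputs.

```python
from itertools import product, starmap

As = ["a", "b"]  # hypothetical sample inputs
Bs = ["x", "y"]

# starmap(f, pairs) calls f(*pair) for each pair, so the bound method
# 'hello {} {}'.format receives A and B as separate arguments.
greetings = list(starmap('hello {} {}'.format, product(As, Bs)))
print(greetings)  # ['hello a x', 'hello a y', 'hello b x', 'hello b y']
```

Switching format strings is just a matter of passing a different bound .format method.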
I have the following parameters in a Python file that is used to send commands pertaining to boundary conditions to Abaqus:
u1=0.0,
u2=0.0,
u3=0.0,
ur1=UNSET,
ur2=0.0,
ur3=UNSET
I would like to place these values inside a list and print that list to a .txt file. I figured I should convert all contents to strings:
List = [str(u1), str(u2), str(u3), str(ur1), str(ur2), str(ur3)]
This works only as long as the list does not contain UNSET, which is a symbolic constant used by Abaqus and is neither an int nor a str. Any ideas how to deal with that? Many thanks!
UNSET is a symbolic constant defined by Abaqus/CAE. It has a member name that returns its string representation, so you might do something like this:
def tostring(v):
    try:
        return v.name  # symbolic constants such as UNSET
    except AttributeError:
        return str(v)  # plain numbers
then, for example:
bc = [0., 1, UNSET]
print("u1=%s u2=%s u3=%s" % tuple([tostring(b) for b in bc]))
u1=0.0 u2=1 u3=UNSET
EDIT: It's simpler than that. After doing things the hard way, I realized the symbolic constant is handled properly by string conversion, so you can just do this:
print("u1=%s u2=%s u3=%s" % tuple(['%s' % b for b in bc]))
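The question also asks how to get the values into a .txt file. A minimal sketch of that step, which should run under both the Python 2 bundled with older Abaqus versions and Python 3. Outside Abaqus there is no real UNSET, so the SymbolicConstant class below is a hypothetical stand-in that only mimics the name attribute described above; inside Abaqus you would delete it and use the real constant. The file name "bcs.txt" is also hypothetical.

```python
class SymbolicConstant(object):
    """Hypothetical stand-in for Abaqus's UNSET, for running outside Abaqus/CAE."""

    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


UNSET = SymbolicConstant("UNSET")  # remove inside Abaqus


def tostring(v):
    try:
        return v.name  # symbolic constants
    except AttributeError:
        return str(v)  # plain numbers


def bc_lines(values, names=("u1", "u2", "u3", "ur1", "ur2", "ur3")):
    """Build one 'name=value' string per boundary-condition component."""
    return ["%s=%s" % (n, tostring(v)) for n, v in zip(names, values)]


def write_bcs(path, values):
    """Write the boundary conditions to a text file, one per line."""
    with open(path, "w") as f:
        f.write("\n".join(bc_lines(values)) + "\n")


# Usage (hypothetical file name): write_bcs("bcs.txt", (u1, u2, u3, ur1, ur2, ur3))
```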
Given a regexp, I would like to generate random data x number of times to test something.
e.g.
>>> print generate_date('\d{2,3}')
13
>>> print generate_date('\d{2,3}')
422
Of course, the objective is to do something a bit more complicated than that, such as phone numbers and email addresses.
Does something like this exist? If it does, does it exist for Python? If not, any clue/theory I could use to do that?
Pyparsing includes this regex inverter, which returns a generator of all the strings matching a simple regex. Here are some of the test cases from that module:
[A-C]{2}\d{2}
#|TH[12]
#(#|TH[12])?
#(#|TH[12]|AL[12]|SP[123]|TB(1[0-9]?|20?|[3-9]))?
#(#|TH[12]|AL[12]|SP[123]|TB(1[0-9]?|20?|[3-9])|OH(1[0-9]?|2[0-9]?|30?|[4-9]))?
(([ECMP]|HA|AK)[SD]|HS)T
[A-CV]{2}
A[cglmrstu]|B[aehikr]?|C[adeflmorsu]?|D[bsy]|E[rsu]|F[emr]?|G[ade]|H[efgos]?|I[nr]?|Kr?|L[airu]|M[dgnot]|N[abdeiop]?|Os?|P[abdmortu]?|R[abefghnu]|S[bcegimnr]?|T[abcehilm]|Uu[bhopqst]|U|V|W|Xe|Yb?|Z[nr]
(a|b)|(x|y)
Edit:
To do your random selection, create a list (once!) of your permutations, and then call random.choice on the list each time you want a random string that matches the regex, something like this (untested):
import random
import invRegex  # invRegex.py from the pyparsing examples

class RandomString(object):
    def __init__(self, regex):
        self.possible_strings = list(invRegex.invert(regex))  # enumerate once
    def random_string(self):
        return random.choice(self.possible_strings)
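To try the enumerate-once, choose-many idea without installing anything, here is a sketch in which a hypothetical hand-written enumerator for \d{2,3} stands in for invRegex.invert. It is not pyparsing's code; it only shows why building the list once and calling random.choice afterwards is cheap. Note that this only suits patterns with a finite, reasonably small set of matches.

```python
import itertools
import random
import re


def all_digit_strings(min_len, max_len):
    r"""Hypothetical hand-written inverter for \d{min_len,max_len}."""
    for length in range(min_len, max_len + 1):
        for digits in itertools.product("0123456789", repeat=length):
            yield "".join(digits)


class RandomString(object):
    def __init__(self, strings):
        self.possible_strings = list(strings)  # enumerate once

    def random_string(self):
        return random.choice(self.possible_strings)


gen = RandomString(all_digit_strings(2, 3))  # 100 + 1000 candidates
sample = [gen.random_string() for _ in range(5)]
print(sample)
```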
There is a post on the Python mailing list about a module that generates all permutations of a regex. I'm not so sure how you might go about randomising it though. I'll keep checking.
I will probably be flogged for suggesting this, but Perl has a module that does exactly this. You might want to take a look at its code to see how to implement it in Python:
http://p3rl.org/String::Random