I want to manipulate some text in Python and then embed the result as JavaScript data. The text must appear in my output file exactly in the format shown below, not only when it is printed.
I have text:
""text""
and I want:
\"text\"
with open('phase2.2.1.csv', 'w', newline='') as csvFile:
    writer = csv.writer(csvFile)
    for b in batches:
        writer.writerow([b.replace('\n', '').replace('""', '\\"')])
Unfortunately, the above yields
\""text\""
Any help will be much appreciated.
I would suggest:
.replace('""', '\\"')
And it really works, see:
In [8]: x = '""text""'
In [9]: print(x.replace('""', '\\"'))
\"text\"
If what you're trying to generate is JSON-encoded strings, the right way to do that is to use the json module:
text = json.dumps(text)
If you're trying to generate actual JavaScript source code, that's still almost the right answer. JSON is very close to being a subset of JavaScript—a lot closer than a quick&dirty fix for one error you happen to have noticed so far is going to be.
If you actually want to generate correct JS code for any possible string, you have to deal with the corner cases where JSON is not quite a subset of JS. But nobody ever does (it took years before anyone even noticed the difference in the specs).
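A minimal sketch of the difference, assuming the stray doubled quotes in the question come from csv.writer itself: by default it wraps any field containing a quote character in quotes and doubles each embedded quote, so a manual replace gets re-escaped on the way out. json.dumps produces the escaped form in one step. The sample strings are only illustrations.

```python
import csv
import io
import json

raw = '""text""'
escaped = raw.replace('""', '\\"')  # the string is now \"text\"

# csv.writer sees quote characters in the field, so it quotes the whole
# field and doubles every embedded quote again.
buffer = io.StringIO()
csv.writer(buffer, lineterminator='\n').writerow([escaped])
csv_line = buffer.getvalue()

# json.dumps escapes quotes, backslashes and control characters itself.
as_json = json.dumps('"text"')

print(csv_line, end='')
print(as_json)
```

If the goal is a file of JavaScript-ready strings, writing the json.dumps result with a plain file write avoids the extra layer of CSV quoting.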
I'm about halfway through the Automate the Boring Stuff with Python textbook and video tutorials. However, I have a big project at work where I need to autopopulate 60 Chemical Purchase Review documents that we can't seem to find. Rather than fill them out individually, I'd like to use what I've learned so far. I've had to jump ahead in chapters, but I can't figure out how to get past the last line of code.
Basically, I have an Excel spreadsheet with four columns of information that need to be entered into certain areas of a Word document form template.
I have "AAAA, BBBB..." in the Word doc as placeholders to be found and replaced.
import openpyxl,os,docx,re
os.chdir(r'C:\Users\MYUSERNAME\OneDrive\Documents\Programming\ChemInv')
wb = openpyxl.load_workbook('cheminv.xlsx')
sheet = wb.get_sheet_by_name('Sheet1')
doc = docx.Document('ChemPurchaseForm_.docx')
fillObj = ('AAAA','BBBB','CCCC','DDDD')
for a in range(1,61):
    for b in range(1,5):
        fill = sheet.cell(row=a,column=b).value
        for x in range(len(fillObj)):
            inputRegex = re.compile(fillObj[x])
            inputRegex.sub(fill,doc)
    doc.save('ChemPurcaseForm_' + fill + '.docx')
I'm getting this error:
Traceback (most recent call last):
  File "C:/Users/MYUSERNAME/OneDrive/Documents/Programming/ChemInv/autofill.py", line 15, in <module>
    inputRegex.sub(fill,doc)
TypeError: expected string or bytes-like object
I'm assuming that either the "fill" variable or "doc" variable are not binary or string values?
Thank you in advance for help!
To debug this, you'll need to figure out which of the values are not binary or string values. A convenient way is to begin adding print statements for each value. For instance, you might try
print(fill)
print(doc)
print(type(fill))
print(type(doc))
I don't know exactly how the docx module works, but two hypotheses occur to me:
1. doc is not the appropriate type for the sub function; you'll have to cast the object to something different, or access it a different way if that's the case.
2. fill is None. That's easier to fix, it means you're not reading the Excel document properly.
Reading the docx documentation, I lean towards hypothesis 1: the document doesn't look like a string or bytes object (or anything compatible with one), so the sub method can't operate on it. If that's correct, read the python-docx docs for details on what you need to do instead. I'd explore which properties exist on your document; there seem to be some for accessing the text directly.
Good luck!
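As a rough sketch of the first hypothesis: python-docx exposes a document's text through doc.paragraphs, and each paragraph's runs carry a string text attribute, so the regex can be applied there instead of to the Document object. The helper name fill_placeholders, the stand-in document, and the sample values are hypothetical; the stand-in only exists so the sketch runs without python-docx installed.

```python
import re
from types import SimpleNamespace


def fill_placeholders(doc, values):
    """Replace placeholder text in every run of every paragraph.

    `doc` is anything exposing .paragraphs -> .runs -> .text (the shape a
    python-docx Document has). `values` maps placeholder -> replacement.
    """
    pattern = re.compile('|'.join(re.escape(key) for key in values))
    for paragraph in doc.paragraphs:
        for run in paragraph.runs:
            # run.text is a str, so re.sub accepts it (the Document is not).
            # str() guards against non-string cell values such as numbers.
            run.text = pattern.sub(lambda m: str(values[m.group(0)]), run.text)
    return doc


# Hypothetical stand-in for docx.Document('ChemPurchaseForm_.docx').
fake_doc = SimpleNamespace(paragraphs=[
    SimpleNamespace(runs=[SimpleNamespace(text='Chemical: AAAA, CAS: BBBB')]),
])
fill_placeholders(fake_doc, {'AAAA': 'Acetone', 'BBBB': '67-64-1'})
print(fake_doc.paragraphs[0].runs[0].text)
```

One caveat: Word sometimes splits a placeholder across several runs, in which case a per-run replacement like this will not find it. Text inside tables is also reached through the table cells rather than through doc.paragraphs.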
I have a text file that I need to read, identify some parts to change, and write to a new file. Here's a snippet of what the text file (which is about 600 lines long) would look similar to:
<REAPER_PROJECT 0.1 "4.731/x64" 1431724762
RIPPLE 0
RECORD_PATH "Audio" ""
<RECORD_CFG
ZXZhdxgA
>
<APPLYFX_CFG
>
LOCK 1
<METRONOME 6 2
VOL 0.25 0.125
FREQ 800 1600 1
BEATLEN 4
SAMPLES "" ""
>
>
So, for example, I'd need to change "LOCK 1" to "LOCK 0". Right now I'm reading the file line by line, looking for when I hit the "LOCK" keyword and then instead of writing "LOCK 1", I write "LOCK 0" (all other lines are written as is). Pretty straightforward.
Part of this seems kinda messy to me, though, as sometimes when I have to use nested for loops to parse a sub-section of the text file I run into weirdness dealing with the file pointer off-by-one errors - not a biggie and manageable, but I was kinda looking for some opinions on this. Instead, I was wondering if it would make more sense to read the entire file into a list, parse through the list, looking for keywords to change, updating those specific lines in the list, and then writing the whole list to the new file. It seems like I would have a bit more control over things as I wouldn't have to process the file in a linear fashion which I'm kinda forced to do now.
So, I guess the last sentence kinda justified why it could be advantageous to pull it all into a list, process the list, and then write it out. I'm kinda curious how others with more programming experience (as mine is somewhat limited) would tackle this kind of issue. Any other ways that would prove even more efficient?
Btw, I didn't generate this file - other software did, and I don't have any communication with the developer so I have no way of knowing what they're using to read/write the file. I'd absolutely love it if I had a neat reader that could read the file and populate it into variables and then rewrite it out, but for me to code something that would do that would be overkill for what I'm trying to accomplish.
I'm kinda tempted to rewrite my script to read it into a list as it seems like it would be a better way to go, but I thought I'd ask people what they thought before I did. My version works, but I don't mind going through the motions, either, as it's a good lesson regardless. I figured this could also be a case where there are always different ways to tackle a problem, but I'd like to try and be as efficient as possible.
UPDATE
So, I probably should have mentioned this, but I was still trying to figure out what to ask - while I need to find certain elements and change them, I can only find those elements by finding their header (i.e. "ITEM") and then replacing the element within the block. So it'll be something like this:
<METRONOME
NAME Clicky
SPEED fast
>
<ITEM
LOOP 0
NAME Mike
FILE something.wav
..
>
<ITEM
LOOP 1
NAME Joe
FILE anotherfile.wav
..
>
So the only way to identify the correct block of data is to first find the ITEM header, then keep reading until I find the NAME element, and then update the file name for that whole ITEM block. There are other elements within that block that I need to update, and the name header isn't the first item. Also, I can't assume that the name element also exists just in ITEM blocks.
So maybe this really has less to do with reading it into memory and more of how to properly parse this type of file? Or are there some benefits to reading it into memory and being easier to manipulate? Sorry I didn't clarify that in the original question...
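If it has only ~600 lines, you can read it all into memory:
replace = [('LOCK 1', 'LOCK 0')]  # add more (old, new) pairs as needed
with open('read.txt') as r:
    read = r.read()
for old, new in replace:
    # str.replace returns a new string, so the result must be assigned back
    read = read.replace(old, new)
with open('write.txt', 'w') as w:
    w.write(read)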
Here's my answer using regex:
import re
text = """<REAPER_PROJECT 0.1 "4.731/x64" 1431724762
RIPPLE 0
RECORD_PATH "Audio" ""
<RECORD_CFG
ZXZhdxgA
>
<APPLYFX_CFG
>
LOCK 1
<METRONOME 6 2
VOL 0.25 0.125
FREQ 800 1600 1
BEATLEN 4
SAMPLES "" ""
>
>
"""
print(re.sub(r"LOCK 1\D", "LOCK 0\n", text))
If you're interested in writing the file to disk:
with open("written.txt", 'w') as f:
    f.write(re.sub(r"LOCK 1\D", "LOCK 0\n", text))
EDIT
You said that you wanted it to be more flexible?
Okay, I tried to make an example, but for that I would need more information about your setup. So instead, I'll point you to some resources that could help. They will also be useful if you ever want to change or add anything, because you'll understand what to do.
https://www.youtube.com/watch?v=DRR9fOXkfRE # How regex works for python in general.
https://regexone.com/references/python # Some information about regex and python.
https://stackoverflow.com/a/5658439/4837005 # An example of using regex to replace a string.
I hope this helps.
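For the block-based case described in the UPDATE, one possible approach is to read all lines into a list, track which block is open, and only rewrite FILE inside an ITEM block whose NAME matches. This is a sketch under assumptions: the function name update_item_file is made up, blocks open with a line starting with "<" and close with a line that is just ">", and NAME values are compared as plain text.

```python
def update_item_file(lines, item_name, new_file):
    """Return a copy of `lines` with FILE changed inside the ITEM block
    whose NAME equals `item_name`. Other blocks are left untouched."""
    result = list(lines)
    stack = []  # (header, start_index) for every block currently open
    for index, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith('<'):
            words = stripped[1:].split()
            stack.append((words[0] if words else '', index))
        elif stripped == '>' and stack:
            header, start = stack.pop()
            if header != 'ITEM':
                continue
            block = range(start + 1, index)
            # Only touch the block if it contains the NAME we want.
            if any(result[i].strip() == f'NAME {item_name}' for i in block):
                for i in block:
                    if result[i].split()[:1] == ['FILE']:
                        text = result[i]
                        indent = text[:len(text) - len(text.lstrip())]
                        ending = '\n' if text.endswith('\n') else ''
                        result[i] = f'{indent}FILE {new_file}{ending}'
    return result


sample = """<METRONOME
NAME Clicky
SPEED fast
>
<ITEM
LOOP 0
NAME Mike
FILE something.wav
>
<ITEM
LOOP 1
NAME Joe
FILE anotherfile.wav
>
""".splitlines(keepends=True)

updated = update_item_file(sample, 'Joe', 'renamed.wav')
print(''.join(updated))
```

Because the whole file sits in a list, the block can be inspected after its closing ">" is seen, which sidesteps the file-pointer bookkeeping that nested loops over the file object require. The NAME line under METRONOME is never considered because only ITEM blocks are examined.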
I'm currently going through the Udacity course on data analysis in python, and we've been using the unicodecsv library.
More specifically we've written the following code which reads a csv file and converts it into a list. Here is the code:
def read_csv(filename):
    with open(filename, 'rb') as f:
        reader = unicodecsv.DictReader(f)
        return list(reader)
In order to get my head around this, I'm trying to figure out how the data is represented in the dictionary and the list, and I'm very confused. Can someone please explain it to me.
For example, one thing I don't understand is why the following throws an error
enrollments['cancel_date']
While the following works fine:
for enrollment in enrollments:
    enrollment['cancel_date'] = parse_date(enrollment['cancel_date'])
Hopefully this question makes sense. I'm just having trouble visualizing how all of this is represented.
Any help would be appreciated.
Thanks.
I also landed here because of some trouble with the course and found this unanswered. You have probably solved it by now, but I'm answering anyway in case someone else finds it helpful.
As we all know, dictionaries can be accessed like
dictionary_name['key']
so you might expect that
enrollments['cancel_date'] would also work.
But if you do something like
print enrollments
you will see the structure
[{u'status': u'canceled', u'is_udacity': u'True', ...}, {}, ... {}]
If you look at the brackets, it is a list of dictionaries. You may argue it is a list of lists. Try it.
print enrollments[0][0]
You'll get an error! KeyError.
So, it's a collection of dictionaries. How do you access them? Zoom in on any single dictionary (that is, one row of the csv) with enrollments[n].
Now you have a dictionary, and you can freely use its keys.
print enrollments[0]['cancel_date']
Now coming to your loop,
for enrollment in enrollments:
enrollment['cancel_date'] = parse_date(enrollment['cancel_date'])
Here, enrollment is the loop variable that takes each element of enrollments in turn: enrollments[0], enrollments[1] ... enrollments[n].
So on every iteration enrollment holds one dictionary from enrollments, which is why enrollment['cancel_date'] works while enrollments['cancel_date'] does not.
Lastly, I want to add one more thing, which is the reason I came to this thread.
What is the meaning of "u" in u'..' ? Ex: u'cancel_date': u'11-02-19'.
The answer is that the value is a Unicode string. The u is not part of the string; it is Python notation. Unicode is a standard that defines the characters and symbols of the world's writing systems.
This happens because the unicodecsv package does not try to work out and convert the type of each item in the csv file. It reads everything as Unicode strings to preserve all characters. That's why Caroline and you defined and used parse_date() and other functions to convert the Unicode strings to the desired datatype. This is all part of the Data Wrangling process.
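To make the list-of-dictionaries shape concrete, here is a small sketch. It uses the standard library csv module in Python 3 rather than unicodecsv, and the sample rows and this parse_date are stand-ins, not the course's data or helper.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample standing in for the enrollments csv file.
sample = io.StringIO(
    "account_key,status,cancel_date\n"
    "448,canceled,2015-01-14\n"
    "700,current,\n"
)
enrollments = list(csv.DictReader(sample))  # a list with one dict per row


def parse_date(value):
    """Turn 'YYYY-MM-DD' into a datetime, or an empty string into None."""
    return datetime.strptime(value, '%Y-%m-%d') if value else None


for enrollment in enrollments:
    # `enrollment` is one row (a dict), so string keys work here.
    enrollment['cancel_date'] = parse_date(enrollment['cancel_date'])

print(type(enrollments).__name__)      # the outer container
print(enrollments[0]['status'])        # index the list, then the dict
print(enrollments[0]['cancel_date'])
```

Indexing the outer list with a string, as in enrollments['cancel_date'], raises a TypeError because lists only accept integer indices or slices.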
I'm new to programming, and also to this site, so my apologies in advance for anything silly or "newbish" I may say or ask.
I'm currently trying to write a script in python that will take a list of items and write them into a csv file, among other things. Each item in the list is really a list of two strings, if that makes sense. In essence, the format is [[Google, http://google.com], [BBC, http://bbc.co.uk]], but with different values of course.
Within the CSV, I want this to show up as the first item of each list in the first column and the second item of each list in the second column.
This is the part of my code that I need help with:
with open('integration.csv', 'wb') as f:
    writer = csv.writer(f, delimiter=',', dialect='excel')
    writer.writerows(w for w in foundInstances)
For whatever reason, it seems that the delimiter is being ignored. When I open the file in Excel, each cell has one list. Using the old example, each cell would have "Google, http://google.com". I want Google in the first column and http://google.com in the second. So basically "Google" and "http://google.com", and then below that "BBC" and "http://bbc.co.uk". Is this possible?
Within my code, foundInstances is the list in which all the items are contained. As a whole, the script works fine, but I cannot seem to get this last step. I've done a lot of looking around within stackoverflow and the rest of the Internet, but I haven't found anything that has helped me with this last step.
Any advice is greatly appreciated. If you need more information, I'd be happy to provide you with it.
Thanks!
In your code on pastebin, the problem is here:
foundInstances.append(['http://' + str(num) + 'endofsite' + ', ' + desc])
Here, for each row in your data, you create one string that already has a comma in it. That is not what you need for the csv module. The CSV module makes comma-delimited strings out of your data. You need to give it the data as a simple list of items [col1, col2, col3]. What you are doing is ["col1, col2, col3"], which already has packed the data into a string. Try this:
foundInstances.append(['http://' + str(num) + 'endofsite', desc])
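A quick way to see the difference, sketched in Python 3 with an in-memory buffer instead of a real file; the to_csv helper and the sample rows are illustrative only.

```python
import csv
import io


def to_csv(rows):
    """Write rows with csv.writer and return the resulting text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator='\n').writerows(rows)
    return buffer.getvalue()


# One list item per column: the writer inserts the delimiter itself.
found_instances = [['Google', 'http://google.com'], ['BBC', 'http://bbc.co.uk']]

# One pre-joined string per row: the writer sees a single column and
# quotes it because it contains the delimiter.
packed = [[', '.join(row)] for row in found_instances]

print(to_csv(found_instances))
print(to_csv(packed))
```

The second output is what Excel shows as a single cell per row, which matches the symptom described in the question.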
I just tested the code you posted with
foundInstances = [[1,2],[3,4]]
and it worked fine. It definitely produces the output csv in the format
1,2
3,4
So I assume that your foundInstances has the wrong format. If you construct the variable in a complex manner, you could try to add
import pdb; pdb.set_trace()
before the actual variable usage in the csv code. This lets you inspect the variable at runtime with the python debugger. See the Python Debugger Reference for usage details.
As a side note, according to the PEP-8 Style Guide, the name of the variable should be found_instances in Python.
This question is a bit of an ask, but it's been giving me a headache all day (as I am fairly new to programming).
Basically I have a huge list of IDs (named pk's), and I need to extract all of them because they are surrounded by other text.
How would I go about retrieving all of the IDs? By the way, each ID looks like this:
"pk":12345678
"pk":123456789
The ID is either an 8 or 9 digit number.
Thanks a lot guys, any help would be appreciated!
Editor's note: Asker did post his full json data in a comment to this answer.
ids = [var["pk"]]
where var is the variable of your JSON
If you clarify your JSON a little more I might be able to make this more precise.
I'd just use JSONPath. A simple, but extremely general way to extract all the ids would be this:
>>> from jsonpath import jsonpath
>>> from json import loads
>>> instagram_pop = open("instagram_popular_list.json", "r").read()
>>> instagram_data = loads(instagram_pop)
>>> jsonpath(instagram_data, '$..id')[:3]
[u'234148392791340801_11305924', u'234098919041318605_2364270', u'234153616185741448_1907035']
Of course, since your data is flat, you can get away with a direct loop, such as:
[item['id'] for item in instagram_data['items']]
but I have a feeling you have more struct parsing to do, so I think jsonpath is a more flexible answer.
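If installing jsonpath is not an option, a recursive walk over the parsed data gives a similar "find this key anywhere" result with only the standard library. This is a sketch; the function name find_values and the sample JSON are made up, and the key is "pk" as in the question.

```python
import json


def find_values(node, key):
    """Yield every value stored under `key` anywhere in parsed JSON."""
    if isinstance(node, dict):
        for name, value in node.items():
            if name == key:
                yield value
            # Keep descending: the value may itself contain the key.
            yield from find_values(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_values(item, key)


# Hypothetical data shaped like the "pk":12345678 snippets in the question.
sample = json.loads(
    '{"items": [{"pk": 12345678, "user": {"pk": 123456789}}, {"pk": 87654321}]}'
)
pks = list(find_values(sample, 'pk'))
print(pks)
```

Parsing first and then walking the structure avoids matching "pk" inside unrelated string values, which a plain text search over the raw file could do.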