String splitting in python in CSV columns

I am working with a CSV that has a many-to-one relationship, and I have two problems I need help solving. The first is that each string is set up like this:
thisismystr=thisisanemail#addy.com,blah,blah,blah, startnewCSVcol
So I need to split the string twice, once on = and once on , because I am trying to get the portion that is an e-mail address (thisisanemail#addy.com). So far I have figured out how to split the string on the = using something like this:
s = "thisismystr=thisisanemail#addy.com,blah,blah,blah"
print(s.split("="))
The second element of the result is "thisisanemail#addy.com,blah,blah,blah", which still leaves the ,blah,blah,blah portion to be removed. After a bit of research I am stumped, as nothing explains how to remove from the middle, just the first part or the last part. Does anyone know how to do this?
For the second part, I need to do this for multiple lines, so this is more of an advice question. Is it best to plug this into a variable and loop through (like i = 1 for i, #endofCSV do splitcmd), or is there a more efficient way to do this? I am more familiar with Lua, and I am learning that the more I work with Python, the more it differs from Lua.
Please help. Thanks!

Does this solve your problem?
#!/usr/bin/env python
#-*- coding:utf-8 -*-
myString = 'thisismystr=thisisanemail#addy.com,blah,blah,blah'
a = myString.split('=')
b = []
for i in a:
    # Split each '='-separated piece again on ',' and collect all the parts
    b.extend(i.split(','))
print(b)

I believe you want the email out of strings in this format: 'thisismystr=thisisanemail#addy.com,blah,blah,blah'
This is how you would do that:
s = 'thisismystr=thisisanemail#addy.com,blah,blah,blah'
# Take the text after '=', then keep only the part before the first ','
email = s.split('=')[1].split(',')[0]
print(email)
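For the second part of the question (doing this for many lines), a plain for loop or list comprehension over the lines is the usual approach. The following is a minimal sketch, assuming every line has the same key=email,... layout; the sample lines and the helper name extract_email are made up for illustration.

```python
def extract_email(line):
    """Return the text between the first '=' and the next ','."""
    # partition splits once and never raises, even if the separator is missing
    _, _, after_equals = line.partition("=")
    email, _, _ = after_equals.partition(",")
    return email.strip()


# Hypothetical sample data standing in for the rows of the real CSV
lines = [
    "thisismystr=thisisanemail#addy.com,blah,blah,blah",
    "otherstr=second#addy.com,foo,bar",
]

emails = [extract_email(line) for line in lines]
print(emails)
```

With a real file, the same function could be applied to each line read from the open file object instead of the hard-coded list.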

Related

How do I write and structure code in most efficient way possible?

A few weeks ago I needed a crawler for data collection and sorting, so I started learning Python.
The same day I wrote a simple crawler, but the code looked ugly as hell, mainly because I don't know how to do certain things and I don't know how to google them properly.
Example:
Instead of deleting [, ], ' and , in one line, I did:
extra_nr = extra_nr.replace("'", '')
extra_nr = extra_nr.replace("[", '')
extra_nr = extra_nr.replace("]", '')
extra_nr = extra_nr.replace(",", '')
That's because I couldn't work with the list object directly, and when I did str(list object) it looked like ['this', 'and this'].
Now I'm creating a Discord bot that will upload the data I feed it to a Google spreadsheet. The code is long and ugly, and it takes 2-3 seconds to start the bot (I don't know if this is normal; it seems that the more I write, the longer it takes to start, which makes me think the code is garbage). Sometimes it works, sometimes it doesn't.
My question is: how do I know that I wrote something good? If I just keep adding code like in the example, how will it affect my program? If I have really long code, do I split it up and call the parts only when they are needed, or how does that work?

tl;dr to get good at Python and write good code, write a lot of Python and read other people's code. Learn multiple approaches to different problem types and get a feel for which to use and when. It's something that comes over time with a lot of practice. As far as resources, I highly recommend the book "Automate the Boring Stuff with Python".
As for your code sample, you could use translate for this:
def strip(my_string):
    bad_chars = [*"[],'"]
    return my_string.translate({ord(c): None for c in bad_chars})
translate does a character by character translation of the string given a translation table, so you create a small translation table with the characters you don't want set to None.
The list of characters you don't want is created by unpacking (splatting) a string of the characters.
>>> [*"abc"] == ["a", "b", "c"]
True
Another option would be using comprehensions:
def strip(my_string):
    bad_chars = [*"[],'"]
    return "".join(c for c in my_string if c not in bad_chars)
Here we use a generator expression, (x for x in y), which has the same shape as a list comprehension, to yield each character of the string unless it appears in bad_chars. We then join the remaining characters into a string that doesn't have the specified characters in it.

You will definitely improve quickly from reading (or listening) up on Python best practices from resources like Real Python and Talk Python To Me.
Meanwhile, I'd recommend starting using some code analysers like pylint and bandit as part of your regular workflow.
In any case, welcome to the world of Python and enjoy! :-)

You can use maketrans() to define characters to remove (3rd parameter):
def clean(S): return S.translate(str.maketrans("","","[],'"))
clean("A['23']") # 'A23'
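The underlying problem in the question, turning a list into text without brackets and quotes, can often be avoided by joining the list items directly instead of calling str() on the list and cleaning up afterwards. A small sketch, assuming the list holds strings (the sample values and the separator are illustrative):

```python
extra_nr = ["this", "and this"]

# str() on a list produces its repr, brackets and quotes included
as_repr = str(extra_nr)

# join() builds the text from the items themselves, so there is nothing to strip
joined = ", ".join(extra_nr)
print(joined)

# Non-string items need converting to str first
numbers = [1, 2, 3]
joined_numbers = " ".join(str(n) for n in numbers)
print(joined_numbers)
```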

Issue using a variable with an r-string in Python

Fairly new to Python, and I've got a batch job from which I now have to start saving some extracts out to a company SharePoint site. I've searched around and cannot seem to find a solution to the issue I keep running into. I need to pass a date into the filename, and was first having issues with using a normal string. If I just type out the entire thing as a raw string, I get the output I want:
x = r"\\mnt4793\DavWWWRoot\sites\GlobalSupply\Plastics\DataExtracts\2021-02-15_aRoute.xlsx"
print (x)
The output is: \\mnt4793\DavWWWRoot\sites\GlobalSupply\Plastics\DataExtracts\2021-02-15_aRoute.xlsx
However, if I break the string into its parts so I can get a parameter in there, I wind up having to add an extra double quote to the "x" string to keep the code from running into a "SyntaxError: EOL while scanning string literal" error:
x = r"\\mnt4793\DavWWWRoot\sites\GlobalSupply\Plastics\DataExtracts\""
timestamp = date_time_obj.date().strftime('%Y-%m-%d')
filename = "_aRoute.xlsx"
print (x + timestamp + filename)
But the output I get passes that unwanted double quote into my string: \\mnt4793\DavWWWRoot\sites\GlobalSupply\Plastics\DataExtracts\"2021-02-15_aRoute.xlsx
The syntax I need is clearly escaping me, I'm just trying to get the path built so I can save the file itself. If it happens to matter, I'm using pandas to write the file:
data = pandas.read_sql(sql, cnxn)
data.to_excel(string_goes_here)
Any help would be greatly appreciated!

Per the comment from @Matthias, it turns out that a raw string literal can't end with a single backslash (more precisely, with an odd number of backslashes). The quick workaround, therefore, was:
x = r"\\mnt4793\DavWWWRoot\sites\GlobalSupply\Plastics\DataExtracts" + "\\"
The comment from @sammywemmy also linked to what looks to be a much more thorough solution.
Thank you both!
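Another way to avoid the trailing backslash altogether is to let a path-joining function insert the separator. This is a sketch using the standard-library ntpath module (the Windows flavour of os.path, importable on any platform); the fixed date is a stand-in for date_time_obj from the question.

```python
import datetime
import ntpath

# Raw string without a trailing backslash, so it is a valid literal
base = r"\\mnt4793\DavWWWRoot\sites\GlobalSupply\Plastics\DataExtracts"

# Stand-in for date_time_obj.date() in the question
timestamp = datetime.date(2021, 2, 15).strftime("%Y-%m-%d")
filename = timestamp + "_aRoute.xlsx"

# ntpath.join adds the single backslash between directory and file name
full_path = ntpath.join(base, filename)
print(full_path)
```

The resulting string could then be passed to data.to_excel() in place of string_goes_here.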

How to prevent % signs from ignoring the following 2 characters

I am trying to use requests to get a link's content:
r = requests.get('https://exampleurl.com/search/user_agent=Mozilla%2F5.0(Windows+NT+10.0%3B+Win64%3B+x64)+AppleWebKit%2F537.36+(KHTML,+like+Gecko)+Chrome%2F84.0.4147.105+Safari%2F537.36')
The % signs seem to swallow the following 2 characters, so the endpoint becomes invalid and returns nothing.
Probably a very beginner question, but any help is appreciated. :)

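Python has a feature called raw strings. You create one by prefixing the string literal with an r. Note that the prefix only changes how backslashes are treated; a % sign has no special meaning in a Python string literal, so if the request still fails, the cause is likely elsewhere. In your case, it would be: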
r = requests.get(r'https://exampleurl.com/search/user_agent=Mozilla%2F5.0(Windows+NT+10.0%3B+Win64%3B+x64)+AppleWebKit%2F537.36+(KHTML,+like+Gecko)+Chrome%2F84.0.4147.105+Safari%2F537.36')
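A quick way to see what the r prefix does and does not change: sequences such as %2F are URL percent-encoding rather than Python escapes, so they are stored in the string unchanged and only decoded by whatever processes the URL. A minimal sketch using only the standard library (the shortened URL is illustrative):

```python
from urllib.parse import quote, unquote

plain = 'https://exampleurl.com/search/user_agent=Mozilla%2F5.0'
raw = r'https://exampleurl.com/search/user_agent=Mozilla%2F5.0'

# The r prefix makes no difference here: neither literal contains a backslash
same = plain == raw

# %2F is the percent-encoded form of "/"
decoded = unquote('Mozilla%2F5.0')

# quote() goes the other way; safe='' makes it encode "/" as well
encoded = quote('Mozilla/5.0', safe='')

print(same, decoded, encoded)
```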

How to fix certain 'Line too long' errors in a Python file?

I have the following line inside a for loop (i.e. it's indented by 4 spaces):
abcdefgh_ijklm_nopqrstuvwxy = abcdefgh_ijklm_nopqrstuvwxy.append(abc_de)
The line is 80 characters long. How can I split it up so that I do not get a 'Line too long' notification? Please note that I've changed the variable names for privacy reasons (not my code), but I can't modify the name of the variable, so naming it something shorter to fix the problem is not a viable option.
As a secondary question, how would I split up a formatted string of the form:
data_header = f"FILE_{heading_angles}_{moment_of_inertia}_{mass_of_object}_{type_of_object}"
to span multiple lines?
I already tried
data_header = f"FILE_{heading_angles}_{moment_of_inertia}_"
f"{mass_of_object}_{type_of_object}"
but that gives me an indentation error.
Any kind of help would be greatly appreciated!

Hope that these points answer your questions:
To simplify your expressions, try to replace the variables with simpler ones before the expressions. This may be inappropriate, if more serious operations are needed. For example:
a = abcdefgh_ijklm_nopqrstuvwxy
b = abcdefgh_ijklm_nopqrstuvwxy.append(abc_de)
a = b
In your case, try using a backslash (\) at the end of the line. For example:
if a == True and \
        b == False:
    pass
Here is a link from another discussion on a similar matter.
Hope this helps.

For your data_header example, you need parentheses.
For example:
data_header = (
    f"FILE_{heading_angles}_{moment_of_inertia}_"
    f"{mass_of_object}_{type_of_object}"
)
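The same parenthesized continuation also works for the assignment in the first part of the question. The sketch below uses a made-up Accumulator class only so the example runs on its own; it assumes, as the question's line implies, that append returns the updated object.

```python
class Accumulator:
    """Hypothetical stand-in whose append() returns a new object."""

    def __init__(self, items=()):
        self.items = tuple(items)

    def append(self, item):
        return Accumulator(self.items + (item,))


abcdefgh_ijklm_nopqrstuvwxy = Accumulator()

for abc_de in range(3):
    # Breaking inside the call's parentheses needs no backslash
    abcdefgh_ijklm_nopqrstuvwxy = abcdefgh_ijklm_nopqrstuvwxy.append(
        abc_de
    )

print(abcdefgh_ijklm_nopqrstuvwxy.items)
```

If the variable is actually a built-in list, note that list.append returns None, so the assignment itself would need to go rather than be wrapped.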

how to split a string after a specific character?

I want to split a list of items on a specific symbol.
I have used the following code
data = "launch, 7:30am, watch tv, workout, snap, running, research study and learn"
items = data.split(',')
print(', '.join([items[0], items[-1].split('—')[1]]))
What I want is to split this data and print it like this:
launch, study and learn
But a problem appears when the data changes like this:
data = "launch, 7:30am, watch tv, workout, snap, running, research — discussion, study and learn"
items = data.split(',')
print(', '.join([items[0], items[-1].split('—')[1]]))
In this case I expected to get this result:
launch, discussion, study and learn
Instead, an error appears: "list index out of range". That makes sense, because there is no "—" symbol in the last element: I split the data on ",", so "discussion" and "study and learn" are treated as separate items. I would like to avoid rewriting the code. Is it possible to reuse the same code to read both kinds of data, that is, to read everything after the "—" symbol?

It seems your expected output depends on the word research.
We can implement this with a regex that searches for the word research and returns the characters after it.
You can try this:
# -*- coding: utf-8 -*-
import re
re.split(r' *research[^A-Za-z0-9]+', data)[-1]
# study and learn
# discussion, study and learn
Full Code:
# -*- coding: utf-8 -*-
import re
print ("{0}, {1}".format(data.split(',')[0], (re.split(r' *research[^A-Za-z0-9]+',data))[-1]))
#launch, study and learn
#launch, discussion, study and learn
Read more about python Regex :
https://docs.python.org/3/library/re.html
or about expressions here :
https://www.w3schools.com/python/python_regex.asp
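To reuse the same code for both inputs, the answer's pattern can be wrapped in a function. This is a sketch under the assumption, taken from the answer above, that the wanted text always follows the word research; the function name summarize is made up.

```python
import re


def summarize(data):
    """Return the first item plus whatever follows the word 'research'."""
    first_item = data.split(",")[0]
    # Drop everything up to 'research' and any punctuation or spaces after it
    tail = re.split(r" *research[^A-Za-z0-9]+", data)[-1]
    return "{0}, {1}".format(first_item, tail)


data_a = "launch, 7:30am, watch tv, workout, snap, running, research study and learn"
data_b = (
    "launch, 7:30am, watch tv, workout, snap, running, "
    "research — discussion, study and learn"
)

print(summarize(data_a))
print(summarize(data_b))
```

If the word research is missing, re.split returns the whole string as its only element, so a real version would probably want to check for that case.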
