So I have a list of lists that I need to parse through and manipulate the contents of. There are strings of numbers and words in the sublists, and I want to change the numbers into integers. I don't think it's relevant but I'll mention it just in case: my original data came from a CSV that I split on newlines, and then split again on commas.
What my code looks like:
def prep_data(data):
    list = data.split('\n') #Splits data on newline
    list = list[1:-1] #Gets rid of header and last row, which is an empty string
    prepped = []
    for x in list:
        prepped.append(x.split(','))
    for item in prepped: #Converts the item into an int if it is able to be converted
        for x in item:
            try:
                item[x] = int(item[x])
            except:
                pass
    return prepped
I tried to loop through every sublist in prepped and change the type of the values in it, but the loop doesn't seem to do anything: prep_data returns the same thing as it did before I added that for loop.
I think I see what is wrong: the assignment in your loop is not doing what you expect it to.
def prep_data(data):
    lines = data.split('\n')  # Splits data on newline
    lines = lines[1:-1]  # Gets rid of header and last row, which is an empty string
    prepped = []
    for x in lines:
        prepped.append(x.split(','))
    for i in range(len(prepped)):  # Converts each value into an int if it can be converted
        item = prepped[i]
        for x in range(len(item)):  # x is a position, so item[x] is the value stored there
            try:
                item[x] = int(item[x])
            except ValueError:  # not a number: leave the string as it is
                pass
        prepped[i] = item
    return prepped
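I can't run this on the machine I'm on right now, but the key change is that both loops now go over positions (range(len(...))) instead of values. In your version, x was each value in the row rather than its index, so item[x] raised a TypeError that the bare except silently swallowed, and nothing was ever converted. The final prepped[i] = item is harmless but not strictly needed, since item is the same list object that prepped already holds.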
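The following sketch, on a hypothetical one-row sample, shows what goes wrong in the question's loop and an index-based alternative using enumerate(). Assigning by position updates the sublist in place, and since prepped holds that same list object, the change is visible through prepped as well.

```python
row = ['alice', '30']  # hypothetical sample row

# Buggy pattern: x is each value, so row[x] means row['alice'], which raises TypeError
for x in row:
    try:
        row[x] = int(row[x])
    except TypeError as exc:  # this is what the bare except was hiding
        print('silently skipped:', exc)
print(row)  # ['alice', '30'] (nothing changed)

# Fixed pattern: enumerate yields (index, value) pairs, so we assign by position
for index, value in enumerate(row):
    try:
        row[index] = int(value)
    except ValueError:  # 'alice' cannot be converted, so keep it as a string
        pass
print(row)  # ['alice', 30]
```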
I'm not sure about your function, because maybe I didn't fully understand your input data, but you could try something like the following, because if you only pass, you silently ignore strings or unexpected data:
def parse_data(raw_data):
    data_lines = raw_data.split('\n')  # Splits data on newline
    data_rows_without_header = data_lines[1:-1]  # Gets rid of header and last row, which is an empty string
    parsed_data = []
    for raw_row in data_rows_without_header:
        split_row = raw_row.split(',')
        parsed_row = []
        for value in split_row:
            try:
                parsed_row.append(int(value))
            except ValueError:
                print("The value '{}' is not castable".format(value))
                parsed_row.append(value)  # if cast fails, add the string as it is
        parsed_data.append(parsed_row)
    return parsed_data
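Since the data originally came from a CSV, here is a sketch that swaps the manual newline/comma splitting for the standard library csv module, which also copes with quoted fields that contain commas. The function names parse_csv_text and to_int_if_possible are hypothetical, and it assumes the first row is a header.

```python
import csv
import io


def to_int_if_possible(value):
    """Return int(value) if the text is an integer, otherwise the original string."""
    try:
        return int(value)
    except ValueError:
        return value


def parse_csv_text(raw_data):
    # csv.reader understands quoting, so a quoted field such as "Smith, John"
    # stays one field instead of being split on its comma
    reader = csv.reader(io.StringIO(raw_data))
    next(reader, None)  # skip the header row (the None default avoids StopIteration on empty input)
    # "if row" drops blank lines, which csv.reader yields as empty lists
    return [[to_int_if_possible(field) for field in row] for row in reader if row]


print(parse_csv_text('name,age\nalice,30\n"Smith, John",x\n'))
# [['alice', 30], ['Smith, John', 'x']]
```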
The code should take changes_text.txt and format it using dictionaries and such. Except it's returning as if there are no schedule changes, even though there are.
# Store the text data in a string variable instead of writing it to a file
raw_data = changes_txt.text
print(raw_data, 0)
# Use a dictionary to map strings to functions
operation_map = {
    'ביטול שעור': lambda line, i: f'Period {i} cancelled! W\n',  # "lesson cancelled"
    'הזזת שיעור': lambda line, i: f'Class "{line.split(" לשיעור")[0].split(", ")[2]}" moved to period {i}\n',  # "lesson moved" (" לשיעור" = " to lesson")
    'מילוי מקום': lambda line, i: f'Period {i} replaced with class "{line.split(", ")[4]}"\n',  # "substitution"
    'החלפת חדר': lambda line, i: f'Class "{line.split(", ")[2]}" moved to room {line.split(", ")[-1].split(":")[-1]}\n',  # "room change"
}
# Split the raw data into lines
data_lines = raw_data.splitlines()
print(data_lines, 1)
# Use a list comprehension to apply the operations to the data
results = [operation_map[line.split(" ")[0]](line, i) if line.split(" ")[0] in operation_map else "\n"
for line in data_lines for i in range(8)]
# Use string interpolation to format the output data
output = ''.join(results)
return output
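The changes_txt.text/raw_data is (in English: date, "lesson 1", teacher name, "substitution" followed by the substitute teacher's name, "civics", "room: 304"; the trailing 0 appears to be the marker printed by print(raw_data, 0)):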
11.12.2022, שיעור 1, כרמי ציפי, מילוי מקום מורג אורית, אזרחות, חדר: 304 0
I'm expecting it to return something like this, where the first period is replaced and the rest of the lines are empty:
Period 1 replaced with class "אזרחות"
I'm obviously not great with dictionaries, which is why I need help.
full code:
https://pastebin.com/YymFmfjW
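One likely cause, going by the sample line: line.split(" ")[0] is the date "11.12.2022,", which is never a key in operation_map, so every line falls through to "\n". Every key is also two words, so the first space-separated token could never match one anyway. On top of that, `for i in range(8)` inside the comprehension produces eight results per input line. A minimal sketch that instead searches the whole line for a keyword and reads the period number from the second comma-separated field, assuming every line follows the same layout as the single sample shown (format_changes and periods are hypothetical names; periods are numbered from 1 here, unlike range(8)):

```python
def format_changes(raw_data, periods=8):
    # Keys copied verbatim from the question; English glosses in the comments
    operation_map = {
        'ביטול שעור': lambda line, i: f'Period {i} cancelled!\n',  # lesson cancelled
        'הזזת שיעור': lambda line, i: f'Class "{line.split(" לשיעור")[0].split(", ")[2]}" moved to period {i}\n',  # lesson moved
        'מילוי מקום': lambda line, i: f'Period {i} replaced with class "{line.split(", ")[4]}"\n',  # substitution
        'החלפת חדר': lambda line, i: f'Class "{line.split(", ")[2]}" moved to room {line.split(", ")[-1].split(":")[-1].strip()}\n',  # room change
    }
    output = ['\n'] * periods  # one line per period, empty unless a change applies
    for line in raw_data.splitlines():
        fields = line.split(', ')
        # Assumption: field 1 looks like 'שיעור 1' ("lesson 1") and ends in the period number
        parts = fields[1].split() if len(fields) > 1 else []
        if not parts or not parts[-1].isdigit():
            continue
        period = int(parts[-1])
        if not 1 <= period <= periods:
            continue
        # Search the whole line for a keyword instead of looking only at its first word
        for keyword, formatter in operation_map.items():
            if keyword in line:
                output[period - 1] = formatter(line, period)
                break
    return ''.join(output)


print(format_changes("11.12.2022, שיעור 1, כרמי ציפי, מילוי מקום מורג אורית, אזרחות, חדר: 304"))
# first line: Period 1 replaced with class "אזרחות", followed by 7 empty lines
```

Keeping output as a fixed-size list indexed by period makes "one line per period, empty by default" explicit, rather than relying on the comprehension to emit the right number of "\n" strings.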
I have text file having this content
group11#,['631', '1051']#,ADD/H/U_LS_FR_U#,group12#,['1', '1501']#,ADD/H/U_LS_FR_U#,group13#,['31', '28']#,ADD/H/UC_DT_SS#,group14#,['18', '27', '1017', '1073']#,AN/H/UC_HR_BAN#,group15#,['13']#,AD/H/U_LI_NW#,group16#,['1031']#,AN/HE/U_LE_NW_IES#
The requirement is to pull out each element separated by "#," and store it in a separate variable. The text file above does not have a fixed length, so if there are 200 "#,"-separated values, they should be stored in 200 variables.
So the expected output would be:
a = group11, b = [631, 1051], c = ADD/H/U_LS_FR_U, d = group12, e = [1, 1501], f = ADD/H/U_LS_FR_U, and so on
I'd then use a, b, c, d further like this:
url = (url+c)
rjson = {"reqparam": {"ids": b}}
freq = json.dumps(rjson)
resp = request.request("Post",url,rjson)
In reqparam, b has to supply values like 631 and 1051.
I'm not sure how to achieve this.
I've started with:
with open("filename.txt", "r") as f:
    data = f.readlines()
    for line in data:
        value = line.strip().split('#')
        print(value)
You should not create a new variable for each object; there are containers for this, e.g. a list.
To parse this string into a list, you can cut the last character (which is "#") from the source and then split the string using "#," as the divider:
result = src[:-1].split("#,")
But your expected output shows that you want the items that contain a list to be converted into actual lists. You can do this using ast.literal_eval():
import ast
result = [ast.literal_eval(s) if "[" in s else s for s in src[:-1].split("#,")]
I used a list comprehension in the previous example, but you can also write it as a regular for loop:
import ast
result = []
for s in src[:-1].split("#,"):
    if "[" in s:
        try:
            converted = ast.literal_eval(s)  # string repr of list into a list
        except Exception as e:
            print(f"\"{s}\" throws an error: {e}")
        else:
            result.append(converted)
    else:
        result.append(s)
You can also use str.strip() to cut "#" and "," from both ends of the string before splitting:
result = src.strip(",#").split("#,")
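Instead of 200 separate variables, the parsed items can be grouped into records. In the sample every group is a (group name, id list, path) triple, so a sketch that walks the list three items at a time, assuming that pattern always holds and that src is one line read from the file (parse_line and the payload shape are hypothetical; the actual request call is left out because its library isn't shown):

```python
import ast


def parse_line(src):
    """Split a '#,'-separated line and turn list-looking items into real lists."""
    parts = src.strip().rstrip('#').split('#,')
    return [ast.literal_eval(p) if p.startswith('[') else p for p in parts]


src = ("group11#,['631', '1051']#,ADD/H/U_LS_FR_U#,"
       "group12#,['1', '1501']#,ADD/H/U_LS_FR_U#")
items = parse_line(src)

# Assumption: the items always come in (group, ids, path) triples, as in the sample
records = [tuple(items[i:i + 3]) for i in range(0, len(items), 3)]

for group, ids, path in records:
    numeric_ids = [int(v) for v in ids]           # e.g. 631 and 1051 as integers
    payload = {"reqparam": {"ids": numeric_ids}}  # hypothetical payload shape
    print(group, path, payload)
```

Each record then plays the role of a, b, c in the question: path can be appended to the URL and the id list goes into reqparam.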
I am using the following code to bring back prices from an ecommerce website:
response.css('div.price.regularPrice::text').extract()
but I'm getting the following result:
'\r\n\t\t\tDhs 5.00\r\n\t\t\t\t\t\t\t\t',
'\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t',
I don't want the escape characters (backslashes and letters), only the number 5. How do I get this?
First, you can use strip() to remove the surrounding whitespace: tabs "\t", carriage returns "\r" and newlines "\n".
data = ['\r\n\t\t\tDhs 5.00\r\n\t\t\t\t\t\t\t\t',
'\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t']
data = [item.strip() for item in data]
and you get
['Dhs 5.00', '']
Next, you can use an if clause to skip empty elements:
data = [item for item in data if item]
and you get
['Dhs 5.00']
If every item always has the same structure, Dhs X.00,
then you can use slicing [4:-3] to remove "Dhs " and ".00":
data = [item[4:-3] for item in data]
and you get
['5']
Now you only need the first element, data[0], to get "5".
If needed, you can convert the string "5" to the integer 5 using int():
result = int(data[0])
You can even put it all on one line:
data = ['\r\n\t\t\tDhs 5.00\r\n\t\t\t\t\t\t\t\t',
'\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t']
data = [item.strip()[4:-3] for item in data if item.strip()]
result = int(data[0])
If you only ever need the first element of the list, you can write:
data = ['\r\n\t\t\tDhs 5.00\r\n\t\t\t\t\t\t\t\t',
'\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t']
result = int(data[0].strip()[4:-3])
Use a regex to fetch only the numbers.
The \d+ expression matches a run of digits and should do the trick.
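One caveat: re.findall(r'\d+', 'Dhs 5.00') returns ['5', '00'], because the dot is not a digit. A sketch that keeps the optional decimal part and skips whitespace-only entries, using the sample list from the question:

```python
import re

data = ['\r\n\t\t\tDhs 5.00\r\n\t\t\t\t\t\t\t\t',
        '\r\n\t\t\t\t\t\t\t\r\n\t\t\t\t\t\t']

prices = []
for item in data:
    # first run of digits, optionally followed by a dot and more digits
    match = re.search(r'\d+(?:\.\d+)?', item)
    if match:  # entries with no number at all are skipped
        prices.append(float(match.group()))

print(prices)  # [5.0]
```

Unlike the [4:-3] slice, this does not depend on the currency prefix being exactly "Dhs ".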
I'm trying to remove a lot of stuff from a text file to rewrite it.
The text file has several hundred items, each consisting of 6 lines.
I got my code working to the point where it puts all lines in an array, identifies the only 2 important lines in every item and deletes the whitespace, but any further stripping gives me the following error:
'list' object has no attribute 'strip'
Here is my code:
x = 0
y = 0
names = []
colors = []
array = []
with open("AA_Ivory.txt", "r") as ins:
    for line in ins:
        array.append(line)
def Function (currentElement, lineInSkinElement):
    name = ""
    color = ""
    string = array[currentElement]
    if lineInSkinElement == 1:
        string = [string.strip()]
        # string = [string.strip()]
        # name = [str.strip("\n")]
        # name = [str.strip(";")]
        # name = [str.strip(" ")]
        # name = [str.strip("=")]
        names.append(name)
        return name
    # if lineInSkinElement == 2:
    #     color = [str.strip("\t")]
    #     color = [str.strip("\n")]
    #     color = [str.strip(";")]
    #     color = [str.strip(" ")]
    #     color = [str.strip("=")]
    #     colors.append(color)
    #     return color
    print "I got called %s times" % currentElement
    print lineInSkinElement
    print currentElement
for val in array:
    Function(x, y)
    x = x +1
    y = x % 6
#print names
#print colors
In the if statement for the names, removing the first # gives me the error.
I tried converting the list item to a string, but then I get extra [] around the string.
The if statement for color can be ignored, I know it's faulty and trying to fix this is what got me to my current issue.
but then I get extra [] around the string
You can loop through the list to get at the string inside it. For example:
name = []  # collect the stripped strings in a list
for item in string:
    item = item.strip("\n")
    item = item.strip(";")
    item = item.strip(" ")
    item = item.strip("=")
    name.append(item)
return name
This will get you to the string within the list and you can append the stripped string.
If this isn't what you were looking for, post some of the data you're working with to clarify.
Alright, I found the solution. It was a rather dumb mistake of mine. The error occurred because the [] around the strip call made the result a list. Removing them fixed it. Feeling relieved now, a bit stupid, but relieved.
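A small sketch of the mistake described above, using a hypothetical sample line: the square brackets build a one-element list, and lists have no strip() method, whereas the plain call returns a str that further str methods can be chained onto.

```python
line = "  skinName = Ivory;\n"  # hypothetical sample line

wrapped = [line.strip()]  # the brackets build a one-element list
try:
    wrapped.strip(";")
except AttributeError as exc:
    print(exc)  # 'list' object has no attribute 'strip'

cleaned = line.strip()        # without brackets the result stays a str
cleaned = cleaned.strip(";")  # so further str methods can be applied
print(cleaned)  # skinName = Ivory
```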
You can also do that in one line:
item = item.strip("\n").strip("=").strip(";").strip()
The last strip(), with no argument, removes the surrounding whitespace.
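Each strip(chars) call stops at the first character that is not in its set, so the order of chained calls matters. Passing all the characters to a single strip() removes any mix of them from both ends. A sketch with a hypothetical padded line:

```python
item = "\t= Ivory ;\n"  # hypothetical raw line with mixed padding

chained = item.strip("\n").strip("=").strip(";").strip()
single = item.strip(" \t\n;=")  # strip any combination of these characters from both ends

print(repr(chained))  # '= Ivory'  (the "=" was hidden behind the tab when strip("=") ran)
print(repr(single))   # 'Ivory'
```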
I have a file in which I'm trying to replace parts of a line with another word.
A line looks like bobkeiser:bob123#bobscarshop.com:0.0.0.0.0:23rh32o3hro2rh2:234212
I need to delete everything except bob123#bobscarshop.com, but I also need to match 23rh32o3hro2rh2 against 23rh32o3hro2rh2:poniacvibe from a different text file and put poniacvibe after bob123#bobscarshop.com,
so it would look like this: bob123#bobscarshop.com:poniacvibe
I've had a hard time figuring out how to do this. I think I would have to split bobkeiser:bob123#bobscarshop.com:0.0.0.0.0:23rh32o3hro2rh2:234212 with data.split(":"), but some of the lines have a colon (:) in a spot where I don't want the line to be split, if that makes any sense...
If anyone could help, I would really appreciate it.
ok, it looks to me like you are using a colon : to separate your strings.
in this case you can use .split(":") to break your strings into their component substrings
eg:
firststring = "bobkeiser:bob123#bobscarshop.com:0.0.0.0.0:23rh32o3hro2rh2:234212"
print(firststring.split(":"))
would give:
['bobkeiser', 'bob123#bobscarshop.com', '0.0.0.0.0', '23rh32o3hro2rh2', '234212']
and assuming your substrings will always be in the same order, with the same number of substrings in the main string, you could then do:
firststring = "bobkeiser:bob123#bobscarshop.com:0.0.0.0.0:23rh32o3hro2rh2:234212"
firstdata = firststring.split(":")
secondstring = "23rh32o3hro2rh2:poniacvibe"
seconddata = secondstring.split(":")
if firstdata[3] == seconddata[0]:
    outputdata = firstdata
    outputdata.insert(1, seconddata[1])
    outputstring = ""
    for item in outputdata:
        if outputstring == "":
            outputstring = item
        else:
            outputstring = outputstring + ":" + item
what this does is:
extract the bits of the strings into lists
see if the "23rh32o3hro2rh2" string can be found in the second list
find the corresponding part of the second list
create a list to contain the output data and put the first list into it
insert the "poniacvibe" string before "bob123#bobscarshop.com"
stitch the outputdata list back into a string using the colon as the separator
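The stitching step can also be done with str.join. A sketch on the same two sample strings, which copies the list so firstdata itself is left untouched, and also builds the exact email:word form the question asks for (requested is a hypothetical name):

```python
firstdata = "bobkeiser:bob123#bobscarshop.com:0.0.0.0.0:23rh32o3hro2rh2:234212".split(":")
seconddata = "23rh32o3hro2rh2:poniacvibe".split(":")

if firstdata[3] == seconddata[0]:
    outputdata = firstdata[:]  # copy, so firstdata is not modified by insert()
    outputdata.insert(1, seconddata[1])
    print(":".join(outputdata))  # same result as the manual stitching loop
    # The exact form asked for in the question is just email:word
    requested = f"{firstdata[1]}:{seconddata[1]}"
    print(requested)  # bob123#bobscarshop.com:poniacvibe
```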
the reason your lines need the same number of fields, in the same order, is that the index is being used to find the relevant substrings rather than some form of string matching (which gets much more complex)
if you can keep your data in this form it gets much simpler.
to protect against malformed data (lists that are too short), you can check len(list) to see how many elements it has before you start indexing into it.
or you could let it run and catch the exception, however in this case you could end up with unintended results, as it may try to match the wrong elements from the list.
hope this helps
James
EDIT:
ok so if you are trying to match up a long list of strings from files you would probably want something along the lines of:
firstfile = open("firstfile.txt", mode="r")
secondfile = open("secondfile.txt", mode="r")
first_raw_data = firstfile.readlines()
firstfile.close()
second_raw_data = secondfile.readlines()
secondfile.close()
first_data = []
for item in first_raw_data:
    first_data.append(item.replace("\n", "").split(":"))
second_data = []
for item in second_raw_data:
    second_data.append(item.replace("\n", "").split(":"))
output_strings = []
for item in first_data:
    searchstring = item[3]
    for entry in second_data:
        if searchstring == entry[0]:
            output_data = item
            output_string = ""
            output_data.insert(1, entry[1])
            for data in output_data:
                if output_string == "":
                    output_string = data
                else:
                    output_string = output_string + ":" + data
            output_strings.append(output_string)
            break
for entry in output_strings:
    print(entry)
this should achieve what you're after and, as a proof of concept, will print the resulting list of strings for you.
if you have any questions feel free to ask.
James
Second edit:
to make this output the results into a file change the last two lines to:
outputfile = open("outputfile.txt", mode="w")
for entry in output_strings:
    outputfile.write(entry + "\n")
outputfile.close()
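For the stray-colon problem mentioned in the question, a sketch that splits only two fields off the left (username, email) and two off the right (hash, number) using the maxsplit argument, so extra colons in the middle field do not shift the indexes. It also swaps the nested search loop for a dict lookup and writes the email:word form the question asked for. It assumes the username and email never contain ":"; build_lookup, convert and main are hypothetical names, and the file names are the ones used above.

```python
def build_lookup(lines):
    """Map each hash to its word, from lines like '23rh32o3hro2rh2:poniacvibe'."""
    lookup = {}
    for line in lines:
        key, sep, word = line.strip().partition(':')  # split on the first colon only
        if sep:
            lookup[key] = word
    return lookup


def convert(lines, lookup):
    """Return 'email:word' for every line whose hash appears in lookup."""
    results = []
    for line in lines:
        line = line.strip()
        if line.count(':') < 4:  # too few fields: skip malformed lines
            continue
        # Assumption: username and email contain no ':', so any extra colons
        # sit in the middle field; split 2 fields off the left and 2 off the right
        _username, email, rest = line.split(':', 2)
        _middle, hash_value, _number = rest.rsplit(':', 2)
        if hash_value in lookup:
            results.append(f"{email}:{lookup[hash_value]}")
    return results


def main(first_path="firstfile.txt", second_path="secondfile.txt",
         output_path="outputfile.txt"):
    # with-blocks close the files automatically, even if an error occurs
    with open(second_path) as second_file:
        lookup = build_lookup(second_file)
    with open(first_path) as first_file:
        results = convert(first_file, lookup)
    with open(output_path, "w") as output_file:
        for entry in results:
            output_file.write(entry + "\n")
```

Calling main() reads the two input files and writes one email:word line per match to outputfile.txt.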