.split from a file and putting it in an array - python

I'm reading a file in which the fields on each line are separated by a #. I want each line to become its own list, so I wrote the code below, but I'm not sure why it isn't working.
main_file = open("main_file.txt","r")
main_file_info=main_file.readlines()
test=[]
n=0
for line in main_file_info:
    test[n]=line.split("#")
    test=test[n][1:len(test)-1] # to get rid of empty strings at the start and the end
    print(test)# see what comes out
main_file.close()

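The way you are inserting the output of line.split("#") into your list is wrong. test starts out as an empty list, so there is no element at index n to assign to, and test[n] = ... raises an IndexError. What you need to do instead is append:
test.append(line.split("#"))
Or you can pre-size the list so that index assignment works. Note that every slot then initially refers to the same empty list, which is fine as long as you replace each slot rather than mutate it:
test = [[]]*(len(main_file_info))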

test = [None for _ in range(total)]
# instead of test = []
or simply just append to test:
test.append( line.split("#") )
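Putting the append approach together, a minimal sketch might look like the following. The helper name read_hash_separated and the sample data are made up for illustration, and the sketch assumes each line both starts and ends with a #, as the question's comment about empty strings suggests. It filters out empty fields instead of slicing, which also drops any empty field in the middle of a line.

```python
import os
import tempfile


def read_hash_separated(path):
    """Return one list of fields per line, split on '#'."""
    rows = []
    with open(path, "r") as handle:
        for line in handle:
            # Strip the newline, split on '#', and drop the empty strings
            # produced by a leading or trailing '#'.
            fields = [field for field in line.strip().split("#") if field]
            rows.append(fields)
    return rows


# Hypothetical sample data standing in for main_file.txt.
sample = "#alice#30#london#\n#bob#25#paris#\n"
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, "main_file.txt")
    with open(sample_path, "w") as handle:
        handle.write(sample)
    rows = read_hash_separated(sample_path)

print(rows)
```

Using a with block also means the file is closed automatically, so the explicit close() from the question is not needed.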

Related

AttributeError: 'tuple' object has no attribute 'write' Error

I keep getting this error and I have no idea what it means. I have taken measures to get rid of a tuple in my code. The program is supposed to read in a document which has a series of numbers and then sort those numbers using a bubble sort function and then print the old list and the new, sorted list onto a new file.
My assignment is to create a new file and print the original array, from the given file, and the sorted array, sorted using a bubble sort function, as two lines in a comma separated file.
# reading in the document into the program
file = open("rand_numb.csv", "r")
# creating a new file that will have the output printed on it
newFile = ("SORTED.csv", "w+")
# creating a blank list that will hold the original file's contents
orgArr = []
# creating the bubblesort function
def bubblesort(arr):
    # creating a variable to represent the length of the array
    length = len(arr)
    # traverse through all array elements
    for i in range(length):
        # last i elements are already in place
        for j in range(0, length-i-1):
            # traverse the array from 0 to length-i-1 and swap if the element found is greater than the next element
            if arr[j] > arr[j+1] :
                arr[j], arr[j+1] = arr[j+1], arr[j]
    return arr
# Prog4 Processing
# using a for loop to put all of the numbers from the read-in file into a list
listoflists = [(line.strip()).split() for line in file]
# closing the original file
file.close()
# creating a variable to represent the length of the list
listLen = len(listoflists)
# using a for loop to have the elements in the list of lists into one list
for num in range(0, listLen):
    orgArr.append(num)
# using list function to change the tuple to a list
orgArr = list(orgArr)
# using the bubblesort function
sortArr = bubblesort(orgArr)
# Prog4 Output
# outputting the two lists onto the file
newFile.write(orgArr + "\n")
newFile.write(sortArr)
# closing the new file
newFile.close()
The line
newFile = ("SORTED.csv", "w+")
does not create a file. Because the comma-separated values are wrapped in parentheses with no call to open, it defines a tuple containing the two strings "SORTED.csv" and "w+", and a tuple has no write method. Rather than opening newFile at the top of your code, you can wait until you actually intend to populate it:
with open("SORTED.csv", "w+") as newFile:
    # write() needs a string, so join the numbers with commas first
    newFile.write(",".join(str(num) for num in orgArr) + "\n")
    newFile.write(",".join(str(num) for num in sortArr) + "\n")
# the with block closes the file, so no explicit close() is needed
I suspect the contents of newFile may still not be formatted the way you want, but I will let you raise a new question if that turns out to be the case.
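For reference, here is one possible end-to-end sketch of the assignment. The layout of rand_numb.csv is not shown in the question, so the sample input below is an assumption, as are the variable names. Two choices differ from the question's code: bubblesort works on a copy, so the original order is still available when writing the first line, and the csv module handles the comma-separated output.

```python
import csv
import os
import tempfile


def bubblesort(values):
    """Return a sorted copy of values using bubble sort."""
    result = list(values)  # copy so the caller's list is left untouched
    length = len(result)
    for i in range(length):
        # After each pass the last i elements are already in place.
        for j in range(length - i - 1):
            if result[j] > result[j + 1]:
                result[j], result[j + 1] = result[j + 1], result[j]
    return result


with tempfile.TemporaryDirectory() as tmp_dir:
    in_path = os.path.join(tmp_dir, "rand_numb.csv")
    out_path = os.path.join(tmp_dir, "SORTED.csv")

    # Hypothetical input: the real rand_numb.csv layout is not shown.
    with open(in_path, "w") as handle:
        handle.write("5,3,9\n1,7\n")

    # Flatten every cell of every row into one list of integers.
    with open(in_path, newline="") as handle:
        original = [int(cell) for row in csv.reader(handle) for cell in row]

    sorted_numbers = bubblesort(original)

    # Write the original and the sorted numbers as two CSV lines.
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(original)
        writer.writerow(sorted_numbers)

    with open(out_path) as handle:
        written = handle.read()

print(written)
```

Sorting in place, as the question's bubblesort does, would make orgArr and sortArr refer to the same already-sorted list, so both output lines would be identical.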

Problem parsing data from a firewall log and finding "worm"

I am struggling with trying to see what is wrong with my code. I am new to python.
import os
uniqueWorms = set()
logLineList = []
with open("redhat.txt", 'r') as logFile:
    for eachLine in logFile:
        logLineList.append(eachLine.split())
    for eachColumn in logLineList:
        if 'worm' in eachColumn.lower():
            uniqueWorms.append()
            print (uniqueWorms)
eachLine.split() returns a list of words. When you append this to logLineList, it becomes a 2-dimensional list of lists.
Then when you iterate over it, eachColumn is a list, not a single column.
If you want logLineList to be a list of words, use
logLineList += eachLine.split()
instead of
logLineList.append(eachLine.split())
Finally, uniqueWorms is a set, and sets have no append method, so uniqueWorms.append() should be uniqueWorms.add(eachColumn). And print(uniqueWorms) should be outside the loop, so you see only the final result rather than the set every time a worm is added.
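With those fixes applied, the whole thing could be sketched as follows. The log lines are invented for illustration, since the real format of redhat.txt is not shown, and an in-memory io.StringIO stands in for the file.

```python
import io

# Hypothetical log lines; the real redhat.txt format is not shown.
sample_log = io.StringIO(
    "Jan 01 10:00:01 fw DROP Worm.Slammer src=10.0.0.5\n"
    "Jan 01 10:00:02 fw ACCEPT http src=10.0.0.6\n"
    "Jan 01 10:00:03 fw DROP worm.blaster src=10.0.0.7\n"
    "Jan 01 10:00:04 fw DROP Worm.Slammer src=10.0.0.8\n"
)

unique_worms = set()
for each_line in sample_log:
    # Check every whitespace-separated word on the line.
    for word in each_line.split():
        if "worm" in word.lower():
            unique_worms.add(word)  # sets use add(), not append()

# Print once, after the loop, so only the final result is shown.
print(sorted(unique_worms))
```

Because unique_worms is a set, the repeated Worm.Slammer entry is stored only once.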

in my for loop .split() only works once

So I have this app. It records data from the accelerometer. Here's the format of the data. It's saved in a .csv file.
time, x, y, z
0.000000,0.064553,-0.046095,0.353776
Here's the variables declared at the top of my program.
length = sum(1 for line in open('couchDrop.csv'))
wholeList = [length]
timeList = [length]#holds a list of all the times
xList = [length]
yList = [length]
zList = [length]
I'm trying to create four lists; time, x, y, z. Currently the entire data file is stored in one list. Each line in the list contains 4 numbers representing time, x, y, and z. I need to split wholeList into the four different lists. Here's the subroutine where this happens:
def easySplit():
    print "easySplit is go "
    for i in range (0, length):
        current = wholeList[i]
        current = current[:-2] #gets rid of a symbol that may be messing things up
        print current
        print i
        timeList[i], xList[i], yList[i], zList[i] = current.split(',')
        print timeList[i]
        print xList[i]
        print yList[i]
        print zList[i]
        print ""
Here's the error I get:
Traceback (most recent call last):
  File "/home/william/Desktop/acceleration plotter/main.py", line 105, in <module>
    main()
  File "/home/william/Desktop/acceleration plotter/main.py", line 28, in main
    easySplit()
  File "/home/william/Desktop/acceleration plotter/main.py", line 86, in easySplit
    timeList[i], xList[i], yList[i], zList[i] = current.split(',')
IndexError: list assignment index out of range
Another weird thing is that .split() seems to work fine the first time through the loop.
Any help would be greatly appreciated.
For data manipulation like this, I'd recommend using pandas. In one line, you can read the data from the CSV into a DataFrame.
import pandas as pd
# skipinitialspace strips the space after each comma in the "time, x, y, z" header
df = pd.read_csv("couchDrop.csv", skipinitialspace=True)
Then, you can easily select each column by name. You can work with the data as a pd.Series, as a np.array, or convert it to a list. For example:
x_list = list(df.x)
You can find more information about pandas at http://pandas.pydata.org/pandas-docs/stable/10min.html.
EDIT: The error with your original code is that syntax like xList = [length] does not create a list of length length. It creates a list of length one containing a single int element with the value of length.
The line wholeList = [length] doesn't create a list with length elements. It creates a list with only one element, the integer length, so if you were to print(wholeList) you would see only something like [3].
Since lists are mutable in Python, you can just write wholeList = [] and keep appending elements to it. It doesn't have to be given a specific length up front.
The same applies to timeList, xList, yList and zList: each holds a single element, so the assignment in your loop works for i = 0 and raises the IndexError as soon as i reaches 1.
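If you'd rather stay with the standard library, the append approach can be sketched with the csv module as below. The second data row and the variable names are made up for illustration, and an in-memory io.StringIO stands in for couchDrop.csv. The skipinitialspace option strips the space after each comma in the header shown in the question.

```python
import csv
import io

# Hypothetical sample in the format shown in the question.
sample = io.StringIO(
    "time, x, y, z\n"
    "0.000000,0.064553,-0.046095,0.353776\n"
    "0.010000,0.070000,-0.050000,0.360000\n"
)

# Start with empty lists and let them grow; no pre-sizing is needed.
time_list, x_list, y_list, z_list = [], [], [], []

reader = csv.reader(sample, skipinitialspace=True)
next(reader)  # skip the header row
for time_value, x_value, y_value, z_value in reader:
    time_list.append(float(time_value))
    x_list.append(float(x_value))
    y_list.append(float(y_value))
    z_list.append(float(z_value))

print(time_list)
print(x_list)
print(y_list)
print(z_list)
```

Converting to float at this point means the four lists are ready for plotting or arithmetic, rather than holding strings.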

Python appending a list to a list and then clearing it

I have this part of code isolated for testing purposes and this question
noTasks = int(input())
noOutput = int(input())
outputClist = []
outputCList = []
for i in range(0, noTasks):
    for w in range(0, noOutput):
        outputChecked = str(input())
        outputClist.append(outputChecked)
    outputCList.append(outputClist)
    outputClist[:] = []
print(outputCList)
With this code I get this output:
[[], []]
I can't figure out how to get the following output instead. I have to clear that sublist, or I get something completely wrong:
[["test lol", "here can be more stuff"], ["test 2 lol", "here can be more stuff"]]
In Python everything is an object. A list is an object with elements. You only ever create one outputClist object, which you repeatedly fill and clear. In the end, outputCList contains that same list multiple times, and since the last thing you do is clear it, every entry appears empty.
Instead, you have to create a new list for every task:
noTasks = int(input())
noOutput = int(input())
output = []
for i in range(noTasks):
    checks = []
    for w in range(noOutput):
        checks.append(input())
    output.append(checks)
print(output)
Instead of passing the elements contained in outputClist to outputCList, you are passing a reference to the list itself. (Having a single change of capitalization be the only difference between two variable names is not the greatest naming practice either.) If you don't want that reference-sharing behavior here, you can pass a new list containing the elements of outputClist by changing this line
outputCList.append(outputClist)
to
outputCList.append(list(outputClist))
or equivalently, as @jonrsharpe states in his comment
outputCList.append(outputClist[:])
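The difference between appending the list itself and appending a copy can be seen without any user input. In this sketch, the collect helper and its copy_each flag are hypothetical names used only for the demonstration.

```python
def collect(groups, copy_each):
    """Append each group to a result list, reusing one scratch list."""
    scratch = []
    collected = []
    for group in groups:
        scratch.extend(group)
        # Either append a copy, or append the scratch list itself.
        collected.append(list(scratch) if copy_each else scratch)
        scratch[:] = []  # clears in place, as in the question
    return collected


groups = [["test lol", "more stuff"], ["test 2 lol", "more stuff"]]

aliased = collect(groups, copy_each=False)
copied = collect(groups, copy_each=True)

print(aliased)  # every entry is the same, now empty, scratch list
print(copied)   # each entry is an independent copy
```

In the aliased case, both entries are the very same object, which is why clearing it once empties them all.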

Access an element in a list of lists in python

I am new to python and am trying to access a single specific element in a list of lists.
I have tried:
line_list[2][0]
This one isn't right, as it's a tuple and a list only accepts integers.
line_list[(2, 0)]
line_list[2, 0]
This is probably really obvious but I just can't see it.
def rpd_truncate(map_ref):
    # Manipulate string in order to get the reference value
    with open (map_ref, "r") as reference:
        line_list = []
        for line in reference:
            word_list = []
            word_list.append(line[:-1].split("\t\t"))
            line_list.append(word_list)
    print line_list[2][0]
I get the exact same as if I used line_list[2]:
['Page_0', '0x00000000', '0x002DF8CD']
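split already returns a list, so appending its result to word_list wraps each row in an extra list. That is why line_list[2][0] gives you the whole row instead of a single element.
Moreover, you don't need the word_list variable at all:
for line in reference:
    line_list.append(line[:-1].split("\t\t"))
print line_list[2][0]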
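To make the extra level of nesting visible, here is a small Python 3 sketch that builds both structures side by side. The first two rows of the sample are invented, the names nested and flat are illustrative, and an in-memory io.StringIO stands in for the map file.

```python
import io

# Hypothetical map file contents; fields are separated by two tabs.
reference = io.StringIO(
    "Header\t\tStart\t\tEnd\n"
    "Boot\t\t0x00000000\t\t0x000000FF\n"
    "Page_0\t\t0x00000000\t\t0x002DF8CD\n"
)

nested = []  # mimics the question: each row wrapped in an extra list
flat = []    # mimics the answer: each row appended directly
for line in reference:
    fields = line.rstrip("\n").split("\t\t")
    nested.append([fields])
    flat.append(fields)

print(nested[2][0])     # the whole row, as the question observed
print(nested[2][0][0])  # one more index is needed to reach a field
print(flat[2][0])       # a single field
```

With the flat structure, line_list[2][0] addresses row 2, column 0 directly.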
