Python - writing data from a CSV to a new CSV, but rows are overwritten

I have a few thousand Twitter tweets in a CSV with one tweet per row (there are blank rows between tweets). Each column of each row contains different parts of the tweet (like time, text, language, location, etc.), but not every column holds the same information (i.e., sometimes the language appears in column AG or AH or some other one). I'm trying to clean up the data by creating a new CSV containing only the English tweets, and also filtering the punctuation out of each of these (English) tweets.
I'm currently stuck on how to filter out only the English tweets. This is what I have so far:
import csv

f = open('twitDB.csv')
csv_f = csv.reader(f)  # csv_f iterates over rows; each row is a list
for row in csv_f:
    for col in row:
        if col == 'lang:"en"':
            with open('cleaned.csv', 'w') as fp:
                wr = csv.writer(fp, delimiter=',')
                wr.writerow(row)
                wr.writerow('\n')
The new cleaned.csv only contains the last English tweet (of thousands) in its Row 1. I have a feeling that my code is continuously overwriting the first row of cleaned.csv and not writing each tweet onto the next row but I'm unsure how to fix this.

You need to use open('cleaned.csv', 'a'): the 'a' mode appends on each open, while 'w' opens the file and overwrites whatever is there each time. That is why you are only seeing one row.
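Beyond switching to 'a', the cleaner fix is to open the output file once, before the loop, so rows accumulate naturally. A minimal sketch; the sample rows and the 'lang:"en"' marker are assumptions based on the question's description:

```python
import csv

# Create a small sample input mimicking the question's layout (assumed).
with open('twitDB.csv', 'w', newline='') as f:
    csv.writer(f).writerows([
        ['hello world', 'lang:"en"'],
        ['bonjour', 'lang:"fr"'],
        ['good morning', 'lang:"en"'],
    ])

# Open the output file once, before the loop, so each matching row is
# appended instead of replacing the previous one.
with open('twitDB.csv', newline='') as f, \
     open('cleaned.csv', 'w', newline='') as fp:
    wr = csv.writer(fp)
    for row in csv.reader(f):
        if 'lang:"en"' in row:  # keep only rows containing the English marker
            wr.writerow(row)
```

Opening cleaned.csv once also removes the need for the stray wr.writerow('\n') call, since csv.writer already terminates each row.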

Related

How to get rid of space after header row when appending to csv file [python]?

I created a blank csv with some field names, and then have a script that calculates some values for each of those new fields/columns, and appends that row to the initially blank csv. I iterate through a folder of .csv files, and based on the data in those files, I create a new row of values for each iteration in the for loop, and then append those rows consecutively to the initially made .csv file. However, when I tried doing this, and then looking at the newly created .csv file, I saw that there was a space, a blank row, after each entry, so a space after the header row, and then a space after each newly appended row. I am using python for this.
I created the .csv file with this code:
fields = ['Field_1', 'Field_2', 'Field_3']
filename = 'final_results.csv'
with open(filename, 'w') as csvfile:
    # creating a csv writer object
    csvwriter = csv.writer(csvfile)
    # writing the fields
    csvwriter.writerow(fields)
In my script, I do calculations on the data from each of the .csv files I loop through. I ultimately compute a "Value_1", "Value_2", and a "Value_3", where "Value_1" should fall under "Field_1", "Value_2" should fall under "Field_2", and "Value_3" should fall under "Field_3". I then create a new row of these new values to be appended to my .csv file with simply:
new_row = [Value_1, Value_2, Value_3]
I then appended with this code:
with open('Final_results.csv', 'a') as f_object:
    writer_object = writer(f_object)
    writer_object.writerow(new_row)
    f_object.close()
This led to spaces after the header rows and then spaces after each consecutively added row when I looked at the "Final_results.csv", the final product.
I then tried this:
with open('Final_results.csv', 'a', newline="") as f_object:
    writer_object = writer(f_object)
    writer_object.writerow(new_row)
    f_object.close()
adding newline=""
And now when I look at "Final_results.csv", there are no spaces/blank rows between the newly appended rows, which is what I want, but there is still a blank row between the header row and the appended rows. How can I get rid of this space? I can't find which specific argument or parameter would need to be added or changed in the writer call to address this issue.
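The remaining blank line most likely comes from the header write: the initial open(filename, 'w') also needs newline="". A minimal sketch of the likely fix; the field names and sample row are illustrative:

```python
import csv

# Write the header with newline='' as well, so no blank row follows it.
fields = ['Field_1', 'Field_2', 'Field_3']
with open('final_results.csv', 'w', newline='') as csvfile:
    csv.writer(csvfile).writerow(fields)

# Append a data row, also with newline='' (illustrative values).
new_row = [1, 2, 3]
with open('final_results.csv', 'a', newline='') as f_object:
    csv.writer(f_object).writerow(new_row)
```

The csv module documentation asks for newline='' on every open() passed to a reader or writer; on Windows, omitting it turns each written '\r\n' into '\r\r\n', which viewers display as a blank row.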

How to store data in python with number of row limit

For a project I have devices that send payloads, and I should store them in a local file, but I have a memory limitation and I don't want to store more than 2000 data rows. Again because of the memory limitation, I cannot have a database, so I chose to store the data in a CSV file.
I tried to use open('output.csv', 'r+') as f:; I append the rows to the end of my CSV, and each time I have to check the length with sum(1 for line in f) to be sure it's not more than 2000.
The big problem starts when I reach 2000 rows: ideally I want to delete the first row and add another row to the end, or start writing rows from the beginning of the file and overwrite the old rows without deleting everything, but I don't know how to do it. I tried open('output.csv', 'w+') and open('output.csv', 'a+'), but w+ deletes all the contents while writing only one row, and a+ just keeps appending to the end. On top of that, I cannot count the number of rows anymore with either. Can you please tell me which mode I should use to start rewriting lines from the beginning, or to delete one line from the beginning and append one to the end? I would also appreciate it if you could tell me whether there is a better choice than CSV files for storing this much data, or a better way to count the number of rows.
This should help. See comments inline
import pandas as pd

allowed_length = 2  # Set it to the required value
df = pd.read_csv('output.csv')  # Read your csv file into df
row_count = df.shape[0]  # Get row count
df.loc[row_count] = ['Fridge', 15]  # Insert row at end of df. In my case it has only 2 values
# If the row count is greater than or equal to allowed_length, delete the first row
if row_count >= allowed_length:
    df = df.drop(df.head(1).index)
df.to_csv('output.csv', index=False)
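If pandas is not available, a sketch using only the standard library: collections.deque with maxlen drops the oldest entry automatically when a new one is appended, which matches the "delete the first row and append to the end" requirement. The file name, header, and limit here are illustrative (the question uses 2000):

```python
import csv
from collections import deque

MAX_ROWS = 3  # illustrative cap; the question uses 2000

def append_capped(path, header, new_row, max_rows=MAX_ROWS):
    """Append new_row, keeping at most max_rows data rows (oldest dropped)."""
    # Read existing data rows (if the file exists), skipping the header.
    try:
        with open(path, newline='') as f:
            rows = list(csv.reader(f))[1:]
    except FileNotFoundError:
        rows = []
    # A full deque with maxlen discards from the left on append.
    capped = deque(rows, maxlen=max_rows)
    capped.append(new_row)
    # Rewrite the whole file; cheap for a few thousand rows.
    with open(path, 'w', newline='') as f:
        w = csv.writer(f)
        w.writerow(header)
        w.writerows(capped)
```

Rewriting the file on each append is simple and safe at this scale; for a true fixed-size log without rewrites, a binary format with fixed-length records (or sqlite3, which is file-based, not a server) would be the next step.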

How to transpose rows of data separated by commas in certain cells to single column using data from a CSV file?

I have a CSV file with text data separated by commas in some columns, but not in others, e.g.:
https://i.imgur.com/X6bq09I.png
I want to export each row of my CSV file to a new CSV file. An example desired output for the first row of my original file would look like this:
https://i.imgur.com/QB9sLeL.png
I have tried the code offered in the first answer of this post: Open CSV file and writing each row to new, dynamically named CSV file.
This is the code I used:
import csv

counter = 1
with open('mock_data.csv', 'rU') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        if row:
            filename = "trial%s" % str(counter)
            with open(filename, 'w') as csvfile_out:
                writer = csv.writer(csvfile_out)
                writer.writerow(row)
                counter = counter + 1
This code does produce a new .csv file for each row. However...
EDIT: I have three remaining issues, for which I have not found the right code:
1. I want each word to have its own cell in each row; I don't know how to do this when certain cells contain multiple words separated by commas, while other cells contain only a single word;
2. Once each word has its own cell, I want to transpose each row into a single column in the new .csv file;
3. I want to remove duplicate values from the column.
If you actually want a file extension, then use filename = "trial%s.csv" % str(counter)
But CSV files don't care about file extensions. Any file reader or code should be able to read the file.
TextEdit is just the Mac default for that.
I need a single column with one word in each cell, in each new output file
Before you do writer.writerow(row), check len(row) == 1 rather than just if row
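Putting the three remaining issues together, a sketch for one input row (the sample cell values are assumptions): split every cell on commas, drop duplicates while preserving order, then write one word per row so the output file is a single column:

```python
import csv

row = ['red, green', 'blue', 'green,yellow']  # illustrative cells from one input row

# 1. Split each cell on commas so every word stands alone,
# 3. skipping blanks and duplicates while preserving order.
words = []
for cell in row:
    for word in cell.split(','):
        word = word.strip()
        if word and word not in words:
            words.append(word)

# 2. Write one word per row: a one-element list becomes a one-cell row,
#    so the output file is a single column.
with open('trial1.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    for word in words:
        writer.writerow([word])
```

The same loop body slots into the question's per-row code in place of writer.writerow(row), producing one single-column file per input row.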

why is csv reader creating a nested list?

Started learning python after lots of ruby experience. With that context in mind:
I have a csv file that looks something like this:
city_names.csv
"abidjan","addis_ababa","adelaide","ahmedabad"
With the following python script I'd like to read this into a list:
city_names_reader.py
import csv

city_name_file = r"./city_names.csv"
with open(city_name_file, 'rb') as file:
    reader = csv.reader(file)
    city_name_list = list(reader)
    print city_name_list
The result surprised me:
[['abidjan', 'addis_ababa', 'adelaide', 'ahmedabad']]
Any idea why I'm getting a nested list rather than a 4-element list? I must be overlooking something self-evident.
A CSV file represents a table of data. A table contains both columns and rows, like a spreadsheet. Each line in a CSV file is one row in the table, and each row contains multiple columns, separated by commas.
When you read a CSV file you get a list of rows. Each row is a list of columns.
If your file have only one row you can easily just read that row from the list:
city_name_list = city_name_list[0]
Usually each column represent some kind of data (think "column of email addresses"). Each row then represent a different object (think "one object per row, each row can have one email address"). You add more objects to the table by adding more rows.
Wide tables, which grow by adding more columns instead of rows, are uncommon. In your case you have only one kind of data: city names. So you should have one column ("name"), with one row per city. To get city names from your file you could then read the first element from each row:
city_name_list = [row[0] for row in city_name_list]
In both cases you can flatten the nested list with itertools.chain.from_iterable:
city_name_list = list(itertools.chain.from_iterable(city_name_list))
As others suggest, your file is not an idiomatic CSV file. You can simply do:
with open(city_name_file, "rb") as fp:
    city_names_list = fp.read().split(",")
Based on comments, here is a possible solution:
import csv

city_name_file = r"./city_names.csv"
city_name_list = []
with open(city_name_file, 'rb') as file:
    reader = csv.reader(file)
    for item in reader:
        city_name_list += item
print city_name_list
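The snippets above are Python 2 ('rb' mode, print statements). For reference, a Python 3 sketch of the same flattening; note the text-mode open with newline='' (the sample file contents mirror the question's single-row file):

```python
import csv

# Recreate the question's one-row file (assumed contents).
with open('city_names.csv', 'w', newline='') as f:
    csv.writer(f).writerow(['abidjan', 'addis_ababa', 'adelaide', 'ahmedabad'])

# In Python 3, csv files are opened in text mode with newline=''.
# The comprehension flattens the list of rows into a flat list of names.
with open('city_names.csv', newline='') as f:
    city_name_list = [name for row in csv.reader(f) for name in row]

print(city_name_list)
```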

Printing the final row when I need all rows

I'm trying to skip the first pipe delimited piece of data in my .txt file when reading it with a csv.DictReader. Here is a sample of the data I'm working with:
someCSVfile.csv|cust_no,0|streetaddr,1|city,2|state,3|zip,4|phone_home,5|firstname,6|lastname,7|status,9|
someCSVfile1.csv|cust_no,0|streetaddr,1|city,2|state,3|zip,4|phone_home,5|firstname,6|lastname,7|status,9|
And here is my code so far:
import csv

reader = csv.reader(open('match_log.txt', 'rb'), dialect='excel', delimiter='|')
for row in reader:
    skipfirstRow = reader.next()
    skipfirstRowAgain = reader.next()
    Dictreader = csv.DictReader(reader, skipfirstRow)
    print row
I've been researching .next() pretty thoroughly, but that doesn't seem to work. When I print my rows, it prints every row, when I don't want the first row (the .csv files) to be printed. Is there another method that may work?
EDIT: Here is my latest code:
import csv

reader = csv.reader(open('match_log.txt', 'rb'), dialect='excel', delimiter='|')
data = {}
for row in reader:
    filenameVariable = row[0]
    data = dict(item.split(',') for item in row[1:])
print data
print filenameVariable
Right now, data and filenameVariable are printing the final row when I need all rows. I tried .append but that didn't work. What else could I use?
The .csv parts are the first column/field of each row, not the first row. Advancing the reader will indeed skip rows, but it won't affect what's inside each individual row. (Rows go across!)
If you want to leave off the first item in a sequence, print row[1:] instead of row.
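Building on that, a sketch that keeps every row by appending to a list instead of rebinding data on each pass. The sample match_log.txt lines are shortened from the question, and the if item guard skips the empty field produced by each line's trailing '|':

```python
import csv

# Recreate a shortened version of the question's pipe-delimited file.
with open('match_log.txt', 'w', newline='') as f:
    f.write('someCSVfile.csv|cust_no,0|streetaddr,1\n')
    f.write('someCSVfile1.csv|cust_no,0|streetaddr,1\n')

# Accumulate (filename, fields) pairs instead of overwriting them,
# slicing off the first field (the .csv filename) with row[1:].
results = []
with open('match_log.txt', newline='') as f:
    for row in csv.reader(f, delimiter='|'):
        filename = row[0]
        fields = dict(item.split(',') for item in row[1:] if item)
        results.append((filename, fields))

for filename, fields in results:
    print(filename, fields)
```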
