I have several tables in CSV format, and I am using Python with the csv module. I want to extract a particular value, let's say column=80, row=109.
Here is a random example:
import csv
with open('hugetable.csv', 'r') as file:
    reader = list(csv.reader(file))  # csv.reader cannot be indexed, so the whole table is loaded into a list
    print(reader[109][80])
I am doing this many times with large tables and I would like to avoid loading the whole table into an array (line 2 above) to ask for a single value. Is there a way to open the file, load the specific value and close it again? Would this process be more efficient than what I have done above?
Thanks for all the answers, all answers so far work pretty well.
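You could try reading the file without the csv library:
row = 108
column = 80
with open('hugetable.csv', 'r') as file:
    header = next(file)        # skip the header line
    for _ in range(row - 1):   # skip the data lines before the target
        next(file)
    line = next(file)          # the row-th data line (1-based, header excluded)
    print(line.strip().split(',')[column])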
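If fields may contain quoted commas, a plain split(',') will miscount the columns. Here is a minimal sketch that keeps the csv module but still stops reading once the target row is reached, using itertools.islice. The helper name read_cell and the zero-based indexing (header counted as row 0) are assumptions made for illustration:

```python
import csv
from itertools import islice


def read_cell(path, row, column):
    """Return the field at zero-based (row, column), counting the header as row 0.

    Hypothetical helper: only the records up to and including `row` are parsed.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        # islice skips `row` records lazily, then yields exactly one
        target = next(islice(reader, row, row + 1), None)
    if target is None:
        raise IndexError(f"{path} has fewer than {row + 1} rows")
    return target[column]
```

Something like read_cell('hugetable.csv', 109, 80) would then return the value as a string. The file is still scanned sequentially up to that row, because CSV lines have variable length and cannot be jumped to directly.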
You can try pandas to load only certain columns of your CSV file:
import pandas as pd
pd.read_csv('foo.csv',usecols=["column1", "column2"])
You could use pandas to load only the rows you need:
import pandas as pd
text = pd.read_csv('Book1.csv', sep=',', header=None, skiprows=100, nrows=3)
print(text[50])
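The two pandas ideas above (limiting rows with skiprows/nrows and limiting columns with usecols) can be combined to parse a single cell. This is a rough sketch; the helper name read_cell_pandas and the zero-based convention are assumptions, and pandas will still scan past the skipped lines:

```python
import pandas as pd


def read_cell_pandas(path, row, column):
    """Return one cell at zero-based (row, column); the first line counts as row 0."""
    df = pd.read_csv(
        path,
        header=None,       # treat every line as data
        skiprows=row,      # skip the lines before the target row
        nrows=1,           # parse a single row
        usecols=[column],  # keep a single column
    )
    return df.iat[0, 0]
```

Note that pandas infers the type of the returned value, so a numeric cell comes back as a number rather than a string.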
Data from datadog
I am looking for some assistance reading this data from Datadog. I am reading it from the downloaded CSV, and I want to read it in Python so that I can build an application that reads the same data at regular intervals.
I have tried reading the data like below:
import pandas as pd
fileload = pd.read_csv("DataSource/extract-2023-02-02T19_10_32.790Z.csv")
print(fileload)
fileload1 = pd.read_csv("DataSource/extract-2023-02-02T19_11_05.899Z.csv")
final = pd.concat([fileload, fileload1])
print(final)
import csv
with open("DataSource/extract-2023-02-02T19_10_32.790Z.csv", 'r' ) as file:
    csvread = csv.reader(file)
    for i in file:
        print(i)
    a = pd.DataFrame([csvread])
    print(type(a))
My expectation is that I can pick the last column, with all the data in the above format, and give it a column name. I then want to analyse the data by applying some aggregations on top.
Please assist
Have you tried:
final[["final_column_name"]]
final['New_col_name'] = ...
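To make that concrete, here is a small sketch of selecting the last column by position, naming it, and aggregating it. The two stand-in frames and the column names ("date", "raw", "value") are invented for illustration; with the real files, the frames would come from pd.read_csv as in the question:

```python
import pandas as pd

# Stand-in frames (hypothetical data); in practice these come from pd.read_csv(...)
first = pd.DataFrame({"date": ["2023-02-02", "2023-02-02"], "raw": [10, 20]})
second = pd.DataFrame({"date": ["2023-02-03"], "raw": [30]})

final = pd.concat([first, second], ignore_index=True)

# Select the last column by position and give it a descriptive name
last = final.iloc[:, -1].rename("value")

# Apply several aggregations to the renamed column
summary = last.agg(["count", "sum", "mean"])
print(summary)
```

Selecting with iloc[:, -1] avoids having to know the original name of the last column in the export.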
I'm trying to whip this out in Python. Long story short, I have a CSV file that contains column data I need to inject into another file that is pipe-delimited. My understanding is that Python can't replace values in place, so I have to rewrite the whole file with the new values.
data file(csv):
value1,value2,iwantthisvalue3
source file(txt, | delimited)
value1|value2|iwanttoreplacethisvalue3|value4|value5|etc
fixed file(txt, | delimited)
samevalue1|samevalue2| replacedvalue3|value4|value5|etc
I can't figure out how to accomplish this. This is my latest attempt(broken code):
import re
import csv
result = []
row = []
with open("C:\data\generatedfixed.csv","r") as data_file:
    for line in data_file:
        fields = line.split(',')
        result.append(fields[2])
with open("C:\data\data.txt","r") as source_file, with open("C:\data\data_fixed.txt", "w") as fixed_file:
    for line in source_file:
        fields = line.split('|')
        n=0
        for value in result:
            fields[2] = result[n]
            n=n+1
            row.append(line)
    for value in row
        fixed_file.write(row)
I would strongly suggest using the pandas package here. It makes handling tabular data very easy and would help you a lot in this case. Once you have installed pandas, import it with:
import pandas as pd
To read the files, use the lines below. Your samples have no header row, so I am assuming header=None here, which makes pandas label the columns 0, 1, 2, and so on. The r prefix keeps the backslashes in the Windows paths from being treated as escape sequences:
data_file = pd.read_csv(r"C:\data\generatedfixed.csv", header=None)
source_file = pd.read_csv(r"C:\data\data.txt", delimiter="|", header=None)
After that, manipulating these two files is easy. I'm not exactly sure how many values or which ones you want to replace, but if both files have the same number of rows, this replaces the third column of the source file with the third column of the data file:
source_file[2] = data_file[2]
Now all you need to do is save the dataframe (the table that we just updated) to a file. Since you want a .txt file with "|" as the delimiter, this is the line to do that (you can customize how it is saved in many ways):
source_file.to_csv(r"C:\data\data_fixed.txt", sep='|', index=False, header=False)
Let me know if everything works and whether this helped you. I would also encourage you to read up (or watch some videos) on pandas if you're planning to work with tabular data. It is an awesome library with great documentation and functionality.
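For comparison, the same replacement can be sketched with only the standard csv module, which is closer to the original attempt. This assumes that neither file has a header row and that rows are paired by position; the function name replace_third_field is made up for this example:

```python
import csv


def replace_third_field(data_path, source_path, fixed_path):
    """Copy source_path to fixed_path, replacing field 3 with field 3 of data_path.

    Rows are paired by position; zip stops at the end of the shorter file.
    """
    with open(data_path, newline="") as data_file, \
         open(source_path, newline="") as source_file, \
         open(fixed_path, "w", newline="") as fixed_file:
        data_rows = csv.reader(data_file)                       # comma-delimited input
        source_rows = csv.reader(source_file, delimiter="|")    # pipe-delimited input
        writer = csv.writer(fixed_file, delimiter="|")          # pipe-delimited output
        for data_row, source_row in zip(data_rows, source_rows):
            source_row[2] = data_row[2]
            writer.writerow(source_row)
```

Both files are streamed one row at a time, so nothing has to be collected in a list first. csv.writer ends lines with \r\n by default; pass lineterminator="\n" if the target system expects plain newlines.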
I am trying to get the dimensions (shape) of a data frame using pandas in python without reading the entire data frame first in memory given that the file is quite large.
To get the number of columns with minimal loading of the file into the memory, I can for example use the argument below.
import pandas as pd
df = pd.read_csv("myData.csv", nrows=1)  # use a name other than pd so the module is not shadowed
print(df.shape)
To get the number of rows, I can use the argument usecols=[1] when reading the file, but there must be a simpler way of doing this.
If there are other packages or scripts that can easily give me such metadata information, I would be happy as well. It is really metadata I am looking for such as column names, number of rows, number of columns etc but I don't want to read the entire file in!
You don't even need pandas for this. Use the built-in csv module to parse the file:
import csv
with open('myData.csv', newline='') as fp:
    reader = csv.reader(fp)
    headers = next(reader)  # The header row is now consumed
    ncol = len(headers)
    nrow = sum(1 for _ in reader)  # What remains are the data rows
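If staying within pandas is preferred, a similar result can be sketched by reading only the header with nrows=0 and then counting rows chunk by chunk. The helper name csv_shape and the default chunk size are assumptions; the file is still scanned once, but only one chunk of a single column is held in memory at a time:

```python
import pandas as pd


def csv_shape(path, chunksize=100_000):
    """Return (n_rows, n_cols, column_names) without loading the whole file at once."""
    # nrows=0 parses only the header line
    columns = list(pd.read_csv(path, nrows=0).columns)
    # Read a single column in chunks and count the rows of each chunk
    n_rows = sum(
        len(chunk) for chunk in pd.read_csv(path, usecols=[0], chunksize=chunksize)
    )
    return n_rows, len(columns), columns
```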
I want to delete specific rows (rows 0 to 33) from every CSV file in my directory, but I have 224 separate CSV files that need this done. I would be grateful if you could show me how to do this with a single piece of code.
I think you can use glob and pandas to do this quite easily. I'm not sure whether you want to overwrite your original files, which is something I never recommend, so be careful, as this code will do that.
import os
import glob
import pandas as pd
os.chdir(r'yourdir')
allFiles = glob.glob("*.csv") # match your csvs
for file in allFiles:
    df = pd.read_csv(file)
    df = df.iloc[34:] # drop rows 0-33, keep row 34 onwards
    df.to_csv(file, index=False) # index=False avoids writing an extra index column
    print(f"{file} has removed rows 0-33")
or something along those lines..
This is a simple combination of two separate tasks.
First, you need to loop through all the csv files in a folder. See this StackOverflow answer for how to do that.
Next, within that loop, for each file, you need to modify the csv by removing rows. See this answer for how to read a csv, write a csv, and omit certain rows based on a condition.
One final aspect is that you want to omit certain line numbers. A good way to do this is with the enumerate function.
So code such as this will give you the line numbers.
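import csv
# newline='' lets the csv module handle line endings itself
with open('first.csv', 'r', newline='') as infile, open('first_edit.csv', 'w', newline='') as outfile:
    reader = csv.reader(infile)  # parse each line into a list of fields
    writer = csv.writer(outfile)
    for i, row in enumerate(reader):
        if i > 33:  # skip rows 0-33
            writer.writerow(row)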
Iterate over CSV files and use Pandas to remove the top 34 rows of each file then save it to an output directory.
Try this code after installing pandas:
from pathlib import Path
import pandas as pd
source_dir = Path('path/to/source/directory')
output_dir = Path('path/to/output/directory')
for file in source_dir.glob('*.csv'):
    df = pd.read_csv(file)
    df.drop(df.head(34).index, inplace=True)
    df.to_csv(output_dir.joinpath(file.name), index=False)
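The same batch job can also be sketched without pandas, streaming each file through the csv module. The function name drop_leading_rows is made up here, and unlike the pandas version this one counts the first line of the file (including any header) as row 0:

```python
import csv
from itertools import islice
from pathlib import Path


def drop_leading_rows(source_dir, output_dir, n_rows=34):
    """Write a copy of every CSV in source_dir to output_dir without its first n_rows lines."""
    source_dir, output_dir = Path(source_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(source_dir.glob("*.csv")):
        with open(src, newline="") as fin, \
             open(output_dir / src.name, "w", newline="") as fout:
            reader = csv.reader(fin)
            writer = csv.writer(fout)
            # islice skips the first n_rows records and yields the rest lazily
            writer.writerows(islice(reader, n_rows, None))
```

Writing to a separate output directory leaves the 224 originals untouched, so the result can be checked before anything is replaced.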
I created a CSV file in my Python code and want to append more data to it, but I get this error:
io.UnsupportedOperation: not readable
The code I tried is:
df.to_csv('timepass.csv', index=False)
with open(r'timepass.csv', 'a') as f:
    writer = csv.reader(f)
    your_list = list(writer)
    print(your_list)
I want to append the next data and store it in the same CSV file, so that the file contains both the previous and the current data.
Please help me figure this out.
Thanks in advance.
It is quite simple; just try this:
import pandas as pd
df = pd.read_excel("NSTT.xlsx", sheet_name="Sheet1")  # Reading Excel
print(df)  # Printing the data frame
df.to_excel("new.xlsx")  # Writing the data frame into a new Excel file
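If you want to append data to the same file, note that the second argument of to_excel is the sheet name, not a file mode, so df.to_excel("new.xlsx","a") only writes a sheet named "a". To add a sheet to an existing workbook, open it with pd.ExcelWriter in append mode (this requires the openpyxl engine):
with pd.ExcelWriter("new.xlsx", mode="a", engine="openpyxl") as writer:
    df.to_excel(writer, sheet_name="Sheet2")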
There is no need to convert the data to a list, since you can access values in the data frame directly by location (for example with df.iloc).
Please check this.
You can use pandas in python to read csv and write csv:
import pandas as pd
df = pd.read_csv("csv file")
print(df)
Try:
with open(r'timepass.csv', 'r') as f:
    reader = list(csv.reader(f))
    print(reader)
Here you are opening your file with 'r', which means read-only, and assigning the list contents to reader with list(csv.reader(f)). Your earlier code used 'a', which opens the file for appending only; the documentation describes it as:
'a' opens the file for appending; any data written to the file is
automatically added to the end
and does not support the read().
And if you want to append data to the CSV file from a different list, open the file with mode 'a' and use the writer's writerow method:
with open('lake.csv', 'a', newline='') as f:
    csv.writer(f).writerow([1, 2, 3])  # dummy list [1,2,3]
Or append directly with the pandas.DataFrame.to_csv method from your new dataframe, with header=False so as not to append the headers again:
df.to_csv('timepass.csv', index=False)
df_new.to_csv(r'timepass.csv', mode='a', header=False, index=False) #once you have updated your dataframe, you can directly append it to the same csv file
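Putting the two file modes together, here is a small sketch of the append-then-read cycle using only the csv module. The helper names append_rows and read_rows are hypothetical; the key point is that appending and reading use separate open calls with 'a' and 'r' respectively:

```python
import csv


def append_rows(path, rows):
    """Append an iterable of rows to an existing CSV file without rewriting it."""
    # newline="" lets the csv module control line endings itself
    with open(path, "a", newline="") as f:
        csv.writer(f).writerows(rows)


def read_rows(path):
    """Read the whole file back as a list of lists of strings."""
    with open(path, newline="") as f:
        return list(csv.reader(f))
```

After append_rows('timepass.csv', new_rows), a call to read_rows('timepass.csv') would return both the previous and the newly appended rows.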
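You can use pandas to append one CSV to another quickly. DataFrame.append was removed in pandas 2.0, so use pd.concat instead:
import pandas as pd
dataframe1 = pd.read_csv("a.csv")
dataframe2 = pd.read_csv("b.csv")
# ignore_index=True renumbers the rows, replacing the separate reset_index step
dataframe1 = pd.concat([dataframe1, dataframe2], ignore_index=True)
dataframe1.to_csv("a.csv", index=False)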