I am trying to read a txt file into a pandas DataFrame using pandas.read_fwf. Here's my line of code:
klia_sepang = pd.read_fwf('KLIA_SEPANG.txt', sep='[ ]{1,}')
However, I'm finding that every number in the hundreds has its first digit truncated: 791.0 becomes 91.0, 309.0 becomes 09.0, and so on. I'm not sure why this happens. I've tried adding parameters like colspecs and widths, to no avail.
Looking at your text file, you probably want to use the widths or colspecs parameter to define how to break up the file into columns. Or you might have success just letting read_fwf infer how to organize the columns of data.
I don't think passing "sep" with those characters is helping, and it may be confusing the parser.
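A minimal sketch of both options (the widths below are hypothetical and must match your actual file layout):

import pandas as pd

# Option 1: let read_fwf infer the column boundaries on its own
klia_sepang = pd.read_fwf('KLIA_SEPANG.txt')

# Option 2: spell out the column widths yourself; these values are
# hypothetical and must match the actual layout of your file
klia_sepang = pd.read_fwf('KLIA_SEPANG.txt', widths=[10, 8, 8, 8])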
This is my carDatabase.txt
CarID:c01 ModelName:honda VehicleType:city Price:20
CarID:c02 ModelName:honda VehicleType:x Price:30
I want to search for the CarID and modify only that whole line without disturbing the others.
My current code is here:
# Converting txt data into a string and modify
carsDatabaseFile = open('carsDatabase.txt', 'r')
allDataFromDatabase = [line.split(',') for line in carsDatabaseFile.readlines()]
Note:
Your question has a couple of issues: your sample from carDatabase.txt looks like it is tab-delimited, but your current code splits each line around the ',' character. This also looks like a place where a list comprehension might be hurting you more than it is helping you; break it up into a for-loop if you're trying to add logic that manipulates a single line.
For looking at CSV files, I would highly recommend pandas for general manipulation of data in comma-separated as well as a number of other formats.
That said, if you are truly restricted to built-in packages, or you are looking at this as a learning exercise, and your goal is to directly manipulate just one line of that file, what you are looking for is the seek method. You can use it in combination with the tell method (documented just below seek in the above link) to find where you are in the file.
Write a for loop to identify which line in the file you are looking for
From there, you can get the output of tell() to find the specific place in the file you are trying to manipulate
Using the output from the above two steps, you can set the file pointer to a specific location using the seek() method (by byte: files are really stored as a one-dimensional sequence of bytes).
You can now use the write() method to directly update the file at the location you determined above.
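Putting those steps together, here is a minimal sketch. The ID and replacement line are hypothetical, and it assumes the replacement has exactly the same length as the original line (an in-place overwrite cannot grow or shrink a line):

target_id = "c01"  # hypothetical ID to search for
new_line = "CarID:c01 ModelName:honda VehicleType:suv  Price:25\n"

with open("carsDatabase.txt", "r+") as f:
    while True:
        position = f.tell()  # remember where this line starts
        line = f.readline()
        if not line:  # end of file, ID not found
            break
        if line.startswith("CarID:" + target_id):
            if len(new_line) != len(line):
                raise ValueError("replacement must match the original line length")
            f.seek(position)  # jump back to the start of the line
            f.write(new_line)  # overwrite it in place
            break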
I need to manipulate a CSV file: go into the file, look for blank fields between c0-c5 in my example CSV file, and wherever there are blanks, replace the blank with any verbiage I want, like "not found".
The only code I have so far drops a column I do not need, but I really cannot find anything on the manipulation I need.. maybe it is not possible?
Also, I am wondering how to change a column name. Thanks.
#!/bin/env python
import pandas
data = pandas.read_csv('report.csv')
data = data.drop(['date'], axis=1)
data.to_csv('final_report.csv')
Alternatively, and taking your comment question into account (if you do not necessarily want to use pandas as in n1colas.m's answer), use string replacements and simply loop over your file:
with open("modified_file.csv","w") as of:
with open("report.csv", "r") as inf:
for line in inf:
if "#" not in line: # in the case your csv file has a comment marker somewhere and it is called #, the line is skipped, which means you get a clean comma separated value file as the outfile- if you do want to keep such lines simply remove the if condition
mystring=line.replace(", ,","not_found").replace("data","input") # in case it is not only one blank space you can also use the regex for n times blank space here
print(mystring, file=of, end=""); # prints the replaced line to outfile and writes no newline
I know this is not the most efficient way to do it, but it is probably the one where you can most easily understand what you are doing and modify it to your heart's desire.
For any reasonably sized CSV file it should still work nearly instantaneously.
Also for testing purposes always use a separate file (of) for such replacements instead of writing to your infile as your question seems to state. Check that it did what you wanted. ONLY THEN overwrite your infile. This may seem unnecessary at first, but mistakes happen...
You can do this with the following line:
data['data'] = data['data'].fillna("not found")
Here is the documentation: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.fillna.html
Here is an example:
import pandas
data = pandas.read_csv('final_report.csv')
data.info()
data['data'] = data['data'].fillna("Something")
print(data)
I would suggest changing the data variable to something different, because your column has the same name, which can be confusing.
I'm trying to read a large and complex CSV file with pandas.read_csv.
The exact command is
pd.read_csv(filename, quotechar='"', low_memory=True, dtype=data_types, usecols= columns, true_values=['T'], false_values=['F'])
I am pretty sure that the data types are correct. I can read the first 16 million lines (setting nrows=16000000) without problems, but somewhere after this I get the following error:
ValueError: could not convert string to float: '1,123'
It seems that, for some reason, pandas treats two columns as one.
What could be the problem? How can I fix it?
I found the mistake. The problem was a thousands separator.
When writing the CSV file, most numbers were below one thousand and were correctly written to the CSV file. However, this one value was greater than one thousand and was written as "1,123", which pandas did not recognize as a number but as a string.
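For completeness, read_csv has a thousands parameter that handles exactly this case; a minimal sketch (the file name is hypothetical):

import pandas as pd

# telling read_csv about the thousands separator lets it parse "1,123"
# as the number 1123 instead of failing on the string
df = pd.read_csv("filename.csv", quotechar='"', thousands=",")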
I want to convert a CSV file to a db (database) file using Python. How should I do it?
You need to find a library that helps you parse the CSV file, or read the file line by line and parse it with standard Python; it could be as simple as splitting each line on commas.
Then insert the rows into an SQLite database. Here you have the Python documentation on SQLite. You could also use SQLAlchemy or another ORM.
Another way could be using the sqlite3 shell itself.
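If pandas is an option, a minimal sketch (the file and table names here are hypothetical):

import sqlite3
import pandas as pd

# read the CSV (assuming it has a header row), then let to_sql
# create the table and insert all the rows in one call
df = pd.read_csv("input.csv")
with sqlite3.connect("output.db") as conn:
    df.to_sql("my_table", conn, if_exists="replace", index=False)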
I don't think this can be done in full generality without out-of-band information or just treating everything as strings/text. That is, the information contained in the CSV file won't, in general, be sufficient to create a semantically “satisfying” solution. It might be good enough to infer what the types probably are for some cases, but it'll be far from bulletproof.
I would use Python's csv and sqlite3 modules, and try to:
convert the cells in the first CSV line into names for the SQL columns (strip “oddball” characters)
infer the types of the columns by going through the cells in the second CSV file line (first line of data), attempting to convert each one first to an int; if that fails, try a float; and if that fails too, fall back to strings
this would give you a list of names and a list of corresponding probably types from which you can roll a CREATE TABLE statement and execute it
try to INSERT the first and subsequent data lines from the CSV file
There are many things to criticize in such an approach (e.g. no keys or indexes; it fails if a column that is generally a string happens to contain a value that is Python-convertible to an int or float in the first data line), but it will probably work passably for the majority of CSV files.
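A minimal sketch of that approach, with hypothetical file names (input.csv, output.db) and table name (data):

import csv
import sqlite3

def infer_type(value):
    # try int first, then float, then fall back to text
    try:
        int(value)
        return "INTEGER"
    except ValueError:
        pass
    try:
        float(value)
        return "REAL"
    except ValueError:
        return "TEXT"

with open("input.csv", newline="") as f:
    reader = csv.reader(f)
    header = next(reader)
    first_row = next(reader)

# strip "oddball" characters so the header cells work as column names
names = ["".join(c for c in cell if c.isalnum() or c == "_") for cell in header]
types = [infer_type(cell) for cell in first_row]

conn = sqlite3.connect("output.db")
columns = ", ".join(name + " " + typ for name, typ in zip(names, types))
conn.execute("CREATE TABLE data (" + columns + ")")

placeholders = ", ".join("?" for _ in names)
with open("input.csv", newline="") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header line
    conn.executemany("INSERT INTO data VALUES (" + placeholders + ")", reader)

conn.commit()
conn.close()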
So, I have a .txt file that I want to read into Pylab. The problem is that when I try to do so using numpy.loadtxt("filename.txt"), Pylab cannot read the numbers in my file as float values (it returns the error: cannot convert string to float).
I am not sure if there is something wrong with my syntax above; when I remove the quotation marks inside the parentheses, as in numpy.loadtxt(filename.txt), Pylab returns the error: filename is not defined.
Any suggestions on how to read a series of numbers saved in a .txt file into Pylab as an array of floats?
You need to provide sample lines from your filename.txt file. You may also want to read the documentation for numpy.loadtxt here; there are some good examples on the documentation page.
BTW, the second command, numpy.loadtxt(filename.txt), is wrong since you have not defined a variable named filename.
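For example, a minimal sketch, assuming filename.txt holds whitespace-separated numbers with one header line (skiprows=1 is an assumption; drop it if the file has no header):

import numpy as np

# skiprows=1 assumes exactly one non-numeric header line in the file
data = np.loadtxt("filename.txt", skiprows=1)
print(data.dtype)  # float64 by default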