I used PyPDF2 to extract tabular data from a PDF and wrote it to a text file. It comes out with each word or number on its own line. I want the "states" to be rows, the "years" to be column headers, and the numbers that follow each state to be placed into its row 10 at a time.
The PDF itself is a good illustration of the layout I am trying to get. Does anyone have good ideas for how to reformat the text file that way?
Here is the link to the PDF I am taking the data from:
file:///V:/Final%20Project/Data/DuckStampSales.pdf
My text file looks like this:
textfile
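One way to rebuild the table is to walk the lines in order, collecting non-numeric lines as (possibly multi-word) state names and grouping the numbers that follow ten at a time. This is a sketch under assumptions: the token list has already been trimmed so it starts with the ten year headers, every state is followed by exactly ten values, and numbers may contain thousands separators such as 1,234. The function names are hypothetical.

```python
import pandas as pd


def parse_number(token):
    """Return token as an int or float (thousands commas allowed), or None if it is not numeric."""
    cleaned = token.replace(",", "")
    for cast in (int, float):
        try:
            return cast(cleaned)
        except ValueError:
            pass
    return None


def reshape_tokens(tokens, n_cols=10):
    """Turn a one-token-per-line list into a DataFrame: states as rows, years as columns.

    ASSUMPTION: tokens[:n_cols] are the year headers, and each state name
    (which may span several lines, e.g. "New" / "York") is followed by
    exactly n_cols numbers.
    """
    years = tokens[:n_cols]
    rows = {}
    name_parts, values = [], []
    for token in tokens[n_cols:]:
        number = parse_number(token)
        if number is None:
            name_parts.append(token)  # another word of the current state name
            continue
        values.append(number)
        if len(values) == n_cols:  # full row collected: store it, start the next state
            rows[" ".join(name_parts)] = values
            name_parts, values = [], []
    # An incomplete trailing group (fewer than n_cols numbers) is dropped.
    return pd.DataFrame.from_dict(rows, orient="index", columns=years)


def load_table(path, n_cols=10):
    """Read the text dump (one token per line), skip blank lines, and reshape it."""
    with open(path, encoding="utf-8") as f:
        tokens = [line.strip() for line in f if line.strip()]
    return reshape_tokens(tokens, n_cols)
```

If the dump contains title or footnote lines before the year headers, slice them off the token list before calling reshape_tokens.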
I have an HTML form where the user selects their bank and uploads a CSV file of their transactions so that I can process the financial data:
I can store the file in a variable named 'file', but I can't find a way to open it with the usual methods.
For example, this doesn't work:
I know the file is valid in the Python code because I can open it with pandas, but pandas messes up the column headings because there is some preamble data at the top of the file.
Here is the file:
I am trying to do this so I can search for a row number by string. I need to know which row 'Date' is on so that I can pass that number to pandas as the skiprows value and get a correct DataFrame. This is what I came up with so far:
But obviously I cannot open the file in the first place. Ideally my output would be 7. I can't just use a static value of 7 for skiprows, because the amount of preamble data before the table changes from file to file.
This may not be an optimal answer, but maybe it will work for you:
file_content = file.stream.read().decode("utf-8")
lines = file_content.splitlines()  # also strips Windows "\r\n" line endings
Then you can look for the line that starts with 'Date' to figure out your skiprows value.
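Putting that together, here is a minimal sketch that finds the 0-based index of the first line starting with 'Date' and passes it to read_csv as skiprows, so the 'Date' line becomes the header. Wrapping the decoded text in io.StringIO lets pandas read it like a file. It assumes the header line literally starts with Date and the preamble has no quoted multi-line fields; find_header_row and read_transactions are hypothetical helper names.

```python
import io

import pandas as pd


def find_header_row(lines, marker="Date"):
    """Return the 0-based index of the first line starting with `marker`."""
    for i, line in enumerate(lines):
        if line.startswith(marker):
            return i
    raise ValueError(f"no line starts with {marker!r}")


def read_transactions(file_content, marker="Date"):
    """Parse the decoded upload, skipping however much preamble precedes the header."""
    skip = find_header_row(file_content.splitlines(), marker)
    # Skipping exactly `skip` lines leaves the marker line as the header row.
    return pd.read_csv(io.StringIO(file_content), skiprows=skip)
```

In the upload handler this would be called as read_transactions(file.stream.read().decode("utf-8")), where file is the uploaded file object from the question.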
I have multiple text files (file1, file2, file3) that contain data like this. It's just an example; I am wondering how to put specific data into an Excel sheet like this Excel sheet.
I am new to Python and to converting text to Excel with it, which is why I'm finding this hard to approach.
Basically, what you need to do is parse the text files and write a new file in CSV format, which Excel can open:
file1 -> PythonScript.py -> excel.csv
File Parser Python Tutorial
The .csv file looks like this: a header row followed by the data, with fields separated by commas.
excel.csv:
Name,Data
hibiscus_3,54k
hibiscus_7,67k
Rose_3,87MB
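A minimal sketch of that file1 -> script -> excel.csv pipeline using Python's built-in csv module. The layout of file1, file2, and file3 isn't shown, so this assumes each non-empty line holds a name and a value separated by whitespace (e.g. hibiscus_3 54k); parse_text_file and text_files_to_csv are hypothetical helpers you would adapt to the real format.

```python
import csv


def parse_text_file(path):
    """Yield (name, data) pairs from one text file.

    ASSUMPTION: each non-empty line looks like "hibiscus_3 54k"
    (name, whitespace, value). Change this to match your real files.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) >= 2:  # skip blank or malformed lines
                yield parts[0], parts[1]


def text_files_to_csv(input_paths, output_path="excel.csv"):
    """Write one header row, then every (name, data) pair from every input file."""
    # newline="" is what the csv module expects, so no blank rows appear on Windows
    with open(output_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["Name", "Data"])
        for path in input_paths:
            writer.writerows(parse_text_file(path))
```

Calling text_files_to_csv(["file1.txt", "file2.txt", "file3.txt"]) produces an excel.csv shaped like the example above, which Excel can open directly.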
Hope I could help.
My issue is as follows.
I've gathered some contact data from SurveyMonkey using the SM API and converted that data into a txt file. When I open the txt file, I see the full survey data that I'm trying to convert to CSV. However, when I use the following code:
import pandas as pd
df = pd.read_csv("my_file.txt", sep=",", encoding="iso-8859-10")
df.to_csv("my_file.csv")
it creates a CSV file with only two lines of values (and it cuts off in the middle of the second line). Similarly, if I load the data into a pandas DataFrame, it only registers the first two lines, meaning most of my txt file is not being read.
I've never run into this problem before and have converted files to CSV without issues, so I'm wondering whether anyone has ideas about what might be causing this and how I could solve it.
All help is much appreciated.
Edit:
I was able to get the data to display properly in CSV when I converted it directly from JSON to CSV instead of converting it to a txt file first. However, I was not able to figure out what went wrong in the txt-to-CSV conversion: I tried multiple different encodings and got the same result each time.
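Since going straight from JSON worked, here is a minimal sketch of that route using pandas.json_normalize, which flattens nested JSON objects into dotted column names. It assumes you already have the API response parsed into a list of dicts; the field names in the example are made up, and the real ones depend on the SurveyMonkey endpoint you call. responses_to_csv is a hypothetical helper name.

```python
import pandas as pd


def responses_to_csv(records, output_path="my_file.csv"):
    """Flatten a list of JSON objects (dicts) into one CSV row per object.

    ASSUMPTION: `records` is the list of contact dicts already parsed from
    the API's JSON response; nested dicts become columns like "custom.city".
    """
    df = pd.json_normalize(records)
    df.to_csv(output_path, index=False, encoding="utf-8")
    return df
```

Skipping the intermediate txt file means pandas never has to guess that file's delimiter, quoting, or encoding.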
I fetched some tweets with the Tweepy API and saved them in a txt file. Now I want to load them into a pandas DataFrame, with columns such as the tweet text and maybe the date.
Any ideas how I can do it?
By the way, I'm really new to Python.
Thanks in advance
Your approach will depend on the format of your txt file, but overall:
You want to open the file in Python, read it (probably line by line), and parse it into a pandas DataFrame.
For example, to go through your document line by line:
with open("testfile.txt", "r") as file:
    for line in file:
        # do something (like parsing) with each line
        pass
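As a concrete version of that loop, here is a minimal sketch that builds a DataFrame with a date column and a text column. It assumes each line of the file was written as a date, a tab, then the tweet text, and that tweets contain no embedded newlines; adjust the split if your file is laid out differently. load_tweets is a hypothetical name.

```python
import pandas as pd


def load_tweets(path):
    """Parse a tweets text file into a DataFrame with 'date' and 'text' columns.

    ASSUMPTION: each line is "<date>\t<tweet text>". Only the first tab
    is treated as the separator, so tabs inside the tweet are kept.
    """
    rows = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            date, _, text = line.partition("\t")
            rows.append({"date": date, "text": text})
    df = pd.DataFrame(rows, columns=["date", "text"])
    df["date"] = pd.to_datetime(df["date"])  # real datetimes allow filtering/sorting by date
    return df
```

If you control how the tweets are saved, writing one JSON object per line and reading it back with pd.read_json(path, lines=True) avoids hand-written parsing altogether.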
I'm struggling to get my text data into the most suitable format for spaCy.
I have created a CSV file (DOWNLOAD HERE) that shows how my raw text data is structured.
If you open it in Excel, you will see that each cell is one document. However, the third document contains line breaks, and I do not want a newline within a document to be parsed as a new document; only a new cell should start a new document.
If I import the CSV into a pandas DataFrame, the resulting DataFrame keeps the structure I want, but spaCy will not work directly on this DataFrame:
import pandas as pd
df = pd.read_csv('test_line_breaks.csv')
I need to get this data into a format spaCy can use, so that it recognises new documents correctly and doesn't interpret newlines within a document as new documents.
I hope that makes sense.
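One possible approach: spaCy doesn't need a special file format here, because nlp.pipe() takes any iterable of strings and yields one Doc per string, so a newline inside a string stays inside that document. A minimal sketch, assuming the documents are in the first column of the CSV and there is no header row (pass header=0 if there is one); the helper names are hypothetical, and nlp is whatever pipeline you load, e.g. spacy.load("en_core_web_sm") if that model is installed.

```python
import pandas as pd


def documents_from_csv(path_or_buffer, column=0, header=None):
    """Return one string per CSV cell in `column`.

    Quoted cells that contain line breaks are read as a single value,
    so each string is exactly one document.
    ASSUMPTION: no header row by default; pass header=0 if the CSV has one.
    """
    df = pd.read_csv(path_or_buffer, header=header)
    return df.iloc[:, column].dropna().astype(str).tolist()


def to_spacy_docs(texts, nlp):
    """Run a spaCy pipeline over the texts; nlp.pipe yields one Doc per input string."""
    return list(nlp.pipe(texts))
```

Usage would look like docs = to_spacy_docs(documents_from_csv("test_line_breaks.csv"), nlp), giving one Doc per cell.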