Why does looping over tweepy data seem to delete the data? [duplicate] - python

This question already has answers here:
Resetting generator object in Python
(19 answers)
Closed 7 months ago.
I'm using the Twitter API (with Tweepy) to extract a number of tweets via Python.
I'm looping over a cursor:
tweets = tweepy.Cursor(api.search,
                       q=search_term,
                       since=str(t2)).items(10)
After I get the tweets, I run through a loop that puts the data within a dataframe:
However, when I run the code again, the data seems to have disappeared:
Is there something I could be doing differently? My purpose is to continue adding columns to the dataframe from the same tweet data, but since the data appears to disappear after the first loop, I can't get it done.
Thanks in advance.

The items() method of tweepy.Cursor does not return a list but an iterator, which can only be consumed once; see the documentation.
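To illustrate, here is a minimal sketch using a plain Python generator as a stand-in for the Tweepy cursor (no Twitter API access assumed): an iterator yields its items once, and a second loop over it sees nothing. Converting it to a list once lets you reuse the data.

```python
def make_items():
    # Stand-in for Cursor(...).items(10): a generator yields values only once.
    yield from ["tweet1", "tweet2", "tweet3"]

items = make_items()
first_pass = list(items)    # consumes the iterator
second_pass = list(items)   # iterator is exhausted, so this is empty

# Materialize the cursor's results once, then reuse the list freely:
tweets = list(make_items())
```

Applied to the original code, that would mean `tweets = list(tweepy.Cursor(...).items(10))`, after which `tweets` can be looped over as many times as needed.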

Is there a simple way to transfer string values from one column to numerical values in another column? [duplicate]

This question already has answers here:
Pandas: convert categories to numbers
(6 answers)
Closed 4 months ago.
I'm new to Python and have a task to solve. I have a large .csv file and I was wondering whether there is a simple way to map string values from one column to numerical values in another column.
For example, one column holds a bunch of different factory names, and the new column should hold a numerical value for every factory:
Factories    NumValues
FactoryA     1
FactoryB     2
FactoryA     1
FactoryC     3
I know that I could do this with dictionaries, but since there are quite a lot of different names (factories), I was wondering if there is already some library to make this process easier and faster?
I hope I explained my problem well.
You can use ngroup(). Basically, group by factories and give an id to every factory. Does this give the output you want?
df['NumValues'] = df.groupby('Factories').ngroup() + 1
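A small self-contained sketch, using a made-up frame with the factory names from the question:

```python
import pandas as pd

df = pd.DataFrame({"Factories": ["FactoryA", "FactoryB", "FactoryA", "FactoryC"]})

# ngroup() numbers each group; with the default sort=True the ids follow
# the sorted order of the factory names. Adding 1 starts the numbering at 1.
df["NumValues"] = df.groupby("Factories").ngroup() + 1
print(df["NumValues"].tolist())  # [1, 2, 1, 3]
```

Every occurrence of the same factory gets the same number, with no dictionary to maintain by hand.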

Removing the index when appending data and rewriting CSV using pandas [duplicate]

This question already has answers here:
How to get rid of "Unnamed: 0" column in a pandas DataFrame read in from CSV file?
(11 answers)
Closed 1 year ago.
I have a script that runs on a daily basis to collect data.
I record this data in a CSV file using the following code:
old_df = pd.read_csv('/Users/tdonov/Desktop/Python/Realestate Scraper/master_data_for_realestate.csv')
old_df = old_df.append(dataframe_for_cvs, ignore_index=True)
old_df.to_csv('/Users/tdonov/Desktop/Python/Realestate Scraper/master_data_for_realestate.csv')
I am using append(ignore_index=True), but after every run of the code I still get additional columns created at the start of my CSV. I delete them manually, but is there a way to stop this from the code itself? I looked at the function but I am still not sure if it is possible.
My result file gets the following columns added after every run (one at a time, after each run):
This is really annoying to have to delete every time.
Update: the data looks like this:
However, the id is not unique; it can be repeated on different days. It is the id of an online offer, and an offer can be available for one day, a couple of days, or five months.
Did you try
to_csv(index=False)?
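A quick round-trip sketch (using an in-memory buffer and made-up data instead of the real CSV path) showing that index=False keeps the row index out of the file, so no "Unnamed: 0" column appears on the next read:

```python
import io
import pandas as pd

df = pd.DataFrame({"id": [1, 2], "price": [100, 200]})

buf = io.StringIO()
df.to_csv(buf, index=False)  # no row index written -> no "Unnamed: 0" later
buf.seek(0)

round_trip = pd.read_csv(buf)
print(round_trip.columns.tolist())  # ['id', 'price']
```

Without index=False, to_csv writes the index as an unnamed first column, and each read/append/write cycle adds one more.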

Is there a way to loop through rows of data in Excel with Python until an empty cell is reached? [duplicate]

This question already has answers here:
How to find the last row in a column using openpyxl normal workbook?
(4 answers)
Closed 3 years ago.
I am working with a large Excel chart. For each row of data I need to perform several tasks. Is there a way to construct a loop in Python that runs through each line until an empty cell is found?
For example:
Project1 Data Data Data
Project2 Data Data Data
Project3 Data Data Data
Project4 Data Data Data
In this scenario, I would want to run through the chart until after Project4. But different documents will have various sized charts so it will need to run until it hits an empty cell, not limited by a specific cell.
I am thinking a do-until style loop (as you can tell, I don't know Python very well) would be useful. I also know there is a way to test for empty cells via openpyxl, which I am using for this project:
if sheet.cell(0, 0).value == xlrd.empty_cell.value:
    # Do something
Currently, I would try to figure out a way to do something similar to this, unless someone suggests a better alternative:
for i in range(10, 1000):  # setting an upper limit of 1000 rows
    if sheet.cell(0, i).value != xlrd.empty_cell.value:
        variable = sheet.cell(2, i).value
        # other body stuff
    else:
        break
I know this code is rather undeveloped, I just wanted to ask before going in the wrong direction. I also am unsure how to assign i to run through the rows.
If what you need is to read the Excel file in Python, I'd recommend taking a look at pandas read_excel.
Hope this helps!
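For the "loop until an empty cell" part specifically, here is a minimal sketch of the pattern using itertools.takewhile on hypothetical row tuples, shaped like what openpyxl's sheet.iter_rows(values_only=True) yields (empty cells come back as None):

```python
from itertools import takewhile

# Hypothetical rows, as iter_rows(values_only=True) would produce them.
rows = [
    ("Project1", "Data", "Data", "Data"),
    ("Project2", "Data", "Data", "Data"),
    ("Project3", "Data", "Data", "Data"),
    ("Project4", "Data", "Data", "Data"),
    (None, None, None, None),  # first empty row ends the loop
]

# Stop as soon as the leading cell of a row is empty:
processed = [row[0] for row in takewhile(lambda r: r[0] is not None, rows)]
print(processed)  # ['Project1', 'Project2', 'Project3', 'Project4']
```

The same lambda works unchanged if `rows` is replaced by the real `sheet.iter_rows(values_only=True)` iterator, so no upper row limit needs to be hard-coded.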

Limit max number of columns displayed from a data frame in Pandas [duplicate]

This question already has answers here:
Selecting pandas column by location
(7 answers)
Closed 3 years ago.
I am trying to display 4 data frames from a list in a web-scraper project I'm working on. I'm new to Python/Pandas and am trying to write a for loop to do this. My thinking is that if I can set the display restrictions, it will apply to each data frame in my list; if it means anything, I'm working out of a Jupyter Notebook. The only thing is that I need to limit the number of columns, not rows, shown to only the first 5 (columns 0-4). I'm kind of at a loss on how to do this.
I've tried to set up the initial loop as seen below, and I'm able to display each of my data frames correctly, just not limited to the columns I want. I would also like to figure out how to add a header to each, like a chart title in Excel, but that's a little less urgent at the moment.
Players = [MJ, KB, LJ, SC]
for player in Players:
    display(player)
Additional information is that each data-frame has 11 columns, each df is stored to the corresponding variable in the list above.
Check out this link:
https://pandas.pydata.org/pandas-docs/stable/user_guide/options.html
There is an option that you can set using:
import pandas as pd
pd.set_option("display.max_columns", 4)
After doing so you can use display(players) and it will render at most 4 columns (pandas truncates the rest with an ellipsis).
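Note that display.max_columns only caps how many columns are rendered; to keep exactly the first five columns (0-4), as the question asks, you can slice positionally with iloc instead. A small sketch with a made-up 11-column frame standing in for one of the player data frames:

```python
import pandas as pd

# Hypothetical stand-in for one player's 11-column data frame.
df = pd.DataFrame([list(range(11))], columns=[f"col{i}" for i in range(11)])

# Positional slice: all rows, columns 0 through 4 only.
first_five = df.iloc[:, :5]
print(list(first_five.columns))  # ['col0', 'col1', 'col2', 'col3', 'col4']
```

Inside the loop this would be display(player.iloc[:, :5]), leaving the global display options untouched.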

Why does formatting of a dataframe automatically get applied to another? [duplicate]

This question already has answers here:
why should I make a copy of a data frame in pandas
(8 answers)
Closed 4 years ago.
Sorry if this is a really dumb question. I'm a total noob at pandas and can't even figure out what keywords to use to search for a solution to my problem.
Basically, I have a numeric data frame,
numeric_df = pd.DataFrame({"colA": [1.23, 2.34, 3.45],
                           "colB": [1.00, 2.00, 3.00]})
Now I create a second df that duplicates the values of numeric_df:
formatted_df = numeric_df
Then I format the two columns in formatted_df according to my needs. I'm doing it this way because I want to keep the values in numeric_df as numbers, so I can operate on them later.
formatted_df["colA"] = formatted_df["colA"].map("${:}".format)
formatted_df["colB"] = formatted_df["colB"].map("{:}Years".format)
But now, if I view numeric_df, its columns have already been formatted and converted into strings. What is causing the problem? Why does my map call modify the original data frame?
Thank you in advance for any help you can give.
Using formatted_df = numeric_df means both variables reference the same object; no data is copied. To manipulate one independently you need a separate object, which pandas provides via copy():
formatted_df = numeric_df.copy()
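Putting it together with the data from the question, a copy keeps the numeric frame intact while the formatted one becomes strings:

```python
import pandas as pd

numeric_df = pd.DataFrame({"colA": [1.23, 2.34, 3.45],
                           "colB": [1.00, 2.00, 3.00]})

formatted_df = numeric_df.copy()  # independent object, not a second name
formatted_df["colA"] = formatted_df["colA"].map("${:}".format)

print(numeric_df["colA"].tolist())    # unchanged: [1.23, 2.34, 3.45]
print(formatted_df["colA"].tolist())  # ['$1.23', '$2.34', '$3.45']
```

With plain assignment instead of copy(), both prints would show the formatted strings, which is exactly the symptom in the question.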
