Dataset can't be loaded.  #1

@guillaume-chevalier

I've tried to download the dataset, but it seems to be unavailable.
I started from your recent article: https://ahmedbesbes.com/overview-and-benchmark-of-traditional-and-deep-learning-models-in-text-classification.html
That led me to this: http://thinknook.com/twitter-sentiment-analysis-training-corpus-dataset-2012-09-22/
And then to this: http://www.sananalytics.com/lab/twitter-sentiment/
However, the last link (sananalytics.com) doesn't load at all.

Alternatively, I tried to get the data from your previous blog post:
https://ahmedbesbes.com/sentiment-analysis-on-twitter-using-word2vec-and-keras.html
I downloaded the dataset from the Google Drive link, but the file seems to be malformed. First, I copied your def ingest(): function and ran it. At first the file didn't load at all: I had to change the encoding to latin-1. Then I realized the dataset has no column names, because I got this error: ValueError: labels ['ItemID' 'SentimentSource'] not contained in axis. It was raised on this line: data.drop(['ItemID', 'SentimentSource'], axis=1, inplace=True).

I wonder how I could reproduce your experiments, or at least use the same data for a quick comparison. I haven't tried anything beyond what I've described above. I guess adding the column names manually might do it, but I suspect other things wouldn't work as expected further down the road either. It'd be very cool if you could provide an easy data loading pipeline.
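
For reference, here is a minimal sketch of the manual workaround I have in mind: read the file as latin-1 and supply the column names myself when the file has no header row. This is not the blog post's code. The function name load_tweets is hypothetical, and the column list is an assumption: only 'ItemID' and 'SentimentSource' appear in the error message above, so 'Sentiment' and 'SentimentText' (and the column order) are guesses that would need checking against the actual file.

```python
import io

import pandas as pd

# ASSUMPTION: only 'ItemID' and 'SentimentSource' are confirmed by the error
# message; the other two names and the column order are guesses.
ASSUMED_COLUMNS = ["ItemID", "Sentiment", "SentimentSource", "SentimentText"]


def load_tweets(source, columns=ASSUMED_COLUMNS, encoding="latin-1"):
    """Read the CSV, naming the columns manually (hypothetical helper)."""
    # header=None + names=... assigns our own column names instead of
    # treating the first data row as a header. dtype=str keeps every value
    # as text, so numeric columns still need converting afterwards.
    data = pd.read_csv(
        source, header=None, names=columns, encoding=encoding, dtype=str
    )
    # If the file did contain a header row after all, it was read as data:
    # detect that case and drop it.
    if not data.empty and list(data.iloc[0]) == list(columns):
        data = data.iloc[1:].reset_index(drop=True)
    # errors="ignore" avoids the "not contained in axis" failure if one of
    # these columns is missing.
    return data.drop(columns=["ItemID", "SentimentSource"], errors="ignore")


# Tiny in-memory stand-in for the real file (made-up rows, latin-1 encoded).
sample = "1,0,SourceA,caf\xe9 is closed\n2,1,SourceA,great day\n".encode("latin-1")
tweets = load_tweets(io.BytesIO(sample))
print(tweets)
```

With the real file, the call would take the path to the downloaded CSV instead of the in-memory sample. Whether the later steps of the pipeline then work unchanged is exactly what I'm unsure about.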

Thanks!
