Getting started with Elasticsearch (ES), Kibana, and the ES Python client for data analysis
In this article, we will set up Elasticsearch and Kibana using Docker Compose. We will then create an index in Elasticsearch and import Elon Musk’s tweets using Pandas and the Python Elasticsearch client. Finally, we will analyze the tweets in Kibana.
Set up Elasticsearch and Kibana using Docker Compose
Ensure that the latest version of Docker is installed and running on your machine. Copy the Docker Compose file below and save it as docker-compose.yml.
Start the containers by running the command below from the directory that contains the docker-compose.yml file.
docker-compose up
To verify that the containers started successfully, navigate to http://localhost:9200 to check that Elasticsearch is running and to http://localhost:5601 to check that Kibana is running.
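The same check can be scripted instead of done in the browser. The sketch below uses only the Python standard library and assumes the two URLs above, with Elasticsearch security disabled (a secured cluster answers unauthenticated requests with 401, which this sketch reports as not reachable). The function names are illustrative.

```python
import urllib.request

# URLs taken from the article; adjust if your compose file maps other ports.
SERVICES = {
    "Elasticsearch": "http://localhost:9200",
    "Kibana": "http://localhost:5601",
}


def is_up(url, timeout=5, opener=urllib.request.urlopen):
    """Return True if an HTTP GET to `url` answers with a status below 400."""
    try:
        with opener(url, timeout=timeout) as response:
            return response.status < 400
    except OSError:
        # URLError and HTTPError are OSError subclasses, so this covers
        # refused connections, timeouts, and 4xx/5xx answers alike.
        return False


def check_services(services=SERVICES, probe=is_up):
    """Map each service name to True/False depending on whether it responded."""
    return {name: probe(url) for name, url in services.items()}
```

Calling check_services() returns a dictionary with one True/False entry per service. Kibana usually takes longer to start than Elasticsearch, so it may report False for the first minute or so.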
Gather the data to load to Elasticsearch
We will download all of Elon Musk’s tweets between 2010 and 2021 from Kaggle. Extract the contents of the ZIP file and store the TweetsElonMusk.csv file in a directory called ‘staging’.
Create the Elasticsearch index and load the data
We will use Python to create the index in Elasticsearch. Install the Python Elasticsearch client library (pip install elasticsearch) and then run the code below.
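As a minimal sketch of this step, the code below creates an index with an explicit mapping. It assumes the 8.x Python client (where indices.create accepts a mappings keyword) and a cluster without authentication. The index name elon_musk_tweets is hypothetical, and the three field names are assumptions based on the columns this article refers to later.

```python
INDEX_NAME = "elon_musk_tweets"  # hypothetical index name, not from the article

# Assumed field names, based on the columns the article refers to.
MAPPINGS = {
    "properties": {
        "date": {"type": "date"},            # used as the Kibana timestamp field
        "tweet": {"type": "text"},           # full-text searchable tweet body
        "likes_count": {"type": "integer"},  # numeric so it can be aggregated
    }
}


def create_index(client, index=INDEX_NAME, mappings=MAPPINGS):
    """Create the index unless it already exists; return True if it was created."""
    if client.indices.exists(index=index):
        return False
    client.indices.create(index=index, mappings=mappings)
    return True


# Usage (needs `pip install elasticsearch` and a running cluster):
#   from elasticsearch import Elasticsearch
#   client = Elasticsearch("http://localhost:9200")
#   create_index(client)
```

Declaring likes_count as a number and date as a date up front avoids relying on dynamic mapping, which matters later when Kibana sorts and aggregates on these fields.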
Once the index is created successfully, we can import the data to Elasticsearch using the Python client.
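One way to sketch the import is to read the CSV with Pandas and send the rows through the client's bulk helper. The column names (date, tweet, likes_count) and the index name are assumptions, as is the choice to keep only those three columns. This is not necessarily how the original code loaded the data.

```python
INDEX_NAME = "elon_musk_tweets"          # hypothetical index name
CSV_PATH = "staging/TweetsElonMusk.csv"  # file location described in the article
COLUMNS = ["date", "tweet", "likes_count"]  # assumed column names


def read_tweets(csv_path=CSV_PATH, columns=COLUMNS):
    """Load the CSV with Pandas and return a list of plain dict rows."""
    import pandas as pd  # imported lazily so to_actions() works without Pandas

    frame = pd.read_csv(csv_path, usecols=columns)
    # Normalise dates to ISO 8601 strings, which the default `date` mapping accepts.
    frame["date"] = pd.to_datetime(frame["date"]).dt.strftime("%Y-%m-%dT%H:%M:%S")
    return frame.to_dict(orient="records")


def to_actions(rows, index=INDEX_NAME):
    """Turn dict rows into bulk actions, dropping NaN values (NaN != NaN)."""
    for row in rows:
        source = {key: value for key, value in row.items() if value == value}
        yield {"_index": index, "_source": source}


def load_tweets(client, rows, index=INDEX_NAME):
    """Bulk-index the rows; return (number indexed, list of per-document errors)."""
    from elasticsearch import helpers

    return helpers.bulk(client, to_actions(rows, index), raise_on_error=False)


# Usage (needs pandas, elasticsearch, and a running cluster):
#   from elasticsearch import Elasticsearch
#   client = Elasticsearch("http://localhost:9200")
#   indexed, errors = load_tweets(client, read_tweets())
```

Sending the documents in bulk is much faster than indexing them one at a time, and raise_on_error=False lets the load continue past individual bad rows so they can be inspected afterwards.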
Visualize and analyze Tweets using Kibana
A new data view needs to be created in Kibana with an index pattern that matches the index created previously. We will also set the timestamp field to the date field so that we can view the query results based on the date of the Tweet. Note that it can take a while before the approximately 12,000 tweets imported earlier show up in Kibana results.
Once the data view is set up, we can select it, set the time range as required, and then search for some text. The screenshot below shows all Tweets from Elon Musk containing the text “Mars” between 2015 and 2020.
Based on the distribution of Tweets over time, the number of Tweets containing the “Mars” keyword was significantly higher in 2020 than in prior years.
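The search and the per-year distribution can also be reproduced outside Kibana. The sketch below is a rough equivalent using the Python client's search call with a date_histogram aggregation. It assumes the 8.x client, a hypothetical index name, and the field names tweet and date. Kibana builds its own query internally, so this is an approximation rather than the exact request it sends.

```python
INDEX_NAME = "elon_musk_tweets"  # hypothetical index name


def tweets_per_year(client, keyword, start, end, index=INDEX_NAME):
    """Count tweets matching `keyword` per calendar year in [start, end)."""
    response = client.search(
        index=index,
        size=0,  # only the aggregation is needed, not the matching documents
        query={
            "bool": {
                "must": [{"match": {"tweet": keyword}}],
                "filter": [{"range": {"date": {"gte": start, "lt": end}}}],
            }
        },
        aggs={
            "per_year": {
                "date_histogram": {
                    "field": "date",
                    "calendar_interval": "year",
                    "format": "yyyy",  # makes key_as_string a plain year
                }
            }
        },
    )
    buckets = response["aggregations"]["per_year"]["buckets"]
    return {bucket["key_as_string"]: bucket["doc_count"] for bucket in buckets}


# Usage (needs a running cluster with the tweets loaded):
#   tweets_per_year(client, "Mars", "2015-01-01T00:00:00", "2021-01-01T00:00:00")
```

The end of the range is exclusive and given as a full timestamp, which avoids any ambiguity about whether the last day is included.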
Next, we will investigate the number of likes received by Mr. Musk’s tweets that contain the word “Tesla” between 2008 and 2020. The screenshot below shows the month on the X-axis and the median number of likes for that month on the Y-axis.
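For readers who prefer code to charts, the monthly median can be approximated with a date_histogram aggregation that nests a percentiles aggregation at the 50th percentile. The sketch below assumes the 8.x Python client, a hypothetical index name, and the field names tweet, date, and likes_count. Note that Elasticsearch computes percentiles approximately, so values may differ slightly from an exact median.

```python
INDEX_NAME = "elon_musk_tweets"  # hypothetical index name


def median_likes_per_month(client, keyword, start, end, index=INDEX_NAME):
    """Return {"yyyy-MM": median likes} for tweets matching `keyword`."""
    response = client.search(
        index=index,
        size=0,  # aggregation results only
        query={
            "bool": {
                "must": [{"match": {"tweet": keyword}}],
                "filter": [{"range": {"date": {"gte": start, "lt": end}}}],
            }
        },
        aggs={
            "per_month": {
                "date_histogram": {
                    "field": "date",
                    "calendar_interval": "month",
                    "format": "yyyy-MM",
                },
                # The 50th percentile is the median.
                "aggs": {
                    "median_likes": {
                        "percentiles": {"field": "likes_count", "percents": [50]}
                    }
                },
            }
        },
    )
    buckets = response["aggregations"]["per_month"]["buckets"]
    return {b["key_as_string"]: b["median_likes"]["values"]["50.0"] for b in buckets}


def peak_month(medians):
    """Return the month with the highest median, ignoring empty months (None)."""
    populated = {month: value for month, value in medians.items() if value is not None}
    return max(populated, key=populated.get) if populated else None
```

Passing the result of median_likes_per_month to peak_month gives the month that corresponds to the tallest bar in the chart.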
Hovering over the tallest bar in the bar chart shows a huge spike in likes in November 2017. To find the tweets that contributed to it, we narrow the time range to November 2017. The tweets below are from November 2017 and contain the “tesla” keyword.
We can check the “likes_count” for each of the 5 tweets and see that a Tweet on 17 November 2017 has 100K+ likes.
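The same drill-down can be sketched as a query sorted by likes. The code below assumes the 8.x Python client, a hypothetical index name, the field names tweet, date, and likes_count, and a numeric mapping for likes_count (sorting on it fails if it was indexed as text).

```python
INDEX_NAME = "elon_musk_tweets"  # hypothetical index name


def top_liked_tweets(client, keyword, start, end, size=5, index=INDEX_NAME):
    """Return the `size` most-liked tweets matching `keyword` in [start, end)."""
    response = client.search(
        index=index,
        size=size,
        query={
            "bool": {
                "must": [{"match": {"tweet": keyword}}],
                "filter": [{"range": {"date": {"gte": start, "lt": end}}}],
            }
        },
        # Highest like counts first.
        sort=[{"likes_count": {"order": "desc"}}],
    )
    return [hit["_source"] for hit in response["hits"]["hits"]]


# Usage (needs a running cluster with the tweets loaded):
#   top_liked_tweets(client, "tesla", "2017-11-01T00:00:00", "2017-12-01T00:00:00")
```

The first element of the returned list is the tweet with the most likes in the chosen window.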
We can navigate to the Tweet details using the link and see that it was about unveiling the “Tesla Semi”, which is probably why it generated so much interest and so many likes.
This concludes the brief introduction to Elasticsearch, Kibana, and the Python Elasticsearch client. To stop the containers and remove the containers, networks, volumes, and images, run the command below from the same location where we ran the previous docker-compose command:
docker-compose down --volumes --rmi all
This analysis of the Tweets using Kibana just scratches the surface, and I am sure more interesting insights can be obtained by analyzing the Tweets further. Happy investigating using Kibana!