Ingest Data From Postgresql to Elasticsearch using Logstash

Ingest Data From Postgresql to Elasticsearch using Logstash
Postgresql to Elasticsearch

Table of Contents

  1. Introduction
  2. Steps to Ingest Data from PostgreSQL to Elasticsearch using Logstash in Docker

    2.1. Set up Docker (Here)

    2.2. Setup ElasticSearch, Logstash and Kibana using Docker

    2.3. Setup Postgresql using Docker

    2.4. Create Logstash.conf

    2.5 Start Docker Containers

    2.6. Verify data ingestion

Introduction

In this tutorial, we'll show you how to use Logstash in a Docker environment to ingest data from PostgreSQL to Elasticsearch. You may gather, transform, and store data from numerous sources with Logstash, a potent data processing pipeline. Elasticsearch is a distributed search and analytics engine, and PostgreSQL is a well-known open-source relational database.

You may quickly create a containerized environment to speed up the data ingesting process by combining Logstash and Docker. This technique guarantees a smooth data transfer from PostgreSQL to Elasticsearch and provides flexibility, scalability, and isolation for running Logstash.

Let's now explore the procedures needed to complete this task.

Steps to Ingest Data from PostgreSQL to Elasticsearch using Logstash in Docker

2.1. Set up Docker (Here)

Before proceeding, ensure that Docker is installed and running on your system. Docker provides a convenient way to manage and deploy containers.

2.2. Setup ElasticSearch, Logstash and Kibana using Docker

You folder structure should look something like this :

Postgresql to Elasticsearch

To setup ELK create a new file inside elk folder named docker-compose.yml and paste the following content: (Check the folder structure)


version: "3.8"
services:
  Elasticsearch:
    image: elasticsearch:8.8.0
    container_name: elasticsearch
    restart: always
    volumes:
      - elastic_data:/usr/share/elasticsearch/data/
    environment:
      ES_JAVA_OPTS: "-Xmx256m -Xms256m"
      discovery.type: single-node
      xpack.security.enabled: false
      xpack.security.enrollment.enabled: false
    ports:
      - "9200:9200"
      - "9300:9300"
    networks:
      - blog-builder-backend

  Logstash:
    image: logstash:8.8.0
    container_name: logstash
    restart: always
    volumes:
      - ./logstash/:/logstash_dir
    command: logstash -f /logstash_dir/logstash.conf
    depends_on:
      - Elasticsearch
    ports:
      - "9601:9600"
    environment:
      LS_JAVA_OPTS: "-Xmx256m -Xms256m"
    networks:
      - blog-builder-backend

  Kibana:
    image: kibana:8.8.0
    container_name: kibana
    restart: always
    ports:
      - "5601:5601"
    environment:
      - ELASTICSEARCH_URL=http://elasticsearch:9200
    depends_on:
      - Elasticsearch
    networks:
      - blog-builder-backend
volumes:
  elastic_data: {}

networks:
  blog-builder-backend:
    driver: bridge
    

2.2. Setup Postgres using Docker

To setup Postgresql and manage data within the database we need two docker images Postgresql and Pgadmin

Now lets create a postgres folder inside our project directory and create a file named docker-compose.yml and paste the following content: (Check the folder structure)


version: "3.8"
services:
  postgres:
    image: postgres
    container_name: postgres
    hostname: postgres
    environment:
      POSTGRES_DB: blogService
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    ports:
      - 5432:5432
    volumes:
      - blog-service-db:/var/lib/postgresql/data
    networks:
      - elk_blog-builder-backend
  pgadmin:
    image: dpage/pgadmin4
    container_name: pgadmin
    restart: always
    depends_on:
      - postgres
    ports:
      - "5050:80"
    environment:
      PGADMIN_DEFAULT_EMAIL: admin@admin.com
      PGADMIN_DEFAULT_PASSWORD: admin
    # volumes:
    #   - user-service-pgadmin:/var/lib/pgadmin
    networks:
      - elk_blog-builder-backend

volumes:
  blog-service-db:

networks:
  elk_blog-builder-backend:
    external: true


2.4. Create Logstash.conf

To configure Logstash for ingesting data from PostgreSQL to Elasticsearch, you need to create a Logstash configuration file. This file defines the input source, output destination, and any necessary transformations.

    2.4.1 Create a folder inside elk directory named logstash

    2.4.2 Save the logstash.conf file inside the logstash folder that we created just now (Check the folder structure)

    2.4.3 Now paste the following code inside logstash.conf

    
    input {
        jdbc {
            jdbc_connection_string => "jdbc:postgresql://postgres:5432/blogService"
            jdbc_user => "postgres"
            jdbc_password => "postgres"
            jdbc_driver_library => "/logstash_dir/postgresql-42.6.0.jar"
            jdbc_driver_class => "org.postgresql.Driver"
            schedule => "* * * * *"
            statement => "select * from blogdb where updated_at >= :sql_last_value order by updated_at asc" 
        }  
    }
    
    output {
      stdout {
        codec => rubydebug
      }
    }
    
    
    output {
        elasticsearch {
            hosts => ["elasticsearch:9200"]
            index => "blogs"
            document_id => "%{[id]}"
        }
    }
        

    2.4.4 Now download the latest postgresql jdbc jar plugin here and copy it to the logstash directory

2.5. Start Docker containers

Go to the elk folder and run

docker-compose up

Similarly go to the postgres folder and run

docker-compose up

Wait till all containers are up and running

2.6. Verify data ingestion

After starting the Logstash container, check its logs to ensure there are no errors. Run the following command to view the container logs:

docker logs logstash

Once the Logstash container is running without any issues,go to the Pgdmin and create a database and a table with name specified in logstash.conf file (currently its blogService and blogdb respectively ). Each time you create or update or delete to the table in postgres database table you will see an updated logs in logstash and then access your Elasticsearch instance and verify that the data has been successfully ingested into the specified index(in current config its named as blog).

Congratulations! You have now successfully set up Logstash in Docker to ingest data from PostgreSQL to Elasticsearch. This process allows you to efficiently transfer and store data, enabling powerful search and analytics capabilities using Elasticsearch.

Comments