What is Logstash?
Logstash is a powerful, open-source data processing pipeline that ingests data from multiple sources simultaneously, transforms it, and sends it to your favorite “stash” like Elasticsearch. Part of the Elastic Stack (formerly ELK Stack), Logstash excels at collecting, parsing, and transforming logs and events from various sources into a common format.
On Linux systems, Logstash serves as the central hub for data processing, capable of handling everything from simple log forwarding to complex data enrichment and transformation tasks. It’s designed to handle data from any source, in any format, with over 200 plugins available for different inputs, filters, and outputs.
Installing Logstash on Linux
Installation via Package Manager (Recommended)
The most straightforward way to install Logstash on Linux is through the official Elastic repository:
# Download the Elastic GPG key into a dedicated keyring (apt-key is deprecated)
wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elasticsearch-keyring.gpg
# Add repository to sources list, referencing the keyring
echo "deb [signed-by=/usr/share/keyrings/elasticsearch-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee /etc/apt/sources.list.d/elastic-8.x.list
# Update package list and install
sudo apt update
sudo apt install logstash
For Red Hat-based systems (CentOS, RHEL, Fedora):
# Add Elastic repository
sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch
# Create repository file
cat << EOF | sudo tee /etc/yum.repos.d/elastic.repo
[elastic-8.x]
name=Elastic repository for 8.x packages
baseurl=https://artifacts.elastic.co/packages/8.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md
EOF
# Install Logstash (on newer releases, dnf can be used in place of yum)
sudo yum install logstash
Manual Installation
For manual installation, download the appropriate package:
# Download Logstash (replace version as needed)
wget https://artifacts.elastic.co/downloads/logstash/logstash-8.11.0-linux-x86_64.tar.gz
# Extract archive
tar -xzf logstash-8.11.0-linux-x86_64.tar.gz
# Move to desired location
sudo mv logstash-8.11.0 /opt/logstash
# Create symlink for easier access
sudo ln -s /opt/logstash/bin/logstash /usr/local/bin/logstash
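After installation, a quick smoke test confirms the binary runs. The -e flag accepts a pipeline definition directly on the command line; this sketch assumes the symlink created above (otherwise use the full path):
# Print the installed version
logstash --version
# Run a throwaway pipeline: type a line, see it echoed back as a structured event, Ctrl+C to stop
logstash -e 'input { stdin { } } output { stdout { codec => rubydebug } }'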
Logstash Configuration Fundamentals
Logstash configurations follow a simple three-section structure: input, filter, and output. Each section defines how data flows through the pipeline.
Basic Configuration Structure
input {
  # Define data sources
}
filter {
  # Transform and enrich data
}
output {
  # Send processed data to destinations
}
Configuration File Location
Logstash configuration files are typically stored in:
- /etc/logstash/conf.d/ – Main configuration directory (pipeline .conf files)
- /etc/logstash/logstash.yml – Main settings file
- /etc/logstash/pipelines.yml – Pipeline definitions
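For example, a minimal pipeline can be dropped into the configuration directory and the service will load it along with every other .conf file found there; the filename 01-example.conf and the watched path are purely illustrative:
cat << 'EOF' | sudo tee /etc/logstash/conf.d/01-example.conf
input {
  file {
    path => "/var/log/syslog"
  }
}
output {
  stdout { codec => rubydebug }
}
EOF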
Essential Logstash Commands
Starting and Managing Logstash
# Start Logstash service
sudo systemctl start logstash
# Enable auto-start on boot
sudo systemctl enable logstash
# Check service status
sudo systemctl status logstash
# Stop Logstash
sudo systemctl stop logstash
# Restart Logstash
sudo systemctl restart logstash
Running Logstash with Custom Configuration
# Run with specific configuration file
/usr/share/logstash/bin/logstash -f /path/to/config.conf
# Test configuration syntax
/usr/share/logstash/bin/logstash -f /path/to/config.conf --config.test_and_exit
# Run in debug mode
/usr/share/logstash/bin/logstash -f /path/to/config.conf --log.level debug
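During development it is often convenient to let Logstash reload the pipeline whenever the configuration changes instead of restarting it by hand:
# Reload the pipeline automatically when the config file changes
/usr/share/logstash/bin/logstash -f /path/to/config.conf --config.reload.automatic
# Optionally change how often Logstash checks for changes (default is 3s)
/usr/share/logstash/bin/logstash -f /path/to/config.conf --config.reload.automatic --config.reload.interval 5s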
Input Plugins and Configuration
File Input Plugin
The file input plugin is one of the most commonly used inputs for reading log files:
input {
  file {
    path => "/var/log/apache2/access.log"
    start_position => "beginning"
    sincedb_path => "/dev/null"
    codec => "plain"
  }
}
Expected Output: Logstash will continuously monitor the specified log file and process new entries as they’re written. Because sincedb_path is set to /dev/null, the read position is not remembered between restarts and the file is re-read from the beginning each time, which is convenient for testing but usually undesirable in production.
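The path option also accepts arrays and glob patterns, so a single input can watch several files; a small sketch with illustrative paths:
input {
  file {
    path => [ "/var/log/apache2/*.log", "/var/log/myapp/app-*.log" ]
    exclude => "*.gz"
    start_position => "beginning"
  }
}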
Beats Input Plugin
For receiving data from Beats (Filebeat, Metricbeat, etc.):
input {
  beats {
    port => 5044
    host => "0.0.0.0"
  }
}
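Once the pipeline is running, a quick way to confirm the Beats listener is up is to check the port from the shell (5044 as configured above):
# Confirm Logstash is listening on the Beats port
sudo ss -tlnp | grep 5044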
Syslog Input Plugin
To receive syslog messages over the network (binding to the standard port 514 requires root privileges, so an unprivileged port such as 5514 is often used instead):
input {
  syslog {
    port => 514
    type => "syslog"
  }
}
TCP Input Plugin
For receiving data over TCP connections:
input {
  tcp {
    port => 9999
    codec => json_lines
  }
}
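With the json_lines codec, each newline-delimited JSON object becomes one event, which makes the input easy to exercise from the shell; a quick test using netcat (the field names are arbitrary):
# Send a single JSON event to the TCP input
echo '{"user":"alice","action":"login"}' | nc localhost 9999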
Filter Plugins for Data Processing
Grok Filter
Grok is the primary filter for parsing unstructured log data into structured data:
filter {
  grok {
    match => { "message" => "%{COMMONAPACHELOG}" }
  }
}
Custom grok patterns for specific log formats; overwrite is needed here because the pattern writes its last capture back into the existing message field:
filter {
  grok {
    match => {
      "message" => "%{TIMESTAMP_ISO8601:timestamp} %{LOGLEVEL:level} %{GREEDYDATA:message}"
    }
    overwrite => [ "message" ]
  }
}
Date Filter
Parse timestamps and set the @timestamp field:
filter {
  date {
    match => [ "timestamp", "yyyy-MM-dd HH:mm:ss" ]
    target => "@timestamp"
  }
}
Mutate Filter
Rename, remove, add, or convert fields:
filter {
  mutate {
    rename => { "old_field" => "new_field" }
    remove_field => [ "unwanted_field" ]
    add_field => { "environment" => "production" }
    convert => { "response_time" => "integer" }
  }
}
Conditional Processing
Apply filters conditionally based on field values:
filter {
  if [type] == "apache" {
    grok {
      match => { "message" => "%{COMMONAPACHELOG}" }
    }
  } else if [type] == "nginx" {
    grok {
      match => { "message" => "%{NGINXACCESS}" }
    }
  }
}
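When a grok pattern fails to match, the event is tagged with _grokparsefailure, and the same conditional syntax can flag or route those events for review; a minimal sketch:
filter {
  if "_grokparsefailure" in [tags] {
    mutate {
      add_tag => [ "needs_review" ]
    }
  }
}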
Output Plugins and Destinations
Elasticsearch Output
Send processed data to Elasticsearch (the legacy document_type option is no longer supported in Elasticsearch 8.x and should be omitted):
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "logstash-%{+YYYY.MM.dd}"
  }
}
File Output
Write processed data to files:
output {
  file {
    path => "/var/log/logstash/processed-%{+YYYY-MM-dd}.log"
    codec => line { format => "%{timestamp} %{level} %{message}" }
  }
}
Stdout Output (for Testing)
Display output in the console for debugging:
output {
  stdout {
    codec => rubydebug
  }
}
Real-World Configuration Examples
Apache Log Processing Pipeline
Complete configuration for processing Apache access logs:
input {
  file {
    path => "/var/log/apache2/access.log"
    start_position => "beginning"
    sincedb_path => "/var/lib/logstash/sincedb_apache"
    type => "apache"
  }
}
filter {
  if [type] == "apache" {
    grok {
      match => { "message" => "%{COMMONAPACHELOG}" }
    }
    date {
      match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
    }
    if [response] {
      mutate {
        convert => { "response" => "integer" }
      }
    }
    if [bytes] {
      mutate {
        convert => { "bytes" => "integer" }
      }
    }
    mutate {
      remove_field => [ "timestamp", "message" ]
    }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "apache-logs-%{+YYYY.MM.dd}"
  }
  stdout {
    codec => rubydebug
  }
}
Multi-Input Pipeline
Configuration handling multiple log sources (note that NGINXACCESS and NGINXERROR are not part of the default grok pattern set, so they must be supplied as custom patterns):
input {
  file {
    path => "/var/log/nginx/access.log"
    type => "nginx"
    tags => ["nginx", "access"]
  }
  file {
    path => "/var/log/nginx/error.log"
    type => "nginx"
    tags => ["nginx", "error"]
  }
  file {
    path => "/var/log/syslog"
    type => "syslog"
    tags => ["system"]
  }
}
filter {
  if "nginx" in [tags] and "access" in [tags] {
    grok {
      match => { "message" => "%{NGINXACCESS}" }
    }
  } else if "nginx" in [tags] and "error" in [tags] {
    grok {
      match => { "message" => "%{NGINXERROR}" }
    }
  } else if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:timestamp} %{IPORHOST:host} %{PROG:program}: %{GREEDYDATA:message}" }
      overwrite => [ "message" ]
    }
    # these formats only match syslog timestamps, so apply the date filter in this branch
    date {
      match => [ "timestamp", "MMM dd HH:mm:ss", "MMM  d HH:mm:ss" ]
    }
  }
}
output {
  if [type] == "nginx" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "nginx-%{+YYYY.MM.dd}"
    }
  } else if [type] == "syslog" {
    elasticsearch {
      hosts => ["localhost:9200"]
      index => "syslog-%{+YYYY.MM.dd}"
    }
  }
}
Performance Tuning and Optimization
Pipeline Configuration
Optimize Logstash performance by tuning pipeline settings in /etc/logstash/logstash.yml:
# Pipeline workers (usually CPU cores)
pipeline.workers: 4
# Batch size for processing
pipeline.batch.size: 1000
# Batch delay
pipeline.batch.delay: 50
# Enable persistent queues
queue.type: persisted
# Maximum on-disk size of the persistent queue
queue.max_bytes: 1gb
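As a rule of thumb, up to pipeline.workers × pipeline.batch.size events are in flight at once, so with the settings above that is 4 × 1000 = 4000 events held in memory per cycle; larger batches generally improve throughput at the cost of heap usage.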
JVM Settings
Configure JVM heap size in /etc/logstash/jvm.options:
# Set heap size (keep Xms and Xmx equal; 4-8 GB is a typical range, and the heap must leave headroom for the OS and other processes)
-Xms2g
-Xmx2g
# Garbage collection settings
-XX:+UseG1GC
-XX:MaxGCPauseMillis=200
Monitoring and Troubleshooting
Monitoring APIs
Logstash provides APIs for monitoring:
# Check node info
curl -XGET "localhost:9600/_node?pretty"
# Check pipeline stats
curl -XGET "localhost:9600/_node/stats/pipelines?pretty"
# Check hot threads
curl -XGET "localhost:9600/_node/hot_threads?pretty"
Log File Monitoring
Monitor Logstash logs for troubleshooting:
# View Logstash logs
sudo tail -f /var/log/logstash/logstash-plain.log
# Check for errors
sudo grep -i error /var/log/logstash/logstash-plain.log
# Monitor with journalctl
sudo journalctl -u logstash -f
Common Troubleshooting Commands
# Test configuration syntax (-t is shorthand for --config.test_and_exit)
sudo -u logstash /usr/share/logstash/bin/logstash --path.settings /etc/logstash -t
# Same check using the long-form option
sudo /usr/share/logstash/bin/logstash --path.settings /etc/logstash --config.test_and_exit
# Run in debug mode
sudo -u logstash /usr/share/logstash/bin/logstash --path.settings /etc/logstash --log.level debug
Security Considerations
File Permissions
Ensure proper file permissions for security:
# Set ownership for Logstash files
sudo chown -R logstash:logstash /etc/logstash/
sudo chown -R logstash:logstash /var/log/logstash/
sudo chown -R logstash:logstash /var/lib/logstash/
# Set proper permissions
sudo chmod 640 /etc/logstash/conf.d/*.conf
sudo chmod 600 /etc/logstash/logstash.yml
Network Security
Configure secure communication with Elasticsearch:
output {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    user => "logstash_writer"
    password => "secure_password"
    ssl => true
    ssl_certificate_verification => true
    cacert => "/path/to/ca.crt"
  }
}
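Hard-coding credentials in pipeline files is best avoided. Logstash ships with a keystore tool, and secrets stored there can be referenced with ${NAME} syntax in configurations; the key name ES_PWD below is only an example:
# Create a keystore alongside the Logstash settings
sudo /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash create
# Store the Elasticsearch password under the key ES_PWD (you will be prompted for the value)
sudo /usr/share/logstash/bin/logstash-keystore --path.settings /etc/logstash add ES_PWD
The output block can then use password => "${ES_PWD}" instead of a plaintext value.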
Integration with Elastic Stack
Filebeat to Logstash
Configure Filebeat to send data to Logstash:
# Filebeat configuration (filebeat.yml)
output.logstash:
  hosts: ["localhost:5044"]
On the Logstash side, the matching beats input listens on the same port:
input {
  beats {
    port => 5044
  }
}
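A slightly fuller Filebeat sketch, assuming Filebeat 8.x and an illustrative application log path:
# /etc/filebeat/filebeat.yml (sketch)
filebeat.inputs:
  - type: filestream
    id: app-logs
    paths:
      - /var/log/myapp/*.log
output.logstash:
  hosts: ["localhost:5044"]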
Logstash to Kibana
Data processed by Logstash and stored in Elasticsearch becomes available in Kibana for visualization and analysis once you create a data view (formerly called an index pattern) that matches the index name, for example logstash-* or apache-logs-*.
Advanced Features
Multiple Pipelines
Configure multiple pipelines in /etc/logstash/pipelines.yml:
- pipeline.id: apache
  path.config: "/etc/logstash/conf.d/apache.conf"
  pipeline.workers: 2
- pipeline.id: nginx
  path.config: "/etc/logstash/conf.d/nginx.conf"
  pipeline.workers: 2
Dead Letter Queue
Events rejected by the Elasticsearch output (for example, because of mapping conflicts) can be captured in the dead letter queue and reprocessed later:
# Enable in logstash.yml
dead_letter_queue.enable: true
dead_letter_queue.max_bytes: 1gb
# Process dead letter queue
input {
  dead_letter_queue {
    path => "/var/lib/logstash/dead_letter_queue"
    pipeline_id => "main"
  }
}
Best Practices
- Configuration Management: Use version control for configuration files
- Resource Monitoring: Monitor CPU, memory, and disk usage regularly
- Field Naming: Use consistent field naming conventions
- Error Handling: Implement proper error handling and logging
- Testing: Test configurations in development environments first
- Documentation: Document custom grok patterns and configurations
- Security: Regularly update Logstash and secure network communications
Logstash on Linux provides a robust foundation for building scalable data processing pipelines. By following these examples and best practices, you can create efficient log processing systems that handle large volumes of data while maintaining performance and reliability. Regular monitoring and optimization ensure your Logstash deployment continues to meet your data processing requirements as they evolve.