Posts for: #Logstash

Elasticsearch Performance and Tuning

A dedicated performance course run by Matt Gregory from Elastic, an absolute legend with deep Elasticsearch expertise. Contents Cool takeaways Tuning for Index Speed Increase the refresh interval Index architecting Bulk Hardware settings to improve performance Disable swapping Indexing Buffer size Best practices and scaling Disable replicas for initial loads Use auto-generated IDs Use Cross Cluster Replication Thread Pools Memory Locking Transforms Tuning for search API settings and data modelling to improve search performance Search as few fields as possible One big copy_to field as opposed to individual text multi field Consider mapping identifiers as keywords Document modeling Consider mapping numeric fields as keyword Hardware settings to improve search Warm Up Global Ordinals Warm up filesystem Cache Use index sorting to speed up search Ways to improve searches must and should clauses filter and must not clauses node query cache shard request cache Aggregation performance Search rounded dates Force merge read only indices Search profiler and Explain API Search profiler Search profiler API ID Query section Timing breakdown Collection section Collectors reasons Rewrite section Explain and Tasks API Explain API Score Field length normalization and coordination Other Query Parameters API Settings to improve indexing performance Hardware settings to improve performance Best Practices and scaling Transforms Cool takeaways Increase the refresh_interval from default 1s to something higher, like 10s.
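As a rough illustration of the refresh_interval takeaway, the sketch below builds the request for Elasticsearch's update index settings API (`PUT /<index>/_settings`). The index name `my-index` and the helper name `refresh_interval_request` are assumptions made for this example, not names from the course; the sketch only constructs the request and does not send it.

```python
import json


def refresh_interval_request(index_name: str, interval: str = "10s") -> tuple[str, str, str]:
    """Return the HTTP method, path and JSON body for a refresh_interval update.

    Hypothetical helper: it only builds the request so it can be inspected
    or passed to whichever HTTP client you already use.
    """
    # The setting lives under the "index" namespace of the settings body.
    body = {"index": {"refresh_interval": interval}}
    return "PUT", f"/{index_name}/_settings", json.dumps(body)


# "my-index" is a placeholder index name.
method, path, body = refresh_interval_request("my-index")
print(method, path)
print(body)
```

A longer interval means newly indexed documents take longer to become searchable, so the value is a trade-off between indexing throughput and search freshness.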
Read more →

Elasticsearch Engineer 8.1

Revised 2024 edition based on Elasticsearch 8.1. I recently had the opportunity to attend the latest revision of the 4-day Elasticsearch engineer course, which I did in-person about 5 years ago in Sydney. Elasticsearch has often been an integral part of the data solutions I’ve been involved with and I’m quite fond of it. This time round the course only runs in a virtual classroom format (using strigo.io) with our awesome trainers Krishna Shah and Kiju Kim.
Read more →

Black belt Elasticsearch

Some more advanced Elasticsearch wisdom I gleaned from Jason Wong and Mark Laney from Elastic. Contents Environment with Config X-Pack Security (the 1337 way) Roles Built-in Query Web UI (batteries included) Internals Lucene Segments Elasticsearch Indexing Transaction Log and Flushing Doc Values Caching Field Modelling Typing Denormalising Range Types Mapping Parameters Fixing Data Painless Reindexing APIs Picking up Mapping Changes Multi-fields Custom Marker (flag) Field Fixing Fields Advanced Search and Aggregations Patterns Wildcard Query Regexp Query Null Script (painless) Query Script Field Performance Considerations Search Templates Aggregations Percentile Top Hits Scripted (painless) Aggregations Significant Terms Aggregation Pipeline Aggregations Cluster Management Dedicated Nodes Hot Warm Architecture Tags Verify Shard Allocation Forced Awareness Capacity Planning Shard Allocation Litmus Test Primary Shards Scaling with Indices Scaling with Replicas Resources Time Based Data APIs for Managing Indices Document Modelling Nested Objects Nested Aggregations Parent Child Relationships Argh Which Technique is Best?
Read more →

Elasticsearch Basics

Some Elasticsearch wisdom I gleaned from Jason Wong and Mark Laney from Elastic. Contents Use cases Logstash vs Beats? Time Series vs Static Data Logstash Installation Starting and Stopping Elasticsearch Killing Communication Discovery module (networking) Security Read-only Enabling X-Pack (Elasticsearch Security) CRUD Ingestion Reading Search Query and Filter Contexts Mapping Inverted Index Multi Fields (keyword fields) Anatomy of an Analyzer Custom Analyzer The reindex API Node Types Cluster state Shards Anatomy of Search (Shards) Troubleshooting Configuration Responses Cluster and Shard Health Diagnosing Issues Improving Search Results Multi-field Search Boosting Fuzziness Exact Terms Sorting Paging Highlighting Aggregations Best Practices Index Aliases Index Templates Scroll Search Cluster Backup Use cases Search Logging Metrics - unlike logs, are typically not in a text format.
Read more →

Logstash

A quick walkthrough of Logstash, the ETL engine offered by the Elastic Stack. Logstash is an open source, server-side data processing pipeline that ingests data from a multitude of sources simultaneously, transforms it, and then sends it to your favorite stash. Logstash gained its initial popularity with log and metric collection, such as log4j logs, Apache web logs and syslog. Its application has since broadened to all kinds of data sources, like large scale event streams, webhooks, database and message queue integration.
Read more →