It is a lossless, text-based log compressor and indexer that uses
regexless parsing and needs no prior knowledge of the log format
to achieve good compression. The compression mechanism is IO-bound
rather than CPU-bound, and it uses memory to reduce IO latency.
Why use it?
I was researching methods of parsing text-based log data without
prior knowledge of its format and backed into several
interesting properties of my parsing strategy. One advantage was
that by creating a template that represented the static and variable
parts of a given log line and storing the variable parts in memory, I
could represent any log line by providing the unique identifier for the
format template and a unique identifier for each of the variables.
This allowed me to store several hundred bytes with a few dozen.
With optimization, I could get the ratio up to 400 to 1, and the
strategy was very fast. The second surprise was that by noting the
line number where each variable was seen while I parsed the logs, I
created a search index of all variables on the fly.
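The idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not the tool's actual code: the rule used here to separate static from variable tokens (any whitespace-delimited token containing a digit is treated as variable) is an assumption made for the sketch, and the class and method names are invented for it.

```python
from collections import defaultdict

def is_variable(token):
    # Assumed heuristic for this sketch: tokens containing digits
    # are variable; everything else is part of the static template.
    return any(ch.isdigit() for ch in token)

class LogCompressor:
    def __init__(self):
        self.template_ids = {}          # template tuple -> template id
        self.variable_ids = {}          # variable string -> variable id
        self.index = defaultdict(list)  # variable id -> line numbers seen
        self.encoded = []               # one (template id, variable ids) per line

    def add_line(self, lineno, line):
        tokens = line.split()
        # The template keeps the static tokens and marks variable slots.
        template = tuple('<*>' if is_variable(t) else t for t in tokens)
        tid = self.template_ids.setdefault(template, len(self.template_ids))
        vids = []
        for t in tokens:
            if is_variable(t):
                vid = self.variable_ids.setdefault(t, len(self.variable_ids))
                self.index[vid].append(lineno)  # search index built on the fly
                vids.append(vid)
        # The whole line is now just a template id plus a few variable ids.
        self.encoded.append((tid, tuple(vids)))

    def decode(self, lineno):
        # Reverse the mappings to reconstruct the original line losslessly.
        tid, vids = self.encoded[lineno]
        template = next(t for t, i in self.template_ids.items() if i == tid)
        id_to_var = {i: v for v, i in self.variable_ids.items()}
        vars_iter = iter(vids)
        return ' '.join(id_to_var[next(vars_iter)] if tok == '<*>' else tok
                        for tok in template)
```

Because repeated lines that differ only in their variables share one template id, each stored line shrinks to a handful of small integers, and the `index` table doubles as the variable search index described above.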
I have used the tool on several occasions to store large volumes of log data where free disk space was limited.
What is in the works?
If there is some interest in this tool, I will migrate the build to GNU
autotools and post the source code. My latest version uses a
fairly low-tech compression algorithm for storing and identifying
the variables in RAM and on disk. That could be improved. Some
who have played with the code believe that 1000 to 1 compression is
possible. That has yet to be demonstrated.