Permanent log archives

Introduction

Each night, Papertrail automatically uploads your log messages and metadata to Amazon's cloud storage service, S3. Papertrail stores one copy in our S3 bucket and, optionally, a second copy in a bucket that you provide. You have full control of the optional archive in your own bucket, since it's tied to your AWS account.

Already use S3? Jump to Automatic S3 Archive Export.

Format

For most accounts, Papertrail creates one file per day in tab-separated values (TSV) format, gzip-compressed (.gz). Days run from midnight to midnight UTC.

For accounts with higher-volume plans (above about 50 GB/month of logs, though the specifics vary), Papertrail creates one file per hour so the files are of a manageable size.

Each line contains one message. The fields are ordered:

id
generated_at
received_at
source_id
source_name
source_ip
facility_name
severity_name
program
message

For a longer description of each column, see Log Search API: Responses.

Here’s an example log message. Tabs have been converted to linebreaks for readability:

50342052
2011-02-10 00:19:36 -0800
2011-02-10 00:19:36 -0800
42424
mysystem
208.122.34.202
User
Info
testprogram
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor

A line actually looks like this:

50342052\t2011-02-10 00:19:36 -0800\t2011-02-10 00:19:36 -0800\t42424\tmysystem\t208.122.34.202\tUser\tInfo\ttestprogram\tLorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
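To get back to the readable one-field-per-line view, split a raw line on its tabs. A minimal sketch, assuming a daily archive named 2016-10-31.tsv.gz (the same file used in the examples below):

gzip -cd 2016-10-31.tsv.gz | head -n 1 | tr '\t' '\n'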

The tab-separated values (TSV) format is easy to read and parse, and the directory-per-day structure makes it simple to load and analyze a single day's records.

Usage example

Show identical messages

Here’s how to extract the message (field 10) from the archive file 2016-10-31.tsv.gz, then show the messages sorted by the number of identical occurrences (duplicates).

gzip -cd 2016-10-31.tsv.gz | cut -f10 | sort | uniq -c | sort -n

Windows PowerShell can do the same thing, with 7-Zip’s help. In this example, [9] still selects the message (field 10), due to zero-based indexing.

7z x -so 2016-10-31.tsv.gz | %{($_ -split '\t')[9]} | group | sort count,name | ft count,name -wrap

Show similar messages

The most common messages often differ only by a random number, IP address, or message suffix. These near-duplicates can be discovered with a bit more work.

Here’s how to extract the sender, program, and message (fields 5, 9, and 10) from all archive files, squeeze whitespace and digits, truncate after eight words, and sort the result by the number of identical occurrences (duplicates).

gzip -cd *.tsv.gz | # extract all archives
 cut -f 5,9-      | # sender, program, message
 tr -s '\t' ' '   | # squeeze whitespace
 tr -s 0-9 0      | # squeeze digits
 cut -d' ' -f 1-8 | # truncate after eight words
 sort | uniq -c | sort -n

# or, as a one-liner:
gzip -cd *.tsv.gz | cut -f 5,9- | tr -s '\t' ' ' | tr -s 0-9 0 | cut -d' ' -f 1-8 | sort | uniq -c | sort -n

Once again, Windows PowerShell can do the same thing, with 7-Zip’s help.

7z x -so *.tsv.gz                     | # extract all archives
 %{($_ -split '\t')[4,8,9] -join ' '} | # sender, program, message
 %{$_ -replace ' +',' '}              | # squeeze whitespace
 %{$_ -replace '[0-9]+','0'}          | # squeeze digits
 %{($_ -split ' ')[0..7] -join ' '}   | # truncate after eight words
 group | sort count,name | ft count,name -wrap

# or, as a one-liner:
7z x -so *.tsv.gz | %{($_ -split '\t')[4,8,9] -join ' '} | %{$_ -replace ' +',' '} | %{$_ -replace '[0-9]+','0'} | %{($_ -split ' ')[0..7] -join ' '} | group | sort count,name | ft count,name -wrap

Downloading logs

In addition to being downloadable from the Archives page, archive files can be retrieved with your Papertrail HTTP API key. The URL format is simple and predictable.

Papertrail generates either daily or hourly archives, depending on the amount of log data transfer included in your plan, so that each file stays a manageable size. The examples below cover each case separately.
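In both cases the URL follows the same predictable pattern; only the date (and, for hourly archives, the hour) changes:

https://papertrailapp.com/api/v1/archives/YYYY-MM-DD/download (daily)
https://papertrailapp.com/api/v1/archives/YYYY-MM-DD-HH/download (hourly)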

Simple example

Daily

If the Archives page shows that Papertrail is generating daily files, download the archive for 2016-09-24 (UTC) with:

curl --no-include -o 2016-09-24.tsv.gz -L -H "X-Papertrail-Token: YOUR-HTTP-API-KEY" \
    https://papertrailapp.com/api/v1/archives/2016-09-24/download

Hourly

Alternatively, if the Archives page shows that Papertrail is generating hourly files, download the archive for 2016-09-24 at 14:00 UTC with:

curl --no-include -o 2016-09-24-14.tsv.gz -L -H "X-Papertrail-Token: YOUR-HTTP-API-KEY" \
    https://papertrailapp.com/api/v1/archives/2016-09-24-14/download

Downloading a single archive

Because the day or hour is included in the URL, more sophisticated, automation-friendly invocations, such as requesting a relative day or hour, are also possible.

Daily

For example, to download yesterday’s daily archive on a Linux host, run:

curl --silent --no-include -o `date -u --date='1 day ago' +%Y-%m-%d`.tsv.gz -L \
    -H "X-Papertrail-Token: YOUR-HTTP-API-KEY" \
    https://papertrailapp.com/api/v1/archives/`date -u --date='1 day ago' +%Y-%m-%d`/download

Hourly

If Papertrail generates hourly archives for your account, download the archive for 16 hours ago with:

curl --silent --no-include -o `date -u --date='16 hours ago' +%Y-%m-%d-%H`.tsv.gz -L \
    -H "X-Papertrail-Token: YOUR-HTTP-API-KEY" \
    https://papertrailapp.com/api/v1/archives/`date -u --date='16 hours ago' +%Y-%m-%d-%H`/download

Command syntax

As you can see, there's a lot going on in those cURL one-liners. The main parts are:

date -u --date='1 day ago' +%Y-%m-%d: generates the archive date (here, yesterday) in UTC; the backticks substitute its output into both the output filename and the URL
-o: names the local file the archive is saved to
-L: follows any redirect to the file's actual location
-H "X-Papertrail-Token: YOUR-HTTP-API-KEY": authenticates the request with your HTTP API key
--silent and --no-include: suppress the progress meter and response headers

Downloading multiple archives

Daily

To download multiple daily archives in one command, use:

seq 1 X | xargs -I {} date -u --date='{} day ago' +%Y-%m-%d | \
    xargs -I {} curl --progress-bar -f --no-include -o {}.tsv.gz \
    -L -H "X-Papertrail-Token: YOUR-HTTP-API-KEY" https://papertrailapp.com/api/v1/archives/{}/download

where X is one more than the number of days you want to download. For example, to guarantee at least 2 days, change X to 3; see the Command syntax note below for details. To specify a start date, such as August 10, 2013, change:

date -u --date='{} day ago' +%Y-%m-%d

to:

date -u --date='2013-08-10 {} day ago' +%Y-%m-%d
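For example, a sketch combining the two changes above to fetch the seven daily archives preceding August 10, 2013 (2013-08-03 through 2013-08-09; these dates are in the past, so their archives already exist and no +1 buffer is needed):

seq 1 7 | xargs -I {} date -u --date='2013-08-10 {} day ago' +%Y-%m-%d | \
    xargs -I {} curl --progress-bar -f --no-include -o {}.tsv.gz \
    -L -H "X-Papertrail-Token: YOUR-HTTP-API-KEY" https://papertrailapp.com/api/v1/archives/{}/download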

Hourly

To download multiple hourly archives in one command, use:

seq 1 X | xargs -I {} date -u --date='{} hours ago' +%Y-%m-%d-%H | \
    xargs -I {} curl --progress-bar -f --no-include -o {}.tsv.gz \
    -L -H "X-Papertrail-Token: YOUR-HTTP-API-KEY" https://papertrailapp.com/api/v1/archives/{}/download

where X is one more than the number of hours you want to download. For example, to guarantee at least 8 hours, change X to 9.

Command syntax

The seq 1 X command generates date or hour offsets, starting with 1 (1 day or hour ago) because the current day or hour will not yet have an archive. Archive processing takes time, so near the beginning of the hour or UTC day, the previous day or hour may not have an archive yet either (and will return 404 when requested). To guarantee at least N days or hours, set X to N + 1.

Your API token can be found under your profile.
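To avoid pasting the key into every command, one option is to export it as an environment variable and reference it in the header (the variable name below is just an illustration):

export PAPERTRAIL_API_TOKEN=YOUR-HTTP-API-KEY
curl --silent --no-include -o 2016-09-24.tsv.gz -L \
    -H "X-Papertrail-Token: $PAPERTRAIL_API_TOKEN" \
    https://papertrailapp.com/api/v1/archives/2016-09-24/download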

More information is available in the HTTP API documentation.

OS X

Using OS X and seeing date: illegal option -- -? The BSD date command that ships with OS X doesn't support GNU-style --date arguments. In the examples above, change:

date -u --date='1 day ago' +%Y-%m-%d

to the BSD equivalent:

date -u -v-1d +%Y-%m-%d

For the hourly examples, change date -u --date='16 hours ago' +%Y-%m-%d-%H to date -u -v-16H +%Y-%m-%d-%H.

Searching

To find an entry in a particular archive, use commands such as:

gzip -cd 2016-02-25.tsv.gz | grep Something

gzip -cd 2016-02-25.tsv.gz | grep Something | cut -f5,9,10 | tr '\t' ' '

The files are generic gzipped TSV files, so after un-gzipping them, anything capable of working with a text file can work with them.
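If zgrep is available (it ships with gzip on most systems), it combines the decompression and search into one step:

zgrep Something 2016-02-25.tsv.gz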

If the downloaded files have file names such as 2013-08-18.tsv.gz (the default), multiple archives can be searched through using:

gzip -cd 2013-08-* | grep SEARCH_TERM

Syncing

To transfer multiple archives from Papertrail’s S3 bucket to a custom bucket, use the relevant download command mentioned above, and then upload them to another bucket using:

s3cmd put --recursive path/to/archives/ s3://bucket.name/the/path/

where path/to/archives/ is the local directory where all the archives are stored, and bucket.name/the/path/ is the bucket and path of the target S3 storage location.
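If you use the AWS CLI rather than s3cmd, an equivalent sketch is:

aws s3 sync path/to/archives/ s3://bucket.name/the/path/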

S3 Bucket Setup

See Automatic S3 Archive Export