Permanent log archives

Introduction

Each night, Papertrail automatically uploads your log messages and metadata to Amazon’s cloud storage service, S3. Papertrail stores one copy in our S3 bucket, and optionally, also stores a copy in a bucket that you provide. You have full control of the optional archive in your own bucket, since it’s tied to your AWS account.

Already use S3? Jump to Create and share an S3 bucket.

Format

For most accounts, Papertrail creates one file per day in tab-separated values (TSV) format, compressed with gzip (.tsv.gz). Days run from midnight to midnight UTC.

For accounts with higher-volume plans (above about 50 GB/month of logs, though the specifics vary), Papertrail creates one file per hour so the files are of a manageable size.

Each line contains one message. The fields are ordered:

id
generated_at
received_at
source_id
source_name
source_ip
facility_name
severity_name
program
message

For a longer description of each column, see Log Search API: Responses.

Here’s an example log message. Tabs have been converted to linebreaks for readability:

50342052
2011-02-10 00:19:36 -0800
2011-02-10 00:19:36 -0800
42424
mysystem
208.122.34.202
User
Info
testprogram
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor

A line actually looks like this:

50342052\t2011-02-10 00:19:36 -0800\t2011-02-10 00:19:36 -0800\t42424\tmysystem\t208.122.34.202\tUser\tInfo\ttestprogram\tLorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor

The tab-separated values (TSV) format is easy to read and parse. The archives are stored in a directory-per-day structure that makes it easy to load and analyze a single day’s records.
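To make the column order concrete, here’s a minimal sketch that prints each field of an archive line next to its column name. The function name label_fields is made up for this example, and it assumes exactly the ten columns listed above, in that order.

```shell
# Hypothetical helper (not part of Papertrail): label each tab-separated
# field of an archive line with its column name. Reads TSV on stdin.
label_fields() {
    awk -F '\t' '
        BEGIN {
            # Column names in the order given in the Format section.
            n = split("id generated_at received_at source_id source_name source_ip facility_name severity_name program message", names, " ")
        }
        {
            for (i = 1; i <= n; i++) printf "%s: %s\n", names[i], $i
        }
    '
}
```

For instance, gzip -dc 2016-10-31.tsv.gz | head -n 1 | label_fields would print the first message of that archive, one labeled field per line.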

Usage example

Show identical messages

Here’s how to extract the log message body (field 10) from the archive file 2016-10-31.tsv.gz, then show the log messages sorted by the number of identical occurrences (duplicates).

gzcat 2016-10-31.tsv.gz | awk -F '\t' '{print $10}' | sort | uniq -c | sort -n
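The same pipeline can be pointed at other columns. As a sketch, the function below counts messages per value of any one column, most frequent first. The name count_by_column is an assumption for illustration, and it uses gzip -dc, which behaves like gzcat/zcat, to avoid depending on either name.

```shell
# Hypothetical helper: count messages per distinct value of one column.
# $1 = archive file, $2 = column number (for example 8 = severity_name,
# 9 = program, 10 = message).
count_by_column() {
    gzip -dc "$1" | awk -F '\t' -v col="$2" '{ print $col }' | sort | uniq -c | sort -rn
}
```

For example, count_by_column 2016-10-31.tsv.gz 9 would list the programs in that archive by message count.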

Downloading logs

Archive files can be downloaded from Archives or retrieved using your Papertrail HTTP API key. The URL format is simple and predictable.

Papertrail generates either daily or hourly archives, depending on the amount of log data transfer included in your plan, so that each file stays a manageable size. The examples below cover each situation separately.

Simple example

Daily

If archives show that Papertrail is generating daily files, download the archive for 2016-09-24 (UTC) with:

curl --no-include -o 2016-09-24.tsv.gz -L -H "X-Papertrail-Token: YOUR-HTTP-API-KEY" \
    https://papertrailapp.com/api/v1/archives/2016-09-24/download

Hourly

Alternatively, if archives show that Papertrail is generating hourly files, download the archive for 2016-09-24 at 14:00 UTC with:

curl --no-include -o 2016-09-24-14.tsv.gz -L -H "X-Papertrail-Token: YOUR-HTTP-API-KEY" \
    https://papertrailapp.com/api/v1/archives/2016-09-24-14/download
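Since daily and hourly downloads differ only in the archive name, the two commands can be wrapped in one small helper. This is a sketch, not an official client: the function names and the PAPERTRAIL_API_TOKEN environment variable are assumptions made for this example.

```shell
# Hypothetical helpers. PAPERTRAIL_API_TOKEN is an assumed environment
# variable that holds your HTTP API key.

# Print the download URL for an archive name such as 2016-09-24 (daily)
# or 2016-09-24-14 (hourly).
archive_url() {
    printf 'https://papertrailapp.com/api/v1/archives/%s/download\n' "$1"
}

# Download one archive to NAME.tsv.gz in the current directory.
# --fail makes curl exit non-zero on an HTTP error such as 404.
download_archive() {
    curl --silent --fail --no-include -L \
        -H "X-Papertrail-Token: ${PAPERTRAIL_API_TOKEN}" \
        -o "$1.tsv.gz" "$(archive_url "$1")"
}
```

After exporting PAPERTRAIL_API_TOKEN, download_archive 2016-09-24 fetches a daily file and download_archive 2016-09-24-14 fetches an hourly one.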

Downloading a single archive

Because the day or hour is included in the URL, more sophisticated and automation-friendly examples - like a relative day or hour - are also possible.

Daily

For example, to download yesterday’s daily archive on a Linux host, run:

curl --silent --no-include -o `date -u --date='1 day ago' +%Y-%m-%d`.tsv.gz -L \
    -H "X-Papertrail-Token: YOUR-HTTP-API-KEY" \
    https://papertrailapp.com/api/v1/archives/`date -u --date='1 day ago' +%Y-%m-%d`/download

Hourly

If Papertrail generates hourly archives for your account, download the archive for 16 hours ago with:

curl --silent --no-include -o `date -u --date='16 hours ago' +%Y-%m-%d-%H`.tsv.gz -L \
    -H "X-Papertrail-Token: YOUR-HTTP-API-KEY" \
    https://papertrailapp.com/api/v1/archives/`date -u --date='16 hours ago' +%Y-%m-%d-%H`/download

Command syntax

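As you can see, there’s a lot going on in those cURL one-liners. The main parts are:

--no-include keeps HTTP response headers out of the saved file
-o names the output file, here the archive’s date followed by .tsv.gz
-L follows a redirect if the server responds with one
-H sends your API key in the X-Papertrail-Token header
date -u --date='1 day ago' +%Y-%m-%d prints yesterday’s date in UTC (GNU date syntax); the hourly variant adds -%H for the hour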

Downloading multiple archives

Daily

To download multiple daily archives in one command, use:

seq 1 X | xargs -I {} date -u --date='{} day ago' +%Y-%m-%d | \
    xargs -I {} curl --progress-bar -f --no-include -o {}.tsv.gz \
    -L -H "X-Papertrail-Token: YOUR-HTTP-API-KEY" https://papertrailapp.com/api/v1/archives/{}/download

where X is the number of days that you want to download, plus 1. For example, to guarantee 2 days, change X to 3; see the note below for details. To count back from a specific date instead of today, for example 10 August 2013, change:

date -u --date='{} day ago' +%Y-%m-%d

to:

date -u --date='2013-08-10 {} day ago' +%Y-%m-%d

Hourly

To download multiple hourly archives in one command, use:

seq 1 X | xargs -I {} date -u --date='{} hours ago' +%Y-%m-%d-%H | \
    xargs -I {} curl --progress-bar -f --no-include -o {}.tsv.gz \
    -L -H "X-Papertrail-Token: YOUR-HTTP-API-KEY" https://papertrailapp.com/api/v1/archives/{}/download

where X is the number of hours that you want to download, plus 1. For example, to guarantee 8 hours, change X to 9.

Command syntax

The seq 1 X command generates the day or hour offsets. It starts at 1 (1 day or hour ago) because the current day or hour does not have an archive yet. Archive processing also takes time, so near the beginning of the hour or UTC day, the previous day or hour may not have an archive either, and requesting it returns 404. Thus, to guarantee at least N days or hours of archives, set X to N + 1.
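The offset logic can also be written as a plain loop, which makes it easier to report archives that are not ready yet. This is a rough sketch that assumes GNU date. The function names are invented for this example, and BASE is a hypothetical parameter naming the date to count back from.

```shell
# Hypothetical helper: print the names of the N daily archives that
# precede BASE (a YYYY-MM-DD date), newest first. Requires GNU date.
# $1 = N, $2 = BASE
daily_archive_names() {
    i=1
    while [ "$i" -le "$1" ]; do
        date -u --date="$2 $i day ago" +%Y-%m-%d
        i=$((i + 1))
    done
}

# Download each of those archives, reporting any that curl could not
# fetch (for example a 404 for an archive that is not generated yet).
download_daily_archives() {
    daily_archive_names "$1" "$2" | while read -r name; do
        curl --silent --fail --no-include -L \
            -H "X-Papertrail-Token: YOUR-HTTP-API-KEY" \
            -o "$name.tsv.gz" \
            "https://papertrailapp.com/api/v1/archives/$name/download" \
            || echo "skipped $name (no archive yet?)" >&2
    done
}
```

For example, download_daily_archives 3 2013-08-10 would request the archives for 9, 8, and 7 August 2013.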

Your API token can be found under your profile.

More information is available in the HTTP API documentation.

OS X

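Using OS X and see date: illegal option -- -? The date command on OS X does not support --date. In the examples above, change:

date -u --date='1 day ago' +%Y-%m-%d

to:

date -u -v-1d +%Y-%m-%d

For the hourly examples, use -v-16H in place of --date='16 hours ago'.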

Searching

To find an entry in a particular archive, use commands such as:

gzcat 2016-02-25.tsv.gz | grep Something

gzcat 2016-02-25.tsv.gz | grep Something | awk -F '\t' '{print $5 " " $9 " " $10 }'

The files are generic gzipped TSV files, so after un-gzipping them, anything capable of working with a text file can work with them.

If the downloaded files have file names such as 2013-08-18.tsv.gz (the default), multiple archives can be searched through using:

gzcat 2013-08-* | grep SEARCH_TERM

On some distributions, you may need to replace gzcat with zcat.
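When searching several archives at once, it helps to know which file each match came from. Here’s a minimal sketch that prefixes every match with its archive name. The function name search_archives is an assumption, and it uses gzip -dc, which is available wherever gzcat or zcat is.

```shell
# Hypothetical helper: search several archives for a fixed string and
# prefix each matching line with the archive it came from.
# $1 = search term, remaining arguments = archive files.
search_archives() {
    term=$1
    shift
    for f in "$@"; do
        gzip -dc "$f" | grep -F -e "$term" | awk -v name="$f" '{ print name ": " $0 }'
    done
}
```

For example, search_archives SEARCH_TERM 2013-08-*.tsv.gz searches every downloaded archive for August 2013.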

Syncing

To transfer multiple archives from Papertrail’s S3 bucket to a custom bucket, use the relevant download command mentioned above, and then upload them to another bucket using:

s3cmd put --recursive path/to/archives/ s3://bucket.name/the/path/

where path/to/archives/ is the local directory where all the archives are stored, and bucket.name/the/path/ is the bucket and path of the target S3 storage location.

S3 Bucket Setup

Here’s how to sign up for Amazon Web Services, create a bucket for log archives, and share write-only access to Papertrail for nightly uploads.

Sign up for Amazon Web Services

Skip this step if you already have an AWS account, like for Amazon EC2, S3, or another AWS product.

Activate Amazon S3

Skip this step if your AWS account is already activated for S3.

Create and share an S3 bucket

Note: After submission, Amazon’s management console may change the grantee name to aws or another label different from what was entered. This is expected.

Amazon also has instructions for editing bucket permissions.

Alternative: Define sharing policy with IAM

If you followed the instructions above to grant Upload/Delete permissions via the AWS Management Console, skip this step.

If you prefer defining a bucket policy to control access, here’s an example policy that permits Papertrail to List and Upload:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PapertrailLogArchive",
            "Effect": "Allow",
            "Principal": {
                "AWS": [
                    "arn:aws:iam::719734659904:root"
                ]
            },
            "Action": [
                "s3:List*",
                "s3:Put*"
            ],
            "Resource": [
                "arn:aws:s3:::bucket-name/papertrail/logs/*"
            ]
        }
    ]
}

where bucket-name/papertrail/logs/ is the directory for Papertrail.

Tell Papertrail the bucket name

On Settings, enable S3 archive copies and provide the S3 bucket name.

Papertrail will perform a test upload as part of saving the bucket name (and will then delete the test file). Note that a new bucket can sometimes take several hours to become available, due to DNS propagation delays. If the test upload fails, wait two hours and try again.

When archives are uploaded to the bucket, each file is named under the path (key prefix) provided to Papertrail, typically papertrail/logs/<xxx> where <xxx> is an ID. For example, the daily archive for February 25, 2016 would be:

bucket-name/papertrail/logs/54321/dt=2016-02-25/2016-02-25.tsv.gz

Days are from midnight to midnight UTC. Alternatively, an hourly archive file for 3 PM UTC would be:

bucket-name/papertrail/logs/54321/dt=2016-02-25/2016-02-25-15.tsv.gz
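The naming pattern in these two examples can be sketched as a small function that builds the key for a given archive. The name archive_key is made up for illustration, and the sketch assumes the dt= directory always uses the day, as in both examples above.

```shell
# Hypothetical helper: build the S3 key of an archive file.
# $1 = key prefix including the ID (for example papertrail/logs/54321)
# $2 = archive name: YYYY-MM-DD (daily) or YYYY-MM-DD-HH (hourly)
archive_key() {
    # The dt= directory is named after the day: the first 10 characters.
    day=$(printf '%.10s' "$2")
    printf '%s/dt=%s/%s.tsv.gz\n' "$1" "$day" "$2"
}
```

For example, archive_key papertrail/logs/54321 2016-02-25-15 prints the key of the hourly file shown above, without the bucket name.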

Screenshots

Screenshot (s3_permissions-1.png): Sharing bucket access in AWS Management Console. The bucket name and existing bucket user have been obscured.

Screenshot (s3.png): Papertrail S3 archive copy settings.

Questions

Why does Papertrail support S3 but not Glacier?

Papertrail supports S3 rather than Glacier because:

Are archives encrypted at rest?

Yes, Papertrail takes advantage of S3’s server-side encryption so that archived data is encrypted at rest using AES-256.