Permanent log archives

Introduction

Each night, Papertrail automatically uploads your log messages and metadata to Amazon's cloud storage service, S3. Papertrail stores one copy in our S3 bucket, and optionally, also stores a copy in a bucket that you provide. You have full control of this archive - it's tied to your AWS account.

Already use S3? Jump to "Create and share an S3 bucket."

Format

For most services, Papertrail creates one file per day in tab-separated value format, gzip compressed. For higher-volume plans (above about 50 GB/month of logs, though the specifics vary), Papertrail creates one file per hour so the files are of a manageable size.

Each file is named under a path (key prefix) provided to Papertrail, typically papertrail/logs/<xxx> where <xxx> is an ID. For example, the file for February 25, 2011 is:

your-bucket-name/papertrail/logs/54321/dt=2011-02-25/2011-02-25.tsv.gz

Days run from midnight to midnight UTC. For hourly archives, the file covering 3 PM UTC on the same day would be:

your-bucket-name/papertrail/logs/54321/dt=2011-02-25/2011-02-25-15.tsv.gz
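
If Papertrail also copies archives into your own bucket, a day's file can be fetched directly from S3. This is a sketch that assumes the AWS CLI is installed and configured, using the placeholder bucket name and ID from the example above:

# Copy one day's archive from your own bucket
aws s3 cp s3://your-bucket-name/papertrail/logs/54321/dt=2011-02-25/2011-02-25.tsv.gz .

# Or list everything archived under that day's prefix (daily or hourly files)
aws s3 ls s3://your-bucket-name/papertrail/logs/54321/dt=2011-02-25/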

Each line contains one message. The fields are ordered:

id generated_at received_at source_id source_name source_ip facility_name severity_name program message

Here's an example (tabs converted to linebreaks for readability):

50342052
2011-02-10 00:19:36 -0800
2011-02-10 00:19:36 -0800
42424 
mysystem
208.122.34.202
User
Info
testprogram
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor

Fields are delimited by tabs, so an actual line looks like this:

50342052\t2011-02-10 00:19:36 -0800\t2011-02-10 00:19:36 -0800\t42424\tmysystem\t208.122.34.202\tUser\tInfo\ttestprogram\tLorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor

The tab-separated value (TSV) format is easy to parse, and the directory-per-day structure makes it easy to load and analyze a single day's records.
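
For example, a day's records can be summarized with standard command-line tools. This sketch assumes the field order above (program is column 9) and counts messages per program:

# Count messages per program (field 9) in one day's archive
gzcat 2011-02-25.tsv.gz | awk -F'\t' '{ count[$9]++ } END { for (p in count) print count[p], p }' | sort -rn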

Usage example

Retrieval

You can download archives from Papertrail's S3 bucket using your Papertrail HTTP API key. The URL format is simple and predictable.

Downloading a single archive

On Linux, you can download yesterday's archive using:

curl --silent --no-include -o `date -u --date='1 day ago' +%Y-%m-%d`.tsv.gz -L \
    -H "X-Papertrail-Token: YOUR HTTP API KEY" \
    https://papertrailapp.com/api/v1/archives/`date -u --date='1 day ago' +%Y-%m-%d`/download

As you can see, there's quite a lot going on in that one line. The main parts are:

-o `date -u --date='1 day ago' +%Y-%m-%d`.tsv.gz - Saves the archive to a file named with yesterday's UTC date, in the format YYYY-MM-DD.tsv.gz

-H "X-Papertrail-Token: YOUR HTTP API KEY" - Authenticates the request via your API token which can be found under your profile.

Downloading multiple archives

To download multiple daily archives in one go, use:

seq 0 X | xargs -I {} date -u --date='{} day ago' +%Y-%m-%d | \
    xargs -I {} curl --progress-bar -f --no-include -o {}.tsv.gz \
    -L -H "X-Papertrail-Token: YOUR HTTP API KEY" https://papertrailapp.com/api/v1/archives/{}/download

Where X is the number of days back to retrieve: seq 0 X generates X + 1 dates, from today through X days ago. (Requests for archives that don't exist yet, such as today's, will fail harmlessly because of curl's -f flag.)

To specify a start date (for example, 10 August 2013), change date -u --date='{} day ago' +%Y-%m-%d to date -u --date='2013-08-10 {} day ago' +%Y-%m-%d.

Your API token can be found under your profile.

Presuming that the downloaded files have file names such as 2013-08-18.tsv.gz, multiple archives can be searched through using:

gzcat 2013-08-* | grep SEARCH_TERM

On some distributions, gzcat may not be available; substitute zcat for gzcat.

More information is available in the HTTP API documentation.
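
The HTTP API can also list the archives that are available to download. A sketch, assuming the archives endpoint described in the API documentation:

# List available archives (filename, date range, size) as JSON
curl --silent -H "X-Papertrail-Token: YOUR HTTP API KEY" \
    https://papertrailapp.com/api/v1/archives.json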

Searching

To find an entry in a particular archive, use commands such as:

gzcat 2011-02-25.tsv.gz | grep Something

gzcat 2011-02-25.tsv.gz | grep Something | awk -F'\t' '{ print $5 " " $9 " " $10 }'

The second command prints only the source_name, program, and message fields.

The files are generic gzipped TSV files, so after un-gzipping them, anything capable of working with a text file can work with them.

Setup

Here's how to sign up for Amazon Web Services, create a bucket for log archives, and give Papertrail write-only access to it for nightly uploads.

Sign up for Amazon Web Services

Skip this step if you already have an AWS account, like for Amazon EC2, S3, or another AWS product.

  • Visit http://aws.amazon.com/
  • Click "Create an AWS Account" (upper right)
  • Enter your email and choose "I am a new user"
  • Complete the signup form. Confirm the activation email.

Activate Amazon S3

Skip this step if your AWS account is already activated for S3.

  • Visit http://aws.amazon.com/s3/
  • Click "Sign Up For Amazon S3"
  • Provide a credit card. You are accepting responsibility for the storage, data transfer, and requests consumed for your logs, and will be charged for it. Typically this is well under $1 (one dollar) per month.
  • Visit http://aws.amazon.com/ and click "Sign In to AWS Management Console." Sign in. If a warning is displayed that your account isn't active yet, try again in 5 minutes.

Create and share an S3 bucket

  • Visit http://aws.amazon.com/ and click "Sign In to AWS Management Console." Sign in. Click the "Amazon S3" tab (top)
  • Click "Create Bucket" (left menu)
  • Fill in "Bucket Name" with a hostname, such as papertrail.yourdomain.com. The hostname is used only for global uniqueness and does not need to exist in DNS. If you have an existing bucket, you may use it too, though we recommend a bucket just for this purpose. We recommend an alphanumeric hostname (no underscores or hyphens) for use when loading in Hive.
  • In the left-hand menu, right-click the newly-created bucket (or in OS X, Ctrl-click). Choose "Properties." You'll see a set of properties with the "Permissions" tab selected.
  • Give Papertrail permission to write to this bucket. Click the "Add more permissions" button and a second row should appear. For Grantee, enter the email address aws@papertrailapp.com. Check the "Upload/Delete" option and the "List" option. Click "Save" to save the policy.

Note: After submission, Amazon's management console may change the grantee name to aws or another label different from what was entered. This is expected.

Amazon also has instructions for editing bucket permissions.
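
If you prefer to script this step instead of using the console, the equivalent ACL grants can be applied with the AWS CLI. This is only a sketch: put-bucket-acl replaces the bucket's entire ACL (so your own full-control grant must be re-applied), and granting by email address is only supported in some regions.

# Look up your account's canonical user ID (needed to preserve your own access)
OWNER_ID=`aws s3api list-buckets --query Owner.ID --output text`

# Grant Papertrail List (read) and Upload/Delete (write) on the bucket
aws s3api put-bucket-acl --bucket papertrail.yourdomain.com \
    --grant-full-control id=$OWNER_ID \
    --grant-read emailaddress=aws@papertrailapp.com \
    --grant-write emailaddress=aws@papertrailapp.com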

Tell Papertrail the bucket name

On the Account page, enable S3 archive copies and provide the S3 bucket name.

Papertrail will perform a test upload as part of saving the bucket name (and will then delete the test file).
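
Once archives begin arriving, you can confirm they're landing in your bucket. A sketch, assuming the AWS CLI is configured for the account that owns the bucket:

# List the most recently uploaded archive files (keys sort chronologically)
aws s3 ls s3://papertrail.yourdomain.com/papertrail/logs/ --recursive | tail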

Screenshots

Sharing bucket access in AWS Management Console (the bucket name and existing bucket user have been obscured): [s3_permissions-1.png]

Papertrail S3 archive copy settings: [s3.png]

Questions

Why does Papertrail support S3 but not Glacier?

Papertrail supports S3 rather than Glacier because:

  • AWS offers the ability to trickle files from S3 to Glacier using a policy that you define, so by supporting S3, Glacier is automatically a possible destination (see the example policy after this list). Visit S3 Object Lifecycle Management.
  • Archived log files compress extremely well, often 15:1 or more, so the total cost of archived logs is extremely small (often pennies per month). Storing a long-term log archive in your S3 bucket will almost always cost less than 1% of the total cost of Papertrail, which itself is presumably a very small part of all infrastructure costs. Moving archives to Glacier would offer effectively no cost savings.
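
As a sketch of the first point, a lifecycle rule that transitions archived logs to Glacier after 90 days could be applied with the AWS CLI (the bucket name, prefix, and 90-day threshold are placeholders to adjust):

# lifecycle.json - transition Papertrail archives to Glacier after 90 days
cat > lifecycle.json <<'EOF'
{
  "Rules": [
    {
      "ID": "archive-logs-to-glacier",
      "Filter": { "Prefix": "papertrail/logs/" },
      "Status": "Enabled",
      "Transitions": [ { "Days": 90, "StorageClass": "GLACIER" } ]
    }
  ]
}
EOF

aws s3api put-bucket-lifecycle-configuration --bucket papertrail.yourdomain.com \
    --lifecycle-configuration file://lifecycle.json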