SELECT * FROM events WHERE message LIKE '%error%';
To use Amazon Elastic MapReduce (EMR), or for a step-by-step walkthrough of setting up analytics services on AWS, see Log analytics with Hadoop and Hive.
We recommend using the id as the sort key and evenly distributing rows across nodes with DISTSTYLE EVEN. To create the table, run the following:
CREATE TABLE events (
  id BIGINT SORTKEY NOT NULL,
  received_at_raw VARCHAR(25) NOT NULL,
  generated_at_raw VARCHAR(25) NOT NULL,
  source_id INT NOT NULL,
  source_name VARCHAR(128) ENCODE TEXT32K,
  source_ip VARCHAR(15) NOT NULL ENCODE TEXT32K,
  facility VARCHAR(8) NOT NULL ENCODE TEXT255,
  severity VARCHAR(9) NOT NULL ENCODE TEXT255,
  program VARCHAR(64) ENCODE TEXT32K,
  message VARCHAR(8192) DEFAULT NULL,
  PRIMARY KEY(id)
) DISTSTYLE EVEN;
Load data from log archives:
COPY events
  FROM 's3://<bucket_name>/<object_prefix>'
  CREDENTIALS 'aws_access_key_id=<my aws_access key id>;aws_secret_access_key=<my aws secret access key>'
  DELIMITER '\t' GZIP
  EMPTYASNULL BLANKSASNULL
  MAXERROR AS 100000;
COPY events
  FROM 's3://mylogbucket/papertrail/logs'
  CREDENTIALS 'aws_access_key_id=abc123;aws_secret_access_key=def456ghi789'
  DELIMITER '\t' GZIP
  EMPTYASNULL BLANKSASNULL
  MAXERROR AS 100000;
In the example above, replace mylogbucket and /papertrail/logs with the bucket name and path shown on Archives.
To load data from only a specific day or days, run the above command with specific paths instead of the key root. For example:
COPY events FROM 's3://mylogbucket/papertrail/logs/dt=2013-02-10/2013-02-10.tsv.gz'
Redshift assumes that the S3 bucket is in the same region as the cluster. To load from a bucket in a different region, use COPY's REGION option.
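As an illustration, a load from a bucket hosted in another region might look like the following sketch (the bucket path, credentials, and us-west-2 region are only placeholders):

```sql
-- Hypothetical example: the bucket lives in us-west-2, not the cluster's region
COPY events
  FROM 's3://mylogbucket/papertrail/logs'
  CREDENTIALS 'aws_access_key_id=abc123;aws_secret_access_key=def456ghi789'
  REGION 'us-west-2'
  DELIMITER '\t' GZIP
  EMPTYASNULL BLANKSASNULL
  MAXERROR AS 100000;
```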
Invalid or unsupported characters
You may receive load errors due to UTF-8 characters, such as:
String contains invalid or unsupported UTF8 codepoints. Bad UTF8 hex sequence: e9 (error 2)
Use COPY's ACCEPTINVCHARS option to tell Redshift how to handle these un-importable characters. With ACCEPTINVCHARS, Redshift will replace each unrecognized UTF-8 character with ?. To replace them with an underscore instead, add this to the COPY command:
ACCEPTINVCHARS AS '_'
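For instance, the earlier example COPY with underscore replacement added would read as follows (bucket and credentials are the same placeholders used above):

```sql
COPY events
  FROM 's3://mylogbucket/papertrail/logs'
  CREDENTIALS 'aws_access_key_id=abc123;aws_secret_access_key=def456ghi789'
  DELIMITER '\t' GZIP
  ACCEPTINVCHARS AS '_'
  EMPTYASNULL BLANKSASNULL
  MAXERROR AS 100000;
```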
Alternatively, consider escaping them with COPY's ESCAPE option.
To confirm that events were loaded, run:
SELECT COUNT(*) FROM events;
To see load errors, run:
SELECT * from stl_load_errors;
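For easier troubleshooting, a narrower query can show just the failing file, line, and reason. This is a sketch; the column list assumes the standard schema of Redshift's stl_load_errors system table:

```sql
-- Most recent load errors: which file/line failed and why
SELECT starttime, filename, line_number, colname, err_reason
FROM stl_load_errors
ORDER BY starttime DESC
LIMIT 20;
```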
The received_at and generated_at timestamps were imported as VARCHAR because Redshift cannot recognize ISO 8601 timestamps. After import, add new TIMESTAMP columns and use the CAST() function to populate them:
ALTER TABLE events ADD COLUMN received_at TIMESTAMP DEFAULT NULL;
UPDATE events SET received_at = CAST(received_at_raw AS timestamp);
ALTER TABLE events ADD COLUMN generated_at TIMESTAMP DEFAULT NULL;
UPDATE events SET generated_at = CAST(generated_at_raw AS timestamp);
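Once populated, the TIMESTAMP columns make time-based queries straightforward. As a sketch (the per-day count is only an illustration):

```sql
-- Daily event counts using the new received_at column
SELECT DATE_TRUNC('day', received_at) AS day, COUNT(*) AS events
FROM events
GROUP BY 1
ORDER BY 1;
```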
Finally, if you foresee no more imports to this table, the raw VARCHAR timestamp columns may be removed. If you foresee importing more events from S3, keep these columns. To remove the columns, run:
ALTER TABLE events DROP COLUMN received_at_raw;
ALTER TABLE events DROP COLUMN generated_at_raw;
To reset the events table back to its initial empty state (and delete all loaded data), use either DROP TABLE events (then re-create it) or TRUNCATE events.
Trying to accomplish something? Please ask us.