Athena Empty Logs
Recently, I was querying some ALB access logs in Athena. It’s not something I do too often, but I was surprised to find that the rows returned were empty except for the date column.
I quickly figured out that only logs from May 30th, 2024 onwards were affected and, after a quick comparison, realised that the log format had changed.
Sure enough, as noted in a banner on this page, the ALB access log format changed, adding the classification, classification_reason and conn_trace_id fields.
Unfortunately for anyone with AWS Athena tables, that means your pre-existing tables will likely have an out-of-date regex if you’re using org.apache.hadoop.hive.serde2.RegexSerDe as your row format.
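To illustrate why a few extra trailing fields are enough to blank out every row, here is a minimal Python sketch. It assumes the regex has to match the whole line, which is how I understand RegexSerDe to behave (non-matching rows come back with NULL columns). The log format, patterns and sample lines below are hypothetical and heavily simplified; they are not the real ALB format or the regex from the AWS documentation.

```python
import re

# Hypothetical, simplified log format: three original fields.
# This stands in for the regex baked into an existing table definition.
OLD_PATTERN = re.compile(r'^(\S+) (\S+) "([^"]*)"$')

# A more tolerant variant: the two newer trailing fields are optional,
# so both old-style and new-style lines match.
NEW_PATTERN = re.compile(r'^(\S+) (\S+) "([^"]*)"(?: "([^"]*)" "([^"]*)")?$')


def parse(pattern, line):
    """Return the captured fields, or None if the line does not match."""
    match = pattern.match(line)
    return match.groups() if match else None


# Made-up sample lines, before and after the (hypothetical) format change.
old_line = 'https 2024-05-29T10:00:00Z "GET /"'
new_line = 'https 2024-05-30T10:00:00Z "GET /" "-" "-"'

print(parse(OLD_PATTERN, old_line))  # fields are captured
print(parse(OLD_PATTERN, new_line))  # None: extra fields break the match
print(parse(NEW_PATTERN, old_line))  # matches, newer fields are None
print(parse(NEW_PATTERN, new_line))  # matches, all five fields captured
```

In this sketch the old pattern rejects the new line outright instead of returning the fields it does recognise, which is consistent with the empty rows I was seeing. Presumably the date column survived because it comes from somewhere other than the regex, such as a partition.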
I don’t believe there is any way to update this in place; it requires recreating each table, given that Athena doesn’t support altering SERDEPROPERTIES.
Honestly, I found this whole thing kind of disappointing, and I’m not sure what prevents it from happening again in future. That said, as mentioned, I don’t use Athena often enough to really feel the pain.