Athena Empty Logs

Recently, I was querying some ALB access logs in Athena. It’s not something I do too often, but I was surprised to find that the rows returned were empty except for the date column.

I quickly figured out that only rows from May 30th, 2024 onwards were affected and, after a quick comparison with older entries, realised that the log format had changed.

Sure enough, as noted in a banner on this page, the ALB access log format changed, adding the classification, classification_reason and conn_trace_id fields.

Unfortunately for anyone with AWS Athena tables, that means your pre-existing tables will likely have an out-of-date regex if you’re using org.apache.hadoop.hive.serde2.RegexSerDe as your row format.
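
To illustrate why a stale regex would leave rows empty, here is a minimal Python sketch. It assumes the SerDe needs the regex to match the whole line and returns empty columns when it doesn't. The three-field log format, both patterns and the sample lines are made up for illustration and are far simpler than the real ALB regex.

```python
import re

# Hypothetical, heavily simplified patterns: NOT the real ALB regex.
# OLD_PATTERN expects exactly three fields per line.
OLD_PATTERN = re.compile(r'(\S+) (\S+) "([^"]*)"')
# NEW_PATTERN also accepts two optional trailing quoted fields.
NEW_PATTERN = re.compile(r'(\S+) (\S+) "([^"]*)"(?: "([^"]*)" "([^"]*)")?')


def parse(pattern, line, num_columns):
    """Return one value per column, or all None if the line does not match.

    fullmatch is used on the assumption that the SerDe requires the
    regex to cover the entire line.
    """
    match = pattern.fullmatch(line)
    if match is None:
        return [None] * num_columns
    return list(match.groups())


# Made-up sample lines in the toy format, before and after two fields are added.
old_line = 'https 2024-05-29T10:00:00Z "GET /"'
new_line = 'https 2024-05-30T10:00:00Z "GET /" "Acceptable" "-"'

print(parse(OLD_PATTERN, old_line, 3))  # all three columns populated
print(parse(OLD_PATTERN, new_line, 3))  # every column is None
print(parse(NEW_PATTERN, new_line, 5))  # all five columns populated
print(parse(NEW_PATTERN, old_line, 5))  # old lines still parse, new columns None
```

Making the new trailing groups optional, as in NEW_PATTERN, lets one pattern cover both the old and the new lines in this toy format.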

I don’t believe there is any way to update the regex in place: as far as I can tell, Athena doesn’t support altering SERDEPROPERTIES, so each table has to be recreated.
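
As a rough sketch of what recreating a table involves, the Python helper below builds a DROP TABLE statement followed by a CREATE EXTERNAL TABLE statement that carries the new regex. The function name, table name, columns, regex and S3 location are all hypothetical placeholders, and the regex is assumed to be already escaped for a single-quoted DDL string. A real ALB table has many more columns and usually partitioning as well.

```python
def recreate_statements(table, columns, input_regex, location):
    """Build DROP + CREATE statements for a RegexSerDe-backed external table.

    Hypothetical helper: every argument is a placeholder, and input_regex
    must already be escaped for use inside a single-quoted DDL string.
    """
    column_sql = ",\n  ".join(f"{name} {type_}" for name, type_ in columns)
    drop = f"DROP TABLE IF EXISTS {table}"
    create = (
        f"CREATE EXTERNAL TABLE {table} (\n  {column_sql}\n)\n"
        "ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'\n"
        f"WITH SERDEPROPERTIES ('input.regex' = '{input_regex}')\n"
        f"LOCATION '{location}'"
    )
    return [drop, create]


# Placeholder values for illustration only.
statements = recreate_statements(
    table="alb_logs_example",
    columns=[("type", "string"), ("time", "string")],
    input_regex="([^ ]*) ([^ ]*)",
    location="s3://example-bucket/example-prefix/",
)
for statement in statements:
    print(statement + ";\n")
```

Dropping an external table only removes the table definition, not the log files in S3, so the data is still there for the recreated table to read.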

Honestly, I found this whole thing kind of disappointing, and I’m not sure what prevents it from happening again in future. As mentioned, though, I don’t use Athena often enough to really feel the pain.