TheJach.com

Jach's personal blog

(Largely containing a mind-dump to myselves: past, present, and future)
Current favorite quote: "Supposedly smart people are weirdly ignorant of Bayes' Rule." William B Vogt, 2010

Apache's Common Log Format Datetime converted to Unix Timestamp with C++

The datetime in Apache's log format looks like this: day/month/year:hour:minute:second zone. It usually has wrapping brackets but I'm assuming those have been taken care of. The datetime format has a standard name but I don't remember it right now. An example would be "04/Apr/2012:10:37:29 -0500".

This is great for displaying to humans but annoying to pass around to computers, so let's convert it to a Unix timestamp that is simply the number of seconds since the Unix epoch, i.e. 1970-01-01 00:00:00 +0000. Notice that since the simple seconds timestamp has no time zone information, that information will be lost.
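To make the target concrete, here is a minimal sketch of the arithmetic for the example above, done by hand with no library time functions. The names `daysFromCivil` and `epochFromFields` are mine, purely for illustration, and I'm assuming the month has already been mapped to a number (Apr = 4). The thing to watch is the sign: the log line gives a wall-clock time in a zone at UTC-5, so the UTC instant is five hours later, not earlier.

```cpp
// Days from 1970-01-01 to the given Gregorian date (month is 1..12).
long long daysFromCivil(long long y, int m, int d) {
    y -= (m <= 2);  // count Jan/Feb as the tail end of the previous year
    const long long era = (y >= 0 ? y : y - 399) / 400;  // 400-year cycle
    const long long yoe = y - era * 400;                 // year of era, 0..399
    const long long doy = (153 * (m + (m > 2 ? -3 : 9)) + 2) / 5 + d - 1;
    const long long doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;  // 719468 shifts the origin to 1970-01-01
}

// offsetSecs is the logged zone in seconds east of UTC (-0500 -> -18000).
long long epochFromFields(int year, int month, int day,
                          int hour, int min, int sec, long long offsetSecs) {
    const long long wallClock = daysFromCivil(year, month, day) * 86400LL
                              + hour * 3600LL + min * 60LL + sec;
    return wallClock - offsetSecs;  // UTC = logged time - zone offset
}
```

For 04/Apr/2012:10:37:29 -0500, `epochFromFields(2012, 4, 4, 10, 37, 29, -5 * 3600)` gives 1333553849, which is 2012-04-04 15:37:29 UTC. That's a handy value to test any implementation against.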

The code, released into the public domain (I don't think one could assert copyright over this anyway):


#include <string>
#include <time.h>
#include <sstream> // for converting time_t to str
using std::string;
using std::stringstream;
/*
 * Parses apache logtime into tm, converts to time_t, and reformats to str.
 * logtime should be the format: day/month/year:hour:minute:second zone
 * day = 2*digit
 * month = 3*letter
 * year = 4*digit
 * hour = 2*digit
 * minute = 2*digit
 * second = 2*digit
 * zone = (`+' | `-') 4*digit
 *
 * e.g. 04/Apr/2012:10:37:29 -0500
 */
string logtimeToUnix(const string& logtime) {
    struct tm tm = {}; // zero-init; strptime only sets the fields it parses
    time_t t;
    // The zone is parsed by hand below, so strptime stops after the seconds.
    // (%Z means a zone *name*, and how it treats "-0500" varies by platform.)
    if (logtime.size() < 26 ||
        strptime(logtime.c_str(), "%d/%b/%Y:%H:%M:%S", &tm) == NULL)
        return "-";

    tm.tm_isdst = 0; // Force dst off
    // Parse the timezone, the five characters start with the sign at idx 21.
    int hours = 10*(logtime[22] - '0') + logtime[23] - '0';
    int mins = 10*(logtime[24] - '0') + logtime[25] - '0';
    int off_secs = 60*60*hours + 60*mins;
    if (logtime[21] == '-')
        off_secs *= -1;

    t = mktime(&tm); // treats tm as local time
    if (t == -1)
        return "-";
    t -= timezone; // Undo the local timezone, as if tm had been UTC
    t -= off_secs; // Logged time = UTC + offset, so UTC = logged time - offset

    string retval;
    stringstream stream;
    stream << t;
    stream >> retval;
    return retval;
}


The annoying parts of this code are knowing to use strptime combined with mktime, knowing to subtract off the program's local timezone, and handling the string's timezone yourself. On Linux (glibc), the tm struct has extra fields (tm_gmtoff and tm_zone) to hold the zone, but they aren't part of standard C, so it's good practice to stay cross-platform compatible, and the zone is easy to handle yourself anyway.
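For comparison, here is a rough sketch of the non-portable route, assuming glibc (the BSDs and macOS have similar extensions). It leans on three things that are not in standard C or C++: the `%z` conversion in strptime, the `tm_gmtoff` field, and `timegm`. The function name `logtimeToUnixGnu` is made up for this example, and I haven't checked this against other C libraries, so treat it as an illustration rather than a drop-in replacement.

```cpp
#include <string>
#include <time.h>

// Assumes glibc: %z, tm_gmtoff and timegm() are all extensions.
// Returns false if the string doesn't parse.
bool logtimeToUnixGnu(const std::string& logtime, time_t& epoch) {
    struct tm tm = {};
    // %z parses a numeric zone such as -0500 and stores it in tm_gmtoff.
    if (strptime(logtime.c_str(), "%d/%b/%Y:%H:%M:%S %z", &tm) == NULL)
        return false;

    // Save the offset (seconds east of UTC) before timegm touches the struct.
    const long offset = tm.tm_gmtoff;
    tm.tm_isdst = 0;

    // timegm is mktime's UTC twin, so there's no local timezone to undo.
    const time_t asUtc = timegm(&tm);
    if (asUtc == (time_t)-1)
        return false;

    epoch = asUtc - offset; // logged time = UTC + offset
    return true;
}
```

The trade-off is less hand-rolled parsing in exchange for depending on library extensions, which is exactly the portability concern mentioned above.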

This code may contain bugs; it's your responsibility to test it. (If you find bugs I'd like to know, of course!) Time handling code is a pain in the butt with C or C++.


Posted on 2012-07-24 by Jach

Tags: c, c++, programming, tips

Permalink: https://www.thejach.com/view/id/257
