Fixed an embarrassing security bug...

Earlier tonight I started getting some error emails triggered by requests from 5.188.62.214, which started POSTing to /admin/post/new -- the URL, as my notes reminded me, that this blog uses when I submit a new post. I knew I had to deal with this immediately, because the error message was PHP aborting on an undefined array index -- the bot using that IP wasn't sending the expected new-post form parameters.

This is odd to me, since the HTML form is part of the response payload and all the needed inputs are in there, including the one input the bot did send. That one ended up being the only type="hidden" one, too, which makes it more bizarre. It reminds me of the trick some people use to filter out dumb bots: set a hidden input, then change or unset its value with JS, and reject any submission that still carries the original value, catching bots that aren't JS-aware. Apparently many still aren't. The bot also sent some parameters from other forms on the page, like the hidden sitesearch param for the google search box, and the pw param for the login form, with guesses as brilliant as 123456, admin, admin123, and gw44444 (what's that from? too many 4s...) -- with variants gw111111 and gw66666666.
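Not this blog's code, but a sketch of that trick as I understand it; every name here is made up:

  <!-- the form ships with a hidden field whose value only JS rewrites -->
  <input type="hidden" name="js_check" id="js_check" value="nojs">
  <script>
    document.getElementById('js_check').value = 'human'; // bots without JS never run this
  </script>

  <?php
  // server side: a bot that replays the raw HTML value gets filtered out
  if (($_POST['js_check'] ?? '') !== 'human') {
      http_response_code(400);
      exit('Bad request.');
  }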

But all that's beside the point, because what is an unauthenticated request doing getting this far into the handler of an admin page, which is supposed to be restricted to admin users (i.e. me)? And after a bit of investigation I found to my horror that a plain curl on that URL would serve the form...
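The reproduction really was that plain -- the 200-with-form behavior is as described above, the rest is paraphrase:

  # an unauthenticated request: no cookies, no session
  curl -i https://www.thejach.com/admin/post/new
  # response: 200 with the full new-post form, rather than a redirect
  # to the login page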

What a huge blunder. In this case, even if the bot had actually used the form with all the JS processing involved, it wouldn't have been able to make a new post anyway: the handler would have errored out on not finding a user id for the author. But there are other admin URLs that don't check for such things and could have let it edit or delete things, or it could have registered an account and then done some annoying things. (Catastrophic, if it had figured out how to upload files.)

What's the cause? Ultimately it goes back to some questionable architecture from the site's very beginning in 2009, but the bug itself was introduced when I added page caching in 2012. Oops. It's a good thing no one reads this blog, and that I don't have motivated enemies...

On the architecture side, as described in the earlier rewrite notes post, a request URL gets mapped to a Service class. Prior to adding the cache, the code in the index.php file was literally just $handler = new $serviceClass(); -- with $handler then not being used, of course. Me_2009 wasn't the cleanest coder. With the cache, it turned into a call into the caching class, Cacher::handle_service($serviceClass).
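In sketch form -- only the two calls quoted above are real, the router is an assumption:

  <?php
  // index.php, before the cache (2009-2012): construction did all the
  // work, via the constructor chain described below.
  $serviceClass = map_url_to_service($_SERVER['REQUEST_URI']); // assumed router
  $handler = new $serviceClass(); // all side effects; $handler never used

  // index.php, after adding the cache in 2012:
  Cacher::handle_service($serviceClass);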

What let me get away in the beginning with just a call to the constructor is that each of my service classes has an inheritance hierarchy rooted in a BaseService class, and they call their parent constructor, so the base class is where the work normally happens to map the URL to a function, call it, and render/serve any output. Since some services ended up needing to call methods of other services (poor factoring), every service's constructor also takes an optional parameter (default false) for whether it's an "internal" construction or not. If it's an "internal" construction, the service class's constructor skips calling the parent constructor, and no work apart from the url-pattern-to-function definition is done.
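Roughly like this, with every name besides the class names and the "internal" flag being my guess:

  <?php
  class BaseService {
      public function __construct() {
          // normal construction does all the work:
          $fn = $this->url_to_function($_SERVER['REQUEST_URI']);
          $this->$fn();    // run the matched handler...
          $this->render(); // ...and serve its output
      }
  }

  class PostService extends BaseService { // hypothetical concrete service
      public function __construct($internal = false) {
          $this->define_url_patterns(); // always build the url->function map
          if (!$internal) {
              parent::__construct();
          }
          // "internal" construction stops here: no parent constructor runs,
          // so no dispatch and no render -- just the pattern definitions.
      }
  }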

The caching class instantiated the Service class with the "internal" parameter set to true, so by default nothing would happen. Then it called the class's URL-to-function mapper (needing the function data for its cache key, and the decision of whether a request was even cacheable or not) and could then do the logic of if-cacheable { if-not-cached { render normally, cache it } else { serve from cache }} else { render normally }.
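As a sketch, with only the handle_service entry point being real and all the helpers invented:

  <?php
  class Cacher {
      public static function handle_service($serviceClass) {
          $service = new $serviceClass(true); // "internal" construction
          $fn = $service->url_to_function($_SERVER['REQUEST_URI']);
          if (self::cacheable($fn)) {
              $key = self::cache_key($serviceClass, $fn);
              if (!self::is_cached($key)) {
                  ob_start();
                  $service->$fn();    // "render normally": call the mapped
                  $service->render(); // handler on the internal instance
                  self::store($key, ob_get_clean());
              }
              echo self::fetch($key); // serve from cache
          } else {
              $service->$fn(); // "render normally" again
              $service->render();
          }
      }
  }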

On the surface that's fine; it worked for nearly 10 years. But the trouble was that, prior to the cache, I had implemented authentication checking as part of the constructor calling chain! Most services just inherit from BaseService, but UserService inherits from SecureBaseService, which as part of its constructor checks whether the user is logged in (via a $_SESSION entry) before calling the parent constructor, redirecting otherwise. And AdminService inherits from SecureAdminBaseService, which further checks that the session var shows the logged in user is an admin before passing on to SecureBaseService (whose check is now redundant), which then goes to BaseService to do the work.
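In sketch form again; the session keys and redirect targets are guesses:

  <?php
  class SecureBaseService extends BaseService {
      public function __construct() {
          if (empty($_SESSION['user_id'])) {   // assumed session key
              header('Location: /user/login'); // assumed login URL
              exit;
          }
          parent::__construct(); // BaseService then does the work
      }
  }

  class SecureAdminBaseService extends SecureBaseService {
      public function __construct() {
          if (empty($_SESSION['is_admin'])) { // assumed session key
              header('Location: /');
              exit;
          }
          parent::__construct(); // redundantly re-checks login, then works
      }
  }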

With the cache, all Service classes were now instantiated "internal", so that whole constructor chain got skipped, and so did all the basic authentication, and the handler function started executing immediately. Oops.

I've now fixed it by making the cacher's "render normally" path just re-construct the service in non-internal mode, with no explicit render call (to avoid a double render). Still, the whole thing is super hacky, from the foundations up.
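So that path now looks something like this (the helper name is invented):

  <?php
  class Cacher {
      // ...
      private static function render_normally($serviceClass) { // assumed helper
          // re-constructing in non-internal mode runs the full constructor
          // chain, auth checks included, and rendering happens inside it --
          // hence no explicit render() here, avoiding a double render.
          new $serviceClass();
      }
  }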

I guess this serves as a good reminder of the practice of avoiding work in constructors beyond simply setting/acquiring some slots/properties of the object, but thinking back on code I've written in just the last few months, it's a practice I don't necessarily follow. The appeal of not having to make another function call after the constructor -- one that in practice would need to be made every time -- is too strong. I guess if you find a case where it doesn't need to be called every time, as I did with the "internal" parameter, that can be taken as a big warning signal that you should refactor and do some things explicitly.
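The refactored shape would be the boring, explicit one (handle() being a hypothetical method):

  <?php
  // constructors only set properties; the work is one explicit call
  // at the single dispatch site:
  $service = new $serviceClass();
  $service->handle(); // auth checks, url dispatch, render -- all explicit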

I'm also not sure whether it's an extra oops or just dumb luck that, because I didn't code defensively in areas of the blog only I use in expected ways, PHP died, I got an error email, and I detected this issue before something bad happened. Normal defensive coding would have checked each parameter and constructed a helpful error message telling the user that not everything was supplied, as all my public-facing forms do -- and if I had done that for the new post form, I'd still be ignorant that it was sitting there in public.

I see a resemblance to the argument people have about programs crashing hard and loud rather than trying to be defensive about everything and responding gracefully. I don't really want to get into that. Perhaps a better, more mature solution that sidesteps that issue would be to 1) actually have automated tests, especially around security-sensitive things (my blog was written before I valued automated testing) and 2) have monitoring, especially around security-sensitive things. A set of tests as simple as "this page can't be accessed by a guest, can't be accessed by a user not in the admin group, can be accessed by a user in the admin group" would have caught my error when I added the cache. Something as simple as an alert saying "hey, a different IP successfully accessed (response 200) this security-sensitive URL, maybe you should look into it if it's not just you from a different network" would have been sufficient.
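Sketched with PHPUnit, where get_as() is an assumed helper that makes a request with the given session state and returns the status code:

  <?php
  use PHPUnit\Framework\TestCase;

  class AdminAccessTest extends TestCase {
      public function test_guest_is_redirected_from_new_post_form() {
          $this->assertSame(302, get_as('guest', '/admin/post/new'));
      }

      public function test_non_admin_user_is_redirected_too() {
          $this->assertSame(302, get_as('user', '/admin/post/new'));
      }

      public function test_admin_gets_the_form() {
          $this->assertSame(200, get_as('admin', '/admin/post/new'));
      }
  }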

In the end, once again I've been burned by a cache, albeit this time more indirectly... Fortunately there wasn't any negative impact over the last decade. But I'm slightly more of the opinion that I should get around to doing the Lisp rewrite after all, since that would have fixed this problem too. And maybe I'd get automatic CSRF protection out of it as well!


Posted on 2021-11-15 by Jach

Tags: programming

Permalink: https://www.thejach.com/view/id/388

Trackback URL: https://www.thejach.com/view/2021/11/fixed_an_embarrassing_security_bug
