TheJach.com

Jach's personal blog

(Largely containing a mind-dump to myselves: past, present, and future)
Current favorite quote: "Supposedly smart people are weirdly ignorant of Bayes' Rule." William B Vogt, 2010

Time for a blog rewrite?

I wrote this blog from scratch over a decade ago. It's still serviceable, and pretty easy to go in and edit stuff or add new things. I still like the little micro-framework I use. The language is PHP.

But a decade has come and gone, and the PHP I liked then is not the PHP in fashion now... My server is running PHP 5.5, but the code is effectively capped at 5.3, and even then uses very little features beyond the basics that PHP 5 added. That is, while other PHP projects I've worked on used things like namespaces (!) and big frameworks (!) like Yii or Symfony, this old blog is just fine with a handful of PHP 5 classes.

So what's got me thinking about a rewrite? I've heard the upgrade from PHP 5 to PHP 7 is relatively painless (my concern without looking at it is that I have to go through the code again to spruce up potential unicode issues, which I last did in 2011...). However even if that's so, and I test it out locally and do the migration work (I don't even have PHP on my gentoo system right now, partly because v5 isn't in mainline portage anymore) then I still have another obstacle: Ubuntu.

My server is running on an archaic Ubuntu 14. I'm pretty sure I started with 10 and have upgraded it overtime, but everything I've seen (and my own experience upgrading my box at work) indicates that 14 to 16 is not going to be pleasant. Especially because of the PHP update, but other stuff on the server too. I'm using old versions of Python for stuff. (I had to manually compile Python 3.5 just to keep my Matrix homeserver up, but even that's too old now for the latest.) I'm using Postgres, MySQL/MariaDB, Apache... Anyway the prospect of updating it all is a bit annoying to think about. I suspect the easiest way to do it would be to just start from scratch with a new instance running a new server OS from amazon (probably a modern ubuntu; I'd like to use gentoo, but I don't trust any of the community builds, and last time I tried to compile my own gentoo AMI the process was broken because of new amazon API changes) then port and test everything piecemeal, eventually switching over the assigned IP to the new machine when it's ready. I'll probably still have to do that at some point.

But before that, perhaps a blog rewrite is in order!

I'm older and grumpier now and I've lost a lot of passion for learning new things for the sake of learning. Now I'm much more choosy in what I want to learn. (I still am a chronic dabbler as things catch my fancy, but dedicated learning is different.) When it comes to technology, I want to devote my time towards things that reward mastery. My editor of choice, vim, is such a tool, and I'm glad I started learning it way back near the beginning. Over such a long period it hasn't fundamentally changed, it's still very effective for editing.

Most programming languages aren't like that. I learned PHP around the death of 4, and really only learned enough about 4 to know some of the bad stuff I had to avoid or could be thankful to 5 for fixing. I really got going with 5 and never looked back (while I still used it anyway). But PHP doesn't reward mastery. I learned Python around the time of 2.3 or 2.4, and I'm still a die-hard v2er (Tauthon forever?), but the v3 debacle alone is enough to show that Python also doesn't reward mastery.

C rewards mastery. C++ does not reward mastery, it's changing and breaking at quite an incredible pace this last 10 years, and in many cases not in ways that interest me to bother with.

I could go through other languages I know or used, but I'll first jump to the conclusion. I want to devote all my XP points into leveling up with Common Lisp. CL rewards mastery, and there's a lot to master.

I know that I "should" update my knowledge about PHP (which is stick at 5.3), JS (which is stuck around ES2015, Node from 2012), C++ (stuck at C++98), Python (stuck at 2.7 plus awareness about what things are in __future__), Java (which is stuck at 8), Clojure (which is stuck at whatever version was out in 2012)...

But why? I have no job incentive (part of why my Java is stuck at 8, we haven't upgraded to even 11 yet) so internal motivation is what's left. Furthermore the refreshers aren't in general anything groundbreaking ("omg you guys these things called lambdas are crazy cool!"), just changes to the defaults or idioms that I have to get used to. And none of these languages reward mastery -- if I refresh, I know I'll just have to refresh again in another year or ten. What a hapless existence is this sort of treadmill. I'd rather just only bother refreshing should a job require it. This is a big difference between current me and my younger self.

Meanwhile Common Lisp is timeless and won't ever die. If I wrote sophisticated code in the 90s, chances are it would work today, unmodified. And that code could have had all the great things people rave about, and more! (No one else has Lisp's condition system.) Sure, the ecosystem has changed (new/better libraries, quicklisp and qlot), but that's like the change in Javaland from dumping all your thirdparty jars into a thirdparty folder and building with ant, to using maven and automatically fetching stuff from maven central. It's not a change that impacts your mastery of the language, just a new way of getting things done productively.

I want that stability coupled with the most expressive language for the next decade. Since I can't get it at work, I have to find abstinence elsewhere.. so why not rewrite my blog in Common Lisp? I can even be very server agnostic at that point, since deployment could be a single binary or more likely a file tree similar to now but with an SBCL image binary built separately from the OS distribution. (Assuming I go with ubuntu anyway; gentoo at least keeps up to date.)

So as a bit of preliminary (and to help my future self) I'll describe the current design of this blog, and end with some short considerations for what using Lisp implies I do pretty different.

To start with, let's look at the DB schema. As mentioned, I'm using MySQL. Not exactly because I like it (though I don't dislike it) but because that's what the web world used when I learned, and I haven't had any great reason to switch to Postgres, even if I like Postgres' design better. This site uses a 'database' (schema) called 'jach_thejach'. Here are the tables:


+------------------------+
| Tables_in_jach_thejach |
+------------------------+
| arrays |
| comments |
| email_followup |
| groups |
| posts |
| sessions |
| tags |
| users |
+------------------------+


The most important table is the blog content, stored in posts:


+-----------+------------------+------+-----+---------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------+------------------+------+-----+---------------------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| author_id | int(10) unsigned | NO | | NULL | |
| title | varchar(255) | NO | | NULL | |
| url_title | varchar(255) | YES | | NULL | |
| content | longtext | NO | | NULL | |
| karma | int(10) | NO | | 0 | |
| post_date | timestamp | NO | | CURRENT_TIMESTAMP | |
| mod_date | timestamp | NO | | 0000-00-00 00:00:00 | |
+-----------+------------------+------+-----+---------------------+----------------+


Nothing too crazy here: id, author id linking to the users table, post title, url title, content, post date, last-edit date. The odd one out here is 'karma', which I've never used. I once had ideas for upvoting/downvoting but they never materialized.

Users is similarly basic:


+-----------------+------------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+------------------+------+-----+-------------------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| group_id | int(10) unsigned | NO | | 3 | |
| user_name | varchar(50) | NO | UNI | NULL | |
| pass | char(50) | NO | | NULL | |
| email | varchar(100) | NO | UNI | NULL | |
| register_date | timestamp | NO | | CURRENT_TIMESTAMP | |
| reset_pass_conf | char(40) | YES | | NULL | |
+-----------------+------------------+------+-----+-------------------+----------------+


The 'pass' column is a mistake, I'll talk about it when I get to describing the user login flow. (It's a combination salt + sha1 (!) hash.) The two columns that might otherwise stand out are group_id and reset_pass_conf. The latter is for handling the "forgot password" flow in a simple way -- you can request a reset (there's no rate limiting here!!!) and I generate a random string, store it in that column, and email it. If it gets clicked, well it seems I never implemented that part. :/ If I did it again, I would do it differently, so this column can go away / be ignored for the rewrite.

The 'group_id' column corresponds to the group table, which was my way of designating user types/roles. Let's see:


+-------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| name | varchar(50) | NO | UNI | NULL | |
| info | varchar(200) | YES | | NULL | |
+-------+------------------+------+-----+---------+----------------+
mysql> select * from groups;
+----+-------+--------------------------------------------------------------+
| id | name | info |
+----+-------+--------------------------------------------------------------+
| 1 | root | Jach |
| 2 | admin | Any future possible helpers like moderators or other posters |
| 3 | user | Regular user posting comments only |
+----+-------+--------------------------------------------------------------+


Including me and a test account there are only 8 registered users (my discouragement to register has been successful!). For the rewrite, I'm thinking I might just do away with visible users entirely.. I could still make a login page to authenticate myself, but don't need to link it or put the form on the sidebar, and if it's just me I don't even need a table for it.

Let's look at comments next. There are only 280 comments on this blog, 188 if you exclude my own, 34 if you exclude my own and non-guests. (32 from one person I no longer really talk with, 2 from some rando.)


+--------------+------------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+------------------+------+-----+-------------------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| user_id | int(10) | NO | | 0 | |
| post_id | int(10) unsigned | NO | | NULL | |
| display_name | varchar(50) | YES | | NULL | |
| comment | text | NO | | NULL | |
| karma | int(10) | NO | | 0 | |
| ip_addr | varchar(15) | NO | | NULL | |
| post_date | timestamp | NO | | CURRENT_TIMESTAMP | |
| email | varchar(255) | YES | | NULL | |
| sendemail | tinyint(1) | NO | | 1 | |
+--------------+------------------+------+-----+-------------------+----------------+


Comment id, user id (defaults to 0 for guest comments). Another unused 'karma' column. If commenters specify an email, I store it, and if they want to be emailed on followup comments, I store that choice too. A better design would be to have a separate table so that when it comes time to send the alerts, I don't have to dedupe... There is some sign of such thinking in the email_followup table, but it appears that 0 code uses this table, so I don't really know what exactly I was thinking when I created it:


+------------+------------------+------+-----+-------------------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+------------------+------+-----+-------------------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| user_id | int(10) unsigned | NO | | NULL | |
| post_id | int(10) unsigned | NO | | NULL | |
| times_left | int(10) | NO | | 1 | |
| start_date | timestamp | NO | | CURRENT_TIMESTAMP | |
+------------+------------------+------+-----+-------------------+----------------+


Especially the times_left column. Eh?

The 'sessions' table is a by-the-book kind in order to have PHP store session info in the DB, as opposed to tmp files or whatever.


+---------------+-----------+------+-----+-------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
+---------------+-----------+------+-----+-------------------+-----------------------------+
| id | char(32) | NO | PRI | NULL | |
| data | text | YES | | NULL | |
| last_accessed | timestamp | NO | | CURRENT_TIMESTAMP | on update CURRENT_TIMESTAMP |
+---------------+-----------+------+-----+-------------------+-----------------------------+


The final two tables are related, tags and arrays.


mysql> describe tags;
+--------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+--------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| tag_name | varchar(255) | NO | UNI | NULL | |
| tag_array_id | int(10) unsigned | NO | UNI | NULL | |
+--------------+------------------+------+-----+---------+----------------+
mysql> describe arrays;
+-------------+------------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+-------------+------------------+------+-----+---------+----------------+
| id | int(10) unsigned | NO | PRI | NULL | auto_increment |
| array_id | int(10) unsigned | NO | MUL | NULL | |
| array_index | int(10) | NO | | 0 | |
| array_value | int(10) | NO | | 0 | |
+-------------+------------------+------+-----+---------+----------------+


This is the design I came up with when I decided to add the tag cloud to my blog... I think I'd need to rethink it in the rewrite. But the idea is that when I add tags to a new post, if the tag has never been seen before, it gets a new entry in the tags table first, and a new 'tag array id' assigned to it. If the tag has been seen before, then we just read out the 'tag array id'. This id is unrelated to the arrays' 'id' column, which is just there to provide a primary key. No, it instead maps to the 'array_id' column. The 'index' is the position in this pseudo-array, starting from.. 1. Finally the array_value is the post id. For example, in my last post, the only tag was 'artificial intelligence', a tag which had 3 posts with it prior. After posting a new row was made:


| 806 | 53 | 4 | 368 |


i.e. tag id 53, index (also since it's the latest, it corresponds to total count of that tag) is 4, post id is 368.

As you can imagine, this design is fragile and so I never bothered to support editing or deleting tags...

Anyway that summarizes the database schema. Overall there's not much to change

As it's late, this post will be continued in a part 2... stay tuned.


Posted on 2020-02-04 by Jach

Tags: lisp, php, programming

Permalink: https://www.thejach.com/view/id/369

Trackback URL: https://www.thejach.com/view/2020/2/time_for_a_blog_rewrite

Back to the top

Back to the first comment

Comment using the form below

(Only if you want to be notified of further responses, never displayed.)

Your Comment:

LaTeX allowed in comments, use $$\$\$...\$\$$$ to wrap inline and $$[math]...[/math]$$ to wrap blocks.