Output
One of the interesting things about having a blog is that it lets me track some of my own written output over time. I have a super-secret (okay you can access it if you guess the URL) page that displays my total word count to date as well as my average post length and my two longest posts. When I want more data I'll just ssh into the server and do some mysql queries. As of today: I've written 183,498 words over 3 years. Not very much, relatively. I'm still 816,502 words short of my goal of one million words.
Why one million? Part of the reason for starting this blog was to reach one million words in an easily traceable way. My written output didn't start becoming significant until 2003 or so, so there are 6 years of "lost" output I'm not counting, but I figure I can write it off as statistical error in the end. So again, why one million? I believe that everyone's first million written words are crap. If you care about writing you should try to get those million words out of you as fast as you can.
I saw this notion expressed recently but for artists. Lauren Faust mentioned here:
"Look for guidance and art education wherever you can, but the biggest, most important thing of all is to draw, draw, draw and never stop drawing. Imagine you have several thousand crappy drawings you have to get out of your system before your[sic] any good and try your damnedest to get those crappy drawings out as fast as you can."
See Full Post and Comments
An explanation and example of Naive Bayes
Here I embark on a slow, carefree explanation of Naive Bayes, but if you're just interested in code, or the single line of math Naive Bayes takes, then look to the bottom. Naive Bayes is an algorithm to classify things. It's probably most popular for its use in spam classification, but you can use it for pretty much anything else and it's somewhat embarrassing how successful it's been for all these years compared to anything else. Moving away from Naive Bayes to, say, a fully Bayesian system, carries a large computation cost for frequently little benefit on the types of problems Naive Bayes is good at.
Allow me a brief biblical tangent. In Genesis, God created Adam, the first man. God saw it was not good for the man to be alone, so he decided to create a helper for Adam. At the same time, he had Adam name, i.e. classify, all the creatures brought before him:
And out of the ground the LORD God formed every beast of the field, and every fowl of the air; and brought them unto Adam to see what he would call them: and whatsoever Adam called every living creature, that was the name thereof.
And Adam gave names to all cattle, and to the fowl of the air, and to every beast of the field; but for Adam there was not found an help meet for him.
--Genesis 2:19-20(KJV)
See Full Post and Comments
Migrating and remerging perforce histories into git with git grafts
I recently migrated a perforce project to git. The perforce project had some branches, but the git-p4 script sucks with branches. (Well, it's really perforce, but I digress.) Anyway, as a first step I got all the branches from p4 into the git repo as totally separated branches.
$ git-p4 clone //open/dev/@all project
# 'open/dev/' becomes the new 'master'
$ cd project
$ git-p4 sync --branch=dy //open/dy/dev/@all
I repeated the sync command for the rest of the branches. So now I had everything, but if you looked at a graph of the git commit history you would see four "swim lanes" for the four branches, each completely separated from the others. Naturally we made integrations on the perforce side to merge things together, but git-p4 sucks at detecting that. After some trial and error I finally got most of those integrations back. (Some integrations were from branches we're not migrating into git so they sort of just pop into existence.)
See Full Post and Comments
Removing crap from a git repository's history
When I google "git remove from history" (because I frequently forget the exact sequence of commands as I don't have to remove history very often), this is the first result. It almost works. Don't use it, use the second result. (Another point in favor of the second link: the first is from 2009, while the second is from github itself, and they're pretty good at keeping their material up-to-date with recent versions of git.)
My current git version is 1.7.3.4; not the bleeding edge, but if you're using 1.7 at the end of 2011 you're generally in good shape. Anyway, here is the "I don't know, I don't wanna know" version of getting rid of crap you don't want, with some commentary in between:
$ du -sh .git
946M .git
See Full Post and Comments
Is this really going down? God is not great?
So this is trending on Twitter:
(Reactions from confused theists are hilarious.) Apparently it's due to the recently deceased Christopher Hitchens' book of the same title. I wonder how long this trending topic will last...
See Full Post and Comments
Thoughts On Immortality, Cryonics
"Remembering that I’ll live forever is the most important tool I’ve ever encountered to help me make the big choices in life. Because almost everything – all external expectations, all pride, all fear of embarrassment or failure – these things just fall away in face of endless time, leaving only what is truly important."--The immortal Steve Mobs
People say a lot of nice things about death. Even atheists who should know better. It's depressing. An immortalist can often counter those things directly, trying to argue why in fact they're not nice, but there are other roads one can take.
An elevator pitch against them all is: "suppose there was a civilization much like ours, only no one died from old age or sickness because their medical technology was pretty advanced. (Some still died from suicide or catastrophic accidents but those were rare and decreasing every year.) Do you think they would want death around ages 70-90 forced upon them like we currently have?"
See Full Post and Comments
A basic application of logarithms
You may have noticed I finally made my tag cloud an actual cloud with different font sizes based on how many posts per tag. This has been on my todo list for over a year. Why did it take so long for me to get around to it? Probably because it was so simply boring. Boring work gets put off.
Anyway, the "secret" of my implementation is just taking the natural log of the total posts for each tag, then rounding to the nearest integer. I then subtracted that number from 5, and made the resulting number the <hX> number for a header. (With <h1> being the largest size.) (I also used the number 4 as a max for the rounded log result in case it was bigger.)
The log approach was the first idea I had over a year ago for how to implement it, and the results don't look too bad. One downside is that once tag counts pass 33 (about e^3.5), they'll all start receiving the biggest size of text. To counteract that, one solution is to increase the log base from e to 3 (needing 47 posts to get the biggest font) or even 4 (needing 128). That's a tiny bit of manual work every so often, which isn't bad. There was another solution I thought of but didn't try, though it would keep me from needing to do manual work.
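For the curious, here is a minimal sketch of the scheme described above, written in Python purely for illustration (it is not the blog's actual code). The function name `heading_level` and its parameters are made up for this example, and it assumes every tag has at least one post and that "rounding to the nearest integer" means rounding halves up.

```python
import math

def heading_level(post_count, base=math.e, max_step=4):
    """Return N for an <hN> tag, where <h1> is the largest size.

    Hypothetical helper: assumes post_count >= 1, since log(0) is undefined.
    """
    # Log of the tag's post count, rounded to the nearest integer (halves up).
    step = math.floor(math.log(post_count, base) + 0.5)
    # Cap the rounded log at 4 so the result never goes below <h1>.
    step = min(step, max_step)
    # Subtract from 5: a rounded log of 0 gives <h5>, 4 or more gives <h1>.
    return 5 - step

# Example with made-up tag counts.
for count in (1, 3, 10, 33, 34):
    print(count, "posts -> h%d" % heading_level(count))
```

Passing `base=3` pushes the biggest size out to 47 posts, which matches the numbers above. Exact thresholds such as 128 posts with base 4 sit right on a rounding boundary, so floating point error could tip them either way.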
See Full Post and Comments