Rethinking my Webpage Generation

Published: 04/01/2020

Every time I do anything I keep going up levels of abstraction until, instead of accomplishing whatever I set out to do, I'm stuck pondering what it is I'm going to do with my life anyway. This time I stopped one level short and refactored my website. Hopefully everything looks exactly the same to you! It only took a couple of hours to achieve that. Since my original problem was wanting to write a blog post, I'll now circle back to it by writing about the myriad changes which are (hopefully) invisible to you. I'm not sure whether, to the average person, this is one level more or less interesting than me describing a dream I had or a notable bowel movement. These are the perils of starting a blog when you have nothing to say.

Basically, the problem my refactor was solving goes like this: in the beginning I had a blog post. Blog 0. Then there was a second blog post. Of course, around every blog post is some HTML that I want to be mostly the same across posts: menus to go forwards and backwards, headers to pull in my beautiful CSS. When you have just two or three posts, or you're confident you'll never want to change the layout, copy and paste is a satisfactory solution. But since I keep churning out garbage, and the layout is also garbage, a longer-term solution is required.

At this point the astute reader will cry out "Have you heard of WordPress? Or Substack? Or GitHub Pages?". Once upon a time I published on Google Blogger. Somehow I can't trust centralized services, though. Google Blogger hasn't been Google+ed yet (or Google Readered or Google Waved or ...) but it does have an air of the passé. Do I really want to publish on Google Blogger in 2020? And maybe today Medium is hip and cool and easy. But do I trust it to be those things in 2025? I guess it's just like my About Me page says: I'd rather be uniquely bad.

At this point the astuter reader may cry out "Have you heard of Angular? Or [my preferred front end stack]? Just because you're crafting your own bespoke website doesn't mean you have to figure out how to minimize HTML duplication yourself!". To which I reply: yes, but I cannot read. I'll never forget the time at an AI Safety Workshop when one of the organizers declared that they could not read. Ever since then I've realized that I can't really read either. I can't just pore over some documentation and know how to make a website. I think what I've done here constitutes a buggy and partial reimplementation of Angular. But goddamn it, it's my buggy and partial reimplementation and I know how to use it.

Design Doc

As I redesigned, I realized my vision was just slightly too big to fit in my head, so I wrote out a design document of sorts to track the big picture. This is a touched-up version of what I wrote for myself:

The central abstraction this redesign created is separating my webpages into three parts: content, template and data.

The content is the stuff that's truly unique to a given webpage. (Except not quite: after refactoring I added a podcast menu item and duplicated the content of my two podcasts, since I didn't want to break the original links.) In the case of this webpage, the content is the blog post itself.

The template is the HTML structure that gets duplicated around each page's content. Right now I have two templates: blog.temp (which should be called periodical.temp, since it's the template for both blogs and podcasts) and toplevel.temp, which is the template for my blog and podcast indexes, the aboutme page and my experiments page.

The data is what ties it all together. Each webpage has an associated JSON file, and these files are organized in a data/ folder with the same directory structure as the pages have on the final website. The data file holds any information needed to build the page, such as:

Links to its template and content files. (Originally I planned to just mirror the structure of data/ in template/ and content/, but since the only actual bijection is between data/ and pages on the website, this was a bad idea.)
Title, date and author.
The URL of the file. Originally this was generated by removing spaces from the title but now I specify it separately to allow me to have long titles and short urls.
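To make the data/ idea concrete, here is a minimal sketch that reads every data file under a directory into a nested index. It is not the script this site actually runs. The index has the shape described under Things I Learned: a list of (name, entry) tuples, where entry is a page's parsed JSON dictionary or, for a subdirectory, another list of the same shape. The sketch assumes every data file ends in .json. The function name build_index is hypothetical, and so are the key names in the example comment, apart from Title.

```python
import json
import os

# A page's data file might look something like this (keys other than Title
# are hypothetical guesses):
#   {"Title": "Rethinking my Webpage Generation", "Date": "04/01/2020",
#    "Author": "...", "URL": "...", "Template": "...", "Content": "..."}


def build_index(directory):
    """Return the site index for directory as a list of (name, entry) tuples.

    entry is the parsed JSON dictionary for a page's .json data file, or a
    list of the same shape for a subdirectory.
    """
    index = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isdir(path):
            # A directory becomes a nested list with the same structure.
            index.append((name, build_index(path)))
        elif name.endswith(".json"):
            # A page: strip the extension and keep its data dictionary.
            with open(path) as f:
                index.append((name[: -len(".json")], json.load(f)))
    return index
```

Entries come out in file-name order, which keeps the output deterministic. Ordering blog posts by publication date would be a separate sort over the blog directory's list.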

I designed a nice syntax for filling data and content into the template files. A Python script goes through the templates and replaces each of these tags with the appropriate information:

<: ... :> : The contents are a file path, and the contents of that file are inserted. I made snippets for my HTML header and side menu to keep the level-one templates cleaner.
<$ ... $> : The contents are a key in the page's JSON data file, and the corresponding value is inserted. For example, <$Title$>.
<[ ... ]> : The contents of [] are, as of now, always the string "Content". This tag is replaced with the page's content file. That file can be either an HTML file, in which case its contents are placed here verbatim, or a Python file. A Python content file contains a function generate, which takes my site's index and the file's data as arguments and returns the string that should replace this tag. Python scripts are allowed as content files because pages such as the blog index need to be procedurally generated from the list of blogs in the blog directory.
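Here is a minimal sketch of the tag replacement described above. It is not the script this site actually uses, and it makes a few assumptions:

- The text inside <[ ]> names the JSON key whose value is the content file's path.
- A .py content file defines generate(index, data).
- Paths in <: :> are relative to the working directory.

It loads .py files with importlib.util rather than imp, since imp is deprecated and was removed in Python 3.12. The names expand, read_file and load_generate are hypothetical, as is the read parameter, a hook that lets the expansion be tried without touching the disk.

```python
import importlib.util
import re


def read_file(path):
    """Return the whole file as one string."""
    with open(path) as f:
        return f.read()


def load_generate(path):
    """Import the .py file at path and return its generate function."""
    spec = importlib.util.spec_from_file_location("content_page", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.generate


def expand(template, data, index, read=read_file):
    """Fill in one template for one page (a sketch, not the site's real script)."""
    # <: path :> becomes the contents of that file (header, side menu, ...).
    template = re.sub(r"<:(.*?):>", lambda m: read(m.group(1).strip()),
                      template, flags=re.DOTALL)

    # <$ Key $> becomes data[Key]. This runs after the snippet pass so that
    # snippets can use data tags too.
    template = re.sub(r"<\$(.*?)\$>", lambda m: str(data[m.group(1).strip()]),
                      template)

    # <[ Key ]> becomes the page's content. Assumption: data[Key] is the
    # content file's path. A .py file is run for generate(index, data);
    # anything else (e.g. .html) is inserted verbatim. This pass runs last,
    # and re.sub never rescans its replacements, so tags inside a post survive.
    def content(m):
        path = data[m.group(1).strip()]
        if path.endswith(".py"):
            return load_generate(path)(index, data)
        return read(path)

    return re.sub(r"<\[(.*?)\]>", content, template)
```

The order of the three passes is the main design choice. Snippets go in first so that a header snippet can use <$Title$>. Content goes in last so that a post which talks about the tag syntax, like this one, doesn't get its examples substituted.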

Things I Learned

Whenever I write even the simplest code I end up googling a dozen different things most of which I can't help but think I knew at one point. Here is a list of things I learned while doing this refactor. I've written it out for the twin purposes that maybe something in the list will be useful to you and maybe by writing these things out I'll actually remember them and not need to google them in the future. In the past I've always had a dismissive attitude towards memorization. What value could there be in actually knowing things when I can just outsource all my knowledge collection and storage to digital systems I don't control? But I'm starting to think all the time spent Googling adds up and there's value in having knowledge conveniently stored in the shape of your synapses.

sed - I've definitely known about this Unix utility before, but not well enough to use it without referencing the man pages. I used sed -i 's/content\/blogs/blog/g' * to quickly change the paths to content files for all my blogs. The -i flag means edit in place; without it, sed prints the modified file to standard output instead.
:wa - vim's "write all" command. To be honest I'm not sure how I used vim for so long without knowing about it. I guess my primary vim use cases have been writing TeX documents, writing programs for Codeforces and writing blog posts, none of which led to opening a lot of buffers. This refactor was a more complicated process, and learning about this command definitely saved some time.
Python's imp module - I'm not sure that this is the most elegant or Pythonic thing, but in order to allow my content file to be a function in some .py file I used the imp module. I've actually used it before. In the Python interpreter, if you import foo, edit foo.py, and then import foo again, the module foo that you have doesn't change, because Python caches modules that have already been imported. To get around that I've used imp.reload. Now, to import a Python file name from directory dire, I import find_module and load_module from imp and use the code:
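```
from imp import find_module, load_module  # imp is deprecated; removed in Python 3.12
fp, pathname, desc = find_module(name, [dire])
mod = load_module(name, fp, pathname, desc)
fp.close()  # find_module opened the file, so the caller has to close it
```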
I'm not sure this is the most pythonic way to do this so please let me know if there's a better way.
strptime and strftime are great. The most interesting thing I'm going to remember is that %Y is the full four-digit year and %y is just the last two digits of the year. I would have named them the other way around, personally.
In Python, s.find('ex') returns the index of the first 'ex' in the string s (or -1 if there isn't one). I learned about rfind, which searches from the right, i.e. it finds the last occurrence of 'ex'.
Somehow I didn't know about f.read(), which just reads the whole file as a string. I guess in all my previous use cases it just made sense to go readline by readline. But I really wanted my template files all in memory.
I didn't know about isinstance in Python. I think my use of it is probably something of an anti-pattern: surely I should keep track of which of my objects are lists and which are dictionaries. But I decided to structure the index of my site as a list of tuples. The first entry of each tuple is the name of the file or directory. The second entry is either the data dictionary, if it was a file, or a list with the same recursive structure, if it was a directory. For similar reasons I learned about Python's inline if/else notation (x if condition else y). I definitely wrote some lambdas which deserve to be elevated to named functions.
While we're talking about lists of tuples: of course dict(list) will turn a list of tuples into a dictionary, with each tuple's first entry as the key and second entry as the value. But it was important that I used lists here, because I wanted blogs to be ordered by publication date. And it wouldn't make sense to have a dictionary of dictionaries here, because conceptually there are two types of dictionaries: those mapping file names to file data and those mapping file attribute names to file attribute values.
I learned about Python's utilities for navigating a file system. I used os.listdir and os.path.isdir. Looking more carefully at the documentation, I realize I probably should also have used os.walk, os.path.join and str.endswith. But I'm happy with my hacky string additions and splittings.
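Several items in the list above meet when a procedurally generated page, like the blog index, walks the site index. The sketch below is one way to do it, not the code this site uses. It assumes:

- the nested (name, entry) shape described above;
- a Date field assumed to be written month/day/year;
- hypothetical helper names iter_pages and newest_first, and a made-up sample index.

It uses isinstance to tell directories from pages, an inline if/else, dict() over a list of tuples, and strptime/strftime.

```python
from datetime import datetime


def iter_pages(index, prefix=""):
    """Yield (path, data) for every page in a nested list-of-tuples index."""
    for name, entry in index:
        path = prefix + "/" + name if prefix else name  # inline if/else
        if isinstance(entry, list):  # a directory: recurse into its entries
            yield from iter_pages(entry, path)
        else:  # a page: entry is its data dictionary
            yield path, entry


def newest_first(pages, date_format="%m/%d/%Y"):
    """Sort (path, data) pairs by their Date field, most recent first.

    Assumption: dates look like 04/01/2020 (month/day/four-digit year).
    """
    return sorted(pages,
                  key=lambda page: datetime.strptime(page[1]["Date"], date_format),
                  reverse=True)


# A hypothetical index with one top-level page and a blog directory.
index = [
    ("aboutme", {"Title": "About Me", "Date": "01/15/2020"}),
    ("blog", [
        ("blog0", {"Title": "Blog 0", "Date": "12/01/2019"}),
        ("webpage", {"Title": "Rethinking my Webpage Generation", "Date": "04/01/2020"}),
    ]),
]

blogs = dict(index)["blog"]  # dict() turns the (name, entry) tuples into a lookup
for path, data in newest_first(iter_pages(blogs, "blog")):
    print(path, data["Title"])

# %Y is the four-digit year, %y only its last two digits: prints "2020 vs 20".
print(datetime(2020, 4, 1).strftime("%Y vs %y"))
```

Sorting happens on parsed datetime objects rather than on the raw strings. As plain text, "12/01/2019" sorts after "04/01/2020", which would put the older post first.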