Technical Details

Disclaimer: this is not going to be a rigorous audit of my stack but instead a brain dump of my approach and corresponding thoughts.

An Obligatory Note on AI

I used AI to build this project; if I hadn't, it wouldn't exist.

That said, I try to use it in a manner that is both well-motivated and responsible (I am open to the argument that responsible AI use doesn't exist, but I'll defer that conversation for now). I use the Plan+Execute pattern: I spend a significant amount of time writing a plan for the feature or fix I want, then instruct the agent to implement it, usually in one shot. I think this kind of project is also an ideal use case for AI, as much of it is instantly verifiable and the stakes are very low. With most of the output being a database schema or an interactive frontend, I can check and iterate in real time, with ultimately zero consequences if something goes wrong.

I do not (and never will) use AI for any written text or research in this project. I respect people far too much to deceive them into trudging through slop masquerading as human dialogue; my writing may not necessarily be good, but at least it's mine. On the research side, every geospatial geometry and data point has been searched for and added by me; how else would I trust any of the data and present it in good faith?

Networking

I understand embarrassingly little about networking, and am frankly a little scared of the security holes that would appear if I did it myself, so I'm using Cloudflare for the necessary routing and domain handling. I'm also renting a Hetzner server (a CAX21, to be precise) to actually host the site. I went through the process of self-hosting my personal site a couple of months ago on my old PC running Unraid and, while an interesting exercise, it doesn't make sense for a chunkier application like this. On a general level, everything is containerised with Docker Compose for simple separation and reproducibility across local and prod.

Tabular

I'm using a Postgres database to store basically all the data I'm collecting. I started with a single table modelling data centers and their attributes, like companies or dates. This soon proved too wide and was incapable of modelling multiple relationships (e.g. several companies might be co-investors), so I instituted an entity-attribute-value model, with separate tables for investments or dates that link back to the original datacenters table.
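A minimal sketch of that layout, demonstrated with in-memory SQLite rather than the real Postgres setup; all table and column names here are illustrative guesses, not the actual schema:

```python
import sqlite3

# Hypothetical, simplified entity-attribute-value layout: a narrow core table
# plus a side table per relationship type, so one site can have many investors.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE datacenters (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- One row per investment, so several companies can co-invest in the
    -- same site without widening the datacenters table.
    CREATE TABLE investments (
        datacenter_id INTEGER REFERENCES datacenters(id),
        company       TEXT NOT NULL,
        amount_usd    INTEGER
    );
""")
conn.execute("INSERT INTO datacenters VALUES (1, 'Example Campus')")
conn.executemany(
    "INSERT INTO investments VALUES (?, ?, ?)",
    [(1, "Company A", 500_000_000), (1, "Company B", 250_000_000)],
)
# A multi-party relationship now comes back as multiple rows.
rows = conn.execute(
    "SELECT company FROM investments WHERE datacenter_id = 1 ORDER BY company"
).fetchall()
print([r[0] for r in rows])  # ['Company A', 'Company B']
```

Adding a new attribute type (dates, permits, whatever) is then a new side table rather than another nullable column.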

This evolved naturally as the data model got more and more complex. I didn't have to start with a complete holistic view of what a data center actually looks like; the data model could develop at the same pace as my mental model.

I set up a dbt project for downstream models and applications that require joins across entities or additional logic. This lets me go beyond just presenting recorded facts, for example estimating unknown features like the capacity of a prospective site from known ones like building_sqft or investment.
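As a toy illustration of the estimation idea (the real logic lives in dbt models, and all numbers here are made up): fit capacity against building_sqft for sites where both are known, then predict it where capacity is missing.

```python
# Made-up training pairs of (building_sqft, capacity_mw) for known sites.
known = [
    (100_000, 20.0),
    (250_000, 50.0),
    (500_000, 100.0),
]

# Least-squares slope through the origin: capacity ≈ k * building_sqft.
k = sum(s * c for s, c in known) / sum(s * s for s, _ in known)

def estimate_capacity_mw(building_sqft: float) -> float:
    """Predict capacity for a prospective site from its building footprint."""
    return k * building_sqft

print(round(estimate_capacity_mw(300_000), 1))  # 60.0
```

In practice this would be a SQL model over the joined entity tables, but the shape of the inference is the same.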

When I was first building out the data model I added data manually via DBeaver, which proved quite painful once the complexity ramped up, so I built a local admin dashboard with Streamlit that's hooked up to Postgres and lets me add, remove, and edit entries easily. The main benefit is that I can focus on collecting all of the data for a single data center without having to switch between tables; it also reduces the scope for data entry errors.

The local database is the single source of truth (with regular offsite backups), and I copy it over to the production server via a manual deploy script. It's tempting to overengineer with GitHub Actions and other nonsense, but realistically this is a single-dev project and I can afford to be a little scrappy with my deployment. Finally, data is exposed to the frontend application via FastAPI.
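The deploy script might look something like the following sketch: dump the local database, copy it across, restore it on the server. The host and paths are hypothetical, and the commands are printed here rather than executed.

```python
def deploy_commands(db: str = "datacenters",
                    host: str = "prod.example.com") -> list[list[str]]:
    """Build the dump -> copy -> restore command sequence (illustrative)."""
    dump = ["pg_dump", "--format=custom", "--file", "backup.dump", db]
    copy = ["rsync", "-avz", "backup.dump", f"{host}:/srv/backups/"]
    restore = ["ssh", host, "pg_restore", "--clean",
               "--dbname", db, "/srv/backups/backup.dump"]
    return [dump, copy, restore]

for cmd in deploy_commands():
    print(" ".join(cmd))
    # subprocess.run(cmd, check=True)  # would actually execute on a real setup
```

Scrappy, but for a single-dev project a visible, manual sequence like this is easy to reason about.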

Geospatial

The thing that drew me most to this project was the idea of mapping out all of the data centers and their respective sites. There is something innately appealing about seeing things on a map and being able to track their progress over time (or lack thereof).

I'm storing the geometries in PostGIS, which links nicely to the other data that I collect. Geometries are drawn in QGIS via a custom plugin that connects directly to the database; this is much better than manually exporting geoms and uploading them to PostGIS with a script, which is what I did to begin with.

In order to collect the images for the remote sensing layer, I have a simple pipeline that, for each data center:

  1. Gets a bounding box and date range based on the geometry and recorded dates
  2. Checks whether images have already been collected for the given bounds and skips any that have
  3. Does a STAC search to grab the relevant images
  4. Removes cloud cover (if relevant) and reprojects
  5. Writes (or appends) to a source Zarr file
  6. Writes to a web Zarr with additional reprojection and rechunking for web serving
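Taken together, the steps above might be sketched like this; the STAC search and Zarr writes (steps 3-6) are stubbed out, since the real versions would lean on libraries like pystac-client and xarray, and every name here is illustrative.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Site:
    name: str
    lons: list[float]   # geometry vertices (pulled from PostGIS in reality)
    lats: list[float]
    start: date         # derived from the recorded dates
    end: date

def bounding_box(site: Site, pad: float = 0.01) -> tuple[float, float, float, float]:
    """Step 1: (min_lon, min_lat, max_lon, max_lat), padded slightly."""
    return (min(site.lons) - pad, min(site.lats) - pad,
            max(site.lons) + pad, max(site.lats) + pad)

def process(site: Site, seen: set[tuple]) -> bool:
    """Steps 2-6: skip already-collected bounds, otherwise fetch and write."""
    bbox = bounding_box(site)
    if bbox in seen:                                  # step 2
        return False
    # items = stac_search(bbox, site.start, site.end)  # step 3 (stub)
    # imgs  = mask_clouds_and_reproject(items)         # step 4 (stub)
    # append_to_source_zarr(site.name, imgs)           # step 5 (stub)
    # write_web_zarr(site.name, imgs)                  # step 6 (stub)
    seen.add(bbox)
    return True

site = Site("example-campus", [-97.74, -97.73], [30.26, 30.27],
            date(2023, 1, 1), date(2024, 1, 1))
seen: set[tuple] = set()
print(process(site, seen), process(site, seen))  # True False
```

The skip check in step 2 is what makes the pipeline cheap to re-run: unchanged sites are a no-op.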

TiTiler then reads the web Zarrs and makes them available to the map view. I run the pipeline manually on my local PC and rsync the images to my server. I experimented with running it on a cron schedule on the server itself, but I ran into memory issues and it was trickier to debug and manage. I'll adopt a more sophisticated orchestration solution later if it proves necessary.

Frontend

The static frontend is Astro, because that's the go-to for Sonnet 4.6 nowadays and frankly I do not care enough about frontend frameworks to disagree. To be clear, this doesn't mean I don't care about aesthetics; I just don't care about what goes on behind the scenes to render said aesthetics, and my uninformed assumption is that it doesn't matter much either. I also don't use GenAI for logos or art, so I mocked up the current logo and favicon in Affinity. Feel free to scroll up the page and judge my graphic design abilities now.

For the map application it's a little more involved, as there are many more degrees of freedom design-wise and I'm much more picky about the end result. I'm using deck.gl to render the map and visualisations, again leaning heavily on coding agents to sort out the actual code. I think this is an ideal use for LLMs, as broadly what you see is what you get: I have instant feedback on the output of these models, and in this case interacting is testing. Undoubtedly edge cases get introduced, and I don't have the frontend experience to steer us away from bad design choices the way I can on the backend, but this is an acceptable compromise when I can spin something up in hours that would previously have taken me weeks.

Testing

I don't have tests.

Next Steps

I'm going to continue building and see where it takes me :)