Alex Collie's blog

Posts on Projects

  1. Choosing a gRPC communication strategy

    Overview of the problem

    One of the key parts of the design of my search engine is the ability for the spiders to send the pages which they have explored to a central point. A more conventional design might use a pull system, where the conductor polls each spider in turn and requests the pages it has seen, using pagination. This approach has several problems.
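
    One alternative, sketched below, is to invert the flow: each spider pushes its pages to the conductor over a client-streaming gRPC call, so the conductor never has to poll or paginate. This is only a rough sketch; the service, stub, and message names (ConductorClient, SubmitPages, pb.Page) are illustrative stand-ins, not the actual API from the post.

    ```go
    package main

    import (
        "context"

        "google.golang.org/grpc"
        "google.golang.org/grpc/credentials/insecure"

        pb "example.com/search/proto" // hypothetical generated gRPC stubs
    )

    // pushPages streams crawled pages from a spider to the conductor over a
    // client-streaming RPC, so no polling or pagination is needed on the
    // conductor's side.
    func pushPages(ctx context.Context, addr string, pages []*pb.Page) error {
        conn, err := grpc.NewClient(addr, grpc.WithTransportCredentials(insecure.NewCredentials()))
        if err != nil {
            return err
        }
        defer conn.Close()

        stream, err := pb.NewConductorClient(conn).SubmitPages(ctx)
        if err != nil {
            return err
        }
        for _, p := range pages {
            if err := stream.Send(p); err != nil {
                return err
            }
        }
        _, err = stream.CloseAndRecv() // conductor acknowledges the batch
        return err
    }
    ```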

  2. Search Engine Update 1

    As I mentioned in my last post, I’ve been busy building my search engine lately, and it’s been a lot of fun diving deep into Kubernetes and learning how to set everything up from scratch.

    Overview of the System

    Diagram of the full search engine

  3. Building a Search Engine

    I was re-reading “I’m Feeling Lucky” and this inspired me, so I have decided to create a search engine. This is a very ambitious project; however, it will teach me a lot about building microservices with Kubernetes, all from scratch.

    My plan is to refactor the spider to use a relational database. I am leaning towards each spider having its own SQLite DB. I will then create a service which can go to the spiders, fetch their current crawls, and add them to a central DB. This will create a single graph of all the visited nodes, as sketched below.
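
    As a rough illustration of that idea, a per-spider SQLite database might hold one table of visited pages and one of the links between them, which the central service can later merge into a single graph. This is a minimal sketch using the mattn/go-sqlite3 driver; the table and column names are my guesses, not the actual schema.

    ```go
    package main

    import (
        "database/sql"

        _ "github.com/mattn/go-sqlite3" // SQLite driver, registered as "sqlite3"
    )

    // openCrawlDB opens (or creates) a spider's local SQLite database with a
    // minimal crawl-graph schema: the pages the spider has visited and the
    // links between them.
    func openCrawlDB(path string) (*sql.DB, error) {
        db, err := sql.Open("sqlite3", path)
        if err != nil {
            return nil, err
        }
        _, err = db.Exec(`
            CREATE TABLE IF NOT EXISTS pages (
                url        TEXT PRIMARY KEY,
                crawled_at TIMESTAMP
            );
            CREATE TABLE IF NOT EXISTS links (
                from_url TEXT NOT NULL REFERENCES pages(url),
                to_url   TEXT NOT NULL,
                PRIMARY KEY (from_url, to_url)
            );`)
        return db, err
    }
    ```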

  4. Hello Hugo

    I have been a happy WordPress user for the past eight years. It has served me well, but the time has come to explore other options for hosting my blog.

    My Requirements for a New Platform

    Before choosing a new platform, I outlined a few key requirements:

  5. Making a Cloud Native Webcrawler in Go

    Map of the internet

    Over the past few weeks I have been making a webcrawler. I wanted to do it as a way to get better at Go, to pick up some useful learnings about graph databases, and because it sounded fun. The project made use of cloud-native services such as AWS SQS, DynamoDB, and optionally Neptune, which could be swapped out for Neo4j.

    What is a webcrawler?

    A webcrawler, or web spider, is a program which visits a website, fetches all of the links on that page, and then visits each of them in turn. This is how sites like Google, Bing, and DuckDuckGo find the pages that populate their search results.
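
    To make that concrete, here is a minimal sketch of the idea in Go, using the golang.org/x/net/html parser. A real crawler also has to resolve relative URLs, respect robots.txt, and rate-limit itself; this sketch skips all of that.

    ```go
    package main

    import (
        "fmt"
        "net/http"

        "golang.org/x/net/html"
    )

    // crawl fetches a page, extracts every <a href> link, and recursively
    // visits each one, skipping URLs it has already seen.
    func crawl(url string, seen map[string]bool) {
        if seen[url] {
            return
        }
        seen[url] = true

        resp, err := http.Get(url)
        if err != nil {
            return
        }
        defer resp.Body.Close()

        doc, err := html.Parse(resp.Body)
        if err != nil {
            return
        }

        // Walk the parsed HTML tree looking for anchor tags.
        var walk func(*html.Node)
        walk = func(n *html.Node) {
            if n.Type == html.ElementNode && n.Data == "a" {
                for _, a := range n.Attr {
                    if a.Key == "href" {
                        fmt.Println("found:", a.Val)
                        crawl(a.Val, seen)
                    }
                }
            }
            for c := n.FirstChild; c != nil; c = c.NextSibling {
                walk(c)
            }
        }
        walk(doc)
    }

    func main() {
        crawl("https://example.com", make(map[string]bool))
    }
    ```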