Alex Collie's blog

Posts on Projects

  1. Choosing a gRPC communication strategy

    Overview of the problem

    One of the key parts of the design of my search engine is the ability for the spiders to send the pages which they have explored to a central point. A more conventional design might use a pull system, where the conductor polls each spider in turn and requests the pages it has seen, using pagination. This approach has several problems.
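
    One alternative, sketched below, is to invert the flow: each spider pushes its pages to the conductor over a client-streaming gRPC call, so the conductor never has to poll or paginate. This is only a rough sketch; the service, stub, and message names (ConductorClient, SubmitPages, pb.Page) are illustrative stand-ins, not the actual API from the post.

    ```go
    package main

    import (
        "context"

        "google.golang.org/grpc"
        "google.golang.org/grpc/credentials/insecure"

        pb "example.com/search/proto" // hypothetical generated gRPC stubs
    )

    // pushPages streams crawled pages from a spider to the conductor over a
    // client-streaming RPC, so no polling or pagination is needed on the
    // conductor's side.
    func pushPages(ctx context.Context, addr string, pages []*pb.Page) error {
        conn, err := grpc.NewClient(addr, grpc.WithTransportCredentials(insecure.NewCredentials()))
        if err != nil {
            return err
        }
        defer conn.Close()

        stream, err := pb.NewConductorClient(conn).SubmitPages(ctx)
        if err != nil {
            return err
        }
        for _, p := range pages {
            if err := stream.Send(p); err != nil {
                return err
            }
        }
        _, err = stream.CloseAndRecv() // conductor acknowledges the batch
        return err
    }
    ```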

  2. Search Engine Update 1

    As I mentioned in my last post, I’ve been busy building my search engine lately, and it’s been a lot of fun diving deep into Kubernetes and learning how to set everything up from scratch.

    Overview of the System

    Diagram of the full search engine

  3. Building a Search Engine

    I was re-reading “I’m Feeling Lucky” and this inspired me, so I have decided to create a search engine. This is a very ambitious project; however, it will teach me a lot about building microservices with Kubernetes, all from scratch.

    My plan is to refactor the spider to use a relational database. I am leaning towards each spider having its own SQLite DB. I will then create a service which can go to the spiders, fetch their current crawls, and add them to a central DB. This will create a single graph of all the visited nodes, as sketched below.
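
    As a rough illustration of that idea, a per-spider SQLite database might hold one table of visited pages and one of the links between them, which the central service can later merge into a single graph. This is a minimal sketch using the mattn/go-sqlite3 driver; the table and column names are my guesses, not the actual schema.

    ```go
    package main

    import (
        "database/sql"

        _ "github.com/mattn/go-sqlite3" // SQLite driver, registered as "sqlite3"
    )

    // openCrawlDB opens (or creates) a spider's local SQLite database with a
    // minimal crawl-graph schema: the pages the spider has visited and the
    // links between them.
    func openCrawlDB(path string) (*sql.DB, error) {
        db, err := sql.Open("sqlite3", path)
        if err != nil {
            return nil, err
        }
        _, err = db.Exec(`
            CREATE TABLE IF NOT EXISTS pages (
                url        TEXT PRIMARY KEY,
                crawled_at TIMESTAMP
            );
            CREATE TABLE IF NOT EXISTS links (
                from_url TEXT NOT NULL REFERENCES pages(url),
                to_url   TEXT NOT NULL,
                PRIMARY KEY (from_url, to_url)
            );`)
        return db, err
    }
    ```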

  4. Hello Hugo

    I have been a happy WordPress user for the past eight years. It has served me well, but the time has come to explore other options for hosting my blog.

    My Requirements for a New Platform

    Before choosing a new platform, I outlined a few key requirements:

  5. Making a Cloud Native Webcrawler in Go

    Map of the internet

    Over the past few weeks I have been making a webcrawler. I wanted to do it as a way to get better at Go, to pick up some useful learnings about graph databases, and because it sounded fun. The project made use of cloud-native services such as AWS SQS, DynamoDB, and optionally Neptune, which could be swapped out for Neo4j.

    What is a webcrawler?

    A webcrawler, or web spider, is a program which visits a website, fetches all of the links on that page, and then visits each of them in turn. This is how sites like Google, Bing, and DuckDuckGo find the pages that populate their search results.
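
    To make that concrete, here is a minimal sketch of the idea in Go, using the golang.org/x/net/html parser. A real crawler also has to resolve relative URLs, respect robots.txt, and rate-limit itself; this sketch skips all of that.

    ```go
    package main

    import (
        "fmt"
        "net/http"

        "golang.org/x/net/html"
    )

    // crawl fetches a page, extracts every <a href> link, and recursively
    // visits each one, skipping URLs it has already seen.
    func crawl(url string, seen map[string]bool) {
        if seen[url] {
            return
        }
        seen[url] = true

        resp, err := http.Get(url)
        if err != nil {
            return
        }
        defer resp.Body.Close()

        doc, err := html.Parse(resp.Body)
        if err != nil {
            return
        }

        // Walk the parsed HTML tree looking for anchor tags.
        var walk func(*html.Node)
        walk = func(n *html.Node) {
            if n.Type == html.ElementNode && n.Data == "a" {
                for _, a := range n.Attr {
                    if a.Key == "href" {
                        fmt.Println("found:", a.Val)
                        crawl(a.Val, seen)
                    }
                }
            }
            for c := n.FirstChild; c != nil; c = c.NextSibling {
                walk(c)
            }
        }
        walk(doc)
    }

    func main() {
        crawl("https://example.com", make(map[string]bool))
    }
    ```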