squi.bid/blog/rss.xml
2025-08-23 20:29:46 -04:00

<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/css" href="rss.css" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Squibid's Blog</title>
<description>My blog.</description>
<language>en-us</language>
<link>http://squi.bid/blog/rss.xml</link>
<atom:link href="http://squi.bid/blog/rss.xml" rel="self" type="application/rss+xml" />
<!-- LB -->
<item>
<title>Writing my own web crawler in go</title>
<guid>https://squi.bid/blog/Writing-my-own-web-crawler-in-go/index.html</guid>
<link>https://squi.bid/blog/Writing-my-own-web-crawler-in-go/index.html</link>
<pubDate>Sat, 23 Aug 2025 20:25:26 -0400</pubDate>
<description><![CDATA[<!DOCTYPE HTML>
<html lang="en">
<title>Writing my own web crawler in go</title>
<meta name="date" content="2025/08/23">
<link rel="stylesheet" href="/style.css">
<style>
.-spell {}
.Number {color: #e29eca}
.Function {color: #c1c0d4}
.String {color: #90b99f}
.Character {color: #90b99f}
.PreCondit {color: #ea83a5}
.-property {color: #c1c0d4}
.Comment {color: #757581; font-style: italic}
.Macro {color: #ea83a5}
.-type-builtin {color: #e29eca}
.Type {color: #b9aeda}
.Keyword {color: #aca1cf}
.Constant {color: #ea83a5}
.-variable-parameter {color: #e29eca}
.-punctuation-delimiter {color: #9998a8}
.-punctuation-bracket {color: #9998a8}
.Include {color: #aca1cf}
.Structure {color: #e6b99d}
.-variable {color: #c9c7cd}
.Operator {color: #e6b99d}
</style>
<body id="blog">
<h1>Writing my own web crawler in go</h1>
<p>
I got bored; it happens to everyone (especially software developers).
So, like every software developer, I've started a new side project: a web
crawler. This isn't for any actual use case; I kinda just wanna learn how
to use Go and (hopefully) sharpen my SQL skills in the process. At this
point in the writing process I've only just started, so let me show you
what I've gotten working so far.
</p>
<p>
To start, I'm just searching through the hrefs of all &lt;a&gt; tags on a
site and printing them out. On my site that looks like this:
</p>
<pre>
/
mailto:me@zacharyscheiman.com
https://github.com/squibid
https://codeberg.org/squibid
https://git.squi.bid/squibid/wiz
https://git.squi.bid/squibid
/blog/rss.xml
/blog/New-Keyboard!
/blog/Serializing-data-in-C
/blog/Why-"suckless"-software-is-important
/blog/What-is-a-squibid
/blog/librex-and-dots
/?all_blog
https://lunarflame.dev
https://eggbert.xyz/
</pre>
<p>
Here's the code which fetched this list:
</p>
<pre>
<span class="Statement"><span class="Keyword">package</span></span> <span class="Structure">main</span>
<span class="Statement"><span class="Keyword">import</span></span> <span class="-punctuation-bracket">(</span>
<span class="String"><span class="String">&quot;fmt&quot;</span></span>
<span class="String"><span class="String">&quot;io&quot;</span></span>
<span class="String"><span class="String">&quot;net/http&quot;</span></span>
<span class="String"><span class="String">&quot;golang.org/x/net/html&quot;</span></span>
<span class="-punctuation-bracket">)</span>
<span class="Keyword"><span class="-keyword-function">func</span></span> <span class="-variable"><span class="Function">deal_html</span></span><span class="-punctuation-bracket">(</span><span class="-variable"><span class="-variable-parameter"><span class="DiagnosticUnderlineInfo">site</span></span><span class="DiagnosticUnderlineInfo"></span></span><span class="DiagnosticUnderlineInfo"> <span class="Type"><span class="Type"><span class="-type-builtin">string</span></span></span></span><span class="Type"><span class="Type"><span class="-type-builtin"></span></span></span><span class="-punctuation-delimiter">,</span> <span class="-variable"><span class="-variable-parameter">reader</span></span> <span class="Structure">io</span><span class="-punctuation-delimiter">.</span><span class="Type">Reader</span><span class="-punctuation-bracket">)</span> <span class="-punctuation-bracket">{</span>
<span class="-variable">doc</span><span class="-punctuation-delimiter">,</span> <span class="-variable">err</span> <span class="Operator">:=</span> <span class="-variable">html</span><span class="-punctuation-delimiter">.</span><span class="-property"><span class="Function">Parse</span></span><span class="-punctuation-bracket">(</span><span class="-variable">reader</span><span class="-punctuation-bracket">)</span>
<span class="Conditional"><span class="Keyword">if</span></span> <span class="-variable">err</span> <span class="Operator">!=</span> <span class="Constant"><span class="-constant-builtin">nil</span></span> <span class="-punctuation-bracket">{</span>
<span class="-variable">fmt</span><span class="-punctuation-delimiter">.</span><span class="-property"><span class="Function">Println</span></span><span class="-punctuation-bracket">(</span><span class="String"><span class="String"><span class="-spell">&quot;Error parsing HTML:&quot;</span></span></span><span class="-punctuation-delimiter">,</span> <span class="-variable">err</span><span class="-punctuation-bracket">)</span>
<span class="Statement"><span class="Keyword">return</span></span>
<span class="-punctuation-bracket">}</span>
<span class="Keyword"><span class="Keyword">var</span></span><span class="goSingleDecl"> </span><span class="-variable">walk</span> <span class="Keyword"><span class="-keyword-function">func</span></span><span class="-punctuation-bracket">(</span><span class="-variable"><span class="-variable-parameter">n</span></span> <span class="Operator">*</span><span class="Structure">html</span><span class="-punctuation-delimiter">.</span><span class="Type">Node</span><span class="-punctuation-bracket">)</span>
<span class="-variable">walk</span> <span class="Operator">=</span> <span class="Keyword"><span class="-keyword-function">func</span></span><span class="-punctuation-bracket">(</span><span class="-variable"><span class="-variable-parameter">n</span></span> <span class="Operator">*</span><span class="Structure">html</span><span class="-punctuation-delimiter">.</span><span class="Type">Node</span><span class="-punctuation-bracket">)</span> <span class="-punctuation-bracket">{</span>
<span class="Conditional"><span class="Keyword">if</span></span> <span class="-variable">n</span><span class="-punctuation-delimiter">.</span><span class="-property">Type</span> <span class="Operator">==</span> <span class="-variable">html</span><span class="-punctuation-delimiter">.</span><span class="-property">ElementNode</span> <span class="Operator">&amp;&amp;</span> <span class="-variable">n</span><span class="-punctuation-delimiter">.</span><span class="-property">Data</span> <span class="Operator">==</span> <span class="String"><span class="String"><span class="-spell">&quot;a&quot;</span></span></span> <span class="-punctuation-bracket">{</span>
<span class="Repeat"><span class="Keyword">for</span></span> <span class="-variable">_</span><span class="-punctuation-delimiter">,</span> <span class="-variable">attr</span> <span class="Operator">:=</span> <span class="Repeat"><span class="Keyword">range</span></span> <span class="-variable">n</span><span class="-punctuation-delimiter">.</span><span class="-property">Attr</span> <span class="-punctuation-bracket">{</span>
<span class="Conditional"><span class="Keyword">if</span></span> <span class="-variable">attr</span><span class="-punctuation-delimiter">.</span><span class="-property">Key</span> <span class="Operator">==</span> <span class="String"><span class="String"><span class="-spell">&quot;href&quot;</span></span></span> <span class="-punctuation-bracket">{</span>
<span class="-variable">fmt</span><span class="-punctuation-delimiter">.</span><span class="-property"><span class="Function">Println</span></span><span class="-punctuation-bracket">(</span><span class="-variable">attr</span><span class="-punctuation-delimiter">.</span><span class="-property">Val</span><span class="-punctuation-bracket">)</span>
<span class="-punctuation-bracket">}</span>
<span class="-punctuation-bracket">}</span>
<span class="-punctuation-bracket">}</span>
<span class="Repeat"><span class="Keyword">for</span></span> <span class="-variable">c</span> <span class="Operator">:=</span> <span class="-variable">n</span><span class="-punctuation-delimiter">.</span><span class="-property">FirstChild</span><span class="-punctuation-delimiter">;</span> <span class="-variable">c</span> <span class="Operator">!=</span> <span class="Constant"><span class="-constant-builtin">nil</span></span><span class="-punctuation-delimiter">;</span> <span class="-variable">c</span> <span class="Operator">=</span> <span class="-variable">c</span><span class="-punctuation-delimiter">.</span><span class="-property">NextSibling</span> <span class="-punctuation-bracket">{</span>
<span class="-variable"><span class="Function">walk</span></span><span class="-punctuation-bracket">(</span><span class="-variable">c</span><span class="-punctuation-bracket">)</span>
<span class="-punctuation-bracket">}</span>
<span class="-punctuation-bracket">}</span>
<span class="-variable"><span class="Function">walk</span></span><span class="-punctuation-bracket">(</span><span class="-variable">doc</span><span class="-punctuation-bracket">)</span>
<span class="-punctuation-bracket">}</span>
<span class="Keyword"><span class="-keyword-function">func</span></span> <span class="-variable"><span class="Function">main</span></span><span class="-punctuation-bracket">(</span><span class="-punctuation-bracket">)</span> <span class="-punctuation-bracket">{</span>
<span class="-variable">site</span> <span class="Operator">:=</span> <span class="String"><span class="String"><span class="-spell">&quot;https://squi.bid/&quot;</span></span></span>
<span class="-variable">fmt</span><span class="-punctuation-delimiter">.</span><span class="-property"><span class="Function">Println</span></span><span class="-punctuation-bracket">(</span><span class="String"><span class="String"><span class="-spell">&quot;fetching &quot;</span></span></span> <span class="Operator">+</span> <span class="-variable">site</span><span class="-punctuation-bracket">)</span>
<span class="-variable">resp</span><span class="-punctuation-delimiter">,</span> <span class="-variable">err</span> <span class="Operator">:=</span> <span class="-variable">http</span><span class="-punctuation-delimiter">.</span><span class="-property"><span class="Function">Get</span></span><span class="-punctuation-bracket">(</span><span class="-variable">site</span><span class="-punctuation-bracket">)</span>
<span class="Conditional"><span class="Keyword">if</span></span> <span class="-variable">err</span> <span class="Operator">!=</span> <span class="Constant"><span class="-constant-builtin">nil</span></span> <span class="-punctuation-bracket">{</span>
<span class="-variable">fmt</span><span class="-punctuation-delimiter">.</span><span class="-property"><span class="Function">Println</span></span><span class="-punctuation-bracket">(</span><span class="String"><span class="String"><span class="-spell">&quot;Error getting the website:&quot;</span></span></span><span class="-punctuation-delimiter">,</span> <span class="-variable">err</span><span class="-punctuation-bracket">)</span>
<span class="Statement"><span class="Keyword">return</span></span>
<span class="-punctuation-bracket">}</span>
<span class="Statement"><span class="Keyword">defer</span></span> <span class="-variable">resp</span><span class="-punctuation-delimiter">.</span><span class="-property">Body</span><span class="-punctuation-delimiter">.</span><span class="-property"><span class="Function">Close</span></span><span class="-punctuation-bracket">(</span><span class="-punctuation-bracket">)</span>
<span class="-variable"><span class="Function">deal_html</span></span><span class="-punctuation-bracket">(</span><span class="-variable">site</span><span class="-punctuation-delimiter">,</span> <span class="-variable">resp</span><span class="-punctuation-delimiter">.</span><span class="-property">Body</span><span class="-punctuation-bracket">)</span>
<span class="-punctuation-bracket">}</span>
</pre>
<p>
After a quick look at this output, it's clear we need to handle several
different "url formats", like the mailto: and / links. For now I'm going
to respect people's privacy and not index their email addresses.
</p>
<pre>
<span class="Conditional"><span class="Keyword">if</span></span> <span class="Identifier"><span class="-variable"><span class="Function"><span class="Special">len</span></span></span></span><span class="-punctuation-bracket">(</span><span class="-variable">attr</span><span class="-punctuation-delimiter">.</span><span class="-property">Val</span><span class="-punctuation-bracket">)</span> <span class="Operator">&gt;</span> <span class="Number"><span class="Number">1</span></span> <span class="Operator">&amp;&amp;</span> <span class="-variable">attr</span><span class="-punctuation-delimiter">.</span><span class="-property">Val</span><span class="-punctuation-bracket">[</span><span class="-punctuation-delimiter">:</span><span class="Number"><span class="Number">2</span></span><span class="-punctuation-bracket">]</span> <span class="Operator">==</span> <span class="String"><span class="String"><span class="-spell">&quot;//&quot;</span></span></span> <span class="-punctuation-bracket">{</span>
<span class="-variable">fmt</span><span class="-punctuation-delimiter">.</span><span class="-property"><span class="Function">Println</span></span><span class="-punctuation-bracket">(</span><span class="String"><span class="String"><span class="-spell">&quot;https:&quot;</span></span></span> <span class="Operator">+</span> <span class="-variable">attr</span><span class="-punctuation-delimiter">.</span><span class="-property">Val</span><span class="-punctuation-bracket">)</span>
<span class="-punctuation-bracket">}</span> <span class="Conditional"><span class="Keyword">else</span></span> <span class="Conditional"><span class="Keyword">if</span></span> <span class="-variable">attr</span><span class="-punctuation-delimiter">.</span><span class="-property">Val</span><span class="-punctuation-bracket">[</span><span class="-punctuation-delimiter">:</span><span class="Number"><span class="Number">1</span></span><span class="-punctuation-bracket">]</span> <span class="Operator">==</span> <span class="String"><span class="String"><span class="-spell">&quot;/&quot;</span></span></span> <span class="-punctuation-bracket">{</span>
<span class="Conditional"><span class="Keyword">if</span></span> <span class="-variable">site</span><span class="-punctuation-bracket">[</span><span class="Identifier"><span class="-variable"><span class="Function"><span class="Special">len</span></span></span></span><span class="-punctuation-bracket">(</span><span class="-variable">site</span><span class="-punctuation-bracket">)</span> <span class="Operator">-</span> <span class="Number"><span class="Number">1</span></span><span class="-punctuation-delimiter">:</span><span class="-punctuation-bracket">]</span> <span class="Operator">==</span> <span class="String"><span class="String"><span class="-spell">&quot;/&quot;</span></span></span> <span class="-punctuation-bracket">{</span>
<span class="-variable">fmt</span><span class="-punctuation-delimiter">.</span><span class="-property"><span class="Function">Println</span></span><span class="-punctuation-bracket">(</span><span class="-variable">site</span> <span class="Operator">+</span> <span class="-variable">attr</span><span class="-punctuation-delimiter">.</span><span class="-property">Val</span><span class="-punctuation-bracket">[</span><span class="Number"><span class="Number">1</span></span><span class="-punctuation-delimiter">:</span><span class="-punctuation-bracket">]</span><span class="-punctuation-bracket">)</span>
<span class="-punctuation-bracket">}</span> <span class="Conditional"><span class="Keyword">else</span></span> <span class="-punctuation-bracket">{</span>
<span class="-variable">fmt</span><span class="-punctuation-delimiter">.</span><span class="-property"><span class="Function">Println</span></span><span class="-punctuation-bracket">(</span><span class="-variable">site</span> <span class="Operator">+</span> <span class="-variable">attr</span><span class="-punctuation-delimiter">.</span><span class="-property">Val</span><span class="-punctuation-bracket">)</span>
<span class="-punctuation-bracket">}</span>
<span class="-punctuation-bracket">}</span> <span class="Conditional"><span class="Keyword">else</span></span> <span class="Conditional"><span class="Keyword">if</span></span> <span class="Identifier"><span class="-variable"><span class="Function"><span class="Special">len</span></span></span></span><span class="-punctuation-bracket">(</span><span class="-variable">attr</span><span class="-punctuation-delimiter">.</span><span class="-property">Val</span><span class="-punctuation-bracket">)</span> <span class="Operator">&gt;</span> <span class="Number"><span class="Number">4</span></span> <span class="Operator">&amp;&amp;</span> <span class="-variable">attr</span><span class="-punctuation-delimiter">.</span><span class="-property">Val</span><span class="-punctuation-bracket">[</span><span class="-punctuation-delimiter">:</span><span class="Number"><span class="Number">4</span></span><span class="-punctuation-bracket">]</span> <span class="Operator">==</span> <span class="String"><span class="String"><span class="-spell">&quot;http&quot;</span></span></span> <span class="-punctuation-bracket">{</span>
<span class="-variable">fmt</span><span class="-punctuation-delimiter">.</span><span class="-property"><span class="Function">Println</span></span><span class="-punctuation-bracket">(</span><span class="-variable">attr</span><span class="-punctuation-delimiter">.</span><span class="-property">Val</span><span class="-punctuation-bracket">)</span>
<span class="-punctuation-bracket">}</span>
</pre>
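<p>
An alternative to hand-rolled prefix checks is Go's standard net/url
package. Here's a minimal sketch, assuming the page's own url is available
as a *url.URL; resolveHref is a hypothetical helper, not part of the
crawler above. url.Parse plus ResolveReference apply the standard
relative-reference rules, so "//host", "/path", plain relative paths like
"git-guides" and full urls all come out absolute, and anything that isn't
http or https (like mailto:) gets dropped.
</p>
<pre>
package main

import (
	"fmt"
	"net/url"
)

// resolveHref is a hypothetical helper: it turns an href found on page into
// an absolute url using the standard relative-reference rules, and reports
// false for anything that is not an http(s) link (mailto:, javascript:, ...).
func resolveHref(page *url.URL, href string) (string, bool) {
	ref, err := url.Parse(href)
	if err != nil {
		return "", false
	}
	abs := page.ResolveReference(ref)
	switch abs.Scheme {
	case "http", "https":
		// keep going
	default:
		return "", false
	}
	// "#section" links point back at the same page, so drop the fragment
	abs.Fragment = ""
	abs.RawFragment = ""
	return abs.String(), true
}

func main() {
	page, err := url.Parse("https://squi.bid/")
	if err != nil {
		panic(err)
	}
	hrefs := []string{
		"/blog/rss.xml",
		"//lunarflame.dev",
		"mailto:someone@example.com",
		"https://eggbert.xyz/",
	}
	for _, href := range hrefs {
		if abs, ok := resolveHref(page, href); ok {
			fmt.Println(abs)
		}
	}
}
</pre>
<p>
Because the href is resolved against the page's url instead of being
concatenated onto it, a relative link like "git-guides" found on
https://github.com/github/search?q=... becomes
https://github.com/github/git-guides rather than landing after the query
string.
</p>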
<p>
Now that we've gotten the links from one website, it's time to store them
and fetch the links from their sites too. Because this is already just a
toy project, I've decided not to store all the info I would if this were
a real project. Instead I'll only be storing the link to the site and a
boolean representing whether I've fetched its contents yet. So, let's go
impl...
</p>
<p>
Step #1 is to find an SQL library. I just went with Go's built-in
database/sql package, then visited
<a href="https://golang.org/s/sqldrivers">golang.org/s/sqldrivers</a>
and decided on
<a href="https://github.com/mattn/go-sqlite3">github.com/mattn/go-sqlite3</a>
because SQLite is a name I'm familiar with, and I really don't want to go
through the hassle of looking into different databases for a toy project.
<a href="#footnote-1">[1]</a>
</p>
<p>
Now that we've chosen our db, I'll set up our table like I mentioned
earlier, with one string and one boolean:
</p>
<pre>
<span class="-variable">db</span><span class="-punctuation-delimiter">,</span> <span class="-variable">err</span> <span class="Operator">=</span> <span class="-variable">sql</span><span class="-punctuation-delimiter">.</span><span class="-property"><span class="Function">Open</span></span><span class="-punctuation-bracket">(</span><span class="String"><span class="String"><span class="-spell">&quot;sqlite3&quot;</span></span></span><span class="-punctuation-delimiter">,</span> <span class="String"><span class="String"><span class="-spell">&quot;./sites.db&quot;</span></span></span><span class="-punctuation-bracket">)</span>
<span class="Conditional"><span class="Keyword">if</span></span> <span class="-variable">err</span> <span class="Operator">!=</span> <span class="Constant"><span class="-constant-builtin">nil</span></span> <span class="-punctuation-bracket">{</span>
<span class="-variable">log</span><span class="-punctuation-delimiter">.</span><span class="-property"><span class="Function">Fatal</span></span><span class="-punctuation-bracket">(</span><span class="-variable">err</span><span class="-punctuation-bracket">)</span>
<span class="-punctuation-bracket">}</span>
<span class="Statement"><span class="Keyword">defer</span></span> <span class="-variable">db</span><span class="-punctuation-delimiter">.</span><span class="-property"><span class="Function">Close</span></span><span class="-punctuation-bracket">(</span><span class="-punctuation-bracket">)</span>
<span class="-variable">db</span><span class="-punctuation-delimiter">.</span><span class="-property"><span class="Function">Exec</span></span><span class="-punctuation-bracket">(</span><span class="String"><span class="String">`</span>
<span class="String"> create table if not exists</span>
<span class="String"> urls (url text not null primary key, indexed boolean not null);</span>
<span class="String"> `</span></span><span class="-punctuation-bracket">)</span>
</pre>
<p>
Now we need to start adding entries to the db. I wanted to make sure I
wouldn't end up shooting myself in the foot, so I went with a tiny
function to make it a teensy tiny bit safer:
</p>
<pre>
<span class="Keyword"><span class="-keyword-function">func</span></span> <span class="-variable"><span class="Function">db_insert_url</span></span><span class="-punctuation-bracket">(</span><span class="-variable"><span class="-variable-parameter">url</span></span> <span class="Type"><span class="Type"><span class="-type-builtin">string</span></span></span><span class="-punctuation-delimiter">,</span> <span class="-variable"><span class="-variable-parameter">seen</span></span> <span class="Type"><span class="Type"><span class="-type-builtin">bool</span></span></span><span class="-punctuation-bracket">)</span> <span class="-punctuation-bracket">{</span>
<span class="-variable">db</span><span class="-punctuation-delimiter">.</span><span class="-property"><span class="Function">Exec</span></span><span class="-punctuation-bracket">(</span><span class="String"><span class="String">`insert into urls values (?, ?)`</span></span><span class="-punctuation-delimiter">,</span> <span class="-variable">url</span><span class="-punctuation-delimiter">,</span> <span class="-variable">seen</span><span class="-punctuation-bracket">)</span>
<span class="-punctuation-bracket">}</span>
</pre>
<p>
It could use some error handling, but if you look at section 3 part 4 of
the software engineer's manual, it reads "side projects aren't stable
because if it's stable it's not a side project".
</p>
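<p>
If you do want that error handling, here's a minimal sketch of what it
could look like. It assumes SQLite's "insert or ignore", so a url that's
already in the table is skipped instead of failing on the primary key.
Since db_insert_url currently throws the error away, duplicates are
already skipped silently; this keeps that behaviour while letting real
errors through. To run without the sqlite driver it takes a small
interface instead of *sql.DB (a real *sql.DB satisfies it); execer,
dbInsertURL and failingDB are all hypothetical names.
</p>
<pre>
package main

import (
	"database/sql"
	"errors"
	"fmt"
)

// execer is the one method of *sql.DB this helper needs; *sql.DB
// satisfies it, and so can a stand-in when no database is available.
type execer interface {
	Exec(query string, args ...any) (sql.Result, error)
}

// dbInsertURL is a hypothetical error-checked take on db_insert_url.
// "insert or ignore" (SQLite syntax) skips rows that would violate the
// primary key instead of returning an error for them.
func dbInsertURL(db execer, url string, seen bool) error {
	_, err := db.Exec(`insert or ignore into urls values (?, ?)`, url, seen)
	if err != nil {
		return fmt.Errorf("inserting %s: %w", url, err)
	}
	return nil
}

// failingDB stands in for a database whose writes fail, purely to show
// the error path.
type failingDB struct{}

func (failingDB) Exec(string, ...any) (sql.Result, error) {
	return nil, errors.New("database is locked")
}

func main() {
	// prints: inserting https://squi.bid/: database is locked
	fmt.Println(dbInsertURL(failingDB{}, "https://squi.bid/", false))
}
</pre>
<p>
In the crawler you'd pass the db from sql.Open instead of failingDB.
</p>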
<p>
Now that we've gotten all the easy stuff out of the way, it's time to work
on making this run forever, or close to it at least. For now I'm going to
keep this project in an inefficient state: no worker pools or anything
fancy like that. To get started we first need to make a decision: depth
first or breadth first search? In case you're not sure what I mean by
this, let me give you an example:
</p>
<p>
Let's say we have site example-a.com which contains the following links:
</p>
<ul>
<li>example-a.com/blog</li>
<li>example-b.com</li>
<li>example-c.com</li>
</ul>
<p>
With a breadth first search we would first go to either example-b or
example-c, whereas with a depth first search we would go with
example-a.com/blog. For my use case I want to find as many sites as
possible, so I'll be targeting sites with other base urls.
</p>
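<p>
In code, the difference mostly comes down to how the list of urls waiting
to be fetched (the frontier) is consumed: a FIFO queue gives breadth first
order, a LIFO stack gives depth first order. Here's a minimal sketch over a
hypothetical in-memory link graph built from the example above, plus a
made-up example-a.com/blog/post-1 page; bfs and dfs are illustrative names,
and the crawler itself picks its next url from the database instead.
</p>
<pre>
package main

import "fmt"

// bfs returns the order a breadth first crawl visits pages in, using a
// FIFO queue: every link on a page is queued before any of them is followed.
func bfs(links map[string][]string, start string) []string {
	seen := map[string]bool{start: true}
	queue := []string{start}
	var order []string
	for len(queue) > 0 {
		page := queue[0]
		queue = queue[1:]
		order = append(order, page)
		for _, next := range links[page] {
			if !seen[next] {
				seen[next] = true
				queue = append(queue, next)
			}
		}
	}
	return order
}

// dfs returns the order a depth first crawl visits pages in, using a LIFO
// stack: the most recently pushed link is followed next.
func dfs(links map[string][]string, start string) []string {
	seen := map[string]bool{}
	stack := []string{start}
	var order []string
	for len(stack) > 0 {
		page := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[page] {
			continue
		}
		seen[page] = true
		order = append(order, page)
		// push in reverse so the first link on the page is explored first
		out := links[page]
		for i := len(out) - 1; i >= 0; i-- {
			stack = append(stack, out[i])
		}
	}
	return order
}

func main() {
	// hypothetical link graph: page -> links found on that page
	links := map[string][]string{
		"example-a.com":      {"example-a.com/blog", "example-b.com", "example-c.com"},
		"example-a.com/blog": {"example-a.com/blog/post-1"},
	}
	fmt.Println("breadth first:", bfs(links, "example-a.com"))
	fmt.Println("depth first:", dfs(links, "example-a.com"))
}
</pre>
<p>
bfs reaches example-b.com and example-c.com before going any deeper into
example-a.com/blog, while dfs follows example-a.com/blog all the way down
first.
</p>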
<p>
Now that we know how to pick the next url to fetch, let's impl the loop
that handles it.
</p>
<pre>
<span class="Repeat"><span class="Keyword">for</span></span> <span class="-variable">i</span> <span class="Operator">:=</span> <span class="Number"><span class="Number">0</span></span><span class="-punctuation-delimiter">;</span><span class="-punctuation-delimiter">;</span> <span class="-variable">i</span><span class="Operator">++</span> <span class="-punctuation-bracket">{</span>
<span class="Conditional"><span class="Keyword">if</span></span> <span class="-variable">i</span> <span class="Operator">&gt;</span> <span class="Number"><span class="Number">0</span></span> <span class="-punctuation-bracket">{</span>
<span class="-variable">rows</span><span class="-punctuation-delimiter">,</span> <span class="-variable">err</span> <span class="Operator">:=</span> <span class="-variable">db</span><span class="-punctuation-delimiter">.</span><span class="-property"><span class="Function">Query</span></span><span class="-punctuation-bracket">(</span><span class="String"><span class="String">`select url from urls where indexed is false`</span></span><span class="-punctuation-bracket">)</span>
<span class="Conditional"><span class="Keyword">if</span></span> <span class="-variable">err</span> <span class="Operator">!=</span> <span class="Constant"><span class="-constant-builtin">nil</span></span> <span class="-punctuation-bracket">{</span>
<span class="Statement"><span class="Keyword">return</span></span>
<span class="-punctuation-bracket">}</span>
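<span class="-variable">next</span> <span class="Operator">:=</span> <span class="String"><span class="String">&quot;&quot;</span></span>
<span class="Repeat"><span class="Keyword">for</span></span> <span class="-variable">rows</span><span class="-punctuation-delimiter">.</span><span class="-property"><span class="Function">Next</span></span><span class="-punctuation-bracket">(</span><span class="-punctuation-bracket">)</span> <span class="-punctuation-bracket">{</span>
<span class="Keyword"><span class="Keyword">var</span></span><span class="goSingleDecl"> </span><span class="-variable">test</span> <span class="Type"><span class="Type"><span class="-type-builtin">string</span></span></span>
<span class="Conditional"><span class="Keyword">if</span></span> <span class="-variable">rows</span><span class="-punctuation-delimiter">.</span><span class="-property"><span class="Function">Scan</span></span><span class="-punctuation-bracket">(</span><span class="Operator">&amp;</span><span class="-variable">test</span><span class="-punctuation-bracket">)</span> <span class="Operator">!=</span> <span class="Constant"><span class="-constant-builtin">nil</span></span> <span class="-punctuation-bracket">{</span>
<span class="Statement"><span class="Keyword">continue</span></span>
<span class="-punctuation-bracket">}</span>
<span class="Comment"><span class="Comment">// fall back to the first unindexed url if none passes the check below</span></span>
<span class="Conditional"><span class="Keyword">if</span></span> <span class="-variable">next</span> <span class="Operator">==</span> <span class="String"><span class="String">&quot;&quot;</span></span> <span class="-punctuation-bracket">{</span>
<span class="-variable">next</span> <span class="Operator">=</span> <span class="-variable">test</span>
<span class="-punctuation-bracket">}</span>
<span class="Comment"><span class="Comment"><span class="-spell">/* we can't just check if the site is the same because then when we're</span></span><span class="-spell">
<span class="Comment"> * checking squi.bid/example it won't register squi.bid as the same</span>
<span class="Comment"> * domain, although maybe that's what we want.</span>
<span class="Comment"> */</span></span><span class="Comment"></span></span>
<span class="Conditional"><span class="Keyword">if</span></span> <span class="Operator">!</span><span class="-variable">strings</span><span class="-punctuation-delimiter">.</span><span class="-property"><span class="Function">Contains</span></span><span class="-punctuation-bracket">(</span><span class="-variable">test</span><span class="-punctuation-delimiter">,</span> <span class="-variable">site</span><span class="-punctuation-bracket">)</span> <span class="-punctuation-bracket">{</span>
<span class="-variable">next</span> <span class="Operator">=</span> <span class="-variable">test</span>
<span class="Statement"><span class="Keyword">break</span></span>
<span class="-punctuation-bracket">}</span>
<span class="-punctuation-bracket">}</span>
<span class="-variable">rows</span><span class="-punctuation-delimiter">.</span><span class="-property"><span class="Function">Close</span></span><span class="-punctuation-bracket">(</span><span class="-punctuation-bracket">)</span>
<span class="Conditional"><span class="Keyword">if</span></span> <span class="-variable">next</span> <span class="Operator">==</span> <span class="String"><span class="String">&quot;&quot;</span></span> <span class="-punctuation-bracket">{</span>
<span class="Comment"><span class="Comment">// nothing left to crawl</span></span>
<span class="Statement"><span class="Keyword">return</span></span>
<span class="-punctuation-bracket">}</span>
<span class="-variable">site</span> <span class="Operator">=</span> <span class="-variable">next</span>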
<span class="-punctuation-bracket">}</span>
<span class="-variable">fmt</span><span class="-punctuation-delimiter">.</span><span class="-property"><span class="Function">Println</span></span><span class="-punctuation-bracket">(</span><span class="String"><span class="String"><span class="-spell">&quot;fetching &quot;</span></span></span> <span class="Operator">+</span> <span class="-variable">site</span><span class="-punctuation-bracket">)</span>
<span class="-variable">resp</span><span class="-punctuation-delimiter">,</span> <span class="-variable">err</span> <span class="Operator">:=</span> <span class="-variable">http</span><span class="-punctuation-delimiter">.</span><span class="-property"><span class="Function">Get</span></span><span class="-punctuation-bracket">(</span><span class="-variable">site</span><span class="-punctuation-bracket">)</span>
<span class="Conditional"><span class="Keyword">if</span></span> <span class="-variable">err</span> <span class="Operator">!=</span> <span class="Constant"><span class="-constant-builtin">nil</span></span> <span class="-punctuation-bracket">{</span>
<span class="-variable">fmt</span><span class="-punctuation-delimiter">.</span><span class="-property"><span class="Function">Println</span></span><span class="-punctuation-bracket">(</span><span class="String"><span class="String"><span class="-spell">&quot;Error getting&quot;</span></span></span><span class="-punctuation-delimiter">,</span> <span class="-variable">site</span><span class="-punctuation-delimiter">,</span> <span class="-variable">err</span><span class="-punctuation-bracket">)</span>
<span class="-variable">os</span><span class="-punctuation-delimiter">.</span><span class="-property"><span class="Function">Exit</span></span><span class="-punctuation-bracket">(</span><span class="Number"><span class="Number">1</span></span><span class="-punctuation-bracket">)</span>
<span class="-punctuation-bracket">}</span>
<span class="-variable"><span class="Function">deal_html</span></span><span class="-punctuation-bracket">(</span><span class="-variable">site</span><span class="-punctuation-delimiter">,</span> <span class="-variable">resp</span><span class="-punctuation-delimiter">.</span><span class="-property">Body</span><span class="-punctuation-bracket">)</span>
<span class="-variable">resp</span><span class="-punctuation-delimiter">.</span><span class="-property">Body</span><span class="-punctuation-delimiter">.</span><span class="-property"><span class="Function">Close</span></span><span class="-punctuation-bracket">(</span><span class="-punctuation-bracket">)</span>
<span class="-punctuation-bracket">}</span>
</pre>
<p>
If you read through my code, you might've seen the comment about how our
check doesn't actually prevent accessing the same site. The solution I'm
currently thinking of is to add a column to the db which keeps the
highest point in the site; for example, squi.bid/example/1/2/3/4 would
have a highest point of squi.bid. But this isn't something I'm too
concerned about right now, so we'll just leave it as is and deal with
another issue you might've spotted.
</p>
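<p>
The "highest point" idea maps nicely onto the host part of a url. Here's a
minimal sketch using net/url, assuming absolute urls with a scheme
(url.Parse treats "squi.bid/example" without https:// as a plain path, so
its host comes back empty); highestPoint is a hypothetical name.
</p>
<pre>
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// highestPoint is a hypothetical helper returning the host of an absolute
// url, e.g. "squi.bid" for https://squi.bid/example/1/2/3/4. That host is
// what the extra db column would store.
func highestPoint(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	// Hostname strips any :port; host names are case-insensitive, so
	// lower-case them to make comparisons reliable.
	return strings.ToLower(u.Hostname()), nil
}

func main() {
	for _, raw := range []string{
		"https://squi.bid/example/1/2/3/4",
		"https://SQUI.BID:443/",
		"https://github.com/squibid",
	} {
		host, err := highestPoint(raw)
		if err != nil {
			fmt.Println("bad url:", err)
			continue
		}
		fmt.Println(raw, "->", host)
	}
}
</pre>
<p>
Two urls would then count as the same site when their highestPoint values
match. Note that this still treats git.squi.bid and squi.bid as different
sites; grouping by registrable domain would need the public suffix list
(golang.org/x/net/publicsuffix has EffectiveTLDPlusOne for that).
</p>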
<p>
We never modify the db: after fetching a site successfully, at no point do
we actually record that we fetched it. So whenever we try to fetch a new
site, the program will query the db and pick the same url as before.
Thankfully this is a simple fix which just takes adding this line right
after where we index a new site:
</p>
<pre>
<span class="-variable">db</span><span class="-punctuation-delimiter">.</span><span class="-property"><span class="Function">Exec</span></span><span class="-punctuation-bracket">(</span><span class="String"><span class="String">`update urls set indexed = true where url == ?`</span></span><span class="-punctuation-delimiter">,</span> <span class="-variable">site</span><span class="-punctuation-bracket">)</span>
</pre>
<p>
Remember when I referenced section 3 part 4 of the software engineer's
manual? Well, I regret it:
</p>
<pre>
fetching https://squi.bid/
fetching https://github.com/EggbertFluffle/signup?ref_cta=Sign+up&ref_loc=header+logged+out&ref_page=%2F%3Cuser-name%3E&source=header
panic: runtime error: slice bounds out of range [:1] with length 0
goroutine 1 [running]:
main.deal_html.func1(0xc000315b90)
/home/squibid/documents/coding/go/scraper/main.go:40 +0x2fe
main.deal_html.func1(0x0?)
/home/squibid/documents/coding/go/scraper/main.go:55 +0x83
main.deal_html.func1(0xc0002dd110?)
/home/squibid/documents/coding/go/scraper/main.go:55 +0x83
main.deal_html.func1(0x0?)
/home/squibid/documents/coding/go/scraper/main.go:55 +0x83
main.deal_html.func1(0xc00032e000?)
/home/squibid/documents/coding/go/scraper/main.go:55 +0x83
main.deal_html.func1(0x0?)
/home/squibid/documents/coding/go/scraper/main.go:55 +0x83
main.deal_html.func1(0x8c36e0?)
/home/squibid/documents/coding/go/scraper/main.go:55 +0x83
main.deal_html.func1(0x7fd180fd9db8?)
/home/squibid/documents/coding/go/scraper/main.go:55 +0x83
main.deal_html({0xc00002a280, 0x7c}, {0x7fd180fd9db8?, 0xc0004bd200?})
/home/squibid/documents/coding/go/scraper/main.go:58 +0x11e
main.main()
/home/squibid/documents/coding/go/scraper/main.go:105 +0x12d
exit status 2
</pre>
<p>
Turns out we need some stability if we want to actually use the code.
Like most bugs, this is another simple fix: guard our URL handler with a
call to len to make sure we're not doing anything stupid with empty
strings.
</p>
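<p>
The panic message (slice bounds out of range [:1] with length 0) suggests
the handler slices off the first character of an href, which blows up
when the href is empty. Assuming that's what's going on (the handler
itself isn't shown here), a sketch of the guard looks like this; both
function names are hypothetical. strings.HasPrefix already checks the
length for you, so it's the less error-prone way to write the same test.
</p>
<pre>
package main

import (
	"fmt"
	"strings"
)

// startsWithSlash checks the first character of href. The len check
// stops href[:1] from panicking when href is empty (e.g. href="").
func startsWithSlash(href string) bool {
	return len(href) > 0 && href[:1] == "/"
}

// startsWithSlashAlt does the same thing; HasPrefix compares lengths
// before slicing, so it can't panic on an empty string either.
func startsWithSlashAlt(href string) bool {
	return strings.HasPrefix(href, "/")
}

func main() {
	for _, href := range []string{"", "/blog/", "https://squi.bid/"} {
		fmt.Printf("%q: %v %v\n", href, startsWithSlash(href), startsWithSlashAlt(href))
	}
}
</pre>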
<p>
And now it works! Well, with one small exception; here's a clean run
showing my lil web crawler doing its thing... and failing pretty fast.
</p>
<pre>
fetching https://squi.bid/
fetching https://eggbert.xyz/
fetching https://www.linkedin.com/in/harrison-diambrosio-505443229/
fetching https://github.com/EggbertFluffle/
fetching https://support.github.com?tags=dotcom-footer
fetching https://docs.github.com/
fetching https://services.github.com
fetching https://github.com/github
fetching https://github.com/github/search?q=topic%3Aactions+org%3Agithub+fork%3Atrue&type=repositories
fetching https://github.com/github/search?q=topic%3Aactions+org%3Agithub+fork%3Atrue&type=repositories/git-guides
fetching https://github.com/github/search?q=topic%3Aactions+org%3Agithub+fork%3Atrue&type=repositories/git-guides/git-guides
fetching https://github.com/github/search?q=topic%3Aactions+org%3Agithub+fork%3Atrue&type=repositories/git-guides/git-guides/git-guides
fetching https://github.com/github/search?q=topic%3Aactions+org%3Agithub+fork%3Atrue&type=repositories/git-guides/git-guides/git-guides/git-guides
fetching https://github.com/github/search?q=topic%3Aactions+org%3Agithub+fork%3Atrue&type=repositories/git-guides/git-guides/git-guides/git-guides/git-guides
fetching https://github.com/github/search?q=topic%3Aactions+org%3Agithub+fork%3Atrue&type=repositories/git-guides/git-guides/git-guides/git-guides/git-guides/git-guides
fetching https://github.com/github/search?q=topic%3Aactions+org%3Agithub+fork%3Atrue&type=repositories/git-guides/git-guides/git-guides/git-guides/git-guides/git-guides/git-guides
...
</pre>
<p>
I'm sure you can use your imagination to figure out how long that would
have gone on for. Switching which site we're searching would partially
fix this bug, but ultimately we wouldn't make it far if we keep falling
for these redirections that just keep going. For now that's fine though,
and I have a semi-working web crawler. All code can be found
here:
<a href="https://git.squi.bid/squibid/web-crawler">git.squi.bid/squibid/web-crawler</a>.
Thank you for reading! I'll probably write a follow-up when I find some
time.
</p>
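<p>
Looking at the log, each new URL is the previous one with /git-guides
tacked on, query string and all. One likely cause (an assumption, since
the URL-joining code isn't shown) is that relative hrefs are being
appended to the current URL as plain strings instead of being resolved.
Here's a sketch using URL.ResolveReference from Go's net/url, which
applies the RFC 3986 resolution rules; resolve is a hypothetical helper.
</p>
<pre>
package main

import (
	"fmt"
	"net/url"
)

// resolve turns an href found on page into an absolute URL using the
// RFC 3986 rules in net/url, instead of appending it to the page's URL.
func resolve(page, href string) (string, error) {
	base, err := url.Parse(page)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(href)
	if err != nil {
		return "", err
	}
	u := base.ResolveReference(ref)
	// "#section" links point back at the same page, so drop fragments.
	u.Fragment, u.RawFragment = "", ""
	return u.String(), nil
}

func main() {
	page := "https://github.com/github/search?q=topic%3Aactions"
	next, _ := resolve(page, "git-guides")
	fmt.Println(next) // prints https://github.com/github/git-guides
	again, _ := resolve(next, "git-guides")
	fmt.Println(again == next) // prints true
}
</pre>
<p>
Resolving the same relative link against its own result lands on the same
URL, so a check against urls already in the db would stop the chain
instead of letting it grow forever.
</p>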
<p id="footnote-1">
[1] Yes, I'm just including more links to make my site a good starting
point. Why do you ask?
</p>
]]></description>
</item>
<item>
<title> 'New Keyboard!'</title>
<guid>https://squi.bid/blog/New-Keyboard!/index.html</guid>
<link>https://squi.bid/blog/New-Keyboard!/index.html</link>
<pubDate>Tue, 12 Aug 2025 03:23:38 -0400</pubDate>
<description><![CDATA[<!DOCTYPE HTML>
<html lang="en">
<title>'New Keyboard!'</title>
<meta name="date" content="2025/08/12">
<link rel="stylesheet" href="/style.css">
<style>
img { width: 100%; }
pre { color: white; }
</style>
<body id="blog">
<h1>New Keyboard!</h1>
<a href="#fine_here_it_is">tl;dr show me the board</a>
<p>
Throughout the past few years I've hopped from keyboard to keyboard,
initially out of a need for something to type on, but eventually out of an
obsession with the sound and feel, which to this day I cannot shake.
</p>
<h2>Keyboard #1</h2>
<p>
I started on a Razer Cynosa Chroma which as far as I can tell is no longer
for sale. But for the sake of context you should know that it's a 100%
membrane keyboard with per key backlighting. For a starter keyboard it was
fine, but looking back any old office keyboard would've worked and the
$40(?) that I spent on it was not worth it. But who cares, let's go to the
next keyboard!
</p>
<h2>Keyboard #2</h2>
<p>
After being pushed by a friend who was <s>obsessed</s> interested in
keyboards, I finally took the plunge and built my first custom keyboard.
This keyboard was what I thought would be best after using a membrane
for over two years (I had very little clue what I was doing). I ended up
choosing a TKL board called the NINJA87BT, which came with Gateron Milky
Yellow switches. This may not sound custom, but then I went and ordered
some very, very expensive switches called Helios v2s, which are very quiet
and so soft to type on. I also bought some keycaps with legends printed
on the side: nothing too expensive, but very nice to look at. Because this
was my first board I had no clue what I was doing, and I ended up spending
around $300...
</p>
<h2>Keyboard #3</h2>
<p>
I started thinking about the future and how I really needed to take care
of the hands that I use every day for programming. Though I wanted to go
fully ergonomic, like where I'm at now, I chose to pace myself and went
with a UHK 60v2. It was expensive, but it promised something
spectacular: a split keyboard without the ortho keywells and QMK
configuration of my current keyboard, which would've been very hard to
switch to coming from a normal TKL. Instead of sticking with the Cherry
Reds it came with, I put my Helios in (because they are still the best
switches I've ever felt). While this board was not nearly as custom as my
last, I was able to enjoy it much more knowing I was not going to get
carpal tunnel halfway through my life.
</p>
<p>
I ended up using this board for about a year and a half, until around
mid-July 2025, when I updated the firmware for the first time since getting
the board and it caused the keyboard to start crashing every once in a
while. I tried to roll back to the version I was using before, but my
configuration wasn't able to migrate back. So I decided it was time to
move on to the keyboard I'd been dreaming of making.
</p>
<h2>Keyboard #4 (my current one)</h2>
<p>
The keyboard I've been typing this post on is a Dactyl Manuform 4x5, and
it's my first truly hand-built keyboard. I 3D printed the case, sanded,
primed, painted (although it did not hide the layer lines very well), and
wired it. Wiring was a bit tricky, but thanks to the pictures in the
<a href="https://github.com/abstracthat/dactyl-manuform">github repo</a>
I was able to do it without too much trouble.
</p>
<img src="/blog/New-Keyboard!/pics/sanded.jpg">
<img src="/blog/New-Keyboard!/pics/wiring.jpg">
<p>
After finishing the wiring, which took around 12 hours, I tried to flash
QMK to both halves, at which point I realized that the right half had the
rows wired to the Arduino Pro Micro in reverse order. After fixing that
slight hiccup, I flashed it and voilà, a working keyboard. I then put on some
black legend-less keycaps, and here is the final(ish) result:
</p>
<img id="fine_here_it_is" src="/blog/New-Keyboard!/pics/final-ish.jpg">
<p>
The ish in final(ish) is because I've yet to add a baseplate which would
add some much-needed weight so the halves don't slide across my desk,
but for now I'm happy with it.
</p>
<h3>Build your own</h3>
<p>
In case you're reading this in hopes of some tips for building your
own, here they are:
<ul>
<li>get the model for the keyboard from
<a href="https://ryanis.cool/dactyl/#manuform">ryanis.cool/dactyl/#manuform</a></li>
<li>when wiring your keyboard, try to make the wires going from the
rows/columns around 1.5-2x longer than they need to be, that way they
don't snap when you're fiddling around in there</li>
<li>if you want to get rid of layer lines, look into acetone dipping your
print; I only learned about this after showing my fully wired board
to a friend, otherwise I would've done it</li>
</ul>
</p>
<p>
When it comes to using my keyboard, it's set up for typing, as that's what
I do on it most of the time. However, when I play games things get a bit
tricky. For games where I can remap the keys, I shift every key over by one
except for the keys on the bottom row, and then I set the sprint key to A.
For the games where I can't remap the keys... I just stop playing them.
If I had more of an interest in gaming I would've gone with the 4x6, as
the 6 more keys it offers could've been really nice.
</p>
<p>
For those curious about the specs: I decided on an RJ9 port mainly because
I like the look of them over the TRRS cables everyone seems to be using
nowadays. For the Pro Micro I went with the cheapest one I could find
with a USB-C port; you can't really go wrong here. As for the actual
layout, my QMK config is below in case you really wanna know how I type:
</p>
<pre>
/*
This is the c configuration file for the keymap
Copyright 2012 Jun Wako &lt;wakojun@gmail.com&gt;
Copyright 2015 Jack Humbert
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see &lt;http://www.gnu.org/licenses/&gt;.
*/
#include QMK_KEYBOARD_H
bool process_record_user(uint16_t keycode, keyrecord_t* record) {
switch (keycode) {
case KC_BSPC: {
static uint16_t registered_key = KC_NO;
if (record->event.pressed) { // On key press.
const uint8_t mods = get_mods();
#ifndef NO_ACTION_ONESHOT
uint8_t shift_mods = (mods | get_oneshot_mods()) & MOD_MASK_SHIFT;
#else
uint8_t shift_mods = mods & MOD_MASK_SHIFT;
#endif // NO_ACTION_ONESHOT
if (shift_mods) { // At least one shift key is held.
registered_key = KC_DEL;
// If one shift is held, clear it from the mods. But if both
// shifts are held, leave as is to send Shift + Del.
if (shift_mods != MOD_MASK_SHIFT) {
#ifndef NO_ACTION_ONESHOT
del_oneshot_mods(MOD_MASK_SHIFT);
#endif // NO_ACTION_ONESHOT
unregister_mods(MOD_MASK_SHIFT);
}
} else {
registered_key = KC_BSPC;
}
register_code(registered_key);
set_mods(mods);
} else { // On key release.
unregister_code(registered_key);
}
} return false;
}
return true;
}
#define _BASE 0
#define _RAISE 1
#define _LOWER 2
#define SFT_ESC SFT_T(KC_ESC)
#define CTL_BSPC CTL_T(KC_BSPC)
#define ALT_SPC ALT_T(KC_SPC)
#define SFT_ENT SFT_T(KC_ENT)
#define KC_ML KC_MS_LEFT
#define KC_MR KC_MS_RIGHT
#define KC_MU KC_MS_UP
#define KC_MD KC_MS_DOWN
#define KC_MB1 KC_MS_BTN1
#define KC_MB2 KC_MS_BTN2
#define RAISE MO(_RAISE)
#define LOWER MO(_LOWER)
const uint16_t PROGMEM keymaps[][MATRIX_ROWS][MATRIX_COLS] = {
[_BASE] = LAYOUT(
KC_Q, KC_W, KC_E, KC_R, KC_T, KC_Y, KC_U, KC_I, KC_O, KC_P,
KC_A, KC_S, KC_D, KC_F, KC_G, KC_H, KC_J, KC_K, KC_L, KC_SCLN,
KC_Z, KC_X, KC_C, KC_V, KC_B, KC_N, KC_M, KC_COMM, KC_DOT, KC_QUOT,
KC_LBRC, KC_RBRC, KC_MINS, KC_EQL,
KC_LCTL, KC_LSFT, KC_TAB, RSFT_T(KC_ESC),
KC_SPC, KC_LALT, KC_ENT, KC_BSPC,
LOWER, KC_LGUI, KC_RGUI, RAISE
),
[_RAISE] = LAYOUT(
QK_BOOT, KC_MPRV, KC_MSTP, KC_MPLY, KC_MNXT, KC_PGDN, MS_BTN1, MS_BTN2, KC_PGUP, KC_VOLU,
_______, MS_LEFT, MS_DOWN, MS_UP, MS_RGHT, KC_LEFT, KC_DOWN, KC_UP, KC_RGHT, KC_MUTE,
_______, MS_WHLL, MS_WHLD, MS_WHLU, MS_WHLR, KC_BSLS, KC_SLSH, KC_LBRC, KC_RBRC, KC_VOLD,
_______, _______, _______, _______,
_______, _______, _______, _______,
_______, _______, _______, _______,
_______, _______, _______, _______
),
[_LOWER] = LAYOUT(
KC_EXLM, KC_AT, KC_HASH, KC_DLR, KC_PERC, KC_CIRC, KC_AMPR, KC_ASTR, KC_LPRN, KC_RPRN,
KC_1, KC_2, KC_3, KC_4, KC_5, KC_6, KC_7, KC_8, KC_9, KC_0,
KC_F1, KC_F2, KC_F3, KC_F4, KC_F5, KC_F6, KC_F7, KC_F8, KC_F9, KC_F10,
KC_F11, KC_F12, KC_GRV, _______,
_______, _______, _______, _______,
_______, _______, _______, _______,
_______, _______, _______, _______
)
};
</pre>
]]></description>
</item>
<item>
<title> 'Serializing data in C'</title>
<guid>https://squi.bid/blog/Serializing-data-in-C/index.html</guid>
<link>https://squi.bid/blog/Serializing-data-in-C/index.html</link>
<pubDate>Sat, 09 Aug 2025 08:50:12 -0400</pubDate>
<description><![CDATA[This post seems to screw up my rss feed. You can read it on my website: https://squi.bid/blog/Serializing-data-in-C/index.html]]></description>
</item>
<item>
<title>Why "suckless" software is important</title>
<guid>https://squi.bid/blog/Why-"suckless"-software-is-important/index.html</guid>
<link>https://squi.bid/blog/Why-"suckless"-software-is-important/index.html</link>
<pubDate>Sun, 14 Jan 2024 20:22:27 -0500</pubDate>
<description><![CDATA[<!DOCTYPE HTML>
<html lang="en">
<title>'Why "suckless" software is important'</title>
<meta name="date" content="2024/01/14">
<link rel="stylesheet" href="/style.css">
<style> html, body {
display: unset !important;
max-width: 80ch;
margin: auto;
} </style>
<body id="blog">
<p>
When it comes to learning how to program there are a few things you can
do:
</p>
<ol>
<li>Read a textbook</li>
<li>Watch videos</li>
<li>Read some source code</li>
</ol>
<p>
Of these options I find the best way to truly learn how to program is to
read someone else's program and try to understand it. For example,
recently I've been working on my own
<a href="https://tools.suckless.org/dmenu">dmenu</a> clone for Wayland.
While working on it, instead of looking for tutorials on how to render
a square using pixman, I decided to take a look at
<a href="https://github.com/djpohly/dtao">dtao</a>, which is a clone of
dzen for Wayland. By just reading the code and messing around with the
program I was able to get an understanding of how rendering is done in
pixman.
</p>
<p>
Now you may be asking yourself something like: "But what does this have to
do with suckless software?" The answer to that is in their philosophy,
which is about "keeping things simple, minimal and usable". The idea of
keeping things minimal and usable allows them to create wonderful
programs that not only work, but also showcase how to do things without
the extra fluff that something like i3 might have.
</p>
<p>
Even if you don't like suckless software it still serves as a great place
to learn how to do the bare minimum. And for those who do enjoy using it,
it can serve as a great starting place to hack upon until you get the
software of your dreams.
</p>
]]></description>
</item>
<item>
<title>What is a squibid?</title>
<guid>https://squi.bid/blog/What-is-a-squibid/index.html</guid>
<link>https://squi.bid/blog/What-is-a-squibid/index.html</link>
<pubDate>Mon, 30 Oct 2023 12:47:05 -0400</pubDate>
<description><![CDATA[<!DOCTYPE HTML>
<html lang="en">
<meta name="date" content="2023/10/30">
<title>'What is a squibid?'</title>
<link rel="stylesheet" href="/style.css">
<link rel="stylesheet" href="/blog/style.css">
<body style="background-color: #161617;">
<p>
Recently, a few people have been asking me: "what is a squibid?" or
"where did your name come from?". In this blog post I will answer those
questions.
<br>
<br>
A few years ago I came up with a drawing of an animal. I didn't have any
reason to do anything with it, but regardless I chose to name it a
squibid. Eventually, when trying to find a good username, I chose squibid
because that would cover both the username and profile picture.
</p>
]]></description>
</item>
<item>
<title>librex and dots</title>
<guid>https://squi.bid/blog/librex-and-dots</guid>
<link>https://squi.bid/blog/librex-and-dots</link>
<pubDate>Tue, 27 Jun 2023 12:17:35 -0400</pubDate>
<description><![CDATA[
<p>
Hello!
<br><br>
In my first post,
<a href="https://squi.bid/blog/state-of-the-site">state of the site</a>, I
talked about a SearXNG instance; however, I found something better! I am now
running a <a href="https://github.com/hnhx/librex/">librex</a> instance @
https://librex.squi.bid. My only modification to the site is changing the theme
to the <a href="https://github.com/kvrohit/mellow-theme">mellow theme</a>.
<br><br>
As for my dots, I have continued to update my Neovim dotfiles, and I am
currently in the process of making some MPV dotfiles. After I am done with
my MPV config I'll get to work on putting together a git repo with my dotfiles
(using submodules for the bigger parts of the config like Neovim).
<br><br>
I will also soon be setting up a Matrix account (not instance), but for now
feel free to <a href="mailto:me@zacharyscheiman.com">email me</a>.
<!-- secret message: there also might be some more blogs coming soon -->
</p>
]]></description>
</item>
<item>
<title>It's Alive!</title>
<guid>https://squi.bid/blog/It's-Alive!</guid>
<link>https://squi.bid/blog/It's-Alive!</link>
<pubDate>Mon, 17 Apr 2023 13:22:03 +0000</pubDate>
<description><![CDATA[
<p>
Cloning via http(s) now works!
<br><br>
btw I will be posting my dotfiles soon™
</p>
]]></description>
</item>
<item>
<title>state of the site</title>
<guid>https://squi.bid/blog/state-of-the-site</guid>
<link>https://squi.bid/blog/state-of-the-site</link>
<pubDate>Sat, 11 Mar 2023 15:00:32 -0500</pubDate>
<description><![CDATA[
<p>
Hello o/, and welcome to my website!<br>
As of right now I am still setting things up. I have a git server running, but I am
still working on getting cloning to work via https. On top of the git server I also
have a cgit instance which I have gotten close to perfect (for some reason the site
is only sometimes in dark mode).
<br><br>
As of right now that is all I've got running but I might be setting up a SearXNG
instance soon.
<br><br>
However, some things that I will never put on my server are: <br>
- social media frontends, e.g. Invidious and Mastodon <br>
- probably some other things that I can't think about right now <br>
</p>
]]></description>
</item>
</channel>
</rss>