The Next Book And Getting Locked Out Of Arxiv

Sun Mar 31, 2019

The votes are in, and we've confirmed with a quorum-reaching meeting that in May we'll be starting a full read-through of Paradigms of Artificial Intelligence Programming, aka PAIP. I've read large chunks of the Norvig book in order to build some Prolog-like logic engines, but I've never gone through the whole thing, and certainly not in a group setting.

This should be an interesting year or two, reading-wise.

If you're inclined to join us, either in Toronto or remotely, feel free to contact either dann or me.

Papers Update

The vellum project also proceeds apace. This week was mostly data-poking. As I mentioned, we're trying to build a model of all computer science papers in an effort to create a CS curriculum generator.

The next part of our process needs a bunch of test data, and I'm sure you can see where this is going based on the title of the post.

We collected a bunch of URLs to various arxiv.org Computer Science papers, then pointed aria2 at them using

aria2c -x 10 --user-agent="Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0" -i list.csv

This was probably not the best idea. Ideally, you'd write a long-running process to extract a bunch of data over a long period of time, kind of like I did with the BGG API a while back. This time, I got seriously ahead of myself and decided that multiple threads pulling papers was a good idea for some reason.
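For the record, the slow-and-steady version I should have written looks roughly like the sketch below. It is not what we actually ran. The function names, the user agent string and the 15 second delay are all placeholders I'm assuming for illustration, and the delay in particular is not a documented arxiv limit. It pulls one URL at a time, sleeps between requests, and skips anything already on disk so the process can be killed and restarted.

```python
import time
import urllib.request
from pathlib import Path


def fetch_url(url, user_agent="vellum-test-fetcher/0.1"):
    """Fetch one URL and return its body as bytes (user agent is a placeholder)."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read()


def download_slowly(urls, out_dir, delay_seconds=15.0,
                    fetch=fetch_url, sleep=time.sleep):
    """Download urls one at a time, pausing between requests.

    delay_seconds is an arbitrary, assumed value; pick something the
    remote site is actually happy with. Files already in out_dir are
    skipped, so the run is resumable.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    made_request = False
    for raw in urls:
        url = raw.strip()
        if not url:
            continue  # tolerate blank lines in the input list
        name = url.rstrip("/").rsplit("/", 1)[-1] or "index"
        target = out / name
        if target.exists():
            continue  # already fetched on a previous run
        if made_request:
            sleep(delay_seconds)  # single connection, one pause per request
        target.write_bytes(fetch(url))
        made_request = True
        saved.append(target)
    return saved
```

Something like download_slowly(open("list.csv"), "papers") would then chew through the same list over a few hours instead of a few minutes.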

Our access got blocked once we hit about 300 papers downloaded, so we did still get our hands on enough test data to start poking around at things. We briefly considered alternative ways of getting a bunch more papers, including arxiv wardriving and the use of AWS Lambda, but ultimately figured that the most ethical approach was to just contact them and see if they'd be willing to fill a terabyte drive or two with Computer Science writings. That said, the other ideas we had seemed like they'd do pretty well at Defcon, SIGBOVIK, or possibly both.

I guess I'll let you know how it goes.


Creative Commons License

all articles at langnostic are licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License

Reprint, rehost and distribute freely (even for profit), but attribute the work and allow your readers the same freedoms.