The votes are in: at a meeting that reached quorum, we confirmed that in May we'll start a full read-through of Paradigms of Artificial Intelligence Programming, aka
PAIP. I've read large chunks of the Norvig book to build some Prolog-like logic engines, but I've never gone through the whole thing, and certainly not in a group setting.
This should be an interesting year or two, reading-wise.
The vellum project also proceeds apace. This week was mostly data-poking. As I mentioned, we're trying to build a model of all computer science papers as the basis for a CS curriculum generator.
The next part of our process needs a bunch of test data, and I'm sure you can see where this is going based on the title of the post.
aria2c -x 10 --user-agent="Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0" -i list.csv
This was probably not the best idea. Ideally, you'd write a long-running process that pulls data slowly and steadily over an extended period, kind of like I did with the BGG API a while back. This time, I got seriously ahead of myself and decided, for some reason, that pulling papers with multiple threads was a good idea.
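For contrast, here is a minimal Python sketch of the slower, long-running approach. It assumes list.csv holds one URL per line (the shape aria2c's `-i` flag expects). The function name `polite_download`, the delay and backoff values, and the User-Agent string are all hypothetical choices for illustration, not what vellum or the BGG scraper actually used. It fetches one file at a time, pauses between requests, skips files already on disk so it can be stopped and restarted, and backs off (then gives up) when the server answers 403 or 429.

```python
import time
import urllib.error
import urllib.request
from pathlib import Path
from typing import Callable, Iterable

# Hypothetical User-Agent: identify the client honestly instead of posing as a browser.
USER_AGENT = "paper-fetcher/0.1 (contact: you@example.com)"


def fetch_url(url: str) -> bytes:
    """Fetch one URL with a descriptive User-Agent header."""
    req = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return resp.read()


def polite_download(
    urls: Iterable[str],
    out_dir: Path,
    fetch: Callable[[str], bytes] = fetch_url,
    sleep: Callable[[float], None] = time.sleep,
    delay: float = 30.0,         # hypothetical pause between requests, in seconds
    max_backoff: float = 3600.0, # hypothetical ceiling before giving up entirely
) -> list[str]:
    """Download URLs sequentially; return the URLs saved during this run."""
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in urls:
        # Name the file after the last path segment of the URL.
        target = out_dir / url.rstrip("/").rsplit("/", 1)[-1]
        if target.exists():
            continue  # already fetched on a previous run
        wait = delay
        while True:
            try:
                data = fetch(url)
                break
            except urllib.error.HTTPError as err:
                if err.code not in (403, 429):
                    raise  # unexpected error: surface it rather than guess
                wait *= 2  # exponential backoff on "forbidden" / "too many requests"
                if wait > max_backoff:
                    return saved  # server keeps refusing: stop instead of hammering it
                sleep(wait)
        target.write_bytes(data)
        saved.append(url)
        sleep(delay)  # fixed pause after every successful download
    return saved
```

Something like `polite_download(Path("list.csv").read_text().split(), Path("papers"))` would run it. Injecting `fetch` and `sleep` as parameters is just a convenience so the loop can be exercised without touching the network.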
We got blocked after roughly 300 papers, but that was still enough test data to start poking around. We briefly considered other ways of getting more papers, including arXiv wardriving and AWS Lambda, but ultimately decided that the most ethical approach was to contact them and ask whether they'd be willing to fill a terabyte drive or two with computer science papers. That said, the other ideas would probably have gone over well at DEF CON, SIGBOVIK, or possibly both.
I guess I'll let you know how it goes.