When people want to do web scraping in Clojure, the library that tutorials and recommendations usually point to is Enlive. (Example, and another, and there are at least two scraping libraries built on top of Enlive, Pegasus and Skyscraper.)
But Enlive doesn't really seem to be built for scraping. For example, it's very difficult to actually get the text (the rough equivalent of the browser API document.innerText, minus ajax-loaded content) out of an HTML document, and when you can get text, it comes out badly formatted. E.g., if you just pull all the text from the body tag, you don't get spaces between things like table rows and columns. The best I can come up with to get decently formatted text without just walking all the individual DOM nodes myself is the following tangled mess (where html is the Enlive html namespace and I've brought trim in from clojure.string):
I've been writing a lot of Clojure for almost two years now, but I never actually learned Java until recently, and primarily for the purpose of being able to use Java libraries in Clojure.
Until I actually learned Java, I found interop (calling Java from Clojure code) the most difficult part of Clojure, mainly because Java has a lot of mental complexity that you need to take on. IO, for example, tends to involve all kinds of BufferedThis and StreamingThat and casting across like 5 different types before you can do anything. Most tutorials on Clojure-Java interop assume you know that stuff, and also assume you know, e.g., what a static method is, and that a "class" is the name for both the basic unit of organization of Java source code and the compiled output files. Stuff like that.
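To make the BufferedThis/StreamingThat point concrete, here's a minimal sketch of reading lines of text in plain Java. The class name `LineReaderSketch` and the method `readLines` are made up for illustration; the wrapped classes (`InputStream`, `InputStreamReader`, `BufferedReader`) are the standard `java.io` ones. It also shows what a static method looks like.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, used only for this illustration.
class LineReaderSketch {
    // A static method belongs to the class itself, so it is called as
    // LineReaderSketch.readLines(...) without creating an instance first.
    static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        // InputStream yields raw bytes; InputStreamReader decodes them into
        // characters; BufferedReader adds buffering plus readLine().
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for a file or network stream.
        InputStream in = new ByteArrayInputStream(
                "first\nsecond\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readLines(in)); // prints [first, second]
    }
}
```

From Clojure, a static method like this would be called as `(LineReaderSketch/readLines in)`, and the constructors would be written `(BufferedReader. (InputStreamReader. in))`, which is exactly the kind of thing that's hard to follow if you don't already know what the Java side is doing.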
This has a lot of steps to it, but the browser is a surprisingly good platform for doing it (and an unbeatable platform for delivery).
Here's a situation that comes up a lot. I want to throw together a quick CRUD app, or, for my uses, more of a C app (i.e., "hey research assistants, dump all this stuff into a database"). I could use a Google Form or something, but that always leads to weird output, like bizarre formatting, over in Google Sheets.
So I'm getting into Flask on Heroku for this kind of thing, for several reasons:
I'm learning Java, for two reasons.
First, I really like Clojure, but I constantly run into barriers where the only way to do something is to drop down to Java code that I don't understand.