The Easy Way to Get Your Data from Heroku Postgres (in Python)
Just a tiny little tidbit here. I do a lot of data collection in one-off Heroku apps, and one annoying thing that I find myself having to deal with is getting data in and out.
Generally, if you want to upload or download your data in its entirety, you can either use Heroku's pg:push and pg:pull commands (see documentation) to transmit data between remote and local Postgres instances, or use Postgres backups. To download a CSV, you can connect directly to psql (the Postgres shell). And to upload a CSV, again in psql, you can do something like:
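One common way to do this kind of CSV transfer is psql's \copy meta-command, which reads or writes a file on the client machine. The following is only a minimal Python sketch that builds those command strings; it is not necessarily the snippet this post goes on to use. The helper names (copy_from_csv, copy_to_csv) and the table and file names are hypothetical, and it assumes the CSV has a header row and that table and column names are plain unquoted identifiers.

```python
import re

# Only plain identifiers are accepted, so nothing needs SQL quoting.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check_ident(name):
    """Return name unchanged if it is a simple identifier, else raise."""
    if not _IDENT.match(name):
        raise ValueError(f"not a simple identifier: {name!r}")
    return name


def _quote_path(path):
    """Single-quote a file path; refuse characters that would need escaping."""
    if "'" in path or "\\" in path:
        raise ValueError(f"unsupported character in path: {path!r}")
    return f"'{path}'"


def copy_from_csv(table, csv_path, columns=None):
    """Build a psql command that loads a local CSV file into a table."""
    cols = ""
    if columns:
        cols = " (" + ", ".join(_check_ident(c) for c in columns) + ")"
    return (f"\\copy {_check_ident(table)}{cols} FROM {_quote_path(csv_path)} "
            "WITH (FORMAT csv, HEADER true)")


def copy_to_csv(table, csv_path):
    """Build a psql command that dumps a table to a local CSV file."""
    return (f"\\copy {_check_ident(table)} TO {_quote_path(csv_path)} "
            "WITH (FORMAT csv, HEADER true)")


if __name__ == "__main__":
    print(copy_from_csv("readings", "readings.csv", ["ts", "value"]))
    print(copy_to_csv("readings", "backup.csv"))
```

The printed lines can be pasted at a psql prompt, for example one opened with heroku pg:psql.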
Experimental post: jupyter notebook --> cryogen post
This is an experiment to see if I can get a Jupyter notebook to play nicely with Cryogen. It looks like it works! The workflow is to use nbconvert to turn the notebook into markdown, add the required Cryogen header map, and then inline the images as base64.
The formatting is far from pretty, alas.
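For the curious, the last two steps of that workflow can be sketched in a few lines of Python. This is a rough sketch rather than the script actually used for this post: it assumes the markdown has already been produced (e.g. with jupyter nbconvert --to markdown), that images appear as ordinary markdown image links, and that the image bytes are passed in as a dict keyed by the path used in the link. The function names and the exact header keys chosen here (:title, :layout, :tags) are illustrative.

```python
import base64
import mimetypes
import re

# Matches markdown image links of the form ![alt](path)
IMAGE_LINK = re.compile(r"!\[([^\]]*)\]\(([^)\s]+)\)")


def edn_string(text):
    """Render text as a double-quoted EDN string."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def cryogen_header(title, tags=()):
    """Build an EDN metadata map to put at the top of the post."""
    tag_list = " ".join(edn_string(t) for t in tags)
    return f"{{:title {edn_string(title)}\n :layout :post\n :tags [{tag_list}]}}\n\n"


def inline_images(markdown, images):
    """Replace links to known image files with base64 data URIs.

    images maps the path used in the markdown to the raw image bytes.
    Links whose path is not in images are left untouched.
    """
    def replace(match):
        alt, path = match.group(1), match.group(2)
        if path not in images:
            return match.group(0)
        mime = mimetypes.guess_type(path)[0] or "image/png"
        payload = base64.b64encode(images[path]).decode("ascii")
        return f"![{alt}](data:{mime};base64,{payload})"

    return IMAGE_LINK.sub(replace, markdown)


def to_cryogen_post(markdown, images, title, tags=()):
    """Prepend the header map and inline the images."""
    return cryogen_header(title, tags) + inline_images(markdown, images)
```

In real use the images dict would be filled by reading the image files that nbconvert writes next to the markdown file.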
For Clojure Webscraping, Try Jsoup!
When people want to do webscraping in Clojure, the standard recommendation/tutorial library is Enlive. (Example, and another, and there are at least two scraping libraries built on top of Enlive, Pegasus and Skyscraper.)
But Enlive doesn't seem to be really built for scraping. For example, it's very difficult to actually get the text (the rough equivalent of the browser API's document.innerText, minus ajax-loaded content) out of an HTML document, and when you can get text, it comes out badly formatted: e.g., if you just pull all the text from the body tag, you don't get spaces between things like table rows and columns. The best I can come up with to get decently formatted text without just walking all the individual DOM nodes myself is the following tangled mess (where html is the Enlive html namespace and I've brought replace and trim in from clojure.string):
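The spacing problem itself is easy to reproduce outside Enlive. As a rough illustration only (this is Python's standard-library html.parser, not Enlive or Jsoup, and not the Clojure code referred to above), the sketch below concatenates text nodes two ways: naively, and with a space added after each closing cell, row, or block tag. The class name, function name, and the set of tags treated as boundaries are all assumptions made for the example.

```python
from html.parser import HTMLParser

# Closing one of these tags is treated as a word boundary (an arbitrary choice).
BREAK_AFTER = {"td", "th", "tr", "p", "div", "li", "h1", "h2", "h3"}


class TextCollector(HTMLParser):
    """Collect text nodes, optionally adding a space after boundary tags."""

    def __init__(self, separate):
        super().__init__()
        self.separate = separate
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

    def handle_endtag(self, tag):
        if self.separate and tag in BREAK_AFTER:
            self.parts.append(" ")


def extract_text(html, separate=True):
    """Return the document's text with runs of whitespace collapsed."""
    parser = TextCollector(separate)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


if __name__ == "__main__":
    table = ("<table><tr><td>Name</td><td>Age</td></tr>"
             "<tr><td>Ada</td><td>36</td></tr></table>")
    print(extract_text(table, separate=False))  # cells run together
    print(extract_text(table, separate=True))   # cells separated by spaces
```

With separate=False the table cells are glued together, which is the badly formatted output described above; handling it properly means tracking element boundaries yourself.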
Understanding Java for Clojurists, side-by-side
I've been writing a lot of Clojure for almost two years now, but I never actually learned Java until recently, and primarily for the purpose of being able to use Java libraries in Clojure.
Until actually learning Java, I found interop (calling Java from Clojure code) the most difficult part of Clojure, mainly because Java has a lot of mental complexity that you need to take on. IO, for example, tends to involve all kinds of BufferedThis and StreamingThat and casting across like 5 different types before you can do anything. Most tutorials on Clojure-Java interop assume you know that stuff, and also assume you know, e.g., what a static method is, and that a "class" is the name for both the basic organization of Java source code and the compiled output files. Stuff like that.
I think I've figured out how to change text encodings in the browser. I'm not sure, because it's text encodings, but maybe?
So I'm working on a slightly irrational project, namely, extracting citations from MS-Word formatted law review articles and generating a BibTeX download plus on-screen display, in pure client-side JavaScript.
This has a lot of steps to it, but the browser is a surprisingly good platform for doing it (and an unbeatable platform for delivery).