Artyom's Haskell toolbox (deprecated)

NOTE: this page is deprecated. I am not updating it anymore.


By Artyom Kazak aka @availablegreen
January 3, 2021


🧱 This post is written in Brick — a super easy platform for public notes, which I co-founded. If you try it out, I'd love to hear feedback on Twitter or at artyom@brick.do.



I work as a Haskell consultant at Monadfix. I've been using Haskell for 10 years. Here are the libraries/tools I use.

There are other lists like this. Ping me on Twitter if you have more, or want your own list to be added.


Missing items:

- logging (everything sucks?)
- RPC (I used gRPC-haskell but I think it's not recommended anymore; there's also Mu-Haskell?)
- metrics
- various math stuff
- error handling
- validation
- JWT
- OAuth
- image/audio manipulation
- machine learning
- GUIs
- compilation to JS (likely GHCJS?)
- SMTP/IMAP (probably HaskellNet?)
- URL manipulation (is there anything better than network-uri?)
- profiling
- memory leak search (info table profiling will be nice when it lands in GHC)
- time/distance/etc units
- bytestring builders (see the haskell-perf shootout)
- memoization (chimera for dense and memoize for sparse)
- temp files (temporary? UnliftIO.Temporary?)

The list

Ecosystem search

Hoogle for names, packdeps.haskellers.net/reverse to see how commonly a certain package is used, Serokell's Hackage search for grepping through Hackage.

Well, and sometimes searching through GitHub, too.

GHC Core viewer

ghc-core.

It passes the right flags to GHC for me, and highlights the Core, too.

Prelude

Either plain Prelude, importing things painfully one by one; or base-prelude; or a custom Imports.hs file per project.

Most custom preludes redefine head, or define custom classes, or something like that, and I don't want to deal with it.

Furthermore, in my experience prohibiting unsafe functions from being used doesn't work because people do import Prelude (head) the moment you try to put the slightest obstacle on their path. So I don't like any preludes that aren't made up solely of reexports of established libraries.
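
A reexport-only Imports.hs can be as small as this (the module choices are just an example):

-- Imports.hs: a project-local prelude made only of reexports
module Imports (module X) where

import Prelude as X
import Control.Monad as X
import Data.Foldable as X
import Data.Maybe as X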

Records

record-dot-preprocessor + generic-lens + -XOverloadedLabels.

record-dot-preprocessor gives me nice access syntax, and lenses give me nice #field lenses for updates. It's a pity that I need both (and a pity I need to derive Generic for everything), but whatever. Another point in favor of Ormolu/Fourmolu is that AFAIK it is the only formatter that supports the record-dot-preprocessor syntax.
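
A minimal sketch of the #field lens half of this setup (the User type is made up; with record-dot-preprocessor on top you also get u.name for access):

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedLabels #-}

import Control.Lens ((&), (.~))
import Data.Generics.Labels ()  -- generic-lens' orphan IsLabel instance: #field is a lens
import GHC.Generics (Generic)

data User = User { name :: String, age :: Int }
  deriving (Show, Generic)

rename :: String -> User -> User
rename new u = u & #name .~ new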

If you can't use the IsLabel orphan for whatever reason, you can switch from 'lens' to optics.

Finally, AFAIK large records have quadratic typechecking time (as of Jan 2021). A super-recent solution to this problem is Well-Typed's large-records library.

Anonymous records

jrec (the improved fork of superrecord).

I wrote jrec for internal Juspay purposes: it's faster, easier to maintain, and predictable. I like it. Vinyl has too much stuff in it.

Lenses

lens for apps, microlens for libraries.

I wrote microlens, but I see no reason to use it for apps and I don't get why people do. For libraries, not having a ton of dependencies feels nice though.

I don't think optics is better enough for me to switch, but who knows.

Generic traversals

Uniplate and Data.Data.

Uniplate is a bit slow but is amazing for walking through ASTs and other complex structures. It saves me a ton of time. Data.Data is only necessary in a few cases when Uniplate is not enough.
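
A minimal sketch with a made-up Expr type (universe collects all subterms, transform rewrites bottom-up):

{-# LANGUAGE DeriveDataTypeable #-}

import Data.Data (Data)
import Data.Generics.Uniplate.Data (transform, universe)

data Expr = Lit Int | Neg Expr | Add Expr Expr
  deriving (Show, Data)

-- collect every literal anywhere in the tree
literals :: Expr -> [Int]
literals e = [n | Lit n <- universe e]

-- rewrite double negation away, bottom-up
simplify :: Expr -> Expr
simplify = transform $ \e -> case e of
  Neg (Neg x) -> x
  other -> other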

I have examples at How to walk a complex AST generically.

Generics

generics-sop or generics-eot.

When I need to write a generics-based library (e.g. JSON serialization, etc), I reach for 'generics-eot' because it is super easy to use. See Generics are simple: write your own ToJSON.
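
The whole trick of 'generics-eot' is that it turns any ADT into Eithers-of-tuples, which you can then traverse with ordinary functions. A tiny sketch (Person is made up):

{-# LANGUAGE DeriveGeneric #-}

import Generics.Eot (toEot)
import GHC.Generics (Generic)

data Person = Person { name :: String, age :: Int }
  deriving (Generic)

-- >>> toEot (Person "Ada" 36)
-- Left ("Ada",(36,()))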

However, GHC.Generics (and correspondingly 'generics-eot') increases compilation time, sometimes by too much, so if I have to derive Generic for thousands of types, I will use 'generics-sop' instead. It is harder to use, but has a Template Haskell–based instance generator. The resulting instances compile slower than the GHC.Generics ones, but anything based on those instances (e.g. JSON encoding) compiles significantly faster, e.g. 2.5x faster with -O, at the cost of worse performance at runtime.

Maps/sets

unordered-containers.

I haven't actually evaluated 'containers' vs 'unordered-containers', but I think the latter is faster, and anything I want to use as a key is usually Hashable, so I default to 'unordered-containers'.

Mutable maps/sets

stm-containers.

A great library. It's fast and I can do atomic updates, which is all I want.
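
A minimal sketch (note stm-containers' slightly unusual argument order for insert: value first, then key):

import Control.Concurrent.STM (atomically)
import qualified StmContainers.Map as StmMap

main :: IO ()
main = do
  m <- StmMap.newIO
  atomically $ StmMap.insert (1 :: Int) "hits" m
  atomically (StmMap.lookup "hits" m) >>= print  -- Just 1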

List functions

extra, split, and GHC.Exts.{sortWith,groupWith}.

No particular reason. 'split' is very standard and 'extra' is written by Neil Mitchell so I trust it.

Printf

fmt.

I wrote fmt. It's fast, it has all the formatting primitives I want, and it has the least annoying way to write format strings: format "foo {}, bar {}" foo bar. In practice, I never ended up regretting going for {} instead of something typesafe (although fmt has a typesafe option too).
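
For reference, the bracket style looks like this (a minimal sketch):

{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Fmt

report :: Text -> Int -> Text
report user n = "user " +| user |+ " has " +| n |+ " messages"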

File paths

filepath.

There are more typesafe libraries, but eh. So far I think I haven't regretted going with the most standard library.

File manipulation and scripting

directory and process.

Again, the most standard libraries.

If I want to zip something, I do callProcess "zip" [dest, dir, "-r"] and so on. Generally, I use callProcess for anything that doesn't have a corresponding function in 'directory', and even for things like rm -rf.
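
That zip example in full, as a sketch (callProcess throws if the exit code is non-zero, which is usually what I want in a script):

import System.Process (callProcess)

zipDir :: FilePath -> FilePath -> IO ()
zipDir dest dir = callProcess "zip" [dest, dir, "-r"]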

I might try typed-process next time, which is supposed to be strictly better than 'process', and see how it goes.

I know that a lot of people like turtle for scripting, but I really didn't like it. Figuring out how to do things with 'turtle' was too hard when I tried it.

Command-line arguments

optparse-generic if I don't care about the interface all that much, optparse-applicative if I need to control the interface precisely (which is rare).

'optparse-generic' is like 'optparse-applicative', except that instead of defining everything by hand you can just define an ADT for your command-line interface and it will generate the parser by itself.
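
The example from the optparse-generic documentation shows the whole idea:

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}

import Options.Generic

data Example = Example { foo :: Int, bar :: Double }
  deriving (Generic, Show)

instance ParseRecord Example

main :: IO ()
main = do
  x <- getRecord "Test program"
  print (x :: Example)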

It has limitations, which I just ignore. Yeah, I can't fully control the resulting interface, but I don't care at this point.

Streaming

conduit?

I am not qualified to compare streaming libraries, and I suspect all of them are either painful, incomplete, full of gotchas, or all three. When I used conduit, it was reasonably fine.
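
For reference, a tiny pure conduit pipeline (everything here comes from the Conduit module):

import Conduit

-- sum of the squares of the first 10 odd numbers, streamed
example :: Int
example = runConduitPure $
  yieldMany ([1 ..] :: [Int])
    .| filterC odd
    .| mapC (^ 2)
    .| takeC 10
    .| sumC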

I suspect pipes and streaming are also fine.

If I had to go for the fastest, I would take a look at streamly.

HTML generation

lucid.

It just works and is nicer than blaze-html. A bunch of other HTML libraries popped up recently, but I didn't even look at them.
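
A minimal sketch of what lucid looks like:

{-# LANGUAGE OverloadedStrings #-}

import Lucid

page :: Html ()
page = html_ $ do
  head_ $ title_ "Hello"
  body_ $ p_ [class_ "greeting"] "Hello, world!"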

HTML parsing and scraping

tagsoup.

I don't think you have many other choices. There is also scalpel, which is a wrapper around tagsoup, but I never used it.
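
A minimal scraping sketch with tagsoup:

import Text.HTML.TagSoup

-- pull every link target out of an HTML page
hrefs :: String -> [String]
hrefs page =
  [ fromAttrib "href" tag
  | tag <- parseTags page
  , isTagOpenName "a" tag
  ]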

Serving

servant with Servant.API.Generic.

Once you realize that you can manipulate type-level trees with type families, Servant becomes easy and nice and delightful. I can write custom combinators for things if I want. I can walk the API and collect whatever info I want about it.

Servant also gives me documentation for free (with servant-swagger), and lets me generate accessors for my API with servant-client.
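
A minimal sketch of the record-of-routes style that Servant.API.Generic enables (User and the routes are made up):

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE TypeOperators #-}

import Data.Aeson (ToJSON)
import GHC.Generics (Generic)
import Servant.API
import Servant.API.Generic

data User = User { userId :: Int, userName :: String }
  deriving (Generic)

instance ToJSON User

-- the API as a record of routes
data Routes mode = Routes
  { getUser  :: mode :- "users" :> Capture "id" Int :> Get '[JSON] User
  , allUsers :: mode :- "users" :> Get '[JSON] [User]
  } deriving (Generic)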

HTTP requests

wreq for scripting, http-client-tls for everything else, servant-client if I control both the client and the server.

I like 'wreq' but somehow don't fully trust it, so I use 'http-client-tls' whenever I'm doing anything other than scripting.
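
A wreq scripting sketch (the lenses are re-exported by Network.Wreq):

import Control.Lens ((^.))
import Network.Wreq

main :: IO ()
main = do
  r <- get "http://httpbin.org/get"
  print (r ^. responseStatus . statusCode)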

I don't like req because it seems to be overengineered.

Downloading files

http-conduit-downloader.

BazQux uses it and they seem to have handled all the corner-cases in existence. It seems to be both the easiest to use and the most production-y library that exists, so I'm using it.

Anti-recommendation: download. Doesn't handle TLS, and, well, doesn't even compile for me.

AWS

amazonka.

aws only covers a small subset of AWS, and 'amazonka' covers everything but is sometimes buggy. Pick your poison. (At Wire we picked 'amazonka' and it was mostly fine, but there were some bugs re/ S3 that we didn't know how to work around.)

For Google Cloud, there is gogol from the author of 'amazonka'.

Parsing

megaparsec.

Probably the most common choice now and it's fine for my usecases.
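
A minimal sketch (the key=value format is made up):

import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char

type Parser = Parsec Void String

-- parse a "key=value" pair, e.g. "port=8080"
pair :: Parser (String, String)
pair = (,) <$> some alphaNumChar <* char '=' <*> some alphaNumChar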

Binary serialization

serialise if I control the format, and otherwise I will probably use cereal.

I like 'serialise' because it uses CBOR as the format, and so I'm not locked into Haskell if I want to work with it.
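
A minimal sketch; the Serialise instance comes for free via Generic:

{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric #-}

import Codec.Serialise (Serialise, deserialise, serialise)
import GHC.Generics (Generic)

data Event = Event { source :: String, count :: Int }
  deriving (Show, Generic, Serialise)

-- the bytes are CBOR, so other languages can read them too
roundtrip :: Event -> Event
roundtrip = deserialise . serialise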

If I have to operate with an already existing format, then 'cereal', but something better might have appeared in the meantime.

Compression

zstd if I control the format, and otherwise I don't know.

My understanding is that Facebook's 'zstd' (aka Zstandard) is state of the art. I like using anything that is state of the art.
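
A minimal sketch, assuming the Haskell 'zstd' package's Codec.Compression.Zstd API (level 3 is a common default; decompression returns a sum type because the input may be corrupt):

import qualified Codec.Compression.Zstd as Zstd
import Data.ByteString (ByteString)

pack :: ByteString -> ByteString
pack = Zstd.compress 3

unpack :: ByteString -> Maybe ByteString
unpack bytes = case Zstd.decompress bytes of
  Zstd.Decompressed b -> Just b
  _ -> Nothing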

Tweet at me if you have recommendations re/ other compression formats.

Hashing and cryptography

In cryptonite we trust. (Well, but disable AESNI.)

'cryptonite' is a huge box of crypto primitives. I don't think anything else implements all the things I need, so I either have to go hunting for libraries, or "go with cryptonite for everything". I do the latter.
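
Hashing with cryptonite, as a minimal sketch:

{-# LANGUAGE OverloadedStrings #-}

import Crypto.Hash (Digest, SHA256 (..), hashWith)
import Data.ByteString (ByteString)

digest :: ByteString -> Digest SHA256
digest = hashWith SHA256

-- >>> digest "hello"
-- 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824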

I also worked with Vincent, the author of cryptonite, and I trust him, so there's that.

Randomness

random, or cryptonite if I need secure randomness, or probably pcg-random if I need something super fast, or random-fu if I need a specific distribution.

Usually I don't care and so I go with 'random'. If I do care, it's almost always in the direction of "I need something more secure", and 'random' is not secure for anything crypto-related or generally anything that should be unguessable.
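
A sketch of the boring default (fine for dice, not for tokens):

import System.Random (randomRIO)

rollDie :: IO Int
rollDie = randomRIO (1, 6)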

I haven't actually used pcg-random in anger, but I think it's state of the art when it comes to fast random number generators, and as I mentioned, I really really like state of the art. It's likely better than the Mersenne twister.

Regexes

Probably pcre-heavy or pcre2. And if I don't actually need regexes per se, just something that does arbitrary search and replacement, then megaparsec + replace-megaparsec.

Usually, if I want regexes, it's because I want convenience for myself or for my users. This means PCRE over Posix. I think 'pcre-heavy' is alright, but while writing this post I found 'pcre2' and it seems to be better still.

If I want my regexes to be understandable, then I don't actually want regexes, I want parsers. Hence 'replace-megaparsec'.
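
A minimal 'replace-megaparsec' sketch: search-and-replace driven by a real parser instead of a regex:

import Data.Void (Void)
import Replace.Megaparsec (streamEdit)
import Text.Megaparsec (Parsec)
import Text.Megaparsec.Char.Lexer (decimal)

-- double every integer in the input
doubleNumbers :: String -> String
doubleNumbers = streamEdit (decimal :: Parsec Void String Int) (show . (* 2))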

Anti-recommendations: regex-tdfa and text-icu. Both are buggy. 'regex-tdfa' in particular is based on a tricky algorithm and the original author does not maintain it anymore, so there isn't much chance that any logic bugs in it will be fixed.

Unicode text manipulation

unicode-transforms for normalization; probably unicode-collation for collation; probably unicode-data for the Unicode character info; text-icu for everything else.

'text-icu' binds to the mature and powerful ICU library. Everybody has been using it for a long while. However, a) 'text-icu' hasn't been updated since 2015, and b) depending on foreign libraries can be a pain. This is why the community is trying to write pure Haskell replacements for the most commonly needed ICU features.

Note that 'unicode-transforms' is (AFAIK) slightly faster than 'text-icu' for normalization. 'unicode-collation' is 4x slower (as of Apr 2021).
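
Normalization with 'unicode-transforms' is a one-liner:

import Data.Text (Text)
import Data.Text.Normalize (NormalizationMode (NFC), normalize)

-- 'e' plus a combining acute collapses into the precomposed é
nfc :: Text -> Text
nfc = normalize NFC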

Also note that 'text-icu' is buggy in places. For instance, the .Break module is buggy in a non-deterministic way — see issues #4 and #19. Doesn't quite inspire confidence. Beware.

Markdown

Probably commonmark.

A year ago, my recommendation would have been cmark if I wanted a standard-compliant parser, and mmark if I wanted something powerful for my own use.

However, I just looked and it seems that 'commonmark' is better than both.

A nice thing is that you can convert commonmark's types into Pandoc types (see commonmark-pandoc) and then use Pandoc's entire ecosystem.

Concurrency

async + STM where possible.

'async' is not super super easy to use, but it's fine.

I also like atomic updates a lot, so I use 'stm' very liberally. My monad stack usually has a Reader with a bunch of TVars in it.
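
A minimal sketch of that combination:

import Control.Concurrent.Async (concurrently_)
import Control.Concurrent.STM

main :: IO ()
main = do
  counter <- newTVarIO (0 :: Int)
  let bump = atomically (modifyTVar' counter (+ 1))
  -- two threads updating shared state atomically
  concurrently_ bump bump
  readTVarIO counter >>= print  -- 2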

I usually don't use parallelism (par etc).

Retrying IO actions

retry.

Not qualified to compare with other libs, but 'retry' is good.
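
A minimal sketch (recoverAll retries on any exception; the policy below is 5 attempts with exponential backoff starting at 50 ms):

import Control.Retry

withRetries :: IO a -> IO a
withRetries action =
  recoverAll (exponentialBackoff 50000 <> limitRetries 5) (\_ -> action)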

Keyword arguments

named instead of records or newtypes.

If you want to make functions like foo :: Text -> Text -> Text safer to use (by adding names), you can either define a new record per function, or define a newtype per kind of argument, or use 'named'. I think 'named' is a better solution.

createSymLink ::
  "from" :! FilePath ->
  "to" :! FilePath ->
  IO ()
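
Call sites then name the arguments explicitly. A sketch of both ends (the createFileLink body is just a stand-in implementation):

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE OverloadedLabels #-}
{-# LANGUAGE TypeOperators #-}

import Named
import System.Directory (createFileLink)

createSymLink :: "from" :! FilePath -> "to" :! FilePath -> IO ()
createSymLink (Arg from) (Arg to) = createFileLink from to

main :: IO ()
main = createSymLink ! #from "/tmp/a" ! #to "/tmp/b"
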
Parsing Haskell

ghc-lib-parser, hands down.

ghc-lib-parser is a copy of GHC's own parser, updated regularly. Ormolu uses it. HLint uses it. There is no reason to use (buggy) haskell-src-exts anymore.

Prettyprinting

prettyprinter.

If I want to pretty-print an AST into source form, or something like that, I go with 'prettyprinter'. I haven't tried other libraries much, but it says "modern" and "maintained well", so I like it.
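
A minimal sketch (recent releases call the module Prettyprinter; older ones use Data.Text.Prettyprint.Doc):

{-# LANGUAGE OverloadedStrings #-}

import Data.Text.Prettyprint.Doc

-- renders on one line if it fits, otherwise breaks and indents
doc :: Doc ann
doc = nest 2 (sep ["define", parens (sep ["x", "y"]), "body"])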

C preprocessor

cpphs.

Sometimes using a C preprocessor (#ifdef etc) is nice, but the Clang preprocessor is different from the GCC preprocessor and overall it's a minefield. (See my guide for using CPP with Haskell.)

When I do anything even remotely nontrivial, I make sure that my code is preprocessed with 'cpphs' and not the system-wide preprocessor.

DIY build system

Shake.

Shake is a Haskell DSL for writing build systems. If you want something Make-like but don't want to learn a new DSL, you might like Shake. I used it for a static site generator and it was nice.
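
A sketch of a Shake rule (pandoc and the paths are made up):

import Development.Shake
import Development.Shake.FilePath

main :: IO ()
main = shakeArgs shakeOptions $ do
  want ["out/index.html"]
  "out/*.html" %> \out -> do
    let src = "pages" </> takeBaseName out <.> "md"
    need [src]
    cmd_ "pandoc" ["-o", out, src]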

Benchmarking

Criterion? Or maybe tasty-bench?

Criterion is the de-facto benchmarking library in Haskell. tasty-bench is a lightweight alternative.
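
The classic Criterion skeleton ('tasty-bench' deliberately mirrors this API):

import Criterion.Main

fib :: Int -> Int
fib n = if n < 2 then n else fib (n - 1) + fib (n - 2)

main :: IO ()
main = defaultMain
  [ bgroup "fib"
      [ bench "10" (nf fib 10)
      , bench "20" (nf fib 20)
      ]
  ]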

I used to recommend gauge, a leaner fork of Criterion, but I am told that nowadays Criterion is actually maintained better than 'gauge'. I checked in Apr 2021 and it seems to be the case.

Update Apr 3, 2021: tasty-bench is a new lightweight library with a Criterion-compatible API and an out-of-the-box ability to compare benchmark results with previous results. The 'text' library has recently switched to 'tasty-bench'. I might consider it for my next project.

Date and time

time for general-purpose stuff, and clock >=0.8.2 for timestamps specifically.

I don't like 'time'. A lot of people try 'time' because it's the standard, and then say "huh, this is harder than I thought". This said, I don't know of any other popular options. The last release of thyme, for instance, was in 2014.

'clock' is a good option for timestamps, because it lets you decide on the precision/speed tradeoff.
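
A minimal timing sketch with 'clock':

import System.Clock

-- wall-clock duration of an action, via the monotonic clock
timed :: IO a -> IO (a, TimeSpec)
timed act = do
  start <- getTime Monotonic
  result <- act
  end <- getTime Monotonic
  pure (result, diffTimeSpec end start)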