Artyom's Haskell toolbox (deprecated)

NOTE: this page is deprecated. I am not updating it anymore.


By Artyom Kazak aka @availablegreen
January 3, 2021


🧱 This post is written in Brick — a super easy platform for public notes, which I co-founded. If you try it out, I'd love to hear feedback on Twitter or at artyom@brick.do.



I work as a Haskell consultant at Monadfix. I've been using Haskell for 10 years. Here are the libraries/tools I use.

There are other lists like this. Ping me on Twitter if you have more, or want your own list to be added.


Missing items:

- logging (everything sucks?)
- RPC (I used gRPC-haskell but I think it's not recommended anymore; there's also Mu-Haskell?)
- metrics
- various math stuff
- error handling
- validation
- JWT
- OAuth
- image/audio manipulation
- machine learning
- GUIs
- compilation to JS (likely GHCJS?)
- SMTP/IMAP (probably HaskellNet?)
- URL manipulation (is there anything better than network-uri?)
- profiling
- memory leak search (info table profiling will be nice when it lands in GHC)
- time/distance/etc units
- bytestring builders (see the haskell-perf shootout)
- memoization (chimera for dense and memoize for sparse)
- temp files (temporary? UnliftIO.Temporary?)

The list

Ecosystem search

Hoogle for names, packdeps.haskellers.net/reverse to see how commonly a certain package is used, Serokell's Hackage search for grepping through Hackage.

Well, and sometimes searching through GitHub, too.

GHC Core viewer

ghc-core.

It passes the right flags to GHC for me, and highlights the Core, too.

Prelude

Either plain Prelude, importing things painfully one by one; or base-prelude; or a custom Imports.hs file per project.

Most custom preludes redefine head, or define custom classes, or something like that, and I don't want to deal with it.

Furthermore, in my experience prohibiting unsafe functions from being used doesn't work because people do import Prelude (head) the moment you try to put the slightest obstacle on their path. So I don't like any preludes that aren't made up solely of reexports of established libraries.
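
A reexport-only Imports.hs can be as small as this (the module choices are just an example):

-- Imports.hs: a project-local prelude made only of reexports
module Imports (module X) where

import Prelude as X
import Control.Monad as X
import Data.Foldable as X
import Data.Maybe as X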

Records

record-dot-preprocessor + generic-lens + -XOverloadedLabels.

record-dot-preprocessor gives me nice access syntax, and lenses give me nice #field lenses for updates. It's a pity that I need both (and a pity I need to derive Generic for everything), but whatever. Another point in favor of Ormolu/Fourmolu is that AFAIK it is the only formatter that supports the record-dot-preprocessor syntax.
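
A minimal sketch of the #field lens half of this setup (the User type is made up; with record-dot-preprocessor on top you also get u.name for access):

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedLabels #-}

import Control.Lens ((&), (.~))
import Data.Generics.Labels ()  -- generic-lens' orphan IsLabel instance: #field is a lens
import GHC.Generics (Generic)

data User = User { name :: String, age :: Int }
  deriving (Show, Generic)

rename :: String -> User -> User
rename new u = u & #name .~ new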

If you can't use the IsLabel orphan for whatever reason, you can switch from 'lens' to optics.

Finally, AFAIK large records have quadratic typechecking time (as of Jan 2021). A super-recent solution to this problem is Well-Typed's large-records library.

Anonymous records

jrec (the improved fork of superrecord).

I wrote jrec for internal Juspay purposes: it's faster, easier to maintain, and predictable. I like it. Vinyl has too much stuff in it.

Lenses

lens for apps, microlens for libraries.

I wrote microlens, but I see no reason to use it for apps and I don't get why people do. For libraries, not having a ton of dependencies feels nice though.

I don't think optics is better enough for me to switch, but who knows.

Generic traversals

Uniplate and Data.Data.

Uniplate is a bit slow but is amazing for walking through ASTs and other complex structures. It saves me a ton of time. Data.Data is only necessary in a few cases when Uniplate is not enough.
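
A minimal sketch with a made-up Expr type (universe collects all subterms, transform rewrites bottom-up):

{-# LANGUAGE DeriveDataTypeable #-}

import Data.Data (Data)
import Data.Generics.Uniplate.Data (transform, universe)

data Expr = Lit Int | Neg Expr | Add Expr Expr
  deriving (Show, Data)

-- collect every literal anywhere in the tree
literals :: Expr -> [Int]
literals e = [n | Lit n <- universe e]

-- rewrite double negation away, bottom-up
simplify :: Expr -> Expr
simplify = transform $ \e -> case e of
  Neg (Neg x) -> x
  other -> other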

I have examples at How to walk a complex AST generically.

Generics

generics-sop or generics-eot.

When I need to write a generics-based library (e.g. JSON serialization, etc), I reach for 'generics-eot' because it is super easy to use. See Generics are simple: write your own ToJSON.
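
The whole trick of 'generics-eot' is that it turns any ADT into Eithers-of-tuples, which you can then traverse with ordinary functions. A tiny sketch (Person is made up):

{-# LANGUAGE DeriveGeneric #-}

import Generics.Eot (toEot)
import GHC.Generics (Generic)

data Person = Person { name :: String, age :: Int }
  deriving (Generic)

-- >>> toEot (Person "Ada" 36)
-- Left ("Ada",(36,()))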

However, GHC.Generics (and correspondingly 'generics-eot') increases compilation time, sometimes by too much, so if I have to derive Generic for thousands of types, I will use 'generics-sop' instead. It is harder to use, but has a Template Haskell–based instance generator. The resulting instances compile slower than the GHC.Generics ones, but anything based on those instances (e.g. JSON encoding) compiles significantly faster, e.g. 2.5x faster with -O, at the cost of worse performance at runtime.

Maps/sets

unordered-containers.

I haven't actually evaluated 'containers' vs 'unordered-containers', but I think the latter is faster, and anything I want to use as a key is usually Hashable, so I default to 'unordered-containers'.

Mutable maps/sets

stm-containers.

A great library. It's fast and I can do atomic updates, which is all I want.
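
A minimal sketch (note stm-containers' slightly unusual argument order for insert: value first, then key):

import Control.Concurrent.STM (atomically)
import qualified StmContainers.Map as StmMap

main :: IO ()
main = do
  m <- StmMap.newIO
  atomically $ StmMap.insert (1 :: Int) "hits" m
  atomically (StmMap.lookup "hits" m) >>= print  -- Just 1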

List functions

extra, split, and GHC.Exts.{sortWith,groupWith}.

No particular reason. 'split' is very standard and 'extra' is written by Neil Mitchell so I trust it.

Printf

fmt.

I wrote fmt. It's fast, it has all the formatting primitives I want, and it has the least annoying way to write format strings: format "foo {}, bar {}" foo bar. In practice, I never ended up regretting going for {} instead of something typesafe (although fmt has a typesafe option too).
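
For reference, the bracket style looks like this (a minimal sketch):

{-# LANGUAGE OverloadedStrings #-}

import Data.Text (Text)
import Fmt

report :: Text -> Int -> Text
report user n = "user " +| user |+ " has " +| n |+ " messages"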

File paths

filepath.

There are more typesafe libraries, but eh. So far I think I haven't regretted going with the most standard library.

File manipulation and scripting

directory and process.

Again, the most standard libraries.

If I want to zip something, I do callProcess "zip" [dest, dir, "-r"] and so on. Generally, I use callProcess for anything that doesn't have a corresponding function in 'directory', and even for things like rm -rf.
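
That zip example in full, as a sketch (callProcess throws if the exit code is non-zero, which is usually what I want in a script):

import System.Process (callProcess)

zipDir :: FilePath -> FilePath -> IO ()
zipDir dest dir = callProcess "zip" [dest, dir, "-r"]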

I might try typed-process next time, which is supposed to be strictly better than 'process', and see how it goes.

I know that a lot of people like turtle for scripting, but I really didn't like it. Figuring out how to do things with 'turtle' was too hard when I tried it.

Command-line arguments

optparse-generic if I don't care about the interface all that much, optparse-applicative if I need to control the interface precisely (which is rare).

'optparse-generic' is like 'optparse-applicative', except that instead of defining everything by hand you can just define an ADT for your command-line interface and it will generate the parser by itself.
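
The example from the optparse-generic documentation shows the whole idea:

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}

import Options.Generic

data Example = Example { foo :: Int, bar :: Double }
  deriving (Generic, Show)

instance ParseRecord Example

main :: IO ()
main = do
  x <- getRecord "Test program"
  print (x :: Example)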

It has limitations, which I just ignore. Yeah, I can't fully control the resulting interface, but I don't care at this point.

Streaming

conduit?

I am not qualified to compare streaming libraries, and I suspect all of them are either painful, incomplete, full of gotchas, or all three. When I used conduit, it was reasonably fine.
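
For reference, a tiny pure conduit pipeline (everything here comes from the Conduit module):

import Conduit

-- sum of the squares of the first 10 odd numbers, streamed
example :: Int
example = runConduitPure $
  yieldMany ([1 ..] :: [Int])
    .| filterC odd
    .| mapC (^ 2)
    .| takeC 10
    .| sumC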

I suspect pipes and streaming are also fine.

If I had to go for the fastest, I would take a look at streamly.

HTML generation

lucid.

It just works and is nicer than blaze-html. A bunch of other HTML libraries popped up recently, but I didn't even look at them.
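
A minimal sketch of what lucid looks like:

{-# LANGUAGE OverloadedStrings #-}

import Lucid

page :: Html ()
page = html_ $ do
  head_ $ title_ "Hello"
  body_ $ p_ [class_ "greeting"] "Hello, world!"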

HTML parsing and scraping

tagsoup.

I don't think you have many other choices. There is also scalpel, which is a wrapper around tagsoup, but I never used it.
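
A minimal scraping sketch with tagsoup:

import Text.HTML.TagSoup

-- pull every link target out of an HTML page
hrefs :: String -> [String]
hrefs page =
  [ fromAttrib "href" tag
  | tag <- parseTags page
  , isTagOpenName "a" tag
  ]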

Serving

servant with Servant.API.Generic.

Once you realize that you can manipulate type-level trees with type families, Servant becomes easy and nice and delightful. I can write custom combinators for things if I want. I can walk the API and collect whatever info I want about it.

Servant also gives me documentation for free (with servant-swagger), and lets me generate accessors for my API with servant-client.
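
A minimal sketch of the record-of-routes style that Servant.API.Generic enables (User and the routes are made up):

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE TypeOperators #-}

import Data.Aeson (ToJSON)
import GHC.Generics (Generic)
import Servant.API
import Servant.API.Generic

data User = User { userId :: Int, userName :: String }
  deriving (Generic)

instance ToJSON User

-- the API as a record of routes
data Routes mode = Routes
  { getUser  :: mode :- "users" :> Capture "id" Int :> Get '[JSON] User
  , allUsers :: mode :- "users" :> Get '[JSON] [User]
  } deriving (Generic)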

HTTP requests

wreq for scripting, http-client-tls for everything else, servant-client if I control both the client and the server.

I like 'wreq' but somehow don't fully trust it, so I use 'http-client-tls' whenever I'm doing anything other than scripting.
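
A wreq scripting sketch (the lenses are re-exported by Network.Wreq):

import Control.Lens ((^.))
import Network.Wreq

main :: IO ()
main = do
  r <- get "http://httpbin.org/get"
  print (r ^. responseStatus . statusCode)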

I don't like req because it seems to be overengineered.

Downloading files

http-conduit-downloader.

BazQux uses it and they seem to have handled all the corner-cases in existence. It seems to be both the easiest to use and the most production-y library that exists, so I'm using it.

Anti-recommendation: download. Doesn't handle TLS, and, well, doesn't even compile for me.

AWS

amazonka.

aws only covers a small subset of AWS, and 'amazonka' covers everything but is sometimes buggy. Pick your poison. (At Wire we picked 'amazonka' and it was mostly fine, but there were some bugs re/ S3 that we didn't know how to work around.)

For Google Cloud, there is gogol from the author of 'amazonka'.

Parsing

megaparsec.

Probably the most common choice now and it's fine for my usecases.
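
A minimal sketch (the key=value format is made up):

import Data.Void (Void)
import Text.Megaparsec
import Text.Megaparsec.Char

type Parser = Parsec Void String

-- parse a "key=value" pair, e.g. "port=8080"
pair :: Parser (String, String)
pair = (,) <$> some alphaNumChar <* char '=' <*> some alphaNumChar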

Binary serialization

serialise if I control the format, and otherwise I will probably use cereal.

I like 'serialise' because it uses CBOR as the format, and so I'm not locked into Haskell if I want to work with it.
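
A minimal sketch; the Serialise instance comes for free via Generic:

{-# LANGUAGE DeriveAnyClass #-}
{-# LANGUAGE DeriveGeneric #-}

import Codec.Serialise (Serialise, deserialise, serialise)
import GHC.Generics (Generic)

data Event = Event { source :: String, count :: Int }
  deriving (Show, Generic, Serialise)

-- the bytes are CBOR, so other languages can read them too
roundtrip :: Event -> Event
roundtrip = deserialise . serialise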

If I have to operate with an already existing format, then 'cereal', but something better might have appeared in the meantime.

Compression

zstd if I control the format, and otherwise I don't know.

My understanding is that Facebook's 'zstd' (aka Zstandard) is state of the art. I like using anything that is state of the art.
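
A minimal sketch, assuming the Haskell 'zstd' package's Codec.Compression.Zstd API (level 3 is a common default; decompression returns a sum type because the input may be corrupt):

import qualified Codec.Compression.Zstd as Zstd
import Data.ByteString (ByteString)

pack :: ByteString -> ByteString
pack = Zstd.compress 3

unpack :: ByteString -> Maybe ByteString
unpack bytes = case Zstd.decompress bytes of
  Zstd.Decompressed b -> Just b
  _ -> Nothing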

Tweet at me if you have recommendations re/ other compression formats.

Hashing and cryptography

In cryptonite we trust. (Well, but disable AESNI.)

'cryptonite' is a huge box of crypto primitives. I don't think anything else implements all the things I need, so I either have to go hunting for libraries, or "go with cryptonite for everything". I do the latter.
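
Hashing with cryptonite, as a minimal sketch:

{-# LANGUAGE OverloadedStrings #-}

import Crypto.Hash (Digest, SHA256 (..), hashWith)
import Data.ByteString (ByteString)

digest :: ByteString -> Digest SHA256
digest = hashWith SHA256

-- >>> digest "hello"
-- 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824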

I also worked with Vincent, the author of cryptonite, and I trust him, so there's that.

Randomness

random, or cryptonite if I need secure randomness, or probably pcg-random if I need something super fast, or random-fu if I need a specific distribution.

Usually I don't care and so I go with 'random'. If I do care, it's almost always in the direction of "I need something more secure", and 'random' is not secure for anything crypto-related or generally anything that should be unguessable.
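
A sketch of the boring default (fine for dice, not for tokens):

import System.Random (randomRIO)

rollDie :: IO Int
rollDie = randomRIO (1, 6)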

I haven't actually used pcg-random in anger, but I think it's state of the art when it comes to fast random number generators, and as I mentioned, I really really like state of the art. It's likely better than the Mersenne twister.

Regexes

Probably pcre-heavy or pcre2. And if I don't actually need regexes per se, just something that does arbitrary search and replacement, then megaparsec + replace-megaparsec.

Usually, if I want regexes, it's because I want convenience for myself or for my users. This means PCRE over Posix. I think 'pcre-heavy' is alright, but while writing this post I found 'pcre2' and it seems to be better still.

If I want my regexes to be understandable, then I don't actually want regexes, I want parsers. Hence 'replace-megaparsec'.
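
A minimal 'replace-megaparsec' sketch: search-and-replace driven by a real parser instead of a regex:

import Data.Void (Void)
import Replace.Megaparsec (streamEdit)
import Text.Megaparsec (Parsec)
import Text.Megaparsec.Char.Lexer (decimal)

-- double every integer in the input
doubleNumbers :: String -> String
doubleNumbers = streamEdit (decimal :: Parsec Void String Int) (show . (* 2))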

Anti-recommendations: regex-tdfa and text-icu. Both are buggy. 'regex-tdfa' in particular is based on a tricky algorithm and the original author does not maintain it anymore, so there isn't much chance that any logic bugs in it will be fixed.

Unicode text manipulation

unicode-transforms for normalization; probably unicode-collation for collation; probably unicode-data for the Unicode character info; text-icu for everything else.

'text-icu' binds to the mature and powerful ICU library. Everybody has been using it for a long while. However, a) 'text-icu' hasn't been updated since 2015, and b) depending on foreign libraries can be a pain. This is why the community is trying to write pure Haskell replacements for the most commonly needed ICU features.

Note that 'unicode-transforms' is (AFAIK) slightly faster than 'text-icu' for normalization. 'unicode-collation' is 4x slower (as of Apr 2021).
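
Normalization with 'unicode-transforms' is a one-liner:

import Data.Text (Text)
import Data.Text.Normalize (NormalizationMode (NFC), normalize)

-- 'e' plus a combining acute collapses into the precomposed é
nfc :: Text -> Text
nfc = normalize NFC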

Also note that 'text-icu' is buggy in places. For instance, the .Break module is buggy in a non-deterministic way — see issues #4 and #19. Doesn't quite inspire confidence. Beware.

Markdown

Probably commonmark.

A year ago, my recommendation would have been cmark if I wanted a standard-compliant parser, and mmark if I wanted something powerful for my own use.

However, I just looked and it seems that 'commonmark' is better than both.

A nice thing is that you can convert commonmark's types into Pandoc types (see commonmark-pandoc) and then use Pandoc's entire ecosystem.

Concurrency

async + STM where possible.

'async' is not super super easy to use, but it's fine.

I also like atomic updates a lot, so I use 'stm' very liberally. My monad stack usually has a Reader with a bunch of TVars in it.
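
A minimal sketch of that combination:

import Control.Concurrent.Async (concurrently_)
import Control.Concurrent.STM

main :: IO ()
main = do
  counter <- newTVarIO (0 :: Int)
  let bump = atomically (modifyTVar' counter (+ 1))
  -- two threads updating shared state atomically
  concurrently_ bump bump
  readTVarIO counter >>= print  -- 2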

I usually don't use parallelism (par etc).

Retrying IO actions

retry.

Not qualified to compare with other libs, but 'retry' is good.
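
A minimal sketch (recoverAll retries on any exception; the policy below is 5 attempts with exponential backoff starting at 50 ms):

import Control.Retry

withRetries :: IO a -> IO a
withRetries action =
  recoverAll (exponentialBackoff 50000 <> limitRetries 5) (\_ -> action)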

Keyword arguments

named instead of records or newtypes.

If you want to make functions like foo :: Text -> Text -> Text safer to use (by adding names), you can either define a new record per function, or define a newtype per kind of argument, or use 'named'. I think 'named' is a better solution.

createSymLink ::
  "from" :! FilePath ->
  "to" :! FilePath ->
  IO ()
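
Call sites then name the arguments explicitly. A sketch of both ends (the createFileLink body is just a stand-in implementation):

{-# LANGUAGE DataKinds #-}
{-# LANGUAGE OverloadedLabels #-}
{-# LANGUAGE TypeOperators #-}

import Named
import System.Directory (createFileLink)

createSymLink :: "from" :! FilePath -> "to" :! FilePath -> IO ()
createSymLink (Arg from) (Arg to) = createFileLink from to

main :: IO ()
main = createSymLink ! #from "/tmp/a" ! #to "/tmp/b"
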
Parsing Haskell

ghc-lib-parser, hands down.

ghc-lib-parser is a copy of GHC's own parser, updated regularly. Ormolu uses it. HLint uses it. There is no reason to use (buggy) haskell-src-exts anymore.

Prettyprinting

prettyprinter.

If I want to pretty-print an AST into source form, or something like that, I go with 'prettyprinter'. I haven't tried other libraries much, but it says "modern" and "maintained well", so I like it.
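
A minimal sketch (recent releases call the module Prettyprinter; older ones use Data.Text.Prettyprint.Doc):

{-# LANGUAGE OverloadedStrings #-}

import Data.Text.Prettyprint.Doc

-- renders on one line if it fits, otherwise breaks and indents
doc :: Doc ann
doc = nest 2 (sep ["define", parens (sep ["x", "y"]), "body"])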

C preprocessor

cpphs.

Sometimes using a C preprocessor (#ifdef etc) is nice, but the Clang preprocessor is different from the GCC preprocessor and overall it's a minefield. (See my guide for using CPP with Haskell.)

When I do anything even remotely nontrivial, I make sure that my code is preprocessed with 'cpphs' and not the system-wide preprocessor.

DIY build system

Shake.

Shake is a Haskell DSL for writing build systems. If you want something Make-like but don't want to learn a new DSL, you might like Shake. I used it for a static site generator and it was nice.
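
A sketch of a Shake rule (pandoc and the paths are made up):

import Development.Shake
import Development.Shake.FilePath

main :: IO ()
main = shakeArgs shakeOptions $ do
  want ["out/index.html"]
  "out/*.html" %> \out -> do
    let src = "pages" </> takeBaseName out <.> "md"
    need [src]
    cmd_ "pandoc" ["-o", out, src]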

Benchmarking

Criterion? Or maybe tasty-bench?

Criterion is the de-facto benchmarking library in Haskell. tasty-bench is a lightweight alternative.
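
The classic Criterion skeleton ('tasty-bench' deliberately mirrors this API):

import Criterion.Main

fib :: Int -> Int
fib n = if n < 2 then n else fib (n - 1) + fib (n - 2)

main :: IO ()
main = defaultMain
  [ bgroup "fib"
      [ bench "10" (nf fib 10)
      , bench "20" (nf fib 20)
      ]
  ]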

I used to recommend gauge, a leaner fork of Criterion, but I am told that nowadays Criterion is actually maintained better than 'gauge'. I checked in Apr 2021 and it seems to be the case.

Update Apr 3, 2021: tasty-bench is a new lightweight library with a Criterion-compatible API and an out-of-the-box ability to compare benchmark results with previous results. The 'text' library has recently switched to 'tasty-bench'. I might consider it for my next project.

Date and time

time for general-purpose stuff, and clock >=0.8.2 for timestamps specifically.

I don't like 'time'. A lot of people try 'time' because it's the standard, and then say "huh, this is harder than I thought". This said, I don't know of any other popular options. The last release of thyme, for instance, was in 2014.

'clock' is a good option for timestamps, because it lets you decide on the precision/speed tradeoff.
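
A minimal timing sketch with 'clock':

import System.Clock

-- wall-clock duration of an action, via the monotonic clock
timed :: IO a -> IO (a, TimeSpec)
timed act = do
  start <- getTime Monotonic
  result <- act
  end <- getTime Monotonic
  pure (result, diffTimeSpec end start)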