The unsafe side

I like Haskell and Purescript, and I love telling people how safe they are. However, they are not perfect, there are still unsafe aspects and I have to be honest about them. “If it compiles, it works” isn’t always true.

Stack Safety

The first question I got from a friend learning Haskell was, How do you deal with stack overflow if you use recursion that much? I don’t know Haskell’s runtime very well, but I heard it’s like reducing a graph, so there isn’t really a stack, and everything is in the heap. Of course that doesn’t mean there are infinity memory to use, so it’s still something to pay attention.

For example, List is implemented as a linked list from the first element, so it is better folded from the left side instead of the right. When folding from the left side, the GC can discard unused values while folding of the right side means the runtime have to keep the entire array in memory, which probably causes stack overflow.

foldr (+) 0 [0..10000000000000] -- Stack overflow
foldl (+) 0 [0..10000000000000] -- This works

Another technique is tail call optimisation (TCO), by having the recursive call in the end of the function, the compiler is able to optimise the function into a loop.

-- Without TCO
count :: [a] -> Int
count [] = 0
count (_:xs) = 1 + count xs

-- With TCO
count :: [a] -> Int
count = count' 0
  where count' _ [] = 0
        count' c (_:xs) = count' (c + 1) xs

But the environment is different for Purescript, it runs on the Javascript runtime, which means there is an actual stack limit, and TCO are not always that straight forward in practice. Here is a blog post on benchmarking frontend libraries of functional languages, interestingly, the author wasn’t able to implement the benchmark application with purescript-pux at the first try, because he got a stack overflow error. The problem came from a dependency it’s using, the data structure was traversed with a non-stack-safe way, the depth of stack increase as the data grows then eventually overflows. The solution is clever enough though, the Free type provided by purescript-free is recursive, but it can actually be traversed in a stack-safe way, thanks to MonadRec, Here is a paper if you are interested, the basic idea is, some monads can be implemented in a stack-safe way (and therefore in the MonadRec class), the tailRecM function provided by the class supports TCO, so any function that use tailRecM (such as runFreeM) is stack-safe too.

Although the problem is kind of solved, it always makes me worry about stack when I am writing Purescript (paranoid, I know). Is it safe to recurse here? How do I know if a certain function is stack-safe or not? Just one non-stack-safe functions is enough to propagate to some unexpected places and cause troubles. I can always look at the compiled Javascript to see if TCO is working, but is that a good solution?

When I was reading the Stack safety for Free paper, it reminded me of reading documentations about thread safety when dealing with CoreData on iOS. These things are thread safe while those things are not, you have to be careful. It wasn’t that terrible, and I don’t have any better idea anyway, but it’s quite interesting that getting rid of thread-safety issues via the FP approach invites some similar stack-safety issues.

Partial Functions

Every Haskell beginner must encounter the List type, a page on Haskell’s site introduces with the head function, which takes the first element out of a list. That though, is already unsafe.1 The type [a] -> a indicates the function can return an element for a given list, but what if you give it an empty list?

Prelude> head []
*** Exception: Prelude.head: empty list

Functions like this are called partial functions, they don’t really “handle all the possible values”. I think this is unsafe because 1. the type does not describe the function’s behaviour well. 2. There is no way to tell if a function is partial or not (unless you look at the implementation or read the documents).

In other languages such as Java and Swift, if a function throws, you have to indicate it explicitly and some codes will have to catch it eventually, but that doesn’t happens with Haskell’s partial functions. Of course there are mechanisms2 to handle errors in Haskell, but “not catching a partial function” is not a type error. (Here is how to catch the exception)

I was surprised by how many partial functions are there in libraries of a safety-branded-language, especially at how are they innocently named. For example, fromJust, and (^?!), I just wouldn’t expect them to be unsafe at first sight.

In Purescript, calling a partial function yields an error unless you “convert” it to a normal function (or mark the call site as partial as well), which I think is a better design.

fromJust :: forall a. Partial => Maybe a -> a
fromJust (Just a) = a

foo :: Int
foo = fromJust (Just 1) -- Compiler error

bar :: Int
bar = unsafePartial (fromJust (Just 1)) -- This is fine

baz:: Partial=> Int
baz = fromJust (Just 1) -- This is fine

There are also error and fail, which stop the execution and I don’t think is a good practice too.

Infinite Value

Haskell is lazy, a value is a thunk until it is evaluated. It can make codes efficient, but it’s also tricky. For example, you can have an infinite list: [(0 :: Int)..], the type is [Int], just like a normal list, except it’s never ending. It’s useful in cases like, mapping with index:

zipWith (\index value -> {- some code here -} ) [0..] xs

You can also map it, but what happens when you fold it? It never ends, so the fold function never returns.

fmap (+ 1) [0..]  -- ok
foldl (+) [0..] -- wow this is not ok

While I never really used any infinite value extensively, I always find them a little bit worrying. The type doesn’t give a hint, everything looks absolutely fine (Haskellers trust types), and yet it doesn’t work (or I should say, it doesn’t stop working) in runtime.

Purescript doesn’t have this problem because it’s not lazy, and an infinite list simply has a different type.

Side Effects

Haskell is pure. Except some functions. There are functions with the unsafe prefix on them, they are like cheats. For example unsafePerformIO is, as it’s named, perform side effect upon evaluated. This means an apparently-pure value may be impure at all. People generally avoid using unsafe functions, and I do understand it’s hard to be 100% pure. It’s just the existence of such functions breaks some guarantees.

There is also lazy IO, here is a piece of code copied from Haskell’s site.

wrong = do
    fileData <- withFile "test.txt" ReadMode hGetContents
    putStr fileData
    
right = withFile "test.txt" ReadMode $ \handle -> do
    fileData <- hGetContents handle
    putStr fileData

The difference is subtle, the types are correct so both of them compiles, but the first one doesn’t work. It’s because fileData is lazy, it’s a thunk before putStr is executed, and withFile closes the file handle before putStr is executed.

I recently had a similar issue when fetching resources from S3, the library uses a type similar to MonadResource m => m (ConduitT () ByteString (ResourceT IO) ()). I thought, there is a ResourceT in the conduit, I guess it should handle its own resource well, can I just do this?

getFile = runResourceT ......

do
  conduit <- getFile
  result <- runConduitRes $ conduit .| sink

It didn’t work. After runResourceT, the connection was closed, so the conduit couldn’t read properly. The solution is to move the runResourceT outside and read the conduit before the connection is closed.

getFile = ......

runResourceT $ do
  conduit <- getFile
  result <- runConduitRes $ conduit .| sink

I really wish there was an error when I access resources improperly.

Then what?

You may be thinking, so these safety-branded-languages are not that safe after all, what’s the point then? Well, I think it’s still better than the alternatives, the community generally aware of these unsafe aspects and try to be as safe as possible. These pure-strongly-typed languages still have a safer default environment compare to their alternatives. The most important part is, if I really care about safety in certain aspects, the type system is strong enough for me to express the “safeness”. For example I may define my own newtype Infinite a = Infinite a, just to mark as a reminder for myself that something worths paying extra attention. By the way, I think Purescript is overall doing a really good job on making things safer.

1

If you are curious about why is head implemented as a partial function, here is a SO Q&A

2

Maybe, Either, MonadCatch, MonadFail