Haskell Notes 9

April 13, 2017

These are my notes on Chapter 9 of Learn You a Haskell.

Hello, world!

Files and streams

getContents: an I/O action that reads everything from standard input, up to end of file. It reads lazily, only as the input is needed.

-- print the lines of the input that are shorter than 10 characters

main = do
  contents <- getContents
  putStr . shortLines $ contents

shortLines :: String -> String
shortLines =
  unlines . filter ((< 10) . length) . lines
  -- function composition is really handy here

interact: takes a function of type String -> String and returns an I/O action. When that action runs, it reads the input, passes it through the function, and prints the resulting string.

main = interact shortLines

shortLines = unlines . filter ((<5) . length) . lines

It can even be written on a single line:

main = interact $ unlines . filter ((<5) . length) . lines

Read lines from standard input and report for each one whether it is a palindrome:

main = interact palindrome

palindrome :: String -> String
palindrome =
  unlines . map (\str -> if str == reverse str then "Yes" else "No") . lines

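Although the program just turns one string into another, at runtime it behaves interactively: it reads a line, prints the result, reads the next line, prints its result, and so on until end of file. This is thanks to Haskell's laziness: each line of output only depends on the corresponding line of input, so it can be printed as soon as that line has been read.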
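A small way to see this laziness without typing at a terminal: in the sketch below, `undefined` stands in for input that has not arrived yet. Asking for only the first answer never touches that part, so no error is raised. Using `undefined` like this is just my demonstration trick, not something from the book.

```haskell
-- The same palindrome function as above.
palindrome :: String -> String
palindrome = unlines . map check . lines
  where
    check str = if str == reverse str then "Yes" else "No"

-- The input is "abba\n" followed by `undefined`, a stand-in for input
-- that has not been typed yet. Taking only the first 4 characters of
-- the output never evaluates that part.
firstAnswer :: String
firstAnswer = take 4 (palindrome ("abba\n" ++ undefined))

main :: IO ()
main = putStr firstAnswer -- prints "Yes" and a newline
```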

Read a file and print its contents:

import System.IO

main =
  withFile "girlfriend.txt" ReadMode (\h -> do
    contents <- hGetContents h
    putStr contents)

FilePath is just a type synonym for String:

λ> :info FilePath
type FilePath = String 	-- Defined in ‘GHC.IO’

The type of openFile:

λ> :t openFile
openFile :: FilePath -> IOMode -> IO Handle

It takes a file path and an IOMode and returns an I/O action that opens the file in that mode; the action's result is a Handle to the opened file.

The type of hClose:

λ> :t hClose
hClose :: Handle -> IO ()

It closes a file; its argument is the file's handle.

The definition of IOMode:

λ> :info IOMode
data IOMode = ReadMode | WriteMode | AppendMode | ReadWriteMode
  	-- Defined in ‘GHC.IO.IOMode’
instance Enum IOMode -- Defined in ‘GHC.IO.IOMode’
instance Eq IOMode -- Defined in ‘GHC.IO.IOMode’
instance Ord IOMode -- Defined in ‘GHC.IO.IOMode’
instance Read IOMode -- Defined in ‘GHC.IO.IOMode’
instance Show IOMode -- Defined in ‘GHC.IO.IOMode’

So IOMode is simply an enumeration type: four constructors with no fields.
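Because IOMode has an Enum instance, we can even list every mode with an enumeration. A tiny sketch; allModes is my own name:

```haskell
import System.IO (IOMode (..))

-- The derived Enum instance enumerates from ReadMode up to the last constructor.
-- Note the space in [ReadMode ..]: it keeps the lexer from reading "ReadMode.."
-- as a qualified name.
allModes :: [IOMode]
allModes = [ReadMode ..]

main :: IO ()
main = print allModes
```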

The type of hGetContents:

λ> :t hGetContents
hGetContents :: Handle -> IO String

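It takes a Handle, such as the one produced by openFile, and works like getContents, except that it reads from that handle instead of standard input. Similar handle-based functions include hGetLine, hPutStr, hPutStrLn and hGetChar.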
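Done by hand, the combination of openFile, hGetContents and hClose looks roughly like the sketch below; withFile does the same job and additionally makes sure the handle is closed even if an exception is thrown. demo.txt is a made-up file that the program writes first so it runs anywhere, and shout is my own helper:

```haskell
import Data.Char (toUpper)
import System.IO

-- Pure part of the program: what we do with the file's contents.
shout :: String -> String
shout = map toUpper

main :: IO ()
main = do
  -- "demo.txt" is a hypothetical file, created here so the example is self-contained.
  writeFile "demo.txt" "hello\nworld\n"
  handle <- openFile "demo.txt" ReadMode
  contents <- hGetContents handle
  -- putStr consumes the whole lazily read string, so closing afterwards is safe.
  putStr (shout contents)
  hClose handle
```

If hClose ran before the contents had been consumed, the lazily read string would be cut short, which is one reason readFile and withFile are the more comfortable options.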

The type of readFile:

λ> :t readFile
readFile :: FilePath -> IO String

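It reads a whole file and is a bit more convenient than combining openFile with hGetContents, or using withFile. For example, the program above can be written as: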

import System.IO

main = do
  contents <- readFile "girlfriend.txt"
  putStr contents

With readFile, the file is closed automatically, so there is no handle for us to close.

The type of writeFile:

λ> :t writeFile
writeFile :: FilePath -> String -> IO ()

It writes a string to a file; if the file already exists, its contents are overwritten:

import System.IO
import Data.Char

main = do
  contents <- readFile "girlfriend.txt"
  writeFile "girlfriendcaps.txt" . map toUpper $ contents

appendFile is like writeFile, except that it appends to the end of the file instead of overwriting it:

import System.IO

main = do
  putStrLn "Add a TODO: "
  todo <- getLine
  appendFile "todo.txt" $ todo ++ "\n"
  putStrLn "All of TODOs:"
  contents <- readFile "todo.txt"
  putStr contents

The string returned by getLine does not include the trailing newline, so we add one ourselves.

The type of hSetBuffering:

λ> :t hSetBuffering
hSetBuffering :: Handle -> BufferMode -> IO ()

It sets the buffering mode of a handle. BufferMode is defined as:

λ> :info BufferMode
data BufferMode
  = NoBuffering | LineBuffering | BlockBuffering (Maybe Int)

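NoBuffering means reading one character at a time and LineBuffering one line at a time; for text files the latter is usually the default. For binary files the default is usually BlockBuffering Nothing, which reads in chunks whose size is chosen by the operating system; BlockBuffering (Just 1024) sets the chunk size explicitly, in bytes.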

hFlush flushes a handle's buffer. Its type:

λ> :t hFlush
hFlush :: Handle -> IO ()
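A typical place for hFlush (and hSetBuffering) is a prompt printed with putStr right before getLine: on a terminal, stdout is usually line-buffered, so a prompt without a trailing newline may not appear before the program starts waiting for input. A sketch; promptFor is a hypothetical helper of mine:

```haskell
import System.IO

-- Hypothetical helper that formats a prompt; kept pure so it is easy to test.
promptFor :: String -> String
promptFor field = field ++ ": "

main :: IO ()
main = do
  -- Option 1: flush stdout explicitly after printing the prompt.
  putStr (promptFor "Your name")
  hFlush stdout
  name <- getLine
  -- Option 2: turn buffering off for stdout from here on.
  hSetBuffering stdout NoBuffering
  putStr (promptFor "Your favourite number")
  number <- getLine
  putStrLn (name ++ " likes " ++ number ++ ".")
```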

The type of openTempFile:

λ> :t openTempFile
openTempFile :: FilePath -> String -> IO (FilePath, Handle)

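It takes the directory in which to create the temporary file and a file-name template. For example, with the template "temp" the temporary file is named "temp" followed by some random characters (for a template with an extension such as "temp.txt", the random part goes before the extension). The result is an I/O action that yields a pair: the temporary file's path and its handle.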

Deleting an item from the todo list:

import System.IO
import System.Directory
import Data.List

main = do
  handle <- openFile "todo.txt" ReadMode
  -- create the temporary file in the current directory: renameFile cannot
  -- move a file across file systems, and /tmp is often a separate one
  (tempName, tempHandle) <- openTempFile "." "todo"

  contents <- hGetContents handle
  let oldTasks = lines contents
      numberedTasks = zipWith (\n line -> show n ++ " - " ++ line) [1 ..] oldTasks

  putStrLn . unlines $ numberedTasks
  putStrLn "Enter the item number you want to delete:"

  input <- getLine
  let pos = read input
      newTasks = delete (oldTasks !! (pos - 1)) oldTasks

  hPutStr tempHandle . unlines $ newTasks
  hClose handle
  hClose tempHandle

  renameFile tempName "todo.txt"

  putStrLn "New Todo list:"
  newContents <- readFile "todo.txt"
  putStr newContents

The type of renameFile:

λ> :t renameFile
renameFile :: FilePath -> FilePath -> IO ()

It takes two FilePaths: the old file name first, then the new one. This function comes from the System.Directory module.

Command line arguments

The System.Environment module provides two I/O actions for working with command-line arguments: getArgs and getProgName.

The type of getArgs:

λ> :t getArgs
getArgs :: IO [String]

It yields the command-line arguments the program was started with.

The type of getProgName:

λ> :t getProgName
getProgName :: IO String

It yields the name of the running program.
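A minimal sketch that uses both actions; report is my own helper, kept pure so the formatting can be tested without running the program:

```haskell
import System.Environment (getArgs, getProgName)

-- Formats the program name and its arguments into one line.
report :: String -> [String] -> String
report prog args =
  prog ++ " was called with " ++ show (length args) ++ " argument(s): " ++ unwords args

main :: IO ()
main = do
  args <- getArgs
  prog <- getProgName
  putStrLn (report prog args)
```

Compiled as todo and run on Linux as `./todo add TODO`, it should print "todo was called with 2 argument(s): add TODO".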

Rewriting the todo program to work with command-line arguments:

import System.IO
import System.Directory
import System.Environment
import Data.List

-- map each command name to the function that handles it
dispatch :: [(String, [String] -> IO ())]
dispatch = [ ("add", add)
           , ("view", view)
           , ("remove", remove)
           ]

main = do
  (command : args) <- getArgs
  let (Just action) = lookup command dispatch
  action args

add :: [String] -> IO ()
add [fileName, todoItem] = appendFile fileName $ todoItem ++ "\n"

view :: [String] -> IO ()
view [fileName] = do
  contents <- readFile fileName
  mapM_ putStrLn .
    zipWith (\n line -> show n ++ " - " ++ line) [0 ..] .
    lines $ contents

remove :: [String] -> IO ()
remove [fileName, pos] = do
  -- temp file in the current directory, so renameFile does not cross file systems
  (tempFile, tempHandle) <- openTempFile "." "TODO"
  contents <- readFile fileName
  let oldTasks = lines contents
      index = read pos
      targetItem = oldTasks !! index
      newTasks = delete targetItem oldTasks

  putStrLn $ "The Item you have removed: " ++ targetItem
  hPutStr tempHandle $ unlines newTasks
  hClose tempHandle
  renameFile tempFile fileName

Sample run:

$ ./todo add TODO 红昭愿
$ ./todo add TODO 九九八十一
$ ./todo add TODO 东京不太热
$ ./todo view TODO
0 - Slackware
1 - Emacs
2 - 红昭愿
3 - 九九八十一
4 - 东京不太热
$ ./todo remove 3
todo: todo.hs:(27,1)-(38,30): Non-exhaustive patterns in function remove

$ ./todo remove TODO 3
The Item you have removed: 九九八十一
$ ./todo view TODO
0 - Slackware
1 - Emacs
2 - 红昭愿
3 - 东京不太热

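In this program, the first command-line argument is mapped to a function through an association list (dispatch). The advantage of this design is that adding a feature is simple: add an entry to dispatch and implement the corresponding function. For example, here is a bump command that moves an item to the top of the TODO list: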

dispatch :: [(String, [String] -> IO ())]
dispatch = [ ("add", add)
           , ("view", view)
           , ("remove", remove)
           , ("bump", bump)
           ]

bump :: [String] -> IO ()
bump [fileName, pos] = do
  (tempFile, tempHandle) <- openTempFile "." "TODO"
  contents <- readFile fileName
  let oldTasks = lines contents
      index = read pos
      targetItem = oldTasks !! index
      -- move the chosen item to the front of the list
      newTasks = targetItem : delete targetItem oldTasks

  putStrLn $ "You have bumpped: " ++ targetItem
  hPutStr tempHandle . unlines $ newTasks
  hClose tempHandle
  renameFile tempFile fileName

Sample run:

$ ./todo view TODO
0 - Slackware
1 - Emacs
2 - 红昭愿
3 - 东京不太热
$ ./todo bump 2
todo: todo.hs:(42,1)-(53,30): Non-exhaustive patterns in function bump

$ ./todo bump TODO 2
You have bumpped: 红昭愿
$ ./todo view TODO
0 - 红昭愿
1 - Slackware
2 - Emacs
3 - 东京不太热
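One weakness of this design is the pattern `let (Just action) = ...` in main: an unknown command crashes the program, much like the missing file name crashed remove and bump above. Below is a sketch of a more forgiving lookup. The commands greet and count and the helper findCommand are hypothetical stand-ins, not part of the book's program, and wrong argument counts inside each command would still need a catch-all equation of their own:

```haskell
import System.Environment (getArgs)

-- Hypothetical commands standing in for add/view/remove/bump.
greet :: [String] -> IO ()
greet = mapM_ (\name -> putStrLn ("Hello, " ++ name))

count :: [String] -> IO ()
count args = print (length args)

dispatch :: [(String, [String] -> IO ())]
dispatch = [("greet", greet), ("count", count)]

-- Look the command up without the partial pattern `Just action`.
findCommand :: String -> Either String ([String] -> IO ())
findCommand name =
  maybe (Left ("unknown command: " ++ name)) Right (lookup name dispatch)

main :: IO ()
main = do
  args <- getArgs
  case args of
    [] -> putStrLn "usage: PROG COMMAND [ARGS...]"
    (command : rest) -> either putStrLn ($ rest) (findCommand command)
```

Adding a command still only means adding an entry to dispatch, so the advantage of the association-list design is kept.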

Randomness

The System.Random module (from the random package) is used to generate random numbers.

The type of random:

λ> :t random
random :: (RandomGen g, Random a) => g -> (a, g)

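There are two type class constraints here, RandomGen and Random. RandomGen is for types that can act as a source of randomness; Random is for types whose values can be generated randomly, such as Int, Bool and Double. So to produce a random value, we first need a value whose type is an instance of RandomGen, that is, a random generator.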

Typing :info Random in GHCi shows which types are instances of Random:

λ> :info Random
class Random a where
  randomR :: RandomGen g => (a, a) -> g -> (a, g)
  random :: RandomGen g => g -> (a, g)
  randomRs :: RandomGen g => (a, a) -> g -> [a]
  randoms :: RandomGen g => g -> [a]
  randomRIO :: (a, a) -> IO a
  randomIO :: IO a
  	-- Defined in ‘System.Random’
instance Random Word -- Defined in ‘System.Random’
instance Random Integer -- Defined in ‘System.Random’
instance Random Int -- Defined in ‘System.Random’
instance Random Float -- Defined in ‘System.Random’
instance Random Double -- Defined in ‘System.Random’
instance Random Char -- Defined in ‘System.Random’
instance Random Bool -- Defined in ‘System.Random’

System.Random defines a type StdGen, which is an instance of the RandomGen type class. To make a random generator, use mkStdGen, whose type is:

λ> :t mkStdGen
mkStdGen :: Int -> StdGen

It takes an integer (the seed) and returns a StdGen. We can then generate a random number like this:

λ> random (mkStdGen 10) :: (Int, StdGen)
(-2774747785423059091,1925364037 2103410263)

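The first element of the returned tuple is the random number we asked for; the second is a new random generator. Evaluating the same expression again gives exactly the same result, so to get different numbers we have to pass a different seed to mkStdGen. Note that we must annotate the result type of random: because of its second constraint, Random a, the compiler has to know which concrete type (which instance of Random) to generate.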

λ> random (mkStdGen 20) :: (Double, StdGen)
(0.9003264271598876,356856746 2103410263)

Simulating three coin tosses:

import System.Random

threeCoins :: StdGen -> (Bool, Bool, Bool)
threeCoins gen =
  let (firstCoin, newGen) = random gen
      (secondCoin, newGen') = random newGen
      (thirdCoin, _) = random newGen'
  in (firstCoin, secondCoin, thirdCoin)

main = mapM_ (print . threeCoins . mkStdGen) [1 .. 10]

Output:

./threeCoins 
(True,False,True)
(True,True,False)
(True,True,False)
(True,False,False)
(True,True,True)
(True,False,True)
(True,False,True)
(True,True,False)
(True,False,False)
(True,True,True)

randoms takes a generator and returns an infinite list of random values:

λ> :t randoms
randoms :: (RandomGen g, Random a) => g -> [a]
λ> take 5 $ randoms (mkStdGen 10) :: [Bool]
[True,True,True,False,True]
λ> take 5 $ randoms (mkStdGen 10) :: [Int]
[-2774747785423059091,-5364865979222864935,5005192715100199576,-2238708107678760508,-1609484772912886991]

randomR takes a pair and a random generator; the pair gives the (inclusive) range of the value to generate:

λ> randomR (1, 10) (mkStdGen 101)
(6,4081428 40692)
λ> randomR (7, 10) (mkStdGen 101)
(10,4081428 40692)

randomRs takes the same arguments as randomR but produces an infinite list of values in that range:

λ> take 10 $ randomRs (7, 10) (mkStdGen 101)
[10,8,10,8,7,8,7,7,8,10]
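For instance, rolling dice is just randomRs with the range (1, 6); since the range is inclusive, both 1 and 6 can come up. rollDice is a hypothetical helper name:

```haskell
import System.Random

-- Roll n six-sided dice from the given generator (the range is inclusive).
rollDice :: Int -> StdGen -> [Int]
rollDice n gen = take n (randomRs (1, 6) gen)

main :: IO ()
main = print (rollDice 10 (mkStdGen 2017))
```

Because the generator is a plain value, the same seed always produces the same rolls, which is handy for tests.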

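So what do these random numbers have to do with I/O? So far we have always passed a seed to mkStdGen by hand, so every run produces the same numbers, which is not very useful. That is why System.Random provides getStdGen, an I/O action of type IO StdGen. When the program starts, it asks the system for a good random generator and stores it in a global generator; getStdGen retrieves that global generator: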

import System.Random

main = do
  gen <- getStdGen
  putStrLn $ take 20 $ randomRs ('a', 'z') gen

Now each run of the program produces a different result:

./random_string 
xacniwkagyijqdnvktsr
./random_string 
zqonszmyizygzgtstzar

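newStdGen, another I/O action of type IO StdGen, splits the current global generator in two: it replaces the global generator with one half and returns the other. Calling getStdGen twice, by contrast, gives the same generator both times.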
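To see the difference, this sketch prints two random strings: the first from getStdGen, the second from the generator returned by newStdGen, so the two lines differ (two getStdGen calls would give identical lines). randomString is my own helper name:

```haskell
import System.Random

-- Twenty random lowercase letters drawn from the given generator.
randomString :: StdGen -> String
randomString gen = take 20 (randomRs ('a', 'z') gen)

main :: IO ()
main = do
  gen <- getStdGen
  putStrLn (randomString gen)
  gen' <- newStdGen -- splits the global generator and updates it
  putStrLn (randomString gen')
```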

A program that asks the user to guess a number:

import System.Random
import Control.Monad (when)

main = do
  gen <- getStdGen
  askForNumber gen

askForNumber :: StdGen -> IO ()
askForNumber gen = do
  let (randNumber, newGen) = randomR (1, 10) gen :: (Int, StdGen)
  putStrLn "Which number in the range from 1 to 10 am I thinking of? "
  numberString <- getLine
  when (not $ null numberString) $ do
    let number = read numberString
    if number == randNumber
      then putStrLn "You are correct!"
      else putStrLn $ "Sorry, it was " ++ show randNumber
    askForNumber newGen

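The program picks a number and lets the user guess once; then it picks a new number and asks again. Entering an empty line ends the program.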

ByteStrings

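The functions in Data.ByteString are strict: there is no laziness at all. Apart from that, they do essentially the same things as the corresponding functions in Data.List.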

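The functions in Data.ByteString.Lazy are lazy, but not byte by byte: the data is read in chunks. LYAH gives the chunk size as 64K; in the bytestring versions I know of, the default chunk size is a little under 32KB.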

Because the function names in these two modules clash with those in Data.List (and the Prelude), the modules are usually imported qualified:

import qualified Data.ByteString.Lazy as B
import qualified Data.ByteString as S
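A quick way to see the difference between the two: a lazy ByteString can even be infinite, because only the chunks that are consumed get built, while a strict ByteString always holds all of its bytes in memory. A sketch using the aliases above (B lazy, S strict); firstFive and fiveStrict are my own names:

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as B

-- B.repeat builds an infinite lazy ByteString; take keeps only 5 bytes.
firstFive :: B.ByteString
firstFive = B.take 5 (B.repeat 97) -- 97 is the byte for 'a'

-- A strict ByteString must be finite, so we ask for exactly 5 bytes.
fiveStrict :: S.ByteString
fiveStrict = S.replicate 5 97

main :: IO ()
main = do
  print firstFive
  print fiveStrict
```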

The type of pack:

λ> :t B.pack
B.pack :: [GHC.Word.Word8] -> B.ByteString

pack takes a list of Word8 values and returns a ByteString.

Word8 is an unsigned 8-bit integer type, so its values range from 0 to 255.

unpack does the opposite: it takes a ByteString and returns a list of Word8.
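A small round trip through pack and unpack, plus what happens when a number does not fit into a Word8 (it wraps around modulo 256). hiBytes and wrapped are my own names:

```haskell
import qualified Data.ByteString as S
import Data.Word (Word8)

-- "hi" as raw bytes: 104 is 'h', 105 is 'i'.
hiBytes :: S.ByteString
hiBytes = S.pack [104, 105]

-- Converting 300 to Word8 wraps around: 300 - 256 = 44.
wrapped :: Word8
wrapped = fromIntegral (300 :: Int)

main :: IO ()
main = do
  print hiBytes            -- "hi"
  print (S.unpack hiBytes) -- [104,105]
  print wrapped            -- 44
```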


Prelude BS BSL> :t BS.unpack
BS.unpack :: BS.ByteString -> [GHC.Word.Word8]

fromChunks takes a list of strict ByteStrings and returns a lazy ByteString.


Prelude BS BSL> :t BSL.fromChunks
BSL.fromChunks :: [BS.ByteString] -> BSL.ByteString
Prelude BS BSL> BSL.fromChunks [BS.pack[97, 98]]
"ab"
Prelude BS BSL> :t BSL.fromChunks [BS.pack[97, 98]]
BSL.fromChunks [BS.pack[97, 98]] :: BSL.ByteString

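At first this function did the opposite of what I expected, and I thought the name was badly chosen. Seen from another angle, though, it makes sense: it turns a list of ByteStrings (think of them as chunks) into a lazy ByteString. In other words, it takes some strict ByteStrings (the chunks) and assembles them into one lazy ByteString.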

toChunks is the inverse of fromChunks: it turns a lazy ByteString into the list of strict ByteStrings (chunks) it is made of.
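A round trip through fromChunks and toChunks: two non-empty strict chunks go in, and the same two chunks come back out. chunks and lazyBytes are my own names:

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Lazy as B

-- Two strict chunks, "ab" and "cd".
chunks :: [S.ByteString]
chunks = [S.pack [97, 98], S.pack [99, 100]]

-- Glue them into one lazy ByteString; each strict ByteString becomes one chunk.
lazyBytes :: B.ByteString
lazyBytes = B.fromChunks chunks

main :: IO ()
main = do
  print lazyBytes              -- "abcd"
  print (B.toChunks lazyBytes) -- ["ab","cd"]
```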

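cons and cons' are the ByteString counterparts of (:) from Data.List: they put a byte at the front of a ByteString. In Data.ByteString.Lazy, cons is lazy and always creates a new chunk, even if the first chunk is not full. That is where cons' comes in: it is the strict variant, which forces the first chunk and can add the byte to it instead of starting a new chunk, so it is the better choice when prepending many bytes.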
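The chunk behaviour can be observed with toChunks. In this sketch the same ten bytes are prepended once with cons and once with cons'. Exactly how many chunks cons' ends up with is an implementation detail of the bytestring package, so I only claim it is fewer than with cons:

```haskell
import qualified Data.ByteString.Lazy as B
import Data.Word (Word8)

bytes :: [Word8]
bytes = [1 .. 10]

-- Lazy cons: every prepended byte gets a chunk of its own.
viaCons :: B.ByteString
viaCons = foldr B.cons B.empty bytes

-- Strict cons': may merge each byte into a small first chunk.
viaConsStrict :: B.ByteString
viaConsStrict = foldr B.cons' B.empty bytes

main :: IO ()
main = do
  print (length (B.toChunks viaCons))       -- 10
  print (length (B.toChunks viaConsStrict)) -- fewer than 10
  print (viaCons == viaConsStrict)          -- True: same bytes, different chunking
```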

File functions have ByteString versions too. Here is a file-copying program that uses the lazy readFile and writeFile:

import System.Environment
import qualified Data.ByteString.Lazy as BSL

main = do
  (fileSrc : fileDst : _) <- getArgs
  copyFile fileSrc fileDst

copyFile :: FilePath -> FilePath -> IO ()
copyFile src dst = do
  contents <- BSL.readFile src
  BSL.writeFile dst contents

Exceptions

Print how many lines a text file has:


import System.Environment
import System.IO

main = do
  (fileName : _) <- getArgs
  contents <- readFile fileName
  putStrLn $ "The file has " ++ (show . length . lines $ contents) ++ " lines."

We can check beforehand whether the file exists:


import System.Environment
import System.IO
import System.Directory

main = do
  (fileName : _) <- getArgs
  fileExists <- doesFileExist fileName
  if fileExists
    then do
      contents <- readFile fileName
      putStrLn $ "The file has " ++ (show . length . lines $ contents) ++ " lines."
    else putStrLn "file does not exist"

Or we can use exception handling instead:


import System.Environment
import System.IO
import System.IO.Error
import Control.Exception

main = toTry `catch` handler

toTry :: IO ()
toTry = do
  (fileName : _) <- getArgs
  contents <- readFile fileName
  putStrLn $ "The file has " ++ (show . length . lines $ contents) ++ " lines."

handler :: IOError -> IO ()
handler _ = putStrLn "we have some trouble."

The program in Learn You a Haskell is out of date here: catch is no longer exported from System.IO.Error; it now lives in Control.Exception.

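catch takes two arguments. The first is the I/O action we want to run; the second is a handler. If an exception is thrown while the first action runs, catch passes it to the handler, which deals with it. The handler's argument type decides which exceptions are caught: a handler of type IOError -> IO () only catches I/O errors.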

We can also handle only specific exceptions:


handler :: IOError -> IO () 
handler e 
  | isDoesNotExistError e = putStrLn "The file doesn't exist!" 
  | otherwise = ioError e

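This handler uses two functions. isDoesNotExistError takes an IOError and returns a Bool that is True if the error was caused by a file that does not exist. ioError takes an IOError and returns an I/O action that throws it again, so errors we do not handle are passed on.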
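System.IO.Error has more predicates in the same family as isDoesNotExistError. The sketch below classifies an IOError with a few of them; to try it without provoking real failures, it builds errors by hand with mkIOError. describe is my own helper name:

```haskell
import System.IO.Error

-- Map an IOError to a short description using predicates from System.IO.Error.
describe :: IOError -> String
describe e
  | isDoesNotExistError e = "file does not exist"
  | isPermissionError e = "permission denied"
  | isAlreadyInUseError e = "file is already in use"
  | isUserError e = "user error: " ++ ioeGetErrorString e
  | otherwise = "some other I/O error"

main :: IO ()
main = do
  -- Hand-made errors instead of real ones; "missing.txt" is a made-up path.
  putStrLn (describe (mkIOError doesNotExistErrorType "openFile" Nothing (Just "missing.txt")))
  putStrLn (describe (userError "bad input"))
```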

We can also get information out of the exception, such as the file name:


handler :: IOError -> IO ()
handler e
  | isDoesNotExistError e =
      case ioeGetFileName e of
        Just path -> putStrLn $ "The file " ++ path ++ " doesn't exist!"
        Nothing -> putStrLn "Whoops!"
  | otherwise = ioError e

ioeGetFileName takes an IOError and returns a Maybe FilePath: the path of the file the error is about, if the error records one.
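Putting catch and ioeGetFileName together: the sketch below tries to read a file that is assumed not to exist (no-such-file.txt is a made-up name), reports the missing path, and falls back to empty contents. missingMessage is my own helper, kept pure so it can be checked with hand-made errors:

```haskell
import Control.Exception (catch)
import System.IO.Error

-- Message for a "does not exist" error, using the path stored in the error if any.
missingMessage :: IOError -> String
missingMessage e = case ioeGetFileName e of
  Just path -> "The file " ++ path ++ " doesn't exist!"
  Nothing -> "Whoops! Some file doesn't exist."

main :: IO ()
main = do
  contents <- readFile "no-such-file.txt" `catch` \e ->
    if isDoesNotExistError e
      then do
        putStrLn (missingMessage e)
        return "" -- fall back to empty contents
      else ioError e -- rethrow anything else
  putStrLn ("Read " ++ show (length contents) ++ " characters.")
```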