Saturday, April 02, 2016

kdb+/q - Reading and Writing a CSV File

This post shows how you can load a CSV file into kdb and write a table out from kdb to a CSV file.

Reading a CSV file:

Let's say that you'd like to load a file containing comma-separated values into an in-memory table in kdb. For example, here's my CSV file, which contains country populations (source: Worldometers):

$ head -5 worldPopulation.csv
region,country,population
Africa,Algeria,40375954
Africa,Angola,25830958
Africa,Benin,11166658
Africa,Botswana,2303820

My file has got two string columns and one integer column.

I can load it into kdb using the Zero Colon function: (types; delimiter) 0: filehandle

q) data:("SSI";enlist",") 0: `$"/path/to/worldPopulation.csv"
// "SSI" creates 2 symbol columns and one integer column

// let's look at the column types
q) meta data
c         | t f a
----------| -----
region    | s
country   | s
population| i

// check the data
q) 5#data
region country                  population
------------------------------------------
Africa Algeria                  40375954
Africa Angola                   25830958
Africa Benin                    11166658
Africa Botswana                 2303820
Africa Burkina Faso             18633725

// you can use * if you want string columns, instead of symbol ones
q) data:("**I";enlist",") 0: `$"/path/to/worldPopulation.csv"

q) meta data
c         | t f a
----------| -----
region    | C
country   | C
population| i

q) 5#data
region   country        population
----------------------------------
"Africa" "Algeria"      40375954
"Africa" "Angola"       25830958
"Africa" "Benin"        11166658
"Africa" "Botswana"     2303820
"Africa" "Burkina Faso" 18633725

Writing a CSV file:

To save a table to a CSV file, you can use the command: filehandle 0: delimiter 0: table

// first, let's try printing out the csv data to console
q) "," 0: data
"region,country,population"
"Africa,Algeria,40375954"
"Africa,Angola,25830958"
"Africa,Benin,11166658"
"Africa,Botswana,2303820"

// save it
q)(`$"/tmp/out.csv") 0: "," 0: data
`/tmp/out.csv

2 comments:

  1. This comment has been removed by the author.

    ReplyDelete
  2. if paths to files don't have spaces or dashes, using `:/path/to/file syntax is more convenient since you don't have to worry about precedence of cast call `$. So

    (`$"/tmp/out.csv") 0: "," 0: data

    becomes

    `:/tmp/out.csv 0: "," 0: data

    ReplyDelete