Starter Website

I often find myself setting up a website project, and all the configuration that goes with it. This set up task can be fairly repetitive, so I took some of the basic features I almost always start with, and made this project.

The project does have a couple “extra” features beyond an absolute basic website. Sometimes I want to just build a set of files I can host somewhere, other times I want to just deploy a Docker image to a cloud service.

For a build system, I’m using npm from NodeJS. For site packaging, I’m using webpack, which also provides a development server with live reload.

The repository is here: https://github.com/d4lton/jools. The name is a play on Jewels, which is a simple game I’ve been working on.

It doesn’t include a UI framework or CSS preprocessor, because those tend to go hand-in-hand, and I didn’t want to predetermine which framework I might want to use (if any).

It also isn’t set up for Typescript, even though I tend to use Typescript these days. I may create a Typescript version of this project, but converting to Typescript is fairly simple. I chose to keep this starter project as basic as I could.

The README.md should have enough instruction to get going. Follow the instructions under INSTALLING and RUNNING.

The structure of the project is hopefully easy to understand:

  • src: This directory (and any sub-directories you create) are where your JavaScript code should be. The src/index.js file is the “main” JavaScript file and will automatically be included in static/index.html.
  • static: This directory (and any sub-directories) are where your HTML and CSS files should be. The static/index.html is the “main” HTML file and will automatically include your JavaScript code.
  • config: This directory contains the configuration JSON file(s). Right now there is just the config.development.json file for local development. Also, it is basically empty. However, any properties you set in here will be “injected” into the static website and be available in the process.env object. For the curious, this is intended to mirror how environment variables are made available in a server-side NodeJS project.

The project doesn’t really do anything, as it is intended to be a starting place for a website. The included HTML, CSS, and JavaScript are all basically placeholders.

Hopefully this project helps you in some way! Good luck!

Parsing CSV and TSV Files

introduction

A very common file format for transferring data is the Comma (or TAB) Separated Value file. These formats appear simple, but have several complications that make implementing a CSV/TSV parser slightly more challenging than just breaking a row of text by its separator.

It’s probably best to start with some definitions before diving into any details. In this file format, which is a text file where each line of text is like a row in a spreadsheet, and each of those rows is made up of columns.

The generally accepted practice is to call the rows Records, and the columns Fields. The first line is typically included, but not required, and contains the Field names.

basic parsing

Fields are separated by their separator character, which is a comma for a CSV and a TAB for a TSV:

id,animal,sound
1,cat,Meow
2,dog,Bark
3,cow,Moo
EMBEDDED COMMAS

This works perfectly fine for a TSV, since the value in a Field probably wouldn’t contain a TAB character. But what about a CSV? It seems reasonable that the value in a Field could have one or more commas.

id,animal,sound,advantage,size
1,cat,Meow,warm, purring, fuzzy,small
2,dog,Bark,loyal, funny, protective,medium
3,cow,Moo,tippable, clears grass,large

There’s no reasonable way to parse the Fields in this file (note that Field names in the first row, if included, are not required to match the Record Fields). The solution is to require double quotes around a Field that contains a comma:

id,animal,sound,advantage,size
1,cat,Meow,"warm, purring, fuzzy",small
2,dog,Bark,"loyal, funny, protective",medium
3,cow,Moo,"tippable, clears grass",large

Now we can parse this file, since the double quotes indicate the beginning and end of a Field. If we see a starting double quote, we can just keep reading until we see a closing double quote.

embedded double quotes

But what about double quotes themselves? It’s certainly common to see double quotes in different situations in a Field. Text in the field could be quoted, or a double quote could be used to mean inches. If we used the above logic of using an opening and closing double quote to know when a Field begins and ends, something like this will cause trouble:

id,name,size
1,shoe,12
2,round clock,8"x8"
3,used laptop,just "like new!"

The answer is to “escape” the double quotes in some way, and the decision was to use double double quotes. However, this causes its own problem, if we’re using double quotes to determine if commas are part of a Field, or if they are the separator between Fields, how can we tell? The answer, again, is to require double quotes around a Field that contains a double quote (or, more accurately now, an escaped double quote):

id,name,size
1,shoe,12
2,round clock,"8""x8"""
3,used laptop,"just ""like new!"""

This use of escaping double quotes, and requiring double quotes around a Field that contains double quotes is common to both CSV and TSV.

embedded newlines or carriage returns

The final major thing to consider when parsing a CSV or TSV is that a Field could contain a newline or carriage return character:

id,name,description
1,shoe,these are fine shoes
2,round clock,features:
2 hands
round, 8"
maple wood
3,used laptop,technical specs:
2048 Neurobit 1098X3 CPU
12 Parsec pseudo-drive
15" screen

Since a newline (or carriage return) should indicate the end of the current Field and also the completion of the current Record, how do we deal with them as part of a Field? I’m sure you’ve guessed that double quotes are involved, and they are:

id,name,description
1,shoe,these are fine shoes
2,round clock,"features:
2 hands
round, 8""
maple wood"
3,used laptop,"technical specs:
2048 Neurobit 1098X3 CPU
12 Parsec pseudo-drive
15"" screen"

Notice that there are also escaped double quotes and commas in these Fields as well. The opening and closing double quotes allow for those in the Field.

other consideration

In general, this covers the major issues faced when parsing a CSV or TSV file. Note that a TSV can also allow for some special escaped characters:

characterescaped value
newline (\n character)\n as two characters
carriage return (\r character)\r as two characters
tab (\t character)\t as two characters
backslash\\ as two characters