The grammar for geolog is largely inspired by fnotation but with some small variations.
It is built up from the following principles.
There are (at least) 6 “name-like” things:
Identifiers start with a letter, and continue with letters, numbers, and -. Identifiers are used for variables, etc. Examples: a, car, Monoid, etc.
Whenever an identifier is lexed, it is checked against a list of reserved keywords, and is returned as a keyword instead of an identifier if it matches. Examples: theory, instance, if
Operators are a contiguous sequence of “operator characters”, e.g. <, >, +, etc.
Keyword operators are to operators as identifiers are to keywords.
Fields are identifiers that start with ., e.g. .a, .b. They are typically used for projecting out of records, e.g. x .a, which is grammatically “x applied to .a”. As a special case, x.a parses in the same way as (x .a), so that f x.a is f (x .a) rather than f x .a.
Tags are identifiers that start with '. They are typically used as constructors for sum types, e.g. 'just 1 : Maybe Nat.
We may introduce more classes of “identifier prefixed with sigil,” for instance ?x or ~x for implicit variables.
Very simple: f x is f applied to x. f x y = (f x) y.
The expression
x : Nat = y + z
can be parsed generically using a precedence table for operators :, =, + into
=
|-:
|-x
|-Nat
+
|-y
|-z
Even though the semantic roles of :, =, + are quite different, it makes sense to just treat them all the same for parsing. This also makes it easier for the user: parsing is resolved using a global precedence table that is the same in all contexts.
The only purpose of parentheses is to get a different parsing than would otherwise be gotten from precedence.
This means that parentheses are always idempotent.
This is fairly simple: [x, y, z] is the raw syntax for a sequence. This might be used for lists, tuples, records ([a : Int, b : String]), etc.
The main deviation from fnotation in the geolog syntax is that we use blocks that are signaled by keywords and have newline-separated elements. For instance:
theory Magma
car : Type
unit : car
mul : car -> car -> car
end
We are still indentation-insensitive (which means that autoindent is workable), but we are newline sensitive.
A block consists of
an opening symbol (e.g. theory, instance, etc.)
a “head” which is an arbitrary expression (e.g. Magma, VectorSpace (F : Field), etc.)
a “body” which is a sequence of newline-delimited expressions
possibly some sub-blocks, e.g. in the case of
if b
x
else
y
end
else would start a subblock.
a block always finishes with end
A declaration is something like
def f x y = x + y
and consists of
def, let)
Declarations basically are are prefix operators with an extremely low precedence.
Combining declarations and blocks allows us to make syntax that looks something like this:
def f x y = do
let z = x + y
let w = 2 * z
z + w
end
Parsing an identifier:
SIdent token.
extend
Note: We should allow symbolic fields. E.g. .+ is a perfectly valid field.
Note: Division can use unicode division (gah, why won’t ormolu parse unicode??); otherwise a// is too confusing. How often do you need to divide in a database?
Note: Do we want kebab case? It’s very aesthetic, but perhaps is_type is just as good as is-type? Subtraction is more common than division. Also, negative numbers. I think that if we want kebab case, we should require all binary operators to have spaces around them, not just -. Which is not the end of the world; this is maybe the right way to do it, and this is how it’s done in Pyret and Agda.
Then symbolic identifiers and alphanumeric identifiers may both contain -; it’s a wild card! When it starts a name, it makes that name symbolic, but it can appear in either alphanumeric or symbolic names.
Note: Should Name be a sumtype of symbolic vs. alphanumeric? If we keep it as is, it’s pretty easy to check by just looking at the first letter, and we can always look up precedence in the table, so I think that keeping name as a newtype wrapper around Symbol is going to be speediest.
Note: Because we can always check if a name is a symbol or not, maybe we should do away with different tokens for alphanums vs symbols? Nah, that’s fine
Note: We should call “symbol” something else, because Symbolize already uses the word “symbol”.