leximaven

Introduction

Leximaven is a powerful tool for searching word-related APIs from the command line. It can fetch acronyms, anagrams, bi-gram phrases, definitions, etymologies, example uses, hyphenation, offensive word flags, portmanteaus, pronunciations (Arpabet & IPA), related words, rhymes, slang, syllable stress and count, and more. See the wiki for more info. Leximaven goes great with iloa.

Platform

I use leximaven on Linux; I have no Mac to test on. Windows testing is planned. Tested on Node:

  • 4.x
  • 5.x
  • 6.x
  • 7.x

Installation

To initialize the config file and load themes, your NODE_PATH environment variable must point to the lib/node_modules directory of the Node.js installation. You can set this path automatically like this:

export NP="$(which node)"
export BP="${NP%bin/node}" # strip the trailing 'bin/node', keeping the slash before it
export LP="${BP}lib/node_modules"
export NODE_PATH="$LP"

This should work for both a system installation of Node.js and nvm. You’ll also need a Wordnik API key stored in an environment variable named WORDNIK. Add all of this to .bashrc, .zshrc, or your shell’s equivalent.
Then run:

npm install -g leximaven
leximaven config init
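The NODE_PATH export above strips bin/node from the path of the node binary and appends lib/node_modules. Below is a minimal sketch of the same derivation in Node, plus a check for the two variables this section asks for. It assumes POSIX-style paths. The helper names deriveNodePath and missingEnv are hypothetical and are not part of leximaven.

```javascript
'use strict';
const path = require('path');

// Hypothetical helper: mirror ${NP%bin/node}lib/node_modules by going up
// from <prefix>/bin/node to <prefix>, then appending lib/node_modules.
// Assumes POSIX-style paths (Linux, macOS, nvm).
function deriveNodePath(nodeBinary) {
  const prefix = path.posix.dirname(path.posix.dirname(nodeBinary));
  return path.posix.join(prefix, 'lib', 'node_modules');
}

// Hypothetical helper: list required variables that are missing or empty.
function missingEnv(env) {
  return ['NODE_PATH', 'WORDNIK'].filter((name) => !env[name]);
}

console.log('Suggested NODE_PATH:', deriveNodePath(process.execPath));
const missing = missingEnv(process.env);
if (missing.length > 0) {
  console.log('Not set:', missing.join(', '));
}
```

For a system install at /usr/bin/node this suggests /usr/lib/node_modules. For nvm it resolves inside the active version's directory, which is why the shell version works for both.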

Usage

Leximaven has a built-in help system for CLI parameters and options. Access it with leximaven -h|--help [command] [subcommand]. There is also the wiki.

Here are some examples:

# Get definitions for 'catharsis'
leximaven wordnik define catharsis

# Get antonyms for 'noise'
leximaven wordnik relate --canon --type antonym noise

# Pronounce 'quixotic'
leximaven wordnik pronounce quixotic

# Get etymology for 'special'
leximaven wordnik origin special

# Get words that sound like 'blue'
leximaven dmuse get sl=blue

# Get slang/colloquialisms for 'diesel'
leximaven urban diesel

# Get anagrams with at least 2 letters in each word and a maximum of 3 words
# per anagram using short form flags and exporting to JSON
leximaven anagram -n2 -w3 -o anagrams.json toomanysecrets

# Get a wordmap for 'ubiquity'
leximaven map ubiquity

See the tests for more.

Resources

The following links can help you use Leximaven or perform related tasks.

Contributing

See CONTRIBUTING.

Gratitude

Many thanks to all contributors to the libraries used in this project! And thanks to the creators and maintainers of the APIs that this tool consumes. Acronym Server, Datamuse, Onelook, Rhymebrain, Urban Dictionary, Wordnik, and Wordsmith are awesome!

Prose

For fun, read some of my prose.


Acronyms

leximaven command: acronym, acro, ac

Silmaril’s Acronym Server began as an email service in 1987. Since then, in one form or another, it has helped people decode acronyms. See their documentation for the XML API.

Anagrams

leximaven command: anagram, an

Wordsmith’s anagram generator is one of the best. It’s been around forever and it is rock solid. Here are Wordsmith’s own tips for finding great anagrams, and some practical uses of anagrams.

Completion

leximaven command: completion, comp

Bash instructions

Run leximaven comp >> ~/.bashrc, then source ~/.bashrc to enable shell completion for leximaven commands and options.

Zsh instructions

To let Zsh read the bash completion script, add the following to ~/.zshrc:
autoload bashcompinit
bashcompinit

Now run leximaven comp >> ~/.zshrc and source ~/.zshrc.

Configuration

leximaven command: configuration, config, conf

leximaven subcommands:

  • init - Creates configuration file in home directory (~/.leximaven.noon)
  • get - Gets a value for a given key. Accepts dot notation for nested values.
  • set - Sets a given key to a given value. Accepts dot notation for nested values.

leximaven can get and set values in the configuration file once it’s been initialized. leximaven uses the dotty library to allow dot notation for nested values. See the FAQ for why I chose noon for the configuration file format.
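To show what dot notation means for nested values, here is a small stand-in for reading and writing a path like wordsmith.lang. leximaven itself uses dotty for this. The getPath and setPath functions below are hypothetical illustrations and are not dotty's API.

```javascript
'use strict';
// Hypothetical stand-ins for dot-notation access (not dotty's API).
function getPath(obj, dotted) {
  // Walk one key at a time; a missing level yields undefined.
  return dotted.split('.').reduce(
    (node, key) => (node != null ? node[key] : undefined),
    obj
  );
}

function setPath(obj, dotted, value) {
  const keys = dotted.split('.');
  const last = keys.pop();
  let node = obj;
  for (const key of keys) {
    // Create intermediate objects when a level does not exist yet.
    if (typeof node[key] !== 'object' || node[key] === null) node[key] = {};
    node = node[key];
  }
  node[last] = value;
  return obj;
}

// Hypothetical config fragment using keys mentioned on this page.
const config = { verbose: false, wordsmith: { lang: 'english' }, onelook: { links: false } };
console.log(getPath(config, 'wordsmith.lang')); // english
setPath(config, 'onelook.links', true);
console.log(config.onelook.links); // true
```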

Examples

For single values like verbose you can just do leximaven config set verbose true. For nested properties, use dot notation like so:

$ leximaven config get wordsmith.lang
Option wordsmith.lang is english.
$ leximaven config set onelook.links true
Set option onelook.links to true.

Saving flags

By default, the merge option is set to true, so CLI flags are merged with the configuration from the file. If you also pass -s|--save, the config file will be overwritten with the current flags. If merge is set to false, the config file is always used and --save won’t work. To restore all defaults, run leximaven config init --force.
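The merge rules above can be sketched as two small functions. Only the rules come from this page: CLI flags win when merge is true, the file is used as-is when merge is false, and --save only takes effect with merge on. The function names resolveOptions and shouldSave, and the limit key in the example, are hypothetical.

```javascript
'use strict';
// Hypothetical sketch of the merge rules; not leximaven's implementation.
function resolveOptions(fileConfig, cliFlags) {
  // merge=false: the config file is used as-is.
  if (!fileConfig.merge) return Object.assign({}, fileConfig);
  // merge=true: CLI flags override values from the file.
  return Object.assign({}, fileConfig, cliFlags);
}

// --save only rewrites the config file when merge is enabled.
function shouldSave(fileConfig, cliFlags) {
  return Boolean(fileConfig.merge && cliFlags.save);
}

const fileConfig = { merge: true, verbose: false, limit: 5 };
const flags = { limit: 10, save: true };
console.log(resolveOptions(fileConfig, flags).limit); // 10
console.log(shouldSave({ merge: false }, flags)); // false
```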

Editing

Refer to the noon syntax guide. When editing the configuration file manually, remember that empty strings are written as two pipes (||) and that keys are separated from values by two or more spaces.
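The two editing rules can be illustrated on a single flat line. This is a deliberately simplified, hypothetical helper and not a noon parser. Real noon also handles nesting by indentation and more, so follow the noon syntax guide for anything beyond these two rules. The example keys are illustrative.

```javascript
'use strict';
// Hypothetical illustration of two noon rules for one flat line:
// two or more spaces separate key from value, and || means "".
function parseNoonLine(line) {
  // Lazy key match, so single spaces inside a key are kept.
  const match = line.trim().match(/^(.+?) {2,}(.*)$/);
  if (!match) return { key: line.trim(), value: undefined };
  const raw = match[2];
  return { key: match[1], value: raw === '||' ? '' : raw };
}

console.log(JSON.stringify(parseNoonLine('lang  english'))); // {"key":"lang","value":"english"}
console.log(JSON.stringify(parseNoonLine('prefix    ||'))); // {"key":"prefix","value":""}
// One space is not treated as a separator in this sketch.
console.log(JSON.stringify(parseNoonLine('verbose true'))); // {"key":"verbose true"}
```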

Datamuse

leximaven command: datamuse, dmuse, dm

leximaven subcommands:

  • get - Datamuse query
  • info - Datamuse metrics and API usage

Datamuse powers Onelook. It defines a query syntax for conditions, which are described below. Queries follow URL parameter syntax: each condition is a key=value pair, pairs are joined with &, and multi-word values are joined with plus signs (ml=word+or+phrase). See Datamuse for detailed API info. Requests are limited to 100,000/day.
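Here is a minimal sketch of the query syntax, turning a set of conditions into a Datamuse request URL. The https://api.datamuse.com/words endpoint is Datamuse's public word query API. The buildDatamuseUrl helper is hypothetical and does not show how leximaven assembles its URLs.

```javascript
'use strict';
// Hypothetical helper: conditions object -> Datamuse /words URL.
function buildDatamuseUrl(conditions) {
  const pairs = Object.keys(conditions).map((key) => {
    const words = String(conditions[key]).trim().split(/\s+/);
    // Encode each word (e.g. ? becomes %3F), then join words with plus signs.
    return key + '=' + words.map(encodeURIComponent).join('+');
  });
  // Conditions are joined with & like ordinary URL parameters.
  return 'https://api.datamuse.com/words?' + pairs.join('&');
}

console.log(buildDatamuseUrl({ ml: 'ringing in the ears' }));
// https://api.datamuse.com/words?ml=ringing+in+the+ears
console.log(buildDatamuseUrl({ rel_jjb: 'beach', topics: 'ocean' }));
// https://api.datamuse.com/words?rel_jjb=beach&topics=ocean
```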

Hard Constraints

ml - Means like constraint: require that the results have a meaning related to this string value, which can be any word or sequence of words. (This is effectively the reverse dictionary feature of OneLook.)

sl - Sounds like constraint: require that the results are pronounced similarly to this string of characters. (If the string of characters doesn’t have a known pronunciation, the system will make its best guess using a text-to-phonemes algorithm.)

sp - Spelled like constraint: require that the results are spelled similarly to this string of characters, or that they match this wildcard pattern. A pattern can include any combination of alphanumeric characters, spaces, and two reserved characters that represent placeholders — * (which matches any number of characters) and ? (which matches exactly one character).

rel_[code] - Related words constraints: require that each of the resulting words, when paired with the word in this parameter, are in a predefined lexical relation indicated by [code]. Any number of these parameters may be specified any number of times. An assortment of semantic, phonetic, and corpus-statistics-based relations are available.

[code] is a three-letter identifier from the list below.

Code | Description | Example
jja | Popular nouns modified by the given adjective, per Google Books Ngrams | gradual → increase
jjb | Popular adjectives used to modify the given noun, per Google Books Ngrams | beach → sandy
syn | Synonyms (words contained within the same WordNet synset) | ocean → sea
ant | Antonyms (per WordNet) | late → early
spc | “Kind of” (direct hypernyms, per WordNet) | gondola → boat
gen | “More general than” (direct hyponyms, per WordNet) | boat → gondola
com | “Comprises” (direct holonyms, per WordNet) | car → accelerator
par | “Part of” (direct meronyms, per WordNet) | trunk → tree
bga | Frequent followers (w′ such that P(w′|w) ≥ 0.001, per Google Books Ngrams) | wreak → havoc
bgb | Frequent predecessors (w′ such that P(w|w′) ≥ 0.001, per Google Books Ngrams) | havoc → wreak
rhy | Rhymes (“perfect” rhymes, per RhymeZone) | spade → aid
nry | Approximate rhymes (per RhymeZone) | forest → chorus
hom | Homophones (sound-alike words) | course → coarse
cns | Consonant match | sample → simple

Contextual hints

topics - Topic words: An optional hint to the system about the theme of the document being written. Results will be skewed toward these topics. At most 5 words can be specified. Space or comma delimited. Nouns work best.

lc - Left context: An optional hint to the system about the word that appears immediately to the left of the target word in a sentence. (At this time, only a single word may be specified.)

rc - Right context: An optional hint to the system about the word that appears immediately to the right of the target word in a sentence. (At this time, only a single word may be specified.)

Of the parameters above, the first four (ml, sl, sp, rel_[code]) can be thought of as hard constraints on the result set, while the last three (topics, lc, and rc) are context hints. The hints only affect the order in which results are returned. All parameters are optional.
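The limits quoted for the context hints are easy to check before sending a request. Below is a minimal sketch of such client-side checks: topics allows at most 5 words, delimited by spaces or commas, and lc and rc each take a single word. The checkHints function is hypothetical. Neither leximaven nor Datamuse is claimed to validate hints this way.

```javascript
'use strict';
// Hypothetical pre-flight checks for Datamuse context hints.
function checkHints(params) {
  const errors = [];
  if (params.topics !== undefined) {
    // Topics may be space or comma delimited.
    const topics = String(params.topics).split(/[\s,]+/).filter(Boolean);
    if (topics.length > 5) errors.push('topics: at most 5 words, got ' + topics.length);
  }
  ['lc', 'rc'].forEach((key) => {
    // Left and right context accept a single word only.
    if (params[key] !== undefined && /\s/.test(String(params[key]).trim())) {
      errors.push(key + ': only a single word is allowed');
    }
  });
  return errors;
}

console.log(checkHints({ ml: 'quiet', topics: 'sea, boat sailing', lc: 'deep' }).length); // 0
console.log(checkHints({ sl: 'blue', rc: 'sky above' }).join('; ')); // rc: only a single word is allowed
```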

Defaults

  • The flags and their short forms are carefully chosen to avoid conflicts with that specific command’s other flags, so another command may use different short forms for disambiguation. For example, most commands use -l/-m as the short forms of --limit/--max. Compare the anagram command, which has four long options that start with ‘l’ and three that start with ‘m’.
  • The three ‘global’ options that are always the same are --force (-f), --out (-o), and --save (-s). The output format is determined automatically by the extension of the outfile.
  • leximaven tries to have sensible defaults so that you can get interesting results without having to use flags. Most of the time you can just run the command with a query.
  • Using the -h/--help flag with a command also lists the default value of each flag.
  • Most CLI defaults are based on each API’s defaults. In a few cases (like the WLMI option for Wordnik’s bi-gram phrases), I based the default on my own experimentation with the API.
  • The date sections for Datamuse, Onelook, Rhymebrain, and Wordnik hold hardcoded defaults used for rate limiting.
  • Find what works for you and use the configuration system to override the built-in defaults.
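Choosing the output format from the --out file extension can be sketched as a lookup table. The formats come from the Features list (CSON, JSON, noon, Plist, XML, YAML). The exact extensions, including accepting .yml, and the formatForOutfile name are assumptions, not leximaven's code.

```javascript
'use strict';
const path = require('path');

// Assumed extension-to-format mapping, based on the formats leximaven lists.
const FORMATS = {
  '.cson': 'cson',
  '.json': 'json',
  '.noon': 'noon',
  '.plist': 'plist',
  '.xml': 'xml',
  '.yaml': 'yaml',
  '.yml': 'yaml',
};

// Hypothetical helper: returns the format name, or null if unsupported.
function formatForOutfile(file) {
  const ext = path.extname(file).toLowerCase();
  return FORMATS[ext] || null;
}

console.log(formatForOutfile('anagrams.json')); // json
console.log(formatForOutfile('noon/define.noon')); // noon
console.log(formatForOutfile('notes.txt')); // null
```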

Engine

  • chalk, ora, and yargonaut are used to style the terminal output.
  • moment manipulates timestamps for rate limiting.
  • good-guy-http gets and caches HTTP requests.
  • yargs is used for creating the CLI.
  • xml2js is used for XML format, and noon is used for all other formats.
  • x-ray scrapes sites without an exposed JSON or XML API.

Frequently (or Never) Asked Questions

What is leximaven?

leximaven is a command-line tool that fetches information about words and pretty-prints the results in your terminal. It is based on Lyracyst which I released under my other pseudonym weirdpercent.

What is Lyracyst?

Lyracyst is a similar tool written in Ruby that is no longer maintained. I’ve wanted a JavaScript version for a while, and leximaven is the result. It has features never seen in Lyracyst and is generally more robust.

What is noon?

noon, or ‘nother ordinary object notation, is a human-readable data format created by monsterkodi. Despite noon being very new, I decided to use it for leximaven’s configuration file because it is incredibly terse, even more so than hjson and cson. This is me supporting a young project that I would like to see get wider recognition. It also makes it dead simple to load and save cson, json, plist, and yaml.

Why Node.js?

I’ve been wanting to learn JavaScript and Node for a long time, and porting a codebase I know very well seemed a good way to do that. Though Ruby is cross-platform, I don’t like the headache of trying to support multiple implementations (1.8 vs 1.9, JRuby, Rubinius, MacRuby, etc.). Node just seems to do cross-platform better. It also makes building some kind of web interface in the future infinitely easier.

Why ES6 (ES2015)?

Because you can take the parts you like and leave the rest. Tools like Babel make it really easy to do this.

Why create a tool like this?

I am a leximaven, a lover of words. I do a lot of writing and I wanted a tool for constructing prose that rocks.

Features

  • Extensible
  • Maps of word info
  • Configuration file in your home directory called .leximaven.noon
  • Access configuration settings by passing commands, setting flags, or just editing the configuration file manually
  • Sensible command line defaults and aliases
  • XML parsing and building with xml2js
  • In-memory caching and other neat features handled by good-guy-http, which is based on request
  • Scrape websites with x-ray
  • Save data to CSON, JSON, noon, Plist, XML, and YAML
  • Acronyms and Dewey Decimal Classification codes from Acronym Server
  • Many kinds of related words based on constraints from Datamuse
  • Definitions, phrases, related words, and resource links from Onelook
  • Rhymes, word info, and portmanteaus from Rhymebrain
  • Definitions from Urban Dictionary
  • Definitions, examples, related words, pronunciations, hyphenation, phrases, and etymologies from Wordnik
  • Anagrams from Wordsmith
  • Automatic rate-limiting for Datamuse, Onelook, Rhymebrain, and Wordnik

New APIs

Adding new APIs to leximaven is pretty straightforward. Look at the docs for yargs to understand the module format. Urban Dictionary has the simplest API, so it’s a good example of how to implement a new module.

Each module registers a command with yargs. The command string defines the command and arguments. The builder object defines our options. The handler function contains our command logic.

After configuration checks and theme loading, we assemble the URL from the parsed arguments, argv. good-guy-http fetches the JSON response. For each piece of data that is printed in the console, the same data is added to the tofile object. If --out is specified, this object is passed to tools.outFile which is then encoded and saved. Finally, if --save is specified, options are saved to configuration.

As a more complete example, here is the Datamuse module:

datamuse.js sets our subdirectory and loads each module in this folder as a subcommand of datamuse.

get.js is the API query command. After the config is loaded, the timestamp is checked for rate limiting. If the limit hasn’t been reached, the proceed check runs the command logic.

info.js displays Datamuse metrics and API usage.
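Here is a rough sketch of what "loads each module in this folder as a subcommand" amounts to. yargs provides commandDir() for this. The hypothetical loadSubcommands function below only imitates the idea with fs and require, keying each module by the first word of its command string. The stub get.js and info.js files are placeholders, not the real Datamuse modules.

```javascript
'use strict';
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical imitation of directory-based subcommand loading.
function loadSubcommands(dir) {
  const commands = {};
  fs.readdirSync(dir)
    .filter((file) => path.extname(file) === '.js')
    .sort()
    .forEach((file) => {
      const mod = require(path.join(dir, file));
      // First word of the command string is the subcommand name.
      commands[mod.command.split(' ')[0]] = mod;
    });
  return commands;
}

// Usage: a throwaway folder with two placeholder modules.
const tmp = fs.mkdtempSync(path.join(os.tmpdir(), 'dmuse-'));
fs.writeFileSync(path.join(tmp, 'get.js'),
  "module.exports = { command: 'get <condition>', handler: () => {} };");
fs.writeFileSync(path.join(tmp, 'info.js'),
  "module.exports = { command: 'info', handler: () => {} };");
console.log(Object.keys(loadSubcommands(tmp)).join(', ')); // get, info
```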

Onelook

leximaven command: onelook, one, ol

Onelook offers an XML API for basic search features. Onelook is good for definitions and quick searches; use Datamuse (which powers Onelook) when you need more control. Requests are limited to 10,000/day.

leximaven fetches definitions, phrases, related words, and optionally resource links (verbose).

Rate limiting

Rate limits are as follows:

  • Datamuse - 100,000/day
  • Onelook - 10,000/day
  • Rhymebrain - 350/hour
  • Wordnik - 15,000/hour

For these services, leximaven handles the limits automatically through the hardcoded date section of the config file. Timestamps are initialized during config init. In addition to notifying you of the remaining requests and of each reset, leximaven throws an error if you:

  • Attempt to set these values with config set
  • Use a command after its limit has been reached

You can turn off the per-command countdown of remaining requests by setting the usage option to false.
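The behavior above (a stored timestamp, a remaining-request count, a reset when the window passes, and an error once the limit is spent) is a fixed-window rate limiter. Below is a minimal sketch using the limits listed on this page. The state fields stamp and remain and the checkLimit name are assumptions. leximaven's actual date section and its use of moment may differ.

```javascript
'use strict';
// Fixed-window rate limiting sketch; field names are hypothetical.
const HOUR = 60 * 60 * 1000;
const DAY = 24 * HOUR;
const LIMITS = {
  datamuse: { limit: 100000, window: DAY },
  onelook: { limit: 10000, window: DAY },
  rhymebrain: { limit: 350, window: HOUR },
  wordnik: { limit: 15000, window: HOUR },
};

// state is { stamp, remain }; returns the updated state or throws when spent.
function checkLimit(service, state, now) {
  const rule = LIMITS[service];
  let stamp = state.stamp;
  let remain = state.remain;
  if (now - stamp >= rule.window) {
    // Window elapsed: reset the counter and start a new window.
    stamp = now;
    remain = rule.limit;
  }
  if (remain <= 0) {
    const resetIn = Math.ceil((stamp + rule.window - now) / 60000);
    throw new Error(service + ' limit reached; resets in ' + resetIn + ' minute(s).');
  }
  return { stamp: stamp, remain: remain - 1 };
}

let state = { stamp: 0, remain: 350 };
state = checkLimit('rhymebrain', state, 1000);
console.log(state.remain); // 349
```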

Please read and follow the terms of service for these APIs. They are awesome and useful, and we’re all really lucky to have them. The rates for free access are generous. For services without specific rate limits, use them with restraint so they stay available for everyone to share.

Rhymebrain

leximaven command: rhymebrain, rbrain, rb

leximaven subcommands:

  • combine, comb, portmanteau (combine/portmanteau)
  • info (word info)
  • rhyme, rh (rhymes)

leximaven consumes the Rhymebrain.com JSON API to fetch rhymes, word info, and portmanteaus. ISO 639-1 Language Codes are needed for Rhymebrain functions. Requests are limited to 350/hour.

The offensive word flag is not set for some words that are widely considered offensive (bitch, shit, etc.). In the output, this flag is colored red and the other two flags are white.

Themes

leximaven command: list, ls, themes

leximaven has a theme system, accessible through the theme setting in the configuration file and through the themes directory. Theme files are in noon format and use dot notation to store the chalk styles.

When printing data to the console, themes follow this order: prefix->label->suffix->connector->content. The label function also takes a direction, ‘right’ or ‘down’. ‘Right’ prints everything on one line, whereas ‘down’ prints the connector and content on the line below.
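Here is a small sketch of the print order and the two label directions. Plain strings stand in for chalk styles, and the theme fields and the label signature here are hypothetical.

```javascript
'use strict';
// Hypothetical label: prefix -> label -> suffix -> connector -> content.
function label(theme, direction, text, content) {
  const head = theme.prefix + text + theme.suffix;
  if (content === undefined) return head;
  if (direction === 'down') {
    // 'down': connector and content go on the line below the label.
    return head + '\n' + theme.connector + content;
  }
  // 'right': everything on one line.
  return head + theme.connector + content;
}

const theme = { prefix: '[', suffix: ']', connector: ' → ' };
console.log(label(theme, 'right', 'Definition', 'a purging of emotions'));
// [Definition] → a purging of emotions
console.log(label(theme, 'down', 'Definition', 'a purging of emotions'));
```

With 'down', the second line starts with the connector, so long content wraps under the label instead of beside it.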

Urban Dictionary

leximaven command: urban, urb, slang

leximaven consumes Urban Dictionary’s unofficial JSON API to provide slang and colloquialisms. It should go without saying, but this site is frequently vulgar.

Wordmap

leximaven command: wordmap, map, wm

Printing the output of each leximaven command in succession is a feature I call wordmaps. The command leximaven wordmap ubiquity will get acronyms, anagrams, definitions, etymologies, examples, hyphenation, info, portmanteaus, pronunciations, related words, rhymes, and urban definitions for the word “ubiquity” all at once. For simplicity’s sake, the wordmap command does not serialize the data to file. If you want your results serialized, make a simple shell script like the following:

#!/bin/bash
mkdir -p noon
leximaven wordnik define -o noon/define.noon "$1"
leximaven wordnik example -o noon/example.noon "$1"
leximaven wordnik hyphen -o noon/hyphen.noon "$1"
leximaven wordnik origin -o noon/origin.noon "$1"
leximaven wordnik phrase -o noon/phrase.noon "$1"
leximaven wordnik pronounce -o noon/pronounce.noon "$1"
leximaven wordnik relate -o noon/relate.noon "$1"
leximaven rhymebrain combine -o noon/combine.noon "$1"
leximaven rhymebrain info -o noon/info.noon "$1"
leximaven rhymebrain rhyme -o noon/rhyme.noon "$1"
leximaven acronym -o noon/acronym.noon "$1"
leximaven anagram -o noon/anagram.noon "$1"
leximaven datamuse get -o noon/dmuse.noon "ml=$1"
leximaven onelook -o noon/look.noon "$1"
leximaven urban -o noon/urban.noon "$1"

Save it as noon.sh, then run it like this: sh noon.sh ubiquity

Wordnik

leximaven command: wordnik, wnik, wn

leximaven subcommands:

  • define, def (definitions)
  • example, ex (examples)
  • hyphen, hyphenate, hy (hyphenation)
  • origin, or, etymology (origin/etymology)
  • phrase, ph, ngram (phrases)
  • pronounce, pr (pronunciations)
  • relate, related, rel (related words)

Please refer to the CLI help or Wordnik’s developer docs for details. Wordnik is used to fetch definitions, examples, related words, pronunciations, hyphenation, phrases, and etymologies. These commands will not work without an API key in the WORDNIK environment variable. Requests are limited to 15,000/hour.