leximaven
Introduction
Leximaven is a powerful tool for searching word-related APIs from the command line. It can fetch acronyms, anagrams, bi-gram phrases, definitions, etymologies, example uses, hyphenation, offensive word flags, portmanteaus, pronunciations (Arpabet & IPA), related words, rhymes, slang, syllable stress and count, and more. See the wiki for more info. Leximaven goes great with iloa.
Platform
I use leximaven on Linux and have no Mac to test on. Windows testing is planned. Tested on Node:
- 4.x
- 5.x
- 6.x
- 7.x
Installation
To initialize the config file and load themes, your NODE_PATH environment variable must point to the lib/node_modules directory of the Node.js installation. You can set this path automatically like this:
export NP=$(which node)
export BP=${NP%bin/node} # strips the trailing 'bin/node', leaving the install prefix
export LP="${BP}lib/node_modules"
export NODE_PATH="$LP"
This should work for both a system installation of Node.js and nvm. You’ll also need to get a Wordnik API key and put it in the environment variable WORDNIK. Add all of this to .bashrc, .zshrc, etc.
Then run:
npm install -g leximaven
leximaven config init
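Before running the two commands above, it can help to confirm that both environment variables are actually set in the current shell. The following is a minimal sketch; the function name check_leximaven_env is made up for this example and is not part of leximaven.

```shell
# Report which of the variables leximaven relies on are missing.
# check_leximaven_env is a hypothetical helper, not a leximaven command.
check_leximaven_env() {
  missing=0
  if [ -z "${NODE_PATH:-}" ]; then
    echo "NODE_PATH is not set" >&2
    missing=1
  fi
  if [ -z "${WORDNIK:-}" ]; then
    echo "WORDNIK is not set" >&2
    missing=1
  fi
  return "$missing"
}
```

One way to use it is check_leximaven_env && leximaven config init, so that initialization only runs when both variables are present.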
Usage
Leximaven has a built-in help system for CLI parameters and options. Access it with leximaven -h|--help [command] [subcommand]. There is also the wiki.
Here are some examples:
# Get definitions for 'catharsis'
leximaven wordnik define catharsis
# Get antonyms for 'noise' (--canon looks up the canonical form of 'noises')
leximaven wordnik relate --canon --type antonym noises
# Get pronunciations for 'quixotic'
leximaven wordnik pronounce quixotic
# Get etymology for 'special'
leximaven wordnik origin special
# Get words that sound like 'blue'
leximaven dmuse get sl=blue
# Get slang/colloquialisms for 'diesel'
leximaven urban diesel
# Get anagrams with at least 2 letters in each word and a maximum of 3 words
# per anagram using short form flags and exporting to JSON
leximaven anagram -n2 -w3 -o anagrams.json toomanysecrets
# Get a wordmap for 'ubiquity'
leximaven map ubiquity
See the tests for more.
Resources
The following links can help you use Leximaven or perform related tasks.
- alex checks your writing for words or phrasings that might offend someone
- proselint checks your writing style and has plugins for multiple editors
- retext is a framework for natural language processing
- write-good Naive linter for English prose for developers who can’t write good and wanna learn to do other stuff good too
- ISO 639-1 Language Codes for Rhymebrain functions
- Arpabet phoneme list and IPA equivalents
- Dewey Decimal Classes for acronyms
- Browse Datamuse’s Onelook dictionaries, use its dictionary lookup, thesaurus/reverse lookup, and RhymeZone
Contributing
See CONTRIBUTING.
Gratitude
Many thanks to all contributors to the libraries used in this project! And thanks to the creators and maintainers of the APIs that this tool consumes. Acronym Server, Datamuse, Onelook, Rhymebrain, Urban Dictionary, Wordnik, and Wordsmith are awesome!
Prose
For fun, read some of my prose…
Table of Contents
- Project Status
- Introduction
- Installation
- Usage
- Resources
- Contributing
- Gratitude
- Extras
- Acronyms
- Anagrams
- Completion
- Configuration
- Datamuse
- Defaults
- Engine
- Frequently (or Never) Asked Questions
- Features
- New APIs
- Onelook
- Rate limiting
- Rhymebrain
- Themes
- Urban Dictionary
- Wordmap
- Wordnik
Acronyms
leximaven command: acronym, acro, ac
Silmaril’s Acronym Server began as an email service in 1987. Since then it has helped people look up acronyms in one form or another. See their documentation for the XML API.
Anagrams
leximaven command: anagram, an
Wordsmith’s anagram generator is one of the best. It’s been around forever and it is rock solid. Here are Wordsmith’s own tips for finding great anagrams, and practical uses of anagrams.
Completion
leximaven command: completion, comp
Bash instructions
Run leximaven comp >> ~/.bashrc, then source ~/.bashrc to enable shell completion for leximaven commands and options.
Zsh instructions
To make Zsh read your bash script, add these lines to ~/.zshrc:
autoload bashcompinit
bashcompinit
Now run leximaven comp >> ~/.zshrc and source ~/.zshrc.
Configuration
leximaven command: configuration, config, conf
leximaven subcommands:
- init - Creates configuration file in home directory (~/.leximaven.noon)
- get - Gets a value for a given key. Accepts dot notation for nested values.
- set - Sets a given key to a given value. Accepts dot notation for nested values.
leximaven can get and set values in the configuration file once it’s been initialized. leximaven uses the dotty library to allow dot notation for nested values. See the FAQ for why I chose noon for the configuration file format.
Examples
For single values like verbose you can just do leximaven config set verbose true. For nested properties, use dot notation, separating each level of nesting with a period.
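As an illustration of dot notation, the sketch below wraps config set and config get in a small helper. The helper name set_and_show and the key section.option are placeholders assumed for this example, not real leximaven names; substitute a key that exists in your ~/.leximaven.noon.

```shell
# Set a nested value, then read it back to confirm the change.
# set_and_show is a hypothetical helper; "section.option" in the usage
# note below is a placeholder key, not a documented leximaven setting.
set_and_show() {
  key="$1"
  value="$2"
  leximaven config set "$key" "$value" &&
    leximaven config get "$key"
}
```

For example, set_and_show section.option 10 would run config set followed by config get on the same dotted key.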
Saving flags
By default, the merge option is set to true, so CLI flags will be merged with the configuration from file. If you pass -s|--save with this, the config file will be overwritten with the current flags. If merge is set to false, the config file will always be used and --save won’t work. To restore all defaults run leximaven config init --force.
Editing
Refer to the noon syntax guide. When editing the configuration file manually, please remember that empty strings are denoted by two pipes (||), and that two or more spaces separate keys from values.
Datamuse
leximaven command: datamuse, dmuse, dm
leximaven subcommands:
- get Datamuse query
- info Datamuse metrics and API usage
Datamuse powers Onelook. It defines a query syntax for conditions which are described below. Follow URL parameter syntax (ml=word+or+phrase&) and join multi-word conditions with+plus+signs. See Datamuse for detailed API info. Requests are limited to 100,000/day.
Hard Constraints
ml - Means like constraint: require that the results have a meaning related to this string value, which can be any word or sequence of words. (This is effectively the reverse dictionary feature of OneLook.)
sl - Sounds like constraint: require that the results are pronounced similarly to this string of characters. (If the string of characters doesn’t have a known pronunciation, the system will make its best guess using a text-to-phonemes algorithm.)
sp - Spelled like constraint: require that the results are spelled similarly to this string of characters, or that they match this wildcard pattern. A pattern can include any combination of alphanumeric characters, spaces, and two reserved characters that represent placeholders — * (which matches any number of characters) and ? (which matches exactly one character).
rel_[code] - Related words constraints: require that each of the resulting words, when paired with the word in this parameter, are in a predefined lexical relation indicated by [code]. Any number of these parameters may be specified any number of times. An assortment of semantic, phonetic, and corpus-statistics-based relations are available.
[code] is a three-letter identifier from the list below.
[code] | Description | Example |
---|---|---|
jja | Popular nouns modified by the given adjective, per Google Books Ngrams | gradual → increase |
jjb | Popular adjectives used to modify the given noun, per Google Books Ngrams | beach → sandy |
syn | Synonyms (words contained within the same WordNet synset) | ocean → sea |
ant | Antonyms (per WordNet) | late → early |
spc | “Kind of” (direct hypernyms, per WordNet) | gondola → boat |
gen | “More general than” (direct hyponyms, per WordNet) | boat → gondola |
com | “Comprises” (direct holonyms, per WordNet) | car → accelerator |
par | “Part of” (direct meronyms, per WordNet) | trunk → tree |
bga | Frequent followers (w′ such that P(w′:w) ≥ 0.001, per Google Books Ngrams) | wreak → havoc |
bgb | Frequent predecessors (w′ such that P(w:w′) ≥ 0.001, per Google Books Ngrams) | havoc → wreak |
rhy | Rhymes (“perfect” rhymes, per RhymeZone) | spade → aid |
nry | Approximate rhymes (per RhymeZone) | forest → chorus |
hom | Homophones (sound-alike words) | course → coarse |
cns | Consonant match | sample → simple |
Contextual hints
topics - Topic words: An optional hint to the system about the theme of the document being written. Results will be skewed toward these topics. At most 5 words can be specified. Space or comma delimited. Nouns work best.
lc - Left context: An optional hint to the system about the word that appears immediately to the left of the target word in a sentence. (At this time, only a single word may be specified.)
rc - Right context: An optional hint to the system about the word that appears immediately to the right of the target word in a sentence. (At this time, only a single word may be specified.)
Of the parameters above, the first four (ml, sl, sp, rel_[code]) can be thought of as hard constraints on the result set, while the next three (topics, lc, and rc) can be thought of as context hints. The latter only impact the order in which results are returned. All parameters are optional.
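To make the constraint syntax concrete, here is a small sketch that turns a parameter name and a phrase into a single condition, and joins several conditions in URL parameter style. The helper names dm_cond and dm_query are made up for this example and belong to neither leximaven nor Datamuse.

```shell
# Build one Datamuse condition: spaces in the phrase become plus signs.
# dm_cond is a hypothetical helper (assumes the default IFS).
dm_cond() {
  param="$1"
  shift
  phrase=$(printf '%s' "$*" | tr ' ' '+')
  printf '%s=%s\n' "$param" "$phrase"
}

# Join any number of conditions with '&', as in a URL query string.
# dm_query is likewise hypothetical.
dm_query() {
  query=""
  for cond in "$@"; do
    query="${query:+$query&}$cond"
  done
  printf '%s\n' "$query"
}
```

A single condition can then be passed the same way as sl=blue in the usage examples, for instance leximaven dmuse get "$(dm_cond ml ringing in the ears)". When typing sp patterns by hand, quote them (such as 'sp=t??k') so the shell does not expand * and ? as filename wildcards.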
Defaults
- The flags and their short forms are carefully chosen to avoid conflicts with that specific command’s other flags. Another command may use different short forms for disambiguation. For example, most commands use the short form -l/m for --limit/max. Now compare the anagram command, which has four long options that start with ‘l’, and three that start with ‘m’.
- The three options that are considered ‘global’ and are always the same are --force (-f), --out (-o), and --save (-s). Output format is determined automatically by the extension of the outfile.
- leximaven tries to have sensible defaults so that you can get interesting results without having to use flags. Most of the time you can just run the command with a query.
- Using the -h/--help flag with a command will also list default values for each flag.
- CLI defaults are mostly based on the API’s defaults. In a few cases (like the WLMI option for Wordnik’s bi-gram phrases) I based the default on my experimentation with the API.
- The date sections for Datamuse, Onelook, Rhymebrain, and Wordnik hold hardcoded defaults used for rate limiting.
- Find what works for you and use the configuration system to override the built-in defaults.
Engine
- chalk, ora, and yargonaut are used to style the terminal output.
- moment manipulates timestamps for rate limiting.
- good-guy-http gets and caches HTTP requests.
- yargs is used for creating the CLI.
- xml2js is used for XML format, and noon is used for all other formats.
- x-ray scrapes sites without an exposed JSON or XML API.
Frequently (or Never) Asked Questions
What is leximaven?
leximaven is a command-line tool that fetches information about words and pretty-prints the results in your terminal. It is based on Lyracyst which I released under my other pseudonym weirdpercent.
What is Lyracyst?
Lyracyst is a similar tool written in Ruby and is no longer maintained. I’ve wanted a JavaScript version for a while, and leximaven is the result. It has features never before seen in Lyracyst and is generally more robust.
What is noon?
noon, or ‘nother ordinary object notation, is a human-readable data format created by monsterkodi. Despite noon being very new, I decided to use it for leximaven’s configuration file because it is incredibly terse, even more so than hjson and cson. This is me supporting a young project that I would like to see get wider recognition. It also makes it dead simple to load and save cson, json, plist, and yaml.
Why Node.js?
I’ve been wanting to learn JavaScript and Node for a long time, and porting a codebase I know very well seemed a good way to do that. Though Ruby is cross-platform, I don’t like the headache of trying to support multiple implementations (1.8 vs 1.9, JRuby, Rubinius, MacRuby, etc). Node just seems to do cross-platform better. It also makes building some kind of web interface in the future infinitely easier.
Why ES6(2015)?
Because you can take the parts you like and leave the rest. Tools like Babel make it really easy to do this.
Why create a tool like this?
I am a leximaven, a lover of words. I do a lot of writing and I wanted a tool for constructing prose that rocks.
Features
- Extensible
- Maps of word info
- Configuration file in your home directory called .leximaven.noon
- Access configuration settings by passing commands, setting flags, or just editing the configuration file manually
- Sensible command line defaults and aliases
- XML parsing and building with xml2js
- In-memory caching and other neat features handled by good-guy-http, which is based on request
- Scrape websites with x-ray
- Save data to CSON, JSON, noon, Plist, XML, and YAML
- Acronyms and Dewey Decimal Classification codes from Acronym Server
- Many kinds of related words based on constraints from Datamuse
- Definitions, phrases, related words, and resource links from Onelook
- Rhymes, word info, and portmanteaus from Rhymebrain
- Definitions from Urban Dictionary
- Definitions, examples, related words, pronunciations, hyphenation, phrases, and etymologies from Wordnik
- Anagrams from Wordsmith
- Automatic rate-limiting for Datamuse, Onelook, Rhymebrain, and Wordnik
New APIs
Adding new APIs to leximaven is pretty straightforward. Look at the docs for yargs to understand the module format. Urban Dictionary has the simplest API, so it’s a good example of how to implement a new module.
Each module registers a command with yargs. The command string defines the command and arguments. The builder object defines our options. The handler function contains our command logic.
After configuration checks and theme loading, we assemble the URL from the parsed arguments, argv. good-guy-http fetches the JSON response. For each piece of data that is printed in the console, the same data is added to the tofile object. If --out is specified, this object is passed to tools.outFile, where it is encoded and saved. Finally, if --save is specified, options are saved to configuration.
As a more complete example, here is the Datamuse module:
datamuse.js sets our subdirectory and loads each module in this folder as a subcommand of datamuse.
get.js is the API query command. After the config is loaded, the timestamp is checked for rate limiting. If the limit hasn’t been reached, the proceed check runs the command logic.
info.js displays Datamuse metrics and API usage.
Onelook
leximaven command: onelook, one, ol
Onelook offers an XML API for basic search features. Onelook is good for definitions and quick searches; use Datamuse (which powers Onelook) when you need more control. Requests are limited to 10,000/day.
leximaven fetches definitions, phrases, related words, and optionally resource links (verbose).
Rate limiting
Rate limits are as follows:
- Datamuse - 100,000/day
- Onelook - 10,000/day
- Rhymebrain - 350/hour
- Wordnik - 15,000/hour
For these services, leximaven handles the limits automatically through the hardcoded date section of the config file. Timestamps are initialized during config init. In addition to notifying you of the remaining requests and each reset, leximaven throws an error if you:
- Attempt to set these values with config set
- Use a command when the limit has been reached
You can turn off the per-command countdown of remaining requests by setting the usage option to false.
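The general idea behind this kind of limit is a fixed window: a start timestamp, a window length, and a counter that resets when the window expires. The sketch below shows that idea in shell. It is not leximaven's actual code (which is JavaScript and uses moment), and the function and argument names are assumptions made for illustration.

```shell
# Fixed-window rate limit check (a sketch, not leximaven's implementation).
# Arguments: window start and current time (both in epoch seconds), window
# length in seconds, request limit, and requests remaining in the window.
# Prints the remaining count after this request, or fails if none are left.
rate_check() {
  start="$1"; now="$2"; window="$3"; limit="$4"; remain="$5"
  if [ $((now - start)) -ge "$window" ]; then
    # The window has expired, so the counter resets to the full limit.
    remain="$limit"
  fi
  if [ "$remain" -le 0 ]; then
    echo "limit reached; resets in $((start + window - now)) seconds" >&2
    return 1
  fi
  echo $((remain - 1))
}
```

For Rhymebrain's 350 requests per hour, a call might look like rate_check "$stamp" "$(date +%s)" 3600 350 "$remain", where stamp and remain are whatever values the caller has stored. A real implementation would also store a new window start whenever the counter resets.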
Please read and follow the terms of service for these APIs. They are so awesome and useful, and we’re all really lucky to have them. The rates for free access are generous. For those services that don’t have specific rate limits, respect and preserve them for all to share.
Rhymebrain
leximaven command: rhymebrain, rbrain, rb
leximaven subcommands:
- combine, comb, portmanteau (combine/portmanteau)
- info (word info)
- rhyme, rh (rhymes)
leximaven consumes the Rhymebrain.com JSON API to fetch rhymes, word info, and portmanteaus. ISO 639-1 Language Codes are needed for Rhymebrain functions. Requests are limited to 350/hour.
The offensive word flag is not set for some words that are quite widely considered offensive (bitch, shit, etc.). This flag is colored red; the other two are white.
Themes
leximaven command: list, ls, themes
leximaven has a theme system that is accessible through the configuration file setting theme and the themes directory. Theme files are in noon format and use dot notation to store the chalk styles.
When printing data to the console, themes follow this order: prefix->label->suffix->connector->content. The label function also takes a direction, ‘right’ or ‘down’. ‘Right’ prints everything on one line, whereas ‘down’ prints the connector and content on the line below.
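The difference between the two directions is easier to see in output. The sketch below imitates the ordering with plain printf; theme_label is a hypothetical stand-in for the label function, and the chalk styling that leximaven applies is left out entirely.

```shell
# Print themed pieces in the order prefix, label, suffix, connector, content.
# theme_label is a hypothetical stand-in; no colors or styles are applied.
theme_label() {
  direction="$1"; prefix="$2"; label="$3"; suffix="$4"; connector="$5"; content="$6"
  case "$direction" in
    right) printf '%s%s%s%s%s\n' "$prefix" "$label" "$suffix" "$connector" "$content" ;;
    down)  printf '%s%s%s\n%s%s\n' "$prefix" "$label" "$suffix" "$connector" "$content" ;;
    *)     echo "direction must be right or down" >&2; return 1 ;;
  esac
}
```

Calling theme_label right '[' Definition ']' ' -> ' 'a purging' prints everything on one line, while the same call with down puts the connector and content on a second line.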
Urban Dictionary
leximaven command: urban, urb, slang
leximaven consumes Urban Dictionary’s unofficial JSON API to provide slang and colloquialisms. Because it’s an urban dictionary it should go without saying, but this site is frequently vulgar.
Wordmap
leximaven command: wordmap, map, wm
Printing the output of each leximaven command in succession is a feature I call wordmaps. The command leximaven wordmap ubiquity will get acronyms, anagrams, definitions, etymologies, examples, hyphenation, info, portmanteaus, pronunciations, related words, rhymes, and urban definitions for the word “ubiquity” all at once. For simplicity’s sake, the wordmap command does not serialize the data to file. If you want your results serialized, write a simple shell script (called noon.sh here) that runs the individual commands with --out.
Then run it like this: sh noon.sh ubiquity
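One possible shape for noon.sh is sketched below. It is a reconstruction under assumptions, not the author's script: it uses only commands and the -o flag described elsewhere in this document, relies on each command's defaults, and the output file names are an arbitrary choice.

```shell
#!/bin/sh
# noon.sh (sketch): run each lookup on its own and save the result as noon.
# Output file names like ubiquity.define.noon are an arbitrary choice here.
wordmap_to_files() {
  word="$1"
  leximaven acronym -o "$word.acronym.noon" "$word"
  leximaven anagram -o "$word.anagram.noon" "$word"
  for sub in define example hyphen origin pronounce relate; do
    leximaven wordnik "$sub" -o "$word.$sub.noon" "$word"
  done
  for sub in combine info rhyme; do
    leximaven rhymebrain "$sub" -o "$word.$sub.noon" "$word"
  done
  leximaven urban -o "$word.urban.noon" "$word"
}

# Only run when a word was given, e.g. sh noon.sh ubiquity
if [ "$#" -gt 0 ]; then
  wordmap_to_files "$1"
fi
```

Because the output format follows the file extension, changing .noon to .json or .yaml in the script should switch the serialization format.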
Wordnik
leximaven command: wordnik, wnik, wn
leximaven subcommands:
- define, def (definitions)
- example, ex (examples)
- hyphen, hyphenate, hy (hyphenation)
- origin, or, etymology (origin/etymology)
- phrase, ph, ngram (phrases)
- pronounce, pr (pronunciations)
- relate, related, rel (related words)
Please refer to CLI documentation or Wordnik’s developer docs for details. Wordnik is used to fetch definitions, examples, related words, pronunciations, hyphenation, phrases, and etymologies. These commands will not work without an API key in environment variable WORDNIK. Requests are limited to 15,000/hour.