Parsing command line options

Posted by Marcus Folkesson on Wednesday, January 29, 2020

Parsing command line options is something almost every command or application needs to handle in some way, and there are far too many home-made argument parsers out there. Since so many programs need to parse options from the command line, this facility is provided by a standard library function, getopt(3).

The GNU C library provides an even more sophisticated API for parsing the command line, argp(), built around the argp_parse() function and described in the glibc manual [1]. However, argp() is not portable.

There are also many libraries that provide such facilities, but let's stick to what the glibc library provides.

Command line options

A typical UNIX command takes options in the following form:

command [option] arguments

An option has the form of a hyphen (-) followed by a unique character and possibly an argument. If the option takes an argument, the argument may be separated from it by white space or written directly after it (-b4). When multiple options are specified, they can be grouped after a single hyphen, but only the last option in the group may take an argument.

Example of a single option

ls -l

Example of grouped options

ls -lI *hidden* .

In the example above, -l (long listing format) does not take an argument, but -I (ignore) takes *hidden* as its argument.

Long options

It's not unusual for a command to accept both a short (-I) and a long (--ignore) form of the same option. A long option begins with two hyphens and is identified by a word. If the option takes an argument, the argument may be attached with a = (--ignore=PATTERN) or, if the argument is required, given as the next word (--ignore PATTERN).

To parse such options, use the getopt_long(3) glibc function or the (non-portable) argp().

Example using getopt_long()

getopt_long() is quite simple to use. First we create an array of struct option, one element per long option, where each element has the following fields:

  • name
    is the name of the long option.
  • has_arg
    is: no_argument (or 0) if the option does not take an argument; required_argument (or 1) if the option requires an argument; or optional_argument (or 2) if the option takes an optional argument.
  • flag
    specifies how results are returned for a long option. If flag is NULL, then getopt_long() returns val. (For example, the calling program may set val to the equivalent short option character.) Otherwise, getopt_long() returns 0, and flag points to a variable which is set to val if the option is found, but left unchanged if the option is not found.
  • val
    is the value to return, or to load into the variable pointed to by flag.

The last element of the array has to be filled with zeros.

The next step is to call getopt_long() in a loop until it returns -1, taking care of each option and its argument (available in optarg) as it is returned.

Example code

#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <getopt.h>

struct arguments
{
    int a;
    int b;
    int c;
    int area;
    int perimeter;
};


void print_usage(void) {
    printf("Usage: triangle [Ap] -a num -b num -c num\n");
}

int main(int argc, char *argv[]) {
    int opt= 0;
    struct arguments arguments;

    /* Default values. */
    arguments.a = -1;
    arguments.b = -1;
    arguments.c = -1;
    arguments.area = 0;
    arguments.perimeter = 0;


    static struct option long_options[] = {
        {"area",      no_argument,       0,  'A' },
        {"perimeter", no_argument,       0,  'p' },
        {"hypotenuse",required_argument, 0,  'c' },
        {"opposite",  required_argument, 0,  'a' },
        {"adjecent",  required_argument, 0,  'b' },
        {0,           0,                 0,  0   }
    };

    int long_index =0;
    while ((opt = getopt_long(argc, argv,"Apa:b:c:",
                   long_options, &long_index )) != -1) {
        switch (opt) {
             case 'A':
                 arguments.area = 1;
                 break;
             case 'p':
                 arguments.perimeter = 1;
                 break;
             case 'a':
                 arguments.a = atoi(optarg);
                 break;
             case 'b':
                 arguments.b = atoi(optarg);
                 break;
             case 'c':
                 arguments.c = atoi(optarg);
                 break;
             default: print_usage();
                 exit(EXIT_FAILURE);
        }
    }


    if (arguments.a == -1 || arguments.b == -1 || arguments.c == -1) {
        print_usage();
        exit(EXIT_FAILURE);
    }

    if (arguments.area) {
        arguments.area = (arguments.a*arguments.b)/2;
        printf("Area: %d\n",arguments.area);
    }

    if (arguments.perimeter) {
        arguments.perimeter = arguments.a + arguments.b + arguments.c;
        printf("Perimeter: %d\n",arguments.perimeter);
    }

    return 0;
}

Example usage

Full example with short options

[13:49:00]marcus@little:~/tmp/cmdline$ ./getopt  -Ap -a 3 -b 4 -c 5
Area: 6
Perimeter: 12

Missing -c option

[14:07:37]marcus@little:~/tmp/cmdline$ ./getopt  -Ap -a 3 -b 4
Usage: triangle [Ap] -a num -b num -c num

Full example with long options

[14:09:38]marcus@little:~/tmp/cmdline$ ./getopt  --area --perimeter --opposite 3 --adjecent 4 --hypotenuse 5
Area: 6
Perimeter: 12

Invalid options

[14:10:14]marcus@little:~/tmp/cmdline$ ./getopt  --area --perimeter --opposite 3 --adjecent 4 -j=3
./getopt: invalid option -- 'j'
Usage: triangle [Ap] -a num -b num -c num

Full example with mixed syntaxes

[14:09:38]marcus@little:~/tmp/cmdline$ ./getopt  -A --perimeter --opposite=3 -b4 -c 5
Area: 6
Perimeter: 12

Variants

getopt_long_only() is like getopt_long(), but '-' as well as "--" can indicate a long option. If an option that starts with '-' (not "--") doesn't match a long option, but does match a short option, it's parsed as a short option instead.

Example using argp()

argp() is more flexible and powerful than getopt() and friends, but it is not part of the POSIX standard and is therefore not portable between different POSIX-compatible operating systems. However, argp() provides a few interesting features that getopt() does not.

These features include automatically producing output in response to the ‘--help’ and ‘--version’ options, as described in the GNU coding standards. Using argp makes it less likely that programmers will neglect to implement these additional options or keep them up to date.

The implementation is pretty straightforward and similar to getopt(), with a few notes.

const char *argp_program_version = "Triangle 1.0";
const char *argp_program_bug_address = "<marcus.folkesson@combitech.se>";

These global variables are used to automatically generate the --version output and the "Report bugs to" line at the end of the --help output.

struct argp_option

This structure specifies a single option that an argp parser understands, as well as how to parse and document that option. It has the following fields:

  • const char *name
    The long name for this option, corresponding to the long option --name; this field may be zero if this option only has a short name. To specify multiple names for an option, additional entries may follow this one, with the OPTION_ALIAS flag set. See Argp Option Flags.
  • int key
    The integer key provided by the current option to the option parser. If key has a value that is a printable ASCII character (i.e., isascii (key) is true), it also specifies a short option ‘-char’, where char is the ASCII character with the code key.
  • const char *arg
    If non-zero, this is the name of an argument associated with this option, which must be provided (e.g., with the --name=value or -char value syntaxes), unless the OPTION_ARG_OPTIONAL flag (see Argp Option Flags) is set, in which case it may be provided.
  • int flags
    Flags associated with this option, some of which are referred to above. See Argp Option Flags.
  • const char *doc
    A documentation string for this option, for printing in help messages. If both the name and key fields are zero, this string will be printed tabbed left from the normal option column, making it useful as a group header. This will be the first thing printed in its group. In this usage, it’s conventional to end the string with a : character.

Example code

Example code with a few more comments

#include <stdio.h>
#include <stdlib.h>
#include <argp.h>

const char *argp_program_version = "Triangle 1.0";
const char *argp_program_bug_address = "<marcus.folkesson@combitech.se>";

/* Program documentation. */
static char doc[] = "Triangle example";

/* A description of the arguments we accept. */
static char args_doc[] = "ARG1 ARG2";

/* The options we understand. */
static struct argp_option options[] = {
    {"area",        'A',    0,  0,  "Calculate area"},
    {"perimeter",   'p',    0,  0,  "Calculate perimeter"},
    {"hypotenuse",  'c',    "VALUE",  0,  "Specify hypotenuse of the triangle"},
    {"opposite",    'b',    "VALUE",  0,  "Specify opposite of the triangle"},
    {"adjecent",    'a',    "VALUE",  0,  "Specify adjecent of the triangle"},
    { 0 }
};

/* Used by main to communicate with parse_opt. */
struct arguments
{
    int a;
    int b;
    int c;
    int area;
    int perimeter;
};

/* Parse a single option. */
static error_t parse_opt (int key, char *arg, struct argp_state *state)
{
    struct arguments *arguments = (struct arguments*)state->input;

    switch (key) {
        case 'a':
            arguments->a = atoi(arg);
            break;
        case 'b':
            arguments->b = atoi(arg);
            break;
        case 'c':
            arguments->c = atoi(arg);
            break;
        case 'p':
            arguments->perimeter = 1;
            break;
        case 'A':
            arguments->area = 1;
            break;

        default:
            return ARGP_ERR_UNKNOWN;
    }
    return 0;
}

/* Our argp parser. */
static struct argp argp = { options, parse_opt, args_doc, doc };

int
main (int argc, char **argv)
{
    struct arguments arguments;

    /* Default values. */
    arguments.a = -1;
    arguments.b = -1;
    arguments.c = -1;
    arguments.area = 0;
    arguments.perimeter = 0;

    /* Parse our arguments; every option seen by parse_opt will
     * be reflected in arguments. */
    argp_parse (&argp, argc, argv, 0, 0, &arguments);


    if (arguments.a == -1 || arguments.b == -1 || arguments.c == -1) {
        exit(EXIT_FAILURE);
    }

    if (arguments.area) {
        arguments.area = (arguments.a*arguments.b)/2;
        printf("Area: %d\n",arguments.area);
    }

    if (arguments.perimeter) {
        arguments.perimeter = arguments.a + arguments.b + arguments.c;
        printf("Perimeter: %d\n",arguments.perimeter);
    }

    return EXIT_SUCCESS;
}

Example usage

This application gives the same output as the getopt() version, with the following extra features:

The options --help, --usage and --version are automatically generated

[15:53:04]marcus@little:~/tmp/cmdline$ ./argp --help
Usage: argp [OPTION...] ARG1 ARG2
Triangle example

  -a, --adjecent=VALUE       Specify adjecent of the triangle
  -A, --area                 Calculate area
  -b, --opposite=VALUE       Specify opposite of the triangle
  -c, --hypotenuse=VALUE     Specify hypotenuse of the triangle
  -p, --perimeter            Calculate perimeter
  -?, --help                 Give this help list
      --usage                Give a short usage message
  -V, --version              Print program version

Mandatory or optional arguments to long options are also mandatory or optional
for any corresponding short options.

Report bugs to <marcus.folkesson@combitech.se>.

Version information

[15:53:08]marcus@little:~/tmp/cmdline$ ./argp --version
Triangle 1.0

Conclusion

Parsing command line options is simple. argp() provides a lot of features that I really appreciate.

When portability is not an issue, I always go for argp(): besides the extra features, I find its interface more appealing.