2.3 Simplifying Command Line Parsing with Genparse

Genparse automatically creates command line parsing code, not unlike the code in mycopy2.c. It creates two or three files: a header file, a parsing file, and an optional callback file. In this section, we’ll write a Genparse specification for our file copying program and examine the parser that it creates.

Genparse runs on a simple input file, which we’ll call a Genparse file. In a Genparse file, each command line option is specified on one or more lines. The following code is a Genparse file for mycopy3.

/* */
i / iterations 	int	1	[0...]	"Number of times to output <file>."
					"File should be text format!"
o / outfile	string	{""}		"Output file name."

#usage_begin
usage: __PROGRAM_NAME__ __OPTIONS_SHORT__ file
Print a file for a number of times to stdout.
__GLOSSARY__
#usage_end

The naming of this file follows the convention of all Genparse files ending in the extension .gp.

Let’s walk through this file. The first line is a comment, which Genparse ignores.

The second line is the first option specification. This is the -i option, which, as before, may be specified in long form as --iterations. Our file indicates that -i must take an integer parameter, the default value of which is 1. The allowed range is non-negative. The final part of the specification is a description of the option’s usage (to appear in the usage () function); it continues on the following line.

The next specification introduces a new option that was not in mycopy2. The -o option takes a string parameter (where a "string" is any series of characters) with a default value of empty. As indicated by the description, this option is used to specify an output file to which mycopy3’s output will be directed.

Default values for strings must be specified within braces and quotes, like {"This is a stupid comment"}. Default values for chars must be enclosed in single quotes, e.g. ’a’ or ’\0x13’. For other types, such as integers, use the plain default value.

After the option specifications, the help screen is defined. It will be printed if -h or --help is set or if an invalid command line is given. __PROGRAM_NAME__ will be replaced with the name of the executable (probably mycopy3), and __OPTIONS_SHORT__ will be replaced with a list of allowed short options ("[ -iohv ]" in this example). "Print a file for a number of times to stdout." will be printed verbatim. __GLOSSARY__ will be replaced with a list of explanations for each of the command line parameters. For more details on the #usage section, see Usage Function. mycopy3 --help will print the following help screen:

usage: mycopy3 [ -iohv ] file
Print a file for a number of times to stdout.
   [ -i ] [ --iterations ] (type=INTEGER, range=0..., default=1)
          Number of times to output <file>.
          File should be text format!
   [ -o ] [ --outfile ] (type=STRING)
          Output file name.
   [ -h ] [ --help ] (type=FLAG)
          Display this help and exit.
   [ -v ] [ --version ] (type=FLAG)
          Output version information and exit.
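
The placeholder replacement described above can be pictured as plain string substitution. The following is only a sketch: expand_placeholder () is a hypothetical helper, not part of Genparse, and Genparse itself resolves the placeholders when it generates usage () rather than at run time.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of Genparse: copy `tmpl` into `out`
   (capacity `cap`), replacing every occurrence of `key` with `value`.
   Returns 0 on success, -1 if the result would not fit. */
int expand_placeholder (const char *tmpl, const char *key,
                        const char *value, char *out, size_t cap)
{
  size_t klen, vlen, used = 0;

  assert (tmpl != NULL && key != NULL && value != NULL && out != NULL);
  klen = strlen (key);
  vlen = strlen (value);
  if (klen == 0 || cap == 0)
    return -1;

  while (*tmpl != '\0')
    {
      if (strncmp (tmpl, key, klen) == 0)
        {
          /* Placeholder found: emit the value instead of the key. */
          if (used + vlen >= cap)
            return -1;
          memcpy (out + used, value, vlen);
          used += vlen;
          tmpl += klen;
        }
      else
        {
          /* Ordinary character: copy it through unchanged. */
          if (used + 1 >= cap)
            return -1;
          out[used++] = *tmpl++;
        }
    }
  out[used] = '\0';
  return 0;
}
```

Calling the helper once for __PROGRAM_NAME__ and once for __OPTIONS_SHORT__ turns the usage line of the Genparse file into the first line of the help screen.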

Genparse can be invoked on this file in a number of ways (yes, Genparse has its own command line options; see Genparse Options), but we’ll invoke it as follows.

genparse -o mycopy3_clp

This command tells Genparse to run on the Genparse file and to output program files whose names begin with mycopy3_clp. This particular Genparse file creates only a header file and a parser file, since no callbacks are specified. Let’s first take a look at the header file, mycopy3_clp.h.

2.3.1 Header Files

Below, we walk through mycopy3_clp.h, an example header file created by Genparse. Header files, such as this one, must be included in all linked code that needs to access the command line parameter values.

/* mycopy3_clp.h */

#include <stdio.h>

#ifndef bool
typedef enum bool_t
{
  false = 0, true
} bool;
#endif

/* customized structure for command line parameters */
struct arg_t
{
  int i;
  char * o;
  bool h;
  bool v;
  int optind;
};

/* function prototypes */
void Cmdline (struct arg_t *my_args, int argc, char *argv[]);
void usage (int status, char *program_name);

Although "real" Genparse output files begin with a section of comments, for purposes of saving space, we’ll replace all of those with a short comment containing only the file’s name.

A Genparse-created header file contains four major sections: (1) includes and type definitions, (2) the definition of struct arg_t, (3) parsing function prototypes, and (4) callback function prototypes. Since we are not using callbacks in this example, only the first three sections appear in this header file.

The file begins with a list of header files to include. Including stdio.h is the default, and other includes may be specified in the Genparse file. Then the bool type is conditionally defined. While bool is typically predefined in C++, it is not in C. It comes in handy as a type for all flag options, which can only be on or off (true or false).

The struct arg_t structure contains a variable for each of the options defined in the Genparse file. These include i, an integer, and o, a character pointer. For C output, all variables defined to be strings in Genparse files are declared as character pointers. For C++ output, the C++ string type from the standard C++ library is used.

In addition to the user-defined options, Genparse adds two extra flag options, -h and -v. The -h option (long form --help) will cause the usage () function to be called, and the program to terminate. The -v option (long form --version) will be passed back for the calling function to process. It is intended that the caller will display the program’s version number if this option is set. Note that if the calling program does not process the -v flag, the flag has no effect on the program’s behavior.

The optind variable records the value of the optind global variable that is used by getopt () and getopt_long (). However, Genparse has changed the behavior of this variable slightly. (See Parser Files.)

The final section of mycopy3_clp.h consists of function prototypes for the command line parser, Cmdline (), and the usage () function. The Cmdline () function is where the meat of Genparse processing occurs. It takes as arguments a pointer to an arg_t struct, which will be filled with the values of the options, and argc and argv, which should be passed as the main () function receives them. Genparse assumes that my_args is a valid pointer to an arg_t struct; the calling program is responsible for properly allocating memory for it. Typically, the parsing function should be called at the beginning of the program.
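
To make this calling convention concrete, here is a minimal, self-contained sketch of a parser with the same shape as Cmdline (). The names demo_args, demo_long_options and demo_parse are assumptions made for illustration; this is not the code Genparse generates. Unlike the generated function, it returns an error code instead of calling usage ().

```c
#include <assert.h>
#include <getopt.h>
#include <stdlib.h>

/* Stand-in for the generated struct arg_t (hypothetical name). */
struct demo_args
{
  int i;        /* -i / --iterations */
  char *o;      /* -o / --outfile; points into argv, not a copy */
  int optind;   /* index of the first non-option argument, or 0 */
};

static const struct option demo_long_options[] =
{
  {"iterations", required_argument, NULL, 'i'},
  {"outfile", required_argument, NULL, 'o'},
  {NULL, 0, NULL, 0}
};

/* Fill *args from argv.  Returns 0 on success, -1 on a bad option. */
int demo_parse (struct demo_args *args, int argc, char *argv[])
{
  int c;

  assert (args != NULL);   /* the caller owns the storage */
  args->i = 1;             /* defaults, as in the Genparse file */
  args->o = NULL;
  args->optind = 0;

  optind = 1;              /* start scanning after the program name */
  while ((c = getopt_long (argc, argv, "i:o:",
                           demo_long_options, NULL)) != -1)
    {
      switch (c)
        {
        case 'i':
          args->i = atoi (optarg);
          break;
        case 'o':
          args->o = optarg;
          break;
        default:
          return -1;
        }
    }

  /* 0 means "no non-option arguments were given". */
  args->optind = (optind >= argc) ? 0 : optind;
  return 0;
}
```

A caller would typically declare a struct demo_args on the stack in main () and pass its address, together with argc and argv, as the first thing the program does.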

The usage function lists the command line options for the program, as well as any mandatory command line parameters. Once this information is displayed, the program is terminated. For example, the usage function output for mycopy3, as invoked by the -h option, is as follows:

usage: mycopy3 [ -iohv ] file
  [ -i ] [ --iterations  ] Number of times to output <file>.  (default = 1)
  [ -o ] [ --outfile  ] Output file. 
  [ -h ] [ --help  ] Display help information.  (default = 0)
  [ -v ] [ --version  ] Output version.  (default = 0)

After displaying a brief list of all single-character options and mandatory options, the usage message lists all options in short and long forms, along with user-defined descriptions and each option’s default value.

2.3.2 Parser Files

In this section we examine the parser file generated from running Genparse on the Genparse file above.

/* mycopy3_clp.c */

#include <string.h>
#include <stdlib.h>
#include <getopt.h>
#include "mycopy3_clp.h"

static struct option const long_options[] =
{
  {"iterations", required_argument, NULL, 'i'},
  {"outfile", required_argument, NULL, 'o'},
  {"help", no_argument, NULL, 'h'},
  {"version", no_argument, NULL, 'v'},
  {NULL, 0, NULL, 0}
};

/*
** Cmdline ()
** Parse the argv array of command line parameters
*/

void Cmdline (struct arg_t *my_args, int argc, char *argv[])
{
  extern char *optarg;
  extern int optind;
  int c;
  int errflg = 0;

  my_args->i = 1;
  my_args->o = NULL;
  my_args->h = false;
  my_args->v = false;

  optind = 0;
  while ((c = getopt_long (argc, argv, "i:o:hv", long_options, &optind)) != EOF)
    {
      switch (c)
        {
        case 'i':
          my_args->i = atoi (optarg);
          if (my_args->i < 0)
            {
              fprintf (stderr, "parameter range error: i must be >= 0\n");
              errflg++;
            }
          break;

        case 'o':
          my_args->o = optarg;
          break;

        case 'h':
          my_args->h = true;
          usage (EXIT_SUCCESS, argv[0]);
          break;

        case 'v':
          my_args->v = true;
          break;

        default:
          usage (EXIT_FAILURE, argv[0]);

        }
    } /* while */

  if (errflg)
    usage (EXIT_FAILURE, argv[0]);

  if (optind >= argc)
    my_args->optind = 0;
  else
    my_args->optind = optind;
}

/*
** usage ()
** Print out usage information, then exit
*/

void usage (int status, char *program_name)
{
  if (status != EXIT_SUCCESS)
    fprintf (stderr, "Try `%s --help' for more information.\n",
             program_name);
  else
    {
      printf ("\
usage: %s [ -iohv ] file\n\
Print a file for a number of times to stdout.\n\
   [ -i ] [ --iterations ] (type=INTEGER, range=0..., default=1)\n\
          Number of times to output <file>.\n\
          File should be text format!\n\
   [ -o ] [ --outfile ] (type=STRING)\n\
          Output file name.\n\
   [ -h ] [ --help ] (type=FLAG)\n\
          Display this help and exit.\n\
   [ -v ] [ --version ] (type=FLAG)\n\
          Output version information and exit.\n", program_name);
    }
  exit (status);
}

The parser file consists of two main functions. The usage () function displays the program’s usage information, then terminates. The parsing function, named Cmdline () in this case, reads the command line and fills an arg_t struct with the command line parameters.

Since the usage () function is straightforward, we will not examine it in detail. Instead, we will focus our attention on Cmdline (). The parser file begins by defining long_options, an array of struct option based on the specification in the Genparse file. This array tells getopt_long () what options to expect on the command line. Inside Cmdline (), the struct arg_t is initialized with the default parameter values from the Genparse file. Once this is complete, getopt_long () is called in a loop until all command line options have been processed. Each option has its own case in the switch statement. While this processing is fairly simple, there are several options worth examining in more detail.

After the loop over getopt_long () is complete, the error flag is checked. If it is raised, or if the help option has been set, the usage () function is called.
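
The generated case for -i converts its argument with atoi () and then applies the range check. Since atoi () cannot tell "0" from non-numeric input, a hand-written variant might use strtol () instead. The sketch below shows one possible approach; parse_int_at_least () is a hypothetical helper and not something Genparse emits.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse `text` as a decimal int that is >= `min`.
   Returns 0 and stores the value in *out on success; returns -1 on
   malformed input, overflow, or a value below `min`. */
int parse_int_at_least (const char *text, int min, int *out)
{
  char *end;
  long value;

  assert (text != NULL && out != NULL);

  errno = 0;
  value = strtol (text, &end, 10);
  if (end == text || *end != '\0')
    return -1;                  /* no digits, or trailing characters */
  if (errno == ERANGE || value > INT_MAX || value < INT_MIN)
    return -1;                  /* does not fit in an int */
  if (value < min)
    return -1;                  /* outside the allowed range */

  *out = (int) value;
  return 0;
}
```

In a parser shaped like Cmdline (), a return value of -1 would be the point at which the error flag is raised.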

At the end of Cmdline (), the behavior of optind is modified slightly. While optind is returned to the caller in the struct arg_t, we set it to 0 if there are no non-option command line parameters. Otherwise, we pass it back as is, so that it can be used as an index into argv.
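
The value 0 can serve as a "none given" marker because argv[0] holds the program name and is never a non-option parameter. On the caller’s side, the convention might be consumed as in the following sketch, where first_operand () is a hypothetical helper introduced only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the first non-option argument, or
   `fallback` when the stored index is 0 (meaning "none given"). */
const char *first_operand (char *argv[], int stored_optind,
                           const char *fallback)
{
  assert (argv != NULL && stored_optind >= 0);
  return stored_optind == 0 ? fallback : argv[stored_optind];
}
```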

2.3.3 Main Program

Next to specifying the Genparse file, the most important part of using Genparse is interfacing it with user code. In this section, we show the main program code for mycopy3.c and describe how the routines created by Genparse are used.

/* mycopy3.c */

#include <stdio.h>
#include <stdlib.h>
#include "mycopy3_clp.h"

#define VERSION "3.0"

int main (int argc, char *argv[])
{
  int c, i;
  FILE *fp, *ofp;
  struct arg_t a;

  Cmdline (&a, argc, argv);

  /* -v: print the version and stop */
  if (a.v)
    {
      printf ("%s version %s\n", argv[0], VERSION);
      exit (0);
    }

  /* -o selects the output file; the default is stdout */
  if (a.o)
    ofp = fopen (a.o, "w");
  else
    ofp = stdout;

  /* a.optind is 0 when no input file was named */
  if (!a.optind)
    fp = stdin;
  else
    fp = fopen (argv[a.optind],"r");

  for (i = 0; i < a.i; i++)
    {
      while ((c = fgetc (fp)) != EOF)
        fputc (c, ofp);
      rewind (fp);
    }

  fclose (fp);
  return 0;
}

The mycopy3.c module begins by including mycopy3_clp.h, which is necessary for the definition of the arg_t struct and the parser function prototypes. The Cmdline () function fills an arg_t struct with the command line options. While the parsing function does not always have to be called before all other processing, it must be called before any command line parameters are used.

The program then checks a number of the parameters, as returned in the structure. In particular, if the -v option is set, the version number is displayed and then the program is terminated.
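
The listing above uses the results of fopen () without checking them. One way to combine the "use the default stream when no file was named" logic with room for an error check is sketched below. The helper open_or_default () is an assumption made for illustration and is not part of mycopy3.c or of Genparse.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: open `path` with `mode`, or return `fallback`
   when no path was given.  Returns NULL if fopen () fails. */
FILE *open_or_default (const char *path, const char *mode, FILE *fallback)
{
  assert (mode != NULL);
  if (path == NULL)
    return fallback;
  return fopen (path, mode);
}
```

With such a helper, main () could obtain the output stream from a.o and stdout, test the result against NULL, and report an error before entering the copy loop.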

