Wednesday, 22 July 2015

Convert pdf to monochrome

From StackExchange

gs \
 -sOutputFile="BW-$1" \
 -sDEVICE=pdfwrite \
 -sColorConversionStrategy=Gray \
 -dProcessColorModel=/DeviceGray \
 -dCompatibilityLevel=1.4 \
 -dNOPAUSE \
 -dBATCH "$1"

Run as a script with sample.pdf as its argument, this writes the converted copy to BW-sample.pdf

Tuesday, 7 July 2015

Marriage of alloca and asprintf as asprintfa

It all comes down to not being able to return variable length arrays in C.

[Note: I realise that I am pushing obscenity to the limit, I don't really use this]

I think that asprintf is the way sprintf should work, but not everyone has it, so folk either cart around an asprintf implementation, or endure the tedium of snprintf(NULL, 0, ...), or the worse, error-prone tedium of adding up buffer sizes.

A single-line asprintf and its corresponding free() make for a much simpler 1-2 lines of use.
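For comparison, here is a rough sketch of the snprintf(NULL, 0, ...) route. The helper name format_point and its arguments are made up for illustration; the point is only that the format and arguments have to be written out twice and the + 1 has to be remembered in two places.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: formats "(x, y)" into a freshly malloc'd string.
   The caller must free() the result. Returns NULL on failure. */
static char *format_point(int x, int y)
{
    /* Pass 1: measure. snprintf returns the length without the NUL. */
    int len = snprintf(NULL, 0, "(%d, %d)", x, y);
    if (len < 0) return NULL;

    char *buf = malloc((size_t)len + 1);   /* +1 for the terminating NUL */
    if (!buf) return NULL;

    /* Pass 2: format for real, repeating the format and the arguments. */
    snprintf(buf, (size_t)len + 1, "(%d, %d)", x, y);
    return buf;
}
```

With asprintf the two passes and the malloc collapse into one call, which is the whole attraction.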

Alloca

But while I'm not being portable, it would be nice to have asprintf use alloca as its allocator...

alloca returns space from the current function stack frame (not scope) and is automatically freed when the function returns.
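A small sketch of the "frame, not scope" point, assuming a glibc-style <alloca.h> (other systems may declare alloca elsewhere). The function name is invented for the demonstration.

```c
#include <alloca.h>   /* assumption: glibc-style header */
#include <string.h>

/* Returns the length of a string that was alloca'd inside an inner block
   and is still usable after that block has ended. */
static size_t alloca_outlives_block(void)
{
    char *p;
    {
        p = alloca(16);
        strcpy(p, "still valid");
    }   /* the block ends here, but the alloca'd space is not released */
    return strlen(p);   /* fine: the space lasts until this function returns */
}
```

A variable length array declared inside the same inner block would be gone at the closing brace, which is the main practical difference between the two.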

If alloca can't allocate enough memory, then who knows what might happen; on the other hand, if asprintf or malloc can't allocate enough memory, how does your program handle it?

So the plan is to alloca a buffer of the right size, and then sprintf into that.

This 1 line macro has the same syntax as the official asprintf.

#define asprintfa(PTR, ...) sprintf( (*(PTR)) = alloca(1 + snprintf(NULL, 0, __VA_ARGS__)), __VA_ARGS__)

and must be used with two lines:

char* x;
asprintfa(&x, "a %s b %d c\n", "<>", -1);
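Here is that two-line use wrapped in a function that compiles on its own, again assuming a glibc-style <alloca.h>. The function name two_line_use_ok is hypothetical. One caveat worth knowing: the Linux alloca(3) man page warns that on many systems alloca cannot be used inside a function call's argument list, which is exactly what this macro does, so treat it as something that happens to work with common compilers rather than something guaranteed.

```c
#include <alloca.h>   /* assumption: glibc-style header */
#include <stdio.h>
#include <string.h>

#define asprintfa(PTR, ...) \
    sprintf((*(PTR)) = alloca(1 + snprintf(NULL, 0, __VA_ARGS__)), __VA_ARGS__)

/* Hypothetical demo: returns 1 if the macro's return value matches the
   length of the string it produced and the string is as expected. */
static int two_line_use_ok(void)
{
    char *x;
    int n = asprintfa(&x, "a %s b %d c\n", "<>", -1);
    return n == (int)strlen(x) && strcmp(x, "a <> b -1 c\n") == 0;
}
```

Like asprintf, the macro evaluates to the length, and the string itself arrives through the pointer argument.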

But I'd prefer a single line use:

char* x = asprintfa("a %s b %d c\n", "<>", -1);

So the plan is to alloca a buffer of the right size, and the right size is measured with 1 + snprintf(NULL, 0, ...). So we define a helpful macro:

#define measure_printf(...) snprintf(NULL, 0, __VA_ARGS__)
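To make the off-by-one concrete, a tiny sketch: measure_printf reports the number of characters the output would contain, so the buffer needs one more byte for the terminating NUL. The helper bytes_needed_for is a name invented here.

```c
#include <stdio.h>

#define measure_printf(...) snprintf(NULL, 0, __VA_ARGS__)

/* Hypothetical helper: how many bytes a buffer needs to hold the
   formatted number, including the terminating NUL. */
static int bytes_needed_for(int value)
{
    return 1 + measure_printf("%d", value);
}
```

So "12345" measures as 5 and needs a 6-byte buffer.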

Once the size is measured, we can allocate with alloca and call sprintf(alloc, ...) to populate our alloca'd buffer. The usual warning applies that macro arguments get evaluated twice, but I don't see any way around that without using a function call to stash the values, which would require the heap or static buffers to hold them.
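The double evaluation is easy to demonstrate. This sketch uses the two-argument macro form from earlier (repeated so the block stands alone) and a made-up counter function next_id; it assumes a glibc-style <alloca.h>.

```c
#include <alloca.h>   /* assumption: glibc-style header */
#include <stdio.h>

#define asprintfa(PTR, ...) \
    sprintf((*(PTR)) = alloca(1 + snprintf(NULL, 0, __VA_ARGS__)), __VA_ARGS__)

static int calls = 0;                        /* counts evaluations */
static int next_id(void) { return ++calls; }

/* Returns how many times next_id() ran for a single asprintfa use. */
static int evaluations_per_use(void)
{
    char *x;
    calls = 0;
    asprintfa(&x, "id=%d", next_id());       /* written once here... */
    (void)x;
    return calls;                            /* ...but evaluated twice */
}
```

Which of the two values ends up in the string is not even specified, since C leaves the order of argument evaluation open. Worse, if the two evaluations produced output of different lengths, the measured size would not match what sprintf writes, so arguments with side effects are best kept out of the macro.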

#define asprintfa(...) sprintf(alloca(1 + measure_printf(__VA_ARGS__)), __VA_ARGS__)

However, that will return whatever sprintf returns, which is the size, not the buffer. We need a temporary variable to hold the alloca pointer, and maybe return it via a GNU statement expression, like this:

#define asprintfa(...) ({ char* x=alloca(1 + measure_printf(__VA_ARGS__)); sprintf(x, __VA_ARGS__); x; })

but that also requires us to come up with a variable name that will never be one of the __VA_ARGS__. And relying on alloca is bad enough without insisting on GNU extensions.

Maybe a tuple, like this:

#define asprintfa(...) ( char* x=alloca(1 + measure_printf(__VA_ARGS__)), sprintf(x, __VA_ARGS__), x )

It's not a GNU extension, but it doesn't compile either: C does not allow a declaration inside a parenthesised expression. And x still might clash with one of the __VA_ARGS__.

It is simpler to define a wrapper around sprintf (via vsprintf) that returns the buffer instead of the size. (Although perhaps a simple assembler push/pop would have saved the value nicely).

static char* a_sprintf(char* v, const char* format, ...)
{
    if (! v) return NULL; // However alloca is undefined if it fails...
    va_list args;
    va_start(args, format);
    vsprintf(v, format, args);
    va_end(args);
    return v;
}

Which gives this winning combination which returns the string instead of the length (more useful):

#include <alloca.h> /* glibc; some systems declare alloca in <stdlib.h> */
#include <stdarg.h>
#include <stdio.h>

#define measure_printf(...) snprintf(NULL, 0, __VA_ARGS__)
#define asprintfa(...) a_sprintf(alloca(1 + measure_printf(__VA_ARGS__)), __VA_ARGS__)
/* sprintf wrapper that returns the buffer address.
   Does not check size, intended to be used on a
   properly allocated buffer as part of asprintfa */
char* a_sprintf(char* v, const char* format, ...)
{
    if (! v) return NULL; // However alloca is undefined if it fails...
    va_list args;
    va_start(args, format);
    vsprintf(v, format, args);
    va_end(args);
    return v;
}
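One thing to keep in mind when using this: the string lives in the caller's stack frame, so it must not be returned or stored anywhere that outlives the function. A somewhat contrived sketch (make_label is a hypothetical caller, and the macro is repeated with the 1 + so the block stands alone) that copies the result into caller-owned storage before returning:

```c
#include <alloca.h>   /* assumption: glibc-style header */
#include <stdarg.h>
#include <stdio.h>

#define measure_printf(...) snprintf(NULL, 0, __VA_ARGS__)
#define asprintfa(...) a_sprintf(alloca(1 + measure_printf(__VA_ARGS__)), __VA_ARGS__)

/* sprintf wrapper that returns the buffer instead of the length */
static char *a_sprintf(char *v, const char *format, ...)
{
    va_list args;
    va_start(args, format);
    vsprintf(v, format, args);
    va_end(args);
    return v;
}

/* Hypothetical caller: builds a label on the stack, then copies it into
   caller-owned storage, because the alloca'd string dies on return. */
static void make_label(char *out, size_t outsz, const char *name, int n)
{
    char *tmp = asprintfa("%s-%03d", name, n);   /* valid only inside make_label */
    snprintf(out, outsz, "%s", tmp);             /* copy out before returning */
}
```

In real use the temporary would more likely be handed straight to something like fopen or a logging call and then forgotten, which is where the one-line form pays off.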

Here is another version that can be called like this:
char* x = asprintfa("a %s b %d c\n", "<>", -1);

it returns the pointer from the macro directly, but uses a stinky temp var along with the GNU statement expression extension again:
#define asprintfa(...) ({\
  char* __buf__;\
  sprintf(__buf__ = alloca(1 + snprintf(NULL, 0, __VA_ARGS__)), __VA_ARGS__);\
  __buf__; })

Variable Length Arrays

With this, you type:
char asprintfa(x, "a %s b %d c\n", "<>", -1);

and it expands to:
char x[1 + snprintf(NULL, 0, "a %s b %d c\n", "<>", -1)];
sprintf(x, "a %s b %d c\n", "<>", -1);

The code is:
#define asprintfa(name, ...) name[1 + snprintf(NULL, 0, __VA_ARGS__)]; \
sprintf(name, __VA_ARGS__)
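A sketch of the VLA form in use, with the macro written without a trailing semicolon so that the caller's own semicolon ends the sprintf statement. The function name vla_is_exact_fit is invented for the demonstration.

```c
#include <stdio.h>
#include <string.h>

#define asprintfa(name, ...) name[1 + snprintf(NULL, 0, __VA_ARGS__)]; \
    sprintf(name, __VA_ARGS__)

/* Hypothetical demo: returns 1 if the array is exactly string length + 1. */
static int vla_is_exact_fit(void)
{
    char asprintfa(x, "a %s b %d c\n", "<>", -1);
    return sizeof x == strlen(x) + 1;   /* sizeof works on a VLA at run time */
}
```

Because the macro expands to a declaration followed by a statement, it cannot be used as the unbraced body of an if or a loop, and the array disappears at the end of the enclosing block rather than at function return. Variable length arrays are also an optional feature from C11 onwards, so this is no more portable than alloca in practice.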

Memory Allocation Failures

It doesn't warn you or fail sensibly if you try to print too much, but on the other hand it's 1 line to use and no need to worry about freeing.

Which is more likely?
With snprintf, that you either:
  • get the snprintf buffer arithmetic wrong
  • or don't cope with allocating too much memory
  • or forget to free the memory
or with asprintfa, that you:
  • allocate too much memory
There's certainly a lot less to get wrong, although admittedly you can't get it all right either - but there's asprintf for that.

sprintf was designed to be got wrong, and snprintf was designed so that you can get the arithmetic wrong as well.

And for completeness, asprintf (now tested):

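#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

int asprintf(char **v, const char* format, ...)
{
    int len = 0;
    if (! v) return -1;

    va_list args;
    va_start(args, format);
    len = vsnprintf(NULL, 0, format, args); // measure only
    va_end(args);
    if (len < 0) return -1;

    *v = malloc(len + 1);
    if (! *v) return -1; // fail before restarting args, so no va_end is skipped

    va_start(args, format); // restart: the measuring pass consumed args
    len = vsnprintf(*v, len + 1, format, args);
    va_end(args);
    return len;
}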
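An alternative sketch, not the version above: if you want a va_list flavour as well, C99's va_copy lets the arguments be walked twice without calling va_start a second time. The names my_vasprintf and my_asprintf are made up here to avoid clashing with a system asprintf.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical va_list variant: measures on a copy of args, then formats
   with the original. On success *out must be free()d by the caller. */
static int my_vasprintf(char **out, const char *format, va_list args)
{
    va_list measure;
    va_copy(measure, args);
    int len = vsnprintf(NULL, 0, format, measure);
    va_end(measure);
    if (len < 0) return -1;

    *out = malloc((size_t)len + 1);      /* +1 for the terminating NUL */
    if (!*out) return -1;
    return vsnprintf(*out, (size_t)len + 1, format, args);
}

/* Hypothetical front end with the usual asprintf calling convention. */
static int my_asprintf(char **out, const char *format, ...)
{
    va_list args;
    va_start(args, format);
    int len = my_vasprintf(out, format, args);
    va_end(args);
    return len;
}
```

Use is the familiar pair of lines: call my_asprintf(&s, "%s=%d", "answer", 42), then free(s) when done.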

[ Edit: Fixed loads of missing 1 + on the result of snprintf while measuring. snprintf returns the number of characters that would be written, not counting the terminating NUL character, but the buffer length provided must also have space for the NUL.]