I know this is impossible, because I've spent hours on it and read up on it (I did my homework), but I am determined to approximate a forward declaration of a clang block, even if it takes linker tricks or inline assembly.
I'm using a macro library written for gcc (which is 1,000 to 10,000 times faster than jansson at outputting large quantities of JSON), which encourages horrors such as this test case:
JSON_with_BUFFERED_WRITER((1001, result),
    JSON_OBJ(
        JSON_STRING("logGroupName", "\005group©"),
        JSON_STRING("logStreamName", "stream")
    )
) (out) {
    strncpy(out, data, length);
    out += length;
    *out = 0;
    return 0;
};
Yes, that block at the end is actually partially sucked into the macro expansion to produce a function definition something like this:
int __WRITER__1849_emit(const char *data, int length, typeof(result) out) {
    strncpy(out, data, length);
    out += length;
    *out = 0;
    return 0;
};
(The (out) appearing after the macro invocation actually becomes the argument list of a function-like macro whose name ends the macro expansion; that macro consumes out and emits it as the last item of the generated parameter list. How sick is that? I'm very proud of it!)
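The (out) trick relies on a standard preprocessor rule: after a macro is replaced, the result is rescanned together with the source tokens that follow it, so an expansion that ends with the name of a function-like macro can pick up its arguments from the source. A stripped-down sketch of the same shape, with hypothetical macro names (DEFINE_EMITTER, EMIT_PARAMS) that are not the library's:

```c
#include <string.h>

/* Tail macro (hypothetical): receives the user's parameter name from the
   trailing "(out)" and closes the generated parameter list with it. */
#define EMIT_PARAMS(param) (const char *data, int length, char *param)

/* Head macro (hypothetical): its expansion ends with the name of a
   function-like macro. Rescanning continues into the following source
   tokens, so a trailing "(out)" in the source invokes EMIT_PARAMS. */
#define DEFINE_EMITTER(fname) static int fname EMIT_PARAMS

/* Expands to:
   static int copy_emit (const char *data, int length, char *out) { ... } */
DEFINE_EMITTER(copy_emit)(out) {
    memcpy(out, data, (size_t)length);
    out[length] = '\0';
    return length;
}
```

The user writes only the parameter name and the body; the macro supplies the rest of the signature.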
Under gcc, this nested function (made usable early by a forward declaration in the macro expansion, auto int __WRITER__1849_emit(const char *data, int length, typeof(result) out);) is assigned to a field of a struct instance somewhere in the macro expansion and invoked as a callback as the macro-generated code executes.
Most of the library has been converted to work under clang as well, using clang blocks, generally by prefixing that final function with a slightly late assignment of the block to the struct field, ready for use:
__WRITER__1849.emit = ^int(const char *data, int length, typeof(result) out) {...
That works fine where nothing uses the block until after that assignment, but in the case of JSON_with_BUFFERED_WRITER the processing happens inside the expansion of the macro arguments, before the assignment can take place.
So forward declarations really matter here. The values are all known at link time anyway, but I need the value to be placed in the static initialization of the struct instance before the code generated by the macro executes.
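In plain C at file scope, this is exactly what a forward declaration provides: a function's address is a link-time constant, so a static struct initializer can name a function whose body appears later in the file. A minimal sketch of the behaviour being sought, using plain functions rather than blocks, with a hypothetical struct writer and append_emit (not the library's real layout):

```c
#include <string.h>

/* Hypothetical writer record with an emit callback field. */
struct writer {
    int (*emit)(const char *data, int length, char **cursor);
    char *cursor;
};

/* Forward declaration: lets the initializer below name the function
   before its body appears (the role GCC's "auto int f(...);" plays
   for a nested function). */
static int append_emit(const char *data, int length, char **cursor);

/* Both addresses are link-time constants, so this is a valid static
   initializer and is in place before any code runs. */
static char buffer[64];
static struct writer w = { append_emit, buffer };

/* Code that uses the callback, placed before the definition. */
static void write_json(struct writer *wr) {
    wr->emit("{\"a\":", 5, &wr->cursor);
    wr->emit("1}", 2, &wr->cursor);
}

/* The definition comes last; it advances the shared cursor. */
static int append_emit(const char *data, int length, char **cursor) {
    memcpy(*cursor, data, (size_t)length);
    *cursor += length;
    **cursor = '\0';
    return 0;
}
```

After write_json(&w), buffer holds {"a":1}.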
I know I could beat up the usage and have a two-stage declare-then-use, but that isn't the kind of quitting that has me waking up wanting to write more code (probably of the type that should never be written, but this isn't reddit, so don't judge me).
Although that final function can get sucked into the tail of the macro expansion, and can even be assigned to something by having the macro tail expand to an l-value, I have no way to have anything invoked after that assignment. C has no setters, and I can't work out how to make anything go out of scope to run a destructor (in which I could do something) after the assignment, so that's a dead end.
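The nearest thing gcc and clang offer to a destructor is __attribute__((cleanup(fn))), which calls fn with a pointer to a variable when that variable leaves scope. A minimal sketch, assuming the generated work could be deferred until the end of the enclosing scope (the library as described runs it inline, so this is an untested idea rather than a fix); struct late_writer and its fields are hypothetical:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical writer whose callback is only assigned late. */
struct late_writer {
    int (*emit)(const char *data, int length, char *out);
    char *out;
    const char *pending;   /* work deferred until scope exit */
};

/* Called automatically with &w when w leaves scope, i.e. after the
   late assignment of emit has already happened. */
static void flush_on_exit(struct late_writer *w) {
    if (w->emit != NULL && w->pending != NULL)
        w->emit(w->pending, (int)strlen(w->pending), w->out);
}

static int copy_emit(const char *data, int length, char *out) {
    memcpy(out, data, (size_t)length);
    out[length] = '\0';
    return 0;
}

void run_deferred(char *buf) {
    struct late_writer w __attribute__((cleanup(flush_on_exit))) =
        { NULL, buf, "{\"a\":1}" };
    /* ...the macro tail's late assignment would go here... */
    w.emit = copy_emit;
}   /* flush_on_exit(&w) runs here, after the assignment */
```

Because the handler receives the variable's address, it sees whatever value emit was given by the late assignment.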
I've tried persuading clang with attempts at alias attributes:
static __attribute__((alias("__WRITER__1849_emit__"))) int (^__WRITER__1849_emit)(const char *data, int length, typeof(result) cookie);
...
__attribute__((external)) static int (^__WRITER__1849_emit__)(const char *data, int length, typeof(result) cookie) =
    ^int(const char *data, int length, typeof(result) out) {
        strncpy(out, data, length);
        out += length;
        *out = 0;
        return 0;
    };
but it doesn't work: inspecting the .s file shows that the alias attribute has had no effect.
It's also not possible to put the reverse attribute, __attribute__((alias("__WRITER__1849_emit"))), on the static block definition, as though that were the "forward" declaration, perhaps because it is statically defined.
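For comparison, the alias attribute does work when its target is a file-scope definition in the same translation unit, whose assembler name is just its C identifier. A minimal sketch with hypothetical names (emit_impl, emit_alias), assuming an ELF target such as Linux; not every platform supports aliases (Darwin does not):

```c
#include <string.h>

static int copy_emit(const char *data, int length, char *out) {
    memcpy(out, data, (size_t)length);
    out[length] = '\0';
    return 0;
}

/* The real object, defined at file scope so that its symbol name is
   exactly "emit_impl". */
int (*emit_impl)(const char *, int, char *) = copy_emit;

/* A second name for the same storage. The alias target must be
   defined in this translation unit. */
extern __attribute__((alias("emit_impl"))) int (*emit_alias)(const char *, int, char *);
```

Calling through emit_alias reads the same pointer as emit_impl.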
This hints at using extern, but I haven't yet divined the trick required to have the subsequent static definition linked and matched with the extern declaration:
extern __attribute__((alias("__WRITER__1849_emit__"))) int (^__WRITER__1849_emit)(const char *data, int length, typeof(result) cookie);
...
static int (^__WRITER__1849_emit__)(const char *data, int length, typeof(result) cookie) =
    ^int(const char *data, int length, typeof(result) out) {
        strncpy(out, data, length);
        out += length;
        *out = 0;
        return 0;
    };
This merely gives undefined reference to '__WRITER__1849_emit', with nothing in the .s file to indicate that any aliasing or linking between the symbols will occur; the only mention in the .s file of the extern "forward declaration" hack is:
.addrsig_sym __WRITER__1849_emit
So now I'm looking for a way to declare or emit a global symbol from the static block definition in the function, and/or maybe some inline assembly, that can be matched up with the extern "forward declaration". Any tips? Special linker scripts?
It looks like I need to somehow emit something like this, to go with the extern method:
.globl __WRITER__1849_emit
.set __WRITER__1849_emit, __WRITER__1849_emit__
Maybe like this:
__asm__(" .globl __WRITER__1849_emit\n"
        ".set __WRITER__1849_emit, __WRITER__1849_emit__\n");
Well, it almost works. There is a small warning, warning: relocation against '__WRITER__1849_emit__' in read-only section '.text', but the real problem is that the directive is emitted before __WRITER__1849_emit__ is defined (it has a static const initializer), which gives rise to undefined reference to '__WRITER__1849_emit__' at link time, logged against the use of __WRITER__1849_emit, the ostensible extern "forward declaration".
It turns out I need to qualify my symbols the way clang does. This worked in combination with extern, and without the need for .globl:
extern int (^__WRITER__1849_emit)(const char *data, int length, typeof(result) cookie);
__asm__(".set __WRITER__1849_emit, selftest_json.__WRITER__1849_emit__\n");
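For comparison, the same extern plus .globl/.set combination links without any qualification when the target is a file-scope object, because its assembler name is simply its identifier; the prefix only appears for function-local statics. A minimal sketch with hypothetical names, assuming an ELF target where C names carry no leading underscore:

```c
#include <string.h>

static int copy_emit(const char *data, int length, char *out) {
    memcpy(out, data, (size_t)length);
    out[length] = '\0';
    return 0;
}

/* File-scope target: its assembler name is exactly "emit_impl". */
int (*emit_impl)(const char *, int, char *) = copy_emit;

/* C-level declaration only; the compiler emits no definition for it. */
extern int (*emit_alias)(const char *, int, char *);

/* Define emit_alias in the assembler as another name for emit_impl.
   The assembler resolves the .set even though emit_impl may be
   emitted later in the same object file. */
__asm__(".globl emit_alias\n\t.set emit_alias, emit_impl\n");
```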
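So I just need to normalise the generation of the prefix (selftest_json. here), which might be hard: that is the name of the enclosing function, which is not available as a character sequence at preprocessing time (__func__ is a variable, not a string literal that can be pasted into the asm string). Still, it's looking hopeful!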
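If the usage could be changed to pass the enclosing function's name as an extra macro argument (a hypothetical change, not something the library does today), the directive string could be assembled at preprocessing time by stringizing and string-literal concatenation. SET_LOCAL_ALIAS and directive_matches are illustrative names, and the "function.variable" format is only what clang 15 was observed to produce above:

```c
#include <string.h>

/* Two-step stringizing so that macro arguments are expanded first. */
#define STR_(x) #x
#define STR(x) STR_(x)

/* Hypothetical helper: builds ".set alias, fn.target\n" from an
   explicitly supplied enclosing-function name. */
#define SET_LOCAL_ALIAS(fn, alias, target) \
    ".set " STR(alias) ", " STR(fn) "." STR(target) "\n"

static const char example_directive[] =
    SET_LOCAL_ALIAS(selftest_json, __WRITER__1849_emit, __WRITER__1849_emit__);

/* Nonzero if the generated directive matches the expected text. */
int directive_matches(const char *expected) {
    return strcmp(example_directive, expected) == 0;
}
```

Inside the expansion it would be used as __asm__(SET_LOCAL_ALIAS(selftest_json, __WRITER__1849_emit, __WRITER__1849_emit__)); since the asm template accepts adjacent string literals, which are concatenated before compilation.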
I'd rather make this work and then not use the solution because it is unclean, than not be able to make it work. I will bend clang to my will!
I'm using Ubuntu clang version 15.0.7
Maybe I could substitute a global function that receives a symbol name as an argument and uses dlopen/dlsym to find the real address? Nope: the block is "data", and dlsym doesn't find it.