Opened 11 years ago

Last modified 10 years ago

#582 new defect

printf() is unnecessarily non-standard in some cases

Reported by: Jiří Zárevúcky
Owned by: Jakub Jermář
Priority: minor
Milestone:
Component: helenos/lib/c
Version: mainline
Keywords:
Cc:
Blocker for:
Depends on:
See also:

Description

In some cases, printf() discards bytes that are not valid UTF-8, notably with the "%c" specifier.

The standard version copies uninterpreted bytes to the output without change, and there is no practical benefit from this departure.

It is likely there are other non-standard behaviors, so it may be beneficial to find those cases and align them with the recent POSIX specification.
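To make the reported difference concrete, here is a minimal sketch of the standard behavior the description refers to. In standard C, "%c" converts its int argument to unsigned char and writes that byte as-is, so a lone 0xC3 (an incomplete UTF-8 sequence) should reach the output buffer unchanged. The helper name format_byte is hypothetical and is not part of any libc; on the HelenOS libc described in this ticket, the same call would presumably produce no byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not a libc function): formats one byte with "%c"
 * into out, which has capacity n. Returns what snprintf returns, i.e. the
 * number of characters that the conversion produced. */
int format_byte(unsigned char b, char *out, size_t n)
{
    assert(out != NULL && n >= 2);  /* room for one byte plus the terminator */
    return snprintf(out, n, "%c", b);
}
```

With a standard-conforming libc, calling format_byte(0xC3, buf, sizeof buf) is expected to return 1 and leave the byte 0xC3 in buf[0].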

Change History (4)

comment:1 by Martin Decky, 11 years ago

I would like to vote in favour of closing this ticket as wontfix. I believe that there is one good reason for departing from the behaviour of other systems (*) in this case: The printf() family of functions (as well as any other UTF-8 function in HelenOS libc) should never generate an invalid UTF-8 string, even if you try to force it by calling it with suspicious arguments.

(*) I am specifically speaking about "behaviour of other systems", not about POSIX. HelenOS openly declares that it is not POSIX compliant and that it does not aim for any such certification (when speaking about the system as a whole). Therefore the POSIX standards are in no way more binding for HelenOS than any actual behaviour of other systems, codified or not codified.

If any specific behaviour of some other system is reasonable, it is perfectly fine for HelenOS to align with that behaviour (and if this is also compliant with POSIX, then all the better). If we see a good reason to deviate from any specific behaviour of some other system, it is perfectly fine for HelenOS to deviate (despite the possible deviation from POSIX at the same time).

Of course, there are many reasons for having an optional and potentially 100% strictly compliant POSIX environment in HelenOS (libposix being just one, and possibly not the best, approach to providing such an environment). But this has IMHO nothing to do with the native HelenOS environment.

comment:2 by Jiří Zárevúcky, 11 years ago

Any particular reason why printf() "should never generate an invalid UTF-8 string, even if you try to force it"?
I honestly don't see why that would be helpful, especially considering the performance impact of this requirement and the senseless departure from standard behavior.

And I will repeat again: If it's supposed to behave in a non-standard manner, just name it differently. Do you honestly want to require every programmer to inspect the code of every function they have been familiar with for years, just so that they can use it safely? This is a completely artificial obstacle for any potential contributor to HelenOS, and I can't express in words how pointless it is.

comment:3 by Martin Decky, 11 years ago, in reply to comment:2

Any particular reason why printf() "should never generate an invalid UTF-8 string, even if you try to force it"?

It is called the robustness principle [1] and it might be formulated for example as:

Be conservative in what you send, be liberal in what you accept

[1] http://en.wikipedia.org/wiki/Robustness_principle

I honestly don't see why that would be helpful, especially considering the performance impact of this requirement and the senseless departure from standard behavior.

The performance impact is totally negligible. You don't seriously think that the single if statement makes any difference, considering that the output of the printf() call must then pass through at least one server task (in the usual case) to be effectively visible (on the screen, in a file, etc.)?
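For readers unfamiliar with what such a check might look like, the following is a rough sketch of a single-branch filter on the "%c" path. It is an assumption made for illustration and is not taken from the HelenOS sources: the name emit_char_checked and the exact rule (only bytes below 0x80 are valid UTF-8 on their own) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch, not the HelenOS implementation.
 * A single byte is valid UTF-8 by itself only if it is ASCII (< 0x80).
 * Returns 1 and stores the byte in *out if it is emitted,
 * or returns 0 and leaves *out untouched if the byte is discarded. */
int emit_char_checked(unsigned char c, char *out)
{
    assert(out != NULL);
    if (c >= 0x80)
        return 0;       /* lead or continuation byte alone: discard */
    *out = (char)c;
    return 1;
}
```

The cost is one comparison per character, which is the "single if statement" the comment above refers to.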

For you it is a senseless departure from standard, for me it is a meaningful departure from standard. We can agree that we disagree.

And I will repeat again: If it's supposed to behave in a non-standard manner, just name it differently.

And I will repeat again: POSIX is in no way binding for us. Not even for naming stuff. I believe that our printf() behaves as expected in the common case (similar to any printf() before and after the rule of POSIX) and there is no reason not to be called printf(). We have never claimed that it is a POSIX-compliant printf().

Do you honestly want to require every programmer to inspect the code of every function they have been familiar with for years, just so that they can use it safely?

Yes, of course. Precisely because the POSIX string functions operate on bytes, but our string functions operate on UTF-8 encoded Unicode code points. You can never assume that the code which is originally oblivious to UTF-8 will behave correctly when all char * strings are in UTF-8. Whether the difference is in the client code or in the usage of the library code (or both) makes no difference.
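The byte versus code point distinction can be illustrated with a small sketch. The function utf8_code_points below is a hypothetical name, not a HelenOS or POSIX API, and it assumes its input is already valid UTF-8: it counts every byte that is not a continuation byte (10xxxxxx).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>  /* strlen, for comparison with the byte count */

/* Hypothetical helper: counts Unicode code points in a NUL-terminated
 * string that is assumed to be valid UTF-8. Each code point contributes
 * exactly one byte that is not a continuation byte (10xxxxxx). */
size_t utf8_code_points(const char *s)
{
    assert(s != NULL);
    size_t count = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

For the string "Ji\xC5\x99\xC3\xAD" (the name "Jiří" encoded in UTF-8), strlen reports 6 bytes while utf8_code_points reports 4 code points, so code that treats a length as a character count gives different answers in the two models.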

If you seek a 100% POSIX-compliant environment (where you certainly don't have to inspect old code for any change of behavior), then use that.

This is a completely artificial obstacle for any potential contributor to HelenOS, and I can't express in words how pointless it is.

Sorry to disappoint, but I don't consider it pointless at all. This is my vision of HelenOS from the very beginning — taking inspiration where it is due, but disrespecting stupid historical norms where it is also due.

I can't express in words how pointless it would be for me to implement HelenOS to behave exactly according to some already existing system or standard. If I wanted to do that, I would join the development of MINIX.

Now, since our design guidelines are incompatible, you have basically two options: (a) Try to live with that, or (b) Initiate some kind of vote for setting your design guidelines as binding for HelenOS.

(I totally acknowledge that the HelenOS community does not currently have any kind of governing body or governing process, thus it is hard to initiate the vote. But I certainly don't object to establishing some kind of semi-formal or formal governance for HelenOS to make the vote possible.)

comment:4 by Jakub Jermář, 10 years ago

Keywords: first-patch removed

This ticket seems to be quite controversial, so I am removing the first-patch keyword.
