directory-api mailing list archives

From Emmanuel Lécharny <>
Subject Re: Binary values and humanReadable flag
Date Mon, 10 Aug 2015 16:15:30 GMT
On 10/08/15 17:25, Radovan Semancik wrote:
> On 08/10/2015 03:10 PM, Emmanuel Lécharny wrote:
>> On 10/08/15 13:33, Radovan Semancik wrote:
>>> On 08/10/2015 12:42 PM, Emmanuel Lécharny wrote:
>>>> There is no flag that says an Attribute is H-R or not. The
>>>> information is provided in RFC 2252, section 4.3.2.
>>> Hmm, I saw code for parsing "X-NOT-HUMAN-READABLE", so I thought
>>> that it might be caused by this. Thanks for the clarification.
>>> Anyway, the strange thing is that the syntax appears to be human
>>> readable.
>> Which it is not:
>> version: 1
>> dn: m-oid=,ou=syntaxes,cn=system,ou=schema
>> objectclass: top
>> objectclass: metaTop
>> objectclass: metaSyntax
>> m-oid:
>> m-description: JPEG
>> m-obsolete: FALSE
>> x-not-human-readable: TRUE
>> entrycsn: 20100111202214.878000Z#000000#000#000000
>> creatorsname: uid=admin,ou=system
>> createtimestamp: 20100111145217Z
> Depends on the server. OpenLDAP defines the syntax like this:
> ldapSyntaxes: ( 1.3.6.1.4.1.1466.115.121.1.28 DESC 'JPEG'
>   X-NOT-HUMAN-READABLE 'TRUE' )
> But OpenDJ like this:
> ldapSyntaxes: ( 1.3.6.1.4.1.1466.115.121.1.28 DESC 'JPEG' )
> This is probably the difference. (And thanks for pointing that out. I
> completely forgot that syntax declaration is also part of the schema.)
> I believe that the API works with ApacheDS :-) ... but my goal is to
> make it work with other LDAP servers as well. And the detection of H/R
> is clearly wrong with OpenDJ. So I'm trying to figure out what's going
> on. Now it looks like the OpenDJ declaration of the syntax is correct.
> I would expect that if no X-NOT-HUMAN-READABLE clause is present, the
> H/R flag will be set according to the RFC. But it is not. The API
> seems to assume "true" as a default for the H/R flag. Is this a bug
> in the API?

No, it's not a bug, it's a default setting. We have no clue about which
attribute should be H-R or not H-R if the server does not tell us.
Most servers don't provide the X-NOT-HUMAN-READABLE element; the API
then decides the attribute is H-R.

Now, to please demanding clients (;-) we have added the
BinaryAttributeDetector, which can be used in the LdapConnection:

    /** Sets the object responsible for the detection of binary attributes. */
    void setBinaryAttributeDetector( BinaryAttributeDetector binaryAttributeDetector );

This is done this way, calling it on the connection:

    connection.setBinaryAttributeDetector(
        new SchemaBinaryAttributeDetector(
            ldapServer.getDirectoryService().getSchemaManager() ) );

And you can use the DefaultConfigurableBinaryAttributeDetector class to
configure the list of Attributes that should be considered as not H/R.

Here is the default list of such attributes:

 * entryACI
 * prescriptiveACI
 * subentryACI
 * audio
 * javaByteCode
 * javaClassByteCode
 * krb5key
 * m-byteCode
 * privateKey
 * publicKey
 * userPKCS12
 * userSMIMECertificate
 * cACertificate
 * userCertificate
 * authorityRevocationList
 * certificateRevocationList
 * deltaRevocationList
 * crossCertificatePair
 * personalSignature
 * photo
 * jpegPhoto
 * supportedAlgorithms

This list can be extended.
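For illustration, extending the list could look like the sketch below. It
assumes the Apache Directory LDAP API's DefaultConfigurableBinaryAttributeDetector
(and its addBinaryAttribute method) plugged into an LdapConnectionConfig;
"objectGUID" is just an illustrative attribute name, not part of the default
list:

```java
import org.apache.directory.ldap.client.api.DefaultConfigurableBinaryAttributeDetector;
import org.apache.directory.ldap.client.api.LdapConnectionConfig;
import org.apache.directory.ldap.client.api.LdapNetworkConnection;

public class BinaryDetectorExample {
    public static void main( String[] args ) {
        // Start from the built-in list of non-H/R attributes...
        DefaultConfigurableBinaryAttributeDetector detector =
            new DefaultConfigurableBinaryAttributeDetector();

        // ...and extend it with an attribute your server treats as binary
        detector.addBinaryAttribute( "objectGUID" );

        LdapConnectionConfig config = new LdapConnectionConfig();
        config.setLdapHost( "localhost" );
        config.setLdapPort( 389 );
        config.setBinaryAttributeDetector( detector );

        // Values of objectGUID will now come back as byte[] instead of String
        LdapNetworkConnection connection = new LdapNetworkConnection( config );
    }
}
```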

That should work, even with OpenDJ.
>> So you expect the server or the client to *know* magically that this
>> attribute is H/R when connected to OpenDJ, right? (irony)
> No magic needed here (although some magic might come very useful with
> some LDAP servers :-) ) .... I just expect that when no
> X-NOT-HUMAN-READABLE is present then the default from the RFC is used.
> Isn't that a reasonable expectation?
No, sadly.

>> Yes, that's true. The rationale is that we do a best effort to inject
>> values correctly, converting them on the fly.
>> Note that this H-R flag itself is stupid. It was added 12 years ago as
>> a way to follow the RFC, but as a matter of fact, the Syntax itself
>> already drives the type of data we can store in an Attribute. I made it
>> even more complex by trying to use Generics. Now, we have those
>> StringValue and BinaryValue all over the code.
>> Ideally, we should not have to care about what we store, and always
>> consider the stored values as byte[]. OTOH, it's not convenient when we
>> want to manipulate values as Strings, as converting them over and over
>> from byte[] to String is costly (especially in the server). But I do
>> think we went way too far here. This conversion should be done
>> internally once, and that's it. It would save us a hell of a lot of
>> time, and would make the API more comfortable to use.
>>> I tend to agree. Always storing the value as binary seems to be good
>>> idea.
>> Depends. From the performance POV, this is killing the server. Most of
>> the ATs are H/R, and require some checks (comparison, normalization,
>> etc.) during the processing of every request. Having only the binary
>> value forces the server to do the conversion back and forth multiple
>> times. We faced this issue, and when we switched to StringValue and
>> BinaryValue, the performance boost was huge (100%).
>> Ideally, we should have 2 methods:
>> - getBinaryValue()
>> - getStringValue()
>> because we always know which type we are dealing with. But that's the
>> point: in the server, for operations involving many attributes, that
>> would require a check on the Syntax every time we want to manipulate a
>> value, which is a bit of a PITA, especially when we don't care about
>> this type. Having a Value<?> wrapper helps a lot here...
> I understand. And storing converted string values is not really a
> problem, as long as the binary value is the primary one. The current
> StringValue implementation has it the other way around, and this
> causes problems. E.g. I have a binary value of
> 2e254d883270c44cd7ae2e254d883270. The '88' and 'c4' bytes are not
> valid UTF-8 sequences, so when they are converted to a String, it will
> contain those strange replacement characters, and when converted back
> to binary it becomes 2e254defbfbd3270efbfbd4cd7ae2e254defbfbd3270 ...
> both '88' and 'c4' are translated to 'efbfbd' and the data are ruined.
> If StringValue were implemented the other way around, it would be less
> harmful, i.e. storing the binary value as primary and converting that
> to a String. Storing that String in the StringValue object is OK (as
> long as it is properly invalidated when the bytes change, but that
> should not be a problem). As far as I understand, StringValue stores
> both values even now. So this is only a matter of changing the
> implementation and always storing the binary value as primary - both
> in BinaryValue and StringValue.
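The corruption described above can be reproduced with plain Java: decoding
bytes that are not valid UTF-8 maps each offending byte to U+FFFD (the
replacement character), which re-encodes as EF BF BD, so the round trip is
lossy. A self-contained sketch, using the first bytes of the value quoted
above:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyRoundTrip {
    public static void main( String[] args ) {
        // 0x88 is a stray continuation byte and 0xC4 a lead byte with no
        // continuation: both are invalid UTF-8 at their positions
        byte[] original = {
            0x2e, 0x25, 0x4d, (byte) 0x88, 0x32, 0x70,
            (byte) 0xc4, 0x4c, (byte) 0xd7, (byte) 0xae
        };

        // Decoding replaces each invalid byte with U+FFFD...
        String decoded = new String( original, StandardCharsets.UTF_8 );
        // ...and re-encoding turns every U+FFFD into the 3 bytes EF BF BD
        byte[] roundTripped = decoded.getBytes( StandardCharsets.UTF_8 );

        System.out.println( Arrays.equals( original, roundTripped ) ); // false
        System.out.println( roundTripped.length > original.length );   // true
    }
}
```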

It's more complex.

Values are stored in two forms:
- binary
- String

except that binary values are not stored in String form.

When we receive a new value for a H-R attribute, we receive it on the
server as a UTF-8 byte[], and we convert it immediately to a String.
For the ongoing processing, up to the point where we store the data in
the backend, we always use the String representation, because this is
how we do comparison and normalization. When we store the data in the
backend, in order to save the String Preparation processing, which is
already over-expensive (for the record, I'm not sure OpenDJ prepares
strings at all), we store both the byte[] and String representations on
disk. Of course, we read them back too, because it's useful when we
write the value back to the client, as we use the byte[] format.

Switching to a byte[] for everything *in the server* would force us to
rewrite all the syntax checkers/comparators/normalizers to work with
UTF-8, which would be a major burden.

Life is complicated...

>> I'm really willing to find a better solution. I have worked a full
>> quarter on this issue (bin/string values) and I haven't been able to
>> come up with something that hides the inconsistency and complexity of
>> LDAP in this area, sadly... Maybe it's time for an overhaul...
> I think that the code is not so bad that it requires a complete
> redesign. The interfaces seem to be OK as far as I can tell now. So
> maybe only some internal refactoring is needed. That can be done in an
> evolutionary fashion. What about just starting with storing the binary
> value as the primary one? Then even if there is a problem with the
> correct detection of the attribute type, no data is really lost, and
> the client can still safely use value.getBytes() regardless of whether
> it is a BinaryValue or a StringValue.
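Radovan's suggestion - the byte[] as the primary form, with the String
form derived lazily and cached - can be sketched like this (a hypothetical
class for illustration, not the API's actual Value implementation):

```java
import java.nio.charset.StandardCharsets;

// Sketch: the byte[] is the single source of truth, so no data can be
// lost even when the H/R detection is wrong; the String form is a
// derived, cached view.
final class SketchValue {
    private final byte[] bytes;  // primary representation, never modified
    private String string;       // cached UTF-8 view, computed on demand

    SketchValue( byte[] bytes ) {
        this.bytes = bytes.clone();
    }

    byte[] getBytes() {
        return bytes.clone();    // always safe, regardless of the H/R flag
    }

    String getString() {
        if ( string == null ) {
            string = new String( bytes, StandardCharsets.UTF_8 );
        }
        return string;           // possibly lossy view, but the bytes survive
    }
}
```

With this layout, value.getBytes() is always exact, and only the String
view can be lossy for non-UTF-8 data.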

FTR, I'm not sure we store binary values in a String form at all (off
the top of my head).
