directory-api mailing list archives

From "Harris, Christopher P" <>
Subject RE: Proper use of LdapConnectionPool
Date Wed, 28 Jan 2015 00:15:35 GMT
Ah, crap.  I forgot to look at the Scope.  I've been using this code for so long for single-search
queries that I took it for granted.

I'll try setting it to ONE_LEVEL to simply bask in the glory of the speedy results, but still,
doing just 1 query makes a hell of a lot more sense.

Sorry, I don't know where my head was.

Thanks for steering me down the right path.

 - Chris

-----Original Message-----
From: Emmanuel Lécharny [] 
Sent: Tuesday, January 27, 2015 5:43 PM
Subject: Re: Proper use of LdapConnectionPool

On 27/01/15 23:07, Harris, Christopher P wrote:
> Hi, Emmanuel.
> "Can you tell us how you do that ? Ie, are you using a plain new connection for each thread you spawn ?"
> Sure.  I can tell you how I am implementing a multi-threaded approach to read all of LDAP/AD into memory.  I'll do the next best thing...paste my code at the end of my response.
> "In any case, the TimeOut is the default LdapConnection timeout (30 seconds):"
> Yes, I noticed mention of the default timeout in your User Guide.
> "You have to set the LdapConnectionConfig timeout for all the created connections to use it. There is a setTimeout() method for that which has been added in 1.0.0-M28."
> When visiting your site while seeking to explore connection pool options, I noticed that you recently released M28 and fixed DIRAPI-217, so I decided to update my pom.xml to M28 and test out the PoolableLdapConnectionFactory.  Great job, btw.  Keep up the good work!
> Oh, and your example needs to be updated to use DefaultPoolableLdapConnectionFactory instead of PoolableLdapConnectionFactory.
> "config.setTimeOut( whatever fits you );"
> Very good to know.  Thank you!
> "It is the right way."
> Sweeeeeeet!
> "Side note : you may face various problems when pulling everything from an AD server. Typically, the AD config might not let you pull more than 1000 entries, as there is a hard limit you need to change on AD if you want to get more.
> Otherwise, the approach - ie, using multiple threads - might seem good, but the benefit is limited. Pulling entries from the server is fast; you should be able to get tens of thousands per second with one single thread. I'm not sure how well AD supports concurrent searches anyway. Last but not least, it's likely that AD does not allow more than a certain number of concurrent threads to run, which might lead to contention at some point."
> Ah, this is why I wanted to reach out to you guys.  You guys know this kind of in-depth information about LDAP and AD.  So, I may adapt my code to a single thread then.  I can live with that.  I need to pull about 40k-60k entries, so tens of thousands of entries per second works for me.  I may need to run the code by you if I go with a single-threaded approach, to check that I'm going about it in the most efficient manner.

The problem with the multi-threaded approach is that you *have* to know which entry has children, because the server won't give you that information. So you will end up doing a search for every single entry you get at one level, with scope ONE_LEVEL, and most of the time you will just get the entry itself. That would more than double the time it takes to grab everything.
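To make that cost concrete, here is a toy simulation in plain Java (no LDAP involved; the tree data and the fetch method are invented for illustration) of the per-entry ONE_LEVEL strategy: every entry costs one round-trip, and the searches on leaf entries come back empty.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class OneLevelCost {
    // Stand-in for the directory: parent DN -> children (hypothetical data)
    static final Map<String, List<String>> tree = new HashMap<>();
    static final List<String> found = new ArrayList<>();
    static int searches = 0;

    // Stand-in for a ONE_LEVEL search: one round-trip per call, even on leaves
    static List<String> oneLevelSearch(String dn) {
        searches++;
        return tree.getOrDefault(dn, Collections.emptyList());
    }

    static void crawl(String dn) {
        for (String child : oneLevelSearch(dn)) {
            found.add(child);
            crawl(child); // leaf children trigger a search that returns nothing
        }
    }

    public static int run() {
        tree.put("root", List.of("A1", "A2"));
        tree.put("A1", List.of("B1", "B2"));
        tree.put("A2", List.of("B3", "B4"));
        tree.put("B1", List.of("C1", "C2"));
        tree.put("B2", List.of("C3", "C4"));
        tree.put("B3", List.of("C5", "C6"));
        tree.put("B4", List.of("C7", "C8"));
        crawl("root");
        return searches;
    }

    public static void main(String[] args) {
        System.out.println(run() + " searches for " + found.size() + " entries");
    }
}
```

On this 14-entry tree the crawl issues 15 searches, and the 8 leaf searches return nothing, which is exactly the waste described above.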

> And now time for some code...
> import java.io.IOException;
> import java.util.Iterator;
> import java.util.List;
> import java.util.Map;
> import java.util.concurrent.ConcurrentHashMap;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
> import java.util.concurrent.TimeUnit;
> import java.util.logging.Level;
> import java.util.logging.Logger;
> import org.apache.commons.pool.impl.GenericObjectPool;
> import org.apache.directory.api.ldap.model.cursor.CursorException;
> import org.apache.directory.api.ldap.model.cursor.SearchCursor;
> import org.apache.directory.api.ldap.model.entry.Entry;
> import org.apache.directory.api.ldap.model.exception.LdapException;
> import org.apache.directory.api.ldap.model.message.SearchRequest;
> import org.apache.directory.api.ldap.model.message.SearchRequestImpl;
> import org.apache.directory.api.ldap.model.message.SearchScope;
> import org.apache.directory.api.ldap.model.name.Dn;
> import org.apache.directory.ldap.client.api.DefaultPoolableLdapConnectionFactory;
> import org.apache.directory.ldap.client.api.LdapConnection;
> import org.apache.directory.ldap.client.api.LdapConnectionConfig;
> import org.apache.directory.ldap.client.api.LdapConnectionPool;
> import org.apache.directory.ldap.client.api.LdapNetworkConnection;
> import org.apache.directory.ldap.client.api.SearchCursorImpl;
> import org.apache.directory.ldap.client.template.EntryMapper;
> /**
>  * @author Chris Harris
>  *
>  */
> public class LdapClient {
>
>     // Fields such as host, port, dn, pwd, searchBase, ceoQuery and
>     // concurrentPersonMap are declared elsewhere in the original class.
>
>     public LdapClient() {
>     }
>
>     public Person searchLdapForCeo() {
>         return this.searchLdapUsingHybridApproach(ceoQuery);
>     }
>
>     public Map<String, Person> buildLdapMap() {
>         SearchCursor cursor = new SearchCursorImpl(null, 300000, TimeUnit.SECONDS);
>         LdapConnection connection = new LdapNetworkConnection(host, port);
>         connection.setTimeOut(300000);
>         Entry entry = null;
>         try {
>             connection.bind(dn, pwd);
>             LdapClient.recursivelyGetLdapDirectReports(connection, cursor, entry, query); // final argument lost to line-wrapping in the archive
>             System.out.println("Finished all Ldap Map Builder threads...");
>         } catch (LdapException ex) {
>             Logger.getLogger(LdapClient.class.getName()).log(Level.SEVERE, null, ex);
>         } catch (CursorException ex) {
>             Logger.getLogger(LdapClient.class.getName()).log(Level.SEVERE, null, ex);
>         } finally {
>             cursor.close();
>             try {
>                 connection.close();
>             } catch (IOException ex) {
>                 Logger.getLogger(LdapClient.class.getName()).log(Level.SEVERE, null, ex);
>             }
>         }
>         return concurrentPersonMap;
>     }
>
>     private static Person recursivelyGetLdapDirectReports(LdapConnection connection, SearchCursor cursor, Entry entry, String query)
>             throws CursorException {
>         Person p = null;
>         EntryMapper<Person> em = Person.getEntryMapper();
>         try {
>             SearchRequest sr = new SearchRequestImpl();
>             sr.setBase(new Dn(searchBase));
>             StringBuilder sb = new StringBuilder(query);
>             sr.setFilter(sb.toString());
>             sr.setScope( SearchScope.SUBTREE );

Ahhhhh !!!! STOP !!!

Ok, no need to go any further in your code.

You are doing a SUBTREE search on *every single entry* you are pulling from the base. Each entry therefore gets re-fetched once for every one of its ancestors, so with 40 000 entries you end up doing vastly more searches than you have entries. No wonder you get timeouts... Imagine you have a tree like this:

                     root
                    /    \
                  A1      A2
                 /  \    /  \
               B1   B2  B3   B4
               /\   /\  /\   /\
             C1 C2 C3 C4 C5 C6 C7 C8

The search on root will pull A1, A2, B1, B2, B3, B4, C1..C8 (14 entries -> 14 searches).
Then the search on A1 will pull B1, C1, C2, B2, C3, C4 (6 entries -> 6 searches).
Then the search on A2 will pull B3, C5, C6, B4, C7, C8 (6 entries -> 6 searches).
Then the search on B1 will pull C1, C2 (2 entries -> 2 searches, *4 = 8)...

At the end, you have done 1 + 14 + 12 + 8 = 35 searches, when you have only 15 entries...

If you want to see what your algorithm is doing, just do the searches using SearchScope.ONE_LEVEL instead. You will then do on the order of 40 000 searches, which is way less than what you are doing now.

But anyway, doing a search on the root with a SUBTREE scope will be way faster, because you
will do only one single search.
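The sample tree above, modeled in plain Java (the data and the fetch method are invented for illustration, no LDAP involved), shows why: a SUBTREE search returns every descendant of the base in a single round-trip.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SubtreeCost {
    // Stand-in for the directory from the example: parent -> children
    static final Map<String, List<String>> tree = new HashMap<>();
    static int searches = 0;

    // The server walks the tree internally; the client pays one round-trip
    static List<String> subtreeSearch(String base) {
        searches++;
        return descendants(base);
    }

    // Collect every entry under 'base', depth-first
    static List<String> descendants(String base) {
        List<String> result = new ArrayList<>();
        for (String child : tree.getOrDefault(base, List.of())) {
            result.add(child);
            result.addAll(descendants(child));
        }
        return result;
    }

    public static int entriesFetched() {
        tree.put("root", List.of("A1", "A2"));
        tree.put("A1", List.of("B1", "B2"));
        tree.put("A2", List.of("B3", "B4"));
        tree.put("B1", List.of("C1", "C2"));
        tree.put("B2", List.of("C3", "C4"));
        tree.put("B3", List.of("C5", "C6"));
        tree.put("B4", List.of("C7", "C8"));
        return subtreeSearch("root").size();
    }

    public static void main(String[] args) {
        System.out.println(entriesFetched() + " entries in " + searches + " search");
    }
}
```

All 14 entries arrive from one search. With the real API this corresponds to a single call on the root, something like connection.search( baseDn, filter, SearchScope.SUBTREE, "*" ), iterated once with the returned cursor.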

