httpd-users mailing list archives

From Jonas Eckerman
Subject Re: [users@httpd] Serving remote files from Apache...
Date Thu, 01 Jan 2004 16:49:33 GMT
On Wed, 31 Dec 2003 12:04:52 -0800, bruce wrote:

> We've been investigating/searching for an answer to a problem... We've seen
> possible solutions, but nothing that's been exact!..

That's the way things usually work out. When searching for a solution you need to decide which
of the following is the most important:

1: To get the perfect solution that fits your preconceptions of the solution exactly. This
often means you'll have to write the necessary code yourself.

2: To get a working solution that does solve the *original* problem, even if not the way you
envisioned when you set out and even if it means reconsidering some things. This can often
be done using existing solutions in creative ways, and is often the approach that works best.

Note my use of *original* problem above. It's very common that people first have a rather
abstract problem. Then they start thinking of solutions. When they think they're on the right
track they start investigating how to implement the solutions using existing means. If you
stumble at this stage, it's often a very good idea to step back to the first stage and see
if there are alternative ways to solve the original (often rather abstract) problem.

It's also very useful to explain the original problem when asking for help. When people know
the original problem, they may sometimes come up with surprising solutions that one would never
have thought about oneself.

Currently, you have explained a solution to a problem and have asked for help and tips about
implementing that solution. But you have not explained what problem your solution is meant to solve.

I think you also need to separate your two different problems. You are actually looking for
solutions to two completely separate problems. The solutions may therefore be completely separate
as well. The problems:

1: Get the Apache machines to fetch their configuration from a central place.

2: Get the Apache machines to serve content stored on remote machines.

To me both problems seem rather easy to deal with using existing solutions, but as I don't
have all the info on exactly what you're trying to do they might indeed be difficult problems.
Some info that'd be very good to have about both problems:

1.1: How often does the configuration change?
1.2: How often must the Apaches update their configuration?
1.3: How critical is it that *all* the Apaches *always* contain the latest configuration?
1.4: Will all the Apache installations be identical in *all* ways?
1.4.2: Will the OS installations for all the Apaches be identical?

2.1: How much data are we talking about?
2.2: Is this dynamic data or static data? (The usefulness of mod_cache depends on this.)
2.3: Will there be scripts or other dynamic stuff that the Apaches are supposed to fetch and
then execute themselves?
2.4: What kind of hardware (number and speed of CPUs, disk size, amount of RAM) will the Apaches
be installed on?
2.5: What kind of connections will the Apaches have to the remote machines?
2.6: What OS will the Apache machines and the remotes run? (This can be important when considering
mounting directories remotely.)

>  However, we'd actually like to have the website/page files reside
>  on the remote PC/Harddrives and to basically have them read from
>  the remote machine, and served via the Apache app...

Which you can do with Apache's mod_proxy.

Honestly, I don't understand why you don't want to use mod_proxy for this. You need to use
some kind of transport protocol to have Apache fetch the files from a remote machine. You've
already stated that you do not wish to use NFS. Why not use HTTP? Do you have any other particular
protocol for fetching the files that you'd prefer to HTTP? Are the Apaches supposed to fetch
executable code (PHP pages, CGIs, etc) and execute it (mod_proxy won't do this)?
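
To make the mod_proxy suggestion a bit more concrete, a minimal reverse-proxy setup could look roughly like the sketch below. The host name backend.example.com is a placeholder I made up, not anything from your setup, and the sketch assumes mod_proxy and its HTTP submodule are loaded.

```apache
# Do not act as an open forward proxy; only the mappings below are served.
ProxyRequests Off

# Hypothetical backend: replace backend.example.com with the remote machine
# that actually holds the files. Everything under / is fetched from it over HTTP.
ProxyPass        / http://backend.example.com/

# Adjust redirect headers (such as Location) sent by the backend so that
# clients keep talking to this Apache instead of the backend directly.
ProxyPassReverse / http://backend.example.com/
```

With something like this the front-end Apache needs no local copy of the content, but the remote machine has to run an HTTP server of its own.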

>         Apache Server ( ip address
>              |  ^
>              |  |
>              |  |
>              V  |
>         Remote PC/Server

> If this is not easily doable, and we need to utilize the ProxyPassReverse
> solution, what issues are involved?

This can of course be done, but you will need some way for Apache to fetch the files from
the remote server. With mod_proxy, Apache already contains the functionality that you describe
(as I interpret it). If, for some as yet unexplained reason, you do not wish to use mod_proxy
you can use some other method. You could mount (not necessarily through NFS) the remote machine's
directories on the Apache machine, or you can implement your own module to get Apache to fetch the files.

If you can accept the use of mod_proxy (which does exactly what you want) but don't want Apache
to fetch the files with HTTP, you can use mod_proxy_ftp instead and have Apache fetch the
files with FTP, or you can implement your own module to get Apache's mod_proxy to use some
other protocol.

One thing you have not explained is why you want Apache to do the fetching of both content
and configurations. Why not let the OS do this?

> In particular, how scalable would the
> ProxyPassReverse approach be? We might need to server potentially 1000's of
> sites with this approach...

1000's of different web sites (meaning Apache will fetch from 1000's of different remote machines)?
Or maybe 1000's of different Apaches that will all fetch from just a few remote machines?
Or 1000's of something else?

* If you'll have 1000's of Apache installations all fetching from just a few remote machines:

You've already created scaling problems as those few remote machines can get extremely heavily loaded.

If this is the case, you should use mod_cache in the 1000's of Apache installations in order
to lower the load on the central remote machines. With mod_cache this scheme should be able
to scale very well.
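
As a rough sketch of what that caching could look like on each front-end Apache: the cache directory and the expiry value below are assumptions of mine, and the name of the disk cache module differs between Apache versions, so check the documentation for the version you run.

```apache
# Cache responses for all URLs on disk (assumes mod_cache and the
# disk cache module for your Apache version are loaded).
CacheEnable disk /

# Hypothetical cache directory; it must be writable by the Apache user.
CacheRoot /var/cache/apache-proxy

# Fallback lifetime in seconds for responses that carry no expiry information.
CacheDefaultExpire 3600
```

How much this helps depends on how cacheable the content is, which is why the static versus dynamic question above matters.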

One problem, even when using mod_proxy in this case, is that mod_proxy just fetches data and
sends it on to clients. If you're using PHP, ASP, CGI or similar stuff, the remote machine
is the one that'll have to execute this code, because Apache with mod_proxy just fetches it
and serves it as-is. If this is what you're doing (or might be doing), mod_proxy will create
problems rather than solve them. This means you will probably be better off letting the OS
handle the actual fetching of the remote files, and use a more standard Apache that doesn't
really know whether the files are stored locally or remotely.
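
If you go the OS route, the Apache side can stay very plain. The sketch below assumes the OS exposes the remote files under /mnt/remote/htdocs, which is a made-up mount point. How the directory gets mounted is outside Apache's control, and access control directives are left out because they differ between Apache versions.

```apache
# Hypothetical mount point where the OS exposes the remote machine's files.
DocumentRoot /mnt/remote/htdocs

<Directory /mnt/remote/htdocs>
    # As far as I know, the Apache documentation suggests disabling
    # memory-mapping and sendfile for files on network-mounted filesystems.
    EnableMMAP Off
    EnableSendfile Off
</Directory>
```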

OTOH, you did say you wanted small stripped Apaches, so I guess you do not want them to be
able to execute CGIs, PHP, ASP or other similar stuff anyway.

* If you'll have 1000's of different virtual hosts in each Apache or 1000's of remote machines:

I'm not sure where exactly the scaling problems will be, except that you really should look
at the modules for mass virtual hosting if you do this.
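
One of those modules is mod_vhost_alias. A minimal sketch, assuming each site's files live in a directory named after its host name (the /www/hosts layout is just an example of mine):

```apache
# Take the server name from the client's Host: header rather than ServerName.
UseCanonicalName Off

# %0 expands to the full requested host name, so a request for
# www.example.com is served from /www/hosts/www.example.com/docs.
VirtualDocumentRoot /www/hosts/%0/docs
```

This avoids one VirtualHost block per site, but it maps host names to directories, so it fits the mounted-directory approach more naturally than the mod_proxy one.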


Jonas Eckerman

The official User-To-User support forum of the Apache HTTP Server Project.