incubator-s4-commits mailing list archives

From mmo...@apache.org
Subject [2/12] git commit: Doc updates for 0.6.0
Date Sun, 10 Mar 2013 20:07:10 GMT
Doc updates for 0.6.0


Project: http://git-wip-us.apache.org/repos/asf/incubator-s4/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-s4/commit/cb3c58aa
Tree: http://git-wip-us.apache.org/repos/asf/incubator-s4/tree/cb3c58aa
Diff: http://git-wip-us.apache.org/repos/asf/incubator-s4/diff/cb3c58aa

Branch: refs/heads/dev
Commit: cb3c58aa414f5c5946b53cb6ee4d46ac77d281fc
Parents: abc6a77
Author: Matthieu Morel <mmorel@apache.org>
Authored: Mon Feb 25 11:49:53 2013 +0100
Committer: Matthieu Morel <mmorel@apache.org>
Committed: Sat Mar 2 18:53:15 2013 +0100

----------------------------------------------------------------------
 website/README.markdown                            |   12 +-
 website/Rules                                      |   17 +-
 website/config.yaml                                |    2 +-
 website/content/doc/0.6.0/configuration.md         |   60 +++---
 website/content/doc/0.6.0/dev_tips.md              |   23 ++-
 website/content/doc/0.6.0/event_dispatch.md        |   10 +-
 website/content/doc/0.6.0/index.md                 |   22 ++-
 website/content/doc/0.6.0/metrics.md               |   41 ++++
 website/content/doc/0.6.0/overview.md              |   28 ++--
 website/content/doc/0.6.0/recommended_practices.md |   11 +
 website/content/doc/0.6.0/tools.md                 |   47 +++++
 .../content/doc/0.6.0/twitter_trending_example.md  |   77 ++++++++
 website/content/doc/0.6.0/walkthrough.md           |  143 ++++-----------
 website/content/style/pygmentize.scss              |   13 +-
 website/content/style/style.scss                   |    6 +-
 15 files changed, 323 insertions(+), 189 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-s4/blob/cb3c58aa/website/README.markdown
----------------------------------------------------------------------
diff --git a/website/README.markdown b/website/README.markdown
index 19931d0..11d4ca9 100644
--- a/website/README.markdown
+++ b/website/README.markdown
@@ -12,9 +12,12 @@ Entry pages are written with haml and the documentation is written with markdown
 
 The generated static website is in `output/`
 
-# To upload the site to apache
 
-## first, commit the generated website to svn
+There are also a number of dependencies on other gem, error messages are explicit about which ones and how to install them.
+
+We also use pygments for code syntax highlighting. It's a python program, see [here](http://pygments.org/docs/installation/) for installing.
+
+# To upload the site to apache, commit the generated website to svn (site/ directory)
 
 	cp -R output/* $S4_SVN_LOC/site
 	cd $S4_SVN_LOC
@@ -23,7 +26,4 @@ The generated static website is in `output/`
 	svn add <whatever is missing>
 	svn commit --username <apache username> -m "commit message"
 
-## then checkout into web server
-	ssh people.apache.org
-	cd /www/incubator.apache.org/content/s4
-	svn checkout http://svn.apache.org/repos/asf/incubator/s4/site .
\ No newline at end of file
+With svnpubsub, the website is automatically updated
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-s4/blob/cb3c58aa/website/Rules
----------------------------------------------------------------------
diff --git a/website/Rules b/website/Rules
index 75ecf78..a2ff1bf 100644
--- a/website/Rules
+++ b/website/Rules
@@ -22,23 +22,18 @@ compile '/images/*/' do
   # do nothing
 end
 
-compile '/doc/*' do
-  if item.binary?
-  # don’t filter binary items
-  else
+compile '*' do
+  if item[:extension] == "haml"
+    filter :haml
+    layout 'default'
+  end
+  if item[:extension] == "md"
     filter :kramdown
 	filter :colorize_syntax,
        :default_colorizer => :pygmentize,
       :pygmentize => { :linenos => 'inline', :options => { :startinline => 'True' } }
     layout 'default'
   end
-end
-
-compile '*' do
-  if item[:extension] == "haml"
-    filter :haml
-    layout 'default'
-  end
   filter :relativize_paths, :type => :html
 end
 

http://git-wip-us.apache.org/repos/asf/incubator-s4/blob/cb3c58aa/website/config.yaml
----------------------------------------------------------------------
diff --git a/website/config.yaml b/website/config.yaml
index 243e1aa..1672195 100644
--- a/website/config.yaml
+++ b/website/config.yaml
@@ -41,4 +41,4 @@ data_sources:
     layouts_root: /
 
 google_analytics_account_id: UA-19490961-1
-google_analytics_domain: .s4.io
+google_analytics_domain: incubator.apache.org

http://git-wip-us.apache.org/repos/asf/incubator-s4/blob/cb3c58aa/website/content/doc/0.6.0/configuration.md
----------------------------------------------------------------------
diff --git a/website/content/doc/0.6.0/configuration.md b/website/content/doc/0.6.0/configuration.md
index 59ffd26..7bd093e 100644
--- a/website/content/doc/0.6.0/configuration.md
+++ b/website/content/doc/0.6.0/configuration.md
@@ -2,6 +2,8 @@
 title: Configuration
 ---
 
+> How to configure S4 clusters and applications
+
 # Toolset
 
 S4 provides a set of tools to:
@@ -13,15 +15,8 @@ S4 provides a set of tools to:
 * start a Zookeeper server for easy testing: `s4 zkServer`
 	* `s4 zkServer -t` will start a Zookeeper server and automatically configure 2 clusters
 * view the status of S4 clusters coordinated by a given Zookeeper ensemble: `s4 status`
-
-
-		./s4
-
-will  give you a list of available commands.
-
-	./s4 <command> -help
-
-will provide detailed documentation for each of these commands.
+* `s4` will  give you a list of available commands.
+* `./s4 <command> -help` will provide detailed documentation for each of these commands.
 
 
 # Cluster configuration
@@ -38,6 +33,7 @@ Before starting S4 nodes, you must define a logical cluster by specifying:
 The cluster configuration is maintained in Zookeeper, and can be set using S4 tools:
 
 	./s4 newCluster -c=cluster1 -nbTasks=2 -flp=12000
+
 See tool documentation by typing:
 	
 	./s4 newCluster -help
@@ -45,16 +41,16 @@ See tool documentation by typing:
 
 # Node configuration
 
-*Platform* *code and* *application* *code are fully configurable,* *at deployment time{*}*.*
+**Platform code and application code are fully configurable, at deployment time.**
 
-S4 nodes start as simple *bootstrap* processes whose initial role is merely to connect the cluster manager:
+S4 nodes start as simple **bootstrap** processes whose initial role is merely to connect the cluster manager:
 
-* the bootstrap code connects to the cluster manager
-* when an application is available on the cluster, the node gets notified
-* it downloads the platform configuration and code, as specified in the configuration of the deployed application.
-* the communication and core components are loaded, bound and initialized
-* the application configuration and code, as specified in the configuration of the deployed applciation, is downloaded
-* the application is initialized and started
+1. the bootstrap code connects to the cluster manager
+1. when an application is available on the cluster, the node gets notified
+1. it downloads the platform configuration and code, as specified in the configuration of the deployed application.
+1. the communication and core components are loaded, bound and initialized
+1. the application configuration and code, as specified in the configuration of the deployed applciation, is downloaded
+1. the application is initialized and started
 
 This figure illustrates the separation between the bootstrap code, the S4 platform code, and application code in an S4 node:
 
@@ -73,9 +69,9 @@ Example:
 
 # Application configuration
 
-Deploying applications is easier when we can define both the parameters of the application *and* the target environment.
+Deploying applications is easier when we can define both the parameters of the application **and** the target environment.
 
-In S4, we achieve this by specifying *both* application parameters and S4 platform parameters in the deployment phase :
+In S4, we achieve this by specifying **both** application parameters and S4 platform parameters in the deployment phase :
 
 * which application class to use
 * where to fetch application code
@@ -87,7 +83,7 @@ In S4, we achieve this by specifying *both* application parameters and S4 platfo
 
 ## Modules configuration
 
-S4 follows a modular design and uses[Guice](http://code.google.com/p/google-guice/) for defining modules and injecting dependencies.
+S4 follows a modular design and uses [Guice](http://code.google.com/p/google-guice/) for defining modules and injecting dependencies.
 
 As illustrated above, an S4 node is composed of:
 * a base module that specifies how to connect to the cluster manager and how to download code
@@ -105,30 +101,36 @@ For the core module, there is no default parameters.
 
 We provide default modules, but you may directly specify others through the command line, and it is also possible to override them with new modules and even specify new ones (custom modules classes must provide an empty no-args constructor).
 
-Custom overriding modules can be specified when deploying the application, through the`deploy` command, through the _emc_ or _modulesClasses_ option.
+Custom overriding modules can be specified when deploying the application, through the`deploy` command, through the `emc` or `modulesClasses` option.
 
 For instance, in order to enable file system based checkpointing, pass the corresponding checkpointing module class :
 
 	./s4 deploy -s4r=uri/to/app.s4r -c=cluster1 -appName=myApp \
 	-emc=org.apache.s4.core.ft.FileSystemBackendCheckpointingModule 
 
-You can also write your own custom modules. In that case, just package them into a jar file, and specify how to fetch that file when deploying the application, with the _mu_ or _modulesURIs_ option.
+You can also write your own custom modules. In that case, just package them into a jar file, and specify how to fetch that file when deploying the application, with the `mu` or `modulesURIs` option.
 
 For instance, if you checkpoint through a specific key value store, you can write you own checkpointing implementation and module, package that into fancyKeyValueStoreCheckpointingModule.jar , and then:
 
-	./s4 node -c=cluster1 -emc=my.project.FancyKeyValueStoreBackendCheckpointingModule \
+	./s4 deploy -c=cluster1 -emc=my.project.FancyKeyValueStoreBackendCheckpointingModule \
 	-mu=uri/to/fancyKeyValueStoreCheckpointingModule.jar
 
 ### overriding parameters
 
 A simple way to pass parameters to your application code is by:
 
-* injecting them in the application class:
+* injecting them in the application class (primitive types, enums and class literals are automatically converted), for instance:
+
+~~~
+#!java
+
+@Inject
+@Named("thePortNumber")
+int port
+
+~~~
 
-		@Inject
-		@Named('myParam')
-		param
-* specifying the parameter value at node startup (using -p inline with the node command, or with the '@' syntax)
+* specifying the parameter value at node startup (using `-p` inline with the node command, or with the '`@`' syntax)
 
 S4 uses an internal Guice module that automatically injects configuration parameters passed through the deploy command to matching `@Named` parameters.
 
@@ -141,7 +143,7 @@ Both application and platform parameters can be overriden. For instance, specify
 ## File-based configuration
 
 Instead of specifying node parameters inline, you may refer to a file with the '@' notation:
-./s4 deploy @/path/to/config/file
+`./s4 deploy @/path/to/config/file`
 With contents of the referenced file like:
 
 	-s4r=uri/to/app.s4r
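S4 delegates the `@Named` injection shown above to Guice, which converts the string value of a parameter into the declared type (primitive types, enums and class literals). The sketch below only illustrates that conversion step with the standard library, assuming parameters arrive as a name-to-string map; `NamedParamConverter`, `convert` and `lookup` are hypothetical names, not part of the S4 or Guice API.

```java
import java.util.Map;

// Hypothetical helper, not part of the S4 or Guice API: it mimics the
// string-to-type conversion applied to @Named parameters.
final class NamedParamConverter {
    private NamedParamConverter() {}

    @SuppressWarnings({"unchecked", "rawtypes"})
    static Object convert(String raw, Class<?> target) {
        String value = raw.trim();
        if (target == String.class) return value;
        if (target == int.class || target == Integer.class) return Integer.valueOf(value);
        if (target == long.class || target == Long.class) return Long.valueOf(value);
        if (target == double.class || target == Double.class) return Double.valueOf(value);
        if (target == boolean.class || target == Boolean.class) return Boolean.valueOf(value);
        if (target.isEnum()) return Enum.valueOf((Class) target, value); // enum constant name
        if (target == Class.class) {                                      // class literal
            try {
                return Class.forName(value);
            } catch (ClassNotFoundException e) {
                throw new IllegalArgumentException("Unknown class: " + value, e);
            }
        }
        throw new IllegalArgumentException("No conversion to " + target.getName());
    }

    // Looks up a named parameter and converts it to the type of the injected field.
    static Object lookup(Map<String, String> params, String name, Class<?> target) {
        String raw = params.get(name);
        if (raw == null) throw new IllegalArgumentException("Missing parameter: " + name);
        return convert(raw, target);
    }
}
```

For instance, with a map holding `thePortNumber` mapped to `"12000"`, `lookup(params, "thePortNumber", int.class)` yields the integer `12000`, which is what the `int port` field in the example would receive.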

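The '@' notation replaces an argument by the contents of a file. A minimal sketch of that expansion, assuming one option per line and ignoring blank lines; `AtFileArgs.expand` is a hypothetical helper, and the actual S4 command-line parsing may treat whitespace or comments differently.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: expands "@/path/to/file" arguments into the options listed in that file.
final class AtFileArgs {
    private AtFileArgs() {}

    static List<String> expand(String[] args) {
        List<String> expanded = new ArrayList<>();
        for (String arg : args) {
            if (!arg.startsWith("@")) {
                expanded.add(arg);             // inline option, kept as-is
                continue;
            }
            try {
                for (String line : Files.readAllLines(Paths.get(arg.substring(1)), StandardCharsets.UTF_8)) {
                    String option = line.trim();
                    if (!option.isEmpty()) {
                        expanded.add(option);  // one option per non-blank line
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException("Cannot read " + arg, e);
            }
        }
        return expanded;
    }
}
```

Inline options and file-based options can be mixed; they are kept in the order in which they appear.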
http://git-wip-us.apache.org/repos/asf/incubator-s4/blob/cb3c58aa/website/content/doc/0.6.0/dev_tips.md
----------------------------------------------------------------------
diff --git a/website/content/doc/0.6.0/dev_tips.md b/website/content/doc/0.6.0/dev_tips.md
index 8751a3a..a5b3e36 100644
--- a/website/content/doc/0.6.0/dev_tips.md
+++ b/website/content/doc/0.6.0/dev_tips.md
@@ -2,10 +2,10 @@
 title: Development tips
 ---
 
-Here are a few tips to ease the development of S4 applications.
+> Here are a few tips to ease the development of S4 applications.
 
 
-### Import an S4 project into your IDE
+# Import an S4 project into your IDE
 
 You can run `gradlew eclipse` or `gradlew idea` at the root of your S4 application directory. Then simply import the project into eclipse or intellij. You'll have both your application classes _and_ S4 libraries imported to the classpath of the project.
 
@@ -18,28 +18,31 @@ In order to get the transitive dependencies of the platform included as well, yo
 		./gradlew install -DskipTests
 * Then run `gradlew eclipse` or `gradlew idea`
 
+----
 
 
+# Start a local Zookeeper instance
 
-### Start a local Zookeeper instance
-
-* Use the default test configuration (2 clusters with following configs: `c=testCluster1:flp=12000:nbTasks=1` and `c=testCluster2:flp=13000:nbTasks=1`)
+* Use the default test configuration (2 clusters with following configs: `-c=testCluster1:flp=12000:nbTasks=1` and `-c=testCluster2:flp=13000:nbTasks=1`)
 
 		s4 zkServer -t
 * Start a Zookeeper instance with your custom configuration, e.g. with 1 partition:
 		
 		s4 zkServer -clusters=c=testCluster1:flp=12000:nbTasks=1
 
+----
 
-### Load an application in a new node directly from an IDE
+# Load an application in a new node directly from an IDE
 
 This allows to *skip the packaging phase!*
 
-A requirement is that you have both the application classes and the S4 classes in your classpath. See above.
+Requirements:
+
+* application classes **and** S4 classes are in your classpath. See above.
+* application already configured in cluster (with the `-appClass` option, no need to package the app)
 
-Then you just need to run the `org.apache.s4.core.Main` class and pass:
+Then just run the `org.apache.s4.core.S4Node` class and pass:
 
 * the cluster name: `-c=testCluster1`
-* the app class name: `-appClass=myAppClass`
 
-If you use a local Zookeeper instance, there is no need to specify the `-zk` option.
\ No newline at end of file
+If you use a local Zookeeper instance on the default port (2181), there is no need to specify the `-zk` option.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-s4/blob/cb3c58aa/website/content/doc/0.6.0/event_dispatch.md
----------------------------------------------------------------------
diff --git a/website/content/doc/0.6.0/event_dispatch.md b/website/content/doc/0.6.0/event_dispatch.md
index 7c12086..7dd9492 100644
--- a/website/content/doc/0.6.0/event_dispatch.md
+++ b/website/content/doc/0.6.0/event_dispatch.md
@@ -2,6 +2,8 @@
 title: Event dispatch
 ---
 
+> Exploring how events are dispatched to, from and within S4 nodes
+
 Events are dispatched according to their key.
 
 The key is identified in an `Event` through a `KeyFinder`.
@@ -97,10 +99,10 @@ S4 follows a staged event driven architecture and uses a pipeline of executors t
 An executor is an object that executes tasks. It usually keeps a bounded queue of task items and schedules their execution through a pool of threads.
 
 When processing queues are full, executors may adopt various possible behaviours, in particular, in S4:
-	* **blocking**: the current thread simply waits until the queue is not full
-	* **shedding**: the current event is dropped
 
-**Throttling**, i.e. placing an upper bound on the maximum processing rate, is a convenient way to avoid sending too many messages too fast.
+* **blocking**: the current thread simply waits until the queue is not full
+* **shedding**: the current event is dropped
+* **throttling**, i.e. placing an upper bound on the processing rate, is a convenient way to avoid sending too many messages too fast.
 
 S4 provides various default implementations of these behaviours and you can also define your own custom executors as appropriate.
 
@@ -116,7 +118,7 @@ The following picture illustrates the pipeline of executors.
 1. the message is passed to a deserializer executor
 	* this executor is loaded with the application, and therefore has access to application classes, so that application specific messages can be deserialized
 	* by default it uses 1 thread and **blocks** if the processing queue is full
-1. the event (deserialized message) is dispatched to a stream executor 
+1. the event (the deserialized message) is dispatched to a stream executor 
 	* the stream executor is selected according to the stream information contained in the event
 	* by default it **blocks** if the processing queue is full
 1. the event is processed in the PE instance that matches the key of the event
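Dispatch by key can be pictured as follows: a key finder extracts the key from the event, and the key determines the target partition, so that all events with the same key reach the same PE instance. In this sketch `SimpleKeyFinder`, `KeyDispatcher` and `TopicEvent` are hypothetical simplifications of S4's `KeyFinder` and `Event`, and hash-modulo partitioning is an assumption for illustration rather than S4's exact hashing scheme.

```java
import java.util.function.Function;

// Hypothetical event type used only in this sketch.
final class TopicEvent {
    final String topic;
    TopicEvent(String topic) { this.topic = topic; }
}

// Simplified stand-in for a key finder: extracts the dispatch key from an event.
interface SimpleKeyFinder<E> extends Function<E, String> {}

final class KeyDispatcher<E> {
    private final SimpleKeyFinder<E> keyFinder;
    private final int nbPartitions;  // number of partitions (tasks) of the cluster

    KeyDispatcher(SimpleKeyFinder<E> keyFinder, int nbPartitions) {
        if (nbPartitions <= 0) throw new IllegalArgumentException("nbPartitions must be positive");
        this.keyFinder = keyFinder;
        this.nbPartitions = nbPartitions;
    }

    // Same key, same partition: floorMod keeps the result in [0, nbPartitions)
    // even when hashCode() is negative.
    int partitionFor(E event) {
        return Math.floorMod(keyFinder.apply(event).hashCode(), nbPartitions);
    }
}
```

With `new KeyDispatcher<TopicEvent>(e -> e.topic, 2)`, for a cluster defined with `-nbTasks=2`, every event carrying the topic "s4" maps to the same partition, and therefore to the same PE instance.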

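The three behaviours listed for full queues map naturally onto standard `java.util.concurrent` primitives: `BlockingQueue.put` waits for room (blocking), `BlockingQueue.offer` returns `false` instead of enqueuing (shedding), and a token bucket bounds the rate (throttling). This is a sketch under those assumptions; `QueuePolicies` and `Throttle` are hypothetical classes, not S4's executor implementations. The clock is injected so the throttle can be tested deterministically; in production `System::nanoTime` would be passed.

```java
import java.util.concurrent.BlockingQueue;
import java.util.function.LongSupplier;

// Hypothetical sketch of the queue-full policies; not S4's executor classes.
final class QueuePolicies {
    private QueuePolicies() {}

    // Blocking: the calling thread waits until the queue has room.
    static <T> void blockingSubmit(BlockingQueue<T> queue, T item) throws InterruptedException {
        queue.put(item);
    }

    // Shedding: if the queue is full the item is dropped; returns false when dropped.
    static <T> boolean sheddingSubmit(BlockingQueue<T> queue, T item) {
        return queue.offer(item);
    }
}

// Throttling: token bucket allowing at most maxPerSecond events per second on average.
final class Throttle {
    private final double maxPerSecond;
    private final LongSupplier nanoClock;
    private double tokens;
    private long lastNanos;

    Throttle(double maxPerSecond, LongSupplier nanoClock) {
        this.maxPerSecond = maxPerSecond;
        this.nanoClock = nanoClock;
        this.tokens = maxPerSecond;            // start with one second of budget
        this.lastNanos = nanoClock.getAsLong();
    }

    // Returns true if an event may be sent now; false means wait (or drop) the event.
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        tokens = Math.min(maxPerSecond, tokens + (now - lastNanos) / 1e9 * maxPerSecond);
        lastNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

Blocking preserves every event at the cost of back-pressure on the sender, shedding keeps latency bounded at the cost of lost events, and throttling smooths the sending rate before queues fill up.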
http://git-wip-us.apache.org/repos/asf/incubator-s4/blob/cb3c58aa/website/content/doc/0.6.0/index.md
----------------------------------------------------------------------
diff --git a/website/content/doc/0.6.0/index.md b/website/content/doc/0.6.0/index.md
index 521223c..27e72b6 100644
--- a/website/content/doc/0.6.0/index.md
+++ b/website/content/doc/0.6.0/index.md
@@ -12,13 +12,19 @@ S4 (Simple Scalable Streaming System) is a general-purpose, distributed, scalabl
 
 * You may start with an [overview](overview) of the platform
 * Then follow a [walkthrough](walkthrough) for an hands-on introduction
+* Complement with a look at a [topic trending](twitter_trending_example) application using Twitter data
 * And [here](dev_tips) are some tips to ease the development process
 
 ## Configuration
 
-* How to [customize the platform and pass configuration parameters](configuration)
-* How to [add application dependencies](application_dependencies)
-* How to [dispatch events ](event_dispatch) within an application and between applications
+* [Customize the platform and pass configuration parameters](configuration)
+* Add [application dependencies](application_dependencies)
+* [Dispatch events ](event_dispatch) within an application and between applications
+
+## Running S4
+* [Commands](tools) for creating, running and managing applications
+* [Monitor](metrics) the system
+
 
 ## Features
 
@@ -26,5 +32,13 @@ S4 (Simple Scalable Streaming System) is a general-purpose, distributed, scalabl
 
 ## Troubleshooting
 
+* [Recommended practices](recommended_practices)
 * Try the [FAQ](https://cwiki.apache.org/confluence/display/S4/FAQ)
-* Try the [mailing lists](https://cwiki.apache.org/S4/s4-apache-mailing-lists.html)
\ No newline at end of file
+* Try the [mailing lists](https://cwiki.apache.org/S4/s4-apache-mailing-lists.html)
+
+## Resources
+* Questions can be asked through the [mailing lists](https://cwiki.apache.org/confluence/display/S4/S4+Apache+mailing+lists)
+* The source code is available through [git](https://git-wip-us.apache.org/repos/asf?p=incubator-s4.git), [here](http://incubator.apache.org/s4/contrib/) are instructions for fetching the code.
+* A nice set of [slides](http://www.slideshare.net/leoneu/20111104-s4-overview) was used for a presentation at Stanford in November 2011.
+* The driving ideas are detailed in a [conference publication](http://www.4lunas.org/pub/2010-s4.pdf) from KDCloud'11
+* You can also watch the [video](http://vimeo.com/20489778) of a presentation given at LinkedIn.
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-s4/blob/cb3c58aa/website/content/doc/0.6.0/metrics.md
----------------------------------------------------------------------
diff --git a/website/content/doc/0.6.0/metrics.md b/website/content/doc/0.6.0/metrics.md
new file mode 100644
index 0000000..552df5d
--- /dev/null
+++ b/website/content/doc/0.6.0/metrics.md
@@ -0,0 +1,41 @@
+---
+title: Metrics
+---
+
+
+> S4 continuously collects runtime statistics. Let's see how to access these and add custom ones.
+
+# Why?
+
+S4 aims at processing large quantities of events with low latency. In order to achieve this goal, a key requirement is to be able to monitor system internals at runtime.
+
+# How?
+For that purpose, we include a system for gathering statistics about various parts of the S4 platform.
+
+We rely on the [metrics](http://metrics.codahale.com) library, which offers an efficient way to gather such information and relies on statistical techniques to minimize memory consumption.
+
+# What?
+
+By default, S4 instruments queues, caches, checkpointing, event reception and emission and statistics are available for all of these components.
+
+You can also monitor your own PEs. Simply add new probes (`Meter`, `Gauge`, etc..) and report interesting updates to them. There is nothing else to do, these custom metrics will be reported along with the S4 metrics, as explained next.
+
+# Where? 
+
+By default, metrics are exposed by each node through JMX.
+
+The `s4.metrics.config` parameter enables periodic dumps of aggregated statistics to the **console** or to **files** in csv format. This parameter is specified as an application parameter, and must match the following regular expression: 
+
+	(csv:.+|console):(\d+):(DAYS|HOURS|MICROSECONDS|MILLISECONDS|MINUTES|NANOSECONDS|SECONDS)
+
+Examples:
+	
+	# dump metrics to csv files to /path/to/directory every 10 seconds
+	csv:file://path/to/directory:10:SECONDS
+	
+	# dump metrics to the console every minute
+	console:1:MINUTES
+	
+	
+
+Reporting to Ganglia or Graphite is not provided out of the box with S4, but it's quite easy to add. You simply have to add the corresponding dependencies to your project and enable reporting to these systems during the initialization of your application. See the [metrics](http://metrics.codahale.com) documentation for more information.
\ No newline at end of file
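Since `s4.metrics.config` must match the regular expression above, a value can be validated and split into reporter, period and time unit with `java.util.regex`. The unit names in the expression are exactly the constants of `java.util.concurrent.TimeUnit`, so `TimeUnit.valueOf` can convert them. `MetricsConfig` is a hypothetical class for illustration; S4 parses this parameter internally.

```java
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for s4.metrics.config values; S4 parses this parameter internally.
final class MetricsConfig {
    // Same regular expression as documented above
    private static final Pattern FORMAT = Pattern.compile(
            "(csv:.+|console):(\\d+):(DAYS|HOURS|MICROSECONDS|MILLISECONDS|MINUTES|NANOSECONDS|SECONDS)");

    final boolean csv;      // true for csv files, false for the console
    final String location;  // csv location after the "csv:" prefix, empty for the console
    final long period;
    final TimeUnit unit;

    private MetricsConfig(boolean csv, String location, long period, TimeUnit unit) {
        this.csv = csv;
        this.location = location;
        this.period = period;
        this.unit = unit;
    }

    static MetricsConfig parse(String value) {
        Matcher m = FORMAT.matcher(value);
        if (!m.matches()) {
            throw new IllegalArgumentException("Invalid s4.metrics.config: " + value);
        }
        String reporter = m.group(1);
        boolean csv = reporter.startsWith("csv:");
        String location = csv ? reporter.substring("csv:".length()) : "";
        // The unit names in the expression are exactly the TimeUnit constants
        return new MetricsConfig(csv, location, Long.parseLong(m.group(2)), TimeUnit.valueOf(m.group(3)));
    }
}
```

Because `.+` is greedy and the whole value must match, the csv location may itself contain colons (as in `file://...`); only the last `:<period>:<UNIT>` suffix is taken as the schedule.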

http://git-wip-us.apache.org/repos/asf/incubator-s4/blob/cb3c58aa/website/content/doc/0.6.0/overview.md
----------------------------------------------------------------------
diff --git a/website/content/doc/0.6.0/overview.md b/website/content/doc/0.6.0/overview.md
index 44eeb2e..3f44e8e 100644
--- a/website/content/doc/0.6.0/overview.md
+++ b/website/content/doc/0.6.0/overview.md
@@ -13,7 +13,7 @@ S4 0.5 focused on providing a functional complete refactoring.
 
 S4 0.6 builds on this basis and brings plenty of exciting features, in particular:
 
-* **performance improvements**: stream throughput improved by 1000 % (~200k messages / s / stream)
+* **performance improvements**: stream throughput improved by 1000 % (200k+ messages / s / stream)
 * improved [configurability](S4:Configuration - 0.6.0], for both the S4 platform and deployed applications
 * **elasticity** and fine partition tuning, through an integration with Apache Helix
 
@@ -22,9 +22,10 @@ S4 0.6 builds on this basis and brings plenty of exciting features, in particula
 
 **Flexible deployment**:
 
+* Application packages are standard jar files (suffixed `.s4r`)
+* By default keys are homogeneously sparsed over the cluster: helps balance the load, especially for fine grained partitioning
 * S4 also provides fine control over the partitioning (with Apache Helix)
-* Features automatic rebalancing
+* Semi-automatic Rebalancing
 
 **Modular design**:
 
@@ -60,20 +61,21 @@ S4 0.6 builds on this basis and brings plenty of exciting features, in particula
 **Platform**
 
 * S4 provides a runtime distributed platform that handles communication, scheduling and distribution across containers.
-* Distributed containers are called *S4 nodes*
-* S4 nodes are deployed on *S4 clusters*
-* S4 clusters define named ensembles of S4 nodes, with a fixed size
-* The size of an S4 cluster corresponds to the number of logical *partitions* (sometimes referred to as _tasks_)
+* Distributed containers are called **S4 nodes**
+* S4 nodes are deployed on **S4 clusters**
+* S4 clusters define named ensembles of S4 nodes
+	* by default, the size of the cluster is fixed
+	* the size of an S4 cluster corresponds to the number of logical **partitions** (sometimes referred to as **tasks**)
+	* an ongoing integration with [Apache Helix](http://helix.apache.org) removes these limitations and allows a variable number of nodes and a rebalancing of partitions
 
 **Applications**
 
-
 * Users develop applications and deploy them on S4 clusters
-* Applications are built from:
+* Applications are built as a graph of:
 	* **Processing elements** (PEs)
 	* **Streams** that interconnect PEs
 * PEs communicate asynchronously by sending **events** on streams.
-* Events are dispatched to nodes according to their key
+* Events are dispatched to nodes according to their **key**
 
 **External streams** are a special kind of stream that:
 
@@ -85,7 +87,6 @@ S4 0.6 builds on this basis and brings plenty of exciting features, in particula
 
 
 
-
 ## A hierarchical perspective on S4
 
 The following diagram sums-up the key concepts in a hierarchical fashion:
@@ -94,9 +95,4 @@ The following diagram sums-up the key concepts in a hierarchical fashion:
 
 # Where can I find more information?
 
-* [The website](http://incubator.apache.org/s4/) is a good starting point.
-* [The wiki](https://cwiki.apache.org/confluence/display/S4/) currently contains the most up-to-date information: general information (this page), configuration, examples.
-* Questions can be asked through the [mailing lists](https://cwiki.apache.org/confluence/display/S4/S4+Apache+mailing+lists)
-* The source code is available throught [git](https://git-wip-us.apache.org/repos/asf?p=incubator-s4.git], [here](http://incubator.apache.org/s4/contrib/) are instructions for fetching the code.
-* A nice set of [slides](http://www.slideshare.net/leoneu/20111104-s4-overview) was used for a presentation at Stanford in November 2011.
-* The driving ideas are detailed in a [conference publication](http://www.4lunas.org/pub/2010-s4.pdf) from KDCloud'11 (joint workshop with ICDM'11)
\ No newline at end of file
+See the [resources](resources) page.

http://git-wip-us.apache.org/repos/asf/incubator-s4/blob/cb3c58aa/website/content/doc/0.6.0/recommended_practices.md
----------------------------------------------------------------------
diff --git a/website/content/doc/0.6.0/recommended_practices.md b/website/content/doc/0.6.0/recommended_practices.md
new file mode 100644
index 0000000..623e71d
--- /dev/null
+++ b/website/content/doc/0.6.0/recommended_practices.md
@@ -0,0 +1,11 @@
+---
+title: Recommended practices
+---
+
+
+# Do not reuse S4 events
+
+**S4 events are immutable, however immutability is not currently enforced.**
+Make sure you do not reuse incoming events and for instance simply update a field. Instead, create a new event (you may extend the `Event` class and defined a copy constructor) with the new field value.
+
+More information available in this [ticket](https://issues.apache.org/jira/browse/S4-104)
\ No newline at end of file
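A minimal sketch of the recommended pattern: copy the incoming event and update only the copy. `CountEvent` and `Incrementer` are hypothetical stand-ins; in an S4 application the event class would extend the `Event` class, as suggested above, and its copy constructor may also need to copy any state inherited from `Event`.

```java
// Hypothetical event used only for this sketch (a real one would extend S4's Event class).
class CountEvent {
    private String topic;
    private int count;

    CountEvent(String topic, int count) {
        this.topic = topic;
        this.count = count;
    }

    // Copy constructor: produces an independent event with the same field values.
    CountEvent(CountEvent other) {
        this(other.topic, other.count);
    }

    String getTopic() { return topic; }
    int getCount() { return count; }
    void setCount(int count) { this.count = count; }
}

final class Incrementer {
    private Incrementer() {}

    // Avoid: in.setCount(in.getCount() + 1) would mutate an event that may
    // still be referenced elsewhere. Instead, copy it and update the copy.
    static CountEvent process(CountEvent in) {
        CountEvent out = new CountEvent(in);
        out.setCount(in.getCount() + 1);
        return out;
    }
}
```

The incoming event is left untouched, so whatever else holds a reference to it keeps seeing the original values.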

http://git-wip-us.apache.org/repos/asf/incubator-s4/blob/cb3c58aa/website/content/doc/0.6.0/tools.md
----------------------------------------------------------------------
diff --git a/website/content/doc/0.6.0/tools.md b/website/content/doc/0.6.0/tools.md
new file mode 100644
index 0000000..04d94fb
--- /dev/null
+++ b/website/content/doc/0.6.0/tools.md
@@ -0,0 +1,47 @@
+---
+title: S4 commands
+---
+
+
+> List of commands
+
+S4 ships with a toolkit for creating, packaging, deploying and running applications.
+
+From the source distribution, these tools are built by running:
+
+	gradlew s4-tools:installApp
+
+This compiles the s4-tools subproject and generates shell scripts.
+
+
+# Available commands
+
+Here is the list of commands available with the `s4` tool. For each of these commands, the comprehensive documentation of all parameters is shown by specifying the `-help` option.
+
+Syntax: `s4 <command> <options>`
+
+|---
+| Purpose | Description | Command 
+|-|-|-
+| Create a new application | Create a bootstrap project skeleton | `newApp`
+| Start a ZooKeeper server instance | Useful for testing | `zkServer`
+| Define an S4 cluster | Specify cluster size and initial ports for listening sockets | `newCluster`
+| Package an application | S4R archive to be deployed on S4 nodes |  `s4r` |
+| Deploy/configure an application | Specifies application and platform configuration | `deploy`
+| Start an S4 node | S4 node bootstrap process, connects to the cluster manager and fetches app and platform configuration, as specified through `deploy` command | `node`
+| Get information about S4 infrastructure | Shows status of S4 clusters, apps, nodes and external streams | `status`
+|---
+
+
+In addition, for easy injection of data, the `adapter` command allows to start an node without having to package and deploy the application.
+
+
+# Undeploying an application
+
+There is currently no specific command for undeploying S4 applications. The recomended way for removing an application deployed on cluster C1 is to:
+
+* kill S4 nodes belonging to cluster C1
+* delete ZooKeeper subtree /s4/clusters/C1
+* redefine cluster C1
+* deploy new application
+* restart nodes for cluster C1 (this could be automated with some utility like [daemontools](http://cr.yp.to/daemontools.html))
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/incubator-s4/blob/cb3c58aa/website/content/doc/0.6.0/twitter_trending_example.md
----------------------------------------------------------------------
diff --git a/website/content/doc/0.6.0/twitter_trending_example.md b/website/content/doc/0.6.0/twitter_trending_example.md
new file mode 100644
index 0000000..a870f68
--- /dev/null
+++ b/website/content/doc/0.6.0/twitter_trending_example.md
@@ -0,0 +1,77 @@
+---
+title: Twitter trending example
+---
+> The [walkthrough](../walkthrough) describes a very basic example; here is a more realistic one
+
+# Twitter trending example
+
+Let's have a look at another application, that computes trendy Twitter topics by listening to the spritzer stream from the Twitter API. This application was adapted from a previous example in S4 0.3.
+
+## Overview
+
+This application is divided into:
+
+* twitter-counter , in test-apps/twitter-counter/ : extracts topics from tweets and maintains a count of the most popular ones, periodically dumped to disk
+* twitter-adapter, in test-apps/twitter-adapter/ : listens to the feed from Twitter, converts status text into S4 events, and passes them to the "RawStatus" stream
+
+Have a look at the code in these directories. You'll note that:
+
+* the build.gradle file must be tailored to include new dependencies (twitter4j libs in twitter-adapter)
+* events are partitioned through various keys
+
+## Run it!
+
+> Note: You need a twitter4j.properties file in your home directory with the following content (debug is optional):
+
+	debug=true
+	user=<a twitter username>
+	password=<matching password>
+
+* Start a Zookeeper instance. From the S4 base directory, do:
+	
+		./s4 zkServer
+
+* Define 2 clusters : 1 for deploying the twitter-counter app, and 1 for the adapter app
+
+		./s4 newCluster -c=cluster1 -nbTasks=2 -flp=12000; ./s4 newCluster -c=cluster2 -nbTasks=1 -flp=13000
+		
+* Start 2 app nodes (you may want to start each node in a separate console) :
+
+		./s4 node -c=cluster1
+		./s4 node -c=cluster1
+
+* Start 1 node for the adapter app:
+
+		./s4 node -c=cluster2 -p=s4.adapter.output.stream=RawStatus
+		
+* Deploy twitter-counter app (you may also first build the s4r then publish it, as described in the previous section)
+
+		./s4 deploy -appName=twitter-counter -c=cluster1 -b=`pwd`/test-apps/twitter-counter/build.gradle
+		
+* Deploy twitter-adapter app. In this example, we don't directly specify the app class of the adapter, we use the deployment approach for apps (remember, the adapter is also an app).
+
+		./s4 deploy -appName=twitter-adapter -c=cluster2 -b=`pwd`/test-apps/twitter-adapter/build.gradle
+		
+* Observe the current 10 most popular topics in file TopNTopics.txt. The file gets updated at regular intervals, and only outputs topics with a minimum of 10 occurrences, so you may have to wait a little before the file is updated :
+
+		tail -f TopNTopics.txt
+		
+* You may also check the status of the S4 node with:
+
+		./s4 status
+
+----
+
+# What next?
+
+You have now seen some basic applications: you know how to run them and how to get events into the system. You may now try to code your own apps with your own data.
+
+[This page](../application_dependencies) will help you specify your own dependencies.
+
+There are more parameters available for the scripts (typing the name of the task lists the options). In particular, for distributed deployments, you'll need to pass the ZooKeeper connection strings when you start the nodes.
+
+You may also customize the communication and the core layers of S4 by tweaking configuration files and modules.
+
+Last, the [javadoc](http://people.apache.org/~mmorel/apache-s4-0.6.0-incubating-doc/javadoc/) will help you when writing applications.
+
+We hope this will help you start rapidly, and remember: we're happy to help!
\ No newline at end of file
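The twitter-counter app only reports the 10 most popular topics, and only those with at least 10 occurrences. The Java sketch below illustrates that selection step in isolation. It is a hypothetical, self-contained helper (the class `TopNTopicsSketch` and its methods are made up for illustration), not the code from test-apps/twitter-counter/, which runs inside S4 PEs and may count and rank topics differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, NOT the twitter-counter implementation: it only illustrates
// "keep the N most popular topics that have at least minCount occurrences".
class TopNTopicsSketch {
    // Occurrence count per topic.
    private final Map<String, Integer> counts = new HashMap<>();

    // Records one occurrence of a topic (e.g. a topic extracted from a tweet).
    void count(String topic) {
        counts.merge(topic, 1, Integer::sum);
    }

    // Returns up to n "topic=count" entries with at least minCount occurrences,
    // most popular first; ties are broken alphabetically for a stable output.
    List<String> topN(int n, int minCount) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.removeIf(e -> e.getValue() < minCount);
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> e : entries.subList(0, Math.min(n, entries.size()))) {
            result.add(e.getKey() + "=" + e.getValue());
        }
        return result;
    }
}
```

For example, after counting "s4" 12 times and "rare" 3 times, `topN(10, 10)` returns only `s4=12`: topics below the threshold are dropped before ranking, which is why TopNTopics.txt can stay empty for a while.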

http://git-wip-us.apache.org/repos/asf/incubator-s4/blob/cb3c58aa/website/content/doc/0.6.0/walkthrough.md
----------------------------------------------------------------------
diff --git a/website/content/doc/0.6.0/walkthrough.md b/website/content/doc/0.6.0/walkthrough.md
index 437f580..f3c72c1 100644
--- a/website/content/doc/0.6.0/walkthrough.md
+++ b/website/content/doc/0.6.0/walkthrough.md
@@ -18,7 +18,7 @@ There are 2 ways:
 
 If you get the binary release, s4 scripts are immediately available. Otherwise you must build the project:
 
-* Compile and install S4 in the local maven repository: (you can also let the tests run without the -DskipTests option)
+* Compile and install S4 in the local maven repository: (you can also let the tests run without the `-DskipTests` option)
 
 		S4:incubator-s4$ ./gradlew install -DskipTests
 		.... verbose logs ...
@@ -56,7 +56,7 @@ S4 provides some scripts in order to simplify development and testing of applica
 
 The src/main/java/hello directory contains 3 files:
 
-* HelloPE.java : a very simple PE that simply prints the name contained in incoming events
+* `HelloPE.java` : a very simple PE that simply prints the name contained in incoming events
 
 ~~~
 
@@ -64,50 +64,51 @@ The src/main/java/hello directory contains 3 files:
 
 // ProcessingElement provides integration with the S4 platform
 public class HelloPE extends ProcessingElement {
- // you should define downstream streams here and inject them in the app definition
-
- // PEs can maintain some state
- boolean seen = false;
-
- // This method is called upon a new Event on an incoming stream.
- // You may overload it for handling instances of your own specialized subclasses of Event
- public void onEvent(Event event) {
-     System.out.println("Hello " + (seen ? "again " : "") + event.get("name") + "!");
-     seen = true;
- }
+	// you should define downstream streams here and inject them in the app definition
+	
+	// PEs can maintain some state
+	boolean seen = false;
+	
+	// This method is called upon a new Event on an incoming stream.
+	// You may overload it for handling instances of your own specialized subclasses of Event
+	public void onEvent(Event event) {
+	    System.out.println("Hello " + (seen ? "again " : "") + event.get("name") + "!");
+	    seen = true;
+	}
 // skipped remaining methods
 ~~~
 
 * HelloApp.java: defines a simple application: exposes an input stream ("names"), connected to the HelloPE. See [the event dispatch configuration page](event_dispatch) for more information about how events are dispatched.
-	// App parent class provides integration with the S4 platform
-	public class HelloApp extends App {
-	
+		
 ~~~
 
 #!java
+
+// App parent class provides integration with the S4 platform
+public class HelloApp extends App {
 		
-@Override
-protected void onStart() {
-}
-
-@Override
-protected void onInit() {
-    // That's where we define PEs and streams
-    // create a prototype
-    HelloPE helloPE = createPE(HelloPE.class);
-    // Create a stream that listens to the "lines" stream and passes events to the helloPE instance.
-    createInputStream("names", new KeyFinder<Event>() {
-            // the KeyFinder is used to identify keys
-        @Override
-        public List<String> get(Event event) {
-            return Arrays.asList(new String[] { event.get("name") });
-        }
-    }, helloPE);
-}
+	@Override
+	protected void onStart() {
+	}
+	
+	@Override
+	protected void onInit() {
+	    // That's where we define PEs and streams
+	    // create a prototype
+	    HelloPE helloPE = createPE(HelloPE.class);
+	    // Create a stream that listens to the "names" stream and passes events to the helloPE instance.
+	    createInputStream("names", new KeyFinder<Event>() {
+	            // the KeyFinder is used to identify keys
+	        @Override
+	        public List<String> get(Event event) {
+	            return Arrays.asList(new String[] { event.get("name") });
+	        }
+	    }, helloPE);
+	}
 // skipped remaining methods
 ~~~
 
-* HelloInputAdapter is a simple adapter that reads character lines from a socket, converts them into events, and sends the events to interested S4 apps, through the "names" stream
+* `HelloInputAdapter` is a simple adapter that reads character lines from a socket, converts them into events, and sends the events to interested S4 apps, through the "names" stream
 
 ## Run the sample app
 
@@ -121,7 +122,7 @@ In order to run an S4 application, you need :
 
 * In 2 steps:
 
-	1. Start a Zookeeper server instance (-clean option removes previous ZooKeeper data, if any):
+	1. Start a Zookeeper server instance (`-clean` option removes previous ZooKeeper data, if any):
 
 
 	
@@ -254,75 +255,7 @@ The following figures illustrate the various steps we have taken. The local file
 ----
 
 
-# Run the Twitter trending example
-
-Let's have a look at another application, that computes trendy Twitter topics by listening to the spritzer stream from the Twitter API. This application was adapted from a previous example in S4 0.3.
-
-## Overview
-
-This application is divided into:
-
-* twitter-counter , in test-apps/twitter-counter/ : extracts topics from tweets and maintains a count of the most popular ones, periodically dumped to disk
-* twitter-adapter, in test-apps/twitter-adapter/ : listens to the feed from Twitter, converts status text into S4 events, and passes them to the "RawStatus" stream
-
-Have a look at the code in these directories. You'll note that:
-
-* the build.gradle file must be tailored to include new dependencies (twitter4j libs in twitter-adapter)
-* events are partitioned through various keys
-
-## Run it!
-
-> Note: You need a twitter4j.properties file in your home directory with the following content (debug is optional):
-
-	debug=true
-	user=<a twitter username>
-	password=<matching password>
-
-* Start a Zookeeper instance. From the S4 base directory, do:
-	
-		./s4 zkServer
-
-* Define 2 clusters : 1 for deploying the twitter-counter app, and 1 for the adapter app
-
-		./s4 newCluster -c=cluster1 -nbTasks=2 -flp=12000; ./s4 newCluster -c=cluster2 -nbTasks=1 -flp=13000
-		
-* Start 2 app nodes (you may want to start each node in a separate console) :
-
-		./s4 node -c=cluster1
-		./s4 node -c=cluster1
-
-* Start 1 node for the adapter app:
-
-		./s4 node -c=cluster2 -p=s4.adapter.output.stream=RawStatus
-		
-* Deploy twitter-counter app (you may also first build the s4r then publish it, as described in the previous section)
-
-		./s4 deploy -appName=twitter-counter -c=cluster1 -b=`pwd`/test-apps/twitter-counter/build.gradle
-		
-* Deploy twitter-adapter app. In this example, we don't directly specify the app class of the adapter, we use the deployment approach for apps (remember, the adapter is also an app).
-
-		./s4 deploy -appName=twitter-adapter -c=cluster2 -b=`pwd`/test-apps/twitter-adapter/build.gradle
-		
-* Observe the current 10 most popular topics in file TopNTopics.txt. The file gets updated at regular intervals, and only outputs topics with a minimum of 10 occurrences, so you may have to wait a little before the file is updated :
-
-		tail -f TopNTopics.txt
-		
-* You may also check the status of the S4 node with:
-
-		./s4 status
-
-----
 
 # What next?
 
-You have now seen some basics applications, and you know how to run them, and how to get events into the system. You may now try to code your own apps with your own data.
-
-[This page](../application_dependencies) will help for specifying your own dependencies.
-
-There are more parameters available for the scripts (typing the name of the task will list the options). In particular, if you want distributed deployments, you'll need to pass the Zookeeper connection strings when you start the nodes.
-
-You may also customize the communication and the core layers of S4 by tweaking configuration files and modules.
-
-Last, the [javadoc](http://people.apache.org/~mmorel/apache-s4-0.6.0-incubating-doc/javadoc/) will help you when writing applications.
-
-We hope this will help you start rapidly, and remember: we're happy to help!
\ No newline at end of file
+We suggest you take a look at a more comprehensive [example application](../twitter_trending_example).
\ No newline at end of file
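In HelloApp, the KeyFinder extracts the `name` field of each event, and S4 keeps one HelloPE instance per key value, created from the prototype; this is why `seen` is tracked per name rather than globally. The Java sketch below models that behavior without the S4 API. `KeyedDispatchSketch`, `HelloInstance` and the map-based events are hypothetical stand-ins, and `partitionFor` assumes a simple hash-modulo assignment of keys to tasks, which may differ from the hashing S4 actually uses.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical, simplified model of keyed dispatch; it does NOT use the S4 API.
class KeyedDispatchSketch {
    // Stand-in for HelloPE: each instance keeps its own "seen" flag.
    static class HelloInstance {
        private boolean seen = false;

        String onEvent(Map<String, String> event) {
            String msg = "Hello " + (seen ? "again " : "") + event.get("name") + "!";
            seen = true;
            return msg;
        }
    }

    // Stand-in for a KeyFinder: extracts the key from an event.
    private final Function<Map<String, String>, String> keyFinder;
    // Number of tasks (partitions), as in "./s4 newCluster -nbTasks=...".
    private final int nbTasks;
    // One instance per key value, created on the first event carrying that key.
    private final Map<String, HelloInstance> instances = new HashMap<>();

    KeyedDispatchSketch(Function<Map<String, String>, String> keyFinder, int nbTasks) {
        this.keyFinder = keyFinder;
        this.nbTasks = nbTasks;
    }

    // Assumption: a key is mapped to a task by a non-negative hash modulo nbTasks.
    int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), nbTasks);
    }

    // Routes an event to the instance for its key and returns that instance's output.
    String dispatch(Map<String, String> event) {
        String key = keyFinder.apply(event);
        HelloInstance pe = instances.computeIfAbsent(key, k -> new HelloInstance());
        return pe.onEvent(event);
    }
}
```

Sending events named Alice, Bob, then Alice again yields "Hello Alice!", "Hello Bob!" and "Hello again Alice!": the state lives in the per-key instance, and the same key always maps to the same task.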

http://git-wip-us.apache.org/repos/asf/incubator-s4/blob/cb3c58aa/website/content/style/pygmentize.scss
----------------------------------------------------------------------
diff --git a/website/content/style/pygmentize.scss b/website/content/style/pygmentize.scss
index c60e12b..bf04b2f 100644
--- a/website/content/style/pygmentize.scss
+++ b/website/content/style/pygmentize.scss
@@ -63,9 +63,9 @@
 
 pre {
     counter-reset: line-numbering;
-    border: solid 3px #d9d9d9;
+/*     border: solid 3px #d9d9d9; */
     border-radius: 5px;
-    background: #fff;
+/*     background: #fff; */
     padding: 5px;
     line-height: 23px;
     margin-bottom: 30px;
@@ -74,3 +74,12 @@ pre {
     word-break: inherit;
     word-wrap: inherit;
 }
+
+code {
+    border-radius: 5px;
+    padding: 2px;
+    margin-bottom: 30px;
+    white-space: pre-wrap;
+    word-break: inherit;
+    word-wrap: inherit;
+}

http://git-wip-us.apache.org/repos/asf/incubator-s4/blob/cb3c58aa/website/content/style/style.scss
----------------------------------------------------------------------
diff --git a/website/content/style/style.scss b/website/content/style/style.scss
index e2b3649..1eddc8f 100644
--- a/website/content/style/style.scss
+++ b/website/content/style/style.scss
@@ -7,6 +7,7 @@ $sec_header: #7E2217;
 $grad_start: #fafafa;
 $grad_end: #dfdfdf;
 $dark_bg: #dfdfdf;
+$light_bg: #EFEFEF;
 $font-color: #7E2217;
 
 body {
@@ -14,7 +15,10 @@ body {
     font-size:		90%;
     background-color: $bg_body;
     color: #333;	
-	
+    code, pre {
+        background-color: $light_bg;
+        color: black;
+    }
 }
 
 a {

