Hi,

Please allow me to shed light on this.

*A few preliminaries:*

*1)* A *connect feed* statement (connect feed x to dataset y) is rewritten as per the following template:

    for $x in feed-collect(params...)
    return $x

    OR

    for $x in feed-collect(params...)
    return f($x)   // where f() represents an AQL/Java function that needs to be applied to each record prior to persistence

The connect feed statement can be regarded as "syntactic sugar". The actual insert statement, as compiled, produces an ingestion pipeline with all of the index insert operators (including those for secondary indexes).

*2)* When building the flow of data from an external source to the target dataset, one has two options:
a) use the feed adaptor to retrieve records from the external source, or
b) use an existing active feed to gain access to the records already flowing within the AsterixDB system as part of an ongoing Hyracks job, and further process them to redirect them into separate target indexes.

*3)* The end user is not expected to know how to write an optimal insert statement or to figure out the best way to produce the records that define a feed. The end user is only exposed to the simplistic connect feed statement. Read on...

A connect feed statement is analyzed to ascertain whether any existing flow of records across a parent feed could be used. This requires a lookup into the in-memory data structure maintained by the FeedLifecycleListener (a thread on the CC). If a parent feed is present, then the goal is to *subscribe* to it rather than rebuild the flow from the external source via another channel. Depending on which ancestor of the given feed is active, additional pre-processing may be required; here I am referring to all the UDFs associated with the parent feed(s) up to the active ancestor. Information on these UDFs is obtained from a lookup of the Metadata.

Once the best way to build the ingestion pipeline for a given connect feed statement has been determined, the request to receive data from the ancestor feed, possibly apply a sequence of UDFs, and direct the output to a target dataset is expressed as a *subscription* request, that is, a *SubscribeFeedStatement*. This statement is not exposed to the end user; it doesn't even have a syntax, but it contains all the information required to build the AQL as per the template described in (1) from the list of preliminaries above. The resulting AQL has the right parameters for the feed-collect internal function. These parameters capture the parent feed and the specific locations where its operators are running, so that the pipeline for the feed being constructed can be correctly located/scheduled on the cluster and data may subsequently flow in different directions along multiple pipelines in a concurrent manner.

The SubscribeFeedStatement is thus an internal statement that builds the right AQL counterpart of the simplistic, vanilla connect feed statement. It can be regarded as an intermediate representation of a connect feed statement; note that neither the connect feed statement nor the SubscribeFeedStatement is understood by the compiler. It is the AQL translation of the SubscribeFeedStatement, which is actually an insert statement (refer to the template from preliminary (1)), that is understood by the compiler and produces the right DAG, with the right set of index insert operators downstream and the right locations for the intake operators upstream, to receive the feed records or subscribe to the records flowing in another pipeline.
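To make this concrete, here is a rough sketch of the end-to-end rewrite (the feed and dataset names are made up for illustration, and the feed-collect parameters are left elided exactly as in the template above, since they are filled in internally from the subscription request):

    // what the end user writes
    connect feed TwitterFeed to dataset Tweets;

    // roughly the AQL that the SubscribeFeedStatement is translated into
    // and handed back to the compiler -- an ordinary insert statement
    insert into dataset Tweets (
      for $x in feed-collect(params...)
      return $x   // or return f($x) if a UDF is attached to the feed
    );

From that point on the compiler treats it like any other insert statement and produces the ingestion DAG, with the index insert operators downstream and the intake/collect operators placed at the locations encoded in the feed-collect parameters.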
Details on how the statement rewriting and its translation into AQL are done are described in further detail in my thesis.

I hope I have answered the questions as to why the SubscribeFeedStatement is not exposed to the end user, why it requires a Metadata lookup, and why the original connect feed statement is handed to the compiler again (in the form of an (insert) AQL statement). In case I did not clarify certain aspects, which are also not elaborated enough in the thesis, please ping me. I shall do my best to respond and address the concerns at the earliest.

Regards,
Raman

On Wed, Sep 30, 2015 at 3:34 AM, Till Westmann wrote:
> Yes, the parser should just care about syntax.
> Semantic checks should be done in the translator or later.
>
> Cheers,
> Till
>
> On 29 Sep 2015, at 14:59, Yingyi Bu wrote:
>
>> In ConnectedFeedStatement, a similar piece of code has been commented out.
>> IMO, the AQL parser should just get an AST from a query, but not access the
>> metadata nor do any real work..
>>
>> Best,
>> Yingyi
>>
>> On Tue, Sep 29, 2015 at 2:46 PM, Ian Maxon wrote:
>>
>>> I always wondered where that plan's input came from in the CC logs. It
>>> gets generated during a connect statement as well.
>>>
>>> On Tue, Sep 29, 2015 at 2:04 PM, Mike Carey wrote:
>>>
>>>> I wasn't aware of that statement...!
>>>> On Sep 29, 2015 12:17 PM, "Yingyi Bu" wrote:
>>>>
>>>>> All right, I will open an issue for that.
>>>>> Thanks!
>>>>>
>>>>> Best,
>>>>> Yingyi
>>>>>
>>>>> On Tue, Sep 29, 2015 at 12:11 PM, abdullah alamoudi <bamousaa@gmail.com> wrote:
>>>>>
>>>>>> I am not aware of any special reason and it definitely looks a bit too
>>>>>> hackish to me.
>>>>>> I would say that it needs to be fixed but I don't think it is a priority
>>>>>> at this point. Anyway, it is a private command that is not exposed to
>>>>>> the end user.
>>>>>>
>>>>>> I would like to know if there is a reason as well.
>>>>>> ~Abdullah.
>>>>>>
>>>>>> Amoudi, Abdullah.
>>>>>>
>>>>>> On Tue, Sep 29, 2015 at 9:53 PM, Yingyi Bu wrote:
>>>>>>
>>>>>>> Does anyone know why SubscribeFeedStatement in asterix-aql needs to
>>>>>>> access the MetadataManager to form yet-another AQL insert query
>>>>>>> inside it and hand that to the AQLParser again?
>>>>>>>
>>>>>>> It seems a bit hackish to me. Is there a particular reason that it
>>>>>>> must be done this way?
>>>>>>> Thanks!
>>>>>>>
>>>>>>> Best,
>>>>>>> Yingyi

--
Raman