tika-dev mailing list archives

From Mark Meiklejohn <markmeiklej...@yahoo.co.uk.INVALID>
Subject Re: [irds-l] GSOC2016 Sentiment Analysis
Date Fri, 01 Apr 2016 18:24:17 GMT
Hi, 
I would be interested in joining your hangout; please add me. I am interested in your project and
would like to contribute where possible.
I have a PhD from the University of Strathclyde in the area of generating UML models
from natural language requirement specifications by means of NLP and semantic analysis.
mrkmeiklejohn@googlemail.com 

Sent from Yahoo Mail on Android 
 
On Fri, 1 Apr 2016 at 18:03, Christian Alan Mattmann <mattmann@usc.edu> wrote:

Hi, anyone is welcome to join the hangout and discuss:

https://hangouts.google.com/call/jmkibffexvcupkka37gxzw7gime

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California
Los Angeles, CA 90089 USA
Email: mattmann@usc.edu
WWW: http://irds.usc.edu/
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++




-----Original Message-----
From: <irds-l-request@usc.edu> on behalf of jpluser
<chris.a.mattmann@jpl.nasa.gov>
Reply-To: "irds-l@usc.edu" <irds-l@usc.edu>
Date: Tuesday, March 29, 2016 at 8:50 AM
To: "dev@opennlp.apache.org" <dev@opennlp.apache.org>, Mondher Bouazizi
<mondher.bouazizi@gmail.com>, Madhawa Kasun Gunasekara
<madhawa30@gmail.com>
Cc: "dev@tika.apache.org" <dev@tika.apache.org>, Information and Data
Science Group USC List <irds-l@usc.edu>
Subject: Re: [irds-l] GSOC2016 Sentiment Analysis

>Great, that sounds awesome, Anthony. Friday at 10am PT it is. Please
>add chris.mattmann@gmail.com to your GHangout buddy list.
>
>
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Chris Mattmann, Ph.D.
>Chief Architect
>Instrument Software and Science Data Systems Section (398)
>NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>Office: 168-519, Mailstop: 168-527
>Email: chris.a.mattmann@nasa.gov
>WWW:  http://sunset.usc.edu/~mattmann/
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Director, Information Retrieval and Data Science Group (IRDS)
>Adjunct Associate Professor, Computer Science Department
>University of Southern California, Los Angeles, CA 90089 USA
>WWW: http://irds.usc.edu/
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>-----Original Message-----
>From: Anthony Beylerian <anthonybeylerian@hotmail.com>
>Reply-To: "dev@opennlp.apache.org" <dev@opennlp.apache.org>
>Date: Tuesday, March 29, 2016 at 8:32 AM
>To: "dev@opennlp.apache.org" <dev@opennlp.apache.org>, Mondher Bouazizi
><mondher.bouazizi@gmail.com>, Madhawa Kasun Gunasekara
><madhawa30@gmail.com>
>Cc: "dev@tika.apache.org" <dev@tika.apache.org>, Information and Data
>Science Group USC List <irds-L@mymaillists.usc.edu>
>Subject: RE: GSOC2016 Sentiment Analysis
>
>
>
>>Dear Chris,
>>
>>Thank you again for reviewing our proposals.
>>We are looking forward to working together on this.
>>
>>In our previous trials we used an annotated corpus built through
>>CrowdFlower for testing, and would be happy to share it.
>>Although relatively modest and noisy (~10k training, ~8k testing, ~20k
>>pattern extraction), we believe it was enough to demonstrate encouraging
>>performance.
>>On our side, we also have a Java implementation that we would like to
>>shape up for production; however, I'm also comfortable with Python in case
>>we need it.
>>
>>On the other hand, using a cross-lingual corpus sounds intriguing, and we
>>would love to discuss it.
>>As for the hangout session, I have just checked with Mondher and the time
>>works for us.
>>
>>Best,
>>
>>Anthony
>
>>
>
>>
>
>>> From: chris.a.mattmann@jpl.nasa.gov
>>> To: mondher.bouazizi@gmail.com; madhawa30@gmail.com
>>> CC: anthonybeylerian@hotmail.com; dev@opennlp.apache.org;
>>> dev@tika.apache.org; irds-L@mymaillists.usc.edu
>>> Subject: Re: GSOC2016 Sentiment Analysis
>>> Date: Tue, 29 Mar 2016 13:57:11 +0000
>>>
>>> I like both of your comments, Mondher and Madhawa. My team at USC
>>> has been investigating the use of particular corpora, including
>>> Fisher Callhome, to support sentiment analysis. We have been
>>> writing Java code outside of both OpenNLP and Tika but with the
>>> goal of integrating it into both. We have a mix of Java and
>>> Python code that we’d like to bring into both projects.
>>>
>>> I’m reviewing the proposals you wrote now, but would it make sense
>>> to have a Google hangout this Friday, ~10am PT (Los Angeles time)?
>>>
>>> Cheers,
>>> Chris
>
>>> 
>>> -----Original Message-----
>>> From: Mondher Bouazizi <mondher.bouazizi@gmail.com>
>>> Date: Monday, March 28, 2016 at 11:46 PM
>>> To: Madhawa Kasun Gunasekara <madhawa30@gmail.com>, jpluser
>>> <chris.a.mattmann@jpl.nasa.gov>
>>> Cc: Anthony Beylerian <anthonybeylerian@hotmail.com>,
>>> "dev@opennlp.apache.org" <dev@opennlp.apache.org>,
>>> "dev@tika.apache.org" <dev@tika.apache.org>, Information and Data
>>> Science Group USC List <irds-L@mymaillists.usc.edu>
>>> Subject: Re: GSOC2016 Sentiment Analysis
>
>>> 
>
>>> >Dear Madhawa,
>>> >
>>> >Thank you for your interest in the proposals.
>>> >The tasks we currently propose cover classification and
>>> >quantification regardless of the topic.
>>> >This can be used in a larger context where the topic is not specified,
>>> >or not unique, in which case we will need to identify the topic(s).
>>> >Therefore, a topic detector would be a good idea to implement, in
>>> >order to complement this.
>>> >
>>> >As for the Document Categorizer, it is a general-purpose component
>>> >with basic features (n-gram, bag of words, etc.).
>>> >It is basically used for the classification of texts into a set of
>>> >classes defined by the user, whether they are sentiment classes or
>>> >other.
>>> >However, it doesn't perform well for this purpose.
>>> >Furthermore, the sentiment analysis component would not just perform
>>> >the naive classification but also additional tasks (e.g., quantification)
>>> >and implement more specific and sophisticated approaches.
>>> >
>>> >Please share your thoughts.
>>> >
>>> >Mondher
>
>>> >On Tue, Mar 29, 2016 at 1:51 PM, Madhawa Kasun Gunasekara
>>> ><madhawa30@gmail.com> wrote:
>>> >
>>> >Hi Chris / Anthony,
>>> >
>>> >Yes, I would like to work on this. This proposal addresses most of the
>>> >things in sentiment analysis.
>>> >
>>> >AFAIK most people use the OpenNLP Document Categorizer for sentiment
>>> >analysis, since there isn't proper functionality for sentiment
>>> >analysis in OpenNLP. It would be great if we could add this feature to the
>>> >OpenNLP project. I would also like to suggest
>>> >that we should be able to detect the target object of the opinions with
>>> >this feature as well.
>>> >
>>> >WDYT?
>>> >
>>> >Thanks,
>>> >
>>> >Madhawa
>
>>> >On Tue, Mar 29, 2016 at 2:11 AM, Mattmann, Chris A (3980)
>>> ><chris.a.mattmann@jpl.nasa.gov> wrote:
>>> >
>>> >Dear Anthony,
>>> >
>>> >Great! These both sound like fantastic proposals and I’m happy
>>> >to be a mentor. Madhawa, would you like to join in on these
>>> >efforts?
>>> >
>>> >Cheers,
>>> >Chris
>
>>> >
>
>>> >-----Original Message-----
>>> >From: Anthony Beylerian <anthonybeylerian@hotmail.com>
>>> >Date: Monday, March 28, 2016 at 11:48 AM
>>> >To: "dev@opennlp.apache.org" <dev@opennlp.apache.org>,
>>> >"mondher.bouazizi@gmail.com" <mondher.bouazizi@gmail.com>
>>> >Cc: Madhawa Kasun Gunasekara <madhawa30@gmail.com>, jpluser
>>> ><chris.a.mattmann@jpl.nasa.gov>
>>> >Subject: RE: GSOC2016 Sentiment Analysis
>>> >
>>> >>Dear Chris,
>>> >>
>>> >>Thank you for starting the discussion.
>>> >>We are glad there is an interest in a sentiment analysis component.
>>> >>
>>> >>My colleague Mondher posted the two JIRA issues related to Sentiment
>>> >>Analysis [1][2] as references for our proposals [3][4] for GSoC.
>>> >>In fact, we have been researching this topic at our university.
>>> >>We are hoping to participate this year and work on integrating both a
>>> >>sentiment classifier and a quantifier for the library.
>>> >>
>>> >>It would be nice to also have an interface with Tika, maybe we can
>>> >>collaborate?
>>> >>We are also looking for mentors, in case someone is willing to
>>> >>support our proposals.
>>> >>
>>> >>Best,
>>> >>
>>> >>Anthony
>
>>> >>
>
>>> >>[1] https://issues.apache.org/jira/browse/OPENNLP-842
>>> >>[2] https://issues.apache.org/jira/browse/OPENNLP-840
>>> >>[3] https://docs.google.com/document/d/1nVnwpmGaOnwHERXr55IClE4V87jUX2sva-mkgWnR8n0/edit?usp=sharing
>>> >>[4] https://docs.google.com/document/d/1x02II9W3rirtuSbx_sY8kOQZSgOp0SIKeIWTCXEOJvo/edit?usp=sharing
>
>>> >>
>
>>> >>> From: chris.a.mattmann@jpl.nasa.gov
>>> >>> To: nishant.k02@gmail.com
>>> >>> CC: dev@opennlp.apache.org; madhawa30@gmail.com; hmanjuna@usc.edu;
>>> >>> kamalaku@usc.edu
>>> >>> Subject: Re: GSOC2016 Sentiment Analysis
>>> >>> Date: Sun, 27 Mar 2016 19:34:24 +0000
>>> >>>
>>> >>> No problem - I just wanted to encourage discussion. Thank you for
>>> >>> your prompt and courteous replies.
>
>>> >>>

  
