Research Review: Utilizing Website Structure, Content and Ontologies for Web Usage Mining Preprocessing
1500 until 1630
Meeting Room 7th Floor
Mohammad Hani Nayel Al-Majali
Assoc. Prof. Dr. Cheah Yu-N
Web Usage Mining (WUM) is the process of applying data mining techniques to user accesses to web servers in order to discover useful patterns of users’ behavior and enhance the web surfing experience. WUM has long been established as significant in today’s social-web world, in which a huge amount of content is added daily to the World Wide Web. Users’ behavior on the internet now shapes the structure of the web corpus, and the accurate retrieval of related content is becoming a troubling issue for search engines and website engineers. Within the full WUM process, a large gap exists in investigating the tasks involved in the data preprocessing phase. Usage mining is replete with preprocessing techniques, yet most of them still process log files based only on the limited content of the logs themselves. The need to understand proper associations in users’ browsing behavior has driven intensive research in the field of semantic-based web mining. Most existing WUM preprocessing techniques produce only “user sessions”. This study investigates current state-of-the-art WUM preprocessing techniques to highlight the key concepts behind advanced techniques, and presents their advantages in an enhanced log file preprocessing model able to produce enriched output in the form of “user episodes”. This research aims to include ontologies as an interpretation layer within the WUM system processes to enhance preprocessing outcomes, and to consider the content of web documents for episode identification and the web structure for better-quality output episodes. The proposed model consists of four major tasks to mine a website’s access logs. The first is structure data preprocessing, which crawls the website and represents its structure as a matrix using Petri net maps. Reachability paths between webpages are discovered, and pageviews identified, through the resulting matrix.
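The idea of deriving reachability from a crawled link matrix can be illustrated with a minimal Python sketch. It assumes the crawler has already produced a list of hyperlinks between pages, stores them in a plain adjacency matrix (a simplification standing in for the Petri-net-based matrix described above, not a reproduction of it), and computes reachability with Warshall's transitive-closure algorithm. The page names and the build_matrix and reachability helpers are hypothetical.

```python
from typing import Dict, List, Tuple

def build_matrix(pages: List[str], links: List[Tuple[str, str]]) -> List[List[int]]:
    """Adjacency matrix: m[i][j] = 1 if page i links directly to page j."""
    index: Dict[str, int] = {p: i for i, p in enumerate(pages)}
    n = len(pages)
    m = [[0] * n for _ in range(n)]
    for src, dst in links:
        m[index[src]][index[dst]] = 1
    return m

def reachability(m: List[List[int]]) -> List[List[int]]:
    """Warshall's algorithm: r[i][j] = 1 if page j can be reached
    from page i by following one or more links."""
    n = len(m)
    r = [row[:] for row in m]  # copy so the input matrix is not modified
    for k in range(n):
        for i in range(n):
            if r[i][k]:
                for j in range(n):
                    if r[k][j]:
                        r[i][j] = 1
    return r

# Hypothetical crawled site: home -> products -> item1, home -> contact.
pages = ["home", "products", "item1", "contact"]
links = [("home", "products"), ("products", "item1"), ("home", "contact")]
r = reachability(build_matrix(pages, links))
print(r[0][2])  # 1: item1 is reachable from home via products
print(r[3][0])  # 0: contact has no outgoing links
```

In a usage log, a transition from page i to page j where m[i][j] = 0 but r[i][j] = 1 suggests that intermediate pages were skipped (for example, served from a browser cache), which is the kind of gap that reachability information can help detect.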
The second is content data preprocessing, which scrapes the documents’ content and associates it with a domain-specific ontology to find relatedness between documents. Ontology concepts are extended with labels representing hypernyms, hyponyms and synonyms from a lexical thesaurus, and with semantically related concepts from within the ontology. The third task prepares the log through fusion, cleaning and user-session identification, and then identifies user episodes by combining all the previously preprocessed data. The last task integrates all preprocessed data for use in the other WUM phases. The proposed model was tested against a real-life dataset. The results are promising, confirming that the developed model successfully extracted user episodes in a mineable form. Future work should include integrating the proposed model into other WUM tools. These techniques also require further investigation in terms of incorporating documents’ semantics into different mining categories. In addition, the proposed model could be extended to support episode discovery based on the purposes behind the mining process.
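The log-preparation and episode steps can be sketched in Python under several assumptions that are not taken from the study: log records are already parsed into (IP, timestamp, URL, status) tuples; cleaning keeps only successful requests for non-static resources; sessions are split with the widely used 30-minute inactivity timeout; and an episode is approximated as a maximal run of consecutive pages mapped to the same ontology concept. The PAGE_CONCEPT table and all function names are hypothetical, and the study's model combines structure and content information in a richer way than this.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (client IP, Unix timestamp in seconds, URL, HTTP status); assumed pre-parsed format
Record = Tuple[str, int, str, int]

STATIC_SUFFIXES = (".css", ".js", ".png", ".jpg", ".gif", ".ico")
SESSION_TIMEOUT = 30 * 60  # seconds of inactivity; a common heuristic, assumed here

# Hypothetical page-to-concept table standing in for the ontology mapping.
PAGE_CONCEPT: Dict[str, str] = {
    "/laptops": "Computer", "/laptops/x1": "Computer",
    "/phones": "Phone", "/phones/p9": "Phone",
}

def clean(log: List[Record]) -> List[Record]:
    """Keep successful page requests; drop errors and static resources."""
    return [r for r in log if r[3] == 200 and not r[2].endswith(STATIC_SUFFIXES)]

def sessions(log: List[Record]) -> List[List[str]]:
    """Group requests by IP in time order; start a new session after a long gap."""
    by_ip: Dict[str, List[Record]] = defaultdict(list)
    for r in sorted(log, key=lambda r: (r[0], r[1])):
        by_ip[r[0]].append(r)
    result: List[List[str]] = []
    for reqs in by_ip.values():
        current = [reqs[0][2]]
        for prev, cur in zip(reqs, reqs[1:]):
            if cur[1] - prev[1] > SESSION_TIMEOUT:
                result.append(current)
                current = []
            current.append(cur[2])
        result.append(current)
    return result

def episodes(session: List[str]) -> List[List[str]]:
    """Split a session into maximal runs of consecutive pages sharing a concept."""
    result: List[List[str]] = []
    last: Optional[str] = None
    for url in session:
        concept = PAGE_CONCEPT.get(url, "Unknown")
        if concept != last:
            result.append([])
            last = concept
        result[-1].append(url)
    return result

log: List[Record] = [
    ("1.2.3.4", 0, "/laptops", 200),
    ("1.2.3.4", 5, "/style.css", 200),      # static resource: removed by clean()
    ("1.2.3.4", 60, "/laptops/x1", 200),
    ("1.2.3.4", 120, "/phones", 200),
    ("1.2.3.4", 4000, "/phones/p9", 200),   # more than 30 min later: new session
    ("5.6.7.8", 10, "/missing", 404),       # failed request: removed by clean()
]
for s in sessions(clean(log)):
    print(episodes(s))
# [['/laptops', '/laptops/x1'], ['/phones']]
# [['/phones/p9']]
```

A timeout-based split is only one of several session heuristics; the episode step here relies purely on the concept labels, whereas the abstract also uses web structure to improve episode quality.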
Research Seminar Series: Multiple, Object-oriented Segmentation Methods of Mammalian Cell Tomograms
1000 until 1100
Meeting Room 7th Floor
Nur Intan Raihana Ruhaiyem
Electron tomography (ET) is a powerful tool for quantitatively mapping the complex 3D sub-cellular structures of cells. Highly accurate segmentation results are of great value to cell biologists: they facilitate comparisons across statistically significant datasets of properties or structural information, such as the number of granules, mitochondrial size/volume and the size/number of cisternae of the Golgi apparatus. The value of these data is significantly improved if the cellular compartments (e.g. organelles, particles) are accurately segmented and annotated. Manual segmentation is reasonably accurate, but the process can be too slow, and its accuracy is highly dependent on the training of the person conducting the task. Automated segmentation therefore opens a number of opportunities. However, automated methods must be fast and capable of accurately delineating all contours of interest, ideally at the organelle and molecular level, and many existing methods have reportedly not been successful on ET datasets. Semi-automated approaches, by contrast, have maintained the quality and accuracy of cellular membrane tracing while improving segmentation time. These reasons motivated the development of a pipeline, the semi-automated cellular tomogram segmentation (CTS) workflow, that finds the best settings of a chosen combination of methods for high-resolution tomogram segmentation, specific to the intrinsic properties of the image volume being processed. The study also introduces a set of tools, namely an image categorisation technique and sets of scoring objectives for different organelle sub-groups, that for the first time allow organelles of interest with similar image properties to be segmented as sub-groups. These tools enable timely segmentation of sub-cellular compartments and expedite the optimisation of method settings in a way not currently afforded by any other technique.
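The idea of searching for the best settings of a segmentation method against a scoring objective can be sketched as follows. This is not the CTS workflow itself: it assumes a single global-threshold method as a stand-in for the combination of methods, a tiny synthetic 2D slice instead of a tomogram, and the Dice coefficient against a reference mask as the scoring objective. All names and values are hypothetical.

```python
from typing import List

Image = List[List[float]]
Mask = List[List[int]]

def threshold(img: Image, t: float) -> Mask:
    """Label a pixel as foreground if its intensity exceeds t."""
    return [[1 if v > t else 0 for v in row] for row in img]

def dice(a: Mask, b: Mask) -> float:
    """Dice overlap 2|A and B| / (|A| + |B|); 1.0 means identical masks."""
    inter = sum(x & y for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    total = sum(map(sum, a)) + sum(map(sum, b))
    return 1.0 if total == 0 else 2.0 * inter / total

def best_setting(img: Image, reference: Mask, candidates: List[float]) -> float:
    """Exhaustive search: return the candidate threshold with the highest Dice score."""
    return max(candidates, key=lambda t: dice(threshold(img, t), reference))

# Synthetic slice: a bright "organelle" in the centre on a noisy background.
img: Image = [
    [0.1, 0.2, 0.1, 0.3],
    [0.2, 0.8, 0.9, 0.2],
    [0.1, 0.7, 0.8, 0.4],
    [0.3, 0.1, 0.2, 0.1],
]
reference: Mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
t = best_setting(img, reference, [0.1, 0.3, 0.5, 0.7])
print(t, dice(threshold(img, t), reference))  # 0.5 1.0
```

In a real pipeline the reference would presumably come from a small manually segmented subset, the search space would span several methods and their parameters, and per-sub-group scoring objectives, as the study proposes, would replace the single Dice score used here.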
Research Review: Enhanced Reinforcement Learning Models and Their Application in Brain Fiber Tracking Problem
0900 until 1030
Meeting Room 7th Floor
Prof. Dr. Mandava Rajeswari
Incremental temporal difference (TD) learning models offer powerful techniques for “value estimation” in sequential selection tasks in machine learning, adaptive control, decision support systems and industrial/autonomous robotics. Because these models are based on gradient-descent (GD) learning, they have limitations, particularly in their step-size settings, which make them sensitive to the “type of observations” and to “parameter settings”. These limitations are more pronounced in on-line applications, where the models are expected to be “adaptive” under non-stationary observations. In other words, inaccurate settings of the sensitive parameters, as well as changes in the characteristics of the observations, may degrade the “convergence quality” of these gradient TD algorithms. These issues indicate a gap: the existing TD models are not adaptive, because incremental TD algorithms in reinforcement learning (RL) suffer from a “parameter dependency” problem. Consequently, a set of enhanced models that eliminates or minimizes this parameter dependency is desirable.
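To make the step-size dependency concrete, here is a minimal linear TD(0) value estimator with a fixed step size alpha. It runs on a hypothetical deterministic chain (state 0 to state 1 to terminal, reward 1 on the last transition, gamma = 0.9, one-hot features), which is an illustrative assumption and not the study's benchmark. With the same training budget, the final estimation error depends strongly on the chosen alpha.

```python
from typing import List, Tuple

GAMMA = 0.9
# One deterministic episode as (features of s, reward, features of s'); terminal features are zeros.
EPISODE: List[Tuple[List[float], float, List[float]]] = [
    ([1.0, 0.0], 0.0, [0.0, 1.0]),  # state 0 -> state 1, reward 0
    ([0.0, 1.0], 1.0, [0.0, 0.0]),  # state 1 -> terminal, reward 1
]
TRUE_V = [0.9, 1.0]  # V(1) = 1 and V(0) = 0 + GAMMA * V(1)

def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def td0_fixed(alpha: float, episodes: int) -> List[float]:
    """Linear TD(0): theta <- theta + alpha * delta * phi with a constant step size."""
    theta = [0.0, 0.0]
    for _ in range(episodes):
        for phi, r, phi_next in EPISODE:
            delta = r + GAMMA * dot(theta, phi_next) - dot(theta, phi)  # TD error
            theta = [w + alpha * delta * f for w, f in zip(theta, phi)]
    return theta

def error(theta: List[float]) -> float:
    """Largest absolute deviation from the true state values."""
    return max(abs(t - v) for t, v in zip(theta, TRUE_V))

for alpha in (0.01, 0.1, 0.5):
    print(alpha, round(error(td0_fixed(alpha, episodes=20)), 4))
```

On this noiseless one-hot chain any alpha in (0, 2) eventually converges, so the difference here is only speed; with stochastic rewards a large alpha also amplifies noise, which is the kind of trade-off that makes the step size a sensitive parameter.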
For this purpose, a new class of TD learning models in RL is presented. These models are governed by the steepest descent (SD) optimization approach, and the major focus is on computing the step size optimally in incremental TD learning. Experimental results indicate that the proposed models “converge faster” than existing similar models, by about 40-70% depending on the model, while maintaining the “original linear complexity”. Besides, they “do not depend on” their step-size and initial parameter settings. This indicates that the presented models are adaptive and may fill the gap.
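The abstract does not give the SD update rules, so the following is only an illustrative assumption of what a computed, rather than tuned, step size can look like. Moving theta by alpha along the TD direction delta * phi changes the current sample's TD error to delta * (1 - alpha * phi.(phi - gamma * phi')); choosing alpha to minimise the square of that error, an exact line search along the update direction, gives alpha = 1 / (phi.(phi - gamma * phi')). The sketch applies this per-sample rule, falling back to a small fixed step (an assumed value) when the denominator is not positive, on a hypothetical deterministic chain.

```python
from typing import List, Tuple

GAMMA = 0.9
FALLBACK_ALPHA = 0.05  # assumed; used only when the line-search denominator is not positive
# One deterministic episode as (features of s, reward, features of s'); terminal features are zeros.
EPISODE: List[Tuple[List[float], float, List[float]]] = [
    ([1.0, 0.0], 0.0, [0.0, 1.0]),  # state 0 -> state 1, reward 0
    ([0.0, 1.0], 1.0, [0.0, 0.0]),  # state 1 -> terminal, reward 1
]

def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def line_search_alpha(phi: List[float], phi_next: List[float]) -> float:
    """Step size that drives this sample's TD error to zero after the update."""
    denom = dot(phi, [f - GAMMA * g for f, g in zip(phi, phi_next)])
    return 1.0 / denom if denom > 1e-12 else FALLBACK_ALPHA

def td0_line_search(episodes: int) -> List[float]:
    """Linear TD(0) where alpha is recomputed for every sample instead of tuned."""
    theta = [0.0, 0.0]
    for _ in range(episodes):
        for phi, r, phi_next in EPISODE:
            delta = r + GAMMA * dot(theta, phi_next) - dot(theta, phi)  # TD error
            alpha = line_search_alpha(phi, phi_next)
            theta = [w + alpha * delta * f for w, f in zip(theta, phi)]
    return theta

print(td0_line_search(episodes=2))  # [0.9, 1.0]
```

On this noiseless chain the rule reaches the exact values after two episodes with no step size to tune. With noisy rewards, a rule that fully cancels every sample's TD error would chase the noise, so practical adaptive step-size methods are more conservative; this sketch makes no claim about how the study's SD models handle that.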
The second purpose of this study is to present an RL-based brain white matter fiber tracking model. Fiber tracking is a “sequential selection task” under “uncertainty”, which is exactly the kind of problem the RL approach targets. Tractography processing plays an important role in pre/post brain surgery studies and traumatic brain injury assessments. It still suffers from “long processing time”, while streamline models, despite their good response time, fail to precisely reveal brain fiber profiles in areas of uncertainty. These reasons motivated us to apply RL approaches to the brain fiber tracking problem. Experimental results on both artificial and real datasets indicated considerable improvement in fiber tracking, especially in tissues that were challenging for other fiber tracking techniques.
Research Seminar Series: I Feel You: State-of-the-art in Emotion Modelling
1000 until 1100
Meeting Room 7th Floor
Dr. Syaheerah Lebai Lutfi
It is easy to get frustrated at machines, perhaps because they seem callous. By and large, the quality of human-computer interaction suffers from the inability of computerized agents to recognise and adapt to the user's emotional state (or sometimes, even personality). With the mass appeal of artificially-mediated communication, there is an increasing need for machines to be socially and emotionally intelligent, that is, to infer and adapt to their human interlocutors' emotions on the fly in order to ensure an affective, empathetic and naturalistic interaction. Recent studies have shown that affect-sensitive synthetic agents (be it a robot or a virtual agent) offer great advantages, from reducing user frustration to increasing learners’ engagement. These reasons have motivated the development of artificial agents that include socio-emotional elements, turning them into affective and socially-sensitive interfaces.
This talk will discuss the state of the art in Affective Computing, including efforts towards culture-sensitive affective interfaces. To facilitate listeners’ understanding, a sample from a recent study will be presented.
For more information, please visit www.syaheerah.com