Proposal Review: Towards Effective Retrieval of Software Design Specification
1530 until 1630
Meeting Room 7th Floor
Dr. Wan Mohd Nazmee Wan Zainon
Dr. Hamza Onornoiza Salami
Software reuse is the use of existing software artifacts to build a new system rather than building it from scratch. It has the potential to improve software quality, make effective use of specialists, reduce process risk and reduce the overall cost of software development. Despite its clear benefits, it suffers from a few problems. These include lack of tool support, increased maintenance cost, the not-invented-here syndrome, the difficulty of maintaining a reusable library, and the cost of retrieving and adapting reusable software artifacts into new projects.
A step towards solving these problems is the reuse of early-stage software artifacts, because once a match is found at an early stage, all related later-stage artifacts can also be reused. Early-stage software artifacts are domain models, requirement specifications and software designs, while later-stage artifacts include source code, test cases and software documentation. Typically these artifacts are stored in a component library or repository. As this repository increases in size, there is a corresponding rise in retrieval time, which can lessen or even outweigh the expected savings in development time, one of the anticipated gains of software reuse.
This research intends to propose a fast way of identifying a subset of repository artifacts that are potentially similar to the user query prior to the retrieval stage. The shortlisted repository artifacts are then compared with the user query in a subsequent, computationally demanding retrieval stage to find their actual degree of similarity with the user query. This technique will lead to a significant reduction in retrieval time, especially when the retrieval stage is time consuming.
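The two-stage idea described above (a cheap shortlist followed by a costly comparison) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed technique: the artifacts are plain strings, the fast filter is a Jaccard overlap of word sets, and the standard-library `SequenceMatcher` merely stands in for the expensive retrieval stage. The names `signature`, `expensive_similarity`, `retrieve` and `shortlist_size` are hypothetical.

```python
from difflib import SequenceMatcher


def signature(text):
    """Cheap signature of an artifact: its set of lowercase word tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard overlap of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def expensive_similarity(query, artifact):
    """Assumed stand-in for the computationally demanding comparison."""
    return SequenceMatcher(None, query.lower(), artifact.lower()).ratio()


def retrieve(query, repository, shortlist_size=2):
    """Return (name, score) pairs for the shortlisted artifacts, best first."""
    q_sig = signature(query)
    # Stage 1: rank every artifact with the cheap filter and keep a shortlist.
    ranked = sorted(
        repository,
        key=lambda name: jaccard(q_sig, signature(repository[name])),
        reverse=True,
    )
    shortlist = ranked[:shortlist_size]
    # Stage 2: run the costly comparison on the shortlist only.
    scored = [(name, expensive_similarity(query, repository[name]))
              for name in shortlist]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

With a repository of N artifacts, the costly comparison runs only `shortlist_size` times instead of N times, which is where any saving in retrieval time would come from.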
Proposal Review: Impact of Technological, Organizational and Environmental Factors on Small and Medium Enterprises to Intention to Adopt e-Commerce in Jordan with the Moderating Role of Organizational Competency and External ICT Support
Research Review: An Approach for Automatic Topic Detection and Recognition Using Term Frequency-Inverse Document Frequency and K-Nearest Neighbour Algorithms
1000 until 1100
Meeting Room 7th Floor
Ammar Ismael Kadhim
Assoc. Prof. Dr. Cheah Yu-N
Dr. Nurul Hashimah Ahamed Hassain Malim
Topic detection and recognition are of great significance to social network users because of their vital role in user trend analysis. Moreover, every new wave of outbreak reveals rapid evolution in terms of sophistication, detection, speed, and damage through the search process used to detect the various topics. Unfortunately, current topic detection research has not seen the same pace of advancement. Most topic detection approaches are unable to deal intelligently with different topics such as Politics, Education, Health, Marketing, Music, News & Media, Recreation & Sports, Computers & Technology, Pets, Food, Family and others.
In this study, an understanding of topic detection and of topic contents is established. The main objective of the study is to detect and recognize topics and to classify them, based on their contents, into one or more predefined classes using five stages. Several algorithms are applied and implemented for each stage.
The first stage of topic detection is to prepare the text documents by removing non-informative features; the second stage is statistical language modeling, which involves word co-occurrence, a statistical bag of words and mutual information; the third stage is dimensionality reduction, which involves feature extraction using Boolean and TF-IDF (term frequency-inverse document frequency) weighting methods and feature selection using singular value decomposition (SVD) and the cosine similarity score; finally, two different machine learning methods are used: unsupervised machine learning groups the data into one or more clusters using the k-means algorithm, and supervised machine learning classifies the documents into different topics using the k-nearest neighbors (KNN) algorithm. Four datasets with varying numbers of documents were used in this study. The sets comprise the Reuters-21578 text categorization test collection, BBC News, BBC Sport, and data collected on Twitter using the application programming interface (API). The experimental results show that the accuracy for BBC News and BBC Sport is 94.67 and 95.00 respectively using k-means, while the accuracy is approximately 96.2 using KNN for the same dataset. These results indicate that supervised machine learning offers higher efficiency and accuracy in detecting topics and their contents.
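As a rough illustration of the TF-IDF weighting, cosine similarity and k-nearest-neighbors steps named above, the following sketch classifies a short text by majority vote among its most similar training documents. It is a minimal sketch on toy data, not the study's implementation: it omits preprocessing, SVD and k-means, uses a smoothed IDF formula chosen here for convenience, and all function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive whitespace tokenizer (assumed; real preprocessing is richer)."""
    return text.lower().split()


def build_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0
            for term, count in df.items()}


def tfidf_vector(text, idf):
    """Sparse TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(tokenize(text))
    return {term: count * idf[term] for term, count in tf.items() if term in idf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def knn_classify(text, train_docs, train_labels, idf, k=3):
    """Label a text by majority vote among its k most similar documents."""
    query = tfidf_vector(text, idf)
    sims = [(cosine(query, tfidf_vector(doc, idf)), label)
            for doc, label in zip(train_docs, train_labels)]
    sims.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in sims[:k])
    return votes.most_common(1)[0][0]
```

In use, `build_idf` is called once on the training documents and the resulting dictionary is passed to `knn_classify` for each new text.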
Research Review: Enhanced Approaches for Sindhi and Multi-Script Optical Character Recognition
1000 until 1130
Meeting Room 7th Floor
Prof. Dr. Abdullah Zawawi Hj Talib
Optical Character Recognition (OCR), which is an integral part of machine vision and image processing, biomedical imaging, language processing and speech recognition, poses many challenging problems. Non-cursive OCR systems have achieved perfection, whereas OCR for cursive languages still needs attention. Work on Sindhi OCR is still in its infancy, and there is no complete OCR for the language, which is based on the Arabic script and spoken by over 60 million people in Pakistan and other parts of the world. No text image database was available for testing and training on Sindhi characters, and most text image databases are created for a single script only. In this research, a study is made of the challenges posed by the Sindhi script with respect to OCR. A huge database containing 4 billion words and 15 billion characters is created for testing and training on Sindhi script with the help of custom-built software, together with a multi-script database, comprising multi-billion words and characters for 84 languages, suitable for Sindhi and multi-script OCR on a single platform. Sindhi has the largest extension of the original Arabic script among the languages adopting the Arabic script. Therefore, in this research, an enhanced segmentation algorithm and feature extraction algorithms are proposed for Sindhi which can also be used for other scripts. The segmentation algorithm, based on energy level, produces good results and also segments characters of other scripts. Zoning-based feature extraction is applied, as individual and combined approaches, to extract features from Sindhi characters and other scripts. An integrated Sindhi OCR and multi-script OCR is developed in this research. The enhanced segmentation algorithm and enhanced feature extraction algorithm produced good results on Sindhi and multi-script characters.
The integrated OCR for Sindhi obtained a recognition rate of 89%, and recognition rates of 90.33% to 99.90% for some of the other scripts, tested on a selected subset of the database created with the custom-built software. The result is working software for recognizing some of the languages and scripts, which can easily be extended to recognize more scripts. The database size for Sindhi and other scripts can be increased easily by adding more data and creating images for testing and training of these scripts.
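For readers unfamiliar with zoning-based feature extraction, mentioned in the abstract above, a generic version can be sketched as follows. The abstract does not say which per-zone measurement the research uses, so this sketch assumes the common choice of ink density (the fraction of foreground pixels in each zone) on a binary character image. The function name and the grid parameters are hypothetical.

```python
def zoning_features(image, rows=2, cols=2):
    """Split a binary image (list of rows of 0/1) into a rows x cols grid
    and return the ink density of each zone in row-major order."""
    height = len(image)
    width = len(image[0])
    features = []
    for r in range(rows):
        top = r * height // rows
        bottom = (r + 1) * height // rows
        for c in range(cols):
            left = c * width // cols
            right = (c + 1) * width // cols
            area = (bottom - top) * (right - left)
            # Count foreground pixels (value 1) that fall inside this zone.
            ink = sum(image[y][x]
                      for y in range(top, bottom)
                      for x in range(left, right))
            features.append(ink / area if area else 0.0)
    return features
```

The resulting fixed-length vector can be fed to any classifier; finer grids capture more of a character's shape at the cost of longer feature vectors.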