A Contextual Model for Resume Analysis Using Big Data Approach
Table of Contents
Abstract
Related Work
Architecture of the Proposed System
Resume Processing Module
Resume Matching Module
Results
Conclusion and Future Work

Summary

The characteristics of Big Data overwhelm conventional systems and conventional methods, while at the same time offering researchers several opportunities to explore research avenues coupled with Big Data. The Semantic Web provides constructs for associating semantics, or meaning, with data and can be used to express and query data in the form of triples. Combining Semantic Web constructs with Big Data tools can lead to a scalable and powerful automated system. A research proposal to develop a contextual model for big data analysis using Advanced Analytics has already been published. The use case proposed to create the model addresses the CV analysis problem, which consists of designing and developing a model to find the most appropriate fit between the criteria describing a demand and individuals drawn from a large pool of unstructured digital resources. The research work presented here is an attempt to develop a preliminary prototype that realizes the model. It adopts knowledge representation using semantic technologies, and search and matching based on concepts rather than keywords, to identify relevant conceptual correspondences.

1. Introduction

The term Big Data is now ubiquitous, and the issues and challenges associated with its characteristics, namely volume, velocity, variety, veracity, etc., need to be addressed. The taxonomy presented in the literature classifies big data analytics along the dimensions of time, techniques, and domain. The Big Data Analysis Pipeline, in particular, encourages researchers to present innovative thoughts and proposals to address concerns related to the data deluge, for the benefit of the community at large. Research in the field of Big Data has produced new methods of processing unstructured data, which constitutes a very important sector of Big Data.

Web 2.0 technologies, being focused on content and presentation, lack data reuse and information sharing between two or more data sources. This is where the Semantic Web, or Web 3.0, comes in and provides data formats suitable for data linking and sharing. Linked Data, built on standard web technologies such as HTTP, RDF and URIs, allows data to be published on the Internet in a structured format. It makes it possible to link and query data from different sources and to present information on the web in such a way that not only humans but also computers/programs can read it and extract meaning from it. This facilitates communication between two programs without any human intervention. The Semantic Web uses a graph database to store data and associate meaning with it, and has two main components: the Resource Description Framework (RDF) and ontologies. RDF is a W3C standard that describes the properties and attributes of resources on the Web in the form of triples, allowing machines to consume and exchange information. An ontology is a way to address the problem of heterogeneity at the schema level by providing a set of terms and concepts that can be used in data modeling to share knowledge.
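To make the triple-and-query idea concrete, the following minimal sketch (not taken from the cited work) stores a few facts as RDF triples with the Python rdflib library and retrieves them with a SPARQL query. The http://example.org namespace and the hasSkill property are illustrative assumptions, not part of any standard vocabulary.

```python
# A minimal sketch of RDF triples and a SPARQL query using rdflib.
from rdflib import Graph, Literal, Namespace, RDF, URIRef

EX = Namespace("http://example.org/resume/")   # hypothetical namespace for illustration

g = Graph()
person = URIRef("http://example.org/people/alice")
g.add((person, RDF.type, EX.Candidate))                       # "alice is a Candidate"
g.add((person, EX.hasSkill, Literal("C++")))                  # "alice has skill C++"
g.add((person, EX.hasSkill, Literal("Object-Oriented Programming")))

# SPARQL query: find every candidate who lists a given skill.
query = """
    PREFIX ex: <http://example.org/resume/>
    SELECT ?candidate WHERE {
        ?candidate a ex:Candidate ;
                   ex:hasSkill ?skill .
        FILTER(?skill = "C++")
    }
"""
for row in g.query(query):
    print(row.candidate)   # prints http://example.org/people/alice
```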
Organizations are often faced with the problem of finding the most suitable candidate to fill a vacant position and, at the same time, candidates face challenges in identifying a suitable job opportunity based on their knowledge, academic and professional qualifications, and experience. A CV is part of the semi-structured or unstructured data that professionals use to showcase their work experience, skills, qualifications, awards and achievements, etc. People's CVs and skills profiles are a type of information particularly relevant to the ExpertFinder initiative, which aims to develop vocabularies, their rule extensions, good practices and recommendations towards standardization of metadata in the form of a Resume-RDF ontology that would allow software agents to find experts on particular subjects.

The proposed system is an archetype to demonstrate the model published in previous works. The system integrates the Resume-RDF ontology to express CVs as a hierarchy of concepts. It uses a skills graph to enrich the skills section of resumes and obtain more meaningful search results than the traditional keyword search approach. SPARQL, a recursive acronym for SPARQL Protocol and RDF Query Language, is used to retrieve relevant CVs. Apache Spark, a distributed cluster computing engine used for large-scale data processing, speeds up the tedious task of CV matching.

Related Work

There are many software applications available for managing resumes within a company. Previous work standardizes skills information by mapping resume skills to a preloaded list of skills presented in a tree format. The system matches candidates based on user-entered criteria and obtains profiles based on a direct match. When there is no direct match, it performs sibling matching and ranks the matching CVs. The system does not perform semantic analysis of the data. Automated methods for analyzing unstructured CVs and extracting information were then proposed. One proposed CV parser automatically separates information in four phases using a named entity clustering algorithm. However, this approach does not say much about matching when a job criterion is given. Another work uses a self-recommendation learning engine to dynamically populate a candidate's parameters. The following method proposes ranking resumes when there are several matches for the criteria. The authors present the research problem of developing an improved approach that helps in selecting the right CV by processing a set of similar CVs. The approach is independent of user queries and helps users discover useful information that they are unaware of. Experimental results show a 50-94% reduction in the number of features that the recruiter must consider to select suitable CVs. The proposed method does not, however, take into account the specifics of other sections such as "Education", "Achievements", "Professional Experience", etc. Matchmaking strategies were then combined with the concept of ontologies. Another research work describes an approach that uses an ontology and a deductive model to determine what type of match exists between a job seeker and a job posting, and then ranks resumes based on similarity measures if a partial match exists. The system does not, however, use standard/formal ontologies that could be used for automated sharing/updating of information between other sources. Approaches for writing and publishing CVs semantically were then proposed.
The approach captures CV information via a semantically assisted graphical user interface, but it focuses only on CV composition. ORP, an ontology-based CV parser, integrates a Semantic Web approach to find suitable candidates; however, it does not handle partial matching or CV ranking. One method presented in the research details the combination of latent semantic analysis (LSA) and ontological concepts to match resumes and job descriptions. The approach addresses issues related to using an ontology in building the LSA and clustering for better matching. Another research work classifies Linked Data as part of the Big Data landscape. The fourth paradigm of science is exploration, which involves learning new facts from existing data. Linked Data serves as a testing ground for researching some of the challenges of Big Data, using an underlying ontology to describe data in terms of entities, attributes, and values. Similarity-based matching when an exact match is not found is not considered in this approach. Furthermore, it does not comment on the effect of a large volume of CVs on system performance. Parallel research on applying graph theory principles to resume processing has led to expressing CVs as graphs and applying Big Data tools and technologies to CV graphs.

Architecture of the Proposed System

The overall architecture of the proposed system is shown in fig. 1 and includes two main modules: Resume Processing and Resume Matching. Resume Processing involves capturing resumes, tokenizing and segmenting them, converting them into concept hierarchies, and keeping them in permanent storage. Resume Matching consists of accepting a search criterion, converting the criterion into a SPARQL query and retrieving the results. Both use the Skills Graph module. The three modules (Skills Graph, Resume Processing and Resume Matching) are described next.

The Skills Graph module reads skills and their associations from a text file and constructs a graph hierarchy. The skills graph groups skills into a hierarchy based on their interrelationships. A node at a higher level is broken down into smaller nodes at lower levels. Generic skills and concepts are at the top level of the graph, while specific tools, technologies and languages form the leaf nodes.

Resume Processing Module

This module captures resumes from a web user interface, performs preprocessing, and generates a concept hierarchy. The preprocessing step involves converting the entered CVs into a sequence of tokens, which are then mapped to one of four sections: Personal, Work Experience, Education and Skills. The Skills section is then mapped onto the skills graph to add context by augmenting the skills. Augmenting the skills results in the extraction of hidden meanings and associations between skills. Each skill mentioned in the input is searched for in the graph. If it is found, all nodes on the path from that specific node to the root of the graph are added as skills, and the CV is thus semantically augmented. For example, suppose a person mentions C++ in their CV. When this CV passes through the skills graph, the system learns that C++ is an object-oriented programming language, and the corresponding higher-level concepts are added to the CV as skills.
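As an illustration of this augmentation step, the following sketch assumes a simple "child,parent" text format for the skills file (the format itself is an assumption, not specified here): it builds a parent-lookup table and expands each skill found in a resume with every ancestor up to the root of the graph.

```python
# A minimal sketch of the skills-graph augmentation step, under an assumed
# "child,parent" file format: each skill gains all of its ancestor concepts.
def load_skills_graph(path):
    """Read 'child,parent' pairs from a text file into a parent-lookup dict."""
    parent = {}
    with open(path) as fh:
        for line in fh:
            if not line.strip():
                continue
            child, prnt = [part.strip() for part in line.split(",", 1)]
            parent[child] = prnt
    return parent

def augment_skills(resume_skills, parent):
    """Return the resume skills plus all ancestor concepts from the graph."""
    augmented = set(resume_skills)
    for skill in resume_skills:
        node = skill
        while node in parent:        # walk upward until the root is reached
            node = parent[node]
            augmented.add(node)
    return augmented

# Example: a resume that mentions only "C++" also gains the broader concepts.
parent = {"C++": "Object-Oriented Programming",
          "Object-Oriented Programming": "Programming Languages"}
print(augment_skills({"C++"}, parent))
# {'C++', 'Object-Oriented Programming', 'Programming Languages'}
```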
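Since Apache Spark is named as the engine that speeds up CV matching, the following hedged sketch shows one way the augmented skill sets could be distributed across a cluster and filtered against a search criterion with PySpark. The data layout and the subset-based criterion check are illustrative assumptions, not the authors' implementation.

```python
# A hedged sketch of distributed resume matching with PySpark: augmented skill
# sets are parallelised and filtered against the skills required by a criterion.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("ResumeMatching").getOrCreate()
sc = spark.sparkContext

# (cv_id, augmented_skills) pairs as produced by the Resume Processing module.
cvs = [("cv-001", {"C++", "Object-Oriented Programming"}),
       ("cv-002", {"Java", "Object-Oriented Programming"}),
       ("cv-003", {"HTML", "Web Design"})]

required = {"Object-Oriented Programming"}   # criterion entered by the recruiter

matches = (sc.parallelize(cvs)
             .filter(lambda cv: required.issubset(cv[1]))  # keep CVs covering all required skills
             .map(lambda cv: cv[0])
             .collect())
print(matches)   # ['cv-001', 'cv-002']
spark.stop()
```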