The Internet Is a Bigger Space in the Life of Humanity
Table of Contents
Introduction
Data Mining
Association Rule Mining
Hadoop
Literature Survey
Mohit K. Gupta and Geeta Sikka's Work on Multi-objective Genetic Algorithms
Conclusion and Future Scope

The Internet has occupied a larger space in the life of humanity. It has become important in almost every sector of the world. The basic advantage of the Internet is rapid communication and the rapid transfer of information through different modes. With the evolution of technology, the Internet is used not only for acquiring knowledge but also for communication; it has become a way to exchange and express ideas. In the current scenario, people mostly use social networking sites to connect with other people and share information with them. A social network is a large network of individuals interconnected by interpersonal relationships. Individuals exchange large amounts of data in the form of images, videos, and so on; the data thus generated is called social media data, and it helps determine various aspects of society. Data mining is the process of inspecting data from different angles to find unknown elements. One of the important tasks of data mining, which helps in the discovery of associations, correlations, statistically relevant patterns, causality, and emerging patterns in social networks, is known as association rule mining.

Introduction
Previously, people communicated verbally or non-verbally. Non-verbal communication took place through letters, newspapers, written drafts, and so on. Such communication had certain limits and was rather confined, and there were few means of non-verbal communication. The Internet, also known as the network of networks, has enabled people to obtain information globally in its various aspects. Initially, the only use of the web was to collect and share information. Nowadays, the Internet plays a bigger role in the life of humanity; it has become important in almost every sector of the world, and its basic advantage is rapid communication and the rapid transfer of information through different modes. Over time, the need to collect information in order to share, contribute, and make an impact grew, eventually giving impetus to collecting, analyzing, and channeling huge amounts of data in a precise manner. The creation, collection, storage, retrieval, and presentation of data have become an integral part of the knowledge society. Ultimately, the Internet is not only a means of acquiring knowledge but is now also used as a means of communication. Currently, millions of people use the Internet to express their ideas and share information, most of them through social networking sites or blogs. Social networks have thus developed at a remarkable speed throughout the world. Many social networking sites are now available, such as Facebook and Twitter. Facebook had more than 1.44 billion active users in 2015, a figure that reflects the drastic boom in the emergence of social sites. Twitter, for example, is a social networking site that became popular in a short time thanks to simple innovations such as tweets: short text messages that are much faster to produce and consume and are used to collect various kinds of information. Millions of tweets are posted every day and can be mined for information that helps with decision-making.
A social network is essentially a network of individuals connected by interpersonal relationships. Social media data refers to the data generated by people socializing on such networks. When analyzed and leveraged, this user-generated data helps examine several assets of the social community. This can be accomplished through social network analysis (SNA), the mapping and measuring of relationships. SNA thus plays a decisive role in representing the various assets of the socializing community.

Data Mining
Data from various social media sites are stored in files and other repositories, and analyzing and interpreting such amounts of data together yields much interesting knowledge that can support further decisions. Data mining, also known as the knowledge discovery process [4], is the process of finding unknown information by analyzing data from different angles: patterns are discovered in large datasets, and information is extracted from the data and reshaped. The terms data mining and knowledge discovery in databases (KDD) are often used interchangeably, but data mining is, strictly speaking, the analysis step within the knowledge discovery process.

Association Rule Mining
One of the important tasks of data mining, which helps in the discovery of associations, correlations, statistically related patterns, causality, and emerging patterns in social networks, is association rule mining. A related technique, frequent itemset mining, plays an important role in many data mining tasks that attempt to discover interesting patterns from databases, such as association rules, correlations, sequences, classifiers, and clusters; among these, mining association rules is one of the major problems. Recognizing the sets of items, products, manifestations, and peculiarities that often appear together in a given database can be considered one of the most primitive tasks of data mining. For example, the association rule {bread, potatoes} -> {sandwich} would reveal that if a customer buys bread and potatoes together, he or she will likely also buy a sandwich. Here, bread and potatoes form the antecedent of the rule and the sandwich is its consequent; the rule is evaluated by its support (how often all three items appear together in the database) and its confidence (how often a sandwich appears among the transactions that contain bread and potatoes). This knowledge can be used for decision-making purposes; a small worked example appears at the end of this section.

Consider a social networking environment that collects and shares user-generated text documents (e.g., discussion threads, blogs, etc.). It would be helpful to know which words people typically use when discussing a specific topic, or which sets of words are often used together. For example, in a discussion thread related to "US elections", frequent use of the word "economy" shows that the economy is the most discussed aspect of the political landscape. A frequent itemset of length one could thus be a good marker for a central topic of discussion; similarly, a frequent itemset of length two can show which other factors are important. A frequent itemset mining algorithm run on a set of text documents produced on a social network can therefore reveal the central topics of discussion and the word-usage patterns in discussion threads and blogs. With the exponential growth of social media data to a terabyte or more, however, it has become more difficult to analyze such data on a single machine.
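To make support and confidence concrete, here is a minimal Python sketch for the {bread, potatoes} -> {sandwich} rule above. The five transactions are invented purely for illustration and are not drawn from any real dataset.

# Toy transaction database; items and counts are made up for illustration.
transactions = [
    {"bread", "potatoes", "sandwich"},
    {"bread", "potatoes", "sandwich", "milk"},
    {"bread", "milk"},
    {"potatoes", "eggs"},
    {"bread", "potatoes", "eggs"},
]

antecedent = {"bread", "potatoes"}
consequent = {"sandwich"}

# support(X -> Y): fraction of all transactions containing X union Y
both = sum(1 for t in transactions if antecedent | consequent <= t)
support = both / len(transactions)

# confidence(X -> Y): among transactions containing X, the fraction also containing Y
with_antecedent = sum(1 for t in transactions if antecedent <= t)
confidence = both / with_antecedent

print(f"support = {support:.2f}, confidence = {confidence:.2f}")
# Prints: support = 0.40, confidence = 0.67

A rule is usually reported only when both measures clear user-chosen thresholds (a minimum support and a minimum confidence).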
Thus, the Apriori algorithm [6], one of the best-known methods for extracting frequent itemsets from a transactional database, proves ineffective in handling such ever-increasing data. To solve this problem, the MapReduce framework [7], a programming model for distributed computing, is used.

Hadoop
Hadoop is an open-source platform licensed under Apache v2 that provides the analytical technologies and computing power needed to work with large volumes of data. The Hadoop framework allows the user to store and process large data in a distributed environment, across many computers connected in a cluster, using simple programming models. It is designed so that one can manage thousands of machines from a single server, with ease of storage and local computation. It divides data into manageable chunks, replicates them, and distributes multiple copies across the nodes of a cluster so that the data can later be processed quickly and reliably. Rather than relying on hardware to provide high availability, the Apache Hadoop software library itself is designed to detect and handle failures at the application layer, providing a highly available service on top of a computer cluster. Hadoop is also used to perform data analysis. The main components of Apache Hadoop are a storage part, known as the Hadoop Distributed File System (HDFS), and a processing part called MapReduce.

Literature Survey
The method of discovering relationships between variables in large databases is called association rule mining. It was introduced by Rakesh Agrawal to check for regularities among products in large-scale transactions recorded through point-of-sale (POS) systems. For example, bread, tomatoes, and mayonnaise together point to a sandwich: according to supermarket sales data, a customer who buys tomatoes and mayonnaise together may also buy a sandwich. This data can be used for decision-making.

T. Karthikeyan and N. Ravikumar conclude in their article, after examination and observation, that much attention has been given to the performance and scalability of the algorithms, but not to the quality of the rules generated. According to them, the algorithms could be improved to reduce execution time and complexity, which would also improve accuracy. They further conclude that more focus is needed on designing efficient algorithms with fewer I/O operations, by reducing database scans in the association rule mining process. Their article provides a theoretical study of some existing association rule mining algorithms: the underlying concept is presented first, followed by an overview of the research work, and the advantages and disadvantages of each algorithm are discussed and concluded with inferences.

Rakesh Agrawal and Ramakrishnan Srikant proposed starting from a seed set of large itemsets to generate new candidate itemsets, whose actual support is counted at the end of each pass over the data; the process repeats until no new large itemset is found. Their two algorithms for finding association rules between items in a large database of sales transactions were named Apriori and AprioriTid.
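The pass structure Agrawal and Srikant describe, join candidates from the previous pass's large itemsets, prune those with an infrequent subset, then count support in one database scan, can be sketched in a few lines of Python. This is an illustrative toy version under those assumptions, not their actual implementation; the demo transactions and the min_support threshold are invented.

from itertools import combinations

def apriori(transactions, min_support):
    """Toy Apriori sketch: join, prune, and count until no new large itemsets."""
    n = len(transactions)
    # Pass 1: large 1-itemsets
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items
               if sum(i in t for t in transactions) / n >= min_support}
    large, k = set(current), 2
    while current:
        # Join step: build size-k candidates from large (k-1)-itemsets
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be large
        candidates = {c for c in candidates
                      if all(frozenset(s) in current for s in combinations(c, k - 1))}
        # Count actual support with one scan over the database
        current = {c for c in candidates
                   if sum(c <= t for t in transactions) / n >= min_support}
        large |= current
        k += 1
    return large

demo = [{"bread", "potatoes", "sandwich"},
        {"bread", "potatoes", "sandwich"},
        {"bread", "milk"},
        {"potatoes", "sandwich"}]
for itemset in sorted(apriori(demo, min_support=0.5), key=len):
    print(set(itemset))

Note that the counting step rescans the entire database on every pass; this is precisely what becomes untenable at terabyte scale and motivates the MapReduce-based and parallel approaches surveyed here.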
J. Han, J. Pei, and Y. Yin developed a systematic mining method based on the FP-tree, called FP-growth, which extracts frequent patterns based on the concept of fragment growth. The problem was approached from three aspects. First, the data structure called the FP-tree, in which only frequent items have nodes (a minimal sketch of this construction is given at the end of this survey). Second, a pattern-growth model that examines an item's conditional pattern base, constructs its conditional FP-tree, and recursively mines that tree. Third, a divide-and-conquer method was used instead of a bottom-up search technique.

A new strategy for extracting frequent itemsets from terabyte-scale datasets on cluster systems was developed by S. Cong, J. Han, J. Hoeflinger, and D. Padua, who focused on the idea of a sampling-based framework for parallel data mining. The whole idea of targeted data mining was incorporated into the algorithm, taking processor performance, the memory hierarchy, and the available network into account. The resulting algorithm was the fastest sequential algorithm that could expand its work in parallel, and it therefore used all the resources made available to it efficiently.

A new approach to data mining, known as GPUMiner, was introduced by P. V. Sander, W. Fang, and K. K. Lau, using next-generation graphics processing units (GPUs). The system relies on the massively multithreaded SIMD (Single Instruction, Multiple Data) architecture provided by GPUs. GPUMiner consists of three components: a buffer manager with CPU-based storage that handles data and I/O transfer between the graphics processing unit and the central processing unit; an integrated CPU-GPU co-processing data mining module; and a GPU-based mining visualization module.

Two FP-tree-based techniques, a lock-free dataset-tiling parallelization and cache-conscious FP-arrays, were proposed in "Optimizing Frequent Itemset Mining on a Multi-Core CPU". These addressed the low utilization of multi-core systems, effectively improved data locality, and exploited hardware and software prefetching; the FP-tree construction algorithm was also recast as a lock-free parallel algorithm.

To divide the frequent itemset mining task in a top-down approach, C. Aykanat, E. Ozkural, and B. Ucar developed a distribution scheme based on database transactions. The method works on a graph whose vertices correspond to frequent items and whose edges correspond to frequent itemsets of size two. A vertex separator partitions this graph so that the distribution of items can be decided and operated on independently. Two new mining algorithms were developed from this scheme; the items corresponding to the separator are replicated by these algorithms. One of the algorithms recreates the work and the.
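As promised in the FP-growth discussion above, here is a minimal Python sketch of the FP-tree construction step only: a first scan counts items, infrequent items are discarded, and each transaction is then inserted with its remaining items sorted by descending frequency so that common prefixes share branches. The header-table links and the recursive conditional-pattern-base mining that complete FP-growth are omitted, and the transactions and threshold are invented for illustration.

from collections import defaultdict

class FPNode:
    """A node of the FP-tree; only frequent items receive nodes."""
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_count):
    # First scan: count single items and keep only those meeting min_count
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[item] += 1
    frequent = {i for i, c in counts.items() if c >= min_count}

    # Second scan: insert each transaction with its frequent items sorted by
    # descending frequency, so shared prefixes collapse into shared branches
    root = FPNode(None, None)
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in frequent),
                           key=lambda i: (-counts[i], i)):
            child = node.children.setdefault(item, FPNode(item, node))
            child.count += 1
            node = child
    return root

def dump(tree, depth=0):
    # Print the tree as an indented item:count outline
    for child in tree.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        dump(child, depth + 1)

dump(build_fp_tree([{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b"}], min_count=2))

Because shared prefixes collapse into shared branches, the tree is typically far smaller than the database it summarizes, which is what lets FP-growth avoid Apriori's repeated candidate generation and database scans.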