Crime Rate Prediction Using Social Media Data Mining on Twitter
Chapter 3: Research Methodology
This chapter describes the methodology applied in this dissertation. The study aims to quantify the crime prediction gains achieved by adding Twitter-derived information to a standard crime prediction technique based on kernel density estimation, to identify existing crime prediction techniques that can be used to evaluate tweets for predicting crime, and to identify the factors that undermine many Twitter-based crime forecasting techniques. In this chapter, the author outlines the research strategy, research method, research approach, methods and tools of data collection, sample selection, research process, data analysis, ethical considerations and research limitations of the dissertation.
The researcher used the survey strategy in this study. A survey is a process of gathering information from a group of people through brief discussions or interviews on a specific research topic (Mai, 2016). Surveys are used to collect particular sets of data from a specific group of people and can be conducted through face-to-face interviews, the internet, email, or phone calls. Surveys are essential for gathering information in social research. They are applicable when the researcher wants to assess the opinions and experiences of people regarding a particular phenomenon (Mai, 2016). Surveys offer a flexible approach to a research study since they can be used to attain specific or general research goals. When using the survey strategy, the researcher puts a set of predetermined research questions to a selected sample of participants. The information gathered from the sample is used to draw conclusions about the broader population. Thus, for a survey to meet the objectives of the study, the researcher should choose the sample carefully so that it yields relevant information about the whole population (Mukhopadhyay and Gupta, 2014). The researcher opted for this strategy because surveys are easy to conduct and consume less time than other research strategies. Surveys can be used to collect a wide range of data, including attitudes, opinions and factual information from the participants. The researcher can also administer many questions to the respondents, which gives him or her more flexibility when analyzing data (Mukhopadhyay and Gupta, 2014). However, the accuracy of surveys may suffer since respondents may fail to provide honest answers. Surveys that give respondents closed-ended questions usually have lower validity than those with open-ended questions. To overcome this disadvantage, the researcher used survey software to analyze the data and to assess the validity and reliability of the information provided (Mai, 2016).
Research method: Qualitative method
The researcher applies a qualitative research methodology to meet the objectives of the dissertation. Qualitative research is mainly exploratory (Flick, 2014). This methodology is essential in studies that require the researcher to understand reasons and opinions related to the research topic (Bryman, 2017). It is well suited to understanding social trends, interactions, and behaviors, such as crime as discussed on social media. The qualitative method is suitable for this study because it is carried out in the natural setting: the researcher collects data from the platform where the participants are affected by the problem under study. In this methodology the researcher is a key instrument, since he or she collects data by evaluating documents, observing, and interviewing the participants (Flick, 2014). The outcomes of qualitative research are neither measurable nor quantifiable; however, they can serve as a foundation for quantitative research. The strength of using qualitative methodology in this dissertation is that it provides detailed information about people’s feelings and their behavior while using Twitter (Creswell, 2013). The qualitative method also provides a flexible research approach, which gives the researcher alternatives if the chosen plan fails to provide adequate information on the topic. Qualitative research is mainly based on human experiences, which is essential in this particular study. However, qualitative research has several shortcomings that may affect its effectiveness. First, the quality of the research depends heavily on the skills of the researcher, and it is prone to the researcher’s personal biases (Bryman, 2017). Second, the researcher gathers a large volume of data, which may take a long time to analyze and interpret.
In this study of crime rate prediction using social media data mining on Twitter, the phenomenological research approach is used to achieve the objective of the study. Phenomenology is a qualitative research approach that focuses on the everyday experiences of a particular group of people (Lewis, 2015). The primary purpose of this approach is to provide a detailed description of a specific occurrence among those people. The researcher gathers information, through interviews or questionnaires, from a chosen group of respondents who have first-hand knowledge of the research topic. According to Creswell (2013), the questions seek answers about the respondents’ experiences of a particular phenomenon and the context that influenced how they experienced the events. The researcher then evaluates and analyzes the responses and groups the data according to meaning. Through this process, the researcher can arrive at a universal definition of the phenomenon, which leads to an enhanced understanding of the events. Phenomenology is focused on extracting the most reliable data, since the researcher documents information from the subjects’ own experiences and thus reduces bias (Lewis, 2015). The strength of this approach is that it aims to establish a universal truth about an experience, which is essential in providing a deeper understanding of the research topic. The approach also helps in understanding people’s experiences and what makes them meaningful (Creswell, 2013). This may help in the development of new theories and policies to address the issue. However, the approach may be affected by language barriers, so the researcher may fail to get the desired information from the participants. The approach is also subject to bias when interpreting data.
Data collection method and tools
The researcher used structured interviews to meet the objective of the study. A structured interview is a data collection tool that can be applied in both qualitative and quantitative research (Brinkmann, 2014). The purpose of using structured interviews is to ensure that all participants receive the same interview questions in the same order. This allows the researcher to compare data easily and to categorize data into subgroups for further analysis. The researcher administers the questions verbally and ensures that they are presented to each participant with the same wording as in the survey questionnaire. The questionnaire contains closed-ended questions with specific answers from which the participant is required to choose. A structured interview ensures that all respondents answer the questions in the same context. When using structured interviews in qualitative research, the researcher is required to formulate an interview schedule that guides the order of questions in an interview (Brinkmann, 2014). By using an interview schedule, the researcher increases the credibility and reliability of the research data. Lastly, in a structured interview the interviewer is present and can explain complex issues to the respondent, thus avoiding misinterpretation (Bryman, 2015).
The researcher used a structured questionnaire as the tool for data collection. The questionnaire guides the interview (McGuirk and O’Neill, 2016). The researcher prepared specific questions to administer to the respondents, which helped in obtaining appropriate responses about the research topic. Questionnaires are advantageous as a data collection tool because they can be used to gather information from a large population of respondents. Using structured questionnaires also makes it easy for the researcher to analyze the data. A questionnaire is an efficient and cost-effective tool for data collection. Structured questions require specific information from the respondent and thus help to eliminate inappropriate responses. However, structured questionnaires have limited flexibility, which restricts the researcher to certain aspects of the phenomenon under study (McGuirk and O’Neill, 2016). To overcome this disadvantage, the researcher allowed the respondents to make a few comments, which helped in gathering useful information that could not have been collected using the questionnaire alone.
The tweets were first collected by the Twitter developers over a large geographical area. The tweets were then aggregated by crime-related topic within a fixed search area of each city, so that they could be grouped for analysis. Several signals were used to distinguish crime-related tweets. Twitter imposes rate limits on developer accounts for reliability reasons. The default rate limits for calls to the API vary according to the method being used and whether the method itself requires authentication. There are two constraints:
- Unauthenticated calls are allowed around 200 requests per hour. They are measured against the public-facing IP address of the server making the request.
- OAuth calls are allowed 400 requests per hour and are measured against the OAuth token used in the request.
Given the Twitter developer restrictions on its database, the study needed several devices, each able to issue 200 requests in half-hourly intervals. The research used around 20 distinct developer keys and managed to issue 3000 calls within the specified intervals.
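The arithmetic behind spreading calls over several keys can be sketched as follows. This is a minimal illustration, not the collection tooling used in the study: it assumes a budget of 200 requests per key per interval, the key labels are hypothetical, and no real API calls are made.

```python
import math


def keys_needed(total_calls, limit_per_key):
    """Smallest number of keys so that no key exceeds its per-interval limit."""
    if limit_per_key <= 0:
        raise ValueError("limit_per_key must be positive")
    return math.ceil(total_calls / limit_per_key)


def assign_calls(total_calls, keys, limit_per_key):
    """Spread calls round-robin over the keys; fail if the budget is too small."""
    if total_calls > len(keys) * limit_per_key:
        raise ValueError("not enough keys for this interval")
    plan = {key: 0 for key in keys}
    for i in range(total_calls):
        plan[keys[i % len(keys)]] += 1
    return plan


# Hypothetical labels standing in for the roughly 20 developer keys.
keys = [f"key-{n}" for n in range(20)]
plan = assign_calls(3000, keys, limit_per_key=200)

print(keys_needed(3000, 200))  # 15
print(max(plan.values()))      # 150
```

Under these assumptions, 3000 calls need at least 15 keys, and spreading them over 20 keys keeps each key at 150 calls, below the 200-request budget.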
| No. | City | Date and time | Tweet |
|---|---|---|---|
| 1 | Miami | Mon Feb 26 20:22:35 CDT 2018 | When I was walking in the middle of the night from the job. I confronted three armed guys. They stole everything I had…. |
| 2 | Colorado | Fri March 20 20:53:22 CDT 2018 | I was a terrible day a polite and good guy with a gun takes out a bad guy with a gun who invaded a hotel. Injuries reported after long shots were fired at Colorado. |
| 3 | Los Angeles | Wed March 27 23:50:31 CDT 2018 | Gunshots are so pleased to hear in the middle of town……Puh! Puh! I ran to the house. |
With the cooperation of Twitter developers, tweets were collected in Colorado, Los Angeles and Miami for 30 days, yielding a file of over 200,000 tweets (Bolla, 2014). Although many different kinds of tweets were collected, particular attention was paid to the pertinent, unique information in the tweets.
Interviews were held with the Twitter developers in February and March of 2018 to establish their role in the research. Specifically, the researcher contacted them and asked them to participate in the study after explaining its nature and scope. In general, the respondents were willing to take part, and the interviews were conducted between February and March of 2018 (Bolla, 2014). The discussions took place at the developers’ offices. The team kept notes and worked with the Twitter developers to examine the collected information.
The tweets were preprocessed, and then sentiment analysis was conducted. The preprocessing steps were as follows:
- Separate each tweet into individual terms at whitespace boundaries.
- Convert each tweet to lower case.
- Remove the non-alphanumeric characters that appear in the tweets.
These steps produced the individual set of tokens to be matched, which is an important aspect of the research approach (Bolla, 2014). The sentiment analysis approach requires non-alphanumeric characters to be excluded. The research was additionally extended by stemming all of the tweets, which can increase the number of terms that are matched to their counterparts.
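The three preprocessing steps can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function name `preprocess_tweet` is hypothetical, only ASCII letters and digits are treated as alphanumeric, and stemming is left out.

```python
import re

# Anything that is not a lowercase ASCII letter or a digit is stripped.
_NON_ALNUM = re.compile(r"[^a-z0-9]")


def preprocess_tweet(text):
    """Split on whitespace, lowercase each term, drop non-alphanumeric characters."""
    tokens = []
    for term in text.split():
        cleaned = _NON_ALNUM.sub("", term.lower())
        if cleaned:  # skip terms that were punctuation only
            tokens.append(cleaned)
    return tokens


print(preprocess_tweet("Puh! Puh! I ran to the house."))
# ['puh', 'puh', 'i', 'ran', 'to', 'the', 'house']
```

Terms that consist only of punctuation disappear entirely, so the output contains only tokens that can be matched against a word list.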
The research includes data analysis involving both the grouping and the examination of every data sample in a set of tweets, as illustrated in the table (Bolla, 2014). The primary objective is to identify patterns. The datasets extracted from Twitter had to be filtered before models were built. The parameters defined at the analysis stage comprise the following terms and formulae:
- Tweet Mean:
The Tweet Mean is the average number of tweets collected per day. The average tweet count over all days in each specific region was computed as the Tweet Mean $\mu$:
$$\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$$
where $x_i$ is the tweet count on day $i$ and $N$ is the number of days.
- Search Area:
The Search Area is the particular region allocated to the city where the research is done and where the tweets were obtained:
$$\text{Search Area} = \pi r^2$$
where $r$ is the fixed radius of the search region.
- Population Count:
The Population Count is the number of people in a particular Search Area:
$$\text{Population Count} = \text{Population Density} \times \text{Search Area}$$
where Population Density is the number of people per unit area within a city, so that the Population Count is the total number of people within a specific area.
- Tweet Ratio (TR):
The Tweet Ratio is the Tweet Mean taken relative to the Population Count of the corresponding Search Area:
$$\text{TR} = \frac{\mu}{\text{Population Count}}$$
- People TR:
The People TR is the reciprocal of the Tweet Ratio:
$$\text{People TR} = \frac{1}{\text{TR}}$$
It is interpreted in terms of the probability that an individual would commit a crime.
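The parameters can be sketched as small functions to make the arithmetic concrete. This is an illustration only: it reads the Tweet Ratio as Tweet Mean divided by Population Count (an assumption), the function names are hypothetical, and the input values are invented for the example rather than taken from the study.

```python
import math


def tweet_mean(daily_counts):
    """Tweet Mean: average number of tweets per day."""
    if not daily_counts:
        raise ValueError("daily_counts must not be empty")
    return sum(daily_counts) / len(daily_counts)


def search_area(radius):
    """Search Area: area of a circle with the fixed search radius."""
    return math.pi * radius ** 2


def population_count(density, area):
    """Population Count: population density times search area."""
    return density * area


def tweet_ratio(mean, population):
    """Tweet Ratio (assumed reading): tweet mean per person in the search area."""
    return mean / population


def people_tr(ratio):
    """People TR: reciprocal of the tweet ratio."""
    return 1.0 / ratio


# Invented example values, not data from the study.
mu = tweet_mean([12, 8, 10])             # tweets per day
area = search_area(2.0)                  # radius of 2 km
pop = population_count(500.0, area)      # 500 people per square km
tr = tweet_ratio(mu, pop)
print(mu, round(area, 2), round(pop), round(people_tr(tr), 1))
```

Keeping each parameter as its own function mirrors the list of definitions, so a different reading of the Tweet Ratio only requires changing one function.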
The following procedure was devised for the geographic analysis of the collected tweets:
- Read and scan the data, clustering the tweets by city.
- Classify the region-specific data by the date on which each tweet was posted on Twitter.
- Count the number of crime-related tweets posted in each region.
- Compute the population density, which describes the population in a specified area of each region.
- Finally, calculate the analysis parameters for each region.
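The grouping and counting steps can be sketched as follows. This is a minimal illustration: the keyword set `CRIME_TERMS` is a hypothetical stand-in for the signals used to detect crime-related tweets, and the sample records are invented.

```python
from collections import Counter
from datetime import date

# Hypothetical keyword signals; the study does not list the terms it used.
CRIME_TERMS = {"gun", "gunshots", "armed", "stole", "shots"}


def is_crime_related(tokens):
    """A tweet counts as crime-related if any token matches a signal term."""
    return any(token in CRIME_TERMS for token in tokens)


def count_crime_tweets(tweets):
    """Count crime-related tweets per (city, day) from (city, day, tokens) records."""
    counts = Counter()
    for city, day, tokens in tweets:
        if is_crime_related(tokens):
            counts[(city, day)] += 1
    return counts


def daily_counts_by_city(counts):
    """Turn (city, day) counts into a date-ordered list of daily counts per city."""
    by_city = {}
    for (city, _day), n in sorted(counts.items()):
        by_city.setdefault(city, []).append(n)
    return by_city


# Invented sample records.
tweets = [
    ("Miami", date(2018, 2, 26), ["they", "stole", "everything"]),
    ("Miami", date(2018, 2, 26), ["nice", "weather"]),
    ("Miami", date(2018, 2, 27), ["three", "armed", "guys"]),
    ("Los Angeles", date(2018, 3, 27), ["gunshots", "in", "town"]),
]
counts = count_crime_tweets(tweets)
print(daily_counts_by_city(counts))
# {'Los Angeles': [1], 'Miami': [1, 1]}
```

Days without any crime-related tweet do not appear in the result, so a full analysis would need to fill in zero counts before averaging over all days.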
The present study was subject to specific ethical considerations. As mentioned earlier, all research participants and Twitter developers gave their written consent to take part in the study by signing a briefing and consent letter. Sample members were likewise required to sign the letters. The letters assured the participants that their participation in the research was voluntary and that they were free to withdraw from the study at any time and for any reason. In addition, the developers and other participants were fully informed about the objectives of the research. They were assured that their responses would be treated as private and confidential and used only for academic purposes and only for this particular study. The participants were not harmed or mistreated during the study and were treated hospitably throughout. The Twitter developers and the research team also aimed to create and maintain a comfortable atmosphere.
Previous research has addressed the first two limitations of hotspot maps by projecting the criminal point process into a feature space that describes each point in terms of its proximity to, for instance, nearby roadways and police headquarters. This space is then modeled using simple techniques such as generalized additive models or logistic regression. The advantages of this approach are clear. It can simultaneously consider a wide variety of historical and spatial variables when making predictions. Moreover, predictions can be made for geographic areas that lack historical crime records, as long as the areas are associated with the requisite spatial information. A further limitation of traditional hotspot maps, the lack of consideration for social media, has been only partially addressed by existing models.
The data collected from Twitter are qualitative in nature. The individual nature of data gathering in qualitative research can also be a drawback of the method. What one researcher considers essential data to gather, another may consider minor and not worth the effort of pursuing. Having individual and differing perspectives, including instinctive decisions, can lead to very detailed data. It can also lead to data that are generalized or even inaccurate because of their reliance on the researcher’s subjectivity.
Brinkmann, S., 2014. Interview. In Encyclopedia of critical psychology (pp. 1008-1010). Springer New York.
Bryman, A., 2015. Social research methods. Oxford University Press.
Bryman, A., 2017. Quantitative and qualitative research: further reflections on their integration. In Mixing methods: Qualitative and quantitative research (pp. 57-78). Routledge.
Creswell, J.W. 2013. Qualitative Inquiry & Research Design: Choosing Among the Five Approaches. Thousand Oaks, CA: SAGE Publications, Inc. (pp. 77-83)
Flick, U., 2014. An introduction to qualitative research. Sage.
Lewis, S., 2015. Qualitative inquiry and research design: Choosing among five approaches. Health promotion practice, 16(4), pp.473-475.
Mai, J.E., 2016. Looking for information: A survey of research on information seeking, needs, and behavior. Emerald Group Publishing.
McGuirk, P.M. and O’Neill, P., 2016. Using questionnaires in qualitative human geography.
Mukhopadhyay, S. and Gupta, R.K., 2014. Survey of qualitative research methodology in strategy research and implication for Indian researchers. Vision, 18(2), pp.109-123.
Bolla, R.A., 2014. Crime pattern detection using online social media. Masters Theses, 7321. http://scholarsmine.mst.edu/masters_theses/7321