Grouping of Village Status in West Java Province Using the Manhattan, Euclidean and Chebyshev Methods on the K-Mean Algorithm

Village Potential Data for 2014 (Podes 2014) in West Java Province is data released by the Central Statistics Agency in collaboration with the Ministry of Villages PDTT in unlabeled (unsupervised) form, and it consists of 5,319 village records. The 2014 Podes data in West Java Province is compiled based on the level of village development (village specific) in Indonesia, using the village as the unit of analysis. However, village funds have not been distributed effectively and accurately in accordance with the conditions and potential of each village, because clear information about village status is lacking. As a result, no priority villages have been identified that should receive greater funds and attention from the government. One algorithm that can be used for clustering is the k-means algorithm. K-means groups data by calculating the shortest distance from a data point to a centroid. This study compares three distance calculation methods for k-means: Manhattan, Euclidean and Chebyshev. The methods are tested using execution time and the Davies-Bouldin Index. From this test, the 2014 Village Potential data in West Java Province has been grouped into 5 village statuses: 694 very disadvantaged villages, 567 disadvantaged villages, 1,440 developing villages, 1,557 advanced villages and 1,061 independent villages. For distance calculation, Chebyshev has the most efficient accumulated time compared to Manhattan and Euclidean. Meanwhile, the Euclidean method has the most optimal Davies-Bouldin Index compared to the Manhattan and Chebyshev methods.


I. INTRODUCTION
The Republic of Indonesia is a country in Southeast Asia located between two continents (Asia and Australia) and two oceans (the Indian and the Pacific), which is why Indonesia is also known as the Archipelago (Intermediate Archipelago). Indonesia is the largest archipelagic country in the world, consisting of 17,504 islands. With a population of 222 million in 2006, Indonesia is the country with the fourth largest population in the world. Indonesia consists of various ethnic groups, languages and religions. The total area of Indonesia is 1,913,578 km², and it has the second largest biodiversity in the world [10].
The large population and the rapid development in cities have an impact on the Indonesian economy. This development draws people to the city to find work and settle down, a phenomenon known as urbanization. Urbanization in Indonesia causes many problems. Its impact on villages is a reduction in human resources, because the population moves to cities, so that villages in Indonesia do not experience real development. Urbanization is triggered by differences in development facilities between rural and urban areas [14].
The Ministry of Villages, Development of Disadvantaged Regions and Transmigration of the Republic of Indonesia (Kemendes) is a ministry within the Government of Indonesia in charge of developing villages and rural areas, empowering rural communities, accelerating the development of disadvantaged areas, and transmigration. The Ministry of Villages is under and responsible to the President. Based on Presidential Regulation Number 18 of 2020 concerning the 2020-2024 National Medium-Term Development Plan (RPJMN), the 2020-2024 RPJMN is a strategic document that contains development plans that must be carried out by the government over the next five years. The 2020-2024 RPJMN is used as an official reference for local governments and other stakeholders in implementing development. The 2020-2024 RPJMN was prepared with the aim of mapping rural conditions in Indonesia based on their level of development, setting the development targets that village development actors must achieve in the next five years, and capturing the performance of village development that has already been carried out. For the area and spatial planning subsector of village and rural area development, the 2020-2024 RPJMN contains village development targets that must be achieved in the next five years, namely reducing the number of underdeveloped villages to 5,000 villages and increasing the number of independent villages to at least 2,000 villages in 2024. Realizing this requires mapping the status of the villages to be developed.
The Ministry of Villages, based on the Regulation of the Minister of Villages, Development of Disadvantaged Regions, and Transmigration of the Republic of Indonesia Number 2 of 2016 concerning the Developing Village Index, states that village status is divided into 5 categories, namely independent villages, advanced villages, developing villages, disadvantaged villages and very disadvantaged villages, explained as follows:
1) An Independent Village, also called a Sembada Village, is an advanced village that has the ability to carry out village development to improve the quality of life and livelihood as much as possible for the welfare of the village community, with social resilience, economic resilience and ecological resilience in a sustainable manner. An Independent Village or Sembada Village is a village that has a Developing Village Index greater than (>) 0.8155.
2) An Advanced Village, also called a Pre-Sembada Village, is a village that has the potential of social, economic and ecological resources, as well as the ability to manage them to improve the welfare of the village community, improve the quality of human life and alleviate poverty. An Advanced Village or Pre-Sembada Village is a village that has a Developing Village Index less than or equal to (≤) 0.8155 and greater than (>) 0.7072.
3) A Developing Village, also called a Madya Village, is a village with the potential to become an advanced village. It has the potential of social, economic and ecological resources but has not managed them optimally to improve the welfare of the village community, improve the quality of human life and alleviate poverty. A Developing Village or Madya Village is a village that has a Developing Village Index less than or equal to (≤) 0.7072 and greater than (>) 0.5989.
4) A Disadvantaged Village, also called a Pre-Madya Village, is a village that has the potential of social, economic and ecological resources but has not managed them, or has managed them insufficiently, to improve the welfare of the village community and the quality of human life, and that experiences poverty in its various forms. A Disadvantaged Village or Pre-Madya Village is a village that has a Developing Village Index less than or equal to (≤) 0.5989 and greater than (>) 0.4907.
5) A Very Disadvantaged Village, also called a Pratama Village, is a village that is vulnerable to natural disasters, economic shocks and social conflicts, so that it is not able to manage the potential of its social, economic and ecological resources, and that experiences poverty in its various forms. A Very Disadvantaged Village or Pratama Village is a village that has a Developing Village Index less than or equal to (≤) 0.4907.
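The five thresholds listed above can be expressed as a small lookup. The following is a minimal sketch, not part of the regulation or of this study's tooling; the function name `classify_village_status` and the English status labels are assumptions chosen for illustration.

```python
def classify_village_status(idm: float) -> str:
    """Map a Developing Village Index value to one of the five statuses.

    Thresholds follow the five categories listed above; the function name
    and return labels are illustrative assumptions.
    """
    if idm > 0.8155:
        return "Independent"
    if idm > 0.7072:
        return "Advanced"
    if idm > 0.5989:
        return "Developing"
    if idm > 0.4907:
        return "Disadvantaged"
    return "Very Disadvantaged"


# Boundary values belong to the lower category, because each upper bound is "≤".
print(classify_village_status(0.8155))  # Advanced
print(classify_village_status(0.6500))  # Developing
```

Note that this index-based rule needs a labeled index value per village, which the unlabeled Podes data does not provide; that is why this study relies on clustering instead.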
Village status is in fact inseparable from village development that uses government funds. Law of the Republic of Indonesia Number 6 of 2014 concerning Villages, article 86 paragraph 2, states that the Government and Regional Governments are obliged to develop an information system for villages and for the development of rural areas. The Ministry of Villages, in collaboration with the National Development Planning Agency and the Central Statistics Agency, issued data on village potential (Podes) in West Java Province, which has 5,319 instances and consists of 42 dependent attributes without labels. Podes data is a measurement compiled based on the level of village development in Indonesia, with the village as the unit of analysis. Podes data is compiled with reference to Law Number 6 of 2014 concerning Villages. It is intended to capture the level of village development in West Java Province and can be used as a reference for policy planning and the supervision of village development [19].
In information technology, data is an important part that cannot be separated from information retrieval. Data mining is a series of activities used to find new, hidden or unexpected patterns contained in the data. The term data mining is often considered as a synonym for knowledge discovery from data (KDD), namely the discovery of knowledge from data that focuses on the purpose of the mining process [21]. Data mining can be used to perform clustering, classification and association. Clustering is a grouping process that is carried out by finding similar characteristics between data according to certain class groups [8]. In simple terms, clustering can be used to analyze a set of data and generate a set of clustering rules that can be used to group future data.
In the real world, data sometimes needs to be grouped not only into two statuses (binary class) but also into multiple statuses (multi-class). For multi-class data sets, grouping is more difficult than in the binary case. K-means is an iterative clustering algorithm that partitions the data set into a predetermined number K of clusters. A comparative study of partition-based, hierarchical-based and density-based clustering has been conducted. That study revealed that k-means, which is a partition-based algorithm, provides better performance and is superior for large data sets compared to hierarchical and density-based clustering [6]. In addition, several other studies also mention that clustering with the k-means algorithm is faster than other clustering algorithms and also produces quality clusters when using large datasets [7] [20] [12].
The k-means algorithm has been used for multi-class grouping, where it shows effective results and is able to improve classification performance [1]. In another case, k-means was compared with Density-Based Spatial Clustering of Applications with Noise (DBSCAN). That study states that the two are the clustering algorithms most commonly used for grouping data with different criteria, and its test results show that k-means is better than DBSCAN in time efficiency [5]. In addition, research comparing the data mining clustering algorithms k-means, density-based clustering and hierarchical clustering concluded that k-means is superior in time efficiency to the other two algorithms and is also able to distribute cluster instances quite accurately [17]. When the k-means algorithm groups data, it calculates the closest distance between a data point and a centroid. The distance in the k-means algorithm can be calculated using Manhattan, Euclidean or Chebyshev. Which distance calculation method performs best depends on the dataset used [3] [18] [15].
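To make the three distance calculations concrete, the following minimal sketch implements them from their standard definitions: Manhattan sums the absolute differences, Euclidean takes the square root of the summed squared differences, and Chebyshev takes the largest absolute difference. The function names and the four-indicator example vectors are illustrative assumptions, not data from Podes 2014.

```python
import math
from typing import Sequence


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chebyshev(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


# Hypothetical village and centroid with four indicators scored 0 to 5.
village = [5, 3, 0, 2]
centroid = [2, 3, 4, 2]

print(manhattan(village, centroid))  # 7
print(euclidean(village, centroid))  # 5.0
print(chebyshev(village, centroid))  # 4
```

The same pair of points yields three different distances, which is why the choice of method can change which centroid a village is assigned to.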
Awasthi compared the Manhattan and Euclidean distance calculation methods on the k-means algorithm to determine the sum of squared errors. The data used in that study was a bank data set, tested using the WEKA tool. The test results show that the Manhattan distance calculation method is better than the Euclidean method [3].
Another study compared three distance calculation methods for the k-means algorithm, namely Manhattan, Euclidean and Minkowski, to find the best one. The study compared the results of previous studies and concluded that the Euclidean distance calculation method was better than the Manhattan and Minkowski methods [18].
Kouser and Sunita also conducted other research on the comparison of the Manhattan, Euclidean and Chebyshev distance calculation methods on the k-means algorithm to determine the accuracy and mean absolute error. From the tests conducted using the flower data-set, it was found that the Chebyshev distance calculation method is better than the Manhattan and Euclidean methods [15]. From previous studies, it is known that the Manhattan, Euclidean and Chebyshev distance calculation methods are superior to each other depending on the dataset used.
Based on the considerations mentioned above, this study clusters village status in West Java Province into 5 village statuses using the k-means algorithm, and compares which distance calculation method is the most effective for grouping village status.

II. RESEARCH METHODOLOGY
A. Research Methods
This research uses a quantitative method in which the 2014 Village Potential data (Podes 2014) in West Java Province is processed. This study clusters village status in West Java Province into 5 village statuses using the k-means algorithm with the Manhattan, Euclidean and Chebyshev distance calculation methods, and compares which distance calculation method has the most efficient accumulated time and which has the most optimal Davies-Bouldin Index value.

B. Data Collection Methods
The data used in this study is the village potential data for West Java Province in 2014 (Podes 2014). Podes 2014 data is secondary data issued by the Central Statistics Agency based on Law Number 6 of 2014 concerning Villages. Podes 2014 data consists of 5,319 instances and has 42 dependent attributes without labels. A description of the 2014 Podes data can be seen in Table 1. In Table 1 there are attributes I1, I2, I3 to I42, where the initial I stands for "Indicator". The value of each attribute ranges from 0 to 5, where 0 is the lowest value and 5 is the highest value. Information on values 0-5 for each indicator can be seen in Appendix 1.
Data on the potential of villages in West Java Province in 2014 is the result of measurements based on the level of village development (village specific) in West Java Province, using the village as the unit of analysis. Podes 2014 in West Java Province is used as a reference for the main indicators that make up the index, together with data on government administration areas (MDNR Indonesia, 2015), which is used as the reference standard for the number of integrated villages in West Java Province. Podes 2014 in West Java Province is a complex multidimensional concept consisting of dimensions, variables and indicators that are used as measuring tools for village development. In the 2014 Podes data in West Java Province, there are 5 dimensions, 12 variables and 42 indicators. The complete list of dimensions, variables and indicators can be seen in Table 2.

C. Sample Selection Methods
A random sample of 15 villages with 42 indicators, labeled I1 to I42, is taken from the original 2014 Village Potential data in West Java Province to be grouped using the k-means algorithm. The sample is shown in Table 3. The data in Table 3 is grouped into 5 clusters, labeled C1 to C5. The grouping is done using the k-means algorithm with three different distance calculation methods, namely Manhattan, Euclidean and Chebyshev.
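The grouping of a 15-village sample into 5 clusters with interchangeable distance methods can be sketched as below. This is a minimal illustration under stated assumptions, not the procedure used in this study: the sample is synthetic (random integers 0 to 5, not Table 3), initial centroids are picked at random, centroids are updated as plain means for all three distances, and the names `k_means` and `sample` are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Distance = Callable[[Point, Point], float]


def manhattan(a: Point, b: Point) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chebyshev(a: Point, b: Point) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


def k_means(points: List[Point], k: int, distance: Distance,
            max_iter: int = 100, seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Plain k-means with a pluggable distance (assumed mean-based centroid update)."""
    rng = random.Random(seed)
    # Assumption: initial centroids are k randomly chosen data points.
    centroids = [list(p) for p in rng.sample(list(points), k)]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: distance(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centroids will not move
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Synthetic stand-in for the sample: 15 villages x 42 indicators scored 0 to 5.
data_rng = random.Random(42)
sample = [[data_rng.randint(0, 5) for _ in range(42)] for _ in range(15)]

for name, dist in [("Manhattan", manhattan), ("Euclidean", euclidean),
                   ("Chebyshev", chebyshev)]:
    labels, _ = k_means(sample, k=5, distance=dist)
    print(name, labels)
```

Because the starting centroids are the same for all three runs (same seed), any difference in the printed assignments comes from the distance method alone.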

D. Evaluation
In this study, the evaluation is carried out by grouping the village potential data. The grouping uses the k-means algorithm, with the distance calculated using Manhattan, Euclidean and Chebyshev. The result is a grouping of the village potential data into 5 clusters, namely cluster 0, cluster 1, cluster 2, cluster 3 and cluster 4. At this stage it is not yet known which cluster corresponds to independent villages, advanced villages, developing villages, disadvantaged villages or very disadvantaged villages.
In the village potential data, each attribute/indicator has a value of 0 to 5, where 0 is the lowest value and 5 is the highest value. Each cluster obtained has a centroid, which is the "midpoint" of the cluster. To determine village status, we can therefore calculate the sum of the centroid values for each cluster, which can be written as:

$$\text{Sum} = \sum_{i=1}^{42} CI_i \tag{8}$$

In equation 8, CI_i is the centroid value of indicator i, and each cluster has 42 indicators. Village status is determined by sorting the clusters by the sum of the centroid values of their indicators: the cluster with the lowest sum is assigned very disadvantaged village status and the cluster with the highest sum is assigned independent village status.
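The ranking rule described above can be sketched as follows. This is an illustrative sketch only: the function name `assign_status`, the English status labels, and the three-indicator toy centroids are assumptions (real centroids would have 42 indicators).

```python
from typing import Dict, List, Sequence

# Statuses ordered from the lowest centroid sum to the highest.
STATUSES: List[str] = [
    "Very Disadvantaged",
    "Disadvantaged",
    "Developing",
    "Advanced",
    "Independent",
]


def assign_status(centroids: Sequence[Sequence[float]]) -> Dict[int, str]:
    """Rank clusters by the sum of their centroid values and label them."""
    if len(centroids) != len(STATUSES):
        raise ValueError("expected exactly one centroid per village status")
    sums = [sum(c) for c in centroids]
    # Cluster indices sorted from the smallest centroid sum to the largest.
    order = sorted(range(len(centroids)), key=lambda c: sums[c])
    return {cluster: STATUSES[rank] for rank, cluster in enumerate(order)}


# Hypothetical centroids with only 3 indicators each, for readability.
toy_centroids = [
    [4, 4, 5],  # cluster 0, sum 13
    [1, 0, 1],  # cluster 1, sum 2
    [3, 3, 3],  # cluster 2, sum 9
    [2, 2, 1],  # cluster 3, sum 5
    [4, 3, 3],  # cluster 4, sum 10
]
print(assign_status(toy_centroids))
# {1: 'Very Disadvantaged', 3: 'Disadvantaged', 2: 'Developing', 4: 'Advanced', 0: 'Independent'}
```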

E. Validation
In this study, validation is carried out to test which distance calculation method for the k-means algorithm is most effective for grouping village potential data. The test is carried out using the RapidMiner tool to obtain the accumulated time and the Davies-Bouldin Index value for each distance calculation method. The best time efficiency belongs to the method with the minimum accumulated time. Meanwhile, under the Davies-Bouldin Index, a clustering scheme is considered optimal when it has the minimum Davies-Bouldin Index.
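For readers unfamiliar with the index, the sketch below computes the Davies-Bouldin Index from its textbook definition: for each cluster, take the worst ratio of summed within-cluster scatter to the distance between centroids, then average over clusters. It is an assumption-laden illustration, not necessarily how RapidMiner computes or reports the value; it uses Euclidean distance throughout, and the function names and the four-point example are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Point = Sequence[float]


def euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def davies_bouldin(points: List[Point], labels: List[int]) -> float:
    """Textbook Davies-Bouldin Index; lower values indicate better clustering.

    Assumes at least two clusters and that no two centroids coincide.
    """
    clusters: Dict[int, List[Point]] = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    ids = sorted(clusters)
    centroids = {c: [sum(col) / len(clusters[c]) for col in zip(*clusters[c])]
                 for c in ids}
    # Scatter: mean distance of a cluster's members to its own centroid.
    scatter = {c: sum(euclidean(p, centroids[c]) for p in clusters[c]) / len(clusters[c])
               for c in ids}
    total = 0.0
    for i in ids:
        # Worst-case similarity of cluster i to any other cluster.
        total += max((scatter[i] + scatter[j]) / euclidean(centroids[i], centroids[j])
                     for j in ids if j != i)
    return total / len(ids)


# Two tight, well-separated toy clusters: scatter 1 each, centroids 10 apart.
toy_points = [[0, 0], [0, 2], [10, 0], [10, 2]]
toy_labels = [0, 0, 1, 1]
print(davies_bouldin(toy_points, toy_labels))  # approximately 0.2
```

Compact clusters (small scatter) that lie far apart (large centroid distance) drive the index down, which is why the minimum value is treated as the optimal clustering scheme.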

III. RESULTS AND DISCUSSION
A. Testing the proposed Method
The Manhattan, Euclidean and Chebyshev distance calculation methods on the k-means algorithm, used to group the 2014 Village Potential data in West Java Province, are tested on the latest clustering model, which is validated using execution time and the Davies-Bouldin Index.

Manhattan Distance
Using the k-means algorithm with the Manhattan distance calculation method to group the 2014 Podes data in West Java Province, which covers 5,319 villages, the number of villages in each cluster is as follows:
· Cluster 0: 1,005 villages
· Cluster 1: 1,189 villages
· Cluster 2: 1,084 villages
· Cluster 3: 447 villages
· Cluster 4: 1,594 villages
Based on the sum of the centroid values calculated with the equation above and the number of villages in each cluster, the village status obtained from the k-means grouping with the Manhattan distance calculation method is shown in Table 4.

Euclidean Distance
Using the k-means algorithm with the Euclidean distance calculation method to group the 2014 Podes data in West Java Province, which covers 5,319 villages, the number of villages in each cluster is as follows:
· Cluster 0: 1,061 villages
· Cluster 1: 567 villages
· Cluster 2: 1,557 villages
· Cluster 3: 1,440 villages
· Cluster 4: 694 villages
Based on the sum of the centroid values calculated with the equation above and the number of villages in each cluster, the village status obtained from the k-means grouping with the Euclidean distance calculation method is shown in Table 5.

Chebyshev Distance
Based on the sum of the centroid values calculated with the equation above and the number of villages in each cluster, the village status obtained from the k-means grouping with the Chebyshev distance calculation method is shown in Table 6.

B. Testing the proposed
The accumulated time is measured by executing each distance calculation method 5 times. The 5 executions are then averaged to obtain the execution time for each distance calculation method. The tests show that the execution times differ between methods. The execution times from the tests of the Manhattan, Euclidean and Chebyshev distance calculation methods can be seen in Figure 1.

Figure 1. Execution Time
In Figure 1, it can be seen that the execution times of the Manhattan distance method for tests 1 to 5 are 3 seconds, 3 seconds, 2 seconds, 2 seconds and 2 seconds, respectively, giving an average execution time of 2.4 seconds. The execution times of the Euclidean distance method for tests 1 to 5 are 1 second, 2 seconds, 2 seconds, 1 second and 2 seconds, respectively, giving an average execution time of 1.6 seconds. The execution time of the Chebyshev distance method is 1 second in each of tests 1 to 5, giving an average execution time of 1 second. The execution times required by the Manhattan, Euclidean and Chebyshev methods are summarized in Table 7.

In this study, the Davies-Bouldin Index (DBI) is used to validate the data in each cluster. Measurement using DBI aims to maximize the inter-cluster distance. Under DBI, a clustering scheme is considered optimal if it has the minimum Davies-Bouldin Index. The Davies-Bouldin Index values obtained from the tests of the Manhattan, Euclidean and Chebyshev methods are shown in Figure 2. In Figure 2, it can be seen that the Davies-Bouldin Index of the Manhattan method is 0.926, that of the Euclidean method is 0.886 and that of the Chebyshev method is 0.990. These values are summarized in Table 8.

From the tests that have been carried out, it can be seen that, for grouping the 2014 Village Potential data in West Java Province with the k-means algorithm, the Chebyshev distance calculation method has the most efficient accumulated time compared to Manhattan and Euclidean, while the Euclidean method has the most optimal Davies-Bouldin Index compared to the Manhattan and Chebyshev methods.
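The averages and the two comparisons reported above can be re-derived with a few lines of arithmetic. The sketch below only re-uses the numbers stated in this section (execution times, Davies-Bouldin values and the cluster sizes reported for Manhattan and Euclidean); the variable names are illustrative assumptions.

```python
from statistics import mean

# Execution times in seconds for tests 1 to 5, as reported in this section.
execution_times = {
    "Manhattan": [3, 3, 2, 2, 2],
    "Euclidean": [1, 2, 2, 1, 2],
    "Chebyshev": [1, 1, 1, 1, 1],
}
# Davies-Bouldin Index values as reported in this section.
davies_bouldin_index = {"Manhattan": 0.926, "Euclidean": 0.886, "Chebyshev": 0.990}
# Cluster sizes as reported for the two methods whose counts are listed.
cluster_sizes = {
    "Manhattan": [1005, 1189, 1084, 447, 1594],
    "Euclidean": [1061, 567, 1557, 1440, 694],
}

average_time = {method: mean(times) for method, times in execution_times.items()}
fastest = min(average_time, key=average_time.get)
best_quality = min(davies_bouldin_index, key=davies_bouldin_index.get)

print(average_time)   # averages of 2.4, 1.6 and 1 seconds
print(fastest)        # Chebyshev
print(best_quality)   # Euclidean

# Each method's cluster sizes should account for all 5,319 villages.
for method, sizes in cluster_sizes.items():
    print(method, sum(sizes))  # 5319 for both
```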
Therefore, judged by cluster quality according to the Davies-Bouldin Index, the village status clusters obtained from the k-means algorithm with the Euclidean distance calculation method are as follows:
- Very Disadvantaged Village cluster: 694 villages
- Disadvantaged Village cluster: 567 villages
- Developing Village cluster: 1,440 villages
- Advanced Village cluster: 1,557 villages
- Independent Village cluster: 1,061 villages

IV. CONCLUSION
From the discussion and evaluation in the previous chapters on grouping the 2014 Village Potential data in West Java Province into 5 groups using the k-means algorithm with the Manhattan, Euclidean and Chebyshev distance calculation methods, the conclusions are:
1. The 2014 Village Potential data in West Java Province has been grouped into 5 village statuses, with the following number of villages per cluster: 694 very disadvantaged villages, 567 disadvantaged villages, 1,440 developing villages, 1,557 advanced villages and 1,061 independent villages.