As a graduate student in Management Science and Engineering at Columbia University, what would you dream of working on: developing management strategies for a company, or designing a business model for a financial firm? How about an opportunity to contribute to one of the most prominent development organizations? Well, that just came true for a group of students at Columbia University in the form of a Capstone Project for UNDP’s Crisis Bureau. The students got the chance to use data to explore the Humanitarian-Development-Peace Nexus.
Conflicts and UNDP funding in Sub-Saharan Africa
The capstone project set out to answer several research questions about the Sub-Saharan Africa region. The first concerned the relationship between the escalation of violent conflicts and the corresponding shift in UNDP’s peace-related funding. The second concerned how UNDP responds to violent conflicts or adjusts itself ahead of them.
The project drew on two data sources: UNDP Project Data and ACLED conflict event data. The UNDP Project Data tagged projects by Sustainable Development Goal (SDG), but this tagging was incomplete. The team’s major accomplishment was classifying the projects in the dataset into the 17 SDGs. To do so, the team used supervised classification models: Random Forest, CatBoost, SVM, Decision Tree and Naive Bayes. In an initial experiment with these five models, the best model reached 89% accuracy on the training data but only 43% on the test data, a gap that pointed to overfitting. The team applied a machine learning technique called tree pruning to tackle this. After pruning the tree to a simpler version, the team achieved roughly the same performance on training and test data (~70%), which addressed the overfitting problem. To improve the model further, the team used parameter optimization techniques to find the best parameters for the pruned tree. The resulting dataset will reinforce UNDP’s project data repository and enable more granular and precise analytics.
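To make the pruning step concrete, here is a minimal sketch in Python with scikit-learn. It is not the team’s code: the article does not say which pruning method or library was used, so cost-complexity pruning (the `ccp_alpha` parameter), the value 0.01, and the synthetic dataset are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, noisy stand-in for the project data (hypothetical, not UNDP data).
X, y = make_classification(
    n_samples=1000, n_features=20, n_informative=5,
    n_classes=3, flip_y=0.2, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)


def fit_and_score(ccp_alpha):
    """Fit a tree with the given pruning strength.

    Returns (tree, train_accuracy, test_accuracy).
    """
    tree = DecisionTreeClassifier(ccp_alpha=ccp_alpha, random_state=0)
    tree.fit(X_train, y_train)
    return tree, tree.score(X_train, y_train), tree.score(X_test, y_test)


# ccp_alpha=0.0 grows the tree until it memorizes the training set.
full_tree, full_train, full_test = fit_and_score(0.0)
# A positive ccp_alpha (assumed value) removes branches that add little.
pruned_tree, pruned_train, pruned_test = fit_and_score(0.01)

for name, tree, train_acc, test_acc in [
    ("unpruned", full_tree, full_train, full_test),
    ("pruned", pruned_tree, pruned_train, pruned_test),
]:
    print(f"{name}: leaves={tree.get_n_leaves()} "
          f"train={train_acc:.2f} test={test_acc:.2f}")
```

Comparing the train and test accuracy printed for each tree shows whether the gap narrows after pruning. In practice the pruning strength would be chosen by cross-validation rather than fixed by hand, which is one possible reading of the parameter optimization mentioned above.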
The correlation between UNDP project data and conflict events
The next stage was a correlation analysis, in which the team looked for a relationship between development data and crisis data, using the tagged dataset produced in the previous stage. The team compared the budget allocated in a particular area with the number of violent events to determine whether the two were related. This process was carried out individually for each of the 44 countries, and the results were combined into a comprehensive analysis.
The team found that while all types of projects are positively correlated with the following year’s level of violent conflict, most are also positively correlated with the previous year’s. In particular, projects related to the SDGs “No poverty” and “Peace, justice and strong institutions” show a stronger correlation, possibly indicating that these project types are launched pre-emptively.
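The idea of correlating budgets with the following or the previous year’s conflict level can be sketched as a lagged Pearson correlation. The function names and the yearly figures below are hypothetical and chosen only to illustrate the mechanics; they are not the team’s data, code, or results.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(budget, events, lag):
    """Correlate budget in year t with events in year t + lag.

    lag=+1 pairs each budget with the next year's events;
    lag=-1 pairs each budget with the previous year's events.
    """
    n = len(budget)
    if lag >= 0:
        paired_budget, paired_events = budget[:n - lag], events[lag:]
    else:
        paired_budget, paired_events = budget[-lag:], events[:n + lag]
    return pearson(paired_budget, paired_events)


# Made-up yearly series for one country and one SDG (illustrative only).
budget = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]   # budget per year
events = [0, 6, 2, 8, 2, 10]              # violent events per year

next_year = lagged_correlation(budget, events, lag=1)
previous_year = lagged_correlation(budget, events, lag=-1)
print(f"budget vs next year's events:     {next_year:.2f}")
print(f"budget vs previous year's events: {previous_year:.2f}")
```

Repeating this per country and per SDG would give a table of lagged correlations comparable in spirit to the analysis described above. A positive value at either lag indicates association only, not the direction of cause.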
The collaboration with UNDP enabled the students not only to apply what they had learned in their courses at Columbia but also to extend their findings and gain insights into various contemporary and historical crisis situations. The Capstone Project gave the students a chance to learn more about data science in general, as well as a deeper understanding of how to curb these devastating human-made crises.