| Attribute | Description |
| --- | --- |
| Data_id_num | A number that matches the name of the CV file and uniquely identifies each row of data. |
| Blind_resume | The name of the anonymized CV file. |
| Position | The target position/title required by the relevant consulting service or project. |
| Project_and_role_detail_description | A description of the expected expertise, competencies, past experience, and project-specific requirements for the position. |
| Client_name | The anonymized name of the client. |
| Client_Industry | The client's industry. |
| Label | A binary label that indicates whether the match is correct. |
| Technologies | Technologies in which the candidate has expertise and actively uses in their projects. |
| Education | The university and department the candidate graduated from. |
| Experience_work_experience | The candidate's previous companies, positions, employment dates, and key responsibilities. |
| Projects | The projects in which the candidate has participated. |
| Similarity | The semantic similarity score between the Position + Project_and_role_detail_description text and the CV text (float, between 0 and 1). |
| CV_word_count | Total word count of the CV text. |
| Common_kw_count | Number of words from the position + role description text that also appear in the CV. |
| Position_kw_in_cv | Number of times the words in the position title appear in the CV. |
Table 1: Attributes and their descriptions
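The count-based features in Table 1 can be sketched as follows. The paper does not specify how the Similarity score is computed, so TF-IDF cosine similarity is used here purely as a stand-in for the semantic similarity model; the tokenization (whitespace splitting) is likewise an illustrative assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def extract_features(position: str, role_detail: str, cv_text: str) -> dict:
    """Sketch of the Table 1 features; similarity model is a stand-in."""
    query = f"{position} {role_detail}"

    # Similarity: TF-IDF cosine similarity as a placeholder for the
    # (unspecified) semantic similarity score in [0, 1].
    tfidf = TfidfVectorizer().fit_transform([query, cv_text])
    similarity = float(cosine_similarity(tfidf[0:1], tfidf[1:2])[0, 0])

    cv_words = cv_text.lower().split()
    query_words = set(query.lower().split())
    position_words = set(position.lower().split())

    return {
        "Similarity": similarity,
        "CV_word_count": len(cv_words),
        # Distinct position + role words that also appear in the CV.
        "Common_kw_count": len(query_words & set(cv_words)),
        # Total occurrences of position-title words in the CV.
        "Position_kw_in_cv": sum(cv_words.count(w) for w in position_words),
    }
```

A query/CV pair with overlapping vocabulary yields a similarity near 1 and nonzero keyword counts, mirroring how the dataset rows pair a position description with a CV.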
| Model | Hyperparameter Range |
| --- | --- |
| SVM | "C_Values": [0.1 - 100] |
| LightGBM | "Verbosity": [-1], "Seed": [42], "Num_Boost_Round": [100], "Num_Leaves": [15, 31, 63], "Min_Child_Samples": [5, 10, 20], "Max_Depth": [-1, 5, 10], "Learning_Rate": [0.01, 0.05, 0.1], "N_Estimators": [50, 100, 200], "Min_Split_Gain": [0, 0.01, 0.1] |
| LR | "Random_State": [42], "Max_Iter": [100 - 1000], "C": [0.01 - 100] |
| ELM | "N_Hidden": [50], "Random_State": [42] |
| Voting | LR ("C": [10], "Max_Iter": [1000], "Random_State": [42]); SVM ("C": [1.0], "Random_State": [42]); LightGBM ("Learning_Rate": [0.01], "Max_Depth": [-1], "Min_Child_Samples": [20], "Min_Split_Gain": [0], "N_Estimators": [100], "Num_Leaves": [15], "Random_State": [42]) |
| Bagging | LR ("N_Estimators": [50]); SVM and LightGBM ("N_Estimators": [20], "Random_State": [42], "N_Jobs": [-1]) |
| Boosting | AdaBoost - LR ("C": [10], "Max_Iter": [1000], "Random_State": [42]) with "N_Estimators": [20], "Learning_Rate": [1.0]; LightGBM ("Learning_Rate": [0.01], "Max_Depth": [-1], "Min_Child_Samples": [20], "Min_Split_Gain": [0], "N_Estimators": [100], "Num_Leaves": [15], "Random_State": [42]) |
| Stacking | LR ("C": [10], "Max_Iter": [1000]); SVM ("C": [1.0]); LightGBM ("Learning_Rate": [0.01], "Max_Depth": [-1], "Min_Child_Samples": [20], "Min_Split_Gain": [0], "N_Estimators": [100], "Num_Leaves": [15]) |
Table 2: Model hyperparameter ranges
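A search over the ranges in Table 2 can be sketched with scikit-learn's `GridSearchCV`, shown here for the LR row only. The concrete grid values ([0.01, 0.1, 1, 10, 100] for C and [100, 500, 1000] for Max_Iter) are an interpolation of the stated ranges, not values confirmed by the source, and the dataset is a synthetic stand-in.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the CV-matching feature matrix and labels.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

param_grid = {
    "C": [0.01, 0.1, 1, 10, 100],   # assumed steps over "C": [0.01 - 100]
    "max_iter": [100, 500, 1000],   # assumed steps over "Max_Iter": [100 - 1000]
}
search = GridSearchCV(
    LogisticRegression(random_state=42),  # "Random_State": [42]
    param_grid,
    scoring="f1",
    cv=5,
)
search.fit(X, y)
print(search.best_params_)
```

The same pattern applies to the SVM and LightGBM rows by swapping in the corresponding estimator and grid.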
| Method | Model Type | Precision | Recall | Accuracy | F1-Score |
| --- | --- | --- | --- | --- | --- |
| SVM | Model 1 | 0.6333 | 0.6333 | 0.6271 | 0.6333 |
| SVM | Model 2 | 0.7037 | 0.6333 | 0.6780 | 0.6667 |
| SVM | Model 3 | 0.7241 | 0.7000 | 0.7118 | 0.7118 |
| SVM | Model 4 | 0.7586 | 0.7333 | 0.7458 | 0.7458 |
| SVM | Model 5 | 0.5946 | 0.7333 | 0.6102 | 0.6567 |
| LightGBM | Model 1 | 0.6786 | 0.6333 | 0.6610 | 0.6552 |
| LightGBM | Model 2 | 0.7931 | 0.7767 | 0.7797 | 0.7797 |
| LightGBM | Model 3 | 0.7812 | 0.8333 | 0.7966 | 0.8065 |
| LR | Model 1 | 0.7000 | 0.7000 | 0.6949 | 0.7000 |
| LR | Model 2 | 0.7778 | 0.7000 | 0.7458 | 0.7368 |
| LR | Model 3 | 0.7667 | 0.7667 | 0.7627 | 0.7667 |
| ELM | Model 1 | 0.7857 | 0.7333 | 0.7627 | 0.7586 |
| Voting | Hard Voting | 0.8276 | 0.8000 | 0.8136 | 0.8136 |
| Voting | Soft Voting | 0.8462 | 0.7333 | 0.7960 | 0.7857 |
| Bagging | LightGBM | 0.8065 | 0.8333 | 0.8136 | 0.8197 |
| Bagging | SVM | 0.8000 | 0.8000 | 0.7966 | 0.8000 |
| Bagging | LR | 0.8214 | 0.7667 | 0.7966 | 0.7931 |
| Boosting | Boosting - LightGBM | 0.7143 | 0.8333 | 0.7458 | 0.7692 |
| Boosting | AdaBoost - LR | 0.7500 | 0.7000 | 0.7288 | 0.7241 |
| Stacking | LR + SVM + LightGBM | 0.7500 | 0.8000 | 0.7627 | 0.7742 |
Table 3: The results obtained with the developed models
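The hard- vs soft-voting comparison in Table 3 can be sketched as below, using only the LR and SVM base learners with the Table 2 voting parameters; the LightGBM member is omitted here to keep the sketch dependency-free, and the data is a synthetic stand-in, so the scores will not reproduce Table 3.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data; the real study uses CV-matching features.
X, y = make_classification(n_samples=300, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

estimators = [
    ("lr", LogisticRegression(C=10, max_iter=1000, random_state=42)),
    # probability=True is required so SVC can contribute to soft voting.
    ("svm", SVC(C=1.0, probability=True, random_state=42)),
]
for voting in ("hard", "soft"):
    clf = VotingClassifier(estimators=estimators, voting=voting).fit(X_tr, y_tr)
    print(voting, f1_score(y_te, clf.predict(X_te)))
```

Hard voting takes the majority class label, while soft voting averages the members' predicted probabilities, which is why the two rows in Table 3 can rank differently on precision and recall.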