PM Martijn

Preprocessing your data

Preprocessing is the general step of transforming your data so that it is easier to work with. The kind of preprocessing you need therefore depends on your input data and your specific research context. It is common to need several preprocessing steps in one analysis, and at the start of a project you will generally not know exactly which ones. Splitting your preprocessing into separate steps also tends to make it more manageable and more reusable. If you are creating your own machine-readable dataset from documents in PDF, DOCX or similar formats, extracting the (relevant) text from these files will generally be your first preprocessing step. Alternatively, you may already have a dataset of machine-readable court judgements but want to look at specific segments, in which case you will need to split the judgements up accordingly.
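As an illustration of splitting preprocessing into small, reusable steps, here is a minimal Python sketch. It assumes the text has already been extracted from the source files and that each judgement marks its segments with a heading on its own line. The heading names (FACTS, ASSESSMENT, DECISION), the function names and the sample text are all hypothetical; real judgements will need their own markers and probably more robust matching.

```python
import re
from typing import Callable

# Hypothetical section headings; real judgements will use different markers.
SECTION_HEADINGS = ("FACTS", "ASSESSMENT", "DECISION")


def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces and tabs and trim every line."""
    lines = [re.sub(r"[ \t]+", " ", line).strip() for line in text.splitlines()]
    return "\n".join(lines)


def drop_empty_lines(text: str) -> str:
    """Remove lines that are empty."""
    return "\n".join(line for line in text.splitlines() if line)


def run_steps(text: str, steps: list[Callable[[str], str]]) -> str:
    """Apply each preprocessing step in order, so steps can be reused or swapped."""
    for step in steps:
        text = step(text)
    return text


def split_into_segments(text: str) -> dict[str, str]:
    """Split a cleaned judgement into segments keyed by heading.

    Any text before the first recognised heading is ignored.
    """
    segments: dict[str, list[str]] = {}
    current = None
    for line in text.splitlines():
        if line in SECTION_HEADINGS:
            current = line
            segments[current] = []
        elif current is not None:
            segments[current].append(line)
    return {heading: "\n".join(body) for heading, body in segments.items()}


# Made-up sample text, for illustration only.
raw = (
    "FACTS\n  The claimant   filed suit. \n\n"
    "ASSESSMENT\nThe claim\tis admissible.\n\n"
    "DECISION\nThe claim is granted.\n"
)

clean = run_steps(raw, [normalize_whitespace, drop_empty_lines])
segments = split_into_segments(clean)
print(segments["ASSESSMENT"])
```

Because each step is a plain function from text to text, you can add, remove or reorder steps as your understanding of the data improves, and reuse the same steps in another project.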

TODO: 

Last updated: 11-Dec-2024