Week 9
This week, I worked on creating a GitHub repository that lets people replicate our work conveniently. It also serves as a place to store most of my progress, since my internship is nearing its end. Along with the many scripts I wrote during the internship, I added the crossfilter data generator script that I used last week. Instead of just storing the raw script, I wrote a bash script that runs every step of the dataset generation in the correct order for the convenience of users. All the user needs to do is specify the input path, the metadata path, and the output path in the bash file. I also wrote a README with instructions on how to generate the data the same way our team did.

In addition, I created a generator that extracts summary statistics from the data as a whole, such as its mean and standard deviation, because they are required later for generating normalized arguments to operations. Because a similar process already exists in the crossfilter data generator script, I decided to reuse some of the code written by Battle et al. This generator has three options: i) returning the statistics in exactly the format the benchmark tester takes as input (not really human-readable); ii) returning the statistics in a more human-readable form; iii) saving the result as JSON, because the generator is written in Python (like the crossfilter data generator) while our benchmarker is written in JavaScript.
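To give a rough idea of what the three output options look like, here is a minimal Python sketch. It is not the actual generator: the function names (`summarize`, `render`, `save_json`), the mode names, the single-line JSON used for the "compact" format, and the choice of population standard deviation are all assumptions made for illustration.

```python
import json
import statistics


def summarize(rows):
    """Return {column: {"mean": ..., "std": ...}} for every numeric column.

    `rows` is a list of dicts, one per record. Non-numeric values are skipped.
    Population standard deviation is an assumption; the real script may differ.
    """
    columns = {}
    for row in rows:
        for name, value in row.items():
            # bool is a subclass of int, so exclude it explicitly
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                columns.setdefault(name, []).append(float(value))
    return {
        name: {
            "mean": statistics.fmean(values),
            "std": statistics.pstdev(values),
        }
        for name, values in columns.items()
    }


def render(stats, mode="compact"):
    """Return the statistics as a string in one of two hypothetical formats."""
    if mode == "compact":
        # Option i: dense, machine-oriented output (format is assumed)
        return json.dumps(stats, separators=(",", ":"), sort_keys=True)
    if mode == "pretty":
        # Option ii: one readable line per column
        return "\n".join(
            f"{name}: mean={s['mean']:.3f}, std={s['std']:.3f}"
            for name, s in sorted(stats.items())
        )
    raise ValueError(f"unknown mode: {mode!r}")


def save_json(stats, path):
    """Option iii: write the statistics to a JSON file that JavaScript can load."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(stats, handle, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical sample records, not taken from the real dataset
    sample = [
        {"delay": 1.0, "distance": 10, "carrier": "AA"},
        {"delay": 3.0, "distance": 30, "carrier": "UA"},
    ]
    stats = summarize(sample)
    print(render(stats, "compact"))
    print(render(stats, "pretty"))
    save_json(stats, "stats.json")
```

Writing JSON is what bridges the two languages here: the Python side produces the file, and the JavaScript side can read it with `JSON.parse` without any custom parsing.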