NIH Deeplesion, auto download version #1224
…nually download data
…ed manual_down_instruction
The tests are failing with: |
fixed |
add comments and make minor changes to the code for readability
delete empty line; remove unused line
Thanks for the review! @cyfra |
Hey @jason-zl190 - unfortunately this dataset turns out to be quite large, so in such cases we'd like it to use the 'iter_archive' approach rather than 'download_and_extract'. iter_archive is more efficient because the data is read only once. |
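To illustrate the "read only once" idea, here is a minimal standard-library sketch that streams the members of a zip archive in a single pass without extracting anything to disk. It is not the TFDS implementation of iter_archive; the helper names `iter_zip_members` and `_demo_archive` and the file names in the demo archive are hypothetical.

```python
import io
import zipfile


def iter_zip_members(archive):
    """Yield (member_name, file_object) pairs in one sequential pass.

    `archive` is a path or a binary file-like object. Nothing is
    extracted to disk; each member is read straight from the archive.
    """
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # skip directory entries
            with zf.open(info) as member:
                yield info.filename, member


def _demo_archive():
    # Build a tiny in-memory archive (hypothetical file names and contents).
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("Images_png/000001_01_01/103.png", b"fake-png-1")
        zf.writestr("Images_png/000001_01_01/104.png", b"fake-png-2")
    buf.seek(0)
    return buf


# Each member must be read while the generator is still positioned on it.
members = [(name, f.read()) for name, f in iter_zip_members(_demo_archive())]
```

The trade-off is that members arrive in archive order, so any filtering (for example by split) has to happen inside the loop.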
Hi @cyfra, thanks for the review and the notice. My concern is that every zip archive will be iterated three times if the archives can't be extracted. This is because the archives aren't organized by split, so I need to iterate over all of them once to collect the training data, once for the validation data, and once for the test data. However, I can refactor the code using … Another concern is that I plan to provide contextual images in the next version. However, I got … |
@jason-zl190 does NIH predefine a validation/test split? |
Hi @Ouwen. They do. They provide a CSV file named "DL_info.csv" in which each key slice is assigned to a split. However, the zip files they provide were compressed and divided up in sequential order, not by split, so the images of each split are randomly scattered across those archives. |
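One possible way to cope with splits scattered across archives is to build a file-to-split index from the CSV first and then bucket the archive members by split from the zip directory listings. The sketch below assumes the CSV has columns called `File_name` and `Train_Val_Test`, with values 1, 2, 3 meaning train, validation, test; those column names, the encoding, and all helper and file names are assumptions for illustration, not something confirmed in this thread.

```python
import csv
import io
import zipfile
from collections import defaultdict

# Assumed encoding of the split column (not confirmed in this thread).
SPLIT_NAMES = {"1": "train", "2": "validation", "3": "test"}


def load_split_index(csv_text):
    """Map each file name to its split, using the assumed column names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["File_name"]: SPLIT_NAMES[row["Train_Val_Test"]] for row in reader}


def bucket_members_by_split(archives, split_index):
    """Group archive member names by split using only the zip listings."""
    buckets = defaultdict(list)
    for archive in archives:
        with zipfile.ZipFile(archive) as zf:
            for name in zf.namelist():
                split = split_index.get(name)
                if split is not None:  # ignore files not listed in the CSV
                    buckets[split].append(name)
    return dict(buckets)


def _make_zip(names):
    # Tiny in-memory archive with placeholder contents (hypothetical names).
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"x")
    buf.seek(0)
    return buf


csv_text = "File_name,Train_Val_Test\na.png,1\nb.png,2\nc.png,3\nd.png,1\n"
archives = [_make_zip(["a.png", "c.png"]), _make_zip(["b.png", "d.png"])]
buckets = bucket_members_by_split(archives, load_split_index(csv_text))
```

Listing member names reads only each archive's directory, not the image data, so the index can be built once and reused by every split.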
@jason-zl190 - first, thanks a lot for your contribution - this is quite a large dataset, so it is bound to be challenging ;-) What I'd suggest is: b) for the contextual images part: I'd tackle it in a separate PR. As you've mentioned, there are a couple of challenges (including the size of the records). One thing that you could do now is to make sure that you include the 'series' name somewhere in the record (currently you include only the file name, from what I see). |
refactor the code to read data directly from archives
@cyfra Hi, thanks for the suggestions. I changed the code to read data directly from the zip files. However, I didn't use |
change version to 1.0.0
@jason-zl190 will do. |
correct user.name and user.email in the commit history