Temp Directory Cleanup Guide
Problem
The temp directory accumulates many intermediate GROMACS files during MD simulation and descriptor computation. Most of these files are only needed during computation and can be safely deleted afterward, significantly reducing storage requirements.
Solution
A cleanup utility has been created that:
- Identifies required files needed for pipeline functionality
- Removes intermediate files that can be regenerated or are no longer needed
- Can be run manually or integrated into the pipeline
Required Files
The pipeline needs these files to function:
Critical Files (Cannot Delete)
- Structure files: {antibody_name}.pdb, processed.pdb, processed.gro, topol.top, index.ndx
- Final trajectories: md_final_{temp}.xtc, md_final_{temp}.gro, md_{temp}.tpr (per temperature)
- Descriptors: descriptors.csv / descriptors.pkl, all *.xvg files, res_sasa_{temp}.np, sconf_{temp}.log
Optional Files (Can Delete but Useful)
- Order parameter CSVs: order_s2_*.csv, order_lambda_*.csv (can be regenerated)
Files That Can Be Deleted
- GROMACS backup files: #*# pattern (e.g., #aver.xvg.1#, #aver.xvg.10#). These are backups created when GROMACS overwrites files.
- Intermediate trajectories: md_whole_*.xtc, md_nopbcjump_*.xtc, md_{temp}.xtc
- Equilibration files: nvt_*.gro, nvt_*.xtc, npt_*.gro, npt_*.xtc, etc.
- System setup: box.gro, solv.gro, solv_ions.gro, em.gro, etc.
- Checkpoints: *.cpt files
- Energy files: *.edr files
- Logs: md_*.log (except sconf_*.log)
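The three lists above can be read as a pattern-matching rule. The following is a minimal sketch of that rule, assuming shell-style glob matching; the pattern lists are transcribed from this guide, and the names `classify`, `REQUIRED_PATTERNS`, `OPTIONAL_PATTERNS`, and `DELETABLE_PATTERNS` are hypothetical and may not match what `cleanup_temp_files.py` actually uses.

```python
from fnmatch import fnmatchcase

# Patterns transcribed from the lists in this guide (illustrative only).
REQUIRED_PATTERNS = [
    "processed.pdb", "processed.gro", "topol.top", "index.ndx",
    "md_final_*.xtc", "md_final_*.gro", "md_*.tpr",
    "descriptors.csv", "descriptors.pkl", "*.xvg",
    "res_sasa_*.np", "sconf_*.log",
]
OPTIONAL_PATTERNS = ["order_s2_*.csv", "order_lambda_*.csv"]
DELETABLE_PATTERNS = [
    "#*#", "md_whole_*.xtc", "md_nopbcjump_*.xtc", "md_*.xtc",
    "nvt_*.gro", "nvt_*.xtc", "npt_*.gro", "npt_*.xtc",
    "box.gro", "solv.gro", "solv_ions.gro", "em.gro",
    "*.cpt", "*.edr", "md_*.log",
]


def matches_any(filename, patterns):
    """Return True if the filename matches at least one glob pattern."""
    return any(fnmatchcase(filename, pattern) for pattern in patterns)


def classify(filename, antibody_name):
    """Label a file name as 'required', 'optional', 'deletable', or 'unknown'."""
    # GROMACS backups (#name.N#) are checked first so they are never kept.
    if fnmatchcase(filename, "#*#"):
        return "deletable"
    # Required patterns win over deletable ones: md_final_300.xtc also
    # matches the broader md_*.xtc pattern, but it must be preserved.
    if filename == f"{antibody_name}.pdb" or matches_any(filename, REQUIRED_PATTERNS):
        return "required"
    if matches_any(filename, OPTIONAL_PATTERNS):
        return "optional"
    if matches_any(filename, DELETABLE_PATTERNS):
        return "deletable"
    # Anything else is outside the known patterns and should only be reported.
    return "unknown"
```

Checking the required patterns before the deletable ones is the important design choice here, because several deletable globs (such as md_*.xtc) are broad enough to cover required files.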
Usage
Manual Cleanup
# Dry run to see what would be deleted (recommended first step)
python src/cleanup_temp_files.py run_data/run_2/temp \
    --antibody-name "my_antibody" \
    --temperatures 300 350 400 \
    --dry-run

# Actually delete intermediate files
python src/cleanup_temp_files.py run_data/run_2/temp \
    --antibody-name "my_antibody" \
    --temperatures 300 350 400

# Also delete order parameter CSVs
python src/cleanup_temp_files.py run_data/run_2/temp \
    --antibody-name "my_antibody" \
    --temperatures 300 350 400 \
    --delete-order-params
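For readers who want to see how the flags in these commands fit together, here is a rough sketch of an argparse parser that accepts the same command lines. The flag names come from the commands above; the argument types, the `required` settings, and the function name `build_parser` are assumptions, and the real script's parser may differ.

```python
import argparse


def build_parser():
    """Build a parser that accepts the command lines shown in this guide."""
    parser = argparse.ArgumentParser(
        description="Remove intermediate GROMACS files from a temp directory."
    )
    parser.add_argument("temp_dir", help="Path to the temp directory to clean")
    # Assumption: both options are mandatory, since every example passes them.
    parser.add_argument("--antibody-name", required=True)
    parser.add_argument("--temperatures", nargs="+", type=int, required=True)
    parser.add_argument("--dry-run", action="store_true",
                        help="List files without deleting them")
    parser.add_argument("--delete-order-params", action="store_true",
                        help="Also delete order parameter CSVs")
    return parser


# Parse the dry-run example from this guide using an explicit argument list.
example_args = build_parser().parse_args([
    "run_data/run_2/temp",
    "--antibody-name", "my_antibody",
    "--temperatures", "300", "350", "400",
    "--dry-run",
])
```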
Automatic Cleanup (Integrated)
The cleanup is now integrated into the pipeline. Configure it in your YAML config:
performance:
  cleanup_temp: true            # Enable automatic cleanup
  cleanup_after: "inference"    # When to clean up: "descriptors" or "inference"
  delete_order_params: false    # Also delete order param CSVs (default: keep)
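When cleanup_temp: true is set, cleanup runs automatically at the stage named by cleanup_after: after inference completes in the example above, or after descriptor computation if set to "descriptors".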
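One way to picture how these settings gate the cleanup step is the sketch below. It assumes the YAML has already been loaded into a plain dictionary; the function name `should_cleanup` and the fallback defaults (cleanup disabled, stage "inference") are assumptions for illustration, not the pipeline's confirmed behavior.

```python
def should_cleanup(config, stage):
    """Return True if cleanup should run after the given pipeline stage.

    `config` is the parsed YAML config as a dict; `stage` is either
    "descriptors" or "inference".
    """
    performance = config.get("performance", {})
    # Assumed default: cleanup is off unless explicitly enabled.
    if not performance.get("cleanup_temp", False):
        return False
    # Assumed default: clean up after inference when no stage is given.
    return performance.get("cleanup_after", "inference") == stage


config = {
    "performance": {
        "cleanup_temp": True,
        "cleanup_after": "inference",
        "delete_order_params": False,
    }
}
run_after_descriptors = should_cleanup(config, "descriptors")
run_after_inference = should_cleanup(config, "inference")
```

With the config shown, cleanup is skipped after descriptor computation and runs once inference has finished.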
File Size Impact
Before cleanup: ~10-50 GB per antibody (depending on simulation length)
After cleanup: ~1-2 GB per antibody
Savings: 80-95% reduction in storage requirements
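To check these figures against your own runs, the temp directory can be measured before and after cleanup. This is a small sketch using only the standard library; the helper names `dir_size_bytes` and `percent_saved` are hypothetical and not part of the cleanup utility.

```python
from pathlib import Path


def dir_size_bytes(path):
    """Sum the sizes of all regular files under `path`, recursively."""
    return sum(p.stat().st_size for p in Path(path).rglob("*") if p.is_file())


def percent_saved(before_bytes, after_bytes):
    """Return the storage reduction as a percentage of the original size."""
    if before_bytes == 0:
        return 0.0
    return 100.0 * (before_bytes - after_bytes) / before_bytes
```

For instance, a directory that shrinks from 10 GB to 2 GB gives `percent_saved(10, 2) == 80.0`, the low end of the range quoted above.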
Safety Features
- Dry run mode: Always test with --dry-run first
- Required file protection: Never deletes files needed for pipeline
- Suspicious file detection: Warns about files not in known patterns
- Non-fatal errors: Cleanup failures don't stop the pipeline
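The dry-run and non-fatal-error behaviors described above could look roughly like the following. This is an illustrative sketch, not the utility's actual code; the function name `delete_files`, its return value, and the logger name are assumptions.

```python
import logging
from pathlib import Path

logger = logging.getLogger("cleanup")


def delete_files(paths, dry_run=True):
    """Delete the given files, or only report them when dry_run is True.

    Returns (deleted, failed) lists of paths. Errors are logged and
    collected rather than raised, so a failed deletion cannot stop
    the calling pipeline.
    """
    deleted, failed = [], []
    for path in map(Path, paths):
        if dry_run:
            # Dry run: report the file and leave it untouched.
            logger.info("Would delete %s", path)
            continue
        try:
            path.unlink()
            deleted.append(path)
        except OSError as exc:
            # Missing files, permission problems, etc. are non-fatal.
            logger.warning("Could not delete %s: %s", path, exc)
            failed.append(path)
    return deleted, failed
```

Defaulting `dry_run` to True in this sketch means a caller has to opt in explicitly before anything is removed.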
When to Clean Up
Option 1: After Inference (Recommended)
- Cleanup runs automatically after predictions are made
- All required files preserved for re-running inference
- Cannot re-run descriptor computation without re-running MD
Option 2: After Descriptors
- Set cleanup_after: "descriptors" in config
- Preserves ability to re-run inference
- Cannot re-compute descriptors without re-running MD
Option 3: Manual Only
- Set cleanup_temp: false in config
- Run cleanup utility manually when needed
- Full control over when cleanup happens
Troubleshooting
"Missing required files" error after cleanup
- Check that cleanup didn't accidentally delete required files
- Verify antibody name and temperatures match your run
- Restore from backup if needed
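A quick way to carry out the first two checks is to list which expected files are absent for a given antibody name and set of temperatures. The sketch below covers only the structure and trajectory files whose exact names are given in this guide (descriptor outputs are left out because some are specified only by wildcard); the function name `missing_required_files` is hypothetical.

```python
from pathlib import Path


def missing_required_files(temp_dir, antibody_name, temperatures):
    """Return the names of expected structure/trajectory files not found."""
    temp_dir = Path(temp_dir)
    # Structure files listed under "Critical Files" in this guide.
    expected = [
        f"{antibody_name}.pdb",
        "processed.pdb",
        "processed.gro",
        "topol.top",
        "index.ndx",
    ]
    # Final trajectory files, one set per simulated temperature.
    for temp in temperatures:
        expected += [
            f"md_final_{temp}.xtc",
            f"md_final_{temp}.gro",
            f"md_{temp}.tpr",
        ]
    return [name for name in expected if not (temp_dir / name).exists()]
```

If the returned list names files for a temperature you never simulated, or a .pdb with the wrong antibody name, the arguments rather than the cleanup are the likely cause.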
Cleanup didn't delete expected files
- Check file patterns match your simulation setup
- Some files may be in use (close any open file handles)
- Check file permissions
Want to keep more files
- Modify REQUIRED_FILES in cleanup_temp_files.py
- Or set cleanup_temp: false and clean manually
Files Reference
See src/REQUIRED_FILES.md for complete list of required vs. intermediate files.