Hi DilapidatedDucks58,
Just making sure, do all 8 workers have different worker ids? (you should see 8 in the workers page in the UI)
Also, are they running in docker or venv mode?
It might be that there is not enough space on your SSD; experiments cache a lot of preprocessed data during the first epoch...
Could you verify you have 8 subfolders named 'venv.X' in the cache folder ~/.trains ?
If that's the case, check the free space in the monitoring of the experiment; you will find the free space in GB logged there.
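If it helps, here is a rough sketch for checking both things directly on the machine. It only uses the Python standard library, and it assumes the 'venv.X' folders sit directly under ~/.trains (adjust the path if your cache is configured elsewhere). The helper names list_venv_dirs and free_space_gb are just made up for this example, they are not part of trains.

```python
import shutil
from pathlib import Path


def list_venv_dirs(cache_root):
    """Return the sorted names of 'venv.X' subfolders directly under cache_root."""
    root = Path(cache_root).expanduser()
    if not root.is_dir():
        return []
    return sorted(
        p.name for p in root.iterdir()
        if p.is_dir() and p.name.startswith("venv.")
    )


def free_space_gb(path):
    """Return the free space, in GB, on the disk that holds path."""
    return shutil.disk_usage(Path(path).expanduser()).free / 1024 ** 3


if __name__ == "__main__":
    # Assumption: the cache folder is ~/.trains, as mentioned above.
    cache_root = Path("~/.trains").expanduser()
    venvs = list_venv_dirs(cache_root)
    print(f"found {len(venvs)} venv folders: {venvs}")

    # Fall back to the home folder if the cache folder does not exist yet.
    target = cache_root if cache_root.is_dir() else Path.home()
    print(f"free space on {target}: {free_space_gb(target):.1f} GB")
```

With 8 workers in venv mode you would expect 8 folders in the first line of output. The free space number is only a snapshot, so it is worth running it again during the first epoch while the cache is filling up.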