Hi DilapidatedDucks58,
Just making sure: do all 8 workers have different worker IDs? (You should see all 8 on the workers page in the UI.)
Also, are they running in docker or venv mode?
Could you verify that you have 8 subfolders named 'venv.X' in the cache folder ~/.trains?
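If it's easier than browsing by hand, here is a minimal sketch for counting those folders. It assumes the 'venv.X' folders sit directly under the folder you pass in; `list_venv_dirs` is just a hypothetical helper name, and you may need to point it at a subfolder of ~/.trains depending on your setup.

```python
from pathlib import Path


def list_venv_dirs(cache_dir):
    """Return the sorted names of 'venv.X' subfolders directly under cache_dir."""
    root = Path(cache_dir).expanduser()
    if not root.is_dir():
        # Missing folder: report nothing found instead of raising.
        return []
    # Keep only directories; a plain file named 'venv.3' is ignored.
    return sorted(p.name for p in root.glob("venv.*") if p.is_dir())


if __name__ == "__main__":
    # Assumed location, taken from the message above; adjust if yours differs.
    found = list_venv_dirs("~/.trains")
    print(f"{len(found)} venv folders found: {found}")
```

With 8 workers in venv mode you would expect 8 entries in the printed list.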
It might be that there is not enough space on our SSD; experiments cache a lot of preprocessed data during the first epoch...
If that's the case, check the free space in the experiment's monitoring graphs; the free disk space is logged there in GB.
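You can also cross-check directly on the machine. This is only a rough local sketch using Python's standard library, not the way the monitoring value itself is produced; `free_space_gb` is a hypothetical helper name, and the path to check is an assumption (use whichever disk holds the cache).

```python
import os
import shutil


def free_space_gb(path):
    """Return free space, in GB (10**9 bytes), on the filesystem containing path."""
    return shutil.disk_usage(path).free / 1e9


if __name__ == "__main__":
    # Assumption: the cache lives on the same disk as the home folder.
    home = os.path.expanduser("~")
    print(f"Free space on the disk holding {home}: {free_space_gb(home):.1f} GB")
```

Running it once before the first epoch and once after should show how much the cached data takes.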