Reputation
Badges 1
25 × Eureka!The odd thing it was able to authenticate but then it could not find the Task to delete.
Could it be someone already deleted the Task ?
(BTW: a new version of the cleanup service is in the working π )
Hi JitteryCoyote63
Signal 9 is killed signal, could it be someone killed the process ? Do you have other logs to share ? Is this reproducible ?
That makes total sense, this is exactly an OS scenario for signal 9 π
By your description it seems to make no difference whether I added the files via sync or add, since I will have to create a new dataset either way.
Sync is design to take a local folder/s and add/remove files from a dataset based on the local changes (it does that automatically based on file existence / content)
The changes (i.e. added files) are uploaded as delta changes relative to the parent version, this means we are not always uploading all files.
Add on the other hand means you...
Hi SkinnyPanda43
Every "commit" is a new version, so sync changes you need to either create a new version (with parent version as the previous one), and sync the local folder (or manually add/remove files).
If you do not need to actually store the "current" version, you can just reset the Task, and sync it again.
wdyt?
I might gave an idea, could you test with:
` from clearml import Task
Task._report_subprocess_enabled = False
...
real code here `
FreshReindeer51
Could you provide some logs ?
Follow up: I see that if I move an Experiment to a new project, it does not copy the associated model files and must be done manually.Β Once I moved the models to the new project, the query works as expected.
Correct π
Nice catch!
(I think it is the empty config file)
Well done man!
PompousParrot44 That should be very easy to do, basically a service mode code that clones a base task and puts it into a queue:
This should more or less do what you need :)
` from trains import Task
task = Task.init('devops', 'daily train', task_type='controller')
stop the local execution of this code, and put it into the service queue, so we have a remote machine running it.
task = execute_remotely('services')
while True:
a_task = Task.clone(base_task_id='aaabb111')
Task.enqueu...
Hi SlipperyDove40
plotly is about 4Mb... trains about 0.5MB what'd the breakdown of the packages ? This seems far away from 250Mb limit
I can raise this as an issue on the repo if that is useful?
I think this is a good idea, at least increased visibility π
Please do π
Hi RobustRat47
What do you mean by "log space for hyperparameter" , what would be the difference ? (Notice that on the graph itself you can switch to log scale when viewing in the UI) ?
Or are you referring to the hyper parameter optimization, allowing you to add log space ?
Yup, I just wanted to mark it completed, honestly. But then when I run it, Colab crashes.
task.close()
will do that
BTW what's the exception you are getting ?
pip cache & git cache & venvs cache
Are all supported, you just need to map the folders.
If you do not want to spin a PVC with NFS mount, you can just mount an S3 bucket with s3fs as part of the container extra bash script,
https://github.com/allegroai/clearml-agent/blob/b39b54bbafab39e6731cb742fdf317bc6dcae54a/docs/clearml.conf#L140
s3 FUSE fuse filesystems:
https://github.com/kahing/goofys
https://github.com/s3fs-fuse/s3fs-fuse
WDYT?
Hi JollyChimpanzee19
What are the versions (clearml , TF , PT), also could you add one more line from the stack (I.e. which call triggered the exception)
Hmm do you host it somewhere? Is it pre-installed on the container?
You can however change the prefix, and you can always have access to these links.
Any reason for controlling the exact output destination ?
(BTW: You can manually upload via StorageManager, and then register the uploaded link)
And when exactly are you getting the "user aborted" message)?
How do you start the process (are you manually running it, or is it an agent, or maybe pycharm?)
Can you provide the full log ?
GreasyPenguin14 you mean the artifacts/models ?
Okay, this is odd the request returned exactly 100 out 100.
It seems not all of them were reported?!
Could you post the toy code, I'll check what's going on.
I think I found something, let me test my theory
Let me rerun the code and check
Okay ConfusedPig65 I found the problem. For some reason the latest TF.keras.load_model . save_model is not tracked.
I'll make sure we push a fix later today
Hmm, I think it is this line:
WARNING - Model configuration only supports dictionary or string objects
done
Let me check something.
ConfusedPig65 could you send the full log (console) of this execution?
okay, let me know if it works