I think there is a bug in the UI that causes series with "." in their names to use only the first part of the name for color selection. This means "epsilon 0" and "epsilon 0.1" will always get the same color, which would explain why it works on other graphs
HighOtter69 I was able to change the color individually without an issue. What's your clearml-server version? Are you using the community server?
HighOtter69 , let me check something
HighOtter69 inside the legend, click the color rectangle next to the series name; you can change the color of the series on the graph. This preference is stored, so it will always remember your color choices (yes, even when logging from another machine 🙂)
HighOtter69
Could you test with the latest RC? I think this fixed it:
https://github.com/allegroai/clearml/issues/306
the separate experiments are not starting back at iteration 0
What do you mean by that?
We workaround the issue by downloading the file with a request and unzipping only when needed.
We have located the issue: it seems the file-server is changing the header when sending back the file (basically declaring CSV with gzip compression, which in turn causes any HTTP download client to automatically unzip the content). Working on a hotfix for it 🙂
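For reference, the workaround above can be sketched roughly like this: fetch the raw bytes and decompress only when the payload actually starts with the gzip magic number, instead of trusting the server's Content-Encoding header (this is a generic sketch, not the actual fix):

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"

def maybe_decompress(payload: bytes) -> bytes:
    """Return the payload as-is unless it is gzip-compressed,
    in which case decompress it first."""
    if payload[:2] == GZIP_MAGIC:
        return gzip.decompress(payload)
    return payload

# Works the same whether or not the server compressed the body
raw = b"col_a,col_b\n1,2\n"
assert maybe_decompress(raw) == raw
assert maybe_decompress(gzip.compress(raw)) == raw
```

With `requests`, reading `response.raw` (with `stream=True`) also skips the automatic decompression that a mislabeled gzip header would otherwise trigger.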
YummyMoth34
It tried to upload all events and then killed the experiment
Could you send a log?
Also, what's the trains package version?
Hi YummyMoth34 they will keep on trying to send reports.
I think they try for at least several hours.
No, I mean actually compare using the UI, maybe the arguments are different or the "installed packages"
BattyLion34
if I simply clone nntraining stage and run it in default queue - everything goes fine.
When you compare the Task you clone manually and the Task created by the pipeline, what's the difference?
BattyLion34 are you saying you do not have the "APP CREDENTIALS" section in the profile page?
Full markdown edit on the project so you can create your own reports and share them (you can also put links to the experiments themselves inside the markdown). Notice this is not per experiment reporting (we kind of assumed maintaining a per experiment report is not realistic)
Sure thing! This feature is all you guys; ask and you shall receive 🙂
😞 It's working as expected for me...
That said, I tested on Linux & pip.
Any specific requirements to test with? From the log I see this is conda on Windows; are you using the base conda env or a venv inside conda?
but this gives me an idea, I will try to check if the notebook is considered as trusted, perhaps it isn't and that causes issues?
This is exactly what I was thinking (communication with the jupyter service is done over http, to localhost, sometimes AV/Firewall software will block it, false-positive detection I assume)
I did not start with python -m, as a module. I'll try that
I do not think this is the issue.
It sounds like anything you do on your specific setup will end with the same error, which might point to a problem with the git/folder ?
Nice! I'll see if we can have better error handling for it, or solve it altogether 🙂
Great, but if this is what you do, how come you need to change the entry script in the UI?
I am logging debug images via TensorBoard (via the add_image function), however apparently these debug images are not collected within the fileserver.
ZanyPig66 what do you mean not collected to the file server? Are you saying the TB add_image is not automatically uploading images? Or that you cannot access the files on your files server?
And you are calling Task.init? And the scalars show under scalars and the images are not under debug samples?
OddShrimp85
the Task ID is a UUID that is generated by the backend server; there is no real way to force it to have a specific value 😞
EnviousPanda91 notice that when passing these arguments to clearml-agent you are actually passing default args; if you want an additional argument to always be used, set extra_docker_arguments
here:
https://github.com/allegroai/clearml-agent/blob/9eee213683252cd0bd19aae3f9b2c65939d75ac3/docs/clearml.conf#L170
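For example, the relevant section of clearml.conf could look like this (the argument values below are placeholders, not a recommendation):

```
agent {
    # docker arguments that will always be added to the docker run command,
    # regardless of what the Task or agent command line passes
    extra_docker_arguments: ["--ipc=host", "-v", "/mnt/data:/mnt/data"]
}
```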
it will only if oom killer is enabled
True, but you will still get an OOM (I believe). I think the main issue is that even from inside the container, when you query the memory, you see the entire machine's memory... I'm not sure what we can do about that
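A minimal sketch of that discrepancy: tools reading /proc/meminfo inside a container report the host's total memory, while the container's actual limit lives in the cgroup filesystem. The path below assumes cgroup v1; the helper returns None when the file is absent (e.g. on cgroup v2 or non-Linux systems):

```python
import os
from typing import Optional

def cgroup_mem_limit_bytes() -> Optional[int]:
    """Read the container's memory limit from the cgroup filesystem.

    /proc/meminfo inside a container still shows the host's total
    memory, which is why in-container memory queries are misleading.
    """
    # cgroup v1 path; cgroup v2 uses /sys/fs/cgroup/memory.max instead
    path = "/sys/fs/cgroup/memory/memory.limit_in_bytes"
    if not os.path.exists(path):
        return None
    with open(path) as f:
        value = f.read().strip()
    return None if value == "max" else int(value)
```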
Hi RotundSquirrel78
How did you end up with this command line?
/home/sigalr/.clearml/venvs-builds/3.8/code/unet_sindiff_1_level_2_resblk --dataset humanml --device 0 --arch unet --channel_mult 1 --num_res_blocks 2 --use_scale_shift_norm --use_checkpoint --num_steps 300000
The arguments passed are odd (there should be none; they are passed inside the execution) and I suspect this is the issue
If you want to change the Args, go to the Args section in the Configuration tab; when the Task is in draft mode you can edit them there
That being said, it returns None for me when I reload a task, but it's probably something on my side.
MistakenDragonfly51 just making sure, you did call Task.init, correct ?
What does
from clearml import Task
task = Task.current_task()
return?
Notice that you need to create the Task before actually calling Logger.current_logger() or Task.current_task()
I ended up using
task = Task.init(continue_last_task=task_id)
to reload a specific task and it seems to work well so far.
Exactly, this will initialize and auto-log the current process into the existing task (task_id). Without the continue_last_task argument it will just create a new Task and auto-log everything to it 🙂