That said, it might be a different backend, I'll test with the demo server
Basically try with the latest RC 🙂
pip install trains==0.15.2rc0
Hi SmoothSheep78
Do you need to import the previous state of the trains-server, or are you starting from scratch ?
Could you manually configure the ~/trains.conf ?
(Just copy paste the section from the UI)
then try to run:
trains-agent list
corporate firewall... let's start with http 🙂
I think the easiest way is to add another glue instance and connect it with CPU pods and the services queue. I have to admit that it has been a while since I looked at the chart but there should be a way to do that
Hi IcySwallow94
Are you deploying the clearml server with the helm chart ?
Hi @<1544128915683938304:profile|DepravedBee6>
You mean like backup the entire instance and restore it on another machine? Or are you referring to specific data you want to migrate?
BTW if you are upgrading old versions of the server I would recommend upgrading to every version in the middle (there are some migration scripts that need to be run in a few of them)
Hi ShinyPuppy47 ,
Yes that is correct. Use Task.init for automagic logging
Hmm I see your point.
Any chance you can open a github issue with a small code snippet to make sure we can reproduce and fix it?
Hi SpicyOtter88
plt.plot([0, 1], [0, 1], 'r--', label='')
It cannot have a legend without a label, so it gives it an "anonymous" label. I think it should just get "unlabeled 0", wdyt?
Having the ability to pack jobs/tasks onto the same "resource" (underlying server/EC2 instance)
This is essentially a "queue". Basically a queue is a way to abstract a specific type of resource, so that you can achieve exactly what you described.
open up a streaming use case, wherein batch (offline) inference could be done directly inside of a ClearML pipeline in reaction to an event/trigger (like new data landing in your data lake).
Yes, that's exactly how clearml is designed, a...
multiple machines and reporting to the same task.
Out of curiosity , how do you launch it on multiple machines?
reporting to the same task.
So the "funny" thing is, they all report on top of (overwriting) each other...
In order for them to report individually, it might be that you need multiple Tasks (i.e. one per machine)
Maybe we could somehow have prefix with rank on the cpu/network etc?! or should it be a different "title", wdyt?
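A minimal sketch of the one-Task-per-machine idea, assuming each process can read its rank from an environment variable. The `RANK` variable name, the helper `task_name_for_rank`, and the project/task names in the comment are hypothetical, not something ClearML defines:

```python
import os


def task_name_for_rank(base_name, rank=None):
    """Build a distinct task name per machine, e.g. 'train [rank 1]'."""
    if rank is None:
        # Assumption: the launcher exports RANK; fall back to 0 otherwise.
        rank = int(os.environ.get("RANK", "0"))
    return f"{base_name} [rank {rank}]"


# With clearml installed and configured, each machine could then call:
#   from clearml import Task
#   task = Task.init(project_name="examples",
#                    task_name=task_name_for_rank("train"))
```

Each machine would then report its own cpu/network metrics to its own Task instead of overwriting a shared one.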
and the clearml server version ?
The release was supposed to be out this week, got delayed by some py2 support issue, anyhow the release will be almost exactly like the latest we now have on the GitHub repo (and I'm assuming it will be out just after the weekend)
Hmm can you test with the latest RC? or even better from the GitHub (that said the Github will break the interface, as we upgraded the pipeline 🙂 )
Oh that is odd... let me check something
You need to use tf.summary.image and not summary_ops_v2.image
Fixed on main branch (see github issue), RC later today
Image needs to be in range [0, 1] and not [0, 255] (matplotlib and tensorboard can handle either one)
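A rough sketch of the rescaling, assuming the image arrives as an 8-bit NumPy array. The helper name `to_unit_range` is hypothetical and the handling of float inputs is a guess at what is wanted:

```python
import numpy as np


def to_unit_range(image):
    """Return a float32 copy of `image` with values in [0, 1]."""
    arr = np.asarray(image)
    if np.issubdtype(arr.dtype, np.integer):
        # Assumption: integer images are 8-bit, i.e. values in [0, 255].
        return arr.astype(np.float32) / 255.0
    # Assumption: float images are already in [0, 1]; clip stray values.
    return np.clip(arr.astype(np.float32), 0.0, 1.0)


img_u8 = np.array([[0, 128, 255]], dtype=np.uint8)
img_f = to_unit_range(img_u8)
```

The result `img_f` is what would be passed to tf.summary.image instead of the raw uint8 array.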
Is there a code to reproduce ?
I get the same "white" image in both TB & ClearML 😞
Hi JumpyPig73
Funny enough this is being fixed as we speak 🙂
The main issue is that as you mentioned, ClearML does not "detect" the exit code when os.exit() is called, and this is why it is "missing" the failed test (because as mentioned, all exceptions are caught). This should be fixed in the next RC
You mean the job with the exact same arguments ?
do you have other arguments you are passing ?
Are you using Optuna / BOHB ?
DeliciousSeal67 the agent will use the "install packages" section in order to install packages for the code. If you clear the entire section (you can do that in the UI or programmatically) then it will revert to requirements.txt
Make sense ?
Hi BrightGoat74
So merging general purpose plotly plots is very hard (i.e. putting both on the same graph)
But if you report using logger.report_scatter(...) the UI will merge the ROC curves into the same graph, wdyt?
https://clear.ml/docs/latest/docs/guides/reporting/scatter_hist_confusion_mat_reporting#2d-scatter-plots
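Something along these lines might work. It is a sketch that assumes the 2D scatter call from the linked guide (`Logger.report_scatter2d`) and that reusing one title with a different series per model is what lets the UI put the curves on one graph. The helper `report_roc_curve` and the axis names are made up for illustration:

```python
def report_roc_curve(logger, series, fpr, tpr, title="ROC", iteration=0):
    """Report one ROC curve as a line plot under a shared title."""
    # report_scatter2d takes a list of [x, y] pairs.
    points = [[float(x), float(y)] for x, y in zip(fpr, tpr)]
    logger.report_scatter2d(
        title=title,      # same title for every model
        series=series,    # one series per model
        scatter=points,
        iteration=iteration,
        xaxis="False positive rate",
        yaxis="True positive rate",
        mode="lines",
    )
    return points


# With clearml installed and a Task initialized:
#   from clearml import Logger
#   report_roc_curve(Logger.current_logger(), "model_a", fpr_a, tpr_a)
#   report_roc_curve(Logger.current_logger(), "model_b", fpr_b, tpr_b)
```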
Hi ScaryKoala63
Which versions are you using (clearml / lightning) ?
I think the ClearmlLogger is kind of deprecated ...
Basically all you need is Task.init at the beginning , the default tensorboard logger will be caught by clearml
And where is the ClearmlLogger coming from?
Hi ScaryKoala63
Sure, add the following to your clearml.conf:
sdk.storage.cache.default_cache_manager_size = 400
I think you are correct, it seems like for some reason you hit the cache limit, and a previous entry was deleted