
Add '/', like you would with a file system: Task.init(project_name='main_project/sub_project', task_name='test')
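A minimal sketch of the nesting described above. `subproject_name` and `init_nested_task` are hypothetical helper names, not part of ClearML; only `Task.init` with `project_name` and `task_name` comes from the original line. `clearml` is imported inside the function so the path helper can run without the package installed.

```python
def subproject_name(*levels: str) -> str:
    """Join project levels with '/', the separator used for nested projects."""
    cleaned = [level.strip("/") for level in levels]
    return "/".join(level for level in cleaned if level)


def init_nested_task(task_name: str, *levels: str):
    """Create a task under a nested project, e.g. main_project/sub_project."""
    # Imported lazily: needs the clearml package and a configured server.
    from clearml import Task

    return Task.init(project_name=subproject_name(*levels), task_name=task_name)
```

Calling `init_nested_task('test', 'main_project', 'sub_project')` should be equivalent to the one-liner above.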
You can always access the entire experiment data from Python:
'Task.get_task(Id).data'
It should all be there.
What's the exact use case you had in mind?
Hi @<1567321739677929472:profile|StoutGorilla30>
Is it necessary to serve keras model using triton engine?
It is not, but it is the most efficient way to serve keras models, and this is why by default clearml-serving is using Nvidia Triton (we are talking 10x factors)
I would start with the keras example, see that it works, and then work your way to your own example (notice you always need to provide the input/output layers of the model)
[None](https://github.com/allegroai/clearml-s...
Since this fix is all about synchronizing different processes, we wanted to be extra careful with the release. That said I think that what we have now should be quite stable. Plan is to have the RC available right after the weekend.
Okay, so the way it works is that it moves all the logging to a background process. But if you have a lot of data, actually pushing the data between Python processes is not very efficient. This line basically tells it to just use a background thread (instead of a background process) for sending the data to the server.
The idea behind using background process in the first place is to better support pytorch workers that spin a lot of subprocesses, and we do not want to add a thread per process and in...
My task starts up and checks the mounted EFS volume for x data, if x data does not exist there, it then pulls x data from S3.
BoredHedgehog47 you can just use StorageManager and configure clearml cache for the EFS, it will essentially do the same 🙂
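For reference, the check-then-pull logic described in the question looks roughly like the sketch below. `get_or_fetch` and its `fetch` callback are hypothetical names for illustration; the callback stands in for whatever pulls the data from S3.

```python
from pathlib import Path
from typing import Callable


def get_or_fetch(cache_dir: Path, name: str, fetch: Callable[[Path], None]) -> Path:
    """Return cache_dir/name, calling fetch(target) only when it is missing."""
    target = cache_dir / name
    if not target.exists():
        # Cache miss: make sure the folder exists, then pull the data once.
        cache_dir.mkdir(parents=True, exist_ok=True)
        fetch(target)
    return target
```

With the ClearML cache folder pointed at the EFS mount, `StorageManager.get_local_copy(remote_url=...)` should give you the same behavior without the hand-written check.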
Regarding helm chart with EFS,
you need to configure the clearml-glue pod template with the EFS mount
example :
https://github.com/kubernetes-sigs/aws-efs-csi-driver/blob/e7f647f4e6fc76f983d61522e635353005f1472f/examples/kubernetes/volu...
DilapidatedDucks58 if you have so many parameters, why don't you use the
task.connect_configuration(dict)
It will put it in the artifacts, as an editable JSON-like string.
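A small sketch of that call, assuming you already have a task object from `Task.init`. The wrapper name `connect_many_params` and the section name `many_params` are made up for the example; `connect_configuration(configuration=..., name=...)` is the actual SDK call.

```python
def connect_many_params(task, params: dict, name: str = "many_params") -> dict:
    """Attach one large dict as a single configuration object.

    The returned dict is the one to keep using: when the task is executed
    by an agent, it may carry values that were edited in the UI.
    """
    return task.connect_configuration(configuration=params, name=name)
```

Usage would be something like `params = connect_many_params(task, {'lr': 0.1, 'layers': [64, 64]})`.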
AbruptHedgehog21 what exactly do you store as a Model file? Is this a pickled Python object?
Hi @<1747428509627715584:profile|CumbersomeDuck6>
but is it possible to use ClearML in Rust, without writing a wrapper.
With the RestAPI you can...
noticed the API doesn't cover dataset operations but the CLI can.
Yes the CLI will fetch/create datasets for you,
wdyt?
@<1523701304709353472:profile|OddShrimp85> are you trying to shut down the one running on your machine ?
That should spin up an instance, right? (it currently doesn't, and I'm not sure where to debug)
Do you see the AWS scaler Task running ?
(This is the code/process that actually spins a new EC2 instance)
In the UI you can see all the agents and their IDs
Then you can do
clearml-agent daemon --stop <agent id>
Hi DepressedChimpanzee34
How do I reproduce the issue ?
What are we expecting to get there ?
Is that a Colab issue or hyper-parameter encoding issue ?
So it seems to get the "hint" from the type:
This will work: tf.summary.image('toy255', (ex * 255).astype(np.uint8), step=step, max_outputs=10)
wdyt, should it actually check min/max and manually cast it ?
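To make the question concrete, here is a rough sketch of what "check min/max and manually cast" could mean. It is not what the library does today. `to_uint8` is a hypothetical helper, written in pure Python over a flat list of pixel values to keep it dependency-free, and the [0, 1] rule is an assumption.

```python
from typing import Iterable, List


def to_uint8(pixels: Iterable[float]) -> List[int]:
    """Map pixel values to the 0..255 integer range based on their min/max."""
    values = list(pixels)
    if not values:
        return []
    lo, hi = min(values), max(values)
    # Assumption: data that sits entirely inside [0, 1] is a normalized image.
    scale = 255.0 if (lo >= 0.0 and hi <= 1.0) else 1.0
    # Scale, round, then clip so out-of-range values cannot wrap around.
    return [max(0, min(255, int(round(v * scale)))) for v in values]
```

The catch is the ambiguous case: an image that is already 0..255 but only contains 0s and 1s would be scaled up, which is one argument for sticking with the type hint.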
but when we try to do a "New Run" from UI, it tries to follow the DAG of previous run (the run with all child nodes skipped) and the new run fails too.
This is odd, is this reproducible ? what's the clearml python package version ?
yes
argument saying always create from code
can be helpful
@<1523701523954012160:profile|ShallowCormorant89> any chance you can open a github issue on that, just so we do not forget ?
if we can edit the configuration objects of a pipeline, that can be beneficial too. which we're unable to do from UI
Actually you already can, after you clone the pipeline, you can press on details then go to configuration Tab, and edit the pipeline object. The format is HOCON (...
(Caused by SSLError(SSLError(1, '[SSL: DECRYPTION_FAILED_OR_BAD_RECORD_MAC] decryption failed or bad record mac
Where is the code running (agent) GCP instance ? your machine ?
What about calling Task.init without the agent?
The confusion matrix shows under debug sample, but the image is empty, is that correct?
Hi JitteryCoyote63 report_frequency_sec=30.
controls how frequently monitoring events are sent to the server, default is every 30 seconds (you can change the UI display to wall-time to review). You can change it to 180 so it will only send an event every 3 minutes (for example).
sample_frequency_per_sec is the sampling frequency it uses internally, then it will average the results over the course of the report_frequency_sec
time window, and send the averaged result on the repo...
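The sample-then-average behavior can be pictured with the sketch below. `averaged_reports` is a hypothetical function that only mimics the idea on a list of readings; it is not the monitor's actual code. The two parameter names are taken from the explanation above.

```python
from statistics import mean
from typing import Iterable, List


def averaged_reports(
    samples: Iterable[float],
    sample_frequency_per_sec: float,
    report_frequency_sec: float,
) -> List[float]:
    """Average raw samples per report window; one value per sent event."""
    # Number of samples that fall into one report window (at least one).
    window = max(1, int(round(sample_frequency_per_sec * report_frequency_sec)))
    values = list(samples)
    # A trailing partial window is averaged over the samples it has.
    return [mean(values[i:i + window]) for i in range(0, len(values), window)]
```

So with 2 samples per second and report_frequency_sec=30, each reported point would be the average of 60 readings.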
the error for uploading is weird
wait, are you still getting this error?
So it should cache the venvs right?
Correct,
path: /clearml-cache/venvs-cache
Just making sure, this is the path to the host cache folder
ClumsyElephant70 I think I lost track of the current issue 😞 what's exactly not being cached (or working)?
Hi PanickyMoth78 an RC with a fix is out, let me know if it works (notice you can now set the max_workers from CLI or Dataset functions): pip install clearml==1.8.1rc1
It should all be logged at the end, as I understand
Hmm let me check the code for a minute
Thanks NonchalantDeer14 !
BTW: how do you submit the multi GPU job? Is it multi-gpu or multi node ?
Hi FriendlyKoala70, trains will report all the tensorboard graphs, I'm assuming that's what is creating the epoch_lr graph. On top of it, you can always report manually with logger (as you pointed). Does that make sense to you?
Hi RipeGoose2 all PR's are welcome, feel free to submit :)
Hi @<1644147961996775424:profile|HurtStarfish47>
I see "Add image.jpg" being printed for all my data items ...
I assume you forgot to call upload? The sync "marks" files for upload / deletion, but the upload call actually does the work.
Kind of like git add / push, if that makes sense?
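In code, the two steps look roughly like this. `sync_and_upload` is a hypothetical wrapper; `sync_folder`, `upload` and `finalize` are the Dataset methods, and the dataset object is assumed to come from `Dataset.create(...)`. The optional finalize step is my addition, not something from the thread.

```python
def sync_and_upload(dataset, local_folder: str, finalize: bool = False) -> None:
    """Register local changes, then actually push them to storage."""
    # Step 1 ("git add"): only records which files were added/modified/removed.
    dataset.sync_folder(local_path=local_folder)
    # Step 2 ("git push"): this is the call that transfers the file contents.
    dataset.upload()
    if finalize:
        # Optional: close this dataset version so it can no longer change.
        dataset.finalize()
```

If you only see the "Add ..." printouts and nothing lands in storage, step 2 is most likely the missing one.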