Hi CooperativeFox72 ,
From the backend guys, long story short, upgrade your machine => more cpu cores , more processes , it is that easy š
Hi GiganticTurtle0
Sure, OutputModel can be manually connected:model = OutputModel(task=Task.current_task()) model.update_weights(weights_filename='localfile.pkl')
Also, I just wanted to say thanks for the tool! I'm managing a small data science practice and it's going to be really nice to have a view of all of the experiments we've got and know our GPU utilization, all without having to give every data scientist access to each box where the workflows are run. Incredibly stoked.
ā„ ā¤ ā„
iām working on creating a custom config with istio
That is awesome! let me know if we could help š
Also please consider PRing it, I'm sure other users will appreciate the option
That is correct.
Obviously once it is in the system, you can just clone/edit/enqueue it.
Running it once is a mean to populate the trains-server.
Make sense ?
yes, TrickySheep9 use the k8s glue from here:
https://github.com/allegroai/clearml-agent/blob/master/examples/k8s_glue_example.py
EnviousPanda91 please feel free to PR if it works š
https://github.com/allegroai/clearml/blob/86586fbf35d6bdfbf96b6ee3e0068eac3e6c0979/clearml/binding/frameworks/catboost_bind.py#L114
I want to use services queue for running services, and I want to do it on k8s
So yes, as a standalone pod with the agent in venv mode (as opposed to docker mode)
Does that make sense to you?
The thing I don't understand is how come this DOES work on our linux setups
I do not think it actually works... I could not have find a code that will convert the ENV in the config string ...
I'll be happy to test it out if there's any commit available?
Please do, and feel free to PR it š
https://github.com/allegroai/clearml/blob/d3e986393ac8d1a1ea48302224962570ab8e6f9e/clearml/backend_api/session/session.py#L576
https://github.com/allegroai/clearml/blob/d3e98639...
Hmmm that is odd... based on the reply "'Task' object has no attribute 'hyperparams'", I would assume API version is lower then 2.9. But you specifically said you see Session.api_version == 2.9
is that correct?
And you have the exact same folder structure / content, and server A/B give a different set of experiments ?
(is serverB empty, meaning no experiments at all?)
OutrageousGiraffe8 this sounds like a bug, how can we reproduce it?
Maybe a add another layer here?
https://github.com/allegroai/clearml/blob/a47f127679ebf5912690f7c3e60791a2daa5c984/examples/frameworks/tensorflow/tensorflow_mnist.py#L40
No idea, I just remember it is relatively old š
Hi JitteryCoyote63
Signal 9 is killed signal, could it be someone killed the process ? Do you have other logs to share ? Is this reproducible ?
I mean clone the Task in the UI (right click Clone), then go to the execution Tab, to the "installed packages" section, then click on Edit -> go to the torchvision http link, and replace it with torchvision == 0.7.0
and save.
Then right enqueue the Task (to the default queue) and see if the Agent can run it,
DeterminedToad86 Make sense ?
WickedElephant66 this seems like a general network issue, like the docker service is missing your companies firewall certificate.
Can you pull any container from docker hub ?
That said, it might be different backend, I'll test with the demoserver
Hi JitteryCoyote63 , let me check, this backwards compatibility might only apply for API version mismatch between the client and server.
Not really sure that's easily done ... I mean you could query the data, but I'm not sure how you would import it. Btw why would you move from pro to self hosted?
JitteryCoyote63 are you calling to:my_task.output_uri = "
s3://my-bucket
in the code itself ?
Why not with Task.init output_uri=...
Also this is running remotely there is no need fo r that, use the Execution -> Output -> Destination and put it there, it will do everything for you š
JitteryCoyote63 Is this an Ignite feature ? what is the expectation ? (I guess the ClearML Logger just inherits from the base ignite logger)
corporate firewall... let's start with http š
LOL, Okay I'm not sure we can do something that one.
You should probably increase the storage on your instance š
GreasyLeopard35 I think you are on to something, I think UniformParameterRange just misses a min value:
https://github.com/allegroai/clearml/blob/fcad50b6266f445424a1f1fb361f5a4bc5c7f6a3/clearml/automation/parameters.py#L168
Should be:[self.min_value + v*step_size for v in range(0, int(steps))]
I can add files to the data set, even after I finish the experiment?
Correct
https://clear.ml/docs/latest/docs/clearml_data#creating-a-dataset
https://clear.ml/docs/latest/docs/guides/data%20management/data_man_cifar_classification
https://github.com/allegroai/clearml/blob/master/docs/datasets.md#create-dataset-from-code
Hmm I assume it is not running from the code directory...
(I'm still amazed it worked the first time)
Are you actually using "." ?