Thanks MagnificentSeaurchin79! This code snippet is exactly what I needed, let me check if I can reproduce it.
but somewhere along the way, the request actually removes the header
Where are you seeing the returned value?
Metadata might be expensive, it's a REST API call, and we have found users putting hundreds of artifacts, with preview entries ...
Hi SpotlessLeopard9
I got many tasks that were just hang at the end of the script without ...
I remember this exact issue was fixed with 1.1.5rc0, see here:
https://clearml.slack.com/archives/CTK20V944/p1634910855059900
Can you verify with the latest RC?
pip install clearml==1.1.5rc3
You should manually remove the cudatoolkit from the installed packages section in the UI, then try to send it to the agent and see if it works. The question is how it ended up there in the first place.
The problem is not really for the agents to wait (this is easily solved by an additional high-priority queue), the problem is whether you will have a "free" agent... you see my point?
SweetGiraffe8 Task.init will auto-log everything (git/python packages/console etc.) for your existing process.
Task.create purely creates a new Task in the system, and lets you manually fill in all the details on that Task
Make sense ?
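To illustrate the difference, a minimal sketch (the project/task names are placeholders, assuming the standard clearml API):
`
from clearml import Task

# Task.init attaches to the current process and auto-logs git info,
# installed python packages, console output, etc.
task = Task.init(project_name="examples", task_name="autologged run")

# Task.create only registers a new Task entry in the system; nothing is
# auto-logged, and the details (repo, script, packages) are filled in manually.
manual_task = Task.create(project_name="examples", task_name="manually defined task")
`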
Hi PunyGoose16 ,
I think the website is probably the easiest 🙂
https://clear.ml/contact-us/
I think they get back to you quite quickly
Ohh "~/trains.conf" is root probably
I’m not sure if https will work because I want to use ssh keys for creds.
BTW: I was not aware GitHub provides a PyPI-like artifactory, do they?
Regarding SSH keys, they are passed from the host machine (i.e. in venv mode it will use the SSH keys from the user running the agent, and in docker mode, they are automatically mapped into the container)
Hi ConvolutedSealion94
Just making sure, you spun up the docker-compose of the clearml serving as well?
Task.debug_simulate_remote_task
simulates the Task being executed by the agent (basically the same behaviour, only local). The argument it gets is the Task ID (string).
The way to see how it works is to run the code once (no debug_simulate call), get the Task ID it created, then rerun with debug_simulate_remote_task passing the previous Task ID.
Make sense ?
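A minimal sketch of that flow (the Task ID below is a placeholder you would copy from the first, plain run):
`
from clearml import Task

# Second run only: simulate this Task being executed by an agent,
# re-using the Task ID created by the first (plain) run.
Task.debug_simulate_remote_task(task_id="<task_id_from_first_run>")

task = Task.init(project_name="examples", task_name="remote simulation")
print(task.id)
`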
which to my understanding has to be given before a call to an argparser,
SmarmySeaurchin8 You can call argparse before Task.init, no worries, it will catch the arguments and trains-agent will be able to override them :)
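For example, something like this should work (the argument name and project/task names are just for illustration):
`
import argparse
from clearml import Task

# Parsing arguments before Task.init is fine; the parsed values are still
# captured, and the agent can override them on remote execution.
parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.01)
args = parser.parse_args()

task = Task.init(project_name="examples", task_name="argparse before init")
print(args.lr)
`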
Not really, the OS will almost never allow for that; actually it is based on fairness and priority. We can set the entire agent to have the same low priority for all of them, then the OS will always take CPU when needed (most of the time it won't) and all the agents will split the CPUs among them, so no one will get starved 🙂 With GPUs it is a different story, there is no actual context switching or fairness mechanism like with CPUs
Hi ItchyJellyfish73
This seems aligned with the scenario you are describing; it seems the api server is overloaded with simultaneous connections.
Add an additional apiserver instance to the docker-compose and an nginx as load balancer:
https://github.com/allegroai/clearml-server/blob/09ab2af34cbf9a38f317e15d17454a2eb4c7efd0/docker/docker-compose.yml#L4
`
apiserver:
  command:
    - apiserver
  container_name: clearml-apiserver
  image: allegroai/clearml:latest
  restart: unless-sto...
GrievingTurkey78 sure, aws autoscaler can do that:
https://github.com/allegroai/clearml/blob/master/examples/services/aws-autoscaler/aws_autoscaler.py
This is the thread checking the state of the running pods (and updating the Task status, so you have visibility into the state of the pod inside the cluster before it starts running)
So this is Optuna 🙂 The idea is that it will test which parameters have potential (with early stopping), then launch a subset of the selected parameters
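For reference, a minimal sketch of such an Optuna-driven search using ClearML's HyperParameterOptimizer (the base task ID, metric names, and parameter range are assumptions for illustration):
`
from clearml.automation import HyperParameterOptimizer, UniformParameterRange
from clearml.automation.optuna import OptimizerOptuna

# Optuna prunes (early-stops) unpromising trials and keeps sampling
# from the more promising regions of the parameter space.
optimizer = HyperParameterOptimizer(
    base_task_id="<base_task_id>",  # assumed: ID of the template training task
    hyper_parameters=[
        UniformParameterRange("General/learning_rate", min_value=0.0001, max_value=0.1),
    ],
    objective_metric_title="validation",  # assumed metric title
    objective_metric_series="loss",       # assumed metric series
    objective_metric_sign="min",
    optimizer_class=OptimizerOptuna,
    max_number_of_concurrent_tasks=2,
    total_max_jobs=20,
)
optimizer.start()
optimizer.wait()
optimizer.stop()
`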
Nice!
is trainsConfig a pure text blob?
Also, there was a trick that worked with the previous bug: could you zoom out in the browser and see if you suddenly get the plot?
okay this points to an issue with the k8s glue, I think it somehow failed to launch the pod. Can you send me the log of the clearml-k8s-glue ?
That makes total sense. The question was about Mac users and the OS environment in the configuration file, and having that OS environment set in code (this is my assumption, as it seems that at import time it does not exist). What am I missing here?
Seems like passing the Task object is not working as expected (I'll make sure it is fixed).
Try:
dataset._task.set_parent(Task.current_task().id)
Thanks BoredHedgehog47 !
And yes, if the Task.init() call was only in main.py, then the TB inside the subprocess (train.py) would, as you perceived, not be captured.
Did you by any chance test calling Task.init in both main.py and train.py?
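A minimal sketch of what calling it in both would look like (file layout, project/task names, and the subprocess launch are assumptions):
`
# main.py
import subprocess
from clearml import Task

task = Task.init(project_name="examples", task_name="main plus subprocess")
subprocess.run(["python", "train.py"], check=True)

# train.py
from clearml import Task
from torch.utils.tensorboard import SummaryWriter

# Task.init is called here as well, so TensorBoard reports written
# inside this subprocess should also be captured.
task = Task.init(project_name="examples", task_name="main plus subprocess")
writer = SummaryWriter()
writer.add_scalar("loss", 0.1, 0)
writer.close()
`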
Sorry my bad:
config_obj['sdk']['stuff']['here'] = value