What is the best approach to update the package if we have frequent updates on this common code?
Since this package has an indirect effect on the model endpoint, I would package it together with the preprocessing code of the endpoint.
Each server updates its own local copy, and makes sure it can pull and deploy it hand over hand without breaking its ability to serve these endpoints.
The "wastefulness" of holding multiple copies is negligible compared to a situation where everyone ...
trains-agent doesn't run the clone, it is pip...
basically calling "pip install git+https://..."
Not sure you can pass extra arguments
Also, this is not a setup problem, otherwise it would have been failing consistently ... this actually looks like a network issue.
The only thing I can think of is retrying the install if we get a network error (not sure what the exit code of pip is in that case, though; maybe 9?).
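Just to illustrate the idea, a minimal retry sketch (not how the agent actually does it; since pip's exit code for network errors is unclear, it simply retries on any non-zero code, and the requirement string is a hypothetical example):
import subprocess
import time

def pip_install_with_retries(requirement, retries=3, delay=5.0):
    # retry "pip install <requirement>" a few times to ride out transient network errors
    for attempt in range(1, retries + 1):
        result = subprocess.run(["pip", "install", requirement])
        if result.returncode == 0:
            return
        print("pip exited with code %d (attempt %d/%d), retrying..." % (result.returncode, attempt, retries))
        time.sleep(delay)
    raise RuntimeError("failed to install %s after %d attempts" % (requirement, retries))

# hypothetical repository URL, for illustration only
pip_install_with_retries("git+https://github.com/example/common-code.git")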
So you want these two on two different graphs ?
SuperiorDucks36 you mean manually setting up an experiment (and the dummy Task is just a way to have an entry to configure), do I understand you correctly?
Following on that, we are thinking of doing it all for you with a CLI that will basically create a task from code/a repo you already have on your machine. What do you think?
You mean one machine with multiple clearml-agents ?
(worker is a unique ID of an agent, so you cannot have two agents with the exact same worker name)
Or do you mean two agents pulling from the same queue ? (that is supported)
Is there a way to document these non-standard entry points?
BattyCrocodile47 you should see the "run" in the Args section under Configuration
In the case of HF you should see "-m huggingface" and then the rest in the Args section
(if this does not work, then I assume this is a bug)
The idea is of course that you can always enqueue and reproduce, so if that part is broken we should fix it
ClumsyElephant70 yes there is:
clearml-agent build --id <task id> --target <folder>
(I might have a typo there, but you can basically check the full help with clearml-agent build --help )
SSH is used to access the actual container; all other communication is tunneled on top of it. What exactly is the reason to bind to 0.0.0.0? Maybe it could be a flag that you set, but I'm not sure what the scenario is and what we are solving, thoughts?
Let me know if I can be of help
Hi GiddyPeacock64
If you already have a K8s setup and are already using ClearML, then in your Kubeflow YAML run:
trains-agent execute --id <task_id> --full-monitoring
This will install everything your Task needs inside the docker. Just make sure that you pass the env variables that configure the ClearML connection, see here:
https://github.com/allegroai/clearml-server/blob/6434f1028e6e7fd2479b22fe553f7bca3f8a716f/docker/docker-compose.yml#L127
Could you test with the same file? Maybe timeout has something to do with the file size ?
CooperativeFox72 can you start by checking the latest RC :)
pip install trains==0.15.2rc0
Well that depends on how you think about the automation. If you are running your experiments manually (i.e. you specifically call/execute them), then at the beginning of each experiment (or function) call Task.init, and when you are done call Task.close. This can be done in parallel if you are running them from separate processes.
If you want to automate the process, you can start using the trains-agent which could help you spin those experiments on as many machines as you l...
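A minimal sketch of that manual pattern (project/task names are placeholders; on older versions the import is from trains rather than clearml):
from clearml import Task

def run_experiment(name, params):
    # one Task per experiment: Task.init at the start, Task.close when done
    task = Task.init(project_name="my project", task_name=name)
    task.connect(params)  # log the experiment parameters
    # ... actual training / evaluation code goes here ...
    task.close()

# run several experiments one after another (or from separate processes)
for i, lr in enumerate([0.1, 0.01]):
    run_experiment("experiment %d" % i, {"learning_rate": lr})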
ReassuredTiger98
Okay, but you should have had the prints "uploading artifact" and "done uploading artifact", so I suspect something is going on with the agent.
Did you manage to run any experiment on this agent ?
EDIT: Can you try the artifacts example we have on the repo:
https://github.com/allegroai/clearml/blob/master/examples/reporting/artifacts.py
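For a quick check, a stripped-down version of that example (project/task names here are placeholders) should produce those two prints once the artifact upload completes:
from clearml import Task

task = Task.init(project_name="examples", task_name="artifact upload test")
# upload a simple object; the console should show the
# "uploading artifact" / "done uploading artifact" messages mentioned above
task.upload_artifact(name="test dict", artifact_object={"a": 1, "b": 2})
task.close()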
yes, so you can have a few options
Do people generally update the same model "entry"? That feels so wrong to me… how do you reproduce an older model version or do a rollback, etc.?
Correct, they do not. On the Task itself the output models will reflect the different filenames you saved; usually people just add a running number.
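For example, something along these lines (a sketch assuming PyTorch, whose save calls ClearML picks up automatically; names are placeholders) ends up as separate output models because every checkpoint gets its own filename:
import torch
from clearml import Task

task = Task.init(project_name="examples", task_name="checkpoint naming")
model = torch.nn.Linear(4, 2)  # stand-in for a real model

for epoch in range(3):
    # ... training step would go here ...
    # each differently named checkpoint is recorded as its own output model on the Task
    torch.save(model.state_dict(), "model_epoch_%03d.pt" % epoch)

task.close()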
Ohh ignore the YAML
Hi OutrageousGiraffe8
Does anybody know why this is happening and is there any workaround, e.g. how to manually report a model?
What exactly is the error you are getting, and which clearml version are you using?
Regarding manual model reporting:
https://clear.ml/docs/latest/docs/fundamentals/artifacts#manual-model-logging
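Roughly, the manual route from that page looks like this (a sketch; the project/task names, framework string and weights filename are placeholders):
from clearml import Task, OutputModel

task = Task.init(project_name="examples", task_name="manual model logging")
# create a model entry attached to the task and register an existing weights file
output_model = OutputModel(task=task, framework="PyTorch")
output_model.update_weights(weights_filename="my_model.pt")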
Hi NastyFox63 yes, I think the problem was found (actually on the backend side).
It will be solved in the upcoming release (due after this weekend)
GreasyPenguin66 Nice !!!
Very cool setup, and kudos on making it work with multiple users!
Quick question, shouldn't the JUPYTERHUB_API_TOKEN env variable be enough to gain access to the server? Why did you need to add it to the 'nbserver-x.json' as well?
JitteryCoyote63 instead of _update_requirements, call the following before Task.init:
Task.add_requirements('torch', '1.3.1')
Task.add_requirements('git+ ')
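i.e. the order matters, something like this sketch (project/task names and the git URL are placeholders):
from clearml import Task

# requirements must be declared before Task.init is called
Task.add_requirements('torch', '1.3.1')
Task.add_requirements('git+https://github.com/example/my-package.git')  # hypothetical URL

task = Task.init(project_name="examples", task_name="custom requirements")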
Hi PerplexedWalrus3
you should get something like the following on the console:
ClearML Task: created new task id=1ca59ef1f86d44bd81cb517d529d9e5a
2021-07-25 13:59:09 ClearML results page:
2021-07-25 13:59:16
GreasyPenguin66 you can pass:
AZURE_STORAGE_ACCOUNT
AZURE_STORAGE_KEY
as the default Azure access/secret
but now since Task.current_task() doesn't work on the pipeline object we have a serious problem
How is that possible ?
Is there a small toy code that can reproduce it ?
Hi MysteriousCow84
only one of them uses an already created venv from cache for this task. And the other node starts to re-create the same virtual environment.
Just to be clear, the second one is running, but it does not use the same venv as the other one (that is running in parallel), is that correct?