EnviousStarfish54 Notice that you can configure it on the agent machine only, so in development you do not "waste" storage uploading debug checkpoints/models.
Hi PerplexedCow66
I'm assuming an extension for this:
https://github.com/allegroai/clearml-serving/issues/32
Basically, JWT can be used as a general access check that allows or blocks all endpoints, which is most efficiently handled by the k8s load balancer (nginx/envoy),
but if you want a per-endpoint check (or want to act on the JWT values), see adding JWT to FastAPI here:
https://fastapi.tiangolo.com/tutorial/security/oauth2-jwt/?h=jwt#oauth2-with-password-and-hashing-bearer-with-jwt-tokens
T...
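A minimal, stdlib-only sketch of such a per-endpoint check, assuming HS256 tokens signed with a shared secret and a hypothetical `endpoints` claim listing which serving endpoints a token may call. It is not clearml-serving code; in a real FastAPI app you would use a maintained JWT library as the tutorial does, and call a check like this from a dependency.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url_encode(raw: bytes) -> str:
    # JWT segments use unpadded base64url
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # re-add the stripped "=" padding before decoding
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def make_token(claims: dict, secret: bytes) -> str:
    """Issue an HS256-signed token (only here so the check can be exercised)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url_encode(signature)}"


def check_endpoint_access(token: str, secret: bytes, endpoint: str) -> dict:
    """Return the token's claims if it is valid and grants `endpoint`.

    Raises PermissionError otherwise.
    """
    parts = token.split(".")
    if len(parts) != 3:
        raise PermissionError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise PermissionError("unexpected signing algorithm")
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    # constant-time comparison to avoid timing leaks
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise PermissionError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if "exp" in claims and claims["exp"] < time.time():
        raise PermissionError("token expired")
    # hypothetical claim listing the serving endpoints this token may call
    if endpoint not in claims.get("endpoints", []):
        raise PermissionError(f"token does not grant access to {endpoint!r}")
    return claims
```

In a FastAPI dependency you would read the bearer token, call `check_endpoint_access`, and translate `PermissionError` into a 401/403 response.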
Can I change the parameters before executing the draft task?
Yes, you can. After you clone the experiment, everything becomes editable, so you can edit the config in the UI.
For example, let's assume I have config.yml, and in my code I do:
my_file = task.connect_configuration('config.yml')
with open(my_file, 'rt') as f:
    ...
Then, after I clone it in the UI and edit the configuration, when it is executed remotely, my_file
will contain the content of the configuration as s...
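A minimal sketch of that flow wrapped in a helper. The name `read_connected_config` is hypothetical, and `task` is passed in rather than created so the helper can be exercised without a ClearML server; with ClearML installed you would pass the object returned by `Task.init(...)`.

```python
def read_connected_config(task, config_path="config.yml"):
    """Return the configuration text this run should use.

    `task` is expected to be a clearml Task. connect_configuration(path)
    returns a local file path: when running locally it is the path you
    passed; when an agent runs a cloned Task it points to a file holding
    the configuration as edited in the UI.
    """
    local_path = task.connect_configuration(config_path)
    with open(local_path, "rt") as f:
        return f.read()
```

Usage, assuming ClearML is installed: `task = Task.init(project_name="examples", task_name="config demo")` followed by `text = read_connected_config(task)`; parse `text` with whatever YAML loader you already use.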
SmarmyDolphin68 if you can reproduce the behavior in a standalone script, it will really accelerate fixing this issue.
The Task log itself lists the versions of all the packages. Basically, I wonder if it is using an older clearml version, and this is why I cannot reproduce it.
WackyRabbit7
The regular trains-agent modus operandi is one job at a time (i.e. until the Task is done, no other Tasks will be pulled from the queue).
When adding --services-mode, it is not 1:1 but 1:N, meaning a single trains-agent will launch as many Tasks as it can.
The trains-agent pulls a job from the queue, spins up a docker container (only dockers are supported for the time being), and lets the job run in the background (the job itself will be registered as another "worker" in the system). Then the...
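A toy model of the two modes, purely illustrative: it is not trains-agent code, it uses threads where the real agent spins docker containers, and it ignores worker registration. `run_agent` and `demo` are hypothetical names.

```python
import queue
import threading
import time


def run_agent(jobs, services_mode=False):
    """Toy model (not trains-agent code) of an agent draining a queue.

    jobs: list of (name, callable) pairs, enqueued in order.
    Regular mode (1:1): pull a job, run it to completion, then pull the next.
    Services mode (1:N): pull every job and launch each in the background;
    a thread stands in for the docker container the real agent spins up.
    """
    q = queue.Queue()
    for job in jobs:
        q.put(job)
    background = []
    while not q.empty():
        name, fn = q.get()
        if services_mode:
            worker = threading.Thread(target=fn, name=name)
            worker.start()
            background.append(worker)
        else:
            fn()  # block: nothing else is pulled until this job is done
    for worker in background:  # wait so the demo can inspect the results
        worker.join()


def demo(services_mode):
    """Run a slow job followed by a fast one and return the event log."""
    log = []

    def slow_job():
        log.append("slow start")
        time.sleep(0.2)
        log.append("slow end")

    def fast_job():
        log.append("fast start")
        log.append("fast end")

    run_agent([("slow", slow_job), ("fast", fast_job)], services_mode)
    return log


print(demo(services_mode=False))  # ['slow start', 'slow end', 'fast start', 'fast end']
print(demo(services_mode=True))   # 'fast end' appears before 'slow end'
```

In regular mode the fast job waits behind the slow one; in services mode it finishes while the slow job is still running.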
This is part of a bigger process which takes quite some time and resources; I hope I can try this soon, if it will help get to the bottom of this.
No worries, if you have another handle on how/why/when we lose the current Task, please share.
Also, I would upgrade the backend to 0.15.1; a few bugs were fixed since 0.14.x, some of which have to do with the plots...
(I suspect you are correct, but I'm missing some information in order to understand where the problem is)
WackyRabbit7 can you send mock code that shows how you create the pipeline?
Or can I enable agent in this kind of local mode?
You just built a local agent
ShinyLobster84
fatal: could not read Username for '...': terminal prompts disabled
This is the main issue: it needs git credentials to clone the repository containing the pipeline logic (this is exactly the same behaviour as pipeline v1 execute_remotely(), which is now the default). Could it be that previously you executed the pipeline logic locally?
WackyRabbit7 could the local/remote pipeline logic apply in your case as well?
feature is however available in the Enterprise Version as HyperDatasets. Am I correct?
Correct
BTW you could do:
from clearml import Dataset

# assumes `task` was already created with Task.init(...)
datasets_used = dict(dataset_id="83cfb45cfcbb4a8293ed9f14a2c562c0")
task.connect(datasets_used, name='datasets')
dataset_path = Dataset.get(dataset_id=datasets_used['dataset_id']).get_local_copy()
This will ensure that not only do you have a new section called "datasets" in the Task's configuration, but you will also be able to replace the datase...
from clearml.backend_api.session.client import APIClient

client = APIClient()
result = client.queues.get_next_task(queue='queue_ID_here')
Seems to work for me (latest RC 1.1.5rc2)
Sorry, typo: client.task. should be client.tasks.
I'm glad you were able to solve the issue!
WackyRabbit7 I could not reproduce it. What did you pass in "GOOGLE_APPLICATION_CREDENTIALS"?
Edit the cloned version and enqueue it?
Last but not least - can I cancel the offline zip creation if I'm not interested in it
you can override with OS environment, would that work?
Or well, because it's not geared for tests, I'm just encountering weird shit. Just calling task.close() takes a long time
It actually zips the entire offline folder so you can later upload it. Maybe we can disable that part?!
# generate the script section
script = (
"fr...
In your trains.conf, change the value files_server: 's3://ip:port/bucket'
Besides that, what are your impressions on these serving engines? Are they much better than just creating my own API + ONNX or even my own API + normal Pytorch inference?
I would separate ML frameworks from DL frameworks.
With ML frameworks, the main advantage is multi-model serving on a single container, which is more cost effective when serving many models, as well as the ability to quickly update models from the clearml model repository (just tag + publish and the end...
So is there any tutorial on this topic?
Dude, we just invented it
Any chance you feel like writing something in a GitHub issue, so other users know how to do this?
Guess I'll need to implement job scheduling myself
You already have a scheduler: it pulls jobs from the queue in order, then runs them one after the other (one at a time).
WackyRabbit7 I might be missing something here, but the pipeline itself should be launched on the "pipelines" queue. Is the pipeline itself running, or is it the step itself that is stuck in the "queued" state?
can you bump me to that thread?
https://clearml.slack.com/archives/CTK20V944/p1630610430171200
I realise I'll need to catalogue all the dataset ids created by ppl separately on a spreadsheet.
Okay, this part I missed: why would you need an additional "catalog" when you have the UI?
@<1533619716533260288:profile|SmallPigeon24>, a failed Task should not actually be reused (i.e. cached). Are you saying a failed Task is being reused, or that you want to "invalidate" the cache in the execution but still leave the Task as completed?
But I am considering just failing the task.
This will of course work: just raise an exception in the Task itself, and protect the call in the pipeline logic function with try/except.
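A plain-Python sketch of that pattern, runnable without ClearML. In a real pipeline `train_step` would be a component (e.g. decorated with `PipelineDecorator.component`) and `pipeline_logic` the pipeline function; exactly how a component's exception surfaces in the pipeline logic with the real decorators is not shown here. `StepFailed`, `train_step`, and `pipeline_logic` are hypothetical names.

```python
class StepFailed(Exception):
    """Hypothetical exception a component raises to fail its own Task."""


def train_step(values):
    # Component body: raising here is what marks the component Task as failed.
    if not values:
        raise StepFailed("empty input, refusing to produce a result")
    return sum(values) / len(values)


def pipeline_logic(inputs):
    # Pipeline logic: protect each component call so one failure
    # does not abort the whole pipeline.
    results = {}
    for name, values in inputs.items():
        try:
            results[name] = train_step(values)
        except StepFailed as exc:
            print(f"step {name!r} failed: {exc}")
            results[name] = None
    return results


print(pipeline_logic({"a": [1, 2, 3], "b": []}))  # {'a': 2.0, 'b': None}
```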
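Regarding the second option, try to nullify the hash on the component Task:
from clearml import Task

# ... the component Task's code runs here ...
# if we do not want this Task to be reused (cached) by later pipeline runs:
# (_set_runtime_properties is an internal, underscore-prefixed method)
Task.current_task()._set_runtime_properties({"pipeline_job_hash": None})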
MuddySquid7
are you saying that for some reason the models pick up the artifacts? Is that reproducible? (They are two different things.)
Can you see the df.pkl on the Models section of the Task (in the UI) ?
Hi @<1658281099807166464:profile|SmallCamel52>
Lack of authentication in all versions of the fileserver component
Are you leaving the fileserver open to the world?
An upload of 11GB took around 20 hours which cannot be right.
That is very, very slow: it works out to about 153 KB/s (roughly 1.2 Mbit/s).
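The arithmetic behind that figure, as a small helper, assuming decimal units (11 GB = 11 × 10^9 bytes); with binary GiB the rate comes out slightly higher. `upload_rate` is a hypothetical name.

```python
def upload_rate(size_bytes, seconds):
    """Return (kilobytes per second, megabits per second), decimal units."""
    bytes_per_second = size_bytes / seconds
    return bytes_per_second / 1e3, bytes_per_second * 8 / 1e6


kb_per_s, mbit_per_s = upload_rate(11e9, 20 * 3600)
print(f"{kb_per_s:.0f} KB/s ~ {mbit_per_s:.2f} Mbit/s")  # 153 KB/s ~ 1.22 Mbit/s
```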
Hi JuicyFox94,
Actually, we just added that (still on GitHub, RC soon)
https://github.com/allegroai/clearml/blob/400c6ec103d9f2193694c54d7491bb1a74bbe8e8/clearml/automation/controller.py#L696
In short, I was not able to do it with Task.clone and Task.create; the behavior differs from what is described in the docs and docstrings (this is another story - I can submit an issue on GitHub later)
The easiest is to use task_overrides.
Then pass: task_overrides = dict(script=dict(diff='', branch='main'))
FrothyShark37 what was different in your script?