Hi MortifiedCrow63
I finally got GS credentials, and there is something weird going on. I can verify the issue: with model upload I get a timeout error, while upload_artifacts just works.
Just updating here that we are looking into it.
but when I removed output_uri from Task.init, the pickled model has path
When you run the job on the k8s pod?
-e
:user/private_package.git@57f382f51d124299788544b3e7afa11c4cba2d1f#egg=private_package
Is this the correct link to the repo and a valid commit id ?
Can you post a few more lines from the agent's log ?
Something is failing to install I'm just not sure what
SoggyBeetle95 maybe it makes sense to configure the agent with an access-all credentials? Wdyt
Do I set the CLEARML_FILES_HOST to the endpoint instead of an s3 bucket?
Yes, you are right, this is not straightforward: CLEARML_FILES_HOST="s3://minio_ip:9001"
Notice you must specify the "port"; this is how it knows this is not AWS. I would avoid using an IP and instead register the minio as a host on your local DNS / firewall. This way, if you change the IP, the links will not get broken.
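A minimal sketch of the "explicit port" rule described above, using only the standard library. It illustrates the rule as stated in this thread and is not ClearML's actual detection code; the helper name `has_explicit_port` and the host name `minio.local` are hypothetical.

```python
from urllib.parse import urlparse


def has_explicit_port(files_host: str) -> bool:
    """Return True if an s3:// URL names a host with an explicit port.

    Hypothetical helper: it only mirrors the rule from the thread
    (an explicit port marks a non-AWS, e.g. minio, endpoint).
    """
    parsed = urlparse(files_host.strip())
    return parsed.scheme == "s3" and parsed.port is not None


# "minio.local" is an assumed DNS name registered for the minio server.
print(has_explicit_port("s3://minio.local:9001"))   # host with a port
print(has_explicit_port("s3://my-bucket/models"))   # plain bucket, no port
```

A check like this can be run before exporting CLEARML_FILES_HOST, to catch a value that is missing its port.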
Thank you AttractiveWoodpecker16 !
Removing the uncommitted changes so that you can launch it from an agent? Or is it visual only?
Hi @<1569496075083976704:profile|SweetShells3>
These environment variables are injected into the new process; are you passing them in the vault?
None
CourageousLizard33 column order / specific selection is stored per user. If you press the share button you will have a link with all the definitions embedded on it.
Column resizing and order is in the next version release :)
I saw documentation, but I can't make the proper dict object for hyperparams
I see, this is what you are after (I think)
https://github.com/allegroai/clearml/blob/fb644fe9ec6be36b8f2f70a34256fbdc593d663a/clearml/backend_api/services/v2_20/tasks.py#L3138
Hi PerplexedCow66
I'm assuming an extension for this:
https://github.com/allegroai/clearml-serving/issues/32
Basically, JWT can be used to allow or block access to all endpoints, which is most efficient when handled by the k8s load balancer (nginx/envoy).
But if you want a per-endpoint check (or maybe to do something based on the JWT values),
see adding JWT to FastAPI here:
https://fastapi.tiangolo.com/tutorial/security/oauth2-jwt/?h=jwt#oauth2-with-password-and-hashing-bearer-with-jwt-tokens
T...
There is no way to create an artifact/model/dataset without a task, right?
Models are an entity of their own, and you can actually create one without a Task.
(just for my own interest: how much does the enterprise version diverge from the open source version? Is it just extended or are there core changes to the enterprise version)
It adds a few security layers on top, and adds a few features that are just not part of the open source (RBAC, hyper-datasets, advanced scheduling, cu...
Can you copy the "Installed Packages" here, and point to the package causing the issue?
The reasoning is that most likely simultaneous processes will fail on GPU due to memory limit
The agent is installing the "Installed Packages" section of the Task (think of it as requirements.txt)
And again, what do you have there? Is it the outcome of the Task.init auto populating it?
But I am starting to wonder whether it would be easier to just change sys.path in the scripts that use the sibling libs.
that depends, how would the sibling packages get to a remote machine ?
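A minimal sketch of the sys.path approach mentioned above, assuming the sibling library sits one directory above the script's folder and is part of the same repository (so it also exists on the remote machine). The helper name `add_sibling_to_path` and the directory name `shared_lib` are hypothetical.

```python
import sys
from pathlib import Path


def add_sibling_to_path(script_file: str, sibling_dir: str) -> str:
    """Prepend <parent of the script's folder>/<sibling_dir> to sys.path.

    Returns the path that was added, so the caller can log it.
    """
    sibling = Path(script_file).resolve().parent.parent / sibling_dir
    sibling_str = str(sibling)
    if sibling_str not in sys.path:
        # Insert at the front so the sibling copy wins over installed ones.
        sys.path.insert(0, sibling_str)
    return sibling_str


# Typical use at the top of a script (names are illustrative):
#   add_sibling_to_path(__file__, "shared_lib")
#   import some_module_from_shared_lib
```

This only changes where Python looks for modules; the sibling directory still has to be present on whichever machine runs the script.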
I still see things being installed when the experiment starts. Why does that happen?
This only means no new venv is created, it basically means install in "default" python env (usually whatever is preset inside the docker)
Make sense ?
Why would you skip the entire python env setup ? Did you turn on venvs cache ? (basically caching the entire venv, even if running inside a container)
I mean what is the actual link?
File:// is a path to a file.
If your machine cannot access that path you get an error.
For example:
file:///home/user/file.bin
translates to /home/user/file.bin
If you do not have the file /home/user/file.bin on your machine you get an error.
GrievingTurkey78 make sense ?
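The file:// translation above can be sketched with the standard library. This is a minimal illustration for POSIX-style paths, not ClearML's own code; the helper names are hypothetical.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def file_url_to_path(url: str) -> str:
    """Translate a file:// URL into a local filesystem path (POSIX style)."""
    parsed = urlparse(url)
    if parsed.scheme != "file":
        raise ValueError(f"not a file:// URL: {url}")
    # unquote turns escapes such as %20 back into plain characters.
    return unquote(parsed.path)


def check_local_file(url: str) -> str:
    """Return the local path, or raise if this machine does not have the file."""
    path = file_url_to_path(url)
    if not Path(path).is_file():
        raise FileNotFoundError(path)
    return path


print(file_url_to_path("file:///home/user/file.bin"))  # /home/user/file.bin
```

Calling `check_local_file` on a machine that does not have /home/user/file.bin raises FileNotFoundError, which is the kind of error described above.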
Note that by default trains / clearml will not upload your weights file anywhere; it will do that only if you set "output_uri" to a specific location.
Hi JitteryCoyote63 , I cannot reproduce it... when I call set initial iteration 0, it does what I'm expecting, and resend the scalar. I tested with the clearml ignite example, any thoughts on how I can reproduce?
And is there an easy way to get all the metrics associated with a project?
Metrics are per Task, but you can get the min/max/last of all the tasks in a project. Is that it?
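A minimal sketch of combining per-Task min/max values into project-wide values. It assumes each Task summary is a nested dict shaped like {title: {series: {"last": ..., "min": ..., "max": ...}}}; that shape, the helper name `project_min_max`, and the sample numbers are assumptions for illustration. With ClearML, such summaries could presumably be collected per Task (for example via Task.get_tasks and get_last_scalar_metrics), but that part is not shown here.

```python
from typing import Dict, Iterable

# Assumed per-Task summary shape: {title: {series: {"last", "min", "max"}}}
Summary = Dict[str, Dict[str, Dict[str, float]]]


def project_min_max(task_summaries: Iterable[Summary]) -> Dict[str, Dict[str, float]]:
    """Combine per-Task min/max values into one entry per "title/series"."""
    combined: Dict[str, Dict[str, float]] = {}
    for summary in task_summaries:
        for title, series_map in summary.items():
            for series, stats in series_map.items():
                key = f"{title}/{series}"
                entry = combined.setdefault(
                    key, {"min": stats["min"], "max": stats["max"]}
                )
                entry["min"] = min(entry["min"], stats["min"])
                entry["max"] = max(entry["max"], stats["max"])
    return combined


# Illustrative data for two Tasks in the same project.
task_a = {"loss": {"train": {"last": 0.3, "min": 0.2, "max": 1.0}}}
task_b = {"loss": {"train": {"last": 0.5, "min": 0.1, "max": 0.9}}}
print(project_min_max([task_a, task_b]))
```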
we also provide a custom aux-config file. We also had to make sure to update the name inside config.pbtxt so that Triton is happy:
Good point, what would be the logic of the auto "config.pbtxt" patching we should employ ?
Unfortunately not yet in venv mode. What would you have put there?
LazyTurkey38 I think this is caused by new versions of pip reporting the wrong link:
https://github.com/bwoodsend/pip/commit/f533671b0ca9689855b7bdda67f44108387fe2a9
YummyWhale40 no idea what the pytorch-lightning guys did there. Let me check the actual code.
I am very confused now, I tried switching to my local machine and changing the clearml.conf.
It only partly worked :
Notice that Dataset.get(...) is downloading an artifact that was uploaded before; basically it gets the full URL and downloads the data. It seems the original dataset was uploaded to "localhost:8081", could that be the case?
the issue moving forward is if we restart the pod we will have to manually update that again.
Can't you map the nginx configuration file ? (making the changes persistent across pods)
when I run it on my laptop...
Then yes, you need to set the default_output_uri in your laptop's clearml.conf (just like you set it on the k8s glue)
Make sense ?
So it's seemingly not the image, but maybe something to do with how Studio runs it as a kernel.
Yeah, I think that for some reason it fails to detect that this is actually a Jupyter notebook (not really sure why). Thank you for double checking on the container!!
I'm trying to get a task to run using a specific docker image and to source a bash script before execution of the python script.
Are you running an agent in docker mode? If so, you should be able to see the output of your bash script first thing in the log
(and it will appear in the docker CMD)