Hi @<1618780810947596288:profile|ExuberantLion50> , can you please share a code snippet that reproduces this?
Hi @<1673863788857659392:profile|HomelyRabbit25> , the Dataset object should have artifacts and those should have a url attribute. I'd suggest poking around there!
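Roughly something like this (just a sketch, not tested - the dataset-to-task mapping and the url attribute are assumptions to verify on your version):
```python
from clearml import Dataset, Task

# Assumption: the dataset's backing task holds the artifacts,
# and each artifact exposes a .url attribute
ds = Dataset.get(dataset_id="<your_dataset_id>")
task = Task.get_task(task_id=ds.id)
for name, artifact in task.artifacts.items():
    print(name, artifact.url)
```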
Also, I don't think serving should run on the same machine as the server, since serving can require quite a lot of resources
Very similar to a task, a project also has a unique identifier - the ID (although I think project names are also unique)
You can get the project ID either from the UI (if you go to a specific project, the project ID will be in the URL) or from the API as documented in:
https://clear.ml/docs/latest/docs/references/api/projects#post-projectsget_all
or from the SDK as documented here:
https://clear.ml/docs/latest/docs/references/sdk/task#taskget_project_id
Plug that project ID into the filter ...
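For example, via the SDK it would look roughly like this (the project name is just a placeholder):
```python
from clearml import Task

# Look up the project ID by name, then use it wherever a project filter is needed
project_id = Task.get_project_id(project_name="MyProject")
print(project_id)
```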
MagnificentWorm7 , I'm taking a look to see if it's possible 🙂
As a workaround - I think you could split the dataset into different versions and then use Dataset.squash to merge into a single dataset
https://clear.ml/docs/latest/docs/references/sdk/dataset#datasetsquash
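Rough sketch of the workaround (dataset name and IDs are placeholders):
```python
from clearml import Dataset

# Merge the smaller dataset versions back into a single dataset
merged = Dataset.squash(
    dataset_name="my_merged_dataset",
    dataset_ids=["<dataset_id_part_1>", "<dataset_id_part_2>"],
)
```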
Hi @<1526371965655322624:profile|NuttyCamel41> , can you add the full log?
Hi @<1813020708339453952:profile|PompousGoldfish33> , it looks like clearml.conf isn't configured in the environment that the flask app is running in. Which process is giving this traceback?
Hi @<1523708920831414272:profile|SuperficialDolphin93> , simply set output_uri=/mnt/nfs/shared in Task.init
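For example (project/task names are just placeholders):
```python
from clearml import Task

# Everything the task outputs (models, artifacts) will be stored on the NFS mount
task = Task.init(
    project_name="examples",
    task_name="nfs output example",
    output_uri="/mnt/nfs/shared",
)
```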
Hi @<1664079296102141952:profile|DangerousStarfish38> , you can control it in the agent.default_docker.image section of the clearml.conf where the agent is running. You can also control it via the CLI with the --docker flag, and finally, you can also control it via the webUI in the execution tab -> container -> image section
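For reference, the clearml.conf part would look roughly like this (the image itself is just an example):
```
agent {
    default_docker {
        # default container image used when no image is set on the task
        image: "nvidia/cuda:11.8.0-runtime-ubuntu22.04"
    }
}
```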
Hi @<1570220844972511232:profile|ObnoxiousBluewhale25> , you can click on the model in the artifacts tab and that should take you to the model repository. What is logged in the url of the model?
Do you see any errors in the dev tools console (F12)?
Also are there any errors in elastic?
Also, if you open Developer Tools, do you see any errors in the console?
What if you set the default_output_uri to false?
Hi CloudySwallow27 ,
I think currently the way to do this is by disabling the framework detection and reporting the debug images manually.
You can do this by calling Task.init(auto_connect_frameworks=False)
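Something like this (a sketch - project/task names and the image path are placeholders):
```python
from clearml import Task

# Disable automatic framework reporting and log the debug images yourself
task = Task.init(
    project_name="examples",
    task_name="manual debug images",
    auto_connect_frameworks=False,
)
logger = task.get_logger()
logger.report_image(
    title="debug samples",
    series="sample",
    iteration=0,
    local_path="/path/to/image.jpg",
)
```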
Does it go back to working if you revert the changes?
Can you verify your ~/.clearml.conf has proper configuration? If you do `from clearml import Task; t = Task.init()`, does this work?
Hi @<1892021261433835520:profile|EnchantingMouse92> , I see that it says at the start of the page you linked that it is an enterprise only feature 🙂
Regarding differences, you can find a comparison between the different versions at this page - None
Just scroll down and you'll have different sections you can expand to see the differences.
It means there is nothing reporting iterations explicitly and no iterations are being reported by any framework. This means scalars will show time from start as the x-axis instead of iterations
RattyLouse61 , I think you can save the conda env yml file as an artifact; this way it would also be accessible by other tasks 🙂
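Rough idea (the file name is just an example):
```python
from clearml import Task

# Export your conda env to a yml file first (e.g. `conda env export > environment.yml`),
# then attach it to the task so other tasks can fetch it
task = Task.current_task()
task.upload_artifact(name="conda_environment", artifact_object="environment.yml")
```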
Hi SubstantialElk6 , I think you need to have Task.init() inside these subprocesses as well.
Try removing the region, it might be confusing it
Hi IrritableJellyfish76 , it looks like you need to create the services queue in the system. You can do it directly through the UI by going to Workers & Queues -> Queues -> New Queue
Hi @<1799974757064511488:profile|ResponsivePeacock56> , in that case I think you would need to actually migrate the files from the files server to S3 and then also change the links logged in MongoDB associated with the artifacts.
Are you still having these issues? Did you check if it's maybe a connectivity issue?
Hi @<1826066729852211200:profile|DullSwallow71> , I would suggest looking at it from another perspective. Check machine availability and only then push a job into a queue. You can see all the usage-related information in the 'Workers' screen under the 'Orchestration' tab.
Then you can either push jobs manually according to usage or write your own service to sample usage and push jobs accordingly.
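A very rough sketch of such a service (untested - queue/task names are placeholders, and the worker 'task' field check is an assumption to verify against the workers.get_all response):
```python
from clearml import Task
from clearml.backend_api.session.client import APIClient

# Sample current worker usage and only enqueue when something is free
client = APIClient()
workers = client.workers.get_all()
idle_workers = [w for w in workers if not getattr(w, "task", None)]
if idle_workers:
    task = Task.get_task(project_name="examples", task_name="my training task")
    Task.enqueue(task, queue_name="default")
```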
Before injecting anything into the instances you need to spin them up somehow. This is achieved by the application that is running and the credentials provided. So the credentials need to be provided to the AWS application somehow.
Hi @<1572032849320611840:profile|HurtRaccoon43> , I'd suggest trying this docker image: nvcr.io/nvidia/pytorch:23.03-py3
Hi @<1580367711848894464:profile|ApprehensiveRaven81> , I'm afraid this is the only option for the open source version. The Scale/Enterprise licenses include SSO/LDAP integrations
Hi @<1800699527066292224:profile|SucculentKitten7> , I think you're confusing the publish action with deployment. Publishing a model does not deploy it; it simply changes the state of the model to published so it cannot be modified anymore, and it also publishes the task that created it.
To deploy models you need to either use clearml-serving or the LLM deployment application