What command did you run? What were you trying to do? What was the setup?
Also, make sure to install virtualenv. I see there was a failure on that in the log as well.
Hi @<1826791494376230912:profile|CornyLobster42> , can you add logs from the VMs themselves? They should be saved on the Autoscaler
Hi @<1639074542859063296:profile|StunningSwallow12> , here are the docs for the agent - None
Try running the following script
from clearml import Task
import time
task = Task.init(output_uri="<your output URI>")
print("start sleep")
time.sleep(20)
print("end sleep")
Please add the logs
Hi @<1547028074090991616:profile|ShaggySwan64> , so the issue is when writing to the files server? Is it possible that the machine itself is having a hard time writing the data?
Hi @<1618780810947596288:profile|ExuberantLion50> , can you please add a code snippet that reproduces this?
Hi @<1673863788857659392:profile|HomelyRabbit25> , the Dataset object should have artifacts and those should have a url attribute. I'd suggest poking around there!
Hi ElegantCoyote26 ,
What happens if you delete ~/.clearml (this is the default cache for ClearML) and rerun?
Hi @<1702492411105644544:profile|YummyGrasshopper29> , console logs are saved in Elastic. I would check on the status of your container
Also, I don't think serving should run on the same machine as the server, since serving can require quite a lot of resources
In the UI, select an experiment that is giving you trouble, open the Execution tab in the experiment view, and scroll to the bottom. There is a field called "OUTPUT". What is in there?
Hi @<1665891247245496320:profile|TimelyOtter30> , not sure I follow. It looks like a misconfiguration. I think you need to see the correct settings here: None , also note the direct reference to minio 🙂
Much like a task, a project also has a unique identifier, its ID (although I think project names are also unique)
You can get the project ID from the UI (if you open a specific project, the project ID appears in the URL), from the API as documented in:
https://clear.ml/docs/latest/docs/references/api/projects#post-projectsget_all
or from the SDK as documented here:
https://clear.ml/docs/latest/docs/references/sdk/task#taskget_project_id
Plug that project ID into the filter ...
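For the SDK route, here is a minimal sketch. It assumes a working clearml.conf on the machine running it. The helper name `lookup_project_id` and the project name in the usage comment are hypothetical; only `Task.get_project_id` comes from the SDK reference linked above.

```python
def lookup_project_id(project_name):
    """Return the ID of the ClearML project with the given name."""
    # Imported lazily: clearml needs a configured clearml.conf to reach the server.
    from clearml import Task

    # Task.get_project_id returns the ID string, or None when no project matches.
    project_id = Task.get_project_id(project_name)
    if project_id is None:
        raise ValueError(f"No project named {project_name!r} was found")
    return project_id


# Usage (hypothetical project name):
# project_id = lookup_project_id("my_project")
```

The returned string is the same ID you would see in the project URL in the UI.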
MagnificentWorm7 , I'm taking a look if it's possible 🙂
As a workaround - I think you could split the dataset into different versions and then use Dataset.squash to merge into a single dataset
https://clear.ml/docs/latest/docs/references/sdk/dataset#datasetsquash
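A rough sketch of that squash workaround, assuming the smaller dataset versions already exist and you have their IDs. The helper name `squash_versions` and the values in the usage comment are hypothetical; `Dataset.squash` is the call from the link above.

```python
def squash_versions(merged_name, version_ids):
    """Merge several existing dataset versions into one new dataset."""
    # Imported lazily: clearml needs a configured clearml.conf to reach the server.
    from clearml import Dataset

    # Dataset.squash creates a new dataset that combines the listed versions.
    return Dataset.squash(dataset_name=merged_name, dataset_ids=list(version_ids))


# Usage (hypothetical name and IDs):
# merged = squash_versions("my_dataset_merged", ["<version_id_1>", "<version_id_2>"])
```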
Hi @<1526371965655322624:profile|NuttyCamel41> , can you add the full log?
Hi @<1813020708339453952:profile|PompousGoldfish33> , it looks like clearml.conf isn't configured in the environment that the flask app is running in. Which process is giving this traceback?
Hi FierceHamster54 , is this an old autoscaler instance? What is the version? You can see the version when you're on the application and click on 'More' at the top left text area
Hi @<1523708920831414272:profile|SuperficialDolphin93> , simply set output_uri="/mnt/nfs/shared" in Task.init
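Something along these lines should work, assuming /mnt/nfs/shared is mounted and writable on every machine that runs the task. The helper name and the project and task names in the usage comment are placeholders.

```python
def init_task_with_shared_output(project_name, task_name, output_uri="/mnt/nfs/shared"):
    """Start a task whose output models and artifacts are stored under output_uri."""
    # Imported lazily: clearml needs a configured clearml.conf to reach the server.
    from clearml import Task

    # output_uri tells ClearML where to upload the task's outputs; here, a shared NFS mount.
    return Task.init(project_name=project_name, task_name=task_name, output_uri=output_uri)


# Usage (placeholder names):
# task = init_task_with_shared_output("my_project", "nfs output test")
```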
Hi @<1664079296102141952:profile|DangerousStarfish38> , you can control it in the agent.default_docker.image section of the clearml.conf on the machine where the agent is running. You can also control it via the CLI with the --docker flag, or via the webUI in the execution tab -> container -> image section
Hi @<1570220844972511232:profile|ObnoxiousBluewhale25> , you can click on the model in the artifacts tab and that should take you to the model repository. What is logged in the url of the model?
Hi @<1686909730389233664:profile|AmiableSheep6> , I would suggest using the StorageManager module to pull specific files from S3.
There is no option to download specific files from a dataset. I would suggest breaking it into maybe smaller versions.
You would, however, need to pull the data locally for training anyway. Wouldn't breaking it into smaller versions help with this issue?
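For the StorageManager route, a minimal sketch, assuming S3 credentials are already set up in clearml.conf. The helper name `fetch_files` and the bucket paths in the usage comment are made up for illustration.

```python
def fetch_files(remote_urls):
    """Download specific remote files and map each URL to its local cached path."""
    # Imported lazily: clearml reads storage credentials from clearml.conf.
    from clearml import StorageManager

    # get_local_copy downloads one object into the local cache and returns its path.
    return {url: StorageManager.get_local_copy(remote_url=url) for url in remote_urls}


# Usage (made-up bucket and keys):
# paths = fetch_files(["s3://my-bucket/data/train.csv", "s3://my-bucket/data/labels.csv"])
```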
Do you see any errors in the dev tools console (F12)?
Also are there any errors in elastic?
Also, if you open Developer Tools, do you see any errors in the console?
What if you set the default_output_uri to false ?
JitteryCoyote63 Does it happen to you also with 1.1.1?
Hi CloudySwallow27 ,
I think currently the way to do this is by disabling the framework detection and reporting the debug images manually.
You can do this with Task.init(auto_connect_frameworks=False)
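Roughly like this, assuming the goal is to decide yourself which debug images get reported. The helper name, the "debug" / "manual" title and series, and the names in the usage comment are placeholders; `Task.init`, `get_logger` and `report_image` are the SDK calls.

```python
def report_debug_images(project_name, task_name, image_paths):
    """Disable framework auto-logging and report the given image files manually."""
    # Imported lazily: clearml needs a configured clearml.conf to reach the server.
    from clearml import Task

    # auto_connect_frameworks=False turns off automatic framework detection and logging.
    task = Task.init(
        project_name=project_name,
        task_name=task_name,
        auto_connect_frameworks=False,
    )
    logger = task.get_logger()
    # Report each image explicitly, one per iteration, so only these show up as debug samples.
    for iteration, path in enumerate(image_paths):
        logger.report_image(title="debug", series="manual", iteration=iteration, local_path=path)
    return task


# Usage (placeholder names and paths):
# report_debug_images("my_project", "manual debug images", ["out/sample_0.png"])
```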
Does it go back to working if you revert the changes?
Can you verify that your ~/clearml.conf has a proper configuration? If you run
from clearml import Task
t = Task.init()
does this work?
Hi @<1892021261433835520:profile|EnchantingMouse92> , I see that it says at the start of the page you linked that it is an enterprise only feature 🙂
Regarding differences, you can find a comparison between the different versions at this page - None
Just scroll down and you'll have different sections you can expand to see the differences.