What about output_uri?
If you are using StorageManager directly, output_uri
is not relevant
Can you share the StorageManager usage and the error you are getting?
So you are uploading a local file (stored in a Dataset) into a GS bucket? May I ask why?
Regarding usage (I might have a typo but this is the gist):
```
StorageManager.upload_file(
    local_file=separated_file_posix_path,
    # note: relative_to() returns a Path, so cast to str before concatenating
    remote_url=remote_file_path + str(separated_file_posix_path.relative_to(files_rgb)),
)
```
Notice that you need to provide the full upload URL (including path and file name to be used on your GS storage)
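A minimal sketch of what that full URL could look like (bucket and paths are made up for illustration):
```python
from clearml import StorageManager

# remote_url must be the complete destination, including the target file name
StorageManager.upload_file(
    local_file="/data/files_rgb/scene_01.png",
    remote_url="gs://my-bucket/separated/scene_01.png",
)
```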
Hi SubstantialElk6
What if I have OS library dependencies as well? (apt install, rpm install, etc.)
If these are OS libraries that you always need you can put them here:
https://github.com/allegroai/clearml-agent/blob/d9b9b4984bb8a83914d0ec6d53c86c68bb847ef8/docs/clearml.conf#L136
agent.extra_docker_shell_script: ["apt-get install -y bindfs", ]
In the next version, this could be controlled on a per Task basis.
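For example, in clearml.conf (a sketch; bindfs is just the package from the example above):
```
agent {
    # shell script lines executed inside the docker before the experiment starts
    extra_docker_shell_script: ["apt-get update", "apt-get install -y bindfs"]
}
```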
FYI: the default apt packages that are installed:
```
apt-get update
a...
```
SubstantialElk6 it seems the auto-resolve of the PyTorch CUDA version failed,
What do you have in the "installed packages" section?
Ohh SubstantialElk6 please use agent RC3 (the latest RC is somewhat broken, sorry, we will pull it out)
SubstantialElk6 could you post the "Installed packages" section under Execution of this specific Task?
Hi SubstantialElk6
1. clearml-agent was just updated, it should solve the issue.
2. Notice that "torch" / "torchvision" packages are resolved by the agent based on the PyTorch compatibility table. Is there a way to reproduce the issue where it fails to resolve the torch version? Could you send a full log?
3. If you want a specific torch version, you can put a direct link to the torch wheel, for example: https://download.pytorch.org/whl/cu102/torch-1.6.0-cp37-cp37m-linux_x86_64.whl
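For example, the Task's "Installed packages" section could then contain the wheel link directly (the torchvision pin here is just an illustrative companion version):
```
https://download.pytorch.org/whl/cu102/torch-1.6.0-cp37-cp37m-linux_x86_64.whl
torchvision==0.7.0
```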
SubstantialElk6 could you try with the latest (just released)?
pip install clearml-agent==0.17.2
Then if possible, could you attach the full log of the agent's execution (Task->results->Console)
I could take a look and figure that out.
This will greatly accelerate integration 🙂
CooperativeFox72 yes, 20 experiments in parallel means you always have at least 20 connections coming from different machines, and then you have the UI adding on top of that. I'm assuming the sluggishness you feel is the requests being delayed.
You can configure the API server to have more process workers, you just need to make sure the machine has enough memory to support it.
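For reference, a sketch of what that could look like in the server's docker-compose; treat the variable names as assumptions and verify them against your server version:
```yaml
# apiserver service override (env var names assumed, check your clearml-server docs)
apiserver:
  environment:
    CLEARML_USE_GUNICORN: "1"       # serve the API through gunicorn worker processes
    CLEARML_GUNICORN_WORKERS: "8"   # more workers handle more concurrent requests, but need more RAM
```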
GrievingTurkey78 can you send the entire log?
How so? Installing a local package should work, what am I missing?
VictoriousPenguin97 I'm assuming the exact same server version?
Hi WittyOwl57
Are you starting a new server from scratch or is it running on previously stored data?
So in summary: subprocess calls appear to break ClearML tracking, even if I do Task.init() in both main.py and train.py.
Okay let me see if we can reproduce & fix this, it should not be long
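For reference, a minimal sketch of the setup being described (file contents and names are assumptions):
```python
# main.py
import subprocess

from clearml import Task

task = Task.init(project_name="debug", task_name="main")
# launch the training script as a subprocess
subprocess.run(["python", "train.py"], check=True)
```
```python
# train.py
from clearml import Task

task = Task.init(project_name="debug", task_name="train")
# ... training code whose tracking reportedly breaks here ...
```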
The system denies my deletion request since it deems the venv-builds dir as in use
Sorry, yes you have to take down the agent when you delete the cache 🙂
so I assume clearml moves them from one queue to the other?
Correct. When it creates the k8s job and launches it on the cluster it moves it into the queue.
Can you see it on your k8s cluster (meaning the job/pod)?
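For example, something along these lines (pod naming and namespace depend on your glue configuration, so this is only a sketch):
```
# list pods across all namespaces and look for the one created for the task
kubectl get pods -A | grep <task-id>
```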
I look forward to your response on GitHub.
Great, I would like to make this discussion a bit more open and accessible so GitHub is probably better
I'd like to start contributing to the project...
That will be awesome!
The 'on-premise' server fails to connect to the ClearML server because of the VPN, I think
I think you are correct.
You can quickly test it, try running:
curl http://local-server:8008
and see if that works
Sure thing!
BTW: not sure if it helps, but the SaaS version integrates with Genesis Cloud. I know they provide cheap GPUs, might be worth checking.
But I believe it would be harder for our team to detect and respond to failures in the event handler functions if they were placed there, because it is unclear how we could use our existing systems and practices to do that.
Okay, I think this is the issue: handler functions are not "supposed" to fail, they are supposed to trigger Tasks, and those can fail.
e.g.:
Model Tag Trigger -> handler function creates a Task -> Task does something, like build container, trigger CI/CD etc -> Task...
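A minimal sketch of that pattern using the TriggerScheduler (project, queue, and tag names are made up; the real handler would launch whatever Task fits your CI/CD):
```python
from clearml import Task
from clearml.automation import TriggerScheduler

def on_model_tag(model_id):
    # keep the handler thin: it only enqueues a Task, the Task does the real work
    task = Task.create(
        project_name="ci",
        task_name="build-container",
        script="build.py",
    )
    Task.enqueue(task, queue_name="services")

scheduler = TriggerScheduler()
scheduler.add_model_trigger(
    name="on-released-model",
    trigger_on_tags=["released"],
    schedule_function=on_model_tag,
)
scheduler.start()
```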
named as venv_update (I believe it's still in beta). Do you think enabling this parameter significantly helps to build environments faster?
This is deprecated... it was a test to use a package that can update pip venvs, but it was never stable, we will remove it in the next version
Yes, I guess. Since pipelines are designed to be executed remotely, it may be pointless to enable an output_uri parameter in the PipelineDecorator.componen...
Thanks SmallDeer34, I think you are correct, the 'output' model is returned properly, but 'input' models are returned as the model name, not a model object.
Let me check something
Hi @<1601386194774528000:profile|AmusedPanda8>
I think the project name is ./model_training/trained_models/yolov8n-TEST_OKTODELETE/
and for some reason you have "." as a project name?
(notice nested projects are automatically created based on the project name, with '/' as the separator)
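For example (project and task names are made up):
```python
from clearml import Task

# "model_training/trained_models" automatically creates a "trained_models"
# project nested under "model_training"
task = Task.init(
    project_name="model_training/trained_models",
    task_name="yolov8n-experiment",
)
```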
TrickyRaccoon92 the title provided by write.scalars also identifies the specific metric; it is more than just a title on the plot itself. It means that this will be the name of the scalar metric (the title/series combination).
Is that your intention, or is it for viewing purposes only?
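For instance, with a TensorBoard writer (a sketch; the tag and value are made up):
```python
import tensorflow as tf

writer = tf.summary.create_file_writer("./logs")
with writer.as_default():
    # the tag "val/accuracy" becomes the scalar metric's title/series in ClearML,
    # not merely a label on the rendered plot
    tf.summary.scalar("val/accuracy", 0.91, step=1)
```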