Thanks for the logs AdorableDeer85
Notice that the log you attached shows the preprocessing is executed and the GPU backend is returning an error.
Could you provide the log of the docker compose? Specifically, the interesting part is the Triton container; I want to verify it loads the model properly
Seems like the network inside the running code cannot access localhost (even though you have --network=host). Could you test it with the machine's IP?
(Actually the best practice is to add a name for the machine in your hosts file, e.g. a line like 10.0.0.12 clearml-server in /etc/hosts, so that if you later move the server all the links will still be valid)
You can switch to docker-mode for better control over CUDA drivers, or use conda and specify cudatoolkit (this feature will be part of the next RC; meanwhile it will install the cudatoolkit based on the global cuda_version).
Hi OddAlligator72
for instance - remove all the metrics from some step onward?
(I think that as long as the Task is not published you could do such a thing directly with the REST API (aka APIClient from Python); see the sketch below)
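Something along these lines (a hedged sketch; afaik events.delete_for_task clears all logged events for the task, I'm not sure a per-iteration filter is exposed):
```
from clearml.backend_api.session.client import APIClient

client = APIClient()
# clears the logged events (scalars / plots / console) of an unpublished Task;
# note this removes everything, a "from step N onward" filter may not be
# supported, in which case you would re-report the metrics you want to keep
client.events.delete_for_task(task="<task-id>")
```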
What's the use case?
as I also noticed that uploads are sometimes slow, and I see here max_connections=2
Makes sense to me, please go ahead and add that as well (basically the same thing on _AzureBlobServiceStorageDriver.upload_object and an additional variable on the AzureContainerConfigurations class), something like the sketch below.
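Roughly like this (only a sketch to illustrate the idea; the real field/method names in the codebase may differ, and depending on the Azure SDK version the parameter is max_concurrency or max_connections):
```
from dataclasses import dataclass

@dataclass
class AzureContainerConfigurations:
    account_name: str
    container_name: str
    # new: configurable upload concurrency instead of the hardcoded 2
    max_connections: int = 2

def upload_object(blob_client, file_path, config):
    # pass the configured value through to the Azure SDK upload call
    with open(file_path, "rb") as f:
        blob_client.upload_blob(
            f, overwrite=True, max_concurrency=config.max_connections
        )
```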
Could you PR a tested draft? We will be able to take it from there
(apologies I just got to it now)
First of all, kudos on the video, this is so nice!!!
And thanks to you I think I found it:
We have to call serialize before the execute_remotely call
(the reason why sometimes it works is that it syncs in the background, so sometimes it's just fast enough and you get the config object)
Let me check if we can push an RC with a ...
AdventurousRabbit79 are you passing cache_executed_step=False to the PipelineController?
https://github.com/allegroai/clearml/blob/332ceab3eadef4997e897d171957975a247a6dc1/clearml/automation/controller.py#L129
Could you send a usage example?
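For reference, passing it per step would look something like this (project/task names here are just placeholders):
```
from clearml.automation import PipelineController

pipe = PipelineController(name="pipeline demo", project="examples", version="1.0.0")
pipe.add_step(
    name="stage_train",
    base_task_project="examples",
    base_task_name="train task",
    # force the step to re-execute instead of reusing a cached run
    cache_executed_step=False,
)
pipe.start()
```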
my pipeline controller always updates to the latest git commit id
This will only happen if the Task the pipeline creates has no specific commit ID and instead just uses the latest from the git repo. Is this the case?
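If that's it, you can pin the commit on the Task, e.g. (assuming a recent clearml SDK that exposes set_script; the commit ID is a placeholder):
```
from clearml import Task

task = Task.create(project_name="examples", task_name="pipeline step")
# pin the exact commit the agent should check out, instead of the repo HEAD
task.set_script(commit="<commit-id>")
```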
Yes, I was referring to logging the "clearml-data" Dataset ID on the Task itself, not an external database.
Make sense?
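i.e. something like (a minimal sketch; project/dataset names are placeholders):
```
from clearml import Dataset, Task

task = Task.init(project_name="examples", task_name="train")
dataset = Dataset.get(dataset_project="examples", dataset_name="my-data")
# store the Dataset ID on the Task itself, so the run records exactly
# which data version it used
task.set_parameter("General/dataset_id", dataset.id)
```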
ReassuredTiger98 there is an open issue on supporting a bash script as a pre-run step inside a docker (which will be supported in the next major release)
BTW: if you already have a Dockerfile, the fastest way would be to build the docker image and push it once; then you just specify the docker image:tag. This can be done at a Task-specific level.
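i.e. build & push once, then per Task (the image name is a placeholder):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="train")
# the agent (in docker mode) will execute this Task inside the given image
task.set_base_docker("my-registry/my-image:latest")
```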
Great! btw: final v1.2.0 should be out after the weekend
it should all be logged at the end, as I understand
Hmm let me check the code for a minute
he said it was something in the nginx config though
That makes sense 🙂
Hi AgitatedTurtle16
You can find documentation here:
https://github.com/allegroai/clearml-session
Basically it uses the clearml-agents to launch a session on one of the machines in the cluster.
In the remote session itself it installs JupyterLab + vscode-server, then it connects to the remote session (running on the agent's machine) automatically over SSH and creates a tunnel to these services.
when I am running the pipeline remotely, is there a way the remote machine can access it?
Well, for the dataset to be accessible you need to upload it with the Dataset class; then the remote machine can call Dataset.get(...).get_local_copy() to get the actual data on the remote machine
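A minimal sketch (paths/names are placeholders):
```
from clearml import Dataset

# locally: create, upload and finalize the dataset once
ds = Dataset.create(dataset_project="examples", dataset_name="my-data")
ds.add_files("/path/to/data")
ds.upload()
ds.finalize()

# on the remote machine: fetch a local copy by project/name (or by id)
local_path = Dataset.get(
    dataset_project="examples", dataset_name="my-data"
).get_local_copy()
```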
Hi PungentLouse55
it depends on the trains-server version you are running.
If the trains-server >= 0.16 then you have to add the "Args/" prefix. If you are running an older version, then you should not add any prefix.
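For example (a sketch; the parameter name is a placeholder):
```
from trains import Task

task = Task.init(project_name="examples", task_name="example")
# trains-server >= 0.16: hyperparameters are namespaced, use the "Args/" prefix
task.set_parameter("Args/batch_size", 32)
# older servers: no prefix
# task.set_parameter("batch_size", 32)
```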
Hi SmallDeer34
Hmm I'm not sure you can; the code will by default use rglob with the last part of the path as the wildcard selection
😞
You can of course manually create a zip file...
How would you change the interface to support it?
upload_artifact will actually do two things:
1. upload the file to the trains-server
2. register it as an artifact on the experiment
What did you mean by "register the artifact manually"? You still need to upload the file to the trains-server (so it is later accessible)
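e.g. zipping manually and then uploading (a minimal sketch; file paths are placeholders):
```
import zipfile
from trains import Task

task = Task.init(project_name="examples", task_name="artifacts demo")

# manually zip exactly the files you want (instead of the rglob wildcard)
with zipfile.ZipFile("my_files.zip", "w") as zf:
    zf.write("data/a.csv")
    zf.write("data/b.csv")

# uploads the file to the trains-server (or configured storage) AND
# registers it as an artifact on the experiment, in one call
task.upload_artifact(name="my files", artifact_object="my_files.zip")
```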
Hi GloriousPenguin2
Had to do some linux updates and redeploy the clearml server; now I can access the web UI & the service only if I do port-forwarding to that remote machine
So you are saying that before you were able to directly browse to the server, but now you need a "jump box"?
IrritableJellyfish76 point taken, suggestions on improving the interface?
Hi ShinyWhale52
Luigi's approach is basically an extension of a functional DAG, where each node is a single function. Let's think of Kedro as an extension of this approach.
With both, the assumption is that a node is a single function (sometimes it really is) and we just want to create a meta execution path (i.e. the execution DAG, quite similar to TF v1).
ClearML pipelines are a different story (in a way).
The main difference is that with ClearML each node is a Task, not a function. That means...
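Concretely, a sketch of what that looks like (each step references an existing experiment; names are placeholders):
```
from clearml.automation import PipelineController

pipe = PipelineController(name="demo pipeline", project="examples", version="1.0.0")
# each step is a full Task (its own repo, environment, artifacts),
# not just a function
pipe.add_step(name="prepare", base_task_project="examples",
              base_task_name="prepare data")
pipe.add_step(
    name="train",
    parents=["prepare"],
    base_task_project="examples",
    base_task_name="train model",
    # wire the previous step's Task ID in as a parameter
    parameter_override={"General/dataset_task_id": "${prepare.id}"},
)
pipe.start()
```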
Hmm, I'm assuming something is wrong here:
https://github.com/allegroai/clearml-server/blob/a64c4d264d00eadd2d11818b37151d3cc6266d99/docker/docker-compose.yml#L119
What's the host machine OS?
IrritableJellyfish76 hmm maybe we should add an extra argument partial_name_matching=False to maintain backwards compatibility?
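i.e. something like this (the new argument is only a proposal, not an existing API; today task_name is treated as a regex, so it matches partially by default):
```
from clearml import Task

# current behavior: task_name is a regex, so "train" also matches "train-v2"
tasks = Task.get_tasks(project_name="examples", task_name="train")

# proposed, backwards compatible:
# tasks = Task.get_tasks(project_name="examples", task_name="train",
#                        partial_name_matching=False)  # exact match only
```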
You mean the entire organization already has Kubeflow, or that it's to better organize something? (If it's the latter, what are we organizing, pipelines?)
IrritableJellyfish76 if this is the case, my question is what is the reason to use Kubeflow? (spinning up a JupyterLab server is a good answer, for example; pipelines, in my opinion, a lot less so)
Oh, then no, you should probably do the opposite 🙂
What is the flow like now? (meaning what are you using kubeflow for and how)
Not sure I follow; you mean to launch it on the Kubernetes cluster from the ClearML UI?
(like the clearml-k8s-glue ?)
Hi MelancholyElk85
However, when I clone the pipeline from the web UI and launch it once again, it works. Is there a way to bypass this?
In both cases, are you seeing a different behavior on the same machine running the agent (i.e. cloning from the UI vs. from code)?
Thanks BroadSeaturtle49
I think I was able to locate the issue: != breaks the pytorch lookup.
I will make sure we fix it asap and release an RC.
BTW: how come 0.13.x has no linux x64 support? And the same for 0.12.x:
https://download.pytorch.org/whl/cu111/torch_stable.html
BroadSeaturtle49 agent RC is out with a fix:
```
pip3 install clearml-agent==1.5.0rc0
```
Let me know if it solved the issue