If the file is untracked by git, it is not saved by ClearML
Yep
Does clearml-agent install the repo with
pip install -e .
It is supported, but the path to the repo cannot be absolute (as it will probably be something else in the agent env)
You can add "git+https://github.com ...." to the "installed packages". The root path of your repository is always added to the PYTHONPATH when the agent executes it, so in theory there is no need to install it wi...
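To illustrate the PYTHONPATH point, here is a minimal sketch (not the agent's actual code) showing that a package becomes importable once its repository root is on the module search path, without running pip install -e . The package name my_pkg and the temporary directory are hypothetical stand-ins for a real repository.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_from_repo_root(repo_root: str, module_name: str):
    """Import a module after putting the repository root on sys.path.

    This mimics the effect of having the root on PYTHONPATH. It is an
    illustration only, not how clearml-agent is implemented.
    """
    if repo_root not in sys.path:
        sys.path.insert(0, repo_root)
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# Build a throwaway "repository" containing a hypothetical package, my_pkg.
repo_root = tempfile.mkdtemp()
pkg_dir = Path(repo_root) / "my_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("ANSWER = 42\n")

# No installation step: the package is found because its parent is on the path.
my_pkg = import_from_repo_root(repo_root, "my_pkg")
print(my_pkg.ANSWER)
```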
No worries, I'll see if I can replicate it anyhow
So currently there is a limit (from Elasticsearch) of about 10k (anything above that is subsampled)
In the new version we are adding a "maximize" button, then in the full screen you will have the raw data including all ???k samples. sounds good?
Or use python:3.9 when starting the agent
This is probably the best solution
The 'on-premise' server fails to connect to the ClearML server because of the VPN I think
I think you are correct.
You can quickly test it: try to run curl http://local-server:8008 and see if that works
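If curl is not available on that machine, the same check can be sketched in Python. This is a minimal sketch using only the standard library; the helper name can_reach is made up for illustration, and the URL in the usage comment is the placeholder from the message above.

```python
import urllib.error
import urllib.request


def can_reach(url: str, timeout: float = 3.0) -> bool:
    """Return True if any HTTP response comes back from url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just with an error status (e.g. 401 or 404),
        # so the network path itself is fine.
        return True
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return False


# Usage (run from the machine that fails to connect):
#   can_reach("http://local-server:8008")
```

A False result from the on-premise machine while the same call succeeds elsewhere would point at the VPN or routing rather than at the server.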
@<1587253076522176512:profile|HollowPeacock33>
Is this a commercial ad? This seems out of scope for this channel
Can you expand?
Out of curiosity, what ended up being the issue?
Hi RoundMosquito25
What do you mean by "local commits" ?
we have a separate cache
Why? they can share
That makes sense...
Basically in the open-source version the approach is everyone sees everything for maximum transparency (and also ease of use). I know there are access-roles in the paid tier and vault for exactly these types of things...
Where do you currently save them? and how do you pass them to the remote machine ?
Notice there is a scroll_id there; you might need to call the API multiple times until you have scrolled over all the events
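The scroll loop can be sketched as follows. This is a generic sketch under assumptions: fetch_page is a hypothetical callable wrapping whatever API call you use, and the response shape (an "events" list plus a "scroll_id" key) is assumed here rather than taken from the API reference.

```python
from typing import Callable, Dict, List, Optional


def collect_all_events(
    fetch_page: Callable[[Optional[str]], Dict],
    max_pages: int = 1000,
) -> List[dict]:
    """Call fetch_page repeatedly, passing the scroll_id back each time."""
    events: List[dict] = []
    scroll_id: Optional[str] = None
    for _ in range(max_pages):  # hard cap so a misbehaving API cannot loop forever
        page = fetch_page(scroll_id)
        batch = page.get("events", [])
        if not batch:
            break  # an empty page means everything has been scrolled over
        events.extend(batch)
        scroll_id = page.get("scroll_id")
        if scroll_id is None:
            break  # no cursor returned, nothing more to request
    return events


def fake_fetch(scroll_id: Optional[str]) -> Dict:
    """Stand-in for the real API call: serves 5 events, 2 per page."""
    start = int(scroll_id) if scroll_id else 0
    batch = [{"iter": i} for i in range(5)[start:start + 2]]
    return {"events": batch, "scroll_id": str(start + 2)}


all_events = collect_all_events(fake_fetch)
print(len(all_events))
```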
could that be it?
I was just able to reproduce with "localhost"
ClearML automatically picks up these reported metrics from TB. Since you mentioned you see the scalars, I assume huggingface reports to TB. Could you verify? Is there a quick code sample to reproduce?
Hi @<1560798754280312832:profile|AntsyPenguin90>
The image itself is uploaded in a background process; flush just triggers the start of that process.
Could it be that it is showing a few seconds after?
Hi CheerfulGorilla72
see
Notice all posts on that channel are @channel
I have the problem that "debug samples" are not shown anymore after running many iterations.
ReassuredTiger98 could you expand on it? What do you mean by "not shown anymore" ?
Can you see other reports ?
Thanks for the logs @<1627478122452488192:profile|AdorableDeer85>
Notice that the log you attached means the preprocessing is executed and the GPU backend is returning an error.
Could you provide the log of the docker compose? Specifically, the interesting part is the Triton container; I want to verify it loads the model properly
It seems the running code cannot reach localhost over the network (even though you have --network=host). Could you test it with the machine's IP?
(Actually the best practice is to add a name to the machine (in your hosts file), so that if later you move the server, all the links will be valid)
You can switch to docker-mode for better control over cuda drivers, or use conda and specify cudatoolkit (this feature will be part of the next RC, meanwhile it will install the cudatoolkit based on the global cuda_version).
Hi OddAlligator72
for instance - remove all the metrics from some step onward?
(I think that as long as the Task is not published you could do such a thing directly with the RestAPI, aka APIClient from python)
What's the use case?
as i also noticed that uploads are sometimes slow, and i see here max_connections=2
Makes sense to me, please go ahead and add that as well (basically the same thing on _AzureBlobServiceStorageDriver.upload_object and an additional variable on the AzureContainerConfigurations class).
Could you PR a tested draft ? we will be able to take from there
(apologies I just got to it now)
First of all, kudos on the video, this is so nice!!!
And thanks to you I think I found it:
we have to call serialize before the execute_remotely
(the reason why sometimes it works is that it syncs in the background, so sometimes it's just fast enough and you get the config object)
Let me check if we can push an RC with a ...
AdventurousRabbit79 are you passing cache_executed_step=False to the PipelineController ?
https://github.com/allegroai/clearml/blob/332ceab3eadef4997e897d171957975a247a6dc1/clearml/automation/controller.py#L129
Could you send a usage example ?
my pipeline controller always updates to the latest git commit id
This will only happen if the Task the pipeline creates has no specific commit ID, and instead just uses the latest from the git repo. Is this the case ?
Yes, I was referring to logging the "clearml-data" Dataset ID on the Task itself, not an external database.
Makes sense?
ReassuredTiger98 there is an open issue on supporting bash script as pre run inside a docker (which will be supported in the next major release)
BTW: if you already have a docker file, the fastest way would be to build the docker file and push it once; then you just specify the docker image:tag. This can be done at a Task-specific level.
Great! btw: final v1.2.0 should be out after the weekend
It should all be logged at the end, as I understand it
Hmm let me check the code for a minute
he said it was something in the nginx config though
That makes sense
Hi AgitatedTurtle16
You can find documentation here:
https://github.com/allegroai/clearml-session
Basically it uses the clearml-agents to launch a session on one of the machines in the cluster.
In the remote session itself it installs jupyterlab + vscode-server, then it connects to the remote session (running on the agent's machine) automatically over ssh and creates tunnels to these services.
When I am running the pipeline remotely is there a way the remote machine can access it?
Well, for the dataset to be accessible, you need to upload it with the Dataset class; then the remote machine can do Dataset.get(...).get_local_copy() to get the actual data on the remote machine
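Roughly, the two halves look like this. It is a minimal sketch assuming the clearml package is installed and configured against your server; the function names, folder, dataset name and project are hypothetical placeholders.

```python
def upload_dataset(local_folder: str, name: str, project: str) -> str:
    """Run locally: register the folder as a dataset and upload its files."""
    # Imported lazily so this file can be loaded without clearml installed.
    from clearml import Dataset

    dataset = Dataset.create(dataset_name=name, dataset_project=project)
    dataset.add_files(path=local_folder)
    dataset.upload()    # push the files to the configured storage
    dataset.finalize()  # close the dataset version so it can be consumed
    return dataset.id


def fetch_dataset(dataset_id: str) -> str:
    """Run on the remote machine: download (or reuse a cached) local copy."""
    from clearml import Dataset

    return Dataset.get(dataset_id=dataset_id).get_local_copy()


# Usage, with placeholder values:
#   dataset_id = upload_dataset("./data", "my_dataset", "my_project")
#   local_path = fetch_dataset(dataset_id)  # inside the pipeline step
```

Passing the returned dataset ID to the pipeline step as a parameter is one way to let the remote machine know which dataset to fetch.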