GrumpyPenguin23 could you help and point us to an overview/getting-started video?
Sorry if it's something trivial. I recently started working with ClearML.
No worries, this actually has more to do with how you work with Dask
The Task ID is the unique ID of any Task in the system (task.id will return the UID str)
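A minimal sketch of reading the ID, assuming a reachable ClearML server; the project and task names are placeholders:

```python
from clearml import Task

# Task.init registers (or resumes) a task; task.id is its unique ID string
task = Task.init(project_name="examples", task_name="id demo")
print(task.id)
```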
Can you post a toy Dask code here, I'll explain how to make it compatible with ClearML 🙂
So is there any tutorial on this topic
Dude, we just invented it 🙂
Any chance you feel like writing something in a github issue, so other users know how to do this ?
Guess I’ll need to implement job scheduling myself
You have a scheduler: it will pull jobs from the queue in order, then run them one after the other (one at a time)
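A sketch of the usual way to get that behavior, assuming the clearml-agent package is installed and configured; "default" is a placeholder queue name:

```shell
# Run one agent process that services the queue,
# executing enqueued tasks one at a time in order
clearml-agent daemon --queue default
```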
Hmm, I guess now that you mention it, it's not that obvious when I'm on a Mac as well; maybe we should have the archive button at the bottom as well..
SteadyFox10 What do you think?
from task pick-up to "git clone" is now ~30s, much better.
This is "spent" calling apt update && apt install && pip install clearml-agent
if you have those preinstalled it should be quick
though as far as I understand, the recommendation is still not to run workers-in-docker like this:
if you do not want it to install anything and just use the existing venv (leaving the venv as is), and if something is missing then so be it, then yes, sure, that's the way to go
Oh, did you try task.connect_configuration ?
https://allegro.ai/docs/examples/reporting/model_config/#using-a-configuration-file
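A hedged sketch of the linked approach, assuming a reachable ClearML server; "model_config.yaml" is a placeholder path:

```python
from clearml import Task

task = Task.init(project_name="examples", task_name="config demo")
# connect_configuration registers the file with the task; when the task is
# cloned and run by an agent, it returns a (possibly overridden) local copy
config_path = task.connect_configuration("model_config.yaml")
# read the model configuration from config_path as usual
```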
LOL, if this is important we probably could add some support (meaning you will be able to specify it in the "installed packages" section, per Task).
If you find an actual scenario where it is needed, I'll make sure we support it 🙂
Hi TrickyRaccoon92
... would any running experiment keep a cache of to-be-sent-data, fail the experiment, or continue the run, skipping the recordings until the server is back up?
Basically they will keep trying to send data to the server until it is up again (you should not lose any of the logs)
Is there any clever functionality for dumping experiment data to external storage to avoid filling up the server?
You mean artifacts or the database ?
That might be me, let me check...
You mean to add these two to the model when deploying?
│ ├── model_NVIDIA_GeForce_RTX_3080.plan
│ └── model_Tesla_T4.plan
Notice the preprocess.py is not running on the GPU instance, it is running on a CPU instance (technically not the same machine)
I want to run only that sub-DAG on all historical data in an ad-hoc manner
But wouldn't that be covered by the caching mechanism ?
Hi @<1556450111259676672:profile|PlainSeaurchin97>
While testing the migration, we found that all of our models had their MODEL URL set to the IP of the old server.
Yes, all the artifacts/models/debug-samples are stored "as is", which means that if you configured your original setup with an IP, it is kind of stuck there; this is why it is always preferred to use a host name ...
you apparently also need to rename all model URLs
Yes 😞
Hmmm, yes we should definitely add --debug (if you can, please add a GitHub issue so it is not forgotten).
FiercePenguin76 Specifically are you able to ssh manually to <external_address>:<external_ssh_port> ?
from the notebook run !ls ~/clearml.conf
By default the agent will add the root of the git repository into the pythonpath , so that you can import...
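A sketch of what that effectively amounts to; repo_root here is a stand-in for the agent's cloned repository root, not the agent's actual code:

```python
import sys
from pathlib import Path

# Prepend the repository root so top-level packages in the repo
# are importable from any script inside it
repo_root = Path.cwd()  # stand-in for the cloned repo root
if str(repo_root) not in sys.path:
    sys.path.insert(0, str(repo_root))
```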
and this?
avg(100*increase(test12_model_custom:Glucose_bucket[1m])/increase(test12_model_custom:Glucose_sum[1m]))
Can you post the toml file? Maybe the answer is there
In that case you should probably mount the .ssh from the host file-system into the docker. For example:
docker run -v /home/user/.ssh:/root/.ssh ...
WickedGoat98 the above assumes you are running the docker manually; if you are using a docker-compose.yml file, the same mount should be added to the docker-compose.yml
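A hypothetical docker-compose fragment showing the equivalent mount; the service name "agent" and the host path are placeholders for your actual setup:

```yaml
services:
  agent:
    volumes:
      # mount the host's SSH keys into the container, mirroring
      # the `docker run -v` example above
      - /home/user/.ssh:/root/.ssh
```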
I see, so basically fix old links that are now not accessible? If this is the case you might need to manually change the document on the mongodb running in the backend
I'm sorry, my bad, this is use_current_task
https://github.com/allegroai/clearml/blob/6d09ff15187197e1f574902352115aa08dc1c28a/clearml/datasets/dataset.py#L663
task = Task.init(...)
dataset = Dataset.create(..., use_current_task=True)
dataset.add_files(...)
Follow-up question: how does ClearML "inject" the argparse arguments before the task is initialized?
it patches the actual parse_args call; to make sure it works, you just need to make sure argparse was imported before the actual call takes place
I had to do another workaround, since when torch.distributed.run called its ArgumentParser, it was getting the arguments from my script (and from my task) instead of the ones I passed it
Are you saying...
ReassuredTiger98
Okay, but you should have had the prints "uploading artifact" and "done uploading artifact"
So I suspect something is going on with the agent.
Did you manage to run any experiment on this agent ?
EDIT: Can you try with artifacts example we have on the repo:
https://github.com/allegroai/clearml/blob/master/examples/reporting/artifacts.py
DefeatedOstrich93 can you verify lightning actually only stored it once?
:) yes, on your gateway/firewall map demoapi.trains.allegro.ai to 127.0.0.1 . That's always good practice ;)
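On a single machine, a minimal way to do this mapping is a hosts-file entry (assuming you control the machine; firewalls and gateways have their own equivalents):

```
# /etc/hosts entry pinning the demo server hostname to localhost
127.0.0.1    demoapi.trains.allegro.ai
```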
I suspect permissions, but I'm not entirely sure what and where
Seems like it.
Check the config file on the agent machine
https://github.com/allegroai/clearml-agent/blob/822984301889327ae1a703ffdc56470ad006a951/docs/clearml.conf#L18
https://github.com/allegroai/clearml-agent/blob/822984301889327ae1a703ffdc56470ad006a951/docs/clearml.conf#L19
Hi SoreDragonfly16
Sadly no, the idea is to create full visibility for all users in the system (basically saying: share everything with your colleagues).
That said, I know the enterprise version has permission/security features; I'm sure it covers this scenario as well.
Hi SubstantialElk6
Generically, we would 'export' the preprocessing steps, setup an inference server, and then pipe data through the above to get results. How should we achieve this with ClearML?
We are working on integrating the OpenVINO serving and NVIDIA Triton serving engines into ClearML (they will both be available soon)
Automated retraining
In cases of data drift, retraining of models would be necessary. Generically, we pass newly labelled data to fine...