I would like to bypass this behavior because my code has a need for a specific version of PyTorch.
DilapidatedCow43 you will get exactly the PyTorch version you need, but compiled against the CUDA version that is installed (the PyTorch people actually maintain multiple wheels built against different CUDA versions)
This is already part of the docker-compose file,
https://github.com/allegroai/clearml-server/blob/master/docker/docker-compose.yml
hi @<1546303293918023680:profile|MiniatureRobin9>
I can still see the metrics in Grafana. I...
It will not delete it from Grafana; it means it is no longer being collected. Makes sense?
Can you test with the Hydra example? If the example works, any chance you can send a toy example to reproduce it?
https://github.com/allegroai/clearml/tree/master/examples/frameworks/hydra
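If it helps, here is a minimal Hydra + ClearML sketch (the project/task names and config path are assumptions based on the example layout):
```python
import hydra
from omegaconf import DictConfig

from clearml import Task


@hydra.main(config_path="config_files", config_name="config")
def main(cfg: DictConfig) -> None:
    # ClearML's hydra binding should log the composed config automatically
    Task.init(project_name="examples", task_name="hydra example")  # hypothetical names
    print(cfg)


if __name__ == "__main__":
    main()
```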
DepressedChimpanzee34 I cannot find cfg.py here
https://github.com/allegroai/clearml/tree/master/examples/frameworks/hydra/config_files
(or anywhere else)
Ohh, not really 😞 this is really low-level, editing the DB directly.
You might be able to forcefully edit the links (i.e. artifacts) on the Dataset (task)
Check if this works:
from clearml.backend_api.session.client import APIClient

c = APIClient()
t = c.tasks.get_by_id("DATASET_UUID_HERE")
# you might need to loop over all the artifacts, not just the first one
t.data.execution.artifacts[0].uri = "NEW_URI_HERE"  # placeholder: put the new link here
c.tasks.edit(task=t.id, execution=t.data.execution, force=True)
I'm trying to queue a task in python but I'd like to reuse the prior task ID.
Is it your own Task? I.e. do you enqueue it yourself? If this is the case, use task.execute_remotely
it will do just that.
If this is another Task and it is aborted, then you can just enqueue it; by definition it will continue with the same Task ID.
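A minimal sketch of the execute_remotely flow (project/queue names here are hypothetical):
```python
from clearml import Task

task = Task.init(project_name="examples", task_name="my task")  # hypothetical names

# Relaunch this exact Task (same Task ID) on the "default" queue:
# clone=False keeps the original Task ID, exit_process=True stops the local run.
task.execute_remotely(queue_name="default", clone=False, exit_process=True)
```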
Hi @<1603198134261911552:profile|ColossalReindeer77>
When you select poetry as the package manager, the agent passes control to poetry, which means poetry needs to decide on the correct torch wheel based on your CUDA version. I do not think poetry can do that automatically, but I do think you can specify the extra index URL to take the torch wheel from:
None
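For example, a hedged pyproject.toml sketch (the source name, torch version, and cu118 index URL are assumptions; the priority key needs poetry >= 1.5):
```toml
[[tool.poetry.source]]
name = "pytorch-cu118"
url = "https://download.pytorch.org/whl/cu118"
priority = "explicit"

[tool.poetry.dependencies]
torch = { version = "2.1.0", source = "pytorch-cu118" }
```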
DrabSwan66
Did you set "docker_install_opencv_libs: true" in your clearml.conf on the host machine ?
https://github.com/allegroai/clearml-agent/blob/e416ab526ba9fe05daa977b34c9e46b50fb214a0/docs/clearml.conf#L150
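For reference, a minimal sketch of that entry in clearml.conf (matching the linked line):
```
agent {
    # install the OpenCV system libraries inside the docker container
    docker_install_opencv_libs: true
}
```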
Just making sure, you are running clearml-agent in docker mode, correct?
What's the container you are using?
Hmm, maybe the original Task was executed with an older version? (before the section names were introduced)
Let's try:
DiscreteParameterRange('epochs', values=[30]),
Does that give a warning?
ReassuredTiger98 quick update, the issue was located, next RC will already contain a fix.
In the meantime, you can avoid it by limiting the pip version:
https://github.com/allegroai/clearml-agent/blob/715f102f6d98a44131d5bee909ee779b456c6229/docs/clearml.conf#L67
pip_version: "<20.2"
Hey JoyousKoala59, it seems the helm chart for the clearml server is due to be released tomorrow. My apologies for the confusion :(
Hi @<1523709807092043776:profile|GrittyKangaroo27>
some of my completed datasets,
This only has an effect on the dataset while it is being uploaded; once completed, it is there for logging purposes only. What exactly is the use case? (Just to verify: once a Task/Dataset is completed you cannot edit it.)
Back to the feature request: if this is taken care of (both adding a missed package, and the S3 upload), do you still believe there is room for this kind of feature?
Hi @<1577106212921544704:profile|WickedSquirrel54>
We are self hosting it using Docker Swarm
Nice!
and were wondering if this is something that the community would be interested in.
Always!
What did you have in mind? I have to admit I'm not familiar with the latest in Docker Swarm, but we all love Docker, the product and the company
AgitatedTurtle16 from the screenshot, it seems the Task is stuck in the queue, which means there is no agent running to actually run the interactive session.
Basic setup:
- A machine running clearml-agent (this is the "remote machine")
- A machine running clearml-session (let's call it the laptop 🙂)
You need to first start the agent on the "remote machine" (basically call clearml-agent daemon --docker --queue default). Once the agent is running on the remote machine, from your laptop ru...
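Putting it together, a rough sketch of both sides (the queue name is an assumption; check clearml-session --help for the exact flags in your version):
```bash
# On the remote machine: start an agent in docker mode on the "default" queue
clearml-agent daemon --docker --queue default

# On the laptop: request an interactive session through that same queue
clearml-session --queue default
```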
Hi HollowDolphin18
Sure, just use:
Task.set_credentials(
    api_host=None, web_host=None, files_host=None,
    key=None, secret=None, store_conf_file=False
)
https://github.com/allegroai/clearml/blob/912f6f5ba2328b26de042de03f02de5802df360f/clearml/task.py#L2153
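A hedged usage sketch (the host URLs and keys below are placeholders):
```python
from clearml import Task

# Placeholder hosts/credentials - replace with your own server values
Task.set_credentials(
    api_host="https://api.clear.ml",
    web_host="https://app.clear.ml",
    files_host="https://files.clear.ml",
    key="YOUR_ACCESS_KEY",
    secret="YOUR_SECRET_KEY",
    store_conf_file=False,  # do not persist a clearml.conf to disk
)
task = Task.init(project_name="examples", task_name="credentials demo")  # hypothetical names
```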
not available 😞
This is what I think you should end up with:
DiscreteParameterRange('General/dataset_url', values=["option 1 for url", "option 2 for url"])
If args['dataset_url'] is a list, you should just do values=args['dataset_url']
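A minimal sketch of how that range could plug into an optimizer (the base task ID, metric names, and URLs are placeholders):
```python
from clearml.automation import (
    DiscreteParameterRange,
    GridSearch,
    HyperParameterOptimizer,
)

# Placeholder args - in practice this comes from your own argument parsing
args = {"dataset_url": ["option 1 for url", "option 2 for url"]}

optimizer = HyperParameterOptimizer(
    base_task_id="BASE_TASK_ID_HERE",  # placeholder: the Task to clone per trial
    hyper_parameters=[
        DiscreteParameterRange("General/dataset_url", values=args["dataset_url"]),
    ],
    objective_metric_title="validation",  # placeholder metric title/series
    objective_metric_series="loss",
    objective_metric_sign="min",
    optimizer_class=GridSearch,
)
optimizer.start_locally()
```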
docstring?
Usually the preferred way is StorageManager
https://clear.ml/docs/latest/docs/references/sdk/storage
https://clear.ml/docs/latest/docs/integrations/storage
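A short usage sketch (the bucket and paths are hypothetical):
```python
from clearml import StorageManager

# Download a remote object into the local cache; returns the local path
local_path = StorageManager.get_local_copy(
    remote_url="s3://my-bucket/datasets/data.zip"  # hypothetical bucket/path
)

# Upload a local file to remote storage; returns the remote URL
remote_url = StorageManager.upload_file(
    local_file="/tmp/results.csv",
    remote_url="s3://my-bucket/results/results.csv",
)
```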
BattyLion34 is this consistent?
(Really, I can't see any difference; one time it is able to create the venv and another time it fails with a permission error)
Thanks! @<1792364603552829440:profile|TestyBeetle31> I'll pass it to the maintainers
Hi @<1541954607595393024:profile|BattyCrocodile47> and @<1523701225533476864:profile|ObedientDolphin41>
"we're already on AWS, why not use SageMaker?"
TBH, I've never gone through the ML workflow with SageMaker.
LOL I'm assuming this is why you are asking 🙂
- First, you can use SageMaker and still log everything to ClearML (2 lines integration, sketched below). At least you will have visibility into everything that is running/failing 🙂
- SageMaker job is a container, which means for ...
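The "2 lines integration" mentioned above is essentially:
```python
from clearml import Task

task = Task.init(project_name="sagemaker-jobs", task_name="training-run")  # hypothetical names
```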
WackyRabbit7 I might be missing something here, but the pipeline itself should be launched on the "pipelines" queue. Is the pipeline itself running? Or is it the step itself that is stuck in the "queued" state?
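A minimal sketch of the distinction (project, task, and queue names are assumptions):
```python
from clearml.automation import PipelineController

pipe = PipelineController(name="my pipeline", project="examples", version="1.0.0")

# Each step runs on its own execution queue (e.g. "default")
pipe.add_step(
    name="step_one",
    base_task_project="examples",      # hypothetical base task to clone
    base_task_name="step one base",
    execution_queue="default",
)

# The pipeline controller itself is launched on the "pipelines" queue
pipe.start(queue="pipelines")
```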
Like getting the tasks that use the metrics API the most?