REMOTE MACHINE:
- git ssh key is located at ~/.ssh/id_rsa
Is this also mounted into the docker container itself?
SparklingElephant70 , Hi 🙂
Please create a queue in the system called 'services' and run an agent against that queue
How are you building the pipeline?
Pending means it is enqueued. Check to which queue it belongs by looking at the info tab after clicking on the task :)
You'll need to assign an agent to run on the queue, something like this: 'clearml-agent daemon --foreground --queue services'
But you said that the pipeline demo is stuck. Which task is the agent running?
I think you can get the task from outside and then add tags to that object
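Something like this should do it (a rough sketch - the task ID and tag are placeholders):
```
from clearml import Task

# fetch the task from "outside" by its ID and attach tags to it
task = Task.get_task(task_id="<your_task_id>")  # placeholder ID
task.add_tags(["my-tag"])
```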
Hi, how do you connect your configs currently?
You mean you'd like to be able to connect/create configuration objects via the UI?
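For reference, this is roughly how a configuration object gets connected from code so it shows up (and can be edited) in the UI - project/task/config names here are just placeholders:
```
from clearml import Task

task = Task.init(project_name="examples", task_name="config demo")

# connect a dict as a named configuration object;
# it appears under the task's Configuration tab
config = {"learning_rate": 0.001, "batch_size": 32}
config = task.connect_configuration(configuration=config, name="training_config")
```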
Hi SubstantialElk6 ,
Define, prior to running the pipeline, which tasks should run on which remote queue and with which images?
What type of pipeline steps are you running? From task, decorator or function?
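For example, with function steps it could look roughly like this (queue name, image and the step function below are placeholders, not taken from your setup):
```
from clearml import PipelineController

pipe = PipelineController(name="demo pipeline", project="examples", version="1.0.0")

def step_one(data_path):
    # placeholder step body
    return data_path

pipe.add_function_step(
    name="step_one",
    function=step_one,
    function_kwargs=dict(data_path="/data/raw"),
    execution_queue="gpu_queue",  # which remote queue runs this step
    docker="nvidia/cuda:11.8.0-runtime-ubuntu22.04",  # which image the agent spins up for it
)

# the controller itself can run on the services queue
pipe.start(queue="services")
```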
Make certain tasks in the pipeline run in the same container session, instead of spawning new container sessions? (To improve efficiency)
If they're all running in the same container, why not make them the same task and do things in parallel?
Hi @<1610445887681597440:profile|WittyBadger59> , how are you reporting the plots?
I would suggest taking a look here and running all the different examples to see the reporting capabilities:
None
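For example, manual plot reporting looks roughly like this (project/task names and the data are placeholders):
```
import numpy as np
from clearml import Task

task = Task.init(project_name="examples", task_name="plot reporting demo")
logger = task.get_logger()

# report a 2D scatter plot - it will show up under the task's Plots tab
scatter = np.random.randint(10, size=(10, 2))
logger.report_scatter2d(
    title="example scatter",
    series="series_a",
    iteration=0,
    scatter=scatter,
    xaxis="x",
    yaxis="y",
)
```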
SkinnyPanda43 , I think so yes 🙂
Can you provide a snippet to try and reproduce?
Hi @<1523708920831414272:profile|SuperficialDolphin93> , once you deleted the cache folder did it work?
Also, did you try pulling the specific commit using the same credentials that are defined on the agent machine?
Hi @<1523701553372860416:profile|DrabOwl94> , do you see any errors in Elasticsearch?
Is it possible the machines are running out of memory? Do you get this error on the pipeline controller itself? Does this constantly reproduce?
Why do you want to keep containers running after the code finished executing?
This is part of the log - I'll need the entire thing 🙂
` ERROR: Could not find a version that satisfies the requirement ipython==7.33.0 (from -r /tmp/cached-reqssiv6gjvc.txt (line 4)) (from versions: 0.10, 0.10.1, 0.10.2, 0.11, 0.12, 0.12.1, 0.13, 0.13.1, 0.13.2, 1.0.0, 1.1.0, 1.2.0, 1.2.1, 2.0.0, 2.1.0, 2.2.0, 2.3.0, 2.3.1, 2.4.0, 2.4.1, 3.0.0, 3.1.0, 3.2.0, 3.2.1, 3.2.2, 3.2.3, 4.0.0b1, 4.0.0, 4.0.1, 4.0.2, 4.0.3, 4.1.0rc1, 4.1.0rc2, 4.1.0, 4.1.1, 4.1.2, 4.2.0, 4.2.1, 5.0.0b1, 5.0.0b2, 5...
And also, what exact command line did you use to run the agent?
You will need to find an appropriate docker image with the Python version you're looking for.
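Once you've picked an image, you can point the task at it, roughly like this (the image name is a placeholder - choose one whose Python version can resolve your requirements):
```
from clearml import Task

task = Task.init(project_name="examples", task_name="pin base image")  # placeholder names

# set the base docker image the agent will use when running this task
task.set_base_docker("python:3.9")
```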
Yes & Yes.
`task.upload_artifact('test_artifact', artifact_object='foobar')`
You can save a string, however please note that in the end it will be saved as a file and not a pythonic object. If you want to keep your object, you can pickle it 🙂
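Roughly like this (names are placeholders):
```
import pickle
from pathlib import Path
from clearml import Task

task = Task.init(project_name="examples", task_name="artifact demo")

# a plain string ends up stored as a file on the artifact store
task.upload_artifact("test_artifact", artifact_object="foobar")

# to keep a Python object, pickle it yourself and upload the resulting file
my_obj = {"weights": [1, 2, 3]}
with open("my_obj.pkl", "wb") as f:
    pickle.dump(my_obj, f)
task.upload_artifact("my_obj", artifact_object=Path("my_obj.pkl"))
```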
What is being reported that isn't auto-logged?
Also a small clarification:
ClearML doesn't build the docker image itself. You need to have an already-built docker image for ClearML to use.
Hi @<1523701601770934272:profile|GiganticMole91> , I think for binaries and not just the model files themselves you would need to do a bit of tweaking
Hi ElegantCoyote26 ,
What happens if you delete ~/.clearml (this is the default cache folder for ClearML) and rerun?
Hi @<1750327614469312512:profile|CrabbyParrot75> , why use the StorageManager module and not the Datasets to manage your data?
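With Datasets the flow is roughly this (dataset/project names and the local path are placeholders):
```
from clearml import Dataset

# create a new dataset version, add local files, upload and close it
ds = Dataset.create(dataset_name="raw-images", dataset_project="examples/data")
ds.add_files(path="/data/raw_images")
ds.upload()
ds.finalize()

# consumers then fetch a cached, read-only local copy by name/project
local_copy = Dataset.get(
    dataset_name="raw-images", dataset_project="examples/data"
).get_local_copy()
```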
Does ClearML support running the experiments on any "serverless" environments
Can you please elaborate on what you mean by "serverless"?
such that GPU resources are allocated on demand?
You can define various queues for resources according to whatever structure you want. Does that make sense?
Alternatively, is there a story for auto-scaling GPU machines based on experiments waiting in the queue and some policy?
Do you mean an autoscaler for AWS for example?