JumpyPig73 I think fire was just added:
https://github.com/allegroai/clearml/pull/550
You can test with the latest RC:
pip install clearml==1.2.0rc1
Regarding the Hydra-core package not being found: what do you have listed under Execution: "Installed Packages"? (It should have auto-detected that you are importing hydra and listed it there.)
Can I make the Tasks that I'm adding to the pipeline also run locally, such that the entire pipeline runs locally?
Ohh I think only if you have an agent running on your machine.
What is the use case ? (maybe we can add local execution as well?!)
Correct 🙂
you mean The Task already exists or you want to create a Task from the code ?
Hi PompousBeetle71
Could you test the latest RC? I think the warnings were fixed:
pip install trains==0.16.2rc0
Let me know...
Hi RoundMosquito25
How did you spin up the agent? (What's the cmd line? Is it in docker mode or venv mode?)
From the console it seems the pip installation inside the container (based on the log, this is what I assume) is stuck?!
TrickySheep9 you mean custom containers in clearml-session for remote development ?
I guess I would need to put this in the extra_vm_bash_script param of the auto-scaler, but it will reboot in a loop, right? Isn't there an easier way to achieve that?
You can edit the extra_vm_bash_script which means the next time the instance is booted you will have the bash script executed,
In the meantime, you can ssh to the running instance and change the ulimit manually, wdyt?
but not as a component (using the decorator)
Hmm yes, I think that component calling component as an external component is not supported yet
(basically the difference is , is it actually running as a function, or running on a different machine as another pipeline component)
I noticed that when a pipeline step returns an instance of a class, it tries to pickle it.
Yes this is how the serialization works, when we pass data from one node to another (by design it supports multiple mach...
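Roughly what happens under the hood when a step's return value crosses machines, illustrated with plain pickle (a conceptual stdlib sketch, not ClearML's actual artifact code; TrainingResult is a made-up stand-in class):

```python
import pickle

class TrainingResult:
    """Stand-in for a custom class returned by a pipeline step."""
    def __init__(self, accuracy, labels):
        self.accuracy = accuracy
        self.labels = labels

# Conceptually, when one step's return value is handed to the next
# step (possibly on a different machine), it is serialized, stored,
# and deserialized on the other side:
payload = pickle.dumps(TrainingResult(0.93, ["cat", "dog"]))  # "upload" side
restored = pickle.loads(payload)                              # "download" side

print(restored.accuracy)  # 0.93
print(restored.labels)    # ['cat', 'dog']
```

This is why the returned class has to be picklable: anything holding open file handles, sockets, etc. will fail at this serialization step.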
These both point to nvidia docker runtime installation issue.
I'm assuming that in both cases you cannot run the docker manually as well, which is essentially what the agent will have to do ...
Hi RipeGoose2
So the http://app.community.clear.ml already contains it.
Next release of the standalone server (a.k.a clearml-server) will include it as well.
I think the ETA is end of the year (i.e. 2 weeks), but I'm not sure on the exact timeframe.
Sounds good ?
Is there any better way to avoid the upload of some artifacts of pipeline steps?
How would you pass "huge datasets (some GBs)" between different machines without storing it somewhere?
(btw, I would also turn on component caching so if this is the same code with the same arguments the pipeline step is reused instead of reexecuted all over again)
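The idea behind component caching, sketched as a toy stdlib memoizer: key on the step's code plus its arguments, and reuse the stored result if both are unchanged. (cached_component is a hypothetical helper for illustration; ClearML's real cache keys on more than this.)

```python
import functools
import hashlib
import json

_cache = {}

def cached_component(func):
    """Toy pipeline-step cache: if the step's code and arguments are
    unchanged, return the stored result instead of re-executing."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        key = hashlib.sha256(
            func.__code__.co_code + json.dumps([args, kwargs], sort_keys=True).encode()
        ).hexdigest()
        if key not in _cache:
            _cache[key] = func(*args, **kwargs)
        return _cache[key]
    return wrapper

calls = []

@cached_component
def preprocess(n):
    calls.append(n)  # side effect lets us observe real executions
    return n * 2

preprocess(3)
preprocess(3)  # same code, same argument: served from cache
preprocess(4)
print(calls)  # [3, 4]
```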
Hi ScaryLeopard77
Could that be solved with this PR?
https://github.com/allegroai/clearml/pull/548
I think you can force it to be started, let me check (I pretty sure you can on aborted Task).
MysteriousBee56 what do you mean by "save Scalars on the machine"? All metrics are sent to the trains server; you can later retrieve them from code if you need to.
Sure, thing, I'll fix the "create_draft" docstring to suggest it
BTW
/home/local/user/.clearml/venvs-builds/3.7/bin/python: can't open file 'train.py': [Errno 2] No such file or directory
This error is from the agent, correct? It seems it did not clone the correct code. Is train.py committed to the repository?
VexedCat68 makes sense. We could also (if implementing this feature) add a special tag to the dataset, so you know it contains "external" links, wdyt?
And is Task.init called on all processes ?
Hi FancyWhale93
pipe.start() should actually stop the local pipeline logic execution and fire it on the "services" queue.
The idea is that you can launch the pipeline locally, but the actual execution of the entire logic is remote.
You can have the pipeline logic running locally if you call pipe.start_locally, and also run the steps locally (as sub-processes) with pipe.start_locally(run_pipeline_steps_locally=True)
BTW: based on your example, a more intuitive code might be the pi...
AgitatedTurtle16 could you check with the latest clearml RC (I remember a similar issue was fixed)?
pip install clearml==0.17.5rc3
Then run again:
clearml-task ...
It's always preferred to use conda_freeze: false
That said, if you do use conda_freeze: true it should also freeze the cudatoolkit, so it should have worked.
BTW when you say it worked, is it 0.17.2 version or the hacked RC I sent ?
Hmm whats the OS and python version?
Is this simple example working for you?
None
Hi JumpyPig73 , I think it was synced to github. You can already test with:
pip install git+https://github.com/allegroai/clearml.git
in clearml.conf we could have:
azure.storage {
    max_connections = 10
    # containers: [
    #     {
    #         account_name: "clearml"
    #         account_key: "secret"
    #         # container_name:
    #     }
    # ]
}
Then in AzureContainerConfigurations:
class AzureContainerConfigurations(object):
    def __init__(self, container_configs=None, max_connections=None):
        ...

    @classmethod
    def from_config(cls, configuration):
        ...