Hi SkinnyPanda43
Let's say that I install the shared libs with pip in editable mode on my development environment. How will the clearml-agent handle those libraries if I submit a job?
So installing packages from local folders with "-e" is in general ill-advised.
But using a full git path should work out of the box. For example, if you pip install https://github.com/user/repo/repo.git then the agent will be able to install it on the remote machine. The main challenge...
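For illustration, this is what the shared lib could look like as a requirements line the agent can reproduce on the remote machine (the repository URL here is a placeholder, not a real package):

```text
# requirements.txt - install the shared lib from its git repo instead of "pip install -e ."
git+https://github.com/user/repo.git
```

pip records the git URL in the Task's installed packages, so the agent can re-run the same install remotely.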
Mmm well, I can think of a pipeline that could save its state in the instant before the error occurred.
This is already the case: if you clone the pipeline Task, change the Args/_continue_pipeline_ to True, and enqueue it
Is there a helper function option at all that means you can flush the clearml-agent workspace automatically, or by command?
On every Task execution the agent clears the venv (packages are cached locally, but the actual venv is cleared). If you want you can turn on the venv cache, but there is no need to manually clear the agent's cache.
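If you do want the venv cache, it is enabled in the agent section of clearml.conf; a minimal sketch (the path and limits shown are illustrative values, adjust to your setup):

```text
agent {
    venvs_cache: {
        # maximum number of cached venvs to keep around
        max_entries: 10
        # minimum free space (GB) to keep on the drive
        free_space_threshold_gb: 2.0
        # setting a path is what enables the venv cache
        path: ~/.clearml/venvs-cache
    }
}
```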
Hi SourOx12
I think that you do not actually need this one:
`step = step - cfg.start_epoch + 1`
you can just do:
`step += 1`
ClearML will take care of the offset itself
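A minimal pure-Python sketch of why the two are equivalent (`start_epoch` here is a made-up stand-in for `cfg.start_epoch`):

```python
# Hypothetical offset, standing in for cfg.start_epoch
start_epoch = 3

# Manually re-basing the step so reporting starts at 1:
manual_steps = [step - start_epoch + 1 for step in range(start_epoch, start_epoch + 5)]

# Simply counting up and letting ClearML handle any offset:
auto_steps = []
step = 0
for _ in range(5):
    step += 1
    auto_steps.append(step)

print(manual_steps)  # [1, 2, 3, 4, 5]
print(auto_steps)    # [1, 2, 3, 4, 5]
```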
If you are using the latest RC:
`pip install clearml==0.17.5rc5`
You can pass `True` and it will use the "files_server" as configured in your clearml.conf
I used the http link as a filler to point to the files_server.
Make sense?
Ohh I see, makes total sense. I'm assuming the code base itself can do both 🙂
Is there a way to move existing pipelines between projects?
You should be able to, go to your settings page and turn on "show hidden folders"
Then go to your project, you should see a ".pipeline" sub-project there; right-click it and move it to another folder.
Any chance your code needs more than the main script, but it is Not in a git repo? Because the agent supports either single script file, or a git repo with multiple files
seems it was fixed 🙂
MagnificentWorm7 thank you for noticing! 🙂
When a remote task runs `Dataset.get()` it is not using the correct URL
BoredHedgehog47 it will get the link the data was Registered with, when creating the Dataset.
This has Nothing to do with the local configuration, it can point to any arbitrary file location on the internet.
It was created there, because at the time of the dataset creation someone (manually or via the config) set a specific host as the file location, and to that host the files were uploaded (again ...
IntriguedRat44 how do I reproduce it ?
Can you confirm that commenting out the Task.init(..) call fixes it?
SmallBluewhale13
And the Task.init registers 0.17.2, even though it prints (while running the same code from the same venv) 0.17.2?
It was set to true earlier, I changed it to false to see if there would be any difference but it doesn't seem like it
I would actually just add:
`Task.add_requirements('google.cloud')`
before the Task.init call (notice, it has to be before the init call)
that is odd..
So if you have 3 agents, how many concurrent experiments are they running? (actually running, not just registered as running)
Task.init should be called before PyTorch distributed is initialized; then on each instance you need to call Task.current_task() to get the Task instance (and make sure the logs are tracked).
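A rough pseudocode sketch of the call order described above (`setup_distributed()` is a stand-in for however you initialize torch.distributed or spawn workers; project/task names are made up):

```
# main process - Task.init comes first, before any distributed setup
task = Task.init(project_name='my_project', task_name='distributed_training')
setup_distributed()  # stand-in for torch.distributed init / worker spawn

# inside each worker process - reuse the existing Task, do not create a new one
task = Task.current_task()
```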
Thanks JitteryCoyote63 let me double check if there is a reason for that (there might be one, not sure)
Create one experiment (I guess in the scheduler)
`task = Task.init('test', 'one big experiment')`
Then make sure the scheduler creates the "main" process as a subprocess (basically the default behavior)
Then the subprocess can call Task.init and it will get the scheduler Task (i.e. it will not create a new task). Just make sure they all call Task.init with the same task name and the same project name.
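A rough pseudocode sketch of that pattern (`launch_subprocess` is a stand-in for however the scheduler spawns "main"; the key point is that both Task.init calls use identical project and task names):

```
# scheduler process
task = Task.init('test', 'one big experiment')
launch_subprocess(main)  # stand-in for the scheduler's subprocess launch

# inside the subprocess
task = Task.init('test', 'one big experiment')  # same names -> returns the scheduler's Task
```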
for example, one notebook will be dedicated to explore columns, spot outliers and create transformations for specific column values.
This actually implies each notebook is a standalone "process", which makes a ton of sense. But this is where notebooks and proper SW design diverge: in traditional SW, the notebooks would actually be python files, and then of course you can import one from another; unfortunately this does not work in notebooks...
If you are really keen on using notebooks I wou...
Thanks TroubledJellyfish71 I managed to locate the bug (and indeed it's the new aarch64 package support)
I'll make sure we push an RC in the next few days, until then as a workaround, you can put the full link (http) to the torch wheel
BTW: 1.11 is the first version to support aarch64, if you request a lower torch version, you will not encounter the bug
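As an illustration of the workaround, the torch line in the Task's "Installed Packages" can be replaced with a direct wheel URL (the URL below is a made-up placeholder; substitute the actual wheel matching your Python version and architecture):

```text
torch @ https://example.com/wheels/torch-1.11.0-cp38-cp38-linux_aarch64.whl
```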
LOL totally 🙂
Hmm interesting, I guess once you are able to connect it with ClearML you can just clone / modify / enqueue and let users train models directly from the UI on any hardware, is that the plan ?
Is there a solution for that?
Hi DisturbedElk70
Well assuming you mount/sync the "temp" folder of the offline experiment to a storage solution, then have another process (on the other side) syncing these folders, it will work and you will get "real-time" updates 🙂
Offline folder: `get_cache_dir() / 'offline' / task_id`
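A small sketch of where the offline files end up (`Path("/tmp/clearml-cache")` stands in for the actual `get_cache_dir()` result, and the task_id is made up):

```python
from pathlib import Path

def offline_folder(cache_dir: Path, task_id: str) -> Path:
    # Mirrors the layout above: <cache_dir>/offline/<task_id>
    return cache_dir / "offline" / task_id

folder = offline_folder(Path("/tmp/clearml-cache"), "0123abcd")
print(folder)  # e.g. /tmp/clearml-cache/offline/0123abcd
```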
We should probably have a section on that (i.e. running two agents on the same GPU, then explaining how to use it)
Hi TrickyRaccoon92
Yes please update me once you can, I would love to be able to reproduce the issue so we could fix it for the next RC 🙂
Do you think such a feature exists in ClearML?
Currently this is "fixed" to iterations (which is actually just a monotonic integer value) or the timestamp.
But I cannot see any reason why we could not allow users to control the x-axis title, and to be able to set it in code, I'm assuming this is what you have in mind?
but I can't figure out whether this works only one way for the services queue, or can I experiment with that?
UnevenOstrich23 I'm not sure what exactly the question is, but if you are asking whether this is limited, the answer is no, it is not limited to that use case.
Specifically you can run as many agents in "services-mode" pulling from any queue/s that you need, and they can run any Task that is enqueued on those queues. There is no enforced limitation. Did that answer the question?
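For example, multiple agents can pull from the same queue in services-mode; a sketch of the commands (queue names here are illustrative):

```text
clearml-agent daemon --services-mode --queue services --detached
clearml-agent daemon --services-mode --queue services --detached
```

Each daemon picks up Tasks from the "services" queue independently, so enqueued Tasks run concurrently across the agents.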
does clearml expect them to be actually installed in order to add them as installed packages for a task?
It should add itself to the list (assuming you will end up calling Task.init in your code)