When a remote task runs Dataset.get() it is not using the correct URL
BoredHedgehog47 it will get the link the data was registered with when the Dataset was created.
This has nothing to do with the local configuration; it can point to any arbitrary file location on the internet.
It was created there because, at the time of the dataset creation, someone (manually or via the config) set a specific host as the file location, and the files were uploaded to that host (again ...
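To make this concrete, here is a rough sketch (the bucket and project/dataset names are hypothetical): the upload destination is fixed when the dataset is created and uploaded, and Dataset.get() later resolves those registered links regardless of the local configuration.
```python
from clearml import Dataset

# hypothetical project / dataset names and storage bucket
ds = Dataset.create(dataset_name='my_dataset', dataset_project='my_project')
ds.add_files(path='/data/local_files')
ds.upload(output_url='s3://my-bucket/datasets')  # the registered links will point at this host
ds.finalize()

# later, on any machine: Dataset.get() resolves the links registered above,
# not whatever the local configuration happens to point to
local_copy = Dataset.get(dataset_project='my_project', dataset_name='my_dataset').get_local_copy()
```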
The problem, of course, is filling in all the configuration details so that they are viewable.
Other than that, check out:
https://allegro.ai/docs/task.html#trains.task.Task.export_task
https://allegro.ai/docs/task.html#trains.task.Task.import_task
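For reference, a rough sketch of how that could look (the project/task names are placeholders):
```python
from clearml import Task

# placeholder project / task names
source_task = Task.get_task(project_name='examples', task_name='my experiment')
task_data = source_task.export_task()   # full task definition as a dict

# ... edit task_data (e.g. configuration details) here ...

new_task = Task.import_task(task_data)  # create a new task from the exported definition
```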
Sounds good?
SubstantialElk6 this is odd, how are they passed? What's the exact setup?
Is there a way to assign a job to a specific worker? Or does it only work at the queue level?
Only at the queue level, but you can have as many queues as you like and spin an agent on each (notice a single agent can listen on multiple queues, pulling based on priority/order).
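So the pattern would be a dedicated queue that only the target worker's agent listens on; a minimal sketch (queue and task names are just examples):
```python
from clearml import Task

# hypothetical queue that only the target worker's agent listens on,
# e.g. spin that agent with: clearml-agent daemon --queue worker_a_queue
task = Task.get_task(project_name='examples', task_name='my experiment')
Task.enqueue(task, queue_name='worker_a_queue')
```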
directly from the UI from the services queue?
Spin the agent with --services-mode
it will keep pulling jobs from the queue and spinning them (BTW, it will only start the next job after the previous one has finished its environment setup, and you must be running in --docker mode 🙂)
Hmm, so if I understand what's going on: convert_test.py needs to have the test.json. Since it creates the test.json but does not call git add on it, the test.json will not be part of the git diff, hence it is missing when executing remotely with the agent.
If test.json is relatively small (i.e. not 10s of MB) you could store it as configuration on the Task, for example:
```python
local_copy_of_test_json = task.connect_configuration('/path/to/test.json', name='test config')
print(local_copy_of_test_json)  # path to the configuration file (a local copy when executed by the agent)
```
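On the agent side the same call returns a local copy of the stored configuration, so the script can read it back, e.g. (a sketch, assuming the file is JSON as the name suggests):
```python
import json

# works both locally and when executed remotely by the agent
with open(local_copy_of_test_json) as f:
    test_config = json.load(f)
```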
Hi HappyLion37
It seems that you are "reusing" the Tasks, which means the second time you open them you are essentially resetting the old run and starting over.
Try to do:
```python
task1 = Task.init('examples', 'step one', reuse_last_task_id=False)
print('do stuff')
task1.close()
task2 = Task.init('examples', 'step two', reuse_last_task_id=False)
print('do some more stuff')
task2.close()
```
ShaggyHare67
Now the trains-agent is running my code but it is unable to import trains ...
What you are saying is that you spin the trains-agent inside a docker, but in venv mode?
On the server I have both python (2.7) and python3,
Hmm, make sure that you run the agent with python3 trains-agent; this way it will use python3 for the experiments.
ShaggyHare67 I'm just making sure I understand the setup:
First "manual" run of the base experiment. It creates an experiment in the system, you see all the hyper parameters under General section. trains-agent
running on a machine HPO example is executed with the above HP as optimization paamateres HPO creates clones of the original experiment, with different configurations (verified in the UI) trains-agent executes said experiments, aand they are not completed.But it seems the paramete...
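For reference, a minimal sketch of such an HPO setup (the base task ID, parameter names, metric names and queue are placeholders; the real parameter names have to match what shows under the General section):
```python
from clearml import Task
from clearml.automation import (
    DiscreteParameterRange, HyperParameterOptimizer, UniformIntegerParameterRange)

task = Task.init(project_name='examples', task_name='HPO controller',
                 task_type=Task.TaskTypes.optimizer)

optimizer = HyperParameterOptimizer(
    base_task_id='<base_experiment_id>',   # the "manual" base run
    hyper_parameters=[
        UniformIntegerParameterRange('General/batch_size', min_value=16, max_value=128, step_size=16),
        DiscreteParameterRange('General/lr', values=[0.001, 0.01, 0.1]),
    ],
    objective_metric_title='validation',
    objective_metric_series='accuracy',
    objective_metric_sign='max',
    execution_queue='default',             # the queue the agent pulls the clones from
    max_number_of_concurrent_tasks=2,
)
optimizer.start()
optimizer.wait()
optimizer.stop()
```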
--docker or in clearml.conf https://github.com/allegroai/clearml-agent/blob/21c4857795e6392a848b296ceb5480aca5f98e4b/docs/clearml.conf#L153
As far as I understand, clearml tracks each library called from the scripts and saves the list of these libraries somewhere (I assume this list is saved as a requirements.txt file somewhere, which is later loaded into the venv when the pipeline is running).
Correct
Can I edit this file (just to comment out the row with "object-detection==0.1")?
BTW, regarding the object-detection library. My training scripts have calls like:
Yes, in the UI you can right-click on the Task, select "reset", then it...
NVIDIA_VISIBLE_DEVICES=0,1
Basically it is used "as is" and the Nvidia drivers do the rest.
The same goes for all, or 0-3, etc.
Hi @<1694157594333024256:profile|DisturbedParrot38>
The dataset ID is also the task ID :)
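So for example (project/dataset names are hypothetical):
```python
from clearml import Dataset, Task

ds = Dataset.get(dataset_project='my_project', dataset_name='my_dataset')
dataset_task = Task.get_task(task_id=ds.id)  # the dataset ID doubles as its backing Task ID
```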
Hi WickedElephant66
Setting the pipeline controller with pipeline_execution_queue as None is actually launching the pipeline controller on your "dev" machine, not sure why you have two of them?
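i.e. something like this (a sketch using the decorator interface, names are placeholders):
```python
from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.pipeline(
    name='my pipeline', project='examples', version='1.0',
    pipeline_execution_queue=None,  # None -> the controller logic runs on the local "dev" machine
)
def run_pipeline():
    pass  # pipeline steps would be called here
```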
it knows it’s a notebook and automatically adds the notebook as an artifact right?
correct
and the uncommitted changes become the notebook converted to a script?
correct
In one case I am seeing actual git diff coming in instead of the notebook.
It might be that there are both a git repository and a notebook, and the git diff is shown before the notebook is detected and shown instead? (there is a watchdog refreshing the notebook every ~30 sec)
Can you verify it fixes the timeout issue as well? (or some insight on how to reproduce the issue?)
Can I log new lines to an old dataframe plot? Any other suggestions?
Hi ChubbyLouse32
You mean to an already reported table? Or an artifact? Or a dataset?
Do you have a git repo link in the execution section of the experiment?
My question is what should be the path to the requirements.txt file?
Is it relative to the repo base?
This is actually in runtime (i.e. when running the code), so relative to the working directory. Make sense? (you can specify an absolute path, though that's probably something I would avoid in the code base...)
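A minimal sketch, assuming this refers to passing a requirements file via Task.add_requirements (to my understanding it also accepts a requirements.txt path, and the call has to come before Task.init):
```python
from clearml import Task

# resolved at runtime, relative to the current working directory (an absolute path also works)
Task.add_requirements('./requirements.txt')
task = Task.init(project_name='examples', task_name='my experiment')
```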
Hi TrickySheep9
So basically the idea is you can quickly code a scheduler with your own logic, then launch it on the "services queue" to run basically forever 🙂
This could be a good example:
https://github.com/allegroai/clearml/blob/master/examples/services/monitoring/slack_alerts.py
https://github.com/allegroai/clearml/blob/master/examples/automation/task_piping_example.py
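And a bare-bones sketch of the idea (project/task names are placeholders):
```python
import time
from clearml import Task

task = Task.init(project_name='DevOps', task_name='my scheduler',
                 task_type=Task.TaskTypes.service)
# optionally push it onto the services queue instead of keeping it on the dev machine:
# task.execute_remotely(queue_name='services', exit_process=True)

while True:
    # your own scheduling / monitoring logic goes here
    time.sleep(60)
```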
overrides -> "kubectl run --overrides "
template -> "kubectl apply template.yaml"
Hmm are you running the clearml-agent on this machine? (This is the orchestration module, it will spin the Tasks and the dockers on the gpus)
(Do notice that even though you can spin two agents on the same GPU, the nvidia drivers cannot share allocated GPU memory, so if one Task consumes too much memory the other will not have enough free GPU memory to run)
Basically the same restriction as manually launching two processes using the same GPU
Hi @<1571308003204796416:profile|HollowPeacock58>
could you share the full log ?
Hi PunyPigeon71
Can you send the log from the remote execution?
Can you see on the Task in the UI, under the execution tab, the correct git repo reference, commit ID, and uncommitted changes?
Manually I was installing the leap package through python -m pip install . when building the docker container.
NaughtyFish36 what happens if you add /opt/keras-hannd to your "installed packages"? This should translate to "pip install /opt/keras-hannd", which seems like exactly what you want, no?
Long story short, work in progress.
BTW: are you referring to manual execution or trains-agent?
What do you have under the "installed packages" section? Also you can configure the agent to use poetry to restore the environment (instead of pip)