
SmallTurkey79 could you attach the full log of the Task?
also I would recommend "export CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1" (with 1, not "true")
Usually binary env vars are 0/1
(I can see that the docs here: None never mention it, I'll ask them to add that)
Task.current_task().connect(training_args, name='huggingface args')
And you should be able to change them when launching remotely 🙂
SmallDeer34 btw: "set_parameters_as_dict" will replace all the arguments (and it is one-way) ...
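A minimal sketch of what that looks like (the task ID and values here are hypothetical; note set_parameters_as_dict overwrites the Task's parameters rather than merging them):

from clearml import Task

task = Task.get_task(task_id='aabbcc')  # hypothetical task ID
# replaces ALL existing parameters; nothing is merged or synced back
task.set_parameters_as_dict({'General': {'lr': 0.01, 'epochs': 10}})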
If possible, can we have "only one experiment can be given a single tag"?
You mean "moving a tag" automatically (i.e. if someone else had the same tag it is removed from it)?
I'm sorry my bad, this is use_current_task
https://github.com/allegroai/clearml/blob/6d09ff15187197e1f574902352115aa08dc1c28a/clearml/datasets/dataset.py#L663

task = Task.init(...)
dataset = Dataset.create(..., use_current_task=True)
dataset.add_files(...)
Hi SubstantialElk6
you can do:

from clearml.config import config_obj
config_obj.get('sdk')
You will get the entire configuration tree of the SDK section (if you need sub-sections, you can access them with '.' notation, e.g. sdk.storage)
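For example, a sub-section lookup (using the same config_obj as above) would be:

config_obj.get('sdk.storage')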
But the artifacts and datasets of my old experiments still use the old address for the download (is there a way to change that?)
MotionlessCoral18 the old artifacts are stored with direct links, hence the issue, as SweetBadger76 noted you might be able to replace the links directly inside the backend databases
Hi PanickyMoth78
Hmm yes, I think the StorageManager (i.e. the google storage python client) also needs a json file with the credentials.
Let me check something
You can set torch to be installed last:
post_packages: ["horovod", "torch"]
This will make sure the "torch" version (the one you specified in the "installed packages") is installed last.
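If it helps, this is roughly where that setting lives in the agent's trains.conf / clearml.conf (using the same package list as above):

agent {
    package_manager {
        # these are installed last, in this order
        post_packages: ["horovod", "torch"]
    }
}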
you can also specify additional packages on the decorator:

@PipelineDecorator.component(..., packages=["tqdm>=2.1", "scikit-learn"])
def step_one(...):
    # code here
Hi RoughTiger69
One quirk I found was that even with this flag on, the agent decides to install whatever is in the requirements.txt
What's the clearml-agent version you are using?
I just noticed that even when I clear the list of installed packages in the UI, upon startup, clearml agent still picks up the requirements.txt (after checking out the code) and tries to install it.
It can also just skip the entire Python installation with:

CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1
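For example, when launching the agent (the queue name here is hypothetical):

CLEARML_AGENT_SKIP_PYTHON_ENV_INSTALL=1 clearml-agent daemon --queue default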
I'm trying to queue a task in python but I'd like to reuse the prior task ID.
is it your own Task? i.e. you enqueue it yourself; if this is the case use task.execute_remotely
it will do just that.
If this is another Task, then if it is aborted then you can just enqueue it, by definition it will continue with the Same Task ID.
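A minimal sketch of the first case (project/task/queue names are hypothetical):

from clearml import Task

task = Task.init(project_name='examples', task_name='my task')
# stop the local run here and enqueue this very Task, keeping its ID
task.execute_remotely(queue_name='default')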
I want to store only my raw data in my blob storage, and I want to create a Hyperdataset with all the artifacts, metrics, frames,
Yes that's exactly how it works.
None
This line adds a reference to raw file (local/remote)
https://github.com/allegroai/clearml/blob/1b474dc0b057b69c76bc2daa9eb8be927cb25efa[…]es/hyperdatasets/data-registration/register_dataset_wit...
Hi PompousBeetle71
I remember it was an issue, but it was solved a while ago. Which Trains version are you using?
Hi SteadyFox10
I'll use your version instead and put any comment if I find something.
Feel free to join the discussion 🙂 https://github.com/pytorch/ignite/issues/892
Thanks for the output_uri. Can I put it in the ~/trains.conf file?
Sure you can 🙂
https://github.com/allegroai/trains/blob/master/docs/trains.conf#L152
You can add it in the trains-agent machine's conf file, and/or on your development machine. Notice that once you run ...
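A sketch of the relevant entry (this is the default_output_uri the link above points at; the bucket URI is hypothetical):

sdk {
    development {
        # models/artifacts from Tasks started on this machine are uploaded here
        default_output_uri: "s3://my-bucket/trains-output"
    }
}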
BeefyHippopotamus73 this error seems like it is coming from boto3; are you sure the credentials are properly configured and that you have read permission?
PompousParrot44 now that I think about it, you might be able to limit the cpu affinity, would that help?
I'm guessing the extra index URL can be a URL to the github repo of interest?
The extra index URL is exactly what you would be passing to pip install, meaning it has to comply with the PyPI artifactory API.
Make sense?
SubstantialElk6 try to add -e CLEARML_AGENT_EXTRA_PYTHON_PATH=/code/app/flair
It should add it to the runtime pythonpath
(to the BASE DOCKER IMAGE on the Task itself)
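One way to set that from code, as a sketch (the image name is hypothetical, and docker_arguments assumes a clearml version where set_base_docker accepts it):

task.set_base_docker(
    docker_image='nvidia/cuda:11.7.1-runtime-ubuntu22.04',
    docker_arguments='-e CLEARML_AGENT_EXTRA_PYTHON_PATH=/code/app/flair',
)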
can we somehow choose the pool of ports clearml-session will work with?
Yes, I think you can.
How do you spin the worker nodes? Is it Kubernetes ?
the time taken to upload halved. It is puzzling because as you say it's not that much to upload.
Maybe it was the load on the server? i.e. handling multiple requests at the same time delayed them?!
For now I've whittled down the number of entries to a more select but useful few and that has solved the issue. If it crops up again I will try connect_configuration properly. Thanks for your help!
My pleasure 🙂
Funny, this was mentioned just today 🙂
https://clearml.slack.com/archives/CTK20V944/p1620664770492400
Basically if you want to always ignore the "installed packages" in your clearml.conf, put:

agent.package_manager.force_repo_requirements_txt = true
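In the nested conf syntax that is:

agent {
    package_manager {
        # always install from the repo's requirements.txt, ignoring "installed packages"
        force_repo_requirements_txt: true
    }
}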
I see.
You can get the offline folder programmatically, then copy the folder content (it's the same as the zip, and you can also pass a folder instead of a zip to the import function):

task.get_offline_mode_folder()
You can also have a soft link to the offline folder (if you are working on a linux machine):

ln -s myoffline_folder ~/.trains/cache/offline
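Putting the two together, a sketch of round-tripping an offline run (paths are hypothetical):

from clearml import Task

# on the offline machine, once the run is done
folder = Task.current_task().get_offline_mode_folder()

# later, on a machine that can reach the server (a folder works as well as a zip)
Task.import_offline_session(session_folder_zip=str(folder))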
ConvolutedSealion94 what's your python version?
(the error itself is clearml failing to execute git diff, or read the output, I suspect unicode or something, assuming you were able to run the same command manually)
Hi BoredPigeon26
what do you mean by "reuse the task"? Is this manual execution (i.e. from code)?
How about archiving the old version?
You can also force Task.init to always create a new Task (which preserves the previous run alongside the execution tab)
Basically what's the specific use case ?
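For reference, forcing a new Task on every run looks like this (project/task names are hypothetical):

from clearml import Task

task = Task.init(
    project_name='examples',
    task_name='my experiment',
    reuse_last_task_id=False,  # always create a new Task instead of overwriting the last run
)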
ThickDove42 If you need the name itself:

events.plots[0]['metric']
events.plots[0]['variant']
That depends on the HPO algorithm; basically they will be pushed based on the limit of "concurrent jobs", so you do not end up exploding the queue. It also might be a Bayesian process, i.e. based on the previous sets of parameters and runs, like how hyper-band works (optuna/hpbandster).
Make sense?
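As an illustration, the concurrency limit is set when building the optimizer; everything below is a sketch with hypothetical task ID, metric names, and parameter range:

from clearml.automation import HyperParameterOptimizer, UniformParameterRange
from clearml.automation.optuna import OptimizerOptuna

optimizer = HyperParameterOptimizer(
    base_task_id='aabbcc',  # hypothetical template Task
    hyper_parameters=[UniformParameterRange('General/lr', min_value=1e-4, max_value=1e-1)],
    objective_metric_title='validation',
    objective_metric_series='loss',
    objective_metric_sign='min',
    optimizer_class=OptimizerOptuna,
    max_number_of_concurrent_tasks=4,  # at most 4 jobs in the queue at any time
    execution_queue='default',
)
optimizer.start()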