DefeatedCrab47 If I remember correctly, v1+ takes its arguments from argparse.
Are you using this feature? 2. How do you set the TB HParam? Currently Trains does not support TB HParams; the reason is that the set of HParams needs to match a single experiment. Is that your case?
which part of the code?
the main script?!
but is not part of the package
is the repo itself a package?
A few implementation / design details:
When you run code with Trains (and call init), it records your environment (Python packages, git code, uncommitted changes, etc.).
Everything is stored on the Task object in the trains-server. When you clone a task you literally create a copy of the Task object (i.e. a second experiment).
On the cloned experiment you can edit everything (parameters, git, base docker image, etc.).
When you enqueue a Task you add its ID to the execution queue list a trains-a...
Basically it solves the remote-execution problem, so you can scale to multiple machines relatively easily :)
Hi VivaciousWalrus99
Could you attach the log of the run ?
By default it will use the python it is running with.
Any chance the original experiment was executed with python2 ?
ZanyPig66 you are correct in your assumptions. What exactly do you have in the Task? If there is no git repo, the entire script should be under "uncommitted changes". What is your case?
while I want to upload a converted .onnx weights with custom tags to my custom project
Oh I see, sure, see this one?
https://github.com/allegroai/clearml/blob/master/examples/reporting/model_reporting.py
Or: output_model.update_weights(weights_filename="/path/to/file.onnx")
You need to adjust it to your setup, specifically change the queue name to one you have. Does that make sense?
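A slightly fuller sketch of the call above, assuming the clearml package is installed and configured. The function name and its parameters are placeholders for illustration, not values from the thread:

```python
def upload_onnx_weights(weights_path, project_name, task_name, model_name, tags):
    """Register an already-converted .onnx file as an output model of a task."""
    # Imported lazily so this module still loads where clearml is not installed.
    from clearml import Task, OutputModel

    # The task's project determines where the model shows up in the UI.
    task = Task.init(project_name=project_name, task_name=task_name)
    # Attach a named, tagged model entry to that task.
    output_model = OutputModel(task=task, name=model_name, tags=list(tags))
    # Upload the weights file and hand back whatever update_weights returns.
    return output_model.update_weights(weights_filename=weights_path)
```

Calling it needs a reachable server and valid credentials, so treat it as a starting point rather than a tested recipe.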
YummyElephant76 oh, you mean the jupyter server was running, then inside the notebook you started a new venv, and in that venv the "notebook" package was missing, hence it failed to detect the notebook?
Hi SubstantialElk6
quick update, once clearml 1.1 is out, we will push the clearml-data improvement, supporting chunks per version (i.e. packaging the changeset into multiple zip files, instead of a single one as the current version does).
Regarding (1) storage limit server.
Ideally, we should be able to specify the batch size that we want to download, or even better, tie this in with the training by parallelising the data download, data preprocessing and batch trains.
With the nex...
Could it be you have two entries of "console_cr_flush_period" ?
Hi BoredPigeon26
what do you mean by "reuse the task" ? is this manual execution (i.e. from code)?
How about archiving the old version?
You can also force Task.init to always create a new Task (which preserves the previous run alongside the execution tab)
Basically what's the specific use case ?
About .get_local_copy... would that then work in the agent though?
Yes it would work both locally (i.e. without agent) and remotely
Because I understand that there might not be a local copy in the Agent?
If the file does not exist locally it will be downloaded and cached for you
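The "download on a miss, then reuse" behaviour described above can be pictured with a small stand-alone sketch. It is not ClearML's implementation; `local_copy_or_fetch` and `fake_fetch` are hypothetical names used only for illustration:

```python
import os
import tempfile

def local_copy_or_fetch(remote_name, fetch, cache_dir):
    """Return a local path for remote_name, fetching it only on a cache miss."""
    local_path = os.path.join(cache_dir, remote_name)
    if not os.path.exists(local_path):
        # Cache miss: download once and store the bytes on disk.
        with open(local_path, "wb") as handle:
            handle.write(fetch(remote_name))
    return local_path

calls = []

def fake_fetch(name):
    # Hypothetical downloader; records each call so downloads can be counted.
    calls.append(name)
    return b"payload"

cache_dir = tempfile.mkdtemp()
first = local_copy_or_fetch("data.bin", fake_fetch, cache_dir)
second = local_copy_or_fetch("data.bin", fake_fetch, cache_dir)  # served from cache
```

The same code path works whether the file was already on the machine or not, which is why the call behaves the same locally and under an agent.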
Won't they be printed out as well in the Web UI?
They would in the log, but it will not be stored back on the Task (the idea is these are "agent specific" additions no need for them to go with the Task)
So I've tried the approach and it does work,
ScantChimpanzee51 What do you mean it does not work? What exactly are you trying with task.connect that does not work?
Is there a way to inject environment variables into a Task or into its container?
Yes you can with:
` task.s...
Internally we use blob.upload_from_file, which has a default 60-second timeout on the connection (I'm assuming the upload could take longer).
SubstantialElk6 Ohh okay I see.
Let's start with background on how the agent works:
When the agent pulls a job (Task), it will clone the code based on the git credentials available on the host itself, or based on the git_user/git_pass configured in ~/clearml.conf
https://github.com/allegroai/clearml-agent/blob/77d6ff6630e97ec9a322e6d265cd874d0ab00c87/docs/clearml.conf#L18
The agent can work in two modes:
Virtual environment mode, where it will create a new venv for each experiment ba...
using caching where specified but the pipeline page doesn't show anything at all.
What do you mean by " the pipeline page doesn't show anything at all."? are you running the pipeline ? how ?
Notice PipelineDecorator.component needs to be Top level not nested inside the pipeline logic, like in the original example
@PipelineDecorator.component(
cache=True,
name=f'append_string_{x}',
)
AntsyElk37
and when i try to use --output-uri i can't pass true because obviously i can't pass a boolean only strings
hmm, that sounds right, I think we should fix that so when using --output-uri true the value that is passed is actually True, not the string "true".
Regarding the issue itself:
are you saying --skip-task-init is being ignored ? and it always adds the Task.init call? you can also pass --output-uri https://files.clear.ml (which is the same as True) ,...
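The fix suggested above (turning the string "true" into a real boolean) could look roughly like this with plain argparse. This is only a sketch of the idea, not the actual clearml-task code; `parse_output_uri` is a hypothetical helper:

```python
import argparse

def parse_output_uri(value):
    """Map 'true'/'false' to booleans; pass any other string through as a URI."""
    lowered = value.strip().lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    return value

parser = argparse.ArgumentParser()
# argparse calls parse_output_uri on the raw command-line string.
parser.add_argument("--output-uri", type=parse_output_uri, default=None)

as_bool = parser.parse_args(["--output-uri", "true"]).output_uri
as_url = parser.parse_args(["--output-uri", "https://files.clear.ml"]).output_uri
```

With this converter, `--output-uri true` yields the boolean True, while an explicit URL is kept as a string.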
And if you could also update the docs with all the env vars that can be set, it would be awesome!
Yes, I'll pass it on, that is a good point
Thanks! Yes, this could be great !
Could you please open a GitHub issue, so we remember to update the feature ?
BTW: if you want to sync between artifacts / settings, I would recommend calling task.reload() to get the latest values back from the server.
ReassuredTiger98 there is an open issue on supporting bash script as pre run inside a docker (which will be supported in the next major release)
BTW: if you already have a Dockerfile, the fastest way would be to build it and push the image once; then you just specify the docker image:tag, which can be done at a Task-specific level.
BurlyPig26 if this is about Task.init delaying execution, did you check: Task.init(..., deferred_init=True)? It will execute the initialization in the background without disturbing execution.
If this is about Model auto logging, see Task.init(..., auto_connect_frameworks): you can specify, per framework, a wildcard for the models to log, or disable it completely https://github.com/allegroai/clearml/blob/b24ed1937cf8a685f929aef5ac0625449d29cb69/clearml/task.py#L370
Make sense ?
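Put together, the two options might be passed like this. It is a sketch that assumes clearml is installed; the project name, task name, and the "*.pt" pattern are placeholders, and `start_task` is a hypothetical wrapper:

```python
# Keyword arguments for Task.init; names and patterns are placeholders.
init_kwargs = dict(
    project_name="examples",
    task_name="deferred init demo",
    # Run the initialization in the background instead of blocking the script.
    deferred_init=True,
    # Per-framework control: a wildcard limits which model files are logged,
    # False turns auto logging off for that framework.
    auto_connect_frameworks={"pytorch": "*.pt", "tensorflow": False},
)

def start_task(**kwargs):
    # Imported lazily so the snippet loads even where clearml is missing.
    from clearml import Task
    return Task.init(**kwargs)
```

Passing auto_connect_frameworks=False instead of a dict would disable the automatic logging entirely.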
Guys, any chance you can verify the RC solves the issue? pip install clearml==1.0.2rc0
GreasyPenguin14 makes total sense.
In that case I would say variants of the accuracy make sense to me. I would suggest: title='trains', series='accuracy/day' and title='trains', series='accuracy/night'
Regarding hierarchy, from the implementation perspective a unique identifier is always the combination of title/series (or in other words metric/variant), so introducing another level is a system-wide change.
This means it might be more challenging than expected ...
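To make the title/series point concrete, here is a toy model of the keying, not the server's actual storage; `record` is a hypothetical helper:

```python
from collections import defaultdict

# Each scalar curve is identified by the (title, series) pair and nothing else.
scalars = defaultdict(list)

def record(title, series, value, iteration):
    # Points with the same title and series land on the same curve.
    scalars[(title, series)].append((iteration, value))

record("trains", "accuracy/day", 0.91, iteration=1)
record("trains", "accuracy/night", 0.84, iteration=1)
record("trains", "accuracy/day", 0.93, iteration=2)
```

Encoding the extra level inside the series name ("accuracy/day") keeps the two-part key intact, which is why it works without a system-wide change.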
Based on what I see when the ec2 instance starts it installs the latest, could it be this instance is still running?
RoundMosquito25 this is a good point, I mean in theory it could be done, the question is the actual Bayesian optimization you are using.
Is it optuna (OptimizerOptuna) or OptimizerBOHB?