if in the "installed packages" I have all the packages installed from the requirements.txt then I guess I can clone it and use "installed packages"
After the agent finishes installing the requirements.txt it will put the entire pip freeze back into the "installed packages". This means that later we will be able to fully reproduce the working environment, even if packages change (which will eventually happen, as we cannot expect everyone to constantly freeze versions)
My problem...
Make sure you have the S3 credentials in your agent's clearml.conf :
https://github.com/allegroai/clearml-agent/blob/822984301889327ae1a703ffdc56470ad006a951/docs/clearml.conf#L210
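For reference, the relevant block looks roughly like this (bucket name and keys below are placeholders):
sdk {
    aws {
        s3 {
            # default credentials, used for any bucket without a specific entry
            key: "<access_key_id>"
            secret: "<secret_access_key>"
            region: ""
            credentials: [
                {
                    # per-bucket credentials
                    bucket: "my-bucket"
                    key: "<access_key_id>"
                    secret: "<secret_access_key>"
                }
            ]
        }
    }
}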
i.e. change them per experiment ?
DefeatedCrab47 if TB has it as image, you should find it under "debug_samples" as image.
Can you locate it there ?
PompousParrot44
It should still create a new venv, but inherit the packages from the system-wide (or specific venv) installed packages. Meaning it will not reinstall packages you already installed, but it will give you the option of just replacing a specific package (or installing a new one) without rebuilding the entire venv
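If it helps, I think the setting that controls this in the agent's config is the following (a sketch, double-check the exact section in your clearml.conf):
agent {
    package_manager {
        # create the task venv with access to the system site-packages,
        # so packages that are already installed are inherited instead of reinstalled
        system_site_packages: true
    }
}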
YummyMoth34
It tried to upload all events and then killed the experiment
Could you send a log?
Also, what's the trains package version?
PompousBeetle71, the reason I'm asking is that the warning you see is due to the fact it cannot detect the filename you are saving your model to... I'm trying to figure out how that actually happened.
BTW: in the next version we will probably remove this warning altogether, but I'm still curious on how to reproduce 🙂
shows that the trains-agent is stuck running the first experiment, not
the trains_agent execute --full-monitoring --id a445e40b53c5417da1a6489aad616fee
is the second trains-agent instance running inside the docker, if the task is aborted, this process should have quit...
Any suggestions on how I can reproduce it?
Interesting, if this is the issue, a simple sleep after reporting should prove it. Wdyt?
BTW are you using the latest package? What's your OS?
PompousBeetle71 you can also use OutputModel.update_weights_package to store multiple files at once (they will all be packaged into a single zip, and unpacked when you get them back via InputModel). Would that help?
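A rough sketch of the flow (assuming the current OutputModel / InputModel classes; file names here are made up):
from clearml import Task, OutputModel, InputModel

task = Task.init(project_name="examples", task_name="multi file model")

# pack several files into a single model "weights package" (stored as one zip)
output_model = OutputModel(task=task)
output_model.update_weights_package(weights_filenames=["model.pt", "tokenizer.json"])

# later, get the package back; the zip is downloaded and unpacked locally
input_model = InputModel(model_id=output_model.id)
local_folder = input_model.get_weights_package(return_path=True)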
it would be clearml-server’s job to distribute to each user internally?
So you mean the user will never know their own S3 access credentials?
Are those credentials unique per user, or "hidden" once for all of them?
JumpyPig73 Do you see all the configurations under the Args section in the "Configuration" Tab ?
(Maybe I'm wrong and the latest RC does Not include the python-fire support)
Ok, I think I figured it out.
Nice!
ClearML doesn't add all the imported packages needed to run the task to the Installed Packages
It does (but not derivative packages that are used by the required packages; those derivative packages will be added when the agent runs it, because the agent creates a new clean venv, adds the required packages, then updates back with everything in pip freeze, since that now represents all the packages the Task needs)
Two questions:
Is t...
SparklingElephant70, let me make sure I understand: the idea is to make sure the pipeline will launch a specific commit/branch, and that you can control it? Also, are you using the pipeline add_step function, or are you decorating a function with PipelineDecorator?
No, they're not in Tensorboard
Yep that makes sense
Logger.current_logger().report_scalar("test", test_metric, posttrain_metrics[test_metric], 0)
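# (report_scalar args are: title, series, value, iteration - so each test metric becomes its own series under the "test" graph, reported at iteration 0)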
That seems like a great solution
Hi SubstantialElk6
1. clearml-agent was just updated, it should solve the issue.
2. Notice that "torch" / "torchvision" packages are resolved by the agent based on the pytorch compatibility table. Is there a way to reproduce the issue where it fails resolving the torch version? Could you send a full log?
3. If you want a specific torch version, you can put a direct link to the torch wheel, for example: https://download.pytorch.org/whl/cu102/torch-1.6.0-cp37-cp37m-linux_x86_64.whl
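For reference, the line in the Task's "Installed Packages" (it is just pip requirements syntax) would then look something like:
torch @ https://download.pytorch.org/whl/cu102/torch-1.6.0-cp37-cp37m-linux_x86_64.whl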
Do you think it can be fixed somehow? It would be the easiest way to launch new experiments with a different configuration
Let me check, it might be it.
It would be the easiest way to launch new experiments with a different configuration
Definitely
And as far as I can see there is no mechanism in place to load objects other than the model file inside the Preprocess class, right?
Well actually this is possible, let's assume you have another Model that is part of the preprocessing, then you could have:
def preprocess(self, ...):
    if not getattr(self, "_preprocess_model", None):
        self._preprocess_model = joblib.load(Model(model_id).get_weights())
something like that should work
Yeah I can write a script to transfer it over, I was just wondering if there was a built in feature.
unfortunately no 😞
Maybe if you have a script we can put it somewhere?
But the artifacts and my dataset from my old experiments still use the old address for the download (is there a way to change that)?
MotionlessCoral18 the old artifacts are stored with direct links, hence the issue. As SweetBadger76 noted, you might be able to replace the links directly inside the backend databases
I get the same "white" image in both TB & ClearML 😞
Hmm we might need more detailed logs ...
When you say there is a lag, what exactly does that mean? If you have enough apiserver instances answering the requests, the bottleneck might be the mongo or the elastic?
if this is the case pytorch really messed things up, this means they removed packages
Let me check something
or can I directly open a PR?
Open a direct PR and link to this thread, I will make sure it is passed along 🙂
100% of things with task_overrides would be the most convenient way
I think the issue is that you have to pass the project ID, not the project name (the project unique ID is the property that is actually stored on the Task)
@<1523707653782507520:profile|MelancholyElk85> can you check the following works:
pipe.add_task(..., task_overrides={'project': Task.get_project_id(project_name='examples')})
HealthyStarfish45
No, it should work 🙂
MuddySquid7 the fix was pushed to GitHub, you can now install directly from the repo:
pip install git+
Hi VexedKangaroo32, there is now an RC with a fix:
pip install trains==0.13.4rc0
Let me know if it solved the problem
task.set_script(working_dir=dir, entry_point="my_script.py")
Why do you have this part? Isn't it the same code, and the script entry point is auto-detected?
... or when I run my_script.py locally (in order to create and enqueue the task)?
The latter, when the script is running locally.
So something like
os.path.join(os.path.dirname(__file__), "requirements.txt")
is the right way?
Sure this will work 🙂
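If it helps, a minimal sketch of the idea (assuming the path is fed to Task.force_requirements_env_freeze; swap in whatever call you are actually using):
import os
from clearml import Task

# absolute path to the requirements.txt sitting next to this script
requirements_path = os.path.join(os.path.dirname(__file__), "requirements.txt")

# assumption: use this file as the task requirements instead of the auto-detected packages
# (needs to be called before Task.init)
Task.force_requirements_env_freeze(force=True, requirements_file=requirements_path)

task = Task.init(project_name="examples", task_name="my_script")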