JitteryCoyote63
Sure, just please add a github issue request, so it does not get forgotten.
BTW: wouldn't it be more convenient to configure it in the trains.conf ?
understood trains does not have auto versioning
What do you mean auto versioning ?
The task name is not unique, the task ID is. You can have multiple tasks with the same name, and you can edit the name post execution.
Hi MysteriousBee56 ,
Yes, this is a permissions issue: the docker creates all folders as root (since root is the user running inside the docker). Then, when you execute in venv mode, you are running as your own user, which cannot modify the root-created folders.
Hi PompousParrot44
So do you mean something like:
` task_model_a = Task.get_task(task_id='id_a')
task_model_b = Task.get_task(task_id='id_b')
model_a_file = task_model_a.models['output'][-1].get_local_copy()
model_b_file = task_model_b.models['output'][-1].get_local_copy() `
If that's the case, check the free space in the monitoring section of the experiment; you will find the free space logged there in GB.
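To cross-check the logged value on the machine itself, something like the following could be used. This is a minimal sketch using only the Python standard library; the helper name `free_space_gb` is hypothetical and not part of trains/ClearML, and it assumes 1 GB = 1024**3 bytes, which may differ from the unit used in the monitoring graph.

```python
import shutil


def free_space_gb(path="."):
    """Return the free disk space at `path` in GB (assuming 1 GB = 1024**3 bytes)."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    return usage.free / (1024 ** 3)


if __name__ == "__main__":
    # Check the partition that holds the current directory (e.g. where models are cached).
    print(f"Free space: {free_space_gb('.'):.2f} GB")
```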
Hi PompousParrot44
Could you send the "Installed Packages" list?
I think there is a bug in the current trains-agent (there is already a fix, but the RC is not out yet),
where "package @ git+http" packages ignore the git+http link.
You can solve it manually by editing the "Installed Packages" section (when the Task is in draft mode, the section becomes editable): remove the "package @" part and leave only the "git+http" link.
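If there are many such lines, the manual edit described above could be scripted before pasting the list back into the UI. This is only a sketch of the text transformation: the function name `strip_direct_reference` and the example package names are hypothetical, and it does not talk to trains or the agent in any way.

```python
import re

# Matches lines of the form "name @ git+<url>" (optionally "name[extra] @ git+<url>").
_DIRECT_REF = re.compile(r"^\s*[A-Za-z0-9._-]+(\[[^\]]*\])?\s*@\s*(git\+\S+)\s*$")


def strip_direct_reference(lines):
    """Drop the leading "package @" part, keeping only the git+http link.

    Lines that do not match the pattern are returned unchanged.
    """
    cleaned = []
    for line in lines:
        match = _DIRECT_REF.match(line)
        cleaned.append(match.group(2) if match else line)
    return cleaned


if __name__ == "__main__":
    # Hypothetical "Installed Packages" content, for illustration only.
    packages = [
        "mypkg @ git+https://github.com/user/mypkg.git",
        "numpy==1.19.0",
    ]
    print("\n".join(strip_direct_reference(packages)))
```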
Hi UnevenDolphin73 , are those per user/project/system environment variables ?
If these are secrets (that you do not want to expose), maybe it is best to just have them on the agent's machine ?
BTW, I think there is some "vault" support in the paid tiers for these kinds of secrets, not sure at which level (i.e. user/system/project)
BTW if the plots are too complicated to convert to interactive plotly graphs, they will be rendered to images and the server will show them. This is usually the case with seaborn plots
Could you maybe send a screenshot? This is very strange. Also, what's the trains version?
What's the trains-server version?
Also, there was a trick that worked with the previous bug: could you zoom out in the browser and see if you suddenly get the plot?
This is strange... Could you send the browser console log, maybe there is an exception there
Done HandsomeCrow5 +1 added 🙂
btw: if you feel you can share what your reports look like (a screenshot is great), this will greatly help in supporting this feature, thanks
Different question. How can I pass the PYTHONPATH env variable to a task run by the agent (so python can find classes inside my subdirectories)?
Hi HelpfulHare30
By default the working directory will be added to the python path. This means that if, under Execution, I have Working Dir: "." and Script: "src/script.py",
the root of the git repo will be added to the python path.
BTW: next RC you could add a flag to the agent to always add the git repo
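The effect described above can be pictured roughly as follows. This is a sketch of the idea only, not the agent's actual implementation; the function name `working_dir_on_path` and the example repository path are hypothetical.

```python
import os
import sys


def working_dir_on_path(repo_root, working_dir="."):
    """Resolve the Working Dir relative to the cloned repo and put it on sys.path.

    With working_dir="." the resolved entry is the repository root itself,
    so a script at "src/script.py" can import modules relative to the root.
    """
    entry = os.path.normpath(os.path.join(repo_root, working_dir))
    if entry not in sys.path:
        sys.path.insert(0, entry)
    return entry


if __name__ == "__main__":
    # Hypothetical clone location, for illustration only.
    print(working_dir_on_path("/tmp/my_repo", "."))
```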
And it is not working ? what's the Working Dir you have under the Execution Tab ?
Hi DilapidatedDucks58, just making sure: is the link to the pytorch nightly artifactory, or is it a direct link to the package? The reason for asking is that I was not aware they have a proper artifactory... When the task runs, the trains agent will update the installed packages with all the packages it actually used. Could you verify you have the correct version?
Regarding the extra files, you are correct, the docker container is reset every run, so they will get lost. What are those files for? Could you add ...
Hi VivaciousBadger56
Basically you can think of MLRun as "amazon lambda service without amazon". It is designed to run a "function" at scale on multiple nodes.
ClearML, on the other hand, is an MLOps platform. It does the experiment tracking, it orchestrates Tasks (think jobs), it does data management, and lastly we recently released the serving. These are two different use cases.
Am I making sense here?
That depends on the HPO algorithm. Basically, the jobs will be pushed based on the limit of "concurrent jobs", so you do not end up exploding the queue. It also might be a Bayesian process, i.e. based on the previous set of parameters and runs, like how hyper-band works (optuna/hpbandster)
Make sense ?
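The "concurrent jobs" limit can be illustrated with a small simulation. This is a plain Python sketch of the throttling idea, not ClearML's optimizer code; the function name `schedule`, the parameter dictionaries, and the assumption that the oldest job always finishes first are all made up for illustration.

```python
from collections import deque


def schedule(param_sets, max_concurrent):
    """Simulate pushing jobs to a queue without exceeding `max_concurrent`.

    Returns the jobs in completion order and the peak number of jobs
    that were in flight at the same time.
    """
    if max_concurrent < 1:
        raise ValueError("max_concurrent must be at least 1")
    pending = deque(param_sets)
    running = []
    finished = []
    peak = 0
    while pending or running:
        # Push new jobs only until the concurrency limit is reached.
        while pending and len(running) < max_concurrent:
            running.append(pending.popleft())
        peak = max(peak, len(running))
        # Pretend the oldest running job completes, freeing one slot.
        finished.append(running.pop(0))
    return finished, peak


if __name__ == "__main__":
    jobs = [{"lr": 0.1}, {"lr": 0.01}, {"lr": 0.001}]
    done, peak = schedule(jobs, max_concurrent=2)
    print(f"completed {len(done)} jobs, peak concurrency {peak}")
```

A Bayesian or hyper-band style optimizer would additionally choose the next parameter set based on the results of the finished jobs, which this sketch does not attempt.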
I have mounted my s3 bucket at the location /opt/clearml/data/fileserver/ but I can see my data is not being stored in s3, it is being stored in ebs. How so?
I'm assuming the mount was not successful
What you should see is a link to the files server inside clearml, and actual files in your S3 bucket
Well, that depends on you: what did you write there in order to know it is the best one ? A file name ? Did you add some metric ?
Hmm, what's the clearml version? What's the python version, and what's the OS? And the pytorch version?
Hi SillyRobin38
In the preprocess.py files, we will have so many similar lines which is not good.
Actually, clearml-serving also supports directories, i.e. you can package an entire module as part of the preprocess, which would be easier for your code
Another option is to package your code in a python package and have that installed on the container (there is a special env var that allows you to add those to the serving container)
...
Should work out of the box, maybe the only thing to notice is that you will get a Task for every local_rank 0 process
does that make sense ?
However, regarding your recommendation of using the StorageManager class to delete the URL, it seems that this class only contains methods for checking existence of files, downloading files and uploading files, but no method for actually deleting files based on their URL (see the docs).
Yes you are correct 😞 you should use a "deeper" class:
helper = StorageHelper.get(remote_url)
helper.delete(remo...
As I installed ClearML using pip,
Where is clearml-serving running ? Usually your configuration file is in ~/clearml.conf
Notice that if it is not there, it means it is using the defaults, so just create a new one and add that line