Hi GracefulDog98
Are argument parameters to the script not passed on to the workers, or am I missing something?
The arguments are passed directly when the code is executed (i.e. when the argparse parse_args is called).
If the code fails, I'm assuming parse_args is called before clearml is imported, could that be the case?
Could it be there is a Task.init being called before this code snippet?
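For reference, a minimal sketch of the ordering that should work (the project/task names and the --lr argument are placeholders):

import argparse
from clearml import Task

# initialize the task first, so clearml can hook into argparse
task = Task.init(project_name="examples", task_name="argparse ordering demo")

parser = argparse.ArgumentParser()
parser.add_argument("--lr", type=float, default=0.001)
args = parser.parse_args()  # parsed after Task.init, so the worker receives the arguments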
right click on the experiment, select Reset, now you can edit it.
What's the "working dir" ? (where in the repo the script is executed from)
GrittyHawk31
What are you getting when you are running:
docker ps
and what are you getting with:
netstat -natp | grep LISTEN
I mean to use a function decorated with PipelineDecorator.pipeline inside another pipeline decorated in the same way.
Ohh... so would it make sense to add "helper_functions" so that a function will be available in the step's context ?
Or maybe we need a new decorator to support "standalone" functions?! Currently, to actually "launch" the function step, you have to call it from the "pipeline" main logic function, but, at least in theory, one could do without the Pipeline itself.....
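For context, a minimal sketch of the current (non-nested) usage, where a step only runs because the pipeline logic calls it (names, project, and values are placeholders):

from clearml.automation.controller import PipelineDecorator

@PipelineDecorator.component(return_values=["doubled"], cache=True)
def step_one(x):
    # becomes its own task, launched when the pipeline logic calls it
    return x * 2

@PipelineDecorator.pipeline(name="demo pipeline", project="examples", version="0.1")
def pipeline_logic(x=1):
    return step_one(x)

if __name__ == "__main__":
    PipelineDecorator.run_locally()  # run the whole pipeline on this machine
    pipeline_logic(x=3)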
FierceHamster54 are you saying that inside the container it took 20 min to run? Or that spinning up the GCP instance until it registered as an Agent took 20 min?
Most of the time is taken by building wheels for numpy and pandas...
BTW: this happens if there is a version mismatch and pip decides it needs to build numpy from source. Can you send the full logs of that? Maybe we can somehow avoid it?
Hi @<1561885941545570304:profile|PunyKangaroo87>
What do you mean by storing data locally?
Like clearml-data? i.e. a Dataset?
You can always use file:///root/path/folder as destination, this will store everything into the local folder, is that it?
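A minimal sketch of that, assuming a hypothetical source folder and the destination above:

from clearml import Dataset

dataset = Dataset.create(dataset_name="my dataset", dataset_project="examples")
dataset.add_files("/path/to/source/folder")  # hypothetical source folder
# everything is stored in the local folder instead of remote storage
dataset.upload(output_url="file:///root/path/folder")
dataset.finalize()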
Hi @<1526371965655322624:profile|NuttyCamel41>
. I do that because I do not know how to get the pickle file into the docker container
What would the pickle file do?
and load the MinMaxScaler within the script, as the sklearn dependency is missing
what do you mean by that? are you getting an error when loading your model ?
When I give my Minio to output_uri argument, it uploads 500 KB/sec as before.
But it worked well when using the StorageManager and uploading to MinIO directly, is that correct?
.. I give my Minio to output_uri argument
How long did it take to run the demo code I posted?
(The one you mentioned took 0.16s to run locally)
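For reference, passing a MinIO bucket as the output_uri looks something like this (endpoint, port, and bucket name are assumptions; the credentials live under sdk.aws.s3 in clearml.conf):

from clearml import Task

task = Task.init(
    project_name="examples",
    task_name="minio output_uri demo",
    # s3://<minio-host>:<port>/<bucket> routes all uploads to MinIO
    output_uri="s3://my-minio.local:9000/clearml",
)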
What's the difference between the example pipeline and this code?
Could it be the "parents" argument? What is it?
FlatStarfish45
In the parent task, the libs appear installed.
What do you mean by "parent Task"? Is this the base task we are optimizing (i.e. the experiment / model we are optimizing) ?
Or is it the "Optimization Task" itself?
Thanks GrievingTurkey78 !
It seems that under the hood they use argparse
See here:
https://github.com/google/python-fire/blob/c507c093fa6622ab5efee21709ffbf25974e4cf7/fire/parser.py
Which means it might just work?!
What do you think?
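If so, something along these lines might just work (project/task names and the train signature are placeholders):

import fire
from clearml import Task

def train(lr: float = 0.001, epochs: int = 10):
    print(f"training with lr={lr}, epochs={epochs}")

if __name__ == "__main__":
    # assuming fire builds on argparse under the hood, initializing the task
    # first should let clearml capture the command-line arguments
    Task.init(project_name="examples", task_name="fire demo")
    fire.Fire(train)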
Hi ReassuredTiger98
Basically assuming Linux, init.d will do the trick
https://unix.stackexchange.com/questions/20357/how-can-i-make-a-script-in-etc-init-d-start-at-boot
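For example, a bare-bones /etc/init.d script along those lines, assuming the goal is to spin up a clearml-agent on boot (the queue name is a placeholder):

#!/bin/sh
### BEGIN INIT INFO
# Provides:          clearml-agent
# Required-Start:    $network
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
### END INIT INFO
# start an agent listening on the "default" queue, detached from the shell
clearml-agent daemon --queue default --detached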
CrookedWalrus33 any chance you can think of a sample code to reproduce?
Hi JitteryCoyote63
Is it possible to rollback from 1.2.0 to 1.1.0?
Not really, there was a DB migration, so an out-of-the-box downgrade is not supported.
That said, v1.3.1 is already out, with what seems like a fix:
As a quick fix, can you test with auto-refresh (see the top-right button with the pause sign you have in the video)?
CloudyHamster42
RC probably in a few days, but notice that it will just remove the warnings, I still can't reproduce the double axis issue.
It will be helpful if you could send a small script to reproduce the problem.
Maybe this example code can help ? https://github.com/allegroai/trains/blob/master/examples/manual_reporting.py
Hi ConvincingSwan15
For the train.py, do I need a setup.py file in my repo to work correctly with the agent? For now it is just the path to train.py
I'm assuming the train.py is part of the repository, no?
If it is, how come the agent after cloning the repository cannot find it ?
Could it be it was accidentally not added to the git repo ?
Regarding the YAML, how would you pass data? Like the pipeline-from-tasks example?
Thanks JitteryCoyote63 , once we have a reproducible example the fix should be very quick to push (with these things reproducing it is the challenge)
I'm kind of at a point where I don't know a lot of what to even search for.
We feel you, yes, there still isn't a very good source of information on where to get started...
This is because the entire field is constantly changing and evolving, and one solution will usually only apply to a specific use case...
I would start with the MLOps community Slack channel, and YouTube talks (specifically those by companies describing how they built their own internal infrastructure, i...
WackyRabbit7 I might be missing something here, but the pipeline itself should be launched on the "pipelines" queue. Is the pipeline itself running? Or is it the step itself that is stuck in the "queued" state?
I could improve the cost-efficiency of my provisioned GCP A100 instances
But their pricing is linear; if you do not need an A100, get a cheaper instance, no?
Yep, it is the scale, and yes, it should appear once you upgrade
try:

import os
from clearml import Dataset

...
dataset_path = Dataset.get(
    dataset_name=dataset_name,
    dataset_project=dataset_project,
    alias="0013_Dataset",
).get_local_copy()
dataset_path = os.path.join(dataset_path, "data.yaml")
...
Thanks! Let me check something
No (this is deprecated and was removed because it was confusing)
https://github.com/allegroai/clearml-agent/blob/cec6420c8f40d92ab1cd6cbe5ca8f24cf351abd8/docs/clearml.conf#L101